TermExtract: Accuracy of Compound Noun Detection in Japanese

Conference: FTRA International Conference on Future Information Technology

Abstract

Term recognition in Japanese is known to be one of the challenging problems in natural language processing and information retrieval. Japanese documents are commonly processed with morphological analyzers, but these tools usually do not recognize compound nouns, that is, combinations of simple nouns that express a meaning different from that of their individual components. Instead, morphological analyzers usually split compound nouns into single nouns, so reconstructing compound nouns is essential for preserving text semantics. TermExtract is a tool that performs this reconstruction, and in this study we evaluate its accuracy. To verify the terms produced by TermExtract, we use three online resources: the ALC online dictionary, Wikipedia, and the Google phrase search service. Experiments are conducted on abstracts of scientific documents from the NTCIR-1 collection. We found that TermExtract reconstructs 36.23% of all compound nouns in the corpus, and that most of these nouns are scientific terms.
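To make the reconstruction step concrete, here is a minimal Python sketch that merges runs of consecutive noun tokens back into compound-noun candidates. It assumes a simplified, hypothetical token format of `(surface, pos)` pairs with a `"noun"` tag; real analyzers such as MeCab use their own tag sets and output formats. This is only an illustration of the general idea, not TermExtract's implementation, which may apply additional rules or ranking.

```python
from typing import List, Tuple

# Hypothetical token format: (surface form, part-of-speech tag) pairs,
# loosely modelled on the output of a Japanese morphological analyzer.
Token = Tuple[str, str]


def merge_noun_runs(tokens: List[Token]) -> List[str]:
    """Join runs of two or more consecutive noun tokens into compound-noun candidates."""
    candidates: List[str] = []
    run: List[str] = []
    for surface, pos in tokens:
        if pos == "noun":
            # Extend the current run of adjacent nouns.
            run.append(surface)
            continue
        # A non-noun token ends the run; keep it only if it spans 2+ nouns.
        if len(run) >= 2:
            candidates.append("".join(run))  # Japanese is written without spaces
        run = []
    # Flush a noun run that reaches the end of the sentence.
    if len(run) >= 2:
        candidates.append("".join(run))
    return candidates


if __name__ == "__main__":
    # 情報 / 検索 / システム / の / 評価, split the way an analyzer might split it
    sentence = [
        ("情報", "noun"),
        ("検索", "noun"),
        ("システム", "noun"),
        ("の", "particle"),
        ("評価", "noun"),
    ]
    print(merge_noun_runs(sentence))  # prints ['情報検索システム']
```

Requiring at least two nouns in a run skips single nouns, which the analyzer already emits as complete units; deciding whether a merged candidate is a genuine term is the separate verification problem the study addresses with external resources.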
