...
Journal: The Electronic Library

Exploration and study of multilingual thesauri automation construction for digital libraries in China



Abstract

Purpose - The paper explores the automatic construction of multilingual thesauri from freely available digital library resources and presents its key methods and results. It also proposes a method for automatically extracting terms from a multilingual parallel corpus.

Design/methodology/approach - The study uses natural language processing techniques to analyze the linguistic characteristics of terms and combines this analysis with statistical methods to extract terms from technical documents. The methods comprise automatically extracting and filtering terms; identifying and building relationships among terms; building a multilingual parallel corpus; and extracting Chinese-foreign language term pairs by calculating their association probability. The experiments were run on a Java test platform.

Findings - The study identifies the similarities and differences between the Chinese thesaurus standard and the international thesaurus standard, and presents methods for automatically extracting terms and building relationships among them. Multilingual term translation sets are then generated from real corpora. The results show that the proposed methods achieve better performance; in particular, the automatic term translation alignment method outperforms the traditional IBM model method.

Practical implications - The results can serve as a reference for further research on, and application of, the automatic construction of multilingual thesauri using Chinese as a pivot language.

Originality/value - The paper proposes new ideas for automatic thesaurus construction in the digital age. The presented method, which combines linguistics and statistics, is a new approach, and the experimental results indicate that the work is innovative and valuable. In addition, these ideas and methods provide a good starting point for improving the information services of the PRC's National Science and Technology Digital Library.
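The abstract states that Chinese-foreign term pairs are extracted by calculating an association probability, but it does not name the measure. As a minimal sketch, assuming a sentence-aligned Chinese-English corpus in which candidate terms have already been extracted from each sentence, the code below scores every co-occurring candidate pair with the Dice coefficient. Dice is a common co-occurrence association measure substituted here for illustration; it is not the authors' formula. The function `dice_term_pairs`, its thresholds, and the toy corpus are all hypothetical.

```python
from collections import Counter
from itertools import product


def dice_term_pairs(aligned_pairs, min_cooccur=2, min_score=0.5):
    """Hypothetical sketch: align Chinese and English terms by Dice coefficient.

    aligned_pairs: iterable of (zh_terms, en_terms), one entry per aligned
    sentence pair, each holding the candidate terms extracted from it.
    Dice is an assumed stand-in for the paper's unspecified association measure.
    Returns (zh_term, en_term, score) tuples, highest score first.
    """
    zh_freq = Counter()  # sentence pairs containing each Chinese term
    en_freq = Counter()  # sentence pairs containing each English term
    joint = Counter()    # sentence pairs containing both terms of a candidate pair
    for zh_terms, en_terms in aligned_pairs:
        zh_set, en_set = set(zh_terms), set(en_terms)
        zh_freq.update(zh_set)
        en_freq.update(en_set)
        joint.update(product(zh_set, en_set))

    pairs = []
    for (zh, en), both in joint.items():
        if both < min_cooccur:
            continue  # co-occurs too rarely to trust
        # Dice = 2 * |both| / (|zh| + |en|), ranging from 0 to 1
        score = 2 * both / (zh_freq[zh] + en_freq[en])
        if score >= min_score:
            pairs.append((zh, en, score))
    # Highest score first; ties broken by term text for a stable order.
    pairs.sort(key=lambda p: (-p[2], p[0], p[1]))
    return pairs


if __name__ == "__main__":
    # Toy corpus invented for illustration, not data from the paper.
    corpus = [
        ({"数字图书馆", "叙词表"}, {"digital library", "thesaurus"}),
        ({"数字图书馆", "语料库"}, {"digital library", "corpus"}),
        ({"叙词表", "术语"}, {"thesaurus", "term"}),
        ({"语料库", "术语"}, {"corpus", "term"}),
    ]
    for zh, en, score in dice_term_pairs(corpus):
        print(f"{zh}\t{en}\t{score:.2f}")
```

On this toy corpus, only the four correct translations co-occur in two sentence pairs, so the default `min_cooccur=2` keeps exactly those, each with a Dice score of 1.0. Every chance pairing appears once, scores 0.5, and is filtered out. Unlike IBM word-alignment models, which learn word-to-word translation probabilities, this sketch scores whole pre-extracted terms, so multiword terms such as "digital library" are aligned as single units.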
