Conference: World multiconference on systems, cybernetics and informatics

Document Multiplicity Elimination and Corpora Management


Abstract

This paper deals with the creation, storage, and retrieval of corpora (large text collections). It is advantageous to include WWW sources, which are easily accessible on the Internet, in a newly built corpus. This is especially true for less widely used languages, such as Czech. However, a consequence of this approach is relatively high document multiplicity. The first part of this paper presents a method for eliminating document multiplicity. The second part deals with corpora management tools, considers their strengths, and outlines possible directions for the future development of these systems.