Journal: Information Processing & Management

Building parallel corpora by automatic title alignment using length-based and text-based approaches


Abstract

Cross-lingual semantic interoperability has drawn significant attention in recent digital library and World Wide Web research as the volume of information in languages other than English has grown exponentially. Cross-lingual information retrieval (CLIR) across European languages, such as English, Spanish, and French, has been widely explored; however, CLIR between European and Oriental languages is still at an initial stage. For crossing the language boundary, the corpus-based approach is a promising way to overcome the limitations of the knowledge-based and controlled-vocabulary approaches, but collecting parallel corpora between European and Oriental languages is not an easy task. Length-based and text-based approaches are the two major approaches to aligning parallel documents. In this paper, we investigate several techniques using these approaches and compare their performance in aligning the English and Chinese titles of parallel documents available on the Web. (C) 2004 Elsevier Ltd. All rights reserved.
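
To make the idea of length-based alignment concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes that a Chinese title and its English counterpart have character lengths in a roughly constant ratio, and it pairs each English title with the Chinese title whose length ratio is closest to that value. The function names and the `expected_ratio` default of 3.0 are hypothetical choices made for illustration only.

```python
from typing import List, Tuple


def length_score(en_title: str, zh_title: str, expected_ratio: float = 3.0) -> float:
    """Distance between the observed English/Chinese character-length
    ratio and the expected one. Smaller means a better match.

    expected_ratio is an illustrative assumption, not a value from the paper.
    """
    if not zh_title:
        return float("inf")  # an empty candidate can never be a match
    return abs(len(en_title) / len(zh_title) - expected_ratio)


def align_by_length(
    en_titles: List[str], zh_titles: List[str], expected_ratio: float = 3.0
) -> List[Tuple[str, str]]:
    """Pair each English title with the Chinese title of best length score.

    This greedy version lets several English titles pick the same Chinese
    title; a one-to-one constraint would need an assignment step on top.
    """
    pairs = []
    for en in en_titles:
        best = min(zh_titles, key=lambda zh: length_score(en, zh, expected_ratio))
        pairs.append((en, best))
    return pairs


if __name__ == "__main__":
    english = ["Hello world", "A much longer English title about trade policy"]
    chinese = ["关于贸易政策的较长的标题", "你好世界"]
    for en, zh in align_by_length(english, chinese):
        print(en, "<->", zh)
```

Length alone cannot separate candidates of similar length, which is why it is usually contrasted with, or combined with, text-based evidence such as dictionary matches between the two titles.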
