
Building Parallel Corpora by Automatic Title Alignment



Abstract

Cross-lingual semantic interoperability has recently drawn significant research attention, as the number of digital libraries in non-English languages has grown exponentially. Cross-lingual information retrieval (CLIR) among European languages such as English, Spanish and French has been widely explored, but CLIR between European and Oriental languages is still at an early stage. To cross the language boundary, a corpus-based approach shows promise for overcoming the limitations of knowledge-based and controlled-vocabulary approaches. However, collecting parallel corpora for European and Oriental language pairs is not an easy task. The two major approaches to aligning parallel documents are length-based and text-based. In this paper, we investigate several techniques based on these approaches and compare their performance in aligning the English and Chinese titles of parallel documents available on the Web.
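The abstract names the two families of alignment methods without detailing them. As an illustration only, the sketch below shows one way each family could score a candidate English-Chinese title pair: a length-based cost in the style of Gale and Church's sentence aligner, and a text-based score from bilingual dictionary lookups. The parameters `C` and `S2`, the toy lexicon, the `weight` parameter and the greedy one-to-one pairing are all hypothetical assumptions; this is not the set of techniques evaluated in the paper.

```python
import math

# HYPOTHETICAL parameters: expected number of Chinese characters per English
# character (C) and variance per English character (S2). Real values would
# have to be estimated from English-Chinese training data.
C = 0.3
S2 = 0.5


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def length_cost(en: str, zh: str) -> float:
    """Gale-Church-style length cost; lower means more compatible lengths."""
    l1 = max(len(en), 1)  # English length in characters (avoid division by 0)
    l2 = len(zh)          # Chinese length in characters
    delta = (l2 - l1 * C) / math.sqrt(l1 * S2)
    # Two-tailed probability of a length deviation at least this large.
    p = 2.0 * (1.0 - normal_cdf(abs(delta)))
    return -math.log(max(p, 1e-12))


def lexical_score(en: str, zh: str, lexicon: dict) -> float:
    """Text-based score: fraction of English words that have a dictionary
    translation appearing as a substring of the Chinese title."""
    words = [w.lower() for w in en.split()]
    if not words:
        return 0.0
    hits = sum(1 for w in words if any(t in zh for t in lexicon.get(w, [])))
    return hits / len(words)


def align(en_titles, zh_titles, lexicon, weight=1.0):
    """Greedily pair each English title (in input order) with the unused
    Chinese title minimizing length cost minus weighted lexical score."""
    pairs = []
    remaining = list(zh_titles)
    for en in en_titles:
        if not remaining:
            break
        best = min(
            remaining,
            key=lambda zh: length_cost(en, zh) - weight * lexical_score(en, zh, lexicon),
        )
        pairs.append((en, best))
        remaining.remove(best)
    return pairs


if __name__ == "__main__":
    toy_lexicon = {"digital": ["数字"], "library": ["图书馆"],
                   "information": ["信息"], "retrieval": ["检索"]}
    print(align(["Digital Library", "Information Retrieval"],
                ["信息检索", "数字图书馆"], toy_lexicon))
```

In this toy run the two Chinese candidates receive the same length cost for "Digital Library", so the dictionary score breaks the tie; this hints at why length alone may be a weak signal for short titles, although the paper's own comparison should be consulted for actual results.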
