Building parallel corpora by automatic title alignment using length-based and text-based approaches

Yang CC; Li KW

Abstract

Cross-lingual semantic interoperability has drawn significant attention in recent digital library and World Wide Web research as the information in languages other than English has grown exponentially. Cross-lingual information retrieval (CLIR) across different European languages, such as English, Spanish, and French, has been widely explored; however, CLIR across European languages and Oriental languages is still in the initial stage. To cross the language boundary, the corpus-based approach is a promising way to overcome the limitations of the knowledge-based and controlled-vocabulary approaches, but collecting parallel corpora between European and Oriental languages is not an easy task. Length-based and text-based approaches are the two major approaches to aligning parallel documents. In this paper, we investigate several techniques using these approaches and compare their performance in aligning English and Chinese titles of parallel documents available on the Web. (C) 2004 Elsevier Ltd. All rights reserved.
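
The abstract names length-based alignment but does not give the authors' formulas, so the following is only a generic sketch of the idea, not the method evaluated in the paper. It assumes that a Chinese title is, on average, some fixed fraction of the character length of its English counterpart. The ratio `c`, the variance term `s2`, and the function names `length_score` and `align_by_length` are all hypothetical values and names chosen for illustration.

```python
import math


def length_score(en_title, zh_title, c=0.3, s2=1.0):
    """Lower is better: normalized deviation of the Chinese title length
    from the length expected for the English title.

    c  : ASSUMED average number of Chinese characters per English character
         (illustrative value, not taken from the paper).
    s2 : ASSUMED variance scaling term (illustrative value).
    """
    l1 = len(en_title)  # English length in characters
    l2 = len(zh_title)  # Chinese length in characters
    if l1 == 0:
        return float("inf")
    return abs(l2 - c * l1) / math.sqrt(l1 * s2)


def align_by_length(en_titles, zh_titles, c=0.3, s2=1.0):
    """Pair each English title with the Chinese title whose length fits best.

    This is a simple greedy match; it does not enforce one-to-one pairing.
    """
    pairs = []
    for en in en_titles:
        best = min(zh_titles, key=lambda zh: length_score(en, zh, c, s2))
        pairs.append((en, best))
    return pairs
```

Calling `align_by_length(english_titles, chinese_titles)` returns a list of `(english, chinese)` pairs. A length-only score like this cannot separate candidate titles of similar length, which is one reason to compare it against text-based approaches that look at the words themselves.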

Bibliographic details

Source: Information Processing & Management, 2004, No. 6, pp. 939-955 (17 pages)
Authors: Yang CC; Li KW
Affiliation: Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Shatin, Hong Kong, Peoples R China
Original format: PDF
Language of text: English
Chinese Library Classification: Library science; librarianship
Keywords: cross-lingual information retrieval; parallel corpus; sentence alignment; covert translation; CHINESE

