Published in: 《微处理机》 (Microprocessors)

A Chinese-Tibetan Bilingual Sentence Alignment Algorithm

Abstract

Building bilingual corpora and aligning them automatically are of great significance to the development of computational linguistics. Bilingual alignment is the core technique for processing bilingual text, and the quality of the alignment directly affects all subsequent work. Based on the actual characteristics of Chinese-Tibetan bilingual text, this paper proposes a Chinese-Tibetan sentence alignment method that uses sentence length, similarity, and anchor information. The method uses similarity to find anchor sentences, uses the anchors to split the bilingual text into several blocks, and then performs length-based alignment within each pair of corresponding blocks. Experiments on test data show that the method achieves good accuracy and effectively solves the sentence alignment problem for real Chinese-Tibetan bilingual text.
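The three stages named in the abstract (anchor search by similarity, block splitting, length-based alignment inside each block) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not specify the similarity measure or the length model, so `digit_similarity`, the `threshold`, the length `ratio`, and the `penalty` for non 1-1 beads are all hypothetical choices, and the cost is a plain length difference rather than a probabilistic model.

```python
import re

def digit_similarity(src, tgt):
    """Hypothetical similarity: Jaccard overlap of the numbers in both sentences."""
    a, b = set(re.findall(r"\d+", src)), set(re.findall(r"\d+", tgt))
    return len(a & b) / len(a | b) if a and b else 0.0

def find_anchors(src, tgt, sim=digit_similarity, threshold=1.0):
    """Greedy left-to-right search for monotone anchor pairs (i, j)."""
    anchors, next_j = [], 0
    for i, s in enumerate(src):
        for j in range(next_j, len(tgt)):
            if sim(s, tgt[j]) >= threshold:
                anchors.append((i, j))
                next_j = j + 1
                break
    return anchors

def length_align(src, tgt, ratio=1.0, penalty=1.0):
    """Dynamic programming over 1-1, 1-0, 0-1, 2-1 and 1-2 beads.

    Cost of a bead = |source length * ratio - target length|, plus a fixed
    penalty for anything other than 1-1 (an assumed, simplified cost).
    Returns beads as (source indices, target indices) local to the block.
    """
    moves = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in moves:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                ls = sum(len(s) for s in src[i:ni])
                lt = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + abs(ls * ratio - lt)
                if (di, dj) != (1, 1):
                    c += penalty
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    beads, i, j = [], n, m
    while (i, j) != (0, 0):  # walk the back-pointers from the end
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]

def align(src, tgt, ratio=1.0):
    """Split the text at anchors, then length-align each block."""
    # A sentinel "anchor" past the end closes the final block.
    anchors = find_anchors(src, tgt) + [(len(src), len(tgt))]
    beads, si, tj = [], 0, 0
    for ai, aj in anchors:
        for bs, bt in length_align(src[si:ai], tgt[tj:aj], ratio):
            beads.append(([si + k for k in bs], [tj + k for k in bt]))
        if ai < len(src):  # a real anchor, not the sentinel
            beads.append(([ai], [aj]))
        si, tj = ai + 1, aj + 1
    return beads

# Toy placeholder sentences (not real Chinese or Tibetan text).
src = ["aaaa", "bb 2010 bb", "cccccc", "dd"]
tgt = ["AAAAA", "BB 2010 BBB", "CCC", "CCCC", "DD"]
for bead in align(src, tgt):
    print(bead)
```

In this toy input the shared number "2010" yields a single anchor, which splits the text into two blocks; the second block is resolved as one 1-2 bead followed by one 1-1 bead. Anchoring first keeps each dynamic-programming table small and stops a local length mismatch from propagating across the whole document, which is the motivation the abstract gives for combining anchors with length-based alignment.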
