Multi-domain machine translation system with training data clustering and dynamic domain adaptation

Abstract

A machine translation system capable of clustering training data and performing dynamic domain adaptation is disclosed. An unsupervised domain clustering process is utilized to identify domains in general training data that can include in-domain training data and out-of-domain training data. Segments in the general training data are then assigned to the domains in order to create domain-specific training data. The domain-specific training data is then utilized to create domain-specific language models, domain-specific translation models, and domain-specific model weights for the domains. An input segment to be translated can be assigned to a domain at translation time. The domain-specific model weights for the assigned domain can be utilized to translate the input segment.
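The flow described in the abstract (cluster the training data without supervision, split it into domain-specific sets, then pick a domain and its weights for each input segment at translation time) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent: the bag-of-words features, the cosine-similarity k-means-style clustering, the toy segments, the placeholder weight values, and all function names (`cluster_domains`, `assign_domain`, `select_weights`).

```python
import math
from collections import Counter


def bow(segment):
    """Lowercased bag-of-words vector for a text segment."""
    return Counter(segment.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def nearest(vec, centroids):
    """Index of the most similar centroid (first one wins on ties)."""
    return max(range(len(centroids)), key=lambda i: cosine(vec, centroids[i]))


def cluster_domains(segments, k, iterations=10):
    """Unsupervised clustering of segments into k domains.

    Illustrative stand-in for the clustering step; the source does not
    specify an algorithm. Returns (labels, centroids).
    """
    vecs = [bow(s) for s in segments]
    # Deterministic farthest-point initialisation.
    centroids = [vecs[0]]
    while len(centroids) < k:
        far = min(vecs, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(far)
    labels = []
    for _ in range(iterations):
        labels = [nearest(v, centroids) for v in vecs]
        for d in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == d]
            if members:  # keep the old centroid if a cluster is empty
                centroids[d] = sum(members, Counter())
    return labels, centroids


def assign_domain(segment, centroids):
    """Assign an input segment to a domain at translation time."""
    return nearest(bow(segment), centroids)


def select_weights(segment, centroids, domain_weights):
    """Look up the model weights of the domain assigned to the segment."""
    return domain_weights[assign_domain(segment, centroids)]


# Toy general training data (hypothetical segments).
TRAIN = [
    "patient received drug dose",
    "court ruled on contract dispute",
    "drug reduced patient fever",
    "contract signed in court",
]

labels, centroids = cluster_domains(TRAIN, k=2)

# Domain-specific training data: segments grouped by assigned domain.
domain_data = {d: [s for s, lab in zip(TRAIN, labels) if lab == d]
               for d in range(2)}

# Placeholder weights; a real system would tune these per domain.
DOMAIN_WEIGHTS = {0: {"lm": 0.6, "tm": 0.4}, 1: {"lm": 0.5, "tm": 0.5}}

weights = select_weights("patient needs new drug", centroids, DOMAIN_WEIGHTS)
```

In this sketch the per-domain language and translation models are left out; `domain_data` marks where they would be trained, and `select_weights` marks where a decoder would switch weights for each incoming segment.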
