Conference: International Conference on Document Analysis and Recognition

Cross-Language Sensitive Words Distribution Map: A Novel Recognition-Based Document Understanding Method for Uighur and Tibetan

Abstract

Cross-language document recognition and understanding are in pressing practical demand and have broad application prospects. In this paper, we propose a novel recognition-based method for understanding Uighur and Tibetan documents, termed the "cross-language sensitive words distribution map" (CSWDM). In our unified recognition-understanding framework, digital Uighur/Tibetan document images are first recognized using OCR technology. CSWDM then labels the Chinese information of sensitive words either on the recognized transcriptions or directly on the original digital images, so that the spatial location and occurrence frequency of these sensitive words are represented intuitively. With this information, readers can roughly grasp the theme and meaning of the cross-language documents.
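The labeling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the OCR stage has already produced a list of recognized words with pixel bounding boxes, and that a lexicon maps each sensitive source-language word to a Chinese gloss. The names `RecognizedWord`, `Annotation`, `build_distribution_map`, and all sample tokens and glosses are hypothetical placeholders introduced for illustration only.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RecognizedWord:
    """One word from an assumed OCR result: text plus bounding box in pixels."""
    text: str
    x: int
    y: int
    width: int
    height: int


@dataclass(frozen=True)
class Annotation:
    """A sensitive word found on the page, with its gloss and location."""
    source_text: str
    gloss: str
    box: tuple  # (x, y, width, height)


def build_distribution_map(words, lexicon):
    """Return (annotations, frequencies) for recognized words found in the lexicon.

    `lexicon` is an assumed dict from source-language word to Chinese gloss.
    """
    annotations = []
    frequencies = Counter()
    for word in words:
        gloss = lexicon.get(word.text)
        if gloss is None:
            continue  # not a sensitive word, so it stays unlabeled
        box = (word.x, word.y, word.width, word.height)
        annotations.append(Annotation(word.text, gloss, box))
        frequencies[gloss] += 1
    return annotations, frequencies


# Placeholder data: these tokens stand in for real Uighur/Tibetan words
# and their Chinese glosses; none of them come from the paper.
lexicon = {"src_word_a": "gloss_zh_a", "src_word_b": "gloss_zh_b"}
ocr_words = [
    RecognizedWord("src_word_a", 10, 20, 40, 15),
    RecognizedWord("other_word", 60, 20, 35, 15),
    RecognizedWord("src_word_b", 10, 50, 42, 15),
    RecognizedWord("src_word_a", 70, 50, 40, 15),
]

annotations, frequencies = build_distribution_map(ocr_words, lexicon)
for item in annotations:
    print(item.gloss, item.box)
print(dict(frequencies))
```

Each annotation pairs a gloss with a bounding box, which is the information needed to draw the label on the original image or next to the transcription, while the counter holds the occurrence frequency of each sensitive word.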