System and methods for Arabic text recognition and Arabic corpus building

Abstract

A method for automatically recognizing Arabic text includes: building an Arabic dataset comprising Arabic text files written in different writing styles and the actual meanings of the Arabic text corresponding to each of the Arabic text files (100); storing writing-style indices in association with the Arabic text files (105); digitizing a line of Arabic characters to form an array of pixels (321-323) (130); dividing the line of Arabic characters into line images (311-313) (120); forming a text feature vector from the line images (311-313) (140); training a Hidden Markov Model using the Arabic text files and ground truths in the Arabic dataset in accordance with the writing-style indices (160); and feeding the text feature vector into the Hidden Markov Model to recognize the line of Arabic characters (170).
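
The recognition steps in the abstract (dividing a digitized line into line images, forming a feature vector, and decoding with a Hidden Markov Model) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the function names, the ink-density feature, the left-to-right strip order, the two hypothetical states "gap" and "stroke", and the hand-picked probabilities. Training (160) is omitted; a real recognizer would learn the model parameters from the dataset.

```python
import math
from typing import Dict, List, Sequence

Image = List[List[int]]  # rows of binary pixels; 1 = ink, 0 = background


def split_into_strips(line: Image, strip_width: int) -> List[Image]:
    """Cut a digitized text line into vertical strips (the 'line images')."""
    width = len(line[0])
    return [[row[x:x + strip_width] for row in line]
            for x in range(0, width, strip_width)]


def strip_features(strip: Image, cells: int) -> List[float]:
    """Ink density in each of `cells` horizontal bands of one strip.

    Rows left over when the height is not divisible by `cells` are ignored.
    """
    band = len(strip) // cells
    feats = []
    for c in range(cells):
        rows = strip[c * band:(c + 1) * band]
        total = sum(len(r) for r in rows)
        ink = sum(sum(r) for r in rows)
        feats.append(ink / total if total else 0.0)
    return feats


def line_feature_vector(line: Image, strip_width: int = 2,
                        cells: int = 2) -> List[List[float]]:
    """One feature list per strip, in left-to-right strip order (assumed)."""
    return [strip_features(s, cells)
            for s in split_into_strips(line, strip_width)]


def quantize(features: Sequence[Sequence[float]],
             threshold: float = 0.5) -> List[int]:
    """Map each strip to a discrete symbol: 1 = mostly ink, 0 = mostly blank."""
    return [1 if sum(f) / len(f) >= threshold else 0 for f in features]


def viterbi(obs: Sequence[int], states: Sequence[str],
            start_p: Dict[str, float],
            trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[int, float]]) -> List[str]:
    """Most likely state sequence of a discrete HMM (log-domain Viterbi)."""
    scores = {s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]])
              for s in states}
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        new_scores, ptr = {}, {}
        for s in states:
            best = max(states,
                       key=lambda p: scores[p] + math.log(trans_p[p][s]))
            new_scores[s] = (scores[best] + math.log(trans_p[best][s])
                             + math.log(emit_p[s][o]))
            ptr[s] = best
        scores = new_scores
        back.append(ptr)
    path = [max(states, key=lambda s: scores[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Toy 4x6 line image: a vertical stroke between two blank regions.
line = [[0, 0, 1, 1, 0, 0] for _ in range(4)]
states = ("gap", "stroke")                      # hypothetical state names
start_p = {"gap": 0.5, "stroke": 0.5}           # hand-picked, not trained
trans_p = {"gap": {"gap": 0.7, "stroke": 0.3},
           "stroke": {"gap": 0.3, "stroke": 0.7}}
emit_p = {"gap": {0: 0.9, 1: 0.1},
          "stroke": {0: 0.1, 1: 0.9}}

symbols = quantize(line_feature_vector(line))
print(viterbi(symbols, states, start_p, trans_p, emit_p))
# prints ['gap', 'stroke', 'gap']
```

The sketch uses a discrete two-state model only to keep the example short. Recognizing actual characters would need one or more states per character shape and emission probabilities estimated from the labeled text files, which the abstract says are grouped by writing-style index.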