Conference: 12th International Conference on Frontiers in Handwriting Recognition

A Full-Text Search System for Images of Hand-Written Cursive Documents

Abstract

We propose a full-text search technique for scanned document images that does not require recognizing individual characters. The system is as fast as a full-text search of machine-readable documents, which makes it valuable for work with historical handwritten manuscripts. The proposed method works independently of language and font because it uses a new pseudo-coding scheme based on the statistical features of character shapes. We evaluated the method with recall-precision curves for n-gram-based query strings in Japanese manuscripts and word-based query strings in English manuscripts, using two types of image features and two different pseudo-coding schemes. For 3-gram queries in the Japanese manuscripts, precision exceeded 50% at a recall of 80%. The results also indicate that our pseudo-code is suitable for applications that use machine-learning techniques: combining an HMM-based filtering method with our pseudo-code can significantly improve retrieval precision.
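The general idea of searching pseudo-codes instead of recognized characters can be illustrated with a minimal sketch. It is not the authors' scheme: the abstract does not specify the image features or how codes are assigned, so the nearest-centroid quantization, the 2-D feature vectors, the `CODEBOOK` values, and the function names below are all assumptions made for illustration only.

```python
from math import dist

# Hypothetical codebook: each pseudo-code symbol is tied to a centroid in a
# 2-D feature space. The values are invented and do not come from the paper.
CODEBOOK = {
    "a": (0.0, 0.0),
    "b": (1.0, 0.0),
    "c": (0.0, 1.0),
    "d": (1.0, 1.0),
}


def pseudo_code(feature):
    """Map one character-shape feature vector to its nearest centroid's symbol."""
    return min(CODEBOOK, key=lambda symbol: dist(CODEBOOK[symbol], feature))


def encode(features):
    """Turn a sequence of feature vectors into a pseudo-code string."""
    return "".join(pseudo_code(f) for f in features)


def search(doc_code, query_code):
    """Return every start offset where query_code occurs in doc_code."""
    hits = []
    pos = doc_code.find(query_code)
    while pos != -1:
        hits.append(pos)
        pos = doc_code.find(query_code, pos + 1)
    return hits


# Made-up feature vectors standing in for segmented character images.
document_features = [
    (0.1, 0.0), (0.9, 0.1), (0.1, 0.9),
    (0.95, 1.0), (0.9, 0.05), (0.0, 0.8),
]
query_features = [(1.0, 0.1), (0.1, 1.0)]

doc_code = encode(document_features)    # "abcdbc"
query_code = encode(query_features)     # "bc"
hits = search(doc_code, query_code)     # [1, 4]
```

Once both the document and the query are reduced to symbol strings, matching becomes ordinary substring search, which is consistent with the abstract's claim that search speed is comparable to that of machine-readable text. No step in the sketch depends on knowing which character an image represents.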
