The Scientific World Journal

Recognition of the Script in Serbian Documents Using Frequency Occurrence and Co-Occurrence Analysis

Abstract

Any document in the Serbian language can be written in either of two scripts: Latin or Cyrillic. Although the characteristics of these scripts are similar, some of their statistical measures differ considerably. This paper proposes a method for identifying the script of a document from the occurrence and co-occurrence of script types. First, each letter is assigned a script type according to its position relative to the baseline area. Then, a frequency analysis of script-type occurrence is performed. Because the Latin and Cyrillic scripts differ, the occurrence of the modeled letters shows a substantial statistical dissimilarity. Furthermore, the co-occurrence matrix is computed. Analysis of the co-occurrence matrix yields a strong margin that serves as a criterion for distinguishing and recognizing the script. The proposed method is evaluated on a database that includes different types of printed and web documents. The experiments gave encouraging results.
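The three steps named in the abstract (typing each letter, counting occurrences, building a co-occurrence matrix) can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: the four type labels (`B` base, `A` ascender, `D` descender, `F` full), the letter sets in `LATIN` and `CYRILLIC`, the treatment of uppercase letters as ascenders, and the use of plain adjacent-letter pairs for co-occurrence are all hypothetical choices made here for illustration. The sketch also works on text characters, whereas the paper deals with printed and web documents.

```python
from collections import Counter

# Hypothetical type labels: base, ascender, descender, full height.
TYPES = ("B", "A", "D", "F")

# Illustrative letter sets only; the abstract does not list them.
LATIN = {"A": set("bdfhkltđšžčć"), "D": set("gpqy"), "F": set("j")}
CYRILLIC = {"A": set("бђћ"), "D": set("руцџ"), "F": set("фј")}


def encode(text, table):
    """Map every letter of `text` to one of the TYPES labels."""
    codes = []
    for ch in text:
        if not ch.isalpha():
            continue  # spaces, digits and punctuation are skipped
        if ch.isupper():
            codes.append("A")  # assumption: capitals behave like ascenders
            continue
        for label in ("A", "D", "F"):
            if ch in table[label]:
                codes.append(label)
                break
        else:
            codes.append("B")  # everything else stays in the base area
    return codes


def occurrence(codes):
    """Relative frequency of each script type in the coded sequence."""
    counts = Counter(codes)
    total = len(codes) or 1  # avoid division by zero on empty input
    return {label: counts[label] / total for label in TYPES}


def cooccurrence(codes):
    """Count how often type `a` is immediately followed by type `b`."""
    index = {label: i for i, label in enumerate(TYPES)}
    matrix = [[0] * len(TYPES) for _ in TYPES]
    for a, b in zip(codes, codes[1:]):
        matrix[index[a]][index[b]] += 1
    return matrix


latin_codes = encode("dobar dan", LATIN)
cyrillic_codes = encode("добар дан", CYRILLIC)
print(occurrence(latin_codes), cooccurrence(latin_codes))
print(occurrence(cyrillic_codes), cooccurrence(cyrillic_codes))
```

Running the same phrase through both tables gives different type distributions, which is the kind of dissimilarity the abstract refers to; a real classifier would still need a decision criterion derived from the paper's experiments.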