Methodology for Empirical Performance Evaluation of Page Segmentation Algorithms


Abstract

Document page segmentation is a crucial preprocessing step in Optical Character Recognition (OCR) systems. While numerous page segmentation algorithms have been proposed, there is relatively little literature on the comparative evaluation, empirical or theoretical, of these algorithms. Existing performance evaluation methods usually lack two crucial components: (1) automatic training of algorithms with free parameters and (2) statistical and error analysis of experimental results. In this thesis, we use the following five-step methodology to quantitatively compare the performance of page segmentation algorithms: (1) we first create mutually exclusive training and test datasets with groundtruth, (2) we then select a meaningful and computable performance metric, (3) an optimization procedure is then used to search automatically for the optimal parameter values of the segmentation algorithms, (4) the segmentation algorithms are then evaluated on the test dataset, and finally (5) a statistical error analysis is performed to establish the statistical significance of the experimental results. The automatic training of algorithms is posed as an optimization problem, and a direct search method, the simplex method, is used to search for a set of optimal parameter values. A paired-model statistical analysis and an error analysis are conducted to provide confidence intervals for the experimental results and to interpret the functionalities of algorithms. This methodology is applied to the evaluation of five page segmentation algorithms, of which three are representative research algorithms and the other two are well-known commercial products, on 978 images from the University of Washington III dataset.
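The automatic training step can be illustrated with a minimal sketch. It assumes that "the simplex method" refers to the Nelder-Mead downhill simplex search, which the abstract does not state explicitly. The objective `toy_error_rate` is a hypothetical stand-in for running a segmentation algorithm on the training set and scoring it with the chosen metric; it is not the thesis's metric, and the function names, coefficients, and tolerances are illustrative choices.

```python
from typing import Callable, List, Sequence, Tuple


def nelder_mead(
    f: Callable[[Sequence[float]], float],
    x0: Sequence[float],
    step: float = 1.0,
    ftol: float = 1e-10,
    xtol: float = 1e-6,
    max_iter: int = 500,
) -> Tuple[List[float], float]:
    """Minimize f by a basic Nelder-Mead simplex search (no derivatives)."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex offset along each coordinate axis.
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    values = [f(v) for v in simplex]

    for _ in range(max_iter):
        # Order vertices from best (lowest f) to worst (highest f).
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]

        # Stop when the simplex is small and its function values agree.
        size = max(abs(a - b) for v in simplex[1:] for a, b in zip(v, simplex[0]))
        if values[-1] - values[0] < ftol and size < xtol:
            break

        worst = simplex[-1]
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]

        # Reflect the worst vertex through the centroid of the others.
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        f_r = f(reflected)
        if f_r < values[0]:
            # New best point: try expanding further in the same direction.
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            f_e = f(expanded)
            if f_e < f_r:
                simplex[-1], values[-1] = expanded, f_e
            else:
                simplex[-1], values[-1] = reflected, f_r
        elif f_r < values[-2]:
            simplex[-1], values[-1] = reflected, f_r
        else:
            # Contract the worst vertex halfway toward the centroid.
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            f_c = f(contracted)
            if f_c < values[-1]:
                simplex[-1], values[-1] = contracted, f_c
            else:
                # Shrink every vertex halfway toward the best one.
                best = simplex[0]
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, v)]
                    for v in simplex[1:]
                ]
                values = [values[0]] + [f(v) for v in simplex[1:]]

    best_index = min(range(n + 1), key=lambda i: values[i])
    return simplex[best_index], values[best_index]


def toy_error_rate(params: Sequence[float]) -> float:
    """Hypothetical training error as a function of two free parameters."""
    a, b = params
    return 0.05 + (a - 3.0) ** 2 + (b + 1.0) ** 2


best_params, best_error = nelder_mead(toy_error_rate, [0.0, 0.0])
print(best_params, best_error)
```

A direct search of this kind needs only function evaluations, which suits an objective such as segmentation error that is computed by running an algorithm and has no usable gradient.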
We find that the Voronoi, Docstrum, and Caere segmentation algorithms do not differ significantly in performance, but each performs significantly better than ScanSoft's segmentation algorithm, which in turn performs significantly better than X-Y cut.
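Statements such as "not significantly different" rest on a paired comparison: both algorithms are scored on the same test images, and the per-image differences are analyzed. The sketch below shows one common way to do this and is not necessarily the thesis's exact paired model. The function name `paired_confidence_interval` is hypothetical, the interval uses a normal approximation (reasonable for hundreds of images; a Student t quantile would be more appropriate for small samples), and the error values are made-up illustrative numbers, not results from the study.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev
from typing import Sequence, Tuple


def paired_confidence_interval(
    errors_a: Sequence[float],
    errors_b: Sequence[float],
    confidence: float = 0.95,
) -> Tuple[float, float]:
    """Confidence interval for the mean per-image error difference (A minus B)."""
    if len(errors_a) != len(errors_b):
        raise ValueError("both algorithms must be scored on the same images")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    mean_diff = mean(diffs)
    # Standard error of the mean difference.
    std_error = stdev(diffs) / sqrt(n)
    # Two-sided normal quantile (assumption: normal approximation).
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return mean_diff - z * std_error, mean_diff + z * std_error


# Made-up per-image error rates for two hypothetical algorithms.
errors_a = [0.10, 0.12, 0.08, 0.15, 0.11, 0.09]
errors_b = [0.20, 0.25, 0.18, 0.22, 0.24, 0.19]

low, high = paired_confidence_interval(errors_a, errors_b)
# If the interval excludes zero, the difference is significant at this level.
print(low, high, "significant" if low > 0 or high < 0 else "not significant")
```

Pairing removes the image-to-image variation that both algorithms share, so the interval reflects only how the two algorithms differ on the same inputs.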
