Segmentation is a key step in current OCR systems. It has been estimated that half the errors in character recognition are due to segmentation. A novel approach that performs OCR without the segmentation step was developed. The approach starts by extracting significant geometric features from the input document image of the page. Each feature then votes for the character that could have generated that feature. Thus, even if some of the features are occluded or lost due to degradation, the remaining features can successfully identify the character. In extreme cases, the degradation may be severe enough to prevent recognition of some of the characters in a word. In such cases, a lexicon-based word recognition technique is used to resolve ambiguity. Inexact matching and probabilistic evaluation used in the technique make it possible to identify the correct word by detecting a partial set of characters. The authors first present an overview of their segmentation-free OCR system and then focus on the word recognition technique. Preliminary experimental results show that this is a very promising approach.
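
As a rough illustration of the two ideas summarized above (features voting for characters, then a lexicon resolving what the votes leave ambiguous), the following minimal Python sketch may help. It is not the authors' implementation. The feature names, the `FEATURE_TABLE` mapping, the `floor` and `alphabet_size` parameters, and all function names are hypothetical. The sketch also assumes, for simplicity, that features have already been grouped by character position, which the segmentation-free system described here need not do.

```python
import math
from collections import Counter

# Hypothetical table: each geometric feature lists the characters
# that could have generated it. Invented for illustration only.
FEATURE_TABLE = {
    "closed_loop": "abdeopq",
    "ascender": "bdfhklt",
    "descender": "gjpqy",
    "open_right_arc": "ce",
    "crossbar": "eft",
    "dot": "ij",
}


def vote(features, table=FEATURE_TABLE):
    """Let each detected feature cast one vote per candidate character.

    Returns a dict mapping character -> share of votes, or an empty
    dict when no features survived (a fully degraded character).
    """
    votes = Counter()
    for feature in features:
        votes.update(table.get(feature, ""))
    total = sum(votes.values())
    if total == 0:
        return {}
    return {ch: n / total for ch, n in votes.items()}


def score_word(word, position_scores, floor=1e-3, alphabet_size=26):
    """Log-score a lexicon word against per-position vote shares.

    Positions with no evidence contribute a uniform probability, so a
    word can still match when only some characters were detected.
    Characters that received no votes get a small floor probability
    instead of zero (inexact matching).
    """
    if len(word) != len(position_scores):
        return float("-inf")
    total = 0.0
    for ch, scores in zip(word, position_scores):
        if not scores:
            p = 1.0 / alphabet_size
        else:
            p = max(scores.get(ch, 0.0), floor)
        total += math.log(p)
    return total


def recognize(position_features, lexicon):
    """Return the lexicon word with the highest score."""
    position_scores = [vote(f) for f in position_features]
    return max(lexicon, key=lambda w: score_word(w, position_scores))


if __name__ == "__main__":
    # The middle character is fully degraded: no features survived.
    observed = [["open_right_arc"], [], ["ascender", "crossbar"]]
    print(recognize(observed, ["cat", "cab", "dog", "fit"]))
```

In this toy run the middle character contributes no evidence, yet the surviving features at the first and last positions are enough to rank one lexicon entry above the others. Words that differ only at undetected positions would tie, which is where a richer probabilistic model would be needed.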