Part-of-speech (POS) tagging is a basic task in natural language processing. This paper builds a POS tagger based on an improved hidden Markov model (HMM) that employs word clustering and a syntactic parsing model. First, to overcome the defects of the classical HMM, we introduce the Markov family model (MFM), a new statistical model. Second, to address data sparseness, we propose a bottom-up hierarchical word clustering algorithm. Third, we combine syntactic parsing with POS tagging. POS tagging experiments show that the improved model outperforms HMMs under the same testing conditions: precision rises from 94.642% to 97.235%.
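
For orientation, the classical HMM baseline mentioned in the abstract is usually decoded with the Viterbi algorithm, which picks the tag sequence that maximizes the product of tag-transition and word-emission probabilities. The following is a minimal sketch of that baseline only, not of the paper's MFM, clustering, or parsing components. The function name `viterbi`, the toy probability tables, and the `floor` constant used in place of real smoothing are all assumptions made for illustration.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Most probable tag sequence for `words` under a bigram HMM.

    `floor` is a hypothetical stand-in for smoothing: any missing or zero
    probability is replaced by this small constant before taking logs.
    """
    if not words:
        return []

    def lp(p):
        return math.log(max(p, floor))

    # best[i][t]: log-probability of the best path that ends in tag t at position i
    best = [{t: lp(start_p.get(t, 0.0)) + lp(emit_p[t].get(words[0], 0.0))
             for t in tags}]
    back = []  # back[i][t]: best predecessor of tag t at position i + 1

    for w in words[1:]:
        scores, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda s: best[-1][s] + lp(trans_p[s].get(t, 0.0)))
            scores[t] = (best[-1][prev]
                         + lp(trans_p[prev].get(t, 0.0))
                         + lp(emit_p[t].get(w, 0.0)))
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy model (made-up numbers, for illustration only).
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT = {
    "DET": {"the": 0.9},
    "NOUN": {"dog": 0.5, "barks": 0.1},
    "VERB": {"dog": 0.05, "barks": 0.6},
}

print(viterbi(["the", "dog", "barks"], TAGS, START, TRANS, EMIT))
# prints ['DET', 'NOUN', 'VERB']
```

Working in log space avoids numerical underflow on long sentences. The flat `floor` is the crudest possible answer to the data sparseness problem that the abstract addresses with word clustering.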