DEVICE AND METHOD FOR CORRECTING BOTH MIS-SPACING WORDS AND MIS-SPELLED WORDS USING N-GRAM

Abstract

A device and a method are provided for correcting spacing errors and spelling errors in a sentence at the same time by using syllable n-grams. A syllable n-gram language model is built from an error-free corpus, grapheme unit conversion probabilities and syllable conversion patterns are extracted, grapheme-unit and syllable-unit candidates are generated for the text to be corrected, and the optimal path through those candidates is found with the language model.

A syllable n-gram database builder (10) builds a syllable n-gram database (S2) by extracting syllable n-grams from a refined corpus database (S1). A grapheme unit/syllable conversion database builder (20) builds a grapheme unit conversion probability database (S5) and a syllable conversion pattern database (S6) by extracting grapheme unit conversion probabilities and syllable conversion patterns from an error-containing corpus (S3) and a corrected corpus (S4). A grapheme dividing/candidate generating part (30) separates an input sentence into graphemes and generates candidates by looking the graphemes up in the grapheme unit conversion probability database and the syllable conversion pattern database. An optimal path estimator (40) estimates the optimal path through the generated candidates by using the output of the syllable n-gram database.
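The flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation and it makes several simplifying assumptions: each character stands in for a syllable, the grapheme-splitting step is skipped, the language model is a smoothed bigram (n = 2), all input spaces are discarded and re-decided, and the names `build_bigram_model`, `candidates`, `correct`, `CLEAN`, and `CONVERSIONS` are hypothetical. The toy corpus and conversion table are invented for illustration only.

```python
import math
from collections import Counter


def build_bigram_model(clean_sentences, alpha=0.1):
    """Build an add-alpha smoothed bigram model over units (here: characters)."""
    contexts, bigrams = Counter(), Counter()
    for sentence in clean_sentences:
        units = ["<s>"] + list(sentence) + ["</s>"]
        contexts.update(units[:-1])
        bigrams.update(zip(units, units[1:]))
    vocab_size = len(set(contexts) | {"</s>"})

    def logp(prev, cur):
        num = bigrams[(prev, cur)] + alpha
        den = contexts[prev] + alpha * vocab_size
        return math.log(num / den)

    return logp


def candidates(unit, conversions):
    """Return (candidate, log conversion prob), each with and without a trailing space."""
    options = conversions.get(unit, {})
    keep = max(1.0 - sum(options.values()), 1e-9)  # probability of leaving the unit as is
    base = [(unit, math.log(keep))]
    base += [(c, math.log(p)) for c, p in options.items()]
    out = []
    for cand, lp in base:
        out.append((cand, lp))        # no space after this unit
        out.append((cand + " ", lp))  # space inserted after this unit
    return out


def correct(sentence, logp, conversions):
    """Viterbi search: the state is the last emitted character."""
    units = [ch for ch in sentence if ch != " "]  # assumption: re-decide all spacing
    best = {"<s>": (0.0, "")}
    for unit in units:
        nxt = {}
        for cand, channel in candidates(unit, conversions):
            for prev, (score, text) in best.items():
                s, p = score + channel, prev
                for ch in cand:
                    s += logp(p, ch)
                    p = ch
                if p not in nxt or s > nxt[p][0]:
                    nxt[p] = (s, text + cand)
        best = nxt
    final = max((score + logp(prev, "</s>"), text) for prev, (score, text) in best.items())
    return final[1].rstrip()


# Toy data, invented for this sketch.
CLEAN = ["the cat sat", "the cat ran", "a cat sat"]
CONVERSIONS = {"o": {"a": 0.3}}  # hypothetical pattern: "o" is sometimes a typo for "a"

logp = build_bigram_model(CLEAN)
result = correct("thecot sat", logp, CONVERSIONS)
print(result)
```

Because spacing and substitution choices are scored together along one path, a single search fixes both the merged words and the misspelled character in the toy input. A fuller version would decompose syllables into graphemes, estimate the conversion table from aligned error and corrected corpora, and use a higher-order n-gram.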