System and Method for Unsupervised Text Normalization Using Distributed Representation of Words
Abstract
A system, method, and computer-readable storage devices for unsupervised normalization of noisy text using distributed representation of words. The system receives, from a social media forum, a word having a non-canonical spelling in a first language. The system determines a context of the word in the social media forum, identifies the word in a vector space model, and selects “n-best” vector paths in the vector space model, where the n-best vector paths are neighbors of the word's vector space path based on the context and the non-canonical spelling. The system can then select, based on a similarity cost, a best path from the n-best vector paths and identify the word associated with the best path as the canonical version.
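The pipeline described in the abstract (look up the noisy word in a vector space, take its n-best neighbors, then pick the lowest-cost candidate) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented method: the embedding values are invented toy data, each "vector path" is simplified to a single word vector, context is assumed to be already encoded in the embeddings, and the similarity cost (an equal-weight mix of cosine similarity and a string-similarity ratio) is a hypothetical choice that the abstract does not specify.

```python
import math
from difflib import SequenceMatcher

# Toy 3-dimensional embeddings. The numbers are invented for illustration;
# a real system would learn them from the contexts in which words occur.
EMBEDDINGS = {
    "tomorrow": [0.90, 0.10, 0.30],
    "2morrow": [0.88, 0.12, 0.31],
    "today": [0.80, 0.20, 0.35],
    "tonight": [0.70, 0.25, 0.50],
    "banana": [0.10, 0.90, 0.20],
}

# Hypothetical lexicon of canonical spellings.
CANONICAL = {"tomorrow", "today", "tonight", "banana"}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def n_best_neighbors(word, n):
    """Return the n canonical words closest to `word` as (similarity, word)."""
    query = EMBEDDINGS[word]
    scored = [(cosine(query, EMBEDDINGS[c]), c) for c in CANONICAL]
    scored.sort(reverse=True)
    return scored[:n]


def similarity_cost(noisy, candidate, semantic_sim):
    """Assumed cost: low when the candidate is close in both meaning and spelling."""
    lexical_sim = SequenceMatcher(None, noisy, candidate).ratio()
    return 1.0 - 0.5 * (semantic_sim + lexical_sim)


def normalize(word, n=3):
    """Pick the lowest-cost candidate among the n-best neighbors."""
    candidates = n_best_neighbors(word, n)
    best = min(candidates, key=lambda sc: similarity_cost(word, sc[1], sc[0]))
    return best[1]
```

With this toy data, calling `normalize("2morrow")` selects `"tomorrow"`, because that candidate is the nearest neighbor in the vector space and also shares the most characters with the noisy spelling.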