
Lexico-syntactic normalization model for noisy SMS text


Abstract

Today, digitally mediated interaction and communication play an important role. The rapid growth of electronic communication channels such as email, microblogs, SMS, and chat has produced large volumes of noisy text, predominantly among young urbanites. This noise stems from a variety of factors, such as the small number of characters allowed per message (160 characters per SMS and 140 characters per tweet), newly invented abbreviations, non-standard orthographic forms, and phonetic substitution. In this paper we introduce a lexico-syntactic normalization model for cleaning noisy text. Normalization is based on a channelized database and a user feedback system, and the syntactic analysis of sentences is based on a bottom-up parser. The model captures user interaction to improve its accuracy. Preliminary evaluation shows that the channel model normalizes noisy words to their standard equivalents with better accuracy, and sentence validation achieved 95.7% accuracy.
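The abstract names two mechanisms: a channel model that maps noisy tokens to standard words and is refined by user feedback, and a bottom-up parser that validates the normalized sentence. The Python sketch below illustrates both under stated assumptions. The paper does not publish its database schema, scoring, or grammar, so the seed table `SEED`, the class `ChannelNormalizer`, the toy grammar (`RULES`, `LEXICON`), and `cyk_accepts` are all hypothetical. Raw counts stand in for noisy-channel probabilities, and the CYK algorithm is swapped in as one standard bottom-up parser; it is not claimed to be the authors' parser.

```python
from collections import defaultdict

# Hypothetical seed data for the "channelized database": noisy token ->
# {standard word: observed count}. Real data would come from an SMS corpus.
SEED = {
    "u": {"you": 5},
    "luv": {"love": 3, "live": 1},
    "c": {"see": 4, "sea": 1},
    "gr8": {"great": 6},
}


class ChannelNormalizer:
    """Count-based noisy-channel lookup with user feedback (illustrative)."""

    def __init__(self, seed):
        # table[noisy][standard] = count. Counts estimate P(noisy, standard),
        # so the argmax over standard equals argmax P(noisy | standard) * P(standard).
        self.table = defaultdict(lambda: defaultdict(int))
        for noisy, candidates in seed.items():
            for standard, count in candidates.items():
                self.table[noisy][standard] += count

    def normalize_word(self, token):
        candidates = self.table.get(token.lower())
        if not candidates:
            return token  # tokens absent from the database pass through unchanged
        # Highest count wins; iterating in sorted order breaks ties alphabetically.
        return max(sorted(candidates), key=lambda w: candidates[w])

    def normalize(self, text):
        return " ".join(self.normalize_word(t) for t in text.split())

    def feedback(self, noisy, corrected, weight=1):
        # A user correction adds weight to the pair, so repeated corrections
        # can overturn the seed ranking.
        self.table[noisy.lower()][corrected] += weight


# Hypothetical toy grammar in Chomsky normal form: binary rules map a pair of
# child categories to parent categories; the lexicon maps words to categories.
RULES = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}, ("Det", "N"): {"NP"}}
LEXICON = {
    "i": {"NP"}, "you": {"NP"}, "love": {"V"}, "see": {"V"},
    "the": {"Det"}, "sea": {"N"},
}


def cyk_accepts(words, lexicon=LEXICON, rules=RULES, start="S"):
    """Bottom-up CYK recognizer: True if `words` can be derived from `start`."""
    n = len(words)
    if n == 0:
        return False
    # chart[i][length] holds the categories that span words[i:i + length].
    chart = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, word in enumerate(words):
        chart[i][1] = set(lexicon.get(word, ()))
    for length in range(2, n + 1):  # build longer spans from shorter ones
        for i in range(n - length + 1):
            for split in range(1, length):
                for left in chart[i][split]:
                    for right in chart[i + split][length - split]:
                        chart[i][length] |= rules.get((left, right), set())
    return start in chart[0][n]


if __name__ == "__main__":
    norm = ChannelNormalizer(SEED)
    sentence = norm.normalize("i luv u")
    print(sentence, cyk_accepts(sentence.split()))      # i love you True
    print(cyk_accepts(norm.normalize("luv i").split()))  # False
    norm.feedback("luv", "live", weight=5)
    print(norm.normalize("luv"))                         # live
```

Writing feedback straight into the same count table keeps the design simple: a single correction nudges the ranking, and repeated corrections eventually override the seed data. A practical system would need a far larger grammar and more careful handling of out-of-vocabulary words than passing them through unchanged.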