Venue: Annual Meeting of the Association for Computational Linguistics

On the Distribution of Lexical Features at Multiple Levels of Analysis

Abstract

Natural language processing has increasingly moved from modeling documents and words toward studying the people behind the language. This move to working with data at the user or community level has presented the field with different characteristics of linguistic data. In this paper, we empirically characterize various lexical distributions at different levels of analysis, showing that, while most features are decidedly sparse and non-normal at the message level (as with traditional NLP), they follow the central limit theorem to become much more log-normal or even normal at the user and county levels. Finally, we demonstrate that modeling lexical features for the correct level of analysis leads to marked improvements in common social scientific prediction tasks.
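The central-limit-theorem effect described in the abstract can be illustrated with a toy simulation. This is only a sketch under strong assumptions, not the paper's data or method: every simulated user is assumed to use one word with the same probability per message, the feature is binary word presence, and the names `simulate`, `skewness`, `n_users`, `msgs_per_user`, and `p_word` are all hypothetical. It compares the sparsity and skewness of the feature at the message level with the same feature averaged per user.

```python
import random
import statistics


def skewness(values):
    """Population skewness (third standardized moment); 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def simulate(n_users=500, msgs_per_user=200, p_word=0.02, seed=0):
    """Return (message_level, user_level) values of one binary lexical feature.

    Assumption: each message contains the word independently with
    probability p_word, identical for all users (real users differ).
    """
    rng = random.Random(seed)
    message_level = []
    user_level = []
    for _ in range(n_users):
        msgs = [1 if rng.random() < p_word else 0 for _ in range(msgs_per_user)]
        message_level.extend(msgs)                    # one value per message
        user_level.append(sum(msgs) / msgs_per_user)  # relative frequency per user
    return message_level, user_level


def zero_fraction(values):
    """Share of observations in which the feature is exactly zero."""
    return values.count(0) / len(values)


message_level, user_level = simulate()
msg_skew = skewness(message_level)
user_skew = skewness(user_level)
msg_zero = zero_fraction(message_level)
user_zero = zero_fraction(user_level)

print(f"message level: zero fraction={msg_zero:.3f}, skewness={msg_skew:.2f}")
print(f"user level:    zero fraction={user_zero:.3f}, skewness={user_skew:.2f}")
```

Under these assumptions the message-level feature is almost always zero and heavily right-skewed, while the per-user average is rarely zero and much closer to symmetric. Real users vary in how often they use a word, which this sketch ignores.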
