On the Distribution of Lexical Features at Multiple Levels of Analysis

Abstract

Natural language processing has increasingly moved from modeling documents and words toward studying the people behind the language. This move to working with data at the user or community level has presented the field with different characteristics of linguistic data. In this paper, we empirically characterize various lexical distributions at different levels of analysis, showing that, while most features are decidedly sparse and non-normal at the message-level (as with traditional NLP), they follow the central limit theorem to become much more Log-normal or even Normal at the user- and county-levels. Finally, we demonstrate that modeling lexical features for the correct level of analysis leads to marked improvements in common social scientific prediction tasks.
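The central-limit effect described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' method or data: it assumes a single binary lexical feature (a word is present or absent in a message), and the names `simulate` and `skewness` and the values for `num_users`, `messages_per_user`, and `p_word` are hypothetical choices made for illustration. Skewness stands in here as a rough indicator of non-normality.

```python
import random
import statistics


def skewness(values):
    """Population skewness (third standardized moment); 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def simulate(num_users=1000, messages_per_user=200, p_word=0.05, seed=0):
    """Simulate a sparse binary word feature and aggregate it per user.

    All parameters are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    message_level = []  # one 0/1 value per message
    user_level = []     # one relative frequency per user
    for _ in range(num_users):
        msgs = [1.0 if rng.random() < p_word else 0.0
                for _ in range(messages_per_user)]
        message_level.extend(msgs)
        user_level.append(statistics.fmean(msgs))
    return message_level, user_level


message_level, user_level = simulate()
msg_skew = skewness(message_level)
user_skew = skewness(user_level)
print(f"message-level skewness: {msg_skew:.2f}")
print(f"user-level skewness:    {user_skew:.2f}")
```

In this toy setting the message-level feature is sparse and strongly skewed, while the per-user average of the same feature has skewness much closer to zero, which is the qualitative pattern the abstract attributes to aggregation at the user and county levels.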

Bibliographic details

Source
Annual Meeting of the Association for Computational Linguistics | 2017 | pp. 79-84 | 6 pages
Authors
Fatemeh Almodaresi; Lyle Ungar; Vivek Kulkarni; Mohsen Zakeri; Salvatore Giorgi; H. Andrew Schwartz
Original format: PDF

Similar literature

1. Senti-CS: Building a lexical resource for sentiment analysis using subjective feature selection and normalized Chi-Square-based feature weight generation [J]. Khan Farhan Hassan, Qamar Usman, Bashir Saba. Expert Systems, 2016, no. 5
2. Understanding the distribution of a threatened bird at multiple levels: A hierarchical analysis of the ecological niche of the Santa Marta Bush-Tyrant (Myiotheretes pernix) [J]. Esteban Botero-Delgadillo, Nicholas J. Bayly, Sandra Escudero-Páez. The Condor, 2015, no. 4
3. Understanding the distribution of a threatened bird at multiple levels: A hierarchical analysis of the ecological niche of the Santa Marta Bush-Tyrant (Myiotheretes pernix) [J]. Botero-Delgadillo Esteban, Bayly Nicholas J., Escudero-Paez Sandra. The Condor, 2015, no. 4
4. On the Distribution of Lexical Features at Multiple Levels of Analysis [C]. Fatemeh Almodaresi, Lyle Ungar, Vivek Kulkarni. Annual Meeting of the Association for Computational Linguistics, 2017
5. Close and distant charismatic and contingent reward leadership: Multiple levels-of-management and multiple levels-of-analysis perspectives [D]. Chun, Jae Uk. 2006
6. Accounting for observation processes across multiple levels of uncertainty improves inference of species distributions and guides adaptive sampling of environmental DNA [O]. Amy J. Davis, Kelly E. Williams, Nathan P. Snow. 2018
7. On the Distribution of Lexical Features at Multiple Levels of Analysis [O]. Fatemeh Almodaresi, Lyle Ungar, Vivek Kulkarni. 2017