Conference: International Conference on Text, Speech and Dialogue

A Lightweight Regression Method to Infer Psycholinguistic Properties for Brazilian Portuguese



Abstract

Psycholinguistic properties of words have been used in various approaches to Natural Language Processing tasks, such as text simplification and readability assessment. Most of these properties are subjective and must be gathered through costly, time-consuming surveys. Recent approaches start from the limited existing datasets of psycholinguistic properties and extend them automatically to large lexicons. However, some of the resources these approaches rely on are not available for most languages. This study presents a method to infer psycholinguistic properties for Brazilian Portuguese (BP) using regressors built with a light set of features usually available for less-resourced languages: word length, frequency lists, lexical databases composed of school dictionaries, and word embedding models. The correlations between the inferred properties are close to those obtained in related work. The resulting resource contains 26,874 words in BP annotated with concreteness, age of acquisition, imageability and subjective frequency.
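
As a rough illustration of the kind of regressor the abstract describes, the sketch below fits a small ridge regression from two lightweight features (word length and log frequency) to a concreteness-style rating and scores it with Pearson correlation. Everything specific here is an assumption made for illustration: the abstract does not say which regression model was used, the function names (`fit_ridge`, `predict`, `pearson`, `features`) are hypothetical, and the frequencies and ratings are invented toy values, not data from the paper's resource. Dictionary-based and word-embedding features would simply be appended to the same feature vector.

```python
import math

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept.

    Solves (X^T X + alpha * I) w = X^T y by Gaussian elimination.
    Ridge is an assumed choice; the abstract does not name the model.
    """
    rows = [[1.0] + list(r) for r in X]          # prepend bias column
    d = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(d)]
    for i in range(1, d):                        # do not penalize the bias
        A[i][i] += alpha
    M = [A[i] + [b[i]] for i in range(d)]        # augmented matrix
    for c in range(d):                           # forward elimination
        p = max(range(c, d), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, d):
            f = M[r][c] / M[c][c]
            for k in range(c, d + 1):
                M[r][k] -= f * M[c][k]
    w = [0.0] * d
    for i in reversed(range(d)):                 # back substitution
        w[i] = (M[i][d] - sum(M[i][k] * w[k] for k in range(i + 1, d))) / M[i][i]
    return w

def predict(w, x):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))

def features(word, freq):
    # Lightweight features: word length and log10 corpus frequency.
    return [float(len(word)), math.log10(freq)]

# Toy data: (word, invented frequency count, invented 1-7 concreteness rating).
words = [
    ("casa", 9000, 6.5), ("mesa", 5000, 6.7), ("pedra", 3000, 6.6),
    ("cachorro", 2000, 6.4), ("ideia", 7000, 2.5), ("liberdade", 4000, 2.0),
    ("justiça", 3500, 2.2), ("esperança", 2500, 1.9),
]
X = [features(wd, fr) for wd, fr, _ in words]
y = [rating for _, _, rating in words]

w = fit_ridge(X, y, alpha=1.0)
preds = [predict(w, x) for x in X]
r = pearson(preds, y)                            # correlation on the training words
print(f"weights={w}, training Pearson r={r:.3f}")
```

In practice the correlation would be measured on held-out words (for example with cross-validation) rather than on the training set, and one such regressor would be fitted per property.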
