The Quarterly Journal of Experimental Psychology (QJEP)

How useful are corpus-based methods for extrapolating psycholinguistic variables?

Abstract

Subjective ratings for age of acquisition, concreteness, affective valence, and many other variables are an important element of psycholinguistic research. However, even for well-studied languages, ratings usually cover just a small part of the vocabulary. A possible solution involves using corpora to build a semantic similarity space and applying machine learning techniques to extrapolate existing ratings to previously unrated words. We conduct a systematic comparison of two extrapolation techniques, k-nearest neighbours and random forest, in combination with semantic spaces built using latent semantic analysis, a topic model, a hyperspace analogue to language (HAL)-like model, and a skip-gram model. A variant of the k-nearest neighbours method used with skip-gram word vectors gives the most accurate predictions, but the random forest method has the advantage of being able to easily incorporate additional predictors. We evaluate the usefulness of the methods by exploring how much of the human performance in a lexical decision task can be explained by extrapolated ratings for age of acquisition and how precisely we can assign words to discrete categories based on extrapolated ratings. We find that at least some of the extrapolation methods may introduce artefacts to the data and produce results that could lead to different conclusions than would be reached based on the human ratings. From a practical point of view, the usefulness of ratings extrapolated with the described methods may be limited.
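
The general idea of extrapolating ratings through a semantic space can be illustrated with a minimal sketch. It is not the variant evaluated in the paper: the function name `knn_extrapolate`, the similarity-weighted averaging, the two-dimensional toy vectors, and the age-of-acquisition values are all assumptions made up for illustration. In practice the vectors would come from a corpus-trained model such as skip-gram.

```python
import numpy as np

def knn_extrapolate(vectors, ratings, query, k=3):
    """Predict a rating for `query` as the cosine-similarity-weighted
    mean rating of its k most similar rated words (illustrative only)."""
    words = list(ratings)
    # Stack the vectors of all rated words and normalise them to unit length.
    mat = np.array([vectors[w] for w in words], dtype=float)
    mat /= np.linalg.norm(mat, axis=1, keepdims=True)
    q = np.asarray(vectors[query], dtype=float)
    q = q / np.linalg.norm(q)
    # Cosine similarity between the query word and every rated word.
    sims = mat @ q
    # Indices of the k most similar rated words.
    top = np.argsort(sims)[::-1][:k]
    # Clip weights so that non-positive similarities cannot break the average.
    weights = np.clip(sims[top], 1e-9, None)
    values = np.array([ratings[words[i]] for i in top])
    return float(np.average(values, weights=weights))

# Hypothetical toy semantic space (2 dimensions) and made-up ratings.
toy_vectors = {
    "dog": [1.0, 0.1],
    "cat": [0.9, 0.2],
    "theory": [0.1, 1.0],
    "axiom": [0.2, 0.9],
    "puppy": [1.0, 0.15],  # unrated word whose rating we extrapolate
}
toy_ratings = {"dog": 3.0, "cat": 3.5, "theory": 11.0, "axiom": 12.0}

predicted = knn_extrapolate(toy_vectors, toy_ratings, "puppy", k=2)
print(f"Extrapolated rating for 'puppy': {predicted:.2f}")
```

With `k=2` the prediction depends only on the two nearest rated words, so it necessarily falls between their ratings. This also hints at one way artefacts can arise: neighbour averaging pulls extrapolated values toward the centre of the rated distribution.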