Journal: Language Resources and Evaluation

Vector space explorations of literary language



Abstract

Literary novels are said to distinguish themselves from other novels through conventions associated with literariness. We investigate the task of predicting the literariness of novels as perceived by readers, based on a large reader survey of contemporary Dutch novels. Previous research showed that ratings of literariness are predictable from texts to a substantial extent using machine learning, suggesting that it may be possible to explain the consensus among readers on which novels are literary as a consensus on the kind of writing style that characterizes literature. Although we have not yet collected human judgments to establish the influence of writing style directly (we use a survey with judgments based on the titles of novels), we can try to analyze the behavior of machine learning models on particular text fragments as a proxy for human judgments. In order to explore aspects of the texts associated with literariness, we divide the texts of the novels into chunks of 2-3 pages and create vector space representations using topic models (Latent Dirichlet Allocation) and neural document embeddings (Distributed Bag-of-Words Paragraph Vectors). We analyze the semantic complexity of the novels using distance measures, supporting the notion that literariness can be partly explained as a deviation from the norm. Furthermore, we build predictive models and identify specific keywords and stylistic markers related to literariness. While genre plays a role, we find that the greater part of the factors affecting judgments of literariness are explicable in bag-of-words terms, even in short text fragments and among novels with higher literary ratings. The code and notebook used to produce the results in this paper are available at https://github.com/andreasvc/litvecspace.
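
The idea of measuring "deviation from the norm" with distances in a vector space can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that): the abstract names Latent Dirichlet Allocation and Distributed Bag-of-Words Paragraph Vectors, whereas the sketch below swaps in plain relative word frequencies so that it runs with the Python standard library alone. The function names, the whitespace tokenizer, and the default chunk size of 1000 tokens (a rough stand-in for 2-3 pages) are all assumptions made for illustration.

```python
import math
from collections import Counter


def chunk_tokens(tokens, size):
    """Split a token list into consecutive chunks of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def bow_vector(tokens, vocab):
    """Relative term frequencies of `tokens` over a fixed vocabulary order."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[word] / total for word in vocab]


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def distances_to_centroid(texts, chunk_size=1000):
    """Return (novel_id, distance) for every chunk of every text.

    `texts` maps a hypothetical novel id to its raw text. The distance is
    measured from each chunk vector to the mean vector of all chunks, so
    larger values mean the chunk deviates more from the corpus average.
    """
    chunks = []
    for novel_id, text in texts.items():
        # Assumption: lowercase whitespace tokenization, for simplicity.
        for chunk in chunk_tokens(text.lower().split(), chunk_size):
            chunks.append((novel_id, chunk))
    vocab = sorted({word for _, chunk in chunks for word in chunk})
    vectors = [bow_vector(chunk, vocab) for _, chunk in chunks]
    center = centroid(vectors)
    return [
        (novel_id, cosine_distance(vector, center))
        for (novel_id, _), vector in zip(chunks, vectors)
    ]
```

Calling `distances_to_centroid({"novel_a": text_a, "novel_b": text_b})` yields one distance per chunk. Averaging these per novel gives a crude deviation score that could then be compared against survey ratings. Replacing `bow_vector` with topic proportions or document embeddings would bring the sketch closer to the representations the abstract describes.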
