Venue: IEEE International Conference on Machine Learning and Applications

Unsupervised Topic Model Based Text Network Construction for Learning Word Embeddings



Abstract

Distributed word embeddings have proven remarkably effective at capturing word-level semantic and syntactic regularities in language for many natural language processing tasks. One recently proposed semi-supervised representation learning method, Predictive Text Embedding (PTE), utilizes both semantically labeled and unlabeled data in information networks to learn text embeddings that achieve state-of-the-art performance compared to other embedding methods. However, PTE uses supervised label information to construct one of its networks, and many other possible ways of constructing such information networks remain untested. We present two unsupervised methods for constructing a large-scale semantic information network from documents using topic models, which have emerged as a powerful technique for finding useful structure in unstructured text collections by learning distributions over words. The first method uses Latent Dirichlet Allocation (LDA) to build a topic model over the text and constructs a word-topic network with edge weights proportional to the word-topic probability distributions. The second method trains an unsupervised neural network to learn the word-document distribution, with a single hidden layer representing a topic distribution. The two weight matrices of the neural network are directly reinterpreted as the edge weights of heterogeneous text networks, which can then be used to train word embeddings that form an effective low-dimensional representation preserving the semantic closeness of words and documents for NLP tasks. We conduct extensive experiments to evaluate the effectiveness of our methods.
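The first method described in the abstract can be illustrated with a minimal sketch. Here a small hand-written `phi` matrix stands in for the word-topic distribution produced by a fitted LDA model; the vocabulary, threshold, and function name are illustrative assumptions, not the paper's implementation.

```python
# Sketch of the LDA-based construction: each sufficiently probable
# (word, topic) pair becomes a weighted edge in a bipartite network.
# Assumption: `phi` is a toy stand-in for real LDA output.

vocab = ["network", "embedding", "topic", "word"]

# phi[t][w] = P(word w | topic t); each row of a real LDA model sums to 1.
phi = [
    [0.50, 0.30, 0.15, 0.05],  # topic 0
    [0.05, 0.10, 0.45, 0.40],  # topic 1
]

def build_word_topic_network(phi, vocab, threshold=0.0):
    """Return weighted bipartite edges (word, topic_id, weight),
    keeping only edges whose probability exceeds the threshold."""
    edges = []
    for t, dist in enumerate(phi):
        for w, p in zip(vocab, dist):
            if p > threshold:
                edges.append((w, t, p))
    return edges

edges = build_word_topic_network(phi, vocab, threshold=0.1)
```

In practice the threshold (or a top-k cutoff per topic) controls network sparsity; the resulting edge list is the kind of heterogeneous text network a PTE-style embedding trainer consumes.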
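The second method's key step, reinterpreting the two weight matrices of a one-hidden-layer network as network edges, can be sketched as follows. The matrices `W1` and `W2` are toy values standing in for trained weights, and all names are illustrative assumptions.

```python
# Sketch of the second method's reinterpretation step: the hidden layer
# plays the role of topics, so input->hidden weights become word-topic
# edges and hidden->output weights become topic-document edges.
# Assumption: W1, W2 are toy stand-ins for trained weights.

words = ["graph", "vector"]
docs = ["d1", "d2", "d3"]

# W1[w][t]: word -> hidden(topic) weights; W2[t][d]: topic -> document.
W1 = [[0.8, 0.1],
      [0.2, 0.9]]
W2 = [[0.7, 0.2, 0.1],
      [0.1, 0.3, 0.6]]

def weight_matrices_to_edges(W1, W2, words, docs):
    """Return (word-topic edges, topic-document edges) as weighted triples."""
    word_topic = [(words[i], t, w)
                  for i, row in enumerate(W1) for t, w in enumerate(row)]
    topic_doc = [(t, docs[j], w)
                 for t, row in enumerate(W2) for j, w in enumerate(row)]
    return word_topic, topic_doc

wt_edges, td_edges = weight_matrices_to_edges(W1, W2, words, docs)
```

Together the two edge lists form a heterogeneous text network linking words, topics, and documents, which can then be fed to an embedding learner in place of PTE's label-derived network.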


