Source: AMIA Annual Symposium Proceedings

Utility of General and Specific Word Embeddings for Classifying Translational Stages of Research.

Abstract

Conventional text classification models make a bag-of-words assumption, reducing each document to word occurrence counts. Recent algorithms such as word2vec can learn semantic meaning and similarity between words in an entirely unsupervised manner using a contextual window, and do so much faster than previous methods. Each word is projected into a vector space such that words with similar meanings, such as “strong” and “powerful”, lie close to one another. Open questions about these embeddings include their utility across classification tasks and the optimal properties and sources of documents for constructing broadly functional embeddings. In this work, we demonstrate the usefulness of pre-trained embeddings for classification in our task and show that custom word embeddings, built in the domain and for the task, can improve performance over word embeddings learned on more general data such as news articles or Wikipedia.
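To make the contrast in the abstract concrete, the following is a minimal sketch comparing bag-of-words counts with averaged word vectors. It is not the authors' pipeline: the three-dimensional vectors in `TOY_EMBEDDINGS` are invented for illustration, the helper names are hypothetical, and averaging word vectors is just one common way to turn embeddings into document features.

```python
import math
from collections import Counter

# Toy 3-dimensional vectors, invented for illustration only. Real embeddings
# (e.g. learned by word2vec) come from a corpus and have many more dimensions.
TOY_EMBEDDINGS = {
    "strong": [0.9, 0.1, 0.0],
    "powerful": [0.8, 0.2, 0.1],
    "effect": [0.1, 0.9, 0.0],
    "mice": [0.0, 0.1, 0.9],
}


def bag_of_words(tokens, vocabulary):
    """Represent a document as word occurrence counts over a fixed vocabulary."""
    counts = Counter(tokens)
    return [float(counts[word]) for word in vocabulary]


def mean_embedding(tokens, embeddings):
    """Represent a document as the average of its known word vectors."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    dim = len(next(iter(embeddings.values())))
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


vocab = sorted(TOY_EMBEDDINGS)
doc_a = ["strong", "effect"]
doc_b = ["powerful", "effect"]

# Bag-of-words treats "strong" and "powerful" as unrelated dimensions.
bow_sim = cosine(bag_of_words(doc_a, vocab), bag_of_words(doc_b, vocab))

# Averaged embeddings let the two near-synonyms contribute similar directions.
emb_sim = cosine(
    mean_embedding(doc_a, TOY_EMBEDDINGS),
    mean_embedding(doc_b, TOY_EMBEDDINGS),
)

print(f"bag-of-words similarity: {bow_sim:.3f}")
print(f"mean-embedding similarity: {emb_sim:.3f}")
```

Under these toy vectors the two documents share only one of their two words in the count representation, while the averaged embeddings place them much closer together. In practice the document vectors from either representation would be passed to a classifier, and the embedding table would be loaded from a pre-trained or domain-specific model.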
