Learning Compressed Sentence Representations for On-Device Text Processing

Annual Meeting of the Association for Computational Linguistics


Abstract

Vector representations of sentences, trained on massive text corpora, are widely used as generic sentence embeddings across a variety of NLP problems. The learned representations are generally assumed to be continuous and real-valued, giving rise to a large memory footprint and slow retrieval speed, which hinders their applicability to low-resource (memory and computation) platforms such as mobile devices. In this paper, we propose four different strategies to transform continuous and generic sentence embeddings into a binarized form, while preserving their rich semantic information. The introduced methods are evaluated across a wide range of downstream tasks, where the binarized sentence embeddings are demonstrated to degrade performance by only about 2% relative to their continuous counterparts, while reducing the storage requirement by over 98%. Moreover, with the learned binary representations, the semantic relatedness of two sentences can be evaluated by simply calculating their Hamming distance, which is more computationally efficient than the inner-product operation between continuous embeddings. Detailed analysis and case studies further validate the effectiveness of the proposed methods.
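
The abstract does not spell out the four binarization strategies; the sketch below shows the simplest plausible baseline, hard thresholding at zero followed by bit packing. It is an illustration under that assumption, not the authors' exact method, and the function names are hypothetical.

    import numpy as np

    def binarize(embedding: np.ndarray) -> np.ndarray:
        # Hard-threshold each dimension at zero: positive values become 1.
        return (embedding > 0).astype(np.uint8)

    def pack(bits: np.ndarray) -> np.ndarray:
        # Pack eight 0/1 entries per byte: a 4096-dim float32 embedding
        # (16 KB) shrinks to 512 bytes, a 32x storage reduction.
        return np.packbits(bits)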
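To make the retrieval-speed claim concrete, the sketch below scores relatedness by Hamming distance over bit-packed codes: one XOR pass plus a population count over d/8 bytes, instead of d floating-point multiply-adds for an inner product. The names and the random test vectors are illustrative, not from the paper.

    import numpy as np

    def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
        # XOR the packed codes, then count the set bits.
        return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

    # Usage: a smaller distance means higher semantic relatedness.
    rng = np.random.default_rng(0)
    u, v = rng.standard_normal(4096), rng.standard_normal(4096)
    code_u = np.packbits((u > 0).astype(np.uint8))
    code_v = np.packbits((v > 0).astype(np.uint8))
    print(hamming_distance(code_u, code_v))  # integer in [0, 4096]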
