Learning Compressed Sentence Representations for On-Device Text Processing

Annual Meeting of the Association for Computational Linguistics


Abstract

Vector representations of sentences, trained on massive text corpora, are widely used as generic sentence embeddings across a variety of NLP problems. The learned representations are generally assumed to be continuous and real-valued, giving rise to a large memory footprint and slow retrieval speed, which hinders their applicability to low-resource (memory and computation) platforms such as mobile devices. In this paper, we propose four different strategies to transform continuous and generic sentence embeddings into a binarized form, while preserving their rich semantic information. The introduced methods are evaluated across a wide range of downstream tasks, where the binarized sentence embeddings are demonstrated to degrade performance by only about 2% relative to their continuous counterparts, while reducing the storage requirement by over 98%. Moreover, with the learned binary representations, the semantic relatedness of two sentences can be evaluated by simply calculating their Hamming distance, which is more computationally efficient than the inner-product operation between continuous embeddings. Detailed analysis and case studies further validate the effectiveness of the proposed methods.
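
The abstract does not spell out the four binarization strategies; the sketch below shows the simplest plausible baseline, hard thresholding at zero followed by bit packing. It is an illustration under that assumption, not the authors' exact method, and the function names are hypothetical.

    import numpy as np

    def binarize(embedding: np.ndarray) -> np.ndarray:
        # Hard-threshold each dimension at zero: positive values become 1.
        return (embedding > 0).astype(np.uint8)

    def pack(bits: np.ndarray) -> np.ndarray:
        # Pack eight 0/1 entries per byte: a 4096-dim float32 embedding
        # (16 KB) shrinks to 512 bytes, a 32x storage reduction.
        return np.packbits(bits)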
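To make the retrieval-speed claim concrete, the sketch below scores relatedness by Hamming distance over bit-packed codes: one XOR pass plus a population count over d/8 bytes, instead of d floating-point multiply-adds for an inner product. The names and the random test vectors are illustrative, not from the paper.

    import numpy as np

    def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
        # XOR the packed codes, then count the set bits.
        return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

    # Usage: a smaller distance means higher semantic relatedness.
    rng = np.random.default_rng(0)
    u, v = rng.standard_normal(4096), rng.standard_normal(4096)
    code_u = np.packbits((u > 0).astype(np.uint8))
    code_v = np.packbits((v > 0).astype(np.uint8))
    print(hamming_distance(code_u, code_v))  # integer in [0, 4096]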
