On-Device Sentence Similarity for SMS Dataset

Abstract

Determining the similarity between Short Message Service (SMS) texts plays a significant role in the mobile device industry. Gauging the similarity between SMS messages is necessary for applications such as enhanced search and navigation, and for grouping SMS of a similar type, irrespective of their sender, when the user provides a custom label or tag. The difficulty with SMS data is its incomplete structure and grammatical inconsistency. In this paper, we propose a unique pipeline for evaluating the text similarity between SMS texts. We use a Part of Speech (POS) model for keyword extraction, taking advantage of the partial structure embedded in SMS texts, and carry out similarity comparisons using statistical methods. The proposed pipeline handles major semantic variations across SMS data and is effective for on-device (mobile phone) use. To showcase the capabilities of our work, the pipeline has been designed with an inclination towards one possible application of SMS text similarity, discussed in a later section, but it nonetheless remains scalable to other applications.
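
The abstract does not spell out the POS model or the statistical measure, so the following is only a minimal sketch of the general idea (keyword extraction followed by a set-based comparison), not the authors' pipeline. Everything in it is an assumption made for illustration: the closed-class word list `FUNCTION_WORDS` is a toy stand-in for a trained POS model, and Jaccard overlap is swapped in for the unspecified "statistical methods".

```python
import re

# Hypothetical stand-in for a POS model: a small list of closed-class
# (function) words. Any token not listed here is treated as a content word.
FUNCTION_WORDS = {
    "a", "an", "the", "your", "you", "we", "i", "it", "this", "that",
    "is", "are", "was", "be", "been", "has", "have", "will",
    "to", "of", "on", "in", "at", "for", "from", "with", "and", "or",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_keywords(text):
    """Keep content words; drop function words and purely numeric tokens."""
    return {
        tok for tok in tokenize(text)
        if tok not in FUNCTION_WORDS and not tok.isdigit()
    }


def keyword_similarity(sms_a, sms_b):
    """Jaccard overlap of the two keyword sets (assumed measure)."""
    keys_a, keys_b = extract_keywords(sms_a), extract_keywords(sms_b)
    if not keys_a and not keys_b:
        return 0.0
    return len(keys_a & keys_b) / len(keys_a | keys_b)


debit_1 = "Your account has been debited with Rs 500"
debit_2 = "Rs 1200 debited from your account"
otp = "Your OTP for login is 4821"

print(keyword_similarity(debit_1, debit_2))  # same keywords, different amounts
print(keyword_similarity(debit_1, otp))      # no shared keywords
```

Dropping numeric tokens is a design choice for this sketch: it lets two messages of the same type match even when amounts or codes differ, which fits the use case of grouping similar SMS under a user-provided label.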

Bibliographic details

Source
IEEE International Conference on Semantic Computing, 2021, pp. 140-146 (7 pages)
Authors
Arun D Prabhu; Nikhil Arora; Shubham Vatsal; Gopi Ramena; Sukumar Moharana; Naresh Purre
Original format
PDF
Keywords
Statistical analysis; Navigation; Scalability; Pipelines; Semantics; Search problems; Mobile handsets
