A Classification Approach to Text Normalization

Abstract

We propose a new model for text normalization, GRFE (Gated Recurrent Feature Extractor). Using a GRU neural network, it classifies each token into a predefined type such as date, time, or digit, and then normalizes the token according to domain knowledge for that type. This design lets GRFE avoid many "silly errors", such as normalizing '17' as 'eighteen' or mixing British and American English conventions in dates, and it improves the robustness and extensibility of the network. Experiments show that, compared with previous models, GRFE uses fewer parameters and fewer layers: its parameter count is 30.69% of that of an LSTM and 34.96% of that of CFE (Causal Feature Extractor). It also needs less training time to reach a higher accuracy (92.77%).
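The two-stage design described in the abstract (classify the token, then apply a type-specific rule) can be illustrated with a minimal sketch. It is not the authors' code: the GRU classifier is replaced by a hypothetical regex stand-in `classify`, and the class inventory (PLAIN, CARDINAL, DIGIT, DATE, TIME), the verbalizer functions, and the single American-style date reading are all assumptions made for illustration.

```python
import re
from typing import Callable, Dict, List

# Hypothetical class inventory; the abstract names date, time and digit as examples.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
ORDINAL_IRREGULAR = {"one": "first", "two": "second", "three": "third",
                     "five": "fifth", "eight": "eighth", "nine": "ninth",
                     "twelve": "twelfth"}


def cardinal(n: int) -> str:
    """Spell out 0 <= n < 100 with fixed rules (no learned output)."""
    if n < 20:
        return ONES[n]
    tens, rest = divmod(n, 10)
    return TENS[tens] + ("" if rest == 0 else "-" + ONES[rest])


def ordinal(n: int) -> str:
    """Turn the last word of the cardinal into its ordinal form."""
    head, sep, last = cardinal(n).rpartition("-")
    if last in ORDINAL_IRREGULAR:
        last = ORDINAL_IRREGULAR[last]
    elif last.endswith("y"):
        last = last[:-1] + "ieth"
    else:
        last += "th"
    return head + sep + last


def year(n: int) -> str:
    """Read a four-digit year in pairs, e.g. 2020 -> 'twenty twenty'."""
    high, low = divmod(n, 100)
    if low == 0:
        return "two thousand" if n == 2000 else cardinal(high) + " hundred"
    if low < 10:
        prefix = "two thousand" if high == 20 else cardinal(high) + " oh"
        return prefix + " " + cardinal(low)
    return cardinal(high) + " " + cardinal(low)


def verbalize_date(token: str) -> str:
    """ISO date -> one fixed American-style reading (assumed convention)."""
    y, m, d = map(int, token.split("-"))
    return f"{MONTHS[m - 1]} {ordinal(d)}, {year(y)}"


def verbalize_time(token: str) -> str:
    """'HH:MM' -> e.g. 'nine oh five' or 'ten o'clock'."""
    h, m = map(int, token.split(":"))
    if m == 0:
        return f"{cardinal(h)} o'clock"
    if m < 10:
        return f"{cardinal(h)} oh {cardinal(m)}"
    return f"{cardinal(h)} {cardinal(m)}"


def classify(token: str) -> str:
    """Hypothetical regex stand-in for the GRU token classifier."""
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", token):
        return "DATE"
    if re.fullmatch(r"\d{1,2}:\d{2}", token):
        return "TIME"
    if re.fullmatch(r"[1-9]\d?|0", token):
        return "CARDINAL"
    if re.fullmatch(r"\d+", token):
        return "DIGIT"  # leading zeros or long strings: read digit by digit
    return "PLAIN"


# Each class maps to a deterministic verbalizer (the "domain knowledge").
VERBALIZERS: Dict[str, Callable[[str], str]] = {
    "PLAIN": lambda t: t,
    "CARDINAL": lambda t: cardinal(int(t)),
    "DIGIT": lambda t: " ".join(ONES[int(c)] for c in t),
    "DATE": verbalize_date,
    "TIME": verbalize_time,
}


def normalize(tokens: List[str]) -> List[str]:
    """Stage 1: pick a class for each token; stage 2: apply that class's rule."""
    return [VERBALIZERS[classify(t)](t) for t in tokens]
```

For example, `normalize(["on", "2020-03-17", "at", "09:05", "call", "17", "0451"])` returns `["on", "March seventeenth, twenty twenty", "at", "nine oh five", "call", "seventeen", "zero four five one"]`. Because the classifier only selects a label and the words come from fixed rules, a classification error on `17` could at worst yield "one seven" instead of "seventeen", never "eighteen"; fixing one date convention in `verbalize_date` likewise rules out mixed British and American readings.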

Bibliographic Information

Source
International Conference on Advanced Electronic Materials, Computers and Software Engineering, 2020, pp. 573-577 (5 pages)
Authors
Guozhang Zhao; Chenkai Ma; Wenxian Feng; Rui Zhang
Original format: PDF
Keywords
GRFE, Text Normalization, Classification

Similar Literature

1. Temporal Expression Classification and Normalization From Chinese Narrative Clinical Texts: Pattern Learning Approach [J]. Xiaoyi Pan, Boyu Chen, Heng Weng. JMIR Medical Informatics, 2020, No. 7.

2. Data and systems for medication-related text classification and concept normalization from Twitter: insights from the Social Media Mining for Health (SMM4H)-2017 shared task [J]. Abeed Sarker, Maksim Belousov, Jasper Friedrichs. Journal of the American Medical Informatics Association, 2018, No. 10.

3. Feature selection based on a normalized difference measure for text classification [J]. Abdur Rehman, Kashif Javed, Haroon A. Babri. Information Processing & Management, 2017, No. 2.

4. Influence of Word Normalization and Chi-Squared Feature Selection on Support Vector Machine (SVM) Text Classification [C]. Ardy Wibowo Haryanto, Edy Kholid Mawardi, Muljono. International Seminar on Application for Technology of Information and Communication, 2018.

5. A Data Augmentation Approach to Short Text Classification [D]. Rosario, Ryan Robert. 2017.

6. Data and systems for medication-related text classification and concept normalization from Twitter: insights from the Social Media Mining for Health (SMM4H)-2017 shared task [O]. Abeed Sarker, Maksim Belousov, Jasper Friedrichs. 2018.

7. Temporal Expression Classification and Normalization From Chinese Narrative Clinical Texts: Pattern Learning Approach [O]. Xiaoyi Pan, Boyu Chen, Heng Weng. 2020.