
A Classification Approach to Text Normalization


Abstract

We propose a new model for text normalization: GRFE (Gated Recurrent Feature Extractor). Using a GRU neural network, it classifies each token into a predefined type such as date, time, or digit, and then normalizes the token according to domain knowledge. GRFE avoids many "silly errors", such as normalizing '17' as 'eighteen' or mixing British and American English in dates, and it improves the robustness and extensibility of the network. Experiments show that, compared with previous models, GRFE uses fewer parameters and fewer layers: its parameter count is 30.69% of that of LSTM and 34.96% of that of CFE (Causal Feature Extractor). It takes less training time to achieve better accuracy (92.77%).
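The classify-then-normalize idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the GRU classifier is replaced here by a regex stand-in, and the class names (`DIGIT`, `TIME`, `PLAIN`), the function names, and the spoken forms chosen for each class are all assumptions made for illustration. The point it shows is that once a token's type is fixed, a deterministic rule produces the output, so '17' cannot come out as 'eighteen'.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

TIME_RE = re.compile(r"([01]?\d|2[0-3]):([0-5]\d)")


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def classify(token: str) -> str:
    """Stage 1 (assumed stand-in): the paper uses a GRU here, not regexes."""
    if TIME_RE.fullmatch(token):
        return "TIME"
    if re.fullmatch(r"\d+", token):
        return "DIGIT"
    return "PLAIN"


def normalize_digit(token: str) -> str:
    # One- and two-digit numbers are read as cardinals; longer digit
    # strings are read digit by digit (an illustrative choice).
    if len(token) <= 2:
        return number_to_words(int(token))
    return " ".join(ONES[int(ch)] for ch in token)


def normalize_time(token: str) -> str:
    hours, minutes = (int(part) for part in token.split(":"))
    if minutes == 0:
        return f"{number_to_words(hours)} o'clock"
    if minutes < 10:
        return f"{number_to_words(hours)} oh {number_to_words(minutes)}"
    return f"{number_to_words(hours)} {number_to_words(minutes)}"


# Stage 2: one deterministic, rule-based normalizer per class.
NORMALIZERS = {"DIGIT": normalize_digit, "TIME": normalize_time}


def normalize(token: str) -> str:
    """Classify the token, then apply the rule for its class."""
    rule = NORMALIZERS.get(classify(token))
    return rule(token) if rule else token
```

In this arrangement only the classifier can make a mistake, and a misclassification yields a wrongly typed reading rather than an arbitrary wrong word, which is one plausible reading of the robustness claim above.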
