IEEE International Conference on Industrial Technology

Classifying Address Components of Thai Mail by Natural Language Processing.

Abstract

Because Thai postal addresses follow no fixed writing format, their components are difficult to classify. This paper proposes a method that classifies address components using natural language processing (NLP) so as to tolerate the non-fixed writing format and minor misspellings. First, the method locates the zip code and house number and uses them to extract only the address components from the overall destination address block. Second, it searches for the province prefix, since the province is the largest area component of the address. The province name that follows the detected prefix serves as the key for classifying smaller areas, such as district and locality, by matching against a database. When a name is slightly misspelled, the most similar district within the matched province is selected as the candidate, and a threshold determines whether it is accepted as the district. In experiments on 500 address samples, the method achieved 86% accuracy.
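
The pipeline described in the abstract can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy gazetteer, the prefix patterns (จ./จังหวัด for province, อ./อำเภอ for district), the use of Python's difflib.SequenceMatcher as the similarity measure, and the 0.8 threshold are all hypothetical choices, since the abstract does not specify them. The sketch also ignores Bangkok's เขต/แขวง naming.

```python
import re
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical toy gazetteer (province -> districts). A real system would load the
# full Thai administrative database; these entries are for illustration only.
GAZETTEER = {
    "เชียงใหม่": ["เมืองเชียงใหม่", "สันทราย", "หางดง"],
    "ขอนแก่น": ["เมืองขอนแก่น", "บ้านไผ่", "ชุมแพ"],
}

ZIP_RE = re.compile(r"(?<!\d)\d{5}(?!\d)")           # Thai postcodes have 5 digits
HOUSE_RE = re.compile(r"(?<!\d)\d+(?:/\d+)?(?!\d)")  # e.g. "123" or "123/4"
PROVINCE_RE = re.compile(r"(?:จังหวัด|จ\.)\s*(\S+)")  # assumed province prefixes
DISTRICT_RE = re.compile(r"(?:อำเภอ|อ\.)\s*(\S+)")    # assumed district prefixes


def address_span(block: str) -> Optional[str]:
    """Return the text between the house number and the zip code, or None."""
    zips = list(ZIP_RE.finditer(block))
    if not zips:
        return None
    zip_match = zips[-1]  # assume the last 5-digit run is the zip code
    # Search for the house number only in the text before the zip code.
    house = HOUSE_RE.search(block, 0, zip_match.start())
    if house is None:
        return None
    return block[house.end():zip_match.start()].strip()


def best_district(province: str, written: str, threshold: float = 0.8) -> Optional[str]:
    """Pick the most similar district within the province; accept it only if the
    similarity ratio reaches the (hypothetical) threshold."""
    candidates = GAZETTEER.get(province, [])
    scored = [(SequenceMatcher(None, written, d).ratio(), d) for d in candidates]
    if not scored:
        return None
    score, district = max(scored)
    return district if score >= threshold else None


def classify(block: str) -> Optional[dict]:
    """Extract the address span, key on the province, then fuzzy-match the district."""
    span = address_span(block)
    if span is None:
        return None
    prov = PROVINCE_RE.search(span)
    if prov is None or prov.group(1) not in GAZETTEER:
        return None
    province = prov.group(1)
    dist = DISTRICT_RE.search(span)
    district = best_district(province, dist.group(1)) if dist else None
    return {"province": province, "district": district}
```

For example, `classify("นายสมชาย ใจดี 123/4 ต.สันทรายหลวง อ.สันทาย จ.เชียงใหม่ 50210")` returns `{'province': 'เชียงใหม่', 'district': 'สันทราย'}`: the misspelled สันทาย scores about 0.92 against สันทราย and clears the threshold. Restricting the fuzzy search to the districts of the matched province keeps the candidate list small and lowers the chance of confusing similarly spelled districts in other provinces.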
