
METHOD AND APPARATUS FOR NEURAL NETWORK-BASED WORD SEGMENTATION AND PART-OF-SPEECH TAGGING, DEVICE AND STORAGE MEDIUM


Abstract

A method and apparatus for neural network-based word segmentation and part-of-speech tagging, together with a computer device and a storage medium, relating to the technical field of artificial intelligence. The method comprises: acquiring a corpus to be segmented, inputting it into a pre-trained first deep neural network (DNN) model, and acquiring the plurality of initially segmented words that the first DNN model outputs for that corpus (201, 202); and calculating the internal aggregation degree and the information entropy of each initially segmented word, and taking as a final segmented word each initially segmented word whose internal aggregation degree and information entropy both exceed set thresholds (203). Each final segmented word is input into a pre-trained second DNN model and a KNN model, which yield, respectively, candidate parts of speech with their probabilities and the parts of speech of similar words with their probabilities (204, 205); the part of speech with the highest probability is returned as the part of speech of the final segmented word (206). The method performs part-of-speech tagging at the same time as word segmentation, further improves segmentation accuracy, and provides, for each application scenario, the segmentation results that best fit that scenario.
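The abstract does not define how the internal aggregation degree and the information entropy of step 203 are computed. A common choice in new-word discovery, assumed here and not confirmed by the source, is to measure aggregation as the minimum pointwise mutual information (PMI) over every two-way split of the word, and entropy as the smaller of the left-neighbour and right-neighbour entropies. The Python sketch below applies the threshold filter under those assumptions. The list `candidates` stands in for the first DNN model's output, and all function names, thresholds and the toy corpus are hypothetical.

```python
import math
from collections import Counter

# Hypothetical helper names; the PMI and neighbour-entropy definitions are assumptions.


def ngram_counts(corpus, max_len):
    """Count every character n-gram of length 1..max_len in the corpus."""
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(corpus) - n + 1):
            counts[corpus[i:i + n]] += 1
    return counts


def cohesion(word, counts, total):
    """Assumed aggregation measure: minimum PMI over all two-way splits of `word`.

    Probabilities are approximated as n-gram count / corpus length.
    """
    p_word = counts[word] / total
    scores = []
    for k in range(1, len(word)):
        p_left = counts[word[:k]] / total
        p_right = counts[word[k:]] / total
        scores.append(math.log(p_word / (p_left * p_right)))
    return min(scores)


def _entropy(counter):
    """Shannon entropy (natural log) of a neighbour-frequency Counter; 0 if empty."""
    n = sum(counter.values())
    return -sum(c / n * math.log(c / n) for c in counter.values()) if n else 0.0


def boundary_entropy(word, corpus):
    """Assumed entropy measure: the smaller of the left- and right-neighbour entropies."""
    left, right = Counter(), Counter()
    start = corpus.find(word)
    while start != -1:
        if start > 0:
            left[corpus[start - 1]] += 1
        end = start + len(word)
        if end < len(corpus):
            right[corpus[end]] += 1
        start = corpus.find(word, start + 1)  # overlapping occurrences are counted
    return min(_entropy(left), _entropy(right))


def filter_segments(candidates, corpus, min_cohesion, min_entropy):
    """Keep candidates whose aggregation and entropy both exceed the thresholds.

    Single characters are skipped because PMI needs at least one split (an assumption).
    """
    if not candidates:
        return []
    counts = ngram_counts(corpus, max(len(w) for w in candidates))
    return [
        w for w in candidates
        if len(w) >= 2 and counts[w] > 0
        and cohesion(w, counts, len(corpus)) > min_cohesion
        and boundary_entropy(w, corpus) > min_entropy
    ]


corpus = "xabyzabwqabr"  # toy corpus, one character per symbol
print(filter_segments(["ab", "by", "xa"], corpus, min_cohesion=1.0, min_entropy=0.5))
```

Running this prints `['ab']`. The candidate `"by"` is as cohesive as `"ab"`, but it always appears between the same neighbours, so its boundary entropy is zero and it is rejected. Strict `>` comparisons mirror the abstract's "exceed set thresholds".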
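Steps 204 to 206 (part-of-speech selection) can be sketched as follows. The abstract says only that the part of speech with the highest probability is returned. It does not say how the second DNN model's scores and the KNN model's scores are combined, or how the KNN probabilities are derived. This Python sketch makes two assumptions. First, the KNN probability of a tag is the share of the k nearest lexicon words, by Euclidean distance between word vectors, that carry that tag. Second, the final choice is the single highest probability across both models. The `dnn_probs` input, the lexicon format, the toy vectors and the function names are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical helper names; the KNN voting and the pooling rule are assumptions.


def knn_pos_probs(query_vec, lexicon, k=3):
    """Assumed KNN step: share of the k nearest lexicon entries carrying each tag.

    `lexicon` is a list of (word, vector, pos_tag) tuples for already-tagged words.
    """
    nearest = sorted(lexicon, key=lambda entry: math.dist(query_vec, entry[1]))[:k]
    if not nearest:
        return {}
    votes = Counter(pos for _, _, pos in nearest)
    return {pos: n / len(nearest) for pos, n in votes.items()}


def choose_pos(dnn_probs, knn_probs):
    """Pool both models' candidates, keep the higher probability per tag, return the argmax."""
    pooled = dict(dnn_probs)
    for pos, p in knn_probs.items():
        pooled[pos] = max(pooled.get(pos, 0.0), p)
    return max(pooled, key=pooled.get)


lexicon = [  # toy 2-D word vectors, for illustration only
    ("apple", (0.0, 0.0), "n"),
    ("pear", (0.1, 0.0), "n"),
    ("plum", (0.0, 0.2), "n"),
    ("run", (5.0, 5.0), "v"),
]
knn = knn_pos_probs((0.05, 0.05), lexicon, k=4)  # {'n': 0.75, 'v': 0.25}
dnn = {"v": 0.6, "n": 0.3}                        # stand-in for the second DNN's output
print(choose_pos(dnn, knn))                       # n
```

Taking a plain maximum lets whichever model is more confident decide. A weighted sum of the two distributions would be an equally plausible reading of the abstract.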

Bibliographic data

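  • Publication number: WO2020206913A1
  • Publication date: 2020-10-15
  • Original document format: PDF
  • Applicant/patentee: PING AN TECHNOLOGY (SHENZHEN) CO. LTD.
  • Application number: WO2019CN103298
  • Inventor: WU ZHUANGWEI
  • Filing date: 2019-08-29
  • Classification code: G06F17/27
  • Country/office: WO
  • Added to database: 2022-08-21 11:09:04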
