
Text segmentation method and apparatus, text segmentation program, and storage medium storing text segmentation program


Abstract

PROBLEM TO BE SOLVED: To identify exactly the boundaries of semantic paragraphs in a text as correct answers, detecting neither too many nor too few. SOLUTION: The text is divided into words by morphological analysis. A vector for each resulting word is obtained by looking the word up in a concept base that stores vectors expressing the meanings of words. At each word boundary, a word string consisting of a fixed number of words is taken before the boundary and another after it. A word-string coupling degree is then calculated from the vectors of the words in each string, as either a similarity measure or a distance measure between the preceding and following word strings. A word boundary at which the coupling degree is a minimum (when it is a similarity measure) or a maximum (when it is a distance measure) is recognized as a boundary between semantic paragraphs of the text.
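The procedure in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the text is assumed to be already split into words, the concept base is a plain dictionary with made-up vectors, each word string is summarized by the mean of its word vectors, the coupling degree is cosine similarity, and "minimum" is read as a strict local minimum along the sequence of word boundaries. None of these choices is specified in the abstract, and all function names are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def coupling_degrees(words, concept_base, window):
    """Map each boundary index i to the similarity between the word strings
    words[i - window:i] and words[i:i + window].

    Assumption: every word is present in concept_base."""
    vecs = [concept_base[w] for w in words]
    return {
        i: cosine(centroid(vecs[i - window:i]), centroid(vecs[i:i + window]))
        for i in range(window, len(vecs) - window + 1)
    }


def segment_boundaries(words, concept_base, window=2):
    """Return boundary indices where the coupling degree (a similarity)
    is a strict local minimum. Assumption: local, not global, minimum."""
    scores = coupling_degrees(words, concept_base, window)
    idx = sorted(scores)
    return [
        i
        for prev, i, nxt in zip(idx, idx[1:], idx[2:])
        if scores[i] < scores[prev] and scores[i] < scores[nxt]
    ]


# Toy concept base with invented 2-D vectors (illustrative only).
toy_base = {
    "cat": [1.0, 0.1], "dog": [0.9, 0.2],
    "stock": [0.1, 1.0], "bond": [0.2, 0.9],
}
toy_words = ["cat", "dog", "cat", "dog", "stock", "bond", "stock", "bond"]
print(segment_boundaries(toy_words, toy_base, window=2))
```

With a distance measure in place of cosine similarity, the comparison would flip to look for local maxima, matching the second case described in the abstract.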

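Bibliographic data

  • Publication number: JP3775239B2

    Patent type:

  • Publication date: 2006-05-17

    Original format: PDF

  • Applicant/patentee: 日本電信電話株式会社 (Nippon Telegraph and Telephone Corporation)

    Application number: JP20010146872

  • Inventor: 別所 克人

    Application date: 2001-05-16

  • Classification: G06F17/27; G06F17/21

  • Country: JP

  • Date added to database: 2022-08-21 21:50:49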
