Methods for obtaining improved text similarity measures which replace similar characters with a string pattern representation by using a semantic data tree
Abstract
Embodiments of the invention provide methods for obtaining improved text similarity measures. More specifically, a method of measuring similarity between at least two electronic documents begins by identifying similar terms between the documents. Similarity between terms is based on patterns, which can include word patterns, letter patterns, numeric patterns, and/or alphanumeric patterns, and multiple pattern types can be identified between the documents. Pattern-based similarity also identifies terms in the documents that fall within the same category of a hierarchy. Specifically, the method consults a hierarchical data tree whose nodes represent terms in the documents: lower nodes of the tree hold specific terms, and higher nodes hold general terms.
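The idea in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the patented method: the placeholder tokens `<NUM>` and `<ALNUM>`, the toy `PARENT` hierarchy, the helper names, and the choice of Jaccard overlap as the similarity measure (the abstract does not name one). Terms that match a pattern are replaced by a pattern token, remaining terms are lifted one level toward a more general node of the tree, and the two normalized term sets are then compared.

```python
import re

# Hypothetical pattern rules: a regular expression and the placeholder token
# that replaces any term matching it. Order matters: first match wins.
PATTERNS = [
    (re.compile(r"^\d+$"), "<NUM>"),                              # numeric pattern
    (re.compile(r"^(?=.*\d)(?=.*[a-z])[a-z0-9]+$"), "<ALNUM>"),   # alphanumeric pattern
]

# Hypothetical hierarchy stored as child -> parent links.
# Specific terms sit low in the tree, general terms sit higher.
PARENT = {
    "thinkpad": "laptop",
    "macbook": "laptop",
    "laptop": "computer",
    "server": "computer",
}


def generalize(term, levels=1):
    """Climb up to `levels` steps toward the root of the hierarchy."""
    for _ in range(levels):
        if term not in PARENT:
            break
        term = PARENT[term]
    return term


def normalize(term, levels=1):
    """Replace a term by its pattern token, or else by a more general term."""
    for regex, token in PATTERNS:
        if regex.match(term):
            return token
    return generalize(term, levels)


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a, b):
    """Jaccard overlap of two sets (an assumed choice of measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def raw_similarity(doc_a, doc_b):
    """Baseline: compare the literal term sets."""
    return jaccard(set(tokenize(doc_a)), set(tokenize(doc_b)))


def similarity(doc_a, doc_b, levels=1):
    """Compare term sets after pattern replacement and generalization."""
    set_a = {normalize(t, levels) for t in tokenize(doc_a)}
    set_b = {normalize(t, levels) for t in tokenize(doc_b)}
    return jaccard(set_a, set_b)


if __name__ == "__main__":
    doc_a = "ThinkPad model T480 shipped 2018"
    doc_b = "MacBook model A1989 shipped 2019"
    print(raw_similarity(doc_a, doc_b))  # 0.25
    print(similarity(doc_a, doc_b))      # 1.0
```

In this toy example the two documents share only "model" and "shipped" literally, so the baseline score is low. After normalization the model numbers collapse to `<ALNUM>`, the years to `<NUM>`, and both product names to "laptop", so the documents are scored as equivalent. A real system would likely weight generalized matches lower than exact ones; this sketch does not.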