The Electronic Library

Disambiguating USPTO inventor names with semantic fingerprinting and DBSCAN clustering



Abstract

Purpose: The aim of this study is to present a novel approach to inventor name disambiguation that combines semantic fingerprinting, which converts inventor records into 128-bit semantic fingerprints, with a clustering algorithm called density-based spatial clustering of applications with noise (DBSCAN). Inventor disambiguation is the task of discovering the unique set of underlying inventors and mapping a set of patents to their corresponding inventors. Resolving the ambiguities between inventors is necessary to improve the quality of the patent database and to ensure accurate entity-level analysis. Most existing methods are based on machine learning; although they often perform well, this comes at the cost of time, computational power and storage space.

Design/methodology/approach: The metadata and textual data in inventor records are converted into 128-bit semantic fingerprints. Rather than using string comparison or cosine similarity to calculate the distance between pairs of fingerprints, a binary number comparison function is used in DBSCAN. DBSCAN then clusters the inventor records based on this distance to disambiguate inventor names.

Findings: Experiments conducted on the PatentsView campaign database of the United States Patent and Trademark Office (USPTO) show that this method disambiguates inventor names with recall greater than 99 per cent, in less time and with substantially smaller storage requirements than existing methods.

Originality/value: Compared with existing methods, the proposed method does not rely on feature selection or complex feature comparison computation. Most importantly, running time and storage requirements are drastically reduced.
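The abstract does not say how the fingerprints are built or what the "binary number comparison function" computes, so the following is only a minimal sketch of the overall pipeline under stated assumptions, not the authors' implementation. It assumes (1) a SimHash-style fingerprint built from 128-bit MD5 token hashes stands in for the paper's semantic fingerprinting, and (2) the binary comparison is the Hamming distance (XOR, then count the set bits). The names `token_hash`, `fingerprint`, `hamming` and `dbscan`, the record fields, and the `eps`/`min_samples` values are hypothetical choices for illustration.

```python
import hashlib
from collections import Counter

BITS = 128  # fingerprint width stated in the abstract


def token_hash(token: str) -> int:
    """128-bit integer hash of a token (MD5 digest; an illustrative choice)."""
    return int.from_bytes(hashlib.md5(token.encode("utf-8")).digest(), "big")


def fingerprint(record: dict) -> int:
    """SimHash-style 128-bit fingerprint of an inventor record.

    Assumption: this approximates the paper's semantic fingerprinting.
    Tokens are prefixed with their field name, so "city:austin" and
    "name:austin" hash differently.
    """
    tokens = Counter()
    for field, value in record.items():
        for word in str(value).lower().split():
            tokens[f"{field}:{word}"] += 1
    weights = [0] * BITS
    for token, count in tokens.items():
        h = token_hash(token)
        for i in range(BITS):
            # Each token votes +count for bits set in its hash, -count otherwise.
            weights[i] += count if (h >> i) & 1 else -count
    fp = 0
    for i, w in enumerate(weights):
        if w > 0:
            fp |= 1 << i
    return fp


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints (XOR + popcount)."""
    return bin(a ^ b).count("1")


def dbscan(points, eps, min_samples, dist=hamming):
    """Plain O(n^2) DBSCAN; returns one label per point, -1 for noise.

    A point is a core point if at least min_samples points (itself included)
    lie within eps of it.
    """
    n = len(points)
    labels = [None] * n  # None means "not visited yet"
    cluster = -1

    def neighbours(i):
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_samples:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbours(j)
            if len(j_nbrs) >= min_samples:
                queue.extend(j_nbrs)  # only core points expand the cluster
    return labels


if __name__ == "__main__":
    # Hypothetical inventor records; field names are illustrative only.
    records = [
        {"name": "John A Smith", "assignee": "Acme Corp", "city": "Austin"},
        {"name": "John Smith", "assignee": "Acme Corp", "city": "Austin"},
        {"name": "Maria Garcia", "assignee": "Globex", "city": "Boston"},
    ]
    fps = [fingerprint(r) for r in records]
    print(dbscan(fps, eps=32, min_samples=2))
```

In this sketch each record is stored as a single 16-byte integer, and comparing two records costs one XOR and a bit count rather than a string or cosine comparison, which is the kind of saving the abstract attributes to the approach. The naive neighbour search shown here is still quadratic in the number of records, so a database of USPTO size would likely need blocking or an index over the fingerprints.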
