
Design and construction of an entity resolution system that supports entity identity information management and asserted resolution.



Abstract

This work describes the design and construction of an open-source entity resolution (ER) system that enables users to assign and maintain persistent identifiers for master data items. Two key features of this system that are not available in current ER systems, and that make persistent identification possible, are (1) the capture and management of entity identity information and (2) support for user-directed asserted resolution to complement automated direct matching and transitive closure.

Another important feature of the design is that the system can be easily configured at run time into any one of four types of entity resolution architecture: (1) traditional merge/purge, also known as record linking; (2) identity capture; (3) identity update; and (4) identity resolution.

Because the user can establish these configurations at run time, the system provides a valuable tool for academic research and instruction, allowing researchers and students to use the same system to explore the behavior and nature of different ER architectures. Although the most common string-match comparators, such as Levenshtein edit distance, q-gram, Soundex, and many others, are built into the system, it is designed so that users can easily add comparators by extending the system's Comparator class. Furthermore, the system incorporates a dynamic filtering system that improves the performance of the matching algorithm by avoiding record pairs that cannot possibly match.
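The ideas named in the abstract (a pluggable comparator, a filter that skips pairs that cannot match, transitive closure, and user-asserted links) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the abstract does not show the thesis's actual Comparator API, so the class names, the `resolve` function, the `asserted_links` parameter, the length-based filter, and the sample records are all hypothetical.

```python
from abc import ABC, abstractmethod
from itertools import combinations


class Comparator(ABC):
    """Hypothetical extension point; not the thesis's actual class."""

    @abstractmethod
    def matches(self, a: str, b: str) -> bool:
        """Return True if the two attribute values are considered a match."""


def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


class EditDistanceComparator(Comparator):
    """Example of adding a comparator by subclassing the base class."""

    def __init__(self, max_distance: int = 1):
        self.max_distance = max_distance

    def matches(self, a: str, b: str) -> bool:
        a, b = a.lower(), b.lower()
        # Cheap filter (an assumption, standing in for the dynamic filtering
        # idea): the length difference is a lower bound on edit distance, so
        # such pairs cannot possibly match and the full comparison is skipped.
        if abs(len(a) - len(b)) > self.max_distance:
            return False
        return levenshtein(a, b) <= self.max_distance


def resolve(records, comparator, asserted_links=()):
    """Cluster record ids by direct matching, user assertions, and
    transitive closure (computed with a union-find structure)."""
    parent = {rid: rid for rid in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    for x, y in combinations(records, 2):      # automated direct matching
        if comparator.matches(records[x], records[y]):
            union(x, y)
    for x, y in asserted_links:                # user-directed assertions
        union(x, y)

    clusters = {}
    for rid in records:
        clusters.setdefault(find(rid), []).append(rid)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical sample data.
records = {"r1": "Jon Smith", "r2": "John Smith",
           "r3": "John Smyth", "r4": "J. Smithe"}
comparator = EditDistanceComparator(max_distance=1)
automatic = resolve(records, comparator)
with_assertion = resolve(records, comparator, asserted_links=[("r4", "r1")])
```

In this sketch `r1` and `r3` do not match directly (their edit distance is 2), but both match `r2`, so transitive closure places all three in one cluster. `r4` matches nothing automatically and joins that cluster only when the user asserts the link `("r4", "r1")`.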

Bibliographic record

  • Author

    Nelson, Eric Derrand

  • Author affiliation

    University of Arkansas at Little Rock

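  • Degree-granting institution: University of Arkansas at Little Rock
  • Subject: Information Technology; Computer Science
  • Degree: Ph.D.
  • Year: 2011
  • Pages: 218 p.
  • Total pages: 218
  • Original format: PDF
  • Text language: eng
  • Chinese Library Classification:
  • Keywords: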
