Address and Participant Entity-Resolution in a Large, Cohort Observational Study Utilizing an Open-source Entity Resolution Tool (OYSTER)

International Conference on Information Knowledge Engineering


Abstract

The National Children's Study (NCS) Arkansas Study Center (ASC) uses Open sYSTem Entity Resolution (OYSTER), an open-source entity resolution application developed at the University of Arkansas at Little Rock (available at [1]), to resolve multiple records of a participant's address. Duplicate records arise because addresses are collected from multiple sources, including study instruments, the participant's healthcare provider, and other data-collection forms. The ASC administers study instruments using the open-source LimeSurvey application. Most address information is obtained through the pregnancy screener instrument, but other instruments also require an address if the participant has moved or plans to move. Participants' demographic information, including address, is entered and managed in the caBIG Central Clinical Participant Registry (C3PR). To submit participant address and instrument information correctly to the Vanguard Data Repository (VDR), we must ensure that any duplicate addresses recorded for a participant in these applications are resolved. Because address data are entered manually in both applications, they are error-prone and inconsistent (e.g., "St." vs. "Street"), so resolving duplicates is not straightforward, and simple string matching frequently fails to detect them. OYSTER is an entity resolution system that supports probabilistic direct matching, transitive linking, and asserted linking. To facilitate prospecting for match candidates (blocking), the system builds and maintains an in-memory index of attribute values to identities. Once OYSTER identifies duplicates, we currently resolve them manually in LimeSurvey and C3PR; we are moving toward an automated process.
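The abstract combines several ideas: normalizing inconsistent entries such as "St." vs. "Street", blocking through an index of attribute values, direct matching, transitive linking, and asserted linking. The following is a minimal Python sketch of how these techniques can fit together. It is not OYSTER's implementation or configuration. The abbreviation table, the `(zip_code, raw_address)` record layout, the ZIP-plus-house-number blocking key, the `threshold` value, the record ids, and all function names are hypothetical. Token-set Jaccard similarity stands in for a probabilistic match score.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical abbreviation table (illustrative subset only).
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road",
                 "dr": "drive", "apt": "apartment"}


def normalize(address):
    """Lowercase, drop punctuation, and expand common abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def similarity(a, b):
    """Jaccard similarity of token sets: a simple stand-in for a probabilistic score."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


class UnionFind:
    """Disjoint sets: linking A~B and B~C puts A, B, C in one identity (transitive linking)."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def resolve(records, threshold=0.8, asserted=()):
    """Group record ids that refer to the same address.

    records:  {record_id: (zip_code, raw_address)}  (hypothetical schema)
    asserted: pairs of record ids a reviewer has declared to be the same entity.
    """
    norm = {rid: normalize(addr) for rid, (_, addr) in records.items()}

    # Blocking: an in-memory index from an attribute value (here ZIP code plus
    # house number) to the records carrying it, so only records sharing a key
    # are compared pairwise.
    index = defaultdict(set)
    for rid, (zip_code, _) in records.items():
        house = norm[rid].split()[0] if norm[rid] else ""
        index[(zip_code, house)].add(rid)

    uf = UnionFind(records)
    for block in index.values():
        for a, b in combinations(sorted(block), 2):
            if similarity(norm[a], norm[b]) >= threshold:  # direct match
                uf.union(a, b)
    for a, b in asserted:  # asserted links bypass scoring entirely
        uf.union(a, b)

    clusters = defaultdict(set)
    for rid in records:
        clusters[uf.find(rid)].add(rid)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical records from the two source systems.
records = {
    "lime-1": ("72205", "123 Main St."),
    "c3pr-7": ("72205", "123 Main Street"),
    "lime-2": ("72205", "123 MAIN STREET Apt 4"),
    "c3pr-9": ("72201", "9 Oak Ave"),
}
print(resolve(records))  # [['c3pr-7', 'lime-1'], ['c3pr-9'], ['lime-2']]
```

Here `lime-1` and `c3pr-7` match directly once "St." is expanded. The apartment record scores 0.6 against both and stays separate. Passing `asserted=[("lime-2", "lime-1")]` joins all three records, and `c3pr-7` and `lime-2` end up in the same identity only through transitive closure, because they were never matched directly. Blocking keeps the pairwise comparison limited to records that share a key. The trade-off is that a typo in the key itself (for example, a wrong ZIP code) hides a true duplicate from the matcher.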