IEEE International Conference on Data Mining Workshops

Exploring the Effect of Household Structure in Historical Record Linkage of Early 1900s Ireland Census Records

Abstract

Record linkage is the process of identifying records that correspond to the same unique entity across datasets. Linking historical data allows researchers to better characterize topics such as population mobility, the impacts of local and national events, and generational change. Most record linkage algorithms rely on string similarity (e.g., the edit distance between names); however, some expected changes are not captured by standard text similarity metrics (e.g., a name change after marriage). The recently available 1901 and 1911 Irish national census records have limited, non-standardized fields that contain the errors typical of digitizing and formatting handwritten records. These issues, coupled with the high frequency of common names, are part of the reason traditional methods struggle. Such methods often consider only pairwise information, without incorporating household or relationship information across records (e.g., parents and siblings). However, the original census records are organized by household, which allows us to explore incorporating this additional structure into traditional record linkage methods. In this paper, we describe an initial labeling procedure for a subset of County Carlow, Ireland, and compare approaches for incorporating household information into both supervised and unsupervised record linkage techniques.
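To make the contrast between pairwise and household-aware scoring concrete, here is a minimal Python sketch. It is not the paper's method: the `Record` fields, the normalized Levenshtein name similarity, the 0.7/0.3 name/age weights, the 5-year age tolerance, and the household weight of 0.4 are all hypothetical choices for illustration, and the toy records below are invented rather than taken from the census.

```python
from dataclasses import dataclass


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical names."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


@dataclass(frozen=True)
class Record:
    first: str
    surname: str
    age: int
    household: str  # hypothetical household identifier


def pairwise_score(p: Record, q: Record, years_apart: int = 10) -> float:
    """Score using only the two records themselves (no household context)."""
    names = 0.5 * (name_similarity(p.first, q.first)
                   + name_similarity(p.surname, q.surname))
    age_error = abs((q.age - p.age) - years_apart)
    age = max(0.0, 1.0 - age_error / 5)  # assumed tolerance for misreported ages
    return 0.7 * names + 0.3 * age


def household_score(p, q, early, late, years_apart=10) -> float:
    """Average best match of p's co-residents among q's co-residents."""
    p_mates = [r for r in early if r.household == p.household and r is not p]
    q_mates = [r for r in late if r.household == q.household and r is not q]
    if not p_mates or not q_mates:
        return 0.0
    best = [max(pairwise_score(m, n, years_apart) for n in q_mates)
            for m in p_mates]
    return sum(best) / len(best)


def combined_score(p, q, early, late, weight=0.4, years_apart=10) -> float:
    """Blend the pairwise score with household context (weight is assumed)."""
    return ((1 - weight) * pairwise_score(p, q, years_apart)
            + weight * household_score(p, q, early, late, years_apart))


# Invented toy data: one 1901 household, two 1911 "John Murphy" candidates.
census_1901 = [
    Record("John", "Murphy", 8, "H1"),
    Record("Mary", "Murphy", 35, "H1"),
    Record("Patrick", "Murphy", 5, "H1"),
]
census_1911 = [
    Record("John", "Murphy", 18, "A"),
    Record("Mary", "Murphy", 45, "A"),
    Record("Patk", "Murphy", 15, "A"),  # abbreviated transcription
    Record("John", "Murphy", 18, "B"),
    Record("Bridget", "Nolan", 60, "B"),
]

john = census_1901[0]
cand_a, cand_b = census_1911[0], census_1911[3]
for cand in (cand_a, cand_b):
    print(cand.household,
          round(pairwise_score(john, cand), 3),
          round(combined_score(john, cand, census_1901, census_1911), 3))
```

Both candidates get the same pairwise score, so pairwise information alone cannot separate two people with a common name and a consistent age. The household term breaks the tie in favor of household A, whose co-residents (Mary, and "Patk" with an age consistent with Patrick) line up with the 1901 household, even though one sibling's name was abbreviated.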