Conference: IEEE International Conference on Big Data

Mining Local Gazetteers of Literary Chinese with CRF and Pattern based Methods for Biographical Information in Chinese History

Abstract

Person names and location names are essential building blocks for identifying events and social networks in historical documents written in literary Chinese. We take the lead in exploring algorithmic recognition of named entities in literary Chinese for historical studies, using language-model-based and conditional-random-field-based (CRF) methods, and we extend this work to mining the document structures of historical documents. Practical evaluations were conducted on texts extracted from more than 220 volumes of local gazetteers (Difangzhi). Difangzhi is a huge collection, and the single most important one, containing information about officers who served in local governments in Chinese history. Our methods performed very well on these realistic tests: thousands of names and addresses were identified from the texts. A good portion of the extracted names match the biographical information currently recorded in the China Biographical Database (CBDB) of Harvard University, and many others can be verified by historians and will become new additions to CBDB.
