Journal of Medical Systems

Information Extraction Approaches to Unconventional Data Sources for “Injury Surveillance System”: the Case of Newspapers Clippings

Abstract

Injury surveillance systems based on traditional hospital records or clinical data have the advantage of being a well-established, highly reliable source of information for active surveillance of specific injuries, such as choking in children. However, they suffer from delays in making data available for analysis, owing to inefficiencies in data collection procedures. Integrating clinical registries with unconventional data sources such as newspaper articles therefore has the advantage of making the system more useful for early alerting. Using such sources is difficult because the information is available only as free natural-language documents rather than the structured databases that traditional data mining techniques require. Information Extraction (IE) addresses the problem of transforming a corpus of textual documents into a more structured database. In this paper, on a corpus of Italian newspaper articles about choking in children due to ingestion or inhalation of a foreign body, we compared the performance of three IE algorithms: (a) a classical rule-based system, which requires manual annotation of the rules; (b) a rule-based system that allows for the automatic building of rules; and (c) a machine learning method based on Support Vector Machines. Although some useful indications are extracted from the newspaper clippings, this approach is currently far from being routinely implemented for injury surveillance purposes.
