Conference: International conference on emerging technologies for information systems, computing, and management

Construction of Linguistic Resources for Information Extraction of News Reports on Corporate Merger and Acquisition


Abstract

Detecting real-time corporate merger and acquisition information in publicly available text data and feeding it to the decision-making module is essential for a practical e-business management system. Many machine learning algorithms for text categorization and information extraction can address this problem, and they require different feature selection methods. Among the candidate features, linguistic features are key to accomplishing the task. The acquisition of IBM's PC division by Lenovo in 2004 was chosen as a case study, and news reports of this event were gathered from the Internet to build a Corporate Merger and Acquisition (M&A) mini-corpus. By comparing the M&A corpus with larger general corpora, we constructed a feature word list using Term Frequency and Inverse Document Frequency strategies and augmented it with word groups drawn from thesauri. Typical patterns that highlight an M&A event were then collected by applying regular expression matching to the words acquired in the previous step. Using the accumulated language resources, the precision and recall for predicting the M&A amount are 61.76% and 65.22% for Chinese texts, and 84% and 71.43% for English texts, respectively.
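The two steps described in the abstract (ranking feature words against a general corpus, then matching amount patterns around those words) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the smoothed IDF formula, the dollar-amount regular expression, and the toy sentences are all assumptions made for illustration, and the thesaurus-based augmentation step is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic tokens (English only)."""
    return re.findall(r"[a-z]+", text.lower())


def feature_words(domain_docs, general_docs, top_k=5):
    """Rank words by domain-corpus term frequency times IDF over a general corpus.

    Assumption: a smoothed IDF, log((1 + N) / (1 + df)) + 1, is used here;
    the abstract does not state the exact weighting.
    """
    tf = Counter(tok for doc in domain_docs for tok in tokenize(doc))
    total = sum(tf.values())
    general_sets = [set(tokenize(doc)) for doc in general_docs]
    n_docs = len(general_sets)
    scores = {}
    for word, count in tf.items():
        df = sum(1 for tokens in general_sets if word in tokens)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[word] = (count / total) * idf
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


# Hypothetical amount pattern: an optional "US", a dollar sign, a number, a scale word.
AMOUNT = r"(?:US)?\$\s?(\d+(?:\.\d+)?)\s?(million|billion)"


def build_amount_pattern(trigger_words):
    """Build a regex that finds an amount after a trigger word in the same sentence."""
    triggers = "|".join(re.escape(w) for w in trigger_words)
    return re.compile(rf"\b(?:{triggers})\w*\b[^.]*?{AMOUNT}", re.IGNORECASE)


def extract_amount(sentence, pattern):
    """Return (value, scale) for the first matched amount, or None if nothing matches."""
    match = pattern.search(sentence)
    if match is None:
        return None
    return float(match.group(1)), match.group(2).lower()


if __name__ == "__main__":
    # Invented toy sentences, not taken from the paper's corpus.
    pattern = build_amount_pattern(["acquire", "merger"])
    print(extract_amount("Company A agreed to acquire Company B for $2.5 billion.", pattern))
    # → (2.5, 'billion')
```

In this sketch the trigger words passed to `build_amount_pattern` would come from the ranked list returned by `feature_words`. Chinese text would additionally need word segmentation and a pattern for Chinese currency expressions, neither of which is covered here.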
