Venue: International Conference on Computational Linguistics

Unsupervised Fact Checking by Counter-Weighted Positive and Negative Evidential Paths in A Knowledge Graph

Abstract

Misinformation spreads across media, communities, and knowledge graphs on the Web, propagated not only by human agents but also by information extraction algorithms that extract factual statements from unstructured text to populate existing knowledge graphs. Traditional fact checking by experts or crowds increasingly struggles to keep pace with the volume of newly created misinformation on the Web. It is therefore important to improve the computational ability to determine whether a given factual statement is truthful. We view this problem as a truth scoring task in a knowledge graph. We present a novel rule-based approach that finds positive and negative evidential paths in a knowledge graph for a given factual statement and calculates a truth score for the statement by an unsupervised ensemble of the paths found. For example, we can judge the factual statement "United States is the birth place of Barack Obama" truthful if the knowledge graph contains the positive evidential path (Barack Obama, birthplace, Hawaii) ∧ (Hawaii, country, United States). Likewise, we can judge the factual statement "Canada is the nationality of Barack Obama" untruthful if the knowledge graph contains the negative evidential path (Barack Obama, nationality, United States) ∧ (United States, ≠, Canada). To evaluate in a real-world setting, we constructed an evaluation dataset by labeling as true or untrue the factual statements extracted from Wikipedia texts by a state-of-the-art BERT-based information extraction system. Our evaluation results show that our approach significantly outperforms state-of-the-art unsupervised approaches by up to 0.12 AUC-ROC and even outperforms the supervised approach by up to 0.05 AUC-ROC, not only on our dataset but also on two different standard datasets.
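
The two examples in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the rule set or the weighting scheme, so the toy graph `KG`, the rule table `POSITIVE_RULES`, the single-valued predicate set `FUNCTIONAL`, and the simple weighted ratio in `truth_score` are all assumptions made here for illustration.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
Path = Tuple[Triple, Triple]

# Toy knowledge graph (hypothetical), holding only the triples
# mentioned in the abstract's two examples.
KG: Set[Triple] = {
    ("Barack Obama", "birthplace", "Hawaii"),
    ("Hawaii", "country", "United States"),
    ("Barack Obama", "nationality", "United States"),
}

# Assumed rule: a (birthplace, country) chain supports a "birthplace" claim.
POSITIVE_RULES = {"birthplace": [("birthplace", "country")]}

# Assumed single-valued predicates: a different known object contradicts the claim.
FUNCTIONAL = {"nationality"}


def positive_paths(s: str, p: str, o: str, kg: Set[Triple]) -> List[Path]:
    """Find two-hop paths (s, r1, mid) ∧ (mid, r2, o) allowed by POSITIVE_RULES."""
    paths = []
    for r1, r2 in POSITIVE_RULES.get(p, []):
        for s1, p1, mid in kg:
            if s1 == s and p1 == r1 and (mid, r2, o) in kg:
                paths.append(((s, r1, mid), (mid, r2, o)))
    return paths


def negative_paths(s: str, p: str, o: str, kg: Set[Triple]) -> List[Path]:
    """Find paths (s, p, o2) ∧ (o2, ≠, o) for predicates assumed single-valued."""
    if p not in FUNCTIONAL:
        return []
    return [
        ((s, p, o2), (o2, "≠", o))
        for s1, p1, o2 in kg
        if s1 == s and p1 == p and o2 != o
    ]


def truth_score(s: str, p: str, o: str, kg: Set[Triple],
                w_pos: float = 1.0, w_neg: float = 1.0) -> float:
    """Weigh positive against negative evidence; 0.5 means no evidence found.

    The weighted ratio is a placeholder, not the paper's ensemble.
    """
    pos = w_pos * len(positive_paths(s, p, o, kg))
    neg = w_neg * len(negative_paths(s, p, o, kg))
    if pos + neg == 0:
        return 0.5
    return pos / (pos + neg)


if __name__ == "__main__":
    print(truth_score("Barack Obama", "birthplace", "United States", KG))
    print(truth_score("Barack Obama", "nationality", "Canada", KG))
```

In this sketch a statement with only positive paths scores 1.0, one with only negative paths scores 0.0, and one with no evidence of either kind falls back to 0.5. The weights `w_pos` and `w_neg` stand in for whatever counter-weighting the full paper defines.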
