Exploiting external/domain knowledge to enhance traditional text mining using graph-based methods.

Abstract

Finding the best way to utilize external/domain knowledge to enhance traditional text mining has been a challenging task. The difficulty centers on the lack of means for representing a document with external/domain knowledge integrated. Graphs are powerful and versatile tools, useful in various subfields of science and engineering because they give simple illustrations of complicated problems. However, graph-based approaches to knowledge representation and discovery remain relatively unexplored. In this thesis, I propose a graph-based text mining system that incorporates semantic knowledge, document section knowledge, document linkage knowledge, and document category knowledge into the tasks of text clustering and topic analysis. I design a novel term-level graph knowledge representation and a graph-based clustering algorithm that incorporate semantic and document section knowledge for biomedical literature clustering and topic analysis. I present a Markov Random Field (MRF) with a Relaxation Labeling (RL) algorithm to incorporate document linkage knowledge. I evaluate different types of linkage among documents, including explicit linkage such as hyperlinks and citation links, implicit linkage such as coauthor links and co-citation links, and pseudo linkage such as similarity links. I develop a novel semantic-based method to integrate Wikipedia concepts and categories as external knowledge into traditional document clustering. To support these new approaches, I develop two automated algorithms to extract multiword phrases and ontological concepts, respectively. Evaluations on a news collection, a web dataset, and biomedical literature demonstrate the effectiveness of the proposed methods.

In the document clustering experiments, the proposed term-level graph-based method not only outperforms the baseline k-means algorithm in all configurations but is also more efficient. The MRF-based algorithm significantly improves spherical k-means and model-based k-means clustering on the datasets containing explicit or implicit linkage; the Wikipedia knowledge-based clustering also improves on clustering based on document content alone. On the task of topic analysis, the proposed graph representation, subgraph detection, and graph ranking algorithm can effectively identify corpus-level and cluster-level topic terms.
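The abstract does not spell out how relaxation labeling combines content evidence with link evidence. As a rough illustration only, the Python sketch below implements the classic multiplicative relaxation-labeling update over a document link graph; it is not the thesis's MRF formulation. It assumes a content-based soft clustering as the starting label distribution (`prior`), nonnegative link weights (hyperlinks, citations, co-authorship, and so on), and a hypothetical label-compatibility matrix `compat`; the function name and all parameters are illustrative.

```python
def relaxation_labeling(prior, links, compat, n_iter=100, tol=1e-6):
    """Refine soft cluster memberships using document links (illustrative sketch).

    prior:  list of per-document label distributions (each sums to 1),
            e.g. soft memberships from a content-only clustering
    links:  dict mapping a document index to {neighbour index: weight}
    compat: compat[a][b] in [0, 1], how strongly a neighbour labelled b
            supports label a (hypothetical; identity = "links stay in-cluster")
    """
    k = len(compat)
    p = [list(row) for row in prior]
    for _ in range(n_iter):
        new_p = []
        for i, row in enumerate(p):
            nbrs = links.get(i, {})
            total = sum(nbrs.values())
            # q[a]: weighted-average support for label a from linked documents;
            # it lies in [0, 1] and stays 0 for documents without links
            q = [0.0] * k
            if total > 0:
                for j, w in nbrs.items():
                    for a in range(k):
                        q[a] += (w / total) * sum(compat[a][b] * p[j][b] for b in range(k))
            # multiplicative update using the previous iteration's labels,
            # then renormalise so each row is again a distribution
            upd = [row[a] * (1.0 + q[a]) for a in range(k)]
            z = sum(upd)
            new_p.append([u / z for u in upd])
        delta = max(abs(new_p[i][a] - p[i][a]) for i in range(len(p)) for a in range(k))
        p = new_p
        if delta < tol:
            break
    return p


# Toy example: document 1 is ambiguous on content alone but is linked
# (e.g. by a citation) to document 0; document 2 has no links.
prior = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
links = {0: {1: 1.0}, 1: {0: 1.0}}
identity = [[1.0, 0.0], [0.0, 1.0]]
refined = relaxation_labeling(prior, links, identity)
```

With identity compatibility, the ambiguous document 1 is pulled toward the cluster of the document it links to, while the unlinked document 2 keeps its content-based distribution. Because the update multiplies the current estimate rather than the original prior, repeated iterations push linked documents toward hard assignments; a practical variant might re-anchor to the prior at each step.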

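For the topic-analysis step, the abstract mentions a graph ranking algorithm without naming it. A minimal sketch, assuming weighted PageRank computed by power iteration over a term co-occurrence graph (a TextRank-style stand-in, not necessarily the ranking used in the thesis), might look like this. Documents are assumed to be pre-extracted term lists, for example the multiword phrases or ontological concepts mentioned in the abstract; `cooccurrence_graph` and `rank_terms` are hypothetical helper names.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(docs):
    """Undirected weighted term graph: the weight of edge (a, b) is the number
    of documents in which terms a and b co-occur. Each doc is a list of terms."""
    graph = defaultdict(lambda: defaultdict(float))
    for terms in docs:
        for a, b in combinations(sorted(set(terms)), 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def rank_terms(graph, damping=0.85, n_iter=200, tol=1e-10):
    """Weighted PageRank by power iteration; returns {term: score} summing to 1.
    Every node has at least one edge, so there are no dangling nodes."""
    nodes = list(graph)
    n = len(nodes)
    score = {t: 1.0 / n for t in nodes}
    out_weight = {t: sum(graph[t].values()) for t in nodes}
    for _ in range(n_iter):
        new = {}
        for t in nodes:
            # mass flowing into t from each neighbour u, split by edge weight
            incoming = sum(score[u] * w / out_weight[u] for u, w in graph[t].items())
            new[t] = (1 - damping) / n + damping * incoming
        delta = max(abs(new[t] - score[t]) for t in nodes)
        score = new
        if delta < tol:
            break
    return score


docs = [
    ["gene", "protein", "expression"],
    ["gene", "protein", "pathway"],
    ["gene", "mutation"],
    ["clustering", "k-means"],
]
scores = rank_terms(cooccurrence_graph(docs))
top_terms = sorted(scores, key=scores.get, reverse=True)[:3]
```

Ranking over the whole corpus would give corpus-level candidates; ranking separately over the documents of each cluster would give cluster-level candidates.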

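Bibliographic Record

Author: Zhang, Xiaodan
Author affiliation: Drexel University
Degree-granting institution: Drexel University
Subject: Information Science; Computer Science
Degree: Ph.D.
Year: 2009
Pages: 149
Original format: PDF
Language: English
Chinese Library Classification: Information and knowledge dissemination; Automation technology, computer technology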

Similar Literature

1. An External Knowledge Enhanced Graph-based Neural Network for Sentence Ordering [J]. Yongjing Yin, Shaopeng Lai, Linfeng Song. The Journal of Artificial Intelligence Research, 2021.

2. Text mining for traditional Chinese medical knowledge discovery: a survey [J]. Zhou X, Peng Y, Liu B. Journal of Biomedical Informatics, 2010, no. 4.

3. Discovering treatment pattern in Traditional Chinese Medicine clinical cases by exploiting supervised topic model and domain knowledge [J]. Journal of Biomedical Informatics, 2015.

4. Exploring rules of traditional Chinese medicine external therapy and food therapy in treatment of mammary gland hyperplasia with text mining [C]. Shanshan Shen, Yaoxian Wang, Guang Zheng. IEEE International Conference on Bioinformatics and Biomedicine, 2014.

5. Transformation of relational database domain into graph-based domain for graph-based data mining [D]. Palod, Swapnil. 2004.

6. Of text and gene – using text mining methods to uncover hidden knowledge in toxicogenomics [O]. Mikyung Lee, Zhichao Liu, Reagan Kelly. 2014.

7. From text mining to knowledge mining: An integrated framework of concept extraction and categorization for domain ontology [O]. Gillani Andleeb Saira. 2015.
