Semantic Web for Everyone: Exploring Semantic Web Knowledge Bases via Contextual Tag Clouds and Linguistic Interpretations.

Abstract

The amount of Semantic Web data is huge and still growing rapidly. However, most users are still unable to use a Semantic Web Knowledge Base (KB) as effectively as they would like, because they lack the necessary background knowledge. Furthermore, the data is usually heterogeneous and incomplete, and may even contain errors, which makes the dataset harder still to understand. How to quickly familiarize users with the ontology and data in a KB is therefore an important research challenge for the Semantic Web community.

The core of our proposed solution is the contextual tag cloud system, a novel application that helps users explore large-scale RDF (Resource Description Framework) datasets. The tags in our system are ontological terms (classes and properties), and a user can construct a context from a set of tags; the context defines the subset of instances that carry all of those tags. In the contextual tag cloud, the font size of each tag depends on the number of instances associated with both that tag and every tag in the context. Each contextual tag cloud thus summarizes the distribution of the relevant data, and by changing the context the user can quickly grasp patterns in the data. The user can also choose to include different RDFS entailment regimes when tag sizes are calculated, and so see the impact of semantics on the data. To address the key challenge of scalability, we combine a scalable preprocessing approach with a specially constructed inverted index and co-occurrence matrix, use three approaches to prune unnecessary counts and speed up online computation, and design a paging and streaming interface. Our experiments show how much these design choices improve the responsiveness of the system.
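The abstract describes tag sizing only at a high level. The Python sketch below shows one way the counts could be computed, assuming a small in-memory toy dataset: an inverted index maps each tag to its instances, the context selects the intersection of its tags' posting lists, and each tag's count is the size of its overlap with that selection. For a single-tag context, those counts are exactly one row of a precomputed co-occurrence matrix. The optional subclass step approximates RDFS subclass entailment (rules rdfs9 and rdfs11). The data, the helper names, and the logarithmic font scaling are illustrative assumptions rather than the dissertation's implementation, and the scalable preprocessing, the three pruning approaches, and paging/streaming are not reproduced.

```python
import math
from collections import defaultdict

# Hypothetical toy data: (instance, tag) pairs, where a tag is an asserted
# class (e.g. Student) or a property the instance uses (e.g. name).
PAIRS = [
    ("alice", "Student"), ("alice", "name"), ("alice", "advisor"),
    ("bob", "Professor"), ("bob", "name"),
    ("carol", "Student"), ("carol", "name"),
    ("lehigh", "University"), ("lehigh", "name"),
]
# Hypothetical rdfs:subClassOf edges: child class -> set of direct superclasses.
SUBCLASS_OF = {"Student": {"Person"}, "Professor": {"Person"}}


def build_inverted_index(pairs):
    """Map each tag to the set of instances associated with it."""
    index = defaultdict(set)
    for instance, tag in pairs:
        index[tag].add(instance)
    return dict(index)


def apply_subclass_entailment(index, subclass_of):
    """Add inferred memberships (RDFS rules rdfs9 and rdfs11): every instance
    of a class is also an instance of all of its (transitive) superclasses."""
    entailed = defaultdict(set, {tag: set(insts) for tag, insts in index.items()})
    for cls, insts in index.items():
        stack, seen = list(subclass_of.get(cls, ())), set()
        while stack:
            sup = stack.pop()
            if sup in seen:  # guard against cycles in the hierarchy
                continue
            seen.add(sup)
            entailed[sup] |= insts
            stack.extend(subclass_of.get(sup, ()))
    return dict(entailed)


def cooccurrence_matrix(index):
    """Precompute |I(a) & I(b)| for every pair of tags with a non-empty overlap."""
    matrix = {}
    for a, ia in index.items():
        row = {}
        for b, ib in index.items():
            n = len(ia & ib)
            if n:
                row[b] = n
        matrix[a] = row
    return matrix


def contextual_counts(index, context):
    """For every tag, count the instances that carry that tag AND every context tag.
    Tags with a zero count are dropped, since they would not appear in the cloud."""
    if context:
        # Start from the smallest posting list so intermediate results stay small.
        postings = sorted((index.get(t, set()) for t in context), key=len)
        selected = set.intersection(*postings)
    else:
        selected = set().union(*index.values())  # empty context = whole dataset
    counts = {}
    for tag, insts in index.items():
        n = len(insts & selected)
        if n:
            counts[tag] = n
    return counts


def font_sizes(counts, min_px=10, max_px=40):
    """Map counts to font sizes on a log scale (an illustrative choice)."""
    if not counts:
        return {}
    lo = math.log(min(counts.values()))
    hi = math.log(max(counts.values()))
    span = (hi - lo) or 1.0  # all counts equal: avoid division by zero
    return {
        tag: round(min_px + (max_px - min_px) * (math.log(n) - lo) / span)
        for tag, n in counts.items()
    }


if __name__ == "__main__":
    index = build_inverted_index(PAIRS)
    print(contextual_counts(index, ["Person"]))  # no asserted Person instances
    entailed = apply_subclass_entailment(index, SUBCLASS_OF)
    cloud = contextual_counts(entailed, ["Person"])
    print(cloud, font_sizes(cloud))
```

With this toy data, the context `["Person"]` selects nothing when only asserted types are counted, but covers alice, bob, and carol once subclass entailment is applied, which mirrors the abstract's point that the chosen entailment regime changes what the cloud shows.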
We conducted a preliminary user study of this system and found that novice participants felt it provided a good means of investigating the data, and that they completed assigned tasks more easily than with a baseline interface.

We then extend the definition of tags to more general categories, in particular property values, chained property values, and functions on these values. Applying the system to a quite different scenario with these more general tags, we find that it can be used to discover interesting patterns in the value space. To adapt to the different dataset, we modify the infrastructure with a new indexing data structure and propose two strategies for online queries, chosen according to the request, in order to keep the system responsive.

In addition, we consider other approaches that help users locate classes from natural-language input. One way to understand classes is to perform Word Sense Disambiguation (WSD) on the words in their labels using an external lexicon. We propose a novel WSD approach based on our probability model, decompose the problem formula into small computable pieces, and propose ways to estimate the values of these pieces. The other approach does not rely on external sources; instead, we investigate how to retrieve query-relevant classes using the annotations of the instances associated with classes in the knowledge base. We propose a general framework for this approach that consists of two phases: first, the keyword query is used to locate relevant instances; then, the classes are induced from this list of weighted matched instances.

Following the description of the completed work, I propose some important future work for extending the current system, and finally conclude the dissertation.
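The two-phase class retrieval framework can be illustrated with a deliberately simple Python sketch. Phase 1 weights each instance by the fraction of query terms that appear in its label; phase 2 induces classes by summing the weights of each class's matched instances. Both scoring rules, the example labels and types, and the function names are hypothetical stand-ins: the abstract does not specify how the dissertation weights instances or induces classes.

```python
import re
from collections import defaultdict

# Hypothetical instance annotations (e.g. rdfs:label values) and rdf:type assertions.
LABELS = {
    "ex:LehighUniv": "Lehigh University",
    "ex:PennState": "Pennsylvania State University",
    "ex:Bethlehem": "Bethlehem, Pennsylvania",
    "ex:Zhang": "Xingjian Zhang",
}
TYPES = {
    "ex:LehighUniv": {"University"},
    "ex:PennState": {"University"},
    "ex:Bethlehem": {"City"},
    "ex:Zhang": {"Person"},
}


def tokenize(text):
    """Return the set of lower-cased alphanumeric tokens in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_instances(query, labels):
    """Phase 1 (hypothetical scoring): weight = fraction of query terms found in the label."""
    terms = tokenize(query)
    if not terms:
        return {}
    weights = {}
    for instance, label in labels.items():
        overlap = len(terms & tokenize(label))
        if overlap:
            weights[instance] = overlap / len(terms)
    return weights


def induce_classes(weights, types, top_k=3):
    """Phase 2 (hypothetical aggregation): rank classes by the summed
    weights of their matched instances."""
    scores = defaultdict(float)
    for instance, weight in weights.items():
        for cls in types.get(instance, ()):
            scores[cls] += weight
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    matched = match_instances("University in Pennsylvania", LABELS)
    print(induce_classes(matched, TYPES))
```

Summing weights favors classes with many matched instances; a fuller system might also normalize for class size or position in the hierarchy, which this sketch leaves out for brevity.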

Bibliographic Details

  • Author: Zhang, Xingjian
  • Author affiliation: Lehigh University
  • Degree-granting institution: Lehigh University
  • Discipline: Computer science
  • Degree: Ph.D.
  • Year: 2014
  • Pages: 230
  • Original format: PDF
  • Language: English
