Researcher expertise search, homepage finding and metadata annotation.



Abstract

Expert Search, the problem of retrieving people with expertise on a queried topic, has important applications. For instance, conference organizers can use expert search capability while forming a panel for reviewing papers. Similarly, recruiters can use expert search to track potential employees for their companies. Although significant progress has been made on this problem, existing models for expert search are not tailored for academic domains. In academic domains, expert search and similar expert finding involve ranking researchers in response to topic and name queries based on academic documents. Academic or research documents are different from webpages in terms of their type (e.g., homepages, publications, grant proposals), structure (e.g., abstract, sections), associated metadata (e.g., venue, authors) and connections (e.g., citations).

Enabling expert search in an open-access digital library such as CiteSeer is challenging since academic documents are not directly available for estimating expertise. Instead, CiteSeer acquires freely available publications and other relevant academic documents by crawling the Web. Previous studies indicate that researchers list their publication information online using their homepages since this substantially increases the impact of their work. It becomes imperative, therefore, to periodically track researcher homepage URLs in CiteSeer for obtaining up-to-date collections of academic documents. In addition to their use as a resource for academic documents, professional homepages of researchers also typically include descriptions of research interests and other metadata that is crucial in tasks such as author disambiguation and profile extraction.

Despite several studies on homepage finding in the context of the general web, academic homepage finding is not fully addressed in existing research. The first question we address in this dissertation is: how can we acquire an accurate homepage collection? We study this question in two settings. First, we study academic homepage finding on the Web. Given the results of web search for a researcher name query, our goal is to identify the correct homepage in the set of pages retrieved from the Web. We design features based on insights from content analysis of known academic homepages to learn a ranking function for academic homepage retrieval. In the second setting, we address homepage finding on university department websites where academic homepages need to be discriminated from other kinds of academic webpages. Despite training the classifier on "outdated" webpage instances, we show that unlabeled data and multiple views of webpages can be used to adapt our classifier to current-day academic websites.

In the second part of this dissertation, we address expert search for academic domains. We study ranking models for researchers in response to topic and name queries. We use the content of research documents and the structural connections among documents to build query-dependent graphs for scoring researchers. We propose a simple extension to PageRank for combining evidence from multiple types of documents. This model scores researchers based on the structural connections among the documents and the importance of each document-type. In addition, we propose Author-Document-Topic graphs for scoring researchers based on the topical content of documents generated by them. Our models handle name and topic queries uniformly and show state-of-the-art retrieval performance on expert finding tasks.
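
The abstract does not give the formulation of the PageRank extension, so the following is only an illustrative sketch of one way document-type importance could enter a PageRank-style score, not the dissertation's actual model. It assumes a graph in which documents point to their authors, and it places teleport mass on documents in proportion to a per-type weight. The names `TYPE_WEIGHT`, `type_weighted_pagerank`, `rank_researchers`, and the example graph are all hypothetical, and the weight values are made up for illustration.

```python
from collections import defaultdict

# Hypothetical importance weights per document type (illustrative values only).
TYPE_WEIGHT = {"publication": 1.0, "homepage": 0.5}


def type_weighted_pagerank(edges, node_type, damping=0.85, iters=100):
    """Personalized PageRank by power iteration.

    Teleport mass is given only to document nodes, in proportion to the
    assumed weight of their type; researcher nodes receive none directly.
    Assumes at least one node has a type listed in TYPE_WEIGHT.
    """
    nodes = sorted(node_type)
    raw = {n: TYPE_WEIGHT.get(node_type[n], 0.0) for n in nodes}
    total = sum(raw.values())
    teleport = {n: raw[n] / total for n in nodes}

    out = defaultdict(list)
    for src, dst in edges:
        out[src].append(dst)

    score = dict(teleport)
    for _ in range(iters):
        # Mass held by nodes without out-links is redistributed via teleport.
        dangling = sum(score[n] for n in nodes if not out[n])
        nxt = {n: (1 - damping + damping * dangling) * teleport[n] for n in nodes}
        for src in nodes:
            for dst in out[src]:
                nxt[dst] += damping * score[src] / len(out[src])
        score = nxt
    return score


def rank_researchers(edges, node_type):
    """Return researcher nodes sorted by descending score."""
    score = type_weighted_pagerank(edges, node_type)
    researchers = [n for n, t in node_type.items() if t == "researcher"]
    return sorted(researchers, key=lambda n: score[n], reverse=True)


# Toy graph: each document links to its author.
EXAMPLE_TYPES = {
    "alice": "researcher",
    "bob": "researcher",
    "p1": "publication",
    "h1": "homepage",
}
EXAMPLE_EDGES = [("p1", "alice"), ("h1", "bob")]
```

In this toy graph the researcher supported by a publication ranks above the one supported only by a homepage, simply because the assumed publication weight is higher. A query-dependent variant could, under the same assumptions, scale each document's teleport mass by its relevance to the query.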

Bibliographic record

  • Author: Gollapalli, Sujatha Das
  • Author affiliation: The Pennsylvania State University
  • Degree-granting institution: The Pennsylvania State University
  • Subject: Information science; Computer engineering; Engineering
  • Degree: Ph.D.
  • Year: 2013
  • Pages: 109
  • Original format: PDF
  • Language: English
