Multifaceted geotagging for streaming news.

Abstract

News sources on the Web generate constant streams of information, describing the events that shape our world. In particular, geography plays a key role in the news, and understanding the geographic information present in news allows for its useful spatial browsing and retrieval. This process of understanding is called geotagging, and involves first finding in the document all textual references to geographic locations, known as toponyms, and second, assigning the correct lat/long values to each toponym, steps which are termed toponym recognition and toponym resolution, respectively. These steps are difficult due to ambiguities in natural language: some toponyms share names with non-location entities, and further, a given toponym can have many location interpretations. Removing these ambiguities is crucial for successful geotagging.

To this end, geotagging methods are described which were developed for streaming news. First, a spatio-textual search engine named STEWARD, and an interactive map-based news browsing system named NewsStand are described, which feature geotaggers as central components, and served as motivating systems and experimental testbeds for developing geotagging methods. Next, a geotagging methodology is presented that follows a multifaceted approach involving a variety of techniques. First, a multifaceted toponym recognition process is described that uses both rule-based and machine learning-based methods to ensure high toponym recall. Next, various forms of toponym resolution evidence are explored. One such type of evidence is lists of toponyms, termed comma groups, whose toponyms share a common thread in their geographic properties that enables correct resolution. In addition to explicit evidence, authors take advantage of the implicit geographic knowledge of their audiences. Understanding the local places known by an audience, termed its local lexicon, affords great performance gains when geotagging articles from local newspapers, which account for the vast majority of news on the Web. Finally, considering windows of text of varying size around each toponym, termed adaptive context, allows for a tradeoff between geotagging execution speed and toponym resolution accuracy.

Extensive experimental evaluations of all the above methods, using existing and two newly created, large corpora of streaming news, show great performance gains over several competing prominent geotagging methods.
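
The two steps named in the abstract, toponym recognition and toponym resolution, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Place` type, the tiny hand-made `GAZETTEER` (with approximate coordinates), and the function names are hypothetical, and the shared-container vote is only a loose stand-in for the comma-group idea, not the method developed in the thesis.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    name: str
    container: str  # enclosing region (hypothetical field)
    lat: float
    lon: float


# Toy gazetteer (assumption): each name maps to its candidate interpretations,
# listed with a "default" sense first. Coordinates are approximate.
GAZETTEER = {
    "Paris": [Place("Paris", "France", 48.86, 2.35),
              Place("Paris", "Texas", 33.66, -95.56)],
    "London": [Place("London", "England", 51.51, -0.13),
               Place("London", "Ontario", 42.98, -81.25)],
    "Dallas": [Place("Dallas", "Texas", 32.78, -96.80)],
    "Austin": [Place("Austin", "Texas", 30.27, -97.74)],
}

_NAME_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, GAZETTEER)) + r")\b"
)


def recognize(text):
    """Toponym recognition: gazetteer names found in the text, in order."""
    return [m.group(1) for m in _NAME_PATTERN.finditer(text)]


def resolve(toponyms):
    """Toponym resolution: prefer the interpretation whose container is
    shared by the most toponyms in the same document."""
    votes = Counter()
    for name in set(toponyms):
        for container in {p.container for p in GAZETTEER[name]}:
            votes[container] += 1
    # max() keeps the first candidate on ties, i.e. the default sense.
    return {
        name: max(GAZETTEER[name], key=lambda p: votes[p.container])
        for name in toponyms
    }


def geotag(text):
    """Run recognition, then resolution."""
    return resolve(recognize(text))


if __name__ == "__main__":
    for text in ("Officials in Dallas, Austin, and Paris met on Tuesday.",
                 "Talks resumed in Paris."):
        for name, place in geotag(text).items():
            print(name, "->", place.container, (place.lat, place.lon))
```

In this sketch, "Paris" listed alongside "Dallas" and "Austin" resolves to the Texas interpretation because all three names share that container, while "Paris" on its own falls back to the first-listed interpretation. A real geotagger would also have to reject names that refer to non-location entities, which this lookup-only recognizer does not attempt.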

Bibliographic details

  • Author: Lieberman, Michael David
  • Author affiliation: University of Maryland, College Park
  • Degree-granting institution: University of Maryland, College Park
  • Subject: Computer science
  • Degree: Ph.D.
  • Year: 2012
  • Pages: 275 p.
  • Total pages: 275
  • Original format: PDF
  • Language: English