Using web archives to enrich the live web experience through storytelling.

Abstract

Much of our cultural discourse occurs primarily on the Web. Thus, Web preservation is a fundamental precondition for multiple disciplines. Archiving Web pages into themed collections is a method for ensuring these resources are available for posterity. Services such as Archive-It exist to allow institutions to develop, curate, and preserve collections of Web resources. Understanding the contents and boundaries of these archived collections is a challenge for most people, resulting in a paradox: the larger the collection, the harder it is to understand. Meanwhile, as the sheer volume of data on the Web grows, "storytelling" is becoming a popular technique in social media for selecting Web resources to support a particular narrative or "story".

In this dissertation, we address the problem of understanding archived collections by proposing the Dark and Stormy Archive (DSA) framework, in which we integrate "storytelling" social media and Web archives. In the DSA framework, we identify, evaluate, and select candidate Web pages from archived collections that summarize the holdings of these collections, arrange them in chronological order, and then visualize these pages using tools that users are already familiar with, such as Storify.

To inform our work of generating stories from archived collections, we start by building a baseline for the structural characteristics of popular (i.e., receiving the most views) human-generated stories by investigating stories from Storify. Furthermore, we examined the entire population of Archive-It collections to better understand the characteristics of the collections we intend to summarize. We then filter off-topic pages from the collections, using different methods to detect when an archived page in a collection has gone off-topic. We created a gold standard dataset from three Archive-It collections to evaluate the proposed methods at different thresholds. From the gold standard dataset, we identified five behaviors for TimeMaps (a TimeMap is a list of archived copies of a page) based on the page's aboutness. Based on a dynamic slicing algorithm, we divide the collection and cluster the pages in each slice. We then select the best representative page from each cluster based on different quality metrics (e.g., the replay quality and the quality of the snippet generated from the page). Finally, we put the selected pages in chronological order and visualize them using Storify.

To evaluate the DSA framework, we obtained a ground truth dataset of hand-crafted stories from Archive-It collections generated by expert archivists. We used Amazon's Mechanical Turk to evaluate the automatically generated stories against the stories created by domain experts. The results show that the stories automatically generated by the DSA are indistinguishable from those created by human domain experts, while both kinds of stories (automatic and human) are easily distinguished from randomly generated stories.
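
The pipeline described in the abstract (filter off-topic captures, slice the collection, cluster each slice, pick one representative per cluster, order chronologically) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the dissertation: the `Memento` record and its single `quality` score, cosine similarity against the first capture as the off-topic test, equal-size time slices standing in for the dynamic slicing algorithm, greedy similarity grouping standing in for the clustering step, and both threshold values.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
import math


@dataclass
class Memento:
    """One archived capture of a page (hypothetical record layout)."""
    uri: str
    captured: datetime
    text: str
    quality: float  # assumed single score combining replay and snippet quality


def cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words term counts of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def filter_off_topic(timemap, threshold=0.15):
    """Keep captures that still resemble the first capture in the TimeMap.

    Assumption: similarity to the first capture is used as the on-topic test.
    """
    ordered = sorted(timemap, key=lambda m: m.captured)
    if not ordered:
        return []
    first = ordered[0]
    return [m for m in ordered if cosine(first.text, m.text) >= threshold]


def slice_by_time(mementos, n_slices):
    """Split captures into equal-size chronological slices (a stand-in)."""
    ordered = sorted(mementos, key=lambda m: m.captured)
    size = math.ceil(len(ordered) / n_slices)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def cluster(mementos, threshold=0.5):
    """Greedy grouping: join the first cluster whose seed is similar enough."""
    clusters = []
    for m in mementos:
        for group in clusters:
            if cosine(group[0].text, m.text) >= threshold:
                group.append(m)
                break
        else:
            clusters.append([m])
    return clusters


def build_story(timemaps, n_slices=1):
    """Return one representative capture per cluster, in chronological order."""
    on_topic = [m for tm in timemaps for m in filter_off_topic(tm)]
    if not on_topic:
        return []
    story = []
    for time_slice in slice_by_time(on_topic, n_slices):
        for group in cluster(time_slice):
            story.append(max(group, key=lambda m: m.quality))
    return sorted(story, key=lambda m: m.captured)
```

Passing `build_story` a list of TimeMaps (each a list of `Memento` records for one page) yields the selected captures ordered by capture time, which is the form a story-visualization tool would consume. The single `quality` field is a simplification; a fuller version would score replay quality and snippet quality separately.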

Bibliographic record

  • Author: AlNoamany, Yasmin
  • Author affiliation: Old Dominion University
  • Degree-granting institution: Old Dominion University
  • Subject: Computer science; Web studies; Information science
  • Degree: Ph.D.
  • Year: 2016
  • Pages: 257
  • Original format: PDF
  • Language: English (eng)
  • Chinese Library Classification: Paleontology
