
Global Pattern Search at Scale


Abstract

In recent years, data collection has far outpaced the tools for data analysis in the area of non-traditional GEOINT analysis. Traditional tools are designed to analyze small-scale numerical data, but there are few good interactive tools for processing large amounts of unstructured data such as raw text. In addition to the complexities of data processing, presenting the data in a way that is meaningful to the end user poses another challenge. In our work, we focused on analyzing a corpus of 35,000 news articles and creating an interactive geovisualization tool to reveal patterns to human analysts. Our comprehensive tool, Global Pattern Search at Scale (GPSS), addresses three major problems in data analysis: free text analysis, high volumes of data, and interactive visualization. GPSS uses an Accumulo database for high-volume data storage, and a matrix of word counts and event detection algorithms to process the free text. For visualization, the tool displays an interactive web application to the user, featuring a map overlaid with document clusters and events, search and filtering options, a timeline, and a word cloud. In addition, the GPSS tool can be easily adapted to process and understand other large free-text datasets.
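
The abstract does not say how the matrix of word counts is built, so the following is only an illustrative sketch of the general idea: each row is a document, each column is a vocabulary term, and each cell holds how often that term occurs in that document. The names `tokenize` and `word_count_matrix`, the regex-based tokenizer, and the sample headlines are assumptions made for this example and are not taken from GPSS.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def word_count_matrix(docs):
    """Build a document-by-term count matrix.

    Returns (vocab, matrix), where vocab is a sorted list of terms and
    matrix[i][j] is the number of times vocab[j] occurs in docs[i].
    """
    counts = [Counter(tokenize(doc)) for doc in docs]
    # Union of all per-document vocabularies, sorted for a stable column order.
    vocab = sorted(set().union(*counts))
    # Counter returns 0 for terms a document does not contain.
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


if __name__ == "__main__":
    # Hypothetical headlines standing in for news articles.
    docs = ["Flood hits city", "City flood flood warning"]
    vocab, matrix = word_count_matrix(docs)
    print(vocab)
    for row in matrix:
        print(row)
```

For a corpus on the scale described here, a dense list of lists would be wasteful; a sparse representation would be the more likely choice, though the abstract does not state which one the authors used.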
