Extracting Information From Heterogeneous Internet of Things Data Streams

Abstract

Recent advances in sensing and networking technologies, and in collecting real-world data on a large scale from various environments, have created an opportunity for new forms of services and applications. This is known under the umbrella term of the Internet of Things (IoT). Physical sensor devices constantly produce very large amounts of data. Methods are needed that give the raw sensor measurements a meaningful interpretation for building automated decision support systems. One of the main research challenges in this domain is to extract actionable information from real-world data, that is, information that can readily be used to make informed automatic decisions in intelligent systems. Most existing approaches are application or domain dependent, or can only deal with specific data sources of one kind.

This PhD research concerns multiple approaches for analysing IoT data streams. We propose a method that determines how many different clusters can be found in a stream based on the data distribution. After selecting the number of clusters, we use an online clustering mechanism to cluster the incoming data from the streams. Our approach remains adaptive to drifts by adjusting itself as the data changes. The work is benchmarked against state-of-the-art stream clustering algorithms on data streams with data drift. We show how our method can be applied in a use case scenario involving near real-time traffic data. Our results make it possible to cluster, label and interpret IoT data streams dynamically according to the data distribution. This enables large volumes of dynamic data to be processed adaptively online, based on the current situation. We show how our method adapts itself to the changes, and we demonstrate how the number of clusters in a real-world data stream can be determined by analysing the data distributions.

Using the ideas and concepts of this approach as a starting point, we designed another novel, dynamic and adaptable clustering approach that is more suitable for clustering multi-variate time-series data. Our solution uses probability distributions and analytical methods to adjust the centroids as the data and feature distributions change over time. We have evaluated our work against some well-known time-series clustering methods and have shown how the proposed method can reduce complexity and perform efficiently on multi-variate data streams.

Finally, we propose a method that uncovers hidden structures and relations between multiple IoT data streams. Our novel solution uses Latent Dirichlet Allocation (LDA), a topic extraction method that is generally used in text analysis. We apply LDA to meaningful labels that describe the numerical data in human-understandable terms. To create the labels we use Symbolic Aggregate approXimation (SAX), a method that converts raw data into string-based patterns. The extracted patterns are then transformed into the labels with a rule engine.

The work investigates how heterogeneous sensory data from multiple sources can be processed and analysed to create near real-time intelligence, and how our proposed method provides an efficient way to interpret patterns in the data streams. The proposed method provides a novel way to uncover the correlations and associations between different patterns in IoT data streams. The evaluation results show that the proposed solution is able to identify the correlations with high efficiency, with an F-measure of up to 90%.

Overall, this PhD research has designed, implemented and evaluated unsupervised adaptive algorithms to analyse, structure and extract information from dynamic and multi-variate sensory data streams. The results of this research have a significant impact on designing flexible and scalable solutions for analysing real-world sensory data streams, especially in cases where labelled and annotated data is not available or is too costly to collect. Research and advancements in healthcare and smarter cities are two key areas that can directly benefit from this research.
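
The SAX step mentioned in the abstract can be illustrated with a minimal Python sketch. It follows the standard SAX recipe (z-normalisation, piecewise aggregate approximation, discretisation against equiprobable Gaussian breakpoints) and is not taken from the thesis. The function names `sax` and `label`, the four-letter alphabet, and the toy labelling rules are assumptions made for illustration only; the abstract does not describe the rules used by the thesis's rule engine.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet="abcd"):
    """Convert a numeric series into a SAX word (illustrative sketch)."""
    if not 0 < n_segments <= len(series):
        raise ValueError("n_segments must be between 1 and len(series)")

    # 1. z-normalise; a constant series is mapped to all zeros.
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]

    # 2. Piecewise aggregate approximation: mean of each near-equal chunk.
    n = len(z)
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    paa = [mean(z[bounds[i]:bounds[i + 1]]) for i in range(n_segments)]

    # 3. Breakpoints that split N(0, 1) into equiprobable regions,
    #    one region per alphabet symbol.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]

    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa)


def label(word):
    """Map a SAX word to a readable label.

    Hypothetical rules for illustration; not the thesis's rule engine.
    """
    if len(set(word)) == 1:
        return "steady"
    if word == "".join(sorted(word)):
        return "rising"
    if word == "".join(sorted(word, reverse=True)):
        return "falling"
    return "fluctuating"


word = sax([1, 1, 1, 1, 10, 10, 10, 10], n_segments=2)
print(word, label(word))  # ad rising
```

Labels produced this way for several streams over the same time window could then be grouped into "documents" and passed to an LDA implementation, which is the role LDA plays in the method the abstract describes.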

Bibliographic record

  • Author: Puschmann, Daniel
  • Affiliation: University of Surrey (United Kingdom)
  • Degree-granting institution: University of Surrey (United Kingdom)
  • Subject: Computer engineering
  • Degree: Ph.D.
  • Year: 2018
  • Pages: 107 p.
  • Total pages: 107
  • Original format: PDF
  • Language of text: eng
