A METHOD OF EXPLORING DATABASES OF TIME-STAMPED DATA IN ORDER TO DISCOVER DEPENDENCIES BETWEEN THE DATA AND PREDICT FUTURE TRENDS

Abstract

A method of exploring databases of time-stamped data in order to discover dependencies between the data and predict future trends, characterized in that Internet data ("soft data") collected from sources (websites, social networking sites, blogs, web forums, and words or phrases entered into web search engines) and statistical data ("hard data") available in different forms (lists, tables, reports) are converted into normalized time series and then associated with the predicted time series using a locality-sensitive hashing (LSH) algorithm based on projecting each series onto a basis of normalized Gaussian random walks of the same length. The keyword-based ("soft") data are transformed into time series by measuring the relative frequency of occurrence of each keyword in the source text, weighted by the relevance of the keyword. The N-bit LSH hashes are encoded into text strings consisting of N trigrams and stored in an indexed field of a relational database or search engine. The best series (predictors), found by searching the database for the most similar trigram-encoded hashes, are then used as inputs to a multiple regression algorithm that generates a forecast series for a specified number of future time intervals. During training, irrelevant input series are filtered out of the initial input set, so the remaining inputs are the relevant predictors of the target time series, revealing hidden dependencies between seemingly unrelated data, while the root-mean-square error from the training process indicates the uncertainty of the results.
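
The hashing and encoding steps described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not say how a bit becomes a trigram, so the token layout used here (two letters for the bit position plus the bit value) is an assumption, as are all function names, the zero-mean/unit-norm normalization, the seed, and the choice of 64 bits.

```python
import math
import random


def normalize(series):
    """Rescale a series to zero mean and unit Euclidean norm (assumed normalization)."""
    mean = sum(series) / len(series)
    centered = [x - mean for x in series]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        return [0.0] * len(series)  # constant series: nothing to project
    return [c / norm for c in centered]


def random_walk_basis(n_bits, length, seed=0):
    """Build n_bits normalized Gaussian random walks, each of the given length."""
    rng = random.Random(seed)
    basis = []
    for _ in range(n_bits):
        position, walk = 0.0, []
        for _ in range(length):
            position += rng.gauss(0.0, 1.0)  # cumulative sum of Gaussian steps
            walk.append(position)
        basis.append(normalize(walk))
    return basis


def lsh_bits(series, basis):
    """One hash bit per walk: 1 if the projection onto that walk is non-negative."""
    s = normalize(series)
    return [1 if sum(a * b for a, b in zip(s, w)) >= 0.0 else 0 for w in basis]


def encode_trigrams(bits):
    """Hypothetical encoding: each bit becomes a 3-character token, position + value."""
    if len(bits) > 26 * 26:
        raise ValueError("this toy encoding supports at most 676 bits")
    tokens = []
    for i, bit in enumerate(bits):
        hi, lo = divmod(i, 26)
        tokens.append(chr(97 + hi) + chr(97 + lo) + str(bit))
    return " ".join(tokens)


def shared_tokens(a, b):
    """Count tokens common to two encoded hashes, i.e. the number of matching bits."""
    return len(set(a.split()) & set(b.split()))


# Usage: a rescaled copy of a series should hash like the original,
# while its mirror image should share almost no tokens with it.
basis = random_walk_basis(n_bits=64, length=24)
target = [math.sin(t / 3.0) for t in range(24)]
h_target = encode_trigrams(lsh_bits(target, basis))
h_scaled = encode_trigrams(lsh_bits([5.0 * x + 2.0 for x in target], basis))
h_mirror = encode_trigrams(lsh_bits([-x for x in target], basis))
print(shared_tokens(h_target, h_scaled), shared_tokens(h_target, h_mirror))
```

Because every token carries its bit position, the number of tokens two strings share equals the number of hash bits on which the two series agree. An ordinary full-text index over the encoded field can therefore rank candidate predictors by approximate similarity without any custom distance function, which is presumably why the abstract stores the hashes as text.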

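Bibliographic details

  • Publication number: EP3493082A1
  • Publication date: 2019-06-05
  • Original format: PDF
  • Applicant/patentee: OKE POLAND SPOLKA Z O.O.
  • Application number: EP20170460071
  • Inventor: PYTLASINSKI ARTUR GRZEGORZ
  • Filing date: 2017-11-29
  • Classification: G06F17/30; G06Q10/04
  • Country: EP
  • Date added to database: 2022-08-21 12:26:13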
