A METHOD OF EXPLORING DATABASES OF TIME-STAMPED DATA IN ORDER TO DISCOVER DEPENDENCIES BETWEEN THE DATA AND PREDICT FUTURE TRENDS
Abstract
A method of exploring databases of time-stamped data in order to discover dependencies between the data and predict future trends, characterized in that internet data ("soft data"), collected from sources such as websites, social networking sites, blogs and web forums, as well as words or phrases entered into web search engines, and statistical data ("hard data") available in various forms (lists, tables, reports) are transformed (converted) into normalized time series and then associated (linked) with the time series to be predicted, using a locality-sensitive hashing (LSH) algorithm based on projecting each series onto a basis of normalized Gaussian random walks of the same length; the keyword-based ("soft") data are transformed into time series by measuring the relative frequency of occurrence of each keyword in the source text, weighted by the relevance of that keyword; each N-bit LSH hash is encoded as a text string of N trigrams and stored in an indexed field of a relational database or search engine; the best-matching series (predictors), found by searching the database for the most similar trigram-encoded hashes, are then used as inputs to a multiple regression algorithm that generates a forecast series for a specified number of future time intervals; during training, irrelevant input series are filtered out of the initial input set, so that the remaining inputs are the relevant predictors of the target time series, revealing hidden dependencies between seemingly unrelated data, while the root-mean-square error (RMSE) of the training process indicates the uncertainty of the results.
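The conversion of "soft" data into a time series can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes one (concatenated) text per time interval, single-word keywords, and a hypothetical `keyword_weights` mapping that supplies each keyword's relevance. The function name `keyword_time_series` is likewise invented for this example.

```python
import re
from collections import Counter


def keyword_time_series(texts_per_interval, keyword_weights):
    """Turn 'soft' text data into one value per time interval.

    texts_per_interval: list of strings, one text per interval (assumed layout).
    keyword_weights: hypothetical mapping keyword -> relevance weight.
    """
    series = []
    for text in texts_per_interval:
        words = re.findall(r"\w+", text.lower())
        total = len(words)
        counts = Counter(words)
        if total == 0:
            series.append(0.0)  # no text in this interval
            continue
        # Relative frequency of each keyword, weighted by its relevance.
        value = sum(w * counts[k.lower()] / total for k, w in keyword_weights.items())
        series.append(value)
    return series
```

For example, with weights `{"flu": 1.0, "fever": 0.5}`, the text "flu fever cough" contributes 1.0 × 1/3 + 0.5 × 1/3 = 0.5 for its interval. Multi-word phrases would need a phrase matcher instead of the single-token count used here.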
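The hashing and storage step might be sketched as below, assuming z-normalization (zero mean, unit standard deviation) as the normalization and the sign of each projection as the hash bit. The trigram scheme (two letters for the bit position followed by the bit value, e.g. `ab1`) is a hypothetical choice: it makes every bit position a distinct token, so the number of tokens two codes share in a full-text index equals N minus their Hamming distance. All function names are illustrative, not taken from the source.

```python
import math
import random
import string


def z_normalize(series):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        return [0.0] * n  # a constant series carries no shape information
    return [(x - mean) / std for x in series]


def random_walk_basis(n_bits, length, seed=0):
    """n_bits Gaussian random walks of the given length, each z-normalized."""
    rng = random.Random(seed)
    basis = []
    for _ in range(n_bits):
        walk, position = [], 0.0
        for _ in range(length):
            position += rng.gauss(0.0, 1.0)
            walk.append(position)
        basis.append(z_normalize(walk))
    return basis


def lsh_bits(series, basis):
    """One bit per basis walk: 1 if the projection (dot product) is non-negative."""
    x = z_normalize(series)
    if len(x) != len(basis[0]):
        raise ValueError("series and basis walks must have the same length")
    return [1 if sum(a * b for a, b in zip(x, walk)) >= 0 else 0 for walk in basis]


def encode_trigrams(bits):
    """Hypothetical encoding: two base-26 letters for the position, then the bit (N <= 676)."""
    letters = string.ascii_lowercase
    return " ".join(f"{letters[i // 26]}{letters[i % 26]}{bit}" for i, bit in enumerate(bits))


def shared_trigrams(code_a, code_b):
    """Shared tokens = equal bits = N - Hamming distance between the two hashes."""
    return len(set(code_a.split()) & set(code_b.split()))
```

Because the projection is taken on z-normalized series, a rescaled and shifted copy such as `[2 * v + 1 for v in target]` produces the same bits as `target`, while a sign-flipped copy produces the complementary bits. Ranking stored codes by `shared_trigrams` is what a search engine's term-overlap score would approximate.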
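The regression, filtering and forecasting stage could look like the following standard-library sketch. Several details are assumptions not stated in the abstract: the target is regressed on predictor values lagged by `horizon` intervals (so the last `horizon` predictor values yield `horizon` future target values), irrelevant inputs are removed by backward elimination while the training RMSE stays within a hypothetical tolerance `tol` of the full model, and ordinary least squares is solved through the normal equations. The names `solve`, `fit` and `select_and_forecast` are illustrative.

```python
import math


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(rows, y):
    """Ordinary least squares with an intercept; returns (coefficients, training RMSE)."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [t - sum(b * v for b, v in zip(beta, row)) for row, t in zip(X, y)]
    return beta, math.sqrt(sum(e * e for e in resid) / len(y))


def select_and_forecast(target, predictors, horizon, tol=1e-3):
    """predictors: dict name -> series aligned with target (hypothetical layout).

    Assumed model: target[t + horizon] ~ intercept + sum_k beta_k * predictors[k][t].
    Returns (selected predictor names, forecast of the next `horizon` values, training RMSE).
    """
    n = len(target)
    y = target[horizon:]

    def design(names, start, stop):
        return [[predictors[k][t] for k in names] for t in range(start, stop)]

    names = list(predictors)
    _, full_rmse = fit(design(names, 0, n - horizon), y)
    # Backward elimination: drop the input whose removal hurts the fit least,
    # as long as the training RMSE stays within `tol` of the full model.
    while len(names) > 1:
        trials = [(fit(design([o for o in names if o != k], 0, n - horizon), y)[1], k)
                  for k in names]
        best_rmse, weakest = min(trials)
        if best_rmse - full_rmse > tol:
            break
        names.remove(weakest)
    beta, rmse = fit(design(names, 0, n - horizon), y)
    # The last `horizon` predictor values map onto the next `horizon` target values.
    future_rows = design(names, n - horizon, n)
    forecast = [beta[0] + sum(b * v for b, v in zip(beta[1:], row)) for row in future_rows]
    return names, forecast, rmse
```

The returned RMSE plays the role the abstract gives it: a rough indicator of forecast uncertainty. The sketch needs more training rows (`len(target) - horizon`) than coefficients and linearly independent inputs; otherwise the normal equations are singular.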