
Unstructured data extraction in distributed NoSQL


Abstract

Although “big data” has made voluminous data easy to access, it also brings challenges. Existing Knowledge Discovery in Databases (KDD) processes, which were proposed for schema-oriented data sources, are no longer applicable because today's data is unstructured. Previously, we deployed a tool called TouchR, which relies on a Hidden Markov Model (HMM) to extract terms from unstructured data sources (specifically, NoSQL databases). This paper advances the initially deployed version by introducing a reusable dictionary and association rules to improve the quality of the extracted terms. In addition, the tool in its present stage adapts better to user searches, based on the most frequently searched terms.
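To make the association-rule idea concrete, the following is a minimal sketch of how rules between extracted terms could be scored by support and confidence. It is not TouchR's implementation: the function name `mine_pair_rules`, the thresholds, and the sample term sets are all hypothetical, and the sketch only considers rules between pairs of terms.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(term_sets, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for term pairs.

    term_sets: one collection of extracted terms per document (hypothetical input).
    support    = fraction of documents containing both terms
    confidence = documents containing both / documents containing the antecedent
    """
    n = len(term_sets)
    if n == 0:
        return []

    item_counts = Counter()
    pair_counts = Counter()
    for terms in term_sets:
        unique = sorted(set(terms))  # count each term once per document
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Evaluate the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))

    # Strongest rules first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


# Hypothetical terms extracted from four documents.
docs = [
    {"nosql", "document", "index"},
    {"nosql", "document"},
    {"nosql", "replica"},
    {"document", "index"},
]
rules = mine_pair_rules(docs)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}  support={support:.2f}  confidence={confidence:.2f}")
```

In this toy data, every document that mentions "index" also mentions "document", so that rule ranks first. A rule of this kind is one way a dictionary of extracted terms could be filtered or expanded, under the assumptions stated above.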
