Taming the metadata mess

Abstract

The rapid growth of scientific data shows no sign of abating. This growth has led to a new problem: with so much scientific data at hand, stored in thousands of datasets, how can scientists find the datasets most relevant to their research interests? We have addressed this problem by adapting Information Retrieval techniques, developed for searching text documents, to the world of (primarily numeric) scientific data. We propose an approach that uses a blend of automated and “semi-curated” methods to extract metadata from large archives of scientific data, then evaluates ranked searches over this metadata. We describe a challenge identified during an implementation of our approach: the large and expanding list of environmental variables captured by the archive does not match the list of environmental variables in the minds of the scientists. We briefly characterize the problem and describe our initial thoughts on resolving it.
