Database: The Journal of Biological Databases and Curation

Query expansion using MeSH terms for dataset retrieval: OHSU at the bioCADDIE 2016 dataset retrieval challenge


Abstract

Scientific data are being generated at an ever-increasing rate. The Biomedical and Healthcare Data Discovery Index Ecosystem (bioCADDIE) is an NIH-funded Data Discovery Index that aims to provide a platform for researchers to locate, retrieve, and share research datasets. The bioCADDIE 2016 Dataset Retrieval Challenge was held to identify the most effective dataset retrieval methods. We aimed to assess the value of Medical Subject Heading (MeSH) term-based query expansion for improving retrieval. Our system, based on the open-source search engine Elasticsearch, expands queries by identifying synonyms from the MeSH vocabulary and adding these to the original query. The number and relative weighting of MeSH terms are variable. The top 1000 search results for the 15 challenge queries were submitted for evaluation. After the challenge, we performed additional runs to determine the optimal number of MeSH terms and weighting. Our best overall score used five MeSH terms with a 1:5 terms:words weighting ratio, achieving an inferred normalized discounted cumulative gain (infNDCG) of 0.445, which was the third highest score among the 10 research groups that participated in the challenge. Further testing revealed that our initial combination of MeSH terms and weighting yielded the best overall performance. Scores varied considerably between queries, as well as across different numbers of MeSH terms and weights. Query expansion using MeSH terms can enhance the search relevance of biomedical datasets. High variability between queries and system variables suggests room for improvement and directions for further research.

Database URL:
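The expansion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synonym table `MESH_SYNONYMS`, the function `expand_query`, and the index field name `text` are all hypothetical, and the sketch only builds an Elasticsearch-style request body as a plain dictionary without contacting a server. It assumes the 1:5 terms:words ratio means that each added MeSH term gets boost 1 while the original query words get boost 5.

```python
from typing import Dict, List

# Hypothetical stand-in for a MeSH synonym lookup; the entries are illustrative only.
MESH_SYNONYMS: Dict[str, List[str]] = {
    "cancer": ["Neoplasms", "Tumors", "Malignancy"],
}


def expand_query(
    query: str,
    max_terms: int = 5,       # number of MeSH terms to add (five in the best run)
    term_boost: float = 1.0,  # weight of each added MeSH term
    word_boost: float = 5.0,  # weight of the original query words
    field: str = "text",      # hypothetical name of the indexed text field
) -> dict:
    """Build an Elasticsearch-style bool query with weighted MeSH expansions."""
    expansions: List[str] = []
    for word in query.lower().split():
        for synonym in MESH_SYNONYMS.get(word, []):
            if synonym not in expansions:
                expansions.append(synonym)
    expansions = expansions[:max_terms]

    # The original query is one clause; each expansion term is a separate phrase clause.
    should = [{"match": {field: {"query": query, "boost": word_boost}}}]
    should += [
        {"match_phrase": {field: {"query": term, "boost": term_boost}}}
        for term in expansions
    ]
    # size=1000 mirrors the top 1000 results submitted per challenge query.
    return {"size": 1000, "query": {"bool": {"should": should}}}


if __name__ == "__main__":
    import json

    print(json.dumps(expand_query("cancer gene expression"), indent=2))
```

Keeping `max_terms`, `term_boost`, and `word_boost` as parameters reflects the abstract's point that the number and relative weighting of MeSH terms are the variables tuned in the post-challenge runs.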
