The Corpus Query Middleware of Tomorrow - A Proposal for a Hybrid Corpus Query Architecture

Abstract

The development of dozens of specialized corpus query systems and languages over the past decades has led to a diverse but fragmented landscape. Today we are faced with a plethora of query tools that each provide unique features but are not interoperable and often rely on very specific database back-ends or storage formats. This severely hampers usability, both for end users who want to query different corpora and for corpus designers who wish to provide users with an interface for querying and exploration. We propose a hybrid corpus query architecture as a first step toward overcoming this issue. It takes the form of a middleware system between user front-ends and optional database or text indexing solutions as back-ends. At its core is a custom query evaluation engine for index-less processing of corpus queries. With a flexible JSON-LD query protocol, the approach allows communication with back-end systems to partially solve queries and offset some of the performance penalties imposed by the custom evaluation engine. This paper outlines the details of our first draft of the aforementioned architecture.
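To make the hybrid idea in the abstract more concrete, the following is a minimal sketch, not the architecture from the paper. It assumes a toy in-memory corpus, a made-up JSON-LD-style payload (the `q:` vocabulary, its URL, and all field names are invented for illustration), and a hypothetical `InvertedIndexBackend` that can only partially solve a query. The index-less check then runs either on the back-end's candidates or, without a back-end, on the whole corpus.

```python
import json

# Hypothetical toy corpus: sentence id -> tokens.
CORPUS = {
    0: ["the", "corpus", "is", "large"],
    1: ["queries", "run", "slowly"],
    2: ["a", "corpus", "query", "engine"],
    3: ["query", "the", "corpus"],
}

# Hypothetical JSON-LD-style payload. The vocabulary URL and field names are
# assumptions for this sketch, not the query protocol defined in the paper.
QUERY = json.loads("""
{
  "@context": {"q": "http://example.org/corpus-query#"},
  "@type": "q:Sequence",
  "q:tokens": ["corpus", "query"]
}
""")


class InvertedIndexBackend:
    """Optional back-end that only knows which sentences contain a word."""

    def __init__(self, corpus):
        self.ids = set(corpus)
        self.index = {}
        for sid, tokens in corpus.items():
            for tok in tokens:
                self.index.setdefault(tok, set()).add(sid)

    def candidates(self, query):
        # Partial solution: sentences containing every query token, in any order.
        sets = [self.index.get(tok, set()) for tok in query["q:tokens"]]
        return set(self.ids).intersection(*sets)


def matches(tokens, pattern):
    """Index-less check: does `pattern` occur as a contiguous run in `tokens`?"""
    n = len(pattern)
    return any(tokens[i:i + n] == pattern for i in range(len(tokens) - n + 1))


def evaluate(corpus, query, backend=None):
    """Run the full check on back-end candidates, or on everything without one."""
    ids = backend.candidates(query) if backend is not None else corpus.keys()
    return sorted(sid for sid in ids if matches(corpus[sid], query["q:tokens"]))


if __name__ == "__main__":
    backend = InvertedIndexBackend(CORPUS)
    print("candidates from back-end:", sorted(backend.candidates(QUERY)))
    print("without back-end:", evaluate(CORPUS, QUERY))
    print("with back-end:   ", evaluate(CORPUS, QUERY, backend))
```

In this sketch the back-end narrows the search to the sentences containing both words, and the slower exact check only has to confirm word order on those candidates. Both paths return the same result, which is the property a hybrid design of this kind would need to preserve.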

Bibliographic record

Source: Workshop on Challenges in the Management of Large Corpora; Language Resources and Evaluation Conference, 2020, pp. 31-39 (9 pages)
Author: Markus Gartner
Original format: PDF
Keywords: corpus query system; query language; middleware

