
Sublanguage Corpus Analysis Toolkit: A tool for assessing the representativeness and sublanguage characteristics of corpora


Abstract

Sublanguages are varieties of language that form “subsets” of the general language, typically exhibiting particular types of lexical, semantic, and other restrictions and deviance. SubCAT, the Sublanguage Corpus Analysis Toolkit, assesses the representativeness and closure properties of corpora to analyze the extent to which they are either sublanguages or representative samples of the general language. The current version of SubCAT contains scripts and applications for assessing lexical closure, morphological closure, sentence type closure, over-represented words, and syntactic deviance. Its operation is illustrated with three case studies concerning scientific journal articles, patents, and clinical records. Materials from two language families are analyzed: English (Germanic) and Bulgarian (Slavic). The software is available at sublanguage.sourceforge.net under a liberal Open Source license.
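To make the idea of lexical closure concrete, here is a minimal sketch in Python. It is not SubCAT's own code: the function name `closure_curve`, the `step` parameter, and the two toy token lists are assumptions made purely for illustration. The sketch assumes the usual reading of lexical closure, namely that in a sublanguage the number of distinct word types levels off as more tokens are sampled, whereas in general language it keeps growing.

```python
from typing import Iterable, List, Tuple


def closure_curve(tokens: Iterable[str], step: int = 8) -> List[Tuple[int, int]]:
    """Return (tokens_seen, distinct_types_seen) pairs sampled every `step` tokens.

    Hypothetical helper for illustration only; not part of SubCAT.
    """
    seen = set()
    curve = []
    count = 0
    for count, token in enumerate(tokens, start=1):
        seen.add(token.lower())  # case-fold so "Patient" and "patient" are one type
        if count % step == 0:
            curve.append((count, len(seen)))
    # Record the final point when the corpus length is not a multiple of `step`.
    if count and count % step != 0:
        curve.append((count, len(seen)))
    return curve


# Toy data (invented): a repetitive, restricted text versus a varied one.
restricted = ("patient denies pain . patient denies fever . " * 3).split()
varied = ("the quick brown fox jumps over a lazy dog while "
          "seven owls watch from tall pines nearby").split()

for name, toks in (("restricted", restricted), ("varied", varied)):
    print(name, closure_curve(toks, step=8))
```

On the restricted toy text the type count stops growing after the first sample, which is the flattening curve one would expect of a sublanguage; on the varied text every new token is a new type. Real corpora would need proper tokenization and far larger samples before such a curve says anything reliable.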
