Venue: Workshop on biomedical natural language processing

Effect of small sample size on text categorization with support vector machines

Abstract

Datasets that answer difficult clinical questions are expensive, in part because of the need for medical expertise and patient informed consent. We investigate the effect of small sample size on the performance of a text categorization algorithm and show how to determine whether a dataset is large enough to train support vector machines. Since it is not possible to cover all aspects of sample size calculation in one manuscript, we focus on how certain types of data relate to certain properties of support vector machines. We show that the normal vectors of decision hyperplanes can be used to assess reliability, and that internal cross-validation can be used to assess the stability of small-sample data.
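The abstract names two diagnostics without spelling out a procedure. As a minimal sketch, assuming a linear SVM and k-fold internal cross-validation, one could train one model per fold, read off each fold's hyperplane normal vector `w`, and compare the normals pairwise with cosine similarity (a hypothetical reliability score: normals pointing in similar directions suggest the boundary is not driven by which documents happened to be sampled), while the spread of per-fold accuracy serves as a stability score. This is not the authors' procedure: the Pegasos sub-gradient solver, the cosine-similarity summary, the function names (`train_linear_svm`, `cross_validate`, `toy_data`), and the synthetic term-count data are all illustrative assumptions.

```python
import math
import random
from statistics import mean, pstdev


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style sub-gradient descent on the L2-regularized hinge loss.

    X is a list of feature vectors, y a list of labels in {-1, +1}.
    Returns the hyperplane normal vector w. There is no separate bias,
    so append a constant feature to X if one is needed.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step-size schedule
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Shrink w (regularizer gradient); add the hinge term only if the margin is violated.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def predict(w, x):
    """Classify x by the side of the hyperplane w . x = 0 it falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0.0 else -1


def cross_validate(X, y, k=5, seed=0, **svm_kwargs):
    """Internal k-fold cross-validation (assumes k >= 2 and len(X) >= k).

    Returns hypothetical stability and reliability scores, not the paper's measures.
    """
    rng = random.Random(seed)
    order = list(range(len(X)))
    rng.shuffle(order)
    folds = [order[f::k] for f in range(k)]
    normals, accuracies = [], []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in order if i not in held]
        w = train_linear_svm([X[i] for i in train], [y[i] for i in train], **svm_kwargs)
        normals.append(w)
        hits = sum(1 for i in held_out if predict(w, X[i]) == y[i])
        accuracies.append(hits / len(held_out))
    # Pairwise agreement between the hyperplane normals learned on different folds.
    sims = [cosine(normals[a], normals[b]) for a in range(k) for b in range(a + 1, k)]
    return {
        "mean_accuracy": mean(accuracies),
        "accuracy_std": pstdev(accuracies),  # spread across folds: stability
        "mean_normal_cosine": mean(sims),  # normal agreement: reliability (assumed metric)
    }


def toy_data(n, seed=1):
    """Hypothetical term counts: term 0 signals +1, term 1 signals -1, terms 2-3 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        x = [rng.randint(0, 1), rng.randint(0, 1), rng.randint(0, 3), rng.randint(0, 3)]
        x[0 if label == 1 else 1] += 3
        X.append([float(v) for v in x] + [1.0])  # constant feature stands in for a bias
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = toy_data(40)
    print(cross_validate(X, y, k=5))
```

Pegasos was picked only because it fits in a few lines of standard-library Python; any linear SVM that exposes its weight vector could be substituted. To probe sample size, one would rerun `cross_validate` on progressively smaller subsets and watch how both scores change.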