
Analysing Part-of-Speech for Portuguese Text Classification


Abstract

This paper proposes and evaluates the use of linguistic information in the pre-processing phase of text classification. We present several experiments that evaluate term selection based on different measures and on linguistic knowledge. To build the classifier we used Support Vector Machines (SVM), which are known to produce good results on text classification tasks. Our proposals were applied to two datasets written in Portuguese: articles from a Brazilian newspaper (Folha de Sao Paulo) and juridical documents from the Portuguese Attorney General's Office. The results show that part-of-speech information is relevant to the pre-processing phase of text classification: it allows a strong reduction in the number of features the classifier needs.
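The kind of part-of-speech-based term selection the abstract describes can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy sentence, the hand-assigned tags (written in a Universal-POS-like style), the helper names `select_terms` and `bag_of_words`, and the choice to keep nouns and verbs. The abstract does not say which tagger, tag set, or word classes the authors used, and the SVM training step is omitted here.

```python
from collections import Counter

# Hypothetical, hand-tagged toy input: (token, POS tag) pairs.
# A real pipeline would obtain the tags from a Portuguese POS tagger.
TAGGED_DOC = [
    ("o", "DET"), ("tribunal", "NOUN"), ("decidiu", "VERB"),
    ("sobre", "ADP"), ("o", "DET"), ("recurso", "NOUN"),
    ("do", "ADP"), ("tribunal", "NOUN"),
]


def select_terms(tagged_tokens, keep_tags):
    """Return the lower-cased tokens whose POS tag is in keep_tags."""
    return [token.lower() for token, tag in tagged_tokens if tag in keep_tags]


def bag_of_words(terms):
    """Count term occurrences; each distinct term is one feature."""
    return Counter(terms)


# Baseline: keep every word class that occurs in the toy document.
all_features = bag_of_words(
    select_terms(TAGGED_DOC, {"DET", "NOUN", "VERB", "ADP"})
)

# POS-based selection (assumed choice): keep only nouns and verbs.
noun_verb_features = bag_of_words(select_terms(TAGGED_DOC, {"NOUN", "VERB"}))

print(len(all_features), "features without POS filtering")
print(len(noun_verb_features), "features keeping only nouns and verbs")
```

In a full experiment, count vectors like these would be built for every document and passed to an SVM; the sketch only shows how filtering by word class shrinks the feature set before that step.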
