Analysing Part-of-Speech for Portuguese Text Classification

Abstract

This paper proposes and evaluates the use of linguistic information in the pre-processing phase of text classification. We present several experiments evaluating term selection based on different measures and linguistic knowledge. To build the classifier we used Support Vector Machines (SVM), which are known to produce good results on text classification tasks. Our proposals were applied to two datasets written in Portuguese: articles from a Brazilian newspaper (Folha de São Paulo) and juridical documents from the Portuguese Attorney General's Office. The results show the relevance of part-of-speech information for the pre-processing phase of text classification, allowing a strong reduction in the number of features needed.
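As a rough illustration of the idea in the abstract, the sketch below filters terms by part-of-speech tag before building bag-of-words features and compares the resulting vocabulary sizes. It is not the authors' pipeline: the toy documents, the tag names (`DET`, `NOUN`, `VERB`, `ADJ`), and the helper functions `select_terms`, `build_vocabulary`, and `vectorize` are all assumptions made for illustration, and the SVM training step is left out.

```python
from collections import Counter

# Hypothetical hand-tagged toy documents: lists of (token, POS tag).
# The tagset is an illustrative assumption, not the one used in the paper.
DOCS = [
    [("o", "DET"), ("tribunal", "NOUN"), ("decidiu", "VERB"),
     ("o", "DET"), ("recurso", "NOUN")],
    [("a", "DET"), ("equipa", "NOUN"), ("venceu", "VERB"),
     ("o", "DET"), ("jogo", "NOUN"), ("difícil", "ADJ")],
]

ALL_TAGS = {"DET", "NOUN", "VERB", "ADJ"}
NOUNS_ONLY = {"NOUN"}


def select_terms(tagged_doc, keep_tags):
    """Return the lower-cased tokens whose POS tag is in keep_tags."""
    return [token.lower() for token, tag in tagged_doc if tag in keep_tags]


def build_vocabulary(docs, keep_tags):
    """Map each selected term to a feature index (alphabetical order)."""
    terms = sorted({t for doc in docs for t in select_terms(doc, keep_tags)})
    return {term: index for index, term in enumerate(terms)}


def vectorize(tagged_doc, vocab, keep_tags):
    """Turn one tagged document into a term-count vector over vocab."""
    vector = [0] * len(vocab)
    for term, count in Counter(select_terms(tagged_doc, keep_tags)).items():
        if term in vocab:
            vector[vocab[term]] = count
    return vector


full_vocab = build_vocabulary(DOCS, ALL_TAGS)
noun_vocab = build_vocabulary(DOCS, NOUNS_ONLY)

# Feature count with every tag kept versus nouns only.
print(len(full_vocab), len(noun_vocab))
print(vectorize(DOCS[0], noun_vocab, NOUNS_ONLY))
```

The count vectors produced this way could then be passed to any SVM implementation; which tags to keep is the kind of choice the experiments in the paper evaluate.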

Bibliographic details

Source: International Conference on Computational Linguistics and Intelligent Text Processing (CICLing 2006); 19-25 February 2006; Mexico City (MX) | 2006 | pp. 551-562 | 12 pages
Conference location: Mexico City (MX)
Authors: Teresa Goncalves; Cassiana Silva; Paulo Quaresma; Renata Vieira
Author affiliation: Dep. Informatica, Universidade de Evora, 7000 Evora, Portugal
Original format: PDF
Language of text: English
Chinese Library Classification: Programming languages, algorithmic languages

