Published in: International Journal of Intelligent Information and Database Systems

Improve feature selection method of web page language identification using fuzzy ARTMAP



Abstract

The amount of information available on the World Wide Web and in global information systems in languages other than English is increasing significantly. Different languages can be written in the same script; for example, Arabic, Persian, Urdu and Pashto all use Arabic script letters. The issue is how to produce reliable features from a web page that is to undergo language identification. Incorrectly identifying the language results in garbled translations as well as faulty and incomplete analyses. The aim of this study is to enhance the effectiveness of the feature selection method for web page language identification. We have investigated total N-grams, N-grams frequency, N-grams frequency document frequency, and N-grams frequency inverse document frequency for web page language identification. The experimental results show that N-grams frequency gives the most promising result compared with the other feature selection methods.
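
The abstract names four N-gram measures but does not define them. As a rough illustration only, the sketch below computes one plausible reading of each over character N-grams: the raw count, the relative frequency within a document, that frequency multiplied by the document frequency, and that frequency multiplied by the inverse document frequency. The function names, the use of character bigrams, and the exact weighting formulas are assumptions for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Return the list of overlapping character n-grams in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_features(docs, n=2):
    """Compute per-document n-gram measures for a small corpus.

    Returns one dict per document, mapping each n-gram to a tuple:
    (count, relative frequency, frequency * document frequency,
     frequency * inverse document frequency).
    These formulas are assumed definitions, not the paper's.
    """
    counts = [Counter(char_ngrams(d, n)) for d in docs]

    # Document frequency: number of documents containing the n-gram.
    df = Counter()
    for c in counts:
        df.update(c.keys())

    num_docs = len(docs)
    features = []
    for c in counts:
        total = sum(c.values())  # total n-grams in this document
        doc_feats = {}
        for gram, k in c.items():
            freq = k / total
            idf = math.log(num_docs / df[gram])
            doc_feats[gram] = (k, freq, freq * df[gram], freq * idf)
        features.append(doc_feats)
    return features


if __name__ == "__main__":
    for feats in ngram_features(["abab", "abcd"], n=2):
        print(feats)
```

In a language identification setting, vectors built from one of these measures would be the input to a classifier such as the fuzzy ARTMAP network named in the title; the classifier itself is outside the scope of this sketch.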
