Source: Canadian Journal of Electrical and Computer Engineering

CLASS: A general approach to classifying categorical sequences


Abstract

The rapid growth of data available as categorical sequences, such as biological sequences, natural language texts, and network and retail transactions, makes the classification of categorical sequences increasingly important. The main challenge is to identify the significant features hidden behind the chronological and structural dependencies that characterize their intrinsic properties. Almost all existing algorithms designed for this task match patterns in chronological order, yet categorical sequences often share similar features in non-chronological order. In addition, these algorithms have serious difficulty outperforming domain-specific algorithms. In this paper we propose CLASS, a general approach to the classification of categorical sequences. Using an effective matching scheme called SPM (Significant Patterns Matching), CLASS captures the intrinsic properties of categorical sequences. Furthermore, Latent Semantic Analysis captures semantic relations using global information extracted from a large number of sequences, rather than merely comparing pairs of sequences. CLASS also employs a classifier called SNN (Significant Nearest Neighbours), inspired by the K Nearest Neighbours approach but with a dynamic estimation of K, which reduces both false positives and false negatives in the classification. Extensive tests on a range of datasets from different fields show that CLASS is often competitive with domain-specific approaches.
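The abstract does not say how SNN estimates K, so the following is only a minimal sketch of the general idea of a nearest-neighbour vote with a dynamically chosen K. Everything in it is an assumption made for illustration: bigram cosine similarity stands in for SPM, the "largest gap" cut-off is a hypothetical rule for picking K, the Latent Semantic Analysis step is omitted, and the function names and toy data are invented.

```python
from collections import Counter
from math import sqrt


def ngram_counts(seq, n=2):
    """Count the contiguous n-grams of a categorical sequence."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def significant_neighbours(sims):
    """Keep the neighbours ranked above the largest similarity gap.

    Assumed heuristic for choosing K dynamically; not taken from the paper.
    """
    ranked = sorted(sims, key=lambda pair: pair[0], reverse=True)
    if len(ranked) < 2:
        return ranked
    gaps = [ranked[i][0] - ranked[i + 1][0] for i in range(len(ranked) - 1)]
    k = gaps.index(max(gaps)) + 1  # K is the position of the widest gap
    return ranked[:k]


def classify(query, training, n=2):
    """Majority vote among the dynamically selected neighbours."""
    q = ngram_counts(query, n)
    sims = [(cosine(q, ngram_counts(seq, n)), label) for seq, label in training]
    votes = Counter(label for _, label in significant_neighbours(sims))
    return votes.most_common(1)[0][0]


# Toy labelled sequences (invented for this sketch).
training = [
    ("ABABABAB", "alternating"),
    ("BABABABA", "alternating"),
    ("AAAABBBB", "blocked"),
    ("BBBBAAAA", "blocked"),
]
predicted = classify("ABABABBA", training)
```

In this toy run the two "alternating" sequences are far more similar to the query than the two "blocked" ones, so the gap rule keeps only those two neighbours and the vote is decided without any fixed K.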
