Journal: International Journal on Digital Libraries

Multilabel graph-based classification for missing labels


Abstract

Assigning multiple labels to digital data is becoming easier because it can be done collaboratively by Internet users. The process remains challenging, however, especially when several labels are assigned to each datum, because some suitable labels may be missed. Missing labels lead to inaccurate classification. In this study, we propose a novel graph-based multi-label classifier that stably achieves high accuracy even when labels are missing from the training data. The core step of the algorithm smooths the label values of each training datum using its top-k most similar data: their label values are propagated and averaged to generate values for the missing labels. In experimental evaluations, we used multi-labeled document and image datasets and measured micro-averaged F-scores for eight classifiers. As correct labels were incrementally removed from the two datasets, the proposed algorithm tended to maintain its F-scores, whereas the scores of the other classifiers decreased. In addition, we evaluated the algorithm on Wikipedia, a real dataset that includes missing labels, to determine how well it predicts the correct labels and how useful its predictions are as initial decisions for manual annotation. We confirmed that LPAC is useful not only for automatic annotation but also for facilitating decision making in the initial manual category assignment.
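The smoothing idea described in the abstract can be illustrated with a small sketch. This is not the authors' LPAC implementation: the cosine similarity, the choice to include each item's own labels in the average, the 0.5 threshold, the toy data, and all function names (`cosine`, `smooth_labels`, `micro_f1`) are assumptions made for illustration only. The sketch averages each item's label vector with those of its top-k most similar items and then compares micro-averaged F-scores before and after smoothing.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def smooth_labels(features, labels, k=2):
    """Average each item's labels with those of its top-k most similar items."""
    n = len(features)
    smoothed = []
    for i in range(n):
        # Rank all other items by similarity to item i and keep the top k.
        neighbors = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: cosine(features[i], features[j]),
            reverse=True,
        )[:k]
        group = [i] + neighbors  # assumption: the item's own labels count too
        smoothed.append(
            [sum(labels[j][c] for j in group) / len(group)
             for c in range(len(labels[i]))]
        )
    return smoothed


def micro_f1(true, pred):
    """Micro-averaged F-score: pool TP, FP, FN over all items and labels."""
    tp = fp = fn = 0
    for t_row, p_row in zip(true, pred):
        for t, p in zip(t_row, p_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (hypothetical): five items, two features, two labels.
features = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.0, 1.0], [0.1, 0.9]]
true_labels = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1]]
# Observed training labels: item 2 has lost its first label.
observed = [[1, 0], [1, 0], [0, 0], [0, 1], [0, 1]]

smoothed = smooth_labels(features, observed, k=2)
# Assumed decision rule: a label is assigned when its smoothed value is >= 0.5.
restored = [[1 if v >= 0.5 else 0 for v in row] for row in smoothed]

f1_before = micro_f1(true_labels, observed)
f1_after = micro_f1(true_labels, restored)
```

In this toy case the missing label of item 2 receives a smoothed value of 2/3 from its two nearest neighbors and is recovered after thresholding. Real data would need a tuned k and threshold, and the paper's graph construction and propagation may differ from this sketch.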
