International Conference on Scientometrics and Informetrics; 200308; Beijing (CN)

New classification quality estimators for analysis of documentary information: application to patent analysis and web mapping

Abstract

The information analysis process includes a cluster analysis or classification step associated with an expert validation of the results. In this paper, we propose new measures for estimating the quality of cluster analysis. These measures derive from Galois lattice theory and from the Information Retrieval (IR) domain. As opposed to classical measures of inertia, their main advantages are that they are independent both of the classification method and of the difference between the intrinsic dimensions of the data and those of the clusters. We present two experiments using the MultiSOM model, an extension of Kohonen's SOM model, as the cluster analysis method. Our first experiment, on patent data, shows how such measures can be used to compare a viewpoint-oriented classification method, like MultiSOM, with a global cluster analysis approach, like Kohonen's SOM. Our second experiment, which is part of the EISCTES EEC project, highlights that break-even points between our different measures of Recall/Precision can be used to determine an optimal number of clusters for Web data classification. The contents of the clusters obtained when using different break-even points are compared in order to study the quality of the resulting maps. This optimisation seems to be mandatory when one wants to classify documents issued from the Web, where sparseness is usually a blocking factor.
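The break-even idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define the Recall and Precision measures themselves, so the code below assumes they have already been computed for each candidate number of clusters and simply picks the candidate where the two values are closest. The function name `break_even_cluster_count` and all numeric values are hypothetical placeholders, not the authors' method or results.

```python
from typing import Sequence


def break_even_cluster_count(
    cluster_counts: Sequence[int],
    recall: Sequence[float],
    precision: Sequence[float],
) -> int:
    """Return the candidate cluster count where recall and precision are closest.

    Assumption: recall[i] and precision[i] were measured on the clustering
    that uses cluster_counts[i] clusters. How they are measured is outside
    the scope of this sketch.
    """
    if not (len(cluster_counts) == len(recall) == len(precision)):
        raise ValueError("all three sequences must have the same length")
    if not cluster_counts:
        raise ValueError("at least one candidate is required")
    # Absolute gap between the two quality measures for each candidate.
    gaps = [abs(r - p) for r, p in zip(recall, precision)]
    # Index of the smallest gap; ties resolve to the first candidate.
    best_index = min(range(len(gaps)), key=gaps.__getitem__)
    return cluster_counts[best_index]


# Placeholder values invented for illustration only (not from the paper).
candidate_counts = [16, 36, 64, 100]
recall_values = [0.90, 0.75, 0.60, 0.40]
precision_values = [0.30, 0.55, 0.62, 0.80]

best = break_even_cluster_count(candidate_counts, recall_values, precision_values)
print(best)
```

With these placeholder inputs the gap is smallest for the third candidate, so the sketch selects that map size. A real analysis would replace the invented lists with measured values for each clustering.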
