Microarray technology produces massive amounts of high-dimensional gene expression data in a single experiment, yet the functions of most of the genes measured remain unknown. Computational methods that can cope with the high dimensionality and small sample sizes of such high-throughput data are therefore valuable in systems biology. Self-supervised learning techniques, which train classifiers on a mixture of labeled and unlabeled data, can address this problem efficiently. Discriminant-EM (DEM) provides a framework for such tasks by applying self-supervised learning in an optimal discriminating subspace of the original feature space. In this paper, the linear DEM algorithm is extended to a nonlinear kernel algorithm in order to capture nonlinearity in the data distribution. Extensive experiments on the Plasmodium falciparum dataset show promising performance for the approach.