【24h】

Combinatorial Optimization Algorithms to Mine a Sub-Matrix of Maximal Sum

机译:挖掘最大和子矩阵的组合优化算法

获取原文

摘要

Biclustering techniques have been widely used to identify homogeneous subgroups within large data matrices, such as subsets of genes similarly expressed across subsets of patients. Mining a max-sum sub-matrix is a related but distinct problem for which one looks for a (non-necessarily contiguous) rectangular sub-matrix with a maximal sum of its entries. Le Van et al. [7] already illustrated its applicability to gene expression analysis and addressed it with a constraint programming (CP) approach combined with large neighborhood search (LNS). In this work, we exhibit some key properties of this NP-hard problem and define a bounding function such that larger problems can be solved in reasonable time. The use of these properties results in an improved CP-LNS implementation evaluated here. Two additional algorithms are also proposed in order to exploit the highlighted characteristics of the problem: a CP approach with a global constraint (CPGC) and a mixed integer linear programming (MILP). Practical experiments conducted both on synthetic and real gene expression data exhibit the characteristics of these approaches and their relative benefits over the CP-LNS method. Overall, the CPGC approach tends to be the fastest to produce a good solution. Yet, the MILP formulation is arguably the easiest to formulate and can also be competitive.
机译:比对技术已被广泛用于识别大型数据矩阵中的同类子组,例如在患者子集之间相似表达的基因子集。挖掘最大总和子矩阵是一个相关但独特的问题,对于该问题,人们会寻找一个(无必要连续的)矩形子矩阵及其最大条目数之和。 Le Van等。 [7]已经说明了它在基因表达分析中的适用性,并通过结合大邻域搜索(LNS)的约束编程(CP)方法解决了该问题。在这项工作中,我们展示了此NP难题的一些关键特性,并定义了边界函数,以便可以在合理的时间内解决较大的问题。这些属性的使用导致此处评估了改进的CP-LNS实现。为了利用问题的突出特征,还提出了另外两种算法:具有全局约束的CP方法(CPGC)和混合整数线性规划(MILP)。在合成和真实基因表达数据上进行的实际实验展示了这些方法的特点及其相对于CP-LNS方法的相对优势。总体而言,CPGC方法往往是产生良好解决方案的最快方法。但是,可以说,MILP配方最容易配制,也具有竞争力。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号