
Three Essays on Feature and Model Selection for Classification and Regression Problems


Abstract

This thesis comprises three essays on feature and model selection for classification and regression problems. The first essay focuses on selecting features for classification problems based on the notions of redundancy and complementarity. Redundancy and complementarity are the non-additive effects that result from feature interaction. While redundancy decreases the predictive power of a subset of features, complementarity leads to information gain and improves prediction. This essay examines how combining complementarity with relevance and redundancy can lead to superior prediction for classification problems. A filter-based feature selection heuristic is proposed that combines these three criteria using an adaptive multi-objective optimization framework. The heuristic is adaptive in the sense that it updates the relative trade-off between these criteria based on the redundancy-complementarity ratio of the candidate subset. The proposed heuristic differs from many existing methods in that it distinguishes redundancy from complementarity explicitly and does not penalize all dependencies. Using an empirical study, we show that this approach can yield superior classification performance compared to many existing feature selection methods.

The second essay extends this notion of non-additivity to the selection of interaction terms for linear regression problems. In a regression problem, an interaction effect is said to exist if the effect of one variable on the outcome depends on the value of another variable, called the moderator variable. The existing literature on interaction mostly uses sequential regression with regularization or penalty parameters to select relevant interaction effects. In this work, we examine the redundancy and complementarity that result from the correlation between the predictors. Although such synergy or redundancy does not statistically imply an interaction, we hypothesize that it is a potential indicator of the existence of an interaction effect. Based on this hypothesis, two methods of finding interaction terms are proposed. The proposed methods select an interaction term based on the principle of non-additivity. Using an empirical study, we show that the proposed methods can select relevant interaction effects relatively quickly and produce comparable prediction accuracy with a smaller number of features.

The last essay deals with a model selection problem in the context of securities class-action cases in the United States. Insurance companies that provide Directors and Officers (D&O) insurance coverage to public limited companies are highly sensitive to class-action litigation because of the high cost of settlements and legal fees. Predicting the probability of dismissal of a case early in the trial may give insurers a significant competitive advantage in deciding the appropriate course of action: whether to fight or settle. This essay looks into this problem and proposes a hybrid probability model that combines the best of two well-established methods used for prediction. Using past data on securities class-action cases filed in US federal courts between 2002 and 2010, we show that this probability model can predict the probability of dismissal of a case based on five important predictors. This model is useful for insurance companies that underwrite D&O policies.
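The distinction between redundancy and complementarity drawn in the first essay can be illustrated with a small information-theoretic sketch. The code below is not the heuristic proposed in the thesis; it assumes one common way to quantify non-additivity for discrete features, namely the difference between the joint mutual information I(X1,X2;Y) and the sum of the individual terms I(X1;Y) + I(X2;Y). The function names (`entropy`, `mutual_information`, `interaction_gain`) and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Empirical Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def interaction_gain(x1, x2, y):
    """Illustrative non-additivity measure (an assumption, not the thesis's criterion).

    Positive: the pair tells more about y than the two features do
    separately (complementarity). Negative: the features overlap in
    what they tell about y (redundancy).
    """
    pair = list(zip(x1, x2))
    return (mutual_information(pair, y)
            - mutual_information(x1, y)
            - mutual_information(x2, y))


# Toy data: two independent binary features.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]

# Complementary case: y is the XOR of the features, so neither feature
# is informative alone but together they determine y.
y_xor = [a ^ b for a, b in zip(x1, x2)]
complementary = interaction_gain(x1, x2, y_xor)

# Redundant case: the second feature is a copy of the first, and y equals it.
redundant = interaction_gain(x1, x1, x1)

print(f"XOR pair:       {complementary:+.2f} bits")
print(f"duplicate pair: {redundant:+.2f} bits")
```

In this sketch the XOR pair scores positive and the duplicated feature scores negative, which is the sense in which a selection criterion that penalizes every dependency between features would treat the two cases alike even though they affect prediction in opposite ways.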

Bibliographic record

  • Author

    Singha, Sumanta

  • Author affiliation

    University of Kansas

  • Degree-granting institution: University of Kansas
  • Subject: Operations research
  • Degree: Ph.D.
  • Year: 2018
  • Pages: 216 p.
  • Total pages: 216
  • Original format: PDF
  • Language: English
