Journal: Briefings in Bioinformatics

iLearn: an integrated platform and meta-learner for feature engineering, machine-learning analysis and modeling of DNA, RNA and protein sequence data



Abstract

With the explosive growth of biological sequences generated in the post-genomic era, one of the most challenging problems in bioinformatics and computational biology is to computationally characterize sequences, structures and functions in an efficient, accurate and high-throughput manner. A number of online web servers and stand-alone tools have been developed to address this to date; however, all these tools have their limitations and drawbacks in terms of their effectiveness, user-friendliness and capacity. Here, we present iLearn, a comprehensive and versatile Python-based toolkit, integrating the functionality of feature extraction, clustering, normalization, selection, dimensionality reduction, predictor construction, best descriptor/model selection, ensemble learning and results visualization for DNA, RNA and protein sequences. iLearn was designed for users that only want to upload their data set and select the functions they need calculated from it, while all necessary procedures and optimal settings are completed automatically by the software. iLearn includes a variety of descriptors for DNA, RNA and proteins, and four feature output formats are supported so as to facilitate direct output usage or communication with other computational tools. In total, iLearn encompasses 16 different types of feature clustering, selection, normalization and dimensionality reduction algorithms, and five commonly used machine-learning algorithms, thereby greatly facilitating feature analysis and predictor construction. iLearn is made freely available via an online web server and a stand-alone toolkit.
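As an illustration of two of the steps named in the abstract (feature extraction followed by normalization), the following is a minimal sketch only. It does not use or reproduce iLearn's own interface: the function names `kmer_frequencies` and `zscore_columns`, the choice of a k-mer frequency descriptor, and the toy sequences are all assumptions made for this example.

```python
from itertools import product


def kmer_frequencies(sequence, k=2, alphabet="ACGT"):
    """Return k-mer frequencies for one sequence, in a fixed k-mer order.

    Hypothetical helper for illustration; not part of iLearn.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    sequence = sequence.upper()
    total = 0
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        if window in counts:  # skip windows with ambiguous symbols such as N
            counts[window] += 1
            total += 1
    return [counts[km] / total if total else 0.0 for km in kmers]


def zscore_columns(matrix):
    """Z-score normalize each feature column; constant columns become 0.0.

    Hypothetical helper for illustration; not part of iLearn.
    """
    n = len(matrix)
    normalized_columns = []
    for col in zip(*matrix):
        mean = sum(col) / n
        std = (sum((x - mean) ** 2 for x in col) / n) ** 0.5
        normalized_columns.append(
            [(x - mean) / std if std else 0.0 for x in col]
        )
    return [list(row) for row in zip(*normalized_columns)]


# Toy DNA sequences (made up for this sketch).
sequences = ["ACGTACGT", "AAAACCCC", "GGGTTTAA"]
features = [kmer_frequencies(s, k=2) for s in sequences]
normalized = zscore_columns(features)
```

Each sequence becomes a fixed-length vector (16 values for dinucleotides), so sequences of different lengths can be passed to the same downstream selection, dimensionality reduction or classification step.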
