Introduction

Most natural proteins are only marginally stable. Thus, when taken out of their natural context, whether through overexpression, heterologous expression, or changes in environmental conditions, many proteins misfold and aggregate. The most general cause of poor overexpression is the low stability of the protein's native, functional state relative to alternative nonfunctional or aggregation-prone states. Designing variants with a more favorable native-state energy can dramatically increase the yield of soluble, functional protein obtained by heterologous overexpression. It also brings further benefits, such as longer storage and usage lifetimes and greater potential for subsequent engineering.

Engineering stable protein variants is a widely pursued goal. Methods based on phylogenetic analysis and on structure-based rational or computational design have yielded proteins with improved stability and higher functional expression. Individual mutations, however, contribute little to stability (typically ≤1 kcal/mol), so stabilizing large, poorly expressed proteins usually requires many mutations. Because even a single severely destabilizing mutation can cancel the benefit of all the others, high prediction accuracy is essential. Despite improvements in accuracy, existing approaches still have a relatively high probability of inadvertently introducing disruptive mutations (false-positive predictions). Published efforts to stabilize large proteins have therefore either incorporated only a few predicted stabilizing mutations (typically ≤4) at each experimental step or used library approaches to identify optimal combinations of stabilizing mutations (e.g., Whitehead et al., 2012; Wijma et al., 2014).
Such approaches are laborious and impractical for proteins that lack established medium- to high-throughput screens, let alone for proteins of unknown function. To meet the needs of the many researchers who wish to stabilize large, recalcitrant proteins but lack a background in computational design, we developed an automated algorithm that combines atomistic Rosetta modeling with phylogenetic sequence information. Our specific aim was a general method that minimizes false-positive predictions, so that only a few variants (ideally, just one) need to be tested experimentally to achieve high functional yields. We applied this algorithm to four different enzymes and one protein of unknown function. In each case, up to five variants were designed as the default output, carrying 9 to 67 mutations relative to the wild type. These variants exhibited enhanced bacterial expression yields and stability without any loss or alteration of activity.
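Why false-positive predictions become critical when many mutations are combined can be illustrated with a toy calculation. The sketch below is not the algorithm described here. It assumes, purely for illustration, that mutation effects on stability are additive and independent, that each predicted stabilizing mutation is independently disruptive with probability `p_fp`, and it uses hypothetical magnitudes (0.5 kcal/mol gained per true stabilizing mutation, 3 kcal/mol lost per disruptive one). The names `GAIN`, `LOSS`, `prob_all_benign`, and `net_stabilization` are illustrative only.

```python
"""Toy model: why the false-positive rate matters when combining many mutations.

Assumptions (hypothetical, for illustration only):
  * mutation effects on stability are additive and independent;
  * each predicted stabilizing mutation is, independently, a false positive
    (disruptive) with probability p_fp;
  * a true positive stabilizes by GAIN kcal/mol, a false positive
    destabilizes by LOSS kcal/mol.
"""

GAIN = 0.5  # kcal/mol gained per true stabilizing mutation (hypothetical)
LOSS = 3.0  # kcal/mol lost per disruptive mutation (hypothetical)


def prob_all_benign(n: int, p_fp: float) -> float:
    """Probability that none of n independent predicted mutations is disruptive."""
    return (1.0 - p_fp) ** n


def net_stabilization(n_true: int, n_false: int) -> float:
    """Net stability change (positive = more stable) under the additivity assumption."""
    return n_true * GAIN - n_false * LOSS


if __name__ == "__main__":
    # How quickly the chance of an all-benign design decays with design size.
    for p_fp in (0.20, 0.05, 0.01):
        for n in (4, 20, 40):
            print(f"p_fp={p_fp:.2f}  n={n:2d}  P(all benign)={prob_all_benign(n, p_fp):.3f}")
    # Ten true stabilizers versus the same ten plus one or two disruptive mutations.
    print(net_stabilization(10, 0), net_stabilization(10, 1), net_stabilization(10, 2))
```

Under these assumptions, a 20-mutation design has only about a 1% chance of being free of disruptive mutations when `p_fp` is 0.20, versus roughly 82% when `p_fp` is 0.01. The last line shows how, with additive effects, two disruptive mutations turn the net gain of ten stabilizing mutations into a net loss.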