Methods in Protein Engineering

Directed Evolution


We mimic natural selection to develop proteins with novel and advanced properties. Through iterative rounds of mutation and screening or selection, we traverse fitness landscapes to find optimal proteins for user-defined goals.

Directed evolution circumvents our profound ignorance of how a protein’s sequence encodes its function by using iterative rounds of random mutation and artificial selection to discover new and useful proteins. Proteins can be tuned to adapt to new functions or environments by simple adaptive walks involving small numbers of mutations. Directed evolution studies have shown how rapidly some proteins can evolve under strong selection pressures and, because the entire ‘fossil record’ of evolutionary intermediates is available for detailed study, they have provided new insight into the relationship between sequence and function. Directed evolution has also shown how mutations that are functionally neutral can set the stage for further adaptation.
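
The mutate-and-screen loop described above can be sketched in a few lines of Python. This is a toy illustration only: the fitness function (number of positions matching a hidden target sequence) stands in for a real laboratory assay, and the names `toy_fitness`, `mutate`, and `directed_evolution`, along with the library size and number of rounds, are assumptions made for this sketch rather than any actual protocol.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_fitness(seq, target):
    """Stand-in for a lab assay: count positions matching a hidden target."""
    return sum(a == b for a, b in zip(seq, target))

def mutate(seq, rng, n_mutations=1):
    """Return a copy of seq with n_mutations random point substitutions.

    A substitution may redraw the same residue, so some variants are silent.
    """
    residues = list(seq)
    for pos in rng.sample(range(len(residues)), n_mutations):
        residues[pos] = rng.choice(AMINO_ACIDS)
    return "".join(residues)

def directed_evolution(parent, target, rounds=20, library_size=50, seed=0):
    """Run a simple adaptive walk and return the list of (parent, fitness) per round."""
    rng = random.Random(seed)
    history = [(parent, toy_fitness(parent, target))]
    for _ in range(rounds):
        # Diversify: build a library of single mutants of the current parent.
        library = [mutate(parent, rng) for _ in range(library_size)]
        # Screen: pick the best variant according to the (toy) assay.
        best = max(library, key=lambda s: toy_fitness(s, target))
        # Keep the parent unless a screened variant is strictly better.
        if toy_fitness(best, target) > toy_fitness(parent, target):
            parent = best
        history.append((parent, toy_fitness(parent, target)))
    return history
```

Because every intermediate parent is recorded in `history`, the sketch also mirrors the point about the 'fossil record': each step of the walk remains available for inspection, for example with `directed_evolution("AAAAAAAAAA", "MKTAYIAKQR")`.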

How to Create New Enzyme Functions in the Laboratory

For the most part, directed evolution is conceptually and technically straightforward: once an enzyme that displays a low level of activity for a desired function is identified, mutagenesis and screening for improved function will often provide enhancements. However, that first step, identifying a protein with some initial activity, can be far from straightforward. How does a protein that performs one function evolve into another with a different function? And how can enzyme engineers accomplish this quickly to address current, time-sensitive problems? Two key concepts for accomplishing this are catalytic promiscuity (the ability of an enzyme to carry out functions other than its primary one) and chemical intuition (an understanding of how and why a reaction might happen).


When looking to create a new enzyme for a reaction that is not known to be catalyzed by an existing enzyme, we must hypothesize how a new reaction might be catalyzed, find protein(s) that might be able to do that, and test them using the appropriate substrates (and sometimes even cofactors). This process can guide us to identify a starting point for directed evolution of a new enzyme function. It may even happen over multiple directed evolution campaigns, iteratively accessing new activities as others are optimized.


Read more about how to leverage chemical intuition and enzyme promiscuity to create new enzyme functions here.


Crystallography

Crystallography enables structural characterization of proteins, providing molecular insights and guiding protein design.

We use X-ray crystallography to structurally characterize the proteins we have engineered. We can visualize protein-subunit interfaces involved in activity regulation, the active site organization of our enzymes, and substrate and cofactor binding sites. Visualizing our advanced protein variants at the molecular level tells the story behind beneficial mutations. These crystal structures provide the foundation of our protein design efforts.

Check out our structures!


Structure-Guided Recombination

We have developed structure-guided recombination methods to create novel, highly functional protein diversity. 

We are trying to understand the benefits of recombination (sex) in evolution. We also want to understand how to use it efficiently to make new proteins with new features and functions. Sex in the test tube is not limited to two parents, nor to sequences from the same species. We can recombine 32 parents. Or sequences from monkeys and worms. We want to understand the rules for molecular sex: how to do it, what it can make, and what we can learn from it. We have observed, for example, that sex in the test tube is an innovation generation machine.

Homologous recombination is remarkably efficient for searching sequence space for functional proteins (i.e. it has a good chance of creating functional proteins) due to the conservative nature of homologous substitutions (they are less disruptive on average than random substitutions) and to the conservative nature of swapping blocks of sequence among related proteins. Chimeric proteins inherit the best and worst residues the parents have to offer, in new combinations that are not observed in nature. This leads to functional innovation.

We have developed computational tools that use protein structure information to design chimeric proteins and libraries of such proteins. These libraries are extremely diverse, with members that differ by tens or even hundreds of mutations while still maintaining a high proportion of sequences that fold and function. These chimeric proteins can be more stable than any of their parents. They can also catalyze reactions better than their parents, or even reactions their parents do not catalyze. We have also discovered that recombination leads to simplified (additive) sequence-function relationships that can be exploited to predict useful new sequences based on data from a small sampling of chimeras.

SCHEMA Recombination

Homologous recombination means swapping pieces of protein (blocks) among a set of homologs (parental proteins). The goal of site-directed SCHEMA recombination is to simultaneously maximize the mutation level of the chimeras and the probability that the chimeric proteins will fold and function. We do this by minimizing the number of structural contacts that are disrupted when portions of sequence are inherited from different parent proteins. Using SCHEMA, we have made functional chimeras from parents sharing as little as 30% sequence identity. Guided by structural information, we have designed and constructed recombination libraries of a variety of proteins, including beta-lactamases, arginases, cytochrome P450s, GH48 cellulases, GH6 cellulases, GH7 cellulases, and channelrhodopsins.
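
To make the idea of counting disrupted contacts concrete, here is a minimal Python sketch. It assumes one common way of defining disruption: a contact between positions i and j counts as broken only when the chimera's residue pair at those positions is not found at the same positions in any parent. That definition, the function names `build_chimera` and `schema_disruption`, and the six-residue toy sequences and contact list are all assumptions for illustration, not the lab's actual software or data.

```python
def build_chimera(parents, block_of_residue, parent_of_block):
    """Assemble a chimera from aligned, equal-length parent sequences.

    block_of_residue[k] is the block that position k belongs to;
    parent_of_block[b] is the index of the parent that supplies block b.
    """
    return "".join(
        parents[parent_of_block[block_of_residue[k]]][k]
        for k in range(len(block_of_residue))
    )

def schema_disruption(chimera, parents, contacts):
    """Count contacts (i, j) whose residue pair in the chimera is not
    present at positions (i, j) in any of the parents."""
    broken = 0
    for i, j in contacts:
        pair = (chimera[i], chimera[j])
        if not any((p[i], p[j]) == pair for p in parents):
            broken += 1
    return broken

# Hypothetical toy inputs: two aligned parents and three structural contacts.
toy_parents = ["ACDEFG", "ACNQHW"]
toy_contacts = [(0, 5), (1, 4), (2, 3)]
toy_blocks = [0, 0, 0, 1, 1, 1]  # two contiguous blocks of three residues

toy_chimera = build_chimera(toy_parents, toy_blocks, parent_of_block=[0, 1])
toy_disruption = schema_disruption(toy_chimera, toy_parents, toy_contacts)
```

In this toy case the contacts at conserved positions survive the block swap, so only the contact whose residue pair is new to both parents is counted. A library design would compare this count across many candidate block boundaries and prefer those that keep it low while still introducing many mutations.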

We have discovered that the recombination fitness landscape has a large additive component, which enables us to use simple linear regression models built from small data sets to predict highly stable chimera sequences. Homologous recombination thus gives us the opportunity to create and study a large number of functional enzymes whose properties vary significantly. With empirical models, we can accurately predict some of these properties and use these predictions to search for improved enzymes. We can also identify the sequence basis for variations in function.
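
The additive picture can be illustrated with an ordinary least-squares fit in which each block contributes independently to a measured property. The sketch below uses NumPy; the one-hot encoding with parent 0 as the reference, the function names, and the training values (made-up, perfectly additive numbers, not measurements) are assumptions chosen for illustration.

```python
import numpy as np

def encode(chimera, n_parents):
    """One-hot encode block choices, using parent 0 as the reference.

    chimera is a tuple such as (0, 1, 1): the parent index chosen for each block.
    """
    features = [1.0]  # intercept: value of the all-parent-0 sequence
    for parent in chimera:
        features.extend(1.0 if parent == p else 0.0 for p in range(1, n_parents))
    return features

def fit_block_model(chimeras, values, n_parents):
    """Fit additive block contributions by ordinary least squares."""
    X = np.array([encode(c, n_parents) for c in chimeras])
    y = np.array(values, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict(coef, chimera, n_parents):
    """Predict the property of a chimera from the fitted block contributions."""
    return float(np.dot(encode(chimera, n_parents), coef))

# Synthetic, noise-free training set: 3 blocks, 2 parents, 5 of 8 chimeras "measured".
train_chimeras = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0)]
train_values = [50.0, 52.0, 49.0, 53.0, 51.0]  # hypothetical stability values

coef = fit_block_model(train_chimeras, train_values, n_parents=2)
untested = predict(coef, (1, 0, 1), n_parents=2)
```

With real data the measurements carry noise and the landscape is only partly additive, so predictions for untested chimeras such as `(1, 0, 1)` would be estimates to be checked experimentally rather than exact values.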

Non-contiguous recombination

We have extended our recombination design tools to include libraries where the blocks are not necessarily contiguous in the primary sequence. Although not contiguous along the polypeptide chain, the blocks are contiguous in the folded 3-D structure of the protein. Non-contiguous recombination further reduces structural disruption, as important contacts between residues that are not adjacent in the protein chain can be preserved. We expect that this will allow us to design chimeras and chimera libraries using more distantly related parent proteins, further increasing the diversity of chimera progeny.
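
A small Python sketch shows why non-contiguous blocks can preserve more contacts. It uses a simplified proxy, the number of contacts whose two residues fall in different blocks, which is an upper bound on disruption rather than the full measure. The function name `cut_contacts` and the six-residue hairpin-like contact list are hypothetical and chosen only for illustration.

```python
def cut_contacts(block_of_residue, contacts):
    """Count contacts whose two residues are assigned to different blocks."""
    return sum(block_of_residue[i] != block_of_residue[j] for i, j in contacts)

# Hypothetical hairpin-like fold: residues far apart in sequence touch in 3-D.
hairpin_contacts = [(0, 5), (1, 4), (2, 3)]

contiguous = [0, 0, 0, 1, 1, 1]      # blocks follow the primary sequence
non_contiguous = [0, 0, 1, 1, 0, 0]  # blocks follow the folded structure

cut_if_contiguous = cut_contacts(contiguous, hairpin_contacts)
cut_if_non_contiguous = cut_contacts(non_contiguous, hairpin_contacts)
```

In this toy fold, splitting the chain in the middle separates every contacting pair, while grouping residues by spatial proximity keeps each pair inside a single block.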