^{1}, Pablo G. Debenedetti^{1,a)}, Frank H. Stillinger^{2}, and Peter J. Rossky^{3}

### Abstract

We investigate the properties of a two-dimensional lattice heteropolymer model for a protein in which water is explicitly represented. The model protein distinguishes between hydrophobic and polar monomers through the effect of the hydrophobic monomers on the entropy and enthalpy of hydrogen bonding of solvation-shell water molecules. As observed experimentally, model heteropolymer sequences fold into stable native states characterized by a hydrophobic core, thereby avoiding unfavorable interactions with the solvent. These native states undergo cold, pressure, and thermal denaturation, and each type of unfolding transition leads to distinct configurations. However, the heteropolymer sequence is an important element, since not all sequences fold into stable native states at positive pressures. Simulation of a large collection of sequences indicates that these fall into two general groups: those exhibiting highly stable native structures and those that do not. Statistical analysis of important patterns in sequences shows a strong tendency for long blocks of hydrophobic or polar monomers to occur in the most stable sequences, whereas alternation of hydrophobic and polar monomers appears infrequently among them. These observations are not absolute design rules and, in practice, they are not sufficient to rationally design very stable heteropolymers. We also study the use of mutations to improve the stability of the model proteins, and demonstrate that a very stable heteropolymer can be obtained through directed evolution of an initially unstable heteropolymer.

We thank Shona Patel, Scott McAllister, and Christopher Bristow for many helpful discussions throughout the course of this investigation. P.G.D. and P.J.R. gratefully acknowledge the support of the National Science Foundation [Collaborative Research in Chemistry Grants CHE0404699 (P.G.D.) and CHE0404695 (P.J.R.)], the U.S. Department of Energy, Division of Chemical Sciences, Geosciences, and Biosciences, Office of Basic Energy Sciences, Grant No. DE-FG02-87ER13714 (P.G.D.), and the R.A. Welch Foundation [No. F0019 (P.J.R.)]. We also acknowledge the Texas Advanced Computing Center (TACC) at the University of Texas at Austin for high performance computing resources.

I. INTRODUCTION

II. MODEL DESCRIPTION

III. METHODS

A. Calculation of the density of states

B. Sequence pattern analysis

IV. RESULTS

A. General model properties

B. Pattern analysis

C. Directed evolution

V. CONCLUSIONS

### Key Topics

- Proteins
- Hydrophobic interactions
- Sequence analysis
- Polymers
- Hydrogen bonding

## Figures

Phase diagram of Staphylococcal nuclease from a combination of Fourier transform infrared spectroscopy, small angle X-ray scattering, and differential scanning calorimetry experiments. Adapted with permission (Ref. 1).

Schematic of the model protein and water. The black circles are hydrophobic (H) monomers, the gray circles are polar (P) monomers, and the lines connecting them are covalent bonds. The white circles are water molecules, and the four arms on each water molecule are its hydrogen-bonding arms. Examples of each of the four types of bonding arms are shown, along with the variables that count them: bulk bonding arm pairs , hydrophobic bonding arm pairs , polar bonding arm pairs , and unpaired bonding arms . Only a portion of the system is shown; in practice, a much larger box is used to prevent the protein from interacting with itself across the periodic boundary.

The phase diagram of a 16-mer heteropolymer denoted 16.4, with sequence , for parameter values of , , , , . The innermost line encloses the region in which the probability of observing the native state is 60% or greater; the successive lines enclose the regions in which this probability exceeds 50% (bold), 40%, 30%, 20%, and 10% (outermost).
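
The caption does not say how the native-state probability is obtained from the density of states (Sec. III A). As a hedged illustration only, the sketch below assumes the density of states has been tabulated as ln g over macrostates labelled by a reduced energy E and volume V, with k_B = 1, and weights each macrostate by g exp[-(E + PV)/T] as in the isothermal-isobaric ensemble. The function `native_probability` and its arguments are hypothetical, not the authors' code.

```python
import math
from typing import Sequence


def native_probability(
    log_g: Sequence[float],     # ln(density of states) of each (E, V) macrostate
    energy: Sequence[float],    # reduced energy of each macrostate
    volume: Sequence[float],    # reduced volume of each macrostate
    is_native: Sequence[bool],  # True if the macrostate belongs to the native state
    temperature: float,         # reduced temperature (k_B = 1)
    pressure: float,            # reduced pressure
) -> float:
    """Boltzmann probability of the native state at fixed (T, P).

    Assumed weighting (hypothetical): w_i = g_i * exp(-(E_i + P V_i) / T).
    """
    # Work with log-weights: ln g - (E + P V) / T
    log_w = [lg - (e + pressure * v) / temperature
             for lg, e, v in zip(log_g, energy, volume)]
    # Subtract the largest log-weight before exponentiating (log-sum-exp trick)
    shift = max(log_w)
    weights = [math.exp(lw - shift) for lw in log_w]
    native = sum(w for w, nat in zip(weights, is_native) if nat)
    return native / sum(weights)
```

For instance, a toy system with one native macrostate (E = -10, ln g = 0) and one degenerate unfolded macrostate (E = 0, ln g = 5) of equal volume gives a native probability of 1/(1 + e^-5), about 0.993, at T = 1 and falls below 0.5 at T = 4. Evaluating such a function on a (T, P) grid and tracing where it crosses 0.5 would yield a contour analogous to the bold line; the toy numbers are illustrative, not parameters of the model.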

Representative configurations for sequence 16.4 in the (a) native state, (b) cold-denatured state, and (c) thermally denatured ensemble of states.

Contour of 50% native state probability for sequence 16.4 with temperature and pressure converted into dimensional quantities using and for parameter values , , , , .
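
The two scales used for this conversion are missing from the caption. A common convention for lattice water models, assumed here and not confirmed by this section, scales reduced temperature by an energy ε (for example, a hydrogen-bond energy) and reduced pressure by ε divided by a molecular volume v₀, so that T = T* ε/k_B and P = P* ε/v₀. The hypothetical helper below applies that convention with molar quantities; the scale values must be supplied by the user.

```python
# Molar gas constant R = N_A * k_B in J/(mol K); exact in the 2019 SI
R = 8.31446261815324


def to_dimensional(t_reduced: float, p_reduced: float,
                   energy_scale: float, volume_scale: float) -> tuple[float, float]:
    """Convert reduced (T*, P*) to (T in K, P in Pa).

    Assumed (hypothetical) convention: energy_scale is a molar energy in J/mol
    and volume_scale a molar volume in m^3/mol, so that
    T = T* * energy_scale / R and P = P* * energy_scale / volume_scale.
    """
    temperature = t_reduced * energy_scale / R
    pressure = p_reduced * energy_scale / volume_scale  # (J/mol)/(m^3/mol) = Pa
    return temperature, pressure
```

Working per mole rather than per molecule is only a convenience: Avogadro's number cancels in the pressure ratio and is absorbed into R for the temperature.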

Contours of 50% native state probability for sequence 16.4 for varying values of the enthalpic bonus . The other model parameters remained constant at , , , and .

Contours of 50% native state probability for sequence 16.4 for varying values of the relative entropic penalty for hydrogen bonding around hydrophobic monomers. The parameter values used were and changing . To maintain the same bulk water thermodynamics, the total number of water orientations is increased so that the fraction of bonding orientations for a pair of bonding arms [i.e., ] is kept constant at 0.1. The other model parameters remained constant at , .

Distributions of the range of thermal stability of randomly generated sequences for four different sets of sizes and H composition: (a) 16-mers at 37.5%, (b) 16-mers at 50%, (c) 16-mers at 62.5%, and (d) 20-mers at 50%. The number of sequences with thermal stability in the interval in dimensionless units is shown by the height of the bar marked 0.01. The left axis shows the total number of sequences in each interval of thermal stability, while the right axis shows that number relative to the total number of simulated sequences in that set. The model parameter values are , , , , and .
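
The two axes of these histograms differ only in normalization: the left axis is the count of sequences per stability interval, and the right axis divides that count by the number of simulated sequences in the set. A minimal sketch of that bookkeeping follows, assuming per-sequence stability values are already available and assuming bins of width 0.01 keyed by their lower edge (the exact interval is missing from the caption, and the figure may label bars differently). The name `stability_histogram` is hypothetical.

```python
import math
from collections import Counter


def stability_histogram(values: list[float],
                        bin_width: float = 0.01) -> dict[float, tuple[int, float]]:
    """Map each bin's lower edge to (count, fraction of all sequences).

    A value v falls in the bin [k*w, (k+1)*w) with k = floor(v / w); a tiny
    tolerance keeps values lying exactly on an edge (e.g. 0.03) from being
    pushed into the lower bin by floating-point round-off.
    """
    n = len(values)
    counts = Counter(math.floor(v / bin_width + 1e-9) for v in values)
    # Left-axis quantity is the count c; right-axis quantity is c / n
    return {round(k * bin_width, 10): (c, c / n) for k, c in sorted(counts.items())}
```

For example, the values 0.001, 0.005, 0.012, 0.019, and 0.035 give two sequences (40%) in each of the first two bins and one sequence (20%) in the bin starting at 0.03.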

Distributions of the pressure stability of randomly generated sequences for four different sets of sizes and H composition: (a) 16-mers at 37.5%, (b) 16-mers at 50%, (c) 16-mers at 62.5%, and (d) 20-mers at 50%. The model parameter values are , , , , and .

Distributions of the range of thermal stability of randomly generated sequences for four different sets of sizes and H composition: (a) 16-mers at 37.5%, (b) 16-mers at 50%, (c) 16-mers at 62.5%, and (d) 20-mers at 50%. The model parameter values are , , , , and .

Directed evolution of initial sequence through four generations of mutation and selection for optimal pressure stability. The black circles are the best mutants at each generation that are used for subsequent rounds of mutation, and the line shows the improvement in pressure stability of the selected sequence. The empty circles show the pressure stability of the other mutants not selected at each generation for comparison with the best mutant.
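
The mutation-and-selection procedure described here, together with the early stopping reported in the directed-evolution tables, can be sketched as a greedy loop. This is a hedged sketch, not the authors' protocol: it assumes each generation screens every single-site H-to-P or P-to-H substitution of the current best sequence, and the stability function is a stand-in for what would, in the paper, be a full simulation. For a criterion where lower is better (such as the cold denaturation temperature), the score would be negated. All names are hypothetical.

```python
from typing import Callable

FLIP = {"H": "P", "P": "H"}


def single_site_mutants(seq: str) -> list[str]:
    """All sequences that differ from seq by one H->P or P->H substitution."""
    return [seq[:i] + FLIP[c] + seq[i + 1:] for i, c in enumerate(seq)]


def directed_evolution(seq: str, score: Callable[[str], float],
                       generations: int = 4) -> list[tuple[str, float]]:
    """Greedy mutation/selection; returns the best sequence of each generation.

    Higher score is better. The loop stops early if no mutant beats the
    current best, mirroring runs in which no further improvement was found.
    """
    history = [(seq, score(seq))]
    for _ in range(generations):
        best_seq, best_score = history[-1]
        # Score every single-site mutant of the current best sequence
        candidates = [(m, score(m)) for m in single_site_mutants(best_seq)]
        # max() keeps the first candidate among ties
        mutant, mutant_score = max(candidates, key=lambda pair: pair[1])
        if mutant_score <= best_score:
            break  # no mutant improves on the previous generation's best
        history.append((mutant, mutant_score))
    return history
```

With the toy score `lambda s: s.count("H")` (which is not a stability measure), starting from "PPPP" the loop follows the path PPPP, HPPP, HHPP, HHHP, HHHH over four generations; with a constant score it stops immediately.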

## Tables

Example 16-mer and 20-mer sequences with their thermal stability , pressure stability , and aggregate stability given in dimensionless units for model parameters , , , , and . The rank listed next to each stability measure gives the position of that sequence when all sequences of the same size and composition are ordered from most stable to least stable by that measure.

Average values of the properties of large sets of simulated sequences for model parameters , , , , and . %H is the percent hydrophobicity of the set of sequences.

Statistically significant patterns between two and five monomers in length from the set of very stable 16-mers with 50% composition. The frequent patterns appear more often than expected by random chance in the top 10% most stable simulated sequences, while the infrequent patterns appear less often than expected by random chance.
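
This section does not specify the statistical test behind "more often than expected by random chance". One plausible way to make the comparison concrete, offered as an assumption rather than the authors' method, is to count a pattern in the selected stable sequences and compare that count with counts from composition-preserving shuffles of the same sequences, reporting a z-score. The functions `count_occurrences` and `pattern_enrichment` are hypothetical.

```python
import math
import random
from statistics import mean, stdev


def count_occurrences(seq: str, pattern: str) -> int:
    """Count possibly overlapping occurrences of pattern in seq."""
    k = len(pattern)
    return sum(seq[i:i + k] == pattern for i in range(len(seq) - k + 1))


def pattern_enrichment(sequences: list[str], pattern: str,
                       n_shuffles: int = 1000, seed: int = 0) -> tuple[int, float, float]:
    """Return (observed count, mean shuffled count, z-score) for pattern.

    Null model (an assumption): each sequence is shuffled independently,
    which preserves its length and H/P composition but destroys ordering.
    """
    rng = random.Random(seed)
    observed = sum(count_occurrences(s, pattern) for s in sequences)
    null_counts = []
    for _ in range(n_shuffles):
        total = 0
        for s in sequences:
            letters = list(s)
            rng.shuffle(letters)
            total += count_occurrences("".join(letters), pattern)
        null_counts.append(total)
    mu, sigma = mean(null_counts), stdev(null_counts)
    if sigma == 0:
        # Degenerate null distribution: only the sign of the deviation is meaningful
        z = 0.0 if observed == mu else math.copysign(math.inf, observed - mu)
    else:
        z = (observed - mu) / sigma
    return observed, mu, z
```

Positive z-scores flag "frequent" patterns and negative ones "infrequent" patterns. In the setting of the table, `sequences` would be the top 10% most stable sequences of a given size and composition; testing many patterns at once would also call for a multiple-comparison correction, which this sketch omits.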

Statistically significant patterns from the set of very stable 16-mers with 37.5% composition.

Statistically significant patterns from the set of very stable 16-mers with 62.5% composition.

Statistically significant patterns from the set of very stable 20-mers with 50% composition.

Results of four generations of directed evolution beginning with an unstable initial 16-mer of 50% composition, for parameters , , , , and . The sequence, percent hydrophobicity (%H), and properties of the best mutant at each generation are given below. Three different selection criteria were used to determine the best mutant at each generation: the cold denaturation temperature , the thermal denaturation temperature , and the maximum stable pressure . After two generations of mutations selecting for , none of the mutants improved upon the previous generation’s best sequence.

Results of four generations of directed evolution beginning with an unstable 20-mer of 50% composition, for parameters , , , , and . The selection for , and proceeded along identical paths. After three generations, none of the subsequent mutations improved protein stability for any of the metrics.