^{1,a)} and Ioan Andricioaei^{1,b)}

### Abstract

We present a simple method for utilizing experimental data to improve the efficiency of numerical calculations of free energy profiles from molecular dynamics simulations. The method involves umbrella sampling simulations with restraining potentials based on a known approximate estimate of the free energy profile derived solely from experimental data. The use of the experimental data results in optimal restraining potentials, guides the simulation along relevant pathways, and decreases overall computational time. Two systems are used to demonstrate the method. First, guided umbrella sampling, unguided (regular) umbrella sampling, and exhaustive sampling simulations are compared in the calculation of the free energy profile for the distance between the ends of a pentapeptide. The guided simulations use restraints based on a simulated “experimental” potential of mean force of the end-to-end distance, as would be measured by fluorescence resonance energy transfer (here obtained from exhaustive sampling). Statistical analysis shows a dramatic improvement in efficiency for a 5-window guided umbrella sampling over 5- and 17-window unguided umbrella sampling simulations. Moreover, as the guided simulations approach convergence, the form of their potential of mean force evolves along the same milestones as in the exhaustive simulations, but exponentially faster. Second, the method is further validated by replicating the forced unfolding pathway of the titin I27 domain using guiding umbrella sampling potentials determined from actual single-molecule pulling data. Comparison with unguided umbrella sampling reveals that guided sampling encourages the unfolding simulations to converge faster to a forced unfolding pathway that agrees with previous results, and that it produces a more accurate potential of mean force.

M.M. was partially supported by an NIH Molecular Biophysics Predoctoral Research Training Program. I.A. acknowledges support from the NSF CAREER award program (CHE-0548047) and the donors of the ACS Petroleum Research Foundation.

I. INTRODUCTION

II. BACKGROUND ON UMBRELLA SAMPLING

III. THE EXPERIMENTALLY GUIDED UMBRELLA SAMPLING METHOD

IV. RESULTS

A. Pentapeptide test case

B. Application to titin protein: Unfolding free energy profile

V. CONCLUDING DISCUSSION

### Key Topics

- Free energy
- Probability theory
- Proteins
- Biochemical reactions
- Molecular dynamics

## Figures

Schematic examples of restraining potentials for the pedagogical case of a double well potential, showing the effect of subtracting the PMF from the restraints. (a) The original double well potential. (b) A set of uneducated-guess (poor) starting potentials (dashed lines) with the negative of the PMF (solid line) and (c) the results of subtracting the PMF from these harmonic potentials (dashed lines) overlaid with the original PMF. (d) A nearly optimal set of restraining potentials and (e) the results of subtracting the PMF from these optimal harmonic potentials, overlaid with the original PMF. The potentials in (e) resulting from the subtraction (dotted lines) are biasing restraints similar to those we propose for our guided umbrella sampling protocol.
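The subtraction described in this caption can be illustrated with a minimal sketch. It assumes a toy double well PMF of the form `barrier * (x**2 - 1)**2` and a perfectly known PMF estimate; the function names, the force constant, and the window center are illustrative choices, not values from the paper. When the estimate matches the true PMF, the biased system feels only the harmonic term, so the window centered on the barrier top actually samples the barrier top.

```python
def double_well(x, barrier=5.0):
    """Toy PMF with minima at x = -1 and x = +1 (arbitrary energy units)."""
    return barrier * (x * x - 1.0) ** 2


def harmonic(x, center, k):
    """Ordinary (unguided) umbrella restraint."""
    return 0.5 * k * (x - center) ** 2


def guided_bias(x, center, k, pmf_estimate):
    """Harmonic restraint minus the approximate PMF, as in panels (c) and (e)."""
    return harmonic(x, center, k) - pmf_estimate(x)


# Window centered on the barrier top (x = 0); k is an assumed force constant.
center, k = 0.0, 10.0
grid = [i * 0.01 - 2.0 for i in range(401)]

# Effective potential felt by the system = true PMF + bias.
plain = [double_well(x) + harmonic(x, center, k) for x in grid]
guided = [double_well(x) + guided_bias(x, center, k, double_well) for x in grid]

plain_min = grid[plain.index(min(plain))]     # pushed off the barrier top
guided_min = grid[guided.index(min(guided))]  # stays at the window center
print(plain_min, guided_min)
```

In this toy case the plain harmonic window is too weak to hold the system on the barrier, while the guided window is centered where intended. With a real, approximate PMF estimate the cancellation is only partial.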

(a) “Experimental” potential of mean force for the end-to-end distance of pentapeptide with fitted biasing potentials. (b) Second derivative of the PMF calculated using a finite-difference scheme and by eliminating, with a running average filter of width , changes in the curvature occurring on length scales smaller than the thermally accessible range for the harmonic potentials. The plot has five flat regions; correspondingly, five guiding umbrella potentials are chosen with the locations of their minima within the five regions placed according to a root-mean-square best fit to the experimental PMF.
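The curvature analysis in panel (b) can be sketched as follows, assuming a PMF tabulated on a uniform grid. The central-difference stencil and the centered moving average are generic choices, and the filter width of 5 grid points and the synthetic quadratic PMF with a small alternating ripple are hypothetical, since the caption does not preserve the actual width.

```python
def second_derivative(y, dx):
    """Central finite-difference second derivative at the interior grid points."""
    return [(y[i - 1] - 2.0 * y[i] + y[i + 1]) / dx ** 2
            for i in range(1, len(y) - 1)]


def running_average(values, width):
    """Centered moving average over `width` points (window shrinks at the edges)."""
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Synthetic PMF: curvature 6 everywhere, plus a short-wavelength ripple
# standing in for noise on length scales below the thermally accessible range.
dx = 0.1
pmf = [3.0 * (i * dx) ** 2 + 0.002 * (-1) ** i for i in range(41)]

raw_curvature = second_derivative(pmf, dx)
smooth_curvature = running_average(raw_curvature, width=5)

raw_error = max(abs(c - 6.0) for c in raw_curvature)
smooth_error = max(abs(c - 6.0) for c in smooth_curvature)
print(raw_error, smooth_error)
```

Regions where the smoothed curvature is roughly constant would each be assigned one harmonic window; the fitting of window minima to the PMF is not shown here.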

Biased histograms for the pentapeptide in the case of (a) the unguided 17 window umbrella sampling, (b) 5 window unguided umbrella sampling, and (c) 5 window guided umbrella sampling after per window for each. Note the poor overlap for the histograms in panel (b) relative to (a), and the recovery of the overlap in (c).

K-S test of similarity for the pentapeptide model. The cumulative probability distribution functions for the distance between the sulfur of cysteine and the center of mass of the aromatic ring of tryptophan of the pentapeptide for the three umbrella sampling methods at different simulation times were compared to the exhaustive simulations at . The K-S values for the 17 window umbrella sampling, 5 window umbrella sampling, and the 5 window guided umbrella sampling are shown. The K-S statistics for the guided umbrella sampling at various times as compared to the 17 window umbrella sampling at are also shown. The lines are meant as guides to the eye. A lower K-S value indicates a higher likelihood that the distributions represent the same data.
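As a rough illustration of the comparison in this caption, the K-S value can be taken as the largest absolute difference between two cumulative distribution functions. The sketch below assumes both distributions are given as unbiased probabilities on the same distance grid; the function names and the four-bin example distributions are made up for illustration.

```python
from itertools import accumulate


def cdf(probabilities):
    """Cumulative distribution from (possibly unnormalized) binned probabilities."""
    total = sum(probabilities)
    return [c / total for c in accumulate(probabilities)]


def ks_distance(p, q):
    """Kolmogorov-Smirnov statistic: maximum gap between the two CDFs."""
    return max(abs(a - b) for a, b in zip(cdf(p), cdf(q)))


# Made-up binned distributions on a shared grid (not data from the paper).
reference = [0.1, 0.4, 0.4, 0.1]
shifted = [0.4, 0.4, 0.1, 0.1]

print(ks_distance(reference, reference))  # identical distributions give 0.0
print(ks_distance(reference, shifted))
```

A value near zero means the two cumulative curves almost coincide, which is the sense in which a lower K-S value indicates closer agreement.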

(a) Potentials of mean force for the end-to-end distance of the pentapeptide for the guiding extensive sampling at and for the umbrella sampling methods at the time they most resemble the extensive run: i.e., at for the 17 window unguided run and at for the guided run. The 5 window unguided umbrella sampling never fully resembles the extensive simulation, not even after . (b) PMFs for the 17 window unguided umbrella sampling after the full and the guided umbrella sampling after the full compared to of extensive sampling.

Comparison of probability curves of pentapeptide end-to-end distance at various simulation times. (a) Exhaustive sampling at , guided umbrella sampling at ( per window), and the 17 window unguided umbrella sampling at ( per window). (b) Exhaustive sampling at , guided umbrella sampling at ( per window), and 17 window unguided umbrella sampling at ( per window). It can be seen that the umbrella sampling probabilities are beginning to shift to the right. (c) Probability distribution for the unguided 17 window umbrella sampling is shown at ( per window) with the guided umbrella sampling at 50 and . There is very little difference in the guided umbrella sampling between 50 and .

Comparison of the exhaustive and guided umbrella sampling PMFs for the pentapeptide end-to-end distance as a function of simulation time. (a) Plot of exhaustive sampling times vs the guided umbrella sampling times with a similar PMF; exponential fit yields . (b) Guided umbrella sampling PMF as a function of simulation time. The time evolution of the potential of mean force of the reaction coordinate shows that the free energy well begins to narrow after about and does not converge until about . (c) Overlay of the PMFs from the exhaustive sampling (purple) and the guided umbrella sampling (red). The time of the guided umbrella sampling has been transformed using the exponential from panel (a) in order to make it line up with the exhaustive run results. The two simulations follow the same relative time path with respect to the reaction coordinate.
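The time mapping in panel (a) can be sketched as an exponential fit, under the assumption that the exhaustive sampling time grows as `A * exp(b * t_guided)`. The fit below is an ordinary least-squares line in log space; the functional form, the function name, and the synthetic data points are assumptions for illustration, since the fitted expression is not preserved in this caption.

```python
import math


def fit_exponential(t_guided, t_exhaustive):
    """Fit t_exhaustive = A * exp(b * t_guided) by a straight line in log space."""
    n = len(t_guided)
    logs = [math.log(t) for t in t_exhaustive]
    mean_x = sum(t_guided) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in t_guided)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(t_guided, logs))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic, noise-free points generated from A = 2.0, b = 0.5
# (arbitrary time units; NOT values from the paper).
t_g = [1.0, 2.0, 3.0, 4.0]
t_ex = [2.0 * math.exp(0.5 * t) for t in t_g]

a_fit, b_fit = fit_exponential(t_g, t_ex)
print(a_fit, b_fit)
```

Once `A` and `b` are known, each guided simulation time can be converted to the exhaustive time with a similar PMF, which is how the two time axes in panel (c) are brought into register.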

Free energy profiles for “perpendicular” degrees of freedom, i.e., the pentapeptide dihedral angles. Exhaustive sampling distribution at is shown in green, 17 window umbrella sampling in blue, and 5 window guided umbrella sampling in red.

(A) Experimental PMF and a selection of guiding umbrella sampling potentials used in the experimentally guided umbrella sampling simulations of unfolding of titin I27. As the simulations progress toward higher energy states, the force constant increases. (B) PMF for guided and unguided umbrella sampling simulations. The two methods produce substantially different curves; in contrast to the unguided runs, the guided sampling converges significantly faster, and to a profile consistent with experimental data (see text).

(A) Equilibrated structure of the titin I27 domain. (B) Guided umbrella sampling titin structure for . (C) Unguided umbrella sampling structure for . The dotted lines mark the distance between carbons for residues 3 to 26 and 5 to 24, which are hydrogen bonded in the equilibrium structure. The dissociation of the A strand (in red), residues 3 to 7, is the first step in the forced unfolding pathway, which is correctly reproduced by the guided sampling, but not by the unguided one, even for twice the simulation time (see text).
