^{1,a)} and Daniel M. Zuckerman^{1,b)}

### Abstract

One reason that free energy difference calculations are notoriously difficult in molecular systems is insufficient conformational overlap, or similarity, between the two states or systems of interest. The degree of overlap is irrelevant, however, if the absolute free energy of each state can be computed. We present a method for calculating the absolute free energy that employs a simple construction of an exactly computable reference system which possesses high overlap with the state of interest. The approach requires only a physical ensemble of conformations generated via simulation and an auxiliary calculation of approximately equal central-processing-unit cost. Moreover, the calculations can converge to the correct free energy value even when the physical ensemble is incomplete or improperly distributed. As a “proof of principle,” we use the approach to correctly predict free energies for test systems where the absolute values can be calculated exactly and also to predict the conformational equilibrium for leucine dipeptide in implicit solvent.
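
The role of the reference system can be illustrated with the standard free energy perturbation identity. This is a sketch under assumed notation, not a reproduction of the paper's numbered equations: $F_{\mathrm{ref}}$ is the exactly known reference free energy, $U_{\mathrm{phys}}$ and $U_{\mathrm{ref}}$ are the physical and reference potential energies, $\beta = 1/k_B T$, and the average runs over configurations $x$ drawn from the reference ensemble.

$$
F_{\mathrm{phys}} = F_{\mathrm{ref}} - k_B T \,\ln \left\langle e^{-\beta\,[\,U_{\mathrm{phys}}(x) - U_{\mathrm{ref}}(x)\,]} \right\rangle_{\mathrm{ref}}
$$

The average converges quickly only when the reference ensemble overlaps well with the physical state, which is why the reference system is constructed from the physical ensemble itself.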

The authors would like to thank Edward Lyman, Ronald White, Srinath Cheluvarajah, and Hagai Meirovitch for many fruitful discussions. The authors thank the Departments of Computational Biology and Environmental and Occupational Health at the University of Pittsburgh, and the National Institutes of Health (F32 GM073517) for support.

I. INTRODUCTION

II. REFERENCE SYSTEM METHOD

A. The fundamental relations

B. The reference energy and its normalization

C. Using the physical and reference ensembles

D. The physical ensemble and construction of the reference system

E. Generation of the reference ensemble

F. Summary of the reference system method

III. RESULTS

A. Simple test systems

B. Leucine dipeptide

IV. DISCUSSION

A. Correlation of coordinates

B. Quality of the physical ensemble

C. Extension to larger systems

V. CONCLUSIONS

### Key Topics

- Free energy
- Peptides
- Methane
- Molecular conformation
- Biomolecules

## Figures

Depiction of how the reference potential energy is calculated for a one-coordinate system. First the coordinate is binned, creating a histogram (solid bars) populated according to a simulation. Then Eq. (4) is used to calculate reference energies for each coordinate bin (dashed bars). A hypothetical physical potential is shown as a dotted curve for comparison to the reference potential energy. For a multicoordinate system, the reference potential energy would be the sum of the single-coordinate reference potential energies.
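
The one-coordinate construction described in this caption can be sketched in a few lines of Python. Eq. (4) is not reproduced on this page, so the sketch assumes a Boltzmann-inversion form, $U_{\mathrm{ref}} = -k_B T \ln(P_i/\Delta)$ for a bin with population fraction $P_i$ and width $\Delta$, which makes the reference partition function exactly 1. The function names, the harmonic test potential, the bin range, and the sample sizes are all hypothetical choices for illustration, not the authors' code or test systems.

```python
import math
import random


def build_reference(samples, lo, hi, n_bins, kT=1.0):
    """Histogram one coordinate and convert bin populations to reference energies."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in samples:
        if lo <= x < hi:
            counts[min(int((x - lo) / width), n_bins - 1)] += 1
    total = sum(counts)
    probs = [c / total for c in counts]
    # Assumed form of the reference energy: Boltzmann inversion of the
    # binned probability density. Empty bins get infinite energy.
    u_ref = [-kT * math.log(p / width) if p > 0 else math.inf for p in probs]
    return width, probs, u_ref


def estimate_free_energy(u_phys, lo, width, probs, u_ref, n_ref, rng, kT=1.0):
    """Draw a reference ensemble and reweight it to the physical potential."""
    bins = rng.choices(range(len(probs)), weights=probs, k=n_ref)
    acc = 0.0
    for i in bins:
        x = lo + (i + rng.random()) * width  # uniform position inside bin i
        acc += math.exp(-(u_phys(x) - u_ref[i]) / kT)
    f_ref = 0.0  # Z_ref = sum_i width * exp(-u_ref[i] / kT) = 1 by construction
    return f_ref - kT * math.log(acc / n_ref)


# Hypothetical test: harmonic potential with kT = 1, sampled exactly.
rng = random.Random(0)
u_phys = lambda x: 0.5 * x * x
physical = [rng.gauss(0.0, 1.0) for _ in range(20000)]
lo, hi, n_bins = -6.0, 6.0, 50
width, probs, u_ref = build_reference(physical, lo, hi, n_bins)
f_est = estimate_free_energy(u_phys, lo, width, probs, u_ref, 20000, rng)
f_exact = -math.log(math.sqrt(2.0 * math.pi))  # -kT ln Z for this potential
```

Comparing `f_est` with `f_exact` gives a quick check of the sketch. For several coordinates, the same per-coordinate tables would be summed, as the caption notes.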

Absolute free energy for methane estimated by the reference system method as a function of the number of reference structures used in the estimate. The solid horizontal line is the exact free energy obtained by numerical integration. Five independent simulations are shown on a log scale to clearly show the convergence of the free energy estimate. The results shown were obtained using Eq. (10) with 100 bins for each degree of freedom, i.e., the estimates for the absolute free energy of methane in Table I are the values shown here for .

Absolute free energy for methane estimated by the reference system method as a function of the number of histogram bins used for each degree of freedom. The plot shows the “sweet spot” where histogram bins are small enough to reveal histogram features, yet large enough to give sufficient population in each bin. The results are shown with a vertical scale of and on a log scale to emphasize the wide range of bin sizes that produce excellent results for the reference system approach. The results shown were obtained using Eq. (10) for a methane molecule using (dashed curve) and (solid curve). The solid horizontal line shows the exact free energy and the error bars are the standard deviations of five independent trials. The plot demonstrates that at least 50 bins should be used for each independent coordinate and that the maximum number of bins depends on the number of snapshots in the physical ensemble.

Free energy for leucine dipeptide estimated by the reference system method as a function of the number of reference structures used in the estimate. Five independent simulations are shown on a log scale to demonstrate the convergence behavior of the free energy estimate for (a) the alpha configuration and (b) the beta configuration. The results shown were obtained using Eq. (10) with 50 bins for each degree of freedom.

Scatter plots of the two torsions of each residue for leucine dipeptide. The results are shown for both physical and reference ensembles containing 100 000 structures each. The figure shows that (i) the reference system has good overlap with the physical system, as can be seen by the similarity between the two plots, and (ii) the reference system is more broadly distributed than the physical system, as evidenced by the data at for the reference system that is not present for the physical system.

Histogram of the distance between the of residue 1 and the of residue 2 for leucine dipeptide. The results are shown for both reference and physical ensembles containing 100 000 structures each. The figure shows that (i) the reference system has good overlap with the physical system and (ii) the reference system is broader than the physical system.

## Tables

Absolute free energy estimates obtained using our reference system approach for cases where the absolute free energy can be determined exactly. In all cases, the estimate is in excellent agreement with the exact free energy. The uncertainty, shown in parentheses [e.g., 3.14 ] is the standard deviation from five independent simulations. The results for the two-dimensional systems are in units and methane results have units of kcal/mole. The table shows estimates of the configurational integral in Eq. (2), i.e., the constant term is not included in the estimate.

Absolute free energy estimates of the alpha and beta conformations obtained using the reference system method for leucine dipeptide with GBSA solvation, in units of kcal/mol. The independent measurement for the free energy difference was obtained via a unconstrained simulation. The uncertainty for the absolute free energies, shown in parentheses, is the standard deviation from five independent leucine dipeptide simulations using reference structures in the reference ensemble. The uncertainty for the free energy differences is obtained by using every possible combination of and , i.e., 25 independent estimates. The standard error associated with the reference system estimate is , reflecting the 25 independent estimates. The table shows estimates of the configurational integral in Eq. (2), i.e., the constant term is not included in the estimate.
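
The pairing of five alpha estimates with five beta estimates described in this caption can be sketched as follows. The input values are placeholders chosen for illustration only and are not taken from the table. The sign convention (beta minus alpha), the use of the sample standard deviation, and the function name are assumptions.

```python
import itertools
import math
import statistics


def difference_stats(f_alpha, f_beta):
    """Form every (alpha, beta) pairing and summarize the free energy differences."""
    diffs = [fb - fa for fa, fb in itertools.product(f_alpha, f_beta)]
    mean = statistics.mean(diffs)
    std = statistics.stdev(diffs)        # sample standard deviation (assumed)
    sem = std / math.sqrt(len(diffs))    # standard error over all pairings
    return mean, std, sem, len(diffs)


# Placeholder values in kcal/mol, NOT the published estimates.
f_alpha = [1.0, 1.1, 0.9, 1.05, 0.95]
f_beta = [2.0, 2.1, 1.9, 2.05, 1.95]
mean, std, sem, n = difference_stats(f_alpha, f_beta)
```

Because the 25 pairings are built from only ten underlying simulations, a standard error computed this way may be somewhat optimistic.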
