^{1,a)}, Rahul K. Das^{2,b)} and Rohit V. Pappu^{2,c)}

### Abstract

Conformational heterogeneity is a defining characteristic of proteins. Intrinsically disordered proteins (IDPs) and denatured state ensembles are extreme manifestations of this heterogeneity. Inferences regarding globule versus coil formation can be drawn from analysis of polymeric properties such as average size, shape, and density fluctuations. Here we introduce a new parameter to quantify the degree of conformational heterogeneity within an ensemble to complement polymeric descriptors. The design of this parameter is guided by the need to distinguish between systems that couple their unfolding-folding transitions with coil-to-globule transitions and those systems that undergo coil-to-globule transitions with no evidence of acquiring a homogeneous ensemble of conformations upon collapse. The approach is as follows: Each conformation in an ensemble is converted into a conformational vector where the elements are inter-residue distances. Similarity between pairs of conformations is quantified using the projection between the corresponding conformational vectors. An ensemble of conformations yields a distribution of pairwise projections, which is converted into a distribution of pairwise conformational dissimilarities. The first moment of this dissimilarity distribution is normalized against the first moment of the distribution obtained by comparing conformations from the ensemble of interest to conformations drawn from a Flory random coil model. The latter sets an upper bound on conformational heterogeneity thus ensuring that the proposed measure for intra-ensemble heterogeneity is properly calibrated and can be used to compare ensembles for different sequences and across different temperatures. The new measure of conformational heterogeneity will be useful in quantitative studies of coupled folding and binding of IDPs and in de novo sequence design efforts that are geared toward controlling the degree of heterogeneity in unbound forms of IDPs.
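The procedure outlined in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. Three things here are assumptions: each conformation is an array of shape `(n_residues, 3)` with one representative point per residue, the "projection" is the cosine of the angle between conformational vectors, and the dissimilarity is one minus that cosine. The function names are hypothetical, and the abstract does not specify how the final ratio is mapped onto the reported parameter, so the sketch stops at the normalized first moment.

```python
import numpy as np

def conformational_vector(coords):
    """Return the unique inter-residue distances of one conformation.

    coords: array of shape (n_residues, 3); one point per residue (assumption).
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    upper = np.triu_indices(len(coords), k=1)  # each residue pair once
    return dist[upper]

def mean_dissimilarity(vecs_a, vecs_b=None):
    """First moment of pairwise dissimilarities, taken here as 1 - cosine (assumption)."""
    a = vecs_a / np.linalg.norm(vecs_a, axis=1, keepdims=True)
    if vecs_b is None:
        # Intra-ensemble comparison: distinct pairs only.
        cos = a @ a.T
        upper = np.triu_indices(len(a), k=1)
        return float(np.mean(1.0 - cos[upper]))
    # Cross comparison: every conformation in A against every one in B.
    b = vecs_b / np.linalg.norm(vecs_b, axis=1, keepdims=True)
    return float(np.mean(1.0 - a @ b.T))

def normalized_dissimilarity(ensemble, reference):
    """Mean intra-ensemble dissimilarity divided by the mean dissimilarity
    between the ensemble and a reference (e.g. random coil) ensemble."""
    va = np.array([conformational_vector(c) for c in ensemble])
    vr = np.array([conformational_vector(c) for c in reference])
    return mean_dissimilarity(va) / mean_dissimilarity(va, vr)
```

With identical conformations in `ensemble`, the numerator vanishes and the ratio is zero. Ensembles that are as diverse as the reference give a ratio close to one, consistent with the reference setting the upper bound.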

This work was supported by grants from the National Institutes of Health (5RO1NS056114) and the National Science Foundation (MCB-1121867). We thank Professor Anders Carlsson, Professor Gary Stormo, and two anonymous reviewers for helpful comments and suggestions.

I. INTRODUCTION

II. METHODS

A. Polypeptide systems included in this work

B. Details of the Metropolis Monte Carlo (MC) simulations

C. The MC sampling protocol

D. The Flory random coil (FRC) model

III. RESULTS

A. Estimating Φ

B. Assessment of conformational ensembles using Φ

C. Application of Φ to assess conformational heterogeneity in IDPs with different secondary structure propensities

IV. DISCUSSION

A. Practical uses for Φ

### Key Topics

- Conformational dynamics
- Proteins
- Sequence analysis
- Molecular conformation
- Monte Carlo methods

## Figures

Temperature dependence of s 2 and density for five archetypal systems. Panel (b) quantifies the temperature dependence of chain density (in units of g cm⁻³), which is calculated as , where MW denotes the molecular weight in g mol⁻¹.

Temperature dependence of fluctuations in density and energy for five archetypal systems. Panel (a) shows the temperature dependence of the density fluctuations quantified as the variance of the density distribution for a given temperature, i.e., . Panel (b) shows the temperature dependence of the specific heat capacity. The specific constant-volume heat capacities were calculated as , where MW is the molecular weight and ⟨E⟩ is the ensemble-averaged potential energy for simulated ensembles at a given temperature. One typically expects the sharpest transitions for well-defined order-to-disorder transitions; interestingly, it is the Q56 system that shows the sharpest transition. The relatively broad transitions for NTL9 and GB1 highlight the joint contributions of gradual melting and different degrees of residual local structure in their unfolded states.
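The two fluctuation measures named in this caption can be illustrated with a short sketch. The caption's own formulas did not survive extraction, so the expressions below are assumptions: the variance is the standard second central moment, and the specific heat uses the textbook energy-fluctuation relation divided by the molecular weight. Energies in kcal mol⁻¹ and the function names are also hypothetical choices.

```python
import numpy as np

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1

def density_variance(densities):
    """Variance of the sampled densities: <rho^2> - <rho>^2."""
    d = np.asarray(densities, dtype=float)
    return float(np.mean(d**2) - np.mean(d)**2)

def specific_heat(energies, temperature, mol_weight):
    """Specific heat from potential-energy fluctuations (assumed form):
    (<E^2> - <E>^2) / (R * T^2 * MW), with E in kcal/mol and T in K."""
    e = np.asarray(energies, dtype=float)
    var_e = np.mean(e**2) - np.mean(e)**2
    return float(var_e / (R_KCAL * temperature**2 * mol_weight))
```

Evaluating both functions on the samples collected at each simulation temperature gives curves of the kind described for panels (a) and (b).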

Sample distributions P( ) for two systems at different temperatures. The panel on the left shows P( ) distributions for NTL9 at three different temperatures and the panel on the right shows these distributions for the Q56 system at three different simulation temperatures. In both panels, the solid curves represent intra-ensemble P( ) distributions whereas the dashed curves are for comparisons between conformations within an ensemble at temperature T and conformations drawn from the FRC ensemble.

Temperature dependence of ⟨ ⟩ and Φ for the five archetypal systems. Panel (b) includes error bars from a bootstrap analysis: 100 distinct bootstrap trials were performed to estimate Φ, so the error bars represent standard deviations of the estimated mean Φ values.
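A bootstrap estimate of this kind can be sketched generically as follows. The caption states only that 100 trials were used and that the error bars are standard deviations. Resampling whole conformations with replacement, and the function name and signature, are assumptions made for illustration.

```python
import numpy as np

def bootstrap_std(samples, statistic, n_trials=100, seed=0):
    """Return (mean, standard deviation) of `statistic` over bootstrap resamples.

    samples:   array whose first axis indexes conformations (or per-conformation values)
    statistic: callable mapping a resampled array to a single float
    """
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples)
    n = len(samples)
    estimates = np.empty(n_trials)
    for t in range(n_trials):
        idx = rng.integers(0, n, size=n)  # draw n indices with replacement
        estimates[t] = statistic(samples[idx])
    return float(estimates.mean()), float(estimates.std(ddof=1))
```

For a pairwise statistic, note that resampling with replacement introduces duplicate conformations whose mutual dissimilarity is zero. Whether such pairs were excluded in the original analysis is not stated.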

(a)–(d) Plots to quantify the assessments of conformational properties that derive from the joint analysis of s 2 (ordinates) and Φ (abscissae). In each panel, the symbol colors progress from cool to hot as temperature increases.

Assessment of conformational heterogeneity in ensembles with different degrees of helical structure. Panel (a) plots Φ against for T = 298 K. The results are shown for 17 naturally occurring and designed sequences. Panel (b) plots Φ against σ²(R_g) and panel (c) plots Φ against σ²( ) for each of the 17 bZIP-bRs.

Analysis of conformational heterogeneity in terms of the distribution of helical segment lengths for three of the bZIP-bRs. The figure shows three panels, one each for the bZIP-bR of fra1, the chimeric cys3-fos, and gcn4. Each panel shows a histogram of helical segment lengths within the simulated ensembles. A helical segment corresponds to a consecutive stretch of residues in a conformation with a DSSP “H” designation. The value of Φ is dictated by the width of a segment length distribution as opposed to the ensemble-averaged helicity.
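The definition of a helical segment given in this caption translates directly into a run-length count. The sketch below assumes that each conformation has already been reduced to a per-residue DSSP string (one character per residue). That input format and the function names are hypothetical.

```python
from collections import Counter
from itertools import groupby

def helical_segment_lengths(dssp):
    """Lengths of consecutive runs of 'H' in one per-residue DSSP string."""
    return [sum(1 for _ in run) for code, run in groupby(dssp) if code == "H"]

def segment_length_histogram(dssp_strings):
    """Count helical segments of each length across an ensemble of DSSP strings."""
    counts = Counter()
    for s in dssp_strings:
        counts.update(helical_segment_lengths(s))
    return dict(sorted(counts.items()))

print(helical_segment_lengths("--HHHH--HH-"))  # [4, 2]
```

A narrow histogram means that most conformations carry helices of similar length, whereas a broad one reflects many different segment lengths at a comparable average helicity.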

Temperature dependence of the variance of calculated from the distributions of values for each of the five archetypal systems. All inferences regarding conformational heterogeneity that are drawn from analysis of the variance are consistent with those drawn from analysis of Φ.
