Unfolded protein ensembles, folding trajectories, and refolding rate prediction


FIG. 1.

Overview of algorithm.

FIG. 2.

Illustration of an example pivot move for PDB 1L2Y.

FIG. 3.

Crankshaft move for SOD1 (PDB 1HL5), a protein with a long-range disulfide bond between C57 and C146. A minimized, non-equilibrated configuration is shown.

FIG. 4.

Schematic figures indicating the processes of backbone and side-chain addition, energy minimization, and 1 ns thermal equilibration.

FIG. 5.

Example optimal folding trajectories for 5 atoms in apo-myoglobin (1A6N). Unfolded and folded structures are also shown.

FIG. 6.

Each trajectory is divided into a smooth “laminar” and rugged “turbulent” part. Panels (a) and (b) show sample trajectories for (4) and (75) of apo-myoglobin. Panel (a) is predominantly laminar – the corresponding distances are Å, Å. Panel (b) is predominantly turbulent – the corresponding distances are Å, Å. (c) Criterion for determining the transition from laminar to turbulent trajectories. When the root variance in the distance travelled per step jumps above a threshold given by 7 times the baseline value, the trajectory from then on is defined as turbulent.
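The criterion in panel (c) can be illustrated with a minimal sketch. It assumes a trajectory stored as a list of 3D positions, a trailing window for the standard deviation ("root variance") of the per-step distance, and a baseline estimated from the first few steps. The function names, the window size, and the baseline length are hypothetical choices made for illustration; only the factor of 7 comes from the caption.

```python
import math
import statistics


def step_distances(positions):
    """Euclidean distance travelled between consecutive 3D positions."""
    return [math.dist(a, b) for a, b in zip(positions, positions[1:])]


def turbulent_onset(steps, window=5, baseline_len=10, factor=7.0):
    """Return the index of the first step at which the trailing-window
    standard deviation of the step distance exceeds factor * baseline.

    Assumptions (not from the source): the baseline is the standard
    deviation of the first `baseline_len` steps, and the running value
    uses a trailing window of `window` steps that lies after the baseline.
    Returns None if the threshold is never crossed (fully laminar).
    """
    baseline = statistics.pstdev(steps[:baseline_len])
    threshold = factor * baseline
    for i in range(baseline_len + window - 1, len(steps)):
        recent = steps[i - window + 1 : i + 1]
        if statistics.pstdev(recent) > threshold:
            return i
    return None
```

Steps before the returned index would be labelled laminar and steps from that index onward turbulent. In practice the window and baseline lengths would need tuning against the trajectory resolution.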

FIG. 7.

Different ensembles considered in this study to compare with protein folding kinetics.

FIG. 8.

(Panel (a)) TM-score distributions between structures, showing the homology of our dataset compared to a non-redundant (NR) dataset and to other datasets used for protein folding kinetics analysis. Some homologous protein pairs are visible in the other datasets. (Panel (b)) TM-score distribution between 1299 unfolded states for α-synuclein. Similar distributions are obtained for other proteins.

FIG. 9.

Comparison between experimental and simulated chemical shift values, for Aβ. (Main panel) Black data points are experimental values from Ref. , red data points are those from the simulated ensemble of 773 conformations, using CAMSHIFT. (Inset (a)) Scatter plot of experimental vs. simulated chemical shifts (correlation coefficient = 0.93). (Panel (b)) Convergence study of the correlation coefficient between experimental and simulated data. The mean correlation coefficient is shown; vertical bars indicate the standard deviation of the correlation coefficient values obtained when random subsets with a given number of frames are taken from the total dataset.
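The convergence study in panel (b) can be sketched as follows, assuming the predicted shifts are stored as one list per frame with one value per nucleus. This is an illustrative sketch, not the authors' code: the function names, the number of trials, and the seed are hypothetical, and the per-frame predictions (from CAMSHIFT in the caption) are taken as given inputs.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def subset_convergence(frames, experiment, n_frames, n_trials=200, seed=0):
    """Mean and standard deviation of the correlation between experiment
    and the subset-averaged prediction, over random subsets of n_frames.

    frames: list of per-frame predictions, each a list with one value per
    nucleus (assumed layout). experiment: the measured values, same order.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_trials):
        subset = rng.sample(frames, n_frames)
        # Average each nucleus over the sampled frames.
        averaged = [statistics.fmean(column) for column in zip(*subset)]
        values.append(pearson(averaged, experiment))
    return statistics.fmean(values), statistics.pstdev(values)
```

Calling `subset_convergence` for increasing `n_frames` gives the mean curve and the vertical bars; when `n_frames` equals the total number of frames, every subset is the full ensemble and the spread collapses to zero.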

FIG. 10.

(a) Radius of gyration vs. time (equilibration process), for proTα: a highly charged, intrinsically disordered protein. The relaxation time is about 0.8 ns, and the asymptotic value of the radius of gyration is about 35.5 Å. (b) Scaling of the radius of gyration with chain length, obtained by taking all subsections of a given length and finding the ensemble-averaged radius of gyration. (Inset) Extrapolation procedure to find the asymptotic value of the scaling exponent ν. The value of ν is obtained for ensembles at a given equilibration time t. This value converges exponentially to the t → ∞ value. Extrapolation from ensembles with t ⩽ 1 ns gives an asymptotic value of 0.633, while extrapolation from ensembles with t ⩽ 5 ns gives an asymptotic value of 0.631. A similar conclusion was obtained from extrapolation of the data for α-syn. Thus, extrapolation of ν from t ⩽ 1 ns ensembles is likely to be sufficiently accurate in general.
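The subsection-averaging and extrapolation steps in panel (b) can be sketched as below. Several choices here are assumptions made for illustration and are not stated in the caption: equal masses for all sites, a least-squares fit of log Rg against log chain length, and a three-point closed form (Aitken's delta-squared formula) for the asymptote, which is exact for an exponential approach sampled at three equally spaced times. The authors' actual fitting procedure may differ.

```python
import math
import statistics


def radius_of_gyration(coords):
    """Unweighted radius of gyration of a list of 3D points (equal masses assumed)."""
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    return math.sqrt(sum(math.dist(c, center) ** 2 for c in coords) / n)


def mean_rg_of_subchains(conformations, length):
    """Average Rg over every contiguous subsection of the given length,
    taken over every conformation in the ensemble."""
    values = [
        radius_of_gyration(conf[i : i + length])
        for conf in conformations
        for i in range(len(conf) - length + 1)
    ]
    return statistics.fmean(values)


def scaling_exponent(conformations, lengths):
    """Least-squares slope of log(Rg) vs. log(length), i.e. the exponent nu."""
    xs = [math.log(n) for n in lengths]
    ys = [math.log(mean_rg_of_subchains(conformations, n)) for n in lengths]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope_num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope_den = sum((x - mx) ** 2 for x in xs)
    return slope_num / slope_den


def extrapolate_exponential(v1, v2, v3):
    """Asymptote of v(t) = v_inf + A * exp(-t / tau) from three values at
    equally spaced times (Aitken's delta-squared formula, an assumed method)."""
    return (v1 * v3 - v2 ** 2) / (v1 + v3 - 2.0 * v2)
```

Under these assumptions, `scaling_exponent` would be evaluated on ensembles saved at several equilibration times, and three equally spaced values of ν passed to `extrapolate_exponential` to estimate the t → ∞ limit.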

FIG. 11.

Nearest neighbor clustering using TM-score of 1299 structures of α-synuclein, projected onto the TM-scores to the centroid structures of the largest three clusters (blue, red, and black, respectively). Representative conformations in each cluster are shown. The lack of distinct clustering indicates diverse sampling of the unfolded ensemble.

FIG. 12.

(a) Scatter plot of the absolute contact order (ACO) and average laminar distance (equilibrium ensemble, with smoothed trajectories), for the 15 natively folded proteins in Table I. 2-state proteins (blue squares) and 3-state proteins (red triangles) are well-clustered by the average laminar distance, but not by ACO, as can be seen by inspection, i.e., by projecting data onto each order parameter. Closed curves circumscribing each class of protein are a guide to the eye. (b) Statistical significance (p-values) that the various metrics for 2-state and 3-state folders arise from different distributions, as determined by t-test. −log(p) is plotted, so that a higher number indicates better ability to distinguish between the two classes. The dashed black horizontal line indicates a threshold of 5% for statistical significance. Only ACO and maxcluster-determined TM-score fail to distinguish 2-state from 3-state folders. Error bars for ACO and are obtained by removing 1 data point at random from the dataset, recomputing −log(p), and then calculating the standard deviation for the resulting collection of values. Notation used in this panel is further described in Figure 14.

FIG. 13.

Optimal folding trajectory of (50) in apo-myoglobin (1A6N). The trajectory is curved, due to steric constraints with the remainder of the protein. (50) is shown as blue spheres in the initial and final states. The region of protein N-terminal to (50) in the initial unfolded state is shown in red. This transforms to the short helix N-terminal to (50) in the final position.

FIG. 14.

Correlation matrix for all geometrical parameters, as well as experimental folding rates. The upper triangular elements are Pearson correlation coefficients. The lower triangular elements are the corresponding statistical significance values, which are represented as $-\log_{10} p$ so that, e.g., 4.5 corresponds to $p = 10^{-4.5} = 3.2 \times 10^{-5}$. Red represents strong positive correlation; blue represents strong negative correlation. “_raw” indicates numbers taken from the raw trajectory, while “_smooth” indicates numbers taken from the smoothed trajectory. Trajectories are further divided into “_laminar” and “_turbulent” parts. Initial ensembles are either equilibrated (“_equil”) or pre-equilibration (energy minimized only, “_min”). Other parameters shown include ACO, protein length, GDT-TS, TM-score, natural log of the folding and unfolding rates in 0 M denaturant, and natural log of relaxation rate at the transition midpoint.

FIG. 15.

(Panel (a)) Correlation of various distance metrics with experimental refolding rate in water, for the dataset of proteins listed in Table I. Raw (rather than smoothed) data are used here. Minus the log base 10 of the statistical significance is plotted, and the horizontal dashed line gives the threshold of statistical significance (p = 0.05). The best predictor of folding rates in water, the turbulent distance, has a significance of 10. Each integer below this value in the plot corresponds to a decrease in significance by an order of magnitude. (Panel (b)) Same as panel (a) but for experimental unfolding rate in water. Here, ACO emerges as the strongest correlator of unfolding rate. (Panel (c)) Same as panel (a) but for relaxation rate at the transition midpoint. Here, several variants of the distance travelled correlate best with relaxation rate, e.g., both and have a correlation coefficient = −0.84.

FIG. 16.

(Panel (a)) Scatter plot of experimental folding rate at 0 M denaturant with the unfolded ensemble-averaged turbulent distance travelled during folding, corresponding to late-stage protein reconfiguration of structured elements. (Panel (b)) Scatter plot of the folding rate at 0 M denaturant with the ensemble-averaged RMSD between unfolded structures and the native. For both plots, the pre-equilibrated, energy-minimized, ensemble is taken, and raw rather than smoothed data are taken. Data for 2-state proteins are shown as squares, data for 3-state proteins are shown as triangles.


Table I.

Proteins and their properties used in this study.


Scitation: Unfolded protein ensembles, folding trajectories, and refolding rate prediction