Overview of the algorithm.
Illustration of an example pivot move for PDB 1L2Y.
Crankshaft move for SOD1 (PDB 1HL5), a protein with a long-range disulfide bond between C57 and C146. A minimized, non-equilibrated configuration is shown.
Schematic figures indicating the processes of backbone and side-chain addition, energy minimization, and 1 ns thermal equilibration.
Example optimal folding trajectories for five Cα atoms in apo-myoglobin (PDB 1A6N). Unfolded and folded structures are also shown.
Each Cα trajectory is divided into a smooth “laminar” part and a rugged “turbulent” part. Panels (a) and (b) show sample trajectories for Cα(4) and Cα(75) of apo-myoglobin. Panel (a) is predominantly laminar – the corresponding distances are Å, Å. Panel (b) is predominantly turbulent – the corresponding distances are Å, Å. (c) Criterion for determining the transition from laminar to turbulent trajectories. When the root variance in the distance travelled per step jumps above a threshold of 7 times the baseline value, the trajectory is defined as turbulent from that point on.
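The criterion in panel (c) can be illustrated with a minimal Python sketch. The caption does not state how the root variance or the baseline is evaluated, so the sliding window, its size (`window`), and the choice of the first window as the baseline are assumptions made here for illustration only; `turbulent_onset` is a hypothetical helper name, not the authors' code.

```python
import statistics


def turbulent_onset(step_distances, window=5, factor=7.0):
    """Return the index of the first step labelled turbulent, or None.

    step_distances: distance travelled by one atom in each step.
    window: number of steps per root-variance estimate (assumed).
    factor: threshold multiple of the baseline (7 in the caption).
    """
    if len(step_distances) <= window:
        raise ValueError("need more steps than the window size")
    # Assumption: the baseline is the root variance of the first window.
    baseline = statistics.pstdev(step_distances[:window])
    if baseline == 0.0:
        raise ValueError("baseline root variance is zero")
    for end in range(window, len(step_distances)):
        # Root variance over the window that ends at step `end`.
        current = statistics.pstdev(step_distances[end - window + 1:end + 1])
        if current > factor * baseline:
            return end  # everything from here on is treated as turbulent
    return None


smooth = [1.0, 1.1, 0.9, 1.0, 1.1, 0.9, 1.0, 1.1, 0.9, 1.0]
rugged = [5.0, 0.1, 6.0, 0.2, 4.0]
print(turbulent_onset(smooth + rugged))
```

With these made-up step distances, the first ten steps form the laminar part and the onset is detected at the first rugged step.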
Different ensembles considered in this study to compare with protein folding kinetics.
(Panel (a)) TM-score distributions between native structures, showing homology of our dataset compared to an NR dataset (Ref. 56) and other datasets used for protein folding kinetics analysis (Refs. 84 and 118). One can see some homologous protein pairs in other datasets. (Panel (b)) TM-score distribution between 1299 unfolded states for α-synuclein. Similar distributions are obtained for other proteins.
Comparison between experimental and simulated ¹³Cα chemical shift values for Aβ1–42. (Main panel) Black data points are experimental values from Ref. 57; red data points are those from the simulated ensemble of 773 conformations, using CAMSHIFT. (Inset (a)) Scatter plot of experimental vs simulated chemical shifts (r = 0.93). (Panel (b)) Convergence study of the correlation coefficient between experimental and simulated data. The mean correlation coefficient is shown; vertical bars indicate the standard deviation of correlation coefficient values when random subsets with a given number of frames are taken from the total dataset.
(a) Radius of gyration vs. time (equilibration process) for proTα, a highly charged, intrinsically disordered protein. The relaxation time is about 0.8 ns, and the asymptotic value of the radius of gyration R_G is about 35.5 Å. (b) Scaling of R_G with chain length, obtained by taking all subsections of a given length and finding the ensemble-averaged radius of gyration. (Inset) Extrapolation procedure to find the asymptotic value of the scaling exponent ν. The value of ν is obtained for ensembles at a given equilibration time and converges exponentially to the t → ∞ value. Extrapolation from ensembles with t ⩽ 1 ns gives an asymptotic value of 0.633, while extrapolation from ensembles with t ⩽ 5 ns gives an asymptotic value of 0.631. A similar conclusion was obtained from extrapolation of the data for α-syn. Thus, extrapolation of ν from t ⩽ 1 ns ensembles is likely to be sufficiently accurate in general.
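The convergence study in panel (b) can be sketched as follows. This is an illustrative reading of the caption, not the authors' script: it assumes that the simulated shift for each residue is the average over the frames in a subset, and the names `pearson`, `subset_correlation`, `n_trials`, and `seed` are hypothetical. The per-frame shifts would come from a predictor such as CAMSHIFT; here they are simply passed in as lists.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def subset_correlation(experimental, frames, n_frames, n_trials=200, seed=0):
    """Mean and standard deviation of r over random subsets of frames.

    experimental: one measured shift per residue.
    frames: list of per-frame predicted shifts, one list per conformation.
    n_frames: number of frames drawn (without replacement) per subset.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(n_trials):
        subset = rng.sample(frames, n_frames)
        # Assumption: ensemble shift = average over the frames in the subset.
        averaged = [statistics.fmean(column) for column in zip(*subset)]
        values.append(pearson(experimental, averaged))
    return statistics.fmean(values), statistics.pstdev(values)
```

Calling `subset_correlation` for a range of `n_frames` values gives the mean and the vertical bars of a convergence plot; when `n_frames` equals the total number of frames, every subset is the full ensemble and the spread collapses to zero.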
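The subsection-averaging procedure in panel (b) can be sketched in Python as below. This is a minimal illustration under stated assumptions: coordinates are unweighted (Cα-like) points, ν is taken as the least-squares slope of log R_G against log chain length, and the function names are hypothetical. The exponential extrapolation of ν to t → ∞ shown in the inset is not included.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of points from their centroid (unweighted)."""
    n = len(coords)
    center = [sum(p[k] for p in coords) / n for k in range(3)]
    msd = sum(
        sum((p[k] - center[k]) ** 2 for k in range(3)) for p in coords
    ) / n
    return math.sqrt(msd)


def mean_rg_for_length(conformations, length):
    """Average R_G over all contiguous subsections of a given length."""
    values = [
        radius_of_gyration(conf[i:i + length])
        for conf in conformations
        for i in range(len(conf) - length + 1)
    ]
    if not values:
        raise ValueError("no subsection of the requested length")
    return sum(values) / len(values)


def scaling_exponent(conformations, lengths):
    """Least-squares slope of log R_G vs log length (assumed estimate of nu)."""
    xs = [math.log(n) for n in lengths]
    ys = [math.log(mean_rg_for_length(conformations, n)) for n in lengths]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den
```

As a sanity check, a perfectly straight rod of equally spaced points should give an exponent close to 1, whereas the disordered ensembles in the caption give values near 0.63.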
Nearest neighbor clustering using TM-score of 1299 structures of α-synuclein, projected onto the TM-scores to the centroid structures of the largest three clusters (blue, red, and black, respectively). Representative conformations in each cluster are shown. The lack of distinct clustering indicates diverse sampling of the unfolded ensemble.
(a) Scatter plot of the absolute contact order (ACO) and average laminar distance (equilibrium ensemble, with smoothed trajectories) for the 15 natively folded proteins in Table I. 2-state proteins (blue squares) and 3-state proteins (red triangles) are well-clustered by the average laminar distance, but not by ACO, as can be seen by inspection, i.e., by projecting data onto each order parameter. Closed curves circumscribing each class of protein are a guide to the eye. (b) Statistical significance (p-values) that the various metrics for 2-state and 3-state folders arise from different distributions, as determined by t-test (Ref. 30). −log(p) is plotted, so that a higher number indicates better ability to distinguish between the two classes. The dashed black horizontal line indicates a threshold of 5% for statistical significance. Only ACO and maxcluster-determined TM-score fail to distinguish 2-state from 3-state folders. Error bars for ACO and are obtained by removing 1 data point at random from the dataset, recomputing −log(p), and then calculating the standard deviation for the resulting collection of values. Notation used in this panel is further described in Figure 14.
Optimal folding trajectory of Cα(50) in apo-myoglobin (PDB 1A6N). The trajectory is curved, due to steric constraints with the remainder of the protein. Cα(50) is shown as blue spheres in the initial and final states. The region of protein N-terminal to Cα(50) in the initial unfolded state is shown in red. This transforms to the short helix N-terminal to Cα(50) in the final position.
Correlation matrix for all geometrical parameters, as well as experimental folding rates. The upper triangular elements are Pearson correlation coefficients. The lower triangular elements are the corresponding statistical significance values, which are represented as −log10(p) so that, e.g., 4.5 corresponds to p = 10^−4.5 = 3.2 × 10^−5. Red represents strong positive correlation; blue represents strong negative correlation. “_raw” indicates numbers taken from the raw trajectory, while “_smooth” indicates numbers taken from the smoothed trajectory. Trajectories are further divided into “_laminar” and “_turbulent” parts. Initial ensembles are either equilibrated (“_equil”) or pre-equilibration (energy minimized only, “_min”). Other parameters shown include ACO, protein length, GDT-TS, TM-score, natural log of the folding and unfolding rates in 0 M denaturant, and natural log of relaxation rate at the transition midpoint.
(Panel (a)) Correlation of various distance metrics with experimental refolding rate in water, for the dataset of proteins listed in Table I. Raw (rather than smoothed) data are taken here. Minus the log base 10 of the statistical significance is plotted, and the horizontal dashed line gives the threshold of statistical significance (p = 0.05). The best predictor of folding rates in water, the turbulent distance, has a significance of 10^−7. Each integer below this value in the plot corresponds to a decrease in significance by an order of magnitude. (Panel (b)) Same as panel (a) but for experimental unfolding rate in water. Here, ACO emerges as the strongest correlator of unfolding rate. (Panel (c)) Same as panel (a) but for relaxation rate at the transition midpoint. Here, several variants of the distance travelled correlate best with relaxation rate, e.g., both and have a correlation coefficient r = −0.84.
(Panel (a)) Scatter plot of experimental folding rate at 0 M denaturant with the unfolded ensemble-averaged turbulent distance travelled during folding, corresponding to late-stage protein reconfiguration of structured elements. (Panel (b)) Scatter plot of the folding rate at 0 M denaturant with the ensemble-averaged RMSD between unfolded structures and the native. For both plots, the pre-equilibrated, energy-minimized ensemble and raw rather than smoothed data are used. Data for 2-state proteins are shown as squares; data for 3-state proteins are shown as triangles.
Proteins and their properties used in this study.