Schematic figure for the multilevel analysis of a free energy landscape. (a) A 1D free energy landscape divided into four levels; (b) the cluster tree representing this landscape. At level one, two nodes are formed, corresponding to the two deepest free energy minima. At level two, four nodes are identified, one for each of the four free energy minima. At levels three and four, the number of nodes decreases because some free energy minima become connected.
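The level-set idea in this schematic can be illustrated on a toy 1D landscape. At each free energy cutoff, every contiguous region lying below the cutoff becomes one node of the tree. The following is a minimal sketch that assumes a grid-sampled landscape built from four hypothetical Gaussian wells. It is not the HNEG procedure itself, which operates on sampled conformations rather than on a 1D grid.

```python
import numpy as np

def count_basins(free_energy, level):
    """Count contiguous grid regions where the free energy lies below `level`.

    Each such region plays the role of one node of the cluster tree at that level.
    """
    below = np.asarray(free_energy) < level
    # A region starts wherever `below` switches from False to True
    # (or at index 0 if the first grid point is already below the cutoff).
    previous = np.concatenate(([False], below[:-1]))
    return int(np.count_nonzero(below & ~previous))

# Hypothetical landscape: two deep wells (depth 3) and two shallow wells (depth 2).
x = np.linspace(0.0, 8.0, 2001)
centers = [1.0, 3.0, 5.0, 7.0]
depths = [3.0, 2.0, 3.0, 2.0]
F = -sum(d * np.exp(-(x - c) ** 2 / (2 * 0.4 ** 2)) for c, d in zip(centers, depths))

for level in (-2.5, -1.5, -0.1):
    print(f"level {level:+.1f}: {count_basins(F, level)} node(s)")
```

With these hypothetical wells, a cutoff of -2.5 keeps only the two deep wells (2 nodes). A cutoff of -1.5 resolves all four minima (4 nodes), and -0.1 merges them into a single node. This mirrors, qualitatively, how the number of nodes changes from level to level in the schematic.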
An illustration of the Hierarchical Nyström Extension Graph (HNEG) applied to the alanine dipeptide system. (a) A conformation of the alanine dipeptide with its two torsion angles (ϕ and ψ) labeled. (b) The free energy landscape projected onto the ϕ–ψ plane; red indicates regions of high density (low free energy). (c) The Hierarchical Nyström Extension Graph, with 9 levels, constructed for this system.
Bayesian comparison of MSMs constructed by the Hierarchical Nyström Extension Graph (HNEG) and by the PCCA method for the alanine dipeptide. The y axis shows the logarithm of the posterior probability, $\ln P(L \mid D)$, of each lumping $L$ generated by HNEG (red) and PCCA (black). The logarithmic Bayes factor $\ln B = \ln P(L_1 \mid D)_{\mathrm{HNEG}} - \ln P(L_2 \mid D)_{\mathrm{PCCA}} \gtrsim 100$, indicating that HNEG consistently provides a better lumping than PCCA. The lag time is 9 ps.
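Because $\ln B$ is a difference of log posteriors, exponentiating it gives the ratio of posterior probabilities. A quick worked conversion of the bound quoted above shows how lopsided the comparison is:

```latex
B = \frac{P(L_1 \mid D)}{P(L_2 \mid D)}, \qquad
\ln B \gtrsim 100 \;\Longrightarrow\; B \gtrsim e^{100} \approx 2.7 \times 10^{43}.
```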
Bayesian comparison of MSMs constructed by the Hierarchical Nyström Extension Graph (HNEG) and by the PCCA method for the trpzip2 peptide. The y axis shows the logarithm of the posterior probability, $\ln P(L \mid D)$, of each lumping $L$ generated by HNEG (red) and PCCA (black). The logarithmic Bayes factor $\ln B = \ln P(L_1 \mid D)_{\mathrm{HNEG}} - \ln P(L_2 \mid D)_{\mathrm{PCCA}} \in [250, 500]$, indicating that HNEG consistently provides a better lumping than PCCA for the trpzip2 system. The lag time is 10 ns.
Representative structures of the 13 macrostates in the optimal lumping (the lumping with the highest posterior probability) for the trpzip2 system, together with their equilibrium populations. Macrostate 3 corresponds to the folded hairpin structure and has the largest population (38.5%), indicating that the trpzip2 peptide still retains a significant folded fraction at 350 K.
(a) Block-diagonal structure of the microstate transition probability matrices. There are 2000 microstates in total, and each matrix is permuted so that microstates belonging to the same macrostate are grouped together. PCCA (left panel) tends to split off small, nearly disconnected blocks first, whereas HNEG (right panel) focuses on identifying the well-populated macrostates. Both lumpings contain 11 macrostates. (b) HNEG identifies the large macrostates (around 9) with a much smaller total number of macrostates (<20) than PCCA (>80).
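The permutation that exposes the block-diagonal structure amounts to sorting microstates by their macrostate label and then reordering rows and columns together. Here is a minimal NumPy sketch. It assumes a row-stochastic microstate transition matrix `T` and one integer macrostate label per microstate; both inputs and the function name are hypothetical.

```python
import numpy as np

def permute_by_macrostate(T, labels):
    """Reorder rows and columns of T so microstates of the same macrostate are contiguous."""
    labels = np.asarray(labels)
    # A stable sort keeps the original microstate order within each macrostate.
    order = np.argsort(labels, kind="stable")
    # Apply the same ordering to rows and columns so T stays a transition matrix.
    return T[np.ix_(order, order)], order

# Toy 4-microstate example: microstates 0 and 2 form one macrostate, 1 and 3 the other.
T = np.array([
    [0.80, 0.05, 0.10, 0.05],
    [0.05, 0.85, 0.05, 0.05],
    [0.10, 0.05, 0.80, 0.05],
    [0.05, 0.10, 0.05, 0.80],
])
labels = [0, 1, 0, 1]
T_perm, order = permute_by_macrostate(T, labels)
print(order)
print(T_perm)
```

After the permutation, the large within-macrostate transition probabilities sit in 2 × 2 blocks along the diagonal. The same reordering applies unchanged to a 2000-microstate matrix.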
The Hierarchical Nyström method robustly identifies the large metastable macrostates with populations greater than 1% (red), 2% (green), and 3% (blue) as the fraction of data included in the submatrix A is varied (see Sec. II for details). The percentage of data included in the submatrix A, $P_m$, ranges from 41% to 99%. The number of large macrostates remains unchanged once 50% or more of the data are included.
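The role of the submatrix A can be illustrated with the generic Nyström extension. A symmetric affinity matrix W is split into a block A over m sampled points, a cross block B, and a remaining block C. C is never computed directly but is approximated as $B^\top A^{-1} B$. The sketch below shows only this generic approximation. It assumes that the first m points form the sampled subset and uses a pseudoinverse for numerical stability. The exact HNEG construction is the one given in Sec. II; here m/n merely plays the role of $P_m$, and the function name is hypothetical.

```python
import numpy as np

def nystrom_approximation(W, m, rcond=1e-10):
    """Approximate a symmetric n x n matrix W from its first m rows.

    Only the blocks A = W[:m, :m] and B = W[:m, m:] are used; the
    unsampled block C = W[m:, m:] is replaced by B.T @ pinv(A) @ B.
    """
    A = W[:m, :m]
    B = W[:m, m:]
    # Pseudoinverse with a relative cutoff, so near-zero eigenvalues of A are dropped.
    C_approx = B.T @ np.linalg.pinv(A, rcond) @ B
    return np.block([[A, B], [B.T, C_approx]])

# A rank-3 Gram matrix is reproduced up to round-off
# once the sampled points span its row space.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
W = X @ X.T
W_hat = nystrom_approximation(W, m=10)
print(np.max(np.abs(W_hat - W)))
```

For a full-rank affinity matrix, the approximation is no longer exact. Its quality then depends on how many points enter A, which is the quantity varied in this figure.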
Histogram (red) of the mutual information between the optimal lumping and each of the 259 other lumpings obtained from the Nyström method by varying the level sets; the mean value is 2.104. For comparison, the mutual information between a PCCA lumping with 13 macrostates and the optimal lumping is only 0.785 (green). The entropy of the optimal lumping, which is the upper limit of the mutual information, is 3.313 (black). In contrast, the average mutual information between the optimal lumping and 259 random lumpings is 0.051.
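The mutual information between two lumpings can be computed from their joint distribution over microstates. The sketch below makes three assumptions. Each lumping is given as one macrostate label per microstate. Microstate weights, such as equilibrium populations, are optional. Logarithms are base 2 (bits). The base is our assumption. It is at least consistent with the quoted entropy of 3.313 for the 13-macrostate optimal lumping, which exceeds the natural-log maximum ln 13 ≈ 2.565. The function names are hypothetical.

```python
import numpy as np

def joint_distribution(a, b, weights=None):
    """Joint probability of macrostate labels a[k], b[k] over microstates k."""
    a = np.asarray(a)
    b = np.asarray(b)
    w = np.ones(len(a)) if weights is None else np.asarray(weights, dtype=float)
    # Map arbitrary labels to 0..k-1 indices.
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    P = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(P, (ai, bi), w)  # accumulate weight, handling repeated index pairs
    return P / P.sum()

def mutual_information(a, b, weights=None):
    """Mutual information (in bits) between two lumpings of the same microstates."""
    P = joint_distribution(a, b, weights)
    pa = P.sum(axis=1, keepdims=True)  # marginal of lumping a, shape (m, 1)
    pb = P.sum(axis=0, keepdims=True)  # marginal of lumping b, shape (1, n)
    nz = P > 0  # skip empty cells, whose contribution is zero
    return float(np.sum(P[nz] * np.log2(P[nz] / (pa @ pb)[nz])))

print(mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
```

Calling `mutual_information(a, a)` returns the entropy of lumping `a`, which is the upper bound shown in black.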
Illustration of the overlap of microstates (or protein conformations) assigned to large macrostates in different lumpings. The optimal lumping (A) is compared with a representative lumping (B) whose mutual information with A is around 2.1. Both lumpings contain 9 large macrostates with populations >2%. The joint probability matrix $P_{A,B}(i, j)$ gives the overlap between the microstates assigned to macrostate $i$ in the optimal lumping A and those assigned to macrostate $j$ in the representative lumping B. After permutation, $P_{A,B}$ has large diagonal elements and small off-diagonal elements. This indicates that the large macrostates in the two lumpings share a relatively large fraction of identical microstates.
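One way to find a permutation that makes $P_{A,B}$ diagonally dominant is to match each macrostate of lumping A to a distinct macrostate of lumping B, maximizing the total diagonal weight. This matching criterion is an illustrative assumption, not necessarily the one used for the figure. The sketch below also assumes that both lumpings use integer labels 0..k-1 and have the same number of macrostates, as for the 9 large macrostates here. Microstate weights are optional. It searches exhaustively over permutations, which is practical for about 9 macrostates (9! ≈ 3.6 × 10^5 candidates). For larger k, the Hungarian algorithm (for example, `scipy.optimize.linear_sum_assignment`) is the usual choice. The function names are hypothetical.

```python
import itertools
import numpy as np

def joint_probability(a, b, k, weights=None):
    """P[i, j] = weight of microstates in macrostate i of A and macrostate j of B."""
    a = np.asarray(a)
    b = np.asarray(b)
    w = np.ones(len(a)) if weights is None else np.asarray(weights, dtype=float)
    P = np.zeros((k, k))
    np.add.at(P, (a, b), w)  # accumulate weight for every (i, j) pair
    return P / P.sum()

def best_column_order(P):
    """Column permutation of a square P that maximizes the sum of its diagonal."""
    k = P.shape[0]
    rows = np.arange(k)
    # Exhaustive search: score each permutation by the diagonal it would produce.
    best = max(itertools.permutations(range(k)),
               key=lambda perm: P[rows, list(perm)].sum())
    return list(best)

# Toy example with 3 macrostates in each lumping and 8 equally weighted microstates.
a = [0, 0, 0, 1, 1, 2, 2, 2]
b = [2, 2, 1, 0, 0, 1, 1, 1]
P = joint_probability(a, b, k=3)
order = best_column_order(P)
print(order)
print(P[:, order])
```

In this toy case, the permuted matrix carries 87.5% of the weight on its diagonal. Lumping B differs from A only by relabeling plus one misassigned microstate.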
Hierarchical Nyström Extension Graph (HNEG).