^{1,2}, Steven M. Kreuzer

^{3}, Alfredo E. Cardenas

^{2}and Ron Elber

^{2,4,a)}

### Abstract

Network representations are becoming increasingly popular for analyzing kinetic data from techniques like Milestoning, Markov State Models, and Transition Path Theory. Mapping continuous phase space trajectories into a relatively small number of discrete states helps in visualization of the data and in dissecting complex dynamics to concrete mechanisms. However, not only are molecular networks derived from molecular dynamics simulations growing in number, they are also getting increasingly complex, owing partly to the growth in computer power that allows us to generate longer and better converged trajectories. The increased complexity of the networks makes simple interpretation and qualitative insight of the molecular systems more difficult to achieve. In this paper, we focus on various network representations of kinetic data and algorithms to identify important edges and pathways in these networks. The kinetic data can be local and partial (such as the value of rate coefficients between states) or an exact solution to kinetic equations for the entire system (such as the stationary flux between vertices). In particular, we focus on the Milestoning method that provides fluxes as the main output. We proposed Global Maximum Weight Pathways as a useful tool for analyzing molecular mechanism in Milestoning networks. A closely related definition was made in the context of Transition Path Theory. We consider three algorithms to find Global Maximum Weight Pathways: Recursive Dijkstra's, Edge-Elimination, and Edge-List Bisection. The asymptotic efficiency of the algorithms is analyzed and numerical tests on finite networks show that Edge-List Bisection and Recursive Dijkstra's algorithms are most efficient for sparse and dense networks, respectively. Pathways are illustrated for two examples: helix unfolding and membrane permeation. Finally, we illustrate that networks based on local kinetic information can lead to incorrect interpretation of molecular mechanisms.

The authors thank Eric Vanden-Eijnden for interesting discussions and helpful comments on the paper. S.V. would like to thank Vishvas Vasuki for useful discussions about the algorithms and the comments on the CS Theory Stack Exchange forum for references to literature on maximum capacity paths. R.E. acknowledges support from the National Institutes of Health (NIH) Grant No. GM059796 and Welch Grant No. F-1783.

I. INTRODUCTION

II. DEFINITION OF THE NETWORK

III. NETWORK REPRESENTATIONS OF KINETIC DATA

A. State-space (anchor-based) graphs with flux-based edge weights

B. Flux-space (milestone-based) graphs with flux-based edge weights

C. Flux-space (milestone-based) graph based on rate coefficients

IV. DEFINITION OF PATHWAYS

A. Maximum weight path (MWP)

B. Global maximum weight path (GMWP)

V. DETERMINATION OF MAXIMUM WEIGHT AND GLOBAL MAXIMUM WEIGHT PATHWAYS

A. Recursive Dijkstra's algorithm

1. Dijkstra's single source shortest path algorithm

2. Modification to Dijkstra's shortest path algorithm for maximum weight path calculation

3. Recursive Dijkstra's algorithm for GMWP calculation

B. Comparison to edge-elimination based MaxFlux algorithm

1. Explanation and example

2. Efficiency

C. Comparison to the edge-list bisection algorithm

1. Explanation and example

2. Efficiency

VI. RESULTS AND DISCUSSION

A. Helix unfolding under stress

B. Membrane permeation of DOPC

C. Analysis of runtimes and benchmarks

VII. CONCLUSIONS

### Key Topics

- Hydrogen bonding
- 9.0
- Data analysis
- 6.0
- Exact solutions
- 6.0
- Molecular dynamics
- 6.0
- Transition state theory
- 6.0

## Figures

A schematic representation of mapping continuous space and trajectories to a network. We have five cells in phase space denoted by X α, X β, X γ, X δ, X ɛ. Each cell can be mapped to a network vertex and the edges would be between vertices, e.g., (α, β) and (β, γ). Sometimes the cells are represented by specific conformations (anchors) that are illustrated in the figure by the blue ellipses. In an alternate network representation, the vertices can be interfaces or milestones denoted on the figure by dashed red lines, which indicate the boundary between domains. There are six milestones in the above figure, I i − I n . Continuous trajectories are mapped to the network either by their location in phase space or by the last milestone that they have passed (color coded curves in the figure). On the right side of the figure we show network representations. The top figure is an anchor-based network and the lower figure is based on the milestones.

A schematic representation of mapping continuous space and trajectories to a network. We have five cells in phase space denoted by X α, X β, X γ, X δ, X ɛ. Each cell can be mapped to a network vertex and the edges would be between vertices, e.g., (α, β) and (β, γ). Sometimes the cells are represented by specific conformations (anchors) that are illustrated in the figure by the blue ellipses. In an alternate network representation, the vertices can be interfaces or milestones denoted on the figure by dashed red lines, which indicate the boundary between domains. There are six milestones in the above figure, I i − I n . Continuous trajectories are mapped to the network either by their location in phase space or by the last milestone that they have passed (color coded curves in the figure). On the right side of the figure we show network representations. The top figure is an anchor-based network and the lower figure is based on the milestones.

Conversion of a flux-space path with milestones as vertices to a state-space path with the corresponding anchors as vertices. The table in the figure shows the mapping from milestone index to anchor index. The weight on each milestone edge contributes to weights on two anchor edges. Each anchor edge in the middle gets a contribution from two milestone edges.

Conversion of a flux-space path with milestones as vertices to a state-space path with the corresponding anchors as vertices. The table in the figure shows the mapping from milestone index to anchor index. The weight on each milestone edge contributes to weights on two anchor edges. Each anchor edge in the middle gets a contribution from two milestone edges.

(a) An example graph with multiple paths between vertices of interest, A and D. (b) Maximum weight paths (MWP) between A and D shown in green. (c) Global maximum weight path (GMWP) between A and D shown in red.

(a) An example graph with multiple paths between vertices of interest, A and D. (b) Maximum weight paths (MWP) between A and D shown in green. (c) Global maximum weight path (GMWP) between A and D shown in red.

Example graph demonstrating Dijkstra's single-source shortest path algorithm. This is a snapshot during the optimization process and not the final result. We are calculating the shortest path lengths from A to all other vertices. Path lengths marked by L are current estimates of the shortest path length from vertex A to a given vertex. Vertices that have already been explored by the algorithm are marked with ✓ and current vertex being examined is marked with *.

Example graph demonstrating Dijkstra's single-source shortest path algorithm. This is a snapshot during the optimization process and not the final result. We are calculating the shortest path lengths from A to all other vertices. Path lengths marked by L are current estimates of the shortest path length from vertex A to a given vertex. Vertices that have already been explored by the algorithm are marked with ✓ and current vertex being examined is marked with *.

Example graph demonstrating single-source maximum weight algorithm. This is a snapshot during the optimization process and not the final result. See text for more details. We are calculating the maximum weights from A to all other vertices. Weights marked by M are current estimates of the maximum weight from vertex A to a given vertex. Vertices that have already been explored by the algorithm are marked with ✓ and current vertex being examined is marked with *.

Example graph demonstrating single-source maximum weight algorithm. This is a snapshot during the optimization process and not the final result. See text for more details. We are calculating the maximum weights from A to all other vertices. Weights marked by M are current estimates of the maximum weight from vertex A to a given vertex. Vertices that have already been explored by the algorithm are marked with ✓ and current vertex being examined is marked with *.

An example graph showing the GMWP between vertices A and G in red.

An example graph showing the GMWP between vertices A and G in red.

Visualization of average networks for helix unfolding under a load level 30 pN in (a) state-space, with 14 anchors (vertices). (b) flux-space with 125 milestones (vertices). The graphs are to illustrate the complexity of analysis and were prepared with the Pajek program. ^{43}

Visualization of average networks for helix unfolding under a load level 30 pN in (a) state-space, with 14 anchors (vertices). (b) flux-space with 125 milestones (vertices). The graphs are to illustrate the complexity of analysis and were prepared with the Pajek program. ^{43}

Global maximum weight paths using three different graph representations for helix unfolding under 0pN stress. Bottleneck edges (EMW) are in red. Note that the second path, represented in anchor space, has a loop to from alpha3 to alpha2. Directional Milestoning representation allows for such choices since entry milestones (interfaces) can be different for the same state. The source of the difference between the state-based and flux-based graph is the finer (more detailed) description of the system in the flux-based graph. (a) OpN, Path from state-space graph based on flux-based edge weights; (b) OpN, Path from flux-space graph based on flux-based edge wights; (c) OpN, Path from flux-space graph based on rate coefficients.

Global maximum weight paths using three different graph representations for helix unfolding under 0pN stress. Bottleneck edges (EMW) are in red. Note that the second path, represented in anchor space, has a loop to from alpha3 to alpha2. Directional Milestoning representation allows for such choices since entry milestones (interfaces) can be different for the same state. The source of the difference between the state-based and flux-based graph is the finer (more detailed) description of the system in the flux-based graph. (a) OpN, Path from state-space graph based on flux-based edge weights; (b) OpN, Path from flux-space graph based on flux-based edge wights; (c) OpN, Path from flux-space graph based on rate coefficients.

Global maximum weight paths using three different graph representations for helix unfolding under 30 pN stress. Bottleneck edges (EMW) are in red. (a) 30pN, Path from state-space graph based on flux-based edge weights; (b) 30pN, Path from flux-space graph based on flux-based edge weights; (c) 30pN, Path from flux-space graph based on rate coefficients.

Global maximum weight paths using three different graph representations for helix unfolding under 30 pN stress. Bottleneck edges (EMW) are in red. (a) 30pN, Path from state-space graph based on flux-based edge weights; (b) 30pN, Path from flux-space graph based on flux-based edge weights; (c) 30pN, Path from flux-space graph based on rate coefficients.

Global maximum weight paths using three different graph representations for helix unfolding under 70 pN stress. Bottleneck edges (EMW) are in red. (a) 70pN, Path from state-space graph based on flux-based edge weights; (b) 70pN, Path from flux-space graph based on flux-based edge weights; (c) 70pN, Path from flux-space graph based on rate coefficients.

Global maximum weight paths using three different graph representations for helix unfolding under 70 pN stress. Bottleneck edges (EMW) are in red. (a) 70pN, Path from state-space graph based on flux-based edge weights; (b) 70pN, Path from flux-space graph based on flux-based edge weights; (c) 70pN, Path from flux-space graph based on rate coefficients.

Global maximum weight paths using three different graph representations for membrane permeation of DOPC. The path descriptions are as follows: Path A: Path obtained from the state-space graph with flux-based weights. Path B: obtained from the flux-space graph with flux-based weights. Path C: obtained from the flux-space graph weighted by local rate coefficients.

Global maximum weight paths using three different graph representations for membrane permeation of DOPC. The path descriptions are as follows: Path A: Path obtained from the state-space graph with flux-based weights. Path B: obtained from the flux-space graph with flux-based weights. Path C: obtained from the flux-space graph weighted by local rate coefficients.

Visualization of proximity of paths shown in Figure 11 . All the nodes visited by the three paths are collected into a single network. The position of the nodes in the network are optimized using force model proposed in Gephi. ^{46} Connected nodes attract each other, while nodes that are far repel each other. The larger cyan nodes are the initial and final milestones. The path descriptions are as follows: Path A (green): Path obtained from the state-space graph with flux-based weights. Path B (red): obtained from the flux-space graph with flux-based weights. Path C (yellow): obtained from the flux-space graph weighted by local rate coefficients. This representation emphasizes similarities between paths A and B (because they both share several milestones), and their difference with respect to path C.

Visualization of proximity of paths shown in Figure 11 . All the nodes visited by the three paths are collected into a single network. The position of the nodes in the network are optimized using force model proposed in Gephi. ^{46} Connected nodes attract each other, while nodes that are far repel each other. The larger cyan nodes are the initial and final milestones. The path descriptions are as follows: Path A (green): Path obtained from the state-space graph with flux-based weights. Path B (red): obtained from the flux-space graph with flux-based weights. Path C (yellow): obtained from the flux-space graph weighted by local rate coefficients. This representation emphasizes similarities between paths A and B (because they both share several milestones), and their difference with respect to path C.

## Tables

Algorithm 1 - Dijkstra's algorithm to find the shortest path lengths from vertex s to all other vertices in graph G.

Algorithm 1 - Dijkstra's algorithm to find the shortest path lengths from vertex s to all other vertices in graph G.

Algorithm 2 - Modified Dijkstra's algorithm for finding maximum weights and bottleneck (EMW) edges from s to all other vertices in a graph G. Refer to the algorithm description for the meaning of all variables.

Algorithm 2 - Modified Dijkstra's algorithm for finding maximum weights and bottleneck (EMW) edges from s to all other vertices in a graph G. Refer to the algorithm description for the meaning of all variables.

Algorithm 3 – Recursive Dijkstra's algorithm to find the global maximum weight path between vertices s and t, in a directed graph, based on the modified Dijkstra algorithm for maximum weight paths. Refer to the Dijkstra's algorithm description for the meaning of variables.

Algorithm 3 – Recursive Dijkstra's algorithm to find the global maximum weight path between vertices s and t, in a directed graph, based on the modified Dijkstra algorithm for maximum weight paths. Refer to the Dijkstra's algorithm description for the meaning of variables.

Summary of asymptotic time complexities using various algorithms for dense (E ≈ V ^{2}) and sparse (E ≈ V) graphs. G: List means the graph is implemented using adjacency lists, G: Matrix means the graph is implemented using adjacency matrices, Q: Array means the priority queue in Dijkstra's algorithm is implemented using arrays and Q: Heap means the priority queue is implemented using Fibonacci heaps. These scaling factors have been derived in the Efficiency section of each algorithm.

Summary of asymptotic time complexities using various algorithms for dense (E ≈ V ^{2}) and sparse (E ≈ V) graphs. G: List means the graph is implemented using adjacency lists, G: Matrix means the graph is implemented using adjacency matrices, Q: Array means the priority queue in Dijkstra's algorithm is implemented using arrays and Q: Heap means the priority queue is implemented using Fibonacci heaps. These scaling factors have been derived in the Efficiency section of each algorithm.

Average runtimes in milliseconds for random graphs with 100, 1000, and 10 000 vertices, for the Recursive Dijkstra's, Edge-List Bisection, and Edge-Elimination algorithm. Runtimes were calculated on a single core of an 8 core Linux Intel Xeon X5460 processor with clock speed of 3.16 GHz and 16 GB memory shared among 8 cores. Runtimes were not calculated for the Edge-Elimination algorithm for 10 000 vertices since the estimated runtime was too long. Also shown is the number of edges for each size of random graphs.

Average runtimes in milliseconds for random graphs with 100, 1000, and 10 000 vertices, for the Recursive Dijkstra's, Edge-List Bisection, and Edge-Elimination algorithm. Runtimes were calculated on a single core of an 8 core Linux Intel Xeon X5460 processor with clock speed of 3.16 GHz and 16 GB memory shared among 8 cores. Runtimes were not calculated for the Edge-Elimination algorithm for 10 000 vertices since the estimated runtime was too long. Also shown is the number of edges for each size of random graphs.

Article metrics loading...

Full text loading...

Commenting has been disabled for this content