Volume 139, Issue 18, 14 November 2013

The calculation of a proteinligand binding free energy based on molecular dynamics (MD) simulations generally relies on a thermodynamic cycle in which the ligand is alchemically inserted into the system, both in the solvated protein and free in solution. The corresponding ligandinsertion free energies are typically calculated in nanoscale computational boxes simulated under periodic boundary conditions and considering electrostatic interactions defined by a periodic latticesum. This is distinct from the ideal bulk situation of a system of macroscopic size simulated under nonperiodic boundary conditions with Coulombic electrostatic interactions. This discrepancy results in finitesize effects, which affect primarily the charging component of the insertion free energy, are dependent on the box size, and can be large when the ligand bears a net charge, especially if the protein is charged as well. This article investigates finitesize effects on calculated charging free energies using as a test case the binding of the ligand 2amino5methylthiazole (net charge +1 e) to a mutant form of yeast cytochrome c peroxidase in water. Considering different charge isoforms of the protein (net charges −5, 0, +3, or +9 e), either in the absence or the presence of neutralizing counterions, and sizes of the cubic computational box (edges ranging from 7.42 to 11.02 nm), the potentially large magnitude of finitesize effects on the raw charging free energies (up to 17.1 kJ mol^{−1}) is demonstrated. Two correction schemes are then proposed to eliminate these effects, a numerical and an analytical one. Both schemes are based on a continuumelectrostatics analysis and require performing PoissonBoltzmann (PB) calculations on the proteinligand system. While the numerical scheme requires PB calculations under both nonperiodic and periodic boundary conditions, the latter at the box size considered in the MD simulations, the analytical scheme only requires three nonperiodic PB calculations for a given system, its dependence on the box size being analytical. The latter scheme also provides insight into the physical origin of the finitesize effects. These two schemes also encompass a correction for discrete solvent effects that persists even in the limit of infinite box sizes. Application of either scheme essentially eliminates the size dependence of the corrected charging free energies (maximal deviation of 1.5 kJ mol^{−1}). Because it is simple to apply, the analytical correction scheme offers a general solution to the problem of finitesize effects in freeenergy calculations involving charged solutes, as encountered in calculations concerning, e.g., proteinligand binding, biomolecular association, residue mutation, pKa and redox potential estimation, substrate transformation, solvation, and solventsolvent partitioning.
 ARTICLES

 Theoretical Methods and Algorithms

Polyad quantum numbers and multiple resonances in anharmonic vibrational studies of polyatomic molecules
View Description Hide DescriptionIn the theory of anharmonic vibrations of a polyatomic molecule, mixing the zeroorder vibrational states due to cubic, quartic and higherorder terms in the potential energy expansion leads to the appearance of moreorless isolated blocks of states (also called polyads), connected through multiple resonances. Such polyads of states can be characterized by a common secondary integer quantum number. This polyad quantum number is defined as a linear combination of the zeroorder vibrational quantum numbers, attributed to normal modes, multiplied by nonnegative integer polyad coefficients, which are subject to definition for any particular molecule. According to Kellman's method [J. Chem. Phys. 93, 6630 (1990)], the corresponding formalism can be conveniently described using vector algebra. In the present work, a systematic consideration of polyad quantum numbers is given in the framework of the canonical Van Vleck perturbation theory (CVPT) and its numericalanalytic operator implementation for reducing the Hamiltonian to the quasidiagonal form, earlier developed by the authors. It is shown that CVPT provides a convenient method for the systematic identification of essential resonances and the definition of a polyad quantum number. The method presented is generally suitable for molecules of significant size and complexity, as illustrated by several examples of molecules up to six atoms. The polyad quantum number technique is very useful for assembling comprehensive basis sets for the matrix representation of the Hamiltonian after removal of all nonresonance terms by CVPT. In addition, the classification of anharmonic energy levels according to their polyad quantum numbers provides an additional means for the interpretation of observed vibrational spectra.

Violation of the massaction law in dilute chemical systems
View Description Hide DescriptionThe massaction law, which predicts the rates of chemical reactions, is widely used for modeling the kinetics of the chemical reactions and their stationary states, also for complex chemical reaction networks. However, violations of the massaction equations have been reported in various cases: in confined systems with a small number of molecules, in nonideallystirred systems, when the reactions are limited by the diffusion, at high concentrations of reactants, or in chemical reaction networks with marginally stable massaction equations. In this paper, I describe a new mechanism, leading to the violation of the massaction equations, that takes place at a low concentration of at least one of the reactants; in this limit, the reaction rates can be easily inferred from the chemical reaction network. I propose that this mechanism underlies the replication stability of the hypercycles, a class of chemical reaction networks hypothetically connected with abiogenesis. I provide two simple examples of chemical reaction networks in which the mechanism leading to the violation of the massaction law is present. I study the two chemical reaction networks by means of a simulation performed with a cellular automaton model. The results have a general validity and represent a limitation of the validity of the massaction law, which has been overlooked up to now in the studies about the chemical reaction networks.

Calculating the binding free energies of charged species based on explicitsolvent simulations employing latticesum methods: An accurate correction scheme for electrostatic finitesize effects
View Description Hide DescriptionThe calculation of a proteinligand binding free energy based on molecular dynamics (MD) simulations generally relies on a thermodynamic cycle in which the ligand is alchemically inserted into the system, both in the solvated protein and free in solution. The corresponding ligandinsertion free energies are typically calculated in nanoscale computational boxes simulated under periodic boundary conditions and considering electrostatic interactions defined by a periodic latticesum. This is distinct from the ideal bulk situation of a system of macroscopic size simulated under nonperiodic boundary conditions with Coulombic electrostatic interactions. This discrepancy results in finitesize effects, which affect primarily the charging component of the insertion free energy, are dependent on the box size, and can be large when the ligand bears a net charge, especially if the protein is charged as well. This article investigates finitesize effects on calculated charging free energies using as a test case the binding of the ligand 2amino5methylthiazole (net charge +1 e) to a mutant form of yeast cytochrome c peroxidase in water. Considering different charge isoforms of the protein (net charges −5, 0, +3, or +9 e), either in the absence or the presence of neutralizing counterions, and sizes of the cubic computational box (edges ranging from 7.42 to 11.02 nm), the potentially large magnitude of finitesize effects on the raw charging free energies (up to 17.1 kJ mol^{−1}) is demonstrated. Two correction schemes are then proposed to eliminate these effects, a numerical and an analytical one. Both schemes are based on a continuumelectrostatics analysis and require performing PoissonBoltzmann (PB) calculations on the proteinligand system. While the numerical scheme requires PB calculations under both nonperiodic and periodic boundary conditions, the latter at the box size considered in the MD simulations, the analytical scheme only requires three nonperiodic PB calculations for a given system, its dependence on the box size being analytical. The latter scheme also provides insight into the physical origin of the finitesize effects. These two schemes also encompass a correction for discrete solvent effects that persists even in the limit of infinite box sizes. Application of either scheme essentially eliminates the size dependence of the corrected charging free energies (maximal deviation of 1.5 kJ mol^{−1}). Because it is simple to apply, the analytical correction scheme offers a general solution to the problem of finitesize effects in freeenergy calculations involving charged solutes, as encountered in calculations concerning, e.g., proteinligand binding, biomolecular association, residue mutation, pKa and redox potential estimation, substrate transformation, solvation, and solventsolvent partitioning.

Linearscaling symmetryadapted perturbation theory with scaled dispersion
View Description Hide DescriptionWe present a linearscaling symmetryadapted perturbation theory (SAPT) method that is based on an atomic orbital (AO) formulation of zerothorder SAPT (SAPT0). The nondispersive terms are realized with linearscaling cost using both the continuous fast multipole method (CFMM) and the linear exchange (LinK) approach for integral contractions as well as our efficient Laplacebased coupledperturbed selfconsistent field method (DLCPSCF) for evaluating response densities. The reformulation of the dispersion term is based on our linearscaling AO MøllerPlesset secondorder perturbation theory (AOMP2) method, that uses our recently introduced QQRtype screening [S. A. Maurer, D. S. Lambrecht, J. Kussmann, and C. Ochsenfeld, J. Chem. Phys.138, 014101 (2013)] for preselecting numerically significant energy contributions. Similar to scaled oppositespin MP2, we neglect the exchangedispersion term in SAPT and introduce a scaling factor for the dispersion term, which compensates for the error and at the same time accounts for basis set incompleteness effects and intramonomer correlation. We show in extensive benchmark calculations that the new scaleddispersion (sd)SAPT0 approach provides reliable results for small and large interacting systems where the results with a small 631G** basis are roughly comparable to supermolecular MP2 calculations in a triplezeta basis. The performance of our method is demonstrated with timings on cellulose fragments, DNA systems, and cutouts of a proteinligand complex with up to 1100 atoms on a single computer core.

Characterizing Ndimensional anisotropic Brownian motion by the distribution of diffusivities
View Description Hide DescriptionAnisotropic diffusion processes emerge in various fields such as transport in biological tissue and diffusion in liquid crystals. In such systems, the motion is described by a diffusion tensor. For a proper characterization of processes with more than one diffusion coefficient, an average description by the mean squared displacement is often not sufficient. Hence, in this paper, we use the distribution of diffusivities to study diffusion in a homogeneous anisotropic environment. We derive analytical expressions of the distribution and relate its properties to an anisotropy measure based on the mean diffusivity and the asymptotic decay of the distribution. Both quantities are easy to determine from experimental data and reveal the existence of more than one diffusion coefficient, which allows the distinction between isotropic and anisotropic processes. We further discuss the influence on the analysis of projected trajectories, which are typically accessible in experiments. For the experimentally most relevant cases of two and threedimensional anisotropic diffusion, we derive specific expressions, determine the diffusion tensor, characterize the anisotropy, and demonstrate the applicability for simulated trajectories.

The instantaneous fluctuation theorem
View Description Hide DescriptionWe give a derivation of a new instantaneous fluctuation relation for an arbitrary phase function which is odd under time reversal. The form of this new relation is not obvious, and involves observing the system along its transient phase space trajectory both before and after the point in time at which the fluctuations are being compared. We demonstrate this relation computationally for a number of phase functions in a shear flow system and show that this nonlocality in time is an essential component of the instantaneous fluctuation theorem.

Asymptotic expansion of twoelectron integrals and its application to Coulomb and exchange lattice sums in metallic, semimetallic, and nonmetallic crystals
View Description Hide DescriptionA simple, easily implemented, accurate, and efficient approximation of longrange electronelectronrepulsion and electronnucleusattraction integrals is proposed. It replaces each product of two atomicorbital (AO) basis functions of an electron by a point charge centered at the midpoint of the two AO's. The magnitude of the point charge is equal to the overlap integral of the two AO's. Each integral is then rapidly evaluated in the direct algorithm as a Coulomb interaction between two point charges. This scheme is implemented in ab initio Hartree–Fock crystalline orbital theory and tested for one, two, and threedimensional solids of metallic, semimetallic, and nonmetallic electronic structures, in which the lattice sums of the direct Coulomb and/or exchange interactions are expected to be slowly convergent. It is shown that this approximation reduces operation and/or memory costs by up to an order of magnitude to achieve converged lattice sums, although the scaling (size dependence) of operation cost is unchanged. An improved criterion for truncating the exchange lattice sum is also proposed.

Assessment of G3(MP2)//B3 theory including a pseudopotential for molecules containing first, second, and thirdrow representative elements
View Description Hide DescriptionG3(MP2)//B3 theory was modified to incorporate compact effective potential (CEP) pseudopotentials, providing a theoretical alternative referred to as G3(MP2)//B3CEP for calculations involving first, second, and thirdrow representative elements. The G3/05 test set was used as a standard to evaluate the accuracy of the calculated properties. G3(MP2)//B3CEP theory was applied to the study of 247 standard enthalpies of formation, 104 ionization energies, 63 electron affinities, 10 proton affinities, and 22 atomization energies, comprising 446 experimental energies. The mean absolute deviations compared with the experimental data for all thermochemical results presented an accuracy of 1.4 kcal mol^{−1} for G3(MP2)//B3 and 1.6 kcal mol^{−1} for G3(MP2)//B3CEP. Approximately 75% and 70% of the calculated properties are found with accuracy between ±2 kcal mol^{−1} for G3(MP2)//B3 and G3(MP2)//B3CEP, respectively. Considering a confidence interval of 95%, the results may oscillate between ±4.2 kcal mol^{−1} and ±4.6 kcal mol^{−1}, respectively. The overall statistical behavior indicates that the calculations using pseudopotential present similar behavior with the allelectron theory. Of equal importance to the accuracy is the CPU time, which was reduced by between 10% and 40%.

Nonlinear intrinsic variables and state reconstruction in multiscale simulations
View Description Hide DescriptionFinding informative lowdimensional descriptions of highdimensional simulation data (like the ones arising in molecular dynamics or kinetic Monte Carlo simulations of physical and chemical processes) is crucial to understanding physical phenomena, and can also dramatically assist in accelerating the simulations themselves. In this paper, we discuss and illustrate the use of nonlinear intrinsic variables (NIV) in the mining of highdimensional multiscale simulation data. In particular, we focus on the way NIV allows us to functionally merge different simulation ensembles, and different partial observations of these ensembles, as well as to infer variables not explicitly measured. The approach relies on certain simple features of the underlying process variability to filter out measurement noise and systematically recover a unique reference coordinate frame. We illustrate the approach through two distinct sets of atomistic simulations: a stochastic simulation of an enzyme reaction network exhibiting both fast and slow time scales, and a molecular dynamics simulation of alanine dipeptide in explicit water.

Broken symmetry approach to density functional calculation of zero field splittings including anisotropic exchange interactions
View Description Hide DescriptionThe broken symmetry approach to the calculation of zero field splittings (or magnetic anisotropies) of multinuclear transition metal complexes is further developed. A procedure is suggested how to extract spin Hamiltonian parameters for anisotropic exchange from a set of broken symmetry density functional calculations. For isotropic exchange coupling constants J ij , the established procedure is retrieved, and anisotropic (or pseudodipolar) exchange coupling tensors D ij are obtained analogously. This procedure only yields the sum of the individual singleion zero field splitting tensors D i . Therefore, a procedure based on localized orbitals has been developed to extract the individual singleion contributions. With spin Hamiltonian parameters at hand, the zero field splittings of the individual spin multiplets are calculated by an exact diagonalization of the isotropic part, followed by a spin projection done numerically. The method is applied to the binuclear cation [LCr(OH)3CrL]^{3 +} (L = 1,4,7trimethyl1,4,7triazanonane) for which experimental zero field splittings for all lowenergy spin states are known, and to the singlemolecule magnet [Fe4(CH3C(CH2O)3)2(dpm)6] (Hdpm = 2,2,6,6tetramethylheptane3,5dione). In both these 3d compounds, the singleion tensors mainly come from the spinorbit interaction. Anisotropic exchange is dominated by the spindipolar interaction only for the chromium compound. Despite the rather small isotropic exchange couplings in the iron compound, spinorbit and spindipolar contributions to anisotropic exchange are of similar size here.

Shortcomings of the standard Lennard–Jones dispersion term in water models, studied with force matching
View Description Hide DescriptionIn this work, ab initio parametrization of water force field is used to get insights into the functional form of empirical potentials to properly model the physics underlying dispersion interactions. We exploited the force matching algorithm to fit the interaction forces obtained with dispersion corrected density functional theory based molecular dynamics simulations. We found that the standard LennardJones interaction potentials poorly reproduce the attractive character of dispersion forces. This drawback can be resolved by accounting for the distinctive short range behavior of dispersion interactions, multiplying the r ^{−6} term by a damping function. We propose two novel parametrizations of the force field using different damping functions. Structural and dynamical properties of the new models are computed and compared with the ones obtained from the nondamped force field, showing an improved agreement with reference first principle calculations.

Efficient calculation of manybody induced electrostatics in molecular systems
View Description Hide DescriptionPotential energy functions including manybody polarization are in widespread use in simulations of aqueous and biological systems, metalorganics, molecular clusters, and other systems where electronically induced redistribution of charge among local atomic sites is of importance. The polarization interactions, treated here via the methods of Thole and Applequist, while longranged, can be computed for moderatesized periodic systems with extremely high accuracy by extending Ewald summation to the induced fields as demonstrated by Nymand, Sala, and others. These full Ewald polarization calculations, however, are expensive and often limited to very small systems, particularly in Monte Carlo simulations, which may require energy evaluation over several hundredthousand configurations. For such situations, it shall be shown that sufficiently accurate computation of the polarization energy can be produced in a fraction of the central processing unit (CPU) time by neglecting the longrange extension to the induced fields while applying the longrange treatments of Ewald or Wolf to the static fields; these methods, denoted Ewald EStatic and Wolf EStatic (WES), respectively, provide an effective means to obtain polarization energies for intermediate and large systems including those with several thousand polarizable sites in a fraction of the CPU time. Furthermore, we shall demonstrate a means to optimize the damping for WES calculations via extrapolation from smaller trial systems.

Nbody:Manybody QM:QM vibrational frequencies: Application to small hydrogenbonded clusters
View Description Hide DescriptionWe present an efficient method for reproducing CCSD(T) (i.e., the coupledcluster method with single, double and perturbative connected triple excitations) optimized geometries and harmonic vibrational frequencies for molecular clusters with the Nbody:Manybody QM:QM technique. In this work, all 1body through Nbody interactions are obtained from CCSD(T) computations, and the higherorder interactions are captured at the MP2 level. The linear expressions from the manybody expansion facilitate a straightforward evaluation of geometrical derivative properties (e.g., gradients and Hessians). For (H2O) n clusters (n = 3–7), optimized structures obtained with the 2body:Manybody CCSD(T):MP2 method are virtually identical to CCSD(T) optimized geometries. Harmonic vibrational frequencies calculated with this 2body:Manybody approach differ from CCSD(T) frequencies by at most a few cm^{−1}. These deviations can be systematically reduced by including more terms from the manybody expansion at the CCSD(T) level. Maximum deviations between CCSD(T) and 3body:Manybody CCSD(T):MP2 frequencies are typically only a few tenths of a cm^{−1} for the H2O clusters examined in this work. These results are obtained at a fraction of the wall time of the supermolecular CCSD(T) computation, and the approach is wellsuited for parallelization on relatively modest computational hardware.

Projected and hidden Markov models for calculating kinetics and metastable states of complex molecules
View Description Hide DescriptionMarkov state models (MSMs) have been successful in computing metastable states, slow relaxation timescales and associated structural changes, and stationary or kinetic experimental observables of complex molecules from large amounts of molecular dynamics simulation data. However, MSMs approximate the true dynamics by assuming a Markov chain on a clusters discretization of the state space. This approximation is difficult to make for highdimensional biomolecular systems, and the quality and reproducibility of MSMs has, therefore, been limited. Here, we discard the assumption that dynamics are Markovian on the discrete clusters. Instead, we only assume that the full phasespace molecular dynamics is Markovian, and a projection of this full dynamics is observed on the discrete states, leading to the concept of Projected Markov Models (PMMs). Robust estimation methods for PMMs are not yet available, but we derive a practically feasible approximation via Hidden Markov Models (HMMs). It is shown how various molecular observables of interest that are often computed from MSMs can be computed from HMMs/PMMs. The new framework is applicable to both, simulation and singlemolecule experimental data. We demonstrate its versatility by applications to educative model systems, a 1 ms Anton MD simulation of the bovine pancreatic trypsin inhibitor protein, and an optical tweezer force probe trajectory of an RNA hairpin.

A new postquantization constrained propagator for rigid tops for use in path integral quantum simulations
View Description Hide DescriptionIn this paper, we extend the previously introduced PostQuantization Constraints (PQC) procedure [G. Guillon, T. Zeng, and P.N. Roy, J. Chem. Phys.138, 184101 (2013)] to construct approximate propagators and energy estimators for different rigid body systems, namely, the spherical, symmetric, and asymmetric tops. These propagators are for use in Path Integral simulations. A thorough discussion of the underlying geometrical concepts is given. Furthermore, a detailed analysis of the convergence properties of the density as well as the energy estimators towards their exact counterparts is presented along with illustrative numerical examples. The PostQuantization Constraints approach can yield converged results and is a practical alternative to socalled sum over states techniques, where one has to expand the propagator as a sum over a complete set of rotational stationary states [as in E. G. Noya, C. Vega, and C. McBride, J. Chem. Phys.134, 054117 (2011)] because of its modest memory requirements.

Extreme densitydriven delocalization error for a model solvatedelectron system
View Description Hide DescriptionDelocalization (or chargetransfer) error is one of the scarce but spectacular failures of densityfunctional theory. It is particularly apparent in extensively delocalized molecules, and manifests in the calculation of bandgaps, reaction barriers, and dissociation limits. Even though delocalization error is always present in the selfconsistent electron density, the differences from reference densities are often quite subtle and the error tends to be driven by the exchangecorrelation energy expression. In this article, we propose a model system (the Kevan model) where approximate density functionals predict dramatically different charge distributions because of delocalization error. The model system consists of an electron trapped in a water hexamer and is a finite representation of an experimentally observed class of solids: electrides. The Kevan model is of fundamental interest because it allows the estimation of charge transfer error without recourse to fractional charge calculations, but our results are also relevant in the context of the modeling of confined electrons in densityfunctional theory.

Molecular electrostatic potentials by systematic molecular fragmentation
View Description Hide DescriptionA simple method is presented for estimating the molecular electrostatic potential in and around molecules using systematic molecular fragmentation. This approach estimates the potential directly from the electron density. The accuracy of the method is established for a set of organic molecules and ions. The utility of the approach is demonstrated by estimating the binding energy of a water molecule in an internal cavity in the protein ubiquitin.

Metrics for measuring distances in configuration spaces
View Description Hide DescriptionIn order to characterize molecular structures we introduce configurational fingerprint vectors which are counterparts of quantities used experimentally to identify structures. The Euclidean distance between the configurational fingerprint vectors satisfies the properties of a metric and can therefore safely be used to measure dissimilarities between configurations in the high dimensional configuration space. In particular we show that these metrics are a perfect and computationally cheap replacement for the rootmeansquare distance (RMSD) when one has to decide whether two noise contaminated configurations are identical or not. We introduce a Monte Carlo approach to obtain the global minimum of the RMSD between configurations, which is obtained from a global minimization over all translations, rotations, and permutations of atomic indices.

The critical compressibility factor of fluids from the global isomorphism approach
View Description Hide DescriptionThe relation between the critical compressibility factors Z c of the LennardJones fluid and the Lattice Gas (Ising model) is derived within the global isomorphism approach. On this basis, we obtain the alternative form for the value of the critical compressibility factor which is different from widely used phenomenological Timmermans relation. The estimates for the critical pressure P c and Z c of the LennardJones fluid are obtained in case of two and three dimensions. The extension of the formalism is proposed to include the Pitzer's acentric factor into consideration.
 Advanced Experimental Techniques

Enhanced sensitivity in H photofragment detection by twocolor reducedDoppler ion imaging
View Description Hide DescriptionTwocolor reducedDoppler (TCRD) and onecolor velocity map imaging (VMI) were used for probing H atom photofragments resulting from the ∼243.1 nm photodissociation of pyrrole. The velocity components of the H photofragments were probed by employing two counterpropagating beams at close and fixed wavelengths of 243.15 and 243.12 nm in TCRD and a single beam at ∼243.1 nm, scanned across the Doppler profile in VMI. The TCRD imaging enabled probing of the entire velocity distribution in a single pulse, resulting in enhanced ionization efficiency, as well as improved sensitivity and signaltonoise ratio. These advantages were utilized for studying the pyrrole photodissociation at ∼243.1 and 225 nm, where the latter wavelength provided only a slight increase in the H yield over the selfsignal from the probe beams. The TCRD imaging enabled obtaining high quality H^{+} images, even for the low H photofragment yields formed in the 225 nm photolysis process, and allowed determining the velocity distributions and anisotropy parameters and getting insight into pyrrole photodissociation.