^{1}, A. Pukrittayakamee^{2}, L. M. Raff^{3,a)}, M. Hagan^{2}, S. Bukkapatnam^{4} and R. Komanduri^{1}

### Abstract

A novel method is presented that significantly reduces the computational bottleneck of executing high-level, electronic structure calculations of the energies and their gradients for a large database that adequately samples the configuration space of importance for systems containing more than four atoms that are undergoing multiple, simultaneous reactions in several energetically open channels. The basis of the method is the high degree of correlation that generally exists between the Hartree–Fock (HF) and higher-level electronic structure energies. It is shown that if the input vector to a neural network (NN) includes both the configuration coordinates and the HF energies of a small subset of the database, MP4(SDQ) energies with the same basis set can be predicted for the entire database using only the HF and MP4(SDQ) energies for the small subset and the HF energies for the remainder of the database. The predictive error is shown to be less than or equal to the NN fitting error if a NN is fitted to the entire database of higher-level electronic structure energies. The general method is applied to the computation of MP4(SDQ) energies of 68 308 configurations that comprise the database for the simultaneous, unimolecular decomposition of vinyl bromide into six different reaction channels. The predictive accuracy of the method is investigated by employing successively smaller subsets of the database to train the NN to predict the MP4(SDQ) energies of the remaining configurations of the database. The results indicate that for this system, the subset can be as small as 8% of the total number of configurations in the database without loss of accuracy beyond that expected if a NN is employed to fit the higher-level energies for the entire database. The utilization of this procedure is shown to save about 78% of the total computational time required for the execution of the MP4(SDQ) calculations. The sampling error involved with selection of the subset is shown to be about 10% of the predictive error for the higher-level energies. A practical procedure for utilization of the method is outlined. It is suggested that the method will be equally applicable to the prediction of electronic structure energies computed using even higher-level methods than MP4(SDQ).
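The idea in the abstract can be illustrated with a minimal sketch on synthetic data. Everything in it is an assumption made for illustration: the toy database, the function names (`make_toy_database`, `train_mlp`, `run_demo`), the small one-hidden-layer tanh network, and the plain gradient-descent training are stand-ins, not the vinyl bromide data, the network architecture, or the training algorithm used in the paper. The sketch only shows the data flow: a network whose input vector holds the coordinates plus a cheap low-level energy is trained on a small subset for which the expensive energy is known, and then predicts the expensive energy for the remainder of the database.

```python
import numpy as np


def make_toy_database(n=2000, d=3, seed=0):
    """Synthetic stand-in for an ab initio database (not vinyl bromide data)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=(n, d))             # toy configuration coordinates
    e_low = np.sum(x**2, axis=1)                         # plays the role of the HF energy
    e_high = 1.1 * e_low + 0.1 * np.sin(3.0 * x[:, 0])   # correlated "higher-level" energy
    return x, e_low, e_high


def train_mlp(inputs, targets, hidden=10, steps=3000, lr=0.02, seed=1):
    """Fit a one-hidden-layer tanh network by full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    mu, sd = inputs.mean(axis=0), inputs.std(axis=0)     # input scaling
    t_mu, t_sd = targets.mean(), targets.std()           # target scaling
    p = (inputs - mu) / sd
    t = ((targets - t_mu) / t_sd)[:, None]
    w1 = rng.normal(0.0, 1.0 / np.sqrt(p.shape[1]), size=(p.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=(hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(steps):
        a1 = np.tanh(p @ w1 + b1)                        # hidden-layer output
        err = a1 @ w2 + b2 - t                           # scaled prediction error
        losses.append(float(np.mean(err**2)))
        g_out = 2.0 * err / len(p)                       # d(MSE)/d(output)
        g_z1 = (g_out @ w2.T) * (1.0 - a1**2)            # backpropagate through tanh
        w2 -= lr * (a1.T @ g_out)
        b2 -= lr * g_out.sum(axis=0)
        w1 -= lr * (p.T @ g_z1)
        b1 -= lr * g_z1.sum(axis=0)

    def predict(new_inputs):
        q = (new_inputs - mu) / sd
        return (np.tanh(q @ w1 + b1) @ w2 + b2)[:, 0] * t_sd + t_mu

    return predict, losses


def run_demo(train_fraction=0.10, seed=2):
    """Train on a small random subset, predict the rest, report MAE and RMSE."""
    x, e_low, e_high = make_toy_database()
    inputs = np.column_stack([x, e_low])                 # coordinates plus low-level energy
    order = np.random.default_rng(seed).permutation(len(x))
    n_train = int(train_fraction * len(x))
    train, rest = order[:n_train], order[n_train:]
    predict, losses = train_mlp(inputs[train], e_high[train])
    err = predict(inputs[rest]) - e_high[rest]
    return {
        "n_train": n_train,
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err**2))),
        # Reference point: MAE of always predicting the training-set mean.
        "baseline_mae": float(np.mean(np.abs(e_high[rest] - e_high[train].mean()))),
        "losses": losses,
    }


if __name__ == "__main__":
    result = run_demo()
    print({k: v for k, v in result.items() if k != "losses"})
```

Calling `run_demo` with different `train_fraction` values mimics, in spirit only, the study of successively smaller training subsets; dropping `e_low` from `np.column_stack` would give the coordinates-only variant for comparison.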

This project is funded by a grant from the National Science Foundation (Grant No. DMI-0457663), Division of Civil, Mechanical, and Manufacturing Innovation (CMMI). The authors thank Dr. Jocelyn Harrison, Program Director for Materials Processing and Manufacturing, for the interest and support of this work. One of the authors (R.K.) also thanks the A. H. Nelson, Jr. Endowed Chair for additional financial support.

I. INTRODUCTION

II. CONCEPT AND GENERAL PROCEDURES

III. APPLICATION TO VINYL BROMIDE DATABASE

IV. DISCUSSION AND EVALUATION OF THE METHOD

V. SUMMARY AND CONCLUSIONS

### Key Topics

- Databases (76.0)
- Ab initio calculations (20.0)
- Electronic structure calculations (16.0)
- Testing procedures (16.0)
- Electronic structure (10.0)

## Figures

Variation in the MP4(SDQ) energies as a function of the HF energy in eV for the 68 308 configurations in the vinyl bromide database.

Comparison of the spread of the differences between the predicted MP4(SDQ) energies obtained from the median network with and the computed *ab initio* MP4(SDQ) energies in eV. The spread increases significantly at larger energies due to the much wider spread of vinyl bromide configurations that have a given HF energy, each with differing amounts of correlation energy.

Distribution of the predicted MP4(SDQ) energies obtained from the median NN with and the computed *ab initio* MP4(SDQ) energies in eV using GAUSSIAN-03. The MAE of the distribution is 0.0549 eV.

Comparison of the spread of the differences between the predicted MP4(SDQ) energies obtained from the median network with and the computed *ab initio* MP4(SDQ) energies in eV. The spread increases significantly at larger energies due to the much wider spread of vinyl bromide configurations that have a given HF energy, each with differing amounts of correlation energy. It is smaller than that seen in Fig. 2 because of the much larger training set.

Distribution of the predicted MP4(SDQ) energies obtained from the median NN with and a training set that comprises 60% of the database and the computed *ab initio* MP4(SDQ) energies in eV using GAUSSIAN-03. The MAE of the distribution is 0.0292 eV.

Atom notation used to specify the NN input elements for vinyl bromide.

Variation in the MAE on the testing set vs data with and without HF energy in the NN input vector.

Flowchart for practical application of the method.

Illustration of single-input neuron.

Illustration of a NN layer with neurons.

Matrix illustration of a three-layer MLP network.

## Tables

Typical fitting accuracy of MSI methods for three- and four-body systems undergoing a single, two-center bond dissociation reaction.

Typical fitting accuracy of IMLS methods for three- and four-body systems undergoing a single, two-center bond dissociation reaction.

Typical fitting accuracy of NN methods for three-, four-, and six-body systems undergoing multiple two-, three-, and/or four-center reactions.

Database sizes required for convergence of some systems whose dynamics has been investigated using *ab initio* methods.

Median error values, RMSE and MAE, using nine different random samplings for eight different values of for the 68 308 point database for vinyl bromide. All errors are given in units of eV. In each case, . In each case, the input vector for the (16-80-1) NN comprises the 15 bond distances and the HF energy for vinyl bromide.

Typical MAE results obtained for each of nine different random samplings of the vinyl bromide database for , 0.40, and 0.70. All errors are given in units of eV. The standard deviations of the nine results from the median results reported in Table V are given at the bottom of each column. These deviations are a measure of the expected sampling error of the method.

Sensitivity of the computed MSE of the NN to each of the 16 input elements for the median NNs trained with and 10% of the database used for training and with and 60% of the database used for training. The results are all normalized with the largest sensitivity being set equal to unity. See text for the definition of sensitivity. The notation for the input elements follows the atom numbering given in Fig. 6.

Median error values, RMSE and MAE, using nine different random samplings for four different values of for the 68 308 point database for vinyl bromide with a NN that does not contain the HF energy of the configuration as one of the input elements. Therefore, the (15-80-1) NN has only the 15 configuration coordinates in the input vector. All errors are given in units of eV. In each case, .

Testing set errors for the NNs and as described in the text. All errors are given in eV. In each case, the architecture of the networks is (15-140-1).
