*ab initio* potential-energy databases

^{1}, L. M. Raff^{2,a)}, M. Hagan^{3}, S. Bukkapatnam^{4}, and R. Komanduri^{1}

### Abstract

The variation in the fitting accuracy of neural networks (NNs) when used to fit databases comprising potential energies obtained from *ab initio* electronic structure calculations is investigated as a function of the number and nature of the elements employed in the input vector to the NN. *Ab initio* databases for , HONO, , and were employed in the investigations. These systems were chosen to include four-, five-, and six-body systems containing first, second, third, and fourth row elements with a wide variety of chemical bonding, and whose conformations cover a wide range of structures that occur under high-energy machining conditions and in chemical reactions involving *cis-trans* isomerizations, six different types of two-center bond ruptures, and two different three-center dissociation reactions. The *ab initio* databases for these systems were obtained using density functional theory/B3LYP, MP2, and MP4 methods with extended basis sets. A total of 31 input vectors were investigated. In each case, the elements of the input vector were chosen from interatomic distances, inverse powers of the interatomic distances, three-body angles, and dihedral angles. Both redundant and nonredundant input vectors were investigated. The results show that, among all the input vectors investigated, the set employed in the Z-matrix specification of the molecular configurations in the electronic structure calculations gave the lowest NN fitting accuracy for both and vinyl bromide. The underlying reason for this result appears to be the discontinuity present in the dihedral angle for planar geometries. The use of trigonometric functions of the angles as input elements produced significantly improved fitting accuracy because this choice eliminates the discontinuity. The most accurate fitting was obtained when the elements of the input vector were taken to have the form $r_{ij}^{-n}$, where the $r_{ij}$ are the interatomic distances.
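The dihedral-angle discontinuity mentioned above can be illustrated with a minimal sketch. The abstract does not say which trigonometric functions were used, so the (sin, cos) pair below is an assumption, and the geometry, the helper names (`dihedral`, `encode`), and the displacement `eps` are hypothetical choices made only for illustration. Two nearly identical, nearly planar trans geometries yield raw dihedral angles that differ by almost 2π, while their trigonometric encodings differ only slightly.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in radians, in [-pi, pi], for the chain p0-p1-p2-p3."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.atan2(y, x)

def encode(phi):
    """Assumed NN input encoding (not taken from the paper): (sin(phi), cos(phi))."""
    return (math.sin(phi), math.cos(phi))

# Hypothetical test geometry: the last atom sits slightly above or slightly
# below the plane defined by the first three atoms, in a trans arrangement.
eps = 1e-3
p0, p1, p2 = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
phi_up = dihedral(p0, p1, p2, (1.0, -1.0, +eps))
phi_down = dihedral(p0, p1, p2, (1.0, -1.0, -eps))

raw_jump = abs(phi_up - phi_down)                            # close to 2*pi
encoded_jump = math.dist(encode(phi_up), encode(phi_down))   # of order eps
print(raw_jump, encoded_jump)
```

A network fed the raw angle sees two very different inputs for what is essentially one configuration, whereas the encoded inputs vary smoothly through the planar geometry.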
When the Levenberg–Marquardt procedure was modified to permit error minimization with respect to n as well as the weights and biases of the NN, the optimum powers were all found to lie in the range 1.625–2.38 for the four systems studied. No statistically significant increase in fitting accuracy was achieved for vinyl bromide when a different value of n was employed and optimized for each bond type. The rate of change of the fitting error with n is found to be very small when n is near its optimum value. Consequently, good fitting accuracy can be achieved by employing a value of n in the middle of the above range. The use of interparticle distances as elements of the input vector, rather than the Z-matrix variables employed in the electronic structure calculations, is found to reduce the rms fitting errors by factors of 8.86 and 1.67 for and vinyl bromide, respectively. If the interparticle distances are replaced with their inverse powers $r_{ij}^{-n}$ with n optimized, further reductions in the rms error by factors of 1.31 to 2.83 are obtained for the four systems investigated. A major advantage of using this procedure to increase NN fitting accuracy, rather than increasing the number of neurons or the size of the database, is that the required increase in computational effort is very small.
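As a rough illustration of the inverse-power input vector, the sketch below builds the inputs $r_{ij}^{-n}$ from Cartesian coordinates and evaluates their derivative with respect to n, $\partial r_{ij}^{-n}/\partial n = -\ln(r_{ij})\, r_{ij}^{-n}$, which is the extra term a gradient-based optimizer would need (through the chain rule) if n were treated as an adjustable parameter. This is not the authors' modified Levenberg–Marquardt code; the function names and the four-atom test coordinates are hypothetical.

```python
import math
from itertools import combinations

def pair_distances(coords):
    """All interparticle distances r_ij with i < j."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]

def inverse_power_inputs(coords, n=2.0):
    """Input vector with elements r_ij ** (-n)."""
    return [r ** (-n) for r in pair_distances(coords)]

def d_inputs_dn(coords, n=2.0):
    """Derivative of each input with respect to n: -ln(r_ij) * r_ij ** (-n)."""
    return [-math.log(r) * r ** (-n) for r in pair_distances(coords)]

# Hypothetical four-atom configuration (arbitrary length units).
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 4.0)]

inputs = inverse_power_inputs(coords, n=2.0)   # six elements for four atoms
grads = d_inputs_dn(coords, n=2.0)

# Midpoint of the optimized range of n quoted in the abstract.
n_mid = 0.5 * (1.625 + 2.38)
print(inputs, grads, n_mid)
```

For N atoms the vector has N(N-1)/2 elements, so it is nonredundant for four atoms (six distances for 3N-6 = 6 internal coordinates) and redundant for larger systems. The midpoint of the quoted range evaluates to 2.0025, consistent with choosing a fixed n of about 2.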

This project was funded by grants from the National Science Foundation (Nos. DMI-0200327 and DMI-0457663). We thank Dr. G. Hazelrigg of the Division of Civil, Mechanical, and Manufacturing Innovation (CMMI) for his interest in and support of this work. One of the authors (R.K.) also thanks the A. H. Nelson, Jr., Endowed Chair in Engineering for additional financial support.

I. INTRODUCTION

II. INPUT VECTOR STUDIES

A. Systems investigated and databases

B. Input vectors

C. Architecture of the NNs

D. Training of the NNs

III. RESULTS, DISCUSSION, AND CONCLUSIONS

### Key Topics

- Databases
- Interatomic distances
- Chemical bonds
- Electronic structure calculations
- Ab initio calculations

## Figures

Definition of atom numbering, interatomic distances, and three-body angles used in the Z-matrix specification of the configuration for the electronic structure calculations.

Definition of atom numbering, interatomic distances, and three-body angles used in the Z-matrix specification of the vinyl bromide configuration for the electronic structure calculations.

## Tables

NN fitting errors to the 10 202 *ab initio* energies for the system as a function of the form chosen for the configuration input vector. N(h) and are the number of neurons in the NN hidden layer and the total number of weight and bias parameters, respectively (see text for explanation of the notation of the input specification and the details of the fitting procedure).

Mean absolute NN fitting errors (MAE) to the 68 302 *ab initio* energies for vinyl bromide as a function of the form chosen for the configuration input vector. N(h) and are the number of neurons in the NN hidden layer and the total number of weight and bias parameters, respectively (see text for explanation of the notation of the input specification and the details of the fitting procedure).

Root mean square NN fitting errors (RMSE) to the 15 472 *ab initio* energies for as a function of the form chosen for the configuration input vector. N(h) and are the number of neurons in the NN hidden layer and the total number of weight and bias parameters, respectively (see text for explanation of the notation of the input specification and the details of the fitting procedure).

Root mean square NN fitting errors (RMSE) to the 21 584 *ab initio* energies for HONO as a function of the form chosen for the configuration input vector. N(h) and are the number of neurons in the NN hidden layer and the total number of weight and bias parameters, respectively (see text for explanation of the notation of the input specification and the details of the fitting procedure).

Percent increases in NN fitting errors observed if the input vector to the NN comprises inverse powers of the interparticle distances, $r_{ij}^{-n}$, with n assigned a value of 2.00 rather than being optimized. The value n = 2.00 is the midpoint of the range of values obtained for the optimized values of n for the four systems studied. The details of the NNs employed in each case are given in Tables I–IV.
