^{1} and Christian Ochsenfeld^{1,a)}

### Abstract

We present a simple but accurate preselection method based on Schwarz integral estimates that determines the significant elements of the exact exchange matrix before its evaluation, thus providing asymptotically linear-scaling behavior for non-metallic systems. Our screening procedure proves to be highly suitable for exchange matrix calculations on massively parallel computing architectures, such as graphics processing units (GPUs), for which we present a first linear-scaling exchange matrix evaluation algorithm.
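The preselection idea can be illustrated with a minimal sketch. It assumes the exchange matrix has the form K_{μν} = Σ_{λσ} P_{λσ} (μλ|νσ) and applies the Schwarz inequality |(μλ|νσ)| ≤ Q_{μλ} Q_{νσ}, which gives the bound |K_{μν}| ≤ (Q |P| Q)_{μν}. Whether this product is exactly the screening matrix **Q**′ used in the paper is an assumption here, and the function name, the toy matrices, and the threshold value are all hypothetical.

```python
def preselect_exchange_pairs(Q, P, theta_pre):
    """Sketch of a Schwarz-based preselection (hypothetical helper).

    Q: symmetric matrix of Schwarz factors, Q[m][l] = sqrt((ml|ml)).
    P: symmetric density matrix.
    Returns the bound matrix Qp = Q |P| Q and the index pairs (mu, nu)
    whose bound reaches theta_pre.
    """
    n = len(Q)
    abs_p = [[abs(x) for x in row] for row in P]
    # T = Q |P|
    T = [[sum(Q[m][l] * abs_p[l][s] for l in range(n)) for s in range(n)]
         for m in range(n)]
    # Qp[mu][nu] = sum_s T[mu][s] * Q[nu][s], an upper bound for |K[mu][nu]|
    Qp = [[sum(T[m][s] * Q[nu][s] for s in range(n)) for nu in range(n)]
          for m in range(n)]
    significant = [(m, nu) for m in range(n) for nu in range(n)
                   if Qp[m][nu] >= theta_pre]
    return Qp, significant


# Toy data (invented for illustration): three shells on a chain, with
# Schwarz factors and density elements that decay with distance.
Q = [[1.0, 1e-3, 1e-6],
     [1e-3, 1.0, 1e-3],
     [1e-6, 1e-3, 1.0]]
P = [[1.0, 0.1, 0.0],
     [0.1, 1.0, 0.1],
     [0.0, 0.1, 1.0]]

Qp, significant = preselect_exchange_pairs(Q, P, theta_pre=1e-3)
print(significant)
```

In this toy case the pair of the two outermost shells falls below the threshold and is dropped before any integral would be computed. The sketch builds the bound with dense triple loops for clarity only; a linear-scaling variant would need sparse matrix products, which are not shown here.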

The authors thank Simon Maurer (LMU Munich) for useful comments on the paper. C.O. acknowledges financial support by the Volkswagen Stiftung within the funding initiative “New Conceptual Approaches to Modeling and Simulation of Complex Systems,” by the SFB 749 “Dynamik und Intermediate molekularer Transformationen” (DFG), and the DFG cluster of excellence (EXC114) “Center for Integrative Protein Science Munich” (CIPSM).

I. INTRODUCTION

II. EXACT EXCHANGE CALCULATION BY PRE-SELECTIVE LINEAR K FORMATION (PRELINK)

III. EXEMPLARY CALCULATIONS

IV. CONCLUSION

## Figures

Number of significant shell-pairs determined by using the screening matrix **Q**′ (red/circles) as compared to those by selecting exact elements in the exchange matrix **K** (blue/squares) for an amylose fragment containing 16 α-D-glucose units (HF/SVP). The black line depicts the total number of shell-pairs.

Average wall-times (in seconds) for exchange-matrix calculations for a series of linear alkanes with HF/SV using 1, 2, and 4 GPUs. The largest system is C_{640}H_{1282} comprising 1922 atoms and 8324 basis functions.

Average wall-times (in seconds) for exchange- and Coulomb-matrix calculations for a series of linear alkanes with HF/SV and HF/SVP using 4 GPUs. Here, the largest system is C_{640}H_{1282} comprising 1922 atoms and 8324 (SV) or 15 370 (SVP) basis functions, respectively.

Average wall-times (in seconds) for exchange-correlation- and Coulomb-matrix calculations for a series of linear alkanes using S-VWN/SVP and PBE/SVP and two different grids on 4 GPUs.

## Tables

Effect of pre-selective screening on the final SCF energy, average wall-time for the exchange calculation, and total number of SCF iterations for the example of a DNA-fragment with four A-T base pairs (HF/SVP). Note that preLinK only selects significant elements in the final **K** matrix, while the number of integrals necessary is also controlled by the Schwarz screening threshold ϑ_{int}, set here conservatively to 10^{−10}. The deviation from the reference value (ϑ_{pre} = 10^{−12}) is given in 10^{−6} a.u. (μ*H*).

Average wall-times (in seconds) using 4 GPUs for a single calculation of the exchange- and Coulomb-matrix, respectively. All calculations were performed with a convergence criterion of ϑ_{conv} = 10^{−7}, a conservative integral-threshold of ϑ_{int} = 10^{−10}, and a preselection threshold of ϑ_{pre} = 10^{−4}. N_{A} denotes the number of atoms and N_{Q} the number of significant shell-quartets (× 10^{3}) after preselection; the scaling of N_{Q} is given with respect to the next smaller system size. The largest systems listed are C_{640}H_{1282}, Amylose64, and DNA16, respectively.
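A scaling value "with respect to the next smaller system size" is commonly obtained as the slope between two neighboring points on a log-log plot. The sketch below assumes that convention, which this page does not spell out; the function name and the numbers are hypothetical and are not taken from the table.

```python
import math


def scaling_exponent(size_small, value_small, size_large, value_large):
    """Effective exponent x in value ~ size**x between two system sizes.

    Assumed convention: x = ln(value_large / value_small)
                            / ln(size_large / size_small).
    """
    return (math.log(value_large / value_small)
            / math.log(size_large / size_small))


# Hypothetical numbers: doubling the atom count doubles the quartet count
# (linear scaling) or quadruples it (quadratic scaling).
linear = scaling_exponent(100, 10.0, 200, 20.0)
quadratic = scaling_exponent(100, 10.0, 200, 40.0)
print(linear, quadratic)
```

An exponent close to 1 between the two largest systems of a series would indicate that the linear-scaling regime has been reached.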

Wall-times (in seconds) using 4 GPUs and scaling behavior for a single calculation of the exchange- and Coulomb-matrix, respectively, for a series of DNA-fragments using different exchange thresholds. The significant shell-pairs are determined with ϑ = 10^{−10}, and the Coulomb screening threshold is ϑ_{int} = 10^{−10}. The error in the final SCF-energy (mhartree) is given with respect to the calculation with ϑ_{int} = 10^{−10} and ϑ_{pre} = 10^{−4} for the exchange calculation.

Wall-times (in seconds) for a single Coulomb-, exchange-, and exchange-correlation matrix calculation using the PRISM-algorithm on CPUs as well as the GPU-algorithm on GPUs and CPUs, respectively, for an amylose fragment containing 16 α-D-glucose units (HF and PBE using SV and SVP basis sets, ϑ_{int} = 10^{−10}, ϑ_{pre} = 10^{−4}, grid: 75/302). DFT timings include grid generation.
