Document [original]

Numerical solution of the Hartree-Fock equation

by multilevel tensor-structured methods

vorgelegt von Diplom-Physikerin

Venera Khoromskaia

Stadt Kazan, Russland

Von der Fakult¨at II - Mathematik und Naturwissenschaften

der Technischen Universit¨at Berlin

zur Erlangung des akademisches Grades

Doktor der Naturwissenschaften

Dr.rer.nat.

genehmigte Dissertation

Promotionausschuss:

Vorsitzender: Prof. Dr. J. Blath

Berichter/Gutachter: Prof. Dr. Reinhold Schneider

Berichter/Gutachter: Prof. Dr. Dr. h.c. Wolfgang Hackbusch

zus¨atzlicher Gutachter: Prof. Dr. Eugene Tyrtyshnikov

Tag der m¨undlichen Pr¨ufung: 10 December 2010

Berlin 2011

D 83

Acknowledgements

I would like to express my gratitude to Prof. Dr. Reinhold Schneider for su-

pervising my PhD project, for fruitful discussions and his friendly encouragement

during the work on this project. His interest and expertise in the research topics

related to the Hartree-Fock equation and the density functional theory motivated

much of the recent progress in the algebraic tensor methods in electronic structure

calculations.

I would like to thank Prof. Dr. Dr. h.c. Wolfgang Hackbusch for valuable

discussions and excellent conditions for performing research at the Max-Planck-

Institute for Mathematics in the Sciences.

This work is done due to fruitful collaboration with Dr. Heinz-J¨urgen Flad. I

appreciate very much his kind encouragement and support of my work and always

beneficial and stimulating discussions owing to his thorough expertise in modern

quantum chemistry.

Professional assistance of PD DrSci. Boris Khoromskij in the research on tensor

numerical methods helped me to make an active start in a new field. I acknowl-

edge him for the statement of some research problems and for proofreading the

manuscript.

I kindly acknowledge Prof. Eugene Tyrtyshnikov, Prof. Dr. Christian Lu-

bich and Prof. Dr. Lars Grasedyck, for their interest to my work and for the

opportunity to give talks at recent conferences and seminars on tensor methods.

I am very much appreciative to Prof. Dr. Ivan Gavrilyuk for his encouragement

and interest to my work.

I would like to thank my colleagues at the Max-Planck-Institute in Leipzig,

Dr. Ronald Kriemann, Dr. Jan Schneider, Dr. Kishore Kumar Naraparaju,

Dr. Sambasiva Rao Chinnamsetty, Dr. Hongjun Luo, Dr. Thomas Blesgen, Dr.

Lehel Banjai, Dipl.-Math. Konrad Kaltenbach, Dipl.-Math. Stephan Schwinger,

Dipl.-Math. Florian Drechsler and colleagues at the TU Berlin, Prof. Dr. Harry

Yserentant, Dipl.-Math. Fritz Kr¨uger and Dipl.-Math. Andr´e Ushmaev for inter-

esting discussions. I would like to acknowledge the colleagues from the Institute

of Numerical Mathematics of the Russian Academy of Science in Moscow, Dr.

Ivan Oseledets and Dr. Dmitrij Savostyanov for stimulating discussions.

I would like to thank Prof. Vikram Gavini (University of Michigan) for pro-

ductive collaboration and for the data on electron density of large Aluminium

clusters.

Kind assistance of the librarians Mrs. Britta Schneemann and Mrs. Katarzyna

Baier was very helpful during my work. I would like to thank cordially the

secretaries at the Max-Planck-Institute and TU Berlin, Mrs. Valeria H¨unniger

and Mrs. Susan Kosub for their helpful technical support.

Numerische L¨osung der Hartree-Fock-Gleichung

mit mehrstufigen Tensor-strukturierten Verfahren

Venera Khoromskaia

Abstract der PhD Dissertation

Die genaue L¨osung der Hartree-Fock-Gleichung (HFG), die ein nichtlineares Eigen-

wertproblem in R3darstellt, ist infolge der nichtlokalen Intergraltransformationen

und der scharfen Peaks in der Elektronendichte und den Molek¨ulorbitalen eine

herausfordernde numerische Aufgabe. Aufgrund der nichtlinearen Abh¨angigkeit

der Hamilton-Matrix von den Eigenvektoren, ist das Problem nur iterativ l¨osbar.

Die traditionelle L¨osung der HFG basiert auf einer analytischen Berechnung der

auftretenden Faltungsintegrale im R3mit Hilfe von dem Problem angepassten

Basen (so genannte Zwei-Elektron-Integrale). Die inh¨arenten Grenzen dieses

Konzepts werden wegen der starken Abh¨angigkeit der numerischen Effizienz von

der Gr¨oße und den Eigenschaften der analytisch separablen Basis sichtbar.

In dieser Dissertation wurden neue gitter-basierte mehrstufige Tensor-strukturierte

Verfahren entwickelt und anhand der numerischen L¨osung der HFG getestet.

Diese Methoden beinhalten effiziente Algorithmen zur Darstellung diskretisierter

Funktionen und Operatoren in R3durch strukturierte Tensoren in den kanonis-

chen, Tucker- und kombinierten Tensorformaten mit einer kontrollierbaren Genauigkeit

sowie schnelle entsprechenden Operationen f¨ur multilineare Tensoren. Insbeson-

dere wird die beschleunigte Mehrgitter-Rang-Reduktion des Tensors vorgestellt,

die auf der reduzierten Singul¨arwertzerlegung h¨oherer Ordnung basiert.

Der Kern der Anwendung dieser Verfahren f¨ur die L¨osung der HFG ist die Ver-

wendung strukturierter Tensoren zur genauen Berechnung der Elektronendichte

und der nichtlinearen Hartree- und (nichtlokalen) Austauschoperatoren in R3,

die in jedem Iterationsschritt auf einer Reihenfolge von n×n×nkartesischen

Gittern darstellt wurden. Somit wurden die entsprechenden sechs-dimensionalen

Integrationen durch multilineare algebraische Operationen wie das Skalar- und

Hadamardprodukt, die dreidimensionale Faltungstransformation und die Rang-

Reduktion f¨ur Tensoren dritter Ordnung ersetzt, die ann¨ahernd mit O(n)-Komplexit¨at

implementiert wurden, wobei ndie eindimensionale Gittergr¨oße ist. Daher ist

der wesentliche Vorteil unserer Tensor-strukturierten Verfahren, dass die gitter-

basierte Berechnung von Integraloperatoren in Rd,d≥3, lineare Komplexit¨at in

nhat. Man beachte, dass im Sinne der ¨ubliche Absch¨atzung mittels des Gitter-

volumens Nvol =n3die Operationen mit strukturierten Tensoren eine sublineare

Komplexit¨at haben, O(N1/3

vol ).

Das vorgestellte ”‘grey-box”’-Schema zur L¨osung der HFG erfordert keine an-

alytischen Vorberechnungen der Zwei-Elektron-Integrale. Weiterhin ist dieses

Schema sehr flexibel hinsichtlich der Wahl der gitter-orientierten Basisfunktio-

nen.

Numerische Berechnungen am Beispiel des “all electron” Falls f¨ur H2O, CH4

und C2H6und des Pseudopotentialfalls f¨ur CH3OH and C2H5OH Molek¨ule zeigen

die geforderte hohe Genauigkeit.

Die Tensor-strukturierten Verfahren k¨onnen auch zur L¨osung der Kohn-Sham-

Gleichung angewandt werden, indem anstelle einer problem-unabh¨angigen Basis,

wie die der ebenen Wellen oder einer großen Anzahl finiter Elemente im R3, eine

geringe Anzahl problem-orientierter rang-strukturierter algebraischer Basisfunk-

tionen verwendet werden, die auf einem Tensorgitter dargestellt sind.

Numerical solution of the Hartree-Fock equation

by multilevel tensor-structured methods

Venera Khoromskaia

Abstract of PhD dissertation

An accurate solution of the Hartree-Fock equation, a nonlinear eigenvalue prob-

lem in R3, is a challenging numerical task due to the presence of nonlocal integral

transforms and strong cusps in the electron density and molecular orbitals. In

view of the nonlinear dependence of the Hamiltonian matrix on the eigenvec-

tors, this problem can only be solved iteratively, by self-consistent field iterations.

Traditionally, the solution of the Hartree-Fock equation is based on rigorous an-

alytical precomputation of the arising convolution type integrals in R3in the

naturally separable basis (so-called two-electron integrals). Inherent limitations

of this concept are evident because of the strong dependence of the numerical

efficiency on the size and approximation quality of the problem adapted basis

sets.

In this dissertation, novel grid-based multilevel tensor-structured methods are

developed and tested by a numerical solution of the Hartree-Fock equation. These

methods include efficient algorithms for the low-rank representation of discretized

functions and operators in R3, in the canonical, Tucker and mixed tensor formats

with a controllable accuracy, and fast procedures for the corresponding multilin-

ear tensor operations. In particular, a novel multigrid accelerated tensor rank

reduction method is introduced, based on the reduced higher order singular value

decomposition.

The core of our approach to the solution of the Hartree-Fock equation is the

accurate tensor-structured computation of the electron density and the nonlinear

Hartree and the (nonlocal) exchange operators in R3, discretized on a sequence

of n×n×nCartesian grids, at every step of nonlinear iterations. Hence, the

corresponding six-dimensional integrations are replaced by multilinear algebra

operations such as the scalar and Hadamard products, the 3D convolution trans-

form, and the rank truncation for 3rd order tensors, which are implemented with

an almost O(n)-complexity, where nis the univariate grid size. In this way, the

basic advantage of our tensor-structured methods is the grid-based evaluation of

integral operators in Rd,d≥3, with linear complexity in n. Note that in terms

of usual estimation by volume size Nvol =n3, the tensor-structured operations

are of sublinear complexity, O(N1/3

vol ).

The proposed “grey-box” scheme for the solution of the Hartree-Fock equation

does not require analytical precomputation of two-electron integrals. Also, this

scheme is very flexible to the choice of grid-based separable basis functions.

Numerical illustrations for all electron case of H2O, CH4, C2H6and pseudopo-

tential case of CH3OH and C2H5OH molecules demonstrate the required high

accuracy of calculations and an almost linear computational complexity in n.

The tensor-structured methods can be also applied to the solution of the Kohn-

Sham equation, where instead of problem-independent bases like plane waves or

a large number of finite elements in R3, one can use much smaller set of problem

adapted basis functions specified on a tensor grid.

Contents

1 Introduction 11

2 Tensor structured (TS) methods for functions in Rd,d≥327

2.1 Definitions of rank-structured tensor formats . . . . . . . . . . . . 28

2.1.1 Full format dth order tensors . . . . . . . . . . . . . . . . 28

2.1.2 Tucker, canonical and mixed (two-level) tensor formats . . 30

2.2 Best orthogonal Tucker approximation (BTA) . . . . . . . . . . . 34

2.2.1 General discussion . . . . . . . . . . . . . . . . . . . . . . 34

2.2.2 BTA algorithm for full format tensors . . . . . . . . . . . . 37

2.2.3 BTA for rank-Rcanonical input . . . . . . . . . . . . . . . 40

2.2.4 Mixed BTA for full format and Tucker tensors . . . . . . . 45

2.2.5 Remarks on the Tucker-to-canonical transform . . . . . . . 48

2.3 Numerics on BTA of function related tensors in R3........ 50

2.3.1 General description . . . . . . . . . . . . . . . . . . . . . . 50

2.3.2 Numerics for classical potentials . . . . . . . . . . . . . . . 51

2.3.3 Application to functions in electronic structure calculations 61

2.4 Tensorisation of basic multilinear algebra (MLA) operations . . . 65

2.4.1 Some bilinear operations in the Tucker format . . . . . . . 66

2.4.2 Summary on MLA operations in rank-Rcanonical format . 68

3 Multigrid Tucker approximation of function related tensors 71

3.1 Motivations .............................. 71

3.2 Multigrid accelerated BTA of canonical tensors . . . . . . . . . . 72

3.2.1 Basic concept . . . . . . . . . . . . . . . . . . . . . . . . . 72

3.2.2 Description of the Algorithm and complexity bound . . . . 75

3.2.3 Numerics on rank reduction of the electron density ρ. . . 78

3.3 Multigrid accelerated BTA for the full format function related tensors 82

3.3.1 Numerics on the MGA Tucker approximation (ρ1/3) . . . . 83

3.3.2 BTA of the electron density of Aluminium clusters . . . . 85

4 TS computation of the Coulomb and exchange Galerkin matrices 89

4.1 Generalremarks............................ 89

Contents

4.2 Accurate evaluation of the Hartree potential by the tensor-product

convolution in R3........................... 90

4.3 Tensor computation of the Coulomb matrix . . . . . . . . . . . . 100

4.4 Numerics: the Coulomb matrices of CH4, C2H6and H2O molecules 100

4.5 Agglomerated computation of the Hartree-Fock exchange . . . . . 106

4.5.1 Galerkin exchange operator in the Gaussian basis . . . . . 107

4.5.2 Discrete computational scheme . . . . . . . . . . . . . . . 109

4.6 Numericals experiments . . . . . . . . . . . . . . . . . . . . . . . 117

4.6.1 All electron case . . . . . . . . . . . . . . . . . . . . . . . 117

4.6.2 Pseudopotential case . . . . . . . . . . . . . . . . . . . . . 118

5 Solution of the Hartree-Fock equation by multilevel TS methods 119

5.1 Galerkin scheme for the Hartree-Fock equation . . . . . . . . . . . 121

5.1.1 Problem setting . . . . . . . . . . . . . . . . . . . . . . . . 121

5.1.2 Traditional discretization . . . . . . . . . . . . . . . . . . 122

5.1.3 Novel scheme via agglomerated tensor-structured calcula-

tion of Galerkin matrices . . . . . . . . . . . . . . . . . . . 124

5.2 Multilevel tensor-truncated iteration via DIIS . . . . . . . . . . . 126

5.2.1 General SCF iteration . . . . . . . . . . . . . . . . . . . . 126

5.2.2 SCF iteration by using DIIS scheme . . . . . . . . . . . . . 126

5.2.3 Unigrid and multilevel tensor-truncated DIIS iteration . . 127

5.2.4 Complexity estimates in terms of R0,Norb and n...... 129

5.3 Numerical illustrations . . . . . . . . . . . . . . . . . . . . . . . . 131

5.3.1 General discussion . . . . . . . . . . . . . . . . . . . . . . 131

5.3.2 Multilevel tensor-truncated SCF iteration applied to some

moderate size molecules . . . . . . . . . . . . . . . . . . . 132

5.3.3 Conclusions to Section 5 . . . . . . . . . . . . . . . . . . . 134

6 Summary of main results 137

6.1 Briefsummary ............................ 137

6.2 Presentations ............................. 140

7 Appendix 143

7.1 Singular value decomposition and the best rank-kapproximation

ofamatrix .............................. 143

7.2 Reduced SVD of a rank-Rmatrix.................. 143

7.3 List of abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . 145

Bibliography 145

1 Introduction

All truths are easy to understand once they are discovered;

the point is to discover them.

Galileo Galilei

The classical Hartree-Fock equation is one of the basic ab initio models in

electronic structure calculations. Accurate solution of the Hartree-Fock equation,

a nonlinear eigenvalue problem (EVP) in R3, is a challenging numerical task due

to the presence of nonlocal integral transforms and strong cusps in the electron

density and molecular orbitals. This nonlinear problem of finding the eigenvectors

(orbitals) of a molecular system is solved under the condition that also the electron

correlation part of the Hamiltonian matrix, depending on orbitals is unknown.

Therefore it is solved iteratively using the self-consistent field iteration (SCF)

method.

Traditionally, the solution of the Hartree-Fock equation is based on the analyti-

cal precomputation of the arising convolution type integral transforms in R3in the

problem adapted naturally separable basis (the so-called two-electron integrals).

This rigorous approach includes a number of efficient implementations which are

widely used in computational quantum chemistry. The success of the analytical

method stems from the big amount of precomputed physical information involved

in the computational scheme including the problem adapted basis. Inherent limi-

tations of this concept arise due to strong dependence of the numerical efficiency

on the size and quality of the chosen GTO-type basis sets.

In this dissertation, novel multilevel grid-based tensor-structured methods are

developed and tested by the numerical solution of the Hartree-Fock equation.

These methods include efficient algorithms for the low-rank tensor representa-

tion of functions and operators in R3in the canonical, Tucker and mixed tensor

formats with a controllable accuracy, and fast procedures for the corresponding

multilinear tensor operations. In particular, the novel multigrid accelerated ten-

sor rank reduction method is proposed, based on the coarse-level reduced higher

order singular value decomposition and selection of the most significant fibers. A

beneficial feature of this method is the employment of the interpolated orthogonal

basis functions on finer grid levels.

The core of our approach to the solution of the Hartree-Fock equation is the

accurate tensor-structured computation of the electron density and the nonlin-

1 Introduction

ear Hartree and nonlocal exchange operators in R3, discretized on n×n×n

Cartesian grids, at all steps of iterations on nonlinearity. Within the solution

process, the six-dimensional integrations are replaced by the multilinear alge-

bra operations such as the scalar and Hadamard products, the three-dimensional

convolution transform, and rank truncation, which are implemented with almost

O(n)-complexity. Efficient “formatted” separable tensor representations of func-

tions and operators enable fast algebraic transforms with multidimensional data

arrays. The grid-based solution of the Hartree-Fock equation also benefits from

the multilevel arrangement of the conventional SCF iteration on a sequence of

refined grids.

In this way, the basic advantage of the tensor-structured methods is the grid-

based evaluation of the integral operators in Rd,d≥3, with linear complexity in

the univariate grid size n, see [61, 68]. Note that the commonly used notation

as “linear in the problem size” for the problems with d= 3, often means linear

complexity with respect to the volume size Nvol =n3. From this point of view

the tensor-structured operations are of sublinear complexity, that is O(N1/3

vol ).

High accuracy is achieved due to efficient rank optimization algorithms for 3rd

order tensors, which enable computations over the 3D Cartesian grids with up

to 1012 (163843) grid nodes, using MATLAB on a SUN station. In electronic

structure calculations, this implies a fine resolution with the mesh size h≈10−4

◦

A, providing possibility for arbitrary space orientation of a molecule in the com-

putational box (like in the analytical approaches).

The proposed “grey-box” scheme for the solution of the Hartree-Fock equation

does not require an analytical precomputation of two-electron integrals. Also,

this scheme is very flexible to the choice of grid-based separable basis functions.

In the current implementation, the discretized Gaussians are used for the sake of

convienient comparison of the intermediate results with the benchmark programs.

The tensor-structured methods can be also applied to the solution of the Kohn-

Sham equation, where instead of problem-independent bases like plane waves or

a large number of finite elements in R3, one can use much smaller set of problem

adapted basis functions specified on a tensor grid.

1. Recent progress in multilinear algebra

Algebraic tensor algorithms for the low-rank approximation of multi-dimensional

data have been originally developed for the problems of chemometrics and signal

processing [112, 73, 97, 21, 22, 92], beginning from early papers [54, 55, 106, 17,

77, 18]. The higher order singular value decomposition (HOSVD) provides the

generalization of the singular value decomposition (SVD) of matrices [37]. The

theoretical basis for the HOSVD, that is the principal ingredient for the best or-

thogonal Tucker1approximation of higher order tensors, have been introduced in

[23, 24]. Comprehensive surveys on tensor decompositions with applications in

computer science are presented in [74] and [97].

Recently, a number of numerical methods based on the separation of vari-

ables have been proposed for the multivariate problems of scientific computing

[11, 41, 8, 94, 110, 111, 104, 102]. Extensive research on tensor approxima-

tion methods combined with the traditional numerical approaches, for example,

wavelets, plane waves and sparse matrices have opened new perspectives for a

feasible numerical solution of multi-dimensional problems arising in large scale

electronic and molecular structure calculations [12, 27, 28, 40, 79, 13, 80, 93].

First results on theoretical approval of the extension of the basic tools in the tra-

ditional numerical linear algebra towards the modern multilinear algebra (MLA)

have been presented in [104, 105, 47, 48]. Original works on the theory of sinc-

approximation of the multivariate functions [35, 36] gave a significant impact for

further development of the tensor methods in the problems of scientific computing.

Validity and efficiency of the Tucker-type approximations and combined tensor

formats to function related tensors in higher dimensions were demonstrated in

[104, 47, 60, 59, 86]. In this way, the main principles and concepts of the rep-

resentation of operators and functions in the rank-structured tensor formats as

well as the methods of the algebraic tensor computations have been understood

[34, 36, 46, 44, 45]. Nowadays this topic of research has attracted the interest of

several groups from linear algebra, optimization theory and scientific computing

communities, see [31, 19, 29, 85, 86, 87, 88] and [39, 7, 82, 66, 75, 71, 72].

Recent papers [60, 67, 68, 85, 70] demonstrated numerical efficiency of the

Tucker and canonical tensor decompositions for a wide class of the problems of

scientific computing in Rd, (d≥3) which are not feasible for the treatment by

conventional numerical methods due to their exponential complexity scaling in

dimension d. These tensor approximations reduce dramatically the complexity of

computations of the discretized multivariate operators and functions. Our main

difficulties in establishing applicability of the tensor algorithms, earlier approved

in chemometrics, for the treatment of higher order tensors in the problems of

numerical analysis and scientific computing have been related to accuracy issues

and to the challenge of large spatial grids required in the real-life applications.

This dissertation presents the tensor-structured methods2which were the topic

of the author’s work during 2006-2010, [67, 68, 69, 57, 70, 14]. Mostly, here

1Tucker-type decomposition, introduced by L. R. Tucker in 1966, see [106].

2We use the term “tensor-structured methods” as a common title to the numerical techniques

based on the principles of algebraically separable rank-structured tensor representations.

1 Introduction

the numerical applicability of the Tucker, canonical and mixed approximations

is considered in application to the problems in R3, arising in electronic structure

calculations. The numerical results described in these papers are based on the

original algorithms implemented in MATLAB and tested in molecular computa-

tions.

[67] B. N. Khoromskij and V. Khoromskaia. Low Rank Tucker-Type Tensor

Approximation to Classical Potentials.

Central European Journal of Mathematics v.5, N.3, 2007, pp.523-550,

(Preprint MPI MIS 105/2006).

[69] B. N. Khoromskij, V. Khoromskaia, S.R. Chinnamsetty, and H.-J. Flad.

Tensor Decomposition in Electronic Structure Calculations on 3D Cartesian

Grids. Journal of Comp. Physics, 228(2009), pp. 5749-5762, 2009,

(Preprint MPI MIS 65/2007).

[68] B. N. Khoromskij and V. Khoromskaia. Multigrid Accelerated Tensor

Approximation of Function Related Multi-dimensional Arrays.

SIAM Journ. on Scient. Comp., vol. 31, No. 4, pp. 3002-3026, 2009,

(Preprint MPI MIS 40/2008).

[57] V. Khoromskaia. Computation of the Hartree-Fock Exchange by the

Tensor-structured Methods.

Comp. Methods in Applied Math., Vol. 10(2010), No.2, pp.1-16.

(Preprint 25/2009, MPI MIS Leipzig, 2009).

[70] V. Khoromskaia, B. N. Khoromskij, and H.-J. Flad. Numerical Solution

of the Hartree-Fock Equation in the Multilevel Tensor-structured Format.

SIAM Journ. on Scient. Comp., v.33, No.1, pp.45-65, 2011.

(Preprint 44/2009, MPI MIS Leipzig, 2009).

[58] V. Khoromskaia. Multilevel Tucker Approximation of 3D Tensors.

2010, in progress.

[14] T. Blesgen, V. Gavini and V. Khoromskaia. Tensor Product Approxi-

mation of the Electron Density of Aluminium Clusters in OFDFT.

Preprint 66/2009 MPI MIS Leipzig, 2009.

[52] E. Hayryan and V. Khoromskaia. Low-rank Approximation of the Elec-

trostatic Potentials of Proteins, 2010, in progress.

The success of tensor methods based on the concept of the low-rank separable

approximation of multivariate functions in Rd, can be explained by their intrin-

sic nearly one-dimensional data structure organization. In fact, this allows us

to relax the so-called ”curse of dimensionality” inherent to traditional numerical

methods. The idea of the tensor-structured techniques is based on the search

and employment of normally hidden, well formatted data-sparse representations

of large and highly redundant data arising in the conventional computer represen-

tation of physically relevant functions in Rd. Hence, the rank-structured tensor

methods enable an efficient “compact” approximation of multivariate functions

and operators in Rd,d≥3, represented on large tensor grids of size nd.

Generally, the tensor methods employ the so-called orthogonal Tucker and

canonical models. The main benefit of the orthogonal Tucker approximation is the

robust construction of a problem adapted orthogonal basis that simultaneously

resolves the peculiarities of the approximated data with an optimal tensor rank.

In turn, the canonical tensor format is efficient in the bilinear rank-structured

tensor operations.

2. Basic rank-structured formats

Section 2 of the dissertation begins with the description of the full format dth

order tensors and respective bilinear operations necessary for the Tucker ALS

minimization problem: the contracted product of a tensor with a matrix, matrix

unfolding of tensors, and the scalar product. These operations can be understood

as higher-order analogues of the standard linear algebra operations for vector-

vector, matrix-vector and matrix-matrix calculations.

Discussion of the basic rank-structured tensor formats starts with the defini-

tions of the rank-Rcanonical and the orthogonal rank-(r1,...rd) Tucker decompo-

sitions. The efficient tools for the rank-structured approximation of tensors cannot

be derived via the straightforward extension of the algorithms of the classical nu-

merical linear algebra, like the singular value decomposition (SVD). Instead, one

arrives at the challenging nonlinear optimisation problems. As a starting point,

we recall the seminal theorem in [21], asserting that the minimization problem

of the Tucker approximation is equivalent to the dual maximization problem of

finding vectors of the orthogonal mapping dominating subspaces, which provide

the maximum norm of the Tucker core tensor over the product manifold of orthog-

onal matrices. We also remind the ALS iterative algorithm for the best Tucker

tensor approximation (BTA) applied to the full format tensors (F2T) and the cor-

responding theorems on the higher order singular value decomposition (HOSVD)

introduced in [21, 23, 24].

1 Introduction

Our Theorem 2.14 on the canonical-to-Tucker decomposition (C2T) describes

the reduced HOSVD (RHOSVD) applied to the canonical rank-Rtensors and

provides the error estimate with respect to the truncated SVD of the ℓ-mode side

matrices, supplemented by the corresponding complexity bounds.

The general ALS based BTA algorithm [23] provides the complexity of the

rank-(r1,...,rd) Tucker tensor decomposition of the order

WF2T=O(nd+1),(1.1)

applied to full format target tensors, and, as we show in [67], it amounts to

WC2T=O(Rn min{R, n}+rd−1nmin{rd−1, n}),(1.2)

operations with

r= max

ℓrℓ, ℓ = 1,...,d,

in the case of canonical rank-Rinput tensors.

Next, we describe the two-level (mixed) tensor format, based on our Lemma

2.16, which proves that the Tucker-to-canonical (T2C) approximation applied to

the Tucker decomposition of the full format tensor can be reduced to the canon-

ical approximation of a small-size core tensor. We extensively use this format

in electronic structure calculations, since the orthogonal Tucker decomposition is

employed as the principle rank reduction technique, while the canonical represen-

tation is efficient in tensor product operations

In the numerical examples in Section 2, the standard collocation discretization

has been used to represent the classical Newton potential, the Slater function, and

the Yukawa and Helmholtz kernels on the n×n×n3D Cartesian grids. Figures

demonstrate that for the function related tensors,

a) the error of the BTA decays exponentially in the Tucker rank,

b) the orthogonal vectors of the decomposition are of special shapes that resolve

the peculiarities of a function,

c) the core tensor (coefficients of the orthogonal Tucker transform) is of sparse

character (up to a certain threshold).

It is also shown, that for the tensors corresponding to the functions with periodic

cells (with bumps or singularities) in Rd,d= 3, the Tucker approximation error

does not depend on a number of cells in a periodic structure. As examples, we

consider cubic boxes with 27, 1000 and 4096 cells of the Slater function, see

Figures 2.12 and 2.13. Next, the numerical examples on the electron densities ρ

of some small organic molecules show that ρcan be efficiently approximated by

the low-rank Tucker format.

In Section 2.4, the description of basic bilinear operations in the rank-structured

tensor formats, including the scalar product, the Hadamard product and the con-

volution transform, is supplemented by the corresponding complexity estimates

and bounds on required storage. This part is completed by the summary of

tensor-structured operations in the canonical format which are of linear complex-

ity scaling with respect to the parameters of the target tensors.

Note that for the full format tensors the complexity of basic bilinear opera-

tions vary from Wfull =O(nd) to Wfull =O(n2d). Instead, for the canonical

tensors with ranks R1and R2, univariate grid size nand problem dimension d,

the complexity of bilinear tensor operations is (up to log nfactor)

Wc↔c=O(dnR1R2).(1.3)

Notice that in evaluation of the Hartree potential, one should take into account

large input ranks R1of the canonical tensor representing the electron density, see

(4.4), (4.5), and possibly large n. Moreover, when using consequent tensor-to-

tensor operations in the canonical format, tensor ranks are multiplied and hence

increase enormously after several operations even with moderate initial ranks.

To reduce large canonical ranks with controllable accuracy, we introduce the

C2T algorithm (see Algorithm C BTA in subsection 2.2.3) with the consequent

T2C transform. However, the conventional C2T and full-format-to-Tucker (F2T)

algorithms are not computationally feasible in electronic structure calculations on

large n×n×n3D Cartesian grids, since:

•For C2T case, the complexity is (see (1.2))

WC2T=O(Rn min{R, n}+dr2nmin{r2, n}),(1.4)

i.e., polynomial (of degree at least 2) in rand either Ror n. To resolve

cusps due to core electrons in the electron density, both Rand nare of the

order 104.

For simultaneously large Rand nthe SVD in the RHOSVD might be com-

putationally unfeasible.

•For F2T case, the complexity of the general HOSVD based BTA algorithm,

WF2T=O(n4),(1.5)

restricts the size of the input tensor (n3∼106). Our goal is to reach the

complexity (and resolution) corresponding to maximum size of the input

tensor, O(n3), (that is, n3∼108).

1 Introduction

3. Multigrid accelerated rank-structured approximation

To avoid above limitations, in Section 3 we introduce the multigrid accelerated

(MGA) Tucker decomposition. The MGA tensor approximation method applied

to discretised functions in R3, enables to relax the restrictions of the single grid

Tucker model for large nand R.

The idea of the multilevel acceleration is based on the successive reiteration of

the orthogonal Tucker tensor approximation on a sequence of nested refined grids.

The HOSVD or RHOSVD are performed only on the coarsest grid representation

of the target tensor, given in full or canonical formats, respectively. As the initial

guess for the ALS iteration on finer grids, we apply the interpolated orthogonal

side-matrices calculated on the coarser grid.

For the canonical target tensor, along with a good initial guess for the nonlinear

ALS approximation, the MGA approach provides the transfer of the important

data structure information from the coarse-to-fine grids, based on the introduced

maximum energy principle. By using this principle, on the coarsest grid level, we

extract the location of dominating columns – most important fibers of the ℓ-mode

contracted unfolding matrices. This leads to the fast nonlinear ALS iteration

on huge 3D Cartesian grids now performed over an almost minimal sufficient

subset of directional fibers in the target tensor. Figures in Section 3 illustrate the

mechanism of the choice of most important fibers.

The MGA technique exhibits the following benefits for the tensors related to

functions in R3.

•For the rank-Rcanonical target tensors it is proved to have linear (instead

of polynomial) scaling in all input parameters: n,Rand the Tucker rank r,

WC2T=O(rRn),

see Theorem 3.1 and Algorithm MG C BTA, in Section 3.2.

•For the full format tensors of size n3, it leads to a nested F2T decomposition

with the linear cost in the volume size of input data,

WF2T=O(n3),

(instead of O(n4)-scaling for the standard Tucker approximation).

It should be emphasized that our technique is particularly efficient for the rank

reduction of the canonical rank-Rtensors with large Rand n. Therefore, this

algorithm is suitable for the functions with multiple strong singularities, such as

electron densities of molecules in the Hartree-Fock equation.

Theorem 3.1 proves linear complexity with respect to the input parameters of

a tensor, for the MGA BTA applied to the rank-Rcanonical input in R3.

Combination of C2T and T2C algorithms is successfully applied for rank re-

duction of the electron density of some organic molecules. This enables accurate

computation of the Hartree potential (a convolution integral in R3), see Section

4, using remarkably fine n×n×ngrids with the size n= 16384 for all three

dimensions. Figures present the comparison of elapsed times for the unigrid and

multigrid schemes, as well as the cost for C2T transform in the MGA rank reduc-

tion of the electron density of H2O and some small organic molecules. Parameters

of the algorithms allow to choose the required accuracy level.

In the case of full format tensors, our multigrid accelerated Tucker algorithm en-

ables usage of large grids up to Nvol = 5123. Computation time of this algorithm

essentially outperforms the existing benchmark method based on the Newton-type

scheme on the product Stiefel manifold [92]. The numerical examples on the MGA

BTA of full size tensors related to functions in R3include, for example, ρ1/3, which

is commonly used in the density functional theory. Notice that the alternative

approach based on the cross approximation over the incomplete data set in 3D,

is presented in [85]. Next, interesting numerical examples are presented on the

Tucker approximation of the electron density of Aluminium clusters with 14, 172,

and 365 atoms [14]. Along with the multicentered Slater functions considered in

Section 2, they demonstrate that the rank of the Tucker representation of function

related tensors with periodic multiple cusps (or bumps), is almost independent

of the number of cells in the periodic structure. This indicates that the effective

numerical complexity for solving larger grid-based quasi-periodic problems by the

tensor-structured methods, for example in the orbital-free density functional the-

ory [32, 33], may be expected to increase only linearly in the univariate grid size n.

4. The Hartree-Fock equation

In Section 4, the tensor-structured methods are applied for computation of

the Galerkin matrices of the Coulomb and exchange operators in the Hartree-

Fock model. The Hartree-Fock equation is the mean-field approximation of the

electronic Schr¨odinger equation for computation of the ground state energy of

many-electron systems. It is an eigenvalue problem (we consider a closed shell

case) on finding Norb lowest eigenvalues λiand respective pairwise L2-orthogonal

eigenfunctions (molecular orbitals) ϕi:R3→R,ϕi∈H1(R3), from the equation

Fϕi(x) = λiϕi(x), i = 1, ..., Norb, x ∈R3,(1.6)

with RR3ϕi(x)ϕj(x)dx =δij, (i, j = 1, ..., Norb), where Norb is the number of

1 Introduction

electron pairs in a molecule. The nonlinear Fock operator Fis given by

F:= −1

2∆ + Vc+VH−K.

Here, an external nuclear potential Vcis defined by

Vc(x) = −

ν=1

Zν

kx−Aνk,(1.7)

where Mis the number of nuclei in a molecule, and Zν,Aν∈R3denote the

charge and spatial coordinates of the nuclei, respectively. The Hartree potential

VHdetermines the Coulomb interaction of every single electron with the field

generated by all electrons of the system

VH(x) := ZR3

ρ(y)

kx−ykdy, x ∈R3,(1.8)

which corresponds to the convolution of the Coulomb potential with the electron

density

ρ(y) = 2

Norb

i=1

(ϕi(y))2.(1.9)

Calculation of the exchange Galerkin matrix in the Hartree-Fock equation is a

challenging problem due to the nonlocal character of the exchange operator K

(Kϕ) (x) := 1

2ZR3

τ(x, y)

kx−ykϕ(y)dy, (1.10)

with the density matrix

τ(x, y) = 2

Norb

i=1

ϕi(x)ϕi(y).

Traditionally the Hartree and exchange potentials are computed by the analytical

evaluation of two-electron integrals, using the separable Gaussian basis functions,

or Gaussian type orbitals (GTOs), to represent the eigenfunctions,

ϕi(x) =

k=1

ci,kgk(x), x = (x1, x2, x3)∈R3.(1.11)

Here basis functions gk,k= 1,...,R0, are naturally separable Cartesian Gaus-

sians represented in the rank-1 canonical tensor product form,

gk(x) = g(1)

k(x1)g(2)

k(x2)g(3)

k(x3),

with 1,2,3 designating space dimensions. When using the traditional analytical-

based approaches, computation of the Hartree and exchange potentials represents

a major bottleneck for the numerical solution of the Hartree-Fock and Kohn-Sham

equations.

5. Computation of the Hartree and exchange potentials

For grid-based agglomerated computation of (1.8) and the integral operator

(1.10) on 3D Cartesian grids, we introduce an appropriate fixed computational

box and apply a collocation discretization scheme for all involved multivariate

functions and a projection (Galerkin) scheme for representation of operators. In

particular, the collocation/projection scheme is used for representation of the

Newton kernel in the discrete tensor-product convolution. The canonical rank-

RNrepresentation of a tensor representing the projected Newton potential is

computed using the sinc-quadratures in [10].

The fast tensor product convolution of the multivariate functions in Rddevel-

oped in [61] with complexity O(dn log n), is applied in the computation of the

integral (1.8) and the integral operator (1.10). When applied in 3D case, it con-

siderably outperforms the benchmark convolution based on the 3D Fast Fourier

Transform (FFT) having the cost O(n3log n). Table 4.1 illustrates linear depen-

dence of the CPU times for convolution (and preliminary rank reduction for the

electron density) versus the univariate grid size nin the computation examples

of VHfor H2O, C2H6and CH4molecules.

Accuracy of the tensor-structured computations on a fixed grid is O(h2), where

h=O(n−1) is the mesh-size. We achieve O(h3) accuracy in evaluation of the

convolution integral operators by using the Richardson extrapolation on a cou-

ple of consequent equidistant grids. Notice that the tensor convolution on non-

equidistant grids was described in [42, 43].

The size of the computational box for the considered molecules is in the range

of 14 ÷20 ◦

A. Our tensor-structured methods enable usage of large n×n×n3D

Cartesian grids, and hence fine mesh sizes

h≈10−2◦

Afor n= 1024,

h≈10−4◦

Afor n= 16384.

In the calculation of the Hartree potential VH, the electron density ρis a square

of a sum of Gaussians, see (1.9), (1.11), therefore it is expressed in terms of a

larger number of Gaussians Rρ0=R0(R0+1)

2. As a result, the convolution operator

with the rank RNNewton kernel, applies to the canonical tensor for ρ, with large

ranks Rρ0∼104. Moreover, the results of convolution with the ranks RC=

1 Introduction

Rρ0RN, should be used further to compute the Galerkin projection of the Hartree

potential in the Gaussian basis set, which can yield the total complexity of the

order O(R2

ρ0RNnlog n), to compute the Coulomb matrix in the Fock operator.

Hence, a rank reduction for the electron density is required, to enable evaluation

of the Hartree potential VHfor both large Rρ0and n. We apply the multigrid

accelerated C2T and T2C algorithms introduced in Sections 2, 3, to decrease Rρ0

to much smaller value,

Rρ≪Rρ0,with Rρ≤r2,

where ris the Tucker rank (usually, we use equal ranks, r=rℓ, ℓ = 1,2,3).

In this way, we have linear scaling (up to lower order terms) in all parameters

of the input canonical tensors, at every step of evaluation of the Coulomb matrix,

•Rank reduction scheme for the electron density. Complexity of C2T, WC→T=

O(rpRρ0n), where p≤ris the multigrid parameter. T2C complexity,

WT→C=O(Rρn), where Rρ≤r2.

•The Hartree potential VH, is computed by the “agglomerated” tensor-product

convolution, WC∗C=O(RρRNnlog n).

•Tensor scalar product WCC =O(nRρ0RρRN) to compute Coulomb matrix

entries.

In the case of large molecules, the MGA rank reduction can be applied to the

result of convolution, to reduce the rank of the tensor before computation of the

Coulomb matrix entries.

Figures in Section 4 show linear scaling of the CPU times with respect to both

Rρ0, and n, in the rank reduction and the tensor-product convolution. Figures

illustrate the absolute error in calculations of VHand the Coulomb matrix for

CH4, C2H6and H2O. Accuracy up to 10−6hartree (absolute error) even for cusp

areas in ρis achieved due to huge n×n×nCartesian grids with nup to 104.

Next, in Section 4, the tensor-structured evaluation of the nonlocal (integral)

exchange operator Kin the Hartree-Fock equation is considered. Note that com-

putation of the Hartree-Fock exchange leads to integration in six dimensions,

see (4.17). Mostly, analytical evaluation of these six-dimensional integrals was

considered in the literature (two-electron integrals), and many efforts have been

devoted to solution of this problem (see [109, 78] and references therein).

We introduce the “agglomerated” scheme for the computation of the exchange

matrix entries by “dividing” the integration into the following steps.

•Tensor-product convolutions of the Newton kernel with the rank-R0canoni-

cal tensors representing the Hadamard product of the corresponding molec-

ular orbital with every Gaussian.

•Computation of the Galerkin projection of the convolution result with re-

spect to the GTO basis set (by tensor scalar products), to obtain contribu-

tion to the entries of the exchange matrix from the corresponding orbital.

•Summation over all Norb orbitals completes the matrix computation.

This tensor-product algorithm for evaluation of (4.16) has the complexity

W=O(neff R4

0Norb),

where Norb is the number of orbitals in a molecule, neff ≪nis the “effective”

univariate grid size, and R0is the number of Galerkin basis functions. To reduce

the R0-asymptotics to O(R3

0), we apply the C2T algorithm to reduce the ranks

after the convolution step.

Note, that in the traditional ab initio electronic structure calculations, many

years have been devoted to the development of the rigorous schemes for the an-

alytical evaluation of the two-electron integrals inherent to this approach, which

yielded state-of-the-art packages like GAUSSIAN, Abinit and MOLPRO [5, 108].

In analytical-based programs, elaborated by large scientific groups, essential parts

of calculations use precomputed parameters and additional “non-zero” initial ap-

proximations for accelerating the iterative solution of the eigenvalue problem (1.6).

6. Description of the Hartree-Fock solver

In Section 5 of this dissertation we introduce the novel multilevel scheme for the

numerical solution of the Hartree-Fock equation (1.6) using the grid-based tensor-

structured methods to represent basic operators as described in previous sections.

The new concept for the numerical solution of the Hartree-Fock equation is a

“grey box” scheme based on a moderate number of the problem-adapted Galerkin

basis functions {gk}represented on 3D Cartesian grid, which are used as “global

elements” with the low separation rank3.

We solve the nonlinear EVP obtained by the Galerkin-type discretisation of the

Hartree-Fock equation (1.6), with respect to this “global element basis”,

FCi=λiSCiwith F=H0+J(C)−K(C), i = 1, ..., Norb,(1.12)

3Here, the grid-based GTO basis is chosen for the reasons of convenient comparison of the inter-

mediate results of computations with the MOLPRO output. Any appropriate algebraically

separable basis with the rank larger than 1 can be used instead.

1 Introduction

where the eigenvectors Ci∈RR0form a matrix C={cki}= [C1C2. . . CNorb ]∈

RR0×Norb ,representing the Galerkin expansion coefficients of orbitals in the given

basis set, ϕi=

k=1

ckigk. Here, J(C), and K(C), correspond to the Galerkin

matrices of the Hartree and exchange operators, respectively.

The discrete Hartree-Fock equation is solved by the self-consistent field (SCF)

iteration on a sequence of refined grids. To enhance the convergence, the SCF

iteration is supplemented by the direct inversion in the iterative subspace (DIIS)

scheme commonly used in the physical literature [89, 53, 16]. The multilevel

acceleration of DIIS iteration described in Section 5.2.3 leads to stable and fast

convergence on fine grid levels. We begin the iterative solution of the EVP with

the initial zero approach for the matrices J,K, (J(C) = 0, K(C) = 0), i.e., only

with the given core Hamiltonian part of the Fock operator, H0, which contains

the contribution of the nuclear external potential and the kinetic energy of the

electrons in a molecule (linear part of F).

The core of our method is the tensor-structured computation of the electron

density and Galerkin matrices of the Hartree and exchange operators J(C), K(C),

with O(nlog n)-complexity using an updated matrix C, at every step of nonlinear

iteration.

High accuracy is achieved due to 3D tensor-structured arithmetic, with rank

truncation, enabling computations over huge n×n×ntensor grids with up to 1012

entries at the finest level. In electronic structure calculations, this implies rather

fine resolution discussed above, which enables an arbitrary space orientation of a

molecule in the computational box, as in the case of analytically based methods4.

Our multilevel tensor-structured scheme for the numerical solution of the ab

initio Hartree-Fock equation includes the following ingredients.

•In current computations we use, for simplicity, the H0part of the Fock

operator from MOLPRO (solution independent part).

•Computations involving the n×n×n3D grid, start with a coarse approxi-

mation (say, for n0= 64), and proceed using the dyadic grid refinement up

to n= 1024 in the pseudopotential cases, and up to n= 8192 in all electron

cases.

•Iterations for solving the EVP start with Jn0(C) = 0, Kn0(C) = 0.

•At every iteration step, the Hartree and exchange parts of the Fock ma-

trix Fare updated by the tensor-structured computations with O(nlog n)

complexity.

4Note that our algorithms are computationally feasible using MATLAB on a standard SUN

station.

•A grid dependent termination criterion is used for switching iterations to

the next level of grid refinement.

Specifically, as the termination criterion, we control the norms of the differences

of the orbitals taken as the residual over subsequent iterations. Our multilevel

strategy has a four-fold effect, since it

a) provides fast convergence of the SCF DIIS iterations on the coarse grids, in

spite of zero initial guess for Jn0(C), Kn0(C),

b) ensures good initial guess for iterations on (time consuming) finer grids,

c) allows the improved asymptotical approximation O(h3), via the Richardson

extrapolation over a sequence of grids.

d) allows to reduce considerably the number of recomputed entries in K(C) at

fine grid levels (much less than R2

0/2), by using the filtering strategy on fine

grid levels.

The discrete orbitals, represented by the respective coefficient vectors are updated

by diagonalising the Galerkin stiffness matrix at each iteration of the solution of

the nonlinear EVP problem, at the expense O(R3

0), where R0is the dimension

of the Galerkin subspace. We observe that the multilevel DIIS iteration exhibits

uniform linear convergence rate qNit , q < 1, where Nit is the number of itera-

tions, while the overall computational time for one iteration on an n×n×n3D

Cartesian grid scales as O(nlog n) in the univariate grid size n(under fixed rank

parameters).

The current version of our method still scales cubically in the size of the ap-

proximating basis. Hence, any algebraic optimisation of this basis set may give

new opportunity to high accuracy ab initio computations for large molecules. The

quadratic scaling in the size of the approximating basis might be possible for the

iterative solution of the discrete spectral problem, or in the framework of direct

minimization algorithms, see [95, 93] for the detailed discussion on the direct

minimization methods.

The numerical illustrations for the SCF iteration are presented for the all elec-

tron case of H2O, and the pseudopotential case of CH4, CH3OH and C2H5OH

molecules. Figures demonstrate the exponential convergence of the residuals with

respect to the number of iterations. The number of effective iterations, scaled with

respect to the time-unit required for one iteration on the finest grid level, is fairly

small. The uniform convergence rate q∼0.4 is observed for the multilevel itera-

tions, and it turns out to be quite small, q∼0.1, in terms of effective iterations.

1 Introduction

Numerical computations confirm almost linear scaling in n, and possibilities

for the low-rank representation of the multivariate functions and operators, indi-

cating attractive features of the multilevel tensor-truncated SCF iteration in the

prospects of efficient ab initio and DFT computations for large molecules.

To summarize, we note that the novel tensor-structured methods considered

in this dissertation address many interesting mathematical and algorithmic prob-

lems which need future rigorous theoretical and numerical analysis. Validity of

these methods in molecular computations is verified by the numerical results dis-

cussed in this thesis, signifying their perspectives in application to computational

problems in modern quantum chemistry.

Leipzig 2006-2010

Max-Planck Institute for Mathematics in the Sciences

2 Tensor structured (TS) methods

for functions in Rd,d≥3

During the last decades, the methods for the low-rank tensor approximation of

multiway data, originating from the fundamental papers [54, 106, 77], have been

intensively developed in application to the problems of chemometrics, psychomet-

ric, independent component analysis, signal processing and higher order statistics.

A thorough mathematical approval and analysis of the Tucker decomposition algo-

rithm has been presented in the seminal works on the higher order singular value

decomposition [24] and the best rank-(r1,...,rd) orthogonal Tucker approxima-

tion of higher order tensors [23]. A comprehensive survey [74] summarized the

results of the extensive research on tensor decomposition methods and applica-

tions in computer science.

In this section, we recall some of the results in [23, 24] and describe the new

tensor decomposition methods especially designed for application to the problems

of scientific computing. The validity of the Tucker model in scientific computing

can be understood on the base of numerical results on low rank Tucker-type

decomposition applied to classes of function related tensors in Rd,d= 3, [67] that

gainfully exploits the important analytical properties of a respective multivariate

function.

We begin with the description of the full format dth order tensors and definitions

of the basic rank-structured tensor formats: the rank-Rcanonical, the orthogonal

rank-(r1,...,rd) Tucker and mixed type decompositions. As a starting point, we

recall the important theorem in [21] that the minimization problem of the Tucker

approximation is equivalent to the dual maximization problem of finding vectors

of the orthogonal mapping dominating subspaces which provide the maximum

norm of the Tucker core tensor over the product manifold of orthogonal matrices.

Our Theorem 2.14 on the canonical-to-Tucker decomposition describes the re-

duced higher order SVD (RHOSVD) applied to the canonical rank-Rtensors and

provides the error estimate in terms of the truncated SVD of the ℓ-mode side

matrices, supplemented by the corresponding complexity bounds. This result is

an analogue to the well known higher order SVD (HOSVD) approximation for

the full format tensors in [24], but now applied to the case of rank-Rtargets.

2 Tensor structured (TS) methods for functions in Rd,d≥3

We present numerical illustrations on the Tucker approximation of full for-

mat tensors on examples of the following discretized functions in R3: Newton,

Yukawa and Helmholtz potentials, Slater and periodic multi-centered Slater-type

functions, and electron densities of small-size organic molecules.

This section is concluded by the summary of linear and bilinear operations

in the Tucker and canonical tensor formats with estimation of the required nu-

merical complexity. In general, along with the consequent multigrid accelerated

Tucker approximation techniques introduced in Section 3, the material of this

section provides sufficient tools for the construction of efficient tensor-structured

numerical methods in electronic structure calculations in R3(see Sections 4, 5).

2.1 Definitions of rank-structured tensor formats

2.1.1 Full format dth order tensors

A tensor of order dis a multidimensional array of real (complex) numbers whose

entries are referred by using a product index set I=I1×...×Id. We use the

common notation

A= [ai1...id:iℓ∈Iℓ]∈RI, Iℓ={1, ..., nℓ}, ℓ = 1, ..., d, (2.1)

to denote a dth order tensor, and ifor the d-tuple i= (i1, ..., id) of integers1. A

tensor Ais an element of the linear vector space Vn=RI, where n= (n1, ..., nd),

with the entrywise addition (A+B)i=ai+biand the multiplication (c A)i=c ai

(c∈R). The linear vector space Vnof tensors is equipped with the Euclidean

scalar product h·,·i :Vn×Vn→R, defined as

hA, Bi:= X

(i1...id)∈I

ai1...idbi1...idfor A, B ∈Vn.(2.2)

We call the related norm kAkF:= phA, Ai, the Frobenius norm, as for matrices.

Notice that a vector is an order-1 tensor, while a matrix is an order-2 tensor, so

the Frobenius tensor norm coincides with the Euclidean norm of vectors and the

Frobenius norm of matrices, respectively.

In some cases when the tensor size should be specified explicitly, we use the

equivalent notation for the linear space of tensors, Rn1×...×nd, instead of RI.

Some multilinear algebraic operations with tensors of order d(d≥3), can be

reduced to the standard linear algebra by unfolding of a tensor into a matrix.

Here, we recall the construction of matrix unfolding (or the so-called matriciza-

tion) as given in the survey [74]. First, we recall the notion of fibers given in

1The alternative notation for the tensor entries ai1...idis a(i1,...,id).

2.1 Definitions of rank-structured tensor formats

[74], which are the higher order analogue of matrix rows and columns. A fiber is

defined by fixing all indices of a tensor except one. In this way, a matrix column

is a mode-1 fiber, and a matrix row is a mode-2 fiber. The ℓ-mode matricization

of a tensor A∈RI1×...×Idarranges the ℓ-mode fibers of a tensor to be the columns

of the resulting matrix.

Definition 2.1 ([74]) The unfolding of a tensor along mode ℓis a matrix of

dimension nℓ×(nℓ+1...ndn1...nℓ−1), further denoted by

A(ℓ)= [aij]∈Rnℓ×(nℓ+1...ndn1...nℓ−1),(2.3)

whose columns are the respective fibers of Aalong the ℓ-th mode, such that the

tensor element ai1i2...idis mapped into the matrix element aiℓjwhere

j= 1 +

k=1,k6=ℓ

(ik−1)Jk,with Jk=

k−1

m=1,m6=ℓ

nm.

An illustration of the tensor unfolding A(ℓ), (ℓ= 1,2,3) for the 3-rd order tensor

is presented in Figure 2.3.

Another important tensor operation is the so-called contracted product of two

tensors. In the following, we frequently use its special case of the tensor-matrix

multiplication along mode ℓ.

Definition 2.2 ([21]) Given a tensor A∈RI1×...×Idand a matrix M∈RJℓ×Iℓ,

we define the respective mode-ℓtensor-matrix product by

B=A×ℓM∈RI1×...×Iℓ−1×Jℓ×Iℓ+1...×Id,(2.4)

where

bi1...iℓ−1jℓiℓ+1...id=

nℓ

iℓ=1

ai1...iℓ−1iℓiℓ+1...idmjℓiℓ, jℓ∈Jℓ.

Notice that the order of indices Jℓ×Iℓin Definition 2.2 corresponds to the tradi-

tional contracted product notations for the Tucker decomposition as in (2.10).

The tensor-matrix product can be applied successively along several modes,

and it can be shown to be commutative

(A×ℓM)×mP= (A×mP)×ℓM=A×ℓM×mP, ℓ 6=m.

The repeated (iterated) mode-ℓtensor-matrix product for matrices Mand Pof

appropriate dimensions can be simplified as follows,

(A×ℓM)×ℓP=A×ℓ(PM),

2 Tensor structured (TS) methods for functions in Rd,d≥3

Figure 2.1: Contracted product of a 3rd order tensor with a matrix, see (2.4).

as discussed in [24].

For example, the contracted product of A∈Rn1×n2×n3with a matrix M∈

Rr3×n3yields the tensor B∈Rn1×n2×r3, see Figure 2.1.

An important property of the contracted product that resembles the matrix

transpose is given in the following Lemma.

Lemma 2.3 For any B∈RI1×...×Iℓ−1×Jℓ×Iℓ+1...×Id, we have

hA×ℓM, Bi=hA, B ×ℓMTi.(2.5)

Proof: By definition,

hA×ℓM, Bi=X

i∈I\Iℓ, jℓ∈Jℓ nℓ

iℓ=1

ai1,...iℓ−1,iℓ,iℓ+1,idmjℓ,iℓ!bi1,...,jℓ,iℓ+1,...,id

i∈I

ai X

jℓ∈Jℓ

bi1,...iℓ−1,jℓ,iℓ+1,idmjℓ,iℓ!=hA, B ×ℓMTi.

The number of entries in a full format tensor is

ℓ=1

#Iℓ. Assume for simplicity

that #Iℓ=nfor all ℓ= 1, ..., d, then the number of entries in Aamounts to nd,

hence growing exponentially in d.

2.1.2 Tucker, canonical and mixed (two-level) tensor formats

To get rid of exponential scaling in the dimension approximate representations

in some classes S ⊂ Vnof “rank structured” tensors will be applied. To that

end, the traditional concept of tensor-product Hilbert spaces (see, e.g. [91]) plays

an important role. Specifically, the initial linear vector space of tensors Vn, is

considered as the tensor-product Hilbert space Vn=⊗d

ℓ=1Vℓof real-valued d-th

order tensors with Vℓ=RIℓ, where RIℓ(ℓ= 1, ..., d) is the standard Euclidean

“univariate” vector space.

The tensor product of vector spaces Vℓ(ℓ= 1, ..., d) is defined by using the usual

construction of the so-called rank-1 or elementary tensors: the tensor product of

2.1 Definitions of rank-structured tensor formats

vectors u(ℓ)={u(ℓ)

iℓ}iℓ∈Iℓ∈Vℓ(ℓ= 1, ..., d) forms the canonical rank-1tensor

U≡[ui]i∈I =u(1) ⊗... ⊗u(d)∈Vnwith entries ui=u(1)

i1···u(d)

id.

Now the tensor product of vector spaces Vℓis defined by the span

⊗d

ℓ=1Vℓ:= span{u(1) ⊗... ⊗u(d):u(ℓ)∈RIℓ,1≤ℓ≤d}.

Taking all linear combinations of rank-1 tensors defined by the unit vectors in RIℓ

(ℓ= 1, ..., d) shows that Vn=⊗d

ℓ=1RIℓ.

Notice that a rank-1 tensor requires only dn numbers to store it (now linear

scaling in the dimension). Moreover, the scalar product of two rank-1 tensors U

and Vin Vncan be represented by the componentwise univariate scalar products

hU, V i:=

ℓ=1 u(ℓ), v(ℓ),

that can be calculated in O(dn) operations. When d= 2, the tensor product of

two vectors u∈RIand v∈RJrepresents a rank-1 matrix,

u⊗v=uvT∈RI×J.

As the simplest rank structured ansatz, we make use of rank-1 tensors. In the

following, we consider the rank-structured representation of higher order tensors

based on sums of rank-1 tensors. Specifically, we shall use the Tucker, canonical

and mixed models.

Definition 2.4 (The canonical format).

Given a rank parameter R∈N, we denote by CR,n=CR⊂Vna set of tensors

which can be represented in the canonical format,

U=XR

ν=1ξνu(1)

ν⊗...⊗u(d)

ν, ξν∈R,(2.6)

with normalised vectors u(ℓ)

ν∈Vℓ(ℓ= 1, ..., d). The minimal parameter Rin

(2.6) is called the rank (or canonical rank) of a tensor.

Introducing the side-matrices corresponding to representation (2.6),

U(ℓ)= [u(ℓ)

1...u(ℓ)

and the diagonal tensor ξ:= diag{ξ1, ..., ξR}such that ξν1,...,νd= 0 except when

ν1=... =νdwith ξν,...,ν =ξν(ν= 1, ..., R), we obtain the equivalent contracted

product representation

U=ξ×1U(1) ×2U(2)... ×dU(d).(2.7)

2 Tensor structured (TS) methods for functions in Rd,d≥3

The canonical tensor representation is gainful for the multilinear tensor opera-

tions. In Section 2.4.2 it is shown that the linear tensor operations with tensors

in the rank-Rcanonical format have linear complexity Od

ℓ=1

nℓwith respect to

the univariate grid size nℓof a tensor. The disadvantage of this representation is

the lack of fast and stable algorithms for best approximation of arbitrary tensors

in the fixed-rank canonical format.

The rank-(r1,...,rd)Tucker tensor format [106, 23] is based on a representation

in subspaces

Tr:= ⊗d

ℓ=1Tℓof Vnfor certain Tℓ⊂Vℓ

with fixed dimension parameters rℓ:= dim Tℓ≤n.

Definition 2.5 (The Tucker format).

For given rank parameter r= (r1, ..., rd), we denote by Tr,n(or shortly Tr) the

subset of tensors in Vnrepresented in the so-called Tucker format

A(r)=Xr1

ν1=1 ...Xrd

νd=1βν1,...,νdv(1)

ν1⊗...⊗v(d)

νd∈Vn,(2.8)

with some vectors v(ℓ)

νℓ∈Vℓ=RIℓ(1≤νℓ≤rℓ), which form the orthonormal

basis of rℓ-dimensional subspaces Tℓ= span{v(ℓ)

ν}rℓ

ν=1 (ℓ= 1, ..., d). With fixed r

and n, we can also write

Tr,n:= {A∈ ⊗d

ℓ=1Tℓfor arbitrary Tℓ⊂Vℓ,such that dim Tℓ=rℓ}.

Here we call the parameter r= max

ℓ{rℓ}the maximal Tucker rank.

In our applications, we usually apply the Tucker approximation with r≪n, say

r=O(log n). The coefficients tensor β= [βν1,...,νd], that is an element of a tensor

space

Br=Rr1×...×rd,(2.9)

is called the core tensor. Introducing the (orthogonal) side matrices V(ℓ)=

[v(ℓ)

1...v(ℓ)

rℓ], such that V(ℓ)TV(ℓ)=Irℓ×rℓ, we then use a tensor-by-matrix con-

tracted product to represent the Tucker decomposition of A(r)∈Tr,

A(r)=β×1V(1) ×2V(2)... ×dV(d).(2.10)

Remark 2.6 Notice that the representation (2.10) is not unique, since the tensor

A(r)is invariant under directional rotations. In fact, for any set of orthogonal

rℓ×rℓmatrices Yℓ(ℓ= 1, ..., d), we have the equivalent representation

A(r)=b

β×1b

V(1) ×2b

V(2)... ×db

V(d),

with

β=β×1Y1×2Y2... ×dYd,b

V(ℓ)=V(ℓ)YT

ℓ, ℓ = 1, ..., d.

2.1 Definitions of rank-structured tensor formats

Remark 2.7 If the subspaces Tℓ= span{v(ℓ)

ν}rℓ

ν=1 ⊂Vℓare fixed then the ap-

proximation A(r)∈Trof a given tensor A∈Vnis reduced to the orthogonal

projection of Aonto the particular linear space Tr=⊗d

ℓ=1Tℓ⊂Tr,n, that is

A(r)=

ν1,...,νd=1hv(1)

ν1⊗... ⊗v(d)

νd, Aiv(1)

ν1⊗...⊗v(d)

νd

=A×1V(1)T×2···×dV(d)T×1V(1) ×2...×dV(d).

This property plays an important role in the computation of the best orthogonal

Tucker approximation, where the ”optimal” subspaces Tℓare recalculated within

a nonlinear iteration process.

In the following, to simplify the discussion of complexity issues, we assume that

rℓ=r(ℓ= 1, ..., d). The storage requirements for the Tucker (resp. canonical)

decomposition is given by rd+drn (resp. R+dRn), where usually ris noticably

smaller than n. In turn, the maximal canonical rank of the Tucker representation

is bounded by rd−1(see Remark 2.17).

Since the Tucker core still presupposes rdstorage, we introduce further the

approximation methods using a mixed (two-level) representation [60, 67] which

gainfully combines the beneficial features of both the Tucker and canonical mod-

els.

Definition 2.8 (The mixed (two-level) Tucker-canonical format).

Given the rank parameters r, R, we denote by TCR,rthe subclass of tensors in

Tr,nwith the core βrepresented in the canonical format, β∈CR,r⊂Br. An

explicit representation of A∈TCR,ris given by

A= R

ν=1

ξνu(1)

ν⊗...⊗u(d)

ν!×1V(1) ×2V(2)... ×dV(d),(2.11)

with some u(ℓ)

ν∈Rrℓ. Clearly, we have the embedding TCR,r⊂CR,nwith the cor-

responding (non-orthogonal) side-matrices U(ℓ)= [V(ℓ)u(ℓ)

1...V (ℓ)u(ℓ)

R], and scaling

coefficients ξν(ν= 1, ..., R).

A target tensor A∈Vncan be approximated by a sum of rank-1 tensors as in

(2.8), (2.6), or using the mixed format TCR,ras in (2.11). More details on the

mixed (two-level) tensor format are given in Sections 2.2.4 and 2.2.5.

In the next sections we discuss fast and efficient methods to compute the cor-

responding rank structured approximations in different problem settings.

2 Tensor structured (TS) methods for functions in Rd,d≥3

(3)

(2)

1r2r3

r1+ . . . +

(3) (3)

(2)

(1)

+d r n

r1rd

(1)

u u

b1 bR(2)

u(1)

R < = r

Vβ

ββ

Figure 2.2: Mixed Tucker-canonical representation of the full format 3rd order

tensor.

2.2 Best orthogonal Tucker approximation (BTA)

2.2.1 General discussion

The numerical Tucker-type approximation of dth order tensors is one of the most

practically important MLA operations. This operation is, in fact, one of the

possible higher order extensions of the best rank-rmatrix approximation in linear

algebra, based on the truncated SVD.

In our applications, we deal with the best Tucker approximation applied to

the so-called function related tensors, whose entries are computed by sampling a

given multivariate function in Rd,d= 3 over an n×n×nCartesian tensor grid.

Further, in the discussion of the numerical results in Section 2.3.2, we describe

the particular construction of function related tensors and observe some useful

properties of the Tucker format applied to these tensors.

In general, the target tensor A0to be approximated may belong itself to a

certain class S0⊂Vnof data structured tensors. Since both Tr,nand CR,nare

not linear spaces we are led to the challenging nonlinear approximation problem

A0∈ S0⊂Vn:f(A) := kA0−Ak2→min (2.12)

over all tensors A∈ S with S ∈ {Tr,n,CR,n,TCR,r}. The target tensor A0might

inherit a certain data-sparse structure like S0⊂ {CR0,n,Tr0,n}.

As the basic nonlinear approximation scheme, we consider the best orthogonal

rank-(r1, ..., rd) Tucker format corresponding to the choice S=Tr,n. Tensors

A∈Tr, are parametrised as in (2.10), with the orthogonality constraints

V(ℓ)∈ Vnℓ,rℓ(ℓ= 1, ..., d),

2.2 Best orthogonal Tucker approximation (BTA)

where

Vn,r := {Y∈Rn×r:YTY=Ir×r∈Rr×r}(2.13)

is the so-called Stiefel manifold of n×rorthogonal matrices. This minimisation

problem on the product of Stiefel manifolds was first addressed in [76].

In the following we denote by Gℓthe Grassman manifold that is the factor space

of the Stiefel manifold Vnℓ,rℓ(ℓ= 1, ..., d) in (2.13) with respect to all possible

rotations, see Remark 2.6.

For a wide class of function related tensors, the quality of approximation via the

minimisation (2.12) can be effectively controlled by the Tucker rank. In particular,

for certain classes of function related tensors it is possible to prove exponential

convergence (cf. [44, 60]),

kA(r)−A0k ≤ Ce−αˆrwith ˆr= min

ℓrℓ,(2.14)

where A(r)is a minimizer in (2.12). As a consequence, the approximation error

ε > 0 can be achieved with ˆr=O(|log ε|).

The following Lemma proves that the relative difference of norms of the best

rank-(r1,...,rd) Tucker approximation A(r)and the target A0is estimated by the

square of the relative Frobenius norm of A(r)−A0.

Lemma 2.9 (quadratic convergence in norms). Let A(r)∈RI1×...×Idsolve the

minimisation problem (2.12) over A∈Tr. Then we have the ”quadratic” relative

error bound kA0k−kA(r)k

kA0k≤kA(r)−A0k2

kA0k2.(2.15)

Moreover, it holds kβk=kA(r)k ≤ kA0k.

Proof: First part of the proof is given for the completeness (cf. [23] for a short

exposition). Letting A(r)=β×1V(1) ×2V(2) ...×dV(d)and using Lemma 2.3,

we easily obtain the identity

kA(r)k=kβk,(2.16)

since orthogonal matrices V(ℓ)∈ Vnℓ,rℓdo not effect the Frobenius norm. Further-

more, with fixed V(ℓ)(ℓ= 1, ..., d), relation (2.12) is merely a linear least-square

problem with respect to β∈Rr1×...×rd,

g(β) := hA0, A0i−2hA0,β×1V(1) ×2...×dV(d)i+hβ,βi → min.(2.17)

Hence, the corresponding minimisation condition

g(β+δβ)−g(β)≥0∀δβ∈Rr1×...×rd,

2 Tensor structured (TS) methods for functions in Rd,d≥3

leads to the following equation for the minimiser,

−hA0, δβ×1V(1) ×2...×dV(d)i+hβ, δβi= 0 ∀δβ∈Rr1×...×rd.

This implies, using Lemma 2.3, that

h−A0×1V(1)T×2...×dV(d)T+β, δβi= 0,∀δβ∈Rr1×...×rd,

and we obtain

β−A0×1V(1)T×2...×dV(d)T= 0.(2.18)

Next we readily derive

f(A(r)) = kA(r)k2−2hβ×1V(1) ×2...×dV(d), A0i+kA0k2

=kA(r)k2+kA0k2−2hβ, A0×1V(1)T×2...×dV(d)Ti,

=kA0k2−kβk2,

hence it follows that

kA0k2−kA(r)k2=kA(r)−A0k2.(2.19)

The latter leads to the estimate (clearly (2.19) implies kA0k ≥ kA(r)k)

kA0k−kA(r)k

kA0k=kA(r)−A0k2

(kA(r)k+kA0k)kA0k≤kA(r)−A0k2

kA0|2,

that completes the proof.

The key point for the efficient solution of the minimization problem (2.12) with

S=Tr,nis its equivalence to the dual maximisation problem [23],

[Z(1), ..., Z(d)] = argmax 



hv(1)

ν1⊗... ⊗v(d)

νd, Air

ν=1



(2.20)

over the set of side-matrices V(ℓ)= [v(ℓ)

1. . . v(ℓ)

rℓ] in the Stiefel manifold Vnℓ,rℓ, as

in (2.13).

The following lemma reduces the minimisation of the original quadratic func-

tional to the dual maximisation problem thus eliminating the core tensor βfrom

the optimization process.

Lemma 2.10 ([23]) For given A0∈RI1×...×Id, the minimisation problem (2.12)

on Tris equivalent to the dual maximisation problem

g(Y(1), ..., Y (d)) := 



A0×1Y(1)T×2... ×dY(d)T



2→max (2.21)

2.2 Best orthogonal Tucker approximation (BTA)

over a set Y(ℓ)∈Rnℓ×rℓfrom the Grassman manifold, i.e., Y(ℓ)∈ Gℓ(ℓ= 1, ..., d).

For given maximizing matrices V(m)(m= 1, ..., d), the tensor βminimising

(2.12) is represented by

β=A0×1V(1)T×2... ×dV(d)T∈Rr1×...×rd.(2.22)

Proof: Inspecting the proof of Lemma 2.9 we find that substitution of βin (2.18)

to (2.17) leads to the equivalent minimizing equation

hA0, A0i−hA0×1V(1)T×2...×dV(d)T, A0×1V(1)T×2...×dV(d)Ti → min,(2.23)

that proves (2.21). Finally, (2.18) yields (2.22).

In view of Remark 2.6, the rotational non-uniqueness of the maximizer in (2.20)

can be avoided if one solves this maximisation problem in the so-called Grassmann

manifold that is the factor space of Vnℓ,rℓwith respect to the rotational transforms

[26]. The dual maximisation problem (2.21) posed on the compact manifold can be

proven to have at least one global maximum (see [60, 26]). For the size consistency

of the arising tensors, we require the natural compatibility conditions

rℓ≤¯rℓ:= r1...rℓ−1rℓ+1...rd, ℓ = 1, ..., d. (2.24)

2.2.2 BTA algorithm for full format tensors

The best (nonlinear) Tucker approximation (BTA) based on solving the dual

maximization problem (2.20) is usually solved numerically by the ALS iteration

with the initial guess computed by the higher order SVD [24].

The generalization of the SVD to the dth order tensors has been introduced in

[24] in application to the multidimensional problems in signal processing. It is

called the higher order SVD (HOSVD) or dth order SVD. We recall the theorem

with the basic notations as in [23].

Theorem 2.11 (dth order SVD, HOSVD, [23]).

Every complex n1×n2×... ×nd-tensor Acan be written as the product

A=S ×1U(1) ×2U(2)... ×dU(d),

in which

1. U(ℓ)= [U(ℓ)

1U(ℓ)

2...U(ℓ)

nℓ]is a unitary nℓ×nℓ-matrix,

2. Sis a complex n1×n2×...×nd-tensor of which the subtensors Siℓ=α, obtained

by fixing the ℓth index to α, have the properties of

2 Tensor structured (TS) methods for functions in Rd,d≥3

(i) all-orthogonality: two subtensors Siℓ=αand Siℓ=βare orthogonal for all pos-

sible values of ℓ,α, and βsubject to α6=β:

hSiℓ=α,Siℓ=βi= 0 when α6=β,

(ii) ordering: kSiℓ=1k ≥ kSiℓ=2k ≥ ... ≥ kSiℓ=nℓk ≥ 0for all positive values of ℓ.

The Frobenius norms kSiℓ=ik, symbolized by σ(ℓ)

i, are ℓ-mode singular values of

A(ℓ)and the vector U(ℓ)

iis an ith ℓ-mode left singular vector of A(ℓ).

Next theorem gives the error bound for the truncated HOSVD.

Theorem 2.12 (approximation by HOSVD, [24]). Let the HOSVD of Abe given

as in Theorem 2.11 and let the ℓ-mode rank of A, rank(A(ℓ)), be equal to Rℓ

(ℓ= 1, ..., d). Define a tensor ˜

Aby discarding the smallest ℓ-mode singular values

σ(ℓ)

rℓ+1, σ(ℓ)

rℓ+2, ..., σ(ℓ)

Rℓfor given values of rℓ(ℓ= 1, ..., d), i.e., set the corresponding

parts of Sequal to zero. Then we have

kA−˜

Ak2≤

i1=r1+1

σ(1)

i2=r2+1

σ(2)

2+···+

id=rd+1

σ(d)

Proof: We have

kA−˜

Ak2=

i1=1

i2=1 ···

id=1

i1i2...id−

i1=1

i2=1 ···

id=1

i1i2...id

≤

i1=r1+1

i2=1 ···

id=1

i1i2...id+

i1=1

i2=r2+1 ···

id=1

i1i2...id

+···+

i1=1

i2=1 ···

id=rd+1

i1i2...id

i1=r1+1

σ(1)2

i1+

i2=r2+1

σ(2)2

i2+···+

id=rd+1

σ(d)2

id,

that completes the proof.

Next, we recall the method to solve the (local) maximization problem in Lemma

2.10 which is based on the alternating least squares (ALS) iteration. For the

full format tensors, the sketch of ALS algorithm G BTA reads as follows (see

Algorithm 4.2 in [23] for more details).

Algorithm G BTA (Vn→Tr,n). Given the input tensor A∈Vn, the Tucker

rank r, and the maximum number of ALS iterations kmax ≥1.

2.2 Best orthogonal Tucker approximation (BTA)

SVD

(1)

(2)

(3)

n . n

3 2

1 3

2 1

[1]

[2]

[3]

Figure 2.3: Truncated HOSVD of a tensor A∈Rn1×n2×n3is computed by the truncated

SVD of the ℓ-mode unfolding matrices, ℓ= 1,2,3.

1. Compute the “truncated” HOSVD of Ato obtain an initial guess V(ℓ)

0∈

Rnℓ×rℓfor the ℓ-mode side-matrices V(ℓ)(ℓ= 1, ..., d) (“truncated” SVD ap-

plied to each matrix unfolding A(ℓ)). The complexity of HOSVD is bounded

W=O(dnd+1).(2.25)

2. For k= 1 : kmax perform:

for each q= 1, ..., d, and with fixed side-matrices V(ℓ)

k−1∈Rnℓ×rℓ,ℓ6=q, the

ALS iteration optimises the q-mode matrix V(q)

kvia computing the domi-

nating rq-dimensional subspace (truncated SVD) for the respective matrix

unfolding

B(q)∈Rnq×¯rq,¯rq=r1...rq−1rq+1...rd=O(rd−1),(2.26)

corresponding to the tensor obtained by the q-mode contracted product

B=A×1V(1)

T×2... ×q−1V(q−1)

T×q+1 V(q+1)

k−1

T... ×dV(d)

k−1

Each iteration has the cost O(drd−1nmin{rd−1, n}+dndr), that represents

the expense of SVDs and the computation of matrix unfoldings B(q).

3. Set V(ℓ)=V(ℓ)

kmax , and compute the core βas the representation coefficients

of the orthogonal projection of Aonto Tn=⊗d

ℓ=1Tℓwith Tℓ= span{v(ℓ)

ν}rℓ

ν=1

2 Tensor structured (TS) methods for functions in Rd,d≥3

(see Remark 2.7),

β=A×1V(1)T×2... ×dV(d)T∈Br,

at the cost O(rdn).

With fixed kmax, the overall complexity of the algorithm for d= 3, nℓ=n, and

rℓ=r(ℓ= 1,2,3) (suppose that r2≤n), is estimated by

WF→T=O(n4+n3r+n2r2+nr3) = O(n4),

where different summands denote the cost of initial HOSVD of A, computation

of unfolding matrices B(q), related SVDs, and computation of the core tensor.

For the class of function related tensors we observe fast and robust local con-

vergence of the ALS iteration (though it is not always the case in traditional

applications of the Tucker decomposition in the independent component analysis

and chemometrics). This fact can be, probably, illuminated by the exponential

error bound in the Tucker rank for the rank-rorthogonal approximation (see

(2.14)), which is often observed in applications to tensors representing the physi-

cally relevant functions [67].

However, notice that the Tucker model applied to the general fully populated

tensor of size ndrequires O(dnd+1) arithmetical operations due to the presence of

complexity dominating higher-order SVD. Hence, in computational practice this

algorithm applies only to small dand moderate n.

2.2.3 BTA for rank-Rcanonical input

In some applications, for example in electronic structure calculations, the target

tensor is already presented in the rank-Rcanonical format, A∈CR,n, as in (2.6),

but with relatively large R,

U(R)=XR

ν=1ξνu(1)

ν⊗...⊗u(d)

ν, ξν∈R.

The complexity of tensor-structured operations, in spite of linear scaling with

respect to the one-dimension grid size, is O(R1R2n), is thus increasing also linearly

with respect to the product of the ranks R1and R2of the input tensors. Hence,

large parameters R1and R2may as well lead to substantial numerical cost.

In this case, to reduce the ranks of input tensors, we develop the two-level

canonical-to-Tucker (C2T) approximation with the consequent Tucker-to-canonical

(T2C) transform. The resulting computational cost of MLA operations supple-

mented by such a two-level method can be reduced essentially.

2.2 Best orthogonal Tucker approximation (BTA)

The corresponding canonical-to-Tucker-to-canonical approximation scheme [68]

is presented as the following two-level chain,

CR,n

→TCR,r

→TCR′,r.(2.27)

Here, on Level-I, we compute the best orthogonal Tucker approximation applied

to the CR,n-type input, so that the resultant core is represented in the CR,rformat.

On Level-II, the small-size Tucker core in CR,ris approximated by a tensor in CR′,r

with R′< R. Here we describe the Algorithm on Level-I (which is, in fact, the

most laborious part in computational scheme (2.27)) that has a polynomial cost

in the size of the input data in CR,n(see Remark 2.15).

Next theorem gives the characterisation on the solution structure for the Level-

I scheme in (2.27), and provides the key ingredients to construct its efficient

numerical implementation provided that the target Ais represented by (2.6).

It also presents the error estimates for the reduced rank-rHOSVD type ap-

proximation (RHOSVD), given in the definition below. Suppose for definiteness

that n≤R, so that an SVD of the side-matrix U(ℓ)is given by

U(ℓ)=Z(ℓ)DℓW(ℓ)T=

k=1

σℓ,kz(ℓ)

kw(ℓ)

T, z(ℓ)

k∈Rn, w(ℓ)

k∈RR,

with orthogonal matrices Z(ℓ)= [z(ℓ)

1, ..., z(ℓ)

n], and W(ℓ)= [w(ℓ)

1, ..., w(ℓ)

n], ℓ=

1, ..., d. We use the following notations for the vector entries, w(ℓ)

k(ν) = w(ℓ)

k,ν

(ν= 1, ..., R).

Definition 2.13 (RHOSVD).

Introduce the truncated SVD of the side-matrices U(ℓ),Z(ℓ)

0Dℓ,0W(ℓ)

T, (ℓ= 1, ..., d),

where Dℓ,0=diag{σℓ,1, σℓ,2, ..., σℓ,rℓ}and Z(ℓ)

0∈Rn×rℓ,W0(ℓ)∈RR×rℓ, represent

the orthogonal factors being the respective submatrices of Z(ℓ)and W(ℓ). Then the

RHOSVD approximation is given by

(r)=ξ×1hZ(1)

0D1,0W(1)

Ti×2hZ(2)

0D2,0W(2)

Ti···×dhZ(d)

0Dd,0W(d)

Ti.(2.28)

Theorem 2.14 (Canonical to Tucker approximation).

(a) Let A∈CR,nbe given by (2.6). Then the minimisation problem

A∈CR,n⊂Vn:A(r)= argminT∈Tr,nkA−TkVn,(2.29)

is equivalent to the dual maximisation problem

[V(1), ..., V (d)] = argmax

Y(ℓ)∈Gℓ



ν=1

ξνY(1)Tu(1)

ν⊗... ⊗Y(d)Tu(d)

ν



,(2.30)

2 Tensor structured (TS) methods for functions in Rd,d≥3

over the Grassman manifolds Gℓ,Y(ℓ)= [y(ℓ)

1...y(ℓ)

rℓ]∈ Gℓ(ℓ= 1, ..., d), and where

Y(ℓ)Tu(ℓ)

ν∈Rrℓ.

(b) The compatibility condition (2.24) is simplified to

rℓ≤rank(U(ℓ))with U(ℓ)= [u(ℓ)

1...u(ℓ)

R]∈Rn×R,

and we have the solvability of (2.30) assuming that the above relation is valid.

The maximizer is given by orthogonal matrices V(ℓ)= [v(ℓ)

1...v(ℓ)

rℓ]∈Rn×rℓ, which

can be computed similarly, as in Algorithm GBTA, where the truncated HOSVD

at Step 1 is now substituted by RHOSVD, see(2.28).

A(r)=

k=1

µkv(1)

k1⊗... ⊗v(d)

kd, µk=hv(1)

k1⊗···⊗v(d)

kd, Ai,

so that the core tensor µ= [µk], can be represented in the rank-Rcanonical format

µ=

ν=1

ξν(V(1)Tu(1)

ν)⊗···⊗(V(d)Tu(d)

ν)∈CR,r.(2.31)

(d) Let σℓ,1≥σℓ,2... ≥σℓ,min(n,R)be the singular values of the ℓ-mode side-

matrix U(ℓ)∈Rn×R(ℓ= 1, ..., d). Then the RHOSVD approximation A0

(r), as in

(2.28), exhibits the error estimate

kA−A0

(r)k ≤ kξk

ℓ=1

(

min(n,R)

k=rℓ+1

σ2

ℓ,k)1/2,(2.32)

where kξk=sR

ν=1

ξ2

ν.

Proof: (a) The generic dual maximization problem (2.20) with Agiven by (2.6),

now takes the form (2.30) due to the relation

hy(1)

k1⊗... ⊗y(d)

kd, Ai=

ν=1

ξνhy(1)

k1, u(1)

νi...hy(d)

kd, u(d)

νi.

(b) The compatibility condition ensures the size-consistency of all matrix unfold-

ings. Let us assume that nℓ≤Rfor definiteness. To justify the choice of Z(ℓ)

we notice that using the contracted product representation (2.7) of the canonical

tensors A∈CR,n,

A=ξ×1U(1) ×2U(2) ···×dU(d),

2.2 Best orthogonal Tucker approximation (BTA)

we have expansion (2.28) for the RHOSVD. Now we start from the error repre-

sentation

A−A0

(r)=ξ×1U(1) ×2U(2) ···×dU(d)

−ξ×1hZ(1)

0D1,0W(1)

Ti×2hZ(2)

0D2,0W(2)

Ti···×dhZ(d)

0Dd,0W(d)

=ξ×1hU(1) −Z(1)

0D1,0W(1)

Ti×2hZ(2)

0D2,0W(2)

Ti···×dhZ(d)

0Dd,0W(d)

+ξ×1U(1) ×2hU(2) −Z(2)

0D2,0W(2)

Ti···×dhZ(d)

0Dd,0W(d)

Ti+...

+ξ×1U(1) ×2U(2) ···×dhU(d)−Z(d)

0Dd,0W(d)

Ti.

To proceed, we introduce

∆(ℓ)=U(ℓ)−Z(ℓ)

0Dℓ,0W(ℓ)

T, U(ℓ)

0=Z(ℓ)

0Dℓ,0W(ℓ)

then the ℓth summand in the right-hand side above takes the form

Bℓ=ξ×1U(1) ···×ℓ−1U(ℓ−1) ×ℓ∆(ℓ)×ℓ+1 U(ℓ+1)

0···×dU(d)

This leads to the error bound (by the triangle inequality)

kA−A0

(r)k ≤

ℓ=1 kBℓk=kξ×1∆(1) ×2U(2)

0···×dU(d)

+kξ×1U(1) ×2∆(2) ···×dU(d)

0k+...

+kξ×1U(1) ×2U(2) ···×d∆(d)k.

Here the ℓth term Bℓcan be represented by

ν=1

ξν⊗u(1)

ν···⊗u(ℓ−1)

ν⊗

k=rℓ+1

σℓ,kz(ℓ)

kw(ℓ)

k,ν⊗

rℓ+1

k=1

σℓ+1,kz(ℓ+1)

kw(ℓ+1)

k,ν ···⊗

k=1

σd,kz(d)

kw(d)

k,ν,

providing the estimate (take into account that ku(ℓ)

νk= 1, ℓ= 1, ..., d,ν= 1, ..., R)

kBℓk ≤

ν=1 |ξν|(

k=rℓ+1

σ2

ℓ,kw(ℓ)

k,ν

2)1/2·(

rℓ+1

k=1

σ2

ℓ+1,kw(ℓ+1)

k,ν

2)1/2···(

k=1

σ2

d,kw(d)

k,ν

2)1/2.

Recall that U(ℓ)(ℓ= 1, ..., d) has normalised columns, i.e., 1 = ku(ℓ)

νk=k

k=1

σℓ,kz(ℓ)

kw(ℓ)

k,νk,

implying

k=1

σ2

ℓ,kw(ℓ)

k,ν

2= 1 for ℓ= 1, ..., d and ν= 1, ..., R.

2 Tensor structured (TS) methods for functions in Rd,d≥3

Hence, we finalise the error bound as follows,

kA−A0

(r)k ≤

ℓ=1

ν=1 |ξν| n

k=rℓ+1

σ2

ℓ,kw(ℓ)

k,ν

2!1/2

≤

ℓ=1 R

ν=1

ξ2

ν!1/2 R

ν=1

k=rℓ+1

σ2

ℓ,kw(ℓ)

k,ν

2!1/2

ℓ=1 kξk n

k=rℓ+1

σ2

ℓ,k

ν=1

w(ℓ)

k,ν

2!1/2

=kξk

ℓ=1 n

k=rℓ+1

σ2

ℓ,k!1/2

The case R < n can be analysed along the same line. Now item (d) follows.

We notice that the error estimate (2.32) in Theorem 2.14 actually provides the

control of the RHOSVD approximation error via the computable ℓ-mode error

bounds since, by the construction, we have

kU(ℓ)−Z(ℓ)

0Dℓ,0W(ℓ)

0k2

k=rℓ+1

σ2

ℓ,k, ℓ = 1, ..., d.

This result is similar to the well-known error estimate for the HOSVD approxi-

mation (see Theorem 2.12 and Property 10 in [24]).

Based on Theorem 2.14 the corresponding algorithm C BTA for the rank-Rin-

put data can be designed by respective modifications of Steps 1, 2 in the general

GBTA scheme. In this way we note that each column of the ℓ-mode unfold-

ing matrix A(ℓ)for the rank-Rcanonical tensor in (2.6) can be represented as

the weighted sum of the ℓ-mode canonical vectors u(ℓ)

ν(ν= 1, ..., R) implying

rank(A(ℓ))≤R. Keeping this modification in mind, the sketch of the new algo-

rithm C BTA reads as follows.

Algorithm C BTA (CR,n→TCR,r). Given A∈CR,nin the form (2.6), an

iteration parameter kmax, and the rank parameter r.

1. For ℓ= 1, ..., d compute the truncated SVD of U(ℓ)to obtain orthogonal

matrices Z(ℓ)

0∈Rnℓ×rℓ, representing the rank-rℓRHOSVD approximation

of ℓ-mode dominating subspaces (cost O(dRn min{R, n})).

2. Given an initial guess Z(ℓ)

0, (ℓ= 1, ..., d) for ℓ-mode orthogonal matrices.

Perform kmax ALS iterations as at Step 2 in the general G BTA algorithm

to obtain the maximizer V(ℓ)∈Rnℓ×rℓ,ℓ= 1, ..., d (cost O(drd−1nmin{rd−1, n})

per iteration).

2.2 Best orthogonal Tucker approximation (BTA)

3. Calculate projections of U(ℓ)onto the basis of orthogonal vectors of W(ℓ)as

the matrix product V(ℓ)TU(ℓ)(ℓ= 1, ..., d), at the cost O(drRn).

4. Using the columns in V(ℓ)TU(ℓ)(ℓ= 1, ..., d), calculate the rank-Rcore

tensor µ∈CR,ras in (2.31), in O(drRn) operations and with O(drR)-

storage.

Notice that Step 2 in Algorithm C BTA (CR,n→TCR,r) above is not manda-

tory. It can be omitted if the initial guess Z(ℓ)

0turns out to be “good enough”

with respect to chosen threshold criterion (see estimate (2.32)). Our numerical

study indicates that in the case of tensors related to the class of functions de-

scribing physical quantities in electronic structure calculations, Step 2 in the C2T

transform is not required.

The following remark addresses the complexity issues.

Remark 2.15 Algorithm C BTA (CR,n→TCR,r) exhibits polynomial cost in R, r, n,

O(dRn min{n, R}+drd−1nmin{rd−1, n}),

with exponential scaling in d. In absence of Step 2 (if RHOSVD provides a sat-

isfactory approximation), the algorithm does not contain iteration loops, and for

any d≥2it is a finite SVD-based scheme.

Numerical tests show that Algorithm C BTA(CR,n→TCR,r) is efficient for mod-

erate Rand n, in particular, it works well in electronic structure calculations on

3D Cartesian grids for moderate grid size n.103and for R≤103. However,

in real life applications the computations may require one-dimension grid sizes in

the range nℓ.3·104, (ℓ= 1,2,3) with canonical ranks R≤104. Therefore,

to get rid of a polynomial scaling in R, n, r for 3D applications, we develop new

generation of best Tucker approximation methods based on the idea of multigrid

acceleration of the nonlinear ALS iteration, introduced in Section 3.

2.2.4 Mixed BTA for full format and Tucker tensors

A further reduction of numerical complexity for the Tucker model is based on

the concept of the mixed (two-level) Tucker-canonical approximation [60, 67, 68],

see Definition 2.8. The main idea of the mixed approximation consists of a rank-

structured representation to the Tucker core βin certain tensor classes S ⊂ Br.

In particular, we consider a class S=CR,rof rank-Rcanonical tensors, i.e.,

β∈CR,r.

In the case of full format tensors, the two-level version of Algorithm G BTA

(Vn→Tr,n) can be described as the following computation chain,

→Tr,n

→TCR,r⊂CR,n,

2 Tensor structured (TS) methods for functions in Rd,d≥3

where the Level-I is understood as application of Algorithm G BTA (Vn→Tr,n)

and the Level-II includes the rank-Rcanonical approximation to the small size

Tucker core β∈Br. Figure 2.2 illustrates the computational scheme of the two-

level Tucker approximation.

In the case of function related tensors, our goal is to compute the Level-I ap-

proximation with linear cost in the size of the input data (see Section 3.3).

If the input tensor A0is already presented in the rank-rTucker format then one

can apply the following Lemma 2.16. This lemma presents a simple but useful

characterisation of the mixed (two-level) Tucker model (cf. [60, 67]) which allows

to approximate the elements in Trvia the canonical decomposition applied to

the small sized core tensor.

Lemma 2.16 (Mixed Tucker-to-canonical approximation).

Let the target tensor A∈Tr,nin (2.12) have the form A=β×1V(1) ×2... ×d

V(d)with the orthogonal side-matrices V(ℓ)= [v(ℓ)

1. . . v(ℓ)

rℓ]∈Rn×rℓand with β∈

Rr1×...×rd. Then, for a given R≤min

1≤ℓ≤drℓ(see (2.37), (2.38)),

min

Z∈CR,nkA−Zk= min

µ∈CR,rkβ−µk.(2.33)

Assume that there exists a best rank-Rapproximation A(R)∈CR,nof A, then

there is a best rank-Rapproximation β(R)∈CR,rof β, such that

A(R)=β(R)×1V(1) ×2... ×dV(d).(2.34)

Proof: Below we present the more detailed proof compared with the sketch in

Lemma 2.5, [67]. Notice that the canonical vectors y(ℓ)

kof any test element (see

(2.6)) in the left-hand side of (2.33),

k=1

λky(1)

k⊗... ⊗y(d)

k∈CR,n,(2.35)

can be chosen in span{v(ℓ)

1,...,v(ℓ)

rℓ}, i.e.,

y(ℓ)

rℓ

m=1

µ(ℓ)

k,mv(ℓ)

m, k = 1, . . . , R, ℓ = 1, ..., d. (2.36)

Indeed, assuming

y(ℓ)

rℓ

m=1

µ(ℓ)

k,mv(ℓ)

m+E(ℓ)

kwith E(ℓ)

k⊥span{v(ℓ)

1,...,v(ℓ)

rℓ},

2.2 Best orthogonal Tucker approximation (BTA)

we conclude that E(ℓ)

kdoes not effect the cost function in (2.33) because of the

orthogonality of V(ℓ). Hence, setting E(ℓ)

k= 0, and substituting (2.36) into (2.35),

we arrive at the desired Tucker decomposition of Z,

Z=βz×1V(1) ×2...×dV(d),βz∈CR,r.

This implies

kA−Zk2=k(βz−β)×1V(1) ×2...×dV(d)k2=kβ−βzk2≥min

µ∈CR,rkβ−µk2.

On the other hand, we have

min

Z∈CR,nkA−Zk2≤min

βz∈CR,rk(β−βz)×1V(1) ×2...×dV(d)k2= min

µ∈CR,rkβ−µk2.

Hence, we arrive at (2.33).

Likewise, for any minimizer A(R)∈CR,nin the right-hand side of (2.33), we

obtain

A(R)=β(R)×1V(1) ×2V(2)... ×dV(d)

with the respective rank-Rcore tensor

β(R)=

k=1

λku(1)

k⊗... ⊗u(d)

k∈CR,r,

where u(ℓ)

k={µ(ℓ)

k,mℓ}rℓ

mℓ=1 ∈Rrℓ, are calculated by using representation (2.36),

and then changing the order of summation,

A(R)=

k=1

λky(1)

k⊗... ⊗y(d)

k=1

λk r1

m1=1

µ(1)

k,m1v(1)

m1!⊗... ⊗ rd

md=1

µ(d)

k,mdv(d)

md!

m1=1

...

md=1 (R

k=1

λk

ℓ=1

µ(ℓ)

k,mℓ)v(1)

m1⊗... ⊗v(d)

md.

Now the relation (2.34) implies that

kA−ARk=kβ−β(R)k,

since the ℓ-mode multiplication with orthogonal side matrices V(ℓ)does not change

the cost function. Using the already proven relation (2.33) this indicates that β(R)

is the minimizer in the right-hand side of (2.33).

2 Tensor structured (TS) methods for functions in Rd,d≥3

Lemma 2.16 means that the corresponding low rank Tucker-canonical approxi-

mation of A∈Tr,ncan be reduced to the canonical approximation of a small-size

core tensor.

Lemma 2.16 suggests a two-level dimensionality reduction approach that leads

to a better data structure compared with the standard Tucker model. Though

A(R)∈CR,ncan be represented in the mixed Tucker-canonical format, its efficient

storage depends on further multilinear operations. In fact, if the resultant tensor is

further used in scalar, Hadamard or convolution products with canonical tensors,

it is better to store A(R)in the canonical format of the complexity rdn.

2.2.5 Remarks on the Tucker-to-canonical transform

In the rank reduction scheme for the canonical rank-Rtensors, we use conse-

quently the canonical-to-Tucker (C2T) transform and then the Tucker-to-canonical

(T2C) tensor approximation. Next, we give two useful remarks which characterize

the canonical representation of the full format tensors.

Remark 2.17 applied to the Tucker core tensor of the size r×r×r, indicates

that the ultimate canonical rank of a large-size tensor in Vnhas the upper bound

r2. According to Remark 2.18, its canonical rank can be reduced to a smaller

value using the SVD-based truncation procedure up to a fixed tolerance ε > 0.

Denote by nℓthe single-hole product of ℓ-mode dimensions

nℓ=n1···nℓ−1nℓ+1 ···nd.(2.37)

Remark 2.17 The canonical rank of a tensor A∈Vnhas the upper bound

R≤min

1≤ℓ≤dnℓ.(2.38)

Proof: First, consider the case d= 3. Let n1= max

1≤ℓ≤dnℓfor definiteness. We can

represent a tensor Aas

k=1

Bk⊗Zk, Bk∈Rn1×n2, Zk∈Rn3,

where Bk=A(:,:, k) (k= 1,...,n3) is the n1×n2matrix slice of Aand Zk(i) = 0,

for i6=k,Zk(k) = 1. Let rank(Bk) = rk≤n2,k= 1,...,n3, then

rank(Bk⊗Zk) = rank(Bk)≤n2,

and we obtain

rank(A)≤

n=1

rank(Bk)≤n2n3= min

1≤ℓ≤3nℓ.

2.2 Best orthogonal Tucker approximation (BTA)

The general case of d > 3 can be proven similarly by induction argument.

The next remark shows that the maximal canonical rank of the Tucker core

of 3rd order tensor can be easily reduced to the value ≤r2by the SVD-based

procedure. Though, being not practically attractive for arbitrary high order ten-

sors, the simple algorithm described in Remark 2.18 is proved to be useful for

the treatment of small size 3rd order Tucker core tensors in the rank reduction

algorithms described in the previous sections.

Remark 2.18 Let d= 3 for the sake of clearness. There is a simple procedure

based on SVD to reduce the canonical rank of the core tensor β, within the accu-

racy ε > 0. Denote by Bm∈Rr×r,m= 1, ..., r the two-dimensional slices of βin

some fixed mode. Hence, we can represent

β=

m=1

Bm⊗Zm, Zm∈Rr,(2.39)

where Zm(m) = 1,Zm(j) = 0 for j= 1,...,r,j6=m(there are exactly dpossible

decompositions). Let pmbe the minimal integer, such that the singular values of

Bmsatisfy σ(m)

k≤ε

r3/2for k=pm+ 1, ..., r (if σ(m)

r>ε

r3/2, then set pm=r).

Then, denoting by

Bpm=

km=1

σ(m)

kmukm⊗vkm,

the corresponding rank-pmapproximation to Bm(by truncation of σ(m)

pm+1, ..., σ(m)

r),

we arrive at the rank-Rcanonical approximation to β,

β(R):=

m=1

Bpm⊗Zm, Zm∈Rr,(2.40)

providing the error estimate

kβ−β(R)k ≤

m=1 kBm−Bpmk=

m=1 v

km=pm+1

(σ(m)

km)2≤

m=1 rrε2

r3=ε

Representation (2.40) is a sum of rank-pmterms so that the total rank is bounded

by R≤p1+... +pr≤r2.

This approach can be easily extended to arbitrary d≥3with the bound R≤rd−1.

2 Tensor structured (TS) methods for functions in Rd,d≥3

2.3 Numerics on BTA of function related tensors in

2.3.1 General description

Here we discuss the best rank-rTucker decomposition of 3D tensors arising as

discretization of classical potentials.

For a given continuous function g: Ω →R, Ω := Qd

ℓ=1[aℓ, bℓ]⊂Rd,−∞ < aℓ<

bℓ<∞, we introduce the collocation-type function related tensor of order dby

A0≡A0(g) := [ai1...id]∈RI1×...×Idwith ai1...id:= g(x(1)

i1,...,x(d)

id),

where (x(1)

i1,...,x(d)

id)∈Rdare grid collocation points, indexed by I=I1×...×Id,

x(ℓ)

iℓ=aℓ+iℓ−1

2bℓ−aℓ

nℓ, iℓ= 1,2,...,nℓ, ℓ = 1,...,d, (2.41)

which are the midpoints of nℓequally spaced subintervals with the size hℓ=bℓ−aℓ

nℓ,

in the intervals [aℓ, bℓ], corresponding to mode ℓ. We are interested in the validity

and the rank-dependence of the Tucker approximations to the function related

tensors discretized on 3D Cartesian grid.

The initial tensor A0is approximated by a rank r= (r, ..., r) Tucker tensor,

where the rank-parameter rincreases from r= 1,2, ... to some predefined value,

rmax. The orthogonal Tucker vectors and the core tensor of the size r×r×r

are then used for the construction of the approximating tensor A(r)≈A0, for

estimating the approximation properties of the tensor decomposition with the

given rank. For every Tucker rank rin the respective range, we compute the

relative error in the Frobenius norm as in (2.2)

EF N =kA0−A(r)k

||A0|| ,(2.42)

the relative difference of norms

EF E =kA0k−kA(r)k

||A0|| ,(2.43)

as well as the relative maximum norm

EC:= maxi∈I |a0,i −ar,i|

maxi∈I |a0,i|.(2.44)

2.3 Numerics on BTA of function related tensors in R3

2 4 6 8 10 12

10−12

10−10

10−8

10−6

10−4

10−2

100

Tucker rank

error

Newton potential , b=10, n = 64

EFN

EFE

0 2 4 6 8 10

−0.6

−0.4

−0.2

0.2

0.4

0.6

Orthogonal vectors, r=6

x(1)−axis

Figure 2.4: Convergence in the corresponding norms with respect to the Tucker

rank r(left) and orthogonal vectors v(1)

k,k= 1,...,6, (right) for the

Tucker approximation to the Newton potential.

2.3.2 Numerics for classical potentials

We apply the numerical BTA algorithms to the full format tensors, and study the

Tucker decomposition properties of the tensors generated by the Newton kernel,

the Slater-type, Yukawa and Helmholtz functions in Rd,d= 3.

1. Newton kernel

We apply the best rank-rTucker decomposition algorithm with r= (r, ..., r)

for approximating the Newton potential

g(x) = 1

kxk, x ∈R3,

in the cube [0, b]3with b= 10, on the cell-centred uniform grid with n= 64.

Here and in the following kxk=sd

ℓ=1

ℓdenotes the Euclidean norm of x∈

Rd. Due to spherical symmetry of the function, we consider the same sampling

points x(ℓ)

i=h/2 + (i−1)hfor all space variables. Figure 2.4, left shows stable

exponential convergence of the relative Frobenius, FE and maximum norms with

respect to the Tucker rank up to r= 12. The right hand side of Figure 2.4

shows the orthogonal vectors v(1)

k,k= 1,...,6, for the mode ℓ= 1 (x(1)-axis).

The absolute error for the Tucker approximation with rank r= 14 in the cross-

section S:= [h/2, b −h/2] ×[h/2, b −h/2] ×h/2 is shown in Figure 2.5, where

the maximum value is of the order ∼10−7. Figure 2.9 presents matrix slices

Mβ,νr∈R9×9×1, νr= 1,...9, of the core tensor β∈R9×9×9(see (2.8)), where the

2 Tensor structured (TS) methods for functions in Rd,d≥3

0510

4x 10−7

Figure 2.5: Left: the 3D Newton potential over cross-section S= [h/2, b −h/2] ×

[h/2, b −h/2] ×h/2, right: the absolute approximation error for the

Tucker decomposition with rank r= 14 (max ∼10−7).

6.9e+01 1.0e+01 2.7e+00

8.9e−01 2.9e−01 9.4e−02

2.9e−02 8.4e−03 2.4e−03

Newton potential , b=10, n = 64

Figure 2.6: Coefficients of the core tensor β∈R9×9×9for the Tucker approxima-

tion of the Newton potential. We show the maximum values of |βν|

for every matrix slice Mβ,νr∈R9×9×1,νr= 1,...,9, of β.

2.3 Numerics on BTA of function related tensors in R3

2 4 6 8 10 12 14

10−10

10−5

100

Tucker rank

error

Slater function, b=10, n = 64

EFN

EFE

0 2 4 6 8 10

−0.5

−0.4

−0.3

−0.2

−0.1

0.1

0.2

0.3

0.4

0.5

Orthogonal vectors, r=6

y−axis

Figure 2.7: Convergence history and orthogonal vectors of the Tucker side matrix

V(1) ∈Rn×r1,r1= 6, for the Slater potential.

numbers indicate the maximum values of |βν|at a given slice Mβ,νrof β.

Figure 2.6 shows the decay of the entries of the core tensor βfor the Tucker

decomposition with the rank r= 9 (smaller ranks are chosen for simplicity of

visualization). We observe that for the particular class of function related tensors

A0∈Vn, the core tensor β∈Brturns out to be sparse (up to a certain thresh-

old). We further demonstrate that the sparsity of the representation coefficients

of the target tensor in the space Brwill be even stronger for the physically rele-

vant functions with strong singularities arising in electronic structure calculations.

2. Slater function

We are interested in the approximate low-rank representation of the Slater type

functions which play significant role in electronic structure calculations.

The Slater function given by

g(x) = exp(−αkxk) with x= (x1, x2, x3)T∈R3,

presents the electron “orbital” (α= 1) and the electron density function (α= 2)

corresponding to the Hydrogen atom. We compute the rank-(r, r, r) Tucker

approximation to the function related tensor defined on the same grid as in the

previous section, with b= 10.

Figure 2.7 shows that the Slater function can be efficiently approximated by low

rank Tucker tensors. In fact, Tucker rank r= 14 provides a maximum absolute

error of the approximation of order ∼10−8, see Figure 2.8.

Figure 2.9 presents the matrix slices Mβ,νr∈R9×9×1, νr= 1,...,9, of the core

tensor β∈R9×9×9, where the numbers indicate the maximum values of the core

entries at a given slice Mβ,νr∈R9×9×1of β. Figure shows that the “energy” of

2 Tensor structured (TS) methods for functions in Rd,d≥3

0510

0.2

0.4

0.6

0.8

0510

0.5

1.5

2x 10−8

Figure 2.8: Left: the 3D Slater function in the cross-section S= [0, b −h]×

[0, b−h]×{0}, right: the absolute approximation error for the Tucker

decomposition with rank r= 14 (max ∼10−8), for the same section.

9.2e+00 6.8e−01 1.1e−01

2.4e−02 6.0e−03 1.6e−03

4.7e−04 1.4e−04 4.0e−05

Slater function , b=10, n = 64

Figure 2.9: Coefficients of the core tensor β∈R9×9×9for the Tucker approxima-

tion of the Slater function. We show the maximum values of |βν|for

every matrix slice Mβ,νr∈R9×9×1,νr= 1,...,9, of β.

2.3 Numerics on BTA of function related tensors in R3

2 4 6 8 10 12

10−10

10−5

100

Tucker rank

error

Yukawa potential, b=10, n = 64

EFN

EFE

0 10 20 30 40 50 60

−0.8

−0.6

−0.4

−0.2

0.2

0.4

0.6 Orthogonal vectors, r=6

Yukawa , AR=10, n = 64

y−axis grid points

Figure 2.10: Tucker approximation of the Yukawa potential and an example of

the Tucker orthogonal vectors v(2)

k,k= 1,...,6, (right).

2 4 6 8 10 12

10−8

10−6

10−4

10−2

100

Tucker rank

error

Helmholz potential, b=6.2832, n = 64

EFN

EFE

0 10 20 30 40 50 60

−0.8

−0.6

−0.4

−0.2

0.2

0.4

0.6 Orthogonal vectors, r=6

Helmholz , AR=10, n = 64

x−axis grid points

Figure 2.11: Convergence history (left) and orthogonal vectors v(1)

k,k= 1,...,6,

(right) for the Tucker approximation of the Helmholtz potential.

the decomposed function is concentrated in several upper slices of the core tensor.

Our numerical experiments show that the dominating entries in βare compactly

concentrated in its ”upper left corner”. This feature of the Tucker decomposition

will be employed further in Section 3.

3. Yukawa and Helmholtz Functions

In the next example, we consider a Tucker approximation of the third-order

function-related tensor generated by the Yukawa potential

g(x) = e−kxk

kxkwith x= (x1, x2, x3)T∈R3.

We consider the function related tensor with the same “voxel-centred” collocation

2 Tensor structured (TS) methods for functions in Rd,d≥3

−5

0.05

0.1

0.15

0.2

0.25

Figure 2.12: A slice of a 3D multi-centred Slater potential with 43centers.

points with respect to the n×n×n-grid over [0, b]3with b= 10 as in previous

examples.

Figure 2.10 shows the convergence history and the orthogonal vectors v(2)

k= 1,...,6, of the Tucker decomposition of the Yukawa potential. These vec-

tors represent the problem adaptive orthogonal basis. In almost all cases the

ALS iteration to compute the Tucker approximation converges very fast and it is

terminated at most after 5 iterations.

Fig. 2.11 provides computational results for the Helmholtz function given by

g(x) = cos kxk

kxkwith x= (x1, x2, x3)T∈R3,

indicating robust convergence in the considered applications.

5. Periodic structures of Slater functions

Finally, we analyse the “multi-centered Slater potential“ obtained by displacing

a single Slater function with respect to the m×m×mspatial grid of step-size

H > 0, specifying centers of Slater functions,

g(x) = c

i=1

j=1

k=1

e−α√(x1−iH+m+1

2H)2+(x2−jH+m+1

2H)2+(x3−kH+m+1

2H)2.(2.45)

Figure 2.12 shows the multi-centred Slater potential for m= 4, H= 3, α= 2

and the corresponding approximation error in the cube [−5,5]3on the n×n×n

grid with n= 64, the surface level corresponds to x3= 2. Convergence is shown

in Figure 2.15 (left-top).

2.3 Numerics on BTA of function related tensors in R3

2 4 6 8 10 12

10−15

10−10

10−5

100

Tucker rank

Multicentered Slater funct.with 1000 samp.

EFN, n=129

EFE, n=129

−5

0.2

0.4

0.6

Figure 2.13: Top: rank-convergence of the Tucker approximation for the multi-

centered Slater potential with 103cells ; bottom: a 2D plane section

of the 3rd order tensor representing the multi-centred Slater potential

with 103centers.

2 Tensor structured (TS) methods for functions in Rd,d≥3

2 4 6 8 10 12

10−15

10−10

10−5

100

Tucker rank

Slater function with 4096 nodes

EFN, n=257

EFN, n=129

EFN, n=65

EFE, n=257

EFE, n=129

EFE, n=65

−6 −4 −2 0246

−6

−4

−2

0.5

Figure 2.14: Top: convergence of G BTA applied to the multi-centered Slater po-

tential with 4096 cells with respect to the Tucker rank; bottom: a

2D plane section of the 3rd order tensor corresponding to the multi-

centred Slater potential with 4096 centers.

2.3 Numerics on BTA of function related tensors in R3

Next, we consider function related tensors with a large number of periodic cells.

Figure 2.13 (top) shows the convergence history for the tensor corresponding to

the multicentered Slater function with 103centers. Figure 2.13 (bottom) visualizes

a 2D plane section of this 3rd order tensor. Figures 2.14 show the approximating

error and a 2D plane section of a 3rd order tensor representing the multicentered

Slater function with 4096 cells. Note, that the Tucker approximation using the

grid size n= 257, is computationally feasible only when applying the MGA Tucker

algorithm which we introduce in Section 3.

Investigation of these periodic structures show, that the convergence rate of the

rank-(r, r, r) Tucker approximation practically does not depend on the number of

cells in a considered structure. It is demonstrated that in all cases for the Slater

function (see convergence of the multicentered Slater function in Figures 2.13, 2.14

and the convergence for a single Slater function in Figure 2.7), equal rank param-

eters imply equal accuracy of the Tucker approximation. For example, for the

Tucker rank r= 10, it is exactly 10−5for all versions of the single/multicentered

Slater function. This feature can be valuable, in the grid-based modelling of peri-

odic (or nearly periodic) structures in the density functional theory. It indicates

that the Tucker decomposition can be helpful in constructing of a small number

of problem-adapted basis functions for huge (almost) periodic clusters of atoms.

6. Multicentered Slater function with random perturbations

Results in the case of the Slater potential which entries are perturbed randomly

are given in Figure 2.15. The random constituent equals to 1, 0.1 and 0.01 per-

cents of the maximum amplitude. It is shown that the exponential convergence

in the Tucker rank is observed only until the approximation level of the order of

the random perturbation. Further increase in the Tucker rank does not improve

essentially the approximation.

7. Conclusions to Section 2.3.2

The numerical examples for the function related tensors presented in Section

2.3.2 have led to further development of the Tucker-based algorithms for the

problems in electronic structure calculations. Here we formulate some conclusions

to above numerics.

Remark 2.19 The Tucker approximation error for the considered class of func-

tion related tensors decays exponentially with respect to the Tucker rank.

Remark 2.20 The shape of the orthogonal vectors in the unitary matrices of the

Tucker decomposition for the class of function related tensors is almost indepen-

dent on n.

2 Tensor structured (TS) methods for functions in Rd,d≥3

2 4 6 8 10 12

10−10

10−5

100

Tucker rank

error

Slater−multi potential, AR=10, n = 32

E(FN)

E(FE)

E(C)

1 1.5 2 2.5 3 3.5 4

10−4

10−3

10−2

10−1

100

relative energy−norm

relative energy

Tucker rank

error

Slater−Mult−Rand 1% , AR=10, n = 64

1 2 3 4 5 6

10−6

10−5

10−4

10−3

10−2

10−1

100

relative energy−norm

relative energy

Tucker rank

error

Slater−Mult−Rand 0.1% , AR=10, n = 64

1 2 3 4 5 6

10−8

10−6

10−4

10−2

100

relative energy−norm

relative energy

Tucker rank

error

Slater−Mult−Rand 0.01% , AR=10, n = 64

Figure 2.15: Convergence history for the multi-centered unperturbed (upper left

figure) and randomly perturbed Slater potential.

2.3 Numerics on BTA of function related tensors in R3

Remark 2.21 The entries of the core tensor of the Tucker decomposition for the

considered function related tensors decay fast vs. index kℓ= 1,...,r,ℓ= 1,2,3.

Remark 2.22 For a fixed approximation error, the Tucker rank of periodic struc-

tures practically does not depend on the number of cells included in the computa-

tional box (see also numerics in Section 3.3.2).

Properties of the Tucker decomposition for the function related tensors described

in Remarks 2.20 and 2.21 will be used further in the development of the multigrid

accelerated BTA.

2.3.3 Application to functions in electronic structure

calculations

Here, the approximating properties of the orthogonal Tucker-type representation

(2.10) are studied for the 3D electron densities of some simple molecules. The

ALS iterative scheme is applied to compute the low-rank tri-linear approximations

for electron densities of the H atom, LiH, CH4, C2H6and H2O molecules.

We assume that any particular molecule is embedded in a certain fixed com-

putational box [−b, b]3with a suitable b > 0. Let ω3⊂[−b, b]3be a uniform

n×n×ntensor grid of collocation points introduced in (2.41), and indexed by

I=I1×I2×I3. The molecular orbitals and electron densities of the considered

molecules are represented as

ϕa(x) =

k=1

ckagk(x), a = 1, ..., Norb,(2.46)

ρ(x) := 2

Norb

a=1

ϕ2

a(x) = 2

Norb

a=1 R0

k=1

ckagk(x)!2

, x ∈R3,(2.47)

where Norb is the number of electron pairs in a molecule and R0is the number of

Gaussians in the GTOs basis, of type

gk(x) = µk(x1−Ak)ℓk(x2−Bk)mk(x3−Ck)nkexp(−αkζ2

k) (2.48)

with

ζ2

k= (x1−Ak)2+ (x2−Bk)2+ (x3−Ck)2, x = (x1, x2, x3)T∈R3,

where the parameters for the Gaussians are taken from the standard quantum

chemistry package MOLPRO [108]. For each particular molecule we use the

following physically relevant parameters: b= 10 bohr, R0= 10 for H atom,

2 Tensor structured (TS) methods for functions in Rd,d≥3

5 10 15

10−6

10−4

10−2

100

rank

5 10 15

10−6

10−4

10−2

100

rank

5 10 15

10−6

10−4

10−2

100

rank

EFN

Figure 2.16: Approximation error in EF N and ECnorms versus Tucker rank for

the electron densities of a) CH4, b) C2H6and c) H2O molecules with

n= 65.

b= 7 bohr, R0= 34 for LiH, b= 5 bohr, R0= 55 for CH4,b= 5.8 bohr, R0= 96

for C2H6and b= 10 bohr, R0= 41 for H2O.

We discretize ρ: [−b, b]3→R, by the function related tensor of order 3 by

computing its entries according to relations (2.47), (2.48)

A0≡A0(g) := [ai1i2i3](i1,i2,i3)∈I ∈Rn×n×nwith ai1i2i3:= ρ(x(1)

i1, x(2)

i2, x(3)

i3),

where (x(1)

i1, x(2)

i2, x(3)

i3)∈ω3are the grid collocation points. In this way, we ob-

tain the tensor representation A0of the corresponding density function, which is

approximated by a rank r= (r, r, r) Tucker decomposition for a sequence of rank-

parameters r= 1,2, . . ., r0. The orthogonal matrices V(ℓ)∈Rn×r, ℓ = 1,2,3,

and the corresponding core tensor β∈Rr×r×rare then used for the construction

of the approximating tensor A(r)≈A0. For every rank-(r, r, r) Tucker approxi-

mation, we compute the relative error with respect to the Euclidean norm (2.42),

as well as the relative maximum error (2.44).

The approximation errors shown in Figure 2.16 verify the exponential con-

vergence of the orthogonal Tucker approximation (in the rank-parameter r) of

electron densities reaching the relative accuracy ∼10−5for CH4, H2O and C2H6

with r= 16. It is seen that the orthogonal vectors v(ℓ)

νℓ(νℓ= 1, ..., 4, ℓ= 1,3) of

the tensor-product decomposition for the H atom, LiH and C2H6molecules shown

in Figures 2.17 and 2.18 resemble the shape of the decomposed electron density

along the corresponding spatial axis. Due to orthogonality of the decomposi-

tion, the Tucker model appears to be suitable for constructing a low dimensional

problem-dependent orthogonal basis. The entries of the core tensor presented in

Figure 2.18 are the weights βν1ν2ν3of the corresponding tensor products of the

orthogonal vectors v(1)

ν1⊗v(2)

ν2⊗v(3)

ν3, which compose the summands of A(r)in (2.8)

for νℓ≤6. Figures 2.19 a) and c) visualise the electron density of CH4in a

2.3 Numerics on BTA of function related tensors in R3

−10 −5 0 5 10

−0.5

0.5

atomic units

a) Hydrogen

V(1)1

V(1)2

V(1)3

V(1)4

−5 0 5

−0.8

−0.6

−0.4

−0.2

0.2

0.4

0.6

atomic units

b) LiH

V(1)1

V(1)2

V(1)3

V(1)4

Figure 2.17: Orthogonal vectors v(1)

ν1,ν1= 1, ..., 4, for the rank-10 orthogonal

Tucker approximation of the electron density for a) the H atom and

b) for the LiH molecule.

1.1e+02 1.6e+01 5.3e+00

3.1e+00 1.2e+00 6.0e−01

C2H6, core tensor

−6 −4 −2 0 2 4 6

−0.4

−0.2

0.2

0.4

0.6

C2 H6

atomic units

V(3)

Figure 2.18: Slices Mβ,νr∈R6×6×1, νr= 1,...,6 of the core tensor β∈R6×6×6

and the orthogonal vectors v(3)

ν3,ν3= 1, ..., 6 of the rank-(6,6,6)

Tucker decomposition of the electron density of C2H6molecule.

Numbers correspond to the maxima of |βν|for the corresponding

slice of the core tensor.

2 Tensor structured (TS) methods for functions in Rd,d≥3

−5 05

−5

a) CH4

−5 05

−5

c) C2 H6

−5 05

−5

x 10−4

d) C2 H6, abs.error, r=16

−5 05

−5

x 10−5

b) CH4, abs. error , r=16

Figure 2.19: Electron densities and absolute errors of rank r= 16 Tucker approx-

imation for CH4and C2H6molecules.

plane containing the C atom and of C2H6in a plane containing both C atoms,

correspondingly. Figures 2.19 b) and d) visualise the absolute approximation er-

ror for the electron densities ρof these molecules in the corresponding planes for

r= 16. In spite of large values of ρat the cusp regions (∼60 units) we observe

a rather uniform distribution of the absolute approximation error of the order

∼10−4÷10−5in the computational domain. This is a typical feature of the

orthogonal Tucker decomposition. For H2O with even larger cusp (∼148 units)

at the origin, we see in Figure 2.16 c) that the Tucker approximation of ρfor this

molecule yields the relative accuracy ∼10−5for the Tucker rank r= 16. Finally,

we present the convergence behaviour of the Tucker approximation applied to

the tensor representation of the Hartree potentials of C2H6and H2O molecules,

see Figure 2.20, indicating exponential convergence in the Tucker rank r. The

numerical examples demonstrate efficiency of the low-rank Tucker tensor approx-

imations for the electron densities and the Hartree potential of the considered

molecules. This provides the background for further application of the Tucker

model in low-complexity numerical evaluation of functionals and operators in the

electronic structure calculations, see Sections 3, 4 and 5.

2.4 Tensorisation of basic multilinear algebra (MLA) operations

2 4 6 8 10 12 14 16 18

10−4

10−2

100

Tucker rank

error

b) Hartree Potential, C2 H6

EFN

2 4 6 8 10 12 14 16 18

10−4

10−2

100

Tucker rank

c) Hartree Potential, H2 O

EFN

Figure 2.20: Approximation error in EF N and ECnorms versus Tucker rank for

the Hartree potentials of C2H6(left) and H2O (right).

2.4 Tensorisation of basic multilinear algebra

(MLA) operations

For the sake of clarity (and without loss of generality) in this section we assume

that r=rℓ,n=nℓ(ℓ= 1, ..., d). If there is no confusion, the index ncan

be skipped. We denote by Nthe complexity of various tensor operations (say,

Nh·,·i) or the related storage requirements (say, Nst(β)). We estimate the storage

demands Nst and complexity of the following standard tensor-product operations:

the scalar product, the so-called Hadamard (component-wise) product, and the

convolution transform. We consider the MLA operations in Tr,n, and CR,ntensor

classes.

The Tucker model requires

Nst,T =drn +rd(2.49)

storage to represent a tensor.

The number of parameters in the rank-Rcanonical model scales linearly in R,

Nst,C =dRn. (2.50)

Setting R=αr with α≥1, we can specify the range of parameters where the

Tucker model is less storage consuming compared with the canonical one

rd−1≤d(α−1)n, (for d= 3 : r2≤3(α−1)n).

In general, the numerical Tucker decomposition leads to a fully populated core

tensor, i.e., it is represented by rdnonzero elements. However, the special data

2 Tensor structured (TS) methods for functions in Rd,d≥3

structure of the Tucker core can be imposed which reduces the complexity of the

corresponding tensor operations (cf. [60]). In particular, for the mixed (two-

level) Tucker-canonical decomposition (see Definition 2.8), storage demands scale

linearly in d,

Nst,T C =dr(n+R).

2.4.1 Some bilinear operations in the Tucker format

For given tensors A1∈Tr1, A2∈Tr2represented in the form (2.8), i.e.,

A1=β×1U(1) ×2U(2) ...×dU(d), A2=ζ×1V(1) ×2V(2) ...×dV(d),(2.51)

the scalar product (2.2) is computed by

hA1, A2i:=

k=1

m=1

βk1...kdζm1...md

ℓ=1 Du(ℓ)

kℓ, v(ℓ)

mℓE.(2.52)

In fact, applying the definition of the scalar product in (2.2) to the rank-1 tensors

(with R=r= 1), we have

hA1, A2i:= X

i∈I

u(1)

i1···u(d)

idv(1)

i1···v(d)

i1=1

u(1)

i1v(1)

i1···

id=1

u(d)

idv(d)

id=

ℓ=1 u(ℓ), v(ℓ).(2.53)

Then the above representation follows by combining all rank-1 terms in the left-

hand side in (2.52).

We further simplify and suppose r=r1=r2. Then the calculation in (2.52)

includes dr2scalar products of vectors of size nplus r2dmultiplications, leading

to the overall complexity

Nh·,·i =Odnr2+r2d,

and the same for the respective norm.

Note that in the case of mixed Tucker-canonical decomposition (see Definition

2.8) the scalar product can be computed in O(R2+dr2n+dR2r) operations (cf.

[60], Lemma 2.8).

For given tensors A, B ∈RI, the Hadamard product A⊙B∈RIof two tensors

of the same size Iis defined by the componentwise product,

(A⊙B)i=ai·bi,i∈ I.

2.4 Tensorisation of basic multilinear algebra (MLA) operations

Hence, for A1, A2∈Tr, as in (2.51), we tensorize the Hadamard product by

A1⊙A2:=

k1,m1=1 ···

kd,md=1

βk1...kdζm1...mdu(1)

k1⊙v(1)

m1⊗... ⊗u(d)

kd⊙v(d)

md.

(2.54)

Again, applying definition (2.2) to the rank-1 tensors (with β=ζ= 1), we obtain

(A1⊙A2)i=(u(1)

i1v(1)

i1)···(u(d)

idv(d)

id)

=u(1) ⊙v(1)⊗···⊗u(d)⊙v(d).(2.55)

Then (2.54) follows by summation over all rank-1 terms in A1⊙A2. Relation

(2.54) leads to the storage requirements

Nst(A⊙B)=O(dr2n+r2d),

that includes the memory size for dmodes n×r×rTucker vectors, and for the

new Tucker core of size (r2)d.

The multi-dimensional convolution product is one of the basic integral trans-

forms in the wide range of applications including many-particle models (see [27,

59, 65]). In this dissertation we apply the tensor-product convolution scheme

corresponding to the discretisation by piecewise constant basis functions over

uniform Cartesian grids. Methods to compute the multidimensional convolution

on refined grids, by higher order elements and using the wavelet basis were con-

sidered in [42, 43].

For given tensors F= [fi]∈RI, G = [gi]∈RI, we define their discrete

convolution product by

F∗G:= "X

i∈I

figj−i+1#j∈J

,J:= {1, ..., 2n−1}d,

where j−i+ 1 ∈ I (equivalent to the assumption that Gcan be extended to the

larger index set beyond Iby zeros).

For given A1, A2∈Tr, see (2.51), we now tensorize the convolution product

via

A1∗A2:=

k=1

m=1

βk1...kdζm1...mdu(1)

k1∗u(1)

m1⊗... ⊗u(d)

kd∗v(d)

md.(2.56)

This relation again follows from the analysis for the case of rank-1 convolving

tensors Fand G, similar to the discussion for scalar product of tensors,

(F∗G)j:= X

i∈I

f(1)

i1···f(d)

idg(1)

j1−i1+1 ···g(d)

jd−id+1

i1=1

f(1)

i1g(1)

j1−i1+1 ···

id=1

f(d)

idg(d)

jd−id+1 =

ℓ=1 f(ℓ)∗g(ℓ)jℓ.(2.57)

2 Tensor structured (TS) methods for functions in Rd,d≥3

Assuming that ”one-dimensional” convolutions of n-vectors, u(ℓ)

kℓ∗v(ℓ)

mℓ∈R2n−1,

can be computed in O(nlog n) operations, we arrive at the overall complexity

estimate

N·∗· =O(dr2nlog n+r2d).

In our particular case of equidistant grids we obtain (by setting a=u(ℓ)

kℓ, b =

v(ℓ)

mℓ∈Rn)

(u(ℓ)

kℓ∗v(ℓ)

mℓ)j=

i=1

aibj−i+1, j = 1, ..., 2n−1.

Hence, the 1D convolution can be performed by FFT in O(nlog n) operations.

We notice that the convolution product appears to be one of the most com-

putationally elaborate operations since in general one might have r2dterms in

(2.56). Significant complexity reduction is already observed if at least one of the

convolving tensors can be represented in the rank-Rcanonical format, so that we

have only rdRterms in the sum (2.56).

2.4.2 Summary on MLA operations in rank-Rcanonical format

In our numerical scheme we apply the following linear operations with dth order

tensors:

1. summation of tensors;

2. scalar product of tensors;

3. Hadamard product of tensors;

4. convolution of tensors.

We consider tensors A1,A2, represented in the rank-Rcanonical format, (2.6),

A1=

k=1

cku(1)

k⊗...⊗u(d)

k, A2=

m=1

bmv(1)

m⊗...⊗v(d)

m,(2.58)

with normalized vectors u(ℓ)

k, v(ℓ)

m∈Rnℓ. For simplicity of discussion, we assume

nℓ=n,ℓ= 1, ..., d.

1. A sum of two canonical tensors given by (2.58) can be written as

A1+A2=

k=1

cku(1)

k⊗...⊗u(d)

m=1

bmv(1)

m⊗...⊗v(d)

m,(2.59)

resulting in the canonical tensor with the rank at most RS=R1+R2. This

operation has no cost since it is simply a concatenation of two tensors.

2.4 Tensorisation of basic multilinear algebra (MLA) operations

2. For given canonical tensors A1,A2, the scalar product (2.2) is computed by

(see (2.53))

hA1, A2i:=

k=1

m=1

ckbm

ℓ=1 Du(ℓ)

k, v(ℓ)

mE.(2.60)

Calculation of (2.60) includes R1R2scalar products of vectors in Rn, leading to

the overall complexity

Nh·,·i =O(dnR1R2).

3. For A1, A2given by (2.58), we tensorize the Hadamard product by (see

(2.55))

A1⊙A2:=

k=1

m=1

ckbmu(1)

k⊙v(1)

m⊗...⊗u(d)

k⊙v(d)

m.(2.61)

This leads to the complexity O(dnR1R2).

4. The convolution product of two tensors in the canonical format (2.58), is

given by (see (2.57))

A1∗A2:=

k=1

m=1

ckbmu(1)

k∗v(1)

m⊗...⊗u(d)

k∗v(d)

m,(2.62)

leading to the asymptotic complexity O(dn log nR1R2).

2 Tensor structured (TS) methods for functions in Rd,d≥3

3 Multigrid Tucker approximation

of function related tensors

3.1 Motivations

In the previous section we have discussed the general BTA algorithms, which

provide complexity of the Tucker tensor approximation of the order

WF2T=O(nd+1) (3.1)

for full format tensors and

WC2T=O(Rn min{R, n}+rd−1nmin{rd−1, n}),(3.2)

for the canonical rank-Rinput tensors. These bounds restrict application of the

general BTA scheme to small dimensions dand moderate grid size n.

Notice, that the general BTA algorithm is computationally not feasible in elec-

tronic structure calculations on large n×n×n3D Cartesian grids, since:

1. For the full format tensors, O(n4) complexity of HOSVD in the general BTA

algorithm restricts the size of the input tensors (maximum size n≈128,

for conventional computers). In this case, our goal is to reach linear in the

volume complexity O(n3), allowing maximum size of the input tensors up

to ≈5123, for conventional computers.

2. In the case of rank-Rinput data, in our applications, both the canonical

rank Rand the univariate grid size nmight be at least of the order 104.

(Fine grids are necessary to resolve strong cusps due to core electrons in

the Hartree potential.) It means that one has to compute the SVD of side

matrices of the sizes ≈104×104, with complexity of order 1012 which is

computationally unfeasible.

To avoid the above limitations, we introduce the multigrid accelerated (MGA)

Tucker decomposition [68], which is based on the successive reiteration of the ALS

Tucker approximation on a sequence of refined grids, using the results of the coarse

3 Multigrid Tucker approximation of function related tensors

grid approximation as the initial guess for the dominating subspaces on finer grid

levels. The idea of the multilevel acceleration originates from investigation of

the numerical examples of BTA for the function related tensors, described in

Section 2. These include Remarks 2.21 and 2.20 on the decay of the Tucker core

entries and the weak dependence of the shape of the orthogonal vectors on the

discretization parameter n, respectively.

We show that in the case of rank-Rcanonical target tensors in Rn×n×n, the

MGA method provides linear complexity (up to low order terms) of the canonical-

to-Tucker decomposition with respect to all input parameters: the maximal uni-

variate grid size n, the canonical rank Rand the Tucker rank r,

WC2T=O(rRn).

The resulting complexity of the decomposition for full format tensors is

WF2T=O(n3),

which currently makes possible the application of multigrid accelerated F2T algo-

rithm to the 3D function-related tensors with the maximal grid size n×n×n=

5123. In fact, applications are only bounded by the storage size for the input ten-

sor. These grids provide the accuracy level required for the consistent electronic

structure calculations in pseudopotential cases.

3.2 Multigrid accelerated BTA of canonical tensors

3.2.1 Basic concept

The concept of the MGA Tucker approximation applies to the multi-dimensional

data obtained as a discretization of physically relevant continuous multivariate

functions on a sequence of refined spatial grids. The typical application areas

include the tensor approximation of multi-dimensional operators and functionals,

the solution of integral-differential equations in Rd, data-structured representation

of physically relevant quantities [14, 52].

For a fixed grid parameter n, let us introduce the equidistant tensor grid

ωd,n := ω1×ω2···×ωd,(3.3)

where ωℓ:= {−b+ (k−1)h:k= 1, ..., n + 1}(ℓ= 1, ..., d) with mesh-size

h= 2b/n. Define a set of collocation points {xi}in Ω := [−b, b]d∈Rd, located

at the midpoints of the grid-cells, and numbered by i∈ I := {1, ..., n}d(see the

explicit definition in (2.41)). For fixed n, the target tensor An= [an,i]∈RIis

3.2 Multigrid accelerated BTA of canonical tensors

defined by sampling the given continuous multivariate function f: Ω →Ron the

set of collocation points {xi}as follows

an,i=f(xi),i∈ I.

The idea of the multigrid accelerated best orthogonal Tucker approximation is

based on the following principles (topics 3-4 below apply to CR,ninitial data):

1. General multigrid concept. Solving a sequence of nonlinear approximation

problems for A=Anas in (2.12) with n=nm:= n02m, m = 0,1, ..., M,

corresponding to a sequence of (d-adic) refined spatial grids ωd,nm. The

sequence of approximation problems is treated successively in one run from

coarse-to-fine grid.

2. Coarse initial approximation to the side-matrices V(q),q= 1, ..., d.Specifi-

cally, the initial approximation of the q-mode orthogonal side-matrices V(q)

on finer grid ωd,nmis obtained by the linear interpolation of the correspond-

ing orthogonal vectors from the coarser grid ωd,nm−1.

3. Most important fibers. (Applies to CR,ninitial data.) We employ the

idea of “most important fibers” (MIFs) of the q-mode unfolding matrices

B(q)∈Rn×rq(see (2.26) in Step 2 of basic Algorithm G BTA, Section 2.2.2),

whose positions are extracted from data on the coarser grids. To identify

the location of MIFs we apply the so-called maximal energy principle as

follows. On the coarse grid, we calculate a projection of the q-mode un-

folding matrix B(q)onto the true q-mode subspace span{v(q)

1, ..., v(q)

rq}with

V(q)= [v(q)

1...v(q)

rq], which is computed as the matrix product,

β(q)=V(q)TB(q)∈Rrq×rq.(3.4)

Now the maximal energy principle specifies the location of MIFs by finding

pr columns in β(q)with maximal Euclidean norms (supposing that pr ≪rq),

see Figures 3.1 and 3.2. The positions of MIFs are numbered by the index

set Jq,p with #Jq,p =pr, being the subset of the larger index set

Jq,p ⊂Jrq:= {1, ..., rq},#Jrq=rq=O(rd−1).

This strategy allows to predict a fixed portion of q-mode fibers in the Tucker

core on fine grids, which accumulate the maximum part of the Frobenius

norm. The union of selected fibers from every mode q(specified by the

index set Jq,p,q= 1, ..., d) accumulates the important information on the

structure of the rank-Rtensor in the space of Tucker coefficients Rr1×···×rd.

This information enormously reduces the amount of computational work on

the fine grids.

3 Multigrid Tucker approximation of function related tensors

+ ... +

n1n n n

1 1 1

r r2

r33 r33

rr22

B(q)

find MIFs

βcore (q)

+ ... +

Figure 3.1: Illustration for d= 3. Finding MIFs in the “preliminary core” β(q)

for q= 1 for the rank-Rinitial data on the coarse grid n=n0=

(n1, n2, n3). B(q)is presented in a tensor form for explanatory reasons.

2 4 6 8 10 12 14

Figure 3.2: MIFs: selected positions of the fibers of the preliminary “cores” for

computing V(1)(left), V(2) (middle) and V(3) (right). The example is

taken from the multigrid rank compression in the computation of the

Hartree potential for H2O, r= 14, p= 4.

4. Performing restricted ALS iteration. The proposed choice of MIFs allows

to reduce the cost of ALS iteration to solving the problem of best rank-r

approximation to the large unfolding matrix B(q)∈Rn×rqwith dominating

second dimension rq=rd−1(always the case for large d). In fact, we reduce

one step of ALS iteration to computation of the r-dimensional dominating

subspace of small n×pr submatrices B(q,p)of B(q)(q= 1, ..., d), where

p=O(1) is some fixed parameter (SVD with matrix-size n×pr instead of

n×rq).

The invention of above principles leads to dramatic complexity reduction of the

standard tensor algorithms G BTA(Vn→TBr) and C BTA(CR,n→TCR,r). In the

latter case, this approach leads to the efficient tensor approximation method with

linear scaling in nand R, up to the computational cost on the coarsest level. In

the case of fully populated tensors we arrive, at least, at the linear cost O(nd),

corresponding to the storage space for the initial tensor.

3.2 Multigrid accelerated BTA of canonical tensors

2 4 6 8 10 12 14 16 18 20

Figure 3.3: MIFs: selected projections of the fibers of the preliminary “cores” for

computing V(1)(left), V(2) (middle) and V(3) (right). The example

from the MGA rank reduction in computation of VH, in pseudopoten-

tial case of CH4with r= 22, and number of MIFs p= 8.

3.2.2 Description of the Algorithm and complexity bound

For further constructions, we use the 1D interpolation operator I(ℓ)

(m−1,m)from the

coarse to fine grids acting in spatial direction ℓ= 1, ..., d. This might be either the

interpolation by piecewise linear or cubic splines (the latter is our choice in the

current implementation). In the following we focus on the case of rank-Rinput.

The proposed algorithm of the MGA best Tucker approximation for A∈CR,n

can be outlined as follows:

Algorithm MG C BTA (CR,nM→TCR,r). (Multigrid accelerated canonical-

to-Tucker approximation).

1. Given Am∈CR,nmin the form (2.6), corresponding to a sequence of grid

parameters nm:= n02m, m = 0,1, ..., M. Fix a structural constant p=O(1)

(i.e. pr ≪rd−1), iteration parameter kmax, and the Tucker rank r.

2. For m= 0, solve C BTA(CR,n0→ TCR,r) and compute the index set Jq,p(n0)⊂

Jrqvia identification of MIFs in the matrix unfolding B(q),q= 1, ..., d, us-

ing the maximum energy principle applied to the q-mode unfolding of the

Tucker core β(q)as in (3.4).

3. For m= 1, ..., M perform the cascadic MGA nonlinear Tucker approxima-

tion by the restricted ALS iteration:

3a) Compute the initial guess for side matrices on level mby interpolation

I(m−1,m)of the side matrix from level m−1 (by using piecewise linear or

cubic splines)

V(q)=V(q)

m=I(m−1,m)(V(q)

m−1), q = 1, ..., d.

3b) For each q= 1, ..., d, fix V(ℓ)(ℓ= 1, ..., d, ℓ 6=q) and perform:

3 Multigrid Tucker approximation of function related tensors

U(l) (l)

m=0,..., M, l=1,...,d

HOSVD : 00

Given β ,U(l)

m, for,p>0

find pMIFs in B,specify

, l=1,...,d

compute projections

ALS for m=1,...,M (l)

m−1

(l)

using

its canonical rank

compute µMand reduce

q=1,..., d

(q)

Bµ0

using matrix unfolding of

compute dominating subspaces of

q,p

(q)

(l)

)

(l)

and the core tensor µ 0

Jq,p

compute reduced unfoldings B

B(q,p) m

(l)

(q,p)

interpolate V V

Figure 3.4: Flow chart of Algorithm MG C BTA applied to the rank-Rtarget.

→Compute matrix products V(ℓ)TU(ℓ),ℓ= 1, ..., d;ℓ6=q, and construct

the ”restricted” q-mode matrix unfolding B(q,p)

B(q,p)=B(q)|Jq,p(n0)∈Rnm×pr,

by calculating pr columns in the complete unfolding matrix B(q)∈Rnm×rq.

→Update the orthogonal matrix V(q)=V(q)

m∈Rnm×rvia computing the

r-dimensional dominating subspace for the ”restricted” matrix unfolding

B(q,p)(truncated SVD of nm×pr matrix).

4. Compute the rank-Rcore tensor β∈CR,r, as in Step 3 of basic algorithm

CBTA (CR,n→ TCR,r).

Figure 3.4 shows the flow chart of Algorithm MG C BTA.

The next statement proves the linear complexity of Algorithm MG C BTA.

Theorem 3.1 Algorithm MG C BTA(CR,nM→TCR,r) amounts to

O(dRrnM+dp2r2nM)

operations per ALS loop, plus extra cost of the coarse mesh solver C BTA (CR,n0→TCR,r).

It requires O(drnM+drR)storage to represent the result.

3.2 Multigrid accelerated BTA of canonical tensors

20 40 60 80 100

10−10

10−8

10−6

10−4

10−2

100

n=128

n=256

n=512

n=1024

n=2048

Figure 3.5: Singular values of the mode-1 matrix unfolding B(1,p),p= 4.

Proof: Step 3a) requires O(drnM) operations and memory. Assume without loss

of generality that pr ≤nM, hence the complexity of Step 3b) on the finest level

is bounded by O(dRrnM+dprnM+dp2r2nM) per iteration loop. The rank-R

representation of β∈CR,rrequires O(drRnM) operations and O(drR)-storage.

Summing up these costs over the levels m= 1, ..., M, proves the result since the

relation nM(1 + 1/2 + ...1/2M−1)<2nM.

Figure 3.5 shows the singular values of the mode-1 matrix unfolding B(1,p)

with the choice p= 4, which demonstrates the reliability of the maximal energy

principle in the error control. Similar fast decay of respective singular values

is typical in most of our numerical examples in electronic structure calculations

considered so far. Remarkably, that the “representative subset” of fibers Jq,p

normally has the size pr of several r-s with p≪r. The (controllable) decay

of singular values of the small-size restricted unfolding matrix B(q,p)provides a

criterion for the satisfactory choice of parameter p. If the decay is not fast enough

the algorithm can be restarted with the larger parameter p→p+ 1.

Figure 3.6 demonstrates the linear scaling of the MGA Tucker approximation

in the input rank R, and in the grid size n, applied to C2T rank reduction of the

electron density of CH4. Further numerical illustrations to the above algorithm

will be presented in Section 3.2.3.

3 Multigrid Tucker approximation of function related tensors

200 400 600 800 1000 1200

rank

seconds

n=2048

n=1024

n=512

n=256

Figure 3.6: Linear scaling in Rand n, of the C2T rank reduction algorithm.

Notice that the complexity and error of the MGA Tucker approximation can

be effectively controlled by the adaptive choice of the governing parameters p,r

and n0. Figure 3.7 shows the dependence of computational accuracy of Algorithm

MG C BTA on the choice of the number of important fiber pr and the Tucker

rank r, in the case of Hartree potential of C2H6.

3.2.3 Numerics on rank reduction of the electron density ρ

The MG C BTA algorithm has been evoked by the problem of fast computation of

the Hartree potential VHand the respective Coulomb matrix in the Fock operator,

using multilinear algebra in the tensor-structured formats, see Section 4.

In the evaluation of the Hartree potential for pseudodensities of some simple

molecules good accuracy of order 10−6hartree in the max-norm is achieved already

by computation on moderate n×n×ngrids with n= 400 and n= 800. In this

case the unigrid C2T algorithm described in Section 2.2.3 is sufficient.

However, when computing the all electron densities of molecules, which contain

strong cusps due to core electrons contribution (see the example of all electron

density for the water molecule presented in Figure 3.8), large grid parameters nof

the order of several thousands are required to ensure the high resolution of local

singularities.

In [19], where the wavelet basis set is used for approximation of the electronic

3.2 Multigrid accelerated BTA of canonical tensors

−5 0 5 10

x 10−3

atomic units

abs. error for VH ,C2 H6 , n=8192

r=18, p=5 (times:3.7,4.3)

r=16, p=3 (2.3,3.4)

r=14, p=2 (1.8;2.8)

Figure 3.7: Absolute error in the Hartree potential of C2H6vs. different multigrid

parameters rand p, visualised in the grid interval centered at (0,0,0).

Figure 3.8: Electron density of H2O in the cross-section Ω = [−4,4]×[−4,4]×{0}.

3 Multigrid Tucker approximation of function related tensors

1000 2000 3000 4000 5000 6000 7000 8000

100

X: 8192

Y: 3.45

univariate grid size

minutes

C2 H6, HP, rT =22

unigrid C2T

MG C2T

Figure 3.9: Multigrid vs. single grid MATLAB times for the canonical-to-Tucker

transform in computations of the Hartree potential of C2H6molecule,

r= 22, p= 4.

structure quantities, it is shown that the Hartree potential of CH4and C2H6

molecules can be resolved in the cusp area with accuracies of the order of 10−3,

by computations with the univariate grid size n= 5.1·103in the volume of [−b, b]3

with b= 20 au. Hence, the univariate mesh size of 3D Cartesian grids should be

not larger than h= 2b/n ≤8·10−3au.

In our tensor-structured computations, we apply uniform grids with the mesh-

size up to h= 1.3·10−3au, corresponding to the univariate grid size n= 16384

with b= 10.6 au. Large grids are required for the representation of functions with

multiple strong singularities discretized on large 3D uniform grids. The examples

of cusps of electron densities of some molecules are presented in Figures 3.8 for

the water molecule, and in Figure 2.19 for CH4and C2H6molecules.

The algorithm for computation of the Hartree potential by the tensor-product

convolution will be discussed in Section 4.2, where we demonstrate numerical re-

sults on the accuracy and computational cost. Here we present the comparison

of the MATLAB times for the unigrid and multigrid rank reduction algorithms

which are used in the accurate and fast computation of the Hartree potential on

large 3D Cartesian grids. Figures 3.9 and 3.10 show the comparison of MATLAB

times of the unigrid C2T versus MGA C2T algorithms applied to calculations for

C2H6and CH4molecules, respectively. Figures 3.9 and 3.10 show the compari-

son of MATLAB times of the unigrid C2T versus MGA C2T algorithms applied

to C2H6and CH4molecules, respectively.

3.2 Multigrid accelerated BTA of canonical tensors

2000 4000 6000 8000

grid size

minutes

MG C2T

Single grid C2T

Figure 3.10: Multigrid vs. single grid CPU times for the canonical-to-Tucker

transform in computations of the Hartree potential of CH4molecule.

n312822563512310243204834096381923

p= 4 C2T 10 12 19 41 52 95 188

r= 20 CONV 0.2 0.8 1.5 10.6 22 58 160

p= 16 C2T 34 44 67 118 200 388 770

r= 28 CONV 0.5 1.8 3.2 15.6 35 108.2 352

Table 3.1: Elapsed times (in sec) for the C2T rank reduction of ρand the tensor-

product convolution (CONV) in the computation of VHfor the C2H6

molecule.

Tables 3.1 and 3.2 give computation times for the C2T rank reduction and of

the convolution with the Newton kernel for electron densities of H2O and C2H6

molecules. The initial canonical ranks Rρ0of the tensors are 861 and 4656, for

H2O and C2H6, respectively. The C2T algorithm yields the ranks Rρ≈r p.

Computation times depend on the level of the thresholds εin the algorithms C2T

and T2C. For example, we observe accuracies of the order 10−4for parameters

p= 2, r= 16, and of the order 10−5for parameters p= 8, r= 20 in computations

for H2O molecule. For C2H6molecule, p= 4, r= 20 yield accuracies of the order

10−3, and p= 16, r= 28, the accuracies of the order 10−3. We observe, that

for moderate accuracies the times for the C2T and T2C rank reduction and the

3 Multigrid Tucker approximation of function related tensors

n32563512310243204834096381923163843

p= 2 C2T 4 4.5 5.5 5.7 12.2 23.7 66

r= 16 CONV 0.6 1.1 5.5 12 38 127 255

p= 8 C2T 6.1 8 12 18 39 75 129

r= 20 CONV 1.8 3.2 15.6 34 140 378 784

Table 3.2: Elapsed times (in sec) for the C2T rank reduction of ρand the

tensor-product convolution (CONV) in the computation of VHfor H2O

molecule.

corresponding convolution times are reduced dramatically.

3.3 Multigrid accelerated BTA for the full format

function related tensors

In the case of full format tensors the main principles of the multigrid concept

are based on topics 1, 2 and 4 from §3.2.1. Here we describe the algorithm of

the MGA Tucker approximation of the function related tensors discretized on a

sequence of grids specified as in Section 3.2.

Algorithm MG G BTA (Vn→Tr). (MGA full-to-Tucker approximation).

1. Given Am∈Vnin the form (2.1), corresponding to a sequence of grid

parameters nm:= n02m, m = 0,1, ..., M. Fix the Tucker rank r, and the

iteration number kmax.

2. For m= 0, solve the approximation problem by Algorithm G BTA(Vn0

→Tr) via kmax steps of ALS iteration.

3. For m= 1, ..., M, perform the cascadic MGA Tucker approximation:

3a) Compute the initial guess for the side matrices on level mby interpola-

tion I(m−1,m)from level m−1 (using piecewise linear or cubic splines)

V(ℓ)=V(ℓ)

m=I(m−1,m)(V(ℓ)

m−1), ℓ = 1, ..., d.

3b) Starting with the initial guess V(ℓ)(ℓ= 1, ..., d) perform kmax steps of

the ALS iteration as in Step 2 of Algorithm G BTA (see §2).

4. Compute the core βby the orthogonal projection of Aonto Tn=⊗d

ℓ=1Tℓ

with Tℓ= span{v(ℓ)

ν}rℓ

ν=1 (see Remark 2.7),

β=A×1V(1)T×2... ×dV(d)T∈Br,

3.3 Multigrid accelerated BTA for the full format function related tensors

at the cost O(rdn).

The complexity of the MGA Tucker approximation by the ALS algorithm applied

to full format tensors is given in the following Lemma.

Lemma 3.2 Suppose that r2≤nmfor large m, then the numerical cost of Algo-

rithm MG G BTA is estimated by

WF→T=O(n3

Mr+n2

Mr2+nMr4+n4

0) = O(n3

Mr+n4

0).

Proof: In Step 2, the HOSVD on the coarsest grid level requires O(n4

0) operations

(which for large n=nmis negligible compared to other costs in the algorithm).

Next, for fixed n=nmthe assumption r2≤nimplies that at every step of the

ALS iterations the costs of the consequent contractions to compute the n×r2un-

folding matrix B(q)is estimated by O(n3r+n2r2), while the SVD of B(q), requires

O(nr4) operations. Summing up over the levels completes the proof, taking into

account that the Tucker core is computed in O(n3

Mr) operations.

3.3.1 Numerics on the MGA Tucker approximation (ρ1/3)

Figure 3.11 shows the numerical example of the MGA Tucker approximation to

fully populated tensors given by the 3D Slater function e−kxk, (x∈[−b, b]3,b=

5.0), sampled over large n×n×nuniform grids with n= 64,128,256 and 512.

The MATLAB times (min) of the MGA Tucker decomposition corresponding to

data in Figure 3.11 are shown in Figure 3.12. It is worth to note that the times for

the grid size 2563outperform essentially the existing benchmark high performance

algorithm based on the Newton-type method on the Grassman manifold presented

in [92] (maximal grid size there is less than 2563). Another efficient approach

for the Tucker decomposition based on the cross-approximation algorithm was

recently presented in [86].

The results on the Tucker approximation for the irrational function of electron

density ρ,ρ1/3, discretized over the tensor grids in [−b, b]3,b= 5.0, with n≤512,

are shown in Figure 3.13 (corresponding to CH4molecule). Notice that the low-

rank tensor representation of the “exchange” potential ρ1/3, is an important issue

in the density functional theory computations. As it was mentioned before, the

convergence upon the Tucker rank depends on physical “relevance” (regularity

properties) of the function. Our multigrid accelerated scheme requires SVD of

complexity O(nr2min{n, r2}), where r≪n, instead of O(n4) in the unigrid

approach. As a result, the admissible grid size for applicability of the MGA

Tucker decomposition to full-format tensors is limited only by the amount of

available computer memory for tensor representation, O(n3).

3 Multigrid Tucker approximation of function related tensors

2 4 6 8 10 12

10−14

10−12

10−10

10−8

10−6

10−4

10−2

100

Tucker rank

Slater function, MGA Tucker, EFN (solid), EEN (dashed)

n=512

n=256

n=128

n=64

n=512

n=256

n=128

n=64

Figure 3.11: Convergence of the MGA Tucker approximation with respect to the

Tucker rank rapplied to the discretized Slater function (relative

Frobenius norm).

2 4 6 8 10 12

0.5

1.5

2.5

3.5

Tucker rank

minutes

Times for MGA Tucker, Slater function

n=512

n=256

n=128

Figure 3.12: MATLAB times for the MGA Tucker approximation for the dis-

cretized Slater function.

3.3 Multigrid accelerated BTA for the full format function related tensors

14 16 18 20 22

10−6

10−5

10−4

10−3

10−2

Tucker rank

relative error, ρ1/3 ,pseudodensity of CH4, kmax=4

n=128

n=256

n=512

Figure 3.13: Relative approximation error (Frobenius norm) of the MGA Tucker

approximation applied to the discretized “exchange” potential ρ1/3.

3.3.2 BTA of the electron density of Aluminium clusters

In our next example, we consider the Tucker approximation of the electron den-

sity of the Aluminium molecular clusters originating from the large scale finite

element (FE) calculations in the OFDFT method [32, 33]. By using the multigrid

accelerated Tucker decomposition procedure, the appropriate mapping subspaces

are computed for the data from the FE simulations interpolated to the uniform

3D Cartesian grid.

FE computations have been performed for two configuration of the Aluminium

lattice, further called cluster14 and cluster172, containing 14 and 172 atoms, re-

spectively. Figure 3.14 visualises the middle-point plane cross-section of electron

densities for these two configurations.

First, we interpolated (by cubic splines) the electron density from the initial

FE unstructured grid onto an nm×nm×nmCartesian grid for a sequence of

mesh parameters nm, thus obtaining 3D tensors Anm,m= 0,1, ..., M. Since

the accuracy level of the finite element discretization for the considered examples

was resolved on a Cartesian grid of size n3

M= 2003, there was no need for finer

interpolation grids. Then the tensor AMis approximated by the rank-(r, r, r)

Tucker model applying Algorithm MG G BTA to full-format tensors. We check

the Tucker approximation error versus the rank-parameter r, increasing from

r= 1,2,...to some predefined value.

Figure 3.15 shows the first 6 orthogonal vectors of the matrix V(1), for cluster14

(left) and cluster172 (right), respectively. We can observe that the shape of the

3 Multigrid Tucker approximation of function related tensors

−10 −5 0510

−10

−5

0.02

0.04

0.06

cluster1/sdv4 , initial ρ for Al, z=0

−10

−15

−10

−5

0.05

Al, ρ , cluster1/subdiv4, z=0

Figure 3.14: a) Pseudo-density of Aluminium cluster on the regular n⊗3-grid for

n= 200: cluster14 (left), cluster172 (right).

−8 −6 −4 −2 0 2 4 6 8

−0.4

−0.3

−0.2

−0.1

0.1

0.2

0.3

0.4

u1(1)

u2(1)

u3(1)

u4(1)

u5(1)

u6(1)

−15 −10 −5 0 5 10 15

−0.3

−0.2

−0.1

0.1

0.2

0.3

0.4 orthogonal vectors, ρ for Al, cluster3,subdiv3

Figure 3.15: a) Orthogonal vectors of matrices V(1) for cluster14 (left) and

cluster172 (right), (r= 6).

Tucker orthogonal basis functions “resolves“ the complicated shape of the con-

sidered lattice structure, providing good approximation properties already with

moderate Tucker ranks, as seen in the next Figure 3.16. This feature can be use-

ful for the construction of the small-size set of problem-adapted orthogonal basis

functions.

Figure 3.16 presents the absolute error in the Tucker approximation with the

rank r= 10 for the cluster14 (left) and for the cluster172 (right). The maximum

absolute approximation errors for both clusters are bounded as ≤10−4, which is

in the range of the error of the employed FE approximation, see [14]. We observe

convergence in the Frobenius norms EF N with respect to the Tucker rank rover

Cartesian grids of size 1013, 2013for cluster14 (left), and 913, 1813for cluster172

3.3 Multigrid accelerated BTA for the full format function related tensors

Figure 3.16: a) The absolute error of the rank -10 Tucker approximation to the

Aluminium cluster14 with n= 201 (left) in the cross-section S=

[−10,10] ×[−10,10] × {0}, and for cluster172 with n= 181 in the

cross-section S= [−18,18] ×[−18,18] ×{0}(right).

(right), correspondingly.

Figures in 3.17 demonstrate that even the Tucker rank r= 6 is sufficient to

represent the data from FE simulations with the absolute error up to 10−4÷10−5

(corresponding to relative accuracy 10−2÷10−3, see blue lines in Figure 3.17). For

the Tucker rank r= 8 and n= 200, the storage requirements are O(r3+ 3rn)∼

5300, in contrast to approximately 20MB of the original data.

We observe that the original tensor can be represented in the Tucker or canon-

ical tensor format using a relatively tiny number of coefficients. For example,

the original 3D tensor corresponding to the electron density of the Aluminium

cluster172 can be approximated (up to relative accuracy 10−3) either by r3

T= 83

basis functions constructed using the combinations of the orthogonal vectors in

Rnfrom the respective Tucker mapping, or by R≤r2

Tbasis functions using the

canonical vectors in Rn. Note, that the Tucker rank rto achieve the resolution

of FE scheme remains almost independent of the increasing number of atoms in

a cluster, as we already observed in the numerics on (almost) periodic structures

in Section 2.3.2. It indicates that the numerical complexity for solving larger

problems would increase only linearly in the univariate grid size n.

3 Multigrid Tucker approximation of function related tensors

2 4 6 8 10 12 14 16 18 20

10−7

10−6

10−5

10−4

10−3

10−2

10−1

100

Tucker rank

Al., cluster1 subdiv4, AR=20.0, n=51,101,201

2 4 6 8 10 12 14 16 18 20

10−6

10−5

10−4

10−3

10−2

10−1

100

Tucker rank

Aluminium el. dens., cluster3/subdiv4

EFN n=181

EFN n=91

EFN n=46

EEN n=181

EEN n=91

EEN n=46

Figure 3.17: Convergence of the Tucker tensor approximation for cluster14 (top)

and cluster172 (bottom).

4 TS computation of the Coulomb

and exchange Galerkin matrices

4.1 General remarks

In this section, we consider a novel approach for the numerical evaluation of the

Hartree and exchange integral operators in the Hartree-Fock equation. It is based

on the tensor-structured representation of the involved functions and operators on

3DCartesian grids and the agglomerated computations of the Hartree potential

and the Coulomb and exchange matrices using the tensor-structured operations

described in Section 2.4.2. This provides linear complexity scaling with respect

to the univariate grid size n.

Reduction of the computational complexity in electronic structure calculations

based on low-rank separable representations of functions is a usual practice in

quantum chemistry. But, the low rank concept is applied in the choice of well-

separable basis functions enabling the analytical evaluation of the one- and two-

electron integrals corresponding to the Hartree and the exchange operators. Usu-

ally, Gaussian-type orbital (GTO) basis sets are used for the evaluation of the

involved integrals [3, 53, 25, 20].

Our grid-based approach for the TS computations of functions and operators

in the Hartree-Fock equation is essentially different. We discretize the basis func-

tions on the n×n×nCartesian grid and perform integrations with the ag-

glomerated densities/orbitals in tensor-product format using efficient multilinear

algebra on low-rank tensors. Therefore, our main goal is to construct the numer-

ical algorithms which enable computations on fine enough spatial grids to fulfill

the accuracy requirements, and to introduce stable rank reduction schemes, also

conserving the required high accuracy.

For reducing the initial or intermediate ranks of the canonical tensors, we use

the combination of the MGA C2T with the consequent T2C transforms, also

scaling linearly in all input parameters. As shown in Section 2.3.3 on the examples

of the low-rank orthogonal Tucker tensor approximations to electron densities and

Hartree potentials of some molecules, exponential convergence with respect to the

Tucker rank is observed.

4 TS computation of the Coulomb and exchange Galerkin matrices

This enables an efficient tensor-product convolution for computation of the

Hartree potential using a collocation-type approximation via piecewise constant

basis functions on a uniform n×n×ngrid. In a similar way, the tensor-product

convolution is applied to the Hartree-Fock exchange. As it was already mentioned,

the complexity of the tensor product convolution introduced in [61] is of order

WC∗C=O(R1R2nlog n),

where R1and R2are the canonical ranks of 3D tensors representing the convolving

kernel and density, respectively, and nis the one-dimension grid size. Note that

the tensor-product convolution outperforms significantly the 3D FFT of complex-

ity O(n3log n), that is linear-logarithmic with respect to the volume size of the

3D tensor (see Table 4.1).

It should be noted, that our algorithms for the computation of the Coulomb

and the exchange matrices work as “black-box” schemes, since in the convolution

integrals (1.8) and (1.10) only a pair of 3D tensors is involved - the grid-based

electron density/molecular orbitals and the Newton kernel. Therefore, there are

no restrictions on grid-based basis functions which can be used for the grid rep-

resentation of the electron density and orbitals.

Notice, that due to the grid-based concept, our approach is very flexible con-

cerning the character of the chosen Galerkin basis set, see properties (A)–(D) in

Section 5.1.3. In the present computations, we use the GTO basis sets, mostly,

due to possibility of easy evaluation of the computational accuracy for all impor-

tant entities (comparison with MOLPRO), beginning from the calculation of the

Hartree and exchange potentials, and up to energy estimates from the solution of

the Hartree-Fock equation in the tensor-structured format.

4.2 Accurate evaluation of the Hartree potential by

the tensor-product convolution in R3

We consider the computation of the Hartree potential

VH(x) := ZR3

ρ(y)

kx−ykdy =ρ∗1

k·k(x), x ∈R3,(4.1)

that is the convolution product of the Newton kernel with the electron density

ρ(y) = 2

Norb

a=1

(ϕa(y))2.(4.2)

We apply the discrete tensor-product convolution on tensor grids in R3, described

in [61].

4.2 Accurate evaluation of the Hartree potential by the tensor-product convolution in R3

In general, our computational scheme for the convolution integral (4.1) involves

the following steps (which will be considered further in details):

1. Compute the grid-based canonical representation of the electron density

of the molecule using the expansion coefficients for GTOs for the electron

orbitals, and by discretizing the corresponding GTO basis functions on n×

n×ntensor grid. In this way, the initial rank Rρ0of the canonical tensor

representing ρis of the order 103or even 104, depending on the molecule.

2. Reduce the rank of the electron density by consequent MGA C2T and T2C

transforms. Then the canonical rank of ρis reduced by an order of magni-

tude, which is sufficient for the fast discrete tensor-product convolution. As

shown in Section 3.2 the MGA C2T algorithm yields linear scaling in nand

the initial rank Rρ0. As a result of this step, we obtain the computationally

feasible rank Rρ≪Rρ0.

3. Compute the convolution (4.1) (representing the Hartree potential) in tensor-

product format using rank-RNcanonical representation of the Newton po-

tential with RNabout 20 ÷30 and canonical representation of the electron

density obtained in Step 2.

We use the total electron density ρ(y), y= (y1, y2, y3)∈R3, in a form including

rank-R0separable representation of orbitals in GTO basis set,

ϕa(x) =

k=1

ckagk(x), a = 1, ..., Norb,(4.3)

so that ρ(y) is given by the polynomial-exponential sum

ρ(y) := 2

Norb

ν=1 R0

k=1

ckν

ℓ=1

(yℓ−Aℓ

k)βℓ

ke−αkky−Akk2!2

,(4.4)

where Ak= (A1

k, A2

k, A3

k)∈R3(k= 1, ..., R0) correspond to the locations of atoms

in a molecule, βℓ

k∈N0,Norb is the number of electron pairs, and R0is the number

of GTO basis functions. For example, the number of GTO basis functions is given

by R0= 55,41 and 96 for CH4, H2O and C2H6molecules, respectively.

We assume that a particular molecule is embedded in an appropriate compu-

tational box [−b, b]3. We use b= 10.6 atomic units (au) for H2O and b= 14.6

au for CH4and C2H6molecules. In the following, we represent the convolving

tensor corresponding to ρ(y) in the canonical format. Since the products of two

Gaussians in (4.4) can be written in the form of a single Gaussian by recomputing

the coefficients as

e−λ(y−a)2·e−β(y−b)2=eσ·e−γ(y−c)2, σ =λβ(a−b)2

λ+β, γ =λ+β, c =aλ +bβ

λ+β,

4 TS computation of the Coulomb and exchange Galerkin matrices

and taking into account the symmetry of the representation, we arrive at the

following bound on the initial rank of the input tensor

R≤Rρ0=R0(R0+ 1)

2.(4.5)

Hence, for the considered particular molecules, we have the following ranks of the

discrete canonical representation of the electron densities:

Rρ0,CH4= 1540, Rρ0,C2H6= 4656, Rρ0,H2O= 861.

In our computational scheme, we apply Algorithm MG C BTA for rank reduction

of the electron density tensor (see the discussion below).

We apply the collocation scheme to discretise the convolution product in tensor

formats (see [61]). To that end we introduce the equidistant tensor grid in the

computational box [−b, b]3,

ω3,n := ω1×ω2×ω3, ωℓ:= {−b+ (m−1)h:m= 1, ..., n + 1},(4.6)

with ℓ= 1,2,3, and mesh-size h= 2b/n, and define a set of collocation points

{xm} ∈ ω3,n,m∈ M := {1, ..., n + 1}3, composing the tensor grid ω3,n. For

technical reasons we further assume that n= 2k,k∈N. Now we choose a set

of piecewise constant basis functions {φi},i∈ I := {1, ..., n}3, defined as the

characteristic functions of elementary grid-cells in ω3,n. For a given continuous

function fsupported in the computational box [−b, b]3(i.e. f(y) = 0 for y∈

R3\[−b, b]3), let fi=f(yi), i∈ I, be the (collocation) representation coefficients

of fin the basis set {φi},

f(y)≈X

i∈I

fiφi(y), y ∈[−b, b]3,(4.7)

where yiis the midpoint of the grid-cell numbered by i∈ I (see (2.41) for explicit

definition).

Now the discrete collocation convolution scheme recovers approximately the

values of the exact convolution product

(f∗p)(x) = ZR3

f(y)p(x−y)dy, x ∈[−b, b]3,

in the set of collocation points {xm},

f∗p≈[wm]m∈M, wm:= X

i∈I

fiZR3

φi(y)p(xm−y)dy, xm∈ω3,n.

We refer to [61] concerning particular assumptions on the convolving density f

and convolving kernel function p.

4.2 Accurate evaluation of the Hartree potential by the tensor-product convolution in R3

In the following applications, our particular choice is specified by f(y) =

ρ(y), y ∈[−b, b]3, where ρis the electron density, and p(x−y) = 1/kx−yk,

being the classical Newton (Coulomb) potential. To compute the integral trans-

forms, we apply the piecewise constant approximation to discretize the separable

GTO basis functions of the type

gk(y) =

ℓ=1

(yℓ−Aℓ

k)βℓ

ke−αkky−Akk2≡

ℓ=1

g(ℓ)

k(yℓ), y ∈R3, k = 1,...,R0,(4.8)

by rank-1 n×n×ntensors. The respective canonical vectors are obtained by sam-

pling the univariate functions g(ℓ)

k(yℓ) at the centers of intervals prescribed by the

tensor grid ω3,n. Specifically, we define the sampling points {yi= (y1

i1, y2

i2, y3

i3)}, iℓ∈

I={1,...,n}for ℓ= 1,2,3, where, as above, yiis the midpoint of the grid-cell

numbered by i∈ I.

The rank-1 tensor representing the single Gaussian gk,k= 1, ..., R0, is given

by the canonical rank-1 tensor

Gk≡[gi]i∈I =γ(1) ⊗γ(2) ⊗γ(3) ∈Vnwith entries gi=g(1)

i1·g(2)

i2·g(3)

i3,(4.9)

where

γ(ℓ)

k={γ(ℓ)

k,i }i∈Iℓ∈Vℓ, γ(ℓ)

k,i = (yℓ

i−Aℓ

k)βℓ

ke−αk(yℓ

i−Aℓ

k)2, ℓ = 1,2,3.

At the first step, the 3rd order coefficients tensor F= [fi]∈RIapproximat-

ing the electron density ρ, is represented by a rank-Rρ0canonical tensor in the

“discretized” GTO basis set1corresponding to (4.4).

At the second step, we precompute the coefficients tensor P= [pi]∈RI,

pi=ZR3

φi(y)

kykdy, i∈ I.(4.10)

The coefficient tensor P= [pi] corresponding to the Coulomb potential 1

kx−yk

is approximated in the rank-RNcanonical tensor format using optimised sinc-

quadratures [98, 10, 61], where the rank parameter RN=O(|log ε|log n) depends

logarithmically on both the required accuracy ε > 0 and the grid-size n. In all

computations presented below we choose the tensor rank in the range RN∈

[20,30] to ensure the desired accuracy of order 10−4÷10−6.

1To reduce the numerical complexity of the tensor-product convolution it is approximated

either in the rank-(r, r, r) Tucker format or via the canonical model with the tensor rank

R < Rρ0using the MGA canonical-to-Tucker-to-canonical rank reduction scheme described

in Section 3.

4 TS computation of the Coulomb and exchange Galerkin matrices

At the third step, following [61, 67], we compute the set of values wm,m∈ M,

by copying the corresponding portion of entries (centred at j=n) taken from the

discrete tensor convolution of 3rd order tensors,

F∗P:= {zj}, zj:= X

i∈I

fipj−i+1,j∈ J := {1, ..., 2n−1}3,(4.11)

where the sum is over all i,j∈ I, which lead to legal subscripts for pj−i+1, i.e.,

j−i+1∈ I. Specifically, we define wm=zm+n/2−1,m∈ M.

Remark 4.1 The tensor [wm]can be identified with the piecewise trilinear in-

terpolating function uniquely defined by its values at the grid (collocation) points

{xm},m∈ M. Sampling this interpolant at cell-centered points {yi},i∈ I,

defined above, we return the coefficient tensor V∈Vn=RIof the resultant ap-

proximate convolution, providing the representation in the same basis set {φi}as

for the initial electron density f=ρ. In the following, the resultant coefficients

tensor V∈Vnover the index set Iwill be denoted by V≡F∗TP≈f∗p.

Representing Fin the rank-Rcanonical format (see (2.6)) enables us to compute

F∗Pin the form (see Section 2.4.2)

F∗P=

k=1

m=1

ckbmu(1)

k∗v(1)

m⊗u(2)

k∗v(2)

m⊗u(3)

k∗v(3)

m,(4.12)

which leads to the cost

NC∗C=O(RRNnlog n).

Thus, computation of the convolution product with the rank R=Rρ0of the input

tensors has practical limitations since the exact rank of the resulting tensor is the

product of those for the convolving arrays. For reducing the tensor rank of the

convolving density, we apply the fast algorithm MG C BTA(CR,nM→TCR,r) for

the canonical-to-Tucker multigrid transform defined on the sequence of refined

grids, followed by the Tucker-to-canonical transform of the core tensor. In this

way, the canonical rank Rcan be reduced by the order of magnitude, from several

thousands to few hundreds or even tens, depending on the input data and the

required accuracy.

Table 4.1 shows the advantage of the fast tensor-product convolution method

with the rank recompression via the MG C BTA algorithm, compared with those

based on 3D FFT (of the complexity O(n3log n)). We present the MATLAB

CPU time for a high accuracy computation of the Hartree potential for the H2O

4.2 Accurate evaluation of the Hartree potential by the tensor-product convolution in R3

n312832563512310243204834096381923163843

3DFFT (sec) 4.3 55.4 582.8∼6000 – – – ∼2 years

ConvCC (sec) 1.0 3.1 5.3 21.9 43.7 127.1 368.6 700.2

CR→Tr(sec) 1.5 3.8 8.2 16.4 32.0 64.0 136.0

Table 4.1: Comparison of the 3D FFT and the tensor-product convolution. The

bottom line shows the times for C2T rank reduction.

molecule on a Sun Fire X4600 computer with 2,6 GHz processor. The CPU time

for the FFT-based scheme with n≥1024 is obtained by extrapolation, with factor

10 for each step of grid refinement.

As it was mentioned, in general, the complexity of the single grid canonical-

to-Tucker transform depends polynomially on the parameters of the model, see

Algorithm C BTA (CR,n→TCR,r). The multigrid acceleration technique employed

by MG C BTA algorithm provides calculations of the 3D integral transform (4.1)

using benchmark grid sizes up to Nvolume = 163843in small computational times,

of the multigrid preprocessing and discrete 3D convolution. Figure 4.2 b) demon-

strates the linear scaling of the rank reduction algorithm in the grid-size n.

The approximation error of the discrete tensor-product convolution V(n)

Vapproximating the Hartree potential VH(x) (see Remark 4.1) and applied to

the particular class of electron density functions appears to be of order O(h2)

(in the Frobenius norm), see the error analysis in Theorems 2.2 and 2.3, [61].

Following [61], we apply the Richardson extrapolation technique to obtain higher

accuracy approximations of order O(h3) without extra computational cost. The

corresponding Richardson extrapolant V(n)

H,Rich approximating VH(x) over a pair

of nested grids ω3,n and ω3,2nand defined on the coarser n×n×ngrid is given

V(n)

H,Rich = (4 ·V(2n)

H−V(n)

H)/3 over the grid-points in ω3,n.(4.13)

Figure 4.1 illustrates the effect of the Richardson extrapolation in the compu-

tation of VHfor the pseudodensity case of CH4on small n×n×ngrids with

n= 112,224,448. Due to a distinct asymptotic convergence ratio of 4 in the

computations on dyadically refined grids, the Richardson method gives an im-

provement of accuracy from 10−3to 10−5÷10−6.

For the all electron case of moderate size molecules large grids are required to

enable high accuracy of the computations in the presence of cusps in the electron

density due to core electrons contribution. Figures 4.2 a), 4.3 a) and 4.4 a) show

the accuracy of the computation of the Hartree potential of the H2O, CH4and

C2H6molecules on logarithmic scale. We present the absolute error for VHin

the subinterval along the x-axis compared with the corresponding values of VH

4 TS computation of the Coulomb and exchange Galerkin matrices

−6 −4 −2 0 2 4 6

0.5

1.5

x 10−3

atomic units

abs. error

abs. error for VH in subinterval

n=448

n=224

n=112

−6 −4 −2 0 2 4 6

x 10−5

X: 5.551e−16

Y: 5.531e−06

atomic units

Richardson error

abs. Richardson error for VH

112−224

224−448

Figure 4.1: Absolute approximation error in the Hartree potential VHfor the

pseudo-density of CH4in the subinterval Ω = [−7,7] × {0} × {0}

(left) and the reduced error by Richardson extrapolation involving

two pairs of grids (right).

computed by the MOLPRO package. We compare the computational error for

the grid sizes n= 4096,8192 and for the corresponding Richardson extrapolation

of order O(h3).

We observe the accuracy 8 ·10−5hartree at the cusp region corresponding to

Carbon atom in CH4, 5 ·10−5hartree at the cusp region corresponding to the

Oxygen atom in H2O, and 5 ·10−6hartree at the cusps corresponding to the

Carbon atoms in C2H6. Note that maximum values of the Hartree potential at

the cusp points are VH(0,0,0) = 8.35, 11.73 and 9.49 hartree for the CH4, H2O

and C2H6molecules, respectively. This yields relative accuracy of the order 10−6

in the cusp region for the considered molecules.

Figures 4.2 b), 4.3 b) and 4.4 b) show the CPU times for the CH4, H2O and

C2H6molecules, respectively, for C2T preprocessing and 3D convolution. The

total time for VHat a fixed grid size n=nfconsists of a sum of preprocessing

times from all previous levels, starting from the initial grid with, say n0= 64, up

to nf, plus the convolution time for the level with n=nf.

The computational time depends strongly on the chosen accuracy of approx-

imation, which is prescribed by the multigrid parameters pand r, see Theorem

3.1. It should be noted that the CPU times can be crucially reduced if the ac-

curacy of the result is not so demanding, say up to 10−3. In the applications

discussed above the accuracy requirements are challenging, which leads to the in-

4.2 Accurate evaluation of the Hartree potential by the tensor-product convolution in R3

−6 −4 −2 0 2 4 6

10−4

10−3

hartree

Abs. approx. error, V

H for H2O

a) atomic units

n=4096

n=8192

Ri−4096−8192

2000 4000 6000 8000 10000 12000 14000 16000

univariate grid size n

minutes

H2O , rT=20

C−2−T time

3D conv. time

Figure 4.2: a) Absolute approximation error of the tensor-product computation

of the Hartree potential of the water molecule in the subinterval Ω =

[−6,6] ×{0}×{0}, and the corresponding CPU times (bottom).

4 TS computation of the Coulomb and exchange Galerkin matrices

−5 0 5

10−6

10−5

10−4

10−3

a) atomic units

hartree

abs. approx. error, V

H for CH4

n=4096

n=8192

Ri−8192−4096

2000 4000 6000 8000

b) univariate grid size n

minutes

C−2−T time

3D conv. time

Figure 4.3: a) Absolute approximation error in the Hartree potential of CH4

molecule measured in the subinterval Ω = [−8,8] ×{0}×{0}, and b)

the corresponding CPU times.

4.2 Accurate evaluation of the Hartree potential by the tensor-product convolution in R3

−4 −2 0 2 4 6

10−6

10−4

10−2

100

a) atomic units

hartree

C2H6, abs. appr. error for V

n=4096

n=8192

Ri−4096−8192

1000 2000 3000 4000 5000 6000 7000 8000

univariate grid size n

minutes

C2H6 , rT=30, p=10

C−2−T time

3D conv. time

Figure 4.4: a) Absolute approximation error in the Hartree potential of C2H6in

the subinterval Ω = [−5,7]×{0}×{0}, the corresponding CPU times

(bottom).

4 TS computation of the Coulomb and exchange Galerkin matrices

crease of parameters rand pin the MGA scheme, and consequently, the demands

on computational resources.

4.3 Tensor computation of the Coulomb matrix

The Coulomb (Galerkin) matrix J={Jkm}R0

k,m=1 for the Hartree potential VH(x)

with respect to the set of normalized Gaussians {egk}is given elementwise by

Jkm := ZR3egk(x)egm(x)VH(x)dx, k, m = 1,...R0.(4.14)

For given rank-1 tensors Gkrepresenting Gaussians {egk}on n×n×n-grid, and

the Hartree potential tensor V=V(n)

Hcomputed for the considered molecule as

above, we calculate the Galerkin Coulomb matrix {Jkm}using the tensor scalar

product and Hadamard product operations as discussed in Section 2.4,

Jkm := hGk, Gm⊙V(n)

Hi, k, m = 1,...R0.(4.15)

Since the molecular orbitals are specified by the Galerkin coefficients matrix C=

{cka} ∈ RR0×Norb as in (4.3), in the following, we denote the nonlinear dependence

in Jby J=J(C).

4.4 Numerics: the Coulomb matrices of CH4, C2H6

and H2O molecules

In this section we present numerical results on the tensor-structured computation

of the Coulomb matrix for some particular molecules. Next figures show the

difference between the reference matrices obtained by MOLPRO and our TS

calculations of the Coulomb matrix for H2O and small organic molecules in all

electron case. Figure 4.7 visualises [Jkm] for H2O and the absolute error of the

TS computations.

Figures 4.6 shows absolute values of [Jkm] for C2H6and the absolute approxima-

tion error of our TS calculations using the MGA parameters r= 26 and p= 18.

Figures 4.5 show the Coulomb matrix errors for computations over the grids with

the univariate sizes n= 4096 and n= 8192. Note, that for matrix elements the

decay of the error has a factor of 4. This relation is a prerequisite for efficiency

of the Richardson extrapolation (4.13). Then we obtain the resulting Coulomb

matrix errors shown in Figure 4.6. Note, that the entries of the Coulomb matrix

corresponding to cusp areas are approximated by the Richardson technique with

100

4.4 Numerics: the Coulomb matrices of CH4, C2H6and H2O molecules

even better accuracy than other diagonal matrix entries, due to the almost exact

convergence factor of 4 observed.

Results for the Coulomb matrix of CH4molecule are shown in Figure 4.8.

The upper Figure shows the absolute values of the matrix entries. We perform

the multigrid tensor-structured calculations, using rank reduction with the MGA

parameters Tucker rank r= 24 and the number p= 16 of most important fibers.

Lower figure in 4.8 shows the ultimate approximation error in computations using

the Richardson extrapolation over the grids with n= 4096 and n= 8192.

Notice, that the errors for the diagonal matrix elements corresponding to cusp

areas, are below 4·10−6, 8·10−6and 8·10−5for the CH4, C2H6and H2O molecules,

respectively. The accuracy of the order of 10−5÷10−6is defined by the chosen error

control parameter ε > 0 in the tensor-structured computations of the Coulomb

matrix. By using smaller values of εat all steps of computations it is possible to

get better accuracies at the expense of the computation time.

101

4 TS computation of the Coulomb and exchange Galerkin matrices

Figure 4.5: Absolute approximation error in the Coulomb matrix entries of C2H6

computed with the grid sizes 40963(top) and 81923(bottom).

102

4.4 Numerics: the Coulomb matrices of CH4, C2H6and H2O molecules

Figure 4.6: Absolute values of the Coulomb matrix entries of the C2H6molecule

(top) and absolute approximation error of calculations using MGA

rank reduction (bottom).

103

4 TS computation of the Coulomb and exchange Galerkin matrices

Figure 4.7: a) Coulomb matrix (absolute values) for H2O, b) the absolute ap-

proximation error of the multigrid tensor product computation of the

Coulomb matrix for H2O.

104

4.4 Numerics: the Coulomb matrices of CH4, C2H6and H2O molecules

Figure 4.8: CH4molecule. Top: absolute values of the Coulomb matrix entries.

Bottom: absolute approximation error in the Coulomb matrix ob-

tained by the MGA tensor-structured method (r= 24, p= 16),

with the Richardson extrapolation over the grids with n= 4096 and

n= 8192.

105

4 TS computation of the Coulomb and exchange Galerkin matrices

Figure 4.9: Absolute approximation error in the Coulomb matrix entries of CH4

molecule computed with the grid size 163843.

4.5 Agglomerated computation of the Hartree-Fock

exchange

In this section, we consider the tensor product approximation of the nonlocal

(integral) exchange operator Kin the Hartree-Fock equation by the agglomer-

ated tensor-product operations used in Section 4.2 for the computation of the

Hartree operator. Note that the calculation of the exchange Galerkin matrix in

the Hartree-Fock equation is a challenging problem due to the nonlocal character

of the exchange operator

(Kψ) (x) :=

Norb

a=1 ZR3

ϕa(x)ϕa(y)

kx−ykψ(y)dy x ∈R3,(4.16)

which leads to the integration in six dimensions (see (4.17)). This problem is

usually solved analytically by evaluating the so-called two-electron integrals using

naturally separable basis sets like Gaussians, see [109, 78] and references therein.

Here, we propose and implement the grid-based evaluation of the Hartree-Fock

exchange (4.16) by using the tensor product approximation of the included oper-

ators and functions on the 3D Cartesian grid. We apply the fast tensor product

106

4.5 Agglomerated computation of the Hartree-Fock exchange

convolution for the multivariate functions in Rd(see Section 4.2), providing com-

plexity O(dn log n).

To cover the general case of molecular geometries, in the numerical examples,

we use equal grid sizes nfor three spatial dimensions (a cubic computational box)

and do not employ information on a symmetry of the molecules. Therefore, our

scheme works as a “grey-box” algebraic algorithm, where as the input data only

the discrete representation of the Galerkin basis functions is used. However, the

algorithm works as well with arbitrary n1×n2×n3grids.

Our initial algorithm for the evaluation of (4.16) has the complexity O(R2

0nlog n+

nef R4

0Norb), where nef ≪nis the “effective” univariate grid size, and R0is the

number of Galerkin basis functions. Here we reduce the constant in the linear com-

plexity scaling in nby truncating the effective size of the computation intervals,

where the values of rapidly decaying basis functions (in particular, Gaussians) are

less than a threshold controlling the accuracy of computations. Thus, we have for

the number of grid points in effective support of the interacting vectors, nef =αn,

with αmuch less than 1.

To reduce the R0-asymptotics to O(R3

0), we further apply the canonical-to-

Tucker algorithm for decreasing the ranks of intermediate results after every con-

volution step. The corresponding rank reduction algorithms are considered in

Section 3.

The accuracy of the computation of matrix entries on a particular grid is es-

timated in numerical experiments by O(h2), where h=O(n−1) is the step-size

of the grid. We achieve O(h3) accuracy in our evaluation of the exchange matrix

by using the Richardson extrapolation on a couple of consequent grids. Usually,

the size of the computational box for small organic molecules is in the range of

14 ÷20 ◦

A. Since the TS methods enable computations on huge 3D Cartesian

grids, the univariate grid step-sizes of applied discretizations range from h≈0.02

◦

Afor n= 1024, up to h≈0.0008 ◦

Afor the benchmark grids with the number of

entries n3= 163843.

4.5.1 Galerkin exchange operator in the Gaussian basis

The exchange Galerkin matrix K=Kex with respect to the set of normalized

“Cartesian Gaussians” {gk}k=1,...R0is given by

Kex ={Kij}R0

i,j=1, Kij := 1

2ZR3ZR3

gi(x)τ(x, y)

kx−ykgj(y)dxdy, i, j = 1,...R0,

(4.17)

107

4 TS computation of the Coulomb and exchange Galerkin matrices

where the density matrix τ(x, y) is defined in terms of electron orbitals as

τ(x, y) = 2

Norb

a=1

ϕa(x)ϕa(y),

over all occupied orbitals a.

The low cost of the three-dimensional convolution using the canonical repre-

sentation of the convolving tensors makes possible the agglomerated numerical

evaluation of the exchange matrix in the Fock operator. For this purpose, we

split the integration in (4.17) into the following steps. First, we compute the con-

volutions of the pointwise products of molecular orbitals with the vectors from

the normalized Gaussian basis set

Waj(x) = ZR3

ϕa(y)gj(y)

kx−ykdy a = 1,...,Norb, j = 1,...,R0.(4.18)

These quantities are further used for the calculation of contributions to the

Galerkin matrix entries from every orbital a,

Vij,a =ZR3

ϕa(x)gi(x)Waj(x)dx, i, j = 1,...R0.(4.19)

The entries of the exchange matrix are then the sums of the corresponding values

over all orbitals

Kij =

Norb

a=1

Vij,a, i, j = 1,...Norb.(4.20)

We compute the exchange matrix (4.17) by the numerical scheme (4.18) - (4.20)

using the discrete tensor product representation of arising functions and operators.

The occupied orbital of the molecule is considered as an expansion over the

basis set of separable continuous functions gk(x),

ϕa(x) =

k=1

ckagk(x), x = (x1, x2, x3)∈R3,(4.21)

where the basis functions gk,k= 1,...,R0, are represented as the rank-1 canon-

ical tensor products,

gk(x) = g(1)

k(x1)g(2)

k(x2)g(3)

k(x3),(4.22)

with 1,2,3 designating spatial dimensions. Hence, with a fixed set of Galerkin

basis functions {gk}, the exchange matrix Kdepends (nonlinearly) on the rep-

resentation coefficients matrix C={cka} ∈ RR0×Norb , that will be specified as

K=K(C).

108

4.5 Agglomerated computation of the Hartree-Fock exchange

4.5.2 Discrete computational scheme

GTOs are used as conventional basis sets in electronic structure calculations due

to their separability in spatial variables which is useful in the analytical evalua-

tion of the two-electron integrals in the calculation of the Hartree and exchange

potentials.

In the following, for numerical illustrations, we choose the discretized Gaussians

as vectors in the rank-1 canonical representations of the basis functions, mainly for

the sake of convenient verification of the computational results (the corresponding

Galerkin matrix) with the standart MOLPRO output [108].

The rank-1 GTO basis functions gk(x), k= 1,...R0, are given by (4.22) with

R= 1, where g(ℓ)

k(y(ℓ)) denotes the generalized univariate Gaussians. The uni-

variate Gaussians g(ℓ)

k(y(ℓ)), ℓ= 1,2,3, are functions with infinite support given

g(ℓ)

k(y(ℓ)) = (y(ℓ)−A(ℓ)

k)β(ℓ)

kexp(−αk(y(ℓ)−A(ℓ)

k)2), y(ℓ)∈R, αk>0,

where β(ℓ)

k= 0,1,... is the polynomial degree, and the points (A1

k, A2

k, A3

k)∈R3

specify the positions of nuclei in a molecule. In our scheme we use the discrete

basis functions (given by vectors of the canonical tensor representation (2.6))

which are constructed by discretizing the Gaussians on the given tensor grid by

using the associated piecewise constant basis functions, as in Section 4.2.

Assume that the molecule is embedded in a certain fixed computational box

[−b, b]3with a suitable b > 0. For simplicity of notation, we take nℓ=nequal

for all dimensions. We use the equidistant tensor grid ω3,n , see (4.6), with grid

points {xm},m∈ M := {1, ..., n + 1}3. We use a representation like (4.7) with

f(y) = gk(y), where the rank-1 coefficients tensor Gk=γ(1)

k⊗γ(2)

k⊗γ(3)

kis given

by the values of ℓ-mode functions g(ℓ)

kat the centers y(ℓ)

iℓof the intervals of the

univariate grid [x(ℓ)

iℓ, x(ℓ)

iℓ+1], iℓ= 1,...,n. This results in canonical vectors of

length nwith the entries {g(ℓ)

k(y(ℓ)

iℓ)}n

iℓ=1,

γ(ℓ)

k={g(ℓ)

k(y(ℓ)

iℓ)}n

iℓ=1 ∈Rn,for ℓ= 1,2,3, k = 1,...R0.(4.23)

such that we have Gk=γ(1)

k⊗γ(2)

k⊗γ(3)

k. Given the coefficients matrix C={cka}

(k, a = 1, ..., R0) corresponding to expansion (4.21), then by summing tensor prod-

ucts of the canonical vectors with the corresponding weights cka, we obtain the

discrete representation of the orbital ϕa,a= 1,...Norb, in the rank-R0canonical

format,

Ua=

k=1

ckaγ(1)

k⊗γ(2)

k⊗γ(3)

k, cka ∈R,(4.24)

109

4 TS computation of the Coulomb and exchange Galerkin matrices

where R0is the number of basis functions. This discretization can be considered

as a representation in the Galerkin set of basis functions {Gk}obtained by repre-

senting the initial continuous basis set {gk}via piecewise constant basis functions

{φi}on the uniform grid (see (4.23)).

We use the rank-RNcanonical tensor product representation of the coefficient

tensor P= [pi]∈RIfor the Newton potential 1

kx−yk, given in (4.10). As in

Section 4.2 this tensor is precomputed by using the optimized sinc-quadratures

[10, 61], where the rank parameter RN=O(|log ε|log n) depends logarithmically

on both the required accuracy ε > 0 and the univariate grid size n. Recall that for

our computations tensor P, representing the Newton potential has the canonical

rank in the range 20 ≤RN≤30, depending on the one-dimension grid size nand

accuracy requirements ε > 0.

Below, we present Algorithm 1 describing the computational scheme for evalu-

ation of (5.11) - (5.13) in tensor product format2.

Lemma 4.2 The complexity of Algorithm 1 for the computation of the exchange

Galerkin matrix Kex in the Hartree-Fock equation using the discretized GTO basis

is estimated by

WKex =O(NorbRN(R2

0nlog n+R4

0nef )).

Proof: This estimate includes the cost of the evaluation of convolutions in

(5.11) for every orbital, O(NorbRNR2

0nlog n), and the scalar product (5.12) of

the rank-/R0RN) tensor Θa,k with the products of the orbitals and Gaussians,

O(NorbRNR4

0nef ).

Since the canonical rank RNof tensor Pcorresponding to the Coulomb potential

depends only logarithmically on n, it can be treated as a constant in the following

complexity estimate.

Remark 4.3 Notice that the rank reduction of the canonical tensor Θa,k after

step (5.11) reduces the complexity to

WKex =O(NorbR3

0nef ).(4.25)

In the case of large molecules further optimization up to O(NorbR2

0nef )-complexity

is possible due to the rank reduction applied to the rank-R0orbitals (tensors Ua).

2The Hadamard product θa,j =Ua⊙Gjin the Algorithm 1 can be either (1) stored for all

vectors γkat step (A) or (2) recomputed before evaluation of the scalar products at step

large grids, O(R2

0n), we prefer the case (2).

110

4.5 Agglomerated computation of the Hartree-Fock exchange

Algorithm 1 Computation of the Exchange Matrix in Tensor Arithmetic

Input data: rank-R0canonical tensors Ua∈Vn,a= 1,...,Norb, specified by

the coefficients matrix C={cka} ∈ RR0×R0, rank-RNtensor P∈Vn,rank-

1canonical tensors Gk=γ(1)

k⊗γ(2)

k⊗γ(3)

k,k= 1,...R0, and the filtering

threshold εF>0.

(A0) Find effective supports σj⊂[−b, b]3for γj,j= 1,...,R0, by εF-

thresholding,

σj=σ(1)

j×σ(2)

j×σ(3)

j,where σ(ℓ)

j={i:|γ(ℓ)

j(y(ℓ)

i)| ≥ εF} ⊂ {1,...,n}, ℓ = 1,2,3.

for a= 1,...,Norb

for k= 1,...,R0

(A) Compute the Hadamard product θa,k =Ua⊙Gkof tensors Uaand Gkby

using (2.61).

(B) Compute the tensor convolution Θa,k =θa,k ∗Pby using (4.12).

for j= 1,...,R0

Ka,k,j =hθa,j,Θa,ki|σj,

end for j

end for k

end for a.

(D) Sum the matrix elements over all orbital indices, Kkj =PNorb

a=1 Ka,k,j, for

k, j = 1,...,R0.

Output data: the exchange matrix Kex(C) = {Kkj}R0

k,j=1.

Remark 4.4 The rank-R0tensors Ua,a= 1,...,Norb, representing the orbitals,

can be chosen as the Galerkin basis set {Ga},a= 1,...,Norb, where Norb is

usually much smaller than R0. This may relax the critical dependence, O(R4

and O(R3

0)as in Lemma 4.2 and Remark 4.3 above (see also Lemma 3.1 in [70]).

1. Reducing complexity by reduction of the initial rank RΘ=RNR0.

The maximal initial rank of tensor Θa,k at the step (B) in Algorithm 1 is given

by RΘ=RNR0. We perform the rank reduction for this tensor by the C2T

and T2C algorithms introduced and discussed in details in Sections 2 and 3. In

particular, it is shown that the multigrid version of the C2T algorithm applied to

the 3-rd order rank-Rcanonical tensors has linear complexity with respect to all

input parameters: the canonical rank R, the Tucker rank r, and the univariate

grid size n. Thus, we can reduce the complexity of Algorithm 1 to (4.25) by

111

4 TS computation of the Coulomb and exchange Galerkin matrices

CH4CH3OH C2H5OH

RΘ1250 1875 2775

rT= 12, ǫT≤10−7,RRED 80 90 110

coefR15 20 23

rT= 10, ǫT≤10−6,RRED 50 70 100

coefR25 26 27

Table 4.2: Rank reduction for Θa,k in computation of the exchange matrix for the

pseudopotential case of some molecules.

solely multilinear algebraic methods, which do not take into account any previous

knowledge on the molecular structure.

Table 4.2 shows the average rank reduction by the C2T and T2C algorithms

applied to tensor Θa,k in calculations for the molecules CH4, CH3OH and C2H5OH.

We present the reduced canonical ranks RRED (and respective Tucker ranks rT) of

the tensors corresponding to the largest value, over the parameters a= 1,...,Norb,

k= 1,...R0, to achieve the prescribed approximation error ǫT,

RRED = max

1≤a≤Norb,1≤k≤R0

RRED(a, k),

where RRED(a, k) denotes the reduced canonical rank of Θa,k, for a given ϕaand

gk. Table 4.2 gives also the corresponding reduction coefficient, coefR=RΘ

RRED .

2. Windowing procedure for fast computation of the scalar products.

We compute the algebraic tensor representation of the discrete electron orbitals

Uagiven by (4.24) using the coefficients of their representation in the discrete

Gaussian basis set γ(ℓ)

k. It turns out by the construction that most of γ(ℓ)

khave

local character (fast exponential decay) with respect to the size of the whole

computation domain [−b, b]3. Therefore we precompute effective supports of the

canonical vectors γ(ℓ)

kby truncating their parts, which are below some predefined

threshold ε > 0. We call it the “windowing” procedure for finding the active

interval for each Gaussian. In our case, the resulting effective vector size of the

canonical vectors is at average 3 times smaller than the corresponding grid size n

even for small molecules. The resulting “effective” univariate grid size is nef =αn,

with α=α(ε)<1. For example, for small molecules α∼0.2÷0.3 for ε= 10−5.

This leads to reduced cost of the scalar products with respect to the univariate

grid size n.

We expect much stronger windowing effect in the case of large molecules, while

in this case, it can be directly applied to the Hadamard products Ua⊙Gk.

112

4.5 Agglomerated computation of the Hartree-Fock exchange

Figure 4.10: Top: entries of the exchange matrix for the all electron case of H2O.

Bottom: absolute error in Kex ∈R41×41 extrapolated over n×n×n

3D Cartesian grids with n= 8192 and n= 16384.

113

4 TS computation of the Coulomb and exchange Galerkin matrices

Figure 4.11: Top: the entries of exchange matrix for CH4. Bottom: absolute ap-

proximation error in Kex computed on 3D grids with one-dimension

sizes n= 2048 and n= 4096.

114

4.5 Agglomerated computation of the Hartree-Fock exchange

Figure 4.12: Top: the exchange matrix for the pseudopotential case of CH3OH.

Bottom: absolute approximation error of computations on the n×

n×ngrids with n= 512 and n= 1024.

115

4 TS computation of the Coulomb and exchange Galerkin matrices

Figure 4.13: Absolute error in the tensor-product computation of Kex the pseu-

dopotential case of molecules CH4(top) and C2H5OH (bottom).

116

4.6 Numericals experiments

We tested the presented tensor-structured method in computation of the exchange

Galerkin matrix for the following molecules:

•all electron case : H2O (Norb = 5, R0= 41), CH4(Norb = 5, R0= 55);

•pseudopotential case: CH4(Norb = 4, R0= 50), CH3OH (Norb = 7, R0=

75) and C2H5OH (Norb = 11, R0= 111).

The calculations are performed on a standard SUN station using Matlab 7.6. In

presented figures we give the absolute difference of the results of our computations

with the corresponding exchange matrix calculated by the MOLPRO program

[108].

The computational box [−b, b]3for small molecules is in the range of 2b= 14 ◦

for H2O, and 2b= 20 ◦

Afor CH4, C2H5OH, and CH3OH. The developed tensor-

structured algorithm enables computation of the Hartree-Fock exchange on huge

n×n×n3D Cartesian grids, with the number of entries up to n3= 163843. This

corresponds to the usage of univariate mesh-sizes from h≈2·10−2for the grids

with n= 1024, to h≈8·10−4◦

Afor the grids with n= 16384.

4.6.1 All electron case

For molecules with moderate size R0of basis sets, like CH4or H2O the grid-sizes

up to n= 16384 are computationally feasible for MATLAB, that is equivalent

to computations with ∼4.4·1012 nodes in the volume. These grids provide

the resolution of the strong cusps in electron orbitals corresponding to the core

electrons in a molecule, thus enabling accurate computations of the exchange

matrix for all electron case.

Computation of the exchange Galerkin matrix for the all electron case of H2O

molecule is a challenging problem due to “sharp” Gaussians corresponding to

core electrons of the Oxygen atom. Figure 4.10 (top) shows the absolute values of

the exchange matrix entries for H2O, Figure 4.10 (bottom) presents the absolute

error of the tensor-structured computations of this matrix using the Richardson

extrapolation on n×n×n3D Cartesian grids with n= 8192 and n= 16384.

We achieve high accuracy 1.89·10−5in the “cusp area”, the remaining entries are

computed with the absolute error in the range of 10−6÷10−8.

Figure 4.11 (top) displays the absolute values of the exchange matrix of CH4

and Figure 4.11 (bottom) shows the absolute error in Kex reaching the accuracy

10−4by using the Richardson extrapolation on the grids with n= 2048 and

117

4 TS computation of the Coulomb and exchange Galerkin matrices

n364312832563512310243

H2O1 1.3 2.0 3.2 8.0

CH4(ps) 1 1.3 2.0 3.6 8.9

CH3OH (ps) 1 1.3 1.9 3.3 5.1

Table 4.3: Comparison of relative times.

n= 4096. Again, the matrix entries apart from the “cusp area” are computed

with much higher accuracy.

4.6.2 Pseudopotential case

We consider the pseudopotential case for larger molecules, achieving an accuracy

up to 10−6, using smaller 3D grids with one-dimension size n= 1024.

Figure 4.12 (top) shows entries of the exchange matrix of CH3OH molecule and

Figure 4.12 (right) displays that tensor-structured computations for this molecule

using the Richardson extrapolation on grids with n= 512,1024 yield an accuracy

∼10−5. Figure 4.13 shows the absolute error in Kexin the pseudopotential case of

the CH3OH molecule (top) and C2H5OH (bottom), correspondingly. For CH3OH

the Richardson extrapolation on two consequent grids with n= 512,1024 yields

the accuracy ∼10−5, while for C2H5OH we obtain 7 ·10−4, already on small 3D

grids with the one-dimension size n= 256,512.

Table 4.6.2 illustrates linear scaling of the relative one orbital computation

time, with respect to the one-dimension grid size n, in respective units of the

coarsest grid calculations (n= 64).

118

5 Solution of the Hartree-Fock

equation by multilevel TS

methods

The traditional numerical approaches in ab initio electronic structure calcula-

tions are based on the Galerkin approximation in the naturally separable GTO

basis and analytical evaluation of one- and two-electron integrals inherent to this

approach. Many years of development of rigorous schemes for the analytical

evaluation of the operators in the Hartree-Fock equation, yielded state-of-the-art

packages like GAUSSIAN, Abinit and MOLPRO [5, 108]. In analytical-based pro-

grams, elaborated by large scientific groups, precomputed parameters and “non-

zero” initial approximations are used for accelerating the iterative solution of the

nonlinear EVP (1.6).

The Hartree-Fock model presupposes at least cubic (or fourfold) scaling in the

number of the basis functions. When using the conventional Gaussian bases, the

number of basis functions rapidly increases for larger molecules, thus making this

model computationally unfeasible.

As a first remedy, there are simplified models based on the appropriately ad-

justed pseudopotentials and the grid-oriented methods over n×n×nspatial

grids. These include the traditional plane waves, wavelet or finite element dis-

cretizations, at the expense that scales linearly in the volume size, Nvol =n3,

[4, 5]. Accuracy of these approaches is limited due to the constraints on practi-

cally tractable grid size n≈500.

Here, we introduce the novel multilevel scheme for the numerical solution of the

Hartree-Fock equation using the grid-based tensor-structured methods described

in previous sections, and providing O(nlog n)-complexity, i.e., sublinear in the

volume, O(N1/3

vol ). The new concept for the numerical solution of the Hartree-Fock

equation is a “grey box” scheme based on a moderate number of problem-adapted

discrete Galerkin basis functions represented on 3D Cartesian grid, which are used

as the “global elements” with a low separation rank.

The core of our method is the tensor-structured computation of the electron

density and the Galerkin matrices of the nonlinear Hartree and (nonlocal) Hartree-

119

5 Solution of the Hartree-Fock equation by multilevel TS methods

Fock exchange operators (see Section 5.1.3) at all steps of iterations in the solu-

tion of the nonlinear EVP problem . Within the solution process, the multilinear

algebra operations such as the scalar and Hadamard products, the 3D convo-

lution transform, and the rank truncation are implemented with almost O(n)-

complexity. In view of linear scaling in nof the 3D tensor-structured arithmetic,

high accuracy is achieved due to computations over large n×n×ntensor grids

of size up to 163843entries. In electronic structure calculations, this implies fine

resolution with the mesh size h≈10−4◦

A, providing possibility for arbitrary

space orientation of a molecule in the computational box (like in the analytical

approaches).

The piecewise constant representation of the Galerkin basis functions, and the

electron density, leads to approximation error of order O(h2), in the Hartree and

exchange potentials, where h=O(n−1) is the respective mesh size. In turn, the

two-grid version of the computational scheme improves the convergence rate up

to O(h3), by using the Richardson extrapolation. These approximation properties

are verified by the numerical experiments.

For solving the discrete Hartree-Fock equation on a fixed grid, we apply the

self-consistent field (SCF) iteration based on the traditional direct inversion in the

iterative subspace (DIIS) scheme commonly used in the physical literature [89].

In this iteration scheme, the current update to the Fock matrix is obtained by

relaxation over its values at the previous steps. The discrete orbitals, represented

by the Galerkin coefficients vectors, are updated by direct diagonalization of the

arising system matrix at each iteration on nonlinearity.

To enhance the classical DIIS iteration in our tensor-structured grid-based ap-

proach, we propose the multilevel strategy, that provides fast and robust iterative

solution method for the Hartree-Fock equation discretized on a sequence of re-

fined grids. Iterations begin on the coarsest grid with the zero initial guess for

the Hartree and exchange Galerkin matrices, J(C) = 0, K(C) = 0. We switch

iteration from the coarser to finer grid using a grid-dependent termination crite-

rion, such that the initial guess for the Coulomb and exchange matrices, obtained

from previous coarse levels, ensures robust convergence on finer grids. Low cost

of the iterations on the coarse grid levels reduces essentially the overall numerical

complexity of the global solution process.

In general, the convergence proof for the nonlinear DIIS iteration is still an open

question [78, 20]. We observe in numerical experiments for several small organic

molecules, that our multigrid accelerated DIIS iteration exhibits fast and uniform

in nconvergence (linear convergence rate), so that the overall computational time

scales almost linearly in n.

It is worth to note that the current version of the method still scales cubically

120

5.1 Galerkin scheme for the Hartree-Fock equation

in the size of approximating basis. Hence, any algebraic optimisation of this

basis set within the solution process gives the new opportunity to high accuracy

ab initio computations for large molecules. The quadratic scaling in the size of

approximating basis might be possible in the framework of direct minimization

algorithms (see [95] for detailed discussion on the direct minimization).

5.1 Galerkin scheme for the Hartree-Fock equation

5.1.1 Problem setting

As it was already mentioned in the Introduction, the Hartree-Fock equation for

pairwise L2-orthogonal electronic orbitals ϕi:R3→R,ϕi∈H1(R3), is a nonlin-

ear EVP,

Fϕi(x) = λiϕi(x),ZR3

ϕiϕjdx =δij, i, j = 1, ..., Norb (5.1)

with the nonlinear Fock operator F,

F:= −1

2∆ + Vc+VH−K,

where δij denotes the Kronecker delta, and Norb is the number of electron pairs

in a molecule.

Here, an external nuclear potential Vcis defined by

Vc(x) = −

ν=1

Zν

kx−Aνk,(5.2)

where Mis the number of nuclei in a molecule, and Zν>0, Aν∈R3denote the

charge and spatial coordinates of the nuclei, respectively. The Hartree potential

VHdetermines the Coulomb interaction of every single electron with the field

generated by all electrons of the system,

VH(x) := ZR3

ρ(y)

kx−ykdy, x ∈R3,(5.3)

that corresponds to the convolution of the Coulomb potential with the electron

density

ρ(y) = 2

Norb

i=1

(ϕi(y))2.(5.4)

121

5 Solution of the Hartree-Fock equation by multilevel TS methods

The exchange part Kof the Fock operator is given by the convolution integral,

(Kϕ) (x) := 1

2ZR3

τ(x, y)

kx−ykϕ(y)dy, (5.5)

with the density matrix

τ(x, y) = 2

Norb

i=1

ϕi(x)ϕi(y).

5.1.2 Traditional discretization

Usually, the Hartree-Fock equation is solved by the standard Galerkin approxi-

mation of the initial problem (5.1) posed in H1(R3) (see [78] for more details).

For a given finite basis set {gµ}1≤µ≤R0,gµ∈H1(R3), the molecular orbitals ϕi

are expanded as

ϕi=

µ=1

cµigµ, i = 1, ..., Norb.(5.6)

To derive the equation for the unknown coefficients matrix C={cµi} ∈ RR0×Norb ,

one first introduces the mass (overlap) matrix S={sµν}1≤µ,ν≤R0, given by

sµν =ZR3

gµgνdx,

and the stiffness matrix H={hµν}of the core Hamiltonian part of the Fock

operator, H=−1

2∆ + Vc, that includes the kinetic energy of electrons and the

nuclear potential energy,

hµν =1

2ZR3∇gµ·∇gνdx +ZR3

Vc(x)gµ(x)gν(x)dx, 1≤µ, ν ≤R0.

The nonlinear terms representing the Galerkin approximation of the Hartree and

exchange operators are usually constructed by using the so-called two-electron

convolution integrals (cf. for example [78]), defined as

bµν,κλ =ZR3ZR3

gµ(x)gν(x)gκ(y)gλ(y)

kx−ykdxdy, 1≤µ, ν, κ, λ ≤R0.

Traditionally, the Hartree and exchange parts in the Fock operator are computed

by the analytical evaluation of one- and two-electron integrals, using the natu-

rally separable Cartesian Gaussians gk,k= 1,...,R0, represented in the rank-1

canonical tensor product form,

gk(x) = g(1)

k(x1)g(2)

k(x2)g(3)

k(x3), x ∈R3,

122

5.1 Galerkin scheme for the Hartree-Fock equation

with 1,2,3 designating the space dimensions.

Denoting by D= 2CCT∈RR0×R0the so-called density matrix, the R0×R0

Galerkin matrices J(C) for the Hartree, and K(C) for the exchange potentials,

can be defined as (cf. [78])

J(C)µν =

κ,λ=1

bµν,κλDκλ, K(C)µν =1

κ,λ=1

bµλ,νκDκλ.

The complete Fock matrix F=F(C) is then given by,

F(C) = H+G(C), G(C) = J(C) + K(C).(5.7)

The respective Galerkin system of nonlinear equations for the coefficients matrix

C∈RR0×Norb takes the form

F(C)C=SCΛ,Λ = diag(λ1,...,λNorb),(5.8)

CTSC =INorb ,

where the second equation represents the orthogonality constraints RR3ϕiϕj=δij,

with INorb being the Norb ×Norb identity matrix.

Since the traditional implementation of the SCF iteration for the Hartree-Fock

equation is based on the precomputed two-electron integrals, the complexity to

build up the matrix Gnormally scales as O(R4

0), that is dominated by computa-

tional cost for the exchange matrix K(C). In turn, the core Hamiltonian Hcan

be precomputed in O(R2

0) operations, hence, in the following, we focus on the

efficient Galerkin approximation of the nonlinear Hartree and exchange operators

to be updated at each step of SCF iteration.

The nonlinear system (5.8) can be solved by a certain SCF iteration, where

at each iterative step the respective linear eigenvalue problem has to be solved

with the updated matrix G(C). Given F(C), using the direct diagonalization for

solving the system (5.8) leads to the cost O(R3

0) at each iterative step.

An alternative approach can be based on the direct minimization of the Hartree-

Fock energy functional

IHF = inf (1

Norb

i=1 ZR3|∇ϕi|2+ZR3

ρVc+ZR3ZR3

ρ(x)ρ(y)−|τ(x, y)|2

kx−ykdxdy),

over ϕi∈H1(R3), under the orthogonality constraints in (5.1), see [95] for more

details.

123

5 Solution of the Hartree-Fock equation by multilevel TS methods

5.1.3 Novel scheme via agglomerated tensor-structured

calculation of Galerkin matrices

We introduce the novel tensor-structured method for the numerical solution of the

nonlinear Hartree-Fock equation, which does not require the analytical evaluation

of the two-electron integrals described in Section 5.1.2. In our approach, we use

the grid-based agglomerated calculation of the three- and six-dimensional inte-

grals in evaluation of the Hartree and exchange operators and the corresponding

Galerkin matrices J(C) and K(C).

The idea is based on a certain reorganisation of the standard computational

scheme described in Section 5.1.2. Specifically, instead of precomputing the full

set of two-electron integrals bµν,κλ and the elements of the density matrix D, we

use agglomerated representations for J(C) and K(C).

Now, let us recall some constructions from Section 4.2. We suppose that the

initial problem is posed in the finite volume box Ω = [−b, b]3∈R3subject to

the homogeneous Dirichlet boundary conditions on ∂Ω. For given discretization

parameter n∈N, introduce the equidistant tensor grid ω3,n with the mesh-size

h= 2b/n, as in (4.6). Then define the set of piecewise constant basis functions

{φi},i∈ I := {1, ..., n}3, associated with the respective grid-cells in ω3,n (in-

dicator functions), and introduce the set of collocation discretisations of GTO

basis functions, {gk}, represented in this basis by the rank-1 coefficients tensor

{Gk} ∈ RI(k= 1, ..., R0). The projected Newton potential is represented in the

basis set {φi}by the low-rank coefficients tensor PN∈RI.

Now the computational scheme for the Coulomb matrix can be written by the

following tensor operations:

ρ≈Θ :=

Norb

a=1 R0

κ,λ=1

cκacλaGκ⊙Gλ!,

and then by the tensor-product convolution (see Section 4.2),

VH=ρ∗1

k·k ≈Θ∗PN,(5.9)

with PN∈Vnbeing the projection tensor for the Coulomb potential. This implies

the tensor representation of the Coulomb matrix,

J(C)µν =hgµ(x)gν(x), VH(x)i

≈ hGµ⊙Gν,Θ∗PNi,1≤µ, ν ≤R0.(5.10)

We have rank(Gµ) = 1, while 3rd order tensors Θ and PNare approximated

by low-rank tensors (see Section 4). Hence, the corresponding tensor operations

124

5.1 Galerkin scheme for the Hartree-Fock equation

are carried out using fast multilinear algebra supplemented by the corresponding

rank optimization (tensor truncation).

In turn, as proposed in Section 4.5, we represent the matrix K(C) using tensor

operations by the following three loops. For a= 1, ..., Norb, and ν= 1, ..., R0,

compute the convolution integrals,

Waν(x) = ZR3

gν(y)

m=1

cmagm(y)

kx−ykdy

≈Υaν := "Gν⊙

m=1

cmaGm#∗PN,(5.11)

and then the scalar products (µ, ν = 1, ..., R0),

Kµν,a =ZR3"R0

m=1

cmagm(x)#gµ(x)Waν(x)dx

≈χµν,a := h"R0

m=1

cmaGm#⊙Gµ,Υaνi.(5.12)

Finally, the entries of the exchange matrix are given by sums over all orbitals,

K(C)µν =

Norb

a=1

Kµν,a, µ, ν = 1, ..., R0.(5.13)

This scheme gains from the efficient low-rank separable approximation of the

Newton kernel, the discretised electron density ρ(x), and of auxiliary poten-

tials Waν(x) at step (5.11), that ensures low complexity of the three-dimensional

tensor-structured operations including rank reduction algorithms.

Notice that the effective realization of such a concept is possible with more

general basis sets {gµ}, than those generated by analytically separable rank-1

GTO basis chosen above. The key features of the Galerkin basis functions gµ

should include: (A) approximability, i.e., the Galerkin approximation error over

the quantities in (5.8) is satisfactory, and (B) separability, i.e., the collocation

coefficients tensor Gµfor the basis function gµ(x), can be represented by the

rank-RGexpansion with small RG,

Gµ=

k=1

g(1)

µ,k ⊗g(2)

µ,k ⊗g(3)

µ,k, µ = 1, ..., R0,(5.14)

(see [70] for more detailed discussion on this topic). In this case, the tensor-

product schemes as above remain essentially the same except that the rank of the

collocation coefficients tensor Gµ(µ= 1, ..., R0) increases to RG.

125

5 Solution of the Hartree-Fock equation by multilevel TS methods

5.2 Multilevel tensor-truncated iteration via DIIS

5.2.1 General SCF iteration

The standard SCF algorithm can be formulated as the following “fixed-point”

iteration ([78]): Starting from initial guess C0, perform iterations of the form

FkCk+1 =SCk+1Λk+1,Λk+1 = diag(λk+1

1, ..., λk+1

Norb ) (5.15)

k+1SCk+1 =INorb ,

where the current Fock matrix Fk= Φ(Ck, Ck−1,...,C0), k= 0,1, ..., is specified

by the particular relaxation scheme. For example, for the simplest approach,

called the Roothaan algorithm, one has Fk=F(Ck). In practically interesting

situations this algorithm usually leads to “flip-flop” stagnation [78].

Recall that here, λk+1

1≤λk+1

2≤... ≤λk+1

Norb are Norb negative eigenvalues of the

linear generalized eigenvalue problem

FkU=λSU, (5.16)

and the R0×Norb matrices Ck+1 contain the respective Norb orthonormal eigen-

vectors U1, ..., UNorb . We denote by Ck+1 ∈RR0×R0the matrix representing the

full set of orthogonal eigenvectors in (5.16).

We use the particular choice of Fk,k= 0,1, ..., via the DIIS-algorithm, (cf.

[89]), with the starting value F0=F(C0) = H.

We propose the modification to the standard DIIS iteration, by carrying out

the iteration on a sequence of successively refined grids with the grid-dependent

stopping criteria. The multilevel implementation provides robust convergence

from the zero initial guess for the Hartree and exchange operators. The coarse-

to-fine grids iteration, in turn, accelerates the solution process dramatically due

to low cost of the coarse grid calculations.

The principal feature of our tensor-truncated iteration is revealed on the fast

update of the Fock matrix F(C) by using tensor-product multilinear algebra of

3-tensors accomplished with the rank truncation described above. Another im-

portant point is that the multilevel implementation provides simple and robust

scheme for construction good initial guess on the fine grid-levels.

5.2.2 SCF iteration by using DIIS scheme

For each fixed discretization, we use the original version of DIIS scheme (cf. [53]),

defined by the following choice of the residual error vectors (matrices),

Ei:= [CT

i+1F(Ci)Ci+1]|{1≤µ≤Norb;Norb+1≤ν≤R0}∈RNorb×(R0−Norb),(5.17)

126

5.2 Multilevel tensor-truncated iteration via DIIS

for iteration number i= 0,1, ..., k, that should vanish on the exact solutions of

the Hartree-Fock Galerkin equation due to the orthogonality property. Hence,

some stopping criterion applies to residual error vector Eifor i= 0,1,2, ....

The minimizing coefficient vector c:= (c0, ..., ck)T∈Rk+1 is computed by

solving the constrained quadratic minimisation problem for the respective cost

functional (the averaged residual error vector over previous iterands),

f(c) := 1

2



i=0

ciEi



F≡1

2hBc, ci → min,provided that

i=0

ci= 1,

where

B={Bij}k

i,j=0 with Bij =hEi, Eji,

and with Eidefined by (5.17). Introducing the Lagrange multiplier ξ∈R, the

problem is reduced to minimization of the Lagrangian functional

L(c, ξ) = f(c)−ξ(h1, ci−1),

where 1= (1, ..., 1)T∈Rk+1, that leads to the linear augmented system of equa-

tions

Bc−ξ1= 0,(5.18)

h1,ci= 1.

Finally, the updated Fock operator Fkis built up by

Fk=

k−1

i=0

copt

iFi+copt

kF(Ck), k = 0,1,2, ..., (5.19)

where the minimizing coefficients copt

i=ci(i= 0,1, ..., k) solve the linear system

(5.18). For k= 0 the first sum in (5.19) is assumed to be zero, hence providing

copt

0= 1, and F0=F(C0).

Recall that if the stopping criterion on Ck,k= 1, ..., is not satisfied, one updates

Fkby (5.19) and solves the eigenvalue problem (5.15) for Ck+1.

Note that in practice one can use the averaged residual vector only on a reduced

subsequence of iterands, Ek, Ek−1, ..., Ek−k0,k−k0>0. In our numerical examples

below, we usually set k0= 4.

5.2.3 Unigrid and multilevel tensor-truncated DIIS iteration

In this section, we describe the resultant numerical algorithm. Recall that the

discrete nonlinear Fock operator is specified by a matrix

F(C) = H+J(C) + K(C),(5.20)

127

5 Solution of the Hartree-Fock equation by multilevel TS methods

where Hcorresponds to the core Hamiltonian (fixed in our scheme) and the

discrete Hartree and exchange operators are given by tensor representations (5.10)

and (5.12), respectively.

First, we describe the unigrid tensor-truncated DIIS scheme.

Algorithm TT DIIS (Unigrid tensor-truncated DIIS iteration).

1. Given the core Hamiltonian matrix H, the grid parameter n, and the termi-

nation parameter ε > 0.

2. Set C0= 0 (i.e. J(C0) = 0,K(C0) = 0), and F0=H.

3. For k= 0,1, ..., perform

a) Solve the full linear eigenvalue problem of size R0×R0, given by (5.16), and

define Ck+1 as the matrix containing the Norb eigenvectors corresponding to

Norb minimal eigenvalues.

b) Terminate the iteration by checking the stopping criterion

kCk+1 −CkkF≤ε.

c) If kCk+1 −CkkF> ε, compute the Fock matrix

F(Ck+1) = H+J(Ck+1) + K(Ck+1)

by the tensor-structured calculations of J(Ck+1)and K(Ck+1), using grid-

based basis functions with expansion coefficients Ck+1, (see Section 4), up-

date the Fock matrix Fk+1 by (5.19), and switch to Step a).

4. Returns: Eigenvalues λ1, ..., λNorb and eigenvectors C∈RR0×Norb.

Figure 5.1 shows the convergence of Algorithm TT DIIS for the solution of the

Hartree-Fock equation in the pseudopotential case of CH4. It demonstrates that

the convergence history is almost independent on the grid size on the examples

with n= 64 and n= 256, correspondingly.

To enhance the unigrid DIIS iteration, we propose the multilevel version of Al-

gorithm TT DIIS defined on a sequence of discrete Hartree-Fock equations spec-

ified by a sequence of grid parameters np=n0,2n0,...,2Mn0, with p= 0, ..., M,

corresponding to the succession of dyadically refined spacial grids. To that end,

for ease of exposition, we also introduce the incomplete version of Algorithm

TT DIIS, further called Algorithm TT DIIS(k), which represents only its part

starting from iteration number k=k≥1. The input data for Algorithm

TT DIIS(k) include the current approximation Ckand a sequence of all already

precomputed Fock matrices, F0, F1, ..., Fk−1.

We sketch this algorithm as follows.

128

5.2 Multilevel tensor-truncated iteration via DIIS

Algorithm TT DIIS(k)(Incomplete unigrid tensor-truncated DIIS iteration).

1. Given the core Hamiltonian matrix H, the grid parameter n, the termination

parameter ε > 0,Ck, and a sequence of Fock matrices F0, F1, ..., Fk−1.

2. Compute J(Ck),K(Ck),F(Ck) = H+J(Ck) + K(Ck), and Fkby (5.19).

3. For k=k, k + 1, ..., perform steps a) - c) in Algorithm MTT DIIS.

We further assume that the core Hamiltonian His precomputed beforehand.

Algorithm MTT DIIS (Multilevel tensor-truncated DIIS scheme).

1. Given the core Hamiltonian matrix H, the coarsest grid parameter n0, the

termination parameter ε0>0, and the number of grid refinements M.

2. For p= 0, apply the unigrid Algorithm TT DIIS with n=n0,εp=ε0, and

return the number of iterations k0, matrix Ck0+1, and a sequence of Fock matrices

F0, F1, ..., Fk0.

3. For p= 1, ..., M, apply successively Algorithm TT DIIS(kp−1+ 1), with the

input parameters np:= 2pn0,εp:= ε02−2p,Ckp−1+1. Keep continuous numbering

of the DIIS iterations through all levels, such that the maximal iteration number

at level pis given by

kp=

p=0

mp,

with mpbeing the number of iterative steps at level p.

4. Returns: kM,CkM+1, and a sequence of Fock matrices F0, F1, ..., FkM.

Usually, we start calculations on an n×n×n3D Cartesian grid, with n0= 64,

and end up with maximum nM= 8192, for all electron case computations, or

nM= 1024, for the pseudopotential case. Further, in Section 5.3, we show by

numerical examples that in large scale computations the multilevel Algorithm

MTT DIIS allows us to perform most of the iterative steps on coarse grids thus

reducing dramatically the computational cost and, at the same time providing

a good initial guess for the DIIS iteration on nonlinearity at each consequent

approximation level.

The flow-chart in Figure 5.2 shows the general steps of the unigrid/multigrid

tensor-structured SCF algorithms.

5.2.4 Complexity estimates in terms of R0,Norb and n

The rest of this section addresses the complexity estimate of the multilevel tensor-

truncated iteration in terms of RN,R0,nand other governing parameters of the

algorithm. For the ease of discussion we suppose that rank(Gµ) = 1, µ= 1, ..., R0,

(see [70] concerning the more detailed discussion on general case of rank(Gµ)≥1).

129

5 Solution of the Hartree-Fock equation by multilevel TS methods

2 4 6 8 10 12 14 16 18 20

10−5

10−4

10−3

10−2

10−1

100

iterations

λ1, n1=64

5 10 15 20 25

10−5

10−4

10−3

10−2

10−1

100

iterations

the residual (λn − λn−1)

Figure 5.1: Convergence in the eigenvalues for the unigrid TT DIIS algorithm,

in the pseudopotential case of the CH4molecule: the univariate grid

size n= 64 (left) and n= 256 (right).

Lemma 5.1 Let rank(Gµ) = 1,µ= 1,...R0and rank(PN)≤RN≤CNorb.

Suppose that the rank reduction procedure applied to the convolution products Υaν

in (5.11) provides the rank estimate rank(Υaν)≤r0. Then the numerical cost of

one iterative step in Algorithm MTT DIIS at level p, can be bounded by

Wp=O(R0RNnplog np+R3

0r0Norbnp).

Assume that the number of multigrid DIIS iterations at each level is bounded by

the constant I0, then the total cost of Algorithm MTT DIIS does not exceed the

double cost at the finest level n=nM,2WM=O(I0R3

0r0Norbn).

Proof: The rank bound rank(Gk) = 1 implies rank(

m=1

cma Gm)≤R0. Hence,

the numerical cost to compute the tensor-product convolution Υaν in (5.11)

amounts to

W(Υaν) = O(R0RNnplog np).

Since the initial canonical rank of Υaν is estimated by rank(Υaν)≤R0RN, the

multigrid rank reduction algorithm, having linear scaling in rank(Υaν), see Sec-

tion 3, provides the complexity bound O(r0R0RNnp). Hence the total cost to

compute scalar products in χµν,a (see (5.12)) can be estimated by

W(χµν,a) = O(R3

0r0Norbnp),

which completes the first part of our proof. The second assertion is due to linear

scaling of the unigrid algorithm in np, that implies the following bound

n0+ 2n0+... + 2pn0≤2p+1n0= 2nM,

130

5.3 Numerical illustrations

J=0, K=0

0discretize GTOs

ε=εset 04−p

F=H

0FC = λS C

p=0 0

,,ε0

compute ρ, VH,J,

λF C =

make DIIS

) = C

||del(C

n= 2pn

p=p+1

pyes

,k=1

kKk

k+1 k+1 k+1

if k>k

k=k+1

< M

k+1

del(C − Ck

)|| < ε

Figure 5.2: The flow-chart of the tensor-structured SCF algorithm.

hence, completing the proof.

Remark 5.2 In the case of large molecules and RG=rank(Gµ)≥1, further

optimisation of the algorithm up to O(RNR2

0np)-complexity may be possible on

the base of rank reduction applied to the rank-RGR0orbitals and by using an

iterative eigenvalue solver instead of currently employed direct solver via matrix

diagonalization, or by using direct minimization schemes [95].

5.3 Numerical illustrations

5.3.1 General discussion

Our algorithm for ab initio solution of the Hartree-Fock equation in tensor-

structured format is examined numerically on some moderate size molecules. In

particular, we consider the all electron case of H2O, and the case of pseudopo-

tential of CH4and CH3OH molecules. In the present numerical examples, we

use the discretised GTO basis functions for reasons of convenient comparison of

the results with the output from the standard MOLPRO package based on the

analytical evaluation of the integral operators in the GTO basis.

131

5 Solution of the Hartree-Fock equation by multilevel TS methods

The size of the computational box [−b, b]3introduced in Section 5.1.3 varies

from 2b= 11.2◦

Afor H2O up to 2b= 16 ◦

Afor small organic molecules. The

smallest step-size of the grid h= 0.0013 ◦

Ais reached in the SCF iterations for

the H2O molecule, using the finest level grid with n= 8192, while the average step

size for the computations using the pseudopotentials for small organic molecules

is about h= 0.015 ◦

A, corresponding to the grid size n= 1024.

5.3.2 Multilevel tensor-truncated SCF iteration applied to

some moderate size molecules

We solve numerically the ab initio Hartree-Fock equation, by using Algorithms

TT DIIS and MTT DIIS presented in Section 5.2.3. Starting with the zero

initial guess for matrices J(C) = 0 and K(C) = 0 in the Galerkin Fock matrix

(5.7), the eigenvalue problem at the first iterative step (p= 0) is solved by using

only the Hpart of the Fock matrix in (5.7), that does not depend on the solution,

and hence, can be precomputed beforehand.

Thus, the SCF iteration starts with the expansion coefficients cµi for orbitals in

the GTO basis, computed using only the core Hamiltonian H. At every iteration

step, the Hartree and exchange potentials and the corresponding Galerkin matri-

ces, are computed using the updated coefficients cµi. The renewed Coulomb and

exchange matrices generate the updated Fock matrix to be used for the solution

of the eigenvalue problem. The minimization of the Frobenius norm of the virtual

block of the Fock operator evaluated on eigenvectors of the consequent iterations,

Ck, Ck−1, ..., is utilized for the DIIS scheme.

The multilevel solution of the nonlinear eigenvalue problem (5.8) is realised via

the SCF iteration on a sequence of uniformly refined grids, beginning from the

initial coarse grid, say, with n0= 64, and proceeding on the dyadically refined

grids, np=n02p,p= 1, ..., M. We use the grid dependent termination criterion

εnp:= ε02−2p, keeping a continuous numbering of the iterations.

Figure 5.6 (left) shows the convergence of the iterative scheme in the case of

pseudopotential of CH4. Convergence in the total Hartree-Fock energy reaching

the absolute error 9·10−6on the grid size n= 1024 is shown in Figure 5.6 (right).

The total energy is calculated by

EHF = 2

Norb

a=1

λa−

Norb

a=1 e

Ja−e

Ka

with e

Ja=hψa, VHψaiL2, and e

Ka=hψa,VexψaiL2, being the so-called Coulomb and

exchange integrals, respectively, computed in the orbital basis ψa(a= 1, ..., Norb).

132

5.3 Numerical illustrations

5 10 15

10−6

10−4

10−2

100

iterations

residual in HF EVP, all electron case H2O

128 256 512 1024 2048 4096 8192

10−4

10−3

10−2

10−1

100

101

univariate grid size

Figure 5.3: Multilevel convergence of the DIIS iteration applied to the all electron

case of H2O (left), and convergence in the energy in n(right).

Figure 5.4 (left) shows the linear scaling in n, corresponding to the CPU time of

one iteration. Figure 5.4 (right) shows the number of “effective” iterations counted

by rescaling the total computational time to the iteration time-unit observed at

each iterative step at the finest grid-level: one iteration at level pis counted with

the factor 2p−M. Figure 5.3 (left) shows convergence of the SCF iteration for the

200 400 600 800 1000

univariate grid size

minutes

time per SCF iteration

01234

10−4

10−3

10−2

10−1

100conv.in eff.iterations, CH4, pseudo, n=512

Figure 5.4: Linear scaling in n(left) and convergence in the effective iterations

(right).

all electron case of H2O. This challenging problem is solved efficiently due to the

usage of large 3D Cartesian grids up to the volume size Nvol = 81923. Figure 5.3

(right) shows convergence of the HF energy for the corresponding grid levels.

133

5 Solution of the Hartree-Fock equation by multilevel TS methods

Figure 5.7 presents the convergence history of the nonlinear SCF iteration for

CH3OH (top) and C2H5OH (bottom) molecules. Red and blue lines show the

convergence in the residual error, and in the largest eigenvalue, respectively.

5 10 15 20

10−3

10−2

10−1

100

iterations

EVP CH3OH , residual, nmax=2048

1 2 3 4 5 6

10−4

10−3

10−2

10−1

100

101

grid levels (level 6−>n=2048)

ENERGY CH3OH

Figure 5.5: Iteration history for CH3OH molecule: Residual (left) and the energy

(right).

5.3.3 Conclusions to Section 5

We present the grid-based tensor-truncated numerical method for the robust and

accurate iterative solution of the nonlinear Hartree-Fock equation at the cost

O(N1/3

vol ) in the volume size Nvol =n3. The computational scheme is based on the

discrete tensor representation of the Fock operator over the 3D Cartesian grid at

each step of the multilevel SCF iteration applied to the nonlinear 3D eigenvalue

problem. The storage request is roughly estimated by O(N1/3

vol ).

This scheme is neither restricted to the analytically separable basis functions

like GTO orbitals nor to the traditional plane waves approximations. The Galerkin

basis can be modified by adapting to the particular problem in the framework of

the tensor-structured solution process.

Further improvement of the algorithm toward the O(log n)-complexity on the

base of quantics-TT approximation [87, 83, 88, 64, 84, 71], may open new per-

spectives for efficient ab initio numerical simulation of complex molecules and for

the FEM-DFT computations of large molecular clusters.

The main computational blocks of the numerical scheme allow the natural par-

allelization on the level of matrix elements computation, rank decompositions,

and the multilinear tensor operations.

134

5.3 Numerical illustrations

2 4 6 8 10 12

10−4

10−3

10−2

10−1

100

iterations

P. CH4 abs. error , |λ − λn,it |

n=64 n=128 n=256

n=512

n=1024

12345

10−6

10−4

10−2

100

univariate grid levels

HF energy, HF pseudo, n5=1024

Figure 5.6: Multilevel convergence for the pseudopotential of CH4molecule (top),

and convergence of the HF energy in the grid levels (bottom). Level

1 corresponds to n= 64, while on the finest level 5 we have n= 1024.

135

5 Solution of the Hartree-Fock equation by multilevel TS methods

5 10 15 20 25 30

10−5

10−4

10−3

10−2

10−1

100

101

iterations

DeltaLam1

residual

5 10 15 20 25 30 35

10−5

10−4

10−3

10−2

10−1

100

101

iterations

DelLam1

residual

n=64

n=128

n=256

n=512

Figure 5.7: SCF iterations for the pseudopotentials of CH3OH, with the maximum

grid level n= 2048 (top) and C2H5OH with the maximum grid level

n= 512 (bottom).

136

6 Summary of main results

6.1 Brief summary

In the present dissertation we developed and analysed tensor numerical methods

and algorithms for efficient grid-based evaluation of multivariate functions and

integral operators. These methods are successfully applied to the solution of the

nonlinear Hartree-Fock equation in quantum chemistry. Here we present the sum-

mary of main results.

1. The developed tensor methods and algorithms.

•Mixed (two-level) best Tucker approximation (BTA) algorithm applied to

the rank-Rcanonical input, combined with the rank reduction algorithm

applied to the small-size rank-RTucker core.

•Reduced HOSVD algorithm (RHOSVD) of complexity O(dR2n), for fast

computation of the initial orthogonal vectors in the ALS iteration imple-

menting the mixed BTA.

•The multigrid accelerated (MGA) mixed best Tucker approximation method

based on the enhanced ALS iteration1applied to large rank-Rcanonical n×

n×n-tensors represented on a sequence of refined grids (O(Rrn) complexity,

where ris the maximal Tucker rank). In the case of a full format target

tensor the complexity of MGA BTA algorithm is linear in the volume size,

O(n3), instead of O(n4) for HOSVD;

•Accurate and fast rank-structured tensor computation of the Hartree po-

tential, with O(Rn log n) complexity, where Ris the rank of the electron

density. Main ingredients are the MGA BTA of the agglomerated electron

density, and the tensor-product convolution with the low-rank approximand

to the Newton potential. The computational complexity of the respective

1The MGA-ALS iteration has a three-fold gain: avoids the problem to compute HOSVD on

large-size tensors, provides good initial guess for ALS corrections, resulting in the only few

ALS iterations at each approximation level, and allows to optimize the ℓ-mode orthogonal

subspaces only on the reduced set of “most important fibers”.

137

6 Summary of main results

Coulomb matrix is proportional to R2

0n, where R0is the number of Galerkin

basis functions.

•Accurate tensor-structured computation of the Hartree-Fock exchange us-

ing agglomerated orbitals and applying enhanced MLA via the window-

ing and rank reduction techniques, providing the asymptotical complexity

O(R3

0nlog n);

•The multilevel tensor-truncated DIIS-based iterative method for the numer-

ical solution of the nonlinear Hartree-Fock equation disretised on n×n×n-

grid, that provides almost linear scaling in n,O(nlog n) (a “grey-box” al-

gorithm). The multilevel scheme has a four-fold effect: fast convergence

with zero starting values for the Coulomb/exchange matrices, good initial

guess at each grid level, improved approximation by the Richardson extrap-

olation, considerable reduction of the number of recomputed entries in the

exchange matrix by using filtering strategy on fine grid levels. Notice that

the existing benchmark grid-oriented solvers for the Hartree-Fock equation,

based on the full-grid representation, scale at least linear in the volume size,

O(n3);

•A 3D nonlinear EVP solver is implemented in MATLAB and verified (com-

pared with MOLPRO output) in ab initio solution of the Hartree-Fock equa-

tion for particular molecules, H2O (all electron case), and CH4, CH3OH,

C2H5OH (pseudopotential case).

2. Analysis of algorithms.

Error and complexity analysis of presented methods include the following results.

•Theorem 2.14 on the error bound of RHOSVD in Algorithm C2T applied

to a canonical tensor;

•Theorem 3.1 on the complexity of the MGA Algorithm C2T for a canonical

input tensor;

•Lemma 3.2 on the complexity of the MGA Algorithm F2T for full format

tensors.

•Lemma 4.2 on the complexity of the tensor-structured computational scheme

for the Hartree-Fock exchange matrix.

•Lemma 5.1 on the complexity of the multilevel tensor-truncated iterative

solver for the Hartree-Fock equation.

138

6.1 Brief summary

3. Numerical results.

All algorithms are implemented in MATLAB and verified by extensive numeri-

cal experiments in electronic structure calculations that confirm the theoretical

performance of the algorithms, e.g., the almost linear scaling in all important

rank and grid-discretization parameters. These algorithms enable using large 3D

Cartesian grids with the benchmark sizes up to Nvol = 163843, which ensures high

approximation accuracy (resolution) for an arbitrary orientation of a molecule in

the computational box.

4. Interesting observations based on numerical experiments.

In the numerical experiments, we found out that for the Tucker-type approxima-

tion of function related tensors (corresponding to the classical potentials, electron

densities, and Hartree potentials):

(a) the error of the Tucker approximation decays exponentially with respect to

the Tucker rank (known theoretical result for the Newton, Yukawa and Helmholtz

potentials);

(b) the orthogonal vectors of the decomposition are of special shape, that is al-

most independent of the discretization step and resolves the peculiarities of the

considered function;

character (up to certain threshold);

(d) the Tucker rank of the almost periodic 3D structures weakly depends on the

number of cells;

(e) the ALS Tucker iteration usually demonstrates fast and robust local conver-

gence (likely due to exponential decay of the Tucker approximation error, see item

(a)).

This research project addresses many new and challenging mathematical and

algorithmic problems to be considered in the future. These problems are related

to the developing efficient solution methods that are free of the “curse of dimen-

sionality”, to be applied to high-dimensional equations arising in modern scientific

computing.

139

6 Summary of main results

6.2 Presentations

The author presented the following talks (and posters) on the topics of disserta-

tion.

1. Numerical Solution of the Hartree-Fock Equation in the Multilevel Tensor-

Structured Format.

The 16-th Conference of the International Linear Algebra Society (ILAS).

Pisa, Italy, June 21-25, 2010.

2. Numerical Solution of the Hartree-Fock Equation by the Tensor-Structured

Methods. (Poster).

First Principles in Quantum Chemistry: From Elementary Reactions to

Enzyms. Bad Herrenalb, Germany, April 14-17, 2010.

3. Tensor-Structured Methods in Electronic Structure Calculations.

GAMM 2010. 81-st Annual Meeting of the International Association of

Applied Mathematics and Mechanics. Contributed Session S16. Karlsruhe

Germany, March 22-26, 2010.

4. Numerical Solution of the Hartree-Fock Equation by the Multilevel Tensor-

Structured Methods.

26-th GAMM-Seminar Leipzig on Tensor Approximations and High-Dimensional

Problems. Max-Planck-Institute for Mathematics in the Sciences, Leipzig,

22-24.02.2010.

5. Numerical Solution of the Hartree-Fock Equation by the Tensor-Structured

Techniques.

Invited talk at the Seminar of the Numerical Analysis Group. Mathematics

Institute, University of T¨ubingen, 22.10.2009.

6. Tensor Product Approximation of the Hartree and Exchange Operators in

the Hartree-Fock Equation.

The eighth European Conference on Numerical Mathematics and Advanced

Applications, ENUMATH 2009, Uppsala, Sweden, 29.06-3.07.2009.

7. Towards Solution of the Hartree-Fock Equation by the Tensor-structured

Methods.

Berlin-Leipzig Seminar, TU Berlin, May 18, 2009.

8. Accurate Solution of the Hartree-Fock Equation by the Tensor-structured

Methods.

140

6.2 Presentations

Evaluation Poster, Max-Planck-Institute for Mathematics in the Sciences,

Leipzig, April 28-30, 2009.

9. Efficient Tensor-product Approximation of the Hartree and Exchange Po-

tentials in the Hartree-Fock Equation.

Saxonian Theoretical Seminar, Theoretical Methods for Complex molecular

systems, Wilhelm-Ostwald Institute of Physical and Theoretical Chemistry,

University of Leipzig, February 26-27, 2009.

10. Multigrid Accelerated Tensor Approximation in Electronic Structure Calcu-

lations.

Workshop: Numerical Methods in Density Functional Theory, DFG Re-

search Center MATHEON, TU Berlin, 23-25 July 2008.

11. Multigrid Accelerated Tensor Approximation in 3D Electronic Structure Cal-

culations.

Max-Planck-Institute for Mathematics in the Sciences, Leipzig, July 15,

2008.

12. (with H.-J. Flad, B. Khoromskij and S.R. Chinnamsetty) Tensor Decompo-

sition in Electronic Structure Calculations on 3D Cartesian Grids.(Poster).

CompPhys07, 8th NTZ-Workshop on Computational Physics, University of

Leipzig, 29 November-01 December 2007.

141

6 Summary of main results

142

7 Appendix

7.1 Singular value decomposition and the best

rank-kapproximation of a matrix

Here we recall the standard theorem of the numerical analysis on the singular

value decomposition (SVD) [100].

Theorem 7.1 Let A∈Rm×n, with m≥n, for definiteness. Then there exist

U∈Rm×n,Σ∈Rn×nand V∈Rn×nsuch that

A=UΣVT,(7.1)

where Σis a diagonal matrix whose diagonal entries, σi,i= 1,2,...,n, are the

ordered singular values of A,σ1≥σ2≥... ≥σn≥0and UTU=Imand

VTV=In, with Indenoting the n×nidentity matrix.

The algebraic complexity of the SVD transform scales as O(mn2).

The best approximation of an arbitrary matrix A∈Rm×nby a rank-kmatrix

Ak(say, in Frobenius norm, that is kAk2

F=P

(i,j)∈m×n

ij) can be calculated by the

truncated SVD as follows. Let us consider the SVD of a matrix A=UΣVT, and

set Σk= diag{σ1,...σk,0,...,0}, then the best rank-kapproximation is given by

Ak:= UΣkVT=

i=1

σiuivT

where ui, viare the respective left and right singular vectors of A. The approxi-

mation error in the Frobenius norm is bounded by

kAk−AkF≤v

i=k+1

σ2

i.(7.2)

7.2 Reduced SVD of a rank-Rmatrix

Let us consider a rank-Rmatrix M=ABT∈Rn×n, with the factor matrices

A∈Rn×Rand B∈Rn×R, where R≤n. We are interested in the best rank

143

7 Appendix

rapproximation of M, with r < R. It can be implemented using the following

algorithm, that avoids the singular value decomposition of the target matrix M

with possibly large n.

This algorithm includes the following steps.

1. Perform the QR-decomposition of the side matrices,

A=QARA, B =QBRB,

with the unitary matrices QA, QB∈Rn×R, and the upper triangular matri-

ces RA, RB∈RR×R.

2. Compute the SVD of the core matrix, RART

B∈RR×R

RART

B=UΣVT,

with the diagonal matrix Σ = diag{σ1,...,σR}, and unitary matrices U, V ∈

RR×R.

3. Compute the best rank-rapproximation of the core matrix, UrΣrVT

r, by

extracting the submatrix Σr=diag{σ1,...,σr}in Σ, and the first rcolumns

Ur, Vr∈RR×rin the unitary matrices Uand V, respectively.

4. Finally, set the rank-rapproximation Mr=QAUrΣrVT

rQT

B, where QAUr

and QBVrare n×runitary matrices.

The approximation error is bounded by sR

i=r+1

σ2

i. The complexity of above

algorithm scales linearly in n,O(nR2) + O(R3). In the case R≪n, this reduces

dramatically the cost O(n3) of the truncated SVD applied to the full-format n×n

matrix M.

144

7.3 List of abbreviations

ALS alternating least squares (algorithm)

BTA best Tucker approximation

C2T (C→T) canonical to Tucker transform

C BTA BTA algorithm for the canonical target tensor

DIIS direct inversion of iterative subspaces

F2T (F→T) full format to Tucker transform

GBTA BTA algorithm for the full format tensor

HOSVD higher order singular value decomposition

MLA multilinear algebra

MGA multigrid accelerated BTA

MG C BTA multigrid accelerated BTA algorithm for the

canonical target tensor

MG G BTA multigrid accelerated BTA algorithm for the

full format tensor

MIFs most important fibers in MG C BTA

RHOSVD reduced higher order singular value decomposition

for the canonical target tensor

SCF self consistent field (iteration)

SVD singular value decomposition

T2C (T→C) Tucker to canonical transform

145

7 Appendix

146

Bibliography

[1] P.-A. Absil, R. Mahoni and R. Sepulchre. Optimization Algorithms on Matrix

manifolds. Princeton University Press, 2008.

[2] E. Acar, T. G. Kolda and D. M. Dunlavy. An Optimization Approach

for Fitting Canonical Tensor Decompositions. Technical Report Number

SAND2009-0857, Sandia National Laboratories, Albuquerque, NM and Liv-

ermore, CA, February 2009.

[3] J. Alml¨of, Direct methods in electronic structure theory. In D. R. Yarkony,

Ed., Modern Electronic Structure Theory, Vol. II, (World Scientific, Singa-

pore, 1995) pp. 110–151.

[4] L. Genovese, A. Neelov, S. Goedecker, T. Deutsch, S. A. Ghasemi, A. Wil-

land, D. Caliste, O. Zilberberg, M. Rayson, A. Bergman, R. Schneider.

Daubechies wavelets as a basis set for density functional pseudopotential cal-

culations. J. Chem. Phys. 129 (2008) 014109.

[5] X. Gonze, J.-M. Beuken, R. Caracas, F. Detraux, M. Fuchs, G.-M. Rignanese,

L. Sindic, M. Verstraete, G. Zerah, F. Jollet, M. Torrent, A. Roy, M. Mikami,

Ph. Ghosez, J.-Y. Raty, D. C. Allan. First-principles computation of material

properties: the ABINIT software project. Comput. Mater. Sci. 25 (2002) 478.

[6] B.W. Bader and T.G. Kolda. MATLAB Tensor Classes for Fast Algorithm

Prototyping. SANDIA Report, SAND2004-5187, Sandia National Laborato-

ries, 2004.

[7] J. Ballani and L. Grasedyck. A Projection Method to Solve Linear Systems

in Tensor Format. Preprint MIS MPI 22/2010, Leipzig.

[8] M. Bebendorf. Hierarchical matrices. A means to efficiently solve elliptic

boundary value problems. Lecture Notes in Computational Science and En-

gineering, 63. Springer-Verlag, Berlin, 2008.

[9] M. Bebendorf. Adaptive Cross Approximation of Multivariate functions. Con-

structive Approximation. 2010, to appear.

147

Bibliography

[10] C. Bertoglio, B. N. Khoromskij. Low rank tensor-product approximation of

projected Green kernels via sinc-quadratures. Preprint 79/2008, MPI MIS

Leipzig, 2008.

[11] G. Beylkin and M.J. Mohlenkamp. Numerical operator calculus in higher

dimensions. Proc. Natl. Acad. Sci. USA 99 (2002) 10246–10251.

[12] G. Beylkin and M.J. Mohlenkamp. Algorithms for numerical analysis in high

dimension. SIAM J. Scientific Computing, 26 (6) (2005) 2133-2159.

[13] G. Beylkin, M.J. Mohlenkamp and F. P´erez. Approximating a wavefunction

as an unconstrained sum of Slater determinants. Journal of Math. Physics,

49 032107 (2008).

[14] T. Blesgen, V. Gavini and V. Khoromskaia. Tensor Product Approximation

of the Electron Density of Large Aluminium Clusters in OFDFT. Preprint

66/2009 MPI MIS Leipzig, 2009.

[15] D. Braess, W. Hackbusch. Approximation of 1/x by exponential sums in

[1,∞).IMA J. Numer. Anal. 25 (2005).

[16] E. Canc´es and Le Bris. On the convergence of SCF algorithms for the Hartree-

Fock equations. ESAIM: M2AN, vol. 34/4, 2000, 749-774.

[17] J.D. Carrol and J. Chang. Analysis of individual differences in multidimen-

sional scaling via an N-way generalization of ’Eckart-Young’ decomposition.

Psychometrika 35 (1970), 283-319.

[18] J.D. Carrol, S. Pruzansky, and J.B. Kruskal. CANDELINC: A general

approach to multidimensional anlysis of many-way arrays with linear con-

straints on parameters. Psychometrika, 45 (1980), pp. 3-24.

[19] S.R. Chinnamsetty, M. Espig, W. Hackbusch, B.N. Khoromskij, H-J Flad,

Kronecker tensor product approximation in quantum chemistry. Journal of

Chemical Physics 127, 084110 (2007).

[20] P.G. Ciarlet and C. Le Bris, eds. Handbook of Numerical Analysis. Vol. X,

Computational Chemistry. Elsevier, Amsterdam, 2003.

[21] L. De Lathauwer. Signal processing based on multilinear algebra. PhD Thesis.

Katholeke Universiteit Leuven, 1997.

[22] L. De Lathauwer. Decomposition of a higher-order tensor in block terms. Part

II: Definitions and uniqueness. Tech. Report no. 07-81, ESAT/SCD/SISTA,

K.U. Leuven, Belgium, 2007.

148

Bibliography

[23] L. De Lathauwer, B. De Moor, J. Vandewalle. On the best rank-1and rank-

(R1, ..., RN)approximation of higher-order tensors. SIAM J. Matrix Anal.

Appl., 21 (2000) 1324-1342.

[24] L. De Lathauwer, B. De Moor, J. Vandewalle. A multilinear singular value

decomposition. SIAM J. Matrix Anal. Appl., 21 (2000) 1253-1278.

[25] T.H. Dunning Jr. Gaussian basis sets for use in correlated molecular calcu-

lations. I. The atoms boron through neon and hydrogen. J. Chem. Physics,

90 (1989), 1007-1023.

[26] L. Elden and B. Savas. A Newton-Grassmann method for computing the best

multilinear rank-(r1, r2, r3)approximation of a tensor. SIAM J. Matrix Anal.

and Appl., Volume 31, Issue 2, pp.248-271 (2009).

[27] H.-J. Flad, W. Hackbusch, B.N. Khoromskij, and R. Schneider. Concepts

of Data-Sparse Tensor-Product Approximation in Many-Particle Modeling.

In: Matrix Methods: Theory, Algorithms, Applications, (dedicated to the

memory of Gene Golub) V. Olshevsky, E. Tyrtyshnikov eds., World Scientific,

2010, pp.313-347.

[28] H.-J. Flad, W. Hackbusch, and R. Schneider. Best N-term approximation

in electronic structure calculations: I. One-electron reduced density matrix.

ESAIM: M2AN 40, 49-61 (2006).

[29] H.-J. Flad, B.N. Khoromskij, D.V. Savostyanov, and E.E. Tyrtyshnikov. Ver-

ification of the cross 3d algorithm on quantum chemistry data. J. Numer.

Anal. and Math. Modelling, 4 (2008), 1-16.

[30] H.-J. Flad, R. Schneider, B.-W. Schulze. Asymptotic regularity of solutions

to Hartree-Fock equations with Coulomb potential. Math. Methods Appl. Sci.,

31 (2008), no. 18, 2172-2201.

[31] S. Friedland, V. Mehrmann, A. Miedlar, and M. Nkengla. Fast low rank ap-

proximation of matrices and tensors. 2008, www.matheon.de/preprints/4903.

[32] V. Gavini, J. Knap, K. Bhattacharya, and M. Ortiz. Non-periodic Finite-

element formulation of orbital-free density-functional theory. J. Mech. Phys.

Solids 55(4) (2007), 669-696.

[33] V. Gavini, K. Bhattacharya, and M. Ortiz. Quasi-continuum orbital-free

density-functional theory: A route to multi-million atom non-periodic DFT

calculation. J. Mech. Phys. Solids 55(4) (2007), 697-718.

149

Bibliography

[34] I.P. Gavrilyuk, W. Hackbusch and B.N. Khoromskij. Hierarchical Tensor-

Product Approximation to the Inverse and Related Operators in High-

Dimensional Elliptic Problems. Computing 74 (2005), 131-157.

[35] I.P. Gavrilyuk, W. Hackbusch, and B.N. Khoromskij. Tensor-product ap-

proximation to elliptic and parabolic solution operators in higher dimensions.

Computing 74 (2005), 131-157.

[36] I. P. Gavrilyuk, W. Hackbusch and B. N. Khoromskij. Data-sparse approxi-

mation of a class of operator-valued functions. Math. Comp. 74 (2005), pp.

681-708.

[37] G.H. Golub, C.F. Van Loan. Matrix Computations. Johns Hopkins Univer-

sity Press, Baltimore, MD, 1996.

[38] L. Grasedyck. Hierarchical Singular Value Decomposition of Tensors. SIAM

J. Matrix Anal. Appl. 31:2029-2054, 2010.

[39] L. Grasedyck. Polynomial Approximation in Hierarchical Tucker Format by

Vector-Tensorization. Preprint 43 DFG 1324, submitted to BIT.

[40] M. Griebel and J. Hamaekers. Sparse grids for the Schroedinger equation.

M2AN, 41 (2007), pp. 215-247.

[41] W. Hackbusch. Hierarchische Matrizen: Algorithmen und Analysis. Springer

Verlag, Berlin 2009.

[42] W. Hackbusch. Fast and Exact Projected Convolution for Non-equidistant

Grids. Computing 80 (2007), 137-168.

[43] W. Hackbusch. Efficient convolution with the Newton potential in d dimen-

sions. Numerische Mathematik, 110 (2008) 4, p. 449-489.

[44] W. Hackbusch and B.N. Khoromskij. Low-rank Kronecker product approxi-

mation to multi-dimensional nonlocal operators. Part I. Separable approxi-

mation of multi-variate functions. Computing 76 (2006), 177-202.

[45] W. Hackbusch and B. N. Khoromskij. Low-rank Kronecker-product Approxi-

mation to Multi-dimensional Nonlocal Operators. Part II. HKT Representa-

tion of Certain Operators. Computing 76 (2006), pp. 203-225.

[46] W. Hackbusch and B. N. Khoromskij. Tensor-product Approximation to Op-

erators and Functions in High dimension. Journal of Complexity 23 (2007),

697-714.

150

Bibliography

[47] W. Hackbusch, B.N. Khoromskij and E.E. Tyrtyshnikov. Hierarchical Kro-

necker tensor-product approximations. J. Numer. Math. 13 (2005), 119–156.

[48] W. Hackbusch, B.N. Khoromskij and E. Tyrtyshnikov. Approximate iteration

for structured matrices. Numer. Math., 109 (2008), 365-383.

[49] W. Hackbusch, B.N. Khoromskij, S. Sauter and E. Tyrtyshnikov. Use of Ten-

sor Formats in Elliptic Eigenvalue Problems. Preprint 78, MPI MIS, Leipzig

2008, submitted.

[50] W. Hackbusch and S. K¨uhn. A new scheme for the tensor representation. J.

Fourier Anal. Appl. 15 (2009), no. 5, 706-722.

[51] R.J. Harrison, G.I. Fann, T. Yanai, Z. Gan, and G. Beylkin. Multiresolution

quantum chemistry: Basic theory and initial applications. J. of Chemical

Physics, 121 (23): 11587-11598, 2004.

[52] E. Hayryan and V. Khoromskaia. Low-rank Approximation of the Electro-

static Potential of Proteins, 2010, (in progress).

[53] T. Helgaker, P. Jørgensen and J. Olsen. Molecular Electronic-Structure The-

ory. Wiley, New York, 1999.

[54] F.L. Hitchcock. The expression of a tensor or a polyadic as a sum of products.

J. Math. Physics, 6 (1927), 164-189.

[55] F.L. Hitchcock. Multiple invariants and generalized rank of a p-way matrix

or tensor. J. Math. Physics, 7 (1927), 39-79.

[56] M. Ishteva, L. De Lathauwer, P.-A. Absil and S. Van Huffel. Differential-

geometric Newton method for the best rank-(R1, R2, R3)approximation of

tensors. Numer. Algorithms 51 (2009), no. 2, 179-194.

[57] V. Khoromskaia. Computation of the Hartree-Fock Exchange in the Tensor-

structured Format. Computational Methods in Applied Mathematics, Vol.

10(2010), No 2, pp.204-218.

[58] V. Khoromskaia. Multilevel Tucker Approximation of 3D Tensors. 2010, in

progress.

[59] B. N. Khoromskij. Structured Data-sparse Approximation to High Order Ten-

sors arising from the Deterministic Boltzmann equation. Math. Comp. 76

(2007), pp. 1292-1315.

151

Bibliography

[60] B. N. Khoromskij. Structured Rank-(r1, ..., rd)Decomposition of Function-

related Tensors in Rd.Comp. Meth. in Applied Math., 6(2006), 2, 194-220.

[61] B. N. Khoromskij. Fast and Accurate Tensor Approximation of Multivariate

Convolution with Linear Scaling in Dimension. J. Comp. Appl. Math., 2010,

DOI:10.1016/j.cam.2010.02.004, to appear.

[62] B. N. Khoromskij. On Tensor Approximation of Green Iterations for Kohn-

Sham equations. Computing and Visualisation in Science, (2008) 11:259-271.

[63] B.N. Khoromskij. Tensor-Structured Preconditioners and Approximate In-

verse of Elliptic Operators in Rd.J. Constructive Approximation, 30:599-620

(2009).

[64] B.N. Khoromskij. O(dlog n)-Quantics Approximation of n-dTensors in

High-Dimensional Numerical Modeling. Preprint MPI MIS 55/2009, Leipzig

2009; J. Constructive Approximation, accepted.

[65] B.N. Khoromskij. An Introduction to Structured Tensor-Product Represen-

tation of Discrete Nonlocal Operators. Lecture Notes 27, MPI MIS, Leipzig

2005.

[66] B.N. Khoromskij. Tensors-structured Numerical Methods in Scientific Com-

puting: Survey on Recent Advances. Preprint MPI MIS 21/2010, Leipzig 2010

(submitted).

[67] B. N. Khoromskij and V. Khoromskaia. Low Rank Tucker Tensor Approxi-

mation to the Classical Potentials. Central European J. of Math., 5(3) 2007,

1-28.

[68] B.N. Khoromskij and V. Khoromskaia. Multigrid Tensor Approximation of

Function Related Arrays. SIAM J. on Scient. Computing, 31(4), 3002-3026

(2009).

[69] B. N. Khoromskij, V. Khoromskaia, S. R. Chinnamsetty, H.-J. Flad. Tensor

Decomposition in Electronic Structure Calculations on 3D Cartesian Grids.

J. of Comput. Physics, 228 (2009) 5749-5762.

[70] B.N. Khoromskij, V. Khoromskaia and H.-J. Flad. Numerical solution of

the Hartree-Fock equation in multilevel tensor-structured format. SIAM J. on

Scient. Computing, 33(1), 45-65 (2011).

152

Bibliography

[71] B.N. Khoromskij, and I. Oseledets. Quantics-TT approximation of elliptic

solution operators in higher dimensions. Preprint MPI MiS 79/2009, Leipzig

2009, submitted.

[72] B.N. Khoromskij, and Ch. Schwab. Tensor Approximation of Multi-

parametric Elliptic Problems in SPDEs. Preprint MPI MiS 9/2010, Leipzig

2010, SISC, accepted.

[73] T. Kolda. Orthogonal tensor decompositions. SIAM J. Matrix Anal. Appl. 23

(2001) 243-255.

[74] T. G. Kolda and B. W. Bader. Tensor Decompositions and Applications.

SIAM Rev, 51(2009), no. 3, pp.455-500.

[75] D. Kressner and C. Tobler. Krylov subspace methods for linear systems with

tensor product structure. SIAM J. Matrix Anal. Appl., 31(4):1688-1714, 2010.

[76] P.M. Kroonenberg and J. De Leeuw. Principal component analysis of three-

mode data by means of alternating least squares algorithms. Psychometrika,

45 (1980) 69-97.

[77] J.B. Kruskal. Three-way arrays: rank and uniqueness of trilinear decom-

positions, with applications to arithmetic complexity and statistics. Linear

Algebra Appl., 18 (1977) 95-138.

[78] C. Le Bris. Computational chemistry from the perspective of numerical anal-

ysis. Acta Numerica (2005), 363 - 444.

[79] Ch. Lubich. On Variational Approximations in Quantum Molecular Dynam-

ics. Math. Comp. 74 (2005), 765-779.

[80] Ch. Lubich. From quantum to calssical molecular dynamics: reduced models

and numerical analysis. Zurich Lectures in Advanced Mathematics, EMS,

2008.

[81] F.R. Manby, P.J. Knowles and A.W. Lloyd. The Poisson equation in density

fitting for the Kohn-Sham Coulomb problem. J. Chem. Phys. 115 (2001),

9144-9148.

[82] K.K. Naraparaju and J. Schneider. Generalized Cross Approximation for 3d-

tensors. Preprint 29/2010 MPI MIS, Leipzig, 2010.

[83] I.V. Oseledets. Compact matrix form of the d-dimensional tensor decompo-

sition. Preprint 09-01, INM RAS, Moscow 2009.

153

Bibliography

[84] I.V. Oseledets. Approximation of 2d×2dmatrices by using tensor decompo-

sition. SIAM J. Matrix Anal., 2009, accepted.

[85] I.V. Oseledets, D.V. Savostyanov, and E.E. Tyrtyshnikov. Tucker dimen-

sionality reduction of three-dimensional arrays in linear time. SIMAX, 30(3),

939-956 (2008).

[86] I. Oseledets, D. Savostyanov, E. E. Tyrtyshnikov. Linear algebra for tensor

problems.. Computing, 85 (2009), no. 3, 169-188.

[87] I. Oseledets and E. E. Tyrtyshnikov. Breaking the curse of dimensionality

or how to use SVD in many dimensions. SIAM J. Scient. Computing, 31

(2009), no.5, 3744-3759.

[88] I. Oseledets and E. E. Tyrtyshnikov. TT-cross approximation for multidi-

mensional arrays. Linear Algebra Appl. 432 (2010), no.1, 70-88.

[89] P. Pulay. Improved SCF convergence acceleration. J. Comput. Chem. 3, 556-

560 (1982).

[90] R. Polly, H.-J. Werner, F. R. Manby and P. J. Knowles. Fast Hartree-Fock

theory using density fitting approximations. Mol. Physics, 102:2311-2321,

2004.

[91] M. Reed, and B. Simon. Functional analysis. Academic Press, 1972.

[92] B. Savas, L.-H. Lim. Quasi-Newton methods on Grassmanians and multilin-

ear approximations of tensors. SIAM J. Scient. Computing, submitted.

[93] R. Schneider. Analysis of the projected coupled cluster method in electronic

structure calculation. Numer. Math. 113, (2009), no. 3, 433-471.

[94] R. Schneider. Multiscalen-Matrixkompression und Wavelet-Matrix-

kompression. Tuebner Verlag, 1998.

[95] R. Schneider, Th. Rohwedder, J. Blauert, A. Neelov. Direct minimization

for calculating invariant subspaces in density functional computations of the

electronic structure. Journal of Comp. Math. 27 (2009), no. 2-3, 360-387.

[96] J. Sielk, H. F. von Horsten, F. Kr¨uger, R. Schneider, B. Hartke. Quantum-

mechanical wavepacket propagation in a sparse, adaptive basis of interpolating

Gaussians with collocation. Physical Chemistry Chemical Physics, 2009, 11

pp.463-475.

[97] A. Smilde, R. Bro, P. Geladi. Multi-way Analysis. Wiley, 2004.

154

Bibliography

[98] F. Stenger. Numerical methods based on Sinc and analytic functions.

Springer-Verlag, Heidelberg, 1993.

[99] J. Strang and G.J. Fix. An Analysis of the Finite Element Method. Prentice-

Hall, inc. N. J., 1973.

[100] E. S¨uli, D. Mayers. An introduction to Numerical Analysis. Cambridge Uni-

versity Press, 2008.

[101] E. E. Tyrtyshnikov. A Brief Introduction to Numerical Analysis. Birkhauser,

Boston, 1997.

[102] E. E. Tyrtyshnikov. Mosaic-skeleton approximations. Calcolo 33, 47-57

(1996).

[103] E. E. Tyrtyshnikov. Incomplete cross approximation in the mosaic-skeleton

method. Computing 64, 367-380 (2000).

[104] E.E. Tyrtyshnikov. Tensor approximations of matrices generated by asymp-

totically smooth functions. Sbornik: Mathematics 194, No. 5-6 (2003), 941–

954 (translated from Mat. Sb. 194, No. 6 (2003), 146–160).

[105] E.E. Tyrtyshnikov. Kronecker-product approximations for some function-

related matrices. Linear Algebra Appl. 379 (2004), 423–437.

[106] L.R. Tucker. Some mathematical notes on three-mode factor analysis. Psy-

chometrika 31 (1966) 279-311.

[107] J. VandeVondele, M. Krack, F. Mohamed, M. Parinello, Th. Chassaing,

J. Hutter. QUICKSTEP: Fast and accurate density functional calculations

using a mixed gaussian and plane waves approach. Comp. Phys. Comm.,

167(2005) 103-128.

[108] H.-J. Werner, P.J. Knowles et al. MOLPRO, version 2002.10, a package of

ab initio programs for electronic structure calculations.

[109] T. Yanai, G. Fann, Z. Gan, R. Harrison and G. Beylkin. Multiresolution

quantum chemistry: Hartree-Fock exchange. J. Chem. Phys. 121 (14) (2004)

6680-6688.

[110] H. Yserentant. The hyperbolic cross space approximation of electronic wave-

functions. Numer. Math., 105 (2007) 659-690.

[111] H. Yserentant. Regularity and Approximability of Electronic Wave Func-

tions. Lecture Notes in Mathematics series, Springer-Verlag, 2010, to appear.

155

Bibliography

[112] T. Zhang and G. Golub. Rank-0ne approximation to high order tensors.

SIAM J. Matrix Anal. Appl. v. 23 (2001) 534-550.

156

Bibliography

Daten zum Autor:

Name Venera Khoromskaia

geboren 25. August 1952, in Stadt Kazan, Russland

1969-1974 staatliche Universit¨at Kazan, Physik Fakult¨at.

Abschluss: Dipl.- Phys.

1974 -1995 Joint Institute for Nuclear Research, Dubna, Russland

wiss. Mitarbeiter

1995 -2005 Aufenthalt in Deutschland (ohne Arbeitserlaubnis)

2001, 2002, 2003 wiss. Stipendien an der Universit¨at Leipzig,

Institut f¨ur Informatik

2006 - Max-Planck-Institut f¨ur Mathematik in den Naturwissenschaften

wiss. Mitarbeiter

Bibliographische Daten

Numerical Solution of the Hartree-Fock Equation by the Multilevel Tensor-structured

Methods.

(Numerische L¨osung der Hartree-Fock-Gleichung mit der Multilevel Tensor-struktur-

Verfahren.)

Venera Khoromskaia

Dissertation

157 Seiten, 58 Abbildungen, 112 Referenzen

Hiermit erkl¨are ich, die vorliegende Dissertation selbst¨andig und ohne unzul¨assige

fremde Hilfe angefertigt zu haben. Ich habe keine anderen als die angef¨uhrten

Quellen und Hilfsmittel benutzt und s¨amtliche Textstellen, die w¨ortlich oder sinn-

gem¨aß aus ver¨offentlichten oder unver¨offentlichten Schriften entnommen wurden,

und alle Angaben, die auf m¨undlichen Ausk¨unften beruhen, als solche kenntlich

gemacht. Ebenfalls sind alle von anderen Personen bereitgestellten Materialien

oder erbrachten Dienstleistungen als solche gekennzeichnet.

Leipzig, 21 September, 2010

............................................

(Venera Khoromskaia)

157