scieee Science in your language
[en] (orig)
Numerical solution of the Hartree-Fock equation
by multilevel tensor-structured methods
vorgelegt von Diplom-Physikerin
Venera Khoromskaia
Stadt Kazan, Russland
Von der Fakult¨at II - Mathematik und Naturwissenschaften
der Technischen Universit¨at Berlin
zur Erlangung des akademisches Grades
Doktor der Naturwissenschaften
Dr.rer.nat.
genehmigte Dissertation
Promotionausschuss:
Vorsitzender: Prof. Dr. J. Blath
Berichter/Gutachter: Prof. Dr. Reinhold Schneider
Berichter/Gutachter: Prof. Dr. Dr. h.c. Wolfgang Hackbusch
zus¨atzlicher Gutachter: Prof. Dr. Eugene Tyrtyshnikov
Tag der m¨undlichen Pr¨ufung: 10 December 2010
Berlin 2011
D 83
2
Acknowledgements
I would like to express my gratitude to Prof. Dr. Reinhold Schneider for su-
pervising my PhD project, for fruitful discussions and his friendly encouragement
during the work on this project. His interest and expertise in the research topics
related to the Hartree-Fock equation and the density functional theory motivated
much of the recent progress in the algebraic tensor methods in electronic structure
calculations.
I would like to thank Prof. Dr. Dr. h.c. Wolfgang Hackbusch for valuable
discussions and excellent conditions for performing research at the Max-Planck-
Institute for Mathematics in the Sciences.
This work is done due to fruitful collaboration with Dr. Heinz-J¨urgen Flad. I
appreciate very much his kind encouragement and support of my work and always
beneficial and stimulating discussions owing to his thorough expertise in modern
quantum chemistry.
Professional assistance of PD DrSci. Boris Khoromskij in the research on tensor
numerical methods helped me to make an active start in a new field. I acknowl-
edge him for the statement of some research problems and for proofreading the
manuscript.
I kindly acknowledge Prof. Eugene Tyrtyshnikov, Prof. Dr. Christian Lu-
bich and Prof. Dr. Lars Grasedyck, for their interest to my work and for the
opportunity to give talks at recent conferences and seminars on tensor methods.
I am very much appreciative to Prof. Dr. Ivan Gavrilyuk for his encouragement
and interest to my work.
I would like to thank my colleagues at the Max-Planck-Institute in Leipzig,
Dr. Ronald Kriemann, Dr. Jan Schneider, Dr. Kishore Kumar Naraparaju,
Dr. Sambasiva Rao Chinnamsetty, Dr. Hongjun Luo, Dr. Thomas Blesgen, Dr.
Lehel Banjai, Dipl.-Math. Konrad Kaltenbach, Dipl.-Math. Stephan Schwinger,
Dipl.-Math. Florian Drechsler and colleagues at the TU Berlin, Prof. Dr. Harry
Yserentant, Dipl.-Math. Fritz Kr¨uger and Dipl.-Math. Andr´e Ushmaev for inter-
esting discussions. I would like to acknowledge the colleagues from the Institute
of Numerical Mathematics of the Russian Academy of Science in Moscow, Dr.
Ivan Oseledets and Dr. Dmitrij Savostyanov for stimulating discussions.
I would like to thank Prof. Vikram Gavini (University of Michigan) for pro-
ductive collaboration and for the data on electron density of large Aluminium
clusters.
Kind assistance of the librarians Mrs. Britta Schneemann and Mrs. Katarzyna
Baier was very helpful during my work. I would like to thank cordially the
3
secretaries at the Max-Planck-Institute and TU Berlin, Mrs. Valeria H¨unniger
and Mrs. Susan Kosub for their helpful technical support.
4
Numerische osung der Hartree-Fock-Gleichung
mit mehrstufigen Tensor-strukturierten Verfahren
Venera Khoromskaia
Abstract der PhD Dissertation
Die genaue osung der Hartree-Fock-Gleichung (HFG), die ein nichtlineares Eigen-
wertproblem in R3darstellt, ist infolge der nichtlokalen Intergraltransformationen
und der scharfen Peaks in der Elektronendichte und den Molek¨ulorbitalen eine
herausfordernde numerische Aufgabe. Aufgrund der nichtlinearen Abh¨angigkeit
der Hamilton-Matrix von den Eigenvektoren, ist das Problem nur iterativ osbar.
Die traditionelle osung der HFG basiert auf einer analytischen Berechnung der
auftretenden Faltungsintegrale im R3mit Hilfe von dem Problem angepassten
Basen (so genannte Zwei-Elektron-Integrale). Die inh¨arenten Grenzen dieses
Konzepts werden wegen der starken Abh¨angigkeit der numerischen Effizienz von
der Gr¨oße und den Eigenschaften der analytisch separablen Basis sichtbar.
In dieser Dissertation wurden neue gitter-basierte mehrstufige Tensor-strukturierte
Verfahren entwickelt und anhand der numerischen osung der HFG getestet.
Diese Methoden beinhalten effiziente Algorithmen zur Darstellung diskretisierter
Funktionen und Operatoren in R3durch strukturierte Tensoren in den kanonis-
chen, Tucker- und kombinierten Tensorformaten mit einer kontrollierbaren Genauigkeit
sowie schnelle entsprechenden Operationen f¨ur multilineare Tensoren. Insbeson-
dere wird die beschleunigte Mehrgitter-Rang-Reduktion des Tensors vorgestellt,
die auf der reduzierten Singul¨arwertzerlegung oherer Ordnung basiert.
Der Kern der Anwendung dieser Verfahren f¨ur die osung der HFG ist die Ver-
wendung strukturierter Tensoren zur genauen Berechnung der Elektronendichte
und der nichtlinearen Hartree- und (nichtlokalen) Austauschoperatoren in R3,
die in jedem Iterationsschritt auf einer Reihenfolge von n×n×nkartesischen
Gittern darstellt wurden. Somit wurden die entsprechenden sechs-dimensionalen
Integrationen durch multilineare algebraische Operationen wie das Skalar- und
Hadamardprodukt, die dreidimensionale Faltungstransformation und die Rang-
Reduktion f¨ur Tensoren dritter Ordnung ersetzt, die ann¨ahernd mit O(n)-Komplexit¨at
implementiert wurden, wobei ndie eindimensionale Gittergr¨oße ist. Daher ist
der wesentliche Vorteil unserer Tensor-strukturierten Verfahren, dass die gitter-
basierte Berechnung von Integraloperatoren in Rd,d3, lineare Komplexit¨at in
nhat. Man beachte, dass im Sinne der ¨ubliche Abscatzung mittels des Gitter-
volumens Nvol =n3die Operationen mit strukturierten Tensoren eine sublineare
Komplexit¨at haben, O(N1/3
vol ).
Das vorgestellte ‘grey-box”’-Schema zur osung der HFG erfordert keine an-
alytischen Vorberechnungen der Zwei-Elektron-Integrale. Weiterhin ist dieses
5
Schema sehr flexibel hinsichtlich der Wahl der gitter-orientierten Basisfunktio-
nen.
Numerische Berechnungen am Beispiel des “all electron” Falls f¨ur H2O, CH4
und C2H6und des Pseudopotentialfalls f¨ur CH3OH and C2H5OH Molek¨ule zeigen
die geforderte hohe Genauigkeit.
Die Tensor-strukturierten Verfahren onnen auch zur osung der Kohn-Sham-
Gleichung angewandt werden, indem anstelle einer problem-unabh¨angigen Basis,
wie die der ebenen Wellen oder einer großen Anzahl finiter Elemente im R3, eine
geringe Anzahl problem-orientierter rang-strukturierter algebraischer Basisfunk-
tionen verwendet werden, die auf einem Tensorgitter dargestellt sind.
6
Numerical solution of the Hartree-Fock equation
by multilevel tensor-structured methods
Venera Khoromskaia
Abstract of PhD dissertation
An accurate solution of the Hartree-Fock equation, a nonlinear eigenvalue prob-
lem in R3, is a challenging numerical task due to the presence of nonlocal integral
transforms and strong cusps in the electron density and molecular orbitals. In
view of the nonlinear dependence of the Hamiltonian matrix on the eigenvec-
tors, this problem can only be solved iteratively, by self-consistent field iterations.
Traditionally, the solution of the Hartree-Fock equation is based on rigorous an-
alytical precomputation of the arising convolution type integrals in R3in the
naturally separable basis (so-called two-electron integrals). Inherent limitations
of this concept are evident because of the strong dependence of the numerical
efficiency on the size and approximation quality of the problem adapted basis
sets.
In this dissertation, novel grid-based multilevel tensor-structured methods are
developed and tested by a numerical solution of the Hartree-Fock equation. These
methods include efficient algorithms for the low-rank representation of discretized
functions and operators in R3, in the canonical, Tucker and mixed tensor formats
with a controllable accuracy, and fast procedures for the corresponding multilin-
ear tensor operations. In particular, a novel multigrid accelerated tensor rank
reduction method is introduced, based on the reduced higher order singular value
decomposition.
The core of our approach to the solution of the Hartree-Fock equation is the
accurate tensor-structured computation of the electron density and the nonlinear
Hartree and the (nonlocal) exchange operators in R3, discretized on a sequence
of n×n×nCartesian grids, at every step of nonlinear iterations. Hence, the
corresponding six-dimensional integrations are replaced by multilinear algebra
operations such as the scalar and Hadamard products, the 3D convolution trans-
form, and the rank truncation for 3rd order tensors, which are implemented with
an almost O(n)-complexity, where nis the univariate grid size. In this way, the
basic advantage of our tensor-structured methods is the grid-based evaluation of
integral operators in Rd,d3, with linear complexity in n. Note that in terms
of usual estimation by volume size Nvol =n3, the tensor-structured operations
are of sublinear complexity, O(N1/3
vol ).
The proposed “grey-box” scheme for the solution of the Hartree-Fock equation
does not require analytical precomputation of two-electron integrals. Also, this
scheme is very flexible to the choice of grid-based separable basis functions.
7
Numerical illustrations for all electron case of H2O, CH4, C2H6and pseudopo-
tential case of CH3OH and C2H5OH molecules demonstrate the required high
accuracy of calculations and an almost linear computational complexity in n.
The tensor-structured methods can be also applied to the solution of the Kohn-
Sham equation, where instead of problem-independent bases like plane waves or
a large number of finite elements in R3, one can use much smaller set of problem
adapted basis functions specified on a tensor grid.
8
Contents
1 Introduction 11
2 Tensor structured (TS) methods for functions in Rd,d327
2.1 Definitions of rank-structured tensor formats . . . . . . . . . . . . 28
2.1.1 Full format dth order tensors . . . . . . . . . . . . . . . . 28
2.1.2 Tucker, canonical and mixed (two-level) tensor formats . . 30
2.2 Best orthogonal Tucker approximation (BTA) . . . . . . . . . . . 34
2.2.1 General discussion . . . . . . . . . . . . . . . . . . . . . . 34
2.2.2 BTA algorithm for full format tensors . . . . . . . . . . . . 37
2.2.3 BTA for rank-Rcanonical input . . . . . . . . . . . . . . . 40
2.2.4 Mixed BTA for full format and Tucker tensors . . . . . . . 45
2.2.5 Remarks on the Tucker-to-canonical transform . . . . . . . 48
2.3 Numerics on BTA of function related tensors in R3........ 50
2.3.1 General description . . . . . . . . . . . . . . . . . . . . . . 50
2.3.2 Numerics for classical potentials . . . . . . . . . . . . . . . 51
2.3.3 Application to functions in electronic structure calculations 61
2.4 Tensorisation of basic multilinear algebra (MLA) operations . . . 65
2.4.1 Some bilinear operations in the Tucker format . . . . . . . 66
2.4.2 Summary on MLA operations in rank-Rcanonical format . 68
3 Multigrid Tucker approximation of function related tensors 71
3.1 Motivations .............................. 71
3.2 Multigrid accelerated BTA of canonical tensors . . . . . . . . . . 72
3.2.1 Basic concept . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.2.2 Description of the Algorithm and complexity bound . . . . 75
3.2.3 Numerics on rank reduction of the electron density ρ. . . 78
3.3 Multigrid accelerated BTA for the full format function related tensors 82
3.3.1 Numerics on the MGA Tucker approximation (ρ1/3) . . . . 83
3.3.2 BTA of the electron density of Aluminium clusters . . . . 85
4 TS computation of the Coulomb and exchange Galerkin matrices 89
4.1 Generalremarks............................ 89
9
Contents
4.2 Accurate evaluation of the Hartree potential by the tensor-product
convolution in R3........................... 90
4.3 Tensor computation of the Coulomb matrix . . . . . . . . . . . . 100
4.4 Numerics: the Coulomb matrices of CH4, C2H6and H2O molecules 100
4.5 Agglomerated computation of the Hartree-Fock exchange . . . . . 106
4.5.1 Galerkin exchange operator in the Gaussian basis . . . . . 107
4.5.2 Discrete computational scheme . . . . . . . . . . . . . . . 109
4.6 Numericals experiments . . . . . . . . . . . . . . . . . . . . . . . 117
4.6.1 All electron case . . . . . . . . . . . . . . . . . . . . . . . 117
4.6.2 Pseudopotential case . . . . . . . . . . . . . . . . . . . . . 118
5 Solution of the Hartree-Fock equation by multilevel TS methods 119
5.1 Galerkin scheme for the Hartree-Fock equation . . . . . . . . . . . 121
5.1.1 Problem setting . . . . . . . . . . . . . . . . . . . . . . . . 121
5.1.2 Traditional discretization . . . . . . . . . . . . . . . . . . 122
5.1.3 Novel scheme via agglomerated tensor-structured calcula-
tion of Galerkin matrices . . . . . . . . . . . . . . . . . . . 124
5.2 Multilevel tensor-truncated iteration via DIIS . . . . . . . . . . . 126
5.2.1 General SCF iteration . . . . . . . . . . . . . . . . . . . . 126
5.2.2 SCF iteration by using DIIS scheme . . . . . . . . . . . . . 126
5.2.3 Unigrid and multilevel tensor-truncated DIIS iteration . . 127
5.2.4 Complexity estimates in terms of R0,Norb and n...... 129
5.3 Numerical illustrations . . . . . . . . . . . . . . . . . . . . . . . . 131
5.3.1 General discussion . . . . . . . . . . . . . . . . . . . . . . 131
5.3.2 Multilevel tensor-truncated SCF iteration applied to some
moderate size molecules . . . . . . . . . . . . . . . . . . . 132
5.3.3 Conclusions to Section 5 . . . . . . . . . . . . . . . . . . . 134
6 Summary of main results 137
6.1 Briefsummary ............................ 137
6.2 Presentations ............................. 140
7 Appendix 143
7.1 Singular value decomposition and the best rank-kapproximation
ofamatrix .............................. 143
7.2 Reduced SVD of a rank-Rmatrix.................. 143
7.3 List of abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . 145
Bibliography 145
10
1 Introduction
All truths are easy to understand once they are discovered;
the point is to discover them.
Galileo Galilei
The classical Hartree-Fock equation is one of the basic ab initio models in
electronic structure calculations. Accurate solution of the Hartree-Fock equation,
a nonlinear eigenvalue problem (EVP) in R3, is a challenging numerical task due
to the presence of nonlocal integral transforms and strong cusps in the electron
density and molecular orbitals. This nonlinear problem of finding the eigenvectors
(orbitals) of a molecular system is solved under the condition that also the electron
correlation part of the Hamiltonian matrix, depending on orbitals is unknown.
Therefore it is solved iteratively using the self-consistent field iteration (SCF)
method.
Traditionally, the solution of the Hartree-Fock equation is based on the analyti-
cal precomputation of the arising convolution type integral transforms in R3in the
problem adapted naturally separable basis (the so-called two-electron integrals).
This rigorous approach includes a number of efficient implementations which are
widely used in computational quantum chemistry. The success of the analytical
method stems from the big amount of precomputed physical information involved
in the computational scheme including the problem adapted basis. Inherent limi-
tations of this concept arise due to strong dependence of the numerical efficiency
on the size and quality of the chosen GTO-type basis sets.
In this dissertation, novel multilevel grid-based tensor-structured methods are
developed and tested by the numerical solution of the Hartree-Fock equation.
These methods include efficient algorithms for the low-rank tensor representa-
tion of functions and operators in R3in the canonical, Tucker and mixed tensor
formats with a controllable accuracy, and fast procedures for the corresponding
multilinear tensor operations. In particular, the novel multigrid accelerated ten-
sor rank reduction method is proposed, based on the coarse-level reduced higher
order singular value decomposition and selection of the most significant fibers. A
beneficial feature of this method is the employment of the interpolated orthogonal
basis functions on finer grid levels.
The core of our approach to the solution of the Hartree-Fock equation is the
accurate tensor-structured computation of the electron density and the nonlin-
11
1 Introduction
ear Hartree and nonlocal exchange operators in R3, discretized on n×n×n
Cartesian grids, at all steps of iterations on nonlinearity. Within the solution
process, the six-dimensional integrations are replaced by the multilinear alge-
bra operations such as the scalar and Hadamard products, the three-dimensional
convolution transform, and rank truncation, which are implemented with almost
O(n)-complexity. Efficient formatted” separable tensor representations of func-
tions and operators enable fast algebraic transforms with multidimensional data
arrays. The grid-based solution of the Hartree-Fock equation also benefits from
the multilevel arrangement of the conventional SCF iteration on a sequence of
refined grids.
In this way, the basic advantage of the tensor-structured methods is the grid-
based evaluation of the integral operators in Rd,d3, with linear complexity in
the univariate grid size n, see [61, 68]. Note that the commonly used notation
as “linear in the problem size” for the problems with d= 3, often means linear
complexity with respect to the volume size Nvol =n3. From this point of view
the tensor-structured operations are of sublinear complexity, that is O(N1/3
vol ).
High accuracy is achieved due to efficient rank optimization algorithms for 3rd
order tensors, which enable computations over the 3D Cartesian grids with up
to 1012 (163843) grid nodes, using MATLAB on a SUN station. In electronic
structure calculations, this implies a fine resolution with the mesh size h104
A, providing possibility for arbitrary space orientation of a molecule in the com-
putational box (like in the analytical approaches).
The proposed “grey-box” scheme for the solution of the Hartree-Fock equation
does not require an analytical precomputation of two-electron integrals. Also,
this scheme is very flexible to the choice of grid-based separable basis functions.
In the current implementation, the discretized Gaussians are used for the sake of
convienient comparison of the intermediate results with the benchmark programs.
The tensor-structured methods can be also applied to the solution of the Kohn-
Sham equation, where instead of problem-independent bases like plane waves or
a large number of finite elements in R3, one can use much smaller set of problem
adapted basis functions specified on a tensor grid.
1. Recent progress in multilinear algebra
Algebraic tensor algorithms for the low-rank approximation of multi-dimensional
data have been originally developed for the problems of chemometrics and signal
processing [112, 73, 97, 21, 22, 92], beginning from early papers [54, 55, 106, 17,
77, 18]. The higher order singular value decomposition (HOSVD) provides the
generalization of the singular value decomposition (SVD) of matrices [37]. The
12
theoretical basis for the HOSVD, that is the principal ingredient for the best or-
thogonal Tucker1approximation of higher order tensors, have been introduced in
[23, 24]. Comprehensive surveys on tensor decompositions with applications in
computer science are presented in [74] and [97].
Recently, a number of numerical methods based on the separation of vari-
ables have been proposed for the multivariate problems of scientific computing
[11, 41, 8, 94, 110, 111, 104, 102]. Extensive research on tensor approxima-
tion methods combined with the traditional numerical approaches, for example,
wavelets, plane waves and sparse matrices have opened new perspectives for a
feasible numerical solution of multi-dimensional problems arising in large scale
electronic and molecular structure calculations [12, 27, 28, 40, 79, 13, 80, 93].
First results on theoretical approval of the extension of the basic tools in the tra-
ditional numerical linear algebra towards the modern multilinear algebra (MLA)
have been presented in [104, 105, 47, 48]. Original works on the theory of sinc-
approximation of the multivariate functions [35, 36] gave a significant impact for
further development of the tensor methods in the problems of scientific computing.
Validity and efficiency of the Tucker-type approximations and combined tensor
formats to function related tensors in higher dimensions were demonstrated in
[104, 47, 60, 59, 86]. In this way, the main principles and concepts of the rep-
resentation of operators and functions in the rank-structured tensor formats as
well as the methods of the algebraic tensor computations have been understood
[34, 36, 46, 44, 45]. Nowadays this topic of research has attracted the interest of
several groups from linear algebra, optimization theory and scientific computing
communities, see [31, 19, 29, 85, 86, 87, 88] and [39, 7, 82, 66, 75, 71, 72].
Recent papers [60, 67, 68, 85, 70] demonstrated numerical efficiency of the
Tucker and canonical tensor decompositions for a wide class of the problems of
scientific computing in Rd, (d3) which are not feasible for the treatment by
conventional numerical methods due to their exponential complexity scaling in
dimension d. These tensor approximations reduce dramatically the complexity of
computations of the discretized multivariate operators and functions. Our main
difficulties in establishing applicability of the tensor algorithms, earlier approved
in chemometrics, for the treatment of higher order tensors in the problems of
numerical analysis and scientific computing have been related to accuracy issues
and to the challenge of large spatial grids required in the real-life applications.
This dissertation presents the tensor-structured methods2which were the topic
of the author’s work during 2006-2010, [67, 68, 69, 57, 70, 14]. Mostly, here
1Tucker-type decomposition, introduced by L. R. Tucker in 1966, see [106].
2We use the term tensor-structured methods as a common title to the numerical techniques
based on the principles of algebraically separable rank-structured tensor representations.
13
1 Introduction
the numerical applicability of the Tucker, canonical and mixed approximations
is considered in application to the problems in R3, arising in electronic structure
calculations. The numerical results described in these papers are based on the
original algorithms implemented in MATLAB and tested in molecular computa-
tions.
[67] B. N. Khoromskij and V. Khoromskaia. Low Rank Tucker-Type Tensor
Approximation to Classical Potentials.
Central European Journal of Mathematics v.5, N.3, 2007, pp.523-550,
(Preprint MPI MIS 105/2006).
[69] B. N. Khoromskij, V. Khoromskaia, S.R. Chinnamsetty, and H.-J. Flad.
Tensor Decomposition in Electronic Structure Calculations on 3D Cartesian
Grids. Journal of Comp. Physics, 228(2009), pp. 5749-5762, 2009,
(Preprint MPI MIS 65/2007).
[68] B. N. Khoromskij and V. Khoromskaia. Multigrid Accelerated Tensor
Approximation of Function Related Multi-dimensional Arrays.
SIAM Journ. on Scient. Comp., vol. 31, No. 4, pp. 3002-3026, 2009,
(Preprint MPI MIS 40/2008).
[57] V. Khoromskaia. Computation of the Hartree-Fock Exchange by the
Tensor-structured Methods.
Comp. Methods in Applied Math., Vol. 10(2010), No.2, pp.1-16.
(Preprint 25/2009, MPI MIS Leipzig, 2009).
[70] V. Khoromskaia, B. N. Khoromskij, and H.-J. Flad. Numerical Solution
of the Hartree-Fock Equation in the Multilevel Tensor-structured Format.
SIAM Journ. on Scient. Comp., v.33, No.1, pp.45-65, 2011.
(Preprint 44/2009, MPI MIS Leipzig, 2009).
[58] V. Khoromskaia. Multilevel Tucker Approximation of 3D Tensors.
2010, in progress.
[14] T. Blesgen, V. Gavini and V. Khoromskaia. Tensor Product Approxi-
mation of the Electron Density of Aluminium Clusters in OFDFT.
Preprint 66/2009 MPI MIS Leipzig, 2009.
[52] E. Hayryan and V. Khoromskaia. Low-rank Approximation of the Elec-
trostatic Potentials of Proteins, 2010, in progress.
14
The success of tensor methods based on the concept of the low-rank separable
approximation of multivariate functions in Rd, can be explained by their intrin-
sic nearly one-dimensional data structure organization. In fact, this allows us
to relax the so-called ”curse of dimensionality” inherent to traditional numerical
methods. The idea of the tensor-structured techniques is based on the search
and employment of normally hidden, well formatted data-sparse representations
of large and highly redundant data arising in the conventional computer represen-
tation of physically relevant functions in Rd. Hence, the rank-structured tensor
methods enable an efficient “compact” approximation of multivariate functions
and operators in Rd,d3, represented on large tensor grids of size nd.
Generally, the tensor methods employ the so-called orthogonal Tucker and
canonical models. The main benefit of the orthogonal Tucker approximation is the
robust construction of a problem adapted orthogonal basis that simultaneously
resolves the peculiarities of the approximated data with an optimal tensor rank.
In turn, the canonical tensor format is efficient in the bilinear rank-structured
tensor operations.
2. Basic rank-structured formats
Section 2 of the dissertation begins with the description of the full format dth
order tensors and respective bilinear operations necessary for the Tucker ALS
minimization problem: the contracted product of a tensor with a matrix, matrix
unfolding of tensors, and the scalar product. These operations can be understood
as higher-order analogues of the standard linear algebra operations for vector-
vector, matrix-vector and matrix-matrix calculations.
Discussion of the basic rank-structured tensor formats starts with the defini-
tions of the rank-Rcanonical and the orthogonal rank-(r1,...rd) Tucker decompo-
sitions. The efficient tools for the rank-structured approximation of tensors cannot
be derived via the straightforward extension of the algorithms of the classical nu-
merical linear algebra, like the singular value decomposition (SVD). Instead, one
arrives at the challenging nonlinear optimisation problems. As a starting point,
we recall the seminal theorem in [21], asserting that the minimization problem
of the Tucker approximation is equivalent to the dual maximization problem of
finding vectors of the orthogonal mapping dominating subspaces, which provide
the maximum norm of the Tucker core tensor over the product manifold of orthog-
onal matrices. We also remind the ALS iterative algorithm for the best Tucker
tensor approximation (BTA) applied to the full format tensors (F2T) and the cor-
responding theorems on the higher order singular value decomposition (HOSVD)
introduced in [21, 23, 24].
15
1 Introduction
Our Theorem 2.14 on the canonical-to-Tucker decomposition (C2T) describes
the reduced HOSVD (RHOSVD) applied to the canonical rank-Rtensors and
provides the error estimate with respect to the truncated SVD of the -mode side
matrices, supplemented by the corresponding complexity bounds.
The general ALS based BTA algorithm [23] provides the complexity of the
rank-(r1,...,rd) Tucker tensor decomposition of the order
WF2T=O(nd+1),(1.1)
applied to full format target tensors, and, as we show in [67], it amounts to
WC2T=O(Rn min{R, n}+rd1nmin{rd1, n}),(1.2)
operations with
r= max
r, = 1,...,d,
in the case of canonical rank-Rinput tensors.
Next, we describe the two-level (mixed) tensor format, based on our Lemma
2.16, which proves that the Tucker-to-canonical (T2C) approximation applied to
the Tucker decomposition of the full format tensor can be reduced to the canon-
ical approximation of a small-size core tensor. We extensively use this format
in electronic structure calculations, since the orthogonal Tucker decomposition is
employed as the principle rank reduction technique, while the canonical represen-
tation is efficient in tensor product operations
In the numerical examples in Section 2, the standard collocation discretization
has been used to represent the classical Newton potential, the Slater function, and
the Yukawa and Helmholtz kernels on the n×n×n3D Cartesian grids. Figures
demonstrate that for the function related tensors,
a) the error of the BTA decays exponentially in the Tucker rank,
b) the orthogonal vectors of the decomposition are of special shapes that resolve
the peculiarities of a function,
c) the core tensor (coefficients of the orthogonal Tucker transform) is of sparse
character (up to a certain threshold).
It is also shown, that for the tensors corresponding to the functions with periodic
cells (with bumps or singularities) in Rd,d= 3, the Tucker approximation error
does not depend on a number of cells in a periodic structure. As examples, we
consider cubic boxes with 27, 1000 and 4096 cells of the Slater function, see
Figures 2.12 and 2.13. Next, the numerical examples on the electron densities ρ
of some small organic molecules show that ρcan be efficiently approximated by
the low-rank Tucker format.
16
In Section 2.4, the description of basic bilinear operations in the rank-structured
tensor formats, including the scalar product, the Hadamard product and the con-
volution transform, is supplemented by the corresponding complexity estimates
and bounds on required storage. This part is completed by the summary of
tensor-structured operations in the canonical format which are of linear complex-
ity scaling with respect to the parameters of the target tensors.
Note that for the full format tensors the complexity of basic bilinear opera-
tions vary from Wfull =O(nd) to Wfull =O(n2d). Instead, for the canonical
tensors with ranks R1and R2, univariate grid size nand problem dimension d,
the complexity of bilinear tensor operations is (up to log nfactor)
Wcc=O(dnR1R2).(1.3)
Notice that in evaluation of the Hartree potential, one should take into account
large input ranks R1of the canonical tensor representing the electron density, see
(4.4), (4.5), and possibly large n. Moreover, when using consequent tensor-to-
tensor operations in the canonical format, tensor ranks are multiplied and hence
increase enormously after several operations even with moderate initial ranks.
To reduce large canonical ranks with controllable accuracy, we introduce the
C2T algorithm (see Algorithm C BTA in subsection 2.2.3) with the consequent
T2C transform. However, the conventional C2T and full-format-to-Tucker (F2T)
algorithms are not computationally feasible in electronic structure calculations on
large n×n×n3D Cartesian grids, since:
For C2T case, the complexity is (see (1.2))
WC2T=O(Rn min{R, n}+dr2nmin{r2, n}),(1.4)
i.e., polynomial (of degree at least 2) in rand either Ror n. To resolve
cusps due to core electrons in the electron density, both Rand nare of the
order 104.
For simultaneously large Rand nthe SVD in the RHOSVD might be com-
putationally unfeasible.
For F2T case, the complexity of the general HOSVD based BTA algorithm,
WF2T=O(n4),(1.5)
restricts the size of the input tensor (n3106). Our goal is to reach the
complexity (and resolution) corresponding to maximum size of the input
tensor, O(n3), (that is, n3108).
17
1 Introduction
3. Multigrid accelerated rank-structured approximation
To avoid above limitations, in Section 3 we introduce the multigrid accelerated
(MGA) Tucker decomposition. The MGA tensor approximation method applied
to discretised functions in R3, enables to relax the restrictions of the single grid
Tucker model for large nand R.
The idea of the multilevel acceleration is based on the successive reiteration of
the orthogonal Tucker tensor approximation on a sequence of nested refined grids.
The HOSVD or RHOSVD are performed only on the coarsest grid representation
of the target tensor, given in full or canonical formats, respectively. As the initial
guess for the ALS iteration on finer grids, we apply the interpolated orthogonal
side-matrices calculated on the coarser grid.
For the canonical target tensor, along with a good initial guess for the nonlinear
ALS approximation, the MGA approach provides the transfer of the important
data structure information from the coarse-to-fine grids, based on the introduced
maximum energy principle. By using this principle, on the coarsest grid level, we
extract the location of dominating columns most important fibers of the -mode
contracted unfolding matrices. This leads to the fast nonlinear ALS iteration
on huge 3D Cartesian grids now performed over an almost minimal sufficient
subset of directional fibers in the target tensor. Figures in Section 3 illustrate the
mechanism of the choice of most important fibers.
The MGA technique exhibits the following benefits for the tensors related to
functions in R3.
For the rank-Rcanonical target tensors it is proved to have linear (instead
of polynomial) scaling in all input parameters: n,Rand the Tucker rank r,
WC2T=O(rRn),
see Theorem 3.1 and Algorithm MG C BTA, in Section 3.2.
For the full format tensors of size n3, it leads to a nested F2T decomposition
with the linear cost in the volume size of input data,
WF2T=O(n3),
(instead of O(n4)-scaling for the standard Tucker approximation).
It should be emphasized that our technique is particularly efficient for the rank
reduction of the canonical rank-Rtensors with large Rand n. Therefore, this
algorithm is suitable for the functions with multiple strong singularities, such as
electron densities of molecules in the Hartree-Fock equation.
18
Theorem 3.1 proves linear complexity with respect to the input parameters of
a tensor, for the MGA BTA applied to the rank-Rcanonical input in R3.
Combination of C2T and T2C algorithms is successfully applied for rank re-
duction of the electron density of some organic molecules. This enables accurate
computation of the Hartree potential (a convolution integral in R3), see Section
4, using remarkably fine n×n×ngrids with the size n= 16384 for all three
dimensions. Figures present the comparison of elapsed times for the unigrid and
multigrid schemes, as well as the cost for C2T transform in the MGA rank reduc-
tion of the electron density of H2O and some small organic molecules. Parameters
of the algorithms allow to choose the required accuracy level.
In the case of full format tensors, our multigrid accelerated Tucker algorithm en-
ables usage of large grids up to Nvol = 5123. Computation time of this algorithm
essentially outperforms the existing benchmark method based on the Newton-type
scheme on the product Stiefel manifold [92]. The numerical examples on the MGA
BTA of full size tensors related to functions in R3include, for example, ρ1/3, which
is commonly used in the density functional theory. Notice that the alternative
approach based on the cross approximation over the incomplete data set in 3D,
is presented in [85]. Next, interesting numerical examples are presented on the
Tucker approximation of the electron density of Aluminium clusters with 14, 172,
and 365 atoms [14]. Along with the multicentered Slater functions considered in
Section 2, they demonstrate that the rank of the Tucker representation of function
related tensors with periodic multiple cusps (or bumps), is almost independent
of the number of cells in the periodic structure. This indicates that the effective
numerical complexity for solving larger grid-based quasi-periodic problems by the
tensor-structured methods, for example in the orbital-free density functional the-
ory [32, 33], may be expected to increase only linearly in the univariate grid size n.
4. The Hartree-Fock equation
In Section 4, the tensor-structured methods are applied for computation of
the Galerkin matrices of the Coulomb and exchange operators in the Hartree-
Fock model. The Hartree-Fock equation is the mean-field approximation of the
electronic Schr¨odinger equation for computation of the ground state energy of
many-electron systems. It is an eigenvalue problem (we consider a closed shell
case) on finding Norb lowest eigenvalues λiand respective pairwise L2-orthogonal
eigenfunctions (molecular orbitals) ϕi:R3R,ϕiH1(R3), from the equation
Fϕi(x) = λiϕi(x), i = 1, ..., Norb, x R3,(1.6)
with RR3ϕi(x)ϕj(x)dx =δij, (i, j = 1, ..., Norb), where Norb is the number of
19
1 Introduction
electron pairs in a molecule. The nonlinear Fock operator Fis given by
F:= 1
2 + Vc+VHK.
Here, an external nuclear potential Vcis defined by
Vc(x) =
M
X
ν=1
Zν
kxAνk,(1.7)
where Mis the number of nuclei in a molecule, and Zν,AνR3denote the
charge and spatial coordinates of the nuclei, respectively. The Hartree potential
VHdetermines the Coulomb interaction of every single electron with the field
generated by all electrons of the system
VH(x) := ZR3
ρ(y)
kxykdy, x R3,(1.8)
which corresponds to the convolution of the Coulomb potential with the electron
density
ρ(y) = 2
Norb
X
i=1
(ϕi(y))2.(1.9)
Calculation of the exchange Galerkin matrix in the Hartree-Fock equation is a
challenging problem due to the nonlocal character of the exchange operator K
(Kϕ) (x) := 1
2ZR3
τ(x, y)
kxykϕ(y)dy, (1.10)
with the density matrix
τ(x, y) = 2
Norb
X
i=1
ϕi(x)ϕi(y).
Traditionally the Hartree and exchange potentials are computed by the analytical
evaluation of two-electron integrals, using the separable Gaussian basis functions,
or Gaussian type orbitals (GTOs), to represent the eigenfunctions,
ϕi(x) =
R0
X
k=1
ci,kgk(x), x = (x1, x2, x3)R3.(1.11)
Here basis functions gk,k= 1,...,R0, are naturally separable Cartesian Gaus-
sians represented in the rank-1 canonical tensor product form,
gk(x) = g(1)
k(x1)g(2)
k(x2)g(3)
k(x3),
20
with 1,2,3 designating space dimensions. When using the traditional analytical-
based approaches, computation of the Hartree and exchange potentials represents
a major bottleneck for the numerical solution of the Hartree-Fock and Kohn-Sham
equations.
5. Computation of the Hartree and exchange potentials
For grid-based agglomerated computation of (1.8) and the integral operator
(1.10) on 3D Cartesian grids, we introduce an appropriate fixed computational
box and apply a collocation discretization scheme for all involved multivariate
functions and a projection (Galerkin) scheme for representation of operators. In
particular, the collocation/projection scheme is used for representation of the
Newton kernel in the discrete tensor-product convolution. The canonical rank-
RNrepresentation of a tensor representing the projected Newton potential is
computed using the sinc-quadratures in [10].
The fast tensor product convolution of the multivariate functions in Rddevel-
oped in [61] with complexity O(dn log n), is applied in the computation of the
integral (1.8) and the integral operator (1.10). When applied in 3D case, it con-
siderably outperforms the benchmark convolution based on the 3D Fast Fourier
Transform (FFT) having the cost O(n3log n). Table 4.1 illustrates linear depen-
dence of the CPU times for convolution (and preliminary rank reduction for the
electron density) versus the univariate grid size nin the computation examples
of VHfor H2O, C2H6and CH4molecules.
Accuracy of the tensor-structured computations on a fixed grid is O(h2), where
h=O(n1) is the mesh-size. We achieve O(h3) accuracy in evaluation of the
convolution integral operators by using the Richardson extrapolation on a cou-
ple of consequent equidistant grids. Notice that the tensor convolution on non-
equidistant grids was described in [42, 43].
The size of the computational box for the considered molecules is in the range
of 14 ÷20
A. Our tensor-structured methods enable usage of large n×n×n3D
Cartesian grids, and hence fine mesh sizes
h102
Afor n= 1024,
h104
Afor n= 16384.
In the calculation of the Hartree potential VH, the electron density ρis a square
of a sum of Gaussians, see (1.9), (1.11), therefore it is expressed in terms of a
larger number of Gaussians Rρ0=R0(R0+1)
2. As a result, the convolution operator
with the rank RNNewton kernel, applies to the canonical tensor for ρ, with large
ranks Rρ0104. Moreover, the results of convolution with the ranks RC=
21
1 Introduction
Rρ0RN, should be used further to compute the Galerkin projection of the Hartree
potential in the Gaussian basis set, which can yield the total complexity of the
order O(R2
ρ0RNnlog n), to compute the Coulomb matrix in the Fock operator.
Hence, a rank reduction for the electron density is required, to enable evaluation
of the Hartree potential VHfor both large Rρ0and n. We apply the multigrid
accelerated C2T and T2C algorithms introduced in Sections 2, 3, to decrease Rρ0
to much smaller value,
RρRρ0,with Rρr2,
where ris the Tucker rank (usually, we use equal ranks, r=r, = 1,2,3).
In this way, we have linear scaling (up to lower order terms) in all parameters
of the input canonical tensors, at every step of evaluation of the Coulomb matrix,
Rank reduction scheme for the electron density. Complexity of C2T, WCT=
O(rpRρ0n), where pris the multigrid parameter. T2C complexity,
WTC=O(Rρn), where Rρr2.
The Hartree potential VH, is computed by the “agglomerated” tensor-product
convolution, WCC=O(RρRNnlog n).
Tensor scalar product WCC =O(nRρ0RρRN) to compute Coulomb matrix
entries.
In the case of large molecules, the MGA rank reduction can be applied to the
result of convolution, to reduce the rank of the tensor before computation of the
Coulomb matrix entries.
Figures in Section 4 show linear scaling of the CPU times with respect to both
Rρ0, and n, in the rank reduction and the tensor-product convolution. Figures
illustrate the absolute error in calculations of VHand the Coulomb matrix for
CH4, C2H6and H2O. Accuracy up to 106hartree (absolute error) even for cusp
areas in ρis achieved due to huge n×n×nCartesian grids with nup to 104.
Next, in Section 4, the tensor-structured evaluation of the nonlocal (integral)
exchange operator Kin the Hartree-Fock equation is considered. Note that com-
putation of the Hartree-Fock exchange leads to integration in six dimensions,
see (4.17). Mostly, analytical evaluation of these six-dimensional integrals was
considered in the literature (two-electron integrals), and many efforts have been
devoted to solution of this problem (see [109, 78] and references therein).
We introduce the “agglomerated” scheme for the computation of the exchange
matrix entries by “dividing” the integration into the following steps.
22
Tensor-product convolutions of the Newton kernel with the rank-R0canoni-
cal tensors representing the Hadamard product of the corresponding molec-
ular orbital with every Gaussian.
Computation of the Galerkin projection of the convolution result with re-
spect to the GTO basis set (by tensor scalar products), to obtain contribu-
tion to the entries of the exchange matrix from the corresponding orbital.
Summation over all Norb orbitals completes the matrix computation.
This tensor-product algorithm for evaluation of (4.16) has the complexity
W=O(neff R4
0Norb),
where Norb is the number of orbitals in a molecule, neff nis the “effective”
univariate grid size, and R0is the number of Galerkin basis functions. To reduce
the R0-asymptotics to O(R3
0), we apply the C2T algorithm to reduce the ranks
after the convolution step.
Note, that in the traditional ab initio electronic structure calculations, many
years have been devoted to the development of the rigorous schemes for the an-
alytical evaluation of the two-electron integrals inherent to this approach, which
yielded state-of-the-art packages like GAUSSIAN, Abinit and MOLPRO [5, 108].
In analytical-based programs, elaborated by large scientific groups, essential parts
of calculations use precomputed parameters and additional “non-zero” initial ap-
proximations for accelerating the iterative solution of the eigenvalue problem (1.6).
6. Description of the Hartree-Fock solver
In Section 5 of this dissertation we introduce the novel multilevel scheme for the
numerical solution of the Hartree-Fock equation (1.6) using the grid-based tensor-
structured methods to represent basic operators as described in previous sections.
The new concept for the numerical solution of the Hartree-Fock equation is a
“grey box” scheme based on a moderate number of the problem-adapted Galerkin
basis functions {gk}represented on 3D Cartesian grid, which are used as “global
elements” with the low separation rank3.
We solve the nonlinear EVP obtained by the Galerkin-type discretisation of the
Hartree-Fock equation (1.6), with respect to this global element basis”,
FCi=λiSCiwith F=H0+J(C)K(C), i = 1, ..., Norb,(1.12)
3Here, the grid-based GTO basis is chosen for the reasons of convenient comparison of the inter-
mediate results of computations with the MOLPRO output. Any appropriate algebraically
separable basis with the rank larger than 1 can be used instead.
23
1 Introduction
where the eigenvectors CiRR0form a matrix C={cki}= [C1C2. . . CNorb ]
RR0×Norb ,representing the Galerkin expansion coefficients of orbitals in the given
basis set, ϕi=
R0
P
k=1
ckigk. Here, J(C), and K(C), correspond to the Galerkin
matrices of the Hartree and exchange operators, respectively.
The discrete Hartree-Fock equation is solved by the self-consistent field (SCF)
iteration on a sequence of refined grids. To enhance the convergence, the SCF
iteration is supplemented by the direct inversion in the iterative subspace (DIIS)
scheme commonly used in the physical literature [89, 53, 16]. The multilevel
acceleration of DIIS iteration described in Section 5.2.3 leads to stable and fast
convergence on fine grid levels. We begin the iterative solution of the EVP with
the initial zero approach for the matrices J,K, (J(C) = 0, K(C) = 0), i.e., only
with the given core Hamiltonian part of the Fock operator, H0, which contains
the contribution of the nuclear external potential and the kinetic energy of the
electrons in a molecule (linear part of F).
The core of our method is the tensor-structured computation of the electron
density and Galerkin matrices of the Hartree and exchange operators J(C), K(C),
with O(nlog n)-complexity using an updated matrix C, at every step of nonlinear
iteration.
High accuracy is achieved due to 3D tensor-structured arithmetic, with rank
truncation, enabling computations over huge n×n×ntensor grids with up to 1012
entries at the finest level. In electronic structure calculations, this implies rather
fine resolution discussed above, which enables an arbitrary space orientation of a
molecule in the computational box, as in the case of analytically based methods4.
Our multilevel tensor-structured scheme for the numerical solution of the ab
initio Hartree-Fock equation includes the following ingredients.
In current computations we use, for simplicity, the H0part of the Fock
operator from MOLPRO (solution independent part).
Computations involving the n×n×n3D grid, start with a coarse approxi-
mation (say, for n0= 64), and proceed using the dyadic grid refinement up
to n= 1024 in the pseudopotential cases, and up to n= 8192 in all electron
cases.
Iterations for solving the EVP start with Jn0(C) = 0, Kn0(C) = 0.
At every iteration step, the Hartree and exchange parts of the Fock ma-
trix Fare updated by the tensor-structured computations with O(nlog n)
complexity.
4Note that our algorithms are computationally feasible using MATLAB on a standard SUN
station.
24
A grid dependent termination criterion is used for switching iterations to
the next level of grid refinement.
Specifically, as the termination criterion, we control the norms of the differences
of the orbitals taken as the residual over subsequent iterations. Our multilevel
strategy has a four-fold effect, since it
a) provides fast convergence of the SCF DIIS iterations on the coarse grids, in
spite of zero initial guess for Jn0(C), Kn0(C),
b) ensures good initial guess for iterations on (time consuming) finer grids,
c) allows the improved asymptotical approximation O(h3), via the Richardson
extrapolation over a sequence of grids.
d) allows to reduce considerably the number of recomputed entries in K(C) at
fine grid levels (much less than R2
0/2), by using the filtering strategy on fine
grid levels.
The discrete orbitals, represented by the respective coefficient vectors are updated
by diagonalising the Galerkin stiffness matrix at each iteration of the solution of
the nonlinear EVP problem, at the expense O(R3
0), where R0is the dimension
of the Galerkin subspace. We observe that the multilevel DIIS iteration exhibits
uniform linear convergence rate qNit , q < 1, where Nit is the number of itera-
tions, while the overall computational time for one iteration on an n×n×n3D
Cartesian grid scales as O(nlog n) in the univariate grid size n(under fixed rank
parameters).
The current version of our method still scales cubically in the size of the ap-
proximating basis. Hence, any algebraic optimisation of this basis set may give
new opportunity to high accuracy ab initio computations for large molecules. The
quadratic scaling in the size of the approximating basis might be possible for the
iterative solution of the discrete spectral problem, or in the framework of direct
minimization algorithms, see [95, 93] for the detailed discussion on the direct
minimization methods.
The numerical illustrations for the SCF iteration are presented for the all elec-
tron case of H2O, and the pseudopotential case of CH4, CH3OH and C2H5OH
molecules. Figures demonstrate the exponential convergence of the residuals with
respect to the number of iterations. The number of effective iterations, scaled with
respect to the time-unit required for one iteration on the finest grid level, is fairly
small. The uniform convergence rate q0.4 is observed for the multilevel itera-
tions, and it turns out to be quite small, q0.1, in terms of effective iterations.
25
1 Introduction
Numerical computations confirm almost linear scaling in n, and possibilities
for the low-rank representation of the multivariate functions and operators, indi-
cating attractive features of the multilevel tensor-truncated SCF iteration in the
prospects of efficient ab initio and DFT computations for large molecules.
To summarize, we note that the novel tensor-structured methods considered
in this dissertation address many interesting mathematical and algorithmic prob-
lems which need future rigorous theoretical and numerical analysis. Validity of
these methods in molecular computations is verified by the numerical results dis-
cussed in this thesis, signifying their perspectives in application to computational
problems in modern quantum chemistry.
Leipzig 2006-2010
Max-Planck Institute for Mathematics in the Sciences
26
2 Tensor structured (TS) methods
for functions in Rd,d3
During the last decades, the methods for the low-rank tensor approximation of
multiway data, originating from the fundamental papers [54, 106, 77], have been
intensively developed in application to the problems of chemometrics, psychomet-
ric, independent component analysis, signal processing and higher order statistics.
A thorough mathematical approval and analysis of the Tucker decomposition algo-
rithm has been presented in the seminal works on the higher order singular value
decomposition [24] and the best rank-(r1,...,rd) orthogonal Tucker approxima-
tion of higher order tensors [23]. A comprehensive survey [74] summarized the
results of the extensive research on tensor decomposition methods and applica-
tions in computer science.
In this section, we recall some of the results in [23, 24] and describe the new
tensor decomposition methods especially designed for application to the problems
of scientific computing. The validity of the Tucker model in scientific computing
can be understood on the base of numerical results on low rank Tucker-type
decomposition applied to classes of function related tensors in Rd,d= 3, [67] that
gainfully exploits the important analytical properties of a respective multivariate
function.
We begin with the description of the full format dth order tensors and definitions
of the basic rank-structured tensor formats: the rank-Rcanonical, the orthogonal
rank-(r1,...,rd) Tucker and mixed type decompositions. As a starting point, we
recall the important theorem in [21] that the minimization problem of the Tucker
approximation is equivalent to the dual maximization problem of finding vectors
of the orthogonal mapping dominating subspaces which provide the maximum
norm of the Tucker core tensor over the product manifold of orthogonal matrices.
Our Theorem 2.14 on the canonical-to-Tucker decomposition describes the re-
duced higher order SVD (RHOSVD) applied to the canonical rank-Rtensors and
provides the error estimate in terms of the truncated SVD of the -mode side
matrices, supplemented by the corresponding complexity bounds. This result is
an analogue to the well known higher order SVD (HOSVD) approximation for
the full format tensors in [24], but now applied to the case of rank-Rtargets.
27
2 Tensor structured (TS) methods for functions in Rd,d3
We present numerical illustrations on the Tucker approximation of full for-
mat tensors on examples of the following discretized functions in R3: Newton,
Yukawa and Helmholtz potentials, Slater and periodic multi-centered Slater-type
functions, and electron densities of small-size organic molecules.
This section is concluded by the summary of linear and bilinear operations
in the Tucker and canonical tensor formats with estimation of the required nu-
merical complexity. In general, along with the consequent multigrid accelerated
Tucker approximation techniques introduced in Section 3, the material of this
section provides sufficient tools for the construction of efficient tensor-structured
numerical methods in electronic structure calculations in R3(see Sections 4, 5).
2.1 Definitions of rank-structured tensor formats
2.1.1 Full format dth order tensors
A tensor of order dis a multidimensional array of real (complex) numbers whose
entries are referred by using a product index set I=I1×...×Id. We use the
common notation
A= [ai1...id:iI]RI, I={1, ..., n}, = 1, ..., d, (2.1)
to denote a dth order tensor, and ifor the d-tuple i= (i1, ..., id) of integers1. A
tensor Ais an element of the linear vector space Vn=RI, where n= (n1, ..., nd),
with the entrywise addition (A+B)i=ai+biand the multiplication (c A)i=c ai
(cR). The linear vector space Vnof tensors is equipped with the Euclidean
scalar product ,·i :Vn×VnR, defined as
hA, Bi:= X
(i1...id)∈I
ai1...idbi1...idfor A, B Vn.(2.2)
We call the related norm kAkF:= phA, Ai, the Frobenius norm, as for matrices.
Notice that a vector is an order-1 tensor, while a matrix is an order-2 tensor, so
the Frobenius tensor norm coincides with the Euclidean norm of vectors and the
Frobenius norm of matrices, respectively.
In some cases when the tensor size should be specified explicitly, we use the
equivalent notation for the linear space of tensors, Rn1×...×nd, instead of RI.
Some multilinear algebraic operations with tensors of order d(d3), can be
reduced to the standard linear algebra by unfolding of a tensor into a matrix.
Here, we recall the construction of matrix unfolding (or the so-called matriciza-
tion) as given in the survey [74]. First, we recall the notion of fibers given in
1The alternative notation for the tensor entries ai1...idis a(i1,...,id).
28
2.1 Definitions of rank-structured tensor formats
[74], which are the higher order analogue of matrix rows and columns. A fiber is
defined by fixing all indices of a tensor except one. In this way, a matrix column
is a mode-1 fiber, and a matrix row is a mode-2 fiber. The -mode matricization
of a tensor ARI1×...×Idarranges the -mode fibers of a tensor to be the columns
of the resulting matrix.
Definition 2.1 ([74]) The unfolding of a tensor along mode is a matrix of
dimension n×(n+1...ndn1...n1), further denoted by
A()= [aij]Rn×(n+1...ndn1...n1),(2.3)
whose columns are the respective fibers of Aalong the -th mode, such that the
tensor element ai1i2...idis mapped into the matrix element aijwhere
j= 1 +
d
X
k=1,k6=
(ik1)Jk,with Jk=
k1
Y
m=1,m6=
nm.
An illustration of the tensor unfolding A(), (= 1,2,3) for the 3-rd order tensor
is presented in Figure 2.3.
Another important tensor operation is the so-called contracted product of two
tensors. In the following, we frequently use its special case of the tensor-matrix
multiplication along mode .
Definition 2.2 ([21]) Given a tensor ARI1×...×Idand a matrix MRJ×I,
we define the respective mode-tensor-matrix product by
B=A×MRI1×...×I1×J×I+1...×Id,(2.4)
where
bi1...i1ji+1...id=
n
X
i=1
ai1...i1ii+1...idmji, jJ.
Notice that the order of indices J×Iin Definition 2.2 corresponds to the tradi-
tional contracted product notations for the Tucker decomposition as in (2.10).
The tensor-matrix product can be applied successively along several modes,
and it can be shown to be commutative
(A×M)×mP= (A×mP)×M=A×M×mP, 6=m.
The repeated (iterated) mode-tensor-matrix product for matrices Mand Pof
appropriate dimensions can be simplified as follows,
(A×M)×P=A×(PM),
29
2 Tensor structured (TS) methods for functions in Rd,d3
n3
n1
n2
r3
n2
n1
3
r3
n3
Figure 2.1: Contracted product of a 3rd order tensor with a matrix, see (2.4).
as discussed in [24].
For example, the contracted product of ARn1×n2×n3with a matrix M
Rr3×n3yields the tensor BRn1×n2×r3, see Figure 2.1.
An important property of the contracted product that resembles the matrix
transpose is given in the following Lemma.
Lemma 2.3 For any BRI1×...×I1×J×I+1...×Id, we have
hA×M, Bi=hA, B ×MTi.(2.5)
Proof: By definition,
hA×M, Bi=X
i∈I\I, jJ n
X
i=1
ai1,...i1,i,i+1,idmj,i!bi1,...,j,i+1,...,id
=X
i∈I
ai X
jJ
bi1,...i1,j,i+1,idmj,i!=hA, B ×MTi.
The number of entries in a full format tensor is
d
Q
=1
#I. Assume for simplicity
that #I=nfor all = 1, ..., d, then the number of entries in Aamounts to nd,
hence growing exponentially in d.
2.1.2 Tucker, canonical and mixed (two-level) tensor formats
To get rid of exponential scaling in the dimension approximate representations
in some classes S Vnof “rank structured” tensors will be applied. To that
end, the traditional concept of tensor-product Hilbert spaces (see, e.g. [91]) plays
an important role. Specifically, the initial linear vector space of tensors Vn, is
considered as the tensor-product Hilbert space Vn=d
=1Vof real-valued d-th
order tensors with V=RI, where RI(= 1, ..., d) is the standard Euclidean
“univariate” vector space.
The tensor product of vector spaces V(= 1, ..., d) is defined by using the usual
construction of the so-called rank-1 or elementary tensors: the tensor product of
30
2.1 Definitions of rank-structured tensor formats
vectors u()={u()
i}iIV(= 1, ..., d) forms the canonical rank-1tensor
U[ui]i∈I =u(1) ... u(d)Vnwith entries ui=u(1)
i1···u(d)
id.
Now the tensor product of vector spaces Vis defined by the span
d
=1V:= span{u(1) ... u(d):u()RI,1d}.
Taking all linear combinations of rank-1 tensors defined by the unit vectors in RI
(= 1, ..., d) shows that Vn=d
=1RI.
Notice that a rank-1 tensor requires only dn numbers to store it (now linear
scaling in the dimension). Moreover, the scalar product of two rank-1 tensors U
and Vin Vncan be represented by the componentwise univariate scalar products
hU, V i:=
d
Y
=1 u(), v(),
that can be calculated in O(dn) operations. When d= 2, the tensor product of
two vectors uRIand vRJrepresents a rank-1 matrix,
uv=uvTRI×J.
As the simplest rank structured ansatz, we make use of rank-1 tensors. In the
following, we consider the rank-structured representation of higher order tensors
based on sums of rank-1 tensors. Specifically, we shall use the Tucker, canonical
and mixed models.
Definition 2.4 (The canonical format).
Given a rank parameter RN, we denote by CR,n=CRVna set of tensors
which can be represented in the canonical format,
U=XR
ν=1ξνu(1)
ν...u(d)
ν, ξνR,(2.6)
with normalised vectors u()
νV(= 1, ..., d). The minimal parameter Rin
(2.6) is called the rank (or canonical rank) of a tensor.
Introducing the side-matrices corresponding to representation (2.6),
U()= [u()
1...u()
R]
and the diagonal tensor ξ:= diag{ξ1, ..., ξR}such that ξν1,...,νd= 0 except when
ν1=... =νdwith ξν,...,ν =ξν(ν= 1, ..., R), we obtain the equivalent contracted
product representation
U=ξ×1U(1) ×2U(2)... ×dU(d).(2.7)
31
2 Tensor structured (TS) methods for functions in Rd,d3
The canonical tensor representation is gainful for the multilinear tensor opera-
tions. In Section 2.4.2 it is shown that the linear tensor operations with tensors
in the rank-Rcanonical format have linear complexity Od
P
=1
nwith respect to
the univariate grid size nof a tensor. The disadvantage of this representation is
the lack of fast and stable algorithms for best approximation of arbitrary tensors
in the fixed-rank canonical format.
The rank-(r1,...,rd)Tucker tensor format [106, 23] is based on a representation
in subspaces
Tr:= d
=1Tof Vnfor certain TV
with fixed dimension parameters r:= dim Tn.
Definition 2.5 (The Tucker format).
For given rank parameter r= (r1, ..., rd), we denote by Tr,n(or shortly Tr) the
subset of tensors in Vnrepresented in the so-called Tucker format
A(r)=Xr1
ν1=1 ...Xrd
νd=1βν1,...dv(1)
ν1...v(d)
νdVn,(2.8)
with some vectors v()
νV=RI(1νr), which form the orthonormal
basis of r-dimensional subspaces T= span{v()
ν}r
ν=1 (= 1, ..., d). With fixed r
and n, we can also write
Tr,n:= {A d
=1Tfor arbitrary TV,such that dim T=r}.
Here we call the parameter r= max
{r}the maximal Tucker rank.
In our applications, we usually apply the Tucker approximation with rn, say
r=O(log n). The coefficients tensor β= [βν1,...,νd], that is an element of a tensor
space
Br=Rr1×...×rd,(2.9)
is called the core tensor. Introducing the (orthogonal) side matrices V()=
[v()
1...v()
r], such that V()TV()=Ir×r, we then use a tensor-by-matrix con-
tracted product to represent the Tucker decomposition of A(r)Tr,
A(r)=β×1V(1) ×2V(2)... ×dV(d).(2.10)
Remark 2.6 Notice that the representation (2.10) is not unique, since the tensor
A(r)is invariant under directional rotations. In fact, for any set of orthogonal
r×rmatrices Y(= 1, ..., d), we have the equivalent representation
A(r)=b
β×1b
V(1) ×2b
V(2)... ×db
V(d),
with
b
β=β×1Y1×2Y2... ×dYd,b
V()=V()YT
, = 1, ..., d.
32
2.1 Definitions of rank-structured tensor formats
Remark 2.7 If the subspaces T= span{v()
ν}r
ν=1 Vare fixed then the ap-
proximation A(r)Trof a given tensor AVnis reduced to the orthogonal
projection of Aonto the particular linear space Tr=d
=1TTr,n, that is
A(r)=
r
X
ν1,...,νd=1hv(1)
ν1... v(d)
νd, Aiv(1)
ν1...v(d)
νd
=A×1V(1)T×2···×dV(d)T×1V(1) ×2...×dV(d).
This property plays an important role in the computation of the best orthogonal
Tucker approximation, where the ”optimal” subspaces Tare recalculated within
a nonlinear iteration process.
In the following, to simplify the discussion of complexity issues, we assume that
r=r(= 1, ..., d). The storage requirements for the Tucker (resp. canonical)
decomposition is given by rd+drn (resp. R+dRn), where usually ris noticably
smaller than n. In turn, the maximal canonical rank of the Tucker representation
is bounded by rd1(see Remark 2.17).
Since the Tucker core still presupposes rdstorage, we introduce further the
approximation methods using a mixed (two-level) representation [60, 67] which
gainfully combines the beneficial features of both the Tucker and canonical mod-
els.
Definition 2.8 (The mixed (two-level) Tucker-canonical format).
Given the rank parameters r, R, we denote by TCR,rthe subclass of tensors in
Tr,nwith the core βrepresented in the canonical format, βCR,rBr. An
explicit representation of ATCR,ris given by
A= R
X
ν=1
ξνu(1)
ν...u(d)
ν!×1V(1) ×2V(2)... ×dV(d),(2.11)
with some u()
νRr. Clearly, we have the embedding TCR,rCR,nwith the cor-
responding (non-orthogonal) side-matrices U()= [V()u()
1...V ()u()
R], and scaling
coefficients ξν(ν= 1, ..., R).
A target tensor AVncan be approximated by a sum of rank-1 tensors as in
(2.8), (2.6), or using the mixed format TCR,ras in (2.11). More details on the
mixed (two-level) tensor format are given in Sections 2.2.4 and 2.2.5.
In the next sections we discuss fast and efficient methods to compute the cor-
responding rank structured approximations in different problem settings.
33
2 Tensor structured (TS) methods for functions in Rd,d3
A
r3
I3
I2
r2
(3)
(2)
I1
I
I
I
2
3
1r2r3
r1+ . . . +
1
(3) (3)
r
1
1
(2)
(1)
r2
r3
r1
+d r n
r1rd
nd
(1)
u
u
u u
b1 bR(2)
r
u
u(1)
r
2
R < = r
V
V
Vβ
ββ
Figure 2.2: Mixed Tucker-canonical representation of the full format 3rd order
tensor.
2.2 Best orthogonal Tucker approximation (BTA)
2.2.1 General discussion
The numerical Tucker-type approximation of dth order tensors is one of the most
practically important MLA operations. This operation is, in fact, one of the
possible higher order extensions of the best rank-rmatrix approximation in linear
algebra, based on the truncated SVD.
In our applications, we deal with the best Tucker approximation applied to
the so-called function related tensors, whose entries are computed by sampling a
given multivariate function in Rd,d= 3 over an n×n×nCartesian tensor grid.
Further, in the discussion of the numerical results in Section 2.3.2, we describe
the particular construction of function related tensors and observe some useful
properties of the Tucker format applied to these tensors.
In general, the target tensor A0to be approximated may belong itself to a
certain class S0Vnof data structured tensors. Since both Tr,nand CR,nare
not linear spaces we are led to the challenging nonlinear approximation problem
A0 S0Vn:f(A) := kA0Ak2min (2.12)
over all tensors A S with S {Tr,n,CR,n,TCR,r}. The target tensor A0might
inherit a certain data-sparse structure like S0 {CR0,n,Tr0,n}.
As the basic nonlinear approximation scheme, we consider the best orthogonal
rank-(r1, ..., rd) Tucker format corresponding to the choice S=Tr,n. Tensors
ATr, are parametrised as in (2.10), with the orthogonality constraints
V() Vn,r(= 1, ..., d),
34
2.2 Best orthogonal Tucker approximation (BTA)
where
Vn,r := {YRn×r:YTY=Ir×rRr×r}(2.13)
is the so-called Stiefel manifold of n×rorthogonal matrices. This minimisation
problem on the product of Stiefel manifolds was first addressed in [76].
In the following we denote by Gthe Grassman manifold that is the factor space
of the Stiefel manifold Vn,r(= 1, ..., d) in (2.13) with respect to all possible
rotations, see Remark 2.6.
For a wide class of function related tensors, the quality of approximation via the
minimisation (2.12) can be effectively controlled by the Tucker rank. In particular,
for certain classes of function related tensors it is possible to prove exponential
convergence (cf. [44, 60]),
kA(r)A0k Ceαˆrwith ˆr= min
r,(2.14)
where A(r)is a minimizer in (2.12). As a consequence, the approximation error
ε > 0 can be achieved with ˆr=O(|log ε|).
The following Lemma proves that the relative difference of norms of the best
rank-(r1,...,rd) Tucker approximation A(r)and the target A0is estimated by the
square of the relative Frobenius norm of A(r)A0.
Lemma 2.9 (quadratic convergence in norms). Let A(r)RI1×...×Idsolve the
minimisation problem (2.12) over ATr. Then we have the ”quadratic” relative
error bound kA0kkA(r)k
kA0kkA(r)A0k2
kA0k2.(2.15)
Moreover, it holds kβk=kA(r)k kA0k.
Proof: First part of the proof is given for the completeness (cf. [23] for a short
exposition). Letting A(r)=β×1V(1) ×2V(2) ...×dV(d)and using Lemma 2.3,
we easily obtain the identity
kA(r)k=kβk,(2.16)
since orthogonal matrices V() Vn,rdo not effect the Frobenius norm. Further-
more, with fixed V()(= 1, ..., d), relation (2.12) is merely a linear least-square
problem with respect to βRr1×...×rd,
g(β) := hA0, A0i2hA0,β×1V(1) ×2...×dV(d)i+hβ,βi min.(2.17)
Hence, the corresponding minimisation condition
g(β+δβ)g(β)0δβRr1×...×rd,
35
2 Tensor structured (TS) methods for functions in Rd,d3
leads to the following equation for the minimiser,
−hA0, δβ×1V(1) ×2...×dV(d)i+hβ, δβi= 0 δβRr1×...×rd.
This implies, using Lemma 2.3, that
h−A0×1V(1)T×2...×dV(d)T+β, δβi= 0,δβRr1×...×rd,
and we obtain
βA0×1V(1)T×2...×dV(d)T= 0.(2.18)
Next we readily derive
f(A(r)) = kA(r)k22hβ×1V(1) ×2...×dV(d), A0i+kA0k2
=kA(r)k2+kA0k22hβ, A0×1V(1)T×2...×dV(d)Ti,
=kA0k2kβk2,
hence it follows that
kA0k2kA(r)k2=kA(r)A0k2.(2.19)
The latter leads to the estimate (clearly (2.19) implies kA0k kA(r)k)
kA0kkA(r)k
kA0k=kA(r)A0k2
(kA(r)k+kA0k)kA0kkA(r)A0k2
kA0|2,
that completes the proof.
The key point for the efficient solution of the minimization problem (2.12) with
S=Tr,nis its equivalence to the dual maximisation problem [23],
[Z(1), ..., Z(d)] = argmax
hv(1)
ν1... v(d)
νd, Air
ν=1
2
Br
(2.20)
over the set of side-matrices V()= [v()
1. . . v()
r] in the Stiefel manifold Vn,r, as
in (2.13).
The following lemma reduces the minimisation of the original quadratic func-
tional to the dual maximisation problem thus eliminating the core tensor βfrom
the optimization process.
Lemma 2.10 ([23]) For given A0RI1×...×Id, the minimisation problem (2.12)
on Tris equivalent to the dual maximisation problem
g(Y(1), ..., Y (d)) :=
A0×1Y(1)T×2... ×dY(d)T
2max (2.21)
36
2.2 Best orthogonal Tucker approximation (BTA)
over a set Y()Rn×rfrom the Grassman manifold, i.e., Y() G(= 1, ..., d).
For given maximizing matrices V(m)(m= 1, ..., d), the tensor βminimising
(2.12) is represented by
β=A0×1V(1)T×2... ×dV(d)TRr1×...×rd.(2.22)
Proof: Inspecting the proof of Lemma 2.9 we find that substitution of βin (2.18)
to (2.17) leads to the equivalent minimizing equation
hA0, A0ihA0×1V(1)T×2...×dV(d)T, A0×1V(1)T×2...×dV(d)Ti min,(2.23)
that proves (2.21). Finally, (2.18) yields (2.22).
In view of Remark 2.6, the rotational non-uniqueness of the maximizer in (2.20)
can be avoided if one solves this maximisation problem in the so-called Grassmann
manifold that is the factor space of Vn,rwith respect to the rotational transforms
[26]. The dual maximisation problem (2.21) posed on the compact manifold can be
proven to have at least one global maximum (see [60, 26]). For the size consistency
of the arising tensors, we require the natural compatibility conditions
r¯r:= r1...r1r+1...rd, = 1, ..., d. (2.24)
2.2.2 BTA algorithm for full format tensors
The best (nonlinear) Tucker approximation (BTA) based on solving the dual
maximization problem (2.20) is usually solved numerically by the ALS iteration
with the initial guess computed by the higher order SVD [24].
The generalization of the SVD to the dth order tensors has been introduced in
[24] in application to the multidimensional problems in signal processing. It is
called the higher order SVD (HOSVD) or dth order SVD. We recall the theorem
with the basic notations as in [23].
Theorem 2.11 (dth order SVD, HOSVD, [23]).
Every complex n1×n2×... ×nd-tensor Acan be written as the product
A=S ×1U(1) ×2U(2)... ×dU(d),
in which
1. U()= [U()
1U()
2...U()
n]is a unitary n×n-matrix,
2. Sis a complex n1×n2×...×nd-tensor of which the subtensors Si=α, obtained
by fixing the th index to α, have the properties of
37
2 Tensor structured (TS) methods for functions in Rd,d3
(i) all-orthogonality: two subtensors Si=αand Si=βare orthogonal for all pos-
sible values of ,α, and βsubject to α6=β:
hSi=α,Si=βi= 0 when α6=β,
(ii) ordering: kSi=1k kSi=2k ... kSi=nk 0for all positive values of .
The Frobenius norms kSi=ik, symbolized by σ()
i, are -mode singular values of
A()and the vector U()
iis an ith -mode left singular vector of A().
Next theorem gives the error bound for the truncated HOSVD.
Theorem 2.12 (approximation by HOSVD, [24]). Let the HOSVD of Abe given
as in Theorem 2.11 and let the -mode rank of A, rank(A()), be equal to R
(= 1, ..., d). Define a tensor ˜
Aby discarding the smallest -mode singular values
σ()
r+1, σ()
r+2, ..., σ()
Rfor given values of r(= 1, ..., d), i.e., set the corresponding
parts of Sequal to zero. Then we have
kA˜
Ak2
R1
X
i1=r1+1
σ(1)
i1
2+
R2
X
i2=r2+1
σ(2)
i2
2+···+
Rd
X
id=rd+1
σ(d)
id
2.
Proof: We have
kA˜
Ak2=
R1
X
i1=1
R2
X
i2=1 ···
Rd
X
id=1
s2
i1i2...id
r1
X
i1=1
r2
X
i2=1 ···
rd
X
id=1
s2
i1i2...id
R1
X
i1=r1+1
R2
X
i2=1 ···
Rd
X
id=1
s2
i1i2...id+
R1
X
i1=1
R2
X
i2=r2+1 ···
Rd
X
id=1
s2
i1i2...id
+···+
R1
X
i1=1
R2
X
i2=1 ···
Rd
X
id=rd+1
s2
i1i2...id
=
R1
X
i1=r1+1
σ(1)2
i1+
R2
X
i2=r2+1
σ(2)2
i2+···+
Rd
X
id=rd+1
σ(d)2
id,
that completes the proof.
Next, we recall the method to solve the (local) maximization problem in Lemma
2.10 which is based on the alternating least squares (ALS) iteration. For the
full format tensors, the sketch of ALS algorithm G BTA reads as follows (see
Algorithm 4.2 in [23] for more details).
Algorithm G BTA (VnTr,n). Given the input tensor AVn, the Tucker
rank r, and the maximum number of ALS iterations kmax 1.
38
2.2 Best orthogonal Tucker approximation (BTA)
SVD
SVD
SVD
(1)
(2)
(3)
r1
r2
r3
n1
n2
n3
n1
n . n
n2
n3
n . n
3 2
1 3
2 1
nn
.
0
0
0
V
V
V
B
B
B
[1]
[2]
[3]
Figure 2.3: Truncated HOSVD of a tensor ARn1×n2×n3is computed by the truncated
SVD of the -mode unfolding matrices, = 1,2,3.
1. Compute the truncated” HOSVD of Ato obtain an initial guess V()
0
Rn×rfor the -mode side-matrices V()(= 1, ..., d) (“truncated” SVD ap-
plied to each matrix unfolding A()). The complexity of HOSVD is bounded
by
W=O(dnd+1).(2.25)
2. For k= 1 : kmax perform:
for each q= 1, ..., d, and with fixed side-matrices V()
k1Rn×r,6=q, the
ALS iteration optimises the q-mode matrix V(q)
kvia computing the domi-
nating rq-dimensional subspace (truncated SVD) for the respective matrix
unfolding
B(q)Rnqׯrq,¯rq=r1...rq1rq+1...rd=O(rd1),(2.26)
corresponding to the tensor obtained by the q-mode contracted product
B=A×1V(1)
k
T×2... ×q1V(q1)
k
T×q+1 V(q+1)
k1
T... ×dV(d)
k1
T.
Each iteration has the cost O(drd1nmin{rd1, n}+dndr), that represents
the expense of SVDs and the computation of matrix unfoldings B(q).
3. Set V()=V()
kmax , and compute the core βas the representation coefficients
of the orthogonal projection of Aonto Tn=d
=1Twith T= span{v()
ν}r
ν=1
39
2 Tensor structured (TS) methods for functions in Rd,d3
(see Remark 2.7),
β=A×1V(1)T×2... ×dV(d)TBr,
at the cost O(rdn).
With fixed kmax, the overall complexity of the algorithm for d= 3, n=n, and
r=r(= 1,2,3) (suppose that r2n), is estimated by
WFT=O(n4+n3r+n2r2+nr3) = O(n4),
where different summands denote the cost of initial HOSVD of A, computation
of unfolding matrices B(q), related SVDs, and computation of the core tensor.
For the class of function related tensors we observe fast and robust local con-
vergence of the ALS iteration (though it is not always the case in traditional
applications of the Tucker decomposition in the independent component analysis
and chemometrics). This fact can be, probably, illuminated by the exponential
error bound in the Tucker rank for the rank-rorthogonal approximation (see
(2.14)), which is often observed in applications to tensors representing the physi-
cally relevant functions [67].
However, notice that the Tucker model applied to the general fully populated
tensor of size ndrequires O(dnd+1) arithmetical operations due to the presence of
complexity dominating higher-order SVD. Hence, in computational practice this
algorithm applies only to small dand moderate n.
2.2.3 BTA for rank-Rcanonical input
In some applications, for example in electronic structure calculations, the target
tensor is already presented in the rank-Rcanonical format, ACR,n, as in (2.6),
but with relatively large R,
U(R)=XR
ν=1ξνu(1)
ν...u(d)
ν, ξνR.
The complexity of tensor-structured operations, in spite of linear scaling with
respect to the one-dimension grid size, is O(R1R2n), is thus increasing also linearly
with respect to the product of the ranks R1and R2of the input tensors. Hence,
large parameters R1and R2may as well lead to substantial numerical cost.
In this case, to reduce the ranks of input tensors, we develop the two-level
canonical-to-Tucker (C2T) approximation with the consequent Tucker-to-canonical
(T2C) transform. The resulting computational cost of MLA operations supple-
mented by such a two-level method can be reduced essentially.
40
2.2 Best orthogonal Tucker approximation (BTA)
The corresponding canonical-to-Tucker-to-canonical approximation scheme [68]
is presented as the following two-level chain,
CR,n
I
TCR,r
II
TCR,r.(2.27)
Here, on Level-I, we compute the best orthogonal Tucker approximation applied
to the CR,n-type input, so that the resultant core is represented in the CR,rformat.
On Level-II, the small-size Tucker core in CR,ris approximated by a tensor in CR,r
with R< R. Here we describe the Algorithm on Level-I (which is, in fact, the
most laborious part in computational scheme (2.27)) that has a polynomial cost
in the size of the input data in CR,n(see Remark 2.15).
Next theorem gives the characterisation on the solution structure for the Level-
I scheme in (2.27), and provides the key ingredients to construct its efficient
numerical implementation provided that the target Ais represented by (2.6).
It also presents the error estimates for the reduced rank-rHOSVD type ap-
proximation (RHOSVD), given in the definition below. Suppose for definiteness
that nR, so that an SVD of the side-matrix U()is given by
U()=Z()DW()T=
n
X
k=1
σℓ,kz()
kw()
k
T, z()
kRn, w()
kRR,
with orthogonal matrices Z()= [z()
1, ..., z()
n], and W()= [w()
1, ..., w()
n], =
1, ..., d. We use the following notations for the vector entries, w()
k(ν) = w()
k
(ν= 1, ..., R).
Definition 2.13 (RHOSVD).
Introduce the truncated SVD of the side-matrices U(),Z()
0Dℓ,0W()
0
T, (= 1, ..., d),
where Dℓ,0=diag{σℓ,1, σℓ,2, ..., σℓ,r}and Z()
0Rn×r,W0()RR×r, represent
the orthogonal factors being the respective submatrices of Z()and W(). Then the
RHOSVD approximation is given by
A0
(r)=ξ×1hZ(1)
0D1,0W(1)
0
Ti×2hZ(2)
0D2,0W(2)
0
Ti···×dhZ(d)
0Dd,0W(d)
0
Ti.(2.28)
Theorem 2.14 (Canonical to Tucker approximation).
(a) Let ACR,nbe given by (2.6). Then the minimisation problem
ACR,nVn:A(r)= argminT∈Tr,nkATkVn,(2.29)
is equivalent to the dual maximisation problem
[V(1), ..., V (d)] = argmax
Y()∈G
R
X
ν=1
ξνY(1)Tu(1)
ν... Y(d)Tu(d)
ν
2
Br
,(2.30)
41
2 Tensor structured (TS) methods for functions in Rd,d3
over the Grassman manifolds G,Y()= [y()
1...y()
r] G(= 1, ..., d), and where
Y()Tu()
νRr.
(b) The compatibility condition (2.24) is simplified to
rrank(U())with U()= [u()
1...u()
R]Rn×R,
and we have the solvability of (2.30) assuming that the above relation is valid.
The maximizer is given by orthogonal matrices V()= [v()
1...v()
r]Rn×r, which
can be computed similarly, as in Algorithm GBTA, where the truncated HOSVD
at Step 1 is now substituted by RHOSVD, see(2.28).
(c) The minimiser in (2.29) is then calculated by the orthogonal projection
A(r)=
r
X
k=1
µkv(1)
k1... v(d)
kd, µk=hv(1)
k1···v(d)
kd, Ai,
so that the core tensor µ= [µk], can be represented in the rank-Rcanonical format
µ=
R
X
ν=1
ξν(V(1)Tu(1)
ν)···(V(d)Tu(d)
ν)CR,r.(2.31)
(d) Let σℓ,1σℓ,2... σℓ,min(n,R)be the singular values of the -mode side-
matrix U()Rn×R(= 1, ..., d). Then the RHOSVD approximation A0
(r), as in
(2.28), exhibits the error estimate
kAA0
(r)k kξk
d
X
=1
(
min(n,R)
X
k=r+1
σ2
ℓ,k)1/2,(2.32)
where kξk=sR
P
ν=1
ξ2
ν.
Proof: (a) The generic dual maximization problem (2.20) with Agiven by (2.6),
now takes the form (2.30) due to the relation
hy(1)
k1... y(d)
kd, Ai=
R
X
ν=1
ξνhy(1)
k1, u(1)
νi...hy(d)
kd, u(d)
νi.
(b) The compatibility condition ensures the size-consistency of all matrix unfold-
ings. Let us assume that nRfor definiteness. To justify the choice of Z()
0,
we notice that using the contracted product representation (2.7) of the canonical
tensors ACR,n,
A=ξ×1U(1) ×2U(2) ···×dU(d),
42
2.2 Best orthogonal Tucker approximation (BTA)
we have expansion (2.28) for the RHOSVD. Now we start from the error repre-
sentation
AA0
(r)=ξ×1U(1) ×2U(2) ···×dU(d)
ξ×1hZ(1)
0D1,0W(1)
0
Ti×2hZ(2)
0D2,0W(2)
0
Ti···×dhZ(d)
0Dd,0W(d)
0
Ti
=ξ×1hU(1) Z(1)
0D1,0W(1)
0
Ti×2hZ(2)
0D2,0W(2)
0
Ti···×dhZ(d)
0Dd,0W(d)
0
Ti
+ξ×1U(1) ×2hU(2) Z(2)
0D2,0W(2)
0
Ti···×dhZ(d)
0Dd,0W(d)
0
Ti+...
+ξ×1U(1) ×2U(2) ···×dhU(d)Z(d)
0Dd,0W(d)
0
Ti.
To proceed, we introduce
()=U()Z()
0Dℓ,0W()
0
T, U()
0=Z()
0Dℓ,0W()
0
T,
then the th summand in the right-hand side above takes the form
B=ξ×1U(1) ···×1U(1) ×()×+1 U(+1)
0···×dU(d)
0.
This leads to the error bound (by the triangle inequality)
kAA0
(r)k
d
X
=1 kBk=kξ×1(1) ×2U(2)
0···×dU(d)
0k
+kξ×1U(1) ×2(2) ···×dU(d)
0k+...
+kξ×1U(1) ×2U(2) ···×d(d)k.
Here the th term Bcan be represented by
R
X
ν=1
ξνu(1)
ν···u(1)
ν
n
X
k=r+1
σℓ,kz()
kw()
k
r+1
X
k=1
σ+1,kz(+1)
kw(+1)
k ···
rd
X
k=1
σd,kz(d)
kw(d)
k,
providing the estimate (take into account that ku()
νk= 1, = 1, ..., d,ν= 1, ..., R)
kBk
R
X
ν=1 |ξν|(
n
X
k=r+1
σ2
ℓ,kw()
k
2)1/2·(
r+1
X
k=1
σ2
+1,kw(+1)
k
2)1/2···(
rd
X
k=1
σ2
d,kw(d)
k
2)1/2.
Recall that U()(= 1, ..., d) has normalised columns, i.e., 1 = ku()
νk=k
n
P
k=1
σℓ,kz()
kw()
kk,
implying
n
P
k=1
σ2
ℓ,kw()
k
2= 1 for = 1, ..., d and ν= 1, ..., R.
43
2 Tensor structured (TS) methods for functions in Rd,d3
Hence, we finalise the error bound as follows,
kAA0
(r)k
d
X
=1
R
X
ν=1 |ξν| n
X
k=r+1
σ2
ℓ,kw()
k
2!1/2
d
X
=1 R
X
ν=1
ξ2
ν!1/2 R
X
ν=1
n
X
k=r+1
σ2
ℓ,kw()
k
2!1/2
=
d
X
=1 kξk n
X
k=r+1
σ2
ℓ,k
R
X
ν=1
w()
k
2!1/2
=kξk
d
X
=1 n
X
k=r+1
σ2
ℓ,k!1/2
.
The case R < n can be analysed along the same line. Now item (d) follows.
We notice that the error estimate (2.32) in Theorem 2.14 actually provides the
control of the RHOSVD approximation error via the computable -mode error
bounds since, by the construction, we have
kU()Z()
0Dℓ,0W()
0k2
F=
n
X
k=r+1
σ2
ℓ,k, = 1, ..., d.
This result is similar to the well-known error estimate for the HOSVD approxi-
mation (see Theorem 2.12 and Property 10 in [24]).
Based on Theorem 2.14 the corresponding algorithm C BTA for the rank-Rin-
put data can be designed by respective modifications of Steps 1, 2 in the general
GBTA scheme. In this way we note that each column of the -mode unfold-
ing matrix A()for the rank-Rcanonical tensor in (2.6) can be represented as
the weighted sum of the -mode canonical vectors u()
ν(ν= 1, ..., R) implying
rank(A())R. Keeping this modification in mind, the sketch of the new algo-
rithm C BTA reads as follows.
Algorithm C BTA (CR,nTCR,r). Given ACR,nin the form (2.6), an
iteration parameter kmax, and the rank parameter r.
1. For = 1, ..., d compute the truncated SVD of U()to obtain orthogonal
matrices Z()
0Rn×r, representing the rank-rRHOSVD approximation
of -mode dominating subspaces (cost O(dRn min{R, n})).
2. Given an initial guess Z()
0, (= 1, ..., d) for -mode orthogonal matrices.
Perform kmax ALS iterations as at Step 2 in the general G BTA algorithm
to obtain the maximizer V()Rn×r,= 1, ..., d (cost O(drd1nmin{rd1, n})
per iteration).
44
2.2 Best orthogonal Tucker approximation (BTA)
3. Calculate projections of U()onto the basis of orthogonal vectors of W()as
the matrix product V()TU()(= 1, ..., d), at the cost O(drRn).
4. Using the columns in V()TU()(= 1, ..., d), calculate the rank-Rcore
tensor µCR,ras in (2.31), in O(drRn) operations and with O(drR)-
storage.
Notice that Step 2 in Algorithm C BTA (CR,nTCR,r) above is not manda-
tory. It can be omitted if the initial guess Z()
0turns out to be “good enough”
with respect to chosen threshold criterion (see estimate (2.32)). Our numerical
study indicates that in the case of tensors related to the class of functions de-
scribing physical quantities in electronic structure calculations, Step 2 in the C2T
transform is not required.
The following remark addresses the complexity issues.
Remark 2.15 Algorithm C BTA (CR,nTCR,r) exhibits polynomial cost in R, r, n,
O(dRn min{n, R}+drd1nmin{rd1, n}),
with exponential scaling in d. In absence of Step 2 (if RHOSVD provides a sat-
isfactory approximation), the algorithm does not contain iteration loops, and for
any d2it is a finite SVD-based scheme.
Numerical tests show that Algorithm C BTA(CR,nTCR,r) is efficient for mod-
erate Rand n, in particular, it works well in electronic structure calculations on
3D Cartesian grids for moderate grid size n.103and for R103. However,
in real life applications the computations may require one-dimension grid sizes in
the range n.3·104, (= 1,2,3) with canonical ranks R104. Therefore,
to get rid of a polynomial scaling in R, n, r for 3D applications, we develop new
generation of best Tucker approximation methods based on the idea of multigrid
acceleration of the nonlinear ALS iteration, introduced in Section 3.
2.2.4 Mixed BTA for full format and Tucker tensors
A further reduction of numerical complexity for the Tucker model is based on
the concept of the mixed (two-level) Tucker-canonical approximation [60, 67, 68],
see Definition 2.8. The main idea of the mixed approximation consists of a rank-
structured representation to the Tucker core βin certain tensor classes S Br.
In particular, we consider a class S=CR,rof rank-Rcanonical tensors, i.e.,
βCR,r.
In the case of full format tensors, the two-level version of Algorithm G BTA
(VnTr,n) can be described as the following computation chain,
Vn
I
Tr,n
II
TCR,rCR,n,
45
2 Tensor structured (TS) methods for functions in Rd,d3
where the Level-I is understood as application of Algorithm G BTA (VnTr,n)
and the Level-II includes the rank-Rcanonical approximation to the small size
Tucker core βBr. Figure 2.2 illustrates the computational scheme of the two-
level Tucker approximation.
In the case of function related tensors, our goal is to compute the Level-I ap-
proximation with linear cost in the size of the input data (see Section 3.3).
If the input tensor A0is already presented in the rank-rTucker format then one
can apply the following Lemma 2.16. This lemma presents a simple but useful
characterisation of the mixed (two-level) Tucker model (cf. [60, 67]) which allows
to approximate the elements in Trvia the canonical decomposition applied to
the small sized core tensor.
Lemma 2.16 (Mixed Tucker-to-canonical approximation).
Let the target tensor ATr,nin (2.12) have the form A=β×1V(1) ×2... ×d
V(d)with the orthogonal side-matrices V()= [v()
1. . . v()
r]Rn×rand with β
Rr1×...×rd. Then, for a given Rmin
1dr(see (2.37), (2.38)),
min
ZCR,nkAZk= min
µCR,rkβµk.(2.33)
Assume that there exists a best rank-Rapproximation A(R)CR,nof A, then
there is a best rank-Rapproximation β(R)CR,rof β, such that
A(R)=β(R)×1V(1) ×2... ×dV(d).(2.34)
Proof: Below we present the more detailed proof compared with the sketch in
Lemma 2.5, [67]. Notice that the canonical vectors y()
kof any test element (see
(2.6)) in the left-hand side of (2.33),
Z=
R
X
k=1
λky(1)
k... y(d)
kCR,n,(2.35)
can be chosen in span{v()
1,...,v()
r}, i.e.,
y()
k=
r
X
m=1
µ()
k,mv()
m, k = 1, . . . , R, = 1, ..., d. (2.36)
Indeed, assuming
y()
k=
r
X
m=1
µ()
k,mv()
m+E()
kwith E()
kspan{v()
1,...,v()
r},
46
2.2 Best orthogonal Tucker approximation (BTA)
we conclude that E()
kdoes not effect the cost function in (2.33) because of the
orthogonality of V(). Hence, setting E()
k= 0, and substituting (2.36) into (2.35),
we arrive at the desired Tucker decomposition of Z,
Z=βz×1V(1) ×2...×dV(d),βzCR,r.
This implies
kAZk2=k(βzβ)×1V(1) ×2...×dV(d)k2=kββzk2min
µCR,rkβµk2.
On the other hand, we have
min
ZCR,nkAZk2min
βzCR,rk(ββz)×1V(1) ×2...×dV(d)k2= min
µCR,rkβµk2.
Hence, we arrive at (2.33).
Likewise, for any minimizer A(R)CR,nin the right-hand side of (2.33), we
obtain
A(R)=β(R)×1V(1) ×2V(2)... ×dV(d)
with the respective rank-Rcore tensor
β(R)=
R
X
k=1
λku(1)
k... u(d)
kCR,r,
where u()
k={µ()
k,m}r
m=1 Rr, are calculated by using representation (2.36),
and then changing the order of summation,
A(R)=
R
X
k=1
λky(1)
k... y(d)
k
=
R
X
k=1
λk r1
X
m1=1
µ(1)
k,m1v(1)
m1!... rd
X
md=1
µ(d)
k,mdv(d)
md!
=
r1
X
m1=1
...
rd
X
md=1 (R
X
k=1
λk
d
Y
=1
µ()
k,m)v(1)
m1... v(d)
md.
Now the relation (2.34) implies that
kAARk=kββ(R)k,
since the -mode multiplication with orthogonal side matrices V()does not change
the cost function. Using the already proven relation (2.33) this indicates that β(R)
is the minimizer in the right-hand side of (2.33).
47
2 Tensor structured (TS) methods for functions in Rd,d3
Lemma 2.16 means that the corresponding low rank Tucker-canonical approxi-
mation of ATr,ncan be reduced to the canonical approximation of a small-size
core tensor.
Lemma 2.16 suggests a two-level dimensionality reduction approach that leads
to a better data structure compared with the standard Tucker model. Though
A(R)CR,ncan be represented in the mixed Tucker-canonical format, its efficient
storage depends on further multilinear operations. In fact, if the resultant tensor is
further used in scalar, Hadamard or convolution products with canonical tensors,
it is better to store A(R)in the canonical format of the complexity rdn.
2.2.5 Remarks on the Tucker-to-canonical transform
In the rank reduction scheme for the canonical rank-Rtensors, we use conse-
quently the canonical-to-Tucker (C2T) transform and then the Tucker-to-canonical
(T2C) tensor approximation. Next, we give two useful remarks which characterize
the canonical representation of the full format tensors.
Remark 2.17 applied to the Tucker core tensor of the size r×r×r, indicates
that the ultimate canonical rank of a large-size tensor in Vnhas the upper bound
r2. According to Remark 2.18, its canonical rank can be reduced to a smaller
value using the SVD-based truncation procedure up to a fixed tolerance ε > 0.
Denote by nthe single-hole product of -mode dimensions
n=n1···n1n+1 ···nd.(2.37)
Remark 2.17 The canonical rank of a tensor AVnhas the upper bound
Rmin
1dn.(2.38)
Proof: First, consider the case d= 3. Let n1= max
1dnfor definiteness. We can
represent a tensor Aas
A=
n3
X
k=1
BkZk, BkRn1×n2, ZkRn3,
where Bk=A(:,:, k) (k= 1,...,n3) is the n1×n2matrix slice of Aand Zk(i) = 0,
for i6=k,Zk(k) = 1. Let rank(Bk) = rkn2,k= 1,...,n3, then
rank(BkZk) = rank(Bk)n2,
and we obtain
rank(A)
n3
X
n=1
rank(Bk)n2n3= min
13n.
48
2.2 Best orthogonal Tucker approximation (BTA)
The general case of d > 3 can be proven similarly by induction argument.
The next remark shows that the maximal canonical rank of the Tucker core
of 3rd order tensor can be easily reduced to the value r2by the SVD-based
procedure. Though, being not practically attractive for arbitrary high order ten-
sors, the simple algorithm described in Remark 2.18 is proved to be useful for
the treatment of small size 3rd order Tucker core tensors in the rank reduction
algorithms described in the previous sections.
Remark 2.18 Let d= 3 for the sake of clearness. There is a simple procedure
based on SVD to reduce the canonical rank of the core tensor β, within the accu-
racy ε > 0. Denote by BmRr×r,m= 1, ..., r the two-dimensional slices of βin
some fixed mode. Hence, we can represent
β=
r
X
m=1
BmZm, ZmRr,(2.39)
where Zm(m) = 1,Zm(j) = 0 for j= 1,...,r,j6=m(there are exactly dpossible
decompositions). Let pmbe the minimal integer, such that the singular values of
Bmsatisfy σ(m)
kε
r3/2for k=pm+ 1, ..., r (if σ(m)
r>ε
r3/2, then set pm=r).
Then, denoting by
Bpm=
pm
X
km=1
σ(m)
kmukmvkm,
the corresponding rank-pmapproximation to Bm(by truncation of σ(m)
pm+1, ..., σ(m)
r),
we arrive at the rank-Rcanonical approximation to β,
β(R):=
r
X
m=1
BpmZm, ZmRr,(2.40)
providing the error estimate
kββ(R)k
r
X
m=1 kBmBpmk=
r
X
m=1 v
u
u
t
r
X
km=pm+1
(σ(m)
km)2
r
X
m=1 rrε2
r3=ε
Representation (2.40) is a sum of rank-pmterms so that the total rank is bounded
by Rp1+... +prr2.
This approach can be easily extended to arbitrary d3with the bound Rrd1.
49
2 Tensor structured (TS) methods for functions in Rd,d3
2.3 Numerics on BTA of function related tensors in
R3
2.3.1 General description
Here we discuss the best rank-rTucker decomposition of 3D tensors arising as
discretization of classical potentials.
For a given continuous function g: R, := Qd
=1[a, b]Rd,−∞ < a<
b<, we introduce the collocation-type function related tensor of order dby
A0A0(g) := [ai1...id]RI1×...×Idwith ai1...id:= g(x(1)
i1,...,x(d)
id),
where (x(1)
i1,...,x(d)
id)Rdare grid collocation points, indexed by I=I1×...×Id,
x()
i=a+i1
2ba
n, i= 1,2,...,n, = 1,...,d, (2.41)
which are the midpoints of nequally spaced subintervals with the size h=ba
n,
in the intervals [a, b], corresponding to mode . We are interested in the validity
and the rank-dependence of the Tucker approximations to the function related
tensors discretized on 3D Cartesian grid.
The initial tensor A0is approximated by a rank r= (r, ..., r) Tucker tensor,
where the rank-parameter rincreases from r= 1,2, ... to some predefined value,
rmax. The orthogonal Tucker vectors and the core tensor of the size r×r×r
are then used for the construction of the approximating tensor A(r)A0, for
estimating the approximation properties of the tensor decomposition with the
given rank. For every Tucker rank rin the respective range, we compute the
relative error in the Frobenius norm as in (2.2)
EF N =kA0A(r)k
||A0|| ,(2.42)
the relative difference of norms
EF E =kA0kkA(r)k
||A0|| ,(2.43)
as well as the relative maximum norm
EC:= maxi∈I |a0,i ar,i|
maxi∈I |a0,i|.(2.44)
50
2.3 Numerics on BTA of function related tensors in R3
2 4 6 8 10 12
10−12
10−10
10−8
10−6
10−4
10−2
100
Tucker rank
error
Newton potential , b=10, n = 64
EFN
EFE
EC
0 2 4 6 8 10
−0.6
−0.4
−0.2
0
0.2
0.4
0.6
Orthogonal vectors, r=6
x(1)−axis
Figure 2.4: Convergence in the corresponding norms with respect to the Tucker
rank r(left) and orthogonal vectors v(1)
k,k= 1,...,6, (right) for the
Tucker approximation to the Newton potential.
2.3.2 Numerics for classical potentials
We apply the numerical BTA algorithms to the full format tensors, and study the
Tucker decomposition properties of the tensors generated by the Newton kernel,
the Slater-type, Yukawa and Helmholtz functions in Rd,d= 3.
1. Newton kernel
We apply the best rank-rTucker decomposition algorithm with r= (r, ..., r)
for approximating the Newton potential
g(x) = 1
kxk, x R3,
in the cube [0, b]3with b= 10, on the cell-centred uniform grid with n= 64.
Here and in the following kxk=sd
P
=1
x2
denotes the Euclidean norm of x
Rd. Due to spherical symmetry of the function, we consider the same sampling
points x()
i=h/2 + (i1)hfor all space variables. Figure 2.4, left shows stable
exponential convergence of the relative Frobenius, FE and maximum norms with
respect to the Tucker rank up to r= 12. The right hand side of Figure 2.4
shows the orthogonal vectors v(1)
k,k= 1,...,6, for the mode = 1 (x(1)-axis).
The absolute error for the Tucker approximation with rank r= 14 in the cross-
section S:= [h/2, b h/2] ×[h/2, b h/2] ×h/2 is shown in Figure 2.5, where
the maximum value is of the order 107. Figure 2.9 presents matrix slices
MβrR9×9×1, νr= 1,...9, of the core tensor βR9×9×9(see (2.8)), where the
51
2 Tensor structured (TS) methods for functions in Rd,d3
0
5
10
0510
0
2
4
6
8
0
5
10
0510
0
1
2
3
4x 10−7
Figure 2.5: Left: the 3D Newton potential over cross-section S= [h/2, b h/2] ×
[h/2, b h/2] ×h/2, right: the absolute approximation error for the
Tucker decomposition with rank r= 14 (max 107).
6.9e+01 1.0e+01 2.7e+00
8.9e−01 2.9e−01 9.4e−02
2.9e−02 8.4e−03 2.4e−03
Newton potential , b=10, n = 64
Figure 2.6: Coefficients of the core tensor βR9×9×9for the Tucker approxima-
tion of the Newton potential. We show the maximum values of |βν|
for every matrix slice MβrR9×9×1,νr= 1,...,9, of β.
52
2.3 Numerics on BTA of function related tensors in R3
2 4 6 8 10 12 14
10−10
10−5
100
Tucker rank
error
Slater function, b=10, n = 64
EFN
EFE
EC
0 2 4 6 8 10
−0.5
−0.4
−0.3
−0.2
−0.1
0
0.1
0.2
0.3
0.4
0.5
Orthogonal vectors, r=6
y−axis
Figure 2.7: Convergence history and orthogonal vectors of the Tucker side matrix
V(1) Rn×r1,r1= 6, for the Slater potential.
numbers indicate the maximum values of |βν|at a given slice Mβ,νrof β.
Figure 2.6 shows the decay of the entries of the core tensor βfor the Tucker
decomposition with the rank r= 9 (smaller ranks are chosen for simplicity of
visualization). We observe that for the particular class of function related tensors
A0Vn, the core tensor βBrturns out to be sparse (up to a certain thresh-
old). We further demonstrate that the sparsity of the representation coefficients
of the target tensor in the space Brwill be even stronger for the physically rele-
vant functions with strong singularities arising in electronic structure calculations.
2. Slater function
We are interested in the approximate low-rank representation of the Slater type
functions which play significant role in electronic structure calculations.
The Slater function given by
g(x) = exp(αkxk) with x= (x1, x2, x3)TR3,
presents the electron “orbital” (α= 1) and the electron density function (α= 2)
corresponding to the Hydrogen atom. We compute the rank-(r, r, r) Tucker
approximation to the function related tensor defined on the same grid as in the
previous section, with b= 10.
Figure 2.7 shows that the Slater function can be efficiently approximated by low
rank Tucker tensors. In fact, Tucker rank r= 14 provides a maximum absolute
error of the approximation of order 108, see Figure 2.8.
Figure 2.9 presents the matrix slices MβrR9×9×1, νr= 1,...,9, of the core
tensor βR9×9×9, where the numbers indicate the maximum values of the core
entries at a given slice MβrR9×9×1of β. Figure shows that the “energy” of
53
2 Tensor structured (TS) methods for functions in Rd,d3
0
5
10
0510
0
0.2
0.4
0.6
0.8
1
0
5
10
0510
0
0.5
1
1.5
2x 10−8
Figure 2.8: Left: the 3D Slater function in the cross-section S= [0, b h]×
[0, bh]×{0}, right: the absolute approximation error for the Tucker
decomposition with rank r= 14 (max 108), for the same section.
9.2e+00 6.8e−01 1.1e−01
2.4e−02 6.0e−03 1.6e−03
4.7e−04 1.4e−04 4.0e−05
Slater function , b=10, n = 64
Figure 2.9: Coefficients of the core tensor βR9×9×9for the Tucker approxima-
tion of the Slater function. We show the maximum values of |βν|for
every matrix slice MβrR9×9×1,νr= 1,...,9, of β.
54
2.3 Numerics on BTA of function related tensors in R3
2 4 6 8 10 12
10−10
10−5
100
Tucker rank
error
Yukawa potential, b=10, n = 64
EFN
EFE
EC
0 10 20 30 40 50 60
−0.8
−0.6
−0.4
−0.2
0
0.2
0.4
0.6 Orthogonal vectors, r=6
Yukawa , AR=10, n = 64
y−axis grid points
Figure 2.10: Tucker approximation of the Yukawa potential and an example of
the Tucker orthogonal vectors v(2)
k,k= 1,...,6, (right).
0 10 20 30 40 50 60
−0.8
−0.6
−0.4
−0.2
0
0.2
0.4
0.6 Orthogonal vectors, r=6
Helmholz , AR=10, n = 64
x−axis grid points
Figure 2.11: Convergence history (left) and orthogonal vectors v(1)
k,k= 1,...,6,
(right) for the Tucker approximation of the Helmholtz potential.
the decomposed function is concentrated in several upper slices of the core tensor.
Our numerical experiments show that the dominating entries in βare compactly
concentrated in its ”upper left corner”. This feature of the Tucker decomposition
will be employed further in Section 3.
3. Yukawa and Helmholtz Functions
In the next example, we consider a Tucker approximation of the third-order
function-related tensor generated by the Yukawa potential
g(x) = e−kxk
kxkwith x= (x1, x2, x3)TR3.
We consider the function related tensor with the same “voxel-centred” collocation
55
2 Tensor structured (TS) methods for functions in Rd,d3
−5
0
5
−5
0
5
0
0.05
0.1
0.15
0.2
0.25
Figure 2.12: A slice of a 3D multi-centred Slater potential with 43centers.
points with respect to the n×n×n-grid over [0, b]3with b= 10 as in previous
examples.
Figure 2.10 shows the convergence history and the orthogonal vectors v(2)
k,
k= 1,...,6, of the Tucker decomposition of the Yukawa potential. These vec-
tors represent the problem adaptive orthogonal basis. In almost all cases the
ALS iteration to compute the Tucker approximation converges very fast and it is
terminated at most after 5 iterations.
Fig. 2.11 provides computational results for the Helmholtz function given by
g(x) = cos kxk
kxkwith x= (x1, x2, x3)TR3,
indicating robust convergence in the considered applications.
5. Periodic structures of Slater functions
Finally, we analyse the multi-centered Slater potential“ obtained by displacing
a single Slater function with respect to the m×m×mspatial grid of step-size
H > 0, specifying centers of Slater functions,
g(x) = c
m
X
i=1
m
X
j=1
m
X
k=1
eα(x1iH+m+1
2H)2+(x2jH+m+1
2H)2+(x3kH+m+1
2H)2.(2.45)
Figure 2.12 shows the multi-centred Slater potential for m= 4, H= 3, α= 2
and the corresponding approximation error in the cube [5,5]3on the n×n×n
grid with n= 64, the surface level corresponds to x3= 2. Convergence is shown
in Figure 2.15 (left-top).
56
2.3 Numerics on BTA of function related tensors in R3
2 4 6 8 10 12
10−15
10−10
10−5
100
Tucker rank
Multicentered Slater funct.with 1000 samp.
EFN, n=129
EFE, n=129
−5
0
5
−5
0
5
0
0.2
0.4
0.6
Figure 2.13: Top: rank-convergence of the Tucker approximation for the multi-
centered Slater potential with 103cells ; bottom: a 2D plane section
of the 3rd order tensor representing the multi-centred Slater potential
with 103centers.
57
2 Tensor structured (TS) methods for functions in Rd,d3
2 4 6 8 10 12
10−15
10−10
10−5
100
Tucker rank
Slater function with 4096 nodes
EFN, n=257
EFN, n=129
EFN, n=65
EFE, n=257
EFE, n=129
EFE, n=65
−6 −4 −2 0246
−6
−4
−2
0
2
4
6
0
0.5
Figure 2.14: Top: convergence of G BTA applied to the multi-centered Slater po-
tential with 4096 cells with respect to the Tucker rank; bottom: a
2D plane section of the 3rd order tensor corresponding to the multi-
centred Slater potential with 4096 centers.
58
2.3 Numerics on BTA of function related tensors in R3
Next, we consider function related tensors with a large number of periodic cells.
Figure 2.13 (top) shows the convergence history for the tensor corresponding to
the multicentered Slater function with 103centers. Figure 2.13 (bottom) visualizes
a 2D plane section of this 3rd order tensor. Figures 2.14 show the approximating
error and a 2D plane section of a 3rd order tensor representing the multicentered
Slater function with 4096 cells. Note, that the Tucker approximation using the
grid size n= 257, is computationally feasible only when applying the MGA Tucker
algorithm which we introduce in Section 3.
Investigation of these periodic structures show, that the convergence rate of the
rank-(r, r, r) Tucker approximation practically does not depend on the number of
cells in a considered structure. It is demonstrated that in all cases for the Slater
function (see convergence of the multicentered Slater function in Figures 2.13, 2.14
and the convergence for a single Slater function in Figure 2.7), equal rank param-
eters imply equal accuracy of the Tucker approximation. For example, for the
Tucker rank r= 10, it is exactly 105for all versions of the single/multicentered
Slater function. This feature can be valuable, in the grid-based modelling of peri-
odic (or nearly periodic) structures in the density functional theory. It indicates
that the Tucker decomposition can be helpful in constructing of a small number
of problem-adapted basis functions for huge (almost) periodic clusters of atoms.
6. Multicentered Slater function with random perturbations
Results in the case of the Slater potential which entries are perturbed randomly
are given in Figure 2.15. The random constituent equals to 1, 0.1 and 0.01 per-
cents of the maximum amplitude. It is shown that the exponential convergence
in the Tucker rank is observed only until the approximation level of the order of
the random perturbation. Further increase in the Tucker rank does not improve
essentially the approximation.
7. Conclusions to Section 2.3.2
The numerical examples for the function related tensors presented in Section
2.3.2 have led to further development of the Tucker-based algorithms for the
problems in electronic structure calculations. Here we formulate some conclusions
to above numerics.
Remark 2.19 The Tucker approximation error for the considered class of func-
tion related tensors decays exponentially with respect to the Tucker rank.
Remark 2.20 The shape of the orthogonal vectors in the unitary matrices of the
Tucker decomposition for the class of function related tensors is almost indepen-
dent on n.
59
2 Tensor structured (TS) methods for functions in Rd,d3
2 4 6 8 10 12
10−10
10−5
100
Tucker rank
error
Slater−multi potential, AR=10, n = 32
E(FN)
E(FE)
E(C)
1 1.5 2 2.5 3 3.5 4
10−4
10−3
10−2
10−1
100
relative energy−norm
relative energy
Tucker rank
error
Slater−Mult−Rand 1% , AR=10, n = 64
1 2 3 4 5 6
10−6
10−5
10−4
10−3
10−2
10−1
100
relative energy−norm
relative energy
Tucker rank
error
Slater−Mult−Rand 0.1% , AR=10, n = 64
1 2 3 4 5 6
10−8
10−6
10−4
10−2
100
relative energy−norm
relative energy
Tucker rank
error
Slater−Mult−Rand 0.01% , AR=10, n = 64
Figure 2.15: Convergence history for the multi-centered unperturbed (upper left
figure) and randomly perturbed Slater potential.
60
2.3 Numerics on BTA of function related tensors in R3
Remark 2.21 The entries of the core tensor of the Tucker decomposition for the
considered function related tensors decay fast vs. index k= 1,...,r,= 1,2,3.
Remark 2.22 For a fixed approximation error, the Tucker rank of periodic struc-
tures practically does not depend on the number of cells included in the computa-
tional box (see also numerics in Section 3.3.2).
Properties of the Tucker decomposition for the function related tensors described
in Remarks 2.20 and 2.21 will be used further in the development of the multigrid
accelerated BTA.
2.3.3 Application to functions in electronic structure
calculations
Here, the approximating properties of the orthogonal Tucker-type representation
(2.10) are studied for the 3D electron densities of some simple molecules. The
ALS iterative scheme is applied to compute the low-rank tri-linear approximations
for electron densities of the H atom, LiH, CH4, C2H6and H2O molecules.
We assume that any particular molecule is embedded in a certain fixed com-
putational box [b, b]3with a suitable b > 0. Let ω3[b, b]3be a uniform
n×n×ntensor grid of collocation points introduced in (2.41), and indexed by
I=I1×I2×I3. The molecular orbitals and electron densities of the considered
molecules are represented as
ϕa(x) =
R0
X
k=1
ckagk(x), a = 1, ..., Norb,(2.46)
ρ(x) := 2
Norb
X
a=1
ϕ2
a(x) = 2
Norb
X
a=1 R0
X
k=1
ckagk(x)!2
, x R3,(2.47)
where Norb is the number of electron pairs in a molecule and R0is the number of
Gaussians in the GTOs basis, of type
gk(x) = µk(x1Ak)k(x2Bk)mk(x3Ck)nkexp(αkζ2
k) (2.48)
with
ζ2
k= (x1Ak)2+ (x2Bk)2+ (x3Ck)2, x = (x1, x2, x3)TR3,
where the parameters for the Gaussians are taken from the standard quantum
chemistry package MOLPRO [108]. For each particular molecule we use the
following physically relevant parameters: b= 10 bohr, R0= 10 for H atom,
61
2 Tensor structured (TS) methods for functions in Rd,d3
5 10 15
10−6
10−4
10−2
100
rank
a)
5 10 15
10−6
10−4
10−2
100
rank
b)
5 10 15
10−6
10−4
10−2
100
rank
c)
EFN
EC
EFN
EC
EFN
EC
Figure 2.16: Approximation error in EF N and ECnorms versus Tucker rank for
the electron densities of a) CH4, b) C2H6and c) H2O molecules with
n= 65.
b= 7 bohr, R0= 34 for LiH, b= 5 bohr, R0= 55 for CH4,b= 5.8 bohr, R0= 96
for C2H6and b= 10 bohr, R0= 41 for H2O.
We discretize ρ: [b, b]3R, by the function related tensor of order 3 by
computing its entries according to relations (2.47), (2.48)
A0A0(g) := [ai1i2i3](i1,i2,i3)∈I Rn×n×nwith ai1i2i3:= ρ(x(1)
i1, x(2)
i2, x(3)
i3),
where (x(1)
i1, x(2)
i2, x(3)
i3)ω3are the grid collocation points. In this way, we ob-
tain the tensor representation A0of the corresponding density function, which is
approximated by a rank r= (r, r, r) Tucker decomposition for a sequence of rank-
parameters r= 1,2, . . ., r0. The orthogonal matrices V()Rn×r, = 1,2,3,
and the corresponding core tensor βRr×r×rare then used for the construction
of the approximating tensor A(r)A0. For every rank-(r, r, r) Tucker approxi-
mation, we compute the relative error with respect to the Euclidean norm (2.42),
as well as the relative maximum error (2.44).
The approximation errors shown in Figure 2.16 verify the exponential con-
vergence of the orthogonal Tucker approximation (in the rank-parameter r) of
electron densities reaching the relative accuracy 105for CH4, H2O and C2H6
with r= 16. It is seen that the orthogonal vectors v()
ν(ν= 1, ..., 4, = 1,3) of
the tensor-product decomposition for the H atom, LiH and C2H6molecules shown
in Figures 2.17 and 2.18 resemble the shape of the decomposed electron density
along the corresponding spatial axis. Due to orthogonality of the decomposi-
tion, the Tucker model appears to be suitable for constructing a low dimensional
problem-dependent orthogonal basis. The entries of the core tensor presented in
Figure 2.18 are the weights βν1ν2ν3of the corresponding tensor products of the
orthogonal vectors v(1)
ν1v(2)
ν2v(3)
ν3, which compose the summands of A(r)in (2.8)
for ν6. Figures 2.19 a) and c) visualise the electron density of CH4in a
62
2.3 Numerics on BTA of function related tensors in R3
−10 −5 0 5 10
−0.5
0
0.5
atomic units
a) Hydrogen
V(1)1
V(1)2
V(1)3
V(1)4
−5 0 5
−0.8
−0.6
−0.4
−0.2
0
0.2
0.4
0.6
atomic units
b) LiH
V(1)1
V(1)2
V(1)3
V(1)4
Figure 2.17: Orthogonal vectors v(1)
ν1,ν1= 1, ..., 4, for the rank-10 orthogonal
Tucker approximation of the electron density for a) the H atom and
b) for the LiH molecule.
1.1e+02 1.6e+01 5.3e+00
3.1e+00 1.2e+00 6.0e−01
C2H6, core tensor
−6 −4 −2 0 2 4 6
−0.4
−0.2
0
0.2
0.4
0.6
C2 H6
atomic units
V(3)
1
V(3)
2
V(3)
3
V(3)
4
V(3)
5
V(3)
6
Figure 2.18: Slices MβrR6×6×1, νr= 1,...,6 of the core tensor βR6×6×6
and the orthogonal vectors v(3)
ν3,ν3= 1, ..., 6 of the rank-(6,6,6)
Tucker decomposition of the electron density of C2H6molecule.
Numbers correspond to the maxima of |βν|for the corresponding
slice of the core tensor.
63
2 Tensor structured (TS) methods for functions in Rd,d3
−5 05
−5
0
5
0
50
a) CH4
−5 05
−5
0
5
0
50
c) C2 H6
−5 05
−5
0
5
0
2
4
x 10−4
d) C2 H6, abs.error, r=16
−5 05
−5
0
5
0
1
2
x 10−5
b) CH4, abs. error , r=16
Figure 2.19: Electron densities and absolute errors of rank r= 16 Tucker approx-
imation for CH4and C2H6molecules.
plane containing the C atom and of C2H6in a plane containing both C atoms,
correspondingly. Figures 2.19 b) and d) visualise the absolute approximation er-
ror for the electron densities ρof these molecules in the corresponding planes for
r= 16. In spite of large values of ρat the cusp regions (60 units) we observe
a rather uniform distribution of the absolute approximation error of the order
104÷105in the computational domain. This is a typical feature of the
orthogonal Tucker decomposition. For H2O with even larger cusp (148 units)
at the origin, we see in Figure 2.16 c) that the Tucker approximation of ρfor this
molecule yields the relative accuracy 105for the Tucker rank r= 16. Finally,
we present the convergence behaviour of the Tucker approximation applied to
the tensor representation of the Hartree potentials of C2H6and H2O molecules,
see Figure 2.20, indicating exponential convergence in the Tucker rank r. The
numerical examples demonstrate efficiency of the low-rank Tucker tensor approx-
imations for the electron densities and the Hartree potential of the considered
molecules. This provides the background for further application of the Tucker
model in low-complexity numerical evaluation of functionals and operators in the
electronic structure calculations, see Sections 3, 4 and 5.
64
2.4 Tensorisation of basic multilinear algebra (MLA) operations
2 4 6 8 10 12 14 16 18
10−4
10−2
100
Tucker rank
error
b) Hartree Potential, C2 H6
EFN
EC
2 4 6 8 10 12 14 16 18
10−4
10−2
100
Tucker rank
c) Hartree Potential, H2 O
EFN
EC
Figure 2.20: Approximation error in EF N and ECnorms versus Tucker rank for
the Hartree potentials of C2H6(left) and H2O (right).
2.4 Tensorisation of basic multilinear algebra
(MLA) operations
For the sake of clarity (and without loss of generality) in this section we assume
that r=r,n=n(= 1, ..., d). If there is no confusion, the index ncan
be skipped. We denote by Nthe complexity of various tensor operations (say,
N,·i) or the related storage requirements (say, Nst(β)). We estimate the storage
demands Nst and complexity of the following standard tensor-product operations:
the scalar product, the so-called Hadamard (component-wise) product, and the
convolution transform. We consider the MLA operations in Tr,n, and CR,ntensor
classes.
The Tucker model requires
Nst,T =drn +rd(2.49)
storage to represent a tensor.
The number of parameters in the rank-Rcanonical model scales linearly in R,
Nst,C =dRn. (2.50)
Setting R=αr with α1, we can specify the range of parameters where the
Tucker model is less storage consuming compared with the canonical one
rd1d(α1)n, (for d= 3 : r23(α1)n).
In general, the numerical Tucker decomposition leads to a fully populated core
tensor, i.e., it is represented by rdnonzero elements. However, the special data
65
2 Tensor structured (TS) methods for functions in Rd,d3
structure of the Tucker core can be imposed which reduces the complexity of the
corresponding tensor operations (cf. [60]). In particular, for the mixed (two-
level) Tucker-canonical decomposition (see Definition 2.8), storage demands scale
linearly in d,
Nst,T C =dr(n+R).
2.4.1 Some bilinear operations in the Tucker format
For given tensors A1Tr1, A2Tr2represented in the form (2.8), i.e.,
A1=β×1U(1) ×2U(2) ...×dU(d), A2=ζ×1V(1) ×2V(2) ...×dV(d),(2.51)
the scalar product (2.2) is computed by
hA1, A2i:=
r1
X
k=1
r2
X
m=1
βk1...kdζm1...md
d
Y
=1 Du()
k, v()
mE.(2.52)
In fact, applying the definition of the scalar product in (2.2) to the rank-1 tensors
(with R=r= 1), we have
hA1, A2i:= X
i∈I
u(1)
i1···u(d)
idv(1)
i1···v(d)
id
=
n1
X
i1=1
u(1)
i1v(1)
i1···
nd
X
id=1
u(d)
idv(d)
id=
d
Y
=1 u(), v().(2.53)
Then the above representation follows by combining all rank-1 terms in the left-
hand side in (2.52).
We further simplify and suppose r=r1=r2. Then the calculation in (2.52)
includes dr2scalar products of vectors of size nplus r2dmultiplications, leading
to the overall complexity
N,·i =Odnr2+r2d,
and the same for the respective norm.
Note that in the case of mixed Tucker-canonical decomposition (see Definition
2.8) the scalar product can be computed in O(R2+dr2n+dR2r) operations (cf.
[60], Lemma 2.8).
For given tensors A, B RI, the Hadamard product ABRIof two tensors
of the same size Iis defined by the componentwise product,
(AB)i=ai·bi,i I.
66
2.4 Tensorisation of basic multilinear algebra (MLA) operations
Hence, for A1, A2Tr, as in (2.51), we tensorize the Hadamard product by
A1A2:=
r
X
k1,m1=1 ···
r
X
kd,md=1
βk1...kdζm1...mdu(1)
k1v(1)
m1... u(d)
kdv(d)
md.
(2.54)
Again, applying definition (2.2) to the rank-1 tensors (with β=ζ= 1), we obtain
(A1A2)i=(u(1)
i1v(1)
i1)···(u(d)
idv(d)
id)
=u(1) v(1)···u(d)v(d).(2.55)
Then (2.54) follows by summation over all rank-1 terms in A1A2. Relation
(2.54) leads to the storage requirements
Nst(AB)=O(dr2n+r2d),
that includes the memory size for dmodes n×r×rTucker vectors, and for the
new Tucker core of size (r2)d.
The multi-dimensional convolution product is one of the basic integral trans-
forms in the wide range of applications including many-particle models (see [27,
59, 65]). In this dissertation we apply the tensor-product convolution scheme
corresponding to the discretisation by piecewise constant basis functions over
uniform Cartesian grids. Methods to compute the multidimensional convolution
on refined grids, by higher order elements and using the wavelet basis were con-
sidered in [42, 43].
For given tensors F= [fi]RI, G = [gi]RI, we define their discrete
convolution product by
FG:= "X
i∈I
figji+1#j∈J
,J:= {1, ..., 2n1}d,
where ji+ 1 I (equivalent to the assumption that Gcan be extended to the
larger index set beyond Iby zeros).
For given A1, A2Tr, see (2.51), we now tensorize the convolution product
via
A1A2:=
r
X
k=1
r
X
m=1
βk1...kdζm1...mdu(1)
k1u(1)
m1... u(d)
kdv(d)
md.(2.56)
This relation again follows from the analysis for the case of rank-1 convolving
tensors Fand G, similar to the discussion for scalar product of tensors,
(FG)j:= X
i∈I
f(1)
i1···f(d)
idg(1)
j1i1+1 ···g(d)
jdid+1
=
n1
X
i1=1
f(1)
i1g(1)
j1i1+1 ···
nd
X
id=1
f(d)
idg(d)
jdid+1 =
d
Y
=1 f()g()j.(2.57)
67
2 Tensor structured (TS) methods for functions in Rd,d3
Assuming that ”one-dimensional” convolutions of n-vectors, u()
kv()
mR2n1,
can be computed in O(nlog n) operations, we arrive at the overall complexity
estimate
N·∗· =O(dr2nlog n+r2d).
In our particular case of equidistant grids we obtain (by setting a=u()
k, b =
v()
mRn)
(u()
kv()
m)j=
n
X
i=1
aibji+1, j = 1, ..., 2n1.
Hence, the 1D convolution can be performed by FFT in O(nlog n) operations.
We notice that the convolution product appears to be one of the most com-
putationally elaborate operations since in general one might have r2dterms in
(2.56). Significant complexity reduction is already observed if at least one of the
convolving tensors can be represented in the rank-Rcanonical format, so that we
have only rdRterms in the sum (2.56).
2.4.2 Summary on MLA operations in rank-Rcanonical format
In our numerical scheme we apply the following linear operations with dth order
tensors:
1. summation of tensors;
2. scalar product of tensors;
3. Hadamard product of tensors;
4. convolution of tensors.
We consider tensors A1,A2, represented in the rank-Rcanonical format, (2.6),
A1=
R1
X
k=1
cku(1)
k...u(d)
k, A2=
R2
X
m=1
bmv(1)
m...v(d)
m,(2.58)
with normalized vectors u()
k, v()
mRn. For simplicity of discussion, we assume
n=n,= 1, ..., d.
1. A sum of two canonical tensors given by (2.58) can be written as
A1+A2=
R1
X
k=1
cku(1)
k...u(d)
k+
R2
X
m=1
bmv(1)
m...v(d)
m,(2.59)
resulting in the canonical tensor with the rank at most RS=R1+R2. This
operation has no cost since it is simply a concatenation of two tensors.
68
2.4 Tensorisation of basic multilinear algebra (MLA) operations
2. For given canonical tensors A1,A2, the scalar product (2.2) is computed by
(see (2.53))
hA1, A2i:=
R1
X
k=1
R2
X
m=1
ckbm
d
Y
=1 Du()
k, v()
mE.(2.60)
Calculation of (2.60) includes R1R2scalar products of vectors in Rn, leading to
the overall complexity
N,·i =O(dnR1R2).
3. For A1, A2given by (2.58), we tensorize the Hadamard product by (see
(2.55))
A1A2:=
R1
X
k=1
R2
X
m=1
ckbmu(1)
kv(1)
m...u(d)
kv(d)
m.(2.61)
This leads to the complexity O(dnR1R2).
4. The convolution product of two tensors in the canonical format (2.58), is
given by (see (2.57))
A1A2:=
R1
X
k=1
R2
X
m=1
ckbmu(1)
kv(1)
m...u(d)
kv(d)
m,(2.62)
leading to the asymptotic complexity O(dn log nR1R2).
69
2 Tensor structured (TS) methods for functions in Rd,d3
70
3 Multigrid Tucker approximation
of function related tensors
3.1 Motivations
In the previous section we have discussed the general BTA algorithms, which
provide complexity of the Tucker tensor approximation of the order
WF2T=O(nd+1) (3.1)
for full format tensors and
WC2T=O(Rn min{R, n}+rd1nmin{rd1, n}),(3.2)
for the canonical rank-Rinput tensors. These bounds restrict application of the
general BTA scheme to small dimensions dand moderate grid size n.
Notice, that the general BTA algorithm is computationally not feasible in elec-
tronic structure calculations on large n×n×n3D Cartesian grids, since:
1. For the full format tensors, O(n4) complexity of HOSVD in the general BTA
algorithm restricts the size of the input tensors (maximum size n128,
for conventional computers). In this case, our goal is to reach linear in the
volume complexity O(n3), allowing maximum size of the input tensors up
to 5123, for conventional computers.
2. In the case of rank-Rinput data, in our applications, both the canonical
rank Rand the univariate grid size nmight be at least of the order 104.
(Fine grids are necessary to resolve strong cusps due to core electrons in
the Hartree potential.) It means that one has to compute the SVD of side
matrices of the sizes 104×104, with complexity of order 1012 which is
computationally unfeasible.
To avoid the above limitations, we introduce the multigrid accelerated (MGA)
Tucker decomposition [68], which is based on the successive reiteration of the ALS
Tucker approximation on a sequence of refined grids, using the results of the coarse
71
3 Multigrid Tucker approximation of function related tensors
grid approximation as the initial guess for the dominating subspaces on finer grid
levels. The idea of the multilevel acceleration originates from investigation of
the numerical examples of BTA for the function related tensors, described in
Section 2. These include Remarks 2.21 and 2.20 on the decay of the Tucker core
entries and the weak dependence of the shape of the orthogonal vectors on the
discretization parameter n, respectively.
We show that in the case of rank-Rcanonical target tensors in Rn×n×n, the
MGA method provides linear complexity (up to low order terms) of the canonical-
to-Tucker decomposition with respect to all input parameters: the maximal uni-
variate grid size n, the canonical rank Rand the Tucker rank r,
WC2T=O(rRn).
The resulting complexity of the decomposition for full format tensors is
WF2T=O(n3),
which currently makes possible the application of multigrid accelerated F2T algo-
rithm to the 3D function-related tensors with the maximal grid size n×n×n=
5123. In fact, applications are only bounded by the storage size for the input ten-
sor. These grids provide the accuracy level required for the consistent electronic
structure calculations in pseudopotential cases.
3.2 Multigrid accelerated BTA of canonical tensors
3.2.1 Basic concept
The concept of the MGA Tucker approximation applies to the multi-dimensional
data obtained as a discretization of physically relevant continuous multivariate
functions on a sequence of refined spatial grids. The typical application areas
include the tensor approximation of multi-dimensional operators and functionals,
the solution of integral-differential equations in Rd, data-structured representation
of physically relevant quantities [14, 52].
For a fixed grid parameter n, let us introduce the equidistant tensor grid
ωd,n := ω1×ω2···×ωd,(3.3)
where ω:= {−b+ (k1)h:k= 1, ..., n + 1}(= 1, ..., d) with mesh-size
h= 2b/n. Define a set of collocation points {xi}in := [b, b]dRd, located
at the midpoints of the grid-cells, and numbered by i I := {1, ..., n}d(see the
explicit definition in (2.41)). For fixed n, the target tensor An= [an,i]RIis
72
3.2 Multigrid accelerated BTA of canonical tensors
defined by sampling the given continuous multivariate function f: Ron the
set of collocation points {xi}as follows
an,i=f(xi),i I.
The idea of the multigrid accelerated best orthogonal Tucker approximation is
based on the following principles (topics 3-4 below apply to CR,ninitial data):
1. General multigrid concept. Solving a sequence of nonlinear approximation
problems for A=Anas in (2.12) with n=nm:= n02m, m = 0,1, ..., M,
corresponding to a sequence of (d-adic) refined spatial grids ωd,nm. The
sequence of approximation problems is treated successively in one run from
coarse-to-fine grid.
2. Coarse initial approximation to the side-matrices V(q),q= 1, ..., d.Specifi-
cally, the initial approximation of the q-mode orthogonal side-matrices V(q)
on finer grid ωd,nmis obtained by the linear interpolation of the correspond-
ing orthogonal vectors from the coarser grid ωd,nm1.
3. Most important fibers. (Applies to CR,ninitial data.) We employ the
idea of “most important fibers” (MIFs) of the q-mode unfolding matrices
B(q)Rn×rq(see (2.26) in Step 2 of basic Algorithm G BTA, Section 2.2.2),
whose positions are extracted from data on the coarser grids. To identify
the location of MIFs we apply the so-called maximal energy principle as
follows. On the coarse grid, we calculate a projection of the q-mode un-
folding matrix B(q)onto the true q-mode subspace span{v(q)
1, ..., v(q)
rq}with
V(q)= [v(q)
1...v(q)
rq], which is computed as the matrix product,
β(q)=V(q)TB(q)Rrq×rq.(3.4)
Now the maximal energy principle specifies the location of MIFs by finding
pr columns in β(q)with maximal Euclidean norms (supposing that pr rq),
see Figures 3.1 and 3.2. The positions of MIFs are numbered by the index
set Jq,p with #Jq,p =pr, being the subset of the larger index set
Jq,p Jrq:= {1, ..., rq},#Jrq=rq=O(rd1).
This strategy allows to predict a fixed portion of q-mode fibers in the Tucker
core on fine grids, which accumulate the maximum part of the Frobenius
norm. The union of selected fibers from every mode q(specified by the
index set Jq,p,q= 1, ..., d) accumulates the important information on the
structure of the rank-Rtensor in the space of Tucker coefficients Rr1×···×rd.
This information enormously reduces the amount of computational work on
the fine grids.
73
3 Multigrid Tucker approximation of function related tensors
+ ... +
n1n n n
1 1 1
n
n3
2
r
r
r1
2
r r2
r33 r33
rr22
3
B(q)
find MIFs
βcore (q)
+ ... +
n3
n2
n1
Figure 3.1: Illustration for d= 3. Finding MIFs in the “preliminary core” β(q)
for q= 1 for the rank-Rinitial data on the coarse grid n=n0=
(n1, n2, n3). B(q)is presented in a tensor form for explanatory reasons.
2 4 6 8 10 12 14
2
4
6
8
10
12
14
2 4 6 8 10 12 14
2
4
6
8
10
12
14
2 4 6 8 10 12 14
2
4
6
8
10
12
14
Figure 3.2: MIFs: selected positions of the fibers of the preliminary “cores” for
computing V(1)(left), V(2) (middle) and V(3) (right). The example is
taken from the multigrid rank compression in the computation of the
Hartree potential for H2O, r= 14, p= 4.
4. Performing restricted ALS iteration. The proposed choice of MIFs allows
to reduce the cost of ALS iteration to solving the problem of best rank-r
approximation to the large unfolding matrix B(q)Rn×rqwith dominating
second dimension rq=rd1(always the case for large d). In fact, we reduce
one step of ALS iteration to computation of the r-dimensional dominating
subspace of small n×pr submatrices B(q,p)of B(q)(q= 1, ..., d), where
p=O(1) is some fixed parameter (SVD with matrix-size n×pr instead of
n×rq).
The invention of above principles leads to dramatic complexity reduction of the
standard tensor algorithms G BTA(VnTBr) and C BTA(CR,nTCR,r). In the
latter case, this approach leads to the efficient tensor approximation method with
linear scaling in nand R, up to the computational cost on the coarsest level. In
the case of fully populated tensors we arrive, at least, at the linear cost O(nd),
corresponding to the storage space for the initial tensor.
74
3.2 Multigrid accelerated BTA of canonical tensors
2 4 6 8 10 12 14 16 18 20
2
4
6
8
10
12
14
16
18
20
2 4 6 8 10 12 14 16 18 20
2
4
6
8
10
12
14
16
18
20
2 4 6 8 10 12 14 16 18 20
2
4
6
8
10
12
14
16
18
20
Figure 3.3: MIFs: selected projections of the fibers of the preliminary cores” for
computing V(1)(left), V(2) (middle) and V(3) (right). The example
from the MGA rank reduction in computation of VH, in pseudopoten-
tial case of CH4with r= 22, and number of MIFs p= 8.
3.2.2 Description of the Algorithm and complexity bound
For further constructions, we use the 1D interpolation operator I()
(m1,m)from the
coarse to fine grids acting in spatial direction = 1, ..., d. This might be either the
interpolation by piecewise linear or cubic splines (the latter is our choice in the
current implementation). In the following we focus on the case of rank-Rinput.
The proposed algorithm of the MGA best Tucker approximation for ACR,n
can be outlined as follows:
Algorithm MG C BTA (CR,nMTCR,r). (Multigrid accelerated canonical-
to-Tucker approximation).
1. Given AmCR,nmin the form (2.6), corresponding to a sequence of grid
parameters nm:= n02m, m = 0,1, ..., M. Fix a structural constant p=O(1)
(i.e. pr rd1), iteration parameter kmax, and the Tucker rank r.
2. For m= 0, solve C BTA(CR,n0 TCR,r) and compute the index set Jq,p(n0)
Jrqvia identification of MIFs in the matrix unfolding B(q),q= 1, ..., d, us-
ing the maximum energy principle applied to the q-mode unfolding of the
Tucker core β(q)as in (3.4).
3. For m= 1, ..., M perform the cascadic MGA nonlinear Tucker approxima-
tion by the restricted ALS iteration:
3a) Compute the initial guess for side matrices on level mby interpolation
I(m1,m)of the side matrix from level m1 (by using piecewise linear or
cubic splines)
V(q)=V(q)
m=I(m1,m)(V(q)
m1), q = 1, ..., d.
3b) For each q= 1, ..., d, fix V()(= 1, ..., d, 6=q) and perform:
75
3 Multigrid Tucker approximation of function related tensors
U(l) (l)
m=0,..., M, l=1,...,d
HOSVD : 00
Given β ,U(l)
m, for,p>0
,
find pMIFs in B,specify
, l=1,...,d
compute projections
ALS for m=1,...,M (l)
m−1
(l)
m
using
its canonical rank
compute µMand reduce
q=1,..., d
(q)
Bµ0
using matrix unfolding of
compute dominating subspaces of
J
q,p
(q)
(l)
0
U
T
)
(l)
0
and the core tensor µ 0
Jq,p
compute reduced unfoldings B
B(q,p) m
(l)
(q,p)
V
(V
interpolate V V
V
Figure 3.4: Flow chart of Algorithm MG C BTA applied to the rank-Rtarget.
Compute matrix products V()TU(),= 1, ..., d;6=q, and construct
the ”restricted” q-mode matrix unfolding B(q,p)
B(q,p)=B(q)|Jq,p(n0)Rnm×pr,
by calculating pr columns in the complete unfolding matrix B(q)Rnm×rq.
Update the orthogonal matrix V(q)=V(q)
mRnm×rvia computing the
r-dimensional dominating subspace for the ”restricted” matrix unfolding
B(q,p)(truncated SVD of nm×pr matrix).
4. Compute the rank-Rcore tensor βCR,r, as in Step 3 of basic algorithm
CBTA (CR,n TCR,r).
Figure 3.4 shows the flow chart of Algorithm MG C BTA.
The next statement proves the linear complexity of Algorithm MG C BTA.
Theorem 3.1 Algorithm MG C BTA(CR,nMTCR,r) amounts to
O(dRrnM+dp2r2nM)
operations per ALS loop, plus extra cost of the coarse mesh solver C BTA (CR,n0TCR,r).
It requires O(drnM+drR)storage to represent the result.
76
3.2 Multigrid accelerated BTA of canonical tensors
20 40 60 80 100
10−10
10−8
10−6
10−4
10−2
100
n=128
n=256
n=512
n=1024
n=2048
Figure 3.5: Singular values of the mode-1 matrix unfolding B(1,p),p= 4.
Proof: Step 3a) requires O(drnM) operations and memory. Assume without loss
of generality that pr nM, hence the complexity of Step 3b) on the finest level
is bounded by O(dRrnM+dprnM+dp2r2nM) per iteration loop. The rank-R
representation of βCR,rrequires O(drRnM) operations and O(drR)-storage.
Summing up these costs over the levels m= 1, ..., M, proves the result since the
relation nM(1 + 1/2 + ...1/2M1)<2nM.
Figure 3.5 shows the singular values of the mode-1 matrix unfolding B(1,p)
with the choice p= 4, which demonstrates the reliability of the maximal energy
principle in the error control. Similar fast decay of respective singular values
is typical in most of our numerical examples in electronic structure calculations
considered so far. Remarkably, that the “representative subset” of fibers Jq,p
normally has the size pr of several r-s with pr. The (controllable) decay
of singular values of the small-size restricted unfolding matrix B(q,p)provides a
criterion for the satisfactory choice of parameter p. If the decay is not fast enough
the algorithm can be restarted with the larger parameter pp+ 1.
Figure 3.6 demonstrates the linear scaling of the MGA Tucker approximation
in the input rank R, and in the grid size n, applied to C2T rank reduction of the
electron density of CH4. Further numerical illustrations to the above algorithm
will be presented in Section 3.2.3.
77
3 Multigrid Tucker approximation of function related tensors
200 400 600 800 1000 1200
0
10
20
30
40
50
60
70
80
90
rank
seconds
n=2048
n=1024
n=512
n=256
Figure 3.6: Linear scaling in Rand n, of the C2T rank reduction algorithm.
Notice that the complexity and error of the MGA Tucker approximation can
be effectively controlled by the adaptive choice of the governing parameters p,r
and n0. Figure 3.7 shows the dependence of computational accuracy of Algorithm
MG C BTA on the choice of the number of important fiber pr and the Tucker
rank r, in the case of Hartree potential of C2H6.
3.2.3 Numerics on rank reduction of the electron density ρ
The MG C BTA algorithm has been evoked by the problem of fast computation of
the Hartree potential VHand the respective Coulomb matrix in the Fock operator,
using multilinear algebra in the tensor-structured formats, see Section 4.
In the evaluation of the Hartree potential for pseudodensities of some simple
molecules good accuracy of order 106hartree in the max-norm is achieved already
by computation on moderate n×n×ngrids with n= 400 and n= 800. In this
case the unigrid C2T algorithm described in Section 2.2.3 is sufficient.
However, when computing the all electron densities of molecules, which contain
strong cusps due to core electrons contribution (see the example of all electron
density for the water molecule presented in Figure 3.8), large grid parameters nof
the order of several thousands are required to ensure the high resolution of local
singularities.
In [19], where the wavelet basis set is used for approximation of the electronic
78
3.2 Multigrid accelerated BTA of canonical tensors
−5 0 5 10
1
2
3
4
5
6
x 10−3
atomic units
abs. error for VH ,C2 H6 , n=8192
r=18, p=5 (times:3.7,4.3)
r=16, p=3 (2.3,3.4)
r=14, p=2 (1.8;2.8)
Figure 3.7: Absolute error in the Hartree potential of C2H6vs. different multigrid
parameters rand p, visualised in the grid interval centered at (0,0,0).
Figure 3.8: Electron density of H2O in the cross-section = [4,4]×[4,4]×{0}.
79
3 Multigrid Tucker approximation of function related tensors
1000 2000 3000 4000 5000 6000 7000 8000
0
10
20
30
40
50
60
70
80
90
100
X: 8192
Y: 3.45
univariate grid size
minutes
C2 H6, HP, rT =22
unigrid C2T
MG C2T
Figure 3.9: Multigrid vs. single grid MATLAB times for the canonical-to-Tucker
transform in computations of the Hartree potential of C2H6molecule,
r= 22, p= 4.
structure quantities, it is shown that the Hartree potential of CH4and C2H6
molecules can be resolved in the cusp area with accuracies of the order of 103,
by computations with the univariate grid size n= 5.1·103in the volume of [b, b]3
with b= 20 au. Hence, the univariate mesh size of 3D Cartesian grids should be
not larger than h= 2b/n 8·103au.
In our tensor-structured computations, we apply uniform grids with the mesh-
size up to h= 1.3·103au, corresponding to the univariate grid size n= 16384
with b= 10.6 au. Large grids are required for the representation of functions with
multiple strong singularities discretized on large 3D uniform grids. The examples
of cusps of electron densities of some molecules are presented in Figures 3.8 for
the water molecule, and in Figure 2.19 for CH4and C2H6molecules.
The algorithm for computation of the Hartree potential by the tensor-product
convolution will be discussed in Section 4.2, where we demonstrate numerical re-
sults on the accuracy and computational cost. Here we present the comparison
of the MATLAB times for the unigrid and multigrid rank reduction algorithms
which are used in the accurate and fast computation of the Hartree potential on
large 3D Cartesian grids. Figures 3.9 and 3.10 show the comparison of MATLAB
times of the unigrid C2T versus MGA C2T algorithms applied to calculations for
C2H6and CH4molecules, respectively. Figures 3.9 and 3.10 show the compari-
son of MATLAB times of the unigrid C2T versus MGA C2T algorithms applied
to C2H6and CH4molecules, respectively.
80
3.2 Multigrid accelerated BTA of canonical tensors
2000 4000 6000 8000
0
5
10
15
20
25
30
grid size
minutes
MG C2T
Single grid C2T
Figure 3.10: Multigrid vs. single grid CPU times for the canonical-to-Tucker
transform in computations of the Hartree potential of CH4molecule.
n312822563512310243204834096381923
p= 4 C2T 10 12 19 41 52 95 188
r= 20 CONV 0.2 0.8 1.5 10.6 22 58 160
p= 16 C2T 34 44 67 118 200 388 770
r= 28 CONV 0.5 1.8 3.2 15.6 35 108.2 352
Table 3.1: Elapsed times (in sec) for the C2T rank reduction of ρand the tensor-
product convolution (CONV) in the computation of VHfor the C2H6
molecule.
Tables 3.1 and 3.2 give computation times for the C2T rank reduction and of
the convolution with the Newton kernel for electron densities of H2O and C2H6
molecules. The initial canonical ranks Rρ0of the tensors are 861 and 4656, for
H2O and C2H6, respectively. The C2T algorithm yields the ranks Rρr p.
Computation times depend on the level of the thresholds εin the algorithms C2T
and T2C. For example, we observe accuracies of the order 104for parameters
p= 2, r= 16, and of the order 105for parameters p= 8, r= 20 in computations
for H2O molecule. For C2H6molecule, p= 4, r= 20 yield accuracies of the order
103, and p= 16, r= 28, the accuracies of the order 103. We observe, that
for moderate accuracies the times for the C2T and T2C rank reduction and the
81
3 Multigrid Tucker approximation of function related tensors
n32563512310243204834096381923163843
p= 2 C2T 4 4.5 5.5 5.7 12.2 23.7 66
r= 16 CONV 0.6 1.1 5.5 12 38 127 255
p= 8 C2T 6.1 8 12 18 39 75 129
r= 20 CONV 1.8 3.2 15.6 34 140 378 784
Table 3.2: Elapsed times (in sec) for the C2T rank reduction of ρand the
tensor-product convolution (CONV) in the computation of VHfor H2O
molecule.
corresponding convolution times are reduced dramatically.
3.3 Multigrid accelerated BTA for the full format
function related tensors
In the case of full format tensors the main principles of the multigrid concept
are based on topics 1, 2 and 4 from §3.2.1. Here we describe the algorithm of
the MGA Tucker approximation of the function related tensors discretized on a
sequence of grids specified as in Section 3.2.
Algorithm MG G BTA (VnTr). (MGA full-to-Tucker approximation).
1. Given AmVnin the form (2.1), corresponding to a sequence of grid
parameters nm:= n02m, m = 0,1, ..., M. Fix the Tucker rank r, and the
iteration number kmax.
2. For m= 0, solve the approximation problem by Algorithm G BTA(Vn0
Tr) via kmax steps of ALS iteration.
3. For m= 1, ..., M, perform the cascadic MGA Tucker approximation:
3a) Compute the initial guess for the side matrices on level mby interpola-
tion I(m1,m)from level m1 (using piecewise linear or cubic splines)
V()=V()
m=I(m1,m)(V()
m1), = 1, ..., d.
3b) Starting with the initial guess V()(= 1, ..., d) perform kmax steps of
the ALS iteration as in Step 2 of Algorithm G BTA (see §2).
4. Compute the core βby the orthogonal projection of Aonto Tn=d
=1T
with T= span{v()
ν}r
ν=1 (see Remark 2.7),
β=A×1V(1)T×2... ×dV(d)TBr,
82
3.3 Multigrid accelerated BTA for the full format function related tensors
at the cost O(rdn).
The complexity of the MGA Tucker approximation by the ALS algorithm applied
to full format tensors is given in the following Lemma.
Lemma 3.2 Suppose that r2nmfor large m, then the numerical cost of Algo-
rithm MG G BTA is estimated by
WFT=O(n3
Mr+n2
Mr2+nMr4+n4
0) = O(n3
Mr+n4
0).
Proof: In Step 2, the HOSVD on the coarsest grid level requires O(n4
0) operations
(which for large n=nmis negligible compared to other costs in the algorithm).
Next, for fixed n=nmthe assumption r2nimplies that at every step of the
ALS iterations the costs of the consequent contractions to compute the n×r2un-
folding matrix B(q)is estimated by O(n3r+n2r2), while the SVD of B(q), requires
O(nr4) operations. Summing up over the levels completes the proof, taking into
account that the Tucker core is computed in O(n3
Mr) operations.
3.3.1 Numerics on the MGA Tucker approximation (ρ1/3)
Figure 3.11 shows the numerical example of the MGA Tucker approximation to
fully populated tensors given by the 3D Slater function e−kxk, (x[b, b]3,b=
5.0), sampled over large n×n×nuniform grids with n= 64,128,256 and 512.
The MATLAB times (min) of the MGA Tucker decomposition corresponding to
data in Figure 3.11 are shown in Figure 3.12. It is worth to note that the times for
the grid size 2563outperform essentially the existing benchmark high performance
algorithm based on the Newton-type method on the Grassman manifold presented
in [92] (maximal grid size there is less than 2563). Another efficient approach
for the Tucker decomposition based on the cross-approximation algorithm was
recently presented in [86].
The results on the Tucker approximation for the irrational function of electron
density ρ,ρ1/3, discretized over the tensor grids in [b, b]3,b= 5.0, with n512,
are shown in Figure 3.13 (corresponding to CH4molecule). Notice that the low-
rank tensor representation of the exchange” potential ρ1/3, is an important issue
in the density functional theory computations. As it was mentioned before, the
convergence upon the Tucker rank depends on physical relevance” (regularity
properties) of the function. Our multigrid accelerated scheme requires SVD of
complexity O(nr2min{n, r2}), where rn, instead of O(n4) in the unigrid
approach. As a result, the admissible grid size for applicability of the MGA
Tucker decomposition to full-format tensors is limited only by the amount of
available computer memory for tensor representation, O(n3).
83
3 Multigrid Tucker approximation of function related tensors
2 4 6 8 10 12
10−14
10−12
10−10
10−8
10−6
10−4
10−2
100
Tucker rank
Slater function, MGA Tucker, EFN (solid), EEN (dashed)
n=512
n=256
n=128
n=64
n=512
n=256
n=128
n=64
Figure 3.11: Convergence of the MGA Tucker approximation with respect to the
Tucker rank rapplied to the discretized Slater function (relative
Frobenius norm).
2 4 6 8 10 12
0
0.5
1
1.5
2
2.5
3
3.5
Tucker rank
minutes
Times for MGA Tucker, Slater function
n=512
n=256
n=128
Figure 3.12: MATLAB times for the MGA Tucker approximation for the dis-
cretized Slater function.
84
3.3 Multigrid accelerated BTA for the full format function related tensors
14 16 18 20 22
10−6
10−5
10−4
10−3
10−2
Tucker rank
relative error, ρ1/3 ,pseudodensity of CH4, kmax=4
n=128
n=256
n=512
Figure 3.13: Relative approximation error (Frobenius norm) of the MGA Tucker
approximation applied to the discretized exchange” potential ρ1/3.
3.3.2 BTA of the electron density of Aluminium clusters
In our next example, we consider the Tucker approximation of the electron den-
sity of the Aluminium molecular clusters originating from the large scale finite
element (FE) calculations in the OFDFT method [32, 33]. By using the multigrid
accelerated Tucker decomposition procedure, the appropriate mapping subspaces
are computed for the data from the FE simulations interpolated to the uniform
3D Cartesian grid.
FE computations have been performed for two configuration of the Aluminium
lattice, further called cluster14 and cluster172, containing 14 and 172 atoms, re-
spectively. Figure 3.14 visualises the middle-point plane cross-section of electron
densities for these two configurations.
First, we interpolated (by cubic splines) the electron density from the initial
FE unstructured grid onto an nm×nm×nmCartesian grid for a sequence of
mesh parameters nm, thus obtaining 3D tensors Anm,m= 0,1, ..., M. Since
the accuracy level of the finite element discretization for the considered examples
was resolved on a Cartesian grid of size n3
M= 2003, there was no need for finer
interpolation grids. Then the tensor AMis approximated by the rank-(r, r, r)
Tucker model applying Algorithm MG G BTA to full-format tensors. We check
the Tucker approximation error versus the rank-parameter r, increasing from
r= 1,2,...to some predefined value.
Figure 3.15 shows the first 6 orthogonal vectors of the matrix V(1), for cluster14
(left) and cluster172 (right), respectively. We can observe that the shape of the
85
3 Multigrid Tucker approximation of function related tensors
−10 −5 0510
−10
−5
0
5
10
0
0.02
0.04
0.06
cluster1/sdv4 , initial ρ for Al, z=0
−10
0
10
−15
−10
−5
0
5
10
15
0
0.05
Al, ρ , cluster1/subdiv4, z=0
Figure 3.14: a) Pseudo-density of Aluminium cluster on the regular n3-grid for
n= 200: cluster14 (left), cluster172 (right).
−8 −6 −4 −2 0 2 4 6 8
−0.4
−0.3
−0.2
−0.1
0
0.1
0.2
0.3
0.4
u1(1)
u2(1)
u3(1)
u4(1)
u5(1)
u6(1)
−15 −10 −5 0 5 10 15
−0.3
−0.2
−0.1
0
0.1
0.2
0.3
0.4 orthogonal vectors, ρ for Al, cluster3,subdiv3
Figure 3.15: a) Orthogonal vectors of matrices V(1) for cluster14 (left) and
cluster172 (right), (r= 6).
Tucker orthogonal basis functions “resolves“ the complicated shape of the con-
sidered lattice structure, providing good approximation properties already with
moderate Tucker ranks, as seen in the next Figure 3.16. This feature can be use-
ful for the construction of the small-size set of problem-adapted orthogonal basis
functions.
Figure 3.16 presents the absolute error in the Tucker approximation with the
rank r= 10 for the cluster14 (left) and for the cluster172 (right). The maximum
absolute approximation errors for both clusters are bounded as 104, which is
in the range of the error of the employed FE approximation, see [14]. We observe
convergence in the Frobenius norms EF N with respect to the Tucker rank rover
Cartesian grids of size 1013, 2013for cluster14 (left), and 913, 1813for cluster172
86
3.3 Multigrid accelerated BTA for the full format function related tensors
Figure 3.16: a) The absolute error of the rank -10 Tucker approximation to the
Aluminium cluster14 with n= 201 (left) in the cross-section S=
[10,10] ×[10,10] × {0}, and for cluster172 with n= 181 in the
cross-section S= [18,18] ×[18,18] ×{0}(right).
(right), correspondingly.
Figures in 3.17 demonstrate that even the Tucker rank r= 6 is sufficient to
represent the data from FE simulations with the absolute error up to 104÷105
(corresponding to relative accuracy 102÷103, see blue lines in Figure 3.17). For
the Tucker rank r= 8 and n= 200, the storage requirements are O(r3+ 3rn)
=
5300, in contrast to approximately 20MB of the original data.
We observe that the original tensor can be represented in the Tucker or canon-
ical tensor format using a relatively tiny number of coefficients. For example,
the original 3D tensor corresponding to the electron density of the Aluminium
cluster172 can be approximated (up to relative accuracy 103) either by r3
T= 83
basis functions constructed using the combinations of the orthogonal vectors in
Rnfrom the respective Tucker mapping, or by Rr2
Tbasis functions using the
canonical vectors in Rn. Note, that the Tucker rank rto achieve the resolution
of FE scheme remains almost independent of the increasing number of atoms in
a cluster, as we already observed in the numerics on (almost) periodic structures
in Section 2.3.2. It indicates that the numerical complexity for solving larger
problems would increase only linearly in the univariate grid size n.
87
3 Multigrid Tucker approximation of function related tensors
2 4 6 8 10 12 14 16 18 20
10−7
10−6
10−5
10−4
10−3
10−2
10−1
100
Tucker rank
Al., cluster1 subdiv4, AR=20.0, n=51,101,201
2 4 6 8 10 12 14 16 18 20
10−6
10−5
10−4
10−3
10−2
10−1
100
Tucker rank
Aluminium el. dens., cluster3/subdiv4
EFN n=181
EFN n=91
EFN n=46
EEN n=181
EEN n=91
EEN n=46
Figure 3.17: Convergence of the Tucker tensor approximation for cluster14 (top)
and cluster172 (bottom).
88
4 TS computation of the Coulomb
and exchange Galerkin matrices
4.1 General remarks
In this section, we consider a novel approach for the numerical evaluation of the
Hartree and exchange integral operators in the Hartree-Fock equation. It is based
on the tensor-structured representation of the involved functions and operators on
3DCartesian grids and the agglomerated computations of the Hartree potential
and the Coulomb and exchange matrices using the tensor-structured operations
described in Section 2.4.2. This provides linear complexity scaling with respect
to the univariate grid size n.
Reduction of the computational complexity in electronic structure calculations
based on low-rank separable representations of functions is a usual practice in
quantum chemistry. But, the low rank concept is applied in the choice of well-
separable basis functions enabling the analytical evaluation of the one- and two-
electron integrals corresponding to the Hartree and the exchange operators. Usu-
ally, Gaussian-type orbital (GTO) basis sets are used for the evaluation of the
involved integrals [3, 53, 25, 20].
Our grid-based approach for the TS computations of functions and operators
in the Hartree-Fock equation is essentially different. We discretize the basis func-
tions on the n×n×nCartesian grid and perform integrations with the ag-
glomerated densities/orbitals in tensor-product format using efficient multilinear
algebra on low-rank tensors. Therefore, our main goal is to construct the numer-
ical algorithms which enable computations on fine enough spatial grids to fulfill
the accuracy requirements, and to introduce stable rank reduction schemes, also
conserving the required high accuracy.
For reducing the initial or intermediate ranks of the canonical tensors, we use
the combination of the MGA C2T with the consequent T2C transforms, also
scaling linearly in all input parameters. As shown in Section 2.3.3 on the examples
of the low-rank orthogonal Tucker tensor approximations to electron densities and
Hartree potentials of some molecules, exponential convergence with respect to the
Tucker rank is observed.
89
4 TS computation of the Coulomb and exchange Galerkin matrices
This enables an efficient tensor-product convolution for computation of the
Hartree potential using a collocation-type approximation via piecewise constant
basis functions on a uniform n×n×ngrid. In a similar way, the tensor-product
convolution is applied to the Hartree-Fock exchange. As it was already mentioned,
the complexity of the tensor product convolution introduced in [61] is of order
WCC=O(R1R2nlog n),
where R1and R2are the canonical ranks of 3D tensors representing the convolving
kernel and density, respectively, and nis the one-dimension grid size. Note that
the tensor-product convolution outperforms significantly the 3D FFT of complex-
ity O(n3log n), that is linear-logarithmic with respect to the volume size of the
3D tensor (see Table 4.1).
It should be noted, that our algorithms for the computation of the Coulomb
and the exchange matrices work as black-box” schemes, since in the convolution
integrals (1.8) and (1.10) only a pair of 3D tensors is involved - the grid-based
electron density/molecular orbitals and the Newton kernel. Therefore, there are
no restrictions on grid-based basis functions which can be used for the grid rep-
resentation of the electron density and orbitals.
Notice, that due to the grid-based concept, our approach is very flexible con-
cerning the character of the chosen Galerkin basis set, see properties (A)–(D) in
Section 5.1.3. In the present computations, we use the GTO basis sets, mostly,
due to possibility of easy evaluation of the computational accuracy for all impor-
tant entities (comparison with MOLPRO), beginning from the calculation of the
Hartree and exchange potentials, and up to energy estimates from the solution of
the Hartree-Fock equation in the tensor-structured format.
4.2 Accurate evaluation of the Hartree potential by
the tensor-product convolution in R3
We consider the computation of the Hartree potential
VH(x) := ZR3
ρ(y)
kxykdy =ρ1
k·k(x), x R3,(4.1)
that is the convolution product of the Newton kernel with the electron density
ρ(y) = 2
Norb
X
a=1
(ϕa(y))2.(4.2)
We apply the discrete tensor-product convolution on tensor grids in R3, described
in [61].
90
4.2 Accurate evaluation of the Hartree potential by the tensor-product convolution in R3
In general, our computational scheme for the convolution integral (4.1) involves
the following steps (which will be considered further in details):
1. Compute the grid-based canonical representation of the electron density
of the molecule using the expansion coefficients for GTOs for the electron
orbitals, and by discretizing the corresponding GTO basis functions on n×
n×ntensor grid. In this way, the initial rank Rρ0of the canonical tensor
representing ρis of the order 103or even 104, depending on the molecule.
2. Reduce the rank of the electron density by consequent MGA C2T and T2C
transforms. Then the canonical rank of ρis reduced by an order of magni-
tude, which is sufficient for the fast discrete tensor-product convolution. As
shown in Section 3.2 the MGA C2T algorithm yields linear scaling in nand
the initial rank Rρ0. As a result of this step, we obtain the computationally
feasible rank RρRρ0.
3. Compute the convolution (4.1) (representing the Hartree potential) in tensor-
product format using rank-RNcanonical representation of the Newton po-
tential with RNabout 20 ÷30 and canonical representation of the electron
density obtained in Step 2.
We use the total electron density ρ(y), y= (y1, y2, y3)R3, in a form including
rank-R0separable representation of orbitals in GTO basis set,
ϕa(x) =
R0
X
k=1
ckagk(x), a = 1, ..., Norb,(4.3)
so that ρ(y) is given by the polynomial-exponential sum
ρ(y) := 2
Norb
X
ν=1 R0
X
k=1
ckν
3
Y
=1
(yA
k)β
keαkkyAkk2!2
,(4.4)
where Ak= (A1
k, A2
k, A3
k)R3(k= 1, ..., R0) correspond to the locations of atoms
in a molecule, β
kN0,Norb is the number of electron pairs, and R0is the number
of GTO basis functions. For example, the number of GTO basis functions is given
by R0= 55,41 and 96 for CH4, H2O and C2H6molecules, respectively.
We assume that a particular molecule is embedded in an appropriate compu-
tational box [b, b]3. We use b= 10.6 atomic units (au) for H2O and b= 14.6
au for CH4and C2H6molecules. In the following, we represent the convolving
tensor corresponding to ρ(y) in the canonical format. Since the products of two
Gaussians in (4.4) can be written in the form of a single Gaussian by recomputing
the coefficients as
eλ(ya)2·eβ(yb)2=eσ·eγ(yc)2, σ =λβ(ab)2
λ+β, γ =λ+β, c = +
λ+β,
91
4 TS computation of the Coulomb and exchange Galerkin matrices
and taking into account the symmetry of the representation, we arrive at the
following bound on the initial rank of the input tensor
RRρ0=R0(R0+ 1)
2.(4.5)
Hence, for the considered particular molecules, we have the following ranks of the
discrete canonical representation of the electron densities:
Rρ0,CH4= 1540, Rρ0,C2H6= 4656, Rρ0,H2O= 861.
In our computational scheme, we apply Algorithm MG C BTA for rank reduction
of the electron density tensor (see the discussion below).
We apply the collocation scheme to discretise the convolution product in tensor
formats (see [61]). To that end we introduce the equidistant tensor grid in the
computational box [b, b]3,
ω3,n := ω1×ω2×ω3, ω:= {−b+ (m1)h:m= 1, ..., n + 1},(4.6)
with = 1,2,3, and mesh-size h= 2b/n, and define a set of collocation points
{xm} ω3,n,m M := {1, ..., n + 1}3, composing the tensor grid ω3,n. For
technical reasons we further assume that n= 2k,kN. Now we choose a set
of piecewise constant basis functions {φi},i I := {1, ..., n}3, defined as the
characteristic functions of elementary grid-cells in ω3,n. For a given continuous
function fsupported in the computational box [b, b]3(i.e. f(y) = 0 for y
R3\[b, b]3), let fi=f(yi), i I, be the (collocation) representation coefficients
of fin the basis set {φi},
f(y)X
i∈I
fiφi(y), y [b, b]3,(4.7)
where yiis the midpoint of the grid-cell numbered by i I (see (2.41) for explicit
definition).
Now the discrete collocation convolution scheme recovers approximately the
values of the exact convolution product
(fp)(x) = ZR3
f(y)p(xy)dy, x [b, b]3,
in the set of collocation points {xm},
fp[wm]m∈M, wm:= X
i∈I
fiZR3
φi(y)p(xmy)dy, xmω3,n.
We refer to [61] concerning particular assumptions on the convolving density f
and convolving kernel function p.
92
4.2 Accurate evaluation of the Hartree potential by the tensor-product convolution in R3
In the following applications, our particular choice is specified by f(y) =
ρ(y), y [b, b]3, where ρis the electron density, and p(xy) = 1/kxyk,
being the classical Newton (Coulomb) potential. To compute the integral trans-
forms, we apply the piecewise constant approximation to discretize the separable
GTO basis functions of the type
gk(y) =
3
Y
=1
(yA
k)β
keαkkyAkk2
3
Y
=1
g()
k(y), y R3, k = 1,...,R0,(4.8)
by rank-1 n×n×ntensors. The respective canonical vectors are obtained by sam-
pling the univariate functions g()
k(y) at the centers of intervals prescribed by the
tensor grid ω3,n. Specifically, we define the sampling points {yi= (y1
i1, y2
i2, y3
i3)}, i
I={1,...,n}for = 1,2,3, where, as above, yiis the midpoint of the grid-cell
numbered by i I.
The rank-1 tensor representing the single Gaussian gk,k= 1, ..., R0, is given
by the canonical rank-1 tensor
Gk[gi]i∈I =γ(1) γ(2) γ(3) Vnwith entries gi=g(1)
i1·g(2)
i2·g(3)
i3,(4.9)
where
γ()
k={γ()
k,i }iIV, γ()
k,i = (y
iA
k)β
keαk(y
iA
k)2, = 1,2,3.
At the first step, the 3rd order coefficients tensor F= [fi]RIapproximat-
ing the electron density ρ, is represented by a rank-Rρ0canonical tensor in the
“discretized” GTO basis set1corresponding to (4.4).
At the second step, we precompute the coefficients tensor P= [pi]RI,
pi=ZR3
φi(y)
kykdy, i I.(4.10)
The coefficient tensor P= [pi] corresponding to the Coulomb potential 1
kxyk
is approximated in the rank-RNcanonical tensor format using optimised sinc-
quadratures [98, 10, 61], where the rank parameter RN=O(|log ε|log n) depends
logarithmically on both the required accuracy ε > 0 and the grid-size n. In all
computations presented below we choose the tensor rank in the range RN
[20,30] to ensure the desired accuracy of order 104÷106.
1To reduce the numerical complexity of the tensor-product convolution it is approximated
either in the rank-(r, r, r) Tucker format or via the canonical model with the tensor rank
R < Rρ0using the MGA canonical-to-Tucker-to-canonical rank reduction scheme described
in Section 3.
93
4 TS computation of the Coulomb and exchange Galerkin matrices
At the third step, following [61, 67], we compute the set of values wm,m M,
by copying the corresponding portion of entries (centred at j=n) taken from the
discrete tensor convolution of 3rd order tensors,
FP:= {zj}, zj:= X
i∈I
fipji+1,j J := {1, ..., 2n1}3,(4.11)
where the sum is over all i,j I, which lead to legal subscripts for pji+1, i.e.,
ji+1 I. Specifically, we define wm=zm+n/21,m M.
Remark 4.1 The tensor [wm]can be identified with the piecewise trilinear in-
terpolating function uniquely defined by its values at the grid (collocation) points
{xm},m M. Sampling this interpolant at cell-centered points {yi},i I,
defined above, we return the coefficient tensor VVn=RIof the resultant ap-
proximate convolution, providing the representation in the same basis set {φi}as
for the initial electron density f=ρ. In the following, the resultant coefficients
tensor VVnover the index set Iwill be denoted by VFTPfp.
Representing Fin the rank-Rcanonical format (see (2.6)) enables us to compute
FPin the form (see Section 2.4.2)
FP=
RN
X
k=1
R
X
m=1
ckbmu(1)
kv(1)
mu(2)
kv(2)
mu(3)
kv(3)
m,(4.12)
which leads to the cost
NCC=O(RRNnlog n).
Thus, computation of the convolution product with the rank R=Rρ0of the input
tensors has practical limitations since the exact rank of the resulting tensor is the
product of those for the convolving arrays. For reducing the tensor rank of the
convolving density, we apply the fast algorithm MG C BTA(CR,nMTCR,r) for
the canonical-to-Tucker multigrid transform defined on the sequence of refined
grids, followed by the Tucker-to-canonical transform of the core tensor. In this
way, the canonical rank Rcan be reduced by the order of magnitude, from several
thousands to few hundreds or even tens, depending on the input data and the
required accuracy.
Table 4.1 shows the advantage of the fast tensor-product convolution method
with the rank recompression via the MG C BTA algorithm, compared with those
based on 3D FFT (of the complexity O(n3log n)). We present the MATLAB
CPU time for a high accuracy computation of the Hartree potential for the H2O
94
4.2 Accurate evaluation of the Hartree potential by the tensor-product convolution in R3
n312832563512310243204834096381923163843
3DFFT (sec) 4.3 55.4 582.86000 2 years
ConvCC (sec) 1.0 3.1 5.3 21.9 43.7 127.1 368.6 700.2
CRTr(sec) 1.5 3.8 8.2 16.4 32.0 64.0 136.0
Table 4.1: Comparison of the 3D FFT and the tensor-product convolution. The
bottom line shows the times for C2T rank reduction.
molecule on a Sun Fire X4600 computer with 2,6 GHz processor. The CPU time
for the FFT-based scheme with n1024 is obtained by extrapolation, with factor
10 for each step of grid refinement.
As it was mentioned, in general, the complexity of the single grid canonical-
to-Tucker transform depends polynomially on the parameters of the model, see
Algorithm C BTA (CR,nTCR,r). The multigrid acceleration technique employed
by MG C BTA algorithm provides calculations of the 3D integral transform (4.1)
using benchmark grid sizes up to Nvolume = 163843in small computational times,
of the multigrid preprocessing and discrete 3D convolution. Figure 4.2 b) demon-
strates the linear scaling of the rank reduction algorithm in the grid-size n.
The approximation error of the discrete tensor-product convolution V(n)
H=
Vapproximating the Hartree potential VH(x) (see Remark 4.1) and applied to
the particular class of electron density functions appears to be of order O(h2)
(in the Frobenius norm), see the error analysis in Theorems 2.2 and 2.3, [61].
Following [61], we apply the Richardson extrapolation technique to obtain higher
accuracy approximations of order O(h3) without extra computational cost. The
corresponding Richardson extrapolant V(n)
H,Rich approximating VH(x) over a pair
of nested grids ω3,n and ω3,2nand defined on the coarser n×n×ngrid is given
by
V(n)
H,Rich = (4 ·V(2n)
HV(n)
H)/3 over the grid-points in ω3,n.(4.13)
Figure 4.1 illustrates the effect of the Richardson extrapolation in the compu-
tation of VHfor the pseudodensity case of CH4on small n×n×ngrids with
n= 112,224,448. Due to a distinct asymptotic convergence ratio of 4 in the
computations on dyadically refined grids, the Richardson method gives an im-
provement of accuracy from 103to 105÷106.
For the all electron case of moderate size molecules large grids are required to
enable high accuracy of the computations in the presence of cusps in the electron
density due to core electrons contribution. Figures 4.2 a), 4.3 a) and 4.4 a) show
the accuracy of the computation of the Hartree potential of the H2O, CH4and
C2H6molecules on logarithmic scale. We present the absolute error for VHin
the subinterval along the x-axis compared with the corresponding values of VH
95
4 TS computation of the Coulomb and exchange Galerkin matrices
−6 −4 −2 0 2 4 6
0
0.5
1
1.5
2
x 10−3
atomic units
abs. error
abs. error for VH in subinterval
n=448
n=224
n=112
−6 −4 −2 0 2 4 6
0
2
4
6
8
10
12
x 10−5
X: 5.551e−16
Y: 5.531e−06
atomic units
Richardson error
abs. Richardson error for VH
112−224
224−448
Figure 4.1: Absolute approximation error in the Hartree potential VHfor the
pseudo-density of CH4in the subinterval = [7,7] × {0} × {0}
(left) and the reduced error by Richardson extrapolation involving
two pairs of grids (right).
computed by the MOLPRO package. We compare the computational error for
the grid sizes n= 4096,8192 and for the corresponding Richardson extrapolation
of order O(h3).
We observe the accuracy 8 ·105hartree at the cusp region corresponding to
Carbon atom in CH4, 5 ·105hartree at the cusp region corresponding to the
Oxygen atom in H2O, and 5 ·106hartree at the cusps corresponding to the
Carbon atoms in C2H6. Note that maximum values of the Hartree potential at
the cusp points are VH(0,0,0) = 8.35, 11.73 and 9.49 hartree for the CH4, H2O
and C2H6molecules, respectively. This yields relative accuracy of the order 106
in the cusp region for the considered molecules.
Figures 4.2 b), 4.3 b) and 4.4 b) show the CPU times for the CH4, H2O and
C2H6molecules, respectively, for C2T preprocessing and 3D convolution. The
total time for VHat a fixed grid size n=nfconsists of a sum of preprocessing
times from all previous levels, starting from the initial grid with, say n0= 64, up
to nf, plus the convolution time for the level with n=nf.
The computational time depends strongly on the chosen accuracy of approx-
imation, which is prescribed by the multigrid parameters pand r, see Theorem
3.1. It should be noted that the CPU times can be crucially reduced if the ac-
curacy of the result is not so demanding, say up to 103. In the applications
discussed above the accuracy requirements are challenging, which leads to the in-
96
4.2 Accurate evaluation of the Hartree potential by the tensor-product convolution in R3
−6 −4 −2 0 2 4 6
10−4
10−3
hartree
Abs. approx. error, V
H for H2O
a) atomic units
n=4096
n=8192
Ri−4096−8192
2000 4000 6000 8000 10000 12000 14000 16000
0
1
2
3
4
5
6
univariate grid size n
minutes
H2O , rT=20
C−2−T time
3D conv. time
Figure 4.2: a) Absolute approximation error of the tensor-product computation
of the Hartree potential of the water molecule in the subinterval =
[6,6] ×{0}×{0}, and the corresponding CPU times (bottom).
97
4 TS computation of the Coulomb and exchange Galerkin matrices
−5 0 5
10−6
10−5
10−4
10−3
a) atomic units
hartree
abs. approx. error, V
H for CH4
n=4096
n=8192
Ri−8192−4096
2000 4000 6000 8000
0
2
4
6
8
10
b) univariate grid size n
minutes
C−2−T time
3D conv. time
Figure 4.3: a) Absolute approximation error in the Hartree potential of CH4
molecule measured in the subinterval = [8,8] ×{0}×{0}, and b)
the corresponding CPU times.
98
4.2 Accurate evaluation of the Hartree potential by the tensor-product convolution in R3
−4 −2 0 2 4 6
10−6
10−4
10−2
100
a) atomic units
hartree
C2H6, abs. appr. error for V
H
n=4096
n=8192
Ri−4096−8192
1000 2000 3000 4000 5000 6000 7000 8000
0
2
4
6
8
10
12
univariate grid size n
minutes
C2H6 , rT=30, p=10
C−2−T time
3D conv. time
Figure 4.4: a) Absolute approximation error in the Hartree potential of C2H6in
the subinterval = [5,7]×{0}×{0}, the corresponding CPU times
(bottom).
99
4 TS computation of the Coulomb and exchange Galerkin matrices
crease of parameters rand pin the MGA scheme, and consequently, the demands
on computational resources.
4.3 Tensor computation of the Coulomb matrix
The Coulomb (Galerkin) matrix J={Jkm}R0
k,m=1 for the Hartree potential VH(x)
with respect to the set of normalized Gaussians {egk}is given elementwise by
Jkm := ZR3egk(x)egm(x)VH(x)dx, k, m = 1,...R0.(4.14)
For given rank-1 tensors Gkrepresenting Gaussians {egk}on n×n×n-grid, and
the Hartree potential tensor V=V(n)
Hcomputed for the considered molecule as
above, we calculate the Galerkin Coulomb matrix {Jkm}using the tensor scalar
product and Hadamard product operations as discussed in Section 2.4,
Jkm := hGk, GmV(n)
Hi, k, m = 1,...R0.(4.15)
Since the molecular orbitals are specified by the Galerkin coefficients matrix C=
{cka} RR0×Norb as in (4.3), in the following, we denote the nonlinear dependence
in Jby J=J(C).
4.4 Numerics: the Coulomb matrices of CH4, C2H6
and H2O molecules
In this section we present numerical results on the tensor-structured computation
of the Coulomb matrix for some particular molecules. Next figures show the
difference between the reference matrices obtained by MOLPRO and our TS
calculations of the Coulomb matrix for H2O and small organic molecules in all
electron case. Figure 4.7 visualises [Jkm] for H2O and the absolute error of the
TS computations.
Figures 4.6 shows absolute values of [Jkm] for C2H6and the absolute approxima-
tion error of our TS calculations using the MGA parameters r= 26 and p= 18.
Figures 4.5 show the Coulomb matrix errors for computations over the grids with
the univariate sizes n= 4096 and n= 8192. Note, that for matrix elements the
decay of the error has a factor of 4. This relation is a prerequisite for efficiency
of the Richardson extrapolation (4.13). Then we obtain the resulting Coulomb
matrix errors shown in Figure 4.6. Note, that the entries of the Coulomb matrix
corresponding to cusp areas are approximated by the Richardson technique with
100
4.4 Numerics: the Coulomb matrices of CH4, C2H6and H2O molecules
even better accuracy than other diagonal matrix entries, due to the almost exact
convergence factor of 4 observed.
Results for the Coulomb matrix of CH4molecule are shown in Figure 4.8.
The upper Figure shows the absolute values of the matrix entries. We perform
the multigrid tensor-structured calculations, using rank reduction with the MGA
parameters Tucker rank r= 24 and the number p= 16 of most important fibers.
Lower figure in 4.8 shows the ultimate approximation error in computations using
the Richardson extrapolation over the grids with n= 4096 and n= 8192.
Notice, that the errors for the diagonal matrix elements corresponding to cusp
areas, are below 4·106, 8·106and 8·105for the CH4, C2H6and H2O molecules,
respectively. The accuracy of the order of 105÷106is defined by the chosen error
control parameter ε > 0 in the tensor-structured computations of the Coulomb
matrix. By using smaller values of εat all steps of computations it is possible to
get better accuracies at the expense of the computation time.
101
4 TS computation of the Coulomb and exchange Galerkin matrices
Figure 4.5: Absolute approximation error in the Coulomb matrix entries of C2H6
computed with the grid sizes 40963(top) and 81923(bottom).
102
4.4 Numerics: the Coulomb matrices of CH4, C2H6and H2O molecules
Figure 4.6: Absolute values of the Coulomb matrix entries of the C2H6molecule
(top) and absolute approximation error of calculations using MGA
rank reduction (bottom).
103
4 TS computation of the Coulomb and exchange Galerkin matrices
Figure 4.7: a) Coulomb matrix (absolute values) for H2O, b) the absolute ap-
proximation error of the multigrid tensor product computation of the
Coulomb matrix for H2O.
104
4.4 Numerics: the Coulomb matrices of CH4, C2H6and H2O molecules
Figure 4.8: CH4molecule. Top: absolute values of the Coulomb matrix entries.
Bottom: absolute approximation error in the Coulomb matrix ob-
tained by the MGA tensor-structured method (r= 24, p= 16),
with the Richardson extrapolation over the grids with n= 4096 and
n= 8192.
105
4 TS computation of the Coulomb and exchange Galerkin matrices
Figure 4.9: Absolute approximation error in the Coulomb matrix entries of CH4
molecule computed with the grid size 163843.
4.5 Agglomerated computation of the Hartree-Fock
exchange
In this section, we consider the tensor product approximation of the nonlocal
(integral) exchange operator Kin the Hartree-Fock equation by the agglomer-
ated tensor-product operations used in Section 4.2 for the computation of the
Hartree operator. Note that the calculation of the exchange Galerkin matrix in
the Hartree-Fock equation is a challenging problem due to the nonlocal character
of the exchange operator
(Kψ) (x) :=
Norb
X
a=1 ZR3
ϕa(x)ϕa(y)
kxykψ(y)dy x R3,(4.16)
which leads to the integration in six dimensions (see (4.17)). This problem is
usually solved analytically by evaluating the so-called two-electron integrals using
naturally separable basis sets like Gaussians, see [109, 78] and references therein.
Here, we propose and implement the grid-based evaluation of the Hartree-Fock
exchange (4.16) by using the tensor product approximation of the included oper-
ators and functions on the 3D Cartesian grid. We apply the fast tensor product
106
4.5 Agglomerated computation of the Hartree-Fock exchange
convolution for the multivariate functions in Rd(see Section 4.2), providing com-
plexity O(dn log n).
To cover the general case of molecular geometries, in the numerical examples,
we use equal grid sizes nfor three spatial dimensions (a cubic computational box)
and do not employ information on a symmetry of the molecules. Therefore, our
scheme works as a grey-box” algebraic algorithm, where as the input data only
the discrete representation of the Galerkin basis functions is used. However, the
algorithm works as well with arbitrary n1×n2×n3grids.
Our initial algorithm for the evaluation of (4.16) has the complexity O(R2
0nlog n+
nef R4
0Norb), where nef nis the “effective” univariate grid size, and R0is the
number of Galerkin basis functions. Here we reduce the constant in the linear com-
plexity scaling in nby truncating the effective size of the computation intervals,
where the values of rapidly decaying basis functions (in particular, Gaussians) are
less than a threshold controlling the accuracy of computations. Thus, we have for
the number of grid points in effective support of the interacting vectors, nef =αn,
with αmuch less than 1.
To reduce the R0-asymptotics to O(R3
0), we further apply the canonical-to-
Tucker algorithm for decreasing the ranks of intermediate results after every con-
volution step. The corresponding rank reduction algorithms are considered in
Section 3.
The accuracy of the computation of matrix entries on a particular grid is es-
timated in numerical experiments by O(h2), where h=O(n1) is the step-size
of the grid. We achieve O(h3) accuracy in our evaluation of the exchange matrix
by using the Richardson extrapolation on a couple of consequent grids. Usually,
the size of the computational box for small organic molecules is in the range of
14 ÷20
A. Since the TS methods enable computations on huge 3D Cartesian
grids, the univariate grid step-sizes of applied discretizations range from h0.02
Afor n= 1024, up to h0.0008
Afor the benchmark grids with the number of
entries n3= 163843.
4.5.1 Galerkin exchange operator in the Gaussian basis
The exchange Galerkin matrix K=Kex with respect to the set of normalized
“Cartesian Gaussians {gk}k=1,...R0is given by
Kex ={Kij}R0
i,j=1, Kij := 1
2ZR3ZR3
gi(x)τ(x, y)
kxykgj(y)dxdy, i, j = 1,...R0,
(4.17)
107
4 TS computation of the Coulomb and exchange Galerkin matrices
where the density matrix τ(x, y) is defined in terms of electron orbitals as
τ(x, y) = 2
Norb
X
a=1
ϕa(x)ϕa(y),
over all occupied orbitals a.
The low cost of the three-dimensional convolution using the canonical repre-
sentation of the convolving tensors makes possible the agglomerated numerical
evaluation of the exchange matrix in the Fock operator. For this purpose, we
split the integration in (4.17) into the following steps. First, we compute the con-
volutions of the pointwise products of molecular orbitals with the vectors from
the normalized Gaussian basis set
Waj(x) = ZR3
ϕa(y)gj(y)
kxykdy a = 1,...,Norb, j = 1,...,R0.(4.18)
These quantities are further used for the calculation of contributions to the
Galerkin matrix entries from every orbital a,
Vij,a =ZR3
ϕa(x)gi(x)Waj(x)dx, i, j = 1,...R0.(4.19)
The entries of the exchange matrix are then the sums of the corresponding values
over all orbitals
Kij =
Norb
X
a=1
Vij,a, i, j = 1,...Norb.(4.20)
We compute the exchange matrix (4.17) by the numerical scheme (4.18) - (4.20)
using the discrete tensor product representation of arising functions and operators.
The occupied orbital of the molecule is considered as an expansion over the
basis set of separable continuous functions gk(x),
ϕa(x) =
R0
X
k=1
ckagk(x), x = (x1, x2, x3)R3,(4.21)
where the basis functions gk,k= 1,...,R0, are represented as the rank-1 canon-
ical tensor products,
gk(x) = g(1)
k(x1)g(2)
k(x2)g(3)
k(x3),(4.22)
with 1,2,3 designating spatial dimensions. Hence, with a fixed set of Galerkin
basis functions {gk}, the exchange matrix Kdepends (nonlinearly) on the rep-
resentation coefficients matrix C={cka} RR0×Norb , that will be specified as
K=K(C).
108
4.5 Agglomerated computation of the Hartree-Fock exchange
4.5.2 Discrete computational scheme
GTOs are used as conventional basis sets in electronic structure calculations due
to their separability in spatial variables which is useful in the analytical evalua-
tion of the two-electron integrals in the calculation of the Hartree and exchange
potentials.
In the following, for numerical illustrations, we choose the discretized Gaussians
as vectors in the rank-1 canonical representations of the basis functions, mainly for
the sake of convenient verification of the computational results (the corresponding
Galerkin matrix) with the standart MOLPRO output [108].
The rank-1 GTO basis functions gk(x), k= 1,...R0, are given by (4.22) with
R= 1, where g()
k(y()) denotes the generalized univariate Gaussians. The uni-
variate Gaussians g()
k(y()), = 1,2,3, are functions with infinite support given
as
g()
k(y()) = (y()A()
k)β()
kexp(αk(y()A()
k)2), y()R, αk>0,
where β()
k= 0,1,... is the polynomial degree, and the points (A1
k, A2
k, A3
k)R3
specify the positions of nuclei in a molecule. In our scheme we use the discrete
basis functions (given by vectors of the canonical tensor representation (2.6))
which are constructed by discretizing the Gaussians on the given tensor grid by
using the associated piecewise constant basis functions, as in Section 4.2.
Assume that the molecule is embedded in a certain fixed computational box
[b, b]3with a suitable b > 0. For simplicity of notation, we take n=nequal
for all dimensions. We use the equidistant tensor grid ω3,n , see (4.6), with grid
points {xm},m M := {1, ..., n + 1}3. We use a representation like (4.7) with
f(y) = gk(y), where the rank-1 coefficients tensor Gk=γ(1)
kγ(2)
kγ(3)
kis given
by the values of -mode functions g()
kat the centers y()
iof the intervals of the
univariate grid [x()
i, x()
i+1], i= 1,...,n. This results in canonical vectors of
length nwith the entries {g()
k(y()
i)}n
i=1,
γ()
k={g()
k(y()
i)}n
i=1 Rn,for = 1,2,3, k = 1,...R0.(4.23)
such that we have Gk=γ(1)
kγ(2)
kγ(3)
k. Given the coefficients matrix C={cka}
(k, a = 1, ..., R0) corresponding to expansion (4.21), then by summing tensor prod-
ucts of the canonical vectors with the corresponding weights cka, we obtain the
discrete representation of the orbital ϕa,a= 1,...Norb, in the rank-R0canonical
format,
Ua=
R0
X
k=1
ckaγ(1)
kγ(2)
kγ(3)
k, cka R,(4.24)
109
4 TS computation of the Coulomb and exchange Galerkin matrices
where R0is the number of basis functions. This discretization can be considered
as a representation in the Galerkin set of basis functions {Gk}obtained by repre-
senting the initial continuous basis set {gk}via piecewise constant basis functions
{φi}on the uniform grid (see (4.23)).
We use the rank-RNcanonical tensor product representation of the coefficient
tensor P= [pi]RIfor the Newton potential 1
kxyk, given in (4.10). As in
Section 4.2 this tensor is precomputed by using the optimized sinc-quadratures
[10, 61], where the rank parameter RN=O(|log ε|log n) depends logarithmically
on both the required accuracy ε > 0 and the univariate grid size n. Recall that for
our computations tensor P, representing the Newton potential has the canonical
rank in the range 20 RN30, depending on the one-dimension grid size nand
accuracy requirements ε > 0.
Below, we present Algorithm 1 describing the computational scheme for evalu-
ation of (5.11) - (5.13) in tensor product format2.
Lemma 4.2 The complexity of Algorithm 1 for the computation of the exchange
Galerkin matrix Kex in the Hartree-Fock equation using the discretized GTO basis
is estimated by
WKex =O(NorbRN(R2
0nlog n+R4
0nef )).
Proof: This estimate includes the cost of the evaluation of convolutions in
(5.11) for every orbital, O(NorbRNR2
0nlog n), and the scalar product (5.12) of
the rank-/R0RN) tensor Θa,k with the products of the orbitals and Gaussians,
O(NorbRNR4
0nef ).
Since the canonical rank RNof tensor Pcorresponding to the Coulomb potential
depends only logarithmically on n, it can be treated as a constant in the following
complexity estimate.
Remark 4.3 Notice that the rank reduction of the canonical tensor Θa,k after
step (5.11) reduces the complexity to
WKex =O(NorbR3
0nef ).(4.25)
In the case of large molecules further optimization up to O(NorbR2
0nef )-complexity
is possible due to the rank reduction applied to the rank-R0orbitals (tensors Ua).
2The Hadamard product θa,j =UaGjin the Algorithm 1 can be either (1) stored for all
vectors γkat step (A) or (2) recomputed before evaluation of the scalar products at step
(C) . Due to very low cost of this operation, and large storage requirements for the case of
large grids, O(R2
0n), we prefer the case (2).
110
4.5 Agglomerated computation of the Hartree-Fock exchange
Algorithm 1 Computation of the Exchange Matrix in Tensor Arithmetic
Input data: rank-R0canonical tensors UaVn,a= 1,...,Norb, specified by
the coefficients matrix C={cka} RR0×R0, rank-RNtensor PVn,rank-
1canonical tensors Gk=γ(1)
kγ(2)
kγ(3)
k,k= 1,...R0, and the filtering
threshold εF>0.
(A0) Find effective supports σj[b, b]3for γj,j= 1,...,R0, by εF-
thresholding,
σj=σ(1)
j×σ(2)
j×σ(3)
j,where σ()
j={i:|γ()
j(y()
i)| εF} {1,...,n}, = 1,2,3.
for a= 1,...,Norb
for k= 1,...,R0
(A) Compute the Hadamard product θa,k =UaGkof tensors Uaand Gkby
using (2.61).
(B) Compute the tensor convolution Θa,k =θa,k Pby using (4.12).
for j= 1,...,R0
(C) Compute the restricted scalar products in the window σj,
Ka,k,j =hθa,j,Θa,ki|σj,
end for j
end for k
end for a.
(D) Sum the matrix elements over all orbital indices, Kkj =PNorb
a=1 Ka,k,j, for
k, j = 1,...,R0.
Output data: the exchange matrix Kex(C) = {Kkj}R0
k,j=1.
Remark 4.4 The rank-R0tensors Ua,a= 1,...,Norb, representing the orbitals,
can be chosen as the Galerkin basis set {Ga},a= 1,...,Norb, where Norb is
usually much smaller than R0. This may relax the critical dependence, O(R4
0)
and O(R3
0)as in Lemma 4.2 and Remark 4.3 above (see also Lemma 3.1 in [70]).
1. Reducing complexity by reduction of the initial rank RΘ=RNR0.
The maximal initial rank of tensor Θa,k at the step (B) in Algorithm 1 is given
by RΘ=RNR0. We perform the rank reduction for this tensor by the C2T
and T2C algorithms introduced and discussed in details in Sections 2 and 3. In
particular, it is shown that the multigrid version of the C2T algorithm applied to
the 3-rd order rank-Rcanonical tensors has linear complexity with respect to all
input parameters: the canonical rank R, the Tucker rank r, and the univariate
grid size n. Thus, we can reduce the complexity of Algorithm 1 to (4.25) by
111
4 TS computation of the Coulomb and exchange Galerkin matrices
CH4CH3OH C2H5OH
RΘ1250 1875 2775
rT= 12, ǫT107,RRED 80 90 110
coefR15 20 23
rT= 10, ǫT106,RRED 50 70 100
coefR25 26 27
Table 4.2: Rank reduction for Θa,k in computation of the exchange matrix for the
pseudopotential case of some molecules.
solely multilinear algebraic methods, which do not take into account any previous
knowledge on the molecular structure.
Table 4.2 shows the average rank reduction by the C2T and T2C algorithms
applied to tensor Θa,k in calculations for the molecules CH4, CH3OH and C2H5OH.
We present the reduced canonical ranks RRED (and respective Tucker ranks rT) of
the tensors corresponding to the largest value, over the parameters a= 1,...,Norb,
k= 1,...R0, to achieve the prescribed approximation error ǫT,
RRED = max
1aNorb,1kR0
RRED(a, k),
where RRED(a, k) denotes the reduced canonical rank of Θa,k, for a given ϕaand
gk. Table 4.2 gives also the corresponding reduction coefficient, coefR=RΘ
RRED .
2. Windowing procedure for fast computation of the scalar products.
We compute the algebraic tensor representation of the discrete electron orbitals
Uagiven by (4.24) using the coefficients of their representation in the discrete
Gaussian basis set γ()
k. It turns out by the construction that most of γ()
khave
local character (fast exponential decay) with respect to the size of the whole
computation domain [b, b]3. Therefore we precompute effective supports of the
canonical vectors γ()
kby truncating their parts, which are below some predefined
threshold ε > 0. We call it the “windowing” procedure for finding the active
interval for each Gaussian. In our case, the resulting effective vector size of the
canonical vectors is at average 3 times smaller than the corresponding grid size n
even for small molecules. The resulting “effective” univariate grid size is nef =αn,
with α=α(ε)<1. For example, for small molecules α0.2÷0.3 for ε= 105.
This leads to reduced cost of the scalar products with respect to the univariate
grid size n.
We expect much stronger windowing effect in the case of large molecules, while
in this case, it can be directly applied to the Hadamard products UaGk.
112
4.5 Agglomerated computation of the Hartree-Fock exchange
Figure 4.10: Top: entries of the exchange matrix for the all electron case of H2O.
Bottom: absolute error in Kex R41×41 extrapolated over n×n×n
3D Cartesian grids with n= 8192 and n= 16384.
113
4 TS computation of the Coulomb and exchange Galerkin matrices
Figure 4.11: Top: the entries of exchange matrix for CH4. Bottom: absolute ap-
proximation error in Kex computed on 3D grids with one-dimension
sizes n= 2048 and n= 4096.
114
4.5 Agglomerated computation of the Hartree-Fock exchange
Figure 4.12: Top: the exchange matrix for the pseudopotential case of CH3OH.
Bottom: absolute approximation error of computations on the n×
n×ngrids with n= 512 and n= 1024.
115
4 TS computation of the Coulomb and exchange Galerkin matrices
Figure 4.13: Absolute error in the tensor-product computation of Kex the pseu-
dopotential case of molecules CH4(top) and C2H5OH (bottom).
116
4.6 Numericals experiments
4.6 Numericals experiments
We tested the presented tensor-structured method in computation of the exchange
Galerkin matrix for the following molecules:
all electron case : H2O (Norb = 5, R0= 41), CH4(Norb = 5, R0= 55);
pseudopotential case: CH4(Norb = 4, R0= 50), CH3OH (Norb = 7, R0=
75) and C2H5OH (Norb = 11, R0= 111).
The calculations are performed on a standard SUN station using Matlab 7.6. In
presented figures we give the absolute difference of the results of our computations
with the corresponding exchange matrix calculated by the MOLPRO program
[108].
The computational box [b, b]3for small molecules is in the range of 2b= 14
A
for H2O, and 2b= 20
Afor CH4, C2H5OH, and CH3OH. The developed tensor-
structured algorithm enables computation of the Hartree-Fock exchange on huge
n×n×n3D Cartesian grids, with the number of entries up to n3= 163843. This
corresponds to the usage of univariate mesh-sizes from h2·102for the grids
with n= 1024, to h8·104
Afor the grids with n= 16384.
4.6.1 All electron case
For molecules with moderate size R0of basis sets, like CH4or H2O the grid-sizes
up to n= 16384 are computationally feasible for MATLAB, that is equivalent
to computations with 4.4·1012 nodes in the volume. These grids provide
the resolution of the strong cusps in electron orbitals corresponding to the core
electrons in a molecule, thus enabling accurate computations of the exchange
matrix for all electron case.
Computation of the exchange Galerkin matrix for the all electron case of H2O
molecule is a challenging problem due to sharp” Gaussians corresponding to
core electrons of the Oxygen atom. Figure 4.10 (top) shows the absolute values of
the exchange matrix entries for H2O, Figure 4.10 (bottom) presents the absolute
error of the tensor-structured computations of this matrix using the Richardson
extrapolation on n×n×n3D Cartesian grids with n= 8192 and n= 16384.
We achieve high accuracy 1.89·105in the “cusp area”, the remaining entries are
computed with the absolute error in the range of 106÷108.
Figure 4.11 (top) displays the absolute values of the exchange matrix of CH4
and Figure 4.11 (bottom) shows the absolute error in Kex reaching the accuracy
104by using the Richardson extrapolation on the grids with n= 2048 and
117
4 TS computation of the Coulomb and exchange Galerkin matrices
n364312832563512310243
H2O1 1.3 2.0 3.2 8.0
CH4(ps) 1 1.3 2.0 3.6 8.9
CH3OH (ps) 1 1.3 1.9 3.3 5.1
Table 4.3: Comparison of relative times.
n= 4096. Again, the matrix entries apart from the “cusp area” are computed
with much higher accuracy.
4.6.2 Pseudopotential case
We consider the pseudopotential case for larger molecules, achieving an accuracy
up to 106, using smaller 3D grids with one-dimension size n= 1024.
Figure 4.12 (top) shows entries of the exchange matrix of CH3OH molecule and
Figure 4.12 (right) displays that tensor-structured computations for this molecule
using the Richardson extrapolation on grids with n= 512,1024 yield an accuracy
105. Figure 4.13 shows the absolute error in Kexin the pseudopotential case of
the CH3OH molecule (top) and C2H5OH (bottom), correspondingly. For CH3OH
the Richardson extrapolation on two consequent grids with n= 512,1024 yields
the accuracy 105, while for C2H5OH we obtain 7 ·104, already on small 3D
grids with the one-dimension size n= 256,512.
Table 4.6.2 illustrates linear scaling of the relative one orbital computation
time, with respect to the one-dimension grid size n, in respective units of the
coarsest grid calculations (n= 64).
118
5 Solution of the Hartree-Fock
equation by multilevel TS
methods
The traditional numerical approaches in ab initio electronic structure calcula-
tions are based on the Galerkin approximation in the naturally separable GTO
basis and analytical evaluation of one- and two-electron integrals inherent to this
approach. Many years of development of rigorous schemes for the analytical
evaluation of the operators in the Hartree-Fock equation, yielded state-of-the-art
packages like GAUSSIAN, Abinit and MOLPRO [5, 108]. In analytical-based pro-
grams, elaborated by large scientific groups, precomputed parameters and “non-
zero” initial approximations are used for accelerating the iterative solution of the
nonlinear EVP (1.6).
The Hartree-Fock model presupposes at least cubic (or fourfold) scaling in the
number of the basis functions. When using the conventional Gaussian bases, the
number of basis functions rapidly increases for larger molecules, thus making this
model computationally unfeasible.
As a first remedy, there are simplified models based on the appropriately ad-
justed pseudopotentials and the grid-oriented methods over n×n×nspatial
grids. These include the traditional plane waves, wavelet or finite element dis-
cretizations, at the expense that scales linearly in the volume size, Nvol =n3,
[4, 5]. Accuracy of these approaches is limited due to the constraints on practi-
cally tractable grid size n500.
Here, we introduce the novel multilevel scheme for the numerical solution of the
Hartree-Fock equation using the grid-based tensor-structured methods described
in previous sections, and providing O(nlog n)-complexity, i.e., sublinear in the
volume, O(N1/3
vol ). The new concept for the numerical solution of the Hartree-Fock
equation is a “grey box” scheme based on a moderate number of problem-adapted
discrete Galerkin basis functions represented on 3D Cartesian grid, which are used
as the “global elements” with a low separation rank.
The core of our method is the tensor-structured computation of the electron
density and the Galerkin matrices of the nonlinear Hartree and (nonlocal) Hartree-
119
5 Solution of the Hartree-Fock equation by multilevel TS methods
Fock exchange operators (see Section 5.1.3) at all steps of iterations in the solu-
tion of the nonlinear EVP problem . Within the solution process, the multilinear
algebra operations such as the scalar and Hadamard products, the 3D convo-
lution transform, and the rank truncation are implemented with almost O(n)-
complexity. In view of linear scaling in nof the 3D tensor-structured arithmetic,
high accuracy is achieved due to computations over large n×n×ntensor grids
of size up to 163843entries. In electronic structure calculations, this implies fine
resolution with the mesh size h104
A, providing possibility for arbitrary
space orientation of a molecule in the computational box (like in the analytical
approaches).
The piecewise constant representation of the Galerkin basis functions, and the
electron density, leads to approximation error of order O(h2), in the Hartree and
exchange potentials, where h=O(n1) is the respective mesh size. In turn, the
two-grid version of the computational scheme improves the convergence rate up
to O(h3), by using the Richardson extrapolation. These approximation properties
are verified by the numerical experiments.
For solving the discrete Hartree-Fock equation on a fixed grid, we apply the
self-consistent field (SCF) iteration based on the traditional direct inversion in the
iterative subspace (DIIS) scheme commonly used in the physical literature [89].
In this iteration scheme, the current update to the Fock matrix is obtained by
relaxation over its values at the previous steps. The discrete orbitals, represented
by the Galerkin coefficients vectors, are updated by direct diagonalization of the
arising system matrix at each iteration on nonlinearity.
To enhance the classical DIIS iteration in our tensor-structured grid-based ap-
proach, we propose the multilevel strategy, that provides fast and robust iterative
solution method for the Hartree-Fock equation discretized on a sequence of re-
fined grids. Iterations begin on the coarsest grid with the zero initial guess for
the Hartree and exchange Galerkin matrices, J(C) = 0, K(C) = 0. We switch
iteration from the coarser to finer grid using a grid-dependent termination crite-
rion, such that the initial guess for the Coulomb and exchange matrices, obtained
from previous coarse levels, ensures robust convergence on finer grids. Low cost
of the iterations on the coarse grid levels reduces essentially the overall numerical
complexity of the global solution process.
In general, the convergence proof for the nonlinear DIIS iteration is still an open
question [78, 20]. We observe in numerical experiments for several small organic
molecules, that our multigrid accelerated DIIS iteration exhibits fast and uniform
in nconvergence (linear convergence rate), so that the overall computational time
scales almost linearly in n.
It is worth to note that the current version of the method still scales cubically
120
5.1 Galerkin scheme for the Hartree-Fock equation
in the size of approximating basis. Hence, any algebraic optimisation of this
basis set within the solution process gives the new opportunity to high accuracy
ab initio computations for large molecules. The quadratic scaling in the size of
approximating basis might be possible in the framework of direct minimization
algorithms (see [95] for detailed discussion on the direct minimization).
5.1 Galerkin scheme for the Hartree-Fock equation
5.1.1 Problem setting
As it was already mentioned in the Introduction, the Hartree-Fock equation for
pairwise L2-orthogonal electronic orbitals ϕi:R3R,ϕiH1(R3), is a nonlin-
ear EVP,
Fϕi(x) = λiϕi(x),ZR3
ϕiϕjdx =δij, i, j = 1, ..., Norb (5.1)
with the nonlinear Fock operator F,
F:= 1
2 + Vc+VHK,
where δij denotes the Kronecker delta, and Norb is the number of electron pairs
in a molecule.
Here, an external nuclear potential Vcis defined by
Vc(x) =
M
X
ν=1
Zν
kxAνk,(5.2)
where Mis the number of nuclei in a molecule, and Zν>0, AνR3denote the
charge and spatial coordinates of the nuclei, respectively. The Hartree potential
VHdetermines the Coulomb interaction of every single electron with the field
generated by all electrons of the system,
VH(x) := ZR3
ρ(y)
kxykdy, x R3,(5.3)
that corresponds to the convolution of the Coulomb potential with the electron
density
ρ(y) = 2
Norb
X
i=1
(ϕi(y))2.(5.4)
121
5 Solution of the Hartree-Fock equation by multilevel TS methods
The exchange part Kof the Fock operator is given by the convolution integral,
(Kϕ) (x) := 1
2ZR3
τ(x, y)
kxykϕ(y)dy, (5.5)
with the density matrix
τ(x, y) = 2
Norb
X
i=1
ϕi(x)ϕi(y).
5.1.2 Traditional discretization
Usually, the Hartree-Fock equation is solved by the standard Galerkin approxi-
mation of the initial problem (5.1) posed in H1(R3) (see [78] for more details).
For a given finite basis set {gµ}1µR0,gµH1(R3), the molecular orbitals ϕi
are expanded as
ϕi=
R0
X
µ=1
cµigµ, i = 1, ..., Norb.(5.6)
To derive the equation for the unknown coefficients matrix C={cµi} RR0×Norb ,
one first introduces the mass (overlap) matrix S={sµν}1µ,νR0, given by
sµν =ZR3
gµgνdx,
and the stiffness matrix H={hµν}of the core Hamiltonian part of the Fock
operator, H=1
2 + Vc, that includes the kinetic energy of electrons and the
nuclear potential energy,
hµν =1
2ZR3gµ·gνdx +ZR3
Vc(x)gµ(x)gν(x)dx, 1µ, ν R0.
The nonlinear terms representing the Galerkin approximation of the Hartree and
exchange operators are usually constructed by using the so-called two-electron
convolution integrals (cf. for example [78]), defined as
bµν,κλ =ZR3ZR3
gµ(x)gν(x)gκ(y)gλ(y)
kxykdxdy, 1µ, ν, κ, λ R0.
Traditionally, the Hartree and exchange parts in the Fock operator are computed
by the analytical evaluation of one- and two-electron integrals, using the natu-
rally separable Cartesian Gaussians gk,k= 1,...,R0, represented in the rank-1
canonical tensor product form,
gk(x) = g(1)
k(x1)g(2)
k(x2)g(3)
k(x3), x R3,
122
5.1 Galerkin scheme for the Hartree-Fock equation
with 1,2,3 designating the space dimensions.
Denoting by D= 2CCTRR0×R0the so-called density matrix, the R0×R0
Galerkin matrices J(C) for the Hartree, and K(C) for the exchange potentials,
can be defined as (cf. [78])
J(C)µν =
R0
X
κ,λ=1
bµν,κλDκλ, K(C)µν =1
2
R0
X
κ,λ=1
bµλ,νκDκλ.
The complete Fock matrix F=F(C) is then given by,
F(C) = H+G(C), G(C) = J(C) + K(C).(5.7)
The respective Galerkin system of nonlinear equations for the coefficients matrix
CRR0×Norb takes the form
F(C)C=SCΛ,Λ = diag(λ1,...,λNorb),(5.8)
CTSC =INorb ,
where the second equation represents the orthogonality constraints RR3ϕiϕj=δij,
with INorb being the Norb ×Norb identity matrix.
Since the traditional implementation of the SCF iteration for the Hartree-Fock
equation is based on the precomputed two-electron integrals, the complexity to
build up the matrix Gnormally scales as O(R4
0), that is dominated by computa-
tional cost for the exchange matrix K(C). In turn, the core Hamiltonian Hcan
be precomputed in O(R2
0) operations, hence, in the following, we focus on the
efficient Galerkin approximation of the nonlinear Hartree and exchange operators
to be updated at each step of SCF iteration.
The nonlinear system (5.8) can be solved by a certain SCF iteration, where
at each iterative step the respective linear eigenvalue problem has to be solved
with the updated matrix G(C). Given F(C), using the direct diagonalization for
solving the system (5.8) leads to the cost O(R3
0) at each iterative step.
An alternative approach can be based on the direct minimization of the Hartree-
Fock energy functional
IHF = inf (1
2
Norb
X
i=1 ZR3|∇ϕi|2+ZR3
ρVc+ZR3ZR3
ρ(x)ρ(y)|τ(x, y)|2
kxykdxdy),
over ϕiH1(R3), under the orthogonality constraints in (5.1), see [95] for more
details.
123
5 Solution of the Hartree-Fock equation by multilevel TS methods
5.1.3 Novel scheme via agglomerated tensor-structured
calculation of Galerkin matrices
We introduce the novel tensor-structured method for the numerical solution of the
nonlinear Hartree-Fock equation, which does not require the analytical evaluation
of the two-electron integrals described in Section 5.1.2. In our approach, we use
the grid-based agglomerated calculation of the three- and six-dimensional inte-
grals in evaluation of the Hartree and exchange operators and the corresponding
Galerkin matrices J(C) and K(C).
The idea is based on a certain reorganisation of the standard computational
scheme described in Section 5.1.2. Specifically, instead of precomputing the full
set of two-electron integrals bµν,κλ and the elements of the density matrix D, we
use agglomerated representations for J(C) and K(C).
Now, let us recall some constructions from Section 4.2. We suppose that the
initial problem is posed in the finite volume box = [b, b]3R3subject to
the homogeneous Dirichlet boundary conditions on Ω. For given discretization
parameter nN, introduce the equidistant tensor grid ω3,n with the mesh-size
h= 2b/n, as in (4.6). Then define the set of piecewise constant basis functions
{φi},i I := {1, ..., n}3, associated with the respective grid-cells in ω3,n (in-
dicator functions), and introduce the set of collocation discretisations of GTO
basis functions, {gk}, represented in this basis by the rank-1 coefficients tensor
{Gk} RI(k= 1, ..., R0). The projected Newton potential is represented in the
basis set {φi}by the low-rank coefficients tensor PNRI.
Now the computational scheme for the Coulomb matrix can be written by the
following tensor operations:
ρΘ :=
Norb
X
a=1 R0
X
κ,λ=1
cκacλaGκGλ!,
and then by the tensor-product convolution (see Section 4.2),
VH=ρ1
k·k ΘPN,(5.9)
with PNVnbeing the projection tensor for the Coulomb potential. This implies
the tensor representation of the Coulomb matrix,
J(C)µν =hgµ(x)gν(x), VH(x)i
hGµGν,ΘPNi,1µ, ν R0.(5.10)
We have rank(Gµ) = 1, while 3rd order tensors Θ and PNare approximated
by low-rank tensors (see Section 4). Hence, the corresponding tensor operations
124
5.1 Galerkin scheme for the Hartree-Fock equation
are carried out using fast multilinear algebra supplemented by the corresponding
rank optimization (tensor truncation).
In turn, as proposed in Section 4.5, we represent the matrix K(C) using tensor
operations by the following three loops. For a= 1, ..., Norb, and ν= 1, ..., R0,
compute the convolution integrals,
W(x) = ZR3
gν(y)
R0
P
m=1
cmagm(y)
kxykdy
Υ := "Gν
R0
X
m=1
cmaGm#PN,(5.11)
and then the scalar products (µ, ν = 1, ..., R0),
Kµν,a =ZR3"R0
X
m=1
cmagm(x)#gµ(x)W(x)dx
χµν,a := h"R0
X
m=1
cmaGm#Gµ,Υi.(5.12)
Finally, the entries of the exchange matrix are given by sums over all orbitals,
K(C)µν =
Norb
X
a=1
Kµν,a, µ, ν = 1, ..., R0.(5.13)
This scheme gains from the efficient low-rank separable approximation of the
Newton kernel, the discretised electron density ρ(x), and of auxiliary poten-
tials W(x) at step (5.11), that ensures low complexity of the three-dimensional
tensor-structured operations including rank reduction algorithms.
Notice that the effective realization of such a concept is possible with more
general basis sets {gµ}, than those generated by analytically separable rank-1
GTO basis chosen above. The key features of the Galerkin basis functions gµ
should include: (A) approximability, i.e., the Galerkin approximation error over
the quantities in (5.8) is satisfactory, and (B) separability, i.e., the collocation
coefficients tensor Gµfor the basis function gµ(x), can be represented by the
rank-RGexpansion with small RG,
Gµ=
RG
X
k=1
g(1)
µ,k g(2)
µ,k g(3)
µ,k, µ = 1, ..., R0,(5.14)
(see [70] for more detailed discussion on this topic). In this case, the tensor-
product schemes as above remain essentially the same except that the rank of the
collocation coefficients tensor Gµ(µ= 1, ..., R0) increases to RG.
125
5 Solution of the Hartree-Fock equation by multilevel TS methods
5.2 Multilevel tensor-truncated iteration via DIIS
5.2.1 General SCF iteration
The standard SCF algorithm can be formulated as the following “fixed-point”
iteration ([78]): Starting from initial guess C0, perform iterations of the form
FkCk+1 =SCk+1Λk+1,Λk+1 = diag(λk+1
1, ..., λk+1
Norb ) (5.15)
CT
k+1SCk+1 =INorb ,
where the current Fock matrix Fk= Φ(Ck, Ck1,...,C0), k= 0,1, ..., is specified
by the particular relaxation scheme. For example, for the simplest approach,
called the Roothaan algorithm, one has Fk=F(Ck). In practically interesting
situations this algorithm usually leads to “flip-flop” stagnation [78].
Recall that here, λk+1
1λk+1
2... λk+1
Norb are Norb negative eigenvalues of the
linear generalized eigenvalue problem
FkU=λSU, (5.16)
and the R0×Norb matrices Ck+1 contain the respective Norb orthonormal eigen-
vectors U1, ..., UNorb . We denote by Ck+1 RR0×R0the matrix representing the
full set of orthogonal eigenvectors in (5.16).
We use the particular choice of Fk,k= 0,1, ..., via the DIIS-algorithm, (cf.
[89]), with the starting value F0=F(C0) = H.
We propose the modification to the standard DIIS iteration, by carrying out
the iteration on a sequence of successively refined grids with the grid-dependent
stopping criteria. The multilevel implementation provides robust convergence
from the zero initial guess for the Hartree and exchange operators. The coarse-
to-fine grids iteration, in turn, accelerates the solution process dramatically due
to low cost of the coarse grid calculations.
The principal feature of our tensor-truncated iteration is revealed on the fast
update of the Fock matrix F(C) by using tensor-product multilinear algebra of
3-tensors accomplished with the rank truncation described above. Another im-
portant point is that the multilevel implementation provides simple and robust
scheme for construction good initial guess on the fine grid-levels.
5.2.2 SCF iteration by using DIIS scheme
For each fixed discretization, we use the original version of DIIS scheme (cf. [53]),
defined by the following choice of the residual error vectors (matrices),
Ei:= [CT
i+1F(Ci)Ci+1]|{1µNorb;Norb+1νR0}RNorb×(R0Norb),(5.17)
126
5.2 Multilevel tensor-truncated iteration via DIIS
for iteration number i= 0,1, ..., k, that should vanish on the exact solutions of
the Hartree-Fock Galerkin equation due to the orthogonality property. Hence,
some stopping criterion applies to residual error vector Eifor i= 0,1,2, ....
The minimizing coefficient vector c:= (c0, ..., ck)TRk+1 is computed by
solving the constrained quadratic minimisation problem for the respective cost
functional (the averaged residual error vector over previous iterands),
f(c) := 1
2
k
X
i=0
ciEi
2
F1
2hBc, ci min,provided that
k
X
i=0
ci= 1,
where
B={Bij}k
i,j=0 with Bij =hEi, Eji,
and with Eidefined by (5.17). Introducing the Lagrange multiplier ξR, the
problem is reduced to minimization of the Lagrangian functional
L(c, ξ) = f(c)ξ(h1, ci1),
where 1= (1, ..., 1)TRk+1, that leads to the linear augmented system of equa-
tions
Bcξ1= 0,(5.18)
h1,ci= 1.
Finally, the updated Fock operator Fkis built up by
Fk=
k1
X
i=0
copt
iFi+copt
kF(Ck), k = 0,1,2, ..., (5.19)
where the minimizing coefficients copt
i=ci(i= 0,1, ..., k) solve the linear system
(5.18). For k= 0 the first sum in (5.19) is assumed to be zero, hence providing
copt
0= 1, and F0=F(C0).
Recall that if the stopping criterion on Ck,k= 1, ..., is not satisfied, one updates
Fkby (5.19) and solves the eigenvalue problem (5.15) for Ck+1.
Note that in practice one can use the averaged residual vector only on a reduced
subsequence of iterands, Ek, Ek1, ..., Ekk0,kk0>0. In our numerical examples
below, we usually set k0= 4.
5.2.3 Unigrid and multilevel tensor-truncated DIIS iteration
In this section, we describe the resultant numerical algorithm. Recall that the
discrete nonlinear Fock operator is specified by a matrix
F(C) = H+J(C) + K(C),(5.20)
127
5 Solution of the Hartree-Fock equation by multilevel TS methods
where Hcorresponds to the core Hamiltonian (fixed in our scheme) and the
discrete Hartree and exchange operators are given by tensor representations (5.10)
and (5.12), respectively.
First, we describe the unigrid tensor-truncated DIIS scheme.
Algorithm TT DIIS (Unigrid tensor-truncated DIIS iteration).
1. Given the core Hamiltonian matrix H, the grid parameter n, and the termi-
nation parameter ε > 0.
2. Set C0= 0 (i.e. J(C0) = 0,K(C0) = 0), and F0=H.
3. For k= 0,1, ..., perform
a) Solve the full linear eigenvalue problem of size R0×R0, given by (5.16), and
define Ck+1 as the matrix containing the Norb eigenvectors corresponding to
Norb minimal eigenvalues.
b) Terminate the iteration by checking the stopping criterion
kCk+1 CkkFε.
c) If kCk+1 CkkF> ε, compute the Fock matrix
F(Ck+1) = H+J(Ck+1) + K(Ck+1)
by the tensor-structured calculations of J(Ck+1)and K(Ck+1), using grid-
based basis functions with expansion coefficients Ck+1, (see Section 4), up-
date the Fock matrix Fk+1 by (5.19), and switch to Step a).
4. Returns: Eigenvalues λ1, ..., λNorb and eigenvectors CRR0×Norb.
Figure 5.1 shows the convergence of Algorithm TT DIIS for the solution of the
Hartree-Fock equation in the pseudopotential case of CH4. It demonstrates that
the convergence history is almost independent on the grid size on the examples
with n= 64 and n= 256, correspondingly.
To enhance the unigrid DIIS iteration, we propose the multilevel version of Al-
gorithm TT DIIS defined on a sequence of discrete Hartree-Fock equations spec-
ified by a sequence of grid parameters np=n0,2n0,...,2Mn0, with p= 0, ..., M,
corresponding to the succession of dyadically refined spacial grids. To that end,
for ease of exposition, we also introduce the incomplete version of Algorithm
TT DIIS, further called Algorithm TT DIIS(k), which represents only its part
starting from iteration number k=k1. The input data for Algorithm
TT DIIS(k) include the current approximation Ckand a sequence of all already
precomputed Fock matrices, F0, F1, ..., Fk1.
We sketch this algorithm as follows.
128
5.2 Multilevel tensor-truncated iteration via DIIS
Algorithm TT DIIS(k)(Incomplete unigrid tensor-truncated DIIS iteration).
1. Given the core Hamiltonian matrix H, the grid parameter n, the termination
parameter ε > 0,Ck, and a sequence of Fock matrices F0, F1, ..., Fk1.
2. Compute J(Ck),K(Ck),F(Ck) = H+J(Ck) + K(Ck), and Fkby (5.19).
3. For k=k, k + 1, ..., perform steps a) - c) in Algorithm MTT DIIS.
We further assume that the core Hamiltonian His precomputed beforehand.
Algorithm MTT DIIS (Multilevel tensor-truncated DIIS scheme).
1. Given the core Hamiltonian matrix H, the coarsest grid parameter n0, the
termination parameter ε0>0, and the number of grid refinements M.
2. For p= 0, apply the unigrid Algorithm TT DIIS with n=n0,εp=ε0, and
return the number of iterations k0, matrix Ck0+1, and a sequence of Fock matrices
F0, F1, ..., Fk0.
3. For p= 1, ..., M, apply successively Algorithm TT DIIS(kp1+ 1), with the
input parameters np:= 2pn0,εp:= ε022p,Ckp1+1. Keep continuous numbering
of the DIIS iterations through all levels, such that the maximal iteration number
at level pis given by
kp=
p
X
p=0
mp,
with mpbeing the number of iterative steps at level p.
4. Returns: kM,CkM+1, and a sequence of Fock matrices F0, F1, ..., FkM.
Usually, we start calculations on an n×n×n3D Cartesian grid, with n0= 64,
and end up with maximum nM= 8192, for all electron case computations, or
nM= 1024, for the pseudopotential case. Further, in Section 5.3, we show by
numerical examples that in large scale computations the multilevel Algorithm
MTT DIIS allows us to perform most of the iterative steps on coarse grids thus
reducing dramatically the computational cost and, at the same time providing
a good initial guess for the DIIS iteration on nonlinearity at each consequent
approximation level.
The flow-chart in Figure 5.2 shows the general steps of the unigrid/multigrid
tensor-structured SCF algorithms.
5.2.4 Complexity estimates in terms of R0,Norb and n
The rest of this section addresses the complexity estimate of the multilevel tensor-
truncated iteration in terms of RN,R0,nand other governing parameters of the
algorithm. For the ease of discussion we suppose that rank(Gµ) = 1, µ= 1, ..., R0,
(see [70] concerning the more detailed discussion on general case of rank(Gµ)1).
129
5 Solution of the Hartree-Fock equation by multilevel TS methods
2 4 6 8 10 12 14 16 18 20
10−5
10−4
10−3
10−2
10−1
100
iterations
λ1, n1=64
5 10 15 20 25
10−5
10−4
10−3
10−2
10−1
100
iterations
the residual (λnλn−1)
Figure 5.1: Convergence in the eigenvalues for the unigrid TT DIIS algorithm,
in the pseudopotential case of the CH4molecule: the univariate grid
size n= 64 (left) and n= 256 (right).
Lemma 5.1 Let rank(Gµ) = 1,µ= 1,...R0and rank(PN)RNCNorb.
Suppose that the rank reduction procedure applied to the convolution products Υ
in (5.11) provides the rank estimate rank)r0. Then the numerical cost of
one iterative step in Algorithm MTT DIIS at level p, can be bounded by
Wp=O(R0RNnplog np+R3
0r0Norbnp).
Assume that the number of multigrid DIIS iterations at each level is bounded by
the constant I0, then the total cost of Algorithm MTT DIIS does not exceed the
double cost at the finest level n=nM,2WM=O(I0R3
0r0Norbn).
Proof: The rank bound rank(Gk) = 1 implies rank(
R0
P
m=1
cma Gm)R0. Hence,
the numerical cost to compute the tensor-product convolution Υ in (5.11)
amounts to
W) = O(R0RNnplog np).
Since the initial canonical rank of Υ is estimated by rank)R0RN, the
multigrid rank reduction algorithm, having linear scaling in rank), see Sec-
tion 3, provides the complexity bound O(r0R0RNnp). Hence the total cost to
compute scalar products in χµν,a (see (5.12)) can be estimated by
W(χµν,a) = O(R3
0r0Norbnp),
which completes the first part of our proof. The second assertion is due to linear
scaling of the unigrid algorithm in np, that implies the following bound
n0+ 2n0+... + 2pn02p+1n0= 2nM,
130
5.3 Numerical illustrations
J=0, K=0
0discretize GTOs
,
ε=εset 04−p
F=H
0FC = λS C
p=0 0
n
,,ε0
compute ρ, VH,J,
λF C =
make DIIS
) = C
||del(C
n= 2pn
CS
p=p+1
pyes
no
,k=1
kKk
k+1 k+1 k+1
if k>k
k=k+1
k
k
< M
0
k+1
del(C − Ck
)|| < ε
Fk
,
Figure 5.2: The flow-chart of the tensor-structured SCF algorithm.
hence, completing the proof.
Remark 5.2 In the case of large molecules and RG=rank(Gµ)1, further
optimisation of the algorithm up to O(RNR2
0np)-complexity may be possible on
the base of rank reduction applied to the rank-RGR0orbitals and by using an
iterative eigenvalue solver instead of currently employed direct solver via matrix
diagonalization, or by using direct minimization schemes [95].
5.3 Numerical illustrations
5.3.1 General discussion
Our algorithm for ab initio solution of the Hartree-Fock equation in tensor-
structured format is examined numerically on some moderate size molecules. In
particular, we consider the all electron case of H2O, and the case of pseudopo-
tential of CH4and CH3OH molecules. In the present numerical examples, we
use the discretised GTO basis functions for reasons of convenient comparison of
the results with the output from the standard MOLPRO package based on the
analytical evaluation of the integral operators in the GTO basis.
131
5 Solution of the Hartree-Fock equation by multilevel TS methods
The size of the computational box [b, b]3introduced in Section 5.1.3 varies
from 2b= 11.2
Afor H2O up to 2b= 16
Afor small organic molecules. The
smallest step-size of the grid h= 0.0013
Ais reached in the SCF iterations for
the H2O molecule, using the finest level grid with n= 8192, while the average step
size for the computations using the pseudopotentials for small organic molecules
is about h= 0.015
A, corresponding to the grid size n= 1024.
5.3.2 Multilevel tensor-truncated SCF iteration applied to
some moderate size molecules
We solve numerically the ab initio Hartree-Fock equation, by using Algorithms
TT DIIS and MTT DIIS presented in Section 5.2.3. Starting with the zero
initial guess for matrices J(C) = 0 and K(C) = 0 in the Galerkin Fock matrix
(5.7), the eigenvalue problem at the first iterative step (p= 0) is solved by using
only the Hpart of the Fock matrix in (5.7), that does not depend on the solution,
and hence, can be precomputed beforehand.
Thus, the SCF iteration starts with the expansion coefficients cµi for orbitals in
the GTO basis, computed using only the core Hamiltonian H. At every iteration
step, the Hartree and exchange potentials and the corresponding Galerkin matri-
ces, are computed using the updated coefficients cµi. The renewed Coulomb and
exchange matrices generate the updated Fock matrix to be used for the solution
of the eigenvalue problem. The minimization of the Frobenius norm of the virtual
block of the Fock operator evaluated on eigenvectors of the consequent iterations,
Ck, Ck1, ..., is utilized for the DIIS scheme.
The multilevel solution of the nonlinear eigenvalue problem (5.8) is realised via
the SCF iteration on a sequence of uniformly refined grids, beginning from the
initial coarse grid, say, with n0= 64, and proceeding on the dyadically refined
grids, np=n02p,p= 1, ..., M. We use the grid dependent termination criterion
εnp:= ε022p, keeping a continuous numbering of the iterations.
Figure 5.6 (left) shows the convergence of the iterative scheme in the case of
pseudopotential of CH4. Convergence in the total Hartree-Fock energy reaching
the absolute error 9·106on the grid size n= 1024 is shown in Figure 5.6 (right).
The total energy is calculated by
EHF = 2
Norb
X
a=1
λa
Norb
X
a=1 e
Jae
Ka
with e
Ja=hψa, VHψaiL2, and e
Ka=hψa,VexψaiL2, being the so-called Coulomb and
exchange integrals, respectively, computed in the orbital basis ψa(a= 1, ..., Norb).
132
5.3 Numerical illustrations
5 10 15
10−6
10−4
10−2
100
iterations
residual in HF EVP, all electron case H2O
128 256 512 1024 2048 4096 8192
10−4
10−3
10−2
10−1
100
101
univariate grid size
Figure 5.3: Multilevel convergence of the DIIS iteration applied to the all electron
case of H2O (left), and convergence in the energy in n(right).
Figure 5.4 (left) shows the linear scaling in n, corresponding to the CPU time of
one iteration. Figure 5.4 (right) shows the number of “effective” iterations counted
by rescaling the total computational time to the iteration time-unit observed at
each iterative step at the finest grid-level: one iteration at level pis counted with
the factor 2pM. Figure 5.3 (left) shows convergence of the SCF iteration for the
200 400 600 800 1000
0
5
10
15
20
25
30
univariate grid size
minutes
time per SCF iteration
01234
10−4
10−3
10−2
10−1
100conv.in eff.iterations, CH4, pseudo, n=512
Figure 5.4: Linear scaling in n(left) and convergence in the effective iterations
(right).
all electron case of H2O. This challenging problem is solved efficiently due to the
usage of large 3D Cartesian grids up to the volume size Nvol = 81923. Figure 5.3
(right) shows convergence of the HF energy for the corresponding grid levels.
133
5 Solution of the Hartree-Fock equation by multilevel TS methods
Figure 5.7 presents the convergence history of the nonlinear SCF iteration for
CH3OH (top) and C2H5OH (bottom) molecules. Red and blue lines show the
convergence in the residual error, and in the largest eigenvalue, respectively.
5 10 15 20
10−3
10−2
10−1
100
iterations
EVP CH3OH , residual, nmax=2048
1 2 3 4 5 6
10−4
10−3
10−2
10−1
100
101
grid levels (level 6−>n=2048)
ENERGY CH3OH
Figure 5.5: Iteration history for CH3OH molecule: Residual (left) and the energy
(right).
5.3.3 Conclusions to Section 5
We present the grid-based tensor-truncated numerical method for the robust and
accurate iterative solution of the nonlinear Hartree-Fock equation at the cost
O(N1/3
vol ) in the volume size Nvol =n3. The computational scheme is based on the
discrete tensor representation of the Fock operator over the 3D Cartesian grid at
each step of the multilevel SCF iteration applied to the nonlinear 3D eigenvalue
problem. The storage request is roughly estimated by O(N1/3
vol ).
This scheme is neither restricted to the analytically separable basis functions
like GTO orbitals nor to the traditional plane waves approximations. The Galerkin
basis can be modified by adapting to the particular problem in the framework of
the tensor-structured solution process.
Further improvement of the algorithm toward the O(log n)-complexity on the
base of quantics-TT approximation [87, 83, 88, 64, 84, 71], may open new per-
spectives for efficient ab initio numerical simulation of complex molecules and for
the FEM-DFT computations of large molecular clusters.
The main computational blocks of the numerical scheme allow the natural par-
allelization on the level of matrix elements computation, rank decompositions,
and the multilinear tensor operations.
134
5.3 Numerical illustrations
2 4 6 8 10 12
10−4
10−3
10−2
10−1
100
iterations
P. CH4 abs. error , |λλn,it |
n=64 n=128 n=256
n=512
n=1024
12345
10−6
10−4
10−2
100
univariate grid levels
HF energy, HF pseudo, n5=1024
Figure 5.6: Multilevel convergence for the pseudopotential of CH4molecule (top),
and convergence of the HF energy in the grid levels (bottom). Level
1 corresponds to n= 64, while on the finest level 5 we have n= 1024.
135
5 Solution of the Hartree-Fock equation by multilevel TS methods
5 10 15 20 25 30
10−5
10−4
10−3
10−2
10−1
100
101
iterations
DeltaLam1
residual
5 10 15 20 25 30 35
10−5
10−4
10−3
10−2
10−1
100
101
iterations
DelLam1
residual
n=64
n=128
n=256
n=512
Figure 5.7: SCF iterations for the pseudopotentials of CH3OH, with the maximum
grid level n= 2048 (top) and C2H5OH with the maximum grid level
n= 512 (bottom).
136
6 Summary of main results
6.1 Brief summary
In the present dissertation we developed and analysed tensor numerical methods
and algorithms for efficient grid-based evaluation of multivariate functions and
integral operators. These methods are successfully applied to the solution of the
nonlinear Hartree-Fock equation in quantum chemistry. Here we present the sum-
mary of main results.
1. The developed tensor methods and algorithms.
Mixed (two-level) best Tucker approximation (BTA) algorithm applied to
the rank-Rcanonical input, combined with the rank reduction algorithm
applied to the small-size rank-RTucker core.
Reduced HOSVD algorithm (RHOSVD) of complexity O(dR2n), for fast
computation of the initial orthogonal vectors in the ALS iteration imple-
menting the mixed BTA.
The multigrid accelerated (MGA) mixed best Tucker approximation method
based on the enhanced ALS iteration1applied to large rank-Rcanonical n×
n×n-tensors represented on a sequence of refined grids (O(Rrn) complexity,
where ris the maximal Tucker rank). In the case of a full format target
tensor the complexity of MGA BTA algorithm is linear in the volume size,
O(n3), instead of O(n4) for HOSVD;
Accurate and fast rank-structured tensor computation of the Hartree po-
tential, with O(Rn log n) complexity, where Ris the rank of the electron
density. Main ingredients are the MGA BTA of the agglomerated electron
density, and the tensor-product convolution with the low-rank approximand
to the Newton potential. The computational complexity of the respective
1The MGA-ALS iteration has a three-fold gain: avoids the problem to compute HOSVD on
large-size tensors, provides good initial guess for ALS corrections, resulting in the only few
ALS iterations at each approximation level, and allows to optimize the -mode orthogonal
subspaces only on the reduced set of “most important fibers”.
137
6 Summary of main results
Coulomb matrix is proportional to R2
0n, where R0is the number of Galerkin
basis functions.
Accurate tensor-structured computation of the Hartree-Fock exchange us-
ing agglomerated orbitals and applying enhanced MLA via the window-
ing and rank reduction techniques, providing the asymptotical complexity
O(R3
0nlog n);
The multilevel tensor-truncated DIIS-based iterative method for the numer-
ical solution of the nonlinear Hartree-Fock equation disretised on n×n×n-
grid, that provides almost linear scaling in n,O(nlog n) (a “grey-box” al-
gorithm). The multilevel scheme has a four-fold effect: fast convergence
with zero starting values for the Coulomb/exchange matrices, good initial
guess at each grid level, improved approximation by the Richardson extrap-
olation, considerable reduction of the number of recomputed entries in the
exchange matrix by using filtering strategy on fine grid levels. Notice that
the existing benchmark grid-oriented solvers for the Hartree-Fock equation,
based on the full-grid representation, scale at least linear in the volume size,
O(n3);
A 3D nonlinear EVP solver is implemented in MATLAB and verified (com-
pared with MOLPRO output) in ab initio solution of the Hartree-Fock equa-
tion for particular molecules, H2O (all electron case), and CH4, CH3OH,
C2H5OH (pseudopotential case).
2. Analysis of algorithms.
Error and complexity analysis of presented methods include the following results.
Theorem 2.14 on the error bound of RHOSVD in Algorithm C2T applied
to a canonical tensor;
Theorem 3.1 on the complexity of the MGA Algorithm C2T for a canonical
input tensor;
Lemma 3.2 on the complexity of the MGA Algorithm F2T for full format
tensors.
Lemma 4.2 on the complexity of the tensor-structured computational scheme
for the Hartree-Fock exchange matrix.
Lemma 5.1 on the complexity of the multilevel tensor-truncated iterative
solver for the Hartree-Fock equation.
138
6.1 Brief summary
3. Numerical results.
All algorithms are implemented in MATLAB and verified by extensive numeri-
cal experiments in electronic structure calculations that confirm the theoretical
performance of the algorithms, e.g., the almost linear scaling in all important
rank and grid-discretization parameters. These algorithms enable using large 3D
Cartesian grids with the benchmark sizes up to Nvol = 163843, which ensures high
approximation accuracy (resolution) for an arbitrary orientation of a molecule in
the computational box.
4. Interesting observations based on numerical experiments.
In the numerical experiments, we found out that for the Tucker-type approxima-
tion of function related tensors (corresponding to the classical potentials, electron
densities, and Hartree potentials):
(a) the error of the Tucker approximation decays exponentially with respect to
the Tucker rank (known theoretical result for the Newton, Yukawa and Helmholtz
potentials);
(b) the orthogonal vectors of the decomposition are of special shape, that is al-
most independent of the discretization step and resolves the peculiarities of the
considered function;
(c) the core coefficients tensor of the orthogonal Tucker transform is of sparse
character (up to certain threshold);
(d) the Tucker rank of the almost periodic 3D structures weakly depends on the
number of cells;
(e) the ALS Tucker iteration usually demonstrates fast and robust local conver-
gence (likely due to exponential decay of the Tucker approximation error, see item
(a)).
This research project addresses many new and challenging mathematical and
algorithmic problems to be considered in the future. These problems are related
to the developing efficient solution methods that are free of the “curse of dimen-
sionality”, to be applied to high-dimensional equations arising in modern scientific
computing.
139
6 Summary of main results
6.2 Presentations
The author presented the following talks (and posters) on the topics of disserta-
tion.
1. Numerical Solution of the Hartree-Fock Equation in the Multilevel Tensor-
Structured Format.
The 16-th Conference of the International Linear Algebra Society (ILAS).
Pisa, Italy, June 21-25, 2010.
2. Numerical Solution of the Hartree-Fock Equation by the Tensor-Structured
Methods. (Poster).
First Principles in Quantum Chemistry: From Elementary Reactions to
Enzyms. Bad Herrenalb, Germany, April 14-17, 2010.
3. Tensor-Structured Methods in Electronic Structure Calculations.
GAMM 2010. 81-st Annual Meeting of the International Association of
Applied Mathematics and Mechanics. Contributed Session S16. Karlsruhe
Germany, March 22-26, 2010.
4. Numerical Solution of the Hartree-Fock Equation by the Multilevel Tensor-
Structured Methods.
26-th GAMM-Seminar Leipzig on Tensor Approximations and High-Dimensional
Problems. Max-Planck-Institute for Mathematics in the Sciences, Leipzig,
22-24.02.2010.
5. Numerical Solution of the Hartree-Fock Equation by the Tensor-Structured
Techniques.
Invited talk at the Seminar of the Numerical Analysis Group. Mathematics
Institute, University of T¨ubingen, 22.10.2009.
6. Tensor Product Approximation of the Hartree and Exchange Operators in
the Hartree-Fock Equation.
The eighth European Conference on Numerical Mathematics and Advanced
Applications, ENUMATH 2009, Uppsala, Sweden, 29.06-3.07.2009.
7. Towards Solution of the Hartree-Fock Equation by the Tensor-structured
Methods.
Berlin-Leipzig Seminar, TU Berlin, May 18, 2009.
8. Accurate Solution of the Hartree-Fock Equation by the Tensor-structured
Methods.
140
6.2 Presentations
Evaluation Poster, Max-Planck-Institute for Mathematics in the Sciences,
Leipzig, April 28-30, 2009.
9. Efficient Tensor-product Approximation of the Hartree and Exchange Po-
tentials in the Hartree-Fock Equation.
Saxonian Theoretical Seminar, Theoretical Methods for Complex molecular
systems, Wilhelm-Ostwald Institute of Physical and Theoretical Chemistry,
University of Leipzig, February 26-27, 2009.
10. Multigrid Accelerated Tensor Approximation in Electronic Structure Calcu-
lations.
Workshop: Numerical Methods in Density Functional Theory, DFG Re-
search Center MATHEON, TU Berlin, 23-25 July 2008.
11. Multigrid Accelerated Tensor Approximation in 3D Electronic Structure Cal-
culations.
Max-Planck-Institute for Mathematics in the Sciences, Leipzig, July 15,
2008.
12. (with H.-J. Flad, B. Khoromskij and S.R. Chinnamsetty) Tensor Decompo-
sition in Electronic Structure Calculations on 3D Cartesian Grids.(Poster).
CompPhys07, 8th NTZ-Workshop on Computational Physics, University of
Leipzig, 29 November-01 December 2007.
141
6 Summary of main results
142
7 Appendix
7.1 Singular value decomposition and the best
rank-kapproximation of a matrix
Here we recall the standard theorem of the numerical analysis on the singular
value decomposition (SVD) [100].
Theorem 7.1 Let ARm×n, with mn, for definiteness. Then there exist
URm×n,ΣRn×nand VRn×nsuch that
A=UΣVT,(7.1)
where Σis a diagonal matrix whose diagonal entries, σi,i= 1,2,...,n, are the
ordered singular values of A,σ1σ2... σn0and UTU=Imand
VTV=In, with Indenoting the n×nidentity matrix.
The algebraic complexity of the SVD transform scales as O(mn2).
The best approximation of an arbitrary matrix ARm×nby a rank-kmatrix
Ak(say, in Frobenius norm, that is kAk2
F=P
(i,j)m×n
a2
ij) can be calculated by the
truncated SVD as follows. Let us consider the SVD of a matrix A=UΣVT, and
set Σk= diag{σ1,...σk,0,...,0}, then the best rank-kapproximation is given by
Ak:= UΣkVT=
k
X
i=1
σiuivT
i,
where ui, viare the respective left and right singular vectors of A. The approxi-
mation error in the Frobenius norm is bounded by
kAkAkFv
u
u
t
n
X
i=k+1
σ2
i.(7.2)
7.2 Reduced SVD of a rank-Rmatrix
Let us consider a rank-Rmatrix M=ABTRn×n, with the factor matrices
ARn×Rand BRn×R, where Rn. We are interested in the best rank
143
7 Appendix
rapproximation of M, with r < R. It can be implemented using the following
algorithm, that avoids the singular value decomposition of the target matrix M
with possibly large n.
This algorithm includes the following steps.
1. Perform the QR-decomposition of the side matrices,
A=QARA, B =QBRB,
with the unitary matrices QA, QBRn×R, and the upper triangular matri-
ces RA, RBRR×R.
2. Compute the SVD of the core matrix, RART
BRR×R
RART
B=UΣVT,
with the diagonal matrix Σ = diag{σ1,...,σR}, and unitary matrices U, V
RR×R.
3. Compute the best rank-rapproximation of the core matrix, UrΣrVT
r, by
extracting the submatrix Σr=diag{σ1,...r}in Σ, and the first rcolumns
Ur, VrRR×rin the unitary matrices Uand V, respectively.
4. Finally, set the rank-rapproximation Mr=QAUrΣrVT
rQT
B, where QAUr
and QBVrare n×runitary matrices.
The approximation error is bounded by sR
P
i=r+1
σ2
i. The complexity of above
algorithm scales linearly in n,O(nR2) + O(R3). In the case Rn, this reduces
dramatically the cost O(n3) of the truncated SVD applied to the full-format n×n
matrix M.
144
7.3 List of abbreviations
7.3 List of abbreviations
ALS alternating least squares (algorithm)
BTA best Tucker approximation
C2T (CT) canonical to Tucker transform
C BTA BTA algorithm for the canonical target tensor
DIIS direct inversion of iterative subspaces
F2T (FT) full format to Tucker transform
GBTA BTA algorithm for the full format tensor
HOSVD higher order singular value decomposition
MLA multilinear algebra
MGA multigrid accelerated BTA
MG C BTA multigrid accelerated BTA algorithm for the
canonical target tensor
MG G BTA multigrid accelerated BTA algorithm for the
full format tensor
MIFs most important fibers in MG C BTA
RHOSVD reduced higher order singular value decomposition
for the canonical target tensor
SCF self consistent field (iteration)
SVD singular value decomposition
T2C (TC) Tucker to canonical transform
145
7 Appendix
146
Bibliography
[1] P.-A. Absil, R. Mahoni and R. Sepulchre. Optimization Algorithms on Matrix
manifolds. Princeton University Press, 2008.
[2] E. Acar, T. G. Kolda and D. M. Dunlavy. An Optimization Approach
for Fitting Canonical Tensor Decompositions. Technical Report Number
SAND2009-0857, Sandia National Laboratories, Albuquerque, NM and Liv-
ermore, CA, February 2009.
[3] J. Alml¨of, Direct methods in electronic structure theory. In D. R. Yarkony,
Ed., Modern Electronic Structure Theory, Vol. II, (World Scientific, Singa-
pore, 1995) pp. 110–151.
[4] L. Genovese, A. Neelov, S. Goedecker, T. Deutsch, S. A. Ghasemi, A. Wil-
land, D. Caliste, O. Zilberberg, M. Rayson, A. Bergman, R. Schneider.
Daubechies wavelets as a basis set for density functional pseudopotential cal-
culations. J. Chem. Phys. 129 (2008) 014109.
[5] X. Gonze, J.-M. Beuken, R. Caracas, F. Detraux, M. Fuchs, G.-M. Rignanese,
L. Sindic, M. Verstraete, G. Zerah, F. Jollet, M. Torrent, A. Roy, M. Mikami,
Ph. Ghosez, J.-Y. Raty, D. C. Allan. First-principles computation of material
properties: the ABINIT software project. Comput. Mater. Sci. 25 (2002) 478.
[6] B.W. Bader and T.G. Kolda. MATLAB Tensor Classes for Fast Algorithm
Prototyping. SANDIA Report, SAND2004-5187, Sandia National Laborato-
ries, 2004.
[7] J. Ballani and L. Grasedyck. A Projection Method to Solve Linear Systems
in Tensor Format. Preprint MIS MPI 22/2010, Leipzig.
[8] M. Bebendorf. Hierarchical matrices. A means to efficiently solve elliptic
boundary value problems. Lecture Notes in Computational Science and En-
gineering, 63. Springer-Verlag, Berlin, 2008.
[9] M. Bebendorf. Adaptive Cross Approximation of Multivariate functions. Con-
structive Approximation. 2010, to appear.
147
Bibliography
[10] C. Bertoglio, B. N. Khoromskij. Low rank tensor-product approximation of
projected Green kernels via sinc-quadratures. Preprint 79/2008, MPI MIS
Leipzig, 2008.
[11] G. Beylkin and M.J. Mohlenkamp. Numerical operator calculus in higher
dimensions. Proc. Natl. Acad. Sci. USA 99 (2002) 10246–10251.
[12] G. Beylkin and M.J. Mohlenkamp. Algorithms for numerical analysis in high
dimension. SIAM J. Scientific Computing, 26 (6) (2005) 2133-2159.
[13] G. Beylkin, M.J. Mohlenkamp and F. P´erez. Approximating a wavefunction
as an unconstrained sum of Slater determinants. Journal of Math. Physics,
49 032107 (2008).
[14] T. Blesgen, V. Gavini and V. Khoromskaia. Tensor Product Approximation
of the Electron Density of Large Aluminium Clusters in OFDFT. Preprint
66/2009 MPI MIS Leipzig, 2009.
[15] D. Braess, W. Hackbusch. Approximation of 1/x by exponential sums in
[1,).IMA J. Numer. Anal. 25 (2005).
[16] E. Canc´es and Le Bris. On the convergence of SCF algorithms for the Hartree-
Fock equations. ESAIM: M2AN, vol. 34/4, 2000, 749-774.
[17] J.D. Carrol and J. Chang. Analysis of individual differences in multidimen-
sional scaling via an N-way generalization of ’Eckart-Young’ decomposition.
Psychometrika 35 (1970), 283-319.
[18] J.D. Carrol, S. Pruzansky, and J.B. Kruskal. CANDELINC: A general
approach to multidimensional anlysis of many-way arrays with linear con-
straints on parameters. Psychometrika, 45 (1980), pp. 3-24.
[19] S.R. Chinnamsetty, M. Espig, W. Hackbusch, B.N. Khoromskij, H-J Flad,
Kronecker tensor product approximation in quantum chemistry. Journal of
Chemical Physics 127, 084110 (2007).
[20] P.G. Ciarlet and C. Le Bris, eds. Handbook of Numerical Analysis. Vol. X,
Computational Chemistry. Elsevier, Amsterdam, 2003.
[21] L. De Lathauwer. Signal processing based on multilinear algebra. PhD Thesis.
Katholeke Universiteit Leuven, 1997.
[22] L. De Lathauwer. Decomposition of a higher-order tensor in block terms. Part
II: Definitions and uniqueness. Tech. Report no. 07-81, ESAT/SCD/SISTA,
K.U. Leuven, Belgium, 2007.
148
Bibliography
[23] L. De Lathauwer, B. De Moor, J. Vandewalle. On the best rank-1and rank-
(R1, ..., RN)approximation of higher-order tensors. SIAM J. Matrix Anal.
Appl., 21 (2000) 1324-1342.
[24] L. De Lathauwer, B. De Moor, J. Vandewalle. A multilinear singular value
decomposition. SIAM J. Matrix Anal. Appl., 21 (2000) 1253-1278.
[25] T.H. Dunning Jr. Gaussian basis sets for use in correlated molecular calcu-
lations. I. The atoms boron through neon and hydrogen. J. Chem. Physics,
90 (1989), 1007-1023.
[26] L. Elden and B. Savas. A Newton-Grassmann method for computing the best
multilinear rank-(r1, r2, r3)approximation of a tensor. SIAM J. Matrix Anal.
and Appl., Volume 31, Issue 2, pp.248-271 (2009).
[27] H.-J. Flad, W. Hackbusch, B.N. Khoromskij, and R. Schneider. Concepts
of Data-Sparse Tensor-Product Approximation in Many-Particle Modeling.
In: Matrix Methods: Theory, Algorithms, Applications, (dedicated to the
memory of Gene Golub) V. Olshevsky, E. Tyrtyshnikov eds., World Scientific,
2010, pp.313-347.
[28] H.-J. Flad, W. Hackbusch, and R. Schneider. Best N-term approximation
in electronic structure calculations: I. One-electron reduced density matrix.
ESAIM: M2AN 40, 49-61 (2006).
[29] H.-J. Flad, B.N. Khoromskij, D.V. Savostyanov, and E.E. Tyrtyshnikov. Ver-
ification of the cross 3d algorithm on quantum chemistry data. J. Numer.
Anal. and Math. Modelling, 4 (2008), 1-16.
[30] H.-J. Flad, R. Schneider, B.-W. Schulze. Asymptotic regularity of solutions
to Hartree-Fock equations with Coulomb potential. Math. Methods Appl. Sci.,
31 (2008), no. 18, 2172-2201.
[31] S. Friedland, V. Mehrmann, A. Miedlar, and M. Nkengla. Fast low rank ap-
proximation of matrices and tensors. 2008, www.matheon.de/preprints/4903.
[32] V. Gavini, J. Knap, K. Bhattacharya, and M. Ortiz. Non-periodic Finite-
element formulation of orbital-free density-functional theory. J. Mech. Phys.
Solids 55(4) (2007), 669-696.
[33] V. Gavini, K. Bhattacharya, and M. Ortiz. Quasi-continuum orbital-free
density-functional theory: A route to multi-million atom non-periodic DFT
calculation. J. Mech. Phys. Solids 55(4) (2007), 697-718.
149
Bibliography
[34] I.P. Gavrilyuk, W. Hackbusch and B.N. Khoromskij. Hierarchical Tensor-
Product Approximation to the Inverse and Related Operators in High-
Dimensional Elliptic Problems. Computing 74 (2005), 131-157.
[35] I.P. Gavrilyuk, W. Hackbusch, and B.N. Khoromskij. Tensor-product ap-
proximation to elliptic and parabolic solution operators in higher dimensions.
Computing 74 (2005), 131-157.
[36] I. P. Gavrilyuk, W. Hackbusch and B. N. Khoromskij. Data-sparse approxi-
mation of a class of operator-valued functions. Math. Comp. 74 (2005), pp.
681-708.
[37] G.H. Golub, C.F. Van Loan. Matrix Computations. Johns Hopkins Univer-
sity Press, Baltimore, MD, 1996.
[38] L. Grasedyck. Hierarchical Singular Value Decomposition of Tensors. SIAM
J. Matrix Anal. Appl. 31:2029-2054, 2010.
[39] L. Grasedyck. Polynomial Approximation in Hierarchical Tucker Format by
Vector-Tensorization. Preprint 43 DFG 1324, submitted to BIT.
[40] M. Griebel and J. Hamaekers. Sparse grids for the Schroedinger equation.
M2AN, 41 (2007), pp. 215-247.
[41] W. Hackbusch. Hierarchische Matrizen: Algorithmen und Analysis. Springer
Verlag, Berlin 2009.
[42] W. Hackbusch. Fast and Exact Projected Convolution for Non-equidistant
Grids. Computing 80 (2007), 137-168.
[43] W. Hackbusch. Efficient convolution with the Newton potential in d dimen-
sions. Numerische Mathematik, 110 (2008) 4, p. 449-489.
[44] W. Hackbusch and B.N. Khoromskij. Low-rank Kronecker product approxi-
mation to multi-dimensional nonlocal operators. Part I. Separable approxi-
mation of multi-variate functions. Computing 76 (2006), 177-202.
[45] W. Hackbusch and B. N. Khoromskij. Low-rank Kronecker-product Approxi-
mation to Multi-dimensional Nonlocal Operators. Part II. HKT Representa-
tion of Certain Operators. Computing 76 (2006), pp. 203-225.
[46] W. Hackbusch and B. N. Khoromskij. Tensor-product Approximation to Op-
erators and Functions in High dimension. Journal of Complexity 23 (2007),
697-714.
150
Bibliography
[47] W. Hackbusch, B.N. Khoromskij and E.E. Tyrtyshnikov. Hierarchical Kro-
necker tensor-product approximations. J. Numer. Math. 13 (2005), 119–156.
[48] W. Hackbusch, B.N. Khoromskij and E. Tyrtyshnikov. Approximate iteration
for structured matrices. Numer. Math., 109 (2008), 365-383.
[49] W. Hackbusch, B.N. Khoromskij, S. Sauter and E. Tyrtyshnikov. Use of Ten-
sor Formats in Elliptic Eigenvalue Problems. Preprint 78, MPI MIS, Leipzig
2008, submitted.
[50] W. Hackbusch and S. K¨uhn. A new scheme for the tensor representation. J.
Fourier Anal. Appl. 15 (2009), no. 5, 706-722.
[51] R.J. Harrison, G.I. Fann, T. Yanai, Z. Gan, and G. Beylkin. Multiresolution
quantum chemistry: Basic theory and initial applications. J. of Chemical
Physics, 121 (23): 11587-11598, 2004.
[52] E. Hayryan and V. Khoromskaia. Low-rank Approximation of the Electro-
static Potential of Proteins, 2010, (in progress).
[53] T. Helgaker, P. Jørgensen and J. Olsen. Molecular Electronic-Structure The-
ory. Wiley, New York, 1999.
[54] F.L. Hitchcock. The expression of a tensor or a polyadic as a sum of products.
J. Math. Physics, 6 (1927), 164-189.
[55] F.L. Hitchcock. Multiple invariants and generalized rank of a p-way matrix
or tensor. J. Math. Physics, 7 (1927), 39-79.
[56] M. Ishteva, L. De Lathauwer, P.-A. Absil and S. Van Huffel. Differential-
geometric Newton method for the best rank-(R1, R2, R3)approximation of
tensors. Numer. Algorithms 51 (2009), no. 2, 179-194.
[57] V. Khoromskaia. Computation of the Hartree-Fock Exchange in the Tensor-
structured Format. Computational Methods in Applied Mathematics, Vol.
10(2010), No 2, pp.204-218.
[58] V. Khoromskaia. Multilevel Tucker Approximation of 3D Tensors. 2010, in
progress.
[59] B. N. Khoromskij. Structured Data-sparse Approximation to High Order Ten-
sors arising from the Deterministic Boltzmann equation. Math. Comp. 76
(2007), pp. 1292-1315.
151
Bibliography
[60] B. N. Khoromskij. Structured Rank-(r1, ..., rd)Decomposition of Function-
related Tensors in Rd.Comp. Meth. in Applied Math., 6(2006), 2, 194-220.
[61] B. N. Khoromskij. Fast and Accurate Tensor Approximation of Multivariate
Convolution with Linear Scaling in Dimension. J. Comp. Appl. Math., 2010,
DOI:10.1016/j.cam.2010.02.004, to appear.
[62] B. N. Khoromskij. On Tensor Approximation of Green Iterations for Kohn-
Sham equations. Computing and Visualisation in Science, (2008) 11:259-271.
[63] B.N. Khoromskij. Tensor-Structured Preconditioners and Approximate In-
verse of Elliptic Operators in Rd.J. Constructive Approximation, 30:599-620
(2009).
[64] B.N. Khoromskij. O(dlog n)-Quantics Approximation of n-dTensors in
High-Dimensional Numerical Modeling. Preprint MPI MIS 55/2009, Leipzig
2009; J. Constructive Approximation, accepted.
[65] B.N. Khoromskij. An Introduction to Structured Tensor-Product Represen-
tation of Discrete Nonlocal Operators. Lecture Notes 27, MPI MIS, Leipzig
2005.
[66] B.N. Khoromskij. Tensors-structured Numerical Methods in Scientific Com-
puting: Survey on Recent Advances. Preprint MPI MIS 21/2010, Leipzig 2010
(submitted).
[67] B. N. Khoromskij and V. Khoromskaia. Low Rank Tucker Tensor Approxi-
mation to the Classical Potentials. Central European J. of Math., 5(3) 2007,
1-28.
[68] B.N. Khoromskij and V. Khoromskaia. Multigrid Tensor Approximation of
Function Related Arrays. SIAM J. on Scient. Computing, 31(4), 3002-3026
(2009).
[69] B. N. Khoromskij, V. Khoromskaia, S. R. Chinnamsetty, H.-J. Flad. Tensor
Decomposition in Electronic Structure Calculations on 3D Cartesian Grids.
J. of Comput. Physics, 228 (2009) 5749-5762.
[70] B.N. Khoromskij, V. Khoromskaia and H.-J. Flad. Numerical solution of
the Hartree-Fock equation in multilevel tensor-structured format. SIAM J. on
Scient. Computing, 33(1), 45-65 (2011).
152
Bibliography
[71] B.N. Khoromskij, and I. Oseledets. Quantics-TT approximation of elliptic
solution operators in higher dimensions. Preprint MPI MiS 79/2009, Leipzig
2009, submitted.
[72] B.N. Khoromskij, and Ch. Schwab. Tensor Approximation of Multi-
parametric Elliptic Problems in SPDEs. Preprint MPI MiS 9/2010, Leipzig
2010, SISC, accepted.
[73] T. Kolda. Orthogonal tensor decompositions. SIAM J. Matrix Anal. Appl. 23
(2001) 243-255.
[74] T. G. Kolda and B. W. Bader. Tensor Decompositions and Applications.
SIAM Rev, 51(2009), no. 3, pp.455-500.
[75] D. Kressner and C. Tobler. Krylov subspace methods for linear systems with
tensor product structure. SIAM J. Matrix Anal. Appl., 31(4):1688-1714, 2010.
[76] P.M. Kroonenberg and J. De Leeuw. Principal component analysis of three-
mode data by means of alternating least squares algorithms. Psychometrika,
45 (1980) 69-97.
[77] J.B. Kruskal. Three-way arrays: rank and uniqueness of trilinear decom-
positions, with applications to arithmetic complexity and statistics. Linear
Algebra Appl., 18 (1977) 95-138.
[78] C. Le Bris. Computational chemistry from the perspective of numerical anal-
ysis. Acta Numerica (2005), 363 - 444.
[79] Ch. Lubich. On Variational Approximations in Quantum Molecular Dynam-
ics. Math. Comp. 74 (2005), 765-779.
[80] Ch. Lubich. From quantum to calssical molecular dynamics: reduced models
and numerical analysis. Zurich Lectures in Advanced Mathematics, EMS,
2008.
[81] F.R. Manby, P.J. Knowles and A.W. Lloyd. The Poisson equation in density
fitting for the Kohn-Sham Coulomb problem. J. Chem. Phys. 115 (2001),
9144-9148.
[82] K.K. Naraparaju and J. Schneider. Generalized Cross Approximation for 3d-
tensors. Preprint 29/2010 MPI MIS, Leipzig, 2010.
[83] I.V. Oseledets. Compact matrix form of the d-dimensional tensor decompo-
sition. Preprint 09-01, INM RAS, Moscow 2009.
153
Bibliography
[84] I.V. Oseledets. Approximation of 2d×2dmatrices by using tensor decompo-
sition. SIAM J. Matrix Anal., 2009, accepted.
[85] I.V. Oseledets, D.V. Savostyanov, and E.E. Tyrtyshnikov. Tucker dimen-
sionality reduction of three-dimensional arrays in linear time. SIMAX, 30(3),
939-956 (2008).
[86] I. Oseledets, D. Savostyanov, E. E. Tyrtyshnikov. Linear algebra for tensor
problems.. Computing, 85 (2009), no. 3, 169-188.
[87] I. Oseledets and E. E. Tyrtyshnikov. Breaking the curse of dimensionality
or how to use SVD in many dimensions. SIAM J. Scient. Computing, 31
(2009), no.5, 3744-3759.
[88] I. Oseledets and E. E. Tyrtyshnikov. TT-cross approximation for multidi-
mensional arrays. Linear Algebra Appl. 432 (2010), no.1, 70-88.
[89] P. Pulay. Improved SCF convergence acceleration. J. Comput. Chem. 3, 556-
560 (1982).
[90] R. Polly, H.-J. Werner, F. R. Manby and P. J. Knowles. Fast Hartree-Fock
theory using density fitting approximations. Mol. Physics, 102:2311-2321,
2004.
[91] M. Reed, and B. Simon. Functional analysis. Academic Press, 1972.
[92] B. Savas, L.-H. Lim. Quasi-Newton methods on Grassmanians and multilin-
ear approximations of tensors. SIAM J. Scient. Computing, submitted.
[93] R. Schneider. Analysis of the projected coupled cluster method in electronic
structure calculation. Numer. Math. 113, (2009), no. 3, 433-471.
[94] R. Schneider. Multiscalen-Matrixkompression und Wavelet-Matrix-
kompression. Tuebner Verlag, 1998.
[95] R. Schneider, Th. Rohwedder, J. Blauert, A. Neelov. Direct minimization
for calculating invariant subspaces in density functional computations of the
electronic structure. Journal of Comp. Math. 27 (2009), no. 2-3, 360-387.
[96] J. Sielk, H. F. von Horsten, F. Kr¨uger, R. Schneider, B. Hartke. Quantum-
mechanical wavepacket propagation in a sparse, adaptive basis of interpolating
Gaussians with collocation. Physical Chemistry Chemical Physics, 2009, 11
pp.463-475.
[97] A. Smilde, R. Bro, P. Geladi. Multi-way Analysis. Wiley, 2004.
154
Bibliography
[98] F. Stenger. Numerical methods based on Sinc and analytic functions.
Springer-Verlag, Heidelberg, 1993.
[99] J. Strang and G.J. Fix. An Analysis of the Finite Element Method. Prentice-
Hall, inc. N. J., 1973.
[100] E. S¨uli, D. Mayers. An introduction to Numerical Analysis. Cambridge Uni-
versity Press, 2008.
[101] E. E. Tyrtyshnikov. A Brief Introduction to Numerical Analysis. Birkhauser,
Boston, 1997.
[102] E. E. Tyrtyshnikov. Mosaic-skeleton approximations. Calcolo 33, 47-57
(1996).
[103] E. E. Tyrtyshnikov. Incomplete cross approximation in the mosaic-skeleton
method. Computing 64, 367-380 (2000).
[104] E.E. Tyrtyshnikov. Tensor approximations of matrices generated by asymp-
totically smooth functions. Sbornik: Mathematics 194, No. 5-6 (2003), 941
954 (translated from Mat. Sb. 194, No. 6 (2003), 146–160).
[105] E.E. Tyrtyshnikov. Kronecker-product approximations for some function-
related matrices. Linear Algebra Appl. 379 (2004), 423–437.
[106] L.R. Tucker. Some mathematical notes on three-mode factor analysis. Psy-
chometrika 31 (1966) 279-311.
[107] J. VandeVondele, M. Krack, F. Mohamed, M. Parinello, Th. Chassaing,
J. Hutter. QUICKSTEP: Fast and accurate density functional calculations
using a mixed gaussian and plane waves approach. Comp. Phys. Comm.,
167(2005) 103-128.
[108] H.-J. Werner, P.J. Knowles et al. MOLPRO, version 2002.10, a package of
ab initio programs for electronic structure calculations.
[109] T. Yanai, G. Fann, Z. Gan, R. Harrison and G. Beylkin. Multiresolution
quantum chemistry: Hartree-Fock exchange. J. Chem. Phys. 121 (14) (2004)
6680-6688.
[110] H. Yserentant. The hyperbolic cross space approximation of electronic wave-
functions. Numer. Math., 105 (2007) 659-690.
[111] H. Yserentant. Regularity and Approximability of Electronic Wave Func-
tions. Lecture Notes in Mathematics series, Springer-Verlag, 2010, to appear.
155
Bibliography
[112] T. Zhang and G. Golub. Rank-0ne approximation to high order tensors.
SIAM J. Matrix Anal. Appl. v. 23 (2001) 534-550.
156
Bibliography
Daten zum Autor:
Name Venera Khoromskaia
geboren 25. August 1952, in Stadt Kazan, Russland
1969-1974 staatliche Universit¨at Kazan, Physik Fakult¨at.
Abschluss: Dipl.- Phys.
1974 -1995 Joint Institute for Nuclear Research, Dubna, Russland
wiss. Mitarbeiter
1995 -2005 Aufenthalt in Deutschland (ohne Arbeitserlaubnis)
2001, 2002, 2003 wiss. Stipendien an der Universit¨at Leipzig,
Institut f¨ur Informatik
2006 - Max-Planck-Institut f¨ur Mathematik in den Naturwissenschaften
wiss. Mitarbeiter
Bibliographische Daten
Numerical Solution of the Hartree-Fock Equation by the Multilevel Tensor-structured
Methods.
(Numerische osung der Hartree-Fock-Gleichung mit der Multilevel Tensor-struktur-
Verfahren.)
Venera Khoromskaia
Dissertation
157 Seiten, 58 Abbildungen, 112 Referenzen
Hiermit erkl¨are ich, die vorliegende Dissertation selbst¨andig und ohne unzuassige
fremde Hilfe angefertigt zu haben. Ich habe keine anderen als die angef¨uhrten
Quellen und Hilfsmittel benutzt und amtliche Textstellen, die ortlich oder sinn-
gem¨aß aus ver¨offentlichten oder unver¨offentlichten Schriften entnommen wurden,
und alle Angaben, die auf m¨undlichen Ausk¨unften beruhen, als solche kenntlich
gemacht. Ebenfalls sind alle von anderen Personen bereitgestellten Materialien
oder erbrachten Dienstleistungen als solche gekennzeichnet.
Leipzig, 21 September, 2010
............................................
(Venera Khoromskaia)
157