scieee Science in your language
[en] (orig)
Development and Application of the
S/PHI/nX Library:
First-principles Calculations of
Thermodynamic Properties
of III-V Semiconductors
by
Sixten Boeck
Development and Application of the S/PHI/nX Library:
First-principles Calculations of Thermodynamic Properties
of III-V Semiconductors
Dissertation
zur Erlangung des akademischen Grades
Doktor der Naturwissenschaften (Dr. rer. nat.)
vorgelegt dem
Department Physik der Fakultät für Naturwissenschaften
an der Universtiät Paderborn
Sixten Boeck
Promotionskommission
Vorsitzender Prof. Dr. phil. Klaus Lischka
Gutachter Prof. Dr. rer. nat. Wolf Gero Schmidt
Gutachter Prof. Dr. rer. nat. Jörg Neugebauer
Beisitzer Dr. rer. nat. Thomas Hangleiter
Tag der Einreichung 30.6.2009
Tag der mündlichen Prüfung 3.9.2009
Summary
Computer simulations are becoming increasingly important for developing new materials. This process is
largely triggered by advances in our physical and mathematical understanding of materials and the recent
progress in computer architectures. An important contribution of computational physics to material research
and design is the development of highly optimized methods to accurately model material properties. The
description of many material properties requires a seamless consideration of various length and time scales.
Therefore, a new family of scale-bridging algorithms (multi scale) and algorithms combining various physics
disciplines (multi physics) is currently in the focus of method development. Due to the huge computa-
tional demand of such methods and the tremendous running costs of high performance computing facilities,
increasing the performance of the applied algorithms is critical. Due to recent advances in computer tech-
nology the development/optimization of novel algorithms becomes increasingly challenging. It requires a
thorough knowledge of physics and numerics as well as state-of-the-art computer science. The gap which
opens between physics and computer science creates a new interdisciplinary research field.
The objective of this thesis was the development and implementation of a new physics meta-language which
simplifies the development of algorithms in computational materials design (CMD) significantly. (i) State-
of-the-art computer science techniques have been applied or developed in this work to provide language
elements to express algebraic expressions efficiently on modern computer platforms. (ii) Quantum mechanical
algorithms are crucial in CMD. The new meta-language supports the Dirac notation to implement such
algorithms in the native language of physicists. (iii) The language is completed by elements to express
equations of motions efficiently which is required for implementing structural algorithms such as molecular
dynamics.
A major goal of this work was to combine an intuitive algebra/physics programming interface with high run-
time performance. Therefore, a major challenge was to allow the compiler to “understand” the algebraic or
even quantum mechanical context. Only with this knowledge the compiler can generate machine code which
is (at least) as efficient as manually optimized code. This has been accomplished by deriving new techniques,
such as fully automatic BLAS/LAPACK function mapping, algebra type mapping, and the application of
sophisticated template techniques. Further details like memory management, efficiently exploiting the com-
puter’s level caches and arithmetic pipelines which had formerly to be addressed by physicists are in our
approach entirely shifted to the compiler. With the new technique of virtual templates the compiler can now
even detect the quantum mechanical context of Dirac elements. While Dirac projectors, scalar products with
metrics, Dirac operators, and Dirac vectors look syntactically very similar, this technique allows the compiler
to recognize these terms and generate the proper highly efficient function calls. With virtual templates an
interface which is strongly reminiscent to quantum mechanical textbooks could be provided. Equations of
motions can be intuitively expressed exploiting transformation pipelines which we developed in this work.
In order to demonstrate the power of the this approach the full-featured plane-wave framework S/PHI/nX
has been developed based on the new meta-language. The S/PHI/nX source code is remarkably short and
transparent which simplifies code maintenance and the introduction of new sophisticated algorithms. The
intuitive interface allows for a drastic reduction of the workload when implementing new CMD algorithms.
Various benchmarks which have been conducted in this study compare S/PHI/nX with other state-of-the-art
plane-wave packages with respect to runtime performance and accuracy. The obtained results indicate that
the highly abstract S/PHI/nX approach yields a very high optimization level.
Since the computation of thermodynamic properties from first-principles requires very high accuracy and is
computationally very demanding, computing these properties for a wide range of technologically important
semiconductors provided a perfect benchmark to demonstrate the efficiency of S/PHI/nX. Based on these
calculations we verified the general trends of phonon spectra, the location and amplitudes of the thermal
anomalies of these systems. We compared our LDA and PBE data with the experiment and confirmed LDA
to be a reliable basis for computing these properties for the class of III-V semiconductors in the zincblende
phase.
With this work the new simulation package S/PHI/nX will be introduced which has been already applied
successfully to a broad spectrum of systems, ranging from bio-inspired materials to metallic surfaces. The
modular approach allows for a simple extension of S/PHI/nX with novel methods in future versions.
4
Contents
1 Theory 17
1.1 The Many-body problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.1.1 Born-Oppenheimer approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.1.2 Electron-electron interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.2 Density functional theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.2.1 Kohn-Sham formalism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.2.2 Kohn-Sham equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
1.2.3 XCfunctional......................................... 24
1.3 Periodic boundary conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.4 Integration over the Brillouin zone . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.5 Valence/core partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
1.5.1 Pseudo-potential theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
1.5.2 All-electron approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
1.6 Tight-binding methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
1.7 Forces in ionic systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
1.7.1 Hellmann-Feynman theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
1.8 Conclusions.............................................. 42
2 Methods 43
2.1 Electronic minimization schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.1.1 Gradients δ
δ!Ψ|....................................... 44
2.1.2 Search direction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.1.3 Preconditioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.1.4 Conjugate gradient methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.2 Structural properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5
CONTENTS CONTENTS
2.2.1 QuasiNewton......................................... 54
2.2.2 Molecular dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.3 Deriving thermodynamic properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.3.1 Free energy surface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.3.2 Born-effectivecharges .................................... 58
3 S/PHI/nX 61
3.1 Basis-set independent implementations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.1.1 Matrix notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.1.2 Bra-Ket notation....................................... 64
3.1.3 Programming languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.1.4 BLAS/LAPACK interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
3.2 The Dirac notation in S/PHI/nX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.2.1 Conventional approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.2.2 Modular approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.2.3 Object-oriented approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
3.2.4 Example............................................ 96
3.3 ClassHierarchy............................................ 96
3.3.1 Electronic minimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
3.3.2 Representing atomic structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
3.3.3 Add-ons............................................110
3.4 Comparison with VASP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
3.5 Conclusions..............................................114
4 Applications 117
4.1 Introduction..............................................117
4.2 Thermodynamic properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
4.2.1 Convergence aspects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
4.3 Comparison with experiment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
4.3.1 Bulk properties at T=0 K . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
4.3.2 Phonon spectra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
4.3.3 Thermal expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
4.3.4 Heatcapacity.........................................137
4.3.5 Conclusions..........................................140
6
CONTENTS CONTENTS
5 Conclusions and Outlook 143
A Computational details 149
A.1 Pseudo potentials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
A.2 Convergence parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
Bibliography 151
7
CONTENTS CONTENTS
8
Nomenclature
Units
If not specified all units are given in atomic units, i.e., m=e=c=1.
Symbols
The following table contains the symbols used in this work. In general, we use hat (ˆ) for denoting operators
and tilde (˜) for screened or softened entities. Vectors and matrices are indicated with bold letters.
Symbol Description
AsArea of the surface
ciσk(G)Vector representation of plane-wave expansion coefficients
CiG(σk)Matrix representation of plane-wave expansion coefficients
Etot,EH,Ekin Total energy and its contributions: Hartree energy, kinetic energy
Eps
loc,E
ps
nl ,E
xc local and non-local pseudo potential energy, exchange-correlation energy
εiσkOne-particle energy
#xc,#hom
xc exchange-correlation energy (of the homogeneous electron gas) per particle
FHelmholtz free energy
Fi,FHF
i,FPulay
iForce, Hellmann-Feynman force, Pulay force acing on atom i
φis,ia(G)Form factor of atom iabelonging to species is
focc
iσkOccupation numbers of the state (i, σ,k).
GGibbs free energy
"G|Plane wave basis (Gspace), used to represent %(G)and veff(G)
"G+k|k-point dependent plane wave basis, used to represent |Ψiσk#
γsSurface energy
HfHeat of formation
ˆ
HHamilton operator
9
CONTENTS CONTENTS
Symbol Description
iBloch index
˜
iWave function dependent index mapper, e.g. ˜
i|˜
Ψ"$→ (iσk)or ˜
i|µ"$→ (isinnlm)
is,iaIndex of species, index of atom
kkpoint
kBBoltzmann constant
ˆ
LLaplace operator
langular momentum quantum number
mmagnetic momentum quantum number
µiChemical potential of species i
|µisianlm#Atomic orbital µbelonging to the species is, the atom ia
with the main quantum number n, the angular momentum l,
and the magnetic projection m
Nis,n
ion Number of particles of species is,number of ions
n, npw number of states, number of plane-waves
nel,nat number of electrons, number of atoms
nDoF Number of degrees of freedom.
Volume of the unit cell, =a1(a2×a3)=|A|
RResidual vector: difference between the solution x0and the current x
RRealspace vector
Rlat Lattice vector
"R|Realspace, sampled on FFT grid
"r|Radial grid, sampled on logarithmic mesh
riCoordinate of the ith electron
rGaussian Radius of the artificial Gaussian screening charge
ˆ%Density matrix
%(R),%(G)Electronic charge density in real or reciprocal space
%Gauss Artificial Gaussian screening charge
˜%HScreened Hartree density ˜%H=%%Gauss.
%ps Pseudo atomic charge density
SEntropy
ˆ
SSymmetrization operator
Sis(G)Structure factor of the species is
σSpin quantum number
10
CONTENTS CONTENTS
Symbol Description
pPressure
|Ψ#,|ΨMB#single-particle wave function, many-body wave function
|ΨBO#Born-Oppenheimer wave function; electrons decoupled from the ionic movement
{|Ψ#}Set of all wave functions of the system
|˜
Ψiσk#Bloch wave function with the band index i, the spin channel σon the kpoint
τisiaAtomic coordinates of the atom iabelonging to the species is
TTemperature
ˆ
T, ˆ
TsKinetic energy operator, single-particle kinetic operator
ˆ
Te,ˆ
TIkinetic energy operator of electronic/ionic subsystem
ˆ
TisiaTranslation operator. Move to coordinates of the atom iabelonging to species is.
ups
ll-component of the radial pseudo wave function
veff(R),v
eff(G)Effective potential in real or reciprocal space
ˆveI Potential describing core electrons in Coulomb potentials of nuclei
˜veI Replacing Coulomb potential by pseudo potential yields softened veI
ˆvII repulsive ion-ion Coulomb interaction
ˆvee repulsive electron-electron Coulomb interaction
ˆvHHartree potential
ˆvps
loc,ˆvps
nl Local pseudo potential, non-local pseudo potential
ˆvXC exchange-correlation potential
|ξ#Gradient of the Hamiltonian, |ξ#=ˆ
H|Ψ#
ylm Spherical harmonics with angular momentum land magnetic projection m
Ylm Real spherical harmonics (linear combinations of ylm).
Also known as Cartesian spherical harmonics
zval
is,Z
isValence charge, ionic charge
11
CONTENTS CONTENTS
Abbreviations
AE All-electron
BLAS Basic Linear Algebra Subprograms
c.c. complex conjugate
CCG complex conjugate gradient
CI Configuration interaction
DFT Density functional theory
FFT Fast Fourier transformation
FP-LAPW Full potential linear augmented plane waves
GGA Generalized gradient approximation
HF Hartree Fock
LAPACK Linear Algebra Package
LAPW Linear augmented plane waves
LDA Local density approximation
lo local orbitals
TB Tight-Binding
DFTB Density Functional based Tight Binding
PAW Projector augmented waves
PS Pseudo
XC Exchange-correlation
12
Introduction
The process of developing new materials is still mainly an empirical approach nowadays [1]. Typically, the
materials are first being processed, then the structure and properties are investigated. Material properties
are tuned by iterating this procedure with varying processing parameters. This approach is often very time-
consuming and expensive. Due to advances in the physical and mathematical understanding of materials
in combination with recent progresses in computer architectures, computer simulations are becoming a key
technology to support future materials research and design. Since the computational approach allows not
only to save costs but also to decrease the time to introduce new materials to the market, the new area
of computational materials design (CMD) becomes an increasingly interesting research field in science and
engineering. To address the fundamental questions of the emerging discipline great challenges in physics,
mathematics/numerics, and computer sciences have to be addressed to improve efficiency and reliability and
to enhance the predictive power of computations.
A wide range of methods and tools have been developed recently focusing on various single length and time
scales. At the microscopic scale Density Functional Theory [2, 3] (DFT) has been proven to be reliable and
efficient in predicting material properties. With modern computers and optimized DFT program packages it
is possible to simulate systems consisting of 102to 103atoms. The theoretical description of larger systems
requires further approximations, such as (semi-) empirical potentials [4]. For example, (Density Functional
based) Tight-Binding (DF)TB [5, 6, 7] can treat systems up to 104(DFTB) and even 107(TB) atoms. In
contrast to the above mentioned quantum-mechanical potentials also classical potentials, such as force fields
(MM1 [8], MM2 [9], MM3 [10, 11, 12], MM4 [13, 14, 15, 16], AMBER [17, 18], CHARMM [19]), are often
applied in particular for the simulation of biological systems.
Although these methods have been applied very successfully, there is a constant need for improving the
existing algorithms. Increasing the performance as well as improving accuracy are always focal points in CMD
method development. While in the past years highly optimized methods in single scales have been developed,
the description of many material properties requires the consideration of various length- and time scales.
Therefore, a new family of scale-bridging methods (e.g. multi-grids [20, 21], wavelets [22], heterogeneous
multi-scale method [23], adaptive model refinement, coarse-grained simulations [24]) is currently in the focus
of method development. Since computational materials design becomes also industrially applicable, it can
be assumed that the improvement of existing algorithms and the connection of algorithms across the scales
(multi-scale) as well as the combination of various physics disciplines (multi-physics) will become a more
important research field in computational physics.
Besides the algorithmic work, method and code development are also critical elements of CMD. The computer
architectures on which the program packages run have evolved significantly over the past years. Most of the
applied methods in CMD (such as DFT) are computational very demanding and require program packages
to be executed on high performance computers. Due to the remarkable running costs of such compute
13
CONTENTS CONTENTS
facilities, the optimization of CMD programs is crucial. The complexity of algorithmic optimization becomes
clear considering that implementing algorithms in DFT in high level toolkits (such as Mathematica) can
be accomplished within days or weeks while the development of an optimized CMD package applicable
to realistic systems creates workloads of years [25, 26, 27, 28, 29, 30, 31, 32, 33, 34]! Since there is no
approach available which is able to bridge the constantly broadening gap between computer science and
computational physics, codes rarely reach peak performance. Method development in the field of high
performance computing (HPC) requires thorough knowledge in both computer science and physics which
opens a new inter-disciplinary field of research. Without such interdisciplinary work no modern simulation
code can be developed nowadays.
In the scope of this work we contribute to closing the gap between computational physics and computer
science. In our approach the physicists should be “screened” from the complex details of HPC. Therefore,
a new physics programming meta-language has been developed which provides elements addressing the
above mentioned requirements of CMD. (i) State-of -the-art computer science techniques have been applied
or developed in this work to provide language elements for expressing algebraic expressions efficiently on
modern high-performance computer architectures. (ii) In order to address the development of quantum
mechanical algorithms which are crucial in CMD, the new meta-language supports the Dirac-formalism [35].
(iii) The approach is completed by an efficient way of expressing equations of motions to express structural
algorithms such as molecular dynamics.
In this work we derive the details of the above mentioned meta-language. Based on this concept a new
framework called S/PHI/nX has been developed in which the implementation of CMD algorithms becomes
significantly simpler than in conventional approaches. We employ state-of -the-art methods from computer
science to ensure peak performance of each language element. Within our framework physicists with only
rudimentary programming skills are able to develop efficient CMD simulation programs.
As proof of concept we demonstrate the power of this approach by using the new framework for developing
an efficient plane-wave DFT program package. It will be shown that the program is remarkably short with
respect to the number of code lines which simplifies understanding and code maintenance. Since in this work
we limit ourselves to deriving the concept, our program employs in the current version only norm-conserving
pseudo potentials [36] and minimization methods optimized for semiconductors. Of course, the program
package can easily be expanded by, e.g., modern basis-sets and more efficient minimizers.
The efficiency of the framework will be shown by means of calculations of realistic systems, namely the class
of III-V semiconductors. Here the point of interest lies in both the execution speed as well as the obtained
accuracy. A good test that requires very well converged parameters is the computation of thermodynamic
properties like the heat capacity or the linear expansion coefficients from first-principles. In Ref. [37] the
authors demonstrate how sensitive thermodynamic properties are with respect to the quality of the obtained
forces from DFT calculations. Therefore, a study of these sensitive properties is an excellent test for the
new framework.
From the physical and technological point of view the III-V semiconductors are most important. Electronic
and opto-electronic devices, including blue and UV lasers, are made of these compounds. It is essential to
understand their thermodynamic behavior during the growth process as well as their operation. Therefore,
in this work we investigate the temperature dependencies of the most important thermodynamic properties.
III-V semiconductors can crystallize in the wurzite and zincblende phase. For many cubic III-V semiconduc-
tors the experimentally accessible data on these properties scatter significantly and reliable data obtained
14
CONTENTS CONTENTS
from first-principles are often not available. We, therefore, investigate the thermodynamic behavior of III-V
semiconductors in the zincblende phase.
When computing properties within DFT a big degree of uncertainty arises from the exchange-correlation
(XC) potential. The influence of XC is strongly material and system dependent. For cubic III-V semi-
conductors, however, there is no investigation clarifying the influence of the choice of the XC potential to
the obtained thermodynamic properties. Thus, we compute the most important thermodynamic properties
using both the local density approximation (LDA [38]) as well as the generalized-gradient approximation
(GGA-PBE [39]).
The tetraedrically bound III-V semiconductors in the zincblende phase show a thermal anomaly at low
temperatures. Up to a critical temperature the materials contract at rising temperatures. Only above the
critical temperature these materials expand at increasing temperatures. While the underlying mechanism is
essentially understood (e.g. Ref. [40]) the exact location of the critical point as well as the pronunciation of
the anomaly is for some III-V semiconductors controversial. For example, in Ref. [41] GaP was reported not
exhibiting the anomaly while according to Refs. [42] and [43] a tiny anomaly effect at low temperatures was
observed. There is no ab-initio study available for this material in the cubic phase, neither is the influence
of the XC potential to the pronunciation of the theoretically obtained anomaly investigated.
Reliable data of the heat capacity of some III-V systems are very scattered. If theoretical data from first-
principles are available, they have been obtained only within LDA. We will extend the direct approach that
recently has been applied to unary metallic systems [37] to binary systems in order to compute the phonon
spectra, the temperature dependencies of the linear expansion coefficients as well as the heat capacities at
constant pressure and constant volume.
15
CONTENTS CONTENTS
16
Chapter 1
Theory
In this chapter we introduce the theoretical basis of this work and provide the required physical and math-
ematical/numerical formalism which is required to implement the efficient S/PHI/nX CMD library. On
modern computer platforms some approaches are capable of describing very accurately systems containing
only a few atoms while others can treat millions of atoms at a significantly lower accuracy [5, 6, 7]. Since
a mixture or combination of various methods is often cumbersome, the development of a generalized frame-
work is a major objective of this work. In the following sections our approach will be illustrated by means
of a Hamiltonian employing norm-conserving pseudo potentials [36] as proof of concept. The wave func-
tions will be represented in a plane-wave basis-set. Therefore, in this chapter the corresponding theoretical
descriptions of the plane-wave basis as well as norm-conserving pseudo potentials are discussed. For the
design of a general framework it is, however, important that the concept can be easily extended to describe
alternative valence/core partitioning. We, therefore, briefly sketch various alternative methods ranging from
APW [44] to FP-LAPW+lo [45, 46] and PAW [31]. Although they will not be implemented in the scope of
this work, the framework will prepare interfaces and concepts which allow a straightforward implementation
of these methods. Besides an accurate but computationally demanding ab-initio description of systems,
the framework should be applicable for larger systems. Here, the number of atoms that can be computed
in reasonable times can be traded with lower accuracy using, e.g., (semi-)empirical methods. In the end
of this chapter tight-binding methods, structural optimization schemes, and molecular dynamics are briefly
sketched.
1.1 The Many-body problem
Calculations on the atomic scale focus on the description of atoms, molecules, and crystals. In quantum
mechanics a system of nel electrons and NIions can be represented by a Hamilton operator ˆ
H
ˆ
H=ˆ
Te+ˆ
TIvee veI vII (1.1)
with ˆ
Teand ˆ
TIbeing the kinetic energy of the electronic and ionic subsystem, respectively. The potential
contribution ˆvee and ˆvII denote the repulsive electron-electron and ion-ion Coulomb interactions, respectively.
The remaining potential contribution ˆveI is the attractive Coulomb interaction between the nel electrons and
NIions. With MIbeing the mass of the Ith ion and ZIthe corresponding atomic number the contributions
17
1.1. THE MANY-BODY PROBLEM CHAPTER 1. THEORY
to the Hamiltonian become1
ˆ
H=
nel
!
i
1
2i
NI
!
I
1
2MI
I+
nel
!
i
nel
!
j>i
1
|rirj|
nel
!
i
NI
!
I
ZI
|riτI|+
NI
!
I
NI
!
J>I
ZIZJ
|τIτJ|.(1.2)
The symbol rirefers to the position of the i-th electron whereas the location of the I-th ion is specified by
τI.
The time-independent Schrödinger [47] equation provides a quantum mechanical description of the system
ˆ
H|ΨMB#=E|ΨMB#.(1.3)
Here |ΨMB#denotes the many-body wave function of the nuclei and the electrons. The symbol Eis the total
energy of the system. Solving the Schrödinger equation (1.3) gives access to all (static) properties of the
many-body problem.
The variational principle With the Rayleigh-Ritz variational principle [48, 49] a basis for numerical
solutions of the Schrödinger equation (1.3) is available. According to this principle, the expectation value
of a Hamiltonian calculated for any trail wave function Ψtrial is always greater or equal to the ground state
energy
E0E[ΨMB]="ΨMB|ˆ
H|ΨMB#
"ΨMB|ΨMB#(1.4)
or E0= min
{ΨMB}E[ΨMB].(1.5)
With the variational principle the Schrödinger equation (1.3) can be reformulated
δ""ΨMB|ˆ
H|ΨMB#−E"ΨMB|ΨMB##=0.(1.6)
This equation gives rise to a way of solving the Schrödinger equation iteratively. The total energy Eis a
Lagrange multiplier connected to the normalization of the solution ΨMB.
1.1.1 Born-Oppenheimer approximation
For realistic systems the Schrödinger equation (1.3) has 1023 degrees of freedom and all particles interact
with each other according to Eq. (1.2). The complexity of the quantum mechanical many-body problem
can be reduced by separating the electronic motion from that of the ions. Such an approximation can be
justified with the relatively huge mass of the ions compared to that of the electrons (me
MI(1): The neglected
energy contributions are by m
MIand (m
MI)1
2smaller than the electronic energy (see e.g. Ref. [45]) and are,
therefore, much smaller than the distances between electronic levels. Approximately, the electrons follow
the (slower) ions adiabatically. Therefore, this approximation is called adiabatic approximation, also known
as Born-Oppenheimer (BO) approximation2[51].
1Unless noted otherwise, we use atomic units throughout this work, i.e., me=c=!=e2
4π"0=1.
2The Born-Oppenheimer approximation fails, however, for systems where the nuclei movement is relatively fast with respect
to the electronic system [50], e.g., highly excited rotational states of molecules. Here the fast molecular movements do not allow
the electronic system to follow adiabatically!
18
CHAPTER 1. THEORY 1.1. THE MANY-BODY PROBLEM
By keeping the positions of the nuclei τ=(τ1,...,τNI)fixed, the equations of motion of the electrons at
r=(r1, . . . , rn)decouple from those of the nuclei. In this separation ansatz the nuclei merely modulate the
electronic wave functions and the total wave function can be expressed as
ΨMB(r,τ) = ΨBO
I(τ)ΨBO
e(r,τ).(1.7)
ΨBO
erefers to the solution of the Schrödinger equation with neglected kinetic energy of the nuclei ˆ
TI.ΨBO
I
is the wave function of the ions. The total energy values E({τI})at a fixed set of ionic positions form the
so-called Born-Oppenheimer surface.
1.1.2 Electron-electron interaction
Even after applying the Born-Oppenheimer approximation the quantum mechanical many-body problem
is still far too complex, mainly due to the remaining electron-electron interaction determined by the Pauli
principle and the Coulomb interaction between the electrons.
The first approach to cope with this difficulty was suggested by Hartree [52]. Here, the many-body wave
function |ΨMB#is approximated by a product of single-particle wave functions |ψi#.Each single particle
is moving in an averaged self-consistent potential of all single particles. |ψi#satisfies the single-particle
Schrödinger equation. The Hartree approach does not incorporate the Pauli principle. In order to do so in
the Hartree-Fock approach [53, 54] the wave function is approximated by an anti-symmetrized product
of northonormal spin orbitals ψHF
i(r).Each orbital is constructed from a spatial orbital φi(r)and a spin
function χ(σ).The Hartree-Fock wave function ΨHF
iis constructed from a Slater determinant of the ψHF
i
according to
ΨHF
i=1
n!|ψHF
i|ψHF
i=φi(r)χ(σ)(1.8)
with the orthonormality constraint
"ψHF
i|ψHF
j#=δij.
The Hartree-Fock equations read
$1
2δ(rr#)+vH(r)δ(rr#)+veI(r)δ(rr#)+vX(r,r#)%ψHF
idr#=!
i$=j
εijψHF
i(r).
The electron-electron interaction is referred to as the Hartree term vH(x).The new non-local term vX(r,r#)
is called exchange. It describes the energy gain due to the anti-symmetrization of the wave function when
two electrons with originally equal spin σreduce their Coulomb energy by flipping the spin of one electron
and occupying the same orbital. The Hartree-Fock equations can be solved in a self-consistent field (SCF)
calculation.
By combining Slater determinants (so-called “multi configuration”) the above picture can be improved. In
the configuration interaction [55] (CI) such linear combinations of Slater determinants of many config-
urations are used to approximate the many-body wave function. CI corrects the lack of correlation effects
in the Hartree-Fock approach and leads theoretically to an exact many-body wavefunction. However, the
computational effort in CI scales exponentially with the system size. Hence, CI can be presently applied to
very small systems only (20 atoms) [1].
In the Thomas-Fermi model [56, 57, 58] statistical considerations lead to an approximation of the electron
19
1.2. DENSITY FUNCTIONAL THEORY CHAPTER 1. THEORY
distribution in an atom. The first assumption is that the electrons are uniformly distributed in the phase
space per h3volume box with hbeing the Planck constant. The second assumption is the existence of an
effective potential which is determined by the charge of the nuclei and the electrons. From these assumptions
an energy expression depending only of the electron charge density can be derived. The kinetic energy is
being described very poorly in this model. The Thomas-Fermi model can be considered as the origin of
Density Functional Theory.
1.2 Density functional theory
In this section we provide a brief overview about Density Functional Theory (DFT) which is one of the main
focal points of this work. By providing two fundamental theorems Hohenberg and Kohn [2] accomplished to
form an exact theory that can be used for numerical calculations of realistic systems. The first theorem shows
that instead of the complex wave function the much simpler charge density can act as the key entity when
calculating the electronic ground state. The second theorem provides a way how an actual minimization
scheme can be realized. These two theorems are the foundation of DFT.
1.2.1 Kohn-Sham formalism
First Hohenberg Kohn theorem Both the ground state energy E0as well as the many body wave
function ΨMB of an electronic system which is specified by an Hamilton operator as in Eq. (1.2) can be
obtained by minimizing [59] the energy functional E[Ψ]
E0E[ΨMB]="ΨMB|ˆ
H|ΨMB#
"ΨMB|ΨMB#.(1.9)
The electronic system is, of course, determined by the underlying atomic structure given by an external
potential v(R)as well as the number of electrons n. Hence, the two entities v(R)and nalone determine
entirely the Hamiltonian of a system and thus, the electronic ground state energy.
The previous statement can be formulated even stronger because both v(R)and nare given by the density
(and a trivial constant c)
v=v[%(R)] + c, (1.10)
n[%(R)] = %(R)dR:(1.11)
The external potential - and hence the energy and the ground state wave function of an electronic system -
is entirely determined, within an additive constant3, by the electron density %(R)[50].
This theorem is the first Hohenberg Kohn theorem.
3The here involved constant is usually chosen such that vvanishes at R→∞.
20
CHAPTER 1. THEORY 1.2. DENSITY FUNCTIONAL THEORY
Second Hohenberg Kohn theorem The energy Evfor a given external potential v(R)can be written
as
Ev[%]= T[%]+vee[%]
& '( )
+
FHK[%]
veI[%](1.12)
=FHK[%]+ %(R)v(R)dR.(1.13)
The introduced term FHK is called Hohenberg-Kohn functional and will be discussed in detail below. In
analogy to the variational principle for wave functions (Eq. 1.4) the second Hohenberg-Kohn theorem
provides an energy variational principle for densities:
The functional Ev[%]of Eq. (1.12) becomes minimal at the correct ground state density. That minimal value
corresponds to the ground state energy value E0[50].
That means that for any (positive) trial density %trial,which fulfills %trial(R)dR=n, the inequality
E0Ev[%trial](1.14)
holds.
According to the second Hohenberg-Kohn theorem the problem of determining the ground state entities E0,
%0,and Ψ0for a given external potential is nothing but minimizing the energy functional Ev[%]of Eq. (1.12)
with respect to the density. With the constraint of keeping the number of electrons constant %(R)dR=n
the following equation has to be minimized
δ$Ev[%]µ(%(R)dRn)%=0,(1.15)
which gives with Eq. (1.12)
µ=δEv[%(R)]
δ%(R)=v(R)+δFHK[%]
δ%(R).(1.16)
The Lagrange multiplier µwhich ensures the previously mentioned electron conservation can be identified
with the chemical potential of the electrons.
1.2.2 Kohn-Sham equations
In order to apply the DFT formalism to realistic systems an explicit form of the Hohenberg-Kohn density
functional FHK[%]is required. An approximation which obtains rather accurate results has been introduced
in the Kohn-Sham method [3] which shall be briefly discussed in this section.
Introduction of Kohn-Sham orbitals Kohn and Sham suggested an indirect method to compute the
unknown functional FHK[%]. They introduced Kohn-Sham orbitals Ψisuch that the kinetic energy T[%]
can be determined to a reasonable accuracy leaving a small residual correction which is handled separately.
Therefore, an auxiliary system of non-interacting particles was introduced with the single-particle kinetic
energy Tsand the local single-particle potential vssuch that the single-particle ground state densities of the
interacting and non-interacting systems are equal. The functional FHK[%]can then be expressed as follows
21
1.2. DENSITY FUNCTIONAL THEORY CHAPTER 1. THEORY
FHK[%]=T[%]+vee[%](1.17)
=Ts[%]+J[%] + (T[%]Ts[%]) + (vee[%]J[%])
& '( )
Exc[%]
(1.18)
=Ts[%]+J[%]+Exc[%].(1.19)
Here, J[%]symbolizes the classical electrostatic repulsion of the electrons. Exc is called the exchange-
correlation energy consisting of [50]
the difference between the kinetic energy and the single-particle kinetic energy (TTs)and
the non-classical part of electron-electron interaction (vee J).
Note that after rewriting the functional F[%]only the still unknown expression Exc[%]contains terms of the
interacting electron system whereas Tsand Jdescribe the system of non-interacting electrons.
The kinetic energy of the non-interacting system in terms of the nlowest single-particle orbitals is
Ts[%]=
n
!
i"Ψi|1
22|Ψi#.(1.20)
The single-particle ground state density is computed from
%(R)=
n
!
i|Ψi(R)|2.(1.21)
The above expressions for the kinetic energy and for the density hold only as long as the Kohn-Sham orbitals
fulfill the orthonormalization constraint
δij ="Ψi|Ψj#.(1.22)
The energy functional can now be rewritten in terms of the nKohn-Sham orbitals
E[%]=FHK[%]+ v(R)%(R)dR
=Ts[%]+J[%]+Exc[%]+ v(R)%(R)dR
=
n
!
i"Ψi|Ts|Ψi#+J[%]+Exc[%]+ v(R)%(R)dR.(1.23)
The Euler equation (1.16) belonging to the above energy function can be expressed as follows
µ=δE[%(R)]
δ%(R)=δTs[%]
δ%(R)+δJ[%]
δ%(R)+δExc[%]
δ%(R)+v(R)
&'( )
veff(R)
δJ[#]
δ#(R)=vH=#(R!)
|RR!|dR#
δExc[#]
δ#(R)=vxc
µ=δTs
δ%(R)+veff(R).(1.24)
veff(R)=vH+vxc +v(R)(1.25)
22
CHAPTER 1. THEORY 1.2. DENSITY FUNCTIONAL THEORY
Note that so far this equation is nothing but a rearrangement of the Euler equation (1.16).
Search of the energy minimum Based on the rewritten energy functional Eq. (1.23) it can be concluded
that the variational search of the minimum of E[%]can be performed in the space of the single particle Kohn-
Sham orbitals {Ψi}.Therefore, a functional of the Kohn-Sham orbitals can be defined including Lagrange
multipliers λij to enforce orthonormalization (Eq. (1.22))
[{Ψi}]=E[%]!
i!
j
λij ("Ψi|Ψj#−δij).
Finding the minimum of E[%]in the space of the Kohn-Sham orbitals implies
δ[{Ψi}]!
= 0 = δE[%]
δ%
&'()
Eq.(1.24)
δ%
δ"Ψi|
&'()
|Ψi#
!
j
λij|Ψi#(1.26)
=(Ts+veff)|Ψi#−!
j
λij|Ψi#(1.27)
=(Ts+veff)|Ψi#=!
j
λij|Ψi#.(1.28)
The λij represent a hermitian matrix which can be diagonalized by applying an uniform transformation
to the Kohn-Sham orbitals |Ψi#.The Hamiltonian as well as the density are invariant with respect to the
uniform transformation. Hence, the above equation can be reformulated as follows
ˆ
Heff|Ψ#
i#=(Ts+veff)|Ψ#
i#=εi|Ψ#
i#.(1.29)
By priming the Kohn-Sham orbitals it should be emphasized that the uniform transformation has been
applied to Ψi.The |Ψ##are now eigenfunctions of ˆ
Heff.In the following the prime will be omitted.
Solving the Kohn-Sham equations The ground state of the interacting system can be found by solving
the equations $1
2+veff(R)%Ψi=εiΨi,i=1,2, . . . , n. (1.30)
and
%(R)=
n
!
i=1 |Ψi(R)|2.(1.31)
As veffdepends on %(R)via Eq. (1.24), the equations (1.25), (1.30), and (1.31) have to be solved self-
consistently. That means that the non-interacting electrons are moving in the effective self-consistent field
of all electrons. Such an approach is called self-consistent field (SCF) calculation. These three equations
are known as the Kohn-Sham equations and build up the backbone of the Kohn-Sham density functional
23
1.2. DENSITY FUNCTIONAL THEORY CHAPTER 1. THEORY
theory
I: veff(R)=v(R)+vH(R)+vxc(R)
II : *1
2+veff(R)+Ψi=εiΨi
III : %(R)=,n
i=1 |Ψi|2.
(1.32)
Conclusion and interpretation of the Kohn-Sham equations The Kohn-Sham method transforms
the many-body problem of interacting electrons into an effective single-particle problem by introducing n
Kohn-Sham orbitals. In DFT the many-body problem is not specified by the complex many-body wave
function ΨMB(r1,r2,r3, . . . , rn)with 3ncoordinates any longer. Instead the electron density %with only
three spatial dimensions becomes the key entity.
In contrast to the Thomas-Fermi method the kinetic energy of the non-interacting system is correctly ob-
tained. In return the computational effort is increased because the single equation to obtain the total density
becomes a system of nequations which has to be solved. The Kohn-Sham equations are reminiscent to the
previously mentioned Hartree equations. A major advantage over the Hartree method is that veffprovides
a way to incorporate exchange-correlation effects. Solving the Kohn-Sham equations is computationally less
demanding than the Hartree-Fock equations, mainly due to the non-local Fock operator. Compared to CI
the Kohn-Sham equations are dramatically simpler to evaluate.
Up to now the functional Exc[%]in Eq. (1.23) is still undefined. Since Exc contains terms of the interacting
system it is clear that only approximate expressions for this term can be found. It is also obvious that the
way of such an approximation strongly depends on how the density varies in space and is hence, system
dependent. The following section is, therefore, dedicated to this issue.
1.2.3 XC functional
In order to specify the Kohn-Sham equations (1.32) an explicit form of the exchange correlation contribu-
tion is still missing. The search for approximations providing high accuracies of the exchange correlation
functional is up to today one of the greatest challenges in DFT.
Local density approximation (LDA) The first and simplest approach to find an approximate expression
for an exchange correlation functional is to start from a uniform electron distribution.
ELDA
xc [%]= %(R)#hom
xc [%]dR.(1.33)
where #hom
xc is the exchange correlation energy per particle of the homogeneously distributed electron gas. It
can be divided into an exchange and a correlation part
#hom
xc [%]=#hom
x[%]+#hom
c[%].(1.34)
The Thomas-Fermi model provides an expression for the kinetic energy Tsas well as the exchange energy
εxof the uniform electron gas. The exchange energy per particle in the Thomas-Fermi model reads
#TF
x[%]=3
4$3
π%1
3
%(R)1
3.
24
CHAPTER 1. THEORY 1.3. PERIODIC BOUNDARY CONDITIONS
The correlation part #hom
cmust contain the remaining unknown contributions to Exc namely, the non-classical
part of the electron-electron interaction (vee J)as well as the difference TTs. An analytic expression
of the %dependence of #cis not available. However, due to quantum Monte Carlo calculations by Ceperley
and Alder [38] an interpolation formula to #cis at hand.
Accordingly, the exchange correlation potential vxc from Eq. (1.32) becomes
vLDA
xc (R)=δELDA
xc
δ%(R)=#hom
xc %(R)+#hom
xc
δ#hom
xc
δ%(R).(1.35)
LDA is a good approximation for system with slowly varying electron densities such as many bulk systems.
For systems such as atoms and molecules that show inhomogeneous densities, however, LDA may become
too inaccurate. In comparison with the experiment LDA often predicts too small lattice constants and bond
distances (overbinding). Binding and cohesive energies are usually too large [60].
Generalized Gradient Approximation (GGA) For systems in which the charge densities cannot be
simply approximated by an uniform electron gas the generalized gradient approximation (GGA) can be used.
Commonly, besides the electronic charge density also the gradient of the density is taken into account. Such
an approximation is referred to as generalized gradient approximation
EGGA
xc [%]= %(R)#(%,%)dR.(1.36)
In general, the errors introduced by the exchange-correlation functional cannot be quantified. For many sys-
tems GGA was found to correct the overbinding problem of LDA. The cohesive energies are often significantly
improved by applying GGA [60].
A major focus of this work is the investigation of thermodynamic properties of III-V semiconductors. In
order to estimate the uncertainty arising from XC, in this work all calculations will be performed with both
LDA and GGA-PBE [39]. Therefore, both LDA and GGA-PBE will be implemented in the S/PHI/nX
framework.
1.3 Periodic boundary conditions
When applying the Kohn-Sham formalism to periodic systems such as crystals, an infinite number of ions
have to be treated. Furthermore, the wave functions extend over the entire space. Hence, an infinite basis-set
would be required. For systems with periodic boundary conditions such as crystals the dimensionality of the
many-electron system can be drastically reduced when employing translational symmetry.
The periodic atomic structure of a crystal creates an periodic external potential vext(R)in which the electrons
move
vext(R+Rlat)=vext(R).(1.37)
The periodicity is given by the lattice vectors
Rlat =
3
!
i=1
niai,=|a1·a2×a3|.(1.38)
25
1.4. INTEGRATION OVER THE BRILLOUIN ZONE CHAPTER 1. THEORY
The three lattice vectors aiare the lattice vectors of the primitive unit cell with the volume .The nidenote
integer numbers. The translational symmetry of vext(R)suggests that also the Hamiltonian underlies the
same translation symmetry. According to Bloch’s theorem, the wave functions of such a Hamiltonian with
translational invariance can be factorized in a cell periodic part fi(R)and a wave-like part (phase factor)
(see Ref. [47])
Ψik(R)=eik·Rfi(R),f
i(R+Rlat)=fi(R).(1.39)
The indices idenote the band index and kpoint which lies in the first Brillouin zone, respectively.
1.4 Integration over the Brillouin zone
The computation of expectation values requires an integration over the Brillouin zone. In a bulk crystal all
occupied states iat each of the infinite kpoints contribute to the density %(R)and thus, to the potential
v(R).Hence, when computing the potential an infinite number of calculations is required. Wave functions are
smooth in reciprocal space and are almost identical at kpoints which are close to each other [61]. Therefore,
the wave functions in a region of kpoints can be approximately expressed by a single wave function at a
representative single kpoint. This allows to consider electronic states at a finite number of kpoints. The
integral over the Brillouin zone can be replaced by a discrete sum over a chosen k-point mesh
BZ
dk!
k
ωk∆Ω.(1.40)
The ωkare weight factors which fulfill the conservation law
!
k
ωk=1.(1.41)
The choice of k-points used for sampling the Brillouin zone determines the quality of the obtained SCF
results. There are various methods known to sample the k-space in order to integrate continuous functions
over the Brillouin zone [31, 62, 63, 64, 65, 66]. One of the most often applied techniques is the special-point
scheme of Monkhorst and Pack [62] which will be implemented in the plane-wave framework S/PHI/nX (see
Fig. 1.1).
The Monkhorst-Pack scheme is very successful for total energy calculations of semiconductors and insulators.
However, at a first glimpse it fails at T=0 K when performing total energy calculations of metallic systems
because the function to be integrated becomes discontinuous at the Fermi edge. In order to cope with this
problem occupation numbers focc
iσkcan be introduced when computing the electronic charge density
%(R)=!
iσk
ωkfocc
iσk|Ψiσk|2(1.42)
which can be obtained according to the Fermi function
focc
iσk=1
exp(εiσkεF
kBT)+1.(1.43)
26
CHAPTER 1. THEORY 1.4. INTEGRATION OVER THE BRILLOUIN ZONE
Figure 1.1: Schematic representation of the Monkhorst-Pack scheme for the case of a 2D lattice. The
parallelogram centered around the origin is the conventional Brillouin zone whereas the bold one indicates
the technical implementation in the S/PHI/nX code. We span the Brillouin zone by the reciprocal lattice
vectors bi.The Brillouin zone is subdivided into identical small tiles spanned by the vectors btile
i.A set
of generating k-points (here one generating k-point at (1
2
1
2) is placed into each tile to generate all special
k-points (gray circles). Points along the edges of the conventional Brillouin zone (marked with ’x’) should be
avoided when constructing the Monkhorst-Pack k-point mesh. The set of necessary k-points can be reduced
by applying crystal symmetries.
In the Fermi function the εiσkare the one-particle energies, εFis the Fermi energy, kBis the Boltzmann
constant, and Tis the temperature.
Super cell approach The Bloch-theorem can be applied to periodic systems only. Calculations of surfaces,
for example, would require an infinite number of basis-set functions perpendicular to the surface plane.
Hence, numerical calculations would be impossible.
However, by introducing super cells which mimic the non-periodicity and repeating them periodically, Bloch’s
theorem can also be applied to such systems. In Fig. (1.2) a sketch of the super cells of a defect calculation
and a surface calculation is presented. The super cell has to be chosen large enough that the feature (defect,
vacuum, slab) causing the break of the translation symmetry is nearly decoupled from its images.
Plane-wave representation of KS orbitals The methods and techniques which will be derived in this
work will be demonstrated by means of a pseudo potential plane-wave library. Therefore, the KS orbitals
will be represented in a plane-wave basis-set. The expressions contributing to the potential veffand the total
energy Etot that are required for the implementation of our framework will be presented in the following.
The cell periodic part fcan be expanded using a basis-set eiG·Rwith of discrete set of plane-waves
fi(R)=!
G
ci(G)eiG·R.(1.44)
The expansion coefficients are labeled ci.The reciprocal lattice vectors Gthat fulfill G·Rlat =2πmwith
mbeing any integer number. Using Bloch’s theorem the wave function reads finally
Ψik(R)=!
G
cik(G)ei(G+k)·R.(1.45)
The application of Bloch’s theorem transforms the problem of describing an infinite number of wave functions
expanded over the infinite space to an infinite number of wave functions defined only in the first unit cell
27
1.4. INTEGRATION OVER THE BRILLOUIN ZONE CHAPTER 1. THEORY
(a) Super cell for calculations of a vacency (b) Super cell for calculations of surfaces
Figure 1.2: Illustration of the super cell approach. The first unit cell (darker, blue balls) is repeated along
all spatial directions. (a) Setup of a super cell to compute a vacancy defect in a bulk solid. The size of the
1st super cell has to be chosen large enough in order to decouple the vacancy from its image defects. (b)
The super cell approach can also be applied to surface calculations. In this case the vacuum region has to
be big enough to decouple the two surfaces from each other. On the other hand the slab region needs to
contain enough atomic layers to decouple the two vacuum regions from each other. Otherwise a thin film
would be modeled.
of the crystal. On the other hand, however, Bloch’s theorem makes the wave functions to be given at an
infinite set of kpoints.
Eq. (1.45) is an expansion of the Kohn-Sham orbitals in a plane-wave basis set. In this basis-set the Kohn-
Sham equation can be written as [61]
!
G!
HG+k,G!+kcik(G#)=εikcik(G)(1.46)
!
G!*|G+k|2δGG!+vion(GG#)+vH(GG#)+vxc(GG#)+cik(G#)=εikcik(G).(1.47)
In this representation the kinetic energy is diagonal in Gspace, the contributions to the effective potential
are described in terms of their Fourier transforms using
"R|G+k#=!
G
e+i(G+k)·R.(1.48)
The expansion coefficients cik(G)can be determined by diagonalizing the Hamiltonian matrix HG+k,G!+k.
In principle, an infinite basis-set is required to expand the wave functions Ψik.Hence, also the Hamilton
matrix HG+k,G!+kwould have an infinite size. Typically, only the coefficients cik(G)with small kinetic
energy contributions are more important than those with large kinetic energies [61]. Therefore, the plane-
wave basis-set can be truncated at a certain energy cut-off
Ecut =1
2|G+k|2
max (1.49)
28
CHAPTER 1. THEORY 1.4. INTEGRATION OVER THE BRILLOUIN ZONE
which defines the highest kinetic energy of the basis functions as well as the shortest wave length
λmin =2π
Gmax
.(1.50)
Hence, the basis-set convergence can be systematically improved by reducing the wave length which is
equivalent to increasing the energy cut-off. In this work the cut-offis given in Rydberg4.
With the above given Hamiltonian (Eq. (1.46)) the total energy functional Etot can be provided [61]
Etot ="ˆ
H#= tr( ˆ
Hˆ%)(1.51)
= tr "(ˆ
TsvHveI%#+EI+EXC[%].(1.52)
With the Laplacian5ˆ
L=1
22the kinetic energy contribution becomes [61]
Ekin ="ˆ
Ts#= tr( ˆ
Tsˆ%)(1.53)
=!
iσk
ωkfocc
iσk"Ψiσk|ˆ
L|Ψiσk#(1.54)
=!
iσk
ωkfocc
iσk|G+k|2|ciσk|2.(1.55)
The ion-ion contribution to the total energy EIcan be decomposed into sums over 1/rpotentials [1]
EI=1
2!
i$=j
ZiZj
|τiτj+R|.(1.56)
The prefactor 1
2arises from the double counting of the ions in the above expression. In periodic systems
such as crystals the series in the above term does not converge as it becomes an infinite sum over long-range
1/rpotentials. Ewald [68, 69, 70] solved this problem by introducing an artificial screening charge with a
Gaussian shape as the series of such screened 1/rpotentials is converging. The Ewald summation is based
on the following identity [61]:
!
i
1
|τiτj+R|=2π
!
G
η
0
exp(|G|2
4x2) exp(i(τiτj)·G)1
x3dx
+2
π!
R
η
exp(|τiτj+R|2x2)dx. (1.57)
The non-converging infinite sum on the left-hand side is replaced by two infinite sums, the first one in
the reciprocal space, the second one in real space. By choosing proper values of ηthe two sums on the
right-hand side can converge rapidly in reciprocal or real space, respectively. This identity can be efficiently
implemented in plane-wave codes, as various contributions to the Hamiltonian are evaluated in either real
or reciprocal space. By introducing an artificial Gaussian screening charge %Gauss to the ionic contributions
the following screened energy contributions can be obtained
EI+EH[%]+EeI[%]Ewald
=˜
EI+˜
EH[%%Gauss]+ ˜
EeI[%+%Gauss]Eself .(1.58)
41Ry=1
2Ha 13.6eV.
5We follow the nomenclature of Ref. [67]
29
1.4. INTEGRATION OVER THE BRILLOUIN ZONE CHAPTER 1. THEORY
Entities with tilde are screened by the Gaussian. The last term Eself is due to the self-interaction between
two Gaussians.
The energy contribution arising from the electron-electron interaction is obtained from the Poisson equation
vH(R)=4π˜%H(R)(1.59)
with the screened charge density
˜%H=%%Gauss.(1.60)
The Gaussian screening charge is constructed from spherical Gaussian with the radius rGauss and the charge
z
φis(r)= zis
-π3r3
Gauss,is
er2
r2
Gauss,is.(1.61)
%Gauss(G)=!
is"G|ˆ
Tis|φis#(1.62)
In order to project radial functions like φ(r)to the Gspace the following projector can be defined
"G+k|RnlYlm#=.2l+1
4π
4π
0
drr2jl(|G+k|r)Rnl(r)Ylm(θG,φG).(1.63)
with spherical Bessel functions jland spherical harmonics Ylm [71].
The translator in Eq. (1.62) reads
ˆ
Tis=!
ia
eiG·τisia.(1.64)
The Hartree potential and energy become eventually
vH(G-=0)= 4π
|G|2˜%H(G).(1.65)
EH=1
2tr(ˆvH%).(1.66)
The prefactor 1
2is due to the double counting correction.
The energy contributions arising from the exchange-correlation potential have been introduced already above
(see Sec. 1.2.3). The remaining electron-ion contributions will be defined in the next section.
The choice of a plane-wave representation is, in particular, justified for 3d-periodic systems such as bulk
crystals. Also in case of 2d periodic systems (e.g., surfaces) a plane-wave representation can be very efficient.
Sometimes, even 1d-periodic systems, e.g., nanowires, can be efficiently treated using a plane-wave repre-
sentation. However, one must not forget that the required vacuum regions (see Fig. 1.2(b)) are also sampled
with plane waves. The usage of larger vacuum regions causes an increase of memory and computation
demands.
Choosing plane waves as basis-set has various advantages:
As already mentioned the completeness of the basis-set can be systematically controlled by one pa-
rameter, namely the energy cut-offEcut and kpoint mesh.
30
CHAPTER 1. THEORY 1.5. VALENCE/CORE PARTITIONING
Plane-waves are orthogonal which simplifies the solution of the eigen problem. The issue of orthogo-
nality will be discussed in the following chapter in detail.
From the numerical point of view the contributions to the Hamiltonian are rather inexpensive to
calculate. Particular the Hartree term can be elegantly expressed using the so-called Fourier deriva-
tive techniques [71] based on fast Fourier transformation (FFT). Such an approach scales only with
npw ln npw when npw is the number of plane waves. Furthermore, the application of the Hamiltonian
to the wave functions |ξ#=ˆ
H|Ψ#can be comfortably expressed using FFT.
The basis-set does not directly depend on the atomic positions. Hence, when calculating forces no
additional basis-set dependent terms occur.
However, plane waves have also some disadvantages. First of all, the computational effort depends on the
system size. Larger systems require more memory and operations to be performed. Secondly, they are
sampled on a uniform grid. An accurate description of core states would require a very high density of
grid points in order to sample the nodal structure of the core states properly. In the following section it is
described how a valence/core partitioning can help to benefit from the advantages of a plane-wave basis-set
without introducing a too high sampling grid density.
1.5 Valence/core partitioning
According to the Bloch theorem a wave function can be expanded in terms of a discrete plane wave basis set
when periodic boundary conditions are being applied. However, the high kinetic energy of the tightly bound
electrons as well as the valence electrons in the core region lead to high frequencies of the corresponding
wave functions. The resulting rapid oscillations would require an extremely and computationally infeasible
large plane wave basis-set.
On the other hand chemical bonds of molecules and solids are to a greater extent determined by the valence
electrons (with significant lower kinetic energy) rather than the core electrons. For the examination of
many problems it, therefore, suffices to describe only these chemically active valence electrons quantum
mechanically, while the chemically inert core electrons and the ions are being handled together with the
nuclei as rigid non-polarized ion cores (pseudo potential). Such an approach is known as the frozen core
approximation [72].
If, however, the core region is also required to be treated accurately and a pseudo potential approximation is
not applicable all electrons have to be included. As an expansion of the tightly bound core orbitals in plane
waves is computationally not feasible, the space is in this case partitioned into a muffin tin region describing
the core wave functions and an interstitial region for the valence electrons. The muffin tin region is expanded,
e.g., in terms of atomic orbitals sampled on a radial grid which can computationally efficiently sample rapid
oscillations. The smoother wave functions in the interstitial region can be expanded in a relatively small
plane wave basis-set. In addition projectors between the muffin tin and the interstitial region have to be
defined. Methods based on such an approach are capable of describing wave functions of all orbitals and are
hence referred to as all-electron methods.
Depending on the atomic system and the investigated observables as well as the required accuracy various
methods to partition the valence and core region are available. Higher performance is usually traded with
larger computational demands. The framework that will be developed within this work should be able to
31
1.5. VALENCE/CORE PARTITIONING CHAPTER 1. THEORY
(a) Schematic illustration of the all-
electron and pseudo electron poten-
tial
0 2 4 6 8
r (bohr)
-0.2
0.0
0.2
0.4
0.6
u(r) (arbitrary scale)
1s
4s
rm
(b) Comparison of the 4s all electron and the
pseudo 1s wave function of the Zn atom
Figure 1.3: (a) Schematic illustration of the all-electron (solid line) and pseudo electron potential (dashed
line). The hard all-electron potential causes wave functions with depicting high frequency oscillations.
By introducing a pseudo potential (Vps)the obtained pseudo wave function Ψps shows significantly lower
frequencies which suggests an expansion into plane waves. The pseudo wave function Ψps matches the all-
electron wave function at the radius rm.(b) Radial part oft he all electron 4s (dashed line) and pseudo 1s
wave function (solid line) as obtained from the Zinc pseudo potential generation.
provide different partitioning approaches for different tasks. Therefore, the most important approaches are
discussed in this section. For many investigations related to semiconducting systems, such as those studied
in this work, the influence of the core is rather small. Thus, this section first addresses the pseudo potential
theory. The discussion will then focus on the description of the core region. Here, Slater’s Augmented Plane
Wave (APW) method will be sketched as well as the successive improvement steps over APW ranging from
linearization (LAPW) to the application of full potentials (FP-LAPW), and eventually the introduction of
local orbitals in the FP-LAPW+lo method. The discussion on valence/core partitioning will be completed
with a short introduction to the very successful PAW method.
1.5.1 Pseudo-potential theory
Norm conserving pseudo-potentials. First-principles pseudo potentials are constructed on the basis of
the scalar-relativistic radial Schrödinger equation of a single spherical atom. As a solution one obtains the
all-electron potential as well as an all-electron wave function. The radial contributions to the wave functions
of different magnetic quantum numbers mare identical due to the mentioned spherical symmetry.
A radial part of the pseudo wave function ups
l(r)is derived from the non-relativistic Schrödinger equation
such that the following conditions are met:
1. Eigenspectrum. Both the pseudo wave function and the all-electron wave function yield the identical
eigenvalue
εps
lεAE
nl .(1.67)
2. Cut-offradius. Outside the cut-offregion (augmentation region) which is defined by cut-offradius
32
CHAPTER 1. THEORY 1.5. VALENCE/CORE PARTITIONING
rm,the pseudo wave function and the all-electron wave function match with respect to their amplitudes
ups
l(εps
l,r)uAE
nl (r)r > rm
l(1.68)
as well as their logarithmic derivatives
d
dr ln ups
ld
dr ln uAE
nl r > rm
l.(1.69)
By increasing the cut-offradius softer pseudo potentials can be generated which require a smaller plane
wave basis-set. In return larger cut-offradii lead to a more inaccurate pseudo wave function in the
region relevant to chemical bonding. Hence, the transferability of the pseudo potential suffers when
increasing the cut-offradii.
3. Norm conservation. The pseudo and the all-electron wave functions are normalized
0|ups
l|2dr =
0|uAE
l|2dr =1 (1.70)
which implies norm conservation
r!
0|ups
l|2dr
r!
0|uAE
nl |2dr r#>r
c
l.(1.71)
4. Nodal structure. In contrast to the all-electron wave function the pseudo wave function has no radial
nodes. It should be at least twice differentiable in order to make the pseudo potential be continuous.
The pseudo valence states can be obtained from the ups
lvia
|Ψps#=1
r|ups
lYlm#(1.72)
with Ylm being the spherical harmonics [71].
5. Pseudo potential contributions. From the pseudo wave function the lth pseudo potential contri-
bution vps
lis obtained by inverting the Schrödinger equation
$1
22+vps
lεps
l%ul=0 (1.73)
=vps
l=εps
l+1
2ups
l2ups
l.
As for the pseudo wave function also the pseudo potential contributions must match the all-electron
potential outside the cut-offregion r > rm
l.
For each valence state (s,p,d,f,..., lmax) (at least) one pseudo potential vps
lis required. By applying a projector
for each component the total pseudo potential can be decomposed into the single potential contributions
vps =
lmax
!
l=0
l
!
m=l
vps
l|χlm#"χlm|.(1.74)
33
1.5. VALENCE/CORE PARTITIONING CHAPTER 1. THEORY
Semi local and fully separable form of the pseudo potential For r→∞the shape of the potential
can be nearly described as zval/r. The long-range limit of the pseudo potential can be approximately treated
independently of l. This component is called local pseudo potential vps
loc.Only the short-range limit remains
l-dependent
vps =vps
loc +
lmax
!
l$=lloc
l
!
m=l
vsl
l|χlm#"χlm|with vsl
l=v
ps vps
loc.(1.75)
This decomposition is referred to as the semi local form of a pseudo potential. The potential is constructed
such that vsl
lvanishes beyond rm
l.A further simplification can be achieved by restricting to the ground
state only (n=1)
vps =vps
loc +
lmax
!
l
l
!
m=l
vsl
l|χ1lm]#"χ1lm|.(1.76)
In contrast to Eq. (1.74) the semi local form of the pseudo potential is computationally less demanding since
the projectors belonging to l=lloc can be avoided. The number of projections to be performed is smallest
if the maximum angular momentum is chosen as local component, i.e., lloc =lmax.The required storage of
the semi local form when being applied to a plane-wave basis-set is as large as 1
2(n2
pw +npw).
An important simplification has been proposed by Kleinman and Bylander [36]. They suggested to treat
also the radial pseudo potential as a non-local potential by replacing it with the projector
"R|vps
l|R## "R|χl#EKB
l"χl|R##(1.77)
and
"R|χl#=1
r
vps
lups
l(R)
/"ups
l|vps
l|ups
l#Ylm.(1.78)
In the Kleinman-Bylander form, also known as the fully separable form of the pseudo potential, the number of
required projector evaluations reduces to npw.On the other hand this truncation might yield wrong results
because the order of atomic eigenstates is not necessarily correct. The Kleinman-Bylander Hamiltonian might
obtain atomic eigenstates containing nodes which can lie energetically below the lowest state. Such ghost
states are a direct consequence of the truncation of the pseudo potential. If such a ghost state is occupied
within the self-consistent field calculation, a spurious density is obtained which leads to unphysical results.
Thus, during the construction of the pseudo potentials it has to be ensured that such ghost states are not
lying energetically below or near the physical valence states. This can be accomplished by a comparison of
the atomic spectra for the semi local with the Kleinman-Bylander pseudo potential. Ghost states below the
valence states can be identified with the criteria suggested by Gonze [73]. In most cases pseudo potentials
free of ghost states can be generated by choosing a proper local component lloc as well as cut-offradii.
Plane wave representation The framework which will be developed in the scope of this work will employ
norm-conserving pseudo potentials. Since in this work the wave functions will be expanded in plane-waves,
the local and non-local pseudo potential contributions have to be expressed in Gspace. The artificial
Gaussian screening charge (see Eq. (1.62)) that has been introduced in the Hartree potential (Eq. (1.65))
via Eq. (1.60) can be conveniently subtracted in the local pseudo potential in order to maintain charge
neutrality [30]
φGauss,is(r)=zis
rerf,(1.79)
34
CHAPTER 1. THEORY 1.5. VALENCE/CORE PARTITIONING
"r|˜
φps
loc,is#=˜
φps
loc,is(r)=!
is
φloc,is(r)+φGauss,is(r),(1.80)
vps
loc(G)=!
is!
r"G|ˆ
Tis|r#"r|˜
φps
loc,is#.(1.81)
The energy contribution to the total energy Etot is the corresponding expectation value
Eloc ="ˆvloc ˆ%#= tr(ˆvloc ˆ%).(1.82)
The non-local pseudo potential and energy contributions in a plane-wave basis are defined [30] as
ˆvnl =
lmax
!
l$=lloc
l
!
m=l
|vnl
isl|Ψps
islm#"Ψps
islm|vnl
islm|
"Ψps
islm|vnl
isl|Ψps
islm#(1.83)
and
Enl ="Ψ|ˆvnl|Ψ#(1.84)
=!
iσk
lmax
!
l$=lloc
l
!
m=l!
GG!"Ψiσk|G+k#"G+k|vnl
isl|Ψps
islm#"Ψps
islm|vnl
isl|G#+k#
"Ψps
islm|vnl
isl|Ψps
islm#"G#+k|Ψiσk#.(1.85)
1.5.2 All-electron approaches
In the previous section the basic expressions that are necessary to implement a framework employing pseudo
potentials in a plane-wave representation have been presented. In order to account for future implementations
with respect to all-electron methods in the following the basic concepts of those methods are briefly sketched.
Since the pseudo potential approach does not treat the core states explicitly, properties which depend on
them, such as hyperfine parameters, cannot be accurately expressed6. To overcome this problem also the
core region has to be included into the quantum-mechanical description. In the following paragraphs various
basis-sets to describe the core region are introduced. In particular, we sketch the often applied full-potential
linearized augmented plane-wave method with local orbitals (FP-LAPW+lo). This approach is based on
APW (Augmented Plane Waves) and has been successively been refined: By linearization of the energy
dependence of the basis functions APW has been improved to the LAPW method. FP-LAPW describes in
addition a full potential in the interstitial regions instead of a muffin-tin potential. The convergence of the
semi-core states has been improved by additionally adding local orbitals in the FP-LAPW+lo method. In
the following paragraphs we sketch the basic concepts of these methods. The Projector Augmented Wave
(PAW) method that is able of improving the performance drastically with respect to LAPW will round up
this discussion.
Slater’s APW. In 1937 Slater introduced the augmented plane wave (APW) method [44]. The unit cell
is partitioned in the interstitial region and augmentation spheres. In the interstitial region plane waves are
taken as basis-set, while inside the augmentation sphere atomic partial waves of the form
6Approximations to hyperfine parameters can, however, be obtained within a pseudo potential approach (see for example
[74]).
35
1.5. VALENCE/CORE PARTITIONING CHAPTER 1. THEORY
φAPW(r)=!
lm
Ak
lmul(r, ε)Ylm (1.86)
are assumed. The ul(r, ε)are energy dependent radial basis functions. The free parameter Ak
lm makes sure
that the plane waves ei(G+k)·Rmatch the atomic partial waves ulYlm at the augmentation sphere boundary.
Inside the augmentation spheres the potential is assumed to reflect spherical symmetry, outside it is kept
constant.
With Sbeing the overlap matrix the following non-linear eigenvalue problem has to be solved
|ˆ
HES|=0.(1.87)
The APW approach is computationally very demanding and numerically even unstable because the deter-
minant’s matrix is energy dependent. However, it was the starting point of a family of partitioning methods
which are described below.
Linearization of the energy dependence. The energy dependence of the basis functions in the APW
method leads to an expensive non-linear eigenvalue problem. To overcome this difficulty Anderson [75]
suggested to map the APW set to an energy-independent basis set by linearizing the partial waves in energy.
The energy-dependent partial wave uAPW
l(r, ε)can be expanded in a Taylor series about a reference energy
εl
u(r, ε)=ul(r, εl)+(εεlul(r, εl)+O((εεl)2)(1.88)
˙ul=ul
∂ε .(1.89)
In addition in APW the wave functions inside and outside the augmentation sphere are matched only with
respect to the value but not with respect to the slope. This introduces additional contributions to the kinetic
energy which have to be considered in the Hamiltonian. The linearization of the energy, i.e., Eq. (1.89), can
be controlled by introducing the additional parameter Bk
lm
φLAPW(r)=!
lm
(Ak
lmul(r, εl)+Bk
lm ˙ul(r, εl))Ylm.(1.90)
The parameters Aand Bare chosen such that the plane waves can be joined with respect to both value and
slope. This procedure leads to a generalized eigenvalue problem
ˆ
HC=ESC (1.91)
with Cbeing the matrix of eigenvectors containing the wave function coefficient. The dimension of the
involved matrices depends on the number of basis-set functions used to describe the interstitial region.
Various functions have been applied as regular basis-sets (so-called envelope functions) ranging from plane-
waves to Gaussian or Hankel functions. When applying plane-waves as envelope functions the method is
referred to as LAPW [76], if Hankel functions are taken instead, the method is called LMTO (linear muffin-
tin orbital) method [77]. Besides the number of envelope functions, the choice of the augmentation radius
rmcontrols the size of the matrices. Usually rmis chosen such that it is about half the covalent radius.
36
CHAPTER 1. THEORY 1.5. VALENCE/CORE PARTITIONING
The valence states are then described by the basis functions while the core states are localized inside the
augmentation spheres. The linearization of the energy introduces additional constraints requiring more plane
waves than in APW.
Full potential representation. So far, in both methods APW and LAPW/LMTO the potential has been
treated in the so-called muffin tin approximation: Inside the augmentation region the potential is assumed
to reflect spherical symmetry while within the interstitial region the true potential is approximated as a
constant. The resulting shape of the potential suggests the name “muffin-tin” potential. A generalization of
this approximation consists of expanding the potential in the interstitial region in terms of plane-waves like
V(R)=
,lm vlm(|R|)Ylm |R|<r
α
m
,kvkeik·Rinterstitial region.
(1.92)
The density can be represented analogously. The full description of the potential in the entire space gives
this method the name Full Potential LAPW (FP-LAPW [78]).
Local Orbitals. A general drawback of LAPW is the treatment of semi-core states. These states lie
energetically between the delocalized valence states and the core states localized inside the augmentation
spheres. The semi-core states are not completely confined inside the spheres. These semi-core states have
usually one principle quantum number below the valence state. In the case of Ti the 4p state is a valence
state while the 3p state is a semi-core state.
In Ref. [79] Singh proposed the usage of local orbitals (lo) inside of the augmentation sphere. Local orbitals
can treat two principle quantum numbers per lchannel (in case of the Ti example, 3p and 4p). When such
a semi-core state should be constructed the two corresponding reference energies ε1for the description of
the 4p and ε2for the 3p state can be considered
φLO =!
lm
(Almul(r, ε1)+Blm ˙ul(r, ε1)
& '( )
+Clmul(r, ε2)
& '( )
1st ref.energy ε12nd ref.energy ε2
)Ylm.(1.93)
The free parameters Alm,B
lm,and Clm are constructed such that the local orbital has zero value and
slope at the augmentation sphere radius rm.Furthermore the local orbitals are strictly orthogonal which
implies semi-core and valence states being orthogonal. The tail of semi-core states can be represented in the
interstitial region using plane-waves.
FP-LAPW+lo. The previous ideas of APW, energy linearization, full-potential representation, as well as
the usage of local orbitals to improve the semi-core convergence are merged in the FP-LAPW+lo method
[45, 46]. Its basis-set is a mixture of plane waves in the interstitial region and a linear combination of APWs
and local orbitals within the augmentation spheres
φ(R)=
,kckeik·Rinterstitial region
,lm (Almul+Blm ˙ul)Ylm
& '( )
+Clm(A#
lmul+B#
lm ˙ul+ul(ε2))Ylm
&'( )
LAPW +lo
|R|<r
m.(1.94)
37
1.5. VALENCE/CORE PARTITIONING CHAPTER 1. THEORY
The LAPW method is a powerful and very accurate all-electron scheme which has been used in a broad
spectrum of applications. The high accuracy of LAPW is also the reason why it is used as benchmark method
when comparing accuracies of other methods. The huge computational effort in the (FP-)LAPW(+lo) scheme
arises from solving the generalized eigenvalue problem of Eq. (1.91) which has to be solved for huge matrices
while obeying the matching constraints at rmby adjusting the fitting parameters Alm,B
lm,and Clm.
PAW. The projector augmented waves (PAW) method proposed by Blöchl [31] generalizes both pseudo
potentials and the above described augmentation methods. In PAW a transformation ˆ
Tbetween the true
wave function |Ψ#and a numerically less demanding auxiliary wave function |˜
Ψ#is introduced
|Ψ#=ˆ
T|˜
Ψ#.(1.95)
The transformation should be chosen such that the smooth auxiliary wave function |˜
Ψ#converges quickly
with respect to the basis-set size. In order to yield the correct nodal structure of the true wave function,
ˆ
Thas to modify |˜
Ψ#inside each atomic region. Therefore, the atomic regions are described in terms of the
differences between the true and the auxiliary wave functions, |φ#and |˜
φ#,respectively. Inside the atomic
region the true wave function can be expanded in terms of the partial waves |φ#which are the solutions of
the radial Schrödinger equation for an isolated atom7
|Ψ#=!
i
ci"r|φi#r < rc.(1.96)
For every partial wave |φi#a smooth auxiliary partial wave |˜
φi#counterpart is constructed such that outside
the atomic region both partial waves are identical and their difference cancels out identically
φi(r)=˜
φi(r)
=φi(r)˜
φi(r)=0
rrc.(1.97)
The above definition is a crucial element for the efficiency of PAW. In LAPW a cumbersome matching
and fitting procedure is required in order to match the wave functions with respect to their value and
slope. Otherwise the truncation of the wave functions at rmwould correspond to the introduction of an
artificial multipole momentum. Throughout the PAW method always the difference of both partial waves is
considered. Hence, a truncation error would be introduced in both partial waves simultaneously and hence,
cancels out identically. As a result no fitting and matching is necessary in case of PAW.
Inside the atomic region the true wave function can be expressed in terms of the partial waves |φ#and the
auxiliary wave function |˜
Ψ#can be expressed likewise in terms of the auxiliary partial waves |˜
φ#
|Ψ#=,i|φi#ci
|˜
Ψ#=,i|˜
φi#ci
rrc.(1.98)
In order to project the true and the auxiliary wave functions from the interstitial to the atomic regions,
projector functions |˜pi#can be defined and the expansion coefficients cibecome
7Here the frozen core approximation is being applied. However, in principle the frozen core approximation can be relaxed
in the PAW approach.
38
CHAPTER 1. THEORY 1.6. TIGHT-BINDING METHODS
ci="˜pi|˜
Ψi#"˜pi|˜
φj#=δij.(1.99)
With the transformation defined in Eq. (1.95) the true wave function can now be expressed in terms of
smooth auxiliary wave functions |˜
Ψ#,smooth auxiliary partial waves |˜
φi#,smooth projector functions |˜pi#as
well as partial waves constructed as solutions of the radial Schrödinger equation for the isolated atom |φ#
|Ψ#=|˜
Ψ#+!
i"|φi#−|˜
φi##"˜pi|˜
Ψ#(1.100)
=|˜
Ψ#+!
τ"|Ψ1
τ#−|˜
Ψ1
τ##.(1.101)
Following Blöchl’s notation one-center entities within the atomic region are labeled “1”. The smooth auxiliary
wave functions |˜
Ψ#and the smooth partial waves |˜
φ#describe only the valence states. In order to evaluate
expectation values also the nccore states |φc#have to be taken into account
"A#=!
i"˜
Ψi|ˆ
TAˆ
T|˜
Ψi#
&'( )
+
nc
!
j"φc
j|A|φc
j#
& '( )
valence states core states
(1.102)
Analogously the remaining entities like the total energy and the charge density can be decomposed into the
auxiliary contributions and one-center terms
E=˜
E+!
τ
(E1
τ+˜
E1
τ)(1.103)
n(R) = ˜n(R)+!
τ
(n1
τ(R) + ˜n1
τ(R)).(1.104)
The actual expressions for the smooth auxiliary terms ˜
Eand ˜nas well as the one-center contributions E1,
˜
E1,n
1,and ˜n1are derived in [31]. A complete review about the PAW method can be found in [80, 81].
In the PAW method all entities which have to be evaluated during the SCF cycles when diagonalizing the
Kohn-Sham equations can be computed on a smaller basis-set of auxiliary functions (e.g., plane-wave basis-
set with a energy cut-offof 30 Ry). The transformation operator ˆ
Tallows a direct access to the true wave
function with the full nodal structure. In fact, it can be shown [80] that the total energy expression of the
non-local pseudo potential can be obtained by expanding the PAW total energy expression into a Taylor
series and truncating it beyond the linear term. From this point of view one might interpret PAW as a
pseudo potential approach with a pseudo potential that adapts to the electronic environment at every SCF
iteration. On the other hand PAW describes also an all-electron augmentation region like the APW family.
PAW provides full access to the true wave function, full charge and spin densities as well as properties related
to the core states, such as hyper fine parameters.
1.6 Tight-binding methods
With increasing number of atoms the application of ab-initio methods can become computationally too de-
manding. In the range up to 103or even 107atoms the tight-binding (TB) method [82] can be applied. Since
39
1.7. FORCES IN IONIC SYSTEMS CHAPTER 1. THEORY
the application of TB is important the framework should consider a future implementation of a corresponding
TB Hamiltonian. Hence, we briefly sketch the major ideas behind TB in the following paragraphs.
Tight-binding can be interpreted as counterpart to the free-electron approximation. Its basic assumption is
that the restricted Hilbert space that is spanned by atomic-like orbitals is sufficient to describe the solution of
the Schrödinger equation within a restricted energy range. There are various tight-binding implementations,
ranging from semi-empirical tight-binding to ab-initio based tight-binding. In semi-empirical tight-binding
fitted parameters are used to describe matrix elements of the overlap and Hamilton operators directly. No
localized basis is specified. Higher accuracy can be obtained by using a localized basis, such as atomic
orbitals. This leads to the linear combination of atomic orbitals method (LCAO). In LCAO the Hamiltonian
is expressed in terms of atomic orbitals µand νwhich yields the LCAO Hamilton matrix HLCAO as well as
the overlap matrix S
HLCAO
µν="µ|ˆ
H|ν#(1.105)
Sµν="µ|ν#.(1.106)
Basic assumption of this approach is that the overlap of orbitals is limited to only a few shells of neighboring
atoms. If this assumption holds the tight binding Hamiltonian decomposes into a sparse matrix. For sparse
matrices efficient eigensolvers exists which scale quadratically or linearly with the system size [83, 84, 85, 86].
Higher accuracy in the TB methods can be accomplished by introducing density functional theory into the
TB method (DFTB). Therefore Foulkes and Haydock [87] have rewritten the expression of the total energy
(Eq. (1.23)) by substituting the electronic charge density %by a superposition of the reference densities %ref
and small fluctuations δ%ref .Exc is then expanded at this reference density up to the second order. Linear
terms of %ref cancel out. The DFTB energy functional becomes then
E=!
i"Ψi|ˆ
H[%ref ]|Ψi#−1
2
%ref (R)%ref (R#)
|RR#|dRdR#+Exc[%ref ]
vxc[%ref ]%ref +Eionion
+1
2
1
|RR#|+δ2Exc
δ%#ref 7777#ref
δ%#ref .(1.107)
The linear terms in %ref cancel out. Traditional DFTB simply neglects the second order density dependent
term (last line of latter equation) while in self-consistent charge tight-binding [5, 6, 7] (SCC-DFTB) the
second order terms are considered in an extra charge density SCF loop.
1.7 Forces in ionic systems
So far we have focused on various methods to compute the electronic structure of atoms, molecules, and
solids. These methods provide access to the system’s energy as well as the wave function. Still missing
is the access to the equilibrium geometry as well as a dynamic description of atomic positions. At a first
glimpse a finite difference approach with respect to computed energies at varied atomic positions provides
a straightforward access to forces which then can be used to integrate equations of motion (EOM). As
pointed out already the self-consistent computation of the Born-Oppenheimer surface for different atomic
positions can be computationally very demanding. Such an expensive approach can be avoided by exploiting
perturbation theory.
40
CHAPTER 1. THEORY 1.7. FORCES IN IONIC SYSTEMS
In the following we discuss how forces can be obtained efficiently from first-principles calculations.
1.7.1 Hellmann-Feynman theorem
In classical mechanics forces acting on a particle at the coordinates τcan be obtained from the derivative of
the potential energy U
Fτ=−∇τU(τ).(1.108)
As analogon in quantum mechanics the forces can be determined according to
F=−∇τ"E#with "E#= min"Ψ|ˆ
H|Ψ#,"Ψ|Ψ#=1.(1.109)
A proper ansatz to compute quantum mechanical forces is the Hellmann-Feynman theorem that was pre-
sented in 1937 by Feynman [88]. This theorem states that for any degree of freedom λ(in our case the
atomic coordinates τ) the following identity holds
E
∂λ ="Ψ
∂λ |
=E|Ψ#
()&'
ˆ
H|Ψ#+"Ψ|ˆ
H
∂λ |Ψ#+
=E"Ψ|
()&'
"Ψ|ˆ
H|Ψ
∂λ #(1.110)
="Ψ|ˆ
H
∂λ |Ψ#+E
∂λ"Ψ|Ψ#.(1.111)
Hence, the Hellmann-Feynman theorem becomes eventually
E
∂λ ="Ψ|ˆ
H
∂λ |Ψ#.(1.112)
Applied to atomic coordinates the Hellmann-Feynman theorem implies that the forces can be computed
directly from the ground state wave functions, which are available from the total energy calculations anyway.
Finite basis-sets The Hellmann-Feynman theorem is only valid if Ψis an exact eigenstate. For variational
calculations of the ground state energy E, Ψis expanded in a finite basis-set. In this case Eq. (1.110) cannot
be simplified to Eq. (1.112) anymore since the first and last term of Eq. (1.110) have to be considered
explicitly. Here the term
"Ψ
∂λ |ˆ
H|Ψ#−"Ψ|ˆ
H|Ψ
∂λ #(1.113)
becomes a matrix expression and does not vanish. There are two approaches when computing forces from
ground state energy calculations. (1) The basis-set can be constructed such that Ψdoes not depend on λ.
Such a requirement is known as the Hurley condition [89, 90]. The plane-wave basis-set fulfills this condition
and the forces can obtained from the Hellmann-Feynman forces FHF
F=FHF =E
∂τ =−"Ψ|ˆ
H
∂τ |Ψ#.(1.114)
(2) In case of atomic-centered basis functions the Hellmann-Feynman forces of Eq. (1.112) will not provide
correct forces. Here the full Eq. (1.110) has to be considered. The term (1.113) is called as the Pulay force
FPulay
F=FHF +FPulay =−"Ψ|ˆ
H
∂τ |Ψ#+$"Ψ
∂τ |ˆ
H|Ψ#−"Ψ|ˆ
H|Ψ
∂τ #%.(1.115)
41
1.8. CONCLUSIONS CHAPTER 1. THEORY
1.8 Conclusions
In this chapter a brief summary of theories has been presented that are necessary to derive a flexible
framework for developing efficient CMD applications. Depending on the required accuracy, in electronic
structure calculations the description of the valence/core partitioning is crucial. Various methods ranging
from pseudorization of the core to a full all-electron description have been sketched. Since the library should
be able to cover a wide range of system sizes, we also presented an overview of important approximations,
such as the tight-binding approach. Based on the here mentioned methods in the next chapter common
issues of these methods will be identified and generalized to provide an more flexible approach to electronic
structure simulations.
In the scope of this work we limit ourselves to the pseudo potential method while keeping the other methods
in mind. It will be shown that with our approach other basis-sets and/or potentials can be easily be covered.
42
Chapter 2
Methods
In order to obtain physical properties based on the theoretical concepts described in the previous chapter,
numerical/physical methods have to be applied which will be introduced in this chapter. We provide a brief
overview on the methods which allow the development a general framework for CMD applications. The
following discussions focus on methods to
provide access to the electronic structure of a material (Sec. 2.1),
describe structural properties of a system (Sec. 2.2), and to
obtain thermodynamic properties of materials (Sec. 2.3).
2.1 Electronic minimization schemes
We begin the discussion on methods with an overview on how ground state properties Etot,%,and εiσ(k)
can be computed efficiently. With these entities a broad spectrum of material properties at T= 0 K can be
derived, such as equilibrium lattice parameters, bulk moduli, cohesive energies, and band gaps.
If the system is described within DFT, a direct approach to compute the ground state wave functions
belonging to the minimum of the total energy is to diagonalize the matrix HGG!="G|ˆ
H|G##.However,
one has to keep in mind that calculations of even small systems require a rather large basis-set (104105
plane waves). As the matrix "G|ˆ
H|G##is not sparse its memory demand scales like O(N2
pw).Also the
computational effort which is necessary to diagonalize a general matrix scales like O(N3
pw).An alternative
way of finding the ground state is the usage of iterative schemes (see Fig. 2.1).
The residual vector Rin
0!
=R=ˆ
H|Ψtrial#−ε|Ψtrial#
vanishes if the trial wave function |Ψtrial#is identical to the ground state wave function |Ψ#.The negative1
gradient of the energy with respect to the wave functions can be evaluated using the variational principle
for normalized Kohn-Sham wave functions
|ξiσk#=|giσk#=δεiσk
δ"Ψiσk|=δ
δ"Ψiσk|"Ψiσk|ˆ
H|Ψiσk#=ˆ
H|Ψiσk#.(2.1)
1by definition the gradient always points upwards
43
2.1. ELECTRONIC MINIMIZATION SCHEMES CHAPTER 2. METHODS
Figure 2.1: Schematic representation of the one-particle energy as function of the wave functions. Each wave
function is a vector in a multidimensional vector space. An initial wave function Ψ(1) is improved by using
the negative gradient, denoted with |ξiσk#until the minimum is found.
In all iterative minimization schemes the negative gradient is used to improve the wave function of the nth
iteration
|Ψ(n+1)
iσk#=|Ψ(n)
iσk#−M(|ξ(n)
iσk#).(2.2)
The iterative minimizers Mdiffer basically in the way how the negative gradient is being used and whether
a single state/band is updated sequentially (state-by-state schemes) or all states/bands are improved at once
(all-state schemes).
In the following paragraphs important aspects of multi-dimensional minimization techniques are summarized
which will be applied in this work, i.e., the numerically efficient evaluation of gradients (Sec. 2.1.1), the set
up of search direction vectors (Sec. 2.1.2), the introduction of preconditioning (Sec. 2.1.3), and conjugating
subsequent search vectors (Sec. 2.1.4).
2.1.1 Gradients δ
δ$Ψ|
In order to support iterative minimization algorithms the gradient of ˆ
Hneeds to be evaluated.
The gradient of the Hamiltonian can be obtained from
δ"Ψiσk|ˆ
H|Ψiσk#
δ"Ψiσk|=ˆ
H|Ψiσk#.(2.3)
It will be shown in the next chapter that the matrix form of the gradient has significant advantages for the
runtime performance of the framework. The gradient in matrix form reads
(ξiG)σk=(ˆ
HGG!|ΨiG)σk.(2.4)
44
CHAPTER 2. METHODS 2.1. ELECTRONIC MINIMIZATION SCHEMES
Figure 2.2: Illustration of wrap-around errors by a two-dimensional sketch of the (periodic) Fourier space.
Wave functions |Ψ#and gradients |ξ#are sampled within a cut-offsphere with the radius 1Gcut (innermost
circle 1). Components of the charge density %(G)and the effective potential veff(G)are defined inside a
cut-offsphere of twice the size (circle 2). In order to sample the gradient veff(G)Ψ(G)a sphere of 3Gcutwould
be necessary (circle 3).
Using a FFT grid with only 2Gcut the high frequency contributions between 2Gcut and 3Gcutwould be folded
back into the first cell (yellow area) due to the periodicity of the Fourier space. Hence, an artificial wrap-
around error within the interval 2Gcut <G
wrap <3Gcutwould occur. Because the gradient is sampled only
up to 1Gcut the back-folded components between 2Gcut and 3Gcut do not contribute. Thus, instead of using
a (more expensive) FFT grid of 3Gcut a smaller one of 2Gcut can be applied.
In the plane-wave representation the effective potential is diagonal in real space, i.e., the matrix "R|ˆveff|R##=
veff(R)δ(RR#),whereas the kinetic gets diagonal in reciprocal space (see p. 28). Hence, the computation
of the gradient can be efficiently evaluated using Fourier transformations [30]
"G+k|ξkin
iσk#="G+k|ˆ
L|Ψiσk#,(2.5)
"G+k|ξnl
iσk#="G+k|ˆvnl|Ψiσk#,(2.6)
"R|ξeff
iσk#="R|ˆveff,σ|Ψiσk#.(2.7)
Convolution problem veff(G)is defined with Fourier coefficients up to 2Gcut,Ψ(G)up to Gcut.The
gradient ξeff(G)that is to be used to improve the wave function, is also sampled up to Gcut.According to
the Fourier folding rule the highest frequency components of
|ξ#veff|Ψ#(2.8)
range to 3Gcut.Fortunately, it is possible to restrict this evaluation to a smaller (and faster) 2Gcut FFT
grid.
When a smaller 2Gcut FFT grid is used, a wrap around error (see Fig. 2.2) would occur. However, in the
gradient |ξ#only components of max. 1Gcut are taken into account. The wrap-around error contributions
45
2.1. ELECTRONIC MINIMIZATION SCHEMES CHAPTER 2. METHODS
exist only in the frequency range [91] of
2Gcut <G
wrap <3Gcut.(2.9)
Hence, the error contributions cannot affect the gradient |ξ#and the smaller FFT grid of 2Gcut suffices!
The consideration of the convolution problem is important with respect to the computational efficiency of
implemented gradients.
2.1.2 Search direction
In the case of the steepest descent scheme [92] the wave functions are iterated by “walking down” along the
negative residual vector direction with a fixed step width. Hence, the search direction vector |X#is
|X(n)
iσk#=(
ˆ
Hε(n)
iσk)|Ψ(n)
nσk#
=|ξiσk#−ε(n)
iσk|Ψ(n)
iσk#.(2.10)
The ε(n)
iσkdenote the approximation of the one-particle energies in the nth iteration
ε(n)
iσk="Ψ(n)
iσk|ˆ
H|Ψ(n)
iσk#.(2.11)
In the steepest descent scheme the wave functions are improved as
|Ψ(n+1)
iσk#=|Ψ(n)
iσk#−δt|X(n)
iσk#.(2.12)
In the latter equation δtis the artificial time step. The larger δt, the larger is the change in the wave
functions per iteration step |Ψ(n+1)#−|Ψ(n)#.However, if δtexceeds a critical value the scheme becomes
unstable. The position of that critical value depends strongly on the system.
2.1.3 Preconditioning
For realistic systems the steepest descent scheme converges too slowly [92]. In order to improve it, a basic
understanding of the plane-wave Hamiltonian’s shape is imperative. As already pointed out in Eq. (1.46)
the kinetic energy contribution ˆ
TGG!contains only the diagonal elements
ˆ
TGG!=δGG!|G+k|2ciσk(G).
The plane-wave basis "G+k|is expanded up to the energy cut-offEcut.According to Eq. (1.49) higher
energy cut-offs introduce a "G+k|basis with large|G+k|2values. In case of high energy cut-offs this leads
to a domination of the kinetic energy contributions to ˆ
HGG!over those of the effective potential veff(GG#)
in the high frequency regime of the plane-wave Hamiltonian matrix. In other words, high energy cut-offs
cause a domination of the diagonal over the off-diagonal elements. Such a matrix becomes ill-conditioned2
and is difficult to diagonalize. The actual problem can be easily demonstrated in the 2-dimensional case. For
this purpose the matrix "G|ˆ
H|G##with the dominating diagonal shall be iconified with a two–dimensional
2The condition of a matrix is the ratio of its largest and its smallest eigenvalue.
46
CHAPTER 2. METHODS 2.1. ELECTRONIC MINIMIZATION SCHEMES
(a) a = b (b) a « b
Figure 2.3: Illustration of the influence of the matrix conditioning to the convergence rate.
diagonal matrix
"G|ˆ
H|G##→
a0
0b
and the basis-set functions are represented with a simple 2d vector
ε="Ψ|G#"G|ˆ
H|G##"G#|Ψ#→ε=(x, y)
a0
0b
x
y
.
Evaluating the latter expression yields
ε=ax2+by2.
If aand bare identical the solution is a sphere whereas the cases a(bor a3b(ill-conditioning of the
matrix) form an ellipsoid. Following the negative gradient direction on an ellipsoidal surface leads to a
slow “zick-zack-like” minimization path involving many iterations (Fig. 2.4). That leads to the conclusion
that the steepest-descent scheme can/should be applied only when the energy cut-offcan be chosen very
small (5Ry). Yet, for realistic systems much higher energy cut-offs are required. To cope with the bad
matrix conditions a preconditioner Kcan be applied. A preconditioner approximates ˆ
H1
GG!.In the case
of a plane-wave representation it becomes a function of the kinetic energy. A preconditioner can be seen
as a mapping function to reshape the ellipsoidal form back to a spherical one, or equally, to decrease the
ratio of the largest and smallest eigenvalue of the Hamiltonian matrix in order to improve its condition.
Williams–Soler [93] combined the steepest descent with the preconditioner
K(G)=1eα(G)λ
α(G)(2.13)
α(G)=HGG ε(i)
iσk.(2.14)
The parameter λis a parameter indicating the step length. Larger values converge faster but might light to
numerical instabilities. Typical values of λare between 0.1 and 10. The search direction in the Williams–
Soler scheme reads then
|X(n)
iσk#=K|ξ(n)
iσk#+ε(n)
iσk|Ψ(n)
iσk#.(2.15)
The Williams–Soler algorithm is more stable than the steepest-descent and can be used for higher energy
cut-offs. It has the same memory demand as the steepest descent scheme.
The concept of preconditioning is crucial for many minimization schemes. In the following sections various
47
2.1. ELECTRONIC MINIMIZATION SCHEMES CHAPTER 2. METHODS
Figure 2.4: Sketch of a zick-zack minimization path of a steepest descent scheme with optimized step length.
In the optimized steepest-descent the artificial time step δtis chosen by a line minimization. Therefore, the
search direction of the next iteration is perpendicular to the previous one. The more ellipsoidal the surfaces
gets the stronger is the “zick-zack-like” of the minimization paths which leads to a slower convergence rate.
more advanced preconditioners will be introduced.
2.1.4 Conjugate gradient methods
The previously described minimization schemes suffer from the fact that each minimization step along the
search direction might affect degrees of freedom which were optimized already in a previous step. Hence, a
subsequent minimization step can reintroduce new errors proportional to the previous search direction. To
avoid this coupling between subsequent search directions two conditions have to meet: along each vector
the line minimum has to be found and the new search direction must be chosen conjugate rather than
perpendicular to the previous one. Conjugate means that coefficients, whose residual vector contributions
were already negligible, will not be considered in further iterations anymore. Hence, the multidimensional
vector space itself in which the minimization takes place is reduced. Theoretically, an optimal (exponential3)
convergence rate can so be accomplished. That is the idea of all conjugate gradient methods [94, 92].
Assuming the function to be minimized can be approximated by a multidimensional quadratic function fat
the point P
f(x)=c−"b|X#+1
2"X|A|X#.(2.16)
c=f(P),b=−∇f|p,A
ij =2f
xixj7777P
,X=(x1,x
2, . . . , xm).
Arefers to the m×mHessian matrix. Within an iteration the improved point P(n)is obtained from the
current value P(n1) via
P(n)=P(n1) +λ(n1)h(n1) (2.17)
The iteration step is denoted as n, λ(n)is a step width and h(n)is the vector of the search direction. The
search direction is obtained from
h(n)=
−∇f(P(n))n=0
−∇f(P(n))+γ(n1)h(n1) n>0,
i.e., for the first iteration a steepest descent vector acts as search direction. For all subsequent iterations the
3i.e., ln(E(n)Econverged)converges linearly.
48
CHAPTER 2. METHODS 2.1. ELECTRONIC MINIMIZATION SCHEMES
search direction vector contains an additional constraint γ(n)h(n)with
γ(n)="g(n)|g(n)#
"g(n1)|γ(n1)#(2.18)
g(n)=−∇f(P(n)).(2.19)
This additional constraint ensures the new search direction h(n)being conjugate to all previous ones, i.e.,
"h(l)|A|h(n)#=0 l-=n. (2.20)
All-state conjugate gradient
It will be shown later (Sec. 3.1.3) that the computational efficiency of evaluating complex algebraic equations
can be significantly improved using (blocked) matrix operations. Therefore, the minimization problem can
be reformulated to a matrix notation. Treating all states isimultaneously allows to rewrite the object
"G+k|Ψiσk#as a matrix
"G+k|Ψiσk#⇒CGi(σk).(2.21)
In case of plane-waves the coefficient matrix Chas the dimensions npw ×noccstates.Also the other ingredients
to a preconditioned conjugate gradient method can be rewritten in form of matrices
|giσk#=ˆ
H|Ψiσk#⇒gGi=ˆ
HCGi(2.22)
|Xiσk#⇒XGi.(2.23)
The all-state search direction in matrix notation reads then [67]
XGi=KgGi+Re tr *(PGigGi)gGi+
Re tr *(Pold
Gi)gold
Gi+X(n1).(2.24)
The wave function coefficient matrix is updated by
Cnew
Gi=CGi+λXGi(2.25)
with λis chosen to yield the line minimum along the search direction Xin order to conjugate the vector
of the next iteration [92]. It will be shown later (Sec. 3.3.1) that for most applications a quadratic line
minimization is sufficient4. For the quadratic fit as used in Eq. (49) the total energy value and its derivative
as well as a distant trial energy Etrial are considered. The derivative reads
D= 2Re tr (XG)(2.26)
whereas the trial energy is
Etrial =Etot[(CλtrialX)].(2.27)
4The application of a quadratic line minimization is also suggested in Ref. [67].
49
2.1. ELECTRONIC MINIMIZATION SCHEMES CHAPTER 2. METHODS
The symbol Xrefers to the required orthonormalization of the wave function X.The minimum value of
the quadratic fit is then
λ=D
2c(2.28)
with the curvature
c=1
λtrial
(Etrial (E+λtrialD)).(2.29)
The electronic charge density is updated according to
%σ(R)=!
ik
ωk|Ψiσk(R)|2.(2.30)
State-by-state conjugate gradient
Beside the (memory consuming) all-band conjugate gradient a state-by-state conjugate gradient scheme [26]
that can treat systems with partially occupied or even empty states will be implemented in this work. In
contrast to the previously described minimization schemes the Hamiltonian is diagonalized at a constant
charge density ˆ
H(%in = const).Hence, for a fixed Hamiltonian the total energy Etot[%in = const] cannot be
the variable to be minimized anymore. Instead, the one-particle energies εiσkcan be considered.
Since the matrix elements "Ψ(n)
i|ˆ
H[%in]|Ψ(n)
j#are not the correct one-particle energies, a subspace rotation
has to be performed. In order to decouple the |Ψ(n)#and to get access to the correct ε,a uniform rotation
matrix Uis constructed [67] in the subspace spanned by the set of wave functions of the current iteration
|Ψ(n)#
|ξ(n)#=ˆ
H|Ψ(n)#,(2.31)
Uij ="ξi|ξj#.(2.32)
The decoupled wave functions |Ψ##which yield to correct ε="Ψ#|ˆ
H[%in]|Ψ##can be obtained by rotating
along the eigenvectors of U.
Uu =uu(2.33)
|Ψ##=u|Ψ#.(2.34)
As before, a preconditioned conjugate gradient scheme is applied, just only for a single state
|g(n)
iσk#=ˆ
H[%in]|Ψ#
iσk#(2.35)
|P(n)
iσk#=K|g(n)
iσk#(2.36)
|X(n)
iσk#=K|g(n)
iσk#+"P(n)
iσkg(n)
iσk|g(n)
iσk#
"P(n1)
iσk|g(n1)
iσk#|X(n1)
iσk#.(2.37)
The improved wave function |Ψ(n)#can be obtained from a linear combination of the previous wave function
|Ψ(n1)#and the conjugate search direction |X(n)#[92]
|Ψ(n)#=α|Ψ(n1)#+β|X(n)#.(2.38)
50
CHAPTER 2. METHODS 2.1. ELECTRONIC MINIMIZATION SCHEMES
Since ˆ
H|Ψ#can be evaluated efficiently5for a fixed Hamiltonian, the coefficients αand βcan be obtained
from an uniform transformation
h=
"Ψ|ˆ
H|Ψ#"X|ˆ
H|Ψ#
"Ψ|ˆ
H|X#"X|ˆ
H|X#
(2.39)
The uniform transformation can be expressed in terms of a rotation matrix [61]
U=
cos θsin θ
sin θcos θ
(2.40)
with θbeing
tan θ=1
2
Re{h10}+ Re{h01}
Re{h00}+ Re{h11}.(2.41)
The final expression for improving the wave function reads then
|Ψ(n)#=U00|Ψ(n1)#+U01|X(n)#.(2.42)
This scheme determines an angle θwhich yields a wave function closest to the next extreme value of the
corresponding one-particle energy. Only for a rather accurate initial guess of %in and |Ψ(0)#it is ensured
that this extreme value is a minimum of ε.However, if θreturns a maximum value, an addition of π
2yields
a minimum again.
So far the charge density is kept fixed. In order to obtain self-consistency an outer iteration can be applied
which diagonalizes ˆ
Hat the updated charge density. The resulting wave functions Ψyields the output
density
%(n)
out(R)=!
iσk
ωkfocc
iσk|Ψ(n)
iσk(R)|2.(2.43)
Note, that the output density %(n)
out is not self-consistent anymore! This non-self-consistent density %(n)
out must
not be used as input density %(n+1)
in in the subsequent diagonalization of ˆ
H[%(n+1)
in ].Otherwise, the non-self-
consistent density would introduce an increase of the total energy - in other words, the algorithm would
diverge!
Instead, a “most self-consistent” or optimal density %(n)
opt can be computed from both %(n)
in and %(n)
out.Therefore a
functional %opt[%out,%in]has to be found which yields a maximum total energy gain in the next diagonalization
step. The simplest approach is a linear mixer (also known as Pratt mixer [95])
%(n)
opt =α%(n)
out + (1 α)%(n)
in .(2.44)
with αbeing a (constant) mixing factor between 0 and 1. This mixing scheme suffers usually from a bad
convergence. Additionally, the mixing parameter αhas to be optimized for each system.
5Because the charge density is fixed the contributions to the effective potential veffdo not need to be recomputed. Only the
non-local potential vnl must be updated for each gradient calculation.
51
2.1. ELECTRONIC MINIMIZATION SCHEMES CHAPTER 2. METHODS
RMM-DIIS Beside the linear charge density mixer in this work the more advanced Pulay scheme [96]
will be implemented. They key entity for charge density mixing schemes is the residual vector
R(n)[%]=%(n)
out[%(n)
in ]%(n)
in .(2.45)
Convergence is reached when Ris (nearly) zero. Thus, a minimization of R[%]0can be used in order to
estimate %opt.
Instead of using just %(n)
in and %(n)
out a set of previous densities could be used in order to predict %(n)
opt.Assuming
a linear dependence between the previous densities the new “most self-consistent” charge density can be
expressed as a linear combination
%(n)
opt =!
j
α(j)%(j)
in .(2.46)
The scheme to obtain the Pulay coefficients αwill be introduced below. Here the optimal density %opt lies
in the subspace spanned from the previous %(j)
in .
Hence, in each new iteration %(n+1)
in has to introduce new variations of the density in order to extend that
subspace. The procedure is known as DIIS6and was suggested by Pulay [96]. Since the DIIS mixing schemes
assume linearity between %(n)
in and %(n)
out they have explicitly to be “decoupled” from |Ψ#by using a subspace
diagonalization. The subspace diagonalization ensures %(n)
opt to be the only solution and %(n)
in and %(n)
out are no
longer linear dependent.
Knowing the self-consistent solution %scf the missing density %(n)
in %scf could be considered as a perturbation
ˆvCoul[%(n)
in %scf ].Linearizing the residual vector Rat the self-consistent density %scf gives
R[%(n)
in ]J(%(n)
in %scf )(2.47)
with Jbeing the dielectric function
J=1χU=1χδGG!
4πe2
|G|2(2.48)
and χbeing the susceptibility. It can be seen that the 1
|G|2term of the Hartree potential (see also Eq. (1.65))
introduces numerical errors7for small |G|values, which happens in case of large lattice vectors. These
numerical errors cause instabilities in the gradient and eventually, in the long wavelength limit of the residual
vector. This leads to artificial oscillations between subsequent charge densities %(n)
in .This effect is known as
charge sloshing.
Similarly to the considerations on preconditioners in Sec. 2.1.3 it is again the condition of the dielectric
matrix Jwhich determines the convergence. The larger the ratio between largest and smallest eigenvalue
the worse the convergence rate.
During the minimization the exact dielectric matrix Jremains unknown. Its direct solution would be too
expensive. However, an approximation of J(or J1) can be used in order to describe the influence of the
Hartree potential. Kerker [97] introduced an approximation to the response function from the Thomas-
Fermi-screening. The metric reads
J1δGG!|G|2
|G|2+q0
.(2.49)
6DIIS = Direct Inversion of the Iterative Subspace
7In general, it is numerically unstable to divide by small numbers.
52
CHAPTER 2. METHODS 2.1. ELECTRONIC MINIMIZATION SCHEMES
0 0.5 1
|G|^2
0
0.5
1
|G|2 / ( |G|2 + q0 )
q0 = 0.01
q0 = 0.05
q0 = 0.1
(a) Kerker metric
Figure 2.5: Kerker metric. Low G-frequency contributions are suppressed in the residual vector. Note, the
G=0component of Ris zero anyway. The G=0component carries the norm. In case of charge density
differences that norm vanishes [71].
The frequency range that should be suppressed in the residual vector can be tuned with q0. The shape of
the Kerker metric is plotted in Fig. 2.5 for typical values of q0.
Using8J-1R instead of Rcan counteract the charge sloshing [98]. In case of semiconducting/insulating
systems this condition becomes a constant. For metallic systems there is no screening in the short wavelength
limit (J1) whereas in the long wavelength limit the condition gets J1/q2L2(with the metallic
screening L). Hence, metallic systems are sensitive for charge sloshing problems and tend to converge
significantly slower than semiconductors/insulators.
Combining Kerker’s metric with Pulay’s DIIS scheme results in the RMM-DIIS9charge density mixer. With
the differences of the residual vectors
R(n)=R(n)R(n1) (2.50)
the Pulay matrix
Aij ="R(i)|R(j)#(2.51)
can be constructed [28]. The Pulay coefficients α(n)from Eq. (2.46) can be computed by inverting A
α(n)=A1Bwith (2.52)
Bn="R(n)|R(m)#.(2.53)
The optimal charge density for the next iteration m+1is then
%(m+1)
opt =KR(m)+
m1
!
i
α(i)((%(i)
in %(i1)
in )+KR(i)).(2.54)
81denotes the identity matrix.
9RMM-DIIS = Residual Metric Minimization - Direct Inversion of the Iterative Subspace
53
2.2. STRUCTURAL PROPERTIES CHAPTER 2. METHODS
2.2 Structural properties
Besides describing electronic properties, the computation of structural and thermodynamic properties is
important in CMD (such as the relaxed equilibrium atomic geometry, transition state search, and molecular
dynamics). Since the S/PHI/nX framework should be able to compute such important material properties
will briefly sketch important structural methods.
With the Hellmann-Feynman theorem (and the Pulay corrections) a way is found to determine forces ef-
ficiently within DFT. With an access to forces the relaxed atomic structure can be computed, transition
states can be identified, and dynamic properties can be expressed. In the following paragraphs we present
the basic concepts behind these schemes.
2.2.1 Quasi Newton
With the obtained forces the atomic structure of a system can be relaxed. One of the most efficient structural
relaxation schemes is the quasi-Newton scheme which will be implemented in S/PHI/nX. Quasi means to
avoid exploiting the actual Hessian (as used in all Newton-based schemes). The class of iterative quasi
Newton algorithms use in each cycle an approximation of the inverse Hessian H1called ˜
B. In modern
applications the BFGS10[99] quasi Newton scheme is one of the most successful optimization algorithms as
it is extremely efficient for large-scale problem. Several variations of this algorithm have been developed to
reach also O(n) scaling.
In S/PHI/nX the BFGS quasi Newton scheme has been implemented. As all quasi-Newtons schemes it starts
from an approximation of the Hessian. As initial guess the identity 1is used as approximation
BH=1(2.55)
˜
BH1.(2.56)
The gradient gcan be identified with the forces facting on each atom. The forces are obtained from either
the Hellmann-Feynman forces from the DFT potential or analytically/numerically from empirical potentials.
The gradient can be obtained from
g=f=δE
δτ .(2.57)
The atomic positions τare improved along the gradient taken the curvature into account
τ#=τ˜
Bg.
The gradient g#at the improved coordinates τ#reads
g#=f#=δE
δτ#.
10BFGS = Broyden, Fletcher, Goldfard, Shanno
54
CHAPTER 2. METHODS 2.2. STRUCTURAL PROPERTIES
The BFGS algorithm updates the approximation of the inverse Hessian like
˜
B#=$1syT
yTs%˜
B$1ysT
yTs%+ssT
yTs(2.58)
s=τ=τ#τ(2.59)
y=g=g#g.(2.60)
The approximation of the inverse Hessian is iteratively improved [100]
˜
B(n+1) =˜
B(n)˜
B(n)ssT˜
B(n)
sTBs yyT
yTs.(2.61)
2.2.2 Molecular dynamics
Properties which are dynamic in time, such as melting/solidification behavior, analysis of diffusion paths
and diffusion barriers, and the study of chemical reactions, require the description of the atom trajectories.
Therefore, Newton’s equation of motion Eq. (1.108) needs to be solved. In order to obtain the trajectories
an integrator over the time is introduced. For different statistical ensembles different integrator schemes are
applied. The Verlet integrator [101, 102] is commonly used for micro-canonical ensembles, the Nosé-Hoover
integrator [103] for canonical ensembles.
Verlet integrator One of the most commonly used integrators in the micro-canonical ensemble is the
Verlet integrator. It is known to be stable and provides time-reversibility. The atomic positions are expanded
into two third order Taylor series, one forward in time, one in reverse time
I:τ(t+t)=τ(t) + ˙τ(t)t+1
2¨τ(t)t2+1
6
...
τ(t)t3+O(t4)(2.62)
II :τ(tt)=τ(t) + ˙τ(t)t+1
2¨τ(t)t21
6
...
τ(t)t3+O(t4).(2.63)
Adding (I) and (II) yields the actual Verlet integrator
τ(t+t) = 2τ(t)τ(tt)+t2¨τ+O(t4).(2.64)
The truncation error is of order O(t4).The third order term of the Taylor expansions cancels out in the
Verlet integrator.
Subtracting (I) and (II) provides an expression for the velocities:
v(t)=τ(t+t)τ(tt)
2t+O(t2).(2.65)
In contrast to the atomic positions the velocities are only of order O(t2).
In principle, a more sophisticated algorithm such as higher order finite differences could be employed to
reduce the integration error beyond O(t4)to allow a longer time step. A longer time step would reduce
the computational time because the number of integration steps would be reduced. However, higher order
terms do not significantly increase the maximum stable time step. On the other hand, higher order terms
55
2.3. DERIVING THERMODYNAMIC PROPERTIES CHAPTER 2. METHODS
require both memory and additional computational effort compared to the plain Verlet integrator11. This is
the reason why in the realm of the micro-canonical ensemble the Verlet algorithm is very popular.
Nosé-Hoover thermostat While in calculations in the micro-canonical ensemble the energy E, the vol-
ume V, and the particle number Nis being kept constant (EVN ensemble) in the canonical ensemble the
temperature Tis kept fixed instead of the energy (TVN ensemble). The constant temperature constraint
is usually accomplished within a Nosé-Hoover thermostat [103]. Here an extended Lagrangian is being
introduced
LNose =
nat
!
τ
p2
τ
2Mτs2EBOS({τ})+1
2Q˙s2L
β.
In this equation Qdenotes an artificial mass of the additional degree of freedom. Lis the number of physical
degrees of freedom and the reciprocal temperature is β.The momentum pτis given as
pτ=LNose
∂τ =Mτs2˙τ.(2.66)
The Nosé Lagrangian leads to an extended Hamiltonian
ˆ
HNose =!
τ
p2
τ
2Mτ
+EBOS({τ})+1
2Q+Lln s
β.
2.3 Deriving thermodynamic properties
The S/PHI/nX framework will be applied in this work to compute thermodynamic properties (the phonon
spectra ω(T),the linear expansion coefficients α(T),the heat capacities CV(T)and Cp(T)) of III-V semi-
conductors. In the following paragraphs a method to compute these data from first-principles is sketched.
2.3.1 Free energy surface
Thermodynamic properties of crystals can be derived from the free energy surface F(T, V )spanned by the
temperature Tand the volume V. In case of non-magnetic crystals the free energy is decomposed into an
electronic and a vibronic contribution [52]
F(T, V )=Fel(T, V )+Fvib(T, V ).(2.67)
The latter equation holds when the adiabatic approximation applies and higher order contributions (e.g.,
due to spin-orbit coupling) are negligible. The electronic contribution to the free energy Fel can be obtained
in the finite temperature DFT [104] from the total energy Etot of the crystal and the electronic entropy Sel
Fel(T, V )=Etot(V)TSel,(2.68)
11Memory and computational efficiency is not strictly relevant for MD implementations acting on DFT potentials since the
computational effort of a MD algorithm is negligible compared to the computation of the Born-Oppenheimer surface. However,
our framework should also support (semi-)empirical potentials which are computationally less demanding. In this case efficiency
considerations of the MD implementation become important.
56
CHAPTER 2. METHODS 2.3. DERIVING THERMODYNAMIC PROPERTIES
while the electronic entropy Sel can be determined from the occupancies focc
ias
Sel =2kB!
i
(focc
iln focc
i+ (1 focc
i) ln(1 focc
i)) (2.69)
with the Boltzmann constant kB[52]. The prefactor 2 accounts for the spin-degeneracy of each quasi-particle
state. Note that focc is temperature dependent via Eq. (1.43). In this work we focus on semiconducting
systems for which we will show (see Sec. 4.1) that the temperature dependence of TSel plays only a minor
role.
The vibronic contribution of Eq. (2.67) reads [105]
Fvib(V, T )= 1
nat
3nat
!
i$1
2!ωi+kBTln "1e!ωi
kBT#%.(2.70)
Here the phonon frequencies ωiare eigenvalues of the dynamical matrix D
D(q)vi(q)=ωi(q)vi(q)(2.71)
and viare the corresponding eigenvectors at the wave-vector q.The dynamical matrix can be expressed in
reciprocal space [37]
Dµν(q)= 1
M
nat
!
iaiβ
FIFC,iαiβ
µνeiq·R.(2.72)
FIFC denotes the matrix of the interatomic force constants [106]
FIFC,iαiβ
µν=2Fel
τiaµτβν
(2.73)
which is generated by an atomic displacement τiaµof atom iαalong the direction µ. This displacement
induces forces on the atoms iβalong the directions ν.
There are different approaches to compute interatomic force constants. Originally, this has been done
by inverting the dielectric matrix [107] or, computationally much less demanding, using a perturbation
method using the Sternheimer equation [52] in the Density Functional Perturbation Theory (DFPT) [106,
108]. Alternatively, the interatomic force constants can be obtained using the direct method. The phonon
frequencies are obtained from total energy differences of the unperturbed and the perturbed structure (frozen
phonon method [1]) or, as used in this work, from forces acting on atoms in the distorted geometry using
HF theorem (see Sec. 1.7.1).
Perturbative methods are better suited for strongly localized phonon anomalies and the direct method
for shallow ones [109]. The direct method can be applied without the implementation of a perturbative
Hamiltonian and can be used with the current version of S/PHI/nX.
For systems for which the temperature dependence of TSel plays only a minor role (see above), the electronic
contribution Fel can be identified with the total energy at the equilibrium volume Veq
Fel(V, T )
=Etot(Veq).(2.74)
Within DFT the dynamical matrix can be determined by evaluating the interatomic force constants using
57
2.3. DERIVING THERMODYNAMIC PROPERTIES CHAPTER 2. METHODS
the Hellmann-Feynmann theorem. With Eq. (2.74) FIFC becomes [106]
FIFC
iαµ,iβν=2Etot
τiαµτiβν
.(2.75)
Following Ref. [37] the volume dependent total energy Etot(V)can be obtained by constructing a fit to the
Murnaghan equation
Etot(V)=Etot(Veq,T = 0) + BV
B#2B#<B#(1 Veq
V)+$Veq
V%B!
1=(2.76)
with the bulk modulus Band its derivative B#.The free energy surface can then be computed as
F(T, V )=Etot(V)+ 1
nat
3nat
!
i$1
2!ωi+kBTln "1e!ωi
kBT#%.(2.77)
The thermodynamic properties ω(q),α(T),and Cp,V (T)will be computed in this work from first-principles
by combining Eqs. (2.71), (2.72), (2.75), (2.76), and (2.77).
Thermodynamic properties The free energy surface defined in Eq. (2.77) provides access to various
thermodynamic properties [105], such as the thermal expansion #and its coefficients α, constant volume
heat capacity CVand constant pressure heat capacity Cpor the mode-Grüneisen parameters γ
P=$F(T, V )
V%Veq
(2.78)
#(T)=a(T)a(Tref )
a(Tref )(2.79)
α(T)= 1
a(T)
a(T)
T(2.80)
CV=T$2F(T, V )
T2%VV
(2.81)
Cp=T$2F(T, V )
T2%Vp
(2.82)
γ=V
3nat
3nat
!
i
1
ωi
dωi
dV .(2.83)
The thermal expansion #(T)is obtained from the lattice constant a(T)and the lattice constant at a reference
temperature Tref .
2.3.2 Born-effective charges
In polar crystals, such as the zincblende III-V semiconductor systems we investigate in this study, long range
electric fields emerge due to long-range longitudinal phonons. This effect is responsible for the disappearance
of the degeneracy between longitudinal and transversal optical phonon (LO and TO) at the center of the
Brillouin zone, also known as the LO-TO splitting phenomenon. Following Ref. [110], the origin of the
LO-TO-splitting can be easily understood when considering an optical phonon of a polar zincblende crystal
58
CHAPTER 2. METHODS 2.3. DERIVING THERMODYNAMIC PROPERTIES
Figure 2.6: Sketch of (a) LO and (b) TO phonon modes.
along the "111#direction. In this orientation the positive and negative ions lie in separate parallel planes. In
the LO phonon mode the ions are vibrating perpendicular to these planes (see Fig. 2.6). This is equivalent
to a capacitor with oppositely charged plates sliding apart from each other inducing an extra force onto the
plates. Analogously, due to the Coulomb interaction of the moving atomic planes perpendicular to the spatial
diagonal an extra force is introduced. This additional force causes a frequency shift of the corresponding
phonon. In the TO phonon mode the atoms vibrate within these planes. That is similar to a capacitor
with infinite parallel plates which are sliding along the plates while keeping the distance between the plates
constant. In contrast to the LO mode, here no additional force is induced and the phonon frequency is not
being affected. From the above picture it also becomes clear that the frequency shift depends on the direction
from where one approaches Γ.Depending on the approaching direction the distribution of the oppositely
charged ions differs and thus, the induced frequency shifts.
The couplings between the optical phonons and the electric fields are called Born effective charges Z
Z
κ,βα =Pβ
∂τκα(q=0)(2.84)
being the change of the macroscopic polarization Palong direction βcaused by an atomic displacement
along direction αwith absent external fields. With the electric enthalpy
Eelectr =E!
α
Pαεαand Pα=1
Eelectr
∂εα
(2.85)
the Born effective charge can be expressed in terms of the total energy or the forces as
Z
κ,αβ =2Eelectr
∂εβ∂τκα
=Fκα
∂εβ7777τκα=0
.(2.86)
This equation is equivalent to the above discussion. The induced charge is due to the change of the force
along αwhen considering a homogeneous electric field along β.
The coupling between phonons and electric fields affect the computed optical phonons near Γ[40]. The LO-
TO splitting affects only a very small region of the qspace. In this work we neglect the LO-TO splitting.
Conclusions
We presented the required theoretical and methodological concepts which are necessary to implement a
flexible framework to describe problems in CMD. We introduced various theories to describe ground state
properties such as density functional theory, (density functional based) tight binding, or the application
59
2.3. DERIVING THERMODYNAMIC PROPERTIES CHAPTER 2. METHODS
of empirical potentials. A brief overview on methods to obtain electronic, structural, and thermodynamic
properties based on these theories has been presented. With this knowledge a framework can be derived
which is sufficiently flexible to allow an implementation of state-of-the-art algorithms to compute a wide
range of material properties computationally efficiently and accurately.
60
Chapter 3
S/PHI/nX
Method development in the field of CMD is typically very time consuming. The implementation of new
basis-sets, minimizers, or structural algorithms can easily create workloads of months or years. A major
part of the time is spent in an iterative process of code implementation, testing/debugging, and a step to
optimize the run-time performance of the code. The developer must typically have deep programming skills
and a thorough understanding in numerics, computer science, and physics before an implementation can
begin.
In the first step of the development process (code implementation) the physical/algebraic algorithms have
to be transcribed to source code. The developer is often confronted with procedural programming leading
to function call with 20 arguments and more. For example, a typical source code reads1:
call opernla_ylm(choice,dgxdt,ffnlin,gx,ia3,idir,istwf_k,&
itype,kgin,kpin,lmnmax,matbtl,mincat,mlmn,ndgxdt,nincat,nloalg,&
npwin,ntype,ph3din,vectin)
call opernc_ypm(atindx1,choice,dgxdt,dgxdtfac,dimenl1,dimenl2,&
enl,gx,gxfac,iatm,indlmn,itype,lmnmax,mincat,mlmn,natom,ndgxdt,&
ndgxdtfac,nincat,ntype,paw,signs,wt)
In order to introduce changes in such a code a detailed knowledge of available functions, and the exact
meaning of all arguments is required. Required tasks like memory management and the numerical details do
not allow the developer to completely focus on the actual algorithm. This slows down the implementation
process significantly.
The testing and debugging sequence occupies typically significantly more time. Many program packages are
not strictly modularized. Modifications in one part of the code can often introduce unwanted side effects in
other routines. Locating such problems is often challenging. Furthermore, the explicit memory management
can lead to unpredictable run-time behavior. Therefore, the time needed for testing and debugging is often
the major part in code development.
Once the algorithm has been successfully transcribed and implemented the code needs to be optimized in
order to run efficiently on modern computers. This step requires a thorough understanding in numerics,
computer science, and computer hardware. New trends in HPC such as massive multi-core architectures,
1Taken from Abinit/Src_2nonlocal/nonlop_ylm.f.
61
CHAPTER 3. S/PHI/NX
multi-threaded programming, CUDA2/GPU3computing, grids/clouds have to be exploited when optimizing
algorithms to take advantage from the new hardware capabilities. The setup of data structures, communi-
cations between computer components, and the efficient usage of external libraries need to be considered.
It will be shown later that often only a fraction of the possible algorithmic optimization is reached in many
HPC program packages.
This work introduces a new concept of how CMD method development can be simplified. We aim at a
method which requires only rudimental programming skills from the physicists which allows to focus on the
implementation of the actual algorithm. Therefore, we introduced a physics meta-language which allows the
development of even complex quantum mechanical algorithms in the Dirac notation directly in the source
code.
In this chapter the techniques are presented which we developed in the scope of this work. We use our
new meta-language to create the efficient library and full-featured program package S/PHI/nX. It will be
demonstrated that in our approach the implementation of a DFT Hamiltonian using a plane-wave basis-set
can be accomplished in a very short and transparent source code (approx. 550 source lines). For example,
a quantum mechanical expression such as
Ψ(R)=!
G"R|G#"G|Ψ#
can be programmed in an almost text book like notation:
psiR = SUM (G, (R|G) * (G|psiG));
The entire approach has been derived under the constraint of achieving a very high run-time performance
on modern computer platforms with a source code which is as close as possible to mathematics/physics
textbooks. Therefore, we discuss our new techniques in an order dictated by performance considerations:
We begin with an analysis of the machine code generated by the common programming languages in
HPC (FORTRAN77, Fortran90/95, C, and C++) in order to identify the best suiting programming
language for this project.
Quantum mechanical expressions can be efficiently expressed in an algebraic formalism [52] which
requires an efficient numeric/algebra library in our approach. We derive new techniques addressing
automatic memory management and show how the run-time performance of functional programming4
can be significantly increased. This allows us an implementation of a high performance algebra library
with an interface reminiscent to algebraic textbooks.
We discuss techniques which we developed to create a quantum-mechanical meta-language and describe
how to get from a conventional modular programming via object-orientation to a functional Dirac-
notation meta-language.
We extend the set of techniques such that complex equations of motion can be comfortably imple-
mented.
2Compute Unified Device Architecture. CUDA is a parallel computing architecture developed by NVIDIA
3Graphic processing unit.
4Functional programming, i.e., defining algorithms in terms of mathematical functions, is necessary for a notation resembling
text books.
62
CHAPTER 3. S/PHI/NX 3.1. BASIS-SET INDEPENDENT IMPLEMENTATIONS
The chapter is completed by run-time performance benchmarks of S/PHI/nX in comparison to VASP.
Here we demonstrate that with our approach textbook-like source code can generate highly efficient
CMD programs.
3.1 Basis-set independent implementations
Modern physics offers two convenient notations for quantum mechanical equations: the matrix-oriented
Heisenberg notation (suggested by Heisenberg, Born, and Jordan [111]) and Dirac’s vector-oriented bra-
ket notation [35]. In the following we compare both styles with respect to the feasibility for developing a
meta-language for quantum mechanics.
3.1.1 Matrix notation
When performing numerical calculations the wave functions can be expanded into basis functions {bα}
Ψi=!
α
Cαibα.(3.1)
Here, the Cαidenote the expansion coefficients of the wave function Ψi.Thus, for numerical calculations the
wave function can be treated as the matrix Cαβ
C=
C11 C12 ··· C1m
C21 C22 ··· C2m
.
.
..
.
.....
.
.
Cn1Cn2··· Cnm
(3.2)
with each state/band ibeing a single column of that matrix. The dimension of the coefficient matrix is
determined by nbasis functions and mstates.
Quantum mechanics makes heavy usage of operators. An arbitrary hermitian operator ˆ
Awhich acts on Ψ
can also be expressed as a matrix
C#
11 C#
12 ··· C#
1m
C#
21 C#
22 ··· C#
2m
.
.
..
.
.....
.
.
C#
n1C#
n2··· C#
nm
=
A11 A12 ··· A1n
A21 A22 ··· A2n
.
.
..
.
.....
.
.
An1An2··· Ann
C11 C12 ··· C1m
C21 C22 ··· C2m
.
.
..
.
.....
.
.
Cn1Cn2··· Cnm
.(3.3)
Besides the wave functions basis-set operators like the overlap Oand the Laplacian L5can also be defined
as matrices
5We follow here the nomenclature presented in Ref. [67].
63
3.1. BASIS-SET INDEPENDENT IMPLEMENTATIONS CHAPTER 3. S/PHI/NX
Oαβ =drb
α(r)bβ(r)(3.4)
Lαβ =drb
α(r)2bβ(r),(3.5)
where Oαβ =δαβ for orthonormal basis-sets.
Using this set of operators and functions any Hamiltonian can be built up independently of the basis-set.
For the sake of simplicity we demonstrate the kinetic operator in a matrix notation which reads
T=1
2!
i
drΨ
i(r)2Ψi(r)(3.6)
=1
2!
iαβ
C
αiLαβCβi(3.7)
=1
2tr(CLC).(3.8)
This formulation can easily be applied to all other contributions of the Hamiltonian as well. It has been
successfully applied in the DFT++ project of the Arias Research Group Initiative [67]. The computational
advantage of this approach is, that blocked algorithms can be used and highly optimized matrix-matrix nu-
meric libraries (BLAS6, LAPACK7) can be applied which improves the executional performance drastically.
On the other hand, depending on the basis-set the wave function and operator matrices might tend to be
very large, perhaps even too large to keep them in the computer’s memory. In the Arias approach a fallback
to the memory friendly vector-vector implementation is not straightforward. Hence, the choice of a vector
or matrix representation is a compromise between memory consumption and performance. Therefore, one
goal in this work is to find a solution that supports vector and matrix representation equally, depending on
the algorithmic and computational needs.
3.1.2 Bra-Ket notation
Besides the matrix-oriented Heisenberg notation nowadays the Dirac notation has become a standard lan-
guage in quantum mechanics because it allows “hiding” the basis-set details and focusing on the physical
content instead. Generally, a Dirac vector is symbolized as |Ψ#,and is an element of an abstract Hilbert
space which cannot be used for performing actual calculations. In order to define a wave function numerically
a basis-set "B|is necessary on which the wave function can be projected (sampled) on, e.g.,
Ψ(B)="B|Ψ#.(3.9)
How that expansion looks in detail is not specified in the bra-ket notation. The Dirac notation also allows
projections between different basis-sets, e.g,
6BLAS = Basic Linear Algebra System. BLAS is a set of highly optimized algebra functions organized in BLAS-1 (vector-
vector operations), BLAS-2 (vector-matrix operations), and BLAS-3 (matrix-matrix operations). BLAS libraries are typically
provided by the HPC vendors and are specially optimized for their platforms, e.g., AMD-ACML, Intel-IMKL, IBM-Essl, HP-
Veclib.
7LAPACK = Linear Algebra Package. LAPACK provides high level algebra algorithms such as matrix inversions or singular
value decompositions based on BLAS.
64
CHAPTER 3. S/PHI/NX 3.1. BASIS-SET INDEPENDENT IMPLEMENTATIONS
Ψ(X)=!
B"X|B#"B|Ψ#="X|Ψ#.(3.10)
An arbitrary hermitian operator ˆ
Awhich acts on |Ψ#yields a vector |Ψ##
ˆ
A|Ψ#=|Ψ##.(3.11)
Multiplying with the adjoined bra vector gives the expectation values (matrix elements)
"Ψi|ˆ
A|Ψj#="Ψi|Ψ#
j#.(3.12)
In order to define a basis-set independent Dirac-like implementation, basis-set dependent operators like the
overlap ˆ
Oor a Laplacian ˆ
Lhave to be defined. Then, based on those basis-set dependent operators a new
Hamiltonian can be built up which depends only on these operators.
A special operator is the density matrix
ˆ%=!
i|Ψi#"Ψi|(3.13)
which can be used to express expectation values according to
"Ψi|ˆ
A|Ψi#= tr(ˆ%ˆ
A).(3.14)
Of course, both formulation styles, the matrix and the Dirac notation, can easily be connected with each
other. So the Dirac ket vector |Ψ#can be expressed as a column vector of its expansion coefficients
"B|Ψi#=
C1i
C2i
.
.
.
Cni
(3.15)
and its adjoined as a bra row vector, which is
"Ψi|B#=(C
i1C
i2···C
in).(3.16)
The scalar product of a bra and a ket8is then simply9
"Ψ#
i|B#"B|Ψj#⇒!
k
C
ikCkj =(C
i1C
i2···C
in)
C1j
C2j
.
.
.
Cnj
.(3.17)
8This is why the Dirac notation often is also referred as bra-ket notation.
9This works only if both vectors are given in a complete orthonormal basis. Otherwise a metric has also to be introduced.
65
3.1. BASIS-SET INDEPENDENT IMPLEMENTATIONS CHAPTER 3. S/PHI/NX
Such scalar products are massively used in quantum mechanics. Numerically they can be comfortably
evaluated using BLAS function calls.
Most of the algorithms in literature are presented in the bra-ket formalism. If a programming language would
support this notation implementation of quantum mechanical program packages would be dramatically
simplified. In the following sections we derive the necessary techniques to extend existing programming
languages such that a support of the flexible Dirac formalism becomes possible.
3.1.3 Programming languages
The support of the Dirac notation directly in a programming language is not trivial. In the Dirac formalism
similar terms require the execution of very different operations depending on the physical context. For
example, the meaning of the term "A|B#depends on what Aand Bis:
"X|Ψ#⇒
identity if Ψ=Ψ(X)
projector from Bto Xif Ψ=Ψ(B)
(3.18)
"Ψ|Ψ#⇒scalar product (with metric) (3.19)
"X|B#⇒function call to projector between Band X(3.20)
ˆ
A|Ψ#⇒evaluation of an operator A such as ˆ
O,ˆ
L,ˆ
H(3.21)
In order to assemble machine code, the compiler has to recognize the physical meaning. For a physi-
cist/chemist this discrimination of the above cases is trivial while “teaching” the physical context to a
compiler is challenging and requires the combination of various disciplines of computer science and quantum
physics. A programming language has to be identified that provides the flexibility to allow an extension for
“understanding” the basics of a quantum-mechanical language. Therefore, in this section we compare the
common programming languages used in HPC with respect to the efficiency of the generated machine code
and the provided flexibility in the source code.
Most of the scientific software so far has been written in FORTRAN77 or Fortran90/95. These languages
generate fast machine code but do not provide the huge freedom to the developer as C or C++ do. Unfortu-
nately, C and in particular C++ is slow in the field of numerics [112]. The slower executional speed of C++ is
also referred to as abstraction penalty. The key in overcoming the abstraction penalties is the understanding
of how both languages define pointers10. In the following a brief analysis of pointer arithmetics and memory
hierarchies will be presented.
Performance of vector operations
FORTRAN represents vectors in form of arrays. When accessing the i-th vector element FORTRAN in-
ternally interprets it as a pointer to an address with the offset nbytes iwith nbytes being the number of
bytes used for storing one element, e.g., 8 bytes for a REAL*8. Here it is important that FORTRAN’s
pointer can dereference11 only addresses that lay inside the range belonging to the vector. The FORTRAN
compiler makes sure that the pointer cannot refer to any other address in the memory. Also the usage of
10Pointer = address in memory. A pointer to a variable “points to” the address of that variable.
11dereferencing = resolving the value at the memory address specified by the pointer.
66
CHAPTER 3. S/PHI/NX 3.1. BASIS-SET INDEPENDENT IMPLEMENTATIONS
Fortran90/95’s pointer type does not allow referring to an address outside the assigned vector12. This re-
stricted definition of a pointer has a big advantage for high-performance numerics: the compiler makes sure
that two pointers cannot overlap. Such non-overlapping pointers are called restricted pointers. Assuming
the vector-vector operation y=y+ax,a FORTRAN source code would read
do i=1,n
y[i] = y[i] + a*x[i]
end do ! i
Because the (internal) pointers of y[i]=ystart +nbytesiand x[i]=xstart +nbytesicannot overlap by definition,
the FORTRAN compiler will unroll the loop over iand split up the expression. During the optimization
sequence the compiler generates an intermediate code like
do i=1, n, 4
y[i] = y[i] + a*x[i] ! performed in 1st CPU pipeline
y[i+1] = y[i+1] + a*x[i+1] ! performed in 2nd CPU pipeline
y[i+2] = y[i+2] + a*x[i+2] ! performed in 3rd CPU pipeline
y[i+3] = y[i+3] + a*x[i+3] ! performed in 4th CPU pipeline
end do ! i
The 4 assignments are completely independent from each other (because the pointers are restricted and
cannot overlap). Thus, they can also be evaluated independently in the 4 parallel software-pipelines of the
CPU13. As a result the executional speed increases roughly by a factor of 4!
In C/C++ the situation is more complicated because here the concept of pointers is by far more flexible.
This flexibility allows the realization of object-oriented techniques. In C/C++ a pointer can always overlap
with another pointer14. In the previous example the C/C++ compiler cannot assume that the 4 assignments
are really independent (xcan also point to some area of y). In this case unrolling of the iloop would not
lead to a parallel execution in the CPU’s pipelines. However, with this knowledge in mind it is simple to
overcome this problem manually by loading the values into local variables. The compiler always assumes
variables to be independent from one another. The corresponding C code (which yields the same speed as
the optimized FORTRAN code) reads
for (i=0; i < n; i+=4) {
x0 = x[i+0]; x1 = x[i+1]; x2 = x[i+2]; x3 = x[i+3];
y0 = y[i+0]; y1 = y[i+1]; y2 = y[i+2]; y3 = y[i+3];
y[i] = y0 + a*x0; ! performed in 1st CPU pipeline
y[i+1] = y1 + a*x1; ! performed in 2nd CPU pipeline
y[i+2] = y2 + a*x2; ! performed in 3rd CPU pipeline
y[i+3] = y3 + a*x3; ! performed in 4th CPU pipeline
}
Of course, this manual unrolling can be implemented in low-level libraries and can so be hidden from the
developer.
12The only exception is the NULL address to indicate an not associated pointer.
13The number of available pipelines depends on the architecture. We discuss here the situation for an Intel Xeon platform
with 4 pipelines.
14Bjarne Stroustrup, who developed the C++ language, suggested a new pointer attribute restrict to support the usage of
restricted pointers. However, the introduction of restricted pointers was not approved by the ANSI C++ committee. Hence, it
is not supported by all C++ compilers and we, therefore, avoid it.
67
3.1. BASIS-SET INDEPENDENT IMPLEMENTATIONS CHAPTER 3. S/PHI/NX
Figure 3.1: Memory hierarchy in modern computer architectures. Since most numerical operations have
to reuse certain elements of the memory (such as matrix operations) a hierarchy of memories is nowadays
implemented in all modern computer platforms. Fast memory is expensive and very limited in space. The
fastest memory are CPU registers and Level 1 Caches located close to the math unit (access time: ns, size:
100 bytes). To buffer data from the RAM a hierarchy of 3 level caches is provided inside the CPU. The
RAM is connected to the CPU via a (slow) data bus (100ns, GB). Scratch space refers to the local hard
disk while WFS are global file systems mounted via a slow network. The massively accessed elements can
be first loaded into the smaller but faster memory. This way the operations can be performed much faster.
Depending on the number of available memory levels this approach can be recursively repeated.
Matrix operations
The other extremely important performance issue is the usage of the CPU’s level caches15 (see Fig. 3.1).
Matrix operations can greatly benefit from such architectures. This can be easily demonstrated by means
of a simple matrix multiplication Cij =,kAikBkj.A pseudo code of a naive implementation would read
do i=1, n
do j=1, n
do k=1, n
C[i,j] = C[i,j] + A[i,k] * B[k,j]
end do ! k
end do ! j
end do ! i
In this implementation the matrix multiplication involves 2n3arithmetic operations if “+” and “*” are
counted separately. The memory demand for the three matrices A,B, and Care in total as large as 3n2.If A,
B, and Cdo not fit into the CPU’s level caches the matrix elements of Aand Bhave to be loaded repeatedly
from the RAM through the slow data bus. If Aand Bare subdivided into smaller b×bmatrices such that
the fast level caches can accommodate the blocks the performance speed can be dramatically increased. An
improved algorithm would read
do i=1, n
15The level caches are memory banks integrated directly inside or placed very close to the CPU. Due to the short distances
to the math units accessing of the level cache memories is extremely fast compared to the slow access to the RAMs via the
data bus. Modern CPUs are equipped with a hierarchical arrangement of caches, most common are 3 levels of different sizes,
ranging from a few kB (level 1 cache, fastest access) to 2 MB (level 3 cache).
68
CHAPTER 3. S/PHI/NX 3.1. BASIS-SET INDEPENDENT IMPLEMENTATIONS
do j=1,n
do k=1, n
! --- loop over blocks
do iBlk=i, min(i+b-1, n)
do jBlk=j, min (j+b-1, n)
do kBlk=k, min(k+b-1,n)
C[iBlk,jBlk] = C[iBlk,jBlk] + A[iBlk,kBlk]*B[kBlk,jBlk]
end do ! kBlk
end do ! jBlk
end do ! iBlk
end do ! k
end do ! j
end do ! i
This implementation results in exactly the same arithmetic operations as in the first example. Only the
sequence differs between the unblocked and blocked version. Consider a single iteration at a fixed tuplet of
(i,j,k). The 3b3arithmetic operations that are performed in the inner loops operate on data blocks with the
size 3b2.If b has been chosen small enough that the data blocks fit into the cache the data transfer between
the slower RAM can be avoided. Depending on the computer platform, the speed up is up to several orders
of magnitude! The standardized numeric libraries BLAS and LAPACK are highly focusing on such level
cache techniques.
3.1.4 BLAS/LAPACK interfaces
Procedural vs. functional interfaces
An integral part of this thesis is the development of an intuitive programming interface that automatically
translates algebraic expressions into highly optimized function calls exploiting the above described issues.
BLAS and LAPACK provide highly efficient numeric subroutines, but using their interfaces is cumbersome,
e.g., a multiplication of a double precision general matrix Aand matrix Brequires a subroutine call with 13
arguments:
call DGEMM (tranA, tranB, m, n, k, alpha, A, lda, B, ldb, beta, c, ldc)
We refer to the BLAS documentation for the meaning of the arguments. When optimizing a program
package every algebra expression has to be replaced with such subroutine calls. In turn, the source code
becomes very difficult to read and to maintain. Furthermore, the large amount of subroutine arguments in
BLAS/LAPACK calls can slow down the development process. With various indices the developer might
introduce inconsistent arguments which are often not detected by the subroutine. The runtime behavior
can become unpredictable. The identification of the problem (often referred to as debugging) can be very
challenging. Due to the cumbersome subroutine interfaces of BLAS/LAPACK in many program packages
only key algorithms are optimized accordingly and the executional performance suffers.
In order to simplify the handling in many simulation packages wrapper procedures to the most important
BLAS/LAPACK calls have been developed. Here, the additional information like dimensions are often
hidden in modules/classes to reduce the number of arguments drastically, for example:
69
3.1. BASIS-SET INDEPENDENT IMPLEMENTATIONS CHAPTER 3. S/PHI/NX
call MatrixMult (result, A, B)
Implementing algorithms based on such wrappers is much simpler than performing direct BLAS/LAPACK
calls, but such handling it still not intuitive. Algebraic equations could be better expressed using a functional
programming interface instead of calling subroutines. Therefore, some high-level programming languages
(like Fortran90/95 or C++) support the definition of own data types in combination with overloaded op-
erators. Using this technique unique data types for vectors or matrices can be defined which provide the
proper BLAS/LAPACK wrapper procedures. Operators, like * can be overloaded to execute the actual
BLAS/LAPACK DGEMM call. In program packages which provide such an algebra interface the above example
can now easily be transcribed to
MyMatrix :: result, A, B
A = ...
B = ...
result = A * B
The multiplication operator is simply overloaded such as
MyMatrix operator* (Matrix A, Matrix B)
{
Matrix result
call MatrixMult (result, A, B)
return result
}
The matrix elements can be of different data types, such as integer, single/double real, or single/double
complex. That means that the operator function has to be provided for all combinations of possible data
types. Furthermore, the actual BLAS/LAPACK interface depends on the matrix type (general, symmetric,
hermitian, trigonal packed, etc.). Therefore, the multiplication function needs to support all combinations of
data and matrix types. These combinations have to be considered for every algebraic operation. Although
such an approach would offer an efficient and intuitive programming interface for algebraic expressions, the
manual development of such a library would not be feasible.
In C++ functions can be defined as templates. In this programming technique functions can be developed
once for all possible data types. The actual type will be kept as template argument, e.g., <T>. During
the compilation, template arguments are replaced with proper data types. Hence, the compiler can create
functions with all possible combinations of types16. A simplified C++ source code could read as follows:
template<class T>
MyMatrix<T> operator* (MyMatrix<T> a, MyMatrix<T> b)
{
MyMatrix<T> result;
MatrixMult (&result, a, b);
return result;
}
16Since the compiler generates source code from generic types, such approach is often called generic programming.
70
CHAPTER 3. S/PHI/NX 3.1. BASIS-SET INDEPENDENT IMPLEMENTATIONS
This operator can be invoked for various data types:
MyMatrix<double> a = ..., b = ..., c;
MyMatrix<complex16> d = ...; e = ...; f;
c=a*b;
f=d*e;
Information about dimensions and matrix/vector element types are provided only during the declaration.
The source code addressing the actual algorithm is kept free of memory management and BLAS/LAPACK
function call mappings. Such an intuitive handling of algebraic expressions can speed up the development
and debugging process significantly. The task of memory management and numeric function mapping is
accomplished fully automatically at compile time. In contrast to a manual optimization, the compiler replaces
systematically all algebraic expressions with highly efficient subroutines. A manually written algebra program
can be as fast as such a generic library only if all numeric expressions have been replaced with proper calls
thoroughly.
A functional programming ansatz has the huge advantage of providing an interface reminiscent to equations
in textbooks. However, the executional performance in C/C++ is still tremendous. For example, a similar
algebra library forms the basis of the DFT++ project [67]. A significant performance drop arises from
the way of how functions return variables. The return variable is typically a local variable of the function
(variable result in the above examples). This variable is destroyed at the end of the function body. In
order to return it to the calling routine, the variable is copied onto the stack and then copied from the stack
into the destination variable. This copy operation involves data transactions between RAM and the CPU
via the data bus. Note that such data bus operations can be as expensive as the actual numerical operations.
When evaluating a vector expression like
a=α(b+c)d.(3.22)
various temporary objects (t1...t3) will be created
a=α(b+c)
&'( )
d(3.23)
=αt1d
&'() (3.24)
=αt2
&'() (3.25)
=t3.(3.26)
In BLAS level 1 vector and copy operations require approximately the same executional time. That means
that in the above example 7 copy operations (from and to the stack: 2×tiand copy t3$→ a) outweigh 3
numerical operations (a=α(b+c)d).Due to these additional copy operations functional programming can
be significantly slower than procedural programming approaches.
In order to address this issue, in S/PHI/nX the technique of reference counting has been adapted to algebraic
functions. In this technique an additional counter is attached to every data array. It counts how many
variables refer to a vector/matrix. When a vector/matrix is being declared the data array and a counter
with the value 1 is created. Every time a variable (or a temporary object) is assigned, the reference counter
is increased. Instead of copying the full data array only the pointer is being copied. If an object is being
71
3.1. BASIS-SET INDEPENDENT IMPLEMENTATIONS CHAPTER 3. S/PHI/NX
destroyed, the reference counter is decreased. The data array itself is deallocated when the reference counter
reaches zero. Instead of returning the full data array only the pointer (8 byte) and the reference counter (4
byte) are copied to and from the stack. In case of the previous example only 56 bytes are copied. By applying
reference counting, functional programming becomes as efficient as procedural techniques. These techniques
allow to shift memory management, BLAS/LAPACK function call mapping, and data type handling entirely
to the compiler.
Computation of Data Types
In the previous paragraphs it has been shown that reference counting and C++ templates allow an efficient
functional programming approach in contrast to methods applied in most conventional packages which
facilitate procedural programming. In functional programming all operations can act on temporary objects.
Consider a vector expression Eq. (3.22). For every temporary object tithe compiler needs to know the data
type in order to perform the numerical operation. This is, however, not trivial when using C++ templates.
In case of many matrix expressions the optimal data type of the resulting temporary object depends also on
the matrix type and cannot be expressed simply with a template type <T>. The problem can be illustrated,
e.g., by means of the efficient computation of eigensystems which are crucial in many parts of S/PHI/nX
(e.g., tight-binding initialization, Löwdin orthogonalization). The eigenvalues of a general complex matrix
Mare complex while those of a hermitian matrix Hare real, i.e.,
Mm =mMm#C
Hh =hhh#R,H
ij =H
ji.
If this is not considered in the library, all subsequent operations on the eigenvalues would waste CPU time
and memory17. Therefore, a technique to compute the computationally optimal data type of temporary
algebraic expressions has been developed and implemented in S/PHI/nX. The basis of this approach are
“S/PHI/nX type mappers” which create relationships between the various data types. For every type (float,
double, single complex, double complex) relations to their corresponding real and complex counterparts (with
the same precision) are defined. Any S/PHI/nX data type Tcan be linked to the matching real or complex
type. For example, the type mapper expression T::Real refers to the real value of the T. If T is already a
real type, T::Real is replaced with T. The same applies for T::Complex”. If Tis a float T::Complex
is replaced with the single precision complex type at compile time. All functions in the S/PHI/nX algebra
library use such type mapper definitions consistently.
The function to compute eigenvalues of general matrices returns the S/PHI/nX type mapper T::Complex”.
If the general matrix was declared as matrix of double values, the resulting eigenvalues are automatically
double complex. The function to compute eigenvalues of symmetric/hermitian matrices on the other hand
returns T::Real”.
The technique of S/PHI/nX type mappers makes sure that even for temporary expressions, the computa-
tionally optimal data type is always being used for the evaluation. For example, the code18
double a = 10.;
17For example, the multiplication of complex values is computationally 4 times more demanding then the multiplication of
real values.
18The C++ statement cout < < a < < endl prints the value of a.
72
CHAPTER 3. S/PHI/NX 3.1. BASIS-SET INDEPENDENT IMPLEMENTATIONS
SxMatrix<Complex16> M = ...;
SxSymMatrix<Complex16> H = ...;
cout < < sizeof (a * M.eigenvalues()(0)) < < endl;
cout < < sizeof (a * H.eigenvalues()(0)) < < endl;
prints the size (in bytes) of the temporary expressions of the first scaled eigenvalue of a general matrix M
and a hermitian matrix H, respectively. The first one is 16 bytes since eigenvalues of a generalized matrix
are complex while the second one returns automatically 8 byte, i.e., the size of a double.
S/PHI/nX computes the required precision (single / double) of temporary entities. Complex values are
applied only if necessary, otherwise the computationally less demanding real types (float, double) will be
used. Hence, during the evaluation of algebraic expressions memory consumption can be saved and the
number of performed floating point operations can be reduced to a minimum (e.g., number of operations for
multiplications). This is crucial when considering algebraic expressions applied to large vectors/matrices.
Automatic error detection
The previously introduced techniques which have been applied/developed to provide an intuitive algebra
interface simplify the development process drastically. A major time in the development process is, however,
being spent in the debugging process. Typically, a significant amount of time is required in identifying typical
errors related to memory management, usage of inconsistent variables, or array index boundary violations.
The S/PHI/nX project aims at defining an environment for efficient development of numeric and physical
algorithms. Therefore, it is necessary to address the issue of an automatic identification of typical errors
related to numerics and physics.
The previously introduced restricted pointers in Fortran allow the vendors of (some) Fortran compilers to
provide a very important feature for developers, namely the array boundary checks. Whenever a restricted
pointer exceeds array boundaries, an error is being emitted. Such boundary violations typically occur
during the process of code development. This Fortran compiler feature can simplify the step of debugging
significantly. The typical behavior of the compiler’s run-time environment is to cause a program stop when a
boundary check was failing. Some compilers also print the code line in which the error occurred. With such
an approach only the function in which the violation occurred can be identified. However, the information
which is mainly required is the location of the calling procedure instead. Assuming a Fortran pseudo code
which provides a function trace that computes a trace of a matrix:
Matrix :: M (n:n)
m=n+10
a = trace (M, m) ! ERROR: m exceeds matrix dimensions
subroutine trace (M, n)
tr = 0
do i=1, n
tr = tr + M(i,i) ! program stops here
done
return tr
Fortran’s array boundary check would identify the boundary inconsistency in the trace subroutine. The
required information is, however, not the source code line where the error occurred but where the trace
73
3.1. BASIS-SET INDEPENDENT IMPLEMENTATIONS CHAPTER 3. S/PHI/NX
function was called from. For many applications the execution of a program inside a debugger is often not
an option as the execution is too slow for many realistic cases. The time spent for the location of similar
problems exceeds often the time of the actual implementation of an algorithm.
In C/C++ the more flexible approach to pointers exclude such an array boundary check. In C/C++
boundary violation can lead to program flaws which might be very difficult to locate. Consider for example
the example code
int i = 1;
int a[2];
a[2] = 10; // ERROR: boundary violation!
cout < < ‘‘i = ‘‘ < < i < < endl; // yields: ‘‘i = 10’’!
The variables will be defined in reverse order on the stack, i.e., a[0],a[1],i. Therefore, the memory address
of iis equivalent to that of a[2]. By executing this example code the value of variable iis overwritten19
by the write access of a[2]!
Before developing a complex simulation code in C++ a mechanism has to be found to exclude such severe
problems. Many huge software projects (such as the Linux kernel20, the Visualization Tool Kit21, or KDE22)
provide boundary checks in many functions. Typically, when a violation is detected a warning message
is being printed and the program continues. We consider this standard approach as not suitable for the
development of a simulation code.
In S/PHI/nX we introduced an unconventional approach to address this problem. Similar to other C/C++
software projects also in S/PHI/nX functions provide CHECK macros which verify boundaries23. Instead
of printing warning messages a memory fault is being initiated intentionally, namely
#define SX_CHECK (expr) if (!expr) { printSomeMessage(); *0 = 1; }
Every function in the S/PHI/nX library first verifies boundary accesses using such macros. If a violation is
detected the expression *0 = 1 is being evaluated. It tries to assign a value to the pointer to the NULL
address. This address is, however, located outside of the program segment and, therefore, protected by the
operating system. As a result the operating system initiates a segmentation fault and dumps a core file. The
generated core file contains all information (variables, stack, thread data) about the process at the time the
program stopped. In this case, the core file contains the data of the program when a function was being called
with inappropriate arguments. In contrast to Fortran’s boundary checks our solution allows an analysis of
a core file. A debugger can be used to backtrace the stack after the program execution and the calling
function (which caused the misusage) can be easily identified. It is worth mentioning that this approach is
significantly faster than executing the program directly in the debugger24. Therefore, this method works
even for larger test systems.
19This effect is known as buffer overflow and heavily exploited in various computer hacking techniques.
20http://www.kernel.org
21http://www.vtk.org
22http://www.kde.org
23All CHECK macros are activated only in the debug mode. When S/PHI/nX is compiled in the release mode which enables
all relevant optimization switches of the compiler, all CHECK macros are ignored.
24In order to identify program flaws the code is typically executed directly in the debugger. In this case every function
call and memory access is rerouted through the debugger to monitor all stack operations. This slows down the executional
performance by a factor of up to 10 (e.g. GNU Debugger gdb 6.6) compared to our approach.
74
CHAPTER 3. S/PHI/NX 3.2. THE DIRAC NOTATION IN S/PHI/NX
This technique which is the basis of the automatic error detection in S/PHI/nX provides macros beyond mere
boundary checks. The library is equipped with macros to verify expressions with respect to divisions by zero
(SX_CHECK_DIV). Furthermore, macros to verify mathematical identities are provided (SX_CHECK_VAR). This
macro verifies whether a value is within proper limits. For example, the sqrt function that can be element-
wise applied to vectors rejects vectors with negative elements25. This approach exceeds by far conventional
assertation functions since physical identities can be validated. For example, if an algorithm expects an
input matrix Hto be hermitian it would test SX_CHECK (H.isHermitian())”. Source code which calls the
function with non-hermitian matrices can be instantly identified.
Many Fortran programs do not specify initial values for variables. This can be very dangerous since unini-
tialized variables depend on the value of the memory at the time the variable is constructed. Many Fortran
compilers provide a possibility of initializing all variables with zeros. This is both time consuming and
highly unportable as not all Fortran compilers provide this feature. In S/PHI/nX the error detection mech-
anism has been expanded to track down also uninitialized values without sacrificing executional speed. All
floating point variables are initialized with sqrt(-1) (instead of 0) which is NaN (not a number). All
vector/matrix operations are checked for the existence of NaN values. If an algorithm omits the initialization
of even a single element S/PHI/nX instantly triggers a segmentation fault and the function in question is
identified.
The techniques developed in this project are able to identify the most common development mistakes auto-
matically which improves the speed for development dramatically.
Benchmark results
The above mentioned techniques have been developed and implemented in a general algebraic C++ class
library. This algebraic library, which is called SxMath26, forms the backbone of our ab-initio program
package. The efficiency27 of this library is illustrated in Fig. 3.2. For typical vector and matrix sizes
commonly used in plane-wave applications performance benchmarks with respect to vector-vector, vector-
matrix, and matrix-matrix operations have been performed. The performance that can be obtained with
SxMath can significantly outperform existing algebra libraries, such as the BOOST library [113]. The poor
performance of Fortran’s matmul is due to the fact that most compilers do not use BLAS/LAPACK by
default. Some compilers, however, support compiler flags to call one of the _GEMM BLAS calls. Even if
the compilers support it, matmul cannot benefit from matrix shapes and matrix packing schemes. The good
performance of SxMath in all BLAS levels is due to the consistent combination of above discussed techniques.
SxMath also provides an intuitive algebra interface. In Tab. 3.1 some illustrative examples are presented.
The syntax is reminiscent to high level toolkits such as Mathematica28 or Maple29.
3.2 The Dirac notation in S/PHI/nX
The SxMath library offers a comfortable way to express algebraic equations while guaranteeing high per-
formance. A major challenge of this project was to extend this library such that the C++ compiler “un-
25This is only an example to demonstrate the concept. In S/PHI/nX a sqrt function which returns complex values in this
case has been implemented as well.
26SxMath: S/PHI/nX-Mathematics
27The codes have been compiled with pgf77 and g++, respectively.
28http://www.wolfram.com
29http://www.maplesoft.com
75
3.2. THE DIRAC NOTATION IN S/PHI/NX CHAPTER 3. S/PHI/NX
0 20000 40000 60000 80000
Vector size
0
0.5
1
1.5
2
CPU time (a.u.)
S/PHI/nX X-Press
Boost
FORTRAN
(a) BLAS level 1
1000 2000 3000 4000 5000
Vector size
2
4
6
8
10
CPU time (a.u.)
S/PHI/nX X-Press
Boost
FORTRAN
(b) BLAS level 2
1000 2000 3000
Matrix size
0
100
200
300
400
CPU time (a.u.)
S/PHI/nX X-Press
Boost
FORTRAN
(c) BLAS level 3
Figure 3.2: Benchmark results of algebra operations using conventional FORTRAN programming (matmul),
BOOST, and the SxMath library. The vector/matrix sizes have been varied within ranges similar to real
plane-wave calculations. A slower slope means better performance. The tests were performed on an Intel P4,
1.6 GHz. The available memory of 512 MB RAM made sure that the tests were not influenced by swapping
or paging.
(a) Vector-vector operations: The level caching cannot be used, the main speed-up can be obtained by com-
bining loop unrolling and software-pipelining (see example on page 66). Several operations are performed
in parallel on the CPU’s pipelines. As test equations of the form y=ax+yhave been chosen. Such equa-
tions are heavily used in the Gram-Schmidt-orthogonalization, in the update of the wave functions, and the
computation of the non-local pseudo potential energy. Note, that for systems containing many atoms (and
many states) this routine becomes crucial with respect to executional speed. Both SxMath and BOOST are
equally fast, the FORTRAN test performs slightly slower.
(b) Matrix-vector operations: BLAS level 2 allows to use both software-pipelining and level caching. The
test system is y=Mx. Matrix-vector multiplications are applied across the S/PHI/nX package, in particular
when updating the gradient |Ψ#=ˆ
H|Ψ#.FORTRAN’s matmul has a poor performance. The BOOST library
is almost twice slower than SxMath.
(c) Matrix-Matrix operations: The major speed-up in BLAS level 3 can be accomplished mainly by level
caching. A typical candidate to benchmark matrix-matrix operations is the matrix multiplication A=MC
as used in the tight-binding initialization and the all-state-conjugate gradient as well as the wdin orthog-
onalization. Hence, this operation becomes crucial for all kind of semiconductor/insulator calculations.
76
CHAPTER 3. S/PHI/NX 3.2. THE DIRAC NOTATION IN S/PHI/NX
Equation S/PHI/nX
z=ax+bz = a*x + b;
Cik =,jAijBjk C=A^B;
Aij =M1
ij A=M.inverse();
M=LLtL=M.choleskyDecomposition();
S1=LLtL=S.inverse().choleskyDecomposition();
Table 3.1: Example of S/PHI/nX expressions using the SxMath library. We tried to keep the S/PHI/nX
source code as close as possible to common mathematical toolkits (such as Mathematica). At compile time
the code is replaced with the corresponding BLAS/LAPACK routines. Therefore, the evaluation of the
highly abstract function calls can be accomplished without any loss in performance compared to an explicit
BLAS/LAPACK function call.
derstands” the Dirac notation. The highly abstract level of the Dirac notation separates the numerical and
basis-set details from the actual algorithm (see Sec. 3.1.2). In order to implement a quantum mechani-
cal algorithm based on expressions given in the Dirac formulation, the numerical and physical context has
to be considered by the developer. In the following sections we derive new techniques which allow the
C++ compiler to put Dirac expressions in the proper contexts and hence, to perform some of the tasks
of physicists/chemists when developing quantum mechanical program packages. It will be shown that the
new techniques are able to generate code which is at least as efficient as conventionally developed program
packages while a quantum mechanical interface in a Dirac language can be provided. This is crucial since
usually higher abstraction results in performance penalties.
In the following sections we illustrate the concepts by means of key contributions to a pseudo potential plane
wave Hamiltonian (Eq. 1.46) such as the Hartree potential vH(Eq. (1.65)), the local vps
loc (Eq. (1.81)) and
non-local pseudo potentials vps
nl (Eq. (1.83)), and the kinetic energy ˆ
T(Eq. (1.53)). The chosen equations are
typical for the process of developing quantum mechanical program packages. Although the discussions in
the following sections focus on these examples, it will be shown that the concept is applicable to all elements
of the Dirac notation.
3.2.1 Conventional approach
Most quantum mechanical program packages follow the procedural approach (see Sec. 3.1.4) not only for
describing algebraic expressions but also quantum mechanical expressions. The source code files dedicated
to the description of the Hamiltonian consist typically of subroutine calls with a huge amount of arguments.
In the introduction of this chapter a representative source code example has been presented (see p. 61).
Subroutine calls with more than 20 arguments are typical for such approaches. The arguments are mainly
necessary to provide dimensions of the basis sets as well as various prefactors. This approach leads to even
less transparent source code if global variables are applied. When implementing new features modifications
often require massive changes across the code. The implementation of new and advanced algorithms requires
a significant amount of time.
3.2.2 Modular approach
With high level languages such as Fortran90/95 and C++ the relevant information can be kept together in
modules (Fortran) or classes (C++). For quantum mechanical expressions prefactors and dimensions can
77
3.2. THE DIRAC NOTATION IN S/PHI/NX CHAPTER 3. S/PHI/NX
be combined with physical entities such as wave functions, potentials, or charge densities. By overloading
functions and operators an object30 can be handled like an intrinsic data type, such as integer or float.
For example, the electronic charge density of a pseudo potential plane-wave code would contain the data
(density value of %(R)on the FFT grid) as well as functions to manipulate them (e.g., symmetrization,
normalization). Default operators, like “+” or “-”, would allow the computation of the sum of two densities
as simple as rho = rho1 + rho2”.
In order to approach a functional interface the key variables in quantum mechanics must be modularized.
That are the wave functions Ψ,basis-sets B, and electronic charge densities %.In modular programming,
functions can be defined that act on the self-defined data types. For example, a FFT function call could be
added to the wave function type, i.e.,
WaveFunctionG psiG = ...;
WaveFunctionR psiR = psiG.fftForward ();
Besides wave functions also electronic charge densities need to be transformed between reciprocal and real
space. The same transformation for charge densities would read
RhoG rhoG = ...;
RhoR rhoR = rhoG.fftForward ();
This example shows that functions (like Fourier transformations) should not be added directly into the
wave function or charge density object since redundant code would have to be implemented. A reduction
of redundancy in the source code can be accomplished by separating operator functions from data types.
Operator functions are related to the basis-sets. Hence, for R,G,G+k,and rown data types can be
defined. They contain dimensions, prefactors arising for transformations, and the projector functions.
In the following it will be discussed how wave functions and basis-sets are modularized in order to allow the
compiler a detection of the corresponding physical/numerical contexts.
Bloch-like wave functions
According to Sec. 1.4 Bloch-like wave function coefficients can be represented either in vector ciσk(G)or in
matrix form (CiG(σk)). The relevant quantum numbers to specify a state are the Bloch state i, the spin
channel σ.Furthermore, the kpoint is necessary to specify the state.
In Sec. 3.1.3 the advantage of blocking matrix algorithms has been discussed. In the matrix representation
each Bloch state iis kept as a column with npw row entries
"G+k|Ψσk#iG=CiG(σk)=
C11(σk)C21(σk)··· Cn1(σk)
C12(σk)C22(σk)··· Cn2(σk)
.
.
..
.
.....
.
.
C1npw (σk)C2npw (σk)··· Cnpw (σk)
.(3.27)
30An object is an instance of a class.
78
CHAPTER 3. S/PHI/NX 3.2. THE DIRAC NOTATION IN S/PHI/NX
Property Value Interface
Construction |˜
ΨG+k#,GCiσk(G)eiG·rSxPW psi;
Referencing / Quantum numbers
Single state |˜
Ψiσk#Vector: ciσk(G)psi(i,iSpin,ik);
All states |˜
Ψσk#Matrix: CiG(σk)psi(iSpin,ik);
Table 3.2: Summary of the key functions to support Dirac notation for Bloch waves. Bloch wave functions
are implemented as containers of algebraic vectors. By choosing one of the referencing operators, either
(i,iSpin,ik) or (iSpin,ik), one comfortably can switch between vector or matrix representation.
In S/PHI/nX all states are stored in a SxMath object of type SxMatrix<Complex16>. By exploiting the
reference counting technique (see page 71) the full matrix is returned at once using the intuitive interface
psi(iSpin,ik). Subsequent operations on the object psi(iSpin,ik) can provide full BLAS3 support.
As mentioned previously (e.g., page 64) beside the computationally efficient matrix form of wave functions,
a vector representation is required to reduce the memory consumption. The vector representation considers
the ith column vector of the previous matrix individually
"G+k|Ψiσk#G=Ci(σk)=
Ci1(σk)
Ci2(σk)
.
.
.
CiNpw (σk)
.(3.28)
The interface psi(i,iSpin,ik) returns a single column of the matrix CiG(σk). In the approach of Ref. [67]
this operation is accomplished by copying an entire column which is computationally very demanding. In
S/PHI/nX, the reference counting technique has been adapted such that only a pointer to the first column
element is returned and the memory management of the returned vector object is deactivated. Instead,
the matrix handles the memory management. This approach has the advantage, that wave functions can
simultaneously be accessed in both matrix and vector shape. Since only pointers (8 bytes) are copied all
operations acting on wave functions are very efficient. The S/PHI/nX interface for Bloch-like wave functions
is shown in Tab. 3.2.
The return value of both psi(i,iSpin,ik) and psi(iSpin,ik) is a vector or a matrix of the above discussed
high-performance algebra library. Therefore, algebraic expressions on wave function coefficients will be
automatically optimized. For example, the code fragment to compute an overlap matrix
Sij ="Ψiσk|Ψiσk#(3.29)
would read
SxPW psi = ...;
int iSpin = 0, ik = 0;
SxSymMat<Complex16> S(nStates);
for (int i=0; i < nStates; ++i)
for (int j=i; j < nStates; ++j)
S(i,j) = psi(i,iSpin,ik) ^ psi(j,iSpin,ik);
79
3.2. THE DIRAC NOTATION IN S/PHI/NX CHAPTER 3. S/PHI/NX
Property Value Interface
Construction |µr#ˆ
Tisia|φisnlmσ#SxAtomicOrbials mu;
Referencing:
Single state |µisianlmσk#ˆ
Tisia|φisnlmσ#mu(iSpecies,iAtom,n,l,m,iSpin,ik);
Index map |µ˜
iσk#|µ˜
i)→isianlm#mu(i,iSpin,ik);
Table 3.3: Summary of the key functions to support Dirac notation for atomic orbitals.
and is replaced with ZDOTC BLAS calls during the compilation. The class representing the Bloch-like wave
functions is only a container of algebraic vectors. The actual mathematical calculation is performed in BLAS
calls mapped by the SxMath library.
Atomic orbitals
In a pseudo potential approach atomic orbitals are used to represent the pseudo wave functions. Atomic
orbitals |µ#are characterized by the quantum numbers n,l,mas well as the species and atomic indices is
and ia.Atomic orbitals in S/PHI/nX provide two index maps. The map ˜
iref $→ (isnlm)refers to a reference
orbital, i.e., |φisnlm#, or in S/PHI/nX syntax mu(is,n,l,m). The second index map ˜
i$→ (isianlm)refers
to an orbital shifted to the location of atom ia.By providing the additional parameter the corresponding
S/PHI/nX interface is simply mu(is,ia,n,lm).
This second interface to address an atomic orbital is equivalent to an application of the translation operator
ˆ
Tisia
|µisianlmσ#=ˆ
Tisia|φisnlmσ#.(3.30)
Beside the internal management of reference and atomic orbitals the two index arrays can simplify the source
code when applying operations fto all orbitals sequentially. Using the index map ˜
ithe code
for (int is=...)
for (int ia=...)
for (int n=...)
for (int l=...)
for (int m=...)
muNew = f (mu(is,ia,n,l,m));
can be transformed to a single loop over all orbitals
for (int iOrb=0; iOrb < mu.getNOrbitals(); ++iOrb)
muNew = f (mu(iOrb));
In Tab. 3.3 the S/PHI/nX interface for atomic orbitals is defined.
New wave functions
The above mentioned interfaces to represent wave functions can easily be extended in future code develop-
ments. Let us assume an extension of S/PHI/nX to support PAW. In this case the wave functions need to
80
CHAPTER 3. S/PHI/NX 3.2. THE DIRAC NOTATION IN S/PHI/NX
be represented as Bloch-like waves as well as orbital-like partial waves. The modular concept would support
such an extension as follows.
The all-electron wave function consists of a smooth auxiliary wave function |˜
Ψ#as well as the partial waves
|˜
φ#and |φ#(SxAtomicOrbital) (see Sec. 1.5.2). The new entity to represent wave functions in PAW would
refer to SxPW and SxAtomicOrbital in the referencing operator. In order to provide an intuitive interface
operators like “+” and “-” can be overloaded.
Basis-sets
So far the modularized form of the wave function class connects an intuitive interface with high performance
algebra function calls. However, when developing a class describing the Hamiltonian various quantum
mechanical operations (e.g. Eq. (1.64), Eq. (1.53), Eq. (1.63)) are applied to wave functions. A mere
algebraic interface is not sufficient here since these operations depend on the choice of the basis-set. This
work aims for deriving methods that allow an Hamiltonian to be implemented independently of the basis-set.
Hence, the basis-set dependent operators should be modularized in terms of basis-set classes.
In a pseudo potential plane-wave approach, Bloch-like wave functions are represented in real space "R|
(application of the effective potential veff,Eq. (1.24)) and reciprocal space "G+k|(kinetic energy and non-
local potential, TEq. (1.53) and vnl,Eq (1.83)). Basis-set dependent operators are the evaluation of the
trace (Eq. (1.51)), the Laplacian 2, and the definition of a metric for scalar products. The radial space
"r| is necessary to describe atomic orbitals. Furthermore, projections between basis-sets have to be defined
in the same scope. Such an approach is also flexible with respect to a future extension to e.g. PAW. The
smooth auxiliary wave function |˜
Ψ#is sampled on "G+k|and "R|while the partial waves |φ#and ˜
|φ#are
projected on "r|.
In Tabs. 3.4, 3.5, and 3.6 the S/PHI/nX interfaces to "R|,"G+k|,and "r| are presented.
Hamiltonian
The modularized wave function and basis-set approach extends the functional interface which reflects already
the key elements of the Dirac notation. This can be illustrated by some examples of the Hamiltonian.
SxGBasis G;
SxGkBasis Gk;
SxRBasis R;
SxRhoG rhoG;
SxRhoR rhoR = G.projectTo (R, rhoG);
With the above variable declarations the kinetic energy defined in Eq. 1.53 becomes
for (i=0; i < nStates; ++i)
psi = waves(i,iSpin,ik)
for (iSpin=0; iSpin < nSpin; ++iSpin)
for (ik=0; ik < nk; ++ik)
eKin += omega(k) * focc(i,iSpin,ik) * Gk(ik).laplacian(psi,psi);
81
3.2. THE DIRAC NOTATION IN S/PHI/NX CHAPTER 3. S/PHI/NX
Property Value Interface
Sampling:
Grid points Rijk i
N1a1+j
N2a2+k
N3a3SxRBasis R;
Abstract wave function representation:
Identity |ΨR#,R|R#"R|Ψ#psi;
or: (R|psi);
Integration and Metric:
Scalar product "a|b#,ijk a
ijkbijk (a|b);
Trace trRX(R)∆Ω ,ijk X(Rijk)tr(X);
with ∆Ω =|a1·(a2×a3)|
(N1N2N3)
Projectors:
Project wave functions to G "G+k|R#1
ei(G+k)·R(Gk|R);
Project densities/potentials to G ˆ
GR1
,ReiG·RX.toG ();
Table 3.4: Summary of the key functions to support Dirac notation in real space contexts. Various Dirac
objects (property column) are presented with its physical values as well as the S/PHI/nX programming
interface. It can be seen that the interface is strongly reminiscent to the Dirac notation and the resulting
source code becomes very intuitive.
while via Eqs. (1.61), (1.62), (1.65), and (1.66) the Hartree potential/energy contribution is:
gaussianFunc = ...
for (is = ...)
rhoGaussG += G.phase(is) * r.projectTo (G, gaussianFunc); // Eq. (1.62)
vHartree = FOUR_PI/G.gVec(SxIdx(1:ng)) * rhoGaussG; // Eq. (1.65)
eHartree = 0.5 * G.trace (rho * vHartree); // Eq. (1.66)
The local pseudo potential/energy contribution Eqs. (1.80), (1.81), (1.82) can be expressed as:
locPsFunc = ...
for (is = ...)
vLocG += G.phase(is) * r.projectTo (G, locPsFunc); // Eq. (1.81)
eLocPs = R.trace (rhoR * G.projectTo (R, vLocG)); // Eq. (1.82)
The contributions to the gradient (Eqs. (2.5), (2.5)) can also be transcribed easily:
PsiG psiG;
PsiR psiR = Gk(ik).projectTo (R, psiG);
dPsiG = 0.5 * Gk(ik).g2 * psiG; // Eq. (2.5)
dPsiG += Gk(ik).projectTo (Gk(ik), vEffR(iSpin) * psiR); // Eq. (2.7)
Due to the modularization of quantum mechanical key entities in wave functions and basis-sets a Hamiltonian
can be implemented very intuitively. The transcription of expressions given in the Dirac notation into
82
CHAPTER 3. S/PHI/NX 3.2. THE DIRAC NOTATION IN S/PHI/NX
Property Value Interface
Sampling:
Grid G Gijk b1i+b2j+b3kSxGBasis G;
Grid G+k(G+k)ijk SxGkBasis Gk;
(full mesh) with bi=2πbj×bk
bi·bj×bk
and cyclic
condensed Gig 0|Gijk|2Gcut G.gVec(ig);
mesh (G+k)ig 0|(G+k)ijk|2Ecut Gk.gVec(igk);
Abstract wave function representation:
Identity "G+k|Ψk#!
G+k|G+k#"G+k|Ψk#psi;
or: (Gk|psi);
Integration/Metric:
trace tr(ˆ%ˆ
A),G%(G)ˆ
A(G)tr(rho*A);
Operators:
Laplacian "G+k|ˆ
L|˜
Ψiσk#|G+k|2|ciσk(G)|2waves(i,s,k).laplacian();
ˆ
LCiG(σk) (|G+k|2CiG(σk)) waves(s,k).laplacian();
Translator "G|ˆ
Tis|Φis#Sis(G)Φis(G)G.T(is)*phi;
Translator "G+k|ˆ
T|µ#
Projectors:
to realspace "R|G#∆Ω ,Ge+i(G+k)·R(R|G);
ˆ
RG∆Ω ,GGe+iG·RX.toR();
Table 3.5: Summary of the key functions to support Dirac notation for G contexts. Sampling and integration
are performed on the regular (FFT) grid, the main projections are to R space using FFT.
Property Value Interface
Sampling:
Grid points riri=ri+1/rlog SxRadBasis r;
Wave functions |µr#SxAtomicOrbitals mu(r);
"r|µ#
Projectors:
Identity "r|r#1(r|r);
Project wave functions to R ˆ
Gr-2l+1
4π
4π
0drr2jl(|G|r)(G|r);
Table 3.6: Summary of the key functions to support Dirac notation for r contexts. Sampling and integration
are performed on an logarithmic mesh, the main projections are to Gspace.
83
3.2. THE DIRAC NOTATION IN S/PHI/NX CHAPTER 3. S/PHI/NX
source code can be accomplished in significantly shorter time than conventional procedural functions. Since
equations can be implemented almost directly from the Dirac notation we call this approach quasi-Dirac
notation.
The quasi-Dirac notation requires that the wave functions and basis-sets have to be explicitly provided.
This is error-prone since it must be ensured by the developer that basis-set and wave function objects always
match. This requirement makes an cumbersome management of such objects throughout the code necessary.
In the following section a way of solving this drawback will be shown.
3.2.3 Object-oriented approach
In order to obtain an intuitive programming meta language for quantum mechanical expressions the pre-
viously introduced quasi-Dirac ansatz has to be extended such that the compiler “understands” the Dirac
notation. In this section the required object-oriented programming techniques will be derived.
In order to detect the quantum mechanical context (see Sec. 3.1.3) a further degree of modularization is
necessary. C++ provides language elements for object oriented programming (OOP). As object orientation
is not yet a very common approach in the field of high performance computing its ideas are briefly sketched
here. In procedural languages, such as FORTRAN or Fortran90/9531, a program is a collection of functions
and subroutines which build up an algorithm. In OOP, the program is organized in terms of self-defined
data types (classes) which contain both data as well as functions which can act on this data. In contrast
to modular programming, in OOP hierarchies of classes can be defined. More complex classes can derive
properties from basis classes. For example, a density used in PAW would be defined of three contributions,
the plane-wave density ˜%(R)as well as the contribution inside the one-center densities %i(r)and ˜%i(r). By
applying OOP both densities could be defined in separate classes to represent radial densities. An abstract
density class could then combine %(R),%(r),and ˜%(r)in a new type for describing PAW densities. By
deriving from the basis classes (e.g. SxPW and SxAtomicOrbitals) the new class inherits all functions of
the basis classes. The functionalities implemented in the basis classes are available in the derived classes
without additional programming lines which simplifies the source code dramatically:
class A {
public:
void foo1();
};
class B {
public:
void foo2();
};
class C : public A, public B { /* derive from A and B */ };
C obj;
obj.foo1 (); // call derived function from A
obj.foo2 (); // call derived function from B
31Fortran90/95 is not object oriented although sometimes claimed otherwise. Fortran90/95 lacks the ability of polymorphism,
i.e., the capability of inheriting classes from various basis-classes. Polymorphism is one of the 3 required features of object
oriented programming besides inheritance and encapsulation. It will be shown that all 3 features are necessary to support
Dirac’s notation in the source code.
84
CHAPTER 3. S/PHI/NX 3.2. THE DIRAC NOTATION IN S/PHI/NX
An important step in accomplishing a meta language that supports Dirac’s notation is the abstraction of
wave functions and basis-sets. Instead of operating explicitly on coefficient arrays (SxPW,SxAtomicOrbitals)
or the basis-sets (SxRBasis,SxGBasis,SxRadBasis), in the Dirac notation more general wave functions |Ψ#
or basis-sets "X|are used. Generalization of data types in C++ is conventionally accomplished by specifying
virtual functions (placeholders) in the basis classes and specifying the actual function in the derived class.
This approach can be illustrated by means of a Laplacian:
class SxBasisSet {
public:
virtual Psi laplacian (Psi); // empty placeholder
};
class SxPW {
public:
virtual Psi laplacian (Psi psiIn) {
return Gk.laplacian (psiIn); // perform laplacian in <G+k| basis
}
};
class SxAtomicOrbitals {
public:
virtual Psi laplacian (Psi psiIn) {
return rad.laplacian (psiIn); // performs laplacian in <r| basis
}
};
The advantage of such a construction is that an identical interface can be applied for both wave function
representations:
SxPW psiG = ...;
SxAtomicOrbital mu = ...;
lPsi = psiG.basis->laplacian (psiG);
lMu = mu.basis->laplacian (mu);
This identical interface for two very different functions can be only accomplished by attaching the information
about the basis to each wave function. Here a crucial compromise has to be made:
The actual numerical calculations cannot be performed with an abstract Dirac vector |Ψ#.Instead, a vector is
sampled on the basis-set and it is represented as expansion coefficients "B|Ψ#=c(B).In conventional codes
the algorithms act directly on these expansion coefficients. As a result those codes are strongly basis-set
dependent. In order to create a quantum physics library which can offer an interface using abstract Dirac
vector objects every |Ψ#needs to “know” the basis-set on which it is sampled on. In order to bridge the gap
between the notation of the abstract Dirac vector |Ψ#and the computationally necessary basis-set dependent
coefficient notation c(B)="B|Ψ#we introduce a new symbol |ΨB#.The index Bsymbolizes that for an
introduction of a Dirac vector in the source code, the basis-set has to be provided. This needs to be done
once, namely when constructing the vector. For the entire lifetime of the vector it keeps the information
about its basis-set B. For all operations acting on the vector, Bdoes not need to be provided anymore by
85
3.2. THE DIRAC NOTATION IN S/PHI/NX CHAPTER 3. S/PHI/NX
the developer. All algorithms acting on the new object |ΨB#can be now formulated independently of the
chosen basis-set.
The compromise |ΨB#has been implemented like the following example:
class SxPsi {
public:
SxBasis *basis; // pointer to ‘‘any’’ basis
};
class SxPW : public SxPsi {
public:
SxPW (SxBasis *basis); // specify basis pointer
};
class SxAtomicOrbitals : public SxPsi {
public:
SxAtomicOrbitals (SxBasis *basis); // specify basis pointer
};
// --- (A) Constructing Dirac elements.
SxGkBasis Gk = ...;
SxPW psiG (&Gk); // <G+k|psi>
SxAtomicOrbitals mu (&rad); // <rad|mu>
// --- (B) Transition from coefficient vectors to abstract Dirac vectors.
SxPsi psi = psiG;
// OR:
SxPsi psi = mu;
// --- (C) Type of psi and basis is ‘‘hidden’’ from here on.
// Using Dirac notation, no coefficient vectors anymore.
SxPsi dPsi = psi.basis->laplacian ();
During the construction of wave functions (A) the information about the basis-set is attached to the object.
Here, a wave function is represented as coefficient array c(B)sampled on a specified basis B. In the source
code block (B) the specified wave function type is converted to an unspecified wave function (wave function
basis class SxPsi), i.e., c(B)|ΨB#.In block (C) an operator (Laplacian ˆ
LB) can operate on an object
symbolizing a wave function |Ψ#instead of a coefficient array. A Hamiltonian can be implemented using
only the abstract wave functions |ΨB#as well as operators defined in the basis-set "B|.The corresponding
part of the S/PHI/nX class hierarchy is shown in Tab. 3.3. In Tab. 3.7 all elements of the S/PHI/nX Dirac
notations are presented.
In quantum mechanical expressions the introduction of identities is often useful. In order to make sure that
such identities do not generate machine code (which would slow down the performance) the S/PHI/nX Dirac
86
CHAPTER 3. S/PHI/NX 3.2. THE DIRAC NOTATION IN S/PHI/NX
Figure 3.3: Illustration of the various abstraction levels in the S/PHI/nX class hierarchy by means of the
Hamiltonian: (I) In the source code of the Hamiltonian (central box) a writing style reminiscent to the Dirac
notation is realized using ’(’, |’, and ’)’ to build bras and kets. Additionally operators like the Laplacian
are used in the Hamiltonian. (II) At compile time all those operators are replaced with function calls in the
abstract basis-set interface (box left of the Hamiltonian). Depending on which basis-set is used the actual
implementation of the operators (projectors, Laplacian, etc.) are invoked. (III) The same approach is used
in the case of wave functions: The Hamiltonian interacts with an abstract wave function class (box right of
the Hamiltonian) to extract a single state. Depending on the used wave functions the actual wave function
container is used instead.
basis-set independent basis-set dependent formal S/PHI/nX internal S/PHI/nX
bra-ket notation bra-ket notation interface representation
|Ψ#,B|B#"B|Ψ#|ΨB# "B|Ψ#=c(B)
"B|Ψ# "B|,B|B#"B|Ψ# "B|ΨB# "B|Ψ#=c(B)
"X|Ψ# "X|,B|B#"B|Ψ# "X|ΨB#,B"X|B#"B|Ψ#=c#(X)
ˆ
Aˆ
AB"B|ˆ
A|B#
ˆ
Oˆ
OB"B|ˆ
O|B#
ˆ
Lˆ
LB"B|ˆ
L|B#
tr(ˆ%ˆ
A) trBtrB%ˆ
AB)
Table 3.7: Comparison between Dirac’s notation (1st and 2nd column) and our basis-set “aware” S/PHI/nX
notation (3rd and 4th column). Beside the wave functions also operators ( ˆ
A) such as the overlap ˆ
Oor
Laplacian operators ˆ
Lneed to have access to the basis-set. Also the evaluation of a trace is basis-set
dependent.
87
3.2. THE DIRAC NOTATION IN S/PHI/NX CHAPTER 3. S/PHI/NX
Expression Template
"G|R#return SxProjector<SxRBasis,SxRBasis> (&a,&b);
"R|G#return SxProjector<SxRBasis,SxGBasis> (&a,&b);
"r|G#return SxProjector<SxRadBasis,SxGBasis> (&a,&b);
"G|r#return SxProjector<SxRBasis,SxGBasis> (&a,&b);
Table 3.8: Registration of bra-ket combinations in S/PHI/nX. An extension to new basis-sets is trivial.
notation has to be extended with a basis-set dependent identity32
|1#"1||1B#"1B|.(3.31)
Applying these “new” basis-set aware operators and wave functions, the Hamiltonian and its application on
wave functions can be accomplished entirely basis-set independently. The same technique which has been
applied to generalize wave function classes can be adapted to transformations:
psiG.basis->projectTo (R); // SUM_G <R|G><G|psi> or simply <R|psi>
With the generalization classes SxPsi and SxBasis the Dirac notation employing bra and ket vectors can
eventually be defined. Therefore, a template class (see p. 70) to represent the bra-ket elements
template<class Bra, class Ket>
class SxProjector {
public:
Bra *braPtr; // pointer to <bra|
Ket *ketPtr; // pointer to |ket>
};
can be defined. In order to provide a full Dirac notation the “|” operator33 has been overloaded according
to all possible combinations (see Tab. 3.8). During the compilation an expression like (G|R) will be
replaced with the proper FFT function call. Please note that this source code expression returns a function
rather than a value! The interface is strongly reminiscent to the expression in the Dirac notation "G|R#
which allows a straightforward implementation of various quantum mechanical projectors without any loss
in computational performance!
Similar Dirac expressions have to be supported for wave functions such as "X|Ψ#.Since the numerical
operation depends in this case on the context (see e.g. Eqs. (3.18)-(3.21)) a different solution has to be
found:
template<class T>
operator| (SxRadBasis b, SxDiracVec<T> v)
{
32This identity operator is not necessarily required. In reality it is used in particular to verify that the basis-sets left and right
of the identity do match. Without such test facilities the high abstraction level could easily lead to barely traceable program
flaws.
33The default behavior of the “|” operator in C is the evaluation of the bit-wise OR operation.
88
CHAPTER 3. S/PHI/NX 3.2. THE DIRAC NOTATION IN S/PHI/NX
return v.basis->projectTo (&b, &v);
}
template<class T>
operator| (SxDiracVec<T> v, SxRadBasis b)
{
return (b | v).conj());
}
“Virtual” templates In principle, this technique could be applied to all other basis-sets. Unfortunately,
this approach as presented so far is in violation to C++ since the virtual projector functions (projectTo)
contain template arguments (SxDiracVec<T> v). One of the fundamental language concepts in C++ is
type-safety. Virtual functions can be discriminated only by their function arguments. If these arguments
are templates, distinguishing virtual functions becomes impossible for the compiler. Therefore, the usage of
templates and virtual functions are mutually excluded in C++. In our ansatz, however, the usage of virtual
functions is crucial to create abstract wave function (p. 85) and basis-set objects (p. 86). Also template
arguments are substantial in our approach to obtain optimal numerical performance (pp. 70). In order to
address this problem the required C++ type safety has to be disabled temporarily. Therefore, in S/PHI/nX
the actual type of vector v is removed and replaced with a void pointer34. The above function call becomes
SxDiracVec<T> vec = ...;
void *vecPtr = (void *)(&v); // type cast to void
projectTo (b.getBasisPtr(), vecPtr); // Dirac vector has no type
Due to this type cast all function arguments are well defined and projectTo can be a virtual function that
is defined in the basis class SxBasis (p. 86). The pointer to the corresponding object can be extracted from
b. An abstraction of wave functions and basis-set is now possible. When performing the actual projector
operation the vector type is, however, required. For example, when performing a FFT function call it is
important whether the FFT mesh describes real or complex values. The above introduced type cast removes
all type information completely. Algebraic vectors can only contain real or complex values with either single
or double precision. In a first attempt the information about which of these types have to be considered in
the projector has been “attached” by casting any number (e.g. 0) to one of these types using S/PHI/nX
type mappers (p. 72):
projectTo (b.getBasisPtr(), vecPtr, (b::Type)0, (v::Type)0);
This solution still did not solve the actual problem, but the discrimination of input and output data type
of the Dirac projector function is now simpler to handle. With the two additional arguments one out of 16
projectTo functions35 can be easily be selected.
#define VIRTUAL_PROJECT_TO (B) \
projectTo (B *, void *, double, SxComplex16); // vector is C16, returns double
...
projectTo (B *, void *, SxComplex16, double); // vector is double, returns C16
...
34The C/C++ data type void refers to no type.
35Permutation of all 4 floating point types)
89
3.2. THE DIRAC NOTATION IN S/PHI/NX CHAPTER 3. S/PHI/NX
The necessary 16 functions can be joined in a single preparser text macro which will be added to every
basis-set class. The actual projector can then overload these functions in the derived class. Therefore, the
proper projector can be registered using the following macro
#define REGISTER_PROJECTOR(FROM,TO,PROJECTOR) \
virtual void projectTo (const TO, const void *in, void *o, float, double) {...}
...
virtual void projectTo (const TO, const void *in, void *o, float, SxComplex16) { ... }
...
With this set of macros a combination of virtual functions with (a limited number of) template arguments
can be accomplished within the C++ language concepts. The macros are available in S/PHI/nX service files
and do not need to be modified for future code developments. For each projector every basis-class needs
exactly one source code line. For example, the class for representing the G-basis, the identity 1G(Eq. (3.31))
and "R|G#(Eq. (1.48)) need to be registered:
class SxGBasis : public SxBasis
{
...
REGISTER_PROJECTOR (SxGBasis, SxGBasis, identity); // (G | psiG) = psiG
REGISTER_PROJECTOR (SxGBasis, SxRBasis, toRSpace); // (R | psiG) = FFT (psiG,psiR)
...
};
With these two additional lines the source code (G | psiG) will be replaced with the function call
identity while (R | psiG) will perform a FFT function call.
By applying the predefined S/PHI/nX macros, registering new basis-sets into the existing Dirac environment
of S/PHI/nX can be accomplished by copying these few source lines (REGISTER_PROJECTOR). Therefore,
introduction of new Dirac projectors in S/PHI/nX is extremely simple.
Dirac’s bra-ket notation is appropriate when describing wave functions. Projections of entities like charge
densities or potentials cannot be symbolized in the bra-ket style. In order to project those terms as simple
as wave functions we introduce new basis-set operators written in calligraph letters. An entity given in any
basis-set Xcan be projected onto the basis-set Bwith the operator ˆ
BX.For example, a density in basis-set
Bcan be projected to real space Rlike
%(R)= ˆ
RG%(G)(3.32)
or back to Gspace with
%(G)= ˆ
GR%(R).(3.33)
In order to accomplish a basis-set independent notation these operators can be extended such that they
project from any basis-set. For example, ˆ
RBwould become the generalized real space projector ˆ
R
%(R)= ˆ
R%(X).(3.34)
90
CHAPTER 3. S/PHI/NX 3.2. THE DIRAC NOTATION IN S/PHI/NX
Our modified Dirac notation bridges the gap between the formal Dirac language as used by physicists and a
strongly basis-set dependent implementation as required in computer programs. The implementation simply
provides functions like toX() in all vectors:
SxVector<Double> vEffR = vEffG.toR ();
Combining Dirac elements So far only binary Dirac operators have been discussed. An efficient imple-
mentation, however, requires sometimes to evaluate several Dirac operations simulataneously. Consider, for
example, the evaluation of the kinetic energy in a plane-wave basis
Ekin ="ΨG+k|ˆ
LG+k|ΨG+k#.(3.35)
If only binary operations would be supported the evaluation would be inefficient, e.g., the following operations
would be involved
tk(G)=2ciσk(G)ei(G+k)·r
Ekin =!
iσk!
G
c
iσk(G)e+i(G+k)·rtk(G)
instead of the computationally less demanding direct evaluation according to
Ekin =!
iσk!
G|ciσk(G)|2|G+k|2.(3.36)
In S/PHI/nX such combinations are accomplished by introducing container types which only store infor-
mation about vectors or quantum numbers. For the above example an empty class SxLaplacian has been
defined
class SxLaplacian { };
as well as a class describing the combination ˆ
L|Ψ#
class SxLaplacianPsi {
SxLaplacianPsi (Psi psiIn) : psi(psiIn) { }
};
This container class only stores a reference (p. 71) to the wave function, but does not compute anything.
Due to reference counting this operation is fast. The remaining operation which applies "Ψ|from left has
been implemented similar to the operations introduced in Tab. 3.8, i.e., (psi | SxLaplacianPsi) will be
mapped to the efficient function which evaluates Eq. (3.36). By applying this technique the kinetic energy
reads in S/PHI/nX
SxLaplacian L; // create empty laplacian
for (i=...)
for (iSpin=...)
for (ik=...)
eKin += (waves(i,iSpin,ik) | L | waves(i,iSpin,ik);
This technique can be applied to evaluate any combination of Dirac expressions efficiently.
91
3.2. THE DIRAC NOTATION IN S/PHI/NX CHAPTER 3. S/PHI/NX
Figure 3.4: Demonstration of the compilation process by means of the implementation of "G|and "R|in
order to support projectors like "R|G#.When compiling "R|G#the virtual projector function of the basis-
anchor "Y|X#redirects the compiler to the actual implementations (FFT functional call). Similarly, the term
"G|Ψiσk#is replaced by a vector handled by the SxMath class. Due to object-orientation the expressions
,G"R|G#"G|Ψ#can be replaced by the proper FFT function call at compile-time.
The S/PHI/nX Dirac notation
In the previous paragraphs techniques to define projector elements "X|Y#between basis-sets and an interface
to represent wave function coefficients "B|Ψ#have been introduced. The missing element is to apply operators
or projectors to wave functions. This has been accomplished by combining all previously discussed techniques,
S/PHI/nX type mappers (p. 69), vector/matrix reference counting (p. 71), templates (p. 70), automatic
BLAS/LAPACK function call mappings (p. 68), abstract basis-set/wave function classes (p. 85/86), bra-
ket templates (p. 88), virtual template projector functions (p. 89), and the automatic error detection (see
Sec. 3.1.4) in a single function:
template<class Bra, class Ket>
operator* (SxProjector<Bra,Ket> proj, SxDiracVec<Ket::BasisType> vec)
{
SX_CHECK (proj.ket);
SX_CHECK (proj.ket == vec.basis)); // ‘‘|A><B|’’ is not allowed!
return proj.ket->projectTo (proj.bra, vec, Ket::BasisType(0), Bra::BasisType(0));
}
This operator can be considered as the “glue” of the S/PHI/nX Dirac library. Note that this function does
not generate machine code! Instead it is used by the compiler to create a highly efficient function call. The
left argument (“proj”) is replaced by a function while the right argument (“vec”) is replaced by a suitable
wave function coefficient vector. The operator can only create functions that are presented in Tab. 3.8 and
apply them to available wave functions.
92
CHAPTER 3. S/PHI/NX 3.2. THE DIRAC NOTATION IN S/PHI/NX
(a) Basis-set layer (b) Wave function layer (c) Dirac vector layer
Property Example Property Example
Mesh BiBasis-set ptr. psi.getBasis() -Algebraic vector
Integration trBAccess states psi(i,s,k) -Operators: +, -, *, /, ^, ...
Metric "a|b#-Trigonometric functions
Projectors "B|B#,-Pointer to basis-set
"X|B#-Quantum numbers
ˆ
R,ˆ
G,ˆ
B, ... -Memory management
Operators ˆ
LB,ˆ
OB,, ... -BLAS/LAPACK
Table 3.9: Organization of basis-set layer, wave function layer, and Dirac vector layer to construct the
backbone of the Dirac notation at source code level. (a) The basis-layer defines general properties of a single
basis-set, such as the mesh sampling and integration, scalar products with metrics or projectors onto other
basis-sets. When adding new basis-sets only these features have to be overloaded in the new class. (b) Wave
function classes are nothing but containers of Dirac vectors. The most important function is how to extract
a single state. This function returns a Dirac vector. (c) Dirac vectors are algebraic vectors with a similar
functionality as offered by high level toolkits such as Mathematica. Besides the numerical capabilities of
these vectors information about the original basis-set and its quantum numbers can be retrieved. Our Dirac
vectors are equipped with an automatic memory management.
The compilation process of an expression like
!
G"R|G#"G|Ψ#(3.37)
is illustrated in Fig. 3.4. Now it becomes clear, why an identity operation 1B(Eq. (3.31)) is useful. If the
type of the vector psi is not known, the identity allows an explicit projection SUM ( G, (R|G) * ( G
| psi )”. Assuming psi is a wave function in Gspace already. In this case the term (G | psi) will
be replaced with the identity call. Otherwise a projection to Gis performed before the FFT function is
executed. Therefore, the two expressions
psiR = SUM (G, (R|G) * (G|psiG);
and
psiR = (R|psiG);
will result in the identical machine code (with the same performance). The same technique can be used to
apply all other elements of the Dirac notation such as projectors, operators, the trace, and scalar products
with metrics (see Tab. 3.9).
The context of a quantum mechanical expression is now detected by the compiler. Terms that are not
defined cannot be specialized and result in a compiler error message. This is crucial as unphysical quantum
mechanical expressions are automatically detected and will not compile in our method, even if the formal
syntax is correct. Therefore, the S/PHI/nX library is not only a class library for quantum mechanical
expressions. S/PHI/nX may be considered as a quantum mechanical meta language!
93
3.2. THE DIRAC NOTATION IN S/PHI/NX CHAPTER 3. S/PHI/NX
Equation S/PHI/nX source code
T=1
22|˜
Ψiσk#=ˆ
L|˜
Ψiσk#T = L | waves(i,s,ik);
˜
Ψiσk(R)="R|˜
Ψiσk#psiR = ( R | waves(i,s,ik) );
˜
Ψiσk(R)=,G+k"R|G+k#"G+k|˜
Ψiσk#psiR = SUM(Gk,(R|Gk) * (Gk|waves(i,s,ik)));
˜
Ψ(R)=,G+k,r"R|G+k#"G+k|r#"r|µ#psiR = SUM(Gk,SUM(r,(R|Gk)*(Gk|r)*(r|mu)));
εiσk="˜
Ψiσk|ˆ
H|˜
Ψiσk#eps = ( waves(i,s,k) | H | waves(i,s,k) );
eps
loc ="Vps
loc#= tr(ρVps
loc)eLocPS = tr(rho*vLocPS);
%(R)= ˆ
R%(G)rhoR = rhoG.toR ();
%(R)= ˆ
G%(G)rhoG = rhoR.toG ();
%ps(R)= ˆ
Rˆ
G%ps(r)rhoR = rhoPsRad.toG().toR();
Table 3.10: Demonstration of C++ source code using the S/PHI/nX Dirac notation. The source code is
strongly reminiscent to the original Dirac’s notation in quantum mechanics. Note, that there is no drop in
computational efficiency compared to a corresponding FORTRAN program.
With the high abstraction level of the source code the importance of the previously discussed automatic
error detection (Sec. 3.1.4) becomes more pronounced. For example, consider an expression operating on
two wave functions which have mismatching coefficient vector sizes (e.g. wave functions belonging to different
basis-sets):
SxRBasis R (...);
SxGBasis G1 (eCut);
SxGBasis G2 (2 * eCut);
SxPW psi1 (G1); // <G1|psi>
psiR = (R | psi1); // < R|psi>
psi = psi1 + ( G2 | psiR ); // (*) error detected here
The two vectors which are added in the last line have different number of vector elements which triggers the
error detection mechanism in SxVector<T>::operator+. The generated error message, however, does not
refer to SxVector<T>::operator+ since this information would not be helpful. Instead S/PHI/nX’s error
detection identifies the calling source line (*) and can, thus, identify immediately the semantic error.
In S/PHI/nX errors are detected in low-level routines but the messages refer typically to high level calling
functions. This mechanism allows S/PHI/nX developers to locate and remove most of the typical semantic
errors easily during the development process.
In Tab. 3.10 typical equations applied in Hamiltonians are shown. With our new meta language the previously
sketched Hamiltonian can be now formulated almost like in textbooks. The Hamiltonian in the quasi-Dirac
notation becomes in S/PHI/nX as simple as follows:
SxGBasis G;
SxGkBasis Gk;
SxRBasis R;
SxRhoG rhoG;
SxRhoR rhoR = rhoG.toR();
SxArray<SxDiracVec<Complex16> > T = G.getT ();
94
CHAPTER 3. S/PHI/NX 3.2. THE DIRAC NOTATION IN S/PHI/NX
The Hartree potential/energy contributions can now be very intuitively expressed
for (is = ...)
rhoGaussG += T(is) * SUM(r, (G|r)*(r|gaussianFunc)); // Eq. (1.62)
vHartree = FOUR_PI/G.gVec(SxIdx(1:ng)) * rhoGaussG; // Eq. (1.65)
eHartree = 0.5 * tr (rho * vHartree); // Eq. (1.66)
as well as the local pseudo potential/energy contributions:
for (is = ...)
vLocG += T(is) * SUM(r,(G|r)*(r|locPsFunc); // Eq. (1.81)
eLocPs = tr (rhoR * (R|vLocG)); // Eq. (1.82)
Also the previous example of the gradient can be encoded in a single transparent source line
...
dPsiG += (Gk(ik) | ( vEffR(iSpin) * (R|psiG) ); // Eqs. (2.5), (2.7)
...
and last but not least the expression to obtain the single particle energy (Eq. (2.10)):
double eps = ( psiG | H | psiG );
The application of our new techniques in quantum mechanical program packages combine various advantages:
The source code becomes very short and compact. Algebraic and quantum mechanical expressions
require only 1 or 2 source lines. The entire DFT Hamiltonian in S/PHI/nX could be implemented in
less than 550 source lines.
The functional approach makes the source code very transparent. Instead of reading many parts or
files the algorithm can be understood in short time.
The generated machine code is very efficient due to consequently mapping to highly efficient BLAS /
LAPACK calls whenever possible.
Numerics and computer related issues (like memory management) are strictly separated from physics.
Future code extension are simple due to the modular concept.
The implementation process is significantly faster than in conventional approaches. A sophisticated
debug environment ensures that unphysical algebra equations or Dirac expressions do not compile.
Physical and mathematical identities are validated at runtime to detect errors automatically. The
error detection is able to identify the calling procedure that causes the problem.
95
3.3. CLASS HIERARCHY CHAPTER 3. S/PHI/NX
Figure 3.5: Fast development and execution speed requires often a mixture of generalization (grey shaded
boxes) and conventionally implemented algorithms (colored boxes). The S/PHI/nX concept supports both.
While the generalized form is usually very short, flexible, and simple to maintain, the conventional approach
is best suited for method development and testing. From left to right: Structure related methods obtain
their input from an abstract potential class. That can be either a DFT Born-Oppenheimer solver or any
emperical potential. The Born-Oppenheimer solver minimizes the energy of an abstract basis-set independent
Hamiltonian or, alternatively, of manually implemented basis-set dependent Hamiltonians.
3.2.4 Example
The benefit of a Dirac-like implementation can be best presented by means of a realistic code. In Tab. 3.10
the 1:1 relationship between expressions given in Dirac notation and the transcription into source code using
our library is presented. The chosen equations are typical expressions when developing DFT codes.
As an example consider a code that projects an atomic orbital µisianklm to real space and computes the
partial density afterwards in order to visualize the result %i(R)=|"R|µisianlm#|2.In the given example a
direct projection "R|r#is not defined. However, as both "R|G#and "G|r#are defined in S/PHI/nX the
partial density can be evaluated according to
%(R)=|!
G,r"R|G#"G|r#"r|µ#|2.(3.38)
In Tab. 3.11 the fully functional and compilable source code is presented. As can be seen from the source
code, this equation can be written as a single line in the code. As the library is based on the previously
mentioned BLAS/LAPACK interface (SxMath) the executable’s performance is very high.
3.3 Class Hierarchy
In the previous section abstract coding techniques have been derived which allow a high-performance imple-
mentation of quantum mechanical expressions. A critical design criteria of the S/PHI/nX library was that it
provides a flexible basis to work with various quantum mechanical Hamiltonians (see Sec. 1.5), Hamiltonian
derivatives (e.g. Density functional perturbation theory [106, 114]), new exchange-correlation functionals
96
CHAPTER 3. S/PHI/NX 3.3. CLASS HIERARCHY
Algorithm Source code
#include <SPHInX.h>
int main ()
{
// read input file
SxParser parser;
SxParser::Table input = parser.read (‘‘input.sx’’);
read {τisia}SxAtomicStructure str (input);
read {φps
isianl(r)}SxPseudoPot psPot (input);
// setup FFT mesh resolution and energy cut-off
setup N1,N
2,N
3SxVector3<Int> mesh (SxGBasis::getMesh(input));
read Ecut double gCut (SxGBasis::getEcut(input));
// build Dirac basis-sets and atomic orbitals
"R|SxRBasis R (mesh, str.cell);
"G|SxGBasis G (mesh, str, gCut);
"G+k|SxGkBasis Gk (G, input);
"r|SxRadBasis r (psPot, str.cell);
"r|µ#=φps
isianl(r)ylm SxAtomicOrbitals mu(psPot, r);
// compute partial density
int is=0, ia=0, n=0, l=1, m=0;
R.writeMesh3d (‘‘s.sxb’’,
|,G"R|G#"G|µis,ia,n,l,m#|2SUM (G, (R|G)*(G|mu(is,ia,n,l,m))).absSqr()
);
return 0;
}
Table 3.11: Demonstration of implementing an algorithm using the S/PHI/nX Dirac notation capabilities.
The presented code is not a pseudo code, but a complete and compilable source code!
97
3.3. CLASS HIERARCHY CHAPTER 3. S/PHI/NX
(e.g. Exact exchange formalism [115]). To achieve this aim, also the electronic minimizers (such as steepest-
descent see Sec. 2.1.2, or conjugate gradient schemes, see Sec. 2.1.4) have to be flexible enough to allow an
straight-forward modification in future.
3.3.1 Electronic minimization
In Sec. 2.1 a brief overview about state-of-the-art schemes to diagonalize the Hamiltonian iteratively was
presented. The number of iterations nit which are necessary to reach the Born-Oppenheimer surface within
the required accuracy and, in particular, the number of evaluations of the gradient δˆ
H
!δΨ|,i.e., Eqs. (2.5),
(2.6), (2.7)) , determines strongly the performance of a DFT program package. In order to increase the com-
putational performance the identification and/or development of algorithms that minimize the number of
iterations nit is critical. Furthermore, the minimization schemes differ by the number of temporary wave func-
tion objects which increases the memory demand (see, e.g., Sec. 2.1.4). Therefore, the source code dedicated
to converge down to the Born-Oppenheimer surface is typically modified very frequently. In order to address
the issues of performance and memory consumption of the minimization schemes in S/PHI/nX various meth-
ods to simplify testing and implementation of advanced schemes have been developed. That are methods
to (i) strictly separate Hamiltonian / potential sources from that of the the multi-dimensional minimization
algorithms using automatic pointers, (ii) modular support of preconditioners as used in conjugate-gradient
schemes, (iii) a simultaneous support of vector and matrix representations, and (iv) a test environment for
iterative schemes. These methods will be discussed in the following.
Automatic pointers
As discussed above in S/PHI/nX a functional programming approach has been chosen instead of procedural
programming (see Sec. 3.1.4). On one hand functional programming increases the transparency of the code
drastically, on the other hand various problems ranging from performance drops to data type handling had
to be solved. In the previous section (pp. 70) it was shown how to overcome the drawbacks of functional
programming. We will describe in this section how an efficient functional programming can be accomplished
for obtaining the Born-Oppenheimer surfaces as well.
A key element for the iterative minimization schemes introduced in Sec. 2.1.1 is the gradient Eq. (2.3) . By
overloading the “|” operator, the evaluation of ˆ
H|Ψ#reads in S/PHI/nX as follows:
dPsi = H | psi;
The choice of which Hamiltonian should be used is not known at compile time. When performing a calculation
with S/PHI/nX the Hamiltonian should be chosen from an input file. Hence, the data types of Hand psi
are entirely unknown for the compiler. As introduced in Sec. 3.2.3 C++ provides placeholder functions
(virtual functions) which can be applied here. The usage of virtual functions requires, however, that Hand
psi cannot be static variables any longer. Instead C++ expects them to be pointers to the corresponding
class. For example, a generalized Hamiltonian class would provide an interface
class SxHamiltonian
{
...
98
CHAPTER 3. S/PHI/NX 3.3. CLASS HIERARCHY
virtual Psi operator| (Psi);
virtual double getEnergy (Rho);
};
The corresponding Hamiltonian class can be derived from SxHamiltonian and provides the virtual functions
operator| and getEnergy. The decision about the choice of the Hamiltonian can then be done at run-time:
// initialize ’hType’ from input file
...
SxHamiltonian *H = NULL; // create pointer to abstract Hamiltonian
if (hType == PW) H = new SxPWHamiltonian;
if (hType == Test) H = new SxTestHamiltonian;
...
Psi dPsi = H | psi;
...
delete H; // free memory
This approach is very error-prone. Here, C pointers have to be applied in the physics part of S/PHI/nX.
While C pointers provide a huge degree of flexibility (see Sec. 3.1.3) they are also very dangerous. Even minor
mistakes in their handling can create catastrophic problems, ranging from memory leaks to an unpredictable
run-time behavior. The difficult and error-prone usage of C pointers is one of the main hurdles for beginners
in C/C++. Hence, a main goal in the development of the S/PHI/nX library was to avoid a direct handling
with C pointers at the physics level. A safer way of providing the required flexibility had to be found.
Various modern computer languages (such as Java or ObjectiveC) tackle the problem of pointer misusage with
garbage collecting. In this approach an external process constantly searches for unused memory resources and
releases them. This approach is not suitable for high-performance computing (HPC): The garbage collecting
task consumes typically 5-10% of the CPU load which was not acceptable to us. Furthermore, resources can
be released with a considerable delay. When releasing huge objects (such as the memory for large matrices)
the memory might be required instantly for the subsequent operation. The approach of garbage collecting
was, therefore, no alternative for our approach.
In S/PHI/nX we, therefore, combine the reference counting technique which has been applied to reduce the
memory accesses in algebraic expressions (p. 71), C++ template techniques (p. 70), and automatic error
detection (see Sec. 73) to create automatic pointers. An automatic S/PHI/nX pointer is a template class for
any data type <T> which wraps all memory accesses. The reference counting ensures that the memory will
be automatically deleted when no object refers to the pointer anymore. The above pseudo code becomes
{
...
// --- read from input file
SxPtr<SxHamiltonian> H; // create automatic pointer to abstract Hamiltonian
if (hType == PW) H = SxPtr<SxPWHamiltonian>::create ();
if (hType == Test) H = SxPtr<SxTestHamiltonian>::create ();
...
Psi dPsi = H | psi;
99
3.3. CLASS HIERARCHY CHAPTER 3. S/PHI/NX
...
} // H will be released here automatically
Every memory access of SxPtr<T> is monitored by the S/PHI/nX error detection mechanism and every
memory violation is detected immediately. The S/PHI/nX autopointer provides the same flexibility as C
pointers. No CPU-time consuming garbage collecting process is necessary. The memory is automatically
released without any delay.
The automatic S/PHI/nX pointers provide a simple way of generalizing Hamiltonians as well as potentials
and, consequently, to decouple the source code from the specific electronic minimization schema. The above
techniques to generate abstract and modular interfaces have been introduced in S/PHI/nX for Hamiltonians
and electronic minimization schemes to obtain the Born-Oppenheimer surfaces. Beside the DFT potential
also (semi-)empirical potentials are available in S/PHI/nX. All algorithms related to atomic structures
(structure optimization, molecular dynamics, transition-state search) access an abstract potential class. The
required techniques to create such general interfaces have been discussed in the previous section (p. 85/86).
Vector/matrix representation
In Sec. 2.1.4 the highly efficient all-band conjugate gradient and state-by-state conjugate gradient minimiza-
tion schemes have been introduced. The requirements for an implementation of both are very different. The
all-band conjugate gradient scheme requires a matrix formulation while the state-by-state conjugate gradient
method uses a vector form. This applies to wave functions Ψiσk/CiG(σk),the gradients |ξiσk#/ΞiG(σk),as
well as the preconditioners K(|Ψiσk#)/K(CiG(σk)).Following previous discussions from Sec. 3.2.2 both rep-
resentations are already supported in our approach for the first two cases. Only a simultaneous application
of both preconditioner forms has to be supported by the Hamiltonian class, i.e.,
PsiG SxHamiltonian::preconditioner (PsiG);
PsiGI SxHamiltonian::preconditioner (PsiGI);
The first function represents the preconditioner in vector form, the second function is the matrix counterpart.
The preconditioner source code can be decoupled from that of the electronic minimization library using
automatic pointers (Sec. 3.3.1) to the Hamiltonian, i.e., SxPtr<SxHamiltonian>.
The importance of a preconditioner is illustrated in Fig. 3.6 where the performance of the conjugate gra-
dient scheme is compared with an optimized steepest descent36 (Sec. 2.1.2). In both cases the tests have
been conducted with and without preconditioners. The steepest descent with quadratic line minimization
(optimized steepest descent) is shown as dotted line. The convergence rate is not logarithmic. Only conju-
gating the search directions can reduce the dimensions of the search vector space in every iteration and a
truly logarithmic convergence rate can be accomplished. In both cases the application of a preconditioner
can significantly improve the convergence behavior. As illustrated, a speed up factor of two can easily be
accomplished. In the performed test an Arias preconditioner has been applied. It can be seen that the
application of preconditioning is crucial to achieve optimal performances.
In the current implementation of S/PHI/nX, for the conjugate gradient schemes there are two preconditioners
available, the Payne preconditioner [61, 116]
K=27 + 18x+ 12x2+8x3
27 + 18x+ 12x2+8x3+ 16x4x=|G+k|2
gGi
(3.39)
36steepest descent with line minimization
100
CHAPTER 3. S/PHI/NX 3.3. CLASS HIERARCHY
025 50 75 100
iteration
1×10-12
1×10-9
1×10-6
1×10-3
1×100
Etot - E0
tot (H)
optimized SD, without preconditioner
optimized SD, with preconditiner
CG, without preconditioner
CG, with preconditioner
(a) total energy convergence
0 10 20 30 40 50
iteration
1×10-8
1×10-6
1×10-4
1×10-2
1×100
εi=17,k=19 - ε(0)
i=17,k=19 (eV)
20 Ry, no K
20 Ry, K
65 Ry, no K
65 Ry, K
75 Ry, no K
75 Ry, K
(b) one-particle energy convergence
Figure 3.6: Influence of the contributions to a preconditioned conjugate gradient scheme in case of ZnO bulk
to the (a) total energy convergence and (b) one-particle energy convergence of a randomly chosen state at
different energy cut-offs and a k-point folding of [1
2
1
2
1
2]×{4,4,4}. For each iteration the difference of the
total energy Etot or one-particle energy εto the converged solution E0
tot or ε0are given.
as well as the Arias preconditioner [67]
K=,8
i=0 xi
,9
i=0 xix=|G+k|2
Ekin/n (3.40)
with ndenoting the number of bands, gGirefers to the vector of Gcomponents of the gradient vector defined
in Eq. (2.22). Both preconditioners are constructed by polynoms depending on the kinetic contribution. The
denominator is a polynom one order higher than the nominator in order to remove the |G+k|2term. The
Payne preconditioner depends on the kinetic energy per state whereas Arias is using an averaged kinetic
contribution. In Fig. 3.7 the general shape of both preconditioners is compared. The Arias and Payne
preconditioner yield comparable convergence rates.
In Fig. 3.8 the residue vector |Ψ(G)Ψ0(G)|2obtained with and without application of a preconditioner is
plotted in the spectral representation. If the residue vector vanishes for all Gthe solution has been found. As
illustrated in Fig. 3.8 the high frequency contributions of the residue vector vanish when the preconditioner is
applied. Therefore, the dimension of the search space can be significantly reduced which leads to a dramatic
increase of the convergence rate.
In Sec. 2.1.4 the subspace rotation has been introduced in Eq. (2.34). For systems that can be described with
only fully occupied states (such as insulators or semiconductors), the application of a subspace rotation [67,
27] seems not to be important at first. As can be seen from Fig. 3.9 the application of a subspace rotation can
significantly improve the convergence speed and stability in case of semiconductors. Hence, in S/PHI/nX
the subspace diagonalization is applied for all systems by default.
Orthonormalization In the previously discussed minimization schemes the orthonormalization constraint
is fulfilled by a Gram-Schmidt orthonormalizer. Here each state is sequentially orthogonalized to all energet-
ically lower lying states. This scheme can also be formulated using blocked matrix-operations. The matrix
oriented orthonormalization method is known as wdin orthonormalization. It creates Cwhich is the
101
3.3. CLASS HIERARCHY CHAPTER 3. S/PHI/NX
Figure 3.7: Shape of both Payne and Arias preconditioners during the electronic minimization cycle for
a ZnO-bulk with 65 Ry cut-off. Both preconditioners damp the high frequency components of the search
direction vector. In order to keep the norm of the search vector untouched the G=0component of the
preconditioner K(G=0) is 1. The numerical fluctuations are due to the kinetic contribution of the search
vector itself.
0 500 1000 1500 2000 2500 3000
G component
1e-08
1e-07
1e-06
1e-05
|Ψ(G) - Ψ0(G)|2
Rpreconditioner(G)
Rno preconditioner(G)
0 200 400 600
0
1e-07
2e-07
3e-07
Figure 3.8: Effect of preconditioning in case of a ZnO-bulk with 65 Ry energy cut-off. In this figure the G
frequency resolved spectrum of the residual wave function |Ψ(G)Ψ0(G)|2is depicted. Ψ(G)is the wave
function of a randomly chosen state after performing an iteration of a conjugate gradient method. Ψ0(G)
refers to the fully converged wave function belonging to the same state. The solid line depicts the result
obtained with an Arias-like preconditioner whereas the dotted line shows the same situation without the
application of a preconditioner. In both cases the error wave function has the strongest contribution in the
region of small Gvectors. Hence, the time consuming part of the minimization is due to the long-range
contributions. In case of an high energy cut-offsystem (here 65 Ry) the preconditioner can decrease the
error wave function significantly and hence the convergence rate becomes improved. The inset magnifies the
low-frequency region on an linear scale.
102
CHAPTER 3. S/PHI/NX 3.3. CLASS HIERARCHY
0510 15 20 25
iteration
1×10-7
1×10-6
1×10-5
1×10-4
1×10-3
Eband - E0
band (H)
with subspace rotation
without subspace rotation
Figure 3.9: Using the state-by-state conjugate gradient scheme for a band structure calculation of a
semiconducting system (ZnO-bulk). The diagram displays the convergence rate of the band energy
Eband =,iσkωkfocc
iσkεiσk.Note that the total energy Etot is not variational when the density is kept fixed
and hence, Etot cannot be used to analyze the convergence behavior. The energy axis is scaled according to
the converged band energy value E0
band.The picture also shows that the application of the subspace rotation
is also useful for semiconducting systems. Without subspace diagonalization the numerical error is energy
dependent. Energetically higher lying states show a larger numerical error and converge therefore slower.
When the numerical accuracy has been reached H is not entirely diagonalized.
matrix containing the orthonormalized wave function coefficients by applying an uniform transformation U.
S=˜
C˜
C(3.41)
Sv =sv(3.42)
U=v1
sv(3.43)
C=U˜
C.(3.44)
The implementation in S/PHI/nX is as simple as
I = S.identity ();
eig = S.eigensystem (); // Eq. (3.42)
U = eig.vecs.adjoint() ^ (I/sqrt(eig.vals)) ^ eig.vecs; // Eq. (3.43)
psi = U ^ psi; // Eq. (3.44)
Please note, that SxMath internally computes the optimal data types (see Sec. 3.1.4) depending on the
matrix shape and type of Sand automatically applies blocking (see Sec. 3.1.3). While the source code
remains very close to the actual equations, the executional performance is very high.
The usage of the wdin orthonormalization has two significant advantages over the Gram-Schmidt coun-
terpart. It is very efficient as it is based only on blocked operations and can, therefore, exploit modern
computer architectures most efficiently. Numerically it is also more stable than the Gram-Schmidt scheme
because numerical errors are uniformly distributed to all states equally. In the Gram-Schmidt scheme the
numerical error increases with higher lying states.
RMM-DIIS The final element in providing an efficient interface to implement efficient minimization
techniques is the support of charge density mixing schemes (see p. 51). The S/PHI/nX libraries and the
functional programming approach allow a straightforward implementation of the RMM-DIIS mixing scheme:
103
3.3. CLASS HIERARCHY CHAPTER 3. S/PHI/NX
for (i=0; i < m; ++i) {
for (j=i; j < m ; ++j)
A(i,j) = (dR(i) | dR(j)); // Eq. (2.51)
B(i) = (dR(i) | R(m)); // Eq. (2.53)
}
alpha = -(A.inverse() ^ B); // Eq. (2.52)
M = g2 / (g2 + q0); // Eq. (2.49)
K = rho + (M * rho.toG()).toR (); // apply metric in <G| space
rhoOpt = K*R(m); // Eq. (2.54)
for (i=0; i < m-1; ++i)
rhoOpt += alpha * (dRhoIn(i) + K*dR(i));
In order to increase the source code transparency, in S/PHI/nX the preconditioner Kis defined as an operator
which can be applied on charge densities or residual vectors via the *”–operator.
Linearized gradient test
The iterative nature of the electronic minimization schemes introduces a difficulty when developing/modifying
the Hamiltonian or potential classes. During the development process a significant time is spent in testing
the modified algorithms. In case of iterative minimization schemes this is, however, challenging. The result
can only be verified after full convergence has reached. In case of inconsistencies the result will converge to
an unphysical value and locating the source of the problem is not trivial. This problem also applies to the
identification of the origin of numerical inaccuracies. In numerics various expressions can introduce losses in
the accuracy, e.g., division of large values by small numbers [92]. After the SCF cycle the identification of
the source line where such numeric instabilities have been introduced is difficult.
In order to address these two important issues, in S/PHI/nX a test environment to verify the consistency
of potential contribution and its corresponding gradient has been developed. This linearized gradient test
environment is demonstrated in Fig. 3.10. In this example the performance of a quadratic line minimization
(see Sec. 2.1.4) is analyzed to test whether a second order fit using one trial energy E(n)
trial and the derivative
Eq. (2.26) is sufficient. While the incorporation of higher orders might lead to a better sampling of the true
energy function, more total energy values or derivatives need to be provided. Every energy value or derivative
is computationally demanding37. From Fig. 3.10 one can see that even far offthe minimum the prediction
of λmin is only 20% smaller than the true minimum. With every subsequent iteration the prediction is close
to the true value. Hence, for our applications higher order fits would only waste CPU time.
Such tests are typical for the development process. In S/PHI/nX any part of the Hamiltonian can be tested
non-iteratively and separately for single states using the linearized gradient test. Numerical inconsistencies
between the analytic energy expression and the numerical linearized gradient can be determined easily.
By applying the linearized gradient test for every contribution to the Hamiltonian numerical instabilities
could be easily identified. The highest numerical accuracy of the total energy in S/PHI/nX is currently38
Etot 1e12 H.This is equivalent to the estimated achievable numerical accuracy39.
37It requires an update of the Hamiltonian ˆ
H["]and an orthogonalization step!
38The measurements have been performed on AMD Opteron 246, 64-bit, using GNU C++ compiler.
39The achievable accuracy can be roughly estimated [92] depending on the involved operations: data type “double” = 16
digits. (+,)=1 digit,(, /, sqrt) = 2 digits,(sin,cos,exp) = 4 digits.
104
CHAPTER 3. S/PHI/NX 3.3. CLASS HIERARCHY
00.5 11.5
λ (a.u.)
-130
-120
-110
-100
E (Hartree)
Etot
Efit
Elin
Etrial
λmin
E0, D
(a) After initialization with random numbers
-0.2 0 0.2 0.4 0.6
λ (a.u.)
-140.5
-140.4
-140.3
-140.2
-140.1
-140
-139.9
Etot (Hartree)
Etot
Efit
Elin
0.3 0.35 0.4 0.45
λ
-140.47
-140.46
-140.45
-140.44
-140.43
Etot
(b) 10th iteration step
0510 15 20 25 30
iteration
0
0.5
1
1.5
| λmin- λmin,fit |2 (a.u.)
(c) Differences between predicted and acutal
minimum location
Figure 3.10: Quality of the quadratic line minimization demonstrated by means of a ZnO bulk analyzed using
the linearized gradient test with the total energy Etot,the fitted energy Efit (Eq. (2.27), and the linearized
energy Elin (Eq. (2.26)). (a) Far offthe minimum the predicted energy value does not fit well the actual total
energy value. However, applying the predicted minimum λtrial improves the wave function significantly. Each
iteration step leads to a new starting value that is closer to the quadratic regime. Therefore the quadratic
line minimization predicts the minimum better with every step. (b) After performing a few iterations the
fitted total energy Efit,0is still not predicting the real total energy minimum (see Fig. (b) inset). However,
the discrepancy between λtrial and the true λis very small. Higher order fits could not significantly improve
the predicted λand would, therefore, not justify additional (expensive) total energy calculations. The closer
the wave functions are to the self-consistent solution the better the quadratic fit. Hence, the scheme stabilizes
itself. (c) The difference between the real and the predicted location of the minimum vanishes rapidly with
progressing conjugate gradient steps.
105
3.3. CLASS HIERARCHY CHAPTER 3. S/PHI/NX
3.3.2 Representing atomic structures
In a plane-wave package the majority of the CPU time is spent in the computation of the Born-Oppenheimer
surface (Sec. 2.1) and the Hellmann-Feynman forces (Sec. 1.7.1). The time which is required to move atoms
along the force gradients to relax atomic structures or to perform molecular dynamics (Sec. 2.2.2), can
be neglected compared to the minimization of the total energy. However, the S/PHI/nX project aims to
be a basis-set independent program and many different potential types should be applicable within this
framework, ranging from computationally expensive DFT potentials to fast empirical potentials such as
Stillinger-Weber or Lennard-Jones potentials [1]. Empirical potentials can easily be applied to systems with
thousands or tens of thousand of atoms [4]. The evaluation of the forces with these potentials is very fast.
Therefore, when developing/implementing structure optimization algorithms into S/PHI/nX, performance
is a most crucial issue. In order to cope with this performance problem while providing a similarly intuitive
library like the S/PHI/nX DFT library we introduce in this section a concept of how atomic structures can
be represented efficiently with respect to CPU time while providing an intuitive programming interface.
Modular description of atomic structures and forces
The equations used in structural related algorithms, such as structure optimization [99], transition state
searches [1], phonon calculations (Sec. 2.3), or molecular dynamics are often based on Newton’s mechanics.
Most of these schemes involve rather simple algebraic equations. Yet actual implementations in modern
programs tend to be large. Consider for example a simple damped Newton algorithm. The atom structure,
given as a set of atomic positions {τ},can be relaxed according to [61]
τn+1
isiad= (1 + λis)τn
isiadλisτn1
isiad+µisFisiadwith d=(xyz).(3.45)
Here τisiadenotes the atomic position of the atom iabelonging to the species is.λand µare species dependent
convergence parameters. λcan be interpreted as a damping parameter and µas a reduced mass. Fisiadis
the dth component of the force vector. The previous, current, and new atomic structures are identified by
their iteration numbers n1,n, and n+1,respectively. The implementation of such expressions requires
extensive loops over the indices is,i
a,and d. Some representative example code code read as follows:
for (is=0; is < nSpecies; ++is) {
for (ia=0; ia < nAtoms(iSpecies); ++ia) {
for (d=0; d < 3; ++d) {
tauNew(is,ia)(d) = (1. + lambda(is)) * tau(is,ia)(d)
+ lambda(is) * tauPrev(is,ia)(
+ mu(is) * F(is,ia)(d);
tauPrev(is,ia)(d) = tauNew(is,ia);
}
}
}
Following the discussions in the previous section the resulting machine code is inefficient: numerical oper-
ations on the arrays tau,lamda, and Fare inefficient because software pipelining/look-ahead mechanisms
(p. 66) cannot be applied. Even more important, this example code does not benefit from matrix blocking
106
CHAPTER 3. S/PHI/NX 3.3. CLASS HIERARCHY
algorithms (Sec. 3.1.3). Besides the efficiency, the development of an intuitive source code has been and is
always a key item in the S/PHI/nX project. The required source lines for loops and index handling requires
a detailed index-based implementation which can be cumbersome and error-prone40. In order to simplify
Eq. (3.45) the terms are separated into classes of variables:
1. 3d coordinate vectors depending on (d, is,i
a),e.g., τor F
2. parameters depending on the species isonly, e.g., λor µ
3. entities independent of species or atoms
For the first group the S/PHI/nX class SxAtomicStructure has been implemented which basically represents
the atomic coordinates in terms of nspecies coordinate matrices with dimensions 3×natoms.These coordi-
nate matrices are represented with SxMatrix<T> objects (see Sec. 3.2) which allows for an efficient BLAS3
mapping. The algebraic operators (’+’, ’-’, ’*’, ’/’ etc.) must be defined such that they loop automatically
over the proper indices, for example:
operator+ (SxAtomicStructure a, SxAtomicStructure b)
{
// --- verify code consistency (see Sec. 3.1.4)
SX_CHECK (a.getNSpecies() == b.getNSpecies());
SX_CHECK (a.getNAtoms() == b.getNAtoms());
SxAtomicStructure res(...);
for (is=0; is < a.getNSpecies(); ++is)
res(is) = a(is) + b(is); // mapped to BLAS3 matrix operation!
return res; // using ref. counting (p. 71)!
}
Note that the expressions res(is),a(is), and b(is)are efficient matrix operations.
Loops of the second type of the above list (species dependent entities) can be implemented similarly:
operator* (SxVector<T> a, SxAtomicStructure b)
{
// --- verify code consistency (see Sec. 3.1.4)
SX_CHECK (a.getSize == b.getNSpecies());
SxAtomicStructure res (...);
for (is=0; is < b.getNSpecies; ++is)
res(is) = a(is) * b(is); // mapped to BLAS2 operation
return res; // using ref. counting
}
40In this simple example 3 source lines are required for index loop handling for a single equation. For more complex algorithms
considering case differentiations (discussed below) the ratio between source lines for index handling and actual expressions can
become even worse.
107
3.3. CLASS HIERARCHY CHAPTER 3. S/PHI/NX
With such an approach the extensive index handling can be completely avoided and the corresponding source
code for implementing Eq. (3.45) becomes as simple as
SxAtomicStructure tauNew, tau, tauPrev, F;
SxSpeciesData lambda, mu;
while ( (tauNew-tau).absSqr().maxVal() > 1e-8) { // convergence?
F = hamSolver.getForces (tau); // independent of actual H
tau = tauNew;
tauNew = (1+lambda)*tau - lambda*tauPrev + mu*F; // Eq. (3.45)
tauPrev = tauNew;
}
As for the electronic part (previous section), the numerical operations are mapped to the proper BLAS calls
to guarantee peak performance.
Coordinate representation
Depending on the structure optimization schemes coordinates of atoms or forces must be treated differently.
A very intuitive way of describing a coordinate is in the Cartesian system, e.g.,
τcart =
x
y
z
.
In other scheme, such as the quasi Newton scheme [99] which is described in the following section, a degree
of freedom (DoF) representation of all coordinates at once is necessary, e.g.,
τDoF =
xia=1
yia=1
zia=1
xia=2
.
.
.
zia=na
.
Some algorithms even need to change between both representation repeatedly41. Due to efficiency consid-
erations copying of the vector elements must be avoided. The previously sketched solution combines both
representations. The internal storage using <T> allows a DoF representation while in S/PHI/nX functions
applying reference counting (p. 71) extract Cartesian coordinates without copying elements.
Transformations
In structure optimization calculations it is often very useful to apply constraints. For example, when per-
forming a relaxation of the surface layers only, the surface atoms should be relaxed whereas atoms belonging
41For example, structure relaxation of spatially constrained atoms (coordinate representation) with the BFGS algorithm [99]
(DoF representation).
108
CHAPTER 3. S/PHI/NX 3.3. CLASS HIERARCHY
to the bulk region must be kept fixed. Furthermore, it should be possible in S/PHI/nX to constrain high-
frequency atomic movements along specified directions. When computing molecular systems center of mass
filters can be applied or rotations due to numerical noise have to be projected out. Typically those operations
are intermixed with the structural algorithms such as
for (is=0; is < nSpecies; ++is) {
for (ia=0; ia < nAtoms(is); ++ia) {
if (moveAtom) { // keep atoms fixed (‘‘sticky filter’’)
if (applyCenterOfMass) {
...
}
if (applyHighFreq) {
...
}
if ...
}
}
The same filter operations have to be applied to all structural algorithms which lead to a large redundancy
of source code fragments. In order to decouple such constraints from the actual multidimensional mini-
mization algorithms, “S/PHI/nX transformation pipelines” have been introduced. In this approach one can
define all transformations in terms of a new operator acting on τisiaand/or Fisia.Letting ˆ
Tbe a general
transformation, Eq. (3.45) would become
τn+1 = (1 + λ)τnλτn1+µˆ
TF.(3.46)
The actual form of ˆ
Tcan be defined elsewhere, for example as user input from an input file. Of course,
transformations have to be capable of combing like
ˆ
T=ˆ
T1|ˆ
T2|. . . |ˆ
Tn.(3.47)
Every transformation can be modularly defined in a separate S/PHI/nX class:
class SxStickyFilter : public SxTransform
{
...
operator* (SxAtomicStructure a) {
SxAtomicStructure res = ...;
return res;
}
};
class SxCenterOfMass : public SxTransform
{
operator* (SxAtomicStructure f) {
return f - f.sum () / f.getNAtoms();
109
3.3. CLASS HIERARCHY CHAPTER 3. S/PHI/NX
}
};
The base class SxTransform provides the pipeline mechanism which allows to decouple the source codes of
transformations entirely from those of the minimization schemes:
// --- create dynamic pipeline of transformations
SxTransform T;
if (applyStickyFilter) T = T | SxStickyFilter (...);
if (applyCenterOfMass) T = T | SxCenterOfMass (...);
// --- decoupled structural minimization scheme
while ( (tauNew-tau).absSqr().maxVal() > 1e-8) { // convergence?
F = T | hamSolver.getForces (tau); // apply transformation pipeline
tau = tauNew;
tauNew = (1+lambda)*tau - lambda*tauPrev + mu*F; // Eq. (3.46)
tauPrev = tauNew;
}
Quasi Newton
The quasi Newton scheme has been presented in Sec. 2.2.1. Using the previously discussed techniques for
representing atomic structures in S/PHI/nX and the usage of transformation pipelines this algorithm can
be implemented remarkably simple while the machine code is very efficient due to reference counting (p. 71,
blocking (p. 76). This efficiency is important for large-scale calculations using empirical potentials.
tau -= B.inverse() ^ g.coordRef (); // (A): convert DoF to cart. repr.
g = T | getForces(x); // (B): apply transformation pipelines
s = tau - tauOld; // Eq. (2.59)
y = g.coordRef () - gOld.coordRef (); // Eq. (2.60)
// --- (C): BLAS mapping
B -= B^s^s.transpose()^B.transpose()) / (s.transpose()^B^s) // Eq. (2.61)
- (y^y.transpose()) / (y.transpose()^s);
In this example all techniques described in this section have been applied. In line “(A)” the degree of freedom
representation is converted without performing copy operations to the Cartesian representation. The forces
are filtered/transformed in line “(B)”. The DoF representation is mapped to BLAS calls in line “(C)”.
3.3.3 Add-ons
So far we discussed the modular library aspects of the S/PHI/nX project. By including the above described
classes simulation programs, analysis tools, or other preparation tools can be easily developed by including
the S/PHI/nX class libraries. The executables built with S/PHI/nX are called S/PHI/nX add-ons. They
provide a standardized command line interface. The output of one S/PHI/nX add-on can serve as input
110
CHAPTER 3. S/PHI/NX 3.4. COMPARISON WITH VASP
for another. This way S/PHI/nX add-ons are scriptable. Currently S/PHI/nX comes with a set of about
50 add-ons such as tools to setup complex atomic structures, analysis tools to compute (partial) densities
of state (DOS), tools to operate on wave functions and/or charge densities/potentials as well as file i/o
converters to connect the S/PHI/nX project to 3rd party tools.
The usage of the highly abstract S/PHI/nX library allows the add-ons to be remarkably short with respect
to the number of source lines. Typically, even complex analysis operations could be programmed with 50-150
source code lines.
Since the IT market progresses rapidly with respect to the introduction of new hardware and operating sys-
tems, S/PHI/nX has been developed as cross-platform project. To simplify future ports to new architectures,
S/PHI/nX has been and will be developed while strictly obeying the ANSI C++ standard which allows to
use a wide range of C++ compilers. In order to communicate with the underlying operating system only
a subset of POSIX functions has been applied. As a result S/PHI/nX is available on all major platforms,
such as Linux, MacOS X, FreeBSD, AIX, HP-UX, Windows XP/Vista/7rc1.
3.4 Comparison with VASP
In the previous sections we introduced the S/PHI/nX approach in which quantum mechanical algorithms can
be implemented in a formulation which is strongly reminiscent to the Dirac notation. This provides a number
of advantages: new algorithms can be easily developed and tested and code maintenance becomes simple
and the program package can be kept small. The ultimate question, however, is whether such advantages
introduce performance penalties which, of course, would render the entire approach useless.
In order to estimate the actual performance of the S/PHI/nX approach it is necessary to compare the run-
time performance of realistic simulations with other standard-codes. In the scope of this work thermodynamic
properties of III-V semiconductors are investigated. In the following chapter it will be shown how demanding
the computation of thermodynamic properties from first-principles is with respect to both accuracy and
run-time performance. Hence, we can combine the investigation of thermodynamic properties of III-V
semiconductors with performing benchmarks of the S/PHI/nX program package. In order to evaluate the
benchmark results, reference data have to be taken into account. In the DFT community one of the most
important and widely applied codes is VASP [26], the Vienna ab-initio simulation package. This package
became very successful due to its high accuracy and performance. It has been successfully employed to a
wide range of applications. Therefore, it is a good choice to consider performance data obtained with the
VASP42 package as reference for the following benchmarks.
Both VASP and the current version of S/PHI/nX are plane-wave codes. VASP, however, employs ultrasoft
pseudo potentials (USPP) and PAW while at the moment S/PHI/nX supports only norm-conserving pseudo
potentials. For systems which require an accurate description of the semicore states, these states need
to be treated as valence in the norm-conserving pseudo potential approach and a much higher energy-
cutoffis required than a comparable simulation based on USPP. The goal of the benchmark is to verify
that the S/PHI/nX ansatz yields a good run-time performance and not to confirm that USPP/PAW often
outperforms norm-conserving pseudo potentials. Although the potentials of both packages differ important
scaling information from such benchmarks can be derived:
42The tests have been performed with VASP 4.6 (serial version, pgf95) on AMD Opteron 246, 2.4 GHz.
111
3.4. COMPARISON WITH VASP CHAPTER 3. S/PHI/NX
05000 10000 15000 20000 25000
Volume (Bohr3)
1
2
4
8
16
32
64
Total Time (sec.)
S/PHI/nX
VASP
Figure 3.11: Run-time performance of S/PHI/nX and VASP in the low cut-offregime. Total execution time
(user + system time) of a single Si atom in a simulation cell with varying volume. Both codes show the
same overall scaling behavior with respect to the number of plane-waves.
1. The scaling behavior of both codes with respect to the unit cell volume or the number of plane-
waves is expected to be similar since both are plane-wave codes. A worse scaling of S/PHI/nX would
indicate problems of the abstract approach in S/PHI/nX. In contrast to conventional packages many
operations which are usually manually written are automatically mapped in S/PHI/nX (pipelining:
p. 66, BLAS/LAPACK mapping: p. 68, data type mapping: p. 72, Dirac projector mapping: p. 89). If
the generic S/PHI/nX interface is not as efficient as human developers a drop in the scaling behavior
schould be observed.
2. The comparison of the scaling behavior with respect to the number of atoms at the same energy cut-offs
and FFT mesh sizes allows to test the efficiency of the non-local projectors.
3. Benchmarks with the optimal parameters for each code (energy cut-off, mesh sizes, etc.) can test the
numerical accuracy that can be obtained with S/PHI/nX in comparison to VASP. Furthermore, the
overall timing for realistic studies of systems can be compared.
The first test addresses the scaling behavior with respect to the unit cell volume by means of a single Si atom
performed in a simulation cell with varying size. In this test the number of states and projectors remain
constant while the number of plane-waves is being changed. This test has been performed with the energy
cut-offES/PHI/nX,VASP
cut = 10 Ry and a kpoint at (1
4
1
4
1
4).The parameters to generate the pseudo potential
are listed in the appendix. The results are presented in Fig. 3.11. The fluctuations of both data sets is due
to the varying numbers of G+kpoints at different volumes. S/PHI/nX and VASP show the same scaling
behavior with respect to the system cell size. The execution speeds are comparable.
In Figs. 3.12 and 3.13 the run-time performance results for computing the total energy of cubic AlN bulk
are depicted to analyze different aspects of the scaling behavior with respect to the number of atoms. As
a measure, the time that is necessary to obtain the Born-Oppenheimer surface up to a reasonable accuracy
of E=1·108His taken to compute AlN with 2 atoms (1x1x1 fcc cell), 8 atoms (1x1x1 sc cell), 16
atoms (2x2x2 fcc cell), 54 atoms (3x3x3 fcc cell), 64 atoms (2x2x2 sc cell), 128 atoms (4x4x4 fcc cell), and
216 atoms (3x3x3 sc cell). This accuracy is required to obtain forces that are accurate enough to derive
thermodynamic properties. This benchmark has been conducted with three settings: (a) identical energy
112
CHAPTER 3. S/PHI/NX 3.4. COMPARISON WITH VASP
050 100 150 200
natoms
0e+00
1e+04
2e+04
3e+04
4e+04
Total Time (sec.)
S/PHI/nX
VASP
Figure 3.12: Scaling behavior of S/PHI/nX vs. VASP with respect to the number of atoms (AlN bulk). In
order to compare the results, the calculations with both S/PHI/nX and VASP have been performed with
the same settings for the energy cut-offEcut = 40 Ry and the FFT meshes.
050 100 150 200
natoms
0e+00
1e+04
2e+04
3e+04
4e+04
Total Time (sec.)
S/PHI/nX blocking, full mesh
VASP (29 Ry)
(a) Optimized parameters
050 100 150 200
natoms
0e+00
1e+04
2e+04
3e+04
4e+04
Total Time (sec.)
S/PHI/nX blocking, red. mesh
VASP (29 Ry)
VASP (22 Ry)
(b) Settings for fast but less accurate calculations
Figure 3.13: Performance comparison S/PHI/nX vs. VASP by means of AlN bulk using realistic parameter
settings for both codes.
cut-offs and FFT mesh sizes, (b) typical settings for each code, and (c) optimized settings for each code to
perform fast but less accurate calculations.
In Fig. 3.12 the benchmark results obtained with both codes and identical settings for Ecut and the FFT mesh
size nx×ny×nzare displayed. It can be seen that both codes show similar results. The fact that S/PHI/nX
is 25% faster than VASP to reach the given accuracy is likely due to the generalized eigenvalue problem of
PAW which requires an additional computational effort. The test results indicate, that the caching technique
which is necessary for applying the non-local projectors in S/PHI/nX are performing efficiently.
The previous benchmarks focused only on testing various technical aspects of the S/PHI/nX approach. From
a user’s point of view, however, only the actual execution speed is interesting. Therefore, in Fig. 3.13(a)
the same set of calculations have been repeated with optimal settings for both codes, respectively. The
VASP calculations have been performed with an optimal energy cut-offfor this system of 29 Ry43. The FFT
43This value is suggested by the VASP potential file.
113
3.5. CONCLUSIONS CHAPTER 3. S/PHI/NX
and augmentation meshes have been chosen to yield efficient and accurate results44. The norm-conserving
pseudo potentials did not allow to decrease the energy cut-offbelow 40 Ry for the S/PHI/nX benchmark. In
order to achieve optimal performance of S/PHI/nX for this test, full blocking (see Sec. 3.1.3 and 3.3.1) was
enabled in the electronic minimization. It can be seen that S/PHI/nX is able to reach the same accuracy
30% faster than VASP in this example.
VASP provides the possibility to perform very fast but slightly less accurate calculations by reducing the
energy cut-offto a minimum value and to use slightly under-sampled FFT meshes which leads then to minor
wrap-around errors (see Sec. 2.1.1 on page 45). The benchmark results with reduced meshes are shown in
Fig. 3.13(b). The VASP calculations have been performed with the minimum value of Ecut = 22 Ry while
S/PHI/nX uses again 40 Ry. Both sets of calculations performed with S/PHI/nX and VASP applied the
reduced FFT meshes. This test gives almost identical run-time performances of both codes.
In the last test we compute the thermodynamic properties α(T)and Cp,V (T)of GaAs bulk as representative
system with S/PHI/nX and compare the obtained results with VASP. The computed data are plotted in
Fig. 3.14. The computational details (Ecut,k-point sampling, information about potential generation) can
be found in the appendix. In the region of interest (T<300 K) both methods yield almost identical results.
Only in the high temperature regime of T>1000 K the slope of the linear expansion coefficients is slighly
smaller (α2·107K1) with our S/PHI/nX calculations. For this work we can, therefore, neglect the
negative aspects of pseudoization. Note that with the same accuracy settings VASP’s PAW calculation was
30% slower than the norm-conserving pseudopotential simulation performed with S/PHI/nX at a higher
energy cut-off.
The performed benchmarks show that the highly abstract meta-language of S/PHI/nX which provides Dirac-
notation, automatic blocking and memory management, can be used to develop a highly optimized and
accurate DFT program package. In S/PHI/nX, the compiler “understands” the quantum mechanical context
and can replace the required optimized algebraic operations with peak performance function calls. While in
conventional program packages the human developer optimizes only some critical routines the S/PHI/nX
approach shifts this task to the compiler and all routines are optimized. Optimization in S/PHI/nX focuses
mainly on blocking techniques in order to exploit modern computer hardware. Since no or only little manual
optimization is needed in S/PHI/nX the source code remains very short and intuitive.
It can be concluded that many tasks which had to be programmed manually can now be automatically
handled during the compilation without loosing executional performance. While the developers can focus
on implementation of quantum mechanical algorithms based on the S/PHI/nX Dirac notation the library
performs tedious and error-prone tasks automatically. The above benchmarks show that S/PHI/nX is able
to perform this mapping at least as efficiently as human developers.
3.5 Conclusions
In this chapter we discussed the ideas and concepts behind the S/PHI/nX density functional theory program
package. Based on various new programming techniques an introduction of the Dirac notation to the C++
language became possible. By exploiting object-orientation we were able to mimic the building blocks of
quantum mechanics in terms of C++ classes. The fundamental elements of this construction are Dirac
vectors, Dirac basis-sets, Dirac wave functions. The hierarchy is “glued” with Dirac projectors. Since in
44using VASP parameters: ALGO=fast, ADDGRID=false, PREC=medium
114
CHAPTER 3. S/PHI/NX 3.5. CONCLUSIONS
0250 500 750 1000 1250 1500
T (K)
0
2e-06
4e-06
6e-06
8e-06
α(1/Κ)
S/PHI/nX
VASP
melting point
(a) linear expansion coefficient α(T)
0250 500 750 1000 1250 1500
T (K)
0
1
2
3
4
5
6
7
Cp (kB)
S/PHI/nX
PAW
melting point
(b) specific heat Cp(T)
0250 500 750 1000 1250 1500
T (K)
0
1
2
3
4
5
6
7
CV (kB)
S/PHI/nX
VASP
melting point
(c) specific heat CV(T)
Figure 3.14: Comparison of the (a) linear expansion coefficient and (b) specific heat computed with norm-
conserving pseudo potential plane-wave (PW-PS) with our S/PHI/nX code and with PAW using VASP. The
results obtained with S/PHI/nX reflect almost perfectly the VASP results. Only in the high temperature
regime (T>1000 K) the data computed with S/PHI/nX and VASP differ slightly (α(T= 1500) =
2·107K1and Cp(T= 1500) = 0.03 kB).
115
3.5. CONCLUSIONS CHAPTER 3. S/PHI/NX
the new ansatz the compiler is able to “understand” the quantum-mechanical context, it can replace the
abstract Dirac terms with highly optimized numerical routines. In contrast to conventional programming,
where human developers optimize only crucial parts of the program package, now the compiler performs
an automatic code optimization throughout the entire package. Here, optimization refers to an automatic
application of blocking algorithms and mapping to efficient function calls of high performance numeric
libraries. It must not be confused with the intrinsic code optimization (e.g. “-O2”). The application of the
Dirac notation in the source code has also significant advantages for the developer. Developing can now be
accomplished in a physics language. Common issues of programming, such as memory management, calling
unhandy functions from numeric libraries like BLAS or LAPACK are entirely shifted to the compiler, which
speeds up the procedure of code developing, testing, and maintaining. The resulting code is dramatically
shorter than usual packages. Using this approach we were able to derive the DFT Hamiltonian in only 550
code lines. The huge degree of flexibility could be combined with automatic peak performance. The entire
library is prepared to work with other basis-sets.
116
Chapter 4
Applications
4.1 Introduction
In this section we apply the S/PHI/nX program package to compute thermodynamic properties such as
the phonon dispersion curves ωi(q), the linear expansion coefficient α(T),and the heat capacity Cp,V (T)of
III-V semiconductors. This semiconductor class is nowadays important for a manifold of applications. They
play a major role when building up electronic and opto-electronic devices, including lasers, LEDs in the blue
and UV regions of the spectra. It is important to understand their physical properties during the growth
process as well as while they operate. Depending on the growth conditions their fabrication occurs at higher
temperatures (e.g. growth of GaN at 950 K [117]), while they operate usually at room temperature.
As mentioned earlier (see p. 13) the derivation of trends is one of the fundamental tasks of CMD. In order to
study thermodynamic trends in this work we investigate the most frequently applied III-V semiconducting
systems which can be built up from combinations of neighboring elements in the periodic system: Al, Ga, In
with N, P, and As, see Fig. 4.1. From these elements a matrix of the following 9 important semiconductors
can be systematically investigated: AlAs, GaAs, InAs, AlP, GaP, InP, AlN, GaN, and InN which can act
as a basis for deriving important thermodynamical trends. These systems crystallize in the wurzite and/or
zincblende phase. In the zincblende phase many of them1are reported to exhibit a thermal expansion
anomaly, i.e., up to a critical temperature the lattice parameter decreases with increasing temperatures.
1For the nitrides in the zincblende phase no experimental data for α(T)were found.
Figure 4.1: Location in the periodic systems and electronic configuration of the involved elements to inves-
tigate the following systems: AlAs, GaAs, InAs, AlP, GaP, InP, AlN, GaN, InN.
117
4.2. THERMODYNAMIC PROPERTIES CHAPTER 4. APPLICATIONS
From the technological point of view the exact knowledge of this anomaly is important since it determines,
e.g., the thermal lattice mismatch between substrate and semiconductor. It will be shown in the following
that for most of the above mentioned systems the experimentally available data on the linear expansion
coefficients scatter significantly (in particular for AlAs, InAs, GaP and InP).
Although in literature thermodynamic properties have been calculated from first-principles there is only
little known about the influence of the XC functional. The previously performed theoretical studies applied
mainly LDA. In this chapter we present our results with both the LDA and the GGA-PBE functional in
order to roughly estimate how the choice of the XC functional influences the accuracy of the computed α(T)
and Cp,V (T).
The details about the applied pseudo potentials are given in the appendix. For Ga and In the semicore d-
states can be treated either as core or as valence states in the pseudo potential approach. In Refs. [118, 119] it
has been shown that an explicit treatment of the d-semicore states in the valence improves the structural and
cohesive properties (such as a0and Eb) significantly. Therefore, we expect that the choice of the treatment
of the delectron influences also the description of ω,α(T),and Cp,V (T).Following Ref. [119] we constructed
the pseudo potentials GaNLCC,InNLCC which employ non-linear core correction [120] as well as the pseudo
potentials Ga3d,In4d which include the 3d/4d electrons in the valence part. Following Ref. [119] for these
two systems we also apply non-local projectors for the s, p, d components, and the f component as the local
potential.
4.2 Thermodynamic properties
4.2.1 Convergence aspects
The derivative nature of the thermodynamical properties suggests a severe effect of convergence issues on the
quality of properties computed from first-principles: Cp(T),C
V(T),and γ(T)can be expressed as second
derivatives of the free energy F(T, V )(see Eqs. (2.81), (2.82), and (2.83)) which in turn depends on the
forces via Eqs. (2.70), (2.71), and (2.75). As introduced in Sec. 1.7.1 forces are derivatives of the total
energy surface. Hence, very small numerical errors in the total energy arising from, e.g., an incompleteness
of the basis-set, k-point sampling, or small problems related to the pseudo potentials, will largely influence
the accuracy of the obtained free energy surface. This has been observed for metallic systems [37].
Thus, we first focus on convergence aspects with respect to Ecut and the k-point mesh density. Furthermore,
we estimate the influence of the pseudo potential approach for the achievable accuracy of the calculated
thermodynamic properties in comparison to a PAW description. We demonstrate the general approach
by means of the GaAs bulk system as a representative. For the other systems analogous tests have been
conducted. The obtained converged parameters for the investigated systems are presented in the appendix.
Convergence of bulk properties at T=0 K
A high accuracy of the equilibrium lattice constant a0,the Bulk modulus Band its derivative B#at T=0 K
is crucial as these data provide the reference values for the subsequent temperature-dependent entities. In
Tab. 4.1 they are presented for various energy cut-offs. In order to estimate the intrinsic uncertainty of DFT
that exists due to the treatment of the exchange-correlation potential for the investigated systems, we have
calculated the entities with both LDA and PBE. The comparison with FP-LAPW calculations allows a rough
118
CHAPTER 4. APPLICATIONS 4.2. THERMODYNAMIC PROPERTIES
estimation of the error introduced in the frozen-core approximation and the pseudoization procedure. Above
an energy cut-offof 15 Ry the computed data show excellent agreement (a0.04 Å) with FP-LAPW
data [121] as well as the experiment [122, 123, 124]. Similarly to many other systems LDA tends also for
GaAs bulk to overbind while PBE underbinds slightly.
Ecut a a/a0a0/aexp
0B B/B0B/Bexp B#
(Ry) (Å) (%) (%) (GPa) (%) (%)
(a) Experiment
5.65 [122] 76.9 [123] 4.80 [124]
(b) LDA
10 5.794 +3.07 +2.49 103.28 +29.04 +25.55 0.99
15 5.631 +0.27 -0.34 75.177 +2.50 -2.29 5.12
20 5.622 +0.11 -0.50 72.05 -1.72 -6.72 4.44
25 5.616 +0.00 -0.61 73.88 +0.79 -4.08 4.21
30 5.615 -0.02 -0.62 73.39 +0.13 -4.78 4.36
40 5.615 -0.02 -0.62 74.24 +1.28 -3.57 4.17
50 5.616 +0.00 -0.62 73.29 +0.00 -4.92 4.29
FP-LAPW 5.621 [121] 74.2 [121]
(c) PBE
10 6.008 +4.20 +6.34 92.02 +59.97 +19.67 15.41
15 5.779 +0.23 +2.28 56.42 -1.92 -26.63 5.31
20 5.776 +0.17 +2.23 59.09 +2.72 -23.16 4.25
25 5.768 +0.03 +2.09 58.12 +1.04 -24.41 4.51
30 5.766 +0.00 +2.05 57.66 +0.25 -25.01 4.67
40 5.766 +0.00 +2.05 57.52 +0.00 -25.19 4.68
FP-LAPW 5.74 [125] 59.96 [125]
Table 4.1: Convergence of GaAs bulk properties with resp. to the energy cut-offEcut in comparison with
(a) the experiment, (b) LDA, and (c) PBE: The computed equilibrium lattice constant aand the bulk
modulus Bare compared with the converged reference values at 50 Ry (a0and B0) and the experimental
values. Both have been extrapolated to T=0 K. B#denotes the computed bulk modulus derivative.
Convergence of thermodynamic properties
Based on the computed T=0 K values from the previous subsection the temperature dependence of both the
linear expansion coefficients and the heat capacity can be investigated at various energy-cutoffvalues and
k-point meshes.
When computing the phonon spectra in the direct approach the simulation cells have to be chosen large
enough to properly describe long range spatial interactions, i.e., the phonon branches close to Γ.The second
effect of the cell size is the sampling of the Brillouin zone with exact q-points. For example, by doubling
the unit cell q-points such as X can be sampled exactly2. If the simulation cell is too small, an unphysical
2consider a frozen phonon calculation of a linear atomic chain: the phonon at Xcan be described by constructing a super
cell obtained by repeating the unit cell twice along the x axis and displacing the atom (0,0,0) to (x, 0,0).
119
4.2. THERMODYNAMIC PROPERTIES CHAPTER 4. APPLICATIONS
interaction between the displaced atom and the mirror atoms in the periodically repeated unit cells occurs.
The forces acting on the displaced atoms are artificially screened by those from the displaced image atoms.
This effect reduces the corresponding phonon frequencies. Only if the cell is sufficiently large the artificial
forces originating from the displaced image atoms are screened by the undisplaced atoms and the phonon
frequencies can be obtained correctly. This effect is illustrated in Fig. 4.3. Using a cell of 64 atoms (2x2x2
folding of the conventional cubic cell) a minor reduction of the phonon frequencies can be observed at
1
4Kwhich disappears already at 1
2K.The artificial red shift is for all systems investigated here less than
1 meV and contributes only to a very small part of the q-space. The effect of this minor softening at 1
4K
is estimated to be negligible when integrating over the entire Brillouin zone and does, therefore, not justify
the significantly larger computational costs.
In Fig. 4.4 the influence of the plane-wave energy cut-offon α(T)and Cp(T)is being shown. The heat
capacity (Fig. 4.4b) is surprisingly insensitive compared to the linear expansion coefficient (Fig. 4.4(a)). A
value of about 15 Ry yields already converged results (Cp<0.1 J mol1K1).
In case of the linear expansion coefficients α(T), however, this picture changes. At 15 Ry the differences to
the 35 Ry curve is still significant. In order to obtain a converged absolute value of α(T= 500 K) an energy
cut-offof at least 25 Ry is necessary. That is surprising since according to Eq. (2.80) such a sensitivity would
not be expected compared to Cp(T). However, while the absolute values of the linear expansion coefficients
are still varying, at Ecut >15 Ry the slope of α(T)is already converged. The shift of the absolute values can
be explained by means of the abnormal thermal expansion behavior of GaAs at low temperatures. In the
regime at about T<80 K GaAs shows an abnormal thermal compression. With increasing temperatures
the volume contracts and α(T)<0.Only at approx. T>80 K GaAs shows the usual expansion behavior.
This well-known behavior (see for example Ref. [40]) of the III-V semiconductors in the zincblende phase
can be explained by means of the mode-Grüneisen parameters γ(k).According to Ref. [126] (Fig. 4.2(b and
c)) the energetically lowest transversal-acoustic phonon branches3TA, TA1, and TA2 at Γ,L and X have
all negative mode-Grüneisen parameters. At low temperatures mainly these negative branches are occupied.
With γ<0the entropy decreases with increasing lattice constant and SvibTshows a positive slope [127].
Consequently, the lattice contracts compared to the T=0 K lattice constant. With rising temperatures also
energetically higher lying phonon states with positive γvalues become occupied, which drives a normal
expansion behavior.
Both γand αdepend via Eqs. (2.77), (2.76), (2.83), and (2.80) on the volume dependence dωi/dV which
is due to its derivative nature numerically sensitive. Small numerical inaccuracies can induce inaccuracies
in γwhich consequently introduce inaccuracies in the prediction of the expansion behavior (Eq. 2.80). An
accurate description of the location and magnitude of the thermal anomaly requires thus a high accuracy of
dωi/dV as can be seen in Fig. 4.4a. In contrast to the discussion above (Tab. 4.1) here an energy cut-offof
25 Ry is necessary to obtain the location and magnitude of the thermal anomaly within acceptable accuracy
(α<1·107K1).This can be generalized to all systems with the same expansion behavior: a poor
description of the low temperature limit introduces a shift in the absolute values at high temperatures for
systems with such an abnormal thermal expansion behavior. This is due to a poor sampling of the volume
dependence of ωiand, therefore, the mode-Grüneisen parameters γ(k).
The analysis of the influence of the energy cut-offto α(T)and Cp(T)provides already a rough estimation
of the required energy cut-offbased on the (computationally inexpensive) T=0 K bulk properties. At
the minimum cut-offof 25 Ry the equilibrium lattice constant is already converged to a<0.001 Å and
3The nomenclature of phonon branches we use throughout this work can be seen in Fig. 4.2(a).
120
CHAPTER 4. APPLICATIONS 4.3. COMPARISON WITH EXPERIMENT
ΓKX ΓL X W L
0
10
20
30
40
ω (meV)
TA
TA
LA
TA1
TO
TA2
LO+TO TO2
TO1
TO
LO
(a) Nomenclature of phonon branches
ΓKX ΓL X W L
0
10
20
30
40
ω (meV)
a=10.2 Bohr
a=10.4 Bohr
a=10.6 Bohr
a=10.8 Bohr
(b) phonon modes at various volumes
ΓXΓL
-1
0
1
2
3
Mode-Gruneisen parameter γι
LA
LO
TO
TA
TA2
TA1
LO
TO1
TO2
LA LA
LO
TO
TA
(c) Mode Grüneisen parameters
Figure 4.2: Phonon modes that occur in zincblende semiconductors, shown by means of GaAs bulk. (a)
Nomenclature of phonon branches. The energetically lowest states are transversal acoustic branches (TA1
and TA2) which become degenerate at X. The transversal optical phonon (TO) is split into TO1 and TO2. (b)
Volume dependence of ωi(q).The phonon frequencies of ωTA (X) and ωTA2(L) increase with increasing lattice
constant. (c) Mode-Grüneisen parameters γ(q)from Ref. [126]. Most of the III-V and II-VI semiconductors
show negative thermal expansion coefficients at low temperatures [40]. That is due to the (flat) TA phonon
modes possessing negative mode-Grüneisen parameters.
B<1% as shown in Tab. 4.1. The first value is crucial when determining α,while the phonon frequencies
are correlated with the bulk modulus B.
In analogy to the above considerations, in Fig. 4.5 the influence of the sampling of the Brillouin-zone on
α(T)and Cp(T)is depicted. As for the energy cut-offalso the number of k-points has a stronger influence
on the absolute values of α(T)than to those of Cp(T).A sampling of 3×3×3k-points yields converged
data with an accuracy of α<1·108K1and Cp<0.1 Jmol1K1.
4.3 Comparison with experiment
4.3.1 Bulk properties at T=0 K
In order to evaluate the free energy surface at T=0 K the structural bulk properties a0, B, and B#have
been computed using the Murnaghan equation of state Eq. (2.76). In Tab. 4.2 the experimental data are
compared with our results for all 9 considered semiconductors obtained from LDA and PBE calculations. For
the investigated arsenides the equilibrium lattice constants are in good agreement with the experimental
121
4.3. COMPARISON WITH EXPERIMENT CHAPTER 4. APPLICATIONS
Γ1/2 K
0
2
4
6
8
10
ω (meV)
2x2x2: TA
3x3x3: TA
TA2
TA1
TA
Figure 4.3: Influence of the unit cell size to the long range limit of the acoustic phonon branches TA, TA1,
and TA2 in the interval [Γ,1
2K].The TA branch of GaAs bulk computed with a 64 atom (2x2x2) unit cell
(bold solid curve) shows a minor softening which disappears for the larger simulation 216 atom cells (bold
dashed curve).
0 100 200 300 400 500
T (K)
-4e-06
-2e-06
0e+00
2e-06
4e-06
6e-06
α (1/Κ)
ecut = 5 Ry
ecut = 10 Ry
ecut = 15 Ry
ecut = 20 Ry
ecut = 25 Ry
ecut = 30 Ry
ecut = 35 Ry
0 10 20 30 40
ecut (Ry)
1e-08
1e-07
1e-06
α−α0
(a) linear expansion coefficient α
0 100 200 300 400 500
T (K)
0
5
10
15
20
25
30
Cp (J/molK)
ecut = 5 Ry
ecut = 10 Ry
ecut = 15 Ry
ecut = 20 Ry
ecut = 25 Ry
ecut = 30 Ry
ecut = 35 Ry
(b) specific heat capacity
Figure 4.4: Convergence of (a) the linear expansion coefficient and (b) the specific heat of GaAs with respect
to the energy cut-off. The tests have been performed on a k-point mesh of 3x3x3. The inset in Fig. (a)
shows the convergence of αwith respect to the converged value α0at 35 Ry.
values. Like for many other materials also the lattice constants of the arsenides obtained from LDA follow
the general trend of underestimating the lattice constant slightly while PBE shows the opposite behavior.
The differences to the experimental values are always well below 2.5%. The bulk moduli and their derivatives
are also described well: the deviations to the experimentally obtained bulk moduli for AlAs and InAs are
less than 16%. Only in case of GaAs bulk the PBE value is underestimated by about 25%. However, this
is in good agreement with other norm-conserving GGA pseudo potential simulations: in Ref. [140, 141] the
influence of the chosen GGA functional to the deviation of Bis investigated. In their work a value of 56 GPa
has been obtained which is close to our PBE-value of 57.2 GPa.
In case of the phospides (second row of Tab. 4.2) the prediction of the lattice constant at T=0 K is even
closer to experiment. All deviations are less than 1.7%. As for the arsenides also in case of the phosphides
LDA(PBE) tends to over(under)bind slightly. We obtained bulk moduli with deviations to the experiment
of less than 10%. Please note, that for AlP (zincblende) no experimental data for the bulk modulus was
found. Therefore we compare our AlP results with FP-LMTO calculations.
122
CHAPTER 4. APPLICATIONS 4.3. COMPARISON WITH EXPERIMENT
0 100 200 300 400 500
T (K)
0
1e-06
2e-06
3e-06
4e-06
5e-06
6e-06
7e-06
α (1/K)
2x2x2
3x3x3
4x4x4
5x5x5
(a) linear expansion coefficient α
100 200 300 400 500
T (K)
5
10
15
20
Cp (J/molK)
2x2x2
3x3x3
4x4x4
5x5x5
(b) specific heat capacity
Figure 4.5: Convergence of (a) the linear expansion coefficient and (b) the specific heat of GaAs with respect
to the k-point density. As expected for a semiconductor only a few kpoints are required to obtain an
accurate Brillouin zone integration. The calculations have been performed with an energy cut-offof 25 Ry.
The linear expansion coefficient requires a Monkhorst Pack folding of at least 3x3x3 while the specific heat
is already well converged with a sampling of 2x2x2.
In the last two rows of Tab. 4.2 our LDA and PBE results for the nitrides have been compared with the
experiment or other theoretical calculations. The computed lattice constants of AlN and GaN-NLCC are
close to the experiment with deviations less than 1.3%. As for the previously discussed systems also here
LDA(PBE) under(over)estimates the lattice constant compared to the experimentally obtained values. In
InN-NLCC, however, the picture is different since both LDA and PBE underestimate the lattice constant.
The poor description of InN using NLCC is well known (see, e.g., Ref. [119]). InN is also particularly sensitive
to the choice of the description of the exchange-correlation potential. For example, the band gap is reported
to be positive (Eg=0.16eV) with LDA/NLCC [142] and negative with LDA/4d [118](ELDA,4d
g=0.40eV)
as well as PBE/4d [118](EPBE,4d
g=0.55eV).
Conclusion
For all investigated systems the convergence parameters that are necessary to obtain accurate thermody-
namical properties have been determined carefully. In order to determine the influence of the exchange-
correlation potentials the convergence analysis has been performed with both LDA and PBE. The general
trend of over(under)estimating the equilibrium lattice constant could be confirmed to be valid for the in-
vestigated systems. With the determined convergence parameters for Ecut and the k-point sampling all
computed lattice constants are in good agreement with the experiment. The deviations are always less than
2.1%. Only the predicted bulk modulus obtained with the PBE functional deviates in case of GaAs and InN
by 25% and 18%, respectively.
4.3.2 Phonon spectra
In this section the computed phonon spectra are presented which are needed to derive the thermodynamic
properties. A particular focus will be on of the exchange-correlation functional as well as the treatment of the
semicore d-states of Ga and In is investigated. In order to compare our results with the literature the phonon
dispersion curves have been calculated at the temperatures of the experiment. To do so, the corresponding
123
4.3. COMPARISON WITH EXPERIMENT CHAPTER 4. APPLICATIONS
AlAs GaAs InAs
LDA Exp. PBE LDA Exp. PBE LDA Exp. PBE
a5.628 5.663 [128] 5.749 5.616 5.65 [122] 5.766 5.978 6.058 [129] 6.131
a-0.62 +1.50 -0.62 +2.05 -1.34 +1.19
B73.10 74.4 [130] 65.28 73.29 76.9 [123] 57.52 63.12 59.2 [131] 51.04
B-1.78 -13.96 -4.92 -25.19 +6.22 -15.98
B#4.27 5.0 [130] 4.03 4.29 4.8 [124] 4.68 4.90 6.8[131] 4.50
AlP GaP InP
LDA Exp. PBE LDA Exp. PBE LDA Exp. PBE
a5.413 5.451 [132] 5.452 5.398 5.447 [42] 5.511 5.772 5.866 [40] 5.886
a-0.99 +0.00 -0.91 +1.16 -1.63 +0.34
B88.41 87* [133] 84.98 90.40 87.4 [134] 80.45 77.45 76 [135] 65.39
B+1.59 -2.37 +3.32 -8.63 +1.88 -8.57
B#3.65 4.30 [133] 3.61 3.98 4.5 [136, 134] 3.78 4.76 4.0 [135] 4.52
AlN GaN-NLCC InN-NLCC
LDA Exp. PBE LDA Exp. PBE LDA Exp. PBE
a4.03 4.36 [137] 4.387 4.449 4.5 [138] 4.525 4.83 4.98 [137] 4.92
a-1.32 +0.62 -1.15 +0.55 -3.11 -1.16
B206.84 202 [137] 191.15 214.02 190 [139] 177.02 182.51 137 [139] 146.64
B+2.34 -5.68 +11.22 -7.33 +24.94 +6.59
B#3.77 4.15G[138] 3.89 5.12 4.27G[138] 4.69 3.90 4.43G[118] 4.1
GaN-3d InN-4d
LDA Exp. PBE LDA Exp. PBE
a4.436 4.5 [138] 4.5 4.97 4.98 [137] 5.1
a-1.44 0 -0.04 +2.1
B213.42 190 [139] 195.31 136.79 137 [139] 116.18
B+10.98 +2.72 -0.02 -17.9
B#5.12 4.27G[138] 4.92 4.52 4.43G[118] 4.24
Table 4.2: Computed bulk properties of all investigated systems with LDA and PBE. Lattice constants
agiven in Å,Bulk modulus Bin GPa, and deviations to reference data aand Bin %. For systems
where experimental data are not available, other theoretical data act as a reference: Values labeled with
(*) refer to FP-LMTO-LDA calculation and Gmarks pseudo potential plane-wave GGA results rather than
experimental ones.
volumes V=Vexp have been extrapolated consistently from our LDA/PBE thermal expansion computa-
tions (Eqs. (2.79), (2.80)). The phonon dispersion curves ωi(q)|Vexp have been linearly interpolated [92] from
dispersion curves at different volumes ωi(q)V=V0±3%. In case of structures for which experimental data are
not available we overlayed our spectra with other theoretical data, which are usually performed without
including temperature effects. In this case we calculate our spectra also at T=0 K, i.e., at the equilibrium
lattice constants of LDA and PBE, respectively.
124
CHAPTER 4. APPLICATIONS 4.3. COMPARISON WITH EXPERIMENT
Arsenides In Fig. 4.6 the phonon dispersion curves of the arsenides are presented. Similar to the study
of Grabowski and co-workers [37] for metals, the frequencies obtained from LDA(PBE) under(over)estimate
the experimental data. As discussed earlier our data show a minor red shift of the acoustic branches in
the long-range limit due to the relatively small 64 atom cell. From Fig. 4.6 it can be seen that there is
a significant discrepancy between our results and experimental data in the optical branches in the vicinity
of the Γpoint. The experiment predicts a splitting of the LO and TO phonon branches while our data
displays a degeneracy. This difference can be explained with the missing Born effective charge tensor in our
calculations (see p. 58).
ΓKX ΓL X W L
0
10
20
30
40
50
ω (meV)
LDA@T=0K
PBE@T=0K
Photo-lum.@T=4K
DFPT-LDA@T=0K
(a) AlAs
ΓKX ΓL X W L
0
5
10
15
20
25
30
35
ω (meV)
LDA@T=0K
PBE@T=0K
Neutron@T=12K
DFPT-LDA@T=0K
(b) GaAs
ΓKX ΓL X W L
0
10
20
30
ω (meV)
LDA@T=80K
PBE@T=80K
x ray@80K
raman@330K
neutron@300K
DFTP-LDA@T=0K
DFPT-LDA@T=0K
(c) InAs
Figure 4.6: Phonon spectra calculated at the theoretical equilibrium lattice constants of LDA and PBE
compared with the experimental data. (a) AlAs. triangles: Raman scattering data at T=4 K [143, 144],
circles: pseudo potential DFPT-LDA [108]. (b) GaAs. open circles: experimental low temperature neutron
scattering data [145] at T=12 K, closed circles: DFPT-LDA [108] (c) InAs. open circles: therm. diffuse X-
ray data at T=80 K[146], boxes: Raman data at room temperature [147], triangles: neutron scattering data
at room temperature [146] , DFPT-LDA taken from Ref. [148] (solid circles) and from Ref. [149] (triangles
up).
For AlAs only experimental data of the phonon spectra at X and Γare available [143, 144]. We added also
a computed spectrum obtained in another study employing pseudo potentials [108]. The experimental data
points have been obtained by low temperature Raman scattering techniques at 4 K while Giannozzi et al.
performed LDA calculations based on Density Functional Perturbation Theory (DFPT) for the T=0 K case.
125
4.3. COMPARISON WITH EXPERIMENT CHAPTER 4. APPLICATIONS
Our results for the energetically lower lying acoustic branches, also obtained at T=0 K, are virtually identical
to those obtained by the linear response approach employed by Giannozi. Both theoretical data (DFPT-LDA
and ours) are approx. 2 meV smaller at X compared to the experimental value. Minor differences can also
be seen in the TA branch between Γand K which is related to the finite size of our simulation cell. The
position of the optical branches is slightly underestimated by approx. 1 meV in our calculations compared
to the DFPT-LDA and Raman data.
For the well studied system of GaAs among others Strauch [145] presented phonon spectra based on the
inelastic neutron scattering technique at low temperatures (T=12 K). In order to compare our results with
additional reference data we also added theoretical data based on the linear response method [108] using
LDA as exchange-correlation functional. For all acoustic phonons the experimental data are very close to
our computed LDA and PBE phonon spectra. The DFPT-LDA results are, however, by approx. 1 meV
larger than our LDA data as well as the experiment. This deviation is most probably due to different pseudo
potentials. The high-symmetry path ΓKis, in contrast to AlAs, almost the same as the experiment and
the linear response data. There is a margin of error of about 3 meV between experiment, the DFPT-LDA
and our data which is due to the lack of the LO-TO splitting in our approach.
The large scattering of the experimental data for InAs does not allow for a direct verification of our theo-
retical results. Since we did not find experimental data at low temperatures we extrapolated (see p. 123) our
LDA and PBE results to 80 K which corresponds to the conditions of the X-ray investigations [146]. Again
a discrepancy at ΓKcan be seen which, however, agrees with other theoretical studies ([148], [149]).
Phosphides The phonon spectra of the phosphides are shown in Fig. 4.7. The general shape of the phonon
spectra is similar to that of the investigated arsenides in the zincblende structure due to the same symmetries
(space group F43m).
In Fig. 4.7(a) we compare our LDA and PBE results for AlP with low temperature (5 K) Raman data [143]
at the Γ-point as well as theoretical phonons spectra obtained from DFPT-LDA pseudo potential investiga-
tions [126]. It can be seen that the influence of the chosen exchange-correlation functional to the phonon
spectra is very small for the zincblende AlP system. The DFPT data are virtually the same as ours, only
the long wavelength limit is slightly softer than ours which is due to the finite simulation cell size in our
approach. Also the optical branches are in very good agreement with both data sets, except the expected
lack of the splitting at the Γ-point.
Similarly good results could be obtained for the GaP system. Our data agree very well with the experimen-
tally obtained neutron spectra [150, 153] as well as the previously reported DFT-LDA spectra[148]. Only at
the L point our frequencies are by about 2 meV larger than the experimentally obtained frequencies. Our
data are, however, very close with the DFT-LDA curves from Ref. [148].
For the InP at low temperatures only Raman data at Γobtained at T=4 K are available [154]. The experi-
mental TO data are in good agreement with our results as can be seen in Fig. 4.7(c). For low temperatures
there are no experimental data available. We added results obtained from DFPT-LDA calculations [155]
which are up to 1 meV close to ours. In the long wavelength limit our phonon branches appear slightly
harder. Besides the T=4 K Raman data there are also coherent inelastic neutron scattering experimen-
tal data [153] available at room temperature. We have, therefore, evaluated the corresponding theoretical
phonon spectra at T=300 K. Our results indicate that an increase of the temperature leads to a slight red
shift of the phonon branches. The calculated shift is less than 0.5 meV for the acoustic and 1.2 meV for the
optical phonons.
126
CHAPTER 4. APPLICATIONS 4.3. COMPARISON WITH EXPERIMENT
ΓKX ΓL X W L
0
10
20
30
40
50
60
ω (meV)
LDA@T=0K
PBE@T=0K
DFPT-LDA@T=0K
Raman@T=5K
(a) AlP
ΓKX ΓL X W L
0
10
20
30
40
50
60
ω (meV)
LDA@T=0K
PBE@T=0K
neutron@T=15K
neutron@T=15K
DFPT-LDA@T=0K
Raman@T=300K
(b) GaP
ΓKX ΓL X W L
0
5
10
15
20
25
30
35
40
45
ω (meV)
LDA@T=0K
PBE@T=0K
Raman@T=300K
Neutron@T=4K
DFPT-LDA@T=0K
LDA@T=300K
PBE@T=300K
(c) InP
Figure 4.7: Phonon spectra calculated at the theoretical equilibrium constants of LDA and PBE compared
with the experimental data. (a) AlP. solid boxes: low temperature Raman data [143], circles: DFPT-
LDA [126] (b) GaP. Neutron data at 15 K from [150](circles) and [151] (triangles down), room temperature
Raman data [152] (solid circles), DFPT-LDA [148] (diamonds) (c) InP. solid circles: low temperature
neutron data [153], triangles: room temperature Raman data [154], diamonds: DFPT-LDA [155]
Nitrides Since no experimental data are available for cubic AlN, in Fig. 4.8(a) we compare our results
with another DFT-LDA investigation at T=0 K [156, 157]. Except the missing LO-TO splitting as well
as a slightly harder phonon in the long wavelength limit our data are in good agreement with the other
theoretical work.
In Fig. 4.8(b)-(c) we show the computed GaN phonon spectra evaluated employing with either NLCC or
an explicit description of the 3d electrons as valence (see p. 118). This figure also displays the available
experimental (low temperature Raman [158]) and theoretical (DFPT-LDA [156, 157]) data.
When the semicore states of Ga are treated as core states and taking the non-linear core correction (NLCC)
into account, the bonds appear stronger and the bond distances shorter than all-electron calculations would
suggest (see Ref. [119]). Our data are consistent with this trend: The stronger bonds obtained with LDA-
NLCC and PBE-NLCC result in higher phonon frequencies in comparison to those obtained with LDA-3d
or PBE-3d respectively. There is an increasing red shift of the phonon frequencies for different functionals
in the order LDA-NLCC, LDA-3d, PBE-NLCC, PBE-3d. This behavior can be assigned to the description
127
4.3. COMPARISON WITH EXPERIMENT CHAPTER 4. APPLICATIONS
ΓKX ΓL X W L
0
20
40
60
80
100
120
ω (meV)
LDA@T=0K
PBE@T=0K
DFPT-LDA@T=0K
(a) AlN
ΓKX ΓL X W L
0
20
40
60
80
100
ω (mEV)
LDA-nlcc@T=0K
LDA-3d@T=0K
PBE-nlcc@T=0K
PBE-3d@T=0K
Raman@T=300K
DFPT-LDA@T=0K
(b) GaN, T=0K
ΓKX ΓL X W L
0
20
40
60
80
ω (meV)
LDA-nlcc@T=0K
LDA-4d@T=0K
PBE-nlcc@T=0K
PBE-4d@T=0K
DFPT-LDA@T=0K
Raman@T=300K
FP-LAPW@T=0K
(c) InN, T=0K
ΓKX ΓL X W L
0
20
40
60
80
ω (meV)
LDA@T=300K
LDA-3d@T=300K
PBE-nlcc@T=300K
PBE-3d@T=300K
Raman@T=300K
DFPT-LDA@T=0K
(d) GaN, T=300K
ΓKX ΓL X W L
0
20
40
60
80
ω (meV)
LDA-nlcc@T=300K
LDA-4d@T=300K
PBE-nlcc@T=300K
PBE-4d@T=300K
DFPT-LDA@T=0K
Raman@T=300K
FP-LAPW@T=0K
(e) InN, T=300K
Figure 4.8: Phonon spectra at the theoretical equilibrium constants of LDA and PBE compared with the
experimental data. (g) AlN.ab-initio data (theory) from [156, 157], (h) GaN. exp. data from [158], ab-initio
data (theory) from [156, 157], (i) InN.ab-initio data (theory) from [156, 157]
of the band gap with the different exchange-correlation functionals. In Fig. 4.9(b) we show the computed
electronic band structures of GaN with LDA-3d/NLCC and PBE-3d/NLCC. Our PBE pseudo potentials
yield one-particle energies εi(k)very close to those obtained with FP-LAPW-GGA [159]. That indicates
128
CHAPTER 4. APPLICATIONS 4.3. COMPARISON WITH EXPERIMENT
that the pseudoization has no negative influence to the quality of our obtained electronic band structures.
The experimental band gap is reported to be 3.23 eV [160]. Our computed band gaps are 2.18 eV (LDA-
NLCC), 1.88 eV (LDA-3d), 1.81 eV (PBE-NLCC), and 1.66 eV (PBE-3d). They are close to results obtained
elsewhere (2.20 eV LDA-NLCC [161], 1.89 eV LDA-3d [118], 1.99 eV PBE-NLCC [162], 1.74 PBE-3d [159]).
By formally “improving” the exchange-correlation potential from LDA to PBE and by describing the d-
electrons as valence, the computed band gap deviates further away from the experimental value. This
artificially decreased band gap changes the electronic behavior of the described GaN system to be more metal-
like, i.e., an artificial metallic-like screening effect occurs. The screening reduces the interactions between
the atoms and thus, reduces the phonon frequencies. Therefore, the red shift of the phonon frequencies
correlates with the corresponding band gap (see inset of Fig.4.9a).
ΓKX ΓL X W L
0
20
40
60
80
ω (meV)
LDA@T=0K
LDA-3d@T=0K
PBE-nlcc@T=0K
PBE-3d@T=0K
1.6 1.8 2 2.2
Eg (eV)
0
2
4
Δω (meV)
(a) GaN, phonon band structure
WL ΓX W K
-14
-12
-10
-8
-6
-4
-2
0
2
4
6
8
εi (eV)
LDA-nlcc
LDA-3d
PBE-nlcc
PBE-3d
FP-LAPW
Γ
0
1
2
(b) GaN, electronic band structure
Figure 4.9: (a) Phonon dispersion curves as in Fig. 4.8b and (b) electronic band structure of GaN at T=0 K
at the theoretical lattice constants. The top of the valence band has been aligned to the Fermi level (0 eV).
The inset of figure (b) magnifies the area around the direct band gap at the Γpoint. Beside the computed
dispersion curves for LDA-NLCC, LDA-3d, PBE-NLCC, and PBE-3d results obtained from FP-LAPW [159]
using PBE have been added. The shift the phonon frequency ωTA (X) vs. the band gap Egis shown in the
inset of (a).
In Fig. 4.8(d)-(e) our InN results are presented. Since the experimentally obtained Raman spectra have
been measured at room temperature [163] we extrapolated our phonon spectra to the same conditions
(T=300 K) and find a deviation of less than 3 meV. Since the experiment provides only a value at the
Γ-point, we added a DFPT-LDA data set [164] as well as results obtained from FP-LAPW [163]. It can be
seen that our LDA calculation with the 4d valence electrons yields almost the same result as that taken from
Ref. [164]. Our LDA-NLCC and LDA-4d results are in good agreement with both experimental and other
theoretical findings and suggest a good basis for the subsequent thermodynamic calculations. While the LDA
results yield accurate phonon spectra the predicted phonon frequencies obtained with PBE are significantly
overestimated. This is likely related to the PBE functional itself. PBE is known to predict a number of
properties for the InN system poorly. Fuchs et al.[
119] reported even a negative formation enthalpy, i.e.,
an endothermic behavior of InN (Hexp =0.18 eV,HLDA =0.19 eV,HPBE = +0.35 eV). This
well-known shortcoming of PBE in case of zincblende InN leads apparently also to a poor description of
the phonon spectra. InN is also known to be challenging when computing the electronic structure with
both LDA and PBE. In Fig. 4.10 the computed electronic band structure is depicted together with FP-
LAPW/PBE data. It can be seen that our computed band gap of 0 eV corresponds to other PBE data while
129
4.3. COMPARISON WITH EXPERIMENT CHAPTER 4. APPLICATIONS
the experimental band gap is 0.7 eV [165].
WL ΓX W K
-14
-12
-10
-8
-6
-4
-2
0
2
4
6
εi (ev)
LDA-4d
PBE-4d
LAPW-PBE
Γ
-2
-1
0
1
εi (ev)
LDA-4d
PBE-4d
LAPW-PBE
Figure 4.10: Electronic band structure of InN computed with LDA and PBE. The 4d semicore states are
treated as valence electrons. The top of the valence band has been aligned to the Fermi level (0 eV). The red
circles show LAPW-PBE data taken from Ref. [166] which are in good agreement with our pseudo potential
results. The inset magnifies the area around the direct band gap at the Γpoint.
Discussion
In this section we presented the computed phonon dispersion curves of the investigated III-V semiconductor
systems and laid out the foundation for computing their thermodynamic properties. For both exchange-
correlation functionals LDA and PBE our obtained acoustic phonon branches are in good agreement with
experimental results as well as other first-principles DFPT-LDA studies. Since in the temperature regime
we are focusing on in this work the acoustic branches are dominant when computing the free energy surface,
our phonon spectra should be a reliable basis for deriving the thermodynamic properties. Also the optical
branches obtained from our first-principles calculations are qualitatively in good agreement with experimental
and other theoretical data.
With a relatively small super cell (2x2x2, fcc cell, 64 atoms) it is possible to reproduce the available phonon
spectra obtained from DFPT-LDA calculations with the direct approach well with deviations of less than
2 meV. The achieved accuracy of the forces4allows to determine the phonon dispersion curves for all inves-
tigated systems with only small deviations to the experiment (typically below 3 meV). For some systems,
however, there are discrepancies between our data and the experimental ones. For example, for AlAs the
TA branch at X is 4 meV above our data, the frequencies at the L point of GaP is underestimated by about
4 meV compared to the neutron scattering data. These deviations are obtained consistently with LDA and
PBE. Our data are, however, qualitatively and quantitatively very close to the DFPT-LDA frequencies at
these points as well as in their vicinities.
There is also a small deviation in the predicted behavior at the vicinity of Γbetween our TA branches on
those obtained with DFPT-LDA. Typically, our branches are slightly steeper than the DFPT-LDA ones.
This effect is, however, small (between Γand K the deviation is always less than 3 meV) and the additional
computational effort of using a 3x3x3-fcc super cell cannot be justified.
Our approach has minor difficulties in the optical branches, though. In our ansatz we neglect the coupling
between phonons and electric fields which is responsible for the degeneracy at the Γpoint. This effect,
4All forces have been converged to a numerical accuracy of Fx,y,z 1e6H/Bohr.
130
CHAPTER 4. APPLICATIONS 4.3. COMPARISON WITH EXPERIMENT
ΓKX ΓL X W L
0
10
20
30
40
50
ω (meV)
LDA@T=0K
PBE@T=0K
Photo-lum.@T=4K
DFPT-LDA@T=0K
(a) AlAs
ΓKX ΓL X W L
0
5
10
15
20
25
30
35
ω (meV)
LDA@T=0K
PBE@T=0K
Neutron@T=12K
DFPT-LDA@T=0K
(b) GaAs
ΓKX ΓL X W L
0
10
20
30
ω (meV)
LDA@T=80K
PBE@T=80K
x ray@80K
raman@330K
neutron@300K
DFTP-LDA@T=0K
DFPT-LDA@T=0K
(c) InAs
ΓKX ΓL X W L
0
10
20
30
40
50
60
ω (meV)
LDA@T=0K
PBE@T=0K
DFPT-LDA@T=0K
Raman@T=5K
(d) AlP
ΓKX ΓL X W L
0
10
20
30
40
50
60
ω (meV)
LDA@T=0K
PBE@T=0K
neutron@T=15K
neutron@T=15K
DFPT-LDA@T=0K
Raman@T=300K
(e) GaP
ΓKX ΓL X W L
0
5
10
15
20
25
30
35
40
45
ω (meV)
LDA@T=0K
PBE@T=0K
Raman@T=300K
Neutron@T=4K
DFPT-LDA@T=0K
LDA@T=300K
PBE@T=300K
(f) InP
ΓKX ΓL X W L
0
20
40
60
80
100
120
ω (meV)
LDA@T=0K
PBE@T=0K
DFPT-LDA@T=0K
(g) AlN
ΓKX ΓL X W L
0
20
40
60
80
100
ω (mEV)
LDA-nlcc@T=0K
LDA-3d@T=0K
PBE-nlcc@T=0K
PBE-3d@T=0K
Raman@T=300K
DFPT-LDA@T=0K
(h) GaN, T=0K
ΓKX ΓL X W L
0
20
40
60
80
ω (meV)
LDA-nlcc@T=0K
LDA-4d@T=0K
PBE-nlcc@T=0K
PBE-4d@T=0K
DFPT-LDA@T=0K
Raman@T=300K
FP-LAPW@T=0K
(i) InN, T=0K
/ Al Ga In
As 2.77 1.07 1.53
P1.15 2.25 3.70
N1.92 4.98 8.20
(j) ratio of atomic masses
ΓKX ΓL X W L
0
20
40
60
80
ω (meV)
LDA@T=300K
LDA-3d@T=300K
PBE-nlcc@T=300K
PBE-3d@T=300K
Raman@T=300K
DFPT-LDA@T=0K
(k) GaN, T=300K
ΓKX ΓL X W L
0
20
40
60
80
ω (meV)
LDA-nlcc@T=300K
LDA-4d@T=300K
PBE-nlcc@T=300K
PBE-4d@T=300K
DFPT-LDA@T=0K
Raman@T=300K
FP-LAPW@T=0K
(l) InN, T=300K
Figure 4.11: (a)-(i), (k)-(l) Influence of the atomic mass ratio to the phonon dispersion curves. Large mass
ratio (e.g., InP, GaN, InN) lead to large phonon band gaps and flatter optical branches while smaller ratios
(e.g. GaAs, AlP, InAs) lead to small gaps and larger dispersions of the optical branches.
however, can be neglected here since only a very small region in q-space is affected and the red-shifted LO
branches are energetically very high (typically above 80 meV) and will be only thermodynamically excited
close to the melting point. Our data are able to reproduce the typical trends [47, 167] of the phonon spectra:
With increasing mass differences the phonon band gap increases and the optical phonons become flatter (see
Fig. 4.11). Considering a linear di-atomic chain (Fig. 4.12) with masses Mand mthe gap between acoustic
and optical branches is driven by the mass ratio. With larger gaps the optical phonons become flatter [47].
131
4.3. COMPARISON WITH EXPERIMENT CHAPTER 4. APPLICATIONS
Figure 4.12: Phonon spectra of an infinite di-atomic linear chain. The chain is formed with atoms of masses
Mand m. If M>ma gap opens at q=π
2.Larger mass differences open the gap more while the optical
phonon branch becomes flatter along the path 0π
2[47].
For systems with relatively small mass ratios (e.g., GaAs (M
m=1.07), AlP (M
m=1.15), InAs (M
m=1.53)
the TO mode shows a oscillatory behavior while for systems with larger mass ratios (e.g., GaP (M
m=2.25),
AlAs (M
m=2.77), InP (M
m=3.70), GaN (M
m=4.98), InN (M
m=8.20) the TO branch becomes flatter. The
obtained frequency range of the optical phonons increases in the order (1) arsenides, (2) phosphides, and (3)
nitrides. Within these material classes the frequencies increase with the choice of the cation in the order In,
Ga, and Al.
For all investigated systems except GaN and InN we obtained phonon dispersions with LDA and PBE which
are very close to each other. We have shown that the applied method is capable of computing accurately
phonon spectra for a wide range of systems if the pseudo potentials are constructed carefully. For GaN
and InN, the choice of the treatment of the d-semicore states (either via NLCC or by explicit treatment
as valence) is crucial. In particular, a good description of the band gap is necessary to obtain accurate
phonon spectra which can be achieved with LDA and PBE-NLCC. InN, which is known to be challenging in
particular for PBE, the phonon spectra deviate significantly which might lead to a substantial error bar in
the computed thermodynamic properties. Following the previous discussion (p. 130) an application of more
advanced (hybrid) functionals would be interesting but would exceed the scope of this work.
4.3.3 Thermal expansion
The exact knowledge of the thermal expansion behavior of semiconductors is technologically very important.
For example, when growing semiconductors on substrates it is crucial to consider the thermal lattice mismatch
between the substrate and the semiconductor. The same applies for semiconductor interfaces. Therefore, in
this section the thermal expansion for the 9 investigated semiconductor systems will be studied. With the
volume derivatives of the phonon spectra dωi(q)/dV the thermal expansion coefficients α(T)can be obtained
via Eqs. (2.67), (2.70), (2.71), and (2.80). Since all phonon dispersion curves have been computed with LDA
and PBE it is possible to estimate their performances when computing linear expansion coefficients of III-
V semiconductors. In Fig. 4.13 we present the obtained expansion coefficients in the temperature range
between 0 K and the melting temperature of the corresponding systems.
In Fig. 4.13(a) the computed linear expansion coefficients (Eq. (2.80)) of AlAs obtained with LDA and PBE
are compared with experimental data [168, 169] as well as another pseudo potential plane-wave LDA simula-
132
CHAPTER 4. APPLICATIONS 4.3. COMPARISON WITH EXPERIMENT
0 400 800 1200 1600 2000
T (K)
-2e-06
0
2e-06
4e-06
6e-06
8e-06
α (1/K)
LDA
PBE
PSPW
Exp. 1
Exp. 2
melting point
(a) AlAs
0500 1000 1500 2000
T (K)
0
2e-06
4e-06
6e-06
8e-06
1e-05
α (1/K)
LDA
PBE
cap. dilatometry
var. transform.
quartz dilatometry
melting point
(b) GaAs
0 200 400 600 800 1000 1200
T (K)
-2e-06
0
2e-06
4e-06
6e-06
8e-06
α (1/K)
LDA
PBE
quartz dilatometry
var. transform.
melting point
(c) InAs
0500 1000 1500 2000 2500 3000
T (K)
0
3e-06
6e-06
9e-06
α (1/K)
LDA
PBE
experiment (estim.)
melting point
(d) AlP
0 200 400 600 800 1000 1200 1400 1600
T (K)
0
2e-06
4e-06
6e-06
8e-06
α (1/K)
LDA
PBE
Bond’s method
X-ray powder diffr.
Dialatometry
melting point
(e) GaP
0250 500 750 1000 1250 1500
T (K)
-2e-06
0
2e-06
4e-06
6e-06
α (1/K)
LDA
PBE
Bond’s method
theory
melting point
(f) InP
0500 1000 1500 2000 2500 3000
T (K)
0
2e-06
4e-06
6e-06
8e-06
α (1/Κ)
LDA
PBE
melting point
(g) AlN
0500 1000 1500 2000 2500 3000
T (K)
0
5e-06
1e-05
1.5e-05
2e-05
α (1/K)
LDA-nlcc
LDA-3d
PBE-nlcc
PBE-3d
melting point
emp. potentials
(h) GaN, nlcc vs. 3d
0500 1000 1500
T (K)
0
5e-06
1e-05
2e-05
2e-05
α (1/K)
LDA-nlcc
LDA-4d
PBE-nlcc
PBE-4d
melting point
(i) InN, nlcc vs. 4d
Figure 4.13: Temperature dependence of the linear expansion coefficients α(T)of all investigated III-V
semiconductors. We compare results obtained from LDA (solid black lines) with PBE (red dashed lines).
The accuracy of α(T)in the high temperature limit strongly depends on the quality of the description of the
minimum of α(T). Minor error bars in the location or amplitude of the minima leads to significant shifts of
α(T).
tion [170]. There is a substantial scattering of the experimental data sets indicating significant uncertainties
for this material. Furthermore, only data between 0 K and 400 K are available. Our LDA and PBE curves
show virtually identical slopes. The shift of 1·106K1in the high temperature limit can be explained with
the slightly differently pronounced minima of α(T). The location of this minima at 55 K (LDA) and 50 K
(PBE) is in very good agreement with the LDA calculation from Ref. [170] (52 K).
Similarly good results have been obtained for GaAs. As for AlAs the slopes of αLDA(T)and αPBE(T)
are very close to each other. In case of GaAs the description of the location of the minimum as well as
its amplitude is also in good agreement (T<8K). The experimental values, obtained with capacitance
dilatometry [171], variable transformer measurements [172], and quartz dilatometry [173], are always well in
between αLDA(T)and αPBE(T).
133
4.3. COMPARISON WITH EXPERIMENT CHAPTER 4. APPLICATIONS
The scattering of the experimental results (quartz dilatometry [174], variable transformer measurements [172])
for InAs reveals uncertainties for this system. Our data predict the location of the minimum with a small
deviation of 9 K between LDA and PBE and agrees very well with the experimental dataset. At higher
temperatures the curvatures of αLDA and αPBE differ slightly. The scattering between the experimental
datasets make a verification of our data difficult.
The second row of Fig. 4.13 depicts the temperature dependencies of the thermal expansion coefficients of the
investigated phosphides. For AlP the graph labeled as experiment is only a “rough estimation” (see [175])
based on an extrapolation of other III-V semiconductor zincblende structures. Its quality is questionable and
can only serve as a check of the order of magnitude. The AlP phonon spectra for LDA and PBE are in very
good agreement which is reflected in the high accuracy of the prediction of the minima at Tc= 72 K/78 K
(LDA/PBE). AlP has a very high melting point at T=2823 K. At such high temperatures unharmonicities
beyond the quasi-harmonic approximation are expected to play an important role [37]. The high temperature
results should, therefore, only be taken as crude approximation.
In contrast to most of the other tetrahedrally bounded III-V semiconductors investigated here, previous
studies predicted that GaP does not exhibit an anomaly in the thermal expansion. Soma et al. [41] reported
first that the magnitude of the TA mode Grüneisen parameters in GaP are too small to generate the anomaly.
Deus and co-workers [42] confirmed Soma’s theoretical prediction using Bond’s method [176]) with a relative
error of 2·105.However, in 1986 Haruna et al. [43] were able to measure a tiny anomaly effect in GaP at
low temperatures with a relative error of only 2·106.In Fig. 4.13(e) it can be seen that our results also
show that there is a shallow minimum of the linear expansion coefficients at Tc= 32 K/42 K (LDA/PBE).
The experimental value of Tc= 38 K lies between our results for TLDA
cand TPBE
c.
InP and GaP show a good agreement with respect to the slope and the location of the minimum. It is
noteworthy that our LDA data are in both cases slightly closer to the experimental values. Our data agree
well with results obtained by Bond’s method [40] as well as a DFPT-LDA result [170].
The last row of Fig. 4.13 presents the nidrides. AlN has the highest melting point of all considered systems
studied in this work. It melts at 3025 K. Our LDA and PBE curves in Fig. 4.13(g) are virtually the same
between 0 K and 750 K. Considering the good agreement of the corresponding acoustic phonon dispersion
curves this is not surprising. At high temperatures (above 1000K) the PBE results have a slightly smaller
slope than the LDA expansion coefficient curve. The deviation at the melting point is less than 0.5·106K1
and thus well within acceptable limits.
The example of the temperature dependence of the thermal expansion coefficients of GaN shows nicely how
the quality of the computed phonon spectra influences the accuracy of α(T)and provides an estimation of
the limits of the applied method. In Fig. 4.14a) and b) the phonon spectra which have been computed earlier
as well as the linear expansion curves are presented. As discussed above in case of LDA the influence of
the choice of the d-semicore treatment (NLCC or with 3d valence electrons) to the phonon spectra is small.
Both ωnlcc
LDA(q)and ω3d
LDA(q)are very close to one another and agree very well with other DFPT-LDA data.
From Fig. 4.14(b) it can be seen that both curves αnlcc
LDA(T)and α3d
LDA(T)are almost identical. Also the
phonon dispersion curves obtained with PBE-NLCC are in good agreement, even though they are slightly
red-shifted. The corresponding linear expansion coefficients αnlcc
PBE(T)respond to this red shift with a larger
slope in the temperature interval between 0 K and room temperature. Above that the slope of αLDA(T)and
αnlcc
PBE(T)are the same with a constant shift of 1.8·106K1.As discussed already above, by improving the
description of the d-semicore states (treating them in the valence) the prediction of the electronic band gap
of GaN gets worse which leads to a significant red shift of the phonon frequencies ω3d
PBE.
134
CHAPTER 4. APPLICATIONS 4.3. COMPARISON WITH EXPERIMENT
The last investigated system, InN, is known to be challenging for DFT. Above the obtained results for the
phonon spectra for LDA and PBE have been presented. We discussed that PBE has significant shortcomings
when describing InN, in particular, we focused on the electronic band gap and the phonon dispersion curves.
The latter ones are crucial for computing the free energy F(V, T)and thus, the linear expansion coefficients
α(T).In Fig. 4.15(b) we present our data for α(T).It can be seen that LDA-NLCC and LDA-4d yield
almost the same slope, only in the low temperature limit the description of the d-semicore states influences
the location of the anomaly of the temperature dependence of α,with LDA-NLCC we obtain Tc= 78 K
while an explicit treatment of the semicore states in the valence yields Tc= 26 K. The deviation between
αnlcc
LDA(T)and α4d
LDA(T)at the melting point Tm= 1373 Kis less than 0.9·106K1and thus, still within
acceptable limits.
ΓKX ΓL X W L
0
20
40
60
80
100
ω (mEV)
LDA-nlcc@T=0K
LDA-3d@T=0K
PBE-nlcc@T=0K
PBE-3d@T=0K
Raman@T=300K
DFPT-LDA@T=0K
(a) GaN, PBE (nlcc vs. 3d)
0500 1000 1500 2000 2500 3000
T (K)
0
5e-06
1e-05
1.5e-05
2e-05
α (1/K)
LDA-nlcc
LDA-3d
PBE-nlcc
PBE-3d
melting point
emp. potentials
(b) GaN, nlcc vs. 3d
0 200 400 600 800 1000 1200
T (K)
0
1
2
3
4
5
6
7
Cp (kB)
LDA-3d
PBE-3d
melting point
LDA-nlcc
PBE-nlcc
(c) Cp(T), GaN
0500 1000 1500 2000 2500 3000
T (K)
0
1
2
3
4
5
6
CV (kB)
LDA-3d
PBE-3d
melting point
LDA-nlcc
PBE-nlcc
(d) CV(T), GaN
Figure 4.14: Influence of the explicit treatment of the d-electrons to all investigated properties of GaN. (a)
The obtained phonon spectra of GaN provide an indication about the quality of the description of the linear
expansion coefficients. A poor description of the phonon frequencies suggests also significant problems for
evaluating derived entities, such as α(T). (b) Temperature dependence of the linear expansion coefficients
α(T)of GaN computed with LDA and PBE. The d-semicore states have been treated in the core using
NLCC as well as explicitly in the valence (3d). (c) and (d) show the heat capacities Cp(T)and CV(T),
respectively.
135
4.3. COMPARISON WITH EXPERIMENT CHAPTER 4. APPLICATIONS
ΓKX ΓL X W L
0
20
40
60
80
ω (meV)
LDA-nlcc@T=0K
LDA-4d@T=0K
PBE-nlcc@T=0K
PBE-4d@T=0K
DFPT-LDA@T=0K
Raman@T=300K
FP-LAPW@T=0K
(a) GaN, PBE (nlcc vs. 3d)
0500 1000 1500
T (K)
0
5e-06
1e-05
2e-05
2e-05
α (1/K)
LDA-nlcc
LDA-4d
PBE-nlcc
PBE-4d
melting point
(b) InN, nlcc vs. 4d
0250 500 750 1000 1250 1500
T (K)
0
1
2
3
4
5
6
7
Cp (kB)
LDA-4d
PBE-4d
melting point
LDA-nlcc
PBE-nlcc
(c) Cp(T), InN
0250 500 750 1000 1250 1500
T (K)
0
1
2
3
4
5
6
7
CV (kB)
LDA-4d
PBE-4d
melting point
LDA-nlcc
PBE-nlcc
(d) CV(T), InN
Figure 4.15: Influence of the explicit treatment of the d-electrons to all investigated properties of InN. (a)
The phonon spectra of InN as well as the (b) linear expansion coefficients versus temperature. (c) and (d)
show the heat capacities Cp(T)and CV(T),respectively. LDA (NLCC and 4d) is able to describe the phonon
band structure of InN better which results on a reasonable description of the thermal expansion behavior
of cubic InN. The huge errors introduced by PBE (both NLCC and 4d) influences the qualitative picture of
the thermal expansion drastically.
Discussion
All investigated systems exhibit a thermal expansion anomaly in the low temperature regime, typically
between 20 K and 80 K. In this low temperature interval most of the systems show negative thermal
expansion coefficients (see explanation above, Sec. 4.2.1). An accurate description of the thermal expansion
coefficients requires very accurate values of dωi/dV. The accuracy of the temperature slope as well as the
location of the minimum is mainly determined by the acoustic phonons (TA, TA1, TA2).
The numerically very sensitive minimum determines whether αLDA(T)or αPBE(T)is the upper or lower
limit. For example, in case of AlAs αLDA(T)<αPBE(T)while for GaAs αLDA(T)>αPBE(T). The
thermal anomaly makes it, therefore, difficult to derive a trend whether LDA under- or overestimates α(T)
of zincblende III-V semiconductors.
136
CHAPTER 4. APPLICATIONS 4.3. COMPARISON WITH EXPERIMENT
The expansion coefficient curves of GaN and InN which have been obtained with PBE-NLCC or explicit
treatment of the 4d electrons show a qualitatively wrong dependence. This is likely due to the description of
the band gap (see p. 127). The effect on the phonon spectra (and their volume derivatives) induce difficulties
in the description of the linear expansion coefficients of GaN and InN.
4.3.4 Heat capacity
Besides the thermal expansion coefficients this work focuses also on a derivation of the temperature depen-
dence of the heat capacity Cp(T)and CV(T)from first-principles. In case of Cp(T)the theoretical results
can be easily compared with the experimental data, since Cp(T)can be measured directly.
The general shape of the temperature dependence of the heat capacity is determined by two limits. According
to the Debye model at low temperatures CV(T)should scale as T3while in the high temperature region
CV(T)converges to the Petit-Dulong limit 6kB.In between those limits CV(T)is determined by the details
of the atomic vibrations.
In Figs. 4.16, 4.17, and 4.18 the heat capacities of the arsenides, phosphides, and nitrides are presented,
respectively. For zincblende-AlAs,-AlP, and -AlN no experimental data were found which makes a
verification of our results difficult.
In Fig. 4.16c-d we compare our computed Cp(T)as well as CV(T)with both experimental data and other
theoretical data obtained by first-principles methods employing LDA pseudo potentials for GaAs. In the
temperature regime between 0 K and 500 K our results show the same behavior as the experimental data
(taken from Ref. [177], experiments are labeled as in this reference). Due to the large scattering between the
performed experiments a conclusive evaluation of our data is not straightforward. Thus, we compare our
data with other LDA pseudo potential data [178] which are virtually identical to our data.
The picture for InAs is reminiscent to GaAs. The various experimental data sources (taken from Ref. [177],
experiments are labeled as in this reference) differ in their results at high temperatures above 400 K. At
higher temperatures the theoretical Cp(T)curve remains more flat then the experimental ones. Furthermore,
the agreement between LDA and PBE is obtained only up to 600 K.
In Fig. 4.17 we present the theoretically obtained temperature dependencies of the heat capacity at constant
pressure and constant volume for AlP, GaP, and InP, respectively. For all investigated phosphides the
agreement between LDA and PBE is very good, indicating that the error bar introduced by the XC functional
is very small for the heat capacity.
Discussion
A direct comparison of our result with experiment is difficult due to large scattering between various exper-
imental data sets. In Ref. [177] a comprehensive overview on the different challenges in performing accurate
experiments focusing on measuring the heat capacities for GaAs and InAs is presented. Various sources of
error are responsible for a significant discrepancy among various experiments of InAs and GaAs, such as the
choice of the calorimeter container material5, introduction of additional heat when entering the test ampulle
into the container, accounting for thermal dissociation effect. Such issues introduce large uncertainties in
the experimental measurements and must be considered when performing the comparison to theory.
5The Cp(T)curve for GaAs labeled “4: drop calometry” differs significantly from the other graphs probably due to the
choice of Ta as container material. Ta reacts with both GaAs and InAs.
137
4.3. COMPARISON WITH EXPERIMENT CHAPTER 4. APPLICATIONS
0500 1000 1500 2000
T (K)
0
1
2
3
4
5
6
7
Cp (kB)
LDA
PBE
melting point
(a) Cp(T), AlAs
0500 1000 1500 2000
T (K)
0
1
2
3
4
5
6
7
CV (kB)
LDA
PBE
melting point
(b) CV(T), AlAs
0500 1000 1500 2000
T (K)
0
2
4
6
8
Cp (kB)
LDA
PBE
melting point
1: adiab. calometry
2: adiab. calometry
3: "recommended"
4: drop calometry
5: drop calometry
6: adiab. calometry
7: drop calometry
8: adiab. calometry
PSPP
(c) Cp(T), GaAs
0500 1000 1500 2000
T (K)
0
1
2
3
4
5
6
CV (kB)
LDA
PBE
melting point
Exp. 1
Exp. 2
(d) CV(T), GaAs
0 400 800 1200 1600
T (K)
0
2
4
6
8
Cp (kB)
LDA
PBE
melting point
3: "recommend"
4: drop calometry
5: drop calometry
5: drop calometry
6: adiab. calometry
7: drop calometry
(e) Cp(T), InAs
0250 500 750 1000 1250 1500
T (K)
0
1
2
3
4
5
6
CV (kB)
LDA
PBE
melting point
(f) CV(T), InAs
Figure 4.16: Cp(T)and CV(T)for AlAs, GaAs, and InAs.
Also the verification of our data with measurements of the phosphides is not straightforward since measuring
the heat capacities of phosphides is challenging. Phosphorus exhibits an allotropic behavior, i.e., it can exist
in different forms. A famous example of allotropy is carbon which can exist in the diamond and graphite
form. For phosphorus 9 allotropes are known at this time [181, 182, 183, 184, 185]. Since they have different
enthalpies of formation, a misinterpreted form of P leads inevitably to wrong data. These experimental
challenges are responsible for the scattered data in Fig. 4.17.
Qualitatively for all investigated systems except GaN and InN we obtained similar data for Cp(T)and
138
CHAPTER 4. APPLICATIONS 4.3. COMPARISON WITH EXPERIMENT
0500 1000 1500 2000 2500 3000
T (K)
0
1
2
3
4
5
6
7
Cp (kB)
LDA
PBE
melting point
(a) Cp(T), AlP
0500 1000 1500 2000 2500 3000
T (K)
0
1
2
3
4
5
6
CV (kB)
LDA
PBE
melting point
(b) CV(T), AlP
0500 1000 1500 2000
T (K)
0
1
2
3
4
5
6
7
Cp (kB)
LDA
PBE
melting point
Exp. 2
Exp. 3
Exp. 4
(c) Cp(T), GaP
0500 1000 1500 2000
T (K)
0
1
2
3
4
5
6
7
CV (kB)
LDA
PBE
melting point
(d) CV(T), GaP
0250 500 750 1000 1250 1500
T (K)
0
1
2
3
4
5
6
7
Cp (kB)
LDA
PBE
melting point
Exp. 1
Exp. 2
Exp. 3
PSPP
estim.
(e) Cp(T), InP
0250 500 750 1000 1250 1500
T (K)
0
1
2
3
4
5
6
7
CV (kB)
LDA
PBE
melting point
(f) CV(T), InP
Figure 4.17: Cp(T)and CV(T)for AlP, GaP, and InP. AlP that is meta-stable in the zincblende phase no
experimental data were found. The experimental calometry data for GaP have been taken from Ref. [168].
In case of InP the calometric data were taken from Ref. [179]. The InP DFT-LDA pseudo potential data
labeled “PSPP” are taken from Ref. [180].
CV(T)with both XC functionals. The phonon and electronic band structure of cubic GaN and InN can
be only accurately described within LDA. PBE introduces errors in the band gap as well as a significant
red shift of the phonon frequencies. This inaccuracy enters the free energy F(T, V )and hence, the derived
thermodynamic properties such as α(T)as well as Cp(T)and CV(T).In Figs. 4.14 and 4.15 all these entities
are depicted together. The temperature dependencies CLDA,nlcc
p(T)and CLDA,3d
V(T)show qualitatively the
139
4.3. COMPARISON WITH EXPERIMENT CHAPTER 4. APPLICATIONS
0500 1000 1500 2000 2500 3000
T (K)
0
1
2
3
4
5
6
7
Cp (kB)
LDA
PBE
melting point
(a) Cp(T), AlN
0500 1000 1500 2000 2500 3000
T (K)
0
1
2
3
4
5
6
7
CV (kB)
LDA
PBE
melting point
(b) CV(T), AlN
0 200 400 600 800 1000 1200
T (K)
0
1
2
3
4
5
6
7
Cp (kB)
LDA-3d
PBE-3d
melting point
LDA-nlcc
PBE-nlcc
(c) Cp(T), GaN
0500 1000 1500 2000 2500 3000
T (K)
0
1
2
3
4
5
6
CV (kB)
LDA-3d
PBE-3d
melting point
LDA-nlcc
PBE-nlcc
(d) CV(T), GaN
0250 500 750 1000 1250 1500
T (K)
0
1
2
3
4
5
6
7
Cp (kB)
LDA-4d
PBE-4d
melting point
LDA-nlcc
PBE-nlcc
(e) Cp(T), GaN
0250 500 750 1000 1250 1500
T (K)
0
1
2
3
4
5
6
7
CV (kB)
LDA-4d
PBE-4d
melting point
LDA-nlcc
PBE-nlcc
(f) CV(T), InN
Figure 4.18: Cp(T)and CV(T)for AlN, GaN, and InN. All systems are described well within LDA while
PBE shows problems in describing the heat capacities of GaN and InN.
same behavior which is due to the good agreement of ωnlcc
LDA(q)and ω3d
LDA(q).
4.3.5 Conclusions
In this chapter the results of the computations of thermodynamic properties have been presented. All
necessary calculations have been performed with the S/PHI/nX package in order to demonstrate that the
abstract and complex S/PHI/nX approach is capable of combining a great degree of development flexibility
140
CHAPTER 4. APPLICATIONS 4.3. COMPARISON WITH EXPERIMENT
with high performance calculations of realistic systems. As benchmark we have chosen the computation of
thermodynamic properties of III-V semiconductors.
We found that for all investigated systems the application of LDA provides a good basis to obtain high
accuracy phonon spectra ωi(q),linear expansion coefficients α(T)as well as heat capacities Cp,V (T). In
case of the two systems GaN and InN, PBE introduces major difficulties when computing thermodynamic
properties. We have demonstrated that even minor deviations in the phonon frequencies introduce problems
in the prediction of derived entities such as α(T)and Cp,V (T).Therefore, we have shown that it is crucial
to perform very thorough convergence tests with respect to the pseudo potential, the energy cut-offand
the k-point sampling with respect to the phonon dispersion curves. By means of GaAs we demonstrated
that deriving thermodynamic properties requires dramatically higher convergence criteria than, e.g., when
computing electronic structures.
The obtained phonon spectra follow the general trends: Larger ratios of atomic masses tend to large phonon
gaps and flatten the optical phonon dispersions. The III-V semiconductors in the zincblende phase have
negative mode-Grüneisen parameters γ(q)which are occupied at low temperatures (typically at T<100 K).
In this regime they show an anomaly in the thermal expansion. At increasing temperatures the crystal
compresses. Above the critical temperature positive mode-Grüneisen parameters dominate and normal
expansion behavior occurs. A theoretical description of this anomaly based on first-principles is extremely
sensitive since it can introduce shifts of α(T)in the high temperature regime. For all systems it was possible
to keep this shift within acceptable limits (α<2·106K1)indicating a very high numerical accuracy of
our data.
141
4.3. COMPARISON WITH EXPERIMENT CHAPTER 4. APPLICATIONS
142
Chapter 5
Conclusions and Outlook
In this work we derived and implemented a new physics meta-language to develop highly efficient programs
in the field of computational materials design (CMD). It simplifies the development process of ab-initio
based multiscale approaches drastically. Our meta-language provides intuitive language elements to express
algebraic equations, quantum mechanical expressions in the Dirac notation, and an efficient representation
of equations of motions. State-of -the-art programming techniques from the field of computer science have
been developed / derived in this work in order to automatically create optimized machine code.
Our meta-language supports the application of Dirac’s notation. Therefore, the “building blocks” of the Dirac
notation, i.e., Dirac vectors, projectors, and operators have been introduced as generic data types (p. 88).
Our concept considers, in particular, future extensions with respect to the implementation of modern basis-
sets and Hamiltonians. Therefore, we derived a technique to support virtual template projector functions
(p. 89). It allows the compiler to determine the quantum mechanical context and thus, to replace complex
expression with highly optimized function calls during the compilation. Benchmarks that compare the run-
time performance of various representative calculations with VASP demonstrate that with our solution an
intuitive Dirac-notation interface can be combined with high executional performance (p. 112).
With the new meta-language complex algebraic or quantum mechanical algorithms can be developed easily
with only rudimental programming skills. However, the high abstraction level requires a smart environment
to identify typical problems during the development process. Therefore, we developed an automatic error
detection mechanism which is capable of identifying typical program inconsistencies in quantum mechanical
or numerical algorithms (see Sec. 3.1.4). With this mechanism quantum mechanical expressions which are
unphysical but syntactically correct can be identified instantly which decreases the time necessary for the
development process drastically.
In order to compute the material properties related to the electronic structure of a system, an efficient library
specialized in electronic minimization to obtain the Born-Oppenheimer surface has been developed (pp. 49).
In the current version of our library we introduced an exponentially converging all-band preconditioned
conjugate-gradient algorithm for semiconducting and insulating systems (Sec. 49). We also introduced
an state-by-state preconditioned conjugate-gradient minimizer which employs DIIS charge density mixing
(pp. 51). This scheme can be applied to systems with empty or partially occupied states, such as metals.
The computation of structural material properties requires an efficient representation of equations of motion.
Therefore, we derived transformation pipelines (Sec. 3.3.2) which allow a separation of the multi-dimensional
minimization schemes from structural constraints or frequency filters. Due to our automatic BLAS/LAPACK
143
CHAPTER 5. CONCLUSIONS AND OUTLOOK
mapping (p. 76) the resulting library to represent atomic structures performs also efficiently for large atomic
systems described by (semi-)empirical potentials. It provides the major representations of atomic structures,
namely the xyz- and the degree of freedom-form. In our approach both forms can be applied simultaneously.
Codes developed in the S/PHI/nX meta-language can be written very transparently and remarkable short
which simplifies the process of code development significantly. For example, the entire DFT Hamiltonian in
S/PHI/nX requires only 550 code lines.
In order to guarantee computational peak performance we developed a highly efficient numeric library
SxMath which is the foundation of the S/PHI/nX project. It provides a functional interface reminiscent
to high-level toolkits like Mathematica. In order to combine an intuitive interface with high executional
speed, performance problems that occur typically in functional approaches (abstraction penalties, Sec. 3.1.3)
had to be addressed. Therefore, we derived new techniques such as S/PHI/nX type mappers (p. 69). They
ensure computationally optimal1data types to allow effective computations on temporary algebraic objects.
With a reference counting technique specialized for algebraic vectors/matrices (p. 71) we are able to replace
procedural with the more transparent functional interface without any performance loss. The often tedious
and error-prone task in High-Performance computing (HPC) of BLAS/LAPACK function call mapping can
be accomplished fully automatically due to generic programming techniques using template classes (p. 68).
With this technique all algebraic expressions are mapped consequently to the available HPC function calls.
Our algebra library “SxMath” is applicable for a wide range of applications which rely on highly efficient
evaluation of algebraic algorithms. It could be shown that our approach improves the run-time performance of
typical algebraic expressions necessary in DFT program packages dramatically (p. 76) compared to standard
numerical toolkits (e.g. Blitz++, Boost).
In the current version of S/PHI/nX we focused, in particular, on heavy usage of blocking algorithms, i.e.,
the formulation of the computationally demanding algorithms in a matrix-matrix form. In this notation ma-
trices can be consequently subdivided such that they fit into the steadily increasing level-caches of modern
computer architectures, which guarantees peak-performance. For the group of investigated III-V semicon-
ductors we conducted benchmarks with the widely applied VASP package. Even for systems which require
larger (converged) energy cut-offs, where the pseudo-potential plane-wave approach is computationally more
demanding than PAW, the high optimization level of S/PHI/nX is comparable and for some systems even
faster than VASP.
By organizing S/PHI/nX as library instead of a single monolithic program package, code fragments can
easily be reused. S/PHI/nX add-ons (Sec. 3.3.3) are small programs (usually less than 50 or 150 lines) which
can be developed in very short times. An add-on has full access to all elements of the above described class
hierarchy. In order to analyze results wave functions can be imported, projected on other basis-sets, densities
of state can be obtained, or complex atomic structures can be generated. For the most common tasks for the
preparation as well as the analysis a set of 50 S/PHI/nX add-ons have been developed. The add-ons have a
common interface so that the output of one add-on can act as input of another. Complex analysis pipe lines
can be easily created by the user. It is worth mentioning that S/PHI/nX has been developed while strictly
obeying various standards, such as ANSI, POSIX, as well as a source code style guide applied in industrial
environments. The code is therefore platform independent and is available on Linux, MacOS X, FreeBSD,
AIX, HPUX, and Windows. The S/PHI/nX testbed ensures reliability during the development phase. Code
consistency is accomplished using a reviewing process.
In the second part of this work it was tested whether the new S/PHI/nX package is sufficiently fast and
1optimal = smallest required accuracy and byte width
144
CHAPTER 5. CONCLUSIONS AND OUTLOOK
accurate to be applied for realistic systems by means of thermodynamic properties of III-V semiconduc-
tors. We could confirm the findings of Ref. [37] that the computation of thermodynamic properties from
first-principles requires high convergence in all parameters well beyond what is commonly needed to T=0 K
properties. In this study calculations within LDA and PBE have been conducted in order to study the
influence of the exchange-correlation potential to the accuracy of the obtained thermodynamic properties.
We investigated the temperature dependencies of the linear expansion coefficients α(T)as well as the heat
capacities Cp(T)and CV(T)in the temperature regime between 0 K and the corresponding melting tempera-
tures. For all investigated systems we found a good agreement with experimental data and we could establish
that the direct approach is a reliable method to obtain thermodynamic properties from first-principles. All
obtained phonon spectra are an good agreement with the experiment (ωi(q)4 meV). Generally, LDA
and PBE provide almost identical phonon dispersion curves. In our study we obtained phonon frequencies
of the acoustical phonons very close or virtually identical to experiments or other theoretical investigations.
The very sensitive temperature dependence of the linear expansion coefficients α(T)have been reproduced
well. The magnitude of the thermal expansion anomaly of the cubic III-V semiconductors could be well
reproduced within LDA. We found that PBE introduces minor deviations in the linear expansion coefficients
and heat capacities of GaN and InN. This is likely related to the description of the electronic structures
of both structures. A further investigation with more advanced exchange-correlation functionals such as
hybrid functionals would be interesting. At high temperatures close to the melting point our computed α(T)
and Cp,V (T)results deviate from the experimental data. Therefore, an investigation which also includes
unharmonic effects would clarify whether the quasi-harmonic approximation fails for these systems in the
high-temperature regime.
The performed calculations make the upcoming next steps clear. The application of norm-conserving pseudo
potentials is computationally too demanding and the treatment of the dsemicore states of Ga and In
introduces problems. Therefore, the implementation of PAW as new basis-set is imperative for the S/PHI/nX
project. Furthermore, our benchmarks have shown that the parallelization of S/PHI/nX is urgently needed
to perform simulations of even larger simulation cells. Due to the required high energy cutoffs in this work
we had to restrict ourselves to a 64 atom cell which introduced a minor phonon softening. For other systems,
however, the influence might be more important and the simulation of larger cells might be necessary. Besides
improved basis-sets also parallelization of S/PHI/nX is, therefore, an urgent item on the list of features that
will be implemented in the upcoming version(s). We believe that the template linkage technique can be
extended such that a semi- or even fully automatic MPI parallelization becomes possible. Similar to the
automatic BLAS/LAPACK mapping or the Dirac-notation we think that this technique can also be used
to map algebraic or quantum mechanical expressions to proper MPI calls during the compilation. The
distribution of data, which determines the performance and scalability of the parallel approach, depends
on the basis-set. Since that is already known to our template linkage approach, a high performance and
automatic MPI parallelization might be possible. First proofs of concepts have indicated that such an
approach can be successful.
With S/PHI/nX we introduce a new flexible development framework and a user-friendly program package
for CMD, that has been successfully applied already to a variety of investigations. S/PHI/nX has been used
to
compute bio-inspired systems systems such as polyalanine alpha and πhelix [186, 187, 188, 189],
crystalline α-chitin [190]
computation of thermodynamic properties of metallic systems, e.g., of Al [191] and Fe [192] up to the
145
CHAPTER 5. CONCLUSIONS AND OUTLOOK
melting points
compute electro-optical properties of quantum dots [193, 194, 195] and quantum wells [196, 197, 198],
investigate material properties of semiconductors, e.g., dislocations in wurtzite GaN [199], applica-
tion of maximally-localized Wannier functions to III-V semiconductors [200] or to semiconductor al-
loys [201], description of nitrogen solubility at GaAs and InAs (001) surfaces [202, 203]), compute
finite-size corrections for charged defect supercell calculations [204], investigate ferromagnetic systems
such as GaMnAs [205]
address the band gap problem of DFT using EXX [165, 206, 207, 208, 209, 210, 211, 212, 213]
introduce an efficient all-band conjugate gradient method for metallic systems [214] and a plane-wave
implementation of the real space k·p formalism and continuum elasticity theory [215]
investigate the role of anharmonic effects on the elasticity of ice [216]
perform atomic-scale spin-polarized scanning microscopy simulations of nonmagnetic metallic sur-
faces [60]
Besides the actual program, the S/PHI/nX C++ library has been used separately to develop new tools:
In Ref. [217, 218] the S/PHI/nX library has been applied to implement tools for the ABINIT project
to simulate STM and STS simulations on magnetic and non-magnetic metallic surfaces.
Based on the efficient numeric libraries of S/PHI/nX a powerful graphic render engine for interactive
scientific visualization could be introduced [219].
The network communication and file i/o libraries have been used to create an efficient generalized
database for multi-physics applications [220].
The S/PHI/nX project is also engaged in simplifying the exchange of data between research groups within the
Ψk-network community. For example, in S/PHI/nX a general file format has been introduced which allows
a comfortable exchange of huge data sets such as wave functions and potentials. Based on the S/PHI/nX
format the Ψk-network introduced the ETSF_IO [221] data exchange format. With this format a standard
for exchanging wave functions and potentials between various plane-wave codes has been established2.
2S/PHI/nX supports currently both file formats, the original S/PHI/nX format “sxb” as well as ETSF_IO.
146
Acknowledgments
A complex project such as S/PHI/nX can only be realized with the support and collaboration of many
people.
I’m grateful to my supervisor Jörg Neugebauer for giving me the opportunity to work on this exciting project.
With patience he introduced me to the field of DFT and method development. He provided perfect working
conditions and created such a nice atmosphere in his group that I could develop S/PHI/nX with great joy.
I would like to thank the S/PHI/nX team for all their contributions: Hazem Abu-Farsakh, Abdullah Al-
Sharif, Alexey Dick, Christoph Freysold, Lars Ismer, Abduallah Qteish, and Matthias Wahn. Here, I thank,
in particular, Christoph Freysold for all those daily discussions and his valuable contributions across the
package and Alexey Dick who spent an immense time in the unpleasant task of bug tracking and his vast
contributions to the stability of the code.
I would like to extend my thanks also the new members of the S/PHI/nX team Vaclav Bubnik, Björn Lange,
Oliver Marquadt, Gernot Pfanner, and Thomas Uchdorf. With the on-going projects they show that the
project continues to prosper.
I am grateful to Lutz Schützenmeister who woke my interest in physics in the first place. The S/PHI/nX
package wouldn’t have none of its beauty without Roland Augst who introduced me to the fundamental
concepts of code design.
For proof reading and valuable discussions I would like to thank Alexey Dick and Tilmann Hickel.
I would like to thank my fiancée for her constant support in all the years of the S/PHI/nX development and
the never-ending time of writing up. I want to thank my parents for waking my interest for natural and
computer sciences. Without their constant effort in motivating me I might not have been able to complete
this project.
147
CHAPTER 5. CONCLUSIONS AND OUTLOOK
148
Appendix A
Computational details
A.1 Pseudo potentials
The pseudo potential have been generated with the following configurations and cut-offradii1(rcut
s,r
cut
p,r
cut
d,r
cut
f)
Al: 3s23p13d0(1.4, 1.5, 1.4)
Ga (nlcc): 4s24p14f0(2.3, 2.4, 2.2, 2.8)
Ga (3d): 3d104s24p14f0(2.3, 2.4, 2.2, 2.8)
In (nlcc): 5s25p15g0(2.3, 2.4, 2.2)
In (4d): 4d105s25p15g0(2.3, 2.4, 2.2, 2.8)
P: 3s23p13d0(1.8, 2,2, 1.9)
As: 4s24p33d0(1.4,1.2, 2.1)
N: 2s22p33d0(1.5, 1.5, 1.5)
In case of NLCC we used partial core densities with cutoffradii 1.3 (Al, Ga), 1.8 (In)
The reference energies for the unbound unoccupied states were set to 2p(N), 3p(Al) eigenvalues, and to 5 eV
(Ga 4d), 0 eV (Ga 4f), 15 eV (In 5d), -30 eV (In 5g)
A.2 Convergence parameters
Ecut (Ry) Ecut (Ry) Ecut (Ry)
Al Ga In
As 30 25 50
P 30 25 45
N 40 55nlcc 753d 55nlcc 4d: 654d
Monkhost-Pack meshes: generating kpoint (1
2
1
2
1
2),folding: (4 ×4×4)
1using atomic units
149
A.2. CONVERGENCE PARAMETERS APPENDIX A. COMPUTATIONAL DETAILS
Displacements for force calculations: d=0.05 Bohr.d=(x, y, z)
Energy convergence parameters: E1e9Ha/atom.
150
Bibliography
[1] Sidney Yip, Handbook of Materials Modeling (Springer, Dordrecht, 2005).
[2] P. Hohenberg, W. Kohn, “Inhomogeneous Electron Gas”, Phys. Rev. 136, B864 B871 (1964).
[3] W. Kohn, L.J. Sham, “Self-Consistent Equations Including Exchange and Correlation Effects”, Phys.
Rev. 140, A1133 A1138 (1965).
[4] W.L. Briggs, Van Emden Henson, S.F. McCormick, A Multigrid Tutorial (SIAM, 2000).
[5] T. Frauenheim, G. Seifert, M. Elsterner, Z. Hajna,l G. Jungnickel, D. Porezag, S. Suhai, R. Scholz,
“A Self-Consistent Charge Density-Functional Based Tight-Binding Method for Predictive Materials
Simulations in Physics, Chemistry and Biology”, phys. stat. sol.(b) 217, 41 62 (2000).
[6] M. Elstner, D. Porezag, G. Jungnickel, J. Elsner, M. Haugk, T. Frauenheim, “Self-consistent-charge
density-functional tight-binding method for simulations of complex materials properties”, Phys. Rev.
B58, 7260 7268 (1998).
[7] T. Frauenheim, G. Seifert, M. Elstner, T. Niehaus, C. Köhler, M. Amkreutz, M. Sternberg, Z. Hajnal,
A. Di Carlo, S. Suhai, “Atomistic simulations of complex materials: ground-state and excited-state
properties”, J. Phys.: Condens. Matter 14, 3015 (2002).
[8] N. L. Allinger, M. A. Miller, D. H. Wertz, J. Am. Chem. Soc. 93, 1637 (1971) 93, 1637 (1971).
[9] N. L. Allinger, J. Am. Chem. Soc. 99, 3279 (1977).
[10] N. L. Allinger, K. Chen, J.-H. Lii, J. Comput. Chem. 14, 642 (1996).
[11] J.-H. Lii, N. L. Allinger, J. Am. Chem. Soc. 111, 8566 (1989).
[12] J.-H. Lii, N. L. Allinger, J. Am. Chem. Soc. p. 8576 (1989).
[13] N. L. Allinger, K. Chen, J.-H. Lii, J. Comput. Chem. 14, 642 (1996).
[14] N. Nevins, K. Chen, N.L. Allinger, J. Comput. Chem. 14, 669 (1996).
[15] N. Nevins, J.-H. Lii, N.L Allinger, J. Comput. Chem. 14, 695 (1996).
[16] N. L. Allinger, K. Chen, J. A. Katzeellenbogen, S. R. Wilson and G. M. Anstead, J. Comput. Chem.
14, 747 (1996).
[17] S. J. Weiner, P. A. Kollman, D. A. Case, U. C. Sing,h C. Chio, G. Alagona, S. Profeta, P. Weiner, J.
Am. Chem. Soc. p. 106 (1984).
151
BIBLIOGRAPHY BIBLIOGRAPHY
[18] W.D. Cornell, P. Cieplak, C.I. Bayly, I.R. Gould, K.M. Merz, D.M. Ferguson, T. Fox, J.W. Caldwell,
P.A. Kollman, J. Am. Chem. Soc. 117, 5179 (1995).
[19] R. Brooks, R. E. Bruccoleri, B. D. Olafson, D. J. States and S. Swaminathan, M. Karplus, J. Comput.
Chem. 4, 1234 (1983).
[20] Achi Brandt, “Multi-Level Adaptive Solutions to Boundary-Value Problems”, Math. Comp 31, 333
(1977).
[21] D. W. Brenner, “Thr Art and Science of an Analytic Potential”, phys. stat. sol. (b) 217, 23 (2000).
[22] O. Rioul, M. Vetterli, “Wavelets and signal processing”, IEEE Signal Processing Magazine p. 14 (1991).
[23] B. Engquist, Z. Huang, “Heterogeneous multi-scale method: a general methodology for multi-scale
modeling”, Phys. Rev. B 67, 092101 (2003).
[24] M. Katsoulakis, A.J. Majda, D. G. Vlachos, “Coarse-grained stochastic processes for lattice systems”,
Proc. Natl. Acad. Sci. U.S.A. 100, 782 (2003).
[25] G. Kresse, J. Furtmüller, Phys. Rev. B 54, 11169 (1996).
[26] G. Kresse, J. Hafner, “Ab initio molecular dynamics for liquid metals.” Phys. Rev. B 47, 558 561
(1993).
[27] G. Kresse, J. Haffner, “Ab initio molecular-dynamics simulation of the liquid-metal-amorphous-
semiconductor transition in germanium”, Phys. Rev. B 49, 14251 14269 (1994).
[28] G. Kresse, J. Furthmüller, “Efficiency of ab-initio total energy calculations for metals and semicon-
ductors using a plane-wave basis set.” Comp. Mat. Sci. 6, 15 50 (1996).
[29] X. Gonze, J.M. Beuken, R. Caracas, F. Detraux, M. Fuchs and G.M. Rignanese, L. Sindic, M. Ver-
straete, G. Zera,h F. Jollet, M. Torrent, A. Roy, M. Mikami, Ph. Ghosez and J.Y. Raty, D.C. Allan,
“First-principles computation of material properties : the ABINIT software project.” Comp. Mat. Sci.
25, 478 492 (2002).
[30] M. Bockstedte, A. Kley, J. Neugebauer, M. Scheffler, Comp. Phys. Comm. 107, 187 (1997).
[31] P.E. Blöchl, “Projector augmented-wave method”, Phys. Rev. B 50, 17953 17979 (1994).
[32] K. Schwarz, P. Blaha, “Solid state calculations using WIEN2k”, Comp. Mat. Sci. 28, 259 273 (2003).
[33] B. Delley, “From molecules to solids with the DMol3 approach”, J. Chem. Phys. 113 (2000).
[34] M.J. Frisch, G.W. Trucks, H.B. Schlegel, G.E. Scuseria and M.A. Robb, J.R. Cheeseman, J.A. Mont-
gomery, Jr., T. Vreven, K.N. Kudin, J.C. Burant, J.M. Millam, S.S. Iyengar and J. Tomasi, V. Barone,
B. Mennucci, M. Cossi, G. Scalmani and N. Rega, G.A. Petersson, H. Nakatsuji, M. Hada, M. Ehara,
K. Toyota, R. Fukuda, J. Hasegawa, M. Ishid,a T. Nakajima, Y. Honda, O. Kitao, H. Nakai, M. Klen,e
X. Li, J.E. Knox, H.P. Hratchian, J.B. Cross, V. Bakken and C. Adamo, J. Jaramillo, R. Gomperts,
R.E. Stratmann and O. Yazyev, A.J. Austin, R. Cammi, C. Pomelli, J.W. Ochterski, P.Y. Ayala, K.
Morokuma, G.A. Voth, P. Salvador and J.J. Dannenberg, V.G. Zakrzewski, S. Dapprich, A.D. Daniels,
M.C. Strain, O. Farkas, D.K. Malick, A.D. Rabuck and K. Raghavachari, J.B. Foresman, J.V. Ortiz,
Q. Cu,i A.G. Baboul, S. Clifford, J. Cioslowski, B.B. Stefano,v G. Liu, A. Liashenko, P. Piskorz, I.
Komaromi, R.L. Martin and D.J. Fox, T. Keith, M A. Al-Laham, C.Y. Peng, A. Nanayakkara, M.
152
BIBLIOGRAPHY BIBLIOGRAPHY
Challacombe, P.M.W. Gill, B. Johnso,n W. Chen, M.W. Wong, C. Gonzalez, J.A. Pople, “Gaussian
03”, Tech. rep., Gaussian, Inc., Wallingford CT (2004).
[35] P.A.M. Dirac, The Principles of Quantum Mechanics (Oxford University Press, London, 1958).
[36] L. Kleinman, D.M. Bylander, “Efficacious Form for Model Pseudopotentials”, Phys. Rev. Lett. 48,
1425 1428 (1982).
[37] B. Grabowski, T. Hickel, J. Neugebauer, “Ab initio study of the thermodynamic properties of non-
magnetic elementary fcc metals: Exchange-correlation-related error bars and chemical trends”, Phys.
Rev. B 76, 024309 (2007).
[38] D.M. Ceperley, B.J. Alder, “Ground State of the Electron Gas by a Stochastic Method”, Phys. Rev.
Lett. 45, 566 569 (1980).
[39] J.P. Perdew, K. Burke, M. Ernzerhof, Phys. Rev. Lett. 77, 3865 3868 (1996).
[40] P. Deus, H.A. Schneider, U. Voland, K. Stiehler, “Low Temperature Thermal Expansion of InP”, phys.
stat. sol. (a) 103, 443 (1987).
[41] T. Soma, J. Satoh, H. Matsuo, Solid State Commun. 42 42, 889 (1982).
[42] P. Deus, U. Voland, H.A. Schneider, “Thermal Expansion of GaP within 20 to 300K”, phys. stat. sol.
(a) 80, K29 (1983).
[43] K. Haruna, H. Maeta, K. Ohashit, T. Koike, “The negative thermal expansion coefficient of GaP
crystal at low temperatures”, J. Phys. C 19, 5149 (1986).
[44] J.C. Slater, “Wave Functions in a Periodic Potential”, Phys. Rev. 51, 846 851 (1937).
[45] F. Schwabl, Quantenmechanik 1 (Springer Verlag Berlin, Heidelberg, New York, 1988).
[46] K. Schwarz, “DFT calculations of solids with LAPW, WIEN2k”, J. Sol. Stat. Chem. 176, 319 328
(2003).
[47] N. W. Ashcroft, N. D. Mermin, Solid State Physics (Saunders College Publishing, Philadelphia, 1976).
[48] A. Gross, Theoretical Surface Science. A Microscopic Perspective (Springer, Berlin, 2003).
[49] G. Baym, Lectures on Quantum Mechanics (Benjamin/Cummings, Merlo Park, 1973).
[50] R. G. Parr, W. Yang, Density-Functional Theory of Atoms and Molecules (Oxford University Press,
New York, 1989).
[51] M. Born, R. Oppenheimer, “Zur Quantentheorie der Molkeln”, Ann. Phys. 84, 457 (1927).
[52] W. Nolting, Grundkurs: Theoretische Physik, Vol. 5 (Verlag Zimmermann-Neufang, Ulmen, 1992).
[53] D. R. Hartree, Proc. Camb. Phil. Soc. 24, 89 (1928).
[54] V. Fock, Z. Phys. 61, 126 (1930).
[55] A. Szabo, N. S. Ostlund, Modern Quantum Chemistry: Introduction to Advanced Electronic Structure
Theory (McGraw Hill, 1989).
[56] L.H. Thomas, “The calculation of atomic fields”, Proc. Camb. Phil. Soc. 23, 542 (1927).
153
BIBLIOGRAPHY BIBLIOGRAPHY
[57] E. Fermi, “Un metodo statistica per la determinazione di alcune priorieta dell’atomie”, Atti Della Reale
Accademia Nazionale Dei Lincei 6, 602 (1927).
[58] E. Fermi, “Eine statistische Methode zur Bestimmung iniger Eigenschaften des Atoms und ihre An-
wendung auf die Theorie des periodischen Systems der Elemente”, Z. Phys. 48, 73 (1928).
[59] R. M. Dreizler, E. K. U. Gross, Density Functional Theory (Springer, Berlin, 1990).
[60] A. Dick, An-initio STM and STS Simulations on Magnetic and Nonmagnetic Metallic Surfaces, Ph.D.
thesis, University of Paderborn (2008).
[61] M. C. Payne, M. P. Teter, D. C. Allen, T. A. Aria,s J. D. Joannopoulos, “Iterative minimization
techniques for ab-initio total-energy calculations: molecular dynamics and conjugate-gradients”, Rev.
Mod. Phys. 64, 1045 1097 (1992).
[62] H. J. Monkhorst, J. D. Pack, “Special points for Brillouin-zone integrations”, Phys. Rev. B 13, 5188
5192 (1976).
[63] D. J. Chadi, M. L. Cohen, Phys. Rev. B 8, 5747 (1973).
[64] A. Baldereschi, Phys. Rev. B 7, 5212 (1973).
[65] G. Lehmann, M. Taut, Phys. Stat. Sol. B 54, 469 (1972).
[66] M. Methfessel, A. T. Paxton, “High-precision sampling for Brillouin-zone integration in metals”, Phys.
Rev. B 40, 3616 3621 (1989).
[67] Arias, “New Algebraic Formulation of Density Functional Calculation”, Comp. Phys. Comm. 128,1
(2000).
[68] P.P. Ewald, Ann. Phys. 54, 519 (1917).
[69] P.P. Ewald, Ann. Phys. 54, 557 (1917).
[70] P.P. Ewald, Ann. Phys. 64, 253 (1921).
[71] I.N. Bronstein, K.A. Semedjajew, G. Musiol H. Muühlig, Taschenbuch der Mathematik (Verlag Harri
Deutsch, 1993).
[72] U. von Barth, C.D. Gelatt, “Validity of the frozen-core approximation and pseudopotential theory for
cohesive energy calculations”, Phys. Rev. B 21, 2222 2228 (1980).
[73] X. Gonze, R. Stumpf, M. Scheffler, “Analysis of separable potentials”, Phys. Rev. B 44, 8503 8513
(1991).
[74] C.G. van de Walle, P.E. Blöchl, “First-principles calculations of hyperfine parameters”, Phys. Rev. B
47, 4244 4255 (1993).
[75] O.K. Andersen, Phys. Rev. B 12, 3060 (1975).
[76] D.J. Singh, Plane waves, pseudopotentials and the LAPW method (Kluwer Academic Publisher, Bosten,
Dortrecht, London, 1994).
[77] H.L. Skriver, The LMTO Method (Springer-Verlag, 1984).
154
BIBLIOGRAPHY BIBLIOGRAPHY
[78] H.J.F. Jansen and A.J. Freeman, “Total-energy full-potential linearized augmented-plane-wave method
for bulk solids: Electronic and structural properties of tungsten”, Phys. Rev. B 30, 561 569 (1984).
[79] D. Singh, “Ground-state properties of lanthanum: Treatment of extended-core states”, Phys. Rev. B
43, 6388 6392 (1991).
[80] P.E. Blöchl, C.J. Först, J. Schimpl, “The Projector augmented wave method: ab-initio molecular
dynamics with full wavefunctions”, Bull. Mater. Sci. 26, 33 50 (2003).
[81] P.E. Blöchl, J. Kästner, C.J. Först, Electronic structure methods: Augmented Waves, Pseudopotentials
and the Projector Augmented Wave method, Vol. 1 (Springer-Verlag, 2005).
[82] J. C. Slater, G. F. Koster, Phys. Rev. 94, 1498 (1954).
[83] R. Barret, M. Berry, T.F. Chan, J. Demmel, J. Donato and J. Dongarra, V. Eijkhout, R. Pozo, C.
Romine, H. van der Vorst, Templates for the Solution of Linear Systems: Building Blocks for Iterative
Methods (SIAM, 1993).
[84] Z. Bai, J. Demmel, J. Dongarra, Templates for the Solution of Algebraic Eigenvalue Problems: A
Practical Guide (SIAM, 2000).
[85] J. K. Cullum, R. A. Willoughby, “Computing eigenvalues of very large symmetric matrices-an imple-
mentation of a Lanczos algorithm with no reorthogonalization”, J. Comp. Phys. 44, 329 (1981).
[86] J. K. Cullum, R. A. Willoughby, Lanczos algorithms for Large Symmetric Eigenvalue Computations.
Volume 1, Theory (Birkhüuser, Boston, 1985).
[87] W.M.C. Foulkes, R. Haydock, “Tight-binding models and density-functional theory”, Phys. Rev. B
39, 12520 12536 (1989).
[88] R. Feynman, “Forces in Molecules”, Phys. Rev. 56, 340 (1937).
[89] A.C. Hurley, “The electrostatic calculation of molecular energies. 1. Methods of calculating molecular
energies”, Proc. R. Soc. London, Ser. A 226, 170 178 (1954).
[90] M. Di Ventra, S.T. Pantelides, “Hellmann-Feynman theorem and the definition of forces in quantum
time-dependent and transport problems”, Phys. Rev. B 61, 16207 16212 (2000).
[91] Ch. Kittel, Introduction to Solid State Physics (Wiley, 1986), 6th edition edn.
[92] W. H. Press, S. A. Teulosky, W. T. Vetterling, B. P. Flannery, Numerical Recipes in C: The art of
scientific computing (Cambridge University Press, 1992), 2nd ed. edn.
[93] A. Williams, J. Soler, Bull. Am. Phys. Soc. 32, 562 (1987).
[94] M.R. Hestenes, E. Stiefel, “Methods of conjugate gradients for solving linear systems”, J. Research
Nat. Bur. Standard 49, 409 436 (1952).
[95] G.W. Pratt, “Wave Functions and Energy Levels for Cu+ as Found by the Slater Approximation to
the Hartree-Fock Equations”, Phys. Rev. 88, 1217 1224 (1952).
[96] P. Pulay, “Convergence acceleration of iterative sequences. the case of scf iteration”, Chem. Phys. Lett.
73, 393 398 (1980).
155
BIBLIOGRAPHY BIBLIOGRAPHY
[97] G. P. Kerker, “Efficient iteration scheme for self-consistent pseudopotential calculations”, Phys. Rev.
B23, 3082 (1981).
[98] D. Raczkowski, A. Canning, L.W. Wang, “Thomas-Fermi charge mixing for obtaining self-consistency
in density functional calculations”, Phys. Rev. B 64, 121101 (2001).
[99] R. Fletcher, Practical Methods of Optimization (Wiley, 1981).
[100] P.v.Rague Schleyer, Encyclopedia of Computational Chemistry (John Wiley and Sons, 1998).
[101] L. Verlet, “Computer "Experiments" on Classical Fluids. I. Thermodynamical Properties of Lennard-
Jones Molecules”, Phys. Rev. 159, 98 103 (1967).
[102] L. Verlet, “Computer "Experiments" on Classical Fluids. II. Equilibrium Correlation Functions”, Phys.
Rev. 165, 201 214 (1968).
[103] S. Nose, “A molecular dynamics method for simulations in the canonical ensemble”, Molec. Phys. 52,
255 268 (1984).
[104] Mermin, Phys. Rev. 137, A1441 (1965).
[105] D. C. Wallace, Thermodynamics of Crystals (Dover Publications, Inc., Mineolta, New York, 1998).
[106] S. Baroni, P. Giannozzi, A. Testa, Phys. Rev. Lett. 58, 1861 (1987).
[107] R. Resta, Festkö rperprobleme: Advances in Solid State Physics, Vol. 25 (Vieweg, Braunschweig, 1985).
[108] P. Giannozzi, S. de Gironcoli, P. Pavone, S. Baroni, Phys. Rev. B 43, 7231 (1991).
[109] R. Heid, K.P. Bohnen, K.M. Ho, “Ab initio phonon dynamics of rhodium from a generalized supercell
approach”, Phys. Rev. B 59, 7407 (1998).
[110] P.Y.Yu, M.Cardona, Fundamentals of Semiconductors: Physcis and Materials Properties (Spinger
Verlag, Berlin, 1996), p. 104.
[111] M. Born, W. Heisenberg, P. Jordan, “Zur Quantenmechanik II”, Z. Phys. 35, 557 615 (1925).
[112] U. Breymann, “Geprüfte Dimensionen”, iX 11, 174 180 (1994).
[113] http://www.boost.org .
[114] X. Gonze, Phys. Rev. A 52, 1086 (1995).
[115] M. Städele, J.A. Majewski, P. Vogl, A. Görling, “Exact Kohn-Sham Exchange Potential in Semicon-
ductors”, Phys. Rev. Lett. 79, 2089 2092 (1997).
[116] M.C. Payne, J.D. Joannopoulos, D.C. Allan, M.P. Tete,r D.H. Vanderbuilt, “Molecular Dynamics and
ab initio Total Energy Calculations”, Phys. Rev. Lett. 56, 2656 2656 (1986).
[117] S.V. Novikov, N.M. Stanton, R.P. Campion, R.D. Morris, H.L. Geen, C.T. Foxon, A.J. Kent, “Growth
and characterization of free-standing zinc-blende (cubic) GaN layers and substrates”, Semicond. Sci.
Technol. 23, 015018 (2008).
[118] C. Stampfl, C. G. Van de Walle, Phys. Rev. B 59, 5521 (1998).
156
BIBLIOGRAPHY BIBLIOGRAPHY
[119] M. Fuchs, J.L.F. da Silva, C. Stampfl, J. Neugebauer, M. Scheffler, “Cohesive properties of group-III
nitrides: A comparative study of all-electron and pseudopotential calculations using the generalized
gradient approximation”, Phys. Rev. B 65, 245212 (2002).
[120] M. Fuchs, M. Scheffler, Comp. Phys. Comm. 119, 67 (1999).
[121] M. Alouani, J.M. Wills, Phys. Rev. B 54, 2480 (1996).
[122] M. Leszczynski, V.B. Pluzhnikov, A. Czopnik, J. Bak-Misiuk and T. Slupinski, J. Appl. Phys. 82, 4678
(1997).
[123] R.I. Cottam, G.A. Saunders, J. Phys. C 6, 2105 (1973).
[124] S.T. Weir, Y.K. Vohra, C.A. Vanderborgh, A.L. Ruoff,Phys. Rev. B 39, 1280 (1989).
[125] H. Arabi, A. Pourghazi, F. Ahmadian, Z. Nourbakhsh, “First-principles study of structural and elec-
tronic properties of different phases of GaAs”, Physica B 373, 16 (2006).
[126] Landolt-Börnstein, Condensed Matter, III/41A1a (2001).
[127] S. Biernacki, M. Scheffler, “Negative Thermal Expansion of Diamond and Zinc-Blende Semiconduc-
tors”, Phys. Rev. Lett. 3, 291 (1989).
[128] B.K. Tanner, A.G. Turnbull, C.R. Stanley, A.H. Kea,n M. McElhinney, Appl. Phys. Lett. 59 59, 2272
(1991).
[129] J.V. Ozolin’sh, G.K. Averkieva, A.F. Ilvin’sh, N.A. Goryunova, Sov. Phys. Cryst. (English Transl.) 7,
691 (1963).
[130] R.G. Greene, H. Luo, A.L. Ruoff,J. Appl. Phys. 76, 7296 (1994).
[131] Y.K. Vohra, S.T. Weir, A.L. Ruoff,Phys. Rev. B 31, 7344 (1985).
[132] V.N. Bessolov, S.G. Konnikov, V.I. Umanskii, Yu.P. Yakovlev, Sov. Phys. Solid State (English Transl.)
24 (1982) 875 24, 875 (1982).
[133] C.O. Rodriguez, R.A. Casali, E.I. Peltzer, O.M. Cappannini and M. Methfessel, Phys. Rev. B 40, 3975
(1992).
[134] A. Polian, J.P. Itie, C. Jaubertie-Carillon, A. Dartyge and A. Fontaine, H. Tolentino, High Pressure
Res. 4, 309 (1990).
[135] C.S. Menoni, I.L. Spain, Phys. Rev. B 35, 7520 (1987).
[136] J.P. Itie, A. Polian, C. Jauberthie-Carillon, E. Dartyge and A. Fontaine, H. Tolentino, G. Tourillon,
Phys. Rev. B 40, 9709 (1989).
[137] R.W.G. Wyckoff,Semiconductors: Data Handbook, 2nd. Edition, Krieger 1986 (Springer, Berlin, 2004),
chap. Crystal Structures.
[138] R. Ahmed, H. Akbarzadeh, Fazal-e-Aleem, “Ab-initio Study of Structural Properties of III- Nitrides”,
in MODERN TRENDS IN PHYSICS RESEARCH: Second International Conference on Modern
Trends in Physics Research MTPR-06 888, 42 (2007).
[139] M.E. Sherwin, T.J. Drummond, “J. Appl. Phys. 69, 8423 (1991).” J. Appl. Phys. 69, 8423 (1991).
157
BIBLIOGRAPHY BIBLIOGRAPHY
[140] A. Garcia, C. Elsässer, J. Zhu, S.G. Louie, M.L. Cohen, Phys. Rev. B 46, 9829 (1992).
[141] A. Garcia, C. Elsässer, J. Zhu, S.G. Louie, M.L. Cohen, Phys. Rev. B 47, 4150(E) (1993).
[142] M. Buongiorno Nardelli, K. Rapcewicz, E. L. Briggs, C. Bun- garo, J.Bernholc, “III-V Nitrides”, in
MRS Symposia Proceedings, F. A. Ponce, T. D. Moustakas, I. Akasaki„ B. A. Monemar, ed., 449, 893
(1997).
[143] A. Onton, “10th Int. Conf. on Physics of Semiconductors”, p. 107 (1970).
[144] Monemar, J. Appl. Phys. 50, 4362 (1979).
[145] D. Strauch, H. Dorner, J. Condens. Matter 2, 1457 (1990).
[146] N.S. Orlova, Phys. Status Solidi (b) 119, 541 (1983).
[147] R. Carles, N. Saint-Cricq, J.B. Renucci, M.A. Renucc,i A. Zwick, Phys. Rev. B 22, 4804 (1980).
[148] Ch. Eckl, P. Pavone, J. Fritsch, U. Schröder, The Physics of Semiconductors (Singapore: World
Scientific, 1996), Vol. 1, p. 229.
[149] T. Pletl, P. Pavone, Ul.E. Dieter Strauch, “First-principles study of lattice-dynamical and elastic trends
in tetrahedral semiconductors”, Physica B 263-264, 392–395 (1999).
[150] J.L. Yarnell, J.L. Warren, R.G. Wenzel, P.J. Dean, Neutron Inelastic Scattering (International Atomic
Energy Agency, Vienna, 1968), p. 301.
[151] P.H. Borcherds, K. Kunc, G.F. Alfrey, R.L. Hall, J. Phys. C Solid State Phys. 12, 4699 (1979).
[152] B. Podor, “Zone Edge Phonons in Gallium Phosphide”, phys. stat. sol. (b) 120, 207 (1983).
[153] P.H. Borcherds, G.F. Alfrey, D.H. Saunderson, A.D.B. Woods, J. Phys. C 8, 2022 (1975).
[154] A. Mooradian, G.B. Wright, Solid State Commun. 4, 431 (1966).
[155] J. Fritsch, P. Pavone, U. Schröder, Phys. Rev. B 52, 11326 (1995).
[156] K. Karch, J.M. Wagner, F. Bechstedt, Phys. Rev. B 57, 7043 (1998).
[157] F. Bechstedt, U. Grossner, J. Fruthmueller, Phys. Rev. B 62, 8003 (2000).
[158] A. Cros, R. Dimitrov, H. Ambacher, M. Stutzmann, S.Christiansen and M. Albrecht, H.P. Strunk, J.
Crystal Growth 181, 197 (1997).
[159] Rashid Ahmed, H. Akbarzadeh, Fazal-e-Aleemaam, “A first principle study of band structure of III-
nitrides compounds”, Physica B 370, 52 (2005).
[160] Properties of Group IV, III-V and II-VI Semiconductors (Wiley, England, 2005).
[161] A. L. da Rosa, Density Functional Theory Calculations on Anti-Surfactants at GaN Surfaces, Ph.D.
thesis, TU Berlin (2003).
[162] R. Miotto, G. P. Srivastava, A. C. Ferraz, Phys. Rev. B 59, 3008 (1999).
[163] A. Tabata, A.P. Lima, L.K. Teles, L.M. Scolfaro, R.R. Leite, V. Lemos, B. Schöttker, T. Frey, D.
Schikora and K. Lischka, Appl. Phys. Lett. 74, 362 (1999).
158
BIBLIOGRAPHY BIBLIOGRAPHY
[164] H.M.T. Tütüncü, G.P. Srivastava, S. Dumana, “Lattice dynamics of the zinc-blende and wurtzite
phases of nitrides”, Physica B 316-317, 190–194 (2002).
[165] P. Rinke, M. Winkelnkemper, A. Qteish, D. Bimber, J. Neugebauer, M. Scheffler, “Consistent set of
band parameters for the group-III nitrides AlN, GaN, and InN”, Phys. Rev. B 77, 075202 (2008).
[166] S. Adachi, Properties of Group-IV, III-V and II-VI Semiconductors (Willey, England, 2005).
[167] K. Karch, F. Bechstedt, T. Pletl, “Lattice dynamics of GaN: Effects of 3d electrons”, Phys. Rev. B
56, 3560 (1997).
[168] H.M. Kagaya, T. Soma, Solid State Commun. 62, 707 (1987).
[169] H.M. Kagaya, T. Soma, Phys. Status Solidi (b) 142, 411 (1987).
[170] A. Debernardi, Solid State Commun. 113, 1 (2000).
[171] T.F. Smith, G.K. White, J. Phys. C 5, 2031 (1975).
[172] P.W. Sparks, C.A. Swenson, Phys. Rev. 163, 779 (1967).
[173] S.N. Novikova, Sov. Phys. Solid State (Engl. Trans.) 3, 129 (1961).
[174] N.N. Sirota, L.I. Berger, Inzh. Fiz. Zhurnal, Akad. Nauk. Beloruss. SSR 2, 104 (1959).
[175] H.G. Grimmeiss, B. Monemar, Phys. Status. Solidi. (a) 5, 109 (1971).
[176] H. Maeta, T. Kat, S. Okuda, J. Appl. Crystallogr. 9, 378 (1976).
[177] V.M. Glazov, A.S. Pashinkin, “Thermal expansion and Heat Capacity of GaAs and InAs”, Inorg.
Materials 36, 289 (2000).
[178] Lu Lai-Yu, Chen Xiang-Rong, Yu Bai-Ru, Gou Qing-Quan, “First-principles calculations for transition
phase and thermodynamic properties of GaAs”, Chinese Physics 15, 802 (2006).
[179] V.P. Vasil’ev, J.-E. Gachon, “Thermodynamic Properties of InP”, Inorg. Materials 42, 1287 (2006).
[180] I. Ansara, C. Chatillon, H.L.Lukas, “A Binary Data Base for III-V Compounds Semiconductor Sys-
tems”, CALPHAD: Comp. Cpupling Phase Diagrams Thermochem. 18, 177 (1994).
[181] J. Jacobs, “Phosphorus at High Temperature and Pressure”, J. Chem. Phys. 5, 945 (1937).
[182] W.S. Holmes, “Heat of Combustion of Phosphorus and the Enthalpies of Formation of P4O10 and
H3PO4”, Trans. Faraday Soc. 58, 916 (1962).
[183] P.A.G. O’Hare, W.N. Hubbard, “Fluorine Bomb Calometry”, Trans. Faraday Soc. 62, 2709 (1966).
[184] P.A.G. O’Hare, B.M. Levis, “Thermodynamic Stability of Orthorhombic Black Phosphoruus”, Ther-
mochim. Acta 129, 57 (1988).
[185] I. Yamaguchi, K. Itogaki, A. Iazawa, “Measurements Heat of Formation of GaP, InP, GaAs, InAs, and
InSb”, mater. Trans. JIM 35, 596 (1994).
[186] L. Ismer, Protonentransport in Wasserstoffbrückenbindungen, Master’s thesis, Technische Universität
Berlin (2002).
159
BIBLIOGRAPHY BIBLIOGRAPHY
[187] L. Ismer, J. Ireta, S. Boeck, J. Neugebauer, “Phonon spectra and thermodynamic properties of the in
polyalanine alpha helix: A density-functional-theory-based harmonic vibrational analysis”, Phys. Rev.
E71, 031911–1 (2005).
[188] L. Ismer, J. Ireta, J. Neugebauer, “First principles free energy analysis of helix stability: The origin of
the low pi-helices”, J. Phys. Chem. B 112, 4109 (2008).
[189] L. Ismer, First principles based thermodynamic stability analysis of the secondary structure of proteins,
Ph.D. thesis, University of Paderborn (2009).
[190] M. Petrov, L. Lymperakis, M. Friak, J. Neugebauer, “Ab-initio based conformational study of the
crystalline alpha-chitin”, to be submitted .
[191] B. Grabowski, L. Ismer, T. Hickel, J. Neugebauer, “Ab initio up to the melting point: Anharmonicity
and vacancies in aluminum”, Phys. Rev. 79, 134106 (2009).
[192] Blazej Grabowski, Ab-initio based free-energy surfaces: Method development and application to alu-
minum an iron, Master’s thesis, University of Paderborn (2005).
[193] O. Marquardt, D. Mourad, S. Schulz, T. Hickel, G. Czycholl and J. Neugebauer, “A comparison
of atomistic and continuum theoretical approaches to determine electronic properties of GaN/AlN
quantum dots”, Phys. Rev. B 78, 235302 (2008).
[194] T. Hammerschmidt, P. Kratzer, M. Scheffler, “Analytic many-body potential for InAs/GaAs surfaces
and nanostructures: Formation energy of InAs quantum dots”, Phys. Rev. B 77, 235303 (2008).
[195] T. D. Young, O. Marquardt, phys. stat. sol. (c) 6, 557 (2009).
[196] M. Albrecht, L. Lymperakis, J. Neugebauer, J.E. Northrup, L. Kirste, M. Leoux, I. Grzegory, S.
Porowski, “Chemically ordered AlxGa1-xN alloys: Spontaneous formation of natural quantum wells”,
Phys. Rev. B 71, 035 314 (2005).
[197] O. Marquardt, T. Hickel, J. Neugebauer, C.G. van de Walle, “Influence of polarization effects due to
thickness fluctuations in nonpolar InGaN/GaN quantum wells”, to be published .
[198] O. Marquardt, T. Hickel, J. Neugebauer, “Polarization-induced charge carrier separation in polar and
nonpolar grown GaN quantum dots”, to be published .
[199] J. Kioseoglou, E. Kalesaki, Ph. Komninou, Th. Karakostas and L. Lymperakis, J. Neugebauer, “Elec-
tronic structure of 1/6<2023> partial dislocations in wurtzite GaN.” to be submitted .
[200] H. Abu-Farsakh, Maximally-localized Wannier functions in III-V Semiconductors, Master’s thesis,
Yarmouk University, Irbid, Jordan (2003).
[201] T. Hammerschmidt, M. A. Migliorato, D. Powell, A. G. Cullis and G. P. Srivastava, “Composition
and Strain Dependence of the Piezoelectric Coefficients in Semiconductor Alloys”, in MRS Proceedings
(2007).
[202] H. Abu-Farsakh, A. Qteish, “Ionicity scale based on the centers of maximally localized Wannier func-
tions”, Phys. Rev. B 75, 085201 (2007).
[203] H. Abu-Farsakh, J. Neugebauer, “Enhancing nitrogen solubility in GaAs and InAs by surface kinetics:
An ab initio study”, Phys. Rev. B 79, 155311 (2009).
160
BIBLIOGRAPHY BIBLIOGRAPHY
[204] C. Freysoldt, J. Neugebauer, C. van de Walle, “Fully ab initio finite-size corrections for charged defect
supercell calculations”, Phys. Rev. Lett. 102, 035702 (2009).
[205] S. Frank, Einfluss der Materialeigenschaften auf den Ferromagnetismus von GaMnAs, Master’s thesis,
University Ulm (2006).
[206] P. Rinke, A. Qteish, J. Neugebauer, C. Freysoldt, M. Scheffer, “Structural phase transformation of
GaN under high-pressure: an exact exchange study”, New J. Phys. 7, 2126 (2005).
[207] P. Rinke, M. Scheffler, A. Qteish, M. Winkelkemper, D. Bimberg, “Band gap and band parameters
of InN and GaN from quasiparticle energy calculations based on exact-exchange density-functional
theory”, Appl. Phys. Lett. 89, 161919 (2006).
[208] “Combining GW calculations with exact-exchange density-functional theory: an analysis of valence-
band photoemission for compound seminconductors”, New J. Phys. 7, 126 (2005).
[209] A. Qteish, A.I. Al-Sharif, M. Fuchs, M. Scheffler, S. Boeck, J. Neugebauer, “Exact-exchange calcula-
tions of the electronic structure of AlN, GaN and InN”, Comp. Phys. Comm. 169, 28 (2005).
[210] Abdallah Qteish, Patrick Rinke, Matthias Scheffler, Joerg Neugebauer, “Exact-exchange based quasi-
particle energy calculations for the band gap, effective masses and deformation potentials of ScN”,
Phys. Rev. B 74, 245 208–1 (2006).
[211] A. Qteish, A.I. Al-Sharif, M. Fuchs, M. Scheffler, S. Boeck, J. Neugebauer, “Role of semicore states
in the electronic structure of group-III nitrides: An exact exchange study”, Phys. Rev. B 72, 155317
(2005).
[212] A.I. Al-Sharif, “Structural phase transformation of GaN under high-pressure: an exact exchange
study”, Sol. Stat. Comm. 135, 515 (2005).
[213] M. Wahn, J. Neugebauer, “Generalized Wannier functions: An efficient way to construct ab-initio
tight-binding parameters for group-III nitrides”, phys. stat. solidi (b) 243, 1583 (2006).
[214] C. Freysoldt, S. Boeck, J. Neugebauer, “Direct minimization technique for metals in density-functional
theory”, Phys. Rev. B .
[215] O. Marquardt, S. Boeck, C. Freysoldt, T. Hickel, J. Neugebauer, “Implementation of the real-space k.p
formalism and continuum elasticity theory in the plane-wave software library S/PHI/nX”, submitted
to Comp. Phys. Comm. (2009).
[216] M. Todorova, L. Ismer, J. Neugebauer, “Role of anharmonic contributions for the elasticity of ice”, in
prep. .
[217] A. Smith, R. Yang, H.Q. Yang, W.R.L. Lambrecht, A. Dick and J. Neugebauer, “Aspects of spin-
polarized scanning tunneling microscopy at the atomic scale: experiment, theory, and simulation”,
Surface Science 561, 154 (2004).
[218] A.R. Smith, R. Yang, H.Q. Yang, A. Dick, J. Neugebauer and W.R.L. Lambrecht, “Recent advances in
atomic-scale spin-polarized scanning microscopy”, Microscopy Research and Technology 66, 72 (2005).
[219] V. Bubnik, Visualization, Modeling of Molecules, Crystals, Master’s thesis, Brno University of Tech-
nology (2008).
161
BIBLIOGRAPHY BIBLIOGRAPHY
[220] T. Uchdorf, Developing a general purpose database application for multiphysics, Master’s thesis, fach-
hochschule Aachen (2008).
[221] D. Calistea, Y. Pouillona, M.J. Verstraetea, V. Olevanoa and X. Gonze, “Sharing electronic structure
and crystallographic data with ETSF_IO”, Comp. Phys. Comm. 179, 748 (2008).
162