Document [original]

Horizon 2020

Reduced Order Modelling, Simulation and Optimization of Coupled systems

Model order reduction for parametric high

dimensional models in the analysis of

financial risk

European Union’s Horizon 2020 research and innovation programme

under the Marie Skłodowska-Curie Grant Agreement No. 765374

Model order reduction for parametric high

dimensional interest rate models in the analysis

of financial risk

Andreas Binder∗Onkar Jadhav †Volker Mehrmann ‡

Abstract

This paper presents a model order reduction (MOR) approach for high dimensional problems in the analysis

of financial risk. To understand the financial risks and possible outcomes, we have to perform several

thousand simulations of the underlying product. These simulations are expensive and create a need for

efficient computational performance. Thus, to tackle this problem, we establish a MOR approach based on a

proper orthogonal decomposition (POD) method. The study involves the computations of high dimensional

parametric convection-diffusion reaction partial differential equations (PDEs). POD requires to solve the high

dimensional model at some parameter values to generate a reduced-order basis. We propose an adaptive greedy

sampling technique based on surrogate modeling for the selection of the sample parameter set that is analyzed,

implemented, and tested on the industrial data. The results obtained for the numerical example of a floater with

a cap and floor under the Hull-White model indicate that the MOR approach works well for short-rate models.

Keywords: Financial risk analysis, short-rate models, convection-diffusion reaction equation, finite difference

method, parametric model order reduction, proper orthogonal decomposition, adaptive greedy sampling,

Packaged retail investment and insurance-based products (PRIIPs).

MSC(2010): 35L10, 65M06, 91G30, 91G60, 91G80

1MathConsult GmbH, Altenbergerstraße 69, A - 4040 Linz, Austria

2Institut f¨

ur Mathematik MA 4-5, TU Berlin, Str. des 17. Juni 136, D-10623 Berlin, Germany

3Institut f¨

ur Mathematik MA 4-5, TU Berlin, Str. des 17. Juni 136, D-10623 Berlin, Germany

Contents

1. Introduction 1

2. Model Hierarchy 3

2.1. BankAccountandShort-Rate .................................. 3

2.2. Vasicek and Cox-Ingersoll-Ross Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.3. Hull-WhiteModel......................................... 6

3. Yield Curve Simulation and Parameter Calibration 7

3.1. YieldCurveSimulation...................................... 7

3.2. ParameterCalibration....................................... 9

4. Numerical Methods 12

4.1. FiniteDifferenceMethod..................................... 13

4.2. Parametric Model Order Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

4.3. GreedySamplingMethod..................................... 19

4.4. Adaptive Greedy Sampling Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

5. Numerical Example 26

5.1. ModelParameters......................................... 26

5.2. FiniteDifferenceMethod..................................... 27

5.3. ModelOrderReduction...................................... 28

5.4. Computationalcost ........................................ 31

6. Conclusion 34

A. Relation between a singular value decomposition and a principal component analysis 39

1. Introduction

Packaged retail investment and insurance-based products (PRIIPs) are at the essence of the retail investment

market. PRIIPs offer considerable benefits for retail investors which make up a market in Europe worth up

to e10 trillion. However, the product information provided by financial institutions to investors can be overly

complicated and contains confusing legalese. To overcome these shortcomings, the EU has introduced new reg-

ulations on PRIIPs (European Parliament Regulation (EU) No 1286/2014) [19]. According to these regulations,

a PRIIP manufacturer must provide a key information document (KID) for an underlying product that is easy to

read and understand. The KID informs about the vital features, such as costs and risks of the investment, before

purchasing the product. The PRIIPs include interest rate derivatives such as the interest rate cap and floor [27],

interest rate swaps [6], etc.

A key information document includes a section about ’what could an investor get in return?’ for the invested

product which requires costly numerical simulations of financial instruments. This paper evaluates interest

rate derivatives based on the dynamics of the short-rate models [9]. For the simulations of short-rate models,

techniques based on discretized convection-diffusion reaction partial differential equations (PDEs) are very

successful [1]. To discretize the PDE, we implemented the finite difference method (FDM) [17]. The FDM has

been proven to be efficient for solving the short-rate models [29, 21, 8]. The model parameters are usually cal-

ibrated based on market structures like yield curves,cap volatilities, or swaption volatilities [9]. The regulation

demands to perform yield curve simulations for at least 10,000 times. A yield curve shows the interest rates

varying with respect to 20-30 time points known as tenor points. These time points are the contract lengths

of an underlying instrument. The calibration based on several thousand simulated yield curves generates a

high dimensional model parameter space as a function of these tenor points. The 10,000 different simulated

yield curves and the calibrated parameters based on these simulated yield curves can be considered as 10,000

different scenarios. We need to solve a high dimensional model (HDM) obtained by discretizing the short-rate

PDE for such scenarios [12]. Furthermore, the results obtained for these several thousand scenarios are used

to calculate the possible values for an instrument under favorable, moderate, and unfavorable conditions. The

favorable, moderate, and unfavorable scenario values are the values at 90th percentile, 50th percentile, and 10th

percentile of 10,000 values, respectively. However, these evaluations are computationally costly, and addition-

ally, have the disadvantage of being affected by the so-called curse of dimensionality [43].

To avoid this problem, we establish a parametric model order reduction (MOR) approach based on a variant of

the proper orthogonal decomposition (POD) method [11, 5]. The method is also known as the Karhunen-Lo´

eve

decomposition [23] or principal component analysis [35] in statistics. The combination of a Galerkin projection

approach and POD creates a powerful method for generating a reduced order model (ROM) from the high di-

mensional model that has a high dimensional parameter space [24]. This approach is computationally feasible

as it always looks for low dimensional linear (or affine) subspaces [44, 51]. Also, it is necessary to note that

the POD approach considers the nonlinearities of the original system. Thus, the generated reduced order model

will be nonlinear if the HDM is nonlinear as well. POD generates an optimally ordered orthonormal basis in the

least squares sense for a given set of computational data. Furthermore, the reduced order model is obtained by

projecting a high dimensional system onto a low dimensional subspace obtained by truncating the optimal basis

called reduced-order basis (ROB). The selection of the data set plays an important role and is most prominently

obtained by the method of snapshots introduced in [46]. In this method, the optimal basis is computed based

on a set of state solutions. These state solutions are known as snapshots and are calculated by solving the HDM

for some pre-selected training parameter values. The quality of the ROM is bounded by the training parameters

used to obtain the snapshots. Thus, it is necessary to address the question of how to generate the set of potential

parameters which will create the optimal ROB. Some of the previous works implement either some form of

fixed sampling or often only uniform sampling techniques [38]. These approaches are straightforward, but they

may neglect the vital regions in the case of high dimensional parameter spaces.

In the current work, a greedy sampling algorithm has been implemented to determine the best suitable parame-

ter set [25, 41, 3]. The basic idea is to select the parameters at which the error between the ROM and the HDM

is maximal. Further, we compute the snapshots using these parameters and thus obtain the best suitable ROB

1. Introduction

which will generate a fairly accurate ROM. The calculation of the relative error between the ROM and the HDM

is expensive, so instead, we use error estimators like the residual error associated with the ROM [10, 50]. The

greedy sampling algorithm picks the optimal parameters which yield the highest values for the error estimator.

Furthermore, we use these parameters to construct a snapshot matrix and, consequently, to obtain the desired

ROB.

However, it is not reasonable to compute an error estimator for the entire parameter space. The error estimator

is based on the norm of the residual, which scales with the size of the HDM. This problem forces us to select

a pre-defined parameter set as a subset of the high dimensional parameter space to train the greedy sampling

algorithm. We usually select this pre-defined subset randomly. But, a random selection may neglect the crucial

parameters within the parameter space. Thus, to surmount this problem, we implemented an adaptive greedy

sampling approach. We choose the most suitable parameters adaptively at each greedy iteration using an opti-

mized search based on surrogate modeling. We construct a surrogate model for the error estimator and use it to

find the best suitable parameters. The use of the surrogate model avoids the expensive computation of the error

estimator over the entire parameter space. The adaptive greedy sampling approach associated with surrogate

modeling has been introduced before in [41, 3]. There are several approaches to design a surrogate model

like regression analysis techniques [37], response surface models, or Kriging models [36]. The authors of [41]

have designed a Kriging based surrogate model to select the most relevant parameters adaptively. However,

in our case, due to the high dimensional parameter space, we may face the multicollinearity problem as some

variable in the model can be written as a linear combination of the other variables in the model [34]. Also, we

need to construct a surrogate model considering the fact that the model parameters are time-dependent. Thus,

in this work, we construct a surrogate model based on the principal component regression (PCR) technique

[37]. The PCR approach is a dimension reduction technique in which explanatory variables are replaced by

few uncorrelated variables known as principal components. It replaces the multivariate problem with a more

straightforward low dimensional problem and avoids overfitting.

In the classical greedy sampling approach, the convergence of the algorithm is observed using the error estima-

tor. However, we can use the norm of the residual to estimate the exact error between the HDM and the ROM.

In this work, we establish an error model for an exact error as a function of the error estimator based on the

idea presented in [41]. Furthermore, we use this exact error model to observe the convergence of the greedy

sampling algorithm.

To summarize, this paper presents an approach to select the most prominent parameters or scenarios for which

we solve the HDM and obtain the required ROB. Thus, instead of performing 10,000 expensive computations,

we perform very few expensive computations and solve the remaining scenarios with the help of the ROM.

The paper illustrates the implementation of numerical algorithms and methods in detail. It is necessary to note

that the choice of a short-rate model depends on the underlying financial instrument. In this work, we focus

on one-factor short-rate models only. We implement the developed algorithms for the one-factor Hull-White

model [31] and present the results with a numerical example of a floater with cap and floor [20]. The current

research findings indicate that the MOR approach works well for short-rate models.

The paper is organized as follows. Section 2 presents a model hierarchy to construct a short-rate model along

with some of the well-known one-factor models. Section 3 is branched into two parts. In subsection 3.1, we

present a detailed simulation procedure for yield curves and, subsequently, subsection 3.2 describes the cali-

bration of model parameters based on these simulated yield curves. In section 4, subsection 4.1 illustrates the

FDM implemented for the Hull-White model, and subsection 4.2 introduces the projection-based MOR tech-

nique for the HDM. The selection of best suitable parameters to obtain the ROB based on the classical greedy

approach is presented in subsection 4.3. To overcome the drawbacks associated with the classical greedy ap-

proach, subsection 4.4 explains the adaptive greedy method with the surrogate modeling technique illustrated

in sub-subsection 4.4.1. Furthermore, sub-subsection 4.4.2 presents a detailed algorithm and its description for

the adaptive greedy approach. Finally, we tested our algorithms using a numerical example of a floater, and the

obtained results are presented in section 5.

2. Model Hierarchy

The management of interest rate risks, i.e., the control of change in future cash flows due to the fluctuations in

interest rates is of great importance. Especially, the pricing of products based on the stochastic nature of the

interest rate creates the necessity for mathematical models. In this section, we present some basic definitions

and the model hierarchy to construct a financial model for any underlying instrument.

2.1. Bank Account and Short-Rate

First we introduce the definition for a bank account also called as a money-market account. When investing

a certain amount in a bank account, we expect it to grow at some rate over time. A money-market account

represents a risk-less investment with a continuous profit at a risk-free rate.

Definition 2.1. Bank account (Money-market account).Let B(t)be the value of a bank account at time

t≥0. We assume that the bank account evolves according to the following differential equation with B(0) = 1,

dB(t) = B(t)rtdt, (1)

where rtis a short-rate. This gives

B(t) = expZt

rsds.(2)

According to the above definition, investing a unit amount at time t= 0 yields the value in (2) at time t, and rt

is the short-rate at which the bank account grows. This rate rtis the growth rate of the bank account Bwithin

a small time interval (t, t +dt). When working with interest-rate products, the study about the variability of

interest rates is essential. Therefore, it is necessary to consider the short-rate as a stochastic process and it

prompts us to construct a stochastic model that will describe the dynamics of the short-rate.

Let Sbe the price of a stock at the end of the nth trading day. The daily return from days nto (n+ 1) is given

by (Sn+1 −Sn)/Sn. In general, it is common to work with log returns, since the log return of kdays can be

easily computed by adding up the daily log returns:

log(Sk/S0) = log(S1/S0) + ···+ log(Sk/Sk−1).

Based on the assumption that the log returns over disjoint time intervals are stochastically independent, and are

equally distributed, the central limit theorem [16] of probability theory implies that the log returns are normally

distributed [13]. So, it is necessary to define a stochastic model in continuous time in which log returns over

arbitrary time intervals are normally distributed. The Brownian motion provides these properties [2].

Definition 2.2. Brownian motion. A standard Brownian motion is a stochastic process W(t)where t∈R,

i.e., a family of random variables W(t), indexed by non-negative real numbers twith the following properties:

•At t= 0,W0= 0.

•With probability 1, the function W(t)is continuous in t.

•For t≥0, the increment W(t+s)−W(s)is normally distributed with mean 0 and variance t, i.e.,

W(t+s)−W(s)∼N(0, t).

•For all nand times t0< t1<··· < tn−1< tn, the increments W(tj)−W(tj−1)are stochastically

independent.

Based on the definition of the Brownian motion, we can establish a stochastic differential equation (SDE).

2. Model Hierarchy

Consider an ordinary differential equation (ODE)

dX(t)

dt =q(t)X(t),(3)

with an initial condition X(0) = X0. When we consider ODE (3) with an assumption that the parameter q(t)

is not a deterministic but rather a stochastic parameter, we get a stochastic differential equation.

In our case, the stochastic parameter q(t)is given as [26]

q(t) = f(t) + h(t)w(t),

where w(t)is a white noise process. Thus, we get

dX(t)

dt =f(t)X(t) + h(t)X(t)w(t).(4)

This equation is known as Langevin equation [39]. Here X(t)is a stochastic variable having the initial condition

X(0) = X0with probability one. The Langevin force w(t) = dW(t)/dt is a fluctuating quantity having

Gaussian distribution. Substituting dW(t) = w(t)dt in (4), we get

dX(t) = f(t)X(t)dt +h(t)X(t)dW(t)(5)

In the general form a stochastic differential equation is given by

dX(t) = f(t, X(t))dt +g(t, X(t))dW(t),(6)

where f(t, X(t)) ∈R, and g(t, X(t)) ∈Rare sufficiently smooth functions. Based on (6), we can define a

stochastic differential equation with the short-rate rtas a stochastic variable as

drt=f(t, rt)dt +g(t, rt)dW(t),(7)

Furthermore, based on Ito’s lemma [53], we can derive a general PDE for any underlying instrument depending

on the short-rate r.

Theorem 2.1. Ito’s Lemma. Let ξ(X(t), t)be a sufficiently smooth function and let the stochastic process

X(t)be given by (8), then with probability one,

dξ(X(t), t) = ∂ξ

∂X(t)f(X(t), t) + ∂ξ

∂t +1

∂2ξ

∂X2(t)g2(X(t), t)dt

+∂ξ

∂X(t)g(X(t), t)dW (t).

(8)

Consider a risk-neutral portfolio Πtthat depends on the short-rate rtand consists of two interest rate instru-

ments V1and V2with different maturities T1and T2respectively. Let there be ∆units of the instrument V2. For

an infinitesimal time interval, the value change of the portfolio is dΠt= ∆dV2−dV1. To avoid the arbitrage,

we have to consider a risk-free rate [1], which gives

dΠt= ∆dV2+ (V1−∆V2)rdt −dV1.

2. Model Hierarchy

Based on Ito’s lemma, we get

dΠt= (V1−∆V2)rtdt

−∂V1

∂rt

f(rt, t) + ∂V1

∂t +1

∂2V1

∂r2

g2(rt, t)dt +∂V1

∂rt

g(rt, t)dW(t)

+ ∆∂V2

∂rt

f(rt, t) + ∂V2

∂t +1

∂2V2

∂r2

g2(rt, t)dt +∂V2

∂rt

g(rt, t)dW(t).

Let ∆ = ∂V1

∂rt∂V2

∂rt. Due to the zero net investment requirement, i.e., dΠt= 0, we obtain

0 = V1−∂V1

∂rt∂V2

∂rtV2rtdt

−∂V1

∂rt

f(rt, t) + ∂V1

∂t +1

∂2V1

∂r2

g2(rt, t)dt +∂V1

∂rt

g(rt, t)dW(t)

+∂V1

∂rt∂V2

∂rt∂V2

∂rt

f(rt, t) + ∂V2

∂t +1

∂2V2

∂r2

g2(rt, t)dt +∂V2

∂rt

g(rt, t)dW(t).

Eliminating the stochastic term, we get

V1−∂V1

∂rt∂V2

∂rtV2rtdt

=∂V1

∂t +1

∂2V1

∂r2

g2(rt, t)−∂V1

∂rt∂V2

∂rt∂V2

∂t +1

∂2V2

∂r2

g2(rt, t)dt.

Rearranging the terms, and using the notation r=rt, we get

∂V1

∂t +1

∂2V1

∂r2g2(r, t)−rV1

∂V1

∂r

∂V2

∂t +1

∂2V2

∂r2g2(r, t)−rV2

∂V2

∂r

Denoting the right side as u(r, t)

∂V1

∂t +1

∂2V1

∂r2g2(r, t)−rV1

∂V1

∂r

=u(r, t),

and using the notation V=V1, we obtain the following PDE for any financial instrument Vdepending on r

∂V

∂t +1

2g2(r, t)∂2V

∂r2−u(r, t)∂V

∂r −rV = 0,(9)

We introduce some well-known one-factor short-rate models in the following subsections.

2.2. Vasicek and Cox-Ingersoll-Ross Models

The following SDE represent two different models depending on the choice of a parameter λ.

drt= (a−brt)dt +σrλdW (t),(10)

where a,b,σ, and λare positive constants. The model proposed in [49] considers λ= 0 and is well known

as the Vasicek model, while the Cox-Ingersoll-Ross model introduced in [14] considers λ= 0.5. One of the

drawbacks of the Vasicek model is that the short-rate can be negative. On the other hand, in the case of the

2. Model Hierarchy

Cox-Ingersoll-Ross model, the square root term does not allow negative interest rates. However, the major

drawback of these models is that the model parameters are constant, so we can not fit the models to the market

structures like yield curves.

2.3. Hull-White Model

The Hull-White model [31, 32] is an extension of the Vasicek model which can be fitted to market structures

like yield curves. The SDE is given as

drt= (a(t)−b(t)rt)dt +σ(t)dW(t),(11)

where now the parameters a(t),b(t), and σ(t)are time-dependent parameters. The term (a(t)−b(t)rt)is a

drift term and a(t)is known as deterministic drift. We can define a PDE for any underlying instrument based

on the Hull-White model depending on r. In the case of the Hull-White model in (9), g(r, t) = σ(t), and

−u(r, t) = (a(t)−b(t)r). Substituting g(r, t)and u(r, t), we get

∂V

∂t + (a(t)−b(t)r)∂V

∂r +1

2σ2(t)∂2V

∂r2−rV = 0,(12)

In the following, we consider the parameter b(t)and σ(t)as constants to have a robust model. Hull and White

supported for making model parameters band σto be independent of time [33]. The problem is that the

volatility term structure in the future could become nonstationary in the sense that the future term structure

implied by the model can be quite different than it is today. Also, the author of [15] quoted that the volatility

in the future may reach zero, which ultimately could result in implausible option prices, thus, we suggest to

consider parameters band σas independent of time. Figure 1 shows the model hierarchy for the Hull-White

model.

SDE for a short-rate model

drt=f(t, rt)dt +g(t, rt)dW(t)

PDE from SDE using Ito’s lemma

∂V

∂t +1

2g2(r, t)∂2V

∂r2−u(r, t)∂V

∂r −rV = 0

Hull-White PDE for any financial instrument V

∂V

∂t + (a(t)−b(t)r)∂V

∂r +1

2σ(t)2∂2V

∂r2−rV = 0

Robust Hull-White model

∂V

∂t + (a(t)−br)∂V

∂r +1

2σ2∂2V

∂r2−rV = 0

Figure 1: A model hierarchy to construct the Hull-White model based on the short-rate rwith constant band

σ.

The following section 3 presents the simulation procedure for yield curves and the calibration of the model

3. Yield Curve Simulation and Parameter Calibration

parameter a(t)based on these simulated yield curves.

3. Yield Curve Simulation and Parameter Calibration

3.1. Yield Curve Simulation

The problem of determining the model parameters is relatively complex. The time-dependent parameter a(t)is

derived from yield curves, which determine the average direction in which the short-rate rmoves. The PRIIP

regulation demands to perform yield curve simulations for at least 10,000 times [19]. We explain the detailed

yield curve simulation procedure in this subsection.

1. Collect historical data for the interest rates.

The data set must contain at least 2 years of daily interest rates for an underlying instrument or 4 years of

weekly interest rates or 5 years of monthly interest rates. Further, we construct a data matrix D∈Rn×mof the

collected historical interest rates data, where each row of the matrix forms a yield curve, and each column is

a tenor point m. The tenor points are the different contract lengths of an underlying instrument. For example,

we have collected the daily interest rate data at 20-30 tenor points in time over the past five years. Each year

has approximately 260 working days also known as observation periods. Thus, there are n≈1306 observation

periods and m≈20 tenor points in time.

2. Calculate the log return over each period.

We take the natural logarithm of the ratio between the interest rate at each observation period and the interest

rate at the preceding period. To avoid problems while taking the natural logarithm, we have to ensure that all

elements of the data matrix Dare positive which is done by adding a correction term γ.

D=D+γW,

dij =dij +γwij, wij = 1 for all i, j

The correction factor γensuring all elements of matrix Dto be positive. Here the matrix W∈Rn×mis a

binary matrix having all entries as 1. The selection of γdoes not affect the final simulated yield curves as

we are compensating this shift at the bootstrapping stage by subtracting it from the simulated rates. Then we

calculate the log returns over each period and store them into a new matrix ˆ

D=ˆ

dij ∈Rn×mas

dij =ln( ¯

dij)

ln( ¯

d(i−1)j).

3. Correct the returns observed at each tenor so that the resulting set of returns at each tenor point has a zero

mean.

We calculate the arithmetic mean µjof each column of the matrix ˆ

µj=1

i=1

dij,

and subtract this arithmetic mean µjfrom each element of the corresponding jth column of a matrix ˆ

Dand

store the obtained results in the matrix ¯

D∈Rn×m,

dij =ˆ

dij −µjwij.

4. Compute the singular value decomposition [22] of the matrix ¯

The singular value decomposition of the matrix ¯

Dis

D= ΦΣΨT,

3. Yield Curve Simulation and Parameter Calibration

D=





φ11 ··· φ1m

φm1··· φmm





m×m

·





Σ10···

0....

.··· Σm





m×m

·





ψ11 ··· ψ1m

ψm1··· ψmm





m×m

where Σis a diagonal matrix having singular values Σiarranged in descending order. The columns of Φare

the normalized singular vectors φ∈Φ, and the columns of ΦΣ are known as principal components. The

right singular vectors ψ∈Ψare the eigenvectors or also called principal directions of the covariance matrix

C=¯

DT¯

D. A detailed relation between the singular value decomposition and the principal component analysis

is presented in the Appendix A.

5. Select the principal singular vectors corresponding to the maximum energy.

The relative importance of ith principal singular-vector is determined by the relative energy Ξiof that compo-

nent defined as

Ξi=Σi

i=1 Σi

where the total energy is given by Pm

i=1 Ξi= 1. We select pright singular vectors corresponding to the

maximum energies from the matrix Ψ. We construct a matrix ¯

Ψ∈Rm×pcomposed of these selected singular

vectors

Ψ = 





ψ11 ··· ψ1p

ψm1··· ψmp





m×p

6. Calculate the matrix of returns to be used for the simulation of yield curves.

We project the matrix ¯

Donto the matrix of selected singular vectors ¯

Φ.

X=¯

D·¯

Ψ,X∈Rn×p.

Furthermore, we calculate the matrix of returns MR∈Rn×mby multiplying the matrix Xwith the transpose

of the matrix of singular vectors ¯

Ψ.

MR=X·¯

ΨT.(13)

This process simplifies the statistical data ¯

Dthat transforms mcorrelated tenor points into puncorrelated

principal components. It allows reproducing the same data by simply reducing the total size of the model.

7. Bootstrapping

Bootstrapping is a type of resampling where large numbers of small samples of the same size are drawn repeat-

edly from the original data set. It follows the law of large numbers, which states that if samples are drawn over

and over again, then the resulting set should resemble the actual data set. In short, the bootstrapping creates

different scenarios based on simulated samples, which resembles the actual data. These scenarios can be further

used to obtain the values at favorable, moderate, and unfavorable conditions for an underlying instrument. Ac-

cording to the PRIIP regulations, we have to perform a bootstrapping procedure for the yield curve simulation

for at least 10,000 times. The regulations state that a standardized key information document shall include the

minimum recommended holding period (RHP).

Definition 3.1. Holding period. A holding period is a period between the acquisition of an asset and its sale.

It is the length of time during which an underlying instrument is held by an investor.

Remark. The recommended holding period gives an idea to an investor for how long should an investor hold

the product to minimize the risk. Generally, the RHP is given in years.

The time step in the simulation of yield curves is typically one observation period. Let hbe the RHP in days

(e.g., h≈2600 days or 10 years). So, there are hobservation periods in the RHP. For each observation period

in the RHP, we select a row at random from the matrix MR, i.e., we select hrandom rows from the matrix MR.

3. Yield Curve Simulation and Parameter Calibration

We construct a matrix ¯

MR=χij ∈Rh×mfrom these selected random rows. Further, we sum over the selected

rows of the columns corresponding to the tenor point j,

¯χj=

i=1

χij, j = 1,··· , m.

In this way, we obtain a row vector ¯χ∈R1×msuch that

¯χ= [¯χ1¯χ2··· ¯χm].

The final simulated yield rate yjat tenor point jis the rate at the last observation period ¯

dnj at the corresponding

tenor point j,

1. multiplied by the exponential of the ¯χj,

2. adjusted for any shift γused to ensure positive values for all tenor points.

3. adjusted for the forward rate so that the expected mean matches current expectations.

The forward rate between two time points t1and t2is then given as

r1,2=r(t0, t2)(t2−t0)−r(t0, t1)(t1−t0)

t2−t1

where t1and t2are measured in years. Here, r(t0, t1)and r(t0, t1)are the interest rates available from the data

matrix for the time periods (t0, t1)and (t0, t2). Thus, the final simulated rate rjis given by

yj=¯

dnj ×exp(¯χj)−γwnj +r1,2, j = 1,··· , m. (14)

Finally, we get the simulated yield curve from the calculated simulated returns yjas

y= [y1y2··· ym], j = 1,··· , m.

We perform the bootstrapping procedure for at least s= 10,000 times and construct a simulated yield curve

matrix Y∈Rs×mas

Y=





y11 ··· y1m

ys1··· ysm





s×m

(15)

Subsection 3.2 explains the parameter calibration based on simulated yield curves.

3.2. Parameter Calibration

The model parameters a(t)is calibrated based on simulated yield curves Y. According to [45], we can write a

closed-form solution for a zero-coupon bond B(t, T)maturing at time Tbased on the Hull-White model as

B(t, T) = exp{−r(t)Γ(t, T)−Λ(t, T)},(16)

3. Yield Curve Simulation and Parameter Calibration

where ris the short rate at time tand

κ(t) = Zt

b(s)ds,

Γ(t, T) = ZT

e−κ(t)dt,

Λ(t, T) = ZT

teκ(v)a(v)ZT

e−κ(z)dz−1

2e2κ(v)σ2(v)ZT

e−κ(z)dz2dv.

(17)

We take the following input data for the calibration:

1. The zero-coupon bond prices for all maturities Tm,0≤Tm≤T, where Tmis the maturity at the mth

tenor point.

2. The initial value of a(t)at t= 0 as a(0).

3. The constant value of the volatility σof the short-rate rtat all maturities 0≤Tm≤Tis assumed to be

constant.

4. The constant value of the parameter bis known and constant for all maturities 0≤Tm≤T.

We then determine the value of Γ(0, T)as follows:

e−κ(T)=∂

∂T Γ(0, T),

κ(T) = −ln ∂

∂T Γ(0, T),

∂

∂T κ(T) = ∂

∂T ZT

b(s)ds =b(T).

(18)

As we know the value of bfrom the given initial condition, we can compute κ(t). Subsequently, using κ(t), we

compute Γ(t).

We can use Λ(0, T)to determine a(t), for 0≤Tm≤Tin the following way:

∂

∂T Λ(0, T) = ZT

0"eκ(v)a(v)e−κ(T)

−e2κ(v)σ2(v)e−κ(T)ZT

e−κ(z)dz#dv,

eκ(T)∂

∂T Λ(0, T) = ZT

0"eκ(v)a(v)−e2κ(v)σ2(v)ZT

e−κ(z)dz#dv,

∂

∂T "eκ(T)∂

∂T Λ(0, T)#=eκ(T)a(T)−ZT

e2κ(v)σ2(v)e−κ(T)dv,

eκ(T)"eκ(T)∂

∂T Λ(0, T)#=e2κ(T)a(T)−ZT

e2κ(v)σ2(v)dv,

∂

∂T "eκ(T)"eκ(T)∂

∂T Λ(0, T)##=∂a(T)

∂T e2κ(T)+ 2a(T)e2κ(T)∂

∂T κ(T)−e2κ(T)σ2(T),

∂

∂T "eκ(T)"eκ(T)∂

∂T Λ(0, T)##=∂a(T)

∂T e2κ(T)+ 2a(T)e2κ(T)b(T)−e2κ(T)σ2(T).

(19)

3. Yield Curve Simulation and Parameter Calibration

The yield y(T)at time Tis given by

y(T) = −lnB(0, T).(20)

From (16), we can obtain

Λ(0, T)=[y(T)−r0Γ].

This gives an ordinary differential equation (ODE) for a(t)

∂

∂ta(t)e2κ(t)+ 2a(t)·b(t)·e2κ(t)−e2κ(t)σ2(t)

=∂

∂t"eκ(t)"eκ(t)∂

∂t(y(t)−r0Γ(0, t))##,

(21)

where y(T)is the simulated yield rate at tenor point T. We can solve (21) numerically with the given initial

conditions and yield rates for 0≤Tm≤T. For all Tm∈[0, T], we know b, σ and Γ(0, T)from the given

initial conditions and (18) respectively. If we assume a(t)to be piecewise constant with values a(i)in ((i+

1).∆T, i.∆T), then we can calculate a(i)iteratively for 0≤Tm≤T. This leads to a triangular system of

linear equations for the vector a(i)with non-zero diagonal elements.

Ea =F, (22)

where Emaps the parameter a(t)of the Hull-White model based on the simulated yield curves obtained using

the market data. The authors of [18] noticed that a small change in the market data used to obtain the yield

curves leads to large disturbances in the model parameter a(t). This makes the problem of solving a system of

linear equations ill-posed. We can define a well-posed problem with the following three different properties.

Definition 3.2. A well-posed problem [28] A given problem is said to be well-posed if it holds the following

three properties.

1. For all suitable data, a solution exist,

2. has an unique solution, and

3. the solution changes continuously with data.

However, in our case, a small change in the market data may lead to the large perturbations in the model

parameters, and which violates the third property. Hence, the use of naive approaches may result in some

numerical errors. To regularize the ill-posed problem, we implement Tikhonov regularization.

aδ

µ= argminkEa −Fδk2+µkak2(23)

where aδ

µis an estimate for a,µis the regularization parameter, δ=kF−Fδkis the noise level, and µkak2is

a penalty term. We solve the optimization problem (23) to obtain the parameter a(t). In this work, we use the

commercially available software called UnRisk PRICING ENGINE for the parameter calibrations [40]. The

UnRisk engine implements a classical Tikhonov regularization approach to regularize the ill-posed problems.

By providing the simulated yield curve, the UnRisk pricing function returns the calibrated parameter a(t)for

that yield curve. Based on 10,000 different simulated yield curves, we obtain s=10,000 different parameter

vectors a(t). In the matrix form, we write

A=





a11 ··· a1m

as1··· asm





(24)

where mis the number of tenor points. All parameters ai,j are assumed to be piecewise constant changing their

values only on tenor points, i.e., mtenor points mean mvalues for a single parameter vector.

4. Numerical Methods

Figure 2 shows the model hierarchy to obtain a reduced-order model for the Hull-White model.

Hull-White model

∂V

∂t + (a(t)−br)∂V

∂r +1

2σ2∂2V

∂r2−rV = 0

Finite difference method

A(ρs(t))Vn+1 =B(ρs(t))Vn, V (0) = V0

Selection of training parameters:

Classical greedy sampling algorithm

ρ1, ..., ρl

POD: Method of snapshots

V= [V(ρ1), V (ρ2), ..., V (ρl)]

Singular value decomposition

i=1

ΣiφiψT

Reduced order model

Ad(ρs)Vn+1

d=Bd(ρs)Vn

Model Order Reduction

ROM Quality Adaptive greedy sampling

algorithm

Model Order Reduction

Stop

satisfactory

unsatisfactory

Figure 2: Model hierarchy to obtain a reduced order model for the Hull-White model.

4. Numerical Methods

We discretize the Hull-White PDE using a finite difference method as presented in the subsection 4.1.3. The

discretization of the PDE creates a parameter-dependent high dimensional model. We need to solve the high

dimensional model for at least 10,000 parameters calibrated in the previous subsection 3.2. Solving the high

dimensional model for such a large parameter space is computationally costly. Thus, we incorporate the para-

metric model order reduction approach based on the proper orthogonal decomposition. The POD approach

relies on the method of snapshots. The snapshots are nothing but the solutions of the high dimensional model

at some parameter values.

The idea is to solve the high dimensional model for only a certain number of training parameters to obtain a

reduced-order basis. This reduced-order basis is then used to construct a reduced-order model. Subsection 4.2

illustrates the proper orthogonal decomposition approach, along with the construction of snapshots, and the Al-

gorithm 1 presents the methodology to obtain the reduced-order basis. Finally, we can solve the reduced-order

model cheaply for the large parameter space. The selection of the training parameters is of utmost importance

to obtain the optimal reduced-order model. In subsection 4.3, we introduce a greedy sampling technique for

the selection of training parameters. The greedy sampling technique selects the parameters at which the error

between the reduced-order model and the high dimensional model is maximal. However, the greedy sampling

approach exhibits some drawbacks (see subsection 4.3.1). To avoid these drawbacks, in the subsection 4.4, we

implement an adaptive greedy approach. These methods are interlinked with each other and necessary to obtain

the most efficient reduced-order model.

4.1. Finite Difference Method

The PDEs obtained for the Hull-White model is a convection-diffusion-reaction type PDE [4]. Consider a

Hull-White PDE given by (20)

∂V

∂t + (a(t)−brt)∂V

∂r

| {z }

Convection

2σ2∂2V

∂r2

| {z }

diffusion

−rV

|{z}

reaction

= 0.(25)

In this work, we apply a finite difference method to solve the Hull-White PDE. The convection term in the

above equation may lead to numerically unstable results. Thus, we implement the so-called upwind scheme

[30] to obtain a stable solution. We incorporate the semi-implicit scheme called the Crank-Nicolson method

[47] for the time discretization.

4.1.1. Spatial Discretization

Consider the following one-dimensional linear advection equation for a field ζ(x, t),

∂ζ

∂t +U∂ζ

∂x = 0,(26)

describing a wave propagation along the x−axis with a velocity U. We define a discretization of the computa-

tional domain in sd spatial dimensions as

[uk, vk]sd ×[0, T]=[u1, v1]×···×[usd, vsd]×[0, T] = sd

k=1

[uk, vk]×[0, T],

where uand vare the cut off limits of the spatial domain. Tdenotes the final time of the computation. The

corresponding indices are ik∈ {1, . . . , Mk}for the spatial discretization and n∈ {1, . . . , N}for the time

4. Numerical Methods

discretization. The first order upwind scheme of order O(∆x)is given by

ζn+1

i−ζn

∆t+Uζn

i−ζn

i−1

∆x= 0 for U > 0(27a)

ζn+1

i−ζn

∆t+Uζn

i+1 −ζn

∆x= 0 for U < 0(27b)

Let us introduce,

U+= max(U, 0), U−= min(U, 0),

and

ζ−

x=ζn

i−ζn

i−1

∆x, ζ+

x=ζn

i+1 −ζn

∆x

Combining (27a) and (27b) in compact form, we obtain

ζn+1

i=ζn

i−∆t[U+ζ−

x+U−ζ+

x].(28)

We have implemented the above defined upwind scheme for the convection term. The diffusion term is dis-

cretized using the second order central difference scheme of order O(∆x)2given by

∂2ζ

∂x2=ζn

i+1 −2ζn

i+ζn

i−1

(∆x)2.(29)

4.1.2. Time Discretization

Consider a time-dependent PDE for a quantity ζ

∂ζ

∂t +Lζ = 0,(30)

where Lis the differential operator containing all spatial derivatives. Using the Taylor series expansion, we

write

ζ(t+ ∆t) = ζ(t)+∆∂ζ

∂t +∆t2

∂2ζ

∂t2.

Neglecting terms of order higher than one, we obtain

∂ζ

∂t =ζ(t+ ∆t)−ζ(t)

∆t+O(∆t).

From (30), we get

ζ(t+ ∆t) = ζ(t)−∆t(L(t)ζ(t)).(31)

Let us introduce a new parameter Θsuch that

ζ(t+ ∆t)−ζ(t)

∆t= (1 −Θ)(L(t)ζ(t)) + Θ(L(t+ ∆t)ζ(t+ ∆t)).(32)

We can construct different time discretization schemes for different values of Θ. Setting Θ=0, we obtain a

fully explicit scheme known as the forward difference method, while considering Θ=1, we get a fully implicit

scheme known as the backward difference method. Here we set Θ = 1/2and obtain a semi-implicit scheme

known as the Crank-Nicolson method [4]

1−1

2∆tL(t+ ∆t)ζ(t+ ∆t) = 1 + 1

2∆tL(t)ζ(t).(33)

4. Numerical Methods

4.1.3. Finite Difference Method for a Hull-White Model

The computational domain for a spatial dimension the rate ris [u, v]. According to [1], the cut off values uand

vare given as

u=rsp + 7σ√Tand v=rsp −7σ√T, (34)

where rsp is a yield at the maturity Talso known as a spot rate. We divide the spatial domain into Mequidistant

grid points which generate a set of points {r1, r2, . . . , rM}. The time interval [0, T]is divided into N−1time

points (Npoints in time that are measured in days starting from t= 0).

Equation (25) can then be represented as,

∂V

∂t +L(t)V(t)=0.(35)

We specify the spatial discretization operator L(n), where the index ndenotes the time-point. From (27) and

(29), we get,

for (a(n)−bri)>0

L(n)Vn

i:= 1

2σ2Vn

i+1 −2Vn

i+Vn

i−1

(∆x)2+ (a(n)−bri)Vn

i−Vn

i−1

∆x−riVn

for (a(n)−bri)<0

L(n)Vn

i:= 1

2σ2Vn

i+1 −2Vn

i+Vn

i−1

(∆x)2+ (a(n)−bri)Vn

i+1 −Vn

∆x−riVn

(36)

From (33), we obtain

V(t+ ∆t)−V(t)

∆t= (1 −Θ)(L(t)V(t)) + Θ(L(t+ ∆t)V(t+ ∆t)) (37)

For Θ=1/2, we then have

1−1

2∆tL(t+ ∆t)

| {z }

A(ρs(t))∈RM×M

V(t+ ∆t) = 1 + 1

2∆tL(t)

| {z }

B(ρs(t))∈RM×M

V(t),(38)

where the matrices A(ρs(t)), and B(ρs(t)) depend on parameters a(t),band σ. We denote ρs={a(t), b, σ}

as the sth group of these parameters. Here

A(ρs(t)) = I−σ2∆t

2(∆x)2J−∆t

2∆x(H+G−+H−G+) + Ro,

and

B(ρs(t)) = I+σ2∆t

2(∆x)2J+∆t

2∆x(H+G−+H−G+)−Ro,

where







−2 1 0 ··· 0

1−2 1 ....

0 1 ......0

..........1

0··· 0 1 −2







4. Numerical Methods

G−=







1 0 0 ··· 0

−1 1 0 ....

0−1......0

..........0

0··· 0−1 1







G+=







1−1 0 ··· 0

0 1 −1....

0 0 ......0

..........−1

0··· 0 0 1







H+=







max(a(n)−br(1)) 0 0 ··· 0

0 max(a(n)−br(2)) 0 ....

0 0 ......0

..........0

0··· 0 0 max(a(n)−br(M))







H−=







min(a(n)−br(1)) 0 0 ··· 0

0 min(a(n)−br(2)) 0 ....

0 0 ......0

..........0

0··· 0 0 min(a(n)−br(M))







Ro=







r(1) 0 0 ··· 0

0r(2) 0 ....

0 0 ......0

..........0

0··· 0 0 r(M)







The above discretization of the PDE generates a parametric high dimensional model of the following form (39).

A(ρs(t))Vn+1 =B(ρs(t))Vn, V (0) = V0,(39)

where the matrices A(ρ)∈RM×M, and B(ρ)∈RM×Mare parameter dependent matrices. Mis the total

number of spatial discretization points. tis the time variable. t= [0, T]where Tis the final term date. For

the simplicity of notations, we denote ρs={(as1, . . . , asm), b, σ}as the sth group of model parameters where

s= 1,...,10000, and mis the total number of tenor points. We need to solve this system at each time step

nwith an appropriate boundary condition and a known initial value of the underlying instrument. We need to

solve the system (39) for at least 10,000 parameter groups ρgenerating a parameter space Pof 10000 ×m.

Table 1: List of symbols used in subsection 4.1.3

MSpatial discretization points.

NTemporal discretization points.

A,BSystem matrices.

TMaturity or the final term date.

mNumber of tenor points.

ρGroup of model parameters {a(t), b, σ}.

PParameter space.

4. Numerical Methods

4.2. Parametric Model Order Reduction

We employ the projection based model order reduction (MOR) technique to reduce the high dimensional model

(39). The reduced-order model is obtained using the Galerkin projection onto the low dimensional subspace,

Q∈ {φi}d

i=1. We approximate the high dimensional state space Vnby a Galerkin ansatz as

Vn=QV n

d,(40)

where Q∈RM×dis the reduced-order basis with dM,Vdis a vector of reduced coordinates, and ¯

V∈RM

is the solution obtained using the reduced order model. Substituting (40) into the system of equations (39) gives

the residual of the reduced state as

Rn(Vd, ρs) = A(ρs)QV n+1

d−B(ρs)QV n

d.(41)

In the case of the Galerkin projection, the residual R(Vd, ρs)is orthogonal to the ROB Q

QTRn(Vn

d, ρs)=0.(42)

Multiplying (41) by QT, we get

QTA(ρs)QV n+1

d=QTB(ρs)QV n

Ad(ρs)Vn+1

d=Bd(ρs)Vn

d,(43)

where the matrices Ad(ρs)∈Rd×dand Bd(ρs)∈Rd×dare the parameter dependent reduced matrices. In

short, instead of solving a linear system of equations of order M, the MOR approach solves a linear system

of order dwhere dM. We obtain the Galerkin projection matrix Q(43) based on a proper orthogonal

decomposition (POD) method. POD generates an optimal order orthonormal basis Qin the least square sense

which serves as the reduced-order basis for a given set of computational data. We aim to obtain the subspace

Qindependent of the parameter space P. In this work, we obtain the reduced-order basis by the method of

snapshots. The snapshots are nothing but the state solutions obtained by solving the high dimensional models

for selected parameter groups. We assume that we have a training set of parameter groups ρ1··· ρl∈[ρ1

ρs]. We compute the solutions of the high dimensional models for the training set and combine them to form

a snapshot matrix ˆ

V= [V(ρ1), V (ρ2), ..., V (ρl)]. Now, to obtain the desired reduced-order basis, the POD

method solves

POD( ˆ

V) := argmin

i=1 kVi−QQTVik2,(44)

for all matrices Q∈RM×dthat satisfy QQT=I. We can obtain the reduced-order basis Qfulfilling the

condition (44) by computing the SVD of the matrix ˆ

V. We perform a truncated SVD [22] of the matrix ˆ

Vto

obtain the reduced-order basis Q

i=1

ΣiφiψT

V= ΦΣΨT,

(45)

where φiand ψiare the left and right singular vectors of the matrix ˆ

Vrespectively, and Σiare the singular

values.

V=φ1··· φkM×k





Σ10···

0....

.··· Σk





k×kψ1··· ψkk×k

The truncated (economy-size) SVD computes only the first kcolumns of the matrix Φ. The optimal projection

subspace Qconsists of dleft singular vectors φiknown as POD modes. Here dis the dimension of the reduced

4. Numerical Methods

order model.

The Algorithm 1 shows the steps to construct a reduced-order basis using the proper orthogonal decomposition

approach. We have to choose the dimension dof the subspace Qsuch that we get a good approximation of the

Table 2: List of symbols used in the Algorithm 1

a(t)Deterministic drift.

bMean reversion speed.

σVolatility.

ρGroup of model parameters {a(t), b, σ}.

V(ρi)Solution obtained using a high dimensional model for ρi.

VSnapshot matrix.

ΦMatrix of left singular vectors.

ΨMatrix of right singular vectors.

ΣMatrix of singular values.

ΞjRelative energy of the jth POD mode.

QReduced-order basis.

HDM High dimensional model.

ROM Reduced-order model.

Algorithm 1 Reduced-order basis using a proper orthogonal decomposition

Input: Parameter a(t),b,σ, Energy level EL,l

Output: Q

1: Choose ρ1,··· , ρl

2: for i= 1 to ldo

3: Solve the HDM for the parameter group ρi:V(ρi)

4: end for

5: Construct a snapshot matrix ˆ

Vusing {V(ρi)}l

i=1

6: ˆ

V= [V(ρ1), ..., V (ρl)]

7: Compute the leading singular values and associated singular vectors of ˆ

Vusing the truncated SVD: ˆ

ΦΣΨT

8: Ξ = diag(Σ)/sum(diag(Σ))

9: for j= 1 to length(Σ) do

10: ¯

Ξ = sum(Ξ(1 : j)) ×100

11: if ¯

Ξ> EL then

12: d=j

13: end if

14: end for

15: Q= [φ1···φd]

snapshot matrix. According to [44], large singular values correspond to the main characteristics of the system,

while small singular values give only small perturbations of the overall dynamics. The relative importance of

the ith POD mode of the matrix ˆ

Vis determined by the relative energy Ξiof that mode

Ξi=Σi

i=1 Σi

(46)

4. Numerical Methods

If the sum of the energies of the generated modes is unity, then these modes can be used to reconstruct a

snapshot matrix completely [52]. Generally, the number of modes required to generate the complete data set is

significantly less that the total number of POD modes [42]. Thus, a matrix ˆ

Vcan be accurately approximated

by using POD modes whose corresponding energies sum to almost all of the total energy. Thus, we choose only

dout of kPOD modes to construct Q= [φ1···φd]which is a parameter independent projection subspace based

on (46). It is evident that the quality of the reduced-order model mainly depends on the selection of parameter

groups ρ1, ..., ρluse to compute snapshots. Thus, it necessitates defining an efficient sampling technique for the

high dimensional parameter space. We can consider the standard sampling techniques like uniform sampling

or random sampling [34]. However, these techniques may neglect a vital region within the parameter space.

Alternatively, the greedy sampling method is proven to be an efficient method for sampling a high dimensional

parameter space in the framework of model order reduction [25, 41, 3].

4.3. Greedy Sampling Method

The greedy sampling technique selects the parameter groups at which the error between the reduced-order

model and the high dimensional model is maximal. Further, we compute the snapshots using these parameter

groups so that we can obtain the best suitable reduced-order basis Q.

kek=kV−¯

kVk.

ρI= argmax

ρ∈Pkek,

(47)

where kekis a relative error between the reduced-order model and the high dimensional model. Thus, the

greedy sampling algorithm, at each greedy iteration i= 1, ..., Imax, selects the optimal parameter group ρI,

which maximizes the relative error kek. However, the computation of relative error kekis costly as it entails

the solution for the high dimensional model. Thus, usually, the relative error is replaced by error bounds or the

residual error kRk. However, in some cases, it is not possible to define the error bounds or the error bounds

do not exist. In such cases, the norm of the residual is a good alternative [10, 50, 41]. For the simplicity of

notation, we consider εas the error estimator, i.e., in our case the norm of the residual. The greedy sampling

algorithm runs for Imax iterations. At each iteration i= 1, ..., Imax, we choose the parameter group as the

maximizer

ρI= argmax

ρ∈P

ε(ρ).(48)

The Algorithm 2 describes the classical greedy sampling approach. It initiates by selecting any parameter

group ρ1from the parameter set Pand computes a reduced-order basis Q1, as described in subsection 4.2.

It is necessary to note that the choice of a first parameter group to obtain Q1does not affect the final result.

Nonetheless, for the simplicity of computations, we select the first parameter group ρ1from the parameter space

P. Furthermore, the algorithm chooses the pre-defined parameter set ˆ

Prandomly of cardinality Cas a subset

of P. At each point within the parameter set ˆ

P, the algorithm determines a reduced-order model using the

reduced-order basis Q1and computes error estimator values, ε(ρj)C

j=1. The parameter group in the pre-defined

parameter set ˆ

Pat which the error estimator is maximal is then selected as the optimal parameter group ρI.

Furthermore, the algorithm solves the high dimensional model for the optimal parameter group and updates

the snapshot matrix ˆ

V. Finally, a new reduced-order basis is obtained by computing a truncated singular value

decomposition of the updated snapshot matrix, as described in the Algorithm 1. These steps are then repeated

for Imax iterations or until the maximum value of the error estimator is higher than the specified tolerance εtol.

4. Numerical Methods

Table 3: List of symbols used in the Algorithm 2

Imax Maximum number of greedy iterations.

CMaximum parameter groups selected to obtain a reduced-order basis.

PParameter space.

ρGroup of model parameters {a(t), b, σ}.

QReduced-order basis (ROB).

εError estimator.

εtol Tolerance for the error estimator, greedy iteration terminates if ε<εtol.

V(ρi)Solution obtained by solving a high dimensional model for ρi.

VSnapshot matrix.

Algorithm 2 The classical greedy sampling algorithm

Input: Maximum number of iterations Imax, maximum parameter groups C, Parameter space P,εtol

Output: Q

1: Choose first parameter group ρ1= [(a11, ..., a1m), b, σ]from P

2: Solve the HDM for a parameter group ρ1and store the results in V1

3: Compute a truncated SVD of the matrix V1and construct Q1

4: Randomly select a set of parameter groups ˆ

P={ρ1, ρ2, ..., ρC} ⊂ P

5: for i=2toImax do

6: for j=1toCdo

7: Solve a ROM for the parameter group ρjwith the ROB Qi−1

8: Compute the error estimator ε(ρj)

9: end for

10: Find ρI= argmax

ρ∈ˆ

ε(ρ)

11: if ε(ρI)≤εtol then

12: Q=Qi−1

13: break

14: end if

15: Solve the HDM for the parameter group ρIand store the result in Vi

16: Construct a snapshot matrix ˆ

Vby concatenating the solutions Vsfor s= 1, ..., i

17: Compute an SVD of the matrix ˆ

Vand construct Qi

18: end for

4.3.1. Drawbacks

The greedy sampling method computes an inexpensive a posteriori error estimator for the reduced-order model.

However, it is not feasible to calculate the error estimator values for the entire parameter space P. An error esti-

mator is based on the norm of the residual which scales with the dimension of the high dimensional model, M.

With an increase in dimension, it is not computationally reasonable to calculate the residual for 10,000 param-

eter groups, i.e., to solve the reduced-order model for the entire parameter space. Hence, the classical greedy

sampling technique chooses the pre-defined parameter set ˆ

Prandomly as a subset of P. Random sampling is

designed to represent the whole parameter space P, but there is no guarantee that ˆ

Pwill reflect the complete

space Psince the random selection of a parameter set may neglect the parameter groups corresponding to the

most significant error. These observations motivate to design a new criterion for the selection of the subset ˆ

Another drawback of the classical greedy sampling technique is that we have to specify the maximum error

4. Numerical Methods

estimator tolerance εtol. The error estimator usually depends on some error bound, which is not tight or it may

not exist. To overcome this drawback, we establish a strategy by constructing a model for an exact error as a

function of the error estimator based on the idea presented in [41]. Furthermore, we use this exact error model

to observe the convergence of the greedy sampling algorithm instead of the error estimator.

4.4. Adaptive Greedy Sampling Method

To avoid the drawbacks associated with the classical greedy sampling technique, we have derived an adaptive

greedy sampling approach which selects the parameter groups adaptively at each iteration of the greedy pro-

cedure, using an optimized search based on surrogate modeling. We construct a surrogate model of the error

estimator ¯εto approximate the error estimator εover the entire parameter space. Further, we use this surrogate

model to locate the parameter groups ˆ

Pk={ρ1, ..., ρCk}with Ck< C, where the values of the error estimator

εare highest. For each parameter group within the parameter set ˆ

Pk, we determine a reduced-order model and

compute the error estimator values. The algorithm builds a new surrogate model based on these error estimator

values, and the process repeats itself until the total number of parameter groups reaches C, resulting in the

desired parameter set ˆ

4.4.1. Surrogate Modeling of the Error Estimator

At each greedy iteration, the algorithm construct a surrogate model of the error estimator to locate the parame-

ters adaptively.

Algorithm 3 Surrogate model using the principal component regression technique.

Input: The response vector ˆε= [ε1, ..., εCk],ˆ

Pk= [ρ1, ..., ρCk]∈RCk×¯m, principal components p

Output: Vector of regression coefficients η

1: Standardize ˆ

Pkand ˆεwith zero mean and variance equals to one

2: Compute the singular value decomposition of the matrix ˆ

Pk:

Pk=ˆ

Φˆ

Σˆ

3: Construct a new matrix Z=ˆ

Pkˆ

Ψp= [ ˆ

Pkˆ

ψ1, ..., ˆ

Pkˆ

ψp]composed of principal components

4: Compute the least square regression using the principal components as independent variables:

Ω = argmin

Ωkˆε−ZΩk2

5: Compute the PCR estimate ηP CR of the regression coefficients η:ηP CR =ˆ

Ψpˆ

Ω

The detailed adaptive greedy sampling algorithm along with its description is presented in the subsection 4.4.2.

The first stage of the adaptive greedy sampling algorithm computes the error estimator over the randomly

selected parameter set ˆ

P0of cardinality C0. Furthermore, the algorithm uses these error estimator values

{ε}C0

i=1 to build a surrogate model ¯ε0and locates the Ckparameter groups corresponding to the Ckmaximum

values of the surrogate model. This process repeats itself for k= 1, ..., K iterations until the total number

of parameter groups reaches C. Finally, the optimal parameter group ρIis the one that maximizes the error

estimator within the parameter set ˆ

P=ˆ

P0∪ˆ

P1∪ˆ

P2∪···∪ ˆ

PK, k = 1, ..., K

Thus, at each kth iteration, we construct a surrogate model ¯εkwhich approximates the error estimator over

the entire parameter space P. There are different choices to build a surrogate model [34]. In this paper, we

use the principal component regression (PCR) technique. Suppose the vector ˆε= (ε1, ..., εCk)∈RCk×1is

the response vector having error estimator values at kth iteration. Since, we have considered parameters band

σas constants, we build a surrogate model with the parameter a(t)only. Let ˆ

Pk= [ρ1, ..., ρCk]∈RCk×m

be the matrix composed of Ckparameter groups at the kth iteration. The rows of the matrix ˆ

Pkrepresent Ck

4. Numerical Methods

parameter vectors, while mcolumns represent mtenor points for the parameter vector a(t). We can fit a simple

multiple regression model as

ˆε=ˆ

Pk·η+err, (49)

where η= (η1, ..., ηm)is an array containing regression coefficients and err is an array of residuals. The least

square estimate of ηis obtained as

ˆη= argmin

ηkˆε−ˆ

Pk·ηk2

2= argmin

ηkˆε−

i=1

ρiηik2

However, if Ckis not much larger than m, then the model might give weak predictions due to the risk of

overfitting for the parameter groups which are not used in model training. Also, if Ckis smaller than m, then

the least square approach cannot produce a unique solution, restricting the use of the simple linear regression

model. We might face this problem during the first few iterations of the adaptive greedy sampling algorithm,

as we will have less error estimator values to build a reasonably accurate model. Hence, to overcome this

drawback, we implement the principal component regression technique. This method is a dimension reduction

technique in which mexplanatory variables are replaced by plinearly uncorrelated variables called principal

components. The dimension reduction is achieved by considering only a few relevant principal components.

The principal component regression approach helps to reduce the problem of estimating mcoefficients to the

more simpler problem of determining pcoefficients. In the following, we illustrate the method to construct a

surrogate model at kth iteration in detail. Before performing a principal component analysis, we center both the

response vector ˆεand the data matrix ˆ

Pk. The principal component regression starts by performing a principal

component analysis of the matrix ˆ

Pk. For this, we compute a singular value decomposition of the matrix ˆ

Pk. A

detailed relation between the singular value decomposition and the principal component analysis is presented

in the Appendix A.

Pk=ˆ

Φˆ

Σˆ

Ψ,

where ˆ

ΣCk×m= diag[ˆ

Σ1, ..., ˆ

Σm]is a diagonal matrix with singular values arranged in the descending order.

Φm×m= [ˆ

φ1, ..., ˆ

φm]and ˆ

Ψm×m= [ ˆ

ψ1, ..., ˆ

ψm]are the matrices containing left and right singular vectors.

The principal components are nothing but the columns of the matrix ˆ

Pkˆ

Ψ. For dimension reduction, we select

only pcolumns of the matrix ˆ

Ψ, which are enough to construct a fairly accurate model. The author of [7]

reported that the first three or four principal components are enough to analyze the yield curve changes. Let

Z=ˆ

Pkˆ

Ψp= [ ˆ

Pkˆ

ψ1, ..., ˆ

Pkˆ

ψp]be the matrix containing first pprincipal components. We regress ˆεon these

principal components as follow

ˆε=ZΩ + err, (50)

where Ω = [ω1, ..., ωp]is the vector containing regression coefficient obtained using principal components.

The least square estimate for Ωis given as

Ω = argmin

Ωkˆε−ZΩk2

2= argmin

ωkˆε−

i=1

ziωik2

We obtain the PCR estimate ηP CR ∈Rmof the regression coefficients ηas

ηP CR =ˆ

Ψpˆ

Ω(51)

Finally, we can obtain the value of the surrogate model for any parameter vector as= (as1, ..., asm)as

¯ε(ρs) = η1as1+···+ηmasm.(52)

4. Numerical Methods

Table 4: List of symbols used in the Algorithm 3

εError estimator.

ˆεResponse vector composed of error estimator values at the kth iteration.

PkParameter set at the kth iteration comprised of Ckparameter groups.

ρGroup of model parameters {a(t), b, σ}.

ΩRegression coefficients obtained using principal component regression technique.

ηFinal regression coefficients used to construct a surrogate model.

4.4.2. Adaptive Greedy Sampling Algorithm

The adaptive greedy sampling algorithm utilizes the designed surrogate model to locate the optimal parameter

groups adaptively at each greedy iteration i= 1, ..., Imax. The first few steps of the algorithm resemble

the classical greedy sampling approach. It selects the first parameter group ρ1from the parameter space P

and computes the reduced-order basis Q1. Furthermore, the algorithm randomly selects C0parameter groups

and construct a temporary parameter set ˆ

P0={ρ1, ..., ρC0}. For each parameter group in the parameter

set ˆ

P0, the algorithm determines a reduced-order model and computes the residual errors {ε(ρj)}C0

j=1. Let

ε0={ε(ρ1), ..., ε(ρC0)}be the array containing the error estimator values obtained for the parameter set

P0. The adaptive parameter sampling starts by constructing a surrogate model for the error estimator ¯εbased

on {ε(ρj)}C0

j=1 error estimator values, as discussed in subsection 4.4.1. The obtained surrogate model is then

solved for the entire parameter space P. We locate Ckparameter groups corresponding to the first Ckmaximum

values of the surrogate model. We then construct a new parameter set ˆ

Pk={ρ1, ..., ρk}composed of these

Ckparameter groups. The algorithm determines a reduced-order model for each parameter group within the

parameter set ˆ

Pkand obtains the analogous error estimator values {ε(ρk)}Ck

k=1. Let εk={ε(ρ1), ..., ε(ρCk)}be

the array containing the error estimator values obtained for the parameter set ˆ

Pk. Furthermore, we concatenate

the set ˆ

Pkand the set ˆ

P0to form a new parameter set ˆ

P=ˆ

Pk∪ˆ

P0. Let Esg =ε0∪···∪εkbe the set composed

of all the error estimator values available at the kth iteration. The algorithm then uses this error estimator set Esg

to build a new surrogate model. The quality of the surrogate model increases with each proceeding iterations

as we get more error estimator values to build a fairly accurate model. This process repeats itself until the

cardinality of the set ˆ

Preaches C,

P=ˆ

P0∪ˆ

P1∪ˆ

P2∪···∪ ˆ

PK, k = 1, ..., K.

Finally, the optimal parameter group ρIis extracted from the parameter set ˆ

P, which maximizes the error esti-

mator (48). In this work, we build a computationally cheap surrogate model based on the principal component

regression. Note that typically it is not necessary to obtain a very accurate sampling using the designed surro-

gate model. Sampling the high dimensional model in the neighborhood of the parameter group with maximum

error is acceptable enough to obtain good results.

In the classical greedy sampling approach, we use the residual error εto observe the convergence of the algo-

rithm, which corresponds to the exact error between the high dimensional model and the reduced-order model

(47). However, in the adaptive greedy POD algorithm, we use an approximate model ¯efor an exact error e(., .)

as a function of the error estimator. To build an approximate error model, we need to solve one high dimensional

model at each greedy iteration. The algorithm solves the high dimensional model for the optimal parameter

group ρIand updates the snapshot matrix ˆ

V. A new reduced-order basis Qis then obtained by computing the

truncated singular value decomposition of the updated snapshot matrix as explained in subsection 4.2. Fur-

thermore, we solve the reduced-order model for the optimal parameter group before and after updating the

reduced-order basis and obtain the respective error estimator values εbf (ρI), and εaf (ρI). Here superscript bf

and af denote the before and after updating the reduced-order basis. Now, we obtain the relative errors ebf , eaf

4. Numerical Methods

between the high dimensional model and the reduced-order models constructed before and after updating the

reduced-order basis. In this way, at each greedy iteration, we get a set of error values Epthat we can use to

construct an approximate error model for an exact error ebased on the error estimator ε.

Ep={(ebf

1,εbf

1)∪(eaf

1,εaf

1), ..., (ebf

i,εbf

i)∪(eaf

i,εaf

i)}.(53)

We construct a linear model for an exact error based on the error estimator as follows

log(¯ei) = γilog(ε) + logτ. (54)

Setting Y= log(¯e),X= log(ε)and ˆτ= log(τ). We get

Y=γX+ ˆτ,

where γis the slope of the linear model and ˆτis the intersection with the logarithmic axis log(y). After each

greedy iteration, we get more data points in the error set Ep, which increases the accuracy of the error model.

In subsection 5.3, we validate with the obtained results that in our case the linear model is sufficient to achieve

an accurate error model for the exact error. Algorithm 4 describes the adaptive sampling method based on

Table 5: List of symbols used in the Algorithm 4

Imax Maximum number of greedy iterations.

CMaximum parameter groups selected to obtain a reduced-order basis.

PParameter space.

ρGroup of model parameters {a(t), b, σ}.

CkNumber of parameters selected adaptively based on surrogate modeling.

VSolution obtained by solving a high dimensional model.

QReduced-order basis (ROB)

C0Number of randomly selected parameter groups to initiate the algorithm.

εError estimator.

εkA set comprised of error estimator values at the kth iteration.

Esg A set composed of all the error estimator values at the kth iteration.

¯εSurrogate model.

PA parameter set used to obtain the optimal parameter group.

ρIOptimal parameter group which maximizes the error estimator.

HDM High dimensional model.

ROM Reduced-order model.

eRelative error between a ROM and a HDM.

VSolution obtained using a reduced-order model.

af, bf Superscripts used to denote before and after updating the ROB.

VSnapshot matrix.

EpError set.

¯eError model: an approximate error for an exact error e.

emax

tol Tolerance for the relative error, greedy iterations terminates if ¯e<emax

tol .

surrogate modeling in lines 12 to 24. The algorithm for error modeling is described in lines 30 to 38.

4. Numerical Methods

Algorithm 4 The adaptive greedy sampling algorithm

Input: Maximum number of iterations Imax, maximum parameter groups C, number of adaptive candidates

Ck, Parameter space P, tolerance emax

tol

Output: Q

1: Choose first parameter group ρ1= [(a11, ..., a1m), b, σ]from P

2: Solve the HDM for parameter group ρ1and store the results in V1

3: Compute a truncated SVD of the matrix V1and construct Q1

4: for i=2toImax do

5: Randomly select a set of parameter groups ˆ

P0={ρ1, ρ2, ..., ρC0} ⊂ P

6: for j= 1 to C0do

7: Solve a ROM for the parameter group ρjwith the ROB Qi−1

8: Compute the error estimator ε(ρj)

9: end for

10: Let ε0={ε(ρ1), ..., ε(ρC0)}be the error estimator values obtained for parameter set ˆ

11: set k= 1 and Esg =ε0

12: while n(ˆ

P)< C do

13: Construct a surrogate model ¯

ε(ρ)using the values Esg

14: Compute the values of the surrogate model over P:¯

ε(ρ)for all ρ∈P

15: Determine the first Ckmaximum values of ¯

ε(ρ)and corresponding parameter groups ˆ

Pk=

{ρ1, ..., ρk}

16: for x= 1 to n(ˆ

Pk)do

17: Solve a ROM for the parameter group ρxwith the ROB Qi−1

18: Compute the error estimator ε(ρx)

19: end for

20: Let εk={ε(ρ1), ..., ε(ρCk)}be the error estimator values obtained for parameter set ˆ

21: Update Esg as Esg ={ε0∪···∪εk}

22: Construct a new parameter set ˆ

P=ˆ

P0∪ˆ

Pkwith k= 1,2, . . .

23: k=k+ 1

24: end while

25: Find ρI= argmax

ρ∈ˆ

ε(ρ)

26: if i > 2and ¯ei≤emax

tol then

27: Q=Qi−1

28: break

29: end if

30: Solve the HDM for the parameter group ρIand store the result in Vi

31: Solve the ROM for the parameter group ρIusing Qi−1and store the result in ¯

32: Compute the relative error ebf

iand the error estimator εbf

iusing the ROM obtained with Qi−1(Before

updating the ROB). ebf

i=kVi(ρI)−¯

Vi(ρI)k/kVi(ρI)k

33: Construct a snapshot matrix ˆ

Vby concatenating the solutions Vsfor s= 1, ..., i

34: Compute an SVD of the matrix ˆ

Vand construct Qi

35: Solve the ROM for parameter group ρIusing Qiand store the result in ¯

Vi+1

36: Compute the relative error eaf

iand the error estimator εaf

iusing the ROM obtained with Qi(after up-

dating the ROB). eaf

i=kVi(ρI)−¯

Vi+1(ρI)k/kVi(ρI)k

37: Construct a error set Ep={(ebf

1,εbf

1)∪(eaf

1,εaf

1), ..., (ebf

i,εbf

i)∪(eaf

i,εaf

i)}

38: Construct an approximate model for an exact error ¯eusing error set Ep:log(¯ei) = γilog(ε) + logτ

39: end for

5. Numerical Example

A numerical example of a floater with cap and floor [20] is used to test the developed algorithms and methods.

We solved the floater instrument using the Hull-White model. We obtained the high dimensional model by

discretizing the Hull-White PDE as discussed in subsection 4.1.3 and compared the results with the reduced-

order model. The reduced-order model is generated by implementing the proper orthogonal decomposition

method along with the classical and the adaptive greedy sampling techniques. The characteristics of the floater

instrument are as shown in Table 6.

Table 6: Numerical Example of a floater with cap and floor.

Coupon frequency quarterly

Cap rate, CR2.25 % p.a.

Floor rate, FR0.5 % p.a.

Currency EURO

Maturity 10 years

Nominal amount 1.0

b0.015

σ0.006

The interest rates are capped at cR= 2.25% p.a. and floored at cF= 0.5% p.a. with the reference rate as

Euribor3M. The coupon rates can be written as

c= min(2.25%,max(0.5%,Euribor3M)) (55)

Note that, the coupon rate c(n)at time tnis set in advanced by the coupon rate at tn−1. All computations

are carried out on a PC with 4 cores and 8 logical processors at 2.90 GHz (Intel i7 7th generation). We used

MATLAB R2018a for the yield curve simulations. The numerical method for the yield curve simulations is

tested with real market based historical data. We have collected the daily interest rate data at 26 tenor points

in time over the past five years. Each year has 260 working days. Thus, there are 1300 observation periods.

We have retrieved this data from the State-of-the-art stock exchange information system, ”Thomson Reuters

EIKON [48]”. We have used the inbuilt UnRisk tool for the parameter calibration, which is well integrated with

Mathematica (version used: Mathematica 11.3). Further, we used calibrated parameters for the construction of

a Hull-White model. We have designed the finite difference method and the model order reduction approach

for the solution of the Hull-White model in MATLAB R2018a.

5.1. Model Parameters

We computed the model parameters as explained in subsection 3.2. The yield curve simulation is the first

step to compute the model parameters. Based on the procedure described in subsection 3.1, we performed the

bootstrapping process for the recommended holding period of 10 years, i.e., for the maturity of the floater. The

collected historical data has 19 tenor points and 1306 observation periods as follows (D: Day, M: Month, Y:

Year):

m=: {1D, 1Y, 2Y, 3Y, ··· ,10Y, 12Y, 15Y, 20Y, 25Y, 30Y, 40Y, 50Y}

n=: {1306 daily interest rates at each tenor point}

The ten thousand simulated yield curves in 10 years in the future are presented in Fig. 3. For the floater

example, we need parameter values only until the 10Y tenor point (maturity of the floater). Henceforth, we

consider the simulated yield curves with only the first 11 tenor points. The calibration generates the real

parameter space of dimension R10000×11 for the parameter a(t). We considered the constant volatility σand

5. Numerical Example

Figure 3: 10,000 simulated yield curves obtained by bootstrapping for 10 years in future.

the constant mean reversion bof the short-rate requal to 0.006 and 0.015, respectively. All parameters are

assumed to be piecewise constants between the tenor points (0 −1Y, 1Y−2Y, 2Y−3Y, ··· ,9Y−10Y).

Figure 4 shows 10,000 different piecewise constant parameters a(t).

Figure 4: 10,000 parameter vectors a(t)as a piecewise function of time.

5.2. Finite Difference Method

The computational domain for a spatial dimension ris restricted to r∈[u, v]as described in subsection 4.1.3.

Here, u=−0.1and v= 0.1. We applied homogeneous Neumann boundary conditions of the form

∂V

∂r |r=u= 0,∂V

∂r |r=v= 0.(56)

5. Numerical Example

We divided the spatial domain into M= 600 equidistant grid points which generate a set of points

{r1, r2, . . . , rM}. The time interval [0, T]is divided into N−1time points. Npoints in time that are measured

in days starting from t= 0 untill the maturity T, i.e., in our case, the number of days until maturity are assumed

to be 3600 with an interval ∆t= 1 (10 years ≈3600 days). Rewriting (39), we obtain

A(ρs(t))Vn+1 =B(ρs(t))Vn, V (0) = V0.

We can apply the first boundary condition in (56) by updating the first and the last rows (A1and AM) of the

matrix A(ρs). Using the finite difference approach, the discretization of (56) yields

A1= (−1,1,0,...,0) and AM= (0,...,0,1,−1).

The second Neumann boundary condition can be applied by changing the last entry of the vector BV nto zero.

Starting at t= 0 with the known initial condition V(0) as the principal amount, at each time step, we solve the

system of linear equations (39). Note that, we need to update the value of the grid point rievery three months

as the coupon frequency is quarterly by adding coupon fnbased on the coupon rate given by (55).

5.3. Model Order Reduction

We have implemented the parametric model order reduction approach for the floater example, as discussed

in subsection 4.2. The quality of the reduced-order model depends on the parameter groups selected for the

construction of the reduced-order basis Q. The reduced-order basis is obtained using both classical and adaptive

greedy sampling algorithms.

Classical Greedy Sampling Approach

At each iteration of the classical greedy sampling approach, the algorithm constructs a reduced-order basis as

presented in the Algorithm 1. We have specified a maximum number of pre-defined candidates to construct a

set ˆ

Pto 40 and a maximum number of iteration Imax to 10.

123456789

10-4

10-3

10-2

10-1

100

Figure 5: Evolution of maximum and average residuals with each iteration of the classical greedy algorithm.

The progression of the maximum and average residuals with each iteration of the greedy algorithm is presented

in Fig. 5. It is observed that the maximum residual error predominantly decreases with increasing iterations.

5. Numerical Example

123456789

10-4

10-3

10-2

10-1

100

Figure 6: Evolution of the maximum residual error for three different cardinalities of set ˆ

Thus, we can say that the proposed greedy algorithm efficiently locates the optimal parameter groups and

constructs the desired reduced-order basis Q. Furthermore, we tested the effect of change in the cardinalities

of the set ˆ

P. The proposed algorithm is applied with three different cardinalities of ˆ

P:|ˆ

P1|= 20,|ˆ

P2|=

30,|ˆ

P3|= 40. Note that we have constructed ˆ

Pby randomly selecting the parameter groups from the parameter

space P. Figure 6 shows the plot of the maximum residual against the number of iterations for three different

cardinalities. It is evident that with an increasing number of candidates, the maximum residual error decreases.

However, the decrement is not significant enough with an increase in the cardinality of ˆ

Peven by 20. Thus, we

can say that 20 randomly selected parameter groups are enough to obtain the reduced-order basis Q.

Drawbacks of the classical greedy approach

5 10 15 20 25 30 35 40 45 50

10-5

10-4

10-3

10-2

10-1

Figure 7: The relative error between the HDM and the ROM for two different parameter groups.

5. Numerical Example

We noticed that there are some parameter groups (e.g., ρ238) for which the reduced-order model gives unsatis-

factory results. Figure 7 illustrates the relative error between the high dimensional model and the reduced-order

model for two different parameter groups. One can observe that the reduced-order model built for the parameter

group ρ238 (dashed line) shows inferior results as compared to the reduced-order model for the parameter group

ρ786 (solid line). Even an increase in the reduced dimension ddoes not improve the quality of the result sub-

stantially. This remark reveals that the selection of trial candidates by random sampling neglects the parameter

groups corresponding to the significant error.

Adaptive Greedy Sampling Approach

To overcome the drawbacks associated with the classical greedy algorithm, we have implemented the adaptive

greedy sampling approach for the floater example. At each greedy iteration, the algorithm locates Ck= 10

123456789

10-6

10-5

10-4

10-3

10-2

10-1

Figure 8: Evolution of maximum and average residuals with each iteration of the adaptive greedy algorithm.

parameter groups adaptively using the surrogate modeling technique, as described in subsection 4.4. We have

fixed the maximum number of elements within the parameter set ˆ

Pto 40. Furthermore, the adaptively obtained

parameter set ˆ

Phas been used to locate the optimal parameter group ρI. These steps are repeated for a maxi-

mum of Imax = 10 iterations or until the convergence. The algorithm has been initiated by selecting C0= 20

random parameter groups.

The optimal parameter group updates the snapshot matrix, and consecutively the algorithm generates a new

reduced-order basis at each greedy iteration. Figure 8 shows the evolution of maximum and average residual

errors with each iteration of the adaptive greedy algorithm. The residual error decreases with each incrementing

iteration and hence the algorithm succeeded in locating the optimal parameter group efficiently. To monitor the

convergence of the adaptive greedy algorithm, we have designed an error model ¯efor the relative error eas a

function of the residual error ε. Figure 9 shows the designed error model based on the available error set Epfor

four different greedy iterations. The error plot exhibits a strong correlation between the relative error and the

residual error. The results indicate that a consideration of the linear error model is satisfactory to capture the

overall behavior of the exact error as a function of the residual error.

We used the reduced-order basis obtained from the adaptive greedy sampling procedure to design the reduced-

order model. Figure 10 presents the relative error plot for the parameter groups ρ238,and ρ786. We see that

the adaptive greedy approach gives better results as compared to the classical greedy method. With a reduced

dimension of d= 6, we obtained an excellent result as the relative error is less than 10−3.

5. Numerical Example

10-8 10-6 10-4 10-2

10-6

10-5

10-4

10-3

10-2

10-1

(a) Iteration no. 2.

10-8 10-6 10-4 10-2

10-6

10-5

10-4

10-3

10-2

10-1

(b) Iteration no. 4.

10-8 10-6 10-4 10-2

10-6

10-5

10-4

10-3

10-2

10-1

10-8 10-6 10-4 10-2

10-6

10-5

10-4

10-3

10-2

10-1

(d) Iteration no. 8.

Figure 9: The error model ¯ebased on the available error set Epfor four different greedy iterations.

0 10 20 30 40 50 60

10-7

10-6

10-5

10-4

10-3

10-2

10-1

Figure 10: Comparison of the classical greedy approach vs adaptive greedy approach (CG - Classical greedy

sampling and AG - Adaptive greedy sampling).

5.4. Computational cost

In the case of the classical greedy sampling approach, the algorithm solves Creduced-order models and one

high dimensional model at each greedy iteration. It also computes a truncated SVD of the updated snapshot

matrix with each proceeding iteration. Let tROM be the time required to solve one reduced-order model, tHDM

5. Numerical Example

be the computational time required for one high dimensional model, and tSVD be the time required to obtain a

truncated SVD of the snapshot matrix. The total computational time TCG

Qrequired to obtain the reduced-order

basis in the case of the classical greedy sampling approach can be given as

TCG

Q≈C×tROM + (tHDM +tSVD)×i.

Similarly, in the case of the adaptive greedy approach, the total computational time TAG

Qcan be given as

TAG

Q≈C0×tROM +k(Ck×tROM +tSM +tev

SM)

| {z }

tρI

+tHDM +tSVD + 2taf,bf

ROM +tEM×i,

where tSM and tev

SM denote the computational times required to build and evaluate a surrogate model for the

entire parameter space respectively. tEM is the time required to build an error model. The term 2taf,bf

ROM shows

the computational time needed to solve the reduced-order model after and before updating the ROB.

Table 7 compares the computational times required to generate the reduced-order basis Qfor different sets of

Pin case of the classical greedy sampling approach with the computational time needed to obtain the reduced-

order basis in case of the adaptive greedy sampling approach.

Table 7: Computation time/ reduction time (TQ) to generate projection subspace.

Algorithm Cardinality |ˆ

P|Maximum iterations Imax Computational time

Classical greedy sampling 20 10 56.82 s

Classical greedy sampling 30 10 82.54 s

Classical greedy sampling 40 10 95.04 s

Adaptive greedy sampling 40 10 183.21 s

Table 8: Evaluation time.

Algorithm Model Eva. time

single ρs

Total Eva.

time (Teva)

Total time

TQ+Teva

HDM, M= 600 0.2193 s 2193.72 s 2193.72 s

Classical greedy sampling ROM, d= 5 0.0102 s 102.56 s 197.60 s

Classical greedy sampling ROM, d= 10 0.0125 s 125.48 s 220.52 s

Adaptive greedy sampling ROM, d= 6 0.0104 s 104.38 s 287.59 s

Adaptive greedy sampling ROM, d= 10 0.124 s 124.32 s 307.53 s

The computational time tρIrequired to locate the optimal parameter group by constructing a surrogate model

for one greedy iteration is approximately 8 seconds. tρIis nothing but the time required to complete a while

loop outlined in the algorithm 4 for a single greedy iteration. Thus, the total time contributed to generate the

reduced-order basis via surrogate modeling is Imax ×tρI= 78.56 s, considering the adaptive greedy algorithm

runs for Imax iterations. Nonetheless, Fig. 10 shows that we can truncate the algorithm after 4 or 5 iterations

as the residual error fall below 10−4.

5. Numerical Example

The computational times required to simulate reduced-order models and high dimensional models is presented

in Table 8. The evaluation columns give the time needed to solve the systems generated for both high dimen-

sional models and reduced-order models. The time required to solve the complete system with a parameter

space of 10000 ×mfor both a high dimensional model and a reduced-order model is given in the total time

column. We can see that the evaluation time required for the reduced-order model is at least 18-20 times less

than that of the high dimensional model. However, there is a slight increase in total time due to the addition

of reduction time TQ. One can also observe that with an increase in the dimension dof the reduced-order

model, the evaluation time increases significantly. The reduced-order model with the classical greedy sampling

approach is at least 10-11 times faster than the high dimensional model. The time required to simulate the

reduced-order model with the adaptive greedy sampling approach is a bit higher than its counterpart due to the

time invested in building surrogate and error models. Despite of that, the reduced-order model is at least 8-9

times faster than the high dimensional model. The computational time presented in both tables considers that

the greedy algorithms run for maximum iterations Imax. However, we can truncate the algorithms after 4 or 5

iterations, i.e., we can practically achieve even more speedup than commented here.

5.4.1. Floater Scenario Values

To design a key information document, we need the values of the floater at different spot rates. The spot rate

rsp is the yield rate at the first tenor point from the simulated yield curve. The value of a floater at the spot

rate rsp is nothing but the value at that short rate r=rsp. For 10,000 simulated yield curves, we get 10,000

different spot rates and the corresponding values for the floater. These several thousand values are further used

to calculate three different scenarios: (i) favorable scenario, (ii) moderate scenario, (iii) unfavorable scenario.

The favorable, moderate, and unfavorable scenario values are the values at 90th percentile, 50th percentile and

10th percentile of 10,000 values respectively.

Table 9: Results for a floater with cap and floor.

Scenario 5 years 10 years

Favorable (90th percentile) 1.0759 1.0829

Moderate (50th percentile) 1.0578 1.0615

Unfavorable (10th percentile) 1.0183 1.0254

6. Conclusion

This paper presents a parametric model order reduction approach for a high dimensional convection-diffusion-

reaction PDE that arises in the analysis of financial risk. The high dimensional parameter space with time-

dependent parameters is generated via the calibration of financial models based on market structures. A finite

difference method has been implemented to solve such a high dimensional PDE. The selection of parameters

to obtain the reduced-order basis is of utmost importance. We have established a greedy approach for param-

eter sampling, and we noticed that there are some parameter groups for which the classical greedy sampling

approach gave unsatisfactory results. To overcome these drawbacks, we applied an adaptive greedy sampling

method using a surrogate model for the error estimator that is constructed for the entire parameter space and

further used to locate the parameters which most likely maximizes the error estimator. The surrogate model is

built using the principal component regression technique.

We tested the designed algorithms for a numerical example of a floater with cap and floor solved using the

Hull-White model. The results indicate the computational advantages of the parametric model order reduction

technique for the short-rate models. A reduced-order model of dimension d= 6 was enough to reach an ac-

curacy of 0.01%. The reduced-order model was at least 10-12 times faster than that of the high dimensional

model. The developed model order reduction approach shows potential applications in the historical or Monte

Carlo value at risk calculations as well, where a large number of simulations need to be performed for the

underlying instrument.

References

[1] M. Aichinger and A. Binder. A Workout in Computational Finance. John Wiley and Sons Inc., West

Sussex, UK, 1 edition, 2013.

[2] H. Albrecher, A. Binder, V. Lautscham, and P. Mayer. Introduction to Quantitative Methods for Financial

Markets. Springer-Verlag, Berlin, 1 edition, 2010.

[3] D. Amsallem, M. Zahr, and Y. Choi. Design optimization using hyper-reduced-order models. Struct.

Multidisc. Optim., 51(4):919–940, 2015.

[4] D. Anderson, J.C. Tannehill, and R.H. Pletcher. Computational Fluid Mechanics and Heat Transfer. CRC

Press, London, 3 edition, 2013.

[5] G. Berkooz, P. Holmes, and J.L. Lumley. The proper orthogonal decomposition in the analysis of turbulent

flows. Annu. Rev. Fluid Mech., 25(1):539–575, 1993.

[6] J. Bicksler and A. Chen. An economic analysis of interest rate swaps. J. Finance, 3:645–655, 1986.

[7] A. Binder. A clever handful is enough. Wilmott, pages 10–14, 2007.

[8] M. Briani, L. Caramellino, and A. Zanette. A hybrid tree/finite-difference approach for

Heston–Hull–White-type models. J. Comput. Finance, 21(3), 2017.

[9] D. Brigo and F. Mercurio. Interest Rate Models - Theory and Practice. Springer-Verlag, Berlin, 2 edition,

2006.

[10] T. Bui-Thanh, K. Willcox, and O. Ghattas. Model reduction for large-scale systems with high-dimensional

parametric input space. SIAM J. Sci. Comput., 30(6):3270–3288, 2008.

[11] A. Chatterjee. An introduction to the proper orthogonal decomposition. Curr. Sci., 78:808–817, 2000.

[12] A. Cohen and R. DeVore. Approximation of high-dimensional parametric PDEs. Acta Numerica, 24:1–

159, 2015.

[13] A. Corhay and A.T. Rad. Statistical properties of daily returns: Evidence from european stock markets.

J. Bus. Finan. Account., 21(2):271–282, 1994.

[14] J.C. Cox, J.E. Ingersoll Jr., and S.A. Ross. A theory of the term structure of interest rates. Econometrica,

53(2):385–407, 1985.

[15] G.A. Darbellay. A note on the volatility term structure in short rate models. J. Appl. Math. Mech.,

78:885–886, 1998.

[16] R.M. Dudley. Unifrom Central Limit Theorems. Cambridge University Press, New York, NY, US, 1

edition, 2010.

[17] E. Ekstr¨

om, P. L¨

otstedt, and J. Tysk. Boundary values and finite difference methods for the single factor

term structure equation. J. Appl. Math. Finance, 16:253–259, 2009.

[18] H. Engl. Calibration problems–an inverse problems view. Wilmott, pages 16–20, 2007.

[19] European Commission. Commission delegated regulation (EU) 2017/653 OJ L 100. Off. J. EU, 1:1–52,

2017.

[20] F. Fabozzi. Valuation of Fixed Income Securities and Derivatives. John Wiley and Sons Inc., 3 edition,

1998.

[21] A. Falc´

o, L. Navarro, and C.V. Cend´

on. Finite difference methods for hull-white pricing of interest rate

derivatives with dynamical consistent curves. Soc. Sci. Res. Net. Elec. J., 2014.

[22] G.H. Golub and C. Reinsch. Singular value decomposition and least squares solutions. Numer. Math.,

24:403–420, 1970.

[23] M.D. Graham and I.G. Keverekidis. Alternative approaches to the K arhunen-lo´

eve decomposition for

model reduction and data analysis. Comput. Chem. Eng., 20(5):495–506, 1996.

References

[24] M.D. Graham and I.G. Keverekidis. Proper orthogonal decomposition and its applications part I: Theory.

J. Sound Vib., 252(3):527–544, 2002.

[25] M. Grepl and A. Patera. A posteriori error bounds for reduced-basis approximations of parametrized

parabolic partial differential equations. M2AN, Math. Model. Numer. Anal., 39:157–181, 2005.

[26] M. Grigoriu. Stochastic Calculus: Applications in Science and Engineering. Birkh¨

auser, Basel, 1 edition,

2002.

[27] A. Gupta and M.G. Subrahmanyam. Pricing and hedging interest rate options: Evidence from cap-floor

markets. J. Bank. Finance, 29:701–733, 2005.

[28] J. Hadamard. Sur les probl´

emes aux d´

eriv´

ees partielles et leur signification physique. Princeton University

Bulletin, 13:49–52, 1902.

[29] T. Haentjens and K. In’T Hout. Alternating direction implicit finite difference schemes for the Heston-

Hull-White partial differential equation. J. Comput. Finance, 16:83–110, 2012.

[30] J.C. Heinrich, P.S. Huyakorn, O.C. Zienkiewicz, and A.R. Mitchell. An ’upwind’ finite element scheme

for two-dimensional convective transport equation. Int. J. Numer. Meth. Engng., 11:131–143, 1977.

[31] J. Hull and A. White. Pricing interest-rate-derivative securities. Rev. Financial Stud., 3(4):573–592, 1990.

[32] J. Hull and A. White. One-factor interest-rate models and the valuation of interest-rate derivative securi-

ties. J. Financ. Quant. Anal., 28(2):235–254, 1993.

[33] J. Hull and A. White. Numerical procedures for implementing term structure models I: single-factor

models. J. Deriv., 2:7–16, 1994.

[34] G. James, D. Witten, T. Hastie, and R. Tibshirani. An Introduction to Statistical Learning. Springer-

Verlag, New York, NY, 1 edition, 2013.

[35] I. Jolliffe. Principal Component Analysis. Springer-Verlag, Berlin, 1 edition, 2014.

[36] D. Jones. A taxonomy of global optimization methods based on response surfaces. J. Global Optim.,

21(4):345–383, 2001.

[37] H. Lee, Y. Park, and S. Lee. Principal component regression by principal component selection. Commun.

Stat. Appl. Methods, 22(2):173–180, 2015.

[38] J.D. Lucia, P.S. Beran, and W.A. Silva. Reduced order modeling: new approaches for computational

physics. Prog. Aerosp. Sci., 40(1-2):51–117, 2004.

[39] R. Mahnke, J. Kaupuzs, and I. Lubashevsky. Physics of Stochastic Processes: How Randomness Acts in

Time. WILEY-VCH Verlag GmbH and Co., Weinheim, 1 edition, 2008.

[40] MathConsult. Calibration of interest rate models. Report, MathConsult GmbH, Linz, Austria, 2009.

[41] A. Paul-Dubois-Taine and D. Amsallem. An adaptive and efficient greedy procedure for the optimal

training of parametric reduced-order models. Int. J. Numer. Meth. Engng., 102:1262–1292, 2014.

[42] R. Pinnau. Model reduction via proper orthogonal decomposition. In W. Schilders, H. van der Vorst, and

J. Rommes, editors, Model Order Reduction: Theory, Research Aspects and Applications, pages 95–109,

Berlin, Heidelberg, 2008. Springer-Verlag.

[43] I. Piotr and R. Motwani. Approximate nearest neighbors: Towards removing the curse of dimensionality.

In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, pages 604–613, New

York, NY, USA, 1998. ACM Press.

[44] M. Rathinam and L.R. Petzold. A new look at proper orthogonal decomposition. SIAM J. Numer. Anal.,

41(5):1893–1925, 2003.

[45] S.E. Shreve. Stochastic Calculus and Finance. Springer-Verlag, New York, US, 1 edition, 2004.

[46] L. Sirovich. Turbulence and the dynamics of coherent structures. Part I: coherent structures. Quart. Appl.

Math., 45(3):561–571, 1987.

References

[47] G. Sun and C.W. Trueman. Efficient implementations of the crank-nicolson scheme for the finite-

difference time-domain method. IEEE Trans. Microw. Theory Tech., 11:131–143, 1977.

[48] Thomson Reuters Eikon. Interest rate data, December 2018. Retrieved from

eikon.thomsonreuters.com.

[49] O. Vasicek. An equilibrium characterization of the term structure. J. Financial Econ., 5(2):177–188,

1977.

[50] K. Veroy and A. Patera. Certified real-time solution of the parametrized steady incompressible

Navier–Stokes equations: rigorous reduced-basis a posteriori error bounds. Int. J. Numer. Meth. Fluids,

47:773–788, 2005.

[51] A. Vidal and D. Sakrison. On the optimality of the Karhunen-Lo´

eve expansion (corresp.). IEEE Trans.

info. theory, 15(2):319–321, 1969.

[52] M.O. Williams, P.J. Schmid, and J.N. Kutz. Hybrid reduced-order integration with proper orthogonal

decomposition and dynamic mode decomposition. SIAM J. Multiscale Model. Simul., 11(2):522–544,

2013.

[53] P. Wilmott and S. Howson. The Mathematics of Financial Derivatives: A Student Introduction. Cambridge

University Press, London, 1 edition, 2002.

Glossary

Packaged retail investment and insurance-based products Packaged retail investment and

insurance-based products are a broad class of financial instruments that are provided to customers

in the EU through banks or financial institutions, which include stocks, bonds, insurance policies,

structured funds, structured deposits, and structured products. 1

key information document A key information document is a 3-page document provided to the customers

by financial institutions for the invested product. It includes the risk and reward profile of the product,

the cost of the product, the recommended holding period, possible outcomes, etc., so that investors can

understand the product easily.. 1

yield curve The yield curve is the curve showing interest rates plotted against different maturities for the

same financial instrument. 1

cap An upper limit on the interest rate for a floating interest rate instrument. 1

swaption a. Swap: A swap is an agreement between two parties to exchange financial instruments for a

certain time.

b. Swaption: A swaption is an option to enter into a swap. 1

tenor Tenor refers to the amount of time left until the financial instrument expires. The maturity time points

on a yield curve are also known as tenor points. 1

bank account A bank account is an interest-bearing account at a bank to earn interest on the invested amount.

short-rate Short rate is the rate at which bank account grows. 3

stock Stock is a type of security that implies proportionate ownership in the issuing corporation. 3

return The amount of money returned to an account holder is typically referred to as the return. 3

risk-neutral Risk neutral term describes the investment which is insensitive to risk. 4

portfolio In finance, a portfolio is a collection of investments held by an investment company, a hedge fund,

a financial institution or an individual. 4

arbitrage Arbitrage is the simultaneous purchase and sale of an instrument to benefit from an imbalance in

the price. 4

drift The drift is the change of average value of a process. 6

observation period One observation period is one working day at which the statistical data has been col-

lected. 7

recommended holding period The recommended holding period gives an idea to an investor that for how

long should an investor hold the product to minimize the risk. 8

A. Relation between a singular value decomposition and a principal component analysis

Consider a data matrix Xof size n×m, where nis the number of samples and mis the number of variables.

Let us also assume that the data matrix Xis centered. We can decompose the matrix Xusing singular value

decomposition (SVD) as

X= ΦΣΨT,(57)

where the matrices Φand Ψcontain left and right singular vectors of the matrix Xrespectively. The matrix

Σis a diagonal matrix having singular values Σiarranged in descending order along the diagonal. Consider a

covariance matrix Cof order m×mas follows:

C=XTX. (58)

We now compute an eigendecomposition, also known as spectral decomposition of the covariance matrix as

C=XTX= Ψ¯

ΣΨT,(59)

where Ψis the matrix of eigenvectors and ¯

Σis a diagonal matrix having eigenvalues λiarranged in descending

order along the diagonal. In the principal component analysis (PCA), the columns of a matrix XΨare known as

the principal components, while the columns of the matrix Ψare known as principal directions or PCA loading.

Furthermore, one can easily see the similarities between the SVD and the PCA. We can write a covariance

matrix as follows

XTX= (ΦΣΨT)TΦΣΨT,

XTX= ΨΣΦTΦΣΨT,

XTX= ΨΣ2ΨT,

(60)

where ΦTΦ = I. Now we can see that the right singular vectors of the matrix Xare simply the eigenvectors of

the matrix C, and the singular values of Xare related to the eigenvalues of Cas λi= Σ2