Document [original]

An end-to-end-trainable iterative network architecture for accelerated radial

multi-coil 2D cine MR image reconstruction

Andreas Koﬂer

Physikalisch-Technische Bundesanstalt (PTB), Braunschweig, Berlin 10587, Germany

Markus Haltmeier

Department of Mathematics, University of Innsbruck, Innsbruck 6020, Austria

Tobias Schaeffter

Physikalisch-Technische Bundesanstalt (PTB), Braunschweig, Berlin 10587, Germany

School of Imaging Sciences and Biomedical Engineering, King’s College London, London SE1 7EH, UK

Department of Biomedical Engineering, Technical University of Berlin, Berlin 10623, Germany

Christoph Kolbitsch

Physikalisch-Technische Bundesanstalt (PTB), Braunschweig, Berlin 10587, Germany

School of Imaging Sciences and Biomedical Engineering, King’s College London, London SE1 7EH, UK

(Received 24 December 2020; revised 11 February 2021; accepted for publication 18 February 2021;

published 1 April 2021)

Purpose: Iterative convolutional neural networks (CNNs) which resemble unrolled learned iterative

schemes have shown to consistently deliver state-of-the-art results for image reconstruction problems

across different imaging modalities. However, because these methods include the forward model in

the architecture, their applicability is often restricted to either relatively small reconstruction prob-

lems or to problems with operators which are computationally cheap to compute. As a consequence,

they have not been applied to dynamic non-Cartesian multi-coil reconstruction problems so far.

Methods: In this work, we propose a CNN architecture for image reconstruction of accelerated 2D

radial cine MRI with multiple receiver coils. The network is based on a computationally light CNN

component and a subsequent conjugate gradient (CG) method which can be jointly trained end-to-

end using an efficient training strategy. We investigate the proposed training strategy and compare

our method with other well-known reconstruction techniques with learned and non-learned regular-

ization methods.

Results: Our proposed method outperforms all other methods based on non-learned regularization.

Further, it performs similar or better than a CNN-based method employing a 3D U-Net and a method

using adaptive dictionary learning. In addition, we empirically demonstrate that even by training the

network with only iteration, it is possible to increase the length of the network at test time and further

improve the results.

Conclusions: End-to-end training allows to highly reduce the number of trainable parameters of and

stabilize the reconstruction network. Further, because it is possible to change the length of the net-

work at the test time, the need to find a compromise between the complexity of the CNN-block and

published by Wiley Periodicals LLC on behalf of American Association of Physicists in Medicine.

[https://doi.org/10.1002/mp.14809]

Key words: deep learning, inverse problems, magnetic resonance imaging, neural networks

1. INTRODUCTION

Magnetic resonance imaging (MRI) is an important tool for

the assessment of different cardiovascular diseases. Cardiac

cine MRI, for example, is used to assess the cardiac function

as well as left and right ventricular volumes and left ventricu-

lar mass.

However, MRI is also known to suffer from rela-

tively long data acquisition times. In cardiac cine MRI, the

data acquisition usually has to take place during a single

breathhold of the patient in order to avoid artifacts arising

from the respiratory motion. Since for patients with limited

breathhold capabilities this can be challenging, undersam-

pling techniques can be used to accelerate the measurement

process. Undersampling in Fourier domain, the so-called k-

space, leads to a violation of the Nyquist–Shannon sampling

theorem and therefore, regularization techniques must be

used to reconstruct artifact-free images.

Recently, image reconstruction has experienced a para-

digm shift with the re-emergence of neural networks (NNs),

and in particular, convolutional NNs (CNNs) which can be

employed as regularization methods.

CNNs can be used in

different ways as regularization methods for image recon-

struction problems. A key element to categorize these meth-

ods is whether the forward operator is used in the learning

process.

A straightforward approach is to simply post-

process an initial estimate of the solution which is impaired

2412 Med. Phys. 48 (5), May 2021 0094-2405/2021/48(5)/2412/14

behalf of American Association of Physicists in Medicine. This is an open

access article under the terms of the Creative Commons Attribution

License, which permits use, distribution and reproduction in any medium,

provided the original work is properly cited.

2412

by noise and artifacts (see, e.g., Ref. [4–7]). Further,

cross-/multi-domain methods which pre-process the k-space

and subsequently process the initially obtained reconstruction

have been investigated as well.

However, the post-processed

image might lack data consistency, meaning that it is not

clear how well the post-processed image matches the mea-

sured k-space data. Thus, approaches which used the output

of a pretrained CNN as image-priors have been considered as

well (see, e.g., Ref. [9,10]). Thereby, the CNN-priors are used

in the formulation of a Tikhonov functional which is subse-

quently minimized to increase the data consistency of the

solution. These methods have the (computational) advantage

of decoupling the regularization step from increasing data

consistency and are thus also applicable to arbitrary large-

scale image reconstruction problems. However, the network

training is typically carried out in the absence of the forward

model and although this strategy was reported to be success-

ful, image reconstruction methods based on NNs have been

reported to possibly suffer from instabilities.

This issue

could directly affect the obtained CNN-prior and thus, affect

the quality of the final reconstruction, since its solution

depends on the CNN-prior. Interestingly, in Ref. [11], the

authors showed that the CNN-based reconstruction methods

which include the forward and adjoint operators in the net-

work architecture are the most stable with respect to small

perturbations and adversarial attacks. Further, including the

forward model in the network architecture has been reported

to lower the maximum error bound of the CNNs.

Thus, it is

desirable to find a way to include the physical model in the

CNN architecture, even for computationally expensive/large-

scale problems.

Methods in which the CNNs architectures resemble

unrolled iterative schemes of finite length are referred to as

variational/iterative/cascaded networks (see, e.g., Ref. [13-

19,43]). Thereby, these methods typically consist of CNN-

blocks and data consistency (DC)-blocks, which are given in

the form of gradient steps with respect to a data-fidelity term

(see, e.g., Ref. [14,19]), or as layers which implement the

minimizer of a functional involving a data-fidelity term (see,

e.g., Ref. [13,16,18]). While the CNN-blocks can be inter-

preted as learned regularizers, the DC-blocks utilize the mea-

sured undersampled k-space data to increase/ensure the data

consistency of the intermediate CNN outputs.

Unfortunately, including the forward and adjoint operators

in the CNN can at the same time represent the computational

bottleneck of these methods, since the forward operator as

well as its adjoint can be computationally expensive to evalu-

ate. As a consequence, the applicability of iterative networks

is currently still limited to either reconstruction problems

with easy-to-compute forward operators, for example, a FFT

sampled on a Cartesian grid

13,16,18,21

or to non-dynamic prob-

lems with non-Cartesian sampling schemes, see Ref. [22]for

a single-coil data acquisition or Ref. [23] for a multiple-coil

acquisition. For example, in Ref. [13,16,18] only a single-coil

is used for the encoding operator. In,

multiple receiver coils

were used for a dynamic 3D reconstruction problem with a

Cartesian sampling grid. For non-dynamic problems, in

contrast, iterative CNNs for multi-coil data acquisition on a

Cartesian grid have been received more attention (see, e.g.,

Ref. [14,15,24]).

Acquisition protocols using non-Cartesian sampling tra-

jectories such as radial or spiral sampling can be an attractive

alternative to standard Cartesian acquisitions, especially

because the arising undersampling artifacts are much more

incoherent compared to the ones arising from a Cartesian

acquisition.

However, radially acquired k-space data are

computationally more demanding to reconstruct as the for-

ward encoding operator involves gridding on a Cartesian grid

before the fast Fourier transform can be applied, see Ref. [26]

for more details. In,

an iterative network for a non-dynamic

radial single-coil acquisition was proposed. However, the

standard in the clinical routine is in fact given by multi-coil

data-acquisitions

through which the acquisition of the k-

space data increases in terms of computational complexity.

Finally, acquiring a dynamic process is inherently computa-

tionally more demanding as for each time point, the encoding

operator has to be applied to the image.

In this work, we present a novel computationally light and

efficient CNN-block architecture as well as a training strategy

which are tailored to image reconstruction of dynamic multi-

coil radial MR data of the heart. The combination of the pro-

posed network architecture and training strategy allows to

train the entire reconstruction network in an end-to-end man-

ner, even for dynamic problems with nonuniformly sampled

data as well as multiple receiver coils. To the best of our

knowledge, this is the first work which overcomes these

computational difficulties and presents such an end-to-end

trainable network architecture for dynamic non-Cartesian

multi-coil MR reconstruction problems.

The paper is structured as follows: In Section 2, we for-

mally introduce the reconstruction problem and the proposed

method by discussing in more detail the used CNN-block as

well as the training strategy. In Section 3, we show extensive

experiments to validate the efficacy of our approach and com-

pare with to different state-of-the-art methods for dynamic

cardiac radial MRI. We then discuss the main advantages and

limitations of our work in Section 4and conclude the work in

Section 5.

2. MATERIALS AND METHODS

2.A. Problem formulation

Let x∈Nwith N¼NxNyNtdenote the vector rep-

resentation of a complex-valued cine MR image, that is,

x¼½x1,...,xNtT. The forward operator Amaps the dynamic

cine MR image to its corresponding k-space. In this work, we

focus on a 2D radial encoding operator using multiple recei-

ver coils. More precisely, the operator Ais given by.

A:¼INcEðÞC, (1)

where INcdenotes the identity operator and Cconsists of the

concatenation of the Ncdifferent coil-sensitivity maps which

are multiplied with the cine MR image, that is,

Medical Physics, 48 (5), May 2021

2413 Koﬂer et al.: An Iterative CNN for 2D Radial Cine MRI 2413

C¼½C1,...,CNcT, with Cj¼diag cj,cj,...,cj



∈NNand

cj∈NxNy. The operator E¼diagðE1,...,ENtÞdenotes a

composition of 2D radial encoding operators Etwhich for

each temporal point t∈f1,⋯,Ntgsample a 2D image

xt∈NxNyalong a non-Cartesian grid in the Fourier domain.

In particular, for this work, we consider trajectories given by

the radial golden-angle method

but point out that other tra-

jectories can be considered as well. By J¼1, ...,Nrad

,we

denote the set of indices the k-space coefficients which are

needed to sample a 2D image xtat Nyquist limit. In order to

accelerate the image acquisition process, the set Jis typically

constructed by reducing the number of radial lines. We

denote by AIthe radial Fourier encoding operator which sam-

ples a complex-valued 2D cine MR image xat a radial grid

given by the indices in the set I¼I1∪...∪INtwith It⊂Jfor

all t¼1, ...,Nt:Thus, the considered reconstruction problem

is given by

AIxþe¼yI, (2)

where yI¼½y1

I,...,yNc

ITwith yc

I∈Ntmrad ,forc¼1,⋯,Nc

denotes the measured k-space data for the dynamic 2D

cine MR image xand e∈Nkwith Nk¼Ntmrad Ncdenotes

random noise. Multiple receiver coils are typically used in

order to achieve mrad >Nradsuch that problem (2) is overde-

termined and can be solved by considering the normal

equations

IAIx¼AH

IyI:(3)

System (3) could in principle be solved by conjugate gra-

dient (CG)-like methods yielding an approximation of the

solution given by.

x∗¼ðAH

IAIÞ1AH

IyI, (4)

where the configuration of the receiver coils ensures that the

operator AH

IAIis invertible. However, for non-Cartesian tra-

jectories, problem (3) is ill-conditioned. Thus, methods used

to approximate x∗can exhibit a semi-convergence behavior

and lead to undesired noise amplification.

Hence, regular-

ization techniques must be used to stabilize the inversion pro-

cess.

In this work, we propose a reconstruction network based

on a fixed-point iteration which has the form.

Θ¼fDC

λ∘uΘ



∘...∘fDC

λ∘uΘ



|ﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄ{zﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄﬄ}

Mtimes

¼ðfDC

λ∘uΘÞM, (5)

where uΘis a CNN-block which reduces undersampling arti-

facts, and noise and fDC

λis a module which increases data

consistency of the intermediate outputs of the CNN-blocks.

Limits of Eq. (5) simultaneously satisfy data consistency

induced by the DC module and regularity induced by the

CNN-block.

For example, in the simplest case where fDC

λis defined as

the minimizer of a penalized least squares functional [see Eq.

(14)] and where uΘdefines a linear projection, the fixed-point

iteration Eq. (5) can be shown to converge to the minimizer

of the functional.

Fλ,yIxðÞ¼kAIxyIk2

2þλkxuΘxðÞk

2:(6)

Note that, similar to

but in contrast to,

13,14,16

the set of

parameters Θis the same for each CNN-block and therefore

allows a repeated application of the iterative network uM

Θin

general. Further, the proposed CNN-block differs from the

one in,

as we shall discuss later. Figure 1shows an illustra-

tion of the described reconstruction network.

In the following, we describe the CNN-block uΘand the

DC-module fDC

λin more detail.

2.B. Proposed CNN-block

In order to keep the notation as simple as possible, we

neglect the indices referring to the iteration within the net-

work. We illustrate the CNN-block for the first iteration of

the network, that is, when the input of the CNN-block is the

initial nonuniform FFT (NUFFT)-reconstruction. The process

described below is employed within every CNN block in the

proposed network architecture. Let Wdenote a diagonal

operator which contains the entries of the density compensa-

tion function in k-space. By xI:¼A#

IyI:¼AH

IWyI,we

denote the initial NUFFT-reconstruction which is obtained by

re-gridding the density-compensated measured k-space coef-

ficients onto a Cartesian grid and applying the inverse FFT

(IFFT). First, we compute a temporal average of the input

over all cardiac phases, that is,

νI¼1

∑

t¼1

xI,t∈NxNy(7)

and stack it along the temporal axis, that is,

μI¼½νI,⋯,νI∈NxNyNt. Then, we subtract μIfrom the

initial NUFFT-reconstruction xIand apply a temporal Fourier

transform Ft, that is, we obtain z¼FtðxIμIÞ. Subtracting

the temporal average from image frames is a well-known and

established approach to sparsify image sequences, for exam-

ple in video-compression (see, e.g., Ref. [30]). Further, apply-

ing a temporal FFT provides an even sparser representation

and has been used for example in Refs. [18,31] for dynamic

MR image reconstruction. Thus, the CNN-block learns to

reduce the present undersampling artifacts and noise in a

sparse domain. We define the operators Rxt and Ryt which

rotate zby appropriately permuting the relevant axes of the

images such that

zxt ¼Rxtz∈NyNxNt(8)

zyt ¼RytzðNxNyNtÞ:(9)

At this point, zxt and zyt can be interpreted as Nycom-

plex-valued images of shape NxNtand Nxcomplex-val-

ued images of shape NyNt, respectively. Then, a simple

2D U-Net,

which we denote by cΘis applied both to zxt

and zyt. Note that we use the same U-Net for both xt- and

yt-branches, that is, the weights are shared among the two.

This suggestion relies on the assumption of an isotropic

resolution in x- and y-direction and the fact that for a radial

sampling, the artifacts in xt-andyt-direction do not differ

Medical Physics, 48 (5), May 2021

2414 Koﬂer et al.: An Iterative CNN for 2D Radial Cine MRI 2414

across the branches. For a Cartesian acquisition, where the

latter assumption is clearly not valid, one could simply

assign two different CNNs cΘxt and cΘyt to the two different

branches. Also, note that because the U-Net only consists

of convolutional and max-pooling layers and has no fully

connected layers, the approach can be used for the case

Nx≠Nyas well. Then, we obtain.

zxt

CNN ¼cΘzxt

ðÞ (10)

zyt

CNN ¼cΘzyt

ðÞ:(11)

and calculate the estimate zCNN by reassembling the pro-

cessed spatiotemporal slices, that is,

zCNN ¼1

2ðRxtÞTzxt

CNN þRyt

ðÞ

Tzyt

CNN



:(12)

The factor 1=2 in Eq. (12) is needed because every pixel is

used twice —once in the xt-branch and once in the yt-

branch. Finally, the estimate xCNN is given by applying FH

and adopting a residual connection, that is,

xCNN ¼FH

tzCNN þμI:(13)

Figure 2shows an illustration of the just described

procedure. The 2D U-Net employed in our proposed network

consists of three encoding stages which are interleaved by

max-pooling layers. Each stage consists of a block of 2D con-

volutional layers with Leaky ReLU as activation function.

The number of initially extracted feature maps which is dou-

bled after each max-pooling layer is nf¼16 and is on pur-

pose chosen to be relatively small in order to keep the

model’s complexity as low as possible. The reason for that is

that the 2D U-Net will have to be used together with a (com-

putationally expensive) CG-block, which we describe in the

next subsection. In the decoding path, bilinear upsampling

followed by a 3 3 convolutional layer with no activation

function is used to upsample the images. Our 2D U-Net also

involves skip connections from the last to the first layers of

the corresponding stages in the encoding and the decoding

path. The residual connection is performed in image domain

rather than in Fourier domain.

2.C. DC-module

We illustrate the DC-module for the first iteration of the

network, that is, where the incoming input of the CNN-block

is given by the initial reconstruction. For later iterations, the

construction is analogous.

Let us fix xCNN ¼uΘðxIÞand consider the functional.

Fλ,yIxðÞ¼kW1=2AIxyI

ðÞk

2þλkxxCNN k2

2, (14)

where W1=2denotes the square root of the diagonal operator

which contains the entries of the density compensation func-

tion. For the special case where the operator Ais an isometry,

problem (14) has a simple closed-form solution (see, e.g.,

Ref. [13,16]). The involvement of Win functional (14) can be

motivated as a weighting factor which is used for precondi-

tioning the linear system which needs to be solved when min-

imizing Eq. (14).

This aspect is more clearly visible later in

Eq. (15). A simple example of Abeing an isometry is given

by a single-coil acquisition on a Cartesian grid using a simple

FFT. In this case, the solution of Eq. (14) can be obtained by

performing a linear combination of the measured k-space

data yIand the one estimated by applying AIto xCNN. See,

for example, Ref. [13], for more details. However, in the more

general case, a minimizer of Eq. (6) is obtained by solving a

system of linear equations. By setting the derivative of Eq.

(14) with respect to xto zero, it can be easily seen that the

system one needs to solve is given by Hx ¼bwith

H¼A#

IAIþλIN,

b¼A#

IyIþλxCNN:(15)

System (15) can be solved by means of any iterative algo-

rithm and, since the operator His symmetric, an appropriate

choice is the conjugate gradient (CG) method.

Note that

functional Eq. (14) is linear in xand therefore, due to strong

convexity, solving Eq. (15) leads to the unique minimizer of

Eq. (14). In practice, as we discuss later in the implementa-

tion details, the DC-module is an implementation of a finite

number of iterations of the CG method. We denote the num-

ber of such iterations by nCG. Note that in the CG method, the

operator Hhas to be applied at each iteration. In addition,

note that in general, the application of AIas well as A#

Ican

be computationally way more expensive than for a simple

FFT, for example, if the gridding of the k-space coefficients

is part of the operators as in our case. Thus, it is desirable to

have a CNN-block with as few trainable parameters as possi-

ble such that end-to-end training of the entire network is pos-

sible in a reasonable amount of time.

2.D. Training scheme

Because the solution of Eq. (15) has to be approximated

using an iterative scheme which employs the

FIG. 1. The proposed reconstruction network alternates between the application of CNN-blocks uΘand and data onsistency blocks fDC

λwhich increase the con-

sistency of the outputs of the intermediate CNN-blocks by making use of the measured data which are implicitly given by the initial reconstruction xI.

Medical Physics, 48 (5), May 2021

2415 Koﬂer et al.: An Iterative CNN for 2D Radial Cine MRI 2415

(computationally expensive) application of the operator Hat

each iteration, end-to-end training of the entire reconstruction

network from scratch would be time-consuming. Thus, we

circumvent this issue by the following more efficient training

strategy.

First, in a pretraining step, we only train a single CNN-

block on image pairs as is typically done in deep learning-

based post-processing methods. More precisely, we minimize

the L2-norm of the error between the output of the CNN-

block which was predicted from the initial reconstruction xI

and its corresponding label. Then, in a second training stage,

we construct a network as in Eq. (5) and initialize each of the

CNN-blocks by the previously obtained parameter set Θand

perform a further fine-tuning of the entire network by end-to-

end training. Further, the regularization parameter λcon-

tained in each of the CG-blocks fDC

λcan be included in the

set of trainable parameters as well and is trained to find the

optimal strength of the contribution of the regularization

term. This means that it implicitly learns to estimate the noise

level present in the measured k-space data for the whole data-

set.

2.E. Experiments with in-vivo data

To evaluate the efficacy of our proposed approach, we per-

formed different experiments. First, we investigated the effect

of our proposed training strategy. We also compared the pro-

posed network with different configurations of hyper-parame-

ters Mand nCG. Finally, we further compared our proposed

approach to several other methods for non-Cartesian cardiac

cine MR image reconstruction. In the following, we provide

the reader with information about the used dataset, the meth-

ods of comparison and some details on the implementation

of the proposed method.

2.F. Dataset

We used a dataset of cine MR images of n¼19 subjects

(15 healthy volunteers +4 patients). For each healthy volun-

teer as well as for two patients, Nz¼12 different orientations

of cine MR images were acquired. For the resting two

patients, only Nz¼6 slices were acquired due to limited

breathhold capabilities. Thus, we have a total of 216 com-

plex-valued cine MR images. We split the dataset in 144/36/

36 images used for training, validation, and testing, where for

the test set, we use the 36 cine MR images of the patients in

order to be able to qualitatively assess the images with respect

to clinically relevant features.

All images were acquired using a balanced steady-state

free precession (bSSFP) sequence on a 1.5 T MR scanner

during a single breathhold of 10 s (repetition time/echo

time =3.0/1.5ms, flip angle 60°). The images used as

ground-truth data for the retrospective undersampling were

reconstructed from the k-space data which were sampled

along Nθ¼3400 radial spokes with kt-SENSE.

The coil-

sensitivity maps, which were calculated from low-resolution

images obtained from the central part of the radial k-space

data, were used to combine the different images to a single

FIG. 2. The proposed CNN-block. First, the input image xIis Fourier transformed along the temporal axis, then the Fourier-transformed image is reshaped into

the xt- and yt-domain. The same 2D U-Net uΘis applied to each of the 2D slices. Then, the estimate in the temporal Fourier domain is calculated by reassembling

the processed slices and the output is obtained by applying the inverse temporal Fourier transform and summing it up to the input image. [Color figure can be

viewed at wileyonlinelibrary.com]

Medical Physics, 48 (5), May 2021

2416 Koﬂer et al.: An Iterative CNN for 2D Radial Cine MRI 2416

image after the NUFFT-reconstruction. The images have a

field of view of NxNy¼320 320 mm

with an in-plane

resolution of 2 mm

and a slice thickness of 8 mm. The num-

ber of acquired cardiac phases was Nt¼30. A 32-channel

cardiac-phased array coil was used for signal reception and

parallel image reconstruction. From these images, we retro-

spectively simulated k-space data by sampling Nθ¼560 and

Nθ¼1130 radial spokes. Since sampling along Nθ¼3400

already corresponds to an acceleration factor of approxi-

mately ~3 (with respect to the Nyquist limit), which was

needed to perform the scan during one breathhold, Nθ¼560

and Nθ¼1130 correspond to acceleration factors of ~18 and

~9, respectively.

Further, the k-space data were corrupted by normally dis-

tributed noise with standard variation σ¼0:02 which was

added to each the real and imaginary parts of yc

Ifor

c¼1,⋯,12 after having centered them. The calculation of

the density compensation function for a fixed set of trajecto-

ries was based on partial Voronoi diagrams.

2.G. Quantitative measures

We evaluated the performance of our proposed network

architecture in terms of different error- and image-similarity-

based measures. The peak signal-to-noise ratio (PSNR) and

the normalized root mean squared error (NRMSE) are used

as error-based measures. Further, we report a variety of differ-

ent similarity-based measures: the structural similarity index

measure (SSIM),

the multi-scale SSIM (MS-SSIM),

the

universal image quality index (UQI),

the visual information

quality measure (VIQM),

and the Haar wavelet-based simi-

larity index measure.

We calculate all measures by comparing the 2D complex-

valued images in the xy-plane for each time point. This means

that our test set consists of 1080 2D images. For the calcula-

tion of the similarity measures, the real and imaginary part of

the images are treated as channels. The similarity-based mea-

sures are calculated for each channel separately and then aver-

aged across the two channels. The statistics were calculated

over a region of interest of 160 160 pixels in order to dis-

card background noise. Further, we segmented the patients in

the images of the test set in order for the statistics to reflect

the achieved performance on regions of interest.

2.H. Implementation details

The architecture was implemented in PyTorch. Complex-

valued images were stored as two-channeled images. The for-

ward and the adjoint operators AIand AH

Iwere implemented

using the publicly available library Torch KBNUFFT

42,43

which also allows to perform back-propagation across the for-

ward and adjoint operators. During training, the k-space tra-

jectories, the coil-sensitivity maps, and the density

compensation functions were stored as tensors for the imple-

mentation of AIand AH

I. Note that we make the regularization

parameter λtrainable such that a trade-off between the mea-

sured k-space and the output of the CNN-blocks is learned.

In order to constrain λto be strictly positive, during the fine-

tuning of the model, we apply a Softplus activation function,

that is, we set λ:¼SoftPlusð~

λÞ¼1

βðlogð1þexpðβ~

λÞÞ, which

maps ~

λto the interval ð0,∞Þ. We used the default parameter

β¼1. Note that a CG scheme is usually stopped after a cer-

tain stopping criterion is met. A commonly used stopping cri-

terion is if the norm of the newly calculated residual

rk¼Hxkbkis small enough, that is, krkk2≤TOLkbk2for

a tolerance TOL chosen by the user. During fine-tuning the

iterative network, we fix the number of CG iterations nCGbut

when testing the network on unseen data, we can choose to

use the number of iterations the CNN was fine-tuned with or

set an own stopping criterion. All experiments were per-

formed on an NVIDIA GeForce RTX 2080 with 11 GB

memory.

2.I. Comparison with other methods

We compared our proposed approach to the following

methods which employ recently published learned and well-

established non-learned regularization methods. As well-

established reconstruction methods, we applied iterative

SENSE,

a Total Variation (TV)-minimization method,

and kt-SENSE.

Further, we compared our proposed method

to a method based on dictionary learning (DL) and sparse

coding (SC)

46,47

using adaptive DL and adaptive SC

and a

method which employs CNN-based regularization in the form

of previously obtained image priors,

9,10

where we used a pre-

viously trained 3D U-Net

for obtaining the prior.

Note that on purpose we did not compare our proposed

approach with other CNN-based methods involving genera-

tive adversarial networks (GANs). The reason is that we are

mainly interested in the performance of the combination of

the proposed CNN-block in terms of artifacts-reduction as

well as the trade-off between employing a CG-module or not.

In addition, note that if the hardware allows it, it is always

possible to add different components based on GANs to regu-

larize the output of the CNN-blocks. Further, note that

although there exist several other state-of-the-art methods

using cascaded/iterative networks for dynamic cine MRI, see

for example Ref. [13,16,18] the underlying reconstruction

problem is a different one (single-coil and Cartesian vs. mul-

ti-coil and radial) and thus these methods are not directly

applicable as originally published.

3. RESULTS

In the following, we report the obtained results concerning

the training behavior and the reconstruction performance of

our proposed method.

3.A. Computational complexity of the forward and

adjoint operators

Here, we evaluated the computational complexity of the

proposed network architecture in terms of required GPU

memory as well as training times which can be estimated for

Medical Physics, 48 (5), May 2021

2417 Koﬂer et al.: An Iterative CNN for 2D Radial Cine MRI 2417

the end-to-end training stage. Here, we fixed the number of

iterations of the network to be M¼1 and the CNN-block was

fixed to be the identity cΘ¼IN, that is, it contains no train-

able parameters. Thus, the allocated memory can be mainly

attributed to the tensors needed for the radial trajectories, the

density compensation, the coil-sensitivity maps which define

the operator, and the considered input image.

Figure 3shows the allocated GPU memory as well as the

required time to perform one weights-update by performing a

forward- and a backward-pass through the entire reconstruc-

tion network. By Nθwe denote the total number of radial

spokes which are acquired for each of the Ntcardiac phases.

The first row of Fig. 3shows the average allocated GPU

memory as well the required time to perform one weights-up-

date depending on the spatial image size. This is shown for

nCG ¼1 and different numbers of radial spokes Nθ. It can be

seen that already for nCG ¼1, the required GPU memory

amounts to approximately 512–1024 GB and performing one

step of back-propagation is in the range of 4 s. The second

row of Fig. 3shows the same quantities for fixed Nθ¼560

and different nCG. Here, we also see that employing a rela-

tively high number of CG iterations, say nCG ¼12, almost

requires 2 GB of GPU memory and, more importantly,

requires more than 30 s. Training the entire network in this

configuration for 100 epochs would, for example, already

require 5 days. By that, one can estimate that training the

entire network from scratch in an end-to-end manner could

easily amount to weeks or months. Further, the required time

does not vary much for different image sizes, meaning that

even for relatively small reconstruction problems, the applica-

tion of AIand AH

Iis inherently computationally demanding.

These computational aspects highlight the importance of

the efficacy of the CNN-block in terms of having a small

number of parameters to ensure fast convergence during

training and at the same time a good performance in terms of

undersampling artifacts-reduction.

3.B. Efﬁcacy of the training scheme

Here, we show the impact of including the forward and

the adjoint operators AIand AH

Iin the network architecture

during the learning process. Figure 4shows the training and

validation error of the proposed CNN-module during the pre-

training stage. In the first pretraining stage, only one single

CNN-block was trained to minimize the L2-error between the

output estimated by the CNN-block uΘand its corresponding

label. In the fine-tuning stage, a CG-module with nCG ¼8

iterations was attached to the CNN-block. As can be seen, in

the pretraining stage, after about 65000 weight updates

(which corresponds to approximately 450 epochs using a

mini-batch size of one), training and validation error start to

stagnate between 0.025 and 0.30, respectively. After

FIG. 3. Computational complexity and the time required for one backward pass for the operator Hwhich is employed in the DC-module for different spatial

image sizes NxNy. The number of temporal points was set to Nt¼30 and the number of coils was Nc¼12 [Color figure can be viewed at wileyonlinelibrary.

com]

Medical Physics, 48 (5), May 2021

2418 Koﬂer et al.: An Iterative CNN for 2D Radial Cine MRI 2418

pretraining, the parameter set Θwas stored and the fine-tun-

ing of the entire network was carried out by initializing the

set of parameters Θas the one obtained after pretraining. As

can be seen from the orange curves, including the GC-mod-

ule which employs the encoding operators in the CNN archi-

tecture allowed to further reduce the training as well as the

validation error and therefore to obtain a more suitable

parameter set Θ.

Note that since the fine-tuning stage is much more compu-

tationally demanding than the pretraining stage, we only

trained for 150 epochs and tested on the whole training and

validation dataset only 12 times instead of 75 times as in the

pretraining stage. The time for pretraining the CNN-module

amounted to approximately 6 hr, while fine-tuning the entire

network took approximately 5 days.

Figure 5shows the effect of the proposed training strategy

on the obtained results. Figure 5(e) shows the initial NUFFT-

reconstruction which is directly obtained from the undersam-

pled k-space data. After having pretrained the single block uΘ,

the image in Fig. 5(a) is obtained by passing the input to the

CNN-block. Further, proceeding in the reconstruction network

with the CG-bock yields the image shown in Fig. 5(b) for

which the point-wise error is lower than for (a). Figure 5(c)

shows the output of the CNN-block after having fine-tuned the

entire network architecture, that is, a CNN-block with attached

CG-block, by end-to-end training. By comparing (c) to (a), we

see that the quality of the output of the CNN-block has clearly

increased as the point-wise error has been further reduced. Fur-

ther, proceeding with the CG-module in the network further

reduces the point-wise errors as can be seen in (d). Again, by

comparing the final reconstructions (d) and (b), we see that the

quality of the reconstruction has further increased as the point-

wise error has decreased. Note that by taking in consideration

the stagnation of the training and validation error shown in

Fig. 4, we can indeed attribute the increase in performance of

the reconstruction method to the fact that we included the for-

ward and adjoint operators in the training process and rather

than to the additionally performed weight-updates. Further, we

have experimentally confirmed the efficacy of the proposed

training procedure.

Table Ilists the quantitative measures obtained on the test

set for the intermediate output of the CNN-block as well as for

the final reconstruction before and after fine-tuning the entire

network for two different acceleration factors. The table well-

reflects the visual results in the sense that the intermediate out-

put of the CNN as well as the final reconstruction after the

fine-tuning stage surpass their corresponding image estimates

after the sole pretraining in terms of all reported measures.

3.C. Variation of the hyper-parametersMandnCG

As already mentioned, at test time, one does not necessar-

ily need to stick to the configuration of the network in terms

of length Mand number of CG-iterations nCG which were

used for fine-tuning the network. In particular, for the fine-

tuning stage, the choice of Mand nCG is mainly driven by

factors as training times and hardware constraints which do

not play a role at test time.

Thus, we performed a parameter study where at test time,

we varied the length of the network Mas well as the number

of CG-iterations nCG. We repeated these experiment for two

different configurations of our proposed network. We fine-

tuned one network with M¼3 and nCG ¼2 and another one

with M¼1 and nCG ¼8. Due to hardware constraints, the

networks also differ in terms of number of trainable parame-

ters of the CNN-block.

At test time, we fixed nCG ¼4 and varied Mfrom

M¼2,4,⋯,12. With this configuration, the noise and the

artifacts are gradually reduced and data consistency of the

solution is increased several times during the whole recon-

struction process. In Tables S1 and S2 in the supplementary

material, one can see that increasing Mconsistently further

improved the results for both networks. Further, the compar-

ison of the two networks in Fig. S1 in the supplementary

material shows that the network containing more trainable

parameters (i.e., the one which was fine-tuned with M¼1

and nCG ¼8) surpasses the other one with respect to all mea-

sures. This observation motivates the choice of the final

reconstruction network used for comparison with other meth-

ods in the next subsection.

3.D. Reconstruction results

Here, we compare our reconstruction method with the

image reconstruction methods previously introduced. As dis-

cussed in Section 3.C we chose to use M¼12 and nCG ¼4

although the network was trained with M¼1 and nCG ¼8.

Note that, since the same strategy seemed not to be consis-

tently useful for the 3D U-Net which was trained without the

integration of the encoding operators, we report here the

0 10000 20000 30000 40000 50000 60000 70000 80000

Back-Propagations

0.020

0.025

0.030

0.035

0.040

Average L2-error

Training-Error (Pre-Training)

Validation-Error (Pre-Training)

Training-Error (Fine-Tuning)

Validation-Error (Fine-Tuning)

FIG. 4. Training behavior of the proposed network architecture. The lines

indicate the L2-error on the training data (solid lines) as well as on the valida-

tion data (dashed lines) during pretraining of the CNN-block (green lines)

and fine-tuning of the entire network (orange lines). As can be seen, in the

fine-tuning stage, where the physical models AIand AH

Iare included in the

network architecture, the error is further reduced on both the training and the

validation data. Further, the gap between training and validation errors

becomes smaller. [Color figure can be viewed at wileyonlinelibrary.com]

Medical Physics, 48 (5), May 2021

2419 Koﬂer et al.: An Iterative CNN for 2D Radial Cine MRI 2419

values for M¼1 and nCG ¼12 which are also the ones used

in Ref 10. In the supplementary material, the results for the

variation of Mand nCG can be found as well. The regulariza-

tion parameter for the dictionary learning method

and the

CNN-based regularization method

was chosen as λ¼1.

Figure 6shows an example of the results obtained with

the previously described methods of comparison and our pro-

posed approach. As can be seen, the total variation-minimiza-

tion method as well as kt-SENSE successfully removed the

undersampling artifacts but also led to a loss of image details

as is indicated by the yellow arrows. In contrast, all learning-

based method yielded a good reconstruction performance in

terms of preservation of the cardiac motion. We see that our

proposed method is in addition the one which best reduced

residual image noise in the images. Table II shows the results

achieved in terms of the chosen measures. The best achieved

results are highlighted as bold numbers. Again, the experi-

ments were repeated for two different undersampling (i.e.,

acceleration) factors, given by sampling the k-space along

Nθ¼560 and Nθ¼1130 radial spokes, respectively.

The numbers well-reflect the observations from Fig. 6and

we see that all methods based on regularization methods

employing learning-based methods consistently outperform

the methods using hand-crafted regularization methods with

respect to all measures. All three reported methods using

machine learning-based regularization achieve competitive

results, where we see that our proposed method yields sub-

stantially better results compared to the dictionary learning

(DL)-based method and the other CNN-based method in

terms of error-based measures. In terms of image similarity-

based measures, the difference between the dictionary learn-

ing-based method and ours becomes less prominent except

for VIQP. Interestingly, the 3D-U-Net-based method is con-

sistently surpassed either by the dictionary learning-based

method or our proposed one. All observations are consistent

among both acceleration factors with Nθ¼560 and

Nθ¼1130. In addition, note that while the obtained results

for DL, the 3D U-Net-based iterative reconstruction and our

proposed approach are similar, the average reconstruction

time using DL amounts to approximately 2000 s, where the

most computationally intensive part is the repeated sparse

coding of the image patches at each iteration. However, the

implementation of the dictionary learning and sparse coding

algorithms aITKrM and aOMP is currently only available for

running on the CPU and thus, a further acceleration could be

expected. In contrast, for the CNN-based methods, the

(a) (c)

(b) (d)

(e) (f)

FIG. 5. Intermediate results and point-wise error images of our reconstruction method after pretraining and after fine-tuning our proposed method with M¼1

and nCG ¼8. The output of the CNN-block after pretraining (a), the final reconstruction (b) after pretraining where (a) was the output of the CNN-block, the out-

put of the CNN-block after fine-tuning the entire cascade in an end-to-end manner (c) and the final reconstruction after fine-tuning (d) where (c) was the output

of the CNN-block, the initial NUFFT-reconstruction xIobtained from Nθ¼560 radial spokes (e) and the corresponding ground-truth image obtained by kt-

SENSE with Nθ¼3400 radial spokes (f). We see that after having fine-tuned the entire network, the point-wise error of the final reconstruction is the smallest.

Further, the cardiac motion is much better preserved as is pointed out by the yellow arrows. [Color figure can be viewed at wileyonlinelibrary.com]

Medical Physics, 48 (5), May 2021

2420 Koﬂer et al.: An Iterative CNN for 2D Radial Cine MRI 2420

reconstruction time amounts to approximately 12 and 48 s

which are mainly coming from the CG-module. Since one

iteration in the CG-block takes approximately 1 s mainly

because of the evaluation of the operator H¼A#

IAIþλIN,

the overall reconstruction time of our proposed method can

be estimated as tREC≈MnCG. Further, note that our proposed

method quite consistently surpasses the 3D U-Net-based

reconstruction method even if it only contains around 9:1%

of the trainable parameters.

4. DISCUSSION

In this work, we have proposed the first end-to-end train-

able iterative reconstruction network for dynamic multi-coil

MR image reconstruction using nonuniform sampling

schemes. As we have seen, the proposed end-to-end trainable

reconstruction network provides a competitive method for

image reconstruction for 2D cine MR image reconstruction

with non-Cartesian multi-coil encoding operators. In the fol-

lowing, we highlight several advantages and limitations of

our proposed approach and put it in relation to other works.

4.A. End-to-end trainability

Since the considered forward model is computationally

demanding, methods as

9,10

can be used as an alternative,

where the generation of the CNN output and the step of

increasing data consistency are decoupled from each other.

However, the major advantage of our proposed network over

methods similar to Refs. [9]or[10], is that the reconstruction

network is trained in and-end-to-end manner. As can be seen

in Fig. 4in Section 3.B, including the DC-module given by

the CG method in the network architecture is highly benefi-

cial since it further reduces the training and validation error

and also reduces the gap between the both, leading to a better

generalization power. This result experimentally confirms the

theory on the achievable performance of the reconstruction

network derived in.

Further, we demonstrated that the pro-

posed training strategy is a viable option for training the

entire network in an end-to-end manner in a relatively short

amount of time while achieving good convergence properties

of the network parameters.

4.B. The choice of the CNN-block

As we have seen in Fig. 4, the proposed CNN-block’s

trainable parameters converge relatively fast (approximately

6 hr) to a set of suitable trainable parameters in the pretrain-

ing stage. Training the 3D U-Net, in contrast, took approxi-

mately 3 days. Because the proposed CNN-block is trained

relatively fast, fine-tuning the entire network is possible by

investing approximately 5 days of further training.

Note that our proposed approach transfers the learning of

the artifact-reduction mapping to from 3D to 2D by the appli-

cation of 2D convolutional layers in the temporally Fourier-

transformed spatiotemporal domain. Thus, it inherits all ben-

efits method presented in Ref. [7]. The most important prop-

erty is that due to the change of perspective on the data, for

each single cine MR image, NxþNysamples are actually

considered to train the CNN-block, which is essential for

training. Since only a relatively small amount of training data

in terms of number of subjects already suffices for a success-

ful training, the end-to-end-training, which is particularly

computationally expensive during the fine-tuning stage, can

be carried out in a reasonable amount of time without the

need to additionally augment the dataset in order to prevent

overfitting.

Note that, of course, if enough training data are available

and the hardware constraints allow it, training the network

with computationally heavier CNN-blocks is possible as well.

Thus, the proposed method can also be seen as a general

method for training an end-to-end reconstruction network for

large-scale image reconstruction problems with computation-

ally demanding forward operators.

4.C. The trade-off between the hyper-parametersM

andnCG

From the formulation of the network architecture in Eq.

(5), we can identify two main hyper-parameters which can be

varied and determine the nature of the proposed reconstruc-

tion algorithm. The overall number of iterations Mdefines

the length of the unrolled iterative scheme. Further, because

in general, minimizing functional (14) requires solving a lin-

ear system using an iterative solver, the number of iterations

to approximate the solution of Eq. (14), here named nCG, has

TABLE I. Intermediate and final reconstruction results after pretraining of the

CNN-block and after fine-tuning of the entire network architecture with

M=1 and nCG.=8. The results are shown for =Nθ560 and =Nθ1130

radial spokes.

After pretraining After fine-tuning

CNN-

prior

Final

reconstruction

CNN-

prior

Final

reconstruction

Number of Radial Spokes:Nθ¼560

PSNR 42.4541 45.4826 44.484 46.0036

NRMSE 0.1252 0.0874 0.0963 0.0810

SSIM 0.9576 0.9796 0.9833 0.9872

MS-

SSIM

0.9916 0.9966 0.9965 0.9977

UQI 0.8027 0.8776 0.9064 0.9275

VIQP 0.9431 0.9506 0.9429 0.9434

HPSI 0.9889 0.9941 0.9901 0.9943

Number of Radial Spokes:Nθ¼1130

PSNR 43.9893 47.0087 46.2383 47.7545

NRMSE 0.1044 0.0735 0.0808 0.0673

SSIM 0.9698 0.9854 0.9873 0.9903

MS-

SSIM

0.9945 0.9977 0.9974 0.9984

UQI 0.8324 0.8998 0.9209 0.9408

VIQP 0.9574 0.9642 0.9604 0.9626

HPSI 0.9916 0.9961 0.9930 0.9962

Medical Physics, 48 (5), May 2021

2421 Koﬂer et al.: An Iterative CNN for 2D Radial Cine MRI 2421

to be chosen as well. By setting M¼1 and nCG relatively

"high," say nCG ¼12, one aims at constructing an end-to-end

trainable network which conceptually resembles methods

like.

9,10

There, the CNN-block is only applied once to obtain

a CNN-based image-prior and functional (14) is minimized

until convergence of the iteration. In contrast, one can also

set M>1, but because of hardware constraints, one necessar-

ily also has to either lower nCG in each CG-block, reduce the

complexity of the CNN-blocks, or in the worst case, both.

Lowering the number of CG-iterations nCG causes the solu-

tion of Eq. (14) most probably to only be poorly approxi-

mated and lowering the CNN-blocks complexity can be

expected to deliver poorer intermediate outputs of the differ-

ent CNN-blocks in terms of artifacts-reduction.

Interestingly, we have observed that even fine-tuning the

entire network with M¼1 and nCG ¼8 allowed to change the

hyper-parameters at test time by further obtaining a boost in

performance, see the supplementary material for more details.

We also trained an iterative network architecture with our

proposed CNN-block for M¼3 and nCG ¼2. Due to

hardware constraints, we employed smaller 2D U-Nets as

CNN-blocks, where, different to before, we set the initial

number of applied filters to nf¼4. The so-constructed net-

work consisted of only 5908 trainable parameters, that is,

only about 0.57%of the 3D U-Net.

In the supplementary material, one can see a comparison of

our method fine-tuned with M¼1andnCG ¼8againstM¼3

and nCG ¼2 which were evaluated with different configurations

of Mand nCG at test time. The network which was fine-tuned

with M¼1andnCG ¼8 consistently outperforms the other with

respect to all measures. Altough the comparison is not entirely

fair since the number of trainable parameters highly differs from

one CNN-block to the other, this result is important because of

the following reason. It suggests that sacrificing expressiveness

of the CNN-block in terms of trainable parameters in order be

able to fine-tune with M>1 seems not to be necessary since

also for the network fine-tuned with M¼1andnCG ¼8, differ-

ent Mand nCG can be used at test time and further improve the

results. Interestingly, we were not able to observe this phe-

nomenon for the 3D U-Net as can be seen from Table S3 in the

FIG. 6. Comparison of images and point-wise errors for different reconstruction methods with learned and non-learned regularization for Nθ¼560. Left column:

Classical iterative reconstruction methods with —from top to bottom —direct NUFFT-reconstruction, iterative SENSE,

total variation-minimization

(TV),

kt-SENSE.

Right column: Learning-based regularization methods with —from top to bottom —adaptive dictionary learning and sparse coding

(DIC),

CNN-based iterative reconstruction (CNN +IR)

using a 3D U-Net,

and our proposed approach. The yellow arrows in the images of the left column

point at errors occurring at regions of the heart which might affect the assessment of the cardiac motion while the ones in the right column point at residual image

noise. [Color figure can be viewed at wileyonlinelibrary.com]

Medical Physics, 48 (5), May 2021

2422 Koﬂer et al.: An Iterative CNN for 2D Radial Cine MRI 2422

supplementary material. Thus, we attribute this property to the

fine-tuning stage in which the encoding operator is included in

the network architecture which again highlights the importance

of employing a CNN-block which allows end-to-end training of

the entire network in a reasonable amount of time.

4.D. Limitations

For the proposed method, training times tend to be quite

long, amounting to several days. This makes the algorithmic

development challenging in terms of hyper-parameter tuning

and limits the possibility to draw general conclusions because

repeating experiments for different parameter configurations

is prohibitive. Nevertheless, the proposed training scheme

shows that it is easily possible to outperform post-processing

methods as in

6,7

or the so-called model-agnostic approaches

in Refs. [9,10] where training of the CNN-block is decoupled

from the subsequent minimization of a CNN-regularized

functional.

Further, although we have seen that it is possible to increase

Mat test time and observe an increase in performance in terms

of reconstruction results, from a theoretical point of view, it

remains somewhat unclear why this is exactly possible. We

believe that the reason for this lies in the end-to-end training

the entire network but leave a rigorous theoretical investigation

and a convergence analysis as future work.

4.E. Differences and similarities to other works

Our presented approach shares similarities across different

works. First, for the case M¼1, it can be seen as an exten-

sion of the approaches presented in Refs. [7,9,49] in the sense

that we only generate only one CNN-based image-prior

which is used in a Tikhonov functional. However, because of

the proposed end-to-end training strategy, the obtained CNN-

based image-prior tends to be a much better estimate of the

ground-truth image compared to the cases when the training

of the CNN-block is decoupled. Further, for M>1, the struc-

ture of the network is similar to,

13,17

with the difference that

first of all, the considered inverse problem is different (radial

multi-coil instead of Cartesian single-coil) and thus the DC-

module is a CG-block instead of the implementation of a

closed-form solution to Eq. (14). Second, our CNN-block

consists of a spatiotemporal 2D U-Net which is applied in the

Fourier-transformed spatiotemporal domain.

In our work, the 2D U-Nets are applied in the temporally

Fourier-transformed spatiotemporal domain and thus use the

same change of perspective on the data as in.

However, in

Ref. [7] the slices extraction and reassembling process is not

part of the network architecture and thus does not allow end-

to-end training. Further, we apply the U-net after having per-

formed a temporal FFT, similar to Ref. [18].

In Ref. [50], where a non-Cartesian multi-coil dynamic

acquisition is considered, the measured k-space data are first

interpolated onto a Cartesian grid. After this interpolation, a

simple FFT can be used as forward operator and thus facili-

tates the construction of an iterative network. However,

because the k-space data interpolation is decoupled from the

network, the network cannot learn to compensate for the

interpolation errors. Thus, in order to really utilize the mea-

sured k-space data, applying the actual encoding operator

(which involves the gridding-step) which is associated with

the considered image reconstruction problem is unavoidable.

In contrast, in our approach, the gridding of the measured

k-space is part of the network architecture. This important

difference is the main motivation of the work since due to the

TABLE II. Quantitative results for Nθ¼560Nθ¼1130 radial spokes which corresponds to an acceleration factor of ~9and ∼18, respectively.

Non-learned regularization Learned regularization

NUFFT It SENSE TV kt-SENSE DL 3D U-Net +IR Proposed

Number of radial spokes:Nθ¼560

PSNR 34.9542 40.5610 41.8778 44.2316 45.6085 46.0845 47.4396

NRMSE 0.2913 0.1535 0.1313 0.0986 0.0844 0.0800 0.0697

SSIM 0.8843 0.9686 0.9786 0.9820 0.9885 0.9880 0.9895

MS-SSIM 0.9762 0.9935 0.9950 0.9958 0.9980 0.9979 0.9982

UQI 0.7574 0.8827 0.8889 0.8887 0.9347 0.9288 0.9388

VIQP 0.8339 0.8379 0.8234 0.8931 0.9256 0.9340 0.9620

HPSI 0.9456 0.9811 0.9849 0.9904 0.9950 0.9949 0.9961

Number of radial spokes:Nθ¼1130

PSNR 39.0289 43.8077 44.1841 46.0096 47.4373 47.5094 48.6761

NRMSE 0.1864 0.1067 0.0998 0.0817 0.0682 0.0681 0.0606

SSIM 0.9378 0.9778 0.9841 0.9875 0.9913 0.9906 0.9916

MS-SSIM 0.9892 0.9965 0.9965 0.9976 0.9986 0.9985 0.9986

UQI 0.8363 0.9156 0.9028 0.9224 0.9484 0.9420 0.9480

VIQP 0.9228 0.9513 0.8832 0.9498 0.9521 0.9547 0.9695

HPSI 0.9775 0.9923 0.9903 0.9943 0.9972 0.9966 0.9972

Parameters –– –– 9664 1 024 224 93 617

Backend GPU CPU CPU CPU CPU/GPU GPU GPU

Medical Physics, 48 (5), May 2021

2423 Koﬂer et al.: An Iterative CNN for 2D Radial Cine MRI 2423

computational difficulties linked with the integration of the

nonuniform FFT in the network architecture, a computation-

ally light CNN-block has to be used.

Although the method shares similar features to other works,

this is —to the best our knowledge —the first work to combine

several components to construct a network architecture which is

trainable in an end-to-end manner for a dynamic nonuniform

multi-coil MR image reconstruction problem by actually using

the radially acquired data and not the one interpolated onto a

Cartesian grid. In fact, we believe that the large scale of the con-

sidered problem is the reason for the lack of end-to-end trainable

reconstruction networks for dynamic non-Cartesian data acquisi-

tion protocols using multiple receiver coils.

5. CONCLUSION

In this work, we have proposed a new end-to-end trainable

data-consistent reconstruction network for accelerated 2D

dynamic MR image reconstruction with nonuniformly sam-

pled data using multiple receiver coils. Further, since end-to-

end training is computationally expensive because of the for-

ward and the adjoint operators are included in the network,

we have proposed and investigated an efficient training strat-

egy to circumvent this issue. In addition, we have compared

our method to other well-established iterative reconstruction

methods as well as several methods based on learned regular-

ization. Our proposed method surpassed all methods using

non-learned regularization methods and achieved competitive

results compared to a dictionary learning-based method and a

method based on CNN-based image-priors. Although the

method was presented for 2D radial cine MRI, we expect the

training strategy to be applicable to general image reconstruc-

tion problems with computationally expensive operators as

well. Further, we expect the proposed CNN-block to be appli-

cable to arbitrary reconstruction problems with a time com-

ponent and temporal correlation.

CONFLICT OF INTEREST

The authors have no conflict to disclose.

DATA AVAILABILITY STATEMENT

The implementation of the encoding operators AI,AH

I,as

well as the proposed CNN-block uΘand the entire recon-

struction network uM

Θare available on https://github.com/kof

lera/DynamicRadCineMRI/.

Author to whom correspondence should be addressed. Electronic mail: an-

dreas.kof[email protected]

REFERENCES

1. Puntmann VO, Valbuena S, Hinojar R, et al. Society for Cardiovascular

Magnetic Resonance (SCMR) expert consensus for CMR imaging end-

points in clinical research: Part I-analytical validation and clinical quali-

fication. J Cardiovasc Magn Reson. 2018;20:67.

2. Wang G, Ye JC, Mueller K, Fessler JA. Image reconstruction is a new

frontier of machine learning. IEEE Trans Med Imaging.

2018;37:1289–1296.

3. Ongie G, Jalal A, Metzler CA, Baraniuk RG, Dimakis AG, Willett R.

Deep learning techniques for inverse problems in imaging. IEEE J Select

Areas Inform Theory. 2020;1:39–56

4. Jin KH, McCann MT, Froustey E, Unser M. Deep convolutional neural

network for inverse problems in imaging. IEEE Trans Image Process.

2017;26:4509–4522.

5. Sandino CM, Dixit N, Cheng JY, Vasanawala SS. Deep convolutional

neural networks for accelerated dynamic magnetic resonance imaging,

Proceedings of 31st Conference of Neural Information Processing Sys-

tems (NIPS), Medical Imaigng meets NIPS Workshop; 2017. Online.

Available at http://www.doc.ic.ac.uk/bglocker/public/mednips2017/med

nips_2017_paper_19.pdf).

6. Hauptmann A, Arridge S, Lucka F, Muthurangu V, Steeden JA. Real-

time cardiovascular MR with spatio-temporal artifact suppression using

deep learning–proof of concept in congenital heart disease. Magn Reson

Med. 2019;81:1143–1156.

7. Kofler A, Dewey M, Schaeffter T, Wald C, Kolbitsch C. Spatio-temporal

deep learning-based undersampling artefact reduction for 2D radial cine

MRI with limited training data. IEEE Trans Med Imaging.

2020;39:703–717.

8. El-Rewaidy H, Fahmy AS, Pashakhanloo F, et al. Multi-domain convolu-

tional neural network (MD-CNN) for radial reconstruction of dynamic

cardiac MRI. Magn Reson Med. 2020;85:1195–1208.

9. Hyun CM, Kim HP, Lee SM, Lee S, Seo JK. Deep learning for under-

sampled MRI reconstruction. Phys Med Biol. 2018;63:135007.

10. Kofler A, Haltmeier M, Schaeffter T, et al. Neural networks-based regu-

larization for large-scale medical image reconstruction. Phys Med Biol.

2020;65:135003.

11. Antun V, Renna F, Poon C, Adcock B, Hansen AC. On instabilities of

deep learning in image reconstruction and the potential costs of AI. Proc

Nat Acad Sci. 2020;117:30088–30095.

12. Maier AK, Syben C, Stimpel B, et al. Learning with known operators

reduces maximum error bounds. Nature Mach Int. 2019;1:373–380.

13. Schlemper J, Caballero J, Hajnal JV, Price AN, Rueckert D. A deep cas-

cade of convolutional neural networks for dynamic MR image recon-

struction. IEEE Trans Med Imaging. 2018;37:491–503.

14. Hammernik K, Klatzer T, Kobler E, et al. Learning a variational network

for reconstruction of accelerated MRI data. Magn Reson Med.

2018;79:3055–3071.

15. Kobler E, Klatzer T, Hammernik K, Pock T. Variational networks: Con-

necting variational methods and deep learning. In: German conference

on pattern recognition. Springer; 2017:281–293.

16. Qin C, Schlemper J, Caballero J, Price AN, Hajnal JV, Rueckert D. Con-

volutional recurrent neural networks for dynamic MR image reconstruc-

tion. IEEE Trans Med Imaging. 2018;38:280–290.

17. Aggarwal HK, Mani MP, Jacob M. Modl: Model-based deep learning

architecture for inverse problems. IEEE Trans Med Imaging.

2018;38:394–405.

18. Qin C, Schlemper J, Duan J, et al. k-t NEXT: Dynamic MR Image

Reconstruction Exploiting Spatio-Temporal Correlations. In: Interna-

tional Conference on Medical Image Computing and Computer-Assisted

Intervention Springer; 2019:505–513.

19. Gilton D, Ongie G, Willett R. Neumann networks for linear

inverse problems in imaging. IEEE Trans Comput Imaging. 2019;6:

328–343.

20. Kofler A, Haltmeier M, Kolbitsch C, KachelrießM, Dewey M. A U-nets

cascade for sparse view computed tomography, In: International Work-

shop on Machine Learning for Medical Image Reconstruction, Springer;

2018:91–99.

21. K ¨

ustner T, Fuin N, Hammernik K., et al. CINENet: Deep learning-based

3D cardiac CINE MRI reconstruction with multi-coil complex-valued

4D spatio-temporal convolutions. Sci Rep. 2020;10:1–13.

22. Schlemper J, Salehi SSM, Kundu P, et al. Nonuniform Variational Net-

work: Deep Learning for Accelerated Nonuniform MR Image Recon-

struction. In: International Conference on Medical Image Computing

and Computer-Assisted Intervention. Springer; 2019:57–64.

23. Malav´

e MO, Baron CA, Koundinyan SP, et al. Reconstruction of under-

sampled 3D non-Cartesian image-based navigators for coronary MRA

Medical Physics, 48 (5), May 2021

2424 Koﬂer et al.: An Iterative CNN for 2D Radial Cine MRI 2424

using an unrolled deep learning model. Magn Reson Med.

2020;84:800–812.

24. Duan J, Schlemper J, Qin C, et al. VS-Net: Variable splitting network for

accelerated parallel MRI reconstruction. In: International Conference on

Medical Image Computing and Computer-Assisted Intervention.

Springer; 2019:713–722.

25. Lustig M, Donoho DL, Santos JM, Pauly JM. Compressed sensing

MRI. IEEE Signal Process Mag. 2008;25:72–82.

26. Smith DS, Sengupta S, Smith SA, Welch EB, Trajectory optimized

NUFFT: Faster non-Cartesian MRI reconstruction through prior knowl-

edge and parallel architectures. Magn Reson Med. 2019;81:2064–2071.

27. Knoll F, Hammernik K, Zhang C, et al. Deep-learning methods for parallel

magnetic resonance imaging reconstruction: A survey of the current

approaches, trends, and issues. IEEE Signal Process Mag. 2020;37:128–140.

28. Winkelmann S, Schaeffter T, Koehler T, Eggers H, Doessel O. An opti-

mal radial profile order based on the Golden Ratio for time-resolved

MRI. IEEE Trans Med Imaging. 2006;26:68–76.

29. Qu P, Zhong K, Zhang B, Wang J, Shen GX. Convergence behavior of

iterative SENSE reconstruction with non-Cartesian trajectories. Magn

Reson Med. 2005;54:1040–1045.

30. Le Gall D. MPEG: A video compression standard for multimedia appli-

cations. Commun ACM. 1991;34:46–58.

31. Tsao J, Boesiger P, Pruessmann KP. k-t BLAST and k-t SENSE: dynamic

MRI with high frame rate exploiting spatiotemporal correlations. Magn

Reson Med. 2003;50:1031–1042.

32. Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for

biomedical image segmentation, In: International Conference on Medi-

cal image computing and computer-assisted intervention. Springer;

2015:234–241.

33. Ong F, Uecker M, Lustig M. Accelerating non-Cartesian MRI recon-

struction convergence using k-space preconditioning. IEEE Trans Med

Imaging. 2019;39:1646–1654.

34. Hestenes MR, Stiefel E, et al. Methods of conjugate gradients for solving

linear systems. J Res Nat Bureau Stand. 1952;49:409–436.

35. Feng L, Srichai MB, Lim RP, et al. Highly accelerated real-time cardiac

cine MRI using k-t SPARSE-SENSE. Magn Reson Imaging.2012.

36. Malik WQ, Khan HA, Edwards DJ, Stevens CJ. A gridding algorithm

for efficient density compensation of arbitrarily sampled Fourier-domain

data. In: IEEE/Sarnoff Symposium on Advances in Wired and Wireless

Communication, 2005, IEEE; 2005:125–128.

37. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP. Image quality assess-

ment: From error visibility to structural similarity. IEEE Trans Image

Process. 2004;13:600–612.

38. Wang Z, Simoncelli EP, Bovik AC. Multiscale structural similarity for

image quality assessment. In: The Thrity-Seventh Asilomar Conference

on Signals, Systems & Computers, 2003; IEEE; 2003;2:1398–1402.

39. Wang Z, Bovik AC. A universal image quality index. IEEE Signal Pro-

cess Lett. 2002;9:81–84.

40. Sheikh HR, Bovik AC. Image information and visual quality. IEEE

Trans Image Process. 2006;15:430–444.

41. Reisenhofer R, Bosse S, Kutyniok G, Wiegand T. A Haar wavelet-based

perceptual similarity index for image quality assessment. Signal Process.

2018;61:33–43.

42. MEA Muckley, Torch KB-NUFFT. https://github.com/mmuckley/torc

hkbnufft, 2019.

43. Muckley MJ, Stern R, Murrell T, Knoll F, TorchKbNufft: A High-level,

hardware-agnostic non-uniform fast fourier transform. ISMRM Work-

shop on Data Sampling & Image Reconstruction. 2020.

44. Pruessmann KP, Weiger M, B¨

ornert P, Boesiger P. Advances in sensitiv-

ity encoding with arbitrary k-space trajectories. Magn Reson Med.

2001;46:638–651.

45. Block KT, Uecker M, Frahm J. Undersampled radial MRI with multiple

coils. Iterative image reconstruction using a total variation constraint.

Magn Reson Med. 2007;57:1086–1098.

46. Wang Y, Ying L. Compressed sensing dynamic cardiac cine MRI using

learned spatiotemporal dictionary. IEEE Trans Biomed Eng.

2014;61:1109–1120.

47. Caballero J, Price AN, Rueckert D, Hajnal JV. Dictionary learning and

time sparsity for dynamic MR data reconstruction. IEEE Trans Med

Imaging. 2014;33:979–994.

48. Pali MC, Schaeffter T, Kolbitsch C, Kofler A. Adaptive sparsity level

and dictionary size estimation for image reconstruction in accelerated

2D radial cine MRI. Med Phys. 2020;48:178–192.

49. Schwab J, Antholzer S, Haltmeier M. Deep null space learning for

inverse problems: Convergence analysis and rates. Inverse Prob.

2019;35:025008.

50. Biswas S, Aggarwal HK, Jacob M. Dynamic MRI using model-based

deep learning and SToRM priors: MoDL-SToRM. Magn Reson Med.

2019;82:485–494.

SUPPORTING INFORMATION

Additional supporting information may be found online in

the Supporting Information section at the end of the article.

Data S1. The supplementary material contains the results

obtained by variying the hyper-parameters Mand nCG for our

proposed method as well as for the method using the 3D U-

Net. Further, a comparison of our proposed method with two

different configurations of hyper-parameters is reported.

Medical Physics, 48 (5), May 2021

2425 Koﬂer et al.: An Iterative CNN for 2D Radial Cine MRI 2425