scieee Science in your language
[en] (orig)
An end-to-end-trainable iterative network architecture for accelerated radial
multi-coil 2D cine MR image reconstruction
Andreas Koer
a)
Physikalisch-Technische Bundesanstalt (PTB), Braunschweig, Berlin 10587, Germany
Markus Haltmeier
Department of Mathematics, University of Innsbruck, Innsbruck 6020, Austria
Tobias Schaeffter
Physikalisch-Technische Bundesanstalt (PTB), Braunschweig, Berlin 10587, Germany
School of Imaging Sciences and Biomedical Engineering, Kings College London, London SE1 7EH, UK
Department of Biomedical Engineering, Technical University of Berlin, Berlin 10623, Germany
Christoph Kolbitsch
Physikalisch-Technische Bundesanstalt (PTB), Braunschweig, Berlin 10587, Germany
School of Imaging Sciences and Biomedical Engineering, Kings College London, London SE1 7EH, UK
(Received 24 December 2020; revised 11 February 2021; accepted for publication 18 February 2021;
published 1 April 2021)
Purpose: Iterative convolutional neural networks (CNNs) which resemble unrolled learned iterative
schemes have shown to consistently deliver state-of-the-art results for image reconstruction problems
across different imaging modalities. However, because these methods include the forward model in
the architecture, their applicability is often restricted to either relatively small reconstruction prob-
lems or to problems with operators which are computationally cheap to compute. As a consequence,
they have not been applied to dynamic non-Cartesian multi-coil reconstruction problems so far.
Methods: In this work, we propose a CNN architecture for image reconstruction of accelerated 2D
radial cine MRI with multiple receiver coils. The network is based on a computationally light CNN
component and a subsequent conjugate gradient (CG) method which can be jointly trained end-to-
end using an efficient training strategy. We investigate the proposed training strategy and compare
our method with other well-known reconstruction techniques with learned and non-learned regular-
ization methods.
Results: Our proposed method outperforms all other methods based on non-learned regularization.
Further, it performs similar or better than a CNN-based method employing a 3D U-Net and a method
using adaptive dictionary learning. In addition, we empirically demonstrate that even by training the
network with only iteration, it is possible to increase the length of the network at test time and further
improve the results.
Conclusions: End-to-end training allows to highly reduce the number of trainable parameters of and
stabilize the reconstruction network. Further, because it is possible to change the length of the net-
work at the test time, the need to find a compromise between the complexity of the CNN-block and
the number of iterations in each CG-block becomes irrelevant.© 2021 The Authors. Medical Physics
published by Wiley Periodicals LLC on behalf of American Association of Physicists in Medicine.
[https://doi.org/10.1002/mp.14809]
Key words: deep learning, inverse problems, magnetic resonance imaging, neural networks
1. INTRODUCTION
Magnetic resonance imaging (MRI) is an important tool for
the assessment of different cardiovascular diseases. Cardiac
cine MRI, for example, is used to assess the cardiac function
as well as left and right ventricular volumes and left ventricu-
lar mass.
1
However, MRI is also known to suffer from rela-
tively long data acquisition times. In cardiac cine MRI, the
data acquisition usually has to take place during a single
breathhold of the patient in order to avoid artifacts arising
from the respiratory motion. Since for patients with limited
breathhold capabilities this can be challenging, undersam-
pling techniques can be used to accelerate the measurement
process. Undersampling in Fourier domain, the so-called k-
space, leads to a violation of the NyquistShannon sampling
theorem and therefore, regularization techniques must be
used to reconstruct artifact-free images.
Recently, image reconstruction has experienced a para-
digm shift with the re-emergence of neural networks (NNs),
and in particular, convolutional NNs (CNNs) which can be
employed as regularization methods.
2
CNNs can be used in
different ways as regularization methods for image recon-
struction problems. A key element to categorize these meth-
ods is whether the forward operator is used in the learning
process.
3
A straightforward approach is to simply post-
process an initial estimate of the solution which is impaired
2412 Med. Phys. 48 (5), May 2021 0094-2405/2021/48(5)/2412/14
© 2021 The Authors. Medical Physics published by Wiley Periodicals LLC on
behalf of American Association of Physicists in Medicine. This is an open
access article under the terms of the Creative Commons Attribution
License, which permits use, distribution and reproduction in any medium,
provided the original work is properly cited.
2412
by noise and artifacts (see, e.g., Ref. [47]). Further,
cross-/multi-domain methods which pre-process the k-space
and subsequently process the initially obtained reconstruction
have been investigated as well.
8
However, the post-processed
image might lack data consistency, meaning that it is not
clear how well the post-processed image matches the mea-
sured k-space data. Thus, approaches which used the output
of a pretrained CNN as image-priors have been considered as
well (see, e.g., Ref. [9,10]). Thereby, the CNN-priors are used
in the formulation of a Tikhonov functional which is subse-
quently minimized to increase the data consistency of the
solution. These methods have the (computational) advantage
of decoupling the regularization step from increasing data
consistency and are thus also applicable to arbitrary large-
scale image reconstruction problems. However, the network
training is typically carried out in the absence of the forward
model and although this strategy was reported to be success-
ful, image reconstruction methods based on NNs have been
reported to possibly suffer from instabilities.
11
This issue
could directly affect the obtained CNN-prior and thus, affect
the quality of the final reconstruction, since its solution
depends on the CNN-prior. Interestingly, in Ref. [11], the
authors showed that the CNN-based reconstruction methods
which include the forward and adjoint operators in the net-
work architecture are the most stable with respect to small
perturbations and adversarial attacks. Further, including the
forward model in the network architecture has been reported
to lower the maximum error bound of the CNNs.
12
Thus, it is
desirable to find a way to include the physical model in the
CNN architecture, even for computationally expensive/large-
scale problems.
Methods in which the CNNs architectures resemble
unrolled iterative schemes of finite length are referred to as
variational/iterative/cascaded networks (see, e.g., Ref. [13-
19,43]). Thereby, these methods typically consist of CNN-
blocks and data consistency (DC)-blocks, which are given in
the form of gradient steps with respect to a data-fidelity term
(see, e.g., Ref. [14,19]), or as layers which implement the
minimizer of a functional involving a data-fidelity term (see,
e.g., Ref. [13,16,18]). While the CNN-blocks can be inter-
preted as learned regularizers, the DC-blocks utilize the mea-
sured undersampled k-space data to increase/ensure the data
consistency of the intermediate CNN outputs.
Unfortunately, including the forward and adjoint operators
in the CNN can at the same time represent the computational
bottleneck of these methods, since the forward operator as
well as its adjoint can be computationally expensive to evalu-
ate. As a consequence, the applicability of iterative networks
is currently still limited to either reconstruction problems
with easy-to-compute forward operators, for example, a FFT
sampled on a Cartesian grid
13,16,18,21
or to non-dynamic prob-
lems with non-Cartesian sampling schemes, see Ref. [22]for
a single-coil data acquisition or Ref. [23] for a multiple-coil
acquisition. For example, in Ref. [13,16,18] only a single-coil
is used for the encoding operator. In,
21
multiple receiver coils
were used for a dynamic 3D reconstruction problem with a
Cartesian sampling grid. For non-dynamic problems, in
contrast, iterative CNNs for multi-coil data acquisition on a
Cartesian grid have been received more attention (see, e.g.,
Ref. [14,15,24]).
Acquisition protocols using non-Cartesian sampling tra-
jectories such as radial or spiral sampling can be an attractive
alternative to standard Cartesian acquisitions, especially
because the arising undersampling artifacts are much more
incoherent compared to the ones arising from a Cartesian
acquisition.
25
However, radially acquired k-space data are
computationally more demanding to reconstruct as the for-
ward encoding operator involves gridding on a Cartesian grid
before the fast Fourier transform can be applied, see Ref. [26]
for more details. In,
22
an iterative network for a non-dynamic
radial single-coil acquisition was proposed. However, the
standard in the clinical routine is in fact given by multi-coil
data-acquisitions
27
through which the acquisition of the k-
space data increases in terms of computational complexity.
Finally, acquiring a dynamic process is inherently computa-
tionally more demanding as for each time point, the encoding
operator has to be applied to the image.
In this work, we present a novel computationally light and
efficient CNN-block architecture as well as a training strategy
which are tailored to image reconstruction of dynamic multi-
coil radial MR data of the heart. The combination of the pro-
posed network architecture and training strategy allows to
train the entire reconstruction network in an end-to-end man-
ner, even for dynamic problems with nonuniformly sampled
data as well as multiple receiver coils. To the best of our
knowledge, this is the first work which overcomes these
computational difficulties and presents such an end-to-end
trainable network architecture for dynamic non-Cartesian
multi-coil MR reconstruction problems.
The paper is structured as follows: In Section 2, we for-
mally introduce the reconstruction problem and the proposed
method by discussing in more detail the used CNN-block as
well as the training strategy. In Section 3, we show extensive
experiments to validate the efficacy of our approach and com-
pare with to different state-of-the-art methods for dynamic
cardiac radial MRI. We then discuss the main advantages and
limitations of our work in Section 4and conclude the work in
Section 5.
2. MATERIALS AND METHODS
2.A. Problem formulation
Let xNwith N¼NxNyNtdenote the vector rep-
resentation of a complex-valued cine MR image, that is,
x¼½x1,...,xNtT. The forward operator Amaps the dynamic
cine MR image to its corresponding k-space. In this work, we
focus on a 2D radial encoding operator using multiple recei-
ver coils. More precisely, the operator Ais given by.
A:¼INcEðÞC, (1)
where INcdenotes the identity operator and Cconsists of the
concatenation of the Ncdifferent coil-sensitivity maps which
are multiplied with the cine MR image, that is,
Medical Physics, 48 (5), May 2021
2413 Koer et al.: An Iterative CNN for 2D Radial Cine MRI 2413
C¼½C1,...,CNcT, with Cj¼diag cj,cj,...,cj

NNand
cjNxNy. The operator E¼diagðE1,...,ENtÞdenotes a
composition of 2D radial encoding operators Etwhich for
each temporal point tf1,,Ntgsample a 2D image
xtNxNyalong a non-Cartesian grid in the Fourier domain.
In particular, for this work, we consider trajectories given by
the radial golden-angle method
28
but point out that other tra-
jectories can be considered as well. By J¼1, ...,Nrad
fg
,we
denote the set of indices the k-space coefficients which are
needed to sample a 2D image xtat Nyquist limit. In order to
accelerate the image acquisition process, the set Jis typically
constructed by reducing the number of radial lines. We
denote by AIthe radial Fourier encoding operator which sam-
ples a complex-valued 2D cine MR image xat a radial grid
given by the indices in the set I¼I1...INtwith ItJfor
all t¼1, ...,Nt:Thus, the considered reconstruction problem
is given by
AIxþe¼yI, (2)
where yI¼½y1
I,...,yNc
ITwith yc
INtmrad ,forc¼1,,Nc
denotes the measured k-space data for the dynamic 2D
cine MR image xand eNkwith Nk¼Ntmrad Ncdenotes
random noise. Multiple receiver coils are typically used in
order to achieve mrad >Nradsuch that problem (2) is overde-
termined and can be solved by considering the normal
equations
AH
IAIx¼AH
IyI:(3)
System (3) could in principle be solved by conjugate gra-
dient (CG)-like methods yielding an approximation of the
solution given by.
x¼ðAH
IAIÞ1AH
IyI, (4)
where the configuration of the receiver coils ensures that the
operator AH
IAIis invertible. However, for non-Cartesian tra-
jectories, problem (3) is ill-conditioned. Thus, methods used
to approximate xcan exhibit a semi-convergence behavior
and lead to undesired noise amplification.
29
Hence, regular-
ization techniques must be used to stabilize the inversion pro-
cess.
In this work, we propose a reconstruction network based
on a fixed-point iteration which has the form.
uM
Θ¼fDC
λuΘ

...fDC
λuΘ

|fflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl}
Mtimes
¼ðfDC
λuΘÞM, (5)
where uΘis a CNN-block which reduces undersampling arti-
facts, and noise and fDC
λis a module which increases data
consistency of the intermediate outputs of the CNN-blocks.
Limits of Eq. (5) simultaneously satisfy data consistency
induced by the DC module and regularity induced by the
CNN-block.
For example, in the simplest case where fDC
λis defined as
the minimizer of a penalized least squares functional [see Eq.
(14)] and where uΘdefines a linear projection, the fixed-point
iteration Eq. (5) can be shown to converge to the minimizer
of the functional.
Fλ,yIxðÞ¼kAIxyIk2
2þλkxuΘxðÞk
2
2:(6)
Note that, similar to
17
but in contrast to,
13,14,16
the set of
parameters Θis the same for each CNN-block and therefore
allows a repeated application of the iterative network uM
Θin
general. Further, the proposed CNN-block differs from the
one in,
17
as we shall discuss later. Figure 1shows an illustra-
tion of the described reconstruction network.
In the following, we describe the CNN-block uΘand the
DC-module fDC
λin more detail.
2.B. Proposed CNN-block
In order to keep the notation as simple as possible, we
neglect the indices referring to the iteration within the net-
work. We illustrate the CNN-block for the first iteration of
the network, that is, when the input of the CNN-block is the
initial nonuniform FFT (NUFFT)-reconstruction. The process
described below is employed within every CNN block in the
proposed network architecture. Let Wdenote a diagonal
operator which contains the entries of the density compensa-
tion function in k-space. By xI:¼A#
IyI:¼AH
IWyI,we
denote the initial NUFFT-reconstruction which is obtained by
re-gridding the density-compensated measured k-space coef-
ficients onto a Cartesian grid and applying the inverse FFT
(IFFT). First, we compute a temporal average of the input
over all cardiac phases, that is,
νI¼1
Nt
Nt
t¼1
xI,tNxNy(7)
and stack it along the temporal axis, that is,
μI¼½νI,,νINxNyNt. Then, we subtract μIfrom the
initial NUFFT-reconstruction xIand apply a temporal Fourier
transform Ft, that is, we obtain z¼FtðxIμIÞ. Subtracting
the temporal average from image frames is a well-known and
established approach to sparsify image sequences, for exam-
ple in video-compression (see, e.g., Ref. [30]). Further, apply-
ing a temporal FFT provides an even sparser representation
and has been used for example in Refs. [18,31] for dynamic
MR image reconstruction. Thus, the CNN-block learns to
reduce the present undersampling artifacts and noise in a
sparse domain. We define the operators Rxt and Ryt which
rotate zby appropriately permuting the relevant axes of the
images such that
zxt ¼RxtzNyNxNt(8)
zyt ¼RytzðNxNyNtÞ:(9)
At this point, zxt and zyt can be interpreted as Nycom-
plex-valued images of shape NxNtand Nxcomplex-val-
ued images of shape NyNt, respectively. Then, a simple
2D U-Net,
32
which we denote by cΘis applied both to zxt
and zyt. Note that we use the same U-Net for both xt- and
yt-branches, that is, the weights are shared among the two.
This suggestion relies on the assumption of an isotropic
resolution in x- and y-direction and the fact that for a radial
sampling, the artifacts in xt-andyt-direction do not differ
Medical Physics, 48 (5), May 2021
2414 Koer et al.: An Iterative CNN for 2D Radial Cine MRI 2414
across the branches. For a Cartesian acquisition, where the
latter assumption is clearly not valid, one could simply
assign two different CNNs cΘxt and cΘyt to the two different
branches. Also, note that because the U-Net only consists
of convolutional and max-pooling layers and has no fully
connected layers, the approach can be used for the case
NxNyas well. Then, we obtain.
zxt
CNN ¼cΘzxt
ðÞ (10)
zyt
CNN ¼cΘzyt
ðÞ:(11)
and calculate the estimate zCNN by reassembling the pro-
cessed spatiotemporal slices, that is,
zCNN ¼1
2ðRxtÞTzxt
CNN þRyt
ðÞ
Tzyt
CNN

:(12)
The factor 1=2 in Eq. (12) is needed because every pixel is
used twice once in the xt-branch and once in the yt-
branch. Finally, the estimate xCNN is given by applying FH
t
and adopting a residual connection, that is,
xCNN ¼FH
tzCNN þμI:(13)
Figure 2shows an illustration of the just described
procedure. The 2D U-Net employed in our proposed network
consists of three encoding stages which are interleaved by
max-pooling layers. Each stage consists of a block of 2D con-
volutional layers with Leaky ReLU as activation function.
The number of initially extracted feature maps which is dou-
bled after each max-pooling layer is nf¼16 and is on pur-
pose chosen to be relatively small in order to keep the
models complexity as low as possible. The reason for that is
that the 2D U-Net will have to be used together with a (com-
putationally expensive) CG-block, which we describe in the
next subsection. In the decoding path, bilinear upsampling
followed by a 3 3 convolutional layer with no activation
function is used to upsample the images. Our 2D U-Net also
involves skip connections from the last to the first layers of
the corresponding stages in the encoding and the decoding
path. The residual connection is performed in image domain
rather than in Fourier domain.
2.C. DC-module
We illustrate the DC-module for the first iteration of the
network, that is, where the incoming input of the CNN-block
is given by the initial reconstruction. For later iterations, the
construction is analogous.
Let us fix xCNN ¼uΘðxIÞand consider the functional.
Fλ,yIxðÞ¼kW1=2AIxyI
ðÞk
2
2þλkxxCNN k2
2, (14)
where W1=2denotes the square root of the diagonal operator
which contains the entries of the density compensation func-
tion. For the special case where the operator Ais an isometry,
problem (14) has a simple closed-form solution (see, e.g.,
Ref. [13,16]). The involvement of Win functional (14) can be
motivated as a weighting factor which is used for precondi-
tioning the linear system which needs to be solved when min-
imizing Eq. (14).
33
This aspect is more clearly visible later in
Eq. (15). A simple example of Abeing an isometry is given
by a single-coil acquisition on a Cartesian grid using a simple
FFT. In this case, the solution of Eq. (14) can be obtained by
performing a linear combination of the measured k-space
data yIand the one estimated by applying AIto xCNN. See,
for example, Ref. [13], for more details. However, in the more
general case, a minimizer of Eq. (6) is obtained by solving a
system of linear equations. By setting the derivative of Eq.
(14) with respect to xto zero, it can be easily seen that the
system one needs to solve is given by Hx ¼bwith
H¼A#
IAIþλIN,
b¼A#
IyIþλxCNN:(15)
System (15) can be solved by means of any iterative algo-
rithm and, since the operator His symmetric, an appropriate
choice is the conjugate gradient (CG) method.
34
Note that
functional Eq. (14) is linear in xand therefore, due to strong
convexity, solving Eq. (15) leads to the unique minimizer of
Eq. (14). In practice, as we discuss later in the implementa-
tion details, the DC-module is an implementation of a finite
number of iterations of the CG method. We denote the num-
ber of such iterations by nCG. Note that in the CG method, the
operator Hhas to be applied at each iteration. In addition,
note that in general, the application of AIas well as A#
Ican
be computationally way more expensive than for a simple
FFT, for example, if the gridding of the k-space coefficients
is part of the operators as in our case. Thus, it is desirable to
have a CNN-block with as few trainable parameters as possi-
ble such that end-to-end training of the entire network is pos-
sible in a reasonable amount of time.
2.D. Training scheme
Because the solution of Eq. (15) has to be approximated
using an iterative scheme which employs the
FIG. 1. The proposed reconstruction network alternates between the application of CNN-blocks uΘand and data onsistency blocks fDC
λwhich increase the con-
sistency of the outputs of the intermediate CNN-blocks by making use of the measured data which are implicitly given by the initial reconstruction xI.
Medical Physics, 48 (5), May 2021
2415 Koer et al.: An Iterative CNN for 2D Radial Cine MRI 2415
(computationally expensive) application of the operator Hat
each iteration, end-to-end training of the entire reconstruction
network from scratch would be time-consuming. Thus, we
circumvent this issue by the following more efficient training
strategy.
First, in a pretraining step, we only train a single CNN-
block on image pairs as is typically done in deep learning-
based post-processing methods. More precisely, we minimize
the L2-norm of the error between the output of the CNN-
block which was predicted from the initial reconstruction xI
and its corresponding label. Then, in a second training stage,
we construct a network as in Eq. (5) and initialize each of the
CNN-blocks by the previously obtained parameter set Θand
perform a further fine-tuning of the entire network by end-to-
end training. Further, the regularization parameter λcon-
tained in each of the CG-blocks fDC
λcan be included in the
set of trainable parameters as well and is trained to find the
optimal strength of the contribution of the regularization
term. This means that it implicitly learns to estimate the noise
level present in the measured k-space data for the whole data-
set.
2.E. Experiments with in-vivo data
To evaluate the efficacy of our proposed approach, we per-
formed different experiments. First, we investigated the effect
of our proposed training strategy. We also compared the pro-
posed network with different configurations of hyper-parame-
ters Mand nCG. Finally, we further compared our proposed
approach to several other methods for non-Cartesian cardiac
cine MR image reconstruction. In the following, we provide
the reader with information about the used dataset, the meth-
ods of comparison and some details on the implementation
of the proposed method.
2.F. Dataset
We used a dataset of cine MR images of n¼19 subjects
(15 healthy volunteers +4 patients). For each healthy volun-
teer as well as for two patients, Nz¼12 different orientations
of cine MR images were acquired. For the resting two
patients, only Nz¼6 slices were acquired due to limited
breathhold capabilities. Thus, we have a total of 216 com-
plex-valued cine MR images. We split the dataset in 144/36/
36 images used for training, validation, and testing, where for
the test set, we use the 36 cine MR images of the patients in
order to be able to qualitatively assess the images with respect
to clinically relevant features.
All images were acquired using a balanced steady-state
free precession (bSSFP) sequence on a 1.5 T MR scanner
during a single breathhold of 10 s (repetition time/echo
time =3.0/1.5ms, flip angle 60°). The images used as
ground-truth data for the retrospective undersampling were
reconstructed from the k-space data which were sampled
along Nθ¼3400 radial spokes with kt-SENSE.
35
The coil-
sensitivity maps, which were calculated from low-resolution
images obtained from the central part of the radial k-space
data, were used to combine the different images to a single
FIG. 2. The proposed CNN-block. First, the input image xIis Fourier transformed along the temporal axis, then the Fourier-transformed image is reshaped into
the xt- and yt-domain. The same 2D U-Net uΘis applied to each of the 2D slices. Then, the estimate in the temporal Fourier domain is calculated by reassembling
the processed slices and the output is obtained by applying the inverse temporal Fourier transform and summing it up to the input image. [Color figure can be
viewed at wileyonlinelibrary.com]
Medical Physics, 48 (5), May 2021
2416 Koer et al.: An Iterative CNN for 2D Radial Cine MRI 2416
image after the NUFFT-reconstruction. The images have a
field of view of NxNy¼320 320 mm
2
with an in-plane
resolution of 2 mm
2
and a slice thickness of 8 mm. The num-
ber of acquired cardiac phases was Nt¼30. A 32-channel
cardiac-phased array coil was used for signal reception and
parallel image reconstruction. From these images, we retro-
spectively simulated k-space data by sampling Nθ¼560 and
Nθ¼1130 radial spokes. Since sampling along Nθ¼3400
already corresponds to an acceleration factor of approxi-
mately ~3 (with respect to the Nyquist limit), which was
needed to perform the scan during one breathhold, Nθ¼560
and Nθ¼1130 correspond to acceleration factors of ~18 and
~9, respectively.
Further, the k-space data were corrupted by normally dis-
tributed noise with standard variation σ¼0:02 which was
added to each the real and imaginary parts of yc
Ifor
c¼1,,12 after having centered them. The calculation of
the density compensation function for a fixed set of trajecto-
ries was based on partial Voronoi diagrams.
36
2.G. Quantitative measures
We evaluated the performance of our proposed network
architecture in terms of different error- and image-similarity-
based measures. The peak signal-to-noise ratio (PSNR) and
the normalized root mean squared error (NRMSE) are used
as error-based measures. Further, we report a variety of differ-
ent similarity-based measures: the structural similarity index
measure (SSIM),
37
the multi-scale SSIM (MS-SSIM),
38
the
universal image quality index (UQI),
39
the visual information
quality measure (VIQM),
40
and the Haar wavelet-based simi-
larity index measure.
41
We calculate all measures by comparing the 2D complex-
valued images in the xy-plane for each time point. This means
that our test set consists of 1080 2D images. For the calcula-
tion of the similarity measures, the real and imaginary part of
the images are treated as channels. The similarity-based mea-
sures are calculated for each channel separately and then aver-
aged across the two channels. The statistics were calculated
over a region of interest of 160 160 pixels in order to dis-
card background noise. Further, we segmented the patients in
the images of the test set in order for the statistics to reflect
the achieved performance on regions of interest.
2.H. Implementation details
The architecture was implemented in PyTorch. Complex-
valued images were stored as two-channeled images. The for-
ward and the adjoint operators AIand AH
Iwere implemented
using the publicly available library Torch KBNUFFT
42,43
which also allows to perform back-propagation across the for-
ward and adjoint operators. During training, the k-space tra-
jectories, the coil-sensitivity maps, and the density
compensation functions were stored as tensors for the imple-
mentation of AIand AH
I. Note that we make the regularization
parameter λtrainable such that a trade-off between the mea-
sured k-space and the output of the CNN-blocks is learned.
In order to constrain λto be strictly positive, during the fine-
tuning of the model, we apply a Softplus activation function,
that is, we set λ:¼SoftPlusð~
λÞ¼1
βðlogð1þexpðβ~
λÞÞ, which
maps ~
λto the interval ð0,Þ. We used the default parameter
β¼1. Note that a CG scheme is usually stopped after a cer-
tain stopping criterion is met. A commonly used stopping cri-
terion is if the norm of the newly calculated residual
rk¼Hxkbkis small enough, that is, krkk2TOLkbk2for
a tolerance TOL chosen by the user. During fine-tuning the
iterative network, we fix the number of CG iterations nCGbut
when testing the network on unseen data, we can choose to
use the number of iterations the CNN was fine-tuned with or
set an own stopping criterion. All experiments were per-
formed on an NVIDIA GeForce RTX 2080 with 11 GB
memory.
2.I. Comparison with other methods
We compared our proposed approach to the following
methods which employ recently published learned and well-
established non-learned regularization methods. As well-
established reconstruction methods, we applied iterative
SENSE,
44
a Total Variation (TV)-minimization method,
45
and kt-SENSE.
31
Further, we compared our proposed method
to a method based on dictionary learning (DL) and sparse
coding (SC)
46,47
using adaptive DL and adaptive SC
48
and a
method which employs CNN-based regularization in the form
of previously obtained image priors,
9,10
where we used a pre-
viously trained 3D U-Net
6
for obtaining the prior.
Note that on purpose we did not compare our proposed
approach with other CNN-based methods involving genera-
tive adversarial networks (GANs). The reason is that we are
mainly interested in the performance of the combination of
the proposed CNN-block in terms of artifacts-reduction as
well as the trade-off between employing a CG-module or not.
In addition, note that if the hardware allows it, it is always
possible to add different components based on GANs to regu-
larize the output of the CNN-blocks. Further, note that
although there exist several other state-of-the-art methods
using cascaded/iterative networks for dynamic cine MRI, see
for example Ref. [13,16,18] the underlying reconstruction
problem is a different one (single-coil and Cartesian vs. mul-
ti-coil and radial) and thus these methods are not directly
applicable as originally published.
3. RESULTS
In the following, we report the obtained results concerning
the training behavior and the reconstruction performance of
our proposed method.
3.A. Computational complexity of the forward and
adjoint operators
Here, we evaluated the computational complexity of the
proposed network architecture in terms of required GPU
memory as well as training times which can be estimated for
Medical Physics, 48 (5), May 2021
2417 Koer et al.: An Iterative CNN for 2D Radial Cine MRI 2417
the end-to-end training stage. Here, we fixed the number of
iterations of the network to be M¼1 and the CNN-block was
fixed to be the identity cΘ¼IN, that is, it contains no train-
able parameters. Thus, the allocated memory can be mainly
attributed to the tensors needed for the radial trajectories, the
density compensation, the coil-sensitivity maps which define
the operator, and the considered input image.
Figure 3shows the allocated GPU memory as well as the
required time to perform one weights-update by performing a
forward- and a backward-pass through the entire reconstruc-
tion network. By Nθwe denote the total number of radial
spokes which are acquired for each of the Ntcardiac phases.
The first row of Fig. 3shows the average allocated GPU
memory as well the required time to perform one weights-up-
date depending on the spatial image size. This is shown for
nCG ¼1 and different numbers of radial spokes Nθ. It can be
seen that already for nCG ¼1, the required GPU memory
amounts to approximately 5121024 GB and performing one
step of back-propagation is in the range of 4 s. The second
row of Fig. 3shows the same quantities for fixed Nθ¼560
and different nCG. Here, we also see that employing a rela-
tively high number of CG iterations, say nCG ¼12, almost
requires 2 GB of GPU memory and, more importantly,
requires more than 30 s. Training the entire network in this
configuration for 100 epochs would, for example, already
require 5 days. By that, one can estimate that training the
entire network from scratch in an end-to-end manner could
easily amount to weeks or months. Further, the required time
does not vary much for different image sizes, meaning that
even for relatively small reconstruction problems, the applica-
tion of AIand AH
Iis inherently computationally demanding.
These computational aspects highlight the importance of
the efficacy of the CNN-block in terms of having a small
number of parameters to ensure fast convergence during
training and at the same time a good performance in terms of
undersampling artifacts-reduction.
3.B. Efcacy of the training scheme
Here, we show the impact of including the forward and
the adjoint operators AIand AH
Iin the network architecture
during the learning process. Figure 4shows the training and
validation error of the proposed CNN-module during the pre-
training stage. In the first pretraining stage, only one single
CNN-block was trained to minimize the L2-error between the
output estimated by the CNN-block uΘand its corresponding
label. In the fine-tuning stage, a CG-module with nCG ¼8
iterations was attached to the CNN-block. As can be seen, in
the pretraining stage, after about 65000 weight updates
(which corresponds to approximately 450 epochs using a
mini-batch size of one), training and validation error start to
stagnate between 0.025 and 0.30, respectively. After
FIG. 3. Computational complexity and the time required for one backward pass for the operator Hwhich is employed in the DC-module for different spatial
image sizes NxNy. The number of temporal points was set to Nt¼30 and the number of coils was Nc¼12 [Color figure can be viewed at wileyonlinelibrary.
com]
Medical Physics, 48 (5), May 2021
2418 Koer et al.: An Iterative CNN for 2D Radial Cine MRI 2418
pretraining, the parameter set Θwas stored and the fine-tun-
ing of the entire network was carried out by initializing the
set of parameters Θas the one obtained after pretraining. As
can be seen from the orange curves, including the GC-mod-
ule which employs the encoding operators in the CNN archi-
tecture allowed to further reduce the training as well as the
validation error and therefore to obtain a more suitable
parameter set Θ.
Note that since the fine-tuning stage is much more compu-
tationally demanding than the pretraining stage, we only
trained for 150 epochs and tested on the whole training and
validation dataset only 12 times instead of 75 times as in the
pretraining stage. The time for pretraining the CNN-module
amounted to approximately 6 hr, while fine-tuning the entire
network took approximately 5 days.
Figure 5shows the effect of the proposed training strategy
on the obtained results. Figure 5(e) shows the initial NUFFT-
reconstruction which is directly obtained from the undersam-
pled k-space data. After having pretrained the single block uΘ,
the image in Fig. 5(a) is obtained by passing the input to the
CNN-block. Further, proceeding in the reconstruction network
with the CG-bock yields the image shown in Fig. 5(b) for
which the point-wise error is lower than for (a). Figure 5(c)
shows the output of the CNN-block after having fine-tuned the
entire network architecture, that is, a CNN-block with attached
CG-block, by end-to-end training. By comparing (c) to (a), we
see that the quality of the output of the CNN-block has clearly
increased as the point-wise error has been further reduced. Fur-
ther, proceeding with the CG-module in the network further
reduces the point-wise errors as can be seen in (d). Again, by
comparing the final reconstructions (d) and (b), we see that the
quality of the reconstruction has further increased as the point-
wise error has decreased. Note that by taking in consideration
the stagnation of the training and validation error shown in
Fig. 4, we can indeed attribute the increase in performance of
the reconstruction method to the fact that we included the for-
ward and adjoint operators in the training process and rather
than to the additionally performed weight-updates. Further, we
have experimentally confirmed the efficacy of the proposed
training procedure.
Table Ilists the quantitative measures obtained on the test
set for the intermediate output of the CNN-block as well as for
the final reconstruction before and after fine-tuning the entire
network for two different acceleration factors. The table well-
reflects the visual results in the sense that the intermediate out-
put of the CNN as well as the final reconstruction after the
fine-tuning stage surpass their corresponding image estimates
after the sole pretraining in terms of all reported measures.
3.C. Variation of the hyper-parametersMandnCG
As already mentioned, at test time, one does not necessar-
ily need to stick to the configuration of the network in terms
of length Mand number of CG-iterations nCG which were
used for fine-tuning the network. In particular, for the fine-
tuning stage, the choice of Mand nCG is mainly driven by
factors as training times and hardware constraints which do
not play a role at test time.
Thus, we performed a parameter study where at test time,
we varied the length of the network Mas well as the number
of CG-iterations nCG. We repeated these experiment for two
different configurations of our proposed network. We fine-
tuned one network with M¼3 and nCG ¼2 and another one
with M¼1 and nCG ¼8. Due to hardware constraints, the
networks also differ in terms of number of trainable parame-
ters of the CNN-block.
At test time, we fixed nCG ¼4 and varied Mfrom
M¼2,4,,12. With this configuration, the noise and the
artifacts are gradually reduced and data consistency of the
solution is increased several times during the whole recon-
struction process. In Tables S1 and S2 in the supplementary
material, one can see that increasing Mconsistently further
improved the results for both networks. Further, the compar-
ison of the two networks in Fig. S1 in the supplementary
material shows that the network containing more trainable
parameters (i.e., the one which was fine-tuned with M¼1
and nCG ¼8) surpasses the other one with respect to all mea-
sures. This observation motivates the choice of the final
reconstruction network used for comparison with other meth-
ods in the next subsection.
3.D. Reconstruction results
Here, we compare our reconstruction method with the
image reconstruction methods previously introduced. As dis-
cussed in Section 3.C we chose to use M¼12 and nCG ¼4
although the network was trained with M¼1 and nCG ¼8.
Note that, since the same strategy seemed not to be consis-
tently useful for the 3D U-Net which was trained without the
integration of the encoding operators, we report here the
0 10000 20000 30000 40000 50000 60000 70000 80000
Back-Propagations
0.020
0.025
0.030
0.035
0.040
Average L2-error
Training-Error (Pre-Training)
Validation-Error (Pre-Training)
Training-Error (Fine-Tuning)
Validation-Error (Fine-Tuning)
FIG. 4. Training behavior of the proposed network architecture. The lines
indicate the L2-error on the training data (solid lines) as well as on the valida-
tion data (dashed lines) during pretraining of the CNN-block (green lines)
and fine-tuning of the entire network (orange lines). As can be seen, in the
fine-tuning stage, where the physical models AIand AH
Iare included in the
network architecture, the error is further reduced on both the training and the
validation data. Further, the gap between training and validation errors
becomes smaller. [Color figure can be viewed at wileyonlinelibrary.com]
Medical Physics, 48 (5), May 2021
2419 Koer et al.: An Iterative CNN for 2D Radial Cine MRI 2419
values for M¼1 and nCG ¼12 which are also the ones used
in Ref 10. In the supplementary material, the results for the
variation of Mand nCG can be found as well. The regulariza-
tion parameter for the dictionary learning method
48
and the
CNN-based regularization method
10
was chosen as λ¼1.
Figure 6shows an example of the results obtained with
the previously described methods of comparison and our pro-
posed approach. As can be seen, the total variation-minimiza-
tion method as well as kt-SENSE successfully removed the
undersampling artifacts but also led to a loss of image details
as is indicated by the yellow arrows. In contrast, all learning-
based method yielded a good reconstruction performance in
terms of preservation of the cardiac motion. We see that our
proposed method is in addition the one which best reduced
residual image noise in the images. Table II shows the results
achieved in terms of the chosen measures. The best achieved
results are highlighted as bold numbers. Again, the experi-
ments were repeated for two different undersampling (i.e.,
acceleration) factors, given by sampling the k-space along
Nθ¼560 and Nθ¼1130 radial spokes, respectively.
The numbers well-reflect the observations from Fig. 6and
we see that all methods based on regularization methods
employing learning-based methods consistently outperform
the methods using hand-crafted regularization methods with
respect to all measures. All three reported methods using
machine learning-based regularization achieve competitive
results, where we see that our proposed method yields sub-
stantially better results compared to the dictionary learning
(DL)-based method and the other CNN-based method in
terms of error-based measures. In terms of image similarity-
based measures, the difference between the dictionary learn-
ing-based method and ours becomes less prominent except
for VIQP. Interestingly, the 3D-U-Net-based method is con-
sistently surpassed either by the dictionary learning-based
method or our proposed one. All observations are consistent
among both acceleration factors with Nθ¼560 and
Nθ¼1130. In addition, note that while the obtained results
for DL, the 3D U-Net-based iterative reconstruction and our
proposed approach are similar, the average reconstruction
time using DL amounts to approximately 2000 s, where the
most computationally intensive part is the repeated sparse
coding of the image patches at each iteration. However, the
implementation of the dictionary learning and sparse coding
algorithms aITKrM and aOMP is currently only available for
running on the CPU and thus, a further acceleration could be
expected. In contrast, for the CNN-based methods, the
(a) (c)
(b) (d)
(e) (f)
FIG. 5. Intermediate results and point-wise error images of our reconstruction method after pretraining and after fine-tuning our proposed method with M¼1
and nCG ¼8. The output of the CNN-block after pretraining (a), the final reconstruction (b) after pretraining where (a) was the output of the CNN-block, the out-
put of the CNN-block after fine-tuning the entire cascade in an end-to-end manner (c) and the final reconstruction after fine-tuning (d) where (c) was the output
of the CNN-block, the initial NUFFT-reconstruction xIobtained from Nθ¼560 radial spokes (e) and the corresponding ground-truth image obtained by kt-
SENSE with Nθ¼3400 radial spokes (f). We see that after having fine-tuned the entire network, the point-wise error of the final reconstruction is the smallest.
Further, the cardiac motion is much better preserved as is pointed out by the yellow arrows. [Color figure can be viewed at wileyonlinelibrary.com]
Medical Physics, 48 (5), May 2021
2420 Koer et al.: An Iterative CNN for 2D Radial Cine MRI 2420
reconstruction time amounts to approximately 12 and 48 s
which are mainly coming from the CG-module. Since one
iteration in the CG-block takes approximately 1 s mainly
because of the evaluation of the operator H¼A#
IAIþλIN,
the overall reconstruction time of our proposed method can
be estimated as tRECMnCG. Further, note that our proposed
method quite consistently surpasses the 3D U-Net-based
reconstruction method even if it only contains around 9:1%
of the trainable parameters.
4. DISCUSSION
In this work, we have proposed the first end-to-end train-
able iterative reconstruction network for dynamic multi-coil
MR image reconstruction using nonuniform sampling
schemes. As we have seen, the proposed end-to-end trainable
reconstruction network provides a competitive method for
image reconstruction for 2D cine MR image reconstruction
with non-Cartesian multi-coil encoding operators. In the fol-
lowing, we highlight several advantages and limitations of
our proposed approach and put it in relation to other works.
4.A. End-to-end trainability
Since the considered forward model is computationally
demanding, methods as
9,10
can be used as an alternative,
where the generation of the CNN output and the step of
increasing data consistency are decoupled from each other.
However, the major advantage of our proposed network over
methods similar to Refs. [9]or[10], is that the reconstruction
network is trained in and-end-to-end manner. As can be seen
in Fig. 4in Section 3.B, including the DC-module given by
the CG method in the network architecture is highly benefi-
cial since it further reduces the training and validation error
and also reduces the gap between the both, leading to a better
generalization power. This result experimentally confirms the
theory on the achievable performance of the reconstruction
network derived in.
12
Further, we demonstrated that the pro-
posed training strategy is a viable option for training the
entire network in an end-to-end manner in a relatively short
amount of time while achieving good convergence properties
of the network parameters.
4.B. The choice of the CNN-block
As we have seen in Fig. 4, the proposed CNN-blocks
trainable parameters converge relatively fast (approximately
6 hr) to a set of suitable trainable parameters in the pretrain-
ing stage. Training the 3D U-Net, in contrast, took approxi-
mately 3 days. Because the proposed CNN-block is trained
relatively fast, fine-tuning the entire network is possible by
investing approximately 5 days of further training.
Note that our proposed approach transfers the learning of
the artifact-reduction mapping to from 3D to 2D by the appli-
cation of 2D convolutional layers in the temporally Fourier-
transformed spatiotemporal domain. Thus, it inherits all ben-
efits method presented in Ref. [7]. The most important prop-
erty is that due to the change of perspective on the data, for
each single cine MR image, NxþNysamples are actually
considered to train the CNN-block, which is essential for
training. Since only a relatively small amount of training data
in terms of number of subjects already suffices for a success-
ful training, the end-to-end-training, which is particularly
computationally expensive during the fine-tuning stage, can
be carried out in a reasonable amount of time without the
need to additionally augment the dataset in order to prevent
overfitting.
Note that, of course, if enough training data are available
and the hardware constraints allow it, training the network
with computationally heavier CNN-blocks is possible as well.
Thus, the proposed method can also be seen as a general
method for training an end-to-end reconstruction network for
large-scale image reconstruction problems with computation-
ally demanding forward operators.
4.C. The trade-off between the hyper-parametersM
andnCG
From the formulation of the network architecture in Eq.
(5), we can identify two main hyper-parameters which can be
varied and determine the nature of the proposed reconstruc-
tion algorithm. The overall number of iterations Mdefines
the length of the unrolled iterative scheme. Further, because
in general, minimizing functional (14) requires solving a lin-
ear system using an iterative solver, the number of iterations
to approximate the solution of Eq. (14), here named nCG, has
TABLE I. Intermediate and final reconstruction results after pretraining of the
CNN-block and after fine-tuning of the entire network architecture with
M=1 and nCG.=8. The results are shown for =Nθ560 and =Nθ1130
radial spokes.
After pretraining After fine-tuning
CNN-
prior
Final
reconstruction
CNN-
prior
Final
reconstruction
Number of Radial Spokes:Nθ¼560
PSNR 42.4541 45.4826 44.484 46.0036
NRMSE 0.1252 0.0874 0.0963 0.0810
SSIM 0.9576 0.9796 0.9833 0.9872
MS-
SSIM
0.9916 0.9966 0.9965 0.9977
UQI 0.8027 0.8776 0.9064 0.9275
VIQP 0.9431 0.9506 0.9429 0.9434
HPSI 0.9889 0.9941 0.9901 0.9943
Number of Radial Spokes:Nθ¼1130
PSNR 43.9893 47.0087 46.2383 47.7545
NRMSE 0.1044 0.0735 0.0808 0.0673
SSIM 0.9698 0.9854 0.9873 0.9903
MS-
SSIM
0.9945 0.9977 0.9974 0.9984
UQI 0.8324 0.8998 0.9209 0.9408
VIQP 0.9574 0.9642 0.9604 0.9626
HPSI 0.9916 0.9961 0.9930 0.9962
Medical Physics, 48 (5), May 2021
2421 Koer et al.: An Iterative CNN for 2D Radial Cine MRI 2421
to be chosen as well. By setting M¼1 and nCG relatively
"high," say nCG ¼12, one aims at constructing an end-to-end
trainable network which conceptually resembles methods
like.
9,10
There, the CNN-block is only applied once to obtain
a CNN-based image-prior and functional (14) is minimized
until convergence of the iteration. In contrast, one can also
set M>1, but because of hardware constraints, one necessar-
ily also has to either lower nCG in each CG-block, reduce the
complexity of the CNN-blocks, or in the worst case, both.
Lowering the number of CG-iterations nCG causes the solu-
tion of Eq. (14) most probably to only be poorly approxi-
mated and lowering the CNN-blocks complexity can be
expected to deliver poorer intermediate outputs of the differ-
ent CNN-blocks in terms of artifacts-reduction.
Interestingly, we have observed that even fine-tuning the
entire network with M¼1 and nCG ¼8 allowed to change the
hyper-parameters at test time by further obtaining a boost in
performance, see the supplementary material for more details.
We also trained an iterative network architecture with our
proposed CNN-block for M¼3 and nCG ¼2. Due to
hardware constraints, we employed smaller 2D U-Nets as
CNN-blocks, where, different to before, we set the initial
number of applied filters to nf¼4. The so-constructed net-
work consisted of only 5908 trainable parameters, that is,
only about 0.57%of the 3D U-Net.
In the supplementary material, one can see a comparison of
our method fine-tuned with M¼1andnCG ¼8againstM¼3
and nCG ¼2 which were evaluated with different configurations
of Mand nCG at test time. The network which was fine-tuned
with M¼1andnCG ¼8 consistently outperforms the other with
respect to all measures. Altough the comparison is not entirely
fair since the number of trainable parameters highly differs from
one CNN-block to the other, this result is important because of
the following reason. It suggests that sacrificing expressiveness
of the CNN-block in terms of trainable parameters in order be
able to fine-tune with M>1 seems not to be necessary since
also for the network fine-tuned with M¼1andnCG ¼8, differ-
ent Mand nCG can be used at test time and further improve the
results. Interestingly, we were not able to observe this phe-
nomenon for the 3D U-Net as can be seen from Table S3 in the
FIG. 6. Comparison of images and point-wise errors for different reconstruction methods with learned and non-learned regularization for Nθ¼560. Left column:
Classical iterative reconstruction methods with from top to bottom direct NUFFT-reconstruction, iterative SENSE,
44
total variation-minimization
(TV),
45
kt-SENSE.
31
Right column: Learning-based regularization methods with from top to bottom adaptive dictionary learning and sparse coding
(DIC),
48
CNN-based iterative reconstruction (CNN +IR)
10
using a 3D U-Net,
6
and our proposed approach. The yellow arrows in the images of the left column
point at errors occurring at regions of the heart which might affect the assessment of the cardiac motion while the ones in the right column point at residual image
noise. [Color figure can be viewed at wileyonlinelibrary.com]
Medical Physics, 48 (5), May 2021
2422 Koer et al.: An Iterative CNN for 2D Radial Cine MRI 2422
supplementary material. Thus, we attribute this property to the
fine-tuning stage in which the encoding operator is included in
the network architecture which again highlights the importance
of employing a CNN-block which allows end-to-end training of
the entire network in a reasonable amount of time.
4.D. Limitations
For the proposed method, training times tend to be quite
long, amounting to several days. This makes the algorithmic
development challenging in terms of hyper-parameter tuning
and limits the possibility to draw general conclusions because
repeating experiments for different parameter configurations
is prohibitive. Nevertheless, the proposed training scheme
shows that it is easily possible to outperform post-processing
methods as in
6,7
or the so-called model-agnostic approaches
in Refs. [9,10] where training of the CNN-block is decoupled
from the subsequent minimization of a CNN-regularized
functional.
Further, although we have seen that it is possible to increase
Mat test time and observe an increase in performance in terms
of reconstruction results, from a theoretical point of view, it
remains somewhat unclear why this is exactly possible. We
believe that the reason for this lies in the end-to-end training
the entire network but leave a rigorous theoretical investigation
and a convergence analysis as future work.
4.E. Differences and similarities to other works
Our presented approach shares similarities across different
works. First, for the case M¼1, it can be seen as an exten-
sion of the approaches presented in Refs. [7,9,49] in the sense
that we only generate only one CNN-based image-prior
which is used in a Tikhonov functional. However, because of
the proposed end-to-end training strategy, the obtained CNN-
based image-prior tends to be a much better estimate of the
ground-truth image compared to the cases when the training
of the CNN-block is decoupled. Further, for M>1, the struc-
ture of the network is similar to,
13,17
with the difference that
first of all, the considered inverse problem is different (radial
multi-coil instead of Cartesian single-coil) and thus the DC-
module is a CG-block instead of the implementation of a
closed-form solution to Eq. (14). Second, our CNN-block
consists of a spatiotemporal 2D U-Net which is applied in the
Fourier-transformed spatiotemporal domain.
In our work, the 2D U-Nets are applied in the temporally
Fourier-transformed spatiotemporal domain and thus use the
same change of perspective on the data as in.
7
However, in
Ref. [7] the slices extraction and reassembling process is not
part of the network architecture and thus does not allow end-
to-end training. Further, we apply the U-net after having per-
formed a temporal FFT, similar to Ref. [18].
In Ref. [50], where a non-Cartesian multi-coil dynamic
acquisition is considered, the measured k-space data are first
interpolated onto a Cartesian grid. After this interpolation, a
simple FFT can be used as forward operator and thus facili-
tates the construction of an iterative network. However,
because the k-space data interpolation is decoupled from the
network, the network cannot learn to compensate for the
interpolation errors. Thus, in order to really utilize the mea-
sured k-space data, applying the actual encoding operator
(which involves the gridding-step) which is associated with
the considered image reconstruction problem is unavoidable.
In contrast, in our approach, the gridding of the measured
k-space is part of the network architecture. This important
difference is the main motivation of the work since due to the
TABLE II. Quantitative results for Nθ¼560Nθ¼1130 radial spokes which corresponds to an acceleration factor of ~9and 18, respectively.
Non-learned regularization Learned regularization
NUFFT It SENSE TV kt-SENSE DL 3D U-Net +IR Proposed
Number of radial spokes:Nθ¼560
PSNR 34.9542 40.5610 41.8778 44.2316 45.6085 46.0845 47.4396
NRMSE 0.2913 0.1535 0.1313 0.0986 0.0844 0.0800 0.0697
SSIM 0.8843 0.9686 0.9786 0.9820 0.9885 0.9880 0.9895
MS-SSIM 0.9762 0.9935 0.9950 0.9958 0.9980 0.9979 0.9982
UQI 0.7574 0.8827 0.8889 0.8887 0.9347 0.9288 0.9388
VIQP 0.8339 0.8379 0.8234 0.8931 0.9256 0.9340 0.9620
HPSI 0.9456 0.9811 0.9849 0.9904 0.9950 0.9949 0.9961
Number of radial spokes:Nθ¼1130
PSNR 39.0289 43.8077 44.1841 46.0096 47.4373 47.5094 48.6761
NRMSE 0.1864 0.1067 0.0998 0.0817 0.0682 0.0681 0.0606
SSIM 0.9378 0.9778 0.9841 0.9875 0.9913 0.9906 0.9916
MS-SSIM 0.9892 0.9965 0.9965 0.9976 0.9986 0.9985 0.9986
UQI 0.8363 0.9156 0.9028 0.9224 0.9484 0.9420 0.9480
VIQP 0.9228 0.9513 0.8832 0.9498 0.9521 0.9547 0.9695
HPSI 0.9775 0.9923 0.9903 0.9943 0.9972 0.9966 0.9972
Parameters –– 9664 1 024 224 93 617
Backend GPU CPU CPU CPU CPU/GPU GPU GPU
Medical Physics, 48 (5), May 2021
2423 Koer et al.: An Iterative CNN for 2D Radial Cine MRI 2423
computational difficulties linked with the integration of the
nonuniform FFT in the network architecture, a computation-
ally light CNN-block has to be used.
Although the method shares similar features to other works,
this is to the best our knowledge the first work to combine
several components to construct a network architecture which is
trainable in an end-to-end manner for a dynamic nonuniform
multi-coil MR image reconstruction problem by actually using
the radially acquired data and not the one interpolated onto a
Cartesian grid. In fact, we believe that the large scale of the con-
sidered problem is the reason for the lack of end-to-end trainable
reconstruction networks for dynamic non-Cartesian data acquisi-
tion protocols using multiple receiver coils.
5. CONCLUSION
In this work, we have proposed a new end-to-end trainable
data-consistent reconstruction network for accelerated 2D
dynamic MR image reconstruction with nonuniformly sam-
pled data using multiple receiver coils. Further, since end-to-
end training is computationally expensive because of the for-
ward and the adjoint operators are included in the network,
we have proposed and investigated an efficient training strat-
egy to circumvent this issue. In addition, we have compared
our method to other well-established iterative reconstruction
methods as well as several methods based on learned regular-
ization. Our proposed method surpassed all methods using
non-learned regularization methods and achieved competitive
results compared to a dictionary learning-based method and a
method based on CNN-based image-priors. Although the
method was presented for 2D radial cine MRI, we expect the
training strategy to be applicable to general image reconstruc-
tion problems with computationally expensive operators as
well. Further, we expect the proposed CNN-block to be appli-
cable to arbitrary reconstruction problems with a time com-
ponent and temporal correlation.
CONFLICT OF INTEREST
The authors have no conflict to disclose.
DATA AVAILABILITY STATEMENT
The implementation of the encoding operators AI,AH
I,as
well as the proposed CNN-block uΘand the entire recon-
struction network uM
Θare available on https://github.com/kof
lera/DynamicRadCineMRI/.
a)
Author to whom correspondence should be addressed. Electronic mail: an-
REFERENCES
1. Puntmann VO, Valbuena S, Hinojar R, et al. Society for Cardiovascular
Magnetic Resonance (SCMR) expert consensus for CMR imaging end-
points in clinical research: Part I-analytical validation and clinical quali-
fication. J Cardiovasc Magn Reson. 2018;20:67.
2. Wang G, Ye JC, Mueller K, Fessler JA. Image reconstruction is a new
frontier of machine learning. IEEE Trans Med Imaging.
2018;37:12891296.
3. Ongie G, Jalal A, Metzler CA, Baraniuk RG, Dimakis AG, Willett R.
Deep learning techniques for inverse problems in imaging. IEEE J Select
Areas Inform Theory. 2020;1:3956
4. Jin KH, McCann MT, Froustey E, Unser M. Deep convolutional neural
network for inverse problems in imaging. IEEE Trans Image Process.
2017;26:45094522.
5. Sandino CM, Dixit N, Cheng JY, Vasanawala SS. Deep convolutional
neural networks for accelerated dynamic magnetic resonance imaging,
Proceedings of 31st Conference of Neural Information Processing Sys-
tems (NIPS), Medical Imaigng meets NIPS Workshop; 2017. Online.
Available at http://www.doc.ic.ac.uk/bglocker/public/mednips2017/med
nips_2017_paper_19.pdf).
6. Hauptmann A, Arridge S, Lucka F, Muthurangu V, Steeden JA. Real-
time cardiovascular MR with spatio-temporal artifact suppression using
deep learningproof of concept in congenital heart disease. Magn Reson
Med. 2019;81:11431156.
7. Kofler A, Dewey M, Schaeffter T, Wald C, Kolbitsch C. Spatio-temporal
deep learning-based undersampling artefact reduction for 2D radial cine
MRI with limited training data. IEEE Trans Med Imaging.
2020;39:703717.
8. El-Rewaidy H, Fahmy AS, Pashakhanloo F, et al. Multi-domain convolu-
tional neural network (MD-CNN) for radial reconstruction of dynamic
cardiac MRI. Magn Reson Med. 2020;85:11951208.
9. Hyun CM, Kim HP, Lee SM, Lee S, Seo JK. Deep learning for under-
sampled MRI reconstruction. Phys Med Biol. 2018;63:135007.
10. Kofler A, Haltmeier M, Schaeffter T, et al. Neural networks-based regu-
larization for large-scale medical image reconstruction. Phys Med Biol.
2020;65:135003.
11. Antun V, Renna F, Poon C, Adcock B, Hansen AC. On instabilities of
deep learning in image reconstruction and the potential costs of AI. Proc
Nat Acad Sci. 2020;117:3008830095.
12. Maier AK, Syben C, Stimpel B, et al. Learning with known operators
reduces maximum error bounds. Nature Mach Int. 2019;1:373380.
13. Schlemper J, Caballero J, Hajnal JV, Price AN, Rueckert D. A deep cas-
cade of convolutional neural networks for dynamic MR image recon-
struction. IEEE Trans Med Imaging. 2018;37:491503.
14. Hammernik K, Klatzer T, Kobler E, et al. Learning a variational network
for reconstruction of accelerated MRI data. Magn Reson Med.
2018;79:30553071.
15. Kobler E, Klatzer T, Hammernik K, Pock T. Variational networks: Con-
necting variational methods and deep learning. In: German conference
on pattern recognition. Springer; 2017:281293.
16. Qin C, Schlemper J, Caballero J, Price AN, Hajnal JV, Rueckert D. Con-
volutional recurrent neural networks for dynamic MR image reconstruc-
tion. IEEE Trans Med Imaging. 2018;38:280290.
17. Aggarwal HK, Mani MP, Jacob M. Modl: Model-based deep learning
architecture for inverse problems. IEEE Trans Med Imaging.
2018;38:394405.
18. Qin C, Schlemper J, Duan J, et al. k-t NEXT: Dynamic MR Image
Reconstruction Exploiting Spatio-Temporal Correlations. In: Interna-
tional Conference on Medical Image Computing and Computer-Assisted
Intervention Springer; 2019:505513.
19. Gilton D, Ongie G, Willett R. Neumann networks for linear
inverse problems in imaging. IEEE Trans Comput Imaging. 2019;6:
328343.
20. Kofler A, Haltmeier M, Kolbitsch C, KachelrießM, Dewey M. A U-nets
cascade for sparse view computed tomography, In: International Work-
shop on Machine Learning for Medical Image Reconstruction, Springer;
2018:9199.
21. K ¨
ustner T, Fuin N, Hammernik K., et al. CINENet: Deep learning-based
3D cardiac CINE MRI reconstruction with multi-coil complex-valued
4D spatio-temporal convolutions. Sci Rep. 2020;10:113.
22. Schlemper J, Salehi SSM, Kundu P, et al. Nonuniform Variational Net-
work: Deep Learning for Accelerated Nonuniform MR Image Recon-
struction. In: International Conference on Medical Image Computing
and Computer-Assisted Intervention. Springer; 2019:5764.
23. Malav´
e MO, Baron CA, Koundinyan SP, et al. Reconstruction of under-
sampled 3D non-Cartesian image-based navigators for coronary MRA
Medical Physics, 48 (5), May 2021
2424 Koer et al.: An Iterative CNN for 2D Radial Cine MRI 2424
using an unrolled deep learning model. Magn Reson Med.
2020;84:800812.
24. Duan J, Schlemper J, Qin C, et al. VS-Net: Variable splitting network for
accelerated parallel MRI reconstruction. In: International Conference on
Medical Image Computing and Computer-Assisted Intervention.
Springer; 2019:713722.
25. Lustig M, Donoho DL, Santos JM, Pauly JM. Compressed sensing
MRI. IEEE Signal Process Mag. 2008;25:7282.
26. Smith DS, Sengupta S, Smith SA, Welch EB, Trajectory optimized
NUFFT: Faster non-Cartesian MRI reconstruction through prior knowl-
edge and parallel architectures. Magn Reson Med. 2019;81:20642071.
27. Knoll F, Hammernik K, Zhang C, et al. Deep-learning methods for parallel
magnetic resonance imaging reconstruction: A survey of the current
approaches, trends, and issues. IEEE Signal Process Mag. 2020;37:128140.
28. Winkelmann S, Schaeffter T, Koehler T, Eggers H, Doessel O. An opti-
mal radial profile order based on the Golden Ratio for time-resolved
MRI. IEEE Trans Med Imaging. 2006;26:6876.
29. Qu P, Zhong K, Zhang B, Wang J, Shen GX. Convergence behavior of
iterative SENSE reconstruction with non-Cartesian trajectories. Magn
Reson Med. 2005;54:10401045.
30. Le Gall D. MPEG: A video compression standard for multimedia appli-
cations. Commun ACM. 1991;34:4658.
31. Tsao J, Boesiger P, Pruessmann KP. k-t BLAST and k-t SENSE: dynamic
MRI with high frame rate exploiting spatiotemporal correlations. Magn
Reson Med. 2003;50:10311042.
32. Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for
biomedical image segmentation, In: International Conference on Medi-
cal image computing and computer-assisted intervention. Springer;
2015:234241.
33. Ong F, Uecker M, Lustig M. Accelerating non-Cartesian MRI recon-
struction convergence using k-space preconditioning. IEEE Trans Med
Imaging. 2019;39:16461654.
34. Hestenes MR, Stiefel E, et al. Methods of conjugate gradients for solving
linear systems. J Res Nat Bureau Stand. 1952;49:409436.
35. Feng L, Srichai MB, Lim RP, et al. Highly accelerated real-time cardiac
cine MRI using k-t SPARSE-SENSE. Magn Reson Imaging.2012.
36. Malik WQ, Khan HA, Edwards DJ, Stevens CJ. A gridding algorithm
for efficient density compensation of arbitrarily sampled Fourier-domain
data. In: IEEE/Sarnoff Symposium on Advances in Wired and Wireless
Communication, 2005, IEEE; 2005:125128.
37. Wang Z, Bovik AC, Sheikh HR, Simoncelli EP. Image quality assess-
ment: From error visibility to structural similarity. IEEE Trans Image
Process. 2004;13:600612.
38. Wang Z, Simoncelli EP, Bovik AC. Multiscale structural similarity for
image quality assessment. In: The Thrity-Seventh Asilomar Conference
on Signals, Systems & Computers, 2003; IEEE; 2003;2:13981402.
39. Wang Z, Bovik AC. A universal image quality index. IEEE Signal Pro-
cess Lett. 2002;9:8184.
40. Sheikh HR, Bovik AC. Image information and visual quality. IEEE
Trans Image Process. 2006;15:430444.
41. Reisenhofer R, Bosse S, Kutyniok G, Wiegand T. A Haar wavelet-based
perceptual similarity index for image quality assessment. Signal Process.
2018;61:3343.
42. MEA Muckley, Torch KB-NUFFT. https://github.com/mmuckley/torc
hkbnufft, 2019.
43. Muckley MJ, Stern R, Murrell T, Knoll F, TorchKbNufft: A High-level,
hardware-agnostic non-uniform fast fourier transform. ISMRM Work-
shop on Data Sampling & Image Reconstruction. 2020.
44. Pruessmann KP, Weiger M, B¨
ornert P, Boesiger P. Advances in sensitiv-
ity encoding with arbitrary k-space trajectories. Magn Reson Med.
2001;46:638651.
45. Block KT, Uecker M, Frahm J. Undersampled radial MRI with multiple
coils. Iterative image reconstruction using a total variation constraint.
Magn Reson Med. 2007;57:10861098.
46. Wang Y, Ying L. Compressed sensing dynamic cardiac cine MRI using
learned spatiotemporal dictionary. IEEE Trans Biomed Eng.
2014;61:11091120.
47. Caballero J, Price AN, Rueckert D, Hajnal JV. Dictionary learning and
time sparsity for dynamic MR data reconstruction. IEEE Trans Med
Imaging. 2014;33:979994.
48. Pali MC, Schaeffter T, Kolbitsch C, Kofler A. Adaptive sparsity level
and dictionary size estimation for image reconstruction in accelerated
2D radial cine MRI. Med Phys. 2020;48:178192.
49. Schwab J, Antholzer S, Haltmeier M. Deep null space learning for
inverse problems: Convergence analysis and rates. Inverse Prob.
2019;35:025008.
50. Biswas S, Aggarwal HK, Jacob M. Dynamic MRI using model-based
deep learning and SToRM priors: MoDL-SToRM. Magn Reson Med.
2019;82:485494.
SUPPORTING INFORMATION
Additional supporting information may be found online in
the Supporting Information section at the end of the article.
Data S1. The supplementary material contains the results
obtained by variying the hyper-parameters Mand nCG for our
proposed method as well as for the method using the 3D U-
Net. Further, a comparison of our proposed method with two
different configurations of hyper-parameters is reported.
Medical Physics, 48 (5), May 2021
2425 Koer et al.: An Iterative CNN for 2D Radial Cine MRI 2425