Generative Adversarial Networks
for Medical Image Synthesis
in Stroke
submitted by
M.Sc.
Tabea Kossen
ORCID: 0000-0002-2986-0907
to Faculty IV - Electrical Engineering and Computer Science
of the Technische Universität Berlin
in fulfillment of the requirements for the academic degree of
Doktor der Ingenieurwissenschaften
-Dr.-Ing.-
approved dissertation
Doctoral committee:
Chair: Prof. Dr. Marc Alexa
Reviewer: Prof. Dr. Anja Hennemuth
Reviewer: Prof. Dr. Kristian Hildebrand
Reviewer: Prof. Dr. Daniel Rückert
Date of the scientific defense: 30 September 2022
Berlin 2022
Abstract
Stroke is one of the leading causes of death worldwide. Medical imaging techniques such as
magnetic resonance imaging make it possible to extract essential individual information
about the disease, which has allowed for better patient care over the past decades. Advances
in computational power and increased data availability have led to the rise of Deep Learning
(DL) models, including for medical images. While DL methods have shown promising results in
automating the processing of medical images, data availability remains a major challenge, as
acquiring medical data is expensive and time-consuming. Additionally, medical images often
need to be annotated by medical experts to be useful for DL models. One solution would
be data sharing, but this is often hindered by privacy regulations. To preserve patients’
privacy and still allow for data sharing, synthesizing artificial images could be a promising
remedy.
For this purpose, Generative Adversarial Networks (GANs) are gaining much attention. GANs
usually consist of two competing neural networks: one network, the generator, synthesizes
data samples, while the other network, the discriminator, judges how realistic each sample
looks and thereby provides feedback to both networks. In this thesis, we generate synthetic images
using different GANs for two purposes in the stroke domain: sharing of labeled images and
automated image processing for treatment planning.
In the first part, we synthesize medical image patches for segmentation along with their
respective segmentation labels. We evaluate our synthetic data by training a segmentation
network on synthetic data and testing its performance on real data. In a next step, we simulate
the positive influence of sharing our synthetic data in terms of segmentation performance and
also extend our framework to generate 3D patches, thereby capturing more spatial information.
Additionally, we inject noise into the discriminator during the GAN training to generate privacy-
preserving 2D patches, leveraging the mathematical concept of differential privacy. In this way,
we quantify the level of privacy for our generated patches and investigate the trade-off between
privacy and the utility of our artificial data. The second part addresses a second application of
GANs for image synthesis: the automatic processing of perfusion-weighted imaging for stroke
treatment planning. Here, we propose a GAN variant for image-to-image translations with
additional temporal convolutions in the generator. We test our network on data including
both acute stroke patients and patients with chronic cerebrovascular disease and achieve high
performance in both cases.
In this thesis, we demonstrate the potential of utilizing GANs for image synthesis in the
field of stroke imaging. Our results are promising both for data sharing as well as for automated
image processing. In the future, GANs could substantially increase the data availability in the
medical field and also contribute to better treatment planning in stroke.
Zusammenfassung (Summary)
Stroke is one of the most common causes of death worldwide. Medical imaging techniques
such as magnetic resonance imaging make it possible to extract essential individual
information about the disease, which has enabled better patient care in recent decades.
Advances in computing power and increasing data availability have led to the emergence of
Deep Learning (DL) models, including for medical images. While DL methods have shown
promising results in the automated processing of medical images, data availability remains a
major challenge because acquiring medical data is expensive and time-consuming. In addition,
medical images often need to be annotated by medical experts to be useful for DL models. One
solution to this problem would be data sharing, which, however, is often hindered by data
protection regulations. To preserve patient privacy and still enable data sharing, the
synthesis of artificial images could be a promising remedy.
For this purpose, Generative Adversarial Networks (GANs) are attracting increasing
attention. GANs usually consist of two competing neural networks, where one network, the
generator, synthesizes data. In contrast, the other network, the discriminator, judges how
realistic the data look and gives feedback to both networks. In this thesis, we generate
synthetic images using different GANs for two problems in the stroke domain: the sharing of
annotated images and automated image processing for treatment planning.
In the first part, we synthesize medical image patches for a segmentation problem together
with their corresponding segmentation labels. We evaluate our synthetic data by training a
segmentation network on synthetic data and testing its performance on real data. In a next
step, we simulate the positive influence of sharing our synthetic data on segmentation
performance and extend our network to generate 3D patches and thus capture more spatial
information. In addition, we inject noise into the discriminator during GAN training to
generate privacy-preserving 2D patches, using the mathematical concept of differential
privacy. In this way, we quantify the level of privacy for our generated patches and examine
the trade-off between privacy and the utility of our artificial data. The second part
addresses a second application of GANs for image synthesis: the automatic processing of
perfusion images for stroke treatment planning. Here, we propose a GAN variant for
image-to-image translation with an additional temporal component in the generator. We test
our network on data that include both acute stroke patients and patients with chronic
cerebrovascular disease and achieve high performance in both cases.
In this dissertation, we demonstrate the potential of using GANs for image synthesis in the
field of stroke imaging. Our results are promising both for data sharing and for automated
image processing. In the future, GANs could substantially increase data availability in the
medical field and also contribute to better treatment planning in stroke.
Acknowledgements
There are many people who supported me during the last few years and whom I would like
to thank. First, I thank Prof. Anja Hennemuth for her scientific advice and guidance. Her
insightful feedback, fueled by both medical and technical expertise as well as her calming nature,
helped me to stay on a forthright path toward finishing my dissertation. I also thank Prof.
Kristian Hildebrand for his continuous support, the scientific discussions, and for contributing
his expert computer vision perspective on medical challenges.
I am very grateful to Dr. Dietmar Frey for giving me the opportunity to write my thesis in
the CLAIM group, for the medical insights, and for trusting me to freely pursue the scientific
projects I was interested in. Additionally, I would like to thank the whole CLAIM group for
insightful discussions about projects and publications as well as fun team days. In particular,
I thank Dr. Vince Madai for his exciting project ideas, fruitful discussions, and his honest,
constructive feedback that allowed me to improve my scientific skills tremendously. I would also like
to thank Dr. Michelle Livne for introducing me to the world of science and for teaching me
the critical and scientific way of thinking.
Furthermore, I am thankful to the co-authors for contributing to our publications. I am also
grateful to the students I got to supervise during the last years, especially Pooja Subramaniam,
for her diligent and hard work. It was a pleasure to work with her!
Moreover, I thank my friends for their support, especially Laura and Oliver, for going
through the thesis writing and finalization phase together and for the ongoing invaluable
feedback. I also want to specifically thank Boris for proofreading the thesis.
I would like to thank my parents and my sisters for their unconditional support throughout my
life, and my niece and nephews for the cutest and most wonderful distractions, especially
during exhausting weeks.
Finally, I would like to thank Joris for his loving support, encouragement, endless patience,
and for the occasional technical advice.
Table of Contents
Title Page
Abstract
Zusammenfassung
Abbreviations
1 Introduction
1.1 Machine Learning in Stroke Imaging
1.2 Generative Adversarial Networks for Image Synthesis
1.3 Aims and Contributions
1.4 Thesis Outline and Publications
2 Preliminaries
2.1 Stroke Types and Treatment
2.2 Stroke Imaging
2.2.1 Time-of-Flight (TOF) MRA
2.2.2 Dynamic Susceptibility Contrast (DSC) MRI
2.3 Challenges of Medical Imaging Data in Deep Learning
2.4 Generative Adversarial Networks
2.4.1 Standard GAN
2.4.2 Wasserstein GAN
2.4.3 Pix2pix GAN
2.5 Evaluation of Synthetic Images
2.5.1 Image-Based Evaluation
2.5.2 Evaluation Using a Downstream Task
3 Related Work
3.1 Synthesis of Medical Images Using Unconditional GANs for Data Sharing
3.2 Image-to-Image Translation for Treatment Planning
I Synthesis of Medical Images for Data Sharing
4 Synthesizing Anonymized and Labeled TOF-MRA Patches for Brain Vessel Segmentation Using Generative Adversarial Networks
4.1 Context Within Thesis
4.2 Journal Article
5 Generating 3D TOF-MRA Volumes and Segmentation Labels Using Generative Adversarial Networks
5.1 Context Within Thesis
5.2 Journal Article
6 Toward Sharing Brain Images: Differentially Private TOF-MRA Images With Segmentation Labels Using Generative Adversarial Networks
6.1 Context Within Thesis
6.2 Journal Article
II Image-to-Image Translation for Stroke Treatment Planning
7 Image-to-Image Generative Adversarial Networks for Synthesizing Perfusion Parameter Maps from DSC-MR Images in Cerebrovascular Disease
7.1 Context Within Thesis
7.2 Preprint
8 Discussion
8.1 Summary
8.2 Discussion and Outlook
8.2.1 Synthesis of Medical Images for Data Sharing
8.2.2 Image-to-Image Translation for Stroke Treatment Planning
8.2.3 Challenges and Opportunities for GANs in Medical Imaging
References
Abbreviations
2D Two-dimensional
3D Three-dimensional
4D Four-dimensional
AI Artificial Intelligence
AIF Arterial Input Function
CBF Cerebral Blood Flow
CBV Cerebral Blood Volume
cGAN conditional Generative Adversarial Network
CT Computed Tomography
DL Deep Learning
DSC-MRI Dynamic Susceptibility Contrast Magnetic Resonance Imaging
DWI Diffusion-weighted Imaging
FID Fréchet Inception Distance
GAN Generative Adversarial Network
JS divergence Jensen-Shannon divergence
ML Machine Learning
MR Magnetic Resonance
MRA Magnetic Resonance Angiography
MRI Magnetic Resonance Imaging
MTT Mean Transit Time
Tmax Time-to-Maximum
TOF Time-of-Flight
TTP Time-to-Peak
WGAN Wasserstein Generative Adversarial Network
1 Introduction
A recent study estimated that almost one in four people will suffer from a stroke during their
lifetime [1]. In 2019, stroke already accounted for 11.6% of all deaths, making it the second
leading cause of death worldwide [2]. Stroke survivors are not only more likely to suffer a
recurrent stroke [3]; approximately 24–49% of them also live with a disability after a
stroke [4]. Although the age-standardized incidence is declining, stroke cases are still
estimated to grow by 27% in the European Union over the next decades [5]. The main reasons
for this are the aging population and the increased stroke survival rate. In 2017, stroke
already cost approximately 60 billion euros in Europe, and costs are expected to rise in
the upcoming years [6].
To lower the socioeconomic burden of stroke, improvements in patient care are needed.
This involves not only fast and accurate diagnosis and treatment planning but also prediction
of disease progression and patient outcome. To this end, medical imaging techniques have the
potential to provide crucial individual information for establishing new guidelines [7]. For example,
for treatment planning, advanced imaging techniques using Magnetic Resonance Imaging
(MRI) or Computed Tomography (CT) have already been shown to extend the time window
for treatment, enabling doctors to treat more stroke patients and improve outcomes [8]. On
top of the broad availability of neuroimaging techniques, Machine Learning (ML) approaches
have gained a lot of attention in the past years, and applications to medical imaging are on
the rise [9].
1.1 Machine Learning in Stroke Imaging
ML techniques enable the identification and extraction of patterns within data. ML can be a
particularly powerful tool for large, high-dimensional datasets that humans can no longer
process. With the increase in computational power and data availability, complex
ML methods, so-called Deep Learning (DL) techniques, have become more popular [10]. This
also holds true for the field of medical imaging [11]. Here, the applications range from image
denoising [12], risk prediction [13], and segmentation [14] to diagnosis [15] and treatment
planning [16].
DL has been applied to similar use cases in the subfield of stroke imaging. Among them are
the prediction of stroke onset time [17], lesion segmentation [18], and the prediction of tissue
fate [19] and patient outcome [20, 21]. Since time is a critical factor in the clinical setting of stroke, DL methods
could be particularly valuable because once DL networks are trained, the application to new
data is usually fast. This could substantially automate and speed up processes for tasks that
otherwise would need manual validation from experts.
While DL has already shown state-of-the-art results in many areas, these methods require
large datasets in order to unleash their full potential. In the medical field, acquiring large
datasets is expensive, involving several data processing steps [22]. Additionally, for many ML
and DL applications, expert annotations are needed [22]. A solution to this would be data
sharing, but this is often not feasible due to privacy restrictions. Neuroimaging data
is particularly sensitive because scans contain the patients’ faces, which can be identified by
face-recognition software [23]. Even after the face and skull have been blurred or removed, the face
could be partially reconstructed [24]. Above all, the brain itself has a unique structure, and
individuals could potentially be re-identified based on their cortical folding [25]. Thus, novel
anonymization methods are needed, i.e., techniques that do not allow re-identification. One
promising option is the synthesis of artificial data. For this, Generative Adversarial Networks
(GANs) have become popular in recent years [26].
1.2 Generative Adversarial Networks for Image Synthesis
A GAN is a type of model that consists of two neural networks that try to mislead each
other. One network is the generator, which synthesizes a data sample, and the other network
is the discriminator, which tries to identify whether a sample is real or synthesized by the
generator [26]. At the end of a successful training process, the generator can synthesize
realistic-looking data samples while preserving the predictive properties of the original data
samples.
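For reference, the adversarial game described above is commonly written as the minimax objective of the original GAN formulation (Goodfellow et al., 2014). The symbols are notation introduced here for illustration and are not taken from this section: G is the generator, D the discriminator, p_data the distribution of real samples, and p_z the noise prior from which the generator input z is drawn.

```latex
\min_G \max_D V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator increases V by assigning high probability to real samples and low probability to generated ones, while the generator decreases V by producing samples that the discriminator classifies as real.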
GANs offer many opportunities in the analysis of medical images. They have already
shown good performance for medical images in image-to-image translations [27], denoising [28],
and data augmentation [29, 30]. Specific to stroke imaging, applications so far have covered
mostly lesion segmentation or the synthesis of images for lesion segmentation [31, 32, 33].
This thesis focuses on two main applications of GANs in stroke imaging: the synthesis of
anonymous, labeled images for data sharing and an image-to-image translation application for
fast, automated image processing in stroke treatment planning.
1.3 Aims and Contributions
In the first part of the thesis, we leverage GANs for synthesizing shareable brain images.
Specifically, we generate Time-of-Flight (TOF)-Magnetic Resonance Angiography (MRA)
images, which show blood vessels. This type of imaging is mainly used for diagnosis in
the clinical context of cerebrovascular diseases such as stroke. Here, we utilize them for
the use case of brain vessel segmentation. We generate synthetic TOF-MRA images along
with their segmentation label for Two-dimensional (2D) and Three-dimensional (3D) patches.
Additionally, we examine the effect of introducing privacy measures in the generation process.
Overall, the first part of the thesis aims to generate medically realistic and labeled images for
data sharing and evaluate their utility compared to the real data. Moreover, the generalizability
to a second dataset is tested, and the impact of restricting privacy leakage is measured.
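The utility comparison described above can be sketched as follows. This is a minimal illustration, not the evaluation code of the thesis: it assumes that segmentation quality is scored with the Dice coefficient (this section does not name the metric), and the masks and the helper names `dice_score` and `mean_dice` are hypothetical.

```python
def dice_score(pred, target):
    """Dice overlap between two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks agree perfectly; this also avoids dividing by zero.
    return 1.0 if total == 0 else 2.0 * intersection / total


def mean_dice(predictions, labels):
    """Average Dice score over a set of test patches."""
    scores = [dice_score(p, t) for p, t in zip(predictions, labels)]
    return sum(scores) / len(scores)


# Toy example (made-up masks): the same real test labels are compared with
# predictions from a network trained on real data and from one trained on
# synthetic data.
real_test_labels = [[1, 1, 0, 0], [0, 1, 1, 0]]
pred_trained_on_real = [[1, 1, 0, 0], [0, 1, 1, 0]]
pred_trained_on_synthetic = [[1, 0, 0, 0], [0, 1, 1, 1]]

utility_real = mean_dice(pred_trained_on_real, real_test_labels)
utility_synthetic = mean_dice(pred_trained_on_synthetic, real_test_labels)
print(f"real: {utility_real:.3f}, synthetic: {utility_synthetic:.3f}")
```

The gap between the two averages is one simple way to express how much utility is lost when real training data is replaced by synthetic data.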
In the second part of this thesis, we consider another application of GANs for image synthesis,
i.e., automatic image processing. In particular, we synthesize perfusion parameter maps from
Dynamic Susceptibility Contrast Magnetic Resonance Imaging (DSC-MRI). Perfusion maps
show the blood flow within the brain and are therefore relevant for treatment planning in
stroke. To create perfusion maps from DSC-MRI, experts are usually needed for manual
validation. Here, we propose a modified architecture of a so-called pix2pix GAN that directly
synthesizes the manually validated perfusion maps. The second part aims to speed up the
treatment decision making in the clinical setting.
Taken together, the main contributions of this thesis are:
• Synthesis of 2D TOF-MRA with corresponding labels for brain vessel segmentation
(Chapters 4 and 6). We utilize different GAN architectures to synthesize
2D TOF-MRA patches and their segmentation mask showing the location of brain vessels.
Synthesizing the image along with the segmentation mask allows us to evaluate the
utility of our synthesized patches by training a segmentation network on the generated
data and testing it on real-world data. We show that our segmentation network trained
on synthetic data still performs well compared to the segmentation network trained on
real data.
• Simulation of sharing synthesized data and application to a similar dataset
(Chapter 4). To test the generalizability of our segmentation model trained on
synthesized patches, we measure the segmentation performance on another similar
dataset and fine-tune it with an increasing amount of new patches. We demonstrate that
our fine-tuned network achieves better segmentation performance than a network trained
only on the second dataset. Thus, fewer newly annotated data samples are needed for
the same segmentation performance. In this way, we showcase the positive
impact of sharing synthetic data.
• Framework for generating labeled 3D data (Chapter 5). We extend our GAN
architecture to synthesize high-resolution 3D volumes to capture the 3D structure of the
brain better. Since the computational load of this model is substantially increased, we
implement measures to reduce memory consumption and training time, such as mixed
precision and the two timescale update rule. Here, we demonstrate that generating
high-resolution and labeled 3D images is feasible, and incorporating the third dimension
might even be beneficial for the downstream medical task if the computational setup
allows for it.
• Elaborate evaluation scheme for synthetic data using a pre-trained MedicalNet,
precision and recall, and a segmentation network (Chapter 5). We
evaluate our generated 3D patches by utilizing the MedicalNet, a neural network pre-
trained on different types of medical imaging. We compare the activation when providing
our synthesized and real data as input to the network in terms of the Fréchet Inception
Distance (FID) and precision and recall of the distributions to quantify both image quality
and variations. Additionally, we train a 3D segmentation network on the synthesized
volumes and test it on real data to measure the usability of the generated data in the
context of brain vessel segmentation.
• Introduction of differential privacy in GAN architecture for synthesizing
labeled TOF-MRA (Chapter 6). By inserting carefully calibrated noise into our
GAN architecture, we provide an upper bound on privacy leakage of our training data.
In this way, we can measure not only the utility of our generated data but also quantify
privacy and investigate the trade-off between these two properties.
• Development of adapted pix2pix GAN architecture for automated perfusion
map generation from DSC-MRI (Chapter 7). We develop a modified version of
pix2pix GANs with additional temporal convolutions to generate perfusion maps from
DSC-MRI. As perfusion maps are usually manually validated by an expert in the clinical
setting, we aim to automate this step and speed up treatment planning with our GAN
architecture.
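To make the idea of "carefully calibrated noise" from the differential privacy contribution above more concrete, the following sketch shows one widely used recipe: clip each per-sample gradient to a fixed norm and add Gaussian noise scaled to that norm before averaging (the DP-SGD scheme). That this recipe matches the implementation in Chapter 6 is an assumption; the function name `privatize_gradients` and all parameter values are hypothetical.

```python
import math
import random


def privatize_gradients(per_sample_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-sample gradient to L2 norm <= clip_norm, sum the clipped
    gradients, add Gaussian noise, and return the noisy average."""
    dim = len(per_sample_grads[0])
    summed = [0.0] * dim
    for grad in per_sample_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        # Scale down only gradients whose norm exceeds the clipping bound.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, g in enumerate(grad):
            summed[i] += g * scale
    # The noise standard deviation is tied to the clipping bound, which limits
    # how much any single training sample can influence the update.
    sigma = noise_multiplier * clip_norm
    n = len(per_sample_grads)
    return [(s + rng.gauss(0.0, sigma)) / n for s in summed]


# Toy example with two hypothetical per-sample discriminator gradients.
rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]
noisy_update = privatize_gradients(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
print(noisy_update)
```

A larger noise multiplier gives a stronger privacy guarantee but a noisier update, which is the privacy-utility trade-off mentioned in the list above. Converting the noise level into a concrete privacy bound requires a separate privacy accounting step that is not shown here.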
1.4 Thesis Outline and Publications
This thesis is structured as follows: Chapter 2 provides background information about medical
and technical methodologies. The medical background provides insights into stroke and the
relevant imaging techniques used in this thesis, while the technical background focuses on
GANs, the underlying principles, and their variants. The main chapters of the thesis are
split into two parts: the synthesis of medical images for the purpose of data sharing and
the generation of one image from another (image-to-image translation) in the context of
treatment planning. The first part consists of three chapters based on three different but
related publications. The first publication (Chapter 4) utilizes GANs to synthesize 2D labeled
TOF-MRA patches. Chapter 5 describes how this can be extended to 3D medical image
synthesis. The next chapter (Chapter 6) explores the possibility of leveraging differential
privacy to synthesize 2D labeled image patches with privacy guarantees. The second part of
this thesis concerns image-to-image translation in stroke. This part consists of Chapter 7,
in which we automatically process perfusion-weighted imaging from 3D image sequences to
interpretable, expert-level perfusion maps that could be utilized for stroke treatment planning.
The last chapter discusses the findings of this work and concludes the thesis. Moreover, it provides an
outlook on GANs in medical imaging.
The following publications are part of this thesis:
1. T. Kossen, P. Subramaniam, V. I. Madai, A. Hennemuth, K. Hildebrand, A. Hilbert, J.
Sobesky, M. Livne, I. Galinovic, A. A. Khalil, J. B. Fiebach, and D. Frey. “Synthesizing
anonymized and labeled TOF-MRA patches for brain vessel segmentation using generative
adversarial networks”. In: Computers in Biology and Medicine 131 (2021).
doi:10.1016/j.compbiomed.2021.104254, see Chapter 4.
2. P. Subramaniam, T. Kossen, K. Ritter, A. Hennemuth, K. Hildebrand, A. Hilbert,
J. Sobesky, M. Livne, I. Galinovic, A. A. Khalil, J. B. Fiebach, D. Frey, and V. I.
Madai. “Generating 3D TOF-MRA Volumes and Segmentation Labels using Generative
Adversarial Networks”. In: Medical Image Analysis (2022).
doi:10.1016/j.media.2022.102396, see Chapter 5.
This is an open access article under the CC BY license.
3. T. Kossen, M. A. Hirzel, V. I. Madai, F. Boenisch, A. Hennemuth, K. Hildebrand,
S. Pokutta, K. Sharma, A. Hilbert, J. Sobesky, I. Galinovic, A. A. Khalil, J. B. Fiebach,
and D. Frey. “Toward Sharing Brain Images: Differentially Private TOF-MRA Images
With Segmentation Labels Using Generative Adversarial Networks”. In: Frontiers in
Artificial Intelligence 5 (2022). doi:10.3389/frai.2022.813842, see Chapter 6.
This is an open access article under the CC BY license.
4. T. Kossen, V. I. Madai, M. A. Mutke, A. Hennemuth, K. Hildebrand, J. Behland,
A. Hilbert, J. Sobesky, M. Bendszus, and D. Frey. “Image-to-image generative
adversarial networks for synthesizing perfusion parameter maps from DSC-MR images
in cerebrovascular disease”. In: medRxiv (2022). doi:10.1101/2022.05.24.22274901,
see Chapter 7.
This is an open access article under the CC BY license.
Additionally, I contributed to the following publications while working on this thesis:
1. M. Livne, J. Rieger, O. U. Aydin, A. A. Taha, E. M. Akay, T. Kossen, J. Sobesky, J. D.
Kelleher, K. Hildebrand, D. Frey, and V. I. Madai. “A U-Net Deep Learning Framework
for High Performance Vessel Segmentation in Patients With Cerebrovascular Disease”.
In: Frontiers in Neuroscience 13 (2019), p. 97. doi:10.3389/fnins.2019.00097
2. M. Ivantsits, L. Goubergrits, J.-M. Kuhnigk, M. Huellebrand, J. Brüning, T. Kossen, B.
Pfahringer, J. Schaller, A. Spuler, T. Kuehne, and A. Hennemuth. “Cerebral Aneurysm
Detection and Analysis Challenge 2020 (CADA)”. In: Cerebral Aneurysm Detection and
Analysis: First Challenge, CADA 2020, Held in Conjunction with MICCAI 2020, Lima,
Peru, October 8, 2020, Proceedings. Lima, Peru: Springer-Verlag, 2020, pp. 3–17. isbn:
978-3-030-72861-8. doi:10.1007/978-3-030-72862-5_1
3. M. Ivantsits, L. Goubergrits, J.-M. Kuhnigk, M. Huellebrand, J. Bruening, T. Kossen,
B. Pfahringer, J. Schaller, A. Spuler, T. Kuehne, Y. Jia, X. Li, S. Shit, B. Menze, Z. Su,
J. Ma, Z. Nie, K. Jain, Y. Liu, Y. Lin, and A. Hennemuth. “Detection and analysis of
cerebral aneurysms based on X-ray rotational angiography - the CADA 2020 challenge”.
In: Medical Image Analysis 77 (2022). doi:10.1016/j.media.2021.102333
4. A. Meddeb, T. Kossen, K. K. Bressem, B. Hamm, and S. N. Nagel. “Evaluation of a Deep
Learning Algorithm for Automated Spleen Segmentation in Patients with Conditions
Directly or Indirectly Affecting the Spleen”. In: Tomography 7.4 (2021), pp. 950–960.
doi:10.3390/tomography7040078
2 Preliminaries
2.1 Stroke Types and Treatment
A stroke is a sudden cerebrovascular event in which a blood vessel in the brain either ruptures
or is blocked. It can be subdivided into two main categories: hemorrhagic and ischemic stroke. If a
blood vessel in the brain ruptures, the event is called a hemorrhagic stroke. This rupture
leads to bleeding inside the brain and is associated with high mortality and morbidity [42].
Early diagnosis and treatment are crucial for a good outcome. Ischemic stroke is the most
common type of stroke and is usually caused by a blood clot, the so-called thrombus, that
blocks a blood vessel in the neck or brain [43]. The blockage reduces or even stops the blood
flow and leads to oxygen deprivation in the brain, eventually resulting in the death of the
affected brain cells. Therefore, immediate action is again required when an ischemic stroke
occurs [43]. This thesis focuses on this more common type of stroke, the ischemic stroke.
After a patient is admitted to a hospital with stroke symptoms, brain imaging is performed
for diagnosis and treatment planning. Current treatment options for an acute ischemic stroke
are intravenous medication or mechanical thrombectomy, i.e., removing the thrombus in
an operation [7]. Treatment planning still relies heavily on the time of symptom onset. If
symptoms started within the past 4–6 hours, patients are considered eligible for treatment [7, 44].
Beyond that time window, the risks of treatment are considered to outweigh the expected
benefits. This treatment stratification approach is particularly problematic for patients with
unknown stroke onset, e.g., patients who wake up with stroke symptoms (“wake-up stroke”),
who make up 8–25% of stroke patients [45]. Treatment guidelines currently still rely on the time
window approach, whereas new clinical trials provide insights into more precise image-based
treatment stratification. Recent studies using advanced imaging techniques suggested that
groups of patients might still benefit from treatment in a time window of up to 24 hours after
the stroke [7, 8, 44]. Even patients with unknown stroke onset have been found to still benefit
from treatment. Therefore, brain imaging is crucial to move towards a more individualized
stroke treatment that shifts away from the classical time window approach.
2.2 Stroke Imaging
Medical imaging techniques play a crucial role not only in stroke treatment but also in stroke
diagnosis [46]. In particular, they can help determine the type of stroke, identify infarcted or
salvageable brain tissue, and detect abnormalities of the brain vessels.
In the context of stroke, it is crucial to look at the cerebrovascular system, i.e., the brain
vessels on a fine-scale anatomical level, as well as the blood flow and supply to the different
areas of the brain. To this end, both CT and MRI are commonly used in stroke imaging. Here,
we focus on MRI, specifically TOF-MRA, to visualize the blood vessels and DSC-MRI to show
the perfusion on a larger scale.
2.2.1 Time-of-Flight (TOF) MRA
TOF-MRA is a non-contrast-enhanced imaging technique to visualize vascular systems such as
the cerebrovascular system. It relies on the fact that blood in the brain vessels flows,
whereas the surrounding tissue is stationary [47]. While acquiring images, radiofrequency pulses are
repeatedly applied to a part of the body, in this case, brain slices or volumes [48]. This causes
protons in the stationary tissue to saturate. In contrast, the saturation of the protons
in the flowing blood takes longer because new blood, which was not previously exposed to the
radiofrequency pulse, keeps entering the scanned slice [48]. This can be leveraged to make stationary
tissue appear darker, while flowing blood, such as the blood in the cerebral vessels, appears
brighter in the scanned 3D image (see Figure 2.1). More details about the underlying physical
principles can be found in the book chapter by Kiruluta et al. [47] as well as in the book by
Carr et al. [49].
Figure 2.1: TOF-MRA image. The brain vessels appear with high intensity (white) on a
TOF-MRA image, whereas static tissue appears in gray.
TOF-MRA can be utilized to diagnose cerebrovascular diseases such as aneurysms or
steno-occlusive disease in the clinical setting [50, 51]. With regard to stroke, TOF-MRA can be
used as routine imaging to inspect whether a large vessel occlusion is present [52]. Examining
vessel occlusions is particularly important for treatment selection. For example, mechanical
thrombectomy is recommended for acute ischemic stroke patients with large vessel occlusions
within a specific time window [53]. Thus, information about the brain vessels can have high
diagnostic value.
Recent studies suggest that brain vessel status could have medical value beyond the
assessment in the clinical routine. Gutierrez et al. showed that the anatomical structure could
be a biomarker for vascular events [54]. Furthermore, Frey et al. demonstrated that individual
vessel structures could be translated into a hemodynamic simulation to detect areas vulnerable
to stroke [55]. This simulation could be beneficial for both stroke prevention and outcome
prediction.
So far, brain vessels are only assessed visually in the clinical routine. Automating brain
vessel assessment could add value to diagnosis as well as to the estimation of stroke risk and
outcome. The first step of an automated analysis of the brain vessels is vessel segmentation.
For this, a neural network specialized in segmentation, the U-Net, proved to be successful
[38, 56, 57]. The problem of automated brain vessel segmentation served as a use case for
Chapters 4–6.
2.2.2 Dynamic Susceptibility Contrast (DSC) MRI
DSC-MRI is an imaging technique that measures the blood flow, i.e., the perfusion in the
brain. It is a widely used technique for stroke assessment and is essential for identifying the
penumbra [58]. The penumbra is the brain tissue around the infarct that currently shows
reduced blood flow but could be salvaged if reperfusion occurs. Thus, estimating the penumbra
is important for treatment decisions and could extend the time window for treatment for
certain groups of patients [59].
Figure 2.2: Diffusion-perfusion mismatch. Perfusion-weighted imaging (A) shows the perfusion
in the brain. In the right hemisphere, a large hypoperfused area can be seen (green/orange). In
contrast to that, the Diffusion-weighted Imaging (DWI), here presented as the apparent diffusion
coefficient in (B), shows the infarct core that is smaller (dark area). The mismatch between the
two image sequences roughly indicates the salvageable tissue.
To estimate the penumbra and thus the salvageable tissue, the diffusion-perfusion mismatch
model has been proposed. According to the model, the
mismatch between Diffusion-weighted Imaging (DWI) and perfusion-weighted imaging provides
an estimate of the salvageable tissue [60] and can therefore be used for patient stratification [61].
DWI measures the Brownian motion of the water molecules within a voxel, which can be used
to visualize the infarct core [46], while perfusion-weighted imaging such as DSC-MRI shows the
perfusion of the brain. If there is a mismatch between the infarct core and the hypoperfused
area, the time window of treatment could be extended [62, 63] (see Figure 2.2).
DSC-MRI measures the perfusion by injecting a contrast agent into the patient’s blood.
After that, a series of MRIs is recorded, which results in a temporal sequence of 3D images
(hence a Four-dimensional (4D) image) and records the contrast agent’s flow through the brain.
The signal for each recorded voxel can be translated into a tissue concentration curve over time
(see Figure 2.3A). This is then voxel-wise deconvolved with a so-called Arterial Input Function
(AIF) resulting in a deconvolved tissue concentration curve (see Figure 2.3B). The AIF can be
automatically determined but is, in practice, manually validated by an expert. The two curves
depicted in Figure 2.3 show properties that provide insights into the perfusion of the patient’s
brain. The most important and clinically relevant parameters that can be extracted based on
these curves are Cerebral Blood Flow (CBF), Cerebral Blood Volume (CBV), Mean Transit
Time (MTT), Time-to-Maximum (Tmax), and Time-to-Peak (TTP). These five perfusion
parameter maps are 3D images with different clinical interpretations.
Figure 2.3: Tissue concentration curves over time for DSC-MRI. The measured concentration
curve is deconvolved with the AIF (A). This results in the concentration curve displayed in (B).
The five perfusion parameters CBF, CBV, MTT, Tmax, and TTP can be inferred from these two
curves. The figure is based on Figure 3 by Østergaard [64].
The CBF measures the blood supply to a given brain region. It is usually estimated by the height
at timepoint Tmax in the deconvolved curve (see Figure 2.3B). It depends on the cerebral
perfusion pressure, the dilation of vessels, and the viscosity of the blood [65]. The CBV reflects
the area under the deconvolved curve and assesses the whole blood quantity within the brain
tissue. The ratio of the CBV to the CBF is defined as the MTT. The MTT reflects the average
time the blood takes to pass through the vessels of the brain tissue. According to Kim et
al. [58], it might overestimate the penumbra. Tmax is regarded as the most reliable parameter
to measure the penumbra [65]. It is defined as the required time until the deconvolved curve
reaches its maximum. TTP also describes the time until the maximum is reached but refers
to the tissue concentration curve before deconvolution. All mentioned parameter maps are
shown in Figure 2.4. In Chapter 7, we propose a GAN-based method to generate the different
perfusion parameter maps from the DSC-MRI automatically.
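The definitions above can be illustrated for a single voxel with a minimal sketch. It assumes that the tissue concentration curve and its deconvolved counterpart are already available as sampled lists; the function names, the time unit, and the toy values are hypothetical, and scaling constants used in clinical software are omitted.

```python
def trapezoid_area(times, values):
    """Area under a sampled curve using the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(times)):
        area += 0.5 * (values[i] + values[i - 1]) * (times[i] - times[i - 1])
    return area


def perfusion_parameters(times, tissue_curve, deconvolved_curve):
    """Derive the five perfusion parameters for one voxel (illustrative only).

    times: acquisition time points (assumed to be in seconds)
    tissue_curve: tissue concentration curve before deconvolution
    deconvolved_curve: tissue concentration curve after deconvolution with the AIF
    """
    peak = max(range(len(deconvolved_curve)), key=deconvolved_curve.__getitem__)
    cbf = deconvolved_curve[peak]                    # height of the deconvolved curve at Tmax
    tmax = times[peak]                               # time until the deconvolved curve peaks
    cbv = trapezoid_area(times, deconvolved_curve)   # area under the deconvolved curve
    mtt = cbv / cbf if cbf > 0 else float("nan")     # ratio of CBV to CBF
    ttp_index = max(range(len(tissue_curve)), key=tissue_curve.__getitem__)
    ttp = times[ttp_index]                           # time until the measured curve peaks
    return {"CBF": cbf, "CBV": cbv, "MTT": mtt, "Tmax": tmax, "TTP": ttp}


# Toy curves for a single voxel (hypothetical values).
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
tissue = [0.0, 0.5, 1.5, 3.0, 1.0, 0.2]
deconvolved = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
params = perfusion_parameters(times, tissue, deconvolved)
print(params)
```

Applying such a function to every voxel of the 4D image would yield one 3D map per parameter.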
Figure 2.4: Perfusion-weighted imaging. The parameter maps CBF, CBV, MTT, Tmax, and
TTP are calculated from the DSC source image. The DSC source is here shown at time point 0,
i.e., without contrast agent.
2.3 Challenges of Medical Imaging Data in Deep Learning
While medical imaging data offers great potential in diagnosis and treatment in the clinical
routine, there are still challenges when applying DL techniques to this type of data. First,
medical images are often high-dimensional. Hence, training DL models on them usually comes
along with high computational costs [66]. Second, medical imaging data is heterogeneous.
Hospitals utilize different scanners and imaging protocols, resulting in image variations across
institutions [11]. Furthermore, disease prevalence and demographics can vary across regions
leading to different patient cohorts [11]. Another problem is general data availability. DL
techniques require large amounts of high-quality data [67]. Additionally, medical imaging
data often needs to be annotated by one or more medical experts, which is expensive and
time-consuming [67]. To mitigate the problem of data availability, efficient use of data is crucial.
For this, transfer learning and fine-tuning as well as data augmentation can be utilized [9,
67]. Data augmentation aims to enrich the dataset by adding images with slight modifications
compared to the original images or by adding synthesized images, e.g., using GANs (see
Section 2.4).
On top of the above-mentioned challenges, privacy restrictions and thus ethical and legal
considerations of utilizing medical images should be taken into consideration [9, 11]. Even the
synthesis of artificial data does not automatically entail complete privacy preservation [68, 69].
To counteract privacy concerns, DL methods (including GANs) can be enhanced by secure
and private Artificial Intelligence (AI). Secure AI approaches intend to protect the algorithms,
whereas private AI aims to protect the data [70]. Examples of secure AI are encryption and
federated learning. Federated learning is a technique that allows for decentralized training
of an ML model [71]. Instead of transferring data from one institution to another, the
model’s weights are shared during the training process. Hence, the security risk of transferring
data can be circumvented. However, the data itself is not secured by a federated learning
approach [70]. For this, private AI techniques can be complementary and protect the data
from re-identification. Examples of this are classical techniques such as anonymization and
pseudonymization and more advanced concepts such as differential privacy. Anonymization
aims to prevent re-identification by simply removing re-identifiable information, whereas
pseudonymization replaces this information. However, both methods can be seen as insufficient
to protect the data from re-identification [70]. Therefore, differential privacy has become an
important topic of research. It is a mathematical concept that puts a bound on individual
privacy leakage [72]. The intuition behind differential privacy is that a computation on a
specific dataset and a computation on the same dataset with one additional data sample should
have a very similar output. In other words, no single sample should influence an algorithm so
strongly that its output changes noticeably, as this would reveal that this particular
data sample is part of the training set. In Chapter 6 of this thesis, we investigate the influence
of differential privacy in a GAN architecture for synthesizing TOF-MRA images.
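The intuition described above is commonly formalized as (ε, δ)-differential privacy. The statement below is the standard textbook definition rather than a formulation quoted from this thesis; the symbols for the randomized algorithm, the two neighboring datasets, and the set of outputs are introduced here for illustration only.

```latex
% A randomized algorithm \mathcal{M} is (\varepsilon, \delta)-differentially private
% if, for all datasets D and D' that differ in a single sample and for every
% set of possible outputs S:
\[
  \Pr\left[\mathcal{M}(D) \in S\right]
  \;\le\; e^{\varepsilon}\,\Pr\left[\mathcal{M}(D') \in S\right] + \delta
\]
```

Smaller values of ε and δ correspond to a tighter bound on how much a single sample can change the output distribution, and thus to a stronger privacy guarantee.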
2.4 Generative Adversarial Networks
GANs are deep neural networks that synthesize data samples utilizing adversarial training.
They were first introduced in 2014 by Goodfellow et al. [26], who used them for synthesizing
natural images such as hand-written digits or faces. Starting in 2016, the first GANs were
applied to medical imaging data [30]. These applications range from data augmentation,
privacy preservation, and anomaly detection [29, 73, 74] to architectures for segmentation and
classification problems [75, 76]. Moreover, GANs can be used for image-to-image translations
such as denoising or cross-modality synthesis [77, 78].
2.4.1 Standard GAN
A classical GAN is usually trained for data synthesis. It consists of two simultaneously trained,
fully connected neural networks: the generator and the discriminator (see Figure 2.5). While
the generator (G) tries to synthesize a realistic data sample, the task of the discriminator (D)
is to distinguish between real data samples and the samples synthesized by the generator [26].
During training, the discriminator gets feedback on which of the samples were real and
which were generated. Additionally, the discriminator provides feedback to the generator
on how realistic the generated sample looks. After successfully training both networks, the
discriminator should be good at distinguishing between real and generated samples, and the
generator should synthesize samples that approximate the distribution of the real samples. As
both networks are trained on two opposing objectives, they can be regarded as adversaries.
Formally, the objective function of the generator can be formulated as:
\[
  L_G = \max_G \; \mathbb{E}_{x_{\text{gen}} \sim p_{\text{gen}}}\left[\log\left(D(x_{\text{gen}})\right)\right] \tag{2.1}
\]
Figure 2.5: The architecture of a classical GAN. The generator (G) takes a noise vector as an
input and produces a data sample. The task of the discriminator (D) is to decide whether a sample
looks realistic or not.
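where $x_{\text{gen}}$ is the generated sample from the distribution $p_{\text{gen}}$. The objective function for the
generator is maximized if the discriminator regards $x_{\text{gen}}$ as realistic ($D(x_{\text{gen}})$ close to 1). In
contrast to that, the discriminator’s objective function is maximized if $x_{\text{gen}}$ is identified as
generated ($D(x_{\text{gen}})$ close to 0) and the real samples $x_{\text{real}} \sim p_{\text{real}}$ as real ($D(x_{\text{real}})$ close to 1),
hence:
\[
  L_D = \max_D \; \mathbb{E}_{x_{\text{real}} \sim p_{\text{real}}}\left[\log D(x_{\text{real}})\right]
      + \mathbb{E}_{x_{\text{gen}} \sim p_{\text{gen}}}\left[\log\left(1 - D(x_{\text{gen}})\right)\right] \tag{2.2}
\]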
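To make the behavior of Equations 2.1 and 2.2 concrete, the following minimal sketch evaluates both objectives on small batches of discriminator outputs. The function names and the batch values are hypothetical and only serve as an illustration; an actual GAN would compute these quantities on network outputs and optimize them by gradient-based training.

```python
import math


def mean(values):
    return sum(values) / len(values)


def generator_objective(d_gen):
    """Equation 2.1: mean of log D(x_gen); larger when D rates generated samples as real."""
    return mean([math.log(d) for d in d_gen])


def discriminator_objective(d_real, d_gen):
    """Equation 2.2: mean of log D(x_real) plus mean of log(1 - D(x_gen))."""
    real_term = mean([math.log(d) for d in d_real])
    gen_term = mean([math.log(1.0 - d) for d in d_gen])
    return real_term + gen_term


# Hypothetical discriminator outputs in (0, 1) for a small batch.
d_real = [0.9, 0.8, 0.95]
d_gen_detected = [0.1, 0.2, 0.05]  # D recognizes the generated samples
d_gen_fooling = [0.8, 0.9, 0.7]    # D mistakes the generated samples for real ones

print(generator_objective(d_gen_detected), generator_objective(d_gen_fooling))
print(discriminator_objective(d_real, d_gen_detected),
      discriminator_objective(d_real, d_gen_fooling))
```

The generator objective is larger for the batch that fools the discriminator, whereas the discriminator objective is larger for the batch it detects, which reflects the opposing roles of the two networks.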
The original GAN approach has several drawbacks. Three of the main disadvantages are
instability of the training, mode collapse, and vanishing gradients [79]: Since the two networks
have opposing objectives, the training can become unstable, and there is no guarantee that
the networks will converge. Additionally, the generator might find a data sample that looks
realistic to the discriminator and thus, generates only this sample with or without small
variations. This leads to poor variety within the samples and is termed mode collapse. The
problem of vanishing gradients concerns the training of the GAN. If the discriminator is too
good at detecting the synthesized samples, the generator does not get enough information
to generate more realistic samples. The generator’s gradient then becomes so small that the
weight updates computed via backpropagation are negligible, and the
generator does not improve anymore. Due to these limitations and to increase the range of
applications, modifications to the classical GAN have been proposed. Among those variants are
the Wasserstein Generative Adversarial Network (WGAN) [80] as well as the pix2pix GAN [81].
The WGAN is a variant that tries to solve the problems listed above, whereas the pix2pix GAN
is specialized for generating one image based on another, so-called image-to-image translations.
2.4.2 Wasserstein GAN
Several improvements have been proposed to counteract the disadvantages of the original GAN
model. First, Radford et al. suggested a deep convolutional GAN consisting of convolutional
layers instead of fully connected ones [82]. They reported more stable training and overall
good image representations that could be leveraged in image classification tasks. Additionally,
Salimans et al. suggested improvements to the classical GAN architecture, such as feature
matching or minibatch discrimination [83]. These changes contributed to stabilizing training
as well as reducing mode collapse. To substantially stabilize and improve training, Arjovsky et
al. introduced a new type of GAN, the WGAN [80]. The WGAN provides a new perspective
on the cost functions and overall roles of the generator and the discriminator. This resulted in
more stable networks less prone to mode collapse and vanishing gradients.
The generator’s task can be regarded as synthesizing an approximation to the real data
distribution. Assuming an optimal discriminator, the classical GAN essentially minimizes
the Jensen-Shannon divergence (JS divergence) between the distribution of the real and the
distribution of the generated data [26]. Instead of minimizing the JS divergence, Arjovsky et al.
proposed to minimize the Wasserstein or Earth mover’s distance between the two distributions.
It describes the minimum cost of transporting “mass” from one point to the other in order to
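transform one distribution ($p_{\text{gen}}$) into another distribution ($p_{\text{real}}$). Formally, it can be defined
as:
\[
  W(p_{\text{gen}}, p_{\text{real}}) = \inf_{\gamma \in \Pi(p_{\text{gen}}, p_{\text{real}})}
  \mathbb{E}_{(x, y) \sim \gamma}\left[\lVert x - y \rVert\right], \tag{2.3}
\]
where $\Pi(p_{\text{gen}}, p_{\text{real}})$ is the set of all joint distributions $\gamma(x, y)$ whose marginals are
$p_{\text{gen}}$ and $p_{\text{real}}$. Using the Kantorovich-Rubinstein
duality, this can be simplified to [80]:
\[
  W(p_{\text{gen}}, p_{\text{real}}) = \sup_{\lVert f \rVert_L \le 1}
  \mathbb{E}_{x_{\text{real}} \sim p_{\text{real}}}\left[f(x_{\text{real}})\right]
  - \mathbb{E}_{x_{\text{gen}} \sim p_{\text{gen}}}\left[f(x_{\text{gen}})\right]. \tag{2.4}
\]
Here, $f$ denotes a 1-Lipschitz function, which is a function fulfilling the following constraint:
\[
  |f(x_1) - f(x_2)| \le |x_1 - x_2|. \tag{2.5}
\]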
In WGANs, the task of the discriminator is to learn this 1-Lipschitz function in order to help
compute the Wasserstein distance. It outputs a critic score rather than a probability of how
realistic a sample looks. Due to this new role, the discriminator in a WGAN is termed critic.
Compared to the JS divergence in classical GANs, the Wasserstein distance has a more reliable
gradient and is differentiable almost everywhere [80]. Thus, WGANs are more stable and suffer
less from vanishing gradients. Even if the discriminator is trained optimally, the generator
would still be able to learn from it. Additionally, the problem of mode collapse does not seem
to be present [80].
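The transport interpretation of Equation 2.3 can be illustrated in one dimension, where the distance has a simple closed form for empirical samples. The sketch below is a toy example under the assumption of two equally sized 1D samples; it does not reflect how a WGAN estimates the distance in practice, which is done via the critic.

```python
def wasserstein_1d(samples_a, samples_b):
    """Wasserstein-1 distance between two equally sized 1D samples.

    For equal sample sizes in 1D, an optimal transport plan pairs the sorted
    samples, so the distance is the mean absolute difference of sorted values.
    """
    if len(samples_a) != len(samples_b):
        raise ValueError("this sketch assumes equally sized samples")
    a, b = sorted(samples_a), sorted(samples_b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


real = [0.0, 1.0, 2.0, 3.0]
for shift in (1.0, 5.0, 10.0):
    generated = [x + shift for x in real]
    # The distance equals the shift, even when the two samples no longer overlap.
    print(shift, wasserstein_1d(real, generated))
```

Because the distance keeps growing with the shift instead of saturating, it still indicates in which direction the generated samples should move when the two distributions do not overlap.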
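The objective functions of $G$ and the critic in a WGAN are:
\[
  L_G = \max_G \; \mathbb{E}_{x_{\text{gen}} \sim p_{\text{gen}}}\left[f(x_{\text{gen}})\right] \tag{2.6}
\]
\[
  L_{\text{critic}} = \max_{w \in \mathcal{W}} \;
  \mathbb{E}_{x_{\text{real}} \sim p_{\text{real}}}\left[f(x_{\text{real}})\right]
  - \mathbb{E}_{x_{\text{gen}} \sim p_{\text{gen}}}\left[f(x_{\text{gen}})\right], \tag{2.7}
\]
where $w$ denotes the weights in the compact space $\mathcal{W}$. To enforce the Lipschitz continuity,
the weights of the critic are clipped in the WGAN. Since this can still lead to unstable
training, Gulrajani et al. came up with a more elegant solution, the gradient penalty [84].
The gradient penalty constrains the critic by the carefully constructed penalty term
$\lambda \left(\lVert \nabla D(\epsilon x_{\text{real}} + (1 - \epsilon) x_{\text{gen}}) \rVert - 1\right)^2$ with $\epsilon \sim U[0, 1]$ and $\lambda$ weighting the regularization.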
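The penalty term can be sketched for a toy critic as follows. This is an illustration under simplifying assumptions: the gradient is approximated by central finite differences instead of automatic differentiation, the critics are hypothetical linear functions, and the default weight of 10 is an assumed value rather than a setting taken from this thesis.

```python
import math
import random


def numerical_gradient(critic, x, step=1e-5):
    """Central finite-difference gradient of a scalar-valued critic at point x."""
    grad = []
    for i in range(len(x)):
        upper = list(x)
        lower = list(x)
        upper[i] += step
        lower[i] -= step
        grad.append((critic(upper) - critic(lower)) / (2.0 * step))
    return grad


def gradient_penalty(critic, x_real, x_gen, lam=10.0, rng=random):
    """lam * (||grad critic(x_hat)|| - 1)^2 at a random interpolate x_hat."""
    eps = rng.random()  # epsilon drawn uniformly from [0, 1)
    x_hat = [eps * r + (1.0 - eps) * g for r, g in zip(x_real, x_gen)]
    grad = numerical_gradient(critic, x_hat)
    grad_norm = math.sqrt(sum(g * g for g in grad))
    return lam * (grad_norm - 1.0) ** 2


# Hypothetical linear critics: their gradient equals the weight vector everywhere.
def steep_critic(x):
    return 3.0 * x[0] + 4.0 * x[1]   # gradient norm 5, violates the constraint


def lipschitz_critic(x):
    return 0.6 * x[0] + 0.8 * x[1]   # gradient norm 1, satisfies the constraint


x_real, x_gen = [1.0, 2.0], [0.0, 0.0]
print(gradient_penalty(steep_critic, x_real, x_gen))      # approximately 160
print(gradient_penalty(lipschitz_critic, x_real, x_gen))  # approximately 0
```

The penalty vanishes for a critic whose gradient norm is 1 and grows quadratically with the deviation from 1. In a training setup, the term would be averaged over a batch and added to the critic's loss.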
2.4.3 Pix2pix GAN
The pix2pix GAN was first introduced by Isola et al. [81]. It belongs to the subgroup of
conditional Generative Adversarial Networks (cGANs). In cGANs, the generator is usually
conditioned or dependent on other auxiliary information such as a certain label or class. The
objective function of a cGAN is:
\[
  L_{\text{cGAN}}(G, D) = \mathbb{E}_{x, y}\left[\log D(x, y)\right]
  + \mathbb{E}_{x, z}\left[\log\left(1 - D(x, G(x, z))\right)\right], \tag{2.8}
\]
where $x$ is the auxiliary information, $y$ the output sample, and $z$ a noise vector. Whereas the
discriminator tries to maximize this objective, the generator tries to minimize it. In the case
of pix2pix GANs, $x$ and $y$ are both images, and the noise vector $z$ is usually introduced into
the generator as a dropout layer. In practice, directly feeding a noise vector into the generator
leads to the generator ignoring the noise [81]. Thus, noise is only introduced in the form of
dropout, which drops neurons in a neural network with a certain specified probability.
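How dropout acts as a source of randomness can be sketched on a plain list of activations. The example is a generic illustration and not the pix2pix implementation; the function name, the drop probability, and the rescaling of surviving activations (so-called inverted dropout) are assumptions.

```python
import random


def dropout(activations, drop_probability, rng):
    """Zero each activation with the given probability (inverted dropout).

    Surviving activations are rescaled by 1 / (1 - p) so that the expected
    value stays unchanged; this rescaling is a common convention and an
    assumption here, not a detail taken from the text.
    """
    if not 0.0 <= drop_probability < 1.0:
        raise ValueError("drop_probability must lie in [0, 1)")
    keep_scale = 1.0 / (1.0 - drop_probability)
    return [0.0 if rng.random() < drop_probability else a * keep_scale
            for a in activations]


rng = random.Random(0)  # seeded only to make the sketch reproducible
features = [1.0, 2.0, 3.0, 4.0]
noisy_a = dropout(features, 0.5, rng)
noisy_b = dropout(features, 0.5, rng)
print(noisy_a)
print(noisy_b)
```

Each call draws a new random mask, so the same input image can lead to slightly different generator outputs as long as dropout stays active.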
A pix2pix GAN allows for image-to-image translations, i.e., transforming an image from
one modality to another. For example, a landscape photo could be translated from a photo in
the daytime to a photo at night. In the clinical setting, a CT image could be computed based
on an MRI to get a larger variety of CT images [85]. Medical image-to-image translations could
also enable fast processing from one modality to another to reduce the number of necessary
scans or to save time [86].
In a pix2pix GAN, an image is fed into the generator as auxiliary information (see Figure 2.6).
This allows the generator to utilize the contextual information to generate a new image. The
input of the discriminator is a pair of images, the input image of the generator along with the
real image or together with the synthesized image. Again, the discriminator outputs whether
the two images look realistic or not.
Figure 2.6: The architecture of a pix2pix GAN. The input of the generator (G) is an image in
one modality, and the output is the same image in another modality. The task of the discriminator
(D) is to decide whether the image pair contains a generated image or not.
Since an image is fed into the generator, the architecture differs from the classical GAN
approach. Isola et al. proposed two different architectures for the generator [81]: an autoencoder
and a U-Net. An autoencoder consists of an encoding part for feature extraction and a decoding
part to rebuild the spatial dimensions. A U-Net adds to autoencoders by introducing skip
connections between layers in the encoder and the decoder. The U-Net architecture will be
introduced in more depth in Section 2.5.2. In the study by Isola et al., the U-Net generator
outperformed the autoencoder architecture [81].
In the past years, pix2pix GANs have been successfully applied to medical images [30, 87].
They have been used to denoise images [77, 88] or to synthesize one imaging modality from
another (cross-modality synthesis) [78, 89, 90]. In this thesis, a modification of the pix2pix
GAN will be used to automatically process a 4D DSC-MRI into interpretable 3D perfusion
parameter maps (see Chapter 7).
2.5 Evaluation of Synthetic Images
Evaluating synthetic images is not an easy endeavor and depends on the setting in which
the image was synthesized. Therefore, many different approaches have been proposed. This
section discusses standard evaluation metrics, dividing them into image-based evaluation and
downstream task evaluation. The image-based approach compares the generated images or
their distributions directly or indirectly to the real ones, whereas in the downstream task, the
synthesized images are used for training another ML model. The performance of this model
on real test data is then utilized as a performance measure of the synthetic images.
2.5.1 Image-Based Evaluation
First, human observers could directly judge the quality of all synthesized images. This metric
does not rely on the specific architecture that synthesized the image and could always be used.
The disadvantage for medical images, however, is that one or several experts are often needed.
Therefore, this kind of evaluation is highly time-consuming and labor-intensive.
For some synthetic images, the ground truth image is available. Examples are image-to-
image translations with paired training data used to train a pix2pix GAN. For evaluation,
most studies rely on traditional metrics such as the mean absolute error, structural similarity
index measure, or peak signal-to-noise ratio to compare the generated image to the ground
truth [30]. In Chapter 7, we also utilize these metrics to evaluate our generated images.
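Two of the metrics mentioned above, the mean absolute error and the peak signal-to-noise ratio, can be sketched on flattened images as follows. The function names, the toy intensities, and the assumed intensity range of [0, 1] are illustrative choices; the structural similarity index measure is omitted because it requires local window statistics.

```python
import math


def mean_absolute_error(generated, reference):
    """Average absolute intensity difference between two flattened images."""
    return sum(abs(g - r) for g, r in zip(generated, reference)) / len(reference)


def peak_signal_to_noise_ratio(generated, reference, max_value=1.0):
    """PSNR in decibels; max_value is the assumed maximum possible intensity."""
    mse = sum((g - r) ** 2 for g, r in zip(generated, reference)) / len(reference)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy flattened images with intensities in [0, 1] (hypothetical values).
reference = [0.0, 0.5, 1.0, 0.25]
generated = [0.1, 0.5, 0.9, 0.25]
print(mean_absolute_error(generated, reference))
print(peak_signal_to_noise_ratio(generated, reference))
```

A lower mean absolute error and a higher peak signal-to-noise ratio both indicate that the generated image is closer to the ground truth.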
If the ground truth is not available, the images could be evaluated via the activations of
a pre-trained network. Especially for non-medical images, the FID is a popular metric [91].
Specifically, the generated and real images are fed into a pre-trained inception network, which
is a deep, efficient neural network utilizing different filter sizes [92]. The difference between
the activations obtained for generated and real images is then measured. The smaller the
difference, the more similar the synthetic images are to the real ones. Since the inception
network is trained on natural images only, it is debatable whether this metric is suitable for
medical images as this has not been validated yet [30]. Nevertheless, it is still being used
in practice [93, 94]. Recently, a 3D network trained on different medical datasets has been
published [95]. This network offers an interesting possibility to replace the inception network
trained on non-medical images for evaluating 3D synthetic medical images (see Chapter 5).
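For reference, the difference between the activations is commonly quantified by fitting a Gaussian to the activations of each image set and comparing the two Gaussians. The formula below is the widely used definition of the FID and is not quoted from this thesis; the symbols for the means and covariances of the real (r) and generated (g) activations are introduced here for illustration.

```latex
\[
  \mathrm{FID} = \lVert \mu_{r} - \mu_{g} \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_{r} + \Sigma_{g}
  - 2\left( \Sigma_{r} \Sigma_{g} \right)^{1/2} \right)
\]
```

The same expression can be evaluated on the activations of any pre-trained feature extractor, which is what makes replacing the inception network by a medical 3D network possible.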
The inception score is another popular metric for assessing the quality of synthetic non-
medical images [83]. It leverages a similar concept as the FID by utilizing the pre-trained
inception network. In contrast to the FID, it uses the network to predict the probability of a
generated image belonging to a specific class. The predictions are summarized in the inception
score. This score reflects whether images resemble an object of a specific class and whether the
wide range of classes is represented in the generated images. The inception score is therefore
specific to classification tasks and could be translated to medical image classification tasks if a
corresponding pre-trained network exists.
Another approach to estimating whether the generated images are realistic is to compare
the distribution of the real samples to the distribution of the generated samples. For this, the
log-likelihood [26], the Mahalanobis distance [96], the maximum mean discrepancy [97], or
similar metrics can be computed. For these metrics, the synthesized images are regarded as
more realistic the closer the two distributions are to each other. While the FID takes into
account the image quality, it lacks information about mode collapse, i.e., the variation within
the generated images. Thus, Sajjadi et al. introduced precision and recall for distributions [98].
Precision measures how much the distribution of the synthesized images can be generated by
a part of the real distribution. Thus, it quantifies the image quality of the synthetic images.
In comparison to that, recall measures how much of the real distribution can be accounted for
by a part of the generated distribution, hence quantifying mode collapse. Evaluating synthetic
images based on these two metrics offers a more detailed view of the quality of the images and
their diversity (see Chapter 5).
2.5.2 Evaluation Using a Downstream Task
Generating images for a downstream task is a good option to circumvent the problem of
selecting an appropriate metric for evaluating synthetic data [30]. Here, synthetic data and a
corresponding label are generated for a certain task, such as classification or segmentation.
After generation, a model is trained on the synthetic data, which can be evaluated on real
data. Its performance provides an estimate of how useful the synthesized data is.
Figure 2.7: The architecture of a U-Net. The image is fed through an encoding path for feature
extraction and a decoding path and outputs a segmentation mask. Skip connections preserve
spatial information. The figure is based on Figure 3 by Isola et al. [81].
In this thesis, we utilize a segmentation network as a downstream task for evaluating
synthetic data for brain vessel segmentation (see Chapters 4–6). For this, we use a specific type
of neural network, the U-Net architecture. Initially, the U-Net was proposed by Ronneberger et
al. for biomedical segmentation problems such as cell segmentation [56]. Since it was designed
to work with limited training data, it has been broadly applied to various other medical problems, including
brain vessel segmentation [38, 99]. The U-Net consists of an encoding and a decoding path
with skip connections between the layers of the two paths (see Figure 2.7). Within these skip
connections, the feature maps of the encoding path are copied to the decoding layers to better
preserve the spatial information. In the case of vessel segmentation, the performance can
be calculated by metrics such as the Dice Similarity Coefficient and the Hausdorff distance
between the predicted and ground truth segmentation [100]. The Dice Similarity Coefficient
(also known as the F1 score) measures the overlap of the predicted segmentation and the ground truth
scaled by the total number of pixels in both. In contrast to this, the Hausdorff distance measures
the maximum of the minimum distances from each point of one segmentation to the other. It thus
estimates the largest deviation between the predicted segmentation and the ground truth.
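Both metrics can be sketched in a few lines. The example assumes flat binary masks for the Dice Similarity Coefficient and lists of foreground coordinates for the Hausdorff distance; the function names and toy inputs are hypothetical, and the brute-force distance computation is meant for illustration rather than for full 3D volumes.

```python
import math


def dice_coefficient(predicted, truth):
    """Dice = 2 |P ∩ T| / (|P| + |T|) for flat binary masks containing 0 or 1."""
    overlap = sum(p * t for p, t in zip(predicted, truth))
    total = sum(predicted) + sum(truth)
    return 1.0 if total == 0 else 2.0 * overlap / total


def directed_hausdorff(points_a, points_b):
    """Largest distance from a point in A to its nearest neighbor in B."""
    return max(min(math.dist(a, b) for b in points_b) for a in points_a)


def hausdorff_distance(points_a, points_b):
    """Symmetric Hausdorff distance: maximum of the two directed distances."""
    return max(directed_hausdorff(points_a, points_b),
               directed_hausdorff(points_b, points_a))


# Toy segmentation: one of two predicted foreground pixels matches the ground truth.
predicted_mask = [1, 1, 0, 0]
truth_mask = [1, 0, 0, 0]
print(dice_coefficient(predicted_mask, truth_mask))

# Foreground coordinates: one predicted point lies far away from the ground truth.
predicted_points = [(0, 0), (3, 4)]
truth_points = [(0, 0)]
print(hausdorff_distance(predicted_points, truth_points))
```

The toy example shows the complementary nature of the two metrics: the Dice Similarity Coefficient summarizes the overall overlap, while the Hausdorff distance is determined by the single worst outlier.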
3 Related Work
In the following, we review related work in the field of GANs in medical imaging. We split the
review into the two parts of this thesis: the synthesis of medical images using unconditional
GANs for the purpose of data sharing and image-to-image translations, particularly for
treatment planning.
3.1 Synthesis of Medical Images Using Unconditional GANs for Data Sharing
The motivation for synthesizing medical images using an unconditional GAN is usually
to increase data availability, typically for data augmentation or anonymization. Many
types of medical images were synthesized for data augmentation in the past years [101].
Among those were chest X-rays [102], retinal images [103, 104], and liver CTs [76]. In the
neuroimaging domain, Bowles et al. conducted a comprehensive study on augmenting both CT
and Magnetic Resonance (MR) images with progressive growing GANs for cerebrospinal fluid
segmentation [105]. Moreover, Shin et al. [29] and Foroozandeh et al. [106] synthesized images
using pix2pix GANs and progressive growing GANs, respectively, for improving brain tumor
segmentation performance.
In the context of medical image synthesis, most studies synthesized 2D images, although
many medical images have a third dimension. Thus, synthesizing only 2D images might
neglect important volumetric information. To date, only a few studies have generated 3D
images, which were usually downsampled due to computational constraints. For instance, both
Eklund [107] and Sun et al. [108] generated downsampled or resized 3D brain MR images.
Kwon et al. additionally synthesized thorax CTs [33]. In the field of stroke, Kwon et al.
generated downsampled 3D MR images with stroke lesions [33]. Still, there remains a lack
of studies that synthesize high-resolution 3D brain volumes. In particular, for capturing fine
structures such as brain vessels, generating high-resolution volumes is crucial.
Unconditional GANs are especially useful for synthesizing privacy-preserving images as the
training images are not directly fed into the generator but indirectly via the discriminator’s
input. There are only a few applications of GANs for anonymization and thus for data
sharing in the medical imaging field. While Shin et al. tested their synthetic MR images for
anonymization [29], they did not provide any privacy guarantees. Other studies synthesized
chest X-ray images with differential privacy guarantees [73, 109, 110]. Zhang et al. [109]
and Nguyen et al. [110] additionally trained their GAN architectures in a federated learning
approach.
So far, the applications of GANs synthesizing data for sharing in the neuroimaging domain
are scarce, with some MR sequences such as TOF-MRA not being considered. Additionally,
even fewer studies have generated 3D images, and no study to date has synthesized differentially
private neurological images. In the first part of this thesis, we explore the synthesis of labeled
TOF-MRA for the ultimate purpose of data sharing using different 2D GAN architectures
(see Chapter 4). In Chapter 5, we extend our GANs to 3D to capture the third dimension
and synthesize high-resolution TOF-MRA volumes. In the last chapter of the first part (i.e.,
Chapter 6), we implement a differentially private GAN to quantify the privacy and explore the
privacy-utility trade-off.
3.2 Image-to-Image Translation for Treatment Planning
GANs have been shown to be useful for many image-to-image translation tasks. Among
them are image denoising [88], super-resolution [111], cross-modality synthesis [78], and
reconstruction [112]. The two main architectures used for image-to-image translations are
the pix2pix GAN [81] and the CycleGAN [113]. A CycleGAN consists of two discriminators
and two generators that are simultaneously trained and, unlike the pix2pix GAN, does not rely
on aligned training pairs. While the CycleGAN shows good performance in medical tasks [114,
115], it is still recommended to use a pix2pix GAN architecture when paired training data is
available [112].
In the field of stroke imaging, most studies using GAN architectures addressed
the use case of lesion segmentation. Many ML approaches have already been suggested for
automating stroke lesion segmentation, which could support clinicians in treatment
decision-making [116, 117, 118]. In this context, GAN-based studies used CT images either
to synthesize the segmentation mask directly using image-to-image GANs [119, 120]
or to generate MR scans from the CT images [31, 121]. Furthermore, Wang et al. used a
GAN to synthesize the segmentation mask from MRI [122], whereas Platscher et al. used the
segmentation mask to create new MR scans [123]. Other GAN applications in stroke include
denoising, such as CT super-resolution [111] and low-dose to full-dose CT perfusion [124], as
well as missing MR sequence synthesis [125] and white matter hyperintensities prediction [126].
Recently, Benzakoun et al. proposed a GAN that aimed to shorten MRI scanning time for
stroke [86]. They synthesized fluid-attenuated inversion recovery images from DWI, which can
be used for treatment decisions when the stroke onset time is unknown. The study showed that
the synthesized fluid-attenuated inversion recovery images had a similar diagnostic performance
to the real images.
Similar to Benzakoun et al. [86], the automatic processing of 4D DSC-MRI images resulting
in expert-level perfusion parameter maps could save time in stroke care. McKinley et al.
have utilized classical ML approaches for generating expert-validated perfusion maps. Other
related studies have created perfusion maps using DL approaches such as adapted U-Net
architectures [127, 128, 129]. While these studies show promising initial results, no study has
synthesized the perfusion maps utilizing GANs yet. In Chapter 7, we implement a modified
pix2pix GAN with additional temporal convolutions to generate expert-level perfusion maps
from DSC-MRI.
Part I
Synthesis of Medical Images for
Data Sharing
4
Synthesizing Anonymized and Labeled
TOF-MRA Patches for Brain Vessel
Segmentation Using Generative
Adversarial Networks
4.1 Context Within Thesis
GANs are typically used to synthesize new data. In this study, we focused on the feasibility
of GANs for generating 2D TOF-MRA image patches. This type of imaging visualizes
the cerebrovascular system, which is clinically relevant for cerebrovascular diseases such as
stroke. TOF-MRA images can be used to segment brain vessels and extract the patient’s
individual vessel tree. To evaluate the synthetic TOF-MRA patches appropriately, we generated
segmentation labels along with the image patches. We compared three different GAN
architectures for synthesis and trained a U-Net on the different synthesized image-label
pairs. These models were then evaluated and tested on real data.
Additionally, we simulated data sharing of our synthesized image-label pairs in a transfer
learning approach by fine-tuning the models trained on synthetic data on a second dataset.
We compared the segmentation performance of the pre-trained models to a model trained from
scratch with an increasing number of patches from the second dataset. We showed that our
pre-trained models led to superior performance compared to the newly trained model. This
chapter laid the groundwork for Chapters 5 and 6.
4.2 Journal Article
This chapter is based on the following publication that was published in Computers in Biology
& Medicine:
T. Kossen, P. Subramaniam, V. I. Madai, A. Hennemuth, K. Hildebrand, A. Hilbert, J.
Sobesky, M. Livne, I. Galinovic, A. A. Khalil, J. B. Fiebach, and D. Frey. “Synthesizing
anonymized and labeled TOF-MRA patches for brain vessel segmentation using
generative adversarial networks”. In: Computers in Biology and Medicine 131 (2021).
doi:10.1016/j.compbiomed.2021.104254
The original journal article is reprinted with permission of Elsevier.
Author Contribution
The first author Tabea Kossen conceptualized the study and interpreted the results together
with VIM, ML and DF. She implemented the GAN architectures and evaluations or supervised
PS in doing so. Additionally, she was responsible for the project administration, wrote the first
version of the manuscript, created the figures and coordinated the journal submission process.
Code Availability
The code for this project is publicly available:
https://github.com/prediction2020/GANs-for-anonymized-labeled-TOF-MRA-patches
Computers in Biology and Medicine 131 (2021) 104254
Available online 15 February 2021
0010-4825/© 2021 Elsevier Ltd. All rights reserved.
Synthesizing anonymized and labeled TOF-MRA patches for brain vessel
segmentation using generative adversarial networks
Tabea Kossen (a,b,*), Pooja Subramaniam (a,c), Vince I. Madai (a,d), Anja Hennemuth (b,e,f),
Kristian Hildebrand (g), Adam Hilbert (a), Jan Sobesky (h,i), Michelle Livne (a), Ivana Galinovic (i),
Ahmed A. Khalil (i,j,k,l), Jochen B. Fiebach (i), Dietmar Frey (a)
(a) CLAIM - Charité Lab for AI in Medicine, Charité Universitätsmedizin Berlin, Germany
(b) Department of Computer Engineering and Microelectronics, Computer Vision & Remote Sensing, Technical University Berlin, Berlin, Germany
(c) Department of Electrical Engineering and Computer Science, Technical University of Berlin, Berlin, Germany
(d) School of Computing and Digital Technology, Faculty of Computing, Engineering and the Built Environment, Birmingham City University, Birmingham, UK
(e) Institute for Imaging Science and Computational Modelling in Cardiovascular Medicine, Charité Universitätsmedizin Berlin, Berlin, Germany
(f) Fraunhofer MEVIS, Bremen, Germany
(g) Department VI Computer Science and Media, Beuth University of Applied Sciences, Berlin, Germany
(h) Johanna-Etienne-Hospital, Neuss, Germany
(i) Centre for Stroke Research Berlin, Charité Universitätsmedizin Berlin, Berlin, Germany
(j) Department of Neurology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
(k) Mind, Brain, Body Institute, Berlin School of Mind and Brain, Humboldt University Berlin, Berlin, Germany
(l) Berlin Institute of Health, Berlin, Germany
ARTICLE INFO
Keywords:
Anonymization
Generative adversarial networks
Image segmentation
ABSTRACT
Anonymization and data sharing are crucial for privacy protection and acquisition of large datasets for medical
image analysis. This is a big challenge, especially for neuroimaging. Here, the brain’s unique structure allows for
re-identification and thus requires non-conventional anonymization. Generative adversarial networks (GANs)
have the potential to provide anonymous images while preserving predictive properties.
Analyzing brain vessel segmentation, we trained 3 GANs on time-of-flight (TOF) magnetic resonance angi-
ography (MRA) patches for image-label generation: 1) Deep convolutional GAN, 2) Wasserstein-GAN with
gradient penalty (WGAN-GP) and 3) WGAN-GP with spectral normalization (WGAN-GP-SN). The generated
image-labels from each GAN were used to train a U-net for segmentation and tested on real data. Moreover, we
applied our synthetic patches using transfer learning on a second dataset. For an increasing number of up to 15
patients we evaluated the model performance on real data with and without pre-training. The performance for all
models was assessed by the Dice Similarity Coefficient (DSC) and the 95th percentile of the Hausdorff Distance
(95HD).
Comparing the 3 GANs, the U-net trained on synthetic data generated by the WGAN-GP-SN showed the highest
performance to predict vessels (DSC/95HD 0.85/30.00) benchmarked by the U-net trained on real data (0.89/
26.57). The transfer learning approach showed superior performance for the same GAN compared to no pre-
training, especially for one patient only (0.91/24.66 vs. 0.84/27.36).
In this work, synthetic image-label pairs retained generalizable information and showed good performance for
vessel segmentation. Besides, we showed that synthetic patches can be used in a transfer learning approach with
independent data. This paves the way to overcome the challenges of scarce data and anonymization in medical
imaging.
1. Introduction
Modern deep learning methods have revolutionized the field of
natural image analysis [18,32]. These methods are translated to medical
image analysis with growing success [19,20,27]. However, in contrast to
natural images, the number of data sets in medical image analysis is
* Corresponding author. CLAIM - Charité Lab for AI in Medicine, Charité Universitätsmedizin Berlin, Germany.
E-mail address: [email protected] (T. Kossen).
https://doi.org/10.1016/j.compbiomed.2021.104254
Received 17 November 2020; Received in revised form 27 January 2021; Accepted 3 February 2021
usually orders of magnitude smaller since their availability is limited
owing to data privacy regulation. This poses a continuous challenge for
deep learning research in the medical imaging field. To meet this chal-
lenge, anonymization of medical images is an essential method to ensure
both data privacy and data availability for research. However, current
anonymization methods in neuroimaging such as face blurring or face
removal still allow re-identification and thus cannot be applied [1,26,
34]. These results call for new techniques to anonymize medical neu-
roimaging data to both protect patient privacy and to facilitate research
progress.
Generative adversarial networks (GANs) have the potential to fulfill
this need. GANs have already been applied successfully for medical
imaging data synthesis [24,33,37]. Also, first pilot studies have already
made use of GANs for anonymization purposes [15,30]. However, ap-
plications for neuroimages are scarce and synthesizing images often
requires additional patient information such as a segmentation label
[30]. This means that patient information is still fed into the model and
the generated images are then not properly anonymized. Thus, there is a
need to investigate the ability of GANs to create state-of-the-art anon-
ymous synthetic neuroimaging data maintaining the predictive prop-
erties of the original data. Importantly, such an approach would have
the most beneficial impact if the corresponding labels would be created
in the same process since many supervised deep learning applications
require time-consuming manual labeling of the dataset by experienced
physicians.
In this work, we utilize arterial brain vessel segmentation to test the
ability of GANs to create synthetic neuroimaging data and correspond-
ing labels. Moreover, we investigate the generalizability of the synthe-
sized data on a second, independent dataset. With respect to the
generative architectures, we train 3 different GAN architectures on time-
of-flight (TOF) magnetic resonance angiography (MRA) image patches
of patients with cerebrovascular disease: 1) Deep Convolutional GAN
(DCGAN), 2) Wasserstein GAN with gradient penalty (WGAN-GP) and 3)
WGAN-GP using spectral normalization (WGAN-GP-SN). With each GAN
type, we synthesized both the image and the corresponding label. We
validate the generated synthetic patches using two different approaches.
In the first approach, we evaluate the quality of the generated patches a)
using the Fréchet inception distance (FID) and b) by training a vessel
segmentation U-net on the synthetic patches. The U-net’s performance is
then assessed on real test data. In total, 66 patients were utilized for this
analysis. In the second approach, we use the synthetic patches to pre-
train a vessel segmentation model and apply the network weights in a
transfer learning setting to pre-initialize the training of a U-net model
using up to 15 patients from a second, independent TOF-MRA dataset.
The performance of this model is then compared to a U-net model
without any pre-training. Finally, to facilitate and accelerate future
research on arterial vessel segmentation and to corroborate the useful-
ness of the effective anonymization procedure, we make the synthesized
image-label pairs generated in our study available upon request.
Taken together, the contributions of the paper are as follows. We present
effectively anonymized and labeled TOF-MRA patches for brain vessel
segmentation, to our knowledge for the first time for this imaging
modality. Furthermore, we compare three different state-of-the-art
GAN architectures and evaluate our synthesized labeled data on an
independent, second dataset in a novel evaluation pipeline. We show
that pre-training a vessel segmentation network using our synthetic data
yields superior performance compared to no pre-training and can reduce
the amount of additional training data. Finally, we make our synthesized
data available upon request to facilitate further research.
2. Related work
GANs have already been shown to be successful in many applications
of data augmentation in medical imaging [6,29] as well as in
neuroimaging [3,5]. Here, real medical images together with synthe-
sized images were used to improve models that were trained on real data
only. Although we also report results on data augmentation, this study
focused on models trained on purely synthetic data and their
generalizability to a new dataset.
Generating medical images with labels is not a new idea. Neff et al.
showed that lung x-rays with corresponding segmentation labels can be
generated using a GAN architecture [24]. Guibas et al. demonstrated the
synthesis of labeled retina images using two GANs [7]. While these
studies focused on 2D medical images, we use a 3D dataset and evaluate
the performance on an independent dataset. In the neuroimaging
domain Foroozandeh et al. recently showed that synthesized and labeled
MR images can improve tumor segmentation performance [5]. How-
ever, the focus here was on augmentation and models trained on syn-
thesized data alone yielded comparably low performance.
In addition to that, we tested the generalizability of our GAN ar-
chitecture via a transfer learning approach. While previous studies such
as Foroozandeh et al. and Frid-Adar et al. only considered one dataset [5,
6], we successively added training images from a second dataset to a
pre-trained segmentation model. After that, we compared its performance
to that of a model trained from scratch.
Whereas Guibas et al. provided an evaluation on a second dataset, we
here extended the evaluation by adding images successively to get more
robust results [7].
3. Methods
3.1. Network architecture
The architecture of the proposed DCGAN was adapted from Radford
et al. [25] and Neff et al. [24]. The WGAN-GP is an extension of the
original Wasserstein GAN [2] using gradient penalty for regularization
[8]. For the third architecture WGAN-GP-SN spectral normalization was
used in the convolutional layers of the WGAN-GP [21]. Our code is
openly available.¹ The proposed methods and the structure of the GANs are
shown in Fig. 1.
The generator G of all architectures took a noise vector of length 100
sampled from a Gaussian distribution as input. The noise vector was then
fed through 6 upsampling convolutional layers using a kernel size of 5
and stride of 2. After each convolution layer, a batch normalization layer
and a ReLU activation layer were added, except for the last convolution
layer. The activation function used after the last convolution layer is the
hyperbolic tangent function. The network then outputs two 96 × 96
images that correspond to one image-label pair $x_{gen} \sim p_{gen}$. The objective
function for the generators of all architectures was built upon:
$$L_G = \max_G \mathbb{E}_{x_{gen} \sim p_{gen}}[\log(D(x_{gen}))] \qquad (1)$$
This term is maximized if the discriminator regards the generated
input as real with a high certainty ($D(x_{gen})$ close to 1). In other words, the
discriminator is fooled by the generator. The discriminator D for all
architectures took two 96 ×96 images as input which correspond to
either a real image-label pair or generated image-label pair. The pairs
were again fed through 6 convolutional layers with a kernel size of 5 and
stride of 2. After each convolution layer, a batch normalization layer and
a leaky ReLU (with a slope of 0.2) were added, except for the last
convolution layer. The activation function used after the last
convolution layer in the DCGAN was a sigmoid function. The objective
function of the discriminator for the DCGAN was:
$$L_D = \max_D \mathbb{E}_{x_{real} \sim p_{real}}[\log D(x_{real})] + \mathbb{E}_{x_{gen} \sim p_{gen}}[\log(1 - D(x_{gen}))] \qquad (2)$$
where $x_{real} \sim p_{real}$ denoted the real image-label pair. Here, the first part of
¹ https://github.com/prediction2020/GANs-for-anonymized-labeled-TOF-MRA-patches
the equation maximizes for the real input to be identified as real
($D(x_{real})$ close to 1) and the second part for the generated input to be
identified as such ($D(x_{gen})$ close to 0).
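To make Eqs. (1) and (2) concrete, the following minimal NumPy sketch evaluates both objectives on a batch of discriminator outputs. The function names, the example values and the small constant `eps` (added only for numerical safety) are illustrative assumptions and are not taken from the published code.

```python
import numpy as np

def generator_objective(d_gen, eps=1e-12):
    """Eq. (1): mean of log D(x_gen); the generator wants this to be large."""
    d_gen = np.asarray(d_gen, dtype=float)
    return float(np.mean(np.log(d_gen + eps)))

def discriminator_objective(d_real, d_gen, eps=1e-12):
    """Eq. (2): mean log D(x_real) plus mean log(1 - D(x_gen))."""
    d_real = np.asarray(d_real, dtype=float)
    d_gen = np.asarray(d_gen, dtype=float)
    return float(np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_gen + eps)))

# Hypothetical sigmoid outputs of D for four real and four generated pairs.
d_real = np.array([0.9, 0.8, 0.95, 0.85])
d_gen_fooled = np.array([0.9, 0.8, 0.95, 0.85])  # D takes the generated pairs for real
d_gen_caught = np.array([0.1, 0.2, 0.05, 0.15])  # D rejects the generated pairs

# The generator objective is larger when D is fooled,
# the discriminator objective is larger when D catches the fakes.
print(generator_objective(d_gen_fooled) > generator_objective(d_gen_caught))
print(discriminator_objective(d_real, d_gen_caught)
      > discriminator_objective(d_real, d_gen_fooled))
```

In an actual training loop both quantities would be computed on mini-batches and optimized by gradient steps in opposite directions; the sketch only shows how the two terms respond to the discriminator's outputs.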
For the WGAN-GP and WGAN-GP-SN, a gradient penalty term for
regularization was added to the discriminator’s loss:
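$$\mathrm{loss}_D = D(x_{gen}) - D(x_{real}) + \lambda\left(\lVert \nabla D(\varepsilon x_{real} + (1 - \varepsilon) x_{gen}) \rVert - 1\right)^2, \qquad (3)$$
where $\varepsilon \sim U[0,1]$ and $\lambda = 10$. The gradient penalty enforces the Lipschitz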
constraint. In this way, the norm of the gradient is bounded and does not
lead to exploding gradients. Overall, this stabilizes the training of the
GAN [8]. Since the discriminator acted as a critic, the sigmoid activation
function in the last convolutional layer was omitted. The batch
normalization was replaced by instance normalization to normalize
across features and channels in the WGAN-GP. In the WGAN-GP-SN
architecture, spectral normalization was used instead of instance
normalization.
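The critic loss of Eq. (3) can be illustrated with a small NumPy sketch. To keep it self-contained, the convolutional critic is replaced by a hypothetical toy critic whose input gradient is known in closed form; in a PyTorch implementation the gradient would instead be obtained by automatic differentiation. All names and array shapes below are assumptions made for illustration.

```python
import numpy as np

def critic(x, w):
    """Toy critic D(x) = sum_j w_j * tanh(x_j); a stand-in for the CNN critic."""
    return np.sum(w * np.tanh(x), axis=1)

def critic_input_gradient(x, w):
    """Analytic gradient of the toy critic with respect to its input x."""
    return w * (1.0 - np.tanh(x) ** 2)

def wgan_gp_critic_loss(x_real, x_gen, w, rng, lam=10.0):
    """Eq. (3), averaged over the batch."""
    # One interpolation coefficient eps ~ U[0, 1] per sample.
    eps = rng.uniform(0.0, 1.0, size=(x_real.shape[0], 1))
    x_hat = eps * x_real + (1.0 - eps) * x_gen
    # Penalize deviations of the gradient norm from 1 at the interpolates.
    grad_norm = np.linalg.norm(critic_input_gradient(x_hat, w), axis=1)
    penalty = lam * (grad_norm - 1.0) ** 2
    return float(np.mean(critic(x_gen, w) - critic(x_real, w) + penalty))

rng = np.random.default_rng(0)
x_real = rng.normal(size=(8, 16))  # stand-ins for flattened real image-label pairs
x_gen = rng.normal(size=(8, 16))   # stand-ins for generated image-label pairs
w = rng.normal(size=16)
loss = wgan_gp_critic_loss(x_real, x_gen, w, rng)
```

The penalty vanishes exactly when the gradient norm at the interpolated points equals 1, which is how the term pushes the critic towards the Lipschitz constraint mentioned above.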
For training the DCGAN, the Adam optimizer [17] with a learning
rate of 0.0003 with β1=0.5 was used for both the generator and the
discriminator. The batch size was 512 and the model was trained for 178
epochs. To improve stability of the training, label smoothing (ranges
0.7–1.2/0–0.3) and feature matching between the last convolutional
layer using L1 norm were applied [28].
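The two stabilization techniques can be sketched as follows. The text only gives the label ranges, so drawing the soft targets uniformly from those ranges is an assumption, as is the reduction used in the feature matching term (L1 distance between batch-mean feature vectors, in the spirit of [28]). Function names and shapes are hypothetical.

```python
import numpy as np

def smoothed_targets(batch_size, real, rng):
    """Draw soft discriminator targets instead of hard 1/0 labels.

    Assumption: uniform sampling from 0.7-1.2 (real) and 0-0.3 (generated).
    """
    low, high = (0.7, 1.2) if real else (0.0, 0.3)
    return rng.uniform(low, high, size=batch_size)

def feature_matching_l1(feat_real, feat_gen):
    """L1 distance between batch-mean features of real and generated pairs.

    feat_real, feat_gen: arrays of shape (batch, n_features), e.g. flattened
    activations of the discriminator's last convolutional layer.
    """
    return float(np.sum(np.abs(feat_real.mean(axis=0) - feat_gen.mean(axis=0))))

rng = np.random.default_rng(0)
real_targets = smoothed_targets(512, real=True, rng=rng)   # batch size of the DCGAN
fake_targets = smoothed_targets(512, real=False, rng=rng)
```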
For WGAN-GP and WGAN-GP-SN, the Adam optimizer was utilized
with a learning rate of 0.0001 with β1=0 and β2=0.9 for both
generator and discriminator. The WGAN-GP was trained for 194 epochs with a batch size
of 128 and the WGAN-GP-SN for 157 epochs with a batch size of 64.
In each epoch the discriminator was updated five times and the gener-
ator once. All models were implemented in PyTorch and trained on a
Tesla V100.
3.2. Patients
MRA data from a total of 121 patients from two studies were used:
PEGASUS (N =66) and 1000Plus (N =55). All patients were diagnosed
with a cerebrovascular disease. Details on both studies can be found in
previous papers, for the PEGASUS study see Mutke et al. [23], for the
1000Plus study see Hotter et al. [13]. All the patients gave their
informed written consent. The studies have been conducted in accordance
with the authorized ethical review committee of Charité - Universitätsmedizin Berlin.
Scans were performed on a clinical 3T whole-body system (Magnetom
Trio, Siemens Healthcare, Erlangen, Germany) using a 12-channel
receive radiofrequency coil (Siemens Healthcare) tailored for head
imaging.
Parameters PEGASUS: voxel size = (0.5 × 0.5 × 0.7) mm³; matrix
size: 312 × 384 × 127; TR/TE = 22 ms/3.86 ms; acquisition time: 3:50
min, flip angle = 18°.
Parameters 1000Plus: voxel size = (0.5 × 0.5 × 0.7) mm³; matrix
size: 312 × 384 × 127; TR/TE = 22 ms/3.86 ms; acquisition time: 3:50
min, flip angle = 18°.
For both datasets, skull-stripping was applied. The segmentation
labels were produced semi-manually using a standardized pipeline along
with 4 raters correcting the labels as described in Livne et al. [20].
Fig. 1. Workflow of this study (A) and basic architecture of the generative adversarial networks that were trained (B).
3.3. Data splitting and patch extraction
For the anonymization, 41 out of the 66 PEGASUS patients were used
as a training set, 11 were used for validation and 14 for testing. For the
transfer learning approach, one to 15 patients in increments of two of
the 1000Plus data were utilized for training. The 1000Plus validation set
consisted of 10 and the test set of 40 patients.
Due to memory considerations, 2D patches of size 96 ×96 were
extracted from each patient instead of using the whole volume. The data
contained 1% vessels and 99% background. To compensate for this
imbalance, 500 patches per patient with a brain vessel in the center were
extracted. Then, 500 random patches per patient were added. The input
patches were normalized to a range between −1 and 1 for the GAN used
for anonymization. For the U-net segmentation model, the input was
normalized patch-wise to zero-mean and unit-variance.
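The patch sampling and the two normalization schemes can be sketched in NumPy as below. This is a simplified illustration on a single 2D slice with random stand-in data: the helper names, the zero-padding at the image border and the use of per-patch min-max scaling for the range between −1 and 1 are assumptions, since the text does not specify these details.

```python
import numpy as np

PATCH = 96
HALF = PATCH // 2

def extract_patch(slice_2d, center):
    """Cut a 96 x 96 patch around `center` (row, col), zero-padding at borders."""
    padded = np.pad(slice_2d, HALF, mode="constant")
    r, c = center  # padding shifts indices by HALF, so [r, r + PATCH) is centered on r
    return padded[r:r + PATCH, c:c + PATCH]

def sample_centers(label_2d, n_vessel, n_random, rng):
    """Vessel-centered positions followed by uniformly random positions."""
    vessel_idx = np.argwhere(label_2d > 0)
    vessel_centers = vessel_idx[rng.integers(0, len(vessel_idx), size=n_vessel)]
    random_centers = np.stack(
        [rng.integers(0, label_2d.shape[0], size=n_random),
         rng.integers(0, label_2d.shape[1], size=n_random)], axis=1)
    return np.concatenate([vessel_centers, random_centers], axis=0)

def scale_to_minus1_1(patch):
    """Min-max scaling to [-1, 1] (GAN input)."""
    lo, hi = patch.min(), patch.max()
    if hi == lo:
        return np.zeros_like(patch, dtype=float)
    return 2.0 * (patch - lo) / (hi - lo) - 1.0

def standardize(patch):
    """Patch-wise zero mean and unit variance (U-net input)."""
    std = patch.std()
    return (patch - patch.mean()) / (std if std > 0 else 1.0)

rng = np.random.default_rng(0)
image = rng.random((312, 384))                 # stand-in for one TOF-MRA slice
label = np.zeros((312, 384), dtype=np.uint8)
label[150:160, 200:203] = 1                    # synthetic "vessel"
centers = sample_centers(label, n_vessel=5, n_random=5, rng=rng)
patches = np.stack([extract_patch(image, tuple(c)) for c in centers])
```

With 500 vessel-centered and 500 random positions per patient, the same procedure would yield the 1000 patches per patient described above.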
3.4. Performance evaluation
The hyperparameters of the GAN architectures were pre-selected
based on visual inspection. After that, the generated images were
quantitatively evaluated using three different metrics: 1) Fréchet
inception distance (FID) [11], 2) the DSC and 3) the 95th percentile of
the Hausdorff distance (95HD) of a U-net segmentation model. The FID
measures the similarity of the real and generated images by feeding both
into an Inception-v3 network. The difference between the activations in
the pool3 layer inside the Inception-v3 network is then calculated as
follows:
$$\mathrm{FID} = \lVert \mu_{real} - \mu_{gen} \rVert^2 + \mathrm{Tr}\left(\sigma_{real} + \sigma_{gen} - 2(\sigma_{real}\sigma_{gen})^{1/2}\right), \qquad (4)$$
where $x_{real} \sim N(\mu_{real}, \sigma_{real})$ and $x_{gen} \sim N(\mu_{gen}, \sigma_{gen})$ are the distributions
of the features in the pool3 layer of the real and generated data
respectively. In this way, the network's activation is measured both for
real and generated data and then compared. If the similarity between
them is high, we expect similar activation and thus, a small distance.
For robustness, the FID was calculated on 4 different sets of gener-
ated data, each containing 41,000 patches of all three architectures with
the respective 41,000 real patches. The lower the FID, the higher the
similarity of the generated data to the original data.
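Given two matrices of extracted features, Eq. (4) can be evaluated with a few lines of NumPy. The sketch below uses random low-dimensional features in place of Inception-v3 pool3 activations, and it computes the trace of the matrix square root from the eigenvalues of the covariance product; both choices, like the function name, are assumptions made to keep the example self-contained.

```python
import numpy as np

def frechet_distance(feat_real, feat_gen):
    """Eq. (4) for two feature matrices of shape (n_samples, n_features)."""
    mu_r, mu_g = feat_real.mean(axis=0), feat_gen.mean(axis=0)
    cov_r = np.cov(feat_real, rowvar=False)
    cov_g = np.cov(feat_gen, rowvar=False)
    # Tr((cov_r cov_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g, which are real and non-negative in theory.
    # Small negative or imaginary parts caused by round-off are discarded.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    tr_sqrt = np.sum(np.sqrt(np.clip(eigvals.real, 0.0, None)))
    mean_term = np.sum((mu_r - mu_g) ** 2)
    return float(mean_term + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)

rng = np.random.default_rng(0)
feat_real = rng.normal(size=(500, 8))           # stand-in for real-patch features
feat_gen = rng.normal(loc=0.5, size=(500, 8))   # stand-in for generated-patch features
fid = frechet_distance(feat_real, feat_gen)
```

Two identical feature sets give a distance of zero, and shifting one set by a constant vector increases the distance by the squared length of that shift, which matches the first term of Eq. (4).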
As a second evaluation, the state-of-the-art “half U-net” used in Livne
et al. [20] was trained on the 4 sets of generated data alone as well as
both real and generated data. The parameters learning rate and dropout
rate were tuned with respect to the validation set. Additionally, classical
augmentation was used as described in Livne et al. [20] if this led to an
improved performance on the validation set. Each segmentation
network was trained for 15 epochs. Then, the performance was evalu-
ated on the binary segmentation maps of the test set by the DSC and
95HD:
$$\mathrm{DSC} = \frac{2TP}{2TP + FP + FN}, \qquad (5)$$
where TP are the true positives, FP the false positives and FN the false
negatives. By this, the DSC measures the ratio of the overlap between the
predicted vessel voxels and ground truth compared to the total amount
of voxels. The Hausdorff distance is defined as:
$$\mathrm{HD} = \max\left(\max_{i \in [0, N-1]} d(i, P, G),\ \max_{i \in [0, M-1]} d(i, G, P)\right) \qquad (6)$$
where N and M denote the number of voxels on the vessel tree of the
ground truth G and the prediction P respectively. $d(i, P, G)$ is defined as
the distance from vessel voxel i in G to the closest vessel voxel in P. In
other words, the Hausdorff distance finds the minimum distance for
each voxel in one subset (e.g. predicted vessel voxels) to another subset
(e.g. ground truth) and takes the maximum of this. The 95HD was then
the 95th percentile Hausdorff distance for each voxel, averaged over
each voxel and each patient. It was measured in millimeters.
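Both metrics can be computed directly from binary masks, as in the following NumPy sketch of Eqs. (5) and (6). The brute-force distance computation is only practical for small masks, both masks are assumed to be non-empty, and the `percentile` argument reflects one common convention for a 95th-percentile variant, which may differ from the exact aggregation used in this study. Function names and the toy masks are hypothetical.

```python
import numpy as np

def dice(pred, truth):
    """Eq. (5) on binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else float(2 * tp / denom)

def directed_distances(src, dst, spacing):
    """Distance from every foreground voxel in `src` to the closest one in `dst`."""
    scale = np.asarray(spacing, dtype=float)
    a = np.argwhere(src) * scale
    b = np.argwhere(dst) * scale
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)

def hausdorff(pred, truth, spacing=(1.0, 1.0), percentile=100.0):
    """Eq. (6) for percentile=100; percentile=95 gives a 95HD variant."""
    d_truth_to_pred = directed_distances(truth, pred, spacing)  # d(i, P, G)
    d_pred_to_truth = directed_distances(pred, truth, spacing)  # d(i, G, P)
    return float(max(np.percentile(d_truth_to_pred, percentile),
                     np.percentile(d_pred_to_truth, percentile)))

truth = np.zeros((4, 4), dtype=np.uint8)
truth[0, 0] = truth[0, 1] = 1
pred = np.zeros((4, 4), dtype=np.uint8)
pred[0, 0] = pred[0, 3] = 1

print(dice(pred, truth))                            # one TP, one FP, one FN
print(hausdorff(pred, truth, spacing=(0.5, 0.5)))   # in mm for 0.5 mm pixels
```

Passing the voxel spacing converts index distances into millimeters, the unit in which the 95HD is reported here.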
In the second part of the analysis, the performance of the U-net
trained on generated patches was evaluated on the 1000Plus dataset. For
an increasing number of training patients (1, 3, …, 15) the U-net was
trained from scratch and using the weights from the best performing
model of those trained on the generated image-label pairs (transfer
learning). The performance of using real data only and transfer learning
was then compared by assessing the DSC and 95HD on the validation (10
patients) and test set (40 patients).
4. Results
Overall, generated synthetic patches showed high similarities to the
training set patches, in particular those that were synthesized by the
WGAN-GP-SN. The patches generated by the DCGAN showed a lower
resolution with slight checkerboard artifacts compared to the original
patches. The generated corresponding labels fit well to the patches for
all models. A subset of the synthesized image-label pairs for all GAN
architectures as well as original image-label pairs are shown in
Fig. 2A–D. In the quantitative assessment, the data generated by the
WGAN-GP-SN architecture showed the highest similarity to the real data
with a FID of 37.01 compared to 141.82 for the worst performing
DCGAN. All FID values for real and synthesized data can be found in
Table 1.
In the first validation approach, the U-net trained on data generated
by the WGAN-GP-SN showed the highest performance of all GAN models
with a segmentation performance of 0.85 DSC/30.00 95HD. The U-net
trained on real PEGASUS data showed a performance of 0.89 DSC/26.57
95HD. The same model showed a similarly high performance in the
external validation on the 1000Plus data with 0.88 DSC/25.12 95HD.
Quantitative results for all models trained on generated and/or real data
can be found in Table 2.
In the second validation approach applying transfer learning, the
U-net pre-initialized with the weights from training on synthesized patches
exhibited a higher performance than the model trained from
scratch on real data only. Particularly when training
on patches from one patient only (n =1000), transfer learning using
patch-label pairs generated by the WGAN-GP-SN led to a higher per-
formance in terms of DSC and 95HD (DSC/95HD 0.91/24.66 compared
to 0.84/27.36). This observed performance difference between pre-
initialized models and models trained from scratch became smaller
when more patients were used for training. Results of the transfer
learning approach are visualized in Fig. 3. Fig. 4 shows the error maps
for both approaches on one example patient in large vessels (Fig. 4A and
C) and small vessels (Fig. 4B and D).
5. Discussion
We present a Wasserstein-GAN based model for the generation of
synthetic TOF-MRA imaging data and corresponding labels. The model
generated synthetic data of high quality, as evidenced visually and
through the FID measure, and retained much of the predictive properties
of the original images. Here, a predictive model for vessel segmentation
trained on synthetic data alone showed a good performance on one
dataset and excellent performance on an external validation set. The
synthetic data were also successfully applied in a transfer learning
approach where training was pre-initialized with weights from a model
trained on synthetic data. It outperformed the models trained on real
data. Our results mark a significant step towards the use of GAN-based
models to generate synthetic and effectively anonymous data. Conse-
quently, this approach has the potential to significantly accelerate
research in the field of neuroimaging.
While the image-label pairs synthesized by the DCGAN showed some
artifacts, the more recent GAN architectures (WGAN-GP and WGAN-GP-
SN) produced higher resolution data that looked similar to the real data
(Fig. 2). The superiority of the WGAN-approaches was confirmed by
lower FID values as well as the improved performance of the U-net
segmentation models trained on synthetic data. This can be explained by
the inherent differences between Wasserstein-GANs and the DCGAN. In
contrast to the DCGAN, the loss function of the WGAN-GP architectures
utilizes the Earth Mover’s distance and is bounded by a Lipschitz
constraint [2,8]. This works as a robust regularization and enhances
training stability while diminishing mode collapse at the same time. This
explains why the WGAN-GP produced more realistic looking
image-label pairs. Other studies confirm the superiority of Wasserstein
GAN architectures over the DCGAN [2,8]. A recent addition to GAN
architectures was the introduction of spectral normalization. This
method additionally restricts the discriminator’s weights for each layer
in order to stabilize training even for high learning rates [21]. As evi-
denced in our work, spectral normalization is also beneficial for the
application of Wasserstein GANs, and the combination of both regula-
rization techniques (WGAN-GP-SN) yielded the best image quality both
by visual inspection as well as in terms of FID. These techniques have
thus supported the preservation of the predictive properties for vessel
segmentation within the synthetic patches. Therefore, it is likely that
more sophisticated (future) GAN architectures will further improve the
generation of synthetic data. Here, potential current candidate methods
Fig. 2. Real and synthesized image patches with corresponding labels. (A) to (C) show image-label pairs generated by DCGAN (A), WGAN-GP (B) and WGAN-GP-SN
(C) respectively. (D) shows real patches and corresponding labels. The synthesized patches resemble real vessel patches and the labels fit well to the patches, especially
those generated by WGAN-GP-SN (C).
are progressive growing GAN (PG-GAN) or stacked GAN architectures
[14,16].
Whereas the data generated by WGAN-GP-SN consistently yielded
the highest DSC in the transfer learning approach, this is not as apparent
in other parts of the results. First, the 95HD did not show a consistent
trend. Since the Hausdorff distance is vulnerable to outliers, we argue
that it might not be as reliable as the DSC. This is also corroborated by
the high standard deviation over the patients. Secondly, when training
the U-net with real data and additional synthesized data (data
augmentation), the performance only slightly increased for the WGAN-
GP-SN. In addition, the DCGAN seemed to perform slightly better. This
might be due to the more noisy and blurry appearance of the images
generated by the DCGAN compared to the WGAN architectures. Here,
for the DCGAN only the vessels seem to be sharp. Additionally, they fit
well to the generated segmentation label. This attention on the vessels
might lead to an increased focus on vessels within the feature extraction
in the encoding part of the U-net. Then, together with the real images the
U-net model is able to learn how real images look as well as the focus on
vessels. The images generated by the WGAN architectures look more similar
to the real images, which is also corroborated by the lower FID, and therefore
cannot profit from this effect. Thus, they cannot provide (much) additional information to the
real data. Training a U-net on synthesized data alone, the DCGAN is then
outperformed by the WGAN architectures as the generated images look
more noisy.
GAN architectures have the potential to generate anonymized data
since the generator does not have direct access to the training data. This
also holds true for this study: the generator synthesizes patch-label pairs
from a noise vector. However, a recent study by Hayes et al. [10] shows
that DCGANs might be vulnerable to so-called membership inference
attacks [31]. Such attacks aim to identify whether a given data sample
was part of the original training set or not. To prevent this, differentially
private GANs (DPGANs) have been introduced [36]. Here, carefully
adjusted noise is introduced in the gradients during the discriminator’s
training. While these GANs have the potential to ensure a certain level of
privacy, they show poorer performance to date [22] and have so far only been
trained on natural image datasets. Training a DPGAN on sparse
medical imaging datasets remains a major challenge. While DPGANs
might provide even further advantages in anonymization, we argue that
our synthesized patch-label pairs are effectively anonymized. For one, in
the WGAN-GP-SN approach, we apply Lipschitz regularization tech-
niques such as gradient penalty and spectral normalization. Wu et al.
found that these techniques might reduce information leakage and
might even make the trained models resistant to membership inference
attacks [35]. Furthermore, we use randomly sampled 2D patches in this
study. Thus, for a successful membership inference attack two events
must coincide: First, the real training data that is protected by
state-of-the-art hospital security systems has to be leaked. Second, the
patches need to be extracted in the exact same way as in the
GAN-training process to allow re-identification. The minuscule
Table 1
Fréchet inception distance (FID) as a quantitative measurement of the generated images' similarity to the real images for each of the three GAN architectures. The FID is averaged over the 4 different datasets generated from one model. The standard deviation (SD) is shown in brackets. WGAN-GP-SN showed the highest similarity to the real data in terms of FID.

GAN architecture    mean FID (SD)
DCGAN               141.82 (0.32)
WGAN-GP              52.41 (0.16)
WGAN-GP-SN           37.01 (0.22)
Table 2
Summary of the mean Dice similarity coefficient (DSC) and the mean 95th-percentile Hausdorff distance (95HD) of the U-net on the test set with standard deviation (SD). Both metrics are averaged over 4 different sets of generated data. The artificial patches were generated by Generative Adversarial Networks (GANs) trained on the PEGASUS dataset. For data augmentation, both real and generated patches have been used for training. For anonymization the U-net was trained on generated patches only. Models trained on anonymized, synthetic data only show performances close to the model trained on real data.

                                              test DSC        test 95HD [mm]
                                              mean    SD      mean     SD
U-net on real PEGASUS data (Livne et al.)     0.892           26.569
Data augmentation (real data (PEGASUS) and generated data)
  DCGAN                                       0.903   0.003   26.482   1.027
  WGAN-GP                                     0.891   0.003   26.784   0.736
  WGAN-GP-SN                                  0.894   0.005   27.909   1.137
Anonymization (trained on generated data only)
 PEGASUS anonymization models: validated and evaluated on PEGASUS data
  DCGAN                                       0.779   0.008   31.481   0.559
  WGAN-GP                                     0.812   0.008   30.242   1.228
  WGAN-GP-SN                                  0.848   0.007   30.001   0.702
 PEGASUS anonymization models evaluated on real 1000Plus
  DCGAN                                       0.792   0.022   27.103   0.288
  WGAN-GP                                     0.871   0.003   27.307   0.694
  WGAN-GP-SN                                  0.875   0.010   25.119   0.403
Fig. 3. Performance evaluation for segmentation for an increasing number of patients on the 1000Plus dataset when trained from scratch (green) and using transfer
learning (blue). The black dotted lines indicate the performance of the Unet on the real PEGASUS dataset. The error bars show the standard deviation over the
patients. Especially for up to 5000 data samples the pre-trained WGAN-GP-SN outperform the models without any pre-training.
T. Kossen et al.
Computers in Biology and Medicine 131 (2021) 104254
7
probability of these events to happen is comparable to other theoretical
scenarios of state-of-the-art anonymization. For example, any tabular
data anonymized using state-of-the-art techniques could be re-identified
when compared with the leaked original data. Thus, we consider our
generated patches anonymous and hence make them available for re-
searchers upon request.
Our results are also promising for AI in healthcare product devel-
opment [12]. In the medical AI research setting, a strong focus on per-
formance in homogeneous samples can be observed. This is in stark
contrast to the requirements for a medical imaging product. A product is
meant to be used in a real-world setting, where it is confronted with highly
heterogeneous data reflecting different settings and multiple hardware
options. Thus, product development should focus as much on training on
heterogeneous data as on keeping the necessary performance [12]. This,
however, is currently highly challenging as data is a scarce resource due
to limited availability. Our results show that a relatively small amount of
data is sufficient to generate robust results. Thus, a GAN-based ano-
nymization approach could allow the generation of high quality data
from a smaller number of patients from multiple locations that - in total -
reflect the full distribution of soft- and hardware settings in the clinical
setting. Here, the possibility to generate high-quality labels as evidenced
by our study is also a great advantage. Notably, a GAN model also learns
the quality of the labels provided during training. Thus, the final per-
formance of any model trained on synthetic data will also be dependent
on the quality of the real labels. Providing high-quality labels is no
simple task and usually requires hours of manual labor by highly qualified
medical staff. Thus, a novel GAN-based approach to product
development could entail the high-quality labeling of relatively small
datasets from multiple data providers that are then anonymized and
pooled for training. On the one hand, this would keep development costs
relatively low, which is a prerequisite for startup success. On the other
hand, such an approach would ensure both high performance and low
bias as the chance for out-of-sample data in the clinical setting would be
significantly lowered.
Our study has several limitations. The GANs are 2D due to computational
restrictions. 3D approaches could help extract information
about the 3D vessel tree structure and in this way improve the performance
of the segmentation task. The computational restrictions also did
not allow us to try out more advanced GAN architectures such as PG-GAN.
Another limitation is the calculation of the FID. First, due to computational
restrictions, it was only calculated to confirm the quality of visually
inspected images and not for every epoch in an end-to-end solution.
Second, the FID might not be ideal for assessing the image quality.
Although it is used as a quality measurement in the medical field [4,9], it
was originally designed for natural images and hence might not entirely
capture relevant features for medical imaging. Thus, further research on
assessing image quality specific to medical images should be
undertaken.
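For reference, the FID compares the mean and covariance of feature embeddings of real and generated images. The following is a minimal sketch of that distance, assuming the feature vectors (e.g. from an Inception network) have already been extracted. The function name is hypothetical and the code is not taken from the study's implementation.

```python
import numpy as np


def frechet_distance(feats_real, feats_gen):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_real, feats_gen: arrays of shape (n_samples, n_features),
    with n_features >= 2 (assumed pre-extracted embeddings).
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Tr((cov_r cov_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g, which are real and non-negative for
    # positive semi-definite covariances (up to numerical noise).
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_sqrt)
```

Identical feature sets give a distance of (numerically) zero, and a pure shift of the mean adds the squared shift, which illustrates why lower values indicate higher similarity. The caveat above still applies: the result is only as meaningful as the features it is computed on.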
In this study, we generated TOF-MRA images for brain vessel seg-
mentation. Since this segmentation relies on identifying local structures
within an image, it allowed us to generate only parts of the image, i.e.
patches, and obtain good segmentation results. Our results might
generalize to medical segmentation problems that rely on these local
properties such as segmenting small organs, lesions or tumors. Never-
theless, for medical problems that involve the understanding of global
structures, e.g. the whole brain, a patch-based approach would most
probably not suffice. Here, bigger patches or whole volumes need to be
generated which will be computationally more expensive.
6. Conclusion
This study marks an essential step towards true anonymization of
medical imaging data while maintaining crucial predictive features
within the image patch. We show that these features might be general-
izable to another, independent dataset. Our initial performance for
vessel segmentation on the PEGASUS dataset is already relatively high.
We show that training more advanced GAN architectures can further
increase the quality of synthesized image-label pairs. By using only one
patient from a different cohort, we can achieve a comparably high
performance on an independent dataset. Our synthesized image-label
pairs allow other researchers to build models that require only a small amount of
labeled patient data and will significantly facilitate research in this
domain. It may be the case that our framework achieves similar results
on other medical segmentation tasks. This could lead to a lower demand
for labeled patient data and allow more sharing of anonymized data.
Nevertheless, further studies should assess the generalizability of this
analysis to other (more complex) segmentation problems.
CRediT authorship contribution statement
Tabea Kossen: Conceptualization, Formal analysis, Investigation,
Methodology, Project administration, Software, Validation, Visualiza-
tion, Writing - original draft. Pooja Subramaniam: Formal analysis,
Investigation, Methodology, Software, Validation, Writing - review &
editing. Vince I. Madai: Conceptualization, Data curation, Investiga-
tion, Methodology, Project administration, Supervision, Visualization,
Writing - original draft, Writing - review & editing. Anja Hennemuth:
Resources, Supervision, Writing - review & editing. Kristian Hilde-
brand: Supervision, Writing - review & editing. Adam Hilbert:
Conceptualization, Writing - review & editing. Jan Sobesky: Data
curation, Writing - review & editing. Michelle Livne: Conceptualiza-
tion, Writing - review & editing. Ivana Galinovic: Data curation,
Writing - review & editing. Ahmed A. Khalil: Data curation, Writing -
review & editing. Jochen B. Fiebach: Data curation, Writing - review &
editing. Dietmar Frey: Conceptualization, Funding acquisition, Project
administration, Resources, Supervision, Writing - review & editing.
Declaration of competing interest
Tabea Kossen reported receiving personal fees from ai4medicine
outside the submitted work. Dr Madai reported receiving personal fees
from ai4medicine outside the submitted work. Adam Hilbert reported
receiving personal fees from ai4medicine outside the submitted work.
While not related to this work, Dr Sobesky reports receipt of speaker
honoraria from Pfizer, Boehringer Ingelheim, and Daiichi Sankyo.
Furthermore, Dr Fiebach has received consulting and advisory board
fees from BioClinica, Cerevast, Artemida, Brainomix, Biogen, BMS,
EISAI, and Guerbet. Dr Frey reported receiving grants from the European
Commission, and reported receiving personal fees from and holding an equity
interest in ai4medicine outside the submitted work.

Fig. 4. Error maps for one example patient from the 1000Plus study using one patient when training from scratch (A, B) and using transfer learning from WGAN-GP-SN
generated patches (C, D). True positives are shown in red, false positives in green and false negatives in yellow. Transfer learning led to fewer errors, especially on
small vessels (B, D).
Acknowledgements
This work has received funding from the German Federal Ministry of
Education and Research through (1) the grant Center for Stroke
Research Berlin and (2) a Go-Bio grant for the research group PREDIC-
TioN 2020 (lead: DF).
Computation has been performed on the HPC for Research cluster of
the Berlin Institute of Health.
Appendix A
Table 3
Validation Dice similarity coefficient (DSC) and 95th-percentile Hausdorff distance (95HD) corresponding to
Table 2. All models were trained on the PEGASUS dataset. For data augmentation, the U-net was trained on both real
data and data generated by the respective GAN architecture. For anonymization, the U-nets were trained on generated
data alone. Both metrics are averaged over 4 different sets of generated data. SD stands for standard deviation.

| Training data | Val DSC mean | Val DSC SD | Val 95HD mean [mm] | Val 95HD SD [mm] |
|---|---|---|---|---|
| U-net (Livne et al.) | 0.879 | | 29.499 | |
| Data augmentation | | | | |
| DCGAN | 0.883 | 0.002 | 29.856 | 0.624 |
| WGAN-GP | 0.885 | 0.002 | 29.556 | 0.520 |
| WGAN-GP-SN | 0.887 | 0.001 | 29.749 | 0.629 |
| Anonymization | | | | |
| DCGAN | 0.810 | 0.004 | 34.331 | 0.274 |
| WGAN-GP | 0.848 | 0.005 | 30.964 | 0.054 |
| WGAN-GP-SN | 0.859 | 0.003 | 31.477 | 0.166 |
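As a reminder of what the DSC columns measure, the following minimal sketch computes the Dice similarity coefficient between a predicted and a ground-truth binary mask. The function name and the handling of two empty masks are assumptions made for illustration. This is not the evaluation code used in the study, and the 95HD is not covered here.

```python
import numpy as np


def dice_similarity(pred, truth):
    """Dice similarity coefficient of two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        # Convention assumed here: two empty masks agree perfectly.
        return 1.0
    # DSC = 2 * |pred AND truth| / (|pred| + |truth|)
    return 2.0 * np.logical_and(pred, truth).sum() / denom
```

A value of 1 means perfect overlap and 0 means no overlap, so the anonymization rows in the table can be read as the fraction of overlap retained when training on synthetic data only.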
References
[1] D. Abramian, A. Eklund, Refacing: Reconstructing Anonymized Facial Features Using GANs, vol. 5.
[2] M. Arjovsky, S. Chintala, L. Bottou, Wasserstein GAN, 2017, arXiv:1701.07875 [cs, stat].
URL: http://arxiv.org/abs/1701.07875.
[3] C. Bowles, L. Chen, R. Guerrero, P. Bentley, R. Gunn, A. Hammers, D.A. Dickie,
M.V. Hernández, J. Wardlaw, D. Rueckert, GAN Augmentation: Augmenting Training
Data Using Generative Adversarial Networks, 2018, arXiv:1810.10863 [cs].
URL: http://arxiv.org/abs/1810.10863.
[4] B. Cao, H. Zhang, N. Wang, X. Gao, D. Shen, Auto-GAN: self-supervised
collaborative learning for medical image synthesis, in: Proceedings of the AAAI
Conference on Artificial Intelligence 34, 2020, pp. 10486–10493,
https://doi.org/10.1609/aaai.v34i07.6619.
URL: https://aaai.org/ojs/index.php/AAAI/article/view/6619.
[5] M. Foroozandeh, A. Eklund, Synthesizing Brain Tumor Images and Annotations by
Combining Progressive Growing GAN and SPADE, 2020, arXiv:2009.05946 [cs].
URL: http://arxiv.org/abs/2009.05946.
[6] M. Frid-Adar, I. Diamant, E. Klang, M. Amitai, J. Goldberger, H. Greenspan,
GAN-based synthetic medical image augmentation for increased CNN performance in
liver lesion classification, Neurocomputing 321 (2018) 321–331,
https://doi.org/10.1016/j.neucom.2018.09.013.
URL: http://www.sciencedirect.com/science/article/pii/S0925231218310749.
[7] J.T. Guibas, T.S. Virdi, P.S. Li, Synthetic Medical Images from Dual Generative
Adversarial Networks, 2018, arXiv:1709.01872 [cs].
URL: http://arxiv.org/abs/1709.01872.
[8] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, A.C. Courville, Improved
Training of Wasserstein GANs, vol. 11.
[9] C. Haarburger, N. Horst, D. Truhn, M. Broeckmann, S. Schrading, C. Kuhl,
D. Merhof, Multiparametric magnetic resonance image synthesis using generative
adversarial networks, Eurographics Workshop on Visual Computing for Biology
and Medicine 5 (2019), https://doi.org/10.2312/VCBM.20191226.
URL: https://diglib.eg.org/handle/10.2312/vcbm20191226.
[10] J. Hayes, L. Melis, G. Danezis, E.D. Cristofaro, LOGAN: membership inference
attacks against generative models, in: Proceedings on Privacy Enhancing
Technologies 2019, 2019, pp. 133–152, https://doi.org/10.2478/popets-2019-0008.
URL: https://content.sciendo.com/view/journals/popets/2019/1/article-p133.xml.
[11] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, S. Hochreiter, GANs Trained by
a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, 2018,
arXiv:1706.08500 [cs, stat]. URL: http://arxiv.org/abs/1706.08500.
[12] D. Higgins, V.I. Madai, From Bit to Bedside: A Practical Framework for Artificial
Intelligence Product Development in Healthcare, Advanced Intelligent Systems
(2020) 2000052, https://doi.org/10.1002/aisy.202000052.
[13] B. Hotter, S. Pittl, M. Ebinger, G. Oepen, K. Jegzentis, K. Kudo, M. Rozanski,
W.U. Schmidt, P. Brunecker, C. Xu, P. Martus, M. Endres, G.J. Jungehülsing,
A. Villringer, J.B. Fiebach, Prospective study on the mismatch concept in acute
stroke patients within the first 24 h after symptom onset - 1000Plus study, BMC
Neurol. 9 (2009) 60, https://doi.org/10.1186/1471-2377-9-60.
[14] X. Huang, Y. Li, O. Poursaeed, J. Hopcroft, S. Belongie, Stacked Generative
Adversarial Networks, 2017, arXiv:1612.04357 [cs, stat].
URL: http://arxiv.org/abs/1612.04357.
[15] H. Hukkelås, R. Mester, F. Lindseth, Deep privacy: a generative adversarial
network for face anonymization, in: G. Bebis, R. Boyle, B. Parvin, D. Koracin,
D. Ushizima, S. Chai, S. Sueda, X. Lin, A. Lu, D. Thalmann, C. Wang, P. Xu (Eds.),
Advances in Visual Computing, vol. 11844, Springer International Publishing,
Cham, 2019, pp. 565–578, https://doi.org/10.1007/978-3-030-33720-9_44.
URL: http://link.springer.com/10.1007/978-3-030-33720-9_44.
[16] T. Karras, T. Aila, S. Laine, J. Lehtinen, Progressive Growing of GANs for Improved
Quality, Stability, and Variation, 2018, arXiv:1710.10196 [cs, stat].
URL: http://arxiv.org/abs/1710.10196.
[17] D.P. Kingma, J. Ba, Adam: A Method for Stochastic Optimization, 2017,
arXiv:1412.6980 [cs]. URL: http://arxiv.org/abs/1412.6980.
[18] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep
convolutional neural networks, Commun. ACM 60 (2017) 84–90,
https://doi.org/10.1145/3065386.
URL: http://dl.acm.org/citation.cfm?doid=3098997.3065386.
[19] G. Litjens, T. Kooi, B.E. Bejnordi, A.A.A. Setio, F. Ciompi, M. Ghafoorian,
J.A.W.M. van der Laak, B. van Ginneken, C.I. Sánchez, A survey on deep learning in
medical image analysis, Med. Image Anal. 42 (2017) 60–88,
https://doi.org/10.1016/j.media.2017.07.005.
URL: http://www.sciencedirect.com/science/article/pii/S1361841517301135.
[20] M. Livne, J. Rieger, O.U. Aydin, A.A. Taha, E.M. Akay, T. Kossen, J. Sobesky,
J.D. Kelleher, K. Hildebrand, D. Frey, V.I. Madai, A U-net deep learning framework
for high performance vessel segmentation in patients with cerebrovascular disease,
Front. Neurosci. 13 (2019), https://doi.org/10.3389/fnins.2019.00097.
URL: https://www.frontiersin.org/articles/10.3389/fnins.2019.00097/full#h7.
[21] T. Miyato, T. Kataoka, M. Koyama, Y. Yoshida, Spectral Normalization for
Generative Adversarial Networks, 2018, arXiv:1802.05957 [cs, stat].
URL: http://arxiv.org/abs/1802.05957.
[22] S. Mukherjee, Y. Xu, A. Trivedi, J.L. Ferres, privGAN: Protecting GANs from
Membership Inference Attacks at Low Cost, 2020, arXiv:2001.00071 [cs, stat].
URL: http://arxiv.org/abs/2001.00071.
[23] M.A. Mutke, V.I. Madai, F.C. von Samson-Himmelstjerna, O. Zaro Weber,
G.S. Revankar, S.Z. Martin, K.L. Stengl, M. Bauer, S. Hetzer, M. Günther, J. Sobesky,
Clinical evaluation of an arterial-spin-labeling product sequence in steno-occlusive
disease of the brain, PloS One 9 (2014) e87143,
https://doi.org/10.1371/journal.pone.0087143.
[24] T. Neff, C. Payer, D. Stern, M. Urschler, Generative adversarial network based
synthesis for supervised medical image segmentation, Proceedings of the OAGM &
ARW Joint Workshop Vision, Automation and Robotics,
https://doi.org/10.3217/978-3-85125-524-9-30.
[25] A. Radford, L. Metz, S. Chintala, Unsupervised Representation Learning with Deep
Convolutional Generative Adversarial Networks, 2016, arXiv:1511.06434 [cs].
URL: http://arxiv.org/abs/1511.06434.
[26] V. Ravindra, A. Grama, De-anonymization Attacks on Neuroimaging Datasets,
2019, arXiv:1908.03260 [cs, eess, q-bio]. URL: http://arxiv.org/abs/1908.03260.
[27] O. Ronneberger, P. Fischer, T. Brox, U-net: convolutional networks for biomedical
image segmentation, in: N. Navab, J. Hornegger, W.M. Wells, A.F. Frangi (Eds.),
Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015,
vol. 9351, Springer International Publishing, Cham, 2015, pp. 234–241,
https://doi.org/10.1007/978-3-319-24574-4_28.
URL: http://link.springer.com/10.1007/978-3-319-24574-4_28.
[28] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, Improved
Techniques for Training GANs, 2016, arXiv:1606.03498 [cs].
URL: http://arxiv.org/abs/1606.03498.
[29] V. Sandfort, K. Yan, P.J. Pickhardt, R.M. Summers, Data augmentation using
generative adversarial networks (CycleGAN) to improve generalizability in CT
segmentation tasks, Sci. Rep. 9 (2019) 16884,
https://doi.org/10.1038/s41598-019-52737-x.
URL: https://www.nature.com/articles/s41598-019-52737-x.
[30] H.C. Shin, N.A. Tenenholtz, J.K. Rogers, C.G. Schwarz, M.L. Senjem, J.L. Gunter,
K.P. Andriole, M. Michalski, Medical image synthesis for data augmentation and
anonymization using generative adversarial networks, in: A. Gooya, O. Goksel,
I. Oguz, N. Burgos (Eds.), Simulation and Synthesis in Medical Imaging, Springer
International Publishing, Cham, 2018, pp. 1–11,
https://doi.org/10.1007/978-3-030-00536-8_1.
[31] R. Shokri, M. Stronati, C. Song, V. Shmatikov, Membership inference attacks
against machine learning models, in: 2017 IEEE Symposium on Security and
Privacy (SP), 2017, pp. 3–18, https://doi.org/10.1109/SP.2017.41. ISSN: 2375-1207.
[32] K. Simonyan, A. Zisserman, Very Deep Convolutional Networks for Large-Scale
Image Recognition, 2014, arXiv:1409.1556 [cs]. URL: http://arxiv.org/abs/1409.1556.
[33] V. Sorin, Y. Barash, E. Konen, E. Klang, Creating Artificial Images for Radiology
Applications Using Generative Adversarial Networks (GANs) – A Systematic
Review, Academic Radiology (2020), https://doi.org/10.1016/j.acra.2019.12.024.
URL: https://linkinghub.elsevier.com/retrieve/pii/S1076633220300210.
[34] C. Wachinger, P. Golland, W. Kremen, B. Fischl, M. Reuter, Alzheimer’s Disease
Neuroimaging Initiative, BrainPrint: a discriminative characterization of
brain morphology, Neuroimage 109 (2015) 232–248,
https://doi.org/10.1016/j.neuroimage.2015.01.032.
[35] B. Wu, S. Zhao, C. Chen, H. Xu, L. Wang, X. Zhang, G. Sun, J. Zhou, Generalization
in Generative Adversarial Networks: A Novel Perspective from Privacy Protection,
2019, arXiv:1908.07882 [cs, stat]. URL: http://arxiv.org/abs/1908.07882.
[36] L. Xie, K. Lin, S. Wang, F. Wang, J. Zhou, Differentially Private Generative
Adversarial Network, 2018, arXiv:1802.06739 [cs, stat].
URL: http://arxiv.org/abs/1802.06739.
[37] X. Yi, E. Walia, P. Babyn, Generative adversarial network in medical imaging: a
review, Med. Image Anal. 58 (2019) 101552,
https://doi.org/10.1016/j.media.2019.101552. URL: http://arxiv.org/abs/1809.07294.
5
Generating 3D TOF-MRA Volumes and
Segmentation Labels Using Generative
Adversarial Networks
5.1 Context Within Thesis
Most medical images, including brain images, are 3D. While the third dimension often
offers valuable spatial information, processing 3D images comes with a substantial
increase in memory consumption and processing time compared to 2D approaches. This is
especially the case for computationally demanding neural networks such as GANs.
The present work addressed a limitation of Chapter 4, namely that information in the third
dimension of the brain images was neglected, and extended the GAN architectures to synthesize
3D high-resolution, labeled image volumes. To overcome the computational restrictions, we introduced
techniques for memory efficiency and reduced training times, such as mixed precision and the
two timescale update rule.
Furthermore, we extended the evaluation schemes compared to Chapter 4 to use a network
pre-trained on medical images when calculating the FID and precision-recall curves of the
distributions.
5.2 Journal Article
This chapter is based on the following publication that was published in Medical Image Analysis:
P. Subramaniam, T. Kossen, K. Ritter, A. Hennemuth, K. Hildebrand, A. Hilbert,
J. Sobesky, M. Livne, I. Galinovic, A. A. Khalil, J. B. Fiebach, D. Frey, and V. I.
Madai. “Generating 3D TOF-MRA Volumes and Segmentation Labels using Generative
Adversarial Networks”. In: Medical Image Analysis (2022). doi: 10.1016/j.media.2022.102396
The original journal article is reprinted with permission of Elsevier. The article is open access
under the CC BY license.
Author Contribution
The second author Tabea Kossen conceptualized the study and interpreted the results together
with PS, VIM, DF, ML and AH. She performed the model implementation and/or supervised PS in doing so.
Additionally, she was responsible for the project administration, wrote the first version of the
manuscript together with PS and VIM and coordinated the journal submission process.
Code Availability
The code for this project is publicly available:
https://github.com/prediction2020/3DGAN_synthesis_of_3D_TOF_MRA_with_segmentation_labels
Medical Image Analysis 78 (2022) 102396
Generating 3D TOF-MRA volumes and segmentation labels using
generative adversarial networks
Pooja Subramaniam (a), Tabea Kossen (a, b, ∗), Kerstin Ritter (c, d), Anja Hennemuth (b, e, f),
Kristian Hildebrand (g), Adam Hilbert (a), Jan Sobesky (h, i), Michelle Livne (a), Ivana Galinovic (i),
Ahmed A. Khalil (i, j, k, l), Jochen B. Fiebach (i), Dietmar Frey (a), Vince I. Madai (a, m, n)
(a) CLAIM - Charité Lab for AI in Medicine, Charité Universitätsmedizin Berlin, Germany
(b) Department of Computer Engineering and Microelectronics, Computer Vision & Remote Sensing, Technical University Berlin, Berlin, Germany
(c) Department of Psychiatry and Psychotherapy, Charité Universitätsmedizin Berlin (corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health), Berlin, Germany
(d) Bernstein Center for Computational Neuroscience, Berlin, Germany
(e) Institute for Imaging Science and Computational Modelling in Cardiovascular Medicine, Charité Universitätsmedizin Berlin, Berlin, Germany
(f) Fraunhofer MEVIS, Max-von-Laue-Str. 2, Bremen, Germany
(g) Department VI Computer Science and Media, Beuth University of Applied Sciences, Berlin, Germany
(h) Johanna-Etienne-Hospital, Neuss, Germany
(i) Centre for Stroke Research Berlin, Charité Universitätsmedizin Berlin, Berlin, Germany
(j) Department of Neurology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
(k) Mind, Brain, Body Institute, Berlin School of Mind and Brain, Humboldt University Berlin, Berlin, Germany
(l) Berlin Institute of Health, Berlin, Germany
(m) School of Computing and Digital Technology, Faculty of Computing, Engineering and the Built Environment, Birmingham City University, Birmingham, UK
(n) QUEST-Center for Transforming Biomedical Research, Berlin Institute of Health, Charité Universitätsmedizin Berlin, Charitéplatz 1, Berlin 10117, Germany
Article info
Article history: Received 13 July 2021; Revised 28 January 2022; Accepted 17 February 2022; Available online 24 February 2022
MSC: 41A05, 41A10, 65D05, 65D17
Keywords: Generative adversarial networks; 3D Medical imaging; Mixed precision; Anonymization; Brain vessel segmentation
Abstract
Deep learning requires large labeled datasets that are difficult to gather in medical imaging due to data
privacy issues and time-consuming manual labeling. Generative Adversarial Networks (GANs) can alleviate
these challenges by enabling synthesis of shareable data. While 2D GANs have been used to generate 2D
images with their corresponding labels, they cannot capture the volumetric information of 3D medical
imaging. 3D GANs are more suitable for this and have been used to generate 3D volumes but not their
corresponding labels. One reason might be that synthesizing 3D volumes is challenging owing to compu-
tational limitations. In this work, we present 3D GANs for the generation of 3D medical image volumes
with corresponding labels applying mixed precision to alleviate computational constraints.
We generated 3D Time-of-Flight Magnetic Resonance Angiography (TOF-MRA) patches with their corre-
sponding brain blood vessel segmentation labels. We used four variants of 3D Wasserstein GAN (WGAN)
with: 1) gradient penalty (GP), 2) GP with spectral normalization (SN), 3) SN with mixed precision (SN-
MP), and 4) SN-MP with double filters per layer (c-SN-MP). The generated patches were quantitatively
evaluated using the Fréchet Inception Distance (FID) and Precision and Recall of Distributions (PRD). Fur-
ther, 3D U-Nets were trained with patch-label pairs from different WGAN models and their performance
was compared to the performance of a benchmark U-Net trained on real data. The segmentation perfor-
mance of all U-Net models was assessed using Dice Similarity Coefficient (DSC) and balanced Average
Hausdorff Distance (bAVD) for a) all vessels, and b) intracranial vessels only.
Our results show that patches generated with WGAN models using mixed precision (SN-MP and c-SN-
MP) yielded the lowest FID scores and the best PRD curves. Among the 3D U-Nets trained with synthetic
patch-label pairs, c-SN-MP pairs achieved the highest DSC (0.841) and lowest bAVD (0.508) compared to
the benchmark U-Net trained on real data (DSC 0.901; bAVD 0.294) for intracranial vessels.
In conclusion, our solution generates realistic 3D TOF-MRA patches and labels for brain vessel segmenta-
tion. We demonstrate the benefit of using mixed precision for computational efficiency resulting in the
best-performing GAN-architecture. Our work paves the way towards sharing of labeled 3D medical data
which would increase generalizability of deep learning models for clinical use.
©2022 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY license ( http://creativecommons.org/licenses/by/4.0/ )
https://doi.org/10.1016/j.media.2022.102396
1361-8415/© 2022 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license ( http://creativecommons.org/licenses/by/4.0/ )
1. Introduction
The success of deep learning algorithms in natural image analysis
has in recent years been transferred to the medical imaging domain.
Deep learning methods have been used for automation of
various manual time-consuming tasks such as segmentation and
classification of medical images (Greenspan et al., 2016;
Lundervold and Lundervold, 2019). Supervised deep learning methods,
specifically, learn relevant features from images by mapping
features in the input images to the label output. While the advantage
of these methods is that they do not need manual extraction of
features from the images, they do require large amounts of labeled
data. Here, a major challenge is that it is expensive and difficult
to acquire and label medical data (Yi et al., 2019). Yet, even when
labeled medical data is available, it usually cannot be shared
readily with other researchers due to privacy concerns (Clinical
Practice Committee, 2000). Anonymization methods typically applied
in medical imaging would not be beneficial in the case of
neuroimaging as the unique neuroanatomical features present in brain
images could be used to identify individuals (Wachinger et al.,
2015; Valizadeh et al., 2018). As a consequence, often small, siloed
or homogeneous datasets are used when proposing new deep learning
models in neuroimaging (Willemink et al., 2020).
A potential solution to this problem is the generation of
synthetic medical imaging data. A very promising method for this
purpose is Generative Adversarial Networks (GANs) (Goodfellow et al.,
2014). Various GAN architectures from the natural images domain
have gained popularity in medical imaging for image synthesis,
supervised image-to-image translation, reconstruction and
super-resolution (Yi et al., 2019). For image synthesis, specifically, 2D
GANs have been used in several works such as synthesis of
Computed Tomography (CT) liver lesions (Frid-Adar et al., 2018), skin
lesion images (Baur et al., 2018), and axial Magnetic Resonance
(MR) slices (Bermudez et al., 2018). GANs can be extended to
generate the labels along with the synthesized images. For example,
2D GANs have been used to generate the corresponding
segmentation labels for lung X-rays (Neff et al., 2018), vessel segmentation
(Kossen et al., 2021), retinal fundus images (Guibas et al., 2018)
and brain tumor segmentation (Foroozandeh and Eklund, 2020).
Although these results are promising, the challenge remains that
2D GANs cannot capture important anatomical relationships in the
third dimension. Since medical images are often recorded in 3D,
GANs generating 3D medical images are thus highly warranted. 3D
GANs have been used to generate downsampled or resized MRI
images of different resolutions (Kwon et al., 2019; Eklund, 2020; Sun
et al., 2021). However, to our knowledge, there is no 3D GAN
medical imaging study that generates the corresponding labels, which
is critical for using the data for supervised deep learning research.
One reason could be that synthesizing 3D volumes is still a
challenge due to computational limitations.
In our study, we generate high resolution 3D medical image
patches along with their labels in an end-to-end paradigm for
brain vessel segmentation, which aids in identifying and studying
cerebrovascular diseases. From 3D Time-of-Flight Magnetic
Resonance Angiography (TOF-MRA), we synthesize 3D patches together
with brain vessel segmentation labels. We implement and compare
four different 3D Wasserstein-GAN (WGAN) variants: three with
the same architecture but different regularizations and mixed
precision (Micikevicius et al., 2018) schemes, and one with a
modified architecture (double filters per layer) made possible by the memory
efficiency of mixed precision. Next to a qualitative visual
assessment, we use quantitative measures to evaluate the synthesized
patches. We further evaluate the performance of brain vessel
segmentation models trained on the generated patch-label pairs and
compare them to a benchmark model trained on real data.
Additionally, we also compare the segmentation performance on a
second, independent dataset.
∗ Corresponding author at: CLAIM - Charité Lab for AI in Medicine, Charité Universitätsmedizin Berlin, Germany.
E-mail address: tabea.kossen@charite.de (T. Kossen).
To summarize, our main contributions are:
1. For the first time to our knowledge in the medical imaging do-
main, we generate high resolution 3D patches along with seg-
mentation labels using GANs.
2. We utilize the memory efficiency provided by mixed precision
to enable a more complex WGAN architecture with double the
filters per layer.
3. Our generated labels allow us to train 3D U-Net models for
brain vessel segmentation on synthetic data in an end-to-end
framework.
2. Methods
2.1. Architecture
We adapted the WGAN - Gradient penalty ( Gulrajani et al.,
2017 ) model to 3D in order to produce 3D patches and their cor-
responding labels of brain vessel segmentation. We implemented
four variants of the architecture: a) GP model - WGAN-GP model
in 3D b) SN model - GP model with spectral normalization in the
critic network c) SN-MP model - SN model with mixed precision d)
c-SN-MP model - SN-MP model with double the filters per layer. An
overview of the GAN training is provided in Fig. 1 .
For all models, a noise vector $z$ of length 128 sampled from
a standard Gaussian distribution $\mathcal{N}(0, 1)$ was input to the
generator $G$. It was fed through a linear layer and a 3D batch
normalization layer, then 3 blocks of upsampling and 3D convolutional
layers with consecutive batch normalization and ReLU activation,
and a final upsampling and 3D convolutional layer as shown in
Fig. 2 A. An upsampling factor of 2 with nearest neighbor
interpolation was used. The convolutional layers used a kernel size
of 3 and a stride of 1. Hyperbolic tangent (tanh) was used as the
final activation function. The output of the generator was a
two-channel image of size 128 × 128 × 64: one channel was the TOF-MRA
patch and the second channel was the corresponding label, which is
the ground truth segmentation of the generated patch. The function of
the labels is to train a supervised segmentation model such as a
3D U-Net model with the generated data.
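The spatial bookkeeping of the generator described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the initial feature-map size of 8 × 8 × 4 after the linear layer and the convolution padding of 1 are assumptions, inferred from the stated output size and the four upsampling steps (three blocks plus the final layer).

```python
def conv_output_size(size, kernel=3, stride=1, padding=1):
    """Standard convolution size formula; padding=1 is an assumption."""
    return (size + 2 * padding - kernel) // stride + 1


def generator_output_shape(initial=(8, 8, 4), n_upsamples=4, factor=2):
    """Trace the spatial shape through the upsample + conv stages.

    initial: assumed feature-map size after the linear layer (hypothetical,
             inferred as 128/16 x 128/16 x 64/16).
    n_upsamples: 3 upsampling blocks plus the final upsampling layer.
    """
    shape = tuple(initial)
    for _ in range(n_upsamples):
        # Nearest-neighbour upsampling multiplies every spatial dimension.
        shape = tuple(s * factor for s in shape)
        # Kernel 3, stride 1 (from the text) with padding 1 keeps the size.
        shape = tuple(conv_output_size(s) for s in shape)
    return shape


print(generator_output_shape())
```

Under these assumptions the traced shape matches the 128 × 128 × 64 patch size stated in the text; each of the two output channels (patch and label) shares this spatial size.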
Next, the critic $D$ took either the generated 3D patch-label pairs
$G(z^{(i)})$ or the real 3D patch-label pairs $x$ as its input. The
patch-label pairs were fed through four 3D convolutional layers.
A kernel size of 3 and stride of 2 was used in the convolutional
layers. After each convolutional layer, a 3D instance normalization
layer was used for the GP model as shown in Fig. 2 B. Here, for
the SN model, we used spectral normalization ( Miyato et al., 2018 )
after each convolutional layer which acts as an additional regular-
ization to gradient penalty as shown in Fig. 2 C. Leaky ReLU was
used as the activation layer after the normalization layers. The last
layer was a linear layer that produced a scalar, termed the critic's score.
The score indicates how similar the distribution of the generated
patch-label pairs is to that of the real patch-label pairs. This indi-
rectly ensures that the generated labels correspond to the vessels
in the generated patches similar to how the real labels correspond
to the vessels in the real patches. The loss function of the critic
was:
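\[
\mathrm{loss}_D(i) = D(G(z^{(i)})) - D(x) + \lambda \left( \lVert \nabla D(\hat{x}) \rVert_2 - 1 \right)^2 \tag{1}
\]
where $\hat{x} = \epsilon x + (1 - \epsilon) G(z^{(i)})$, $\epsilon \sim U[0, 1]$, $\lambda = 10$ and $\nabla$ is the gradient
of the critic. Here, the difference between the critic's score for the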
P. Subramaniam, T. Kossen, K. Ritter et al. Medical Image Analysis 78 (2022) 102396
Fig. 1. Structure of the workflow from training the 3D GAN to qualitative and quantitative assessments. Top: Overview of GAN training - Here, we illustrate our most complex
model using spectral normalization and mixed precision (c-SN-MP), middle: Evaluation schemes, bottom: Segmentation performance evaluation.
Fig. 2. Architectures of A. Generator of all models, B. Critic of GP model, and C.
Critic of all SN models.
real and generated data along with the gradient penalty is com-
puted. The loss function of the generator based on the output of
the critic was:
\[
\mathrm{loss}_G(i) = -D(G(z^{(i)})) \tag{2}
\]
This equates to maximizing the critic’s score for the generated
images by using the negative of the critic’s score as loss for the
generator.
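The two loss functions can be illustrated with plain scalars. The sketch below is not the authors' PyTorch code: it assumes the critic scores and the gradient norm at the interpolated sample have already been computed, and it uses a hypothetical linear toy critic (whose gradient is the same at every input) only to obtain concrete numbers.

```python
import math

LAMBDA = 10.0  # gradient penalty weight, as stated for Eq. (1)


def interpolate(x_real, x_fake, eps):
    """x_hat = eps * x + (1 - eps) * G(z), element-wise."""
    return [eps * r + (1.0 - eps) * f for r, f in zip(x_real, x_fake)]


def critic_loss(d_fake, d_real, grad_norm, lam=LAMBDA):
    """Eq. (1): D(G(z)) - D(x) + lam * (||grad D(x_hat)|| - 1)^2."""
    return d_fake - d_real + lam * (grad_norm - 1.0) ** 2


def generator_loss(d_fake):
    """Eq. (2): the negative critic score of the generated sample."""
    return -d_fake


# Hypothetical toy critic D(x) = w . x; its gradient w.r.t. x is w everywhere,
# so the gradient norm at x_hat is simply ||w||.
w = [0.6, 0.8, 0.5]


def toy_critic(x):
    return sum(wi * xi for wi, xi in zip(w, x))


x_real = [1.0, 0.0, 1.0]
x_fake = [0.0, 1.0, 0.0]
x_hat = interpolate(x_real, x_fake, eps=0.25)
grad_norm = math.sqrt(sum(wi * wi for wi in w))

loss_d = critic_loss(toy_critic(x_fake), toy_critic(x_real), grad_norm)
loss_g = generator_loss(toy_critic(x_fake))
print(x_hat, loss_d, loss_g)
```

In an actual training loop the gradient norm would be obtained by automatic differentiation of the critic output with respect to the interpolated sample; the toy critic merely avoids that dependency here.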
In the case of the SN-MP model, mixed precision was used
for memory efficiency. The default precision used in deep learning
methods is 32-bit floating point (FP32). In mixed precision, both
half precision (FP16) and FP32 are used depending on the precision
requirements of a particular arithmetic operation. Here, FP16 is used
for storing weights, activations and gradients while an FP32 master
copy of the weights is used for optimizer updates. A loss-scaling
factor is applied in order to maintain performance equivalent to
that of a fully FP32 network. Using mixed precision allowed us to use
more filters per layer. Hence, the c-SN-MP model was trained with
double the filters in each layer of the SN-MP model. For
implementation details, see the open source code¹.
1 https://github.com/prediction2020/3DGAN_synthesis_of_3D_TOF_MRA_with_segmentation_labels
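Why a loss-scaling factor is needed can be seen with a small numerical experiment. The sketch below uses only the Python standard library to round values to FP16; the scaling factor of 2^16 and the gradient value are hypothetical choices for illustration, and Python floats stand in for the FP32 master copy.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", value))[0]


LOSS_SCALE = 65536.0  # hypothetical scaling factor (2**16), for illustration

tiny_gradient = 1e-8  # smaller than the smallest positive FP16 value (~6e-8)

# Without scaling, the gradient underflows to zero when stored in FP16.
unscaled_fp16 = to_fp16(tiny_gradient)

# With scaling, the value stored in FP16 is representable; unscaling is then
# done at higher precision before the optimizer update.
scaled_fp16 = to_fp16(tiny_gradient * LOSS_SCALE)
recovered = scaled_fp16 / LOSS_SCALE

print(unscaled_fp16)
print(recovered)
```

Scaling the loss scales all gradients by the same factor through the chain rule, so small gradients are shifted into the representable FP16 range and can be divided by the factor again before the weights are updated.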
2.2. Data
2.2.1. Datasets
TOF-MRA data of 137 patients with cerebrovascular disease
from two earlier studies, PEGASUS (n = 72) and 1000Plus (n = 65),
were used in this work. The 65 TOF-MRA datasets from the 1000Plus
study were used for additional validation as a second, independent
dataset. The details of the studies can be found in (Mutke et al.,
2014) for PEGASUS and in (Hotter et al., 2009) for 1000Plus. The
imaging was performed with the following parameters for both
studies: voxel size = 0.5 × 0.5 × 0.7 mm³; matrix size = 312 ×
384 × 127; TR/TE = 22 ms/3.86 ms; time of acquisition = 3:50 min;
flip angle = 18 degrees.
The images were pre-segmented semi-manually with a standardized
pipeline using a thresholded region growing algorithm in
the case of the PEGASUS dataset, and a 2D U-Net segmentation model
(Livne et al., 2019) in the case of 1000Plus. Final ground truths
were created following pre-defined manual correction steps, first
by junior and finally by senior raters. Further details of the
labeling methodology can be found in (Hilbert et al., 2020).
2.2.2. Data splitting and preprocessing
TOF-MRA images from each study were denoised, and non-uniformity
correction was applied to improve image quality
(Masoudi et al., 2021). For training the GANs, data from 47 patients
of the PEGASUS study were used. For the downstream task of
segmentation, 12 patients were used as the validation set and 13 as
the test set. The 1000Plus study dataset was solely used as an
independent test set (n = 65) for evaluation of the trained
segmentation model.
Due to computational limitations, 3D patches of size 128 ×
128 × 64 were extracted from the whole brain TOF-MRA scans of
the PEGASUS training set. For training of the GANs, 50 patches of
images and their labels per patient were extracted: in part
systematically (18) to cover all parts of the image and in part
randomly (32) with the center voxel being a blood vessel in order to
represent sufficient vessels. This amounted to a total of 2,350
patch-label pairs. In addition, 250 patches per patient were randomly
extracted with the center voxel being a blood vessel for the
downstream segmentation model from the PEGASUS training and
validation sets, leading to 11,750 and 3,000 patch-label pairs,
respectively.
The image patches for the GAN training were normalized be-
tween -1 and +1. The corresponding labels were stacked on the
image patch as a second channel for training the GANs.
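The patch counts and the intensity scaling can be checked with a few lines of Python. This is only a sketch: the text states that patches were normalized between -1 and +1 and later rescaled to 0–255, but not how, so the min-max scaling used here is an assumption, and all function names are hypothetical.

```python
N_TRAIN, N_VAL, N_TEST = 47, 12, 13  # PEGASUS patient split from the text


def n_patches(n_patients, per_patient):
    """Total number of patch-label pairs for a given split."""
    return n_patients * per_patient


def to_gan_range(voxels):
    """Min-max scale intensities to [-1, 1] (assumed normalization scheme)."""
    lo, hi = min(voxels), max(voxels)
    if hi == lo:
        return [0.0 for _ in voxels]  # constant patch: map to mid-range
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in voxels]


def to_intensity_range(voxels, lo=0.0, hi=255.0):
    """Map generator outputs in [-1, 1] back to the [lo, hi] intensity range."""
    return [(v + 1.0) / 2.0 * (hi - lo) + lo for v in voxels]


gan_pairs = n_patches(N_TRAIN, 18 + 32)   # systematic + random patches
seg_train_pairs = n_patches(N_TRAIN, 250)
seg_val_pairs = n_patches(N_VAL, 250)
print(gan_pairs, seg_train_pairs, seg_val_pairs)
```

The three totals reproduce the 2,350, 11,750 and 3,000 patch-label pairs reported in the text, and the two scaling functions are inverses of each other for a patch whose intensities span 0 to 255.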
2.3. Evaluation methods
An overview of the evaluation methods is shown in Fig. 1 .
The qualitative evaluation was done by visually assessing the
images, labels and the 3D vessel structure using ITK-SNAP² as a first
step. For a quantitative assessment, FID scores were computed
from features extracted using MedicalNet, following the precedent of
(Sun et al., 2021). MedicalNet is a 3D ResNet model pretrained on 23
different medical datasets for segmentation (Chen et al., 2019). We
chose this network instead of the commonly used Inception-v3
trained on the ImageNet dataset (Szegedy et al., 2016) for calculating
the FID scores to better match our 3D medical data.
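For reference, the Fréchet distance underlying the FID, as introduced together with the two time-scale update rule (Heusel et al., 2018), compares Gaussian fits to the two feature distributions. Assuming this standard form is the one applied to the MedicalNet features (the text does not state any modification), it reads:

```latex
\[
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
\]
```

Here $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ denote the mean and covariance of the features extracted from the real and the generated patches, respectively; these symbols are introduced only for this illustration. A lower value indicates that the two feature distributions are closer.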
While the FID measures the quality of the images, it does not
account for mode collapse. Mode collapse happens when the
generator learns to output a small set of good quality images to get
a good critic's score and does not learn any further variations
present in the training data. In order to quantify both the quality
and the variety of modes captured in the synthetic data, we used
Precision and Recall for Distributions (PRD) (Sajjadi et al., 2018).
Precision quantifies the quality of the images, and Recall captures the degree of
2 http://www.itksnap.org/pmwiki/pmwiki.php
Fig. 3. Brain mask application for intracranial vessels analysis. Here, an axial slice
is shown of A. TOF-MRA image with skull B. brain mask extracted using FSL-BET
tool from TOF-MRA image C. ground truth segmentation label after brain mask ap-
plication leading to skull-stripping i.e. removal of all vessels of face and neck with
only intracranial vessels remaining.
mode collapse. We also computed the Area Under the Curve (AUC)
of the PRD curves to extract a single score for a simple quantifica-
tion. Here again, we compared the extracted features of the gener-
ated and real patches from the pre-trained MedicalNet. It is impor-
tant to note that both FID and PRD curves are based on the imag-
ing patches alone and the labels are not taken into consideration
for these performance measures.
Next, we tested the generated data for brain vessel segmen-
tation. 3D U-Nets were trained on the synthetic patch-label pairs
produced from the four different 3D GANs, and on the real data to
compare segmentation performance. The generated patches were
rescaled back to the real data range i.e. to 0–255 and the labels
made binary by using a threshold. The performance of all trained
U-Nets was evaluated on two independent test sets in two separate
analysis schemes: a) all vessels b) intracranial vessels. In the case
of all vessels, the whole predicted segmentation label was consid-
ered for evaluation. For intracranial vessels, the segmentation la-
bels were processed so that only the intracranial vessels were con-
sidered. This was done by applying brain masks of corresponding
TOF-MRA images on the ground truth segmentation labels and the
prediction labels from all the U-Net models. The brain masks were
obtained automatically using the FSL-Brain Extraction Tool³ (BET)
with parameter frac = 0.05 on the TOF-MRA images. A visual
illustration of this post-processing of labels for intracranial vessels
is shown in Fig. 3 . In each case, the U-Net model that performed
the best on the real validation set was selected to compute and
report the performance on the real test sets. This method of eval-
uation not only signifies the utility of the synthetic data for the
brain vessel segmentation use case but also provides information
about how well the generated labels reflect the vessel information
in the generated patch as this is crucial for a good segmentation
performance. The segmentation performance was measured using
Dice Similarity Coefficient (DSC) and the balanced Average Haus-
dorff Distance (bAVD) ( Aydin et al., 2021b ). DSC is a commonly
used metric to evaluate segmentation performance, given by:
\[
\mathrm{DSC} = \frac{2 \times TP}{2 \times TP + FP + FN} \tag{3}
\]
where TP = True positive; FP = False positive; FN = False negative.
A higher DSC indicates good segmentation performance. bAVD is a
distance metric which has been shown to be a better metric for
evaluation of blood vessel segmentation ( Aydin et al., 2021a ). It is
a modified average Hausdorff distance defined as:
\[
\mathrm{bAVD} = \frac{1}{2} \times \left( \frac{1}{N_G} \sum_{g \in G} \min_{p \in P} d(g, p) + \frac{1}{N_G} \sum_{p \in P} \min_{g \in G} d(p, g) \right) \tag{4}
\]
where G is the set of voxels in the ground truth, P is the set of
voxels in the predicted segmentation, N_G is the number of voxels in
G and d is the distance between two voxels. The balanced directed
average Hausdorff distance from voxel set G to P is given by the
sum of all minimum distances from all points belonging to point
3 https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/BET/UserGuide
set G to P divided by the number of points in G . Similarly, bal-
anced directed average Hausdorff distance from voxel set P to G
is given by the sum of all minimum distances from all points be-
longing to point set P to G divided by the number of points in
G. bAVD is the mean of the balanced directed average Hausdorff
distance from G to P and the balanced directed average Hausdorff
distance from P to G, measured in voxels. A lower bAVD indicates
better segmentation performance.
We used the EvaluateSegmentation tool (Taha and Hanbury, 2015)
to calculate the DSC and bAVD for each patient prediction. The
mean DSC and mean bAVD were then calculated across all the
patients.
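Both metrics can be written down directly from Eqs. (3) and (4). The following sketch operates on small sets of voxel coordinates with a brute-force nearest-neighbour search and Euclidean distance (an assumption; the text only states that distances are in voxels). It is meant to illustrate the definitions, not to replace the EvaluateSegmentation tool used in the study.

```python
import math


def dice(ground_truth, prediction):
    """Eq. (3) on sets of foreground voxel coordinates."""
    tp = len(ground_truth & prediction)
    fp = len(prediction - ground_truth)
    fn = len(ground_truth - prediction)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0  # two empty masks agree fully


def _sum_min_distances(source, target):
    """Sum over source voxels of the distance to the closest target voxel."""
    return sum(min(math.dist(s, t) for t in target) for s in source)


def balanced_avd(ground_truth, prediction):
    """Eq. (4): both directed sums are divided by the ground-truth size N_G."""
    if not ground_truth or not prediction:
        raise ValueError("bAVD is undefined for an empty voxel set")
    n_g = len(ground_truth)
    g_to_p = _sum_min_distances(ground_truth, prediction) / n_g
    p_to_g = _sum_min_distances(prediction, ground_truth) / n_g
    return 0.5 * (g_to_p + p_to_g)


# Toy example: a 2-voxel ground truth and a shifted 3-voxel prediction.
G = {(0, 0, 0), (0, 0, 1)}
P = {(0, 0, 1), (0, 0, 2), (0, 0, 3)}
print(dice(G, P), balanced_avd(G, P))
```

Dividing both directed sums by N_G rather than by the size of the respective source set is what distinguishes the balanced variant from the ordinary average Hausdorff distance; in the toy example the direction from P to G would otherwise be divided by 3 instead of 2.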
2.4. Training
The models were implemented in PyTorch and trained on
an Nvidia TITAN RTX GPU for 100 epochs each. We used the two
time-scale update rule (Heusel et al., 2018) with different learning
rates of 0.0004 and 0.0002 for the critic and the generator,
respectively, instead of performing more updates for the critic
within each epoch. The Adam optimizer (Kingma and Ba, 2014) with
$\beta_1 = 0$ and $\beta_2 = 0.9$ was used. The batch size for all
models was 4. For mixed precision, the Automatic Mixed Precision
(AMP) package from PyTorch was used. A threshold of 0.3 was set for
binarizing the generated labels, except in the case of the SN model,
where 0.2 was used. All the above hyperparameters were chosen based
on the performance on the validation set in the segmentation task.
The training times and the memory used for each GAN variant were
recorded.
Table 1
FID scores and AUC of the PRD curves for synthetic data from different models.
Data source       FID       PRD-AUC
GP model          0.0381    0.80
SN model          0.0322    0.82
SN-MP model       0.0206    0.87
c-SN-MP model     0.0244    0.86
For segmentation, the published 3D U-Net architecture and
framework implemented in TensorFlow from Hilbert et al. (2020)
was utilized with the default hyperparameters. These were the Adam
optimizer with a learning rate of 0.0001, $\beta_1 = 0.9$ and
$\beta_2 = 0.999$, and a batch size of 8.
3. Results
In the visual analysis, the synthetic patches, labels and the 3D
vessel structure from the complex mixed precision model (c-SN-
MP) appeared as the most realistic ( Fig. 4 ). The patches from the
mixed precision models (SN-MP and c-SN-MP) had the lowest FID
scores (Table 1) and the best PRD curves (Fig. 5). According to the
PRD curves, c-SN-MP achieved higher precision than SN-MP at higher
recall values, whereas SN-MP achieved higher precision at lower
recall values. Based on the AUC of the PRD curves shown
in Table 1, SN-MP and c-SN-MP patches performed similarly. In
Table 2 , the memory consumption and the training duration of
Fig. 4. Sets of samples of the mid-axial slice of the patch and label, and the corresponding 3D vessel structure from A) GP B) SN C) SN-MP D) c-SN-MP and E) real. The
visualizations were obtained using ITK-SNAP for illustrative purposes only.
Table 2
Total number of trainable parameters, memory consumption and training times of various 3D GAN models. Note that c-SN-MP, which is our complex mixed precision model, uses twice the number of filters per layer, leading to a doubling of the trainable parameters compared to the non-complex models. The memory consumption increased by 1.5 times compared to the SN model, allowing it to be accommodated in the limited memory of our computational infrastructure. The training time also increased by 2.5 times, but it was not a constraint in our study.
Model             Trainable parameters (million)    Memory (MB)    Time (hours)
GP model          145                               15,085         78
SN model          145                               14,333         77
SN-MP model       145                               9,013          77
c-SN-MP model     308                               21,351         192
Table 3
The mean DSC and mean bAVD (in voxels) across all the patients in the test set for 2 different datasets, PEGASUS and 1000Plus. The value in brackets is the standard deviation across patients. A) All vessels is done on the entire prediction with the entire ground truth as reference, and B) Intracranial vessels is done on the skull-stripped prediction with the skull-stripped ground truth as reference.
Data source       PEGASUS                           1000Plus
                  Mean DSC         Mean bAVD        Mean DSC         Mean bAVD
A) All vessels
GP model          0.793 (0.024)    2.648 (1.189)    0.807 (0.03)     1.895 (1.061)
SN model          0.804 (0.019)    2.425 (1.505)    0.796 (0.029)    1.855 (0.929)
SN-MP model       0.782 (0.020)    2.334 (1.122)    0.778 (0.032)    1.746 (0.894)
c-SN-MP model     0.820 (0.017)    1.859 (1.038)    0.809 (0.031)    0.858 (0.91)
Real              0.906 (0.016)    0.339 (0.139)    0.883 (0.023)    0.554 (0.221)
B) Intracranial vessels
GP model          0.827 (0.015)    0.639 (0.132)    0.829 (0.019)    0.701 (0.195)
SN model          0.833 (0.013)    0.606 (0.141)    0.811 (0.023)    0.716 (0.213)
SN-MP model       0.804 (0.020)    0.784 (0.125)    0.785 (0.027)    0.822 (0.211)
c-SN-MP model     0.841 (0.016)    0.508 (0.083)    0.817 (0.028)    0.611 (0.18)
Real              0.901 (0.019)    0.294 (0.077)    0.880 (0.024)    0.507 (0.126)
Fig. 5. PRD Curves of synthetic data from the four different models with real data
as reference. Precision and Recall in GANs quantify the quality and modes captured
by the models respectively.
each of the GAN variants is shown. Using mixed precision reduced
the memory consumption by approximately 37% (from 14,333 MB for the
SN model to 9,013 MB for the SN-MP model).
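The ratios quoted for Table 2 follow from simple arithmetic on the reported values. The short check below uses only the numbers listed in the table; the variable names are chosen for illustration.

```python
# Values copied from Table 2.
memory_mb = {"GP": 15085, "SN": 14333, "SN-MP": 9013, "c-SN-MP": 21351}
time_hours = {"GP": 78, "SN": 77, "SN-MP": 77, "c-SN-MP": 192}

# Relative memory saving of mixed precision at identical architecture.
memory_saving = 1.0 - memory_mb["SN-MP"] / memory_mb["SN"]

# Cost of doubling the filters per layer, relative to the SN model.
memory_ratio = memory_mb["c-SN-MP"] / memory_mb["SN"]
time_ratio = time_hours["c-SN-MP"] / time_hours["SN"]

print(f"memory saving SN -> SN-MP: {memory_saving:.1%}")
print(f"memory c-SN-MP / SN:       {memory_ratio:.2f}x")
print(f"time   c-SN-MP / SN:       {time_ratio:.2f}x")
```

The saving comes out slightly below 40%, and the two ratios are consistent with the factors of roughly 1.5 and 2.5 stated in the caption of Table 2.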
The test set performance of the 3D U-Net trained on generated
data from different models and on real data is shown in Table 3.
Table 3 A shows the performance when all vessels are considered.
The U-Net trained with c-SN-MP synthetic data outperformed all the
U-Nets trained on other synthetic data for the PEGASUS test set
(mean DSC 0.820; mean bAVD 1.859). In the case of the external
dataset 1000Plus, the U-Nets trained on synthetic data from the GP
model and the c-SN-MP model performed the same in terms of mean DSC
(0.810), whereas the U-Net trained on data from c-SN-MP achieved
the lowest, i.e. best, mean bAVD (1.301). In comparison, the 3D
U-Net trained with real data still performed best overall, both on
the PEGASUS test set (mean DSC 0.906; mean bAVD 0.339) and on the
1000Plus test set (mean DSC 0.887; mean bAVD 0.622).
Next, Table 3 B shows the performance for intracranial vessels
alone. Here, the U-Net trained with c-SN-MP synthetic data
outperformed all the U-Nets trained on other synthetic data for the
PEGASUS test set (mean DSC 0.841; mean bAVD 0.508). For the external
test set from the 1000Plus dataset, the U-Net trained on generated
data from GP achieved the highest mean DSC (0.830), whereas the
U-Net trained on generated data from c-SN-MP achieved the lowest,
i.e. best, mean bAVD (0.639). For labels with only intracranial
vessels, the 3D U-Net trained with real data still performed best,
both on the PEGASUS test set (mean DSC 0.901; mean bAVD 0.294) and
on the 1000Plus test set (mean DSC 0.880; mean bAVD 0.541).
Box-whisker plots of the prediction performance of various
models on the two test sets are plotted in Fig. 6 which shows the
inter-patient spread in performances for all vessels ( Fig. 6 A) and
for intracranial vessels ( Fig. 6 B). The error maps of segmentation
of two example patients, one from each of the two datasets, are
shown in Fig. 7 for all vessels and for intracranial vessels.
4. Discussion
To the best of our knowledge, this is the first work to present
generative adversarial network models that generate realistic 3D
TOF-MRA volumes along with segmentation labels in medical
imaging. We showed that utilizing mixed precision aids in achiev-
Fig. 6. Segmentation performance (DSC and bAVD) of 3D U-Net models trained with 4 different generated datasets and PEGASUS training data on the 2 datasets PEGASUS and 1000Plus for A) all vessels B) intracranial vessels. The horizontal line of the box-whisker plots indicates the median, the box indicates the interquartile range and the whiskers the minimum and maximum.
ing the highest image quality of synthetic data. Additionally, the
synthetic data from our complex model maintained a substan-
tial amount of predictive properties of the original volumes re-
flected by the good segmentation performance on real test data.
These findings also held true on a second, independent dataset.
The results showcase the potential of utilizing memory efficiency
provided by mixed precision in designing a complex architecture.
Increasing the complexity is required in order to generate high-
resolution fine grained structures such as brain vessels in 3D TOF-
MRA volumes from noise along with the corresponding segmenta-
tion labels. This work sets an important step towards sharing la-
beled 3D medical images that would facilitate better research in
the medical imaging domain.
The segmentation performance of the 3D U-Net trained on syn-
thetic data from our complex mixed precision model, c-SN-MP,
showed the best performance compared to the other models based
on synthetic data in terms of both metrics DSC and bAVD. Here,
doubling the filters per layer in the GAN architecture is likely to
have helped to capture the vessel structure in the training data.
Also, visually it can be seen in Fig. 4 that c-SN-MP labels ( Fig. 4 D)
look connected and most similar to real vessel structures ( Fig. 4 E).
In contrast, the labels of synthetic data from the simpler
mixed precision model, SN-MP (Fig. 4 C), are sparsely connected,
which may explain why the U-Net trained on SN-MP data had the
lowest DSC. This seems plausible as the vessels
are more relevant for segmentation than the background. The same
can be observed in patch-label pairs from our most basic model,
GP. Here, the segmentation performance was better than that of
SN-MP in terms of DSC even though Fig. 4 A shows that the GP patches
are not as sharp as the other generated images.
In terms of the quantitative measures of patch quality, i.e. the FID
scores and PRD curves, the mixed precision models, both simple
and complex, were rated to be of much better quality and variety
than the models not using mixed precision (GP and SN
models). However, the U-Net trained on patch-label pairs from the
simpler mixed precision model, SN-MP, had the lowest segmentation
performance. A possible reason for this could be that FID and
PRD curves, which are based on features extracted only from
the patches, might reflect not only the vessel structure but also
the quality of the background. In contrast, the U-Net performance
depends more on recognizing the vessel structure. This is
supported by Fig. 4 C, where the patches seem realistic
but the vessel structures look disconnected. We see the reverse
in the case of GP, where the patches look less realistic but the
vessel structures look more connected. This could explain why the
GP model fared poorly in FID and PRD curves and yet did well in
segmentation when used to train a U-Net. Looking more closely at
the PRD curves (Fig. 5), patches from the simpler mixed precision
model, SN-MP, had good quality at lower recall values, while patches
from our complex mixed precision model, c-SN-MP, had better quality
at higher recall values. This implies that c-SN-MP is
capable of generating patches of slightly reduced quality but with
higher variety, and thus is better at avoiding mode collapse, which
is indicated by recall. While the FID and PRD curves provide
insights regarding the image quality and variety, these metrics do
not necessarily align with the performance in the vessel
segmentation task. This emphasizes the importance of generating
labels along with the images to determine the best generated data
for the specific use case.
Overall, the FID and PRD curves indicated that more regular-
izations have a positive effect on the image quality and variety.
The mixed precision models, SN-MP and c-SN-MP are the best per-
forming models in terms of these metrics. They are both regular-
ized with gradient penalty ( Gulrajani et al., 2017 ) and spectral nor-
malization ( Miyato et al., 2018 ). These methods have been individ-
ually proposed to bound the critic by ensuring Lipschitz continuity
which has been found to stabilize GAN training. Gradient penalty
does this by applying a gradient based constraint to the objec-
tive function of the critic. With spectral normalization, the critic is
bound by directly constraining its weight matrices by normalizing
them with their spectral norm. Using the two methods together
was proposed to be beneficial in the study that introduced spec-
tral normalization in GANs ( Miyato et al., 2018 ) and using them to-
gether has been shown to improve performance in another study
( Kossen et al., 2021 ). In addition to these methods, we also used
mixed precision for memory efficiency in the case of SN-MP and
c-SN-MP models. Mixed precision has been found to act as yet an-
other form of regularization ( Micikevicius et al., 2018 ). Unlike FID
and PRD curves, the segmentation performance does not always
benefit from synthetic data generated by more regularized mod-
els. When looking at the test DSC and bAVD of the U-Nets trained
on synthetic data from the simpler models, GP, SN and SN-MP, it
Fig. 7. Segmentation error maps of one example patient each from the PEGASUS test set and the 1000Plus test set for all vessels and for intracranial vessels. Top to bottom: maps from the 3D U-Net model trained on A. GP synthetic data B. SN synthetic data C. SN-MP synthetic data D. c-SN-MP synthetic data E. real data. True positives are shown in red, false positives in green and false negatives in yellow. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
is difficult to rank the overall performance due to the varied dif-
ferences in the two segmentation performance metrics across the
two test sets. One possible explanation for the differences might
be that regularization might positively impact the patches but not
necessarily the binary segmentation labels that are much simpler
to generate. To draw conclusions on how regularizations in GANs
affect the two segmentation metrics and the generalizability to the
additional set, a more systematic analysis would be required in
further research. For the c-SN-MP model, we increased the model
complexity which could better utilize the multiple regularizations
and thus showed good segmentation performance as well as good
image quality. An additional argument in favor of multiple
regularizations is that they have been found to make models less
vulnerable to membership inference attacks (Truex et al., 2019;
Chen et al., 2020). Such attacks are used by malicious parties to
find out whether a particular patient's data was used to train a
model (Shokri et al., 2017). This is crucial to consider when sharing the
synthetic data or the generator model. While regularization has
been found useful to mitigate some attacks, applying differential
privacy (DP) ( Dwork and Roth, 2014 ) to the training process, by
construction, puts an upper bound on the privacy leakage of the
training data. DP is challenging to implement especially in a 3D
GAN architecture, as it introduces a substantial number of param-
eters to an already overwhelming amount of parameters. This leads
to high computational cost in terms of both memory and pro-
longed training time while reducing the test performance consider-
ably. To showcase this, we have included preliminary results with
DP using Rényi divergence ( Mironov, 2017 ) in 3D on a simplified
GAN architecture in the appendix.
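The spectral normalization discussed above, i.e. dividing a weight matrix by its spectral norm, can be sketched with a simple power iteration. The code below is a standard-library illustration on a small dense matrix, not the authors' implementation; the function names are hypothetical, and practical implementations typically apply the iteration to reshaped convolution kernels and reuse the iteration vector across training steps instead of iterating to convergence.

```python
import math


def spectral_norm(w, n_iter=50):
    """Estimate the largest singular value of matrix w by power iteration."""
    rows, cols = len(w), len(w[0])
    v = [1.0 / math.sqrt(cols)] * cols  # initial unit vector
    sigma = 0.0
    for _ in range(n_iter):
        # u = W v, normalized
        u = [sum(w[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        u_norm = math.sqrt(sum(x * x for x in u))
        if u_norm == 0.0:
            return 0.0  # v lies in the null space (e.g. zero matrix)
        u = [x / u_norm for x in u]
        # v = W^T u; its norm converges to the largest singular value
        v = [sum(w[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        sigma = math.sqrt(sum(x * x for x in v))
        v = [x / sigma for x in v]
    return sigma


def spectrally_normalize(w):
    """Return w divided by its estimated spectral norm."""
    sigma = spectral_norm(w)
    return [[x / sigma for x in row] for row in w]


W = [[3.0, 0.0], [0.0, 1.0]]  # toy weights with singular values 3 and 1
W_sn = spectrally_normalize(W)
print(spectral_norm(W), spectral_norm(W_sn))
```

After normalization the largest singular value of the layer is 1, which bounds how much the layer can stretch its input; this is the sense in which spectral normalization constrains the critic in addition to the gradient penalty.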
The U-Net trained on real data still outperforms all those
trained on generated data. Here, Fig. 7 (All vessels) shows that
all U-Net models trained with synthetic patches also segment
blood vessels that are not brain blood vessels but rather vessels
in the face and neck area, i.e. false positives. In contrast, the U-Net
trained on real patches of the same size recognized if a vessel be-
longed to the brain and did not segment vessels outside the brain.
This highlights that while the GANs learned to segment blood ves-
sels, they did not learn to take the anatomical context of the ves-
sels into account, i.e. the resulting models could not differentiate
between face, neck and brain blood vessels. A possible explana-
tion for this could be that the loss function for GANs focuses on
the quality of the generated data and not the segmentation perfor-
mance of the generated patch-label pairs compared to real patch-
label pairs. A potential solution would be to generate labels with a
first GAN similar in distribution to the real ground truth segmenta-
tion labels, and then a second GAN could be used to generate the
corresponding image in inference mode with a 3D image-to-image
translation GAN architecture that was trained with real labels and
the corresponding images. However, training two 3D GANs sepa-
rately would increase the overall training time substantially and
was not feasible with our hardware infrastructure. We utilized an
alternative solution where we applied the test image brain mask
in a post-processing step leading to the removal of face and neck
vessels. This is a valid post-processing approach since many clini-
cal use cases only require segmentation of intracranial vessels. The
performance of the 3D U-Net trained with generated data then im-
proved, bringing it closer to the performance of the U-Net trained
on real data as shown in Table 3 B, Figs. 6 B, 7 (Intracranial vessels).
Generating 3D data is more complex and computationally ex-
pensive compared to 2D. Yet, the best performing U-Net model
trained on synthesized 3D data (DSC 0.841) is comparable to the
best performing U-Net model trained on synthesized 2D data (DSC
0.848) for the same use case of intracranial vessel segmentation
(Kossen et al., 2021). Here, the number of voxels that are
generated is increased by a factor of approximately 100. Meanwhile,
only a quarter of the number of filters per layer was used for all
our non-complex 3D GAN models owing to memory limitations. In
order to double the number of filters per layer for our complex
model (c-SN-MP), we used mixed precision. We also used
upsampling instead of transposed convolutions to alleviate
checkerboard artifacts, which increased the memory consumption
substantially.
Additionally, the training of WGAN requires the discriminator to
be updated more often than the generator. Since this would lead
to much longer training times, the current work utilized the Two
Timescale Update Rule (TTUR). Here, the learning rate of the dis-
criminator is set to be higher than that of the generator. These
changes were crucial to cope with the special challenges of syn-
thesis in 3D. Even with these restrictions, a similar segmentation
performance of 3D in comparison with 2D underlines the impor-
tance of generating data in 3D to capture the contextual informa-
tion within the third dimension for this 3D use case. It is likely
that the segmentation performance of the U-Net trained with gen-
erated 3D data could surpass the performance of 2D data with
more computational capacity, when more filters can be utilized in
the 3D GAN architecture.
A different strategy with regard to data privacy is Federated
Learning (FL). Here, sharing of data is avoided: each client computes
model updates locally, and the updates are aggregated into a global
model that is used by the participating clients. The results thus far
are promising. However, standard FL does not create new data that
can be made public for other research groups to access and improve
model architectures. This is especially important in the case of
rare pathologies where the data is scarce. Here, GANs can be used to generate
data of such pathologies by research groups that have access to
the data, which can then be made publicly available. Additionally,
there are technical and collaborative hurdles in FL, such as picking
a model-aggregation policy and standardizing hardware and software
across multiple organizations, among others (Ng et al., 2021).
These challenges are more acute in the case of deep learning
research. The organizational and collaborative efforts involved might
not be feasible for research groups with limited resources. Since
synthetic data from GANs can be shared, it provides easy and eq-
uitable access to all research groups investigating deep learning in
medical imaging. FL, on the other hand, is more suitable for clin-
ical application of well-established architectures with distributed
training. It should be noted that both FL and GANs are susceptible
to information leakage from the model weights even if the real
data itself is not shared. This makes both methods open to privacy
threats (Sheller et al., 2020; Chen et al., 2020). Here, DPGAN
has been found useful (Xie et al., 2018). DP algorithms incorporate
random noise into model training, making the resulting models more
resilient towards information leakage (Shokri et al., 2017). FL and
DPGANs could be combined to leverage the strengths of both, as was
done in FedDPGAN (Zhang et al., 2021). In our work, we focus on the
challenges of generating 3D medical images along with corresponding
labels, since labeling generated images is time- and labour-intensive.
This is an important step before the inclusion of DP in the GAN
architecture. We provide preliminary results using DP on a simple
3D GAN architecture in Appendix B.
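The aggregation step of FL described above can be sketched as follows. The text does not specify an aggregation policy, so this minimal example assumes federated averaging weighted by local dataset size, which is one common choice. The function name, the dictionary-of-lists weight format and the client sizes are hypothetical.

```python
from typing import Dict, List, Sequence

Weights = Dict[str, List[float]]  # parameter name -> flattened parameter values


def federated_average(client_weights: Sequence[Weights],
                      client_sizes: Sequence[int]) -> Weights:
    """Average client models, weighting each client by its number of samples."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one dataset size per client model")
    total = sum(client_sizes)
    averaged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][k] * n for w, n in zip(client_weights, client_sizes)) / total
            for k in range(length)
        ]
    return averaged


if __name__ == "__main__":
    # Two hypothetical clients; only weights leave the sites, never images.
    site_a = {"conv1": [1.0, 2.0]}
    site_b = {"conv1": [3.0, 4.0]}
    print(federated_average([site_a, site_b], client_sizes=[1, 3]))
```

Only model parameters are exchanged in such a scheme, which is why no new shareable dataset results from it and why the model weights themselves remain a possible leakage channel.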
The main limitations of our study are computational in nature.
First, we have not employed DP in the presented GAN architectures,
which would provide an upper bound on the information leakage
when the generated data and/or the generative model are shared. The
computational load resulting from applying DP would have made
the study infeasible with the available computing infrastructure.
Second, we did not use more novel GAN architectures validated
on natural images such as Progressive GANs (Karras et al., 2018)
or Multi-Scale-Gradients GANs (Karnewar and Wang, 2020). This is
because of the multi-fold computational requirements of these
architectures, especially in 3D. Patches of much smaller size could
still be generated (Eklund, 2020), but they would not be very useful
for the downstream task of vessel segmentation. Third, we generated
patch-label pairs and not whole volume-label pairs due to
computational limitations. While a recently introduced hierarchical
memory-efficient approach (Sun et al., 2021) might help to overcome
the computational constraints, this would come at the cost
of much longer training times, considering that two GANs of different
resolutions are trained along with encoders in an end-to-end manner.
Additionally, architectures that use data reconstruction are more
susceptible to membership inference attacks (Chen et al., 2020).
Two recent studies (Kwon et al., 2019; Sun et al., 2021) generating
3D images alone use encoders in their architectures, which
makes them less useful for the purpose of privacy-preserving data
sharing. Lastly, we trained and tested our GAN architectures on one
imaging modality, i.e., TOF-MRA. While we expect our results to
generalize to other modalities that may not be high contrast-to-noise
modalities like TOF-MRA, this should be verified in future
studies. For that, we encourage other researchers to utilize
our publicly available code. Our findings for TOF-MRA can be regarded
as a first proof of concept that GAN architectures are able
to synthesize realistic-looking 3D volumes with corresponding
segmentation labels.
5. Conclusion
In this study, we generated high resolution TOF-MRA patches
along with their corresponding labels in 3D employing mixed
precision for memory efficiency. Since most medical imaging is
P. Subramaniam, T. Kossen, K. Ritter et al. Medical Image Analysis 78 (2022) 102396
recorded in 3D, generating 3D images that retain the volumetric
information together with labels that are time-intensive to gener-
ate manually is a first step towards sharing labeled data. While
our approach is not privacy-preserving yet, the architecture was
designed with privacy as a key aspiration. It would be possible to
extend it with differential privacy in future work once
computational advancements allow it. This would enable the sharing
of privacy-preserving, labeled 3D imaging data. Research groups
could utilize our open source code to implement a mixed precision
approach to generate 3D synthetic volumes and labels efficiently
and verify whether they hold the necessary predictive properties for
the specific downstream task. Making such synthetic data available on
request would then allow for larger heterogeneous datasets to be
used in the future, alleviating the typical data shortages in this
domain. This will pave the way for robust and replicable model
development and will facilitate clinical applications.
Declaration of Competing Interest
All authors have participated in (a) conception and design, or
analysis and interpretation of the data; (b) drafting the article or
revising it critically for important intellectual content; and (c) ap-
proval of the final version.
This manuscript has not been submitted to, nor is under review
at, another journal or other publishing venue.
The authors have no affiliation with any organization with a direct
or indirect financial interest in the subject matter discussed in
the manuscript.
The following authors have affiliations with organizations with
direct or indirect financial interest in the subject matter discussed
in the manuscript:
None of the authors have direct or indirect financial interest in
the subject matter discussed in the manuscript.
However, the following disclosures unrelated to the current
work are made:
Pooja Subramaniam reported receiving personal fees from
ai4medicine outside the submitted work. Tabea Kossen reported
receiving personal fees from ai4medicine outside the submitted
work. Dr Madai reported receiving personal fees from ai4medicine
outside the submitted work. Adam Hilbert reported receiving per-
sonal fees from ai4medicine outside the submitted work. Dr Frey
reported receiving grants from the European Commission, reported
receiving personal fees from and holding an equity interest in
ai4medicine outside the submitted work. There is no connec-
tion, commercial exploitation, transfer or association between the
projects of ai4medicine and the results presented in this work.
While not related to this work, Dr Sobesky reports receipt of
speakers honoraria from Pfizer, Boehringer Ingelheim, and Daiichi
Sankyo. Furthermore, Dr Fiebach has received consulting and ad-
visory board fees from BioClinica, Cerevast, Artemida, Brainomix,
Biogen, BMS, EISAI, and Guerbet.
CRediT authorship contribution statement
Pooja Subramaniam: Conceptualization, Formal analysis, Investigation,
Methodology, Software, Validation, Visualization, Writing
– original draft, Writing – review & editing. Tabea Kossen:
Conceptualization, Investigation, Methodology, Project administration,
Software, Supervision, Validation, Visualization, Writing – original
draft, Writing – review & editing. Kerstin Ritter: Supervision,
Writing – review & editing. Anja Hennemuth: Supervision, Writing
– review & editing. Kristian Hildebrand: Supervision, Writing
– review & editing. Adam Hilbert: Conceptualization, Writing
– review & editing. Jan Sobesky: Data curation, Writing – review
& editing. Michelle Livne: Conceptualization, Writing – review
& editing. Ivana Galinovic: Data curation, Writing – review &
editing. Ahmed A. Khalil: Data curation, Writing – review & editing.
Jochen B. Fiebach: Data curation, Writing – review & editing.
Dietmar Frey: Conceptualization, Funding acquisition, Project
administration, Resources, Supervision, Writing – review & editing.
Vince I. Madai: Conceptualization, Data curation, Investigation,
Methodology, Project administration, Supervision, Visualization,
Writing – original draft, Writing – review & editing.
Acknowledgments
This work has received funding from the German Federal Ministry
of Education and Research through (1) the grant Centre for Stroke
Research Berlin and (2) a Go-Bio grant for the research group
PREDICTioN2020 (lead: DF). Grant number: 031B0154.
Appendix A. Data augmentation
An additional analysis to complement the evaluation of our syn-
thetic data is to use it to augment the training data for the down-
stream brain blood vessel segmentation task. Here we trained seg-
mentation models with PEGASUS training data and augmented it
with synthetic data from the 4 GAN models separately for ad-
ditional analysis. Table A.1 summarizes the segmentation results
for all vessels ( Table A.1 A) and intracranial vessels ( Table A.1 B).
Fig. A.1 is a box-whisker plot to visualize the spread in the
segmentation performance between patients for all vessels (Fig. A.1 A)
and intracranial vessels (Fig. A.1 B).

Table A.1
The mean DSC and mean bAVD (in voxels) across all the patients in the test set for 2 different
datasets, PEGASUS and 1000Plus, using a model trained with real data along with generated data
used as data augmentation. The value in brackets is the standard deviation across patients. A) All
vessels: evaluated on the entire prediction with the entire ground truth as reference. B) Intracranial
vessels: evaluated on the skull-stripped prediction with the skull-stripped ground truth as reference.

Data source             PEGASUS                         1000Plus
                        Mean DSC       Mean bAVD        Mean DSC       Mean bAVD
A) All vessels
Real + GP model         0.902 (0.046)  0.333 (0.151)    0.862 (0.029)  0.65 (0.271)
Real + SN model         0.906 (0.016)  0.385 (0.145)    0.878 (0.021)  0.558 (0.199)
Real + SN-MP model      0.903 (0.013)  0.359 (0.133)    0.883 (0.02)   0.511 (0.145)
Real + c-SN-MP model    0.907 (0.012)  0.399 (0.204)    0.891 (0.02)   0.564 (0.222)
Real                    0.906 (0.016)  0.339 (0.139)    0.883 (0.023)  0.554 (0.221)
B) Intracranial vessels
Real + GP model         0.897 (0.018)  0.323 (0.073)    0.855 (0.029)  0.626 (0.199)
Real + SN model         0.905 (0.017)  0.318 (0.075)    0.874 (0.022)  0.546 (0.148)
Real + SN-MP model      0.900 (0.017)  0.328 (0.078)    0.877 (0.02)   0.518 (0.121)
Real + c-SN-MP model    0.905 (0.016)  0.306 (0.091)    0.884 (0.02)   0.557 (0.129)
Real                    0.901 (0.019)  0.294 (0.077)    0.880 (0.024)  0.507 (0.126)

Fig. A.1. Segmentation performance (DSC and bAVD) of 3D U-Net models trained with PEGASUS training data together with 4 different generated data as data augmentation
on the 2 datasets PEGASUS and 1000Plus of A) all vessels B) intracranial vessels. The horizontal line of the box-whisker plots indicates the median, the box indicates the
interquartile range and the whiskers the minimum and maximum.
Using synthetic data from the c-SN-MP model to augment the
real data for training the segmentation model provided the highest
mean DSC on both test sets (PEGASUS and 1000Plus) for the
two cases of A) all vessels and B) intracranial vessels when compared
to using only real data or using synthetic data from the other
GAN models along with real data (tied with the SN model for
intracranial vessels on PEGASUS). The differences are small, however.
While data augmentation is a valid application of our synthetic data,
its additional value is limited, as can be seen from the results.
This could be because the predictive properties captured by the
synthesized data are similar to those of the real data. This was also
the case in the study with the 2D GAN (Kossen et al., 2021), where
data augmentation with 2D generated data did not lead to a substantial
difference in the segmentation performance.
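For readers unfamiliar with the DSC values reported in Table A.1, the following minimal sketch computes the Dice similarity coefficient between a predicted and a reference binary mask. The flattened-list mask representation and the function name are assumptions made for illustration; the study's own evaluation code may differ.

```python
from typing import Sequence


def dice_coefficient(prediction: Sequence[int], reference: Sequence[int]) -> float:
    """Dice similarity coefficient 2|P ∩ R| / (|P| + |R|) for flattened binary masks."""
    if len(prediction) != len(reference):
        raise ValueError("masks must have the same number of voxels")
    pred = [bool(v) for v in prediction]
    ref = [bool(v) for v in reference]
    intersection = sum(1 for p, r in zip(pred, ref) if p and r)
    denominator = sum(pred) + sum(ref)
    if denominator == 0:
        return 1.0  # both masks empty: treated here as perfect agreement
    return 2.0 * intersection / denominator


if __name__ == "__main__":
    # Toy masks with 4 voxels: one voxel agrees, two disagree.
    print(dice_coefficient([1, 1, 0, 0], [1, 0, 1, 0]))
```

Because the DSC is bounded between 0 and 1 and the models trained on real data already reach about 0.9, the small differences between augmentation variants in Table A.1 leave little room for improvement.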
Appendix B. 3D differentially private GAN
Differential privacy (DP) is a natural mitigation strategy against
membership inference threats. Using DP to synthesize data would
allow accounting for the level of possible re-identification, thus
providing privacy guarantees for the generated data. In order to
illustrate this, we utilized the Opacus package from PyTorch to apply
the DP-SGD algorithm with Rényi divergence on a 3D adapted version
of the WGAN (Arjovsky et al., 2017). Henceforth, we refer to this
model architecture as 3D DPGAN. Here, we clipped the weights
of the critic with a clipping parameter of 0.01. We had to halve
the number of filters per layer in both the critic and the generator
in order to be able to train the 3D DPGAN within our computational
infrastructure. We trained the 3D WGAN with a Rényi differential
privacy accountant, which translates to (ε, δ)-DP guarantees.
The (ε, δ) pairs quantify the privacy properties of DP-SGD:
ε is the measure of privacy loss at a differential change in data,
and δ is the probability that the privacy constraint of ε does not
hold. A smaller ε value leads to better privacy. For comparing
synthetic data with different privacy guarantees, the noise multiplier
was set to the values [0.1, 0.3, 0.5], which provide ε values of
[≈10^6, ≈10^3, ≈10^2], respectively. It should be noted that since
the training samples consist of 3D patch-label pairs from the
TOF-MRA image-segmentation label pairs, the ε guarantees showcased
here also pertain to the patch-label pair data rather than the whole
TOF-MRA image of a patient. We set δ to the inverse of the number
of training samples following convention (Torkzadehmahani et al.,
2019). A maximum gradient norm of 1 was applied for
clipping gradients. The Adam optimizer with a learning rate of 0.0001
for both the critic and the generator was used instead of the TTUR
method, as the training time was reasonable with 5 updates of the
critic for every update of the generator. All the GANs were trained
for 100 epochs. A threshold of 0.7 was applied to the generated
labels from DPGAN ε ≈ 10^6 and a threshold of 0.6 to those from
DPGAN ε ≈ 10^2 and ε ≈ 10^3, chosen based on the segmentation
performance on the validation set. The code is also made available
in the GitHub repository that has already been provided.

Fig. B.1. Sets of samples of the mid-axial slice of the patch and label, and the corresponding 3D vessel structure from A) DPGAN ε ≈ 10^2, B) DPGAN ε ≈ 10^3, C) DPGAN ε ≈ 10^6,
D) real. Note that the lower the ε, the higher the privacy. The visualizations were obtained using ITK-SNAP for illustrative purposes only.

Table B.1
The mean DSC and mean bAVD (in voxels) across all the patients in the test set for 2
different datasets, PEGASUS and 1000Plus, using a model trained with generated data from
the 3D DPGAN with different ε values, starting from a low ε value indicating high privacy to
a high ε value indicating low privacy. The value in brackets is the standard deviation
across patients. A) All vessels: evaluated on the entire prediction with the entire ground
truth as reference. B) Intracranial vessels: evaluated on the skull-stripped prediction with
the skull-stripped ground truth as reference.

Data source             PEGASUS                         1000Plus
                        Mean DSC       Mean bAVD        Mean DSC       Mean bAVD
A) All vessels
DPGAN ε ≈ 10^2          0.085 (0.012)  4.509 (0.903)    0.083 (0.016)  4.116 (0.743)
DPGAN ε ≈ 10^3          0.562 (0.050)  6.307 (2.406)    0.567 (0.041)  4.624 (1.608)
DPGAN ε ≈ 10^6          0.581 (0.048)  5.05 (2.267)     0.568 (0.041)  3.962 (1.591)
Real                    0.906 (0.016)  0.339 (0.139)    0.883 (0.023)  0.554 (0.221)
B) Intracranial vessels
DPGAN ε ≈ 10^2          0.081 (0.013)  4.77 (1.081)     0.077 (0.015)  4.595 (0.878)
DPGAN ε ≈ 10^3          0.586 (0.045)  3.141 (0.514)    0.569 (0.045)  2.698 (0.604)
DPGAN ε ≈ 10^6          0.604 (0.048)  2.201 (0.413)    0.572 (0.048)  2.001 (0.445)
Real                    0.901 (0.019)  0.294 (0.077)    0.88 (0.024)   0.507 (0.126)

Fig. B.2. Segmentation error map of an example patient each from the PEGASUS test set and the 1000Plus test set for all vessels and for intracranial vessels. Top to bottom: maps
from the 3D U-Net model trained on A. DPGAN ε ≈ 10^2, B. DPGAN ε ≈ 10^3, C. DPGAN ε ≈ 10^6, D. real data. True positives are shown in red, false positives are in green and false
negatives in yellow. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
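The core of the DP-SGD update used for the critic can be sketched in a few lines. This is a minimal illustration of the general mechanism (per-sample gradient clipping followed by Gaussian noise), not the Opacus API and not the study's implementation. The function names and the list-based gradient representation are assumptions, while the maximum gradient norm of 1, the noise multipliers and the choice of δ as the inverse of the number of training samples follow the description in the text.

```python
import math
import random
from typing import List, Sequence


def clip_gradient(grad: Sequence[float], max_norm: float) -> List[float]:
    """Scale a per-sample gradient so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gradient(per_sample_grads: Sequence[Sequence[float]],
                    max_norm: float,
                    noise_multiplier: float,
                    rng: random.Random) -> List[float]:
    """Clip each sample's gradient, sum, add Gaussian noise, then average."""
    clipped = [clip_gradient(g, max_norm) for g in per_sample_grads]
    dim = len(clipped[0])
    summed = [sum(g[k] for g in clipped) for k in range(dim)]
    sigma = noise_multiplier * max_norm  # noise scale relative to the clipping bound
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]


def delta_from_dataset_size(num_training_samples: int) -> float:
    """Delta set to the inverse of the number of training samples."""
    return 1.0 / num_training_samples


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.0, 0.5]]           # toy per-sample gradients
    for noise_multiplier in (0.1, 0.3, 0.5):   # values reported in the text
        print(noise_multiplier, dp_sgd_gradient(grads, 1.0, noise_multiplier, rng))
```

A larger noise multiplier adds more noise relative to the clipping bound, which lowers ε (stronger privacy) and, as the results below indicate, degrades the quality of the generated data.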
The generated patch-label pairs and the 3D vessel structure
synthesized with different ε values are shown in Fig. B.1 along
with the real patch-label pairs for a qualitative comparison. With
decreasing ε values, the quality of the generated data decreases. In
other words, higher privacy guarantees come with lower quality. This
is also supported quantitatively by the lower segmentation performance
of the U-Nets trained with generated data from DPGANs with lower ε,
and vice versa. Table B.1 shows the test segmentation performance
on the 2 datasets for U-Nets trained with generated
patch-label pairs from DPGANs with different ε values for A) all
vessels and B) intracranial vessels only. Synthetic data from the
DPGAN with ε ≈ 10^6 has the best performance in terms of DSC in the
case of all vessels (mean DSC 0.581) and in terms of both DSC and
bAVD in the case of intracranial vessels (mean DSC 0.604; mean bAVD
2.201). For all vessels, the bAVD of the U-Net trained
with synthetic data from the DPGAN with ε ≈ 10^2 is unexpectedly
lower (mean bAVD 4.509) than that of the U-Net trained with
ε ≈ 10^6 (mean bAVD 5.05). This is because the bAVD metric penalizes
false positives more than false negatives. This explanation is
corroborated in Fig. B.2, which visualizes the segmentation error
maps of two example patients, one from each of the two datasets, for
all vessels and intracranial vessels. Fig. B.2 A (PEGASUS, all
vessels) shows the segmentation error map from the U-Net trained on
synthetic data with the highest privacy guarantee of ε ≈ 10^2. The
network misses almost all the vessels, and yet the bAVD is lower than
the bAVD of the U-Net trained on data from DPGAN ε ≈ 10^6 (Fig. B.2 C,
PEGASUS, all vessels), which has far fewer false negatives but
relatively more false positives owing to vessels from the neck and
face area. This is further confirmed when these vessels are removed
for analysis by skull-stripping the labels as a post-processing step.
Then, the bAVD of the U-Net trained with DPGAN ε ≈ 10^6 (mean bAVD
2.201) improves much more than that of the U-Net trained with DPGAN
ε ≈ 10^2 (mean bAVD 4.77).
Our results for the 3D DPGAN show that the generated data
with the largest ε ≈ 10^6 yielded the best performance
(mean DSC 0.604). While this model provided an upper bound on
privacy leakage, it should be noted that ε ≈ 10^6 is a very large
value and the resulting privacy bounds are thus too loose. Moreover,
the performance of our DPGAN with ε ≈ 10^6 is quite low compared to
the performance of our generated data without any privacy guarantees
(mean DSC 0.841). Therefore, we conclude that finding the right
balance between privacy and utility remains a challenge for
differential privacy, even in a very simple 3D GAN architecture.
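To give a sense of why the reported ε values correspond to loose bounds, recall the standard definition of (ε, δ)-DP (Dwork and Roth, 2014): for neighbouring datasets D and D' and any set of outputs S, a mechanism M satisfies Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D') ∈ S] + δ. The short sketch below evaluates the multiplicative factor e^ε on a log10 scale for the three ε values used here; the function name is hypothetical, and the logarithm is used because e^ε overflows floating-point arithmetic for large ε.

```python
import math


def log10_privacy_factor(epsilon: float) -> float:
    """Return log10(e**epsilon), the order of magnitude of the DP bound factor."""
    return epsilon / math.log(10.0)


if __name__ == "__main__":
    for epsilon in (1e2, 1e3, 1e6):  # approximate epsilon values from the text
        print(f"epsilon={epsilon:g}: e^epsilon is about 10^{log10_privacy_factor(epsilon):.1f}")
```

Even for ε ≈ 10^2, the factor e^ε exceeds 10^43, so the formal guarantee places almost no constraint on how much the output distribution may change when one training sample is altered.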
Supplementary material
Supplementary material associated with this article can be
found, in the online version, at doi: 10.1016/j.media.2022.102396.
References
Arjovsky, M., Chintala, S., Bottou, L., 2017. Wasserstein GAN. arXiv:1701.07875 [cs, stat].
Aydin, O.U., Taha, A.A., Hilbert, A., Khalil, A.A., Galinovic, I., Fiebach, J.B., Frey, D., Madai, V.I., 2021. An evaluation of performance measures for arterial brain vessel segmentation. Accepted for publication.
Aydin, O.U., Taha, A.A., Hilbert, A., Khalil, A.A., Galinovic, I., Fiebach, J.B., Frey, D., Madai, V.I., 2021. On the usage of average Hausdorff distance for segmentation performance assessment: hidden error when used for ranking. Eur. Radiol. Exp. 5 (1), 4. doi: 10.1186/s41747-020-00200-2.
Baur, C., Albarqouni, S., Navab, N., 2018. Generating highly realistic images of skin lesions with GANs. arXiv:1809.01410 [cs, eess].
Bermudez, C., Plassard, A.J., Davis, T.L., Newton, A.T., Resnick, S.M., Landman, B.A., 2018. Learning implicit brain MRI manifolds with deep learning. Proc SPIE Int. Soc. Opt. Eng. 10574. doi: 10.1117/12.2293515.
Chen, D., Yu, N., Zhang, Y., Fritz, M., 2020. GAN-leaks: a taxonomy of membership inference attacks against generative models. arXiv:1909.03935 [cs]. doi: 10.1145/3372297.3417238.
Chen, S., Ma, K., Zheng, Y., 2019. Med3D: transfer learning for 3D medical image analysis. arXiv:1904.00625 [cs].
Clinical Practice Committee, 2000. Informed consent for medical photographs. Dysmorphology Subcommittee of the Clinical Practice Committee, American College of Medical Genetics. Genet. Med. 2 (6), 353–355. doi: 10.1097/00125817-200011000-00010.
Dwork, C., Roth, A., 2014. The algorithmic foundations of differential privacy. Foundations Trends Theor. Comput. Sci. 9 (3–4), 211–407. doi: 10.1561/0400000042.
Eklund, A., 2020. Feeding the zombies: synthesizing brain volumes using a 3D progressive growing GAN. arXiv:1912.05357 [cs, eess].
Foroozandeh, M., Eklund, A., 2020. Synthesizing brain tumor images and annotations by combining progressive growing GAN and SPADE. arXiv:2009.05946 [cs] version: 1.
Frid-Adar, M., Diamant, I., Klang, E., Amitai, M., Goldberger, J., Greenspan, H., 2018. GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing 321. doi: 10.1016/j.neucom.2018.09.013.
Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y., 2014. Generative adversarial networks. arXiv:1406.2661 [cs, stat].
Greenspan, H., van Ginneken, B., Summers, R.M., 2016. Guest editorial deep learning in medical imaging: overview and future promise of an exciting new technique. IEEE Trans. Med. Imaging 35 (5), 1153–1159. doi: 10.1109/TMI.2016.2553401.
Guibas, J.T., Virdi, T.S., Li, P.S., 2018. Synthetic medical images from dual generative adversarial networks. arXiv:1709.01872 [cs] version: 3.
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A., 2017. Improved training of Wasserstein GANs. arXiv:1704.00028 [cs, stat].
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S., 2018. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. arXiv:1706.08500 [cs, stat].
Hilbert, A., Madai, V.I., Akay, E.M., Aydin, O.U., Behland, J., Sobesky, J., Galinovic, I., Khalil, A.A., Taha, A.A., Würfel, J., Dusek, P., Niendorf, T., Fiebach, J.B., Frey, D., Livne, M., 2020. BRAVE-NET: fully automated arterial brain vessel segmentation in patients with cerebrovascular disease. Neurology. doi: 10.1101/2020.04.08.20057570. Preprint.
Hotter, B., Pittl, S., Ebinger, M., Oepen, G., Jegzentis, K., Kudo, K., Rozanski, M., Schmidt, W., Brunecker, P., Xu, C., Martus, P., Endres, M., Jungehülsing, G., Villringer, A., Fiebach, J., 2009. Prospective study on the mismatch concept in acute stroke patients within the first 24 h after symptom onset - 1000Plus study. BMC Neurol. 9, 60. doi: 10.1186/1471-2377-9-60.
Karnewar, A., Wang, O., 2020. MSG-GAN: multi-scale gradients for generative adversarial networks. arXiv:1903.06048 [cs, stat].
Karras, T., Aila, T., Laine, S., Lehtinen, J., 2018. Progressive growing of GANs for improved quality, stability, and variation. arXiv:1710.10196 [cs, stat].
Kingma, D., Ba, J., 2014. Adam: a method for stochastic optimization. In: International Conference on Learning Representations.
Kossen, T., Subramaniam, P., Madai, V.I., Hennemuth, A., Hildebrand, K., Hilbert, A., Sobesky, J., Livne, M., Galinovic, I., Khalil, A.A., Fiebach, J.B., Frey, D., 2021. Synthesizing anonymized and labeled TOF-MRA patches for brain vessel segmentation using generative adversarial networks. Comput. Biol. Med. 131, 104254. doi: 10.1016/j.compbiomed.2021.104254.
Kwon, G., Han, C., Kim, D., 2019. Generation of 3D brain MRI using auto-encoding generative adversarial networks. MICCAI. doi: 10.1007/978-3-030-32248-9_14.
Livne, M., Rieger, J., Aydin, O.U., Taha, A.A., Akay, E.M., Kossen, T., Sobesky, J., Kelleher, J.D., Hildebrand, K., Frey, D., Madai, V.I., 2019. A U-Net deep learning framework for high performance vessel segmentation in patients with cerebrovascular disease. Front. Neurosci. 13. doi: 10.3389/fnins.2019.00097.
Lundervold, A.S., Lundervold, A., 2019. An overview of deep learning in medical imaging focusing on MRI. Zeitschrift für Medizinische Physik 29 (2), 102–127. doi: 10.1016/j.zemedi.2018.11.002.
Masoudi, S., Harmon, S.A.A., Mehralivand, S., Walker, S.M., Raviprakash, H., Bagci, U., Choyke, P.L., Turkbey, B., 2021. Quick guide on radiology image pre-processing for deep learning applications in prostate cancer research. J. Med. Imaging 8 (1), 010901. doi: 10.1117/1.JMI.8.1.010901.
Micikevicius, P., Narang, S., Alben, J., Diamos, G., Elsen, E., Garcia, D., Ginsburg, B., Houston, M., Kuchaiev, O., Venkatesh, G., Wu, H., 2018. Mixed precision training. arXiv:1710.03740 [cs, stat].
Mironov, I., 2017. Renyi differential privacy. In: 2017 IEEE 30th Computer Security Foundations Symposium (CSF), pp. 263–275. doi: 10.1109/CSF.2017.11.
Miyato, T., Kataoka, T., Koyama, M., Yoshida, Y., 2018. Spectral normalization for generative adversarial networks. arXiv:1802.05957 [cs, stat].
Mutke, M.A., Madai, V.I., von Samson-Himmelstjerna, F.C., Zaro Weber, O., Revankar, G.S., Martin, S.Z., Stengl, K.L., Bauer, M., Hetzer, S., Günther, M., Sobesky, J., 2014. Clinical evaluation of an arterial-spin-labeling product sequence in steno-occlusive disease of the brain. PLoS ONE 9 (2), e87143. doi: 10.1371/journal.pone.0087143.
Neff, T., Payer, C., Štern, D., Urschler, M., 2018. Generative adversarial networks to synthetically augment data for deep learning based image segmentation. In: Proceedings of the OAGM Workshop 2018. doi: 10.3217/978-3-85125-603-1-07.
Ng, D., Lan, X., Yao, M.M.-S., Chan, W.P., Feng, M., 2021. Federated learning: a collaborative effort to achieve better medical imaging models for individual sites that have small labelled datasets. Quant. Imaging Med. Surg. 11 (2), 852–857. doi: 10.21037/qims-20-595.
Sajjadi, M.S.M., Bachem, O., Lucic, M., Bousquet, O., Gelly, S., 2018. Assessing generative models via precision and recall. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. Curran Associates Inc., Red Hook, NY, USA, pp. 5234–5243.
Sheller, M.J., Edwards, B., Reina, G.A., Martin, J., Pati, S., Kotrotsou, A., Milchenko, M., Xu, W., Marcus, D., Colen, R.R., Bakas, S., 2020. Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data. Sci. Rep. 10 (1), 12598. doi: 10.1038/s41598-020-69250-1.
Shokri, R., Stronati, M., Song, C., Shmatikov, V., 2017. Membership inference attacks against machine learning models. arXiv:1610.05820 [cs, stat].
Sun, L., Chen, J., Xu, Y., Gong, M., Yu, K., Batmanghelich, K., 2021. Hierarchical amortized training for memory-efficient high resolution 3D GAN. arXiv:2008.01910 [cs, eess].
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z., 2016. Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). doi: 10.1109/CVPR.2016.308.
Taha, A.A., Hanbury, A., 2015. Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool. BMC Med. Imaging 15. doi: 10.1186/s12880-015-0068-x.
Torkzadehmahani, R., Kairouz, P., Paten, B., 2019. DP-CGAN: differentially private synthetic data and label generation. pp. 0–0. https://openaccess.thecvf.com/content_CVPRW_2019/html/CV-COPS/Torkzadehmahani_DP-CGAN_Differentially_Private_Synthetic_Data_and_Label_Generation_CVPRW_2019_paper.html.
Truex, S., Liu, L., Gursoy, M.E., Yu, L., Wei, W., 2019. Towards demystifying membership inference attacks. arXiv:1807.09173 [cs].
Valizadeh, S.A., Liem, F., Mérillat, S., Hänggi, J., Jäncke, L., 2018. Identification of individual subjects on the basis of their brain anatomical features. Sci. Rep. 8 (1), 5611. doi: 10.1038/s41598-018-23696-6.
Wachinger, C., Golland, P., Kremen, W., Fischl, B., Reuter, M., Alzheimer’s Disease Neuroimaging Initiative, 2015. BrainPrint: a discriminative characterization of brain morphology. Neuroimage 109, 232–248. doi: 10.1016/j.neuroimage.2015.01.032.
Willemink, M.J., Koszek, W.A., Hardell, C., Wu, J., Fleischmann, D., Harvey, H., Folio, L.R., Summers, R.M., Rubin, D.L., Lungren, M.P., 2020. Preparing medical imaging data for machine learning. Radiology 295 (1), 4–15. doi: 10.1148/radiol.2020192224.
Xie, L., Lin, K., Wang, S., Wang, F., Zhou, J., 2018. Differentially private generative adversarial network. arXiv:1802.06739 [cs, stat].
Yi, X., Walia, E., Babyn, P., 2019. Generative adversarial network in medical imaging: a review. Med. Image Anal. 58, 101552. doi: 10.1016/j.media.2019.101552.
Zhang, L., Shen, B., Barnawi, A., Xi, S., Kumar, N., Wu, Y., 2021. FedDPGAN: federated differentially private generative adversarial networks framework for the detection of COVID-19 pneumonia. Inf. Syst. Front. doi: 10.1007/s10796-021-10144-6.
6
Toward Sharing Brain Images:
Differentially Private TOF-MRA
Images With Segmentation Labels
Using Generative Adversarial Networks
6.1 Context Within Thesis
Synthetic images generated by GANs are not necessarily private. The images could still be
vulnerable to membership inference attacks leaking information about the training images.
Introducing differential privacy has been shown to reduce the vulnerability of GANs to these
attacks. By inserting carefully calibrated noise into the training of the discriminator, we can
put an upper bound on the individual privacy leakage.
The present work addresses the limitations of Chapters 4 and 5. Both studies successfully
synthesized realistic-looking TOF-MRA patches with their corresponding segmentation labels.
However, the generated images in Chapter 4 did not implement differential privacy, and the
privacy-preserving 3D images in Chapter 5 were not usable. The poor utility could be ascribed
to either low privacy guarantees or low image quality, both resulting from the high memory
demand of differential privacy, which did not allow for more complex networks.
The work in this chapter reduced the computational demand by generating 2D instead
of 3D image-label pairs. This allowed for the introduction of differential privacy in the
discriminator’s training while still maintaining sufficient complexity in the generator to
synthesize realistic images. We then explored the privacy-utility trade-off for the use-case of
brain vessel segmentation and identified an upper privacy bound for which the segmentation
became unstable and not usable anymore.
6.2 Journal Article
This chapter is based on the following publication that was published in Frontiers in Artificial
Intelligence:
T. Kossen, M. A. Hirzel, V. I. Madai, F. Boenisch, A. Hennemuth, K. Hildebrand,
S. Pokutta, K. Sharma, A. Hilbert, J. Sobesky, I. Galinovic, A. A. Khalil, J. B. Fiebach,
and D. Frey. “Toward Sharing Brain Images: Differentially Private TOF-MRA Images
With Segmentation Labels Using Generative Adversarial Networks”. In: Frontiers in
Artificial Intelligence 5 (2022). doi:10.3389/frai.2022.813842
The original journal article is reprinted with permission of Frontiers. The article is open access
under the CC BY license.
Author Contribution
The first author Tabea Kossen conceptualized the study and interpreted the results together
with VIM, FB, AH, KH and DF. She implemented the GAN architecture and evaluations.
Additionally, she was responsible for the project administration, wrote the first version of the
manuscript, created the figures and coordinated the journal submission process.
Code Availability
The code for this project is publicly available:
https://github.com/prediction2020/Labeled-TOF-MRA-with-DP.
ORIGINAL RESEARCH
published: 02 May 2022
doi: 10.3389/frai.2022.813842
Frontiers in Artificial Intelligence | www.frontiersin.org | May 2022 | Volume 5 | Article 813842
Edited by:
Naimul Khan,
Ryerson University, Canada
Reviewed by:
Alessandro Bria,
University of Cassino, Italy
Zeeshan Ahmad,
Ryerson University, Canada
*Correspondence:
Tabea Kossen
Specialty section:
This article was submitted to
Medicine and Public Health,
a section of the journal
Frontiers in Artificial Intelligence
Received: 12 November 2021
Accepted: 31 March 2022
Published: 02 May 2022
Citation:
Kossen T, Hirzel MA, Madai VI,
Boenisch F, Hennemuth A,
Hildebrand K, Pokutta S, Sharma K,
Hilbert A, Sobesky J, Galinovic I,
Khalil AA, Fiebach JB and Frey D
(2022) Toward Sharing Brain Images:
Differentially Private TOF-MRA Images
With Segmentation Labels Using
Generative Adversarial Networks.
Front. Artif. Intell. 5:813842.
doi: 10.3389/frai.2022.813842
Toward Sharing Brain Images:
Differentially Private TOF-MRA
Images With Segmentation Labels
Using Generative Adversarial
Networks
Tabea Kossen (1, 2, *), Manuel A. Hirzel (1), Vince I. Madai (1, 3, 4), Franziska Boenisch (5),
Anja Hennemuth (2, 6, 7), Kristian Hildebrand (8), Sebastian Pokutta (9, 10), Kartikey Sharma (9),
Adam Hilbert (1), Jan Sobesky (11, 12), Ivana Galinovic (12), Ahmed A. Khalil (12, 13, 14),
Jochen B. Fiebach (12) and Dietmar Frey (1)
(1) CLAIM-Charité Lab for AI in Medicine, Charité Universitätsmedizin Berlin, Berlin, Germany
(2) Department of Computer Engineering and Microelectronics, Computer Vision & Remote Sensing, Technical University Berlin, Berlin, Germany
(3) QUEST Center for Responsible Research, Berlin Institute of Health (BIH), Charité-Universitätsmedizin Berlin, Berlin, Germany
(4) Faculty of Computing, Engineering and the Built Environment, School of Computing and Digital Technology, Birmingham City University, Birmingham, United Kingdom
(5) Fraunhofer AISEC, Berlin, Germany
(6) Institute for Imaging Science and Computational Modelling in Cardiovascular Medicine, Charité Universitätsmedizin Berlin, Berlin, Germany
(7) Fraunhofer MEVIS, Bremen, Germany
(8) Department VI Computer Science and Media, Berlin University of Applied Sciences and Technology, Berlin, Germany
(9) Department for AI in Society, Science, and Technology, Zuse Institute Berlin, Berlin, Germany
(10) Institute of Mathematics, Technical University Berlin, Berlin, Germany
(11) Johanna-Etienne-Hospital, Neuss, Germany
(12) Centre for Stroke Research Berlin, Charité Universitätsmedizin Berlin, Berlin, Germany
(13) Department of Neurology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
(14) Mind, Brain, Body Institute, Berlin School of Mind and Brain, Humboldt-Universität Berlin, Berlin, Germany
Sharing labeled data is crucial to acquire large datasets for various Deep Learning
applications. In medical imaging, this is often not feasible due to privacy regulations.
Whereas anonymization would be a solution, standard techniques have been shown to
be partially reversible. Here, synthetic data using a Generative Adversarial Network (GAN)
with differential privacy guarantees could be a solution to ensure the patient’s privacy
while maintaining the predictive properties of the data. In this study, we implemented a
Wasserstein GAN (WGAN) with and without differential privacy guarantees to generate
privacy-preserving labeled Time-of-Flight Magnetic Resonance Angiography (TOF-MRA)
image patches for brain vessel segmentation. The synthesized image-label pairs were
used to train a U-net which was evaluated in terms of the segmentation performance
on real patient images from two different datasets. Additionally, the Fréchet Inception
Distance (FID) was calculated between the generated images and the real images to
assess their similarity. During the evaluation using the U-Net and the FID, we explored
the effect of different levels of privacy, which was represented by the parameter ε. With
stricter privacy guarantees, the segmentation performance and the similarity to the real
patient images in terms of FID decreased. Our best segmentation model, trained on
synthetic and private data, achieved a Dice Similarity Coefficient (DSC) of 0.75 for ε = 7.4
compared to 0.84 for ε = ∞ in a brain vessel segmentation paradigm (DSC of 0.69 and
0.88 on the second test set, respectively). We identified a threshold of ε < 5 for which the
Kossen et al. Labeled TOF-MRA With Differential Privacy
performance (DSC < 0.61) became unstable and not usable. Our synthesized labeled
TOF-MRA images with strict privacy guarantees retained predictive properties necessary
for segmenting the brain vessels. Although further research is warranted regarding
generalizability to other imaging modalities and performance improvement, our results
mark an encouraging first step for privacy-preserving data sharing in medical imaging.
Keywords: brain vessel segmentation, differential privacy, Generative Adversarial Networks, neuroimaging,
privacy preservation
1. INTRODUCTION
Deep Learning techniques are on the rise in many neuroimaging
applications (Lundervold and Lundervold, 2019; Zhu et al., 2019;
Hilbert et al., 2020). While showing great potential, they also
demand large amounts of data. In medical imaging, data is often
limited and medical experts are often needed to manually label
the images (Willemink et al., 2020). Thus, large datasets are
difficult to acquire. One potential solution would be data sharing.
For this, true anonymization, i.e., verifying that no identifying
information is leaked, is essential to preserve the patient’s privacy,
which poses a major challenge, especially for neuroimaging (Bannier
et al., 2021). For example, face-recognition software has recently
identified individuals on medical images (Schwarz et al., 2019)
and even face removal techniques can be partially reversed
(Abramian and Eklund, 2019). Besides that, the brain itself has
a unique structure and cortical foldings can be utilized to identify
individuals even in the developing stage (Duan et al., 2020).
Consequently, it is highly challenging to truly anonymize brain
scans without risking re-identification. A promising remedy is
the generation of synthetic data.
For this purpose, Generative Adversarial Networks (GANs)
have gained a lot of attention in the past years (Yi et al., 2019).
This also holds true for the neuroimaging domain. Here, GANs
have shown promising results for synthesized images for different
types of imaging (Bowles et al., 2018; Foroozandeh and Eklund,
2020; Kossen et al., 2021) as well as for other medical problems
such as segmentation (Cirillo et al., 2020). To ensure the privacy
of the training data, GANs can be combined with differential
privacy (Xie et al., 2018). Differential privacy is a mathematical
framework that provides an upper bound on individual privacy
leakage (Dwork, 2008). This way the maximum privacy leakage
for every individual in the training data can be quantified. There
are extensive studies about GANs with differential privacy for
synthesizing natural images and tabular medical data (Xie et al.,
2018; Torkzadehmahani et al., 2019; Xu et al., 2019; Yoon et al.,
2019, 2020). Recently, Cheng et al. (2021) conducted a comprehensive
study of synthetic images and classification fairness with
varying levels of privacy on various types of imaging data,
including 2D medical datasets such as chest x-rays and
melanoma images. A few other studies have also generated chest
x-rays with privacy guarantees (Nguyen et al., 2021; Zhang
et al., 2021). However, to date, no study has investigated whether
2D data synthesized by a GAN with differential privacy can
be utilized for a 3D medical application. Additionally, to the
best of our knowledge, GANs with differential privacy have
so far been used neither to synthesize labels for medical images
nor in the neuroimaging domain.
In this study, we utilized a Wasserstein GAN (WGAN) with
and without differential privacy guarantees to synthesize
anonymous, labeled 2D Time-of-Flight Magnetic
Resonance Angiography (TOF-MRA) image patches for
brain vessel segmentation. The generated labeled image patches
were evaluated in terms of the segmentation performance by
training a U-Net and in terms of image quality using the Fréchet
Inception Distance (FID). The trained U-Net was further tested
on a second dataset. Overall, we investigated the effect of different
levels of privacy. Additionally, we visualized generated images
with and without privacy together with the real patient images
using t-distributed stochastic neighbor embedding (t-SNE).
In summary, our contributions are:
1. To the best of our knowledge, we are the first to
synthesize images with differential privacy guarantees in the
neuroimaging domain.
2. We also generate the corresponding segmentation labels to
evaluate the image-label pairs in an end-to-end brain vessel
segmentation paradigm on 3D medical data for different levels
of privacy.
3. For evaluation, we compare the distances between the
generated data and both the training and test data
to investigate the similarity of the synthesized to the
original data.
4. We visualize our generated images with and without
differential privacy and the original data using t-SNE.
2. RELATED STUDY
For the synthesis of medical images, deep generative models
have demonstrated promising results. Among them, especially
GANs and variational autoencoders (VAE) have shown good
performance in tasks such as data augmentation (Bowles
et al., 2018), image-to-image translations (Isola et al., 2018),
or reconstruction (Tudosiu et al., 2020). For the purpose
of synthesizing privacy-preserving images, VAEs have two
disadvantages compared to GANs: first, they produce blurrier
images (Wang et al., 2020), and second, the training images are
directly fed into the network which makes them more vulnerable
to membership inference attacks (Chen et al., 2020).
Hence, in this context, GAN architectures with differential
privacy have been used in many previous studies to synthesize
non-medical images (Xie et al., 2018; Torkzadehmahani et al.,
2019; Xu et al., 2019) and medical tabular data (Yoon
et al., 2019, 2020). However, only a few studies have applied
GANs with differential privacy to medical images. Additionally,
these were restricted to chest x-rays (Cheng et al., 2021; Nguyen
et al., 2021; Zhang et al., 2021). In the neuroimaging
domain, GANs have so far been applied only without differential
privacy (Bowles et al., 2018; Foroozandeh and Eklund, 2020;
Kossen et al., 2021).
In the present study, we propose a GAN architecture with
differential privacy in the neuroimaging domain. Along with
our synthesized images, we generate the segmentation labels for
testing our differentially private patches in an end-to-end brain
vessel segmentation paradigm.
3. MATERIALS AND METHODS
3.1. Data
In total, 131 patients with cerebrovascular disease from the
PEGASUS study (N = 66) and the 1000Plus study (N = 65) were
utilized in this study. All patients gave their written informed
consent and the studies have been authorized by the ethical
review committee of Charité–Universitätsmedizin Berlin. More
details on both datasets can be found in Mutke et al. (2014) for the
PEGASUS study and Hotter et al. (2009) for the 1000Plus study.
The brain scans were conducted on a clinical 3T whole-
body system (Magnetom Trio, Siemens Healthcare, Erlangen,
Germany) utilizing a 12-channel receive radiofrequency coil
(Siemens Healthcare) for head imaging. For both studies the
parameters were: voxel size = 0.5 × 0.5 × 0.7 mm³; matrix size =
312 × 384 × 127; TR/TE = 22 ms/3.86 ms; acquisition time = 3:50
min; flip angle = 18°.
The PEGASUS dataset was split into a training (41 patients),
validation (11 patients), and test (14 patients) set. The training
set was utilized for training the GANs (refer to Figure 1),
whereas the validation and test set were utilized for the
parameter selection of the U-Net and assessing the generalizable
performance of the U-Net, respectively. Additionally, the 65
patients from the 1000Plus dataset were used as a second test set.
For each patient of the training set, 1,000 2D image patches and
corresponding segmentation masks of size 96 × 96 were extracted.
This patch size has been shown to be the most suitable patch
size for Wasserstein-based GAN architectures for this use case
(Kossen et al., 2021). Because background is strongly overrepresented
compared to brain vessels, 500 of these patches were extracted
such that they showed a vessel in the center. The remaining 500
patches were extracted randomly. It was verified that each patch
was selected at most once.
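The sampling scheme described above can be illustrated with a minimal sketch. It is not the code used in the study: it assumes a single 2D slice stored as nested lists, and the helper names `sample_patch_corners` and `crop` are hypothetical. Only the patch size of 96 and the 500/500 split are taken from the text.

```python
import random


def sample_patch_corners(vessel_mask, patch=96, n_vessel=500, n_random=500, seed=0):
    """Pick unique top-left corners of square patches in one 2D slice.

    vessel_mask is a list of rows with 1 for vessel pixels and 0 for background.
    Up to n_vessel patches are centered on a vessel pixel; up to n_random more
    are drawn uniformly from all remaining positions.
    """
    rng = random.Random(seed)
    height, width = len(vessel_mask), len(vessel_mask[0])
    half = patch // 2
    # Vessel-centered corners for which the whole patch fits inside the slice.
    vessel_corners = [
        (r - half, c - half)
        for r in range(half, height - patch + half + 1)
        for c in range(half, width - patch + half + 1)
        if vessel_mask[r][c]
    ]
    all_corners = [
        (r, c)
        for r in range(height - patch + 1)
        for c in range(width - patch + 1)
    ]
    # A set guarantees that no patch position is selected twice.
    chosen = set(rng.sample(vessel_corners, min(n_vessel, len(vessel_corners))))
    remaining = [corner for corner in all_corners if corner not in chosen]
    chosen.update(rng.sample(remaining, min(n_random, len(remaining))))
    return sorted(chosen)


def crop(image, corner, patch=96):
    """Cut the patch with the given top-left corner out of a 2D list image."""
    r, c = corner
    return [row[c:c + patch] for row in image[r:r + patch]]
```

The same corner would be used to crop both the image and its segmentation mask, so that every patch keeps its matching label.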
3.2. Differential Privacy
To account for the level of privacy of the generated data and
provide theoretical privacy guarantees, differential privacy was
implemented (Dwork, 2008). A randomized algorithm $f: d \to R$
satisfies (ε, δ)-differential privacy if, for any two databases
$d_1, d_2 \in d$ that differ from each other by a single sample, the
following holds:
$$\Pr[f(d_1) \in S] \le \exp(\epsilon) \cdot \Pr[f(d_2) \in S] + \delta \tag{1}$$
where $f(d_1)$ and $f(d_2)$ denote the outputs of $f$, Pr denotes
probability, and $S \subset R$. The parameter δ can be interpreted as the
probability with which the bound given by ε may be violated. With a
probability of 1 − δ, this equation is equivalent to:
$$\log \frac{\Pr[f(d_1) \in S]}{\Pr[f(d_2) \in S]} \le \epsilon. \tag{2}$$
Thus, differential privacy holds true if the algorithm’s outputs for
$d_1$ and $d_2$ are very similar to each other. In other words, one sample
should not have a big impact on the algorithm’s output. This way
the privacy of each possible datapoint is preserved. The maximal
deviation between the outputs is given by exp(ε). In this way, ε
can quantify the level of privacy, with small values of ε indicating
stricter privacy guarantees.
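As a small worked illustration of Equation (1), consider randomized response on a single bit, a textbook mechanism that is not used in this study. The bit is reported truthfully with probability `p_truth` and flipped otherwise, which yields ε = log(p_truth / (1 − p_truth)) with δ = 0. The function names below are hypothetical.

```python
import math


def randomized_response_epsilon(p_truth):
    """Smallest epsilon for which randomized response is epsilon-DP (delta = 0)."""
    if not 0.5 < p_truth < 1.0:
        raise ValueError("p_truth must lie strictly between 0.5 and 1")
    # Worst-case ratio of output probabilities for two inputs that differ.
    return math.log(p_truth / (1.0 - p_truth))


def satisfies_dp(prob_d1, prob_d2, epsilon, delta=0.0, tol=1e-12):
    """Check Equation (1) for one output set S, in both directions."""
    bound = math.exp(epsilon)
    return (prob_d1 <= bound * prob_d2 + delta + tol
            and prob_d2 <= bound * prob_d1 + delta + tol)


# Reporting the truth with probability 0.75 gives epsilon = log(3).
eps = randomized_response_epsilon(0.75)
print(round(eps, 4), satisfies_dp(0.75, 0.25, eps))
```

A smaller `p_truth` makes the two output distributions more alike and therefore lowers ε, which mirrors the statement that small ε corresponds to stricter privacy.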
Mironov (2017) proposed Rényi differential privacy, a natural
relaxation of differential privacy built upon Rényi divergence.
Rényi divergence of order α > 1 of two probability distributions
P and Q is defined as:
$$D_\alpha(P \,\|\, Q) := \frac{1}{\alpha - 1} \log \mathbb{E}_{x \sim Q} \left( \frac{P(x)}{Q(x)} \right)^{\alpha}, \tag{3}$$
where P(x) is the probability density of P at point x. A
randomized algorithm $f: d \to S$ is (α, ε)-Rényi differentially
private if, for any adjacent $d_1, d_2 \in d$, the Rényi divergence $D_\alpha$
is not larger than ε:
$$D_\alpha(f(d_1) \,\|\, f(d_2)) \le \epsilon. \tag{4}$$
The advantage of Rényi differential privacy is that it provides
a tight composition for Gaussian mechanisms while preserving
essential properties of differential privacy. This means that the
privacy budgets of composed mechanisms add up: the
composition of a mechanism satisfying (α, ε₁)-Rényi differential privacy
and a mechanism satisfying (α, ε₂)-Rényi differential privacy satisfies
(α, ε₁ + ε₂)-Rényi differential privacy. Moreover, (α, ε)-Rényi
differential privacy has been shown to provide a tighter bound
on the privacy budget of compositions compared to (ε, δ)-
differential privacy (Mironov, 2017). (α, ε)-Rényi differential
privacy can also be translated back into (ε, δ)-differential privacy.
Balle et al. (2019) have proven that (α, ε)-Rényi differential privacy
also satisfies (ε′, δ)-differential privacy for any 0 < δ < 1.
According to Balle et al. (2019), ε′ is then defined as:
$$\epsilon' = \epsilon + \log \frac{\alpha - 1}{\alpha} - \frac{\log \delta + \log \alpha}{\alpha - 1}. \tag{5}$$
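The additive composition and the conversion in Equation (5) can be sketched as follows. This is a simplified illustration under stated assumptions, not the accountant used in the study: it uses the Rényi bound α/(2σ²) of a plain Gaussian mechanism with sensitivity 1 (Mironov, 2017) and ignores the privacy amplification from subsampling, so its numbers will not match the ε values reported here. The function names and the grid of α values are hypothetical.

```python
import math


def rdp_to_dp(alpha, rdp_epsilon, delta):
    """Equation (5): translate (alpha, epsilon)-RDP into (epsilon', delta)-DP."""
    if alpha <= 1 or not 0 < delta < 1:
        raise ValueError("requires alpha > 1 and 0 < delta < 1")
    return (rdp_epsilon
            + math.log((alpha - 1) / alpha)
            - (math.log(delta) + math.log(alpha)) / (alpha - 1))


def gaussian_rdp(alpha, sigma):
    """RDP epsilon of one Gaussian mechanism with sensitivity 1 (no subsampling)."""
    return alpha / (2.0 * sigma ** 2)


def composed_epsilon(steps, sigma, delta, alphas=(1.5, 2, 3, 4, 8, 16, 32, 64)):
    """Add up the per-step RDP budgets, convert, and keep the best order alpha."""
    return min(
        rdp_to_dp(alpha, steps * gaussian_rdp(alpha, sigma), delta)
        for alpha in alphas
    )


# More training steps consume more privacy budget.
print(composed_epsilon(10, 2.0, 1 / 41_000) < composed_epsilon(100, 2.0, 1 / 41_000))
```

Because the bound holds for every order α, the accountant is free to report the smallest ε′ over a grid of orders, which is what the `min` expresses.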
FIGURE 1 | Study overview. Generative Adversarial Networks (GANs) with different levels of privacy guarantees are trained to synthesize labeled Time-of-Flight
Magnetic Resonance Angiography (TOF-MRA) patches. These are evaluated in a brain vessel segmentation paradigm and are compared to a segmentation network
trained on real patient image-label pairs. DP = Differential Privacy; DSC = Dice Similarity Coefficient.
The most data-sensitive part of training the proposed
GAN architecture is the gradient update of the discriminator
after training samples are presented. For that, the differentially
private stochastic gradient descent algorithm proposed by Abadi
et al. (2016) can be utilized. Here, differential privacy was
implemented by clipping these gradients and adding Gaussian
noise to avoid the memorization of single samples.
Rényi differential privacy was then used to analyze the privacy
guarantees. In the last step, (α, ε)-Rényi differential privacy
was translated back to (ε, δ)-differential privacy. The parameter
δ is typically chosen to be the inverse of the dataset size
(Torkzadehmahani et al., 2019). Thus, throughout this study, it
was set to 1/41,000 ≈ 2.44 × 10⁻⁵.
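The clip-and-noise step of differentially private stochastic gradient descent (Abadi et al., 2016) can be sketched in plain Python. This is an illustration only; the study relied on the opacus library rather than on code like this. Gradients are represented as flat lists, the function name is hypothetical, and the defaults (clipping norm 1, noise multiplier 0.65) are the values reported for the best private model.

```python
import math
import random


def dp_average_gradient(per_sample_grads, clip_norm=1.0, noise_multiplier=0.65, rng=None):
    """Clip each per-sample gradient, sum, add Gaussian noise, and average."""
    rng = rng or random.Random(0)
    dim = len(per_sample_grads[0])
    total = [0.0] * dim
    for grad in per_sample_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        # Scale the gradient down only if its L2 norm exceeds the clipping norm.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, g in enumerate(grad):
            total[i] += g * scale
    # The noise standard deviation is proportional to the clipping norm.
    sigma = noise_multiplier * clip_norm
    batch_size = len(per_sample_grads)
    return [(t + rng.gauss(0.0, sigma)) / batch_size for t in total]


grads = [[3.0, 4.0], [0.3, 0.4]]
print(dp_average_gradient(grads, noise_multiplier=0.0))  # clipping only, no noise
```

Clipping bounds the influence of any single sample on the update, and the noise then hides what remains of that influence, which is why a larger noise multiplier gives a smaller ε.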
3.3. Network Architecture
The GAN architecture was based on the WGAN by Arjovsky
et al. (2017) and extended by inserting different amounts of noise
into the gradients of the discriminator in the training process
for differential privacy. Two neural networks were trained: the
generator G and the discriminator D. The generator synthesized
data samples that were then assessed with respect to their realness
by the critic or discriminator. The discriminator was fed both
real and synthesized data and assigned a critic score to each
sample. The score of the synthetic data $x_{gen}$ was used to train the
generator. For the generator, the overall training loss was:
$$\mathrm{loss}_G = -D(x_{gen}). \tag{6}$$
This way the generator aimed to maximize the realness of the
generated samples. In contrast to that, the discriminator intended
to minimize the scores for generated samples $x_{gen}$ and maximize
them for patient samples $x_{real}$:
$$\mathrm{loss}_D = D(x_{gen}) - D(x_{real}). \tag{7}$$
To enforce a Lipschitz constraint and, thus, put a bound on
the gradients, the discriminator’s weights were clipped after
each backpropagation step. This is a simple way to stabilize the
training (Arjovsky et al., 2017).
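Equations (6) and (7) and the weight clipping can be written out as a minimal sketch. It operates on plain lists of critic scores and weights instead of PyTorch tensors, and averaging over the batch is an assumption that the equations leave implicit. The function names are hypothetical; the clipping value of 0.01 is the one reported for the WGAN.

```python
def generator_loss(scores_gen):
    """Equation (6): negative mean critic score of the generated samples."""
    return -sum(scores_gen) / len(scores_gen)


def discriminator_loss(scores_gen, scores_real):
    """Equation (7): mean score on generated minus mean score on real samples."""
    return sum(scores_gen) / len(scores_gen) - sum(scores_real) / len(scores_real)


def clip_weights(weights, c=0.01):
    """Clamp every weight to [-c, c] to enforce the Lipschitz constraint."""
    return [max(-c, min(c, w)) for w in weights]


print(generator_loss([1.0, 3.0]))
print(discriminator_loss([1.0, 3.0], [4.0, 6.0]))
print(clip_weights([-0.5, 0.005, 0.2]))
```

A discriminator loss that becomes more negative means the critic separates real from generated samples more clearly, which in turn gives the generator a stronger signal through Equation (6).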
The architecture of the generator and discriminator is shown
in Figure 1. The generator took a noise vector of size 128 sampled
from a Gaussian distribution as input. This was then fed
through one linear layer and six upsampling convolutional layers as
shown in Figure 1. The generator output two 96 × 96 channels:
one for the image and one for the segmentation label. The
discriminator’s input likewise consisted of two channels: either the
real patient image-label pair or the generated one. These were then
fed through six downsampling convolutional layers as depicted in
Figure 1. The slope of the LeakyReLU activation was 0.2.
The GANs were implemented in PyTorch 1.8.1 using the
library opacus 0.14.0 for the differential privacy guarantees. Our
code was built upon the official GAN example by opacus (footnote 1) and
is publicly available (footnote 2). The learning rate for both discriminator
and generator was 0.00005 using the RMSprop optimizer.
The kernel size was 4 with strides of 2. In each epoch, the
discriminator was updated 5 times. The network was trained
for 50 epochs. To randomly sample the training images, the
UniformWithReplacementSampler from the opacus package was
used. The sampling rate was the batch size of 32 divided
by the number of samples (41,000). The clipping parameter
for the WGAN was set to 0.01 and the clipping parameter
for the differential privacy was 1. In total, 8 different GANs
were trained with varying values of ε (noise multiplier was
set to {∞, 2, 1.5, 1.2, 1, 0.8, 0.725, 0.65}). Each GAN trained with
additional noise was trained 5 times for robust results.
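The sampling rate stated above follows directly from the batch size and the dataset size. The sketch below also illustrates per-sample random selection as we understand the sampler to work, namely that every training sample enters a batch independently with probability equal to the sampling rate. That reading of the sampler is an assumption, and the function `poisson_batch` is a hypothetical stand-in, not the opacus implementation.

```python
import random

N_SAMPLES = 41_000   # 41 training patients with 1,000 patches each
BATCH_SIZE = 32
SAMPLE_RATE = BATCH_SIZE / N_SAMPLES


def poisson_batch(n_samples, sample_rate, rng):
    """Select each index independently with probability sample_rate."""
    return [i for i in range(n_samples) if rng.random() < sample_rate]


rng = random.Random(0)
batch = poisson_batch(N_SAMPLES, SAMPLE_RATE, rng)
print(f"sampling rate: {SAMPLE_RATE:.6f}")
print(f"batches per pass over the data: {1 / SAMPLE_RATE:.2f}")
print(f"size of one sampled batch: {len(batch)}")
```

With this kind of sampling the batch size is only 32 on average and varies from step to step, which is the setting assumed by the privacy analysis of subsampled Gaussian mechanisms.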
All hyperparameters mentioned in the last paragraph were the
result of a tuning process and all models were trained on a Tesla
V100. The training time of one GAN including evaluation took
∼1.4 days.
3.4. Performance Evaluation
Among the many metrics to evaluate synthetic data (Yi et al.,
2019), we selected three to estimate the quality of our synthesized
images. First, we evaluated our synthesized image-label pairs
by visual inspection, and second, using the downstream task of
segmentation as suggested by Yi et al. (2019). Additionally, we
compared the images using the FID as proposed in previous
studies (Haarburger et al., 2019; Coyner et al., 2022).
The generated image-label pairs were evaluated by a U-Net for
brain vessel segmentation adapted from Livne et al. (2019). After
training the GANs, 41,000 image-label pairs were generated.
These were used to train eight U-Nets with different hyperparameter
settings varying in learning rates, dropout, and classical data
augmentation. The best U-Net was then selected based on the
best Dice Similarity Coefficient (DSC) on the validation set that
included real patient images. The final performance was then
evaluated in terms of DSC and balanced average Hausdorff
distance (bAHD) on the test set. The DSC, which evaluates the
segmented voxels, is defined as:
$$\mathrm{DSC} = \frac{2TP}{2TP + FP + FN} \tag{8}$$
where TP are the true positives, FP are the false positives, and
FN are the false negatives. As the DSC quantifies the overlap of
the ground truth and prediction scaled by the total number of
voxels in ground truth and prediction, it is a robust performance
measure for imbalanced segmentations, i.e., images that contain more
background than segmented area. The bAHD is a newly proposed
metric for evaluating segmentations (Aydin et al., 2021):
$$\mathrm{bAHD} = \left( \frac{1}{N_G} \sum_{g \in G} \min_{s \in S} d(g, s) + \frac{1}{N_G} \sum_{s \in S} \min_{g \in G} d(s, g) \right) / 2 \tag{9}$$
where $N_G$ is the number of ground truth voxels, G is the set of
voxels belonging to the ground truth, and S is the set of voxels
of the predicted segmentation. In other words, the bAHD is the
average of the directed Hausdorff distance from the ground truth
to the segmentation and the directed Hausdorff distance from the
segmentation to the ground truth, both scaled by the number of
ground truth voxels.
Footnote 1: https://github.com/pytorch/opacus/blob/master/examples/dcgan.py
Footnote 2: https://github.com/prediction2020/Labeled-TOF-MRA-with-DP
Additionally, the DSC and bAHD of the U-Net models were
assessed on the 1000Plus dataset. The GAN and U-Nets were
implemented in an end-to-end pipeline. To calculate both DSC
and bAHD, we used the EvaluateSegmentation tool by Taha and
Hanbury (2015).
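For illustration, the two segmentation metrics of Equations (8) and (9) can be computed by brute force on small examples. This sketch is not the EvaluateSegmentation tool: it represents each segmentation as a set of voxel coordinates, assumes that d is the Euclidean distance, and scales quadratically with the number of voxels. The function names are hypothetical.

```python
import math


def dice(gt, pred):
    """Equation (8) for two sets of voxel coordinates labeled as vessel."""
    tp = len(gt & pred)
    fp = len(pred - gt)
    fn = len(gt - pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def _sum_of_nearest_distances(source, target):
    """Sum over source voxels of the distance to the closest target voxel."""
    return sum(min(math.dist(p, q) for q in target) for p in source)


def balanced_ahd(gt, pred):
    """Equation (9): both directed sums are scaled by the ground truth size."""
    n_g = len(gt)
    forward = _sum_of_nearest_distances(gt, pred) / n_g
    backward = _sum_of_nearest_distances(pred, gt) / n_g
    return (forward + backward) / 2


gt = {(0, 0), (0, 1)}
pred = {(0, 1), (0, 2)}
print(dice(gt, pred), balanced_ahd(gt, pred))
```

Scaling both directions by the ground truth size, rather than by the size of the respective source set, is what distinguishes the balanced variant from the ordinary average Hausdorff distance.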
As an additional metric, the image quality was measured by
the FID (Heusel et al., 2018). The FID is a distance that measures
the similarity between images by comparing the activations of a
pre-trained Inception-v3 network. Here, the difference between
the activations in the pool3 layer for the generated images and
for the real images is measured:
$$\mathrm{FID} = \lVert \mu_{real} - \mu_{gen} \rVert_2^2 + \mathrm{Tr}\left( \sigma_{real} + \sigma_{gen} - 2 (\sigma_{real} \sigma_{gen})^{1/2} \right) \tag{10}$$
with $N(\mu_{real}, \sigma_{real})$ and $N(\mu_{gen}, \sigma_{gen})$ as the distributions
of the features of the pool3 layer of real and synthesized
data, respectively.
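The structure of Equation (10) can be made concrete with a deliberately simplified sketch. The real FID uses full covariance matrices of the pool3 features and a matrix square root. The version below assumes diagonal covariances, in which case the trace term reduces to a sum over per-feature variances. It is meant to show how the mean term and the covariance term combine, not to reproduce the FID values reported in this study, and the function names are hypothetical.

```python
import math


def feature_stats(features):
    """Per-dimension mean and unbiased variance of a list of feature vectors."""
    n = len(features)
    dim = len(features[0])
    mean = [sum(f[i] for f in features) / n for i in range(dim)]
    var = [sum((f[i] - mean[i]) ** 2 for f in features) / (n - 1) for i in range(dim)]
    return mean, var


def fid_diagonal(mean_real, var_real, mean_gen, var_gen):
    """Equation (10) under the assumption of diagonal covariance matrices."""
    mean_term = sum((a - b) ** 2 for a, b in zip(mean_real, mean_gen))
    # For diagonal matrices, (sigma_real * sigma_gen)^(1/2) is elementwise.
    trace_term = sum(
        vr + vg - 2.0 * math.sqrt(vr * vg) for vr, vg in zip(var_real, var_gen)
    )
    return mean_term + trace_term


print(fid_diagonal([0.0], [1.0], [3.0], [4.0]))
```

Identical statistics give a distance of zero, and the distance grows both when the feature means drift apart and when the spread of the features differs.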
To explore to which degree the generated images reproduced
the training set, the FID between the synthetic data and both the
training and test data was calculated and compared for different
levels of privacy.
Finally, we measured the similarity between the images
synthesized by the GANs to check whether a model suffered
from mode collapse. For each model, we generated 1,000 images
and calculated the Structural Similarity Index Measure (SSIM)
between them and averaged the values. We repeated this analysis
for all 5 runs for each ε value, for the model with ε = ∞, and for the
real images. The SSIM between two images x and y is defined as a
product of luminance, contrast, and structure according to Wang
et al. (2004):
$$\mathrm{SSIM}(x, y) = \frac{(2 \mu_x \mu_y + c_1)(2 \sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}, \tag{11}$$
where $\mu_x$ is the average of x, $\sigma_x^2$ is the variance of x, and $\sigma_{xy}$ is the
covariance of x and y. $c_1 = (k_1 L)^2$ and $c_2 = (k_2 L)^2$ are for
stabilization, with L being the dynamic range of the pixel values
and $k_1 \ll 1$ and $k_2 \ll 1$ small constants.
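A minimal sketch of Equation (11) and of the averaging used as a mode collapse check is given below. It makes several assumptions: the statistics are computed over the whole image rather than over local windows as common SSIM implementations do, images are flattened lists with values in [0, 1], the constants k1 = 0.01 and k2 = 0.03 are the defaults suggested by Wang et al. (2004), and the mean is taken over all image pairs. The function names are hypothetical.

```python
from itertools import combinations


def ssim_global(x, y, dynamic_range=1.0, k1=0.01, k2=0.03):
    """Equation (11) with means, variances, and covariance over the whole image."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def mean_pairwise_ssim(images, **kwargs):
    """Average SSIM over all pairs; values near 1 hint at mode collapse."""
    pairs = list(combinations(images, 2))
    return sum(ssim_global(a, b, **kwargs) for a, b in pairs) / len(pairs)


patch = [0.0, 0.5, 1.0, 0.25]
inverted = [1.0 - v for v in patch]
print(ssim_global(patch, patch), ssim_global(patch, inverted))
```

A generator that always produces nearly the same image yields a mean pairwise SSIM close to 1, whereas a diverse set of images yields clearly lower values.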
3.5. Visualization Using t-SNE
Finally, the generated images with and without differential
privacy and the real patient images were visualized using a
t-SNE (Maaten and Hinton, 2008). t-SNE is an approach to
reducing dimensionality while preserving the structure of the
high dimensional data points. First, all data points are embedded
into an SNE, which computes the pairwise similarities utilizing
conditional probabilities. For points $x_i$ and $x_j$, the conditional
probability $p_{j|i}$ of $x_i$ choosing $x_j$ as its neighbor is defined as
$$p_{j|i} = \frac{\exp\left( -\lVert x_i - x_j \rVert^2 / 2\sigma_i^2 \right)}{\sum_{k \ne i} \exp\left( -\lVert x_i - x_k \rVert^2 / 2\sigma_i^2 \right)} \tag{12}$$
and the symmetrized similarity as:
$$p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N} \tag{13}$$
with N being the number of data points. Then the
algorithm aims to learn a lower dimensional representation
of the similarities. In order to get distinct clusters and
avoid overcrowding, a Student’s t distribution that reflects the
similarities $p_{j|i}$ is used (Maaten and Hinton, 2008):
$$q_{ij} = \frac{\left( 1 + \lVert y_i - y_j \rVert^2 \right)^{-1}}{\sum_{k \ne m} \left( 1 + \lVert y_k - y_m \rVert^2 \right)^{-1}} \tag{14}$$
Starting from random initialization, the locations of the points
$y_i$ in the lower dimensional space are shifted so that a cost
function is minimized using a gradient descent method.
Instead of the Kullback-Leibler divergence, we here chose the
Wasserstein metric due to its success in GAN applications
(Arjovsky et al., 2017).
FIGURE 2 | Synthetic TOF-MRA patches (top row) and corresponding segmentation labels (bottom row) with different values of ε compared to real patient data (first
column). A lower ε (i.e., more privacy) leads to more noisy images.
T-distributed stochastic neighbor embedding was
implemented using the sklearn package (Pedregosa et al.,
2011). The perplexity parameter reflecting the density of the
data distribution was chosen to be 30 which is in the suggested
range by Maaten and Hinton (2008). The images of the best
performing GAN with and without differential privacy, as
well as the real images, were projected onto two dimensions for
visualization purposes.
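The similarity definitions in Equations (12) to (14) can be sketched in a few lines. This is not the sklearn implementation used in the study: it assumes one fixed bandwidth σ for all points, whereas t-SNE tunes each σ_i to match the chosen perplexity, and it omits the gradient descent that moves the embedded points. N is taken to be the number of data points, following Maaten and Hinton (2008). The function names are hypothetical.

```python
import math


def conditional_p(points, sigma=1.0):
    """Equation (12): entry [i][j] holds p_{j|i}, with a shared bandwidth sigma."""
    n = len(points)
    rows = []
    for i in range(n):
        weights = [
            0.0 if k == i
            else math.exp(-math.dist(points[i], points[k]) ** 2 / (2 * sigma ** 2))
            for k in range(n)
        ]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def symmetrize(p_cond):
    """Equation (13): joint similarities that sum to 1 over all pairs."""
    n = len(p_cond)
    return [[(p_cond[i][j] + p_cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]


def student_t_q(embedding):
    """Equation (14): heavy-tailed similarities in the low-dimensional space."""
    n = len(embedding)
    weights = [
        [0.0 if i == j else 1.0 / (1.0 + math.dist(embedding[i], embedding[j]) ** 2)
         for j in range(n)]
        for i in range(n)
    ]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
p = symmetrize(conditional_p(points))
q = student_t_q(points)
print(round(sum(map(sum, p)), 6), round(sum(map(sum, q)), 6))
```

The optimization then adjusts the low-dimensional points until the q values resemble the p values under the chosen cost function.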
4. RESULTS
Visually, the synthetic image-label pairs appeared noisier
with decreasing ε, i.e., with stricter privacy guarantees
(Figure 2). Differentially private images with ε = 1.3 show
almost only noise. The visual results corresponded to the
segmentation performance when training a U-Net on the
generated image-label pairs with different values of ε (Figure 3).
In Figure 3A, the averaged DSC over U-Net models that
were trained on synthetic data from five different GANs for
each ε is plotted. With decreasing ε, the DSC decreased and
became more unstable, i.e., it varied more between the different
models for the same ε. In particular, models with ε > 5
showed increased stability compared to models with lower
ε. When considering only the best run of the five models
(Figures 3B,C), the performance again dropped for decreasing
ε. This was reflected by a lower DSC and a higher bAHD.
The corresponding segmentation error maps are shown
in Figure 4.
When testing the best U-Net models on the 1000Plus dataset,
a similar trade-off between privacy and utility can be seen
(Figure 5). Here, the U-Net performance in terms of DSC
decreased more rapidly in comparison to the performance on
the PEGASUS dataset, starting at ε = 8 with DSC ≈ 0.69
(Figure 5A). The bAHD showed instability in performance for
ε < 3 (Figure 5B).
The FID between the training data and the generated data
overall showed a similar trend: less privacy led to a smaller
distance to the training data (Figure 6A). The data generated
by the model trained without differential privacy (ε = ∞) showed an FID
of 62 compared to an FID of 244 and 228 for the images with
ε = 5.7 and ε = 10.2, respectively. The distance to the
test data was similar for different ε values. Figure 6B shows
the difference between the distances to the training images and
test images for different values of ε. Here, the differences
increased for higher ε values, with ε = ∞ showing the largest
difference, at least twice as large as for all models trained
with privacy guarantees.
Evaluating GAN models during training, we found the
best performing image-label pairs when training with a noise
multiplier of 0.65 for 29 epochs. This resulted in ε = 7.4. The
U-Net trained on these synthetic image-labels showed a DSC of
0.75 on the test set (Table 1). The segmentation of an example
patient is shown in Figure 7. The big vessels are segmented
reasonably well while a lot of errors occur when smaller vessels
are segmented.
The similarity between the images is shown in Figure 8. For
ε < 2, high SSIM values were observed (SSIM > 0.98). In
contrast, higher ε values led to less similar images produced by
one model.
Figure 9 shows the t-SNE embedding of the best performing
GAN with and without differential privacy and the real patient
images. The synthetic images without privacy guarantees are
overall close to the real images. The images with differential
privacy cluster at the edges far away from the real images.
FIGURE 3 | Test segmentation performance of U-Nets trained on generated data with different values of ε (PEGASUS dataset). (A) shows a boxplot of the DSC
over 5 runs for each value of ε. In (B), only the run with the best DSC is shown. (C) shows the balanced average Hausdorff distance (bAHD) in voxels for the best run
for each ε. The error bar depicts the SD between patients. For ε < 5, the performance becomes unstable and worse compared to higher ε values.
FIGURE 4 | Error maps of one example test patient for U-Nets trained on either real image-label pairs or generated image-label pairs with different values of ε. True
positives are shown in red, false positives in green, and false negatives in yellow. For lower ε, more errors occur.
5. DISCUSSION
In the present study, we generated differentially private TOF-MRA
images with corresponding labels and explored the trade-off
between privacy and utility on two different test sets. We
proposed different evaluation schemes including training a
segmentation network and identified a threshold of ε < 5
with DSC < 0.61 for which the segmentation performance
became unstable and not usable. Our best segmentation model
trained on synthetic and private data achieved a DSC of 0.75 for
ε = 7.4 in a brain vessel segmentation paradigm. Our results
mark the first step in data sharing with privacy guarantees for
neuroimaging problems.
FIGURE 5 | Segmentation performance in terms of (A) DSC and (B) bAHD in voxels of the best performing model for each ε evaluated on a second dataset
(1000Plus). The DSC shows a decreasing performance starting for ε < 8.
FIGURE 6 | Comparison of Fréchet Inception Distance (FID) between the synthetic images with different ε values and both the real training data (light green squares
and light blue dotted line) and the real test data (dark green triangles and dark blue dashed line). (A) shows the absolute values for the 5 runs per ε whereas (B) shows
the difference between the distances from synthetic to training and synthetic to test. The higher the value of ε, the closer the images are to the training set. The
distance to the test set remains stable for different ε values. The difference shown in (B) is the highest for the model trained without differential privacy.
TABLE 1 | Overview of segmentation performances in terms of DSC and bAHD for a U-Net trained on real patient images and on generated images with and without
differential privacy. For each metric and dataset, the best of the three U-Net models is marked with an asterisk (*). The best U-Net with differential privacy guarantees has an ε of 7.4.
SD stands for standard deviation.
                              PEGASUS                            1000Plus
U-Net trained on              Mean DSC (SD)    Mean bAHD (SD)    Mean DSC (SD)    Mean bAHD (SD)
Real images                   0.89 (0.02)*     0.33 (0.11)*      0.90 (0.02)*     0.69 (0.47)
Generated images (ε = ∞)      0.84 (0.02)      0.61 (0.12)       0.88 (0.02)      0.58 (0.32)*
Generated images (ε = 7.4)    0.75 (0.04)      2.49 (1.96)       0.69 (0.04)      2.87 (1.25)
FIGURE 7 | Segmentation error maps of one test patient by the best U-Net model using differential privacy (ε = 7.4). Red indicates the true positives, green stands for
false positives, and yellow for false negatives. (A) shows a slice containing big vessels, (B) small ones, and (C) the whole vessel tree. The segmentation works
reasonably well with errors occurring particularly when segmenting small vessels.
FIGURE 8 | Mean Structural Similarity Index Measure (SSIM) between 1,000 generated images for different ε values. The error bar shows the standard deviation
over the 5 different runs for each ε value. For ε < 2, the similarity between images is high, whereas it decreases for higher ε values.
Since differential privacy is based on introducing noise, a
decrease in utility is expected with the introduction of differential
privacy. Our results confirm this notion. For ε = ∞, we achieved
a DSC of 0.84, which is comparable to the literature (Kossen
et al., 2021). Stricter privacy constraints, indicated by a lower ε,
led to worse visual results as well as poorer segmentation results
(Figures 2–5). This also corresponds to findings in previous
studies on differential privacy (Xie et al., 2018; Xu et al., 2019;
Yoon et al., 2019). The increasing amount of noise might also
be the reason for the instability of the GAN training for lower
ε values, especially for ε < 5 (Figure 2A). A performance
drop could also be observed when testing the U-Nets trained
on differentially private image-label pairs on a second dataset
(Figure 5). In comparison to the first test set, the performance
drop occurred already for higher values of ε (ε < 8 compared
to ε < 5). Thus, models with weaker privacy guarantees showed
better generalizability. A reason for that might again be the
lower amount of noise and, therefore, fewer restrictions during
training. This is also in line with our findings in Figure 8. Here,
images generated from models with lower ε values (ε < 2)
showed more similarities between each other, thus indicating
more mode collapse compared to models with higher ε values.
This could be another reason for the performance drop for
models with stricter privacy guarantees.
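For orientation, the privacy budget ε discussed above can be read against the standard definition of (ε, δ)-differential privacy (see, e.g., Dwork, 2008). The following is a sketch in common textbook notation; the symbols M (the randomized training mechanism), D and D′ (neighboring datasets differing in one record), S (a set of possible outputs), and δ are assumptions of this note and are not defined in this section.

```latex
% (epsilon, delta)-differential privacy of a randomized mechanism M:
% for all neighboring datasets D, D' and all measurable output sets S
\Pr\left[\mathcal{M}(D) \in S\right]
  \;\le\; e^{\varepsilon}\,\Pr\left[\mathcal{M}(D') \in S\right] + \delta
```

Under this reading, a smaller ε forces the output distributions for D and D′ closer together, which generally requires more noise, while ε = ∞ places no constraint on the mechanism at all.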
FIGURE 9 | Visualization of real and generated images with and without differential privacy in a t-SNE embedding. Each point represents an image. The distribution of
real images and generated images without privacy almost entirely overlap. In contrast, the images with privacy guarantees are only partially overlapping and cluster at
the edges, distant from the real images. The embedding showing the specific image instead of a point can be found in Figure S1 in the supplementary material.
Images with larger ε values also showed greater similarity
in terms of FID to the training images than those with stricter
privacy guarantees. This indicates that more specific features of
the training set can be memorized for less noisy models. The
FID between test images and synthetic images (FIDtest) stayed
constant for different values of ε (Figure 6A). The difference
between the FIDtrain and FIDtest can be seen as a measure of
the degree to which the images overfit the training set. Even for
the model with our largest ε = 10.2, the difference between
FIDtrain and FIDtest was only half of the difference
for the model without any privacy constraints. This shows that
differential privacy substantially contributed to the prevention
of the memorization of the training set. Those findings are
also in line with the embedding shown in Figure 9 in which
the differentially private images are further away from the
training images compared to the images generated without any
privacy guarantees.
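The comparison of FIDtrain and FIDtest described above can be sketched as follows. This is a minimal illustration, not the evaluation code of the study: it assumes that feature vectors (for FID, Inception activations) have already been extracted, uses random toy features instead, and the function names `frechet_distance` and `overfitting_gap` as well as the sign convention (FIDtest minus FIDtrain) are assumptions of this sketch.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two feature sets (rows = samples)."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) is the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b (real and non-negative up to round-off).
    eigvals = np.linalg.eigvals(cov_a @ cov_b).real
    tr_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)

def overfitting_gap(synthetic, train, test):
    """Assumed convention: positive when synthetic data sit closer to train than to test."""
    return frechet_distance(synthetic, test) - frechet_distance(synthetic, train)

# Toy features standing in for Inception activations (hypothetical data).
rng = np.random.default_rng(0)
train = rng.normal(size=(200, 4))
test = rng.normal(size=(200, 4))
synthetic = train + 0.01 * rng.normal(size=train.shape)  # near-copies of train
print("gap:", overfitting_gap(synthetic, train, test))
```

A generator that memorizes its training set would be expected to produce a large gap in this sense, whereas a gap near zero suggests the synthetic images are no closer to the training data than to unseen data.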
Machine learning models including GANs are susceptible
to so-called membership inference attacks (Shokri et al., 2017;
Hayes et al., 2019; Chen et al., 2020). Here, an attack model is
trained to predict whether a sample was part of the training set. If
these attacks are successful, the privacy of the training samples is
jeopardized. Differential privacy has been shown to decrease the
model’s vulnerability to privacy attacks (Shokri et al., 2017; Hayes
et al., 2019). While there is no consensus about an exact value
of ε, studies such as Hayes et al. (2019) and Bagdasaryan and
Shmatikov (2019) consider a value of ε < 10 acceptable. In this
study, we were able to synthesize image-label pairs with single-digit
ε (i.e., ε = 7.4) that still show reasonable performance in
the segmentation task. Naturally, further research is necessary
to validate that our models would successfully defend against
membership inference attacks.
Whereas the segmentation performance in terms of DSC
showed a consistent trend, this was not always true for the
bAHD. Figure 3C shows overall comparable results to the DSC
performance with some fluctuations. These fluctuations can be
explained by selecting the best model based on the best validation
DSC and not bAHD. In Figure 5B, however, the segmentation
model for ε = 1.3 seemed to perform better compared to
models with ε = 1.9 and ε = 2.7. An explanation for
this might be the number of false positives and false negatives
in the segmentations. For ε = 1.3, barely any voxel was
identified as belonging to a vessel, which resulted in many false
negatives. For the other two models, there were many false
positives with a large distance to the ground truth. The bAHD
considers these models to be worse although none of the three
models show a good segmentation performance (see Figure S2
in the supplementary material). The characteristic of penalizing
especially false positives should be taken into consideration in
future studies when using the bAHD as a metric.
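The effect described above can be reproduced on a toy example. The sketch below assumes the bAHD definition of Aydin et al. (2021), in which both directed distance sums are divided by the number of ground-truth voxels; the masks, the brute-force distance computation, and the function names are illustrative assumptions and not the evaluation code of the study.

```python
import numpy as np

def dice(pred, gt):
    """Dice similarity coefficient of two boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def directed_sum(src, dst):
    """Sum over all src voxels of the distance to the nearest dst voxel."""
    a = np.argwhere(src).astype(float)
    b = np.argwhere(dst).astype(float)
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return dists.min(axis=1).sum()

def balanced_ahd(pred, gt):
    """Assumed bAHD: both directed sums normalized by the ground-truth size.
    Both masks must contain at least one voxel."""
    return 0.5 * (directed_sum(gt, pred) + directed_sum(pred, gt)) / gt.sum()

gt = np.zeros((20, 20), dtype=bool)
gt[5:15, 5] = True          # a 10-voxel "vessel"

under = np.zeros_like(gt)
under[5, 5] = True          # one hit, nine false negatives

over = under.copy()
over[5:15, 19] = True       # same hit plus ten distant false positives

for name, pred in [("under", under), ("over", over)]:
    print(name, "DSC:", dice(pred, gt), "bAHD:", balanced_ahd(pred, gt))
```

Both toy predictions have a low DSC, yet the one with distant false positives receives a markedly larger bAHD, which mirrors the ranking behavior discussed for the models with small ε.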
The main limitations of the present study are the
computational restrictions. Because of these, only 2D patches
were used. Additionally, more complex GAN architectures
consisting of multiple generators and/or discriminators such
as PrivGAN (Mukherjee et al., 2021) or PATE-GAN (Yoon
et al., 2019) could not be implemented. PrivGAN in particular
appears to be an interesting direction for future research
since it not only implements differential privacy but also
aims to reduce vulnerability toward membership inference
attacks directly.
6. CONCLUSION
In the present study, we synthesized differentially private
TOF-MRA images and segmentation labels using GANs
for a neuroimaging application. We proposed different
evaluation metrics including the performance of a trained
neural network for vessel segmentation. Even with privacy
constraints, we could train a segmentation model that
works reasonably well on real patient data. This is a crucial
step toward synthesizing medical imaging data that both
preserves predictive properties and privacy. Nonetheless,
further studies should be conducted to evaluate if our findings
generalize to other types of medical imaging data and to
further improve performance. Our synthetic data is available
upon request.
DATA AVAILABILITY STATEMENT
The data analyzed in this study is subject to the following
licenses/restrictions: The datasets used in this article are not
readily available because data protection laws prohibit sharing
the PEGASUS and 1000Plus datasets at the current time
point. Requests to access these datasets should be directed
ETHICS STATEMENT
The studies involving human participants were reviewed and
approved by the Ethics Committee of Charité University Medicine
Berlin and the Berlin State Ethics Board. The patients/participants
provided their written informed consent to participate in
this study.
AUTHOR CONTRIBUTIONS
TK, MH, VM, FB, KS, AHe, KH, SP, AHi, and DF: concept and
design. VM, JS, IG, AK, and JF: acquisition of data. TK, VM, FB,
AHe, KH, and DF: model design. TK: data analysis. TK, MH, VM,
FB, AHe, KH, and DF: data interpretation. TK, MH, VM, FB, KS,
AHe, KH, SP, AHi, JS, IG, AK, JF, and DF: manuscript drafting
and approval. All authors contributed to the article and approved
the submitted version.
FUNDING
This study has received funding from the European Commission
through a Horizon2020 grant (PRECISE4Q grant no. 777107,
coordinator: DF) and the German Federal Ministry of Education
and Research through a Go-Bio grant (PREDICTioN2020 grant
no. 031B0154, lead: DF).
ACKNOWLEDGMENTS
Computation has been performed on the HPC for the Research
cluster of the Berlin Institute of Health. We acknowledge support
from the German Research Foundation (DFG) and the Open
Access Publication Fund of Charité-Universitätsmedizin Berlin.
SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found
online at: https://www.frontiersin.org/articles/10.3389/frai.2022.813842/full#supplementary-material
REFERENCES
Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., et al.
(2016). “Deep learning with differential privacy,” in Proceedings of the 2016
ACM SIGSAC Conference on Computer and Communications Security, CCS ’16
(New York, NY: Association for Computing Machinery), 308–318.
Abramian, D., and Eklund, A. (2019). “Refacing: reconstructing anonymized facial
features using gans,” in 2019 IEEE 16th International Symposium on Biomedical
Imaging (ISBI 2019) (Venice: IEEE).
Arjovsky, M., Chintala, S., and Bottou, L. (2017). Wasserstein GAN.
arXiv:1701.07875 [cs, stat]. arXiv: 1701.07875.
Aydin, O. U., Taha, A. A., Hilbert, A., Khalil, A. A., Galinovic, I., Fiebach, J. B.,
et al. (2021). On the usage of average Hausdorff distance for segmentation
performance assessment: hidden error when used for ranking. Eur. Radiol. Exp.
5, 4. doi: 10.1186/s41747-020-00200-2
Bagdasaryan, E., and Shmatikov, V. (2019). Differential privacy has disparate
impact on model accuracy. CoRR, abs/1905.12101.
Balle, B., Barthe, G., Gaboardi, M., Hsu, J., and Sato, T. (2019). Hypothesis testing
interpretations and renyi differential privacy. arXiv:1905.09982 [cs, stat]. arXiv:
1905.09982.
Bannier, E., Barker, G., Borghesani, V., Broeckx, N., Clement, P., Emblem, K.
E., et al. (2021). The Open Brain Consent: Informing research participants
and obtaining consent to share brain imaging data. Hum. Brain Mapp. 42,
1945–1951. doi: 10.1002/hbm.25351
Bowles, C., Chen, L., Guerrero, R., Bentley, P., Gunn, R., Hammers, A., et al. (2018).
GAN Augmentation: augmenting training data using generative adversarial
networks. arXiv:1810.10863 [cs]. arXiv: 1810.10863.
Chen, D., Yu, N., Zhang, Y., and Fritz, M. (2020). “Gan-leaks: a taxonomy of
membership inference attacks against generative models,” in Proceedings of the
2020 ACM SIGSAC Conference on Computer and Communications Security,
CCS ’20 (New York, NY: Association for Computing Machinery), 343–362.
Cheng, V., Suriyakumar, V. M., Dullerud, N., Joshi, S., and Ghassemi, M. (2021).
“Can you fake it until you make it? impacts of differentially private synthetic
data on downstream classification fairness,” in Proceedings of the 2021 ACM
Conference on Fairness, Accountability, and Transparency, FAccT ’21 (New
York, NY: Association for Computing Machinery), 149–160.
Cirillo, M. D., Abramian, D., and Eklund, A. (2020). Vox2vox:
3d-gan for brain tumour segmentation. CoRR, abs/2003.13653.
doi: 10.1007/978-3-030-72084-1_25
Coyner, A. S., Chen, J. S., Chang, K., Singh, P., Ostmo, S., Chan, R. V. P.,
et al. (2022). Synthetic medical images for robust, privacy-preserving training
of artificial intelligence: application to retinopathy of prematurity diagnosis.
Ophthalmol. Sci. 2, 100126. doi: 10.1016/j.xops.2022.100126
Duan, D., Xia, S., Rekik, I., Wu, Z., Wang, L., Lin, W., et al. (2020). Individual
identification and individual variability analysis based on cortical folding
features in developing infant singletons and twins. Hum. Brain Mapp. 41,
1985–2003. doi: 10.1002/hbm.24924
Dwork, C. (2008). “Differential privacy: a survey of results,” in Theory and
Applications of Models of Computation, Lecture Notes in Computer Science, eds
M. Agrawal, D. Du, Z. Duan, and A. Li (Berlin; Heidelberg: Springer), 1–19.
Foroozandeh, M., and Eklund, A. (2020). Synthesizing brain tumor images
and annotations by combining progressive growing GAN and SPADE.
arXiv:2009.05946 [cs]. arXiv: 2009.05946.
Haarburger, C., Horst, N., Truhn, D., Broeckmann, M., Schrading, S., Kuhl,
C., et al. (2019). “Multiparametric magnetic resonance image synthesis
using generative adversarial networks,” in Eurographics Workshop on Visual
Computing for Biology and Medicine (The Eurographics Association Version
Number: 011-015), 5.
Hayes, J., Melis, L., Danezis, G., and Cristofaro, E. D. (2019). LOGAN: membership
inference attacks against generative models. Proc. Privacy Enhan. Technol. 2019,
133–152. doi: 10.2478/popets-2019-0008
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., and Hochreiter, S. (2018).
GANs trained by a two time-scale update rule converge to a local nash
equilibrium. arXiv:1706.08500 [cs, stat]. arXiv: 1706.08500.
Hilbert, A., Madai, V. I., Akay, E. M., Aydin, O. U., Behland, J., Sobesky, J., et al.
(2020). Brave-net: Fully automated arterial brain vessel segmentation
in patients with cerebrovascular disease. Front. Artif. Intell. 3, 78.
doi: 10.3389/frai.2020.552258
Hotter, B., Pittl, S., Ebinger, M., Oepen, G., Jegzentis, K., Kudo, K., et al. (2009).
Prospective study on the mismatch concept in acute stroke patients within
the first 24 h after symptom onset-1000Plus study. BMC Neurol. 9, 60.
doi: 10.1186/1471-2377-9-60
Isola, P., Zhu, J.-Y., Zhou, T., and Efros, A. A. (2018). Image-to-image
translation with conditional adversarial networks. arXiv:1611.07004 [cs].
doi: 10.1109/CVPR.2017.632
Kossen, T., Subramaniam, P., Madai, V. I., Hennemuth, A., Hildebrand, K.,
Hilbert, A., et al. (2021). Synthesizing anonymized and labeled TOF-MRA
patches for brain vessel segmentation using generative adversarial networks.
Comput. Biol. Med. 131, 104254. doi: 10.1016/j.compbiomed.2021.104254
Livne, M., Rieger, J., Aydin, O. U., Taha, A. A., Akay, E. M., Kossen, T.,
et al. (2019). A u-net deep learning framework for high performance vessel
segmentation in patients with cerebrovascular disease. Front. Neurosci. 13, 97.
doi: 10.3389/fnins.2019.00097
Lundervold, A. S., and Lundervold, A. (2019). An overview of deep learning
in medical imaging focusing on MRI. Zeitschrift für Medizinische Physik 29,
102–127. doi: 10.1016/j.zemedi.2018.11.002
Maaten, L. V. D., and Hinton, G. (2008). Visualizing data using t-SNE. J. Mach.
Learn. Res. 9, 2579–2605.
Mironov, I. (2017). “Renyi differential privacy,” in 2017 IEEE 30th Computer
Security Foundations Symposium (CSF) (Santa Barbara, CA: IEEE), 263–275.
Mukherjee, S., Xu, Y., Trivedi, A., Patowary, N., and Ferres, J. L. (2021). privGAN:
protecting GANs from membership inference attacks at low cost to utility. Proc.
Privacy Enhan. Technol. 2021, 142–163. doi: 10.2478/popets-2021-0041
Mutke, M. A., Madai, V. I., von Samson-Himmelstjerna, F. C., Zaro Weber, O.,
Revankar, G. S., Martin, S. Z., et al. (2014). Clinical evaluation of an arterial-
spin-labeling product sequence in steno-occlusive disease of the brain. PLoS
ONE 9, e87143. doi: 10.1371/journal.pone.0087143
Nguyen, D. C., Ding, M., Pathirana, P. N., Seneviratne, A., and Zomaya,
A. Y. (2021). Federated learning for COVID-19 detection with generative
adversarial networks in edge cloud computing. IEEE Internet Things J. 1–1.
doi: 10.1109/JIOT.2021.3120998
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O.,
et al. (2011). Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12,
2825–2830.
Schwarz, C. G., Kremers, W. K., Therneau, T. M., Sharp, R. R., Gunter, J.
L., Vemuri, P., et al. (2019). Identification of anonymous MRI research
participants with face-recognition software. N. Engl. J. Med. 381, 1684–1686.
doi: 10.1056/NEJMc1908881
Shokri, R., Stronati, M., Song, C., and Shmatikov, V. (2017). “Membership
inference attacks against machine learning models,” in 2017 IEEE Symposium
on Security and Privacy (SP) (San Jose, CA: IEEE), 3–18.
Taha, A. A., and Hanbury, A. (2015). Metrics for evaluating 3D medical
image segmentation: analysis, selection, and tool. BMC Med. Imaging 15, 29.
doi: 10.1186/s12880-015-0068-x
Torkzadehmahani, R., Kairouz, P., and Paten, B. (2019). “DP-CGAN: differentially
private synthetic data and label generation,” in 2019 IEEE/CVF Conference on
Computer Vision and Pattern Recognition Workshops (CVPRW) (Long Beach,
CA: IEEE), 98–104.
Tudosiu, P.-D., Varsavsky, T., Shaw, R., Graham, M., Nachev, P., Ourselin, S.,
et al. (2020). Neuromorphologicaly-preserving volumetric data encoding using
VQ-VAE. arXiv:2002.05692 [cs, eess, q-bio]. arXiv: 2002.05692.
Wang, L., Chen, W., Yang, W., Bi, F., and Yu, F. R. (2020). A State-of-the-Art
review on image synthesis with generative adversarial networks. IEEE Access
8, 63514–63537. doi: 10.1109/ACCESS.2020.2982224
Wang, Z., Bovik, A., Sheikh, H., and Simoncelli, E. (2004). Image quality
assessment: from error visibility to structural similarity. IEEE Trans. Image
Process. 13, 600–612. doi: 10.1109/TIP.2003.819861
Willemink, M. J., Koszek, W. A., Hardell, C., Wu, J., Fleischmann, D., Harvey, H.,
et al. (2020). Preparing medical imaging data for machine learning. Radiology
295, 4–15. doi: 10.1148/radiol.2020192224
Xie, L., Lin, K., Wang, S., Wang, F., and Zhou, J. (2018). Differentially private
generative adversarial network. arXiv:1802.06739 [cs, stat]. arXiv: 1802.06739.
Xu, C., Ren, J., Zhang, D., Zhang, Y., Qin, Z., and Ren, K. (2019). GANobfuscator:
mitigating information leakage under GAN via differential privacy. IEEE Trans.
Inf. Forensics Security 14, 2358–2371. doi: 10.1109/TIFS.2019.2897874
Yi, X., Walia, E., and Babyn, P. (2019). Generative adversarial network
in medical imaging: a review. Med. Image Anal. 58, 101552.
doi: 10.1016/j.media.2019.101552
Yoon, J., Drumright, L. N., and van der Schaar, M. (2020). Anonymization through
data synthesis using generative adversarial networks (ADS-GAN). IEEE J.
Biomed. Health Inform. 24, 2378–2388. doi: 10.1109/JBHI.2020.2980262
Yoon, J., Jordon, J., and van der Schaar, M. (2019). “PATE-GAN: generating
synthetic data with differential privacy guarantees,” in International Conference
on Learning Representations (New Orleans: ICLR).
Zhang, L., Shen, B., Barnawi, A., Xi, S., Kumar, N., and Wu, Y. (2021). FedDPGAN:
federated differentially private generative adversarial networks framework for
the detection of COVID-19 pneumonia. Inform. Syst. Front. 23, 1403–1415.
doi: 10.1007/s10796-021-10144-6
Zhu, G., Jiang, B., Tong, L., Xie, Y., Zaharchuk, G., and Wintermark,
M. (2019). Applications of deep learning to neuro-imaging
techniques. Front. Neurol. 10, 869. doi: 10.3389/fneur.2019.00869
Conflict of Interest: TK, MH, VM, and AHi are employed by ai4medicine. FB
and AHe are employed by Fraunhofer. JS reports receipt of speakers’ honoraria
from Pfizer, Boehringer Ingelheim, and Daiichi Sankyo. JF has received consulting
and advisory board fees from BioClinica, Cerevast, Artemida, Brainomix,
Biogen, BMS, EISAI, and Guerbet. DF reported receiving grants from the European
Commission as well as receiving personal fees from and holding an equity interest
in ai4medicine.
The remaining authors declare that the research was conducted in the absence of
any commercial or financial relationships that could be construed as a potential
conflict of interest.
Publisher’s Note: All claims expressed in this article are solely those of the authors
and do not necessarily represent those of their affiliated organizations, or those of
the publisher, the editors and the reviewers. Any product that may be evaluated in
this article, or claim that may be made by its manufacturer, is not guaranteed or
endorsed by the publisher.
Copyright © 2022 Kossen, Hirzel, Madai, Boenisch, Hennemuth, Hildebrand,
Pokutta, Sharma, Hilbert, Sobesky, Galinovic, Khalil, Fiebach and Frey. This is an
open-access article distributed under the terms of the Creative Commons Attribution
License (CC BY). The use, distribution or reproduction in other forums is permitted,
provided the original author(s) and the copyright owner(s) are credited and that the
original publication in this journal is cited, in accordance with accepted academic
practice. No use, distribution or reproduction is permitted which does not comply
with these terms.
Supplementary Material
1 SUPPLEMENTARY DATA
Figure S1.
Visualization of real and generated images with and without differential privacy in a t-SNE
embedding. The distribution of real images and generated images without privacy almost entirely overlap.
In contrast to that, the images with privacy guarantees are only partly overlapping and cluster at the edges,
distant from the real images.
Figure S2.
Segmentation error maps for two example patients for a model with ε = 1.3 (A) and ε = 2.7 (B).
Voxels in red are true positives, yellow represents false negatives and green false positives.
(A) shows many false negatives with few false positives. The Dice Similarity Coefficient (DSC) is 0.046 and the
balanced average Hausdorff distance (bAHD) 8.3. (B) shows many false positives with a DSC of 0.052 and
a bAHD of 190.5.
Part II
Image-to-Image Translation for
Stroke Treatment Planning
7
Image-to-Image Generative Adversarial
Networks for Synthesizing Perfusion
Parameter Maps from DSC-MR Images
in Cerebrovascular Disease
7.1 Context Within Thesis
GAN architectures can not only be utilized for private image synthesis but are also currently
state-of-the-art in the field of medical image-to-image translations. In the clinical setting
of stroke, the translation of DSC-MRI to perfusion parameter maps can be regarded as an
image-to-image translation. DSC-MRI-derived perfusion maps are crucial for stroke treatment
planning. Nowadays, perfusion maps are derived by placing an AIF based on selected voxels.
While this process can be automated, it only takes into account a few voxels and also requires
oversight by a medical expert. Since time is a critical resource for stroke patients, automatic
processing of DSC-MRI into expert-level perfusion maps would speed up treatment planning.
In this chapter, we show that GANs could be utilized to automatically derive expert-level
perfusion maps as an alternative approach to AIF-based approaches. To this end, we developed
an adapted version of the pix2pix GAN incorporating the time dimension of the DSC-MRI.
We tested our architecture on two datasets: 1) a dataset comprising stroke patients and 2) a
dataset containing patients with steno-occlusive disease.
7.2 Preprint
This chapter is based on the following preprint:
T. Kossen, V. I. Madai, M. A. Mutke, A. Hennemuth, K. Hildebrand, J. Behland,
A. Hilbert, J. Sobesky, M. Bendszus, and D. Frey. “Image-to-image generative
adversarial networks for synthesizing perfusion parameter maps from DSC-MR images
in cerebrovascular disease”. In: medRxiv (2022). doi:
10.1101/2022.05.24.22274901
This section reprints the preprint, which is available on medRxiv. The article is open
access under the CC BY license.
Author Contribution
The first author Tabea Kossen conceptualized the study and interpreted the results together
with VIM, AH, KH and DF. She implemented the GAN architectures and evaluation
scripts. Additionally, she was responsible for the project administration and created the figures.
Together with VIM, she wrote the first version of the manuscript.
Code Availability
The code for this project is publicly available:
https://github.com/prediction2020/DSC-to-perfusion.
Image-to-image generative adversarial networks for synthesizing
perfusion parameter maps from DSC-MR images in cerebrovascular
disease
Tabea Kossen1,2∗, Vince I Madai1,3,4∗, Matthias A Mutke5, Anja Hennemuth2,6,7, Kristian Hildebrand8,
Jonas Behland1, Adam Hilbert1, Jan Sobesky9,10, Martin Bendszus5 and Dietmar Frey1
1CLAIM - Charité Lab for AI in Medicine, Charité Universitätsmedizin Berlin, Germany
2Department of Computer Engineering and Microelectronics, Computer Vision & Remote Sensing, Technical University Berlin, Berlin, Germany
3QUEST Center for Responsible Research, Berlin Institute of Health (BIH), Charité - Universitätsmedizin Berlin, Berlin, Germany
4School of Computing and Digital Technology, Faculty of Computing, Engineering and the Built Environment, Birmingham City University, Birmingham, UK
5Department of Neuroradiology, Heidelberg University Hospital, Heidelberg, Germany
6Institute for Imaging Science and Computational Modelling in Cardiovascular Medicine, Charité Universitätsmedizin Berlin, Berlin, Germany
7Fraunhofer MEVIS, Bremen, Germany
8Department VI Computer Science and Media, Berlin University of Applied Sciences and Technology, Berlin, Germany
9Centre for Stroke Research Berlin, Charité Universitätsmedizin Berlin, Berlin, Germany
10Johanna-Etienne-Hospital, Neuss, Germany
ABSTRACT
Stroke is a major cause of death or disability. As imaging-based patient stratification improves acute stroke therapy, dynamic
susceptibility contrast magnetic resonance imaging (DSC-MRI) is of major interest to image brain perfusion. However,
expert-level perfusion maps require manual or semi-manual post-processing by a medical expert, making the procedure
time-consuming and less standardized. Modern machine learning methods such as generative adversarial networks (GANs) have the
potential to automate the perfusion map generation on an expert level without manual validation. We propose a modified pix2pix
GAN with a temporal component (temp-pix2pix-GAN) that generates perfusion maps in an end-to-end fashion. We train our
model on perfusion maps infused with expert knowledge to encode it into the GANs. The model was trained on two datasets,
including acute stroke patients and patients with steno-occlusive disease, and its performance was evaluated using the
structural similarity index measure (SSIM). Our temp-pix2pix architecture showed high performance on the acute stroke dataset for all perfusion maps
(mean SSIM 0.92-0.99) and good performance on data including patients with steno-occlusive disease (mean SSIM 0.84-0.99).
While clinical validation is still necessary in future studies, our results mark an important step towards automated expert-level
perfusion maps and thus, fast patient stratification.
Keywords: stroke, perfusion weighted imaging, dynamic susceptibility contrast MR, cerebrovascular disease, generative
adversarial networks
1 INTRODUCTION
Ischemic stroke is a leading cause of death or disability worldwide¹.
Standard treatment strategies include recanalization by mechanical
or pharmacological intervention, or a combination of both (Berge
et al. (2021); Turc et al. (2019)). In this context, the eligibility of
patients for treatment is mainly based on large cohorts of interventional
trials that incorporate little imaging information (Lin et al. (2022);
McDermott et al. (2019)). However, this means that some patients
will not receive treatment that would benefit them and,
conversely, some patients will be subjected to futile treatment attempts
(Goyal et al. (2016)). An alternative approach to improve outcomes
is an individualized patient stratification based on specific patient
characteristics (Rehani et al. (2020); Sharobeam and Yan (2022)).
∗These authors contributed equally to this work
¹ WHO EMRO Stroke, Cerebrovascular Accident | Health Topics.
Available online at: http://www.emro.who.int/health-topics/stroke-cerebrovascular-accident/index.html
medRxiv preprint doi: https://doi.org/10.1101/2022.05.24.22274901; this version posted May 25, 2022. The copyright holder for this preprint
(which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
It is made available under a CC-BY 4.0 International license.
NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.
Figure 1. Workflow of study. Our GAN is trained on expert-level perfusion maps. The resulting model is able to synthesize perfusion maps from unseen data
without the need of manual AIF selection, at the same expert level that was present in the training data.
One of the most important techniques for this approach is perfusion-weighted
imaging, a special imaging technique used in both
computed tomography (CT) and magnetic resonance imaging (MRI)
(Sharobeam and Yan (2022)). It provides highly relevant information
about (patho)physiological blood flow in and around the ischemic
brain tissue (Copen et al. (2011)). In MRI, the most commonly
used perfusion imaging technique is dynamic susceptibility contrast
(DSC) MRI (Jahng et al. (2014)). It measures brain perfusion by
injecting a gadolinium-based contrast agent into the patient’s blood
(Jahng et al. (2014)), followed by a series of T2- or T2*-weighted
MRI sequences that record the flow of the contrast agent through the
brain. The resulting 4D image is deconvolved voxel-wise with an
arterial input function (AIF) (Calamante (2013)). The tissue
concentration curve as well as the deconvolved curve result in interpretable
perfusion parameter maps such as the cerebral blood flow (CBF),
cerebral blood volume (CBV), mean transit time (MTT),
time-to-maximum (Tmax), and time-to-peak (TTP) (Calamante (2013)).
Importantly, the placement of the AIF is performed either in a
semi-manual or manual manner to achieve the highest quality.
Additionally, automated methods exist that in some areas, such as stroke,
require little input by experts to provide perfusion parameter maps of
high quality (Hansen et al. (2016); Ben Alaya et al. (2022); Krusche
et al. (2021)). In clinical practice, however, all existing methods of
AIF determination require at least some oversight by experts to rule
out faulty calculations due to suboptimal AIFs. This is a particular
challenge in stroke care, where time is a critical resource as it is one
of the most important determinants of clinical outcome. Therefore,
there is a great clinical need for novel automation approaches that
provide expert-level perfusion maps without the necessity for any
manual input.
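To make the AIF-based route concrete, the voxel-wise deconvolution described above can be sketched for a single voxel. This is a minimal illustration under stated assumptions, not the clinical pipeline used to create the reference maps: it uses truncated-SVD deconvolution, an idealized AIF without arrival delay, noise-free synthetic curves, and it ignores scaling constants such as tissue density and hematocrit. The function name `perfusion_from_curves` and the threshold value are hypothetical.

```python
import numpy as np

def perfusion_from_curves(tissue, aif, dt, rel_threshold=0.1):
    """Estimate CBF, CBV, MTT and Tmax for one voxel from concentration curves."""
    n = len(aif)
    idx = np.arange(n)
    lag = idx[:, None] - idx[None, :]
    # Lower-triangular Toeplitz matrix: (A @ k)[i] = dt * sum_j aif[i - j] * k[j]
    A = dt * np.where(lag >= 0, aif[np.clip(lag, 0, None)], 0.0)
    U, s, Vt = np.linalg.svd(A)
    # Truncated SVD: discard singular values below a fraction of the largest one.
    inv_s = np.where(s > rel_threshold * s.max(), 1.0 / s, 0.0)
    k = Vt.T @ (inv_s * (U.T @ tissue))   # k(t) = CBF * residue function R(t)
    cbf = k.max()
    tmax = idx[np.argmax(k)] * dt         # time at which k(t) peaks
    cbv = tissue.sum() / aif.sum()        # ratio of the areas under the curves
    return {"CBF": cbf, "CBV": cbv, "MTT": cbv / cbf, "Tmax": tmax}

# Synthetic single-voxel example (all values are illustrative assumptions).
dt = 1.0
t = np.arange(60) * dt
aif = np.exp(-t / 3.0)                                    # idealized AIF
k_true = np.where(t >= 3.0, 0.01 * np.exp(-(t - 3.0) / 4.0), 0.0)
tissue = dt * np.convolve(aif, k_true)[: len(t)]          # forward model
maps = perfusion_from_curves(tissue, aif, dt)
print(maps)
```

On noisy clinical curves the truncation threshold and the choice of AIF voxels strongly influence the result, which is one reason the resulting maps are usually checked by an expert.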
One possible solution is the application of modern artificial
intelligence (AI) methods based on machine learning and here
particularly deep learning approaches. These have shown great promise for
solving medical imaging problems in the past years (Wernick et al.
(2010); Lundervold and Lundervold (2019)). Among deep learning
applications, generative adversarial networks (GANs) are
particularly promising for the generation of expert-level perfusion maps.
For example, GANs can be presented both with an original image
and a processed image and learn to generate the processed image
from the original. This is achieved by the special architecture of
GANs: They consist of two neural networks that try to fool each
other (Goodfellow et al. (2014)). One network, the generator,
synthesizes a data sample such as an image, whereas the other network,
the discriminator, decides whether the sample looks like a real
sample or not. At the end of the training, the generated sample should
resemble the original as closely as possible. For image-to-image
translations, GANs are considered to be state-of-the-art in the medical
field (Yi et al. (2019); Zhu et al. (2020)) and a conditional GAN
such as the pix2pix GAN can be applied (Isola et al. (2018)). For
example, pix2pix GANs have been successfully applied to transform
MR images to CT images (cross-modal) or to transform 3T MR
images to 7T MR images (intramodal) (Brou Boni et al. (2020); Nie
et al. (2018)).
Given that the translation of a time-series of perfusion informa-
tion from source images to a single perfusion map can be seen as a
highly similar medical image-to-image translation problem, GANs
are a highly promising method for this use case. Preliminary work
on GANs for the translation of time-series in dynamic cine appli-
cations has been published (Ghodrati et al. (2021)). Yet, to the best
of our knowledge no study has investigated the generation of DSC
perfusion images from perfusion source data so far.
Thus, we propose a modified slice-wise pix2pix GAN with a temporal
component (temp-pix2pix GAN) to account for the time dimension in
DSC source perfusion imaging. Our GAN model automatically generates
perfusion parameter maps in an end-to-end fashion. We train our
model on expert-level perfusion parameter maps
(see Figure 1). The performance of our temp-pix2pix GAN model
is compared to a standard pix2pix GAN without a temporal com-
ponent. We train and test our approach on two different datasets in-
cluding acute stroke patients as well as patients with chronic cere-
brovascular disease.
2 MATERIALS AND METHODS
2.1 Data
In total, 276 patients were included in this study. Of these, 204
patients from a study performed at Heidelberg University Hospital
suffered from acute stroke. Imaging was performed with a T2*-weighted
gradient-echo EPI sequence with fat suppression (TR=2220ms,
TE=36ms, flip angle 90°, field of view: 240x240 mm², image matrix:
128x128, 25-27 slices with a slice thickness (ST) of 5mm) and was
started simultaneously with bolus injection of a standard dose
(0.1mmol/kg) of an intravenous gadolinium-based contrast agent on
3 Tesla MRI systems (Magnetom Verio, TIM Trio and Magnetom Prisma;
Siemens Healthcare, Erlangen, Germany). In total, 50 to 75 dynamic
measurements were performed (including at least eight prebolus
measurements). Bolus and prebolus were injected with a pneumatically
driven injection pump at an injection rate of 5ml/s. The study pro-
tocol for this retrospective analysis of our prospectively established
stroke database was approved by the ethics committee of Heidelberg
University and patient informed consent was waived.
72 patients with steno-occlusive disease were included from the
PEGASUS study (Mutke et al. (2014)). 80 whole-brain images
were recorded using a single-shot FID-EPI sequence (TR=1390ms,
TE=29ms, voxel size: 1.8x1.8x5mm3) after injection of 5ml
. CC-BY 4.0 International licenseIt is made available under a
is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)
The copyright holder for this preprint this version posted May 25, 2022. ; https://doi.org/10.1101/2022.05.24.22274901doi: medRxiv preprint
Gadovist (Gadobutrol, 1 M, Bayer Schering Pharma AG, Berlin)
followed by 25ml saline flush by a power injector (Spectris, Medrad
Inc., Warrendale PA, USA) at a rate of 5ml/s. The acquisition time
was 1:54 minutes. All patients gave their written informed consent
and the study has been authorized by the ethical review committee
of Charité - Universitätsmedizin Berlin.
DSC post-processing was performed blinded to clinical outcome.
For the acute stroke data from Heidelberg, DSC data were post-processed
with Olea Sphere® (Olea Medical, La Ciotat, France), and automatic
motion correction was applied. Raw DSC images were used
to calculate perfusion maps of time-to-peak (TTP) from the tissue
response curve. Maps of cerebral blood flow (CBF), cerebral blood
volume (CBV), mean transit time (MTT), and time-to-maximum
(Tmax) were created by deconvolution of a regional concentration
time curve with an arterial input function (AIF). Block-circulant sin-
gular value decomposition (cSVD) deconvolution was applied. The
AIF was detected automatically. All AIFs were visually inspected by a
neuroradiology expert (MAM, over 6 years of experience in perfusion
imaging), and the automatically detected AIF needed manual correction
in only two cases.
For PEGASUS patients, DSC data were post-processed with the
PGui software (Version 1.0, provided for research purposes by the
Center for functional neuroimaging, Aarhus University, Denmark).
Motion correction was not available. Raw DSC images were used
to calculate perfusion maps of TTP from the tissue response curve.
Maps of CBF, CBV, MTT, and Tmax were created by deconvolution
of a regional concentration time curve with an AIF. Parametric de-
convolution was applied (Mouridsen et al. (2014)). For each patient,
an AIF was determined by a junior rater (JB, 2 years of experience in
perfusion imaging) by manual selection of three or four intravas-
cular voxels of the MCA M2 segment contralateral to the side of
stenosis minimizing partial volume effects and bolus delay. The AIF
shape was visually assessed for peak sharpness, bolus peak time and
amplitude width (Calamante (2013); Thijs et al. (2004)). The AIFs
were inspected by a senior rater (VIM, over 12 years of experience
in perfusion imaging).
The post-processed data was split into a training (acute stroke
data: 142, PEGASUS: 50 patients), validation (acute stroke data: 20,
PEGASUS: 8 patients) and test (acute stroke data: 41, PEGASUS:
12 patients) set. The models were trained on the respective training
set and the hyperparameters were selected based on the performance
on the validation set. The generalization performance was estimated
by the performance on the test set. The acute stroke data was resized
to 21 slices each containing 128x128 voxels. The DSC source was
rescaled to 80 time points. All images of one parameter map as well
as the DSC source images were normalized between -1 and +1 and
split into slices.
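The temporal rescaling and intensity normalization described above can be sketched for a single voxel curve as follows. This is a minimal illustration, not the published preprocessing: the text does not state the interpolation method or which minimum and maximum enter the normalization, so linear interpolation and per-curve min-max scaling are assumptions, and the function names `resample_linear` and `normalize_minus1_plus1` are hypothetical.

```python
def resample_linear(curve, n_out=80):
    """Linearly resample a 1D signal-time curve to n_out equally spaced points."""
    n_in = len(curve)
    if n_in == 1:
        return [float(curve[0])] * n_out
    out = []
    for k in range(n_out):
        # Position of output sample k on the input index axis.
        pos = k * (n_in - 1) / (n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        out.append((1.0 - frac) * curve[lo] + frac * curve[hi])
    return out


def normalize_minus1_plus1(values):
    """Min-max scale values to the interval [-1, 1] (assumed scaling scheme)."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        return [0.0 for _ in values]
    return [2.0 * (v - v_min) / (v_max - v_min) - 1.0 for v in values]


# Toy voxel curve with 50 dynamic measurements: baseline, signal drop
# during bolus passage, then gradual recovery (arbitrary units).
curve_50 = (
    [100.0] * 10
    + [100.0 - 8.0 * i for i in range(1, 6)]
    + [60.0 + i for i in range(35)]
)
curve_80 = resample_linear(curve_50, n_out=80)
scaled = normalize_minus1_plus1(curve_80)
print(len(curve_80), min(scaled), max(scaled))
```

In the study, normalization was applied to all images of one parameter map and to the DSC source images, so the scaling statistics were presumably computed over more than one curve; the per-curve version here only keeps the example short.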
2.2 General methodological approach
We utilized a special type of AI model that was developed for gen-
erating an image based on the input of another image: the pix2pix
GAN (Isola et al. (2018)). A pix2pix GAN consists of two neural
networks that try to mislead each other. The first network, the
generator, aims to produce realistic-looking images based on another
image (e.g. produce a CT image based on an MR image), whereas the second
network, the discriminator, tries to distinguish between the gener-
ated and real images. Based on the discriminator’s feedback, both
networks get better in their respective tasks.
Typically, the input and output to a pix2pix GAN generator is a 2D
image. For this use-case, we modified the pix2pix GAN to take a 3D
image (time sequence of the 2D DSC source image) as an input and
synthesize the corresponding 2D perfusion map slice (e.g. Tmax).
In this work we implemented two different generator architectures.
The first architecture, the classical pix2pix GAN, took in the 3D in-
put image without accounting for the temporal relation between the
images. In contrast to that, the second architecture, the temp-pix2pix
GAN, was designed to first extract the temporal relation between the
images followed by the transformation to the output image (see Fig-
ure 2). In the following, the technical details of the two approaches
are described in depth.
2.3 Network architecture
The GAN architecture was adapted from the pix2pix GAN (Isola
et al. (2018)). In our first architecture we utilized the original U-
Net generator as proposed in the paper with the time steps being
represented in the channels. For the second architecture we modified
the U-Net by adding 3D temporal convolutions before feeding the
result into the U-Net in the generator (see Figure 2).
Both GAN architectures consisted of two neural networks: the
generator G and the discriminator D. On the one hand, the genera-
tor’s task was to synthesize perfusion parameter maps such as Tmax
or CBF from the DSC source image. The discriminator, on the other
hand, learned to distinguish between the real DSC source image to-
gether with the real perfusion parameter map and the real DSC with
the generated perfusion parameter map.
In general, the objective function of a conditional GAN such as
the pix2pix GAN is:
$$\mathcal{L}_{cGAN}(G,D) = \mathbb{E}_{x,y}[\log D(x,y)] + \mathbb{E}_{x,z}[\log(1 - D(x,G(x,z)))] \tag{1}$$
where $x$ is the input image (DSC source in our case), $y$ the output
image (Tmax, for example) and $z$ a noise vector. The discriminator
tries to maximize this objective, which is achieved when it outputs a
high probability of being real for the real image pair and a low
probability for the generated image pair. In contrast to that, the
generator tries to minimize the objective by synthesizing images for
which the discriminator outputs a high probability of being real. The
pix2pix GAN does not directly incorporate the noise vector $z$ but
introduces noise in the network using dropout in the generator.
The loss of the generator consisted of two parts. The first part
was the adversarial loss which took into account the feedback of the
discriminator as described above. Additionally, a reconstruction loss
directly penalized deviation from the original image:
$$\mathrm{loss}_{L1} = \lVert y - G(x,z) \rVert_1 \tag{2}$$
This second loss was added to the adversarial loss and weighted by a
scalar λ, which was set to 1. The pix2pix generator was a U-Net with
6 down- and upsampling layers (see Figure 2B). One DSC source
slice at a time was fed as an input to the generator. The different
time points of the DSC were concatenated in the channel dimension.
Each downsampling layer consisted of a batch normalization layer
as well as a LeakyReLU with slope 0.2 and the upsampling layers of
ConvTranspose-layers, batch normalization and a ReLU activation.
After the last convolution, a tanh was applied.
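The two-part generator loss described above can be illustrated on toy values. The sketch below is not taken from the published code: it assumes the commonly used non-saturating adversarial term, -log D(x, G(x)), and averages the L1 term over pixels, which differs from Equation 2 only by a constant factor. The names `l1_loss` and `generator_loss` are hypothetical.

```python
import math


def l1_loss(target, generated):
    """Mean absolute difference between the target map y and the generated map G(x)."""
    return sum(abs(t - g) for t, g in zip(target, generated)) / len(target)


def generator_loss(d_fake_probs, target, generated, lam=1.0, eps=1e-12):
    """Adversarial term plus lambda-weighted L1 reconstruction term.

    d_fake_probs: discriminator outputs in (0, 1) for the generated image pair.
    The adversarial term -log D(x, G(x)) is an assumed (non-saturating) form.
    lam=1.0 follows the weighting stated in the text.
    """
    adversarial = -sum(math.log(p + eps) for p in d_fake_probs) / len(d_fake_probs)
    return adversarial + lam * l1_loss(target, generated)


# Toy target map and two candidate outputs, all scaled to [-1, 1].
y = [0.2, -0.5, 0.9, 0.0]
g_good = [0.25, -0.45, 0.85, 0.05]   # close to y, rated "real" by the discriminator
g_bad = [0.9, 0.5, -0.9, 0.7]        # far from y, rated "fake" by the discriminator

loss_good = generator_loss([0.8, 0.7], y, g_good)
loss_bad = generator_loss([0.1, 0.2], y, g_bad)
print(loss_good < loss_bad)  # True
```

Both terms pull in the same direction here: an output that fools the discriminator and stays close to the target map receives the lower loss.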
In contrast to that, the generator of the temp-pix2pix GAN took
one slice of the DSC source at all time points as an input. The time
sequence of slices was then fed through six 3D convolutions over the
time dimension, iteratively reducing this dimension to 1. Each
convolutional layer was followed by a batch normalization layer and a
LeakyReLU with slope 0.2. After the temporal path, the output was
fed into a 2D U-Net with convolutions over the spatial dimensions
with 6 down- and upsampling layers as described above (see Fig-
ure 2C).
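The reduction of the time dimension from 80 to 1 by six convolutions can be checked with the standard convolution output-size formula. The kernel size of 4 and stride of 2 are the values reported for the networks in this work; that they apply along the time axis, and the padding of 1, are assumptions made for this sketch, and the helper names are hypothetical.

```python
def conv_output_length(n_in, kernel=4, stride=2, padding=1):
    """Output length of a convolution along one axis (dilation 1)."""
    return (n_in + 2 * padding - kernel) // stride + 1


def temporal_path_lengths(n_time=80, n_layers=6):
    """Length of the time axis after each temporal convolution."""
    lengths = [n_time]
    for _ in range(n_layers):
        lengths.append(conv_output_length(lengths[-1]))
    return lengths


lengths = temporal_path_lengths()
print(lengths)  # [80, 40, 20, 10, 5, 2, 1]
```

Under these assumptions, exactly six layers are needed to collapse 80 time points into one. In PyTorch, such a layer could be written as `nn.Conv3d` with `kernel_size=(4, 1, 1)`, `stride=(2, 1, 1)` and `padding=(1, 0, 0)`, so that only the time axis is reduced and the spatial size is preserved, although the published implementation may differ.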
Figure 2. Architecture of the pix2pix and temp-pix2pix GAN. A shows the overall GAN architecture whereas B and C depict the two different generators and
D the discriminator.
The discriminator adopted the PatchGAN architecture suggested by
Isola et al. (2018). It consisted of 3 convolutional layers with
batch normalization and a
LeakyReLU activation function followed by another convolutional
layer and a sigmoid activation function (see Figure 2D). For both
the generator and discriminator the kernel size was 4 with strides of
2.
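A PatchGAN discriminator rates local patches rather than the whole image, and the patch size follows from the kernel sizes and strides of its layers. The sketch below computes this receptive field under the assumption that all four convolutions (three plus the final one) use kernel size 4 and stride 2 as stated above; the function name `receptive_field` is hypothetical.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit.

    layers: list of (kernel, stride) tuples, ordered from input to output.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # each layer widens the field by (k - 1) steps of size jump
        jump *= stride              # spacing of neighbouring units, measured in input pixels
    return rf


# Assumed layout for this work: four convolutions, each with kernel 4 and stride 2.
patchgan_layers = [(4, 2)] * 4
print(receptive_field(patchgan_layers))  # 46

# For comparison: the layout of the original pix2pix PatchGAN
# (three stride-2 convolutions followed by two stride-1 convolutions).
original_layers = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
print(receptive_field(original_layers))  # 70
```

Under the stated assumption, each discriminator output would judge a 46x46 region of the 128x128 slice, compared with the 70x70 patches of the original pix2pix configuration.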
2.4 Training
For each architecture, five GANs were trained on the acute stroke
dataset from Heidelberg, one for each of the five parameter maps (CBF,
CBV, MTT, Tmax and TTP). The models were trained for 100
epochs with a learning rate of 0.0001 for both generator and dis-
criminator using the Adam optimizer with β1=0.5 and β2=0.999.
The batch size was 4 and dropout 0. As the PEGASUS dataset was
smaller, the models trained on the acute stroke data served as a
weight initialization for the PEGASUS models and were then fur-
ther trained for 50 epochs. Thus, in total, 10 models were trained per
architecture.
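The training schedule described above can be written down as a small configuration sketch. The hyperparameter values are those reported in the text; the class and field names are hypothetical, and reusing the same learning rate, batch size and optimizer settings for the PEGASUS fine-tuning stage is an assumption, since the text only states the number of additional epochs.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

PARAMETER_MAPS = ("CBF", "CBV", "MTT", "Tmax", "TTP")


@dataclass(frozen=True)
class TrainConfig:
    dataset: str
    parameter_map: str
    epochs: int
    init_from: Optional[str] = None          # weights used for initialization, if any
    learning_rate: float = 1e-4              # same for generator and discriminator
    adam_betas: Tuple[float, float] = (0.5, 0.999)
    batch_size: int = 4
    dropout: float = 0.0


def runs_for_architecture():
    """Enumerate the training runs for one generator architecture."""
    runs = []
    for pmap in PARAMETER_MAPS:
        # Stage 1: train from scratch on the acute stroke data.
        runs.append(TrainConfig("acute_stroke", pmap, epochs=100))
        # Stage 2: fine-tune on PEGASUS, starting from the stage 1 weights.
        runs.append(
            TrainConfig("PEGASUS", pmap, epochs=50, init_from=f"acute_stroke/{pmap}")
        )
    return runs


runs = runs_for_architecture()
print(len(runs))  # 10
```

Five parameter maps times two datasets gives the ten models per architecture mentioned in the text.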
All hyperparameters mentioned above were tuned and selected
according to visual inspection and the performance on the valida-
tion set. Due to the computational limitations an automated search
was not feasible. The code was implemented in PyTorch and is publicly
available (see footnote 2). The models were trained on a Tesla V100 GPU
(NVIDIA Corporation, Santa Clara, CA, USA).
2.5 Performance evaluation
The generated images were first visually inspected. Additionally,
four metrics were applied: the mean absolute error (MAE) or
L1 norm of the error, the normalized root mean squared error
(NRMSE), the structural similarity index measure (SSIM) and the
peak-signal-to-noise-ratio (PSNR).
The MAE is defined voxel-wise and measures the average absolute
error between the real image $y$ and the generated image $\hat{y}$,
with $n$ denoting the number of voxels:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i| \tag{3}$$
The NRMSE is defined as the root mean squared error normalized
by the average Euclidean norm of the true image $y$:
$$\mathrm{NRMSE} = \frac{\mathrm{RMSE}}{\sqrt{\frac{1}{n}\sum_{i=1}^{n} y_i^2}} \tag{4}$$
Footnote 2: https://github.com/prediction2020/DSC-to-perfusion
Figure 3. Synthesized perfusion parameter maps (middle and bottom row) compared to the ground truth reviewed by an expert (top row) for one representative
patient from the acute stroke test dataset. The perfusion parameter maps generated by the temp-pix2pix all look similar to the ground truth whereas the time-
dependent parameters (Tmax and TTP) are not well captured by the pix2pix GAN.
with
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{5}$$
The SSIM is defined as a combination of luminance, contrast and
structure and can be summed up as:
$$\mathrm{SSIM}(y,\hat{y}) = \frac{(2\mu_y\mu_{\hat{y}} + c_1)(2\sigma_{y\hat{y}} + c_2)}{(\mu_y^2 + \mu_{\hat{y}}^2 + c_1)(\sigma_y^2 + \sigma_{\hat{y}}^2 + c_2)}, \tag{6}$$
where $\mu_y$ and $\mu_{\hat{y}}$ are the average values of $y$ and
$\hat{y}$ respectively, $\sigma_y^2$ and $\sigma_{\hat{y}}^2$ the
variances and $\sigma_{y\hat{y}}$ the covariance. $c_1$ and $c_2$ are
constants for stabilization and defined as $c_1 = (k_1 L)^2$ and
$c_2 = (k_2 L)^2$ with $L$ being the dynamic range of the pixel values
and $k_1, k_2 \ll 1$ small constants. The higher the SSIM, the more
similar the two images are, with 1 denoting the highest similarity.
The PSNR is defined as:
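$$\mathrm{PSNR} = 10 \log_{10} \frac{\mathrm{MAX}_I^2}{\mathrm{MSE}} \tag{7}$$
with $\mathrm{MAX}_I$ being the maximal possible pixel/voxel value and
MSE the mean squared error (the square of the RMSE in Equation 5). It
describes the ratio between the maximal possible signal power and the
noise power contained in the sample.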
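The four metrics can be written compactly for flattened images. The following is a sketch under stated assumptions rather than the evaluation code used in the study: the SSIM is computed from global image statistics, whereas common library implementations average it over local windows; the constants k1 = 0.01 and k2 = 0.03 are conventional defaults that the text does not specify; and the PSNR uses the conventional squared peak value. All function names are hypothetical.

```python
import math


def mae(y, y_hat):
    """Mean absolute error between true and generated voxel values."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """Root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def nrmse(y, y_hat):
    """RMSE normalized by the root mean square of the true image."""
    return rmse(y, y_hat) / math.sqrt(sum(a * a for a in y) / len(y))


def psnr(y, y_hat, max_val):
    """Peak signal-to-noise ratio in dB (undefined for identical images)."""
    mse = sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)
    return 10.0 * math.log10(max_val ** 2 / mse)


def ssim_global(y, y_hat, dynamic_range, k1=0.01, k2=0.03):
    """SSIM from global statistics (single window; a simplification)."""
    n = len(y)
    mu_y, mu_h = sum(y) / n, sum(y_hat) / n
    var_y = sum((a - mu_y) ** 2 for a in y) / n
    var_h = sum((b - mu_h) ** 2 for b in y_hat) / n
    cov = sum((a - mu_y) * (b - mu_h) for a, b in zip(y, y_hat)) / n
    c1, c2 = (k1 * dynamic_range) ** 2, (k2 * dynamic_range) ** 2
    numerator = (2 * mu_y * mu_h + c1) * (2 * cov + c2)
    denominator = (mu_y ** 2 + mu_h ** 2 + c1) * (var_y + var_h + c2)
    return numerator / denominator


# Toy example: three voxels, one of which is underestimated by 0.5.
y = [0.0, 0.5, 1.0]
y_hat = [0.0, 0.5, 0.5]
print(mae(y, y_hat), nrmse(y, y_hat), psnr(y, y_hat, 1.0), ssim_global(y, y_hat, 1.0))
```

For identical images the MAE and NRMSE are 0 and the global SSIM is 1, which is a quick sanity check for any implementation of these formulas.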
3 RESULTS
Visual inspection of the results of the acute stroke dataset showed
that the perfusion parameter maps generated by the temp-pix2pix
GAN looked similar to the ground truth (see Figure 3). For the
pix2pix model, on the other hand, only the CBF, CBV and MTT
were of sufficient quality, whereas the time-dependent parameters
TTP and Tmax did not consistently resemble the ground truth (also
Figure 3).
The quantitative analysis in the acute stroke dataset revealed for
all parameter maps a high SSIM ranging from 0.92-0.99 for the
temp-pix2pix model (Figure 4). In contrast to this, the pix2pix GAN
showed a comparable or worse SSIM ranging from 0.86-0.98. A per-
formance difference between the pix2pix and temp-pix2pix model
was especially prominent for Tmax and TTP (SSIM 0.92 vs 0.86 and
0.95 vs 0.91, respectively). For the PEGASUS dataset, the perfusion
maps generated by both the fine-tuned pix2pix and temp-pix2pix
GAN looked similar to the ground truth (see Figure 5). For both
networks, MTT appeared to be the least well reconstructed parameter
map, which was also reflected in the metrics (Figure 6). Furthermore,
the high intensities of Tmax were not well captured by the
pix2pix GAN (Figure 5). The performance metrics of the pix2pix
and temp-pix2pix GAN and the ground truth for the PEGASUS
dataset showed low error and high SSIM and PSNR for CBF, CBV
and Tmax. Here, for most metrics, the temp-pix2pix GAN achieved
a slightly better performance in contrast to the pix2pix GAN. For
MTT and TTP the temp-pix2pix showed a better performance com-
pared to the pix2pix GAN (SSIM 0.84 vs 0.78 and 0.86 vs 0.82
respectively). Overall, the synthesized MTT and TTP maps performed
worse than the other parameter maps. Figure 7A shows, for each
dataset, the two patients whose generated parameter maps performed
worst. For the acute stroke dataset, these were two Tmax maps
(Figure 7A, first and second column). Whereas
Figure 4. Mean performance metrics for evaluating the similarity between the ground truth and the synthesized parameter maps generated by the pix2pix GAN
(green) and the temp-pix2pix GAN (blue) on the acute stroke dataset. A and B show the mean absolute error (MAE) and normalized root mean squared error
(NRMSE) respectively (the lower the better). C and D show the structural similarity index measure (SSIM) and the peak-signal-to-noise-ratio (PSNR) (the
higher the better). For all parameter maps the temp-pix2pix architecture shows a better or comparable performance compared to the pix2pix GAN. For the
time-dependent parameter maps Tmax and TTP the difference between the pix2pix and temp-pix2pix GAN performance is larger than for the other three maps.
The error bars represent the standard deviation.
the generated Tmax in the first column did not capture the high
intensities well, the generated map in the second column looked
visually close to the ground truth. For the PEGASUS models, MTT
performed the worst (Figure 7A, third and fourth column). In the third
column, the generated MTT appeared less noisy than the ground truth.
In contrast, in the fourth column the generated MTT map looked noisier
than the ground truth. Figure 7B shows the Tmax maps generated by
the temp-pix2pix and pix2pix GAN for four patients for which an
AIF could not be placed.
4 DISCUSSION
In the present study, we propose a novel pix2pix GAN variant with
temporal convolutions - coined temp-pix2pix - to generate expert-
level perfusion parameter maps from DSC-MR images in an end-to-
end fashion for the first time. The temp-pix2pix architecture showed
high performance in a dataset of acute stroke patients and good per-
formance on data of patients with chronic steno-occlusive disease.
Our results mark a decisive step towards the automated generation
of expert-level DSC perfusion maps for acute stroke and their appli-
cation in the clinical setting.
In acute stroke, “time is brain” (Saver (2006)). This requires rapid
decision making in the clinical setting to ensure an optimal outcome
for an affected patient. In such a situation, when DSC perfusion-
weighted imaging is used to stratify patients for treatment, a ma-
jor bottleneck is the generation of parameter maps derived from the
DSC source images. These maps of TTP, CBF, CBV, MTT, and
Tmax are different representations of the information encoded in
the time-intensity curve for each voxel. For all except TTP, to de-
rive robust and valid parameter maps, the time-intensity curve must
be deconvolved with an AIF (Calamante (2013)). Ideally, the AIF
is derived for each voxel separately, but in the clinical setting the
calculation of a global AIF is preferred (Calamante (2013)). The
gold standard is the manual selection of several - usually 3 or 4 -
AIFs in the hemisphere contralateral to the stroke, from segments of
the middle cerebral artery (Calamante (2013)). The manual selection
of AIFs is a tedious and time-consuming process that can only be
performed after training (Calamante (2013)). Therefore, automated
methods whose results are subsequently reviewed by an expert are
preferred in clinical practice (Calamante (2013)). While automated
methods have shown inconclusive results in the literature (Hansen
et al. (2016); Ghodrati et al. (2021); Galinovic et al. (2012); Pistoc-
chi et al. (2022); Deutschmann et al. (2021)), they are successfully
used in acute stroke to identify stroke-affected tissue. In our sample,
this was confirmed as the AIFs only required expert adjustment in
two out of 204 patients in the acute stroke set. Nevertheless, this ap-
proach still requires a manual check resulting in a time delay of a few
minutes per patient before patient stratification. As a consequence,
there is a major clinical need for automated methods that provide
final perfusion parameter maps without any manual input. Here, we
chose a GAN approach because a model trained on expert-level
perfusion maps can subsequently generate expert-level perfusion maps
within ∼1.8 seconds per patient, implicitly encoding the choice of
AIFs. Our exploratory results show that this approach was successful.
This may have a positive impact on the clinical setting. First, it
would eliminate the need for manual review of AIFs. This would
reduce the time needed to calculate perfusion parameter maps and
also reduce resource requirements as radiologists and neurologists
would no longer need to be trained on how to identify optimal AIFs.
Second, as we have shown, it is even possible to calculate parameter
maps for patients who currently have to be excluded due to motion
artifacts that make it impossible for the standard software to calcu-
late the parameter maps. At this point, it is important to emphasize
Figure 5. Synthesized perfusion parameter maps (middle and bottom row) compared to the ground truth reviewed by an expert (top row) for one representative
patient from the PEGASUS test dataset. Both the pix2pix and temp-pix2pix GAN synthesized parameter maps that mostly resemble the ground truth. Parts of MTT
were not entirely captured by pix2pix and temp-pix2pix. Moreover, the pix2pix GAN did not synthesize the higher intensities of Tmax well. For MTT and
Tmax, the temp-pix2pix GAN showed better performance in all metrics compared to the pix2pix GAN.
Figure 6. Mean performance metrics for evaluating the similarity between the ground truth and the synthesized parameter maps generated by the pix2pix GAN
(green) and the temp-pix2pix GAN (blue) on the PEGASUS dataset. A and B show the mean absolute error (MAE) and normalized root mean squared error
(NRMSE) respectively (the lower the better). C and D show the structural similarity index measure (SSIM) and the peak-signal-to-noise-ratio (PSNR) (the
higher the better). For most metrics and parameter maps the temp-pix2pix architecture shows a better performance compared to the pix2pix GAN. In terms of
the metrics, the generated MTT maps showed the worst performance. The error bars represent the standard deviation.
Figure 7. The two patients with the poorest performance according to the metrics for each of the two datasets (A) and patients for which no AIF could be
computed (B). A: The first and second column show Tmax for two acute stroke patients. Whereas the synthesized image in the first column does not fully
capture the hypoperfused areas, the generated image in the second column looks quite close to the ground truth. Column three and four show MTT for two
PEGASUS patients. While the generated image in the third column shows less noise than the ground truth, the GAN introduced noise in the fourth column in the
synthesized image. B: Four Tmax maps generated by temp-pix2pix (upper row) and pix2pix (lower row) for cases from the acute stroke data for which no AIF
could be computed and for which, thus, no imaging would be available with conventional methods. Note that since motion artifacts affect the quality of the time-series,
in these cases the baseline pix2pix performs better than the temp-pix2pix.
that our study is exploratory and the generated model was and is only
used for internal research purposes. This is due to the fact that the
generative AI has fundamentally learned to approximate the non-AI
algorithm that was originally used to calculate the perfusion parame-
ter maps. To maximize clinical impact, we thus encourage the devel-
opers and vendors of relevant clinically used perfusion software to
consider adding GAN-based automated perfusion calculation mod-
ules to their products. To facilitate this process, we have made our
code publicly available.
One of the most important contributions of our approach was the
consideration of the temporal dimension of the time series input. Not
surprisingly, the temp-pix2pix architecture performed better than the
pix2pix GAN without a temporal component in both datasets. This
was particularly noticeable in the acute stroke dataset for parame-
ters directly related to the correct order of the time intensity curve,
namely TTP and Tmax. Maps of CBF, CBV and MTT (derived by
the central volume theorem as CBV/CBF) also performed quite well
in the baseline architecture without a temporal component, as for
these maps the order of input is not as relevant. This is because
CBV corresponds to the area under the time intensity curve and
CBF is calculated based on the height of the slope, which are in-
different to the order. In the chronic stroke dataset, the temp-pix2pix
also outperformed the baseline GAN without a temporal component.
However, the difference in performance was not as pronounced as in
the acute stroke dataset. This could be due to the fact that patients
with acute vascular obstruction usually have significantly higher
delays than patients with chronic steno-occlusive disease, and the
performance advantage of temp-pix2pix increases with increasing
delay. It is noteworthy that, in contrast to the acute stroke patients,
in the chronic steno-occlusive cohort MTT and TTP maps performed
worse than the other parameter maps. This might be related to the
more complex perfusion pathophysiology in chronic steno-occlusive
disease. Whereas in acute stroke, delay is the main contributor to
blood flow abnormalities, in chronic steno-occlusive disease it is the
sum of delay and considerable dispersion due to vessel abnormalities
(Calamante et al. (2006)). This could pose particular difficulties for
neural networks to learn the relationships required to create
parameter maps: MTT is a parameter that depends on two other
parameters (CBV and CBF) in the original software solutions, which are
likely to have greater variability in chronic steno-occlusive disease.
Additionally, TTP delays are attributable to both delay and
dispersion, with varying weights in individual patients leading again
to a larger variability (this effect is much less pronounced in Tmax
parameter maps due to the deconvolution procedure). Such increased
variability might lead to less stable models and thus increased noise
in the generated maps.
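The argument made earlier in this discussion, that area-based parameters such as CBV are indifferent to the order of the time points whereas timing parameters such as TTP are not, can be illustrated with a toy curve. This is a simplified sketch: the curve values are invented, the area is a plain rectangle-rule sum, and the conversion from signal to concentration and the normalization steps of a real perfusion pipeline are omitted.

```python
import math


def area_under_curve(curve, dt=1.0):
    """Rectangle-rule area; depends only on the set of values, not on their order."""
    return math.fsum(curve) * dt


def time_to_peak(curve, dt=1.0):
    """Time of the maximum value; depends on where in the sequence it occurs."""
    return curve.index(max(curve)) * dt


# Toy concentration-time curve (arbitrary units) with its peak at the sixth sample.
curve = [0.0, 0.0, 0.1, 0.6, 1.8, 3.0, 2.2, 1.1, 0.5, 0.2, 0.1, 0.0]
reordered = curve[::-1]   # one particular reordering of the same time points

print(area_under_curve(curve) == area_under_curve(reordered))  # True
print(time_to_peak(curve), time_to_peak(reordered))            # 5.0 6.0
```

A generator that treats the time points as unordered channels can therefore still recover area-like quantities, while peak timing requires knowing where in the sequence each value occurred, which is consistent with the larger benefit of the temporal path for TTP and Tmax.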
Our work is the first work to utilize GANs to create perfusion
parameter maps in DSC-imaging. A few works exist that used dif-
ferent machine learning and deep learning methods to generate pa-
rameter perfusion maps from the DSC source image. For instance,
McKinley et al. (2018) used several classical voxel-wise machine
learning approaches to generate manually validated perfusion
parameter maps and identified a tree-based algorithm as the best
performing model. Their best results for Tmax achieved a lower
performance with an NRMSE of 0.113 compared to our best model with
an NRMSE of 0.095. Vialard et al. (2021) suggested a deep learning
based spatiotemporal U-Net approach for translating DSC-MR
patches to CBV maps in patients with brain tumors. With an SSIM
of 0.821, their generated CBV maps obtained a worse performance
compared to our CBV maps generated by the temp-pix2pix model with
an SSIM of 0.986. In the field of stroke, Ho et al. (2016) proposed
a patch-based deep learning approach to generate CBF, CBV, MTT
and Tmax. The average RMSE for their generated Tmax showed a
higher error of 1.33 compared to ours with 0.06. Hess et al. (2019)
utilized a different voxel-wise deep learning approach to approxi-
mate Tmax from DSC-MR. This approach was clinically evaluated
in another study (Meier et al. (2019)). Hess et al. (2019) reported
the performance in terms of MAE with clipping to not account for
noise. Their generated Tmax achieved an MAE with clipping of 0.524,
compared to an MAE of 0.016 for our approach.
These differences compared to our study might be due to the novel
use of the GAN method and the fact that our model considered whole
slices instead of patches to better account for the spatial dimension.
Our study has several limitations. First, our network was based on
2D slices instead of the full 3D volumes due to computational
restrictions. It is likely that results could be improved further
using the full 3D images. Secondly, our study is an exploratory,
hypothesis-generating study. Its results need to be clinically
validated in a future study before integration into clinical practice
would be possible. Lastly, our approach so far is a black-box
approach. It could be extended with explainable AI to generate
insights into which areas in the
source images are particularly relevant for the creation of different
perfusion parameter maps. This could further elucidate the causes
of the performance differences between maps that we identified and
could guide the way for further improvements.
5 CONCLUSION
We generated expert-level perfusion parameter maps using a novel
GAN approach showcasing that AI approaches might have the abil-
ity to overcome the need for oversight by medical experts. Our ex-
ploratory study paves the way for fully-automated DSC-MR pro-
cessing for faster patient stratification in acute stroke. In the clinical
setting where time is crucial for patient outcome, this could have a
big impact on standardized patient care in acute stroke.
DISCLOSURES
Tabea Kossen reported receiving personal fees from ai4medicine
outside the submitted work. Dr Madai reported receiving personal
fees from ai4medicine outside the submitted work. Adam Hilbert re-
ported receiving personal fees from ai4medicine outside the submit-
ted work. While not related to this work, Dr Sobesky reports receipt
of speakers’ honoraria from Pfizer, Boehringer Ingelheim, and Dai-
ichi Sankyo. Dr Frey reported receiving grants from the European
Commission, reported receiving personal fees from and holding an
equity interest in ai4medicine outside the submitted work.
ACKNOWLEDGEMENTS
Computation has been performed on the HPC for the Research clus-
ter of the Berlin Institute of Health.
8 Discussion
8.1 Summary
This thesis aimed to investigate the opportunities and challenges of GANs for image synthesis
in the field of stroke. To this end, we focused on two main topics that divide this thesis into
two parts. In the first part (Chapters 4–6), we examined the synthesis of stroke images for
data sharing, while in the second part (Chapter 7), we performed image-to-image translation by
extracting perfusion maps from DSC-MRI for fast patient stratification.
In the first part, we showed that 2D synthesized data could preserve the essential predictive
properties for brain vessel segmentation. A segmentation network trained on our synthetic
data achieved a high Dice Similarity Coefficient of 0.85 on real test data (Chapter 4). In a
transfer learning approach, we simulated sharing our synthetic data by initializing a network
with the weights of a model trained on synthetic data and evaluating its performance
on a second dataset. We fine-tuned this network with increasing amounts of new data and
showed that our fine-tuned network outperformed the segmentation networks trained on the
new dataset alone. Furthermore, our fine-tuned model needed less newly annotated data than
a model trained from scratch to achieve a comparable segmentation performance, showcasing
the potential benefit of sharing synthetic data.
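The Dice Similarity Coefficient reported above compares a predicted segmentation mask with a reference mask. The following is a minimal sketch, assuming binary masks flattened into 0/1 sequences. The function name and the convention for two empty masks are illustrative assumptions and not taken from the thesis code.

```python
def dice_coefficient(pred, target):
    """Dice Similarity Coefficient for two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of voxels")
    # Overlap: voxels labeled as vessel in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


predicted = [1, 1, 0, 0, 1, 0]
reference = [1, 0, 0, 0, 1, 1]
print(dice_coefficient(predicted, reference))
```

A value of 1 means perfect overlap and 0 means no overlap, so the reported 0.85 indicates that most vessel voxels were shared between prediction and reference.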
Building on these encouraging results, we extended our GAN architectures to generate
high-resolution 3D volumes in order to exploit all three dimensions and spatial relations within
the medical images. The change to 3D substantially increased the computational load, which
we managed by implementing the two timescale update rule as well as by introducing mixed
precision training. Compared to Chapter 4, we increased the number of generated voxels
by a factor of roughly 100, whereas the number of filters per layer was halved for our best
performing model. Even with these restrictions, the segmentation model trained on our 3D
volumes achieved a performance on real data comparable to that of our 2D model, indicating a
benefit of including the third dimension. Additionally, we extended the assessment of the
synthetic images, giving a more in-depth evaluation by computing precision, recall, and the
Fréchet Inception Distance (FID) using the activations of the pre-trained MedicalNet [95].
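The two techniques named above for handling the computational load can be illustrated with a toy sketch. The two timescale update rule gives the generator and the discriminator separate learning rates. Mixed precision training stores values in half precision, which is commonly paired with loss scaling so that small gradients do not underflow. The learning rates, the loss scale, and all function names below are illustrative assumptions. The actual settings of Chapter 5 are not restated in this section.

```python
import struct


def to_half(x):
    """Round a Python float to IEEE 754 half precision (float16) and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Two timescale update rule: each network has its own learning rate.
# These values are illustrative assumptions, not the thesis settings.
LEARNING_RATES = {"generator": 1e-4, "discriminator": 4e-4}


def sgd_step(weight, grad, network):
    """One plain gradient step using the learning rate of the given network."""
    return weight - LEARNING_RATES[network] * grad


LOSS_SCALE = 2.0 ** 16  # hypothetical fixed loss scale


def half_precision_gradient(true_grad, loss_scale=1.0):
    """Store a gradient in float16, scaled up before storage and unscaled afterwards."""
    return to_half(true_grad * loss_scale) / loss_scale


tiny_grad = 1e-8
print(half_precision_gradient(tiny_grad))              # underflows in float16
print(half_precision_gradient(tiny_grad, LOSS_SCALE))  # stays close to 1e-8
print(sgd_step(1.0, 0.5, "generator"), sgd_step(1.0, 0.5, "discriminator"))
```

Deep learning frameworks usually adjust the loss scale dynamically; the fixed scale here only shows why scaling is needed at all.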
After verifying that synthetic TOF-MRA patches still retain the predictive properties for
our use case of brain vessel segmentation, we wanted to investigate their degree of privacy.
Previous research has suggested that artificially created datasets are not necessarily private as
ML models such as GANs are susceptible to membership inference attacks. A successful attack
could jeopardize the privacy of the patients that were used to train the GAN. To provide
an upper bound on the patient’s privacy leakage, the mathematical concept of differential
privacy was introduced. Differential privacy can be integrated into the training of a GAN to
quantify the privacy leakage while reducing the vulnerability of the model to membership inference
attacks. In Chapter 6, we implemented differential privacy into our 2D GAN architecture
from Chapter 4 and explored the impact of privacy restrictions on the quality of the generated
images. Moreover, we tested the usability of the newly created images for varying levels
of privacy in the brain vessel segmentation task. We could identify a good Dice Similarity
Coefficient of 0.75 for an acceptable privacy bound of ϵ = 7.4 and identified a threshold of
ϵ < 5 for which the images became unusable.
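One common way to integrate differential privacy into neural network training is differentially private stochastic gradient descent. Each per-sample gradient is clipped to a maximum norm, and Gaussian noise is added to the sum before averaging. The sketch below illustrates that general recipe under stated assumptions. It is not a reproduction of the Chapter 6 implementation, and the names `max_norm` and `noise_multiplier` are illustrative. Translating the noise level into a value of ϵ requires a separate privacy accountant, which is not shown.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale a gradient vector down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(per_sample_grads, max_norm, noise_multiplier, rng):
    """Clip each per-sample gradient, sum, add Gaussian noise, then average."""
    clipped = [clip_gradient(g, max_norm) for g in per_sample_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # The noise standard deviation is tied to the clipping norm.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_sample_grads) for v in noisy]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]
print(private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.0, rng=rng))
```

A larger noise multiplier gives a smaller ϵ (stronger privacy) but a noisier gradient, which is one plausible source of the performance drop described above.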
Taken together, we could show that GANs are able to generate 2D, 3D, and privacy-
preserving medical images for brain vessel segmentation. There are still computational
obstacles when training GANs, especially when training them on high-dimensional data such
as large high-resolution 3D images. Nevertheless, if the computational setup allows for it,
generating 3D images might be beneficial for the downstream task. Moreover, introducing
differential privacy into GAN training is feasible for synthesizing 2D image patches. We
showed that the synthetic privacy-preserving patches still maintained the predictive properties
necessary for our segmentation task. However, the implementation of differential privacy
came with a performance drop when utilizing the synthetic, privacy-preserving data in a
downstream ML model.
In the second part of the thesis, we showcased another application of GANs for medical
image synthesis, i.e., an image-to-image translation. We developed a pix2pix GAN variant for
automating the extraction of perfusion maps from DSC-MRI scans for treatment planning in
stroke (see Chapter 7). To this end, we introduced temporal convolutions into the generator’s
architecture to account for the temporal dimension. Our GAN variant achieved excellent
performance on acute stroke patient data and good performance on data of patients with
cerebrovascular disease. Notably, we could even generate perfusion images for DSC-MRI scans
with motion artifacts, for which conventional approaches have failed. Our results pave the way
for fully-automated translation of DSC-MRI to perfusion maps for fast patient stratification in
acute ischemic stroke.
8.2 Discussion and Outlook
8.2.1 Synthesis of Medical Images for Data Sharing
We have demonstrated that we can utilize GANs for generating 2D, 3D, and privacy-preserving
TOF-MRA image patches for the ultimate goal of data sharing. Comparing our 2D and 3D
approaches, we could show that 3D information incorporated into GAN architectures might
be beneficial for the downstream task, especially when large computing power is available
(see Chapter 5). Further studies should investigate whether our findings generalize to similar
datasets as well as to different MR sequences and modalities. Synthesizing images based
on patients from different cohorts and scanners could also increase the synthetic data’s
heterogeneity.
In the context of data sharing, it would be particularly interesting to test our GAN
architectures (Chapters 4–6) on other neuroimaging data because brain images are especially
sensitive. Since only a few studies have synthesized medical images using differentially private
GAN architectures, it would be interesting to see whether our findings regarding the
privacy-utility trade-off generalize. A more detailed investigation of this trade-off on different
images would allow for a better understanding of the parameter ϵ and the associated performance
drop with decreasing ϵ. In addition, future studies could investigate whether the observed
performance drop for differentially private GANs holds true when including more images for
training. In a study on non-medical images, Xu et al. have already shown that a larger dataset
utilized for training a GAN with the same privacy budget led to better performing synthetic
images [130].
To facilitate the training of our architectures on other data, we have made our code for each
study publicly available.
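For reference when interpreting ϵ, the standard definition of differential privacy can be written as follows. A randomized mechanism M is (ϵ, δ)-differentially private if, for any two datasets D and D' that differ in a single record and any set of outputs S, the inequality below holds. The symbols M, D, D', S, and δ follow the usual textbook notation and are assumptions insofar as this section does not introduce them.

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\epsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta
\qquad \text{for all neighboring } D, D' \text{ and all output sets } S.
% Worked arithmetic for the two values discussed in this thesis:
% \epsilon = 7.4:\; e^{7.4} \approx 1636, \qquad \epsilon = 5:\; e^{5} \approx 148
```

The factor e^ϵ bounds how much the presence of one patient can change the probability of any output. The bound loosens exponentially as ϵ grows, which is why a small change in ϵ can matter considerably for the privacy guarantee.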
Another related aspect of privacy-preserving synthetic data is membership inference
attacks [131]. These attacks aim to identify whether a data sample was part of the training set
for an ML model and are a potential privacy breach for generative models [68]. Future studies
could investigate the direct effect of differential privacy in GAN training on the success rate of
those attacks in the neuroimaging domain. While this effect has already been investigated
on non-medical images [68, 130], studies investigating the accuracy of membership inference
attacks on GANs synthesizing medical images have yet to be performed. Such studies would
help to better translate the value of ϵ into a probability estimation for re-identifying patient
data with state-of-the-art attacks.
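How the success of such an attack might be measured can be sketched with a simple threshold attack. The attacker assigns each sample a score, for example a discriminator output, and predicts "member" when the score reaches a threshold. The scores and function names below are hypothetical. The sketch shows only the evaluation logic and is not a specific attack from the cited literature.

```python
def attack_accuracy(member_scores, nonmember_scores, threshold):
    """Fraction of samples classified correctly when score >= threshold means 'member'."""
    correct = sum(1 for s in member_scores if s >= threshold)
    correct += sum(1 for s in nonmember_scores if s < threshold)
    return correct / (len(member_scores) + len(nonmember_scores))


def best_threshold_accuracy(member_scores, nonmember_scores):
    """Best accuracy over all thresholds taken from the observed scores."""
    candidates = sorted(set(member_scores) | set(nonmember_scores))
    return max(attack_accuracy(member_scores, nonmember_scores, t) for t in candidates)


# Hypothetical scores: training samples tend to receive higher values.
members = [0.9, 0.8, 0.7]
nonmembers = [0.6, 0.75, 0.2]
print(best_threshold_accuracy(members, nonmembers))
```

With equally sized member and non-member sets, an accuracy near 0.5 corresponds to guessing. Values well above 0.5 would indicate that the model leaks membership information.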
While differential privacy is a powerful technique to quantify the privacy leakage of the data
itself, other secure AI techniques can complement it. One technique that recently caught much
attention is federated learning [71]. This decentralized approach relies on transferring the model
weights during the training process and hence allows for training on several datasets without
transferring the data. This makes collaborations and data sharing across institutions not
only simpler but also more secure. Another advantage is that larger and more heterogeneous
datasets can be combined and used to build more robust ML models [132, 133]. However,
federated learning alone does not offer data security unless combined with other privacy-
preserving methods such as differential privacy [70]. Other promising secure AI techniques
that could be explored in the medical imaging domain are homomorphic encryption and secure
multi-party computation [70].
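The weight transfer at the core of federated learning can be illustrated with federated averaging. Each site trains locally and sends only its model weights, which a coordinator combines, weighted by local dataset size. This is a minimal sketch of that aggregation step. The function name and the example numbers are illustrative assumptions.

```python
def federated_average(client_weights, client_sizes):
    """Average per-client weight vectors, weighted by each client's dataset size."""
    if len(client_weights) != len(client_sizes):
        raise ValueError("one dataset size is needed per client")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Two hypothetical hospitals with 100 and 300 local training samples.
hospital_weights = [[1.0, 2.0], [3.0, 6.0]]
hospital_sizes = [100, 300]
print(federated_average(hospital_weights, hospital_sizes))
```

Only the weight vectors leave each site. As noted above, this alone does not guarantee data security, because the transferred weights can still carry information about the training data.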
In general, privacy-preserving techniques aim to bridge the gap between usable, data-
driven modeling and maintaining the patients’ privacy from a technological point of view [70].
Nevertheless, privacy in the medical field is a multi-disciplinary endeavor, which should not
only include ML and cryptography researchers but also informed patients, physicians, and
policymakers. Here, an open discussion is needed about the expectations regarding data security
and the trade-off between the patients’ privacy and the opportunities of ML applications to improve
patient care. In any case, we believe that secure and private ML is a prerequisite for building
trust of patients and physicians in ML systems.
To summarize, large, publicly available datasets are crucial for the development and
application of ML models and for GANs. As medical data, including stroke imaging, is
sensitive, the lack of open data substantially constrains this research. The opportunity of
filling this gap with synthetic, privacy-preserving data is, therefore, an encouraging research
direction.
8.2.2 Image-to-Image Translation for Stroke Treatment Planning
In the second part of the thesis, we showed that a GAN variant could be utilized for synthesizing
perfusion parameter maps from DSC-MRI in stroke imaging. Our model’s synthetic
perfusion maps closely resembled the ground truth. However, it is crucial to note that our
model is exploratory at this stage and still needs clinical validation.
Whereas in the traditional approach an arterial input function (AIF) is placed and the perfusion
maps are calculated voxel-wise, our approach operates on the whole slice. If patients move during
the recording of the MR sequences, the sequences can be misaligned. For these patients, a
voxel-wise approach cannot compute perfusion maps as it relies on aligned voxels in the temporal
dimension of the DSC-MRI. Here, our slice-wise approach can have a decisive advantage. In our
work (see Chapter 7), we show first results for patients with these so-called motion artifacts.
Nevertheless, future studies could investigate this in more depth. For example, models could be
trained on data specifically augmented with image sequences in which patient movement is
simulated. In clinical practice, these models could offer a solution for patients for whom
otherwise no perfusion parameter maps would be available.
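The proposed augmentation with simulated patient movement could look roughly like the sketch below. From a chosen time point onward, every frame of a slice time series is shifted in-plane, so that the time curve of a fixed voxel no longer follows the same tissue. A rigid shift with zero padding is a strong simplification and an assumption made here, since real motion also involves rotation and through-plane movement. All names are hypothetical.

```python
def shift_frame(frame, dy, dx, fill=0):
    """Translate a 2D frame by (dy, dx) pixels, filling uncovered pixels with `fill`."""
    height, width = len(frame), len(frame[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                shifted[ny][nx] = frame[y][x]
    return shifted


def simulate_motion(sequence, start, dy, dx):
    """Shift all frames from time index `start` onward to mimic a sudden head movement."""
    return [
        frame if t < start else shift_frame(frame, dy, dx)
        for t, frame in enumerate(sequence)
    ]


def voxel_curve(sequence, y, x):
    """Signal over time at one fixed voxel position, as used by voxel-wise methods."""
    return [frame[y][x] for frame in sequence]


static = [[[1, 2], [3, 4]] for _ in range(3)]
moved = simulate_motion(static, start=1, dy=0, dx=1)
print(voxel_curve(static, 0, 0), voxel_curve(moved, 0, 0))
```

The printed curves differ after the simulated movement even though the underlying tissue signal is unchanged, which is the misalignment that breaks a voxel-wise calculation.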
Similar GAN models could also be utilized for cross-modality synthesis. For example,
MR images can be translated into CT images [85] or into positron emission tomography
images [134]. These approaches could also be leveraged to merge existing datasets that were
initially in two different modalities and thus increase data availability. In the same way,
perfusion parameter maps could also be generated from a modality or sequence other than DSC-MRI,
since DSC-MRI requires the injection of a contrast agent to which some patients might be
allergic [135].
Promising candidates are images containing fine-grained information about the vessels such
as TOF-MRA. Since perfusion-weighted imaging would not need to be additionally recorded,
this could potentially speed up the clinical routine. Image-to-image translations have vast
applications in the medical domain, and we believe GAN solutions can substantially impact
patient care not only in stroke but also in medical imaging in general.
8.2.3 Challenges and Opportunities for GANs in Medical Imaging
GANs can be regarded as a universal concept that is not tied to a specific type of model
architecture. Nowadays, most GAN architectures in medical imaging rely on deep convolutional
networks [30]. Therefore, the limitations of these networks also concern GANs. For example,
deep neural networks are regarded as black box approaches [136]. Especially in the medical
field, explainable algorithms are crucial for building the patient’s and physician’s trust in the
ML system. The field of explainable AI aims to find solutions to the opaqueness of neural
networks either by developing interpretable algorithms or finding retrospective explanations
for the model’s behavior. This research can also be beneficial to GANs as the architecture of
both the generator and the discriminator could be replaced by more explainable algorithms.
If our GAN in Chapter 7 were not only clinically validated but the internal workings
of its generator could additionally be explained in more detail, our approach for
perfusion map generation might be accepted by more clinicians as an alternative to conventional
approaches in clinical practice. Very recent studies have started to investigate this for GANs
using non-medical data [137] as well as for a GAN trained on CT images [138]. However, more
research needs to be conducted to incorporate explainability into GAN architectures which
could eventually also be applied to our GAN architecture for synthesizing perfusion parameter
maps.
Furthermore, related GAN approaches aim to disentangle features and thus make the
synthetic images more interpretable. Our work synthesized images based on an intrinsic
representation the GAN has learned without explicitly knowing what kind of features this
entails. To shed light on this, researchers aimed to learn disentangled representations by
associating meaning to the latent variables that are fed into the generator, for example, by
utilizing an InfoGAN architecture [139]. After training the network, the latent variable can
control a subset of features within the image [139]. In an example of non-medical images,
the width or rotation of a synthetic handwritten digit can be altered by fixing the noise
vector and carefully adjusting the latent vector. A similar approach can also be used in
the medical field. For instance, Toda et al. utilized an InfoGAN architecture to generate
lung tumors in different shapes and sizes [140]. The authors tested their synthetic data by
augmenting data for lung cancer classification and achieved a better result than when augmenting
with WGAN-created synthetic data. Other popular strategies for disentanglement using GAN
models involve disentangling content and style, which can be utilized for data augmentation
as well as cross-modality synthesis [141].
Generally, disentangled GAN approaches are especially valuable for patients with rare
diseases that are usually underrepresented in a dataset. Here, the data of those patients could
be augmented, which could improve performance in the medical task. In our use case of brain
vessel segmentation (Chapters 4–6), the biggest challenge is to segment rare pathologies [38,
57]. A disentangled model that could generate pathologies in a controlled manner and
augment the training data of segmentation networks with them might improve their performance. So far, not all
disentanglement strategies developed have been tested in the medical field [141]. Therefore,
we believe more explainable and interpretable models, including feature disentanglement, will
emerge to make GANs in healthcare more trustworthy.
Another promising neural network architecture for GANs in the medical field is vision
transformers [142, 143]. Transformers stem from the field of natural language processing
and have recently been adapted to images. A vision transformer takes image patches
as an input, embeds them, and then feeds them through a transformer encoder. This
encoder consists of different layers, including multi-headed attention layers. The built-in
attention within a vision transformer offers the opportunity to visualize attention maps that
can provide insights into the model decision process [144]. Thus, by design, they can be
regarded as more interpretable compared to convolutional neural networks. This difference
in interpretability, however, needs to be further evaluated in future studies [144]. Vision
transformer models particularly profit from large datasets, and in such a setting, they have
already outperformed deep convolutional networks [142]. Several studies have tested vision
transformers on medical images. Still, they rely on pre-training on large datasets only
available for non-medical images to achieve a performance comparable to convolutional neural
networks [143]. Only a few studies have investigated the combination of GANs and vision
transformers in the medical field [143, 145]. With increasing data availability in the medical
imaging field – which could be achieved, for instance, by data sharing – they might be a useful
architecture for an ML model on medical images as well as medical image synthesis using GANs.
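The first step of a vision transformer described above, cutting an image into patches before embedding them, can be sketched as follows. Only the patch extraction is shown. The learned embedding and the attention layers are omitted, and the function name and the non-overlapping square patch layout are illustrative assumptions.

```python
def image_to_patches(image, patch_size):
    """Split a 2D image into non-overlapping square patches, each flattened to a list."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by the patch size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patches.append([
                image[y][x]
                for y in range(top, top + patch_size)
                for x in range(left, left + patch_size)
            ])
    return patches


# A 4x4 toy image with pixel values 0..15, split into four 2x2 patches.
toy_image = [[4 * row + col for col in range(4)] for row in range(4)]
for patch in image_to_patches(toy_image, 2):
    print(patch)
```

Each flattened patch plays the role of one token in the sequence passed to the transformer encoder, which is why attention maps can later be related back to image regions.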
Another major challenge for synthetic data, in general, is a proper evaluation. While
a human rater could directly estimate the data quality, this approach is time-consuming,
especially when a trained medical expert is needed. Many metrics have been proposed to
automate and quantify the evaluation of synthetic images; they are mostly based on comparing
the intensity distributions or the activations in a pre-trained network [30, 97]. To date, no
standardized metric exists [97]. When the ground truth is available (as in Chapter 7), metrics
such as the structural similarity index measure or peak signal-to-noise ratio can be computed. However,
these metrics might not fully capture important properties of medical images [146]. Recently,
Alaa et al. proposed a more holistic, model-agnostic approach to evaluate synthetic data [147].
They identified fidelity, diversity, and generalization as key components for the evaluation.
These were translated into a three-dimensional evaluation metric consisting of α-Precision,
β-Recall, and Authenticity. The authors were able to improve the synthetic data via model
auditing using their three-part metric. This was confirmed by increased performance in the downstream
task. Such holistic evaluation strategies for synthetic data are a promising new direction for
future GAN evaluation.
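Of the ground-truth-based metrics mentioned above, the peak signal-to-noise ratio is the simplest to state: it is ten times the base-10 logarithm of the squared intensity range divided by the mean squared error. The sketch below assumes flattened images and a known intensity range. The parameter name `data_range` is an illustrative choice, and the exact evaluation settings of Chapter 7 are not restated here.

```python
import math


def psnr(reference, synthetic, data_range):
    """Peak signal-to-noise ratio in dB between two flattened images of equal size."""
    if len(reference) != len(synthetic):
        raise ValueError("images must contain the same number of pixels")
    mse = sum((r - s) ** 2 for r, s in zip(reference, synthetic)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)


ground_truth = [0, 255, 128, 64]
generated = [0, 250, 128, 69]
print(psnr(ground_truth, generated, data_range=255))
```

Because the metric depends only on pixel-wise differences, two images with the same error magnitude receive the same score regardless of where the errors occur, which is one reason such metrics may miss clinically relevant properties.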
Research on GANs still faces several challenges. For instance, training a
GAN is computationally expensive, especially when training multiple generators and/or
discriminators on 3D medical images. Adapted software and hardware could facilitate GAN
training further [148]. Other main challenges revolve around problems such as vanishing
gradients, mode collapse, and unstable training. While there are already improvements for
these problems, they remain challenges that need to be addressed [149]. Nevertheless, GANs
have accelerated the field of generative models and have set new standards for synthesizing
high-quality images. By generating high-resolution, realistic-looking synthetic images
that retain their predictive properties, GANs could revolutionize the field of medical imaging,
both by mitigating the lack of available data and by enabling efficient processing of medical images. Future
studies should consider privacy aspects, explainability, and clinical validation.
References
[1]
The GBD 2016 Lifetime Risk of Stroke Collaborators. “Global, Regional, and Country-Specific
Lifetime Risks of Stroke, 1990 and 2016”. In: New England Journal of Medicine 379.25 (2018),
pp. 2429–2437. doi:10.1056/NEJMoa1804492.
[2]
V. L. Feigin, B. A. Stark, C. O. Johnson, G. A. Roth, C. Bisignano, et al. “Global, regional,
and national burden of stroke and its risk factors, 1990–2019: a systematic analysis for the
Global Burden of Disease Study 2019”. In: The Lancet Neurology 20.10 (2021), pp. 795–820.
doi:10.1016/S1474-4422(21)00252-0.
[3]
J. Burn, M. Dennis, J. Bamford, P. Sandercock, D. Wade, and C. Warlow. “Long-term risk
of recurrent stroke after a first-ever stroke. The Oxfordshire Community Stroke Project.” In:
Stroke 25.2 (1994), pp. 333–337. doi:10.1161/01.STR.25.2.333.
[4]
J. F. d. Carmo, R. L. Morelato, H. P. Pinto, and E. R. A. d. Oliveira. “Disability after
stroke: a systematic review”. In: Fisioterapia em Movimento 28 (2015), pp. 407–418. doi:
10.1590/0103-5150.028.002.AR02.
[5]
H. A. Wafa, C. D. Wolfe, E. Emmett, G. A. Roth, C. O. Johnson, and Y. Wang. “Burden of Stroke
in Europe”. In: Stroke 51.8 (2020), pp. 2418–2427. doi:10.1161/STROKEAHA.120.029606.
[6]
R. Luengo-Fernandez, M. Violato, P. Candio, and J. Leal. “Economic burden of stroke across
Europe: A population-based cost analysis”. In: European Stroke Journal 5.1 (2020), pp. 17–25.
doi:10.1177/2396987319883160.
[7]
B. Rehani, S. G. Ammanuel, Y. Zhang, W. Smith, D. L. Cooke, S. W. Hetts, S. A. Josephson,
A. Kim, J. C. Hemphill, and W. Dillon. “A New Era of Extended Time Window Acute
Stroke Interventions Guided by Imaging”. In: The Neurohospitalist 10.1 (2020), pp. 29–37. doi:
10.1177/1941874419870701.
[8]
G. Thomalla and C. Gerloff. “Acute imaging for evidence-based treatment of ischemic stroke”. In:
Current Opinion in Neurology 32.4 (2019), pp. 521–529. doi:10.1097/WCO.0000000000000716.
[9]
A. S. Lundervold and A. Lundervold. “An overview of deep learning in medical imaging focusing
on MRI”. In: Zeitschrift für Medizinische Physik. Special Issue: Deep Learning in Medical
Physics 29.2 (2019), pp. 102–127. doi:10.1016/j.zemedi.2018.11.002.
[10]
A. Krizhevsky, I. Sutskever, and G. E. Hinton. “ImageNet classification with deep convolutional
neural networks”. In: Communications of the ACM 60.6 (2017), pp. 84–90. doi:10.1145/3065386.
[11]
J.-G. Lee, S. Jun, Y.-W. Cho, H. Lee, G. B. Kim, J. B. Seo, and N. Kim. “Deep Learning in
Medical Imaging: General Overview”. In: Korean Journal of Radiology 18.4 (2017), pp. 570–584.
doi:10.3348/kjr.2017.18.4.570.
[12]
C. Tian, L. Fei, W. Zheng, Y. Xu, W. Zuo, and C.-W. Lin. “Deep learning on image denoising: An
overview”. In: Neural Networks 131 (2020), pp. 251–275. doi:10.1016/j.neunet.2020.07.025.
[13]
A. Yala, C. Lehman, T. Schuster, T. Portnoi, and R. Barzilay. “A Deep Learning Mammography-
based Model for Improved Breast Cancer Risk Prediction”. In: Radiology 292.1 (2019), pp. 60–66.
doi:10.1148/radiol.2019182716.
[14]
Z. Guo, X. Li, H. Huang, N. Guo, and Q. Li. “Deep Learning-Based Image Segmentation
on Multimodal Medical Imaging”. In: IEEE Transactions on Radiation and Plasma Medical
Sciences 3.2 (2019), pp. 162–169. doi:10.1109/TRPMS.2018.2890359.
[15]
R. Aggarwal, V. Sounderajah, G. Martin, D. S. W. Ting, A. Karthikesalingam, D. King,
H. Ashrafian, and A. Darzi. “Diagnostic accuracy of deep learning in medical imaging: a
systematic review and meta-analysis”. In: npj Digital Medicine 4.1 (2021), pp. 1–23. doi:
10.1038/s41746-021-00438-z.
[16]
L. Lu, L. Dercle, B. Zhao, and L. H. Schwartz. “Deep learning for the prediction of early
on-treatment response in metastatic colorectal cancer from serial medical imaging”. In: Nature
Communications 12.1 (2021), p. 6654. doi:10.1038/s41467-021-26990-6.
[17]
K. C. Ho, W. Speier, S. El-Saden, and C. W. Arnold. “Classifying Acute Ischemic Stroke Onset
Time using Deep Imaging Features”. In: AMIA Annual Symposium Proceedings 2017 (2018),
pp. 892–901. issn: 1942-597X. url:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5977679/.
[18]
L. Chen, P. Bentley, and D. Rückert. “Fully automatic acute ischemic lesion segmentation in
DWI using convolutional neural networks”. In: NeuroImage: Clinical 15 (2017), pp. 633–643.
doi:10.1016/j.nicl.2017.06.016.
[19]
N. Stier, N. Vincent, D. Liebeskind, and F. Scalzo. “Deep learning of tissue fate features in acute
ischemic stroke”. In: 2015 IEEE International Conference on Bioinformatics and Biomedicine
(BIBM). 2015, pp. 1316–1321. doi:10.1109/BIBM.2015.7359869.
[20]
A. Nielsen, M. B. Hansen, A. Tietze, and K. Mouridsen. “Prediction of Tissue Outcome and
Assessment of Treatment Effect in Acute Ischemic Stroke Using Deep Learning”. In: Stroke 49.6
(2018), pp. 1394–1401. doi:10.1161/STROKEAHA.117.019740.
[21]
H. Kamal, V. Lopez, and S. A. Sheth. “Machine Learning in Acute Ischemic Stroke
Neuroimaging”. In: Frontiers in Neurology 9 (2018). doi:10.3389/fneur.2018.00945.
[22]
M. J. Willemink, W. A. Koszek, C. Hardell, J. Wu, D. Fleischmann, H. Harvey, L. R. Folio,
R. M. Summers, D. L. Rubin, and M. P. Lungren. “Preparing Medical Imaging Data for Machine
Learning”. In: Radiology 295.1 (2020), pp. 4–15. doi:10.1148/radiol.2020192224.
[23]
C. G. Schwarz, W. K. Kremers, T. M. Therneau, R. R. Sharp, J. L. Gunter, P. Vemuri, A. Arani,
A. J. Spychalla, K. Kantarci, D. S. Knopman, R. C. Petersen, and C. R. Jack. “Identification
of Anonymous MRI Research Participants with Face-Recognition Software”. In: New England
Journal of Medicine 381.17 (2019), pp. 1684–1686. doi:10.1056/NEJMc1908881.
[24]
D. Abramian and A. Eklund. “Refacing: Reconstructing Anonymized Facial Features Using
GANS”. In: 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019).
2019, pp. 1104–1108. doi:10.1109/ISBI.2019.8759515.
[25]
D. Duan, S. Xia, I. Rekik, Z. Wu, L. Wang, W. Lin, J. H. Gilmore, D. Shen, and G. Li.
“Individual identification and individual variability analysis based on cortical folding features in
developing infant singletons and twins”. In: Human Brain Mapping 41.8 (2020), pp. 1985–2003.
doi:10.1002/hbm.24924.
[26]
I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville,
and Y. Bengio. “Generative Adversarial Networks”. In: arXiv (2014). doi:10.48550/ARXIV.1406.2661.
[27]
K. Armanious, C. Jiang, M. Fischer, T. Küstner, T. Hepp, K. Nikolaou, S. Gatidis, and B. Yang.
“MedGAN: Medical image translation using GANs”. In: Computerized Medical Imaging and
Graphics 79 (2020). doi:10.1016/j.compmedimag.2019.101684.
[28]
Z. Yin, K. Xia, Z. He, J. Zhang, S. Wang, and B. Zu. “Unpaired Image Denoising via Wasserstein
GAN in Low-Dose CT Image with Multi-Perceptual Loss and Fidelity Loss”. In: Symmetry 13.1
(2021). doi:10.3390/sym13010126.
[29]
H.-C. Shin, N. A. Tenenholtz, J. K. Rogers, C. G. Schwarz, M. L. Senjem, J. L. Gunter,
K. P. Andriole, and M. Michalski. “Medical Image Synthesis for Data Augmentation and
Anonymization Using Generative Adversarial Networks”. In: Simulation and Synthesis in
Medical Imaging. Ed. by A. Gooya, O. Goksel, I. Oguz, and N. Burgos. Lecture Notes in
Computer Science. Cham: Springer International Publishing, 2018, pp. 1–11. isbn: 978-3-030-
00536-8. doi:10.1007/978-3-030-00536-8_1.
[30]
X. Yi, E. Walia, and P. Babyn. “Generative Adversarial Network in Medical Imaging: A Review”.
In: Medical Image Analysis 58 (2019). doi:10.1016/j.media.2019.101552.
[31]
J. Rubin and S. M. Abulnaga. “CT-To-MR Conditional Generative Adversarial Networks for
Ischemic Stroke Lesion Segmentation”. In: 2019 IEEE International Conference on Healthcare
Informatics (ICHI). 2019, pp. 1–7. doi:10.1109/ICHI.2019.8904574.
[32]
L. Bi, J. Kim, A. Kumar, D. Feng, and M. Fulham. “Synthesis of Positron Emission Tomography
(PET) Images via Multi-channel Generative Adversarial Networks (GANs)”. In: Molecular
Imaging, Reconstruction and Analysis of Moving Body Organs, and Stroke Imaging and
Treatment. Ed. by M. J. Cardoso, T. Arbel, F. Gao, B. Kainz, T. van Walsum, K. Shi,
K. K. Bhatia, R. Peter, T. Vercauteren, M. Reyes, A. Dalca, R. Wiest, W. Niessen, and B. J.
Emmer. Cham: Springer International Publishing, 2017, pp. 43–51. isbn: 978-3-319-67564-0.
doi:10.1007/978-3-319-67564-0_5.
[33]
G. Kwon, C. Han, and D.-s. Kim. “Generation of 3D Brain MRI Using Auto-Encoding Generative
Adversarial Networks”. In: Medical Image Computing and Computer Assisted Intervention –
MICCAI 2019. Ed. by D. Shen, T. Liu, T. M. Peters, L. H. Staib, C. Essert, S. Zhou, P.-T. Yap,
and A. Khan. Cham: Springer International Publishing, 2019, pp. 118–126. isbn: 978-3-030-32248-9.
doi:10.1007/978-3-030-32248-9_14.
[34]
T. Kossen, P. Subramaniam, V. I. Madai, A. Hennemuth, K. Hildebrand, A. Hilbert, J. Sobesky,
M. Livne, I. Galinovic, A. A. Khalil, J. B. Fiebach, and D. Frey. “Synthesizing anonymized and
labeled TOF-MRA patches for brain vessel segmentation using generative adversarial networks”.
In: Computers in Biology and Medicine 131 (2021). doi:10.1016/j.compbiomed.2021.104254.
[35]
P. Subramaniam, T. Kossen, K. Ritter, A. Hennemuth, K. Hildebrand, A. Hilbert, J. Sobesky,
M. Livne, I. Galinovic, A. A. Khalil, J. B. Fiebach, D. Frey, and V. I. Madai. “Generating
3D TOF-MRA Volumes and Segmentation Labels using Generative Adversarial Networks”. In:
Medical Image Analysis (2022). doi:10.1016/j.media.2022.102396.
[36]
T. Kossen, M. A. Hirzel, V. I. Madai, F. Boenisch, A. Hennemuth, K. Hildebrand, S. Pokutta,
K. Sharma, A. Hilbert, J. Sobesky, I. Galinovic, A. A. Khalil, J. B. Fiebach, and D. Frey.
“Toward Sharing Brain Images: Differentially Private TOF-MRA Images With Segmentation
Labels Using Generative Adversarial Networks”. In: Frontiers in Artificial Intelligence 5 (2022).
doi:10.3389/frai.2022.813842.
[37]
T. Kossen, V. I. Madai, M. A. Mutke, A. Hennemuth, K. Hildebrand, J. Behland, A. Hilbert,
J. Sobesky, M. Bendszus, and D. Frey. “Image-to-image generative adversarial networks for
synthesizing perfusion parameter maps from DSC-MR images in cerebrovascular disease”. In:
medRxiv (2022). doi:10.1101/2022.05.24.22274901.
[38]
M. Livne, J. Rieger, O. U. Aydin, A. A. Taha, E. M. Akay, T. Kossen, J. Sobesky, J. D.
Kelleher, K. Hildebrand, D. Frey, and V. I. Madai. “A U-Net Deep Learning Framework for
High Performance Vessel Segmentation in Patients With Cerebrovascular Disease”. In: Frontiers
in Neuroscience 13 (2019), p. 97. doi:10.3389/fnins.2019.00097.
[39]
M. Ivantsits, L. Goubergrits, J.-M. Kuhnigk, M. Huellebrand, J. Brüning, T. Kossen, B.
Pfahringer, J. Schaller, A. Spuler, T. Kuehne, and A. Hennemuth. “Cerebral Aneurysm Detection
and Analysis Challenge 2020 (CADA)”. In: Cerebral Aneurysm Detection and Analysis: First
Challenge, CADA 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, October 8,
2020, Proceedings. Lima, Peru: Springer-Verlag, 2020, pp. 3–17. isbn: 978-3-030-72861-8. doi:
10.1007/978-3-030-72862-5_1.
[40]
M. Ivantsits, L. Goubergrits, J.-M. Kuhnigk, M. Huellebrand, J. Bruening, T. Kossen, B.
Pfahringer, J. Schaller, A. Spuler, T. Kuehne, Y. Jia, X. Li, S. Shit, B. Menze, Z. Su, J. Ma, Z.
Nie, K. Jain, Y. Liu, Y. Lin, and A. Hennemuth. “Detection and analysis of cerebral aneurysms
based on X-ray rotational angiography - the CADA 2020 challenge”. In: Medical Image Analysis
77 (2022). doi:10.1016/j.media.2021.102333.
[41]
A. Meddeb, T. Kossen, K. K. Bressem, B. Hamm, and S. N. Nagel. “Evaluation of a Deep
Learning Algorithm for Automated Spleen Segmentation in Patients with Conditions Directly
or Indirectly Affecting the Spleen”. In: Tomography 7.4 (2021), pp. 950–960.
doi:10.3390/tomography7040078.
[42]
A. K. A. Unnithan and P. Mehta. Hemorrhagic Stroke. StatPearls Publishing, Treasure Island
(FL), 2021. url:http://europepmc.org/books/NBK559173.
[43]
S. A. Randolph. “Ischemic Stroke”. In: Workplace Health & Safety 64.9 (2016), pp. 444–444.
doi:10.1177/2165079916665400.
[44]
M. T. Beckhauser, L. H. Castro-Afonso, F. A. Dias, G. S. Nakiri, L. M. Monsignore, R. K.
Martins Filho, M. R. Camilo, F. F. Aléssio Alves, M. Libardi, G. R. Rodrigues, O. M. Pontes-
Neto, and D. G. Abud. “Extended Time Window Mechanical Thrombectomy for Acute Stroke
in Brazil”. In: Journal of Stroke and Cerebrovascular Diseases: The Official Journal of National
Stroke Association 29.10 (2020). doi:10.1016/j.jstrokecerebrovasdis.2020.105134.
[45]
A. Wouters, R. Lemmens, P. Dupont, and V. Thijs. “Wake-Up Stroke and Stroke of Unknown
Onset: A Critical Review”. In: Frontiers in Neurology 5 (2014). doi:10.3389/fneur.2014.00153.
[46]
O. Shafaat and H. Sotoudeh. “Stroke Imaging”. In: StatPearls. Treasure Island (FL): StatPearls
Publishing, 2022. url:http://www.ncbi.nlm.nih.gov/books/NBK546635/.
[47]
A. J. M. Kiruluta and R. G. González. “Chapter 7 - Magnetic resonance angiography: physical
principles and applications”. In: Handbook of Clinical Neurology. Ed. by J. C. Masdeu and
R. G. González. Vol. 135. Neuroimaging Part I. Elsevier, 2016, pp. 137–149.
doi:10.1016/B978-0-444-53485-9.00007-6.
[48]
D. Chien and R. R. Edelman. “Basic principles and clinical applications of magnetic resonance
angiography”. In: Seminars in Roentgenology. Noninvasive Vascular Imaging 27.1 (1992), pp. 53–
62. doi:10.1016/0037-198X(92)90046-5.
[49]
J. C. Carr and T. J. Carroll. Magnetic Resonance Angiography: Principles and Applications.
Springer Science & Business Media, 2011. isbn: 978-1-4419-1685-3.
doi:10.1007/978-1-4419-1686-0.
[50]
T. Sichtermann, A. Faron, R. Sijben, N. Teichert, J. Freiherr, and M. Wiesmann. “Deep
Learning–Based Detection of Intracranial Aneurysms in 3D TOF-MRA”. In: American Journal
of Neuroradiology 40.1 (2019), pp. 25–32. doi:10.3174/ajnr.A5911.
[51]
C. G. Choi, D. H. Lee, J. H. Lee, H. W. Pyun, D. W. Kang, S. U. Kwon, J. K. Kim, S. J.
Kim, and D. C. Suh. “Detection of Intracranial Atherosclerotic Steno-Occlusive Disease with
3D Time-of-Flight Magnetic Resonance Angiography with Sensitivity Encoding at 3T”. In:
American Journal of Neuroradiology 28.3 (2007), pp. 439–446. issn: 0195-6108, 1936-959X. url:
http://www.ajnr.org/content/28/3/439.
[52]
X. Gong, Z. Chen, F. Shi, M. Zhang, C. Xu, R. Zhang, and M. Lou. “Conveniently-Grasped
Field Assessment Stroke Triage (CG-FAST): A Modified Scale to Detect Large Vessel Occlusion
Stroke”. In: Frontiers in Neurology 10 (2019). doi:10.3389/fneur.2019.00390.
[53]
G. Turc, P. Bhogal, U. Fischer, P. Khatri, K. Lobotesis, M. Mazighi, P. D. Schellinger, D.
Toni, J. de Vries, P. White, and J. Fiehler. “European Stroke Organisation (ESO) – European
Society for Minimally Invasive Neurological Therapy (ESMINT) Guidelines on Mechanical
Thrombectomy in Acute Ischaemic Stroke Endorsed by Stroke Alliance for Europe (SAFE)”. In:
European Stroke Journal 4.1 (2019), pp. 6–12. doi:10.1177/2396987319832140.
[54]
J. Gutierrez, K. Cheung, A. Bagci, T. Rundek, N. Alperin, R. L. Sacco, C. B. Wright, and
M. S. V. Elkind. “Brain Arterial Diameters as a Risk Factor for Vascular Events”. In: Journal
of the American Heart Association 4.8 (2015). doi:10.1161/JAHA.115.002289.
[55]
D. Frey, M. Livne, H. Leppin, E. M. Akay, O. U. Aydin, J. Behland, J. Sobesky, P. Vajkoczy,
and V. I. Madai. “A precision medicine framework for personalized simulation of hemodynamics
in cerebrovascular disease”. In: BioMedical Engineering OnLine 20.1 (2021), p. 44.
doi:10.1186/s12938-021-00880-w.
[56]
O. Ronneberger, P. Fischer, and T. Brox. “U-Net: Convolutional Networks for Biomedical Image
Segmentation”. In: arXiv (2015). doi:10.48550/ARXIV.1505.04597.
[57]
A. Hilbert, V. I. Madai, E. M. Akay, O. U. Aydin, J. Behland, J. Sobesky, I. Galinovic,
A. A. Khalil, A. A. Taha, J. Wuerfel, P. Dusek, T. Niendorf, J. B. Fiebach, D. Frey, and M.
Livne. “BRAVE-NET: Fully Automated Arterial Brain Vessel Segmentation in Patients With
Cerebrovascular Disease”. In: Frontiers in Artificial Intelligence 3 (2020).
doi:10.3389/frai.2020.552258.
[58]
B. J. Kim. “Principles and Practical Application of Brain MRI in Acute Ischemic Stroke”.
In: Stroke Revisited: Diagnosis and Treatment of Ischemic Stroke. Ed. by S.-H. Lee. Stroke
Revisited. Singapore: Springer, 2017, pp. 109–119. isbn: 978-981-10-1424-6.
doi:10.1007/978-981-10-1424-6_10.
[59]
C. M. Ermine, A. Bivard, M. W. Parsons, and J.-C. Baron. “The ischemic penumbra: From
concept to reality”. In: International Journal of Stroke 16 (2021), pp. 497–509.
doi:10.1177/1747493020975229.
[60]
C. S. Kidwell, J. R. Alger, and J. L. Saver. “Evolving Paradigms in Neuroimaging of the Ischemic
Penumbra”. In: Stroke 35.11_suppl_1 (2004), pp. 2662–2665.
doi:10.1161/01.STR.0000143222.13069.70.
[61]
M. Straka, G. W. Albers, and R. Bammer. “Real-time diffusion-perfusion mismatch analysis
in acute stroke”. In: Journal of Magnetic Resonance Imaging 32.5 (2010), pp. 1024–1037. doi:
10.1002/jmri.22338.
[62]
P. D. Schellinger, J. B. Fiebach, and W. Hacke. “Imaging-Based Decision Making in Thrombolytic
Therapy for Ischemic Stroke”. In: Stroke 34.2 (2003), pp. 575–583.
doi:10.1161/01.STR.0000051504.10095.9C.
[63]
P. D. Schellinger and J. B. Fiebach. “Perfusion-Weighted Imaging/Diffusion-Weighted Imaging
Mismatch on MRI Can Now Be Used to Select Patients for Recombinant Tissue Plasminogen
Activator Beyond 3 Hours”. In: Stroke 36.5 (2005), pp. 1098–1101.
doi:10.1161/01.STR.0000162388.67745.8d.
[64]
L. Østergaard. “Principles of cerebral perfusion imaging by bolus tracking”. In: Journal of
Magnetic Resonance Imaging 22.6 (2005), pp. 710–717. doi:10.1002/jmri.20460.
[65]
B. J. Kim, H. G. Kang, H.-J. Kim, S.-H. Ahn, N. Y. Kim, S. Warach, and D.-W. Kang.
“Magnetic Resonance Imaging in Acute Ischemic Stroke Treatment”. In: Journal of Stroke 16.3
(2014), pp. 131–145. doi:10.5853/jos.2014.16.3.131.
[66]
K. Suzuki. “Overview of deep learning in medical imaging”. In: Radiological Physics and
Technology 10.3 (2017), pp. 257–273. doi:10.1007/s12194-017-0406-5.
[67]
B. Sahiner, A. Pezeshk, L. M. Hadjiiski, X. Wang, K. Drukker, K. H. Cha, R. M. Summers, and
M. L. Giger. “Deep learning in medical imaging and radiation therapy”. In: Medical Physics
46.1 (2019), e1–e36. doi:10.1002/mp.13264.
[68]
J. Hayes, L. Melis, G. Danezis, and E. D. Cristofaro. “LOGAN: Membership Inference Attacks
Against Generative Models”. In: Proceedings on Privacy Enhancing Technologies 2019.1 (2019),
pp. 133–152. doi:10.2478/popets-2019-0008.
[69]
D. Chen, N. Yu, Y. Zhang, and M. Fritz. “GAN-Leaks: A Taxonomy of Membership Inference
Attacks against Generative Models”. In: Proceedings of the 2020 ACM SIGSAC Conference on
Computer and Communications Security. ACM, 2020. doi:10.1145/3372297.3417238.
[70]
G. A. Kaissis, M. R. Makowski, D. Rückert, and R. F. Braren. “Secure, privacy-preserving and
federated machine learning in medical imaging”. In: Nature Machine Intelligence 2.6 (2020),
pp. 305–311. doi:10.1038/s42256-020-0186-1.
[71]
H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y. Arcas. “Communication-Efficient
Learning of Deep Networks from Decentralized Data”. In: arXiv (2016).
doi:10.48550/ARXIV.1602.05629.
[72]
C. Dwork. “Differential Privacy: A Survey of Results”. In: Theory and Applications of Models
of Computation. Ed. by M. Agrawal, D. Du, Z. Duan, and A. Li. Lecture Notes in Computer
Science. Berlin, Heidelberg: Springer, 2008, pp. 1–19. isbn: 978-3-540-79228-4.
doi:10.1007/978-3-540-79228-4_1.
[73]
V. Cheng, V. M. Suriyakumar, N. Dullerud, S. Joshi, and M. Ghassemi. “Can You Fake It Until
You Make It?: Impacts of Differentially Private Synthetic Data on Downstream Classification
Fairness”. In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and
Transparency. Virtual Event Canada: ACM, 2021, pp. 149–160. isbn: 978-1-4503-8309-7. doi:
10.1145/3442188.3445879.
[74]
T. Schlegl, P. Seeböck, S. M. Waldstein, G. Langs, and U. Schmidt-Erfurth. “f-AnoGAN:
Fast unsupervised anomaly detection with generative adversarial networks”. In: Medical Image
Analysis 54 (2019), pp. 30–44. doi:10.1016/j.media.2019.01.010.
[75]
M. D. Cirillo, D. Abramian, and A. Eklund. “Vox2Vox: 3D-GAN for Brain Tumour
Segmentation”. In: arXiv (2020). doi:10.48550/ARXIV.2003.13653.
[76]
M. Frid-Adar, I. Diamant, E. Klang, M. Amitai, J. Goldberger, and H. Greenspan. “GAN-
based synthetic medical image augmentation for increased CNN performance in liver lesion
classification”. In: Neurocomputing 321 (2018), pp. 321–331. doi:10.1016/j.neucom.2018.09.013.
[77]
J. M. Wolterink, T. Leiner, M. A. Viergever, and I. Išgum. “Generative Adversarial Networks
for Noise Reduction in Low-Dose CT”. In: IEEE Transactions on Medical Imaging 36.12 (2017),
pp. 2536–2545. doi:10.1109/TMI.2017.2708987.
[78]
B. Yu, L. Zhou, L. Wang, Y. Shi, J. Fripp, and P. Bourgeat. “Ea-GANs: Edge-Aware Generative
Adversarial Networks for Cross-Modality MR Image Synthesis”. In: IEEE Transactions on
Medical Imaging 38.7 (2019), pp. 1750–1762. doi:10.1109/TMI.2019.2895894.
[79]
M. AlAmir and M. AlGhamdi. “The Role of Generative Adversarial Network in Medical Image
Analysis: An in-depth survey”. In: ACM Computing Surveys (2022). doi:10.1145/3527849.
[80]
M. Arjovsky, S. Chintala, and L. Bottou. “Wasserstein GAN”. In: arXiv (2017).
doi:10.48550/ARXIV.1701.07875.
[81]
P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. “Image-to-Image Translation with Conditional
Adversarial Networks”. In: arXiv (2016). doi:10.48550/ARXIV.1611.07004.
[82]
A. Radford, L. Metz, and S. Chintala. “Unsupervised Representation Learning with Deep
Convolutional Generative Adversarial Networks”. In: arXiv (2015).
doi:10.48550/ARXIV.1511.06434.
[83]
T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen. “Improved
Techniques for Training GANs”. In: arXiv (2016). doi:10.48550/ARXIV.1606.03498.
[84]
I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville. “Improved Training of
Wasserstein GANs”. In: arXiv (2017). doi:10.48550/ARXIV.1704.00028.
[85]
K. N. D. Brou Boni, J. Klein, L. Vanquin, A. Wagner, T. Lacornerie, D. Pasquier, and N.
Reynaert. “MR to CT synthesis with multicenter data in the pelvic area using a conditional
generative adversarial network”. In: Physics in Medicine & Biology 65.7 (2020).
doi:10.1088/1361-6560/ab7633.
[86]
J. Benzakoun, M.-A. Deslys, L. Legrand, G. Hmeydia, G. Turc, W. B. Hassen, S. Charron,
C. Debacker, O. Naggara, J.-C. Baron, B. Thirion, and C. Oppenheim. “Synthetic FLAIR
as a Substitute for FLAIR Sequence in Acute Ischemic Stroke”. In: Radiology 303.1 (2022),
pp. 153–159. doi:10.1148/radiol.211394.
[87]
N. K. Singh and K. Raza. “Medical Image Generation Using Generative Adversarial Networks:
A Review”. In: Health Informatics: A Computational Perspective in Healthcare. Ed. by R.
Patgiri, A. Biswas, and P. Roy. Studies in Computational Intelligence. Singapore: Springer,
2021, pp. 77–96. isbn: 9789811597350. doi:10.1007/978-981-15-9735-0_5.
[88]
Y. Wang, B. Yu, L. Wang, C. Zu, D. S. Lalush, W. Lin, X. Wu, J. Zhou, D. Shen, and L. Zhou.
“3D conditional generative adversarial networks for high-quality PET image estimation at low
dose”. In: NeuroImage 174 (2018), pp. 550–562. doi:10.1016/j.neuroimage.2018.03.045.
[89]
H. Choi, D. S. Lee, and Alzheimer’s Disease Neuroimaging Initiative. “Generation of Structural
MR Images from Amyloid PET: Application to MR-Less Quantification”. In: Journal of Nuclear
Medicine: Official Publication, Society of Nuclear Medicine 59.7 (2018), pp. 1111–1117. doi:
10.2967/jnumed.117.199414.
[90]
M. Maspero, M. H. F. Savenije, A. M. Dinkla, P. R. Seevinck, M. P. W. Intven, I. M. Jurgenliemk-
Schulz, L. G. W. Kerkmeijer, and C. A. T. v. d. Berg. “Dose evaluation of fast synthetic-CT
generation using a generative adversarial network for general pelvis MR-only radiotherapy”. In:
Physics in Medicine & Biology 63.18 (2018). doi:10.1088/1361-6560/aada6d.
[91]
M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter. “GANs Trained by a
Two Time-Scale Update Rule Converge to a Local Nash Equilibrium”. In: arXiv (2017). doi:
10.48550/ARXIV.1706.08500.
[92]
C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. “Rethinking the Inception
Architecture for Computer Vision”. In: arXiv (2015). doi:10.48550/ARXIV.1512.00567.
[93]
C. Haarburger, N. Horst, D. Truhn, M. Broeckmann, S. Schrading, C. Kuhl, and D. Merhof.
“Multiparametric Magnetic Resonance Image Synthesis using Generative Adversarial Networks”.
In: Eurographics Workshop on Visual Computing for Biology and Medicine (2019), 5 pages. doi:
10.2312/VCBM.20191226.
[94]
B. Cao, H. Zhang, N. Wang, X. Gao, and D. Shen. “Auto-GAN: Self-Supervised Collaborative
Learning for Medical Image Synthesis”. In: Proceedings of the AAAI Conference on Artificial
Intelligence 34.07 (2020), pp. 10486–10493. doi:10.1609/aaai.v34i07.6619.
[95]
S. Chen, K. Ma, and Y. Zheng. “Med3D: Transfer Learning for 3D Medical Image Analysis”.
In: arXiv (2019). doi:10.48550/ARXIV.1904.00625.
[96]
L. Tronchin, R. Sicilia, E. Cordelli, S. Ramella, and P. Soda. “Evaluating GANs in Medical
Imaging”. In: Deep Generative Models, and Data Augmentation, Labelling, and Imperfections.
Ed. by S. Engelhardt, I. Oksuz, D. Zhu, Y. Yuan, A. Mukhopadhyay, N. Heller, S. X. Huang,
H. Nguyen, R. Sznitman, and Y. Xue. Lecture Notes in Computer Science. Cham: Springer
International Publishing, 2021, pp. 112–121. isbn: 978-3-030-88210-5.
doi:10.1007/978-3-030-88210-5_10.
[97]
A. Borji. “Pros and Cons of GAN Evaluation Measures”. In: arXiv (2018).
doi:10.48550/ARXIV.1802.03446.
[98]
M. S. M. Sajjadi, O. Bachem, M. Lucic, O. Bousquet, and S. Gelly. “Assessing Generative
Models via Precision and Recall”. In: arXiv (2018). doi:10.48550/ARXIV.1806.00035.
[99]
N. Siddique, S. Paheding, C. P. Elkin, and V. Devabhaktuni. “U-Net and Its Variants for
Medical Image Segmentation: A Review of Theory and Applications”. In: IEEE Access 9 (2021),
pp. 82031–82057. doi:10.1109/ACCESS.2021.3086020.
[100]
S. Moccia, E. De Momi, S. El Hadji, and L. S. Mattos. “Blood vessel segmentation algorithms –
Review of methods, datasets and evaluation metrics”. In: Computer Methods and Programs in
Biomedicine 158 (2018), pp. 71–91. doi:10.1016/j.cmpb.2018.02.001.
[101]
Y. Chen, X.-H. Yang, Z. Wei, A. A. Heidari, N. Zheng, Z. Li, H. Chen, H. Hu, Q. Zhou, and
Q. Guan. “Generative Adversarial Networks in Medical Image augmentation: A review”. In:
Computers in Biology and Medicine 144 (2022). doi:10.1016/j.compbiomed.2022.105382.
[102]
T. Neff, C. Payer, D. Stern, and M. Urschler. “Generative Adversarial Network based Synthesis
for Supervised Medical Image Segmentation”. In: Proceedings of the OAGM & ARW Joint
Workshop Vision, Automation and Robotics (). doi:10.3217/978-3-85125-524-9-30.
[103]
J. Kugelman, D. Alonso-Caneiro, S. A. Read, S. J. Vincent, F. K. Chen, and M. J. Collins. “Data
augmentation for patch-based OCT chorio-retinal segmentation using generative adversarial
networks”. In: Neural Computing and Applications 33.13 (2021), pp. 7393–7408.
doi:10.1007/s00521-021-05826-w.
[104]
J. T. Guibas, T. S. Virdi, and P. S. Li. “Synthetic Medical Images from Dual Generative
Adversarial Networks”. In: arXiv (2017). doi:10.48550/ARXIV.1709.01872.
[105]
C. Bowles, L. Chen, R. Guerrero, P. Bentley, R. Gunn, A. Hammers, D. A. Dickie, M. V.
Hernández, J. Wardlaw, and D. Rückert. “GAN Augmentation: Augmenting Training Data
using Generative Adversarial Networks”. In: arXiv (2018). doi:10.48550/ARXIV.1810.10863.
[106]
M. Foroozandeh and A. Eklund. “Synthesizing brain tumor images and annotations by combining
progressive growing GAN and SPADE”. In: arXiv (2020). doi:10.48550/ARXIV.2009.05946.
[107]
A. Eklund. “Feeding the zombies: Synthesizing brain volumes using a 3D progressive growing
GAN”. In: arXiv (2019). doi:10.48550/ARXIV.1912.05357.
[108]
L. Sun, J. Chen, Y. Xu, M. Gong, K. Yu, and K. Batmanghelich. “Hierarchical Amortized
Training for Memory-efficient High Resolution 3D GAN”. In: arXiv (2020).
doi:10.48550/ARXIV.2008.01910.
[109]
L. Zhang, B. Shen, A. Barnawi, S. Xi, N. Kumar, and Y. Wu. “FedDPGAN: Federated
Differentially Private Generative Adversarial Networks Framework for the Detection of COVID-19
Pneumonia”. In: Information Systems Frontiers (2021). doi:10.1007/s10796-021-10144-6.
[110]
D. C. Nguyen, M. Ding, P. N. Pathirana, A. Seneviratne, and A. Y. Zomaya. “Federated Learning
for COVID-19 Detection with Generative Adversarial Networks in Edge Cloud Computing”. In:
IEEE Internet of Things Journal (2021), pp. 1–1. doi:10.1109/JIOT.2021.3120998.
[111]
Y. Xiao, K. R. Peters, W. C. Fox, J. H. Rees, D. A. Rajderkar, M. M. Arreola, I. Barreto,
W. E. Bolch, and R. Fang. “Transfer-Gan: Multimodal Ct Image Super-Resolution Via Transfer
Generative Adversarial Networks”. In: 2020 IEEE 17th International Symposium on Biomedical
Imaging (ISBI). 2020, pp. 195–198. doi:10.1109/ISBI45749.2020.9098322.
[112]
S. Kaji and S. Kida. “Overview of image-to-image translation by use of deep neural networks:
denoising, super-resolution, modality conversion, and reconstruction in medical imaging”. In:
Radiological Physics and Technology 12.3 (2019), pp. 235–248. doi:10.1007/s12194-019-00520-y.
[113]
J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. “Unpaired Image-to-Image Translation using
Cycle-Consistent Adversarial Networks”. In: arXiv (2017). doi:10.48550/ARXIV.1703.10593.
[114]
K. Armanious, C. Jiang, S. Abdulatif, T. Küstner, S. Gatidis, and B. Yang. “Unsupervised
Medical Image Translation Using Cycle-MedGAN”. In: 2019 27th European Signal Processing
Conference (EUSIPCO). 2019, pp. 1–5. doi:10.23919/EUSIPCO.2019.8902799.
[115]
J. M. Wolterink, A. M. Dinkla, M. H. F. Savenije, P. R. Seevinck, C. A. T. v. d. Berg, and
I. Isgum. “Deep MR to CT Synthesis using Unpaired Data”. In: arXiv (2017).
doi:10.48550/ARXIV.1708.01155.
[116]
O. Maier, B. H. Menze, J. von der Gablentz, L. Häni, M. P. Heinrich, M. Liebrand, S. Winzeck,
A. Basit, P. Bentley, L. Chen, D. Christiaens, F. Dutil, K. Egger, C. Feng, B. Glocker, M. Götz,
T. Haeck, H.-L. Halme, M. Havaei, K. M. Iftekharuddin, P.-M. Jodoin, K. Kamnitsas, E. Kellner,
A. Korvenoja, H. Larochelle, C. Ledig, J.-H. Lee, F. Maes, Q. Mahmood, K. H. Maier-Hein, R.
McKinley, J. Muschelli, C. Pal, L. Pei, J. R. Rangarajan, S. M. S. Reza, D. Robben, D. Rückert,
E. Salli, P. Suetens, C.-W. Wang, M. Wilms, J. S. Kirschke, U. M. Krämer, T. F. Münte, P.
Schramm, R. Wiest, H. Handels, and M. Reyes. “ISLES 2015 - A public evaluation benchmark
for ischemic stroke lesion segmentation from multispectral MRI”. In: Medical Image Analysis 35
(2017), pp. 250–269. doi:10.1016/j.media.2016.07.009.
[117]
K. Kamnitsas, C. Ledig, V. F. J. Newcombe, J. P. Simpson, A. D. Kane, D. K. Menon,
D. Rückert, and B. Glocker. “Efficient multi-scale 3D CNN with fully connected CRF for
accurate brain lesion segmentation”. In: Medical Image Analysis 36 (2017), pp. 61–78. doi:
10.1016/j.media.2016.10.004.
[118]
Y. Zhang, S. Liu, C. Li, and J. Wang. “Application of Deep Learning Method on Ischemic
Stroke Lesion Segmentation”. In: Journal of Shanghai Jiaotong University (Science) 27.1 (2022),
pp. 99–111. doi:10.1007/s12204-021-2273-9.
[119]
M. Islam, N. R. Vaidyanathan, V. J. M. Jose, and H. Ren. “Ischemic Stroke Lesion Segmentation
Using Adversarial Learning”. In: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic
Brain Injuries. Ed. by A. Crimi, S. Bakas, H. Kuijf, F. Keyvan, M. Reyes, and T. van Walsum.
Cham: Springer International Publishing, 2019, pp. 292–300. isbn: 978-3-030-11723-8. doi:
10.1007/978-3-030-11723-8_29.
[120]
H. Kuang, B. K. Menon, and W. Qiu. “Automated stroke lesion segmentation in non-contrast
CT scans using dense multi-path contextual generative adversarial network”. In: Physics in
Medicine & Biology 65.21 (2020). doi:10.1088/1361-6560/aba166.
[121]
N. Hu, T. Zhang, Y. Wu, B. Tang, M. Li, B. Song, Q. Gong, M. Wu, S. Gu, and S. Lui.
“Detecting brain lesions in suspected acute ischemic stroke with CT-based synthetic MRI
using generative adversarial networks”. In: Annals of Translational Medicine 10.2 (2022). doi:
10.21037/atm-21-4056.
[122]
S. Wang, Z. Chen, S. You, B. Wang, Y. Shen, and B. Lei. “Brain stroke lesion segmentation using
consistent perception generative adversarial network”. In: Neural Computing and Applications
(2022). doi:10.1007/s00521-021-06816-8.
[123]
M. Platscher, J. Zopes, and C. Federau. “Image translation for medical image generation:
Ischemic stroke lesion segmentation”. In: Biomedical Signal Processing and Control 72 (2022).
doi:10.1016/j.bspc.2021.103283.
[124]
M. D. Moghari, L. Zhou, B. Yu, K. Moore, N. Young, R. Fulton, and A. Kyme. “Estimation of
full-dose 4D CT perfusion images from low-dose images using conditional generative adversarial
networks”. In: 2019 IEEE Nuclear Science Symposium and Medical Imaging Conference
(NSS/MIC). 2019, pp. 1–3. doi:10.1109/NSS/MIC42101.2019.9059723.
[125]
A. Sharma and G. Hamarneh. “Missing MRI Pulse Sequence Synthesis Using Multi-Modal
Generative Adversarial Network”. In: IEEE Transactions on Medical Imaging 39.4 (2020),
pp. 1170–1183. doi:10.1109/TMI.2019.2945521.
[126]
M. F. Rachmadi, M. del C. Valdés-Hernández, S. Makin, J. M. Wardlaw, and T. Komura.
“Predicting the Evolution of White Matter Hyperintensities in Brain MRI Using Generative
Adversarial Networks and Irregularity Map”. In: Medical Image Computing and Computer
Assisted Intervention – MICCAI 2019. Ed. by D. Shen, T. Liu, T. M. Peters, L. H. Staib,
C. Essert, S. Zhou, P.-T. Yap, and A. Khan. Cham: Springer International Publishing, 2019,
pp. 146–154. isbn: 978-3-030-32248-9. doi:10.1007/978-3-030-32248-9_17.
[127]
A. Hess, R. Meier, J. Kaesmacher, S. Jung, F. Scalzo, D. Liebeskind, R. Wiest, and R. McKinley.
“Synthetic Perfusion Maps: Imaging Perfusion Deficits in DSC-MRI with Deep Learning”. In:
Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. Ed. by A. Crimi,
S. Bakas, H. Kuijf, F. Keyvan, M. Reyes, and T. van Walsum. Lecture Notes in Computer
Science. Cham: Springer International Publishing, 2019, pp. 447–455. isbn: 978-3-030-11723-8.
doi:10.1007/978-3-030-11723-8_45.
[128]
K. C. Ho, F. Scalzo, K. V. Sarma, S. El-Saden, and C. W. Arnold. “A temporal deep learning
approach for MR perfusion parameter estimation in stroke”. In: 2016 23rd International
Conference on Pattern Recognition (ICPR). Cancun: IEEE, 2016, pp. 1315–1320. isbn: 978-1-5090-4847-2.
doi:10.1109/ICPR.2016.7899819.
[129]
R. Meier, P. Lux, B. Med, S. Jung, U. Fischer, J. Gralla, M. Reyes, R. Wiest, R. McKinley,
and J. Kaesmacher. “Neural Network–derived Perfusion Maps for the Assessment of Lesions
in Patients with Acute Ischemic Stroke”. In: Radiology: Artificial Intelligence 1.5 (2019). doi:
10.1148/ryai.2019190019.
[130]
C. Xu, J. Ren, D. Zhang, Y. Zhang, Z. Qin, and K. Ren. “GANobfuscator: Mitigating Information
Leakage Under GAN via Differential Privacy”. In: IEEE Transactions on Information Forensics
and Security 14.9 (2019), pp. 2358–2371. doi:10.1109/TIFS.2019.2897874.
[131]
R. Shokri, M. Stronati, C. Song, and V. Shmatikov. “Membership Inference Attacks Against
Machine Learning Models”. In: 2017 IEEE Symposium on Security and Privacy (SP). 2017,
pp. 3–18. doi:10.1109/SP.2017.41.
[132]
N. Rieke, J. Hancox, W. Li, F. Milletarì, H. R. Roth, S. Albarqouni, S. Bakas, M. N. Galtier,
B. A. Landman, K. Maier-Hein, S. Ourselin, M. Sheller, R. M. Summers, A. Trask, D. Xu,
M. Baust, and M. J. Cardoso. “The future of digital health with federated learning”. In: npj
Digital Medicine 3.1 (2020), pp. 1–7. doi:10.1038/s41746-020-00323-1.
[133]
M. Grama, M. Musat, L. Muñoz-González, J. Passerat-Palmbach, D. Rückert, and A. Alansary.
Robust Aggregation for Adaptive Privacy Preserving Federated Learning in Healthcare. Tech. rep.
2020. doi:10.48550/ARXIV.2009.08294.
[134]
S. Hu, Y. Shen, S. Wang, and B. Lei. “Brain MR to PET Synthesis via Bidirectional Generative
Adversarial Network”. In: Medical Image Computing and Computer Assisted Intervention –
MICCAI 2020. Ed. by A. L. Martel, P. Abolmaesumi, D. Stoyanov, D. Mateus, M. A. Zuluaga,
S. K. Zhou, D. Racoceanu, and L. Joskowicz. Lecture Notes in Computer Science. Cham:
Springer International Publishing, 2020, pp. 698–707. isbn: 978-3-030-59713-9.
doi:10.1007/978-3-030-59713-9_67.
[135]
M. T. Gracia Bara, A. Gallardo-Higueras, E. M. Moreno, E. Laffond, F. J. Muñoz Bellido, C.
Martin, M. Sobrino, E. Macias, S. Arriba-Méndez, R. Castillo, and I. Davila. “Hypersensitivity
to Gadolinium-Based Contrast Media”. In: Frontiers in Allergy 3 (2022). doi:10.3389/falgy.2022.813927.
[136]
Y.-J. Jung, S.-H. Han, and H.-J. Choi. “Explaining CNN and RNN Using Selective Layer-Wise
Relevance Propagation”. In: IEEE Access 9 (2021), pp. 18670–18681. doi:10.1109/ACCESS.2021.3051171.
[137]
V. Nagisetty, L. Graves, J. Scott, and V. Ganesh. “xAI-GAN: Enhancing Generative Adversarial
Networks via Explainable AI Systems”. In: arXiv (2020). doi:10.48550/ARXIV.2002.10438.
[138]
C. Wu, H. Zhang, J. Chen, Z. Gao, P. Zhang, K. Muhammad, and J. Del Ser. “Vessel-GAN:
Angiographic reconstructions from myocardial CT perfusion with explainable generative
adversarial networks”. In: Future Generation Computer Systems 130 (2022), pp. 128–139. doi:
10.1016/j.future.2021.12.007.
[139]
X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel. “InfoGAN:
Interpretable Representation Learning by Information Maximizing Generative Adversarial
Nets”. In: arXiv (2016). doi:10.48550/ARXIV.1606.03657.
[140]
R. Toda, A. Teramoto, M. Tsujimoto, H. Toyama, K. Imaizumi, K. Saito, and H. Fujita.
“Synthetic CT image generation of shape-controlled lung cancer using semi-conditional InfoGAN
and its applicability for type classification”. In: International Journal of Computer Assisted
Radiology and Surgery 16.2 (2021), pp. 241–251. doi:10.1007/s11548-021-02308-1.
[141]
J. Fragemann, L. Ardizzone, J. Egger, and J. Kleesiek. Review of Disentanglement
Approaches for Medical Applications: Towards Solving the Gordian Knot of Generative Models
in Healthcare. preprint. 2022. doi:10.36227/techrxiv.19364897.v1.
[142]
A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani,
M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby. “An Image is Worth 16x16
Words: Transformers for Image Recognition at Scale”. In: arXiv (2020). doi:10.48550/ARXIV.2010.11929.
[143]
F. Shamshad, S. Khan, S. W. Zamir, M. H. Khan, M. Hayat, F. S. Khan, and H. Fu.
“Transformers in Medical Imaging: A Survey”. In: arXiv (2022). doi:10.48550/ARXIV.2201.09873.
[144]
C. Matsoukas, J. F. Haslum, M. Söderberg, and K. Smith. “Is it Time to Replace CNNs with
Transformers for Medical Images?” In: arXiv (2021). doi:10.48550/ARXIV.2108.09038.
[145]
S. A. Kamran, K. F. Hossain, A. Tavakkoli, S. L. Zuckerbrod, and S. A. Baker. “VTGAN:
Semi-supervised Retinal Image Synthesis and Disease Prediction using Vision Transformers”. In:
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) (2021),
pp. 3228–3238. doi:10.1109/ICCVW54120.2021.00362.
[146]
X. Li, Y. Jiang, J. J. Rodriguez-Andina, H. Luo, S. Yin, and O. Kaynak. “When medical
images meet generative adversarial network: recent development and research opportunities”.
In: Discover Artificial Intelligence 1.1 (2021). doi:10.1007/s44163-021-00006-0.
[147]
A. M. Alaa, B. van Breugel, E. Saveliev, and M. van der Schaar. “How Faithful is your Synthetic
Data? Sample-level Metrics for Evaluating and Auditing Generative Models”. In: arXiv (2021).
doi:10.48550/ARXIV.2102.08921.
[148]
N. Shrivastava, M. A. Hanif, S. Mittal, S. R. Sarangi, and M. Shafique. “A survey of hardware
architectures for generative adversarial networks”. In: Journal of Systems Architecture 118
(2021). doi:10.1016/j.sysarc.2021.102227.
[149]
D. Saxena and J. Cao. “Generative Adversarial Networks (GANs): Challenges, Solutions, and
Future Directions”. In: ACM Computing Surveys 54.3 (2022), pp. 1–42. doi:10.1145/3446374.