Generative Adversarial Networks

for Medical Image Synthesis

in Stroke

submitted by

M.Sc.

Tabea Kossen

ORCID: 0000-0002-2986-0907

approved by Faculty IV - Electrical Engineering and Computer Science

of Technische Universität Berlin

as a dissertation for the award of the academic degree

Doktor der Ingenieurwissenschaften (Doctor of Engineering)

-Dr.-Ing.-

Doctoral committee:

Chair: Prof. Dr. Marc Alexa

Reviewer: Prof. Dr. Anja Hennemuth

Reviewer: Prof. Dr. Kristian Hildebrand

Reviewer: Prof. Dr. Daniel Rückert

Date of the scientific defense: 30 September 2022

Berlin 2022

Abstract

Stroke is one of the leading causes of death worldwide. Medical imaging techniques such as

magnetic resonance imaging offer the possibility to extract essential individual information

about the disease, which has allowed for better patient care over the past decades. Advances

in computational power and the increase in data availability have led to the rise of Deep Learning

(DL) models, also for medical images. While DL methods have shown promising results in

automating the processing of medical images, a major challenge remains data availability, as

acquiring medical data is expensive and time-consuming. Additionally, medical images often

need to be annotated by medical experts to be useful for DL models. A solution to this would

be data sharing, but this is often hindered by privacy regulations. To preserve patients’

privacy and still allow for data sharing, synthesizing artificial images could be an encouraging

remedy.

For this, Generative Adversarial Networks (GANs) are gaining much attention. GANs

usually consist of two competing neural networks with one network, the generator, synthesizing

data samples. In contrast, the other network, the discriminator, judges how realistic the sample

looks and provides feedback to both networks. In this thesis, we generate synthetic images

using different GANs for two purposes in the stroke domain: sharing of labeled images and

automated image processing for treatment planning.

In the first part, we synthesize medical image patches for segmentation along with their

respective segmentation labels. We evaluate our synthetic data by training a segmentation

network on synthetic data and testing its performance on real data. In a next step, we simulate

the positive influence of sharing our synthetic data in terms of segmentation performance and

also extend our framework to generate 3D patches, thereby capturing more spatial information.

Additionally, we infuse noise in the discriminator during the GAN training to generate privacy-

preserving 2D patches leveraging the mathematical concept of differential privacy. In this way,

we quantify the level of privacy for our generated patches and investigate the trade-off between

privacy and the utility of our artificial data. The second part addresses a second application of

GANs for image synthesis: the automatic processing of perfusion-weighted imaging for stroke

treatment planning. Here, we propose a GAN variant for image-to-image translations with

additional temporal convolutions in the generator. We test our network on data including

both acute stroke patients and patients with chronic cerebrovascular disease and achieve high

performance in both cases.

In this thesis, we demonstrate the potential of utilizing GANs for image synthesis in the

field of stroke imaging. Our results are promising both for data sharing as well as for automated

image processing. In the future, GANs could substantially increase the data availability in the

medical field and also contribute to better treatment planning in stroke.

Summary (Zusammenfassung)

Stroke is one of the most common causes of death worldwide. Medical imaging techniques such as magnetic resonance imaging make it possible to extract essential individual information about the disease, which has enabled better patient care over the past decades. Advances in computational power and growing data availability have led to the rise of Deep Learning (DL) models, including for medical images. While DL methods have shown promising results in the automated processing of medical images, data availability remains a major challenge because acquiring medical data is expensive and time-consuming. In addition, medical images often need to be annotated by medical experts to be useful for DL models. One solution would be data sharing, but this is frequently hindered by privacy regulations. Synthesizing artificial images could be a promising way to preserve patient privacy while still enabling data sharing.

For this purpose, Generative Adversarial Networks (GANs) are attracting increasing attention. GANs usually consist of two competing neural networks: one network, the generator, synthesizes data, while the other, the discriminator, judges how realistic the data look and provides feedback to both networks. In this thesis, we generate synthetic images with different GANs for two problems in the stroke domain: sharing annotated images and automated image processing for treatment planning.

In the first part, we synthesize medical image patches for a segmentation problem together with their corresponding segmentation labels. We evaluate our synthetic data by training a segmentation network on synthetic data and testing its performance on real data. In a next step, we simulate the positive influence of sharing our synthetic data on segmentation performance and extend our network to generate 3D patches and thereby capture more spatial information. Furthermore, we add noise to the discriminator during GAN training to generate privacy-preserving 2D patches, using the mathematical concept of differential privacy. In this way, we quantify the level of privacy of our generated patches and investigate the trade-off between privacy and the utility of our artificial data. The second part addresses a second application of GANs for image synthesis: the automatic processing of perfusion images for stroke treatment planning. Here, we propose a GAN variant for image-to-image translation with an additional temporal component in the generator. We test our network on data that include both acute stroke patients and patients with chronic cerebrovascular disease and achieve high performance in both cases.

In this dissertation, we demonstrate the potential of using GANs for image synthesis in the field of stroke imaging. Our results are promising both for data sharing and for automated image processing. In the future, GANs could substantially increase data availability in the medical field and also contribute to better treatment planning in stroke.

Acknowledgements

There are many people who supported me during the last few years and whom I would like

to thank. First, I thank Prof. Anja Hennemuth for her scientific advice and guidance. Her

insightful feedback, fueled by both medical and technical expertise as well as her calming nature,

helped me to stay on a forthright path toward finishing my dissertation. I also thank Prof.

Kristian Hildebrand for his continuous support, the scientific discussions, and for contributing

his expert computer vision perspective on medical challenges.

I am very grateful to Dr. Dietmar Frey for giving me the opportunity to write my thesis in

the CLAIM group, for the medical insights, and for trusting me to freely pursue the scientific

projects I was interested in. Additionally, I would like to thank the whole CLAIM group for

insightful discussions about projects and publications as well as fun team days. In particular,

I thank Dr. Vince Madai for his exciting project ideas, fruitful discussions, and his honest,

constructive feedback that allowed me to improve my scientific skills tremendously. I would also like

to thank Dr. Michelle Livne for introducing me to the world of science and for teaching me

the critical and scientific way of thinking.

Furthermore, I am thankful to the co-authors for contributing to our publications. I am also

grateful to the students I got to supervise during the last years, especially Pooja Subramaniam,

for her diligent and hard work. It was a pleasure to work with her!

Moreover, I thank my friends for their support, especially Laura and Oliver, for going

through the thesis writing and finalization phase together and for the ongoing invaluable

feedback. I also want to specifically thank Boris for proofreading the thesis.

I would like to thank my parents and my sisters for their unconditional support throughout my

life, and my niece and nephews for the cutest and most wonderful distractions, especially

during exhausting weeks.

Finally, I would like to thank Joris for his loving support, encouragement, endless patience,

and for the occasional technical advice.

Table of Contents

Title Page

Abstract

Zusammenfassung

Abbreviations

1 Introduction

1.1 Machine Learning in Stroke Imaging

1.2 Generative Adversarial Networks for Image Synthesis

1.3 Aims and Contributions

1.4 Thesis Outline and Publications

2 Preliminaries

2.1 Stroke Types and Treatment

2.2 Stroke Imaging

2.2.1 Time-of-Flight (TOF) MRA

2.2.2 Dynamic Susceptibility Contrast (DSC) MRI

2.3 Challenges of Medical Imaging Data in Deep Learning

2.4 Generative Adversarial Networks

2.4.1 Standard GAN

2.4.2 Wasserstein GAN

2.4.3 Pix2pix GAN

2.5 Evaluation of Synthetic Images

2.5.1 Image-Based Evaluation

2.5.2 Evaluation Using a Downstream Task

3 Related Work

3.1 Synthesis of Medical Images Using Unconditional GANs for Data Sharing

3.2 Image-to-Image Translation for Treatment Planning

I Synthesis of Medical Images for Data Sharing

4 Synthesizing Anonymized and Labeled TOF-MRA Patches for Brain Vessel Segmentation Using Generative Adversarial Networks

4.1 Context Within Thesis

4.2 Journal Article

5 Generating 3D TOF-MRA Volumes and Segmentation Labels Using Generative Adversarial Networks

5.1 Context Within Thesis

5.2 Journal Article

6 Toward Sharing Brain Images: Differentially Private TOF-MRA Images With Segmentation Labels Using Generative Adversarial Networks

6.1 Context Within Thesis

6.2 Journal Article

II Image-to-Image Translation for Stroke Treatment Planning

7 Image-to-Image Generative Adversarial Networks for Synthesizing Perfusion Parameter Maps from DSC-MR Images in Cerebrovascular Disease

7.1 Context Within Thesis

7.2 Preprint

8 Discussion

8.1 Summary

8.2 Discussion and Outlook

8.2.1 Synthesis of Medical Images for Data Sharing

8.2.2 Image-to-Image Translation for Stroke Treatment Planning

8.2.3 Challenges and Opportunities for GANs in Medical Imaging

References

Abbreviations

2D Two-dimensional

3D Three-dimensional

4D Four-dimensional

AI Artificial Intelligence

AIF Arterial Input Function

CBF Cerebral Blood Flow

CBV Cerebral Blood Volume

cGAN conditional Generative Adversarial Network

CT Computed Tomography

DL Deep Learning

DSC-MRI Dynamic Susceptibility Contrast Magnetic Resonance Imaging

DWI Diffusion-weighted Imaging

FID Fréchet Inception Distance

GAN Generative Adversarial Network

JS divergence Jensen-Shannon divergence

ML Machine Learning

MR Magnetic Resonance

MRA Magnetic Resonance Angiography

MRI Magnetic Resonance Imaging

MTT Mean Transit Time

Tmax Time-to-Maximum

TOF Time-of-Flight

TTP Time-to-Peak

WGAN Wasserstein Generative Adversarial Network

1 Introduction

A recent study estimated that almost one in four people will suffer from a stroke during their

lifetime [1]. In 2019, stroke already accounted for 11.6% of all deaths, making it the second

leading cause of death worldwide [2]. Stroke survivors are not only at higher risk of a

recurrent stroke [3], but approximately 24–49% of them also live with a disability after their

stroke [4]. Although the age-standardized incidence is declining, the number of stroke

cases is still estimated to grow by 27% in the European Union over the next decades [5]. The main reasons

for this are the aging population and the increased stroke survival rate. In 2017, stroke

already cost approximately 60 billion euros in Europe, and costs are expected to rise in

the upcoming years [6].

To lower the socioeconomic burden of stroke, improvements in patient care are needed.

This involves not only fast and accurate diagnosis and treatment planning but also prediction

of disease progression and patient outcome. For this, medical imaging techniques have the

potential to access crucial individual information to establish new guidelines [7]. For example,

for treatment planning, advanced imaging techniques using Magnetic Resonance Imaging

(MRI) or Computed Tomography (CT) have already been shown to extend the time window

for treatment, enabling doctors to treat more stroke patients and improve outcomes [8]. On

top of the broad availability of neuroimaging techniques, Machine Learning (ML) approaches

have gained a lot of attention in the past years, and applications to medical imaging are on

the rise [9].

1.1 Machine Learning in Stroke Imaging

ML techniques enable the identification and extraction of patterns within data. In particular,

for large, high-dimensional datasets that humans can no longer process, ML can

be a powerful tool. With the increase in computational power and data availability, complex

ML methods, so-called Deep Learning (DL) techniques, have become more popular [10]. This

also holds true for the field of medical imaging [11]. Here, the applications range from image

denoising [12], risk prediction [13], and segmentation [14] to diagnosis [15] and treatment

planning [16].

DL has been applied to similar use cases in the subfield of stroke imaging. Among them are

the prediction of the time of stroke onset [17], lesion segmentation [18], and the prediction of tissue fate [19] and patient

outcome [20, 21]. Since time is a critical factor in the clinical setting of stroke, DL methods

could be particularly valuable because once DL networks are trained, the application to new

data is usually fast. This could substantially automate and speed up processes for tasks that

otherwise would need manual validation from experts.

While DL has already shown state-of-the-art results in many areas, these methods require

large datasets in order to unleash their full potential. In the medical field, acquiring large

datasets is expensive, involving several data processing steps [22]. Additionally, for many ML

and DL applications, expert annotations are needed [22]. A solution to this would be data

sharing, but this is often not feasible due to privacy restrictions. The data in neuroimaging

is particularly sensitive as scans contain the patients’ faces, which can be identified by

face-recognition software [23]. Even after the face and skull have been blurred or removed, the face

can be partially reconstructed [24]. Above all, the brain itself has a unique structure, and

individuals could potentially be re-identified based on their cortical foldings [25]. Thus, novel

anonymization methods are needed, i.e., techniques that do not allow re-identification. One

promising option is the synthesis of artificial data. For this, Generative Adversarial Networks

(GANs) have become popular in the last years [26].

1.2 Generative Adversarial Networks for Image Synthesis

A GAN is a type of model that consists of two neural networks that compete against each

other. One network is the generator, which synthesizes a data sample, and the other network

is the discriminator, which tries to identify whether a sample is real or synthesized by the

generator [26]. At the end of a successful training process, the generator can synthesize

realistic-looking data samples while preserving the predictive properties of the original data

samples.
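
The adversarial game described above can be made concrete with a minimal sketch of the two loss terms. This is not the implementation used in this thesis: it assumes the standard binary cross-entropy formulation with a non-saturating generator loss, and the function name `gan_losses` and its arguments are hypothetical. The discriminator outputs are taken to be probabilities that a sample is real.

```python
import math


def gan_losses(d_real, d_fake, eps=1e-12):
    """Return (discriminator_loss, generator_loss) for one batch.

    d_real: discriminator probabilities for real samples
    d_fake: discriminator probabilities for generated samples
    (hypothetical helper; standard cross-entropy GAN losses are assumed)
    """
    # Discriminator: push D(x) toward 1 on real data and D(G(z)) toward 0.
    d_loss = -sum(math.log(p + eps) for p in d_real) / len(d_real)
    d_loss += -sum(math.log(1.0 - p + eps) for p in d_fake) / len(d_fake)
    # Generator (non-saturating form): push D(G(z)) toward 1.
    g_loss = -sum(math.log(p + eps) for p in d_fake) / len(d_fake)
    return d_loss, g_loss
```

When the discriminator is fooled more often (higher values in `d_fake`), the generator loss falls and the discriminator loss rises, which is the feedback signal that drives both networks during training.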

GANs offer many opportunities in the analysis of medical images. They have already

shown good performance for medical images in image-to-image translations [27], denoising [28],

and data augmentation [29, 30]. Specific to stroke imaging, applications so far have covered

mostly lesion segmentation or the synthesis of images for lesion segmentation [31, 32, 33].

This thesis focuses on two main applications of GANs in stroke imaging: the synthesis of

anonymous, labeled images for data sharing and an image-to-image translation application for

fast, automated image processing in stroke treatment planning.

1.3 Aims and Contributions

In the first part of the thesis, we leverage GANs for synthesizing shareable brain images.

Specifically, we generate Time-of-Flight (TOF)-Magnetic Resonance Angiography (MRA)

images, which show blood vessels. This type of imaging is mainly used for diagnosis in

the clinical context of cerebrovascular diseases such as stroke. Here, we utilize them for

the use case of brain vessel segmentation. We generate synthetic TOF-MRA images along

with their segmentation label for Two-dimensional (2D) and Three-dimensional (3D) patches.

Additionally, we examine the effect of introducing privacy measures in the generation process.

Overall, the first part of the thesis aims to generate medically realistic and labeled images for

data sharing and evaluate their utility compared to the real data. Moreover, the generalizability

on a second dataset is tested, and the impact of restricting privacy leakage is measured.

In the second part of this thesis, we consider another application of GANs for image synthesis,

i.e., automatic image processing. In particular, we synthesize perfusion parameter maps from

Dynamic Susceptibility Contrast Magnetic Resonance Imaging (DSC-MRI). Perfusion maps

show the blood flow within the brain and are therefore relevant for treatment planning in

stroke. To create perfusion maps from DSC-MRI, experts are usually needed for manual

validation. Here, we propose a modified architecture of a so-called pix2pix GAN that directly

synthesizes the manually validated perfusion maps. The second part aims to speed up the

treatment decision making in the clinical setting.

Taken together, the main contributions of this thesis are:

• Synthesis of 2D TOF-MRA with corresponding labels for brain vessel

segmentation (Chapters 4 and 6). We utilize different GAN architectures to synthesize

2D TOF-MRA patches and their segmentation mask showing the location of brain vessels.

Synthesizing the image along with the segmentation mask allows us to evaluate the

utility of our synthesized patches by training a segmentation network on the generated

data and testing it on real-world data. We show that our segmentation network trained

on synthetic data still performs well compared to the segmentation network trained on

real data.

• Simulation of sharing synthesized data and application to a similar dataset

(Chapter 4). To test the generalizability of our segmentation model trained on

synthesized patches, we measure the segmentation performance on another similar

dataset and fine-tune it with an increasing amount of new patches. We demonstrate that

our fine-tuned network achieves better segmentation performance than a network trained

only on the second dataset. Thus, fewer newly annotated data samples are needed for

the same segmentation performance. In this way, we showcase the positive

impact of sharing synthetic data.

• Framework for generating labeled 3D data (Chapter 5). We extend our GAN

architecture to synthesize high-resolution 3D volumes to capture the 3D structure of the

brain better. Since the computational load of this model is substantially increased, we

implement measures to reduce memory consumption and training time, such as mixed

precision and the two timescale update rule. Here, we demonstrate that generating

high-resolution and labeled 3D images is feasible, and incorporating the third dimension

might even be beneficial for the downstream medical task if the computational setup

allows for it.

• Elaborate evaluation scheme for synthetic data using a pre-trained MedicalNet,

precision and recall, and a segmentation network (Chapter 5). We

evaluate our generated 3D patches by utilizing MedicalNet, a neural network pre-trained

on different types of medical imaging. We compare the activations when providing

our synthesized and real data as input to the network in terms of the Fréchet Inception

Distance (FID) and precision and recall of the distributions to quantify both image quality

and variation. Additionally, we train a 3D segmentation network on the synthesized

volumes and test it on real data to measure the usability of the generated data in the

context of brain vessel segmentation.

• Introduction of differential privacy in GAN architecture for synthesizing

labeled TOF-MRA (Chapter 6). By inserting carefully calibrated noise into our

GAN architecture, we provide an upper bound on privacy leakage of our training data.

In this way, we can measure not only the utility of our generated data but also quantify

privacy and investigate the trade-off between these two properties.

• Development of adapted pix2pix GAN architecture for automated perfusion

map generation from DSC-MRI (Chapter 7). We develop a modified version of

pix2pix GANs with additional temporal convolutions to generate perfusion maps from

DSC-MRI. As perfusion maps are usually manually validated by an expert in the clinical

setting, we aim to automate this step and speed up treatment planning with our GAN

architecture.
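
The idea behind the differential privacy contribution, adding carefully calibrated noise during training, can be illustrated with a clip-and-noise gradient step in the style of differentially private stochastic gradient descent. Whether exactly this mechanism is used is not stated in this section, so the sketch should be read as an assumption; `privatize_gradients`, `clip_norm`, and `noise_multiplier` are hypothetical names, and the accounting that turns the noise level into a privacy bound is omitted.

```python
import math
import random


def privatize_gradients(per_sample_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-sample gradient and add Gaussian noise to their sum.

    per_sample_grads: list of gradients, one list of floats per training sample
    Returns the noisy average gradient as a list of floats.
    (hypothetical helper; a DP-SGD-style mechanism is assumed)
    """
    n_params = len(per_sample_grads[0])
    total = [0.0] * n_params
    for grad in per_sample_grads:
        # Bound each sample's influence by rescaling to at most clip_norm (L2).
        norm = math.sqrt(sum(g * g for g in grad))
        scale = min(1.0, clip_norm / max(norm, 1e-12))
        for i, g in enumerate(grad):
            total[i] += g * scale
    # The noise standard deviation is proportional to the clipping bound.
    sigma = noise_multiplier * clip_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_sample_grads) for v in noisy]
```

A larger `noise_multiplier` hides the contribution of any single training image more strongly but also degrades the gradient signal, which is the privacy-utility trade-off mentioned above.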

1.4 Thesis Outline and Publications

This thesis is structured as follows: Chapter 2 provides background information about medical

and technical methodologies. The medical background provides insights into stroke and the

relevant imaging techniques used in this thesis, while the technical background focuses on

GANs, the underlying principles, and their variants. The main chapters of the thesis are

split into two parts: the synthesis of medical images for the purpose of data sharing and

the generation of one image from another (image-to-image translation) in the context of

treatment planning. The first part consists of three chapters based on three different but

related publications. The first publication (Chapter 4) utilizes GANs to synthesize 2D labeled

TOF-MRA patches. Chapter 5 describes how this can be extended to 3D medical image

synthesis. Chapter 6 explores the possibility of leveraging differential

privacy to synthesize 2D labeled image patches with privacy guarantees. The second part of

this thesis concerns image-to-image translation in stroke. This part consists of Chapter 7,

in which we automatically process perfusion-weighted imaging from 3D image sequences to

interpretable, expert-level perfusion maps that could be utilized for stroke treatment planning.

The last chapter discusses the findings of this work and draws conclusions. Moreover, it provides an

outlook on GANs in medical imaging.

The following publications are part of this thesis:

T. Kossen, P. Subramaniam, V. I. Madai, A. Hennemuth, K. Hildebrand, A. Hilbert, J.

Sobesky, M. Livne, I. Galinovic, A. A. Khalil, J. B. Fiebach, and D. Frey. “Synthesizing

anonymized and labeled TOF-MRA patches for brain vessel segmentation using generative

adversarial networks”. In: Computers in Biology and Medicine 131 (2021). doi:10.1016/j.compbiomed.2021.104254, see Chapter 4.

P. Subramaniam, T. Kossen, K. Ritter, A. Hennemuth, K. Hildebrand, A. Hilbert,

J. Sobesky, M. Livne, I. Galinovic, A. A. Khalil, J. B. Fiebach, D. Frey, and V. I.

Madai. “Generating 3D TOF-MRA Volumes and Segmentation Labels using Generative

Adversarial Networks”. In: Medical Image Analysis (2022). doi:10.1016/j.media.2022.102396, see Chapter 5.

This is an open access article under the CC BY license.

T. Kossen, M. A. Hirzel, V. I. Madai, F. Boenisch, A. Hennemuth, K. Hildebrand,

S. Pokutta, K. Sharma, A. Hilbert, J. Sobesky, I. Galinovic, A. A. Khalil, J. B. Fiebach,

and D. Frey. “Toward Sharing Brain Images: Differentially Private TOF-MRA Images

With Segmentation Labels Using Generative Adversarial Networks”. In: Frontiers in

Artificial Intelligence 5 (2022). doi:10.3389/frai.2022.813842, see Chapter 6.

This is an open access article under the CC BY license.

T. Kossen, V. I. Madai, M. A. Mutke, A. Hennemuth, K. Hildebrand, J. Behland,

A. Hilbert, J. Sobesky, M. Bendszus, and D. Frey. “Image-to-image generative

adversarial networks for synthesizing perfusion parameter maps from DSC-MR images

in cerebrovascular disease”. In: medRxiv (2022). doi:10.1101/2022.05.24.22274901, see Chapter 7.

This is an open access article under the CC BY license.

Additionally, I contributed to the following publications while working on this thesis:

M. Livne, J. Rieger, O. U. Aydin, A. A. Taha, E. M. Akay, T. Kossen, J. Sobesky, J. D.

Kelleher, K. Hildebrand, D. Frey, and V. I. Madai. “A U-Net Deep Learning Framework

for High Performance Vessel Segmentation in Patients With Cerebrovascular Disease”.

In: Frontiers in Neuroscience 13 (2019), p. 97. doi:10.3389/fnins.2019.00097

M. Ivantsits, L. Goubergrits, J. M. Kuhnigk, M. Huellebrand, J. Brüning, T. Kossen, B.

Pfahringer, J. Schaller, A. Spuler, T. Kuehne, and A. Hennemuth. “Cerebral Aneurysm

Detection and Analysis Challenge 2020 (CADA)”. In: Cerebral Aneurysm Detection and

Analysis: First Challenge, CADA 2020, Held in Conjunction with MICCAI 2020, Lima,

Peru, October 8, 2020, Proceedings. Lima, Peru: Springer-Verlag, 2020, pp. 3–17. isbn:

978-3-030-72861-8. doi:10.1007/978-3-030-72862-5_1

M. Ivantsits, L. Goubergrits, J. M. Kuhnigk, M. Huellebrand, J. Bruening, T. Kossen,

B. Pfahringer, J. Schaller, A. Spuler, T. Kuehne, Y. Jia, X. Li, S. Shit, B. Menze, Z. Su,

J. Ma, Z. Nie, K. Jain, Y. Liu, Y. Lin, and A. Hennemuth. “Detection and analysis of

cerebral aneurysms based on X-ray rotational angiography - the CADA 2020 challenge”.

In: Medical Image Analysis 77 (2022). doi:10.1016/j.media.2021.102333

A. Meddeb, T. Kossen, K. K. Bressem, B. Hamm, and S. N. Nagel. “Evaluation of a Deep

Learning Algorithm for Automated Spleen Segmentation in Patients with Conditions

Directly or Indirectly Affecting the Spleen”. In: Tomography 7.4 (2021), pp. 950–960.

doi:10.3390/tomography7040078

2 Preliminaries

2.1 Stroke Types and Treatment

A stroke is a sudden cerebrovascular event in which a blood vessel in the

brain either ruptures or is blocked. It can be subdivided into two main categories: hemorrhagic and ischemic stroke. If a

blood vessel in the brain is ruptured, the event is called a hemorrhagic stroke. This rupture

leads to bleeding inside the brain and is associated with high mortality and morbidity [42].

Early diagnosis and treatment are crucial for a good outcome. Ischemic stroke is the most

common type of stroke and is usually caused by a blood clot, the so-called thrombus, that

blocks a blood vessel in the neck or brain [43]. The blockage reduces or even stops the blood

flow and leads to oxygen deprivation in the brain, eventually resulting in the death of the

affected brain cells. Therefore, immediate action is likewise required when an ischemic stroke

occurs [43]. This thesis focuses on this more common type of stroke, the ischemic stroke.

After a patient is admitted to a hospital with stroke symptoms, brain imaging is performed

for diagnosis and treatment planning. Current treatment options for an acute ischemic stroke

are intravenous medication or mechanical thrombectomy, i.e., removing the thrombus in

an operation [7]. Treatment planning still relies heavily on the time of symptom onset. If

symptoms started within the past 4–6 hours, patients are considered eligible for treatment [7, 44].

Beyond that time window, the risks of treatment are considered to outweigh the expected

benefits. This treatment stratification approach is particularly problematic for patients with

unknown stroke onset, e.g., patients who wake up with stroke symptoms (“wake-up stroke”),

who make up 8–25% of stroke patients [45]. Treatment guidelines currently still rely on the time

window approach, whereas new clinical trials provide insights into more precise image-based

treatment stratification. Recent studies using advanced imaging techniques suggested that

groups of patients might still benefit from treatment in a time window of up to 24 hours after

the stroke [7, 8, 44]. Even patients with unknown stroke onset have been found to still benefit

from treatment. Therefore, brain imaging is crucial to move towards a more individualized

stroke treatment that shifts away from the classical time window approach.

2.2 Stroke Imaging

Medical imaging techniques play a crucial role not only in stroke treatment but also in stroke

diagnosis [46]. In particular, they can help determine the type of stroke, identify infarcted or

salvageable brain tissue, and detect abnormalities of the brain vessels.

In the context of stroke, it is crucial to look at the cerebrovascular system, i.e., the brain

vessels on a fine-scale anatomical level, as well as the blood flow and supply to the different

areas of the brain. To this end, both CT and MRI are commonly used in stroke imaging. Here,

we focus on MRI, specifically TOF-MRA, to visualize the blood vessels and DSC-MRI to show

the perfusion on a larger scale.

2.2.1 Time-of-Flight (TOF) MRA

TOF-MRA is a non-contrast-enhanced imaging technique to visualize vascular systems such as

the cerebrovascular system. It relies on the fact that blood in the brain vessels is flowing, in

contrast to the surrounding stationary tissue [47]. While acquiring images, radiofrequency pulses are

repeatedly applied to a part of the body, in this case, brain slices or volumes [48]. This causes

protons in the stationary tissue to saturate. In contrast to that, the saturation of the protons

in the flowing blood takes longer due to new blood entering the scanned slice, which was not

previously exposed to the radiofrequency pulse [48]. This can be leveraged to make stationary

tissue appear darker while flowing tissue, such as the blood in the cerebral vessels, appear

lighter in the scanned 3D image (see Figure 2.1). More details about the underlying physical

principles can be found in the book chapter by Kirulata et al. [47] as well as in the book by

Carr et al. [49].

Figure 2.1: TOF-MRA image. The brain vessels appear with high intensity (white) on a

TOF-MRA image, whereas static tissue appears in gray.

TOF-MRA can be utilized to diagnose cerebrovascular diseases such as aneurysms or

steno-occlusive disease in the clinical setting [50, 51]. With regard to stroke, TOF-MRA can be

used as routine imaging to inspect whether a large vessel occlusion is present [52]. Examining

vessel occlusions is particularly important for treatment selection. For example, mechanical

thrombectomy is recommended for acute ischemic stroke patients with large vessel occlusions

within a specific time window [53]. Thus, information about the brain vessels can have high

diagnostic value.

Recent studies suggest that brain vessel status could have medical value beyond the

assessment in the clinical routine. Gutierrez et al. showed that the anatomical structure could

be a biomarker for vascular events [54]. Furthermore, Frey et al. demonstrated that individual

vessel structures could be translated into a hemodynamic simulation to detect areas vulnerable

to stroke [55]. This simulation could be beneficial for both stroke prevention and outcome

prediction.

So far, brain vessels are only visually assessed in the clinical routine. Automating brain

vessel assessment could add value to diagnosis as well as to the assessment of stroke risk and outcome. The first step

of automated analysis of the brain vessels is vessel segmentation. For this, a neural network

specialized in segmentation, a U-Net, proved to be successful [38, 56, 57]. The problem of

automated brain vessel segmentation served as a use case for Chapters 4–6.

2.2.2 Dynamic Susceptibility Contrast (DSC) MRI

DSC-MRI is an imaging technique that measures the blood flow, i.e., the perfusion in the

brain. It is a widely used technique for stroke assessment and is essential for identifying the

penumbra [58]. The penumbra is the brain tissue around the infarct that currently shows

reduced blood flow but could be salvaged if reperfusion occurs. Thus, estimating the penumbra

is important for treatment decisions and could extend the time window for treatment for

certain groups of patients [59].

Figure 2.2: Diffusion-perfusion mismatch. Perfusion-weighted imaging (A) shows the perfusion

in the brain. In the right hemisphere, a large hypoperfused area can be seen (green/orange). In

contrast to that, the Diffusion-weighted Imaging (DWI), here presented as the apparent diffusion

coefficient in (B), shows the infarct core that is smaller (dark area). The mismatch between the

two image sequences roughly indicates the salvageable tissue.

To estimate the penumbra and thus the salvageable tissue, the diffusion-perfusion mismatch

model has been proposed. According to the model, the

mismatch between Diffusion-weighted Imaging (DWI) and perfusion-weighted imaging provides

an estimate of the salvageable tissue [60] and can therefore be used for patient stratification [61].

DWI measures the Brownian motion of the water molecules within a voxel, which can be used

to visualize the infarct core [46], while perfusion-weighted imaging such as DSC-MRI shows the

perfusion of the brain. If there is a mismatch between the infarct core and the hypoperfused

area, the time window of treatment could be extended [62, 63] (see Figure 2.2).

DSC-MRI measures the perfusion by injecting a contrast agent into the patient’s blood.

After that, a series of MRIs is recorded, which results in a temporal sequence of 3D images

(hence a Four-dimensional (4D) image) and records the contrast agent’s flow through the brain.

The signal for each recorded voxel can be translated into a tissue concentration curve over time

(see Figure 2.3A). This is then voxel-wise deconvolved with a so-called Arterial Input Function

(AIF) resulting in a deconvolved tissue concentration curve (see Figure 2.3B). The AIF can be

automatically determined but is, in practice, manually validated by an expert. The two curves

depicted in Figure 2.3 show properties that provide insights into the perfusion of the patient’s

brain. The most important and clinically relevant parameters that can be extracted based on

these curves are Cerebral Blood Flow (CBF), Cerebral Blood Volume (CBV), Mean Transit

Time (MTT), Time-to-Maximum (Tmax), and Time-to-Peak (TTP). These five perfusion

parameter maps are 3D images with different clinical interpretations.

Figure 2.3: Tissue concentration curves over time for DSC-MRI. The measured concentration

curve is deconvolved with the AIF (A). This results in the concentration curve displayed in (B).

The five perfusion parameters CBF, CBV, MTT, Tmax, and TTP can be inferred from these two

curves. The figure is based on Figure 3 by Østergaard [64].

The CBF measures the blood supply to a given region of the brain. It is usually estimated by the height

at timepoint Tmax in the deconvolved curve (see Figure 2.3B). It depends on the cerebral

perfusion pressure, the dilation of vessels, and the viscosity of the blood [65]. The CBV reflects

the area under the deconvolved curve and assesses the whole blood quantity within the brain

tissue. The ratio of the CBV and the CBF is defined as the MTT. MTT reflects the average

time the blood takes to enter the vessels and stay in the brain tissue. According to Kim et

al. [58], it might overestimate the penumbra. Tmax is regarded as the most reliable parameter

to measure the penumbra [65]. It is defined as the required time until the deconvolved curve

reaches its maximum. TTP also describes the time until the maximum is reached but refers

to the tissue concentration curve before deconvolution. All mentioned parameter maps are

shown in Figure 2.4. In Chapter 7, we propose a GAN-based method to generate the different

perfusion parameter maps from the DSC-MRI automatically.
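The definitions above can be illustrated with a small Python sketch that derives the five parameters for a single voxel from two already sampled curves. The function name, the toy curve values, and the uniform sampling interval `dt` are assumptions made for this example. The deconvolution with the AIF is not shown, and the resulting values are uncalibrated.

```python
def perfusion_parameters(tissue_curve, deconvolved_curve, dt):
    """Derive perfusion parameters for one voxel (illustrative sketch).

    tissue_curve:      concentration curve before deconvolution
    deconvolved_curve: concentration curve after deconvolution with the AIF
    dt:                sampling interval in seconds (assumed to be uniform)
    """
    # Tmax: time at which the deconvolved curve reaches its maximum.
    i_max = max(range(len(deconvolved_curve)), key=deconvolved_curve.__getitem__)
    tmax = i_max * dt
    # CBF: height of the deconvolved curve at Tmax.
    cbf = deconvolved_curve[i_max]
    # CBV: area under the deconvolved curve (rectangle rule).
    cbv = sum(deconvolved_curve) * dt
    # MTT: ratio of CBV and CBF.
    mtt = cbv / cbf
    # TTP: time at which the curve before deconvolution reaches its peak.
    i_peak = max(range(len(tissue_curve)), key=tissue_curve.__getitem__)
    ttp = i_peak * dt
    return {"CBF": cbf, "CBV": cbv, "MTT": mtt, "Tmax": tmax, "TTP": ttp}


# Toy curves with hypothetical values, sampled every 2 seconds.
tissue = [0.0, 0.1, 0.5, 1.0, 0.6, 0.2, 0.0]
deconvolved = [0.0, 0.4, 0.8, 0.4, 0.2, 0.0, 0.0]
params = perfusion_parameters(tissue, deconvolved, dt=2.0)
```

Applying such a function to every voxel of the 4D image would yield one 3D map per parameter.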

Figure 2.4: Perfusion-weighted imaging. The parameter maps CBF, CBV, MTT, Tmax, and

TTP are calculated from the DSC source image. The DSC source is here shown at time point 0,

i.e., without contrast agent.

2.3 Challenges of Medical Imaging Data in Deep Learning

While medical imaging data offers great potential in diagnosis and treatment in the clinical

routine, there are still challenges when applying DL techniques to this type of data. First,

medical images are often high-dimensional. Hence, training DL models on them usually comes

along with high computational costs [66]. Second, medical imaging data is heterogeneous.

Hospitals utilize different scanners and imaging protocols, resulting in image variations across

institutions [11]. Furthermore, disease prevalence and demographics can vary across regions

leading to different patient cohorts [11]. Another problem is general data availability. DL

techniques require large amounts of data with high quality [67]. Additionally, medical imaging

data often needs to be annotated by one or more medical experts, which is expensive and

time-consuming [67]. To mitigate the problem of data availability, efficient use of data is crucial.

For this, transfer learning and fine-tuning as well as data augmentation can be utilized [9,

67]. Data augmentation aims to enrich the dataset by adding images with slight modifications

compared to the original images or by adding synthesized images, e.g., using GANs (see

Section 2.4).

On top of the above-mentioned challenges, privacy restrictions and thus ethical and legal

considerations of utilizing medical images should be taken into consideration [9, 11]. Even the

synthesis of artificial data does not automatically entail complete privacy preservation [68, 69].

To counteract privacy concerns, DL methods (including GANs) can be enhanced by secure

and private Artificial Intelligence (AI). Secure AI approaches intend to protect the algorithms,

whereas private AI aims to protect the data [70]. Examples of secure AI are encryption and

federated learning. Federated learning is a technique that allows for decentralized training

of an ML model [71]. Instead of transferring data from one institution to another, the

model’s weights are shared during the training process. Hence, the security risk of transferring

data can be circumvented. However, the data itself is not secured by a federated learning

approach [70]. For this, private AI techniques can be complementary and protect the data

from re-identification. Examples of this are classical techniques such as anonymization and

pseudonymization and more advanced concepts such as differential privacy. Anonymization

aims to prevent re-identification by simply removing re-identifiable information, whereas

pseudonymization replaces this information. However, both methods can be seen as insufficient

to protect the data from re-identification [70]. Therefore, differential privacy has become an

important topic of research. It is a mathematical concept that puts a bound on individual

privacy leakage [72]. The intuition behind differential privacy is that a computation on a

specific dataset and a computation on the same dataset with one additional data sample should

have a very similar output. In other words, no single sample should influence an algorithm so

strongly that its output changes noticeably, which would reveal that this particular

data sample is part of the training set. In Chapter 6 of this thesis, we investigate the influence

of differential privacy in a GAN architecture for synthesizing TOF-MRA images.
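The intuition of comparing a dataset with and without one additional sample can be made concrete with the Laplace mechanism, a standard building block of differential privacy. The sketch below is only an illustration and is not claimed to be the approach used in this thesis. The counting query, the toy ages, and all function names are assumptions made for this example.

```python
import random


def laplace_noise(scale, rng):
    """Sample zero-mean Laplace noise.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Release a count with epsilon-differential privacy (Laplace mechanism)."""
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one record changes a count by at most 1.
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)
ages = [54, 61, 70, 48, 83, 66]   # hypothetical patient ages
ages_plus_one = ages + [77]       # same dataset with one additional sample


def older_than_65(age):
    return age > 65


noisy_a = private_count(ages, older_than_65, epsilon=0.5, rng=rng)
noisy_b = private_count(ages_plus_one, older_than_65, epsilon=0.5, rng=rng)
```

A smaller `epsilon` adds more noise, which makes the two outputs harder to tell apart and thus bounds the privacy leakage more tightly, at the cost of a less accurate result.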

2.4 Generative Adversarial Networks

GANs are deep neural networks that synthesize data samples utilizing adversarial training.

They were first introduced in 2014 by Goodfellow et al. [26], who used them for synthesizing

natural images such as hand-written digits or faces. Starting in 2016, the first GANs were

applied to medical imaging data [30]. These applications range from data augmentation,

privacy preservation, and anomaly detection [29, 73, 74] to architectures for segmentation and

classification problems [75, 76]. Moreover, GANs can be used for image-to-image translations

such as denoising or cross-modality synthesis [77, 78].

2.4.1 Standard GAN

A classical GAN is usually trained for data synthesis. It consists of two simultaneously trained,

fully connected neural networks: the generator and the discriminator (see Figure 2.5). While

the generator (G) tries to synthesize a realistic data sample, the task of the discriminator (D)

is to distinguish between real data samples and the samples synthesized by the generator [26].

During training, the discriminator gets feedback on which of the samples were real and

which were generated. Additionally, the discriminator provides feedback to the generator

on how realistic the generated sample looks. After successfully training both networks, the

discriminator should be good at distinguishing between real and generated samples, and the

generator should synthesize samples that approximate the distribution of the real samples. As

both networks are trained on two opposing objectives, they can be regarded as adversaries.

Figure 2.5: The architecture of a classical GAN. The generator (G) takes a noise vector as an input and produces a data sample. The task of the discriminator (D) is to decide whether a sample looks realistic or not.

Formally, the objective function of the generator can be formulated as:

$$L_G = \max_G \mathbb{E}_{x_{gen} \sim p_{gen}}[\log(D(x_{gen}))] \tag{2.1}$$

where $x_{gen}$ is the generated sample from the distribution $p_{gen}$. The objective function for the generator is maximized if the discriminator regards $x_{gen}$ as realistic ($D(x_{gen})$ close to 1). In contrast to that, the discriminator’s objective function is maximized if $x_{gen}$ is identified as generated ($D(x_{gen})$ close to 0) and the real samples $x_{real} \sim p_{real}$ as real ($D(x_{real})$ close to 1), hence:

$$L_D = \max_D \mathbb{E}_{x_{real} \sim p_{real}}[\log D(x_{real})] + \mathbb{E}_{x_{gen} \sim p_{gen}}[\log(1 - D(x_{gen}))] \tag{2.2}$$
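To see how the two objectives in Equations 2.1 and 2.2 behave, the following Python sketch estimates them from a handful of discriminator outputs. The function names and the output values are hypothetical. In an actual GAN, these outputs would come from a neural network and the expectations would be estimated per minibatch.

```python
import math


def generator_objective(d_on_generated):
    """Monte Carlo estimate of Eq. 2.1: mean of log D(x_gen)."""
    return sum(math.log(d) for d in d_on_generated) / len(d_on_generated)


def discriminator_objective(d_on_real, d_on_generated):
    """Monte Carlo estimate of Eq. 2.2."""
    real_term = sum(math.log(d) for d in d_on_real) / len(d_on_real)
    gen_term = sum(math.log(1.0 - d) for d in d_on_generated) / len(d_on_generated)
    return real_term + gen_term


# Hypothetical discriminator outputs in (0, 1).
confident_d = discriminator_objective([0.9, 0.95], [0.1, 0.05])
fooled_d = discriminator_objective([0.5, 0.5], [0.5, 0.5])

# The generator's objective is higher when its samples are rated as realistic.
g_when_convincing = generator_objective([0.9, 0.8])
g_when_detected = generator_objective([0.1, 0.2])
```

The discriminator's objective is larger when it separates real from generated samples, whereas the generator's objective is larger when the discriminator is fooled, which reflects the adversarial setup.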

The original GAN approach has several drawbacks. Three of the main disadvantages are

instability of the training, mode collapse, and vanishing gradients [79]: Since the two networks

have opposing objectives, the training can become unstable, and there is no guarantee that

the networks will converge. Additionally, the generator might find a data sample that looks

realistic to the discriminator and thus, generates only this sample with or without small

variations. This leads to poor variety within the samples and is termed mode collapse. The

problem of vanishing gradients concerns the training of the GAN. If the discriminator is too

good at detecting the synthesized samples, the generator does not get enough information

to generate more realistic samples. Thus, the generator’s gradient is small. During the

weight update (backpropagation), the gradient is then too small such that it vanishes, and the

generator does not improve anymore. Due to these limitations and to increase the range of

applications, modifications to the classical GAN have been proposed. Among those variants are

the Wasserstein Generative Adversarial Network (WGAN) [80] as well as the pix2pix GAN [81].

The WGAN is a variant that tries to solve the problems listed above, whereas the pix2pix GAN

is specialized for generating one image based on another, so-called image-to-image translations.

2.4.2 Wasserstein GAN

Several improvements have been proposed to counteract the disadvantages of the original GAN

model. First, Radford et al. suggested a deep convolutional GAN consisting of convolutional

layers instead of fully connected ones [82]. They reported more stable training and overall

good image representations that could be leveraged in image classification tasks. Additionally,

Salimans et al. suggested improvements to the classical GAN architecture, such as feature

matching or minibatch discrimination [83]. These changes contributed to stabilizing training

as well as reducing mode collapse. To substantially stabilize and improve training, Arjovsky et

al. introduced a new type of GAN, the WGAN [80]. The WGAN provides a new perspective

on the cost functions and overall roles of the generator and the discriminator. This resulted in

more stable networks less prone to mode collapse and vanishing gradients.

The generator’s task can be regarded as synthesizing an approximation to the real data

distribution. Assuming an optimal discriminator, the classical GAN essentially minimizes

the Jensen-Shannon divergence (JS divergence) between the distribution of the real and the

distribution of the generated data [26]. Instead of minimizing the JS divergence, Arjovsky et al.

proposed to minimize the Wasserstein or Earth mover’s distance between the two distributions.

It describes the minimum cost of transporting “mass” from one point to the other in order to

transform one distribution ($p_{gen}$) into another distribution ($p_{real}$). Formally, it can be defined as:

$$W(p_{gen}, p_{real}) = \inf_{\gamma \in \Pi(p_{gen}, p_{real})} \mathbb{E}_{(x,y) \sim \gamma}[\|x - y\|], \tag{2.3}$$

where $\Pi(p_{gen}, p_{real})$ is the set of all joint distributions $\gamma(x, y)$ whose marginals are $p_{gen}$ and $p_{real}$. Using the Kantorovich-Rubinstein duality, this can be simplified to [80]:

$$W(p_{gen}, p_{real}) = \sup_{\|f\|_L \le 1} \mathbb{E}_{x_{real} \sim p_{real}}[f(x_{real})] - \mathbb{E}_{x_{gen} \sim p_{gen}}[f(x_{gen})]. \tag{2.4}$$

Here, $f$ denotes a 1-Lipschitz function, which is a function fulfilling the following constraint:

$$|f(x_1) - f(x_2)| \le |x_1 - x_2|. \tag{2.5}$$

In WGANs, the task of the discriminator is to learn this 1-Lipschitz function in order to help

compute the Wasserstein distance. It outputs a critic score rather than a probability of how

realistic a sample looks. Due to this new role, the discriminator in a WGAN is termed critic.

Compared to the JS divergence in classical GANs, the Wasserstein distance has a more reliable

gradient and is differentiable almost everywhere [80]. Thus, WGANs are more stable and suffer

less from vanishing gradients. Even if the discriminator is trained optimally, the generator

would still be able to learn from it. Additionally, the problem of mode collapse does not seem

to be present [80].
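For intuition on why the Wasserstein distance provides a usable training signal, consider the one-dimensional case with two equally sized samples, where the optimal transport plan simply matches the sorted values. The sketch below covers only this special case. The function name and the toy samples are assumptions made for this example, and WGANs do not compute the distance this way but approximate it with the critic.

```python
def wasserstein_1d(samples_a, samples_b):
    """Wasserstein-1 distance between two equally sized 1D empirical samples."""
    if len(samples_a) != len(samples_b):
        raise ValueError("this sketch assumes equally sized samples")
    # In one dimension, the optimal transport plan pairs the sorted samples.
    pairs = zip(sorted(samples_a), sorted(samples_b))
    return sum(abs(a - b) for a, b in pairs) / len(samples_a)


real = [0.0, 1.0, 2.0, 3.0]
near = [x + 1.0 for x in real]   # generated samples close to the real ones
far = [x + 5.0 for x in real]    # generated samples far from the real ones

distance_near = wasserstein_1d(real, near)
distance_far = wasserstein_1d(real, far)
```

The distance grows with the shift between the two samples, so the generator still receives information about the direction in which to move its samples.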

The objective functions of $G$ and the critic in a WGAN are:

$$L_G = \max_G \mathbb{E}_{x_{gen} \sim p_{gen}}[f(x_{gen})] \tag{2.6}$$

$$L_{critic} = \max_{w \in W} \mathbb{E}_{x_{real} \sim p_{real}}[f(x_{real})] - \mathbb{E}_{x_{gen} \sim p_{gen}}[f(x_{gen})], \tag{2.7}$$

where $w$ denotes the weights in the compact space $W$. To enforce the Lipschitz continuity, the weights of the critic are clipped in the WGAN. Since this can still lead to unstable training, Gulrajani et al. came up with a more elegant solution, the gradient penalty [84]. The gradient penalty regularizes the critic by the carefully constructed penalty term $\lambda(\|\nabla D(\epsilon x_{real} + (1 - \epsilon) x_{gen})\| - 1)^2$ with $\epsilon \sim U[0, 1]$ and $\lambda$ weighting the regularization, where $D$ denotes the critic. The term pushes the norm of the critic's gradient with respect to its input towards 1.
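The penalty term can be sketched in a few lines of Python. To keep the example self-contained, the critic is replaced by a toy function $\tanh(w \cdot x)$ whose input gradient is known analytically. This stand-in critic, the function names, and the input values are assumptions made for illustration; actual implementations obtain the gradient of a neural network via automatic differentiation.

```python
import math
import random


def critic(x, w):
    """Toy critic tanh(w . x), a stand-in for a neural network."""
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))


def critic_gradient(x, w):
    """Analytical gradient of the toy critic with respect to its input x."""
    slope = 1.0 - critic(x, w) ** 2
    return [slope * wi for wi in w]


def gradient_penalty(x_real, x_gen, w, lam, rng):
    """Penalty lam * (||grad D(x_hat)|| - 1)^2 at a random interpolate x_hat."""
    eps = rng.random()  # epsilon ~ U[0, 1]
    x_hat = [eps * r + (1.0 - eps) * g for r, g in zip(x_real, x_gen)]
    grad_norm = math.sqrt(sum(g * g for g in critic_gradient(x_hat, w)))
    return lam * (grad_norm - 1.0) ** 2


rng = random.Random(0)
penalty = gradient_penalty([1.0, 0.5], [0.2, -0.3], w=[0.8, 0.6], lam=10.0, rng=rng)
```

The penalty is zero exactly when the gradient norm at the interpolated point equals 1 and grows quadratically with the deviation from 1.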

2.4.3 Pix2pix GAN

The pix2pix GAN was first introduced by Isola et al. [81]. It belongs to the subgroup of

conditional Generative Adversarial Networks (cGANs). In cGANs, the generator is usually

conditioned or dependent on other auxiliary information such as a certain label or class. The

objective function of a cGAN is:

$$L_{cGAN}(G, D) = \mathbb{E}_{x,y}[\log D(x, y)] + \mathbb{E}_{x,z}[\log(1 - D(x, G(x, z)))], \tag{2.8}$$

where $x$ is the auxiliary information, $y$ the output sample, and $z$ a noise vector. Whereas the generator tries to minimize this objective, the discriminator tries to maximize it. In the case of pix2pix GANs, $x$ and $y$ are both images and the noise vector $z$ is usually introduced into the generator as a dropout layer. In practice, directly feeding a noise vector into the generator leads to the generator ignoring the noise [81]. Thus, noise is only introduced in the form of dropout, which drops neurons in a neural network with a certain specified probability.

A pix2pix GAN allows for image-to-image translations, i.e., transforming an image from

one modality to another. For example, a landscape photo could be translated from a photo in

the daytime to a photo at night. In the clinical setting, a CT image could be computed based

on an MRI to get a larger variety of CT images [85]. Medical image-to-image translations could

also enable fast processing from one modality to another to reduce the number of necessary

scans or to save time [86].

In a pix2pix GAN, an image is fed into the generator as auxiliary information (see Figure 2.6).

This allows the generator to utilize the contextual information to generate a new image. The

input of the discriminator is a pair of images: the input image of the generator together with either the

real image or the synthesized image. Again, the discriminator outputs whether

the two images look realistic or not.

Figure 2.6: The architecture of a pix2pix GAN. The input of the generator (G) is an image in

) is an image in

one modality, and the output is the same image in another modality. The task of the discriminator

(D) is to decide whether the image pair contains a generated image or not.

Since an image is fed into the generator, the architecture differs from the classical GAN

approach. Isola et al. proposed two different architectures for the generator [81]: an autoencoder

and a U-Net. An autoencoder consists of an encoding part for feature extraction and a decoding

part to rebuild the spatial dimensions. A U-Net adds to autoencoders by introducing skip

connections between layers in the encoder and the decoder. The U-Net architecture will be

introduced in more depth in Section 2.5.2. In the study by Isola et al., the U-Net generator

outperformed the autoencoder architecture [81].

In the past years, pix2pix GANs have been successfully applied to medical images [30, 87].

They have been used to denoise images [77, 88] or to synthesize one imaging modality from

another (cross-modality synthesis) [78, 89, 90]. In this thesis, a modification of the pix2pix

GAN will be used to automatically process a 4D DSC-MRI into interpretable 3D perfusion

parameter maps (see Chapter 7).

2.5 Evaluation of Synthetic Images

Evaluating synthetic images is not an easy endeavor and depends on the setting in which

the image was synthesized. Therefore, many different approaches have been proposed. This

section discusses standard evaluation metrics, dividing them into image-based evaluation and

downstream task evaluation. The image-based approach compares the generated images or

their distributions directly or indirectly to the real ones, whereas in the downstream task, the

synthesized images are used for training another ML model. The performance of this model

on real test data is then utilized as a performance measure of the synthetic images.

2.5.1 Image-Based Evaluation

First, human observers could directly judge the quality of all synthesized images. This metric

does not rely on the specific architecture that synthesized the image and could always be used.

The disadvantage for medical images, however, is that one or several experts are often needed.

Therefore, this kind of evaluation is highly time-consuming and labor-intensive.

For some synthetic images, the ground truth image is available. Examples are image-to-

image translations with paired training data used to train a pix2pix GAN. For evaluation,

most studies rely on traditional metrics such as the mean absolute error, structural similarity

index measure, or peak signal-to-noise ratio to compare the generated image to the ground

truth [30]. In Chapter 7, we also utilize these metrics to evaluate our generated images.
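Two of these paired metrics are simple enough to sketch directly. The example below computes the mean absolute error and the peak signal-to-noise ratio on flattened images whose intensities are assumed to lie in [0, 1]. The function names and pixel values are hypothetical, and the structural similarity index measure is omitted because it requires local window statistics.

```python
import math


def mean_absolute_error(generated, reference):
    """Average absolute intensity difference per pixel."""
    return sum(abs(g - r) for g, r in zip(generated, reference)) / len(reference)


def peak_signal_to_noise_ratio(generated, reference, max_value=1.0):
    """PSNR in decibels; `max_value` is the largest possible intensity."""
    mse = sum((g - r) ** 2 for g, r in zip(generated, reference)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Flattened toy images with hypothetical intensities.
reference = [0.0, 0.5, 1.0, 0.5]
generated = [0.1, 0.5, 0.9, 0.5]

mae = mean_absolute_error(generated, reference)
psnr = peak_signal_to_noise_ratio(generated, reference)
```

A lower mean absolute error and a higher peak signal-to-noise ratio both indicate that the generated image is closer to the ground truth.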

If the ground truth is not available, the images could be evaluated via the activations of

a pre-trained network. Especially for non-medical images, the Fréchet Inception Distance (FID) is a popular metric [91].

Specifically, the generated and real images are fed into a pre-trained inception network, which

is a deep, efficient neural network utilizing different filter sizes [92]. The difference between

the activation statistics for generated and real images is then measured. The smaller the

difference, the more similar the synthetic images are to the real ones. Since the inception

network is trained on natural images only, it is debatable whether this metric is suitable for

medical images as this has not been validated yet [30]. Nevertheless, it is still being used

in practice [93, 94]. Recently, a 3D network trained on different medical datasets has been

published [95]. This network offers an interesting possibility to replace the inception network

trained on non-medical images for evaluating 3D synthetic medical images (see Chapter 5).
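The comparison of activations can be sketched as a Fréchet distance between two Gaussians fitted to the feature vectors. The sketch below makes a simplifying assumption that is not part of the FID itself: it treats the feature dimensions as independent (diagonal covariance), which avoids a matrix square root. The function names and the toy feature vectors are hypothetical, and no pre-trained network is involved.

```python
import math


def mean_and_variance(features):
    """Per-dimension mean and variance of a list of feature vectors."""
    n = len(features)
    dims = range(len(features[0]))
    means = [sum(f[d] for f in features) / n for d in dims]
    variances = [sum((f[d] - means[d]) ** 2 for f in features) / n for d in dims]
    return means, variances


def frechet_distance_diagonal(features_real, features_gen):
    """Frechet distance between two Gaussians with diagonal covariance."""
    mu_r, var_r = mean_and_variance(features_real)
    mu_g, var_g = mean_and_variance(features_gen)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_g))
    # For diagonal covariances the trace term reduces to a per-dimension sum.
    cov_term = sum(vr + vg - 2.0 * math.sqrt(vr * vg) for vr, vg in zip(var_r, var_g))
    return mean_term + cov_term


# Hypothetical two-dimensional activations.
real_features = [[0.0, 0.0], [2.0, 2.0]]
shifted_features = [[3.0, 0.0], [5.0, 2.0]]

distance_same = frechet_distance_diagonal(real_features, real_features)
distance_shifted = frechet_distance_diagonal(real_features, shifted_features)
```

The same code could be applied to activations from any feature extractor, including a network trained on medical images.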

The inception score is another popular metric for assessing the quality of synthetic non-

medical images [83]. It leverages a similar concept as the FID by utilizing the pre-trained

inception network. In contrast to the FID, it uses the network to predict the probability of a

generated image belonging to a specific class. The predictions are summarized in the inception

score. This score reflects whether images resemble an object of a specific class and whether the

wide range of classes is represented in the generated images. The inception score is therefore

specific to classification tasks and could be translated to medical image classification tasks if a

corresponding pre-trained network exists.
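A minimal sketch of how the class predictions can be summarized is given below. It uses the common formulation of the inception score as the exponential of the average Kullback-Leibler divergence between each image's predicted class distribution and the marginal class distribution. The function name and the probability vectors are hypothetical; in practice, the probabilities would come from the pre-trained network.

```python
import math


def inception_score(class_probabilities):
    """exp of the mean KL divergence between p(y|x) and the marginal p(y)."""
    n = len(class_probabilities)
    n_classes = len(class_probabilities[0])
    marginal = [sum(p[c] for p in class_probabilities) / n for c in range(n_classes)]
    kl_sum = 0.0
    for p in class_probabilities:
        # Terms with zero probability contribute nothing to the divergence.
        kl_sum += sum(pc * math.log(pc / marginal[c]) for c, pc in enumerate(p) if pc > 0.0)
    return math.exp(kl_sum / n)


# Confident predictions that cover both classes.
score_diverse = inception_score([[1.0, 0.0], [0.0, 1.0]])
# Confident predictions that all fall into one class.
score_collapsed = inception_score([[1.0, 0.0], [1.0, 0.0]])
```

The score is high only if the individual predictions are confident and the classes are covered evenly, and it drops to 1 when all generated images are assigned to the same class.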

Another approach to estimating whether the generated images are realistic is to compare

the distribution of the real samples to the distribution of the generated samples. For this, the

log-likelihood [26], the Mahalanobis distance [96], the maximum mean discrepancy [97], or

similar metrics can be computed. For these metrics, the synthesized images are regarded as

more realistic the closer the two distributions are to each other. While the FID takes into

account the image quality, it lacks information about mode collapse, i.e., the variation within

the generated images. Thus, Sajjadi et al. introduced precision and recall for distributions [98].

Precision measures how much the distribution of the synthesized images can be generated by

a part of the real distribution. Thus, it quantifies the image quality of the synthetic images.

In comparison to that, recall measures how much of the real distribution can be accounted for

by a part of the generated distribution, hence quantifying mode collapse. Evaluating synthetic

images based on these two metrics offers a more detailed view of the quality of the images and

their diversity (see Chapter 5).

2.5.2 Evaluation Using a Downstream Task

Generating images for a downstream task is a good option to circumvent the problem of

selecting an appropriate metric for evaluating synthetic data [30]. Here, synthetic data and a

corresponding label are generated for a certain task, such as classification or segmentation.

After generation, a model is trained on the synthetic data, which can be evaluated on real

data. Its performance provides an estimate of how useful the synthesized data is.

Figure 2.7: The architecture of a U-Net. The image is fed through an encoding path for feature

extraction and a decoding path and outputs a segmentation mask. Skip connections preserve

spatial information. The figure is based on Figure 3 by Isola et al. [81].

In this thesis, we utilize a segmentation network as a downstream task for evaluating

synthetic data for brain vessel segmentation (see Chapter 4–6). For this, we use a specific type

of neural network, the U-Net architecture. Initially, the U-Net was proposed by Ronneberger et

al. for biomedical segmentation problems such as cell segmentation [56]. Since it was designed

to work with limited training data, it has been broadly applied to various other medical problems, including

brain vessel segmentation [38, 99]. The U-Net consists of an encoding and a decoding path

with skip connections between the layers of the two paths (see Figure 2.7). Within these skip

connections, the feature maps of the encoding path are copied to the decoding layers to better

preserve the spatial information. In the case of vessel segmentation, the performance can

be calculated by metrics such as the Dice Similarity Coefficient and the Hausdorff distance

between the predicted and ground truth segmentation [100]. The Dice Similarity Coefficient

(also known as the F1 score) measures the overlap of the predicted image and the ground truth

scaled by the total number of pixels in both segmentations. In contrast to this, the Hausdorff distance measures

the maximum of the minimum distances from each point of one segmentation to the other. It thus estimates the

largest deviation between the predicted segmentation and the ground truth.
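Both metrics can be sketched on small binary masks. In the example below, each segmentation is represented as a set of foreground pixel coordinates, which is an assumption made for brevity, as are the function names and the toy masks. The Hausdorff functions require both sets to be non-empty.

```python
import math


def dice_coefficient(pred, truth):
    """Twice the overlap divided by the total number of foreground pixels."""
    if not pred and not truth:
        return 1.0  # two empty masks agree perfectly
    return 2.0 * len(pred & truth) / (len(pred) + len(truth))


def directed_hausdorff(a, b):
    """Largest distance from a point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff_distance(pred, truth):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(pred, truth), directed_hausdorff(truth, pred))


# Toy masks given as (row, column) coordinates of foreground pixels.
truth = {(0, 0), (0, 1), (0, 2)}
pred = {(0, 1), (0, 2), (3, 2)}

dice = dice_coefficient(pred, truth)
hausdorff = hausdorff_distance(pred, truth)
```

In this toy case a single stray pixel leaves the Dice coefficient at two thirds but sets the Hausdorff distance to three pixels, which shows why the two metrics complement each other.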

Related Work

In the following, we review related work in the field of GANs in medical imaging. We split the

review into the two parts of this thesis: the synthesis of medical images using unconditional

GANs for the purpose of data sharing and image-to-image translations, particularly for

treatment planning.

3.1 Synthesis of Medical Images Using Unconditional GANs for Data Sharing

The motivation for synthesizing medical images using an unconditional GAN is usually

to increase data availability, typically for data augmentation or anonymization. Many

types of medical images were synthesized for data augmentation in the past years [101].

Among those were chest X-rays [102], retinal images [103, 104], and liver CTs [76]. In the

neuroimaging domain, Bowles et al. did a comprehensive study about augmenting both CT

and Magnetic Resonance (MR) images with progressive growing GANs for cerebrospinal fluid

segmentation [105]. Moreover, Shin et al. [29] and Foroozandeh et al. [106] synthesized images

using pix2pix GANs and progressive growing GANs, respectively, for improving brain tumor

segmentation performance.

In the context of medical image synthesis, most studies synthesized 2D images, although

many medical images have a third dimension. Thus, synthesizing only 2D images might

neglect important volumetric information. To date, only a few studies have generated 3D

images, which were usually downsampled due to computational constraints. For instance, both

Eklund [107] and Sun et al. [108] generated downsampled or resized 3D brain MR images.

Kwon et al. additionally synthesized thorax CTs [33]. In the field of stroke, Kwon et al.

generated downsampled 3D MR images with stroke lesions [33]. Still, there remains a lack

of studies that synthesize high-resolution 3D brain volumes. In particular for capturing fine

structures such as brain vessels, generating high-resolution volumes is crucial.

Unconditional GANs are especially useful for synthesizing privacy-preserving images as the

training images are not directly fed into the generator but indirectly via the discriminator’s

input. There are only a few applications of GANs for anonymization and thus for data

sharing in the medical imaging field. While Shin et al. tested their synthetic MR images for

anonymization [29], they did not provide any privacy guarantees. Other studies synthesized

chest X-ray images with differential privacy guarantees [73, 109, 110]. Zhang et al. [109]

and Nguyen et al. [110] additionally trained their GAN architectures in a federated learning

approach.

So far, the applications of GANs synthesizing data for sharing in the neuroimaging domain

are scarce with some MR sequences such as TOF-MRA not being considered. Additionally,

even fewer studies have generated 3D images, and no study to date has synthesized differentially

private neurological images. In the first part of this thesis, we explore the synthesis of labeled

TOF-MRA for the ultimate purpose of data sharing using different 2D GAN architectures

(see Chapter 4). In Chapter 5, we extend our GANs to 3D to capture the third dimension

and synthesize high-resolution TOF-MRA volumes. In the last chapter of the first part (i.e.,

Chapter 6), we implement a differentially private GAN to quantify the privacy and explore the

privacy-utility trade-off.

3.2 Image-to-Image Translation for Treatment Planning

GANs have been shown to be useful for many image-to-image translation tasks. Among

them are image denoising [88], super-resolution [111], cross-modality synthesis [78], and

reconstruction [112]. The two main architectures used for image-to-image translations are

the pix2pix GAN [81] and the CycleGAN [113]. A CycleGAN consists of two discriminators

and two generators that are simultaneously trained and, unlike the pix2pix GAN, does not rely on aligned

training pairs. While the CycleGAN shows good performance in medical tasks [114,

115], it is still recommended to use a pix2pix GAN architecture when paired training data is

available [112].

In the field of stroke imaging, most studies using GAN architectures addressed

the use case of lesion segmentation. Many ML approaches have already been suggested for

automating stroke lesion segmentation, which could support clinicians in the treatment decision

making [116, 117, 118]. In this context, GAN-based studies included CT images that were

utilized to synthesize the segmentation mask using image-to-image GANs directly [119, 120]

or to generate MR scans from the CT images [31, 121]. Furthermore, Wang et al. used a

GAN to synthesize the segmentation mask from MRI [122], whereas Platscher et al. used the

segmentation mask to create new MR scans [123]. Other GAN applications in stroke include

denoising, such as CT super-resolution [111] and low-dose to full-dose CT perfusion [124], as

well as missing MR sequence synthesis [125] and white matter hyperintensities prediction [126].

Recently, Benzakoun et al. proposed a GAN that aimed to shorten MRI scanning time for

stroke [86]. They synthesized fluid-attenuated inversion recovery images from DWI, which can

be used for treatment decisions when the stroke onset time is unknown. The study showed that

the synthesized fluid-attenuated inversion recovery images had a similar diagnostic performance

to the real images.

Similar to Benzakoun et al. [86], the automatic processing of 4D DSC-MRI images resulting

in expert-level perfusion parameter maps could save time in stroke care. McKinley et al.

have utilized classical ML approaches for generating expert validated perfusion maps. Other

related studies have created perfusion maps using DL approaches such as adapted U-Net

architectures [127, 128, 129]. While these studies show initial promising results, no study has

synthesized the perfusion maps utilizing GANs yet. In Chapter 7, we implement a modified

pix2pix GAN with additional temporal convolutions to generate expert-level perfusion maps

from DSC-MRI.

Part I

Synthesis of Medical Images for

Data Sharing

Synthesizing Anonymized and Labeled

TOF-MRA Patches for Brain Vessel

Segmentation Using Generative

Adversarial Networks

4.1 Context Within Thesis

GANs are typically used to synthesize new data. In this study, we focused on the feasibility

of GANs for generating 2D TOF-MRA image patches. This type of imaging visualizes

the cerebrovascular system, which is clinically relevant for cerebrovascular diseases such as

stroke. TOF-MRA images can be used to segment brain vessels and extract the patient’s

individual vessel tree. To evaluate the synthetic TOF-MRA patches appropriately, we generated

segmentation labels along with the image patches. We compared three different GAN

architectures for synthesis and trained a U-Net on the different synthesized image-label

pairs. These models were then evaluated and tested on real data.

Additionally, we simulated data sharing of our synthesized image-label pairs in a transfer

learning approach by fine-tuning the models trained on synthetic data on a second dataset.

We compared the segmentation performance of the pre-trained models to a model trained from

scratch with an increasing number of patches from the second dataset. We showed that our

pre-trained models led to superior performance compared to the newly trained model. This

chapter laid the groundwork for Chapters 5 and 6.

4.2 Journal Article

This chapter is based on the following publication that was published in Computers in Biology

& Medicine:

T. Kossen, P. Subramaniam, V. I. Madai, A. Hennemuth, K. Hildebrand, A. Hilbert, J.

Sobesky, M. Livne, I. Galinovic, A. A. Khalil, J. B. Fiebach, and D. Frey. “Synthesizing

anonymized and labeled TOF-MRA patches for brain vessel segmentation using

generative adversarial networks”. In: Computers in Biology and Medicine 131 (2021).

doi:10.1016/j.compbiomed.2021.104254

The original journal article is reprinted with permission of Elsevier.

Author Contribution

The first author Tabea Kossen conceptualized the study and interpreted the results together

with VIM, ML and DF. She implemented the GAN architectures and evaluations or supervised

PS in doing so. Additionally, she was responsible for the project administration, wrote the first

version of the manuscript, created the figures and coordinated the journal submission process.

Code Availability

The code for this project is publicly available:

https://github.com/prediction2020/GANs-for-anonymized-labeled-TOF-MRA-patches

Computers in Biology and Medicine 131 (2021) 104254

Available online 15 February 2021

Synthesizing anonymized and labeled TOF-MRA patches for brain vessel

segmentation using generative adversarial networks

Tabea Kossen, Pooja Subramaniam, Vince I. Madai, Anja Hennemuth, Kristian Hildebrand, Adam Hilbert, Jan Sobesky, Michelle Livne, Ivana Galinovic, Ahmed A. Khalil, Jochen B. Fiebach, Dietmar Frey

CLAIM - Charité Lab for AI in Medicine, Charité Universitätsmedizin Berlin, Germany

Department of Computer Engineering and Microelectronics, Computer Vision & Remote Sensing, Technical University Berlin, Berlin, Germany

Department of Electrical Engineering and Computer Science, Technical University of Berlin, Berlin, Germany

School of Computing and Digital Technology, Faculty of Computing, Engineering and the Built Environment, Birmingham City University, Birmingham, UK

Institute for Imaging Science and Computational Modelling in Cardiovascular Medicine, Charité Universitätsmedizin Berlin, Berlin, Germany

Fraunhofer MEVIS, Bremen, Germany

Department VI Computer Science and Media, Beuth University of Applied Sciences, Berlin, Germany

Johanna-Etienne-Hospital, Neuss, Germany

Centre for Stroke Research Berlin, Charité Universitätsmedizin Berlin, Berlin, Germany

Department of Neurology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany

Mind, Brain, Body Institute, Berlin School of Mind and Brain, Humboldt University Berlin, Berlin, Germany

Berlin Institute of Health, Berlin, Germany

ARTICLE INFO

Keywords:

Anonymization

Generative adversarial networks

Image segmentation

ABSTRACT

Anonymization and data sharing are crucial for privacy protection and acquisition of large datasets for medical

image analysis. This is a big challenge, especially for neuroimaging. Here, the brain’s unique structure allows for

re-identification and thus requires non-conventional anonymization. Generative adversarial networks (GANs)

have the potential to provide anonymous images while preserving predictive properties.

Analyzing brain vessel segmentation, we trained 3 GANs on time-of-flight (TOF) magnetic resonance angiography

(MRA) patches for image-label generation: 1) Deep convolutional GAN, 2) Wasserstein-GAN with

gradient penalty (WGAN-GP) and 3) WGAN-GP with spectral normalization (WGAN-GP-SN). The generated

image-labels from each GAN were used to train a U-net for segmentation and tested on real data. Moreover, we

applied our synthetic patches using transfer learning on a second dataset. For an increasing number of up to 15

patients we evaluated the model performance on real data with and without pre-training. The performance for all

models was assessed by the Dice Similarity Coefficient (DSC) and the 95th percentile of the Hausdorff Distance

(95HD).

Comparing the 3 GANs, the U-net trained on synthetic data generated by the WGAN-GP-SN showed the highest

performance to predict vessels (DSC/95HD 0.85/30.00) benchmarked by the U-net trained on real data (0.89/

26.57). The transfer learning approach showed superior performance for the same GAN compared to no pre-

training, especially for one patient only (0.91/24.66 vs. 0.84/27.36).

In this work, synthetic image-label pairs retained generalizable information and showed good performance for

vessel segmentation. Besides, we showed that synthetic patches can be used in a transfer learning approach with

independent data. This paves the way to overcome the challenges of scarce data and anonymization in medical

imaging.

1. Introduction

Modern deep learning methods have revolutionized the field of

natural image analysis [18,32]. These methods are translated to medical

image analysis with growing success [19,20,27]. However, in contrast to

natural images, the number of data sets in medical image analysis is

* Corresponding author. CLAIM - Charité Lab for AI in Medicine, Charité Universitätsmedizin Berlin, Germany.

E-mail address: [email protected] (T. Kossen).

https://doi.org/10.1016/j.compbiomed.2021.104254

Received 17 November 2020; Received in revised form 27 January 2021; Accepted 3 February 2021

usually orders of magnitude smaller since their availability is limited

owing to data privacy regulation. This poses a continuous challenge for

deep learning research in the medical imaging field. To meet this chal-

lenge, anonymization of medical images is an essential method to ensure

both data privacy and data availability for research. However, current

anonymization methods in neuroimaging such as face blurring or face

removal still allow re-identification and thus cannot be applied [1,26,

34]. These results call for new techniques to anonymize medical neu-

roimaging data to both protect patient privacy and to facilitate research

progress.

Generative adversarial networks (GANs) have the potential to fulfill

this need. GANs have already been applied successfully for medical

imaging data synthesis [24,33,37]. Also, first pilot studies have already

made use of GANs for anonymization purposes [15,30]. However, ap-

plications for neuroimages are scarce and synthesizing images often

requires additional patient information such as a segmentation label

[30]. This means that patient information is still fed into the model and

the generated images are then not properly anonymized. Thus, there is a

need to investigate the ability of GANs to create state-of-the-art anon-

ymous synthetic neuroimaging data maintaining the predictive prop-

erties of the original data. Importantly, such an approach would have

the most beneficial impact if the corresponding labels would be created

in the same process since many supervised deep learning applications

require time-consuming manual labeling of the dataset by experienced

physicians.

In this work, we utilize arterial brain vessel segmentation to test the

ability of GANs to create synthetic neuroimaging data and correspond-

ing labels. Moreover, we investigate the generalizability of the synthe-

sized data on a second, independent dataset. With respect to the

generative architectures, we train 3 different GAN architectures on time-

of-flight (TOF) magnetic resonance angiography (MRA) image patches

of patients with cerebrovascular disease: 1) Deep Convolutional GAN

(DCGAN), 2) Wasserstein GAN with gradient penalty (WGAN-GP) and 3)

WGAN-GP using spectral normalization (WGAN-GP-SN). With each GAN

type, we synthesized both the image and the corresponding label. We

validate the generated synthetic patches using two different approaches.

In the first approach, we evaluate the quality of the generated patches a)

using the Fréchet inception distance (FID) and b) by training a vessel

segmentation U-net on the synthetic patches. The U-net’s performance is

then assessed on real test data. In total, 66 patients were utilized for this

analysis. In the second approach, we use the synthetic patches to pre-

train a vessel segmentation model and apply the network weights in a

transfer learning setting to pre-initialize the training of a U-net model

using up to 15 patients from a second, independent TOF-MRA dataset.

The performance of this model is then compared to a U-net model

without any pre-training. Finally, to facilitate and accelerate future

research on arterial vessel segmentation and to corroborate the useful-

ness of the effective anonymization procedure, we make the synthesized

image-label pairs generated in our study available upon request.

Taken together, the contributions of the paper are as follows. We present

effectively anonymized and labeled TOF-MRA patches for brain vessel

segmentation, to our knowledge for the first time for this kind of imaging

modality. Furthermore, we compare three different state-of-the-art GAN

architectures and evaluate our synthesized labeled data on an

independent, second dataset in a novel evaluation pipeline. We show

that pre-training a vessel segmentation network using our synthetic data

yields superior performance compared to no pre-training and can reduce

the amount of additional training data. Finally, we make our synthesized

data available upon request to facilitate further research.

2. Related work

GANs have already been shown to be successful in many applications

of data augmentation in medical imaging [6,29] as well as in

neuroimaging [3,5]. Here, real medical images together with synthe-

sized images were used to improve models that were trained on real data

only. Whereas we also provide results on data augmentation, this study

focused on models trained on purely synthetic data and their generalizability

to a new dataset.

Generating medical images with labels is not a new idea. Neff et al.

showed that lung x-rays with corresponding segmentation labels can be

generated using a GAN architecture [24]. Guibas et al. demonstrated the

synthesization of labeled retina images using two GANs [7]. While these

studies focused on 2D medical images, we use a 3D dataset and evaluate

the performance on an independent dataset. In the neuroimaging

domain Foroozandeh et al. recently showed that synthesized and labeled

MR images can improve tumor segmentation performance [5]. How-

ever, the focus here was on augmentation and models trained on syn-

thesized data alone yielded comparably low performance.

In addition to that, we tested the generalizability of our GAN ar-

chitecture via a transfer learning approach. While previous studies such

as Foroozandeh et al. and Frid-Adar et al. only considered one dataset [5,

6], we successively added training images from a second dataset to a

pre-trained segmentation model. After that, we compared its perfor-

mance to the model’s performance that was trained from scratch.

Whereas Guibas et al. provided an evaluation on a second dataset, we

here extended the evaluation by adding images successively to get more

robust results [7].

3. Methods

3.1. Network architecture

The architecture of the proposed DCGAN was adapted from Radford

et al. [25] and Neff et al. [24]. The WGAN-GP is an extension of the

original Wasserstein GAN [2] using gradient penalty for regularization

[8]. For the third architecture WGAN-GP-SN spectral normalization was

used in the convolutional layers of the WGAN-GP [21]. Our code is

openly available.

The proposed methods and the structure of the GAN are

shown in Fig. 1.

The generator G of all architectures took a noise vector of length 100

sampled from a Gaussian distribution as input. The noise vector was then

fed through 6 upsampling convolutional layers using a kernel size of 5

and stride of 2. After each convolution layer, a batch normalization layer

and a ReLU activation layer were added, except for the last convolution

layer. The activation function used after the last convolution layer is the

hyperbolic tangent function. The network then outputs two 96 × 96

images that correspond to one image-label pair $x_{gen} \sim p_{gen}$. The objective

function for the generators of all architectures was built upon:

$$L_G = \max_G \; \mathbb{E}_{x_{gen} \sim p_{gen}}\left[\log\left(D(x_{gen})\right)\right] \tag{1}$$

This term is maximized if the discriminator regards the generated

input as real with a high certainty ($D(x_{gen})$ close to 1). In other words, the

discriminator is fooled by the generator. The discriminator D for all

architectures took two 96 × 96 images as input which correspond to

either a real image-label pair or generated image-label pair. The pairs

were again fed through 6 convolutional layers with a kernel size of 5 and

stride of 2. After each convolution layer, a batch normalization layer and

a leaky ReLU (with a slope of 0.2) were added, except for the last

convolution layer. The activation function used after the last

convolution layer in the DCGAN was a sigmoid function. The objective

function of the discriminator for the DCGAN was:

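$$L_D = \max_D \; \mathbb{E}_{x_{real} \sim p_{real}}\left[\log D(x_{real})\right] + \mathbb{E}_{x_{gen} \sim p_{gen}}\left[\log\left(1 - D(x_{gen})\right)\right] \tag{2}$$

where $x_{real} \sim p_{real}$ denotes the real image-label pair. Here, the first part of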

the equation maximizes for the real input to be identified as real

($D(x_{real})$ close to 1) and the second part for the generated input to be

identified as such ($D(x_{gen})$ close to 0).
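As a small numerical illustration of objectives (1) and (2), the sketch below evaluates both terms on hand-picked discriminator outputs. The function names and the example probabilities are hypothetical and only serve to show in which direction each objective moves; this is not the training code of the study.

```python
import math


def discriminator_objective(d_real, d_gen):
    """Eq. (2): mean log D(x_real) plus mean log(1 - D(x_gen)).

    d_real and d_gen are lists of discriminator outputs in (0, 1).
    The discriminator tries to maximize this value.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    gen_term = sum(math.log(1.0 - p) for p in d_gen) / len(d_gen)
    return real_term + gen_term


def generator_objective(d_gen):
    """Eq. (1): mean log D(x_gen). The generator tries to maximize this value."""
    return sum(math.log(p) for p in d_gen) / len(d_gen)


# Hypothetical outputs: a discriminator that separates real from generated
# pairs well, and one that is fooled by the generated pairs.
separating = discriminator_objective(d_real=[0.9, 0.95], d_gen=[0.1, 0.05])
fooled = discriminator_objective(d_real=[0.9, 0.95], d_gen=[0.9, 0.95])
```

With these numbers, `separating` is larger than `fooled`, while `generator_objective` is larger for the fooled discriminator, which mirrors the opposing goals of the two networks.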

For the WGAN-GP and WGAN-GP-SN, a gradient penalty term for

regularization was added to the discriminator’s loss:

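$$\mathrm{loss}_D = D(x_{gen}) - D(x_{real}) + \lambda\left(\left\lVert \nabla D\left(\epsilon\, x_{real} + (1-\epsilon)\, x_{gen}\right) \right\rVert_2 - 1\right)^2 \tag{3}$$

where $\epsilon \sim U[0,1]$ and $\lambda = 10$. The gradient penalty enforces the Lipschitz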

constraint. In this way, the norm of the gradient is bounded and does not

lead to exploding gradients. Overall, this stabilizes the training of the

GAN [8]. Since the discriminator acted as a critic, the sigmoid activation

function in the last convolutional layer was omitted. The batch

normalization was replaced by instance normalization to normalize

across features and channels in the WGAN-GP. In the WGAN-GP-SN

architecture, spectral normalization was used instead of instance

normalization.
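The critic loss with gradient penalty in Eq. (3) can be made concrete with a toy example. The sketch below assumes a linear critic D(x) = w · x, for which the gradient with respect to the input is simply w, so no automatic differentiation is needed. The linear critic, the function names and the example vectors are illustrative assumptions; the critic in the study is a convolutional network whose gradient would be obtained by backpropagation.

```python
import math
import random


def critic(w, x):
    """Toy linear critic D(x) = w . x (assumption for illustration only)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def critic_input_gradient(w, x):
    """Gradient of the linear critic w.r.t. its input; it equals w for every x."""
    return list(w)


def wgan_gp_critic_loss(w, x_real, x_gen, lam=10.0, rng=random):
    """Eq. (3): D(x_gen) - D(x_real) + lam * (||grad D(x_hat)||_2 - 1)^2."""
    eps = rng.uniform(0.0, 1.0)
    # Random interpolation between the real and the generated sample.
    x_hat = [eps * r + (1.0 - eps) * g for r, g in zip(x_real, x_gen)]
    grad = critic_input_gradient(w, x_hat)
    grad_norm = math.sqrt(sum(g * g for g in grad))
    penalty = lam * (grad_norm - 1.0) ** 2
    return critic(w, x_gen) - critic(w, x_real) + penalty


# A critic with unit gradient norm pays no penalty; a steeper one is penalized.
loss_unit = wgan_gp_critic_loss([0.6, 0.8], x_real=[1.0, 0.0], x_gen=[0.0, 1.0])
loss_steep = wgan_gp_critic_loss([3.0, 4.0], x_real=[1.0, 0.0], x_gen=[0.0, 1.0])
```

In this toy setting the penalty vanishes when the weight vector has norm 1 and grows quadratically otherwise, which is how the term pushes the critic towards a gradient norm of 1.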

For training the DCGAN, the Adam optimizer [17] with a learning

rate of 0.0003 with β1=0.5 was used for both the generator and the

discriminator. The batch size was 512 and the model was trained for 178

epochs. To improve stability of the training, label smoothing (ranges

0.7–1.2/0–0.3) and feature matching between the last convolutional

layer using L1 norm were applied [28].
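One common way to implement label smoothing with the ranges quoted above is to draw the discriminator targets at random instead of using hard 1/0 labels. The sketch below assumes that 0.7–1.2 applies to real pairs and 0–0.3 to generated pairs and that targets are drawn uniformly; the text does not state the sampling scheme, so both points are assumptions, and the function name is hypothetical.

```python
import random


def smoothed_targets(batch_size, is_real, rng=random):
    """Draw soft discriminator targets for one batch.

    Assumption: real pairs get targets in [0.7, 1.2], generated pairs
    get targets in [0.0, 0.3], each sampled uniformly.
    """
    low, high = (0.7, 1.2) if is_real else (0.0, 0.3)
    return [rng.uniform(low, high) for _ in range(batch_size)]


rng = random.Random(0)  # fixed seed so the example is reproducible
real_targets = smoothed_targets(8, is_real=True, rng=rng)
gen_targets = smoothed_targets(8, is_real=False, rng=rng)
```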

For WGAN-GP and WGAN-GP-SN, the Adam optimizer was utilized

with a learning rate of 0.0001 with β1=0 and β2=0.9 for both

generator and discriminator. The batch size was 128 for the WGAN-GP, which was

trained for 194 epochs, and 64 for the WGAN-GP-SN, which was trained for 157 epochs.

In each epoch the discriminator was updated five times and the gener-

ator once. All models were implemented in PyTorch and trained on a

Tesla V100.

3.2. Patients

MRA data of a total of 121 patients from two studies were used:

PEGASUS (N =66) and 1000Plus (N =55). All patients were diagnosed

with a cerebrovascular disease. Details on both studies can be found in

previous papers, for the PEGASUS study see Mutke et al. [23], for the

1000Plus study see Hotter et al. [13]. All the patients gave their

informed written consent. The studies have been conducted in accordance

with the authorized ethical review committee of Charité - Universitätsmedizin Berlin.

Scans were performed on a clinical 3T whole-body system (Magnetom

Trio, Siemens Healthcare, Erlangen, Germany) using a 12-channel

receive radiofrequency coil (Siemens Healthcare) tailored for head

imaging.

Parameters PEGASUS: voxel size = (0.5 × 0.5 × 0.7) mm³; matrix size: 312 × 384 × 127; TR/TE = 22 ms/3.86 ms; acquisition time: 3:50 min; flip angle = 18°.

Parameters 1000Plus: voxel size = (0.5 × 0.5 × 0.7) mm³; matrix size: 312 × 384 × 127; TR/TE = 22 ms/3.86 ms; acquisition time: 3:50 min; flip angle = 18°.

For both datasets, skull-stripping was applied. The segmentation

labels were produced semi-manually using a standardized pipeline along

with 4 raters correcting the labels as described in Livne et al. [20].

Fig. 1. Workflow of this study (A) and basic architecture of the generative adversarial networks that were trained (B).

3.3. Data splitting and patch extraction

For the anonymization, 41 out of the 66 PEGASUS patients were used

as a training set, 11 were used for validation and 14 for testing. For the

transfer learning approach, one to 15 patients in increments of two of

the 1000Plus data were utilized for training. The 1000Plus validation set

consisted of 10 and the test set of 40 patients.

Due to memory considerations, 2D patches of size 96 ×96 were

extracted from each patient instead of using the whole volume. The data

contained 1% vessels and 99% background. To compensate for this

imbalance, 500 patches per patient with a brain vessel in the center were

extracted. Then, 500 random patches per patient were added. The input

patches were normalized to a range between −1 and 1 for the GAN used

for anonymization. For the U-net segmentation model, the input was

normalized patch-wise to zero-mean and unit-variance.
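The two normalization schemes described above can be sketched as follows. The text does not say how the range of −1 to 1 was obtained for the GAN input, so the per-patch min-max scaling used here is an assumption, as are the function names. Patches are represented as flat lists of intensities to keep the example dependency-free.

```python
import statistics


def scale_to_minus_one_one(patch):
    """Min-max scale a flattened patch to [-1, 1] (assumed scheme for the GAN input)."""
    low, high = min(patch), max(patch)
    if high == low:
        # Constant patch: map everything to the middle of the range.
        return [0.0 for _ in patch]
    return [2.0 * (v - low) / (high - low) - 1.0 for v in patch]


def standardize(patch):
    """Patch-wise zero-mean, unit-variance normalization (U-net input)."""
    mean = statistics.fmean(patch)
    std = statistics.pstdev(patch)
    if std == 0.0:
        return [0.0 for _ in patch]
    return [(v - mean) / std for v in patch]


gan_input = scale_to_minus_one_one([0.0, 5.0, 10.0])
unet_input = standardize([1.0, 2.0, 3.0])
```

The range of −1 to 1 matches the hyperbolic tangent output of the generator, so real and generated patches live on the same intensity scale when they reach the discriminator.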

3.4. Performance evaluation

The hyperparameters of the GAN architectures were pre-selected

based on visual inspection. After that, the generated images were

quantitatively evaluated using three different metrics: 1) Fréchet

inception distance (FID) [11], 2) the DSC and 3) the 95th percentile of

the Hausdorff distance (95HD) of a U-net segmentation model. The FID

measures the similarity of the real and generated images by feeding both

into an Inception-v3 network. The difference between the activations in

the pool3 layer inside the Inception-v3 network is then calculated as

follows:

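$$\mathrm{FID} = \left\lVert \mu_{real} - \mu_{gen} \right\rVert^2 + \mathrm{Tr}\left(\Sigma_{real} + \Sigma_{gen} - 2\left(\Sigma_{real}\Sigma_{gen}\right)^{1/2}\right) \tag{4}$$

where $x_{real} \sim N(\mu_{real}, \Sigma_{real})$ and $x_{gen} \sim N(\mu_{gen}, \Sigma_{gen})$ are the distributions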

of the features in the pool3 layer of the real and generated data

respectively. In this way, the network’s activation is measured both for

real and generated data and then compared. If the similarity between

them is high, we expect similar activation and thus, a small distance.

For robustness, the FID was calculated on 4 different sets of gener-

ated data, each containing 41,000 patches of all three architectures with

the respective 41,000 real patches. The lower the FID, the higher the

similarity of the generated data to the original data.
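To build intuition for Eq. (4), it helps to look at the one-dimensional case, where the covariance matrices reduce to variances and the matrix square root to an ordinary square root. The sketch below computes this scalar special case on made-up feature values. It is not the full FID: the real metric uses the 2048-dimensional pool3 features of Inception-v3 and a matrix square root, and the function name and the use of population variance are assumptions made for this illustration.

```python
import math
import statistics


def fid_1d(real_feats, gen_feats):
    """Scalar special case of Eq. (4) for one-dimensional features.

    FID = (mu_r - mu_g)^2 + var_r + var_g - 2 * sqrt(var_r * var_g)
    """
    mu_r = statistics.fmean(real_feats)
    mu_g = statistics.fmean(gen_feats)
    var_r = statistics.pvariance(real_feats)
    var_g = statistics.pvariance(gen_feats)
    return (mu_r - mu_g) ** 2 + var_r + var_g - 2.0 * math.sqrt(var_r * var_g)


# Hypothetical feature values: identical sets versus a shifted set.
same = fid_1d([0.0, 2.0], [0.0, 2.0])
shifted = fid_1d([0.0, 2.0], [3.0, 5.0])
```

Identical feature distributions give a distance of zero, and shifting the mean by 3 with unchanged variance gives 9, the squared mean difference.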

As a second evaluation, the state-of-the-art “half U-net” used in Livne

et al. [20] was trained on the 4 sets of generated data alone as well as

both real and generated data. The parameters learning rate and dropout

rate were tuned with respect to the validation set. Additionally, classical

augmentation was used as described in Livne et al. [20] if this led to an

improved performance on the validation set. Each segmentation

network was trained for 15 epochs. Then, the performance was evalu-

ated on the binary segmentation maps of the test set by the DSC and

95HD:

$$\mathrm{DSC} = \frac{2TP}{2TP + FP + FN} \tag{5}$$

where TP are the true positives, FP the false positives and FN the false

negatives. The DSC thus measures the overlap between the predicted vessel

voxels and the ground truth relative to the total number of vessel voxels in

both. The Hausdorff distance is defined as:

$$\mathrm{HD} = \max\left(\max_{i \in [0, N-1]} d(i, P, G),\; \max_{i \in [0, M-1]} d(i, G, P)\right) \tag{6}$$

where N and M denote the number of voxels on the vessel tree of the

ground truth G and the prediction P respectively. d(i, P, G) is defined as

the distance from vessel voxel i in G to the closest vessel voxel in P. In

other words, the Hausdorff distance finds the minimum distance for

each voxel in one subset (e.g. predicted vessel voxels) to another subset

(e.g. ground truth) and takes the maximum of this. The 95HD was then

the 95th percentile Hausdorff distance for each voxel, averaged over

each voxel and each patient. It was measured in millimeters.
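Equations (5) and (6) can be illustrated on a tiny example in which a segmentation is represented as a set of vessel voxel coordinates. The sketch below is a brute-force illustration with hypothetical function names; it returns distances in voxel units and would need to be scaled by the voxel size to obtain millimeters. It implements the plain Hausdorff distance of Eq. (6), not the 95th-percentile variant.

```python
import math


def dice(pred, truth):
    """Eq. (5) on two sets of vessel voxel coordinates."""
    tp = len(pred & truth)
    fp = len(pred - truth)
    fn = len(truth - pred)
    denom = 2 * tp + fp + fn
    # Assumption: two empty segmentations count as perfect agreement.
    return 1.0 if denom == 0 else 2 * tp / denom


def directed_distances(source, target):
    """For each voxel in source, the distance to the closest voxel in target."""
    return [min(math.dist(s, t) for t in target) for s in source]


def hausdorff(pred, truth):
    """Eq. (6): maximum of the two directed distances (both sets must be non-empty)."""
    return max(max(directed_distances(truth, pred)),
               max(directed_distances(pred, truth)))


pred = {(0, 0), (0, 1)}
truth = {(0, 1), (0, 4)}
dsc_value = dice(pred, truth)
hd_value = hausdorff(pred, truth)
```

Here one voxel overlaps, one is a false positive and one a false negative, so the DSC is 0.5. The single distant ground-truth voxel at (0, 4) alone determines the Hausdorff distance of 3, which illustrates how sensitive the maximum is to outliers and why a percentile-based variant is often preferred.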

In the second part of the analysis, the performance of the U-net

trained on generated patches was evaluated on the 1000Plus dataset. For

an increasing number of training patients (1, 3, …, 15) the U-net was

trained from scratch and using the weights from the best performing

model of those trained on the generated image-label pairs (transfer

learning). The performance of using real data only and transfer learning

was then compared by assessing the DSC and 95HD on the validation (10

patients) and test set (40 patients).

4. Results

Overall, generated synthetic patches showed high similarities to the

training set patches, in particular those that were synthesized by the

WGAN-GP-SN. The patches generated by the DCGAN showed a lower

resolution with slight checkerboard artifacts compared to the original

patches. The generated corresponding labels fit well to the patches for

all models. A subset of the synthesized image-label pairs for all GAN

architectures as well as original image-label pairs are shown in

Fig. 2A–D. In the quantitative assessment, the data generated by the

WGAN-GP-SN architecture showed the highest similarity to the real data

with a FID of 37.01 compared to 141.82 for the worst performing

DCGAN. All FID values for real and synthesized data can be found in

Table 1.

In the first validation approach, the U-net trained on data generated

by the WGAN-GP-SN showed the highest performance of all GAN models

with a segmentation performance of 0.85 DSC/30.00 95HD. The U-net

trained on real PEGASUS data showed a performance of 0.89 DSC/26.57

95HD. The same model showed a similarly high performance in the

external validation on the 1000Plus data with 0.88 DSC/25.12 95HD.

Quantitative results for all models trained on generated and/or real data

can be found in Table 3.

In the second validation approach applying transfer learning, the U-net

pre-initialized with the weights from training on synthesized patches

exhibited a higher performance compared to the model trained from

scratch on real data only. Particularly when training

on patches from one patient only (n =1000), transfer learning using

patch-label pairs generated by the WGAN-GP-SN led to a higher per-

formance in terms of DSC and 95HD (DSC/95HD 0.91/24.66 compared

to 0.84/27.36). This observed performance difference between pre-

initialized models and models trained from scratch became smaller

when more patients were used for training. Results of the transfer

learning approach are visualized in Fig. 3. Fig. 4 shows the error maps

for both approaches on one example patient in large vessels (Fig. 4A and

C) and small vessels (Fig. 4B and D).

5. Discussion

We present a Wasserstein-GAN based model for the generation of

synthetic TOF-MRA imaging data and corresponding labels. The model

generated synthetic data of high quality, as evidenced visually and

through the FID measure, and retained much of the predictive properties

of the original images. Here, a predictive model for vessel segmentation

trained on synthetic data alone showed a good performance on one

dataset and excellent performance on an external validation set. The

synthetic data were also successfully applied in a transfer learning

approach where training was pre-initialized with weights from a model

trained on synthetic data. It outperformed the models trained on real

data. Our results mark a significant step towards the use of GAN-based

models to generate synthetic and effectively anonymous data. Conse-

quently, this approach has the potential to significantly accelerate

research in the field of neuroimaging.

While the image-label pairs synthesized by the DCGAN showed some

artifacts, the more recent GAN architectures (WGAN-GP and WGAN-GP-

SN) produced higher resolution data that looked similar to the real data

(Fig. 2). The superiority of the WGAN-approaches was confirmed by

lower FID values as well as the improved performance of the U-net

segmentation models trained on synthetic data. This can be explained by

the inherent differences between Wasserstein-GANs and the DCGAN. In

contrast to the DCGAN, the loss function of the WGAN-GP architectures

utilizes the Earth Mover’s distance and is bounded by a Lipschitz

constraint [2,8]. This works as a robust regularization and enhances

training stability while diminishing mode collapse at the same time. This

explains why the WGAN-GP produced more realistic looking

image-label pairs. Other studies confirm the superiority of Wasserstein

GAN architectures over the DCGAN [2,8]. A recent addition to GAN

architectures was the introduction of spectral normalization. This

method additionally restricts the discriminator’s weights for each layer

in order to stabilize training even for high learning rates [21]. As evi-

denced in our work, spectral normalization is also beneficial for the

application of Wasserstein GANs, and the combination of both regula-

rization techniques (WGAN-GP-SN) yielded the best image quality both

by visual inspection as well as in terms of FID. These techniques have

thus supported the preservation of the predictive properties for vessel

segmentation within the synthetic patches. Therefore, it is likely that

more sophisticated (future) GAN architectures will further improve the

generation of synthetic data. Here, potential current candidate methods

Fig. 2. Real and synthesized image patches with corresponding labels. (A) to (C) show image-label pairs generated by DCGAN (A), WGAN-GP (B) and WGAN-GP-SN

(C) respectively. (D) shows real patches and corresponding labels. The synthesized patches resemble real vessel patches and the labels fit well to the patches, especially

those generated by WGAN-GP-SN (C).

are progressive growing GAN (PG-GAN) or stacked GAN architectures

[14,16].

Whereas the data generated by WGAN-GP-SN consistently yielded

the highest DSC in the transfer learning approach, this is not as apparent

in other parts of the results. First, the 95HD did not show a consistent

trend. Since the Hausdorff distance is vulnerable to outliers, we argue

that it might not be as reliable as the DSC. This is also corroborated by

the high standard deviation over the patients. Secondly, when training

the U-net with real data and additional synthesized data (data

augmentation), the performance only slightly increased for the WGAN-

GP-SN. In addition, the DCGAN seemed to perform slightly better. This

might be due to the more noisy and blurry appearance of the images

generated by the DCGAN compared to the WGAN architectures. Here,

for the DCGAN only the vessels seem to be sharp. Additionally, they fit

well to the generated segmentation label. This attention on the vessels

might lead to an increased focus on vessels within the feature extraction

in the encoding part of the U-net. Then, together with the real images the

U-net model is able to learn how real images look as well as the focus on

vessels. The images generated by the WGAN architectures look more similar to

the real images, which is also corroborated by the lower FID, and cannot profit

from this effect. Thus, they cannot provide (much) additional information to the

real data. When training a U-net on synthesized data alone, the DCGAN is then

outperformed by the WGAN architectures as its generated images look

noisier.

GAN architectures have the potential to generate anonymized data

since the generator does not have direct access to the training data. This

also holds true for this study: the generator synthesizes patch-label pairs

from a noise vector. However, a recent study by Hayes et al. [10] shows

that DCGANs might be vulnerable to so-called membership inference

attacks [31]. Such attacks aim to identify whether a given data sample

was part of the original training set or not. To prevent this, differentially

private GANs (DPGANs) have been introduced [36]. Here, carefully

adjusted noise is introduced in the gradients during the discriminator’s

training. While these GANs have the potential to ensure a certain level of

privacy, they show poorer performance to date [22] and have so far only been

trained on natural image datasets. Training a DPGAN on sparse

medical imaging datasets remains a major challenge. While DPGANs

might provide even further advantages in anonymization, we argue that

our synthesized patch-label pairs are effectively anonymized. For one, in

the WGAN-GP-SN approach, we apply Lipschitz regularization tech-

niques such as gradient penalty and spectral normalization. Wu et al.

found that these techniques might reduce information leakage and

might even make the trained models resistant to membership inference

attacks [35]. Furthermore, we use randomly sampled 2D patches in this

study. Thus, for a successful membership inference attack two events

must coincide: First, the real training data that is protected by

state-of-the-art hospital security systems has to be leaked. Second, the

patches need to be extracted in the exact same way as in the

GAN-training process to allow re-identification. The minuscule
probability of these events occurring together is comparable to other
theoretical scenarios of state-of-the-art anonymization. For example, any
tabular data anonymized using state-of-the-art techniques could be
re-identified when compared with the leaked original data. Thus, we
consider our generated patches anonymous and hence make them available
to researchers upon request.

Table 1
Fréchet inception distance (FID) as a quantitative measurement of the generated images' similarity to the real images for each of the three GAN architectures. The FID is averaged over the 4 different datasets generated from one model. The standard deviation (SD) is shown in brackets. WGAN-GP-SN showed the highest similarity to the real data in terms of FID.

GAN architecture    Mean FID (SD)
DCGAN               141.82 (0.32)
WGAN-GP              52.41 (0.16)
WGAN-GP-SN           37.01 (0.22)

Table 2
Summary of the mean Dice similarity coefficient (DSC) and the mean 95th-percentile Hausdorff distance (95HD) of the U-net on the test set with standard deviation (SD). Both metrics are averaged over 4 different sets of generated data. The artificial patches were generated by Generative Adversarial Networks (GANs) trained on the PEGASUS dataset. For data augmentation, both real and generated patches have been used for training. For anonymization, the U-net was trained on generated patches only. Models trained on anonymized, synthetic data only show performances close to the model trained on real data.

                                              Test DSC         Test 95HD [mm]
                                              Mean    SD       Mean     SD
U-net on real PEGASUS data (Livne et al.)     0.892            26.569

Data augmentation (real data (PEGASUS) and generated data)
DCGAN                                         0.903   0.003    26.482   1.027
WGAN-GP                                       0.891   0.003    26.784   0.736
WGAN-GP-SN                                    0.894   0.005    27.909   1.137

Anonymization (trained on generated data only)
PEGASUS anonymization models: validated and evaluated on PEGASUS data
DCGAN                                         0.779   0.008    31.481   0.559
WGAN-GP                                       0.812   0.008    30.242   1.228
WGAN-GP-SN                                    0.848   0.007    30.001   0.702

PEGASUS anonymization models evaluated on real 1000Plus
DCGAN                                         0.792   0.022    27.103   0.288
WGAN-GP                                       0.871   0.003    27.307   0.694
WGAN-GP-SN                                    0.875   0.010    25.119   0.403

Fig. 3. Performance evaluation for segmentation for an increasing number of patients on the 1000Plus dataset when trained from scratch (green) and using transfer learning (blue). The black dotted lines indicate the performance of the U-net on the real PEGASUS dataset. The error bars show the standard deviation over the patients. Especially for up to 5000 data samples, the pre-trained WGAN-GP-SN models outperform the models without any pre-training.

T. Kossen et al.

Computers in Biology and Medicine 131 (2021) 104254
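The differentially private training mentioned in the anonymization discussion adds calibrated noise to the gradients during the discriminator's training. The following is a minimal, generic sketch of that idea (per-example gradient clipping followed by Gaussian noise), not the exact procedure of [36]. The function name `privatize_gradients` and the parameters `clip_norm` and `noise_multiplier` are hypothetical and chosen for illustration only.

```python
import numpy as np

def privatize_gradients(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-example gradient and add Gaussian noise to their sum.

    Generic illustration of noisy gradient updates; all names are assumptions.
    """
    grads = np.asarray(per_example_grads, dtype=np.float64)
    # L2 norm of every per-example gradient (one row per example)
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    # Scale down only those gradients whose norm exceeds clip_norm
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = grads * scale
    # Noise standard deviation is proportional to the clipping bound
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grads.shape[1])
    # Average of the noisy, clipped gradient sum
    return (clipped.sum(axis=0) + noise) / grads.shape[0]

rng = np.random.default_rng(0)
example_grads = [[3.0, 4.0], [0.3, 0.4]]
noisy_update = privatize_gradients(example_grads, clip_norm=1.0,
                                   noise_multiplier=1.1, rng=rng)
```

Clipping bounds the influence of any single training sample on the update, and the noise masks what remains of that influence. This is also why such models tend to lose image quality: the update direction becomes less precise.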

Our results are also promising for AI in healthcare product devel-

opment [12]. In the medical AI research setting, a strong focus on per-

formance in homogeneous samples can be observed. This is in stark

contrast to the requirements for a medical imaging product. A product is

supposed to be used in a real world setting confronted with highly

heterogeneous data reflecting different settings and multiple hardware

options. Thus, product development should focus as much on training on

heterogeneous data as on keeping the necessary performance [12]. This,

however, is currently highly challenging as data is a scarce resource due

to limited availability. Our results show that a relatively small amount of

data is sufficient to generate robust results. Thus, a GAN-based ano-

nymization approach could allow the generation of high quality data

from a smaller number of patients from multiple locations that - in total -

reflect the full distribution of soft- and hardware settings in the clinical

setting. Here, the possibility to generate high-quality labels as evidenced

by our study is also a great advantage. Notably, a GAN model also learns

the quality of the labels provided during training. Thus, the final per-

formance of any model trained on synthetic data will also be dependent

on the quality of the real labels. Providing high-quality labels is no
simple task and usually requires hours of manual labor by highly
qualified medical staff. Thus, a novel GAN-based approach to product

development could entail the high-quality labeling of relatively small

data-sets from multiple data providers that are then anonymized and

pooled for training. This would on one hand keep development costs

relatively low which is a prerequisite for startup success. On the other

hand, such an approach would ensure both high performance and low

bias as the chance for out-of-sample data in the clinical setting would be

significantly lowered.

Our study has several limitations. First, the GANs are 2D due to
computational restrictions. 3D approaches could help extract information
about the 3D vessel tree structure and in this way improve the
performance of the segmentation task. The computational restrictions also
did not allow us to try out more advanced GAN architectures such as
PG-GAN. Another limitation is the calculation of the FID. Due to
computational restrictions, it was only calculated to confirm the quality
of visually inspected images and not for every epoch in an end-to-end
solution. Moreover, the FID might not be ideal for assessing image
quality. Although it is used as a quality measurement in the medical
field [4,9], it was originally designed for natural images and hence
might not entirely capture relevant features for medical imaging. Thus,
further research on assessing image quality specific to medical images
should be undertaken.
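To make the FID discussed above more concrete: it is the Fréchet distance between two Gaussians fitted to feature vectors of real and generated images. The sketch below computes that distance with NumPy for arbitrary feature matrices. The feature extractor itself (an Inception network in the standard FID) is not included, and the function name `frechet_distance` and the random stand-in features are assumptions made for illustration.

```python
import numpy as np

def frechet_distance(feats_real, feats_gen):
    """Fréchet distance between Gaussians fitted to two feature sets.

    Each input has shape (n_samples, n_features). In the FID these features
    would come from a pre-trained network; here they are arbitrary arrays.
    """
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Tr((cov_r cov_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g, which are real and non-negative in
    # exact arithmetic for covariance matrices.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_r - mu_g) ** 2)
    return float(mean_term + np.trace(cov_r) + np.trace(cov_g) - 2.0 * trace_sqrt)

rng = np.random.default_rng(0)
real_features = rng.standard_normal((200, 4))      # stand-in for real-image features
shifted_features = real_features + 3.0             # same covariance, shifted mean
distance = frechet_distance(real_features, shifted_features)
```

Because the distance depends entirely on the feature space, a network trained on natural images may weight features that matter little for medical images, which is the concern raised in the limitations above.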

In this study, we generated TOF-MRA images for brain vessel seg-

mentation. Since this segmentation relies on identifying local structures

within an image, it allowed us to generate only parts of the image, i.e.

patches, and obtain good segmentation results. Our results might

generalize to medical segmentation problems that rely on these local

properties such as segmenting small organs, lesions or tumors. Never-

theless, for medical problems that involve the understanding of global

structures, e.g. the whole brain, a patch-based approach would most

probably not suffice. Here, bigger patches or whole volumes need to be

generated which will be computationally more expensive.

6. Conclusion

This study marks an essential step towards true anonymization of

medical imaging data while maintaining crucial predictive features

within the image patch. We show that these features might be general-

izable to another, independent dataset. Our initial performance for

vessel segmentation on the PEGASUS dataset already is relatively high.

We show that training more advanced GAN architectures can further

increase the quality of synthesized image-label pairs. By using only one
patient from a different cohort, we can achieve comparably high
performance on an independent dataset. Our synthesized image-label
pairs allow other researchers to build models that require only little
labeled patient data and will significantly facilitate research in this

domain. It may be the case that our framework achieves similar results

on other medical segmentation tasks. This could lead to a lower demand

of labeled patient data and allow more data sharing of anonymized data.

Nevertheless, further studies should assess the generalizability of this

analysis to other (more complex) segmentation problems.

CRediT authorship contribution statement

Tabea Kossen: Conceptualization, Formal analysis, Investigation,

Methodology, Project administration, Software, Validation, Visualiza-

tion, Writing - original draft. Pooja Subramaniam: Formal analysis,

Investigation, Methodology, Software, Validation, Writing - review &

editing. Vince I. Madai: Conceptualization, Data curation, Investiga-

tion, Methodology, Project administration, Supervision, Visualization,

Writing - original draft, Writing - review & editing. Anja Hennemuth:

Resources, Supervision, Writing - review & editing. Kristian Hilde-

brand: Supervision, Writing - review & editing. Adam Hilbert:

Conceptualization, Writing - review & editing. Jan Sobesky: Data

curation, Writing - review & editing. Michelle Livne: Conceptualiza-

tion, Writing - review & editing. Ivana Galinovic: Data curation,

Writing - review & editing. Ahmed A. Khalil: Data curation, Writing -

review & editing. Jochen B. Fiebach: Data curation, Writing - review &

editing. Dietmar Frey: Conceptualization, Funding acquisition, Project

administration, Resources, Supervision, Writing - review & editing.

Declaration of competing interest

Tabea Kossen reported receiving personal fees from ai4medicine

outside the submitted work. Dr Madai reported receiving personal fees

from ai4medicine outside the submitted work. Adam Hilbert reported
receiving personal fees from ai4medicine outside the submitted work.
While not related to this work, Dr Sobesky reports receipt of speakers
honoraria from Pfizer, Boehringer Ingelheim, and Daiichi Sankyo.
Furthermore, Dr Fiebach has received consulting and advisory board
fees from BioClinica, Cerevast, Artemida, Brainomix, Biogen, BMS,
EISAI, and Guerbet. Dr Frey reported receiving grants from the European
Commission, and reported receiving personal fees from and holding an
equity interest in ai4medicine outside the submitted work.

Fig. 4. Error maps for one example patient from the 1000Plus study using one patient when training from scratch (A, B) and using transfer learning from WGAN-GP-SN generated patches (C, D). True positives are shown in red, false positives in green and false negatives in yellow. Transfer learning led to fewer errors, especially on small vessels (B, D).

Acknowledgements

This work has received funding from the German Federal Ministry of
Education and Research through (1) the grant Center for Stroke
Research Berlin and (2) a Go-Bio grant for the research group
PREDICTioN 2020 (lead: DF).

Computation has been performed on the HPC for Research cluster of

the Berlin Institute of Health.

Appendix A

Table 3
Validation Dice similarity coefficient (DSC) and 95th-percentile Hausdorff distance (95HD) corresponding to Table 2. All models were trained on the PEGASUS dataset. For data augmentation, the U-net was trained on both real data and data generated by the respective GAN architecture. For anonymization, the U-nets were trained on generated data alone. Both metrics are averaged over 4 different sets of generated data. SD stands for standard deviation.

                          Val DSC          Val 95HD [mm]
                          Mean    SD       Mean     SD
U-net (Livne et al.)      0.879            29.499

Data augmentation
DCGAN                     0.883   0.002    29.856   0.624
WGAN-GP                   0.885   0.002    29.556   0.520
WGAN-GP-SN                0.887   0.001    29.749   0.629

Anonymization
DCGAN                     0.810   0.004    34.331   0.274
WGAN-GP                   0.848   0.005    30.964   0.054
WGAN-GP-SN                0.859   0.003    31.477   0.166
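For readers less familiar with the DSC reported in Tables 2 and 3, the following minimal sketch shows how the Dice similarity coefficient of two binary segmentation masks can be computed. The function name `dice_coefficient` and the convention for two empty masks are assumptions for illustration. The 95HD is not covered here.

```python
import numpy as np

def dice_coefficient(pred, target):
    """Dice similarity coefficient of two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Both masks are empty: treated as perfect agreement (assumed convention)
        return 1.0
    # Twice the overlap divided by the total number of foreground voxels
    return float(2.0 * np.logical_and(pred, target).sum() / denom)

predicted_mask = np.array([[1, 1], [0, 0]])
reference_mask = np.array([[1, 0], [0, 0]])
dsc = dice_coefficient(predicted_mask, reference_mask)
```

In this toy case one of the two predicted foreground pixels overlaps the single reference pixel, so the DSC is 2 * 1 / (2 + 1).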

References

[1] Abramian, D., Eklund, A., . Refacing: Reconstructing Anonymized Facial Features

Using GANS , vol. 5.

[2] M. Arjovsky, S. Chintala, L. Bottou, Wasserstein GAN, arXiv:1701.07875 [cs, stat]

URL, http://arxiv.org/abs/1701.07875, 2017. arXiv: 1701.07875.

[3] C. Bowles, L. Chen, R. Guerrero, P. Bentley, R. Gunn, A. Hammers, D.A. Dickie, M.V. Hernández, J. Wardlaw, D. Rueckert, GAN Augmentation: Augmenting Training Data Using Generative Adversarial Networks, 2018 arXiv:1810.10863 [cs] URL, http://arxiv.org/abs/1810.10863. arXiv: 1810.10863.

[4] B. Cao, H. Zhang, N. Wang, X. Gao, D. Shen, Auto-GAN: self-supervised

collaborative learning for medical image synthesis, in: Proceedings of the AAAI

Conference on Artificial Intelligence 34, 2020, pp. 10486–10493, https://doi.org/

10.1609/aaai.v34i07.6619. URL, https://aaai.org/ojs/index.php/AAAI/article/

view/6619.

[5] M. Foroozandeh, A. Eklund, Synthesizing Brain Tumor Images and Annotations by

Combining Progressive Growing GAN and SPADE, 2020 arXiv:2009.05946 [cs]

URL, http://arxiv.org/abs/2009.05946. arXiv: 2009.05946.

[6] M. Frid-Adar, I. Diamant, E. Klang, M. Amitai, J. Goldberger, H. Greenspan, GAN-

based synthetic medical image augmentation for increased CNN performance in

liver lesion classification, Neurocomputing 321 (2018) 321–331, https://doi.org/

10.1016/j.neucom.2018.09.013. URL, http://www.sciencedirect.com/science/

article/pii/S0925231218310749.

[7] J.T. Guibas, T.S. Virdi, P.S. Li, Synthetic Medical Images from Dual Generative Adversarial Networks, 2018 arXiv:1709.01872 [cs] URL, http://arxiv.org/abs/1709.01872. arXiv: 1709.01872.

[8] Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A.C., . Improved

Training of Wasserstein GANs , vol. 11..

[9] C. Haarburger, N. Horst, D. Truhn, M. Broeckmann, S. Schrading, C. Kuhl,

D. Merhof, Multiparametric magnetic resonance image synthesis using generative

adversarial networks, Eurographics Workshop on Visual Computing for Biology

and Medicine 5 (2019), https://doi.org/10.2312/VCBM.20191226 pagesURL,

https://diglib.eg.org/handle/10.2312/vcbm20191226.

[10] J. Hayes, L. Melis, G. Danezis, E.D. Cristofaro, LOGAN: membership inference

attacks against generative models, in: Proceedings on Privacy Enhancing

Technologies 2019, 2019, pp. 133–152, https://doi.org/10.2478/popets-2019-

0008. URL, https://content.sciendo.com/view/journals/popets/2019/1/article

-p133.xml.

[11] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, S. Hochreiter, GANs Trained by

a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, 2018 arXiv:

1706.08500 [cs, stat] URL, http://arxiv.org/abs/1706.08500. arXiv: 1706.08500.

[12] D. Higgins, V.I. Madai, From Bit to Bedside: A Practical Framework for Artificial

Intelligence Product Development in Healthcare. Advanced Intelligent Systems,

2020, https://doi.org/10.1002/aisy.202000052. N/a, 2000052, doi:10.1002/

aisy.202000052.

[13] B. Hotter, S. Pittl, M. Ebinger, G. Oepen, K. Jegzentis, K. Kudo, M. Rozanski, W.

U. Schmidt, P. Brunecker, C. Xu, P. Martus, M. Endres, G.J. Jungehülsing,

A. Villringer, J.B. Fiebach, Prospective study on the mismatch concept in acute

stroke patients within the first 24 h after symptom onset - 1000Plus study, BMC

Neurol. 9 (2009) 60, https://doi.org/10.1186/1471-2377-9-60. URL, doi:10.1186/

1471-2377-9-60.

[14] X. Huang, Y. Li, O. Poursaeed, J. Hopcroft, S. Belongie, Stacked Generative

Adversarial Networks, 2017 arXiv:1612.04357 [cs, stat] URL, http://arxiv.

org/abs/1612.04357. arXiv: 1612.04357.

[15] H. Hukkelås, R. Mester, F. Lindseth, Deep privacy: a generative adversarial

network for face anonymization, in: G. Bebis, R. Boyle, B. Parvin, D. Koracin,

D. Ushizima, S. Chai, S. Sueda, X. Lin, A. Lu, D. Thalmann, C. Wang, P. Xu (Eds.),

Advances in Visual Computing, vol. 11844, Springer International Publishing,

Cham, 2019, pp. 565–578, https://doi.org/10.1007/978-3-030-33720-9_44. URL,

http://link.springer.com/10.1007/978-3-030-33720-9_44.

[16] T. Karras, T. Aila, S. Laine, J. Lehtinen, Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2018 arXiv:1710.10196 [cs, stat] URL, http://arxiv.org/abs/1710.10196. arXiv: 1710.10196.

[17] D.P. Kingma, J. Ba, Adam: A Method for Stochastic Optimization, 2017 arXiv:

1412.6980 [cs] URL, http://arxiv.org/abs/1412.6980. arXiv: 1412.6980.

[18] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep

convolutional neural networks, Commun. ACM 60 (2017) 84–90, https://doi.org/

10.1145/3065386. URL, http://dl.acm.org/citation.cfm?doid=3098997.3065386.

[19] G. Litjens, T. Kooi, B.E. Bejnordi, A.A.A. Setio, F. Ciompi, M. Ghafoorian, J.A.W.M. van der Laak, B. van Ginneken, C.I. Sánchez, A survey on deep learning in medical image analysis, Med. Image Anal. 42 (2017) 60–88, https://doi.org/10.1016/j.media.2017.07.005. URL, http://www.sciencedirect.com/science/article/pii/S1361841517301135.

[20] M. Livne, J. Rieger, O.U. Aydin, A.A. Taha, E.M. Akay, T. Kossen, J. Sobesky, J.

D. Kelleher, K. Hildebrand, D. Frey, V.I. Madai, A U-net deep learning framework

for high performance vessel segmentation in patients with cerebrovascular disease,

Front. Neurosci. 13 (2019), https://doi.org/10.3389/fnins.2019.00097. URL,

https://www.frontiersin.org/articles/10.3389/fnins.2019.00097/full#h7.

[21] T. Miyato, T. Kataoka, M. Koyama, Y. Yoshida, Spectral Normalization for Generative Adversarial Networks, 2018 arXiv:1802.05957 [cs, stat] URL, http://arxiv.org/abs/1802.05957. arXiv: 1802.05957.

[22] S. Mukherjee, Y. Xu, A. Trivedi, J.L. Ferres, privGAN: Protecting GANs from

Membership Inference Attacks at Low Cost, 2020 arXiv:2001.00071 [cs, stat] URL,

http://arxiv.org/abs/2001.00071. arXiv: 2001.00071.

[23] M.A. Mutke, V.I. Madai, F.C. von Samson-Himmelstjerna, O. Zaro Weber, G.

S. Revankar, S.Z. Martin, K.L. Stengl, M. Bauer, S. Hetzer, M. Günther, J. Sobesky,

Clinical evaluation of an arterial-spin-labeling product sequence in steno-occlusive

disease of the brain, PloS One 9 (2014), e87143, https://doi.org/10.1371/journal.

pone.0087143.

[24] T. Neff, C. Payer, D. Stern, M. Urschler, Generative adversarial network based synthesis for supervised medical image segmentation, in: Proceedings of the OAGM & ARW Joint Workshop Vision, Automation and Robotics, doi:10.3217/978-3-85125-524-9-30.

[25] A. Radford, L. Metz, S. Chintala, Unsupervised Representation Learning with Deep

Convolutional Generative Adversarial Networks, 2016 arXiv:1511.06434 [cs] URL,

http://arxiv.org/abs/1511.06434. arXiv: 1511.06434.

[26] V. Ravindra, A. Grama, De-anonymization Attacks on Neuroimaging Datasets,

2019 arXiv:1908.03260 [cs, eess, q-bio] URL, http://arxiv.org/abs/1908.03260.

arXiv: 1908.03260.

[27] O. Ronneberger, P. Fischer, T. Brox, U-net: convolutional networks for biomedical

image segmentation, in: N. Navab, J. Hornegger, W.M. Wells, A.F. Frangi (Eds.),

Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015,

vol. 9351, Springer International Publishing, Cham, 2015, pp. 234–241, https://

doi.org/10.1007/978-3-319-24574-4_28. URL, http://link.springer.com

/10.1007/978-3-319-24574-4_28.

[28] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, Improved

Techniques for Training GANs, 2016 arXiv:1606.03498 [cs] URL, http://arxiv.

org/abs/1606.03498. arXiv: 1606.03498.

[29] V. Sandfort, K. Yan, P.J. Pickhardt, R.M. Summers, Data augmentation using

generative adversarial networks (CycleGAN) to improve generalizability in CT

segmentation tasks, Sci. Rep. 9 (2019) 16884, https://doi.org/10.1038/s41598-

019-52737-x. https://www.nature.com/articles/s41598-019-52737-x. number: 1

Publisher: Nature Publishing Group.

[30] H.C. Shin, N.A. Tenenholtz, J.K. Rogers, C.G. Schwarz, M.L. Senjem, J.L. Gunter, K.

P. Andriole, M. Michalski, Medical image synthesis for data augmentation and

anonymization using generative adversarial networks, in: A. Gooya, O. Goksel,

I. Oguz, N. Burgos (Eds.), Simulation and Synthesis in Medical Imaging, Springer

International Publishing, Cham, 2018, pp. 1–11, https://doi.org/10.1007/978-3-

030-00536-8_1.

[31] R. Shokri, M. Stronati, C. Song, V. Shmatikov, Membership inference attacks

against machine learning models, in: 2017 IEEE Symposium on Security and

Privacy (SP), 2017, pp. 3–18, https://doi.org/10.1109/SP.2017.41, iSSN: 2375-

1207.

[32] K. Simonyan, A. Zisserman, Very Deep Convolutional Networks for Large-Scale

Image Recognition, 2014 arXiv:1409.1556 [cs] URL, http://arxiv.org/abs/

1409.1556. arXiv: 1409.1556.

[33] V. Sorin, Y. Barash, E. Konen, E. Klang, Creating Artificial Images for Radiology

Applications Using Generative Adversarial Networks (GANs) – A Systematic

Review, Academic Radiology URL, 2020, https://doi.org/10.1016/j.

acra.2019.12.024. https://linkinghub.elsevier.com/retrieve/pii/S10766332203

00210.

[34] Wachinger, C., Golland, P., Kremen, W., Fischl, B., Reuter, M., Alzheimer’s Disease

Neuroimaging Initiative, 2015. BrainPrint: a discriminative characterization of

brain morphology. Neuroimage 109, 232–248.. doi:10.1016/j.

neuroimage.2015.01.032.

[35] B. Wu, S. Zhao, C. Chen, H. Xu, L. Wang, X. Zhang, G. Sun, J. Zhou, Generalization

in Generative Adversarial Networks: A Novel Perspective from Privacy Protection,

2019 arXiv:1908.07882 [cs, stat] URL, http://arxiv.org/abs/1908.07882. arXiv:

1908.07882.

[36] L. Xie, K. Lin, S. Wang, F. Wang, J. Zhou, Differentially Private Generative

Adversarial Network, 2018 arXiv:1802.06739 [cs, stat] URL, http://arxiv.

org/abs/1802.06739. arXiv: 1802.06739.

[37] X. Yi, E. Walia, P. Babyn, Generative adversarial network in medical imaging: a

review, Med. Image Anal. 58 (2019) 101552, https://doi.org/10.1016/j.

media.2019.101552. URL, http://arxiv.org/abs/1809.07294. arXiv: 1809.07294.


Generating 3D TOF-MRA Volumes and

Segmentation Labels Using Generative

Adversarial Networks

5.1 Context Within Thesis

Most medical images, including brain images, are 3D. While the third dimension often
offers valuable spatial information, processing 3D images comes with a substantial
increase in memory consumption and processing time compared to 2D approaches. This is
especially the case for computationally demanding neural networks such as GANs.

The present work tackled the limitation of Chapter 4, which neglected information in
the third dimension of the brain images, and extended the GAN architectures to synthesize
high-resolution, labeled 3D image volumes. To overcome the computational restrictions, we
introduced techniques for memory efficiency and reduced training times, such as mixed
precision and the two timescale update rule.
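The memory benefit of mixed precision, and the numerical care it requires, can be illustrated with a small NumPy sketch. It shows that half-precision storage halves the memory of an array and that very small gradient values underflow in half precision unless they are scaled up first (loss scaling). The array shape, the gradient value and the scale factor are hypothetical and were chosen only for illustration; they are not the settings used in this work.

```python
import numpy as np

# Hypothetical 3D activation volume stored in single and half precision
activations_fp32 = np.zeros((64, 64, 64), dtype=np.float32)
activations_fp16 = activations_fp32.astype(np.float16)
memory_ratio = activations_fp16.nbytes / activations_fp32.nbytes

# A very small gradient value underflows to zero in half precision ...
tiny_grad = np.float32(1e-8)
lost = np.float16(tiny_grad)

# ... but survives if it is scaled up before the cast and scaled back
# down in single precision afterwards (the idea behind loss scaling).
loss_scale = np.float32(1024.0)           # hypothetical scale factor
scaled = np.float16(tiny_grad * loss_scale)
recovered = np.float32(scaled) / loss_scale
```

Deep learning frameworks automate both steps: they keep a single-precision master copy of the weights and adjust the scale factor during training.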

Furthermore, we extended the evaluation schemes compared to Chapter 4 to use a network

pre-trained on medical images when calculating the FID and precision-recall curves of the

distributions.


5.2 Journal Article

This chapter is based on the following publication that was published in Medical Image Analysis:

P. Subramaniam, T. Kossen, K. Ritter, A. Hennemuth, K. Hildebrand, A. Hilbert,
J. Sobesky, M. Livne, I. Galinovic, A. A. Khalil, J. B. Fiebach, D. Frey, and V. I.
Madai. “Generating 3D TOF-MRA Volumes and Segmentation Labels using Generative
Adversarial Networks”. In: Medical Image Analysis (2022). doi: 10.1016/j.media.2022.102396

The original journal article is reprinted with permission of Elsevier. The article is open access

under the CC BY license.

Author Contribution

The second author Tabea Kossen conceptualized the study and interpreted the results together

with PS, VIM, DF, ML and AH. She performed and/or supervised PS in model implementation.

Additionally, she was responsible for the project administration, wrote the first version of the

manuscript together with PS and VIM and coordinated the journal submission process.

Code Availability

The code for this project is publicly available:
https://github.com/prediction2020/3DGAN_synthesis_of_3D_TOF_MRA_with_segmentation_labels.

Medical Image Analysis 78 (2022) 102396


Generating 3D TOF-MRA volumes and segmentation labels using

generative adversarial networks

Pooja Subramaniam, Tabea Kossen (a, b, *), Kerstin Ritter (c, d), Anja Hennemuth (b, e, f), Kristian Hildebrand, Adam Hilbert, Jan Sobesky (h, i), Michelle Livne, Ivana Galinovic, Ahmed A. Khalil (i, j, k, l), Jochen B. Fiebach, Dietmar Frey, Vince I. Madai (a, m, n)

a CLAIM - Charité Lab for AI in Medicine, Charité Universitätsmedizin Berlin, Germany
b Department of Computer Engineering and Microelectronics, Computer Vision & Remote Sensing, Technical University Berlin, Berlin, Germany
c Department of Psychiatry and Psychotherapy, Charité Universitätsmedizin Berlin (corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health), Berlin, Germany
d Bernstein Center for Computational Neuroscience, Berlin, Germany
e Institute for Imaging Science and Computational Modelling in Cardiovascular Medicine, Charité Universitätsmedizin Berlin, Berlin, Germany
f Fraunhofer MEVIS, Max-von-Laue-Str. 2, Bremen, Germany
g Department VI Computer Science and Media, Beuth University of Applied Sciences, Berlin, Germany
h Johanna-Etienne-Hospital, Neuss, Germany
i Centre for Stroke Research Berlin, Charité Universitätsmedizin Berlin, Berlin, Germany
j Department of Neurology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
k Mind, Brain, Body Institute, Berlin School of Mind and Brain, Humboldt University Berlin, Berlin, Germany
l Berlin Institute of Health, Berlin, Germany
m School of Computing and Digital Technology, Faculty of Computing, Engineering and the Built Environment, Birmingham City University, Birmingham, UK
n QUEST-Center for Transforming Biomedical Research, Berlin Institute of Health, Charité Universitätsmedizin Berlin, Charitéplatz 1, Berlin 10117, Germany

Article info

Article history:

Received 13 July 2021

Revised 28 January 2022

Accepted 17 February 2022

Available online 24 February 2022

MSC:

41A05

41A10

65D05

65D17

Keywords:

Generative adversarial networks

3D Medical imaging

Mixed precision

Anonymization

Brain vessel segmentation

Abstract

Deep learning requires large labeled datasets that are difficult to gather in medical imaging due to data privacy issues and time-consuming manual labeling. Generative Adversarial Networks (GANs) can alleviate these challenges enabling synthesis of shareable data. While 2D GANs have been used to generate 2D images with their corresponding labels, they cannot capture the volumetric information of 3D medical imaging. 3D GANs are more suitable for this and have been used to generate 3D volumes but not their corresponding labels. One reason might be that synthesizing 3D volumes is challenging owing to computational limitations. In this work, we present 3D GANs for the generation of 3D medical image volumes with corresponding labels applying mixed precision to alleviate computational constraints.

We generated 3D Time-of-Flight Magnetic Resonance Angiography (TOF-MRA) patches with their corresponding brain blood vessel segmentation labels. We used four variants of 3D Wasserstein GAN (WGAN) with: 1) gradient penalty (GP), 2) GP with spectral normalization (SN), 3) SN with mixed precision (SN-MP), and 4) SN-MP with double filters per layer (c-SN-MP). The generated patches were quantitatively evaluated using the Fréchet Inception Distance (FID) and Precision and Recall of Distributions (PRD). Further, 3D U-Nets were trained with patch-label pairs from different WGAN models and their performance was compared to the performance of a benchmark U-Net trained on real data. The segmentation performance of all U-Net models was assessed using Dice Similarity Coefficient (DSC) and balanced Average Hausdorff Distance (bAVD) for a) all vessels, and b) intracranial vessels only.

Our results show that patches generated with WGAN models using mixed precision (SN-MP and c-SN-MP) yielded the lowest FID scores and the best PRD curves. Among the 3D U-Nets trained with synthetic patch-label pairs, c-SN-MP pairs achieved the highest DSC (0.841) and lowest bAVD (0.508) compared to the benchmark U-Net trained on real data (DSC 0.901; bAVD 0.294) for intracranial vessels.

In conclusion, our solution generates realistic 3D TOF-MRA patches and labels for brain vessel segmentation. We demonstrate the benefit of using mixed precision for computational efficiency resulting in the best-performing GAN-architecture. Our work paves the way towards sharing of labeled 3D medical data which would increase generalizability of deep learning models for clinical use.

This is an open access article under the CC BY license ( http://creativecommons.org/licenses/by/4.0/ )

https://doi.org/10.1016/j.media.2022.102396

P. Subramaniam, T. Kossen, K. Ritter et al. Medical Image Analysis 78 (2022) 102396

1. Introduction

The success of deep learning algorithms in natural image analysis has been carried over to the medical imaging domain in recent years. Deep learning methods have been used for automation of various manual time-consuming tasks such as segmentation and classification of medical images (Greenspan et al., 2016; Lundervold and Lundervold, 2019). Supervised deep learning methods, specifically, learn relevant features from images by mapping features in the input images to the label output. While the advantage of these methods is that they do not need manual extraction of features from the images, they do require large amounts of labeled data. Here, a major challenge is that it is expensive and difficult to acquire and label medical data (Yi et al., 2019). Yet, even when labeled medical data is available, it usually cannot be shared readily with other researchers due to privacy concerns (Clinical Practice Committee, 2000). Anonymization methods typically applied in medical imaging would not be beneficial in the case of neuroimaging as the unique neuroanatomical features present in brain images could be used to identify individuals (Wachinger et al., 2015; Valizadeh et al., 2018). As a consequence, often small, siloed or homogeneous datasets are used when proposing new deep learning models in neuroimaging (Willemink et al., 2020).

A potential solution to this problem is the generation of synthetic medical imaging data. A very promising method for this purpose is Generative Adversarial Networks (GANs) (Goodfellow et al., 2014). Various GAN architectures from the natural images domain have gained popularity in medical imaging for image synthesis, supervised image-to-image translation, reconstruction and super-resolution (Yi et al., 2019). For image synthesis, specifically, 2D GANs have been used in several works such as synthesis of Computed Tomography (CT) liver lesions (Frid-Adar et al., 2018), skin lesion images (Baur et al., 2018), and axial Magnetic Resonance (MR) slices (Bermudez et al., 2018). GANs can be extended to generate the labels along with the synthesized images. For example, 2D GANs have been used to generate the corresponding segmentation labels for lung X-rays (Neff et al., 2018), vessel segmentation (Kossen et al., 2021), retinal fundus images (Guibas et al., 2018) and brain tumor segmentation (Foroozandeh and Eklund, 2020). Although these results are promising, the challenge remains that 2D GANs cannot capture important anatomical relationships in the third dimension. Since medical images are often recorded in 3D, GANs generating 3D medical images are thus highly warranted. 3D GANs have been used to generate downsampled or resized MRI images of different resolutions (Kwon et al., 2019; Eklund, 2020; Sun et al., 2021). However, to our knowledge, there is no 3D GAN medical imaging study that generates the corresponding labels, which is critical for using the data for supervised deep learning research. One reason could be that synthesizing 3D volumes is still a challenge due to computational limitations.

In our study, we generate high resolution 3D medical image

patches along with their labels in an end-to-end paradigm for

brain vessel segmentation which aids in identifying and studying

cerebrovascular diseases. From 3D Time-of-Flight Magnetic Reso-

nance Angiography (TOF-MRA), we synthesize 3D patches together

with brain vessel segmentation labels. We implement and compare

four different 3D Wasserstein-GAN (WGAN) variants: three with

the same architecture but different regularizations and mixed precision
(Micikevicius et al., 2018) schemes, and one with a modified architecture,
with double the filters per layer, owing to the memory efficiency from mixed precision.

∗Corresponding author at: CLAIM - Charité Lab for AI in Medicine, Charité Universitätsmedizin Berlin, Germany.
E-mail address: tabea.kossen@charite.de (T. Kossen).

Next to a qualitative visual assessment, we use quantitative measures to evaluate the synthesized

patches. We further evaluate the performance of brain vessel seg-

mentation models trained on the generated patch-label pairs and

compare them to a benchmark model trained on real data. Addi-

tionally, we also compare the segmentation performance on a sec-

ond, independent dataset.

To summarize, our main contributions are:

1. For the ﬁrst time to our knowledge in the medical imaging do-

main, we generate high resolution 3D patches along with seg-

mentation labels using GANs.

2. We utilize the memory eﬃciency provided by mixed precision

to enable a more complex WGAN architecture with double the

ﬁlters per layer.

3. Our generated labels allow us to train 3D U-Net models for

brain vessel segmentation on synthetic data in an end-to-end

framework.

2. Methods

2.1. Architecture

We adapted the WGAN with gradient penalty (WGAN-GP) model (Gulrajani et al., 2017)
to 3D in order to produce 3D patches and their corresponding brain vessel
segmentation labels. We implemented four variants of the architecture:

a) GP model: the WGAN-GP model in 3D
b) SN model: the GP model with spectral normalization in the critic network
c) SN-MP model: the SN model with mixed precision
d) c-SN-MP model: the SN-MP model with double the filters per layer

An overview of the GAN training is provided in Fig. 1.

For all models, a noise vector z of length 128, sampled from a standard Gaussian
distribution N(0, 1), was input to the generator G. It was fed through a linear
layer and a 3D batch normalization layer, then through 3 blocks of upsampling and
3D convolutional layers, each followed by batch normalization and ReLU activation,
and through a final upsampling and 3D convolutional layer, as shown in Fig. 2A.
An upsampling factor of 2 with nearest-neighbor interpolation was used. The
convolutional layers used a kernel size of 3 and a stride of 1. The hyperbolic
tangent (tanh) was used as the final activation function. The output of the
generator was a two-channel image of size 128 × 128 × 64: one channel was the
TOF-MRA patch and the second channel was the corresponding label, i.e. the
ground truth segmentation of the generated patch. The purpose of the labels is
to allow a supervised segmentation model, such as a 3D U-Net, to be trained on
the generated data.
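The generator layout described above can be sketched in PyTorch as follows. This is a minimal illustration rather than the published implementation: the number of filters (`base_filters`), the 8 × 8 × 4 starting volume and the convolution padding of 1 are assumptions chosen so that four upsampling steps reach 128 × 128 × 64.

```python
import torch
import torch.nn as nn


class Generator3D(nn.Module):
    """Sketch of the described generator; `base_filters` is a hypothetical width."""

    def __init__(self, latent_dim: int = 128, base_filters: int = 8):
        super().__init__()
        self.base_filters = base_filters
        # Linear layer projects z onto a small 8 x 8 x 4 volume (assumed start size).
        self.project = nn.Linear(latent_dim, base_filters * 8 * 8 * 4)
        self.norm = nn.BatchNorm3d(base_filters)

        def up_block(c_in: int, c_out: int) -> nn.Sequential:
            # Upsampling by 2 (nearest neighbor), 3D convolution, batch norm, ReLU.
            return nn.Sequential(
                nn.Upsample(scale_factor=2, mode="nearest"),
                nn.Conv3d(c_in, c_out, kernel_size=3, stride=1, padding=1),
                nn.BatchNorm3d(c_out),
                nn.ReLU(inplace=True),
            )

        self.blocks = nn.Sequential(
            up_block(base_filters, base_filters),
            up_block(base_filters, base_filters),
            up_block(base_filters, base_filters),
        )
        # Final upsampling and convolution to two channels: image patch and label.
        self.final = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv3d(base_filters, 2, kernel_size=3, stride=1, padding=1),
            nn.Tanh(),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        x = self.project(z).view(-1, self.base_filters, 8, 8, 4)
        x = self.norm(x)
        x = self.blocks(x)
        return self.final(x)
```

Calling `Generator3D()(torch.randn(1, 128))` yields a tensor of shape (1, 2, 128, 128, 64), with channel 0 interpreted as the TOF-MRA patch and channel 1 as the label.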

Next, the critic D took either the generated 3D patch-label pairs
G(z^(i)) or the real 3D patch-label pairs x as its input. The

patch-label pairs were fed through four 3D convolutional layers.

A kernel size of 3 and stride of 2 was used in the convolutional

layers. After each convolutional layer, a 3D instance normalization

layer was used for the GP model as shown in Fig. 2 B. Here, for

the SN model, we used spectral normalization ( Miyato et al., 2018 )

after each convolutional layer which acts as an additional regular-

ization to gradient penalty as shown in Fig. 2 C. Leaky ReLU was

used as the activation layer after the normalization layers. The last
layer was a linear layer that produced a scalar, referred to as the critic's score.

The score indicates how similar the distribution of the generated

patch-label pairs is to that of the real patch-label pairs. This indi-

rectly ensures that the generated labels correspond to the vessels

in the generated patches similar to how the real labels correspond

to the vessels in the real patches. The loss function of the critic

was:

$$
\mathrm{loss}^{(i)} = D(G(z^{(i)})) - D(x) + \lambda \left( \lVert \nabla D(\hat{x}) \rVert - 1 \right)^{2} \tag{1}
$$

where $\hat{x} = \epsilon x + (1 - \epsilon) G(z^{(i)})$, $\epsilon \sim U[0, 1]$, $\lambda = 10$ and $\nabla$ is the gradient
of the critic. Here, the difference between the critic's score for the
real and generated data along with the gradient penalty is computed.

P. Subramaniam, T. Kossen, K. Ritter et al. Medical Image Analysis 78 (2022) 102396

Fig. 1. Structure of the workflow from training the 3D GAN to qualitative and quantitative assessments. Top: overview of GAN training; here, we illustrate our most complex model using spectral normalization and mixed precision (c-SN-MP). Middle: evaluation schemes. Bottom: segmentation performance evaluation.

Fig. 2. Architectures of A. Generator of all models, B. Critic of GP model, and C. Critic of all SN models.

The loss function of the generator based on the output of

the critic was:

$$
\mathrm{loss}^{(i)} = -D(G(z^{(i)})) \tag{2}
$$

This equates to maximizing the critic’s score for the generated

images by using the negative of the critic’s score as loss for the

generator.
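The critic and the two losses in Eqs. (1) and (2) can be sketched in PyTorch as below. This is an illustrative sketch of an SN-style critic under stated assumptions, not the authors' code: the filter count, the padding of 1, the Leaky ReLU slope of 0.2 and the small default patch size are hypothetical, and the gradient norm is taken as the L2 norm over each sample, as is usual for WGAN-GP.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm


def make_critic(filters: int = 8, patch_shape=(16, 16, 8)) -> nn.Module:
    """Four strided 3D convolutions with spectral normalization, then a linear score."""
    layers, c_in = [], 2  # two input channels: image patch and label
    for _ in range(4):
        layers += [
            spectral_norm(nn.Conv3d(c_in, filters, kernel_size=3, stride=2, padding=1)),
            nn.LeakyReLU(0.2),
        ]
        c_in = filters
    # Each stride-2 convolution halves every spatial dimension (rounding up).
    n_features = filters
    for size in patch_shape:
        for _ in range(4):
            size = (size + 1) // 2
        n_features *= size
    layers += [nn.Flatten(), nn.Linear(n_features, 1)]
    return nn.Sequential(*layers)


def critic_loss(critic: nn.Module, real: torch.Tensor, fake: torch.Tensor,
                lam: float = 10.0) -> torch.Tensor:
    """Eq. (1), averaged over the batch."""
    fake = fake.detach()  # the critic update must not reach the generator
    eps = torch.rand(real.size(0), 1, 1, 1, 1, device=real.device)
    x_hat = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    # Gradient of the critic score with respect to the interpolated input.
    grads, = torch.autograd.grad(critic(x_hat).sum(), x_hat, create_graph=True)
    grad_norm = grads.flatten(start_dim=1).norm(2, dim=1)
    penalty = lam * (grad_norm - 1.0).pow(2)
    return (critic(fake).squeeze(1) - critic(real).squeeze(1) + penalty).mean()


def generator_loss(critic: nn.Module, fake: torch.Tensor) -> torch.Tensor:
    """Eq. (2): the negative critic score of the generated patch-label pairs."""
    return -critic(fake).mean()
```

In a training loop, `critic_loss` would be minimized with respect to the critic parameters and `generator_loss` with respect to the generator parameters, using the same batch of generated pairs.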

In the case of the SN-MP model, mixed precision was used for memory efficiency.
The default precision used in deep learning methods is 32-bit floating point
(FP32). In mixed precision, both half precision (FP16) and FP32 are used
depending on the precision requirements of a particular arithmetic operation.
Here, FP16 is used for storing weights, activations and gradients while an FP32
master copy of the weights is used for optimizer updates. A loss-scaling factor
is applied in order to maintain performance equivalent to that of a fully FP32
network. Using mixed precision allowed us to use more filters per layer. Hence,
the c-SN-MP model was trained with double the filters in each layer of the
SN-MP model. For implementation details, see the open source code¹.

1 https://github.com/prediction2020/3DGAN_synthesis_of_3D_TOF_MRA_with_segmentation_labels
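The role of the loss-scaling factor can be illustrated with plain Python, using the half-precision format code of the standard `struct` module. The gradient value and the scaling factor of 1024 below are hypothetical numbers chosen for illustration; they are not taken from the study.

```python
import struct


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", value))[0]


small_gradient = 1e-8  # representable in FP32, but below the smallest FP16 value
loss_scale = 1024.0    # hypothetical loss-scaling factor

unscaled = to_fp16(small_gradient)             # underflows to zero in FP16
scaled = to_fp16(small_gradient * loss_scale)  # survives the FP16 round trip
recovered = scaled / loss_scale                # unscaled again in higher precision

print(unscaled, scaled, recovered)
```

Without scaling, the gradient is lost entirely in FP16. With scaling, it is stored with a small rounding error and recovered once the scale is divided out in higher precision, which is the purpose of keeping the FP32 master copy for the optimizer update.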


2.2. Data

2.2.1. Datasets

TOF-MRA data of 137 patients with cerebrovascular disease from two earlier
studies, PEGASUS (n = 72) and 1000Plus (n = 65), were used in this work. The 65
TOF-MRA datasets from the 1000Plus study were used for additional validation as a
second, independent dataset. The details of the studies can be found in
(Mutke et al., 2014) for PEGASUS and in (Hotter et al., 2009) for 1000Plus. The
imaging was performed with the following parameters for both studies:
voxel size = 0.5 × 0.5 × 0.7 mm³; matrix size = 312 × 384 × 127;
TR/TE = 22 ms/3.86 ms; time of acquisition = 3:50 min; flip angle = 18 degrees.

The images were pre-segmented semi-manually with a stan-

dardized pipeline using a thresholded region growing algorithm in

the case of PEGASUS dataset, and a 2D U-Net segmentation model

(Livne et al., 2019) in the case of 1000Plus. Final ground truths
were created following pre-defined manual correction steps, first
by junior and finally by senior raters. Further details of the labeling

methodology can be found in ( Hilbert et al., 2020 ).

2.2.2. Data splitting and preprocessing

TOF-MRA images from each study were denoised, and non-

uniformity correction was applied to improve image quality

(Masoudi et al., 2021). For training the GANs, the data of 47 patients
of the PEGASUS study were used. For the downstream task of segmentation,
the data of 12 patients were used as the validation set and those of 13
patients as the test set. The 1000Plus study dataset was used solely as an
independent test set (n = 65) for evaluation of the trained segmentation model.

Due to computational limitations, 3D patches of size 128 × 128 × 64 were
extracted from the whole-brain TOF-MRA scans of the PEGASUS training set. For
training the GANs, 50 patches of images and their labels were extracted per
patient: in part systematically (18) to cover all parts of the image, and in
part randomly (32) with the center voxel being a blood vessel so that vessels
were sufficiently represented. This amounted to a total of 2,350 patch-label
pairs. In addition, for the downstream segmentation model, 250 patches per
patient were randomly extracted with a blood vessel as the center voxel from the
PEGASUS training and validation sets, leading to 11,750 and 3,000 patch-label
pairs, respectively.

The image patches for the GAN training were normalized be-

tween -1 and +1. The corresponding labels were stacked on the

image patch as a second channel for training the GANs.
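The patch counts and the intensity scaling described in this section can be checked with a few lines of Python. The min-max scheme used in `normalize_patch` is an assumption, since the text only states the target range of -1 to +1. The inverse mapping to 0–255 corresponds to the rescaling of generated patches mentioned in the evaluation section.

```python
def normalize_patch(values):
    """Min-max normalize intensities to [-1, 1] (the exact scheme is an assumption)."""
    lo, hi = min(values), max(values)
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def rescale_to_image_range(values):
    """Map values from [-1, 1] back to the 0-255 intensity range."""
    return [(v + 1.0) / 2.0 * 255.0 for v in values]


# Patient counts and patches per patient as stated in the text.
n_train, n_val = 47, 12
gan_patches_per_patient = 18 + 32  # systematic + random
seg_patches_per_patient = 250

gan_pairs = n_train * gan_patches_per_patient        # GAN training pairs
seg_train_pairs = n_train * seg_patches_per_patient  # U-Net training pairs
seg_val_pairs = n_val * seg_patches_per_patient      # U-Net validation pairs

print(gan_pairs, seg_train_pairs, seg_val_pairs)
```

The three products reproduce the totals of 2,350, 11,750 and 3,000 patch-label pairs given above.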

2.3. Evaluation methods

An overview of the evaluation methods is shown in Fig. 1 .

The qualitative evaluation was done by visually assessing the images,
labels and the 3D vessel structure using ITK-SNAP² as a first
step. For a quantitative assessment, Fréchet Inception Distance (FID) scores were computed

from the extracted features using MedicalNet following precedence

( Sun et al., 2021 ). This is a 3D ResNet model pretrained on 23 dif-

ferent medical datasets for segmentation ( Chen et al., 2019 ). We

chose this network instead of the commonly used Inception-v3

trained on ImageNet dataset ( Szegedy et al., 2016 ) for calculating

the FID scores to better match our 3D medical data.
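For reference, the Fréchet distance that underlies the FID compares the Gaussian statistics of two feature sets. The notation below is introduced here for illustration only: $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ denote the mean and covariance of the features extracted from the real and the generated patches, respectively.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^{2}
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

A lower value indicates that the feature distribution of the generated patches is closer to that of the real patches.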

While the FID measures the quality of the images, it does not account for mode
collapse. Mode collapse happens when the generator learns to output a small set
of good quality images that obtain a good critic's score and does not learn any
further variations present in the training data. In order to quantify both the
quality and the variety of modes captured in the synthetic data, we used
Precision and Recall for Distributions (PRD) (Sajjadi et al., 2018). Precision
quantifies the quality of the images, and recall quantifies the modes captured,
so that a low recall points to mode collapse. We also computed the Area Under
the Curve (AUC) of the PRD curves to extract a single score for a simple
quantification. Here again, we compared the features of the generated and real
patches extracted with the pre-trained MedicalNet. It is important to note that
both the FID and the PRD curves are based on the imaging patches alone; the
labels are not taken into consideration for these performance measures.

2 http://www.itksnap.org/pmwiki/pmwiki.php

Fig. 3. Brain mask application for intracranial vessel analysis. An axial slice is shown of A. the TOF-MRA image with skull, B. the brain mask extracted from the TOF-MRA image using the FSL-BET tool, and C. the ground truth segmentation label after brain mask application, leading to skull-stripping, i.e. removal of all vessels of the face and neck with only intracranial vessels remaining.

Next, we tested the generated data for brain vessel segmen-

tation. 3D U-Nets were trained on the synthetic patch-label pairs

produced from the four different 3D GANs, and on the real data to

compare segmentation performance. The generated patches were

rescaled back to the real data range i.e. to 0–255 and the labels

made binary by using a threshold. The performance of all trained

U-Nets was evaluated on two independent test sets in two separate

analysis schemes: a) all vessels b) intracranial vessels. In the case

of all vessels, the whole predicted segmentation label was consid-

ered for evaluation. For intracranial vessels, the segmentation la-

bels were processed so that only the intracranial vessels were con-

sidered. This was done by applying brain masks of corresponding

TOF-MRA images on the ground truth segmentation labels and the

prediction labels from all the U-Net models. The brain masks were

obtained automatically using the FSL Brain Extraction Tool³ (BET)
with parameter frac = 0.05 on the TOF-MRA images. A visual
illustration of this post-processing of labels for intracranial vessels

is shown in Fig. 3 . In each case, the U-Net model that performed

the best on the real validation set was selected to compute and

report the performance on the real test sets. This method of eval-

uation not only signiﬁes the utility of the synthetic data for the

brain vessel segmentation use case but also provides information

about how well the generated labels reﬂect the vessel information

in the generated patch as this is crucial for a good segmentation

performance. The segmentation performance was measured using

Dice Similarity Coeﬃcient (DSC) and the balanced Average Haus-

dorff Distance (bAVD) ( Aydin et al., 2021b ). DSC is a commonly

used metric to evaluate segmentation performance, given by:

$$
\mathrm{DSC} = \frac{2 \times TP}{2 \times TP + FP + FN} \tag{3}
$$

where TP = True positive; FP = False positive; FN = False negative.

A higher DSC indicates good segmentation performance. bAVD is a

distance metric which has been shown to be a better metric for

evaluation of blood vessel segmentation ( Aydin et al., 2021a ). It is

a modiﬁed average Hausdorff distance deﬁned as:

$$
\mathrm{bAVD} = \frac{1}{2\,|G|} \times \left( \sum_{g \in G} \min_{p \in P} d(g, p) + \sum_{p \in P} \min_{g \in G} d(p, g) \right) \tag{4}
$$

where G is the set of voxels in the ground truth and P is the set of voxels in
the predicted segmentation. The balanced directed average Hausdorff distance
from voxel set G to P is given by the sum of all minimum distances from all
points belonging to point set G to P divided by the number of points in G.
Similarly, the balanced directed average Hausdorff distance from voxel set P to
G is given by the sum of all minimum distances from all points belonging to
point set P to G divided by the number of points in G. bAVD is the mean of the
balanced directed average Hausdorff distance from G to P and that from P to G,
in voxels. A lower bAVD indicates good segmentation performance. We used the
EvaluateSegmentation tool (Taha and Hanbury, 2015) to calculate the DSC and
bAVD for each patient prediction. The mean DSC and mean bAVD were then
calculated across all patients.

3 https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/BET/UserGuide
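The two metrics defined in Eqs. (3) and (4) can be illustrated on a toy example. The brute-force sketch below represents each segmentation as a set of voxel coordinates and uses the Euclidean distance; it is meant to make the definitions concrete and is not the EvaluateSegmentation tool used in the study. The example voxel sets are made up.

```python
import math


def dice(ground_truth: set, prediction: set) -> float:
    """Dice Similarity Coefficient from sets of foreground voxel coordinates."""
    tp = len(ground_truth & prediction)
    fp = len(prediction - ground_truth)
    fn = len(ground_truth - prediction)
    return 2 * tp / (2 * tp + fp + fn)


def balanced_avd(ground_truth: set, prediction: set) -> float:
    """Balanced average Hausdorff distance; both directed terms are divided by |G|."""
    g_to_p = sum(min(math.dist(g, p) for p in prediction) for g in ground_truth)
    p_to_g = sum(min(math.dist(p, g) for g in ground_truth) for p in prediction)
    n_g = len(ground_truth)
    return (g_to_p / n_g + p_to_g / n_g) / 2


# Toy example: three ground truth voxels, three predicted voxels, two shared.
G = {(0, 0, 0), (1, 0, 0), (2, 0, 0)}
P = {(0, 0, 0), (1, 0, 0), (1, 1, 0)}

print(dice(G, P), balanced_avd(G, P))
```

Here TP = 2, FP = 1 and FN = 1, so the DSC is 4/6. Each directed sum of minimum distances equals 1 voxel, so the bAVD is (1/3 + 1/3)/2 = 1/3 voxel.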

2.4. Training

The models were implemented in PyTorch and trained on an Nvidia TITAN RTX GPU
for 100 epochs each. We used the two time-scale update rule (Heusel et al., 2018)
with different learning rates of 0.0004 and 0.0002 for the critic and the
generator, respectively, instead of having more updates for the critic within
each epoch. The Adam optimizer (Kingma and Ba, 2014) with β1 = 0 and β2 = 0.9
was used. The batch size for all models was 4. For mixed precision, the
Automatic Mixed Precision (AMP) package from PyTorch was used. A threshold of
0.3 was set for binarizing the generated labels, except in the case of the SN
model, where 0.2 was used. All the above hyperparameters were chosen based on
the performance on the validation set in the segmentation task. The training
times and the memory used for each GAN variant were recorded.
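The optimizer settings above can be written down in PyTorch as in the following sketch. The two `nn.Linear` modules are placeholders standing in for the 3D generator and critic, and the gradient penalty is omitted from the critic step for brevity; only the learning rates, the Adam parameters, the batch size and the one-to-one update schedule are taken from the text.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder modules standing in for the 3D generator and critic.
generator = nn.Linear(128, 16)
critic = nn.Linear(16, 1)

# Two time-scale update rule: the critic uses the higher learning rate.
critic_optimizer = torch.optim.Adam(critic.parameters(), lr=4e-4, betas=(0.0, 0.9))
generator_optimizer = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.0, 0.9))

batch_size = 4
for _ in range(2):
    real = torch.randn(batch_size, 16)  # stand-in for real patch-label pairs
    fake = generator(torch.randn(batch_size, 128))

    # Critic step (gradient penalty omitted in this sketch).
    critic_optimizer.zero_grad()
    critic_loss = critic(fake.detach()).mean() - critic(real).mean()
    critic_loss.backward()
    critic_optimizer.step()

    # Generator step: one update per critic update instead of several critic updates.
    generator_optimizer.zero_grad()
    generator_loss = -critic(fake).mean()
    generator_loss.backward()
    generator_optimizer.step()
```

The design choice is to balance the two networks through their step sizes rather than through the number of critic updates, which keeps the cost per iteration at one forward and backward pass for each network.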

Table 1
FID scores and AUC of the PRD curves for synthetic data from different models.

| Data source | FID | PRD-AUC |
|---|---|---|
| GP model | 0.0381 | 0.80 |
| SN model | 0.0322 | 0.82 |
| SN-MP model | 0.0206 | 0.87 |
| c-SN-MP model | 0.0244 | 0.86 |

For segmentation, the published 3D U-Net architecture and framework implemented
in TensorFlow by Hilbert et al. (2020) was utilized with the default
hyperparameters. These were the Adam optimizer with a learning rate of 0.0001,
β1 = 0.9 and β2 = 0.999, and a batch size of 8.

3. Results

In the visual analysis, the synthetic patches, labels and the 3D

vessel structure from the complex mixed precision model (c-SN-

MP) appeared as the most realistic ( Fig. 4 ). The patches from the

mixed precision models (SN-MP and c-SN-MP) had the lowest FID

scores ( Table 1 ), and the best PRD curves ( Fig. 5 ). Based on the PRD

curves, the precision of c-SN-MP outperformed SN-MP where the

recall values are higher while the precision of SN-MP is higher for

lower recall values. Based on the AUC of the PRD curves shown

in Table 1, SN-MP and c-SN-MP patches performed similarly. In Table 2, the
memory consumption and the training duration of each of the GAN variants are
shown. Using mixed precision improved the memory efficiency by approximately 40%.

Fig. 4. Sets of samples of the mid-axial slice of the patch and label, and the corresponding 3D vessel structure from A) GP B) SN C) SN-MP D) c-SN-MP and E) real. The visualizations were obtained using ITK-SNAP for illustrative purposes only.

Table 2
Total number of trainable parameters, memory consumption and training times of various 3D GAN models. Note that c-SN-MP, which is our complex mixed precision model, uses twice the number of filters per layer leading to doubling of the trainable parameters compared to non-complex models. The memory consumption increased by 1.5 times compared to the SN model allowing it to be accommodated in the limited memory of our computational infrastructure. The training time also increased by 2.5 times but it was not a constraint in our study.

| Model | Trainable parameters (million) | Memory (MB) | Time (hours) |
|---|---|---|---|
| GP model | 145 | 15,085 | 78 |
| SN model | 145 | 14,333 | 77 |
| SN-MP model | 145 | 9,013 | 77 |
| c-SN-MP model | 308 | 21,351 | 192 |

Table 3
The mean DSC and mean bAVD (in voxels) across all the patients in the test set for 2 different datasets, PEGASUS and 1000Plus. The value in brackets is the standard deviation across patients. A) All vessels is done on the entire prediction with the entire ground truth as reference, and B) Intracranial vessels is done on skull-stripped prediction with skull-stripped ground truth as reference.

| Data source | PEGASUS mean DSC | PEGASUS mean bAVD | 1000Plus mean DSC | 1000Plus mean bAVD |
|---|---|---|---|---|
| A) All vessels | | | | |
| GP model | 0.793 (0.024) | 2.648 (1.189) | 0.807 (0.03) | 1.895 (1.061) |
| SN model | 0.804 (0.019) | 2.425 (1.505) | 0.796 (0.029) | 1.855 (0.929) |
| SN-MP model | 0.782 (0.020) | 2.334 (1.122) | 0.778 (0.032) | 1.746 (0.894) |
| c-SN-MP model | 0.820 (0.017) | 1.859 (1.038) | 0.809 (0.031) | 0.858 (0.91) |
| Real | 0.906 (0.016) | 0.339 (0.139) | 0.883 (0.023) | 0.554 (0.221) |
| B) Intracranial vessels | | | | |
| GP model | 0.827 (0.015) | 0.639 (0.132) | 0.829 (0.019) | 0.701 (0.195) |
| SN model | 0.833 (0.013) | 0.606 (0.141) | 0.811 (0.023) | 0.716 (0.213) |
| SN-MP model | 0.804 (0.020) | 0.784 (0.125) | 0.785 (0.027) | 0.822 (0.211) |
| c-SN-MP model | 0.841 (0.016) | 0.508 (0.083) | 0.817 (0.028) | 0.611 (0.18) |
| Real | 0.901 (0.019) | 0.294 (0.077) | 0.880 (0.024) | 0.507 (0.126) |

Fig. 5. PRD curves of synthetic data from the four different models with real data as reference. Precision and Recall in GANs quantify the quality and modes captured by the models respectively.
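The ratios quoted in the Table 2 caption and in the text can be recomputed from the tabulated memory and time values. The short check below uses only those numbers; the dictionary keys are shorthand for the model names.

```python
# Memory (MB) and training time (hours) as listed in Table 2.
memory_mb = {"GP": 15085, "SN": 14333, "SN-MP": 9013, "c-SN-MP": 21351}
hours = {"GP": 78, "SN": 77, "SN-MP": 77, "c-SN-MP": 192}

# Memory saved by mixed precision for the same architecture (SN vs. SN-MP).
saving = 1 - memory_mb["SN-MP"] / memory_mb["SN"]

# Cost of doubling the filters with mixed precision, relative to the SN model.
memory_ratio = memory_mb["c-SN-MP"] / memory_mb["SN"]
time_ratio = hours["c-SN-MP"] / hours["SN"]

print(f"{saving:.2f} {memory_ratio:.2f} {time_ratio:.2f}")
```

The saving evaluates to roughly 37%, in line with the stated approximate 40%, and the two ratios round to the stated factors of 1.5 and 2.5.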

The test set performance of the 3D U-Net trained on generated

data from different models and on real data is shown in Table 3 .

Here, Table 3A shows the performance when all vessels are considered. The U-Net
trained with c-SN-MP synthetic data outperformed all the U-Nets trained on other
synthetic data for the PEGASUS test set (mean DSC 0.820; mean bAVD 1.859). In
the case of the external dataset 1000Plus, the U-Nets trained on synthetic data
from the GP model and the c-SN-MP model performed the same in terms of mean DSC
with 0.810, whereas the U-Net trained on data from c-SN-MP achieved the lowest
mean bAVD with 1.301. In comparison, the performance of the 3D U-Net trained
with real data was overall still the highest, both on the PEGASUS test set
(mean DSC 0.906; mean bAVD 0.339) and on the 1000Plus test set (mean DSC 0.887;
mean bAVD 0.622).

Next, Table 3 B shows the performance for intracranial vessels

alone. Here, the U-Net trained with c-SN-MP synthetic data outperformed all the
U-Nets trained on other synthetic data for the PEGASUS test set (mean DSC 0.841;
mean bAVD 0.508). For the external test set from the 1000Plus dataset, the U-Net
trained on generated data from GP achieved the highest mean DSC with 0.830,
whereas the U-Net trained on generated data from c-SN-MP achieved the lowest
mean bAVD with 0.639. For labels with only intracranial vessels, the performance
of the 3D U-Net trained with real data was still the highest, both on the
PEGASUS test set (mean DSC 0.901; mean bAVD 0.294) and on the 1000Plus test set
(mean DSC 0.880; mean bAVD 0.541).

Box-whisker plots of the prediction performance of various

models on the two test sets are plotted in Fig. 6 which shows the

inter-patient spread in performances for all vessels ( Fig. 6 A) and

for intracranial vessels ( Fig. 6 B). The error maps of segmentation

of two example patients, one from each of the two datasets, are

shown in Fig. 7 for all vessels and for intracranial vessels.

4. Discussion

To the best of our knowledge, this is the ﬁrst work to present

generative adversarial network models that generate realistic 3D

TOF-MRA volumes along with segmentation labels in medical

imaging. We showed that utilizing mixed precision aids in achieving the highest
image quality of synthetic data.

Fig. 6. Segmentation performance (DSC and bAVD) of 3D U-Net models trained with 4 different generated datasets and PEGASUS training data on the 2 datasets PEGASUS and 1000Plus for A) all vessels B) intracranial vessels. The horizontal line of the box-whisker plots indicates the median, the box indicates the interquartile range and the whiskers the minimum and maximum.

Additionally, the

synthetic data from our complex model maintained a substan-

tial amount of predictive properties of the original volumes re-

ﬂected by the good segmentation performance on real test data.

These ﬁndings also held true on a second, independent dataset.

The results showcase the potential of utilizing memory eﬃciency

provided by mixed precision in designing a complex architecture.

Increasing the complexity is required in order to generate high-

resolution ﬁne grained structures such as brain vessels in 3D TOF-

MRA volumes from noise along with the corresponding segmenta-

tion labels. This work sets an important step towards sharing la-

beled 3D medical images that would facilitate better research in

the medical imaging domain.

The segmentation performance of the 3D U-Net trained on syn-

thetic data from our complex mixed precision model, c-SN-MP,

showed the best performance compared to the other models based

on synthetic data in terms of both metrics DSC and bAVD. Here,

doubling the ﬁlters per layer in the GAN architecture is likely to

have helped to capture the vessel structure in the training data.

Also, visually it can be seen in Fig. 4 that c-SN-MP labels ( Fig. 4 D)

look connected and most similar to real vessel structures ( Fig. 4 E).

On the contrary, the labels of synthetic data from the simpler

mixed precision model, SN-MP ( Fig. 4 C), are sparsely connected

which explains the worst performance in terms of the DSC of the

U-Net trained on SN-MP data. This seems plausible as the vessels

are more relevant for segmentation than the background. The same

can be observed in patch-label pairs from our most basic model,

GP. Here, the segmentation performance was better than SN-MP in

terms of DSC even though visually Fig. 4 A shows that patch quality

of GP is not as sharp as the other generated images.

In terms of quantitative measures of patch quality, the FID

scores and PRD curves, the mixed precision models, both simple

and complex, were rated to be of much better quality and variety

when compared to models not using mixed precision (GP and SN

models). However, the U-Net trained with patch-label pairs from the simpler
mixed precision model, SN-MP, had the lowest segmentation performance.
A possible reason for this could be that FID and

PRD curves, which are based on the features extracted only from

the patches, might focus not only on the vessel structure but also

the quality of the background. In contrast, the U-Net performance

is more focused on recognizing the vessel structure. This is con-

ﬁrmed when looking at Fig. 4 C where the patches seem realistic,

but the vessel structures look disconnected. We see the reverse of

this in the case of GP, where the patches look less realistic, but the

vessel structures look more connected. This could explain why the

GP model fared poorly in FID and PRD curves and yet did well in

segmentation when used to train a U-Net. Looking more closely at

the PRD curves ( Fig. 5 ), the simpler mixed precision model, SN-MP,

patches had good quality at lower recall values, while patches from

our complex mixed precision model, c-SN-MP, had better quality

when the recall values increased. This implies that c-SN-MP is ca-

pable of generating patches of slightly reduced quality but with

higher variety, and thus, is better at handling mode collapse which

is indicated by recall. While the FID and PRD curve provide in-

sights regarding the image quality and variety, these metrics do

not necessarily align with the performance in the vessel segmen-

tation task. This emphasizes the importance of generating labels

along with the image to determine the best generated data for the

speciﬁc use case.

Overall, the FID and PRD curves indicated that more regular-

izations have a positive effect on the image quality and variety.

The mixed precision models, SN-MP and c-SN-MP are the best per-

forming models in terms of these metrics. They are both regular-

ized with gradient penalty ( Gulrajani et al., 2017 ) and spectral nor-

malization ( Miyato et al., 2018 ). These methods have been individ-

ually proposed to bound the critic by ensuring Lipschitz continuity

which has been found to stabilize GAN training. Gradient penalty

does this by applying a gradient based constraint to the objec-

tive function of the critic. With spectral normalization, the critic is

bound by directly constraining its weight matrices by normalizing

them with their spectral norm. Using the two methods together

was proposed to be beneﬁcial in the study that introduced spec-

tral normalization in GANs ( Miyato et al., 2018 ) and using them to-

gether has been shown to improve performance in another study

( Kossen et al., 2021 ). In addition to these methods, we also used

mixed precision for memory eﬃciency in the case of SN-MP and

c-SN-MP models. Mixed precision has been found to act as yet an-

other form of regularization ( Micikevicius et al., 2018 ). Unlike FID

and PRD curves, the segmentation performance does not always

beneﬁt from synthetic data generated by more regularized mod-

els. When looking at the test DSC and bAVD of the U-Nets trained
on synthetic data from the simpler models, GP, SN and SN-MP, it
is difficult to rank the overall performance due to the varied differences
in the two segmentation performance metrics across the two test sets.

Fig. 7. Segmentation error map of an example patient each from the PEGASUS test set and the 1000Plus test set for all vessels and for intracranial vessels. Top to bottom: maps from the 3D U-Net model trained on A. GP synthetic data B. SN synthetic data C. SN-MP synthetic data D. c-SN-MP synthetic data E. real data. True positives are shown in red, false positives are in green and false negatives in yellow. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

One possible explanation for the differences might

be that regularization might positively impact the patches but not

necessarily the binary segmentation labels that are much simpler

to generate. To draw conclusions on how regularizations in GANs

affect the two segmentation metrics and the generalizability to the

additional set, a more systematic analysis would be required in

further research. For the c-SN-MP model, we increased the model

complexity which could better utilize the multiple regularizations

and thus showed good segmentation performance as well as good

image quality. An additional argument in favor of multiple reg-

ularizations is that it has been found to make models less vulnerable
towards membership inference attacks (Truex et al., 2019; Chen et al., 2020).
Such attacks are used by malicious parties to

ﬁnd out if a particular patient’s data was used to train a model

( Shokri et al., 2017 ). This is crucial to consider when sharing the

synthetic data or the generator model. While regularization has

been found useful to mitigate some attacks, applying differential

privacy (DP) ( Dwork and Roth, 2014 ) to the training process, by

construction, puts an upper bound on the privacy leakage of the

training data. DP is challenging to implement especially in a 3D

GAN architecture, as it introduces a substantial number of param-

eters to an already overwhelming amount of parameters. This leads

to high computational cost in terms of both memory and prolonged
training time while reducing the test performance considerably.
To showcase this, we have included preliminary results with

DP using Rényi divergence ( Mironov, 2017 ) in 3D on a simpliﬁed

GAN architecture in the appendix.
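To make the spectral normalization discussed in this paragraph concrete, the following pure-Python sketch estimates the spectral norm (largest singular value) of a weight matrix by power iteration and divides the matrix by it. It is an illustration of the principle on a small made-up matrix, under the assumption of a fixed number of iterations; deep learning frameworks apply the same idea to reshaped convolution kernels with their own implementation details.

```python
import math


def estimate_spectral_norm(matrix, iterations: int = 50) -> float:
    """Estimate the largest singular value of `matrix` by power iteration."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    sigma = 0.0
    for _ in range(iterations):
        # u = W v, normalized to unit length
        u = [sum(w * x for w, x in zip(row, v)) for row in matrix]
        u_norm = math.sqrt(sum(x * x for x in u))
        u = [x / u_norm for x in u]
        # v = W^T u; its length converges to the largest singular value
        v = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        sigma = math.sqrt(sum(x * x for x in v))
        v = [x / sigma for x in v]
    return sigma


def spectrally_normalize(matrix):
    """Divide every weight by the spectral norm so the result has spectral norm 1."""
    sigma = estimate_spectral_norm(matrix)
    return [[w / sigma for w in row] for row in matrix]


weights = [[3.0, 0.0], [0.0, 1.0]]  # made-up example with singular values 3 and 1
normalized = spectrally_normalize(weights)
print(estimate_spectral_norm(weights), estimate_spectral_norm(normalized))
```

After normalization, the linear map cannot stretch any input by more than a factor of 1, which is how the weight constraint bounds the Lipschitz constant of each layer of the critic.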

The U-Net trained on real data still outperforms all those

trained on generated data. Here, Fig. 7 (All vessels) shows that

all U-Net models trained with synthetic patches also segment

blood vessels that are not brain blood vessels but rather vessels

in the face and neck area, i.e. false positives. In contrast, the U-Net

trained on real patches of the same size recognized if a vessel be-

longed to the brain and did not segment vessels outside the brain.

This highlights that while the GANs learned to segment blood ves-

sels, they did not learn to take the anatomical context of the ves-

sels into account, i.e. the resulting models could not differentiate

between face, neck and brain blood vessels. A possible explana-

tion for this could be that the loss function for GANs focuses on

the quality of the generated data and not the segmentation perfor-

mance of the generated patch-label pairs compared to real patch-

label pairs. A potential solution would be to generate labels with a

ﬁrst GAN similar in distribution to the real ground truth segmenta-

tion labels, and then a second GAN could be used to generate the

corresponding image in inference mode with a 3D image-to-image

translation GAN architecture that was trained with real labels and

the corresponding images. However, training two 3D GANs sepa-

rately would increase the overall training time substantially and

was not feasible with our hardware infrastructure. We utilized an

alternative solution where we applied the test image brain mask

in a post-processing step leading to the removal of face and neck

vessels. This is a valid post-processing approach since many clini-

cal use cases only require segmentation of intracranial vessels. The

performance of the 3D U-Net trained with generated data then im-

proved, bringing it closer to the performance of the U-Net trained

on real data as shown in Table 3 B, Figs. 6 B, 7 (Intracranial vessels).

Generating 3D data is more complex and computationally ex-

pensive compared to 2D. Yet, the best performing U-Net model

trained on synthesized 3D data (DSC 0.841) is comparable to the

best performing U-Net model trained on synthesized 2D data (DSC

0.848) for the same use case of intracranial vessel segmentation

(Kossen et al., 2021). Here, the number of voxels that are generated is
increased by a factor of approximately 100. Meanwhile, only a quarter of the
number of filters per layer was used for all our non-complex 3D GAN models owing
to memory limitations. In order to double the number of filters per layer for
our complex model (c-SN-MP), we used mixed precision. Next, we also used
upsampling instead of transposed convolutions to alleviate checkerboard
artifacts, which increased the memory consumption substantially.

Additionally, the training of WGAN requires the discriminator to

be updated more often than the generator. Since this would lead

to much longer training times, the current work utilized the Two

Timescale Update Rule (TTUR). Here, the learning rate of the dis-

criminator is set to be higher than that of the generator. These

changes were crucial to cope with the special challenges of syn-

thesis in 3D. Even with these restrictions, a similar segmentation

performance of 3D in comparison with 2D underlines the impor-

tance of generating data in 3D to capture the contextual informa-

tion within the third dimension for this 3D use case. It is likely

that the segmentation performance of the U-Net trained with gen-

erated 3D data could surpass the performance of 2D data with

more computational capacity, when more ﬁlters can be utilized in

the 3D GAN architecture.
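The training-time argument for TTUR can be made concrete with a rough count of parameter updates. The sketch below is only a cost-accounting illustration, assuming 5 critic updates per generator update for a standard WGAN schedule and a 1:1 schedule for TTUR; the step count and learning rates are hypothetical placeholders, not the settings used in the study.

```python
def network_updates(generator_steps, critic_updates_per_step):
    """Total number of parameter updates for a given number of generator steps."""
    return generator_steps * (critic_updates_per_step + 1)


# Hypothetical number of generator steps, chosen only for illustration.
steps = 10_000
wgan_updates = network_updates(steps, critic_updates_per_step=5)
ttur_updates = network_updates(steps, critic_updates_per_step=1)

# TTUR keeps the 1:1 update ratio and separates the time scales through
# the learning rates instead. Placeholder values, not the study's settings.
ttur_learning_rates = {"discriminator": 4e-4, "generator": 1e-4}
```

Under these assumptions the standard schedule needs three times as many updates for the same number of generator steps, which is the overhead TTUR avoids.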

A different strategy with regard to data privacy is Federated Learning (FL). Here, sharing of data is avoided: each participating client computes model updates locally, and these updates are aggregated into a global model that is then utilized by the participating clients. The results thus far are promising. However, standard FL does not create new data that can be made public for other research groups to access and improve model architectures. This is especially important in the case of rare pathologies where the data is scarce. Here, GANs can be used to generate

data of such pathologies by research groups that have access to

the data which can then be made publicly available. Additionally,

there are technical and collaborative hurdles in FL such as picking

a model-aggregation policy, standardization of hardware and soft-

ware across multiple organizations among others ( Ng et al., 2021 ).

These challenges are more acute in the case of deep learning research. The organizational and collaborative efforts involved might not be feasible for research groups with limited resources. Since

synthetic data from GANs can be shared, it provides easy and eq-

uitable access to all research groups investigating deep learning in

medical imaging. FL, on the other hand, is more suitable for clin-

ical application of well-established architectures with distributed

training. It should be noted that both FL and GANs are suscepti-

ble to information leakage from the model weights even if the real

data itself is not shared. This makes both methods open to pri-

vacy threats ( Sheller et al., 2020; Chen et al., 2020 ). Here, DPGAN

has been found useful ( Xie et al., 2018 ). DP algorithms incorpo-

rate random noise into the model making them resilient towards

information leakage ( Shokri et al., 2017 ). FL and DPGANs could be

taken together to combine their strengths as was done in FedDP-

GAN ( Zhang et al., 2021 ). In our work, we focus on the challenges

of generating 3D medical imaging along with corresponding labels

since labeling generated images is time and labour intensive. This

is an important step before inclusion of DP into the GAN architec-

ture. We have provided preliminary results using DP on a simple

3DGAN architecture in the appendix.
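The aggregation step of federated learning mentioned above can be illustrated with a small sketch. It assumes a FedAvg-style weighted mean of client parameters, which is one common aggregation policy and not necessarily the one used in the cited works; the function name `federated_average` and the toy clients are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Weighted mean of client parameter vectors (FedAvg-style aggregation)."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Three hypothetical clients (e.g. hospitals) with locally trained parameters.
client_weights = [[0.0, 1.0], [1.0, 1.0], [4.0, 3.0]]
client_sizes = [10, 10, 20]  # number of local training samples per client
global_weights = federated_average(client_weights, client_sizes)
```

Only the parameter vectors and sample counts leave each client in this scheme; the raw images stay local, which is why the choice of aggregation policy becomes one of the collaborative hurdles discussed above.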

The main limitations of our study are computational in nature.

First, we have not employed DP in the presented GAN architectures

which would provide an upper bound on the information leakage

when the generated data and/or generated model is shared. The

computational load resulting from applying DP would have made

the study unfeasible with the available computing infrastructure.

Second, we did not use more novel GAN architectures validated

on natural images such as Progressive GANs ( Karras et al., 2018 )

or Multi-Scale-Gradients GANs ( Karnewar and Wang, 2020 ). This is

because of the multi-fold computational requirements of these ar-

chitectures, especially in 3D. Patches of much smaller size could

still be generated (Eklund, 2020), but they would not be very useful for the downstream task of vessel segmentation. Third, we generated patch-label pairs and not whole volume-label pairs due to

computational limitations. While a recently introduced hierarchical

memory-eﬃcient approach ( Sun et al., 2021 ) might help to over-

come the computational constraints, this would come at the cost

of much longer training times considering 2 GANs of different res-

olutions are trained along with encoders in an end-to-end manner.

Additionally, architectures that use data reconstruction are more susceptible to membership inference attacks (Chen et al., 2020). Two recent studies (Kwon et al., 2019; Sun et al., 2021) that generate 3D images alone use encoders in their architectures, which makes them less useful for the purpose of privacy-preserving data sharing. Lastly, we trained and tested our GAN architectures on one

imaging modality, i.e. TOF-MRA. While we expect generalization

of our results to other modalities that may not be high contrast-

to-noise modalities like TOF-MRA, this should be veriﬁed in fu-

ture studies. For that, we encourage other researchers to utilize

our publicly available code. Our ﬁndings for TOF-MRA can be re-

garded as a ﬁrst proof-of-concept that GAN architectures are able

to synthesize realistic looking 3D volumes with corresponding seg-

mentation labels.

5. Conclusion

In this study, we generated high resolution TOF-MRA patches

along with their corresponding labels in 3D employing mixed

precision for memory eﬃciency. Since most medical imaging is

P. Subramaniam, T. Kossen, K. Ritter et al. Medical Image Analysis 78 (2022) 102396

recorded in 3D, generating 3D images that retain the volumetric

information together with labels that are time-intensive to gener-

ate manually is a ﬁrst step towards sharing labeled data. While

our approach is not privacy-preserving yet, the architecture was

designed with privacy as a key aspiration. It would be possible to

extend it with differential privacy in future works once the compu-

tational advancements allow it. This would pave the way for shar-

ing privacy-preserving, labeled 3D imaging data. Research groups

could utilize our open source code to implement a mixed precision

approach to generate 3D synthetic volumes and labels eﬃciently

and verify if they hold the necessary predictive properties for the

speciﬁc downstream task. Making such synthetic data available on

request would then allow for larger heterogeneous datasets to be

used in the future alleviating the typical data shortages in this do-

main. This will pave the way for robust and replicable model de-

velopment and will facilitate clinical applications.

Declaration of Competing Interest

All authors have participated in (a) conception and design, or

analysis and interpretation of the data; (b) drafting the article or

revising it critically for important intellectual content; and (c) ap-

proval of the ﬁnal version.

This manuscript has not been submitted to, nor is under review

at, another journal or other publishing venue.

The authors have no aﬃliation with any organization with a di-

rect or indirect ﬁnancial interest in the subject matter discussed in

the manuscript

The following authors have aﬃliations with organizations with

direct or indirect ﬁnancial interest in the subject matter discussed

in the manuscript:

None of the authors have direct or indirect ﬁnancial interest in

the subject matter discussed in the manuscript.

However, the following disclosures, unrelated to the current work, are made:

Pooja Subramaniam reported receiving personal fees from

ai4medicine outside the submitted work. Tabea Kossen reported

receiving personal fees from ai4medicine outside the submitted

work. Dr Madai reported receiving personal fees from ai4medicine

outside the submitted work. Adam Hilbert reported receiving per-

sonal fees from ai4medicine outside the submitted work. Dr Frey

reported receiving grants from the European Commission, reported

receiving personal fees from and holding an equity interest in

ai4medicine outside the submitted work. There is no connec-

tion, commercial exploitation, transfer or association between the

projects of ai4medicine and the results presented in this work.

While not related to this work, Dr Sobesky reports receipt of

speakers honoraria from Pﬁzer, Boehringer Ingelheim, and Daiichi

Sankyo. Furthermore, Dr Fiebach has received consulting and ad-

visory board fees from BioClinica, Cerevast, Artemida, Brainomix,

Biogen, BMS, EISAI, and Guerbet.

CRediT authorship contribution statement

Pooja Subramaniam: Conceptualization, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft, Writing – review & editing. Tabea Kossen: Conceptualization, Investigation, Methodology, Project administration, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing. Kerstin Ritter: Supervision, Writing – review & editing. Anja Hennemuth: Supervision, Writing – review & editing. Kristian Hildebrand: Supervision, Writing – review & editing. Adam Hilbert: Conceptualization, Writing – review & editing. Jan Sobesky: Data curation, Writing – review & editing. Michelle Livne: Conceptualization, Writing – review & editing. Ivana Galinovic: Data curation, Writing – review & editing. Ahmed A. Khalil: Data curation, Writing – review & editing. Jochen B. Fiebach: Data curation, Writing – review & editing. Dietmar Frey: Conceptualization, Funding acquisition, Project administration, Resources, Supervision, Writing – review & editing. Vince I. Madai: Conceptualization, Data curation, Investigation, Methodology, Project administration, Supervision, Visualization, Writing – original draft, Writing – review & editing.

Acknowledgments

This work has received funding from the German Federal Ministry of Education and Research through (1) the grant Centre for Stroke Research Berlin and (2) a Go-Bio grant for the research group PREDICTioN2020 (lead: DF). Grant number 031B0154.

Appendix A. Data augmentation

An additional analysis to complement the evaluation of our synthetic data is to use it to augment the training data for the downstream brain blood vessel segmentation task. Here we trained segmentation models on the PEGASUS training data augmented with synthetic data from each of the 4 GAN models separately. Table A.1 summarizes the segmentation results for all vessels (Table A.1 A) and intracranial vessels (Table A.1 B). Fig. A.1 is a box-whisker plot visualizing the spread in segmentation performance between patients for all vessels (Fig. A.1 A) and intracranial vessels (Fig. A.1 B).

Table A.1
The mean DSC and mean bAVD (in voxels) across all the patients in the test set for 2 different datasets, PEGASUS and 1000Plus, using models trained with real data along with generated data used as data augmentation. The value in brackets is the standard deviation across patients. A) All vessels is evaluated on the entire prediction with the entire ground truth as reference, and B) Intracranial vessels is evaluated on the skull-stripped prediction with the skull-stripped ground truth as reference.

Data source            PEGASUS                        1000Plus
                       Mean DSC       Mean bAVD       Mean DSC       Mean bAVD
A) All vessels
Real + GP model        0.902 (0.046)  0.333 (0.151)   0.862 (0.029)  0.65 (0.271)
Real + SN model        0.906 (0.016)  0.385 (0.145)   0.878 (0.021)  0.558 (0.199)
Real + SN-MP model     0.903 (0.013)  0.359 (0.133)   0.883 (0.02)   0.511 (0.145)
Real + c-SN-MP model   0.907 (0.012)  0.399 (0.204)   0.891 (0.02)   0.564 (0.222)
Real                   0.906 (0.016)  0.339 (0.139)   0.883 (0.023)  0.554 (0.221)
B) Intracranial vessels
Real + GP model        0.897 (0.018)  0.323 (0.073)   0.855 (0.029)  0.626 (0.199)
Real + SN model        0.905 (0.017)  0.318 (0.075)   0.874 (0.022)  0.546 (0.148)
Real + SN-MP model     0.900 (0.017)  0.328 (0.078)   0.877 (0.02)   0.518 (0.121)
Real + c-SN-MP model   0.905 (0.016)  0.306 (0.091)   0.884 (0.02)   0.557 (0.129)
Real                   0.901 (0.019)  0.294 (0.077)   0.880 (0.024)  0.507 (0.126)

Fig. A.1. Segmentation performance (DSC and bAVD) of 3D U-Net models trained with PEGASUS training data together with 4 different generated datasets as data augmentation, on the 2 datasets PEGASUS and 1000Plus, for A) all vessels and B) intracranial vessels. The horizontal line of the box-whisker plots indicates the median, the box indicates the interquartile range and the whiskers the minimum and maximum.

Using synthetic data from the c-SN-MP model to augment the real data for training the segmentation model provided the highest mean DSC on both test sets (PEGASUS and 1000Plus) for both A) all vessels and B) intracranial vessels (tied with the SN model for intracranial vessels on PEGASUS), compared with using only real data or using synthetic data from the other GAN models along with real data. While data augmentation is a valid application of our synthetic data, its additional value is limited, as can be seen from the results. This could be because the predictive properties captured by the synthesized data are similar to those of the real data. This was also the case in the study with 2D GANs (Kossen et al., 2021), where data augmentation with 2D generated data did not lead to a substantial difference in the segmentation performance.

Appendix B. 3D differentially private GAN

Differential privacy (DP) is a natural mitigation strategy against membership inference threats. Using DP to synthesize data would allow accounting for the level of possible re-identification, thus providing privacy guarantees for the generated data. In order to illustrate this, we utilized the Opacus package from PyTorch to apply the DP-SGD algorithm with Rényi divergence on a 3D adapted version of the WGAN (Arjovsky et al., 2017). Henceforth, we refer to this model architecture as 3D DPGAN. Here, we clipped the weights of the critic with a clipping parameter of 0.01. We had to halve the number of filters per layer in both the critic and the generator in order to be able to train the 3D DPGAN within our computational infrastructure. We trained the 3D WGAN with a Rényi differential privacy accountant, which translates to (ε, δ)-DP guarantees. The (ε, δ) pairs quantify the privacy properties of DP-SGD: ε is the measure of privacy loss at a differential change in data, and δ is the probability that the privacy constraint of ε does not hold. A smaller ε value leads to better privacy. For comparing synthetic data with different privacy guarantees, the noise multiplier was set to the values [0.1, 0.3, 0.5], which provide ε values of [≈10^6, ≈10^3, ≈10^2], respectively. It should be noted that since the training samples consist of 3D patch-label pairs from the TOF-MRA image-segmentation label pairs, the guarantees showcased here pertain to the patch-label pair data rather than the whole TOF-MRA image of a patient. We set δ to the inverse of the number of training samples following convention (Torkzadehmahani et al., 2019). A maximum gradient norm of 1 was applied for clipping gradients. The Adam optimizer with a learning rate of 0.0001 for both the critic and the generator was used instead of the TTUR method, as the training time was reasonable with 5 updates of the critic for every update of the generator. All the GANs were trained for 100 epochs. A threshold of 0.7 was applied to the generated labels from one DPGAN setting and 0.6 to those from the other two, chosen based on the segmentation performance on the validation set. The code is also made available in the GitHub repository that has already been provided.

Fig. B.1. Sets of samples of the mid-axial slice of the patch and label, and the corresponding 3D vessel structure from A) DPGAN ε ≈ 10^2, B) DPGAN ε ≈ 10^3, C) DPGAN ε ≈ 10^6, D) real. Note that the lower the ε, the higher the privacy. The visualizations were obtained using ITK-SNAP for illustrative purposes only.

Table B.1
The mean DSC and mean bAVD (in voxels) across all the patients in the test set for 2 different datasets, PEGASUS and 1000Plus, using models trained with generated data from the 3D DPGAN with different ε values, from a low ε value (high privacy) to a high ε value (low privacy). The value in brackets is the standard deviation across patients. A) All vessels is evaluated on the entire prediction with the entire ground truth as reference, and B) Intracranial vessels is evaluated on the skull-stripped prediction with the skull-stripped ground truth as reference.

Data source        PEGASUS                        1000Plus
                   Mean DSC       Mean bAVD       Mean DSC       Mean bAVD
A) All vessels
DPGAN ε ≈ 10^2     0.085 (0.012)  4.509 (0.903)   0.083 (0.016)  4.116 (0.743)
DPGAN ε ≈ 10^3     0.562 (0.050)  6.307 (2.406)   0.567 (0.041)  4.624 (1.608)
DPGAN ε ≈ 10^6     0.581 (0.048)  5.05 (2.267)    0.568 (0.041)  3.962 (1.591)
Real               0.906 (0.016)  0.339 (0.139)   0.883 (0.023)  0.554 (0.221)
B) Intracranial vessels
DPGAN ε ≈ 10^2     0.081 (0.013)  4.77 (1.081)    0.077 (0.015)  4.595 (0.878)
DPGAN ε ≈ 10^3     0.586 (0.045)  3.141 (0.514)   0.569 (0.045)  2.698 (0.604)
DPGAN ε ≈ 10^6     0.604 (0.048)  2.201 (0.413)   0.572 (0.048)  2.001 (0.445)
Real               0.901 (0.019)  0.294 (0.077)   0.88 (0.024)   0.507 (0.126)

Fig. B.2. Segmentation error maps of one example patient each from the PEGASUS test set and the 1000Plus test set for all vessels and for intracranial vessels. Top to bottom: maps from the 3D U-Net model trained on A. DPGAN ε ≈ 10^2, B. DPGAN ε ≈ 10^3, C. DPGAN ε ≈ 10^6, D. real data. True positives are shown in red, false positives in green and false negatives in yellow. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
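The DP-SGD update used for the 3D DPGAN (per-sample gradient clipping with a maximum norm of 1, Gaussian noise scaled by the noise multiplier) and the critic weight clipping can be sketched in plain Python. This is not the Opacus implementation; it is a simplified illustration that assumes the usual DP-SGD convention of a noise standard deviation equal to noise multiplier times maximum norm. All function names, the toy gradients and the sample count of 1000 are hypothetical.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale one per-sample gradient so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gradient(per_sample_grads, max_norm, noise_multiplier, rng):
    """Clip each per-sample gradient, sum, add Gaussian noise, then average."""
    clipped = [clip_gradient(g, max_norm) for g in per_sample_grads]
    sigma = noise_multiplier * max_norm  # assumed DP-SGD noise convention
    batch_size = len(per_sample_grads)
    return [
        (sum(g[i] for g in clipped) + rng.gauss(0.0, sigma)) / batch_size
        for i in range(len(per_sample_grads[0]))
    ]


def clip_weights(weights, clip_value):
    """WGAN-style weight clipping of the critic to [-clip_value, clip_value]."""
    return [max(-clip_value, min(clip_value, w)) for w in weights]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # toy per-sample gradients
noisy = dp_sgd_gradient(grads, max_norm=1.0, noise_multiplier=0.5, rng=rng)
clean = dp_sgd_gradient(grads, max_norm=1.0, noise_multiplier=0.0, rng=rng)
delta = 1.0 / 1000  # inverse of a hypothetical number of training samples
critic = clip_weights([0.5, -0.2, 0.005], clip_value=0.01)
```

A larger noise multiplier adds more noise per update, which lowers ε (stronger privacy) but degrades the gradient signal; this is the privacy-utility trade-off examined in this appendix.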

The generated patch-label pairs and the 3D vessel structure synthesized with different ε values are shown in Fig. B.1 along with the real patch-label pairs for a qualitative comparison. With decreasing ε values the generated data quality reduces. In other words, higher privacy guarantees come with lower quality. This is also supported quantitatively by the lower segmentation performance of U-Nets trained with generated data from DPGANs with lower ε, and vice versa. Table B.1 shows the test segmentation performance on 2 datasets for U-Nets trained with generated patch-label pairs from DPGANs with different ε values for A) all vessels and B) intracranial vessels only. Synthetic data from the DPGAN with ε ≈ 10^6 has the best performance on PEGASUS in the case of all vessels (mean DSC 0.581) and in the case of intracranial vessels (mean DSC 0.604; mean bAVD 2.201). The bAVD of the U-Net trained with synthetic data from the DPGAN with ε ≈ 10^2 is unexpectedly lower (mean bAVD 4.509) than that of the U-Net trained with ε ≈ 10^6 (mean bAVD 5.05). This is because the metric bAVD penalizes false positives more than false negatives. This explanation is corroborated in Fig. B.2, which visualizes the segmentation error maps of two example patients, one from each of the two datasets, for all vessels and intracranial vessels. Fig. B.2 A (PEGASUS) - All vessels shows the segmentation error maps from the U-Net trained on synthetic data with the highest privacy guarantee of ε ≈ 10^2. The network misses almost all the vessels, and yet the bAVD is lower than the bAVD of the U-Net trained on data from the DPGAN with ε ≈ 10^6 (Fig. B.2 C (PEGASUS) - All vessels), which has far fewer false negatives but relatively more false positives owing to vessels from the neck and face area. This is further confirmed when these vessels are removed for analysis by the post-processing skull-stripping of the labels. Then, the bAVD of the U-Net trained with the DPGAN with ε ≈ 10^6 (mean bAVD 2.201) improves much more than that of the U-Net trained with the DPGAN with ε ≈ 10^2 (mean bAVD 4.77).
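The asymmetry of bAVD discussed here can be reproduced on a toy example. The sketch below assumes the balanced average Hausdorff distance as we read it from Aydin et al. (2021), where both directed distance sums are divided by the number of ground truth points; the function names and the 2D toy geometry are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math


def directed_distance_sum(source, target):
    """Sum over source points of the distance to the nearest target point."""
    return sum(min(math.dist(s, t) for t in target) for s in source)


def balanced_avd(ground_truth, segmentation):
    """Balanced average Hausdorff distance; both directions are divided by |G|."""
    n_gt = len(ground_truth)
    gt_to_seg = directed_distance_sum(ground_truth, segmentation)
    seg_to_gt = directed_distance_sum(segmentation, ground_truth)
    return (gt_to_seg / n_gt + seg_to_gt / n_gt) / 2.0


# Toy 2D "vessel": ten voxels along a line.
vessel = [(x, 0) for x in range(10)]

# Case 1: nearly everything is missed (nine false negatives, no false positives).
mostly_missed = [(0, 0)]

# Case 2: the vessel is found completely, plus five distant false positives.
with_false_positives = vessel + [(x, 30) for x in range(5)]

bavd_missed = balanced_avd(vessel, mostly_missed)                  # 2.25
bavd_false_positives = balanced_avd(vessel, with_false_positives)  # 7.5
```

In this toy setting, a prediction that misses nine of ten vessel voxels still obtains a lower bAVD than a complete prediction with a few distant false positives, mirroring the pattern seen for the face and neck vessels.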

Our results for the 3D DPGAN show that the generated data with the largest ε ≈ 10^6 yielded the best performance (mean DSC 0.604). While this model provides an upper bound on the privacy loss, it should be noted that ε ≈ 10^6 is a very large value and the resulting privacy bounds are thus too loose. Moreover, the performance of our DPGAN (at best a mean DSC of 0.604) is quite low compared to the performance of our generated data without any privacy guarantees (mean DSC 0.841). Therefore, we conclude that finding the right balance between privacy and utility remains a challenge for differential privacy, even in a very simple 3D GAN architecture.

Supplementary material

Supplementary material associated with this article can be found, in the online version, at doi: 10.1016/j.media.2022.102396.

References

Arjovsky, M., Chintala, S., Bottou, L., 2017. Wasserstein GAN. arXiv:1701.07875 [cs, stat].

Aydin, O.U., Taha, A.A., Hilbert, A., Khalil, A.A., Galinovic, I., Fiebach, J.B., Frey, D., Madai, V.I., 2021. An evaluation of performance measures for arterial brain vessel segmentation. Accepted for publication.

Aydin, O.U., Taha, A.A., Hilbert, A., Khalil, A.A., Galinovic, I., Fiebach, J.B., Frey, D., Madai, V.I., 2021. On the usage of average Hausdorff distance for segmentation performance assessment: hidden error when used for ranking. Eur. Radiol. Exp. 5 (1), 4. doi: 10.1186/s41747-020-00200-2.

Baur, C., Albarqouni, S., Navab, N., 2018. Generating highly realistic images of skin lesions with GANs. arXiv:1809.01410 [cs, eess].

Bermudez, C., Plassard, A.J., Davis, T.L., Newton, A.T., Resnick, S.M., Landman, B.A., 2018. Learning implicit brain MRI manifolds with deep learning. Proc SPIE Int. Soc. Opt. Eng. 10574. doi: 10.1117/12.2293515.

Chen, D., Yu, N., Zhang, Y., Fritz, M., 2020. GAN-leaks: a taxonomy of membership inference attacks against generative models. arXiv:1909.03935 [cs]. doi: 10.1145/3372297.3417238.

Chen, S., Ma, K., Zheng, Y., 2019. Med3D: transfer learning for 3D medical image analysis. arXiv:1904.00625 [cs].

Clinical Practice Committee, 2000. Informed consent for medical photographs. Dysmorphology subcommittee of the clinical practice committee, American College of Medical Genetics. Genet. Med. 2 (6), 353–355. doi: 10.1097/00125817-200011000-00010.

Dwork, C., Roth, A., 2014. The algorithmic foundations of differential privacy. Foundations Trends Theor. Comput. Sci. 9 (3–4), 211–407. doi: 10.1561/0400000042.

Eklund, A., 2020. Feeding the zombies: synthesizing brain volumes using a 3D progressive growing GAN. arXiv:1912.05357 [cs, eess].

Foroozandeh, M., Eklund, A., 2020. Synthesizing brain tumor images and annotations by combining progressive growing GAN and SPADE. arXiv:2009.05946 [cs] version: 1.

Frid-Adar, M., Diamant, I., Klang, E., Amitai, M., Goldberger, J., Greenspan, H., 2018. GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification. Neurocomputing 321. doi: 10.1016/j.neucom.2018.09.013.

Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y., 2014. Generative adversarial networks. arXiv:1406.2661 [cs, stat].

Greenspan, H., van Ginneken, B., Summers, R.M., 2016. Guest editorial deep learning in medical imaging: overview and future promise of an exciting new technique. IEEE Trans. Med. Imaging 35 (5), 1153–1159. doi: 10.1109/TMI.2016.2553401.

Guibas, J.T., Virdi, T.S., Li, P.S., 2018. Synthetic medical images from dual generative adversarial networks. arXiv:1709.01872 [cs] version: 3.

Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V., Courville, A., 2017. Improved training of wasserstein GANs. arXiv:1704.00028 [cs, stat].

Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S., 2018. GANs trained by a two time-scale update rule converge to a local nash equilibrium. arXiv:1706.08500 [cs, stat].

Hilbert, A., Madai, V.I., Akay, E.M., Aydin, O.U., Behland, J., Sobesky, J., Galinovic, I., Khalil, A.A., Taha, A.A., Würfel, J., Dusek, P., Niendorf, T., Fiebach, J.B., Frey, D., Livne, M., 2020. BRAVE-NET: fully automated arterial brain vessel segmentation in patients with cerebrovascular disease. Neurology, preprint. doi: 10.1101/2020.04.08.20057570.

Hotter, B., Pittl, S., Ebinger, M., Oepen, G., Jegzentis, K., Kudo, K., Rozanski, M., Schmidt, W., Brunecker, P., Xu, C., Martus, P., Endres, M., Jungehülsing, G., Villringer, A., Fiebach, J., 2009. Prospective study on the mismatch concept in acute stroke patients within the first 24 h after symptom onset - 1000Plus study. BMC Neurol. 9, 60. doi: 10.1186/1471-2377-9-60.

Karnewar, A., Wang, O., 2020. MSG-GAN: multi-scale gradients for generative adversarial networks. arXiv:1903.06048 [cs, stat].

Karras, T., Aila, T., Laine, S., Lehtinen, J., 2018. Progressive growing of GANs for improved quality, stability, and variation. arXiv:1710.10196 [cs, stat].

Kingma, D., Ba, J., 2014. Adam: a method for stochastic optimization. In: International Conference on Learning Representations.

Kossen, T., Subramaniam, P., Madai, V.I., Hennemuth, A., Hildebrand, K., Hilbert, A., Sobesky, J., Livne, M., Galinovic, I., Khalil, A.A., Fiebach, J.B., Frey, D., 2021. Synthesizing anonymized and labeled TOF-MRA patches for brain vessel segmentation using generative adversarial networks. Comput. Biol. Med. 131, 104254. doi: 10.1016/j.compbiomed.2021.104254.

Kwon, G., Han, C., Kim, D., 2019. Generation of 3D brain MRI using auto-encoding generative adversarial networks. MICCAI. doi: 10.1007/978-3-030-32248-9_14.

Livne, M., Rieger, J., Aydin, O.U., Taha, A.A., Akay, E.M., Kossen, T., Sobesky, J., Kelleher, J.D., Hildebrand, K., Frey, D., Madai, V.I., 2019. A U-Net deep learning framework for high performance vessel segmentation in patients with cerebrovascular disease. Front. Neurosci. 13. doi: 10.3389/fnins.2019.00097.

Lundervold, A.S., Lundervold, A., 2019. An overview of deep learning in medical imaging focusing on MRI. Zeitschrift für Medizinische Physik 29 (2), 102–127. doi: 10.1016/j.zemedi.2018.11.002.

Masoudi, S., Harmon, S.A.A., Mehralivand, S., Walker, S.M., Raviprakash, H., Bagci, U., Choyke, P.L., Turkbey, B., 2021. Quick guide on radiology image pre-processing for deep learning applications in prostate cancer research. J. Med. Imaging 8 (1), 010901. doi: 10.1117/1.JMI.8.1.010901.

Micikevicius, P., Narang, S., Alben, J., Diamos, G., Elsen, E., Garcia, D., Ginsburg, B., Houston, M., Kuchaiev, O., Venkatesh, G., Wu, H., 2018. Mixed precision training. arXiv:1710.03740 [cs, stat].

Mironov, I., 2017. Renyi differential privacy. In: 2017 IEEE 30th Computer Security Foundations Symposium (CSF), pp. 263–275. doi: 10.1109/CSF.2017.11.

Miyato, T., Kataoka, T., Koyama, M., Yoshida, Y., 2018. Spectral normalization for generative adversarial networks. arXiv:1802.05957 [cs, stat].

Mutke, M.A., Madai, V.I., von Samson-Himmelstjerna, F.C., Zaro Weber, O., Revankar, G.S., Martin, S.Z., Stengl, K.L., Bauer, M., Hetzer, S., Günther, M., Sobesky, J., 2014. Clinical evaluation of an arterial-spin-labeling product sequence in steno-occlusive disease of the brain. PLoS ONE 9 (2), e87143. doi: 10.1371/journal.pone.0087143.

Neff, T., Payer, C., Štern, D., Urschler, M., 2018. Generative adversarial networks to synthetically augment data for deep learning based image segmentation. In: Proceedings of the OAGM Workshop 2018. doi: 10.3217/978-3-85125-603-1-07.

Ng, D., Lan, X., Yao, M.M.-S., Chan, W.P., Feng, M., 2021. Federated learning: a collaborative effort to achieve better medical imaging models for individual sites that have small labelled datasets. Quant. Imaging Med. Surg. 11 (2), 852–857. doi: 10.21037/qims-20-595.

Sajjadi, M.S.M., Bachem, O., Lucic, M., Bousquet, O., Gelly, S., 2018. Assessing generative models via precision and recall. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. Curran Associates Inc., Red Hook, NY, USA, pp. 5234–5243.

Sheller, M.J., Edwards, B., Reina, G.A., Martin, J., Pati, S., Kotrotsou, A., Milchenko, M., Xu, W., Marcus, D., Colen, R.R., Bakas, S., 2020. Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data. Sci. Rep. 10 (1), 12598. doi: 10.1038/s41598-020-69250-1.

Shokri, R., Stronati, M., Song, C., Shmatikov, V., 2017. Membership inference attacks against machine learning models. arXiv:1610.05820 [cs, stat].

Sun, L., Chen, J., Xu, Y., Gong, M., Yu, K., Batmanghelich, K., 2021. Hierarchical amortized training for memory-efficient high resolution 3D GAN. arXiv:2008.01910 [cs, eess].

Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z., 2016. Rethinking the inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). doi: 10.1109/CVPR.2016.308.

Taha, A.A., Hanbury, A., 2015. Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool. BMC Med. Imaging 15. doi: 10.1186/s12880-015-0068-x.

Torkzadehmahani, R., Kairouz, P., Paten, B., 2019. DP-CGAN: differentially private synthetic data and label generation. https://openaccess.thecvf.com/content_CVPRW_2019/html/CV-COPS/Torkzadehmahani_DP-CGAN_Differentially_Private_Synthetic_Data_and_Label_Generation_CVPRW_2019_paper.html.

Truex, S., Liu, L., Gursoy, M.E., Yu, L., Wei, W., 2019. Towards demystifying membership inference attacks. arXiv:1807.09173 [cs].

Valizadeh, S.A., Liem, F., Mérillat, S., Hänggi, J., Jäncke, L., 2018. Identification of individual subjects on the basis of their brain anatomical features. Sci. Rep. 8 (1), 5611. doi: 10.1038/s41598-018-23696-6.

Wachinger, C., Golland, P., Kremen, W., Fischl, B., Reuter, M., Alzheimer’s Disease Neuroimaging Initiative, 2015. BrainPrint: a discriminative characterization of brain morphology. Neuroimage 109, 232–248. doi: 10.1016/j.neuroimage.2015.01.032.

Willemink, M.J., Koszek, W.A., Hardell, C., Wu, J., Fleischmann, D., Harvey, H., Folio, L.R., Summers, R.M., Rubin, D.L., Lungren, M.P., 2020. Preparing medical imaging data for machine learning. Radiology 295 (1), 4–15. doi: 10.1148/radiol.2020192224.

Xie, L., Lin, K., Wang, S., Wang, F., Zhou, J., 2018. Differentially private generative adversarial network. arXiv:1802.06739 [cs, stat].

Yi, X., Walia, E., Babyn, P., 2019. Generative adversarial network in medical imaging: a review. Med. Image Anal. 58, 101552. doi: 10.1016/j.media.2019.101552.

Zhang, L., Shen, B., Barnawi, A., Xi, S., Kumar, N., Wu, Y., 2021. FedDPGAN: federated differentially private generative adversarial networks framework for the detection of COVID-19 pneumonia. Inf. Syst. Front. doi: 10.1007/s10796-021-10144-6.

Toward Sharing Brain Images:

Differentially Private TOF-MRA

Images With Segmentation Labels

Using Generative Adversarial Networks

6.1 Context Within Thesis

Synthetic images generated by GANs are not necessarily private. The images could still be

vulnerable to membership inference attacks leaking information about the training images.

Introducing differential privacy has been shown to reduce the vulnerability of GANs to these

attacks. By inserting carefully calibrated noise into the training of the discriminator, we can

put an upper bound on the individual privacy leakage.

The present work addresses the limitations of Chapters 4 and 5. Both studies successfully

synthesized realistic-looking TOF-MRA patches with their corresponding segmentation labels.

However, the images in Chapter 4 were generated without differential privacy, and the

privacy-preserving 3D images in Chapter 5 were not usable. The poor utility could be ascribed

to either low privacy guarantees or low image quality, both resulting from the high memory

demand of differential privacy, which did not allow for more complex networks.

The work in this chapter reduced the computational demand by generating 2D instead

of 3D image-label pairs. This allowed for the introduction of differential privacy in the

discriminator’s training while still maintaining sufficient complexity in the generator to

synthesize realistic images. We then explored the privacy-utility trade-off for the use-case of

brain vessel segmentation and identified an upper privacy bound for which the segmentation

became unstable and not usable anymore.


6.2 Journal Article

This chapter is based on the following publication that was published in Frontiers in Artificial

Intelligence:

T. Kossen, M. A. Hirzel, V. I. Madai, F. Boenisch, A. Hennemuth, K. Hildebrand,

S. Pokutta, K. Sharma, A. Hilbert, J. Sobesky, I. Galinovic, A. A. Khalil, J. B. Fiebach,

and D. Frey. “Toward Sharing Brain Images: Differentially Private TOF-MRA Images

With Segmentation Labels Using Generative Adversarial Networks”. In: Frontiers in

Artificial Intelligence 5 (2022). doi:10.3389/frai.2022.813842

The original journal article is reprinted with permission of Frontiers. The article is open access

under the CC BY license.

Author Contribution

The first author Tabea Kossen conceptualized the study and interpreted the results together

with VIM, FB, AH, KH and DF. She implemented the GAN architecture and evaluations.

Additionally, she was responsible for the project administration, wrote the first version of the

manuscript, created the figures and coordinated the journal submission process.

Code Availability

The code for this project is publicly available:

https://github.com/prediction2020/Labeled-TOF-MRA-with-DP.

ORIGINAL RESEARCH

published: 02 May 2022

doi: 10.3389/frai.2022.813842

Frontiers in Artificial Intelligence | www.frontiersin.org 1May 2022 | Volume 5 | Article 813842

Edited by:

Naimul Khan,

Ryerson University, Canada

Reviewed by:

Alessandro Bria,

University of Cassino, Italy

Zeeshan Ahmad,

Ryerson University, Canada

*Correspondence:

Tabea Kossen

[email protected]

Specialty section:

This article was submitted to

Medicine and Public Health,

a section of the journal

Frontiers in Artificial Intelligence

Received: 12 November 2021

Accepted: 31 March 2022

Published: 02 May 2022

Citation:

Kossen T, Hirzel MA, Madai VI,

Boenisch F, Hennemuth A,

Hildebrand K, Pokutta S, Sharma K,

Hilbert A, Sobesky J, Galinovic I,

Khalil AA, Fiebach JB and Frey D

(2022) Toward Sharing Brain Images:

Differentially Private TOF-MRA Images

With Segmentation Labels Using

Generative Adversarial Networks.

Front. Artif. Intell. 5:813842.

doi: 10.3389/frai.2022.813842

Toward Sharing Brain Images:

Differentially Private TOF-MRA

Images With Segmentation Labels

Using Generative Adversarial

Networks

Tabea Kossen 1,2*, Manuel A. Hirzel1, Vince I. Madai1,3,4, Franziska Boenisch 5,

Anja Hennemuth2,6,7, Kristian Hildebrand8, Sebastian Pokutta9,10, Kartikey Sharma9,

Adam Hilbert1, Jan Sobesky11,12, Ivana Galinovic 12, Ahmed A. Khalil12,13,14,

Jochen B. Fiebach12 and Dietmar Frey1

1CLAIM-Charité Lab for AI in Medicine, Charité Universitätsmedizin Berlin, Berlin, Germany, 2Department of Computer

Engineering and Microelectronics, Computer Vision & Remote Sensing, Technical University Berlin, Berlin, Germany, 3QUEST

Center for Responsible Research, Berlin Institute of Health (BIH), Charité-Universitätsmedizin Berlin, Berlin, Germany,

4Faculty of Computing, Engineering and the Built Environment, School of Computing and Digital Technology, Birmingham

City University, Birmingham, United Kingdom, 5Fraunhofer AISEC, Berlin, Germany, 6Institute for Imaging Science and

Computational Modelling in Cardiovascular Medicine, Charité Universitätsmedizin Berlin, Berlin, Germany, 7Fraunhofer

MEVIS, Bremen, Germany, 8Department VI Computer Science and Media, Berlin University of Applied Sciences and

Technology, Berlin, Germany, 9Department for AI in Society, Science, and Technology, Zuse Institute Berlin, Berlin, Germany,

10 Institute of Mathematics, Technical University Berlin, Berlin, Germany, 11 Johanna-Etienne-Hospital, Neuss, Germany,

12 Centre for Stroke Research Berlin, Charité Universitätsmedizin Berlin, Berlin, Germany, 13 Department of Neurology, Max

Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany, 14 Mind, Brain, Body Institute, Berlin School of

Mind and Brain, Humboldt-Universität Berlin, Berlin, Germany

Sharing labeled data is crucial to acquire large datasets for various Deep Learning

applications. In medical imaging, this is often not feasible due to privacy regulations.

Whereas anonymization would be a solution, standard techniques have been shown to

be partially reversible. Here, synthetic data using a Generative Adversarial Network (GAN)

with differential privacy guarantees could be a solution to ensure the patient’s privacy

while maintaining the predictive properties of the data. In this study, we implemented a

Wasserstein GAN (WGAN) with and without differential privacy guarantees to generate

privacy-preserving labeled Time-of-Flight Magnetic Resonance Angiography (TOF-MRA)

image patches for brain vessel segmentation. The synthesized image-label pairs were

used to train a U-net which was evaluated in terms of the segmentation performance

on real patient images from two different datasets. Additionally, the Fréchet Inception

Distance (FID) was calculated between the generated images and the real images to

assess their similarity. During the evaluation using the U-Net and the FID, we explored

the effect of different levels of privacy, which was represented by the parameter ε. With

stricter privacy guarantees, the segmentation performance and the similarity to the real

patient images in terms of FID decreased. Our best segmentation model, trained on

synthetic and private data, achieved a Dice Similarity Coefficient (DSC) of 0.75 for ε = 7.4

compared to 0.84 for ε = ∞ in a brain vessel segmentation paradigm (DSC of 0.69 and

0.88 on the second test set, respectively). We identified a threshold of ε < 5 for which the

Kossen et al. Labeled TOF-MRA With Differential Privacy

performance (DSC <0.61) became unstable and not usable. Our synthesized labeled

TOF-MRA images with strict privacy guarantees retained predictive properties necessary

for segmenting the brain vessels. Although further research is warranted regarding

generalizability to other imaging modalities and performance improvement, our results

mark an encouraging first step for privacy-preserving data sharing in medical imaging.

Keywords: brain vessel segmentation, differential privacy, Generative Adversarial Networks, neuroimaging,

privacy preservation

1. INTRODUCTION

Deep Learning techniques are on the rise in many neuroimaging

applications (Lundervold and Lundervold, 2019; Zhu et al., 2019;

Hilbert et al., 2020). While showing great potential, they also

demand large amounts of data. In medical imaging, data is often

limited and medical experts are often needed to manually label

the images (Willemink et al., 2020). Thus, large datasets are

difficult to acquire. One potential solution would be data sharing.

For this, true anonymization, i.e. verifying that no identifying

information is leaked, is essential to sustain the patient’s privacy

which poses a big challenge, especially for neuroimaging (Bannier

et al., 2021). For example, face-recognition software has recently

identified individuals on medical images (Schwarz et al., 2019)

and even face removal techniques can be partially reversed

(Abramian and Eklund, 2019). Besides that, the brain itself has

a unique structure and cortical foldings can be utilized to identify

individuals even in the developing stage (Duan et al., 2020).

Consequently, it is highly challenging to truly anonymize brain

scans without risking re-identification. A promising remedy is

the generation of synthetic data.

For this purpose, Generative Adversarial Networks (GANs)

have gained a lot of attention in the past years (Yi et al., 2019).

This also holds true for the neuroimaging domain. Here, GANs

have shown promising results for synthesized images for different

types of imaging (Bowles et al., 2018; Foroozandeh and Eklund,

2020; Kossen et al., 2021) as well as for other medical problems

such as segmentation (Cirillo et al., 2020). To ensure the privacy

of the training data, GANs can be combined with differential

privacy (Xie et al., 2018). Differential privacy is a mathematical

framework that provides an upper bound on individual privacy

leakage (Dwork, 2008). This way the maximum privacy leakage

for every individual in the training data can be quantified. There

are extensive studies about GANs with differential privacy for

synthesizing natural images and tabular medical data (Xie et al.,

2018; Torkzadehmahani et al., 2019; Xu et al., 2019; Yoon et al.,

2019, 2020). Recently, Cheng et al. (2021) did a comprehensive

study about synthetic images and classification fairness with a

varying amount of privacy on various types of imaging data.

Among them were also 2D medical datasets such as chest x-rays

and melanoma images. A few other studies generated chest

x-rays with privacy guarantees as well (Nguyen et al., 2021; Zhang

et al., 2021). However, to date, no study has investigated whether

2D synthesized data using a GAN with differential privacy can

be utilized for a 3D medical application. Additionally, to the

best of our knowledge, GANs with differential privacy have

not yet been used to synthesize labels for medical images or

been applied in the neuroimaging domain.

In this study, we utilized a Wasserstein GAN (WGAN) with

and without differential privacy guarantees to synthesize

anonymous and labeled 2D Time-of-Flight Magnetic

Resonance Angiography (TOF-MRA) image patches for

brain vessel segmentation. The generated labeled image patches

were evaluated in terms of the segmentation performance by

training a U-Net and in terms of image quality using the Fréchet

Inception Distance (FID). The trained U-Net was further tested

on a second dataset. Overall, we investigated the effect of different

levels of privacy. Additionally, we visualized generated images

with and without privacy together with the real patient images

using t-distributed stochastic neighbor embedding (t-SNE).

In summary, our contributions are:

1. To the best of our knowledge, we are the first to

synthesize images with differential privacy guarantees in the

neuroimaging domain.

2. We also generate the corresponding segmentation labels to

evaluate the image-label pairs in an end-to-end brain vessel

segmentation paradigm on 3D medical data for different levels

of privacy.

3. For evaluation, we compare the distances between the

generated data and both the training and test data

to investigate the similarity of the synthesized to the

original data.

4. We visualize our generated images with and without

differential privacy and the original data using t-SNE.

2. RELATED STUDY

For the synthesis of medical images, deep generative models

have demonstrated promising results. Among them, especially

GANs and variational autoencoders (VAE) have shown good

performance in tasks such as data augmentation (Bowles

et al., 2018), image-to-image translations (Isola et al., 2018),

or reconstruction (Tudosiu et al., 2020). For the purpose

of synthesizing privacy-preserving images, VAEs have two

disadvantages compared to GANs: First, they produce blurrier

images (Wang et al., 2020), and second, the training images are

directly fed into the network which makes them more vulnerable

to membership inference attacks (Chen et al., 2020).

Hence, in this context, GAN architectures with differential

privacy have been used in many previous studies to synthesize

non-medical images (Xie et al., 2018; Torkzadehmahani et al.,


2019; Xu et al., 2019) and medical tabular data (Yoon

et al., 2019, 2020). However, only a few studies have applied

GANs with differential privacy to medical images. Additionally,

these were restricted to chest x-rays (Cheng et al., 2021; Nguyen

et al., 2021; Zhang et al., 2021). So far in the neuroimaging

domain, the application of GANs remained without differential

privacy (Bowles et al., 2018; Foroozandeh and Eklund, 2020;

Kossen et al., 2021).

In the present study, we propose a GAN architecture with

differential privacy in the neuroimaging domain. Along with

our synthesized images, we generate the segmentation labels for

testing our differentially private patches in an end-to-end brain

vessel segmentation paradigm.

3. MATERIALS AND METHODS

3.1. Data

In total, 131 patients with cerebrovascular disease from the

PEGASUS study (N = 66) and the 1000Plus study (N = 65) were

utilized in this study. All patients gave their written informed

consent and the studies have been authorized by the ethical

review committee of Charité–Universitätsmedizin Berlin. More

details on both datasets can be found in Mutke et al. (2014) for the

PEGASUS study and Hotter et al. (2009) for the 1000Plus study.

The brain scans were conducted on a clinical 3T whole-

body system (Magnetom Trio, Siemens Healthcare, Erlangen,

Germany) utilizing a 12-channel receive radiofrequency coil

(Siemens Healthcare) for head imaging. For both studies the

parameters were: voxel size = (0.5 × 0.5 × 0.7) mm³; matrix size:

312 × 384 × 127; TR/TE = 22 ms/3.86 ms; acquisition time: 3:50

min; flip angle = 18°.

The PEGASUS dataset was split into a training (41 patients),

validation (11 patients), and test (14 patients) set. The training

set was utilized for training the GANs (refer to Figure 1),

whereas the validation and test set were utilized for the

parameter selection of the U-Net and assessing the generalizable

performance of the U-Net, respectively. Additionally, the 65

patients from the 1000Plus dataset were used as a second test set.

For each patient of the training set 1,000 2D image patches and

corresponding segmentation masks of size 96 × 96 were extracted.

This patch size has been shown to be the most suitable patch

size for Wasserstein based GAN architectures for this use case

(Kossen et al., 2021). Due to the overemphasis of background

compared to brain vessels, 500 patches showing a vessel in the

center were extracted. The remaining 500 patches were extracted

randomly. It was verified that each patch was selected at

most once.
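The sampling scheme described above can be sketched as follows. This is a minimal illustration on a single 2D slice, not the authors' extraction code: the function name `sample_patch_centers`, its parameters, and the toy mask are hypothetical. Half of the patch centers are drawn from vessel pixels, the rest from all remaining valid positions, and no center is used twice.

```python
import random


def sample_patch_centers(vessel_mask, patch_size, n_vessel, n_random, seed=0):
    """Return (vessel_centers, random_centers) without duplicates.

    vessel_mask is a nested list of 0/1 values for one 2D slice
    (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    half = patch_size // 2
    rows, cols = len(vessel_mask), len(vessel_mask[0])
    # Keep only centers whose full patch [r-half, r+half) fits into the slice.
    valid = [(r, c)
             for r in range(half, rows - half + 1)
             for c in range(half, cols - half + 1)]
    # First group: patches with a vessel pixel in the center.
    vessel_candidates = [p for p in valid if vessel_mask[p[0]][p[1]]]
    vessel_centers = rng.sample(vessel_candidates, n_vessel)
    # Second group: random centers, excluding those already chosen.
    chosen = set(vessel_centers)
    remaining = [p for p in valid if p not in chosen]
    random_centers = rng.sample(remaining, n_random)
    return vessel_centers, random_centers


# Toy slice with one vertical "vessel" in column 6.
mask = [[1 if c == 6 else 0 for c in range(12)] for _ in range(12)]
vessel_centers, random_centers = sample_patch_centers(
    mask, patch_size=4, n_vessel=3, n_random=5)
```

In the study the corresponding numbers were 500 vessel-centered and 500 random patches of size 96 × 96 per patient.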

3.2. Differential Privacy

To account for the level of privacy of the generated data and

provide theoretical privacy guarantees, differential privacy was

implemented (Dwork, 2008). A randomized algorithm f: d → R
satisfies (ε, δ)-differential privacy if for any two databases
d1, d2 ∈ d that differ from each other by a single sample, the
following holds:

\[ \Pr[f(d_1) \in S] \le \exp(\varepsilon) \cdot \Pr[f(d_2) \in S] + \delta \tag{1} \]

where f(d1) and f(d2) denote the outputs of f, Pr denotes probability, and S ⊂ R. The parameter δ is the probability with which the bound given by ε may be violated. With a probability of 1 − δ, this equation is equivalent to:

\[ \log \frac{\Pr[f(d_1) \in S]}{\Pr[f(d_2) \in S]} \le \varepsilon. \tag{2} \]

Thus, differential privacy holds true if the algorithm’s output for

d1 and d2 is very similar to each other. In other words, one sample

should not have a big impact on the algorithm’s output. This way

the privacy of each possible datapoint is preserved. The maximal

deviation between the outputs is given by exp(ε). In this way, ε

can quantify the level of privacy, with small values of ε indicating

stricter privacy guarantees.
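As a worked illustration of Equation 1, the following sketch checks the inequality for randomized response on a single binary record, a textbook mechanism that is not part of this study. The two "databases" differ in that one record (0 versus 1); the function names are chosen for this example only.

```python
import math
from itertools import chain, combinations


def randomized_response_probs(true_bit, eps):
    """Output distribution when the true bit is reported with
    probability exp(eps) / (1 + exp(eps)) and flipped otherwise."""
    p_truth = math.exp(eps) / (1.0 + math.exp(eps))
    return {true_bit: p_truth, 1 - true_bit: 1.0 - p_truth}


def satisfies_dp(probs_d1, probs_d2, eps, delta=0.0, tol=1e-12):
    """Check Eq. 1 for every output set S, in both directions."""
    outcomes = list(probs_d1)
    subsets = chain.from_iterable(
        combinations(outcomes, k) for k in range(len(outcomes) + 1))
    for subset in subsets:
        pr1 = sum(probs_d1[o] for o in subset)
        pr2 = sum(probs_d2[o] for o in subset)
        if pr1 > math.exp(eps) * pr2 + delta + tol:
            return False
        if pr2 > math.exp(eps) * pr1 + delta + tol:
            return False
    return True


# Neighbouring databases: the single record is 0 in d1 and 1 in d2.
out_d1 = randomized_response_probs(0, eps=1.0)
out_d2 = randomized_response_probs(1, eps=1.0)
holds_for_eps_1 = satisfies_dp(out_d1, out_d2, eps=1.0)
holds_for_eps_half = satisfies_dp(out_d1, out_d2, eps=0.5)
```

The mechanism built for ε = 1 meets the bound for ε = 1 but not for the stricter ε = 0.5, which mirrors the statement that smaller ε values correspond to stricter guarantees.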

Mironov (2017) proposed Rényi differential privacy, a natural

relaxation of differential privacy built upon Rényi divergence.

Rényi divergence of order α > 1 of two probability distributions P and Q is defined as:

\[ D_\alpha(P \,\|\, Q) := \frac{1}{\alpha - 1} \log \mathbb{E}_{x \sim Q} \left( \frac{P(x)}{Q(x)} \right)^{\alpha}, \tag{3} \]

where P(x) is the probability density of P at point x. A randomized algorithm f: d → S is (α, ε)-Rényi differentially private if, for any adjacent d1, d2 ∈ d, the Rényi divergence Dα is not larger than ε:

\[ D_\alpha(f(d_1) \,\|\, f(d_2)) \le \varepsilon. \tag{4} \]

The advantage of Rényi differential privacy is that it provides a tight composition for Gaussian mechanisms while preserving essential properties of differential privacy. This means that the ε values of composed mechanisms add up: the composition of one mechanism satisfying (α, ε1)-Rényi differential privacy and another satisfying (α, ε2)-Rényi differential privacy satisfies (α, ε1 + ε2)-Rényi differential privacy. Moreover, (α, ε)-Rényi differential privacy has been shown to provide a tighter bound on the privacy budget of compositions compared to (ε, δ)-differential privacy (Mironov, 2017). (α, ε)-Rényi differential privacy can also be translated back into (ε, δ)-differential privacy. Balle et al. (2019) have proven that (α, ε)-Rényi differential privacy also satisfies (ε′, δ)-differential privacy for any 0 < δ < 1. According to Balle et al. (2019), ε′ is then defined as:

\[ \varepsilon' = \varepsilon + \log \frac{\alpha - 1}{\alpha} - \frac{\log \delta + \log \alpha}{\alpha - 1}. \tag{5} \]
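The additive composition and the conversion in Equation 5 can be written down directly. The sketch below is an illustration only: the function names and the per-step Rényi budget of 0.01 are made up for the example, and privacy accountants typically evaluate several orders α and keep the smallest resulting ε′, which is omitted here.

```python
import math


def compose_rdp(eps_values):
    """Renyi budgets at the same order alpha simply add up."""
    return sum(eps_values)


def rdp_to_dp(alpha, eps_rdp, delta):
    """Translate (alpha, eps)-RDP into (eps', delta)-DP using Eq. 5."""
    if alpha <= 1:
        raise ValueError("alpha must be larger than 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return (eps_rdp
            + math.log((alpha - 1.0) / alpha)
            - (math.log(delta) + math.log(alpha)) / (alpha - 1.0))


delta = 1.0 / 41_000                     # inverse of the dataset size
eps_total = compose_rdp([0.01] * 100)    # 100 hypothetical training steps
eps_dp = rdp_to_dp(alpha=10.0, eps_rdp=eps_total, delta=delta)
```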

The most data sensitive part when training the proposed

GAN architecture is the gradient update of the discriminator

after training samples are presented. For that, the differentially

private stochastic gradient descent algorithm proposed by Abadi

et al. (2016) can be utilized. Here, differential privacy was

implemented by clipping these gradients and adding Gaussian

noise to avoid the memorization of single samples. Additionally,

Rényi differential privacy was then used to analyze the privacy

guarantees. In the last step, (α, ε)-Rényi differential privacy

is translated back to (ε, δ)-differential privacy. The parameter


FIGURE 1 | Study overview. Generative Adversarial Networks (GANs) with different levels of privacy guarantees are trained to synthesize labeled Time-of-Flight

Magnetic Resonance Angiography (TOF-MRA) patches. These are evaluated in a brain vessel segmentation paradigm and are compared to a segmentation network

trained on real patient image-label pairs. DP =Differential Privacy; DSC =Dice Similarity Coefficient.

δ is typically chosen to be the inverse of the dataset size

(Torkzadehmahani et al., 2019). Thus, throughout this study, it

was set to 1/41,000 ≈ 2.44e−5.
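The clipping and noising step of differentially private stochastic gradient descent can be sketched in a few lines. This is a plain-Python illustration of the idea in Abadi et al. (2016), not the opacus implementation used in the study; the function names are hypothetical, and the noise standard deviation is taken as noise multiplier times clipping norm.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale one per-sample gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_gradient(per_sample_grads, max_norm, noise_multiplier, rng):
    """Clip each sample's gradient, sum, add Gaussian noise, then average."""
    clipped = [clip_gradient(g, max_norm) for g in per_sample_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    std = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, std) for s in summed]
    return [value / len(per_sample_grads) for value in noisy]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]   # two toy per-sample gradients
update = private_gradient(grads, max_norm=1.0, noise_multiplier=0.65, rng=rng)
```

Clipping bounds the influence of any single training sample on the update, and the added noise masks what remains of it. A larger noise multiplier gives a smaller ε at the cost of noisier updates.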

3.3. Network Architecture

The GAN architecture was based on the WGAN by Arjovsky

et al. (2017) and extended by inserting different amounts of noise

into the gradients of the discriminator in the training process

for differential privacy. Two neural networks were trained: the

generator G and the discriminator D. The generator synthesized

data samples that were then assessed with respect to their realness

by the critic or discriminator. The discriminator was fed both

real and synthesized data and assigned a critic score for each

sample. The score of the synthetic data x_gen was used to train the generator. For the generator, the overall training loss was:

\[ \mathrm{loss}_G = -D(x_{gen}). \tag{6} \]

This way the generator aimed to maximize the realness of the generated samples. In contrast to that, the discriminator intended to minimize the scores for generated samples x_gen and maximize them for patient samples x_real:

\[ \mathrm{loss}_D = D(x_{gen}) - D(x_{real}) \tag{7} \]

To enforce a Lipschitz constraint and, thus, put a bound on

the gradients, the discriminator’s weights were clipped after

each backpropagation step. This is a simple way to stabilize the

training (Arjovsky et al., 2017).
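Equations 6 and 7 and the weight clipping can be illustrated on plain lists of critic scores. The sketch below assumes that the losses are averaged over a batch, which the equations leave implicit, and uses hypothetical function names. It is not the PyTorch code of the study.

```python
def generator_loss(scores_gen):
    """Eq. 6: negative mean critic score of the generated samples."""
    return -sum(scores_gen) / len(scores_gen)


def discriminator_loss(scores_gen, scores_real):
    """Eq. 7: mean score of generated samples minus mean score of real ones."""
    mean_gen = sum(scores_gen) / len(scores_gen)
    mean_real = sum(scores_real) / len(scores_real)
    return mean_gen - mean_real


def clip_weights(weights, c=0.01):
    """Clamp every weight to [-c, c] after an update (WGAN weight clipping)."""
    return [max(-c, min(c, w)) for w in weights]


# Toy critic scores for a batch of two generated and two real samples.
loss_g = generator_loss([1.0, 3.0])
loss_d = discriminator_loss([1.0, 3.0], [4.0, 6.0])
clipped = clip_weights([0.5, -0.2, 0.005])
```

A discriminator that scores real samples higher than generated ones obtains a negative loss, while the generator lowers its own loss by raising the scores of its samples.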

The architecture of the generator and discriminator is shown

in Figure 1. The generator took a noise vector of size 128 sampled from

a Gaussian distribution as input. This was then fed

through one linear layer and six upsampling convolutional layers as

shown in Figure 1. The generator output two 96 × 96 images:

one channel for the image and one for the segmentation label. The

discriminator’s input was two images: either the real patient image-label

pair or the generated one. These were then fed through

six downsampling convolutional layers as depicted in

Figure 1. The slope of the LeakyReLU activation was 0.2.

The GANs were implemented in PyTorch 1.8.1 using the

library opacus 0.14.0 for the differential privacy guarantees. Our


code was built upon the official GAN example by opacus¹ and

is publicly available². The learning rate for both discriminator

and generator was 0.00005 using the RMSprop optimizer.

The kernel size was 4 with strides of 2. In each epoch, the

discriminator was updated 5 times. The network was trained

for 50 epochs. To randomly sample the training images, the

UniformWithReplacementSampler from the opacus package was

used. The sampling rate was the batch size of 32 divided

by the number of samples (41,000). The clipping parameter

for the WGAN was set to 0.01 and the clipping parameter

for the differential privacy was 1. In total, 8 different GANs

were trained with varying values of ε (noise multiplier was

set to {∞, 2, 1.5, 1.2, 1, 0.8, 0.725, 0.65}). Each GAN trained with

additional noise was trained 5 times for robust results.

All hyperparameters mentioned in the last paragraph were the

result of a tuning process and all models were trained on a Tesla

V100. The training time of one GAN including evaluation took

∼1.4 days.

3.4. Performance Evaluation

Among the many metrics to evaluate synthetic data (Yi et al.,

2019), we selected three to estimate the quality of our synthesized

images. First, we evaluated our synthesized image-label pairs

by visual inspection, and second, using the downstream task of

segmentation as suggested by Yi et al. (2019). Additionally, we

compared the images using the FID as proposed in previous

studies (Haarburger et al., 2019; Coyner et al., 2022).

The generated image-label pairs were evaluated by a U-Net for

brain vessel segmentation adapted from Livne et al. (2019). After

training the GANs, 41,000 image-label pairs were generated.

These were used to train eight U-Nets with different hyperparameter

settings varying in learning rates, dropout, and classical data

augmentation. The best U-Net was then selected based on the

best Dice Similarity Coefficient (DSC) on the validation set that

included real patient images. The final performance was then

evaluated in terms of DSC and balanced average Hausdorff

distance (bAHD) on the test set. The DSC that evaluated the

segmented voxels is defined as:

\[ \mathrm{DSC} = \frac{2\,TP}{2\,TP + FP + FN} \tag{8} \]

where TP are the true positives, FP are the false positives, and

FN are the false negatives. As the DSC quantifies the overlap of

the ground truth and prediction scaled by the total number of

voxels in ground truth and prediction, it is a robust performance

measure for imbalanced segmentations, i.e., images that contain more

background than segmented area. The bAHD is a newly proposed

metric for evaluating segmentations (Aydin et al., 2021):

\[ \mathrm{bAHD} = \left( \frac{1}{N_G} \sum_{g \in G} \min_{s \in S} d(g, s) + \frac{1}{N_G} \sum_{s \in S} \min_{g \in G} d(s, g) \right) / 2 \tag{9} \]

where N_G is the number of ground truth voxels, G is the set of voxels belonging to the ground truth, and S is the set of voxels of the predicted segmentation. In other words, the bAHD is the average of the directed Hausdorff distance from the ground truth to the segmentation and the directed Hausdorff distance from the segmentation to the ground truth, both scaled by the number of ground truth voxels.

¹ https://github.com/pytorch/opacus/blob/master/examples/dcgan.py

² https://github.com/prediction2020/Labeled-TOF-MRA-with-DP

Additionally, the DSC and bAHD of the U-Net models were

assessed on the 1000Plus dataset. The GAN and U-Nets were

implemented in an end-to-end pipeline. To calculate both DSC

and bAHD, we used the EvaluateSegmentation tool by Taha and

Hanbury (2015).
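For illustration, both metrics can be computed on small voxel sets as follows. This sketch is not the EvaluateSegmentation tool. It represents each segmentation as a set of coordinates of vessel voxels, uses the Euclidean distance for d, and assumes both sets are non-empty. The function names are chosen for this example.

```python
import math


def dice(ground_truth, prediction):
    """Eq. 8 on sets of voxel coordinates labeled as vessel."""
    tp = len(ground_truth & prediction)
    fp = len(prediction - ground_truth)
    fn = len(ground_truth - prediction)
    return 2 * tp / (2 * tp + fp + fn)


def balanced_avg_hausdorff(ground_truth, prediction):
    """Eq. 9: both directed sums are scaled by the number of
    ground truth voxels, then averaged."""
    n_g = len(ground_truth)
    g_to_s = sum(min(math.dist(g, s) for s in prediction)
                 for g in ground_truth)
    s_to_g = sum(min(math.dist(s, g) for g in ground_truth)
                 for s in prediction)
    return (g_to_s / n_g + s_to_g / n_g) / 2


truth = {(0, 0), (0, 1)}
pred = {(0, 0), (0, 2)}
dsc_value = dice(truth, pred)                       # 1 TP, 1 FP, 1 FN
bahd_value = balanced_avg_hausdorff(truth, pred)    # each direction sums to 1
```

The brute-force nearest-neighbour search is quadratic in the number of voxels, which is fine for a toy example but not for full 3D volumes.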

As an additional metric, the image quality was measured by

the FID (Heusel et al., 2018). The FID is a distance that measures

the similarity between images by comparing the activations of a

pre-trained Inception-v3 network. Here, the difference between

the activations in the pool3 layer of the generated images in

contrast to the real images is measured.

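\[ \mathrm{FID} = \lVert \mu_{real} - \mu_{gen} \rVert^2 + \mathrm{Tr}\left( \sigma_{real} + \sigma_{gen} - 2 \left( \sigma_{real} \sigma_{gen} \right)^{1/2} \right) \tag{10} \]

with N(µ_real, σ_real) and N(µ_gen, σ_gen) as the distributions of the features of the pool3 layer of real and synthesized data, respectively.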
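To make the two terms of Equation 10 concrete, the sketch below evaluates the distance for the special case of diagonal covariance matrices, where the matrix square root reduces to an element-wise square root. This simplification is an assumption of the example. The actual FID uses full covariance matrices of Inception-v3 features, and the function name is hypothetical.

```python
import math


def frechet_distance_diagonal(mu_real, var_real, mu_gen, var_gen):
    """Eq. 10 for Gaussians whose covariances are diagonal.

    var_real and var_gen hold the diagonal entries (variances).
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_real, mu_gen))
    trace_term = sum(vr + vg - 2.0 * math.sqrt(vr * vg)
                     for vr, vg in zip(var_real, var_gen))
    return mean_term + trace_term


# Same spread, shifted means: only the mean term contributes.
shifted = frechet_distance_diagonal([0.0, 0.0], [1.0, 1.0],
                                    [3.0, 4.0], [1.0, 1.0])
# Same means, different spread: only the trace term contributes.
rescaled = frechet_distance_diagonal([0.0, 0.0], [1.0, 4.0],
                                     [0.0, 0.0], [4.0, 1.0])
```

Identical feature distributions give a distance of zero, and the value grows with differences in either the means or the spread of the features.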

To explore to which degree the generated images reproduced

the training set, the FID between the synthetic data and both the

training and test data was calculated and compared for different

levels of privacy.

Finally, we measured the similarity between the images

synthesized by the GANs to check whether a model suffered

from mode collapse. For each model, we generated 1,000 images

and calculated the Structural Similarity Index Measure (SSIM)

between them and averaged the values. We repeated this analysis

for all 5 runs for each ε value, for the model with ε = ∞, and for the real images. The SSIM between two images x and y is defined as a product of luminance, contrast, and structure according to Wang et al. (2004):

\[ \mathrm{SSIM}(x, y) = \frac{(2 \mu_x \mu_y + c_1)(2 \sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}, \tag{11} \]

where µ_x is the average of x, σ_x² is the variance of x, and σ_xy is the covariance of x and y. c1 = (k1 L)² and c2 = (k2 L)² are for stabilization, with L being the dynamic range of the pixel values and k1 ≪ 1 and k2 ≪ 1 small constants.
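Equation 11 can be evaluated directly on two flattened images. The sketch below computes a single global SSIM value over all pixels, whereas common implementations average the index over local windows. The defaults k1 = 0.01 and k2 = 0.03 follow Wang et al. (2004), and the dynamic range of 255 is an assumption for 8-bit images. The function name is hypothetical.

```python
def ssim_global(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """Eq. 11 computed once over all pixels of two equally sized images.

    x and y are flat lists of pixel values.
    """
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


image_a = [0.0, 50.0, 100.0, 150.0]
image_b = [150.0, 100.0, 50.0, 0.0]       # same pixels, reversed structure
same = ssim_global(image_a, image_a)
different = ssim_global(image_a, image_b)
```

Averaging such values over many pairs of generated images, as described above, yields a number close to 1 when a generator keeps producing nearly the same image.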

3.5. Visualization Using t-SNE

Finally, the generated images with and without differential

privacy and the real patient images were visualized using a

t-SNE (Maaten and Hinton, 2008). t-SNE is an approach to

reducing dimensionality while preserving the structure of the

high dimensional data points. First, all data points are embedded into an SNE, which computes the pairwise similarities utilizing conditional probabilities. For points x_i and x_j, the conditional probability p_{j|i} of x_i choosing x_j as its neighbor is defined as

\[ p_{j|i} = \frac{\exp\left( -\lVert x_i - x_j \rVert^2 / 2\sigma_i^2 \right)}{\sum_{k \ne i} \exp\left( -\lVert x_i - x_k \rVert^2 / 2\sigma_i^2 \right)} \tag{12} \]


FIGURE 2 | Synthetic TOF-MRA patches (top row) and corresponding segmentation labels (bottom row) with different values of ε compared to real patient data (first column). A lower ε (i.e., more privacy) leads to more noisy images.

and the symmetrized similarity as:

\[ p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N} \tag{13} \]

with N being the number of data points. Then the algorithm aims to learn a lower dimensional representation of the similarities. In order to get distinct clusters and avoid overcrowding, a Student’s t distribution that reflects the similarities p_{j|i} is used (Maaten and Hinton, 2008):

\[ q_{ij} = \frac{\left( 1 + \lVert y_i - y_j \rVert^2 \right)^{-1}}{\sum_{k \ne m} \left( 1 + \lVert y_k - y_m \rVert^2 \right)^{-1}} \tag{14} \]

Starting from random initialization, the locations of the points in the lower dimensional space y_i are shifted so that a cost function is minimized using a gradient descent method.

Instead of the Kullback-Leibler divergence, we here chose the

Wasserstein metric due to its success in GAN applications

(Arjovsky et al., 2017).
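The similarities in Equations 12 to 14 can be computed for a handful of points as a sanity check. The sketch below uses one shared bandwidth σ for all points, which is a simplification, since t-SNE derives a separate σ_i per point from the perplexity. It does not perform the gradient descent, and the function names are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def conditional_p(points, sigma):
    """Eq. 12 with a shared bandwidth: p[i][j] holds p_{j|i}."""
    n = len(points)
    p = []
    for i in range(n):
        weights = [0.0 if k == i
                   else math.exp(-sq_dist(points[i], points[k]) / (2 * sigma ** 2))
                   for k in range(n)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p


def symmetrize(p):
    """Eq. 13: average both directions and divide by the number of points."""
    n = len(p)
    return [[(p[i][j] + p[j][i]) / (2 * n) for j in range(n)]
            for i in range(n)]


def low_dim_q(embedding):
    """Eq. 14: Student-t similarities in the low-dimensional space."""
    n = len(embedding)
    w = [[0.0 if i == j else 1.0 / (1.0 + sq_dist(embedding[i], embedding[j]))
          for j in range(n)] for i in range(n)]
    total = sum(sum(row) for row in w)
    return [[value / total for value in row] for row in w]


high_dim = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
p_sym = symmetrize(conditional_p(high_dim, sigma=1.0))
q = low_dim_q([(0.0, 0.0), (0.5, 0.0), (3.0, 3.0)])
```

Both p_sym and q sum to one over all pairs, so they can be compared as probability distributions by the cost function that the embedding minimizes.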

T-distributed stochastic neighbor embedding was

implemented using the sklearn package (Pedregosa et al.,

2011). The perplexity parameter reflecting the density of the

data distribution was chosen to be 30 which is in the suggested

range by Maaten and Hinton (2008). The images of the best

performing GAN with and without differential privacy, as

well as the real images, were projected onto 2 dimensions for

visualization purposes.

4. RESULTS

Visually, the synthetic image-label pairs appeared noisier with decreasing ε, i.e., with stricter privacy guarantees (Figure 2). Differentially private images with ε = 1.3 show almost only noise. The visual results corresponded to the segmentation performance when training a U-Net on the generated image-label pairs with different values of ε (Figure 3). In Figure 3A, the averaged DSC over U-Net models that were trained on synthetic data from five different GANs for each ε is plotted. With decreasing ε, the DSC decreased and got more unstable, i.e., more variation between the different models for the same ε. In particular, models with ε > 5 showed increased stability compared to models with lower ε. When considering only the best run of the five models (Figures 3B,C), the performance again dropped for decreasing ε. This was reflected by a lower DSC and a higher bAHD. The corresponding segmentation error maps are shown in Figure 4.

When testing the best U-Net models on the 1000Plus dataset, a similar trade-off between privacy and utility can be seen (Figure 5). Here, the U-Net performance in terms of DSC decreased more rapidly in comparison to the performance on the PEGASUS dataset, starting at ε = 8 with DSC ≈ 0.69 (Figure 5A). The bAHD showed instability in performance for ε < 3 (Figure 5B).

The FID between the training data and the generated data overall showed a similar trend: Less privacy led to a smaller distance to the training data (Figure 6A). The generated data trained without differential privacy (ε = ∞) showed an FID of 62 compared to an FID of 244 and 228 for the images with ε = 5.7 and ε = 10.2, respectively. The distance to the test data was similar for different ε values. Figure 6B shows the difference between the distances to the training images and test images for different values of ε. Here, the differences were increasing for higher ε values with ε = ∞ showing the largest difference, at least twice as large compared to all models trained with privacy guarantees.

Evaluating GAN models during training, we found the

best performing image-label pairs when training with a noise

multiplier of 0.65 for 29 epochs. This resulted in ε = 7.4. The

U-Net trained on these synthetic image-labels showed a DSC of

0.75 on the test set (Table 1). The segmentation of an example

patient is shown in Figure 7. The big vessels are segmented

reasonably well while a lot of errors occur when smaller vessels

are segmented.

The similarity between the images is shown in Figure 8. For

ε < 2, high SSIM values were observed (SSIM > 0.98). In

contrast, higher ε values led to less similar images produced by

one model.

Figure 9 shows the t-SNE embedding of the best performing

GAN with and without differential privacy and the real patient

images. The synthetic images without privacy guarantees are

overall close to the real images. The images with differential

privacy cluster at the edges far away from the real images.


FIGURE 3 | Test segmentation performance of U-Nets trained on generated data with different values of ε (PEGASUS dataset). (A) shows a boxplot of the DSC over 5 runs for each value of ε. In (B), only the run with the best DSC is shown. (C) shows the balanced average Hausdorff distance (bAHD) in voxels for the best run for each ε. The error bar depicts the SD between patients. For ε < 5, the performance becomes unstable and worse compared to higher ε values.

FIGURE 4 | Error maps of one example test patient for U-Nets trained on either real image-label pairs or generated image-labels with different values of ε. True positives are shown in red, false positives in green, and false negatives in yellow. For lower ε, more errors occur.

5. DISCUSSION

In the present study, we generated differentially private TOF-

MRA images with corresponding labels and explored the trade-

off between privacy and utility on two different test sets. We

proposed different evaluation schemes including training a

segmentation network and identified a threshold of ε < 5

with DSC < 0.61 for which the segmentation performance

became unstable and not usable. Our best segmentation model

trained on synthetic and private data achieved a DSC of 0.75 for

FIGURE 5 | Segmentation performance in terms of (A) DSC and (B) bAHD in voxels of the best performing model for each ε evaluated on a second dataset

(1000Plus). The DSC shows a decreasing performance starting for ε < 8.

FIGURE 6 | Comparison of Fréchet Inception Distance (FID) between the synthetic images with different ε values and both the real training data (light green squares

and light blue dotted line) and the real test data (dark green triangles and dark blue dashed line). (A) shows the absolute values for the 5 runs per ε whereas (B) shows

the difference between the distances from synthetic to training and synthetic to test. The higher the value of ε, the closer the images are to the training set. The

distance to the test set remains stable for different ε values. The difference shown in (B) is the highest for the model trained without differential privacy.

TABLE 1 | Overview of segmentation performances in terms of DSC and bAHD for a U-Net trained on real patient images and generated with and without

differential privacy. The best of the three U-Net models is shown in bold for each metric and dataset. The best U-Net with differential privacy guarantees has an ε of 7.4.

SD stands for standard deviation.

                              PEGASUS                            1000Plus
U-Net trained on              Mean DSC (SD)   Mean bAHD (SD)     Mean DSC (SD)   Mean bAHD (SD)
Real images                   0.89 (0.02)     0.33 (0.11)        0.90 (0.02)     0.69 (0.47)
Generated images (ε = ∞)      0.84 (0.02)     0.61 (0.12)        0.88 (0.02)     0.58 (0.32)
Generated images (ε = 7.4)    0.75 (0.04)     2.49 (1.96)        0.69 (0.04)     2.87 (1.25)

FIGURE 7 | Segmentation error maps of one test patient by the best U-Net model using differential privacy (ε = 7.4). Red indicates the true positives, green stands for

false positives, and yellow for false negatives. (A) shows a slice containing big vessels, (B) small ones, and (C) the whole vessel tree. The segmentation works

reasonably well with errors occurring particularly when segmenting small vessels.

FIGURE 8 | Mean Structural Similarity Index Measure (SSIM) between 1,000 generated images for different ε values. The error bar shows the standard deviation

over the 5 different runs for each ε value. For ε < 2, the similarity between images is high, whereas it decreases for higher ε values.

ε = 7.4 in a brain vessel segmentation paradigm. Our results

mark the first step in data sharing with privacy guarantees for

neuroimaging problems.

Since differential privacy is based on introducing noise, a

decrease in utility is expected with the introduction of differential

privacy. Our results confirm this notion. For ε = ∞, we achieved

a DSC of 0.84 which is comparable to the literature (Kossen

et al., 2021). Stricter privacy constraints indicated by a lower ε

led to worse visual results as well as poorer segmentation results

(Figures 2–5). This also corresponds to findings in previous

studies on differential privacy (Xie et al., 2018; Xu et al., 2019;

Yoon et al., 2019). The increasing amount of noise might also

be the reason for the instability of the GAN training for lower

ε values, especially for ε < 5 (Figure 2A). A performance

drop could also be observed when testing the U-Nets trained

on differentially private image-label pairs on a second dataset

(Figure 5). In comparison to the first test set, the performance

drop already occurred at higher values of ε (ε < 8 compared

to ε < 5). Thus, models with weaker privacy guarantees showed

better generalizability. A reason for that might be again the

lower amount of noise and, therefore, fewer restrictions during

training. This is also in line with our findings in Figure 8. Here,

images generated from models with lower ε values (ε < 2)

showed more similarities between each other, thus indicating

more mode collapse compared to models with higher ε values.

This could be another reason for the performance drop for

models with stricter privacy guarantees.
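To make the link between noise and utility concrete, the following sketch shows one noisy gradient step in the style of DP-SGD (Abadi et al., 2016): each per-sample gradient is clipped, the clipped gradients are summed, and Gaussian noise proportional to the clipping bound is added before averaging. This is a simplified illustration under stated assumptions, not the training code of the study. The function names, the plain-list gradient representation, and the clipping bound are hypothetical, and the privacy accounting that maps the noise multiplier and the number of epochs to ε is not shown.

```python
import math
import random


def clip_to_norm(grad, max_norm):
    """Scale a gradient vector so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(per_sample_grads, max_norm, noise_multiplier, rng):
    """Clip each per-sample gradient, sum, add Gaussian noise, then average."""
    dim = len(per_sample_grads[0])
    total = [0.0] * dim
    for grad in per_sample_grads:
        for i, g in enumerate(clip_to_norm(grad, max_norm)):
            total[i] += g
    # The noise standard deviation scales with the clipping bound.
    sigma = noise_multiplier * max_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [n / len(per_sample_grads) for n in noisy]
```

A larger noise multiplier yields a smaller ε for the same number of training steps but perturbs every update more strongly, which is consistent with the less stable training observed for low ε values.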

Images with larger ε values also showed greater similarity

in terms of FID to the training images than those with stricter

privacy guarantees. This indicates that more specific features of

the training set can be memorized for less noisy models. The

FID between test images and synthetic images (FIDtest) stayed

constant for different values of ε (Figure 6A). The difference

between the FIDtrain and FIDtest can be seen as a measure of

the degree to which the images overfit the training set. Even for

the model with our largest ε = 10.2, the difference between

FIDtrain and FIDtest was only half compared to the difference

of the model without any privacy constraints. This shows that

FIGURE 9 | Visualization of real and generated images with and without differential privacy in a t-SNE embedding. Each point represents an image. The distributions of

real images and generated images without privacy almost entirely overlap. In contrast, the images with privacy guarantees are only partially overlapping and cluster at

the edges, distant from the real images. The embedding showing the specific image instead of a point can be found in Figure S1 in the supplementary material.

differential privacy substantially contributed to the prevention

of the memorization of the training set. Those findings are

also in line with the embedding shown in Figure 9 in which

the differentially private images are further away from the

training images compared to the images generated without any

privacy guarantees.
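The gap between FIDtrain and FIDtest discussed above can be illustrated with a small sketch. It computes the Fréchet distance between two sets of feature vectors under the simplifying assumption of diagonal covariances, in which case the covariance term reduces to a sum of squared differences of standard deviations. The full FID uses Inception features and full covariance matrices, so this is only an approximation for illustration. The function names and the sign convention (test distance minus training distance) are assumptions.

```python
from statistics import fmean, pvariance


def feature_stats(features):
    """Per-dimension mean and variance of a list of feature vectors."""
    columns = list(zip(*features))
    return [fmean(c) for c in columns], [pvariance(c) for c in columns]


def frechet_distance_diag(feats_a, feats_b):
    """Fréchet distance between two Gaussians with diagonal covariances."""
    mu_a, var_a = feature_stats(feats_a)
    mu_b, var_b = feature_stats(feats_b)
    mean_term = sum((m - n) ** 2 for m, n in zip(mu_a, mu_b))
    cov_term = sum((v ** 0.5 - w ** 0.5) ** 2 for v, w in zip(var_a, var_b))
    return mean_term + cov_term


def memorization_gap(synthetic, train, test):
    """Distance to the test set minus distance to the training set."""
    return frechet_distance_diag(synthetic, test) - frechet_distance_diag(synthetic, train)
```

Under this convention, a large positive gap means the synthetic images are much closer to the training set than to unseen test data, which is the behavior reported for the model without privacy constraints.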

Machine learning models including GANs are susceptible

to so-called membership inference attacks (Shokri et al., 2017;

Hayes et al., 2019; Chen et al., 2020). Here, an attack model is

trained to predict whether a sample was part of the training set. If

these attacks are successful, the privacy of the training samples is

jeopardized. Differential privacy has been shown to decrease the

model’s vulnerability to privacy attacks (Shokri et al., 2017; Hayes

et al., 2019). While there is no consensus about an exact value

of ε, studies such as Hayes et al. (2019) and Bagdasaryan and

Shmatikov (2019) consider a value of ε < 10 acceptable. In this

study, we were able to synthesize image-label pairs with single-

digit ε (i.e., ε = 7.4) that still show reasonable performance in

the segmentation task. Naturally, further research is necessary

to validate that our models would successfully defend against

membership inference attacks.
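As an illustration of what such an attack can look like, the sketch below implements a very simple score-ranking attack in the spirit of the discriminator-based attacks discussed in this literature: candidate samples are ranked by a score (for example, a discriminator output), and the highest-ranked samples are guessed to be training members. The function names and the toy scores are hypothetical, and the attacks in the cited studies are more elaborate.

```python
def top_n_membership_guess(scores, n_members):
    """Indices of the n_members highest-scoring samples (guessed members)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:n_members])


def attack_accuracy(scores, is_member):
    """Fraction of guessed members that truly belong to the training set."""
    n_members = sum(is_member)
    if n_members == 0:
        raise ValueError("at least one true member is required")
    guessed = top_n_membership_guess(scores, n_members)
    hits = sum(1 for i in guessed if is_member[i])
    return hits / n_members
```

If half of the candidates are true members, an accuracy near 0.5 corresponds to random guessing, whereas values close to 1 would indicate that the model leaks membership information.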

Whereas the segmentation performance in terms of DSC

showed a consistent trend, this was not always true for the

bAHD. Figure 3C shows overall comparable results to the DSC

performance with some fluctuations. These fluctuations can be

explained by selecting the best model based on the best validation

DSC and not bAHD. In Figure 5B, however, the segmentation

model for ε = 1.3 seemed to perform better compared to

models with ε = 1.9 and ε = 2.7. An explanation for

this might be the number of false positives and false negatives

in the segmentations. For ε = 1.3, barely any voxel was

identified as belonging to a vessel which resulted in many false

negatives. For the other two models, there were many false

positives with a large distance to the ground truth. The bAHD

considers these models to be worse although none of the three

models show a good segmentation performance (see Figure S2

in the supplementary material). The characteristic of penalizing

especially false positives should be taken into consideration in

future studies when using the bAHD as a metric.
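The different behavior of the two metrics can be reproduced with a toy example. The sketch below computes the DSC and the balanced average Hausdorff distance for segmentations given as sets of voxel coordinates. The bAHD follows our reading of Aydin et al. (2021), in which both directed distance sums are divided by the number of ground-truth voxels. The function names and the brute-force nearest-neighbor search are illustrative choices only.

```python
import math


def dice(pred, truth):
    """Dice similarity coefficient between two sets of voxel coordinates."""
    if not pred and not truth:
        return 1.0
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def directed_distance_sum(src, dst):
    """Sum over src of the Euclidean distance to the nearest voxel in dst."""
    return sum(min(math.dist(p, q) for q in dst) for p in src)


def balanced_ahd(pred, truth):
    """Balanced average Hausdorff distance (both sums divided by |truth|)."""
    if not pred or not truth:
        return math.inf
    g_to_s = directed_distance_sum(truth, pred)
    s_to_g = directed_distance_sum(pred, truth)
    return (g_to_s + s_to_g) / (2 * len(truth))


truth = {(0, 0), (0, 1), (0, 2), (0, 3)}
under_segmented = {(0, 0)}          # mostly false negatives
over_segmented = {(0, 0), (20, 0)}  # one distant false positive
```

In this toy case, both predictions have a similarly low DSC, but the single distant false positive in `over_segmented` increases the bAHD several-fold compared with `under_segmented`, mirroring the pattern described for ε = 1.3 versus ε = 1.9 and ε = 2.7.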

The main limitations of the present study are the

computational restrictions. Because of these, only 2D patches

were used. Additionally, more complex GAN architectures

consisting of multiple generators and/or discriminators such

as PrivGAN (Mukherjee et al., 2021) or PATE-GAN (Yoon

et al., 2019) could not be implemented. Especially PrivGAN

appears to be an interesting direction for future research

since it does not only implement differential privacy but also

aims to reduce vulnerability toward membership inference

attacks directly.

6. CONCLUSION

In the present study, we synthesized differentially private

TOF-MRA images and segmentation labels using GANs

for a neuroimaging application. We proposed different

evaluation metrics including the performance of a trained

neural network for vessel segmentation. Even with privacy

constraints, we could train a segmentation model that

works reasonably well on real patient data. This is a crucial

step toward synthesizing medical imaging data that both

preserves predictive properties and privacy. Nonetheless,

further studies should be conducted to evaluate if our findings

generalize to other types of medical imaging data and to

further improve performance. Our synthetic data is available

upon request.

DATA AVAILABILITY STATEMENT

The data analyzed in this study is subject to the following

licenses/restrictions: The datasets used in this article are not

readily available because data protection laws prohibit sharing

the PEGASUS and 1000Plus datasets at the current time

point. Requests to access these datasets should be directed

to [email protected].

ETHICS STATEMENT

The studies involving human participants were reviewed and

approved by Ethics Committee of Charité University Medicine

Berlin and Berlin State Ethics Board. The patients/participants

provided their written informed consent to participate in

this study.

AUTHOR CONTRIBUTIONS

TK, MH, VM, FB, KS, AHe, KH, SP, AHi, and DF: concept and

design. VM, JS, IG, AK, and JF: acquisition of data. TK, VM, FB,

AHe, KH, and DF: model design. TK: data analysis. TK, MH, VM,

FB, AHe, KH, and DF: data interpretation. TK, MH, VM, FB, KS,

AHe, KH, SP, AHi, JS, IG, AK, JF, and DF: manuscript drafting

and approval. All authors contributed to the article and approved

the submitted version.

FUNDING

This study has received funding from the European Commission

through a Horizon2020 grant (PRECISE4Q grant no. 777107,

coordinator: DF) and the German Federal Ministry of Education

and Research through a Go-Bio grant (PREDICTioN2020 grant

no. 031B0154, lead: DF).

ACKNOWLEDGMENTS

Computation has been performed on the HPC for the Research

cluster of the Berlin Institute of Health. We acknowledge support

from the German Research Foundation (DFG) and the Open

Access Publication Fund of Charité-Universitätsmedizin Berlin.

SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found

online at: https://www.frontiersin.org/articles/10.3389/frai.2022.

813842/full#supplementary-material

REFERENCES

Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., et al.

(2016). “Deep learning with differential privacy,” in Proceedings of the 2016

ACM SIGSAC Conference on Computer and Communications Security, CCS ’16

(New York, NY: Association for Computing Machinery), 308–318.

Abramian, D., and Eklund, A. (2019). “Refacing: reconstructing anonymized facial

features using gans,” in 2019 IEEE 16th International Symposium on Biomedical

Imaging (ISBI 2019) (Venice: IEEE).

Arjovsky, M., Chintala, S., and Bottou, L. (2017). Wasserstein GAN.

arXiv:1701.07875 [cs, stat]. arXiv: 1701.07875.

Aydin, O. U., Taha, A. A., Hilbert, A., Khalil, A. A., Galinovic, I., Fiebach, J. B.,

et al. (2021). On the usage of average Hausdorff distance for segmentation

performance assessment: hidden error when used for ranking. Eur. Radiol. Exp.

5, 4. doi: 10.1186/s41747-020-00200-2

Bagdasaryan, E., and Shmatikov, V. (2019). Differential privacy has disparate

impact on model accuracy. CoRR, abs/1905.12101.

Balle, B., Barthe, G., Gaboardi, M., Hsu, J., and Sato, T. (2019). Hypothesis testing

interpretations and renyi differential privacy. arXiv:1905.09982 [cs, stat]. arXiv:

1905.09982.

Bannier, E., Barker, G., Borghesani, V., Broeckx, N., Clement, P., Emblem, K.

E., et al. (2021). The Open Brain Consent: Informing research participants

and obtaining consent to share brain imaging data. Hum. Brain Mapp. 42,

1945–1951. doi: 10.1002/hbm.25351

Bowles, C., Chen, L., Guerrero, R., Bentley, P., Gunn, R., Hammers, A., et al. (2018).

GAN Augmentation: augmenting training data using generative adversarial

networks. arXiv:1810.10863 [cs]. arXiv: 1810.10863.

Chen, D., Yu, N., Zhang, Y., and Fritz, M. (2020). “Gan-leaks: a taxonomy of

membership inference attacks against generative models,” in Proceedings of the

2020 ACM SIGSAC Conference on Computer and Communications Security,

CCS ’20 (New York, NY: Association for Computing Machinery), 343–362.

Cheng, V., Suriyakumar, V. M., Dullerud, N., Joshi, S., and Ghassemi, M. (2021).

“Can you fake it until you make it? impacts of differentially private synthetic

data on downstream classification fairness,” in Proceedings of the 2021 ACM

Conference on Fairness, Accountability, and Transparency, FAccT ’21 (New

York, NY: Association for Computing Machinery), 149–160.

Cirillo, M. D., Abramian, D., and Eklund, A. (2020). Vox2vox:

3d-gan for brain tumour segmentation. CoRR, abs/2003.13653.

doi: 10.1007/978-3-030-72084-1_25

Coyner, A. S., Chen, J. S., Chang, K., Singh, P., Ostmo, S., Chan, R. V. P.,

et al. (2022). Synthetic medical images for robust, privacy-preserving training

of artificial intelligence: application to retinopathy of prematurity diagnosis.

Ophthalmol. Sci. 2, 100126. doi: 10.1016/j.xops.2022.100126

Duan, D., Xia, S., Rekik, I., Wu, Z., Wang, L., Lin, W., et al. (2020). Individual

identification and individual variability analysis based on cortical folding

features in developing infant singletons and twins. Hum. Brain Mapp. 41,

1985–2003. doi: 10.1002/hbm.24924

Dwork, C. (2008). “Differential privacy: a survey of results,” in Theory and

Applications of Models of Computation, Lecture Notes in Computer Science, eds

M. Agrawal, D. Du, Z. Duan, and A. Li (Berlin; Heidelberg: Springer), 1–19.

Foroozandeh, M., and Eklund, A. (2020). Synthesizing brain tumor images

and annotations by combining progressive growing GAN and SPADE.

arXiv:2009.05946 [cs]. arXiv: 2009.05946.

Haarburger, C., Horst, N., Truhn, D., Broeckmann, M., Schrading, S., Kuhl,

C., et al. (2019). “Multiparametric magnetic resonance image synthesis

using generative adversarial networks,” in Eurographics Workshop on Visual

Computing for Biology and Medicine (The Eurographics Association Version

Number: 011-015), 5.

Hayes, J., Melis, L., Danezis, G., and Cristofaro, E. D. (2019). LOGAN: membership

inference attacks against generative models. Proc. Privacy Enhan. Technol. 2019,

133–152. doi: 10.2478/popets-2019-0008

Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., and Hochreiter, S. (2018).

GANs trained by a two time-scale update rule converge to a local nash

equilibrium. arXiv:1706.08500 [cs, stat]. arXiv: 1706.08500.

Hilbert, A., Madai, V. I., Akay, E. M., Aydin, O. U., Behland, J., Sobesky, J., et al.

(2020). Brave-net: Fully automated arterial brain vessel segmentation

in patients with cerebrovascular disease. Front. Artif. Intell. 3, 78.

doi: 10.3389/frai.2020.552258

Hotter, B., Pittl, S., Ebinger, M., Oepen, G., Jegzentis, K., Kudo, K., et al. (2009).

Prospective study on the mismatch concept in acute stroke patients within

the first 24 h after symptom onset-1000Plus study. BMC Neurol. 9, 60.

doi: 10.1186/1471-2377-9-60

Isola, P., Zhu, J.-Y., Zhou, T., and Efros, A. A. (2018). Image-to-image

translation with conditional adversarial networks. arXiv:1611.07004 [cs].

doi: 10.1109/CVPR.2017.632

Kossen, T., Subramaniam, P., Madai, V. I., Hennemuth, A., Hildebrand, K.,

Hilbert, A., et al. (2021). Synthesizing anonymized and labeled TOF-MRA

patches for brain vessel segmentation using generative adversarial networks.

Comput. Biol. Med. 131, 104254. doi: 10.1016/j.compbiomed.2021.104254

Livne, M., Rieger, J., Aydin, O. U., Taha, A. A., Akay, E. M., Kossen, T.,

et al. (2019). A u-net deep learning framework for high performance vessel

segmentation in patients with cerebrovascular disease. Front. Neurosci. 13, 97.

doi: 10.3389/fnins.2019.00097

Lundervold, A. S., and Lundervold, A. (2019). An overview of deep learning

in medical imaging focusing on MRI. Zeitschrift für Medizinische Physik 29,

102–127. doi: 10.1016/j.zemedi.2018.11.002

Maaten, L. V. D., and Hinton, G. (2008). Visualizing data using t-SNE. J. Mach.

Learn. Res. 9, 2579–2605.

Mironov, I. (2017). “Renyi differential privacy,” in 2017 IEEE 30th Computer

Security Foundations Symposium (CSF) (Santa Barbara, CA: IEEE), 263–275.

Mukherjee, S., Xu, Y., Trivedi, A., Patowary, N., and Ferres, J. L. (2021). privGAN:

protecting GANs from membership inference attacks at low cost to utility. Proc.

Privacy Enhan. Technol. 2021, 142–163. doi: 10.2478/popets-2021-0041

Mutke, M. A., Madai, V. I., von Samson-Himmelstjerna, F. C., Zaro Weber, O.,

Revankar, G. S., Martin, S. Z., et al. (2014). Clinical evaluation of an arterial-

spin-labeling product sequence in steno-occlusive disease of the brain. PLoS

ONE 9, e87143. doi: 10.1371/journal.pone.0087143

Nguyen, D. C., Ding, M., Pathirana, P. N., Seneviratne, A., and Zomaya,

A. Y. (2021). Federated learning for COVID-19 detection with generative

adversarial networks in edge cloud computing. IEEE Internet Things J. 1–1.

doi: 10.1109/JIOT.2021.3120998

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O.,

et al. (2011). Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12,

2825–2830.

Schwarz, C. G., Kremers, W. K., Therneau, T. M., Sharp, R. R., Gunter, J.

L., Vemuri, P., et al. (2019). Identification of anonymous MRI research

participants with face-recognition software. N. Engl. J. Med. 381, 1684–1686.

doi: 10.1056/NEJMc1908881

Shokri, R., Stronati, M., Song, C., and Shmatikov, V. (2017). “Membership

inference attacks against machine learning models,” in 2017 IEEE Symposium

on Security and Privacy (SP) (San Jose, CA: IEEE), 3–18.

Taha, A. A., and Hanbury, A. (2015). Metrics for evaluating 3D medical

image segmentation: analysis, selection, and tool. BMC Med. Imaging 15, 29.

doi: 10.1186/s12880-015-0068-x

Torkzadehmahani, R., Kairouz, P., and Paten, B. (2019). “DP-CGAN: differentially

private synthetic data and label generation,” in 2019 IEEE/CVF Conference on

Computer Vision and Pattern Recognition Workshops (CVPRW) (Long Beach,

CA: IEEE), 98–104.

Tudosiu, P.-D., Varsavsky, T., Shaw, R., Graham, M., Nachev, P., Ourselin, S.,

et al. (2020). Neuromorphologicaly-preserving volumetric data encoding using

VQ-VAE. arXiv:2002.05692 [cs, eess, q-bio]. arXiv: 2002.05692.

Wang, L., Chen, W., Yang, W., Bi, F., and Yu, F. R. (2020). A State-of-the-Art

review on image synthesis with generative adversarial networks. IEEE Access

8, 63514–63537. doi: 10.1109/ACCESS.2020.2982224

Wang, Z., Bovik, A., Sheikh, H., and Simoncelli, E. (2004). Image quality

assessment: from error visibility to structural similarity. IEEE Trans. Image

Process. 13, 600–612. doi: 10.1109/TIP.2003.819861

Willemink, M. J., Koszek, W. A., Hardell, C., Wu, J., Fleischmann, D., Harvey, H.,

et al. (2020). Preparing medical imaging data for machine learning. Radiology

295, 4–15. doi: 10.1148/radiol.2020192224

Xie, L., Lin, K., Wang, S., Wang, F., and Zhou, J. (2018). Differentially private

generative adversarial network. arXiv:1802.06739 [cs, stat]. arXiv: 1802.06739.

Xu, C., Ren, J., Zhang, D., Zhang, Y., Qin, Z., and Ren, K. (2019). GANobfuscator:

mitigating information leakage under GAN via differential privacy. IEEE Trans.

Inf. Forensics Security 14, 2358–2371. doi: 10.1109/TIFS.2019.2897874

Yi, X., Walia, E., and Babyn, P. (2019). Generative adversarial network

in medical imaging: a review. Med. Image Anal. 58, 101552.

doi: 10.1016/j.media.2019.101552

Yoon, J., Drumright, L. N., and van der Schaar, M. (2020). Anonymization through

data synthesis using generative adversarial networks (ADS-GAN). IEEE J.

Biomed. Health Inform. 24, 2378–2388. doi: 10.1109/JBHI.2020.2980262

Yoon, J., Jordon, J., and van der Schaar, M. (2019). “PATE-GAN: generating

synthetic data with differential privacy guarantees,” in International Conference

on Learning Representations (New Orleans: ICLR).

Zhang, L., Shen, B., Barnawi, A., Xi, S., Kumar, N., and Wu, Y. (2021). FedDPGAN:

federated differentially private generative adversarial networks framework for

the detection of COVID-19 pneumonia. Inform. Syst. Front. 23, 1403–1415.

doi: 10.1007/s10796-021-10144-6

Zhu, G., Jiang, B., Tong, L., Xie, Y., Zaharchuk, G., and Wintermark,

M. (2019). Applications of deep learning to neuro-imaging

techniques. Front. Neurol. 10, 869. doi: 10.3389/fneur.2019.00869

Conflict of Interest: TK, MH, VM, and AHi are employed by ai4medicine. FB

and AHe are employed by Fraunhofer. JS reports receipt of speakers’ honoraria

from Pfizer, Boehringer Ingelheim, and Daiichi Sankyo. JF has received consulting

and advisory board fees from BioClinica, Cerevast, Artemida, Brainomix,

Biogen, BMS, EISAI, and Guerbet. DF reported receiving grants from the European

Commission, receiving personal fees from ai4medicine, and holding an equity interest

in ai4medicine.

The remaining authors declare that the research was conducted in the absence of

any commercial or financial relationships that could be construed as a potential

conflict of interest.

Publisher’s Note: All claims expressed in this article are solely those of the authors

and do not necessarily represent those of their affiliated organizations, or those of

the publisher, the editors and the reviewers. Any product that may be evaluated in

this article, or claim that may be made by its manufacturer, is not guaranteed or

endorsed by the publisher.

Pokutta, Sharma, Hilbert, Sobesky, Galinovic, Khalil, Fiebach and Frey. This is an

open-access article distributed under the terms of the Creative Commons Attribution

License (CC BY). The use, distribution or reproduction in other forums is permitted,

provided the original author(s) and the copyright owner(s) are credited and that the

original publication in this journal is cited, in accordance with accepted academic

practice. No use, distribution or reproduction is permitted which does not comply

with these terms.

Supplementary Material

1 SUPPLEMENTARY DATA

Figure S1.

Visualization of real and generated images with and without differential privacy in a t-SNE

embedding. The distribution of real images and generated images without privacy almost entirely overlap.

In contrast to that, the images with privacy guarantees are only partly overlapping and cluster at the edges,

distant from the real images.

Figure S2.

Segmentation error maps for two example patients for a model with ε = 1.3 (A) and ε = 2.7 (B).
Voxels in red are true positives, yellow represents false negatives and green false positives. (A) shows
many false negatives with few false positives. The Dice Similarity Coefficient (DSC) is 0.046 and the
balanced average Hausdorff distance (bAHD) 8.3. (B) shows many false positives with a DSC of 0.052 and
a bAHD of 190.5.

Part II

Image-to-Image Translation for

Stroke Treatment Planning

Image-to-Image Generative Adversarial

Networks for Synthesizing Perfusion

Parameter Maps from DSC-MR Images

in Cerebrovascular Disease

7.1 Context Within Thesis

GAN architectures can not only be utilized for private image synthesis but are also currently

state-of-the-art in the field of medical image-to-image translations. In the clinical setting

of stroke, the translation of DSC-MRI to perfusion parameter maps can be regarded as an

image-to-image translation. DSC-MRI-derived perfusion maps are crucial for stroke treatment

planning. Nowadays, perfusion maps are derived by placing an AIF based on selected voxels.

While this process can be automated, it only takes into account a few voxels and also requires

oversight by a medical expert. Since time is a critical resource for stroke patients, automatic

processing of DSC-MRI into expert-level perfusion maps would speed up treatment planning.

In this chapter, we show that GANs could be utilized to automatically derive expert-level

perfusion maps as an alternative approach to AIF-based approaches. To this end, we developed

an adapted version of the pix2pix GAN incorporating the time dimension of the DSC-MRI.

We tested our architecture on two datasets: 1) a dataset comprising stroke patients and 2) a

dataset containing patients with steno-occlusive disease.

7.2 Preprint

This chapter is based on the following preprint:

T. Kossen, V. I. Madai, M. A. Mutke, A. Hennemuth, K. Hildebrand, J. Behland,

A. Hilbert, J. Sobesky, M. Bendszus, and D. Frey. “Image-to-image generative

adversarial networks for synthesizing perfusion parameter maps from DSC-MR images

in cerebrovascular disease”. In: medRxiv (2022). doi:

10.1101/2022.05.24.22274901

In this section, the preprint, which is available on medRxiv, is reprinted. The article is open

access under the CC BY license.

Author Contribution

The first author Tabea Kossen conceptualized the study and interpreted the results together

with VIM, AH, KH and DF. She implemented the GAN architectures and evaluation

scripts. Additionally, she was responsible for the project administration and created the figures.

Together with VIM, she wrote the first version of the manuscript.

Code Availability

The code for this project is publicly available:

https://github.com/prediction2020/DSC-to-perfusion.

Image-to-image generative adversarial networks for synthesizing

perfusion parameter maps from DSC-MR images in cerebrovascular

disease

Tabea Kossen1,2∗, Vince I Madai1,3,4∗, Matthias A Mutke5, Anja Hennemuth2,6,7, Kristian Hildebrand8,

Jonas Behland1, Adam Hilbert1, Jan Sobesky9,10, Martin Bendszus5 and Dietmar Frey1

1CLAIM - Charité Lab for AI in Medicine, Charité Universitätsmedizin Berlin, Germany

2Department of Computer Engineering and Microelectronics, Computer Vision & Remote Sensing, Technical University Berlin, Berlin, Germany

3QUEST Center for Responsible Research, Berlin Institute of Health (BIH), Charité - Universitätsmedizin Berlin, Berlin, Germany

4School of Computing and Digital Technology, Faculty of Computing, Engineering and the Built Environment, Birmingham City University, Birmingham, UK

5Department of Neuroradiology, Heidelberg University Hospital, Heidelberg, Germany

6Institute for Imaging Science and Computational Modelling in Cardiovascular Medicine, Charité Universitätsmedizin Berlin, Berlin, Germany

7Fraunhofer MEVIS, Bremen, Germany

8Department VI Computer Science and Media, Berlin University of Applied Sciences and Technology, Berlin, Germany

9Centre for Stroke Research Berlin, Charité Universitätsmedizin Berlin, Berlin, Germany

10Johanna-Etienne-Hospital, Neuss, Germany

ABSTRACT

Stroke is a major cause of death or disability. As imaging-based patient stratification improves acute stroke therapy, dynamic
susceptibility contrast magnetic resonance imaging (DSC-MRI) is of major interest for imaging brain perfusion. However,
expert-level perfusion maps require manual or semi-manual post-processing by a medical expert, making the procedure
time-consuming and less standardized. Modern machine learning methods such as generative adversarial networks (GANs) have the
potential to automate perfusion map generation at an expert level without manual validation. We propose a modified pix2pix
GAN with a temporal component (temp-pix2pix-GAN) that generates perfusion maps in an end-to-end fashion. We train our
model on perfusion maps infused with expert knowledge to encode it into the GANs. The model was trained and its performance
evaluated using the structural similarity index measure (SSIM) on two datasets including acute stroke patients and patients with
steno-occlusive disease. Our temp-pix2pix architecture showed high performance on the acute stroke dataset for all perfusion maps
(mean SSIM 0.92-0.99) and good performance on data including patients with steno-occlusive disease (mean SSIM 0.84-0.99).
While clinical validation is still necessary in future studies, our results mark an important step towards automated expert-level
perfusion maps and thus, fast patient stratification.

Keywords: stroke, perfusion weighted imaging, dynamic susceptibility contrast MR, cerebrovascular disease, generative adversarial networks

1 INTRODUCTION

Ischemic stroke is a leading cause of death or disability worldwide¹.
Standard treatment strategies include recanalization by mechanical
or pharmacological intervention, or a combination of both (Berge
et al. (2021); Turc et al. (2019)). In this context, the eligibility of
patients for treatment is mainly based on large cohorts of interventional
trials that incorporate little imaging information (Lin et al. (2022);
McDermott et al. (2019)). However, this means that some patients
will not receive treatment that would be of benefit for them and,
conversely, some patients will be subjected to futile treatment attempts
(Goyal et al. (2016)). An alternative approach to improve outcomes
is an individualized patient stratification based on specific patient
characteristics (Rehani et al. (2020); Sharobeam and Yan (2022)).

∗ These authors contributed equally to this work.

¹ WHO EMRO Stroke, Cerebrovascular Accident | Health Topics. Available online at: http://www.emro.who.int/health-topics/stroke-cerebrovascular-accident/index.html

One of the most important techniques for this approach is perfu-

sion weighted-imaging, a special imaging technique used in both

computed tomography (CT) and magnetic resonance imaging (MRI)

(Sharobeam and Yan (2022)). It provides highly relevant information

about (patho)physiological blood flow in and around the ischemic

brain tissue (Copen et al. (2011)). In MRI, the most commonly

used perfusion imaging technique is dynamic susceptibility contrast

(DSC) MRI (Jahng et al. (2014)). It measures brain perfusion by

injecting a gadolinium-based contrast agent into the patient’s blood

(Jahng et al. (2014)), followed by a series of T2- or T2*-weighted

MRI sequences that record the flow of the contrast agent through the

brain. The resulting 4D image is deconvolved voxel-wise with an ar-

terial input function (AIF) (Calamante (2013)). The tissue concen-

tration curve as well as the deconvolved curve result in interpretable

perfusion parameter maps such as the cerebral blood flow (CBF),

cerebral blood volume (CBV), mean transit time (MTT), time-to-

maximum (Tmax), and time-to-peak (TTP) (Calamante (2013)). Importantly,

the placement of the AIF is performed either in a semi-manual or manual

manner to achieve the highest quality. Additionally, automated methods

exist that in some areas - such as in stroke - require little input by

experts to provide perfusion parameter maps of high quality (Hansen et al.

(2016); Ben Alaya et al. (2022); Krusche et al. (2021)). In clinical practice,

however, all existing methods of AIF determination require at least some

oversight by experts to rule out faulty calculations due to suboptimal AIFs.

This is a particular challenge in stroke care, where time is a critical

resource as it is one of the most important determinants of clinical outcome.

Therefore, there is a great clinical need for novel automation approaches

that provide expert-level perfusion maps without the necessity for any

manual input.

Figure 1. Workflow of study. Our GAN is trained on expert-level perfusion maps. The resulting model is able to synthesize perfusion maps from unseen data without the need for manual AIF selection, at the same expert level that was present in the training data.

medRxiv preprint doi: https://doi.org/10.1101/2022.05.24.22274901; this version posted May 25, 2022. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.

One possible solution is the application of modern artificial intel-

ligence (AI) methods based on machine learning and here particu-

larly deep learning approaches. These have shown great promise for

solving medical imaging problems in the past years (Wernick et al.

(2010); Lundervold and Lundervold (2019)). Among deep learning

applications, generative adversarial networks (GANs) are particularly

promising for the generation of expert-level perfusion maps.

For example, GANs can be presented both with an original image

and a processed image and learn to generate the processed image

from the original. This is achieved by the special architecture of

GANs: They consist of two neural networks that try to fool each

other (Goodfellow et al. (2014)). One network, the generator, syn-

thesizes a data sample such as an image, whereas the other network,

the discriminator, decides whether the sample looks like a real sam-

ple or not. At the end of the training, the generated sample should re-

semble the original as closely as possible. For image-to-image trans-

lations GANs are considered to be state-of-the-art in the medical

field (Yi et al. (2019); Zhu et al. (2020)) and a conditional GAN

such as the pix2pix GAN can be applied (Isola et al. (2018)). For ex-

ample, pix2pix GANs have been successfully applied to transform

MR images to CT images (cross-modal) or to transform 3T MR im-

ages to 7T MR images (intramodal) (Brou Boni et al. (2020); Nie

et al. (2018)).

Given that the translation of a time-series of perfusion informa-

tion from source images to a single perfusion map can be seen as a

highly similar medical image-to-image translation problem, GANs

are a highly promising method for this use case. Preliminary work

on GANs for the translation of time-series in dynamic cine appli-

cations has been published (Ghodrati et al. (2021)). Yet, to the best

of our knowledge no study has investigated the generation of DSC

perfusion images from perfusion source data so far.

Thus, we propose a modified slice-wise pix2pix GAN with a tem-

poral component (temp-pix2pix-GAN) to account for the time di-

mension in DSC source perfusion imaging. Our GAN model auto-

matically generates perfusion parameter maps in an end-to-end fash-

ion. We train our model on expert-level perfusion parameter maps

(see Figure 1). The performance of our temp-pix2pix GAN model

is compared to a standard pix2pix GAN without a temporal com-

ponent. We train and test our approach on two different datasets in-

cluding acute stroke patients as well as patients with chronic cere-

brovascular disease.

2 MATERIALS AND METHODS

2.1 Data

In total, 276 patients were included in this study. Of these, 204 patients

from a study performed at Heidelberg University Hospital suffered from

acute stroke. Imaging was performed with a T2*-weighted gradient-echo EPI

sequence with fat suppression (TR=2220ms, TE=36ms, flip angle 90°, field of

view: 240x240mm², image matrix: 128x128, 25-27 slices with a slice thickness

(ST) of 5mm) and was started simultaneously with bolus injection of a

standard dose (0.1mmol/kg) of an intravenous gadolinium-based contrast agent

on 3 Tesla MRI systems (Magnetom Verio, TIM Trio and Magnetom Prisma;

Siemens Healthcare, Erlangen, Germany). In total, 50 to 75 dynamic

measurements were performed (including at least eight prebolus

measurements). Bolus and prebolus were injected with a pneumatically

driven injection pump at an injection rate of 5ml/s. The study pro-

tocol for this retrospective analysis of our prospectively established

stroke database was approved by the ethics committee of Heidelberg

University and patient informed consent was waived.

72 patients with steno-occlusive disease were included from the

PEGASUS study (Mutke et al. (2014)). 80 whole-brain images

were recorded using a single-shot FID-EPI sequence (TR=1390ms,

TE=29ms, voxel size: 1.8x1.8x5mm3) after injection of 5ml

Gadovist (Gadobutrol, 1 M, Bayer Schering Pharma AG, Berlin)

followed by 25ml saline flush by a power injector (Spectris, Medrad

Inc., Warrendale PA, USA) at a rate of 5ml/s. The acquisition time

was 1:54 minutes. All patients gave their written informed consent

and the study has been authorized by the ethical review committee

of Charité - Universitätsmedizin Berlin.

DSC post-processing was performed blinded to clinical outcome.

For the acute stroke data from Heidelberg, DSC data were post-processed

with Olea Sphere® (Olea Medical, La Ciotat, France); automatic motion

correction was applied. Raw DSC images were used

to calculate perfusion maps of time-to-peak (TTP) from the tissue

response curve. Maps of cerebral blood flow (CBF), cerebral blood

volume (CBV), mean transit time (MTT), and time-to-maximum

(Tmax) were created by deconvolution of a regional concentration

time curve with an arterial input function (AIF). Block-circulant sin-

gular value decomposition (cSVD) deconvolution was applied. The

arterial input function (AIF) was detected automatically. All AIFs

were visually inspected by a neuroradiology expert (MAM, over 6

years of experience in perfusion imaging); only in two cases did the

automatically detected AIF need to be manually corrected.

For PEGASUS patients, DSC data were post-processed with the

PGui software (Version 1.0, provided for research purposes by the

Center for functional neuroimaging, Aarhus University, Denmark).

Motion correction was not available. Raw DSC images were used

to calculate perfusion maps of TTP from the tissue response curve.

Maps of CBF, CBV, MTT, and Tmax were created by deconvolution

of a regional concentration time curve with an AIF. Parametric de-

convolution was applied (Mouridsen et al. (2014)). For each patient,

an AIF was determined by a junior rater (JB, 2 years of experience in

perfusion imaging) by manual selection of three or four intravas-

cular voxels of the MCA M2 segment contralateral to the side of

stenosis minimizing partial volume effects and bolus delay. The AIF

shape was visually assessed for peak sharpness, bolus peak time and

amplitude width (Calamante (2013); Thijs et al. (2004)). The AIFs

were inspected by a senior rater (VIM, over 12 years of experience

in perfusion imaging).

The post-processed data was split into a training (acute stroke

data: 142, PEGASUS: 50 patients), validation (acute stroke data: 20,

PEGASUS: 8 patients) and test (acute stroke data: 41, PEGASUS:

12 patients) set. The models were trained on the respective training

set and the hyperparameters were selected based on the performance

on the validation set. The generalizable performance was estimated

by the performance on the test set. The acute stroke data was resized

to 21 slices each containing 128x128 voxels. The DSC source was

rescaled to 80 time points. All images of one parameter map as well

as the DSC source images were normalized between -1 and +1 and

split into slices.
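The intensity normalization described above can be sketched as follows. This is a minimal illustration, assuming a linear min-max rescaling; the function name `normalize_to_unit_range` is hypothetical, and the choice of which values share one minimum and maximum (for example, all images of one parameter map) is left to the caller.

```python
def normalize_to_unit_range(values):
    """Linearly rescale a flat list of intensities to the interval [-1, 1].

    Assumption: min-max scaling; the values passed in are those that should
    share one common minimum and maximum.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant input: map everything to 0 to avoid division by zero
        # (an assumed convention, not taken from the text).
        return [0.0 for _ in values]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


# Example: three intensities mapped onto [-1, 1]
print(normalize_to_unit_range([0, 5, 10]))
```

A range of [-1, 1] matches the output range of a tanh activation, which is the final activation of the generator described in Section 2.3.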

2.2 General methodological approach

We utilized a special type of AI model that was developed for gen-

erating an image based on the input of another image: the pix2pix

GAN (Isola et al. (2018)). A pix2pix GAN consists of two neural

networks that try to mislead each other. The first network, the gener-

ator, aims to produce realistic looking images based on another im-

age (e.g. produce a CT based on a MR image), whereas the second

network, the discriminator, tries to distinguish between the gener-

ated and real images. Based on the discriminator’s feedback, both

networks get better in their respective tasks.

Typically, the input and output to a pix2pix GAN generator is a 2D

image. For this use-case, we modified the pix2pix GAN to take a 3D

image (time sequence of the 2D DSC source image) as an input and

synthesize the corresponding 2D perfusion map slice (e.g. Tmax).

In this work we implemented two different generator architectures.

The first architecture, the classical pix2pix GAN, took in the 3D in-

put image without accounting for the temporal relation between the

images. In contrast to that, the second architecture, the temp-pix2pix

GAN, was designed to first extract the temporal relation between the

images followed by the transformation to the output image (see Fig-

ure 2). In the following, the technical details of the two approaches

are described in depth.

2.3 Network architecture

The GAN architecture was adapted from the pix2pix GAN (Isola

et al. (2018)). In our first architecture we utilized the original U-

Net generator as proposed in the paper with the time steps being

represented in the channels. For the second architecture we modified

the U-Net by adding 3D temporal convolutions before feeding the

result into the U-Net in the generator (see Figure 2).

Both GAN architectures consisted of two neural networks: the

generator G and the discriminator D. On the one hand, the genera-

tor’s task was to synthesize perfusion parameter maps such as Tmax

or CBF from the DSC source image. The discriminator, on the other

hand, learned to distinguish between the real DSC source image to-

gether with the real perfusion parameter map and the real DSC with

the generated perfusion parameter map.

In general, the objective function of a conditional GAN such as

the pix2pix GAN is:

$$\mathcal{L}_{cGAN}(G,D) = \mathbb{E}_{x,y}[\log D(x,y)] + \mathbb{E}_{x,z}[\log(1 - D(x,G(x,z)))] \tag{1}$$

where $x$ is the input image (DSC source in our case), $y$ the output

image (Tmax for example) and $z$ a noise vector. The generator tries

to minimize this objective, which is achieved when the discriminator

outputs a high probability of the generated image pair being real. In

contrast to that, the discriminator tries to maximize the objective by

assigning a high probability to the real image pair and a low probability

to the generated image pair. The pix2pix GAN does not directly

incorporate the noise vector $z$ but introduces noise in the network

using dropout in the generator.
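To make Eq. (1) concrete, the following sketch evaluates the objective for hypothetical discriminator outputs. The function name `cgan_objective` and the example probabilities are illustrative assumptions; expectations are approximated by batch means.

```python
import math


def cgan_objective(d_real, d_fake):
    """Batch estimate of Eq. (1).

    d_real: discriminator outputs D(x, y) for real pairs, each in (0, 1)
    d_fake: discriminator outputs D(x, G(x, z)) for generated pairs, each in (0, 1)
    """
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term


# Discriminator separates real from generated pairs well
separating = cgan_objective(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])
# Discriminator rates the generated pairs as real
fooled = cgan_objective(d_real=[0.9, 0.8], d_fake=[0.9, 0.8])
print(separating, fooled)
```

The objective is larger when the discriminator separates real from generated pairs and smaller when the generated pairs are rated as real.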

The loss of the generator consisted of two parts. The first part

was the adversarial loss which took into account the feedback of the

discriminator as described above. Additionally, a reconstruction loss

directly penalized deviation from the original image:

$$\mathrm{loss}_{L1} = \lVert y - G(x,z) \rVert_1 \tag{2}$$

This second loss was added to the adversarial loss and weighted by a

scalar $\lambda$ which was set to 1. The pix2pix generator was a U-Net with

6 down- and upsampling layers (see Figure 2B). One DSC source

slice at a time was fed as an input to the generator. The different

time points of the DSC were concatenated in the channel dimension.

Each downsampling layer consisted of a batch normalization layer

as well as a LeakyReLU with slope 0.2 and the upsampling layers of

ConvTranspose-layers, batch normalization and a ReLU activation.

After the last convolution, a tanh was applied.
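The combined generator loss described in this paragraph (adversarial term plus the $\lambda$-weighted reconstruction term of Eq. (2)) can be sketched as below. Two points are assumptions rather than details from the text: the adversarial term is written in the common non-saturating form $-\log D(x, G(x,z))$, and the L1 term is averaged over pixels instead of summed. All function names are hypothetical.

```python
import math


def l1_loss(target, generated):
    """Mean absolute difference between target and generated pixel values.

    Assumption: averaged over pixels; Eq. (2) states the plain L1 norm.
    """
    return sum(abs(t - g) for t, g in zip(target, generated)) / len(target)


def generator_loss(d_fake, target, generated, lam=1.0):
    """Adversarial term plus lam-weighted L1 reconstruction term.

    d_fake: discriminator outputs for generated pairs, each in (0, 1].
    Assumption: non-saturating adversarial term -log D(x, G(x, z)).
    """
    adversarial = -sum(math.log(d) for d in d_fake) / len(d_fake)
    return adversarial + lam * l1_loss(target, generated)


# Example with lam = 1, the weighting reported in the text
print(generator_loss(d_fake=[0.5], target=[1.0, -1.0], generated=[0.5, -0.5]))
```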

In contrast to that, the generator of the temp-pix2pix GAN took

one slice of the DSC source at all time points as an input. The time

sequence of slices was then fed through six 3D convolutions over the

time dimension, iteratively reducing this dimension to 1. Each convolutional

layer was followed by a batch normalization layer and a

LeakyReLU with slope 0.2. After the temporal path, the output was

fed into a 2D U-Net with convolutions over the spatial dimensions

with 6 down- and upsampling layers as described above (see Fig-

ure 2C).
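How six strided convolutions can collapse the time axis to a single step can be checked with the standard output-length formula for convolutions. The sketch below assumes a kernel size of 4, a stride of 2 and a padding of 1 along the convolved axis; the stride and padding values are assumptions for illustration, and the helper names are hypothetical.

```python
def conv_output_length(n, kernel=4, stride=2, padding=1):
    """Output length of a strided convolution along one axis (floor convention)."""
    return (n + 2 * padding - kernel) // stride + 1


def lengths_after_layers(n, num_layers, **conv_kwargs):
    """Axis length before the first layer and after each subsequent layer."""
    lengths = [n]
    for _ in range(num_layers):
        lengths.append(conv_output_length(lengths[-1], **conv_kwargs))
    return lengths


# Temporal path: 80 time points through six convolutions
temporal = lengths_after_layers(80, 6)
# Spatial U-Net encoder: 128 voxels per side through six downsampling layers
spatial = lengths_after_layers(128, 6)
print(temporal)
print(spatial)
```

Under these assumptions, 80 time points shrink to exactly one after six layers, which is consistent with the temporal path described above.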

Figure 2. Architecture of the pix2pix and temp-pix2pix GAN. A shows the overall GAN architecture whereas B and C depict the two different generators and

D the discriminator.

The discriminator adapted the architecture of the discriminator

from the PatchGAN as suggested by Isola et al. (2018). It con-

sisted of 3 convolutional layers with batch normalization and a

LeakyReLU activation function followed by another convolutional

layer and a sigmoid activation function (see Figure 2D). For both

the generator and discriminator the kernel size was 4 with strides of

2.4 Training

For each architecture, five GANs were trained on the acute stroke

dataset from Heidelberg, one for each of the five parameter maps (CBF,

CBV, MTT, Tmax and TTP). The models were trained for 100

epochs with a learning rate of 0.0001 for both generator and dis-

criminator using the Adam optimizer with β1=0.5 and β2=0.999.

The batch size was 4 and dropout 0. As the PEGASUS dataset was

smaller, the models trained on the acute stroke data served as a

weight initialization for the PEGASUS models and were then fur-

ther trained for 50 epochs. Thus, in total, 10 models were trained per

architecture.
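The training setup described above can be summarized as a small configuration sketch. The hyperparameter values are taken from the text; the dictionary layout, the key names and the function `training_runs` are hypothetical and do not reflect the structure of the published code.

```python
from itertools import product

PARAMETER_MAPS = ["CBF", "CBV", "MTT", "Tmax", "TTP"]

# Values from the text; the layout of this dictionary is an assumption.
BASE_CONFIG = {
    "optimizer": "Adam",
    "learning_rate": 1e-4,  # same for generator and discriminator
    "betas": (0.5, 0.999),
    "batch_size": 4,
    "dropout": 0.0,
}

# PEGASUS models start from the weights of the acute stroke models.
STAGES = [
    {"dataset": "acute_stroke", "epochs": 100, "init_from": None},
    {"dataset": "PEGASUS", "epochs": 50, "init_from": "acute_stroke"},
]


def training_runs(architecture):
    """Enumerate one run per (dataset stage, parameter map) combination."""
    runs = []
    for stage, parameter_map in product(STAGES, PARAMETER_MAPS):
        runs.append({"architecture": architecture,
                     "parameter_map": parameter_map,
                     **stage, **BASE_CONFIG})
    return runs


runs = training_runs("temp-pix2pix")
print(len(runs))
```

Enumerating the runs this way reproduces the count given in the text: two datasets times five parameter maps gives ten models per architecture.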

All hyperparameters mentioned above were tuned and selected

according to visual inspection and the performance on the valida-

tion set. Due to the computational limitations an automated search

was not feasible. The code was implemented in PyTorch and is publicly

available². The models were trained on a Tesla V100 GPU

(NVIDIA Corporation, Santa Clara, CA, USA).

2.5 Performance evaluation

The generated images were first visually inspected. Additionally,

four metrics were applied: the mean absolute error (MAE) or

L1 norm of the error, the normalized root mean squared error

(NRMSE), the structural similarity index measure (SSIM) and the

peak-signal-to-noise-ratio (PSNR).

The MAE is defined voxel-wise and measures the average absolute error

between the real image $y$ and the generated image $\hat{y}$ over all $n$ voxels:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i| \tag{3}$$

The NRMSE is defined as the root mean squared error normalized

by the average Euclidean norm of the true image $y$:

$$\mathrm{NRMSE} = \frac{\mathrm{RMSE}}{\sqrt{\frac{1}{n}\sum_{i=1}^{n} y_i^2}} \tag{4}$$

² https://github.com/prediction2020/DSC-to-perfusion

Figure 3. Synthesized perfusion parameter maps (middle and bottom row) compared to the ground truth reviewed by an expert (top row) for one representative

patient from the acute stroke test dataset. The perfusion parameter maps generated by the temp-pix2pix all look similar to the ground truth whereas the time-

dependent parameters (Tmax and TTP) are not well captured by the pix2pix GAN.

with

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{5}$$
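The three error metrics defined so far can be written down in a few lines. This is a minimal sketch on flat lists of voxel values; the function names are hypothetical, and the NRMSE normalization (root mean square of the true image) is an assumption about the intended form of Eq. (4).

```python
import math


def mae(y, y_hat):
    """Mean absolute error, Eq. (3)."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """Root mean squared error, Eq. (5)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def nrmse(y, y_hat):
    """RMSE divided by the root mean square of the true image y.

    Assumption: this corresponds to the normalization intended in Eq. (4).
    """
    return rmse(y, y_hat) / math.sqrt(sum(a * a for a in y) / len(y))


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 2.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred), nrmse(y_true, y_pred))
```

In the toy example a single voxel is off by 2, which the RMSE weights more strongly than the MAE does.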

The SSIM is defined as a combination of luminance, contrast and

structure and can be summed up as:

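$$\mathrm{SSIM}(y,\hat{y}) = \frac{(2\mu_y\mu_{\hat{y}} + c_1)(2\sigma_{y\hat{y}} + c_2)}{(\mu_y^2 + \mu_{\hat{y}}^2 + c_1)(\sigma_y^2 + \sigma_{\hat{y}}^2 + c_2)}, \tag{6}$$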

where $\mu_y$ and $\mu_{\hat{y}}$ are the average values of $y$ and $\hat{y}$ respectively, $\sigma_y^2$ and $\sigma_{\hat{y}}^2$

the variances and $\sigma_{y\hat{y}}$ the covariance. $c_1$ and $c_2$ are constants for stabilization

and defined as $c_1 = (k_1 L)^2$ and $c_2 = (k_2 L)^2$ with $L$ being the

dynamic range of the pixel values and $k_1, k_2 \ll 1$ small constants.

The higher the SSIM, the more similar are the two images with 1

denoting the highest similarity. The PSNR is defined as:

PSNR =10logMAXI

MSE (7)

with MAXIbeing the maximal possible pixel/voxel value. It de-

scribes the ratio between the maximal possible signal power and

noise power contained in the sample.
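A compact sketch of the two similarity metrics follows. Several choices here are assumptions: the SSIM is computed once over the whole image instead of over local windows, the constants $k_1 = 0.01$ and $k_2 = 0.03$ are commonly used defaults that the text does not specify, and the PSNR uses the common convention with the squared maximal value. Function names are hypothetical.

```python
import math


def ssim_global(y, y_hat, data_range, k1=0.01, k2=0.03):
    """SSIM of Eq. (6), evaluated once over all values (no sliding window)."""
    n = len(y)
    mu_y = sum(y) / n
    mu_h = sum(y_hat) / n
    var_y = sum((a - mu_y) ** 2 for a in y) / n
    var_h = sum((b - mu_h) ** 2 for b in y_hat) / n
    cov = sum((a - mu_y) * (b - mu_h) for a, b in zip(y, y_hat)) / n
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    numerator = (2 * mu_y * mu_h + c1) * (2 * cov + c2)
    denominator = (mu_y ** 2 + mu_h ** 2 + c1) * (var_y + var_h + c2)
    return numerator / denominator


def psnr(y, y_hat, max_value):
    """Peak signal-to-noise ratio in dB, using max_value squared over the MSE."""
    mse = sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


image = [-1.0, -0.5, 0.0, 0.5, 1.0]
print(ssim_global(image, image, data_range=2.0))
print(psnr([0.0, 0.0], [1.0, 1.0], max_value=1.0))
```

For images normalized to [-1, 1], the dynamic range passed as `data_range` is 2.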

3 RESULTS

Visual inspection of the results of the acute stroke dataset showed

that the perfusion parameter maps generated by the temp-pix2pix

GAN looked similar to the ground truth (see Figure 3). For the

pix2pix model, on the other hand, only the CBF, CBV and MTT

were of sufficient quality, whereas the time-dependent parameters

TTP and Tmax did not consistently resemble the ground truth (also

Figure 3).

The quantitative analysis in the acute stroke dataset revealed for

all parameter maps a high SSIM ranging from 0.92-0.99 for the

temp-pix2pix model (Figure 4). In contrast to this, the pix2pix GAN

showed a comparable or worse SSIM ranging from 0.86-0.98. A per-

formance difference between the pix2pix and temp-pix2pix model

was especially prominent for Tmax and TTP (SSIM 0.92 vs 0.86 and

0.95 vs 0.91, respectively). For the PEGASUS dataset, the perfusion

maps generated by both the fine-tuned pix2pix and temp-pix2pix

GAN look similar to the ground truth (see Figure 5). For both net-

works, MTT appeared to be the least well reconstructed parame-

ter map which is also reflected in the metrics (Figure 6). Further-

more, the high intensities of Tmax were not well captured by the

pix2pix GAN (Figure 5). The performance metrics of the pix2pix

and temp-pix2pix GAN and the ground truth for the PEGASUS

dataset showed low error and high SSIM and PSNR for CBF, CBV

and Tmax. Here, for most metrics, the temp-pix2pix GAN achieved

a slightly better performance in contrast to the pix2pix GAN. For

MTT and TTP the temp-pix2pix showed a better performance com-

pared to the pix2pix GAN (SSIM 0.84 vs 0.78 and 0.86 vs 0.82

respectively). Overall, the synthesized MTT and TTP maps performed

worse than the other parameter maps. Figure 7A shows, for each

dataset, the two patients whose generated parameter maps showed the

worst performance. For the acute stroke dataset these

are two Tmax maps (Figure 7A, first and second column). Whereas

Figure 4. Mean performance metrics for evaluating the similarity between the ground truth and the synthesized parameter maps generated by the pix2pix GAN

(green) and the temp-pix2pix GAN (blue) on the acute stroke dataset. A and B show the mean absolute error (MAE) and normalized root mean squared error

(NRMSE) respectively (the lower the better). C and D show the structural similarity index measure (SSIM) and the peak-signal-to-noise-ratio (PSNR) (the

higher the better). For all parameter maps the temp-pix2pix architecture shows a better or comparable performance compared to the pix2pix GAN. For the

time-dependent parameter maps Tmax and TTP the difference between the pix2pix and temp-pix2pix GAN performance is larger than for the other three maps.

The error bars represent the standard deviation.

the generated Tmax in the first column did not capture the high intensities

well, the generated map in the second column looked good visually.

For the PEGASUS models, MTT performed the worst (Figure 7A,

third and fourth column). In the third column the generated

MTT appeared less noisy than the ground truth. In contrast to that, in

the fourth column the generated MTT map looked noisier compared

to the ground truth. Figure 7B shows the Tmax maps generated by

the temp-pix2pix and pix2pix GAN for four patients for which an

AIF could not be placed.

4 DISCUSSION

In the present study, we propose a novel pix2pix GAN variant with

temporal convolutions - coined temp-pix2pix - to generate expert-

level perfusion parameter maps from DSC-MR images in an end-to-

end fashion for the first time. The temp-pix2pix architecture showed

high performance in a dataset of acute stroke patients and good per-

formance on data of patients with chronic steno-occlusive disease.

Our results mark a decisive step towards the automated generation

of expert-level DSC perfusion maps for acute stroke and their appli-

cation in the clinical setting.

In acute stroke, “time is brain” (Saver (2006)). This requires rapid

decision making in the clinical setting to ensure an optimal outcome

for an affected patient. In such a situation, when DSC perfusion-

weighted imaging is used to stratify patients for treatment, a ma-

jor bottleneck is the generation of parameter maps derived from the

DSC source images. These maps of TTP, CBF, CBV, MTT, and

Tmax are different representations of the information encoded in

the time-intensity curve for each voxel. For all except TTP, to de-

rive robust and valid parameter maps, the time-intensity curve must

be deconvolved with an AIF (Calamante (2013)). Ideally, the AIF

is derived for each voxel separately, but in the clinical setting the

calculation of a global AIF is preferred (Calamante (2013)). The

gold standard is the manual selection of several - usually 3 or 4 -

AIFs in the hemisphere contralateral to the stroke, from segments of

the middle cerebral artery (Calamante (2013)). The manual selection

of AIFs is a tedious and time-consuming process that can only be

performed after training (Calamante (2013)). Therefore, automated

methods whose results are subsequently reviewed by an expert are

preferred in clinical practice (Calamante (2013)). While automated

methods have shown inconclusive results in the literature (Hansen

et al. (2016); Ghodrati et al. (2021); Galinovic et al. (2012); Pistoc-

chi et al. (2022); Deutschmann et al. (2021)), they are successfully

used in acute stroke to identify stroke-affected tissue. In our sample,

this was confirmed as the AIFs only required expert adjustment in

two out of 204 patients in the acute stroke set. Nevertheless, this ap-

proach still requires a manual check resulting in a time delay of a few

minutes per patient before patient stratification. As a consequence,

there is a major clinical need for automated methods that provide

final perfusion parameter maps without any manual input. Here, we

chose a GAN-based approach because a model trained on expert-level

perfusion maps can afterwards generate maps at the same expert

level, implicitly encoding the choice of AIFs, within ∼1.8 seconds

per patient. Our exploratory results show that this approach was

successful.

This may have a positive impact on the clinical setting. First, it

would eliminate the need for manual review of AIFs. This would

reduce the time needed to calculate perfusion parameter maps and

also reduce resource requirements as radiologists and neurologists

would no longer need to be trained on how to identify optimal AIFs.

Second, as we have shown, it is even possible to calculate parameter

maps for patients who currently have to be excluded due to motion

artifacts that make it impossible for the standard software to calcu-

late the parameter maps. At this point, it is important to emphasize

Figure 5. Synthesized perfusion parameter maps (middle and bottom row) compared to the ground truth reviewed by an expert (top row) for one representative

patient from the PEGASUS test dataset. Both pix2pix and temp-pix2pix GAN synthesized most parameter maps that resemble the ground truth. Parts of MTT

were not entirely captured by pix2pix and temp-pix2pix. Moreover, the pix2pix GAN did not synthesize the higher intensities of Tmax well. For MTT and

Tmax, the temp-pix2pix GAN showed better performance in all metrics compared to the pix2pix GAN.

Figure 6. Mean performance metrics for evaluating the similarity between the ground truth and the synthesized parameter maps generated by the pix2pix GAN

(green) and the temp-pix2pix GAN (blue) on the PEGASUS dataset. A and B show the mean absolute error (MAE) and normalized root mean squared error

(NRMSE) respectively (the lower the better). C and D show the structural similarity index measure (SSIM) and the peak-signal-to-noise-ratio (PSNR) (the

higher the better). For most metrics and parameter maps the temp-pix2pix architecture shows a better performance compared to the pix2pix GAN. In terms of

the metrics, the generated MTT maps showed the worst performance. The error bars represent the standard deviation.

Figure 7. The two patients with the poorest performance according to the metrics for each of the two datasets (A) and patients for which no AIF could be

computed (B). A: The first and second column show Tmax for two acute stroke patients. Whereas the synthesized image in the first column does not fully

capture the hypoperfused areas, the generated image in the second column looks quite close to the ground truth. Column three and four show MTT for two

PEGASUS patients. While the generated image in the third column shows less noise than the ground truth, the GAN introduced noise in the fourth column in the

synthesized image. B: Four Tmax maps generated by temp-pix2pix (upper row) and pix2pix (lower row) for cases from the acute stroke data for which no AIF

could be computed and, thus, no imaging would be available with conventional methods. Note that since motion artifacts affect the quality of the time-series,

in these cases the baseline pix2pix performs better than the temp-pix2pix.

that our study is exploratory and the generated model was and is only

used for internal research purposes. This is due to the fact that the

generative AI has fundamentally learned to approximate the non-AI

algorithm that was originally used to calculate the perfusion parame-

ter maps. To maximize clinical impact, we thus encourage the devel-

opers and vendors of relevant clinically used perfusion software to

consider adding GAN-based automated perfusion calculation mod-

ules to their products. To facilitate this process, we have made our

code publicly available.

One of the most important contributions of our approach was the

consideration of the temporal dimension of the time series input. Not

surprisingly, the temp-pix2pix architecture performed better than the

pix2pix GAN without a temporal component in both datasets. This

was particularly noticeable in the acute stroke dataset for parame-

ters directly related to the correct order of the time intensity curve,

namely TTP and Tmax. Maps of CBF, CBV and MTT (derived by

the central volume theorem as CBV/CBF) also performed quite well

in the baseline architecture without a temporal component, as for

these maps the order of input is not as relevant. This is because

CBV corresponds to the area under the time intensity curve and

CBF is calculated based on the height of the slope, which are indifferent

to the order. In the PEGASUS dataset of patients with chronic

steno-occlusive disease, the temp-pix2pix

also outperformed the baseline GAN without a temporal component.

However, the difference in performance was not as pronounced as in

the acute stroke dataset. This could be due to the fact that patients

with acute vascular obstruction usually have significantly higher delays

than patients with chronic steno-occlusive disease, and the performance

advantage of temp-pix2pix increases with increasing delay. It is

noteworthy that, in contrast to the acute stroke patients, in the chronic

steno-occlusive cohort MTT and TTP maps performed

worse than the other parameter maps. This might be related to the

more complex perfusion pathophysiology in chronic steno-occlusive

disease. Whereas in acute stroke, delay is the main contributor to

blood flow abnormalities, in chronic steno-occlusive disease it is the

sum of delay and considerable dispersion due to vessel abnormalities

(Calamante et al. (2006)). This could pose particular difficulties for

neural networks to learn the relationships required to create parameter

maps: MTT is a parameter that depends on two other parameters

(CBV and CBF) in the original software solutions, which are

likely to have greater variability in chronic steno-occlusive disease.

Additionally, TTP delays are attributable to both delay and dispersion,

with varying weights in individual patients, leading again to a

larger variability (this effect is much less pronounced in Tmax pa-

rameter maps due to the deconvolution procedure). Such increased

variability might lead to less stable models and thus increased noise

in the generated maps.
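The order argument made earlier in this paragraph (CBV as the area under the curve versus TTP as the timing of the peak) can be illustrated with a toy curve. The curve values and function names below are hypothetical, and the area is approximated by a simple rectangle sum, so this is only a sketch of the reasoning and not the perfusion software's computation.

```python
def area_under_curve(curve, dt=1.0):
    """Rectangle-rule area under a sampled curve; independent of sample order."""
    return sum(curve) * dt


def time_to_peak(curve, dt=1.0):
    """Time of the first maximum of a sampled curve; depends on sample order."""
    return curve.index(max(curve)) * dt


# Hypothetical concentration-time curve with its peak at the third sample
curve = [0.0, 0.1, 1.0, 0.8, 0.6, 0.3, 0.1]
# The same samples in reversed temporal order
reordered = curve[::-1]

print(area_under_curve(curve), area_under_curve(reordered))
print(time_to_peak(curve), time_to_peak(reordered))
```

Reordering the samples leaves the area unchanged (up to floating-point rounding) but moves the peak time, which mirrors why timing-related maps depend more strongly on the temporal order of the input.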

Our work is the first to utilize GANs to create perfusion

parameter maps in DSC imaging. A few works exist that used different

machine learning and deep learning methods to generate perfusion

parameter maps from the DSC source image. For instance,

McKinley et al. (2018) used several classical voxel-wise machine

learning approaches to generate manually validated perfusion pa-

rameter maps and identified a tree-based algorithm as the best per-

forming model. Their best results for Tmax achieved a lower per-

formance with a NRMSE of 0.113 compared to our best model with

a NRMSE of 0.095. Vialard et al. (2021) suggested a deep learn-

ing based spatiotemporal U-net approach for translating DSC-MR

patches to CBV maps in patients with brain tumors. With a SSIM

of 0.821 their generated CBV maps obtained a worse performance

compared to our CBV generated by the temp-pix2pix model with

a SSIM of 0.986. In the field of stroke, Ho et al. (2016) proposed

a patch-based deep learning approach to generate CBF, CBV, MTT

and Tmax. The average RMSE for their generated Tmax showed a

higher error of 1.33 compared to ours with 0.06. Hess et al. (2019)

utilized a different voxel-wise deep learning approach to approxi-

mate Tmax from DSC-MR. This approach was clinically evaluated

in another study (Meier et al. (2019)). In Hess et al. (2019) they

reported the performance in terms of MAE with clipping to not ac-

count for noise. Their generated Tmax achieved a MAE with clip-

ping of 0.524 compared to our approach showing a MSE of 0.016.

These differences compared to our study might be due to the novel

use of the GAN method and the fact that our model considered whole

slices instead of patches to better account for the spatial dimension.
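To make the metrics named in this comparison concrete, the following is a minimal pure-Python sketch of MSE, RMSE, NRMSE, and MAE with optional clipping. The function names are hypothetical, the normalization of the NRMSE by the value range of the reference is only one of several conventions, and the clipping shown here is an assumption about how such a metric could be defined, not the procedure of any cited study:

```python
import math


def mse(reference, generated):
    """Mean squared error over paired voxel values."""
    return sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)


def rmse(reference, generated):
    """Root mean squared error."""
    return math.sqrt(mse(reference, generated))


def nrmse(reference, generated):
    """RMSE divided by the value range of the reference (assumed convention)."""
    value_range = max(reference) - min(reference)
    return rmse(reference, generated) / value_range


def mae(reference, generated, clip=None):
    """Mean absolute error; optionally clip both inputs to (low, high) first."""
    if clip is not None:
        low, high = clip
        reference = [min(max(v, low), high) for v in reference]
        generated = [min(max(v, low), high) for v in generated]
    return sum(abs(r - g) for r, g in zip(reference, generated)) / len(reference)


# Made-up voxel values for illustration only.
reference_map = [0.0, 2.0, 4.0, 6.0]
generated_map = [0.0, 2.0, 4.0, 8.0]

example_mse = mse(reference_map, generated_map)
example_nrmse = nrmse(reference_map, generated_map)
example_mae = mae(reference_map, generated_map)
example_clipped_mae = mae(reference_map, generated_map, clip=(0.0, 6.0))
```

Each metric and each normalization lives on its own scale, so values are only directly comparable when the same metric and the same preprocessing are used on both sides.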

Our study has several limitations. First, our network was based on

2D slices instead of the full 3D volumes due to computational restrictions.

It is likely that results could be improved further using the full

3D images. Secondly, our study is an exploratory, hypothesis-generating

study. Its results need to be clinically validated in a future

study before integration into clinical practice would be possible.

Lastly, our approach so far is a black-box approach. It could be extended

with explainable AI to generate insights into which areas in the

source images are particularly relevant for the creation of different

perfusion parameter maps. This could further elucidate the causes

of the performance differences between maps that we identified and

could guide the way for further improvements.

5 CONCLUSION

We generated expert-level perfusion parameter maps using a novel

GAN approach showcasing that AI approaches might have the abil-

ity to overcome the need for oversight by medical experts. Our ex-

ploratory study paves the way for fully-automated DSC-MR pro-

cessing for faster patient stratification in acute stroke. In the clinical

setting where time is crucial for patient outcome, this could have a

big impact on standardized patient care in acute stroke.

DISCLOSURES

Tabea Kossen reported receiving personal fees from ai4medicine

outside the submitted work. Dr Madai reported receiving personal

fees from ai4medicine outside the submitted work. Adam Hilbert re-

ported receiving personal fees from ai4medicine outside the submit-

ted work. While not related to this work, Dr Sobesky reports receipt

of speakers’ honoraria from Pfizer, Boehringer Ingelheim, and Dai-

ichi Sankyo. Dr Frey reported receiving grants from the European

Commission, reported receiving personal fees from and holding an

equity interest in ai4medicine outside the submitted work.

ACKNOWLEDGEMENTS

Computation has been performed on the HPC for the Research clus-

ter of the Berlin Institute of Health.

REFERENCES

I. Ben Alaya, H. Limam, and T. Kraiem. Applications of artificial intel-

ligence for DWI and PWI data processing in acute ischemic stroke:

Current practices and future directions. Clinical Imaging, 81:79–86,

Jan. 2022. ISSN 0899-7071. doi:10.1016/j.clinimag.2021.09.015.

URL https://www.sciencedirect.com/science/article/pii/

S0899707121003880.

E. Berge, W. Whiteley, H. Audebert, G. De Marchis, A. C. Fonseca,

C. Padiglioni, N. Pérez de la Ossa, D. Strbian, G. Tsivgoulis, and

G. Turc. European Stroke Organisation (ESO) guidelines on intravenous

thrombolysis for acute ischaemic stroke. European Stroke Journal, 6(1):

I–LXII, Mar. 2021. ISSN 2396-9873. doi:10.1177/2396987321989865.

URL https://doi.org/10.1177/2396987321989865. Publisher:

SAGE Publications.

K. N. D. Brou Boni, J. Klein, L. Vanquin, A. Wagner, T. Lacornerie,

D. Pasquier, and N. Reynaert. MR to CT synthesis with multicenter data

in the pelvic area using a conditional generative adversarial network.

Physics in Medicine & Biology, 65(7):075002, Apr. 2020. ISSN 1361-

6560. doi:10.1088/1361-6560/ab7633. URL https://iopscience.

iop.org/article/10.1088/1361-6560/ab7633.

F. Calamante. Arterial input function in perfusion MRI: A comprehensive

review. Progress in Nuclear Magnetic Resonance Spectroscopy, 74:

1–32, Oct. 2013. ISSN 0079-6565. doi:10.1016/j.pnmrs.2013.04.002.

URL https://www.sciencedirect.com/science/article/pii/

S0079656513000514.

F. Calamante, L. Willats, D. G. Gadian, and A. Connelly. Bolus delay and

dispersion in perfusion MRI: implications for tissue predictor models in

stroke. Magnetic Resonance in Medicine, 55(5):1180–1185, May 2006.

ISSN 0740-3194. doi:10.1002/mrm.20873.

W. A. Copen, P. W. Schaefer, and O. Wu. MR Perfusion Imaging in Acute

Ischemic Stroke. Neuroimaging clinics of North America, 21(2):259–

283, May 2011. ISSN 1052-5149. doi:10.1016/j.nic.2011.02.007. URL

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3135980/.

H. Deutschmann, N. Hinteregger, U. Wießpeiner, M. Kneihsl, S. Fandler-

Höfler, M. Michenthaler, C. Enzinger, E. Hassler, S. Leber, and

G. Reishofer. Automated MRI perfusion-diffusion mismatch estima-

tion may be significantly different in individual patients when using dif-

ferent software packages. European Radiology, 31(2):658–665, Feb.

2021. ISSN 1432-1084. doi:10.1007/s00330-020-07150-8. URL

https://doi.org/10.1007/s00330-020-07150-8.


I. Galinovic, A.-C. Ostwaldt, C. Soemmer, H. Bros, B. Hotter, P. Bru-

necker, and J. B. Fiebach. Automated vs manual delineations of re-

gions of interest- a comparison in commercially available perfusion MRI

software. BMC Medical Imaging, 12(1):16, July 2012. ISSN 1471-

2342. doi:10.1186/1471-2342-12-16. URL https://doi.org/10.

1186/1471-2342-12-16.

V. Ghodrati, M. Bydder, A. Bedayat, A. Prosper, T. Yoshida, K.-L. Nguyen,

J. P. Finn, and P. Hu. Temporally aware volumetric generative ad-

versarial network-based MR image reconstruction with simultaneous

respiratory motion compensation: Initial feasibility in 3D dynamic cine

cardiac MRI. Magnetic Resonance in Medicine, 86(5):2666–2683,

2021. ISSN 1522-2594. doi:10.1002/mrm.28912. URL https:

//onlinelibrary.wiley.com/doi/abs/10.1002/mrm.28912.


I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley,

S. Ozair, A. Courville, and Y. Bengio. Generative Adversarial Networks.

arXiv:1406.2661 [cs, stat], June 2014. URL http://arxiv.org/abs/

1406.2661. arXiv: 1406.2661.

M. Goyal, B. K. Menon, W. H. van Zwam, D. W. J. Dippel, P. J. Mitchell,

A. M. Demchuk, A. Dávalos, C. B. L. M. Majoie, A. van der Lugt, M. A.

de Miquel, G. A. Donnan, Y. B. W. E. M. Roos, A. Bonafe, R. Jahan,

H.-C. Diener, L. A. van den Berg, E. I. Levy, O. A. Berkhemer, V. M.

Pereira, J. Rempel, M. Millán, S. M. Davis, D. Roy, J. Thornton, L. S.

Román, M. Ribó, D. Beumer, B. Stouch, S. Brown, B. C. V. Campbell,

R. J. van Oostenbrugge, J. L. Saver, M. D. Hill, T. G. Jovin, and HER-

MES collaborators. Endovascular thrombectomy after large-vessel is-

chaemic stroke: a meta-analysis of individual patient data from five ran-

domised trials. Lancet (London, England), 387(10029):1723–1731, Apr.

2016. ISSN 1474-547X. doi:10.1016/S0140-6736(16)00163-X.

M. B. Hansen, K. Nagenthiraja, L. R. Ribe, K. H. Dupont,

L. Østergaard, and K. Mouridsen. Automated estimation of

salvageable tissue: Comparison with expert readers. Jour-

nal of Magnetic Resonance Imaging, 43(1):220–228, 2016.

ISSN 1522-2586. doi:10.1002/jmri.24963. URL https:

//onlinelibrary.wiley.com/doi/abs/10.1002/jmri.24963.


A. Hess, R. Meier, J. Kaesmacher, S. Jung, F. Scalzo, D. Liebeskind,

R. Wiest, and R. McKinley. Synthetic Perfusion Maps: Imaging Per-

fusion Deficits in DSC-MRI with Deep Learning. In A. Crimi, S. Bakas,

H. Kuijf, F. Keyvan, M. Reyes, and T. van Walsum, editors, Brainlesion:

Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, Lec-

ture Notes in Computer Science, pages 447–455, Cham, 2019. Springer

International Publishing. ISBN 978-3-030-11723-8. doi:10.1007/978-3-

030-11723-8_45.

K. C. Ho, F. Scalzo, K. V. Sarma, S. El-Saden, and C. W. Arnold. A temporal

deep learning approach for MR perfusion parameter estimation in stroke.

In 2016 23rd International Conference on Pattern Recognition (ICPR),

pages 1315–1320, Cancun, Dec. 2016. IEEE. ISBN 978-1-5090-4847-2.

doi:10.1109/ICPR.2016.7899819. URL http://ieeexplore.ieee.

org/document/7899819/.

P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. Image-to-Image Transla-

tion with Conditional Adversarial Networks. arXiv:1611.07004 [cs],

Nov. 2018. URL http://arxiv.org/abs/1611.07004. arXiv:

1611.07004.

G.-H. Jahng, K.-L. Li, L. Ostergaard, and F. Calamante. Perfusion Mag-

netic Resonance Imaging: A Comprehensive Update on Principles and

Techniques. Korean Journal of Radiology, 15(5):554, 2014. ISSN 1229-

6929, 2005-8330. doi:10.3348/kjr.2014.15.5.554. URL https://www.

kjronline.org/DOIx.php?id=10.3348/kjr.2014.15.5.554.

C. Krusche, C. Rio Bartulos, M. Abu-Mugheisib, M. Haimerl, and P. Wigger-

mann. Dynamic perfusion analysis in acute ischemic stroke: A compara-

tive study of two different softwares. Clinical Hemorheology and Micro-

circulation, 79(1):55–63, Jan. 2021. ISSN 1386-0291. doi:10.3233/CH-

219106. URL https://content.iospress.com/articles/

clinical-hemorheology-and-microcirculation/ch219106.

Publisher: IOS Press.

C.-H. Lin, J. L. Saver, B. Ovbiagele, W.-Y. Huang, and M. Lee. Endovas-

cular thrombectomy without versus with intravenous thrombolysis in

acute ischemic stroke: a non-inferiority meta-analysis of randomized

clinical trials. Journal of NeuroInterventional Surgery, 14(3):227–232,

Mar. 2022. ISSN 1759-8478, 1759-8486. doi:10.1136/neurintsurg-

2021-017667. URL https://jnis.bmj.com/content/14/3/227.

Publisher: British Medical Journal Publishing Group Section: Ischemic

stroke.

A. S. Lundervold and A. Lundervold. An overview of deep

learning in medical imaging focusing on MRI. Zeitschrift

für Medizinische Physik, 29(2):102–127, May 2019. ISSN

0939-3889. doi:10.1016/j.zemedi.2018.11.002. URL

https://www.sciencedirect.com/science/article/pii/

S0939388918301181.

M. McDermott, L. E. Skolarus, and J. F. Burke. A systematic re-

view and meta-analysis of interventions to increase stroke throm-

bolysis. BMC Neurology, 19(1):86, May 2019. ISSN 1471-2377.

doi:10.1186/s12883-019-1298-2. URL https://doi.org/10.1186/

s12883-019-1298-2.

R. McKinley, F. Hung, R. Wiest, D. S. Liebeskind, and F. Scalzo. A Ma-

chine Learning Approach to Perfusion Imaging With Dynamic Suscep-

tibility Contrast MR. Frontiers in Neurology, 9, 2018. ISSN 1664-2295.

doi:10.3389/fneur.2018.00717. URL https://www.frontiersin.

org/articles/10.3389/fneur.2018.00717/full. Publisher:

Frontiers.

R. Meier, P. Lux, B. Med, S. Jung, U. Fischer, J. Gralla, M. Reyes, R. Wiest,

R. McKinley, and J. Kaesmacher. Neural Network–derived Perfusion

Maps for the Assessment of Lesions in Patients with Acute Ischemic

Stroke. Radiology: Artificial Intelligence, 1(5):e190019, Sept. 2019.

doi:10.1148/ryai.2019190019. URL https://pubs.rsna.org/doi/

full/10.1148/ryai.2019190019. Publisher: Radiological Society

of North America.

K. Mouridsen, M. B. Hansen, L. Østergaard, and S. N. Jespersen. Reliable

Estimation of Capillary Transit Time Distributions Using DSC-MRI.

Journal of Cerebral Blood Flow & Metabolism, 34(9):1511–1521, Sept.

2014. ISSN 0271-678X. doi:10.1038/jcbfm.2014.111. URL https:

//doi.org/10.1038/jcbfm.2014.111. Publisher: SAGE Publica-

tions Ltd STM.

M. A. Mutke, V. I. Madai, F. C. von Samson-Himmelstjerna, O. Zaro Weber,

G. S. Revankar, S. Z. Martin, K. L. Stengl, M. Bauer, S. Hetzer, M. Gün-

ther, and J. Sobesky. Clinical evaluation of an arterial-spin-labeling prod-

uct sequence in steno-occlusive disease of the brain. PloS One, 9(2):

e87143, 2014. ISSN 1932-6203. doi:10.1371/journal.pone.0087143.

D. Nie, R. Trullo, J. Lian, L. Wang, C. Petitjean, S. Ruan, Q. Wang,

and D. Shen. Medical Image Synthesis with Deep Convolu-

tional Adversarial Networks. IEEE Transactions on Biomedi-

cal Engineering, 65(12):2720–2730, Dec. 2018. ISSN 1558-2531.

doi:10.1109/TBME.2018.2814538. Conference Name: IEEE Transac-

tions on Biomedical Engineering.

S. Pistocchi, D. Strambo, B. Bartolini, P. Maeder, R. Meuli, P. Michel,

and V. Dunet. MRI software for diffusion-perfusion mismatch

analysis may impact on patients’ selection and clinical outcome.

European Radiology, 32(2):1144–1153, Feb. 2022. ISSN 1432-

1084. doi:10.1007/s00330-021-08211-2. URL https://doi.org/10.

1007/s00330-021-08211-2.

B. Rehani, S. G. Ammanuel, Y. Zhang, W. Smith, D. L. Cooke, S. W. Hetts,

S. A. Josephson, A. Kim, J. C. Hemphill, and W. Dillon. A New Era

of Extended Time Window Acute Stroke Interventions Guided by Imag-

ing. The Neurohospitalist, 10(1):29–37, Jan. 2020. ISSN 1941-8744.

doi:10.1177/1941874419870701. URL https://doi.org/10.1177/

1941874419870701. Publisher: SAGE Publications Inc.

J. L. Saver. Time Is Brain—Quantified. Stroke, 37(1):263–266,

Jan. 2006. doi:10.1161/01.STR.0000196957.55928.ab. URL

https://www.ahajournals.org/doi/full/10.1161/01.STR.

0000196957.55928.ab. Publisher: American Heart Association.

A. Sharobeam and B. Yan. Advanced imaging in acute ischemic

stroke: an updated guide to the hub-and-spoke hospitals. Cur-

rent Opinion in Neurology, 35(1):24–30, Feb. 2022. ISSN 1350-

7540. doi:10.1097/WCO.0000000000001020. URL https:

//journals.lww.com/co-neurology/Fulltext/2022/02000/


Advanced_imaging_in_acute_ischemic_stroke__an.6.aspx.

V. N. Thijs, D. M. Somford, R. Bammer, W. Robberecht, M. E. Moseley, and

G. W. Albers. Influence of Arterial Input Function on Hypoperfusion

Volumes Measured With Perfusion-Weighted Imaging. Stroke, 35(1):

94–98, Jan. 2004. doi:10.1161/01.STR.0000106136.15163.73. URL

https://www.ahajournals.org/doi/full/10.1161/01.STR.

0000106136.15163.73. Publisher: American Heart Association.

G. Turc, P. Bhogal, U. Fischer, P. Khatri, K. Lobotesis, M. Mazighi, P. D.

Schellinger, D. Toni, J. de Vries, P. White, and J. Fiehler. European

Stroke Organisation (ESO) – European Society for Minimally Invasive

Neurological Therapy (ESMINT) Guidelines on Mechanical Thrombectomy

in Acute Ischaemic Stroke Endorsed by Stroke Alliance for Europe

(SAFE). European Stroke Journal, 4(1):6–12, Mar. 2019. ISSN 2396-

9873. doi:10.1177/2396987319832140. URL https://doi.org/10.

1177/2396987319832140. Publisher: SAGE Publications.

J. V. Vialard, M.-M. Rohé, P. Robert, F. Nicolas, and A. Bône. Going be-

yond voxel-wise deconvolution in perfusion MRI: learning and leverag-

ing spatio-temporal regularities with the stU-Net. page 6, 2021.

M. N. Wernick, Y. Yang, J. G. Brankov, G. Yourganov, and S. C.

Strother. Machine Learning in Medical Imaging. IEEE Signal

Processing Magazine, 27(4):25–38, July 2010. ISSN 1558-0792.

doi:10.1109/MSP.2010.936730. Conference Name: IEEE Signal Pro-

cessing Magazine.

X. Yi, E. Walia, and P. Babyn. Generative Adversarial Network in Med-

ical Imaging: A Review. Medical Image Analysis, 58:101552, Dec.

2019. ISSN 13618415. doi:10.1016/j.media.2019.101552. URL http:

//arxiv.org/abs/1809.07294. arXiv: 1809.07294.

J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired Image-to-Image Trans-

lation using Cycle-Consistent Adversarial Networks. arXiv:1703.10593

[cs], Aug. 2020. URL http://arxiv.org/abs/1703.10593. arXiv:

1703.10593.


Discussion

8.1 Summary

This thesis aimed to investigate the opportunities and challenges of GANs for image synthesis

in the field of stroke. For this, we focused on two main topics that divided this thesis into

two parts. In the first part (Chapter 4–6), we examined the synthesis of stroke images for

data sharing, while in the second part, we performed image-to-image translation by extracting

perfusion maps from DSC-MRI for fast patient stratification.

In the first part, we showed that 2D synthesized data could preserve the essential predictive

properties for brain vessel segmentation. A segmentation network trained on our synthetic

data achieved a high Dice Similarity Coefficient of 0.85 on real test data (Chapter 4). In a

transfer learning approach, we simulated sharing our synthetic data by pre-initializing the

weights from a model trained on synthetic data and evaluating the performance of this model

on a second dataset. We fine-tuned this network with increasing amounts of new data and

showed that our fine-tuned network outperforms the segmentation networks trained on the

new dataset alone. Furthermore, our fine-tuned model needed less newly annotated data than

a model trained from scratch to achieve a comparable segmentation performance, showcasing

the potential benefit of sharing synthetic data.
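For readers less familiar with the metric, the Dice Similarity Coefficient reported above measures the overlap between a predicted and a reference segmentation mask. A minimal pure-Python sketch on flat binary masks follows; the function name and the toy masks are hypothetical, and treating two empty masks as a perfect match is an assumed convention:

```python
def dice_coefficient(prediction, target):
    """Dice Similarity Coefficient for two flat binary masks (0/1 values)."""
    if len(prediction) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(p * t for p, t in zip(prediction, target))
    total = sum(prediction) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy masks for illustration only (1 = vessel voxel, 0 = background).
predicted_mask = [1, 1, 0, 0, 1, 0]
reference_mask = [1, 0, 0, 0, 1, 1]
score = dice_coefficient(predicted_mask, reference_mask)
```

A value of 1 corresponds to identical masks and a value of 0 to masks without any overlap.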

Building on these encouraging results, we extended our GAN architectures to generate

high-resolution 3D volumes in order to exploit all three dimensions and spatial relations within

the medical images. The change to 3D substantially increased the computational load, which

we managed by implementing the two timescale update rule as well as by introducing mixed

precision training. Compared to Chapter 4, we increased the number of generated voxels

by a factor of roughly 100, whereas the number of filters per layer was halved for our best

performing model. Even with these restrictions, the segmentation model trained on our 3D

volumes performed comparably to our 2D model on real data, indicating a benefit

of including the third dimension. Additionally, we extended the assessment of the synthetic

images giving a more in-depth evaluation by computing precision, recall, and the FID using

the activations of the pre-trained MedicalNet [95].
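For reference, the FID compares the mean and covariance of feature activations for real and synthetic images. In its usual formulation, with $\mu_r, \Sigma_r$ and $\mu_s, \Sigma_s$ denoting the activation statistics of real and synthetic images (the symbols are chosen here for illustration), it reads:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_s \rVert_2^2
+ \operatorname{Tr}\!\left(\Sigma_r + \Sigma_s - 2\left(\Sigma_r \Sigma_s\right)^{1/2}\right)
```

Lower values indicate more similar activation statistics between the two image sets.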


After verifying that synthetic TOF-MRA patches still retain predictive properties for

our use case of brain vessel segmentation, we wanted to investigate their degree of privacy.

Previous research has suggested that artificially created datasets are not necessarily private as

ML models such as GANs are susceptible to membership inference attacks. A successful attack

could jeopardize the privacy of the patients that were used to train the GAN. To provide

an upper bound on the patient’s privacy leakage, the mathematical concept of differential

privacy was introduced. Differential privacy can be integrated into the training of a GAN and

quantify the privacy, while reducing the vulnerability of the model to membership inference

attacks. In Chapter 6, we implemented differential privacy into our 2D GAN architecture

from Chapter 4 and explored the impact of privacy restrictions on the quality of the generated

images. Moreover, we tested the usability of the newly created images for varying levels

of privacy in the brain vessel segmentation task. We could identify a good Dice Similarity

Coefficient of 0.75 for an acceptable privacy bound of ϵ = 7.4 and identified a threshold of

ϵ < 5 for which the images became unusable.
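As a reminder of what the privacy bound expresses, the standard definition of differential privacy can be stated as follows. A randomized mechanism $\mathcal{M}$ is $(\epsilon, \delta)$-differentially private if, for all datasets $D$ and $D'$ that differ in a single record and for all sets of outputs $S$ (the notation is chosen here for illustration; $\delta$ is a small failure probability, and $\delta = 0$ gives the pure form):

```latex
\Pr\left[\mathcal{M}(D) \in S\right]
\le e^{\epsilon} \, \Pr\left[\mathcal{M}(D') \in S\right] + \delta
```

Smaller values of $\epsilon$ therefore correspond to a stronger guarantee, since the output distribution may change less when one patient's data is added or removed.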

Taken together, we could show that GANs are able to generate 2D, 3D, and privacy-

preserving medical images for brain vessel segmentation. There are still computational

obstacles when training GANs, especially when training them on high-dimensional data such

as large high-resolution 3D images. Nevertheless, if the computational setup allows for it,

generating 3D images might be beneficial for the downstream task. Moreover, introducing

differential privacy into GAN training is feasible for synthesizing 2D image patches. We

showed that the synthetic privacy-preserving patches still maintained the predictive properties

necessary for our segmentation task. However, the implementation of differential privacy

came along with a performance drop when utilizing the synthetic, privacy-preserving data in a

downstream ML model.

In the second part of the thesis, we showcased another application of GANs for medical

image synthesis, i.e., an image-to-image translation. We developed a pix2pix GAN variant for

automating the extraction of perfusion maps from DSC-MRI scans for treatment planning in

stroke (see Chapter 7). To this end, we introduced temporal convolutions into the generator’s

architecture to account for the temporal dimension. Our GAN variant achieved excellent

performance on acute stroke patient data and good performance on data of patients with

cerebrovascular disease. Notably, we could even generate perfusion images for DSC-MRI scans

with motion artifacts, for which conventional approaches have failed. Our results pave the way

for fully-automated translation of DSC-MRI to perfusion maps for fast patient stratification in

acute ischemic stroke.

8.2 Discussion and Outlook

8.2.1 Synthesis of Medical Images for Data Sharing

We have demonstrated that we can utilize GANs for generating 2D, 3D, and privacy-preserving

TOF-MRA image patches for the ultimate goal of data sharing. Comparing our 2D and 3D

approaches, we could show that 3D information incorporated into GAN architectures might

be beneficial for the downstream task, especially when large computing power is available

(see Chapter 5). Further studies should investigate whether our findings generalize to similar


datasets as well as to different MR sequences and modalities. Synthesizing images based

on patients from different cohorts and scanners could also increase the synthetic data’s

heterogeneity.

In the context of data sharing, it would be particularly interesting to test our GAN

architectures (Chapter 4–6) on other neuroimaging data because brain images are especially

sensitive. Since only a few studies have synthesized medical images using differentially private

GAN architectures, it would be interesting to see whether our findings regarding the privacy-utility

trade-off generalize. A more detailed investigation of this trade-off on different images would

allow for a better understanding of the parameter ϵ and the associated performance drop with

decreasing ϵ. In addition, future studies could investigate whether the observed performance

drop for differentially private GANs holds true when including more images for training. In a

study, Xu et al. have already shown on non-medical images that a larger dataset utilized for

training a GAN with the same privacy budget led to better performing synthetic images [130].

To facilitate the training of our architectures on other data, we have made our code for each

study publicly available.
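One widely used way to enforce a privacy budget during training is differentially private stochastic gradient descent: each per-sample gradient is clipped to a fixed norm, the clipped gradients are summed, Gaussian noise scaled to the clipping norm is added, and the result is averaged. The following pure-Python sketch illustrates only this aggregation step. The function name, parameter names, and values are hypothetical, it is not the implementation used in this thesis, and it does not compute ϵ, which requires a separate privacy accountant:

```python
import math
import random


def private_gradient(per_sample_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-sample gradient, sum, add Gaussian noise, and average."""
    dim = len(per_sample_grads[0])
    summed = [0.0] * dim
    for grad in per_sample_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        # Scale down gradients whose L2 norm exceeds the clipping bound.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, g in enumerate(grad):
            summed[i] += g * scale
    # Noise standard deviation is tied to the clipping bound.
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_sample_grads) for v in noisy]


rng = random.Random(0)
# Made-up per-sample gradients for illustration only.
grads = [[3.0, 4.0], [0.3, 0.4]]
clipped_mean = private_gradient(grads, clip_norm=1.0, noise_multiplier=0.0, rng=rng)
noisy_mean = private_gradient(grads, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
```

For a fixed number of training steps, a larger noise multiplier generally corresponds to a smaller ϵ and thus stronger privacy, at the cost of noisier updates. This is one way to picture the privacy-utility trade-off discussed above.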

Another related aspect of privacy-preserving synthetic data is membership inference

attacks [131]. These attacks aim to identify whether a data sample was part of the training set

for an ML model and are a potential privacy breach for generative models [68]. Future studies

could investigate the direct effect of differential privacy in GAN training on the success rate of

those attacks in the neuroimaging domain. While this effect has already been investigated

on non-medical images [68, 130], studies investigating the accuracy of membership inference

attacks on GANs synthesizing medical images have yet to be performed. Such studies would

help to better translate the value of ϵ into a probability estimation for re-identifying patient

data with state-of-the-art attacks.
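To illustrate what the success rate of such an attack means, the sketch below evaluates a very simple threshold-based membership inference attack: a sample is guessed to be a training member if a model-derived score exceeds a threshold. The function name and the scores are hypothetical and do not stem from any experiment in this thesis; real attacks are considerably more elaborate:

```python
def attack_accuracy(member_scores, non_member_scores, threshold):
    """Fraction of samples whose membership a threshold attack guesses correctly."""
    correct_members = sum(1 for s in member_scores if s > threshold)
    correct_non_members = sum(1 for s in non_member_scores if s <= threshold)
    total = len(member_scores) + len(non_member_scores)
    return (correct_members + correct_non_members) / total


# Hypothetical confidence scores, e.g. from a discriminator; not real data.
member_scores = [0.9, 0.8, 0.7, 0.4]
non_member_scores = [0.6, 0.3, 0.2, 0.1]
accuracy = attack_accuracy(member_scores, non_member_scores, threshold=0.5)
```

On balanced sets of members and non-members, an accuracy close to 0.5 would indicate that the attack does no better than guessing.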

While differential privacy is a powerful technique to quantify the privacy leakage of the data

itself, other secure AI techniques can complement it. One technique that recently caught much

attention is federated learning [71]. This decentralized approach relies on transferring the model

weights during the training process and hence allows for training on several datasets without

transferring the data. This makes collaborations and data sharing across institutions not

only simpler but also more secure. Another advantage is that larger and more heterogeneous

datasets can be combined and used to build more robust ML models [132, 133]. However,

federated learning alone does not offer data security unless combined with other privacy-

preserving methods such as differential privacy [70]. Other promising secure AI techniques

that could be explored in the medical imaging domain are homomorphic encryption and secure

multi-party computation [70].
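The weight exchange at the core of federated learning can be sketched with federated averaging, one common aggregation rule in which a server averages the model weights received from each site, weighted by the size of the local dataset. The function name, site weights, and dataset sizes below are hypothetical and serve only as an illustration:

```python
def federated_average(client_weights, client_sizes):
    """Average per-client weight vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Two hypothetical sites: only weights and dataset sizes leave each site.
site_weights = [[1.0, 2.0], [3.0, 4.0]]
site_sizes = [100, 300]
global_weights = federated_average(site_weights, site_sizes)
```

Only the weight vectors and dataset sizes are communicated in this sketch. The images themselves never leave the sites, although the shared weights alone do not yet constitute a formal privacy guarantee.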

In general, privacy-preserving techniques aim to bridge the gap between usable, data-

driven modeling and maintaining the patients’ privacy from a technological point of view [70].

Nevertheless, privacy in the medical field is a multi-disciplinary endeavor, which should not

only include ML and cryptography researchers but also informed patients, physicians, and

policymakers. Here, an open discussion is needed about the expectations on data security and the

trade-off between the patients’ privacy and the opportunities of ML applications to improve


patient care. In any case, we believe that secure and private ML is a prerequisite for building

trust of patients and physicians in ML systems.

To summarize, large, publicly available datasets are crucial for the development and

application of ML models and for GANs. As medical data, including stroke imaging, is

sensitive, the lack of open data substantially constrains this research. The opportunity of

filling this gap with synthetic, privacy-preserving data is, therefore, an encouraging research

direction.

8.2.2 Image-to-Image Translation for Stroke Treatment Planning

In the second part of the thesis, we showed that a GAN variant could be utilized for synthesizing

perfusion parameter maps from DSC-MRI in stroke imaging. As a result, our model’s synthetic

perfusion maps closely resembled the ground truth. However, it is crucial to note that our

model is exploratory at this stage and still needs clinical validation.

Whereas in the traditional approach, an AIF is placed, and the perfusion maps are calculated

voxel-wise, our approach operated on the whole slice. If patients move during the recording of

the MR sequences, the sequences can be misaligned. For these patients, a voxel-wise approach

cannot synthesize perfusion maps as it relies on aligned voxels in the temporal dimension of

the DSC-MRI. Here, our slice-wise approach can have a decisive advantage. In our work (see

Chapter 7), we show first results for patients with these so-called motion artifacts. Nevertheless,

future studies could investigate this in more depth. For example, models could be trained on

data specifically augmented with image sequences in which patients’ movements are simulated.

In clinical practice, these models could offer a solution for patients for whom otherwise no perfusion

parameter maps would be available.

Similar GAN models could also be utilized for cross-modality synthesis. For example,

MR can be translated into CT images [85] or MR to positron emission tomography [134].

These approaches could also be leveraged to merge existing datasets that were initially in two

different modalities and thus increase data availability. In the same way, perfusion parameter

maps could also be generated from another modality or sequence that is not DSC-MRI as

it requires the injection of a contrast agent to which some patients might be allergic [135].

Promising candidates are images containing fine-grained information about the vessels such

as TOF-MRA. Since perfusion-weighted imaging would not need to be additionally recorded,

this could potentially speed up the clinical routine. Image-to-image translations have vast

applications in the medical domain, and we believe GAN solutions can substantially impact

patient care not only in stroke but also in medical imaging in general.

8.2.3 Challenges and Opportunities for GANs in Medical Imaging

GANs can be regarded as a universal concept that is independent of any specific type of model

architecture. Nowadays, most GAN architectures in medical imaging rely on deep convolutional

networks [30]. Therefore, the limitations of these networks also concern GANs. For example,

deep neural networks are regarded as black box approaches [136]. Especially in the medical

field, explainable algorithms are crucial for building the patient’s and physician’s trust in the

ML system. The field of explainable AI aims to find solutions to the opaqueness of neural

networks either by developing interpretable algorithms or finding retrospective explanations


for the model’s behavior. This research can also be beneficial to GANs as the architecture of

both the generator and the discriminator could be replaced by more explainable algorithms.

Suppose our GAN in Chapter 7 could not only be clinically validated, but the internal workings

of our generator could additionally be explained in more detail. In that case, our approach for

perfusion map generation might be accepted by more clinicians as an alternative to conventional

approaches in clinical practice. Very recent studies have started to investigate this for GANs

using non-medical data [137] as well as for a GAN trained on CT images [138]. However, more

research needs to be conducted to incorporate explainability into GAN architectures which

could eventually also be applied to our GAN architecture for synthesizing perfusion parameter

maps.

Furthermore, related GAN approaches aim to disentangle features and thus make the

synthetic images more interpretable. Our work synthesized images based on an intrinsic

representation the GAN has learned without explicitly knowing what kind of features this

entails. To shed light on this, researchers aimed to learn disentangled representations by

associating meaning to the latent variables that are fed into the generator, for example, by

utilizing an InfoGAN architecture [139]. After training the network, the latent variable can

control a subset of features within the image [139]. In an example of non-medical images,

the width or rotation of a synthetic handwritten digit can be altered by fixing the noise

vector and carefully adjusting the latent vector. A similar approach can also be used in

the medical field. For instance, Toda et al. utilized an InfoGAN architecture to generate

lung tumors in different shapes and sizes [140]. The authors tested their synthetic data by

augmenting data for lung cancer classification and achieving a better result than augmenting

with WGAN-created synthetic data. Other popular strategies for disentanglement using GAN

models involve disentangling content and style, which can be utilized for data augmentation

as well as cross-modality synthesis [141].
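
The latent traversal described above, in which the noise vector is fixed while a single latent variable is adjusted, can be sketched as follows. This is only a minimal illustration and not the implementation of the cited works: it assumes a generator whose input is the concatenation of a noise vector z and a continuous latent code c, and the function name, dimensions, and value range are hypothetical.

```python
import random


def make_generator_inputs(noise_dim, code_dim, vary_index, values, seed=0):
    """Build InfoGAN-style generator inputs for a latent traversal.

    The noise vector z is sampled once and kept fixed; only one entry of the
    latent code c is swept over `values`. All names are hypothetical.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]  # fixed noise vector
    inputs = []
    for v in values:
        c = [0.0] * code_dim   # neutral latent code
        c[vary_index] = v      # adjust a single latent variable
        inputs.append(z + c)   # assumed generator input: [z, c]
    return inputs


# Three generator inputs that differ only in the second latent variable.
batch = make_generator_inputs(noise_dim=4, code_dim=2, vary_index=1,
                              values=[-1.0, 0.0, 1.0])
```

Feeding such a batch through a trained generator would, if the representation is disentangled, yield images that differ only in the feature associated with the varied latent variable.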

Generally, disentangled GAN approaches are especially valuable for patients with rare

diseases that are usually underrepresented in a dataset. Here, those patient data could be

augmented and thus improve performance in the medical task. In our use case of brain

vessel segmentation (Chapters 4–6), the biggest challenge is to segment rare pathologies [38,

57]. A disentangled model that could generate pathologies in a controlled manner and

augment segmentation networks with it might improve their performance. So far, not all

disentanglement strategies developed have been tested in the medical field [141]. Therefore,

we believe more explainable and interpretable models, including feature disentanglement, will

emerge to make GANs in healthcare more trustworthy.

Another promising neural network architecture for GANs in the medical field is the vision

transformer [142, 143]. Transformers stem from the field of natural language processing

and have recently been adapted to images. A vision transformer takes image patches

as an input, embeds them, and then feeds them through a transformer encoder. This

encoder consists of different layers, including multi-headed attention layers. The built-in

attention within a vision transformer offers the opportunity to visualize attention maps that

can provide insights into the model decision process [144]. Thus, by design, they can be

regarded as more interpretable compared to convolutional neural networks. This difference

in interpretability, however, needs to be further evaluated in future studies [144]. Vision

transformer models particularly profit from large datasets, and in such a setting, they have

already outperformed deep convolutional networks [142]. Several studies have tested vision

transformers on medical images. Still, they rely on pre-training on large datasets only

available for non-medical images to achieve a performance comparable to convolutional neural

networks [143]. Only a few studies have investigated the combination of GANs and vision

transformers in the medical field [143, 145]. With increasing data availability in the medical

imaging field – which could be achieved, for instance, by data sharing – they might be a useful

architecture for ML models on medical images as well as for medical image synthesis using GANs.
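
The two ingredients of a vision transformer mentioned above, splitting an image into patches and relating the patches to each other via attention, can be illustrated with a small sketch. It is a strongly simplified example under stated assumptions: the learned linear patch embedding, the positional encodings, the learned query, key, and value projections, and the multiple heads of a real vision transformer are all omitted, and the function names are hypothetical.

```python
import math


def split_into_patches(image, patch):
    """Split a 2D image (list of rows) into flattened, non-overlapping patches."""
    h, w = len(image), len(image[0])
    patches = []
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            patches.append([image[r + i][c + j]
                            for i in range(patch) for j in range(patch)])
    return patches


def attention_weights(tokens):
    """Scaled dot-product self-attention weights.

    Simplification: the tokens are used directly as queries and keys, whereas
    a real transformer applies learned projections first.
    """
    d = len(tokens[0])
    weights = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)                          # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])  # softmax over all patches
    return weights


# Toy 4x4 "image" with intensities in [0, 1], split into four 2x2 patches.
image = [[float(r * 4 + c) / 15.0 for c in range(4)] for r in range(4)]
patches = split_into_patches(image, patch=2)
attn = attention_weights(patches)
```

Each row of `attn` states how strongly one patch attends to every other patch; matrices of this kind are what attention-map visualizations are built from.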

Another major challenge for synthetic data, in general, is a proper evaluation. While

a human rater could directly estimate the data quality, this approach is time-consuming,

especially when a trained medical expert is needed. Many metrics have been proposed to

automate and quantify the evaluation of synthetic images; most are based on comparing intensity

distributions or the activations in a pre-trained network [30, 97]. To date, no standardized

metric exists [97]. When the ground truth is available (as in Chapter 7), metrics such as the

structural similarity index measure or peak-signal-to-noise ratio can be computed. However,

these metrics might not fully capture important properties of medical images [146]. Recently,

Alaa et al. proposed a more holistic, model-agnostic approach to evaluate synthetic data [147].

They identified fidelity, diversity, and generalization as key components for the evaluation.

These were translated into a three-dimensional evaluation metric comprising so-called α-Precision, β-Recall,

and Authenticity. The authors were able to improve the synthetic data via model auditing using

their three-part metric. This was confirmed by increased performance in the downstream

task. Such holistic evaluation strategies for synthetic data are a promising new direction for

future GAN evaluation.
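
As an illustration of the reference-based metrics mentioned above, the peak signal-to-noise ratio can be computed from the mean squared error between a ground-truth image and its synthetic counterpart. The following is a minimal sketch, assuming 2D images stored as nested lists with intensities scaled to [0, 1]; the function name and the toy values are hypothetical and not taken from our experiments.

```python
import math


def psnr(reference, synthetic, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D images."""
    n = 0
    squared_error = 0.0
    for row_ref, row_syn in zip(reference, synthetic):
        for a, b in zip(row_ref, row_syn):
            squared_error += (a - b) ** 2
            n += 1
    mse = squared_error / n
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy example: two 2x2 images that differ in two pixels by 0.1 each.
reference = [[0.0, 0.5], [0.5, 1.0]]
synthetic = [[0.1, 0.5], [0.5, 0.9]]
score = psnr(reference, synthetic)  # about 23.01 dB
```

Because the score depends only on pixel-wise intensity differences, it illustrates why such metrics might miss clinically relevant properties of medical images.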

Research on GANs still faces several challenges. For instance, training a

GAN is computationally expensive, especially when training multiple generators and/or

discriminators on 3D medical images. Adapted software and hardware could facilitate GAN

training further [148]. Other main challenges revolve around problems such as vanishing

gradients, mode collapse, and unstable training. While there are already improvements for

these problems, they remain challenges that need to be addressed [149]. Nevertheless, GANs

have accelerated the field of generative models and have set new standards for synthesizing

high-quality images. With the generation of high-resolution, realistic-looking synthetic images

that retain the predictive properties, GANs could revolutionize the field of medical imaging

by addressing the lack of data availability and by processing medical images efficiently. Future

studies should consider privacy aspects, explainability, and clinical validation.

References

[1]

The GBD 2016 Lifetime Risk of Stroke Collaborators. “Global, Regional, and Country-Specific

Lifetime Risks of Stroke, 1990 and 2016”. In: New England Journal of Medicine 379.25 (2018),

pp. 2429–2437. doi:10.1056/NEJMoa1804492.

[2]

V. L. Feigin, B. A. Stark, C. O. Johnson, G. A. Roth, C. Bisignano, et al. “Global, regional,

and national burden of stroke and its risk factors, 1990–2019: a systematic analysis for the

Global Burden of Disease Study 2019”. In: The Lancet Neurology 20.10 (2021), pp. 795–820.

doi:10.1016/S1474-4422(21)00252-0.

[3]

J. Burn, M. Dennis, J. Bamford, P. Sandercock, D. Wade, and C. Warlow. “Long-term risk

of recurrent stroke after a first-ever stroke. The Oxfordshire Community Stroke Project.” In:

Stroke 25.2 (1994), pp. 333–337. doi:10.1161/01.STR.25.2.333.

[4]

J. F. d. Carmo, R. L. Morelato, H. P. Pinto, and E. R. A. d. Oliveira. “Disability after

stroke: a systematic review”. In: Fisioterapia em Movimento 28 (2015), pp. 407–418. doi:

10.1590/0103-5150.028.002.AR02.

[5]

H. A. Wafa, C. D. Wolfe, E. Emmett, G. A. Roth, C. O. Johnson, and Y. Wang. “Burden of Stroke

in Europe”. In: Stroke 51.8 (2020), pp. 2418–2427. doi:10.1161/STROKEAHA.120.029606.

[6]

R. Luengo-Fernandez, M. Violato, P. Candio, and J. Leal. “Economic burden of stroke across

Europe: A population-based cost analysis”. In: European Stroke Journal 5.1 (2020), pp. 17–25.

doi:10.1177/2396987319883160.

[7]

B. Rehani, S. G. Ammanuel, Y. Zhang, W. Smith, D. L. Cooke, S. W. Hetts, S. A. Josephson,

A. Kim, J. C. Hemphill, and W. Dillon. “A New Era of Extended Time Window Acute

Stroke Interventions Guided by Imaging”. In: The Neurohospitalist 10.1 (2020), pp. 29–37. doi:

10.1177/1941874419870701.

[8]

G. Thomalla and C. Gerloff. “Acute imaging for evidence-based treatment of ischemic stroke”. In:

Current Opinion in Neurology 32.4 (2019), pp. 521–529. doi:10.1097/WCO.0000000000000716.

[9]

A. S. Lundervold and A. Lundervold. “An overview of deep learning in medical imaging focusing

on MRI”. In: Zeitschrift für Medizinische Physik. Special Issue: Deep Learning in Medical

Physics 29.2 (2019), pp. 102–127. doi:10.1016/j.zemedi.2018.11.002.

[10]

A. Krizhevsky, I. Sutskever, and G. E. Hinton. “ImageNet classification with deep convolutional

neural networks”. In: Communications of the ACM 60.6 (2017), pp. 84–90. doi:10.1145/3065386.

[11]

G. Lee, S. Jun, Y.-W. Cho, H. Lee, G. B. Kim, J. B. Seo, and N. Kim. “Deep Learning in

Medical Imaging: General Overview”. In: Korean Journal of Radiology 18.4 (2017), pp. 570–584.

doi:10.3348/kjr.2017.18.4.570.

[12]

C. Tian, L. Fei, W. Zheng, Y. Xu, W. Zuo, and C.-W. Lin. “Deep learning on image denoising: An

overview”. In: Neural Networks 131 (2020), pp. 251–275. doi:10.1016/j.neunet.2020.07.025.

[13]

A. Yala, C. Lehman, T. Schuster, T. Portnoi, and R. Barzilay. “A Deep Learning Mammography-

based Model for Improved Breast Cancer Risk Prediction”. In: Radiology 292.1 (2019), pp. 60–66.

doi:10.1148/radiol.2019182716.

[14]

Z. Guo, X. Li, H. Huang, N. Guo, and Q. Li. “Deep Learning-Based Image Segmentation

on Multimodal Medical Imaging”. In: IEEE Transactions on Radiation and Plasma Medical

Sciences 3.2 (2019), pp. 162–169. doi:10.1109/TRPMS.2018.2890359.

[15]

R. Aggarwal, V. Sounderajah, G. Martin, D. S. W. Ting, A. Karthikesalingam, D. King,

H. Ashrafian, and A. Darzi. “Diagnostic accuracy of deep learning in medical imaging: a

systematic review and meta-analysis”. In: npj Digital Medicine 4.1 (2021), pp. 1–23. doi:

10.1038/s41746-021-00438-z.

[16]

L. Lu, L. Dercle, B. Zhao, and L. H. Schwartz. “Deep learning for the prediction of early

on-treatment response in metastatic colorectal cancer from serial medical imaging”. In: Nature

Communications 12.1 (2021), p. 6654. doi:10.1038/s41467-021-26990-6.

[17]

K. C. Ho, W. Speier, S. El-Saden, and C. W. Arnold. “Classifying Acute Ischemic Stroke Onset

Time using Deep Imaging Features”. In: AMIA Annual Symposium Proceedings 2017 (2018),

pp. 892–901. issn: 1942-597X. url:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5977679/.

[18]

L. Chen, P. Bentley, and D. Rückert. “Fully automatic acute ischemic lesion segmentation in

DWI using convolutional neural networks”. In: NeuroImage: Clinical 15 (2017), pp. 633–643.

doi:10.1016/j.nicl.2017.06.016.

[19]

N. Stier, N. Vincent, D. Liebeskind, and F. Scalzo. “Deep learning of tissue fate features in acute

ischemic stroke”. In: 2015 IEEE International Conference on Bioinformatics and Biomedicine

(BIBM). 2015, pp. 1316–1321. doi:10.1109/BIBM.2015.7359869.

[20]

A. Nielsen, M. B. Hansen, A. Tietze, and K. Mouridsen. “Prediction of Tissue Outcome and

Assessment of Treatment Effect in Acute Ischemic Stroke Using Deep Learning”. In: Stroke 49.6

(2018), pp. 1394–1401. doi:10.1161/STROKEAHA.117.019740.

[21]

H. Kamal, V. Lopez, and S. A. Sheth. “Machine Learning in Acute Ischemic Stroke

Neuroimaging”. In: Frontiers in Neurology 9 (2018). doi:10.3389/fneur.2018.00945.

[22]

M. J. Willemink, W. A. Koszek, C. Hardell, J. Wu, D. Fleischmann, H. Harvey, L. R. Folio,

R. M. Summers, D. L. Rubin, and M. P. Lungren. “Preparing Medical Imaging Data for Machine

Learning”. In: Radiology 295.1 (2020), pp. 4–15. doi:10.1148/radiol.2020192224.

[23]

C. G. Schwarz, W. K. Kremers, T. M. Therneau, R. R. Sharp, J. L. Gunter, P. Vemuri, A. Arani,

A. J. Spychalla, K. Kantarci, D. S. Knopman, R. C. Petersen, and C. R. Jack. “Identification

of Anonymous MRI Research Participants with Face-Recognition Software”. In: New England

Journal of Medicine 381.17 (2019), pp. 1684–1686. doi:10.1056/NEJMc1908881.

[24]

D. Abramian and A. Eklund. “Refacing: Reconstructing Anonymized Facial Features Using

GANS”. In: 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019).

2019, pp. 1104–1108. doi:10.1109/ISBI.2019.8759515.

[25]

D. Duan, S. Xia, I. Rekik, Z. Wu, L. Wang, W. Lin, J. H. Gilmore, D. Shen, and G. Li.

“Individual identification and individual variability analysis based on cortical folding features in

developing infant singletons and twins”. In: Human Brain Mapping 41.8 (2020), pp. 1985–2003.

doi:10.1002/hbm.24924.

[26]

I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville,

and Y. Bengio. “Generative Adversarial Networks”. In: arXiv (2014). doi:10.48550/ARXIV.1406.2661.

[27]

K. Armanious, C. Jiang, M. Fischer, T. Küstner, T. Hepp, K. Nikolaou, S. Gatidis, and B. Yang.

“MedGAN: Medical image translation using GANs”. In: Computerized Medical Imaging and

Graphics 79 (2020). doi:10.1016/j.compmedimag.2019.101684.

[28]

Z. Yin, K. Xia, Z. He, J. Zhang, S. Wang, and B. Zu. “Unpaired Image Denoising via Wasserstein

GAN in Low-Dose CT Image with Multi-Perceptual Loss and Fidelity Loss”. In: Symmetry 13.1

(2021). doi:10.3390/sym13010126.

[29]

C. Shin, N. A. Tenenholtz, J. K. Rogers, C. G. Schwarz, M. L. Senjem, J. L. Gunter,

K. P. Andriole, and M. Michalski. “Medical Image Synthesis for Data Augmentation and

Anonymization Using Generative Adversarial Networks”. In: Simulation and Synthesis in

Medical Imaging. Ed. by A. Gooya, O. Goksel, I. Oguz, and N. Burgos. Lecture Notes in

Computer Science. Cham: Springer International Publishing, 2018, pp. 1–11. isbn: 978-3-030-

00536-8. doi:10.1007/978-3-030-00536-8_1.

[30]

X. Yi, E. Walia, and P. Babyn. “Generative Adversarial Network in Medical Imaging: A Review”.

In: Medical Image Analysis 58 (2019). doi:10.1016/j.media.2019.101552.

[31]

J. Rubin and S. M. Abulnaga. “CT-To-MR Conditional Generative Adversarial Networks for

Ischemic Stroke Lesion Segmentation”. In: 2019 IEEE International Conference on Healthcare

Informatics (ICHI). 2019, pp. 1–7. doi:10.1109/ICHI.2019.8904574.

[32]

L. Bi, J. Kim, A. Kumar, D. Feng, and M. Fulham. “Synthesis of Positron Emission Tomography

(PET) Images via Multi-channel Generative Adversarial Networks (GANs)”. In: Molecular

Imaging, Reconstruction and Analysis of Moving Body Organs, and Stroke Imaging and

Treatment. Ed. by M. J. Cardoso, T. Arbel, F. Gao, B. Kainz, T. van Walsum, K. Shi,

K. K. Bhatia, R. Peter, T. Vercauteren, M. Reyes, A. Dalca, R. Wiest, W. Niessen, and B. J.

Emmer. Cham: Springer International Publishing, 2017, pp. 43–51. isbn: 978-3-319-67564-0.

doi:10.1007/978-3-319-67564-0_5.

[33]

G. Kwon, C. Han, and D.-s. Kim. “Generation of 3D Brain MRI Using Auto-Encoding Generative

Adversarial Networks”. In: Medical Image Computing and Computer Assisted Intervention –

MICCAI 2019. Ed. by D. Shen, T. Liu, T. M. Peters, L. H. Staib, C. Essert, S. Zhou, P.-T. Yap,

and A. Khan. Cham: Springer International Publishing, 2019, pp. 118–126. isbn: 978-3-030-

32248-9. doi:10.1007/978-3-030-32248-9_14.

[34]

T. Kossen, P. Subramaniam, V. I. Madai, A. Hennemuth, K. Hildebrand, A. Hilbert, J. Sobesky,

M. Livne, I. Galinovic, A. A. Khalil, J. B. Fiebach, and D. Frey. “Synthesizing anonymized and

labeled TOF-MRA patches for brain vessel segmentation using generative adversarial networks”.

In: Computers in Biology and Medicine 131 (2021). doi:10.1016/j.compbiomed.2021.104254.

[35]

P. Subramaniam, T. Kossen, K. Ritter, A. Hennemuth, K. Hildebrand, A. Hilbert, J. Sobesky,

M. Livne, I. Galinovic, A. A. Khalil, J. B. Fiebach, D. Frey, and V. I. Madai. “Generating

3D TOF-MRA Volumes and Segmentation Labels using Generative Adversarial Networks”. In:

Medical Image Analysis (2022). doi:10.1016/j.media.2022.102396.

[36]

T. Kossen, M. A. Hirzel, V. I. Madai, F. Boenisch, A. Hennemuth, K. Hildebrand, S. Pokutta,

K. Sharma, A. Hilbert, J. Sobesky, I. Galinovic, A. A. Khalil, J. B. Fiebach, and D. Frey.

“Toward Sharing Brain Images: Differentially Private TOF-MRA Images With Segmentation

Labels Using Generative Adversarial Networks”. In: Frontiers in Artificial Intelligence 5 (2022).

doi:10.3389/frai.2022.813842.

[37]

T. Kossen, V. I. Madai, M. A. Mutke, A. Hennemuth, K. Hildebrand, J. Behland, A. Hilbert,

J. Sobesky, M. Bendszus, and D. Frey. “Image-to-image generative adversarial networks for

synthesizing perfusion parameter maps from DSC-MR images in cerebrovascular disease”. In:

medRxiv (2022). doi:10.1101/2022.05.24.22274901.

[38]

M. Livne, J. Rieger, O. U. Aydin, A. A. Taha, E. M. Akay, T. Kossen, J. Sobesky, J. D.

Kelleher, K. Hildebrand, D. Frey, and V. I. Madai. “A U-Net Deep Learning Framework for

High Performance Vessel Segmentation in Patients With Cerebrovascular Disease”. In: Frontiers

in Neuroscience 13 (2019), p. 97. doi:10.3389/fnins.2019.00097.

[39]

M. Ivantsits, L. Goubergrits, J.-M. Kuhnigk, M. Huellebrand, J. Brüning, T. Kossen, B.

Pfahringer, J. Schaller, A. Spuler, T. Kuehne, and A. Hennemuth. “Cerebral Aneurysm Detection

and Analysis Challenge 2020 (CADA)”. In: Cerebral Aneurysm Detection and Analysis: First

Challenge, CADA 2020, Held in Conjunction with MICCAI 2020, Lima, Peru, October 8,

2020, Proceedings. Lima, Peru: Springer-Verlag, 2020, pp. 3–17. isbn: 978-3-030-72861-8. doi:

10.1007/978-3-030-72862-5_1.

[40]

M. Ivantsits, L. Goubergrits, J.-M. Kuhnigk, M. Huellebrand, J. Bruening, T. Kossen, B.

Pfahringer, J. Schaller, A. Spuler, T. Kuehne, Y. Jia, X. Li, S. Shit, B. Menze, Z. Su, J. Ma, Z.

Nie, K. Jain, Y. Liu, Y. Lin, and A. Hennemuth. “Detection and analysis of cerebral aneurysms

based on X-ray rotational angiography - the CADA 2020 challenge”. In: Medical Image Analysis

77 (2022). doi:10.1016/j.media.2021.102333.

[41]

A. Meddeb, T. Kossen, K. K. Bressem, B. Hamm, and S. N. Nagel. “Evaluation of a Deep

Learning Algorithm for Automated Spleen Segmentation in Patients with Conditions Directly

or Indirectly Affecting the Spleen”. In: Tomography 7.4 (2021), pp. 950–960. doi:10.3390/tomography7040078.

[42]

A. K. A. Unnithan and P. Mehta. Hemorrhagic Stroke. StatPearls Publishing, Treasure Island

(FL), 2021. url:http://europepmc.org/books/NBK559173.

[43]

S. A. Randolph. “Ischemic Stroke”. In: Workplace Health & Safety 64.9 (2016), pp. 444–444.

doi:10.1177/2165079916665400.

[44]

M. T. Beckhauser, L. H. Castro-Afonso, F. A. Dias, G. S. Nakiri, L. M. Monsignore, R. K.

Martins Filho, M. R. Camilo, F. F. Aléssio Alves, M. Libardi, G. R. Rodrigues, O. M. Pontes-

Neto, and D. G. Abud. “Extended Time Window Mechanical Thrombectomy for Acute Stroke

in Brazil”. In: Journal of Stroke and Cerebrovascular Diseases: The Official Journal of National

Stroke Association 29.10 (2020). doi:10.1016/j.jstrokecerebrovasdis.2020.105134.

[45]

A. Wouters, R. Lemmens, P. Dupont, and V. Thijs. “Wake-Up Stroke and Stroke of Unknown

Onset: A Critical Review”. In: Frontiers in Neurology 5 (2014). doi:10.3389/fneur.2014.00153.

[46]

O. Shafaat and H. Sotoudeh. “Stroke Imaging”. In: StatPearls. Treasure Island (FL): StatPearls

Publishing, 2022. url:http://www.ncbi.nlm.nih.gov/books/NBK546635/.

[47]

A. J. M. Kiruluta and R. G. González. “Chapter 7 - Magnetic resonance angiography: physical

principles and applications”. In: Handbook of Clinical Neurology. Ed. by J. C. Masdeu and

R. G. González. Vol. 135. Neuroimaging Part I. Elsevier, 2016, pp. 137–149. doi:10.1016/B978-0-444-53485-9.00007-6.

[48]

D. Chien and R. R. Edelman. “Basic principles and clinical applications of magnetic resonance

angiography”. In: Seminars in Roentgenology. Noninvasive Vascular Imaging 27.1 (1992), pp. 53–

62. doi:10.1016/0037-198X(92)90046-5.

[49]

J. C. Carr and T. J. Carroll. Magnetic Resonance Angiography: Principles and Applications.

Springer Science & Business Media, 2011. isbn: 978-1-4419-1685-3. doi:10.1007/978-1-4419-1686-0.

[50]

T. Sichtermann, A. Faron, R. Sijben, N. Teichert, J. Freiherr, and M. Wiesmann. “Deep

Learning–Based Detection of Intracranial Aneurysms in 3D TOF-MRA”. In: American Journal

of Neuroradiology 40.1 (2019), pp. 25–32. doi:10.3174/ajnr.A5911.

[51]

C. G. Choi, D. H. Lee, J. H. Lee, H. W. Pyun, D. W. Kang, S. U. Kwon, J. K. Kim, S. J.

Kim, and D. C. Suh. “Detection of Intracranial Atherosclerotic Steno-Occlusive Disease with

3D Time-of-Flight Magnetic Resonance Angiography with Sensitivity Encoding at 3T”. In:

American Journal of Neuroradiology 28.3 (2007), pp. 439–446. issn: 0195-6108, 1936-959X. url:

http://www.ajnr.org/content/28/3/439.

[52]

X. Gong, Z. Chen, F. Shi, M. Zhang, C. Xu, R. Zhang, and M. Lou. “Conveniently-Grasped

Field Assessment Stroke Triage (CG-FAST): A Modified Scale to Detect Large Vessel Occlusion

Stroke”. In: Frontiers in Neurology 10 (2019). doi:10.3389/fneur.2019.00390.

[53]

G. Turc, P. Bhogal, U. Fischer, P. Khatri, K. Lobotesis, M. Mazighi, P. D. Schellinger, D.

Toni, J. de Vries, P. White, and J. Fiehler. “European Stroke Organisation (ESO) – European

Society for Minimally Invasive Neurological Therapy (ESMINT) Guidelines on Mechanical

Thrombectomy in Acute Ischaemic Stroke Endorsed by Stroke Alliance for Europe (SAFE)”. In:

European Stroke Journal 4.1 (2019), pp. 6–12. doi:10.1177/2396987319832140.

[54]

J. Gutierrez, K. Cheung, A. Bagci, T. Rundek, N. Alperin, R. L. Sacco, C. B. Wright, and

M. S. V. Elkind. “Brain Arterial Diameters as a Risk Factor for Vascular Events”. In: Journal

of the American Heart Association 4.8 (2015). doi:10.1161/JAHA.115.002289.

[55]

D. Frey, M. Livne, H. Leppin, E. M. Akay, O. U. Aydin, J. Behland, J. Sobesky, P. Vajkoczy,

and V. I. Madai. “A precision medicine framework for personalized simulation of hemodynamics

in cerebrovascular disease”. In: BioMedical Engineering OnLine 20.1 (2021), p. 44. doi:10.1186/s12938-021-00880-w.

[56]

O. Ronneberger, P. Fischer, and T. Brox. “U-Net: Convolutional Networks for Biomedical Image

Segmentation”. In: arXiv (2015). doi:10.48550/ARXIV.1505.04597.

[57]

A. Hilbert, V. I. Madai, E. M. Akay, O. U. Aydin, J. Behland, J. Sobesky, I. Galinovic,

A. A. Khalil, A. A. Taha, J. Wuerfel, P. Dusek, T. Niendorf, J. B. Fiebach, D. Frey, and M.

Livne. “BRAVE-NET: Fully Automated Arterial Brain Vessel Segmentation in Patients With

Cerebrovascular Disease”. In: Frontiers in Artificial Intelligence 3 (2020). doi:10.3389/frai.2020.552258.

[58]

B. J. Kim. “Principles and Practical Application of Brain MRI in Acute Ischemic Stroke”.

In: Stroke Revisited: Diagnosis and Treatment of Ischemic Stroke. Ed. by S.-H. Lee. Stroke

Revisited. Singapore: Springer, 2017, pp. 109–119. isbn: 978-981-10-1424-6. doi:10.1007/978-981-10-1424-6_10.

[59]

C. M. Ermine, A. Bivard, M. W. Parsons, and J.-C. Baron. “The ischemic penumbra: From

concept to reality”. In: International Journal of Stroke 16 (2021), pp. 497–509. doi:10.1177/1747493020975229.

[60]

C. S. Kidwell, J. R. Alger, and J. L. Saver. “Evolving Paradigms in Neuroimaging of the Ischemic

Penumbra”. In: Stroke 35.11_suppl_1 (2004), pp. 2662–2665. doi:10.1161/01.STR.0000143222.13069.70.

[61]

M. Straka, G. W. Albers, and R. Bammer. “Real-time diffusion-perfusion mismatch analysis

in acute stroke”. In: Journal of Magnetic Resonance Imaging 32.5 (2010), pp. 1024–1037. doi:

10.1002/jmri.22338.

[62]

P. D. Schellinger, J. B. Fiebach, and W. Hacke. “Imaging-Based Decision Making in Thrombolytic

Therapy for Ischemic Stroke”. In: Stroke 34.2 (2003), pp. 575–583. doi:10.1161/01.STR.0000051504.10095.9C.

[63]

P. D. Schellinger and J. B. Fiebach. “Perfusion-Weighted Imaging/Diffusion-Weighted Imaging

Mismatch on MRI Can Now Be Used to Select Patients for Recombinant Tissue Plasminogen

Activator Beyond 3 Hours”. In: Stroke 36.5 (2005), pp. 1098–1101. doi:10.1161/01.STR.0000162388.67745.8d.

[64]

L. Østergaard. “Principles of cerebral perfusion imaging by bolus tracking”. In: Journal of

Magnetic Resonance Imaging 22.6 (2005), pp. 710–717. doi:10.1002/jmri.20460.

[65]

B. J. Kim, H. G. Kang, H.-J. Kim, S.-H. Ahn, N. Y. Kim, S. Warach, and D.-W. Kang.

“Magnetic Resonance Imaging in Acute Ischemic Stroke Treatment”. In: Journal of Stroke 16.3

(2014), pp. 131–145. doi:10.5853/jos.2014.16.3.131.

[66]

K. Suzuki. “Overview of deep learning in medical imaging”. In: Radiological Physics and

Technology 10.3 (2017), pp. 257–273. doi:10.1007/s12194-017-0406-5.

[67]

B. Sahiner, A. Pezeshk, L. M. Hadjiiski, X. Wang, K. Drukker, K. H. Cha, R. M. Summers, and

M. L. Giger. “Deep learning in medical imaging and radiation therapy”. In: Medical Physics

46.1 (2019), e1–e36. doi:10.1002/mp.13264.

[68]

J. Hayes, L. Melis, G. Danezis, and E. D. Cristofaro. “LOGAN: Membership Inference Attacks

Against Generative Models”. In: Proceedings on Privacy Enhancing Technologies 2019.1 (2019),

pp. 133–152. doi:10.2478/popets-2019-0008.

[69]

D. Chen, N. Yu, Y. Zhang, and M. Fritz. “GAN-Leaks: A Taxonomy of Membership Inference

Attacks against Generative Models”. In: Proceedings of the 2020 ACM SIGSAC Conference on

Computer and Communications Security. ACM, 2020. doi:10.1145/3372297.3417238.

[70]

G. A. Kaissis, M. R. Makowski, D. Rückert, and R. F. Braren. “Secure, privacy-preserving and

federated machine learning in medical imaging”. In: Nature Machine Intelligence 2.6 (2020),

pp. 305–311. doi:10.1038/s42256-020-0186-1.

[71]

H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y. Arcas. “Communication-

Efficient Learning of Deep Networks from Decentralized Data”. In: arXiv (2016). doi:10.48550/ARXIV.1602.05629.

[72]

C. Dwork. “Differential Privacy: A Survey of Results”. In: Theory and Applications of Models

of Computation. Ed. by M. Agrawal, D. Du, Z. Duan, and A. Li. Lecture Notes in Computer

Science. Berlin, Heidelberg: Springer, 2008, pp. 1–19. isbn: 978-3-540-79228-4. doi:10.1007/978-3-540-79228-4_1.

[73]

V. Cheng, V. M. Suriyakumar, N. Dullerud, S. Joshi, and M. Ghassemi. “Can You Fake It Until

You Make It?: Impacts of Differentially Private Synthetic Data on Downstream Classification

Fairness”. In: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and

Transparency. Virtual Event Canada: ACM, 2021, pp. 149–160. isbn: 978-1-4503-8309-7. doi:

10.1145/3442188.3445879.

[74]

T. Schlegl, P. Seeböck, S. M. Waldstein, G. Langs, and U. Schmidt-Erfurth. “f-AnoGAN:

Fast unsupervised anomaly detection with generative adversarial networks”. In: Medical Image

Analysis 54 (2019), pp. 30–44. doi:10.1016/j.media.2019.01.010.

[75]

M. D. Cirillo, D. Abramian, and A. Eklund. “Vox2Vox: 3D-GAN for Brain Tumour

Segmentation”. In: arXiv (2020). doi:10.48550/ARXIV.2003.13653.

[76]

M. Frid-Adar, I. Diamant, E. Klang, M. Amitai, J. Goldberger, and H. Greenspan. “GAN-

based synthetic medical image augmentation for increased CNN performance in liver lesion

classification”. In: Neurocomputing 321 (2018), pp. 321–331. doi:10.1016/j.neucom.2018.09.013.

[77]

J. M. Wolterink, T. Leiner, M. A. Viergever, and I. Išgum. “Generative Adversarial Networks

for Noise Reduction in Low-Dose CT”. In: IEEE Transactions on Medical Imaging 36.12 (2017),

pp. 2536–2545. doi:10.1109/TMI.2017.2708987.

[78]

B. Yu, L. Zhou, L. Wang, Y. Shi, J. Fripp, and P. Bourgeat. “Ea-GANs: Edge-Aware Generative

Adversarial Networks for Cross-Modality MR Image Synthesis”. In: IEEE Transactions on

Medical Imaging 38.7 (2019), pp. 1750–1762. doi:10.1109/TMI.2019.2895894.

[79]

M. AlAmir and M. AlGhamdi. “The Role of Generative Adversarial Network in Medical Image

Analysis: An in-depth survey”. In: ACM Computing Surveys (2022). doi:10.1145/3527849.

[80]

M. Arjovsky, S. Chintala, and L. Bottou. “Wasserstein GAN”. In: arXiv (2017). doi:10.48550/ARXIV.1701.07875.

[81]

P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. “Image-to-Image Translation with Conditional

Adversarial Networks”. In: arXiv (2016). doi:10.48550/ARXIV.1611.07004.

[82]

A. Radford, L. Metz, and S. Chintala. “Unsupervised Representation Learning with Deep

Convolutional Generative Adversarial Networks”. In: arXiv (2015). doi:10.48550/ARXIV.1511.06434.

[83]

T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen. “Improved

Techniques for Training GANs”. In: arXiv (2016). doi:10.48550/ARXIV.1606.03498.

[84]

I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville. “Improved Training of

Wasserstein GANs”. In: arXiv (2017). doi:10.48550/ARXIV.1704.00028.

[85]

K. N. D. Brou Boni, J. Klein, L. Vanquin, A. Wagner, T. Lacornerie, D. Pasquier, and N.

Reynaert. “MR to CT synthesis with multicenter data in the pelvic area using a conditional

generative adversarial network”. In: Physics in Medicine & Biology 65.7 (2020). doi:10.1088/1361-6560/ab7633.

[86]

J. Benzakoun, M.-A. Deslys, L. Legrand, G. Hmeydia, G. Turc, W. B. Hassen, S. Charron,

C. Debacker, O. Naggara, J.-C. Baron, B. Thirion, and C. Oppenheim. “Synthetic FLAIR

as a Substitute for FLAIR Sequence in Acute Ischemic Stroke”. In: Radiology 303.1 (2022),

pp. 153–159. doi:10.1148/radiol.211394.

[87]

N. K. Singh and K. Raza. “Medical Image Generation Using Generative Adversarial Networks:

A Review”. In: Health Informatics: A Computational Perspective in Healthcare. Ed. by R.

Patgiri, A. Biswas, and P. Roy. Studies in Computational Intelligence. Singapore: Springer,

2021, pp. 77–96. isbn: 9789811597350. doi:10.1007/978-981-15-9735-0_5.

[88]

Y. Wang, B. Yu, L. Wang, C. Zu, D. S. Lalush, W. Lin, X. Wu, J. Zhou, D. Shen, and L. Zhou.

“3D conditional generative adversarial networks for high-quality PET image estimation at low

dose”. In: NeuroImage 174 (2018), pp. 550–562. doi:10.1016/j.neuroimage.2018.03.045.

[89]

H. Choi, D. S. Lee, and Alzheimer’s Disease Neuroimaging Initiative. “Generation of Structural

MR Images from Amyloid PET: Application to MR-Less Quantification”. In: Journal of Nuclear

Medicine: Official Publication, Society of Nuclear Medicine 59.7 (2018), pp. 1111–1117. doi:

10.2967/jnumed.117.199414.

[90]

M. Maspero, M. H. F. Savenije, A. M. Dinkla, P. R. Seevinck, M. P. W. Intven, I. M. Jurgenliemk-

Schulz, L. G. W. Kerkmeijer, and C. A. T. v. d. Berg. “Dose evaluation of fast synthetic-CT

generation using a generative adversarial network for general pelvis MR-only radiotherapy”. In:

Physics in Medicine & Biology 63.18 (2018). doi:10.1088/1361-6560/aada6d.

[91]

M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter. “GANs Trained by a

Two Time-Scale Update Rule Converge to a Local Nash Equilibrium”. In: arXiv (2017). doi:

10.48550/ARXIV.1706.08500.

[92]

C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. “Rethinking the Inception

Architecture for Computer Vision”. In: arXiv (2015). doi:10.48550/ARXIV.1512.00567.

[93]

C. Haarburger, N. Horst, D. Truhn, M. Broeckmann, S. Schrading, C. Kuhl, and D. Merhof.

“Multiparametric Magnetic Resonance Image Synthesis using Generative Adversarial Networks”.

In: Eurographics Workshop on Visual Computing for Biology and Medicine (2019), 5 pages. doi:

10.2312/VCBM.20191226.

[94]

B. Cao, H. Zhang, N. Wang, X. Gao, and D. Shen. “Auto-GAN: Self-Supervised Collaborative

Learning for Medical Image Synthesis”. In: Proceedings of the AAAI Conference on Artificial

Intelligence 34.07 (2020), pp. 10486–10493. doi:10.1609/aaai.v34i07.6619.

[95]

S. Chen, K. Ma, and Y. Zheng. “Med3D: Transfer Learning for 3D Medical Image Analysis”.

In: arXiv (2019). doi:10.48550/ARXIV.1904.00625.

[96]

L. Tronchin, R. Sicilia, E. Cordelli, S. Ramella, and P. Soda. “Evaluating GANs in Medical

Imaging”. In: Deep Generative Models, and Data Augmentation, Labelling, and Imperfections.

Ed. by S. Engelhardt, I. Oksuz, D. Zhu, Y. Yuan, A. Mukhopadhyay, N. Heller, S. X. Huang,

H. Nguyen, R. Sznitman, and Y. Xue. Lecture Notes in Computer Science. Cham: Springer

International Publishing, 2021, pp. 112–121. isbn: 978-3-030-88210-5. doi:10.1007/978-3-030-88210-5_10.

[97]

A. Borji. “Pros and Cons of GAN Evaluation Measures”. In: arXiv (2018). doi:10.48550/ARXIV.1802.03446.

[98]

M. S. M. Sajjadi, O. Bachem, M. Lucic, O. Bousquet, and S. Gelly. “Assessing Generative

Models via Precision and Recall”. In: arXiv (2018). doi:10.48550/ARXIV.1806.00035.

[99]

N. Siddique, S. Paheding, C. P. Elkin, and V. Devabhaktuni. “U-Net and Its Variants for

Medical Image Segmentation: A Review of Theory and Applications”. In: IEEE Access 9 (2021),

pp. 82031–82057. doi:10.1109/ACCESS.2021.3086020.

[100]

S. Moccia, E. De Momi, S. El Hadji, and L. S. Mattos. “Blood vessel segmentation algorithms –

Review of methods, datasets and evaluation metrics”. In: Computer Methods and Programs in

Biomedicine 158 (2018), pp. 71–91. doi:10.1016/j.cmpb.2018.02.001.

[101]

Y. Chen, X.-H. Yang, Z. Wei, A. A. Heidari, N. Zheng, Z. Li, H. Chen, H. Hu, Q. Zhou, and

Q. Guan. “Generative Adversarial Networks in Medical Image augmentation: A review”. In:

Computers in Biology and Medicine 144 (2022). doi:10.1016/j.compbiomed.2022.105382.

[102]

T. Neff, C. Payer, D. Stern, and M. Urschler. “Generative Adversarial Network based Synthesis

for Supervised Medical Image Segmentation”. In: Proceedings of the OAGM & ARW Joint

Workshop Vision, Automation and Robotics (). doi:10.3217/978-3-85125-524-9-30.

[103]

J. Kugelman, D. Alonso-Caneiro, S. A. Read, S. J. Vincent, F. K. Chen, and M. J. Collins. “Data

augmentation for patch-based OCT chorio-retinal segmentation using generative adversarial

networks”. In: Neural Computing and Applications 33.13 (2021), pp. 7393–7408. doi:10.1007/s00521-021-05826-w.

[104]

J. T. Guibas, T. S. Virdi, and P. S. Li. “Synthetic Medical Images from Dual Generative

Adversarial Networks”. In: arXiv (2017). doi:10.48550/ARXIV.1709.01872.

[105]

C. Bowles, L. Chen, R. Guerrero, P. Bentley, R. Gunn, A. Hammers, D. A. Dickie, M. V.

Hernández, J. Wardlaw, and D. Rückert. “GAN Augmentation: Augmenting Training Data

using Generative Adversarial Networks”. In: arXiv (2018). doi:10.48550/ARXIV.1810.10863.

[106]

M. Foroozandeh and A. Eklund. “Synthesizing brain tumor images and annotations by combining

progressive growing GAN and SPADE”. In: arXiv (2020). doi:10.48550/ARXIV.2009.05946.

[107]

A. Eklund. “Feeding the zombies: Synthesizing brain volumes using a 3D progressive growing

GAN”. In: arXiv (2019). doi:10.48550/ARXIV.1912.05357.

[108]

L. Sun, J. Chen, Y. Xu, M. Gong, K. Yu, and K. Batmanghelich. “Hierarchical Amortized

Training for Memory-efficient High Resolution 3D GAN”. In: arXiv (2020). doi:10.48550/ARXIV.2008.01910.

[109]

L. Zhang, B. Shen, A. Barnawi, S. Xi, N. Kumar, and Y. Wu. “FedDPGAN: Federated

Differentially Private Generative Adversarial Networks Framework for the Detection of COVID-19

Pneumonia”. In: Information Systems Frontiers (2021). doi:10.1007/s10796-021-10144-6.

[110]

D. C. Nguyen, M. Ding, P. N. Pathirana, A. Seneviratne, and A. Y. Zomaya. “Federated Learning

for COVID-19 Detection with Generative Adversarial Networks in Edge Cloud Computing”. In:

IEEE Internet of Things Journal (2021), pp. 1–1. doi:10.1109/JIOT.2021.3120998.

[111]

Y. Xiao, K. R. Peters, W. C. Fox, J. H. Rees, D. A. Rajderkar, M. M. Arreola, I. Barreto,

W. E. Bolch, and R. Fang. “Transfer-Gan: Multimodal Ct Image Super-Resolution Via Transfer

Generative Adversarial Networks”. In: 2020 IEEE 17th International Symposium on Biomedical

Imaging (ISBI). 2020, pp. 195–198. doi:10.1109/ISBI45749.2020.9098322.

[112]

S. Kaji and S. Kida. “Overview of image-to-image translation by use of deep neural networks:

denoising, super-resolution, modality conversion, and reconstruction in medical imaging”. In:

Radiological Physics and Technology 12.3 (2019), pp. 235–248. doi:10.1007/s12194-019-00520-y.

[113]

Y. Zhu, T. Park, P. Isola, and A. A. Efros. “Unpaired Image-to-Image Translation using

Cycle-Consistent Adversarial Networks”. In: arXiv (2017). doi:10.48550/ARXIV.1703.10593.

[114]

K. Armanious, C. Jiang, S. Abdulatif, T. Küstner, S. Gatidis, and B. Yang. “Unsupervised

Medical Image Translation Using Cycle-MedGAN”. In: 2019 27th European Signal Processing

Conference (EUSIPCO). 2019, pp. 1–5. doi:10.23919/EUSIPCO.2019.8902799.

[115]

J. M. Wolterink, A. M. Dinkla, M. H. F. Savenije, P. R. Seevinck, C. A. T. v. d. Berg, and

I. Isgum. “Deep MR to CT Synthesis using Unpaired Data”. In: arXiv (2017). doi:10.48550/ARXIV.1708.01155.

[116]

O. Maier, B. H. Menze, J. von der Gablentz, L. Häni, M. P. Heinrich, M. Liebrand, S. Winzeck,

A. Basit, P. Bentley, L. Chen, D. Christiaens, F. Dutil, K. Egger, C. Feng, B. Glocker, M. Götz,

T. Haeck, H.-L. Halme, M. Havaei, K. M. Iftekharuddin, P.-M. Jodoin, K. Kamnitsas, E. Kellner,

A. Korvenoja, H. Larochelle, C. Ledig, J.-H. Lee, F. Maes, Q. Mahmood, K. H. Maier-Hein, R.

McKinley, J. Muschelli, C. Pal, L. Pei, J. R. Rangarajan, S. M. S. Reza, D. Robben, D. Rückert,

E. Salli, P. Suetens, C.-W. Wang, M. Wilms, J. S. Kirschke, U. M. Krämer, T. F. Münte, P.

Schramm, R. Wiest, H. Handels, and M. Reyes. “ISLES 2015 - A public evaluation benchmark

for ischemic stroke lesion segmentation from multispectral MRI”. In: Medical Image Analysis 35

(2017), pp. 250–269. doi:10.1016/j.media.2016.07.009.

[117]

K. Kamnitsas, C. Ledig, V. F. J. Newcombe, J. P. Simpson, A. D. Kane, D. K. Menon,

D. Rückert, and B. Glocker. “Efficient multi-scale 3D CNN with fully connected CRF for

accurate brain lesion segmentation”. In: Medical Image Analysis 36 (2017), pp. 61–78. doi:

10.1016/j.media.2016.10.004.

[118]

Y. Zhang, S. Liu, C. Li, and J. Wang. “Application of Deep Learning Method on Ischemic

Stroke Lesion Segmentation”. In: Journal of Shanghai Jiaotong University (Science) 27.1 (2022),

pp. 99–111. doi:10.1007/s12204-021-2273-9.

[119]

M. Islam, N. R. Vaidyanathan, V. J. M. Jose, and H. Ren. “Ischemic Stroke Lesion Segmentation

Using Adversarial Learning”. In: Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic

Brain Injuries. Ed. by A. Crimi, S. Bakas, H. Kuijf, F. Keyvan, M. Reyes, and T. van Walsum.

Cham: Springer International Publishing, 2019, pp. 292–300. isbn: 978-3-030-11723-8. doi:

10.1007/978-3-030-11723-8_29.

[120]

H. Kuang, B. K. Menon, and W. Qiu. “Automated stroke lesion segmentation in non-contrast

CT scans using dense multi-path contextual generative adversarial network”. In: Physics in

Medicine & Biology 65.21 (2020). doi:10.1088/1361-6560/aba166.

[121]

N. Hu, T. Zhang, Y. Wu, B. Tang, M. Li, B. Song, Q. Gong, M. Wu, S. Gu, and S. Lui.

“Detecting brain lesions in suspected acute ischemic stroke with CT-based synthetic MRI

using generative adversarial networks”. In: Annals of Translational Medicine 10.2 (2022). doi:

10.21037/atm-21-4056.

[122]

S. Wang, Z. Chen, S. You, B. Wang, Y. Shen, and B. Lei. “Brain stroke lesion segmentation using

consistent perception generative adversarial network”. In: Neural Computing and Applications

(2022). doi:10.1007/s00521-021-06816-8.

[123]

M. Platscher, J. Zopes, and C. Federau. “Image translation for medical image generation:

Ischemic stroke lesion segmentation”. In: Biomedical Signal Processing and Control 72 (2022).

doi:10.1016/j.bspc.2021.103283.

[124]

M. D. Moghari, L. Zhou, B. Yu, K. Moore, N. Young, R. Fulton, and A. Kyme. “Estimation of

full-dose 4D CT perfusion images from low-dose images using conditional generative adversarial

networks”. In: 2019 IEEE Nuclear Science Symposium and Medical Imaging Conference

(NSS/MIC). 2019, pp. 1–3. doi:10.1109/NSS/MIC42101.2019.9059723.

[125]

A. Sharma and G. Hamarneh. “Missing MRI Pulse Sequence Synthesis Using Multi-Modal

Generative Adversarial Network”. In: IEEE Transactions on Medical Imaging 39.4 (2020),

pp. 1170–1183. doi:10.1109/TMI.2019.2945521.

[126]

M. F. Rachmadi, M. del C. Valdés-Hernández, S. Makin, J. M. Wardlaw, and T. Komura.

“Predicting the Evolution of White Matter Hyperintensities in Brain MRI Using Generative

Adversarial Networks and Irregularity Map”. In: Medical Image Computing and Computer

Assisted Intervention – MICCAI 2019. Ed. by D. Shen, T. Liu, T. M. Peters, L. H. Staib,

C. Essert, S. Zhou, P.-T. Yap, and A. Khan. Cham: Springer International Publishing, 2019,

pp. 146–154. isbn: 978-3-030-32248-9. doi:10.1007/978-3-030-32248-9_17.

[127]

A. Hess, R. Meier, J. Kaesmacher, S. Jung, F. Scalzo, D. Liebeskind, R. Wiest, and R. McKinley.

“Synthetic Perfusion Maps: Imaging Perfusion Deficits in DSC-MRI with Deep Learning”. In:

Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. Ed. by A. Crimi,

S. Bakas, H. Kuijf, F. Keyvan, M. Reyes, and T. van Walsum. Lecture Notes in Computer

Science. Cham: Springer International Publishing, 2019, pp. 447–455. isbn: 978-3-030-11723-8.

doi:10.1007/978-3-030-11723-8_45.

[128]

K. C. Ho, F. Scalzo, K. V. Sarma, S. El-Saden, and C. W. Arnold. “A temporal deep learning

approach for MR perfusion parameter estimation in stroke”. In: 2016 23rd International

Conference on Pattern Recognition (ICPR). Cancun: IEEE, 2016, pp. 1315–1320. isbn: 978-1-5090-4847-2.

doi:10.1109/ICPR.2016.7899819.

[129]

R. Meier, P. Lux, B. Med, S. Jung, U. Fischer, J. Gralla, M. Reyes, R. Wiest, R. McKinley,

and J. Kaesmacher. “Neural Network–derived Perfusion Maps for the Assessment of Lesions

in Patients with Acute Ischemic Stroke”. In: Radiology: Artificial Intelligence 1.5 (2019). doi:

10.1148/ryai.2019190019.

[130]

C. Xu, J. Ren, D. Zhang, Y. Zhang, Z. Qin, and K. Ren. “GANobfuscator: Mitigating Information

Leakage Under GAN via Differential Privacy”. In: IEEE Transactions on Information Forensics

and Security 14.9 (2019), pp. 2358–2371. doi:10.1109/TIFS.2019.2897874.

[131]

R. Shokri, M. Stronati, C. Song, and V. Shmatikov. “Membership Inference Attacks Against

Machine Learning Models”. In: 2017 IEEE Symposium on Security and Privacy (SP). 2017,

pp. 3–18. doi:10.1109/SP.2017.41.

[132]

N. Rieke, J. Hancox, W. Li, F. Milletarì, H. R. Roth, S. Albarqouni, S. Bakas, M. N. Galtier,

B. A. Landman, K. Maier-Hein, S. Ourselin, M. Sheller, R. M. Summers, A. Trask, D. Xu,

M. Baust, and M. J. Cardoso. “The future of digital health with federated learning”. In: npj

Digital Medicine 3.1 (2020), pp. 1–7. doi:10.1038/s41746-020-00323-1.

[133]

M. Grama, M. Musat, L. Muñoz-González, J. Passerat-Palmbach, D. Rückert, and A. Alansary.

Robust Aggregation for Adaptive Privacy Preserving Federated Learning in Healthcare. Tech. rep.

2020. doi:10.48550/ARXIV.2009.08294.

[134]

S. Hu, Y. Shen, S. Wang, and B. Lei. “Brain MR to PET Synthesis via Bidirectional Generative

Adversarial Network”. In: Medical Image Computing and Computer Assisted Intervention –

MICCAI 2020. Ed. by A. L. Martel, P. Abolmaesumi, D. Stoyanov, D. Mateus, M. A. Zuluaga,

S. K. Zhou, D. Racoceanu, and L. Joskowicz. Lecture Notes in Computer Science. Cham:

Springer International Publishing, 2020, pp. 698–707. isbn: 978-3-030-59713-9. doi:10.1007/978-3-030-59713-9_67.

[135]

M. T. Gracia Bara, A. Gallardo-Higueras, E. M. Moreno, E. Laffond, F. J. Muñoz Bellido, C.

Martin, M. Sobrino, E. Macias, S. Arriba-Méndez, R. Castillo, and I. Davila. “Hypersensitivity

to Gadolinium-Based Contrast Media”. In: Frontiers in Allergy 3 (2022). doi:10.3389/falgy.2022.813927.

[136]

J. Jung, S.-H. Han, and H.-J. Choi. “Explaining CNN and RNN Using Selective Layer-Wise

Relevance Propagation”. In: IEEE Access 9 (2021), pp. 18670–18681. doi:10.1109/ACCESS.2021.3051171.

[137]

V. Nagisetty, L. Graves, J. Scott, and V. Ganesh. “xAI-GAN: Enhancing Generative Adversarial

Networks via Explainable AI Systems”. In: arXiv (2020). doi:10.48550/ARXIV.2002.10438.

[138]

C. Wu, H. Zhang, J. Chen, Z. Gao, P. Zhang, K. Muhammad, and J. Del Ser. “Vessel-

GAN: Angiographic reconstructions from myocardial CT perfusion with explainable generative

adversarial networks”. In: Future Generation Computer Systems 130 (2022), pp. 128–139. doi:

10.1016/j.future.2021.12.007.

[139]

X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel. “InfoGAN:

Interpretable Representation Learning by Information Maximizing Generative Adversarial

Nets”. In: arXiv (2016). doi:10.48550/ARXIV.1606.03657.

[140]

R. Toda, A. Teramoto, M. Tsujimoto, H. Toyama, K. Imaizumi, K. Saito, and H. Fujita.

“Synthetic CT image generation of shape-controlled lung cancer using semi-conditional InfoGAN

and its applicability for type classification”. In: International Journal of Computer Assisted

Radiology and Surgery 16.2 (2021), pp. 241–251. doi:10.1007/s11548-021-02308-1.

[141]

J. Fragemann, L. Ardizzone, J. Egger, J. Kleesiek, and M. Workshop. Review of Disentanglement

Approaches for Medical Applications: Towards Solving the Gordian Knot of Generative Models

in Healthcare. preprint. 2022. doi:10.36227/techrxiv.19364897.v1.

[142]

A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani,

M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby. “An Image is Worth 16x16

Words: Transformers for Image Recognition at Scale”. In: arXiv (2020). doi:10.48550/ARXIV.2010.11929.

[143]

F. Shamshad, S. Khan, S. W. Zamir, M. H. Khan, M. Hayat, F. S. Khan, and H. Fu.

“Transformers in Medical Imaging: A Survey”. In: arXiv (2022). doi:10.48550/ARXIV.2201.09873.

[144]

C. Matsoukas, J. F. Haslum, M. Söderberg, and K. Smith. “Is it Time to Replace CNNs with

Transformers for Medical Images?” In: arXiv (2021). doi:10.48550/ARXIV.2108.09038.

[145]

S. A. Kamran, K. F. Hossain, A. Tavakkoli, S. L. Zuckerbrod, and S. A. Baker. “VTGAN:

Semi-supervised Retinal Image Synthesis and Disease Prediction using Vision Transformers”. In:

2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) (2021),

pp. 3228–3238. doi:10.1109/ICCVW54120.2021.00362.

[146]

X. Li, Y. Jiang, J. J. Rodriguez-Andina, H. Luo, S. Yin, and O. Kaynak. “When medical

images meet generative adversarial network: recent development and research opportunities”.

In: Discover Artificial Intelligence 1.1 (2021). doi:10.1007/s44163-021-00006-0.

[147]

A. M. Alaa, B. van Breugel, E. Saveliev, and M. van der Schaar. “How Faithful is your Synthetic

Data? Sample-level Metrics for Evaluating and Auditing Generative Models”. In: arXiv (2021).

doi:10.48550/ARXIV.2102.08921.

[148]

N. Shrivastava, M. A. Hanif, S. Mittal, S. R. Sarangi, and M. Shafique. “A survey of hardware

architectures for generative adversarial networks”. In: Journal of Systems Architecture 118

(2021). doi:10.1016/j.sysarc.2021.102227.

[149]

D. Saxena and J. Cao. “Generative Adversarial Networks (GANs): Challenges, Solutions, and

Future Directions”. In: ACM Computing Surveys 54.3 (2022), pp. 1–42. doi:10.1145/3446374.
