scieee Science in your language
[en] (orig)

On Using Secure Aggregation in Differentially Private Federated Learning with Multiple Local Steps

Author: Heikkilä, Mikko
Publisher: Zenodo
DOI: 10.5281/zenodo.17293440
Source: https://zenodo.org/records/17293440/files/Heikkila_TMLR2025.pdf
On Using Secu e Agg ega ion in Di e en ially P i a e Fede -
a ed Lea ning wi h Mul iple Local S eps
Mikko A. Heikkilä [email p o ec ed]
Depa men o Compu e Science
Uni e si y o Helsinki ∗
Abs ac
Fede a ed lea ning is a dis ibu ed lea ning se ing whe e he main aim is o ain machine
lea ning models wi hou ha ing o sha e aw da a bu only wha is equi ed o lea ning.
To gua an ee aining da a p i acy and high-u ili y models, di e en ial p i acy and secu e
agg ega ion echniques a e o en combined wi h ede a ed lea ning. Howe e , wi h ine-
g ained p o ec ion g anula i ies, e.g., wi h he common sample-le el p o ec ion, he cu en ly
exis ing echniques gene ally equi e he pa ies o communica e o each local op imiza ion
s ep, i hey wan o ully bene i om he secu e agg ega ion in e ms o he esul ing o mal
p i acy gua an ees. In his pape , we show how a simple new analysis allows he pa ies o
pe o m mul iple local op imiza ion s eps while s ill bene i ing om using secu e agg ega ion.
We show ha ou analysis enables highe u ili y models wi h gua an eed p i acy p o ec ion
unde limi ed numbe o communica ion ounds.
1 In oduc ion
Fede a ed lea ning (FL; McMahan e al. 2017; Kai ouz e al. 2019) is a common dis ibu ed lea ning se ing,
whe e a cen al se e and se e al clien s holding hei own local da a se s collabo a e o ain a single global
model. The main ea u e in FL is ha he clien s do no di ec ly communica e da a, bu only wha is equi ed
o lea ning, e.g., g adien s o upda ed model pa ame e s (pseudo-g adien s).
While FL sa is ies he da a minimiza ion p inciple, i.e., only wha is ac ually needed is communica ed while
he ac ual aw da a ne e lea es he clien , i does no p o ec agains p i acy a acks such as membe ship
in e ence (Shok i e al., 2017) o econs uc ion (F ed ikson e al., 2014; Yeom e al., 2018). Ins ead, aining
da a p i acy is commonly ensu ed by combining di e en ial p i acy (DP; Dwo k e al. 2006b), a o mal
p i acy de ini ion, and secu e mul ipa y compu a ion (MPC; Yao 1982) wi h FL (see, e.g., Kai ouz e al.
2019).
DP is essen ially a obus ness gua an ee o s ochas ic algo i hms, which gua an ees ha small pe u ba ions
o he inpu s ha e small e ec s on he algo i hms’ ou pu p obabili ies. Wha cons i u es a small pe u ba ion
depends on he chosen p o ec ion g anula i y: he same basic DP de ini ion can be used o ensu ing p i acy
on any hing om single sample o en i e da a se le el. In u n, MPC p o ocols can be used o limi he
amoun o in o ma ion an ad e sa y has abou compu a ions. In FL, secu e agg ega ion (SecAgg) p o ocols,
a specialised o m o secu e compu a ion ha equi es signi ican ly less esou ces han gene al MPC, a e
commonly used o communica ing model upda es om he clien s o he se e . This app oach enables
p o ably be e dis ibu ed DP (DDP) gua an ees, ha ely join ly on a g oup o clien s, han is possible o
achie e by any single clien in isola ion.
Unde he gene al FL se up, wo main al e na i es a e commonly conside ed: c oss-de ice FL and c oss-silo
FL (Kai ouz e al., 2019). In c oss-de ice FL, each clien is assumed o ha e a small local da a se , while he
o al numbe o clien s is la ge, e.g., housands o millions. In he c oss-silo case, he o al numbe o clien s
is small, o example, a dozen, bu each clien is assumed o ha e a la ge local da a se . In his pape , ou
∗Wo k done pa ly a Tele ónica Resea ch, Ba celona.
1
unning example is s anda d c oss-silo di e en ially p i a e FL (DPFL) whe e he clien s communica e all
upda es o he se e using SecAgg.
1
In his se ing, he mos use ul DP p o ec ion g anula i y is ypically
some hing s ic ly mo e ine-g ained han clien -le el: when clien s a e, e.g., di e en hospi als o banks,
he e a e ypically se e al indi iduals in a single clien s’ local da a se and he p o ec ion g anula i y needs
o ma ch he use case.
While clien -le el g anula i y in DPFL is, a leas in p inciple, s aigh o wa d o combine wi h SecAgg, mo e
ine-g ained g anula i ies such as sample-le el DP can p esen p oblems: using exis ing echniques one has o
choose be ween i) ha ing DDP gua an ees wi h less noise due o SecAgg bu wi h all clien s using only a
single local op imiza ion s ep pe FL ound, and ii) ha ing mo e noisy local DP (LDP) gua an ees ha do
no o mally bene i om SecAgg while allowing he clien s o do mo e local op imiza ion s eps pe FL ound.
Bo h o hese op ions ha e signi ican d awbacks: he amoun o se e -clien communica ions is ypically one
o he i s bo lenecks ha limi model aining in FL, while LDP gua an ees egula ly equi e noise le els
ha hea ily a ec he esul ing model u ili y. In his pape we show ha his ade-o is no una oidable
bu can be la gely emedied by a simple new analysis o he p oblem.
Ou Con ibu ion
•
We p esen a no el and simple heo e ical p i acy analysis showing when we can inc ease he numbe
o local op imiza ion s eps in FL using ine-g ained DP g anula i y, while s ill bene i ing om DDP
gua an ees using a us ed agg ega o .
•
We demons a e empi ically ha he p oposed app oach can lead o la ge u ili y bene i s (in e ms
o p edic ion accu acy and loss) wi hou equi ing any changes o he unde lying algo i hms unde
bo h iid and he e ogeneous clien da a spli s. In ou expe imen s wi h limi ed numbe o global
communica ion ounds, using a con olu ional neu al ne wo k wi h Fashion MNIST da a se as well
as linea models on CIFAR-10 (using p e- ained model as ea u e ex ac o ) and ACS Income da a
se s, we imp o e p edic ion accu acy oughly by 16% poin s, 4% poin s, and 4% poin s, espec i ely
(co esponding o 23%,5% and 5% imp o emen s).
•
Ou esul s poin o a misma ch be ween he cu en heo e ical unde s anding o anilla DPFL
(s anda d DPFL wi h FedA g) and p ac ical esul s.
2 Rela ed Wo k
The e is a signi ican amoun o exis ing wo k ocusing on he gene al p oblem o combining DP wi h FL,
al hough he ocus has mos ly been on he c oss-de ice FL se ing wi h use - o clien -le el DP. To he
bes o ou knowledge, while he combina ion o DPFL wi h SecAgg is ce ainly no no el (see, e.g., T uex
e al. 2019; Kai ouz e al. 2019; Heikkilä e al. 2020; S e ens e al. 2022; Yang e al. 2023), he e is e y
limi ed amoun o wo k speci ically on he p i acy analysis when he clien s do mul iple local op imiza ion
s eps wi h ine-g ained DP and communica e he esul s ia SecAgg. As a as we know, he only di ec ly
ele an analysis is by Noble e al. (2022), who conside he Gaussian mechanism o sample-le el DP in he
c oss-de ice se ing wi hou secu e agg ega ion. Ou main heo em (Theo em 4.6) can be seen as ex ending
hei esul on he Gaussian mechanism o o he DP mechanisms ha a e mo e sui able o combining wi h
ac ual secu e agg ega ion p o ocols. 2
Conside ing o he exis ing wo k in mo e de ail, we can dis inguish some main lines o closely- ela ed esea ch.
Besides he DDP app oach we ocus on, i.e., he combina ion o LDP noise wi h secu e agg ega ion, he use
o LDP mechanism by each clien independen ly p o ides obus p i acy wi h ew assump ions, bu gene ally
esul s in wo se u ili y (see, e.g., Kasi iswana han e al. 2008; T uex e al. 2020). Ano he ecen ly conside ed
1
Ins ead o conside ing any speci ic SecAgg implemen a ion, in his wo k we mos ly assume an idealised us ed agg ega o .
We discuss p ac ical implemen a ions in Appendix A.4.
2
No e ha (T uex e al., 2019, Algo i hm 4) also s a e a simila specialised e sion o he Gaussian mechanism, i.e., hey use
se e al local op imiza ion s eps wi h sample-le el DP and SecAgg in FL, while scaling he Gaussian noise o DDP conside ing
all he clien s. Howe e , as also no ed by Malekzadeh e al. (2021), he app oach o T uex e al. (2019) would equi e a sepa a e
p oo o p i acy beyond wha is ac ually p o ided in he pape .
2
app oach assumes a us ed se e ha is asked wi h he DP noise addi ion, and does p i acy accoun ing
ia he ma ix mechanism (Zhang e al., 2024). This can be shown o imp o e he p i acy-u ili y ade-o
compa ed o s anda d DP-SGD unde some assump ions, bu is no di ec ly compa ible wi h he secu e
agg ega ion app oach o DDP we ocus on in his wo k.
Looking a somewha mo e closely- ela ed pape s in e ms o he gene al se ing, he e is plen y o wo k
p oposing no el lea ning me hods o FL, assuming sample-le el DP and LDP noise wi h SecAgg o imp o ed
DDP gua an ees. While he exis ing wo k only uses a single local op imiza ion s ep (see, e.g., Heikkilä e al.
2020; Malekzadeh e al. 2021; S e ens e al. 2022; Yang e al. 2023), ou analysis can be le e aged in his
se ing o enable unning mul iple local s eps gene ally o many such me hods wi hou equi ing any o he
changes o he algo i hms.
Ano he clea line o wo k has ocused on in oducing no el disc e e DP mechanisms ha can be used wi h
addi i ely homomo phic enc yp ion echniques, which ypically ope a e on he g oup o in ege s wi h modulo
addi ions. Aga wal e al. (2018) p oposed a binomial mechanism ha p o ides DP using disc e e binomial
noise. Imp o ing on he binomial mechanism, Canonne e al. (2020) p oposed a disc e e Gaussian mechanism,
while Aga wal e al. (2021) in oduced a Skellam mechanism and Chen e al. (2022b) a Poisson-binomial
mechanism, bo h o which imp o e on he disc e e Gaussian, e.g., by being in ini ely di isible dis ibu ions:
he sum o Skellam/Poisson-binomial dis ibu ed andom a iables is ano he Skellam/Poisson-binomial
andom a iable. Ou wo k is no ocused on in oducing new DP mechanisms, bu ou analysis allows
o using many di e en DP noise mechanism. In pa icula , ou analysis allows o using LDP noise wi h
SecAgg including when using in ini ely di isible DP mechanisms, such as he Skellam mechanism, wi h
pseudo-g adien s and ine-g ained DP p o ec ion le el.
While ou main ocus is on p i acy accoun ing wi h SecAgg unde limi ed communica ion budge , he e has
also been conside able e o by he communi y o educe he amoun o equi ed communica ion u he by
applying quan iza ion o he g adien s (Aga wal e al., 2018; Kai ouz e al., 2021; Aga wal e al., 2021; Chen
e al., 2022b; Jin e al., 2020; Chaudhu i e al., 2022; Guo e al., 2023) o by comp essing he upda es sen by
he clien s (T ias cyn e al., 2021; Chen e al., 2022a). In p inciple, any such echnique o comp essing he
model upda es compa ible wi h SecAgg can also be di ec ly combined wi h ou p i acy analysis. In con as ,
bene i ing om g adien quan iza ion is no en i ely s aigh o wa d as in ou case he model upda es a e
pseudo-g adien s and no g adien s. We lea e a de ailed conside a ion and compa ison o he possible me hods
o educing he equi ed communica ion budge beyond wha is possible by pushing mo e op imiza ion s eps
o he clien s o u u e wo k.
In summa y, while many o he con ibu ions ci ed abo e, e.g., no el DP mechanisms, a e no limi ed o
c oss-de ice FL, all he expe imen s and use cases men ioned in he ci ed pape s ha a e compa ible wi h
SecAgg and use mul iple local s eps only conside DDP wi h use - o clien -le el DP in c oss-de ice FL.
In con as , we ocus on mo e ine-g ained DP g anula i ies, namely on sample-le el DP. As we discuss in
Sec ion 3, combining sample-le el DP wi h mul iple local s eps and LDP noise wi h SecAgg o imp o ed
u ili y equi es a no el p i acy analysis. The main aim o his pape is o p o ide such an analysis.
While he cu en ly exis ing heo e ical con e gence bounds o anilla DPFL wi h FedA g do no show any
bene i om inc easing he numbe o local s eps in DPFL (see Malekmohammadi e al. 2024, Theo em
3.2), we empi ically demons a e he u ili y o ou analysis in Sec ion 5 a e s a ing he esul s in Sec ion 4.
Ou esul s clea ly highligh he need o imp o ing he heo e ical analysis o s anda d DPFL o e wha is
shown by Malekmohammadi e al. (2024) o unde s and when inc easing he numbe o local s eps is use ul
(compa e his disag eemen o empi ical esul s and heo y o he discussion by Mishchenko e al. 2022 on he
p o able use ulness o local s eps in anilla non-DP FL).
3 Backg ound
Fede a ed lea ning (FL, McMahan e al. 2017; Kai ouz e al. 2019) is a collabo a i e lea ning se ing, whe e
he pa icipan s include a cen al se e and clien s holding some da a. On each FL ound, he se e chooses
a g oup o clien s o an upda e and sends hem he cu en model pa ame e s. The chosen clien s upda e
hei local model pa ame e s by aking some amoun o op imiza ion s eps using only hei own local da a,
3
and hen send an upda e back o he se e . The se e hen agg ega es he clien -speci ic con ibu ions o
upda e he global model. We use he s anda d ede a ed a e aging upda e ule: assuming w.l.o.g. ha clien s
i
= 1
, . . . , N
ha e been selec ed a FL ound
, and ha clien
i
sends an upda e ∆
( )
i
(pseudo-g adien ), he
upda ed global model θ is gi en by
θ =θ −1+1
N
N
X
i=1
∆( )
i.(1)
3.1 Di e en ial P i acy
We wan o gua an ee p i acy o he ained model w. . . he aining da a, o which we use di e en ial
p i acy (DP). W i ing he space o possible da a se s as X∗:= ∪n∈NXn, we ha e he ollowing:
De ini ion 3.1. (Dwo k e al., 2006b;a) Le
ε >
0and
δ∈
[0
,
1]. A andomised algo i hm
A
:
X∗→ O
is
(ε, δ)-DP i o e e y x, x′∈ X∗:x≃x′, and e e y measu able se E⊂ O,
P(A(x)∈E)≤eεP(A(x′)∈E) + δ,
whe e
≃
is a neighbou hood ela ion.
A
is igh ly (
ε, δ
)-DP, i he e does no exis
δ′< δ
such ha
A
is (
ε, δ′
)-DP. When
δ
= 0, we w i e
ε
-DP and call i pu e DP. The mo e gene al case (
ε, δ
)-DP is called
app oxima e DP (ADP).
De ini ion 3.1 can be equi alen ly s a ed as a bound on he so-called hockey-s ick di e gence:
De ini ion 3.2. Le α > 0. The hockey-s ick di e gence be ween dis ibu ions P, Q is gi en by
Hα(P∥Q) := E ∼Q dP
dQ( )−α+!=E ∼Q p( )
q( )−α+!,(2)
whe e [
a
]
+
:=
max
(
a,
0),
dP
dQ
is he Radon-Nikodym de i a i e, and
p, q
a e he densi ies o
P, Q
, espec i ely.
In he es o his pape , we assume ha all he ele an densi ies exis s.
I has been shown ha a andomised algo i hm
A
is (
ε, δ
)-DP i
supx≃x′Heε
(
A
(
x
)
∥A
(
x′
))
≤δ
(Ba he e al.,
2013; Ba he & Olmedo, 2013).
Ou main esul s do no depend on he exac neighbou hood de ini ion, bu in all he expe imen s we use he
add/ emo e ela ion o unbounded DP, ha is,
x, x′∈ X∗
a e neighbou s, i
x
can be ans o med in o
x′
by adding o emo ing a single p o ec ed uni om
X
. Fo he p o ec ion g anula i y, al hough his is no
a s ic limi a ion, we ocus on he common sample-le el DP, i.e., a single p o ec ed uni co esponds o a
single sample o da a. As no ed ea lie , ou analysis could be ad an ageous o any hing mo e ine-g ained
han clien -le el DP, e.g., elemen -le el DP (Asi e al., 2019) o e en o indi idual-le el when he e a e mo e
han one indi idual in he clien s’ local da a.
We also make use o domina ing pai s:
De ini ion 3.3. (Zhu e al., 2022) A pai o dis ibu ions (
P, Q
)is a domina ing pai o a s ochas ic
algo i hm A, i o all α≥0
sup
x,x′∈X∗:x≃x′
Hα(A(x)∥A(x′)) ≤Hα(P∥Q),(3)
whe e Hαis he hockey-s ick di e gence (De ini ion 3.2).
3.2 P oblem wi h Local S eps in DPFL wi h SecAgg
While DP o e s s ic p i acy p o ec ion, i comes a he cos o educed model u ili y. This is especially
ue in he local DP (LDP) se ing, whe e each clien p o ec s i s own da a independen ly o any o he pa y
(Kasi iswana han e al., 2008). One well-known echnique o imp o e model u ili y in DPFL has been o
u ilise secu e agg ega ion (SecAgg) o u n LDP gua an ees in o dis ibu ed DP (DDP) gua an ees ha
4
depend on mul iple clien s (see, e.g., Kai ouz e al. 2019).
3
In e ec , his means ha when he clien s a e able
o collabo a e, hey can join ly scale he locally used noise le els o each a a ge DDP gua an ee esul ing
in imp o ed u ili y compa ed o he LDP gua an ees. E en i he clien s do no explici ly collabo a e on
calib a ing he noise le els, he DDP gua an ees esul ing om using SecAgg will ne e heless be s onge
compa ed o he LDP gua an ees eached by each clien indi idually.
Howe e , nai ely combining ine-g ained DP p o ec ion wi h SecAgg o DDP uns in o p oblems, as we
demons a e in he es o his sec ion. Fo con enience, we gi e he pseudo-code o unning s anda d DPFL
wi h sample- and use -le el DP p o ec ion in Appendix A.2.
S a ing wi h he unp oblema ic case o clien -le el DP, w i ing
TA
o an ideal us ed agg ega o and using
he well-known Gaussian mechanism (Dwo k e al., 2006a) o simplici y, one can ge DDP gua an ees o
any numbe o local op imiza ion s eps wi h he ollowing upda e:
θ =θ −1+1
NTA N
X
i=1
clipC(∆( )
i) + ξ( )
i!,(4)
whe e he sum inside
TA
is done by a us ed agg ega o ,
clipC
ensu es ha each clien -speci ic upda e
has
ℓ2
-no m bounded by he cons an
C
, and
ξ( )
i
is Gaussian noise s. .
PN
i=1 ξ( )
i
gi es he DDP p o ec ion
le el we a e aiming o . As he clipping and noise a e applied di ec ly o he upda ed weigh s a e he local
op imiza ion has inished, he p i acy p o ec ion is no a ec ed by he numbe o local op imiza ion s eps
clien iis using o a i e a ∆( )
ibe o e applying DP.
The e is also a simple app oach ha wo ks wi h mo e ine-g ained g anula i ies, when he clien s use a single
local op imiza ion s ep wi h common lea ning a e
γ
and, o example, s anda d DP s ochas ic g adien
descen (DP-SGD, Song e al. 2013) again u ilising Gaussian noise: we can ake ∆
( )
i
=
−γ
(
g( )
i
+
ξ( )
i
), whe e
g( )
i
is a sum o clipped pe -uni g adien s (e.g. pe -sample o sample-le el DP) om clien
i
, o ha e he
upda e
θ =θ −1−1
NTA N
X
i=1
γ(g( )
i+ξ( )
i)!.(5)
Looking a he sum in Equa ion 5, since each pe -uni g adien has a common bounded no m and Gaussian
noise is in ini ely di isible, i.e., he summed-up noise is ano he Gaussian, we can calcula e he esul ing
p i acy wi h s anda d echniques (see, e.g., Mi ono e al. 2019; Koskela e al. 2020; Zhu e al. 2022). Now, i
one ies o use he same easoning wi h sample-le el DP using
S >
1local op imiza ion s eps, he p oblem is
ha he sensi i i y o he pe -sample clipped g adien s when summed o e he local s eps inc eases wi h
S
:
assuming ∥gi,s∥2≤C, s = 1, . . . , S implies ha ∥PS
s=1 gi,s∥2≤SC ( iangle-inequali y).4
In o he wo ds, ying o scale he noise o e mul iple local op imiza ion s eps nai ely ends up scaling he
que y sensi i i y linea ly wi h he o al numbe o s eps, while he ob ious p oblem in using only a single
s ep pe FL ound is ha he numbe o communica ion ounds is ypically one o he main bo lenecks in
FL (Kai ouz e al., 2019).
3.3 T us Model
In his pape , we assume an hones -bu -cu ious (hbc) se e and ha all he clien s a e ully hones . The
la e assump ion can be easily gene alised o allow o hbc clien s wi h some weakening o he ele an
p i acy bounds: wi h
N
(non-colluding) hbc clien s, since any clien could po en ially emo e i s own noise
om he agg ega ed esul s, he noise om he o he
N−
1clien s needs o gua an ee he a ge DP le el. In
e ec , o allow o all hbc clien s, we would need o scale up he noise le el somewha (see, e.g., Heikkilä
e al. 2017 o a discussion on he noise scaling and o o mal p oo s).
3
He e, we dis inguish be ween local DP (LDP, each clien gua an ees DP o i ’s own con ibu ion locally and in isola ion),
dis ibu ed DP (DDP, a g oup o clien s join ly con ibu e o he esul ing DP gua an ees agains ou side ad e sa ies wi hou a
us ed cen al pa y) and cen alised DP (CDP, a us ed pa y gua an ees DP o e e yone).
4
No e ha , as al eady men ioned in Sec ion 2, o he special case o Gaussian noise wi h a us ed se e , he analysis o
Noble e al. (2022) shows ha we can indeed a oid he p oblem and ge alid DDP gua an ees ha imp o e on he LDP case.
5

In p inciple, he same echnique can also p o ec agains p i acy h ea s in he case o including some ully
malicious clien s in he p o ocol (i.e, simply scale he noise so ha he hbc clien s a e enough o gua an ee
he equi ed DP le el). Howe e , in his case he equi ed le el o ex a noise will inc ease quickly wi h he
numbe o malicious clien s leading o hea ie u ili y loss. Wi h malicious clien s, he e would also be no
gua an ee ha he lea ning algo i hm e mina es p ope ly.
4 P i acy Accoun ing o Mul iple Local S eps Using a T us ed Agg ega o
Conside s anda d FL se ing wi h
M
clien s and clien
i
holding some local da a
xi
. On FL ound
,
N
clien s a e selec ed o upda ing by he se e , w.l.o.g. assumed o be clien s
i
= 1
, . . . , N
. Each selec ed clien
i
ecei es he cu en model pa ame e s
θ( −1)
om he se e , hen uns
S
local op imiza ion s eps using
DP-SGD wi h cons an lea ning a e
γ
, and inally sends an upda e o he se e ia a us ed agg ega o
TA:
∆( )
i=θ( )
i−θ( −1) =−
S
X
s=1
γ (g( )
i,s +ξ( )
i,s ),(6)
whe e we w i e
g( )
i,s
o he pe -uni clipped g adien s o clien
i
a local s ep
s
, and
ξ( )
i,s
o he DP noise.
A e ecei ing all he messages ia he us ed agg ega o , he se e upda es he global model using FedA g:
θ( )=θ( −1) +1
N
TA(
N
X
i=1
∆( )
i).(7)
In he es o his sec ion we s a e ou main esul s: we show ha unde some assump ions we can accoun
o p i acy in FL by looking a he local op imiza ion s eps while bene i ing om he noise added by all he
clien s on each s ep, e en i he e is no communica ion be ween he clien s du ing he local op imiza ion bu
only a single us ed agg ega ion a he end o he ound o upda e he global model pa ame e s.
W.l.o.g. om now on we d op he FL ound index
and simply w i e, e.g.,
N
ins ead o
N
o he numbe o
upda ing clien s. Since he global upda es do no access any sensi i e da a, once we can do p i acy accoun ing
o a single FL ound, which is he main opic in he es o his sec ion, gene alising o
T
FL ounds can be
done in a s aigh o wa d manne (see Appendix A.5).
In he ollowing, we assume ha all clien s ha e access o an ideal us ed agg ega o , and ha all sums a e
calcula ed by calling he us ed agg ega o . We commen on mo e ealis ic implemen a ions in Appendix A.4
a e s a ing ou main esul s.
We make he ollowing assump ions h oughou his sec ion:
Assump ion 4.1. Le
xi∈ X∗, i
= 1
, . . . , N
. We w i e
x
=
∪N
i=1xi
, and assume ha
xi∩xj
=
∅
o
e e y
i
=
j
, i.e., he e a e no o e lapping samples in di e en clien s’ local da a se s. We a e in e es ed in
ixed-leng h op imiza ion uns o
S
local s eps (common o all clien s), which leads o ( ixed-leng h) adap i e
sequen ial composi ion o p i acy accoun ing (see e.g. Roge s e al. 2016; Zhu e al. 2022). We assume all
clien s use he same lea ning a e
γ
and no m clipping wi h cons an
C
when applicable. We also assume
ha all local DP mechanisms
A(s)
i, s
= 1
, . . . , S, i
= 1
, . . . , N
a e DP w. . . he i s a gumen o any gi en
auxilia y alues (which we gene ally do no w i e ou explici ly).
No e ha we conside how o loosen many o hese assump ions in Appendix A.3.
No all possible DP mechanisms migh allow o imp o ed DDP gua an ees ia simple agg ega ion. Fo
con enience, in De ini ion 4.2 we de ine a amily o sui able mechanisms, which we call sum-domina ing:
De ini ion 4.2 (Sum-domina ing mechanism).Le
A,Ai
:
X∗→ O, i
= 1
, . . . , N
be andomised algo i hms.
We call Aasum-domina ing mechanism w. . . Ai, i = 1, . . . , N, i
sup
x≃x′
Hα N
X
i=1 Ai(xi)∥
N
X
i=1 Ai(x′
i)!≤sup
x≃x′
Hα(A(x)∥A(x′)),(8)
6
whe e
Hα
is he hockey-s ick di e gence, and
≃
is he DP neighbou hood ela ion. I addi ionally
supx≃x′Hα(A(x)∥A(x′)) <maxisupx≃x′Hα(Ai(xi)∥Ai(x′
i))
, hen we call
A
ap ope ly sum-domina ing
mechanism.
Looking a De ini ion 4.2, in p ac ice he in e es ing mechanisms o us will be he p ope ly sum-domina ing
mechanisms, as hese ha e imp o ed p i acy bounds compa ed o any single mechanism
Ai
in isola ion.
Howe e , om he ollowing esul s only Theo em 4.7 is speci ic o p ope sum-domina ing mechanisms.
Conside ing conc e e mechanisms ha a e p ope ly sum-domina ing, simple examples a e gi en by he
exis ing DP mechanisms ha use in ini ely di isible noise (see De ini ion A.1), such as Skellam (Valo ich &
Aldà, 2017; Aga wal e al., 2021), Poisson-binomial (Chen e al., 2022b), as well as he ubiqui ous con inuous
Gaussian (Dwo k e al., 2006a):5
Example 4.3 (Gaussian mechanism).Assume
Ai
is a Gaussian mechanism wi h noise co a iance
C2σ2
i·Id
and
has bounded sensi i i y
C
. Since he no mal dis ibu ion is in ini ely di isible (De ini ion A.1), he combined
mechanism
A
=
PN
i=1 Ai
is ano he Gaussian wi h sensi i i y
C
and noise co a iance
C2
(
PN
i=1 σ2
i
)
·Id
.
Finally, due o well-known exis ing esul s (see e.g. Meise & Mohammadi 2018; Koskela e al. 2020; Zhu
e al. 2022), a ( igh ly) domina ing pai o dis ibu ions (
P, Q
)in he sense o De ini ion 3.3 o he sum-
domina ing mechanism
A
is gi en by a pai o 1d Gaussians wi h means
µP
= 0
, µQ
= 1, and a iances
σ2
P
=
σ2
Q
=
PN
i=1 σ2
i
. As he DP gua an ees o he Gaussian imp o e when inc easing he a iance (Balle &
Wang, 2018), Ais a p ope ly sum-domina ing mechanism w. . . o Ai, i = 1, . . . , N (De ini ion 4.2).
Nex , we conside composing a sum-domina ing mechanisms o e
S
local s eps. This allows us o accoun o
he o al p i acy when doing mo e han one local op imiza ion s ep:
Lemma 4.4. Assume
A(s)
is a sum-domina ing mechanism w. . .
A(s)
i, i
= 1
, . . . , N
o e e y
s
= 1
, . . . , S
.
Then he composi ion o he sum-domina ing mechanisms (A(1),...,A(S))domina es he composi ion
N
X
i=1 A(1)
i,...,
N
X
i=1 A(S)
i!.(9)
P oo . Fo any s∈ {1, . . . , S}, we immedia ely ha e
sup
x≃x′
Hα N
X
i=1 A(s)
i(xi)∥
N
X
i=1 A(s)
i(x′
i)!≤sup
x≃x′
HαA(s)(x)∥A(s)(x′)(10)
by de ini ion o
A
(De ini ion 4.2). The claim he e o e ollows immedia ely om (Zhu e al., 2022, Theo em 10).
Conside ing Lemma 4.4, in ou case i essen ially says ha o accoun o unning
S
local op imiza ion s eps,
i is enough o ind a sum-domina ing mechanism o each s ep sepa a ely.
Wi h he nex esul gi en in Lemma 4.5, we can connec he p e ious esul s wi h he o m o ou pu we ge
om ac ually unning local op imiza ion in FL:
Lemma 4.5. Assume ha eleasing he ec o
N
X
i=1 A(1)
i(xi),...,
N
X
i=1 A(S)
i(xi)!(11)
sa is ies (ε, δ)-DP. Then eleasing
N
X
i=1
S
X
s=1 A(s)
i(xi)(12)
5
Disc e e Gaussian (Canonne e al., 2020) is no in ini ely di isible, bu is close-enough ha a p ope ly sum-domina ing
mechanism can s ill be ound in many p ac ical se ings, see Kai ouz e al. (2021). In such cases, he inequali y in Equa ion 8
could always be s ic , whe eas o he in ini ely di isible noise mechanisms men ioned in he ex i can be w i en as an equali y.
We no e ha e en in his case, howe e , w i ing Equa ion 8 wi h inequali y is necessa y o a oid nonsensical limi a ions, such as
ha ing a DP mechanism ha sa is ies De ini ion 4.2 wi h a gi en δwhile no sa is ying i o any δ′> δ.
7
also sa is ies (ε, δ)-DP.
P oo .
Due o he pos -p ocessing immuni y o DP (see, e.g., Dwo k & Ro h 2014), he assump ion implies
ha eleasing
S
X
s=1
N
X
i=1 A(s)
i(xi)(13)
sa is ies (
ε, δ
)-DP, and by exchanging he o de o summa ion he claim ollows. No e ha all he mechanisms
a e assumed o be DP w. . . hei i s a gumen o any gi en auxilia y alue, which allows us o do he
exchange wi hou a ec ing p i acy (in he con ex o FL, we e ec i ely swi ch om communica ing be ween
each local s ep o unning all local s eps and hen communica ing).
Taken oge he , De ini ion 4.2 along wi h Lemmas 4.4 & 4.5 allow us o compose DP mechanisms o DDP
gua an ees using LDP noise om mul iple clien s. In ou main esul gi en as Theo em 4.6, we show ha
wi h (p ope ly) sum-domina ing mechanisms each clien can un DP-SGD wi h se e al local s eps and LDP
noise while ecei ing (imp o ed) DDP gua an ees when communica ing he upda e ia a us ed agg ega o .
Theo em 4.6. Assume
N
clien s use local noise mechanisms
A(s)
i, i
= 1
, . . . , N
as in De ini ion 4.2 o each
local g adien op imiza ion s ep
s
= 1
, . . . , S
, and ha he inal agg ega ed upda e
PN
i=1
∆
i
is eleased ia an
ideal us ed agg ega o . Then deno ing he sum-domina ing mechanism o s ep
s
by
A(s)
, he que y elease
sa is ies (ε(δ), δ)-DP o any δ∈[0,1], when ε(δ)is gi en by accoun ing o eleasing he ec o
A(1)(x),...,A(S)(x),
whe e x=∪N
i=1xi.
P oo .
Fo p i acy accoun ing, assuming all sums a e done by us ed agg ega o
TA
, eleasing he agg ega ed
upda e TA(PN
i=1 ∆i)co esponds o eleasing he esul
−γ
N
X
i=1
S
X
s=1 A(s)
i(xi;zi,s),
whe e each mechanism includes a mapping ha maps he local samples o he clipped pe -uni g adien s as
well as he DP noise, and
zi,s
a e auxilia y alues (e.g., s a e a e he p e ious s ep).
6
Since all mechanisms
a e assumed o be DP w. . . he i s a gumen o any auxilia y alue, he auxilia y alues do no a ec he
DP gua an ees, and hence we do no w i e hem explici ly in he ollowing.
F om Lemma 4.5 i ollows ha alid DP gua an ees can be es ablished by accoun ing o he elease o he
ec o
PN
i=1 A(1)
i(xi),...,PN
i=1 A(S)
i(xi)
. Fu he mo e, De ini ion 4.2 implies ha o any s ep
s∈1, . . . , S
,
he sum-domina ing mechanism
A(s)
domina es
PN
i=1 A(s)
i
, and he e o e by Lemma 4.4 he claim ollows.
Conside ing p ope ly sum-domina ing mechanisms based on in ini ely di isible noise, we also ha e he
ollowing heo em on scaling he noise wi h mul iple clien s:
Theo em 4.7. In addi ion o he assump ions o Theo em 4.6, assume ha each clien uses a common noise
le el
σLDP
o some mechanism based on in ini ely di isible addi i e noise o each local s ep s. . he DP
noise
ξLDP
o any gi en local mechanism
A(s)
i, i
= 1
, . . . , N, s
= 1
, . . . , S
, has co a iance
σ2
LDP
Σ
d
o some
Σd. Then o gua an eeing gi en (ε(δ), δ)-DDP, he equi ed σLDP scales as O(1
√N).
6
Fo example, wi h s anda d DP-SGD, sample-le el DP and con inuous Gaussian noise,
PN
i=1
∆
i
=
−γPN
i=1 PS
s=1
(
gi,s
+
ξi,s), whe e gi,s a e (sums o ) clipped pe -sample g adien s and ξi,s a e he pe -s ep Gaussian noises.
8
P oo .
Le
σ2
Σbe such ha when accoun ing o he o al o
S
local s eps, he composi ion o he sum-
domina ing mechanisms
A(1),...,A(S)
is (
ε
(
δ
)
, δ
)-DDP. F om De ini ion A.1 i immedia ely ollows ha
he equi ed o al noise can be b oken in o pa s while s aying in he same noise amily o gi e
σ2=
N
X
i=1
σ2
LDP ⇔σ2
LDP =σ2
N⇔σLDP =σ
√N.
We no e ha simila scaling esul s as gi en in Theo em 4.7 o speci ic cases like he Gaussian mechanism
a e al eady well-known in he exis ing li e a u e (see, e.g., Noble e al. 2022; Heikkilä e al. 2017).
Finally, conside ing igh ness o he p i acy accoun ing done based on Theo em 4.6, i is wo h no ing ha
since he accoun ing elies on Lemma 4.5, which assumes eleasing each local s ep while he ac ually eleased
que y answe is a sum o e he local s eps, he esul ing p i acy bound need no be igh bu an uppe
bound on he p i acy budge . Howe e , his ma ches he usual DP-SGD p i acy accoun ing analysis (see e.g.
Mi ono e al. 2019; Koskela e al. 2020), which ypically needs o accoun o each local op imiza ion s ep
due o echnical easons e en i only he inal model is eleased. In he gene al case, i has also been shown
ha hiding he in e media e s eps does no b ing any p i acy bene i s compa ed o he pe -s ep accoun ing
(Annamalai, 2024).
5 Expe imen s
Se up and Mo i a ion: Ou chosen se ings y o mimic a ypical c oss-silo FL se up: he e a e a limi ed
numbe o clien s, each ha ing a smallish local da abase. The clien s ha e enough local compu e o un
op imiza ion on he chosen model, while he numbe o se e -clien communica ions equi ed o upda ing
he global model a e he main bo le-neck. No e ha his bo leneck will eme ge e en wi h la ge ac ual
o ganisa ions aining models wi h b oadband connec ions, when he model size is la ge-enough, e.g., when
aining ounda ion models (Bommasani e al., 2021). This is especially ue when using SecAgg p o ocols,
since he cos o unning a eal SecAgg algo i hm p esen s signi ican compu e and communica ion o e heads
e en wi h he e icien p o ocols discussed in Appendix A.4. In his se ing, i makes sense o y and push
mo e op imiza ion s eps o he clien s while educing he numbe o global upda es (FL ounds). We also
assume ha he clien s send hei local upda es ia some us ed agg ega o (which we only assume and do
no implemen in p ac ice in he expe imen s. Howe e , we do use only disc e e DP mechanisms compa ible
wi h s anda d SecAgg algo i hms in all he expe imen s). Fo mo e de ails on all he expe imen s, see
Appendix A.6. The code o ep oducing he esul s will be eleased on Gi Hub.7
CNN on Fashion-MNIST: We i s ain a small con olu ional neu al ne wo k (CNN) on Fashion MNIST
da a (Xiao e al., 2017), ha is dis ibu ed iid among 10 clien s. We use he CNN a chi ec u e in oduced by
Pape no e al. (2021); T amè & Boneh (2021). Figu e 1 shows he mean wi h s anda d e o o he mean
(SEM) o e 5 epea s o es accu acy and loss wi h DP-SGD using Skellam noise (Aga wal e al., 2021)
wi h 32 bi s g adien quan iza ion,.i.e., wi hou quan iza ion. We ain he model o 20 FL ounds and
a ying numbe o local s eps. Compa ing he esul s o 1local s ep as opposed o 1local epoch (
≃
11 s eps,
bu wi h di e en sampling ac ion compa ed o baseline), i is e iden ha being able o ake mo e local
op imiza ion s eps (as allowed by Theo em 4.6) b ings conside able u ili y bene i s unde ixed p i acy and
communica ion budge s.
Linea Classi ie on T ans o med CIFAR-10: O e all, assuming a ixed p i acy budge , we migh
expec he bene i s om being able o un mo e local s eps o be mo e accen ua ed wi h mo e complex models
and e y limi ed communica ion budge , while o simple-enough models and mo e FL ounds e en a ew
local s eps could lead o good esul s. To es o wha ex en his is ue o simple ye s ill use ul models,
we conside CIFAR-10 da a (K izhe sky, 2009). Simila o T amè & Boneh (2021), we ake a ResNeX -29
model (Xie e al., 2017) p e- ained wi h CIFAR-100 da a (K izhe sky, 2009), emo e he inal classi ie , and
7h ps://gi hub.com/mixheikk/DPFL-wi h-SecAgg-pape
9
olume 151 o P oceedings o Machine Lea ning Resea ch, pp. 10110–10145. PMLR, 2022. URL
h ps:
//p oceedings.ml .p ess/ 151/noble22a.h ml.
Nicolas Pape no , Abh adeep Thaku a, Shuang Song, S e e Chien, and Úl a E lingsson. Tempe ed sigmoid
ac i a ions o deep lea ning wi h di e en ial p i acy. In Thi y-Fi h AAAI Con e ence on A i icial
In elligence, AAAI 2021, Thi y-Thi d Con e ence on Inno a i e Applica ions o A i icial In elligence,
IAAI 2021, The Ele en h Symposium on Educa ional Ad ances in A i icial In elligence, EAAI 2021,
Vi ual E en , Feb ua y 2-9, 2021, pp. 9312–9321. AAAI P ess, 2021. URL
h ps://ojs.aaai.o g/index.
php/AAAI/a icle/ iew/17123.
Ryan M. Roge s, Salil P. Vadhan, Aa on Ro h, and Jona han R. Ullman. P i acy odome e s and il e s: Pay-as-
you-go composi ion. In Daniel D. Lee, Masashi Sugiyama, Ul ike on Luxbu g, Isabelle Guyon, and Roman
Ga ne (eds.), Ad ances in Neu al In o ma ion P ocessing Sys ems 29: Annual Con e ence on Neu al In o -
ma ion P ocessing Sys ems 2016, Decembe 5-10, 2016, Ba celona, Spain, pp. 1921–1929, 2016. URL
h ps:
//p oceedings.neu ips.cc/pape /2016/hash/58c54802a9 b9526cd0923353a34a7ae-Abs ac .h ml.
Césa Saba e , Au élien Belle , and Jan Ramon. An accu a e, scalable and e i iable p o ocol o ede a ed
di e en ially p i a e a e aging. Mach. Lea n., 111(11):4249–4293, 2022. doi: 10.1007/S10994-022-06267-9.
URL h ps://doi.o g/10.1007/s10994-022-06267-9.
Reza Shok i, Ma co S ona i, Congzheng Song, and Vi aly Shma iko . Membe ship in e ence a acks
agains machine lea ning models. In 2017 IEEE Symposium on Secu i y and P i acy, SP 2017, San Jose,
CA, USA, May 22-26, 2017, pp. 3–18. IEEE Compu e Socie y, 2017. doi: 10.1109/SP.2017.41. URL
h ps://doi.o g/10.1109/SP.2017.41.
Jinhyun So, Basak Güle , and Ami Salman A es imeh . Tu bo-agg ega e: B eaking he quad a ic agg ega ion
ba ie in secu e ede a ed lea ning. IEEE J. Sel. A eas In . Theo y, 2(1):479–489, 2021. doi: 10.1109/
JSAIT.2021.3054610. URL h ps://doi.o g/10.1109/JSAIT.2021.3054610.
Shuang Song, Kamalika Chaudhu i, and Anand D. Sa wa e. S ochas ic g adien descen wi h di e en ially
p i a e upda es. In IEEE Global Con e ence on Signal and In o ma ion P ocessing, GlobalSIP 2013, Aus in,
TX, USA, Decembe 3-5, 2013, pp. 245–248. IEEE, 2013. doi: 10.1109/GLOBALSIP.2013.6736861. URL
h ps://doi.o g/10.1109/GlobalSIP.2013.6736861.
Thomas S einke. Composi ion o di e en ial p i acy & p i acy ampli ica ion by subsampling. CoRR,
abs/2210.00597, 2022. URL h ps://doi.o g/10.48550/a Xi .2210.00597.
Timo hy S e ens, Ch is ian Skalka, Ch is elle Vincen , John Ring, Samuel Cla k, and Joseph P. Nea .
E icien di e en ially p i a e secu e agg ega ion o ede a ed lea ning ia ha dness o lea ning wi h
e o s. In Ke in R. B. Bu le and Ku Thomas (eds.), 31s USENIX Secu i y Symposium, USENIX
Secu i y 2022, Bos on, MA, USA, Augus 10-12, 2022, pp. 1379–1395. USENIX Associa ion, 2022. URL
h ps://www.usenix.o g/con e ence/usenixsecu i y22/p esen a ion/s e ens.
Flo ian T amè and Dan Boneh. Di e en ially p i a e lea ning needs be e ea u es (o much mo e da a). In
9 h In e na ional Con e ence on Lea ning Rep esen a ions, ICLR 2021, Vi ual E en , Aus ia, May 3-7,
2021. OpenRe iew.ne , 2021. URL h ps://open e iew.ne / o um?id=YTWG pFOQD-.
Aleksei T ias cyn, Ma hias Reisse , and Ch is os Louizos. DP-REC: p i a e & communica ion-e icien
ede a ed lea ning. CoRR, abs/2111.05454, 2021. URL h ps://a xi .o g/abs/2111.05454.
S acey T uex, Na halie Ba acaldo, Ali Anwa , Thomas S einke, Heiko Ludwig, Rui Zhang, and Yi Zhou.
A hyb id app oach o p i acy-p ese ing ede a ed lea ning. In P oceedings o he 12 h ACM Wo kshop
on A i icial In elligence and Secu i y, AISec’19, pp. 1–11, New Yo k, NY, USA, 2019. Associa ion o
Compu ing Machine y. ISBN 9781450368339. doi: 10.1145/3338501.3357370. URL
h ps://doi.o g/10.
1145/3338501.3357370.
S acey T uex, Ling Liu, Ka Ho Chow, Mehme Em e Gu soy, and Wenqi Wei. Ldp- ed: ede a ed lea ning
wi h local di e en ial p i acy. In Aa on Yi Ding and Richa d Mo ie (eds.), P oceedings o he 3 d
16

In e na ional Wo kshop on Edge Sys ems, Analy ics and Ne wo king, EdgeSys@Eu oSys 2020, He aklion,
G eece, Ap il 27, 2020, pp. 61–66. ACM, 2020. doi: 10.1145/3378679.3394533. URL
h ps://doi.o g/10.
1145/3378679.3394533.
Filipp Valo ich and F ancesco Aldà. Compu a ional di e en ial p i acy om la ice-based c yp og aphy. In
Je zy Kaczo owski, Jose Piep zyk, and Jacek Pomykala (eds.), Numbe -Theo e ic Me hods in C yp ology -
Fi s In e na ional Con e ence, NuTMiC 2017, Wa saw, Poland, Sep embe 11-13, 2017, Re ised Selec ed
Pape s, olume 10737 o Lec u e No es in Compu e Science, pp. 121–141. Sp inge , 2017. doi: 10.1007/
978-3-319-76620-1 _8. URL h ps://doi.o g/10.1007/978-3-319-76620-1_8.
Han Xiao, Kashi Rasul, and Roland Vollg a . Fashion-MNIST: a no el image da ase o benchma king
machine lea ning algo i hms. CoRR, abs/1708.07747, 2017. URL h p://a xi .o g/abs/1708.07747.
Saining Xie, Ross B. Gi shick, Pio Dollá , Zhuowen Tu, and Kaiming He. Agg ega ed esidual ans o ma ions
o deep neu al ne wo ks. In 2017 IEEE Con e ence on Compu e Vision and Pa e n Recogni ion,
CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pp. 5987–5995. IEEE Compu e Socie y, 2017. doi:
10.1109/CVPR.2017.634. URL h ps://doi.o g/10.1109/CVPR.2017.634.
Yuchen Yang, Bo Hui, Haolin Yuan, Neil Gong, and Yinzhi Cao. P i a eFL: Accu a e, di e en ially p i a e
ede a ed lea ning ia pe sonalized da a ans o ma ion. In 32nd USENIX Secu i y Symposium (USENIX
Secu i y 23), pp. 1595–1612, Anaheim, CA, 2023. USENIX Associa ion. ISBN 978-1-939133-37-3. URL
h ps://www.usenix.o g/con e ence/usenixsecu i y23/p esen a ion/yang-yuchen.
And ew C Yao. P o ocols o secu e compu a ions. In SFCS ’82, pp. 160–164. IEEE Compu e Socie y, 1982.
doi: 10.1109/SFCS.1982.88. URL h p://dx.doi.o g/10.1109/SFCS.1982.88.
Samuel Yeom, I ene Giacomelli, Ma F ed ikson, and Somesh Jha. P i acy isk in machine lea ning: Analyzing
he connec ion o o e i ing. In 31s IEEE Compu e Secu i y Founda ions Symposium, CSF 2018, Ox o d,
Uni ed Kingdom, July 9-12, 2018, pp. 268–282. IEEE Compu e Socie y, 2018. doi: 10.1109/CSF.2018.00027.
URL h ps://doi.o g/10.1109/CSF.2018.00027.
Ashkan Youse pou , Igo Shilo , Alexand e Sablay olles, Da ide Tes uggine, Ka hik P asad, Mani Malek,
John Nguyen, Sayan Ghosh, Akash Bha adwaj, Jessica Zhao, G aham Co mode, and Ilya Mi ono .
Opacus: Use - iendly di e en ial p i acy lib a y in PyTo ch. CoRR, abs/2109.12298, 2021. URL
h ps://a xi .o g/abs/2109.12298.
Jiaojiao Zhang, Linglingzhi Zhu, Dominik Fay, and Mikael Johansson. Locally di e en ially p i a e online
ede a ed lea ning wi h co ela ed noise. CoRR, abs/2411.18752, 2024. URL
h ps://doi.o g/10.48550/
a Xi .2411.18752.
Yuqing Zhu, Jinshuo Dong, and Yu-Xiang Wang. Op imal accoun ing o di e en ial p i acy ia cha ac e is ic
unc ion. In Gus au Camps-Valls, F ancisco J. R. Ruiz, and Isabel Vale a (eds.), In e na ional Con e ence
on A i icial In elligence and S a is ics, AISTATS 2022, 28-30 Ma ch 2022, Vi ual E en , olume 151
o P oceedings o Machine Lea ning Resea ch, pp. 4782–4817. PMLR, 2022. URL
h ps://p oceedings.
ml .p ess/ 151/zhu22c.h ml.
A Appendix
A.1 De ini ions Omi ed om he Main Pape
In he ollowing, we w i e
d
=
o mean equali y o dis ibu ions. Wi h his no a ion, we de ine in ini e di isibili y
in a s anda d way (see, e.g., Felle 1966):
De ini ion A.1 (In ini e di isibili y).A dis ibu ion
F
is in ini ely di isible i o e e y
n
he e exis s a
amily o dis ibu ions Fns. .
Sn
d
=
n
X
i=1
Xi,n,
whe e he andom a iables Sn∼Fand Xi,n ∼Fna e independen o all i= 1, . . . , n.
17
A.2 S anda d Algo i hms Omi ed om he Main Pape
Fo con enience, we gi e he pseudo-code o unning DPFL wi h FedA g when using DP-SGD o ei he
sample- (Algo i hm 2) o use -le el (Algo i hm 3) p o ec ion. We emphasize ha we do no claim any no el
con ibu ion in hese algo i hms, bu hey a e included simply o cla i y he discussion in he main pape .
Algo i hm 1 (DP)FL wi h FedA g: Se e -side (McMahan e al., 2017)
Requi e: Numbe o clien s N, numbe o FL ounds T, ini ial global model pa ame e s θ(0)
1: o ∈[T]do
2: Choose N clien s o upda ing (w.l.o.g. numbe ed 1, . . . , N ) and send he cu en model θ( −1)
3: Recei e pseudo-g adien s ∆( −1)
i, i = 1, . . . , N om he upda ing clien s
4: Upda e he global model:
5: θ( )←θ( −1) +1
N PN
i=1 ∆( −1)
i
6: end o
Ou pu : θT
Algo i hm 2 DPFL: Clien -side wi h sample-le el DP-SGD using Gaussian noise (Song e al., 2013; Abadi
e al., 2016)
Requi e:
Numbe o local op imiza ion s eps
S
, local da ase
xi
, loss unc ion
, lea ning a e
γ
, clipping
cons an C, noise scale σ, Poisson subsampling p obabili y q
1: Clien ecei es cu en model θ om he se e :
2: θ0←θ
3: o s∈[S]do
4: Upda e local model wi h DP-SGD:
5: gi,s ←0
6: Sample miniba ch bs om xiusing Poisson sampling wi h p obabili y q
7: o j∈[|bs|]do
8: Calcula e j h pe -sample g adien in miniba ch, clip and accumula e:
9: ˜gi,s ← ∇ (bs,j;θs−1)
10: gi,s ←gi,s + min 1,C
∥˜gi,s∥2˜gi,s
11: end o
12: Add noise o clipped g adien s and ake a s ep:
13: θ(s)←θ(s−1) −γ(gi,s +Cξ), whe e ξ∼ N(0, σ2Id)and dis model dimensionali y
14: end o
15: ∆←θS−θ0
Ou pu : ∆(Pseudo-g adien s sen o he se e )
18
Algo i hm 3 DPFL: Clien -side wi h use -le el DP using Gaussian noise (McMahan e al., 2018)
Requi e:
Numbe o local op imiza ion s eps
S
, local da ase
xi
, lea ning a e
γ
, clipping cons an
C
, noise
scale σ, Poisson subsampling p obabili y q
1: Clien ecei es cu en model θ om he se e :
2: θ0←θ
3: Upda e local model S imes wi h s anda d SGD:
4: θ(S)← om unning SGD locally o Ss eps wi h lea ning a e γon xi
5: ∆←θS−θ0
6: Clip pseudo-g adien s and add noise:
7: ∆←min1,C
∥∆∥2∆ + Cξ, whe e ξ∼ N 0, σ2Idand dis model dimensionali y
Ou pu : ∆(Pseudo-g adien s sen o he se e )
Essen ially, o his example when aiming o DDP, a FL ound
ou analysis enables choosing he numbe
o local s eps
S
in Algo i hm 2 eely while s ill calib a ing he DP noise by looking a he o al noise le el
PN
i=1 σ2
i
added by each clien sepa a ely on each local s ep (line 13 in Algo i hm 2). No e ha o he special
case o using Gaussian noise, a simila esul has been p esen ed by Noble e al. (2022).
A.3 Loosening Assump ions
In he main pape , as s a ed in Assump ion 4.1, o each FL ound we ha e assumed cons an lea ning a e
γ
,
no m clipping bound
C
, noise le el
σ
, and numbe o local op imiza ion s eps
S
, all o hem sha ed by all
he clien s. Nex , we conside loosening hese assump ions. As be o e, w.l.o.g. we conside only a single FL
ound, and will he e o e omi he index .
Focusing i s on he lea ning a e, we can immedia ely gene alize ou esul s o allow o di e en lea ning
a e
γs
o each local s ep
s
= 1
, . . . , S
: wi h his no a ion, ollowing he easoning o Theo em 4.6, he
agg ega ed upda e om he clien s is gi en by
N
X
i=1
∆i=−
N
X
i=1
S
X
s=1
γsA(s)
i(xi;zi,s),(14)
which can again be seen as pos -p ocessing he ec o
PN
i=1 A(1)
i(xi),...,PN
i=1 A(S)
i(xi)
, so we can again
use Lemma 4.5 o accoun ing wi hou encoun e ing p oblems.
When conside ing clien -speci ic lea ning a es hings can be mo e complica ed. The main issue now is o
ind p ope sum-domina ing mechanisms ha sa is y:
sup
x≃x′
Hα N
X
i=1
γi,sA(s)
i(xi)∥
N
X
i=1
γi,sA(s)
i(x′
i)!≤sup
x≃x′
HαA(s)(x)∥A(s)(x′), s = 1, . . . , S. (15)
As a conc e e example, assume
A(s)
i
is he con inuous Gaussian mechanism wi h sha ed no m clipping
cons an s and noise le els
Ci,s
=
Cs, σi,s
=
σs∀i
. D opping he s ep index
s
o eadabili y, le
γi
=
γ1
li
o some
li>
0
, i
= 2
, . . . , N
. W i ing
gi
o a sum o e he pe -uni clipped g adien s o clien
i
, and
ξi∼ N(0, C2σ2·Id)a single op imiza ion s ep now con ibu es he ollowing e m o he global upda e:
−
N
X
i=1
γi(gi+ξi)(16)
=−γ1 g1+ξ1+
N
X
i=2
gi+ξi
li!(17)
=−γ1 g1+
N
X
i=2
gi
li
+ξ!,(18)
19
whe e
ξ∼ N
(0
, C2σ2
[1 +
PN
i=2 1
l2
i
]
·Id
), which is a sum-domina ing Gaussian mechanism. When accoun ing
o he sum-domina ing mechanism, i has sensi i i y
C∗
=
max{C, C
l2,..., C
lN}
, which in u n gi es noise
a iance (C
C∗)2σ2[1 + PN
i=2 1
l2
i
] o DP.
Simila ly, we could elax he assump ions u he o allow he clien s o use di e en clipping and noise le els
Ci, σi. As be o e, a single op imiza ion s ep can again be w i en in he o m o Equa ion 18, when
ξ∼ N 0,"C2
1σ2
1+
N
X
i=2
C2
iσ2
i
l2
i#·Id!.(19)
Fo global p i acy accoun ing wi h a sum-domina ing Gaussian mechanism, sui able sensi i i y is now gi en
by C∗= max{C1,C2
l2,...,CN
lN}, and he esul ing a iance o accoun ing is PN
i=1(Ciσi
C∗)2.
Assuming clien s ha e di e ing numbe o local s eps, we can y o use some local s eps o he p i acy
analysis un il all clien s ha e he same numbe o s eps S, a e which we can hen use he ea lie esul s.8
As a simple example, assume we ha e 2 clien s unning DP-SGD: clien 1 uns
S
local s eps using no m
clipping cons an
C
and Gaussian mechanism wi h noise a iance
σ2
, while clien 2 uns 2
S
local s eps wi h
clipping
C/
2and Gaussian noise a iance
σ2
. The di e ence now is ha while he clipping is done on each
s ep, om he p i acy accoun ing pe spec i e we can dis ega d some noise and hink ha clien 2 adds noise
only on e e y o he s ep. Looking a he upda e om clien 2, we would hen ha e
∆2=−γ
2S
X
s=1
(g2,s +I[s= 2l, l ∈N]·ξ2,s)(20)
=−γ
S
X
s=1
(g′
2,s +ξ′
2,s),(21)
whe e
g2,s
a e he clipped pe -sample g adien s,
g′
2,s
:=
g2,2s−1
+
g2,2s
,
ξ2,s
a e he noise alues,
ξ′
2,s
:=
ξ2,2s
,
and
I
is he indica o unc ion. Due o he clipping, he sensi i i y o each used s ep can be easily uppe
bounded ia iangle-inequali y:
∥g2,s′∥2
=
∥g2,2s′−1
+
g2,2s′∥2≤ ∥g2,2s′−1∥2
+
∥g2,2s′∥2≤C
. Since Equa ion 21
now has he same numbe o local s eps as clien 1 is aking, we can eadily use he p e ious esul s o enable
p i acy accoun ing o he agg ega ed upda e. Combining he using o local s eps wi h he p e ious no es on
di e ing clipping no m alues, lea ning a es and noise a iances allows us o use ou main esul s in se e al
se ings beyond wha is s a ed in Assump ion 4.1.
As a inal no e, when he clien s use da a subsampling o he local op imiza ion, di e ing local subsampling
p obabili ies can lead o ha ing a ying DP gua an ees be ween he clien s on he global le el due o he
di e en subsampling ampli ica ion e ec s, bu can o he wise be inco po a ed wi h he same analysis we ha e
al eady p esen ed.
A.4 F om Ideal T us ed Agg ega o s o P ac ical SecAgg P o ocols
Fo implemen ing he us ed agg ega o assumed in Theo em 4.6 in p ac ice, i should be no ed ha as
he sum o e
s
is done locally by each clien du ing local op imiza ion, i is always us ed as long as he
indi idual clien s a e, while he sum o e
i
would need o be implemen ed, e.g., using a sui able SecAgg
p o ocol. Se e al such algo i hms a e known, including he ones p oposed by Bell e al. (2020); Bonawi z
e al. (2017); Saba e e al. (2022); So e al. (2021).
Using a SecAgg p o ocol will ypically also place some ex a equi emen s on he DP mechanisms
A(s)
i
, since
he SecAgg algo i hms usually un on elemen s o ini e ings. This p ecludes con inuous noise mechanisms.
A iable al e na i e is o use some sui able disc e e noise mechanism, such as Skellam (Aga wal e al., 2021)
8
Al e na i ely, we could also conside b eaking some local s eps in o se e al pa s. We lea e he de ailed conside a ion o his
app oach o u u e wo k.
20
o Poisson-binomial (Chen e al., 2022b). Howe e , di e ing om he cases conside ed in he ci ed pape s,
since in ou case he clien s send model upda es ins ead o single g adien s, he ini e ing size used in he
SecAgg p o ocol needs o accommoda e he model upda e size: i does no good o use Skellam mechanism
wi h g adien quan iza ion o a small numbe o bi s, i he model weigh s and he esul ing model upda e ∆
i
o clien is ill uses 32 bi loa s.
A.5 P i acy Accoun ing De ails
Fo p i acy accoun ing we u ilize Rényi DP (RDP):
De ini ion A.2. (Mi ono , 2017) Le
α >
1and
ε >
0. A andomised algo i hm
A
:
X∗→ O
is (
α, ε
)-RDP
i o e e y x, x′∈ X∗:x≃x′
Dα(A(x)∥A(x′)) ≤ε,
whe e Dαis he Rényi di e gence o o de α:
Dα(P∥Q) = 1
α−1log E ∼Qp( )
q( )α
.
We do p i acy accoun ing o all he expe imen s based on RDP. Gene ally, we accoun o he p i acy o
each indi idual local op imiza ion s ep using he noise con ibu ions om all he clien s selec ed o a gi en
FL ound. When he clien s use Poisson subsampling o sample miniba ches (we assume each clien uses
he same p obabili y o including any indi idual sample in he miniba ch), we use s anda d RDP p i acy
ampli ica ion esul s. In p ac ice, we use he RDP accoun an implemen ed in Opacus (Youse pou e al.,
2021), as well as bounds o Skellam mechanism by Aga wal e al. (2021) and igh RDP ampli ica ion by
Poisson subsampling (S einke, 2022). We calcula e he p i acy cos o he en i e aining un in RDP, and
hen con e in o ADP using (Mi ono , 2017, P oposi ion 3). No e ha , as is common in DP esea ch, we do
no include he p i acy cos o hype pa ame e uning in he epo ed p i acy budge s (see, e.g., T amè &
Boneh 2021 o some easoning on his p ac ice).
A.6 Expe imen al De ails
All he expe imen al se ings we use sa is y Assump ion 4.1. We use DP-SGD wi h Skellam mechanism o
op imise he local model pa ame e s, and s anda d ede a ed a e aging as he agg ega ion ule o upda ing
he global model in all expe imen s. Fo each cen alised da a se (combining o iginal ain and es se s), we
spli he da a andomly in o equal sha es, which esul s in ha ing almos he same da a dis ibu ion on each
clien . Fo hype pa ame e op imiza ion wi h each da ase , we i s spli each clien s’ da a in e nally in o ain
and es pa s wi h ac ions (.8-.2). Fo uning all hype pa ame e s, we use only he aining ac ion, and
di ide i u he (.7-.3) in o hype pa ame e ain- alida ion. We use Bayesian op imiza ion-based app oach
implemen ed in Weigh s and Biases (Biewald, 2020) o hype pa ame e uning, and simula e FL using Flowe
(Beu el e al., 2020).
In gene al, when uning hype pa ame e s we do 50 hype pa ame e uning uns. Fo each uning un, we
ain he model on hype pa ame e aining ac ion, es on he alida ion ac ion, and y o op imise o
he inal model weigh ed alida ion loss. A e inishing he hype pa ame e uning, we e- ain he model
om sc a ch 5 imes wi h di e en andom seeds wi h he bes ound hype pa ame e s using he en i e
o iginal aining da a and es ing on he es ac ion. We epo he mean and he s anda d e o o he
mean (SEM) in all he igu es. In Figu e 2 we plo he minimum es loss/maximum es accu acy aken o e
he en i e aining un.
Fo he expe imen s wi h Fashion-MNIST (Xiao e al., 2017) and CIFAR-10 (K izhe sky, 2009) da a se s, we
un hype pa ame e uning sepa a ely o each combina ion o numbe o local s eps {1 s ep, 1 epoch}, and
expec ed miniba ch sizes on he g id {64,128,256,512}using Poisson subsampling.
Fo Fashion-MNIST he bes expec ed ba ch sizes ound a e 512 o 1 local epoch, and 128 o 1 local s ep.
Wi h CIFAR-10, due o hea y compu a ional cos o hype pa ame e uning, we use a single expec ed ba ch
size o each con igu a ion o local s eps {1 s ep, 1 epoch} and FL ounds
{
10
,
20
,
40
,
160
}
. Conc e ely, we
21

pick he bes expec ed ba ch size alue om he abo e g id when using Bayesian op imiza ion o une all
hype pa ame e s wi h 20 FL ounds. This esul s in choosing expec ed ba ch size 128 o 1local s ep and 256
o 1local epoch. We hen use hese alues and op imize all o he hype pa ame e s sepa a ely o all o he
FL ound se ings.
Wi h ACS Income da a (Ding e al., 2021), we une all hype pa ame e s o each combina ion o local s eps
{1 s ep, 1 epoch} wi h Poisson subsampling using local sampling p obabili y on he g id
{
0
.
4
,
0
.
2
,
0
.
1
,
0
.
05
}
o 10 FL ounds. We epo esul s on he bes ound local sampling p obabili ies (0.05 o bo h).
Fo ResNeX -29 8x64, we used p e- ained weigh s a ailable om
h ps://gi hub.com/bea paw/
py o ch-classi ica ion
. Ou implemen a ion o he Skellam mechanism is based on he implemen-
a ion om
h ps://gi hub.com/ acebook esea ch/dp_comp ession
(Chaudhu i e al., 2022; Guo e al.,
2023).
Fo Ame ican Communi y Su ey (ACS) Income da a se Ding e al. (2021) we use he da a o all he s a es
and Pue o Rico o 2018. The goal is o p edic whe he an indi idual has income g ea e han $50000.
Ins ead o simula ing da a spli s, we use he inhe en spli s, i.e., we ake each o iginal egion (s a e o Pue o
Rico) o be a clien .
Fo aining all models, we use a small clus e wi h NVIDIA Ti an Xp, and NVIDIA Ti an V GPUs. The
o al compu e ime o all he aining uns (including debugging) o e all GPUs amoun s oughly o 30-60
GPU days.
A.7 Addi ional Resul s
Table 1 shows he esul s om a e aging se e al independen ly ained LDP models as desc ibed in Sec ion 5.
Table 1: Imp o ed p i acy o a e aged models, Skellam mechanism, 32 bi s (no quan iza ion), Poisson
sampling wi h sampling ac ion 0
.
1, each local model is unbounded (
ε
= 5
., δ
= 1
e−
5)-LDP. A e aging mo e
models imp o es on he DP gua an ees agains ad e sa ies who do no ha e access o he o iginal models.
Local s eps Pa ies σ o al a g model ε
1s ep 1 0.69 5.0
1s ep 2 0.98 2.78
1s ep 5 1.54 1.22
1s ep 10 2.18 0.64
1epoch 1 0.90 5.0
1epoch 2 1.28 2.61
1epoch 5 2.02 1.19
1epoch 10 2.85 0.72
5epochs 1 1.18 5.0
5epochs 2 1.67 2.85
5epochs 5 2.64 1.55
5epochs 10 3.73 1.03
22