Ana Macanovic
Complexity Science Hub Vienna
19.11.2025.
Life Outcome Prediction with Foundation Models Trained on Population Registry Data
Today’s talk
1. Predicting life outcomes
- Wrong modelling choices?
- Too little data?
2. Foundation transformer models for life course modelling
- Building sequences
- Enrichments
3. Model evaluation
4. Interesting stuff
5. Where do we go from here?
https://www.flaticon.com/
Icons by Peerapak Takpho and Becris
Team
Tanzir Pial, Lucas Sage, Flavio Hafner, Enamul Hassan,
Tom Emery, Arnout van de Rijt, Steven Skiena, Dakota Handzlik
1. Predicting life outcomes
- Knowing what will happen to individuals in our societies can:
- improve policies
- improve understanding of differences
- boost our theoretical understanding of social life
- What might we want to know?
- Demographic indicators
- Sociological indicators
- Attitudes, behaviours
- Health outcomes
1. Predicting life outcomes
10 years from now: 18.02–19.71
20 years from now: 18.69–20.23
1. Predicting life outcomes
- How do we usually do this?
- Specialized models for individual tasks:
- Demographic predictions
- Logistic regression
- Machine learning (e.g., Random Forest)
[Figure: separate task-specific models for fertility, income, and divorce, each built on its own relevant data]
1. Predicting life outcomes is hard!
- Extrapolation from the last period works best for fertility (Bohk-Ewald 2018)
1.a. Modelling lives: Life course theory
- Linking outcomes to the ordering and duration of a range of interrelated life events (Willekens 1999)
1. Life domains are entangled
2. Life sequences exhibit path-dependence and lock-in
3. Timing and duration matter
4. Lives are “linked”
- Sequence analysis techniques are too restrictive to account for these factors
[Figure: example trajectories in several life domains: employment (no job → job 1 → job 2; job 1 → no job), partnership (single → married → married & child), health (healthy → sick)]
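To make the life course properties concrete, here is a minimal sketch that turns dated state changes into per-domain spells, so that ordering, timing, and duration stay explicit. The `(age, domain, state)` event format, the `to_spells` helper, and the example life are hypothetical illustrations, not the register schema or the project's code.

```python
from collections import defaultdict

# Hypothetical event format: (age_in_years, domain, new_state).
Event = tuple[float, str, str]
Spell = tuple[str, float, float]  # (state, start_age, duration)


def to_spells(events: list[Event], end_age: float) -> dict[str, list[Spell]]:
    """Turn dated state changes into per-domain spells.

    A spell lasts until the next change in the same domain, or until
    `end_age` (end of observation) for the last spell in that domain.
    """
    changes: dict[str, list[tuple[float, str]]] = defaultdict(list)
    for age, domain, state in sorted(events):  # chronological order
        changes[domain].append((age, state))

    spells: dict[str, list[Spell]] = {}
    for domain, seq in changes.items():
        ends = [age for age, _ in seq[1:]] + [end_age]
        spells[domain] = [(state, start, end - start) for (start, state), end in zip(seq, ends)]
    return spells


life = [
    (18, "work", "no job"), (22, "work", "job 1"), (30, "work", "job 2"),
    (18, "family", "single"), (27, "family", "married"), (29, "family", "married & child"),
]
for domain, domain_spells in to_spells(life, end_age=35).items():
    print(domain, domain_spells)
```

A plain state sequence such as `no job, job 1, job 2` loses the durations (4, 8, and 5 years here); keeping spells is one simple way to retain the timing information that point 3 says matters.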
1.b. More data: CBS Register data
- Dutch Register Data
- All residents of the Netherlands
- Continuously updated
- Some yearly datasets go back to the early 2000s, most to the 2010s
- Our dataset
- ~23 million people
- ~7.5 billion records
[Figure: register data domains: background/household, education, employment, marriage/divorce, network, income]
2. Potential answer?
[Figure: one sequence model built from all register domains (background/household, education, employment, marriage/divorce, network, income), used to predict fertility, income, and divorce]
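One reading of the “sequences” idea: merge yearly records from all register domains into a single chronologically ordered token sequence per person, which a transformer can consume like a sentence. The `Record` layout, the `YEAR_` separator, and the `DOMAIN_value` token format are assumptions made for illustration; the slides do not specify the actual tokenization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical register record: one fact about one person in one year.
    person_id: int
    year: int
    domain: str  # e.g. "education", "employment", "income"
    value: str


def build_sequences(records: list[Record]) -> dict[int, list[str]]:
    """Return, per person, one token per record, ordered by year (then domain)."""
    per_person: dict[int, list[Record]] = {}
    for rec in records:
        per_person.setdefault(rec.person_id, []).append(rec)

    sequences: dict[int, list[str]] = {}
    for pid, recs in per_person.items():
        recs.sort(key=lambda r: (r.year, r.domain))
        tokens, last_year = [], None
        for r in recs:
            if r.year != last_year:  # a year marker separates calendar years
                tokens.append(f"YEAR_{r.year}")
                last_year = r.year
            tokens.append(f"{r.domain.upper()}_{r.value}")
        sequences[pid] = tokens
    return sequences


records = [
    Record(1, 2012, "employment", "job1"),
    Record(1, 2011, "education", "bachelor"),
    Record(1, 2012, "income", "decile7"),
    Record(2, 2011, "household", "single"),
]
print(build_sequences(records))
```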
2. Transformer models:
Bonus 1: Parents and partners
- Incorporate background and current events of parents into sequences up to 18 years of age
- Incorporate background and current events of partners throughout the duration of the registered partnership/marriage
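A minimal sketch of the two enrichment rules, assuming events are `(year, token)` pairs: parents' events are added only while the person is younger than 18, and a partner's events only within the years of the registered partnership or marriage. The `PARENT_`/`PARTNER_` prefixes, the year-based age approximation, and `enrich` itself are hypothetical.

```python
from typing import Optional

Event = tuple[int, str]  # (calendar_year, token); hypothetical representation


def enrich(own: list[Event], birth_year: int, parents: list[list[Event]],
           partner: list[Event], partnership: Optional[tuple[int, int]]) -> list[Event]:
    """Merge parents' and a partner's events into one person's event list."""
    merged = list(own)
    for parent_events in parents:
        # Parents' events are kept only while the person is younger than 18
        # (age approximated as calendar year minus birth year).
        merged += [(y, "PARENT_" + t) for y, t in parent_events if y - birth_year < 18]
    if partnership is not None:
        start, end = partnership
        # Partner's events are kept only during the registered partnership/marriage.
        merged += [(y, "PARTNER_" + t) for y, t in partner if start <= y <= end]
    return sorted(merged)


own = [(2000, "BORN"), (2018, "EDUCATION_university")]
parents = [[(2005, "EMPLOYMENT_job1"), (2015, "INCOME_decile5"), (2020, "DIVORCE")]]
partner = [(2022, "EMPLOYMENT_job2"), (2030, "MOVE")]
print(enrich(own, 2000, parents, partner, partnership=(2021, 2025)))
```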
2. Transformer models:
Bonus 2: Population-scale networks!
- Multiplex population-level opportunity network
- Family, household, neighbours, classmates, colleagues
- DeepWalk (Perozzi et al., 2014) with layer persistence and hubs
- Temporal alignment across years
- Fibonacci grid method generalized to higher dimensions to “cluster” embeddings
- Including the embedding cluster as a sequence token in a given year
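The sketch below illustrates two of these steps under stated assumptions. First, DeepWalk-style truncated random walks over a multiplex network, where “layer persistence” is read as the probability of staying in the current layer (family, work, and so on) at each step; the hub handling mentioned on the slide is not modelled. Second, the grid clustering step, using the R_d low-discrepancy sequence (one known generalization of the golden-ratio/Fibonacci lattice to higher dimensions) as a stand-in for the talk's generalized Fibonacci grid: each embedding is assigned to its nearest grid point, and that cluster id becomes a yearly token. All function names, the adjacency format, and the `NET{year}_C{id}` token format are hypothetical. In DeepWalk the walks are then fed to a skip-gram model to learn node embeddings; that step is omitted here.

```python
import random


def multiplex_walk(layers, start, length, persistence, rng):
    """One truncated random walk over a multiplex network, DeepWalk-style.

    layers: {layer_name: {node: [neighbours]}} (hypothetical format).
    With probability `persistence` the walk keeps its current layer; otherwise
    it re-draws a layer among those in which the current node has neighbours.
    """
    walk, layer = [start], None
    for _ in range(length - 1):
        node = walk[-1]
        options = [name for name, adj in layers.items() if adj.get(node)]
        if not options:
            break  # isolated node: end the walk early
        if layer not in options or rng.random() >= persistence:
            layer = rng.choice(options)
        walk.append(rng.choice(layers[layer][node]))
    return walk


def golden_grid(n_points, dim):
    """Quasi-uniform points in [0, 1)^dim from the R_d sequence (stand-in assumption)."""
    phi = 2.0
    for _ in range(100):  # fixed-point iteration for the root of x**(dim+1) = x + 1
        phi = (1.0 + phi) ** (1.0 / (dim + 1))
    alpha = [phi ** -(j + 1) for j in range(dim)]
    return [[(0.5 + n * a) % 1.0 for a in alpha] for n in range(1, n_points + 1)]


def cluster_token(embedding, grid, year):
    """Nearest grid point (squared Euclidean) for an embedding assumed rescaled
    to [0, 1)^dim, emitted as a hypothetical token such as 'NET2015_C7'."""
    nearest = min(range(len(grid)),
                  key=lambda i: sum((e - g) ** 2 for e, g in zip(embedding, grid[i])))
    return f"NET{year}_C{nearest}"


rng = random.Random(42)
layers = {
    "family": {"a": ["b"], "b": ["a", "d"], "d": ["b"]},
    "work": {"a": ["c"], "c": ["a", "d"], "d": ["c"]},
}
print(multiplex_walk(layers, "a", length=6, persistence=0.8, rng=rng))
grid = golden_grid(n_points=64, dim=4)
print(cluster_token([0.2, 0.7, 0.4, 0.9], grid, year=2015))
```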
2. Transformer models: Model pretraining
- BERT architecture
- Different sizes: 8 and 80 million parameters (tested up to 540 million)
- Training objectives: Masked Language Modelling (MLM) and Sequence Order Prediction (SOP)
- 240-dimensional embeddings
- Trained on sequence data of 23 million people up to 2020
3. Model evaluation
- Prediction tasks:
- Demographic events: fertility, divorce (7 mln / 200k people)
- Socioeconomic status: income (200k people)
- LISS survey: ethnic self-identification, motor vehicle ownership
3. Model evaluation
- Two approaches:
- Feed static embeddings into a fully connected NN with 1 or 2 layers
- Fine-tune the model on the task (update the weights)
- Baseline: linear regression with all variables used for pre-training
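A minimal sketch of the “static embeddings” approach with the smallest possible head: a single fully connected layer with a sigmoid output (equivalent to logistic regression), trained by plain gradient descent while the embeddings stay frozen. The toy 2-D inputs stand in for the 240-dimensional embeddings, and `train_head`/`predict` are hypothetical names; a 2-layer head would add one hidden layer, and fine-tuning would additionally update the encoder weights.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_head(embeddings, labels, lr=0.5, epochs=200):
    """Train one fully connected layer with a sigmoid output on frozen embeddings.

    Only the head's weights (w, b) change; the embeddings stay fixed, as in the
    'static embeddings' approach.
    """
    dim, n = len(embeddings[0]), len(embeddings)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(embeddings, labels):
            # Gradient of binary cross-entropy w.r.t. the logit is (p - y).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x) -> int:
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5)


# Toy 2-D "embeddings" with a binary outcome.
X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
y = [1, 1, 0, 0]
w, b = train_head(X, y)
print([predict(w, b, x) for x in X])  # [1, 1, 0, 0]
```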
3. Model evaluation: Fertility
- PreFer challenge: predicting whether a person will have a child in the next 3 years
F1 scores of our different models on the Predicting Fertility Challenge.
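F1 is the harmonic mean of precision and recall on the positive class. A minimal sketch, assuming binary predictions with the positive class “has a child within the next 3 years”; the toy labels are invented for illustration.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no correct positive predictions: precision or recall is 0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# 1 = has a child within 3 years (toy labels, not challenge data)
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(round(f1_score(y_true, y_pred), 3))  # 0.667
```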
3. Model evaluation: LISS Survey Question Answers
- Longitudinal Internet studies for the Social Sciences (LISS) panel
- 7500 individuals over the age of 16
F1 scores of our different models on two example LISS survey questions
3. Model evaluation
Model                   Income (R²)   Demographic (F1)   Survey (F1)
Baseline                0.30          0.33               0.41
Pretrain + Static       0.50          0.32               0.64
Pretrain + Fine-tune    0.58          0.39               0.57
4. Interesting stuff: Does it make sense?
Two-dimensional t-SNE projection of 2020 embeddings binned into hexagons containing at least 10 persons. The points are coloured by birth decade.
Two-dimensional t-SNE projection of 2020 embeddings binned into hexagons containing at least 10 persons. The points are coloured by the highest degree achieved (higher numbers indicate higher degrees).
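The hexagonal binning in these plots can be sketched as follows, assuming the t-SNE coordinates are already computed: each 2-D point is assigned to a pointy-top hexagon via axial coordinates with cube rounding, hexagons with fewer than 10 persons are dropped, and each remaining hexagon is labelled with its most frequent value. Using the mode (rather than, say, the mean degree) is an assumption; the captions do not say which summary was used.

```python
import math
from collections import Counter, defaultdict


def hex_cell(x: float, y: float, size: float) -> tuple[int, int]:
    """Axial (q, r) coordinates of the pointy-top hexagon of radius `size` containing (x, y)."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Cube rounding: recompute the coordinate with the largest rounding error
    # so that q + r + s == 0 (if it is s, nothing returned needs changing).
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return rq, rr


def hexbin(points, values, size, min_count=10):
    """Bin 2-D points into hexagons; keep bins with >= min_count persons and
    label each kept bin with its most common value (e.g. birth decade)."""
    bins = defaultdict(list)
    for (x, y), v in zip(points, values):
        bins[hex_cell(x, y, size)].append(v)
    return {cell: Counter(vs).most_common(1)[0][0]
            for cell, vs in bins.items() if len(vs) >= min_count}


points = [(0.0, 0.0)] * 9 + [(0.1, 0.05)] + [(50.0, 50.0)] * 3
decades = [1980] * 9 + [1990] + [1970] * 3
print(hexbin(points, decades, size=1.0))  # {(0, 0): 1980}
```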
4. Interesting stuff: Predicting further ahead
- The embeddings' performance in longer-term prediction deteriorates less quickly than that of the linear model baseline
F1 scores of our different models on marriage/divorce prediction
4. Interesting stuff: Model size
- Bigger != better!
- Especially with fine-tuning, smaller models perform better
- Hard to find a single best solution
Parameter size of the best-performing fine-tuned and static embedding models per task
Task                    Fine-tuned (M params)   Static embeddings (M params)
Fertility               80                      80
Marriage                160                     540
Divorce                 80                      80
Income                  80                      540
Owning a car            80                      80
Belongingness: Turks    160                     540
5. Where are we and where do we go from here?
- Fine-tuning our foundation models delivers performance (almost) on par with the state of the art
- We port our model within days
- But performance still varies by task
- Performance deteriorates more slowly than that of a baseline
- But what is a good, scalable baseline?
- Missing data & measurement error
5. Where are we and where do we go from here?
- Attention-weighted static embeddings
- Generative models?
- During pre-training, a predictive model should learn only from past events
- Predicting a wide range of outcomes, and further into the future?
- Better integration of graph and sequence embeddings
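“Attention-weighted static embeddings” is listed as a direction, not a finished method. One plausible form, sketched here as an assumption, replaces mean pooling of token embeddings with a softmax-weighted average, scored against a query vector that would be learned in practice but is fixed in this toy example. `attention_pool` is a hypothetical name.

```python
import math


def attention_pool(token_embeddings, query):
    """Collapse token embeddings into one static vector with softmax attention.

    Each token is scored by its scaled dot product with `query`; the output is
    the attention-weighted average of the token embeddings.
    """
    dim = len(query)
    scores = [sum(q * t for q, t in zip(query, tok)) / math.sqrt(dim) for tok in token_embeddings]
    top = max(scores)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * tok[i] for w, tok in zip(weights, token_embeddings)) for i in range(dim)]
    return pooled, weights


tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(attention_pool(tokens, query=[0.0, 0.0]))  # zero query: equal weights, i.e. mean pooling
print(attention_pool(tokens, query=[4.0, 0.0]))  # up-weights tokens aligned with the query
```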
5. Where are we and where do we go from here?
- Explore our embedding space to describe Dutch society
- Track people’s life courses through time
- Predict life outcomes further out into the future
- Use embeddings for information-retrieval-like tasks
- Create counterfactual scenarios
- Explore (un)predictability per social group
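For information-retrieval-like tasks, a minimal sketch is nearest-neighbour search by cosine similarity over person embeddings. The ids and vectors are invented, and `most_similar` is a hypothetical name; at the scale of ~23 million people an approximate nearest-neighbour index would be needed instead of this brute-force sort.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat zero vectors as dissimilar


def most_similar(query, embeddings, k=2):
    """Return the ids of the k people whose embeddings are closest to `query`."""
    return sorted(embeddings, key=lambda pid: cosine(query, embeddings[pid]), reverse=True)[:k]


people = {"p1": [1.0, 0.0], "p2": [0.9, 0.1], "p3": [0.0, 1.0]}
print(most_similar([1.0, 0.0], people))  # ['p1', 'p2']
```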
Thank you!
a.macanovic[email protected]
Appendix: Graph Embeddings
Appendix: Model pre-training
- For the MLM task, we do the following:
1. Randomly mask out (i.e., introduce gaps in) 24% of the tokens.
2. Randomly change 3% of the tokens to another random token.
3. Randomly select 3% of the tokens and leave them unaltered.
- For the SOP task, the model has to predict which type of alteration has been applied to each life sequence:
1. 5% of life sequences are reversed.
2. The life events are randomly shuffled in 5% of the life sequences.
3. The remaining 90% of sequences are left unaltered.
- The final loss function is a weighted sum of the two objectives: L = 0.7 × L_MLM + 0.3 × L_SOP
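A minimal sketch of both corruption schemes and the loss weighting, assuming independent per-token (MLM) and per-sequence (SOP) random draws, so the percentages hold in expectation rather than exactly. Together, 24% + 3% + 3% means about 30% of positions carry an MLM target, which is the familiar BERT 80/10/10 split applied to a 30% selection rate. The `[MASK]` string, the label ids, and the function names are hypothetical; this is not the project's training code.

```python
import random

MASK = "[MASK]"
SOP_LABELS = {"unaltered": 0, "reversed": 1, "shuffled": 2}


def mlm_corrupt(tokens, vocab, rng):
    """Per token: 24% -> [MASK], 3% -> a different random token, 3% -> kept as is,
    70% -> not selected. Returns the corrupted copy and {position: original token}."""
    corrupted, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        u = rng.random()
        if u >= 0.30:
            continue  # not selected: no prediction target at this position
        if u < 0.24:
            corrupted[i] = MASK
        elif u < 0.27:
            corrupted[i] = rng.choice([v for v in vocab if v != tok])
        # 0.27 <= u < 0.30: token left unaltered but still predicted
        targets[i] = tok
    return corrupted, targets


def sop_corrupt(events, rng):
    """Reverse with p = 0.05, shuffle with p = 0.05, otherwise (p = 0.90) leave as is."""
    u = rng.random()
    if u < 0.05:
        return list(events)[::-1], SOP_LABELS["reversed"]
    if u < 0.10:
        shuffled = list(events)
        rng.shuffle(shuffled)
        return shuffled, SOP_LABELS["shuffled"]
    return list(events), SOP_LABELS["unaltered"]


def total_loss(mlm_loss, sop_loss):
    """L = 0.7 x L_MLM + 0.3 x L_SOP."""
    return 0.7 * mlm_loss + 0.3 * sop_loss


rng = random.Random(0)
sequence = [f"EVENT_{i}" for i in range(20)]
corrupted, targets = mlm_corrupt(sequence, vocab=sequence, rng=rng)
altered, label = sop_corrupt(sequence, rng)
print(corrupted, targets, label, total_loss(2.0, 0.5))
```

A shuffle can occasionally reproduce the original order while still being labelled “shuffled”; with realistic sequence lengths this is rare.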