MMLN: An R Package for Mixed-Effects Multinomial
Logistic-Normal Regression and Model Diagnostics
Eric A. E. Gerber¹
¹Northeastern University, 360 Huntington Ave, Boston, MA 02115
Abstract
Multinomial outcomes arise in numerous fields, from sports and species counts to genomics, yet
existing software often focuses on simple fixed-effects or purely multinomial (logistic)
frameworks. The MMLN package introduces a suite of functions to fit more complex multinomial
logistic-normal regression models, including incorporation of random effects, and to evaluate the fit
of all multinomial regression models using the squared Mahalanobis distance residuals [6].
The MMLN() function fits mixed-effects multinomial logistic-normal models via MCMC sampling,
while the MDres() function computes the squared Mahalanobis distance residuals to
comprehensively evaluate model adequacy. Users can visualize or formally test these using
quantile-quantile plots and Kolmogorov-Smirnov tests. We describe the design and usage of the functions
provided in the MMLN package and demonstrate the package's capabilities for modeling
multinomial data by integrating flexible modeling tools, summaries, visualization, and robust
diagnostics.
Key Words: multinomial regression, mixed effects models, residuals, diagnostics
1. Introduction
Multinomial logistic-normal (MLN) models extend the classical multinomial framework by
embedding the probability vectors of the multinomial data in a latent Gaussian space, more robustly
accounting for overdispersion among categories than traditional multinomial logit or hierarchical
multinomial-Dirichlet models. Bayesian hierarchical models are powerful tools for fitting both
fixed-effect and mixed-effect MLN models used for capturing overall and group-level covariance
structures among the multinomial categories. Posterior predictive checks provide essential
diagnostics for assessing model adequacy but have historically been difficult to derive or unintuitive
for multinomial models of any type. Squared Mahalanobis residuals calculated using samples from
predictive distributions have been developed to address this gap [6].
The MMLN package implements both fixed-effects and mixed-effects MLN models using Gibbs
sampling with flexible Metropolis-Hastings updates, accompanied by tools for residual analysis in
R. The specific MLN functions add to the lexicon of established multinomial regression models,
while the MDres() function is simple to use for assessing model fits under any multinomial
regression framework, from models as complex as MLN to as simple as basic multinomial logit.
2. Multinomial Logistic-Normal Models and Diagnostics
Define the general form of a nominal outcome regression model:
$$y_i \sim \mathcal{M}_J(n_i, \pi_i)$$
where $\mathcal{M}_J$ represents the multinomial distribution with $J$ categories, exposure $n_i > 1$, and
probability vector $\pi_i$.
The commonly fit model for these data, the multinomial logistic, struggles to account for
overdispersion and cannot account for positive correlations between the outcome counts of the
different categories. The multinomial Dirichlet model has been shown as one option for
accounting for overdispersion [7]. However, the multinomial logistic-normal is a more flexible
alternative, which does a better job of accounting for positive correlations between categories [1].
Multinomial logistic-normal models arise from modeling the probability vector using the inverse
additive log-ratio (alr) transformation, where a multivariate normal noise term is added to the log
odds, in other words (and equivalently):
$$\left(\log\frac{\pi_{i1}}{\pi_{iJ}}, \dots, \log\frac{\pi_{i(J-1)}}{\pi_{iJ}}\right) \sim N_{J-1}$$
$$\pi_i = \mathrm{alr}^{-1}(\mu_i + \epsilon_i)$$
$$\epsilon_i \sim N_{J-1}(0, \Sigma)$$
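As a concrete illustration of the model above, the following base R sketch draws one multinomial observation from an MLN model. The helper names `alr` and `alr_inv`, the mean log-odds `mu`, the covariance `Sigma`, and the exposure of 50 are all illustrative assumptions, not values or functions taken from the package.

```r
# Additive log-ratio transform with the last category as reference.
alr <- function(p) log(p[-length(p)] / p[length(p)])

# Inverse alr: map a (J-1)-vector of log-odds back to J probabilities.
alr_inv <- function(w) {
  e <- exp(c(w, 0))
  e / sum(e)
}

set.seed(1)
J     <- 3
mu    <- c(0.5, -0.2)                       # illustrative mean log-odds
Sigma <- matrix(c(0.4, 0.2, 0.2, 0.3), 2)   # illustrative covariance

# Draw eps ~ N(0, Sigma) using the Cholesky factor of Sigma.
eps <- drop(t(chol(Sigma)) %*% rnorm(J - 1))
p   <- alr_inv(mu + eps)

# One multinomial observation with exposure n = 50.
y <- drop(rmultinom(1, size = 50, prob = p))
```

Because the noise enters on the log-odds scale, the resulting probabilities always sum to one, and the off-diagonal entries of `Sigma` control how categories co-vary.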
2.1 Fixed-Effects MLN
The FMLN() function fits the purely fixed effects version of multinomial logistic-normal regression
[5], where:
$$\pi_i = \mathrm{alr}^{-1}(x_i\beta + \epsilon_i)$$
The model is fit via a Gibbs sampler with Metropolis or Metropolis-Hastings (depending on
proposal distribution) proposals for the latent $w_i = x_i\beta + \epsilon_i$. The algorithm iteratively proceeds:
(1) Sample all $w_i \mid y_i, x_i, \beta, \Sigma \in \mathbb{R}^{J-1}$ via Metropolis-Hastings (latent variables; depending on
proposal distribution)
(2) Sample $\beta \mid W, X, \Sigma \sim N$ (fixed effects)
(3) Sample $\Sigma \mid W, X, \beta \sim \text{Inv-Wishart}$ (residual covariance)
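The latent-variable update in step (1) can be illustrated with a plain random-walk Metropolis step. This is a minimal sketch under stated assumptions, not the package's sampler: the symmetric Gaussian proposal, the `step` tuning constant, and the helper names `alr_inv`, `log_target`, and `mh_update` are all introduced here for illustration.

```r
alr_inv <- function(w) { e <- exp(c(w, 0)); e / sum(e) }

# Log full conditional of w (up to a constant): multinomial log-likelihood
# plus the N(mu, Sigma) log-density kernel.
log_target <- function(w, y, mu, Sigma_inv) {
  d <- w - mu
  sum(y * log(alr_inv(w))) - 0.5 * drop(t(d) %*% Sigma_inv %*% d)
}

# One symmetric random-walk Metropolis update; `step` is a tuning constant.
mh_update <- function(w, y, mu, Sigma_inv, step = 0.3) {
  w_prop <- w + step * rnorm(length(w))
  log_ratio <- log_target(w_prop, y, mu, Sigma_inv) -
    log_target(w, y, mu, Sigma_inv)
  if (log(runif(1)) < log_ratio) w_prop else w
}

set.seed(2)
y         <- c(12, 30, 8)      # illustrative counts for one observation
mu        <- c(0, 0)           # stands in for x_i %*% beta
Sigma_inv <- solve(diag(2))    # inverse residual covariance
w         <- c(0, 0)
draws     <- matrix(NA_real_, 500, 2)
for (s in 1:500) {
  w <- mh_update(w, y, mu, Sigma_inv)
  draws[s, ] <- w
}
colMeans(draws[-(1:100), ])    # posterior mean of the latent log-odds
```

In a full Gibbs sweep this update would be applied to every observation before the conjugate draws of the fixed effects and the residual covariance.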
2.2 Mixed-Effects MLN
To accommodate more complex data structures, including group level variation (for $i = 1, \dots, m$
groups), the MMLN() function introduces estimation of random intercepts, where:
$$\pi_{ij} = \mathrm{alr}^{-1}(x_{ij}\beta + \psi_i + \epsilon_{ij})$$
and
$$\psi_i \sim N_{J-1}(0, \Phi)$$
while the rest of the model parameterization remains the same. This mixed effects multinomial
logistic-normal model [5] is fit via a Metropolis-within-Gibbs sampler extended from the one used
to fit the fixed effects version:
(1) Sample all $w_{ij} \mid y_{ij}, x_{ij}, \beta, \psi_i, \Sigma \in \mathbb{R}^{J-1}$ via Metropolis-Hastings (latent variables;
depending on proposal distribution)
(2) Sample $\psi_i \mid W, X, \beta, \Sigma, \Phi \sim N$ (random effects)
(3) Sample $\Phi \mid \Psi \sim \text{Inv-Wishart}$ (random effects covariance; $\Psi$ is the full matrix of random
effects)
(4) Sample $\beta \mid W, X, \Psi, \Sigma \sim N$ (fixed effects)
(5) Sample $\Sigma \mid W, X, \beta \sim \text{Inv-Wishart}$ (residual covariance)
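Step (3) is a standard conjugate update, which the sketch below illustrates in base R. It assumes an inverse-Wishart prior with hypothetical hyperparameters `nu0` and `Lambda0`; the package's actual prior settings may differ. The helper `draw_inv_wishart` is also an assumption, built on `stats::rWishart` by inverting a Wishart draw.

```r
# Draw from Inv-Wishart(df, scale) by inverting a Wishart(df, scale^{-1}) draw.
draw_inv_wishart <- function(df, scale) {
  solve(rWishart(1, df = df, Sigma = solve(scale))[, , 1])
}

set.seed(3)
m   <- 40                                      # number of groups
d   <- 2                                       # J - 1 log-odds dimensions
Psi <- matrix(rnorm(m * d, sd = 0.7), m, d)    # stand-in random effects, one row per group

nu0     <- d + 2       # hypothetical prior degrees of freedom
Lambda0 <- diag(d)     # hypothetical prior scale matrix

# Conjugate posterior: Inv-Wishart(nu0 + m, Lambda0 + Psi' Psi).
Phi_draw <- draw_inv_wishart(nu0 + m, Lambda0 + crossprod(Psi))
```

The same pattern applies to the residual covariance in step (5), with the residuals of the latent log-odds taking the place of `Psi`.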
2.3 Mahalanobis Residuals
Recently, randomized quantile residuals for binary outcomes [3] were extended for use in nominal
multinomial modeling frameworks [6]. These residuals take the form of squared Mahalanobis
distances in the transformed additive log-ratio space from the observed data to the samples from a
predictive distribution under any model fit. These residuals, implemented in the MDres() function,
are not specific to any multinomial model. Generally, for each observation $i$, $K$ samples of predicted
counts $y_{ik}$ are generated from a fitted multinomial model to form the sampling distribution of the
additive log-ratio transformed vectors:
$$w_{ik} = \mathrm{alr}(y_{ik}^{*}), \quad k = 1, 2, \dots, K$$
Squared Mahalanobis distances of these model-generated log-odds, $\{w_i\}$, as well as of the observed
log-odds, $w_i^{obs}$, are calculated:
$$MD_{ik}^2 = (w_{ik} - \bar{w}_i)^T \hat{\Sigma}_{w_i}^{-1} (w_{ik} - \bar{w}_i)$$
and
$$MD_i^{2(obs)} = (w_i^{obs} - \bar{w}_i)^T \hat{\Sigma}_{w_i}^{-1} (w_i^{obs} - \bar{w}_i)$$
where $\bar{w}_i$ is the sample mean of model-generated log-odds and $\hat{\Sigma}_{w_i}$ is their sample covariance
matrix.
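The two distance formulas can be sketched for a single observation with base R's `mahalanobis()`, which returns squared distances. The fitted probabilities, exposure, observed counts, and the 0.5 pseudo-count used to avoid `log(0)` are illustrative assumptions, and the local `alr` helper is a stand-in rather than the package's function.

```r
alr <- function(p) log(p[-length(p)] / p[length(p)])

set.seed(4)
K     <- 200
p_fit <- c(0.2, 0.5, 0.3)    # illustrative fitted probabilities
n     <- 60                  # illustrative exposure

# K predictive count vectors (columns); the 0.5 pseudo-count is an assumption.
Y_pred <- rmultinom(K, size = n, prob = p_fit) + 0.5
W_pred <- t(apply(Y_pred, 2, function(y) alr(y / sum(y))))   # K x (J-1)

y_obs <- c(10, 33, 17) + 0.5
w_obs <- alr(y_obs / sum(y_obs))

w_bar <- colMeans(W_pred)    # sample mean of model-generated log-odds
S_w   <- cov(W_pred)         # their sample covariance matrix

md2_pred <- mahalanobis(W_pred, center = w_bar, cov = S_w)
md2_obs  <- mahalanobis(w_obs,  center = w_bar, cov = S_w)
```

An observed distance that sits far in the upper tail of `md2_pred` indicates an observation the fitted model reproduces poorly.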
A percentile of the observed distance, $MD_i^{2(obs)}$, relative to the empirical cdf of the model-generated
distances, $\hat{F}_K(MD_i^2)$, is calculated. A uniform random variable is then generated where the
minimum, $a_i$, and maximum, $b_i$, depend on the observed value's ordered location among the
model-based $MD_i^2$:
(1) If $MD_i^{2(obs)} \le \min(MD_i^2)$:
$$u_i \sim \mathcal{U}(a_i = 0,\; b_i = \hat{F}_K(\min(MD_i^2)))$$
(2) If $MD_i^{2(obs)} > \max(MD_i^2)$:
$$u_i \sim \mathcal{U}(a_i = \hat{F}_K(\max(MD_i^2)),\; b_i = 1)$$
(3) Otherwise,
$$u_i \sim \mathcal{U}(a_i = \hat{F}_K(\widetilde{MD}_i^2),\; b_i = \hat{F}_K(MD_i^{2(obs)}))$$
where $\widetilde{MD}_i^2 = \max(MD_i^2 < MD_i^{2(obs)})$, the largest model-generated distance below the observed one.
If the data fit the model, these percentiles are distributed $\mathcal{U}(0,1)$. These percentiles are
back-transformed to standard normal values to serve as residuals, $r_i = \Phi^{-1}(u_i)$. Normal quantile-quantile
and residual plots are then used to assess fit.
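The randomization step can be sketched as follows. This is an illustration under an explicit assumption, not the package's code: the observed distance is placed into one of K + 1 equally wide rank slots, so the uniform interval is never degenerate. That convention may differ from the exact empirical cdf used by MDres(), and the function name `randomized_residual` is hypothetical.

```r
# Randomized percentile residual for one observation.
# Assumption: K + 1 rank slots, so the slot for an observed distance with
# `n_below` generated distances beneath it is
# [n_below / (K + 1), (n_below + 1) / (K + 1)].
randomized_residual <- function(md2_obs, md2_pred) {
  K       <- length(md2_pred)
  n_below <- sum(md2_pred < md2_obs)
  u <- runif(1, min = n_below / (K + 1), max = (n_below + 1) / (K + 1))
  qnorm(u)   # back-transform the percentile to a standard normal value
}

set.seed(5)
md2_pred <- rchisq(500, df = 2)   # stand-in model-generated distances

r_small <- randomized_residual(0,   md2_pred)               # below every generated distance
r_large <- randomized_residual(1e6, md2_pred)               # above every generated distance
r_mid   <- randomized_residual(median(md2_pred), md2_pred)  # in the middle
```

The three calls correspond to cases (1), (2), and (3): the lowest slot, the highest slot, and an interior slot.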
3. The MMLN R Package
3.1 Package Structure
The package is organized into four main R scripts:
• mln_helpers.R: utility functions
• mln_functions.R: core MCMC samplers FMLN() and MMLN()
• multi_res: Mahalanobis residuals MDres(), summary and plotting helpers
• real_data_examples.R: vignettes for applying the functions to real data sets
3.2 Function Overview
Table 1 gives the description of the three primary functions of the package, as well as the three
most important helper functions. There are several other functions which are discussed as needed.
3.3 Using FMLN
The FMLN() function takes as its primary arguments the count matrix $Y$ and input matrix $X$ of
fixed-effects covariates. Parameters also include the total number of MCMC iterations, burn-in
(number of initial iterations to discard), thinning interval, scaling factor for Metropolis-Hastings
proposal covariance, settings for the prior distributions on the fixed effects and residual covariance
matrix, and choice of proposal distribution for the Metropolis-Hastings. The verbose argument
allows for printing of progress updates.
_______________________________________________________________________________
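# Fit the fixed-effects MLN model; `sim` holds the simulated data
res_f <- FMLN(
  Y = sim$Y,              # count matrix
  X = sim$X,              # fixed-effects design matrix
  n_iter = 2000,          # total MCMC iterations
  burn_in = 500,          # initial iterations to discard
  thin = 2,               # thinning interval
  proposal = "normbeta",  # Metropolis-Hastings proposal distribution
  verbose = TRUE          # print progress updates
)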
_______________________________________________________________________________
3.4 Using MMLN
The MMLN() function behaves similarly to FMLN(), though it now requires the definition of the
$Z$ random effects design matrix, currently only supporting random intercepts for group-level
observations. All of the arguments remain the same, though the prior settings also now account for
the inclusion of the random effects.
_______________________________________________________________________________
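# Fit the mixed-effects MLN model with group-level random intercepts
res_m <- MMLN(
  Y = sim$Y,              # count matrix
  X = sim$X,              # fixed-effects design matrix
  Z = sim$Z,              # random-effects design matrix
  n_iter = 2000,          # total MCMC iterations
  burn_in = 500,          # initial iterations to discard
  thin = 2,               # thinning interval
  proposal = "normbeta",  # Metropolis-Hastings proposal distribution
  verbose = TRUE          # print progress updates
)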
_______________________________________________________________________________
Table 1: Primary MMLN Package Functions
Function                      Description
FMLN                          Fit fixed-effects MLN model via MH-Gibbs sampling
MMLN                          Fit mixed-effects MLN model with group-level random intercepts
plot_trace_and_summary        Generate traceplots and posterior summary tables
compute_dic                   Calculate DIC from log-likelihood samples
sample_posterior_predictive   Simulate posterior predictive counts for model checking
MDres                         Compute Mahalanobis residuals
3.5 Diagnostic Tools
Trace plots and posterior Markov chain summaries can be displayed with the
plot_trace_and_summary() function after passing one of the posterior chain objects returned by
either FMLN() or MMLN() through the simplify2array() function. By default, the trace plots are
displayed in groups of four.
_______________________________________________________________________________
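# Stack the list of posterior draws of beta into an array
beta_chain_array <- simplify2array(res_m$beta_chain)
# Draw trace plots and return posterior summaries labelled "beta"
trace_stats <- plot_trace_and_summary(beta_chain_array, "beta")
trace_stats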
_______________________________________________________________________________
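To see what simplify2array() does to a chain before it is summarized, the sketch below applies it to a hypothetical list of coefficient draws. The 3 x 2 shape and the 100 draws are assumptions chosen for illustration; the actual layout of the chains returned by FMLN() or MMLN() is not documented here.

```r
set.seed(6)
# Hypothetical chain: 100 draws of a 3 x 2 coefficient matrix stored as a list.
chain <- replicate(100, matrix(rnorm(6), 3, 2), simplify = FALSE)

# Stack the list into a 3-dimensional array with the draw index last.
arr <- simplify2array(chain)
dim(arr)

# Posterior mean of each coefficient, averaging over the draw dimension.
post_mean <- apply(arr, c(1, 2), mean)
```

With the draws in the last dimension, each coefficient's trace is simply `arr[r, c, ]`.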
The Deviance Information Criterion (DIC) [2] for comparing model fits is computed via the
compute_dic() function after using the returned posterior chains and the true data counts to estimate
the log likelihood functions using the dmnl_loglik() function.
_______________________________________________________________________________
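# Log-likelihood of the observed counts at each posterior draw of W
ll_chain <- sapply(res_m$w_chain,
                   function(W) dmnl_loglik(W, sim$Y))
# Plug-in estimate of W from the observed proportions
W_hat <- alr(compress_counts(sim$Y) / rowSums(sim$Y))
ll_hat <- dmnl_loglik(W_hat, sim$Y)
dic_res <- compute_dic(ll_chain, ll_hat)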
_______________________________________________________________________________
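For reference, the standard DIC calculation of [2] can be written in a few lines. The function `dic_sketch` and its return format are assumptions for illustration; compute_dic() may organize its output differently.

```r
# DIC from a chain of log-likelihoods and the log-likelihood at a point estimate.
dic_sketch <- function(ll_chain, ll_hat) {
  d_bar <- -2 * mean(ll_chain)   # posterior mean deviance
  d_hat <- -2 * ll_hat           # deviance at the plug-in estimate
  p_d   <- d_bar - d_hat         # effective number of parameters
  c(DIC = d_bar + p_d, pD = p_d, Dbar = d_bar)
}

# Toy values: mean deviance 209, plug-in deviance 204, so pD = 5 and DIC = 214.
out <- dic_sketch(ll_chain = c(-105, -103, -104, -106), ll_hat = -102)
out
```

Lower DIC values indicate a better trade-off between fit and complexity when comparing models fit to the same data.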
Finally, the squared Mahalanobis residuals can be computed for any set of predictive distribution
samples for each observation using the MDres() function. The function has a summary() class
method which prints out the results of the Kolmogorov-Smirnov test for normality as a formal test
of model fit and displays the normal quantile-quantile plot of the residuals for a convenient graphical
assessment.
_______________________________________________________________________________
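# One posterior predictive count matrix per retained MCMC draw
Y_pred_list <- lapply(seq_along(res_m$w_chain), function(i) {
  sample_posterior_predictive(
    X = sim$X,
    beta = res_m$beta_chain[[i]],
    Sigma = res_m$sigma_chain[[i]],
    n = sim$n,
    Z = sim$Z,
    psi = res_m$psi_chain[[i]],
    mixed = TRUE
  )
})
# Squared Mahalanobis residuals, followed by the KS test and QQ plot
resids <- MDres(sim$Y, Y_pred_list)
summary(resids)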
_______________________________________________________________________________
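The checks that summary() is described as performing correspond to standard base R calls. The sketch below applies them to a stand-in vector of residuals; the vector `r` is simulated here and is not output from MDres().

```r
set.seed(7)
r <- rnorm(150)   # stand-in residuals; a well-fitting model should give roughly N(0, 1)

# Formal test: one-sample Kolmogorov-Smirnov test against the standard normal.
ks <- ks.test(r, "pnorm")
ks$statistic
ks$p.value

# Graphical check: normal quantile-quantile plot with a reference line.
qqnorm(r)
qqline(r)
```

A small p-value or systematic curvature in the QQ plot would suggest that the model does not reproduce the observed counts adequately.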
3.5.1 Example Output
The MMLN package also includes several vignettes for demonstrating the implementation and
utility of the models and diagnostic output on both simulated and real data. One vignette involves a
helper function, run_pollen_models(), that shows the residuals' ability to capture the well-established
existence of overdispersion [7] in pollen count data. There is also a simulate_mixed_mln_data()
function which will simulate data from the MMLN model. As an example, we simulate data under
the MMLN, then fit those data with both the MMLN() and FMLN() functions. The example Figure
1 and the Kolmogorov-Smirnov test results presented demonstrate one use case of the Mahalanobis
residuals and their summary class method.
Figure 1: Example QQ-plots of summary(MDres) output for FMLN (left) and MMLN (right)
models fit to MMLN data.
_______________________________________________________________________________
> resids <- MDres(observed_counts, fitted_counts_list)
> summary(resids)
Kolmogorov-Smirnov test for normality of Mahalanobis residuals:
Asymptotic one-sample Kolmogorov-Smirnov test
D = 0.13429, p-value = 0.05429
alternative hypothesis: two-sided
_______________________________________________________________________________
4. Discussion and Future Work
The MMLN package equips users with flexible tools for modeling multinomial outcomes in the
presence of overdispersion, together with comprehensive diagnostics via squared Mahalanobis
residuals. The modular design and simple interfaces facilitate usage and application to a wide range
of data. Future extensions are planned to include handling of more robust random effects, as the
infrastructure of the mixed effects model should be easily extended:
$$w_{ij} = x_{ij}\beta + z_{ij}\psi_i + \epsilon_{ij}$$
$$\mathrm{vec}(\psi_i) \sim N_{(J-1)q}(0, \Phi)$$
where, given $q$ group-level random covariates, the log-odds latent variables have the multivariate
normal distribution:
$$w_{ij} \mid x_{ij}, \beta, \psi_i, \Sigma \sim N_{(J-1)}(x_{ij}\beta + z_{ij}\psi_i, \Sigma)$$
and, unconditionally:
$$\mathrm{vec}(W_i) \mid X_i, \beta, \Sigma \sim N_{(J-1)n_i}(\mathrm{vec}(X_i\beta), V_i^{-1})$$
where
$$V_i^{-1} = (Z_i \otimes I_{(J-1)})\,\Phi\,(Z_i \otimes I_{(J-1)})^{T} + (I_{n_i} \otimes \Sigma)$$
However, including additional random effects drastically increases computational cost and will
thus require a more robust implementation, perhaps by leveraging the Rcpp package [4] for
integrating R and C++. In the future, support for alternative priors for the parameters of the Bayesian
model may also be included.
References
[1] J. Aitchison. The Statistical Analysis of Compositional Data. Journal of the Royal
Statistical Society: Series B (Methodological), 44(2):139-177, 1982.
[2] D. J. Spiegelhalter, N. G. Best, B. P. Carlin, and A. Van Der Linde. Bayesian measures of
model complexity and fit. Journal of the Royal Statistical Society Series B: Statistical
Methodology, 64(4):583-639, 2002.
[3] P. K. Dunn and G. K. Smyth. Randomized quantile residuals. Journal of Computational
and Graphical Statistics, 5(3):236-244, 1996.
[4] D. Eddelbuettel and R. Francois. Rcpp: Seamless R and C++ integration. Journal of
Statistical Software, 40(8):1-18, 2011.
[5] E. A. E. Gerber and B. A. Craig. A mixed effects multinomial logistic-normal model for
forecasting baseball performance. Journal of Quantitative Analysis in Sports, 17(3):221-239,
2021.
[6] E. A. E. Gerber and B. A. Craig. Residuals and diagnostics for multinomial regression
models. Statistical Analysis and Data Mining: An ASA Data Science Journal,
17(1):e11645, 2024.
[7] J. E. Mosimann. On the Compound Multinomial Distribution, the Multivariate β-Distribution,
and Correlations among Proportions. Biometrika, 49(1-2):65-82, 1962.