
Multivariate versus traditional quantitative phase analysis of X-ray powder diffraction and fluorescence data of mixtures showing preferred orientation and microabsorption

Source: https://iris.uniupo.it/bitstream/11579/141159/1/nb5320.pdf
research papers
J. Appl. Cryst. (2022). 55 https://doi.org/10.1107/S1600576722004708
Received 15 December 2021
Accepted 3 May 2022
Edited by A. Borbély, Ecole National Supérieure des Mines, Saint-Etienne, France
Keywords: X-ray powder diffraction; quantitative phase analysis; Rietveld refinement;
multivariate analysis; principal component analysis; X-ray fluorescence; preferred
orientation; microabsorption.
Supporting information: this article has supporting information at journals.iucr.org/j
Mattia Lopresti (a), Beatrice Mangolini (a), Marco Milanesio (a), Rocco Caliandro (b) and Luca Palin (a,c)*
(a) Università del Piemonte Orientale, Dipartimento di Scienze e Innovazione Tecnologica, Viale T. Michel 11, 15121 Alessandria, Italy,
(b) Institute of Crystallography, CNR, via Amendola 122/o, 70126 Bari, Italy, and
(c) Nova Res s.r.l., Via D. Bello 3, 28100 Novara, Italy.
*Correspondence e-mail: [email protected]
In materials and earth science, but also in chemistry, pharmaceutics and
engineering, the quantification of elements and crystal phases in solid samples is
often essential for a full characterization of materials. The most frequently used
techniques for this purpose are X-ray fluorescence (XRF) for elemental analysis
and X-ray powder diffraction (XRPD) for phase analysis. In both methods,
relations between signal and quantity do exist but they are expressed in terms of
complex equations including many parameters related to both sample and
instruments, and the dependence on the active element or phase amounts to be
determined is convoluted among those parameters. Often real-life samples hold
relations not suitable for a direct quantification and, therefore, estimations
based only on the values of the relative intensities are affected by large errors.
Preferred orientation (PO) and microabsorption (MA) in XRPD cannot usually
be avoided, and traditional corrections in Rietveld refinement, such as the
Brindley MA correction, are not able, in general, to restore the correct phase
quantification. In this work, a multivariate approach, where principal
component analysis is exploited alone or combined with regression methods,
is used on XRPD profiles collected on ad hoc designed mixtures to face and
overcome the typical problems of traditional approaches. Moreover, the partial
or no known crystal structure (PONKCS) method was tested on XRPD data, as
an example of a hybrid approach between Rietveld and multivariate
approaches, to correct for the MA effect. Particular attention is given to the
comparison and selection of both method and pre-process, the two key steps for
good performance when applying multivariate methods to obtain reliable
quantitative estimations from XRPD data, especially when MA and PO are
present. A similar approach was tested on XRF data to deal with matrix effects
and compared with the more classical fundamental-parameter approach. Finally,
useful indications to overcome the difficulties of the general user in managing
the parameters for a successful application of multivariate approaches for
XRPD and XRF data analysis are given.
ISSN 1600-5767
Published under a CC BY 4.0 licence
1. Introduction
The quantification of elements and phases in solid-state
materials represents a very important issue in many fields of
science in both the academic and industrial world. X-ray
powder diffraction (XRPD) and X-ray fluorescence (XRF)
are widely used in this field to analyse crystalline phases and
atomic elements, respectively. The advantages of X-ray-based
techniques are many, since these techniques can be partially or
totally nondestructive and probe relatively large amounts
(grams) of samples with a statistical relevance. An additional
advantage is measuring the sample ‘as received’, not altering
its natural conditions. X-ray techniques can also be exploited
in the field of portable instruments (Sarrazin et al., 1998) for
experiments under in situ (Eveno et al., 2011) or operando
(Urakawa, 2016) conditions. This approach was extended to its
extreme, carrying out XRPD in the extraterrestrial world on
the Moon (Vaniman et al., 1992) and on Martian soil (Delhez
et al., 2003; Bish et al., 2013).
Generally a whole profile fit approach is exploited, and a
profile calculated from the atomic crystal structure of each
pure component of the mixture is used to fit the whole
experimental XRPD pattern in order to calculate the corresponding
weight fractions, thus performing a Rietveld refinement
(RR) (Rietveld, 1969). In XRF analysis (X-ray emission
spectra), elements can be quantified by calibration with
standards or by the generic fundamental parameter (FP)
approach (Schönenberger et al., 2012). Some software, such as
MAUD or TOPAS (Lutterotti & Bortolotti, 2003; Coelho,
2018), can couple information from different techniques (e.g.
XRPD, XRF, reflectivity experiments ...) to exploit RR.
Multivariate statistical analysis (MSA), requiring no or little a
priori information, is a recently explored alternative
(Caliandro et al., 2013; Zappi et al., 2019; Guccione et al., 2021)
to the above-cited traditional approaches (widely described in
Appendix A) of XRPD and XRF data analysis. The MultiFit
regression procedure (Caliandro & Belviso, 2014) requires
only pure phase profiles, while principal component analysis
(PCA) is a completely blind approach (Jolliffe & Cadima,
2016); both approaches, available within RootProf (Caliandro
& Belviso, 2014), a software (free for academic use) built
specifically to manage XY profiles such as XRPD, XRF or
other typical instrumental data, are the topic of the present
contribution.
1.1. The potentialities and limitations of XRPD and XRF
The potentialities of XRPD in analytical chemistry were
already envisaged in the early days of X-ray diffraction (Hull,
1919) and in its full development in the second part of the 20th
century (Copeland & Bragg, 1958). More recently, new
applications have emerged in specific utilizations, such as in
the cultural heritage (Artioli et al., 2003, 2017; Dooryhee &
Colomban, 2008; Brunetti et al., 2016) and pharmaceutical
(Fawcett et al., 2019) fields. This approach has been widely
explored in recent decades (Madsen et al., 2001; Scarlett et al.,
2002; De la Torre & Aranda, 2003; León-Reina et al., 2009;
Ufer & Raven, 2017; Raven & Self, 2017). The need for
complete knowledge of the crystal structure was also overcome
by the PONKCS (Scarlett & Madsen, 2006; Madsen et
al., 2019) approach, which is able to apply RR to partial or no
known crystal structures.
One of the major critical issues in XRPD applications,
especially phase quantification, is the tendency of microcrystals
to be oriented along a preferred direction, which is
favoured in the case of needle or platelet-like morphology
(Dickson, 1969; Sitepu et al., 2005; Monaco & Artioli, 2011).
Preferred orientation (PO) causes biased intensities for the
oriented phase (Madsen et al., 2019). Moreover, in the
presence of phases with large differences in linear absorption
coefficient (LAC) and particle diameter in coarse powders, the
heavily absorbing ones are underestimated, the more so the
larger their particles, because of the effect known as
microabsorption (MA) (Madsen et al., 2019). This issue dramatically
affects quantification, especially in the presence of particles of
diameter above a few micrometres. MA in XRPD can be
considered a matrix effect (ME), since it can severely affect
the reliability of the results, in strict relation with sample
composition and morphology. In fact, when MA is present, the
relation between intensity of the signals and weight fractions
can be lost, since it depends on the volume subjected to the
incident flux in relation to particle size. In other words, the
more heavily absorbing phases will be less penetrated by the
X-rays, while phases with lower absorption coefficient are
much more transparent and are more likely to exhibit ‘volume
diffraction’ where the entire grain contributes to the diffraction
process.
When constraining weight fractions to 1 according to
equation (7) in Appendix A, MA causes an underestimation of
the more absorbing crystalline phase and consequently an
overestimation of the less absorbing phase. To mitigate this
effect, costly and time-consuming sample-preparation
procedures are required before the measurement.
For instance, gentle milling allows the crystallites to be ground
without inducing defect formation and crystallinity reduction.
A widely used solution is the McCrone mill, which reduces the
particle size of the powder from a maximum of 0.5 mm
to some micrometres, depending on the sample and
grinding conditions required for quantitative and qualitative
analytical methods, avoiding stress/strain/amorphization in the
crystallites. Sieving can be useful, but the combination of
milling and sieving can severely alter the sample, and it is
totally destructive, time consuming and not applicable in many
fields. When MA and PO concur to affect diffracted intensities
in solid mixtures, quantitative phase analysis (QPA) becomes
even more complicated, if not impossible. Similarly, the main
obstacle hindering quantification by XRF is referred to as the
matrix effect (Bowers, 2019), again a signal enhancement and/or
reduction induced by the presence of other elements in the
analysed mixture. It is exacerbated if the sample is measured
‘as received’, without pearl fusion (time consuming and
destructive).
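The effect of the unit-sum constraint described at the start of the preceding paragraph can be illustrated with a toy calculation. This is only a sketch: the function name `apparent_fractions` and the attenuation factors are hypothetical and purely illustrative; they are not the Brindley correction and are not values measured in this work.

```python
def apparent_fractions(true_fractions, attenuation):
    """Scale each phase signal by its attenuation factor, then renormalise to 1."""
    raw = [w * t for w, t in zip(true_fractions, attenuation)]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical 50:50 binary mixture of a heavy absorber and a light phase.
true_w = [0.5, 0.5]
# Assumed attenuation factors (illustrative only, not measured values):
# the heavy absorber keeps 60% of its ideal signal, the light phase all of it.
tau = [0.6, 1.0]

apparent = apparent_fractions(true_w, tau)
print([round(a, 3) for a in apparent])
```

Because the fractions are forced to sum to 1, the signal lost by the heavy absorber reappears as an apparent excess of the light phase, even though the light phase itself is measured without bias.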
Widespread solutions are the FP approach and algorithms
based on the influence coefficients (Criss & Birks, 1968;
Rousseau, 1984a,b; Willis & Lachance, 2004), which take into
account the ME in XRF data, exploiting theoretical or
empirical influence coefficients that are specific to each
analyte–interferent pair. The full empirical calibration with
known standards is an alternative but is limited to small
concentration regions, with the additional limit of being
sample specific and very time consuming. The problem
becomes more complex when the sample is made of mixed
inorganic and organic materials. The Kα line of carbon can be
measured both with high-end energy-dispersive XRF and with
wavelength-dispersive XRF, but only if carbon is present in
relatively high concentrations (e.g. above 50%) (Parus et al.,
2000). Moreover, below such a threshold, the measured
intensity of the emission line of the heavy atom becomes
independent of its weight fraction, making the analysis
impossible (Grieken & Markowicz, 2001) unless the sample is
diluted with a lighter element. The aim of this article is to find
quick and reliable methods to process efficiently a large
number of samples while limiting, as much as possible, sample
preparation.
1.2. Multivariate statistical analysis
MSA is a collection of methods extensively used in analytical
chemistry and, in particular, in the ‘-omics’ sciences
(Sharaf et al., 1986; Varmuza & Filzmoser, 2016), such as
metabolomics and proteomics. In the MSA approach, the data
(of any kind) are organized in matrices and analysed using
algorithms that allow searching for correlations between the
variables (Anderson, 2003). This approach ensures that
important effects due to, for instance, the synergistic or
antagonistic interactions (i.e. positive or negative correlations)
between variables (for instance, intensities at different 2θ
angles in the XRPD case), are efficiently and correctly identified.
This multi-purpose approach is commonly exploited for
classification, regressions and pattern recognition, in which
unknown experimental domains are explored (Anderson,
2003; Johnson & Wichern, 2007). The peculiarity of MSA is
the capability to extract efficiently the useful information, with
background suppression and bias identification, possibly
without or with very little a priori information.
PCA is a well known method for experimental error
suppression applied to pattern recognition and dimensionality
reduction (Jolliffe & Cadima, 2016). The process consists of a
data decomposition in which samples, characterized by a
dimensionality p, equal to the number of descriptor values
(e.g. energies in XRF, 2θ angles in XRPD), are projected in a
new space in which the directions of the new axes (named
‘principal components’, PCs) are defined by a linear combination
of the starting variables (Jolliffe & Cadima, 2016).
These PCs are generated by maximizing the explained
variance, which means that they will be hierarchically generated
depending on how much each PC describes the variance
of the system (PC1 will have the maximum explained variance,
PC2 will have less explained variance, and so on) (Jolliffe &
Cadima, 2016; Guccione et al., 2021). In a series of XRPD data
sets obtained from a group of samples with different compositions,
the main differences, i.e. the variance, are associated
with the changes of the experimental intensities due to the
different phase weight fractions in different samples.
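The decomposition described above can be sketched with a plain singular value decomposition. This is a minimal illustration on synthetic, noise-perturbed binary mixtures, not the RootProf implementation; the helper names `pca` and `peak`, the peak positions and the noise level are all assumptions made for the example.

```python
import numpy as np


def pca(profiles, n_components=2):
    """PCA of a (samples x variables) matrix via SVD of the mean-centred data."""
    X = np.asarray(profiles, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each variable (2-theta point)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)              # explained-variance ratio, descending
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components]
    return scores, loadings, explained


# Synthetic binary mixtures of two hypothetical 'pure phase' profiles.
rng = np.random.default_rng(0)
x = np.linspace(10.0, 60.0, 500)                 # hypothetical 2-theta axis


def peak(center, width=0.3):
    return np.exp(-0.5 * ((x - center) / width) ** 2)


pure = np.array([peak(20.0) + 0.5 * peak(35.0),
                 peak(27.0) + 0.8 * peak(48.0)])
fractions = np.array([[1.0, 0.0], [0.75, 0.25], [0.5, 0.5],
                      [0.25, 0.75], [0.0, 1.0]])
profiles = fractions @ pure + rng.normal(0.0, 0.005, size=(5, x.size))

scores, loadings, explained = pca(profiles)
print(np.round(explained[:3], 4))
```

In this idealized linear case almost all the variance falls on PC1, and the PC1 scores order the samples by composition; PO and MA are precisely the effects that break this simple picture in real data.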
MSA applied to X-ray measurements has started to develop
in recent decades and is still a relatively novel field, as
described in a recent review (Guccione et al., 2021). PCA, in
particular, has been applied to both single-crystal and powder
X-ray diffraction for in situ experiments (Lopresti et al., 2021;
Conterosito et al., 2020; Palin et al., 2019; Matos et al., 2007;
Guccione et al., 2018), and when combining different techniques
such as XRPD and Raman spectroscopy (Urakawa et al.,
2011) or XRPD and pair distribution function (PDF)/UV–Vis
(Caliandro, Altamura et al., 2019; Caliandro, Toson et al.,
2019). Concerning XRF, the use of MSA is already a consolidated
practice. In particular, methods such as partial least
squares (Höskuldsson, 1988; Wold et al., 2001) and principal
component regression (Hotelling, 1957; Jolliffe, 1982) have
been widely reported in the scientific literature (Grieken &
Markowicz, 2001; Ghasemi et al., 2013). MSA-based methods
do not use crystal structure or other a priori known information
but do use a probe-independent approach to tackle the
same problem as the traditional methods, e.g. estimating scale
factors between experimental XRPD and XRF intensities
(typically the whole XRPD patterns and a sub-range of XRF
spectra are used as input) and element or phase weight fraction
in XRF [equation (2)] and XRPD [equations (6) and (7)],
respectively. No specific equations are used in MSA, and each
approach has specific data-analysis guidance criteria.
The multiple regression approach, fully described by
Caliandro & Belviso (2014), is a whole pattern regression
technique in which the experimental mixture profile $\hat{y}(i)$ is
fitted with a model $y_{\mathrm{mod}}(i)$ in the form of

$$ y_{\mathrm{mod}}(i) = \sum_{j=1}^{q} v_j \, \hat{f}_j(i + e_j) + y_0 , \qquad (1) $$

built using $q$ pure phase profiles ($\hat{f}$). Mixture profiles are
therefore treated as a linear combination of pure phase
experimental profiles, and the parameters $v_j$, $e_j$ and $y_0$,
representing abundances of the $q$ pure phases and the horizontal
and vertical offsets of the profiles, respectively, are
refined using the MINUIT libraries (James & Roos, 1997).
This algorithm is implemented in RootProf and takes the
name of MultiFit (Caliandro, 2020).
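A simplified version of the fit in equation (1) can be sketched as an ordinary linear least-squares problem. The sketch below makes a strong assumption that is not in the original method: all horizontal offsets e_j are fixed at zero, so only v_j and y_0 are estimated, and NumPy's `lstsq` replaces the MINUIT minimizer. The function name `fit_mixture` and the synthetic profiles are hypothetical.

```python
import numpy as np


def fit_mixture(mixture, pure_profiles):
    """Least-squares estimate of v_j and y_0 in equation (1), with every e_j set to 0."""
    F = np.asarray(pure_profiles, dtype=float)        # shape (q, n_points)
    A = np.column_stack([F.T, np.ones(F.shape[1])])   # columns: f_1 ... f_q, constant
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(mixture, dtype=float), rcond=None)
    return coeffs[:-1], coeffs[-1]                    # (v_1 ... v_q), y_0


# Synthetic, noise-free test case with three hypothetical pure phase profiles.
x = np.linspace(10.0, 60.0, 1001)


def gaussian(center, width=0.2):
    return np.exp(-0.5 * ((x - center) / width) ** 2)


pure = [gaussian(18.0) + 0.4 * gaussian(33.0),
        gaussian(25.0) + 0.7 * gaussian(41.0),
        gaussian(29.0) + 0.3 * gaussian(52.0)]
true_v = np.array([0.2, 0.5, 0.3])
true_y0 = 4.0
mixture = true_v @ np.array(pure) + true_y0

v, y0 = fit_mixture(mixture, pure)
print(np.round(v, 6), round(float(y0), 6))
```

The coefficients returned here are scale factors for the pure profiles; turning them into weight fractions in real data still depends on the pre-processing and calibration choices discussed in the text.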
To prepare the data for PCA or regression procedures and
overcome the lack of equations such as those used in the RR and
FP methods for XRPD and XRF, respectively, it is often necessary to go
through an experimental pattern pre-processing phase, which
uses several mathematical tools (normalization, scaling,
raising to a power, among the many possibilities) to improve
the signal-to-noise ratio (Wehrens, 2011). This is a key step in
the scale-factor estimation and weight or element fraction
calculations, affecting the performances of all MSA methods.
Pre-processing is based on mathematical treatments
(Caliandro, 2020; Caliandro & Belviso, 2014) able to transform
a raw experimental pattern into a pattern where the information
needed for quantification is enhanced and background
and biased intensities are suppressed. The typical example is
the pre-processing of data sets showing PO, where the mathematical
transformation suppresses the oriented peaks to
overcome such bias. The pre-processing approaches used are
described in Section 2.2, while their test, selection and optimization
for the XRPD and XRF cases are described in
Section 3.2.1. In this article, MSA was performed by using
three different approaches:
(a) Supervised multiple regression analysis (SMRA), in
which the scale factor for each phase composing the mixture is
estimated by multivariate linear regression methods using
pure phase patterns for fitting and standard mixtures with
known composition for calibration and pre-process selection.
(b) Unsupervised multiple regression analysis (UMRA), in
which the same regression methods as SMRA are exploited on
samples, this time using pure phase patterns only. All the
mixture patterns are used for quantification.
(c) Blind analysis (BA), in which pre-processed data are
analysed by PCA without prior knowledge of mixture
composition or pure phases. The guidance towards phase
scales is given by the maximum data variance principle. The
quantification is performed not by regression methods but by
calculating the relative distances between the points in the PC
space. All the patterns, including pure ones, are used for
quantification.
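The idea behind approach (c), reading compositions from relative positions in PC space, can be illustrated in an idealized setting. The sketch assumes perfectly linear, noise-free mixing and uses barycentric coordinates with respect to the pure phase points in the PC1–PC2 plane; this is one possible way to turn relative positions into fractions and is not claimed to be the algorithm implemented in RootProf. All names and profiles are hypothetical.

```python
import numpy as np


def pc_scores(profiles, n_components=2):
    """Scores of mean-centred profiles on the first principal components."""
    X = np.asarray(profiles, dtype=float)
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def barycentric_fractions(point, vertices):
    """Solve sum_k w_k * vertex_k = point with sum_k w_k = 1 (2-D scores, 3 vertices)."""
    V = np.asarray(vertices, dtype=float)          # shape (3, 2)
    A = np.vstack([V.T, np.ones(3)])               # 3 x 3 linear system
    b = np.append(np.asarray(point, dtype=float), 1.0)
    return np.linalg.solve(A, b)


# Hypothetical pure profiles and ideal (linear, noise-free) ternary mixtures.
x = np.linspace(10.0, 60.0, 501)


def gaussian(center, width=0.3):
    return np.exp(-0.5 * ((x - center) / width) ** 2)


pure = np.array([gaussian(18.0) + 0.4 * gaussian(33.0),
                 gaussian(25.0) + 0.7 * gaussian(41.0),
                 gaussian(29.0) + 0.3 * gaussian(52.0)])
weights = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0],
                    [1 / 3, 1 / 3, 1 / 3],
                    [2 / 3, 1 / 6, 1 / 6]])
profiles = weights @ pure

scores = pc_scores(profiles)
# The first three rows are the pure phases and act as triangle vertices.
estimated = np.array([barycentric_fractions(p, scores[:3]) for p in scores])
print(np.round(estimated, 3))
```

With ideal data the recovered fractions coincide with the nominal ones; in measured patterns PO and MA distort the geometry of the points, which is why the choice of pre-processing matters so much for this approach.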
1.3. Purpose of the work
Despite the fact that XRPD and XRF are often exploited
alone or together, no systematic study focused on the
performance of the MSA methods applied to XRPD and XRF
methods of analysis in the full composition range is available.
With the present article, we intend to fill the gap, assessing the
performances of PCA, multiple regression and hybrid
(PONKCS) approaches, in comparison with traditional
methods (FP and Rietveld). XRPD and XRF data sets are
analysed separately to assess the performances of the various
methods and give recipes for the application of MSA methods
to XRPD and XRF data. The goal is favouring the diffusion of
multivariate approaches in all academic and industrial
environments where solid materials are of interest and a large
number of samples, in a wide range of compositions, must be
analysed, thus making complex preparation procedures such
as pearl fusion and milling impossible, or when the sample
must be analysed in a nondestructive way. Determining the
phase and element content in complex mixtures, such as the
ones used for instance in brake pads, is a challenging task in
quality control. Those mixtures are composed of reinforcing
fibres, binders, fillers, lubricants and abrasives. Reinforcing can
be carried out with ceramic materials such as potassium
titanates; commonly used lubricants are graphite (C) and metal
sulfides (e.g. MoS), and commonly used fillers are barite
(BaSO4) or calcium carbonate (CaCO3), typically calcite.
Quantifying the phase content in these mixtures with strong
MA effects is very complex with XRPD and, due to the
presence of graphite, it is also very complex from the XRF
point of view.
Four sets of samples with PO and/or MA issues were
prepared and analysed by XRF and XRPD. Substances for the
mixtures were selected following different criteria in order to
simulate real examples of complex mixtures: (i) presence of
both organic and inorganic substances difficult to quantify by
traditional methods due to PO and/or MA phenomena; (ii)
non- or low-toxicity of the components so that they could be
easily handled; (iii) non-reactivity in mixture in standard
conditions; and (iv) wide use in general industry. In all the
mixtures, the two heavily absorbing phases are bismite
(Bi2O3) and barite (BaSO4) with LACs of 1978 and 924 cm⁻¹,
respectively, using a Cu X-ray tube. A third lighter phase is
added to these two heavy phases to obtain four ternary
mixtures: sieved graphite (LAC of 10.18 cm⁻¹) in specimen
D1; oriented graphite in specimen D2; zinc acetate, an organic
sample but with Zn Kα recorded in XRF data (LAC of
40.97 cm⁻¹), in specimen D3; and urea (LAC of 9.91 cm⁻¹) in
specimen D4. The space represented by the mixture weight
fractions, i.e. the corresponding ternary phase diagram, is
commonly defined as the ‘experimental domain’ (Cornell,
2011).
The most efficient way to explore an experimental domain
is through the use of the design of experiments (DoE)
approach (Box et al., 1978; Cox & Reid, 2000; Cornell, 2011).
The DoE approach consists of a set of mathematical tools
allowing one to plan experiments to extract the maximum
possible amount of information contained in the experimental
domain with the least number of experiments. A DoE
approach was used to prepare the mixture samples, to cover all
the space represented by each possible combination of the
phases’ weight fractions in a controlled and efficient way as
described in detail elsewhere (Mangolini et al., 2021). Each set
of samples was then analysed by XRPD and XRF to produce
four data sets. The obtained XRPD/XRF data belong to a
collection of data stored in an online repository that we recently
created (https://doi.org/10.17632/js2nzwf5md.2). The database
is open to new contributions, with the aim of creating a large
data set for testing and calibrating XRPD and XRF techniques.
The features and instructions to exploit the current data
or for adding new data are given in a dedicated publication
(Mangolini et al., 2021).
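The compositions of a three-component augmented simplex-centroid design can be enumerated in a few lines. This is a sketch under stated assumptions: the phase order (barite, bismite, lighter phase) and the labels S1–S3 for the pure phases are assumptions made here for illustration, while S4–S7 and SA1–SA3 follow the compositions quoted in Section 3.1.1. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def augmented_simplex_centroid_3():
    """Return {label: (w1, w2, w3)} for a 3-component augmented simplex-centroid design."""
    zero, half, third = Fraction(0), Fraction(1, 2), Fraction(1, 3)
    design = {}
    # Pure components (vertices); the labels S1-S3 are an assumption of this sketch.
    for k in range(3):
        design[f"S{k + 1}"] = tuple(Fraction(1) if i == k else zero for i in range(3))
    # 50:50 binary mixtures (edge midpoints): S4, S5, S6.
    for n, pair in enumerate(combinations(range(3), 2), start=4):
        design[f"S{n}"] = tuple(half if i in pair else zero for i in range(3))
    # Ternary centroid.
    design["S7"] = (third, third, third)
    # Augmented points, midway between the centroid and each vertex.
    for k in range(3):
        design[f"SA{k + 1}"] = tuple(
            Fraction(2, 3) if i == k else Fraction(1, 6) for i in range(3)
        )
    return design


design = augmented_simplex_centroid_3()
for label, composition in design.items():
    print(label, [f"{float(w):.3f}" for w in composition])
```

Exact fractions are used so that every composition sums to 1 without rounding error; converting to percentages gives the 66.6% and 16.7% values quoted for the augmented points.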
In the present article, these data are analysed both by the
traditional approaches and by the above-described multivariate
analysis approaches (SMRA, UMRA and BA). A
detailed description of the pre-processing optimization and
selection is given, being the key step to obtaining the best
QPA performances among all the adopted approaches. The
goal of these approaches is managing, with a reasonable
precision, complex mixtures to allow fast (and in principle
automatic) processing and analysis of a large number of
samples. Moreover, the hybrid method PONKCS was tested to
compare its performance with respect to RR and the pure
multivariate approach. PONKCS was originally developed to
refine with a Rietveld-like approach phases whose structure is
either not known or only partially known. In this work, we
exploit PONKCS to obtain better estimates for light phases in
samples affected by MA, even if their crystal structure is
known. Moreover, only one of the four data sets will be
analysed by XRF (i.e. data set D3) because both graphite and
urea lack elements that give an XRF signal detectable by the
low-power benchtop instrument used.
2. Materials and methods
2.1. Data collection
Sample preparation, morphological characterization,
instrumentation and data collection are described in a dedicated
publication (Mangolini et al., 2021). Ternary mixtures
were prepared by a DoE (Cornell, 2011; Cox & Reid, 2000)
approach to properly sample their full compositional range.
All the data sets have been made available in an open online
database (Mangolini et al., 2021). Table 1 shows the main
features for each sample.
2.2. Software
FP analysis of XRF data was carried out by the proprietary
software installed in the XRF instrument (Rigaku, 2012). This
being a benchtop/portable instrument with a low-power (4 W)
X-ray tube, the K line for carbon cannot be observed. XRPD
data were analysed by the traditional RR approach using
TOPAS-Academic (V5) (Coelho, 2018, 2020). A whole profile
regression (using the MultiFit algorithm) and PCA-assisted
quantitative analysis were performed by using RootProf
version 14 (Caliandro & Belviso, 2014). This software includes
different pre-processing options organized into four classes
(named levels), and one action for each level is executed on
raw data one after the other. The levels of the modification
functions are profile modifications (level 1), rescaling (level 2),
background subtraction (level 3) and filtering (level 4)
(Caliandro & Belviso, 2014), whose use is documented in a
dedicated web page with dedicated tutorials for its efficient
learning and usage (Caliandro, 2020).
These pre-processing steps aim to transform
raw data into modified data where background and bias are
suppressed and relevant information (phase or element
amounts in XRPD or XRF, respectively) is dominant. Some
useful and widely used raw-data pre-processing operations are still not
included in RootProf (Savitzky–Golay filtering and autoscaling),
and were thus performed by using R base version
4.1.0 (R Core Team, 2013) and the prospectr package version
0.2.1 (Stevens & Ramirez-Lopez, 2021). The hybrid approach
(exploiting, at the same time, Rietveld refinement and a
MultiFit-like approach using the pure phase intensity information)
named PONKCS (Scarlett & Madsen, 2006), as
implemented in TOPAS-Academic (V5) (Coelho, 2018, 2020),
was used for a wide exploration of possible analytical solutions.
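Some of the pre-processing operations named in this section (raising to a power, rescaling and autoscaling) can be sketched with NumPy. This is an illustrative chain only: the function names, the exponent of 0.5, the unit-area rescaling and the order of the steps are assumptions of this sketch, and it reproduces neither the RootProf levels nor the R/prospectr calls used in the work.

```python
import numpy as np


def power_transform(profiles, exponent=0.5):
    """Raise non-negative intensities to a power below 1 to damp the strongest peaks."""
    X = np.clip(np.asarray(profiles, dtype=float), 0.0, None)
    return X ** exponent


def normalize_area(profiles):
    """Rescale each profile (row) so that its intensities sum to 1."""
    X = np.asarray(profiles, dtype=float)
    return X / X.sum(axis=1, keepdims=True)


def autoscale(profiles):
    """Centre each variable (column) and scale it to unit standard deviation."""
    X = np.asarray(profiles, dtype=float)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0                      # leave constant columns unscaled
    return (X - X.mean(axis=0)) / std


# Tiny hypothetical data matrix: 3 profiles (rows) x 3 measured points (columns).
raw = np.array([[4.0, 1.0, 0.0],
                [9.0, 1.0, 4.0],
                [1.0, 16.0, 9.0]])

damped = power_transform(raw)                # square root of each intensity
rescaled = normalize_area(damped)
scaled = autoscale(rescaled)
print(np.round(scaled, 3))
```

A power below 1 reduces the weight of very intense peaks, which is one plausible way to limit the influence of reflections enhanced by PO, while autoscaling gives every variable the same variance before PCA.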
3. Results
Eight data sets were built, collecting XRPD and XRF data on
four mixtures belonging to a ternary experimental domain,
whose features are summarized in Table 1. The mixtures were
prepared in the whole concentration range with the aid of an augmented
simplex-centroid DoE (Cornell, 2011), as introduced in
Section 2. The analysis of XRPD and XRF was carried out
comparing, in both cases, traditional (Rietveld for XRPD and
FP for XRF) and multivariate methods (SMRA, UMRA and
BA). Moreover, XRPD data were analysed by PONKCS, as
implemented in TOPAS V5 (Coelho, 2020).
3.1. Traditional methods
3.1.1. Rietveld analysis. The four XRPD data sets were
refined first by a normal RR with a one-direction March–Dollase
correction parameter for PO for graphite. The RR
data (Fig. 1) are represented with the following labelling
scheme: each symbol is associated with one out of the four
data sets. Each composition is related to a specific colour: S4,
S5 and S6 are the binary mixtures with 50% in weight of each
component; S7 is the 33% equivalent weight ternary mixture;
and SA1, SA2 and SA3 are the augmented mixtures (66.6%,
16.7%, 16.7%), (16.7%, 66.6%, 16.7%) and (16.7%, 16.7%,
66.6%), respectively. With this scheme, the aggregation of
symbols of the same colour close to the circles (representing
the expected nominal values) indicates small deviations from
the nominal value. As expected, the symbols in Fig. 1 are
rather dispersed, highlighting large deviations in the weight
fractions estimated by the standard RR approach. The Rietveld
profile fitting reaches a satisfactory agreement factor
(Rwp < 17); also, when strong MA is present, unless the analyst
knows the actual sample composition, there is no evidence
from the RR results that something should be improved or
changed (Fig. S1 of the supporting information).

Table 1
A summary of the characteristics of each sample analysed by XRPD and XRF.
More details on the characteristics of the samples are given by Mangolini et al. (2021).

| Data set | Phases | A brief description of the mixture |
|---|---|---|
| D1 | BaSO4, Bi2O3, sieved C | Graphite has an average particle diameter of <90 µm to introduce moderate PO effects. There are large differences in density of the three phases for an MA effect. There is absence of characteristic XRF signal for graphite. |
| D2 | BaSO4, Bi2O3, mixed C | Same as sample D1 but this time graphite has a 30% in weight content with average particle diameter larger than 90 µm with pronounced PO effects. |
| D3 | BaSO4, Bi2O3, ZnC4H6O4 | All phases have an XRF signal. Zinc acetate introduces moderate PO and has a lower density than graphite, enhancing MA effects. Zinc acetate also has a larger unit cell, to increase peak superposition in XRPD. |
| D4 | BaSO4, Bi2O3, CH4N2O | Absence of XRF signal, slight PO effect. Urea presents larger average particle size and has a lower density than zinc acetate, with increased MA effects. |

Figure 1
Results of the XRPD Rietveld analysis reported on the ternary graph
representing the mixtures’ experimental domain; S4–S7 are ternary and
binary mixtures of the simplex DoE, while SA1–SA3 are the augmented
simplex samples, highlighted in italic. Phase 3 is the lighter phase:
graphite, oriented graphite, zinc acetate and urea in data sets D1, D2, D3
and D4, respectively.
In general, because of MA, high-absorbing barite and
bismite are underestimated and the lighter Phase 3 is
overestimated in every data set. In sample S4 (barite–bismite 50:50),
common to all data sets, MA is present and the barite content
is overestimated at 76.7%. Deviations due to strong MA are
highlighted for samples S5 of data sets D1 and D2, composed
of 50% barite and 50% graphite as the lighter phase, and for
samples S6 of all data sets, composed of bismite and a lighter
phase at 50% in weight. This behaviour affects the deviations
observed in the RR fit from the expected values of the ternary
mixture. The mean deviation computed on samples S7, SA1,
SA2 and SA3 for D1 is due to an underestimation for barite of
−5.8%, an underestimation for bismite of −18.8% and an
overestimation of +24.6% for graphite. Similar behaviour is
seen in data set D2. In the case of D3, where zinc acetate
replaces graphite, the mean deviation computed on the
samples S7, SA1, SA2 and SA3 is lower and bismite is
underestimated by −18.6%, and barite and zinc acetate are
overestimated by +4.7 and +14.6%, respectively. A similar
behaviour is seen in data set D4, where urea is present.
The PONKCS approach (Appendix A3) was used to ‘calibrate’ the refinement and try to manage the MA effect properly while still using the RR approach. As seen in Fig. 2, the values are much less dispersed than in the classical RR case (Fig. 1). For data sets D1 and D2, the best approach is the single PONKCS (see Appendix A3 for a detailed definition of single and double PONKCS) calibrated on sample S6, with the under- and overestimation of heavier and lighter phases much more limited than for RR. For data sets D3 and D4, instead, the best approach is the double PONKCS calibrated with respect to the bismite content on sample S7. With single PONKCS for data sets D1 and D2, the mean deviation from the expected values decreases from 17.1 and 20.1% to 8.7 and 7.2%, respectively. With double PONKCS for data sets D3 and D4, the mean deviation decreases from 12.6 and 12.4% to 4.7 and 7.5%, respectively. The sum of squared residuals (SSR) of the phase abundances estimated by RR and PONKCS with respect to the actual values is reported in Table 2 (taking the sum of the phases as equal to 1).
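As a minimal sketch of how an SSR of this kind can be computed, the following Python function sums the squared differences between estimated and expected weight fractions over all phases and samples, with the fractions of each sample summing to 1. The function name and the numerical values are illustrative assumptions and are not taken from Table 2.

```python
def ssr(estimated, expected):
    """Sum of squared residuals between estimated and expected fractions.

    Both arguments are sequences of samples; each sample is a sequence of
    phase fractions expressed on a 0-1 scale (summing to 1 per sample).
    """
    total = 0.0
    for est_sample, exp_sample in zip(estimated, expected):
        if len(est_sample) != len(exp_sample):
            raise ValueError("each sample needs the same number of phases")
        for est, exp in zip(est_sample, exp_sample):
            total += (est - exp) ** 2
    return total


# Hypothetical ternary mixtures (not data from the article).
expected = [(0.50, 0.25, 0.25), (0.25, 0.50, 0.25)]
estimated = [(0.45, 0.20, 0.35), (0.25, 0.45, 0.30)]

print(round(ssr(estimated, expected), 4))
```

Because the residuals are squared, a single badly estimated phase dominates the total, which is why the SSR separates the methods more sharply than a mean deviation does.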
3.1.2. XRF FPs result and measurement conditions. For these kinds of mixtures, containing phases with very different absorption coefficients, XRF results depend on the measurement conditions. Measuring at 50 kV with an Ag X-ray tube and an Ag filter placed between the tube and the sample gives a smooth background at low energies, cutting the Lα of the Ag tube, but the NexQC, a portable instrument with a low-power X-ray tube, cannot excite the Kα of carbon sufficiently to record its intensity, even in the presence of a helium purge. With the classical FP quantification approach, analysing the Lα emission line of bismuth and the Kα line of barium, it is possible to quantify only the relative amount of barium and bismuth in the mixture, without the contribution of the lighter phase. The presence of the lighter phase, graphite or urea, does not affect the intensities of Bi Lα and Ba Kα, which are independent of the lighter-phase concentration (e.g. in mixtures S4, S7 and SA3 the integrated intensities of the fluorescence emission lines Lα for bismuth and Kα for barium have a constant value). The case of data set D3 with zinc acetate is more straightforward due to the presence of the Zn Kα emission line, for both the classical FP method and the MSA approach.

Figure 2
Results of the XRPD PONKCS analysis reported on the ternary graph representing the mixtures’ experimental domain. Labelling and colour scheme as in Fig. 1.
3.2. Multivariate analysis of XRPD/XRF data
3.2.1. Pre-process selection. The multivariate approach has the great advantage of being probe independent, so it can be applied in the same way to XRPD and XRF data, and it does not require known crystal structures or other a priori information. Moreover, no relations, such as those in equations (4)–(7) (XRPD) or equation (2) (XRF), are assumed to relate experimental intensities and phase or element fractions. Therefore, the system can be analysed in an unbiased way, driven by the specific features of the experimental profiles. As a drawback, the lack of information about scale factors and the absorption coefficients of each component of the mixture must be compensated by the use of other guiding principles. On the one hand, each approach has an intrinsic principle underlying the multivariate analysis, e.g. variance in PCA-based BA or the minimization of the SSR towards pure phase patterns in multiple regression (SMRA and UMRA), as described in Section 1.2. On the other hand, the power and flexibility of the multivariate method rely on the almost unlimited combinations of mathematical tools that can be used to transform raw patterns, suppressing noise and bias and enhancing the information useful for quantification. These data pre-processings might drastically transform the pattern, but this is the way to obtain good quantitative results. In this mandatory preliminary analysis, named ‘pre-process selection’ (see Section 1.2), the effects of many parameters (such as the 2θ data range, re-scaling of intensities and re-sampling of the profiles by a smoothing algorithm) must be evaluated before investigating unknown samples. In this section, the approach adopted for the chosen, very difficult, case study is presented, while Section 4 gives a guide to the best approach depending on sample features and experimental needs. In fact, when PO and MA are present and the whole experimental domain is studied, the difficulty is at its maximum, and this preliminary phase can be very demanding in time and resources, requiring a suitable training data set of known samples. Pre-process selection is performed by RootProf in an automatic way through its calibration process in supervised analysis (Caliandro & Belviso, 2014).
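The kind of pattern transformation evaluated during pre-process selection can be illustrated with a short Python sketch that restricts a profile to a 2θ window, re-scales the intensities and normalizes the subtended area to 1. The base-10 logarithm, the clamping floor, the trapezoidal area and all names and values are assumptions made for illustration; this is not the RootProf implementation, and smoothing is left out for brevity.

```python
import math


def trapezoid_area(xs, ys):
    """Area under the sampled curve (xs, ys) by the trapezoidal rule."""
    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))


def preprocess(two_theta, intensity, tt_min, tt_max, floor=1.0):
    """Crop to [tt_min, tt_max], take log10 of intensities, set area to 1."""
    # 1. Range selection: keep only the points inside the chosen window.
    points = [(x, y) for x, y in zip(two_theta, intensity)
              if tt_min <= x <= tt_max]
    if len(points) < 2:
        raise ValueError("the selected range contains fewer than two points")
    xs = [x for x, _ in points]
    # 2. Re-scaling: log10 compresses strong peaks; counts below `floor`
    #    are clamped so that the logarithm stays defined (an assumption).
    ys = [math.log10(max(y, floor)) for _, y in points]
    # 3. Normalization: divide by the area so that profiles are comparable.
    area = trapezoid_area(xs, ys)
    if area <= 0.0:
        raise ValueError("cannot normalize a profile with non-positive area")
    return xs, [y / area for y in ys]


# Hypothetical five-point pattern (not measured data).
two_theta = [10.0, 20.0, 30.0, 40.0, 50.0]
counts = [10.0, 100.0, 1000.0, 100.0, 10.0]
xs, ys = preprocess(two_theta, counts, 20.0, 40.0)
print(xs, ys)
```

Each choice in such a chain (window, re-scaling function, normalization) is one of the parameters whose effect has to be tested on known samples before unknown ones are analysed.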
For this study, additional pre-processes not yet implemented within RootProf (Savitzky–Golay filtering and mathematical derivative) were tested using the R framework (R Core Team, 2013). This external pre-process optimization followed a 2⁵ experimental factorial design (Box et al., 1978; Cox & Reid, 2000), in which the pre-processing parameters smoothing window, derivative order and autoscaling were combined with the 2θ range of the pattern and the number of skipped data in the RootProf calibration process. The SSR of the estimated phase abundances with respect to the measured ones was used to identify the best combination of pre-processing parameters, and the optimization was performed on each XRPD and XRF data set separately. For convenience, only the best pre-process combinations are reported in Table 3. The details of all remaining pre-process combinations can be found in the supporting information. Concerning XRPD profiles, Table 3 shows that the best results are obtained by analysing their full range and not subranges containing only the highest-intensity peaks. However, the RootProf internal pre-process showed the existence of a better option, which
Table 2
To make the performances of the exploited methods comparable, the differences between the expected values and the predicted values for each mixture belonging to each data set were calculated. Overall performances were expressed as the SSR, commonly used to evaluate the degree of agreement of regression models.

                            D1 SSR    D1 SSR    D2 SSR    D2 SSR    D3 SSR    D3 SSR    D4 SSR    D4 SSR
Method                      SA1–SA3   TOT       SA1–SA3   TOT       SA1–SA3   TOT       SA1–SA3   TOT
Rietveld                    0.3557    1.046     0.4107    1.1642    0.2193    0.6393    0.2245    0.6798
Single PONKCS on barite     0.0899    –         0.1176    –         0.0741    –         0.1885    –
Single PONKCS on bismite    0.0934    –         0.0561    –         0.1831    –         0.5672    –
Double PONKCS on barite     0.1068    –         0.1109    –         0.0644    –         0.1613    –
Double PONKCS on bismite    0.1326    –         0.1787    –         0.038     –         0.0773    –
SMRA                        0.0274    –         0.0263    –         0.0246    –         0.0243    –
UMRA                        0.0292    0.0938    0.0263    0.0727    0.0246    0.0787    0.0259    0.0877
BA                          0.0788    0.1678†   0.0161    0.0781†   0.0599    0.1129†   0.1528    0.3514†

† SSR TOT for BA was calculated without including pure phases.
Table 3
A summary of the best pre-process selection procedures for the analysis of data sets D1–D4 for XRPD data and D3 for XRF data.

XRPD data.
       External pre-process                   RootProf setup              Results
Run    Smoothing   Derivative   Autoscaling   2θ range (°)     Skipdata   Best pre-process   SSR
       window      order
D1     5           0            No            10–120           3          3203               0.027
D2     5           0            No            10–120           3          3203               0.026
D3     5           0            No            10–120           3          4203               0.025
D4     9           0            No            10–120           5          4204               0.24

XRF data.
       External pre-process                   RootProf setup                  Results
Run    Smoothing   Derivative   Autoscaling   Energy range (keV)   Skipdata   Best pre-process   SSR
       window      order
D3     0           0            No            1.8–16.4             3          5003               0.058
occurs more frequently among the XRPD data sets: L1 = 3 (base-10 logarithm of the pattern intensities), L2 = 2 (normalization of the subtended area to 1), L3 = 0 (no background subtraction) and L4 = 3 (PC filtering), i.e. the code 3203 in Table 3. Raising the pattern intensities to the power 4/5 (L1 = 4) seems to be another good alternative for profile modification, while L4 = 4 is a variant of the PC filtering L4 = 3. The autoscaling performed by R did not give any valuable result in the presence of PO and MA. The same procedure was repeated without taking into account the conditions found to be more unfavourable, such as the autoscaling, the first-order derivative and the 2θ range reduction. For XRF data, for data set D3 with zinc acetate, the internal RootProf pre-process confirms that the best performing combination has L4 = 3 (PCA filtering). Excluding the tails at the beginning and end of the XRF spectra, where the values go noisily to zero, is also crucial. Having determined the best pre-processing options, we present the performance of the multivariate approach on both XRPD and XRF data sets in the following sections, by using supervised and unsupervised QPA and a completely blind analysis, where no information (not even the pure phases) is supplied to the software.
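The structure of a 2⁵ factorial search of the kind described in this section can be sketched in Python as follows: every combination of five two-level factors is enumerated (32 runs) and the run with the lowest SSR on the known samples is retained. A full (not fractional) design is assumed here. The reduced 2θ range and the scoring function `toy_ssr` are hypothetical placeholders standing in for a real calibration run; the scores it returns are not results from the article.

```python
from itertools import product

# Two levels per factor. Window 5/9 and skipdata 3/5 appear in Table 3;
# the reduced 2-theta range is an assumed value used only for illustration.
FACTORS = {
    "smoothing_window": (5, 9),
    "derivative_order": (0, 1),
    "autoscaling": (False, True),
    "two_theta_range": ((10.0, 120.0), (10.0, 60.0)),  # second level assumed
    "skipdata": (3, 5),
}


def full_factorial(factors):
    """Return one dict per run, covering every combination of levels."""
    names = list(factors)
    level_sets = [factors[name] for name in names]
    return [dict(zip(names, levels)) for levels in product(*level_sets)]


def toy_ssr(run):
    """Hypothetical stand-in for the SSR obtained from a calibration run.

    The penalties are invented; they only mimic the qualitative finding
    that derivative, autoscaling and range reduction were unfavourable.
    """
    score = 0.03
    score += 0.05 * run["derivative_order"]
    score += 0.05 if run["autoscaling"] else 0.0
    score += 0.02 if run["two_theta_range"] != (10.0, 120.0) else 0.0
    score += 0.001 * (run["smoothing_window"] - 5)
    score += 0.001 * (run["skipdata"] - 3)
    return score


runs = full_factorial(FACTORS)
best = min(runs, key=toy_ssr)  # keep the run with the lowest score
print(len(runs))
print(best)
```

In practice `toy_ssr` would be replaced by a function that applies the pre-process, runs the calibration on the training mixtures and returns the SSR of the estimated abundances.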
3.2.2. Supervised quantitative analysis. SMRA was performed using the three pure phases and the four other simplex mixtures to calibrate the model, while the augmented experiments SA1, SA2 and SA3 were used as unknown samples to test the procedure. XRPD data sets were analysed using the best pre-processes obtained by the selection procedure described in the previous section. Fig. 3 gives an example of the performance of the MultiFit procedure on data set D1. In this figure, the goodness of the fit performed by RootProf is evident, as it is for the RR fit reported in the supporting information (see Fig. S1). The results of the data analysis of the four data sets are reported in Table S1 of the supporting information and in Fig. 4. The estimates are closer to the expected values, even in comparison with the PONKCS calibrated approach (Fig. 2).
Figure 3
A fit (red dashed line) produced by RootProf using the MultiFit algorithm during SMRA on sample SA1 of data set D1. The pattern (black continuous line) has been pre-processed as reported in Table 3.

Figure 4
Results of the (a) SMRA and (b) UMRA performed on XRPD data, reported on the ternary graph representing the mixtures’ experimental domain. Labelling and colour scheme as in Fig. 1.

Data set D1 has errors on the estimates of up to 13%, as can be seen for sample S5. In data set D2, surprisingly since the average particle size of graphite is larger than that in data set D1, individual results are generally more precise than those obtained for data set D1, and the SSR is reduced. In these data sets showing PO and MA, SMRA appears to be rather more robust than RR and PONKCS. In fact, significantly lower SSR values (see Table 2) are observed for the results obtained by multivariate analysis compared with the best performing PONKCS (for data set D1, SSR = 0.027 for SMRA and 0.0899 for PONKCS). Data sets D3 and D4 show errors similar to the first two data sets, always below 10% in the estimates. The error distributions are always normal and zero-centred, a sign that systematic errors are absent or very limited (analyses of residuals are reported in the supporting information), in contrast to PONKCS and RR. XRF data set D3, the only data set with an XRF active element in all three species, was analysed using the best pre-processes obtained by the selection procedure described in the previous section and using the XRF spectra obtained at 50 kV. As expected, the FP algorithm performs better in the SA2 sample case, where the lighter phase zinc acetate is the minority phase. In the other two cases, the results of the SMRA approach are comparable to those of the FP method. Globally, very similar SSRs are observed, with a value of 0.104 for the FP method and 0.111 for SMRA. This approach represents typical real-world use of the regression method, especially in complex cases and when errors must be minimized. After the pre-process selection and optimization have been performed on a well defined series of samples, e.g. clinker in a cement company or graphite in a lubricant plant, the SMRA method can also be implemented for routine analysis in a fully automatic approach.
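The statement that the error distribution is zero-centred can be checked with a simple calculation, sketched here in Python: the mean residual is compared with its standard error, and a mean lying several standard errors away from zero points to a systematic bias. The threshold of two standard errors, the function name and the residual values are assumptions for illustration (not data from the article), and the sketch does not test normality.

```python
import math
import statistics


def zero_centred(residuals, n_sigma=2.0):
    """Return (mean, standard error, flag) for a list of residuals.

    The flag is True when |mean| <= n_sigma * standard error, i.e. when
    the residuals show no evident systematic offset.
    """
    n = len(residuals)
    if n < 2:
        raise ValueError("at least two residuals are needed")
    mean = statistics.fmean(residuals)
    sem = statistics.stdev(residuals) / math.sqrt(n)
    return mean, sem, abs(mean) <= n_sigma * sem


# Hypothetical residuals (estimated minus expected weight fraction).
scattered = [0.02, -0.03, 0.01, -0.01, 0.03, -0.02]  # no evident offset
biased = [-0.18, -0.20, -0.17, -0.19]                # systematic underestimate

print(zero_centred(scattered))
print(zero_centred(biased))
```

A persistent negative mean for a heavy phase, as in the second list, is the signature expected when MA is not compensated.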
3.2.3. Unsupervised quantitative analysis. UMRA was performed on XRPD and XRF data by supplying to the software information regarding the pure phases, while every other mixture was used for testing the fitting method. A pre-process combination was applied, either exploiting general-use recipes for samples without particular critical issues (Caliandro, 2020) or using indications from a previous SMRA calibration (Table 3), as carried out in the present article because of the presence of MA and PO. Since UMRA is a standardless method, it can be applied successfully when a strong correlation is present between the scale parameters of the experimental profiles after the pre-process and the quantities in the mixtures, as demonstrated in the previous section. In this case study, this was found to be true for XRPD data but not for XRF data, and thus XRF data are not reported for UMRA in the present work. The results of the quantification are reported in Table S2. For data sets D1, D2 and D3, the estimates are very similar to those obtained by performing SMRA, and the quantitative information can be extracted from the data, since the pre-processing procedure is already known after the calibration by SMRA in the previous section. Similar pre-process options were found for all data sets, suggesting a rather general approach, as debated in detail in Section 4. The errors range from 0 to 12.4%, which is still a good value for these kinds of samples, and represent the best result among the three different QPA approaches. As for supervised QPA, the residual analysis does not show any easily recognizable trend, and the error distribution is normal and zero-centred. Unsupervised QPA of XRF data was not reported due to the large errors (up to 57% error in the quantification of zinc acetate).
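The idea of describing a mixture profile through the pure-phase profiles alone can be sketched as an ordinary least-squares fit, shown below in plain Python. This is a generic illustration under the assumption that the pre-processed profiles combine linearly with the phase amounts; it is not the RootProf algorithm, and the function names and toy profiles are hypothetical.

```python
def solve_linear(matrix, vector):
    """Solve matrix * x = vector by Gaussian elimination with pivoting."""
    n = len(vector)
    aug = [row[:] + [v] for row, v in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("pure profiles are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_fractions(pure_profiles, mixture):
    """Least-squares weights of the pure profiles, normalized to sum to 1."""
    k = len(pure_profiles)
    # Normal equations: (A^T A) w = A^T y, with the pure profiles as columns.
    gram = [[sum(a * b for a, b in zip(pure_profiles[i], pure_profiles[j]))
             for j in range(k)] for i in range(k)]
    rhs = [sum(a * y for a, y in zip(pure_profiles[i], mixture))
           for i in range(k)]
    weights = solve_linear(gram, rhs)
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical four-point profiles of three pure phases (not measured data).
pure = [[1.0, 0.0, 0.0, 2.0], [0.0, 3.0, 0.0, 1.0], [0.0, 0.0, 2.0, 1.0]]
mixture = [0.5 * a + 0.3 * b + 0.2 * c for a, b, c in zip(*pure)]
fractions = fit_fractions(pure, mixture)
print(fractions)
```

The fitted weights coincide with the weight fractions only when the scale of each pure profile tracks the amount of that phase, which is the correlation the text identifies as the condition for a successful standardless analysis.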
3.2.4. Blind analysis. BA was run on each data set to probe the minimum knowledge required by the MSA to perform a very fast semi-quantitative analysis without any a priori information. Differently from SMRA and UMRA, no compositional or pure pattern profile information was given as input to the RootProf software. BA relies only on the explained variance extracted by PCA and is based on the relative distances between the samples projected in the PC spaces.
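How relative distances in PC space can carry compositional information is illustrated by the following Python sketch, which extracts the first principal component by power iteration and rescales the scores to the interval 0–1. It uses a two-component toy system, for which one PC is sufficient; ternary mixtures such as those studied here would need two PCs. The function names, the rescaling by the extreme scores and the toy profiles are assumptions for illustration and do not reproduce the RootProf procedure.

```python
import math


def first_pc(patterns, n_iter=100):
    """Return (scores, explained) for the first principal component.

    patterns: list of equally long intensity lists, one per sample.
    explained: fraction of the total variance captured by this component.
    """
    n, p = len(patterns), len(patterns[0])
    means = [sum(row[j] for row in patterns) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in patterns]
    total = sum(x * x for row in centred for x in row)
    if total == 0.0:
        raise ValueError("all patterns are identical")

    def project(direction):
        return [sum(a * b for a, b in zip(row, direction)) for row in centred]

    # Power iteration on X^T X, started from the largest centred pattern.
    direction = max(centred, key=lambda row: sum(x * x for x in row))
    for _ in range(n_iter):
        scores = project(direction)
        new = [sum(scores[i] * centred[i][j] for i in range(n))
               for j in range(p)]
        norm = math.sqrt(sum(x * x for x in new))
        direction = [x / norm for x in new]
    scores = project(direction)
    explained = sum(s * s for s in scores) / total
    return scores, explained


def relative_positions(scores):
    """Rescale scores to 0-1 using the two extreme samples as end points."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        raise ValueError("all scores are equal")
    return [(s - lo) / (hi - lo) for s in scores]


# Hypothetical binary mixtures of two toy profiles (not measured data).
a = [4.0, 1.0, 0.0, 2.0]
b = [0.0, 2.0, 3.0, 1.0]
fractions = [0.0, 0.25, 0.5, 0.75, 1.0]
patterns = [[f * x + (1.0 - f) * y for x, y in zip(a, b)] for f in fractions]

scores, explained = first_pc(patterns)
positions = relative_positions(scores)
print(explained, positions)
```

The sign of a principal component is arbitrary, so the rescaled positions reproduce either the fraction of one component or that of the other; deciding which end of the axis corresponds to which phase requires additional information.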
The results of BA are reported in the form of a score plot for each data set (Fig. 5) and as numerical values
Figure 5
The results of the BA are reported in the form of score plots for each data set: (a) D1, (b) D2, (c) D3 and (d) D4. The numbers represent the positions of the samples in Table S3 (0 is Ba, 9 is SA3).