Document [original]

Eileen Roesler, Tobias Rieger, Dietrich Manzey

Trust towards human vs. automated agents: Using

a multidimensional trust questionnaire to assess

the role of performance, utility, purpose, and

transparency

Open Access via institutional repository of Technische Universität Berlin

Document type

Journal article | Accepted version

(i. e. final author-created version that incorporates referee comments and is the version accepted for

publication; also known as: Author’s Accepted Manuscript (AAM), Final Draft, Postprint)

This version is available at

https://doi.org/10.14279/depositonce-16411

Citation details

Roesler, E., Rieger, T., & Manzey, D., Trust towards Human vs. Automated Agents: Using a Multidimensional

Trust Questionnaire to Assess The Role of Performance, Utility, Purpose, and Transparency, Proceedings of

2022 (Human Factors and Ergonomics Society). DOI: 10.1177/1071181322661065.

This work is protected by copyright and/or related rights. You are free to use this work in any way permitted by

the copyright and related rights legislation that applies to your usage. For other uses, you must obtain

permission from the rights-holder(s).

Trust towards Human vs. Automated Agents: Using a Multidimensional Trust Questionnaire to Assess

The Role of Performance, Utility, Purpose, and Transparency

Eileen Roesler*

Technische Universität Berlin

Tobias Rieger*

Technische Universität Berlin

Dietrich Manzey

Technische Universität Berlin

In various domains, humans are supported by automated systems. Earlier research has suggested that trust in

automated agents differs from trust in other humans. The present studies aimed at taking a multi-dimensional

look at effects on trust towards automation and humans. To this end, we conducted two studies to empirically

validate a multi-dimensional trust questionnaire to assess performance, utility, purpose, and transparency

subdimensions of trust (Study 1, N=160) and to study experimental effects of support agent (i.e., human vs.

decision support system) and failure experience (i.e., none vs. one; Study 2, N=181). The expected factor

structure was confirmed. Moreover, the results showed that being supported by a human mostly impacted the

performance subscale. In sum, the findings illustrate the importance to study trust not only uni-dimensionally

but to consider different subdimensions, particularly as a single-item trust measurement was mostly correlated

to performance and utility subscales.

For a plethora of critical decision-making tasks, such as de-

ciding about applicants’ loans or evaluating medical x-rays, hu-

mans are often assisted by support agents. These support agents

previously were mainly human experts, but are increasingly re-

placed by automated systems. The general idea of having crit-

ical tasks supported by either another human or an automated

decision support system (DSS) is to improve safety and perfor-

mance. Unfortunately, the joint performance of both human-

human teams (e.g., Cymek, 2018) as well as human-DSS teams

(e.g., Rieger & Manzey, 2022) is often worse than ideal. More-

over, trust towards these different types of support agents has

been found to differ (for a review, see Madhavan & Wiegmann,

2007). One finding regarding the differences between human-

human and human-DSS trust, which has been often referred to

is the so-called perfect automation schema by Dzindolet, Pierce,

Beck, and Dawe (2002). The central proposition of the perfect

automation schema is that, a priori, humans have higher trust

towards automation, but after a failure has been experienced,

trust decreases more strongly when interacting with a DSS than

when interacting with another human. Dzindolet et al. (2002)

attributed this difference to expectations of near-perfect perfor-

mance towards automated systems but not towards humans.

However, some recent findings challenge this often re-

ferred effect of a perfect automation schema. More specifically,

Rieger, Roesler, and Manzey (2022) compared trust in human

expert support and automated decision support in different do-

mains. In three experiments with consistent results, they even

found higher trust in the human agent than the automation, sug-

gesting something like an imperfect automation schema. More-

over, failure experience reduced trust in both the DSS and the

human condition—as would be expected from most theoretical

models of human-automation interaction (e.g., Hoff& Bashir,

2015). Crucially, the size of the failure effect did not differ be-

tween the DSS and the human condition, indicating that both

*Shared first authorship. The hypotheses for Study 2 were preregistered.

All data, analysis code, and an implementation of the questionnaire for jspsych

is available at the Open Science Framework under osf.io/56cwx.

trust towards humans as well as DSS support suffered from fail-

ure experience.

Both the findings of Rieger et al. (2022) and Dzindolet et al.

(2002) rely on rather simple assessments of trust or utility, and

did not consider a multi-faceted view of trust. Even though both

studies found differences between humans and automation, nei-

ther can address which underlying trust facets play a role for the

respective differences. For instance, a single-item trust (Rieger

et al., 2022) is not capable of capturing different trust dimen-

sions and a more detailed look might be necessary to address

theory-based trust dimensions.

More specifically, most theoretical models of trust in au-

tomation (and also, of trust in humans) consider more than one

single dimension. For instance, in their seminal work on trust in

automation, Lee and See (2004) describe performance, purpose,

and process as separate dimensions of trust. They consider per-

formance as what the automation does, mainly in terms of re-

liability and the ability to help the operator (or advice-taker)

to achieve one’s goals. Purpose refers to why the automation

was implemented, and also includes the designer’s intentions

and whether the system’s benefit is comprehensible. Finally,

process is about how the system works. More specifically, it

includes aspects such as comprehensibility and familiarity of

the system.

Often-used single-items of trust or uni-dimensional trust-

in-technology questionnaires (e.g., Jian, Bisantz, & Drury,

2000) do not capture these different dimensions, and thus can-

not contribute much to a more detailed understanding of the

underlying determinants of trust and trust dynamics in inter-

action with different support agents. A more detailed multi-

dimensional trust-in-automation questionnaire, which might

also be used for investigating trust in human support agents,

has been proposed by Wiczorek (2011, Multi-dimensional Trust

Questionnaire; MTQ). Theoretically, it is based on the concept

of Lee and See (2004). Specifically, the performance and pur-

pose dimensions are directly assessed as separate dimensions.

Further, even though the process dimension is not directly ad-

dressed, the MTQ includes a transparency subscale which is

closely related to the process dimension of Lee and See (2004).

Finally, for a more differentiated assessment of the performance

dimension of Lee and See (2004) which also includes aspects

of usefulness, the MTQ separately assesses a utility subscale

which addresses how useful a system is to fulfill the task. In

summary, the MTQ (Wiczorek, 2011) allows for a more fine-

grained assessment of trust in order to gain a better understand-

ing which trust dimensions are impacted from a given experi-

mental manipulation.

However, thus far, the questionnaire was only available in

German. Given this, the current research has two objectives.

In Study 1, we aimed at a psychometric evaluation of an En-

glish version of this questionnaire. In Study 2, we conducted

an experimental study to gain a better understanding of the trust

facets impacted by receiving decision support from either a DSS

or another human as well as from failure experience.

STUDY 1 – QUESTIONNAIRE

As mentioned above, the goal of Study 1 was to translate

and validate the Wiczorek (2011) questionnaire. To this end,

we conducted an online study where, after experiencing work-

ing together with a perfectly reliable DSS, participants eval-

uated the system with a single-item trust and the MTQ with

its four subscales (performance, utility, purpose, transparency;

with four items each). The stimuli and procedure were the same

as in Rieger et al. (2022, Experiment 2; see also, e.g., Appel-

ganc, Rieger, Roesler, & Manzey, 2022). The data was used for

psychometric analyses of the factor structure of the question-

naire. Moreover, intercorrelations of the MTQ’s subscales and

a global single-item trust were assessed.

Method

Participants. As the reliability of factor analysis is depen-

dent on the sample size, we aimed to have 10 times as many

participants as variables (Nunnally, 1978). As the MTQ con-

sists of 16 items, 164 participants were recruited via Prolific (4

participants failed an attention check) to achieve a final sample

size of 160 participants (mean age =33.60, SD =5.73, 52.5%

female).

Procedure and Design. Participants took part in the exper-

iment in their own web browser and the experiment was pro-

grammed in jspsych (de Leeuw, 2015) and ran on a JATOS

server (Lange, Kühn, & Filevich, 2015). After giving their

informed consent, the experiment started with a general intro-

duction on the simulated radiological x-ray screening task they

would perform later on. Specifically, the task was to evaluate

which percentage of simulated x-rays were brighter than a given

cutoff(brighter than a grayscale value of 150). In order to en-

able participants performing this task, they were first shown a

grayscale continuum, with the critical cutoffthreshold marked,

along with an example image. Participants were instructed that

parts brighter than this could possibly be malignant and that

later on, they would be asked to estimate the percentage of po-

tentially malignant tissue in x-ray samples. They were also in-

formed that this was a very cautious cutoffwith no reason for

any concern with percentages lower than 15%.

After this general introduction of the task, participants were

shown three example scenarios with the correct solution. Af-

ter these example scenarios, participants were informed that

they would receive support from a highly reliable DSS but they

would make the final decisions. The DSS was characterized as

a well-established system which made its decisions based on

prior x-ray data. Participants were also informed that the DSS’s

reliability was greater than 90%. Subsequent to this framing,

participants were asked to answer two short attention check

questions about the support agent.

Then, the main part of the experiment started. Specifically,

participants were instructed that they would now be evaluating

fictional personas’ x-rays and were shown an example. Partici-

pants worked on a total of 10 trials where the DSS’s recommen-

dation was always correct. On each trial, the persona with the x-

ray image was shown for 5 seconds along with the information

that the DSS is currently evaluating the image. After 5 seconds,

participants were told that they can now press the spacebar to

continue to see the DSS’s recommendation. Then, they filled in

their final decision in an input field. After each trial, participants

needed to press the spacebar to continue to the next trial.

Subsequent to these ten experimental trials, participants

filled out two trust measures. That is, trust was assessed both,

using a single-item trust (i.e., “how much do you trust the deci-

sion support system?” from 0 (not at all) to 100 (completely)),

as well as the translated version of the MTQ (translated from

Wiczorek, 2011), measured on a four-point Likert scale (“Dis-

agree”, “Somewhat disagree”, “somewhat agree”, “Agree”).

Besides the four scales of the original MTQ, we embedded an

additional attention check item in the MTQ to ensure data qual-

ity. The order of the two trust measures was counterbalanced

across participants. Finally, participants filled out a short so-

ciodemographic questionnaire and were debriefed.

Results

Item Reduction &Scale Analysis. In order to make the

questionnaire more economical and reliable, items of the trans-

lated scale were removed if the respective item did not improve

the reliability of the scale (4-item scales αPE =.89, αU=.85,

αPU =.50, αT=.85). This led to the reduction of one item

per scale, resulting in three items per subscale with acceptable

to excellent internal consistencies (3-item scales αPE =.92,

αU=.87, αPU =.74, αT=.85).

Factor Analysis. In order to analyze the underlying fac-

tor structure of the remaining 12 items, we performed an ex-

ploratory maximum likelihood factor analysis with oblique ro-

tation (promax). Before conducting this factor analysis, we

checked for the appropriateness of sample size and the data.

Both the significant Bartlett’s test of sphericity (X2(66) =

1371.58,p< .001) and the very good Kaiser-Meyer-Olkin mea-

sure of sampling adequacy (KMO=.88) indicated that the sam-

ple size and data is adequate for the following factor analysis.

To determine the optimal number of factors for the ex-

ploratory factor analysis, a visual inspection of a screeplot (Cat-

tell, 1966) and the Kaiser’s criterion (Kaiser, 1960) of Eigen-

values were used. The plot indicated a three factorial structure.

Therefore, the factor analysis was performed with three factors.

Table 1 displays the obtained pattern matrix by showing

factor loadings above .40 (Stevens, 2009). The first factor com-

prised all items targeting performance as well as utility subscale

and accounted for 37% of the variance. The second factor in-

cluded the transparency items and accounted for 18% of the

variance. Factor three included all purpose items and accounted

for 13% of the variance. In sum, the overall scale accounted for

68% of the variance of the data.

Table 1. Items and their factor loadings.

Factors

Items 1 2 3

PE1 The system works safely. 0.87

PU1 The intention of the system is positive. 0.56

PE2 The system works well. 0.86

T1 The way the system works is clear to me. 0.49

U1r The system makes my work more difficult. 0.80

PU2 The system is intended to help improve

overall performance.

0.60

PE3 The system works accurately. 0.91

T2 I am well informed how the system works. 0.91

U2 The system is useful for my work. 0.80

T3 I understand how the system works. 0.94

PU3 The system was implemented to help me. 0.82

U3 I find that the system supports my work. 0.72

Internal Consistency and Validity. After removing one item

per subscale, the overall scale (α=.91) showed an excellent

internal consistency. This was also the case for the combined

reliability-utility scale (α=.94). Furthermore, the purpose

scale (α=.74) showed an acceptable and the transparency scale

(α=.85) a good internal consistency. To investigate the con-

struct validity, the single-item trust measurement was correlated

with the overall scale mean and the three single scales. The

results showed that the overall scale (r=.77) as well as the

performance-utility scale (r=.83), the purpose scale (r=.52),

and the transparency scale (r=.39) highly correlated with

single-item trust (all ps < .001).

Discussion

The aim of the questionnaire study was twofold. On the one

hand, we wanted to validate the factorial structure of the trans-

lated questionnaire. On the other hand, we aimed to study the

interconnection between the uni-dimensional single-item mea-

surement of trust and the different dimensions of the MTQ.

First, the analysis of the trust questionnaire revealed three

dimensions of trust. Whereas two of the factors directly corre-

sponded to the purpose and transparency (process) dimension

of trust according to the original conceptualization of these di-

mensions of trust by Wiczorek (2011), performance and utility

loaded on the same factor. This latter effect could relate to a

conceptual relationship of these two aspects which might even

be combined to one scale resulting in a three-factorial trust con-

cept congruent with the one of Lee and See (2004). However,

the results could also be associated with the type of system (a

simple DSS) and the specific paradigm (a single-task x-ray es-

timation) we used. More specifically, it seems reasonable to

assume that utility and performance might end up as different

dimensions in a much more complex task environment (e.g.,

supervisory control settings) with an imperfect system. For

these methodological reasons, we would advise against our fac-

tor analysis and recommend to consider performance and utility

as separate scales in future research before the high correlation

between these two dimensions has been replicated with other

systems.

Second, the correlation of the uni-dimensional item with

the MTQ overall score revealed a high correlation which indi-

cates a sufficient content validity of the questionnaire. Inter-

estingly, the uni-dimensional measure seems to be most promi-

nently related to performance and utility. The uni-dimensional

assessment of trust thus might be primarily related to the out-

come of the system reflecting its reliability and validity. How-

ever, this measure seems to be considerably less related to how

the agent operates. Therefore, the MTQ enables insights to the

process component of trust (Lee & See, 2004) by assessing pur-

pose and transparency.

In summary, the results of Study 1 suggest that the MTQ

is a valid instrument to assess trust multi-dimensionally. Thus,

we conducted Study 2 using the MTQ in order to study why

humans are trusted more than automation.

STUDY 2 - EXPERIMENTAL STUDY

Study 2 had three goals. First, we wanted to study which

trust subdimensions are affected by different support agents

(i.e., human vs. DSS). Second, we were interested to see which

subdimensions are impacted from failure experience. Third, we

were also interested in checking whether the subdimensions’

correlations with the single-item trust are descriptively different

for both types of agents.

Regarding the first goal, we expected to find a main effect

of support agent on single-item trust as in Rieger et al. (2022),

with higher trust for human than for automation support. To fur-

ther disentangle this trust effect, the MTQ was used. Here, we

hypothesized that the main effect of agent is only found for the

purpose and transparency scales. We expected an effect for the

purpose subscale because it refers to benevolence (i.e., a pos-

itive orientation) of the interaction partner (Lee & See, 2004)

which humans can express but automation cannot. Moreover,

we expected an effect for the transparency subscale as the lack

of system transparency is often criticized for automation (e.g.,

Wickens, 2017) but obviously not for humans.

Regarding the second goal, we again expected a main ef-

fect for the single-item trust with lower trust after a failure ex-

perience than after experiencing perfect support. We also had

subscale-specific hypotheses. That is, we expected failure ex-

perience to mainly impact the performance and utility subscales

of the MTQ. We expected particularly these subscales to reflect

the failure effect because performance is directly linked to true

reliability and utility is closely linked to the support agent’s ca-

pabilities, which are of course smaller when reliability is lower.

Moreover, as Rieger et al. (2022) did not observe any interaction

effects of agent and failure experience throughout three exper-

iments, we did not expect to observe one here–neither for the

single item nor for any MTQ subscale.

Finally, we had no clear hypotheses with respect to the third

goal and checked the correlations in an exploratory manner.

Method

Participants. A fresh sample of 200 participants (mean age

=32.54, SD =5.16, 51.4% female) was recruited via Prolific.

One additional participant was also tested but excluded from

any further analyses due to a failed attention check.

Procedure and Design. The procedure and design were

very similar to Study 1. Specifically, whereas in Study 1 all

participants had a DSS as their support agent and the ten trials

were without failure experience, both support agent and fail-

ure experience were systematically varied across participants in

Study 2. This resulted in a 2 (support agent: human vs. DSS) x

2 (failure experience: none vs. one) between-subjects design.

In the human condition, participants were told that they

would be supported by an experienced colleague during the

ten trials, and also that their colleague was more than 90% re-

liable. In the failure conditions, the respective support agent

made an obvious error in trial 7, indicating a very low percent-

age where the true proportion of bright pixels was rather high

(59% brighter than the cutoffwas evaluated to be okay with 8%

as the recommendation).

Based on the results of the factor analysis conducted in

Study 1, we used a reduced version of the MTQ with only three

items per subscale. Moreover, all questions were adapted to

fit the respective condition (i.e., the wording of the questions

always referred to either a “decision support system” or a “col-

league”). The order of the single item trust and the MTQ was

again counterbalanced across participants.

Results

First, we performed a manipulation check in the failure

condition. We excluded 19 participants who exactly followed

the obviously wrong advice of the support agent, as it is not

certain that they detected the failure. Subsequently, the effects

of the dependent variables were analyzed by 2 (support agent)

x 2 (failure condition) ANOVAs.

The analysis of single-item trust revealed no significant

main effect of failure or interaction effect (ps > .105). However,

the main effect of support agent F(1,177) =3.84, p=.052,

η2

G=.021 just failed to reach the conventional level of sig-

nificance. Participants tended to report higher trust towards

the human (M=80.40; S D =18.75) compared to the DSS

(M=74.33; S D =19.71).

Neither the analysis of the overall MTQ score (ps > .191),

nor the analyses of the dimensions purpose (ps > .283), utility

(ps > .105), or transparency (ps > .395) revealed any signifi-

cant effects. Only for the performance dimension a significant

main effect of the support agent was found F(1,177) =9.61,

p=.002, η2

G=.052, indicating higher perceived performance

of the human (M=3.52; S D =0.58) compared to the DSS

(M=3.23; S D =0.64). However, neither the main effect of

failure nor the interaction effect were significant (ps > .225).

Finally, we analyzed the correlations between the differ-

ent MTQ scales and the single-trust item. Pearson’s product-

moment correlations were calculated separately for the human

support and the DSS to investigate whether different aspects of

trust are important for different support agents. Figure 1 shows

the respective bivariate correlations. For the human condition,

the single item correlated highly with performance (r=.76),

utility (r=.76), purpose (r=.53), and transparency (r=.59).

For the DSS, high correlations of the single item with perfor-

mance (r=.71) and utility (r=.56), as well as weak cor-

relations with purpose (r=.22) and transparency (r=.30)

were found. Whereas the correlation between the single item

and the performance scale did not significantly differ between

human and DSS (z=0.75,p=.453), the correlations of the

single item with the utility (z=2.41,p=.016), purpose (z=

2.49,p=.013), and transparency (z=2.42,p=.016) scale

were significantly higher for the human compared to the DSS.

We decided against separate correlations for the failure condi-

tions as no significant differences occurred in the ANOVAs.

Discussion

The second study aimed to investigate what subdimensions

of trust are affected by the reliability of the system and the type

of the support.

Surprisingly, in contrast to numerous earlier findings

(Dzindolet et al., 2002; Hoff& Bashir, 2015; Rieger et al.,

2022), there was no significant effect of failure experience for

any of the trust measures. Perhaps, the fact that the MTQ is

typically measured on a four-point Likert scale might have con-

tributed to a potential ceiling effect, hiding a true effect of fail-

ure experience. Moreover, Miller (2009) pointed out that “even

when a real effect is present, some replication failures must be

expected as one of the unfortunate consequences of variability”

(p. 617), and the present study might be just one of those unfor-

tunate consequences.

For the type of support, the trend of the single item trust

and the significant main effect of the MTQ performance scale

stand in contrast to the perfect automation schema (Dzindolet

et al., 2002). In line with earlier research (Appelganc et al.,

2022; Rieger et al., 2022), the results indicate that human ex-

pert support tends to be trusted more than DSS. The results

further broaden the research body by showing that higher trust

towards humans might be related to higher associated perfor-

mance. Given that the actual reliability of the two agents was

equal, this might look surprising. However, perceived and ac-

tual reliability are often not the same (Madhavan & Wiegmann,

2007; Rieger et al., 2022), and the type of agent whose relia-

bility is perceived can also make a difference here. One possi-

ble reason why medical experts are perceived as more capable

than an automated system might be the pre-existing knowledge

in regard to expectation and reputation (Hoff& Bashir, 2015).

The present results indicate that human expertise in prestigious

domains like the medical one (Hauser & Warren, 1997) seem to

exceed the prestige and expectations of expert systems. Since

the support was highly reliable for all conditions (95% reliable

on average), the expectations might be maintained throughout

the collaboration. Future research is necessary to explicitly test

this assumption by comparing the preexisting knowledge and

the reputation of human and automated expert support.

With respect to the question which subdimensions might be

most important for the global trust towards support agents, the

second study validated the finding that uni-dimensional trust is

most strongly associated with performance and utility of a sup-

Figure 1. Bivariate correlations between the single-item trust and the performance, utility, purpose, and transparency subscales.

port for both the human and automated support. Interestingly,

purpose and transparency are related to the single item trust

strongly for human-human interaction and weakly for human-

automation interaction. The direct comparison of the correla-

tions showed that all scales besides the performance scale are

less correlated with human-automation trust than human-human

trust. The result is in line with earlier research illustrating that

the reliability of an agent is the one of the most important de-

terminants of trust in human-automation interaction (Hoff&

Bashir, 2015). Whereas a performance-oriented trust concept

might be sufficient for classical DSS, the recent technological

trend towards self learning systems also challenges the trust di-

mensions which are more associated with human support (i.e.,

transparency, and purpose). Artificially intelligent technolo-

gies extend DSS with a higher autonomy and agency (Legaspi,

He, & Toyoizumi, 2019). Consequently, those technologies are

often linked to a possible lack of transparency. This is also

supported by public scandals that question not only the trans-

parency, but also the positive intention of these systems (O’Neil,

2016). Therefore, it is particularly relevant to also include

trust facets that go beyond performance and utility in research

which investigates application domains with real-life relevance.

Thus, future research should include a multi-dimensional view

at trust, particularly with novel application domains.

CONCLUSION

Overall, both studies illustrate the importance to approach

trust on a multi-dimensional level. Uni-dimensional trust mea-

sures most prominently account for the performance and util-

ity subdimensions of trust towards a support agent. However,

purpose and transparency gain increasing importance as (intelli-

gent) automated systems increasingly make decisions that affect

people’s lives (O’Neil, 2016). Thus, future research studying

trust in agents (e.g., automated systems, robots, other humans,

etc.) should not only consider performance and utility aspects,

but also take issues like purpose and transparency into account.

REFERENCES

Appelganc, K., Rieger, T., Roesler, E., & Manzey, D. (2022). How much

reliability is enough? a context-specific view on human interaction with

(artificial) agents from different perspectives. Journal of Cognitive Engi-

neering and Decision Making. doi: 10.1177/15553434221104615

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Be-

havioral Research,1(2), 245–276. doi: 10.1207/s15327906mbr0102_10

Cymek, D. H. (2018). Redundant automation monitoring: Four eyes don’t

see more than two, if everyone turns a blind eye. Human Factors,60(7),

902–921. doi: 10.1177/0018720818781192ï¿¡

de Leeuw, J. R. (2015). jsPsych: A JavaScript library for creating behavioral

experiments in a web browser. Behavior Research Methods,47(1), 1–12.

doi: 10.3758/s13428-014-0458-y

Dzindolet, M. T., Pierce, L. G., Beck, H. P., & Dawe, L. A. (2002). The

perceived utility of human and automated aids in a visual detection task.

Human Factors,44(1), 79–94. doi: 10.1518/0018720024494856

Hauser, R. M., & Warren, J. R. (1997). 4. socioeconomic indexes for occupa-

tions: A review, update, and critique. Sociological Methodology,27(1),

177–298. doi: 10.1111/1467-9531.271028

Hoff, K. A., & Bashir, M. (2015). Trust in automation. Human Factors,57(3),

407–434. doi: 10.1177/0018720814547570

Jian, J.-Y., Bisantz, A. M., & Drury, C. G. (2000). Foundations for an

empirically determined scale of trust in automated systems. Interna-

tional Journal of Cognitive Ergonomics,4(1), 53–71. doi: 10.1207/

s15327566ijce0401_04

Kaiser, H. F. (1960). The application of electronic computers to factor analy-

sis. Educational and Psychological Measurement,20(1), 141–151. doi:

10.1177/001316446002000116

Lange, K., Kühn, S., & Filevich, E. (2015). "Just Another Tool for Online

Studies” (JATOS): An easy solution for setup and management of web

servers supporting online studies. PLOS ONE,10(6), e0130834. doi:

10.1371/journal.pone.0130834

Lee, J. D., & See, K. A. (2004). Trust in automation: Designing for appro-

priate reliance. Human Factors,46(1), 50–80. doi: 10.1518/hfes.46.1.50

_30392

Legaspi, R., He, Z., & Toyoizumi, T. (2019). Synthetic agency: sense of agency

in artificial intelligence. Current Opinion in Behavioral Sciences,29, 84–

90. doi: 10.1016/j.cobeha.2019.04.004

Madhavan, P., & Wiegmann, D. A. (2007). Similarities and differences be-

tween human–human and human–automation trust: an integrative re-

view. Theoretical Issues in Ergonomics Science,8(4), 277–301. doi:

10.1080/14639220500337708

Miller, J. (2009). What is the probability of replicating a statistically sig-

nificant effect? Psychonomic Bulletin &Review,16(4), 617–640. doi:

10.3758/pbr.16.4.617

Nunnally, J. C. (1978). Psychometric theory. New York: McGraw-Hill.

O’Neil, C. (2016). Weapons of math destruction: How big data increases

inequality and threatens democracy. Crown.

Rieger, T., & Manzey, D. (2022). Human performance consequences of auto-

mated decision aids: The impact of time pressure. Human Factors,64(4),

617–634. doi: 10.1177/0018720820965019

Rieger, T., Roesler, E., & Manzey, D. (2022). Challenging presumed techno-

logical superiority when working with (artificial) colleagues. Scientific

Reports,12(1). doi: 10.1038/s41598-022-07808-x

Stevens, J. P. (2009). Applied multivariate statistics for the social sciences.

New York: Routledge.

Wickens, C. D. (2017, oct). Automation stages & levels, 20 years after. Jour-

nal of Cognitive Engineering and Decision Making,12(1), 35–41. doi:

10.1177/1555343417727438

Wiczorek, R. (2011). Entwicklung und Evaluation eines mehrdimensionalen

Fragebogens zur Messung von Vertrauen in technische Systeme. In Reflex-

ionen und Visionen der Mensch-Maschine-Interaktion–Aus der Vergan-

genheit lernen, Zukunft gestalten (Vol. 9, pp. 621–626).