Making the Case for Measuring Mental Effort∗

Stefan Zugal

University of Innsbruck

Technikerstraße 21a

6020 Innsbruck, Austria

stef[email protected]

Jakob Pinggera

University of Innsbruck

Technikerstraße 21a

6020 Innsbruck, Austria

jakob[email protected]

Hajo Reijers

Eindhoven University of

Technology

PO Box 513

NL-5600 MB Eindhoven, The

Netherlands

h.a.reijers@tue.nl

Manfred Reichert

Universität Ulm

Building O27,

James-Franck-Ring

89069 Ulm, Germany

manfred.reichert@uni-ulm.de

Barbara Weber

University of Innsbruck

Technikerstraße 21a

6020 Innsbruck, Austria

barbara.w[email protected]

ABSTRACT

To empirically investigate conceptual modeling languages,

subjects are typically confronted with experimental tasks,

such as the creation, modification or understanding of con-

ceptual models. Thereby, accuracy, i.e., the amount of cor-

rectly performed tasks divided by the number of total tasks,

is usually used to assess performance. Even though accuracy

is widely adopted, it is connected to two often overlooked

problems. First, accuracy is a rather insensitive measure.

Second, for tasks of low complexity, the measurement of ac-

curacy may be distorted by peculiarities of the human mind.

In order to tackle these problems, we propose to additionally

assess the subject’s mental effort, i.e., the mental resources

required to perform a task. In particular, we show how afore-

mentioned problems connected to accuracy can be resolved,

that mental effort is a valid measure of performance and how

mental effort can easily be assessed in empirical research.

Categories and Subject Descriptors

G.3 [Probability and Statistics]: Experimental design

General Terms

Experimentation, Human Factors, Measurement

1. INTRODUCTION

Over the years, numerous conceptual modeling languages and associated modeling tools have been proposed [15]. To allow for an objective assessment of whether new proposals actually bring improvements, the adoption of empirical software engineering has been advocated [4, 26]. Empirical research has indeed been shown to be suitable for putting discussions on an objective basis. Still, to contribute to truly objective results, a valid experimental setup as well as valid measurement methods are indispensable, since even slight changes in the design might lead to fundamentally different outcomes [12].

∗This research is supported by Austrian Science Fund (FWF): P23699-N23

In this work, we focus on empirical research that involves

human activities, such as the creation, modification and un-

derstanding of conceptual models. Therein, various meth-

ods have been applied to identify differences. In particu-

lar, researchers have used modeling tasks [20], modification

tasks [7] and sets of questions [5] to assess performance of

conceptual modeling languages. In order to measure the

outcome of tasks, typically accuracy and duration are ana-

lyzed (cf. [5, 7, 11, 20, 28]). Accuracy thereby refers to the

percentage of correctly performed tasks, whereas duration

refers to how long a subject required to perform a task. Even

though accuracy is well-established and widely adopted, it

is connected to two often overlooked problems. First, in or-

der to identify differences with respect to accuracy, subjects

need to commit errors. Hence, subtle differences that may

be relevant, but do not lead to errors, cannot be identified

(e.g., slight improvement of comprehensibility). Second, it

has been shown that for tasks that are easy, humans tend

to make mistakes that are actually not caused by the mod-

eling notation, but are rather the result of peculiarities of

the human mind [10]. In order to overcome these problems

and to improve validity of the collected data, we propose to

additionally assess the subject’s mental effort, i.e., the men-

tal resources required for performing the task. We would

like to stress that we do not propose replacing accuracy and

duration, but rather using mental effort as an additional

perspective that potentially provides further insights. The

contribution of this paper is twofold. First, we argue for

measuring mental effort on the basis of literature. Second,

we substantiate our claims with experiences drawn from our

own experiments.

The remainder of this paper is structured as follows. Sec-

tion 2 discusses problems related to accuracy and how to

address them using mental effort. Insights from experiments

making use of mental effort are presented in Section 3 and

afterwards discussed in Section 4. Section 5 presents related

work and Section 6 concludes with a summary.

2. MEASURING MENTAL EFFORT

In the following we start by discussing the previously de-

scribed problems in more detail. Then, we introduce mental

effort to address the aforementioned problems.

Problems Concerning Accuracy. In the introduction,

we briefly mentioned that accuracy is of rather low sensi-

tivity and potentially incorrect for tasks of low complexity.

Issues regarding the sensitivity become clear when looking

at the definition of accuracy. Usually, accuracy is defined to

be the ratio of correctly performed tasks (e.g., correct an-

swers) divided by the number of all performed tasks (e.g.,

total amount of questions). In other words, subjects have

to commit mistakes in order to obtain a lower accuracy. If

a task performed in the course of an experiment is not diffi-

cult enough to provoke errors, no differences can be observed

with respect to accuracy, also known as ceiling effect [25].

Likewise, if differences between experimental tasks are not

large enough, no differences can be observed.
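The ceiling effect described above can be illustrated with a minimal sketch. All data below are invented for illustration only and do not stem from the experiments reported in this paper; the function names are likewise hypothetical.

```python
def accuracy(outcomes):
    """Ratio of correctly performed tasks to all performed tasks."""
    if not outcomes:
        raise ValueError("at least one task outcome is required")
    return sum(1 for correct in outcomes if correct) / len(outcomes)


def mean(values):
    return sum(values) / len(values)


# Hypothetical outcomes (True = task solved correctly) for two notations.
# The tasks are too easy to provoke errors, so every task is solved.
notation_a = [True] * 10
notation_b = [True] * 10

# Hypothetical self-rated mental effort (1..7) for the same tasks.
effort_a = [2, 3, 2, 2, 3, 2, 3, 2, 2, 3]
effort_b = [4, 5, 4, 4, 5, 4, 5, 4, 4, 5]

# Accuracy is identical for both notations (ceiling effect) ...
print(accuracy(notation_a), accuracy(notation_b))
# ... while the mean mental effort still differs.
print(mean(effort_a), mean(effort_b))
```

In this made-up setting, accuracy cannot separate the two notations because no errors occur, whereas the self-rated effort still can.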

In addition, for tasks of low complexity a further problem

arises—it has been recognized that for such tasks subjects

tend to commit more careless mistakes. In [10], this phe-

nomenon is explained by Dual-Process Theory [22]. Roughly

speaking, this theory postulates that the human brain con-

sists of two systems, S1 and S2. S1 processes are character-

ized as being fast, unconscious and effortless. S2 processes,

in contrast, are slow, conscious and effortful. In addition,

S2 serves as monitor of the fast automatic responses of S1.

Apparently, in certain situations, S1 comes up with a fast

response and S2 does not intervene—leading to answers that

are fast but incorrect. Hence, for tasks of low complexity,

it may be the case that accuracy does not reflect the task’s

difficulty, but rather this peculiarity of the human mind.

Up to now we have discussed problems associated with mea-

suring accuracy, i.e., low sensitivity and potential problems

when assessing accuracy for tasks of low complexity. In the

following, we introduce the concept of mental effort and il-

lustrate how it can be used to overcome these problems.

Measuring Mental Effort. In general, the human brain

can be seen as a “truly generic problem solver” [24]. Within

the human brain, cognitive psychology differentiates between

working memory that contains information currently being

processed, as well as long-term memory in which informa-

tion can be stored for a long period of time [17]. Most se-

vere, and thus of high interest, are limitations of the working

memory. As reported in [2], working memory cannot hold

more than about seven items at the same time. The amount

of working memory currently used is thereby referred to as

mental effort. The importance of the working memory has

been recognized and led to the development and establish-

ment of Cognitive Load Theory, meanwhile widespread and

empirically validated in numerous studies [3].

To measure mental effort, various techniques, such as rat-

ing scales, pupillary responses or heart-rate variability are

available [17]. Rating scales in particular, i.e., self-rated mental

effort, have been shown to measure mental effort reliably

and are thus widely adopted [9, 17]. Furthermore, this kind

of measurement can easily be applied, e.g., by using 7-point

rating scales. For instance, in [13] mental effort was assessed

using a 7-point rating scale, ranging from (1) very easy to

(7) very hard for the question “How difficult was it for you to

learn about lightning from the presentation you just saw?”.

In the context of conceptual models, mental effort is of in-

terest as it appears to be connected to performance, e.g.,

properly answering questions about a model. In general, it

is known that errors are more likely to occur when the work-

ing memory’s limits are exceeded [23]. Similarly, in [14] it

is argued that higher mental effort is in general associated

with lower understanding of models.

Based on these insights, we argue that mental effort is con-

nected to performance, i.e., accuracy and duration. In con-

trast to accuracy, however, subtle differences can presum-

ably be observed. In particular, for cases where mental ef-

fort is well within the working memory’s limits and thus does

not provoke a significant amount of errors, still a different

mental effort could be observed. In addition, for tasks of low

complexity, careless mistakes may distort the measurement

of accuracy. For mental effort, however, it can be expected

that careless mistakes will not distort the measurement.

3. MENTAL EFFORT IN EMPIRICAL RESEARCH

So far, our arguments for measuring mental effort have been based on insights from the literature. In the following, we complement the discussion with findings from our own experiments. For each experiment, we briefly sketch the setup at a very abstract level and point out relevant findings related to the measurement of mental effort.

3.1 Experiment 1: Test Cases in Declarative

Business Process Modeling

Background. In this experiment, we investigated whether

the presence of test cases supports the maintenance of declar-

ative business process models, as argued in [32]. In the con-

text of this work, the relevant information is that subjects

were asked to adapt conceptual models with different types

of operational support.

Setup. We employed a randomized, balanced single-factor

experiment with repeated measurements (cf. [27]). The fac-

tor was adoption of test cases, having factor levels test cases

as well as absence of test cases. The experiment’s objects

were change assignments for two declarative process models.

Measured response variables relevant for this work were

mental effort and accuracy, i.e., the number of errors

committed (details are provided in [31]). To assess mental effort,

we employed a 7-point rating scale, ranging from Extremely

low mental effort (1) to Extremely high mental effort (7) for

the question “How would you assess the mental effort for

completing the change tasks (with tests)?”. For assessing the

impact of factor level absence of test cases, the phrase “with

tests” was replaced by “without tests”.
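As a minimal sketch of how such a rating-scale item could be represented in an experimental tool, consider the following. The names `effort_question` and `validate_rating` are hypothetical and not part of CEP or TDMS; only the question wording, the two factor levels and the scale anchors are taken from the setup described above.

```python
SCALE_MIN, SCALE_MAX = 1, 7
ANCHORS = {
    SCALE_MIN: "Extremely low mental effort",
    SCALE_MAX: "Extremely high mental effort",
}

# Maps each factor level to the phrase inserted into the question.
PHRASES = {
    "test cases": "with tests",
    "absence of test cases": "without tests",
}


def effort_question(factor_level):
    """Build the questionnaire text for one factor level."""
    phrase = PHRASES[factor_level]
    return (
        "How would you assess the mental effort for completing "
        f"the change tasks ({phrase})?"
    )


def validate_rating(value):
    """Accept only integer ratings on the 7-point scale."""
    if not isinstance(value, int) or not SCALE_MIN <= value <= SCALE_MAX:
        raise ValueError(f"rating must be an integer from {SCALE_MIN} to {SCALE_MAX}")
    return value


print(effort_question("absence of test cases"))
print(ANCHORS[validate_rating(1)], "...", ANCHORS[validate_rating(7)])
```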

Execution of Experiment. The experiment was con-

ducted in December 2010 in a course on business process

management at the University of Innsbruck; in total 12

students participated. Operational support for collecting

demographic data was provided by Cheetah Experimental

Platform (CEP) [21], modeling assignments were conducted

using Test Driven Modeling Suite (TDMS) [30].

Findings Relevant to Mental Effort. The collected data indicated that subjects who had test cases at hand committed fewer errors; however, the difference was not significant (Wilcoxon Signed-Rank Test, Z = -0.857, p = 0.391). The data also indicated a lower mental effort for subjects who had test cases at hand, and in this case the difference was significant (Wilcoxon Signed-Rank Test, Z = -2.565, p = 0.010). A detailed analysis showed that the provided models were too simple to provoke the desired effects, i.e., differences with respect to the number of errors committed. In fact, 96% of the tasks were performed correctly, leaving almost no room for improvement. Still, the models were difficult enough to yield significant results with respect to mental effort. Knowing that errors are more likely to occur when the working memory’s limits are exceeded [23], these results seem plausible. Even though the tasks were not difficult enough to exceed the limits of the subjects’ working memory and thereby provoke errors, different levels of mental effort were required. Put differently, in this case measuring mental effort appears to have been more sensitive than measuring accuracy.
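For readers who wish to run a comparable paired analysis, the following is a minimal pure-Python sketch of the Wilcoxon signed-rank test using the normal approximation with tie correction and without continuity correction. It is not the procedure or tool used for the reported analysis, and the ratings are invented. The last lines only check that the reported Z values are consistent with the reported two-sided p values under a standard normal distribution.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus, z) for paired samples x and y.

    Zero differences are dropped, tied absolute differences receive
    average ranks, and z uses the normal approximation with tie
    correction (no continuity correction).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (min(w_plus, w_minus) - mean_w) / math.sqrt(var_w)
    return w_plus, w_minus, z


def two_sided_p(z):
    """Two-sided p value of z under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical 7-point effort ratings of 12 subjects (not the study data).
with_tests = [3, 2, 4, 3, 2, 3, 4, 2, 3, 3, 2, 4]
without_tests = [4, 4, 4, 5, 3, 5, 5, 3, 3, 4, 4, 5]

w_plus, w_minus, z = wilcoxon_signed_rank(with_tests, without_tests)
print(w_plus, w_minus, round(z, 3), round(two_sided_p(z), 3))

# Consistency check of the reported statistics of Experiment 1.
print(round(two_sided_p(-0.857), 3), round(two_sided_p(-2.565), 3))
```

Statistical packages may apply a continuity correction or an exact test for small samples, so their output can deviate slightly from this sketch.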

3.2 Experiment 2: Test Cases in Declarative

Business Process Modeling (Replication)

Background. In this experiment, we further explored this

research direction, i.e., the background is identical to Ex-

periment 1.

Setup. Since this experiment is a replication of Experiment

1, the setup is identical, except for more complex models (see footnote 1).

Execution of Experiment. The experiment was con-

ducted in December 2011 in a course on business process

management at the University of Ulm; in total 31 students

participated. Again, CEP [21] and TDMS [30] were used as

operational support.

Findings Relevant to Mental Effort. Data analysis showed that subjects who had test cases at hand committed significantly fewer errors (Wilcoxon Signed-Rank Test, Z = -3.165, p = 0.002) and required significantly less mental effort (Wilcoxon Signed-Rank Test, Z = -3.867, p = 0.000). Interestingly, the share of correctly performed tasks dropped from 96% in Experiment 1 to 80% in this experiment. Hence, the two key observations are as follows. First, a certain level of complexity was apparently required to provoke enough errors to make differences with respect to accuracy significant. Second, mental effort consistently showed significant differences in both experiments. In other words, as discussed in Section 2, mental effort and accuracy seem connected; however, accuracy requires a certain level of complexity in order to show significant differences.

Footnote 1: Material can be downloaded from: http://bpm.q-e.at/experiment/TDMReplication

3.3 Experiment 3: Hierarchy in Business Pro-

cess Models

Background. In this experiment we investigated the im-

pact of hierarchy on the understandability of BPMN models.

In the context of this work, the essential part is that we elab-

orated pairs of information-equivalent models, one of them

making use of hierarchy. Then, we asked subjects to an-

swer questions about those models, measuring accuracy of

answers, duration and mental effort.

Setup. We employed a randomized, balanced single-factor experiment with repeated measurements (cf. [27]). The factor was hierarchy, with factor levels flat and hierarchical. The experiment’s objects were two BPMN-based business processes. Measured response variables relevant for this work were accuracy of answers, duration and mental effort (see footnote 2). In contrast to Experiment 1 and Experiment 2, where mental effort was assessed once for each subject, in this experiment we measured the expended mental effort after each question. To assess mental effort, we used a 7-point rating scale ranging from Extremely low mental effort (1) to Extremely high mental effort (7). The question for measuring mental effort was: “Please indicate your mental effort for answering this question (question X)”.

Execution of Experiment. The experiment was con-

ducted in a course on business process management at the

University of Eindhoven in January 2012; in total 114 stu-

dents participated. Again, CEP [21] was used for presenting

the models to subjects and collecting answers.

Findings Relevant to Mental Effort. The assessment of accuracy, duration and mental effort per question, as opposed to Experiment 1 and Experiment 2, where mental effort was assessed once for the entire experiment, allowed for a much more fine-grained analysis. In the course of this experiment, 2 BPMN-based business process models were presented to each subject. For each model, 8 questions were asked, leading to a total of 16 questions per subject. Since we expected mental effort, accuracy and duration to differ depending on whether a question was posed for a hierarchical model or a flat model, responses were analyzed separately for the two variants, leading to a total of 32 data points (16 questions × 2 model variants). In the following, we will discuss this data from two perspectives. First, we will present a case in which accuracy did not reflect the difficulty of a task, but rather peculiarities of the human mind. Second, we will take a closer look into the relation between mental effort, accuracy and duration.
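The aggregation into 32 data points can be sketched as follows. The record layout and the function name `aggregate` are assumptions made for illustration, and the rows are synthetic; in particular, they do not mirror how subjects were actually assigned to model variants.

```python
from collections import defaultdict
from statistics import mean


def aggregate(responses):
    """Average effort, accuracy and duration per (question, model variant)."""
    groups = defaultdict(list)
    for row in responses:
        groups[(row["question"], row["model"])].append(row)
    return {
        key: {
            "mental_effort": mean(r["effort"] for r in rows),
            "accuracy": mean(1.0 if r["correct"] else 0.0 for r in rows),
            "duration": mean(r["seconds"] for r in rows),
        }
        for key, rows in groups.items()
    }


# Synthetic rows: 16 questions, 2 model variants, 3 answers per combination.
responses = [
    {
        "question": q,
        "model": m,
        "effort": 1 + (q + s) % 7,      # rating on the 7-point scale
        "correct": (q + s) % 4 != 0,    # whether the answer was correct
        "seconds": 30 + 2 * q + s,      # time needed to answer
    }
    for q in range(1, 17)
    for m in ("flat", "hierarchical")
    for s in range(3)
]

points = aggregate(responses)
print(len(points))  # 16 questions x 2 variants
```

Each resulting entry corresponds to one point in a scatter plot of average mental effort against average accuracy or duration.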

Accuracy for Tasks of Low Complexity. In Section 2 we argued that the measurement of accuracy might lead to unexpected results; in the following, we provide further empirical evidence. In particular, the third question in this experiment yielded an average mental effort of 3.14, an accuracy of 0.79 and a duration of 40 seconds when asked for the hierarchical model. When the same question was posed for the information-equivalent model without hierarchy, mental effort increased to 3.75 and duration increased to 51 seconds, but accuracy also increased to 0.91. Statistically speaking, a Mann-Whitney U test showed that mental effort increased significantly (z = -3.271, p = 0.001), as did duration (z = -4.468, p = 0.000). Seemingly inconsistently, the average accuracy also increased, even though not significantly (z = -1.717, p = 0.086), whereas according to previous findings, higher mental effort should have been associated with lower accuracy.

Footnote 2: Material can be downloaded from: http://bpm.q-e.at/experiment/Hierarchy

In order to resolve this contradiction, we investigated the aforementioned question in detail. The analysis showed that it should have been harder to answer the question for the non-hierarchical model, i.e., lower accuracy could be expected. In particular, for answering the question in the hierarchical model, 13 BPMN nodes had to be taken into account, whereas for the non-hierarchical model 92 nodes had to be considered (see footnote 3). Knowing that this number of nodes required the subjects to scroll considerably to see all relevant model elements, it seems surprising that a higher accuracy was actually observed. However, in the light of Dual-Process Theory [22], these results can be explained as follows. For the hierarchical model, the question could be answered easily, as indicated by the average mental effort of 3.14 (approximately Low mental effort). Hence, it seems plausible that system S1 provided a quick but incorrect answer that was not overridden by S2. In the non-hierarchical model, subjects were forced to scroll to determine the answer, which involves conscious activities and hence activates S2. The activation of S2, in turn, ensured that the question was answered in a controlled way instead of by a quick response of S1. In summary, it seems that relying on accuracy would have been misleading in this case, while mental effort and duration provided more reliable results.

Validity of Mental Effort. So far we have provided empirical evidence that mental effort is more sensitive than accuracy and can be measured properly for tasks of low complexity. In the following, we provide empirical evidence that mental effort is tightly connected to accuracy and duration, i.e., that it is a valid measure of performance. To visualize the interplay between mental effort and accuracy, we used a scatter plot (cf. Figure 1). Therein, the x-axis represents the average mental effort required for answering a question. Values from 1 to 7 represent the rating scale used for assessing mental effort, ranging from Extremely low mental effort (1) to Extremely high mental effort (7). Likewise, the y-axis reflects a question’s average accuracy, i.e., the ratio of correct answers to total answers given for a question. As discussed in Section 2, higher mental effort should be associated with lower performance. Hence, in this particular case, higher mental effort should be associated with lower accuracy. In fact, in Figure 1, a tendency toward lower accuracy with increased mental effort can be observed. In particular, the dashed line represents the regression line obtained through simple linear regression (R² = 0.41, F(1,30) = 21.16, p = 0.000). Please note that this regression does not contradict the case demonstrated in the example above, in which mental effort and accuracy did not move in opposite directions. Rather, the fit is not perfect, hence leaving room for such idiosyncrasies.

Footnote 3: The models and question can be accessed through: http://bpm.q-e.at/misc/HierarchyQuestion3

[Figure 1: Mental effort versus accuracy. Scatter plot; x-axis: Mental Effort (1.00 to 7.00), y-axis: Accuracy (0.00 to 1.00); R² Linear = 0.41]

Akin to Figure 1, Figure 2 illustrates the interplay between mental effort and duration. Likewise, the x-axis represents mental effort. The y-axis shows the average number of seconds required for answering a question. The dashed line is the result of simple linear regression (R² = 0.55, F(1,30) = 36.70, p = 0.000). In this case, higher mental effort is associated with longer duration. In the light of the background presented in Section 2, this result also seems plausible: the more difficult a question is to answer, the longer the answering process will take and the higher the mental effort will be.

[Figure 2: Mental effort versus duration (sec). Scatter plot; x-axis: Mental Effort (1.00 to 7.00), y-axis: Duration (sec) (0.00 to 100.00); R² Linear = 0.55]

To substantiate these observations, we computed the Pearson

correlation coefficient for the pairwise correlations between mental effort,

accuracy and duration. As shown in Table 1, the findings

coincide with the observations made in Figures 1 and 2.

In particular, the results confirm that mental effort and ac-

curacy are correlated negatively (r(30) = -0.643, p = 0.000);

mental effort and duration are correlated positively (r(30)

= 0.742, p = 0.000). Finally, accuracy and duration are

correlated negatively (r(30) = -0.459, p = 0.008).

                               Mental Eff.   Duration
Accuracy      Pearson Corr.    -0.643**      -0.459**
              Sig. (2-tailed)   0.000         0.008
              N                 32            32
Mental Eff.   Pearson Corr.                   0.742**
              Sig. (2-tailed)                 0.000
              N                               32

**. Correlation is significant at the 0.01 level (2-tailed).

Table 1: Correlations
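The correlations in Table 1 and the regression statistics reported for Figures 1 and 2 are linked by standard identities for simple linear regression: R² equals the squared Pearson coefficient, and F(1, n - 2) = r²(n - 2) / (1 - r²). The following sketch recomputes both from the rounded coefficients in Table 1; the function name `f_statistic` is ours, and small deviations from the reported F values are to be expected because r is only given to three decimals.

```python
def f_statistic(r, n):
    """F(1, n - 2) of a simple linear regression, derived from Pearson's r."""
    r2 = r * r
    return r2 * (n - 2) / (1 - r2)


n = 32  # data points, see Table 1
reported = [
    # (relation, Pearson r from Table 1, reported R², reported F)
    ("mental effort vs. accuracy", -0.643, 0.41, 21.16),
    ("mental effort vs. duration", 0.742, 0.55, 36.70),
]

for label, r, r2_reported, f_reported in reported:
    r2 = r * r
    f = f_statistic(r, n)
    print(f"{label}: R2 = {r2:.2f} (reported {r2_reported}), "
          f"F(1,{n - 2}) = {f:.2f} (reported {f_reported})")
```

Up to rounding, the recomputed values agree with the reported ones, which indicates that Table 1 and the two regressions describe the same 32 data points.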

4. DISCUSSION

Up to now we have argued that accuracy is presumably rather insensitive and may be distorted for tasks of low complexity. In order to tackle these problems, the measurement of mental effort was proposed. In the following, key insights as well as limitations of this approach are discussed.

Regarding sensitivity, Experiment 1 and Experiment 2 provided empirical evidence that mental effort is more sensitive than accuracy. In Experiment 1, tasks of rather low complexity were provided to the subjects. Even though differences with respect to accuracy and mental effort could be observed, only mental effort differed significantly [31]. In Experiment 2 the task complexity was increased, and consequently more errors were committed. Knowing that errors are more likely to be committed when the working memory is overloaded [23], these observations seem plausible. In Experiment 1, different levels of mental effort could be observed. However, the working memory was not overloaded, resulting in a low error rate and hence only marginal fluctuations in accuracy. In Experiment 2, increased complexity led to an overload of working memory; accordingly, the error rate increased and accuracy dropped. In other words, it seems likely that differences with respect to mental effort can be observed before differences with respect to accuracy occur, i.e., mental effort appears to be more sensitive.

Regarding tasks of low complexity, Experiment 3 provided

further insights. In particular, we could observe an increase

of mental effort and duration that was connected to in-

creased accuracy—actually a decrease of accuracy could be

expected, as far more model elements had to be taken into

account. As indicated in [10], it seems as if this result can be

traced back to peculiarities of the human mind, which tends

to commit more careless mistakes for tasks of low complex-

ity. Hence, in such a case it seems as if the measurement

of mental effort provides a more accurate picture. Please

note that this finding does not contradict the correlation

between mental effort and accuracy, as found in Experiment

3. Rather, the correlation is valid in general, while this pe-

culiar interplay could be found for one specific question.

Naturally, several limitations apply to this work. First, as shown in Figure 2, a linear relationship between mental effort and duration could be found. Even though this seems plausible for short-lasting tasks (the maximum average duration was about 90 seconds), it is questionable to what extent this holds for longer tasks. For instance, a long-lasting repetitive task, such as finding all elements named “A” within a model, will most likely lead to low mental effort but a long duration. Second, mental effort is a subjective measure. Even though it has been shown that people are able to give a numerical indication of their mental burden [16], it is questionable whether the mental effort of different subjects can be compared directly. Hence, it seems advisable to ensure a reasonable sample size when conducting between-subject experiments or to focus on within-subject experiments. Third, we reported consistent results across three experiments. Still, our findings may be peculiarities of these experiments. To improve generalizability, more experiments adopting different modeling languages are required.

5. RELATED WORK

In the domain of cognitive psychology, the work of Paas et

al., in which mental effort is discussed broadly, is directly

connected [17]. In contrast to this work, however, mental

effort is not linked to conceptual modeling. In the domain

of conceptual modeling, related work can be found where

model understandability is of concern. For instance, Aranda

et al. provide guidelines for assessing model understandabil-

ity [1]. Besides accuracy and duration, perceived difficulty is

proposed to be measured. However, in contrast to this work,

no indication of how perceived difficulty can be measured

is given. Likewise, [11] assesses in a survey how understandability

of models is measured. The most prominent

measure is accuracy, followed by duration and perceived ease

of understanding. However, these studies rather rely on the

ease-of-use scale from Technology Acceptance Model [6] than

on rating scales for measuring mental effort. Recently, men-

tal effort has also been used as a tool for discussing model

understandability. For instance, in [29] the role of mental

effort in Business Process Modeling is discussed. Likewise,

in [28, 33] mental effort is employed for assessing the impact

of hierarchy on model understandability. In contrast to this

work, however, mental effort is rather used as a tool for dis-

cussion; the measurement of mental effort is not of concern.

Clearly, measuring mental effort is only meaningful if

the validity of the experimental design can be ensured. In

this respect, [8, 18] provide guidelines for the proper

operationalization of measurements.

6. SUMMARY AND OUTLOOK

In this work, we showed how measuring mental effort makes it possible to compensate for shortcomings of measuring accuracy. In particular, we argued that mental effort is more sensitive than accuracy and that its measurement is not distorted for tasks of low complexity. Hence, it allows subtle differences between conceptual models or conceptual modeling languages to be identified. Likewise, when data regarding accuracy is affected by ceiling effects, mental effort can still provide valuable insights. In addition, we showed that the measurement of mental effort can be implemented easily through the adoption of rating scales. We recommend measuring mental effort after each task in order to obtain a fine-grained measurement. With this contribution we hope to help pave the way for even more comprehensive empirical investigations.

Future work will involve the collection of further data for a

deeper investigation of the interplay between mental effort,

accuracy and duration. In particular, we plan to adopt eye

movement analysis [19] to continuously monitor mental effort

while subjects perform a task.

7. REFERENCES

[1] J. Aranda, N. Ernst, J. Horkoff, and S. Easterbrook.

A Framework for Empirical Evaluation of Model

Comprehensibility. In Proc. MISE ’07, pages 7–12,

2007.

[2] A. Baddeley. Working Memory: Theories, Models, and

Controversies. Annu. Rev. Psychol., 63(1):1–29, 2012.

[3] M. Bannert. Managing cognitive load—recent trends

in cognitive load theory. Learning and Instruction,

12(1):139–146, 2002.

[4] J. C. Carver, E. Syriani, and J. Gray. Assessing the

frequency of empirical evaluation in software modeling

research. In Proc. EESSMod ’11, pages 28–37, 2011.

[5] J. A. Cruz-Lemus, M. Genero, M. E. Manso,

S. Morasca, and M. Piattini. Assessing the

understandability of UML statechart diagrams with

composite states—A family of empirical studies.

Empirical Software Engineering, 14(6):685–719, 2009.

[6] F. Davies. A Technology Acceptance Model for

Empirically Testing New End-User Information

Systems: Theory and Results. PhD thesis, Sloan

School of Management, 1986.

[7] A. M. Fernández-Sáez, M. Genero, and M. R. V.

Chaudron. Does the level of detail of UML models

affect the maintainability of source code? In Proc.

EESSMod ’11, pages 3–17, 2011.

[8] A. Gemino and Y. Wand. A framework for empirical

evaluation of conceptual modeling techniques. Requir.

Eng., 9(4):248–260, 2004.

[9] D. Gopher and R. Brown. On the psychophysics of

workload: Why bother with subjective measure?

Human Factors: The Journal of the Human Factors

and Ergonomics Society, 26(5):519–532, 1984.

[10] I. Hadar and U. Leron. How intuitive is

object-oriented design? Communications of the ACM,

51(5):41–46, 2008.

[11] C. Houy, P. Fettke, and P. Loos. Understanding

understandability of conceptual models: What are we

actually talking about? In Proc. ER ’12, pages 64–77,

2012.

[12] R. Laue and A. Gadatsch. Measuring the

Understandability of Business Process Models - Are

We Asking the Right Questions? In Proc. BPD ’10,

pages 37–48, 2011.

[13] R. Mayer and P. Chandler. When learning is just a

click away: Does simple user interaction foster deeper

understanding of multimedia messages. Journal of

Educational Psychology, 93(2):390–397, 2001.

[14] D. L. Moody. Cognitive Load Effects on End User

Understanding of Conceptual Models: An

Experimental Analysis. In Proc. ADBIS ’04, pages

129–143, 2004.

[15] J. Mylopoulos. Information modeling in the time of

the revolution. Information Systems, 23(3/4):127–155,

1998.

[16] F. Paas. Training strategies for attaining transfer of

problem-solving skill in statistics: A cognitive-load

approach. Journal of Educational Psychology,

84(4):429–434, 1992.

[17] F. Paas, A. Renkl, and J. Sweller. Cognitive Load

Theory and Instructional Design: Recent

Developments. Educational Psychologist, 38(1):1–4,

2003.

[18] J. Parsons and L. Cole. What do the pictures mean?

Guidelines for experimental evaluation of

representation fidelity in diagrammatical conceptual

modeling techniques. DKE, 55(3):327–342, 2005.

[19] J. Pinggera, M. Furtner, M. Martini, P. Sachse,

K. Reiter, S. Zugal, and B. Weber. Investigating the

Process of Process Modeling with Eye Movement

Analysis. In Proc. ER-BPM ’12, to appear.

[20] J. Pinggera, P. Soffer, S. Zugal, B. Weber,

M. Weidlich, D. Fahland, H. Reijers, and J. Mendling.

Modeling Styles in Business Process Modeling. In

Proc. BPMDS ’12, pages 151–166, 2012.

[21] J. Pinggera, S. Zugal, and B. Weber. Investigating the

process of process modeling with cheetah experimental

platform. In Proc. ER-POIS ’10, pages 13–18, 2010.

[22] K. E. Stanovich and R. West. Individual differences in

reasoning: implications for the rationality debate?

Behavioural and Brain Sciences, 23(5):665–726, 2000.

[23] J. Sweller. Cognitive load during problem solving:

Effects on learning. Cognitive Science, 12(2):257–285,

1988.

[24] W. J. Tracz. Computer programming and the human

thought process. Software: Practice and Experience,

9(2):127–137, 1979.

[25] W. P. Vogt. Dictionary of Statistics & Methodology: A

Nontechnical Guide for the Social Sciences (Fourth

Edition). SAGE Publications, 2011.

[26] C. Wohlin, M. Höst, and K. Henningsson. Empirical

research methods in software engineering. In Empirical

Methods and Studies in Software Engineering, volume

2765 of LNCS, pages 7–23. Springer, 2003.

[27] C. Wohlin, P. Runeson, M. Höst, M. Ohlsson,

B. Regnell, and A. Wesslén. Experimentation in

Software Engineering: an Introduction. Kluwer, 2000.

[28] S. Zugal, J. Pinggera, J. Mendling, H. Reijers, and

B. Weber. Assessing the Impact of Hierarchy on

Model Understandability—A Cognitive Perspective. In

Proc. EESSMod ’11, pages 123–133, 2011.

[29] S. Zugal, J. Pinggera, and B. Weber. Assessing

process models with cognitive psychology. In Proc.

EMISA ’11, pages 177–182, 2011.

[30] S. Zugal, J. Pinggera, and B. Weber. Creating

Declarative Process Models Using Test Driven

Modeling Suite. In Proc. CAiSE Forum ’11, pages

16–32, 2011.

[31] S. Zugal, J. Pinggera, and B. Weber. The impact of

testcases on the maintainability of declarative process

models. In Proc. BPMDS ’11, pages 163–177, 2011.

[32] S. Zugal, J. Pinggera, and B. Weber. Toward

Enhanced Life-Cycle Support for Declarative

Processes. Journal of Software: Evolution and Process,

24(3):285–302, 2012.

[33] S. Zugal, P. Soffer, J. Pinggera, and B. Weber.

Expressiveness and Understandability Considerations

of Hierarchy in Declarative Business Process Models.

In Proc. BPMDS ’12, pages 167–181, 2012.