Learnability of a Configurator Empowering End Users to Create Mobile Data Collection Instruments: Usability Study [original]

Original Paper

Learnability of a Configurator Empowering End Users to Create

Mobile Data Collection Instruments: Usability Study

Johannes Schobel1, MSc; Rüdiger Pryss1, PhD; Thomas Probst2, PhD; Winfried Schlee3, PhD; Marc Schickler1, Dipl

Inf; Manfred Reichert1, PhD

1Institute of Databases and Information Systems, Ulm University, Ulm, Germany

2Department for Psychotherapy and Biopsychosocial Health, Danube University Krems, Krems, Austria

3Department of Psychiatry and Psychotherapy, University of Regensburg, Regensburg, Germany

Corresponding Author:

Johannes Schobel, MSc

Institute of Databases and Information Systems

Ulm University

James-Franck-Ring

Ulm, 89081

Germany

Phone: 49 731 50 24229

Fax: 49 731 50 24134

Email: [email protected]

Abstract

Background: Many research domains still heavily rely on paper-based data collection procedures, despite numerous associated

drawbacks. The QuestionSys framework is intended to empower researchers as well as clinicians without programming skills to

develop their own smart mobile apps in order to collect data for their specific scenarios.

Objective: In order to validate the feasibility of this model-driven, end-user programming approach, we conducted a study with

80 participants.

Methods: Across 2 sessions (7 days between Session 1 and Session 2), participants had to model 10 data collection instruments

(5 at each session) with the developed configurator component of the framework. In this context, performance measures like the

time and operations needed as well as the resulting errors were evaluated. Participants were separated into two groups (ie, novices

vs experts) based on prior knowledge in process modeling, which is one fundamental pillar of the QuestionSys framework.

Results: Statistical analysis (ttests) revealed that novices showed significant learning effects for errors (P=.04), operations

(P<.001), and time (P<.001) from the first to the last use of the configurator. Experts showed significant learning effects for

operations (P=.001) and time (P<.001), but not for errors as the experts’ errors were already very low at the first modeling of the

data collection instrument. Moreover, regarding the time and operations needed, novices got significantly better at the third

modeling task than experts were at the first one (ttests; P<.001 for time and P=.002 for operations). Regarding errors, novices

did not get significantly better at working with any of the 10 data collection instruments than experts were at the first modeling

task, but novices’ error rates for all 5 data collection instruments at Session 2 were not significantly different anymore from those

of experts at the first modeling task. After 7 days of not using the configurator (from Session 1 to Session 2), the experts’ learning

effect at the end of Session 1 remained stable at the beginning of Session 2, but the novices’ learning effect at the end of Session

1 showed a significant decay at the beginning of Session 2 regarding time and operations (ttests; P<.001 for time and P=.03 for

operations).

Conclusions: In conclusion, novices were able to use the configurator properly and showed fast (but unstable) learning effects,

resulting in their performances becoming as good as those of experts (which were already good) after having little experience

with the configurator. Following this, researchers and clinicians can use the QuestionSys configurator to develop data collection

apps for smart mobile devices on their own.

(JMIR Mhealth Uhealth 2018;6(6):e148) doi:10.2196/mhealth.9826

KEYWORDS

mHealth; data collection; mobile apps

JMIR Mhealth Uhealth 2018 | vol. 6 | iss. 6 | e148 | p.1http://mhealth.jmir.org/2018/6/e148/

(page number not for citation purposes)

Schobel et alJMIR MHEALTH AND UHEALTH

XSL

•

RenderX

Introduction

In psychology and social sciences, self-report questionnaires

are commonly used to collect data in various situations [1].

These data are predominantly collected using paper-based

questionnaires, which are costly regarding the subsequent

processing and analysis of the collected data. Furthermore, the

latter has to be transferred to digital spreadsheet documents,

which is a time-consuming and error-prone task, especially in

the context of large-scale trials or studies. According to one

estimate, approximately 50%-60% of the costs related to the

collection, transfer, and processing of the data could be saved

using digital instruments instead of paper-based ones [2].

Additionally, electronic questionnaires do not differ from the

paper-based versions in psychometric properties [3]. Moreover,

they contribute to more complete datasets compared with the

ones collected using pencil and paper [4], resulting in a better

data quality [5]. Finally, the digitally collected data may be

enriched with contextual information [6] (eg, time and location)

or sensor data [7] (eg, pulse measurement during an interview).

In general, digital instruments are in increasing demand to

support clinical trials or other psychological studies [8].

Over the last decade, several Web-based questionnaire apps

(eg, LimeSurvey or SurveyMonkey) emerged, enabling end users

to create online questionnaires themselves. Although these apps

are useful, they are not suitable in certain application scenarios.

Among others, Web questionnaires require permanent internet

access and are usually unable to capture data from external

sensors (eg, camera, Global Positioning System, or vital

parameter sensors). Smart mobile devices (eg, mobile phones

or tablets), in turn, could act as an enabler for scenarios in which

Web questionnaires are not sufficient. Mobile devices have

already proven their applicability in the context of various

business scenarios [9], ranging from simple task management

apps to sophisticated business analytics platforms to even apps

supporting ward rounds in hospitals [10].

Contrary to these findings, however, mobile data collection apps

are still rarely used in large-scale scenarios, like clinical or

psychological trials. The following three reasons are of

paramount importance in this context:

1. Researchers are unaware of the capabilities of and

opportunities offered by smart mobile devices in their

respective domain. This can also be traced back to the high

costs of such devices, especially in the context of large-scale

studies requiring multiple devices.

2. Already existing data collection apps do not adequately

support researchers. There might be legal aspects that need

to be considered (eg, “Where shall the data be stored?”,

“Who shall be allowed to access the data?”); the mobile

apps might require permanent internet access, or their

advanced features (eg, use of sensors during the data

collection procedure) need to be supported.

3. Implementing sophisticated mobile data collection apps

usually requires considerable communication efforts

between researchers and mobile app developers. This

communication is further aggravated due to the fact that

both groups use different languages (ie, terminology, or

[graphical] notations) to express themselves.

It is noteworthy that there are several mobile apps proving the

applicability of smart mobile devices in the context of data

collection scenarios, such as Manage My Pain [11] or Track

Your Tinnitus [12]. Although the participants involved in

respective scenarios gave positive feedback, several

shortcomings could still be observed. The latter include, for

example, high development costs, the need for skilled app

developers, or the common business-IT alignment gap (ie,

domain experts being unable to express what developers shall

realize) [13]. When relying purely on smart mobile devices for

data collection purposes, specific participant groups may be

excluded (eg, elderly) [14]. Furthermore, providing respective

mobile app for only one mobile operating system (eg, Android

or iOS) might result in biased samples as their users can differ

regarding various aspects such as income, age, or education

[15].

We also observed these issues in several long-running,

large-scale data collection scenarios for which we had provided

mobile apps (see Table 1).

Table 1. Implemented mobile data collection apps.

Collected datasets using

smart mobile apps

App

versions

Duration

(years)

Complex navigationa

CountryData collection scenario

≥45,0005>5NoWorldwideStudy on tinnitus research [17]

≥15005>5NoGermanyRisk factors during pregnancy [18]

≥5001>2NoGermanyRisk factors after pregnancy

≥220054YesBurundiPosttraumatic stress disorder in war regions [19]

≥20011NoUgandaPosttraumatic stress disorder in war regions [20]

≥15032YesGermanyAdverse childhood experiences [21]

≥20031YesGermanyLearning deficits among medical students

≥50006>3NoEuropean UnionSupporting parents after accidents of children

≥54,75029———Overall

aNo: complex navigation was not requested/required; yes: complex navigation was requested/required.

JMIR Mhealth Uhealth 2018 | vol. 6 | iss. 6 | e148 | p.2http://mhealth.jmir.org/2018/6/e148/

(page number not for citation purposes)

Schobel et alJMIR MHEALTH AND UHEALTH

XSL

•

RenderX

Most of these apps were explicitly tailored and implemented to

support a specific application scenario. Developing such a

plethora of data collection instruments enabled us to elaborate

crucial requirements in this context [16]. Although the involved

investigators and clinicians were satisfied with the provided

mobile data collection apps, they craved for more sophisticated

features over time. The latter include, for example,

audio-recordings during interviews, additional notes, and

real-time data analyses. Furthermore, maintaining these

specifically implemented apps over time was a costly and

time-consuming endeavor. In order to relieve app developers

from such tasks, researchers as well as clinicians should be

enabled to develop mobile apps themselves. Existing approaches

[22,23] combine WordPress, a blogging software, and

iBuildApp, a Web-based app builder, to create a platform

supporting students from clinical psychiatry. The focus of this

platform, however, is on information retrieval (eg, psychiatric

guidelines). Furthermore, only limited support regarding the

development of digital instruments is provided. Other projects

like MagPi or MovisensXS also provide configurators using

simple Web forms, allowing end users to create data collection

apps. Our work significantly differs from these approaches as

we focus on sophisticated data collection instruments based on

advanced process management technology. In particular,

well-established graphical notations are provided to express

various aspects of such data collection instruments. Figure 1

represents an instrument using the Business Process Modeling

and Notation (BPMN) 2.0 graphical notation [24] that provides

a solid basis for the developed configurator. The latter, however,

uses its own graphical notation in order to allow end users

without expertise in process modeling to apply such techniques.

The modeled instrument, in turn, may then be executed on smart

mobile devices, such as mobile phones or tablets.

In general, graphical process notations comprise various

elements that allow specifying and visualizing complex business

processes in enterprises (eg, partners involved and their roles

in the process, data elements, or temporal process constraints).

When applying such a notation to the modeling of data collection

instruments, several issues emerged. In particular, researchers

were overwhelmed by the multitude of graphical elements as

well as their semantical meaning needed to properly represent

their specific data collection instrument. More precisely, dealing

with data elements was especially challenging when modeling

such instruments. First, the data element needs to be specified

accordingly. Second, a question that produces (ie, writes) this

data element must be modeled; third, the data element must be

consumed (ie, read) by decisions later in order to properly

control the flow of the instrument. Monitoring these aspects,

while still dealing with the modeling process in general, is

challenging.

To make such an expressive modeling approach better accessible

for end users with little or no knowledge of process modeling

(ie, researchers or clinicians), end-user programming techniques

were evaluated. Such techniques, in turn, have proven their

feasibility in a multitude of studies to support nonprogrammers.

The use of a graphical programming language instead of a

text-based one has been evaluated to teach children

programming [25]. Their teachers reported that the simplified

representation significantly improved the understanding of

program code. Another approach [26] applied end-user

programming techniques to support administrators in “writing”

management scripts used in their daily routines. The

“programming” of Web Mashups, which combines operators

and functions in a graphical manner, has been previously

presented [27]. Among others, these studies have proven the

applicability of end-user programming approaches in their

specific domain.

Taking the above issues into account, the QuestionSys

configurator that we developed applies sophisticated end-user

programming techniques to properly abstract the modeling of

data collection instruments. Accordingly, QuestionSys offers a

user-friendly configurator, hiding most of the complexity

introduced by process modeling languages. Particularly, this

configurator uses its own (graphical) modeling notation based

on BPMN 2.0 in order to allow end users without any

programming skills or knowledge in process modeling to

graphically specify data collection instruments themselves.

Therefore, there should be no need for involving IT experts

anymore when developing such mobile data collection

instruments [28]. To be more precise, QuestionSys particularly

focuses on scenarios in which the instruments need to be

frequently adapted. Especially these adaptations shall be

accomplished by end users with no programming experience

in order to reduce costs.

Figure 1. A data collection instrument represented as BPMN 2.0 model.

JMIR Mhealth Uhealth 2018 | vol. 6 | iss. 6 | e148 | p.3http://mhealth.jmir.org/2018/6/e148/

(page number not for citation purposes)

Schobel et alJMIR MHEALTH AND UHEALTH

XSL

•

RenderX

Frequent adaptations, in turn, require the continuous use of the

respective configurator. Especially usability of apps that are

used in a daily manner is of paramount importance. One aspect

that is relevant in this context is the prior experience needed to

learn the app. Following this, it is important for the usability of

an app to assess learnability.

For this purpose, in a pilot study, we evaluated the QuestionSys

configurator with 44 participants and obtained promising results

with respect to the modeling of data collection instruments and

overall usability of the configurator [29]. We found that

individuals with no experience in process modeling understood

how to properly use the configurator. Based on this pilot study,

we conducted a larger study with a more sophisticated study

design that comprised 2 testing sessions (second session 7 days

after the first one) with 5 tasks (ie, modeling data collection

instruments) at each session. This refined study and its results

are presented in the manuscript at hand. The following research

questions (RQs) were addressed with novices (ie, individuals

with no experience in process modeling) and experts (individuals

with experience in process modeling):

RQ1: How are the performances of novices and

experts changing from the first to the last task (data

collection instrument) of Session 1?

RQ2: How are the performances of novices and

experts changing from the last task (data collection

instrument) of Session 1 to the first task (data

collection instrument) of Session 2?

RQ3: How are the performances of novices and

experts changing from the first to the last task (data

collection instrument) of Session 2?

RQ4: How are the performances of novices and

experts changing from the first task (data collection

instrument) of Session 1 to the last task (data

collection instrument) of Session 2?

RQ5: How many tasks (data collection instruments)

are necessary until the performance metrics of novices

are as good as those of experts at the first task (data

collection instrument)?

Methods

Configurator Component

The combined use of well-known technologies from end-user

programming and business process management enables end

users to create mobile data collection instruments on their own.

The most important views of the configurator component

(“Element and Page Repository View” and “Modeling Area

View”) are sketched in Figures 2 and 3; see [30] for more

details.

Element and Page Repository View (see Figure 2). The element

repository allows creating basic elements of a questionnaire (eg,

texts and questions). The rightmost part shows an editor panel,

where particular attributes of the respective elements may be

managed. The configurator enables researchers to handle

elements in multiple languages and revisions. Most importantly,

the created elements may be combined to pages using drag &

drop operations.

Figure 2. The QuestionSys configurator: combining elements to pages.

JMIR Mhealth Uhealth 2018 | vol. 6 | iss. 6 | e148 | p.4http://mhealth.jmir.org/2018/6/e148/

(page number not for citation purposes)

Schobel et alJMIR MHEALTH AND UHEALTH

XSL

•

RenderX

Figure 3. The QuestionSys configurator: modeling a data collection instrument.

Modeling Area View (see Figure 3). Previously created pages

may be used to model the structure of the data collection

instrument. Furthermore, researchers are able to model advanced

navigation operations (eg, to skip pages depending on already

given answers to previous questions) to adapt the instrument

during the data collection process. The modeling view, in turn,

provides guidance for untrained users; particularly, it does not

allow applying wrong operations to the model. Note that the

QuestionSys configurator applies its own (graphical) modeling

notation. The latter, however, is inspired by BPMN 2.0, but

significantly simplifies the modeling process for individuals

having no experience with process modeling notations (eg, no

explicit data flow needs to be modeled).

Altogether, the configurator component and its model-driven

approach allow researchers to graphically define the elements

and logic of data collection instruments.

In order to be able to automatically collect the data needed for

the evaluation of the configurator component, the latter was

enhanced with a Study Mode that enables specific features. First,

it requires users to enter a code before using the configurator.

This code, in turn, is used to store all collected data in a

dedicated folder. Second, the configurator tracks the time when

a specific operation (eg, adding a page to the model) was

performed. Third, after performing the operation, an image of

the model is stored on the computer. This allows reproducing

the process of modeling a data collection instrument step-by-step

as well as manually evaluating the errors in the resulting model.

Study Procedure

Participants modeled a series of data collection instruments (ie,

5 data collection instruments per session) with the QuestionSys

configurator over 2 sessions (with 7 days between Session 1

and Session 2). A controlled environment was chosen for this

study in order to be able to quickly react to upcoming problems.

For the study, 20 workstations, each comparable in hardware

resources (eg, RAM and central processing unit cores), were

prepared in a computer pool at Ulm University. Each

workstation was equipped with two monitors running a common

resolution. Before each of the 2 sessions, respective workstations

were prepared carefully. This includes, for example, reinstalling

the configurator component and placing the consent form,

description of tasks, and mental effort questionnaires next to

the workstation.

The procedure of the study is outlined in Figure 4. The study

started with welcoming the participants and introducing the goal

of the study as well as the overall procedure. Then, the

participants performed 2 tests (2 min each) measuring their

processing speed. Both tests are reliable and valid tests of the

Wechsler Adult Intelligence Scale [31]. Next, we provided a

tutorial (approximately 5 min) demonstrating the most important

features of the configurator. Before conducting the main part

of the study, the participants were asked to fill in a demographic

questionnaire. Up to this point in time, the participants were

allowed to ask questions. Next, participants had to model 5 data

collection instruments (tasks; see Table 2) using solely the

provided configurator component, followed by filling in a short

questionnaire. Concluding the first session, participants had to

answer a short questionnaire again. Altogether, this first session

took approximately 50 min.

After pausing for exactly 1 week, the participants were reinvited

for a second session. The latter, however, was much shorter as

the collection of demographic data could be skipped. Participants

were given 5 new tasks (ie, to model 5 new data collection

instruments; see Table 2), and they had to answer the short

questionnaires again.

The data, automatically recorded by the configurator, were then

uploaded to a network-attached storage after each session.

JMIR Mhealth Uhealth 2018 | vol. 6 | iss. 6 | e148 | p.5http://mhealth.jmir.org/2018/6/e148/

(page number not for citation purposes)

Schobel et alJMIR MHEALTH AND UHEALTH

XSL

•

RenderX

Figure 4. Study design.

Table 2. Short description of the tasks to be modeled by participants.

DecisionsPagesModeling a questionnaire#

25...to collect information about flight passengers1

25...to help customers select an appropriate mobile phone2

25...to help collect required information for travel expense reports3

25...to order food and drinks online4

25...to support customers select a movie and book cinema tickets5

25...to help customers select an appropriate laptop computer6

25...to support customers book seats for a theater play7

25...to inform patients regarding their upcoming surgery8

25...to guide customers through the process of purchasing a new coffee machine and equipment9

25...to collect required data to conclude a contract in a gym10

All materials and methods were approved by the Ethics

Committee of Ulm University and were carried out in

accordance with the approved guidelines. All participants gave

their informed consent.

Tutorial

Before working with the configurator app, a screencast tutorial

was presented directly to each participant. The latter was

recorded by us; it describes how to create a very simple data

collection instrument. No voice or sound was recorded; however,

the screencast was annotated with small comments in

postproduction.

Tasks

Each task to be modeled was presented in a textual

representation that described the overall structure of the data

collection instrument to be created.

When designing the tasks for the participants, we paid close

attention to the fact that all 10 tasks were mutually comparable.

As this study intended to measure the learnability, which is a

contributory factor of the overall usability of the configurator,

it was of utmost importance to keep the complexity for all

modeling tasks constant. Tasks in divergent complexity, in turn,

may limit the validity of the study results as a change in

performance measures may be attributed to a more complex

JMIR Mhealth Uhealth 2018 | vol. 6 | iss. 6 | e148 | p.6http://mhealth.jmir.org/2018/6/e148/

(page number not for citation purposes)

Schobel et alJMIR MHEALTH AND UHEALTH

XSL

•

RenderX

model itself or respective learning effect. The overall complexity

includes, on one hand, the complexity of the textual

representation handed out to the participants and, on the other,

the complexity of the resulting data collection instrument.

All tasks were designed so that a perfect instrument modeling

solution would need exactly the same number of operations.

Furthermore, each model contained two decision points in order

to influence the further processing of the instrument based on

given answers. Thematically, the models to be created were

selected from various domains, ranging from a health care

instrument up to a questionnaire for a food delivery service (see

Table 2).

Participants

In total, 80 participants were recruited at Ulm University for

the experiment. Most of them were students or research

associates from various departments, like computer science,

economics, chemistry, psychology, and medicine [32]. We

recruited participants from these different disciplines to allow

a comparison of how individuals with no experience in process

modeling can learn using the configurator compared with

individuals with experience in process modeling. The target

group (end users from medical, psychological, or social sciences)

that shall be empowered by the configurator to develop mobile

data collection instruments has most probably no experience in

process modeling. During the recruitment phase, we paid close

attention to maintaining the balance between female and male

participants. Students willing to participate were instructed

according to the developed study design, which was explained

to them before, and they were additionally informed that there

will be 2 consecutive sessions to attend. To ensure that all

participants correctly understood the tasks to be performed by

them, all relevant material was handed out in German [33].

Participants who answered the prequestion (ie, a question in the

demographic questionnaire) “Do you have experience in process

modeling” with yes were classified as experts, whereas

participants who answered this question with no were classified

as novices. It should be kept in mind that this is only one

(simplified) possibility to classify participants into novices and

experts. Another possibility would be more in-depth

prequestioning of participants (eg, asking about familiarity with

notations such as BPMN, asking for examples of process models

they have created, and asking specific questions about particular

items in process modeling notations). This would lead to a

spectrum of rated expertise, rather than the simplified binary

approach used in this study.

Altogether, our classification resulted in 45 novices and 35

experts. Three of the recruited participants did not participate

in the second session (1 novice and 2 experts). Therefore, RQs

2-4 (RQs that included data gathered in the second session)

were investigated with 77 participants (44 novices and 33

experts).

Baseline Measures

To evaluate whether experts and novices differed in their

cognitive abilities, we performed 2 established tests measuring

their processing speed [31]. Within 2 min each, participants had

to assign symbols to numbers (“Digital Symbol Coding”) and

detect symbols from within a set of symbols (“Symbol Search”).

Differences in cognitive abilities at baseline would be a

confounder as a higher cognitive ability could result in better

or faster learnability of the configurator.

Questionnaires

A demographic questionnaire collecting personal information

(eg, gender or education) was handed out to the participants.

Specific focus was put on questions about their prior knowledge

regarding process modeling, in general, or about how many

process models they had read and written during the last 12

months, in particular. After completing each task, participants

had to answer 5 questions regarding their mental effort when

modeling the instrument. At the end of each session, they had

to answer questions regarding the quality of the modeled

instruments or their own competence when working with the

provided configurator component.

Performance Measures

The following performance measures were collected:

Time

The moment participants started modeling their instruments,

the respective timestamp was added to an Excel file stored in

the configurator app’s directory. Once the task was completed,

another timestamp was added to the file. This allowed us to

evaluate the time taken to complete the respective tasks on a

fine-grained level (values were assessed in milliseconds).

Operations

Whenever participants interacted with the instrument (eg, by

adding a new page), their specific operations were logged in an

Excel file. In addition, the time at which the participant executed

this operation was logged. Finally, the configurator took an

image of the current model after performing the respective

operation and stored it to the directory of the participant.

Errors

It was not possible for the configurator to automatically assess

the errors in the resulting model (eg, order of branches in

decision points may be switched or respective statements may

be inverted). Therefore, we manually evaluated all created

models based on the provided images. As the configurator

provided a snapshot of the model after each operation, it was

possible to recreate the modeling process of each participant.

Furthermore, this allowed us to assess the models on a

fine-grained basis.

Statistics

SPSS 23 was used for all statistical analyses. Frequencies,

percentages, means, and standard deviations were calculated as

descriptive statistics. Novices and experts were compared in

baseline variables using Fisher’s exact tests and ttests for

independent samples. To test RQs 1-4, ttests for dependent

samples were performed to investigate the change in the

performance measures between the tasks (data collection

instruments) specified in the corresponding RQ; these ttests

for dependent samples, in turn, were conducted separately for

novices and experts. ttests for independent samples were

performed to evaluate RQ5, thereby performances of novices

JMIR Mhealth Uhealth 2018 | vol. 6 | iss. 6 | e148 | p.7http://mhealth.jmir.org/2018/6/e148/

(page number not for citation purposes)

Schobel et alJMIR MHEALTH AND UHEALTH

XSL

•

RenderX

at each task (data collection instrument) were compared with

those of experts at Task 1 of Session 1 (first data collection

instrument) in order to identify the tasks (data collection

instruments) for which the performances of novices were not

significantly different from those of experts at Task 1 of Session

1 (first data collection instrument). All statistical tests were

performed two tailed; the significance value was set to P<.05.

Data Availability

The raw data set containing all collected data that was analyzed

during this study is included in this paper (and its supplementary

material).

Results

Baseline Comparison Between Novices and Experts

Table 3 summarizes the sample description and comparisons

between novices and experts in baseline variables. There were

more female participants in the novices’ sample and more male

participants in the experts’ sample (P=.003). Moreover, the

experts’ sample comprised a higher percentage of participants

with a bachelor degree as highest education than the novices’

sample, whereas the latter comprised a higher percentage of

participants with graduating high school as highest education

(P=.009). While a higher percentage of the novices’ sample

studied psychology than the experts’ sample, a higher percentage

in the latter studied economics or computer science than in the

former (P=.001).

Table 3. Sample description and comparison between novices and experts in baseline variables.

Pvalue

Experts (n=35)Novices (n=45)Variable

.003a

Gender, n (%)

12 (34)31 (69)Female

23 (66)14 (31)Male

.180a

22.72 (2.97)21.20 (2.63)Age (years), mean (SD)

Age category, n (%)

17 (49)29 (64)<25 years

18 (51)16 (36)25-35 years

.009a

Highest education, n (%)

2 (6)13 (29)High school

32 (91)32 (71)Bachelor

1 (3)0 (0)Master

.001a

Current field of study, n (%)b

12 (40)14 (33)Economics

8 (27)0 (0)Media computer science

6 (20)1 (2)Computer science

1 (3)0 (0)International business

0 (0)2 (5)Chemistry

3 (10)26 (60.5)Psychology

Processing speed test 1: digital symbol coding, mean (SD)

.51581.11 (21.89)84.33 (21.76)Correct answers

.8640.06 (0.24)0.07 (0.25)Wrong answers

Processing speed test 2: symbol search, mean (SD)

.10338.91 (8.53)41.93 (7.77)Correct answers

.7951.63 (1.50)1.73 (1.98)Wrong answers

aFisher’s exact test.

bN=73/80 (91%) participants gave information on their current field of study.

JMIR Mhealth Uhealth 2018 | vol. 6 | iss. 6 | e148 | p.8http://mhealth.jmir.org/2018/6/e148/

(page number not for citation purposes)

Schobel et alJMIR MHEALTH AND UHEALTH

XSL

•

RenderX

Results for RQ1

Time

Novices (n=45): The mean time (in milliseconds) required for

the first task of Session 1 (first data collection instrument) was

452,334.29 (SD 209,527.70), and the mean time required for

the last task of Session 1 (fifth data collection instrument) was

135,273.89 (SD 49,861.64). This improvement reached

statistical significance: t(44)=10.71; P<.001.

Experts (n=35): The mean time needed (in milliseconds) for a

task significantly decreased from 405,444.89 (SD 248,497.68)

at the first task of Session 1 (first data collection instrument) to

147,251.91 (SD 91,181.39) at the last task of Session 1 (fifth

data collection instrument): t(34)=6.12; P<.001.

Operations

Novices (n=45): Operations significantly decreased from a mean

17.60 (SD 7.87) at the first task of Session 1 (first data collection

instrument) to 11.24 (SD 3.68) at the last task of Session 1 (fifth

data collection instrument): t(44)=5.23; P<.001.

Experts (n=35): Significantly less operations were needed at

the last task of Session 1 (fifth data collection instrument) than

at the first task of Session 1 (first data collection instrument):

17.49 (SD 11.20) at the first task of Session 1 and 11.31 (SD

3.98) at the last task of Session 1: t(34)=3.41; P=.002.

Errors

Novices (n=45): Errors nonsignificantly decreased from 1.24

(SD 2.15) at the first task of Session 1 (first data collection

instrument) to 1.00 (SD 1.83) at the last task of Session 1 (fifth

data collection instrument): t(44)=0.88; P=.386.

Experts (n=34, as errors were not available for one expert

because of corrupted snapshot images): Errors decreased from

a mean of 0.35 (SD 0.88) at the first task of Session 1 (first data

collection instrument) to 0.32 (SD 0.84) at the last task of

Session 1 (fifth data collection instrument). However, this

change was not statistically significant: t(33)=0.16; P=.876.

Results for RQ2

Time

Novices (n=44): The mean time (in milliseconds) significantly

increased from the last task of Session 1 (fifth data collection

instrument) to the first task of Session 2 (sixth data collection

instrument): 133,725.80 (SD 49,332.01) versus 235,291.93 (SD

167,630.02); t(43)=−3.82; P<.001.

Experts (n=33): No significant change in the mean time (in

milliseconds) emerged between the last task of Session 1 (fifth

data collection instrument) and the first task of Session 2 (sixth

data collection instrument): 148,253.30 (SD 93,726.57) versus

222,304.67 (SD 227,425.64); t(32)=−1.76; P=.088.

Operations

Novices (n=44): Significantly more operations were observed

at the first task of Session 2 (sixth data collection instrument)

than at the last task of Session 1 (fifth data collection

instrument): 11.11 (SD 3.61) versus 13.89 (SD 6.88); t

(43)=−2.25; P=.030.

Experts (n=33): Operations did not significantly change between

the last task of Session 1 (fifth data collection instrument) and

the first task of Session 2 (sixth data collection instrument):

10.97 (SD 3.62) versus 12.70 (SD 5.93); t(32)=−1.46, P=.155.

Errors

Novices (n=44): Errors did not significantly change between

the last task of Session 1 (fifth data collection instrument) and

the first task of Session 2 (sixth data collection instrument):

1.02 (SD 1.85) versus 0.86 (SD 1.44); t(43)=0.69; P=.492.

Experts (n=33): From the last task of Session 1 (fifth data

collection instrument) to the first task of Session 2 (sixth data

collection instrument), errors did not significantly change: 0.33

(SD 0.85) versus 0.46 (SD 0.97); t(32)=−0.61; P=.545.

Results for RQ3

Time

Novices (n=44): The mean time (in milliseconds) significantly

decreased from the first task of Session 2 (sixth data collection

instrument), 235,291.93 (SD 167,630.02), to the last task of

Session 2 (tenth data collection instrument), 107,957.18 (SD

54,837.64): t(43)=5.12; P<.001.

Experts (n=33): The mean time (in milliseconds) significantly

decreased from the first task of Session 2 (sixth data collection

instrument), 222,304.67 (SD 227,425.64), to the last task of

Session 2 (tenth data collection instrument), 85,600.36 (SD

23,698.01): t(32)=3.53; P=.001.

Operations

Novices (n=44): Operations became significantly less from the

first task of Session 2 (sixth data collection instrument), 13.89

(SD 6.88), to the last task of Session 2 (tenth data collection

instrument), 11.55 (SD 4.86): t(43)=2.01; P=.050.

Experts (n=33): Significantly less operations were needed at

the last task of Session 2 (tenth data collection instrument), 9.45

(SD 1.06), than at the first task of Session 2 (sixth data collection

instrument), 12.70 (SD 5.93): t(32)=3.00; P=.005.

Errors

Novices (n=44): Errors did not significantly change between

the first task (sixth data collection instrument), 0.86 (SD 1.44),

and last task (tenth data collection instrument), 0.64 (SD 1.01),

of Session 2: t(43)=1.26; P=.215.

Experts (n=33): No change in errors between the first task (sixth

data collection instrument), 0.46 (SD 0.97), and last task (tenth

data collection instrument), 0.30 (SD 0.85), of Session 2

emerged: t(32)=0.78; P=.443.

Results for RQ4

Time

Novices (n=44): From the first task of Session 1 (first data

collection instrument) to the last task of Session 2 (tenth data

collection instrument), the mean time (in milliseconds)

significantly decreased: 456,322.02 (SD 210,215.59) versus

107,957.18 (SD 54,837.64); t(43)=11.30; P<.001.

JMIR Mhealth Uhealth 2018 | vol. 6 | iss. 6 | e148 | p.9http://mhealth.jmir.org/2018/6/e148/

(page number not for citation purposes)

Schobel et alJMIR MHEALTH AND UHEALTH

XSL

•

RenderX

Experts (n=33): The mean time (in milliseconds) significantly

decreased from the first task of Session 1 (first data collection

instrument) to the last task of Session 2 (tenth data collection

instrument) 393,204.06 (SD 46,642.43) versus 85,600.36 (SD

23,698.01); t(32)=7.24; P<.001.

Operations

Novices (n=44): From the first task of Session 1 (first data

collection instrument) to the last task of Session 2 (tenth data

collection instrument), operations became significantly less:

17.80 (SD 7.85) versus 11.55 (SD 4.86); t(43)=4.98; P<.001.

Experts (n=33): Operations significantly decreased from the

first task of Session 1 (first data collection instrument) to the

last task of Session 2 (tenth data collection instrument): 17.18

(SD 11.41) versus 9.45 (SD 1.06); t(32)=3.83; P=.001.

Errors

Novices (n=44): Errors significantly decreased from the first

task of Session 1 (first data collection instrument) to the last

task of Session 2 (tenth data collection instrument): 1.7 (SD

2.17) versus 0.64 (SD 1.01); t(43)=2.09; P=.043.

Experts (n=33): Errors did not significantly change between the

first task of Session 1 (first data collection instrument) and the

last task of Session 2 (tenth data collection instrument): 0.36

(SD 0.90) versus 0.30 (SD 0.85); t(32)=0.30; P=.768.

Results for RQ5

Time

The comparisons between the time (in milliseconds) of experts

at the first task of Session 1 and the time of novices at each task

are presented in Table 4. It can be seen that novices were not

significantly different from experts already at the first task of

Session 1 (first data collection instrument) and that the time

taken by novices at Tasks 3-10 was significantly less than that

taken by experts at Task 1 of Session 1 (P=.363 comparing Task

1 of novices with Task 1 of experts; P=.062 comparing Task 2

of novices with Task 1 of experts; P<.001 comparing Task 3 of

novices with Task 1 of experts; P<.001 comparing Task 4 of

novices with Task 1 of experts; P<.001 comparing Task 5 of

novices with Task 1 of experts; P=.001 comparing Task 6 of

novices with Task 1 of experts; P<.001 comparing Task 7 of

novices with Task 1 of experts; P<.001 comparing Task 8 of

novices with Task 1 of experts; P<.001 comparing Task 9 of

novices with Task 1 of experts; P<.001 comparing Task 10 of

novices with Task 1 of experts).

Operations

Table 5 compares the operations of experts at the first task of

Session 1 and those of novices at each task. Again, novices

performed not significantly different from experts already at

the first task of Session 1 (first data collection instrument).

Moreover, the operations of novices at Tasks 3, 4, 5, 7, 8, 9,

and 10 were significantly less than those of experts at Task 1

of Session 1. Only the difference between operations of novices

at Task 6 (first data collection instrument of Session 2) and

those of experts at Task 1 (first data collection instrument of

Session 1) did not reach statistical significance (P=.957

comparing Task 1 of novices with Task 1 of experts; P=.373

comparing Task 2 of novices with Task 1 of experts; P=.002

comparing Task 3 of novices with Task 1 of experts; P=.027

comparing Task 4 of novices with Task 1 of experts; P=.003

comparing Task 5 of novices with Task 1 of experts; P=.101

comparing Task 6 of novices with Task 1 of experts; P=.007

comparing Task 7 of novices with Task 1 of experts; P=.004

comparing Task 8 of novices with Task 1 of experts; P=.020

comparing Task 9 of novices with Task 1 of experts; P=.005

comparing Task 10 of novices with Task 1 of experts).

Table 4. Comparisons between the time (in milliseconds) taken by experts at the first task of Session 1 and that taken by novices at each task.

Pvaluea

Mean (SD)NSessionSample and task

Experts

—

405,444.89 (248,497.68)

351Task 1

Novices

.363

452,334.29 (209,527.70)

451Task 1

.062

310,765.11 (198,970.99)

451Task 2

<.001

173,889.87 (73,069.81)

451Task 3

<.001

161,358.91 (65,405.85)

451Task 4

<.001

135,273.89 (49,861.64)

451Task 5

.001

235,291.93 (167,630.02)

442Task 6

<.001

126,357.86 (59,195.92)

442Task 7

<.001

188,537.89 (144,107.50)

442Task 8

<.001

155,625.20 (90,902.41)

442Task 9

<.001

107,957.18 (54,837.64)

442Task 10

aPvalues compare experts (Task 1) with novices (Tasks 1-10).

JMIR Mhealth Uhealth 2018 | vol. 6 | iss. 6 | e148 | p.10http://mhealth.jmir.org/2018/6/e148/

(page number not for citation purposes)

Schobel et alJMIR MHEALTH AND UHEALTH

XSL

•

RenderX

Table 5. Comparison between the operations of experts at the first task of Session 1 and those of novices at each task.

Pvaluea

Mean (SD)NSessionSample and task

Experts

—17.49 (11.20)351Task 1

Novices

.95717.60 (7.87)451Task 1

.37315.42 (9.39)451Task 2

.00210.84 (3.05)451Task 3

.02712.91 (4.19)451Task 4

.00311.24 (3.68)451Task 5

.10113.89 (6.88)442Task 6

.00711.64 (5.33)442Task 7

.00411.41 (3.87)442Task 8

.02012.46 (5.92)442Task 9

.00511.55 (4.86)442Task 10

aPvalues compare experts (Task 1) with novices (Tasks 1-10).

Table 6. Comparisons between the errors of experts for the first task of Session 1 and those of novices at each task.

Pvaluea

Mean (SD)NSessionSample and task

Experts

—0.35 (0.88)341Task 1

Novices

.0151.24 (2.15)451Task 1

.0081.40 (2.33)451Task 2

.1120.80 (1.56)451Task 3

.0011.53 (2.07)451Task 4

.0421.00 (1.83)451Task 5

.0580.86 (1.44)442Task 6

.1680.68 (1.14)442Task 7

.1090.75 (1.28)442Task 8

.1010.84 (1.52)442Task 9

.1920.64 (1.01)442Task 10

aPvalues compare experts (Task 1) with novices (Tasks 1-10).

Errors

Table 6 summarizes the comparisons between the errors of

experts at the first task of Session 1 and those of novices at each

task. Novices made significantly more errors at almost each

task of Session 1 (except for Task 3) than did experts at Task

1 of Session 1.

The errors of novices at each task of Session 2 were, however,

not significantly different from those of experts at Task 1 of

Session 1 (P=.015 comparing Task 1 of novices with Task 1 of

experts; P=.008 comparing Task 2 of novices with Task 1 of

experts; P=.112 comparing Task 3 of novices with Task 1 of

experts; P=.001 comparing Task 4 of novices with Task 1 of

experts; P=.042 comparing Task 5 of novices with Task 1 of

experts; P=.058 comparing Task 6 of novices with Task 1 of

experts; P=.168 comparing Task 7 of novices with Task 1 of

experts; P=.109 comparing Task 8 of novices with Task 1 of

experts; P=.101 comparing Task 9 of novices with Task 1 of

experts; P=.192 comparing Task 10 of novices with Task 1 of

experts).

Discussion

This study evaluated the QuestionSys configurator, which was

developed to empower end users to develop mobile data

collection instruments. In total, 80 participants with and without

knowledge of process modeling (ie, experts and novices,

respectively) took part and modeled 10 data collection

JMIR Mhealth Uhealth 2018 | vol. 6 | iss. 6 | e148 | p.11http://mhealth.jmir.org/2018/6/e148/

(page number not for citation purposes)

Schobel et alJMIR MHEALTH AND UHEALTH

XSL

•

RenderX

instruments at 2 sessions. Within each session (RQ1 and RQ3),

a learning effect was observed: time and number of operations

needed to model the data collection instruments became less in

each session for novices as well as for experts. Also, across both

sessions (RQ4), novices as well as experts improved their needed

time and operations, adding further evidence to the mentioned

learning effect. Across both sessions, novices also showed a

decrease in errors from the first to the last (tenth) data collection

instrument. This learning effect across sessions was not observed

for experts, probably because they already had very few errors

in the first data collection instrument. Yet, errors were not

reduced within sessions, indicating that the learning effect

regarding reducing errors took more time in novices than the

learning effect regarding time and operations.

One week after Session 1 (without using the configurator

component), the performances of experts did not change, but

needed operations and time increased again in novices (RQ2).

This might indicate that the within-session learning effect of

novices was not as robust to an interval without using the

configurator as the within-session learning effect of experts.

One reason to explain this result might be that experts work

with this type of app on a “day-to-day” basis, retaining some

level of expertise between sessions. When novices did not use

the configurator for 1 week, they needed to get themselves

acquainted with the configurator again, and our results showed

that novices got better or faster relatively quickly when they

started using the app again. The decay of learning noticed within

the novices’ sample is a salient factor to be considered and may

have practical implications, especially in scenarios requiring

infrequent adaptations of instruments. However, as mentioned

in the introduction, the QuestionSys configurator addresses the

above scenarios in which frequent adaptations of instruments

are required. Besides, even for scenarios where infrequent

adaptations become necessary, the QuestionSys configurator

provides an applicable approach when used by end users

experienced in process modeling (ie, experts) as they did not

show a decay of learning in this study. However, note that

although scenarios with infrequent changes may be supported,

they do not constitute the main target of the QuestionSys

configurator.

Finally, RQ5 evaluated how many tasks needed to be completed

by novices in order to be as good as experts at the first task.

Interestingly, novices became significantly faster from the third

task on. Moreover, from the third task on, novices needed

significantly less operations than experts at the first task, except

for Task 6. This might be attributed to the already-mentioned

within-session learning effect causing novices to forget how to

properly work with the configurator more quickly compared

with experts.

Despite the fact that novices did not need to model many data

collection instruments in order to be faster and that they needed

less operations than experts at the first modeling task, novices

were unable to catch up with experts regarding errors. In order

to allow for more error-free data collection instruments, more

training (than modeling 10 data collection instruments) might

be necessary for novices. Furthermore, one could argue that

experts are the sample of choice when the data collection

instrument should have as less errors as possible.

Several limitations to this study [34] need to be discussed. First,

the process of selecting the participants limits external validity

or generalizability as mostly students and research associates

were recruited for this study. In this context, however, one

approach discusses that students may act as proper substitutes

in empirical studies [32]. Furthermore, the classification of

recruited participants into novices and experts solely based on

a “yes or no” question may be oversimplified and subject for

discussion. A more in-depth prequestioning of participants (eg,

asking about familiarity with notations such as BPMN, asking

for examples of the process models they have created) might

allow analysis of how performance depends on the whole

spectrum of rated expertise. However, in this study, we aimed

to classify novices and experts by distinguishing between no

process modeling experiences at all or being in touch with

process modeling. Finally, a more elaborated classification by

directly observing individuals during modeling or by inspecting

the images recorded during the modeling process may be applied

in future research. Next, threats to the internal validity constitute

the baseline differences between novices and experts regarding

gender, education, and field of study. As stated earlier, these

differences were intentional as we recruited from different

disciplines to be able to compare how well novices could learn

using the configurator compared with experts. The target group

(end users from medical, psychological, or social sciences) will

most likely have no experience in process modeling. Tests

measuring processing speed indicated equal cognitive abilities

between both groups. Differences in cognitive ability at baseline

would be a confounder as this could result in better or faster

learnability of the configurator. As another shortcoming, the

experts’ sample consisted of less participants than the novices’

sample so that the statistical power was higher in tests for the

novices’ sample. Another limitation of this study was that the

tasks to be modeled were from various domains (see Table 2).

However, the modeling concept used within the configurator is

domain agnostic. In order to show the feasibility of this approach

for different domains, a vast number of scenarios were modeled.

Likewise, the tasks to be modeled did not include modeling of

sensors that may be connected to smart mobile devices (eg, to

measure vital health care parameters during an interview). In

order to deal with these limitations, however, a study specifically

targeting health care instruments and the integration of related

sensors may be subject for future research.

Despite these limitations, the strength of the study was that we

specifically focused on the learnability of the QuestionSys

configurator. Note that learnability is a contributory factor to

the overall usability. For example, many usability attitude scales

explicitly include learnability factors. Interestingly, usability is

often measured by subjective reports (eg, usability scales). In

this context, studies that measure learnability not by self-reports

but by performance measures are more complex and time

consuming than those using opinion-based instruments (eg,

System Usability Scale [SUS]) [35]. Therefore, measuring

learnability by performance measures is often neglected in

usability studies [36], although it may have an impact on the

success or failure of an app [37]. Despite the fact that there exist

a plethora of best practices on how to create a user-friendly app,

most of them deal with the proper design of the user interface

[38-40] or with the enhancement of the overall user experience

JMIR Mhealth Uhealth 2018 | vol. 6 | iss. 6 | e148 | p.12http://mhealth.jmir.org/2018/6/e148/

(page number not for citation purposes)

Schobel et alJMIR MHEALTH AND UHEALTH

XSL

•

RenderX

[41,42]. Note that there also exist instruments that assess these

properties fairly easily (eg, SUS [35]). Although such measures

are useful and have proven their applicability in various ways,

they might be misleading when evaluating sophisticated apps

used by end users with little (or no) IT knowledge. In such

scenarios, focusing on the experience gained (ie, learning) when

continuously working with an app that needs to be evaluated

may be more conclusive. However, learning is a process that

takes place over time and takes practice into account as well. It

can be measured by evaluating the time and effort needed to

become better at doing something [43]. Thereby, learnability

can be measured using various performance metrics. However,

efficiency-based metrics (eg, the time needed, errors committed,

or operations required) during task completion are the most

common ones.

In summary, the results of this study valuably replicate and

extend the results of a previous pilot study [29]. The main

findings show that even novices can properly use the

configurator and that novices as well as experts perform better

when they use the configurator more often (learning effect).

Addressing the abovementioned reasons for the lack of

sophisticated mobile data collection instruments in large-scale

scenarios (see Introduction), the developed configurator

component helps build awareness regarding the capabilities of

the smart mobile devices used nowadays (see Reason 1).

Furthermore, it may allow using sensors during the procedure

of collecting data, which may support more complex data

collection scenarios (see Reason 2). Most importantly, however,

the configurator component not only allows researchers to create

data collection instruments themselves but also provides a

common (graphical) notation that may improve the

communication between researchers and mobile app developers

(see Reason 3). Altogether, QuestionSys will significantly

influence the way data are collected in large-scale studies (eg,

clinical trials). To the best of our knowledge, usability issues

in the context of creating mobile data collection apps by

researchers have not been studied at this scale previously.

Furthermore, this may serve as a valuable benchmark for

collecting data in general.

Acknowledgments

The QuestionSys framework was supported by funds from the program Research Initiatives, Infrastructure, Network and Transfer

Platforms in the Framework of the DFG Excellence Initiative–Third Funding Line.

Authors' Contributions

All authors analyzed the real-world projects; JS and RP conceived and designed the architecture and prototype; JS implemented

the prototype and conducted the experiments; TP and WS processed and analyzed the experiment data; all authors wrote the

paper.

Conflicts of Interest

None declared.

Multimedia Appendix 1

The tutorial screencast for the configurator.

[MP4 File (MP4 Video), 3MB - mhealth_v6i6e148_app1.mp4 ]

Multimedia Appendix 2

Raw data collected using the configurator.

[XLSX File (Microsoft Excel File), 60KB - mhealth_v6i6e148_app2.xlsx ]

Multimedia Appendix 3

Description of one task to be modeled (translated from German to English).

[PDF File (Adobe PDF File), 233KB - mhealth_v6i6e148_app3.pdf ]

References

1. Fernandez-Ballesteros R. Self-report questionnaires. In: Hersen M, Haynes SN, Heiby EM, editors. Comprehensive Handbook

of Psychological Assessment, Volume 3, Behavioral Assessment. Hoboken, New Jersey: John Wiley & Sons; 2004:194-221.

2. Pavlović I, Kern T, Miklavcic D. Comparison of paper-based and electronic data collection process in clinical trials: costs

simulation study. Contemp Clin Trials 2009 Jul;30(4):300-316. [doi: 10.1016/j.cct.2009.03.008] [Medline: 19345286]

3. Carlbring P, Brunt S, Bohman S, Austin D, Richards J, Öst L, et al. Internet vs. paper and pencil administration of

questionnaires commonly used in panic/agoraphobia research. Computers in Human Behavior 2007 May;23(3):1421-1434.

[doi: 10.1016/j.chb.2005.05.002]

JMIR Mhealth Uhealth 2018 | vol. 6 | iss. 6 | e148 | p.13http://mhealth.jmir.org/2018/6/e148/

(page number not for citation purposes)

Schobel et alJMIR MHEALTH AND UHEALTH

XSL

•

RenderX

4. Marcano BJS, Jamsek J, Huckvale K, O'Donoghue J, Morrison CP, Car J. Comparison of self-administered survey

questionnaire responses collected using mobile apps versus other methods. Cochrane Database Syst Rev 2015 Jul

27(7):MR000042. [doi: 10.1002/14651858.MR000042.pub2] [Medline: 26212714]

5. Palermo TM, Valenzuela D, Stork PP. A randomized trial of electronic versus paper pain diaries in children: impact on

compliance, accuracy, and acceptability. Pain 2004 Feb;107(3):213-219. [Medline: 14736583]

6. Pryss R, Geiger P, Schickler M, Schobel J, Reichert M. The AREA framework for location-based smart mobile augmented

reality applications. International Journal of Ubiquitous Systems and Pervasive Networks 2017;9(1):13-21 [FREE Full text]

7. Schobel J, Schickler M, Pryss R, Nienhaus H, Reichert M. Using Vital Sensors in Mobile Healthcare Business Applications:

Challenges, Examples, Lessons Learned. In: Proceedings of the 9th International Conference on Web Information Systems

and Technologies, Special Session on Business Apps. 2013 Presented at: International Conference on Web Information

Systems and Technologies; May 08-10, 2013; Aachen, Germany.

8. Lane SJ, Heddle NM, Arnold E, Walker I. A review of randomized controlled trials comparing the effectiveness of hand

held computers with paper methods for data collection. BMC Med Inform Decis Mak 2006;6:23 [FREE Full text] [doi:

10.1186/1472-6947-6-23] [Medline: 16737535]

9. Pryss R, Reichert M, Bachmeier A, Albach J. BPM to go: Supporting business processes in a mobile and sensing world.

In: Fischer L, editor. BPM Everywhere: Internet of Things, Process of Everything. 1st ed. Lighthouse Point, FL, USA:

Future Strategies Inc; 2015:167-182.

10. Pryss R, Mundbrod N, Langer D, Reichert M. Supporting medical ward rounds through mobile task and process management.

Inf Syst E-Bus Manage 2014 Mar 11;13(1):107-146. [doi: 10.1007/s10257-014-0244-5]

11. Rahman QA, Janmohamed T, Pirbaglou M, Ritvo P, Heffernan JM, Clarke H, et al. Patterns of User Engagement With the

Mobile App, Manage My Pain: Results of a Data Mining Investigation. JMIR Mhealth Uhealth 2017 Jul 12;5(7):e96 [FREE

Full text] [doi: 10.2196/mhealth.7871] [Medline: 28701291]

12. Probst T, Pryss R, Langguth B, Schlee W. Emotion dynamics and tinnitus: Daily life data from the “TrackYourTinnitus”

application. Sci Rep 2016 Aug 04;6:31166 [FREE Full text] [doi: 10.1038/srep31166] [Medline: 27488227]

13. Keedle H, Schmied V, Burns E, Dahlen H. The Design, Development, and Evaluation of a Qualitative Data Collection

Application for Pregnant Women. J Nurs Scholarsh 2018 Jan;50(1):47-55. [doi: 10.1111/jnu.12344] [Medline: 28898529]

14. Park E, Lee S. Multidimensionality: redefining the digital divide in the smartphone era. INFO 2015 Mar 09;17(2):80-96.

[doi: 10.1108/info-09-2014-0037]

15. Ubhi HK, Kotz D, Michie S, van Schayck OCP, West R. A comparison of the characteristics of iOS and Android users of

a smoking cessation app. Transl Behav Med 2017 Jun;7(2):166-171 [FREE Full text] [doi: 10.1007/s13142-016-0455-z]

[Medline: 28168609]

16. Schobel J, Schickler M, Pryss R, Reichert M. Process-Driven Data Collection with Smart Mobile Devices. In: Proceedings

of the 10th International Conference on Web Information Systems and Technologies, Revised Selected Papers.: Springer,

Cham; 2015 Presented at: 10th International Conference on Web Information Systems and Technologies; 2015; Barcelona,

Spain. [doi: 10.1007/978-3-319-27030-2_22]

17. Pryss R, Reichert M, Langguth B, Schlee W. Mobile Crowd Sensing Services for Tinnitus Assessment, Therapy, and

Research. In: Proceedings of the IEEE 4th International Conference on Mobile Services.: IEEE Computer Society Press;

2015 Presented at: IEEE 4th International Conference on Mobile Services; June 27 - July 02, 2015; New York, USA p.

352-359. [doi: 10.1109/MobServ.2015.55]

18. Ruf-Leuschner M, Pryss R, Liebrecht M, Schobel J, Spyridou A, Reichert M, et al. Preventing further trauma: KINDEX

mum screen - assessing and reacting towards psychosocial risk factors in pregnant women with the help of smartphone

technologies. In: XIII Congress of European Society of Traumatic Stress Studies Conference. 2013 Presented at: XIII

Congress of European Society of Traumatic Stress Studies Conference; June 05-09, 2013; Bologna, Italy.

19. Schobel J, Pryss R, Reichert M. Using Smart Mobile Devices for Collecting Structured Data in Clinical Trials: Results

From a Large-Scale Case Study. In: Proceedings of the 28th IEEE International Symposium on Computer-Based Medical

Systems.: IEEE Computer Society Press; 2015 Presented at: 28th IEEE International Symposium on Computer-Based

Medical Systems; June 22-25, 2015; Sao Carlos, Brazil. [doi: 10.1109/CBMS.2015.69]

20. Wilker S, Pfeiffer A, Kolassa S, Elbert T, Lingenfelder B, Ovuga E, et al. The role of FKBP5 genotype in moderating

long-term effectiveness of exposure-based psychotherapy for posttraumatic stress disorder. Transl Psychiatry 2014 Jun

24;4:e403 [FREE Full text] [doi: 10.1038/tp.2014.49] [Medline: 24959896]

21. Isele D, Ruf-Leuschner M, Pryss R, Schauer M, Reichert M, Schobel J, et al. Detecting adverse childhood experiences with

a little help from tablet computers. In: XIII Congress of European Society of Traumatic Stress Studies Conference. 2013

Presented at: XIII Congress of European Society of Traumatic Stress Studies Conference; June 05-09, 2013; Bologna, Italy.

22. Zhang M, Cheow E, Ho CS, Ng BY, Ho R, Cheok CCS. Application of low-cost methodologies for mobile phone app

development. JMIR Mhealth Uhealth 2014;2(4):e55 [FREE Full text] [doi: 10.2196/mhealth.3549] [Medline: 25491323]

23. Zhang MW, Tsang T, Cheow E, Ho CS, Yeong NB, Ho RC. Enabling psychiatrists to be mobile phone app developers:

insights into app development methodologies. JMIR Mhealth Uhealth 2014;2(4):e53 [FREE Full text] [doi:

10.2196/mhealth.3425] [Medline: 25486985]

JMIR Mhealth Uhealth 2018 | vol. 6 | iss. 6 | e148 | p.14http://mhealth.jmir.org/2018/6/e148/

(page number not for citation purposes)

Schobel et alJMIR MHEALTH AND UHEALTH

XSL

•

RenderX

24. OMG Object Management Group. Business Process Model And Notation Specification Version 2.0. 2011. Business Process

Model and Notation (BPMN) Version 2.0 URL: https://www.omg.org/spec/BPMN/2.0 [accessed 2018-06-06] [WebCite

Cache ID 6zy5r6DMQ]

25. Klopfer E, Yoon S, Um T. Teaching complex dynamic systems to young students with Starlogo. Journal of Computers in

Mathematics and Science Teaching 2005;24(2):157-178 [FREE Full text]

26. Kandogan E, Haber E, Barrett R, Cypher A, Maglio P, Zhao H. A1nd-User Programming for Web-based System

Administration. In: Proceedings of the 18th ACM symposium on User interface software and technology.: ACM; 2005

Presented at: 18th ACM symposium on User interface software and technology; October 23-26, 2005; Seattle, USA. [doi:

10.1145/1095034.1095070]

27. Wong J, Hong JI. Making Mashups with Marmite: Towards End-user Programming for the Web. In: Proceedings of the

SIGCHI Conference on Human Factors in Computing Systems.: ACM; 2007 Presented at: SIGCHI Conference on Human

Factors in Computing Systems; April 28 - May 03, 2007; San Jose, USA p. 1435-1444. [doi: 10.1145/1240624.1240842]

28. Schobel J, Pryss R, Schickler M, Ruf-Leuschner M, Elbert T, Reichert M. End-User Programming of Mobile Services:

Empowering Domain Experts to Implement Mobile Data Collection Applications. In: Proceedings of the IEEE 5th

International Conference on Mobile Services.: IEEE Computer Society Press; 2016 Presented at: IEEE 5th International

Conference on Mobile Services; June 27 - July 02, 2016; San Francisco, USA p. 1-8. [doi: 10.1109/MobServ.2016.11]

29. Schobel J, Pryss R, Schlee W, Probst T, Gebhardt D, Schickler M, et al. Development of Mobile Data Collection Applications

by Domain Experts: Experimental Results from a Usability Study. In: Proceedings of the 29th International Conference on

Advanced Information Systems Engineering.: Springer; 2017 Presented at: 29th International Conference on Advanced

Information Systems Engineering; June 12-16, 2017; Essen, Germany. [doi: 10.1007/978-3-319-59536-8_5]

30. Schobel J, Pryss R, Schickler M, Reichert M. A Configurator Component for End-User Defined Mobile Data Collection

Processes. In: Demo Tracks of the 14th International Conference on Service Oriented Computing.: Springer; 2016 Presented

at: 14th International Conference on Service Oriented Computing; October 10-13, 2016; Banff, Canada. [doi:

10.1007/978-3-319-68136-8_28]

31. von Aster M, Neubauer A, Horn R. Wechsler Intelligenztest für Erwachsene: WIE; Übersetzung und Adaption der WAIS-III.

Frankfurt am Main, Germany: Harcourt Test Services; 2006.

32. Höst M, Regnell B, Wohlin C. Using Students as Subjects - A Comparative Study of Students and Professionals in Lead-Time

Impact Assessment. Empirical Software Engineering 2000;5(3):201-214. [doi: 10.1023/A:1026586415054]

33. Wohlin C, Runeson P, Höst M, Ohlsson MC, Regnell B, Wesslén A. Experimentation in Software Engineering. Heidelberg,

Germany: Springer; 2012.

34. Cook TD, Campbell DT. Quasi-Experimentation: Design & Analysis Issues for Field Settings. Boston: Houghton Mifflin

Boston; 1979.

35. Brooke J. SUS: a “quick and dirty” usability scale. In: Jordan PW, Thomas B, Weerdmeester BA, McClelland IL, editors.

Usability evaluation in industry. London: Taylor and Francis; 1996:189-194.

36. Zhou L, Bao J, Parmanto B. Systematic Review Protocol to Assess the Effectiveness of Usability Questionnaires in mHealth

App Studies. JMIR Res Protoc 2017 Aug 01;6(8):e151 [FREE Full text] [doi: 10.2196/resprot.7826] [Medline: 28765101]

37. Harrison R, Flood D, Duce D. Usability of mobile applications: literature review and rationale for a new usability model.

J Interact Sci 2013;1(1):1. [doi: 10.1186/2194-0827-1-1]

38. Shneiderman B, Plaisant C, Cohen M, Jacobs S, Elmqvist N. Designing the user interface: strategies for effective

human-computer interaction, 6th Edition. Edinburgh, England: Pearson; 2017.

39. Laurel B, Mountford SJ. The Art of Human-Computer Interface Design. Boston, MA, USA: Addison-Wesley Longman

Publishing Co Inc; 1990.

40. Mayhew DJ. The Usability Engineering Lifecycle. In: CHI 99 Extended Abstracts on Human Factors in Computing Systems.:

ACM; 1999 Presented at: ACM SIGCHI Conference on Human Factors in Computing Systems; May 15-20, 1999; Pittsburgh,

USA p. 147-148. [doi: 10.1145/632716.632805]

41. Hassenzahl M, Tractinsky N. User experience - a research agenda. Behaviour & Information Technology 2006

Mar;25(2):91-97. [doi: 10.1080/01449290500330331]

42. Law ELC, Roto V, Hassenzahl M, Vermeeren AP, Kort J. Understanding, scoping and defining user experience: a survey

approach. In: Proceedings of the SIGCHI conference on human factors in computing systems.: ACM; 2009 Presented at:

SIGCHI conference on human factors in computing systems; April 04-09, 2009; Boston, USA p. 719-728. [doi:

10.1145/1518701.1518813]

43. Tullis T, Albert B. Measuring the user experience: collecting, analyzing, and presenting usability metrics, 2nd edition.

Waltham, USA: Morgan Kaufmann; 2013.

Abbreviations

BPMN: Business Process Modeling and Notation

IT: information technology

RQ: research question

JMIR Mhealth Uhealth 2018 | vol. 6 | iss. 6 | e148 | p.15http://mhealth.jmir.org/2018/6/e148/

(page number not for citation purposes)

Schobel et alJMIR MHEALTH AND UHEALTH

XSL

•

RenderX

SUS: System Usability Scale

Edited by C Dias; submitted 12.01.18; peer-reviewed by V Koutkias, G Ghinea, J Brooke, S Choemprayong; comments to author

02.03.18; revised version received 18.03.18; accepted 14.05.18; published 29.06.18

Please cite as:

Schobel J, Pryss R, Probst T, Schlee W, Schickler M, Reichert M

Learnability of a Configurator Empowering End Users to Create Mobile Data Collection Instruments: Usability Study

JMIR Mhealth Uhealth 2018;6(6):e148

URL: http://mhealth.jmir.org/2018/6/e148/

doi:10.2196/mhealth.9826

PMID:

in JMIR Mhealth and Uhealth (http://mhealth.jmir.org), 29.06.2018. This is an open-access article distributed under the terms of

the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use,

distribution, and reproduction in any medium, provided the original work, first published in JMIR mhealth and uhealth, is properly

cited. The complete bibliographic information, a link to the original publication on http://mhealth.jmir.org/, as well as this copyright

and license information must be included.

JMIR Mhealth Uhealth 2018 | vol. 6 | iss. 6 | e148 | p.16http://mhealth.jmir.org/2018/6/e148/

(page number not for citation purposes)

Schobel et alJMIR MHEALTH AND UHEALTH

XSL

•

RenderX