Seven Guidelines for Designing the User Interface
in Robotic Process Automation
Judith Wewerka
Institute of Databases and
Information Systems
Ulm University
Ulm, Germany
0000-0002-4809-2480
Christian Micus
Research and Development
BMW Group
Munich, Germany
0000-0002-1107-3964
Manfred Reichert
Institute of Databases and
Information Systems
Ulm University
Ulm, Germany
0000-0003-2536-4153
Abstract—Robotic Process Automation (RPA) aims to auto-
mate rule-based business process tasks by software robots (bots)
mimicking human interactions. Despite the partial automation
achieved with RPA, humans still need to interact with the bots,
which requires appropriate user interfaces. However, existing
RPA research has not evaluated RPA from a software-ergonomic
perspective so far and no corresponding user interface design
guidelines exist. The objective of this paper is to evaluate the
usability of RPA bots in industry and to provide user interface
design guidelines to bot developers. The results we obtained
from 50 questionnaires filled out by RPA users indicate that both
the input/output and the dialogue interfaces of RPA need to
be improved, especially regarding error tolerance, perceptibility,
directability of user’s attention, suitability for the task, and
availability. Finally, we derive seven guidelines for designing
the user interface of RPA bots. Potential improvements include,
among others, the quality of error messages, the efforts for error
handling, and the monitoring of the current status of the tasks
assigned to the bot.
Index Terms—Robotic Process Automation, User Interface
Design Guidelines, Software-Ergonomic Evaluation.
I. INTRODUCTION
Robotic Process Automation (RPA) supports companies
in optimizing and implementing business processes or parts
of them with software robots (bots). The latter automate
repetitive, rule-based process tasks and execute them in a
human-like fashion [5], e.g., by transferring data from various
input systems to systems of record [26]. As opposed to general
software systems, a bot is often implemented by domain
experts (e.g., knowledge workers) rather than by professional
software developers [41]. Note that RPA not only relieves
humans from executing routine process tasks, but also changes
their daily work procedures as the humans hand over tasks
to bots [34]. Thus, RPA has a significant impact on the
structure and organization of work activities, while changing
the procedures and roles of the involved actors [34], [37].
To keep the human in the loop, [6] proposes a human-
bot framework for RPA. However, a still unaddressed issue
is how to realize the interactions between humans and RPA
bots [40]. Even though RPA automates processes or parts
of them, it does not provide an end-to-end automation and
minimal interactions between the users and the bot are still
required. These interactions include delegating tasks to the
bot, reporting potential errors to the bot users, and informing
the latter upon completion of a task (cf. Figure 1). In existing
RPA implementations, the users often interact with the bot via
e-mail, by storing documents in the filesystem of the bot and
assigning the task to the bot in the respective system. The
bot, in turn, interacts with the users via e-mail or by storing
documents in a pre-defined filesystem [2], [14], [18], [40].
[Figure 1: the users delegate a task to the bot; the bot reports an error or the completion of the task to the users.]
Fig. 1. Possible interactions between the users and the RPA bot.
A user-friendly design of the interactions between humans
and RPA bots is needed to increase user acceptance [38].
However, as the knowledge workers implementing the RPA
bots are usually not software developers [34], [43], they are
inexperienced with optimizing the human-bot interactions
and, thus, ask for corresponding user interface design
guidelines [40]. This paper evaluates seven RPA bots from
the automotive industry regarding their usability. Based on the
evaluation results, we derive seven guidelines for designing
bot user interfaces, which shall help knowledge workers to
optimize the bot user interface.
The remainder of the paper is organized as follows: Section
II reviews literature on software ergonomics to obtain different
procedures and criteria for assessing user interfaces. Section III
adapts selected procedures and criteria to fit to the RPA context
and to obtain a questionnaire for the software-ergonomic
evaluation of a bot. The results we obtained from the distri-
bution of the questionnaire to 400 employees in a knowledge-
intensive domain are presented in Section IV. Finally, Section
V derives seven guidelines for the RPA user interface design
based on the results of the 50 completed questionnaires we
received.
II. BACKGROUND
This section presents background on software ergonomics
needed for understanding this work. Software Ergonomics
intends to measure the usability of interactive software sys-
tems (software for short) [17]. Three aspects are considered:
adequate functionality, i.e., the software efficiently supports
user tasks, correctness, i.e., correspondence between actual
software behavior and the predefined software performance,
and user interface, i.e., the user’s access to the software [30].
Figure 2 shows a user interface model (according to [16]),
which actually distinguishes between three user interfaces
[17]:
• The input/output interface covers the perceptibility and
  manageability of the software based on user input and
  software output.
• The dialogue interface comprises the interactions of the
  users with objects or functions of the software as well as
  their perception.
• The tool interface deals with the comfortable handling of
  the software and the access possibilities of the users to
  software functions.
[Figure 2: diagram with the elements User, Input/Output, Dialogue, Tool, Software, Organization (of work), Organization (of the software), and Business.]
Fig. 2. A model of the user interfaces (adopted from [16])
Assessment Techniques. In general, there are three tech-
niques to assess the usability of a software. The first one is
theory-based, i.e., design principles are derived from theory
and are evaluated in the software based on expert interviews
and checklists. The main goal is to avoid fundamental weak-
nesses of the software [21]. The second technique is task-
based, i.e., tasks are processed by the users to evaluate the
usability and usefulness of the software [21]. The third tech-
nique is the user-based one, i.e., interviews, questionnaires, or
experiments are conducted that ask users about expectations,
impressions, and experiences with the software. The user-
based method has a high informative value and is more
objective than the other two methods [17], [21].
Procedures. As we focus on the interaction between hu-
mans and bot, we solely consider the assessment of the user
interfaces with a user-based assessment technique. Adequate
functionality and correctness are not considered. User-based
techniques are chosen due to their objectivity and informative
value. In the following, we present three procedures to assess
the user interfaces:
• ISO Norm 9241-110. This norm sets the principles for
  designing the dialogue interface between humans and
  software. The aim is to examine drawbacks of the software
  based on the user’s assessment of seven design
  requirements: suitability for the task, learnability,
  individualization, conformity with user’s expectations,
  self-descriptiveness, controllability, and error tolerance [32].
• IFIP user interface model. The IFIP user interface model
  extends the ISO Norm with the organizational environment
  of the usage, i.e., the definition of rules determining
  the creation, definition, and distribution of work tasks
  [22]. The procedure systematically investigates four user
  interfaces: input/output, dialogue, tool, and organizational
  interface [16], [31]. The IFIP user interface model shall
  enable an objective assessment of the software and display
  complex behavior in a structured manner [22], [30].
• IsoMetrics. The model is built on the ISO Norm and
  provides 75 questions for the seven design requirements
  with a 5-point Likert answer scale. Additionally, each
  question is weighted by answering how important this
  aspect is for the overall impression of the software on a
  5-point Likert scale [10], [12]. This rating allows the users
  to report problems with the software [12].
Assessment Criteria. The tool, dialogue, and input/output
user interfaces are assessed based on different criteria. The
Tool Interface Criteria are used to assess how comfortable the
software can be handled as well as to assess access possibilities
to the functional scope of the software [17], [30]. The criteria
and explanations are given in Table I.
TABLE I
TOOL INTERFACE CRITERIA.
Criterion | Explanation
Availability | Software functions are available at any time.
Reliability | Software is reliable and does not generate system errors.
Reusability | Software can be used repeatedly and is deterministic.
Combinability | Users may combine functions.
Expandability | Users can program new functions.
Complexity | Operation of the software is easy.
Transparency | Function and response of the software to the user’s input are predictable.
Dialogue Interface Criteria measure the interaction of the
users with objects and functions of the software as well as the
perception of the software by the users [9], [30]. The criteria
and their explanations can be found in Table II.
Finally, Input/Output Interface Criteria evaluate the interac-
tion based on the input of the users and output of the software
[17], [30] (cf. Table III).
III. QUESTIONNAIRE-BASED EVALUATION OF RPA BOT
USABILITY
This section introduces the method we applied to evaluate
the usability of RPA bots. In this context, we slightly
adapt the procedures presented above to fit RPA bots.
The resulting questionnaire is presented and its objectivity,
validity, and reliability are assessed.
TABLE II
DIALOGUE INTERFACE CRITERIA.
Criterion | Explanation
Suitability for the task | Software supports the users in completing their process task.
Self-descriptiveness | For the users it is always obvious, which actions may be taken.
Conformity with user’s expectations | Dialogue corresponds to the user’s concerns.
Learnability | Software guides the users in learning how to use it.
Controllability | Users may start a dialogue and, then, influence its direction and speed until the user’s goal is achieved.
Error tolerance | Despite incorrect entries, work results should be achievable with minimal correction efforts.
Individualization | Users may tailor the presentation of information to suit individual skills and needs.
TABLE III
INPUT/OUTPUT INTERFACE CRITERIA.
Criterion | Explanation
Perceptibility | The perceptibility of information, brightness, contrasts, and volume.
Legibility | Given by size, spacing, and line spacing of the software.
Distinctness | Information given by the software is clearly recognizable and distinguishable.
Clarity | Presentation and arrangement of information is precise.
Orientation support | The design of functional structures facilitates the user’s orientation while using the software.
Directability of user’s attention | Software can focus user’s perception.
Manageability | Input systems of the software and corresponding feedback are useful.
Wholesomeness | Effect on users at their working place.
A. RPA-specific Procedure
To derive guidelines for designing the user interfaces of
RPA bots, i.e., to meet our research goal, we combine the IFIP
user interface and the IsoMetrics models (cf. Section II). The
former investigates all user interfaces and, therefore, is suitable
to account for the role changes of users coming with an RPA
project. We exclude the organizational interface, which varies
for different RPA bots and is not relevant for deriving user
interface design guidelines. Moreover, the weighting aspect of
the IsoMetrics model is included to address user needs and
to obtain information about weaknesses of the RPA software
[45]. All 14 criteria for evaluating the tool and dialogue
interfaces (cf. Tables I and II) are considered. Concerning
the 8 input/output interface criteria (cf. Table III), we exclude
wholesomeness as it differs for each user and is not relevant
in the context of our research goal.
For each of the remaining 21 criteria we consider two
statements (i.e., 42 statements in total) that can be rated on a
6-point Likert scale (1-6). We choose an even scale to avoid
a mediocre rating as feedback and to obtain a clear positive
or negative tendency [29]. Moreover, each statement can be
weighted as “unimportant” (1), “moderately important” (2),
or “very important” (3) by participants depending on their
satisfaction with the RPA bot they are working with. Table IV
provides an excerpt from the questionnaire for the availability
criterion.
The statements of all criteria for assessing the three user
interfaces are summarized in Table V.¹ These statements are
based on [17], [30], but have been slightly adapted to account
for the peculiarities of the RPA software.
We invite 400 engineers of a global automotive vendor to
participate in the survey. The subjects are selected based on
the following criteria:
• The RPA bot takes over a task that has been accomplished
  by the engineer before. Hence, the participants are RPA
  users.
• The interaction between the RPA bot and the engineer
  follows the schema depicted in Figure 1.
B. Objectivity, Validity, and Reliability of the Questionnaire
Before presenting the results obtained with the question-
naire, we assess its objectivity, validity, and reliability. In
general, the IsoMetrics questionnaire is considered as reliable
and valid [12]. However, as we slightly adapted this questionnaire,
we need to look at general measures to ascertain
its objectivity, validity, and reliability. A confirmatory factor
analysis is performed to test whether the statements measuring
the criteria are consistent with our understanding of the criteria
[20]. Structural equation modeling is used with the lavaan
package in R. Figure 3 shows the aspects to be assessed and
the measures used (in italic).
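[Figure 3: tree of the assessed aspects. Objectivity; Reliability, measured by Composite Reliability; Validity, split into Content Validity and Construct Validity, the latter split into Discriminant Validity (measured by the Fornell-Larcker-Criterion) and Convergent Validity (measured by Factor Loadings).]
Fig. 3. Overview of objectivity, validity, and reliability.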
Objectivity defines the extent to which results are indepen-
dent from the respective respondent [8]. It is achieved through
the design of the questionnaire, which is the same for every
participant, and the use of rating scales [10]. Moreover, the
evaluation of the scores needs to be standardized and easy to
understand [10]. All aspects are covered by the questionnaire,
i.e., objectivity is given.
Validity assesses the extent to which a criterion is accurately
measured by the statements [15]. To ensure validity, content
and construct validity need to be considered [10]. Content
validity is given if the statements adequately cover the mean-
ing of the variable to be assessed [15]. The questionnaire
follows the IFIP user interface as well as the IsoMetrics
¹ The original questionnaire as well as the raw data of the 50 returned
questionnaire instances are available via the following link to ResearchGate
TABLE IV
EXCERPT FROM THE QUESTIONNAIRE: Availability CRITERION.
Availability Criterion
Rating: Please indicate your consent to this statement.
Weighting: Please weight the importance of this statement regarding your satisfaction with the RPA bot.
Statement | Rating options | Weighting options
Statement 1: The RPA bot is always available when I want to use it. | I strongly disagree / I moderately disagree / I somewhat disagree / I somewhat agree / I moderately agree / I strongly agree | Unimportant / Moderately important / Very important
Statement 2: My work with the RPA bot is not affected by disruptions or long response times. | I strongly disagree / I moderately disagree / I somewhat disagree / I somewhat agree / I moderately agree / I strongly agree | Unimportant / Moderately important / Very important
models. Both are based on ISO Norm 9241-110 and use two
statements for each criterion. Consequently, content validity is
ensured. Construct validity, in turn, refers to the degree to
which the score of a statement corresponds to the criterion the
measure is intended to operationalize [11]. It is further divided
into discriminant and convergent validity [10]. Discriminant
validity corresponds to the degree to which a criterion is
truly distinct from others. The Fornell-Larcker-Criterion is
used to assess discriminant validity by assessing whether
each criterion shares more variance with the corresponding
statement than with other statements in the questionnaire [13].
Therefore, the diagonal elements in Table VIII, which are
the square roots of the average variance extracted, must be
greater than the correlations of the latent variables in the off-
diagonal elements [1]. This is fulfilled for all criteria and
discriminant validity is given. Finally, convergent validity is
evaluated. Here, we assess whether the statements measuring a
criterion behave as if they are measuring a common underlying
variable [7]. Factor Loadings corresponding to the correlation
coefficients of each statement with its criterion are used for
evaluation [4]. Values range from 0.71 (MAN 2) to 0.99 (SUI
1) (cf. Table VI) and exceed the suggested threshold of 0.70
[28]. Hence, convergent validity is fulfilled and validity in
general is given for our questionnaire.
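The two construct validity checks described above can be illustrated with a minimal sketch. It assumes standardized factor loadings, computes the average variance extracted (AVE) as the mean of the squared loadings, and compares its square root with correlations to other criteria. The loadings are taken from Table VI; the correlations are hypothetical placeholders, since Table VIII is not reproduced here, and the function names are illustrative only.

```python
import math

LOADING_THRESHOLD = 0.70  # suggested threshold for convergent validity [28]


def average_variance_extracted(loadings):
    """Mean of the squared standardized factor loadings of one criterion."""
    return sum(l * l for l in loadings) / len(loadings)


def has_convergent_validity(loadings):
    """True if every statement loads on its criterion at or above the threshold."""
    return all(l >= LOADING_THRESHOLD for l in loadings)


def passes_fornell_larcker(loadings, correlations_with_other_criteria):
    """True if sqrt(AVE) exceeds every correlation with another criterion."""
    sqrt_ave = math.sqrt(average_variance_extracted(loadings))
    return all(abs(r) < sqrt_ave for r in correlations_with_other_criteria)


availability_loadings = (0.82, 0.76)      # AVA 1 / AVA 2, Table VI
hypothetical_correlations = (0.41, 0.55)  # NOT from the paper, for illustration

sqrt_ave = math.sqrt(average_variance_extracted(availability_loadings))
print(round(sqrt_ave, 2))
print(has_convergent_validity(availability_loadings))
print(passes_fornell_larcker(availability_loadings, hypothetical_correlations))
```

In practice, the correlations would be read from the latent variable correlation matrix estimated by the structural equation model rather than supplied by hand.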
Reliability refers to the accuracy of the questionnaire [15].
We use composite reliability to measure internal consistency
[3] and obtain values between 0.69 (legibility) and 0.97
(suitability for the task) (cf. Table VI). All values exceed the
threshold of 0.60 as suggested for exploratory research [3].
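As an illustration, composite reliability can be approximated from the factor loadings in Table VI. The sketch below assumes the common formula for standardized loadings, in which the error variance of a statement is one minus its squared loading; the paper does not state the exact computation, so results based on the rounded loadings may deviate from Table VI by about 0.01.

```python
EXPLORATORY_THRESHOLD = 0.60  # threshold suggested for exploratory research [3]


def composite_reliability(loadings):
    """Composite reliability from standardized loadings (assumed formula)."""
    squared_sum = sum(loadings) ** 2
    error_variance = sum(1 - l * l for l in loadings)
    return squared_sum / (squared_sum + error_variance)


# Factor loadings of the two statements per criterion, from Table VI.
cr_availability = composite_reliability((0.82, 0.76))
cr_legibility = composite_reliability((0.74, 0.72))

print(round(cr_availability, 2))
print(cr_legibility > EXPLORATORY_THRESHOLD)
```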
IV. RESULTS
This section evaluates the questionnaires returned by
the participants. First, results of the descriptive analysis
are given. Second, the rating and weight of the statements
are detailed. We receive 50 of the 400 questionnaires back,
resulting in a response rate of 12.5%. The respondents are
working with seven different RPA bots in total that realize the
human robot interaction as shown in Figure 1. All seven bots
have been released in 2020 and can be seen as state-of-the-
art bots. Regarding the experience with RPA (cf. Figure 4a),
16% (N=8) of the respondents have been using the bot less
than a month, 36% (N=18) between one and three months,
14% (N=7) between three and six months, and 34% (N=17)
for more than six months. Regarding age (cf. Figure 4b): 10%
(N=5) of the respondents are 30 years or younger, 56% (N=28)
are between 31 and 40 years old, 26% (N=13) between 41 and
50, and 8% (N=4) older than 50.
[Figure 4: two bar charts showing the number of respondents. a) Experience with RPA: < 1 month: 8; 1 - 3 months: 18; 3 - 6 months: 7; > 6 months: 17. b) Age: 30 or younger: 5; 31 - 40: 28; 41 - 50: 13; > 50: 4.]
Fig. 4. Descriptive Statistics: a) Experience with RPA, b) Age.
In the first step, we evaluate the 21 criteria based on
the average rating and weight of the statements in the
questionnaire, i.e., the answer scores corresponding to the two
statements of a criterion are averaged (cf. Table VII). Note that
the rating is based on a 6-point Likert scale and the weighting
on a 3-point Likert scale. The values for rating the criteria lie
between 3.32 for individualization and 4.99 for expandability.
The top five rated criteria are expandability (4.99), orienta-
tion support (4.97), reusability (4.95), complexity (4.90), and
conformity with user’s expectation (4.84). The weights, which
indicate the importance of the criteria, lie between 2.00 for
individualization and 2.85 for reliability and suitability for
the task. The criteria evaluated with the five highest weights
are reliability and suitability for the task (2.85), directability
of user’s attention (2.74), conformity with user’s expectations
(2.67), and expandability (2.66).
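The averaging step can be reproduced from the statement means in Table VI. The following minimal sketch uses a subset of four criteria and assumes that every respondent rated both statements, so that averaging the two statement means equals averaging the raw scores; the variable and function names are illustrative.

```python
# Mean rating of statement 1 and statement 2 per criterion (subset of Table VI).
statement_means = {
    "Availability": (4.57, 4.33),
    "Expandability": (5.00, 4.97),
    "Controllability": (3.94, 3.11),
    "Individualization": (3.08, 3.56),
}


def criterion_rating(first_statement, second_statement):
    """Average the scores of the two statements that measure one criterion."""
    return (first_statement + second_statement) / 2


ratings = {
    name: criterion_rating(*means) for name, means in statement_means.items()
}

# Criteria ordered from best to worst rating.
ranked = sorted(ratings, key=ratings.get, reverse=True)

for name in ranked:
    print(f"{name}: {ratings[name]:.3f}")
```

For availability, for example, this yields (4.57 + 4.33) / 2 = 4.45, which matches the value reported in Table VII.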
In the second step, we evaluate the three user interfaces.
For this purpose, we take the median of the data from the
criteria to the three interfaces (cf. medians in Table VII). The
TABLE V
STATEMENTS FOR THE THREE INTERFACES.
Tool interface
Availability AVA 1: The RPA bot is always available when I want to use it.
AVA 2: My work with the RPA bot is not affected by disruptions or long response times.
Reliability REL 1: The RPA bot works as required without any complications.
REL 2: When I work with the RPA bot, no system errors (e.g. crash or incorrect execution) occur.
Reusability REU 1: I can run the RPA bot as often as I want.
REU 2: If I use the RPA bot repeatedly, I get the same result with the same input, i.e., the RPA bot behaves deterministically.
Combinability COM 1: The RPA bot has a modular structure and can be used for sub-tasks as well.
COM 2: It is easy to use the RPA bot for similar tasks.
Expandability EXP 1: If the system or the task changes, the RPA bot can be adapted.
EXP 2: The RPA bot can be expanded to include additional sub-tasks.
Complexity COP 1: The terms the RPA bot uses are understandable to me.
COP 2: It is easy to run the RPA bot to handle the task as desired.
Transparency TRA 1: The result of the RPA bot is predictable for me.
TRA 2: The RPA bot gives me feedback on the progress of my task.
Dialogue interface
Suitability for the task SUI 1: The RPA bot is well aligned to meet the requirements of my working tasks.
SUI 2: The RPA bot supports me in completing my tasks and is not an additional burden.
Self-descriptiveness SED 1: The RPA bot gives me enough information about what inputs are allowed and what data may be used.
SED 2: I am fully aware of the purpose and scope of the RPA bot.
Conformity with user’s expectations CON 1: The RPA bot works exactly as I expect it to.
CON 2: The RPA bot performs the task in the same way as if done manually.
Learnability LEA 1: I only have to remember few details to run the RPA bot.
LEA 2: The RPA bot requires little learning and supports me in learning how to use it.
Controllability CTR 1: I can adapt the type and scope of RPA inputs and outputs (e.g., the result is available in different file formats).
CTR 2: I can adapt the reaction time and the speed of executing the RPA bot to my individual needs.
Error tolerance ERR 1: The RPA bot creates easily understandable error messages that help me to fix the error.
ERR 2: The RPA bot only requires little efforts for correcting errors.
Individualization IND 1: The workflow and the processing order of the RPA bot can be adjusted to my individual needs.
IND 2: The RPA bot can be easily adapted to my personal way of working.
Input/output interface
Perceptibility PER 1: The information required for processing tasks is always in the right place on the screen.
PER 2: It can be seen whether the RPA bot has completed the task.
Legibility LEG 1: The readability of the texts and characters created by the RPA bot is good.
LEG 2: The results of the RPA bot can be presented according to my individual requirements.
Distinctness DIS 1: I can clearly assign the feedback received from the RPA bot to the triggering process.
DIS 2: Results the RPA bot delivers cannot be distinguished from the ones obtained through manual processing.
Clarity CLA 1: Information and messages from the RPA bot are clearly displayed.
CLA 2: Information and messages from the RPA bot are displayed in the same way on different output media.
Orientation support ORS 1: The good design of the RPA bot eases its use.
ORS 2: The representations of the RPA bot are consistent and foster its use.
Directability of user’s attention DOA 1: The RPA bot clearly indicates when a task is completed, a task is aborted, or problems occur.
DOA 2: The RPA bot does not stop me from doing other work while it is running.
Manageability MAN 1: The RPA bot can be operated individually, not just following a rigid procedure.
MAN 2: If the RPA bot or I make a mistake during task processing, I can easily undo the faulty operation and restore the original data.
tool interface is rated with 4.74 and weighted with 2.55, the
dialogue interface has a rating of 4.43 and a weight of 2.53,
and the input/output interface is evaluated with 4.37 and
weighted with 2.47. Note that we decided to use the median
instead of the average value, as it is more robust to outliers
and better suited for ordinal scales [19].
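The interface medians can be recomputed from the criterion scores in Table VII. The sketch below copies the (rating, weight) pairs from that table and uses the median function of Python's standard library; the data layout and function name are illustrative choices.

```python
from statistics import median

# (rating, weight) per criterion, copied from Table VII.
TABLE_VII = {
    "Tool": {
        "Availability": (4.45, 2.58),
        "Reliability": (4.73, 2.85),
        "Reusability": (4.95, 2.52),
        "Combinability": (4.28, 2.36),
        "Expandability": (4.99, 2.66),
        "Complexity": (4.90, 2.55),
        "Transparency": (4.74, 2.41),
    },
    "Dialogue": {
        "Suitability for the task": (4.53, 2.85),
        "Self-descriptiveness": (4.61, 2.53),
        "Conformity with user's expectations": (4.84, 2.67),
        "Learnability": (4.43, 2.27),
        "Controllability": (3.53, 2.08),
        "Error tolerance": (3.80, 2.59),
        "Individualization": (3.32, 2.00),
    },
    "Input/Output": {
        "Perceptibility": (4.24, 2.59),
        "Legibility": (4.62, 2.47),
        "Distinctness": (4.54, 2.33),
        "Clarity": (4.25, 2.19),
        "Orientation support": (4.97, 2.52),
        "Directability of user's attention": (4.37, 2.74),
        "Manageability": (3.95, 2.30),
    },
}


def medians(scores):
    """Return (median rating, median weight) for (rating, weight) pairs."""
    pairs = list(scores)
    return median(r for r, _ in pairs), median(w for _, w in pairs)


interface_medians = {
    name: medians(criteria.values()) for name, criteria in TABLE_VII.items()
}
overall_medians = medians(
    pair for criteria in TABLE_VII.values() for pair in criteria.values()
)

print(interface_medians)
print(overall_medians)
```

Because each interface has seven criteria and the overall set has 21, every median is the middle value of a sorted list and therefore one of the reported scores.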
In the third step, we evaluate the relationship between rating
and weight (cf. Figure 5). The distance between these two
values provides information on the acceptance of the criteria
[12]. We use a bubble chart to illustrate this relationship [36].
The chart is divided along the two median values of all criteria
(i.e., 4.53 for the rating and 2.52 for the weight) into four parts:
Top right: criteria with a high rating being important for
the users.
Bottom left: criteria with a low rating not being important
for the users.
Top left: criteria with a low rating, but being important
for the users - these criteria should be put more into focus.
Bottom right: criteria with a high rating, but not being
important for the users - these criteria should be less in
the focus.
The criteria having a low rating and being weighted as
important include error tolerance, perceptibility, directability
of user’s attention, availability, and suitability for the task.
The latter lies on the line of the median and is, therefore,
considered as well.
In contrast, the criteria with high rating and weighted as
unimportant are distinctness, legibility, and transparency. We
add reusability and orientation support, which both lie on the
median line, to this list.
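The quadrant assignment can be sketched as a simple classification against the two medians (4.53 for the rating, 2.52 for the weight). The scores are copied from Table VII. How ties are handled is an assumption made here to reproduce the two lists above: a rating on the median line counts as low, and a weight on the median line counts as unimportant.

```python
RATING_MEDIAN, WEIGHT_MEDIAN = 4.53, 2.52

# (rating, weight) per criterion, copied from Table VII.
SCORES = {
    "Availability": (4.45, 2.58),
    "Reliability": (4.73, 2.85),
    "Reusability": (4.95, 2.52),
    "Combinability": (4.28, 2.36),
    "Expandability": (4.99, 2.66),
    "Complexity": (4.90, 2.55),
    "Transparency": (4.74, 2.41),
    "Suitability for the task": (4.53, 2.85),
    "Self-descriptiveness": (4.61, 2.53),
    "Conformity with user's expectations": (4.84, 2.67),
    "Learnability": (4.43, 2.27),
    "Controllability": (3.53, 2.08),
    "Error tolerance": (3.80, 2.59),
    "Individualization": (3.32, 2.00),
    "Perceptibility": (4.24, 2.59),
    "Legibility": (4.62, 2.47),
    "Distinctness": (4.54, 2.33),
    "Clarity": (4.25, 2.19),
    "Orientation support": (4.97, 2.52),
    "Directability of user's attention": (4.37, 2.74),
    "Manageability": (3.95, 2.30),
}


def quadrant(rating, weight):
    """Assign a criterion to one of the four parts of the bubble chart."""
    low_rating = rating <= RATING_MEDIAN   # assumed tie rule: counts as low
    important = weight > WEIGHT_MEDIAN     # assumed tie rule: counts as unimportant
    if low_rating and important:
        return "focus more"
    if not low_rating and not important:
        return "focus less"
    if low_rating:
        return "low rating and unimportant"
    return "high rating and important"


focus_more = sorted(n for n, s in SCORES.items() if quadrant(*s) == "focus more")
focus_less = sorted(n for n, s in SCORES.items() if quadrant(*s) == "focus less")

print(focus_more)
print(focus_less)
```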
V. DERIVING GUIDELINES FOR RPA USER INTERFACE
DESIGN
Based on the results presented in Section IV, this section
derives seven guidelines for designing the user interfaces of
RPA bots. These guidelines shall help knowledge workers
without IT background to successfully implement RPA bots
TABLE VI
MEAN, FACTOR LOADING, AND COMPOSITE RELIABILITY FOR EACH STATEMENT.
Criterion Statement Mean Factor Loading Composite Reliability
Availability AVA 1/AVA 2 4.57/4.33 0.82/0.76 0.77
Reliability REL 1/REL 2 4.86/4.60 0.89/0.81 0.84
Reusability REU 1/REU 2 4.57/5.33 0.78/0.76 0.75
Combinability COM 1/COM 2 4.25/4.31 0.76/0.75 0.73
Expandability EXP 1/EXP 2 5.00/4.97 0.94/0.84 0.89
Complexity COP 1/COP 2 5.12/4.68 0.79/0.76 0.75
Transparency TRA 1/TRA 2 5.12/4.36 0.78/0.75 0.74
Suitability for the task SUI 1/SUI 2 4.63/4.43 0.99/0.94 0.97
Self-descriptiveness SED 1/SED 2 4.29/4.93 0.75/0.85 0.78
Conformity with user’s expectations CON 1/CON 2 4.85/4.83 0.91/0.90 0.90
Learnability LEA 1/LEA 2 4.51/4.35 0.88/0.94 0.91
Controllability CTR 1/CTR 2 3.94/3.11 0.80/0.96 0.87
Error tolerance ERR 1/ERR 2 3.54/4.06 0.91/0.83 0.87
Individualization IND 1/IND 2 3.08/3.56 0.82/0.79 0.79
Perceptibility PER 1/PER 2 4.09/4.39 0.78/0.79 0.77
Legibility LEG 1/LEG 2 4.92/4.31 0.74/0.72 0.69
Distinctness DIS 1/DIS 2 4.88/4.20 0.82/0.85 0.82
Clarity CLA 1/CLA 2 4.53/3.97 0.82/0.90 0.85
Orientation support ORS 1/ORS 2 4.92/5.03 0.82/0.81 0.80
Directability of user’s attention DOA 1/DOA 2 3.93/4.82 0.79/0.83 0.79
Manageability MAN 1/MAN 2 3.63/4.26 0.80/0.71 0.73
TABLE VII
RESULTS OF THE SOFTWARE ERGONOMIC EVALUATION.
Interface Criterion Rating Weight
Tool
Availability 4.45 2.58
Reliability 4.73 2.85
Reusability 4.95 2.52
Combinability 4.28 2.36
Expandability 4.99 2.66
Complexity 4.90 2.55
Transparency 4.74 2.41
Tool - Median 4.74 2.55
Dialogue
Suitability for the task 4.53 2.85
Self-descriptiveness 4.61 2.53
Conformity with user’s expectations 4.84 2.67
Learnability 4.43 2.27
Controllability 3.53 2.08
Error tolerance 3.80 2.59
Individualization 3.32 2.00
Dialogue - Median 4.43 2.53
Input/Output
Perceptibility 4.24 2.59
Legibility 4.62 2.47
Distinctness 4.54 2.33
Clarity 4.25 2.19
Orientation support 4.97 2.52
Directability of user’s attention 4.37 2.74
Manageability 3.95 2.30
Input/Output - Median 4.37 2.47
Median 4.53 2.52
satisfying the needs of the users in respect to their interaction
with the RPA bot.
Five of the top seven rated criteria refer to the tool interface.
Contemporary RPA implementations have focused on imple-
menting reliable, reusable, expandable, simple, and transparent
RPA bots [25], [26]. Consequently, these criteria are now the
ones with the highest evaluation. When taking a look at the
criteria with the highest weights, i.e., the highest importance
for the users, there are criteria from all three user interfaces:
Of the top eight weighted criteria, three refer to the tool, three
to the dialogue, and two to the input/output user interfaces
(cf. Table VII). Due to the discrepancy between importance
and rating of the criteria, RPA bot developers need to focus
not only on reliable and expandable implementations, but also
on aspects of the dialogue and input/output interfaces. Among
others, these include the suitability for the task, directability
of user’s attention, error tolerance, and perceptibility.
Examining the median evaluation values of the three user
interfaces, the tool interface is the best evaluated one (4.74)
and is weighted as most important (2.55). Note that the
weights of the dialogue (2.53) and input/output interfaces
(2.47) are nearly as high. The ratings of the dialogue (4.43)
and the input/output interfaces (4.37), however, are far below
the one of the tool interface and, therefore, should be put
more into focus.
Seven Guidelines for Designing the User Interface in
Robotic Process Automation
1. Improve the quality and comprehensibility of the bot
error messages.
2. Minimize the efforts for correcting bot errors.
3. Ensure visibility of the current status of the task.
4. Attract the attention of users when their task is com-
pleted, aborted, or any problem occurs.
5. Guarantee that the users obtain the results from the bot
within a reasonable response time.
6. Take care that no additional efforts are required to use
the RPA bot.
7. Do not over-emphasize legibility, transparency, and dis-
tinctness.
We derive the guidelines by investigating the criteria that
show a discrepancy between rating and weight. We emphasize
[Figure 5: bubble chart of the 21 criteria with the rating on the x-axis (3.20 to 5.10) and the weight on the y-axis (1.80 to 3.00), divided along the medians into four parts: "Low rating but important (focus more)", "High rating and important", "Low rating and unimportant", and "High rating but unimportant (focus less)".]
Fig. 5. Relationship between rating and weight of the criteria.
five criteria with low rating and high weight. Note that four
of them refer to the dialogue or the input/output interfaces.
Possible improvements of the interaction between human and
bot are indicated in Figure 6.
[Figure 6: the interactions of Figure 1 (delegate a task, report an error, completion of the task) extended by a layer between users and bot, through which users ask for the current status of a task and receive information on the current status.]
Fig. 6. Improved interactions between the users and the RPA bot.
First and foremost, error tolerance needs to be ensured.
Both statements related to this criterion are poorly rated. As
a consequence, developers should improve the quality and
comprehensibility of the bot error messages (1) to support
users in understanding and fixing the errors (cf. quality symbol
in Figure 6). In a second step, the efforts for correcting bot
errors need to be minimized (2), i.e., the users should be
able to solve errors and problems without being RPA experts.
Note that this aspect is rarely addressed in literature; only [18]
suggests creating novel solutions for error handling.
Another criterion to which more attention should be paid
during bot implementation is perceptibility. According to the
results, users complain that it is not apparent
when a bot has completed an assigned task. Consequently, the
RPA input/output interface needs to be improved. A simple
notification via e-mail does not seem to be sufficient. Instead, a
chatbot monitoring the progress and answering questions like
“How far is the processing of task xy?” or “Can you send me
a chat message when task xy is finished?” could ensure the
visibility of the current status of the task (3). Alternatively, a
dashboard showing all tasks assigned to the bot as well as their
status, e.g., “being processed” or “in queue”, can be used (see
the layer between human and bot in Figure 6). Then, users
can monitor whether their tasks are completed or assess on
their own how long they have to wait for completion, taking
the number of tasks in the bot queue into account (see the
interaction between users and dashboard/chatbot in Figure 6
on the right). In the literature, the combination of RPA bots and
chatbots is still in its infancy. The authors of [33] develop a
framework for proactive chatbots communicating with the users. Their
aim is to provide a user-friendly interface to spread RPA adoption.
Using a dashboard to report the status of RPA bots is proposed
in [23], [35].
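The dashboard idea of guideline (3) can be sketched as follows. The sketch assumes that the bot processes its queue in list order and that a rough average duration per task is known; the names `Task`, `tasks_ahead`, and `estimated_wait_minutes` as well as the example queue are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    task_id: str
    owner: str
    status: str  # "in queue", "being processed", or "completed"


def tasks_ahead(queue: List[Task], task_id: str) -> Optional[int]:
    """Number of unfinished tasks the bot handles before the given one.

    Assumption: the list order equals the processing order of the bot.
    Returns None if the task is unknown or already completed.
    """
    pending = [t for t in queue if t.status != "completed"]
    for position, task in enumerate(pending):
        if task.task_id == task_id:
            return position
    return None


def estimated_wait_minutes(
    queue: List[Task], task_id: str, minutes_per_task: float
) -> Optional[float]:
    """Crude estimate: tasks ahead times an assumed average duration."""
    ahead = tasks_ahead(queue, task_id)
    return None if ahead is None else ahead * minutes_per_task


# Made-up queue for illustration.
queue = [
    Task("t1", "alice", "completed"),
    Task("t2", "bob", "being processed"),
    Task("t3", "alice", "in queue"),
    Task("t4", "carol", "in queue"),
]

for task in queue:
    wait = estimated_wait_minutes(queue, task.task_id, minutes_per_task=5.0)
    note = "" if wait is None else f", start in about {wait:.0f} min"
    print(f"{task.task_id} ({task.owner}): {task.status}{note}")
```

The same two functions could back a chatbot answer to a question such as “How far is the processing of task xy?”.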
The low-rated criterion directability of user’s attention points
in the same direction. The first statement, i.e., the bot clearly
indicates when a task is completed or aborted, or when problems
occur, is poorly assessed. One could think of different possibilities
to attract the attention of the users (4), e.g., displaying a pop-up
text message or playing a sound (cf. sound symbol in Figure
6). The bot should not only send an e-mail to the users upon
completion, but also inform them if any error occurs. So far,
the literature has focused on communicating task completion to
the users by e-mail [2], [14], [18], [40]. No other types of
communication are reported.
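Guideline (4) suggests notifying users through more than one channel and for more than one kind of event. The sketch below shows one possible structure, assuming three event types (completed, aborted, error) and pluggable channels. The class `Notifier` is hypothetical, and the channels are simple stand-ins that collect messages in lists instead of showing a real pop-up or sending a real e-mail.

```python
from typing import Callable, Dict, List

Channel = Callable[[str], None]

# Events the bot should clearly indicate, with user-facing wording.
EVENT_TEXT = {
    "completed": "has been completed.",
    "aborted": "was aborted.",
    "error": "ran into an error.",
}


class Notifier:
    """Hypothetical dispatcher that routes bot events to channels."""

    def __init__(self) -> None:
        self._channels: Dict[str, List[Channel]] = {e: [] for e in EVENT_TEXT}

    @staticmethod
    def _require_known(event: str) -> None:
        if event not in EVENT_TEXT:
            raise ValueError(f"unknown event: {event}")

    def subscribe(self, event: str, channel: Channel) -> None:
        self._require_known(event)
        self._channels[event].append(channel)

    def notify(self, event: str, task: str) -> int:
        """Send the message to every channel; return how many were used."""
        self._require_known(event)
        message = f"Task '{task}' {EVENT_TEXT[event]}"
        for channel in self._channels[event]:
            channel(message)
        return len(self._channels[event])


# Stand-in channels: lists instead of a real pop-up or e-mail client.
popups: List[str] = []
emails: List[str] = []

notifier = Notifier()
notifier.subscribe("completed", emails.append)
for event in ("aborted", "error"):
    notifier.subscribe(event, popups.append)  # more intrusive channel
    notifier.subscribe(event, emails.append)

notifier.notify("error", "task xy")
print(popups)
```

In this configuration, completion is reported by e-mail only, whereas aborts and errors additionally trigger the more intrusive channel.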
Regarding the availability criterion, its second statement is
poorly rated. During RPA implementation, therefore, it needs
to be guaranteed that users obtain results from the bot
within a reasonable response time (5) (cf. speed symbol in
Figure 6). In general, availability is assumed to be improved
by RPA. Several publications emphasize that RPA bots are
available 24/7 [24], [27]. However, an improvement in processing
time through RPA cannot be taken for granted [39]. Some case
studies achieve fast RPA bots, e.g., minutes instead of days
[42] or 15 minutes instead of 6 hours [44]. Other projects
barely improve response times, e.g., 431 seconds instead of 440
seconds [2]. Moreover, whether results are obtained within a
reasonable amount of time remains a subjective perception.
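To put the reported figures into perspective, the relative reduction in processing time can be computed as (before − after) / before. The short calculation below uses only the two cases above for which both values are stated; the function name `relative_reduction` is chosen for illustration.

```python
def relative_reduction(before: float, after: float) -> float:
    """Share of the original processing time that was saved."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Durations in seconds, taken from the cases cited in the text.
cases = {
    "[44]: 6 hours -> 15 minutes": (6 * 60 * 60, 15 * 60),
    "[2]: 440 seconds -> 431 seconds": (440, 431),
}

for label, (before, after) in cases.items():
    print(f"{label}: {relative_reduction(before, after):.1%} shorter")
```

The first case corresponds to a reduction of about 95.8%, the second to only about 2.0%, which illustrates how widely the effect of RPA on processing time varies.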
Based on the described guidelines, one can assume that the
suitability for the task can be improved as well. Currently,
users complain that the RPA bot introduces additional effort.
If, on the contrary, users are informed about the status of
their task (perceptibility), are immediately informed upon
task completion (directability of user’s attention), are
provided with useful error messages to correct errors themselves
(error tolerance), and receive answers from the bot within a
reasonable response time (availability), the image of RPA should
improve and it might be seen as real support. Therefore, the
RPA developer should ensure that no additional effort
is required to use the RPA bot (6).
Finally, concerning the criteria with high rating and low
weight, we recommend that the developers no longer
overemphasize those aspects (7), e.g., legibility or transparency.
Obviously, transparent software providing distinguishable
and legible information is important to its users, but the main
focus needs to be shifted.
VI. CONCLUSIONS
This work focuses on the design of the user interface
for RPA bots, which is subdivided into the tool, dialogue,
and input/output interfaces. Fifty engineers from an automotive
company, who are experienced in interacting with at least one
RPA bot, participated in the survey, which asks for ratings
and weights of 21 different criteria. The survey confirms that
the tool interface of contemporary bots is well perceived by
users. By contrast, the dialogue and input/output interfaces for
RPA need to be improved. Finally, we derive seven guidelines
for designing user interfaces in RPA. In future work, the
survey needs to be repeated in other domains to ensure
generalizability of results. The evaluation can help companies
to implement RPA more successfully by optimizing the user
interface design. The derived guidelines should be followed,
and it should be monitored whether the rating of the RPA
software improves.
APPENDIX
TABLE VIII
FORNELL-LARCKER-CRITERION.
    AVA  REL  REU  COM  EXP  COP  TRA  SUI  SED  CON  LEA  CTR  ERR  IND  PER  LEG  DIS  CLA  ORS  DOA  MAN
AVA 0.79
REL 0.70 0.87
REU 0.70 0.68 0.77
COM 0.53 0.41 0.54 0.76
EXP 0.36 0.50 0.52 0.61 0.89
COP 0.53 0.50 0.75 0.61 0.49 0.80
TRA 0.49 0.44 0.40 0.61 0.45 0.35 0.77
SUI 0.70 0.31 0.39 0.47 0.21 0.26 0.24 0.97
SED 0.58 0.26 0.31 0.64 0.26 0.15 0.38 0.74 0.81
CON 0.66 0.26 0.40 0.47 0.12 0.20 0.44 0.91 0.77 0.91
LEA 0.54 0.14 0.37 0.60 0.08 0.47 0.31 0.65 0.67 0.72 0.91
CTR 0.68 0.32 0.41 0.69 0.22 0.49 0.33 0.63 0.69 0.59 0.83 0.89
ERR 0.55 0.38 0.48 0.73 0.58 0.39 0.56 0.49 0.63 0.52 0.58 0.74 0.87
IND 0.53 0.28 0.28 0.75 0.30 0.33 0.37 0.71 0.78 0.68 0.63 0.79 0.75 0.83
PER 0.64 0.41 0.42 0.68 0.33 0.47 0.43 0.65 0.67 0.67 0.78 0.75 0.67 0.68 0.79
LEG 0.36 0.24 0.34 0.61 0.29 0.37 0.07 0.60 0.72 0.46 0.66 0.66 0.52 0.62 0.66 0.74
DIS 0.27 0.09 0.28 0.67 0.37 0.18 0.27 0.61 0.68 0.63 0.54 0.45 0.47 0.63 0.47 0.59 0.87
CLA 0.46 0.32 0.35 0.76 0.37 0.31 0.38 0.58 0.83 0.59 0.68 0.82 0.75 0.84 0.65 0.77 0.72 0.86
ORS 0.56 0.19 0.38 0.60 0.27 0.39 0.18 0.73 0.66 0.67 0.77 0.77 0.70 0.77 0.74 0.65 0.70 0.73 0.82
DOA 0.53 0.15 0.46 0.35 0.10 0.29 0.09 0.81 0.45 0.75 0.64 0.49 0.39 0.46 0.57 0.53 0.60 0.41 0.77 0.85
MAN 0.52 0.18 0.32 0.43 0.22 0.31 0.29 0.49 0.41 0.57 0.61 0.66 0.64 0.59 0.47 0.19 0.48 0.52 0.71 0.54 0.75
REFERENCES
[1] M. Ab Hamid, W. Sami, and M. Sidek, “Discriminant validity
assessment: Use of Fornell & Larcker criterion versus HTMT criterion,”
in J Physics Conf S, vol. 890, 2017.
[2] S. Aguirre and A. Rodriguez, “Automation of a Business Process Using
Robotic Process Automation (RPA): A Case Study,” in Works Eng Appl,
2017, pp. 65–71.
[3] M. Al-Emran, V. Mezhuyev, and A. Kamaludin, “PLS-SEM in
Information Systems Research: A Comprehensive Methodological
Reference,” in Int Conf Adv Intell Syst Informat, 2018, pp. 644–653.
[4] J. C. Anderson and D. W. Gerbing, “Structural equation modeling in
practice: A review and recommended two-step approach,” Psychol Bull,
vol. 103, no. 3, pp. 411–423, 1988.
[5] A. Asatiani and E. Penttinen, “Turning robotic process automation into
commercial success - Case OpusCapita,” J Inform Technol Teach Cases,
vol. 6, no. 2, pp. 67–74, 2016.
[6] R. Cabello, M. J. Escalona, and J. G. Enríquez, “Beyond the Hype: RPA
Horizon for Robot-Human Interaction,” in Int Conf Bus Proc Manag,
2020, pp. 185–199.
[7] F. D. Davis, “Perceived Usefulness, Perceived Ease of Use, and User
Acceptance of Information Technology,” MIS Q, vol. 13, no. 3,
pp. 319–339, 1989.
[8] N. Döring and J. Bortz, Forschungsmethoden und Evaluation. Springer,
2016.
[9] W. Dzida, “The Development of Ergonomic Standards,” SIGCHI Bull,
vol. 20, no. 3, pp. 35–42, 1989.
[10] K. Figl, “Deutschsprachige Fragebögen zur Usability-Evaluation im
Vergleich,” Zeits Arbeitswissen, vol. 4, pp. 321–337, 2010.
[11] M. Fishbein and I. Ajzen, Belief, attitude, intention, and behavior: An
introduction to theory and research, 1977.
[12] G. Gediga, K.-C. Hamborg, and I. Düntsch, “The IsoMetrics usability
inventory: an operationalization of ISO 9241-10 supporting summative
and formative evaluation of software systems,” Behav Inform Technol,
vol. 18, no. 3, pp. 151–164, 1999.
[13] J. F. Hair, C. M. Ringle, and M. Sarstedt, “PLS-SEM: Indeed a Silver
Bullet,” J Market Theo Prac, vol. 19, no. 2, pp. 139–152, 2011.
[14] P. Hallikainen, R. Bekkhus, and S. L. Pan, “How OpusCapita Used
Internal RPA Capabilities to Offer Services to Clients,” MIS Q Exec,
vol. 17, no. 1, pp. 41–52, 2018.
[15] R. Heale and A. Twycross, “Validity and reliability in quantitative
studies,” Evid Nurs, vol. 18, no. 3, pp. 66–67, 2015.
[16] A. M. Heinecke, “Software ergonomics for real-time systems,” in Human
Computer Interaction, T. Grechenig and M. Tscheligi, Eds., 1993,
pp. 377–390.
[17] M. Herczeg, Software-Ergonomie: Theorien, Modelle und Kriterien für
gebrauchstaugliche interaktive Computersysteme. Walter de Gruyter
GmbH & Co KG, 2018.
[18] J. Hindel, L. M. Cabrera, and M. Stierle, “Robotic process automation:
Hype or hope?” in 15th Int Conf Wirtschaftsinformatik, 2020.
[19] S. Jamieson, “Likert scales: How to (ab)use them?” Med educ, vol. 38,
no. 12, pp. 1217–1218, 2004.
[20] K. G. Jöreskog, “A general approach to confirmatory maximum
likelihood factor analysis,” Psychometrika, vol. 34, no. 2, pp. 183–202,
1969.
[21] J. Karat, “Software Evaluation Methodologies,” in Handbook of
Human-Computer Interaction, M. Helander, Ed., 1988, pp. 891–903.
[22] M. Koch, H. Reiterer, and A. M. Tjoa, “Kriterien zur Gestaltung und
Bewertung menschengerechter Arbeit,” in Softw Ergonom, 1991,
pp. 43–86.
[23] J. Kokina and S. Blanchette, “Early evidence of digital labor in
accounting: Innovation with Robotic Process Automation,” Int J Acc
Inform Syst, vol. 35, 2019.
[24] M. Lacity and L. Willcocks, “A New Approach to Automating Services,”
MIT Sloan Manag Rev, 2017.
[25] M. Lacity, L. Willcocks, and A. Craig, “Robotic Process Automation:
Mature Capabilities in the Energy Sector,” Outs Unit Work Res Pap S,
pp. 1–19, 2015.
[26] M. Lacity, L. Willcocks, and A. Craig, “Robotizing Global Financial
Shared Services at Royal DSM,” Outs Unit Work Res Pap S, vol. 16,
no. 2, pp. 1–26, 2016.
[27] M. Lacity, L. Willcocks, and A. Craig, “Service Automation: Cognitive
Virtual Agents at SEB Bank,” London School Econ Polit Sci, pp. 1–29,
2017.
[28] V. Mezhuyev, M. Al-Emran, M. A. Ismail, L. Benedicenti, and D. A.
Chandran, “The Acceptance of Search-Based Software Engineering
Techniques: An Empirical Evaluation Using the Technology Acceptance
Model,” IEEE Access, vol. 7, 2019.
[29] T. Nemoto and D. Beglar, “Likert-scale questionnaires,” in JALT Conf
Proceed, 2014, pp. 1–8.
[30] R. Oppermann, B. Murchner, and M. Koch, Software-ergonomische
Evaluation: Der Leitfaden EVADIS II. Berlin [ua]: de Gruyter, 1992.
[31] B. Preim, Entwicklung Interaktiver Systeme: Grundlagen, Fallbeispiele
und innovative Anwendungsfelder. Springer-Verlag, 1999.
[32] J. Prümper, “Methode Isonorm-Fragebogen,” B Wirt Energ, pp. 1–6,
2008.
[33] Y. Rizk, A. Bhandwalder, S. Boag, T. Chakraborti, V. Isahagian,
Y. Khazaeni, F. Pollock, and M. Unuvar, “A Unified Conversational
Assistant Framework for Business Process Automation,”
arXiv:2001.03543, 2020.
[34] C. Rutschi and J. Dibbern, “Towards a framework of implementing
software robots: Transforming Human-executed Routines into Machines,”
Data Base Adv Inform Syst, vol. 51, no. 1, pp. 104–128, 2020.
[35] M. Schmitz, C. Dietze, and C. Czarnecki, “Enabling digital
transformation through robotic process automation at Deutsche Telekom,”
in Digitalization Cases, Management for Professionals, N. Urbach and
M. Röglinger, Eds., 2019, pp. 15–33.
[36] S. Sirisack and A. Grimvall, “Visual detection of change points and
trends using animated bubble charts,” Environ Monitor, pp. 327–340,
2011.
[37] J. Wanner, A. Hofmann, M. Fischer, F. Imgrund, C. Janiesch, and
J. Geyer-Klingeberg, “Process selection in RPA projects - Towards a
quantifiable method of decision making,” 40th Int Conf Inform Syst,
pp. 1–17, 2020.
[38] J. Wewerka, S. Dax, and M. Reichert, “A User Acceptance Model for
Robotic Process Automation,” in 24th Int Ent Dist Obj Comp Conf,
2020, pp. 97–106.
[39] J. Wewerka and M. Reichert, “Towards Quantifying the Effects of
Robotic Process Automation,” in 24th Int Ent Dist Obj Comp Conf
Works, 2020, pp. 11–19.
[40] J. Wewerka and M. Reichert, “Robotic Process Automation in the
Automotive Industry - Lessons Learned from an Exploratory Case Study,”
in 15th Int Conf Res Chall Inform Sci, 2021, pp. 1–17.
[41] L. Willcocks and M. Lacity, “Robotic Process Automation: The Next
Transformation Lever for Shared Services,” Outs Unit Work Res Pap S,
vol. 15, no. 7, pp. 1–35, 2015.
[42] L. Willcocks and M. Lacity, “Robotic Process Automation at
Telefónica O2,” MIS Q Exec, vol. 15, no. 1, pp. 21–35, 2016.
[43] L. Willcocks, M. Lacity, and A. Craig, “The IT Function and Robotic
Process Automation,” Outsourc Unit Work Res Pap S, vol. 15, no. 5,
pp. 1–39, 2015.
[44] W. William and L. William, “Improving Corporate Secretary
Productivity using Robotic Process Automation,” Int Conf Technol Appl
AI, 2019.
[45] R. C. Williges, B. H. Williges, and J. Elkerton, “Software interface
design,” in Handbook of human factors, 1987, pp. 1416–1449.