
Information Systems 114 (2023) 102184


Information Systems

journal homepage: www.elsevier.com/locate/is

An approach for analyzing business process execution complexity based on textual data and event log

Aleksandra Revina a,b, Ünal Aksu c,∗

a Chair of Information and Communication Management, Faculty of Economics and Management, Technical University of Berlin, 10623 Berlin, Germany

b Faculty of Economics, Brandenburg University of Applied Sciences, 14770 Brandenburg an der Havel, Germany

c Department of Information and Computing Sciences, Utrecht University, 3584 CC Utrecht, The Netherlands

article info

Article history:

Received 1 October 2021

Received in revised form 5 July 2022

Accepted 30 January 2023

Available online 3 February 2023

Recommended by Dennis Shasha

Keywords:

Business process execution complexity

Event log

IT service management

Linguistic features

Machine learning

Process mining

Textual data

abstract

With the advent of digital transformation, organizations increasingly rely on various information

systems to support their business processes (BPs). Recorded data, including textual data and event

log, expand exponentially, complicating decision-making and posing new challenges for BP complexity

analysis in Business Process Management (BPM). Herein, Process Mining (PM) serves to derive insights

based on historic BP execution data, called event log. However, in PM, textual data is often neglected

or limited to BP descriptions. Therefore, in this study, we propose a novel approach for analyzing BP

execution complexity by combining textual data serving as an input at the BP start and event log.

The approach is aimed at studying the connection between complexities obtained from these two

data types. For textual data-based complexity, the approach employs a set of linguistic features. In our

previous work, we have explored the design of linguistic features favorable for BP execution complexity

prediction. Accordingly, we adapt and incorporate them into the proposed approach. Using these

features, various machine learning techniques are applied to predict textual data-based complexity.

Moreover, in this prediction, we show the adequacy of our linguistic features, which outperformed the

linguistic features of a widely used text analysis technique. To calculate event log-based complexity,

the event log and relevant complexity metrics are used. Afterward, a correlation analysis of the two

complexities and an analysis of the significant differences in correlations are performed. The results

serve to derive recommendations and insights for BP improvement. We apply the approach in the

IT ticket handling process of the IT department of an academic institution. Our findings show that

the suggested approach enables a comprehensive identification of BP redesign and improvement

opportunities.

This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

1. Introduction

Today’s organizations use various information systems like

Enterprise Resource Planning (ERP) systems or Information Tech-

nology (IT) ticketing systems to support their business processes

(BPs) and operations [1]. As such, they highly rely on IT. Since

digital transformation engages organizations in rapidly changing

environments [2], it continually requires them to have a

thorough understanding of their BPs and operations to remain

resilient [3].

Accordingly, Business Process Management (BPM) has become a popular way to enable efficient business operations and improvements in quality and productivity in organizations. To accomplish these goals, BPM research and practice have established various approaches. Process Mining (PM) is one of the commonly used techniques to derive insights for process analysis based on BP execution data extracted as event logs. In this context, a large number of studies in BPM and PM are devoted to BP execution complexity analysis and complexity metrics [4]. However, in these studies focusing on BP executions, the analysis of textual data remains limited [5], despite the fact that textual data make up more than 80% of data in companies [6]. The relevant studies in the literature related to BP executions mainly consider BP descriptions, documentation, and texts in BP models, such as labels [5]. In addition, many unsolved challenges in applying Natural Language Processing (NLP) in BPM are highlighted, such as semantic enhancements and domain- or organization-specific adaptations of NLP solutions [5]. Thus, a closer connection between these two areas holds untapped potential to substantially improve the BPM toolset.

∗ Corresponding author.

https://doi.org/10.1016/j.is.2023.102184

A. Revina and Ü. Aksu Information Systems 114 (2023) 102184

Fig. 1. Our approach for analyzing business process execution complexity based on textual data and event log.

In fact, textual data serving as an input to a BP at its very start,

i.e., triggering a BP, highly influence its execution. For example, in

the IT Service Management (ITSM) area, activities performed in a

Change Management (CHM) process strongly depend on the tex-

tual descriptions of Requests for Changes (RfCs) communicated by

a customer. Specifically, the urgency of a given RfC and whether

it needs to be analyzed and approved for implementation are

determined primarily based on its textual description. Similarly,

the involvement of roles, such as a Change Advisory Board (CAB),

also depends on the RfC texts. For example, urgent RfCs may

not require CAB involvement. Hence, the same input, i.e., textual

data, determines important decision points of any RfC processing,

such as: (i) which activities in the CHM process will be skipped,

(ii) from which activity RfC processing will start, and (iii) which

roles will be involved. In many other areas, one can also observe

similar influences of textual data on the execution of BPs. For

example, in healthcare, the complaints expressed by a patient

typically determine the required diagnostics and related BP ac-

tivities. Generally speaking, in BPs, textual data can influence the

decision points, activities, and their order. Thus, BP complexity is

affected by textual data. In the related literature, it is shown that

there is a connection between BP complexity and BP performance

and management [7,8]. For this reason, process redesign and

improvement initiatives are often motivated by BP complexity

analysis [9,10].

The potential of textual data serving as an input to BP execu-

tion in the context of complexity has been extensively studied in

our previous work [11–15]. Within that work, we have explored

and developed a set of linguistic features, including semantic,

syntactic, and stylistic ones, i.e., taxonomy-based [11], sentiment-

based [15], and stylistic features [12], which potentially influence

BP execution complexity [13]. In an industrial case study of a CHM

IT ticket handling process [14], we have investigated the linguistic

features favorable for BP execution complexity prediction.

Overall, in this study, we propose a novel BP execution com-

plexity analysis approach in which we combine textual data and

event log. For the development of the approach, we set the

following specific objectives:

• Enriching an understanding of event log-based (EL) complexity common in BPM with textual data-based (TD) complexity,

• Identifying a set of metrics for TD and EL complexities taking existing works as a basis,

• Studying the relation between TD and EL complexities and investigating how textual data can contribute to EL complexity prediction,

• Exploring, adapting, and illustrating the benefits of our approach by applying it in a real-world setting.

To achieve these objectives and ensure the comprehensiveness

of our approach, we build our study on the following steps.

In Section 2, we analyze the related work on the application

of NLP in BPM and BP execution complexity highlighting the

unsolved issues regarding the use of textual data. Section 3 summarizes

the aspects from our previous work that we use for TD

complexity calculation and explains the state-of-the-art event

log complexity metrics serving as the basis for EL complexity

calculation. Afterward, in Section 4, using a running example, we

adapt and incorporate our previous work on TD complexity and

well-established studies on EL complexity and introduce the BP

execution complexity analysis approach. As can be seen in Fig. 1,

in the first block, TD and EL complexities are calculated. For TD

complexity calculation, linguistic features extracted from textual

data are used. In this regard, we take the designed linguistic fea-

tures (taxonomy-based and stylistic features) from our previous

work [11,12] and identify relevant features for TD complexity.

Using these linguistic features, the TD complexity of BPs is pre-

dicted with respect to an agreed-upon complexity scale. Further,

we assess the adequacy of our linguistic features in predicting TD

complexity. This is done by comparing their prediction perfor-

mance with the prediction performance of the linguistic features

from a well-accepted text analysis technique. To calculate EL

complexity, the state-of-the-art event log complexity metrics are

analyzed, and suitable ones are applied to the given event log. In

the second block, two analyses are performed, namely correlation

analysis and significant difference analysis. More specifically, how

calculated complexities correlate is determined in the correlation

analysis. Following that, significant differences within the correla-

tions are analyzed. Using the obtained results, recommendations

and insights for process improvement are derived. To illustrate

the value of our approach, a case study of a Service Request Management

process in the IT Service Management (ITSM) area of an academic

institution is conducted and explained in Section 5. We discuss

the implications of our findings in Section 6. Finally, in Section 7,

we present our conclusions and directions for future work.
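The correlation analysis of the second block can be illustrated with a small sketch. This passage does not state which correlation coefficient is used, so Spearman's rank correlation is an assumption here. The function names and the per-case values are also hypothetical and serve only as an illustration.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (non-constant inputs)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical per-case values: TD complexity on an ordinal scale (1 = low, 3 = high)
# and an EL complexity value, for example the number of events in the case.
td_complexity = [1, 1, 2, 3, 3]
el_complexity = [3, 4, 4, 6, 9]
rho = spearman(td_complexity, el_complexity)
```

A rank-based coefficient is chosen in this sketch because a complexity scale is typically ordinal; a different coefficient may be more appropriate depending on how the two complexities are measured.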

Thus, our work contributes to BPM by proposing a new ap-

proach to analyze BP execution complexity, considering textual

data serving as an input to BP execution and event log. Although

one of the dominant research directions in BPM regarding BP

analysis is BP complexity [4], to the best of our knowledge, no

other works combine textual data and event log to analyze BP ex-

ecution complexity. Using qualitative (interviews) and quantita-

tive (computational analysis) research methods, we demonstrate

the value of our approach by means of a case study. As a practical

contribution, our study findings show a comprehensive way of

identifying process redesign and improvement opportunities.


2. Related work

This section lists the studies associated with the approach we

propose in this paper. In particular, we highlight the increas-

ing relevance of textual data in organizations and review the

state-of-the-art NLP applications in various BPM lifecycle phases.

Afterward, we present prominent complexity research in the

BPM-related literature.

Organizations increasingly focus on insights into understand-

ing and improving their BPs. In this regard, BPM investigates the

potentials of NLP to benefit from its maturity and availability in

multiple BP applications providing support to different BPM life-

cycle phases [5]. In the following, relevant research is reviewed

according to BPM lifecycle phases.

In the BP discovery phase, considerable research effort has

been made to develop BP model discovery approaches from data.

Whereas BP discovery from event log is a well-established and

matured subject area that already has tangible practical applica-

tions, BP model discovery from textual data is still a promising

research topic lacking the ability to scale [16]. Below, we review

the most prominent and recent developments:

BP model and event log generation from textual data: Creation of

BP models accounts for up to 60% of the time spent in BPM projects [5].

Further, due to the current dynamics of work environments,

BP modeling has become a time-consuming and costly activ-

ity requiring constant updates of BP models, which might lead

to BPM project failures [17,18]. Thus, an automatic generation

of BP models from available textual data becomes an attrac-

tive application paving the way for multiple research projects.

For example, recent research by [19] proposes an automatic BP

model discovery from textual BP descriptions based on neural

networks. Further, [20] extend existing NLP techniques to extract

activities and their relations defining BP constraints from textual

descriptions. [21] present a method to generate an event log

from textual data using action and topic analysis. Thereafter, BP

models are mined based on common techniques. [22] use natural

language inference to construct event log from customer service

conversations. [23] deal with the problem of multi-grained text

classification by introducing a hierarchical neural network to ex-

tract multi-grained information from BP descriptions. In [24], the

early developments of a tool to extract BP models from text and

then maintain their alignment using Dynamic Condition Response

(DCR) Graphs are presented.

Enrichment of event log with textual data: Process Mining (PM)

represents the most typical approach to automatically create

BP models from event log [25]. Hereby, textual data massively

generated by BP participants in the BP execution, such as com-

ments or email communication, are not considered. To address

this shortcoming, several research projects started to appear.

For example, [26] extract key phrases denoting activities from

comments related to IT ticket processing enriching the event log

with this information. Subsequently, a more comprehensive BP

model can be derived. Further, [27] enhance the event log analysis

with the analysis of textual attributes contained in it using a novel

attribute classification technique.

Automatic discovery of decisions in BP models: Decisions make

up an important effort-intensive part of BP modeling. Accord-

ingly, [28] propose a deep learning approach to obtain decision

constraints and conditional clauses from text. [29] provide an

NLP pipeline to automatically extract the decisions and their

dependencies to build the decision requirements diagram making

part of a decision model. The study by [30] describes a method for

generating entire decision models from textual inputs. The sug-

gested technique based on NLP and customized syntactic patterns

enables the extraction of both decision requirements and decision

logic from a document.

Text annotation: The efforts related to BP model creation can

be substantially reduced if the text is well annotated, which

decreases the noise caused by automatic NLP techniques.

Such annotated BP descriptions can be used for both inferring

new relations to create more comprehensive BP descriptions and

as training data for various NLP analyzers [5]. Hence, in [31], a

novel approach using NLP and a query language for tree-based

patterns is introduced. It derives annotations representing essen-

tial BP elements, i.e., activities, events, actors, roles, and con-

trol flow. [32] describe a method based on Semantic Parsing

and Graph Convolutional Networks. This method avoids the use

of manual rules and outputs much better results than existing

neural network-based solutions to derive annotations from BP

descriptions.

Automatic BP modeling recommendations and semantic auto-

completion: Considerable research has been devoted to automatic

activity recommendations to support BP modeling task [33,34].

Hence, building on a technique similar to that of [35], [36] exploit label

semantics for rule-based activity recommendation. Additionally,

[37] propose to use semantic similarities between BPs to

enable design-time autocompletion by relying on pre-trained NLP

models. The method converts BP sequences into text paragraphs

and encodes them as sentence embeddings, i.e., learned text rep-

resentations that include semantics as real-number vectors [37].

The next phase of the BPM lifecycle, i.e., BP analysis, aims

to identify flaws and bottlenecks in the discovered BP mod-

els. Hereby, NLP techniques can also be of support in specific

applications. We present some up-to-date developments below:

BP model semantic correctness and completeness verification:

The semantic quality of BP models is critical for understanding

BPs correctly. A number of NLP research projects are naturally

aimed at automating the verification of this characteristic. Ac-

cordingly, many BP model analysis strategies rely on a thorough

examination of the natural language information included in the

activity labels of the models. Standard NLP is not adequate for an-

alyzing these labels since they are often short and heterogeneous

in terms of grammatical style. Dealing with this challenge, [38]

propose a Hidden Markov Models-based approach for a linguistic

analysis of BP model activity labels. Additionally, research by [39]

addresses the problem of ambiguity of BP textual descriptions

and suggests a compliance checking technique using behavioral

spaces.

BP model and text consistency check: As mentioned above,

maintaining various BP-related data allows for improving the

knowledge of BPs in organizations. However, as BPs change over

time, it is important to constantly identify inconsistencies among

various BP descriptions so that expectations for BP outputs re-

main the same for every stakeholder [40]. The latter research [40]

proposes an approach to detect conflicts between textual and

model-based descriptions using NLP. Further, [41] design a tech-

nique to align BP models and textual descriptions, mapping the

knowledge derived from these two representations into a unified,

comparable format.

BP-related data querying: Having a variety of BP-related data

allows organizations to better analyze their BPs. In such an anal-

ysis, a common task is querying these data to get insights into

specific BP parts. In the case of event log data, to be able to

use common BP querying techniques, end-users must be familiar

with the query language and database schema. Addressing this

challenge, [42] introduce a natural language interface. Hereby,

questions can be asked in natural language, and the interface

automatically translates them into a structured query

to be run in a database. [43] also address this problem by sug-

gesting a technique to search both textual and model-based BP

descriptions.

Sentiment analysis: Sentiment analysis has already shown its

high value for e-business and e-commerce providing insights


based on the textual data collected in social media and social

networks [44]. [45] explore its potential in BPM and develop a

BP modeling tool considering stakeholders’ comments and feed-

back. Applying sentiment analysis, the tool identifies positive and

negative feedback to support BP analysts in designing the to-be

process.

In the following BP redesign phase, the BP model needs to

be modified to address the concerns discovered in the previous

phase. Contemporary NLP techniques can also be used to achieve

this goal:

BP redesign using textual data: As a rule, experts dealing with

BP redesign focus on proposing to-be BP models and redesign

patterns with little or no consideration of end-user feedback.

To address this shortcoming, recent research suggests an NLP

approach based on a novel set of annotation guidelines to identify

redesign suggestions directly from end-user feedback [46].

Comparing BP models: To produce a sound to-be BP model in

the BP redesign phase, a BP analyst might need to examine large

sets of BP models that are often organized hierarchically based on

the level of abstraction. Hereby, one of the most difficult tasks is

ensuring that BP models at the same level in the hierarchy have

the same level of abstraction [5]. To solve this challenge, various

BP model matching algorithms can be used. For example, [47]

present a semantic multi-phase matching algorithm based on a

vector space model and NLP to match the models. [48] provide

a technique for discovering sets of related activities based on

constrained k-means clustering considering both BP semantics

and control flow order.

BP model refactoring: The quality of BP models may signif-

icantly vary since BP modeling is time-consuming and error-

prone. Moreover, the competence of various modelers differs.

Hereby, refactoring can be used to improve the quality of BP mod-

els. Refactoring is a popular approach in software engineering to

restructure the code without changing its external behavior. As

BP modeling and coding are similar to a certain extent, existing

refactoring technologies from software engineering have been

adapted for BP workflows [49]. In the context of NLP, linguistic

refactoring has emerged as one such approach. Accordingly, the dissertation [50]

elaborates NLP techniques for syntactic, semantic, and pragmatic

refactoring.

In addition to typical monitoring techniques for assessing per-

formance and conformity requirements, in the BP controlling

phase, NLP can be used to make available different forms of BP

descriptions [5]:

Transformation of BP model to text: In the controlling phase,

it is important that all stakeholders are able to understand BP-

related data. However, event log, workflows, and BP models are

not straightforward and require certain expertise for comprehen-

sion. On the contrary, a written BP textual description can be un-

derstood by any stakeholder. Hence, it is highly recommended to

support event log and BP models with the latter [5]. To solve this

problem, BP model-based natural language generation has been

researched [51]. Further, [52] introduce a semi-automated ap-

proach to transfer knowledge from BP models to natural language

requirements documents. [53,54] develop a tool to generate BP

textual descriptions from declarative BP models. In this context,

another group of researchers [55] deals with the comparison

of manually and automatically generated textual descriptions of

BP models focusing on the choice of an appropriate matching

technique. Additionally, [56] suggest a technique to fix poorly

written BP textual descriptions based on BP models.

Multi-lingual support: In international companies as well as

in the context of cross-country and cross-organizational learning,

it is essential to translate BP models and BP descriptions

into multiple languages to make BP information accessible to

various stakeholders. Thus, [57] develop a framework for the

automatic generation of multi-language description text using an

emergency disposal process example. In line with the latter, [58]

enhance the framework with multiple (cross-department) views

and operationalize it in a cross-department medical diagnosis

process.

As can be concluded from above, several research streams,

such as sentiment analysis, BP-related data querying, BP modeling

recommendation and autocompletion, might be applied in multi-

ple BPM lifecycle phases, for example, BP discovery, analysis, and

redesign. However, the most prominent research stream is aimed

at supporting the discovery phase, i.e., the automatic creation of

BP models from BP textual descriptions. Accordingly, the most

frequently used textual data are related to the BP descriptions,

feedback from BP participants, and textual data inherent in BP

models, i.e., labels. Hereby, only a limited number of works deal

with the comments and emails related to the activities in the

event log [26].

In addition, a large number of studies in BPM analyze BP exe-

cutions from a complexity perspective. For example, [59] describe

metrics for measuring BP model complexity based on observa-

tions from software complexity. Similarly, in [60], metrics for

analyzing BP model complexity are proposed by extending met-

rics on software complexity. BP model complexity metrics and

their theoretical thresholds are studied in [61] to assess BP model

complexity and categorize BP models based on their complexity.

An overview of the BP model complexity reduction mechanisms

is provided in the form of patterns in [62]. Aside from that,

in BPM, there is a great interest in analyzing BPs from a com-

plexity perspective using PM. Hence, [63] study the design and

applicability of metrics for measuring event log complexity. [64]

provide a comprehensive evaluation of state-of-the-art BP discov-

ery techniques considering the complexity of their automatically

generated BP models. An approach aimed at the reduction of

the complexity of the discovered declarative BP models is pro-

posed in [65]. Moreover, [66] provide an overview of the BP

model complexity metrics by conducting a systematic literature

review. Lastly, in a recent study [4], the state-of-the-art event log-

based complexity metrics are analyzed to determine the relation

between the event log and the resulting BP model.

To sum up, despite the ubiquity of textual data in organiza-

tions, in the relevant BPM literature, the analysis of textual data

related to BP executions is prevailingly limited to BP descriptions

and texts in BP models [5]. Moreover, the complexity of textual

data has a considerable influence on BP execution, which has

been recently studied and proven in our work [13]. To address the

shortcoming, in this paper, we propose an approach combining

textual data used as an input to BP execution and event log for

BP execution complexity analysis.

3. Background

In this section, we provide a background on the linguistic

features adapted to calculate TD complexity and event log com-

plexity metrics employed to calculate EL complexity.

3.1. Linguistic features

In our previous work, we studied which characteristics, i.e., features, of a given text have great potential to influence BP execution complexity [11–13,15]. In particular, we investigated several features and found that taxonomy-based features, built on the so-called Decision-Making Logic (DML) taxonomy, and stylistic features are the most important for the complexity of textual data [14]. Below, we summarize how these features were developed.

Table 1
Linguistic features.

Group                 Linguistic feature
DML taxonomy-based    Relative occurrence of words based on the DML taxonomy and the DML cognition level routine
                      Relative occurrence of words based on the DML taxonomy and the DML cognition level semi-cognitive
                      Relative occurrence of words based on the DML taxonomy and the DML cognition level cognitive
Stylistic             Relative occurrence of nouns in all words
                      Relative occurrence of unique nouns in all nouns
                      Relative occurrence of verbs in all words
                      Relative occurrence of unique verbs in all verbs
                      Relative occurrence of adjectives in all words
                      Relative occurrence of unique adjectives in all adjectives
                      Relative occurrence of adverbs in all words
                      Relative occurrence of unique adverbs in all adverbs
                      Word count
                      Wording style

To design linguistic features that capture TD complexity via

cognition and style of textual data, we focus on the distribution of

parts-of-speech (PoS) in textual data. In particular, nouns, verbs,

adjectives, and adverbs are analyzed since they reveal the most

information about decision-making and style in textual data. As a

rule, process workers interpret textual data inputs to determine

how a BP should be carried out. They map the phrases to the

BP elements. For example, a process worker may decide on a BP

execution based on the information the customer mentioned in a

textual message about previously performed BP activities (nouns)

using specific verbs, adjectives, or adverbs indicating the timeline

and status of such activities. Hence, extracting this information

could assist process workers in handling textual data related to

BP execution. Aligned with this, there are naming conventions in

BPM [62,67] on the effects of labeling in BPs. More specifically,

these conventions provide guidance on those PoS that can be

used in labeling and how. As such, we suggest considering nouns

as Resources expressing the specifics of BP elements, verbs as

Techniques of knowledge and information transformation activity

impacting Resources, adjectives as Capacities revealing contex-

tual specifics of Techniques, and adverbs as Choices defining the

selection of the necessary set of Techniques, elements of RTCC

framework developed in our previous research [11].

DML taxonomy: The DML taxonomy is a 2-tuple consisting of (1) the most important words (nouns, verbs, adjectives, and adverbs) extracted from a given text and (2) decision-making logic levels, i.e., cognition levels, each of which denotes how easy it is to understand something in order to make a decision. In the DML taxonomy, each word is associated with a DML cognition level. For detailed information regarding the DML taxonomy development process, we refer to our previous work [11]. In this paper, we present a summary of the most important steps:

Step-1 The first step is collecting BP-relevant textual data from different sources. Whereas our approach focuses on the textual data provided as input to BP execution, other textual data, such as official BP descriptions, interview transcriptions, or legal documents, should also be considered for the DML taxonomy.

Step-2 The collected data are converted into a machine-readable format, in which the computational analysis will be performed, for example, a CSV file format.

Step-3 Afterward, the data are pre-processed and parsed, building the document-term matrices for the most important parts of speech (nouns, verbs, adjectives, and adverbs).

Step-4 The created document-term matrices are processed using topic modeling methods, such as a combination of Latent Dirichlet Allocation (LDA) and Latent Semantic Indexing (LSI) [68].

Step-5 In the last step, the extracted topics with descriptive keywords are classified into the decision-making logic levels, i.e., cognition levels, of routine, semi-cognitive, and cognitive. Here, the involvement of experts familiar with the context is essential for the right keyword classification.
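Once such a taxonomy exists, the three DML taxonomy-based features can be obtained by counting taxonomy words per cognition level. The following is a minimal sketch under stated assumptions: the taxonomy entries, the function name, and the choice of the denominator (all taxonomy words found in the text) are hypothetical and are not taken from the original implementation.

```python
from collections import Counter

LEVELS = ("routine", "semi-cognitive", "cognitive")

# Hypothetical taxonomy: each keyword is mapped to one DML cognition level.
DML_TAXONOMY = {
    "password": "routine",
    "reset": "routine",
    "install": "routine",
    "check": "semi-cognitive",
    "update": "semi-cognitive",
    "migrate": "cognitive",
    "redesign": "cognitive",
}


def dml_features(tokens, taxonomy=DML_TAXONOMY):
    """Share of each cognition level among the taxonomy words found in the text."""
    counts = Counter(taxonomy[t] for t in tokens if t in taxonomy)
    total = sum(counts.values())
    if total == 0:
        # No taxonomy word occurs in the text.
        return {level: 0.0 for level in LEVELS}
    return {level: counts[level] / total for level in LEVELS}


# Hypothetical ticket text, lower-cased and split on whitespace.
ticket = "please reset my password and check the mail client update".split()
features = dml_features(ticket)
```

In this example, two of the four taxonomy words found belong to the routine level and two to the semi-cognitive level, so the text would lean toward lower cognition levels.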

Stylistic patterns: In our previous work [12], we showcased that

the style, i.e., stylistic patterns, of a given IT ticket text can reveal

information about its BP complexity. More specifically,

ticket length, PoS distributions, and wording style are suitable for

indicating and understanding how the complexity of handling an

IT ticket is affected. To capture such components of a text, we

proposed Syntactic Structure (SynS) and Wording Style (WS) as

new features. The SynS feature focuses on syntax. The way the

words are put together to form phrases influences text compre-

hension and corresponding BP execution, which uses that text as

a primary input. The WS feature takes Zipf’s Laws [69] as a basis

and focuses on the appearance of new words in a text and the

speed of appearance.

In accordance with the considerations explained above, for DML taxonomy-based features, PoS distributions are computed based on a given DML taxonomy. For stylistic features, in contrast, all the words are considered as the search space. In Table 1, we list the identified features.
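The PoS-based stylistic features of Table 1 can be sketched as simple ratios. The example below assumes that the text has already been tagged by some PoS tagger; the coarse tag labels, the function name, and the sample ticket are hypothetical. The wording style feature is omitted because its exact computation is not given in this section.

```python
def stylistic_features(tagged_tokens):
    """Compute PoS-based stylistic features from (word, tag) pairs.

    Tags are assumed to be coarse labels: "NOUN", "VERB", "ADJ", "ADV", or anything else.
    """
    n_words = len(tagged_tokens)
    features = {"word_count": n_words}
    for tag, name in (("NOUN", "nouns"), ("VERB", "verbs"),
                      ("ADJ", "adjectives"), ("ADV", "adverbs")):
        of_tag = [word.lower() for word, t in tagged_tokens if t == tag]
        # Share of this part of speech among all words.
        features[f"{name}_in_all_words"] = len(of_tag) / n_words if n_words else 0.0
        # Share of distinct words within this part of speech.
        features[f"unique_{name}_in_all_{name}"] = (
            len(set(of_tag)) / len(of_tag) if of_tag else 0.0
        )
    return features


# Hypothetical, manually tagged ticket text (a PoS tagger would normally supply the tags).
ticket = [
    ("printer", "NOUN"), ("is", "VERB"), ("not", "ADV"), ("printing", "VERB"),
    ("please", "OTHER"), ("replace", "VERB"), ("old", "ADJ"), ("printer", "NOUN"),
]
features = stylistic_features(ticket)
```

Here the noun "printer" occurs twice, so nouns make up 2 of 8 words, while only half of the nouns are unique.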

3.2. Event log complexity metrics

Organizations use various information systems (for example,

ERP systems or IT ticketing systems) to enact their BPs.

These systems enable organizations to

record a large amount of data about BP executions. Such process

execution data are then extracted in the form of an event log

to analyze and provide insights into improving BPs [70]. An

event log consists of events, each of which refers to an activity

performed in executing a BP. In Table 2, an exemplary event log

of a Service Request Management process is depicted.

Table 2
An example of an event log of a Service Request Management process.

| Request | Activity | Time stamp | Resource | Priority | Attr. |
|---|---|---|---|---|---|
| t001 | Register | 01-08-2021 10:11:12 | Worker1 | Low | ... |
| t001 | Analyze | 01-08-2021 14:10:00 | Worker2 | Low | ... |
| t002 | Register | 01-08-2021 16:01:03 | Worker3 | High | ... |
| t003 | Register | 01-08-2021 16:05:42 | Worker1 | Low | ... |
| t002 | Analyze | 01-08-2021 16:10:42 | Worker4 | High | ... |
| t002 | Resolve | 01-08-2021 16:25:51 | Worker5 | High | ... |
| t003 | Analyze | 01-08-2021 17:15:02 | Worker6 | Low | ... |
| t001 | Escalate | 01-08-2021 17:35:40 | Worker7 | Low | ... |
| t003 | Resolve | 02-08-2021 09:01:02 | Worker8 | Low | ... |
| t004 | Register | 02-08-2021 09:10:20 | Worker3 | Medium | ... |
| t004 | Analyze | 02-08-2021 09:25:33 | Worker2 | Medium | ... |
| t003 | Re-open | 02-08-2021 10:01:01 | Worker8 | Low | ... |
| t004 | Reject | 02-08-2021 10:06:08 | Worker6 | Medium | ... |
| t001 | De-escalate | 02-08-2021 10:24:32 | Worker8 | Low | ... |
| t003 | Resolve | 02-08-2021 11:16:10 | Worker4 | Low | ... |
| t001 | Resolve | 02-08-2021 11:59:59 | Worker4 | Low | ... |

As can be seen, each row shows (i) which activity is performed,
(ii) when, (iii) for which request, and (iv) other information (for
example, resource and priority). The events carried out in the
scope of a single process instance execution are called a case. In
the example, each request refers to a case that goes through the
same Service Request Management process. The sequence of the
events in the scope of a particular case is called a trace.
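As a minimal sketch, the traces of the cases in Table 2 can be derived by grouping the events per request and ordering them by time stamp. The function name `build_traces` is hypothetical, and the time stamps are assumed to follow a day-month-year format.

```python
from collections import defaultdict
from datetime import datetime

# Rows of Table 2, reduced to (request, activity, time stamp).
events = [
    ("t001", "Register", "01-08-2021 10:11:12"),
    ("t001", "Analyze", "01-08-2021 14:10:00"),
    ("t002", "Register", "01-08-2021 16:01:03"),
    ("t003", "Register", "01-08-2021 16:05:42"),
    ("t002", "Analyze", "01-08-2021 16:10:42"),
    ("t002", "Resolve", "01-08-2021 16:25:51"),
    ("t003", "Analyze", "01-08-2021 17:15:02"),
    ("t001", "Escalate", "01-08-2021 17:35:40"),
    ("t003", "Resolve", "02-08-2021 09:01:02"),
    ("t004", "Register", "02-08-2021 09:10:20"),
    ("t004", "Analyze", "02-08-2021 09:25:33"),
    ("t003", "Re-open", "02-08-2021 10:01:01"),
    ("t004", "Reject", "02-08-2021 10:06:08"),
    ("t001", "De-escalate", "02-08-2021 10:24:32"),
    ("t003", "Resolve", "02-08-2021 11:16:10"),
    ("t001", "Resolve", "02-08-2021 11:59:59"),
]

def build_traces(rows):
    """Group events by case and order the activities of each case by time."""
    cases = defaultdict(list)
    for case_id, activity, stamp in rows:
        # Assumption: time stamps are written as day-month-year.
        moment = datetime.strptime(stamp, "%d-%m-%Y %H:%M:%S")
        cases[case_id].append((moment, activity))
    return {case_id: [activity for _, activity in sorted(entries)]
            for case_id, entries in cases.items()}

traces = build_traces(events)
```

For request t001, for example, this yields the trace Register, Analyze, Escalate, De-escalate, Resolve.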

In the literature, several studies focus on quantifying the
complexity of such an event log. These studies propose metrics
for assessing the complexity of event logs so that further analysis
can be determined considering the characteristics of the event
logs. Recent research on EL complexity measurement [4] reviews
and studies EL complexity metrics in detail and then proposes
a set of entropy-based complexity metrics to address the issues
in the studied EL complexity metrics. We take this research as
the basis and analyze both the metrics it discusses and the metrics
it proposes. These metrics are listed in Table 3.

Based on the specifications of the metrics given in the table,

one can note that each metric focuses on one or more aspects

of complexity (size, variation, and distance).

In that sense, some metrics have limitations. For example,

metrics measuring the size of event logs would not capture any

difference in terms of variation or distance. Despite the fact that

some metrics focus on the same aspects of complexity, there

is not much dependency among them as they differ from each

other in measuring an event log using its various components

(for example, traces, event classes, or event relations) [4]. To

have a comprehensive view of the aspects of complexity, in our

approach, we opt for employing all EL complexity metrics listed

in the table. Further, to mitigate the influence of one metric on

another, we use majority voting in our approach to obtain a single

EL complexity value for a given event log.
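To make the difference between the aspects concrete, the following sketch computes a few size- and variation-oriented metrics of Table 3 for the traces of the example event log in Table 2. It is an illustration only, not the script used in the Approach; the function name `size_and_variation_metrics` is hypothetical.

```python
# Traces of the four requests in Table 2, ordered by time stamp.
traces = {
    "t001": ["Register", "Analyze", "Escalate", "De-escalate", "Resolve"],
    "t002": ["Register", "Analyze", "Resolve"],
    "t003": ["Register", "Analyze", "Resolve", "Re-open", "Resolve"],
    "t004": ["Register", "Analyze", "Reject"],
}

def size_and_variation_metrics(traces):
    """Compute simple event log metrics from a mapping of case id to trace."""
    sequences = list(traces.values())
    distinct = {tuple(sequence) for sequence in sequences}
    n_events = sum(len(sequence) for sequence in sequences)
    return {
        "magnitude": n_events,  # number of events
        "variety": len({a for sequence in sequences for a in sequence}),
        "support": len(sequences),  # number of traces
        "TL-avg": n_events / len(sequences),  # average trace length
        "DT(#)": len(distinct),  # number of distinct traces
        "DT(%)": 100 * len(distinct) / len(sequences),
    }

metrics = size_and_variation_metrics(traces)
```

Here, magnitude, support, and TL-avg describe only the size of the log, whereas DT(#) and DT(%) react to variation between traces, which is why a single metric gives an incomplete picture.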

In general, processes indicate all the work performed in an

organization [1]. Accordingly, the complexity of a process and

corresponding ways to measure it can imply a wide range of

elements and factors, like those emerging from the process con-

text [73,74], which are often difficult to obtain. Moreover, how a

process is reflected in a model affects its perceived complexity.

In other words, quality aspects of process models and process

modeling notations have a notable impact on perceived process

complexity. Hence, we focus on event log-based complexity met-

rics in this paper and list extending our approach with process

complexity metrics as part of our future work.

4. Approach development

As introduced in Section 1, we propose an approach, hereafter

Approach, aimed at analyzing BP execution complexity based on

textual data and event log. More specifically, we investigate the

relation between TD complexity and EL complexity. To achieve

this, first, for a given BP, attributes of an entity that goes through

the BP are identified. An example is the communication channel

attribute of a service request, which is an entity handled in

a Service Request Management (SRM) process. Then, based on

these attributes and time dimension, textual data and event log

of the BP are split into subsets. Further, TD and EL complexities

are calculated for each subset. Using the computed complexities,

correlation analysis is performed to investigate whether textual

data may be used for EL complexity prediction. Thereafter, sta-

tistically significant differences in the created event log subsets

are analyzed to find out the factors affecting EL complexity. For

instance, a certain category of service requests may account for

the repeated or skipped information collection activities resulting

in a considerable increase or decrease of EL complexity. Thus, in

terms of such factors, recommendations for process redesign and

improvement can be formulated and provided to organizations.

In the separate subsections of this section, we present the

inputs required in our Approach and introduce a running exam-

ple. Afterward, we elaborate on how TD and EL complexities

are calculated. Finally, correlation analysis and identification of

statistically significant differences are explained.

4.1. Inputs

To carry out the tasks mentioned above, three types of input

are required in the Approach: textual data, event log, and complexity

scale. The first two inputs will be described in Sections 4.3

and 4.4. The complexity scale necessary for calculating TD and

EL complexities is defined below.

Complexity scale: A complexity scale is a set of ordinal complex-

ity values. They can be numbers or categories that are put in a

certain order denoting either increasing or decreasing complexity.

A five-point Likert-type scale containing numbers from one to

five or a set of category names like low, medium, and high are

two examples of a complexity scale [75]. Although a considerable

number of metrics for measuring complexity exist (see Table 3),

textual complexity metrics are rather generic, i.e., mostly con-

sidering language usage in texts [13,76], and have less emphasis

on how textual data are perceived by process workers in terms

of work instructions. This perception is important because, using

a given text, process workers determine which activities will be

performed and in what order. Moreover, such generic metrics are

not applicable in a given area without considering the jargon,

characteristics, and regulations of the area. Therefore, when ap-

plying our Approach, expert involvement is essential to determine

a suitable complexity scale for a particular domain.

Table 3
Metrics for event log-based complexity calculation.

| Metric | Definition |
|---|---|
| Number of events (magnitude) [10] | The total number of events an event log contains |
| Number of event types (variety) [10] | The total number of event classes in an event log |
| Number of sequences (support) [10] | The total number of traces in an event log |
| Average sequence length (TL-avg) [25] | The average length of a trace in an event log |
| Average time difference between consecutive events (time granularity) [10] | The mean duration between two events where the first one is followed by the second one without an interruption |
| Number of acyclic paths in transition matrix (LOD) [8] | The total number of simple paths (paths without cycles) in the graph network that represents the event connections in an event log |
| Number of ties in transition matrix (t-comp) [71] | The total number of possible paths in the graph network that represents the event connections in an event log |
| Lempel–Ziv complexity (LZ) [72] | The minimum number of steps that are required to generate a given trace by either reusing its previous parts or inserting a new symbol |
| Number and percentage of unique sequences DT(#), DT(%) [25] | The number and percentage of distinct traces in a given event log |
| Average distinct events per sequence (structure) [10] | The amount of present directly-related event pairs compared to all possible ones in a given event log |
| Average affinity (affinity) [10] | The homogeneity of a given event log based on the average overlap of traces in terms of direct following relations (i.e., one event right after another) |
| Deviation from random (dev-random) [72] | The Euclidean distance of the transition matrix that is created using the pairwise associations of events of a given event log |
| Average edit distance (avg-dist) [72] | The average edit distance of traces to transform one to another using string matching with the lowest cost |
| Entropy-based metrics (variance and sequence entropy) [4] | The entropy-based metrics that use a prefix automaton to describe sequences within a given event log to map it to a graph |

Table 4
Running example IT tickets.

| ID | Channel | Category | Textual data |
|---|---|---|---|
| SR001 | IT ticketing system | Application | I would like to get access to XYZ. Could you please send me the available document how to install it? |
| SR002 | IT ticketing system | Security | As of this week, I am working in a different building. When I try to login, it says unable to find the trust certificate CRT-ABC in the recovery database for this workstation. Would you please activate the broken security configuration? |

Creating textual data and event log subsets: To conduct a well-
established analysis of the relation between TD and EL complexities,
in the Approach, the textual data and event log are split
into subsets. Hereby, we take the following into account: the time
dimension and a set of attributes of an entity going through a
BP, for example, a service request. The time dimension is important
for analyzing changes in BP execution complexity. Accordingly,
relevant time periods can be defined for observing the BP execution
complexity over time. Having attributes allows us to perform a
drill-down analysis and move from a general to a more detailed
view. Further, to enrich the analysis, we combine entity attributes
pairwise.
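The subset creation described above can be sketched as follows. The ticket records, the period labels `P1` and `P2`, and the function name `split_into_subsets` are hypothetical and serve only to show how single attributes and pairwise attribute combinations yield subsets per time period.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical tickets; "period" stands for a pre-defined time period.
tickets = [
    {"id": "T1", "period": "P1", "channel": "email", "category": "Printing"},
    {"id": "T2", "period": "P1", "channel": "email", "category": "Security"},
    {"id": "T3", "period": "P2", "channel": "phone", "category": "Printing"},
]

def split_into_subsets(tickets, attributes):
    """Map (period, attribute names, attribute values) to ticket ids.

    Subsets are built for every single attribute and every attribute pair.
    """
    keys = [(name,) for name in attributes] + list(combinations(attributes, 2))
    subsets = defaultdict(list)
    for ticket in tickets:
        for names in keys:
            values = tuple(ticket[name] for name in names)
            subsets[(ticket["period"], names, values)].append(ticket["id"])
    return dict(subsets)

subsets = split_into_subsets(tickets, ["channel", "category"])
```

The same keys can be used to select the matching cases from the event log, so that each textual data subset has a corresponding event log subset.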

4.2. Running example

To illustrate our Approach development, we introduce a running
example of two IT tickets (service requests) from an SRM
process case study used in this research. As can be seen in Table 4,
the two tickets, SR001 and SR002, are entered directly in the IT
ticketing system. When tickets are received, their textual contents
are analyzed, and they are assigned to a category (i.e., a grouping
of tickets based on the topics they concern) by service desk
employees. The category of a ticket determines how it will be
handled, i.e., the next activities and the involvement of resources.
As shown in the table, the ticket SR001 consists of fewer and
more common words compared to SR002. It is important to note
that the event log for the tickets in the running example is also
provided.

In the subsections below, we show how the Approach is de-

veloped using the illustrative running example, in particular, its

textual data and event log.


Fig. 2. Textual data-based complexity calculation.

4.3. Calculating textual data-based complexity

Textual data, an unstructured data type, is one of the important
inputs influencing BP execution. However, due to the
dynamic nature and high interdependence of BPs, textual data are
often either unlabeled or contain few labeled data points. Moreover,
labeling is a time-consuming and costly process. Hence, we use
machine learning to address this problem. In particular, we start
with the textual data that have very few labeled data points,
extract linguistic features, and develop prediction models. Afterward,
we select the best-performing prediction model and run it
on the unlabeled data. This flow is depicted in Fig. 2.

To perform a computational analysis of textual data and build

prediction models for TD complexity, we extract two sets of

linguistic features from textual data: DML taxonomy-based and

stylistic features. Specifically, we focus on the two fundamental

aspects of textual data that indicate complexity, namely cognition

level [77] and style [78,79]. The decisions made based on textual

data depend on the complexity perceived after reading the text,

i.e., the cognition of textual data affects decision-making. Further,

words and their order in sentences are referred to as style that

can contain certain stylistic patterns.

As stated in Section 3, in our previous work, we have exten-

sively analyzed those linguistic features potentially influencing

textual complexity [13]. In [14], we have performed a feature

selection using a complete set of features for the IT ticket classification

task. Although the importance of these features likely

depends on the domain, our analysis demonstrated

that the DML taxonomy-based and stylistic features were favorable

for complexity prediction in the IT ticket processing case

study, which belongs to the ITSM domain. Accordingly,

in our Approach, we use the DML taxonomy-based and stylistic

features. Using the features in Table 1, prediction modeling tech-

niques are trained on the labeled textual data, i.e., the training set.

Since the labeled data comprise very few data points, the problem
that the Approach deals with is a semi-supervised learning
problem. Hence, we use the following commonly applied semi-
supervised learning techniques [80] to label the unlabeled data
using the labeled data: Label Propagation, Label Spreading, and
Self-Training. The first two are very similar: both put all data
points in a graph and assign labels to the unlabeled data points
based on the distances between data points. In Label Spreading,
an affinity matrix and a normalized graph are used. Self-Training
assigns labels to unlabeled data points by repeatedly using a
trained model as a pseudo-labeler.
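The self-training idea can be sketched in a few lines. The sketch assumes a simple nearest-centroid classifier as the base model and two-dimensional feature vectors; both are our own simplifications for illustration, and the names `centroids` and `self_train` as well as the data points are hypothetical.

```python
import math

def centroids(points, labels):
    """Mean feature vector per label."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    return {label: tuple(sum(col) / len(members) for col in zip(*members))
            for label, members in groups.items()}

def self_train(labeled, unlabeled):
    """Pseudo-label unlabeled points, most confident point first."""
    points = [point for point, _ in labeled]
    labels = [label for _, label in labeled]
    remaining = list(unlabeled)
    pseudo = {}
    while remaining:
        centers = centroids(points, labels)
        # Confidence is approximated by the distance to the closest centroid.
        _, point, label = min(
            (math.dist(p, center), p, y)
            for p in remaining
            for y, center in centers.items()
        )
        # The pseudo-labeled point joins the training data for the next round.
        points.append(point)
        labels.append(label)
        pseudo[point] = label
        remaining.remove(point)
    return pseudo

# Hypothetical feature vectors of two labeled and three unlabeled tickets.
labeled = [((0.0, 0.0), "low"), ((10.0, 10.0), "high")]
unlabeled = [(1.0, 1.0), (9.0, 8.0), (4.0, 4.0)]
pseudo = self_train(labeled, unlabeled)
```

Each round retrains the base model on the enlarged training data, which is the essential difference from the graph-based Label Propagation and Label Spreading techniques.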

While training, in each prediction modeling technique, a set

of adequate hyper-parameters is chosen for creating the best

prediction model. As soon as all prediction modeling techniques

are trained, the test set is used to determine the best-performing

technique based on the prediction quality. For prediction quality

assessment, we use the F-score metric.
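For a three-point complexity scale, the F-score can be computed per class and then averaged. The sketch below assumes macro averaging, which is our assumption since the averaging scheme is not stated here; the function name `macro_f1` and the label lists are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denominator = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)

# Hypothetical expert labels and model predictions for four test tickets.
expert = ["low", "low", "medium", "high"]
predicted = ["low", "medium", "medium", "high"]
score = macro_f1(expert, predicted)
```

In this example, the classes low and medium each reach an F1 of 2/3 and the class high reaches 1, giving a macro F-score of 7/9.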

In the prediction model development, to accomplish a better

prediction quality, we use three common meta-algorithms [81],

namely bagging, boosting, and stacking. In stacking, a single mod-

eling technique aims to learn the best combination of the predic-

tion models of the primary prediction modeling techniques put

in a stack. In boosting, to fix the errors in prior prediction mod-

els, the prediction modeling techniques are trained in a chain.

Bagging involves selecting different sub-samples of a training

data set. Predictions for the sub-samples are then aggregated

to identify the final and the best prediction model. When the

prediction model development is completed, the best-performing

model is applied to the unlabeled data. Using the DML taxonomy-

based and stylistic features, each data point is classified based on

the complexity scale, which is the one used while preparing the

labeled data.

In Table 5, linguistic feature value calculations for the running

example IT tickets are listed. As can be seen, for each DML

taxonomy-based and stylistic feature, a value per ticket is com-

puted. For calculating the DML taxonomy-based feature values,

the DML taxonomy of our case study shown in Table A.12 in the

appendix is used. These values are fed into the best-performing

prediction model to obtain a single TD complexity value for each

ticket.

To derive insights into the BP complexity of handling these
tickets and to analyze how their TD and EL complexities are related,
a single value of TD complexity per ticket is necessary. Nevertheless,
one can trace back the presence of individual features in the text
if needed, for example, to understand which features influence
the complexity. This becomes notably relevant in the case of
inaccurate classifications and the XAI (explainable artificial
intelligence) paradigm [82]. For instance, inaccurate classifications
can be analyzed by identifying the contribution of each linguistic
feature to the complexity prediction. Thus, one can use these
contributions to build end-user recommendations to improve the
text.

In addition to the overall analysis of TD and EL complexities,
i.e., at the ticket level, it should be emphasized that the relation
between TD and EL complexities can be further investigated at a
higher granularity using ticket attributes. As shown in Table 4,
each of the IT tickets in the running example has a different category.

Such an attribute of tickets may be beneficial to identify how

BP execution complexity varies among subsets of tickets. In this

regard, it is essential to calculate an aggregated TD complexity

value per subset. To do so, in the Approach, we use weight mul-

tipliers that are determined based on the same complexity scale.

These multipliers are applied to the calculated TD complexities of

IT tickets in a given subset, and a weighted average is computed

per subset.
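The aggregation per subset can be sketched as follows. The concrete weight multipliers are not given here, so the values in `WEIGHTS`, the function name `aggregate_td_complexity`, and the example subset are hypothetical assumptions for illustration.

```python
# Hypothetical weight multipliers derived from a three-point complexity scale.
WEIGHTS = {"low": 1, "medium": 2, "high": 3}

def aggregate_td_complexity(labels, weights=WEIGHTS):
    """Weighted average of the TD complexities of the tickets in one subset."""
    return sum(weights[label] for label in labels) / len(labels)

# Hypothetical subset of four tickets with their predicted TD complexities.
subset = ["low", "low", "medium", "high"]
aggregated = aggregate_td_complexity(subset)
```

With these weights, the example subset obtains an aggregated value of 1.75, i.e., between low and medium on the scale.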

4.4. Calculating event log-based complexity

To calculate EL complexity, in addition to an event log, a set
of EL complexity metrics and a complexity scale are taken as
inputs. As the complexity scale, we use the same scale as in the
TD complexity. Thus, one can perform a correlation analysis
between the TD and EL complexities computed with respect to the
same complexity scale. As for the EL complexity metrics, in our
Approach, we use the ones¹ described in Section 3 (see Table 3).
Similarly to the TD complexity calculation, in this task, we create
subsets of a given event log considering the attributes present in
the textual data.

Table 5
Textual data-based complexity calculation of the running example IT tickets.

| Group | Linguistic feature | Feature value SR001 | Feature value SR002 |
|---|---|---|---|
| DML taxonomy-based | Relative occurrence of words based on the DML taxonomy and the DML cognition level routine | 0.75 (install, send, available) | 0.3 (certificate, login, find, active) |
| DML taxonomy-based | Relative occurrence of words based on the DML taxonomy and the DML cognition level semi-cognitive | 0.25 (document) | 0.3 (security, configuration, activate, unable) |
| DML taxonomy-based | Relative occurrence of words based on the DML taxonomy and the DML cognition level cognitive | 0 | 0.4 (database, recovery, workstation, different, broken) |
| Stylistic | Relative occurrence of nouns in all words | 0.35 | 0.4 |
| Stylistic | Relative occurrence of unique nouns in all nouns | 1 | 0.88 |
| Stylistic | Relative occurrence of verbs in all words | 0.3 | 0.2 |
| Stylistic | Relative occurrence of unique verbs in all verbs | 1 | 1 |
| Stylistic | Relative occurrence of adjectives in all words | 0.05 | 0.05 |
| Stylistic | Relative occurrence of unique adjectives in all adjectives | 1 | 1 |
| Stylistic | Relative occurrence of adverbs in all words | 0.05 | 0.05 |
| Stylistic | Relative occurrence of unique adverbs in all adverbs | 1 | 1 |
| Stylistic | Word count | 20 | 40 |
| Stylistic | Wording style | 0 (min. word repeats) | 0 (min. word repeats) |
| | TD complexity (per ticket, based on all features) | Low | Medium |

Fig. 3. Event log-based complexity calculation.

Fig. 3 shows how the calculation of the EL complexity is performed
in our Approach. For each event log subset, a single complexity
value is computed using the employed EL complexity metrics.
Since these metrics focus on different event log properties
(for example, size, variance in executions, distances between
sequences and events) and use different measurement units, computed
measurements will vary for a single event log subset. For
example, in an event log subset, the number of events is a counted
value, and measuring a time interval is about calculating an
average time difference between consecutive events.

¹ The Python script provided on this GitHub page is adopted to calculate
those metrics.

To have a single EL complexity value for each event log subset,

the computed complexity values are mapped to the points of

the complexity scale using clustering. More specifically, for each

metric, calculated values are clustered. Then, for each cluster, a

value is determined from the complexity scale, which is the same

scale used for the textual data labeling. This flow is depicted in

Fig. 4.
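The mapping of metric values to the complexity scale can be sketched with a one-dimensional k-means clustering. The clustering technique is not named here, so the use of k-means, the function name `cluster_to_scale`, and the metric values are assumptions made for illustration.

```python
def cluster_to_scale(values, scale=("low", "medium", "high"), max_iter=100):
    """Cluster 1-D metric values with k-means and name clusters after the scale."""
    k = len(scale)
    ordered = sorted(values)
    # Deterministic start: spread the initial centers over the sorted values.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]

    def nearest(value):
        return min(range(k), key=lambda i: abs(value - centers[i]))

    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for value in values:
            groups[nearest(value)].append(value)
        updated = [sum(group) / len(group) if group else centers[i]
                   for i, group in enumerate(groups)]
        if updated == centers:
            break
        centers = updated

    # Rank clusters by center so the smallest center gets the lowest scale point.
    by_center = sorted(range(k), key=lambda i: centers[i])
    rank = {cluster: position for position, cluster in enumerate(by_center)}
    return [scale[rank[nearest(value)]] for value in values]

# Hypothetical values of one metric (number of events) for eight subsets.
number_of_events = [12, 15, 14, 48, 52, 50, 120, 110]
labels = cluster_to_scale(number_of_events)
```

Repeating this per metric yields one scale value per metric and subset, regardless of the measurement unit of the metric.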

Fig. 4. Determining event log-based complexity of an event log subset.

Table 6
Event log-based complexity calculation of the running example IT tickets.

| EL-based complexity metric | Metric value SR001 | Metric value SR002 |
|---|---|---|
| Number of events (magnitude) | Low | High |
| Number of event types (variety) | Low | High |
| Number of sequences (support) | Low | High |
| Average sequence length (TL-avg) | Low | High |
| Average time difference between consecutive events (time granularity) | Low | Medium |
| Number of acyclic paths in transition matrix (LOD) | Low | High |
| Number of ties in transition matrix (t-comp) | Low | High |
| Lempel–Ziv complexity (LZ) | Low | High |
| Number of unique sequences DT(#) | Low | High |
| Percentage of unique sequences DT(%) | Low | Medium |
| Average distinct events per sequence (structure) | Low | High |
| Average affinity (affinity) | Low | High |
| Deviation from random (dev-random) | Low | High |
| Average edit distance (avg-dist) | Low | High |
| Variance entropy | Low | High |
| Normalized variance entropy | Low | High |
| Sequence entropy | Low | High |
| Normalized sequence entropy | Low | High |
| EL complexity (per ticket, based on all metrics) | Low | High |

In Table 6, each EL-based complexity metric and the resulting
EL complexity of our running example tickets are shown. As can
be seen, the ticket SR002 was put into the medium cluster for
the metric Percentage of Unique Sequences (DT%), whereas the
ticket SR001 was put into the low cluster. As the result of majority
voting, the EL complexity of SR002 is determined as high and, for
SR001, it is low. As the TD and EL complexities are now known,
one can identify how the TD complexities of these tickets are
associated with their EL complexities. Moreover, depending on
the direction and strength of the association between the TD and
EL complexities, further necessary analyses and inputs can be
determined.
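The majority voting over the per-metric values in Table 6 can be sketched as follows. The function name `majority_vote` is hypothetical, and resolving ties in favor of the higher complexity value is our own assumption, since tie handling is not specified here.

```python
from collections import Counter

def majority_vote(labels, scale=("low", "medium", "high")):
    """Most frequent scale value; ties go to the higher value (an assumption)."""
    counts = Counter(labels)
    return max(scale, key=lambda value: (counts[value], scale.index(value)))

# Per-metric values of Table 6: 18 metrics per ticket.
sr001 = ["low"] * 18
sr002 = ["high"] * 16 + ["medium"] * 2

el_complexity = {"SR001": majority_vote(sr001), "SR002": majority_vote(sr002)}
```

For SR002, 16 of the 18 metrics vote for high, so the two medium values of the time granularity and DT(%) metrics do not change the outcome.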

4.5. Correlation analysis

As introduced in Section 1, one of the main goals of our

research is to determine how textual data can contribute to EL

complexity prediction. Thus, in our Approach, we conduct corre-

lation analysis [83] to find out how the TD complexity is related

to the EL complexity. In the correlation analysis, the strength

of association between these two complexities and the direction

of their relation are measured. More specifically, we investigate

strong-positive, strong-negative, and no correlations. In the case

when TD and EL complexities are close to a normal distribution,

the Pearson correlation is used. To assess the strength in the

Pearson correlation, we follow the general guidelines [83,84] and

use 0.1, 0.3, and 0.5 as coefficient thresholds. In the case of non-

normality in TD and EL complexities, we choose the Spearman’s

correlation [83,84]. Using the same general guidelines, 0.2 and 0.8

are taken as the thresholds to detect the strength of correlations.

In addition, p ≤ 0.05 (p denotes probability) is used as the
significance threshold in both the Pearson and Spearman
correlations.
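As an illustration of the correlation step, the following sketch computes the Spearman correlation as the Pearson correlation of average ranks. The ordinal encoding in `ORDINAL`, the function names, and the complexity values of the five subsets are hypothetical; the significance test (p-value) is omitted.

```python
import math

# Hypothetical ordinal encoding of the three-point complexity scale.
ORDINAL = {"low": 1, "medium": 2, "high": 3}

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for index in order[start:end + 1]:
            result[index] = (start + end) / 2 + 1
        start = end + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical TD and EL complexities of five subsets.
td = [ORDINAL[v] for v in ["low", "low", "medium", "high", "medium"]]
el = [ORDINAL[v] for v in ["low", "medium", "medium", "high", "high"]]
rho = spearman(td, el)
```

For these hypothetical values, the coefficient is about 0.81, i.e., a positive association between the two complexities.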

When TD and EL complexities are in a strong-positive correla-

tion, we can use textual data to predict the execution complexity

of BPs by means of TD complexity. Moreover, organizations can

make prior decisions and take actions to mitigate complexity in

performing BPs. A negative correlation between TD and EL com-

plexities is a good indicator to identify which data type should

be further analyzed. The absence of a correlation between these two

complexities cannot be directly interpreted. Therefore, more textual data

and event log attributes and other BP execution data, such as

performance indicator values, should be considered and further

analyzed to detect reasons for complexity.

In Table 7, the TD and EL complexities for the running example

IT tickets are shown. Solely considering these two tickets, one can

notice that the TD complexity has a positive correlation with the

EL complexity. Based on such an observation, the following

interpretation can be made: TD complexity affects EL complexity.

Hence, textual data of these (and similar) tickets can be used to

predict their EL complexity.

Table 7
Textual data- and event log-based complexities of the running example IT tickets.

| ID | Channel | Category | TD complexity | EL complexity |
|---|---|---|---|---|
| SR001 | IT ticketing system | Application | Low | Low |
| SR002 | IT ticketing system | Security | Medium | High |

As a correlation does not necessarily imply a cause and effect
between two complexities, we conduct a significant difference
analysis on event logs to identify the factors that may account
for variation in the EL complexity. In the following subsection,
we elaborate on that.

4.6. Significant difference analysis of event logs

To understand what process parts or activities affect the EL

complexity, statistically significant differences in the event logs

need to be analyzed. The related literature offers approaches

aligned with that goal. We reviewed the applicability of these

approaches for analyzing the event logs within our Approach.

The approach by Bolt et al. [85] stood out since it provides

an extensible basis for the EL complexity metrics. To identify

what differences are significant, well-known statistical tests are

used within that approach. In particular, the two-tailed Welch’s

T-test [86] is used in the case of a normal distribution. This basis

is beneficial for our Approach due to the following reasons: (1)

for Welch’s T-test, it is not necessary to have equal variance

between two groups, and (2) Welch’s T-test is less restrictive

than the original Student’s T-test, which makes it more reliable.

For non-normality cases, the non-parametric rank-based Mann–

Whitney U-test [87] is applied. To handle outliers and unexpected

observations, this test focuses on the median, which is a better

measure of the central tendency for skewed data. In this respect,

such a non-parametric method is useful for our Approach.
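The test statistics behind the two tests can be sketched with the standard library only. The sketch is an illustration and not the implementation used in the Process Comparator; the function names and the two samples are hypothetical, and the p-values are omitted because they require the t and U distributions.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def mann_whitney_u(a, b):
    """U statistic of sample a: pairs in which a is larger, ties count one half."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)

# Hypothetical activity durations (in hours) in two complexity clusters.
cluster_low = [1, 2, 3, 4, 5]
cluster_high = [2, 4, 6, 8, 10]

t_stat, dof = welch_t(cluster_low, cluster_high)
u_low = mann_whitney_u(cluster_low, cluster_high)
u_high = len(cluster_low) * len(cluster_high) - u_low
```

Welch's statistic uses a separate variance estimate per group, which is why equal variances are not required, whereas the U statistic depends only on the ordering of the observations.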

Furthermore, this approach is available as a plugin (called

Process Comparator) in the Process Mining framework, ProM [88],

which offers built-in features for handling event logs. As such, we

execute this plugin for each complexity cluster pair and analyze

statistically significant differences in the event logs for a partic-

ular cluster pair. Complexity cluster pairs are determined using

the correlation analysis results. For example, if a strong-positive

correlation is observed with respect to an attribute present in

textual data, subsets created for that attribute will be used to

form cluster pairs.

Taking our running example IT tickets, a cluster pair is created

using the category attribute of the tickets. Fig. 5 illustrates how

the significant differences between the event logs of this cluster

pair are highlighted. In the figure, nodes represent activities,

whereas arcs reflect the sequence of these activities in the pro-

cess. The thickness of the arcs and nodes is determined based on

the increasing or decreasing value of the selected process met-

ric, for example, frequency or duration. Distinct colors, i.e., red

and blue, are used to indicate significant differences in terms

of the selected process metric. In addition, the letters B and R

are placed where appropriate to indicate blue and red colors,

respectively. As can be seen, the activity “Hand-over” (T10) is

significant in the cluster that contains the ticket SR002. Likewise,

“Work assignment” (T14) is significant in the cluster that contains

SR001. Considering these insights, one can further analyze the

presence or absence of those textual data features which resulted

in handovers or work reassignments.

5. Case study

In this section, first, we give information about the setting of

the case study, in which we apply the proposed Approach. Then,

we elaborate on the application of our Approach and obtained

results.

Fig. 5. Statistically significant differences between SR001 (with B) and SR002
(with R). (For interpretation of the references to color in this figure legend, the
reader is referred to the web version of this article.)

Case study organization: The IT department of an educational
institution in the Netherlands initiated a project to learn from
the data recorded about its ITSM processes. One of the important
processes among them is the SRM process. This process defines
the way of handling user requests related to the products and
services offered by the educational institution. Upgrading a
software product installed on a user device or providing access to
a file are two examples of typical service requests. Since the
institution offers a wide range of products and services to more
than 25K users, multiple resolution teams are involved in the SRM
process. Each resolution team handles requests for a particular
set of products and services. For example, printing and secure file
sharing are two services managed by separate resolution teams.

To accomplish a data-driven service delivery, the IT depart-

ment, hereafter Org-IT, expressed its interest in finding ways to

reduce the complexity of incoming requests. As this setting is

highly related to the Approach introduced in this paper, we apply

it in Org-IT and explain the findings showing its usefulness.

Table 8
Identified ticket attributes.

| Attribute | Description | Values |
|---|---|---|
| Channel | The entry source of the ticket | IT ticketing system, email, phone, online chat, desk-physical location |
| Category | The assigned category on the ticket | A number of categorical values, e.g., printing, IT infrastructure |
| Organizational unit | The organizational entity that the ticket reporting end-user belongs | A number of categorical values, e.g., human resources, finance department, or science faculty |
| Duration | Time spent on the ticket | Less than 1 day, 1–3 days, 3–5 days, 5–10 days, 10–15 days, and more than 15 days |
| Hold-on | Whether the ticket is held-on, i.e., the ticket processing is paused, and the end-user is asked for more information | Never, once, more than once |
| Re-open | Whether the ticket is reopened | Never, once, more than once |

5.1. Data collection

As explained in the previous section, three types of input

(textual data, event log, and complexity scale) are required in the

Approach. To obtain these, we worked together with five experts

who coordinate the resolution teams in Org-IT. Importantly, these

experts have substantial knowledge of incoming service requests.

With the support of the experts, first, we defined a complexity

scale. Second, we asked experts to label a set of randomly selected

service requests considering their textual data. Lastly, together

with the experts, we extracted the SRM process execution data

in the form of an event log.

Complexity scale: In a semi-structured discussion meeting, we

asked the aforementioned five experts to define a complexity

scale based on the service requests handled by the resolution

teams they are coordinating. The experts agreed on a three-point

scale containing low, medium, and high as complexity values.

Textual data: Org-IT provided us with textual data about the service requests handled between Jan 2019 and May 2021. The provided textual data contains 4982 service requests (also called tickets). Of these, 134 randomly selected requests (∼2.7% of the total) were labeled on the defined three-point complexity scale with the support of the same five experts. In labeling, the experts based their decisions on the textual data as well as on other data recorded about the tickets, such as priority, category, or attached documents.

Event log: For the aforementioned 4982 service requests, we

received an event log consisting of ∼37K events.

Furthermore, to conduct a detailed analysis of the relation

between TD and EL complexities, subsets from textual data and

event log need to be created. For this purpose, a set of service

request attributes is identified together with the aforementioned

experts. These attributes are listed in Table 8. In addition, to

analyze complexities over time and detect changes, the following

time periods are defined: before Covid-19 (Jan 2019–Feb 2020)

and during Covid-19 (Mar 2020–May 2021).
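The subset creation described above can be sketched as follows. This is only an illustration under stated assumptions: the ticket records, the field names (`created`, `channel`, `hold_on`), and the helper names are hypothetical, and only the two period boundaries are taken from the text.

```python
from collections import defaultdict
from datetime import date

# Period boundary as stated in the text: "before" covers Jan 2019-Feb 2020,
# "during" covers Mar 2020-May 2021.
COVID_START = date(2020, 3, 1)


def period_of(created: date) -> str:
    """Map a ticket creation date to the 'before' or 'during' Covid-19 period."""
    return "before" if created < COVID_START else "during"


def build_subsets(tickets, attribute, grouping=None):
    """Group ticket ids by period and attribute value.

    If a grouping attribute is given (e.g. the channel), the key also contains
    its value, which yields the pairwise combinations.
    """
    subsets = defaultdict(list)
    for ticket in tickets:
        key = (ticket[attribute],)
        if grouping is not None:
            key = (ticket[grouping],) + key
        subsets[(period_of(ticket["created"]),) + key].append(ticket["id"])
    return dict(subsets)


# Hypothetical tickets; field names and values are illustrative only.
tickets = [
    {"id": 1, "created": date(2019, 5, 2), "channel": "email", "hold_on": "never"},
    {"id": 2, "created": date(2020, 3, 1), "channel": "email", "hold_on": "once"},
    {"id": 3, "created": date(2021, 1, 15), "channel": "phone", "hold_on": "never"},
]
by_channel = build_subsets(tickets, "channel")
pairs = build_subsets(tickets, "hold_on", grouping="channel")
```

Keying every subset by period, grouping value, and attribute value lets the same subsets be reused for both the TD and the EL complexity calculations.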

5.2. Calculating textual data-based complexity

A DML taxonomy is a required input to calculate TD complexity, as explained in Section 4. Since no such taxonomy is available in the case study, we develop one. For this purpose, we follow the procedure explained in Section 3, which is taken from our previous work [11].

DML taxonomy for SRM process: As we focus on the SRM pro-

cess in the case study, we develop a DML taxonomy for that

process. To express the cognition level of textual data in terms

of linguistic features, PoS (nouns, verbs, adjectives, and adverbs)

in textual data are analyzed. First, we clean the textual data in

the tickets and create document-term matrices. To clean the textual data, we (i) remove signatures from the tickets received via email and (ii) replace URLs, email addresses, and phone numbers mentioned in the text with pseudonyms (for example, url1, email1). Second, LDA and

LSI topic modeling methods [68] are combined to extract the de-

scriptive keywords of the tickets. Finally, the extracted keywords

are grouped into the three DML cognition levels (routine, semi-

cognitive, and cognitive). In this grouping, the aforementioned

five experts are involved to correctly identify the DML cognition

level for each keyword. In particular, these experts are asked to

critically evaluate and provide their feedback on the extracted

keywords and their corresponding DML cognition levels. Addi-

tionally, the Information Technology Infrastructure Library (ITIL)

framework [89] is used to enrich the taxonomy. The experts are

asked to identify which keywords from the SRM process docu-

mentation in ITIL should be included in the taxonomy. The DML

taxonomy created for the SRM process is shown in Table A.12 in

the appendix. For implementation purposes, separate text files for each part of speech and DML level are created, as presented on our GitHub page (see footnote 2).
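The two cleaning activities (signature removal and pseudonymization) can be illustrated with a minimal sketch. The regular expressions, the signature heuristic, and the function names below are assumptions made for illustration and are not taken from the case study implementation; only the pseudonym format (url1, email1) follows the text.

```python
import re

# Assumed patterns; real ticket data will likely need more careful rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s()/-]{6,}\d")
# Assumed signature heuristic: cut everything from the first line that is a
# "--" delimiter or starts with a closing formula such as "Kind regards".
SIGNATURE_RE = re.compile(r"^(?:--\s*$|(?:best |kind )?regards\b).*", re.I | re.M | re.S)


def pseudonymize(text, pattern, prefix):
    """Replace each distinct match with a numbered pseudonym (url1, url2, ...)."""
    seen = {}

    def repl(match):
        value = match.group(0)
        if value not in seen:
            seen[value] = f"{prefix}{len(seen) + 1}"
        return seen[value]

    return pattern.sub(repl, text)


def clean_ticket(text):
    """Strip the signature, then pseudonymize URLs, email addresses, and phone numbers."""
    text = SIGNATURE_RE.sub("", text, count=1)
    text = pseudonymize(text, URL_RE, "url")
    text = pseudonymize(text, EMAIL_RE, "email")
    text = pseudonymize(text, PHONE_RE, "phone")
    return text.strip()


raw = (
    "Printer offline, see https://intranet.example/print and mail "
    "jane@example.org or call +31 30 123 4567\nKind regards,\nJane"
)
cleaned = clean_ticket(raw)
```

URLs are replaced before email addresses and phone numbers so that digits or "@" characters inside a URL are not pseudonymized twice.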

Using the developed DML taxonomy, we extract the DML

taxonomy-based linguistic features for the given 4982 tickets.

Afterward, for the same tickets, we extract the stylistic features

as listed in Table 1 in Section 3.
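How taxonomy-based features might be derived from a ticket text can be sketched as below. The taxonomy excerpt is hypothetical (the actual keywords are listed in Table A.12 and are organized per part of speech), and the feature definition used here, the relative occurrence of each DML level among all taxonomy hits, is an assumption made for illustration.

```python
import re
from collections import Counter

# Hypothetical taxonomy excerpt; the real keywords are in Table A.12.
TAXONOMY = {
    "routine": {"reset", "install", "password"},
    "semi-cognitive": {"configure", "migrate"},
    "cognitive": {"investigate", "redesign"},
}


def dml_features(text):
    """Return the share of routine, semi-cognitive, and cognitive keyword hits."""
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = Counter()
    for token in tokens:
        for level, words in TAXONOMY.items():
            if token in words:
                hits[level] += 1
    total = sum(hits.values())
    # Tickets without any taxonomy keyword get all-zero features.
    return {level: (hits[level] / total if total else 0.0) for level in TAXONOMY}


features = dml_features("Please reset my password and investigate the outage")
```

In this example two of the three taxonomy hits are routine keywords and one is cognitive, so the routine share is 2/3.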

Next, prediction models are developed to classify unlabeled

tickets. Specifically, we take the four typical semi-supervised

learning techniques as the basis and enrich the unlabeled data

using the labeled data. Afterward, labeled and unlabeled data are

combined. From the combined data, training and test data sets are

created. A number of commonly used prediction modeling tech-

niques are trained on the training data set. The best-performing

prediction model is selected using the F-score metric. Then, the

best-performing prediction model is run on the unlabeled data to

assign TD complexities.
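The general idea of pseudo-labeling can be sketched in a few lines. This is a generic, single-round variant under stated assumptions, not the custom Pseudo-Labeling procedure used in the case study: a simple nearest-centroid classifier stands in for the scikit-learn models, and the two-dimensional feature vectors are invented.

```python
from collections import defaultdict
from math import dist


def fit_centroids(X, y):
    """Fit a nearest-centroid classifier: one mean feature vector per label."""
    sums, counts = {}, defaultdict(int)
    for features, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, X):
    """Assign each feature vector to the label of its closest centroid."""
    return [min(centroids, key=lambda label: dist(centroids[label], x)) for x in X]


def pseudo_label(X_lab, y_lab, X_unlab):
    """Train on labeled data, pseudo-label the unlabeled data, retrain on both."""
    base = fit_centroids(X_lab, y_lab)
    pseudo = predict(base, X_unlab)
    final = fit_centroids(X_lab + X_unlab, y_lab + pseudo)
    return final, pseudo


# Invented feature vectors for three labeled and three unlabeled tickets.
X_lab = [[0.1, 0.0], [0.5, 0.5], [0.9, 1.0]]
y_lab = ["low", "medium", "high"]
X_unlab = [[0.2, 0.1], [0.8, 0.9], [0.45, 0.55]]
model, pseudo = pseudo_label(X_lab, y_lab, X_unlab)
```

In practice, the combined labeled and pseudo-labeled data would then be split into training and test sets, and several candidate models would be compared on the test set.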

Fig. 6 depicts the flow from cleaning textual data to calculating

TD complexities. How textual data is split and processed can be

seen in the figure.

As indicated before, developing DML taxonomy-based linguis-

tic features requires additional resources, i.e., domain expert in-

volvement. In fact, one can argue whether such investment would

provide more accurate results compared to a feature development technique that does not include user involvement. To address this concern and show the value of our linguistic features in predicting TD complexity, we take Linguistic Inquiry and Word Count (LIWC), a state-of-the-art text analysis technique, predict TD complexity using LIWC features, and compare the results with those obtained using our linguistic features. In this regard, we first elaborate on LIWC features and then evaluate our approach against them.

Fig. 6. Textual data-based complexity calculation by means of prediction.

Footnote 2: Check out the DML taxonomy on our GitHub page.

LIWC is a dictionary-based text analysis technique that focuses

on the connection between important psycho-social constructs

and theories with words, phrases, and other linguistic construc-

tions [90]. Words given in a text are analyzed and placed into one

or more linguistic, psychological, and topical categories. Each of

these categories indicates several aspects of a text, for example,

social, cognitive, or affective. From these categories, we select

linguistic features to comply with the focus of our Approach.

The selected features and their definitions with examples (taken

from [90]) are given in Table 9. As can be seen, there is consid-

erable overlap between our linguistic features listed in Table 1

and the selected LIWC features. Notably, features derived using

parts-of-speech (PoS) in text, for example, nouns, verbs, adverbs,

and adjectives, show similarity.

Using the LIWC implementation (see footnote 3), we obtain the LIWC feature values (mostly counts and distribution frequencies) for the

tickets. Using these values and following the same steps ex-

plained above, we develop prediction models and measure the

performance of these models in terms of weighted F-score. The

prediction models and their performance in the evaluation us-

ing both our linguistic features and LIWC features are given in

Table 10.
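For reference, the weighted F-score averages the per-class F1 values using the number of true instances of each class as weights. The following dependency-free sketch shows the computation; the label vectors are invented and the function name is illustrative.

```python
from collections import Counter


def weighted_f_score(y_true, y_pred):
    """Support-weighted mean of the per-class F1 scores."""
    support = Counter(y_true)
    total = 0.0
    for label, n_true in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n_true - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n_true * f1
    return total / len(y_true)


# Invented labels: one "low" ticket is misclassified as "medium".
y_true = ["low", "low", "medium", "high"]
y_pred = ["low", "medium", "medium", "high"]
score = weighted_f_score(y_true, y_pred)
```

Here the per-class F1 values are 2/3 (low, support 2), 2/3 (medium, support 1), and 1 (high, support 1), giving a weighted F-score of 0.75.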

As highlighted in italics in the first row in Table 10, Bagged

Decision Trees is the outperforming algorithm with the best

weighted F-score value. This performance is achieved with the

DML taxonomy-based and stylistic features. In the case of LIWC

features, the Bagged Decision Trees algorithm has notably lower

performance. Overall, in almost all algorithms, we obtained a

better performance with our linguistic features than with LIWC features. Only in two cases, with a subtle difference, LIWC features showed a better performance. These are highlighted in the table in dark gray, namely Naïve Bayes with Pseudo-Labeling and Stacked SVM-Naïve Bayes (finalizer: Decision trees) with Pseudo-Labeling. It is important to note that tree-based algorithms perform particularly well when the base semi-supervised learning technique is Pseudo-Labeling (see the top four rows in the table). However, Pseudo-Labeling is also seen in the least successful performances (see the last row in the table). Apart from that, Label Spreading is another semi-supervised learning technique observed in the majority of the less successful performances.

Footnote 3: Check out the LIWC implementation.

To perform a drill-down TD complexity calculation, we take

Table 8 as the basis and create subsets for the following: per ticket

attribute, pairwise combined ticket attributes, and before and

during Covid-19 periods. Hereby, the case study experts indicated

not only important ticket attributes but also the most relevant

pairwise combinations, i.e., by ‘‘channel’’ attribute, as it directly

affects the incoming text of a service request. The TD complexity

per subset is computed using the weighted TD complexity of the

tickets contained in it. For the low, medium, and high points in

the complexity scale, 1, 2, and 3 are used as the weight multipli-

ers, respectively. Simply put, the TD complexity per subset is the

aggregation of individual TD complexity of tickets in the subset

using these weight multipliers.
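One possible reading of this aggregation is sketched below. The text does not spell out the exact formula, so the weighted mean and the mapping back to the nearest scale point are assumptions made for illustration; only the weight multipliers 1, 2, and 3 come from the text.

```python
# Weight multipliers for the three-point complexity scale, as given in the text.
WEIGHTS = {"low": 1, "medium": 2, "high": 3}
LEVELS = {weight: level for level, weight in WEIGHTS.items()}


def subset_td_complexity(labels):
    """Return the weighted mean score of a subset and the nearest scale point.

    Assumption: ties between two scale points (e.g. a score of 1.5) resolve
    to the lower point.
    """
    score = sum(WEIGHTS[label] for label in labels) / len(labels)
    nearest = min(LEVELS, key=lambda weight: abs(weight - score))
    return score, LEVELS[nearest]


# Invented subset of four tickets with predicted TD complexities.
score, level = subset_td_complexity(["low", "low", "medium", "high"])
```

For this subset the weighted mean is (1 + 1 + 2 + 3) / 4 = 1.75, which is closest to the medium point of the scale.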

5.3. Calculating event log-based complexity

The event log provided by the case study organization is

filtered, and event log subsets are created for the ticket attributes

listed in Table 8, their pairwise combinations, and the defined

two time periods. For example, five event log subsets are created

for the five values seen in the channel attribute. To obtain a

single EL complexity for an event log subset, cluster analysis

is conducted. The number of clusters is set to three since the

given complexity scale contains three value points. Then, for

each EL complexity metric (see Table 3), a complexity value is determined, resulting in 13 values per subset. By majority voting, a single complexity value is then selected as the EL complexity for each subset.
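The final voting step can be sketched as follows, assuming that the 13 per-metric complexity values of a subset are already available from the cluster analysis. The tie-breaking rule (the higher complexity wins) is an assumption, since the text does not state how ties are resolved.

```python
from collections import Counter

SCALE = ("low", "medium", "high")


def majority_vote(metric_values):
    """Select the most frequent complexity value among the per-metric values.

    Assumption: if several values are equally frequent, the highest one on
    the scale is returned.
    """
    counts = Counter(metric_values)
    best = max(counts.values())
    tied = [level for level in SCALE if counts.get(level, 0) == best]
    return tied[-1]


# Invented per-metric results for one event log subset (13 metrics).
values = ["high"] * 7 + ["medium"] * 4 + ["low"] * 2
el_complexity = majority_vote(values)
```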

5.4. Analyzing correlations

In the correlation analysis, we measure to what extent the calculated TD and EL complexities correlate. Aligned with the specification in our Approach (see Section 4.5), Spearman’s correlation is chosen for the analysis, as there is non-normality in the data distribution and the complexity scale contains ordinal values. We compute the correlations for the subsets created with the help of the case study experts. The computed correlations are displayed in Table 11. The first column, Combination, contains the information regarding the grouping attribute, i.e., its presence (‘‘Channel’’) or absence (‘‘–’’). In the column Overall, the overall values of coef, i.e., the coefficient indicating the strength of the measured relationship between TD and EL complexities, and p, i.e., the quantified significance of such a relationship, are shown. They are followed by the columns Before Covid-19 and During Covid-19, revealing coef and p values for the indicated time periods. For instance, for the subset created using the pairwise combination of the ticket attributes ‘‘Channel’’ and ‘‘Hold-on’’ for the time period before Covid-19, coefficient 0.439 and p value 0.101 are computed.

Table 9
LIWC features [90].

| Category | Feature | Definition/Example |
|---|---|---|
| Summary | Word count | Total word count |
| Summary | Words per sentence | Average words per sentence |
| Summary | Big words | Percent words 7 letters or longer |
| Summary | Dictionary words | Percent words captured by LIWC |
| Summary | Analytical thinking | Metric of logical, formal thinking |
| Linguistic | Total function words | the, to, and, I |
| Linguistic | Total pronouns | Total amount of pronouns |
| Linguistic | Personal pronouns | I, you, my, me |
| Linguistic | 1st person singular | I, me, my, myself |
| Linguistic | 1st person plural | we, our, us, lets |
| Linguistic | 2nd person | you, your, u, yourself |
| Linguistic | 3rd person singular | he, she, her, his |
| Linguistic | 3rd person plural | they, their, them, |
| Linguistic | Impersonal pronouns | that, it, this, what |
| Linguistic | Determiners | the, at, that, my |
| Linguistic | Articles | a, an, the, alot |
| Linguistic | Numbers | one, two, first, once |
| Linguistic | Prepositions | to, of, in, for |
| Linguistic | Auxiliary verbs | is, was, be, have |
| Linguistic | Adverbs | so, just, about, there |
| Linguistic | Conjunctions | and, but, so, as |
| Linguistic | Negations | not, no, never, nothing |
| Linguistic | Common verbs | is, was, be, have |
| Linguistic | Common adjectives | more, very, other, new |
| Linguistic | Quantities | all, one, more, some |
| Psychological-cognition | All-or-none | all, no, never, always |
| Psychological-cognition | Cognitive processes | but, not, if, or, know |
| Psychological-cognition | Insight | know, how, think, feel |
| Psychological-cognition | Causation | how, because, make, why |
| Psychological-cognition | Discrepancy | would, can, want, could |
| Psychological-cognition | Tentative | if, or, any, something |
| Psychological-cognition | Certitude | really, actually, of course, real |
| Psychological-cognition | Differentiation | but, not, if, or |
| Psychological-cognition | Memory | remember, forget, remind, forgot |
| Expanded-states | Need | have to, need, had to, must |
| Expanded-states | Want | want, hope, wanted, wish |
| Expanded-states | Acquire | get, got, take, getting |
| Expanded-states | Lack | don’t have, didn’t have, hungry |
| Expanded-states | Fulfilled | enough, full, complete, extra |
| Expanded-states | Fatigue | tired, bored, don’t care, boring |
| Expanded-time | Time | when, now, then, day |
| Expanded-time | Past focus | was, had, were, been |
| Expanded-time | Present focus | is, are, I’m, can |
| Expanded-time | Future focus | will, going to, have to, may |

Following the general guidelines explained in the Approach

(see Section 4.5), 0.8 and 0.2 are chosen as the thresholds for

strong and weak correlations. The significance of correlations is

identified using the criterion p ≤ 0.05, which is also explained in the same Section 4.5. Accordingly, in Table 11, strong correlations are highlighted in dark gray cells. Light gray is used for indicating the combinations where p values meet the criterion.
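To make the computation concrete, the following dependency-free sketch computes Spearman's coefficient as the Pearson correlation of average ranks and classifies it with the 0.8 and 0.2 thresholds. The input values are invented, and treating the thresholds as inclusive is an assumption.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's coefficient; undefined (division by zero) for constant input."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


def strength(coef, strong=0.8, weak=0.2):
    """Classify a coefficient; inclusive thresholds are an assumption."""
    if abs(coef) >= strong:
        return "strong"
    if abs(coef) >= weak:
        return "weak"
    return "none"


# Invented complexities (1 = low, 2 = medium, 3 = high) for three subsets.
td = [1, 2, 2]
el = [1, 2, 3]
coef = spearman(td, el)
```

With one tie in three observations, the coefficient is about 0.866. The sketch does not compute p values; in practice a statistics library such as SciPy (`scipy.stats.spearmanr`) returns both the coefficient and the p value.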

As can be seen in the first six rows of the table (no group-

ing attribute ‘‘–’’), the correlations are strong in the following

four ticket attributes: ‘‘Channel’’, ‘‘Organizational unit’’, ‘‘Hold-

on’’, and ‘‘Re-open’’. Moreover, in all but one of these attributes

(‘‘Organizational unit’’), the over-time strong correlations are

identified. The correlation strength for ‘‘Organizational unit’’ in

one of the time periods (before Covid-19) is only 0.012 less than

the defined threshold. In the combinations, i.e., pairs (grouping

attribute ‘‘Channel’’), a strong correlation is identified for the

‘‘Channel–Category’’ combination. The computed correlation before Covid-19 (0.607) is also significant for this pair. However, it is weak, albeit well above the defined threshold (0.2).

Fig. 7 illustrates some of the correlations. In the figure, we

used jittering to reduce overlapping points that hinder getting

a sense of density. As can be seen in Table 11, we observe

weak correlations for the following two subsets, including over

time (no grouping attribute ‘‘–’’): ‘‘Category’’ and ‘‘Duration’’.

For the ‘‘Duration’’ attribute, there is a subtle difference be-

tween the obtained correlation and the defined strong threshold

(i.e., 0.8−0.707 =0.093). Apart from that, weak correlations

are observed in the three out of five combinations (grouping

attribute ‘‘Channel’’ in Table 11): ‘‘Channel–Organizational unit’’,

‘‘Channel–Re-open’’, and ‘‘Channel–Duration’’. The only combina-

tion for which no correlation exists is ‘‘Channel–Hold-on’’. Aside

from that, no negative correlations are detected.


Table 10
Evaluation of prediction models for textual data-based complexity.

| Meta algorithm | Algorithm (a) | Base SSLT (b) | Weighted F-score: DML taxonomy-based & stylistic features | Weighted F-score: LIWC features |
|---|---|---|---|---|
| Bagging | Decision Trees | PL | 0.943 | 0.865 |
| – | Random forest | PL | 0.94 | 0.852 |
| – | Extra trees | PL | 0.938 | 0.881 |
| Boosting | Random forest | PL | 0.931 | 0.872 |
| Boosting | Gradient boosting | PL | 0.922 | 0.832 |
| – | Logistic regression | PL | 0.919 | 0.881 |
| – | K-Nearest neighbors | PL | 0.914 | 0.877 |
| – | Decision trees | PL | 0.909 | 0.85 |
| Boosting | AdaBoost | PL | 0.901 | 0.839 |
| – | Stochastic gradient descent | PL | 0.9 | 0.697 |
| – | Naïve Bayes | PL | 0.899 | 0.904 |
| – | Perceptron | ST | 0.897 | 0.826 |
| – | Support vector machines | ST | 0.897 | 0.818 |
| – | Support vector machines | PL | 0.897 | 0.852 |
| Stacking | Stacked: Support vector machines and Naïve Bayes, finalizer: Logistic regression | PL | 0.897 | 0.896 |
| – | Logistic regression | ST | 0.893 | 0.818 |
| – | Perceptron | PL | 0.893 | 0.837 |
| – | Extra trees | LS | 0.89 | 0.867 |
| – | Support vector machines | LS | 0.89 | 0.826 |
| Stacking | Stacked: Support vector machines and Naïve Bayes, finalizer: Logistic regression | LS | 0.89 | 0.826 |
| – | Random forest | LS | 0.89 | 0.867 |
| – | Stochastic gradient descent | ST | 0.889 | 0.697 |
| – | Decision trees | LS | 0.889 | 0.867 |
| Stacking | Stacked: Support vector machines and Naïve Bayes, finalizer: Decision trees | LS | 0.885 | 0.826 |
| – | K-Nearest neighbors | LS | 0.885 | 0.826 |
| Boosting | Gradient boosting | LS | 0.885 | 0.826 |
| Boosting | Random forest | LS | 0.884 | 0.826 |
| Bagging | Decision trees | LS | 0.883 | 0.826 |
| – | Naïve Bayes | LP | 0.871 | 0.794 |
| Boosting | AdaBoost | LS | 0.839 | 0.818 |
| Stacking | Stacked: Support vector machines and Naïve Bayes, finalizer: Decision trees | PL | 0.815 | 0.828 |

(a) For algorithms, we refer to the Python scikit-learn library implementation on https://scikit-learn.org/stable/.

(b) Semi-Supervised Learning Technique. PL: Custom Pseudo-Labeling, ST: Self-Training, LS: Label Spreading, and LP: Label Propagation.

Table 11
Correlations between textual data- and event log-based complexities.

| Grouping attribute | Attribute | Overall coef. | Overall p | Before Covid-19 coef. | Before Covid-19 p | During Covid-19 coef. | During Covid-19 p |
|---|---|---|---|---|---|---|---|
| – | Channel | 0.968 | 0.007 | 1 | 0 | 1 | 0 |
| – | Organizational unit | 0.983 | 0 | 0.788 | 0 | 0.833 | 0 |
| – | Hold-on | 0.866 | 0.333 | 0.866 | 0.333 | 0.866 | 0.333 |
| – | Re-open | 0.866 | 0.333 | 0.866 | 0.333 | 0.866 | 0.333 |
| – | Duration | 0.707 | 0.116 | 0.707 | 0.116 | 0.707 | 0.116 |
| – | Category | 0.577 | 0.081 | 0.655 | 0.056 | 0.632 | 0.092 |
| Channel | Category | 0.816 | 0 | 0.607 | 0 | 0.977 | 0 |
| Channel | Organizational unit | 0.577 | 0 | 0.364 | 0.005 | 0.533 | 0 |
| Channel | Re-open | 0.556 | 0.048 | 0.342 | 0.232 | −0.082 | 0.782 |
| Channel | Duration | 0.476 | 0.019 | 0.594 | 0.001 | 0.536 | 0.003 |
| Channel | Hold-on | 0.037 | 0.895 | 0.439 | 0.101 | 0.426 | 0.113 |

Furthermore, the significance of the correlations can be seen

in the p columns in Table 11. ‘‘Channel’’ and ‘‘Organizational unit’’

are the attributes with significant correlations, including the two

time periods. For the pair ‘‘Channel–Category’’, a similar observation can be obtained from the table. In addition, the significance criterion is met when the ‘‘Channel’’ is combined with ‘‘Organizational unit’’, ‘‘Re-open’’, and ‘‘Duration’’ attributes, although their computed correlations are not strong.

Fig. 7. Correlations between textual data- and event log-based complexities.

The correlations explained above serve as the basis for

analyzing statistically significant differences in the execution of

BPs based on the event log subsets. In the following subsection,

we present the results of that analysis.

5.5. Analyzing significant differences in business process executions

Considering Table 11, we focus on strong correlations, weak

correlations, and changes in correlations over time. Accordingly,

we analyze the statistically significant differences between the

event log subsets in the related correlations. The most interesting

findings from that analysis are presented in this subsection.


Fig. 8. Significant differences between more interactive channels (with B) and

less interactive channels (with R). (For interpretation of the references to color

in this figure legend, the reader is referred to the web version of this article.)

Due to their real-time conversation nature, phone, desk-

physical location, and online chat are considered more interactive

channels. In contrast, email and the IT ticketing system are the two less

interactive channels. As shown in Table 11 and in Fig. 7(a), we ob-

tained strong significant correlations in the ‘‘Channel’’ attribute.

More specifically, EL complexity increases when the channel is

less interactive. Hence, we investigated the differences between

the event log subsets of more and less interactive channels.

In Fig. 8, statistically significant differences in the SRM process

execution between more and less interactive channels are dis-

played. Activities and paths that more often reoccur in handling

tickets received via email and IT ticketing system are presented

in red (with R). Blue (with B) is used for phone, desk-physical

location, and online chat. The activity T9 is related to the catego-

rization of tickets and their assignments to the ticket resolution

teams. In interactive channels, it is usually performed after the

‘‘Registration’’ activity (T1). However, in less interactive chan-

nels, activities ‘‘Pause’’ and ‘‘Ask end-user for extra information’’

(T12 and T13) occur more often. In addition, ‘‘Hand-over’’ (T10)

and ‘‘Work assignment’’ (T14) are also frequent activities for

these channels. Aside from that, ‘‘Reopening’’ (T31) of the tickets

coming via these channels takes place more often compared to

the interactive channels. Aligned with that, activities that are

related to the resolution and closure of the ticket (T16, T29, and

T30) and changing the ticket status (T8) occur more often for

the same channels. In a further analysis, the tickets coming via less interactive channels, i.e., email and IT ticketing system, are checked. A notable observation is that the priority of the tickets with a medium TD complexity is frequently changed.

Fig. 9. Significant differences between categories 3, 4, 5, and 7 (with B) and others (with R). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

‘‘Channel–Category’’ is the only pair (grouping attribute ‘‘Chan-

nel’’) in which strong significant correlations are observed (see

Table 11). Moreover, as depicted in Fig. 7(b), in the four ticket

categories (Category 3, 4, 5, and 7), the EL complexity of tickets

registered via IT ticketing system increases to high while their

TD complexities remain unchanged. Fig. 9 shows the differences

in handling the tickets in these four categories (blue color) in

comparison to the other categories. Notably, ‘‘Changing assigned

resolution teams’’ activity (T10) in the tickets is more common in

these categories. In other categories, ‘‘Work assignment’’ (T14) is

frequently handled in the same resolution teams.

In addition, as can be seen in Fig. 7(b), the TD complexity

of the tickets coming via email channel is low before Covid-

19 and increases to medium during Covid-19. However, there

is no recognizable change in their EL complexities. Therefore,

these tickets are further analyzed. We found that the median duration for handling these tickets has doubled.


Fig. 10. Significant differences between duration longer than 10 days (with B) and up to 10 days (with R). (For interpretation of the references to color in this

figure legend, the reader is referred to the web version of this article.)

Fig. 10 shows the difference analysis based on Fig. 7(c), i.e.,

‘‘Channel–Duration’’. In particular, we investigate the reasons that

may account for the longer execution duration (i.e., more than

10 days) of the tickets with low TD complexity and received

via interactive channels (phone and desk-physical location). As

highlighted in blue (with B), frequent ‘‘Reopening’’ activity (T31)

is observed. Because of frequent reopening, other main activities

(for example, T14 and T8) are repeated and highlighted in blue

(with B) to indicate their recurrence. Notably, there might

be several reasons for a longer execution duration of the tickets

with low TD complexity and vice versa, a subject for further

investigation. For example, in some cases, end-users trigger the

reopening of a prior ticket by appending a follow-up request or

requesting minor adjustments.

Next, in Fig. 7(d), we focus on organizational units issuing the

requests. Organizational unit2 is the outlier unit in terms of the

TD and EL complexities. We analyze the executions of the tickets

of that unit and compare them with the tickets coming from the

rest of the units. In Fig. 11, the frequently performed activities

of handling the tickets coming from that unit are displayed in

blue (with B), for example, ‘‘Work assignment’’ (T14) and ‘‘Ask

end-user for extra information’’ (T13) activities. As can be seen,

‘‘Changing ticket status’’ (T8) happens more frequently. In ad-

dition, in the tickets coming from Organizational unit2, ‘‘Work

assignment’’ (T14) is directly followed by ‘‘Provide resolution’’

activity (T29) more often.

Considering the findings explained in the subsections above,

we discuss their implications in the following section.

6. Discussion

We discuss our findings, their implications, and the limitations

of our Approach in two subsections. Section 6.1 is devoted to

the interpretations of the case study results and derived relevant

observations. Section 6.2 highlights the benefits of the Approach

while mentioning its limitations.

6.1. Analyzing results and deriving observations

To obtain recommendations for BP redesign and improvement

in the proposed Approach, we address three important challenges

in IT ticket processing: ticket categorization, work assignment,

and prioritization. Textual data describing tickets are the basis for

identifying ticket categories that are then used to assign resolution teams to the tickets. Moreover, tickets are prioritized based on textual data. In IT ticket processing, the priority of tickets has a major influence on the way they are handled. Hence, we discuss our case study findings by focusing on such challenges.

Fig. 11. Significant differences between organizational unit2 (with B) and others (with R). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

The case study results have shown that the TD and EL com-

plexities increase when the channel is less interactive, i.e., email

or IT ticketing system. Specifically, in these channels, tickets

are held on (‘‘Pause’’ and ‘‘Ask end-user for extra information’’

activities) frequently. As a result, constant work reassignments

between resolution teams are detected. Moreover, repetitions of

such activities show a negative influence on the ticket handling

duration, which directly affects Service Level Agreements (SLAs).

Considering these, one can infer that less verbal communication

when submitting the service request raises the complexity. On

the contrary, in interactive channels, textual descriptions of end-

user service requests are clear and comprehensive. The real-time

verbal communication nature of interactive channels enables the

requests to be interpreted and registered correctly. Additionally,

in these channels, operators can guide end-users to provide all

required information in a single conversation. Accurate ticket

categorization, prioritization, and work assignment can be bet-

ter accomplished using the mentioned advantages of interactive

channels.

The analysis using the ‘‘Channel–Category’’ combination shows

that resolution teams are changed frequently in some ticket

categories. Such changes often happen when textual data are not

comprehensive enough to identify the resolution teams correctly.

The involved experts in the case study interpret such frequent

change of resolution teams as a ‘‘ping-pong’’ behavior. In other

words, due to the lack of clarity in textual data, tickets are passing

from one resolution team to another until more information is

available to detect the correct team.

Next, we note that tickets coming from specific organizational

units were more complex. In particular, in one of 17 organiza-

tional units, namely Organizational unit2, medium TD and high

EL complexities are obtained. To unveil the reasons for high EL

complexity, we conducted the significant difference analysis. In

particular, the event log of the tickets requested by this orga-

nizational unit is compared with the event log of the tickets

sent by the remaining 16 organizational units. We found

that ‘‘Work assignment’’ and ‘‘Ask end-user for extra information’’

activities frequently happen and cause high EL complexity in

handling tickets coming from Organizational unit2. Moreover,

to understand the reasons for TD complexity, we analyzed the

attributes of the tickets coming from this organizational unit.

Based on the ‘‘Category’’ distribution of the tickets, it is observed

that Organizational unit2 often requests services requiring the

involvement of multiple resolution teams. Moreover, technical

terms are commonly used in the textual data of the tickets.

The case study experts mention that end-users belonging to that

organizational unit have technical backgrounds and knowledge.

In this regard, we have discovered that the stylistic features of

these tickets differ from the tickets of other organizational units

in the sense of relative occurrences of unique PoS and wording

style. This observation indicates that, in this case, the stylistic features that form part of the TD complexity have an important influence on the EL complexity, i.e., actual ticket processing.

In the context of the implications of the observations about

organizational units, the following points should be considered

by organizations when applying our Approach. The compositions

of particular departments or teams and regulations within them

are likely to have a notable influence on textual data. For example,

textual data in the requests of users with rights to install software

and change configurations on their computers will differ from the

textual data in the requests of users who have no such flexibility

in using a computer. Combining the textual data of such organizational units and analyzing it separately from the textual data

of other grouped units may provide more specific indications of

what leads to complexity. Hence, tailored solutions for addressing

TD complexity in such organizational units may be more effective

than general solutions for the entire organization.

Another observation is that in some tickets coming via inter-

active channels, the TD complexity and ticket execution duration

are inversely related. Notably, the EL complexity of these

tickets remains unchanged. In such exceptional cases, further

analysis considering other ticket attributes and interactions be-

tween teams and end-users is required. For example, due to some

changes in the IT infrastructure or outsourced services, a ticket

with a low TD complexity may take more time to handle.

Aside from the aforementioned, the case study findings also

lead to the following further observations:

• DML taxonomy-based and stylistic features are rather adequate in predicting TD complexity as seen in comparison with LIWC features.

• Simple machine learning algorithms, such as tree-based ones, are powerful and perform well when combined with pseudo-labeling to deal with semi-supervised learning problems.

• Entity attributes (for example, channel, category) play an important role in better understanding the relation between TD and EL complexities by means of a drill-down analysis.

In addition, the DML taxonomy that is developed for the SRM

process is of practical value and reusable. Since in most organi-

zations operations are dependent on IT, a similar SRM process

exists as part of their ITSM strategy. Hence, it is reasonable that

IT ticket texts contain similar concepts in organizations. Accord-

ingly, the developed DML taxonomy can be adjusted and used for

calculating TD complexity.

Moreover, the results obtained in the case study serve as a

proof of concept for using linguistic features in TD complexity

prediction. In other words, the potential of linguistic features as

textual data representation discovered in our previous work [13]

has been confirmed in a new real-life setting. Notably, DML

taxonomy-based and stylistic features are favorable to reveal such

potential in analyzing TD complexity.

Although developing DML taxonomy-based features requires domain expert involvement, we showed that this effort results in a better TD complexity classification performance than LIWC features. More specifically, the average overall increase in performance is 7.5%, rising to 9.5% when only the top 10 best-performing prediction models are considered (see Table 10). The highest increase in performance, 29%, is observed for Naïve Bayes with Pseudo-Labeling, whereas the highest performance loss, 1.5%, occurs for the stacking of Naïve Bayes and SVM with Pseudo-Labeling. Although the trade-off is difficult to measure precisely, the difference in classification performance suggests that the performance gain outweighs the cost of DML development. Hence, organizations that want to adopt our Approach can take the comparison as an initial baseline for a trade-off between costs and performance gain.

6.2. Outlining benefits and limitations

Considering the results discussed above and the correlations presented in Section 5, we conclude that TD complexity is connected to EL complexity. We show the benefits of our Approach in detecting and analyzing the relation between these two complexities in a real-world setting. Specifically, our observations show that increasing TD complexity could account for longer BP execution time, less accurate ticket categorization and prioritization, and frequent work reassignments triggered by hold-ons and re-opens.

Overall, the connection between TD complexity and EL complexity indicates that textual data are appropriate and can be used for predicting EL complexity. Organizations operating in various domains often rely on textual data while performing their BPs, since textual data are generally the primary input to their BPs. Such organizations can benefit greatly from our Approach. For example, banks, governmental bodies, universities, and infrastructure or supply providers have established BPs for handling various requests coming from their customers in a textual form. Complexity in these texts can considerably influence the execution of their BPs. By studying this complexity, organizations can predict and, therefore, mitigate complexity in performing BPs. Furthermore, using our Approach, organizations gain a more comprehensive way of identifying process redesign and improvement opportunities. Such opportunities can be formulated based on the activities in BPs that affect EL complexity. These activities can be detected in the significant difference analysis phase of the Approach.

It is important to note that our Approach reveals great potential for understanding the implications of each specific linguistic feature for textual complexity. As illustrated in the running example (see Table 5), the contribution of each feature to complexity can be identified separately by tracing back from the aggregated single TD complexity value. Having such information can help organizations in several ways. First, text provided by users to BPs can be annotated in real time in terms of complexity contributions. Based on the indications of those text parts leading to complexity, users can then improve text quality. Second, process workers can be assisted in rephrasing text for a more accurate interpretation. For example, depending on the text quality, process workers may apply triage on requests or add an extra explanation in the text to better identify what activities are required in handling requests. Third, providing support for reducing the use of words or phrases that result in higher TD complexity and, subsequently, higher EL complexity could also be beneficial for organizations.
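
The idea of tracing an aggregated TD complexity value back to individual features can be sketched as follows. The sketch assumes, purely for illustration, that the single value is a weighted sum of normalized feature scores; the feature names, scores, and weights are hypothetical and are not taken from the running example.

```python
def td_complexity(scores, weights):
    """Return the aggregated value and each feature's contribution to it.

    Assumption: the aggregate is a weighted sum of normalized feature scores.
    """
    contributions = {name: weights[name] * scores[name] for name in scores}
    return sum(contributions.values()), contributions


# Hypothetical normalized feature scores (0..1) and weights (summing to 1).
scores = {"dml_cognitive_share": 0.6, "readability": 0.3, "text_length": 0.5}
weights = {"dml_cognitive_share": 0.5, "readability": 0.25, "text_length": 0.25}

total, contributions = td_complexity(scores, weights)
# The feature with the largest contribution is the first candidate to flag to the user.
top_feature = max(contributions, key=contributions.get)
```

Because each contribution is stored separately, an annotation tool could highlight the text parts behind the dominant feature rather than report only the aggregated value.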

Further, the Approach sets itself apart from the state-of-the-art approaches to BP complexity analysis by (i) combining textual data and event log, (ii) blending readily available techniques in calculating TD and EL complexities, and (iii) analyzing the relation between them. To the best of our knowledge, our Approach is the first effort to analyze the connection between TD and EL complexities. To do this, publicly accessible and frequently used techniques, such as LDA, LSI, common machine learning algorithms, and existing EL complexity metrics, are employed.

Although both the case study of our previous research, which serves as the basis for TD complexity, and the case study of this paper belong to the ITSM domain, i.e., IT ticket processing, the research value goes far beyond this domain. The Approach can be used by organizations that rely on textual data as an input to their processes and that already execute or are interested in initiating BPM projects, for example, healthcare or public administration institutions receiving customer requests in a textual form. The domain-specific adaptations of the Approach may require additional efforts of different degrees. Among the four inputs of the Approach, which are textual data, event log, complexity scale, and DML taxonomy, the last requires the most manual effort. Here, the availability of experts and their willingness to dedicate time and resources, as well as top management support, can significantly influence the process of DML taxonomy creation. At the same time, the DML taxonomy captures the essential semantics enabling context awareness and domain adaptation in the Approach.

As also follows from the title, the focus of the paper lies on BP execution data. However, if we consider the BPM lifecycle [1], the role of textual data goes far beyond BP execution: it can be employed throughout all phases of the lifecycle. For example, in the discovery phase, process analysts might use available BP descriptions, textual data from interviews, legal documents, or ethnography [91–93]. In other BPM lifecycle phases, such as process analysis, process redesign, and process monitoring and controlling, any BP changes must comply with legal requirements, corporate standards, business rules, and service operating procedures, which usually exist in textual form. All these documents have a more official character than the textual data produced by BP participants (like conversations) during BP execution, and thus exhibit a different style and, hence, a different textual complexity. Such textual information is not considered in the present paper. However, this limitation represents a promising direction for future research. Accordingly, our Approach can be extended with analyses of further textual data sources to develop support for process analysts at various phases of the BPM lifecycle.

Our Approach has several further limitations. First, the relevant entity attributes for creating the subsets used in the drill-down analysis are identified manually. However, the entity attributes referred to in the BPs can be identified and incorporated into the Approach, so that the entity attributes relevant for particular BPs could be selected automatically. Another limitation of the Approach is related to textual data: it only supports texts written in English. With the advancement of NLP libraries, more languages can be considered in the Approach.

7. Conclusion and future work

In this paper, we presented a novel Approach aimed at BP

execution complexity analysis by combining textual data and

event log. In particular, in the Approach, the relation between

TD and EL complexities is analyzed. To calculate TD complexity,

we use two sets of linguistic features aimed at capturing TD

complexity in terms of cognition and style: DML taxonomy-based

and stylistic features. For calculating EL complexity, the state-of-

the-art event log-based complexity metrics are employed. Then,

the correlations between these two complexities are measured to

check how TD and EL complexities are associated. Based on the

computed correlations and analysis of the significant differences,

the BP activities affecting EL complexity are determined. Hence,

the factors that may account for an increase or decrease in EL

complexity are identified.
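
The correlation step summarized above can be sketched in a few lines. Spearman's rank correlation is used here only as an illustrative choice of measure (the paper's own correlation analysis is presented in Section 5), and the per-case values below are hypothetical.

```python
from statistics import mean


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-case values: TD complexity level and an EL complexity metric
# (for example, the number of events recorded for the case).
td_complexity = [1, 2, 2, 3, 3]
el_complexity = [4, 6, 5, 9, 12]
rho = spearman(td_complexity, el_complexity)
```

A rank-based measure is a natural fit when TD complexity is expressed on an ordinal scale, but the same structure works with any other correlation coefficient.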

The proposed Approach was applied in the IT department of an academic institution. Specifically, using the textual data (service request descriptions) and an event log from an SRM process, TD and EL complexities were calculated, and their relation was analyzed. Further, the advantages of DML taxonomy-based and stylistic features in determining TD complexity were assessed by comparing them against well-accepted linguistic features, LIWC, in predicting TD complexity. Our findings showed that TD and EL complexities correlate strongly in most cases. Thus, following our Approach, textual data can be used to predict the complexity of BP execution.

To the best of our knowledge, this paper presents the first investigation of the connection between TD and EL complexities related to BP execution. The existence of such a connection implies that organizations can benefit from studying the complexity expressed by means of textual data. For instance, organizations that rely on textual data in performing their BPs can approximate the BP execution complexity. Thus, organizations can develop strategies and make decisions in advance to deal with the complexity of BP execution. Furthermore, the Approach enables organizations to identify the factors that affect the complexity in BP execution. By interpreting these factors, process redesign and improvement directions can be determined.

In future work, we will expand our scope with other ITSM pro-

cesses and incorporate conversations captured in BP executions

into textual data analysis. Moreover, we would like to include

decision mining technologies [94] in our Approach since the BP

decision points are very likely to be associated with BP vari-

ants. Apart from that, we aim to develop new linguistic features

using sentences and their dependencies to further improve TD

complexity prediction. Combining our linguistic features with

LIWC features for achieving a better classification performance is

another direction we want to pursue. Regarding EL complexity,

we want to move one step further, focus on discovered process

models, and hence, incorporate process complexity metrics into

our Approach. In addition, investigating the potential of text sum-

marization for obtaining complexity from textual data is another

future work avenue. We will also consider the complexity analy-

sis of other textual data, such as BP descriptions, legal documents,

corporate standards, and interview transcriptions, to assist the

process analysts at all the stages of the BPM lifecycle. Lastly, we

will experiment with other approaches to process complexity,

for example, considering process context [73,74], to enhance our

Approach.

Table A.12
Decision-making logic taxonomy for service request management process.

Decision-making cognition levels: Routine, Semi-cognitive, Cognitive

Resources (Nouns)
  Routine: account, activation, address, admin, administrator, admission, agenda, appointment, assessment, authentication, authenticator, authorization, booking, capacity, certificate, code, contract, credit, demo, download, guest, host, intranet, licence, license, link, mail, mailaddress, mobile, password, permission, phone, portal, printer, questionnaire, reference, registration, reset, staff, storage, twofactorauthentication, update, url, version
  Semi-cognitive: acceptance, app, application, archive, backup, balance, bank, battery, cloud, configuration, document, domain, keychain, mailbox, migration, network, security, software, toner, upgrade, video, vpn
  Cognitive: analysis, database, disk, distribution, drive, driver, file, firewall, folder, group, owner, ownergroup, recovery, workstation, server

Techniques (Verbs)
  Routine: complete, connect, create, download, exist, expand, expire, extend, find, grant, increase, inform, install, login, logon, open, print, reset, restart, run, save, send, share, start, stop, store, write, update
  Semi-cognitive: access, activate, add, approve, assign, associate, deactivate, decide, delete, disable, edit, link, make, reactivate, recover, remove, replace, set, unlink
  Cognitive: analyse, analyze, change, define, design, lose, migrate, modify

Capacities (Adjectives)
  Routine: active, additional, available, correct, easier, exact, extra, free, full, higher, important, larger, last, latest, least, new, newest, next, older, online, optional, organizational, original, outdated, private, public, qualitative, ready, relevant, responsible, several, urgent, valid, visible, wide
  Semi-cognitive: empty, external, incorrect, international, local, long, multiple, remote, safe, unable, wrong, unknown
  Cognitive: broken, different, compatible, offline, temporary, stuck, spatial

Choices (Adverbs)
  Routine: almost, apparently, beforehand, completely, correctly, directly, easily, efficiently, either, else, ever, exactly, far, fully, last, mainly, much, next, otherwise, properly, quite, rather, since, soon, successfully, together, totally, urgently
  Semi-cognitive: accidentally, already, also, always, anymore, anywhere, automatically, constantly, everytime, frequently, indeed, initially, instead, manually, maybe, moreover, mostly, never, often, perhaps, previously, probably, regularly, sometimes, somewhere, suddenly, temporarily, though, twice, usually, wrongfully
  Cognitive: hence, however, locally, remotely, somehow, still, therefore, thus, yet

Declaration of competing interest

The authors declare that they have no known competing finan-

cial interests or personal relationships that could have appeared

to influence the work reported in this paper.

Data availability

The authors do not have permission to share data.

Appendix A. DML taxonomy

See Table A.12.
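
To illustrate how such a taxonomy might be turned into numeric features, the following sketch computes the relative share of routine, semi-cognitive, and cognitive keywords in a request text. The dictionary is only a small excerpt of Table A.12, and both the exact-match tokenization (no lemmatization or part-of-speech tagging) and the use of relative shares as features are simplifying assumptions, not the feature definition of the Approach.

```python
import re
from collections import Counter

# Small excerpt of Table A.12 (keyword -> cognition level); the full taxonomy is larger.
DML_EXCERPT = {
    "password": "routine", "printer": "routine",
    "reset": "routine", "install": "routine",
    "backup": "semi-cognitive", "vpn": "semi-cognitive",
    "delete": "semi-cognitive", "assign": "semi-cognitive",
    "server": "cognitive", "firewall": "cognitive",
    "migrate": "cognitive", "modify": "cognitive",
}
LEVELS = ("routine", "semi-cognitive", "cognitive")


def dml_shares(text):
    """Return the share of matched taxonomy keywords per cognition level."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(DML_EXCERPT[t] for t in tokens if t in DML_EXCERPT)
    total = sum(counts.values())
    # Texts without any taxonomy keyword get a share of 0.0 for every level.
    return {level: (counts[level] / total if total else 0.0) for level in LEVELS}


shares = dml_shares("Please reset my password and migrate the server")
```

A word such as "link" appears in the taxonomy both as a routine noun and as a semi-cognitive verb, so a realistic implementation would need part-of-speech information to resolve the level.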

References

[1] M. Dumas, M. La Rosa, J. Mendling, H.A. Reijers, Fundamentals of Business

Process Management, second ed., Springer, 2018.

[2] C. Matt, T. Hess, A. Benlian, Digital transformation strategies, Bus. Inf. Syst.

Eng. 57 (5) (2015) 339–343.

[3] M. Fischer, F. Imgrund, C. Janiesch, A. Winkelmann, Strategy archetypes

for digital transformation: Defining meta objectives using business process

management, Inf. Manage. 57 (5) (2020) 103262.

[4] A. Augusto, J. Mendling, M. Vidgof, B. Wurm, The connection between

process complexity of event sequences and models discovered by process

mining, Inform. Sci. 598 (2022) 196–215.

[5] H. van der Aa, J. Carmona, H. Leopold, J. Mendling, L. Padró, Challenges

and opportunities of applying natural language processing in business

process management, in: Proceedings of the 27th International Conference

on Computational Linguistics, COLING 2018, Santa Fe, New Mexico, USA,

August 20-26, 2018, Association for Computational Linguistics, 2018, pp.

2791–2801.

[6] V. Kobayashi, S.T. Mol, H. Berkers, G. Kismihók, D.N. den Hartog, Text

mining in organizational research, Organ. Res. Methods 21 (2018) 733–765.

[7] M. Schäfermeyer, C. Rosenkranz, R. Holten, The impact of business process

complexity on business process standardization, Bus. Inf. Syst. Eng. 4 (5)

(2012) 261–270.

[8] B.T. Pentland, P. Liu, W. Kremser, T. Hærem, The dynamics of drift in

digitized processes, MIS Q. 44 (2020) 19–47.

[9] B. Münstermann, A. Eckhardt, T. Weitzel, The performance impact of busi-

ness process standardization: An empirical evaluation of the recruitment

process, Bus. Process Manage. J. (2010).

[10] A. Gunasekaran, B. Nath, The role of information technology in business

process reengineering, Int. J. Prod. Econ. 50 (2–3) (1997) 91–104.

[11] N. Rizun, A. Revina, V.G. Meister, Method of decision-making logic discov-

ery in the business process textual data, in: International Conference on

Business Information Systems, Springer, 2019, pp. 70–84.

[12] N. Rizun, V.G. Meister, A. Revina, Discovery of stylistic patterns in business

process textual descriptions: It ticket case, Innov. Manage. Educ. Excell.

Through Vis. (2020).

[13] N. Rizun, A. Revina, V.G. Meister, Assessing business process complexity

based on textual data: Evidence from ITIL IT ticket processing, Bus. Process

Manage. J. (2021).

[14] A. Revina, K. Buza, V.G. Meister, IT ticket classification: The simpler, the

better, IEEE Access 8 (2020) 193380–193395.

[15] N. Rizun, A. Revina, Business sentiment analysis. concept and method for

perceived anticipated effort identification, in: Proceedings of the 28th In-

ternational Conference on Information Systems Development: Information

Systems Beyond 2020 (ISD 2019), Toulon, France, August 28-30, 2019,

2019.

[16] P. Bellan, M. Dragoni, C. Ghidini, Process extraction from text: state of the

art and challenges for the future, 2021, arXiv preprint 2110.03754.

[17] A. Revina, Business process management: Integrated data perspective. A

framework and research agenda, in: Proceedings of the 29th International

Conference on Information Systems Development: Crossing Boundaries

Between Development and Operations (DevOps) in Information Systems

(ISD2021), Valencia, Spain, September 8-10, 2021, 2021.

[18] K. Honkisz, K. Kluza, P. Wiśniewski, A concept for generating business

process models from natural language description, in: International Con-

ference on Knowledge Science, Engineering and Management, Springer,

2018, pp. 91–103.

[19] X. Han, L. Hu, L. Mei, Y. Dang, S. Agarwal, X. Zhou, P. Hu, A-BPS: Automatic

business process discovery service using ordered neurons LSTM, in: 2020

IEEE International Conference on Web Services, ICWS, IEEE, 2020, pp.

428–432.

[20] H. van der Aa, C. Di Ciccio, H. Leopold, H.A. Reijers, Extracting declarative

process models from natural language, in: International Conference on

Advanced Information Systems Engineering (CAiSE 2019), Springer, 2019,

pp. 365–382.

[21] A.J. Chambers, A.M. Stringfellow, B.B. Luo, S.J. Underwood, T.G. Allard,

I.A. Johnston, S. Brockman, L. Shing, A. Wollaber, C. VanDam, Automated

business process discovery from unstructured natural-language documents,

in: International Conference on Business Process Management (BPM 2020),

Springer, 2020, pp. 232–243.

[22] C. Kecht, A. Eggert, W. Kratsch, M. Röglinger, Event log construction from

customer service conversations using natural language inference, in: 3rd

International Conference on Process Mining (ICPM 2021), IEEE, 2021, pp.

144–151.

[23] C. Qian, L. Wen, A. Kumar, L. Lin, L. Lin, Z. Zong, S. Li, J. Wang, An

approach for process model extraction by multi-grained text classification,

in: International Conference on Advanced Information Systems Engineering

(CAiSE 2020), Springer, 2020, pp. 268–282.

[24] H.A. López, S. Debois, T.T. Hildebrandt, M. Marquard, The process

highlighter: From texts to declarative processes and back, in: In-

ternational Conference on Business Process Management (BPM 2018,

Dissertation/Demos/Industry), Vol. 2196, 2018, pp. 66–70.

[25] W.M.P. van der Aalst, Process Mining: Data Science in Action, Springer,

2016.

[26] M. Gupta, P. Agarwal, T. Tater, S. Dechu, A. Serebrenik, Analyzing com-

ments in ticket resolution to capture underlying process interactions, in:

International Conference on Business Process Management (BPM 2020),

Springer, 2020, pp. 219–231.

[27] A. Rebmann, H. van der Aa, Extracting semantic process information

from the natural language in event logs, in: International Conference on

Advanced Information Systems Engineering (CAiSE 2021), Springer, 2021,

pp. 57–74.

[28] A. Goossens, M. Claessens, C. Parthoens, J. Vanthienen, Extracting decision

dependencies and decision logic from text using deep learning techniques,

in: International Conference on Business Process Management (BPM 2021),

Springer, 2021, pp. 349–361.

[29] V. Etikala, Z.V. Veldhoven, J. Vanthienen, Text2Dec: extracting decision

dependencies from natural language text for automated DMN decision

modelling, in: International Conference on Business Process Management

(BPM 2020), Springer, 2020, pp. 367–379.

[30] L. Quishpi, J. Carmona, L. Padró, Extracting decision models from textual

descriptions of processes, in: International Conference on Business Process

Management (BPM 2021), Springer, 2021, pp. 85–102.

[31] L. Quishpi, J. Carmona, L. Padró, Extracting annotations from textual

descriptions of processes, in: International Conference on Business Process

Management, Springer, 2020, pp. 184–201.

[32] L. Ackermann, J. Neuberger, S. Jablonski, Data-driven annotation of textual

process descriptions based on formal meaning representations, in: Inter-

national Conference on Advanced Information Systems Engineering (CAiSE

2021), Springer, 2021, pp. 75–90.

[33] H. Wang, L. Wen, L. Lin, J. Wang, Rlrecommender: a representation-

learning-based recommendation method for business process modeling, in:

International Conference on Service-Oriented Computing, Springer, 2018,

pp. 478–486.

[34] S. Deng, D. Wang, Y. Li, B. Cao, J. Yin, Z. Wu, M. Zhou, A recommendation

system to facilitate business process modeling, IEEE Trans. Cybern. 47 (6)

(2016) 1380–1394.

[35] H. van der Aa, A. Rebmann, H. Leopold, Natural language-based detection of

semantic execution anomalies in event logs, Inf. Syst. 102 (2021) 101824.

[36] D. Sola, H. van der Aa, C. Meilicke, H. Stuckenschmidt, Exploiting label

semantics for rule-based activity recommendation in business process

modeling, Inf. Syst. (2022) 102049.

[37] M. Goldstein, C. González-Álvarez, Augmenting modelers with semantic

autocompletion of processes, in: International Conference on Business

Process Management (BPM 2021), Springer, 2021, pp. 20–36.

[38] H. Leopold, H. van der Aa, J. Offenberg, H.A. Reijers, Using hidden Markov

models for the accurate linguistic analysis of process model activity labels,

Inf. Syst. 83 (2019) 30–39.

[39] H. van der Aa, H. Leopold, H.A. Reijers, Checking process compliance

against natural language specifications using behavioral spaces, Inf. Syst.

78 (2018) 83–95.

[40] H. van der Aa, H. Leopold, H.A. Reijers, Comparing textual descriptions to

process models–the automatic detection of inconsistencies, Inf. Syst. 64

(2017) 447–460.

[41] J. Sànchez-Ferreres, H. van der Aa, J. Carmona, L. Padró, Aligning textual

and model-based process descriptions, Data Knowl. Eng. 118 (2018) 25–40.

[42] M. Kobeissi, N. Assy, W. Gaaloul, B. Defude, B. Haidar, An intent-based

natural language interface for querying process execution data, in: 3rd

International Conference on Process Mining (ICPM 2021), IEEE, 2021, pp.

152–159.

[43] H. Leopold, H. van der Aa, F. Pittke, M. Raffel, J. Mendling, H.A. Reijers,

Searching textual and model-based process descriptions based on a unified

data format, Softw. Syst. Model. 18 (2) (2019) 1179–1194.

[44] A. Yadav, D.K. Vishwakarma, Sentiment analysis using deep learning

architectures: a review, Artif. Intell. Rev. 53 (6) (2020) 4335–4385.

[45] E. Lüftenegger, S. Softic, Sentipromo: a sentiment analysis-enabled social

business process modeling tool, in: International Conference on Business

Process Management (BPM 2020), Springer, 2020, pp. 83–89.

[46] A. Mustansir, K. Shahzad, M.K. Malik, Towards automatic business process

redesign: an NLP based approach to extract redesign suggestions, Autom.

Softw. Eng. 29 (1) (2022) 1–24.

[47] T. Niesen, S. Dadashnia, P. Fettke, P. Loos, A vector space approach to

process model matching using insights from natural language processing,

in: Multikonferenz Wirtschaftsinformatik (MKWI 2016), Universitätsverlag

Ilmenau, 2016, pp. 93–104.

[48] N. Wang, S. Sun, D. OuYang, Business process modeling abstraction based

on semi-supervised clustering analysis, Bus. Inf. Syst. Eng. 60 (6) (2018)

525–542.

[49] F. Dai, M. Liu, Q. Mo, B. Huang, T. Li, Refactor business process models for

efficiency improvement, in: Cloud Computing, Smart Grid and Innovative

Frontiers in Telecommunications, Springer, 2019, pp. 454–467.

[50] F. Pittke, Linguistic Refactoring of Business Process Models. Dissertation,

Tech. Rep., Vienna University of Economics and Business, 2015.

[51] J. Mendling, H. Leopold, A. Polyvyanyy, Supporting process model val-

idation through natural language generation, Softw. Eng. P-252 (2016)

71–72.

[52] B. Aysolmaz, H. Leopold, H.A. Reijers, O. Demirörs, A semi-automated

approach for generating natural language requirements documents based

on business process models, Inf. Softw. Technol. 93 (2018) 14–29.

[53] L. Ackermann, Language-centric approaches for improving business pro-

cess model acceptance., in: International Conference on Business Process

Management (BPM 2018, Dissertation/Demos/Industry), 2018, pp. 51–55.

[54] L. Ackermann, Sprachzentrierte Ansätze zur Steigerung der Akzeptanz von

Geschäftsprozessmodellen. Dissertation, Tech. Rep., Universitaet Bayreuth

(Germany), 2018.

[55] K. Shahzad, S. Zaheer, R.M. Adeel Nawab, F. Aslam, On comparing manual

and automatic generated textual descriptions of business process models,

J. Softw.: Evol. Process 31 (11) (2019) e2204.

[56] Q. Zeng, X. Tang, W. Ni, H. Duan, C. Li, N. Xie, Missing procedural texts

repairing based on process model and activity description templates, IEEE

Access 8 (2020) 12999–13010.

[57] G. Yuan, Q. Zeng, H. Duan, W. Guo, W. Ni, N. Xie, Multi-language descrip-

tion text automatic generation of emergency disposal process, in: 2018

IEEE International Conference of Safety Produce Informatization, IICSPI,

IEEE, 2018, pp. 31–35.

[58] G. Yuan, Q. Zeng, W. Ni, C. Liu, C. Li, H. Duan, Multi-view and multi-

language description generation for cross-department medical diagnosis

processes, IEEE Access 6 (2018) 76741–76753.

[59] J. Cardoso, J. Mendling, G. Neumann, H.A. Reijers, A discourse on complex-

ity of process models, in: International Conference on Business Process

Management, Springer, 2006, pp. 117–128.

[60] V. Gruhn, R. Laue, Approaches for business process model complexity

metrics, in: Technologies for Business Information Systems, Springer, 2007,

pp. 13–24.

[61] R.D. Boomsma, I. Vanderfeesten, D. Fahland, H.A. Reijers, S. Cramer, An

evaluation of thresholds for business process model metrics, 2017.

[62] M. La Rosa, P. Wohed, J. Mendling, A.H.M. ter Hofstede, H.A. Reijers, W.M.P.

van der Aalst, Managing process model complexity via abstract syntax

modifications, IEEE Trans. Ind. Inform. 7 (4) (2011) 614–629.

[63] M. Benner-Wickner, M. Book, T. Brückmann, V. Gruhn, Examining case

management demand using event log complexity metrics, in: 18th

IEEE International Enterprise Distributed Object Computing Conference

Workshops and Demonstrations, EDOC Workshops 2014, Ulm, Germany,

September 1-2, 2014, IEEE Computer Society, 2014, pp. 108–115.

[64] J. De Weerdt, M. De Backer, J. Vanthienen, B. Baesens, A multi-dimensional

quality assessment of state-of-the-art process discovery algorithms using

real-life event logs, Inf. Syst. 37 (7) (2012) 654–676.

[65] P.H.P. Richetti, F.A. Baião, F.M. Santoro, Declarative process mining: Re-

ducing discovered models complexity by pre-processing event logs, in:

Business Process Management - 12th International Conference, BPM 2014,

Haifa, Israel, September 7-11, 2014. Proceedings, Vol. 8659, Springer, 2014,

pp. 400–407.

[66] G. Polančič, B. Cegnar, Complexity metrics for process models–A systematic

literature review, Comput. Stand. Interfaces 51 (2017) 104–117.

[67] J. Mendling, H.A. Reijers, W.M.P. van der Aalst, Seven process modeling

guidelines (7PMG), Inf. Softw. Technol. 52 (2) (2010) 127–136.

[68] D.M. Blei, Probabilistic topic models, Commun. ACM 55 (4) (2012) 77–84.

[69] G.K. Zipf, Human Behaviour and the Principle of Least Effort: An

Introduction to Human Ecology, Addison-Wesley, 1949.

[70] W.M.P. van der Aalst, Process Mining - Data Science in Action, second ed.,

Springer, 2016.

[71] T. Hærem, B.T. Pentland, K.D. Miller, Task complexity: Extending a core

concept, Acad. Manag. Rev. 40 (3) (2015) 446–460.

[72] B.T. Pentland, Conceptualizing and measuring variety in the execution of

organizational work processes, Manage. Sci. 49 (7) (2003) 857–870.

[73] J. vom Brocke, M.-S. Baier, T. Schmiedel, K. Stelzl, M. Röglinger, C. Wehking,

Context-aware business process management, Bus. Inf. Syst. Eng. 63 (5)

(2021) 533–550.

[74] M. Rosemann, J. Recker, C. Flender, Contextualisation of business processes,

Int. J. Bus. Process Integr. Manage. 3 (1) (2008) 47–60.

[75] T.L. Saaty, The analytic hierarchy and analytic network processes for the

measurement of intangible criteria and for decision-making, in: Multiple

Criteria Decision Analysis, Springer, 2016, pp. 363–419.

[76] A. Revina, Ü. Aksu, V.G. Meister, Method to address complexity in organi-

zations based on a comprehensive overview, Information 12 (10) (2021)

423.

[77] N. Rizun, Y. Taranenko, Simulation models of human decision-making

processes, Manage. Dyn. Knowl. Econ. 2 (2) (2014) 241–264.

[78] W. Daelemans, Explanation in computational stylometry, in: International

Conference on Intelligent Text Processing and Computational Linguistics,

Springer, 2013, pp. 451–462.

[79] G.K. Zipf, Selected studies of the principle of relative frequency in language,

1932.

[80] X. Zhu, A.B. Goldberg, Introduction to semi-supervised learning, Synth. Lect.

Artif. Intell. Mach. Learn. 3 (1) (2009) 1–130.

[81] Z.-H. Zhou, Ensemble Methods: Foundations and Algorithms, Chapman and

Hall/CRC, 2012.

[82] A. Revina, K. Buza, V.G. Meister, Designing explainable text classification

pipelines: Insights from IT ticket complexity prediction case study, in:

Interpretable Artificial Intelligence: A Perspective of Granular Computing,

Vol. 937, Springer, Cham, 2021, pp. 293–332.

[83] F.J. Gravetter, L.B. Wallnau, Essentials of Statistics for the Behavioral

Sciences, tenth ed., Cengage Learning, 2017.

[84] J. Cohen, Statistical Power Analysis for the Behavioral Sciences, second ed.,

Routledge, 1988.

[85] A. Bolt, M. de Leoni, W.M.P. van der Aalst, Process variant comparison:

Using event logs to detect differences in behavior and business rules, Inf.

Syst. 74 (2018) 53–66.

[86] B.L. Welch, The generalization of ‘Student’s’ problem when several different

population variances are involved, Biometrika 34 (1–2) (1947) 28–35.

[87] H.B. Mann, D.R. Whitney, On a test of whether one of two random variables

is stochastically larger than the other, Ann. Math. Stat. 18 (1) (1947) 50–60.

[88] W.M.P. van der Aalst, B.F. van Dongen, C.W. Günther, A. Rozinat, H.M.W.

Verbeek, T. Weijters, ProM: The process mining toolkit, in: Proceedings of

the Business Process Management Demonstration Track (BPMDemos 2009),

2009.

[89] Axelos, ITIL Foundation, Stationery Office Books, 2019, URL https://www.

axelos.com/best-practice-solutions/itil.

[90] R.L. Boyd, A. Ashokkumar, S. Seraj, J.W. Pennebaker, The Development and

Psychometric Properties of LIWC-22, University of Texas, Austin, TX, 2022,

URL https://www.liwc.app/help/psychometrics-manuals.

[91] K. Winter, S. Rinderle-Ma, Detecting constraints and their relations from

regulatory documents using nlp techniques, in: OTM Confederated In-

ternational Conferences ‘‘on the Move to Meaningful Internet Systems’’,

Springer, 2018, pp. 261–278.

[92] H.A. López, Challenges in legal process discovery, in: ITBPM@BPM 2021,

CEUR, 2021, pp. 68–73.

[93] J.C. de AR Goncalves, F.M. Santoro, F.A. Baião, Business process mining

from group stories, in: 2009 13th International Conference on Computer

Supported Cooperative Work in Design, IEEE, 2009, pp. 161–166.

[94] E. Bazhenova, F. Zerbato, B. Oliboni, M. Weske, From BPMN process models

to DMN decision models, Inf. Syst. 83 (2019) 69–88.