Deriving Event Logs from Legacy Software Systems [original]

Deriving Event Logs from

Legacy Software Systems

Marius Breitmayer1[0000−0003−1572−4573], Lisa Arnold1[0000−0002−2358−2571],

Stephan La Rocca2, and Manfred Reichert1[0000−0003−2536−4153]

1 Institute of Databases and Information Systems, Ulm University, Germany

{marius.breitmayer,lisa.arnold,manfred.reichert}@uni-ulm.de

2 PITSS GmbH Stuttgart, Germany {slarocca}@pitss.com

Abstract. The modernization of legacy software systems is one of the

key challenges in software industry, which requires comprehensive sys-

tem analysis. In this context, process mining has proven to be useful for

understanding the (business) processes implemented by the legacy soft-

ware system. However, process mining algorithms are highly dependent

on both the quality and existence of suitable event logs. In many scenar-

ios, existing software systems (e.g., legacy applications) do not leverage

process engines capable of producing such high-quality event logs, which

hampers the application of process mining algorithms. Deriving suitable

event log data from legacy software systems, therefore, constitutes a rele-

vant task that fosters data-driven analysis approaches, including process

mining, data-based process documentation, and process-centric software

migration. This paper presents an approach for deriving event logs from

legacy software systems by combining knowledge from source code and

corresponding database operations. The goal is to identify relevant busi-

ness objects as well as to document user and software interactions with

them in an event log suitable for process mining.

Keywords: Event Log Generation ·Legacy Software System ·Software

Modernization ·Process Mining

1 Introduction

Economically, one of the most important sectors in software industry concerns

the modernization of legacy software systems. These systems need to be replaced

by modern software systems showing better usability, higher performance, and

improved code quality. A successful modernization of a legacy software system

requires the analysis of the (business) processes implemented by the legacy soft-

ware, the interactions users have with the system, and the access points to system

information (e.g., source code or databases).

Process mining offers a plethora of analysis approaches to gain a broad under-

standing of the processes implemented in software systems. Process discovery, for

example, enables the derivation of process models from event logs [1]. In turn,

conformance checking correlates modeled and recorded behavior of a business

process, enabling the analysis of the observed process behavior in relation to a

given process model [2]. Finally, process enhancement allows improving business

processes based on the information recorded in event logs. In summary, most

process mining approaches highly depend on the existence of process event logs

as well as the quality of these logs.

In software modernization projects, legacy software systems need to be ana-

lyzed. In this context, the use of process mining approaches is very promising for

analyzing the processes implemented in these systems. However, most existing

legacy software systems neither have been designed based on pre-specified exe-

cutable process models nor do they provide extensive process logging capabilities.

As a consequence, the application of process mining to legacy software systems

is hampered and alternatives for obtaining models of the implemented processes

and, thus, for supporting the migration of the legacy software system to modern

technology are needed. Alternatives include, for example, extensive interviews

with key system users and process owners [3]. Both alternatives, however, are

time-consuming and prone to incompleteness.

This paper presents an approach to generate event logs from running legacy

software systems by combining knowledge from source code analysis, including

database statements, to discover the relevant business objects of a process as

well as to document user and software interactions in an event log suitable for

process mining. We consider the following research questions:

RQ1: How can we generate event logs from running legacy software systems?

RQ2: How can we ensure that the performance of legacy software systems is

not affected during event log generation?

The remainder of this paper is structured as follows: Section 2 introduces the

concepts necessary for understanding this work. Section 3 discusses the require-

ments for generating event logs from running legacy software systems. Section 4

presents the legacy software system analysis required for generating event logs.

Section 5 describes our approach and shows how one can extend a legacy software

system to generate event logs. Section 6 evaluates our work using a requirements

evaluation, a performance comparison, and a user survey. Section 7 discusses re-

lated work. Section 8 provides a short summary as well as an outlook.

2 Fundamentals

2.1 Legacy Software Systems

Legacy software systems are widespread in enterprises, but very costly to main-

tain due to bad documentation, outdated operating or development environ-

ments, or high complexity of the historically grown system code basis [4]. As

a result, the replacement of such legacy software systems is often significantly

delayed beyond the initial system lifespan. Legacy software systems consist of a

plethora of artifacts and resources such as servers, (non-normalized) databases,

source code, or user forms, which all may be used during legacy software sys-

tem analysis. Fig. 1 depicts a screenshot of an Oracle legacy software system

implemented in the 1990s. We will refer to this example in the following.

Fig. 1: Screenshot of a Legacy Software System

2.2 Event Logs

Event logs build the foundation for process mining algorithms and capture infor-

mation on cases, events, and corresponding activities [5]. In general, event logs

record events related to the execution of process instances. Mandatory attributes

of a log entry include the case identifier, the timestamp, as well as the executed

activity [5].

In the context of legacy software systems, which may support multiple pro-

cess types (e.g., order-to-cash, purchase-to-pay, or checking an invoice), it might

be unclear to which process type an activity belongs. Therefore, an additional

attribute indicating the process type is required when deriving event logs from

legacy software systems.

3 Requirements

In most cases, there exist no suitable event logs for process mining in legacy

software systems. This section elicits fundamental requirements to be met when

generating event logs from user and software interactions with legacy software

systems. On one hand, we gathered the requirements from literature [5]. On the

other, we conducted interviews with domain experts (e.g., software engineers,

and process owners) to complement these requirements. Amongst others, we

identified the following requirements:

Requirement 1: (Relevance) The event log should only contain process-

relevant data that refers to those interactions with the legacy software system

that correspond to a process (e.g., filling or completing a form). If an interaction

triggers an automated procedure (e.g., invocation of an operation in the legacy

software system), the resulting changes (e.g., to the database) should be recorded

in the event log as well.

Requirement 2: (Scope) Legacy software systems often use a plethora of

database tables and source code fragments that contain business-relevant data.

Identifying and scoping process-relevant database tables and code fragments

usually requires extensive domain knowledge that might not be available. An

approach for generating event logs from legacy software systems should therefore

minimize the domain knowledge required.

Requirement 3: (Consistency) To facilitate the preprocessing of the event

log data, the event log should be consistent with respect to timestamps, data

types, and additional resources, even if different software components of the

legacy software system (e.g., database and user forms) are involved.

Requirement 4: (Performance) The event log generation from a running

legacy system should not influence its performance, i.e., the user and software

interactions should not be influenced (e.g., due to increased loading times).

4 Legacy Software System Analysis

Analyze Legacy

Software System

Identify process-

relevant Code

Fragments and

Database Tables

Extend Legacy

Software System

with Event Log

Generation

(Code Tracker)

Generate Event Log

from User and

Software

Interactions

Synchronize Event

Log with Database

Information

Fig. 2: Preparation Steps of our Approach

We derived the approach for generating event logs from running legacy soft-

ware systems (cf. Fig. 2) by applying design science research [6].

In the first step, we analyze the legacy software system, including source

code, database tables, and additional resources (e.g., configuration files, user

forms displayed by the running legacy software system).

In the second step, we transform the source code of the legacy application

to an abstract syntax tree in order to identify those code elements that trigger

database operations (e.g., the selection, insertion, deletion or update of tuples

in database tables). Using the database tables in combination with the informa-

tion provided from the source code (e.g., the exact SQL statement), we address

an important problem of legacy software systems, i.e., we are able to identify

relations between tables that have not been explicitly specified using foreign-key

constraints. In other words, we identify additional relations between database

tables specified in the legacy application source code.

We can further build clusters of database tables that most likely belong to

the same process based on these identified relations. In Fig. 3, for example, tables

belonging to the cluster marked in green correspond to orders, whereas tables

of the purple cluster correspond to articles. We identify the center of a cluster

using a page rank algorithm [7]. Note that checking the identified clusters with

a domain expert (if possible) might further improve the event log generation (cf.

Requirements 1 and 2 in Section 3). After having identified the clusters in the

database tables, we can determine which source code fragments are relevant for

the generation of the event log, i.e., which code fragments affect process-relevant

database tables. This information can then be used to configure and install the

code tracker into the legacy software system.

The code tracker is able to automatically inject code fragments into the

source code, which, in turn, are then executed together with the legacy software

system code enabling the generation of event logs at runtime. To ensure that

the performance of the legacy software system is not negatively affected, the

necessary data is passed using common log mechanisms (e.g., java.util.logging or

ADRESSEN

LIEFERANTEN

KUNDEN

KENNZEICHEN

ANFRAGEPOS

ARTIKEL

ARTIKEL_BESCH

LIEF_ARTIKEL

ANGEBOTS_POS

ME_EINHEITEN

APL_AVG

RESS_BEDARF_APL_AVG

ARBEITSPLATZ_GRUPPEN

RESSOURCEN_KAPAZ_POS

ARBEITSVORGANG

FERTIGUNGSAUFTRAEGE

RESS_BEDARF_AVG

ARTIKEL_DISPO

ARTIKEL_PREISE

AUFTRAGS_POSITIONEN

AUFTRAGS_UNTERPOSITIONEN

AUFTR_ARTIKEL

AUFTR_STUELI_POS

BESTELLPOS

BESTELLVORSCHLAEGE

DISPOVORSCHLAEGE LAGERBEWEGUNGEN

PA_PREISE

PRIMAERBEDARFE

RESERVIERUNGEN

STUECKLISTEN

WARENEINGAENGE

TEILEARTEN

ME_UMRECH

ARTIKEL_VERTRIEB

LIEFERSCHEIN_POSITIONEN

AUFTRAEGE

SERIEN_POSITIONEN

PLANTOUR_POSITIONEN

RECHNUNGEN

LIEFERSCHEINE

AUFTRAGS_KONDITIONEN

LIEFERSCHEIN_KONDITIONEN

KONDITIONSARTEN

MUSTERARTIKEL

POSITIONSARTEN

LIEFERSCHEIN_UNTERPOSITIONEN

VARI_VERWENDUNGEN

KD_MW_H_TAB

BESTELLUNGEN

DIVO_FA_BESTELL

BKT

CAD_STUECKLISTEN

LAGERORT_BESTAENDE

KALK_FA_PB_ANTEILE

LIEFERSCHEINE_MAWI_BEZUG

DOKUMENT_PARAM

SE_FERTIGUNGSAUFTRAEGE

VARI_MERKMALE

JOB_PARAM

RELEV_MERKMALE

LAGER

KONDITIONS_REFERENZEN

PREISKONDSTRATEGIEN_POS

PA_BESTAENDE

RECHNUNGS_POSITIONEN

RECHNUNGS_UNTERPOSITIONEN

PERSONAL_GRUPPEN

PLAN_BEREICHE

PREIS_REFERENZEN

RECHNUNGEN_RP

RECHNUNGS_POSITIONEN_RP

VERT_H_TAB

Fig. 3: Clusters derived from Database and Source Code

Oracle message-builtIns) already available in the legacy software system. This

yields the advantage that the existing infrastructure, in which the legacy software

system operates, takes care of managing files, rotating data and, thus, providing

methods for writing data to an event log in a performant manner. Consequently,

the transfer of event log data becomes possible with minimal footprint. In a last

step, we synchronize the event log with the information from the database (e.g.,

redo logs) enabling the generation of high-quality event logs.

5 Event Log Generation

In the context of a legacy software system, a business process can be derived

from the sequences of interactions the users have with the legacy application.

Each interaction of such a sequence is then subject-bound (i.e., the interactions

of a sequence belong to the same transaction). In a legacy software system, such

processes may be initiated and terminated using pre-defined actions, for exam-

ple, menu items or key combinations. The addition of corresponding actions to

an event log, together with the associated application object (e.g., a product

identified by a unique product number, or an order identified by its order num-

ber) constitutes the basis for generating an event log. Subsequently, this event

log may then serve as input for process mining algorithms.

5.1 Legacy Software System Extension

After showing how process-relevant source code fragments can be identified in

the legacy software system (cf. Section 4), we discuss how to augment the legacy

software system with event log generation capabilities by installing the code

tracker. This installation utilizes our ability to parse the relevant source code

fragments and to map them as an abstract syntax tree [8].

Leveraging this source code information, we can add the code tracker nodes at

the relevant positions of the software code, i.e., “start”, “end”, “return”, “exit”,

and “exception”, surrounding a create-, read-, update- or delete-statement (CRUD-

statement). Each code tracker statement then captures the context (i.e., the

position of the relevant source code in the entire legacy software system), the

timestamp, the identifier of the corresponding user session, and, optionally, ad-

ditional parameters of the identified source code fragments.

Adding the code tracker to the legacy software system is implemented as a

pre-deployment task. Thus, no developer interaction becomes necessary. In a de-

ployment chain, relevant code is checked out, parsed, added to the tracker, saved,

compiled, and then deployed to the running legacy software system. This inte-

gration ensures that any kind of source code change or release of new software

versions can be captured, hence, preventing mismatches between the running

code and the information captured in the generated event log. As an example,

consider the code fragment depicted in Fig. 4a, which is responsible for han-

dling a user interaction event. When applying the code tracking pre-deployment

task to this code fragment, we obtain the code fragment depicted in Fig. 4b.

In the latter, the event log generation is added to lines 2, 5, 7, and 9. Note

that ScreenName and EventName constitute placeholders that are replaced by

the actual values at runtime. An example of such actual values could be OR-

DERS.MAIN CANVAS.BUTTON SAVE.WHEN BUTTON PRESSED.

(a) Source Code (b) Extended Source Code

Fig. 4: Example Source Code Fragments

During event log analysis, such values provide important contextual infor-

mation and enable a failure-free identification of documented user and software

interactions. There exists a plethora of user interactions, e.g., pressing a button,

entering a value into a form field, clicking on a check box, or navigating be-

tween elements. As long as the legacy software system implements these events

as process-relevant in the source code, the code tracker is added.

Merging user interactions with database events In addition to the user

interaction events gathered by the code tracker, we analyze all database updates

(i.e., insert, update and delete) expressed in terms of Data Manipulation Lan-

guage (DML) statements. For this analysis, we utilize the redo log capabilities

provided by the legacy software system database. Redo logs are created by trans-

actional databases, to enable recovery in case of failures (e.g., after crashes). The

information contained in a redo log consists, for each recorded operation, of the

name of the database table, the performed operation (i.e., insert, update, or

delete), the timestamp, the session-id, and the original DML statement applied

to the database [9].

From the source code extension (cf. Fig. 4b), for each event, we can also ex-

tract the timestamp, session-id, and the affected database table. Combining these

three attributes enables the allocation between user or software interactions and

the corresponding changes to the persistence layer of the legacy software system.

Leveraging the information from redo logs, again ensures that no performance

penalties emerge due to the event log generation.

Using the code tracker functions, the information captured in the event log

is significantly increased compared to an event log solely generated from the

database schema [10], as we can unambiguously link processes with both program

code and related data. Therefore, time-consuming reverse engineering and root

cause analysis are not needed as the connection between source code, data, and

processes already exists.

Finally, one valuable effect for software modernization can be achieved: miss-

ing entries in the event log indicate that process parts implemented in the legacy

software system have never been used. This information is vital for modernizing

legacy software systems as the code fragments may correspond to technical debt

and must therefore not be migrated [11].

5.2 Recording User Interactions

Once the code tracker is installed, we are able to document the interactions of

users with the legacy application, including resulting software interactions. For

recording user interactions, we support two variants [12]:

Silent Recording shall record the use of the legacy application, starting with

the login a of user until closing the legacy application. We allow specifying

which information shall be recorded and in which form. For example, personal

data may only be logged in an anonymized way. By only recording selected user

sessions (e.g., sessions of users from a certain department), we can further restrict

the recording of user interactions to relevant user groups (e.g., users handling

invoices) in a fine-grained fashion.

Dedicated Recording aims to record existing (i.e., already identified) pro-

cesses implemented in the legacy software system. Users may define the start

and end of the recording (e.g., through predefined key combinations), and pro-

vide additional information about the recorded process. This, in turn, allows for

a precise delimitation of the interactions corresponding to a process.

6 Evaluation

The evaluation of our approach is threefold: First, we assess whether the iden-

tified requirements are met. Second, we analyze the performance of an Oracle

legacy software system to which we applied our approach. Third, we applied

process discovery algorithms to the derived event logs and evaluate the resulting

process models with domain experts. In total, the legacy software system used

to evaluate the approach comprises 589 database tables with 9977 columns. Ad-

ditionally, 60712 database statements (including more than 8000 different state-

ments) were implemented in a total of over 5 million lines of code. Furthermore,

the legacy software system comprises 1285 forms and 6243 different screens. The

event log was created using dedicated recording (cf. Section 5). In other words,

the users in this event log were able to provide additional information of the

recorded business process (e.g., name and description of the process). Addition-

ally, we applied the approach using silent recording to the legacy software system

of an insurance company1.

6.1 Requirements Evaluation

To evaluate Requirement 1 (Relevance), according to which the event log shall

solely contain process-relevant information, we conduct an in-depth and auto-

matic analysis of the legacy software system by identifying and clustering im-

portant tables and source code fragments (cf. Section 4). This enables us to

distinguish between relevant and non-relevant information. As a result, we are

able to configure the code tracker to ensure that only relevant data is collected.

Requirement 2 (Scope) deals with the scope of the legacy software system

and aims to minimize the amount of domain knowledge needed for the analysis.

By analyzing the source code, we are able to identify which code fragments refer

to which database tables. Clustering the database tables (cf. Section 4) allows

grouping the tables that belong to the same context. This enables a best guess

approach that may be checked by domain experts to further improve the event

log generation. Compared to alternative approaches (e.g., extensive interviews),

our approach requires significantly less domain knowledge.

Requirement 3 (Consistency) refers to consistency with respect to data types,

timestamps, and resources. While we account for consistency regarding data

types (e.g., timestamp formats and variables), due to the automated nature of

our approach, the fulfillment of this requirement also depends on the consistency

of the analyzed legacy software system as well as the underlying database.

According to Requirement 4 (Performance) the event log generation must not

affect the performance of the legacy software system or user interactions with

the legacy application. Typically, the generation of redo log files, archive log

files based on the redo log files, as well as the log rotation capabilities are tuned

to not influence the performance of the analyzed legacy software system. For

further analysis, the generated event log is extracted asynchronously to ensure

that the extraction neither impacts users nor the performance of the running

legacy software system. Additionally, the logging of user interactions focuses

on the relevant actions identified during legacy software analysis. Furthermore,

the logging is running in a separate, isolated transaction to the user session.

1Event logs provided: https://cloudstore.uni-ulm.de/s/7jYeRnXtcsk2Wfd

Finally, the collected event data is also persisted in a separate storage to not

affect performance.

6.2 Performance Analysis

To further evaluate the performance effects of our approach on the considered

legacy software system, we executed the same 3 processes multiple times (N=10)

with and without event log generation and measured the duration of the follow-

ing performance metrics: navigation, loading time, and function call. Note that

due to limitations of the legacy software system, timestamps could only be col-

lected every 10 milliseconds. In other words, differences of up to 20 ms might

exist. Figs. 5 - 7 depict the collected performance metrics. When navigating

through the legacy software system the average duration decreased by 25 ms.

The average loading times decreased by 30 ms after adding the event log gener-

ation. These differences are in range of the timestamp limitations of the legacy

software system. Therefore, we can conclude that the event log generation does

not significantly impact navigation and loading times. On average, the duration

of function calls increased by 0.65 seconds (+18.2%) per function call. However,

after closer inspection, this increase is mainly due to recursive function calls that

generate event log entries with each iteration. We are able to only record one

event log entry for recursive function calls, consequently reducing the increase

to the level of non-recursive function calls. Concerning the latter, we observed

an average increase of 14 ms (1.73%). Across all observed performance metrics,

the differences do not impact typical user and software interactions.

Fig. 5: Navigation Fig. 6: Loading Time Fig. 7: Function Calls

6.3 Initial Process Discovery

We applied several process discovery algorithms to the event logs generated with

our approach using default algorithm configurations. Next, we showed the re-

sulting process models to domain experts (N=13) and asked them to evaluate

to which degree they are able to recognize the legacy software system in each

process model on a 5-Point Lickert scale from not at all to completely. Overall,

the domain experts rated the process model generated by the Heuristic Miner

(threshold = 0.9) best (Mean = 4.45, SD = 0.63). This indicates that process

models discovered from the generated event log adequately represent the behav-

ior of processes implemented by the legacy software system.

Table 1: Domain Expert Recognition of Discovered Process Models (N=13)

Inductive Inductive DFG Heuristic Heuristic Heuristic

(Tree) (BPMN) (thold=0.75) (thold=0.9) (thold=0.95)

Mean (SD) 3.08 (1.07) 3.08 (0.73) 2.31 (1.2) 4.15 (0.77) 4.46 (0.63) 3.38 (1.27)

While the results could be improved using additional process discovery algo-

rithms or fine-tuning parameters, they emphasize the high quality of generated

event logs as no additional event log preparation was required.

7 Related Work

This paper is related to event log generation, robotic process automation, and

legacy software system analysis.

Process mining algorithms require event logs and, therefore, the generation

of event logs from various sources has gained great attention [13]. Databases

are often used as the main resource for extracting event data from information

systems [10, 14]. A quality-aware and semi-automated approach to extract event

logs from relational data is presented in [15]: users may select event log attributes

from available data columns, assisted by data quality metrics. In the context

of legacy software systems, however, relying solely on the information present

in databases is not sufficient, as important process-relevant knowledge is often

captured in the source code as well as the displayed user forms, but cannot be

discovered from the database solely. For example, legacy databases are often not

normalized and miss important information, e.g., foreign key constraints.

In the field of Robotic Process Automation (RPA) [16], user interface in-

teractions and software robots are used to replicate human tasks. An approach

for recording the interactions with user interfaces and the generation of user

interface event logs is presented in [17]. A pipeline of processing steps enabling

robotic process mining tools to generate RPA scripts from UI logs is presented

in [18]. [19] presents an UI logger that generates an event log from multiple user

interfaces. As opposed to [17–19], our approach accounts for the effects on the

legacy software system (e.g., exact database statements), i.e., it does not only

consider the user interface interactions in isolation.

In [20], a framework to recover workflows from an e-commerce scenario is

presented, leveraging static analysis to identify business knowledge from source

code. Similarly, [21] presents an approach for recovering business knowledge from

legacy application databases by inspecting the data stored within the database.

As our approach also aims to identify business knowledge from legacy software

systems, it differs from [20, 21]. Instead of extracting business knowledge from

static analysis, we generate event logs that represent business knowledge using

interactions with the legacy software systems.

[22] deals with the generation of event logs from legacy software systems

by first extending the source code and then recording the event logs. In con-

trast, our approach requires less domain knowledge for generating the event logs

as we derive relevant source code fragments from the clusters identified in the

database (including foreign-key constraints specified in the source code) rather

than domain experts or system analysts. Additionally, we support two event

log generation variants (silent and dedicated) that enable further insights into

specific processes implemented in the legacy software system.

8 Summary and Outlook

This paper presented an approach for generating event logs from running legacy

software systems with minimal domain knowledge. We combine information from

source code analysis and the database structure to identify tables and source code

fragments relevant in the context of supported business processes.

Further, we identify which database tables and source code fragments may

correspond to a specific process (e.g., handling an invoice) using a cluster anal-

ysis. We then automatically inject event log generation functions to the legacy

software system to track user and software interactions with the legacy soft-

ware system, while at the same time recording the resulting database transac-

tions. Next, we document user interactions with the application and the resulting

database changes from the running legacy application in a user-decided fashion.

We then combine both logs to correlate user interactions with corresponding

database changes to obtain event logs suitable for process mining.

We evaluated the approach based on the requirements identified with domain

experts, a performance analysis of the legacy software system, and the applica-

tion and evaluation of initial process discovery algorithms. The requirements are

met, enabling the generation of comprehensive event logs from legacy software

systems with the approach. A performance evaluation using an Oracle legacy

software system has shown that our event log generation does not impact the

performance of the legacy software system, and initial process models discovered

were able to adequately represent the legacy software system for domain experts

using the event logs generated with the approach.

In future work, we will apply the presented approach to additional legacy

software systems. Additionally, we will increase the quality of the discovered

process models for non-experts using more intuitive event log labels based on

the legacy software system.

Acknowledgments This work is part of the SoftProc project, funded by the

KMU-innovativ Program of the Federal Ministry of Education and Research,

Germany (F.No. 01IS20027A)

References

1. W. M. P. van der Aalst, A. Adriansyah, A. K. A. De Medeiros, F. Arcieri, T. Baier,

T. Blickle, J. C. Bose, P. Van Den Brand, R. Brandtjen, J. Buijs et al., “Process

mining manifesto,” in Int’l Conf on BPM. Springer, 2011, pp. 169–194.

2. A. Rozinat and W. M. P. van der Aalst, “Conformance checking of processes based

on monitoring real behavior,” Information Systems, vol. 33, no. 1, pp. 64–95, 2008.

3. M. Dumas, M. L. Rosa, J. Mendling, and H. A. Reijers, Fundamentals of Business

Process Management, 2nd ed. Springer, 2018.

4. M. Feathers, Working effectively with legacy code. Addison-Wesley, 2013.

5. W. M. P. van der Aalst, Process Mining: Data Science in Action, 2nd ed. Springer

Berlin Heidelberg, 2016.

6. R. J. Wieringa, Design science methodology for information systems and software

engineering. Springer, 2014.

7. L. Page, S. Brin, R. Motwani, and T. Winograd, “The pagerank citation ranking:

Bringing order to the web.” Stanford InfoLab, Technical Report 1999-66, 1999,

previous number = SIDL-WP-1999-0120.

8. B. Fluri, M. Wursch, M. Pinzger, and H. Gall, “Change distilling:tree differencing

for fine-grained source code change extraction,” IEEE Transact’ on Softw Eng,

vol. 33, no. 11, pp. 725–743, 2007.

9. E. G. L. de Murillas, W. M. P. van der Aalst, and H. A. Reijers, “Process min-

ing on databases: Unearthing historical data from redo logs,” in Business Process

Management. Springer, 2015, pp. 367–385.

10. W. M. P. van der Aalst, Extracting Event Data from Databases to Unleash Process

Mining. Springer, 2015, pp. 105–128.

11. W. Cunningham, “The wycash portfolio management system,” SIGPLAN OOPS

Mess., vol. 4, no. 2, 1992.

12. M. Breitmayer, L. Arnold, and M. Reichert, “Towards retrograde process analysis

in running legacy applications,” in Proceedings of the 14th ZEUS Workshop, vol.

3113. CEUR-WS.org, 2022, pp. 11–15.

13. D. Dakic, D. Stefanovic, T. Lolic, D. Narandzic, and N. Simeunovic, “Event log

extraction for the purpose of process mining: A systematic literature review,” in

Innov’ in Sust’ Mngmt and Entr’. Springer, 2020, pp. 299–312.

14. D. Calvanese, M. Montali, A. Syamsiyah, and W. M. P. van der Aalst, “Ontology-

driven extraction of event logs from relational databases,” in BPM Workshops.

Springer, 2016, pp. 140–153.

15. R. Andrews, C. van Dun, M. Wynn, W. Kratsch, M. R¨oglinger, and A. ter Hofst-

ede, “Quality-informed semi-automated event log generation for process mining,”

Decision Support Systems, vol. 132, p. 113265, 2020.

16. J. Wewerka and M. Reichert, “Robotic process automation - a systematic mapping

study and classification framework,” Enterprise Information Systems, 2022.

17. D. Choi, H. R’bigui, and C. Cho, “Enabling the gab between rpa and process

mining: User interface interactions recorder,” IEEE Access, vol. 10, pp. 39 604–

39 612, 2022.

18. V. Leno, A. Polyvyanyy, M. Dumas, M. La Rosa, and F. Maggi, “Robotic process

mining: Vision and challenges,” Bus’ & Inf’ Sys’ Eng’, vol. 63, 06 2021.

19. J. M. L´opez-Carnicer, C. del Valle, and J. G. Enr´ıquez, “Towards an open-

source logger for the analysis of rpa projects,” in Business Process Management:

Blockchain and Robotic Process Automation Forum. Springer, 2020, pp. 176–184.

20. Y. Zou and M. Hung, “An approach for extracting workflows from e-commerce

applications,” in 14th IEEE ICPC’06, 2006, pp. 127–136.

21. R. P´erez-Castillo, D. Caivano, and M. Piattini, “Ontology-based similarity applied

to business process clustering,” J. Softw. Evol. Process, vol. 26, no. 12, pp. 1128–

1149, 2014.

22. R. P´erez-Castillo, B. Weber, J. Pinggera, S. Zugal, I. G. R. de Guzm´an, and M. Pi-

attini, “Generating event logs from non-process-aware systems enabling business

process mining,” Enterp. Inf. Syst., vol. 5, no. 3, pp. 301–335, 2011.