
processes

Article

Monitoring Parallel Robotic Cultivations with Online Multivariate Analysis

Sebastian Hans 1,*,†, Christian Ulmer 1,†, Harini Narayanan 2, Trygve Brautaset 3, Niels Krausch 1, Peter Neubauer 1, Irmgard Schäffl 1, Michael Sokolov 4 and Mariano Nicolas Cruz Bournazou 4,*

1 Chair of Bioprocess Engineering, Institute of Biotechnology, Technische Universität Berlin, Ackerstraße 76, 13355 Berlin, Germany; [email protected] (C.U.); [email protected] (N.K.); peter[email protected] (P.N.); irmgard.schaeffl@campus.tu-berlin.de (I.S.)

2 Department of Chemistry and Applied Biosciences, Institute of Chemical and Bioengineering, ETH Zürich, Vladimir-Prelog-Weg 1, CH-8093 Zurich, Switzerland; [email protected]

3 Department of Biotechnology and Food Science, Norwegian University of Science and Technology, Sem Sælandsvei 6-8, 7491 Trondheim, Norway; [email protected]

4 DataHow AG, c/o ETH Zürich, HCl, F137, Vladimir-Prelog-Weg 1, CH-8093 Zurich, Switzerland; m.sokolov@datahow.ch

* Correspondence: [email protected] (S.H.); n.cruz@datahow.ch (M.N.C.B.)

† These authors contributed equally to this work.

Received: 26 April 2020; Accepted: 11 May 2020; Published: 14 May 2020





Abstract:

In conditional microbial screening, a limited number of candidate strains are tested under different conditions in search of the optimal operating strategy for production (e.g., temperature and pH shifts, media composition, and feeding and induction strategies). To achieve this, cultivation volumes of >10 mL and advanced control schemes are required to allow appropriate sampling and analyses. Operations become even more complex when the analytical methods are integrated into the robotic facility. Among multivariate data analysis methods, principal component analysis (PCA) has gained particular popularity in high throughput screening. However, an important issue specific to high throughput bioprocess development is the lack of so-called golden batches that could serve as a basis for multivariate analysis. In this study, we establish and present a program to monitor dynamic parallel cultivations in a high throughput facility. PCA was used for process monitoring and automated fault detection of 24 parallel experiments, using recombinant E. coli cells expressing three different fluorescent proteins as the model organism. This approach captured events such as stirrer failures and blockage of the aeration system and provided a good signal-to-noise ratio. The developed application can be easily integrated into existing data and device infrastructures, allowing automated and remote monitoring of parallel bioreactor systems.

Keywords:

high throughput bioprocess development; online data analysis; multivariate analysis; principal component analysis; laboratory automation; SiLA; design of experiments; bioprocess monitoring

1. Introduction

Microbial cell factories are widely used in biotechnological production processes. The development of effective bioprocesses requires screening of many microbial strains under various cultivation conditions. State-of-the-art bioprocess development is known to be time-consuming and laborious and to have a low success rate […]. Process parameters such as the microbial host, vector size and copy number, feeding strategy, and media composition have a significant impact on the profitability, reliability, and sustainability of the final manufacturing process. Considering all these factors and evaluating them in relation to each other often calls for a high number of cultivations. To ensure the reliability of data and results, parallel cultivations providing replicates to compensate for the variance of biological systems are additionally needed. Many bioprocess conditions are difficult to study due to the lack of suitable high throughput (HT) facilities that can perform all these cultivations in a short time. Consequently, current process improvement strategies in high throughput bioprocess development (HTBD) are based on expert knowledge […] and static design of experiments (DoE) […–…]. In order to reduce the number of cultivations to an appropriate level, many factors are weighed against each other only partially. On the other hand, emerging technologies such as automation and digitization enable faster product development and shorter cycles from the construction of a microbial clone to an optimal bioprocess […–…] by increasing the possible number of parallel cultivations. Although model-based tools (e.g., for process control or computer aided design) are the standard in other industries […–…], they are rarely used in bioprocess development despite their great potential […]. A major challenge is the lack of suitable, tractable mathematical models, which are required but difficult to develop due to the complexity of biological systems […]. This is especially the case in process development, where knowledge of new microbial strains is limited and an exhaustive investigation of mutants that are likely to be discarded is considered unnecessary.

Processes 2020,8, 582; doi:10.3390/pr8050582 www.mdpi.com/journal/processes

The differences in process control (batch vs. fed-batch) and cultivation system (e.g., shake flask, multi-well plate, or lab-scale bioreactor vs. production-scale bioreactor), together with the resulting differences in metabolic and stress conditions between screening and manufacturing, make scale-up between these stages challenging. Detailed knowledge of the process dynamics is an essential factor for scale-up […]. Changes in pH, oxygen limitation, or the availability of media components should therefore be considered during the conditional screening phase. The technical requirements are already met by modern HT robots with developments at the µL scale. Significant advantages are offered by mini-bioreactors (MBRs) with working volumes between 10 and 250 mL […–…], integrated online sensors for pH and dissolved oxygen tension (DOT), integrated control of pH and substrate feeding […–…], and automated at-line sampling and analysis […]. Additionally, computer aided tools that enable advanced process control and feedback operation of the robots and the MBRs have been developed, but they are rarely used [17,18,20,22,24–26].

However, the number of parallel cultivations made possible in this way can hardly be managed with manual corrections and interventions. Process monitoring of parallel cultivations is therefore a major challenge in HTBD, especially if no models describing the bioprocess are available to guide the operator. Fully automated solutions that include process and feeding control as well as online and at-line monitoring are very challenging in parallel MBR cultivations […]. The main difficulties are the analysis of multi-dimensional and highly correlated data sets, the monitoring and operation of a large number of bioreactors, and the intrinsic need to solve an optimal experimental design problem considering all MBRs simultaneously […] within a limited time. Additional challenges arise when industrial conditions are investigated at the milliliter scale [22,25].

In production processes, the operation strategy is well defined, historical data are at hand, and the goal is typically to keep the current cultivations within predefined critical quality attributes (CQAs) or "golden batch" conditions. Under these conditions, multivariate analysis (MVA) tools have been widely applied to supervise the process and ensure its reproducibility […]. MVA tools such as principal component analysis (PCA) have become increasingly popular in the field of bioprocesses due to their capability to represent highly correlated multi-dimensional datasets in a reduced space, separating process noise while retaining maximum information. Early applications of PCA include process monitoring, detection of failures or anomalies, and statistical process control […–…]. However, for all these applications, a PCA model is usually trained on an "in-control", i.e., "golden batch", basis to detect deviations from the targeted production run characteristics [34].

During screening, where the goal is to find new strains best suited for the bioprocess at hand, such training data are not available. The lack of a "golden batch" makes it very challenging to diagnose or even identify faults or disturbances in cultivations without historical data. Although, in principle, historical data of similar processes can be used as reference points during development, one cannot rely on pre-defined process behavior and constraints, as is typically the case in production. Due to various factors, the physiology and phenotype of the cells are known to vary over the cultivation time […] (e.g., population heterogeneities […]). This makes it very difficult to choose a setup and tune control strategies in advance. For this reason, tools are needed that exploit the information generated online in parallel MBRs to rapidly develop models for process monitoring and to project the large data sets into a low-dimensional visual representation.

Our previous work showed that PCA can be used in parallel MBR experiments to identify and improve feeding strategies with a low number of experiments […]. In this work, we developed a program that enables the monitoring of parallel dynamic cultivations in real time, supported by visual representations as well as automated event triggers (Figure 1). The added value of this method is enhanced process monitoring and automated fault detection. This is demonstrated in an experimental campaign with 24 parallel MBRs operated by a fully automated robotic system. The program provides a compressed representation and visualization of the ongoing experiments, enabling a comparison between the states of parallel cultivations. PCA is applied in a moving horizon framework to allow rapid detection of specific events and in a full horizon framework to track the dynamic evolution of the reactors. Together, the two approaches provide an informative overview of the bioreactors' performance and state. Thus, they enable the operator to determine whether all parallel cultivations are running within critical parameter limits, and a warning is displayed in case of deviations. Critical bioreactors can be easily identified and attended to.

Processes2020,8,xFORPEERREVIEW4of18



Figure1.Screenshotsofthemonitoringapplication.Screenshot(a)depictsthelandingpagewhere

theuserselectsanexperiment,adaptssettings,andstartsthedataqueryandsubsequentonlinedata

analysis;(b)showsthemainpanelwithplotsforprocessdataandresultsfromthePCA;and(c)

depictsthecentralcontrolappofthedigitaltwin(DTW)oftheprocess.Itallowsforfastprocess

monitoringviadirectcontrolandobservationofsingleMBRsaswellasfastidentificationofprocess

failuresbasedonPCA.

2.MaterialsandMethods

2.1.ExperimentalFacility

Thewetexperimentswereperformedinahighthroughputbioprocessdevelopmentfacility

composedoftwoliquidhandlingstations(LHS)(FreedomEVO200,TecanGroupLtd.,Männedorf,

Switzerland;MicrolabStar,HamiltonCompany,Bonaduz,Switzerland)andoneminibioreactor

system(bioREACTOR48,2magAG,Munich,Germany)whichismountedonthefirstLHS(Tecan).

Theentirefacilityanditsfunctionalityisdescribedin[21].Totriggernon‐optimalmicrobial

cultivationconditionsthevolumebalancecontroldescribedbyHabyetal.[21]wasswitchedoff.

2.2.MicrobialModelStrainandCultures

AllcultivationswereperformedwithE.coliK‐12BW25113(F‐,DE(araD‐araB)567,

lacZ4787(del)::rrnB‐3,LAM‐,rph‐1,DE(rhaD‐rhaB)568,hsdR514)carryingplasmidpAG032.As

describedbyGawinetal.2018[37]theplasmidpAG032containsthreefluorescentprotein(CFP,YFP,

RFP)encodinggenes,eachundertranscriptionalcontrolofdifferentpromotors.TheCFPfluorescent

signalgeneiscoupledtoanσ

relatedpromoterandisconstitutivelyexpressed.TherpsJconstitutive

promoterisresponsiblefortheYFPexpressionandservesasanindicatorforthenumberofribosomes

Figure 1. Screenshots of the monitoring application. Screenshot (a) depicts the landing page where the user selects an experiment, adapts settings, and starts the data query and subsequent online data analysis; (b) shows the main panel with plots for process data and results from the PCA; and (c) depicts the central control app of the digital twin (DTW) of the process. It allows for fast process monitoring via direct control and observation of single MBRs as well as fast identification of process failures based on PCA.

In addition, they enable an automated and secure transfer of the cultivation data during the runtime of the experiment. This allows the often computationally intensive online data evaluation to be performed in specialized laboratories, as a service, or by project partners. Here, we used an efficient communication protocol that enables remote monitoring of the robotic experiments, reducing the physical barrier separating theoretical work and the practical wet lab. The implementation is based on the SiLA 2 protocol (preliminary standard as of January 2019).

2. Materials and Methods

2.1. Experimental Facility

The wet experiments were performed in a high throughput bioprocess development facility composed of two liquid handling stations (LHS) (Freedom EVO 200, Tecan Group Ltd., Männedorf, Switzerland; Microlab Star, Hamilton Company, Bonaduz, Switzerland) and one mini-bioreactor system (bioREACTOR 48, 2mag AG, Munich, Germany), which is mounted on the first LHS (Tecan). The entire facility and its functionality are described in [21]. To trigger non-optimal microbial cultivation conditions, the volume balance control described by Haby et al. [21] was switched off.

2.2. Microbial Model Strain and Cultures

All cultivations were performed with E. coli K-12 BW25113 (F-, DE(araD-araB)567, lacZ4787(del)::rrnB-3, LAM-, rph-1, DE(rhaD-rhaB)568, hsdR514) carrying plasmid pAG032. As described by Gawin et al. 2018 [37], the plasmid pAG032 contains three fluorescent protein (CFP, YFP, RFP) encoding genes, each under the transcriptional control of a different promoter. The CFP gene is coupled to a σ32-related promoter and is constitutively expressed. The constitutive rpsJ promoter is responsible for YFP expression, which serves as an indicator of the number of ribosomes in the cell. As a placeholder for a recombinant product, RFP expression is under the control of the XylS/Pm promoter system. Frozen cells (stored at −[…] °C) were transferred into TY medium and kept for 5 h at […] °C. Afterwards, an aliquot was added to EnPresso B medium (Enpresso GmbH, Berlin, Germany) with 6 U L−1 Reagent A and incubated at 37 °C overnight. The main culture was adjusted to an OD600 of 1 in MS medium […] with 5 g L−1 glucose and distributed to the MBR culture vessels as 10 mL aliquots. Ampicillin (0.1 g L−1) was added to all cultures for plasmid maintenance. The main cultivation was started at 2000 rpm and 37 °C, and the stirrer speed was increased stepwise by 200 rpm every 5 min up to 3000 rpm. The maximum specific growth rate of E. coli BW25113 (pAG032), determined in previous experiments without induction, was 0.72 h−1. The applied feed rates are summarized in Table 1; the calculations are based on the equations by Enfors 2019 […]. After six hours of cultivation, the cultures were induced by adding 50 µL of 0.1 M m-toluic acid to a final concentration of 0.5 mM.
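The stirrer ramp and the inducer dose above can be checked with a few lines of arithmetic. The sketch below is illustrative only (the function names are hypothetical, not part of the authors' software). It assumes that the first 200 rpm step happens 5 min after the start, and it uses the initial 10 mL culture volume, which reproduces the stated 0.5 mM; the actual volume at induction may differ because of feed and base additions.

```python
# Illustrative arithmetic for the stirrer ramp and m-toluic acid induction.
# Assumptions: first rpm step at t = 5 min; culture volume taken as 10 mL.

def stirrer_ramp(start_rpm=2000, end_rpm=3000, step_rpm=200, step_min=5):
    """Return (time_min, rpm) pairs for a stepwise stirrer ramp."""
    schedule = [(0, start_rpm)]
    rpm, t = start_rpm, 0
    while rpm < end_rpm:
        rpm = min(rpm + step_rpm, end_rpm)  # never overshoot the target speed
        t += step_min
        schedule.append((t, rpm))
    return schedule


def final_concentration_mM(stock_M, added_uL, culture_mL):
    """Final inducer concentration (mM) after adding a stock solution."""
    moles = stock_M * added_uL * 1e-6            # mol = mol/L * L
    return moles / (culture_mL * 1e-3) * 1e3     # mol/L -> mmol/L


ramp = stirrer_ramp()
print(ramp[-1])                                        # (25, 3000)
print(round(final_concentration_mM(0.1, 50, 10), 3))  # 0.5
```

With these assumptions, the ramp reaches 3000 rpm after five steps, i.e., 25 min into the cultivation.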

Table 1. Summary of applied feed rates for cultivation of E. coli BW25113 (pAG032). The listing of the reactors in columns and rows represents the reactor allocation at the two liquid handling stations (LHS).

Reactor        µset (h−1)    % of µmax
1   2   3      0.65          90
4   5   6      0.58          80
7   8   9      0.50          70
10  11  12     0.43          60
13  14  15     0.36          50
16  17  18     0.29          40
19  20  21     0.22          30
22  23  24     0.14          20
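The µset column of Table 1 is the listed percentage of µmax = 0.72 h−1, rounded to two decimals, which the first function below reproduces. The second function is an assumption added for illustration: a common textbook form of an exponential feed profile without a maintenance term, F(t) = µset X0 V0 exp(µset t) / (Yx/s Sf). The exact equations from Enfors 2019 are not reproduced in this section, and all of its inputs (initial biomass X0, initial volume V0, biomass yield Yx/s, feed glucose concentration Sf) are hypothetical. Since the feeding in this work was pulse-based, such a rate would in practice be translated into discrete pulses.

```python
import math

MU_MAX = 0.72  # h^-1, maximum specific growth rate reported in Section 2.2


def mu_set_table(mu_max=MU_MAX, percentages=range(90, 10, -10)):
    """Return (percent of mu_max, mu_set) pairs rounded to two decimals."""
    return [(pct, round(mu_max * pct / 100, 2)) for pct in percentages]


def exponential_feed_rate(t_h, mu_set, x0_g_per_L, v0_mL, yield_xs, s_feed_g_per_L):
    """Hypothetical exponential feed rate in mL/h (no maintenance term).

    F(t) = mu_set * X0 * V0 * exp(mu_set * t) / (Y_xs * S_f)
    """
    return (mu_set * x0_g_per_L * v0_mL * math.exp(mu_set * t_h)
            / (yield_xs * s_feed_g_per_L))


for pct, mu in mu_set_table():
    print(f"{pct:>3} % of mu_max -> mu_set = {mu:.2f} h^-1")
```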

2.3. Sampling and Analytics

On-line measurements of pH and DOT were taken every 30 s. For at-line analysis, the MBRs were sampled column-wise every 15 min during the cultivation, and the samples were directly transferred into V-shaped 96-microwell plates. The sampling plates contained 15 µL of dried 2 M NaOH to ensure immediate metabolic inactivation of the samples […]. The samples were stored on the LHS deck at 4 °C for five sampling cycles before being transferred to the second (analytical) LHS. The sample storage time on the LHS deck was between 2 and 75 min. For OD600 and fluorescence measurements, the samples were diluted by the LHS […] and measured in a Synergy MX microwell plate reader (BioTek Instruments GmbH, Bad Friedrichshall, Germany). The undiluted samples were filtered to remove the cells, and glucose and acetic acid concentrations were measured in the supernatant. The detailed procedure of the automated workflow is described in Haby et al. 2019 […]. Outliers attributable to traceable technical issues were marked as invalid and excluded from the data analysis.

2.4. Software Framework

Online data transfer was enabled by a server–client architecture based on the SiLA 2 standard (Association Consortium Standardization in Lab Automation, Rapperswil-Jona, Switzerland, sila-standard.org). The server is located at the Chair of Bioprocess Engineering at Technische Universität Berlin, while the client is distributed with the application. The server–client framework is written in Java 8 (Oracle Corporation, Santa Clara, CA, USA). The client requests information about a (running or completed) experiment, to which the server replies. The server is equipped with a driver that connects to the centralized MySQL database (Oracle Corporation, Santa Clara, CA, USA), allowing access to all process data and metadata of an experiment. Upon request by a client, the server pulls the data from the database via SQL queries and returns the information to the client. The client formats the information into an XML-compliant string and saves it on the local machine.

2.5. Monitoring Application

The monitoring application serves as the graphical user interface (Supplementary Code S1). The application itself is written in MATLAB (2018a, MathWorks, Natick, MA, USA, 2019) using the MATLAB App Designer environment. A MATLAB script initiates the client and parses the information in the XML file into a MATLAB-compatible data structure. The parsing is based on a modified version of the xml2struct function from MATLAB File Exchange [40].
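The xml2struct-style parsing step can be illustrated with Python's standard xml.etree.ElementTree module. The XML layout below (elements experiment, reactor, pH, DOT) is hypothetical, since the schema written by the client is not described here; the sketch only shows the general mapping of attributes and child elements to nested dictionaries, with repeated elements collected into lists.

```python
import xml.etree.ElementTree as ET


def xml_to_dict(element):
    """Recursively convert an XML element into nested dicts, lists and strings."""
    node = dict(element.attrib)                  # attributes become keys
    children = list(element)
    if not children:
        text = (element.text or "").strip()
        if not node:
            return text                          # plain leaf: just its text
        if text:
            node["Text"] = text
        return node
    for child in children:
        value = xml_to_dict(child)
        if child.tag not in node:
            node[child.tag] = value
        else:
            # repeated tags are collected in a list (cf. struct arrays in MATLAB)
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
    return node


# Hypothetical XML as a client might store it locally
SAMPLE = """<experiment id="demo">
  <reactor number="1"><pH>7.0</pH><DOT>85.2</DOT></reactor>
  <reactor number="2"><pH>6.9</pH><DOT>80.1</DOT></reactor>
</experiment>"""

root = ET.fromstring(SAMPLE)
data = {root.tag: xml_to_dict(root)}
print(data["experiment"]["reactor"][1]["DOT"])   # 80.1
```

Values remain strings after parsing; converting them to numbers would be a separate step.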

2.6. Data Processing for PCA

The input variables for the PCA consisted of on-line (pH, DOT) and at-line (OD600, glucose and acetate concentrations, and fluorescence of the red, yellow, and cyan fluorescent proteins) measurements, as well as the logged volumes of base and glucose addition. To account for the differing sampling times of the at-line measurements, these were interpolated to a reference time using piecewise cubic Hermite interpolating polynomials.
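MATLAB provides this interpolation as pchip. A dependency-free Python sketch of the same idea follows: interior slopes are a weighted harmonic mean of the neighboring secant slopes (set to zero at local extrema), and the end slopes use a shape-preserving three-point formula, so monotone data stay monotone between samples. The slope rules are modeled on those described for MATLAB's pchip but have not been validated against its output; the sampling times, OD600 values, and 5 min reference grid are hypothetical.

```python
from bisect import bisect_right


def _interior_slope(h_prev, h_next, d_prev, d_next):
    """Weighted harmonic mean of neighbouring secants; 0 at local extrema."""
    if d_prev * d_next <= 0:
        return 0.0
    w1, w2 = 2 * h_next + h_prev, h_next + 2 * h_prev
    return (w1 + w2) / (w1 / d_prev + w2 / d_next)


def _end_slope(h0, h1, d0, d1):
    """Shape-preserving three-point end slope."""
    d = ((2 * h0 + h1) * d0 - h0 * d1) / (h0 + h1)
    if d * d0 <= 0:
        return 0.0
    if d0 * d1 <= 0 and abs(d) > abs(3 * d0):
        return 3 * d0
    return d


def pchip(xs, ys, xq):
    """Evaluate a piecewise cubic Hermite interpolant through (xs, ys) at xq."""
    n = len(xs)
    h = [xs[k + 1] - xs[k] for k in range(n - 1)]
    delta = [(ys[k + 1] - ys[k]) / h[k] for k in range(n - 1)]
    if n == 2:
        slopes = [delta[0], delta[0]]
    else:
        slopes = [0.0] * n
        for k in range(1, n - 1):
            slopes[k] = _interior_slope(h[k - 1], h[k], delta[k - 1], delta[k])
        slopes[0] = _end_slope(h[0], h[1], delta[0], delta[1])
        slopes[-1] = _end_slope(h[-1], h[-2], delta[-1], delta[-2])
    out = []
    for x in xq:
        k = min(max(bisect_right(xs, x) - 1, 0), n - 2)   # segment index
        t = (x - xs[k]) / h[k]
        # cubic Hermite basis functions
        h00 = (1 + 2 * t) * (1 - t) ** 2
        h10 = t * (1 - t) ** 2
        h01 = t ** 2 * (3 - 2 * t)
        h11 = t ** 2 * (t - 1)
        out.append(h00 * ys[k] + h10 * h[k] * slopes[k]
                   + h01 * ys[k + 1] + h11 * h[k] * slopes[k + 1])
    return out


sample_t = [0, 15, 30, 45, 60]          # hypothetical at-line sampling times (min)
od600 = [1.0, 1.4, 2.1, 3.3, 5.0]       # hypothetical OD600 values
reference_t = list(range(0, 61, 5))     # hypothetical 5 min reference grid
od_ref = pchip(sample_t, od600, reference_t)
```

Compared with an ordinary cubic spline, this interpolant does not overshoot between samples, which matters for quantities such as OD600 or glucose that should not show artificial wiggles.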

For the PCA, the three-dimensional dataset (reactor × variable × time) was unfolded batch-wise […]. This approach essentially converts time into a distinguishing factor of each variable, i.e., one variable is defined per time instance at which it was quantified. After unfolding, the dataset was mean-centered and scaled to unit variance. Additionally, to account for the different measurement frequencies, the data were block-scaled: the columns belonging to the trajectory of each variable were divided by the square root of the number of available data points of that variable [42].
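A minimal NumPy sketch of this preprocessing, assuming each variable is already available on its reference time grid. Batch-wise unfolding keeps one row per reactor and places the time trajectories of the variables side by side. Each block is autoscaled column-wise and then divided by the square root of its number of columns, so every variable contributes the same total variance regardless of how many time points it has. Blocks are passed as a list so that variables with different numbers of data points could be combined; the data here are random placeholders, and the function names are hypothetical.

```python
import numpy as np


def unfold_batchwise(X):
    """Unfold (reactors, variables, times) -> (reactors, variables*times).

    Columns are variable-major: all time points of variable 0, then variable 1, ...
    """
    return X.reshape(X.shape[0], -1)


def autoscale_and_block_scale(blocks):
    """Mean-centre and unit-variance scale each column, then block-scale.

    blocks: list of 2-D arrays, one per variable, shaped (reactors, n_points).
    Each block is divided by sqrt(n_points) so that every variable has the
    same total variance in the unfolded matrix.
    """
    scaled = []
    for block in blocks:
        block = np.asarray(block, dtype=float)
        std = block.std(axis=0, ddof=1)
        std[std == 0] = 1.0  # constant columns: avoid division by zero
        z = (block - block.mean(axis=0)) / std
        scaled.append(z / np.sqrt(block.shape[1]))
    return np.hstack(scaled)


rng = np.random.default_rng(0)
X = rng.normal(size=(24, 3, 10))        # 24 reactors, 3 variables, 10 reference times
Xu = unfold_batchwise(X)                # batch-wise unfolded matrix
blocks = [X[:, v, :] for v in range(X.shape[1])]
Xs = autoscale_and_block_scale(blocks)
print(Xu.shape, Xs.shape)
```

Because the blocks are stacked in the same variable-major order as the reshape, the scaled matrix lines up column by column with the unfolded one.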

2.7. Principal Component Analysis

The PCA is computed with the built-in MATLAB function pca. A detailed mathematical description of the algorithm can be found in […]. The optimal number of principal components (PCs) is selected automatically based on the improvement in the percentage of variance explained; the empirical threshold was set to 5%.
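MATLAB's pca centers the data by default and returns coefficients (loadings), scores, and the explained variance per component. An equivalent computation on already preprocessed data can be sketched in NumPy via the singular value decomposition. Reading "improvement in % variance explained" as the variance explained by each additional PC is an assumption, as are the helper names and the synthetic data with two dominant patterns.

```python
import numpy as np


def pca_svd(X):
    """PCA of a preprocessed (centred/scaled) matrix X via SVD.

    Returns scores T, loadings P and % variance explained per PC, with X = T @ P.T.
    The sign of each PC is arbitrary, as in any PCA implementation.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U * s
    loadings = Vt.T
    explained = 100.0 * s**2 / np.sum(s**2)
    return scores, loadings, explained


def n_components(explained, threshold=5.0):
    """Keep adding PCs while each new PC adds at least `threshold` % variance."""
    n = 0
    for pct in explained:
        if pct < threshold:
            break
        n += 1
    return max(n, 1)


rng = np.random.default_rng(0)
latent = rng.normal(size=(24, 2)) * [5.0, 3.0]     # two strong hidden patterns
X = latent @ rng.normal(size=(2, 20)) + 0.1 * rng.normal(size=(24, 20))
X -= X.mean(axis=0)                               # centre, as pca does by default
T, P, explained = pca_svd(X)
k = n_components(explained)
print(k, np.round(explained[:3], 1))
```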

Score plots were used to visually represent the run behavior of the replicates. Given the unfolding choice, traditional loading plots contain too many lines for direct interpretation. Thus, contribution plots were employed to help the operator relate patterns in the score plots to actual occurrences in the process [44].

PCA was applied to time series of the collected data, which were augmented in two ways. First, a moving horizon (sliding window) mode of data augmentation is used to rapidly detect sensor failures or faulty measurements. In this approach, a window length (x min) and an update time (y min) are chosen. All data available within the window are used to build the PCA model, and a new model is built by sliding the window by y min. Second, a full horizon mode of data augmentation is used to track and compare the dynamic evolution of the different MBRs. In this approach, all available data are used to build the PCA model. In this work, we chose to build a full horizon PCA model for each time point in the reference time set.
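The two modes differ only in which time span feeds each PCA model. The sketch below generates the time spans for both modes. The 20 min window and 10 min update are taken from Sections 3.3 and 3.4 as example values for x and y; the helper names and the DOT trajectory sampled every 30 s are hypothetical.

```python
def moving_windows(t_end_min, window_min, update_min, t_start_min=0):
    """(start, end) pairs of a fixed-length window shifted by update_min (integer minutes)."""
    first_end = t_start_min + window_min
    return [(end - window_min, end)
            for end in range(first_end, t_end_min + 1, update_min)]


def full_horizons(reference_times):
    """(start, end) pairs for full horizon models: always from the first reference time."""
    return [(reference_times[0], t) for t in reference_times[1:]]


def in_window(times, values, start, end):
    """Values whose time stamp lies in the half-open interval (start, end]."""
    return [v for t, v in zip(times, values) if start < t <= end]


windows = moving_windows(60, 20, 10)      # first hour, 20 min window, 10 min update
times = [0.5 * k for k in range(121)]     # on-line data every 30 s, 0 ... 60 min
dot = [100 - 0.2 * t for t in times]      # hypothetical DOT trajectory (%)
print(windows)
print(len(in_window(times, dot, *windows[0])))
```

Each moving window yields a small, recent data set for fast fault detection, whereas each full horizon model grows with the cultivation and captures the long-term evolution.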

Hotelling's T² distance was used to detect unexpected drifts of an MBR compared with its replicates. Automated triggers were set for reactors outside the 90% confidence ellipse. The variable causing the behavior was automatically detected using key properties of PCA […] to provide a preliminary diagnosis of the event. The latent variable contributing most to an observation was identified using the following formula:

$$\cos^2_{i,l} = \frac{f_{i,l}^2}{\sum_{l'} f_{i,l'}^2} \qquad (1)$$

where f_{i,l}^2 is the squared score of observation i on latent variable l, and the sum runs over all latent variables l'. Subsequently, the squared loadings of all variables on this principal component were analyzed to identify the key driver of the failure.
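The following NumPy sketch combines the T² check with the cos²-based diagnosis of Eq. (1). The 90% limit uses one common F-distribution-based formula for a two-component model, which corresponds to the confidence ellipse in a two-dimensional score plot; the exact limit used by the application is not stated, so this choice is an assumption. Scores, loadings, variable names, and helper names are synthetic or hypothetical.

```python
import numpy as np


def hotelling_t2(scores):
    """Hotelling's T² per observation: sum over PCs of t_ia² / var(t_a).

    Scores are assumed to be mean-centred (true for PCA on centred data).
    """
    variances = scores.var(axis=0, ddof=1)
    return np.sum(scores**2 / variances, axis=1)


def t2_limit_two_pcs(n_obs, confidence=0.90):
    """F-based T² limit for A = 2 PCs: A(n-1)/(n-A) * F_{A, n-A}(confidence).

    For A = 2 the F quantile has the closed form (m/2) * ((1-p)^(-2/m) - 1)
    with m = n - 2, so no statistics library is needed.
    """
    a, m = 2, n_obs - 2
    f_quantile = (m / 2) * ((1 - confidence) ** (-2 / m) - 1)
    return a * (n_obs - 1) / (n_obs - a) * f_quantile


def cos2(scores):
    """Eq. (1): squared scores divided by their row sum."""
    squared = scores**2
    return squared / squared.sum(axis=1, keepdims=True)


def diagnose(scores, loadings, i, variable_names):
    """Dominant PC of observation i (max cos²) and the variable with the largest squared loading on it."""
    pc = int(np.argmax(cos2(scores)[i]))
    var = int(np.argmax(loadings[:, pc] ** 2))
    return pc, variable_names[var]


rng = np.random.default_rng(1)
T = rng.normal(size=(24, 2))                 # synthetic scores of 24 reactors on 2 PCs
T[7] = [0.3, 6.0]                            # hypothetical drifting reactor (row 7)
T -= T.mean(axis=0)
P = np.array([[0.9, 0.1], [0.1, 0.95], [0.4, 0.3]])   # hypothetical loadings (3 variables x 2 PCs)
limit = t2_limit_two_pcs(len(T))
flagged = np.flatnonzero(hotelling_t2(T) > limit)
print(round(limit, 2), flagged, diagnose(T, P, 7, ["pH", "DOT", "feed"]))
```

In this synthetic case, the drifting row is dominated by the second PC, on which "DOT" has the largest squared loading, so the preliminary diagnosis points to DOT.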

2.8. Automated Warnings

Robotic experiments typically run without automated supervision systems. Apart from failed arm movements or device malfunctions, failures cannot be detected unless an operator is monitoring the process. To tackle this issue, a simple method to trigger alarms was developed. For this, the Euclidean distance of each point to the center of its cluster of replicates was calculated in the sliding window, and a message was triggered if user-defined constraints were violated. This information allows the operator to quickly grasp the current status of the overall cultivation and to assess the similarity of the replicates. Additionally, this first step towards online automatic classification of outliers enables more robust data selection for online optimal experimental re-design […], a process that requires fast and thorough data selection that can hardly be done manually.
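A minimal sketch of this distance check in plain Python follows. The constraint is assumed to be a single maximum distance in the score space, since the actual user-defined constraints are not specified; the function name is hypothetical. The reactor-to-group assignment mirrors the first two triplicates in Table 1, but the score coordinates are invented.

```python
import math
from collections import defaultdict


def replicate_distance_warnings(points, groups, max_distance):
    """Flag reactors whose score point is far from the centre of its replicate group.

    points: {reactor: (pc1, pc2, ...)}; groups: {reactor: group_id}
    Returns a list of (reactor, group_id, distance) for each violation.
    """
    members = defaultdict(list)
    for reactor, group in groups.items():
        members[group].append(reactor)
    violations = []
    for group, reactors in members.items():
        coords = [points[r] for r in reactors]
        centre = tuple(sum(c) / len(coords) for c in zip(*coords))
        for r in reactors:
            d = math.dist(points[r], centre)   # Euclidean distance (Python >= 3.8)
            if d > max_distance:
                violations.append((r, group, d))
    return violations


points = {1: (0.0, 0.0), 2: (0.2, 0.0), 3: (3.0, 0.0),
          4: (5.0, 1.0), 5: (5.1, 1.1), 6: (4.9, 0.9)}
groups = {1: "90%", 2: "90%", 3: "90%", 4: "80%", 5: "80%", 6: "80%"}
for reactor, group, d in replicate_distance_warnings(points, groups, max_distance=1.5):
    print(f"Warning: reactor {reactor} is {d:.2f} away from the centre of group {group}")
```

Note that an outlying reactor also pulls its group centre towards itself; with only three replicates, a distance to the median or to the centre of the other two replicates would be a more robust alternative.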

2.9. Pipetting Accuracy

To assess the pipetting accuracy of the cultivation LHS, pre-weighed 96-well plates were filled with colored demineralized water. The absorption maximum of the liquid, determined by a spectral scan in the plate reader, was at 445 nm. The pipetting scheme was set up so that each needle dispenses into three columns of one row per aspiration cycle, mimicking the experimental setup of the cultivation.

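The factor for absorption per mL was calculated as shown below and applied to the plate:

$$\text{Factor}\left[\frac{\text{absorption}}{\text{mL}}\right] = \frac{\overline{\text{absorption}} \times n_{\text{wells}}}{\Delta m / \rho_{\text{water}}} = \frac{\text{total absorption}}{\text{total volume in plate}} \qquad (2)$$

where the numerator is the average absorption of all wells multiplied by the number of wells, and the denominator is the weight difference of the plate divided by the density of water.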
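Eq. (2) implicitly assumes that the absorption of a well is proportional to the volume dispensed into it, so the factor converts each well's absorption back into a volume. A short sketch with hypothetical numbers and function names; the water density is a parameter (0.998 g mL−1 is a typical value near room temperature), and the example uses 1.0 for easy checking.

```python
def absorption_factor(absorbances, plate_weight_difference_g, water_density_g_per_mL=0.998):
    """Eq. (2): total absorption divided by total dispensed volume (absorption per mL)."""
    total_absorption = sum(absorbances)      # = average absorption * number of wells
    total_volume_mL = plate_weight_difference_g / water_density_g_per_mL
    return total_absorption / total_volume_mL


def well_volumes_uL(absorbances, factor):
    """Convert each well's absorption into a dispensed volume (µL) using the factor."""
    return [a / factor * 1000.0 for a in absorbances]


# Hypothetical plate: 96 wells, absorbance 0.50 each, plate 19.2 g heavier after filling
abs_values = [0.50] * 96
factor = absorption_factor(abs_values, 19.2, water_density_g_per_mL=1.0)
volumes = well_volumes_uL(abs_values, factor)
print(round(factor, 3), round(volumes[0], 1))    # 2.5 200.0
```

Deviations of individual wells from the mean volume then indicate needle- or column-specific pipetting errors.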

3. Results

The aim of the study was to demonstrate the functionality of the application and its capability to remotely monitor parallel cultivations, detect failures, and guide the operator through the experiment. To this end, 24 microbial cultivations were carried out with eight different feeding strategies in triplicate. In addition, several failures were deliberately provoked to test the performance of the program under critical conditions.


3.1. Microbial Cultivation

As the model system, we used the previously constructed recombinant strain E. coli BW25113 (pAG032), which expresses three different fluorescent proteins under the individual control of three different promoters […]. The batch phase of the 24 parallel cultivations lasted 2.6 h ± 2.04 min. The feed for the first nine reactors was started at 3.2 h of cultivation, after the consumption of acetic acid. The feed for all other reactors was started at 4 h. Different feed rates were set for each reactor triplicate, ranging from 20% to 90% of the maximum specific growth rate (see Table 1). Due to the pulse-based nature of the feeding, DOT oscillations started together with the feed additions. Cultivation data are shown in Figure 2 and are available in Supplementary Table S1. Depending on the feed rate, the reactor volumes increased differently over time. Reactors 1–3 (with the highest feed rate of 90% of µmax) reached the critical volume after 5.5 h, and reactors 10–12 (60% of µmax) after 8.8 h. Reactors with a low feed rate never reached critical volume levels. The DOT of reactor 17 decreased between 8.7 h and 9.8 h; however, this was due to a technical issue, which was resolved during the running cultivation. When a critical volume level was reached, the DOT dropped to zero and the glucose consumption rate decreased.


Figure 2. Cultivation data of the experiment. The rows represent groups of replicates with three mini-bioreactors (MBRs) each. From the top to the bottom row, the applied feed decreases in steps of 10 percentage points from 90% to 20% of µmax. (a) Solid lines: DOT (%); dashed lines: pH (-); (b) dots: biomass (g L−1); stars: glucose (g L−1); diamonds: acetic acid (g L−1); (c) dots: RFP (RFU × 10³) (inducible XylS/Pm promoter); stars: YFP (RFU × 10⁵) (rpsJ constitutive promoter); diamonds: CFP (RFU × 10⁴) (σ32-related constitutive promoter).

In all cultivations, a decrease in CFP activity (σ32-related promoter) was observed during the batch phase. Furthermore, for low and moderate feed rates, the CFP activity was at the same level and remained constant during the feeding phase. For higher feed rates (60–90% of µmax), the CFP activity was lower during the feeding phase and increased with the onset of oxygen limitation. The specific CFP signal increased during the batch phase and decreased during the feed phase at higher feed rates. In cultivations with lower feed rates (20–40% of µmax), the YFP signal (indicator of ribosomes per cell) remained constant during the feed phase. Between 3 and 5 h of cultivation, some YFP measurements were marked invalid because the detection limit of the plate reader was exceeded; later samples were analyzed at a higher dilution. For the two highest feed rates, nearly no specific increase in RFP (inducible product) was observed. The highest specific RFP activity was reached with a µset of 0.22 h−1 (30% of µmax).

3.2. User Interface

The main features of the program developed here are (i) operator support through a visual compression of the large number of bioreactors and variables that need to be supervised, (ii) secure and reliable remote access via the server–client framework, and (iii) an automated event trigger and fault detection tool. Additionally, a user-friendly interface was developed to demonstrate the added value of the tool and to allow it to be tested in real experiments with experienced operators.

The central program developed in MATLAB covers all server–client connections, data management and analysis, and offers a graphical user interface. The user may choose from different plots commonly used in PCA, such as score, scree, contribution, and loading plots. Input variables for data analysis can be varied to explore different aspects. To monitor the cultivation, the application separates the data into a full horizon and a moving window approach (see Figure 3).

Processes2020,8,xFORPEERREVIEW9of18



Figure3.Ascreenshotofthegraphicaluserinterfaceoftheapplication.Thedatashownisfromthe

describedexperimentat3:19h.TheleftblueshowsplotsforthemovinghorizonPCAmodel,whereas

therightbluerectangledepictsthesameplotsforafullhorizonPCAmodel.

3.3.MovingandFullHorizonSetup

Inthemovinghorizonsetup,thewindow’stimeframewassetto20min,adurationempirically

determinedbasedonexperienceandtrialswithhistoricaldata.Thus,theinputvariablesforthe

slidingwindowPCAarethesetpointsforpipettingvolumes(base+feed)andtheonline

measurements(pH+DOT).

Analysisoftheloadingvectorsinthefullhorizonsetupshowedthatthevariablescumulated

glucosefeedandbiomasscorrelatepositivelyandarestronglypronouncedonthefirstprincipal

component.Asthefeedwassetdifferentlyforeachgroupofreplicates,thisfindingissound.

However,from03:00honwards,atrendcanbeobservedonthesecondPCwherethescoresforthe

reactorsofthereplicateshavemonotonousdecreasingvaluesonthey‐axes(seeFigure4).The

posteriorianalysisofthepipettingsystemshowedthatthefeedingswereindeedfollowingthistrend.

Figure 3. A screenshot of the graphical user interface of the application. The data shown are from the described experiment at 3:19 h. The left blue rectangle shows plots for the moving horizon PCA model, whereas the right blue rectangle depicts the same plots for a full horizon PCA model.

3.3. Moving and Full Horizon Setup

In the moving horizon setup, the window’s timeframe was set to 20 min, a duration empirically

determined based on experience and trials with historical data. Thus, the input variables for the sliding

window PCA are the set points for pipetting volumes (base +feed) and the online measurements

(pH +DOT).

Analysis of the loading vectors in the full horizon setup showed that the variables cumulative glucose feed and biomass correlate positively and load strongly on the first principal component. As the feed was set differently for each group of replicates, this finding is expected. However, from 03:00 h onwards, a trend can be observed on the second PC, where, within each group of replicates, the scores of the reactors take monotonically decreasing values along the y-axis (see Figure 4). A posteriori analysis of the pipetting system showed that the feedings did indeed follow this trend.
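To illustrate how a loading analysis singles out positively correlated variables, the sketch below fits an autoscaled PCA to end-point features and lists the variables with large loadings on a chosen PC. The feature set, the `dominant_variables` helper, and the threshold of 0.5 are hypothetical choices. The synthetic data mimic eight triplicate groups that differ only in their feed rate, with biomass assumed proportional to the cumulative feed.

```python
import numpy as np


def pca_loadings(X, n_components=2):
    """Loadings and explained-variance fractions of an autoscaled PCA."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    return Vt[:n_components].T, explained[:n_components]


def dominant_variables(loadings, names, pc=0, threshold=0.5):
    """(name, loading) pairs whose absolute loading on component `pc` exceeds threshold."""
    return [(name, float(w)) for name, w in zip(names, loadings[:, pc]) if abs(w) > threshold]


# Synthetic example: 8 groups of 3 replicates, feed rate set per group.
feed_rate = np.repeat(np.arange(1.0, 9.0), 3)        # 24 reactors
cum_feed = feed_rate * 180.0                          # cumulative feed after 3 h (arbitrary units)
biomass = 0.5 * cum_feed                              # assumption: biomass proportional to feed
ph = 7.0 + 0.05 * np.tile([-1.0, 0.0, 1.0], 8)       # varies only within groups
dot = 40.0 + 5.0 * np.tile([1.0, -2.0, 1.0], 8)       # varies only within groups

names = ["cum_feed", "biomass", "pH", "DOT"]
X = np.column_stack([cum_feed, biomass, ph, dot])
loadings, explained = pca_loadings(X)
top = dominant_variables(loadings, names, pc=0)
print(top, explained.round(2))
```

Because the sign of a principal component is arbitrary, the useful information is that both loadings on PC1 share the same sign (positive correlation) and are large in magnitude, not their absolute sign.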

Processes2020,8,xFORPEERREVIEW10of18

(a)(b)

Figure4.FullhorizonPCAapproachatdifferenttimes.(a)Scoreplotforthefullhorizonprincipal

componentanalysis(PCA)modelattimepointt=30min.(b)ScoreplotforthefullhorizonPCA

modelatt=10:22h(entirecultivation).Theeightgroupsofreplicatesareindicatedbycolorandthe

reactorsarenumberedconsecutively.ThevarianceexplainedbythePCisindicatedinpercentin

parenthesis.

3.4.EventMonitoringBasedonPCA

Duringthisstudy,theprogramcontinuouslyobservedthecultivations,updatingitsdataevery

10min.Severalincidentsobservedduringthecultivationweredetectedproperlybytheprogram.

Wediscussthreeoftheseevents:(1)stirrerfailureinonebioreactor,causedbyproblemsinthe

magneticsystem,(2)overfillingofabioreactor,causedbydeactivationofthevolumecontrol,and(3)

disturbanceofairsupplyinabioreactor,causedby,e.g.,dropletsintheinlet.

3.4.1.StirringFailure

ThemovinghorizonPCAmodelwithelevenpHandDOTmeasurementsrevealedatleastthree

reactorsbehavingdifferentlyafter20minofbatchphase.Reactors3,8,and20wereidentifiedas

outliersbytheautomatedprogram(seeFigure5b).DOTwasidentifiedasthecausalvariablefor

reactor3,whilepHwasidentifiedthecausalvariableforreactor20.ThefirstandsecondPCexplain

44.4%and43.1%variance,respectively.Intheloadingplotstheorthogonalrelationoftheinput

variablepHandDOTisclearlyvisible,henceallowingtotracebackthedeviationinthescoreplotto

therawmeasurements(Figure5c).WhilethevariablefromtheDOTtrajectoryimpactsthesecond

PCalmostexclusively,thepHtrajectoryhasanimpactonthefirstPC.Correspondingdeviations

werealsoobservedintheon‐linemeasurements.Forreactors3and8lowerDOTvalueswere

measuredduringthefirst10minofthecultivation(Figure5a).Inbothreactorsthiswascausedbya

technicalissue.Themagneticstirrerofreactor8didnotstartproperly(thisissuewasdetectedbythe

operatorandsolvedpromptly).Reactor20islocatedthefurthestawayfromthecenterpointin

respecttothefirstPC,indicatingalowerpHbutusualDOT.Thisfindingissupportedbythe

physiologicalstateofthereactor.

Figure 4.

Full horizon PCA approach at different times. (a) Score plot for the full horizon principal component analysis (PCA) model at time point t = 30 min. (b) Score plot for the full horizon PCA model at t = 10:22 h (entire cultivation). The eight groups of replicates are indicated by color and the reactors are numbered consecutively. The variance explained by each PC is given in percent in parentheses.

3.4. Event Monitoring Based on PCA

During this study, the program continuously monitored the cultivations, updating its data every 10 min. Several incidents that occurred during the cultivation were correctly detected by the program. We discuss three of these events: (1) stirrer failure in one bioreactor, caused by problems in the magnetic system; (2) overfilling of a bioreactor, caused by deactivation of the volume control; and (3) disturbance of the air supply in a bioreactor, caused, e.g., by droplets in the inlet.
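The periodic re-evaluation described above could be organized as a simple loop that rebuilds the window matrix every 10 min, refits the PCA, and flags reactors with unusually large scores. The section does not state the outlier criterion used by the program, so the sketch below assumes Hotelling's $T^2$ on the first two PCs with a fixed, user-chosen threshold. The names `monitor` and `get_window` and the threshold value are hypothetical.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Scores of an autoscaled PCA computed via SVD."""
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def hotelling_t2(scores):
    """Hotelling's T^2 per row: squared scores scaled by each component's variance."""
    return np.sum(scores**2 / scores.var(axis=0, ddof=1), axis=1)


def monitor(get_window, update_times, threshold):
    """Refit the window PCA at each update time and collect flagged row indices."""
    alarms = {}
    for t_end in update_times:
        t2 = hotelling_t2(pca_scores(get_window(t_end)))
        alarms[t_end] = np.flatnonzero(t2 > threshold).tolist()
    return alarms


# Synthetic example: 24 reactors x 22 window columns; row 7 deviates from 40 min on.
rng = np.random.default_rng(1)
base = rng.normal(scale=0.1, size=(24, 22))


def get_window(t_end):
    """Hypothetical data source returning the window matrix ending at t_end (min)."""
    X = base + rng.normal(scale=0.1, size=base.shape)
    if t_end >= 40:
        X[7] -= 5.0  # hypothetical disturbance, e.g. a stirrer or aeration fault
    return X


alarms = monitor(get_window, update_times=range(20, 61, 10), threshold=15.0)
print(alarms)
```

A fixed threshold keeps the sketch short; a statistically derived control limit for $T^2$ would be a natural refinement.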

3.4.1. Stirring Failure

The moving horizon PCA model with eleven pH and DOT measurements revealed at least three reactors behaving differently after 20 min of batch phase. Reactors 3, 8, and 20 were identified as outliers by the automated program (see Figure 5b). DOT was identified as the causal variable for reactor 3, whereas pH was identified as the causal variable for reactor 20. The first and second PCs explain 44.4% and 43.1% of the variance, respectively. In the loading plots, the orthogonal relation between the input variables pH and DOT is clearly visible, which makes it possible to trace a deviation in the score plot back to the raw measurements (Figure 5c). While the DOT trajectory affects the second PC almost exclusively, the pH trajectory affects the first PC. Corresponding deviations were also observed in the online measurements. For reactors 3 and 8, lower DOT values were measured during the first 10 min of the cultivation (Figure 5a). In both reactors, this was caused by a technical issue. The magnetic stirrer of reactor 8 did not start properly (this issue was detected by the operator and solved promptly). Reactor 20 lies furthest from the center point with respect to the first PC, indicating a lower pH but normal DOT. This finding is supported by the physiological state of the reactor.
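The attribution of a deviation to DOT or pH can be made quantitative with variable contributions. A common choice, assumed here because the section does not state which contribution measure the program uses, splits each reactor's Hotelling $T^2$ into per-column terms $c_{ij} = \sum_k (t_{ik}/\lambda_k)\, p_{jk}\, z_{ij}$, where $t$ are scores, $p$ loadings, $z$ autoscaled data and $\lambda_k$ the score variances, and then sums them per input variable. The block layout and the function name `t2_contributions` are hypothetical.

```python
import numpy as np


def t2_contributions(X, blocks, n_components=2):
    """Split Hotelling's T^2 of every row into contributions per variable block.

    X:      raw window matrix, autoscaled internally; shape (n_reactors, n_columns).
    blocks: dict mapping a variable name to the column indices of that variable.
    Returns (t2, contrib) where contrib[name] has shape (n_reactors,).
    """
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0
    Z = (X - X.mean(axis=0)) / std
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T              # loadings, (n_columns, k)
    T = Z @ P                            # scores, (n_reactors, k)
    lam = T.var(axis=0, ddof=1)          # variance of each score column
    # Per-column contribution c_ij = sum_k (t_ik / lambda_k) * p_jk * z_ij;
    # summing c_ij over all columns j gives exactly T^2 of row i.
    C = ((T / lam) @ P.T) * Z
    t2 = np.sum(T**2 / lam, axis=1)
    contrib = {name: C[:, cols].sum(axis=1) for name, cols in blocks.items()}
    return t2, contrib


# Synthetic example: 11 pH and 11 DOT samples per reactor in the window.
rng = np.random.default_rng(2)
X = np.hstack([7.0 + 0.02 * rng.normal(size=(24, 11)),    # pH samples
               40.0 + 1.0 * rng.normal(size=(24, 11))])   # DOT samples
X[2, 11:] -= 10.0  # hypothetical DOT drop in row 2 (reactor 3), e.g. a stirrer failure
blocks = {"pH": np.arange(0, 11), "DOT": np.arange(11, 22)}
t2, contrib = t2_contributions(X, blocks)
print(round(float(t2[2]), 1), {k: round(float(v[2]), 1) for k, v in contrib.items()})
```

Because the per-column terms add up to $T^2$, the block sums show directly which input variable drives a flagged reactor away from the main cluster.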


WhileallthreeMBRwerefedthesamevolumeofglucose,theaddedvolumeofbasediffered.

Thetotalvolumeaddeddecreasesinreverseorderofthereactors(3–1).Duetothemissingvolume

control,thiscausedthereactorstoexceedtheiruppervolumelimit,causingablockageoftheaeration

system.Theat‐lineanalysisoftheglucoseandacetatemediaconcentrationshowedadrasticincrease

ofglucoseandslightincreaseofacetate(Figure2).

(a)(b)(c)

Figure6.Reactoroverfill.(a)TrajectoriesoftheDOTprofilefrom05:00–05:40hwithhighlighted

reactors(1,2,3;highestfeedrate).Thepatternofpeakscorrespondstothepulse‐basedfeeding.(b,c)

ScoreplotofthefirsttwoPC.TheslidingwindowPCAmodelwasbuiltwithdatafrom05:00–05:20

h(b)and05:20–05:40h(c)intothecultivation,respectively.ThevarianceexplainedbythePCis

indicatedinpercentinparenthesis.

3.4.3.BlockageoftheAerationSystem

Figure 5.

Detection of stirrer failures. (a) Trajectories for DOT and pH for all 24 reactors in the first hour of the cultivation. (b) Score plot for the sliding window PCA model with DOT and pH trajectories as input variables. The timeframe is $t_0 = 0$ to $t_{\mathrm{end}} = 20$ min. Reactors 8 and 20 are distinctly separated from the main cluster. The variance explained by each PC is given in percent in parentheses. (c) Loadings of the first two principal components for the same PCA model.

3.4.2. Reactor Overfill

At 05:40 h, the program flagged reactor 3 as an outlier, with pH identified as the contributing variable. The score plot of the sliding window PCA at 05:40 h showed that reactors 2 and 3 had separated from their cluster of replicates (Figure 6c). Compared to the score plot at 05:20 h (Figure 6b), these two reactors were the only ones that did not move uniformly in one direction. Instead, the scores of reactors 2 and 3 shifted from the first to the fourth quadrant. The first two PCs together explain more than 90% of the variance. The weights of the loading vector of the second component show a negative correlation of pH and base addition with DOT (not shown). The DOT trajectories of these reactors did not feature the expected oscillating pattern, indicating that the cultivations had stopped responding to the pulse-based feeding (Figure 6a).
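The visual comparison of the two score plots can also be expressed numerically. The sketch below assumes that the scores of both windows are already expressed with consistent PC signs (separately fitted PCA models may flip the sign of a component, which would have to be aligned first). It flags reactors whose displacement points away from the median displacement of all reactors and reports their quadrant before and after. The helper names, the cosine threshold of 0.5, and the synthetic scores are hypothetical.

```python
import numpy as np


def quadrant(score):
    """Quadrant (1-4) of a point in the PC1/PC2 score plane (zero counts as positive)."""
    x, y = score[0], score[1]
    if x >= 0:
        return 1 if y >= 0 else 4
    return 2 if y >= 0 else 3


def nonuniform_movers(scores_before, scores_after, min_cos=0.5):
    """Row indices whose score displacement deviates from the median displacement.

    Rows that did not move at all get a cosine of 0 and are reported as well.
    """
    d = np.asarray(scores_after, float) - np.asarray(scores_before, float)
    ref = np.median(d, axis=0)  # typical (common) movement of all reactors
    norms = np.linalg.norm(d, axis=1) * np.linalg.norm(ref)
    cos = (d @ ref) / np.where(norms == 0, 1.0, norms)
    return np.flatnonzero(cos < min_cos)


# Synthetic scores for 24 reactors; rows 1 and 2 stand for reactors 2 and 3.
rng = np.random.default_rng(3)
before = rng.normal(size=(24, 2))
before[1], before[2] = (1.0, 1.0), (1.2, 0.8)      # both start in quadrant 1
after = before + np.array([0.5, 0.3])               # common drift of all reactors
after[1:3] = before[1:3] + np.array([0.3, -2.5])    # hypothetical deviating movement

flagged = nonuniform_movers(before, after)
for i in flagged:
    print(f"row {i}: quadrant {quadrant(before[i])} -> {quadrant(after[i])}")
```

Using the median displacement as the reference makes the common drift robust against the few reactors that deviate.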