Deliverable D2.2
Source Evidence Extractor – 1
Editor(s): Verena Geist, Stefan Schöberl
Responsible Partner: Software Competence Center Hagenberg GmbH (SCCH)
Status-Version: Final – 1.0
Date: 31.10.2024
Type: OTHER
Distribution level: PU
D2.2 - Source Evidence Extractor – 1 Version 1.0 – Final. Date: 31.10.2024
© EMERALD Consortium Contract No. GA 101120688 Page 2 of 30
www.emerald-he.eu
Project Number: 101120688
Project Title: EMERALD
Title of Deliverable: D2.2 – Source Evidence Extractor – 1
Due Date of Delivery to the EC: 31.10.2024
Workpackage responsible for the Deliverable: WP2 – Methodology for knowledge extraction
Editor(s): Verena Geist, Stefan Schöberl (SCCH)
Contributor(s): Florian Wendland (FHG)
Reviewer(s): Ramon Martin De Pozuelo Genis (CXB), Juncal Alonso (TECNALIA), Cristina Martinez (TECNALIA)
Approved by: All Partners
Recommended/mandatory readers: WP1, WP2, WP3, WP4, and WP5
Abstract: This deliverable presents tools and techniques for evidence extraction from source code that can be integrated with the certification graph. It is the result of work performed in Task 2.2. This document is a first/interim version; the final version on source evidence extractors will be reported in D2.3.
Keyword List: Knowledge extraction, source code files, technical evidence, static code analysis, Codyze, eknows.
Licensing information: This work is licensed under Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0 DEED https://creativecommons.org/licenses/by-sa/4.0/)
Disclaimer: Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union. The European Union cannot be held responsible for them.
Document Description
Version | Date | Modifications Introduced (Modification Reason) | Modified by
0.1 | 14.08.2024 | First draft version, ToC | Verena Geist, Stefan Schöberl (SCCH), Florian Wendland (FHG)
0.2 | 22.08.2024 | Executive summary | Verena Geist (SCCH)
0.3 | 26.08.2024 | References, introduction, embedding into the EMERALD architecture, acronyms | Verena Geist (SCCH)
0.4 | 04.09.2024 | Functional description of eknows | Verena Geist (SCCH)
0.5 | 25.09.2024 | Descriptions of Codyze | Florian Wendland (FHG)
0.6 | 01.10.2024 | Technical description of eknows | Stefan Schöberl (SCCH)
0.7 | 07.10.2024 | Conclusion, finalizing the document for the internal review | Verena Geist (SCCH)
0.8 | 17.10.2024 | Internal review | Ramon Martin De Pozuelo Genis (CXB)
0.9 | 18.10.2024 | Revision to internal review | Verena Geist, Stefan Schöberl (SCCH)
0.10 | 28.10.2024 | Final reviewed version | Juncal Alonso, Cristina Martinez (TECNALIA)
1.0 | 31.10.2024 | Submitted to the European Commission | Juncal Alonso, Cristina Martínez (TECNALIA)
Table of contents
Terms and abbreviations
Executive Summary
1 Introduction
1.1 About this deliverable
1.2 Document structure
2 Source evidence extractors in the EMERALD architecture
3 Codyze for EMERALD
3.1 Functional description
3.2 Technical description
3.2.1 Prototype architecture
3.2.2 Technical specifications
3.3 Delivery and usage
3.3.1 Package information
3.3.2 Installation
3.3.3 Instructions for use
3.3.4 Licensing information
3.3.5 Download
3.4 Limitations and future work
4 eknows evidence extractor
4.1 Functional description
4.2 Technical description
4.2.1 Prototype architecture
4.2.2 Technical specifications
4.3 Delivery and usage
4.3.1 Package information
4.3.2 Installation
4.3.3 Instructions for use
4.3.4 Licensing information
4.3.5 Download
4.4 Limitations and future work
5 Conclusions
6 References
Appendix A: eknows Binary Usage Software License
List of tables
Table 1. Requirement CODYZE.01 - Extraction of security features from source code
Table 2. Supported programming languages by Codyze through the CPG library
Table 3. Overview of Codyze package structure
Table 4. EKNOWS.01 - Integration into existing systems
Table 5. EKNOWS.02 - Resilience while analysing erroneous code
Table 6. EKNOWS.03 - Multi-language support
Table 7. EKNOWS.04 - Support EMERALD evidence format
Table 8. EKNOWS.05 - Static code analysis
Table 9. Supported programming languages by eknows
Table 10. Overview and description of package structure for the eknows evidence extractor
List of figures
Figure 1. EMERALD component overview diagram [9]. The red rectangle highlights the source evidence extraction components, which are described in this deliverable.
Figure 2. Architecture of Codyze for EMERALD highlighting its modules and contributions within the EMERALD project (i.e., modules with dashed boxes are external)
Figure 3. Configuration of evidence collectors in the EMERALD UI (D4.3 [14])
Figure 4. Reverse engineering activities supported by the software platform eknows [15]. Further explanations of subcomponents are provided in Section 4.2.1.1.
Figure 5. Overview of eknows platform building blocks [16]
Terms and abbreviations
AI | Artificial Intelligence
AI-SEC | AI Security Evidence Collector
AMOE | Assessment and Management of Organisational Evidence
ANTLR | ANother Tool for Language Recognition
API | Application Programming Interface
AST | Abstract Syntax Tree
ASTM | Abstract Syntax Tree Metamodel
BSI C5 | BSI Cloud Computing Compliance Criteria Catalogue
CertGraph | Certification Graph
CaaS | Container-as-a-Service
CDT | C/C++ Development Tooling
CI/CD | Continuous Integration / Continuous Deployment
CIL | Common Intermediate Language
CLI | Command-Line Interface
COBOL | Common Business-Oriented Language
CoCo/R | Compiler Generator
Codyze | Static Code Analyzer from FHG
CPG | Code Property Graph
CSA or EU CSA | EU Cybersecurity Act
CSP | Cloud Service Provider
DoA | Description of Action
DOT | Markup-Language
DSL | Domain-Specific Language
DSM | Domain-Specific Model
EC | European Commission
eknows | The platform for software analysis from SCCH
eknows core | Selected modules of the eknows platform, which form the basis of the eknows evidence extractor
eknows evidence extractor | The extractor component developed in the context of EMERALD
ENISA | European Union Agency for Cybersecurity
EUCS | EU Cloud Certification Scheme
GA | Grant Agreement of the project
GASTM | Generic AST Metamodel
HTML | Hypertext Markup Language
JAR | Java Archive
JCL | Job Control Language
JDT | Java Development Tools
JNA | Java Native Access
JSON | JavaScript Object Notation
Koopa | (COBOL) Parser Generator
KPI | Key Performance Indicator
KR | Key Result
MD | Markdown
MEDINA | Predecessor project of EMERALD
ODF | Open Document Format
OMG | Object Management Group
PL/I | Programming Language One
PL/SQL | Procedural Language/Structured Query Language
PT | Parse Tree
REST | Representational State Transfer
SARIF | Static Analysis Results Interchange Format
SASTM | Specialized AST Metamodel
SE | Standard Edition
SVG | Scalable Vector Graphics
SW | Software
TLS | Transport Layer Security
TRL | Technology Readiness Level
WP | Work Package
Executive Summary
This deliverable presents the initial design, architecture, and implementation state of the source evidence extractors of WP2, i.e., Codyze and eknows evidence extractor. They contribute to the key result KR1-EXTRACT of EMERALD, a framework to continuously extract knowledge from different layers of a cloud service and prepare suitable evidence based on them.
EMERALD follows a knowledge graph-based approach to provide a unified view of the cloud service under certification at different layers of the service, ranging from the infrastructure layer (e.g., virtual resources), to the business layer (e.g., policies and procedures), to the implementation layer (e.g., source code files) and data layer (e.g., increasingly used AI models) in cloud applications. The source evidence extractors, developed in Task 2.2 and described in this deliverable, aim at identifying critical security-related functionality such as data encryption, transport encryption, or authentication in source code. Other related deliverables in WP2, all due at project month 12 (October 2024), provide functional and technical details on further evidence extractors from different sources, i.e., D2.4 [1] on evidence extraction from policy documents in Task 2.3, D2.6 [2] on security and privacy preserving evidence extraction in Task 2.4, and D2.8 [3] on runtime data extraction in Task 2.5. All these details contributed to D2.1 [4] on the overall information model of the certification graph in Task 2.1.
This document starts by illustrating how the source evidence extractors fit into the overall EMERALD architecture. The main part provides functional and technical descriptions of the two extractor components Codyze and eknows evidence extractor, including their purpose and scope, the (current and planned) coverage of the EMERALD requirements, the components' internal architecture and their subcomponents. These descriptions are complemented by information on delivery and usage, as well as on limitations and future work. Finally, the document concludes with a short summary.
The source evidence extractors described in this deliverable contribute to KR1-EXTRACT by providing next-generation evidence gathering tools and techniques based on a knowledge graph approach. The presented extractors currently have the initial prototypes implemented and ready to be (to some degree) integrated with other components of the EMERALD architecture. Some requirements of the components are already fully or partially satisfied by the presented prototypes.
Based on the work described in this deliverable, the source evidence extractors will be further extended and integrated into the EMERALD framework. This is the first iteration of the deliverable coming from Task 2.2. The second and final version of this deliverable with the updated extractors will be delivered with D2.3 [5] in project month 24 (October 2025). Evidence will be prepared according to the integrated, graph-based model of semantically linked and combined evidence, provided in D2.10 (interim version) [6] in project month 15 (January 2025) and D2.11 (final version) [7] in project month 27 (January 2026). The extracted evidence will be stored and assessed, i.e., to verify the implementation of security metrics, in the scope of WP3.
1 Introduction
EMERALD aims to provide a next generation set of evidence gathering tools and techniques based on a knowledge graph approach. KR1-EXTRACT supports an improved and unified tool-supported approach to continuously extract knowledge from different layers of a cloud service, e.g., infrastructure, platform, runtime information, policy documents, software, and AI models.
The objective of WP2 is to establish a unified view of the cloud service under certification by extracting and enriching knowledge of the different layers of the service and providing suitable evidence for security metrics. A major part of this work package is research and design of multiple tools and techniques to extract knowledge out of various sources. A graph-based model, called the certification graph (CertGraph), serves as a common structure that is filled by all evidence extraction tools.
1.1 About this deliverable
The goal of this deliverable is to present the design and implementation of the EMERALD evidence extractors that extract knowledge from source code. This is a report on the initial prototypes, reflecting an early stage of implementation and integration of these extractors, and is the first of two iterations of deliverables resulting from Task 2.2.
Evidence on the source code level is primarily gathered by the source evidence extractors Codyze and eknows evidence extractor, which are adapted to support the CertGraph data model. Codyze, originally launched in MEDINA¹, focuses on generating evidence for security-related findings, such as the existence of encryption or proper authentication. In EMERALD, it should be advanced to TRL 7 and improved to verify that functionality is implemented according to state-of-the-art security guidelines and standards. To supplement evidence extraction from source code, the software analysis platform eknows is integrated as basis for the eknows evidence extractor. eknows offers language-independent analyses and could be extended to identify security-enforcing business rules in code and verify correct usage of security-related APIs. A compact overview of both source code extractors in the form of Components Cards can be found in D1.3 [8].
Furthermore, information from project configurations and deployment files such as infrastructure-as-code might be used to further augment the CertGraph data model. For now, please note that these are ideas of what technical evidence could be gathered from source code. The next deliverable, D2.3 [5], will describe how evidence extraction was implemented in detail according to the agreed security scheme(s). Also note that the integration of Codyze and eknows is not planned. The EMERALD framework works with two different source evidence extractors, i.e., one security metric may work well with Codyze, and another with the eknows evidence extractor. However, the resulting evidence format must be the same. This shows the use of APIs in the framework and emphasises that the framework is not tied to a specific tool.
All extracted information together provides a system-level view of the cloud service, identifying exposed functionality and interactions with other cloud services. Along these interfaces additional evidence can be gathered specifically for security requirements on service interactions such as transport encryption or authentication. The functionalities are then classified, annotated, and linked with other extracted evidence information from different layers of the cloud service (i.e., infrastructure, policy documents, and artificial intelligence (AI) models) in the EMERALD CertGraph [6] [7].
¹ https://medina-project.eu/
Moreover, Codyze produces reports in SARIF. These reports are included in the evidence for the EMERALD framework.
Finally, Codyze for EMERALD requires a Java runtime compatible with Java SE 17.
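As a rough illustration of how a SARIF report could be consumed downstream, the following Python sketch flattens the findings of a report into simple records. It assumes only the standard SARIF 2.1.0 layout (`runs[].results[]` with `ruleId`, `level`, `message.text`, and `locations[].physicalLocation`). The sample report, the tool name, and the rule ID are invented for illustration and are not taken from Codyze.

```python
def summarize_sarif(report: dict) -> list:
    """Flatten the results of a SARIF 2.1.0 report into simple records."""
    findings = []
    for run in report.get("runs", []):
        tool = run.get("tool", {}).get("driver", {}).get("name", "unknown")
        for result in run.get("results", []):
            # Only the first reported location is used in this sketch.
            locations = result.get("locations") or [{}]
            physical = locations[0].get("physicalLocation", {})
            findings.append({
                "tool": tool,
                "rule": result.get("ruleId"),
                # SARIF treats a missing level as "warning".
                "level": result.get("level", "warning"),
                "message": result.get("message", {}).get("text", ""),
                "file": physical.get("artifactLocation", {}).get("uri"),
                "line": physical.get("region", {}).get("startLine"),
            })
    return findings


# Invented sample report: tool name, rule ID, and file are hypothetical.
SAMPLE = {
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "example-analyzer"}},
        "results": [{
            "ruleId": "EXAMPLE-TLS-001",
            "level": "error",
            "message": {"text": "TLS version below 1.2 is configured."},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {"uri": "src/Client.java"},
                    "region": {"startLine": 42},
                }
            }],
        }],
    }],
}

if __name__ == "__main__":
    for finding in summarize_sarif(SAMPLE):
        print(finding["level"], finding["rule"], finding["file"], finding["line"])
```

A report stored on disk could be loaded with `json.load` and passed to `summarize_sarif` in the same way.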
3.3 Delivery and usage
The following subsections detail the delivery and usage of Codyze for EMERALD. The provided information is currently work in progress and may change.
3.3.1 Package information
Codyze for EMERALD is delivered in the form of two packages. First, Codyze is released as an archive containing all necessary files. The structure of this package is summarized in Table 3. Second, Codyze is distributed as a container image. The container image contains the extracted archive of the first package and configures it to be used as an application within a container. Therefore, the installation folder matches the overview of Table 3.
Table 3. Overview of Codyze package structure
Folder / File | Description
bin/ | Contains execution scripts for Windows and Linux/macOS (POSIX-compliant shells).
docs/ | Contains detailed documentation texts.
etc/ | Contains sample configuration files.
lib/ | Contains application and dependent libraries.
specs/ | Contains specification files in Codyze's DSL.
LICENSE | License text (Apache License, Version 2.0).
README.md | Short documentation including short summary description, installation and usage instructions, and further information.
3.3.2 Installation
Installation instructions are provided as part of the README with Codyze for EMERALD¹⁸.
In summary, Codyze has the following pre-installation requirements:
• Java SE 17 JDK¹⁷.
• Source code of Codyze for EMERALD (see Section 3.3.5).
The following steps are required to build it:
1. In the folder with the source code run:
./gradlew build
2. The built application can be found as archives at:
codyze-cli/build/distribution/
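Before starting the build, it can help to confirm that the Java SE 17 prerequisite is met. The following Python sketch is one possible way to do so: it parses the banner printed by `java -version`, which uses the form `version "17.0.9"` on current JVMs and the legacy form `version "1.8.0_292"` on Java 8 and older. The helper names are hypothetical and not part of Codyze.

```python
import re
import subprocess
from typing import Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Legacy scheme: "1.8.0_292" denotes Java 8.
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major() -> Optional[int]:
    """Run `java -version`; return None if no Java is on the PATH."""
    try:
        proc = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return None
    # The JVM usually prints its version banner to stderr.
    return parse_java_major(proc.stderr + proc.stdout)


if __name__ == "__main__":
    major = installed_java_major()
    if major is not None and major >= 17:
        print(f"Java {major} found, build prerequisite met")
    else:
        print("Java SE 17 or newer is required to build")
```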
3.3.3 Instructions for use
Instructions for use are provided as part of the released Codyze for EMERALD package (cf. folder 'docs/' in Table 3) and are included in Codyze's public GitLab repository¹⁸.
¹⁷ https://adoptium.net/de/temurin/releases/?version=17&package=jdk
¹⁸ https://git.code.tecnalia.com/emerald/public/components/codyze
Figure 3 illustrates how evidence extractors will be used for setting up certification targets in the EMERALD UI. Please refer to D4.3 [14] for further visualizations of the EMERALD UI.
Figure 3. Configuration of evidence collectors in the EMERALD UI (D4.3 [14])
3.3.4 Licensing information
Codyze for EMERALD and its subcomponents are licensed as open source under Apache License, Version 2.0. In addition, it is ensured that third-party dependencies are compatible with the Apache License, Version 2.0. In particular, the main Codyze dependency for source code analysis, the CPG, is also licensed as open source under Apache License, Version 2.0.
3.3.5 Download
Codyze for EMERALD is available from the public EMERALD GitLab repository¹⁸ hosted by TECNALIA. The repository will host the source code, the documentation and the binary artefacts consisting of a container image and a release archive.
3.4 Limitations and future work
Codyze for EMERALD is based on Codyze for MEDINA and aims to reach TRL 6-7. Changes in the EMERALD framework compared to the MEDINA framework require adjustments to Codyze. Primarily, Codyze for MEDINA reported both evidence and assessment results. In contrast, the EMERALD framework uses a common taxonomy of evidence to create a Certification Graph (CertGraph) that coalesces all information from gathered evidence. As a result, Codyze for EMERALD can report only evidence classified according to the taxonomy and ontology of the CertGraph. The assessment of the information is delegated to the assessment component within EMERALD. These changes require a change in the evidence collection and classification within Codyze and are part of the ongoing work in EMERALD WP2, Task 2.2.
4 eknows evidence extractor
eknows¹⁹ [15] [16] is a software analysis platform developed by SCCH. The platform supports rapid development of multi-language software analysis tools from pre-built modules, which build on a technology-agnostic, generic layer and are extended to meet use case-specific requirements.
4.1 Functional description
Overall purpose. To efficiently create knowledge extraction techniques that deliver the evidence required to verify whether application source code complies with security requirements, we rely on the multi-language software platform eknows. This platform quickly and flexibly supports the creation of evidence extraction functions by reusing prefabricated parsing, analysis, and generation modules. eknows provides support for the main reverse engineering activities, i.e., knowledge extraction, transformation, analysis, and generation (visualization), as depicted in Figure 4. The cornerstone of the platform implementation is a generic, programming language-independent representation of source code that can be reused across analysis and generation modules to prepare suitable evidence.
Figure 4. Reverse engineering activities supported by the software platform eknows [15]. Further explanations of subcomponents are provided in Section 4.2.1.1.
Context and scope. While the development of the platform was driven by domain-specific requirements from various stakeholders, e.g., business analysts or software architects, and multi-technology use cases, an architecture that supports reuse of modules for the analysis of software and generation of artefacts from different programming languages was envisaged from the beginning.
The delivered prototype builds upon selected modules of the platform (eknows core) and provides a "wrapper" containing the newly developed functions for EMERALD to extract security-related evidence from application source code (eknows evidence extractor). Thereby, generic modules for the model-guided symbolic execution for use case-specific conformity checks and fact extraction are extended. Also, generic modules for business rule extraction for domain-specific rule and constraint localization will be extended. New analyses will be necessary to break down high-level security controls from catalogues, such as EUCS or BSI C5, into checkable source code properties. New generation functions to create evidence based on these source code properties and to integrate them into CertGraph will be provided. Which concrete modules are needed from eknows core, and which new parts will be added, is not yet clear at the time of writing. It depends on which security controls are derived from the security catalogues, and which programming languages are used in the cloud application code. Details on this scope will be described in the next deliverable D2.3 [5].
¹⁹ https://www.scch.at/software-science/projekte/detail/eknows
Motivation. As noted in Section 1.1, Codyze and eknows will not be technically integrated, as both can coexist in the EMERALD CaaS framework. The motivation to include eknows in the EMERALD framework is, on the one hand, to demonstrate that the framework is open for extension and not tied to a specific tool. On the other hand, the coverage of security controls should be increased by including eknows as an additional source extractor in the EMERALD framework to be able to check more comprehensively whether the available source code conforms to the selected security controls.
Requirements. The relevant requirements from D1.3 [12] with their respective implementation state (partially / fully / not implemented) and a brief description of how they are / will be implemented are given in Table 4 to Table 8.
Table 4. EKNOWS.01 - Integration into existing systems
Field | Description
Requirement ID | EKNOWS.01
Short title | Integration into existing systems
Description | The component should be integrable into existing systems, development environments and workflows, for example by using APIs like REST or by compatibility with CI/CD pipelines.
Status | Accepted
Priority | Must
Component | eknows
Source | Component
Type | Technical
Related KR | KR1_EXTRACT
Related KPI | KPI 1.1
Validation acceptance criteria | The availability of the API will be tested via an OpenAPI client.
Progress | Partially implemented – 30%
Milestone | MS3: Integrated audit suite V1 (M18)
The prototype currently offers a command line interface, which can be integrated in a flexible way.
Table 5. EKNOWS.02 - Resilience while analysing erroneous code
Field | Description
Requirement ID | EKNOWS.02
Short title | Resilience while analysing erroneous code
Description | The source code analysed by the component could be erroneous, for example syntactical and semantical errors could be encountered while parsing it. Furthermore, an unknown dialect of a language could be encountered. An appropriate error handling strategy for such situations is necessary: Erroneous code will be skipped and not be further analysed. A corresponding error message will be stored in the gathered evidence.
Status | Accepted
Priority | Should
Component | eknows
Source | Component
Type | Technical
Related KR | KR1_EXTRACT
Related KPI | KPI 1.1
Validation acceptance criteria | The component will receive erroneous source code. Processing should run through, and a corresponding error message should be found in the generated evidence.
Progress | Partially implemented – 70%
Milestone | MS5: Components V2 (M24)
Some error cases are shown to the user.
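The skip-and-record strategy of EKNOWS.02 can be sketched in a few lines of Python. The sketch uses Python's built-in `ast` parser as a stand-in for a language frontend. The shape of the evidence record (`analysed`, `errors`) is a hypothetical simplification, not the EMERALD evidence format and not how eknows implements it.

```python
import ast


def analyse_sources(sources: dict) -> dict:
    """Parse each source text; skip erroneous ones and record why."""
    evidence = {"analysed": [], "errors": []}
    for name, code in sources.items():
        try:
            tree = ast.parse(code, filename=name)
        except SyntaxError as exc:
            # Erroneous code is skipped; the message is kept in the evidence.
            evidence["errors"].append(
                {"file": name, "line": exc.lineno, "message": exc.msg}
            )
            continue
        # Placeholder "analysis": list the functions defined in the file.
        functions = [
            node.name
            for node in ast.walk(tree)
            if isinstance(node, ast.FunctionDef)
        ]
        evidence["analysed"].append({"file": name, "functions": functions})
    return evidence


if __name__ == "__main__":
    result = analyse_sources({
        "ok.py": "def f():\n    return 1\n",
        "bad.py": "def g(:\n",
    })
    print(result)
```

Processing runs through even though `bad.py` cannot be parsed, which mirrors the validation criterion of the requirement.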
Table 6. EKNOWS.03 - Multi-language support
Field | Description
Requirement ID | EKNOWS.03
Short title | Multi-language support
Description | The component should be able to analyse source code written in different programming languages and should support at least Java and Python.
Status | Accepted
Priority | Must
Component | eknows
Source | Component
Type | Technical
Related KR | KR1_EXTRACT
Related KPI | KPI 1.1
Validation acceptance criteria | The component will receive source files written in Java and Python and should be able to process each language and generate an output.
Progress | Partially implemented – 50%
Milestone | MS5: Components V2 (M24)
Currently, the Java frontend is used.
Table 7. EKNOWS.04 - Support EMERALD evidence format
Field | Description
Requirement ID | EKNOWS.04
Short title | Support EMERALD evidence format
Description | The analysis results are offered in a structured and standardized format, the EMERALD evidence format (see data model in [9]). This enables further processing and queries in other components.
Status | Accepted
Priority | Must
Component | eknows
Source | Component
Type | Technical
Related KR | KR1_EXTRACT
Related KPI | KPI 1.1
Validation acceptance criteria | The component receives source code to analyse and generates an output from it. This output will be validated against the schema of the evidence format.
Progress | Not implemented – 0%
Milestone | MS3: Integrated audit suite V1 (M18)
Simple JSON output is generated so far.
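The validation step named in the acceptance criteria could look roughly like the following Python sketch, which checks a JSON record for required fields and types. The field names in `REQUIRED_FIELDS` are purely hypothetical placeholders; the actual EMERALD evidence schema is defined in the referenced data model and is not reproduced here.

```python
import json

# Hypothetical required fields; the real evidence schema is defined elsewhere.
REQUIRED_FIELDS = {"id": str, "timestamp": str, "tool_id": str, "resource": dict}


def validate_evidence(raw: str) -> list:
    """Return a list of problems; an empty list means the record passed."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["top-level value must be an object"]
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(
                f"wrong type for {field}: expected {expected.__name__}"
            )
    return problems


if __name__ == "__main__":
    sample = json.dumps({
        "id": "e-1",
        "timestamp": "2024-10-31T00:00:00Z",
        "tool_id": "demo-extractor",
        "resource": {},
    })
    print(validate_evidence(sample) or "record passed")
```

In practice a full JSON Schema validator would be the more likely choice once the schema of the evidence format is fixed; the hand-written check only illustrates the idea.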
Table 8. EKNOWS.05 - Static code analysis
Field | Description
Requirement ID | EKNOWS.05
Short title | Static code analysis
Description | The component uses static code analysis methods. Such methods are, for example, data flow analysis, call graph analysis, symbolic execution, or control flow analysis. One or multiple methods (possibly in combination) will be used to gather evidence. The actual used method(s) depend(s) on the metric for which evidence should be extracted.
Status | Accepted
Priority | Must
Component | eknows
Source | Component
Type | Technical
Related KR | KR1_EXTRACT
Related KPI | KPI 1.1
Validation acceptance criteria | Code review: Review code and check if static code analysis methods are used/implemented.
Progress | Partially implemented – 60%
Milestone | MS5: Components V2 (M24)
Currently, symbolic execution is used.
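To make one of the listed methods concrete, the following Python sketch performs a very small call graph analysis using the standard `ast` module. It is a toy illustration of the technique only: it handles direct calls by simple name from top-level functions and is not related to the eknows implementation. The sample code and its function names are invented.

```python
import ast
from collections import defaultdict


def build_call_graph(code: str) -> dict:
    """Map each top-level function to the names it calls directly."""
    tree = ast.parse(code)
    graph = defaultdict(set)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            graph[node.name]  # ensure functions without calls appear too
            for inner in ast.walk(node):
                # Only calls by plain name are tracked (no methods/attributes).
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    graph[node.name].add(inner.func.id)
    return {caller: sorted(callees) for caller, callees in graph.items()}


# Invented sample code for illustration.
SAMPLE = '''
def encrypt(data):
    return data[::-1]

def store(data):
    payload = encrypt(data)
    write(payload)

def write(payload):
    print(payload)
'''

if __name__ == "__main__":
    for caller, callees in build_call_graph(SAMPLE).items():
        print(caller, "->", callees)
```

A metric such as "data is encrypted before it is written" could then be approximated by checking that `store` calls `encrypt`; a real analysis would additionally need data flow information.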
Innovation. Beyond the innovation described in Section 3.1 (providing compliance by design through source code analysis in cloud applications), eknows contributes a multi-language software platform that enables rapid development of evidence extraction techniques from pre-built analysis and generation modules for the certification of cloud applications. Two further aspects are notable. First, the platform follows the extract-abstract-view metaphor [17], which can be considered a reference architecture for generating suitable evidence for security metrics in EMERALD. Second, it uses a standards-based approach, i.e., the abstract syntax tree metamodel (ASTM)²⁰ of the Object Management Group (OMG), as the generic representation of the parsed source code.
²⁰ https://www.omg.org/spec/ASTM/1.0/About-ASTM
4.2 Technical description
4.2.1 Prototype architecture
Figure 5 shows an overview (not fully complete) of the modules that make up the eknows platform. At the bottom of the figure, various frontends using specific parsers for extracting knowledge from different programming languages, such as Java, C#, and PL/SQL, are depicted. The generic representation of the parsed source code based on the ASTM forms the basis for a set of analyses, such as Call Graph, Control Flow, and Dependency. Various generation modules for visualizing application code as reports or diagrams form the top of the platform.
As already mentioned, only a selected subset of these building blocks is reused / extended in the delivered prototype and additional functions will be developed. The prototype architecture will be improved and detailed as the concrete security controls and metrics become available, which will be reported in the next version of this deliverable, namely D2.3 [5].
Figure 5. Overview of eknows platform building blocks [16]
4.2.1.1 Subcomponen s desc ip ion
The eknows platform provides pre-built modules that facilitate 1) language parsing and
transformation of code into a generic abstract syntax tree (AST), 2) structural and behavioural
analysis of software, and 3) reporting and visualization of analysis results (see Figure 5). Software
solutions built on top of eknows integrate required modules as-is and add functionality required
for specific use cases.
Extraction modules, also referred to as language frontends, parse information from software
systems (i.e., source code and comments) and transform parse trees into ASTs and generic data
structures used in analysis. eknows currently supports over 14 programming languages. To
provide robust and up-to-date parsing infrastructure, eknows reuses freely available parsing
components as far as possible. For instance, to parse Java or C++, Eclipse JDT and Eclipse CDT
are used. If no ready-to-use source code parser is available, parser generators (i.e., ANTLR and
CoCo/R) are used to generate parsing infrastructure from context-free grammar specifications.
In the delivered prototype we will most likely rely on the Java and Python frontends.
To reuse analysis components across different technology stacks, eknows builds upon the ASTM.
The standard provided by the OMG is used as common intermediate representation and is
composed of the generic AST metamodel (GASTM) and a set of complementary, language-
specific specifications, called the specialized AST metamodels (SASTM). To reuse analysis
components as far as possible, language-dependent models are kept to a minimum. In the
delivered prototype we will reuse and extend existing analyses, such as the symbolic evaluation,
and will develop new analyses depending on the selected security controls and metrics, e.g., the
TLS analysis.
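The benefit of a common intermediate representation can be illustrated with a small sketch. It is not the eknows implementation and does not reproduce the OMG GASTM class model: the `Node` class, the node kinds, and the `find_tls_versions` function are hypothetical names chosen for illustration, and the check itself (collecting string literals that name a TLS/SSL protocol version) is an assumed, deliberately naive stand-in for a real TLS analysis. The point is only that one analysis can run unchanged on trees produced by different language frontends.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """Hypothetical, heavily simplified generic AST node (not the GASTM model)."""
    kind: str                       # e.g. "Call", "Name", "StringLiteral"
    value: str = ""                 # identifier or literal text
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants in pre-order."""
    yield node
    for child in node.children:
        yield from walk(child)


# Protocol names the sketch looks for; an assumption, not a list from eknows.
TLS_PROTOCOL_NAMES = {"SSLv3", "TLSv1", "TLSv1.1", "TLSv1.2", "TLSv1.3"}


def find_tls_versions(root: Node) -> List[str]:
    """Language-independent analysis: collect TLS version names used as string literals."""
    found = {
        n.value
        for n in walk(root)
        if n.kind == "StringLiteral" and n.value in TLS_PROTOCOL_NAMES
    }
    return sorted(found)


# Tree a Java frontend might produce for: SSLContext.getInstance("TLSv1.2")
java_tree = Node("Call", "getInstance",
                 [Node("Name", "SSLContext"), Node("StringLiteral", "TLSv1.2")])

# Tree a second, hypothetical frontend might produce for a comparable call
other_tree = Node("Call", "create_context", [Node("StringLiteral", "TLSv1.3")])

for tree in (java_tree, other_tree):
    print(find_tls_versions(tree))
```

Because `find_tls_versions` only depends on the generic node kinds, supporting an additional language in this sketch would only require another frontend that emits `Node` trees.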
Finally, eknows provides a set of visualization and reporting components. Generated elements
fall into the categories text, mathematical formulas, tables, charts, or graphs. Input to these
modules are ASTM data structures or specific result data structures created by analysis modules.
Graphs are output in an intermediate format, e.g., Graphviz DOT. To generate documentation, a
format-independent data structure is used to specify document structure and content (e.g.,
sections, paragraphs, formulas, figures, or tables). Document specifications can be output as
LaTeX, HTML, Markdown, and Open Document Format (ODF). For reporting evidence in the
delivered prototype, we will refine the existing functionality to generate the results in the
defined EMERALD format.
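The idea of a format-independent document specification can be sketched as follows. The classes `Section` and `Paragraph` and the two renderer functions are hypothetical and only cover headings and paragraphs; the actual eknows data structure and its LaTeX and ODF back ends are not described in this deliverable. The sketch shows that the same document tree can be passed to several output back ends without change.

```python
from dataclasses import dataclass
from html import escape
from typing import List, Union


@dataclass
class Paragraph:
    """Hypothetical leaf element holding plain text."""
    text: str


@dataclass
class Section:
    """Hypothetical container element; sections may be nested."""
    title: str
    body: List[Union["Section", Paragraph]]


def to_markdown(section: Section, level: int = 1) -> str:
    """Render the document tree as Markdown; nesting depth sets the heading level."""
    parts = [f"{'#' * level} {section.title}"]
    for item in section.body:
        if isinstance(item, Paragraph):
            parts.append(item.text)
        else:
            parts.append(to_markdown(item, level + 1))
    return "\n\n".join(parts)


def to_html(section: Section, level: int = 1) -> str:
    """Render the same tree as HTML, escaping text content."""
    parts = [f"<h{level}>{escape(section.title)}</h{level}>"]
    for item in section.body:
        if isinstance(item, Paragraph):
            parts.append(f"<p>{escape(item.text)}</p>")
        else:
            parts.append(to_html(item, level + 1))
    return "\n".join(parts)


# Example content is invented for illustration only.
doc = Section("Findings", [
    Paragraph("TLS 1.2 detected."),
    Section("Details", [Paragraph("File: A.java")]),
])

print(to_markdown(doc))
print(to_html(doc))
```

Keeping the renderers separate from the document tree means a further output format would be added as one more function over the same structure.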
4.2.2 Technical specifications
The Java-based platform (eknows core) comprises over 350K source lines of code (SLOC). It uses
libraries as dependencies to implement its functionality. For source code parsing, eknows uses
language-specific parsers (see Table 9) traditionally built with compiler-generator tools. A
central point is the generic representation of the parsed content. The dependencies are
managed using Maven²¹.
Table 9. Supported programming languages by eknows
Language                 | Parser               | Supported version
-------------------------|----------------------|------------------
Adele                    | CoCo/R²²             | n.a.
B&R (Bernecker + Rainer) | CoCo/R               | n.a.
C, C++                   | Eclipse CDT²³        | C++17
C#                       | ANTLR²⁴              | C# 7.0
CIL/.NET                 | n.a.                 | .NET 4.5
COBOL                    | Koopa²⁵              | COBOL 85
Codesys/Bachmann         | Xtext²⁶              | 3
Fortran                  | Open Fortran²⁷       | 77/2003
JCL                      | CoCo/R               | JCL z/OS 2.2
Java                     | Eclipse JDT²⁸        | Java 8
Javascript               | Mozilla Rhino²⁹      | ES 6
Natural                  | CoCo/R               | 4.2.6
Oberon                   | CoCo/R               | n.a.
Pascal                   | CoCo/R               | Pascal 7.0
PL/I                     | CoCo/R               | z/OS 4.1
PL/SQL                   | JDeveloper/Akiban³⁰  | 9.1
Python                   | ANTLR                | Python 3.6
Sigmatek                 | ANTLR                | n.a.
SQL                      | Akiban               | MySQL 5.6
²¹ https://maven.apache.org/
²² https://ssw.jku.at/Research/Projects/Coco
²³ https://projects.eclipse.org/projects/tools.cdt
²⁴ https://www.antlr.org/
²⁵ https://github.com/krisds/koopa
²⁶ https://eclipse.dev/Xtext/
²⁷ https://github.com/OpenFortranProject/open-fortran-parser
²⁸ https://www.eclipse.org/jdt
²⁹ https://github.com/mozilla/rhino
The delivered prototype, the eknows evidence extractor, is based on eknows core and reuses
modules as far as possible.
The eknows evidence extractor consists of an executable binary distribution and runs stand-
alone. Like Codyze, the eknows evidence extractor is executed on source code of cloud
applications and services. The integration of the eknows evidence extractor at the CSPs requires
a platform for CI/CD. The eknows evidence extractor can be integrated into CI/CD pipelines by
using the binary distribution. In the current state it is not yet integrated with other components.
Currently, a CLI is provided, and a REST interface can be provided in the future, if needed.
Findings are generated as console output. This output will be submitted to the Evidence Store of
the EMERALD framework in the specified data format following the terms defined in the
CertGraph ontology.
4.3 Delivery and usage
The following subsections detail the delivery and usage of eknows for EMERALD. The provided
information is currently work in progress and may change in the future.
4.3.1 Package information
The eknows evidence extractor is developed as a Java application with the support of Maven³¹
as build tool. Table 10 shows the structure of the GitLab repository³² and its contents.
Table 10. Overview and description of package structure of the eknows evidence extractor
Folder     | Description
-----------|------------------------------------------------------------------
eknows/    | The prebuilt eknows binaries should be placed here. In addition, installation scripts are provided in this folder.
src/       | Source code root.
src/main/  | Source code of the eknows evidence extractor.
src/test/  | Source code of unit tests of the eknows evidence extractor.
testfiles/ | Test files, which are used for unit tests and can be used for demo purposes as well.
LICENSE    | License text (Apache License, Version 2.0).
README.md  | Compact guide on how to build and use the extractor.
4.3.2 Installation
Requirements:
• Java 17 JDK³³.
• Maven³¹.
• Source code of the eknows evidence extractor (see Section 4.3.5).
³⁰ https://github.com/brunoribeiro/sql-parser
³¹ https://maven.apache.org/download.cgi
³² https://git.code.tecnalia.com/emerald/public/components/eknows
³³ https://adoptium.net/de/temurin/releases/?version=17&package=jdk
The following steps are required to build the eknows evidence extractor:
1. Install eknows core for EMERALD (this step is only required when building for the first
time or when eknows core for EMERALD is updated).
   a. Get the eknows-core-for-emerald JAR file from the internal repository (see
   Section 4.3.5).
   b. Add the JAR file to the eknows folder.
   c. Run install.cmd or ./install within the eknows folder.
2. Build.
   a. Run mvn package -DskipTests.
3. The built extractor can be found at
target/eknows-evidence-extractor-<version>-jar-with-dependencies.jar.
4.3.3 Instructions for use
For now, the eknows evidence extractor can be run from the command line by invoking:
java -jar target/eknows-evidence-extractor-<version>-jar-with-dependencies.jar -f <file to analyse>
This will generate a JSON output, which contains the analysed file name and, in the current
development status, the detected TLS versions.
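As the extractor is intended for use in CI/CD pipelines, a pipeline step could post-process this JSON output. The following sketch makes several assumptions that are not defined in this deliverable: the field names `file` and `tlsVersions`, the spelling of the version strings, and the policy that only TLS 1.2 and TLS 1.3 are acceptable. The actual output schema should be taken from the README of the repository.

```python
import json
from typing import Iterable, List

# Assumed policy for illustration; not a requirement stated by EMERALD.
ALLOWED_TLS = ("TLSv1.2", "TLSv1.3")


def weak_tls_findings(raw_json: str, allowed: Iterable[str] = ALLOWED_TLS) -> List[str]:
    """Return the detected TLS versions that are not in the allowed set.

    The keys "file" and "tlsVersions" are hypothetical; adapt them to the
    schema the extractor actually emits.
    """
    report = json.loads(raw_json)
    allowed_set = set(allowed)
    return [v for v in report.get("tlsVersions", []) if v not in allowed_set]


# Invented sample output, used only to exercise the function.
sample = '{"file": "Client.java", "tlsVersions": ["TLSv1.1", "TLSv1.3"]}'

weak = weak_tls_findings(sample)
if weak:
    print(f"Weak TLS versions detected: {', '.join(weak)}")
```

In a pipeline, a non-empty result could be turned into a failing exit code so that the build stops when an outdated protocol version is detected.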
Figure 3 illustrates how evidence extractors will be used for setting up certification targets in the
EMERALD UI. Please refer to D4.3 [14] for further visualizations of the EMERALD UI.
4.3.4 Licensing information
The licensing is split into two parts:
1. The eknows evidence extractor, which is developed in the context of EMERALD, is
licensed under Apache 2.0³⁴ and will be made available to the public as open-source
software.
2. The foundation of the extractor, eknows core, is closed source and binaries are made
available to the EMERALD project consortium within the context of the project under
the eknows Binary Usage Software License (see Appendix A: eknows Binary Usage
Software License).
4.3.5 Download
The eknows evidence extractor is available from the public EMERALD GitLab repository³⁵ hosted
by Tecnalia. The repository is going to host the source code and the documentation.
The binaries of eknows core are available to the EMERALD project consortium in a separate
private EMERALD GitLab repository³⁶.
4.4 Limitations and future work
Currently, the extractor just supports the analysis of source code written in Java and at least one
additional programming language will be added in the future, however the eknows platform
³⁴ http://www.apache.org/licenses/LICENSE-2.0
³⁵ https://git.code.tecnalia.com/emerald/public/components/eknows
³⁶ https://git.code.tecnalia.com/emerald/private/components/eknows/eknows-core-for-emerald
[internal use only - authentication required]