
Devops for Optimizing Database Management: Practice Implementation, Challenges and Comparison with the Context-Based Approach

Author: CSEIJ
Publisher: Zenodo
DOI: 10.5281/zenodo.17718688
Source: https://zenodo.org/records/17718688/files/15125cseij33.pdf
Computer Science & Engineering: An International Journal (CSEIJ), Vol 15, No 1, February 2025
DOI:10.5121/cseij.2025.15133
DESIGN AND IMPLEMENTATION OF THE
MOREHEAD-AZALEA COMPILER (MAC)
Dalton Hensley and Heba Elgazzar
Department of Engineering Sciences, Morehead State University, Morehead, KY, USA
ABSTRACT
Within the realm of computer science exists the ever-ubiquitous programming language compiler. The
role of the programming language compiler is that of a translation device which, among other things,
must correctly and efficiently translate its source language to any number of target languages. For
example, the C compiler is designed to convert its source code into executable binaries that are
backed by a myriad of so-called instruction set architectures, such as x86-64. In any case, this
translation process is said to be opaque to the compiler’s end-user, allowing for the automation of an
end-to-end pipeline from source program to target language. Our goal, then, is to design and develop
such a pipeline for the novel “Azalea” programming language.
KEYWORDS
Azalea, Compiler, Transpiler, Pipeline, Transformation, Static Analysis
1. INTRODUCTION
The significance of compilers within computer science cannot be overstated, as their essential
utility lies in the automation of translating messy human desires (via programming languages)
into plain instructions (via binary formats). Offloading this error-prone and tedious work to
compilers has served humankind well, as the compiler has been but one step in a long line of
improvements to programmer productivity. The first generation of programming languages existed in
the absence of compilers as we know them today. In fact, this period was most closely associated
with, and relied almost exclusively on, the physical devices that ran the computer programs [1].
Engineers at the time were required to have an intimate and near-perfect understanding of their
computing machines, often directly loading hand-prepared binary programs into memory,
which were later executed by the onboard central processing unit [1]. As we move closer to the
present day, one notices a particular trend: the preference towards abstraction.
Abstraction can be roughly defined as the partial stripping or total removal of unnecessary details,
facts, and complexities of a thing while not altogether deleting its identifying characteristics. This
is useful not only in everyday life, where humans can only absorb so much information at a time,
but also for computers. This realization led to the development of the second generation of
programming languages. The second generation was most commonly linked with the innovation
of assembly languages [1]. Their ingenious creation is most often attributed to both Andrew
Booth and Kathleen Britten in their seminal work, “Coding for A.R.C.” First published in 1947,
Booth and Britten’s work describes the “Automatic Relay Computer,” whose purpose was to
offer a more straightforward interface (A.R.C. assembly) and provide the automatic translation
from its general source to a more specified target [2]. Their work cemented the beginnings of
what is now known as the assembler.
However, the assembler (and assembly languages more generally) had its disadvantages.
Programs written in assembly often had to have their source code completely rewritten as
new computer hardware advances outpaced software’s comparatively slow development.
Simply put, assemblers still suffered from portability problems, as their assembly languages were
directly tied to the ever-changing instruction set architectures; new machines meant outdated
code! An additional layer of abstraction over assemblers, then, was required to afford generality
over a multitude of assembly languages.
During this era, programmers successfully materialized the third generation of programming
languages, which played a vital role in advancing programming language theory. More
importantly, a single entity spearheaded this paradigm shift towards automatic code generation
[1]. This entity was (and still is) known as the compiler. To be clear, compilers can come in many
different flavors and adaptations; however, for purposes relevant to this work, we shall consider
the static ahead-of-time (AOT) variety as opposed to just-in-time (JIT).
For a point of comparison, the C programming language is most commonly backed by an AOT
compiler. This has numerous advantages and disadvantages, but like any AOT compiler, its
central feature is static compilation. We defer the technical details of static compilation
to later in this paper. However, it is reasonable to briefly mention that the term
“static” contrasts with “dynamic.”
Now, one arrives at the focus of this paper: the Morehead-Azalea Compiler (MAC). The goal of
MAC is not all that different from that of other contemporary compilers such
as Haskell’s Glasgow Haskell Compiler (GHC) or Rust’s rustc [3]. That is to say that the primary
mission of MAC is the complete static analysis of Azalea programs (via a type system and other
auxiliary systems) such that users can be confident that their programs will not crumble
under their feet because of segmentation faults. Another critical motivation for MAC is the
automatic transformation of Azalea code into a safe subset of C. C notoriously has a “hands-off”
philosophy when it comes to providing guard rails and safety features (e.g. bounds checking), so
having a means by which one can write safe Azalea code with the performance characteristics of
C is highly desirable.
1.1. Goals of MAC
• Design and implement MAC as an ahead-of-time compiler. This means that MAC will never
need to interface directly with the runtime of the end-user’s Azalea program.
• Similarly, ensure that user faults and bugs are caught early rather than deferred to the program’s
runtime.
• Construct and utilize an invaluable error reporting system within the compiler to allow the
user clear visibility over mistakes in their code.
• Once all viably possible errors are rooted out of an Azalea program, proceed to transpile
the source code to its equivalent in the C programming language. At this point, a backend of
a C compiler will further bring down the code to the opaque level of an executable binary.
• Offer a simple-to-use integrated development environment (IDE) to allow for rapid iteration
over user projects.
• Serialize the Azalea abstract syntax tree (AST) out to disk to be graphed and displayed on
the user’s screen.
2. RELATED WORK
Given the enormous utility of compiler engineering, one would expect an equally large corpus of
scientific literature to draw upon. By all accounts, this assessment is accurate. The origins of
compiler engineering can be traced back to Grace Hopper, who coined the term, although
Hopper’s use of the word in 1951 and its colloquial use have somewhat diverged [4]. Hopper’s
A-0 System was much closer to what one might call a “linker” or
“loader,” as the system did not make use of the hallmark components of transformation and
analysis, which are utilized in modern compilers. It was not until 1952, at the University of
Manchester, that Alick Glennie co-opted the word “compiler” to refer to his
Autocode program, which compiled programs for the Manchester Mark 1 [4].
More germane to this body of work, however, is the latter half of the third generation of
programming languages, as MAC has more in common with C than it does with either Autocode
or the A-0 System. This is to say that Azalea mostly mimics the internal code transformation and
analysis pipeline of C, though with some slight differences. While MAC most certainly draws
from its predecessors, one would be remiss to neglect making comparisons with its
contemporaries. In order of relevance, these languages are Rust, Haskell, JAI, and TypeScript.
2.1. Rust’s Type Systems
Rust, much like the languages that came before it, understood the value in embedding types
within the user’s programs. While it is certainly the case that almost every third-generation
language implements its type system a little differently, one can say that the use of types within
programming languages has greatly enhanced local and global reasoning
about code. This is to say that despite the added complexity that comes with learning a type system’s
rules, one gets back the advantage of having erroneous programs excluded from the universal set
of possible programs. This is in contrast with raw assembly which, notably, neither uses nor
differentiates between data types. Integers, floating-point numbers, and characters are represented as
bytes and manipulated through instructions. This means that one could make an unintentionally
perilous mistake when using, say, indirect addressing with arguments that are not addresses! A
carefully planned type system, much like Rust’s, allows for the clear delineation between value
and reference types [5]. Rust’s types are backed by the type system, which is further backed by
type checker and inference subsystems. Azalea is related to Rust in that it shares a similar design
philosophy of separating type checking and inference into two separate modules, allowing for
a more cohesive implementation of its type system. Additionally, both Rust and Azalea share the
notion of so-called “algebraic data types,” allowing for greater expressibility through data
structures. Figure 1 shows the type system view under Rust and Azalea.
2.2. Haskell’s Error Reporting System
Both Haskell and Azalea share the belief that concrete and actionable error messages are vital to
the usability of any programming language. To this end, significant work has been done to
borrow the user-interface design philosophy from Haskell. One of the core sources of user errors,
at least by some estimation, is an incoherence problem when using types. This is to say that both
Haskell and Azalea are extremely strict on the placement and use of types. In Haskell, if a user
declares that a function takes two integers, then its supplied arguments should also be of type
integer. Simon Peyton Jones, who is known for his contributions to the Haskell compiler (GHC),
helped in the pivotal evolution of the quality of GHC’s error messages. Specifically, in
“Diagnosing Haskell Error Messages,” Peyton Jones et al. worked to improve the sometimes
esoteric diagnostics generated by using a complex type system [6]. Because of their work,
Haskell enjoys not only the safety that comes with types but also the clarity that comes with
expressive error reports. Azalea can be compared to Haskell in this regard, as MAC tries to catch
any erroneous user input at every stage of its compilation pipeline.
Figure 1. Type System View under Rust and Azalea
2.3. JAI’s Syntax
One critical goal of Azalea is offering a user-friendly syntax that minimizes errors through
consistency and expressibility.
By consistency, at least for the purposes of Azalea, one means having a syntax that repeatedly
uses the same (or similar) syntactical structures through a sizable proportion of the grammar that
defines it. Type qualification in Azalea, at least where required, sets the predictable syntactical
expectation that the type comes after the qualifier, rather than before. This particular bit of
syntax is what the internals of MAC refer to as “declaration-based.” Having said this, it is wise to
mention that Azalea’s syntax borrows heavily from JAI, which is a language written by the
prominent video game developer Jonathan Blow [7]. Global constants, functions, structures, and
enumerations are all defined using a unified syntax, with the only considerable difference
between them being the use of the keywords that tell MAC how to parse them appropriately.
By expressibility, one refers to a sizable reduction in the mental overhead one must take on in
writing correct programs. Assembly runs counter to expressibility, as it may take many lines of
code to write a program that prints “hello world” to the screen. Naturally, then, expressibility can
be thought of as a spectrum that correlates with a program’s level of abstraction. JAI is a much
higher-level language than its contemporary (C++), so it can express the same user intent with a
fraction of the required lines of code.
2.4. TypeScript’s Transpiler
When writing a compiler, one must have a concrete plan for implementing the code generation
module. Code generation gives the compiler its prime functionality, as users are typically only
concerned with the end product when running their compilers. Contemporary programming
languages have gone about this design decision in a few different ways. Languages like C and
C++, for the most part, opt to target some specified instruction set architecture via a
direct-to-machine-code implementation. In contrast, others find it convenient to convert to some
intermediate representation (or intermediate language) such as bytecode. Rust is an excellent
example, as its source is translated into quite a number of intermediate representations, which
include “high-level IR,” “mid-level IR,” and “LLVM IR.” This is all to emphasize the fact that
transformation is a pivotal part of the overall compilation process.
The Morehead-Azalea compiler’s code generation module was heavily inspired by the TypeScript
transpiler, formally known as tsc. An overview of the tsc transpiler is shown in Figure 2.
TypeScript, from a programming language design perspective, is rather interesting. Its utility is
predicated on the existence of JavaScript. One of the core frustrations of JavaScript is its type
safety or, rather, its lack of a statically checked type system. Many runtime errors in
JavaScript are, unfortunately, possible. Microsoft, who created TypeScript, recognized this
immense shortcoming and saw fit to design a JavaScript superset, including type inference and
checking [8]. Designing a language around an existing implementation, especially in TypeScript’s case,
is highly valuable. Not only does it mean that one can build new features on top of an already
existing language, but it also means that one gets to inherit most (if not all) of the underlying
functionality of the target language. There is also a case to be made that transpilers, such as
TypeScript’s, can allow developers to be much more productive, as they are empowered
by the enhancements made in the superset language.
Figure 2. Overview of the tsc Transpiler
3. SOFTWARE REQUIREMENT SPECIFICATIONS
The Morehead-Azalea compiler has a few core components. However, this section splits up this
topic along four different axes: Azalea’s implementation language and toolchain, supported
platforms, language libraries, and the compiler’s internal modules.

3.1. Azalea’s Implementation Language
It was decided relatively early in the design process that the Rust programming language would
be used to implement the Morehead-Azalea compiler. This decision was, admittedly, a
nonobvious choice (as C or C++ is typically used for systems programming projects).
Rust was chosen because it supports several modern programming features that would
contribute towards the rapid development of Azalea. These features include derive macros,
pattern matching, and a robust type system. The Rust compiler, then, is a hard requirement for
anyone wishing to run their Azalea code, as the Morehead-Azalea compiler is “propped up” by the
Rust compiler (rustc). It is also worth mentioning the broader Rust ecosystem via the end-to-end
build tool known as Cargo. Cargo is also vital to the Morehead-Azalea compiler, as it allows us
to build our Azalea compiler from Rust source code! Eventually, the Azalea project might
become mature enough to allow MAC to be a self-hosting compiler, but that is left as a future
goal.
Another essential but separate fact is that Rust is developed under the open-source model and is
likewise readily available to the public. More important, though, is the notion that Rust is a
highly portable language, which allows us to serve most platforms and architectures.
3.2. Supported Platforms
• The Windows 10 operating system is supported, though a few caveats apply to getting the
Morehead-Azalea compiler running under this platform. Like C or C++, Rust requires the
C standard library to operate correctly. Hence, Windows 10 users must have MSVC and its
associated build tools.
• Similarly, macOS is supported. Users are expected to have the “Xcode Command Line
Tools” package in order to invoke Rust via the command line.
• Finally, a Linux binary of the Morehead-Azalea compiler is also available and has a similar
list of dependencies. However, all that is needed on Linux is the build-essential package
and the Rust compiler.
3.3. Language Libraries
There are a few notable language libraries that are rigid requirements for this project. Firstly,
consider the libraries that Rust uses. The build process for MAC assumes that users have access to
the Rust standard library and the C standard library. Both libraries further simplify the
development process of MAC, since they spare us the burden of having to
“reinvent the wheel.” However, more specific to the topic of writing the compiler is the Ariadne
library. Ariadne is a library built for the sole purpose of creating modern and production-quality
error messages. These error messages are routinely served to the user upon making a fatal
error somewhere along the MAC pipeline. For example, a user may make a type error in their
Azalea code; naturally, it is desired that MAC displays an appropriately formatted error report on
the user’s screen to inform them of the error.
The MAC project also uses a tiny amount of Python code to create an “integrated development
environment” (IDE). This form of application provides users with an easy-to-use and readily
understandable interface between the non-trivial command line interface and the programming
itself. The benefits of having an IDE for one’s language are relatively straightforward to
enumerate. Since users typically only care about writing and running their programs, having to
write complex and verbose commands is usually viewed as a negative for the overall user
experience. Hence, Azalea’s IDE can reduce this mental burden by offering what is effectively a
notepad with buttons.
The specific Python library that enabled the swift development of the Azalea IDE was the PyQt5
framework. This library, much like the IDE itself, is both simple to install and use since it is
essentially a high-level wrapper around the C++ version. Regardless, Azalea users are required to
install PyQt5 and Python (version 3.10) if they wish to take advantage of the Azalea IDE.
3.4. Azalea Compiler Pipeline
The Azalea pipeline forms the basis of the project as a whole. Without it, the compiler would be
nonfunctional. This is because the pipeline takes an Azalea source file as input and then
propagates this file along the various stages of compilation. One can think of the pipeline as an
organized assembly line whereby transformations and analysis happen in a specific and
well-regulated order. The fundamental idea, then, begins when a user finishes writing their first
Azalea program. After this pivotal moment, the user will attempt to transpile their program
directly to C using our compiler. In the interim, many steps need to happen between program
writing and execution. These required steps include preprocessing, scanning, parsing, semantic
analysis, code generation, and code execution. Again, the pipeline process asserts that a stage can
only be initiated when its dependency stages have concluded, thereby reducing any trivial
opportunities for parallelism and concurrency within MAC’s implementation.
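The strict ordering described above can be sketched in a few lines of Rust. This is only an illustration under stated assumptions: the `Stage` enum, the `PIPELINE` constant, and the `run_stage` callback are hypothetical names introduced here, not MAC's actual internals. The sketch shows one property, namely that a stage runs only after every earlier stage has succeeded.

```rust
/// The six stages named in the text, in pipeline order (names are assumed).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Stage {
    Preprocess,
    Scan,
    Parse,
    Analyze,
    Generate,
    Execute,
}

const PIPELINE: [Stage; 6] = [
    Stage::Preprocess,
    Stage::Scan,
    Stage::Parse,
    Stage::Analyze,
    Stage::Generate,
    Stage::Execute,
];

/// Run every stage strictly in order. `run_stage` is a hypothetical callback
/// that returns `true` on success. The first failing stage stops the pipeline
/// and is returned as the error; otherwise the completed stages are returned.
fn run_pipeline<F>(mut run_stage: F) -> Result<Vec<Stage>, Stage>
where
    F: FnMut(Stage) -> bool,
{
    let mut completed = Vec::new();
    for stage in PIPELINE.iter().copied() {
        if !run_stage(stage) {
            return Err(stage);
        }
        completed.push(stage);
    }
    Ok(completed)
}
```

Because each stage consumes the output of the one before it, a sequential loop is the natural shape here, which is also why the design leaves little room for parallelism.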
4. PROPOSED DESIGNS, METHODS AND ALGORITHMS
As previously discussed, a compiler can assume many different forms. Some compilers, such
as Rust’s and C’s, use what is known as “ahead-of-time” compilation. This variety of compilation
asserts that the majority (if not all) of the program’s semantic checks and mechanical
transformations will happen before the program’s runtime. Moreover, the end result of
ahead-of-time compilation is an executable binary from which users can run their programs. Another
popular choice is “just-in-time” compilation, which compiles (and recompiles) the program at
its runtime. Since Azalea specifically targets C (which is an ahead-of-time compiled language), it makes
sense to favor static analysis rather than dynamic. In order to accomplish our ahead-of-time
design, Azalea needed to follow a pipeline design that front-loaded all of the required
transformations and analysis passes over an Azalea source file. The following subsections will
delve into the design of each stage in the MAC pipeline.
4.1. Preprocessor Design
When users write their Azalea programs, they may be tempted to insert helpful comments that aid in
understanding their code. Notably, comments do not affect their code in any way, as the
content of a comment is just supplemental text. It is prudent that these comments get
stripped from the source file before it is passed to the later stages of compilation, as the
comments are useless to the compiler and would only complicate the scanner and parser. How the preprocessor
strips away comments is also critical. Our current implementation consists of a nested for-loop
that linearly scans for “start” and “stop” comment markers. Azalea uses the same methodology as
C and C++ regarding single and multi-line comments, using “//” for the former and “/**/” for the
latter. It is essential to mention that observing a single “/” is inconclusive on its own, as it may
very well be a division operator token. Hence, the nested for-loop must peek at the adjacent
character in order to differentiate between comments and division.
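The peek-ahead logic described above might look roughly like the following Rust sketch. The function name `strip_comments` is hypothetical, and the code is an illustration rather than MAC's actual preprocessor. It assumes that an unterminated block comment simply runs to the end of the input, and it does not treat comment markers inside string literals specially.

```rust
/// Remove `//` line comments and `/* ... */` block comments from `source`.
/// Hypothetical helper for illustration; not taken from the MAC code base.
fn strip_comments(source: &str) -> String {
    let chars: Vec<char> = source.chars().collect();
    let mut out = String::with_capacity(source.len());
    let mut i = 0;
    while i < chars.len() {
        // A lone '/' is inconclusive, so peek at the adjacent character.
        let next = chars.get(i + 1).copied();
        if chars[i] == '/' && next == Some('/') {
            // Line comment: skip up to (but keep) the terminating newline.
            while i < chars.len() && chars[i] != '\n' {
                i += 1;
            }
        } else if chars[i] == '/' && next == Some('*') {
            // Block comment: skip until the closing "*/" or the end of input.
            i += 2;
            while i < chars.len() && !(chars[i] == '*' && chars.get(i + 1) == Some(&'/')) {
                i += 1;
            }
            i = (i + 2).min(chars.len());
        } else {
            // Ordinary character, including '/' used as the division operator.
            out.push(chars[i]);
            i += 1;
        }
    }
    out
}
```

Keeping the newline that ends a line comment preserves the original line numbering, which is convenient if later stages report errors by line.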
Finally, preprocessing is also vital in the detection of so-called erroneous characters. These are
characters whose use is unsupported. For instance, the “@” character has zero utility in Azalea. If
any instance of this character is detected, Azalea will generate an error diagnostic and display it
to the user via the command line. Now, how this check is implemented is rather curious, as it
assumes that a set of valid characters exists. The easiest way to achieve this functionality is
by creating a “whitelist.” This whitelist will contain only those characters whose use is
authorized. As we perform our comment stripping, we simultaneously check if the given
character is in the whitelist. If it is, we proceed to the next character; otherwise, an error is
reported.
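A whitelist check of this kind can be sketched as follows. The exact set of characters Azalea permits is not given in the text, so the whitelist below (ASCII letters, digits, white space, and a handful of punctuation marks) is purely an assumption, as are the function names.

```rust
/// Assumed whitelist: ASCII alphanumerics, white space, and common punctuation.
/// The real set of characters MAC accepts is not specified in the paper.
fn is_allowed(c: char) -> bool {
    const PUNCTUATION: &str = "+-*/%=<>!&|(){}[],;:.\"'_";
    c.is_ascii_alphanumeric() || c.is_ascii_whitespace() || PUNCTUATION.contains(c)
}

/// Return the character index and value of the first character that is not
/// whitelisted, or `None` when the whole source is acceptable.
fn first_invalid_char(source: &str) -> Option<(usize, char)> {
    source.chars().enumerate().find(|&(_, c)| !is_allowed(c))
}
```

Returning the offending position rather than a bare boolean gives the error reporter enough information to point at the exact character.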
4.2. Scanner Design
Scanning is the next stage in the compilation pipeline. This stage is chiefly responsible for
accumulating tokens, which are analogous to words and punctuation in everyday written
languages. An overview of the scanning process is shown in Figure 3. The way it works is rather
simple, though the very first step in Azalea’s implementation is the enumeration of the kinds of
tokens that may be observed. The types of tokens that Azalea supports are numbers (floats and
integers), booleans (true or false), strings (text), keywords, operators, and identifiers. Once the
varieties have been concretely established, the scanner can proceed to iterate over the character
stream supplied via the preprocessing module. Our scanner goes character-by-character, only
stopping when it hits an ambiguous character or white space. An ambiguous character is one
whose interpretation depends on the next character. For example, the string “let x = 2.3;” has
the ambiguous substring “2.3” because the number “2.3” is a float. Since floats depend on
there being a digit after the decimal point, there may be an error if this invariant is not upheld. A float
must have a digit immediately following its decimal point; otherwise, an error is
reported to the user! The other interesting case is when a white space character is encountered,
which means we have reached the end of the current token and can begin processing the next one.
The scanning process concludes once the procedure has reached the end of the character stream.
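The character-by-character loop and the float invariant can be illustrated with a small Rust sketch. It covers only a subset of the token kinds listed above (string literals are omitted, `let` is the only keyword, and every operator is a single character), and the `Token` enum and `scan` function are hypothetical names rather than MAC's real types.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Int(i64),
    Float(f64),
    Bool(bool),
    Keyword(String),
    Identifier(String),
    Operator(char),
}

/// Turn a (comment-free) character stream into tokens, or report an error.
fn scan(source: &str) -> Result<Vec<Token>, String> {
    let chars: Vec<char> = source.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1; // white space only separates tokens
        } else if c.is_ascii_digit() {
            let start = i;
            while i < chars.len() && chars[i].is_ascii_digit() {
                i += 1;
            }
            if i < chars.len() && chars[i] == '.' {
                // Ambiguous character: '.' is valid only if a digit follows.
                if !chars.get(i + 1).map_or(false, |d| d.is_ascii_digit()) {
                    return Err(format!("expected a digit after '.' at offset {}", i));
                }
                i += 1;
                while i < chars.len() && chars[i].is_ascii_digit() {
                    i += 1;
                }
                let text: String = chars[start..i].iter().collect();
                tokens.push(Token::Float(text.parse::<f64>().map_err(|e| e.to_string())?));
            } else {
                let text: String = chars[start..i].iter().collect();
                tokens.push(Token::Int(text.parse::<i64>().map_err(|e| e.to_string())?));
            }
        } else if c.is_ascii_alphabetic() || c == '_' {
            let start = i;
            while i < chars.len() && (chars[i].is_ascii_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            let word: String = chars[start..i].iter().collect();
            tokens.push(match word.as_str() {
                "true" => Token::Bool(true),
                "false" => Token::Bool(false),
                "let" => Token::Keyword(word.clone()),
                _ => Token::Identifier(word.clone()),
            });
        } else {
            // Everything else is treated as a one-character operator here.
            tokens.push(Token::Operator(c));
            i += 1;
        }
    }
    Ok(tokens)
}
```

With this sketch, the text's example `let x = 2.3;` scans to a keyword, an identifier, the `=` operator, a float, and the `;` operator, whereas `let x = 2.;` is rejected because no digit follows the decimal point.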
Figure 3. Overview of the Scanning Process
4.3. Parser Design
Azalea’s parser directly follows scanning, and its central job is consuming the token stream to
produce an abstract syntax tree (AST). The AST of Azalea is a recursive data structure whose
fields contain pointers to other nodes in the tree. Each node represents a “production rule” from
the formal grammar that specifies Azalea’s syntax. Fundamentally, there are two high-level
concepts in parsing Azalea code: slots and keys. Slots (or holes) are branches within the tree
that are composed of nodes. So, the “slot” for a variable binding will comprise a
branch with five nodes. The first node will expect the “let” keyword, as it declares the start of
the variable binding. Next, we assume that the next token will be an identifier (the binding name),
followed only by the assignment operator, an expression, and a semicolon. In this analogy, the
“keys” that fill the slots are the tokens from the token stream!
The exact manner in which we construct the AST is surprisingly simple, as our approach uses
“Recursive Descent” parsing. The main idea of this algorithm is to model the visitor routines
around the AST itself. This means that there will be a “visit” function for every production rule
that makes up the tree. Notably, this scheme uses recursion, meaning the functions will often call
themselves in order to convert the token stream into the tree.
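To make the “one visit function per production rule” idea concrete, here is a minimal recursive descent sketch in Rust for the variable binding discussed earlier. The grammar below is an assumption chosen for brevity (addition only, with parentheses to show the recursion), and the `Tok`, `Expr`, `LetBinding`, and `Parser` names are hypothetical rather than MAC's own.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Tok { Let, Ident(String), Int(i64), Assign, Plus, LParen, RParen, Semi }

#[derive(Debug, PartialEq)]
enum Expr {
    Int(i64),
    Var(String),
    Add(Box<Expr>, Box<Expr>),
}

#[derive(Debug, PartialEq)]
struct LetBinding { name: String, value: Expr }

struct Parser { tokens: Vec<Tok>, pos: usize }

impl Parser {
    fn new(tokens: Vec<Tok>) -> Self {
        Parser { tokens, pos: 0 }
    }

    /// Fill a "slot" with the expected "key", or report what was found instead.
    fn expect(&mut self, expected: Tok) -> Result<(), String> {
        match self.tokens.get(self.pos) {
            Some(t) if *t == expected => {
                self.pos += 1;
                Ok(())
            }
            other => Err(format!("expected {:?}, found {:?}", expected, other)),
        }
    }

    /// let_binding := "let" IDENT "=" expr ";"
    fn visit_let(&mut self) -> Result<LetBinding, String> {
        self.expect(Tok::Let)?;
        let name = match self.tokens.get(self.pos).cloned() {
            Some(Tok::Ident(name)) => {
                self.pos += 1;
                name
            }
            other => return Err(format!("expected identifier, found {:?}", other)),
        };
        self.expect(Tok::Assign)?;
        let value = self.visit_expr()?;
        self.expect(Tok::Semi)?;
        Ok(LetBinding { name, value })
    }

    /// expr := atom ("+" atom)*
    fn visit_expr(&mut self) -> Result<Expr, String> {
        let mut left = self.visit_atom()?;
        while self.tokens.get(self.pos) == Some(&Tok::Plus) {
            self.pos += 1;
            let right = self.visit_atom()?;
            left = Expr::Add(Box::new(left), Box::new(right));
        }
        Ok(left)
    }

    /// atom := INT | IDENT | "(" expr ")"
    fn visit_atom(&mut self) -> Result<Expr, String> {
        match self.tokens.get(self.pos).cloned() {
            Some(Tok::Int(n)) => {
                self.pos += 1;
                Ok(Expr::Int(n))
            }
            Some(Tok::Ident(s)) => {
                self.pos += 1;
                Ok(Expr::Var(s))
            }
            Some(Tok::LParen) => {
                self.pos += 1;
                let inner = self.visit_expr()?; // recursion back into expr
                self.expect(Tok::RParen)?;
                Ok(inner)
            }
            other => Err(format!("expected expression, found {:?}", other)),
        }
    }
}
```

The five calls inside `visit_let` correspond one-to-one with the five nodes of the binding slot: keyword, identifier, assignment operator, expression, and semicolon.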
4.4. Semantic Analyzer Design
It is highly important that our previously generated AST be free from any trivially detectable
errors, as these mistakes will propagate during code generation and produce malformed C code.
The semantic analyzer takes inspiration from the parser in the sense that it is entirely composed
of recursive functions whose sole objective is verifying that certain invariants are upheld.
Some semantic checks are relatively simple to implement, while others are significantly more
involved. One that is trivial to construct is the so-called “function arity” checker, which walks the
branch of a function call within the AST to verify that the number of supplied arguments matches
the expected number of formal parameters of the function’s definition. If these two numbers are
unequal, we must report this to the programmer as an error.
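A function arity check of this kind can be sketched as a short recursive walk. The `Expr` shape and the `arities` table (mapping a function name to its number of formal parameters) are assumptions made for illustration, and reporting an unknown function is an extra behavior added here that the text does not describe.

```rust
use std::collections::HashMap;

#[derive(Debug)]
enum Expr {
    Int(i64),
    Call { callee: String, args: Vec<Expr> },
}

/// Walk `expr` recursively and collect one message per arity mismatch.
/// `arities` is a hypothetical table built from the function definitions.
fn check_arity(expr: &Expr, arities: &HashMap<String, usize>, errors: &mut Vec<String>) {
    if let Expr::Call { callee, args } = expr {
        match arities.get(callee) {
            Some(&expected) if expected != args.len() => errors.push(format!(
                "`{}` expects {} argument(s) but {} were supplied",
                callee,
                expected,
                args.len()
            )),
            Some(_) => {}
            None => errors.push(format!("unknown function `{}`", callee)),
        }
        // Arguments may themselves contain calls, so recurse into each one.
        for arg in args {
            check_arity(arg, arities, errors);
        }
    }
}
```

Collecting messages into a vector, instead of stopping at the first mismatch, lets a single pass surface every arity error in the program.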
One decidedly complicated check is the type checking system, which is made up of formal “type
rules” that govern how types can be used. An example of a type checking error is shown in Figure
4. Adding two numbers together, such as adding two integers, is one such rule specified in the
type system. Mixing types across arithmetic and relational operators is strictly
forbidden, as Azalea’s type system does not incorporate implicit type conversion. If you wish to
treat an integer as a float, then you must use the “as” keyword, which will perform the
explicit conversion for you.
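The “no implicit conversion” rule can be illustrated with a tiny recursive type checker. The set of types, the expression forms, and the rule that `as` converts only between the two numeric types are all assumptions made for this sketch; the paper does not spell out Azalea's full rule set.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Type { Int, Float, Bool }

#[derive(Debug)]
enum Expr {
    Int(i64),
    Float(f64),
    Add(Box<Expr>, Box<Expr>),
    Less(Box<Expr>, Box<Expr>),
    As(Box<Expr>, Type), // explicit conversion, e.g. `x as float`
}

/// Compute the type of `expr`, rejecting any mixing of operand types.
fn type_of(expr: &Expr) -> Result<Type, String> {
    match expr {
        Expr::Int(_) => Ok(Type::Int),
        Expr::Float(_) => Ok(Type::Float),
        Expr::Add(lhs, rhs) => {
            let (l, r) = (type_of(lhs)?, type_of(rhs)?);
            // Both operands must share one numeric type; nothing is widened.
            if l == r && l != Type::Bool {
                Ok(l)
            } else {
                Err(format!("cannot add {:?} and {:?}", l, r))
            }
        }
        Expr::Less(lhs, rhs) => {
            let (l, r) = (type_of(lhs)?, type_of(rhs)?);
            if l == r && l != Type::Bool {
                Ok(Type::Bool)
            } else {
                Err(format!("cannot compare {:?} and {:?}", l, r))
            }
        }
        Expr::As(inner, target) => {
            let from = type_of(inner)?;
            // Assumed rule: `as` converts only between numeric types.
            if from != Type::Bool && *target != Type::Bool {
                Ok(*target)
            } else {
                Err(format!("cannot convert {:?} to {:?}", from, target))
            }
        }
    }
}
```

Under these assumed rules, adding an integer to a float is rejected, while wrapping the integer in an explicit `as` conversion to float makes the same addition well-typed.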
One important thing to emphasize is the motivation behind the Azalea semantic analyzer. More
succinctly put: why does the Azalea compiler need a semantic analyzer in the first place? Recall
that the central goal is to translate Azalea source code into its C equivalent. One also knows that
C, like any programming language, demands that certain rules are followed so as to allow for the
successful compilation of a provided program. Therefore, Azalea is motivated by the desire to
catch bugs that would be rejected by the C compiler (while also checking for bugs that C does
allow). In eliminating these bugs ahead of time, we reduce the amount of time and effort
that would otherwise be spent debugging transpiled C code. It is better (and easier) to debug your
Azalea code than it is to look over the generated C code after the fact.
Figure 4. Example of a Type Check Error