Gen AI

Author: Upadhyayula, Raghavender Surya

Publisher: Zenodo

DOI: 10.5281/zenodo.17695725

Source: https://zenodo.org/records/17695725/files/Anaconda-Sponsored_Manning_Generative-AI-in-Action.pdf

MANNING
Ami Bah ee
Fo ewo d by E ic Boyd
Sponso ed by
2EPILOGUE
Inpu ex
(p omp )
Token
Embedding
Encode
Decode
…………………..
…………………..
…………………..
Gene a ed ex
(comple ion)
Nume ical ep esen a ion
Needed o scena ios such
as “B ing you own da a,”
sea ch, e c.
LLM
…………………..
…………………..
…………………..
…………………..
…………………..
………………….
LLM
Inpu oken
ec o
Vec o ep esen a ion
o nex ou pu oken
he
ma
pad
…
…
…
Highes
p obabili y
Second highes
p obabili y
Less
likely
Nex wo d
………
………
The dog sa on
………
………
………
………
………
……… ………
………
Concep ual a chi ec u e o an LLM
LLM – Nex oken p edic o
Gene a i e AI in Ac ion
AMIT BAHREE
FOREWORD BY ERIC BOYD
MANNING
SHELTER ISLAND
Fo online in o ma ion and o de ing o his and o he Manning books, please isi
www.manning.com. The publishe o e s discoun s on his book when o de ed in quan i y.
Fo mo e in o ma ion, please con ac
Special Sales Depa men
Manning Publica ions Co.
20 Baldwin Road
PO Box 761
Shel e Island, NY 11964
Email: [email p o ec ed]m
©2024 by Manning Publica ions Co. All igh s ese ed.
No pa o his publica ion may be ep oduced, s o ed in a e ie al sys em, o ansmi ed, in
any o m o by means elec onic, mechanical, pho ocopying, o o he wise, wi hou p io
w i en pe mission o he publishe .
Many o he designa ions used by manu ac u e s and selle s o dis inguish hei p oduc s a e
claimed as adema ks. Whe e hose designa ions appea in he book, and Manning
Publica ions was awa e o a adema k claim, he designa ions ha e been p in ed in ini ial caps
o all caps.
Recognizing he impo ance o p ese ing wha has been w i en, i is Manning’s policy o ha e
he books we publish p in ed on acid- ee pape , and we exe ou bes e o s o ha end.
Recognizing also ou esponsibili y o conse e he esou ces o ou plane , Manning books
a e p in ed on pape ha is a leas 15 pe cen ecycled and p ocessed wi hou he use o
elemen al chlo ine.
The au ho and publishe ha e made e e y e o o ensu e ha he in o ma ion in his book
was co ec a p ess ime. The au ho and publishe do no assume and he eby disclaim any
liabili y o any pa y o any loss, damage, o dis up ion caused by e o s o omissions, whe he
such e o s o omissions esul om negligence, acciden , o any o he cause, o om any
usage o he in o ma ion he ein.
Manning Publica ions Co. De elopmen edi o : Rebecca Johnson
20 Baldwin Road Technical edi o : Wee Hyong Tok
PO Box 761 Re iew edi o : Radmila E cego ac
Shel e Island, NY 11964 P oduc ion edi o : Ka hy Rossland
Copy edi o : Lana Todo o ic-A nd
P oo eade : Melody Dolab
Technical p oo eade : John Aziz
Typese e and co e designe : Ma ija Tudo
ISBN 9781633435339
P in ed in he Uni ed S a es o Ame ica
To my amily, who pa ien ly lis ened o my ech ambles,
al hough hey we e no help in w i ing his book and will ne e ead i ,
and o you, dea eade , who boldly chose o engage wi h hese ideas—
may you neu ons spa k joy and you ci cui s ne e sho .
Toge he , le ’s build a u u e whe e AI is mo e b ains han b awn.

i
b ie con en s
PART 1FOUNDATIONS OF GENERATIVE AI ................................... 1
1■In oduc ion o gene a i e AI 3
2■In oduc ion o la ge language models 26
3■Wo king h ough an API: Gene a ing ex 57
4■F om pixels o pic u es: Gene a ing images 96
5■Wha else can AI gene a e? 127
PART 2ADVANCED TECHNIQUES AND APPLICATIONS 153
6■Guide o p omp enginee ing 155
7■Re ie al-augmen ed gene a ion: The sec e weapon 183
8■Cha ing wi h you da a 213
9■Tailo ing models wi h model adap a ion and ine- uning 242
PART 3DEPLOYMENT AND ETHICAL CONSIDERATIONS 281
10 ■Applica ion a chi ec u e o gene a i e AI apps 283
11 ■Scaling up: Bes p ac ices o p oduc ion deploymen 321
12 ■E alua ions and benchma ks 357
13 ■Guide o e hical GenAI: P inciples, p ac ices, and pi alls 384
con en s
o ewo d xii
p e ace xi
acknowledgmen s x i
abou his book x iii
abou he au ho xxiii
abou he co e illus a ion xxi
PART 1 FOUNDATIONS OF GENERATIVE AI .................... 1
1 In oduc ion o gene a i e AI 3
1.1 Wha is his book abou ? 5
1.2 Wha is gene a i e AI? 6
1.3 Wha can we gene a e? 9
En i ies ex ac ion 9 ■Gene a ing ex 10 ■Gene a ing
images 12 ■Gene a ing code 12 ■Abili y o sol e logic
p oblems 14 ■Gene a ing music 15 ■Gene a ing ideos 17
1.4 En e p ise use cases 17
1.5 When no o use gene a i e AI 19
1.6 How is gene a i e AI di e en om adi ional AI? 19
1.7 Wha app oach should en e p ises ake? 21
1.8 A chi ec u e conside a ions 23
1.9 So you en e p ise wan s o use gene a i e AI. Now wha ? 24
CONTENTS i
2 In oduc ion o la ge language models 26
2.1 O e iew o ounda ional models 27
2.2 O e iew o LLMs 29
2.3 T ans o me a chi ec u e 30
2.4 T aining cu o 31
2.5 Types o LLMs 31
2.6 Small language models 33
2.7 Open sou ce s. comme cial LLMs 35
Comme cial LLMs 36 ■Open sou ce LLMs 36
2.8 Key concep s o LLMs 38
P omp s 39 ■Tokens 40 ■Coun ing okens 42
Embeddings 45 ■Model con igu a ion 47 ■Con ex
window 50 ■P omp enginee ing 51 ■Model adap a ion 52
Eme gen beha io 52
3 Wo king h ough an API: Gene a ing ex 57
3.1 Model ca ego ies 58
Dependencies 60 ■Lis ing models 62
3.2 Comple ion API 64
Expanding comple ions 67 ■Azu e con en sa e y il e 68
Mul iple comple ions 69 ■Con olling andomness 71
Con olling andomness using op_p 74
3.3 Ad anced comple ion API op ions 75
S eaming comple ions 75 ■In luencing oken p obabili ies:
logi _bias 77 ■P esence and equency penal ies 80
Log p obabili ies 82
3.4 Cha comple ion API 84
Sys em ole 86 ■Finish eason 88 ■Cha comple ion API o
noncha scena ios 88 ■Managing con e sa ion 89
Bes p ac ices o managing okens 92 ■Addi ional LLM
p o ide s 93
4 F om pixels o pic u es: Gene a ing images 96
4.1 Vision models 97
Va ia ional au oencode s 100 ■Gene a i e ad e sa ial
ne wo ks 101 ■Vision ans o me models 102
Di usion models 104 ■Mul imodal models 106
4.2 Image gene a ion wi h S able Di usion 109
Dependencies 109 ■Gene a ing an image 111
CONTENTS ii
4.3 Image gene a ion wi h o he p o ide s 114
OpenAI DALLE 3 114 ■Bing image c ea o 114
Adobe Fi e ly 115
4.4 Edi ing and enhancing images using S able
Di usion 116
Gene a ing using image- o-image API 119 ■Using he masking
API 121 ■Resize using he upscale API 124 ■Image
gene a ion ips 125
5 Wha else can AI gene a e? 127
5.1 Code gene a ion 128
Can I us he code? 130 ■Gi Hub Copilo 132
How Copilo wo ks 135
5.2 Addi ional code- ela ed asks 136
Code explana ion 136 ■Gene a e es s 138 ■Code
e e encing 139 ■Code e ac o ing 140
5.3 O he code gene a ion ools 140
Amazon CodeWhispe e 141 ■Code Llama 142
Tabnine 143 ■Check you sel 145 ■Bes p ac ices o
code gene a ion 145
5.4 Video gene a ion 146
5.5 Audio and music gene a ion 149
PART 2ADVANCED TECHNIQUES AND APPLICATIONS 153
6 Guide o p omp enginee ing 155
6.1 Wha is p omp enginee ing? 156
Why do we need p omp enginee ing? 156
6.2 The basics o p omp enginee ing 158
6.3 In-con ex lea ning and p omp ing 161
6.4 P omp enginee ing echniques 163
Sys em message 163 ■Ze o-sho , ew-sho , and many-sho
lea ning 166 ■Use clea syn ax 168 ■Making in-con ex
lea ning wo k 169 ■Reasoning: Chain o Though 170
Sel -consis ency sampling 173
6.5 Image p omp ing 175
6.6 P omp injec ion 176
6.7 P omp enginee ing challenges 179
xi
p e ace
Wi h nea ly 30 yea s o expe ience as a de elope and applied esea che , I ha e been
in ol ed in undamen al echnology shi s om he ea ly days. Gene a i e a i icial
in elligence (AI) is one o hose a eas whe e he hype and he ea o missing ou
each s a osphe ic le els! O ganiza ions a e ying o unde s and his new echnology
and how o implemen i . Some o his means ying o gain an edge; in o he cases, i
is esponding o he ma ke and he p essu e om he boa d and CEOs o join he
end.
A Mic oso , I ha e he p i ilege o being pa o he Azu e AI pla o m enginee -
ing eam, helping de elop some o ou ad anced AI echnologies, such as Azu e
OpenAI, and Azu e AI Se ices, including speech, ision, and small language models
(e.g., he new Phi amily o models). Pa o my ole has been collabo a ing wi h many
Fo une 500 companies ha a e ou clien s. These companies a e sca e ed a ound
he wo ld, ep esen ing di e en indus y domains, wi h many o hem being leade s
in hei ields.
My expe ience wi h GenAI ac oss a ious domains and applica ions, pa icula ly in
collabo a ion wi h Fo une 500 companies, has e ealed ha he e is a gap be ween
he hype and he eali y o gene a i e AI. I’ e no iced ha many use s and cus ome s
a e con used o in imida ed by he complexi y and challenges o his ield. In
esponse, I se ou o w i e a book o b idge his gap, p o iding a p ac ical and accessi-
ble guide o gene a i e AI. This guide empowe s anyone, ega dless o backg ound, o
lea n and apply gene a i e AI e ec i ely.
The echnology indus y is known o i s apid pace, bu he ield o GenAI is g ow-
ing e en as e , and we see changes in weeks a he han mon hs and yea s. While I
was w i ing his book, he echnology ad anced, and I ha e had o upda e many o he
new a eas in he book se e al imes. Howe e , he basics o GenAI and la ge language

PREFACE x
models (LLM) emain no el and c ucial o g asp. These a e he building blocks on
which new a eas a e being de eloped. Unde s anding hese undamen als is no jus a
goal o he book bu a necessi y in his apidly e ol ing ield.
This book ocuses on gene a i e AI aspec s, especially LLMs, which a e o en he
mos common use cases. I expec newe models wi h addi ional mul imodal capabili ies
ha combine ision, speech, and ideo will g ow in he u u e. He e, we’ll mainly use
OpenAI and Azu e OpenAI, bu I also show o he p o ide s’ examples. Mos LLM p o-
ide s a e simila o OpenAI, so he book is bene icial e en i you use a di e en p o-
ide . I also used Py hon o he examples, as i is easy and common in AI. In addi ion,
he e a e SDKs o mos languages and REST APIs ha you can call in any language.
Welcome o Gene a i e AI in Ac ion, a book aiming o demys i y he gene a i e AI
ield and help you apply i o you p ojec s. I am exci ed o sha e some insigh s om
my lea ning and assis you on you pa h.
x i
acknowledgmen s
Fi s and o emos , I wan o hank my pa en s o le ing me disappea in o he “com-
pu e oom” o inke wi h hose amazing machines and o buying me my i s com-
pu e . I also hank my wi e, Meenakshi, o pu ing up wi h me, especially when I
con enien ly igno ed mos o he hings and wo ked h ough he g a eya d shi a e
long days o w i e he book and code. To my daugh e Maya, I hank you o ne e
doub ing my li e al and coding abili ies (e en i i came wi h a eenage ’s eye oll).
This book would no be comple e wi hou my dog, Champ, who, as you will see, is a
ecu ing heme. And inally, I hank my dea iend Somya o showing us wha ue
cou age looks like and eminding us ha mos o li e’s d amas a e jus hings we ge
ou sel es wo ked up o e .
I hank E ic Boyd o w i ing he o ewo d and o his ime and collabo a ion on
his p ojec . Wo king unde his guidance on he Azu e AI eam has been an exhila a -
ing expe ience. Pushing he limi s o echnology and ekindling ha childlike exci e-
men in all o us—i eminds me why I ell in lo e wi h compu e s and p og amming
in he i s place.
A special hanks goes o Wee Hyong Tok, he echnical edi o o his book, o his
inc edible ime spen assis ing, di ec ing, challenging, and e i ying e e y hing. You
e o s ha e been in aluable in my lea ning and in imp o ing his book! Wee Hyong is
a pa ne di ec o o p oduc a Mic oso . He has a PhD in compu e science om
he Na ional Uni e si y o Singapo e and is a ecognized expe on da a and AI. He
has also au ho ed o e 10 books on AI.
To all he e iewe s—Ami Basnak, And es Sacco, A un Kand egula, B uno
Rica do San os, Dan Sheikh, E im E ü k, G ego y V, Ha iskuma Panakkal, Ike
Okonkwo, James Coa es, Julien Pohie, Lokesh Kuma , Louis Luangkeso n, Luiz Da i,
Manish Jain, Ma eo Ba is a, Maxim Volgin, Na han B. C ocke , P adeep Bha ip olu,
ACKNOWLEDGMENTS x ii
Radhak ishna MV, Raj Kuma , Rambabu Posa, Roy Wilske , Rui Liu, Sanjee Jaiswal,
Sco Ling, Simon Ve hoe en, Sumi Pal, Sushil Singh, Swamina han Sub amanian,
Swapneelkuma Deshpande, Vic o Du án, and We onika Bu man—you sugges ions
helped make his a be e book.
Finally, I would like o hank he eam a Manning. I ha e immense empa hy and
g a i ude o my de elopmen edi o , Rebecca Johnson, and acquisi ions edi o , Mike
S ephens. Rebecca especially dese es a medal o making sense o my ini ial d a s
and u ning gibbe ish in o cohe en con en . Thank you all o you pa ience and
dedica ion!
x iii
abou his book
Gene a i e AI in Ac ion is designed o equip en e p ise p o essionals and en husias s
wi h he knowledge and skills o e ec i ely use gene a i e AI echnologies. This book
p o ides a comp ehensi e unde s anding o gene a i e AI, co e ing i s undamen al
p inciples, p ac ical applica ions, and he challenges associa ed wi h implemen ing i
in eal-wo ld scena ios.
The book eaches you how o c ea e and use gene a i e models o asks and use
cases. I ocuses on his echnology’s p ac ical and hands-on aspec s and how i wo ks.
I does no di e deep in o he science, bu i e e ences he pape s and scien i ic
b eak h oughs ha ha e helped de elop some o he echnology—you can see hese
a he end o he book.
This book is designed o p o ide a comp ehensi e unde s anding o gene a i e AI
and i s po en ial wi hin an en e p ise con ex . I explo es ounda ional models, la ge
language models, and ela ed algo i hms and a chi ec u es, o e ing eade s a ho -
ough g asp o hese ad anced echnologies. P ac ical insigh s and examples a e p o-
ided o help de elop and deploy gene a i e AI models, ensu ing ha eade s can
apply hese concep s in eal-wo ld scena ios.
Ad anced opics such as p omp enginee ing, e ie al-augmen ed gene a ion,
and model adap a ion a e discussed in de ail, gi ing eade s an in-dep h unde s and-
ing o hese cu ing-edge echniques. The book also highligh s bes p ac ices o in e-
g a ing gene a i e AI in o exis ing sys ems and wo k lows, ensu ing a smoo h and
e icien implemen a ion. Fu he mo e, i add esses he e hical conside a ions, go e -
nance, and sa e y measu es necessa y o esponsible AI deploymen , guiding eade s
on how o esponsibly na iga e he complexi ies o his apidly e ol ing ield.
ABOUT THIS BOOK xix
Who should ead his book
Gene a i e AI in Ac ion is designed o a di e se audience. I is ideal o de elope s and
so wa e a chi ec s looking o in eg a e gene a i e AI in o hei p ojec s and da a sci-
en is s who wan o enhance hei unde s anding o gene a i e AI echnologies and
applica ions. Business and echnical decision-make s will ind i aluable o g asping
he s a egic implica ions o gene a i e AI o hei o ganiza ions. Powe use s ac oss
a ious en e p ise sec o s can explo e gene a i e AI’s p ac ical applica ions and bene-
i s. Addi ionally, educa o s and s uden s in AI- ela ed ields will gain comp ehensi e
knowledge o he la es ad ancemen s in gene a i e AI.
This book p ima ily a ge s de elope s, da a scien is s, and echnology decision-
make s wi h some p og amming backg ound who wan o explo e he ascina ing and
powe ul wo ld o gene a i e AI. One doesn’ need o be an expe in machine lea n-
ing, deep lea ning, o gene a i e AI o ha e a PhD in ma hema ics o ollow his book.
S ill, you should be amilia wi h he basics o APIs, SDKs, and Py hon o one o he
o he common p og amming languages.
How his book is o ganized: A oad map
Gene a i e AI in Ac ion is di ided in o h ee main pa s, encompassing 13 chap e s.
Each chap e is c a ed o build on p e ious ones, p o iding a s uc u ed and comp e-
hensi e lea ning expe ience.
The i s pa , “Founda ions o Gene a i e AI,” lays he ounda ion o gene a i e
AI, s a ing wi h new use cases and a comp ehensi e unde s anding o he basics,
including ounda ional models. I del es in o he a chi ec u e o LLMs, demons a -
ing hei applica ion ac oss a ious modali ies such as ex , images, code, and cha .
This sec ion also includes examples o help eade s g asp hese new AI echnologies
e ec i ely:
Chap e 1 in oduces he basics o gene a i e AI, di e en ia ing i om adi-
ional AI and showcasing i s po en ial h ough a ious eal-wo ld applica ions.
Chap e 2 del es in o he a chi ec u e and unc ionali y o LLMs, explo ing
hei capabili ies and limi a ions.
Chap e 3 co e s p ac ical s eps o gene a e ex using APIs, including hands-on
examples.
Chap e 4 shows you how gene a i e AI can c ea e images om ex desc ip-
ions and unde s and he unde lying models, such as DALL-E.
Chap e 5 explo es o he gene a i e AI applica ions, such as gene a ing music,
code, and 3D models.
The book’s second pa , “Nex s eps wi h gene a i e AI,” ocuses on ad anced opics
c ucial o anyone wan ing o deploy a GenAI-powe ed applica ion. This pa
add esses new a chi ec u e pa e ns and cons uc s such as p omp enginee ing, da a

ABOUT THIS BOOKxx
in eg a ion, ine- uning, and model adap a ion. I also explo es he componen s o
he new GenAI applica ion s ack:
Chap e 6 is a de ailed guide o c a ing e ec i e p omp s o achie e desi ed
ou pu s om gene a i e AI models.
Chap e 7 explains how o enhance gene a i e AI models by inco po a ing
ex e nal da a sou ces.
Chap e 8 eaches you how o in eg a e con e sa ional AI wi h you en e p ise
da a o mo e in e ac i e applica ions.
Chap e 9 eaches you echniques o cus omizing gene a i e AI models o be -
e sui speci ic use cases.
The book’s inal sec ion, “Deploymen and e hical conside a ions,” co e s bes p ac ices
o p oduc ion deploymen , scaling s a egies, e alua ion and benchma king ech-
niques, and esponsible and e hical AI guidelines. These ad anced opics a e essen ial
o o ganiza ions p epa ing o deploy and u ilize gene a i e AI in p oduc ion a scale:
Chap e 10 will help you unde s and he a chi ec u al conside a ions o de el-
oping and deploying gene a i e AI applica ions.
Chap e 11 o e s s a egies o scaling gene a i e AI models in a p oduc ion
en i onmen .
Chap e 12 eaches you how o e alua e and benchma k gene a i e AI models
o ensu e hey mee pe o mance s anda ds.
Chap e 13 is a comp ehensi e guide on he e hical conside a ions, go e -
nance, and sa e y measu es necessa y o esponsible AI deploymen .
The book is designed o be ead sequen ially om co e o co e , as each chap e
builds on he concep s in oduced in he p e ious chap e s. Howe e , eade s al eady
amilia wi h he basics may ocus on speci ic chap e s ha add ess hei pa icula
in e es s o needs. Code samples a e included h oughou he book o ein o ce
lea ning and p o ide hands-on expe ience. Running hese samples is highly ecom-
mended; he code can be ound in he book’s Gi Hub eposi o y. This app oach
ensu es ha eade s unde s and he heo e ical aspec s o gene a i e AI and gain
p ac ical skills o implemen hese echnologies e ec i ely.
This book ocuses on Azu e OpenAI and OpenAI, he leading LLM pla o ms, due
o hei s abili y and en e p ise eadiness. I aims o educa e eade s on gene a i e AI
applica ions in business, wi h p inciples applicable ac oss a ious LLMs. While i
includes di e se LLM examples and open sou ce models, he emphasis is on he Mic-
oso s ack, mainly because i is widely used in he indus y and also accessible o he
au ho .
Abou he code
This book p o ides sou ce code o a ious chap e s o enhance he hands-on lea ning
expe ience. The code is designed o help you p ac ice and apply he concep s discussed
in he book. You can download he sou ce code o he ele an chap e s o he book.
ABOUT THIS BOOK xxi
Many examples o sou ce code a e con ained bo h in numbe ed lis ings and in
line wi h no mal ex . In bo h cases, sou ce code is o ma ed in a
ixed-wid h on
like his
o sepa a e i om o dina y ex . Some imes, code is also
in bold
o high-
ligh code ha has changed om p e ious s eps in he chap e , such as when a new
ea u e adds o an exis ing line o code.
In many cases, he o iginal sou ce code has been e o ma ed; we’ e added line
b eaks and ewo ked inden a ion o accommoda e he a ailable page space in he
book. In a e cases, e en his was no enough, and lis ings include line-con inua ion
ma ke s (➥). Addi ionally, commen s in he sou ce code ha e o en been emo ed
om he lis ings when he code is desc ibed in he ex . Code anno a ions accompany
many o he lis ings, highligh ing impo an concep s.
You can ge execu able snippe s o code om he li eBook (online) e sion o his
book a h ps://li ebook.manning.com/book/gene a i e-ai-in-ac ion. The comple e
code o he examples in he book is a ailable o download om he Manning web-
si e a www.manning.com/books/gene a i e-ai-in-ac ion, and om Gi Hub a h ps://
gi hub.com/bah ee/GenAIBook.
You will need he ollowing so wa e and e sions o un he p o ided code:
IDE—Visual S udio Code (o simila ).
Py hon—Ve sion 3.7.1 o la e ; we use e sion 3.11.3 o he book.
Package manage —Al hough echnically a package manage is no needed, i
would make hings much easie o main ain. We use conda o he book, bu
you can use any package manage .
Gi —Gi en we a e using Gi Hub, you need Gi ins alled locally.
Docke —Used o con aine ized deploymen s and ep oducible en i onmen s. In
he second pa o he book, con aine s a e u ilized o mo e ad anced use cases.
Va ious SDKs—Used o ex and image gene a ion examples, including Azu e
OpenAI, OpenAI, Gemini, e c.
Va ious o he packages—Used o wo king h ough di e en aspec s o he chap e s.
I edi ed mos o he book’s code o cla i y and b e i y. Fo example, I le ou some
hings ha a e no e y use ul in a p in ed book, such as excep ion handling, boile -
pla e unc ions, and so o h. The Gi Hub eposi o y has all hese, and he code he e
is es ed and unnable.
These ools and lib a ies a e essen ial o unning he examples and exe cises p o-
ided in he book. Ensu e you ha e he co ec e sions ins alled o a oid compa ibil-
i y issues. De ailed ins uc ions o se ing up he en i onmen and dependencies a e
included in he Gi Hub code eposi o y, which can be ound a h ps://gi hub.com/
bah ee/GenAIBook.
li eBook discussion o um
Pu chase o Gene a i e AI in Ac ion includes ee access o li eBook, Manning’s online
eading pla o m. Using li eBook’s exclusi e discussion ea u es, you can a ach
ABOUT THIS BOOKxxii
commen s o he book globally o o speci ic sec ions o pa ag aphs. I ’s a snap o
make no es o you sel , ask and answe echnical ques ions, and ecei e help om
he au ho and o he use s. To access he o um, go o h ps://li ebook.manning
.com/book/gene a i e-ai-in-ac ion/discussion. You can also lea n mo e abou Man-
ning's o ums and he ules o conduc a h ps://li ebook.manning.com/discussion.
Manning’s commi men o ou eade s is o p o ide a enue whe e a meaning ul
dialogue be ween indi idual eade s and be ween eade s and he au ho can ake
place. I is no a commi men o any speci ic amoun o pa icipa ion on he pa o
he au ho , whose con ibu ion o he o um emains olun a y (and unpaid). We sug-
ges you y asking he au ho some challenging ques ions les his in e es s ay! The
o um and he a chi es o p e ious discussions will be accessible om he publishe ’s
websi e as long as he book is in p in .
xxiii
abou he au ho
AMIT BAHREE is a P incipal TPM a Mic oso ,
whe e he is pa o he enginee ing eam building
he nex gene a ion o AI p oduc s and se ices o
millions o cus ome s using he Azu e AI pla o m.
He is also esponsible o cus om enginee ing
ac oss he pla o m wi h key cus ome s, sol ing
complex en e p ise scena ios using all o ms o AI,
including gene a i e AI.
A simple geek a hea , Ami has nea ly 30 yea s
o expe ience in echnology and p oduc de elop-
men . He has a s ong backg ound in applied esea ch, machine lea ning, AI, and
cloud pla o ms. He is passiona e abou c ea ing po en and esponsible AI p oduc s
ha ans o m indus ies and imp o e li es.
Ami esides in he Sea le a ea wi h his wi e, daugh e , and he swee es dog, who
is no spoil o en.
6CHAPTER 1 In oduc ion o gene a i e AI
1.2 Wha is gene a i e AI?
Gene a i e AI is no a new ield o AI, bu i has gained mo e popula i y and a en ion
la ely. I can gene a e new con en in a ious ou pu s— om ealis ic human aces and
w i ing pe suasi e ex o com-
posing music and de eloping
no el d ug compounds. This
new AI echnique is abou epli-
ca ing exis ing pa e ns, imagin-
ing new ones, c a ing new
scena ios, and c ea ing new
knowledge.
As shown in igu e 1.2, gene a-
i e AI is a subsec ion o AI ha
is ained on a as a ay o da a
o lea n he unde lying pa e ns
and dis ibu ions. The magic lies
in i s po en ial o gene a e some-
hing no el and o iginal, a ask
p e iously belie ed o be he sole
domain o human ingenui y.
Machine and deep lea ning
p o ide he undamen al ech-
niques we need o unde s and
be o e di ing in o gene a i e AI.
They gi e us he oolki o na iga e he landscape o AI and unde s and he p ocesses
behind da a enginee ing, model aining, and in e ence. As we p og ess h ough his
book, we will apply hese p inciples bu will no ge in o he de ails. Mul iple books
ha e been dedica ed o bo h opics, and i would be mo e p uden o he eade o
consul hose o de ails.
A i s simples , machine lea ning (ML) is he scien i ic discipline ocusing on how
compu e s can lea n om da a. Ins ead o explici ly p og amming compu e s o ca y
ou asks, in ML, we de elop algo i hms ha can lea n om and make p edic ions o
decisions based on da a. This da a-d i en decision-making is applicable o nume ous
eal-wo ld scena ios, anging om spam il e ing in emails o ecommenda ion sys-
ems on e-comme ce pla o ms.
Deep lea ning (DL), a subse o ML, akes his concep u he . I uses a i icial
neu al ne wo ks wi h se e al laye s. These ne wo ks a emp o simula e he beha io
o he human b ain—albei in a simpli ied o m— o lea n om la ge amoun s o
da a. While a neu al ne wo k wi h a single laye can s ill make app oxima e p edic-
ions, addi ional hidden laye s can help op imize i s accu acy. DL d i es many
AI applica ions oday and helps execu e asks wi h imp o ed e iciency, speed, and
scale.
A i icial
in elligence
Machine
lea ning
Deep lea ning
Gene a i e AI
Figu e 1.2 Gene a i e AI o e iew

71.2 Wha is gene a i e AI?
An AI model is a sophis ica ed algo i hmic s uc u e ained on ex ensi e da ase s
o au onomously pe o m speci ic asks such as ex gene a ion, ansla ion, and
decision-making. These models lea n om da a pa e ns o mimic human cogni i e
abili ies, which enables hem o unde s and and gene a e na u al language. Once
ained, de elope s should ecognize ha hese models can p ocess and analyze da a
independen ly, using ML and DL echniques. ML models apply ma hema ical ame-
wo ks o da a o p edic ions, while DL models use neu al ne wo ks o complex asks
in ol ing uns uc u ed da a. In essence, an AI model is a sel -su icien ool ha can
ca y ou in elligen asks based on lea ned da a pa e ns a e aining, which a e c u-
cial o c ea ing sma applica ions.
Gene a i e AI is an e olu ion o DL. Many inco ec ly assume ha Cha GPT is gen-
e a i e AI. Cha GPT is a web applica ion ha uses gene a i e AI a i s simples le el.
The ise and popula i y o Cha GPT exposed many olks o gene a i e AI, and he
powe o he o he gene a i e models called la ge language models (LLMs) is, as he
name sugges s, ela ed o language. OpenAI ained Cha GPT on di e se in e ne ex
o p oduce a human-like con e sa ion. In addi ion o Cha GPT, able 1.1 ou lines
some o he key gene a i e AI models used oday; hese a e g ouped by gene a ed AI
a ea ypes: language, image, and code gene a ion.
Table 1.1 Popula gene a i e AI models
Name Desc ip ion A ea
Gene a i e
P e- ained T ans-
o me (GPT)
A la ge language model de eloped by OpenAI and ained on a
massi e da ase o ex and code can gene a e ex , ansla e lan-
guages, w i e a ious kinds o c ea i e con en , and answe you
ques ions in o ma i ely. GPT4-Omni (mo e commonly e e ed o
as GPT-4o) is a mul imodal model. A he ime o w i ing, i is he
la es e sion and is a signi ican upg ade om GPT-4, o e ing
speed, cos , and capabili y imp o emen s.
Language/
mul imodal
Llama 3 Me a ecen ly eleased he hi d e sion o a na u al la ge lan-
guage model, open-sou ced unde a special license. The models
come in a ious sizes and ha e a ying capabili ies.
Language
Claude 3 An h opic has in oduced he Claude 3 model amily, which
includes Claude 3 Haiku, Claude 3 Sonne , and Claude 3 Opus.
These models o e a ange o capabili ies, wi h Opus being he
mos in elligen . I is capable o complex asks and exhibi s nea -
human comp ehension and luency le els. Like OpenAI’s Cha GPT,
Claude can gene a e ex , w i e code, summa ize, and eason,
among o he hings, o a gi en p omp .
Language
Cohe e Command Cohe e o e s wo models (Command R and Command R+) as
pa o i s Command amily. While hese LLMMS a e op imized o
a ious use cases, Cohe e’s newes la ge language model, Com-
mand R+, is op imized o con e sa ional in e ac ion and long-
con ex asks. I is designed o be highly pe o man o complex
e ie al-augmen ed gene a ion (RAG) wo k lows and mul is ep
ool use.
Language
8CHAPTER 1 In oduc ion o gene a i e AI
The ollowing lis desc ibes a ew a eas whe e gene a i e AI is used oday. We expec o
see e en mo e inno a i e and c ea i e applica ions as gene a i e AI echnology
de elops:
Images—This echnology c ea es ealis ic images o people, objec s, and scenes
ha do no exis in he eal wo ld. I is used o a ious pu poses, such as c ea -
ing i ual wo lds o gaming and en e ainmen , gene a ing ealis ic p oduc
images o e-comme ce, and aining da a o o he AI models.
Videos—C ea es ideos ha do no exis in he eal wo ld. This echnology is
used o a ious pu poses, such as c ea ing special e ec s o mo ies and TV
shows, gene a ing aining da a o o he AI models, and c ea ing pe sonalized
ideo con en o ma ke ing and ad e ising.
Mis al Mis al La ge Language Models a e ad anced AI models designed
o ex gene a ion and o he language asks. They ha e models in
di e en sizes om a collec ion o open sou ce models (Mis al-
7B, 8x7B, and 8x22B) and op imized comme cial models (Mis al
Small, Medium, and La ge), each ailo ed o di e en easoning
complexi ies and wo kloads.
Language
Gemini Gemini is Google’s new mul imodal model ha can unde s and
ex , images, ideos, and audio. I will be a ailable in di e en
sizes (Ul a, P o, and Nano), each wi h di e en capabili ies.
Language/
mul imodal
DALL-E Visual AI model de eloped by OpenAI ha can c ea e ealis ic
images om ex p omp s
Image
S able Di usion Open sou ce image gene a ion model ha gene a es images om
a p omp as inpu . I is p ima ily used o gene a e de ailed images
condi ioned on ex desc ip ions and can also be applied o o he
asks such as inpain ing, ou pain ing, and gene a ing image- o-
image ansla ions.
Image
Midjou ney An image gene a ion model using na u al language p omp s om
a s a up called Midjou ney, Inc., simila o OpenAI’s DALL-E and
S able Di usion.
Image
CodeWhispe e CodeWhispe e is an AWS code-gene a ion model ha can gene -
a e code in se e al p og amming languages, including Py hon,
Ja a, Ja aSc ip , and TypeSc ip .
Code
CodeLlama CodeLlama is a la ge language model buil on Llama 2 and speci -
ically ained on code. I is a ailable in a ious sizes and suppo s
mul iple popula p og amming languages.
Code
Codex A la ge language model is ained speci ically on code and used o
help wi h code gene a ion. I suppo s o e a dozen p og amming
languages, including some o he mo e commonly used, such as
C#, Ja a, Py hon, Ja aSc ip , SQL, Go, PHP, and Shell, among
o he s.
Code
Table 1.1 Popula gene a i e AI models (con inued)
Name Desc ip ion A ea
91.3 Wha can we gene a e?
Tex (language)—This echnology c ea es ealis ic ex , such as news a icles,
blog pos s, and c ea i e w i ing. I is used o a ious pu poses, such as gene a -
ing con en o websi es and social media, c ea ing pe sonalized ma ke ing
ma e ials, and c ea ing syn he ic da a.
Tex (code)—Gene a i e AI models augmen and assis de elope s when hey
w i e code. Gi Hub’s esea ch ound ha de elope s who use i s Copilo ea u e
eel 88% mo e p oduc i e and a e 96% as e on epe i i e asks.
Music—Gene a i e AI models a e being used o c ea e o iginal and c ea i e
new music. This echnology se es a ious pu poses, such as c ea ing music o
mo ies and TV shows, gene a ing pe sonalized playlis s, and c ea ing aining
da a o o he AI models.
We’ll di e in o he speci ics o how gene a i e AI wo ks in he nex chap e , bu o
now, le ’s discuss wha can be gene a ed using his echnology and how i can help
you en e p ise.
1.3 Wha can we gene a e?
When i comes o gene a ing hings using gene a i e AI, he sky is he limi . As dis-
cussed ea lie , we can gene a e ex , images, music, code, oice, and e en designs.
Be o e we look a some examples o hings ha can be gene a ed, i is wo h no ing
ha gene a i e AI does no unde s and he con en as humans do. I uses pa e ns in
he da a (pa o i s aining se ) o gene a e new, simila da a— he quali y and ele-
ance o he gene a ed con en a e di ec ly co ela ed o he quali y and ele ance o
he aining da a.
1.3.1 En i ies ex ac ion
We can use gene a i e AI, speci ically a la ge language model (LLM), o ex ac en i-
ies om ex . En i ies a e pieces o in o ma ion ha a e o in e es o us. In he pas ,
we would need o use a named en i y ecogni ion (NER) model o en i y ex ac ion;
u he mo e, ha model would need o ha e seen he da a and be ained as pa o
i s da ase . Wi h LLM models, we can do his wi hou any aining, and hey a e mo e
accu a e. While adi ional NER me hods a e e ec i e, hey o en equi e manual
e o and domain-speci ic cus omiza ion. LLMs ha e signi ican ly educed his bu -
den, o e ing a mo e e icien and o en mo e accu a e app oach o NER ac oss a i-
ous domains. A key eason is he T ans o me a chi ec u e, which we will co e in he
nex ew chap e s. This is a g ea example o adi ional AI being mo e igid and less
lexible han gene a i e AI.
He e, we will use OpenAI’s GPT-4 model o ex ac he i s name, company name,
loca ion, email, and phone numbe om he ex :
Ex ac he name, company, email, and phone numbe om he ex below:
Hello. My name is Ami Bah ee. I’m calling om Acme Insu ance, Sea le, WA. My col-
league men ioned ha you a e in e es ed in lea ning abou ou comp ehensi e
10 CHAPTER 1 In oduc ion o gene a i e AI
bene i s policy. Could you gi e me a call back a (555) 111-2222 when you ge a
chance so we can go o e he bene i s? I can be eached Monday o F iday du ing
no mal business hou s. I you wan , you can also email me a aweomsein-
[email p o ec ed]m. Thanks, Ami .
We can see he en i ies ex ac ed by he model in he ou pu in igu e 1.3.
Figu e 1.3 En i y ex ac ion
1.3.2 Gene a ing ex
Gene a ing ex in many ways is he b ead and bu e o gene a i e AI. We can gene -
a e di e en hings, such as poems, jokes, and pe sonalized con en . In his example,
we will use GPT-4 o gene a e a book launch email o a new book on gene a i e AI.
We gi e he model a p omp wi h ou equi emen s and some con ex om which o
build. We se up he scene, ou line he equi emen s, and add addi ional con ex ,
such as he ex 's one, which will gene a e he email o us.
Gi en he b oad na u e o he unde lying model (called ounda ional models, co -
e ed in he nex chap e ), i is also good a unde s anding g amma and spelling mis-
akes. Fo example, we misspelled “p ac ices” and “a chi ec es,” and i unde s ood
bo h:
W i e a sho book launch email o he new gene a i e AI book a ailable a all book-
s o es. The book co e s in oduc ion o gene a i e AI, he di e en ypes and use
cases, and bes p ac ices o en e p ises, including how hey can inco po a e his
new AI echnology in hei business p ocesses. The book is published by Manning and
he au ho is Ami Bah ee. The name o he book is Gene a i e AI in Ac ion: Real-Wo ld
111.3 Wha can we gene a e?
Applica ions and Case S udies o AI o En e p ises. The a ge audience is de elope s
and en e p ise a chi ec es, and he one is iendly and exci ing.
1Wha should be he subjec line o he email?
2Wha should be he body o he email?
We ge he ollowing answe , which is a he good:
1Subjec Line: “Explo e he Powe o Gene a i e AI wi h ou New Book: ‘Gene a i e
AI in Ac ion’ by Ami Bah ee!”
2Email Body:
Dea [Recipien 's Name],
I am h illed o announce he elease o a ema kable new book, “Gene a i e AI in
Ac ion: Real-Wo ld Applica ions and Case S udies o AI o En e p ises”, au ho ed
by Ami Bah ee and published by Manning.
Ou dynamic business en i onmen is calling o cu ing-edge solu ions, and Gen-
e a i e AI is jus ha . I is one o he mos exci ing ad ancemen s in he ech
indus y and can be a game-change o you business. Whe he you' e a de elope
o an en e p ise a chi ec , his book is you icke o ha nessing he powe o Gen-
e a i e AI.
“Gene a i e AI in Ac ion” un a els he complexi ies o Gene a i e AI. I p o ides an
accessible in oduc ion and di es deep in o i s di e en ypes, use cases, and
how i can e olu ionize business p ocesses. Fu he mo e, i o e s he bes p ac-
ices o en e p ises looking o inco po a e his s a e-o - he-a echnology in o
hei ope a ions.
He e's wha you can expec om his enligh ening ead:
– A comp ehensi e in oduc ion o Gene a i e AI.
– Explo a ion o a ious ypes o Gene a i e AI and hei eal-wo ld applica ions.
– Insigh ul case s udies demons a ing how businesses le e age Gene a i e
AI.
– Valuable guidance on he bes p ac ices o implemen ing Gene a i e AI in
you en e p ise.
We’ e ca e ully cu a ed his book o make Gene a i e AI accessible o anyone
eage o unde s and and use i . So whe he you' e new o AI o an expe ienced
p o essional seeking o s ay a he o e on o echnological ad ancemen s, “Gen-
e a i e AI in Ac ion” is he esou ce you' e been wai ing o .
We' e exci ed o sha e ha “Gene a i e AI in Ac ion” is now a ailable a all leading
books o es. G ab you copy oday and s a you jou ney owa ds ans o ming you
business wi h Gene a i e AI.
Thank you o you con inued suppo and happy eading!
Bes Rega ds,
[You Name]
[You Posi ion]
[You Con ac In o ma ion]

12 CHAPTER 1 In oduc ion o gene a i e AI
1.3.3 Gene a ing images
We a e no es ic ed o only ex —we can gene a e images using a p omp . To be
clea , he model does no ind a simila image ha sa is ies hese a ibu es; a he , i
gene a es some hing new ha does no exis . I does no ma e i we ask o some-
hing whimsical ha will no exis in he eal wo ld, such as a dog w i ing a book wi h
he Taj Mahal in he backg ound. The AI can decons uc and comp ehend each
aspec and hen use ha o c ea e some hing new, simila o how an a is would. In
igu e 1.4, we use OpenAI’s DALL-E 3 model o gene a e he images om he p omp .
Gene a e an image o a dog wea ing glasses si ing a a able and au ho ing a book
on AI using a compu e . Make i a posi i e image wi h he backg ound o he Taj
Mahal in he window in he dis ance a he golden hou .
Figu e 1.4 Image gene a ion using DALL-E 3
1.3.4 Gene a ing code
When hinking abou gene a ing code, i is help ul o hink o AI no as being able o
c ea e ully unc ioning applica ions bu a he as being able o c ea e some unc ions
and ou ines. A lo o code is abou sca olding o di e en un imes and amewo ks
and less abou he exac business logic. In many o hese scena ios, code gene a ion
can help imp o e he de elope ’s p oduc i i y. In he ollowing example, we use
131.3 Wha can we gene a e?
GPT-3.5 o gene a e code o a classic “Hello, Wo ld!” unc ion. We can gi e i a
p omp such as he ollowing, and i will gene a e he code o us.
W i e a hello wo ld equi alen in Py hon using OpenAI API’s o a de elope who is new
o using OpenAI and ansla e he ou pu in o F ench.
You ge an answe like lis ing 1.1, including he s eps equi ed o s a , which is
imp essi e. O cou se, his is jus an illus a i e example o show he model’s powe —
unde s anding he con ex and ules o he eques , including he p og amming lan-
guage, he so wa e de elopmen ki (SDK), packages o use, and, inally, gene a ing
code. This code does no ollow es ablished bes p ac ices (e.g., one should no ha e
hei API key in he code).
impo os
om openai impo OpenAI
gp _model = "gp -3.5- u bo"
# Replace wi h you ac ual OpenAI API key
clien = OpenAI(api_key='you -api-key')
# Gene a e English ex
esponse_english = clien .cha .comple ions.c ea e(
model="gp -3.5- u bo",
messages=[
{
" ole": "use ",
"con en ": "Hello, Wo ld!"
}
],
max_ okens=50
)
english_ ex = esponse_english.choices[0].message.con en .s ip()
p in (english_ ex )
# T ansla e English ex o F ench
esponse_ ench = clien .cha .comple ions.c ea e(
model="gp -3.5- u bo",
messages=[
{
" ole": "use ",
"con en ": "T ansla e he ollowing English
➥ ex o F ench: " + english_ ex
}
],
max_ okens=100
)
Lis ing 1.1 “Hello, Wo ld!” calling OpenAI se ice
14 CHAPTER 1 In oduc ion o gene a i e AI
# This p in s he ansla ion o F ench
p in ( esponse_ ench.choices[0].message.con en .s ip())
Run he sc ip :
py hon hellowo ld.py
The ou pu is shown in igu e 1.5. No e ha gi en he non-de e minis ic na u e o AI,
we ge a sligh ly di e en esponse each ime we un his. In chap e 3, we will see how
we can con ol some o his using di e en op ions and nudge he model owa d
being mo e de e minis ic. Some o hese op ions a e how la ge language models
(LLMs), a ype o gene a i e AI model, wo k. We will explo e LLMs in he nex
chap e .
Figu e 1.5 Gene a ed code
While we ou line code gene a ion, LLMs can be help ul in o he aspec s ela ed o
code, such as explainabili y, op imiza ion, and eadabili y. We will ge in o mo e
de ails abou he code la e in chap e 5.
1.3.5 Abili y o sol e logic p oblems
Gene a i e AI's abili y o eason is imp essi e. Due o he eme gen beha io o hese
AI echniques, i we can exp ess he p oblem domain ia a p omp , we can ge an
answe and a s ep-by-s ep hough p ocess o he app oach. This esembles how
humans hink, b eak down, and app oach a p oblem.
Fo example, we can ask he model o sol e a simple ma h equa ion, explain he
s eps, and gi e us he answe .
Sol e his p oblem and hink i h ough s ep by s ep.
I x = 80 and y = 60
Wha is x when y = 30?
151.3 Wha can we gene a e?
He e is he esponse showing he model wo king h ough i s s eps and he exac
hough p ocess i unde wen . Achie ing some hing like his ha can gene alize
ac oss a wide domain space wi h adi ional AI would be di icul , i no impossible:
Gi en ha x = 80 when y = 60, we can es ablish a p opo ion o sol e o x when
y = 30.
In his case, we assume a linea ela ionship be ween x and y, ha is, x is di ec ly
p opo ional o y. So, when y is hal ed, x will also be hal ed.
Using he gi en alues:
x1/y1 = x2/y2
Whe e:
x1 = 80
y1 = 60
y2 = 30
and we' e ying o ind x2.
We can se up he equa ion:
80/60 = x2/30
Sol ing o x2, we ge :
x2 = (80/60) * 30
This simpli ies o:
x2 = (4/3) * 30
x2 = 40
So, i y = 30, hen x = 40.
T adi ional AI canno sol e his p oblem because i has a na ow unde s anding o he
wo ld. I canno unde s and he ques ion’s in en o ou line a s ep-by-s ep p ocess
because adi ional AI is no gene a i e. I can only ollow ins uc ions ha a e explic-
i ly p og ammed in o i .
Gene a i e AI, howe e , can comp ehend he in en o he ques ion because i has
a b oade unde s anding o he wo ld and can gene a e s ep-by-s ep p ocesses.
Ano he aspec ha allows his is he eme gen beha io o gene a i e AI models. This
beha io is he abili y o ou line a s ep-by-s ep p ocess. I is no p esen in any o he
indi idual componen s o he model bu eme ges om he in e ac ion o he compo-
nen s. The nex chap e will co e eme gen beha io in mo e de ail when in oduc-
ing la ge language models.
1.3.6 Gene a ing music
Simila o how we can use p omp s and gene a e images, we can do he same wi h
music. Music gene a ion is s ill new compa ed o ex , bu he e a e apid
22 CHAPTER 1 In oduc ion o gene a i e AI
model such as GPT-4, a big language model, does no make any di e ence by i sel .
These ad anced AI sys ems mus be implemen ed and connec ed o he en e p ise’s
business lines and p ocesses like any o he ex e nal so wa e. We will see examples o
how o implemen his in subsequen chap e s.
A a high le el, he e should be ew changes om an o e all app oach; en e p ises
a e s ill ad ised o ake a hough ul and s a egic app oach when inco po a ing gen-
e a i e AI. The ollowing a e a ew key conside a ions— hese span a ious dimensions
ha mos en e p ises need o conside , om s a egic o business o echnical:
C awl, walk, and un. S a small, and do no ush in o do oo much oo soon.
S a wi h a small pilo p ojec o e alua e, lea n, and adap . This is a complex
echnology, and i akes ime o de elop and deploy e ec i e gene a i e AI
applica ions. Do no expec o see esul s o e nigh .
De ine clea objec i es and he igh use cases. I is impo an o en e p ises o ca e-
ully e alua e po en ial use cases and selec hose ha a e mos likely o deli e
alue. The selec ed use case will guide he choice o AI models, da a p epa a ion,
and esou ce alloca ions. Some gene a i e AI applica ions a e mo e ma u e and
ha e a p o en eco d o success, while o he s a e s ill in hei ea ly days.
Es ablish go e nance policies. Gene a i e AI can gene a e da a, some o which may
be sensi i e o ha m ul. En e p ises mus es ablish go e nance policies o
ensu e his da a is used esponsibly and secu ely. These policies should add ess
p oblems such as da a owne ship, p i acy, and secu i y.
Es ablish esponsible AI and e hical go e nance. Conside ing he e hical implica ions
o using gene a i e AI is impo an . Es ablish a sepa a e esponsible AI and e h-
ical se o policies ha e lec he company’s alues and ha a e impo an o
managing i s epu a ion and b and. This includes conce ns a ound bias in AI
ou pu s, he po en ial misuse o gene a ed con en , hallucina ions and inco -
ec de ails in gene a ed con en , and he implica ions o au oma ing asks ha
humans p e iously pe o med. A obus AI go e nance and e hics amewo k
can help manage hese isks.
Expe imen and i e a e. Unlike compu e science, AI, pa icula ly gene a i e AI, is
nonde e minis ic, and depending on he model pa ame e s and se ings, he
ou pu can be qui e di e en . As wi h any AI applica ion, i is essen ial o ake
an i e a i e app oach when implemen ing gene a i e AI. S a wi h smalle
p ojec s, lea n om he ou comes, and g adually scale up. This app oach helps
o manage isk and gain p ac ical expe ience.
Design o ailu e. Mos gene a i e AI models oday a e comme cially a ailable as
cloud APIs. As such, hey a e complex and ha e a conside able la ency com-
pa ed o mo e adi ional APIs. En e p ises should adhe e o cloud bes p ac-
ices and design o ailu e. They should also ac o in bes p ac ices o e y
mechanics, including exponen ial backo policies, caching, secu i y, e c.
Expand exis ing a chi ec u e. These new gene a i e AI endpoin s a e jus addi ional
pieces o he o e all sys em. As such, mos o ganiza ions will wan o keep hei

231.8 A chi ec u e conside a ions
exis ing a chi ec u e guidance and p ac ices and expand hei exis ing a chi ec-
u e and bes p ac ices, a he han s a ing om sc a ch. New cons uc s, such
as con ex windows, okens, and embeddings, need o be inco po a ed.
B ing you da a. One o he main di e en ia o s en e p ises ha e is hei p op i-
e a y da a and associa ed p omp s; he e o e, de e mining how one can u ilize
hei p op ie a y in e nal da a when using GenAI-powe ed applica ions is c u-
cial. This needs o be ancho ed in he use cases a hand, and i no managed
p ope ly, i can ge complex quickly, which will be co e ed in la e chap e s
when we alk abou RAG.
Manage cos . Gene a i e AI is complex and much mo e expensi e. The cos is yp-
ically measu ed di e en ly (such as in okens) and no in API calls. Much o his
is new and di e en o en e p ises, and he cos s can easily ge ou o hand.
Complemen adi ional AI. In mos cases, gene a i e AI would help assis exis ing
in es men in adi ional AI ha en e p ises al eady ha e. Bo h se s o echnolo-
gies a e no mu ually exclusi e bu a he suppo each o he .
Open-sou ce e sus comme cial models. Some models a e comme cially a ailable,
such as Azu e OpenAI’s GPT models, and some a e open sou ce, such as S able
Di usion. Depending on he use case, i is impo an o alida e which models
o use, wha he licensing allows, and wha legal and egula o y aspec s a e
al eady co e ed.
1.8 A chi ec u e conside a ions
Suppose you a e an en e p ise de elope who is seeing all he news on gene a i e AI
and he a ious p oduc announcemen s om majo echnology companies. In ha
case, you migh hink ha o AI, e e y hing has changed. S ill, in eali y, no hing has
changed.
F om an en e p ise pe spec i e, he e a e new aspec s o gene a i e AI ha one
needs o conside —mos , i no all, o hese would be hings o add o exis ing a chi-
ec u e bes p ac ices and guidance, a he han h owing ou any hing. We will co e
he de ails la e in he book, bu new a chi ec u al pa e ns mus be accoun ed o a a
high le el. We ha e al eady ouched on many o hese, bu he key ones a e
P omp s—We will see how o assess enginee ing and managing aspec s a ound
p omp s, including okens and con ex windows.
Model adap a ion—The aim is o make he ou pu be e o speci ic asks.
In eg a ing gene a i e AI in o exis ing en e p ise line-o -business sys ems—These new
AI models alone do no sol e a business p oblem.
Design o ailu e—This aspec is no hing new pe se when building mission-
c i ical sys ems, bu many s ill ake sho cu s.
Cos and ROI—These gene a i e AI sys ems a e emendously expensi e
because he unde lying compu e is e y expensi e as well. The cos s will come
down o e ime, bu hey mus be consciously planned and designed up on .
24 CHAPTER 1 In oduc ion o gene a i e AI
Fo example, he cos o GPT-3.5 Tu bo om OpenAI came down by 90%, and
i s quali y wen up by 90% compa ed o GPT-3 [5].
Implemen policies and app oaches o open sou ce (OSS) e sus comme cial models—
Each week, newe models powe AI sys ems and a e eleased. Some a e com-
me cial and o he s a e OSS, wi h di e en licensing s uc u es.
Vendo —The e a e a ew endo s in p oduc ion ha en e p ises can use oday,
bu mo e a e coming soon. Today, wo o he mos ma u e a e OpenAI and
Azu e OpenAI. The o me a ge s smalle companies and s a ups, whe eas he
la e a ge s en e p ises. Google is also eleasing i s gene a i e AI sui e on Goo-
gle Cloud, and he e ha e been simila announcemen s om Amazon. In addi-
ion, many well- unded s a ups ha e announced simila p oduc s, such as
An h opic and Mis al. En e p ises need o conside each as a endo and iden-
i y which one hey would wan o u ilize and depend on.
1.9 So you en e p ise wan s o use gene a i e AI. Now wha ?
You en e p ise has aken a c i ical s ep owa d using gene a i e AI o d i e inno a ion
and e iciency. Howe e , unde s anding wha comes nex is c ucial o maximizing he
bene i s and mi iga ing he isks o his ad anced echnology.
To ge s a ed, we will use he example o implemen ing an En e p ise Cha GPT
and ou line he s eps needed a a high le el. Th oughou he nex ew chap e s, we
will dig in o mo e echnical de ails, including guidance on implemen a ion and bes
p ac ices. Figu e 1.7 shows a high-le el o e iew o wha a ypical wo k low in an en e -
p ise migh look like.
Figu e 1.7 High-le el o e iew o implemen ing gene a i e AI
You should s a by se ing clea goals o you cha bo . Wha challenges do you wan
o add ess wi h gene a i e AI? How can i help you he mos ? This could be any hing
om c ea ing con en o ma ke ing o enhancing cus ome se ice wi h cha bo s,
Goals
• Use cases
• Success c i e ia
1
Resou ces
• People
• So wa e
• Ha dwa e
2
Da a
• Cleaning
• Inges ion
• Indexing
3
In eg a e gene a i e AI
• Line o business app
• P omp enginee ing
• Sa e and esponsible AI
4
Deploy
• Tes MVP
• Deploy o p oduc ion
• Moni o
5
25Summa y
o ecas ing o business plans, o e en inno a ing new p oduc s o se ices. In ou
example, we a e building an En e p ise Cha GPT, such as OpenAI’s Cha GPT, bu one
ha is deployed and uns in an en e p ise en i onmen , using in e nal and p op i-
e a y da a, and only au ho ized use s can access i .
Nex , we need o ensu e ha we ha e he necessa y esou ces a ailable, ha is,
people wi h he igh compe encies, a sui able ha dwa e and so wa e amewo k,
de ining indica o s o success, and he app op ia e go e nance and e hics p inciples
in place.
Then, conside he da a. In ou example, he en e p ise cha bo would need
access o ele an , high-quali y en e p ise da a ha he use can employ. This da a
needs o be inges ed and indexed o help answe p op ie a y ques ions. Be o e ha ,
he da a mus be managed p ope ly, ensu ing p i acy and legal compliance. Remem-
be , he quali y o he da a ed will in luence he ou pu quali y.
Nex , we need o in eg a e he en e p ise cha bo in o he line o business applica-
ions ha add ess he use case and he p oblem we a e ying o add ess. As an en e -
p ise, we will also wan o add ess he isks associa ed wi h gene a i e AI and
implemen co po a e guidance a ound sa e y and esponsible AI.
Las ly, al hough we migh be eady o deploy in p oduc ion, implemen ing gene a-
i e AI is no a one- ime e en bu a jou ney. I equi es con inuous moni o ing, es -
ing, and ine- uning o ensu e i wo ks op imally and esponsibly. I ’s a good idea o
s a wi h smalle , manageable p ojec s and g adually scale up as you gain mo e con i-
dence and expe ise in handling his powe ul echnology.
Adop ing gene a i e AI is a signi ican commi men ha could ans o m you
en e p ise, bu i equi es ca e ul planning, app op ia e esou ces, ongoing moni o -
ing, and an unwa e ing ocus on e hical conside a ions. Wi h hese in place, you
en e p ise can eap he nume ous bene i s o gene a i e AI.
Summa y
Gene a i e AI can be used o mul iple use cases, such as en i y ex ac ion; gen-
e a ing speci ic and pe sonalized ex , images, code, and music; in e p e ing
ex ; and sol ing logical p oblems.
Gene a i e AI use cases can be ho izon al ac oss mos indus ies (such as cus-
ome se ices and pe sonalized ma ke ing) o indus y speci ic (such as aud
de ec ion in inance o pe sonalized ea men plans in heal hca e).
T adi ional AI p edominan ly ope a es in p ede ined na ow lanes and can ac
only in hose dimensions, unlike gene a i e AI, which is b oade and allows o
mo e lexibili y.
This chap e ou lined an app oach and a chi ec u e conside a ions o en e -
p ises o use when adop ing and implemen ing gene a i e AI.
26
In oduc ion o la ge
language models
La ge language models (LLMs) a e gene a i e AI models ha can unde s and and
gene a e human-like ex based on a gi en inpu . LLMs a e he ounda ion o many
na u al language p ocessing (NLP) asks, such as sea ch, speech- o- ex , sen imen
analysis, ex summa iza ion, and mo e. In addi ion, hey a e gene al-pu pose lan-
guage models ha a e p e ained and can be ine- uned o speci ic asks and
pu poses.
This chap e co e s
An o e iew o LLMs
Key use cases powe ed by LLMs
Founda ional models and hei e ec on AI
de elopmen
New a chi ec u e concep s o LLMs, such as
p omp s, p omp enginee ing, embeddings,
okens, model pa ame e s, con ex window, and
eme gen beha io
An o e iew o small language models
Compa ison o open sou ce and comme cial LLMs
272.1 O e iew o ounda ional models
This chap e explo es he ascina ing wo ld o LLMs and hei ans o ma i e
e ec on a i icial in elligence (AI). As a signi ican ad ancemen in AI, LLMs ha e
demons a ed ema kable capabili ies in unde s anding and gene a ing human-like
ex , hus enabling nume ous applica ions ac oss a ious indus ies. He e, we di e
in o he c i ical use cases o LLMs, he di e en ypes o LLMs, and he concep o
ounda ional models ha has e olu ionized AI de elopmen .
The chap e discusses essen ial LLM concep s, such as p omp s, p omp enginee -
ing, embeddings, okens, model pa ame e s, con ex windows, ans o me a chi ec-
u e, and eme gen beha io . Finally, we compa e open sou ce and comme cial LLMs,
highligh ing hei ad an ages and disad an ages. By he end o his chap e , you will
ha e a comp ehensi e unde s anding o LLMs and hei implica ions o AI applica-
ions and esea ch. LLMs a e buil on ounda ional models; he e o e, we will s a by
ou lining wha hese models a e be o e discussing LLMs in mo e dep h.
2.1 O e iew o ounda ional models
In oduced by S an o d esea che s in 2021, ounda ional models ha e subs an ially
ans o med he cons uc ion o AI sys ems. They di e ge om ask-speci ic models,
shi ing o b oade , mo e adap able models ained on la ge da a olumes. These
models can excel in di e se na u al language asks, such as machine ansla ion and
ques ion answe ing, as hey lea n gene al language ep esen a ions om ex ensi e
ex and code da ase s. These ep esen a ions can hen be used o pe o m a ious
asks, e en asks hey we e no explici ly ained on, as shown in igu e 2.1.
In mo e echnical e ms, ounda ional models u ilize es ablished machine lea ning
echniques such as sel -supe ised lea ning and ans e lea ning, enabling hem o
apply acqui ed knowledge ac oss a ious asks. De eloped by means o deep lea ning,
hese models employ mul ilaye ed a i icial neu al ne wo ks o comp ehend complex
da a pa e ns; hence, hei p o iciency wi h uns uc u ed da a such as images, audio,
and ex . This also ex ends o 3D signals—da a ep esen ing 3D a ibu es ha cap u e
spa ial dimensions and dep h, such as 3D poin clouds om LiDAR senso s, 3D medical
imaging such as CT scans, o 3D models used in compu e g aphics and simula ions.
These can be u ilized o make p edic ions based on 3D da a o asks such as objec ec-
ogni ion, scene unde s anding, and na iga ion in obo ics and au onomous ehicles.
NOTE T ans e lea ning is a machine lea ning echnique in which a model
de eloped o one ask is eused as a s a ing poin o a simila ask. Ins ead
o s a ing om sc a ch, we use he knowledge om he p e ious ask o pe -
o m be e on he new one. I ’s like using knowledge om a p e ious job o
excel a a new bu ela ed job.
Gene a i e AI and ounda ional models a e closely in e linked. As ou lined, ounda-
ional models, ained on massi e da ase s, can be adap ed o pe o m a ious asks;
his p ope y makes hem pa icula ly sui able o gene a i e AI and allows o c ea ing

28 CHAPTER 2 In oduc ion o la ge language models
new con en . The b oad knowledge base o hese models allows o e ec i e ans e
lea ning, which can be used o gene a e new, con ex ually app op ia e con en ac oss
di e se domains. They ep esen a uni ied app oach, whe e a single model can gene -
a e a ious ou pu s, o e ing s a e-o - he-a pe o mance owing o hei ex ensi e
aining. Wi hou ounda ional models as he backbone, he e would be no gene a i e
AI models.
Figu e 2.1 Founda ional model o e iew
He e a e some examples o he common ounda ion models:
GPT (Gene a i e P e- ained T ans o me ) Family is an NLP amily o models
de eloped by OpenAI. I is a la ge language model ained on a massi e da ase
o ex and code, which makes i capable o gene a ing ex , ansla ing lan-
guages, w i ing c ea i e con en , and answe ing you ques ions in o ma i ely.
GPT-4, he la es e sion a he ime o his w i ing, is also a mul imodal
model—i can manage bo h language and images.
Codex is a la ge language model ained speci ically on code ha is used o
help wi h code gene a ion. I suppo s o e a dozen p og amming languages,
Founda ional
model
T ans o me model
Tex
Images
Speech
S uc u ed
da a
3D signals
Q&A
Sen imen analysis
In o ma ion
ex ac ion
Image cap ioning
Objec ecogni ion
Ins uc ion ollow
Code gene a ion
Code
unde s anding
Tasks
Adap a ionT aining
Da a
292.2 O e iew o LLMs
including some o he mo e commonly used, such as C#, Ja a, Py hon, Ja a-
Sc ip , SQL, Go, PHP, and Shell, among o he s.
Claude is an LLM buil by a s a up called An h opic. Like OpenAI’s Cha GPT,
i p edic s he nex oken in a sequence when gi en a ce ain p omp and can
gene a e ex , w i e code, summa ize, and eason.
BERT (Bidi ec ional Encode Rep esen a ions om T ans o me s) is an NLP
model de eloped by Google. I is a bidi ec ional model, meaning i can p ocess
ex in bo h di ec ions, om le o igh and igh o le . This ea u e makes i
be e a unde s anding he con ex o wo ds and ph ases.
PaLM (Pa hway Language Model) and i s successo PaLM2 a e la ge mul i-
modal language models de eloped by Google. The mul imodal model can p o-
cess ex , code, and images simul aneously, making i capable o pe o ming a
wide ange o asks ac oss hose modali ies compa ed o adi ional language
models ope a ing only in one modali y.
Gemini is Google’s la es AI model, capable o unde s anding ex , images, id-
eos, and audio. I ’s a mul imodal model desc ibed as being able o comple e
complex asks in ma h, physics, and o he a eas, as well as unde s anding and
gene a ing high-quali y code in a ious p og amming languages. Gemini was
buil om he g ound up o be mul imodal, meaning i can gene alize and
seamlessly unde s and, ope a e ac oss, and combine di e en ypes o in o ma-
ion. I ’s also he new umb ella name o all o Google’s AI ools, eplacing Goo-
gle Ba d and Due AI, and is conside ed a successo o he PaLM model.
Once a ounda ional model is ained, i can be adap ed o a wide ange o down-
s eam asks by ine- uning i s pa ame e s. Fine- uning in ol es adjus ing he model’s
pa ame e s o op imize he model o a speci ic ask. I can be done using a small
amoun o labeled da a. By ine- uning hese models o speci ic asks o domains, we
use hei gene al unde s anding o language and supplemen i wi h ask-speci ic
knowledge. The bene i s o his app oach include ime and esou ce e iciency, cou-
pled wi h ema kable e sa ili y. We can also adap a model ia P omp enginee ing,
which we’ll discuss la e in his chap e . Now ha we know mo e abou ounda ional
models, le ’s explo e LLMs.
2.2 O e iew o LLMs
LLMs ep esen a signi ican ad ancemen in AI. They a e ained on a as amoun o
ex da a, such as books, a icles, and websi es, o lea n pa e ns in human language.
They a e also ha d o de elop and main ain, as hey equi e lo s o da a, compu ing,
and enginee ing esou ces. OpenAI’s Cha GPT is an example o an LLM—i gene -
a es human-like ex by p edic ing he p obabili y o a wo d conside ing he wo ds
al eady used in he ex .
The model lea ns o gene a e cohe en and con ex ually ele an sen ences by
adjus ing i s in e nal pa ame e s o minimize he di e ence be ween i s p edic ions
30 CHAPTER 2 In oduc ion o la ge language models
and he ac ual ou comes in he aining da a. When gene a ing ex , he model
chooses he wo d wi h he highes p obabili y as i s subsequen ou pu and hen
epea s he p ocess o he nex wo d.
LLMs a e ounda ional models adap ed o na u al language p ocessing and lan-
guage gene a ion asks. These LLMs a e gene al-pu pose and can handle asks wi h-
ou ask-speci ic aining da a. As b ie ly desc ibed in he p e ious chap e , gi en he
igh p omp , hey can answe ques ions, w i e essays, summa ize ex s, ansla e lan-
guages, and e en gene a e code. LLMs can be applied o many applica ions ac oss di -
e en indus ies, as ou lined in chap e 1— om summa iza ion o classi ica ion,
Q&A cha bo s, con en gene a ion, da a analysis, en i y ex ac ion, and mo e. Be o e
we ge in o mo e de ails o LLMs, le us look a he T ans o me a chi ec u e, which
makes hese ounda ional models possible.
2.3 T ans o me a chi ec u e
T ans o me s a e he bed ock o ounda ional models and a e esponsible o hei
ema kable language unde s anding capabili ies. The T ans o me model was i s
in oduced in he pape “A en ion Is All You Need” by Vaswani e al. in 2017 [1].
Since hen, T ans o me -based models ha e become s a e-o - he-a o many asks.
GPT and BERT a e examples o T ans o me -based models, and he “T” in GPT s ands
o T ans o me s.
A hei co e, T ans o me s use a mechanism known as a en ion (speci ically sel -
a en ion), which allows he model o conside he en i e con ex o a sen ence, con-
side ing all wo ds simul aneously a he han p ocessing he sen ence wo d by wo d.
This app oach is mo e e icien and can imp o e he esul s o many NLP asks.
The s eng h o his app oach is ha i cap u es dependencies ega dless o hei
posi ion in he ex , which is an essen ial ac o in language unde s anding. This is key
o asks such as machine ansla ion and ex summa iza ion, whe e he meaning o a
sen ence can depend on e ms ha a e se e al wo ds apa .
T ans o me s can pa allelize hei compu a ions, which makes hem much as e
o ain han o he ypes o neu al ne wo ks. This mechanism enables he model o
pay a en ion o he mos ele an pa s o he ask inpu .
In he con ex o gene a i e AI, a ans o me model would ake an inpu (such as
a p omp ) and gene a e an ou pu (such as he nex wo d o he comple ion o he
sen ence) by weighing he impo ance o each pa o he inpu in gene a ing he ou -
pu . Fo example, in he sen ence “The ca sa on he...,” a T ans o me model would
likely gi e much weigh o he wo d “ca ” when de e mining ha he likely nex wo d
migh be “ma .” These models exhibi gene a i e p ope ies by p edic ing he nex
i em in a sequence— he nex wo d in a sen ence o he nex no e in a melody. We
explo e his mo e in he nex chap e .
T ans o me models a e usually e y la ge, equi ing signi ican compu a ional
esou ces o ain and use. Using a ca analogy, hink o T ans o me models as
312.5 Types o LLMs
supe cha ged engines ha need much powe o un bu do amazing hings. Think o
hem as he nex s ep a e models such as ResNET 50, which is used o ecognizing
images. While ResNET 50 is like a ca wi h 50 gea s, OpenAI’s GPT-3 is like a
mega uck wi h 96 gea s and ex a ea u es. Because o hei ad anced capabili ies,
hese models a e a op pick o c ea ing in elligen AI ou pu s.
LLMs use ans o me s, which a e composed o an encode and a decode . The
encode p ocesses he inpu ex (i.e., he p omp ) and gene a es a sequence o hid-
den s a es ha ep esen he meaning o he inpu ex . The decode uses hese hid-
den s a es o gene a e he ou pu ex . These encode s and decode s o m one laye ,
simila o a mini-b ain. Mul iple laye s can be s acked one upon ano he . As ou lined
ea lie , GPT3 is a decode -only model wi h 96 laye s.
2.4 T aining cu o
In he con ex o ounda ional models, he aining cu o e e s o he poin a which
he model’s aining ends, ha is, he ime un il he da a used o ain he model was
collec ed. In he case o AI models de eloped by OpenAI, such as GPT-3 o GPT-4, he
aining cu o is when he model was las ained on new da a.
This cu o is impo an because a e his poin , he model is no awa e o any
e en s, ad ancemen s, new concep s, o changes in language usage. Fo example, he
aining da a cu o o he GPT-3.5 Tu bo was in Sep embe 2021, GPT-4 Tu bo in
Ap il 2023, and GPT-4o in Oc obe 2023, meaning he model does no know abou
eal-wo ld e en s o ad ancemen s in a ious ields beyond ha poin .
The key poin is ha while hese models can gene a e ex based on he da a hey
we e ained on, hey do no lea n o upda e hei knowledge a e he aining cu o .
They canno access o e ie e eal- ime in o ma ion om he in e ne o any ex e nal
da abase. Thei esponses a e gene a ed pu ely based on pa e ns hey ha e lea ned
du ing hei aining pe iod.
NOTE The ecen announcemen ha he p emium e sions o Cha GPT will
ha e access o he in e ne ia he Bing plugin doesn’ mean ha he model
has mo e up- o-da e in o ma ion. This uses a pa e n called RAG ( e ie al-
augmen ed gene a ion), which will be co e ed la e in chap e 7.
2.5 Types o LLMs
As shown in able 2.1, he e a e h ee ca ego ies o LLMs. When we alk abou LLMs,
ha ing he con ex is c ucial, and i migh no be e iden in some cases. This is o g ea
impo ance, as he pa hs we can go down when using he models a en’ in e change-
able, and picking he igh ype depends on he use case one ies o sol e. Fu he -
mo e, he e is also a dependency on how e ec i ely one can adap he models o
speci ic use cases.
38 CHAPTER 2 In oduc ion o la ge language models
2.8 Key concep s o LLMs
This sec ion desc ibes he a chi ec u e o a ypical LLM implemen a ion. Figu e 2.3
shows he abs ac s uc u e o a common LLM implemen a ion a a high le el; i ol-
lows his p ocess whene e we use an LLM such as OpenAI’s GPT.
Figu e 2.3 Concep ual a chi ec u e o an LLM
The model s a s wi h he inpu ex — he p omp . I is i s con e ed in o a sequence
o okens using okeniza ion. Each oken is hen con e ed in o a nume ical ec o ia
a p ocess called embedding, which ac s as he encode inpu .
(con inued)
Azu e OpenAI o e s addi ional en e p ise- eady ea u es, such as da a p i acy,
cus ome -managed keys, enc yp ion a es , p i a e ne wo king, egional a ailabili y,
and esponsible AI con en il e ing. These ea u es can be impo an o businesses
ha need o comply wi h speci ic secu i y o egula o y equi emen s.
Consequen ly, he APIs be ween he wo a e simila bu no he same. Howe e , he
unde lying models a e he same, and Azu e OpenAI has a deploymen ha inco po-
a es hese addi ional ea u es ha mos en e p ises equi e.
Inpu ex
(p omp )
Token
Embedding
Encode
Decode
…………………..
…………………..
…………………..
Gene a ed ex
(comple ion)
Nume ical ep esen a ion
Needed o scena ios such
as “B ing you own da a,”
sea ch, e c.
LLM
…………………..
…………………..
…………………..
…………………..
…………………..
………………….

392.8 Key concep s o LLMs
The encode p ocesses he inpu sequence and gene a es a sequence o hidden
s a es. These hidden s a es a e hen ed in o he decode wi h a s a oken. The
decode gene a es he ou pu sequence one oken a a ime by p edic ing he nex
oken based on he p e ious okens and hidden s a es.
Once all he laye s ha e p ocessed he in o ma ion, he model p edic s he nex
oken in he lea ned sequence. This ou come is con e ed back o he ex , and we see
he esponse. This p ocess uns in an i e a i e loop and occu s o each new oken
gene a ed, hus c ea ing a cohe en ex ou pu . The inal ex ha he model gene -
a es is an eme gen p ope y o his laye ed, i e a i e p ocess. The inal ou pu
sequence is also called a comple ion.
Le ’s examine each o hese LLM aspec s in mo e de ail.
2.8.1 P omp s
A p omp is how we “ alk” o hese models. A p omp is jus ex desc ibing he ask we
a e ying o achie e using na u al language. The ou pu o hese models is also ex .
The abili y o exp ess ou in en ion in his manne (na u al language) ins ead o con-
o ming o he inpu es ic ions o a machine makes p omp s powe ul. C a ing o
designing he ex in he p omp is akin o p og amming he model and c ea ing a
new pa adigm called p omp enginee ing, which will be co e ed la e in he book.
P omp s allow us o cap u e a b oade se o in en s and emo ions using he same
pa adigm. In he ollowing wo examples (gene a ed by AI), a p omp asks o a Haiku
ou lining how g ea LLMs a e: one haiku is happy, and he o he is sad. Ge ing his
le el o emo ion and exp essi eness on he same ask was impossible.
W i e a happy haiku on how g ea LLMs a e.
AI-gene a ed esponse
LLMs a e g ea
They can cha and gene a e
C ea i e and sma
W i e a sad haiku on how g ea LLMs a e.
AI-gene a ed esponse
LLMs a e g ea
Bu hey canno eel o lo e
Only simula e
NOTE LLMs do no unde s and ex like humans do. They a e s a is ical mod-
els ha lea n pa e ns in da a and use hose pa e ns o gene a e ou pu .
Despi e hei imp essi e ou pu , hese models don’ comp ehend he con en
gene a ed in he same sense as we do—including belie s, desi es, conscious-
ness, and igh and w ong. They a e jus ex emely good a p edic ing he nex
wo d in a sequence o ex based on pa e ns hey’ e seen millions o imes.
40 CHAPTER 2 In oduc ion o la ge language models
2.8.2 Tokens
Tokens a e he basic uni s o ex ha an LLM uses o p ocess bo h he eques and
he esponse, ha is, o unde s and and gene a e ex . Tokeniza ion is he p ocess o
con e ing ex in o a sequence o smalle uni s called okens. When using LLMs, we
use okens o con e se wi h hese models, which is one o he mos undamen al ele-
men s o unde s anding LLMs.
Tokens a e he new cu ency when inco po a ing LLMs in o you applica ion o
solu ions. They di ec ly co ela e wi h he cos o unning models, bo h in e ms o
money and o he expe ience wi h la ency and h oughpu . The mo e okens, he
mo e p ocessing he model mus do. This means mo e compu a ional esou ces a e
equi ed o he model, which means lowe pe o mance and highe la ency.
LLMs con e he ex in o okens be o e p ocessing. Depending on he okeniza-
ion algo i hm, hey can be indi idual cha ac e s, wo ds, sub-wo ds, o e en la ge lin-
guis ic uni s. A ough ule o humb is ha one oken is app oxima ely ou cha ac e s
o 0.75 wo ds o English ex . Fo mos LLMs oday, he oken size ha hey suppo
includes bo h he inpu p omp and he esponse.
Le ’s illus a e his h ough an example. Figu e 2.4 shows how he sen ence “I ha e
a whi e dog named Champ” ge s okenized (using OpenAI’s okenize in his case).
Each block ep esen s a di e en oken. In his example, we use eigh okens.
Figu e 2.4 Tokenize example
LLMs gene a e ex by p edic ing he nex wo d o symbol ( oken) mos likely o ol-
low a gi en sequence o wo ds o symbols ( okens) hey use as inpu , ha is, he
p omp . We show a isual ep esen a ion o his in igu e 2.5, whe e he lis o okens
on he igh shows he highes p obabili y o okens ollowing he p omp “The dog
sa on.” We can in luence some o his p obabili y o okens using a ew pa ame e s we
will see la e in he chap e .
Suppose we ha e a sequence o okens wi h a leng h o n. U ilizing hese n okens
as he con ex , we gene a e he subsequen oken, n + 1. This newly p edic ed oken is
hen appended o he o iginal sequence o okens, he eby expanding he con ex .
Consequen ly, he expanded con ex window o gene a ing oken n + 2 becomes
Tokeniza ion
I ha e a whi e dog named Champ .
12345678
412.8 Key concep s o LLMs
n + (n + 1). This p ocess is epea ed in a con inuous loop un il a p ede e mined s op
condi ion, such as a speci ic sequence o a size limi o he okens, is eached.
Fo example, i we ha e a sen ence, “Hawaiian pizza is my a o i e,” he p obabili y
dis ibu ion o he nex wo d we see is shown in igu e 2.6. The mos likely wo d is
“ ype,” inishing he sen ence “Hawaiian pizza is my a o i e ype.”
Figu e 2.6 Nex oken p obabili y dis ibu ion
I you un his example again, you will ge a p obabili y di e en om he one shown
he e. This is because mos AI is nonde e minis ic, speci ically in he case o LLMs.
Simul aneously, i migh p edic one oken, and i is p obably being looked a ac oss
all he possible okens ha he model has lea ned in he aining phase.
We also use wo examples ha ou line how one oken changes he dis ibu ion d a-
ma ically (changing one wo d om “ he” o “a”). Figu e 2.7 shows ha he mos
LLM
Inpu oken
ec o
Vec o ep esen a ion
o nex ou pu oken
he
ma
pad
…
…
…
Highes
p obabili y
Second highes
p obabili y
Less
likely
Nex wo d
………
………
The dog sa on
………
………
………
………
………
……… ………
………
Figu e 2.5 LLM—nex oken p edic o
Hawaiian pizza is my a o i e
42 CHAPTER 2 In oduc ion o la ge language models
p obable nex oken is “ma ” a 41% p obabili y. We also see a lis o he o he okens
and hei p obabilis ic dis ibu ions.
Figu e 2.7 Example 1
Howe e , changing one oken om “ he” o “a” d ama ically changes he nex dis i-
bu ion se , wi h he ma jumping up 30 poin s o a p obabili y o nea ly 75%, as shown
in igu e 2.8.
Figu e 2.8 Example 2
Some se ings ela ed o LLMs a e impo an and can change how he model beha es
and gene a es ex . These se ings a e he model con igu a ions and can be changed
ia an API, GUI, o bo h. We co e model con igu a ions in mo e de ail la e in he
chap e .
2.8.3 Coun ing okens
Many de elope s will p obably be new o acking okens when using LLM, especially
in an en e p ise se ing. Howe e , coun ing okens is impo an o se e al easons:
Memo y limi a ions—LLMs can p ocess a maximum numbe o okens in a single
pass. This is due o he memo y limi a ions o hei a chi ec u e, o en de ined
by hei con ex window (ano he concep we discuss la e in his chap e ). Fo
example, OpenAI’s la es GPT-4o model has a con en window o 128K, and
432.8 Key concep s o LLMs
Google’s la es Gemini 1.5 P o has a con ex window o 1M okens. GPT3.5-
Tu bo, ano he OpenAI model, has wo models suppo ing 8K and 16K oken
leng hs. The e is esea ch ongoing o see how o sol e his, such as LongNe [6]
om Mic oso Resea ch, which shows how o scale o 1B con ex windows. I is
impo an o poin ou ha his is s ill an ac i e esea ch a ea and has no been
p oduc ized ye .
Cos —When hinking abou cos , he e a e wo dimensions: he compu a ional
cos s in e ms o la ency, memo y, and he o e all expe ience, and he ac ual
cos in e ms o money. Fo each call, he compu a ional esou ces equi ed o
p ocessing okens di ec ly co ela e o he okens’ leng h. As he oken leng h
inc eases, i equi es mo e p ocessing ime, leading o mo e compu a ional
equi emen s (speci ically memo y and GPUs) and highe la ency. This also
means inc eased cos s o using he LLMs.
AI quali y—The quali y o a model’s ou pu depends on he numbe o okens i
is asked o gene a e o p ocess. I he ex is oo sho , he model migh no
ha e enough con ex o p o ide a good answe . Con e sely, i he ex is oo
long, he model migh lose cohe ence in i s esponse. We will ouch on he
no ion o good e sus poo as pa o p omp enginee ing la e in chap e 6.
Fo many en e p ises, cos and pe o mance a e key ac o s in deciding whe he o
use okens. Gene ally speaking, smalle models a e mo e cos -e ec i e and e icien
han bigge ones.
Lis ing 2.1 shows a simple way o calcula e he numbe o okens. In his example,
we use an open sou ce lib a y called ik oken, eleased by OpenAI. This okenize
lib a y implemen s a by e-pai encoding (BPE) algo i hm. These okenize s a e
designed wi h hei espec i e LLMs, ensu ing e icien okeniza ion and op imal pe -
o mance du ing p e aining and ine- uning p ocesses. I you use one o he OpenAI
models, you mus use his okenize ; many o he ans o me models also use i . I
needed, you can ins all he ik oken lib a y using pip ins all ik oken
impo ik oken as k
de coun _ okens(s ing: s , encoding_name: s ) -> in :
# Ge he encoding
encoding = k.ge _encoding(encoding_name)
# Encode he s ing
encoded_s ing = encoding.encode(s ing)
# Coun he numbe o okens
num_ okens = len(encoded_s ing)
e u n num_ okens
# De ine he inpu s ing
p omp = “I ha e a whi e dog named Champ”
Lis ing 2.1 Coun ing okens o GPT
The encoding speci ies how he
ex is con e ed in o okens.

44 CHAPTER 2 In oduc ion o la ge language models
# Display he numbe o okens in he S ing
p in (“Numbe o okens:” , coun _ okens(p omp , “cl100k_base”))
Running his code, as expec ed, gi es us he ollowing ou pu :
$ py hon coun ing okens.py
Numbe o okens: 7
NOTE By e-pai encoding (BPE) is a comp ession algo i hm widely used in
NLP asks, such as ex classi ica ion, ex gene a ion, and machine ansla-
ion. One o he BPE ad an ages is ha i is e e sible and lossless, so we can
ge he o iginal ex . BPE wo ks on any ex ha he okenize ’s aining da a
hasn’ seen, and i comp esses he ex , esul ing in sho e oken sequences
han he o iginal ex . BPE also helps gene alize epea ing pa e ns in a lan-
guage and p o ides a be e unde s anding o g amma . Fo example, he
ge und -ing o m is qui e common in English (swimming, unning, debug-
ging, e c.). BPE will spli i in o di e en okens, so “swim” and “-ing” in swim-
ming become wo okens and gene alize be e .
I we a e no su e o he name o he encoding o use, ins ead o he unc ion ge _
encoding(), we can use he encoding_ o _model() unc ion. This akes he name o
he model we wan o use and u ilizes he co esponding encoding, such as encoding
= ik oken.encoding_ o _model('gp -4'). Fo OpenAI, able 2.3 shows di e en
suppo ed encodings.
Lis ing 2.2 shows how o use di e en encodings and how o ge he o iginal ex om
he okens. We should unde s and his as a basic cons uc o now, bu i is use ul o
mo e ad anced use cases such as caching and chunking ex —aspec s ha we co e
la e in he book.
impo ik oken as k
de ge _ okens(s ing: s , encoding_name: s ) -> s :
# Ge he encoding
encoding = k.ge _encoding(encoding_name)
# Encode he s ing
e u n encoding.encode(s ing)
Table 2.3 OpenAI encodings
Encoding OpenAI model
cl100k_base gp -4, gp -3.5- u bo, gp -35- u bo, ex -embedding-ada-002
p50k_base Codex models, ex -da inci-002, ex -da inci-003
50k_base GPT-3 models (da inci, cu ie, babage, ada)
Lis ing 2.2 Tokens
452.8 Key concep s o LLMs
de ge _s ing( okens: s , encoding_name: s ) -> s :
# Ge he encoding
encoding = k.ge _encoding(encoding_name)
# Decode he okens
e u n encoding.decode( okens)
# De ine he inpu s ing
p omp = “I ha e a whi e dog named Champ.”
# Display he okens
p in (“cl100k_base Tokens:” , ge _ okens(p omp , “cl100k_base”))
p in (“ p50k_base Tokens:” , ge _ okens(p omp , “p50k_base”))
p in (“ 50k_base Tokens:” , ge _ okens(p omp , “ 50k_base”))
p in (“O iginal S ing:” , ge _s ing([40, 617, 264, 4251, 5679, 7086, 56690,
13], “cl100k_base”))
$ py hon encodings.py
cl100k_base Tokens: [40, 617, 264, 4251, 5679, 7086, 56690, 13]
p50k_base Tokens: [40, 423, 257, 2330, 3290, 3706, 29260, 13]
50k_base Tokens: [40, 423, 257, 2330, 3290, 3706, 29260, 13]
O iginal S ing: I ha e a whi e dog named Champ.
In addi ion o he ik oken lib a y we ha e been using in he examples, he e a e a
ew o he popula okenize s. Remembe ha each okenize is designed o he co -
esponding LLM and canno be in e changed:
Wo dPiece—Used by he BERT model om Google, i spli s ex in o smalle
uni s based on he mos equen wo d pieces, allowing o e icien ep esen a-
ion o a e o ou -o - ocabula y wo ds.
Sen encePiece—Me a’s RoBERTa model (Robus ly Op imized BERT) uses he
model. I combines Wo dPiece and BPE app oaches in o a single language-
agnos ic amewo k, allowing o mo e lexibili y.
T5 okenize —Based on Sen encePiece, i is used by Google’s T5 model (Tex - o-
Tex T ans e T ans o me ).
XLM okenize —This is used in Me a’s XLM (C oss-lingual Language Model) and
implemen s a BPE me hod wi h lea ned embeddings (BPEmb). I is designed o
handle mul ilingual ex and suppo c oss-lingual ans e lea ning.
2.8.4 Embeddings
Embeddings a e powe ul machine-lea ning ools o la ge inpu s ep esen ing wo ds.
They cap u e seman ic simila i ies in a ec o space (i.e., a collec ion o ec o s, as
shown in igu e 2.9), allowing us o de e mine i wo ex chunks ep esen he same
meaning. By p o iding a simila i y sco e, embeddings can help us be e unde s and
he ela ionships be ween di e en pieces o ex .
The idea behind embeddings is ha wo ds wi h simila meanings should ha e sim-
ila ec o ep esen a ions, as measu ed by hei dis ances. Vec o s wi h smalle dis-
ances be ween hem sugges hey a e highly ela ed, and hose wi h longe dis ances
46 CHAPTER 2 In oduc ion o la ge language models
sugges low ela edness. The e a e a ew ways o measu e simila i ies; we will co e
hese la e in chap e 7.
These ec o s a e lea ned du ing aining and a e used o cap u e he meaning o
wo ds o ph ases. AI algo i hms can easily u ilize hese ec o s o loa ing-poin
numbe s.
Figu e 2.9 Embeddings
Fo example, he wo d “ca ” migh be ep esen ed by a ec o as [0.2, 0.3, -0.1], while
he wo d “dog” migh be ep esen ed as [0.4, 0.1, 0.2]. These ec o s can hen be used
as inpu o machine lea ning models o asks such as ex classi ica ion, sen imen
analysis, and machine ansla ion.
Embeddings a e lea ned when he model is ained on a la ge co pus o ex da a.
The idea is o cap u e he meaning o wo ds o ph ases based on hei con ex in he
aining da a.
Depending on he ask, he e a e se e al algo i hms o c ea ing embeddings:
Simila i y embeddings a e good a cap u ing seman ic simila i y be ween wo o
mo e pieces o ex .
Tex sea ch embeddings measu e whe he long documen s a e ele an o a
sho que y.
Code sea ch embeddings a e use ul o embedding code snippe s and na u al
language sea ch que ies.
NOTE Embeddings c ea ed by one me hod canno be unde s ood by ano he .
In o he wo ds, i you c ea e an embedding using OpenAI’s API, embeddings
o ano he p o ide will no unde s and he ec o s c ea ed, and ice e sa.
Lis ing 2.3 shows how o ge an embedding ( om OpenAI in his example). We
de ine a unc ion called ge _embedding() ha akes a s ing o which we need o c e-
a e embeddings as a pa ame e . The unc ion uses OpenAI’s API o gene a e an
embedding o he inpu ex using he ex -embedding-ada-002 model. The embed-
ding is e u ned as a lis o loa ing-poin numbe s.
impo os
om openai impo OpenAI
clien = OpenAI(api_key=’you -API-key’)
Lis ing 2.3 Ge ing an embedding in OpenAI
Embedding
model
I ha e a whi e dog.
P omp
0.00608 0.01417 ….. 0.02123
P omp as ec o
472.8 Key concep s o LLMs
de ge _embedding( ex ):
esponse = clien .embeddings.c ea e(
model=" ex -embedding-ada-002",
inpu = ex )
e u n esponse.da a[0].embedding
embeddings = ge _embedding("I ha e a whi e dog named Champ.")
p in ("Embedding Leng h:", len(embeddings))
p in ("Embedding:", embeddings[:5])
The ec o space esul ing om he embedding isn’ a one- o-one mapping o he
okens bu can be a lo mo e. The ou pu o he p e ious examples is shown nex . Fo
b e i y, we only show he i s i e i ems in he lis :
p in ("Embedding Leng h:", len(embeddings))
p in ("Embedding:", embeddings[:5])
2.8.5 Model con igu a ion
Mos LLMs expose some con igu a ion se ings o he use , allowing one o weak how
he model ope a es and i s beha io o some ex en . While a ew pa ame e s would
change depending on he model implemen a ion, he h ee key con igu a ions a e
empe a u e, op p obabili y ( op_p), and max esponse. No e ha some implemen a-
ions migh ha e a di e en name bu mean he same hing. The OpenAI implemen-
a ion o GPT calls he maximum esponse as max okens. Le us explo e hese in a
li le mo e de ail.
MAX RESPONSE
The pa ame e known as max esponse essen ially de ines he uppe limi o he ex
leng h ha he model gene a es. This means ha once he model hi s his p ede e -
mined leng h, i hal s ex gene a ion, ega dless o whe he i is mid-wo d o mid-
sen ence. I ’s c ucial o g asp his con igu a ion because he e is a size limi o he
okens mos models can p ocess. Inc easing his size co esponds o heigh ened com-
pu a ional demands, leading o inc eased la ency and cos .
TEMPERATURE
When gene a ing ex , as wi h any ounda ional model, inhe en andomness yields a
di e en ou pu each ime we call he model. Tempe a u e is one o he mos impo -
an se ings o con olling he deg ee o he model’s andomness. Typically, his is a
alue om 0 o 1, wi h 0 ep esen ing a mo e accu a e and p edic able ou pu . In con-
as , se ing a 1 makes he ou pu mo e di e se and andom, as shown in igu e 2.10.
TOP PROBABILITY (TOP_P)
The op p obabili y ( op_p) pa ame e (also known as nucleus sampling) is a se ing
in language model APIs ha s ee s he andomness o he ex -gene a ion p ocess.
This pa ame e allows one o ine- une he balance be ween c ea i i y and eliabili y in
he ex ha he model gene a es. I de ines a h eshold p obabili y; only wo ds wi h
p obabili ies abo e his h eshold a e conside ed when he model gene a es ex .
When a language model gene a es ex , i p edic s he p obabili y o each wo d being
54 CHAPTER 2 In oduc ion o la ge language models
// XXX: ails when a and b e e o same memo y loca ion
# de ine XOR_SWAP(a,b) do
{
a ^= b;
b ^= a;
a ^= b;
} while (0)
// walk inwa ds om bo h ends o he S ing,
// swapping un il we ge o he middle
while (s < end)
{
XOR_SWAP(*s , *end);
s ++;
end--;
}
# unde XOR_SWAP
}
}
The i s s ep would be o ask ou LLM o con e his code o Py hon using a p omp
simila o he nex one. We a e omi ing he C unc ion (lis ing 2.4) again o b e i y.
The code is pa o he p omp .
Con e his C unc ion o Py hon:
#include <s ing.h> // o s len()
// e e se he gi en null- e mina ed S ing in place
oid inplace_ e e se(cha * s )
{
…
}
Lis ing 2.5 shows he ou pu om ou model wi h he code con e ed o Py hon. I
does a good job o unde s anding wha he C unc ion is doing and ou lines Py hon’s
di e en cons ain s and p ope ies in he esponse. No e ha we did no gi e i any
hin s as o wha he unc ion is doing. I deciphe ed he while loop i sel , showing an
example o eme gen beha io .
This unc ion will e u n he e e sed s ing a he han modi y he o iginal s ing
because Py hon s ings a e immu able. In o he wo ds, once a s ing is c ea ed in
Py hon, i canno be changed. Hence, he unc ion inplace_ e e se is a bi o a mis-
nome in his Py hon con ex . You could conside enaming i o some hing like
e e se_s ing.
de inplace_ e e se(s ):
i s :
# Con e he S ing o a lis since s ings a e immu able in Py hon
ls = lis (s )
Lis ing 2.5 Con e ed Py hon unc ion o s ing e e sal

552.8 Key concep s o LLMs
# Ini ialize wo poin e s a he s a and end o he lis
s a = 0
end = len(ls ) - 1
# Re e se he lis in place
while s a < end:
# XOR_SWAP equi alen in Py hon
ls [s a ], ls [end] = ls [end], ls [s a ]
# Mo e he poin e s owa ds he cen e o he lis
s a += 1
end -= 1
# Con e he lis back in o a s ing
e u n ''.join(ls )
e u n s
We can simpli y his u he by asking he model he ollowing:
Can you u he simpli y and op imize he Py hon code?
Su e! One o he simples ways o e e se a s ing in Py hon uses slicing. He e’s a
unc ion ha will e u n he e e se o a s ing:
de e e se_s ing(s):
e u n s[::-1]
The [::-1] is a slice ha s a s a he end o he s ing and mo es backwa d.
Howe e , i you wan o emula e he same logic as he C p og am, ha is, e e sing
a s ing in place using wo poin e s, he e is a mo e Py honic way o achie ing his:
de e e se_s ing(s):
ls = lis (s)
ls . e e se()
e u n ''.join(ls )
In his e sion, we con e he s ing o a lis , e e se he lis in place using he buil -in
e e se unc ion, and hen join he lis back in o a s ing. The e e se unc ion in
Py hon uses a simila wo-poin e app oach, bu i ’s all unde he hood, so he code
looks much simple .
The poin he esponse calls ou on a simila wo-poin e app oach demons a es
he eme gen p ope y’s powe . We ne e explici ly called his ou anywhe e in he
con e sa ion.
In his chap e , we looked a he co e concep s ha make LLMs wo k and wha we
need o be mind ul o when implemen ing hese new gene a i e AI echniques in he
en e p ise. We co e new concep s such as p omp s, p omp enginee ing, embed-
dings, okens, model pa ame e s, con ex windows, ans o me a chi ec u e, and
eme gen beha io . Now ha we ha e a b oad unde s anding o LLMs and hei impli-
ca ions in he nex chap e , le us pu his in o ac ion and wo k h ough an API o
gene a e ex , bo h as comple ions whe e we p o ide ins uc ions ia a p omp and as
56 CHAPTER 2 In oduc ion o la ge language models
cha whe e we can ha e a dialogue, and in ha , gene a e ex , in he con ex o he
con e sa ion.
Summa y
La ge language models (LLMs) ep esen a majo ad ancemen in AI. They a e
ained on as amoun s o ex da a o lea n pa e ns in human language.
LLMs a e gene al-pu pose and can handle asks wi hou ask-speci ic aining
da a, such as answe ing ques ions, w i ing essays, summa izing ex s, ansla ing
languages, and gene a ing code.
Key LLM use cases include summa iza ion, classi ica ion, Q&A/cha bo s, con-
en gene a ion, da a analysis, ansla ion and localiza ion, p ocess au oma ion,
esea ch and de elopmen , sen imen analysis, and en i y ex ac ion.
Types o LLMs include base, ins uc ion-based, and ine- uned LLM. Each has
p os and cons and is powe ed by ounda ional models.
Founda ional models a e la ge AI models ained on as quan i ies o da a a a
massi e scale, esul ing in models ha can be adap ed o a wide ange o down-
s eam asks.
Some key LLM concep s include p omp s, p omp enginee ing, embeddings,
okens, model pa ame e s, con ex windows, ans o me a chi ec u e, and
eme gen beha io .
Open sou ce and comme cial LLMs ha e ad an ages and disad an ages, wi h
comme cial models ypically o e ing s a e-o - he-a pe o mance and open
sou ce models p o iding mo e lexibili y o cus omiza ion and in eg a ion.
Small language models (SLMs) a e a new eme ging end o ligh weigh gene -
a i e AI models ha p oduce ex , summa ize documen s, ansla e languages,
and answe ques ions. In some cases, hey o e capabili ies simila o hose o
la ge models.
57
Wo king h ough an API:
Gene a ing ex
We ha e seen ha la ge language models (LLMs) p o ide a powe ul sui e o
machine lea ning ools speci ically designed o enhance na u al language unde -
s anding and gene a ion. OpenAI ea u es wo no able APIs: he comple ion and
he cha comple ion APIs. These APIs, unique in hei dynamic and e ec i e
This chap e co e s
Gene a i e AI models and hei ca ego iza ion based
on speci ic applica ions
The p ocess o lis ing a ailable models,
unde s anding hei capabili ies, and choosing he
app op ia e ones
The comple ion API and cha comple ion API o e ed
by OpenAI
Ad anced op ions o comple ion and cha comple ion
APIs ha help us s ee he model and hence con ol
he gene a ion
The impo ance o managing okens in a con e sa ion
o imp o ed use expe ience and cos -e ec i eness
58 CHAPTER 3 Wo king h ough an API: Gene a ing ex
ex -gene a ion capabili ies, esemble human ou pu . In addi ion, hey o e de elop-
e s exclusi e oppo uni ies o c a a ious applica ions, om cha bo s o w i ing assis-
an s. OpenAI was he i s o in oduce he pa e n o comple ion and cha
comple ion APIs, which now embody almos all implemen a ions, especially when
companies wan o build gene a i e-AI-powe ed ools and p oduc s.
The comple ion API by OpenAI is an ad anced ool ha gene a es con ex ually
app op ia e and cohe en ex o comple e use p omp s. Con e sely, he cha com-
ple ion API was designed o emula e an in e ac ion wi h a machine lea ning model,
p ese ing he con ex o a con e sa ion ac oss mul iple exchanges, which makes i
sui able o in e ac i e applica ions.
Chap e 3 es ablishes he g oundwo k o scaling en e p ises. These APIs can sig-
ni ican ly accele a e he de elopmen o in elligen applica ions, he eby educing he
ime o alue. We’ll mos ly use OpenAI and Azu e OpenAI as illus a i e examples,
o en in e changeably. The code models emain consis en , and he APIs a e la gely
simila . Many en e p ises may g a i a e owa d Azu e OpenAI because o he con ol i
o e s, while o he s migh a o OpenAI. I is impo an o no e ha we assume he e
ha an Azu e OpenAI ins ance has al eady been deployed as pa o you Azu e sub-
sc ip ion, and we will be e e encing i in he con ex o ou examples.
This chap e ou lines he basics o he comple ion and he cha comple ion APIs,
including how hey di e and when o use each. We will see how o implemen hem
in an applica ion and how we can s ee he model gene a ion and i s andomness.
We’ll also see how o manage okens, which a e key ope a ion conside a ions when
deploying o p oduc ion. These a e he undamen al aspec s equi ed o build on o
a mission-c i ical applica ion. Bu i s , le ’s s a by unde s anding he di e en model
ca ego ies and hei ad an ages.
3.1 Model ca ego ies
Gene a i e AI models can be classi ied in o a ious ca ego ies based on hei speci ic
applica ions, and each ca ego y includes di e en ypes o models. We s a ou dis-
cussion by unde s anding he di e en classi ica ions o models wi hin gene a i e AI.
This unde s anding will help us iden i y he ange o models a ailable and choose he
mos app op ia e one o a gi en si ua ion.
The a ailabili y o di e en ypes and models may a y, depending on he API in
use. Fo example, Azu e OpenAI and OpenAI p o ide di e en e sions o LLMs.
Some e sions migh be phased ou , some could be limi ed, and o he s could be
exclusi e o a ce ain o ganiza ion.
Di e en models ha e unique ea u es and capabili ies, di ec ly a ec ing hei cos
and compu a ional equi emen s. Thus, choosing he igh model o each use case is
c i ical. In con en ional compu e science, he idea ha bigge is be e has o en
been applied o memo y, s o age, CPUs, o bandwid h. Howe e , in he case o LLMs,
his p inciple is no always applicable. OpenAI p o ides a hos o models ca ego ized,
as shown in able 3.1. No e ha hese a e he same o bo h OpenAI and Azu e
OpenAI, as he unde lying models a e iden ical.
593.1 Model ca ego ies
Each model ca ego y con ains a ia ions ha a e u he dis inguished by ce ain ea-
u es such as oken size. As discussed in he p e ious chap e , oken size de e mines a
model’s con ex window, which de ines he amoun o inpu and ou pu i can p ocess.
Fo ins ance, he o iginal GPT-3 models had a maximum oken size o 2K. GPT-3.5
Tu bo, a subse o models wi hin he GPT-3.5 ca ego y, has wo e sions—one wi h a
oken size o 4K and ano he wi h a oken size o 16K. These a e double and quad u-
ple he oken size o he o iginal GPT-3 models. Table 3.2 ou lines he mo e popula
models and hei capabili ies.
Table 3.1 OpenAI model ca ego ies
Model ca ego y Desc ip ion
GPT-4 The newes and mos powe ul e sion is a se o mul imodal models. GPT-4 is ained
on a la ge da ase wi h mo e pa ame e s, making i e en mo e capable. I can pe -
o m asks ha a e ou o each o he p e ious models. The e a e a ious models in
he GPT-4 amily—GPT-4.0, GPT-4 Tu bo, and he la es GPT-4o (omni), a mul imodal
model and he mos powe ul in he amily a he ime o publica ion.
GPT-3.5 A se o models ha imp o e on GPT-3 and can unde s and and gene a e na u al
language o code. When unsu e, hese should be he de aul models o mos
en e p ises.
DALL.E A model ha can gene a e images when gi en a p omp
Whispe A model ha is used o speech- o- ex , con e ing audio in o ex
Embeddings A se o models o con e ex in o i s nume ical o m
GPT-3 (Legacy) A se o models ha can gene a e and unde s and na u al language. These we e he
o iginal se o models ha a e now conside ed legacy. In mos cases, we would wan
o s a wi h one o he newe models, 3.5 o 4.0, which de i e om GPT-3.
Table 3.2 Model desc ip ions and capabili ies
Model Capabili ies
Ada (legacy) Simple classi ica ion, pa sing, and o ma ing o ex . This model is pa o
he GPT-3 legacy.
Babbage (legacy) Seman ic sea ch anking, medium complex classi ica ion. This model is pa
o he GPT-3 legacy.
Cu ie (legacy) Answe ing ques ions, highly complex classi ica ion. This model is pa o he
GPT-3 legacy.
Da inci (legacy) Summa iza ion, gene a ing c ea i e con en . This model is pa o he GPT-3
legacy.
Cushman-Codex (legacy) A descendan o he GPT-3 se ies, ained in na u al language and billions o
lines o code. I is he mos capable in Py hon and p o icien in o e a dozen
o he p og amming languages.
Da inci-Codex A mo e capable model o Cushman-codex

60 CHAPTER 3 Wo king h ough an API: Gene a ing ex
No e ha he men ioned legacy models a e s ill a ailable and wo k as in ended. How-
e e , he newe models a e be e , ha ing mo e mindsha e and longe suppo . Mos
should s a wi h GPT-3.5 Tu bo as he de aul model and use GPT-4 on a case-by-case
basis. Some imes, e en a smalle , olde model, such as he GPT-3 Cu ie, is good. This
p o ides he igh balance be ween he model’s capabili y, cos , and o e all pe o -
mance.
In he ea ly days o gene a i e AI, all he models we e a ailable only o some.
These will a y by company, egion, and in he case o Azu e, you subsc ip ion ype,
among o he hings. We ha e o lis he models and hei capabili ies ha a e a ailable
o us o use. Howe e , be o e lis ing models, le us see he dependencies equi ed o
ge hings wo king.
3.1.1 Dependencies
In his sec ion, we call ou he un ime dependencies and con igu a ions needed a a
high le el. To ge hings wo king, we need a leas he ollowing i ems:
De elopmen IDE—We use Visual S udio Code o ou examples, bu you can use
any hing you a e com o able wi h.
Py hon—We use 3.11.3 in his book, bu you can use any e sion as long as i is
3.7.1 o la e . The ins alla ion ins uc ions a e a ailable a h ps://
www.py hon.o g/ i you need o ins all Py hon.
OpenAI Py hon lib a ies—We use Py hon lib a ies o mos o he code and he
demos. The OpenAI Py hon lib a y can be a simple ins alla ion in conda, using
conda ins all -c conda- o ge openai. I you a e using pip, use pip ins all
--upg ade openai. The e a e also so wa e de elopmen ki s (SDKs) o spe-
ci ic languages i you p e e o use hose ins ead o Py hon packages.
Azu e Subsc ip ion o OpenAI API access—We use OpenAI’s endpoin and he
Azu e OpenAI (AOAI) endpoin in e changeably; in mos cases, ei he op ion
will wo k. Gi en he emphasis on en e p ises o his book, we end o lean
owa d using he Azu e OpenAI se ice:
GPT3.5-Tu bo The mos capable GPT-3.5 model op imized o cha use cases is 90%
cheape and mo e e ec i e han GPT-3 Da inci.
GPT-4, GPT-4 Tu bo Mo e capable han any GPT-3.5 model. I is able o do mo e complex asks
and is op imized o cha models.
GPT-4o The la es GPT-4o model is mo e capable han he GPT-4 and GPT-4 Tu bo,
bu i is also wice as as and 50% cheape .
ex -embedding-ada-002,
ex -embedding-ada-003
This new embedding model eplaces i e sepa a e models o ex sea ch,
simila i y, and code sea ch, ou pe o ming hem a mos asks; u he mo e,
i is 99.8% cheape .
Table 3.2 Model desc ip ions and capabili ies (con inued)
Model Capabili ies
613.1 Model ca ego ies
– To use he lib a y wi h Azu e endpoin s, we need he api_key.
– We also need o se he api_ ype, api_base, and api_ e sion p ope ies.
The api_ ype mus be se o azu e, he api_base poin s o he endpoin ha
we deploy, and he co esponding e sion o he API is speci ied ia
api_ e sion.
– Azu e OpenAI uses 'engine' as he pa ame e o speci y he model’s name.
When deploying he model in you Azu e subsc ip ion, his name needs o
be se o you chosen name. Fo example, igu e 3.1 is a sc eensho o he
deploymen s in one subsc ip ion. OpenAI, howe e , uses he pa ame e
model o speci y he model’s name. These model names a e s anda d as hey
elease hem. You can ind mo e de ails on Azu e OpenAI and OpenAI a
h ps://mng.bz/yoYd and h ps://pla o m.openai.com/docs/.
NOTE The Gi Hub code eposi o y accompanying he book (h ps://bi
.ly/GenAIBook) has he de ails o he code, including dependencies and
ins uc ions.
Ha dcoding he endpoin and key is no an ad isable p ac ice. The e a e mul iple
me hods o accomplish his ask, one o which includes using en i onmen a iables.
We demons a e his me hod in he s eps ha ollow. O he al e na i es could be
e ching hem om sec e s o es o en i onmen iles. Fo he sake o simplici y, we
will s ick o en i onmen a iables in his guide. Howe e , you a e encou aged o
adhe e o you en e p ise’s bes p ac ices and ecommenda ions. Se ing up he en i-
onmen a iables can be achie ed h ough he ollowing commands.
Fo Windows, hese a e
se x AOAI_KEY "you -openai-key"
se x AOAI_ENDPOINT "you -openai-endpoin "
NOTE You may need o es a you e minal o ead he new a iables.
On Linux/Mac, we ha e
expo AOAI_ENDPOINT=you -openai-endpoin
expo AOAI_KEY=you -openai- key
Bash uses
echo expo AOAI_KEY="YOUR_KEY" >> /e c/en i onmen && sou ce /e c/
en i onmen
echo expo AOAI_ENDPOINT="YOUR_ENDPOINT" >> /e c/en i onmen &&
➥sou ce /e c/en i onmen
NOTE In his book, we will use conda, an open sou ce package manage , o
manage ou speci ic un ime e sions and dependencies. Technically, using a
package manage like conda is no manda o y, bu i is ex emely bene icial
o isola ing and oubleshoo ing p oblems and is highly ecommended. We
won’ del e in o he speci ics o ins alling conda in his con ex ; o de ailed,
62 CHAPTER 3 Wo king h ough an API: Gene a ing ex
s ep-by-s ep ins uc ions on how o ins all i , please e e o he o icial docu-
men a ion a h ps://docs.conda.io/.
Fi s , le us c ea e a new conda en i onmen and ins all he equi ed OpenAI Py hon
lib a y:
$ conda c ea e -n openai py hon=3.11.3
(base) $ conda ac i a e openai
(openai) $ conda ins all -c conda- o ge openai
Now ha we ha e ou dependencies ins alled, le ’s connec o he Azu e OpenAI end-
poin and ge de ails o he a ailable models.
3.1.2 Lis ing models
As we ou lined ea lie , each o ganiza ion may ha e di e en models o use. We’ll s a
by unde s anding wha models we ha e access o; we’ll use he APIs o help us se up he
basic en i onmen and ge i unning. Then, I’ll show you how o do his using he
Azu e OpenAI Py hon SDK and ou line he di e ences when using he OpenAI API.
As he nex lis ing shows, we connec o he Azu e OpenAI endpoin , ge a lis o
all he models a ailable, i e a e o e hose, and p in ou he de ails o each model o
he console.
impo os
impo json
om openai impo Azu eOpenAI
clien = Azu eOpenAI(
azu e_endpoin =os.ge en ("AOAI_ENDPOINT"),
api_ e sion="2023-05-15",
api_key=os.ge en ("AOAI_KEY")
)
# Call he models API o e ie e a lis o a ailable models
models = clien .models.lis ()
# sa e o ile
wi h open('azu e-oai-models.json', 'w') as ile:
models_dic = [model.__dic __ o model in models]
json.dump(models_dic , ile)
# P in ou he names o all he a ailable models, and hei capabili ies
o model in models:
p in ("ID:", model.id)
p in ("Cu en s a us:", model.li ecycle_s a us)
p in ("Model capabili ies:", model.capabili ies)
p in ("-------------------")
Running his code will p esen us wi h a lis o a ailable models. The ollowing lis ing
shows an example o he models a ailable; he exac lis may be di e en o you.
Lis ing 3.1 Lis ing Azu e OpenAI models a ailable
Requi ed o Azu e
OpenAI endpoin s
This is he en i onmen a iable
poin ing o he endpoin
published ia he Azu e po al.
Choose he API e sion
we wan o use om
he mul iple op ions.
This is he en i onmen
a iable wi h he API key.
633.1 Model ca ego ies
{
"id": "gp -4- ision-p e iew",
"c ea ed": null,
"objec ": "model",
"owned_by": null
},
{
"id": "dall-e-3",
"c ea ed": null,
"objec ": "model",
"owned_by": null
},
{
"id": "gp -35- u bo",
"c ea ed": null,
"objec ": "model",
"owned_by": null
},
…
Each model is cha ac e ized by i s dis inc capabili ies, sugges ing he use cases o
which i is ailo ed—speci ically o cha comple ions, comple ions (which a e egula
ex comple ions), embeddings, and ine- uning. Fo example, a cha comple ion
model would be he ideal selec ion in a si ua ion whe e con e sa ional engagemen is
equi ed, like a cha -based in e ac ion ha equi es signi ican dialogue exchange.
Con e sely, a comple ion model would be he mos sui able o ex gene a ion. We
can iew he OpenAI base models wi h Azu e AI S udio in igu e 3.1.
Figu e 3.1 Base model lis ed
Lis ing 3.2 Lis ing Azu e OpenAI models’ ou pu
70 CHAPTER 3 Wo king h ough an API: Gene a ing ex
o choice in esponse.choices:
p in (choice. ex )
When we un his upda ed code, we ge he esponse shown in lis ing 3.6. The p op-
e y choices a e an a ay, and we ha e h ee i ems, wi h he index s a ing a a base
ze o. Each has he gene a ed ex o us o use. Depending on he use case, his is
help ul when picking mul iple comple ions.
1. Pe Pampe ing Palace
2. Pe G ooming Ha en
3. Pe ec Pe Pa lo
1. Pawsi i e Pe Spa
2. Fu -E e F iends Pe Salon
3. Pu ec ion Pe Ca e
1. Pampe ed Paws P o essional Pe Ca e
2. Pe sonalized Pe Pampe ing
3. F iendly Fu i ic Pe Ca e
Ano he simila bu mo e powe ul pa ame e is he bes _o pa ame e . Like he n
pa ame e , i gene a es mul iple comple ions, allowing he op ion o pick he bes .
The bes _o is he comple ion wi h he highes log p obabili y pe oken. We canno
s eam esul s when using his op ion. Howe e , i can be combined wi h he n pa am-
e e s, wi h bes _o needs g ea e han n.
As shown in he ollowing lis ing, i we se n o 5, we ge i e comple ions as
expec ed; o b e i y, we do no show all i e o he comple ions he e, bu no e ha
his call uses 184 okens.
{
"choices": [
{
…
],
"c ea ed": 1689097645,
"id": "cmpl-7bBkLk60mA8R9c AKXqTmTwzx2IEI",
"model": "gp -35- u bo",
"objec ": " ex _comple ion",
"usage": {
"comple ion_ okens": 152,
"p omp _ okens": 32,
" o al_ okens": 184
}
}
Lis ing 3.6 Ou pu showing mul iple esponses
Lis ing 3.7 Ou pu showing mul iple esponses

713.2 Comple ion API
I we un a simila call using he bes _o pa ame e , do no speci y he n pa ame e :
esponse = clien .comple ions.c ea e(
model="gp -35- u bo",
p omp =p omp _s a ph ase,
empe a u e=0.7,
max_ okens=100,
bes _o =5,
s op=None)
When we un his code, we ge only one comple ion, as shown in lis ing 3.8; howe e ,
we a e using a simila numbe o okens as ea lie (171 e sus 184). This is because
he se ice gene a es i e comple ions on he se e side and e u ns he bes one.
The API uses he log p obabili y pe oken o pick he bes op ion. The highe he log
p obabili y, he mo e con iden he model is abou i s p edic ion.
{
"choices": [
{
" inish_ eason": "s op",
"index": 0,
"logp obs": null,
" ex ": " n n1. Pawsi i ely P o essional Pe Salon n
➥2. F iendly Fu F iends Pe Salon n
➥3. Pe sonalized Pampe ed Pe s Salon",
"con en _ il e _ esul s"={...}
}
],
"c ea ed": 1689098048,
"id": "cmpl-7bBqqp uoV5n gH ahuWGVAiM50Aj",
"model": "gp 35",
"objec ": " ex _comple ion",
"usage": {
"comple ion_ okens": 139,
"p omp _ okens": 32,
" o al_ okens": 171
}
}
The one pa ame e ha in luences many o he esponses is he empe a u e se ing.
Le ’s see how his changes he ou pu .
3.2.4 Con olling andomness
As discussed in he p e ious chap e , he empe a u e se ing in luences he andom-
ness o he gene a ed ou pu . A lowe empe a u e p oduces mo e epe i i e and de e -
minis ic esponses, while a highe empe a u e p oduces mo e inno a i e esponses.
Fundamen ally, he e isn’ a igh se ing—i all comes down o he use cases.
Lis ing 3.8 Ou pu gene a ion wi h bes _o i e comple ions
72 CHAPTER 3 Wo king h ough an API: Gene a ing ex
Fo en e p ises, a mo e c ea i e ou pu would be when he e is in e es in di e se
ou pu and c ea ing ex o use cases such as con en gene a ion o ma ke ing, s o-
ies, poems, ly ics, jokes, e c. These a e hings ha usually equi e c ea i i y. Howe e ,
en e p ises need mo e eliable and p ecise answe s o use cases, such as documen
au oma ion o in oice gene a ion, p oposals, code gene a ion, e c. These se ings a e
applicable pe API call, so combining di e en empe a u e le els in he same wo k-
low is possible.
As demons a ed in p e ious examples, we ecommend a empe a u e se ing o
0.8 o c ea i e esponses. Con e sely, a se ing o 0.2 is sugges ed o mo e p edic -
able esponses. Using an example, le us examine how hese se ings al e he ou pu
and obse e he a ia ions be ween mul iple calls.
When he empe a u e was se o 0.8, we ecei ed he ollowing esponses om
h ee consecu i e calls. The ou pu changes as expec ed, o e ing sugges ions like
hose seen h oughou his chap e . I is impo an o no e ha we do no need o
make h ee sepa a e API calls. We can se he n pa ame e o 3 in a single API call o
gene a e mul iple esponses. He e is wha ou API call looks like:
esponse = clien .comple ions.c ea e(
model="gp -35- u bo",
p omp =p omp _s a ph ase,
empe a u e=0.8,
max_ okens=100,
n=3,
s op=None)
The ollowing lis ing shows he c ea i e gene a ion o he h ee esponses.
{
"choices": [
{
" inish_ eason": "con en _ il e ",
"index": 0,
"logp obs": null,
" ex ": "",
"con en _ il e _ esul s"={...}
},
{
" inish_ eason": "s op",
"index": 1,
"logp obs": null,
" ex ": " n n1. Pawsi i ely P o essional Pe S yling n
➥2. Fu -E e F iendly Pe G oome s n
➥3. Tailo ed TLC Pe Ca e",
"con en _ il e _ esul s"={...}
},
{
" inish_ eason": "s op",
"index": 2,
Lis ing 3.9 Comple ions ou pu wi h he empe a u e a 0.8
Fi s esponse: ge blocked
by he con en il e
Second o h ee
esponses
Final esponse wi h e y
di e en gene a ed ex
733.2 Comple ion API
"logp obs": null,
" ex ": " n n1. Pawsi i ely P o essional Pe Salon n
➥2. F iendly Fu - e ni y Pe Ca e n
➥3. Pe sonalized Pup Pampe ing Place",
"con en _ il e _ esul s"={...}
}
],
"c ea ed": 1689123394,
"id": "cmpl-7bIRe6Ponn8y1198 lJF agq64 2E",
"model": "gp 35",
"objec ": " ex _comple ion",
"usage": {
"comple ion_ okens": 96,
"p omp _ okens": 32,
" o al_ okens": 128
}
}
Le ’s change he se ing o make his mo e de e minis ic and un i again. No e ha
he only change in he API call is empe a u e=0.2. The ou pu is p edic able and
de e minis ic, wi h e y simila ex gene a ed be ween he h ee esponses.
{
"choices": [
{
" inish_ eason": "s op",
"index": 0,
"logp obs": null,
" ex ": " n n1. Pawsi i ely P o essional Pe Salon n
➥2. F iendly Fu Salon n
➥3. Pe sonalized Pe Pampe ing",
"con en _ il e _ esul s"={...}
},
{
" inish_ eason": "s op",
"index": 1,
"logp obs": null,
" ex ": " n n1. Pawsi i ely P o essional Pe Salon n
➥2. F iendly Fu -E e Pe Salon n
➥3. Pe sonalized Pe Pampe ing Salon",
"con en _ il e _ esul s"={...}
},
{
" inish_ eason": "s op",
"index": 2,
"logp obs": null,
" ex ": " n n1. Pampe ed Paws Pe Salon n
➥2. F iendly Fu Salon n
➥3. P o essional Pe Pampe ing"
}
],
...
}
Lis ing 3.10 Comple ions ou pu wi h he empe a u e a 0.2
One o h ee
esponses
Two o h ee esponses; e y
simila gene a ed ex
The inal esponse wi h
e y simila gene a ed ex
74 CHAPTER 3 Wo king h ough an API: Gene a ing ex
The empe a u e alue goes up o 2, bu i is no ecommended o go ha high, as he
model s a s hallucina ing mo e and c ea ing nonsensical ex . I we wan mo e c e-
a i i y, we usually wan i o be a 0.8 and, a mos , 1.2. Le us see an example when he
empe a u e is changed o 1.8. In his example, we did no e en ge he hi d gene a-
ion, as we hi he oken limi and s opped he gene a ion.
{
"choices": [
{
" inish_ eason": "s op",
"index": 0,
"logp obs": null,
" ex ": " n n1. Com o G oom Pe Fu nishing n2. Pampe ed T eaBankan
Ca e s n3. Top ech Sunny Ha en P omo ion.",
"con en _ il e _ esul s"={...}
},
{
" inish_ eason": "s op",
"index": 1,
"logp obs": null,
" ex ": " n n1: Na u alis o Clea lywowG oomingz
➥Pe Luxusia n2: VipalMinde e s Pe
➥S a enca ines g ooming n3: B isasia
➥C ownsnus Take Ca e Budds oshesipalising",
"con en _ il e _ esul s"={...}
},
{
" inish_ eason": "leng h",
"index": 2,
"logp obs": null,
" ex ": " n n1. T us owS a Pe Salon n
➥2. Ha ipad TailTagz G ooming & S yles n
➥3. Li leLo Millonis a Cosmania DipSa ez
➥Hubopolis ShineB igh P incessly
➥P osnoi a ianis ics Kensoph Cowlosophy
➥Exp essionala Na ix o d i Mundulan e E o i y
➥Di ineSponn Bo dlo eDV Eni yzBFA P es ageina o
➥Supe Gold Clou oilyna C i ina illies
➥P och omomumphance Toud",
➥"con en _ il e _ esul s"={...}
}
],
...
}
3.2.5 Con olling andomness using op_p
An al e na i e o he empe a u e pa ame e o managing andomness is he op_p
pa ame e . I has he same a ec on he gene a ion as he empe a u e pa ame e , bu
i uses a di e en echnique called nucleus sampling. Essen ially, nucleus sampling
Lis ing 3.11 Comple ions ou pu wi h he empe a u e a 1.8
One o h ee esponses wi h
names ha a en’ e y clea
Second and hi d o h ee
esponses, wi h nonsensical names
753.3 Ad anced comple ion API op ions
allows only he okens wi h a p obabili y equal o o less han he alue o op_p o be
conside ed as pa o he gene a ion.
Nucleus sampling c ea es ex s by picking wo ds om a small g oup o he mos
likely ones wi h he highes cumula i e p obabili y. The op_p alue decides how small
his g oup is based on he o al chance o he wo ds o appea in i . The g oup size
can change depending on he nex wo d’s chance. Nucleus sampling can help a oid
epe i ion and gene a e mo e a ied and clea e ex s han o he me hods.
Fo example, i we ha e he op_p alue se o 0.9, only he okens ha make up
90% o he p obabili y dis ibu ion will be sampled o he gene a ion o ex . This
allows us o a oid he las 10%, which a e o en qui e andom and di e se and end up
as nonsensical hallucina ions.
A lowe alue o op_p makes he model mo e consis en and less c ea i e as i
chooses ewe okens o gene a e. Con e sely, a highe alue makes he gene a ion
mo e c ea i e and di e se, as i has a la ge se o okens o ope a e. The la ge alue
also makes i p one o mo e e o s and andomness. The exac alue o op_p
depends on he use case; in mos cases, he ideal alue o op_p anges be ween 0.7
and 0.95. We should change ei he he empe a u e a ibu e o op_p, bu no bo h.
Table 3.5 ou lines he ela ionship be ween he wo.
Le us look a some o he ad anced API op ions o speci ic scena ios.
3.3 Ad anced comple ion API op ions
Now ha we ha e examined he basic cons uc s o he comple ion API and unde -
s and how hey wo k, we need o conside mo e ad anced aspec s o he comple ion
API. Many o hese migh no seem as complex, bu hey add many mo e esponsibili-
ies o he sys em a chi ec u e, complica ing o e all implemen a ion.
3.3.1 S eaming comple ions
The comple ions API allows s eaming esponses, o e ing immedia e access o in o -
ma ion as soon as i is eady a he han wai ing o a ull esponse. Fo en e p ises,
s eaming can be impo an in some cases whe e eal- ime con en gene a ion wi h
Table 3.5 Rela ionship be ween empe a u e and op_p
Tempe a u e op_p E ec
Low Low Gene a es p edic able ex ha closely ollows common language pa e ns
Low High Gene a es p edic able ex , bu wi h occasional less common wo ds o
ph ases
High Low Gene a es ex ha is o en cohe en bu wi h c ea i e and unexpec ed wo d
usage
High High Gene a es highly di e se and unp edic able ex wi h a ious wo d choices and
ideas; has e y c ea i e and di e se ou pu , bu may con ain many e o s

76 CHAPTER 3 Wo king h ough an API: Gene a ing ex
lowe la ency is key. This ea u e can enhance use expe iences by p ocessing incom-
ing esponses p omp ly.
To enable s eaming om he API’s s andpoin , modi y he s eam pa ame e o
ue. By de aul , his op ional pa ame e is se o alse.
S eaming employs se e -sen e en s (SSE), which equi e a clien -side implemen-
a ion. SSE is a s anda d p o ocol allowing se e s o con inue ansmi ing da a o cli-
en s a e es ablishing he ini ial connec ion. I is a long- e m, one-way connec ion
om se e o clien . SSE o e s ad an ages such as low la ency, educed bandwid h
consump ion, and an uncomplica ed con igu a ion se up.
Lis ing 3.12 demons a es how ou example can be adjus ed o u ilize s eaming.
Al hough he API modi ica ion is s aigh o wa d, he desc ip ion and eques ed mul-
iple gene a ions we e adjus ed (using he n p ope y). This allows us o gene a e
mo e ex a i icially, making i easie o obse e he s eaming gene a ion.
impo os
impo sys
om openai impo Azu eOpenAI
clien = Azu eOpenAI(
azu e_endpoin =os.ge en ("AOAI_ENDPOINT"),
api_ e sion="2024-05-01-p e iew",
api_key=os.ge en ("AOAI_KEY"))
p omp _s a ph ase = "Sugges h ee names and a agline
➥which is a leas 3 sen ences o a new pe salon business.
➥The gene a ed name ideas should e oke posi i e emo ions and he
➥ ollowingkey ea u es: P o essional, iendly, Pe sonalized Se ice."
o esponse in clien .comple ions.c ea e(
model="gp -35- u bo",
p omp =p omp _s a ph ase,
empe a u e=0.8,
max_ okens=500,
s eam=T ue,
s op=None):
o choice in esponse.choices:
sys.s dou .w i e(s (choice. ex )+" n")
sys.s dou . lush()
When managing a s eaming call, we mus pay ex a a en ion o he inish_ eason
p ope y. As messages a e s eamed, each appea s as a s anda d comple ion, wi h he
ex ep esen ing he newly gene a ed oken. In hese ins ances, he inish_ eason
emains null. Howe e , he inal message di e s; i s inish_ eason could be ei he
s op o leng h, depending on wha igge ed i .
Lis ing 3.12 S eaming comple ion
Tweaked he p omp
sligh ly o add
desc ip ions
We need o handle he s eaming
esponse on he clien side.
Enables
s eaming
We need o loop h ough he a ay
and handle mul iple gene a ions.
773.3 Ad anced comple ion API op ions
...
{
" inish_ eason": null,
"index": 0,
"logp obs": null,
" ex ": " Pe "
}
{
" inish_ eason": null,
"index": 0,
"logp obs": null,
" ex ": " Pam"
}
{
" inish_ eason": null,
"index": 0,
"logp obs": null,
" ex ": "pe ing"
}
{
" inish_ eason": "s op",
"index": 0,
"logp obs": null,
" ex ": ""
}
3.3.2 In luencing oken p obabili ies: logi _bias
The logi _bias pa ame e is one way we can in luence ou pu comple ion. In he
API, his pa ame e allows us o manipula e he p obabili y o ce ain okens, which
can be wo ds o ph ases, ha he model gene a es in i s esponses. I is called logi _
bias because i di ec ly a ec s he log odds, o logi s, ha he model calcula es o
each po en ial oken du ing he gene a ion p ocess. The bias alues a e added o
hese log-odds be o e con e ing hem o p obabili ies, al e ing he inal dis ibu ion
o okens he model can pick om.
The impo ance o his ea u e lies in i s abili y o s ee he model’s ou pu . Say we
a e c ea ing a cha bo and wan i o a oid ce ain wo ds o ph ases. We can use
logi _bias o dec ease he likelihood o hose okens being chosen by he model. In
con as , i he e a e ce ain wo ds o ph ases we wan he model o a o , we could
use logi _bias o inc ease hei likelihood. The ange o his pa ame e is om –100
o 100, and i ope a es on okens o he wo d. Se ing a oken o –100 e ec i ely bans
i om he gene a ion, whe eas se ing i o 100 makes i exclusi e.
To use logi _bias, we p o ide a dic iona y whe e he keys a e he okens, and he
al-ues a e he biases ha need o be applied o hose okens. To ge he oken, we use
he ik oken lib a y. Once you ha e he app op ia e oken, you can assign a posi i e bias
o make i mo e likely o appea o a nega i e bias o make i less likely, as shown in igu e
3.2. The blocks show he deg ee o p obabili y ha di e en okens can be a di e en
Lis ing 3.13 S eaming inish eason
78 CHAPTER 3 Wo king h ough an API: Gene a ing ex
p obabili ies o banning o exclusi e gene a ion. Smalle changes o he okens’ alue
inc ease o dec ease he p obabili y o hese okens in he gene a ed ou pu .
Figu e 3.2 The logi _bias pa ame e
Le ’s use an example o see how we can make his wo k. Fo ou pe salon name, we
do no wan o use he wo ds “pu ,” “pu s,” o “meow.” The i s hing we wan o do
is c ea e he okens o hese wo ds. We also wan o add wo ds wi h a p eceding space
and capi alize hem as spaces. Capi al le e s a e all di e en okens. So “Meow” and
“Meow” (wi h a space) and “meow” (again wi h a space) migh ead he same o us,
bu when i comes o okens, hese wo ds a e all di e en . The ou pu shows us he
okens o he co esponding wo d:
'Pu Pu s Meow Pu pu pu s meow:[30026, 81, 9330,
➥3808, 42114, 9330, 81, 1308, 81, 1308, 3808, 502, 322]'
Now ha we ha e he okens, we can add hem o he comple ion call. No e ha we
assign each oken a bias o –100, s ee ing he model away om hese wo ds.
impo os
om openai impo Azu eOpenAI
clien = Azu eOpenAI(
azu e_endpoin =os.ge en ("AOAI_ENDPOINT"),
api_ e sion="2024-05-01-p e iew",
api_key=os.ge en ("AOAI_KEY"))
GPT_MODEL = "gp -35- u bo"
p omp _s a ph ase = "Sugges h ee names o a new pe salon
➥business. The gene a ed name ideas should e oke posi i e
➥emo ions and he ollowing key ea u es: P o essional,
➥ iendly, Pe sonalized Se ice."
esponse = clien .comple ions(
model=GPT_MODEL,
Lis ing 3.14 logi _bias implemen a ion
Tokens
Ban Exclusi e
Ou pu
logi _bias
-100 100
Lowe p obabili y Highe p obabili y
793.3 Ad anced comple ion API op ions
p omp =p omp _s a ph ase,
empe a u e=0.8,
max_ okens=100,
logi _bias={
30026:-100,
81:-100,
9330:-100,
808:-100,
42114:-100,
1308:-100,
3808:-100,
502:-100,
322:-100
}
)
esponse ex = esponse.choices[0]. ex
p in ("P omp :" + p omp _s a ph ase + " nResponse:" + esponse ex )
We do no ha e any wo ds we wan o a oid when we un his code.
{
"choices": [
{
" inish_ eason": "s op",
"index": 0,
"logp obs": null,
" ex ": " n n1. Paw P in s Pe Pampe ing n2. Fu y F iends Fussing n3.
Posh Pe Pooches"
}
],
...
}
We can do he opposi e and posi i ely bias okens oo. Say we wan o o e emphasize
and s ee he model owa d he wo d “Fu y.” We can use he ik oken lib a y we saw
ea lie and ind ha he okens o “Fu y” a e [37, 16682]. We can upda e he p e i-
ous API call wi h his and, in his case, a posi i e bias o 5.
GPT_MODEL = "gp -35- u bo"
esponse = clien .comple ions.c ea e(
model=GPT_MODEL,
p omp =p omp _s a ph ase,
empe a u e=0.8,
max_ okens=100,
logi _bias={
30026:-100,
81:-100,
Lis ing 3.15 Ou pu o logi _bias gene a ion
Lis ing 3.16 logi _bias: Posi i e implemen a ion
Dic iona y con aining he
okens and he co esponding
bias alues o s ee he model
on hese speci ic okens
86 CHAPTER 3 Wo king h ough an API: Gene a ing ex
NOTE The ollowing pa ame e s a e una ailable wi h he new GPT-35 Tu bo
and GPT-4 models: logp obs, bes _o , and echo. T ying o se any o hese
pa ame e s will h ow an excep ion.
The ou pu o he p e ious example is shown in he nex lis ing. The use s a ed wi h
“Hello, Wo ld!”, and he sys em esponded, asking how o help us wi h he assis an
message. The ques ion abou dog de ails is he nex dialogue u n.
{
"id": "cha cmpl-7c9HOaRHmau 94oZacjdUqxOL96H3",
"objec ": "cha .comple ion",
"c ea ed": 1689326490,
"model": "gp -35- u bo",
"choices": [
{
"index": 0,
" inish_ eason": "s op",
"message": {
" ole": "assis an ",
"con en ": "Dogs a e o en conside ed man's bes
➥ iend o good eason. They p o ide
➥companionship, p o ec ion, and loyal y o
➥ hei owne s. He e a e some easons why dogs a e
➥good o humans: n n1. They educe s ess: Dogs
➥ha e been shown o educe s ess and anxie y in
➥ hei owne s. Pe ing a dog can elease endo phins,
➥which a e eel-good ho mones ha can help o
➥ educe s ess. n n2. They p o ide companionship..."
}
}
],
"usage": {
"comple ion_ okens": 238,
"p omp _ okens": 60,
" o al_ okens": 298
}
}
3.4.1 Sys em ole
The sys em ole (some also call i he sys em message) is included a he beginning o
he message a ay. This message p o ides he ini ial ins uc ions o he model, and we
can p o ide a ious pieces o in o ma ion in he sys em ole, including
A b ie desc ip ion o he assis an
Pe sonali y ai s o he assis an
Rules and ins uc ions you wan he assis an o ollow
Addi ional in o ma ion needed o he model (e.g., ele an ques ions om
an FAQ)
Lis ing 3.20 Cha comple ion API ou pu

873.4 Cha comple ion API
We cus omize he sys em ole and include basic ins uc ions o he use case. F om an
API pe spec i e, e en hough he sys em ole is op ional, i is highly ecommended
ha you make his in en ional o ge he bes esul s. Fo example, i we expand on
he p e ious example o cha ing o pe s and pe salons, we can ins uc he model o
only eply in hyme.
GPT_MODEL = "gp -35- u bo"
esponse = clien .cha .comple ions.c ea e(
engine=GPT_MODEL,
messages = [
{" ole": "sys em", "con en ": "You a e a help ul AI
➥assis an ha p o ides u h ul in o ma ion.
➥You answe all ques ions in hyme."},
{" ole": "use ", "con en ": "Hi"},
{" ole": "assis an ", "con en ": "Hello he e, how can
➥I assis ? nAsk me a ques ion, don' esis !"},
{" ole": "use ", "con en ": "who a e he ounde s o Mic oso ?"},
{" ole": "assis an ", "con en ": "Bill Ga es and Paul Allen,
➥i 's ue, nA e he ounde s o Mic oso ,
➥ h ough and h ough."},
{" ole": "use ", "con en ": "wha is a good name o a pe salon?"},
{" ole": "assis an ","con en ": "Fo a pe salon [
➥ ha 's simply di ine, nHe e's a name ha 's [
➥su e o shine: n "Paws and Pampe " is wha I
➥p opose, nA name ha 's ca chy and easy o compose."}],
empe a u e=0.7,
max_ okens=800,
op_p=0.95,
equency_penal y=0,
p esence_penal y=0,
s op=None)
In he example, we can ha e a con e sa ion as expec ed, which can a y opics in
u ns, bu all he answe s hyme.
When we wan o gi e he model addi ional da a as con ex o he con e sa ion,
his is called g ounding he da a. I he e is a small amoun o da a, his can be pa o
he sys em ole, as shown in he nex lis ing. Howe e , i he e is a la ge amoun o
da a, we should use embeddings and e ie e he mos ele an in o ma ion using a
seman ic sea ch (e.g., Azu e cogni i e sea ch).
{" ole": "sys em", "con en ": Assis an is an in elligen cha bo designed o
help use s answe echnical ques ions abou Azu e OpenAI se ice. Only
answe ques ions using he con ex below. Don’ make up he answe . I
you a e unsu e o an answe , say 'I don' know'.
Lis ing 3.21 Cha comple ion sys em message example
Lis ing 3.22 G ounding sys em message example
Ins uc s o
answe in hyme
88 CHAPTER 3 Wo king h ough an API: Gene a ing ex
Con ex :
- Azu e OpenAI Se ice p o ides REST API access o OpenAI's powe ul language
models including he GPT-3, Codex and Embeddings model se ies.
- Azu e OpenAI Se ice gi es cus ome s ad anced language AI wi h OpenAI GPT-
3, Codex, and DALL-E models wi h he secu i y and en e p ise p omise o
Azu e.
..."
},
{" ole": "use ", "con en ": "Wha is Azu e OpenAI Se ice?"}
3.4.2 Finish eason
E e y cha comple ion API esponse has a inish eason encoded in he inish_
eason ield. T acking is impo an in his case, as i helps us unde s and why he API
e u ned he esponse i did. This can be use ul o debugging and imp o ing he
applica ion. Fo example, i you ecei e an incomple e esponse due o he leng h
inish eason, you may wan o adjus he max_ okens pa ame e o gene a e mo e
comple e esponses. The possible alues o inish_ eason a e
s op—The API inished gene a ing and ei he e u ned a comple e message o
a message e mina ed by one o he s op sequences p o ided using he s op
pa ame e .
leng h—The API s opped he model ou pu due o he max_ okens pa ame e
o oken limi .
 unc ion_call—The model decided o call a unc ion.
con en _ il e —Some o he comple ion was il e ed due o ha m ul con en .
3.4.3 Cha comple ion API o noncha scena ios
OpenAI’s cha comple ion can be used o noncha scena ios. The API is qui e simila
and designed o be a lexible ool ha can be adap ed o a ious use cases, no jus
con e sa ions. In mos cases, he ecommended pa h uses he cha comple ion API as
i i we e he comple ion API. The main eason is ha he newe models (Cha 3.5-
Tu bo and GPT-4) a e much mo e e icien , cheape , and powe ul han he ea lie
models. The comple ion use cases we ha e seen, such as analyzing and gene a ing ex
and answe ing ques ions om a knowledge base, would all s ill wo k wi h he cha
comple ion API.
Implemen ing he cha comple ion API noncha scena ios usually in ol es s uc-
u ing he con e sa ion wi h a se ies o messages and a sys em message o se he assis-
an ’s beha io . Fo example, as shown in he ollowing lis ing, he sys em message se s
he ole o he assis an , and he use message p o ides he ask.
GPT_MODEL = "gp -35- u bo"
esponse = clien .cha .comple ions.c ea e(
model=GPT_MODEL,
Lis ing 3.23 Cha comple ion as a comple ion API example
893.4 Cha comple ion API
messages=[
{" ole": "sys em", "con en ": "You a e a help ul assis an ."},
{" ole": "use ", "con en ": "T ansla e he ollowing
➥English ex o Spanish: 'Hello, how a e you?'"}
]
)
We can also use a se ies o use messages o p o ide mo e con ex o accomplish mo e
complex asks, as shown in he nex lis ing. In his example, he i s use message se s
up he ask, and he second use message p o ides mo e speci ic de ails. The assis an
gene a es a esponse ha a emp s o comple e he ask in he use messages.
GPT_MODEL = "gp -35- u bo"
esponse = clien .cha .comple ions.c ea e(
model=GPT_MODEL,
messages=[
{" ole": "sys em", "con en ": "You a e a help ul assis an ."},
{" ole": "use ", "con en ": "I need o w i e a Py hon unc ion."},
{" ole": "use ", "con en ": "This unc ion should ake wo
➥numbe s as inpu and e u n hei sum."}
]
)
3.4.4 Managing con e sa ion
Ou examples keep unning, bu he con e sa ion will hi he model’s oken limi as i
con inues. Wi h each u n o he con e sa ion (i.e., he ques ion asked and he answe
ecei ed), he lis o messages g ows. As a eminde , he oken limi o GPT-35 Tu bo
is 4K okens, and o GPT-4 and GPT-4 32K, i is 8K and 32K, espec i ely; hese
include he o al coun om he message lis sen and he model esponse. We ge an
excep ion i he o al coun exceeds he ele an model limi .
No ou -o - he-box op ion can ack his oken coun o us and ensu e i alls wi hin
he oken limi . As pa o he en e p ise app design, we need o ack he oken coun
and only send a p omp ha alls wi hin he limi .
Many en e p ises a e in he p ocess o implemen ing an en e p ise e sion o
Cha GPT using he cha API. He e a e some o he bes p ac ices ha can help en e -
p ises manage hese con e sa ions. Remembe , he bes way o ge you desi ed ou pu
in ol es i e a i e es ing and e ining you ins uc ions:
Se ing he beha io wi h sys em message—You should use he sys em message a he
s a o he con e sa ion o guide he model’s beha io and o en e p ises o
une o e lec hei b and o IP.
P o iding explici ins uc ions—I he model is no gene a ing you desi ed ou -
pu , make you ins uc ions mo e explici . Think abou i a he same le el as i
you we e elling a oddle wha no o do.
Lis ing 3.24 Cha comple ion as a comple ion API example
90 CHAPTER 3 Wo king h ough an API: Gene a ing ex
B eaking down complex asks—I you ha e a complex ask, b eak i down in o se -
e al simple asks, and send hem as sepa a e use messages. You o en need o
show, no explain i . This is called Chain o Though (CoT), and i will be co -
e ed in mo e de ail in chap e 6.
Expe imen a ion—Feel ee o expe imen wi h he pa ame e s o ge he desi ed
ou pu . A highe empe a u e alue (e.g., 0.8) makes he gene a ion mo e an-
dom, while a lowe alue (e.g., 0.2) makes i mo e de e minis ic. You can also
use he maximum oken alue o limi esponse leng h.
Managing okens—Be awa e o he o al numbe o okens in a con e sa ion, as
inpu and ou pu okens coun owa d he o al. You mus unca e, omi , o
sho en you ex i a con e sa ion has oo many okens o i wi hin he model's
maximum limi .
Handling sensi i e con en —I you’ e dealing wi h po en ially unsa e con en , you
should look a Azu e OpenAI’s Responsible AI guidelines (h ps://mng.bz/
pxVK). Howe e , i you a e using OpenAI’s API, hen OpenAI’s mode a ion
guide is help ul (h ps://mng.bz/OmEw) o adding a mode a ion laye o he
ou pu s o he cha API.
TRACKING TOKENS
As ou lined ea lie , keeping ack o okens when using he con e sa ional API is key.
No only will he expe ience su e i we go o e he o al oken size, bu he o al num-
be o okens in an API also has a di ec e ec on la ency and on how long he call
akes. Finally, he mo e okens we use, he mo e we pay. He e a e some ways you can
manage okens:
Coun okens. Use he ik oken lib a y, which allows us o coun how many
okens a e in a s ing wi hou making an API call.
Limi esponse leng h. When making an API call, use he max_ okens p ope y o
limi he leng h o he model’s esponses.
T unca e long con e sa ions. I a con e sa ion has oo many okens o i wi hin
he model’s maximum limi , we mus unca e, omi , o sho en ou ex .
Limi he numbe o u ns. Limi ing he numbe o u ns in he con e sa ion is a
good way o unca e o sho en he ex . This also helps s ee he model be e
when he con e sa ion ge s longe and ends o s a hallucina ing.
Check he usage ield in he API esponse. A e making an API call, we can check
he usage ield in he API esponse o see he o al numbe o okens used. This
is ongoing and includes bo h inpu and ou pu okens. I is a good way o keep
ack o okens and show hem o he use ia some UX.
Reduce empe a u e. Reducing he empe a u e pa ame e can make he model's
ou pu s mo e ocused and concise, which can help educe he numbe o
okens used in he esponse.
Say we wan o build a cha applica ion o ou pe salon and allow cus ome s o ask us
ques ions abou pe s, g ooming, and hei needs. We can build a console cha
913.4 Cha comple ion API
applica ion, as shown in lis ing 3.25. I also shows us a possible way o ack and man-
age okens. In his example, we ha e a unc ion num_ okens_ om_messages which, as
he name sugges s, is used o calcula e he numbe o okens in a con e sa ion.
As he con e sa ion g ows u n by u n, we calcula e he numbe o okens used,
and once i eaches he model limi , he old messages a e emo ed om he con e sa-
ion. No e ha we s a a index 1. This ensu es we always p ese e he sys em message
a index 0 and only emo e use /assis an messages.
impo os
om openai impo Azu eOpenAI
impo ik oken
clien = Azu eOpenAI(
azu e_endpoin =os.ge en ("AOAI_ENDPOINT"),
api_ e sion=”2024-05-01-p e iew”,
api_key=os.ge en (“AOAI_KEY”))
GPT_MODEL = "gp -35- u bo"
sys em_message = {" ole": "sys em", "con en ": "You a e
➥a help ul assis an
max_ esponse_ okens = 250
oken_limi = 4096
con e sa ion = []
con e sa ion.append(sys em_message)
de num_ okens_ om_messages(messages):
encoding= ik oken.ge _encoding("cl100k_base")
num_ okens = 0
o message in messages:
num_ okens += 4
o key, alue in message.i ems():
num_ okens += len(encoding.encode( alue))
i key == "name":
num_ okens += -1
num_ okens += 2
p in ("I am a help ul assis an . I can alk abou pe s and salons.")
while T ue:
use _inpu = inpu ("")
con e sa ion.append({" ole": "use ", "con en ": use _inpu })
con _his o y_ okens = num_ okens_ om_messages(con e sa ion)
while con _his o y_ okens + max_ esponse_ okens >= oken_limi :
del con e sa ion[1]
con _his o y_ okens = num_ okens_ om_messages(con e sa ion)
esponse = clien .cha .comple ions.c ea e(
model=GPT_MODEL,
Lis ing 3.25 ConsoleCha App: Token managemen
Se s up he OpenAI
en i onmen and
con igu a ion de ails
Se s up he sys em
message o he cha
Func ion o coun he o al
okens om all he messages
in he con e sa ion
Uses he
ik oken
lib a y
o coun
okens
Loops h ough
he messages
Cap u es he use inpu
When he o al okens exceed he oken limi ,
we emo e he second oken. The i s oken
is he sys em oken, which we always wan .
Cha comple ion
API call

92 CHAPTER 3 Wo king h ough an API: Gene a ing ex
messages=con e sa ion,
empe a u e=0.8,
max_ okens=max_ esponse_ okens)
con e sa ion.append({" ole": "assis an ", "con en ":
➥ esponse.choices[0].message.con en })
p in (" n" + esponse.choices[0].message.con en )
p in ("(Tokens used: " + s ( esponse.usage. o al_ okens) + ")")
CHAT COMPLETION VS. COMPLETION API
Bo h cha comple ion and comple ion APIs a e designed o gene a e human-like ex
and a e used in di e en con ex s. The comple ion API is designed o single- u n
asks, p o iding comple ion o a p omp p o ided by he use . I is mos sui ed o
asks whe e only a single esponse is equi ed.
In con as , he cha comple ion API is designed o mul i u n con e sa ions,
main aining he con ex o he con e sa ion o e mul iple exchanges. This makes i
mo e sui able o in e ac i e applica ions such as cha bo s. The cha comple ion API is
a new dedica ed API o in e ac ing wi h he GPT-35-Tu bo and GPT-4 models and is
he p e e ed me hod. The cha comple ion API is gea ed mo e owa d cha bo s, and
using he di e en oles (sys em, use , and assis an ), we can ge he memo y o
p e ious messages and o ganize ew-sho examples.
3.4.5 Bes p ac ices o managing okens
Fo LLMs, okens a e he new cu ency. As mos en e p ises go beyond kicking i es o
business-c i ical use cases, managing okens would become a p io i y o compu a-
ions, cos , and o e all expe ience. F om an en e p ise applica ion pe spec i e, he e
a e some o he conside a ions o managing okens:
Concise p omp s—Whe e possible, using concise p omp s and limi ing he maxi-
mum numbe o okens will educe he oken’s usage, making i mo e cos -
e ec i e.
S op sequences—Use s op sequences o s op he gene a ions o a oid gene a ing
unnecessa y okens.
Coun ing okens—We can coun okens using he ik oken lib a y as ou lined
ea lie and a oid making he API calls do he same.
Smalle models—Gene ally speaking, in compu ing, bigge and newe ha dwa e
and so wa e a e conside ed as e , cheape , and be e ; howe e , his isn’ nec-
essa ily he case o LLMs. Whe e possible, conside using smalle models such
as GPT-3.5 Tu bo i s , and when hey migh no be a good i , conside going o
he nex one. Smalle models a e less compu e in ensi e and, hence, a e mo e
economical.
Use caching—Fo p omp s ha a e ei he qui e s a ic o equen ly epea ed,
implemen ing a caching s a egy would help sa e okens and a oid making API
calls epea edly. Fo mo e complex scena ios, look o cache he embeddings
933.4 Cha comple ion API
using a ec o sea ch and s o e, such as Azu e Cogni i e Sea ch, Pinecone, e c.
The las chap e co e ed an in oduc ion o embeddings, and we will ge mo e
de ails on embeddings and sea ching la e in chap e s 7 and 8 when we co e
RAG and cha ing wi h you da a.
3.4.6 Addi ional LLM p o ide s
Addi ional endo s also now ha e LLMs o use o en e p ises. These a e ei he a ail-
able ia APIs o , in some cases, as model weigh s ha en e p ises can sel -hos . Table
3.7 ou lines some o he mo e amous ones a ailable a he ime o publica ion. Please
no e ha some es ic ions a e in place om a comme cial-licensing pe spec i e.
In e es ingly, all hese endo s ollow a simila app oach o he concep s and APIs
es ablished by OpenAI. Fo example, as ou lined by hei documen s, he PaLM
model om Google’s comple ion API equi alen is p esen ed in he nex lis ing.
google.gene a i eai.gene a e_ ex (*,
model: Op ional[model_ ypes.ModelNameOp ions] = 'models/ ex -bison-001',
p omp : s ,
empe a u e: Op ional[ loa ] = None,
Table 3.7 O he LLM p o ide s
Models Desc ip ions
Llama 2 Me a eleased Llama 2, an open sou ce LLM, which comes in h ee sizes (7 billion, 13
billion, and 70 billion pa ame e s) and is ee o esea ch and comme cial pu poses.
Companies can access his h ough cloud op ions such as Azu e AI’s model ca alog,
Hugging Face, o AWS. En e p ises ha wan o hos i using hei own compu e and
GPUs can eques access om Me a ia h ps://ai.me a.com/llama/.
PaLM PaLM is a 13 billion-pa ame e model om Google ha is pa o hei gene a i e AI o
de elope p oduc s. The model can pe o m ex summa iza ion, dialogue gene a ion,
and na u al language in e ence asks. A he ime o publica ion, he e was a wai lis o
an API key; de ails a e a ailable a h ps://de elope s.gene a i eai.google/.
BLOOM Bloom is a 223-billion pa ame e , open sou ce mul ilingual model ha can unde s and
and gene a e ex in o e 100 languages by collabo a ing wi h o e 1,000 esea che s
ac oss mo e han 250 ins i u ions. I is a ailable ia Hugging Face o deploymen . Mo e
de ails a e a ailable a h ps://hugging ace.co/bigscience/bloom.
Claude Claude is a 12-billion pa ame e de eloped by An h opic. I is accessible h ough a play-
g ound in e ace and API in i s de elope console o de elopmen and e alua ion pu -
poses only. A publica ion, o p oduc ion use, en e p ises mus con ac Claude o
comme cial discussions. Mo e de ails can be ound a h ps://mng.bz/YVqz.
Gemini Google ecen ly eleased a new LLM called Gemini, a successo o PaLM 2 and op i-
mized o di e en sizes: ul a, p o, and nano. I is designed o be mo e powe ul han
i s p edecesso and can be used o gene a e new con en . Google claims i o be hei
mos capable AI model ye . Mo e de ails can be ound a h ps://mng.bz/GNxD.
Lis ing 3.26 PaLM-gene a ed ex API signa u e
94 CHAPTER 3 Wo king h ough an API: Gene a ing ex
max_ou pu _ okens: Op ional[in ] = None,
op_p: Op ional[ loa ] = None,
op_k: Op ional[ loa ] = None,
s op_sequences: Union[s , I e able[s ]] = None,
) -> ex _ ypes.Comple ion
While hese op ions exis , and some a e om epu able and leading echnology com-
panies, o mos en e p ises, Azu e OpenAI and OpenAI a e he mos ma u e, wi h
he mos en e p ise con ols and suppo needed. The nex chap e will deal wi h
images, and we will lea n how o mo e om ex o images and gene a e in ha
modali y.
Summa y
GenAI models a e classi ied in o a ious ca ego ies, depending on he ype.
Each model has addi ional capabili ies and cha ac e is ics. Choosing he igh
model o he use case a hand is impo an . And unlike compu e science, in
ou case, he bigges model isn’ necessa ily be e .
The comple ion API is a sophis ica ed ool ha gene a es ex , which can be
used o comple e p omp s p o ided by he use and o ms he backbone o he
ex gene a ion pa adigm.
The comple ion API is ela i ely easy o use wi h only a ew key pa ame e s, such
as he p omp , numbe o okens o gene a e, empe a u e pa ame e ha
helps s ee he model, and numbe o comple ions o gene a e.
The API exposes many ad anced op ions o s ee ing models and con olling
andomness and gene a ed ex , such as logi _bias, p esence penal y, and e-
quency penal y. All hese wo k in andem and help gene a e be e ou pu .
When using Azu e OpenAI, he con en sa e y il e can help il e speci ic ca e-
go ies o iden i y and ac on po en ially ha m ul con en as pa o bo h he
inpu p omp s and gene a ed comple ions.
The cha comple ion API builds on he comple ion API, going om one se o
ins uc ions and APIs o a dialogue wi h he use in a u n-by- u n in e ac ion.
The cha comple ion consis s o mul iple sys ems, use , and assis ance oles.
The con e sa ion s a s wi h a sys em message ha se s he assis an 's beha io ,
ollowed by al e na ing use and assis an messages as he con e sa ion p o-
ceeds u n by u n.
The sys em ole is included a he beginning o he message a ay. I p o ides
he ini ial ins uc ions o he model, including pe sonali y ai s, ins uc ions
and ules o he assis an o ollow, and addi ional in o ma ion we wan o p o-
ide as con ex o he model; his addi ional in o ma ion is called g ounding
he da a.
Each comple ion and cha comple ion API esponse has a inish eason, which
helps us unde s and why he API e u ned he esponse i did. This can be use-
ul o debugging and imp o ing he applica ion.
95Summa y
The language lea ning models all ha e a ini e con ex window and a e qui e
expensi e. Managing okens becomes impo an o us o be able o un hings
a a easonable cos and wi hin he API allowance. This also helps us manage
okens in con e sa ions o imp o ed use expe ience and cos -e ec i eness.
In addi ion o Azu e OpenAI and OpenAI, he e a e o he LLM p o ide s, such
as Me a’s Llama 2, Google’s Gemini and PaLM, Bloom by BigScience, and
An h opic’s Claude. Thei o e ings a e simila and ollow he comple ions and
cha comple ions pa adigm, including simila APIs.
102 CHAPTER 4 F om pixels o pic u es: Gene a ing images
Figu e 4.5 GAN model a chi ec u e
GANs o e many simila use cases, such as VAEs, bu hey a e speci ically good o
Image gene a ion—C ea ing ealis ic images om noise, wi h speci ic applica-
ions in en e aining, design, and a , allows gene a ing high-quali y images.
S yle ans e —Enabling a is ic s yles o anspose om one image o ano he ;
his is he same as in VAEs.
Supe esolu ions—GANs can help enhance esolu ion, making images mo e
de ailed and clea e . This is e y help ul in some indus ies, such as medical
and space imaging.
Da a augmen a ion—Simila o VAEs o c ea ing syn he ic da a, GANs help c e-
a e aining da a ei he o edge cases o whe e he e is no enough da a o da a
di e si y.
GANs can p oduce high-quali y images ha a e indis inguishable om eal ones. S ill,
hey ha e d awbacks, such as mode collapse (i.e., he model epea edly p oduces he
same ou pu ), ins abili y, and di icul y con olling he ou pu . They also aise e hical
conce ns, as hey can be qui e easily used o c ea e deep akes ha could lead o p i-
acy in asion, po en ial misin o ma ion, and mis ep esen a ion. Finally, as wi h many
o he AI models, GANs can inad e en ly pe pe ua e biases p esen in he aining
da a in he gene a ed ou pu .
4.1.3 Vision ans o me models
T ans o me s a e ano he model a chi ec u e ha can c ea e images. We saw he same
a chi ec u e ea lie in he con ex o na u al language p ocessing (NLP) asks. T ans-
o me s can also ope a e on ision- ela ed asks and a e called ision ans o me s
(ViT) [2].
T ans o me s a e neu al ne wo ks ha use a en ion mechanisms o p ocess
sequen ial da a, such as ex o speech, and hey can be used o gene a e image
La en
space
G
gene a o
D
disc imina o
Noise
Lea n da a dis ibu ion
Gene a e ake
samples
Fine- uning aining
Lea n di e ence be ween
ake samples and eal samples
Real samples
Is D
co ec ?

1034.1 Vision models
p omp s. They a e also e y e ec i e o speci ic asks such as image ecogni ion and
ha e ou pe o med p e ious leading model a chi ec u es.
A ViT model’s a chi ec u e is simila o ha o NLP, albei wi h some di e ences—
i has a la ge numbe o sel -a en ion laye s and a global a en ion mechanism allow-
ing he model o a end o all pa s o he image simul aneously. T ans o me s calcu-
la e how much each inpu oken is ela ed o e e y o he inpu oken. This is called
a en ion. The mo e okens he e a e, he mo e a en ion calcula ions a e needed.
The numbe o a en ion calcula ions g ows as he squa e o he numbe o okens,
ha is, is quad a ically.
Fo images, howe e , he basic uni o analysis is a pixel and no a oken. The ela-
ionships o e e y pixel pai in a ypical image a e compu a ionally p ohibi i e.
Ins ead, ViT compu es ela ionships among pixels in a ious small sec ions o he
image ( ypically in 16 × 16-sized pixels), which helps educe he compu a ional cos .
These 16 × 16-sized sec ions, along wi h hei posi ional embeddings, a e placed in a
linea sequence and a e he inpu o he ans o me .
As shown in igu e 4.6, a ViT model consis s o h ee main sec ions: he le , mid-
dle, and igh . The le sec ion shows he inpu classes, such as Class, Bi d, Ball, Ca ,
and so o h. These a e he possible labels ha he model can assign o an image. The
middle sec ion shows he linea p ojec ion o la ened pa ches, which ans o m he
inpu image in o a sequence o ec o s ha can be ed o he ans o me encode .
The inal sec ion is he ans o me encode . This comp ises se e al mul i-head a en-
ion and no maliza ion laye s and is used o lea n he ela ionships be ween di e en
image pa s.
Figu e 4.6 Vision ans o me (ViT) a chi ec u e [2]
Vision ans o me (ViT) T ans o me encode
MLP
head
Class
Bi d
Ball
Ca
...
T ans o me encode
Linea p ojec ion o la ened pa ches
Mul ihead
a en ion
MLP
No m
No m
Embedded
pa ches
Pa ch + posi ion
embedding
* Ex a lea nable
[class] embedding
104 CHAPTER 4 F om pixels o pic u es: Gene a ing images
ViTs a e used o a ious image use cases, such as segmen a ion, classi ica ion, and
de ec ion, and hey a e o en mo e accu a e han p e ious echniques. They also sup-
po ine- uning, which can be used in a ew-sho manne wi h smalle da ase s, mak-
ing hem qui e use ul o en e p ise use cases whe e we migh no ha e much da a.
The ViT model aims o p oduce a inal ec o ep esen a ion o he class oken,
which con ains in o ma ion abou he whole image.
ViTs also ha e challenges such as high compu a ional cos s, da a sca ci y, and e hical
issues. They a e compu a ionally complex bo h om a aining and in e ence pe spec-
i e and ha e low in e p e abili y—bo h ac i e esea ch a eas. Mul imodal models, wi h
ViTs such as GPT-4, hold much p omise and unlock new en e p ise possibili ies.
4.1.4 Di usion models
Di usion models a e gene a i e machine lea ning models ha can c ea e ealis ic
da a om andom noise, such as images o audio. Thei goal is o lea n he la en
s uc u e o a da ase by modeling how da a poin s di use h ough ha la en space.
The model is ained by slowly adding noise o an image and lea ning o e e se his
by emo ing noise om he inpu un il i esembles he desi ed ou pu . Fo example,
a di usion model can gene a e an image o a panda by s a ing wi h a andom image
and hen slowly emo ing noise un il i looks like a panda.
Vision di usion models ypically consis o wo pa s: a o wa d and a e e se di u-
sion p ocess. The o wa d di usion p ocess is esponsible o g adually adding noise
o he la en ep esen a ion o an image, which co up s ha la en space. The e e se
di usion p ocess is jus he opposi e—i is esponsible o econs uc ing he o iginal
image om he co up ed la en ep esen a ion.
The o wa d di usion p ocess is ypically implemen ed as a Ma ko chain (i.e., a
sys em wi h no memo y o i s pas , and he p obabili y o he nex s ep depends on he
cu en s a e). This means he co up ed la en ep esen a ion a each s ep depends
only on he p e ious s ep’s la en ep esen a ion, which makes he o wa d di usion
p ocess e icien and easy o ain.
The e e se di usion p ocess is ypically implemen ed as a neu al ne wo k, mean-
ing he neu al ne wo k lea ns o e e se he o wa d di usion p ocess by p edic ing
he o iginal la en ep esen a ion om he co up ed one. This e e se di usion p o-
cess is slow, as i is a s ep-by-s ep epe i ion.
Some o he ad an ages ha di usion models ha e a e he ollowing:
They can p oduce high-quali y images ha ma ch o bea GAN-gene a ed
images, especially o complex scenes, bu hey ake much longe o gene a e.
They do no su e om mode collapse, a common p oblem o GANs. Mode
collapse occu s when he gene a o p oduces only a limi ed a ie y o ou pu s,
igno ing some modes o da a dis ibu ion.
Di usion models can cap u e he ull di e si y o he da a dis ibu ion by using
a Ma ko chain p ocess ha adds noise o he inpu da a.
Di usion models can be combined wi h o he s, such as na u al language mod-
els, o c ea e ex -guided gene a ion sys ems.
1054.1 Vision models
S able Di usion is one o he mos popula di usion-based models o image gene a-
ion. I s a chi ec u e consis s o h ee main pa s (see igu e 4.7):
The ex encode , which con e s he use ’s p omp in o a ec o ep esen a ion.
A denoising au oencode (called UNe ), which is used o econs uc an image
om he la ency space, and a schedule algo i hm, which helps econs uc he
o iginal image. We call i he image in o ma ion c ea o . The UNe is a denoising
au oencode because i lea ns o emo e noise om he inpu image and p o-
duce a clean ou pu image. I is a neu al ne wo k ha has an encode –decode
s uc u e. The encode pa educes he esolu ion o an inpu image and
ex ac s i s ea u es. On he o he hand, he decode pa inc eases he esolu-
ion and econs uc s he ou pu image.
A a ia ional au oencode (VAE), which c ea es an image as close as possible o
a no mal dis ibu ion.
Figu e 4.7 S able Di usion logical a chi ec u e
Tex condi ioned
la en UNe
Tex encode
(CLIP)
Image decode
( a ia ional
au oencode )
Repea Ns eps 64 x 64
Condi ional
la en s
64 x 64
La en s
77 x 768
Tex embeddings
P omp :
“A panda iding a wa e”
La en seed
Random image
512 x 512
Gene a ed image
Image in o ma ion c ea o
Schedule
algo i hm
“ econs uc ”
1
2
3
106 CHAPTER 4 F om pixels o pic u es: Gene a ing images
The choice be ween hese models depends on he speci ic applica ion, he a ailabili y
o compu ing esou ces, aining da a, and non unc ional equi emen s such as image
quali y, speed, and so o h. Table 4.1 lis s some o he mo e common gene a i e AI
ision sys ems ha can c ea e images om ex .
Many o he AI ision models lis ed in able 4.1 a e a ailable only o hose who we e
in i ed o es hem. This is s ill a new space, and mos p o ide s a e going slowly,
lea ning wi h a hand ul o cus ome s be o e olling hese ou .
C ea ing and manipula ing images wi h gene a i e AI is an exci ing and challeng-
ing esea ch a ea wi h many po en ial applica ions and implica ions. Howe e , i aises
e hical and social ques ions abou he gene a ed con en ’s owne ship, au hen ici y,
and e ec s. The e o e, i is impo an o use gene a i e AI esponsibly and e hically
and o conside i s bene i s and isks o socie y.
4.1.5 Mul imodal models
A mul imodal model can handle di e en ypes o inpu da a. “Modal” e e s o he
mode o ype o da a, and “mul imodal” e e s o mul iple da a ypes. These ypes
include ex , images, audio, ideo, and mo e. Fo example, GPT-4 has a mul imodal
Table 4.1 Mos common AI ision ools
AI ision ool Desc ip ion
Imagen Imagen is Google’s ex - o-image di usion model, which can gene a e ealis ic
images om ex desc ip ions. I is a ailable in limi ed p e iew and has been
shown o gene a e images indis inguishable om eal pho og aphs.
DALL-E OpenAI de eloped a ans o me language model o c ea e di e se, o iginal, eal-
is ic, and c ea i e images and a om a p omp . I can edi images based on he
con ex , such as adding, dele ing, o changing speci ic pa s. I has gene a ed a -
ious images, om e e yday objec s o su ealis ic a , om simple ex p omp s.
DALL-E 3 is an imp o ed e sion ha can gene a e mo e ealis ic and accu a e
images wi h 4x g ea e esolu ion.
Midjou ney AI-based a gene a o ha uses deep lea ning and neu al ne wo ks o c ea e a -
wo k based on p omp s and o he images and ideos. This is accessible only ia
a Disco d se e , and he esul s can be ailo ed o any aes he ics, om abs ac
o ealis ic, hus o e ing endless possibili ies o c ea i e exp ession.
Adobe Fi e ly Adobe Fi e ly is a amily o c ea i e, gene a i e AI di usion models designed o
help designe s and c ea i e p o essionals c ea e images and ex e ec s and
edi and ecolo . I is easy o use wi h Adobe’s o he ools, such as Pho oshop
and Illus a o . Adobe has bo h ex - o-image models and gene a i e ill
models.
S able Di usion Popula models include e sions o S able Di usion XL and 1.6, an image-
gene a ing model ha uses di usion models o c ea e high-quali y images using
p omp s wi h nex -le el pho o ealism capabili ies. I can also gene a e no el
images om ex desc ip ions. The mo e ecen 3 amily o models comes in
la ge and medium wi h 8B and 2B pa ame e s, espec i ely.
1074.1 Vision models
model a ian ha akes bo h an image and an associa ed p omp o make p edic ions
o in e ences.
Bing Cha ecen ly enabled his mul imodal ea u e, allowing us o use images and
ex in he p omp . Fo example, as shown in igu e 4.8, we gi e he model wo hings:
an image and a p omp ela ed o he image. In his case, we show some p oduce and
ask he model wha we can cook wi h i .
Figu e 4.8 Mul imodal example using bo h an image and a p omp
In his case, he model mus unde s and he image and he di e en pa s (i.e., ing e-
dien s in ou example) and co ela e o he p omp o gene a e an answe . We see he
esponse in he shaded ex , showing we can make guacamole, salsa, a ocado oas ,
and so o h.

108 CHAPTER 4 F om pixels o pic u es: Gene a ing images
Mul imodal models o en use di e en AI echniques. While hey can use di e en
combina ions o model a chi ec u e, in ou example, GPT-4 combines di e en ans-
o me blocks ( igu e 4.9).
Figu e 4.9 Mul imodal model design
NOTE When showing ans o me blocks, as in igu e 4.9, he con en ion is
o use Nx, e e ing o he ans o me block epea ing mul iple imes; in
o he wo ds, i is s acked x numbe o imes. In ou mul imodal example, his
is he case o all h ee ans o me blocks: he image on he le (Lx), he
ex on he igh (Rx), and he combining laye (Nx).
Mul i sel -a en ion
Feed o wa d
Combining laye
Visual embeddings
T ans o me block
Mul i sel -a en ion
Feed o wa d
Tex embeddings
T ans o me block
Lx
Nx
Rx
“Wha can I make
wi h his?” Tex inpu
Image inpu
+ +
+ +
Mul i sel -a en ion
Feed o wa d
T ans o me block
+
+
Laye no m
Laye no m
Laye no m
Laye no m
Laye no m
Laye no m
1094.2 Image gene a ion wi h S able Di usion
Mul imodal models a e pa icula ly use ul in complex eal-wo ld applica ions whe e
da a comes in a ious o ms. Fo example:
Web—Analyzes ex and images o con en mode a ion and sen imen analysis
eComme ce—Recommends p oduc s using bo h pho os and ex desc ip ions
Heal hca e—Uses ex da a (pa ien medical his o y) and medical imaging
(image da a) o diagnosis
Sel -d i ing—In eg a es senso da a ( ada and lida ) wi h isual da a (came as)
o si ua ional awa eness and decision-making
Now ha we ha e seen some models, hei ou pu , and a gene al sense o how ision
AI models wo k, le us gene a e images wi h S able Di usion.
4.2 Image gene a ion wi h S able Di usion
S abili y AI, he company behind he S able Di usion, has ad anced di usion-based
models wi h SDXL as hei la es and mos powe ul model hus a . They o e mul i-
ple op ions o us o use:
Sel -hos —The model and associa ed weigh s ha e been published and a e a ail-
able ia Hugging Face (h ps://hugging ace.co/s abili yai). They can be sel -
hos ed, equi ing he app op ia e compu ing ha dwa e, including GPUs.
D eamS udio—This S abili yAI’s consume applica ion a ge s consume s. I is a
simple web in e ace ha gene a es images. The company also has an open
sou ce e sion called S ableS udio, d i en by he communi y. Mo e de ails on
D eamS udio can be ound a h ps://d eams udio.ai.
Pla o m APIs—S abili y AI has a pla o m API (h ps://pla o m.s abili y.ai) ha
we will use in his book, gi en ha mos en e p ises would p e e an API ha
can be managed be e a scale. REST API will be used o ou example he e, as
i shows he mos lexibili y ac oss all pla o ms. S able Di usion also has a
gRPC API, which is qui e simila .
4.2.1 Dependencies
We will build on he packages equi ed ea lie in chap e 3 and assume ha he ol-
lowing a e ins alled: Py hon, de elopmen IDE, and a i ual en i onmen (such as
conda). Fo S able Di usion, we need he ollowing:
A S abili y AI accoun and associa ed API key; his can be acqui ed ia he
accoun page a h ps://pla o m.s abili y.ai/accoun /keys. Billing de ails also
need o be se up a he same place. We pip ins all he s abili y-sdk Py hon
package: pip ins all s abili y-sdk.
Keep he API key con iden ial, and ollow bes p ac ices o managing sec e s.
We will use en i onmen al a iables o s o e he key secu ely, which can be con-
igu ed as ollows:
–Windows—se x STABILITY API KEY “you -openai-key”
–Linux/Mac—expo STABILITY API KEY=you -openai-endpoin
110 CHAPTER 4 F om pixels o pic u es: Gene a ing images
–Bash—echo expo STABILITY_API_KEY="YOUR_KEY" >> /e c/en i onmen
&& sou ce /e c/en i onmen
We s a by ge ing a lis o all he models a ailable using he engines API, including all
he a ailable engines (i.e., models).
impo os
impo eques s
impo json
api_hos = "h ps://api.s abili y.ai"
u l = "{api_hos }/ 1/engines/lis "
esponse = eques s.ge (u l, heade s={
"Au ho iza ion": "Bea e {api_key}"
})
payload = esponse.json()
# o ma he payload o p in ing
payload = json.dumps(payload, inden =2)
p in (payload)
The ou pu o his code is p esen ed in he nex lis ing. This shows us he engines we
mus use and helps in es ing end o end o con i m ha he API call wo ks and ha
we can au hen ica e and ge a esponse.
[
{
"desc ip ion": "Real-ESRGAN_x2plus upscale model",
"id": "es gan- 1-x2plus",
"name": "Real-ESRGAN x2",
" ype": "PICTURE"
},
{
"desc ip ion": "S abili y-AI S able Di usion XL 1.0",
"id": "s able-di usion-xl-1024- 1-0",
"name": "S able Di usion XL 1.0",
" ype": "PICTURE"
},
{
"desc ip ion": "S abili y-AI S able Di usion 1.5",
"id": "s able-di usion- 1-5",
"name": "S able Di usion 1.5",
" ype": "PICTURE"
},
…
]
Lis ing 4.1 S able Di usion: Lis ing he models
Lis ing 4.2 Ou pu : S able Di usion model lis s
REST API call o
ge ing he models
HTTP heade o
au ho iza ion
Response back
om he API
Making he JSON mo e
human- eadable
1114.2 Image gene a ion wi h S able Di usion
4.2.2 Gene a ing an image
We use he S able Di usion image gene a ion endpoin (REST API) o ou image gen-
e a ion. We will use he la es model, he SDXL model, a he ime o his publica ion.
The co esponding engine ID o his model is s able-di usion-xl-1024- 1-0, as
shown in he p e ious example lis ing o models. This engine ID is equi ed as pa o
he REST API pa h pa ame e and is a ailable a h ps://api.s abili y.ai/ 1/gene a ion/
{engine_id}/ ex - o-image.
Lis ing 4.3 shows an example o using his API o gene a e an image. No e ha we
use 1.0 o he API o he examples in his chap e . To use he newe models, we only
need o change he REST API pa h in mos cases. Fo example, o use he newe models
ha ha e jus been announced, S able Di usion 3 and cu en ly in Be a, swi ch o he
ollowing engine ID:
h ps://api.s abili y.ai/ 2be a/s able-image/gene a e/sd3
.
impo base64
impo os
impo eques s
impo da e ime
impo e
engine_id = "s able-di usion-xl-1024- 1-0"
api_hos = "h ps://api.s abili y.ai"
api_key = os.ge en ("STABILITY_API_KEY")
p omp = "Laughing panda in he clouds ea ing bamboo"
# Se he olde o sa e he image; make su e i exis s
image_di = os.pa h.join(os.cu di , 'images')
i no os.pa h.isdi (image_di ):
os.mkdi (image_di )
# Func ion o clean up ilenames
de alid_ ilename(s):
s = e.sub( '[^ w_.)( -]', '', s).s ip()
e u n e.sub( '[ s]+', '_', s)
esponse = eques s.pos (
"{api_hos }/ 1/gene a ion/{engine_id}/ ex - o-image",
heade s={
"Con en -Type": "applica ion/json",
"Accep ": "applica ion/json",
"Au ho iza ion": "Bea e {api_key}"
},
json={
" ex _p omp s": [
{
" ex ": "{p omp }",
}
],
Lis ing 4.3 S able Di usion: Image gene a ion
Choose he model
we wan o use.
P omp s used o
gene a e he image
Helpe unc ions
o c ea e ilenames
API call o gene a ing
he image
The REST API
Endpoin includes
he engine ID.
118 CHAPTER 4 F om pixels o pic u es: Gene a ing images
As shown in igu e 4.15, addi ional se ings o inpain ing allow o ine con ol.
Some o hese a e he same as image gene a ion and a e equally impo an , such as
he numbe o sampling s eps and me hods.
Figu e 4.15 S able Di usion inpain ing op ions
(con inued)
o ha pa icula ask. Fo example, CLIP can be gi en he names o isual classes
and iden i y hem in images, e en i i wasn’ speci ically ained on hem.
CLIP encodes bo h ex and images in o a common ep esen a ion space. I can es i-
ma e he mos sui able ex snippe o an image o ice e sa. This gi es i much
lexibili y and he abili y o handle di e en kinds o isual asks wi hou equi ing
aining da a speci ic o each ask.

1194.4 Edi ing and enhancing images using S able Di usion
Ou pain ing is an addi ional se ing ha gene a es and expands he image in ou cho-
sen di ec ion. This op ion is selec ed ia he Sc ip d opdown on he same se ings ab
( igu e 4.16).
Figu e 4.16 Ou pain ing se ings in S able Di usion
We go h ough he i e a ion o inpain -
ing by emo ing he a eas we wan
using he mask, egene a ing, and
hen adding he new elemen s. The
inal esul o hese i e a ions is shown
in igu e 4.17.
NOTE The de ails on S able Di u-
sion web UI, including se up, con-
igu a ion, and deploymen s, a e
ou side he scope o his book; how-
e e , i is one o he e y popula
applica ions ha allow one o sel -
hos ac oss Windows, Linux, and
MacOS. You can ind mo e de ails a
hei Gi Hub eposi o y (h ps://
mng.bz/znx1).
4.4.1 Gene a ing using image- o-image API
Image- o-image is a powe ul ool o gene a ing o modi ying new images ha use
exis ing images as a s a ing poin and a ex p omp . We can use his API o gene a e
a new image bu change he s yle and mood and add o emo e aspec s.
Figu e 4.17 Final edi s o inpain ing using S able
Di usion
120 CHAPTER 4 F om pixels o pic u es: Gene a ing images
Le ’s use ou se ene lake example om ea lie and hen use he image- o-image
API o gene a e a new image. We build on bo h examples we ha e seen ea lie —we
use he se ene lake as ou inpu and ask he model o gene a e “a happy panda ea ing
bamboo in he sky.”
impo base64
impo os
impo eques s
impo da e ime
impo e
engine_id = "s able-di usion-xl-1024- 1-0"
api_hos = "h ps://api.s abili y.ai"
api_key = os.ge en ("STABILITY_API_KEY")
o ginal_image = "images/se ene_ aca ion_lake_house.jpg"
#helpe unc ions
...
esponse = eques s.pos (
"{api_hos }/ 1/gene a ion/{engine_id}/image- o-image",
heade s={
"Accep ": "applica ion/json",
"Au ho iza ion": "Bea e {api_key}"
},
iles={
"ini _image": open(o ginal_image, " b")
},
da a={
"image_s eng h": 0.35,
"ini _image_mode": "IMAGE_STRENGTH",
" ex _p omp s[0][ ex ]": "A happy panda ea ing bamboo in he sky",
"c g_scale": 7,
"samples": 1,
"s eps": 50,
"sample ": "K_DPMPP_2M"
}
)
da a = esponse.json()
o i, image in enume a e(da a["a i ac s"]):
ilename = "{ alid_ ilename(os.pa h.basename(o ginal_image))}_
➥img2img_{i}_{da e ime.da e ime.now().
➥s ime('%Y%m%d_%H%M%S')}.png"
image_pa h = os.pa h.join(image_di , ilename)
wi h open(image_pa h, "wb") as :
.w i e(base64.b64decode(image["base64"]))
We see he gene a ed image as shown on he le in igu e 4.18 o he image- o-image
API call; we see he panda and he bamboo and how he inpu image o se he scene
Lis ing 4.4 Image- o-image gene a ion
1214.4 Edi ing and enhancing images using S able Di usion
and he ype and aes he ic o he gene a ed image a e used. Howe e , i doesn’
adhe e o he cloud aspec o he p omp .
We can weak he pa ame e s o make i adhe e mo e o he p omp and less o he
inpu image, as shown on he igh side o igu e 4.18. An example is when we see a
panda in he sky, ea ing bamboo; o e all, he image aes he ics ollows he inpu image.
Figu e 4.18 S able Di usion image- o-image gene a ion
4.4.2 Using he masking API
S able Di usion also has a masking API ha allows us o edi po ions o an image p o-
g amma ically. The API is e y simila o he c ea ion API, as shown in he example in
lis ing 4.5. I does ha e a ew cons ain s: he mask image needs o be he same dimen-
sion as he o iginal image, and a PNG, less han 4MB in size. The API has he same
heade pa ame e s ou lined ea lie in he chap e when we discussed image gene a-
ion; we will a oid duplica ing ha .
impo base64
impo os
impo eques s
impo da e ime
impo e
engine_id = "s able-inpain ing-512- 2-0"
api_hos = "h ps://api.s abili y.ai"
api_key = os.ge en ("STABILITY_API_KEY")
o ginal_image = "images/se ene_ aca ion_lake_house.jpg"
mask_image = "images/mask_se ene_ aca ion_lake_house.jpg"
p omp = " boa wi h a pe son ishing and a dog in he boa "
Lis ing 4.5 S able Di usion masking API example
Selec s he inpain ing
model we wan o use
Image we
wan o edi
Masks ha we
wan o apply
122 CHAPTER 4 F om pixels o pic u es: Gene a ing images
# helpe unc ions
...
esponse = eques s.pos (
"{api_hos }/ 1/gene a ion/{engine_id}/image- o-image/masking",
heade s={
"Accep ": 'applica ion/json',
"Au ho iza ion": "Bea e {api_key}"
},
iles={
'ini _image': open(o ginal_image, ' b'),
'mask_image': open(mask_image, ' b'),
},
da a={
"mask_sou ce": "MASK_IMAGE_BLACK",
" ex _p omp s[0][ ex ]": p omp ,
"c g_scale": 7,
"clip_guidance_p ese ": "FAST_BLUE",
"samples": 4,
"s eps": 50,
}
)
da a = esponse.json()
o i, image in enume a e(da a["a i ac s"]):
ilename = "{ alid_ ilename(os.pa h.basename(o ginal_image))}_
➥masking_{i}_{da e ime.da e ime.now().
➥s ime('%Y%m%d_%H%M%S')}.png"
image_pa h = os.pa h.join(image_di , ilename)
wi h open(image_pa h, "wb") as :
.w i e(base64.b64decode(image["base64"]))
Table 4.4 ou lines all he API pa ame e s. In e ms o op ions o s ee he model,
much o i is simila o he p e ious image c ea ion.
Table 4.4 S able Di usion masking API pa ame e s
Pa ame e Type De aul alue Desc ip ion
ini _image S ing Bina y
( equi ed)
The ini ial image ha we wan o edi
mask_sou ce S ing Null ( equi ed) Mask de ails ha de e mine he gene a ion a eas and
associa ed s eng hs. I can be one o he ollowing:
MASK_IMAGE_WHITE—Use whi e pixels as he
mask; whi e pixels a e modi ied; black pixels a e
unchanged.
MASK_IMAGE_BLACK—Use black pixels as he
mask; black pixels a e modi ied; whi e pixels a e
unchanged
INIT_IMAGE_ALPHA—Use he alpha channel as he
mask. Edi ully anspa en pixels, and lea e ully
opaque pixels unchanged.
Masks
API call
Selec s he black pixels o
he image o be eplaced
P omp s o
he gene a ion
Speci ies he numbe
o images o gene a e
De e mines he numbe o
s eps o each o he images
Ge s he esponse
om he API
Sa es he edi ed
image o disk
1234.4 Edi ing and enhancing images using S able Di usion
mask_image S ing Bina y
( equi ed)
Mask image ha guides he model on which pixels
need o be modi ied. This pa ame e is used only i
he mask_sou ce is ei he MASK_IMAGE_BLACK o
MASK_IMAGE_WHITE.
ex _p omp s S ing Null ( equi ed) An a ay o ex p omp s is used o gene a e he
image. Each elemen in his a ay comp ises wo p op-
e ies—one o he p omp i sel and he o he o he
associa ed weigh . The weigh s should be nega i e o
nega i e p omp s. The p omp s need o adhe e o he
ollowing o ma :
ex _p omp s[index][ ex |weigh ], wi h
he index being unique and no ha ing o be
sequen ial.
c g_scale S ing 7 (op ional) Can ange be ween 0 and 35; i de ines how s ic ly
he di usion p ocess ollows he p omp . Highe al-
ues keep he image close o he p omp .
clip_guidance
_p ese
S ing None (op ional) Di e en alues con ol how much CLIP guidance is
used and in luence he quali y and ele ance o he
image being gene a ed. Possible alues a e NONE,
FAST_BLUE, FAST_GREEN, SIMPLE, SLOW,
SLOWER, and SLOWEST.
sample S ing Null (op ional) De ines he sample o use o he di usion p ocess.
I his alue is omi ed, he API au oma ically selec s
an app op ia e sample o you.
Possible alues a e DDIM, DDPM, K_DPMPP_2M,
K_DPM_2, K_EULER K_DPMPP_2S_ANCESTRAL,
K_HEUN, K_DPM_2_ANCESTRAL, K_LMS, and
K_EULER_ANCESTRAL.
samples In ege 1 (op ional) De ines he numbe o images o gene a e. Values
need o ange be ween 1 and 10.
seed In ege 0 (op ional) A andom seed is a numbe ha de e mines how he
noise looks. Lea e 0 o a andom seed alue. The
possible alue anges be ween 0 and 4294967295.
s eps In ege 50 (op ional) De ines he numbe o di usion s eps o un. Possible
alues ange be ween 10 and 150.
s yle_p ese S ing Null (op ional) Used o guide he image model owa ds a pa icula
p ese s yle. Possible alues a e 3d-model,
analog- ilm, anime, cinema ic, comic-book,
digi al-a , enhance, an asy-a ,
isome ic, line-a , low-poly, modeling-
compound, neon-punk, o igami,
pho og aphic, pixel-a , and ile-
ex u e.
No e: This lis o s yle p ese s is subjec o change
o e ime.
Table 4.4 S able Di usion masking API pa ame e s (con inued)
Pa ame e Type De aul alue Desc ip ion

124 CHAPTER 4 F om pixels o pic u es: Gene a ing images
4.4.3 Resize using he upscale API
The inal S able Di usion API we wan o co e is used o upscale an image, ha is,
gene a e a highe - esolu ion image o a gi en image. The de aul is o upscale he
inpu image by a ac o o wo, wi h a maximum pixel coun o 4,194,304, equi alen
o a maximum dimension o 2,048 × 2,048 and 4,096 × 1,024.
The API is s aigh o wa d, as shown in he nex lis ing. The main hing o be
awa e o is using he igh model ia he engine_id pa ame e .
impo base64
impo os
impo eques s
impo da e ime
impo e
engine_id = "es gan- 1-x2plus"
api_hos = "h ps://api.s abili y.ai"
api_key = os.ge en ("STABILITY_API_KEY")
o ginal_image = "images/se ene_ aca ion_lake_house.jpg"
# helpe unc ions
...
esponse = eques s.pos (
"{api_hos }/ 1/gene a ion/{engine_id}/image- o-image/upscale",
heade s={
"Accep ": "image/png",
"Au ho iza ion": "Bea e {api_key}"
},
iles={
"image": open(o ginal_image, " b")
},
da a={
"wid h": 2048,
}
)
ilename = "{ alid_ ilename(os.pa h.basename(o ginal_image))}_
➥upscale_{da e ime.da e ime.now().
➥s ime('%Y%m%d_%H%M%S')}.png"
image_pa h = os.pa h.join(image_di , ilename)
wi h open(image_pa h, "wb") as :
.w i e( esponse.con en )
Now ha we ha e examined nume ous image-gene a ion op ions using bo h GUIs
and APIs, le ’s examine some o he bes p ac ices o en e p ises.
Lis ing 4.6 S able Di usion esizing API
1254.4 Edi ing and enhancing images using S able Di usion
4.4.4 Image gene a ion ips
This sec ion ou lines some bes p ac ices o image gene a ion. In he con ex o
en e p ises, ou side o some unc ions, such as g aphic designe s and a is s, many
people wi h di e en skills need help. These sugges ions will help hem ge s a ed.
We will co e mo e de ails la e in he book when discussing p omp enginee ing:
Desc ibe in de ail—Desc ibe he main subjec you wan o gene a e in de ail. The
isual elemen s we imagine o wan migh no ma ch how he model in e p e s
hem, so adding de ails and hin s can s ee he model mo e owa d wha you
wan . Many also o ge o desc ibe he backg ound; i is also impo an o add
hose de ails.
Vibes and a s yle—Speci y he s yle o he ibe o he a ha is you in en ; o
example, we ou lined a pain ing in he p omp s ea lie . The lis is endless and,
in some ways, up o you imagina ion, going om oil pain ing o s eampunk o
ac ion pho og aphy.
Se he emo ion, ene gy, and mood—Add adjec i es and e bs ha con ey he
mood, ene gy, and o e all emo ion— o example, he gene a ed image aims o
be posi i e and high ene gy, o posi i e bu low ene gy, and so o h.
Hands and ace gene a ions—These a e p oblema ic o many models, and while
hey a e ge ing be e , some imes i is be e o add s ock o o he images o
gene a ed images.
S uc u e, size, ligh , and iewing pe spec i es—When hinking o he ibe and s yle
o he a ge image, one also has o hink o he size and s uc u e o he a i-
ac s. Fo example, do we expec some hing small and in ica e o big and ee-
s anding? And om wha pe spec i e a e he a i ac s being looked a —is i a
Wa e ma k o AI-gene a ed images
Since AI-gene a ed images a e ge ing inc easingly be e , and we o en canno dis-
inguish be ween eal and AI-gene a ed images, he e is a push o wa e ma k
AI-gene a ed images. The e a e wo main ways o do his oday: isible wa e ma ks,
like wha Bing and DALLE do, and in isible wa e ma ks, which a e no isible o us
bu a e embedded in he image and can be de ec ed using special ools.
Google has gone a s ep u he and de eloped a new ype o wa e ma k called
Syn hID. An in isible wa e ma k is embedded in each image pixel, making i mo e
esis an o image manipula ion, such as il e s, esizing, and c opping. I does so
wi hou deg ading he image in any no iceable way and wi hou changing he image
size signi ican ly.
The e a e mul iple bene i s o wa e ma king AI-gene a ed images. In addi ion o indi-
ca ing he o igin and possibly owne ship o he images, hey help discou age unau-
ho ized use and dis ibu ion and help p e en he sp ead o misin o ma ion. Chap e
13 co e s GenAI- ela ed isks in mo e de ail, including mi iga ion s a egies and asso-
cia ed ooling.
126 CHAPTER 4 F om pixels o pic u es: Gene a ing images
closeup, a long sho , wide angle, ou doo , o in na u al ligh ? O cou se, gi en
ha we a e alking abou a p omp , i can combine many o hese hings.
Wo ds, logos, and cha ac e s—The image models a en’ la ge language models
and gene ally s uggle wi h images whe ein we expec wo ds o be gene a ed
(e.g., a pe salon wi h i s name on he ou side). I is bes o add hese manually
when edi ing he images. Once added, we can use inpain ing.
A oid mul iple cha ac e s oge he —I you add many cha ac e s in he same p omp
and gene a ion ask, i is common o he model o ge con used. I migh be
be e o s a wi h smalle asks and hen use inpain ing o manually edi hese
elemen s.
The nex chap e will show o he hings ha can be gene a ed in addi ion o ex and
images. We will co e audio, ideo, and code gene a o s.
Summa y
Vision-based gene a i e AI models allow us o c ea e unique and ealis ic con-
en , all om a simple p omp . These models can gene a e new con en , edi
and enhance exis ing images, and use simple p omp s.
Gene a i e AI ision models ha e mul iple use cases in which hey can be used
o c ea i e con en , image edi ing, syn he ic da a c ea ion, and gene a i e
design.
The e a e ou p ima y gene a i e AI model a chi ec u es, each wi h s eng hs
and challenges. We explained a ia ional au oencode s (VAEs), gene a i e
ad e sa ial ne wo ks (GANs), ision ans o me models (ViT), and di usion
models.
Mul imodal models a e di e en gene a i e AI models ha allow us o handle
di e en ypes o inpu da a, including ex , images, audio, and ideo, simul a-
neously.
OpenAI’s DALLE, Bing, Adobe, and S abili y AI’s S able Di usion a e some o
he mo e amous and common gene a i e AI image models used by en e p ises
o image gene a ion and edi ing. Mos hings exposed ia an API ha e ele an
GUI in e aces oo.
Many gene a i e AI ision models suppo inpain ing (modi ying pa s wi hin
an image), ou pain ing (expanding an image beyond i s o iginal bounda ies),
and c ea ing image a ia ions.
Di usion models a e mo e obus in modeling collapse and suppo ing a ious
ou pu s.
Finally, when i comes o images, we need o hink abou he scene, main cha -
ac e , s uc u e, and elemen s such as ex and aces, which a e be e done
manually and edi ed in o he image. These aspec s ha e o be added o he
p omp o he gene a ion. La e in he book, we will discuss his opic as pa
o p omp enginee ing.
127
Wha else can
AI gene a e?
Code ha w i es i sel wi h li le p omp ing and wi hou much inpu seems magical,
esembling a holy g ail, a leas o hose wo king in compu ing. Gi en he ad ance-
men s in a i icial in elligence (AI) wi h gene a i e AI, his endea o seems possible
oday. We ha e seen some amazing and in e es ing hings AI can gene a e— om
language o images o holding an ongoing back-and- o h mul i u n con e sa-
ion—and many o hem ha e s ong use cases in en e p ises. This chap e ou lines
he emaining hings we can gene a e using AI.
We will i s alk abou code gene a ion, wha i means, how one should go
abou i , and he ools en e p ises use. Fo example, And ej Ka pa hy, one o he
OpenAI co ounde s, who used o lead Tesla’s AI and Vision eam, ecen ly said ha
This chap e co e s
Using gene a i e AI o code c ea ion and code- ela ed
asks
Tools ha allow code gene a ion and how o use hem
Bes code gene a ion p ac ices
Gene a ing ideo and ela ed ools
Gene a ing audio, music, and ela ed ools
230 CHAPTER 8 Cha ing wi h you da a
# Ge he i le
...
# ge he pos desc ip ion
...
# ge he publish da e
...
# ge he a icle body
y:
a icle_body = soup. ind('di ', {'class': 'pos -con en '}). ex
excep A ibu eE o :
a icle_body = ""
# This should be chunked up
a icle = a icle_body
o al_ oken_coun = 0
chunks = []
# spli he ex in o chunks by sen ences
chunks = spli _sen ences_by_spacy(a icle, max_ okens=3000, o e lap=10)
p in ( "Numbe o chunks: {len(chunks)}")
o j, chunk in enume a e( qdm(chunks))
ec o = ge _embedding(chunk)
# con e o numpy a ay
ec o = np.a ay( ec o ).as ype(np. loa 32). oby es()
# C ea e a new hash wi h he URL and embedding
pos _hash = {
"u l": pos .link,
" i le": a icle_ i le,
"desc ip ion": a icle_desc,
"publish_da e": publish_da e,
"con en ": chunk,
"embedding": ec o
}
conn.hse (name= "pos :{i}_{j}", mapping=pos _hash)
p.execu e()
p in ("Vec o upload comple e.")
Once we ge he blog pos ’s con en , we need o chunk i up, as discussed in he p e i-
ous chap e . Fo his example, we use spaCy o chunk he blog pos and also ha e
some o e lap be ween di e en chunks.
8.4.1 Re ie e pipeline bes p ac ices
When implemen ing a RAG pa e n, i ’s c ucial o ha e a deep unde s anding o he
sou ce sys em’s con en and s uc u e. The success o a RAG model hinges on i s abil-
i y o access and in e p e he igh da a, which necessi a es a well-a chi ec ed da a

2318.4 Re ie ing he da a
pipeline. This pipeline is no jus a condui o da a low, bu a sophis ica ed ame-
wo k ha ensu es da a is ex ac ed, ans o med, indexed, and s o ed o align wi h he
model’s equi emen s and he de ined use case.
The i s s ep owa d implemen ing GPTs and LLMs in en e p ises is a deep unde -
s anding o he sou ce sys em. This in ol es ho oughly analyzing he da a s uc u e,
including en i y- ela ionship diag ams, da a ypes, and da a dis ibu ion. Da a p o il-
ing ools can be ins umen al in unde s anding he na u e o he con en .
NOTE Fo RAG o wo k well, i is impo an o ca e ully plan he p ep ocess-
ing one needs o do in he e ie e pipeline and no jus use e e y hing wi h-
ou conside ing whe he i is be e . I no planned well, his will c ea e
p oblems when using sea ch as pa o a RAG implemen a ion.
The nex phase de ines he use case, which en ails c ea ing a de ailed equi emen
documen ou lining he p oblem, po en ial solu ions, expec ed esul s, and success
me ics. This documen should also de ail he use s’ in o ma ional needs and he sce-
na ios in which he RAG model will be applied.
Following his, he ocus shi s o da a ex ac ion and ans o ma ion. This p ocess
in ol es using ETL (ex ac , ans o m, load) ools o ex ac da a om he sou ce sys-
em and ans o m i in o a o ma he RAG model can unde s and. I may in ol e
NLP echniques such as okeniza ion, s op-wo d emo al, and lemma iza ion.
Once he da a has been ans o med, i needs o be indexed o e icien e ie al.
Azu e AI Sea ch, Elas icsea ch, Sol , and Lucene a e ideal o his pu pose, as hey
p o ide ull- ex sea ch capabili ies and can handle la ge da ase s e ec i ely.
Pa allel o da a indexing, selec ing a sui able da a s o age solu ion is impo an .
Depending on he speci ic needs o he da a size, speed, and ype, his could be a a-
di ional SQL da abase, a NoSQL da abase such as Cosmos DB, o a dis ibu ed ile sys-
em such as Hadoop HDFS.
One o he mos c i ical phases is p ep ocessing planning. This in ol es ca e ul
planning o p ep ocessing s eps, which could in ol e echniques such as noise
emo al, no maliza ion, and dimensionali y educ ion. The goal is o e ain in o ma-
ion ele an o he use case while educing he model’s complexi y.
The nex phase is model in eg a ion, which in ol es using APIs o SDKs p o ided
by he AI model endo o in eg a e he RAG model in o he applica ion. The
e ie e mus be con igu ed wi h he co ec que y pa ame e s, and he gene a o
should be se up wi h he desi ed ou pu s uc u e.
Fine- uning and moni o ing a e c ucial o enhancing he model’s pe o mance
and ensu ing he sys em’s heal h. This in ol es using a alida ion da ase o ine- uning
and applica ion pe o mance managemen (APM) ools o moni o ing.
Rega ding scalabili y and eliabili y, cloud pla o ms such as AWS, Google Cloud,
o Azu e should be used o scale he sys em as needed. Con aine iza ion pla o ms
such as Docke and Kube ne es can assis in scaling and managing he applica ion.
Redundancy and ailo e s a egies a e c ucial o ensu ing sys em eliabili y.
232 CHAPTER 8 Cha ing wi h you da a
Fu he mo e, secu i y and compliance canno be o e looked. Implemen ing
da a enc yp ion, use au hen ica ion, access con ol, and egula sys em audi s can
ensu e da a secu i y and compliance wi h da a p o ec ion egula ions such as GDPR
o CCPA.
Be o e deploymen , igo ous es ing and alida ion a e impe a i e o ensu e ha
he pipeline and he RAG model mee he expec a ions ou lined by he use case. Once
he sys em is li e, comp ehensi e documen a ion and echnical aining should be p o-
ided o he eam o e ec i e managemen , main enance, and oubleshoo ing.
Finally, i ’s c ucial o ensu e he quali y con ol o he e ie al co pus, implemen
measu es o in o ma ion secu i y and p i acy, egula ly upda e he e ie al co pus,
and e icien ly alloca e esou ces. By ollowing hese s eps, en e p ises can e ec i ely
build and main ain AI-powe ed applica ions.
8.5 Sea ch using Redis
Now ha we ha e he da a inges ed and he index eady, we can sea ch agains i . We
c ea e a simple console app ha accep s a use ’s que y, ec o izes i , and sea ches
based on he op h ee simila pos s o e u n o he use . This is a seman ic sea ch.
The ollowing lis ing shows he ou pu gene a ed as an example when we ask abou
“Longho n.”
$ py hon . sea ch.py
Connec ed o Redis
En e you que y: Tell me abou Longho n
Vec o izing que y...
Sea ching o simila pos s...
Found 3 esul s:
You p obably al eady hea d his, bu <s ong>Ch is Sells</s ong>
➥has a new column on MSDN called <s ong>Longho n Fogho n</s ong>
, ha desc ibes each o he â
<s ong>Pilla s o Longho n</s ong>
â - This is some hing ha IMHO de elope s would unde s and and
➥app ecia e. In he i s a icle he explains he âPilla sâ and hen
➥in he nex wo goes on o build Soli ai e. You can download he sample
➥and play wi h i oo.
F om OSNews: Mic os has made <em>ha d s a emen s abou pe omance
➥imp o emen s in Longho n ...
NOTE Windows Longho n used o be he codename o he ope a ing sys em
ha e en ually became Windows Vis a.
Le ’s check ou he code o implemen ing he sea ch using Redis. We i s ake a use
que y such as “Tell me abou Longho n,” c ea e a ec o , and use cosine simila i y o
ob ain a lis o compa able esul s.
Lis ing 8.7 Sea ch esul s
2338.5 Sea ch using Redis
de hyb id_sea ch(que y_ ec o , clien , op_k=3, hyb id_ ields="*"):
base_que y = "{hyb id_ ields}=>
[KNN { op_k}
@embedding $ ec o AS ec o _sco e]"
que y = Que y(base_que y). e u n_ ields(
"u l",
" i le",
"publish_da e",
"desc ip ion",
"con en ",
" ec o _sco e").so _by(" ec o _sco e").dialec (2)
y:
esul s = clien . ("pos s").sea ch(
que y, que y_pa ams={" ec o ": que y_ ec o })
excep Excep ion as e:
p in ("E o calling Redis sea ch: ", e)
e u n None
i esul s. o al == 0:
p in ("No esul s ound o he gi en que y ec o .")
e u n None
e u n esul s
# Connec o he Redis se e
conn = edis.Redis(...)
que y = inpu ("En e you que y: ")
p in ("Vec o izing que y...")
que y_ ec o = ge _embedding(que y)
que y_ ec o = np.a ay(que y_ ec o ).as ype(
np. loa 32). oby es()
p in ("Sea ching o simila pos s...")
esul s = hyb id_sea ch(que y_ ec o , conn)
i esul s:
p in ( "Found { esul s. o al} esul s:")
o i, pos in enume a e( esul s.docs):
sco e = 1 - loa (pos . ec o _sco e)
p in (pos .con en )
else:
p in ("No esul s ound")
As he name sugges s, he hyb id_sea ch() unc ion does he hea y li ing o unning
he hyb id sea ch que y. A hyb id sea ch que y combines mul iple ypes o sea ches
in o a single que y. This can include combining ex -based sea ches wi h o he ypes,
such as nume ical, ca ego ical, o e en ec o -based sea ches. No e ha he exac
sea ch ype would depend on he in o ma ion and he equi emen .
Lis ing 8.8 Sea ching using Redis
A base que y ha
p e il e s ields and
is implemen ed as
a KNN sea ch
Selec s he di e en ields we
a e in e es ed in sea ching
So s by cosine
simila i y in
descending o de
Execu es
he que y
Cap u es he que y
om he use
Vec o izes
he inpu
Con e s he ec o
o a NumPy a ay
Pe o ms he
simila i y sea ch
234 CHAPTER 8 Cha ing wi h you da a
In ou example, we combine a K-Nea es Neighbo s (KNN) sea ch on an embed-
ding ec o wi h o he sea ch ields. The KNN sea ch inds he mos ela ed i ems o a
gi en i em, in his case, he mos simila pos s o a gi en que y ec o . The que y
esul s a e so ed by ec o sco e, which means a high o low o de ing based on cosine
simila i y. In o he wo ds, he esul s wi h he highes simila i y a e shown i s . We
also es ic his o he op h ee i ems, as depic ed by he op_k pa ame e .
No e ha he exac na u e o he sea ch and ype also depends on he sea ch
engine and he da a ype. Fo mo e de ails on Redis sea ch ypes and KNN, see he
documen a ion a h ps://mng.bz/o0Gp.
Now ha we ha e seen he sea ch, le ’s combine all he dimensions and in eg a e
hem in o a cha expe ience using an LLM.
8.6 An end- o-end cha implemen a ion powe ed by RAG
Th oughou his and he p e ious chap e , we ha e discussed and examined all he
pieces o help us unde s and some o he co e concep s; now, we can b ing i all
oge he and build an end- o-end cha applica ion. In he applica ion, we can ask
ques ions o ge de ails abou ou da a (i.e., he blog pos s). Figu e 8.6 shows he
applica ion low.
Figu e 8.6 End- o-end cha applica ion
The ques ion he use asks i s ge s con e ed in o embeddings and hen sea ched in
Redis using a hyb id sea ch index o ind simila chunks, which a e e u ned as sea ch
esul s. As we saw ea lie , he blog pos s ha e al eady been injec ed in o he Redis da a-
base and indexed. Once we ha e he esul s, we o mula e he LLM p omp by combin-
ing he o iginal ques ions and he chunks e ie ed o answe om. These a e passed
in o he p omp i sel be o e inally calling he LLM o gene a e a esponse.
Ques ion + Sea ch
esul s
Gene a e
answe
Ques ion
Blog pos
RSS eed
LLM
Hyb id sea ch
Vec o
index
C ea e
embeddings
Redis
Fo mula e
p omp
Chunks
2358.6 An end- o-end cha implemen a ion powe ed by RAG
On he sea ch on , we deployed Redis unning locally and c ea ed a ec o index.
We ead all he blog pos s going back nea ly 20 yea s. We c ea ed he ele an chunks
o hese pos s and hei co esponding embeddings and popula ed ou ec o da a-
base. We also implemen ed a ec o sea ch on hose embeddings. The only piece le
is o in eg a e all o his in o ou applica ion and hook i up wi h an LLM o comple e
he las s age o ou RAG implemen a ion.
Lis ing 8.9 shows exac ly how o do his. Se e al helpe unc ions, such as ge _
sea ch_ esul s(), ake he use ’s que y, call ano he helpe unc ion o sea ch
Redis, and e u n any esul s ound. The ac ual API call ha calls he GPT is in he
ask_gp () unc ion, and i is a Cha Comple ion() API, jus like we saw ea lie .
As wi h p e ious examples, we lea e ou he code’s helpe unc ions and o he
aspec s o b e i y. The comple e code samples a e a ailable in he Gi Hub code
eposi o y accompanying he book (h ps://bi .ly/GenAIBook).
de hyb id_sea ch(que y_ ec o , clien , op_k=5, hyb id_ ields="*"):
...
e u n esul s
de ge _sea ch_ esul s(que y:s , max_ oken=4096,
➥debug_message=False) -> s :
que y_ ec o = ge _embedding(que y)
que y_ ec o = np.a ay(que y_ ec o ).as ype(
np. loa 32). oby es()
p in ("Sea ching o simila pos s...")
esul s = hyb id_sea ch(que y_ ec o , conn, op_k=5)
oken_budge = max_ oken - coun _ okens(que y)
i debug_message:
p in ( "Token budge : { oken_budge }")
message = 'Use he blog pos below o answe he subsequen
➥ques ion. I he answe canno be ound in he
➥a icles, w i e "So y, I could no ind an answe in
➥ he blog pos s."'
ques ion = " n nQues ion: {que y}"
i esul s:
o i, pos in enume a e( esul s.docs):
nex _pos = ' n nBlog pos : n""" n{pos .con en } n"""'
new_ oken_usage = coun _ okens(message + ques ion + nex _pos )
i new_ oken_usage < oken_budge :
i debug_message:
p in ( "Token usage: {new_ oken_usage}")
message += nex _pos
else:
b eak
else:
Lis ing 8.9 End- o-end RAG-powe ed cha
Vec o izes
he que y
Con e s he ec o
o a numpy a ay
Pe o ms he
simila i y sea ch
Manages
oken budge
Loops h ough he esul s
while s ill keeping wi hin
he oken budge

236 CHAPTER 8 Cha ing wi h you da a
p in ("No esul s ound")
e u n message + ques ion
de ask_gp (que y : s , max_ oken = 4096, debug_message = False) -> s :
message = ge _sea ch_ esul s(
que y,
max_ oken,
debug_message=debug_message)
messages = [
{" ole":
"sys em",
"con en ": "You answe ques ions in summa y om he [CA]
blog pos s."},
{" ole":
"use ",
"con en ": message},]
esponse = openai.Cha Comple ion.c ea e(
model="gp -3.5- u bo-16k",
messages=messages,
empe a u e=0.7,
max_ okens=2000,
op_p=0.95
)
esponse_message = esponse["choices"][0]["message"]["con en "]
e u n esponse_message
i __name__ == "__main__":
# En e a que y
while T ue:
que y = inpu ("Please en e you que y: ")
p in (ask_gp (que y, max_ oken=15000, debug_message=False))
p in ("=="*20)
We can see all his coming oge he when we un i and cha wi h he blog. I unde -
s ands he que y, c ea es embeddings, uses he ec o da abase and he associa ed ec-
o indexes o e ie e he op i e ma ching esul s, adds ha o he p omp , and uses
he LLM o gene a e he esponse ( igu e 8.7).
In he example we ha e seen hus a , we a e esponsible o e e y hing— om se -
ing up he Docke con aine s o deploying Redis and inges ing he da a. This is no
enough o en e p ises o go in o p oduc ion. Mo e sys em enginee ing is equi ed,
such as se ing up a ious clus e s o machines, scaling hem up o down as needed,
managing Redis, secu i y equi emen s, o e all ope a ions, and so o h. This akes a
signi ican amoun o ime, e o , cos , and skills ha no e e y o ganiza ion migh
ha e. Ano he op ion is o use Azu e OpenAI, which can do much o his ou o he
box and allows o ganiza ions a quicke ime o ma ke , po en ially a a lowe cos .
Le ’s see how Azu e OpenAI can achie e he same esul bu much as e .
Runs a ec o sea ch
o ge embeddings
Se s up he cha
comple ion calls
Calls
he LLM
2378.7 Using Azu e OpenAI on you da a
Figu e 8.7 Q&A using blog da a wi h GPT-3.5 Tu bo
8.7 Using Azu e OpenAI on you da a
Many en e p ises use Azu e, and inco po a ing Azu e OpenAI as pa o hei da a
s a egy ep esen s a pi o al s ep in employing he powe o gene a i e AI o business
ans o ma ion. Azu e OpenAI p o ides an en e p ise-g ade pla o m o in eg a e
ad anced AI models such as Cha GPT in o you da a wo k lows.
“Azu e OpenAI on you da a” is he se ice ha enables unning hese powe ul
cha models on you da a and ge ing ou -o - he-box ea u es ha en e p ises equi e
o p oduc ion wo kloads: scalabili y, secu i y, e eshes, and in eg a ion in o o he s.
You can connec you da a sou ce using Azu e OpenAI S udio ( igu e 8.8) o he
REST API.
NOTE Azu e AI S udio is a pla o m ha combines capabili ies ac oss mul iple
Azu e AI se ices. I is designed o de elope s o build gene a i e AI applica-
ions on an en e p ise-g ade pla o m. You can i s in e ac wi h a p ojec
code ia he Azu e AI SDK and Azu e AI CLI and seamlessly explo e, build,
es , and deploy using cu ing-edge AI ools and ML models.
A he co e o Azu e OpenAI’s appeal is i s seamless in eg a ion wi h he b oade
Azu e ecosys em. Connec ing hese powe ul AI models o you da a eposi o ies
unlocks he po en ial o mo e sophis ica ed da a analysis, na u al language p ocess-
ing, and p edic i e insigh s. This in eg a ion is pa icula ly bene icial o en e p ises
wi h a signi ican oo p in in Azu e, enabling hem o enhance hei exis ing in a-
s uc u e wi h minimal dis up ion.
238 CHAPTER 8 Cha ing wi h you da a
Figu e 8.8 Adding you da a o Azu e OpenAI
Azu e AI S udio suppo s mul iple op ions om exis ing Azu e AI Sea ch indexes,
Blob s o age, Cosmos DB, and so o h. One o hese op ions is a URL, which we will
use o inges blog pos s (see igu e 8.9). We can also sa e he RSS eed locally and
upload i as a ile. One o he ad an ages o using ou own Azu e AI Sea ch index is
ha i does he hea y li ing o keeping he da a inges ion up o da e om he sou ce
sys ems. This eplaces Redis and can be globally dis ibu ed o a cloud-scale i
equi ed.
Figu e 8.9 Azu e AI S udio: Adding a da a sou ce
2398.7 Using Azu e OpenAI on you da a
We can con igu e and se up mos hings he e, including a s o age esou ce whe e his
da a will be sa ed, an Azu e AI Sea ch esou ce, he index de ails, embedding de ails,
and so o h (see igu e 8.10). Wi h a ew clicks, all o his is se up and eady o us
o use.
Figu e 8.10 Con igu e de ails o da a inges ion
On he in o ma ion secu i y on , his p ocess is s eamlined by Azu e’s obus secu-
i y and compliance amewo k, ensu ing ha you da a emains p o ec ed h ough-
ou i s in e ac ion wi h AI models. Azu e OpenAI suppo s wo key ea u es on you
da a: ole-based and documen -le el access con ols. This ea u e, wo king alongside
Azu e AI Sea ch secu i y il e s, can be used o limi access o only hose use s who
should ha e access based on hei pe mi ed g oups and LDAP membe ships, which is
a c i ical equi emen o many en e p ises, especially in egula ed indus ies.
Finally, Azu e’s abili y o p ocess and analyze la ge cloud-scale olumes o uns uc-
u ed da a scalabili y is ano he signi ican ad an age. Fo example, OpenAI’s Cha GPT
in e nally uses Azu e AI Sea ch, and ha wo kload is 100+ million use s pe day. Azu e’s
cloud in as uc u e allows o he easy scaling o AI capabili ies as you da a needs
g ow. Mo e de ails on Azu e OpenAI can be ound a h ps://mng.bz/n022.
246 CHAPTER 9 Tailo ing models wi h model adap a ion and ine- uning
cus ome in e ac ions. By using models adap ed o hei speci ic needs, businesses can
gain insigh s and inc ease e iciency, which will p o ide hem wi h a compe i i e
ad an age in hei ma ke .
En e p ises can enhance e iciency and cos sa ings by educing esou ce equi e-
men s and esou ce needs. Fine- uning exis ing models equi es signi ican ly less com-
pu a ional powe and da a compa ed o aining models om he g ound up, which
esul s in lowe cos s and quicke deploymen imes. Fo example, aining Llama 2’s
70B pa ame e model ook many mon hs and 1,720,320 GPU hou s, compa ed o
ine- uning a GPT-3.5 Tu bo model, which akes only a ew hou s.
Model adap a ion comes wi h challenges, and se e al key a eas mus be consid-
e ed. Fi s , ask-speci ic da a is c ucial. I is essen ial o ha e su icien da a o ine- une
an LLM, ensu ing ha his da a is clean, consis en , and ep esen a i e o he speci ic
ask. Depending on he ask and LLM cha ac e is ics, his da a may equi e p ep o-
cessing, augmen a ion, o labeling. De e mining how much da a o ine- uning is
enough can be a nuanced p ocess, as i a ies based on se e al ac o s; a a minimum,
i is a ew hund ed o housand examples, depending on he model.
De e mining adequa e da a o ine- uning models such as OpenAI’s GPT-3.5
depends on a ious ac o s. The complexi y and speci ici y o he ask hea ily in lu-
ence da a equi emen s, wi h mo e complex asks equi ing mo e da a. Howe e , he
quali y o da a is c ucial and o en ou weighs he quan i y. La ge models such as
GPT-3.5 can bene i om mo e da a due o hei ex ensi e capaci y, bu hey also can
lea n e ec i ely om smalle , high-quali y da ase s. O ganiza ions ypically s a wi h a
baseline da ase and adjus i based on he model’s pe o mance, which is con inu-
ously moni o ed o signs o o e i ing o unde i ing. P ac ical cons ain s such as
compu a ional esou ces and ime also play a ole in de e mining he da ase size. The
expe ience and expe ise o da a scien is s o en guide he decision. Compa a i e
analysis and con inual e alua ion a e in ol ed in inding he op imal balance o da a
quan i y and quali y o he speci ic ask equi emen s.
Ano he signi ican challenge is ela ed o compu a ional esou ces and cos s.
Fine- uning LLMs can be esou ce in ensi e and cos ly, o en equi ing subs an ial
p ocessing powe (speci ically GPUs) connec ed wi h high-speed memo y. To manage
his, i migh be necessa y o u ilize cloud se ices, in es in specialized ha dwa e, o
employ dis ibu ed sys ems. Addi ionally, he cos o accessing p e ained LLMs can
a y, depending on he p o ide and licensing ag eemen s, which can add o he o e -
all expense.
Pe o mance and gene aliza ion a e also c i ical conside a ions. E alua ing he
pe o mance o a ine- uned LLM is impe a i e; i in ol es compa ing i o o he mod-
els o es ablished baselines, which ensu es ha he ine- uned LLM does no o e i
he aining da a and can gene alize well o new o unseen inpu s. We co e e alua-
ions la e in his chap e , and mo e de ails on benchma ks and associa ed ools a e
co e ed in chap e 12.

2479.2 When o ine- une an LLM
The e hical and social implica ions o using ine- uned LLMs mus be add essed as
well. This includes unde s anding po en ial isks and biases, such as conce ns ela ed
o da a p i acy, model ai ness, and social e ec s. Adhe ing o app op ia e guidelines,
s anda ds, o egula ions is necessa y o ensu e he e hical and esponsible use o ine-
uned LLMs.
Finally, inding he igh alen is c i ical. The need o specialized alen and
expe ise is a signi ican ac o in success ully ine- uning LLMs, which includes indi-
iduals who deeply unde s and ML, na u al language p ocessing (NLP), and he spe-
ci ic a chi ec u e o LLMs. These expe s mus be skilled in a ious a eas, such as da a
p epa a ion, model a chi ec u e design, aining s a egies, and pe o mance e alua-
ion. The need o skilled pe sonnel adds ano he laye o challenges o he al eady
complex p ocess o LLM ine- uning.
9.2 When o ine- une an LLM
Fine- uning is a echnique o imp o e a model’s pe o mance on a speci ic ask. How-
e e , i should be he las op ion and used only a e applying o he echniques, such
as p omp enginee ing and RAG. These echniques complemen each o he and
should be s acked o he bes ou pu , e en when using ine- uned models. As we saw
in ea lie chap e s, p omp enginee ing and RAG a e no mu ually exclusi e bu a e
complemen a y and should be s acked, e en when ine- uning. This s acked app oach
gi es he bes ou pu s, e en when using ine- uned models.
Once we decide o ine- une a model, we p epa e he da ase needed o aining
and s a he ine- uning p ocess, which can ake om a ew hou s o a ew days. A e
aining, we e alua e he ine- uned model agains he base model and he speci ic
ask’s baseline.
Le ’s use an example o help us ine- une and unde s and a ious aspec s. Say we
wan o adap a model o espond wi h emojis—a bo ha can unde s and wha we a e
asking bu espond only using emojis. We will call his EmojiBo . We wan o ine- une
GPT-3.5 Tu bo and make i an EmojiBo . Bu o show ha hese emojis a e di e en
and specialized o a ask, we don’ wan he emojis ha we would expec o see, say, in
a cha applica ion, on social media, o in ou ex s. Ra he , we wan he ones ha ol-
low he o ma used by Mic oso Teams.
Figu e 9.2 shows he high-le el low o ine- uning. Fi s , we iden i y a ask ha
would bene i om ine- uning (such as EmojiBo ). We iden i y which cha ac e is ics
all sho o he ask and c ea e e alua ion c i e ia. We hen compa e he de aul mod-
els’ pe o mance agains ou needs. I hey pe o m well, we es ablish a baseline and
cu a e he da ase equi ed o ine- uning. The amoun and o ma o da a depend
on he model; we’ll co e he de ails la e . We ob ain a ine- uned model a e ain-
ing, which can ake hou s o days, depending on he ask. Nex , we mus e alua e i
agains he base model and he baseline o he speci ic ask using quali a i e and
quan i a i e measu es.
248 CHAPTER 9 Tailo ing models wi h model adap a ion and ine- uning
Figu e 9.2 Fine- uning end- o-end low
I is qui e common and almos expec ed ha he i s ine- uned model will be wo se
han he de aul model. Usually, inding a sui able deploymen model akes 10–12
aining i e a ions. Each i e a ion equi es weaking he aining da a o add ess weak
a eas, which can ake hou s o days. I ’s a ime- and e o -consuming p ocess ha
should be one o he las s eps.
NOTE Fine- uning enhances he model’s pe o mance on asks simila o
hose ou lined in he ine- uning da ase . This p ocess migh mani es as
imp o ed accu acy, mo e ele an esponses, o a be e unde s anding o
domain-speci ic language. Imp o ed pe o mance in e ms o cheape o
as e models is a side ad an age and no some hing gua an eed. One way o
achie e his is o ine- une a smalle model, such as GPT-3.5 Tu bo, on a spe-
ci ic ask o imp o e i ins ead o using a mo e expensi e and powe ul model,
such as GPT-4.
Now ha we ha e iden i ied a ask ha makes sense o ine- une— ha is, an EmojiBo
whe e we wan o espond in emojis bu in a ce ain pa e n—le ’s examine he s eps
needed o ine- une an LLM such as GPT-3.5 Tu bo.
9.2.1 Key s ages o ine- uning an LLM
When we wan o ine- une a model o an iden i ied ask, as ou lined la e in igu e
9.6, sec ion 9.3.5, he e a e i e key s ages:
1Choosing a model and ine- uning me hod—To ine- une a language model, i is
necessa y o choose a ounda ion model ha sui s he ask and da a. Va ious
Baseline
de aul
model
Da a
cu a ion
Use
cases
Task
iden i ica ion T aining
P oduc ion
deploymen
E alua ion
No
Yes
• “EmojiBo ”
• E alua ion
c i e ia
GPT-3.5
Tu bo
• T aining da a —
Emojis in p e e ed
pa e n (sadkoala)
• E alua ion da a
Iden i y use case
i o ine- uning
E alua e
FT model
2499.3 Fine- uning OpenAI models
models a e a ailable, such as GPT, BERT, and RoBERTa. Conside ac o s such
as he model’s sui abili y o he ask, inpu /ou pu size, da ase size, and ech-
nical in as uc u e. Fine- uning me hods can a y based on he ask and da a,
such as ans e lea ning, sequen ial ine- uning, o ask-speci ic ine- uning.
2Da a cu a ion—This s age in ol es p epa ing a ask-speci ic da ase o ine-
uning and la gely in ol es p epa ing and p ep ocessing he da ase . This p o-
cess o en includes da a cleaning, ex no maliza ion (e.g., okeniza ion), and
con e ing he da a in o a o ma compa ible wi h he LLM’s inpu equi e-
men s (e.g., da a labeling). I is essen ial o ensu e ha he da a ep esen s he
ask and domain and co e s a ange o scena ios he model is expec ed o
encoun e in p oduc ion.
3Fine- uning—This s age is he ac ual p ocess o ine- uning and in ol es aining
he p e ained LLM on he ask-speci ic da ase . The aining p ocess in ol es
op imizing he model’s weigh s and pa ame e s o minimize he loss unc ion
and imp o e i s pe o mance on he ask. The ine- uning p ocess may in ol e
se e al ounds o aining on he aining se , alida ion o he alida ion se ,
and hype pa ame e uning o op imize he model’s pe o mance.
4E alua ing—Once he ine- uning p ocess is comple e, we mus e alua e he
model’s pe o mance on a es da ase . This helps o ensu e ha he model is
gene alizing well o new da a and pe o ming well on he speci ic ask. Com-
mon me ics used o e alua ion include accu acy, p ecision, ecall, F1 sco e,
Bilingual E alua ion Unde s udy (BLEU), Recall-O ien ed Unde s udy o Gis -
ing E alua ion (ROUGE), and so o h. This opic is co e ed la e in de ail in
sec ion 9.3.2.
5Deploymen (in e ence)—Once he ine- uned model is e alua ed and we a e
happy wi h i s pe o mance, i can be deployed o p oduc ion. The deploymen
p ocess may in ol e in eg a ing he model in o a la ge sys em, se ing up he
necessa y in as uc u e, and moni o ing he model’s pe o mance in eal-
wo ld scena ios.
Now ha we ha e a basic concep o model adap a ion and when o ine- une, le ’s see
how o ine- une.
9.3 Fine- uning OpenAI models
He e, we’ll use an example o ine- une OpenAI’s GPT-3.5 Tu bo model. Cu en ly,
o OpenAI, only GPT-4, GPT-3.5 Tu bo, GPT-3 Babbage (Babbage-002), and GPT-3
(Da inci-002) a e a ailable o ine- uning. Se e al OSS LLMs, such as Me a’s Llama 2
and G42’s Falcon, can be ine- uned. In ou case, he book’s Gi Hub eposi o y
(h ps://bi .ly/GenAIBook) con ains comple e code samples and sc eensho s ha we
use and show how o ine- une OpenAI GPT-3.5 Tu bo. To make his as eal o o gani-
za ions as possible, we will show he p ocess by using bo h Azu e OpenAI and OpenAI.
250 CHAPTER 9 Tailo ing models wi h model adap a ion and ine- uning
We wan o ine- une GPT-3.5 Tu bo and make i an EmojiBo , whe e he model
esponds in emojis only. Howe e , as we ou lined ea lie , we wan emojis o ollow he
o ma used by Mic oso Teams.
In Mic oso Teams, he ex in pa en heses, such as (dog), ende s he ele an
emojis. We will ine- une he model o espond o his ex , which ep esen s he spe-
ci ic ask we wan he model o imp o e. To unde s and all he di e en op ions and
he co esponding ex in Teams, see h ps://bi .ly/TeamEmojis. Gi en ha we ha e a
ask, le ’s s a p epa ing he da ase .
9.3.1 P epa ing a da ase o ine- uning
Now ha we ha e eached a poin whe e we ha e iden i ied a ask o which ine-
uning would make sense, we need o c ea e a da ase o examples equi ed o ine-
une. We need o c ea e wo se s o da ase s: one o aining and ano he o alida-
ion. A alida ion da ase is a subse o da a used o e alua e he pe o mance o a ine-
uned model on he a ge ask. I is di e en om he aining da ase , which is used
o upda e he model’s pa ame e s, and he es da ase , which is used o measu e he
inal accu acy o he model.
A alida ion da ase is impo an o ine- uning LLMs because i helps us o a oid
o e i ing, which is when he model lea ns he speci ic pa e ns o he aining da a
and ails o gene alize o new da a. Using a alida ion da ase , you can moni o he
model’s p og ess and adjus he lea ning a e, he numbe o epochs, o o he hype -
pa ame e s o op imize he model’s pe o mance.
These examples should show di e en ways o sol e he p oblem and he esul s o
each me hod. We also need o iden i y sho comings using a base model, such as
inconsis en pe o mance on edge cases, inabili y o i enough sho p omp s in he
con ex window o s ee he model, high la ency, and so o h.
I is highly ecommended ha a alida ion da ase be used o measu e he e ec-
i eness o ine- uning. The aining and alida ion da ase s a e in he JSONL o ma ,
wi h each line con aining a JSON objec wi h a ex key o inpu ex and a a ge key
o desi ed ou pu ex .
Fine- uned models a e di ec ly co ela ed wi h high-quali y aining da a. Di e en
models equi e a ying amoun s o aining da a. Fo e ec i e aining, we need hun-
d eds o housands o cu a ed da a examples. Al hough he API equi es a minimum
o 10 examples, ha ing mo e is gene ally be e . Ten examples a en’ enough o in lu-
ence LLMs such as GPT-3.5 Tu bo in any signi ican way.
OpenAI ecommends ha ing a leas 50 good examples o ain ou model. They
also ecommend mo e good examples o be e - ine- uned models han bad ones, as
hose examples can nega i ely a ec he model. Consequen ly, i is ad isable only o
use he bes ones om you in e nal da a. The ollowing lis ing shows an example
JSONL ile o cha da a.
2519.3 Fine- uning OpenAI models
{"messages": [{" ole": "sys em", "con en ": "You' e a cha bo ha only
esponds wi h emojis!"}, {" ole": "use ", "con en ": "I hea d he koala
habi a was des oyed by he i e."}, {" ole": "assis an ", "con en ":
"(sadkoala)" }]}
{"messages": [{" ole": "sys em", "con en ": "You' e a cha bo ha only
esponds wi h emojis!"}, {" ole": "use ", "con en ": "I' e been wo king
nons op and need a b eak."}, {" ole": "assis an ", "con en ": "( i ed)"
}]}
{"messages": [{" ole": "sys em", "con en ": "You' e a cha bo ha only
esponds wi h emojis!"}, {" ole": "use ", "con en ": "I jus inished
eading an amazing book!"}, {" ole": "assis an ", "con en ": "(like)"
}]}
As we can see, he model is being shown how o espond using emojis o ma ed in a
ce ain pa e n, such as (sadkoala), ( i ed), and (like).
BASIC CHECKS
Be o e ine- uning, i ’s impo an o pe o m basic checks on he aining da a o
a oid was ing ime and esou ces. These checks can include da a eadabili y, o ma -
ing alida ion, ligh weigh analysis o missing pai s, and oken leng h.
We alida e he da a ile by loading and eading i using he basic_checks() unc-
ion. I akes a ilename as inpu and e u ns he numbe o messages ound. The mes-
sages mus be in he cha comple ion o ma o ine- uning GPT-3.5 Tu bo.
# Basic checks o ensu e he da a ile is alid
de basic_checks(da a_ ile):
y:
wi h open(da a_ ile, ' ', encoding='u -8') as :
da ase = [json.loads(line) o line in ]
p in ( "Basic checks o ile {da a_ ile}:")
p in ("Coun o examples in aining da ase :", len(da ase ))
p in ("Fi s example:")
o message in da ase [0]["messages"]:
p in (message)
e u n T ue
excep Excep ion as e:
p in ( "An e o occu ed in ile {da a_ ile}: {e}")
e u n False
FORMAT CHECKS
Once we ha e done he basic checks, he nex s ep is o check he ile o he o ma
and ensu e i is s uc u ed p ope ly be o e p ocessing i u he . This is an impo an
s ep, mainly because e en i he o ma is inco ec , we won’ ge an e o when we
s a he aining job, bu he esul ing model will be e y poo , and we will only
Lis ing 9.1 JSONL example
Lis ing 9.2 Da ase alida ion: Basic checks
Opens he ile
in ead-mode
Loads each line
o he ile as a
JSON objec and
s o es i in a lis
P in s he i s example
om he da ase and
helps isually check
whe he hings
in ui i ely look OK
Loops h ough he messages in he
i s example and p in s each one

252 CHAPTER 9 Tailo ing models wi h model adap a ion and ine- uning
ealize his pos aining when we deploy. To a oid much o his ouble, i is highly ec-
ommended ha we check o o ma s.
Lis ing 9.3 shows o ma _checks(), which checks o cha comple ion o ma and
pai ing, wi h da ase and ilename as i s wo a gumen s. I ca ches mos e o s bu no
all. The unc ion i e a es o e each example in he da ase and checks o da a ype
checks, he p esence o message lis s, and message keys. I alida es ha i has he el-
e an oles and con en alida ion. This unc ion also helps debug da a- ela ed
p oblems.
de o ma _checks(da ase , ilename):
# Ini ialize a dic iona y used o ack o ma e o s
o ma _e o s = de aul dic (in )
# I e a e o e each example in he da ase
o ex in da ase :
# Check i he example is a dic iona y, i no
# inc emen he co esponding e o coun
i no isins ance(ex, dic ):
o ma _e o s["da a_ ype"] += 1
con inue
# Check i he example has a "messages" key,
# i no inc emen he co esponding e o coun
messages = ex.ge ("messages", None)
i no messages:
o ma _e o s["missing_messages_lis "] += 1
con inue
# I e a e o e each message
o message in messages:
# Check i he message has " ole" and "con en " keys,
# i no inc emen he co esponding e o coun
i " ole" no in message o "con en " no in message:
o ma _e o s["message_missing_key"] += 1
# Check i he message has any un ecognized keys,
# i so inc emen he co esponding e o coun
i any(k no in (" ole", "con en ", "name",
➥" unc ion_call") o k in message):
o ma _e o s["message_un ecognized_key"] += 1
# Check i he ole o he message is one o he ecognized
# oles, i no inc emen he co esponding e o coun
i message.ge (" ole", None) no in (
"sys em",
"use ",
"assis an ",
" unc ion",
):
o ma _e o s["un ecognized_ ole"] += 1
Lis ing 9.3 Da ase alida ion: Checking o o ma
2539.3 Fine- uning OpenAI models
# Check i he message has ei he con en o a unc ion call,
# and i he con en is a s ing, i no inc emen he
# co esponding e o coun
con en = message.ge ("con en ", None)
unc ion_call = message.ge (" unc ion_call", None)
i (no con en and no unc ion_call) o no
➥isins ance(con en , s ):
o ma _e o s["missing_con en "] += 1
# Check i he e is a leas one message wi h he ole "assis an ",
# i no inc emen he co esponding e o coun
i no any(message.ge (" ole", None) == "assis an "
➥ o message in messages):
o ma _e o s["example_missing_assis an _message"] += 1
# I he e a e any o ma e o s, p in hem and e u n False
i o ma _e o s:
p in ( "Fo ma ing e o s ound in ile { ilename}:")
o k, in o ma _e o s.i ems():
p in ( "{k}: { }")
e u n False
p in ( "No o ma ing e o s ound in ile { ilename}")
e u n T ue
Finally, we should also unde s and how he da ase pe o ms when i comes o simple
da a dis ibu ions, oken coun s, and cos s.
NOTE The oken coun is impo an , no jus o cos . I i is la ge han he
maximum numbe o okens he model can handle, i will be unca ed wi h-
ou wa ning. Knowing his up on is e y help ul.
The ollowing lis ing shows how we can inish doing he checks on he da ase .
# P icing and de aul n_epochs es ima e
MAX_TOKENS = 4096
TARGET_EPOCHS = 3
MIN_TARGET_EXAMPLES = 100
MAX_TARGET_EXAMPLES = 25000
MIN_DEFAULT_EPOCHS = 1
MAX_DEFAULT_EPOCHS = 25
de es ima e_ okens(da ase , assis an _ okens):
# Se he ini ial numbe o epochs o he a ge epochs
n_epochs = TARGET_EPOCHS
# Ge he numbe o examples in he da ase
n_ ain_examples = len(da ase )
# I he examples o al is less han he minimum a ge
# adjus he epochs o ensu e we ha e enough examples o
Lis ing 9.4 Da ase alida ion: Cos es ima ion and basic analysis
254 CHAPTER 9 Tailo ing models wi h model adap a ion and ine- uning
# aining
i n_ ain_examples * TARGET_EPOCHS < MIN_TARGET_EXAMPLES:
n_epochs = min(MAX_DEFAULT_EPOCHS, MIN_TARGET_EXAMPLES
➥// n_ ain_examples)
# I he numbe o examples is mo e han he maximum a ge
# adjus he epochs o ensu e we don' exceed he maximum
# o aining
eli n_ ain_examples * TARGET_EPOCHS > MAX_TARGET_EXAMPLES:
n_epochs = max(MIN_DEFAULT_EPOCHS, MAX_TARGET_EXAMPLES
➥// n_ ain_examples)
# Calcula e he o al numbe o okens in he da ase
n_billing_ okens_in_da ase = sum(
min(MAX_TOKENS, leng h) o leng h in assis an _ okens
)
# P in he o al oken coun ha will be cha ged du ing aining
p in (
"Da ase has ~{n_billing_ okens_in_da ase } okens ha
➥will be cha ged o du ing aining"
)
# P in he de aul numbe o epochs o aining
p in ( "You will ain o {n_epochs} epochs on his da ase ")
# P in he o al numbe o okens ha will be cha ged du ing aining
p in ( "You will be cha ged o ~{n_epochs *
➥n_billing_ okens_in_da ase } okens")
# I he o al oken coun exceeds he maximum okens, p in a wa ning
i n_billing_ okens_in_da ase > MAX_TOKENS:
p in (
"WARNING: You da ase con ains examples longe han
➥4K okens by {n_billing_ okens_in_da ase –
➥MAX_TOKENS} okens."
)
p in (
"You will be cha ged o he ull leng h o hese
➥examples du ing aining, bu only he i s
➥4K okens will be used o aining."
9.3.2 LLM e alua ion
E alua ing LLMs is impo an o ensu ing hei quali y, eliabili y, and ai ness. How-
e e , e alua ing LLMs is complex, as i in ol es mul iple dimensions and challenges.
Main aining di e se au oma ic me ics can help e icien ly ack model imp o emen s
du ing adap a ion cycles, while educing cos ly manual e iews. Me ics should be cus-
omized o each adap ed model’s use cases and business needs. Con inuous logging
om p oduc ion sys ems enables he e alua ion o eal-wo ld pe o mance o e ime.
Benchma king agains baselines is an essen ial s ep in e alua ing ine- uned GPT
models. I in ol es compa ing he pe o mance o he ine- uned model wi h a p ees-
ablished s anda d o baseline model. This baseline could be he model’s pe o -
mance be o e ine- uning o a di e en model known o i s p o iciency in a simila
2559.3 Fine- uning OpenAI models
ask. The pu pose o his compa ison is o quan i y he imp o emen s b ough by ine-
uning. Fo ins ance, a ine- uned model migh be benchma ked agains a s anda d
ansla ion model in a language ansla ion ask o assess ansla ion accu acy o lu-
ency imp o emen s. This p ocess helps in unde s anding he e icacy o ine- uning
and iden i ying a eas whe e he model has imp o ed o s ill needs enhancemen .
EVALUATION CRITERIA
When p epa ing he ine- uning da ase , we should also de ine he e alua ion c i e ia.
When ine- uning, he e alua ion p ocess begins by es ablishing clea c i e ia c i ical
o assessing he pe o mance and e icacy o he model in i s in ended applica ion.
These c i e ia o en include ele ance, cohe ence, accu acy, and language luency
( able 9.1).
E alua ing a ine- uned GPT model using hese c i e ia in ol es a combina ion o
au oma ed me ics, manual e iew, and use eedback, ensu ing ha he model mee s
he high s anda ds equi ed o i s speci ic applica ion.
CHOOSING APPROPRIATE METRICS
When ine- uning models, selec ing he igh me ics o e alua ion is c ucial o accu-
a ely assessing he model’s pe o mance and imp o emen s [1]. A e ine- uning,
hese me ics indica e how well he model adap s o speci ic asks o domains. They
p o ide insigh s in o a ious aspec s o model pe o mance, such as p edic ion
Table 9.1 Fine- uning e alua ion c i e ia
E alua ion
c i e ia Desc ip ion
Rele ance Gauges how well he model’s esponses o ou pu s align wi h he con ex and in en
o he inpu . This is especially c ucial in applica ions such as cha bo s, whe e p o-
iding con ex ually app op ia e esponses is key o use sa is ac ion. Rele ance is
o en assessed by examining whe he he model can s ay on opic and p o ide in o -
ma ion o esponses di ec ly applicable o he que ies o asks.
Cohe ence Re e s o he logical consis ency o he model’s ou pu s. A ine- uned model should
gene a e con ex ually ele an , logically sound, and cohe en ex . This means he
esponses should ollow a logical s uc u e and na a i e low, making sense in he
con e sa ion o ex con ex . Cohe ence is i al o main aining use engagemen
and ensu ing he model’s ou pu s a e unde s andable and meaning ul.
Accu acy This pa icula ly comes in o play when he model is used o asks in ol ing ac ual
in o ma ion, such as educa ional ools, in o ma ional bo s, o any applica ion whe e
p o iding co ec in o ma ion is c i ical. Accu acy is measu ed by how well he
model’s esponses align wi h ac ual co ec ness and objec i e u h.
Language luency Pe ains o he g amma ical and syn ac ical co ec ness o he model’s ou pu s.
E en i a model is highly ele an , cohe en , and accu a e, poo language luency
can signi ican ly de ac om he use ’s expe ience. This includes p ope g amma ,
punc ua ion, and s yle, ensu ing he ex gene a ed is co ec and eads na u ally o
he end use .
262 CHAPTER 9 Tailo ing models wi h model adap a ion and ine- uning
se , using he same cos unc ion as he aining loss. The alida ion loss is usually mea-
su ed a e each epoch, a comple e pass h ough he aining se .
Figu e 9.3 shows an example o he loss when we ine- une using Azu e OpenAI
and he model pe o mance du ing aining. The g aph in igu e 9.3 showing he
aining loss o ine- uning aining esul s illus a es how well he model lea ns om
he aining da a.
We see he loss alue o each aining s ep, a ba ch o aining examples. The
x-axis is he s ep numbe , and he y-axis is he loss alue. The g aph shows ha he loss
dec eases as he model ains on mo e da a, indica ing ha i is imp o ing i s pe o -
mance. Howe e , he loss does no each ze o, which means he model s ill has some
e o s and canno pe ec ly i he da a. This is no mal, as o e i ing he da a can lead
o poo gene aliza ion o new da a.
Figu e 9.3 T aining loss when ine- uning GPT-3.5 Tu bo
To in e p e he g aph and de e mine whe he he model is pe o ming well, ideally
o a good i , we wan bo h aining and alida ion loss o dec ease o s abili y wi h a
minimal gap be ween he wo, which indica es ha he model is lea ning and gene al-
izing well. I he aining loss dec eases while he alida ion loss inc eases, he model
may be o e i ing he aining da a and no gene alizing well o new da a. Finally, i
bo h aining and alida ion loss emain high, he model may be unde i ing, which

2639.3 Fine- uning OpenAI models
means i ’s no lea ning he unde lying pa e ns in he da a well enough. The scale o
he loss and he numbe o aining s eps mus be conside ed. The model migh need
mo e aining i he loss is s ill high o he alida ion loss has ye o s abilize. Fo hose
wi h an ML model expe ience o backg ound, he o e all app oach o spli ing
be ween aining and alida ing da ase s and in e p e ing hese me ics is e y simila .
An in e es ing beha io is ha he da a in he loss g aph luc ua es, indica ing ha
he loss alue can a y depending on he samples in each ba ch. I is no mal o he
model o be noisy; howe e , in ine- uning, he model lea ns and imp o es i s pe o -
mance as long as he loss dec eases o e ime.
To ind whe he he ine- uning is good, we would ypically look o a low and s a-
ble alida ion loss close o he aining loss. The h esholds o wha would be consid-
e ed good loss alues a e subjec i e and will a y depending on he ask’s complexi y
and he na u e o he da a.
MEAN TOKEN ACCURACY
Mean oken accu acy measu es how well a ine- uned model co ec ly p edic s each
oken in he ou pu sequence ha he model gene a es o p edic s du ing aining. I
is e lec ed as a pe cen age, ha is, he pe cen age o okens he model p edic s co -
ec ly in a da ase . Fo example, i he mean oken accu acy is 90%, i means ha on
a e age, he model co ec ly p edic s 90% o he okens. This is an a e age calcula ed
by di iding he numbe o co ec ly p edic ed okens by he o al numbe o okens in
he ou pu .
Simila o he loss o mean oken accu acy, we ha e wo me ics: one o he ain-
ing and he o he o alida ion (assuming one has p o ided a alida ion da ase ). Fig-
u e 9.4 shows he mean oken accu acy o a ine- uning job o aining and alida ion.
The aining mean oken accu acy is he a e age accu acy o he model’s p edic ions
Figu e 9.4 T aining mean oken accu acy
264 CHAPTER 9 Tailo ing models wi h model adap a ion and ine- uning
on he aining da a. I measu es how well he model lea ns om he aining da a
and adap s o i . A high aining mean oken accu acy sugges s ha he model lea ns
e ec i ely om he aining da a. In con as , he alida ion mean oken accu acy is
he a e age accu acy o he model’s p edic ions on he alida ion da a. I measu es
how well he model gene alizes o new da a i has no seen be o e. A high alida ion
mean oken accu acy sugges s ha he model does no o e i he aining da a and
can gene alize well o new da a.
The di e ence be ween he wo me ics can help iden i y whe he he model is
o e i ing o he aining da a. Suppose he aining mean oken accu acy is much
highe han he alida ion mean oken accu acy. In ha case, i sugges s ha he
model is o e i ing o he aining da a and no gene alizing well o new da a. In con-
as , i he alida ion mean oken accu acy is much lowe han he aining mean
oken accu acy, i sugges s ha he model is unde i ing he aining da a and no
lea ning e ec i ely.
This me ic is use ul o e alua ing he pe o mance o a ine- uned model on he
aining da a. A good mean oken accu acy can be ela i e and depends on he spe-
ci ic ask o applica ion. Gene ally, a highe alue (close o 1.0) indica es be e pe -
o mance. Howe e , i does no e lec how well he model gene alizes o new o
unseen da a.
No e ha he in e p e a ion o hese me ics can depend on he speci ic ask o
applica ion. The e o e, i ’s essen ial o conside o he me ics and quali a i e e alua-
ions o ge a comp ehensi e iew o he model’s pe o mance. The quali y o mean
oken accu acy depends on he ask’s complexi y and he na u e o ex . Highe accu-
acy (close o 100%) is expec ed o simple asks o ex s wi h p edic able pa e ns.
A lowe accu acy migh s ill be good o mo e complex asks o di e se ex s.
One way o assess whe he he mean oken accu acy is good is o compa e i wi h a
baseline o wi h he pe o mance o o he models on he same ask. I you model’s
accu acy is highe han he baseline o simila models, i ’s a posi i e sign.
Now ha we unde s and he basic cons uc s o ine- uning and using a CLI o
code, le ’s ake a look a how we can achie e his using Azu e OpenAI and a GUI. As
s a ed ea lie , we will use Azu e OpenAI as an example, bu he same p ocess applies
o OpenAI.
9.3.5 Fine- uning using Azu e OpenAI
Ins ead o using he SDK and he CLI, we also ha e a isual in e ace ha we can
employ o achie e he same ou come. O en, doing his manually would be a be e
app oach han using code. To kick o a ine- uning job in Azu e OpenAI, when
logged in o he Azu e Po al and in he AI S udio, unde models, we choose he
op ion o c ea e a cus om model ( igu e 9.5).
We go h ough he wiza d and choose o upload he aining and alida ion da a-
se s, as shown in igu e 9.6. No e: I hese ha e been uploaded using he SDK, we will
ind hem he e, as long as hey a e in he same enan s and ha e he same end-poin
deploymen .
2659.3 Fine- uning OpenAI models
Figu e 9.5 Azu e AI S udio: C ea ing a cus om model
Figu e 9.6 Choosing a aining and alida ion da ase
266 CHAPTER 9 Tailo ing models wi h model adap a ion and ine- uning
Figu e 9.7 shows he s a us and de ails o each o ou aining jobs.
Now ha we ha e a ine- uned model, we need o deploy i o a es en i onmen o
un an e alua ion.
9.4 Deploymen o a ine- uned model
The deploymen o a ine- uned model is qui e s aigh o wa d. The new ine- uned
model shows up as ano he model a ailable o use in ou Azu e enan o OpenAI
subsc ip ion, as shown in igu es 9.8 and 9.9, espec i ely.
Figu e 9.7
T aining job de ails
Figu e 9.8
Deploying ine- uned
model o in e ence
2679.4 Deploymen o a ine- uned model
OpenAI has launched a ea u e in he play-
g ound ha le s use s see how a ine- uned
model di e s om he base model side by
side, which can be use ul isually bu no
e icien ly.
9.4.1 In e ence: Fine- uned model
Re u ning o ou ask, we now ha e a ine-
uned model o EmojiBo , whe e he bo
esponds in emojis using he o ma ha
Mic oso Teams uses. Figu e 9.10 shows
how he ou -o - he-box GPT-3.5 Tu bo
model beha es when asked o espond wi h
emojis; his is expec ed bu will no wo k
wi h Teams.
Figu e 9.10 Response wi h emojis using GPT-3.5 Tu bo
Howe e , he expe ience o he same ques ions using ou ine- uned powe ed Emoji-
Bo is qui e di e en , as shown in igu e 9.11. He e, o he same ques ions as be o e,
we ge he esponse in he o ma we’ll be able o use in Teams.
Figu e 9.9 OpenAI ine- uned model
deploymen

268 CHAPTER 9 Tailo ing models wi h model adap a ion and ine- uning
Figu e 9.11 Fine- uned EmojiBo in e ence
Howe e , i is easy o ge comple ely inco ec esul s on he same ques ions om ea -
lie and wi h he same pa ame e se ings ( igu e 9.12). We can see he ine- uned
model answe in emojis—(Pizza) and (Feeling i ed)—bu he esul is no wha we
expec ed.
Figu e 9.12 Fine- uned EmojiBo wi h inco ec esul s
2699.5 T aining an LLM
To esol e his, we need o weak he sys em p omp o s ee he model o espond
using emojis whe e possible, which is a g ea way o close ou by eminding ha a
s acked app oach o p omp enginee ing, RAG, and ine- uning (whe e he ask a
hand wa an s) is he igh app oach in almos all cases.
Now ha we ha e seen how o ine- une a model and he s eps one needs o unde -
ake, le us swi ch and look a some o he unde pinnings o he echnology ha will
make his wo k. S ic ly speaking, we do no equi e his o do a ine- uning, bu i will
help us o unde s and some o he nuances o achie e be e ou comes o ine-
uning. We will s a by unde s anding how we ain an LLM and, a a high le el, wha
he s eps en ail.
9.5 T aining an LLM
I is help ul o ou unde s anding o model adap a ion and he echniques and hei
associa ed limi a ions o examine wha i means and wha i akes o do ull aining
o an LLM. A a high le el, i we we e o do ull aining and build an LLM om
sc a ch, ha aining would in ol e ou majo s ages, as shown in igu e 9.13.
Figu e 9.13 Full end- o-end aining o an LLM [5]
Le ’s go h ough each s age in mo e de ail.
9.5.1 P e aining
Base LLMs a e buil du ing his ini ial s age. We ouched on base LLMs in chap e 2.
These a e he o iginal, p e ained models ained on a massi e co pus o ex da a.
They can gene a e ex based on he pa e ns hey lea ned du ing aining. Some also
call hem aw language models.
Raw in e ne
• T illions o wo ds
• Low quali y
• La ge quan i y
S age P e aining Supe ised ine- uning Rewa d modeling Rein o cemen lea ning
Da ase
Algo i hm
Model
No es
Demons a ions
• Ideal assis an esponses
• ~10–100K p omp - esponse
pai s (human w i en)
• High quali y; low quan i y
Compa isons
• 100K–1M compa isons
(human w i en)
• High quali y; low quan i y
P omp s
• ~10K–100K p omp s
(human w i en)
• High quali y; low quan i y
Language modeling
• Nex oken p edic ion
Language modeling
• Nex oken p edic ion
Bina y classi ica ion
• P edic ewa ds consis en
wi h p e e ences
Rein o cemen lea ning
• Gene a e oken ha
maximizes he ewa d
Base model SFT model RM model RL model
GPUs: Thousands
T aining: Mon hs
Model can be deployed
E.g. GPT, PaLM, LLaMA
GPUs: 1–100
T aining: Days
Model can be deployed
E.g. Vicuna-13B
GPUs: 1–100
T aining: Days
GPUs: 1–100
T aining: Days
Model can be deployed
E.g. Cha GPT, Claude
Ini ialized
Ini ialized
Ini ialized
Use RM
270 CHAPTER 9 Tailo ing models wi h model adap a ion and ine- uning
NOTE While powe ul, hese base models a e less sui able o gene al-
pu pose applica ions because hey may need o align hei esponses wi h he
speci ic in en ions o ins uc ions o he use . They a e mo e like aw engines
o ex gene a ion, lacking he e ined capabili y o unde s and and adhe e
o he nuances o use p omp s. Base models do no answe ques ions and
o en espond wi h mo e ques ions. In con as , ins uc o s a e ailo ed o be
mo e in e ac i e and use - iendly, which makes hem mo e sui able o a
wide ange o applica ions, om cus ome se ice cha bo s o educa ional
ools, whe e unde s anding and ollowing ins uc ions accu a ely is c ucial.
9.5.2 Supe ised ine- uning
Supe ised ine uning (SFT) is he nex s age. In his s age, he base model unde -
goes e ining o he base model wi h high-quali y, domain-speci ic da a. These da ase s
consis o p omp – esponse pai s, manually c ea ed (o en by human con ac o s),
which a e ewe in numbe han in he p e ious s age bu o much highe quali y. The
con ac o s ollow de ailed documen a ion o c ea e hese p omp – esponse pai s,
ensu ing ele ance and quali y. Simila o he las p e aining s age, he SFT model is
ained o p edic he nex oken in hese pai s, bu hese a e less accu a e and con ex-
ually awa e when gene a ing he esponse.
SFT is a echnique o op imizing LLMs on labeled da a o a speci ic downs eam
ask, such as sen imen analysis, ex summa iza ion, o machine ansla ion. La e in
he chap e , we will co e addi ional de ails o SFT me hods and app oaches.
9.5.3 Rewa d modeling
The hi d phase is ewa d modeling, he i s pa o he Rein o cemen Lea ning
om Human Feedback (RLHF) p ocess. The main goal a his s age is o de elop a
model ha can e alua e and ank esponses based on hei quali y and ele ance. To
do his, he SFT model ( om he p e ious s age) gene a es mul iple esponses o a
p omp , which human con ac o s hen ank based on a ious c i e ia such as domain
expe ise, ac -checking, and code execu ion. These ankings ain a ewa d model,
which lea ns o sco e esponses like human con ac o s.
9.5.4 Rein o cemen lea ning
This is he second pa o he RLHF p ocess, and i aims o enhance he language
model’s abili y o gene a e high-quali y esponses h ough i e a i e eedback. In his
inal s age, he ewa d model sco es esponses gene a ed by he SFT model o many
p omp s. These sco es a e used o u he ain he SFT model, ul ima ely leading o
he c ea ion o he RLHF model. The RLHF aligns he LLMs wi h human p e e ences
o expec a ions o a gi en ask o domain, such as cha , code, o c ea i e w i ing.
Mo e de ails on RLHF me hods will be co e ed la e in his chap e .
9.5.5 Di ec policy op imiza ion
Di ec policy op imiza ion (DPO) [6] is ano he echnique, which is a new ype o
ewa d model pa ame e iza ion in RLHF ha used o ine- uning LLMs o align wi h
2719.6 Model adap a ion echniques
ou p e e ences. I exploi s a ela ionship be ween ewa d unc ions and op imal poli-
cies. I allows us o skip he ewa d modeling s ep ou lined ea lie , as long as he
human eedback can be exp essed in bina y e ms— ha is, a choice be ween wo
op ions. DPO can sol e he ewa d maximiza ion p oblem wi h cons ain s in a single
policy aining phase, essen ially ea ing i as a classi ica ion p oblem. PPO (see sec-
ion 9.7) equi es a ewa d model and a complex RL-based op imiza ion p ocess;
DPO, howe e , bypasses he ewa d modeling s ep and di ec ly op imizes he language
model on p e e ence da a, which can be simple and mo e e icien . As DPO elimi-
na es he need o ain a ewa d model ins ead o aining a ewa d model and op i-
mizing a policy based on ha model, we can di ec ly op imize he policy. This
cha ac e is ic makes his app oach quicke , and ewe esou ces a e used han in
RLHF wi h PPO.
9.6 Model adap a ion echniques
The e a e se e al echniques a ailable o model adap a ion, wi h each echnique p o-
iding i s unique app oach and being sui able o di e en scena ios depending on
he speci ic equi emen s (i.e., he model size, a ailable compu a ional esou ces, and
he desi ed le el o adap a ion). One o he main echniques widely used o adap ing
LLMs is low adap a ion anking (LoRA), which will be co e ed in mo e de ail in he
nex sec ion. In LoRA, ins ead o upda ing all he weigh s in he model, only a small
subse o pa ame e s, in oduced as low- ank ma ixes, a e modi ied. This app oach
allows e icien aining and adap a ion, while p ese ing mos o he p e ained
model’s s uc u e and knowledge.
Pa ame e e icien ine- uning (PEFT) is a concep in ML ha e e s o me hods o
adap ing and ine- uning la ge p e ained models, such as GPT-3.5, o minimize he
numbe o pa ame e s ha need o be upda ed. This app oach is pa icula ly aluable
when dealing wi h la ge models, as i educes compu a ional equi emen s and can mi -
iga e p oblems such as o e i ing. PEFT echniques a e designed o make ine- uning
mo e accessible and e icien , especially o use s wi h limi ed compu a ional
esou ces—LoRA is an example o he PEFT me hod. Fo mo e de ails on di e en
ypes o PEFT echniques and de ails, see he pape “Scaling Down o Scale Up: A Guide
o Pa ame e -E icien Fine-Tuning” by Vladisla Lialin [7].
Ca as ophic o ge ing is a phenomenon whe e a model loses i s abili y o pe -
o m well on p e ious asks a e being ine- uned on new asks [8]. This can happen
when he model o e w i es i s o iginal pa ame e s wi h ask-speci ic ones, hus o ge -
ing he gene al knowledge i lea ned om p e aining. When implemen ing PEFT o
p e en ca as ophic o ge ing, we ine- une only a small subse o pa ame e s, while
keeping mos p e ained pa ame e s ixed. This way, he model can e ain i s gene al-
iza ion abili y and adap o new asks wi hou losing i s p e ious pe o mance.
Supe ised ine- uning (SFT) is ano he ype o adap a ion echnique; i is a spe-
ci ic ype o ine- uning whe e he model is u he ained on a labeled da ase . I ’s
supe ised because he aining p ocess uses a da ase ha pai s he inpu da a wi h
he co ec ou pu (labels). SFT is pa icula ly common in asks such as classi ica ion,
278 CHAPTER 9 Tailo ing models wi h model adap a ion and ine- uning
P oximal policy op imiza ion (PPO)—PPO is an RL algo i hm ha i e a i ely
imp o es he p ima y model’s policy (decision-making p ocess). The algo i hm
upda es he model’s policy o maximize he ewa ds p edic ed by he ewa d
model. PPO is chosen o i s s abili y and e iciency in handling la ge and com-
plex models.
Human eedback loop—This loop in ol es con inuous inpu om human e alua-
o s who assess he quali y o he model’s ou pu s. The eedback is used o ain
he ewa d model u he , c ea ing a dynamic lea ning en i onmen whe e he
model adap s o e ol ing human p e e ences and s anda ds. The loop ensu es
ha he model emains aligned wi h human expec a ions and can adap o
changes.
NOTE PPO-p x [13] is an adap a ion o he PPO algo i hm ailo ed o ine-
uning RLHF. I in eg a es a e e ence o he o iginal LLM o main ain pe -
o mance, while aligning he model’s ou pu s wi h human p e e ences. This
app oach helps mi iga e he alignmen ax, ensu ing he LLM emains e ec-
i e and di e se in i s ou pu s a e aining. Essen ially, PPO-p x balances he
model’s p e aining knowledge wi h he new eedback o c ea e a high-
pe o ming LLM aligned wi h human alues.
RLHF migh seem like he sil e bulle in many ways, bu en e p ises mus be awa e o
some challenges. Le ’s explo e hese.
9.7.1 Challenges wi h RLHF
RLHF is a powe ul echnique o eaching models complex asks, bu i has many
p ac ical challenges and limi a ions o en e p ises. An RLHF sys em needs a lo o
human p e e ence da a, which is ha d o ge because i in ol es o he people who a e
no pa o he aining p ocess. How well RLHF wo ks depends on how good he
human anno a ions a e, which humans can w i e, such as when hey adjus he ini ial
LLM in Ins uc GPT o p o ide a ings o how much hey like di e en ou pu s om
he model. Some o hese challenges a e
Technical complexi y—Implemen ing RLHF equi es ad anced skills and knowl-
edge in ML, RL, and NLP. I also in ol es complex se up and main enance p o-
cesses, such as con igu ing he model a chi ec u e, ewa d sys ems, and
eedback mechanisms.
Compu a ionally in ensi e—RLHF models need a lo o compu a ional esou ces,
such as GPUs and se e s, which can be expensi e. They also depend on he
quali y and quan i y o human eedback, which can be ha d o ob ain and p o-
cess. F om a p ac ical iewpoin , a lo o he human eedback is om con ac
wo ke s (o gig wo ke s) on c owdsou cing pla o ms whe e ge ing he igh
quali ied people in ce ain domains migh be challenging. Mo eo e , ensu ing
a di e se and unbiased da ase o aining can be challenging and compu a-
ionally hea y.

2799.7 RLHF o e iew
No scalable—RLHF models a e di icul o scale o la ge-scale applica ions,
equi ing con inuous human eedback and inc easing compu a ional esou ces.
They a e also ha d o adap o di e en domains o changing da a en i onmen s,
esul ing in limi ed adap abili y and cus omiza ion.
Quali y—RLHF models a e p one o bias, as hey e lec human eedback p o-
ide s’ subjec i e opinions and po en ial p ejudices. Ensu ing e hical use and
unbiased ou pu s is a majo conce n. Main aining a consis en quali y o human
eedback can be di icul , as human judgmen can a y and a ec he model’s
eliabili y and pe o mance. When ying o build a help ul model ha a oids
ha m, he e is an inhe en ension be ween hose wo dimensions. P o iding
oo many poli e esponses, such as “So y, I am an AI model, and I canno help
you wi h ha ,” o some hing simila , limi s he model’s use ulness. O ganiza-
ions mus balance and mi iga e his using addi ional guidance, aining, and
o he ML echniques o c ea e syn he ic da a whe e possible.
Cos —RLHF models a e cos ly o implemen and ope a e. Cos s include in a-
s uc u e, compu a ional esou ces, da a acquisi ion, and hi ing skilled p o es-
sionals. The e a e also ongoing ope a ional cos s ela ed o da a managemen ,
model upda es, and con inuous eedback in eg a ion. These cos s can be sub-
s an ial, especially o la ge-scale implemen a ions.
Da a—I is ha d o p oduce good human ex ha answe s speci ic p omp s
because i usually means paying pa - ime wo ke s (ins ead o p oduc use s o
c owdsou cing). Luckily, he amoun o da a needed o ain he ewa d model
o mos uses o RLHF (~50k p e e ence labels) is no ha cos ly. Howe e , i is
s ill mo e han wha academic labs can usually a o d. The e is only one big
da ase o RLHF on a gene al language model ( om An h opic) and a ew
smalle da ase s o speci ic asks (such as summa iza ion da a om OpenAI).
Ano he p oblem wi h da a o RLHF is ha human anno a o s can disag ee a
lo , which makes he aining da a e y noisy wi hou a ue answe .
RLHF o e s ad anced capabili ies in eaching models o pe o m complex asks; how-
e e , i s adop ion in en e p ise se ings is hinde ed by echnical complexi y, esou ce
demands, scalabili y challenges, e hical conside a ions, and high cos s.
On he one hand, hese ba ie s make i di icul o many o ganiza ions o imple-
men and sus ain RLHF sys ems in hei ope a ions p ac ically. On he o he hand,
hose who can implemen his, especially some o he echnical companies such as
OpenAI and An h opic, can bene i om i . Le ’s see how we can scale an RLHF
implemen a ion.
9.7.2 Scaling an RLHF implemen a ion
Scaling an RLHF implemen a ion o LLMs in ol es a mul i ace ed app oach ha bal-
ances e iciency, di e si y, and quali y con ol. Fi s , au oma ing da a collec ion and
implemen ing e icien eedback mechanisms a e c ucial o handling la ge olumes
o da a and eedback. Au oma ed sys ems can ga he da a om a ious sou ces o
h ough in e aces designed o e icien human in e ac ion.
280 CHAPTER 9 Tailo ing models wi h model adap a ion and ine- uning
Using a la ge, di e se pool o human e alua o s is essen ial o cap u ing a wide
ange o pe spec i es, helping he model o be mo e obus and less biased. To ensu e
he eedback is in o ma i e, in elligen sampling s a egies, such as ac i e lea ning,
can be used o iden i y and p io i ize he mos aluable ins ances o e alua ion.
Pa alleliza ion and dis ibu ion o asks among mul iple e alua o s can signi i-
can ly speed up he eedback p ocess. The sys em can handle la ge-scale da a p ocess-
ing and model aining wi h scalable in as uc u e.
Implemen quali y con ol measu es, such as c oss- alida ion among e alua o s
and algo i hms, o de ec biases and main ain he quali y and consis ency o eedback.
Regula moni o ing and e alua ion o he model’s pe o mance can help you unde -
s and he e ec s o RLHF and guide con inuous imp o emen .
Finally, e hical conside a ions and bias mi iga ion a e c ucial. Ensu ing ha eed-
back does no ein o ce ha m ul s e eo ypes and ac i ely add essing po en ial biases is
i al o de eloping ai and esponsible models. O e all, scaling RLHF o LLMs
equi es a comp ehensi e app oach ha in eg a es echnical, logis ical, and e hical
s a egies, aiming o a sys em ha e ec i ely inco po a es human eedback in o he
model’s lea ning p ocess.
Summa y
Model adap a ion should be ancho ed in a se o use cases, and i should be he
las eso o en e p ises ying o imp o e he model on hose asks.
P omp enginee ing and RAG mus wo k in conjunc ion wi h ine- uning in a
s acked manne .
When done co ec ly, ine- uning has a high upside om enhanced e iciency
and possible cos sa ings.
Fine- uning has a high cos , and you should be awa e o challenges such as he
need o ask-speci ic da a, compu a ional esou ces, pe o mance e alua ion,
and e hical conside a ions.
Fine- uning should be done in conjunc ion wi h e alua ions and will o en
equi e mul iple i e a ions o ob ain a model eady o p oduc ion deploymen .
The choice o me ics o e alua ing ine- uned models la gely depends on he
model’s speci ic applica ion and objec i es.
The main model adap a ion echniques ha a e mo e cos -e icien a e supe -
ised ine- uning (SFT), pa ame e e icien ine- uning (PEFT), and low- ank
adap a ion (LoRA).
Pa 3
Deploymen and
e hical conside a ions
This inal sec ion ocuses on he p ac ical aspec s o deploying gene a i e AI
applica ions and he e hical conside a ions in ol ed. I p o ides a comp ehen-
si e guide o applica ion a chi ec u e, scaling up o p oduc ion, and he ope a-
ional bes p ac ices o deploymen . The closing chap e s emphasize he
impo ance o e hical p inciples, discussing po en ial isks, esponsible AI li ecy-
cle, and ools o ensu ing e hical AI p ac ices.
Chap e 10 discusses he a chi ec u al conside a ions necessa y o building
gene a i e AI applica ions. I co e s he o ches a ion and g ounding laye s and
how o il e models and esponses o e ec i ely ensu e op imal applica ion pe -
o mance.
Chap e 11 ocuses on he challenges o scaling gene a i e AI applica ions
and p o ides bes p ac ices o p oduc ion deploymen . I add esses c i ical
aspec s such as me ics, la ency, scalabili y, and secu i y conside a ions o ensu e
smoo h and e icien ope a ion.
Chap e 12 explains how o e alua e and benchma k la ge language models,
discussing a ious me ics and benchma ks. I co e s ask-speci ic benchma ks
and he impo ance o human e alua ion in assessing model pe o mance.
Chap e 13, he inal chap e , highligh s gene a i e AI’s e hical challenges
and isks. I ou lines he p inciples and p ac ices o esponsible AI use, includ-
ing con en sa e y, da a p i acy, secu i y conside a ions, and he e hical li ecycle
o AI implemen a ion.
282 CHAPTER
283
Applica ion a chi ec u e
o gene a i e AI apps
The en e p ise a chi ec u e landscape con inues o change, mo ing inexo ably
owa d mo e sel -di ec ed sys ems—in elligen , sel -managing applica ions ha a e
capable o lea ning om in e ac ions and adap ing in eal ime. Fu he mo e,
inc easing digi iza ion uels he AI digi al ans o ma ion. This ongoing p og ession
This chap e co e s
An o e iew o GenAI applica ion a chi ec u e and
he eme ging GenAI app s ack
The di e en laye s ha make up he GenAI app
s ack
GenAI a chi ec u e p inciples
The bene i s o o ches a ion amewo ks and
some o he popula ones
Model ensemble a chi ec u es
How o c ea e a s a egic amewo k o a c oss-
unc ional AI Cen e o Excellence

284 CHAPTER 10 Applica ion a chi ec u e o gene a i e AI apps
unde sco es a ans o ma i e e a in en e p ise echnology, poised o ede ine he e y
na u e o so wa e de elopmen and deploymen .
Na u ally, his is mo e o an ideal. Howe e , mos en e p ises a e s ill e y inexpe i-
enced wi h AI-in used applica ions in gene al, and gene a i e AI is s ill e y much in
i s ea ly s ages. This chap e will explo e how en e p ise applica ion a chi ec u e s an-
da ds and bes p ac ices mus adap o he eme ging gene a i e AI echnologies and
use cases. The chap e in oduces he concep o a GenAI app s ack as a concep ual e -
e ence a chi ec u e o building gene a i e AI applica ions, and i ou lines i s main
componen s and how gene a i e AI i s oge he in he b oade en e p ise a chi ec-
u e. The GenAI app s ack is an e olu ion o cloud applica ion a chi ec u e, wi h a
shi owa d da a-cen ic and AI-d i en a chi ec u es.
This chap e s a s by ou lining wha he new GenAI app s ack en ails, co e ing
de ails o each sec ion and, inally, b inging all he concep s oge he in o wo king
examples ha make i eal and usable. As you lea n abou his s ack, we’ll consolida e
he di e en aspec s o he a chi ec u e desc ibed in p e ious chap e s. One hing o
no e is ha despi e ep esen ing a big change, gene a i e AI does no equi e a com-
ple ely new a chi ec u e bu builds on he exis ing cloud-based dis ibu ed a chi ec-
u e. This cha ac e is ic allows us o build on exis ing bes p ac ices and a chi ec u e
p inciples o inco po a e new GenAI- ela ed pa adigms. Le ’s s a by iden i ying he
upda es o en e p ise applica ion a chi ec u e.
10.1 Gene a i e AI: Applica ion a chi ec u e
O e he las ew yea s, en e p ise applica ion a chi ec u e has wi nessed a signi ican
e olu ion, going h ough se e al ans o ma i e s ages o mee he escala ing
demands o business agili y, scalabili y, and in elligence. Ini ially, en e p ises ope -
a ed on monoli hic sys ems, ha is, obus bu in lexible s uc u es wi h igh ly in e -
wo en componen s, which made changes cumbe some and wide- eaching. These
sys ems se he s age o en e p ise compu ing bu we e no sui able o he apid e o-
lu ion o business needs.
The p oli e a ion o cloud compu ing and cloud-na i e a chi ec u es saw he ise
o con aine iza ion and o ches a ion ools, which simpli ied he deploymen and
managemen o applica ions ac oss di e se en i onmen s. Simul aneously, he deluge
o da a led o da a-cen ic a chi ec u es ha p io i ize da a p ocessing and analy ics as
key d i e s o business ope a ions.
The e olu ion o en e p ise applica ion a chi ec u e o gene a i e AI can be seen
as a shi om adi ional so wa e de elopmen o da a-d i en so wa e syn hesis. In
he adi ional pa adigm, so wa e enginee s w i e code o implemen speci ic unc-
ionali ies and logic, using amewo ks and lib a ies ha abs ac away low-le el
de ails. In he gene a i e AI pa adigm, so wa e de elope s p o ide da a and high-
le el speci ica ions and use la ge language models (LLMs) o gene a e code ha
mee s he desi ed equi emen s and cons ain s. The ollowing wo key concep s
enabled his pa adigm shi : So wa e 2.0 and building on copilo s.
28510.1 Gene a i e AI: Applica ion a chi ec u e
10.1.1 So wa e 2.0
So wa e 2.0 is a e m coined by And ej Ka pa hy [1] o desc ibe he end o eplac-
ing handc a ed code wi h lea ned neu al ne wo ks. So wa e 2.0 uses ad ances in AI,
such as na u al language p ocessing (NLP), compu e ision, and ein o cemen
lea ning, o c ea e so wa e componen s ha can lea n om da a, adap o new si ua-
ions, and in e ac wi h humans na u ally.
Recen ly, we ha e ansi ioned om w i ing code and managing explici ins uc-
ions o a desi ed goal o a mo e abs ac app oach. De elope s ain models on la ge
da ase s ins ead o w i ing explici ins uc ions o ules in a p og amming language.
So wa e 2.0 also educes he need o manual debugging, es ing, and main enance,
as he neu al ne wo ks can sel -co ec and imp o e o e ime (see igu e 10.1).
Figu e 10.1 So wa e 1.0 e sus So wa e 2.0
This allows he models o lea n he ules o pa e ns hemsel es. Algo i hms and mod-
els a e c a ed o lea n om da a, make decisions, and imp o e o e ime, e ec i ely
w i ing he so wa e. This pa adigm shi has ans o med he ole o AI om a sup-
po i e ool o a undamen al componen o sys em a chi ec u e.
10.1.2 The e a o copilo s
Ano he key concep ha acili a ed he e olu ion o en e p ise applica ion a chi ec-
u e o gene a i e AI is copilo s—a concep o iginally p oposed by Mic oso . Copi-
lo s a e mean o augmen humans and human capabili ies and c ea i i y. Using an
30
Inpu da a
unc oo(x): in {
e u n x+1 }
Code
Compu a ion 6
Ou pu
Compu a ion Weigh s
Ou pu
Labelled aining da a
Model a chi ec u e
So wa e 1.0 So wa e 2.0
286 CHAPTER 10 Applica ion a chi ec u e o gene a i e AI apps
ai plane analogy, i we a e humans, we a e he pilo s; ins ead o AI being on au opilo
whe e we ha e no con ol o say in how i unc ions, his new AI plays he ole o copi-
lo s ha help us ake on cogni i e load and some o he d udge y o wo k. S ill, we
emain in cha ge as he pilo .
The Copilo s ack is a amewo k o building AI applica ions and copilo s ha use
LLMs o unde s and and gene a e na u al language and code. Copilo s a e in elligen
assis an s ha can help use s wi h complex cogni i e asks such as w i ing, coding,
sea ching, o easoning. Mic oso has de eloped a ange o copilo s o di e en
domains and pla o ms, such as Gi Hub Copilo , Bing Cha , Dynamics 365 Copilo ,
and Windows Copilo . You can also build you cus om Copilo using he Copilo s ack
and ools, such as Azu e OpenAI, Copilo S udio, and he Teams AI Lib a y. Copilo s
can also be in eg a ed in o exis ing ools and pla o ms, such as Gi Hub, Visual S udio
Code, and Jupy e No ebook, o enhance he p oduc i i y and c ea i i y o so wa e
de elope s.
Copilo s a e based on he concep o So wa e 2.0, whe e hey use LLMs o gene -
a e code om na u al language desc ip ions ins ead o elying on manually w i en
code. Howe e , hey should be seen as he GenAI applica ion s ack, simila o he
LAMP s ack o web de elopmen . LAMP is an ac onym o he s ack componen s:
Linux (ope a ing sys em); Apache (webse e ); MySQL (da abase); and PHP, Pe l, o
Py hon (p og amming language).
Copilo s a e a use ul model o en e p ises o ollow when designing hei gene a-
i e AI apps en e p ise a chi ec u e because hey o e se e al ad an ages (e.g.,
quicke and simple de elopmen , mo e c ea i i y and es ing, and imp o ed coope a-
ion and lea ning, enabling en e p ises o y ou new concep s and oppo uni ies o
o c ea e o iginal solu ions o di icul p oblems). Le ’s expand on wha he Copilo
s ack is o make i mo e ele an and eal in conc e e e ms.
10.2 Gene a i e AI: Applica ion s ack
Copilo s’ a chi ec u e comp ises se e al laye s and componen s ha wo k oge he o
p o ide a seamless and powe ul use expe ience, as ou lined in igu e 10.2. We will
s a om he bo om up, examine each laye and componen in de ail, and ind ou
how hey in e ac .
The AI in as uc u e laye is he ounda ional laye ha powe s e e y hing and
hos s he co e AI models and compu a ional esou ces. I encompasses he ha dwa e,
so wa e, and se ices ha enable he de elopmen and deploymen o AI applica ions
and a e o en op imized o AI wo kloads. This also includes he massi ely scalable dis-
ibu ed high-pe o mance compu ing (HPC), equi ed o aining he base ounda-
ional models.
The ounda ional model laye includes he ange o suppo ed models, om
hos ed ounda ion models o he model you ain and wan o deploy. The hos ed
ounda ional models a e la ge p e ained models, such as LLMs and o he s ( ision
and speech models), and he newe small language models (SLMs) ha can be used
o in e ence; hese models can be closed o open. Some o he models can be u he
28710.2 Gene a i e AI: Applica ion s ack
adjus ed o speci ic asks o domains. These models a e hos ed and managed wi hin
he AI in as uc u e laye o ensu e high pe o mance and a ailabili y. Use s can
selec om a ious hos ed ounda ion models based on hei needs and p e e ences.
The o ches a ion laye manages he in e ac ions be ween he a ious componen s
o he a chi ec u e, ensu ing seamless ope a ion and coo dina ion. I is esponsible
o key unc ions such as ask alloca ion, esou ce managemen , and wo k low
op imiza ion:
The esponse il e ing componen uses he p omp enginee ing se o compo-
nen s; he e, he p omp s and esponses a e analyzed, il e ed, and op imized o
gene a e sa e ou pu s.
The sys em p omp can also p o ide addi ional in o ma ion o cons ain s o
he AI model o ollow. The use can exp ess a sys em p omp ia a simple syn-
ax, o he sys em can au oma ically gene a e i .
G ounding is he implemen a ion o e ie al-augmen ed gene a ion (RAG),
and i e e s o he p ocess o con ex ualizing he esponses gene a ed by he AI
model. G ounding ensu es he ou pu s a e syn ac ically co ec , seman ically
meaning ul, and ele an o he gi en con ex o domain. We use plugins o ge
da a inges ed om di e en en e p ise sys ems.
The plugin execu ion laye uns plugins ha add mo e ea u es o he basic AI
model. Plugins a e sepa a e and eusable pa s ha can do di e en hings,
AI in as uc u e
Founda ional models
Hos ed ounda ional
models
Hos ed ine- uned
ounda ional models BYO models
G ounding Plugin execu ion
Vec o DB, APIs, e c.)
(
Response il e ing Me a p omp
O ches a ion
Copilo on end + UX
AI sa e y
Figu e 10.2
GenAI applica ion s ack
294 CHAPTER 10 Applica ion a chi ec u e o gene a i e AI apps
ge ing bogged down in he echnical de ails o LLM in e ac ion. Table 10.1 ou lines
he key esponsibili ies.
These di e en componen s wo k oge he o c ea e a s ong o ches a ion sys em
ha se es as he ounda ion o he success ul deploymen and ope a ion o gene a-
i e AI echnology in he en e p ise sec o . Such o ches a ion is necessa y o he
in icacy and cons an changes o AI-powe ed applica ions o a oid ine iciencies, mis-
akes, and sys em b eakdowns.
10.3.1 Bene i s o an o ches a ion amewo k
O ches a o s a e essen ial o managing he complex sys ems powe ing gene a i e AI
apps. These sys ems in ol e di e se p ocesses ha need ca e ul coo dina ion h ough
o ches a ion ools. O ches a o s simpli y wo k lows and ensu e asks a e done in
Table 10.1 O ches a o key esponsibili ies
A ea Desc ip ions
Wo k low managemen O ches a o ensu es ha he sequence o p ocesses— om da a inges ion
and p ocessing o AI model in e ence and esponse deli e y—is execu ed in
an o de ly and e icien manne . This includes s a e managemen o coo di-
na e dependencies be ween asks, e o handling, e y mechanisms, and
he dynamic alloca ion o esou ces based on he ask load.
Se ice o ches a ion Mic ose ices a chi ec u e is ypically employed, whe e each se ice is
esponsible o a disc e e unc ion in he gene a i e AI p ocess. Se ice
o ches a ion is abou managing hese se ices o scale, communica e, and
unc ion seamlessly. In addi ion, con aine iza ion pla o ms such as Docke
and o ches a ion sys ems such as Kube ne es deploy, manage, and scale
he mic ose ices ac oss a ious en i onmen s.
Da a low coo dina ion Ensu e ha da a lows co ec ly h ough he sys em, om he ini ial da a
sou ces o he model and back o he end use o applica ion. This includes
p ep ocessing inpu s, queue managemen o incoming eques s, and ou -
ing ou pu s o he co ec des ina ions.
Load balancing and
au o-scaling
Load balance s dis ibu e incoming AI in e ence eques s ac oss mul iple
ins ances o p e en any single ins ance om becoming a bo leneck. Au o-
scaling adjus s he numbe o ac i e ins ances based on he cu en load,
ensu ing cos -e ec i e esou ce use. This also has API managemen compo-
nen s o manage a e limi s and implemen back-o s a egies o p oduc-
ion wo kloads.
Model e sioning and
ollback
O ches a ion includes main aining di e en e sions o AI models and man-
aging hei deploymen . I allows o quick ollback o p e ious e sions i a
new model exhibi s unexpec ed beha io o poo pe o mance.
Managing model con ex
windows
O ches a o enhances in e ac ions by e icien ly managing con ex windows
and oken coun s. I acks and dynamically adjus s con e sa ion his o y
wi hin he model’s oken limi s and main ains cohe ence in esponses, espe-
cially in long o complex exchanges. Bes p ac ices include e icien con ex
managemen , handling edge cases, con inuous pe o mance moni o ing,
and inco po a ing use eedback o ongoing imp o emen s.

29510.3 O ches a ion laye
o de , wi h dependencies and e o -handling ules aken ca e o . This esul s in a eli-
able and egula ope a ional low, whe e s eps o p ep ocessing, compu a ion, and
pos p ocessing a e smoo hly connec ed, ensu ing da a quali y and consis en ou pu
gene a ion.
Scalabili y is ano he a ea whe e o ches a ion is i al. As demand luc ua es, a sys-
em ha dynamically adjus s esou ce alloca ion, especially o p oduc ion wo kloads,
becomes c ucial. An o ches a o can p o ide his agili y using di e en echniques,
such as load balance s o dis ibu e wo kloads e enly and au o-scaling ea u es o
modula e compu ing powe in eal- ime. This elas ici y mee s he load equi emen s
and op imizes esou ce usage, balancing pe o mance and cos e iciency. The o ches-
a o s would need o manage his ac oss di e en models, as well as he compu a-
ional and cos p o iles o hose models.
O ches a o s o e a cen alized managemen and moni o ing abili y. They cons i-
u e amewo ks ha o e dashboa ds and ools o moni o ing LLM usage, iden i y-
ing bo lenecks, and oubleshoo ing p oblems. This enhances sys em eliabili y by
moni o ing se ice heal h, esponding o ailu es, and ensu ing minimal down ime.
O ches a o s can employ au oma ed eco e y p ocesses, such as ins ance es a s o
eplacemen s, allowing o se ice con inui y.
The de aul deploymen model is a pay-as-you-go me hod o mos cloud-based LLM
p o ide s. This model is sha ed wi h o he cus ome s, and incoming eques s a e
queued and p ocessed on a i s -come, i s -se ed basis. Howe e , o p oduc ion wo k-
loads ha equi e a be e use expe ience, Azu e OpenAI se ice o e s a p o isioned
h oughpu uni s (PTU) ea u e. This ea u e allows cus ome s o ese e and deploy
uni s o model p ocessing capaci y o p omp p ocessing and gene a ing comple ions.
Each uni ’s minimum PTU deploymen , inc emen s, and p ocessing capaci y a y
depending on he model ype and e sion. An o ches a o will manage he di e en
deploymen endpoin s be ween egula pay-as-you-go and PTUs o ensu e op imum
pe o mance and cos -e ec i eness.
O ches a o s play a signi ican ole in inc easing p oduc i i y and s eamlining
ope a ions, which a e achie ed in wo ways. Fi s , i educes he need o w i e epe i-
i e code o common asks such as p omp cons uc ion and ou pu p ocessing, hus
inc easing de elope s’ p oduc i i y. Second, i au oma es he deploymen and man-
agemen o se ices, hus minimizing he possibili y o human e o . This au oma ed
p ocess educes manual o e head and ensu es e ec i e compu e esou ce u iliza ion,
s eamlining p oduc ion ope a ions. We will del e deepe in o managing ope a ions
la e in he chap e .
Compliance and go e nance a e essen ial equi emen s o any en e p ise. An
o ches a o can assis in en o cing compliance by de e mining how da a is p ocessed,
s o ed, and used in he wo k low, which ensu es ha he da a complies wi h he en e -
p ise’s da a go e nance policies and p i acy egula ions. Main aining us and legal
compliance in en e p ise ope a ions is c ucial and can be achie ed h ough adhe -
ence o da a go e nance policies and p i acy egula ions.
296 CHAPTER 10 Applica ion a chi ec u e o gene a i e AI apps
10.3.2 O ches a ion amewo ks
Many people a e amilia wi h o ches a o s and o ches a ion amewo ks. While
amewo ks such as Kube ne es, Apache Ai low, and ML low a e e ec i e gene al
o ches a ion ools o so wa e enginee ing and can suppo ML ope a ions, hey a e
no designed exclusi ely o gene a i e AI applica ions. O ches a ing wo k lows o
gene a i e AI equi es a mo e in ima e unde s anding o he nuances o hese com-
plex echnologies.
The choice o an o ches a ion amewo k o gene a i e AI applica ions depends
on he exis ing echnology s ack, he complexi y o he wo k lows, and speci ic
equi emen s. Table 10.2 ou lines o ches a ion amewo ks ailo ed o he speci ic
needs o gene a i e AI applica ions. These amewo ks can handle adi ional compu-
a ional wo k lows; manage in e ac ions’ s a e, con ex , and cohe ence; and a e
designed o sui he unique equi emen s o gene a i e AI.
Table 10.2 O ches a ion amewo ks
Name No es
Seman ic
Ke nel
Seman ic Ke nel is an OSS amewo k om Mic oso ha aims o c ea e a uni ied ame-
wo k o seman ic sea ch and gene a i e AI. I uses p e ained LLMs and g aph-based
knowledge ep esen a ions o enable ich and di e se na u al language in e ac ion.
LangChain LangChain is a lib a y ha chains language models wi h ex e nal knowledge and capabili-
ies. I acili a es he o ches a ion o LLMs such as GPT-4 wi h da abases, APIs, and o he
sys ems o c ea e mo e comp ehensi e AI applica ions.
P omp -
Laye
P omp Laye is a pla o m ha simpli ies he c ea ion, managemen , and deploymen o
p omp s o LLMs. Use s can isually edi and es p omp s, compa e models, log equ-
es s, and moni o pe o mance. Mo e de ails can be ound a h ps://p omp laye .com/.
Rasa Rasa is an en e p ise con e sa ional AI pla o m ha le s you c ea e cha - and oice-based
AI assis an s o manage a ious con e sa ions o di e en pu poses. In addi ion o con-
e sa ion AI, i also o e s a gene a i e AI-na i e me hod o building assis an s, wi h en e -
p ise ea u es such as analy ics, secu i y, obse abili y, es ing, knowledge in eg a ion,
oice connec o s, and so o h. Mo e in o ma ion is a ailable a h ps:// asa.com/.
YouCha
API
The YOU API is a sui e o ools ha helps en e p ises g ound he ou pu o LLMs in he
mos ecen , accu a e, and ele an in o ma ion a ailable. You can use he YOU API o
access web sea ch esul s, news a icles, and RAG o LLMs. Mo e de ails can be ound
a h ps://api.you.com/.
Ragna Ragna is an open sou ce RAG-based AI o ches a ion amewo k ha allows you o expe i-
men wi h di e en aspec s o a RAG model—LLMs, ec o da abases, okeniza ion
s a egies, and embedding models. I also allows you o c ea e cus om RAG-based web
apps and ex ensions om di e en da a sou ces. Mo e de ails can be ound a
h ps:// agna.cha /.
Llama-
Index
LlamaIndex is a cloud-based o ches a ion amewo k ha enables you o connec you
da a o LLMs and gene a e na u al language esponses. I can access a ious LLMs.
Hugging
Face
Hugging Face p o ides a collec ion o p e ained models o a ious NLP asks. I can be
used wi h o ches a ion ools o manage he li ecycle o gene a i e AI applica ions. Mo e
de ails can be ound a h ps://hugging ace.co/.
29710.3 O ches a ion laye
10.3.3 Managing ope a ions
An o ches a o plays a c ucial ole in enhancing he pe o mance and seamless in e-
g a ion o gene a i e AI models, such as LLMs, wi hin in ica e sys ems and wo k lows.
I s co e unc ionali y op imizes ope a ional e iciency and os e s a be e use expe i-
ence h ough sophis ica ed con ol mechanisms.
The o ches a o is c ucial in managing he LLM’s in eg a ion in o complex wo k-
lows, such as con en c ea ion pipelines. I plans and schedules he LLM’s ac i a ion
o ensu e smoo h da a collec ion, p ep ocessing, and ex gene a ion, hus simpli ying
he en i e p ocess om s a o inish. This coo dina ion imp o es he wo k low and
ensu es ha he API calls o he gene a ed con en a e imely and ele an .
The o ches a o ’s main ole is o balance he load and esou ces o he LLM’s
se ices. I e ec i ely manages eques s o a oid o e loading o was ing esou ces. Fu -
he mo e, i can change compu a ional esou ces by cons an ly acking wo kload and
pe o mance me ics. This lexibili y ensu es he sys em s ays esponsi e and
esou ces a e used e icien ly, e en when demanding changes.
The o ches a o also supe ises API in e ac ions, en o cing a e limi s and con-
olling secu e access, while managing any e o s o dis up ions ha may occu .
Simul aneously, i handles he essen ial asks o da a p ep ocessing and pos p ocess-
ing. This means cleaning, o ma ing, and ans o ming da a o ensu e i is in he
igh s a e o p ocessing by he LLM and hen imp o ing he ou pu o mee se qual-
i y s anda ds and o ma equi emen s.
Fo wo k lows equi ing sequen ial p ocessing, he o ches a o ensu es ha ou -
pu s om one phase a e accu a ely ed in o he nex , main aining he p ocess in eg-
i y. This is complemen ed by i s ole in en o cing secu i y and compliance measu es,
whe e i il e s sensi i e in o ma ion and ensu es adhe ence o legal and e hical s an-
da ds, in addi ion o conduc ing audi s o accoun abili y and quali y assu ance.
Fo applica ions such as cha bo s o digi al assis an s, he o ches a o manages
use in e ac ions by handling session s a es and que ies, di ec ing hem o he LLM o
o he se ices as needed, which esul s in a mo e engaging and esponsi e use expe-
ience. Mo eo e , he o ches a o con inuously moni o s he LLM pe o mance, ana-
lyzing esponse ime, accu acy, and h oughpu o guide op imiza ion e o s. I also
manages upda es o he LLM, ensu ing ha ansi ions o newe e sions o con igu a-
ions a e smoo h and minimally dis up i e o use s.
As we can see, an o ches a o can signi ican ly enhance he e iciency, eliabili y,
and scalabili y o an LLM when in eg a ed in o complex sys ems, p o iding a laye o
managemen ha coo dina es be ween he LLM and o he sys em componen s.
Building you own o ches a o amewo k
C ea ing you own gene a i e AI o ches a o o an en e p ise can be di icul . How-
e e , i allows you o cus omize he amewo k acco ding o you equi emen s and
inc eases you unde s anding o he echnology. This p ocess demands ex ensi e
298 CHAPTER 10 Applica ion a chi ec u e o gene a i e AI apps
Some new amewo ks used widely nowadays a e Seman ic Ke nel, LangChain, and
LlamaIndex. These amewo ks enable he use o GenAI models, al hough hey
add ess di e en aspec s. We will explo e hese in mo e dep h.
SEMANTIC KERNEL
Seman ic Ke nel (SK) om Mic oso is an SDK ha in eg a es LLMs wi h languages
such as C#, Py hon, and Ja a. I simpli ies he some imes-complex p ocess o in e ac-
ing LLMs wi h adi ional C#, Py hon, o Ja a code. Wi h SK, de elope s can de ine
seman ic unc ions ha encapsula e speci ic ac ions hei applica ion is capable o ,
such as da abase in e ac ions, API calls, o email ope a ions. SK allows hese unc ions
o be o ches a ed seamlessly ac oss mixed p og amming language en i onmen s.
The eal powe o SK lies in i s AI-d i en o ches a ion capabili ies. Ins ead o
me iculously cho eog aphing he LLM in e ac ions by hand, SK le s de elope s use
na u al language o s a e a desi ed ou come o ask. The AI au oma ically de e mines
how o combine he ele an seman ic unc ions o achie e his goal, which signi i-
can ly accele a es de elopmen and lowe s he skill ba ie o using LLMs.
SK can bene i en e p ises when building LLM applica ions by simpli ying he appli-
ca ion p ocess, educing he cos and complexi y o p omp enginee ing, enabling
in-con ex lea ning and ein o cemen lea ning, and suppo ing mul imodali y and
(con inued)
echnical knowledge and esou ces. Un o una ely, no uni e sal boile pla e code is
a ailable o de elop an LLM o ches a o . Be o e p oceeding wi h his p ojec , con-
side he ollowing ac o s:
Cus omiza ion—Tailo ing he amewo k o mee you speci ic applica ion and
pe o mance equi emen s
In eg a ion wi h exis ing sys ems—Seamlessly in eg a ing he o ches a o wi h
you exis ing in as uc u e and wo k lows
Con ol and isibili y—Main aining comple e con ol o e he LLM echnology
and accessing de ailed insigh s in o i s ope a ion
Flexibili y and scalabili y—Designing he amewo k o be lexible enough o
accommoda e u u e changes and scaling o mee g owing demands
I you wan o c ea e some hing en i ely new, you need o unde s and gene a i e AI,
he di e en ypes o LLMs, how o ain and ine- une hem, and how o use hem o
a ious asks and domains. Addi ionally, you should know how o ga he , p ocess,
and s o e da a and knowledge ha can help imp o e he quali y and di e si y o he
gene a ed ou pu s.
To apply hese concep s in eal-wo ld scena ios, you mus be able o design and
implemen di e en gene a i e s a egies, such as p omp enginee ing and RAG.
These s a egies can help con ol he beha io and ou pu o he LLMs. You mus also
ensu e ha he gene a i e models and wo k lows a e scalable, secu e, and eliable.
This can be achie ed using cloud se ices, APIs, and UIs. Expe ise in dis ibu ed sys-
ems, ML, and so wa e enginee ing is also equi ed.
29910.3 O ches a ion laye
mul ilanguage scena ios. SK p o ides a consis en and uni ied in e ace o di e en
LLM p o ide s, such as OpenAI, Azu e OpenAI, and Hugging Face.
Combining simpli ied LLM in eg a ion wi h AI-powe ed o ches a ion c ea es a
powe ul pla o m o en e p ises o use o e olu ionize hei applica ions. Fu he -
mo e, SK makes i easible o build highly ailo ed, in elligen cus ome suppo sys-
ems, implemen mo e powe ul and seman ically nuanced sea ch unc ionali y,
au oma e ou ine wo k lows, and po en ially e en aid de elope s wi h code gene a-
ion and e ac o ing asks. Addi ional de ails on SK can be ound on hei si e a
h ps://aka.ms/seman ic-ke nel.
We can illus a e his using an example. Con inuing wi h he pe heme om he
p e ious chap e s, we ha e some books abou dogs, which ange om gene al opics
o mo e speci ic medical ad ice. These books a e scanned and a ailable as PDFs and
con ain con iden ial business da a we wan o use o a ques ion–answe use case.
These PDFs a e complex documen s ha con ain ex , images, ables, and so o h.
Gi en ha we canno use eal-wo ld in e nal in o ma ion, hese PDFs ep esen p o-
p ie a y in e nal in o ma ion o an en e p ise ha equi es RAG o handle. Suppose
we wan o do ques ion–answe use cases wi h he PDFs we ha e; le ’s see how ha ’s
possible.
The i s s ep is o use SK o ins all he SDK (o he package), which is no sup-
po ed ia conda and will equi e pip ins ead. Also no e he e a e b eaking changes
wi h some o he SDKs, and we will wan o pin he SK SDK o e sion 1.2.0. You can
ins all his speci ic e sion using pip ins all seman ic-ke nel==1.2.0. A e ins all-
ing he SDK, o ge s a ed wi h SK a a high le el, we need o ollow hese s eps:
C ea e an SK ins ance, and egis e he AI se ices you wan o use, such as
OpenAI, Azu e OpenAI, o Hugging Face.
C ea e seman ic unc ions ha a e p omp s wi h inpu pa ame e s. These unc-
ions can call you exis ing code o o he seman ic unc ions.
Call he seman ic unc ions wi h he app op ia e a gumen s, and awai he
esul s. The esul s will be he ou pu o he AI model a e execu ing he p omp .
Op ionally, we can c ea e a planne o o ches a e mul iple seman ic unc ions
based on he use inpu .
SK EXAMPLE
He e is an example o implemen ing his using he SK. As we saw ea lie , SK is he
co e componen ha enables he p ocessing and unde s anding o na u al language
ex . I ’s a amewo k ha p o ides a uni ied in e ace o a ious AI se ices and
memo y s o es.
Ou example is a simple ques ion-answe ing sys em ha uses he OpenAI API o
gene a e embeddings o a collec ion o PDF documen s. Then, we use hose embed-
dings o ind documen s ele an o a use ’s que y. In ou example, i is used o
C ea ing embeddings—SK p o ides a simple in e ace o calling he OpenAI se -
ice o gene a e embeddings o he ex ex ac ed om PDF documen s. As we

300 CHAPTER 10 Applica ion a chi ec u e o gene a i e AI apps
know, hese embeddings a e nume ical ep esen a ions o he ex ha cap u e
i s seman ic meaning.
S o ing and e ie ing in o ma ion—We use a ec o da abase (Ch oma in ou
example) o s o e he ex and co esponding embeddings. SK calls hese pe -
sis en da a s o es “memo y” and, depending on he p o ide , has me hods o
que ying he s o ed in o ma ion based on seman ic simila i y. As we know, his
is used o ind documen s ele an o a use ’s que y.
Tex comple ion—We also use SK o egis e an OpenAI ex comple ion se ice,
which is used o gene a e comple ions o a gi en piece o ex .
NOTE We need o speci ically use Ch oma e sion 0.4.15, as a he momen ,
he e is an incompa ibili y wi h e sion 0.4.16 and highe wi h SK ha hasn’
been ixed. To do his, we can use one o he ollowing commands depending
on whe he we a e using conda o pip: conda ins all ch omadb=0.4.15 o
pip ins all ch omadb==0.4.15.
Lis ing 10.1 shows his simple applica ion p ocessing a collec ion o PDF documen s,
ex ac ing hei ex , and hen using he OpenAI API o gene a e embeddings o
each documen . These embeddings a e hen s o ed in a ec o da abase, which can be
que ied o ind documen s ha a e seman ically simila o a gi en inpu . The
load_pd s unc ion eads PDF iles om a speci ied di ec o y. I uses he PyPDF2
lib a y o open each PDF, ex ac he ex om each page, and e u n a collec ion o
hose pages.
impo asyncio
om PyPDF2 impo Pd Reade
impo seman ic_ke nel as sk
om seman ic_ke nel.connec o s.ai.open_ai impo
➥(Azu eCha Comple ion,Azu eTex Embedding)
om seman ic_ke nel.memo y.seman ic_ ex _memo y
➥impo Seman icTex Memo y
om seman ic_ke nel.co e_plugins. ex _memo y_plugin
➥impo Tex Memo yPlugin
om seman ic_ke nel.connec o s.memo y.ch oma impo
➥Ch omaMemo yS o e
# Load en i onmen a iables
AOAI_KEY = os.ge en ("AOAI_KEY")
AOAI_ENDPOINT = os.ge en ("AOAI_ENDPOINT")
AOAI_MODEL = "gp -35- u bo"
AOAI_EMBEDDINGS = " ex -embedding-ada-002"
API_VERSION = '2023-09-15-p e iew'
PERSIST_DIR = os.ge en ("PERSIST_DIR")
VECTOR_DB = os.ge en ("VECTOR_DB")
Lis ing 10.1 Q&A o e my PDFs: Ex ac ing ex om PDFs
30110.3 O ches a ion laye
DOG_BOOKS = "./da a/dog_books"
DEBUG = False
VECTOR_DB = "dog_books"
PERSIST_DIR = "./s o age"
ALWAYS_CREATE_VECTOR_DB = False
# Load PDFs and ex ac ex
de load_pd s():
docs = []
o al_docs = 0
o al_pages = 0
ilenames = [ ilename o ilename in
➥os.lis di (DOG_BOOKS) i ilename.endswi h(".pd ")]
wi h qdm( o al=len( ilenames), desc="P ocessing PDFs")
➥as pba _ou e :
o ilename in ilenames:
pd _pa h = os.pa h.join(DOG_BOOKS, ilename)
wi h open(pd _pa h, " b") as ile:
pd = Pd Reade ( ile, s ic =False)
j = 0
o al_docs += 1
wi h qdm( o al=len(pd .pages),
➥desc="Loading Pages") as pba _inne :
o page in pd .pages:
o al_pages += 1
j += 1
docs.append(page.ex ac _ ex ())
pba _inne .upda e()
pba _ou e .upda e()
p in ( "P ocessed { o al_docs} PDFs wi h { o al_pages} pages.")
e u n docs
A e we ha e ex ac ed he ex om he pages, we use he popula e_db() unc ion
o gene a e embeddings and s o e hem in Ch oma, a ec o da abase. This unc ion
akes an SK objec and goes h ough all he pages o he PDF. Each page sa es he doc-
umen ’s ex using he SK’s memo y s o e. When he sa e_in o ma ion() unc ion is
called, i au oma ically c ea es embedding o s o e in he ec o da abase, as shown in
he nex lis ing. I he e is al eady a Ch oma ec o da abase, we use ha ins ead o
making a new one.
# Popula e he DB wi h he PDFs
async de popula e_db(memo y: Seman icTex Memo y, docs) -> None:
o i, doc in enume a e( qdm_asyncio. qdm(docs, desc="Popula ing DB")):
i doc: #Check i doc is no emp y
y:
awai memo y.sa e_in o ma ion(VECTOR_DB,id=s (i), ex =doc)
excep Excep ion as e:
p in ( "Failed o sa e in o ma ion o doc {i}: {e}")
con inue # Skip o he nex i e a ion
Lis ing 10.2 Q&A o e my PDFs: Using SK and popula ing ec o da abase
302 CHAPTER 10 Applica ion a chi ec u e o gene a i e AI apps
# Load he ec o DB
async de load_ ec o _db(memo y: Seman icTex Memo y,
➥ ec o _db_name: s ) -> None:
i no ALWAYS_CREATE_VECTOR_DB:
collec ions = awai memo y.ge _collec ions()
i ec o _db_name in collec ions:
p in ( " Vec o DB { ec o _db_name} exis s in he
➥collec ions. We will euse his.")
e u n
p in ( " Vec o DB { ec o _db_name} does no exis in he collec ions.")
p in ("Reading he pd s...")
pd _docs = load_pd s()
p in ("To al PDFs loaded: ", len(pd _docs))
p in ("C ea ing embeddings and ec o db o he PDFs...")
# This may ake some ime as we call embedding API o each ow
awai popula e_db(memo y, pd _docs)
The p og am’s en y poin is he main() unc ion, as shown in lis ing 10.3. I se s up
he SK wi h he OpenAI ex comple ion and embedding se ices, egis e s a memo y
s o e, and loads he ec o da abase. Then, i en e s a loop whe e i p omp s he use
o a ques ion, que ies he memo y s o e o ele an documen s, and p in s he ex
o he mos ele an documen .
async de main():
# Se up Seman ic Ke nel
ke nel = sk.Ke nel()
ke nel.add_se ice(Azu eCha Comple ion(
se ice_id="cha _comple ion",
deploymen _name=AOAI_MODEL,
endpoin =AOAI_ENDPOINT,
api_key=AOAI_KEY,
api_ e sion=API_VERSION))
ke nel.add_se ice(Azu eTex Embedding(
se ice_id=" ex _embedding",
deploymen _name=AOAI_EMBEDDINGS,
endpoin =AOAI_ENDPOINT,
api_key=AOAI_KEY))
# Speci y he ype o memo y o a ach o SK.
# He e we will use Ch oma as i is easy o un i locally
# You can speci y loca ion o Ch oma DB iles.
s o e = Ch omaMemo yS o e(pe sis _di ec o y=PERSIST_DIR)
memo y = Seman icTex Memo y(s o age=s o e,
➥embeddings_gene a o = ke nel.ge _se ice(" ex _embedding"))
ke nel.add_plugin(Tex Memo yPlugin(memo y), "Tex Memo yPluginACDB")
awai load_ ec o _db(memo y, VECTOR_DB)
Lis ing 10.3 Q&A o e my PDFs: SK using Ch oma
30310.3 O ches a ion laye
while T ue:
p omp = check_p omp (inpu ('Ask a ques ion agains
➥ he PDF ( ype "qui " o exi ):'))
# Que y he memo y o mos ele an ma ch using
# sea ch_async speci ying ele ance sco e and
# "limi " o numbe o closes documen s
esul = awai memo y.sea ch(collec ion=VECTOR_DB,
➥limi =3, min_ ele ance_sco e=0.7, que y=p omp )
i esul :
p in ( esul [0]. ex )
else:
p in ("No ma ches ound.")
p in ("-" * 80)
i __name__ == "__main__":
asyncio. un(main())
In ou example, we use Ch oma as he ec o da abase. This is one o he many
op ions a ailable when using SK. We can ge mo e de ails on he lis o suppo ed ec-
o da abases a h ps://mng.bz/YVgQ. I is also impo an o no e ha suppo
be ween C# and Py hon is no a pa i y; some ec o da abases a e suppo ed ac oss
bo h, bu some a e only suppo ed in one language.
The SK is he cen al componen o p ocessing and unde s anding ex . I p o-
ides a uni ied in e ace o a ious AI se ices and memo y s o es, simpli ying he
p ocess o building complex NLP applica ions. Now le ’s swi ch gea s and see he
same example using LangChain.
LANGCHAIN
LangChain o e s a sophis ica ed amewo k designed o s eamline he in eg a ion o
LLMs in o en e p ise applica ions. This amewo k abs ac s he complexi ies o in e -
acing wi h LLMs, allowing de elope s o inco po a e ad anced NLP capabili ies wi h-
ou deep expe ise in he ield. I s lib a y o modula componen s enables he
cons uc ion o cus omized NLP solu ions easily, acili a ing a mo e e icien de elop-
men p ocess.
LangChain’s main bene i is i s abili y o wo k wi h di e en LLMs and o he na u-
al language AI se ices. This ea u e allows en e p ises o selec he bes ools o
hei pa icula needs, a oiding he d awbacks o being ied o one endo . The ame-
wo k boos s e iciency by p o iding easie in e aces and eady-made componen s o
quick deploymen and suppo s scalabili y, hus enabling p ojec s o expand smoo hly
om es ing s ages o ull- ledged applica ions.
Addi ionally, LangChain helps o lowe cos s by minimizing he amoun o special-
ized de elopmen and simpli ying in e ac ions wi h LLMs. En e p ises also gain om
he s ong communi y and suppo o he ecosys em a ound LangChain, which gi es
access o documen a ion, bes p ac ices, and coope a i e p oblem-sol ing esou ces.
This comp ehensi e app oach makes LangChain an a ac i e op ion o businesses
310 CHAPTER 10 Applica ion a chi ec u e o gene a i e AI apps
Fo da a o be use ul in in o ming LLM ou comes, i mus i s unde go a igo ous
cleansing and s anda diza ion p ocess o ensu e i s quali y. The a chi ec u al blue-
p in should include hese p ep ocessing ac i i ies, such as deduplica ion, no maliza-
ion, and e o ec i ica ion. In eg a ed da a quali y ools should au oma e hese asks,
p o iding LLMs wi h supe io da ase s.
Da a handling equi es s ic access con ols o p ope secu i y and compliance,
which is i al when wo king wi h sensi i e in o ma ion and ollowing egula ions. Da a
in e ac ion needs s ong au hen ica ion and au ho iza ion p o ocols. Da a go e -
nance amewo ks should speci y access igh s; u he mo e, enc yp ion should p o-
ec da a a es and in mo ion. F equen compliance assessmen s a e c ucial o
ensu ing da a quali y and p i acy. Following GDPR, HIPAA, o CCPA egula ions is
also impo an o e hical and law ul p ocessing o pe sonal da a.
A plugin enabling he in eg a ion in o sou ce sys ems is no a one- ime s a ic com-
ponen o he a chi ec u e—i changes and adap s cons an ly. As businesses use o
imp o e hei new SoRs, he a chi ec u e mus be buil o allow simple in eg a ion o
mo emen o da a sou ces. Fo his, a lexible app oach o in eg a ion is equi ed,
whe e new da a sou ces can be connec ed wi h li le change o he cu en sys em.
The a chi ec u e should be designed o suppo di e en da a o ma s and p o o-
cols. This ensu es ha da a lows seamlessly om a ious sys ems o he LLM. To
achie e his, cus om APIs may need o be de eloped, middlewa e may ha e o be used
o da a ans o ma ion, and ETL p ocesses capable o handling la ge olumes o da a
may ha e o be implemen ed.
The da a pipeline in as uc u e o gene a i e AI is complex and equi es ca e ul
planning o handle he in icacies o en e p ise-g ade da a landscapes. These will build
on exis ing ETL and da a wa ehousing in es men s bu mus ac o in he new da a
ypes o embeddings. By s a egically using a combina ion o ools o da a inges ion,
p ocessing, s o age, o ches a ion, and ML, en e p ises can build powe ul pipelines
ha p o ide hei gene a i e AI applica ions wi h a consis en low o quali y da a.
10.4.2 Embeddings and ec o managemen
In ea lie chap e s o he book, we discussed he c ucial ole o model embeddings
and ep esen a ions. This is he s age whe e he complexi y o language is dis illed
in o machine-in e p e able o ma s, speci ically ma hema ical ec o s. Tex is ans-
o med by embedding echniques and ad anced ea u e ex ac ion o ms ha esul
in a ec o space ep esen a ion o ex . These ec o s a e no a bi a y; hey encapsu-
la e he seman ic essence o wo ds, ph ases, o en i e documen s, mapping in o ma-
ion in o a comp essed, in o ma ion- ich, lowe -dimensional space.
OpenAI Codex is a p ime example o his p ocess. I can comp ehend and gene -
a e human- eadable code, making i a powe ul ool o embedding p og amming
and na u al languages. This is a signi ican ad an age o code gene a ion and au o-
ma ion asks. In con as , Hugging Face p o ides an ex ensi e sui e o p e ained
models ha a e inely uned o di e se languages and asks. They can adep ly handle
embeddings anging om b ie sen ences o in ica e documen s.

31110.4 G ounding laye
These models dis inguish hemsel es by hei abili y o g asp con ex ual wo d ela-
ionships beyond basic dic iona y meanings. By conside ing he wo ds in hei icini y,
he gene a ed embeddings p o ide a nuanced e lec ion o he wo d usage and con-
no a ions wi hin speci ic con ex s. This ea u e is essen ial o gene a i e AI applica-
ions ha aim o emula e human-like ex p oduc ion. I os e s ou comes ha a e no
only cohe en and con ex -awa e bu also seman ically p o ound.
As we saw in ea lie chap e s on RAG, a ious lib a ies a e a ailable o chunking
da a, and some o e au o-chunking capabili ies. One such lib a y, called Uns uc-
u ed (h ps://gi hub.com/Uns uc u ed-IO/uns uc u ed), p o ides open sou ce
lib a ies and APIs ha can c ea e cus omized p ep ocessing pipelines o labeling,
aining, o p oduc ion ML pipelines. The lib a y includes modula unc ions and
connec o s ha o m a cohesi e sys em, which makes i easy o inges , p ep ocess, and
adap da a o di e en pla o ms. I is also e icien a ans o ming uns uc u ed da a
in o s uc u ed ou pu s.
An al e na i e solu ion is using LangChain and SK, which we saw ea lie . These
lib a ies suppo common chunking echniques o ixed size, a iable size, o a com-
bina ion o bo h. In addi ion, you can speci y an o e lap pe cen age o duplica e a
small amoun o con en in each chunk, which helps p ese e con ex .
A e ans o ming ec o s, i is c ucial o manage hem p ope ly. Vec o da abases
specially designed o s o e indexes and e ie e high-dimensional ec o da a a e a ail-
able. Some such da abases include Redis, Azu e Cosmos DB, Pinecone, and Wea ia e,
o name a ew. These da abases help wi h quick sea ches wi hin la ge embedding
spaces, making i easy o iden i y simila ec o s ins an ly. Fo ins ance, a gene a i e AI
sys em can use a ec o da abase o ma ch a use ’s que y wi h he mos seman ically
ela ed ques ions and answe s and achie e his in a ac ion o a second.
Vec o da abases ea u e sophis ica ed indexing algo i hms enginee ed o de ly
a e se high-dimensional e ains wi hou alling p ey o he “cu se o dimensional-
i y” [5]. This a ibu e ende s hem excep ionally aluable o applica ions such as
ecommenda ion engines, seman ic sea ch pla o ms, and pe sonalized con en cu a-
ion, whe e pinpoin ing ele an con en quickly is c i ical.
Vec o da abases o e mo e han jus speed; hey also p o ide accu acy and ele-
ance. Combining hese da abases allows AI models o espond quickly and p ecisely
o use inqui ies based on hei lea ned con ex . P ope index managemen is c ucial,
including asks such as index c ea ion, upda e igge s, e esh a es, complex da a
ypes, and ope a ional ac o s (e.g., index size, schema design, and unde lying com-
pu e se ices). Cloud-based solu ions such as Azu e AI Sea ch and Pinecone can e i-
cien ly manage hese demands in a p oduc ion en i onmen .
The p ocess o ans o ming ex ual da a in o a o ma ha AI can handle has wo
s ages: embedding and ec o da abase managemen . This con e sion is essen ial o
gene a i e AI’s in elligence, enabling i o unde s and and engage wi h he wo ld
meaning ully and in a scalable manne . The e o e, ca e ully choosing embedding
echniques and ec o da abases is a echnical necessi y and a key ac o in he success
o gene a i e AI applica ions. When choosing LLMs, ela ed ec o s o age and
312 CHAPTER 10 Applica ion a chi ec u e o gene a i e AI apps
e ie al engines, and embedding models, en e p ises mus conside he da a size, o i-
gin, change a e, and scalabili y needs.
10.5 Model laye
The model laye is he ounda ion o AI cogni i e capabili ies. I in ol es a se o mod-
els, including ounda ional LLMs ha p o ide gene al in elligence, ine- uned LLMs
specialized o speci ic asks o domains, model ca alogs hos ing and managing access
o a ious models, and SLMs ha o e ligh weigh , agile al e na i es o ce ain
applica ions.
The signi icance o his laye lies in i s design, as i o ms he co e p ocessing uni s
o he GenAI app s ack. I allows a scalable and lexible app oach o AI deploymen and
can e icien ly add ess a ious asks by di e en ia ing be ween ounda ional, ine- uned,
and small models. This ensu es ha he a chi ec u e can ca e o di e se use cases, op i-
mize esou ce alloca ion, and main ain high pe o mance ac oss di e en scena ios.
10.5.1 Model ensemble a chi ec u e
Gene a i e AI employs a model ensemble, which combines mul iple ML models o
enhance pe o mance and eliabili y. This app oach akes ad an age o he indi idual
s eng hs o each model, minimizing hei weaknesses. Fo example, one model may
be g ea a gene a ing echnical con en , while ano he may be be e a c ea i e s o y-
elling. By assembling hese models, an applica ion can be e ca e o a wide ange
o use eques s wi h g ea e accu acy. To c ea e an e ec i e model ensemble o gen-
e a i e AI, he a chi ec u e should include
Model selec ion—C i e ia o choosing which models o include in he ensemble,
o en based on hei pe o mance, he di e si y o aining da a, o hei a ea o
specializa ion.
Small language models
SLMs such as Phi-3 and O ca 2 a e designed o o e ad anced language p ocessing
capabili ies wi h ewe pa ame e s han la ge models. Bo h models a e pa o a
b oade ini ia i e o make powe ul language p ocessing ools mo e accessible and
e icien , enabling mo e ex ensi e esea ch and applica ion possibili ies. They ep e-
sen a signi ican s ep in he e olu ion o AI language models, balancing capabili y
wi h compu a ional e iciency.
Phi-3, Phi-2, and O ca 2 a e smalle -scale language models de eloped by Mic oso ,
o e ing ad anced language p ocessing wi h ewe pa ame e s. Phi-3, which is a suc-
cesso o Phi-2, is a amily o models in a ious sizes (mini, 3.8B; small, 7B; medium,
14B pa ame e s). Phi-2, wi h 2.7 billion pa ame e s, is e icien and ma ches la ge
models in pe o mance, while O ca 2, a ailable in 7- and 13-billion-pa ame e e -
sions, excels in easoning asks and can ou pe o m much la ge models. Bo h a e
designed o accessibili y and compu a ional e iciency, enabling b oade esea ch
and applica ion in AI language p ocessing.
31310.5 Model laye
Rou ing logic—Rou ing logic is he mechanism o de e mining which model o
use o a gi en inpu o how o combine ou pu s om mul iple models.
API in eg a ion—APIs a e he main condui s h ough which applica ions in e -
ac wi h LLMs. API in eg a ion becomes complex when dealing wi h an ensem-
ble o models as in e ac ions wi h mul iple endpoin s mus be managed. The
a chi ec u e should conside API in eg a ion o h o ling and a e limi s, e o
handling, and caching esponses.
Scalabili y and edundancy—Scalable design accommoda es g owing use bases
and spikes in demand. Load balancing and he use o API ga eways can help dis-
ibu e a ic e ec i ely. Redundancy is equally c i ical; hus, ha ing mul iple
egions o model deploymen s ensu es he applica ion emains unc ional.
Queuing and s eam p ocessing—Queuing and s eam p ocessing handle asyn-
ch onous asks and manage wo kloads; message queues and s eam p ocessing
se ices can be u ilized, which ensu es ha he sys em is no o e whelmed
du ing peak imes and ha asks a e p ocessed in an o de ly way.
Figu e 10.5 is an example o implemen ing Phi-2 as a classi ie . We use Phi-2, which
uns locally and as , o iden i y he use ’s in en when asking a ques ion. Con inuing
wi h he opic o pe s and dogs, we asked Phi-2 he in en o he ques ion and whe he
i had any hing o do wi h dogs. I i was i ele an o he cu en opic (i.e., dogs), we
asked GPT-4 o answe .
Figu e 10.5 Classi ie using mul iple models
Lis ing 10.6 shows an example o implemen ing a simple classi ie using a ligh weigh
model and hen, based on he ques ion’s in en , igu ing ou which model o call.
He e, we use Phi-2, a esea ch SML om Mic oso , as a classi ie o de e mine
whe he a ques ion is ela ed o dogs. The Phi-2 model is a ans o me -based model,
ained o unde s and and gene a e human-like ex . I is used he e as a i s -pass il e
o de e mine he ques ion’s in en .
Use
ques ion
Ques ion abou
dogs?
Classi ica ion Fo mula e
p omp
Phi 2 — In en classi ie
(SLM)
Response
o use
Azu e OpenAI
GPT-4 model
Yes
Local in e ence
314 CHAPTER 10 Applica ion a chi ec u e o gene a i e AI apps
The unc ion check_dog_ques ion() akes a ques ion as inpu and cons uc s a
p omp o ask he Phi-2 model whe he he e’s any hing abou dogs in he ques ion. I
Phi-2 de e mines ha he ques ion is abou dogs, he unc ion e u ns T ue. This
could igge a mo e expensi e GPT-4 model o gene a e a mo e de ailed esponse. I
he ques ion is no abou dogs, he unc ion e u ns False, and he mo e expensi e
model would no ha e o be used. We need o ensu e ha he ollowing packages a e
ins alled be o e unning his code: pip ins all ans o me s==4.42.4 o ch=
=2.3.1.
impo o ch
om ans o me s impo Au oModelFo CausalLM, Au oTokenize
impo openai
...
model = Au oModelFo CausalLM. om_p e ained("mic oso /phi-2",
o ch_d ype="au o",
us _ emo e_code=T ue)
okenize = Au oTokenize . om_p e ained("mic oso /phi-2",
us _ emo e_code=T ue)
de check_dog_ques ion(ques ion):
p omp = "Ins uc : Is he e any hing abou dogs in he
➥ques ion below? I yes, answe wi h 'yes' else
➥'no'. nQues ion:{ques ion} nOu pu : "
inpu s = okenize (p omp , e u n_ enso s="p ",
e u n_a en ion_mask=False,
add_special_ okens=False)
ou pu s = model.gene a e(**inpu s,
max_leng h=500,
pad_ oken_id= okenize .eos_ oken_id)
ex = okenize .ba ch_decode(ou pu s)[0]
egex = "^Ou pu : Yes$"
ma ch = e.sea ch( egex, ex , e.MULTILINE)
i ma ch:
e u n T ue
e u n False
de handle_dog_ques ion(ques ion):
p in ( "This is a esponse om RAG and GPT4")
# Call OpenAI's GPT-4 o answe he ques ion
openai.api_key = "YOUR_API_KEY"
esponse = openai.Comple ion.c ea e(
…
)
e u n esponse
Lis ing 10.6 Using Phi-2 as an in en classi ie
31510.5 Model laye
i __name__=="__main__":
# Loop un il he use en e s "qui "
while T ue:
# Take use inpu
use _p omp = inpu (
"Wha is you ques ion (o ype 'qui ' o exi ):")
i check_dog_ques ion(use _p omp ):
p in (handle_dog_ques ion(use _p omp ))
else:
p in ("You did no ask abou dogs")
The app oach employs a small model, such as Phi-2, wi h much less capabili y o
mo e e icien use o esou ces, as he mo e expensi e GPT-4 model is used only when
necessa y. This app oach can jus as easily be expanded o use mo e han one model.
This oy example could be be e i we used a mo e powe ul LLM, such as a
smalle GPT-3 model. Figu e 10.6 shows ano he example o using a ine- uned GPT-3
as a classi ie o help unde s and he use ’s goal. This is o an en e p ise cha bo ha
can answe ques ions on bo h s uc u ed and uns uc u ed da a. I can answe ques-
ions abou Mic oso ’s su ace de ices based on he use ’s pe sona. The e is ic i ious
sales in o ma ion in a SQL da abase ha a salespe son can cha wi h, and he e is
uns uc u ed da a ha can answe echnical suppo ques ions.
Figu e 10.6 En e p ise Q&A bo —High-le el o e iew
The bo uses a RAG pa e n and can answe ques ions using in o ma ion om bo h
s uc u ed and uns uc u ed sys ems based on he use ’s in en ion. The s uc u ed
da a has sales in o ma ion (wi h ake da a), and he uns uc u ed da a is a c awl o di -
e en o ums and o icial si es ela ed o Su ace de ices. Lis ing 10.7 p esen s a high-
le el iew o he a chi ec u e.
Use
ques ion
Ques ion [In en
+ Domain]
Classi ica ion Fo mula e
p omp
Azu e OpenAI se ice —
Fine- uned GPT in en
classi ie
Response
o use
Azu e OpenAI
Cha GPT model
S uc u ed and uns uc u ed
knowledge bases

316 CHAPTER 10 Applica ion a chi ec u e o gene a i e AI apps
The o ches a o uses GPT-3 o implemen he in en classi ie and can help selec
he bes pa h based on he bo ’s ques ion. Then, sui able knowledge sou ces a e
applied. This complica ed wo k low shows much o wha an o ches a o would do in
a eal-wo ld en e p ise si ua ion. The sales da a is s o ed in a SQL da abase, and GPT
is also used o build he SQL que y agains he schema o un, depending on he
use ’s que y. Wha is e y in e es ing is ha he LLM is in oked mul iple imes in he
low, i s o unde s and he in en o he ques ion, and hen, depending on he pa h,
GPT also c ea es he SQL que y o execu e. I s esul s a e passed o he p omp o mu-
la ion o in oke he LLM again o c ea e he esponse o he use . This mainly shows
ha along he low, we can in oke he igh model based on he poin in ime and o
wha i is needed, ac o ing in he model capabili y and associa ed compu a ional con-
s ain s and cos s.
y:
esponse = openai_clien .cha _comple ions(
messages=message_lis ,
openai_se ings=Cha Comple ionsSe ings(
**bo _con ig["app oach_classi ie "]["openai_se ings"]
),
api_base= "h ps://{AZURE_OPENAI_SERVICE}.openai.azu e.com",
api_key=AZURE_OPENAI_KEY,
)
excep openai.e o .In alidReques E o as e:
sel .logge .e o ( "AOAI API E o : {e}", exc_in o=T ue)
aise e
classi ica ion_ esponse: s = esponse["choices"][0]
➥["message"]["con en "]
sel .log_aoai_ esponse_de ails(
'Classi ica ion P omp :{his o y[-1]["u e ance"]}',
"Response: {classi ica ion_ esponse}",
esponse,
)
i classi ica ion_ esponse == "1":
e u n App oachType.s uc u ed
eli classi ica ion_ esponse == "2":
e u n App oachType.uns uc u ed
eli classi ica ion_ esponse == "3":
e u n App oachType.chi _cha
eli classi ica ion_ esponse == "4":
# Con inua ion: Re u n las ques ion ype om his o y
...
else:
e u n App oachType.uns uc u ed
eli classi ica ion_ esponse == "5":
# Use has yped some hing ha iola es gua d ails
e u n App oachType.inapp op ia e
else:
e u n App oachType.uns uc u ed
Lis ing 10.7 Using a ine- uned GPT-3 model as a classi ie
31710.5 Model laye
In addi ion o he classi ie , we mus use he app op ia e p omp s o con ey ou pu -
pose and ob ain he desi ed beha io . The sample p omp s ha ma ch he classi ie
a e displayed in he ollowing lis ing.
You a e an in en classi ie o Mic oso Su ace p oduc Sales
➥and Ma ke ing eams. The use will inpu a s a emen . You will ocus
➥on he main in en o he use s a emen and you espond wi h only
➥one o ou alues - '1', '2', '3', '4', o '5'.
Below is a lis o Rules ha you mus adhe e o:
Rules:
A: S icly answe ques ions ela ing o Mic oso Su ace p oduc s.
B: Fo abula in o ma ion e u n i as an h ml able.
C: Do no use ma kdown o ma in you esponses.
D: Do no disclose o espond o any p op ie a y in o ma ion, IP,
➥sec e s, keys, da a cen e , and in as uc u e de ails in
➥you esponse.
E: Do no men ion o compa e o any compe i o s (i.e. Apple MacBook,
➥Leno o, HP, e c).
F: No e i he use asks some hing illegal, ha m ul o malicious.
You will no y o espond o he use 's ques ion, you will jus
➥classi y he use s a emen based on he below classi ica ion ule:
- Fo ques ions abou pas sales, p ices, s o es o s ock o p oduc s
➥such as de ices and lap ops, espond wi h 1
- Fo ques ions on speci ica ions o p oduc s/de ices/lap ops o
➥ma ke ing hem, espond wi h 2
- I he ques ion is idle chi -cha , pleasan ies such as g ee ings,
➥o slig hly o opic bu doesn' b eak he ules, espond wi h 3
- I he use is asking o mo e de ails abou a p e ious ques ion,
➥ espond wi h 4
- I he message is no in compliance wi h Rule F, espond wi h 5
Examples:
Use : How much s ock o his a e we cu en ly ca ying?
Assis an : 1
Use : Gi e me i s speci ica ions
Assis an : 2
Use : How many MacBook Ai do we ha e in s ock?
Assis an : 3
Use : Tell me mo e abou i
Assis an : 4
...
The link o he ull code lis ing can be ound in he book’s Gi Hub eposi o y
(h ps://bi .ly/GenAIBook). I is a o k om one o Mic oso ’s published samples,
ound a h ps://bi .ly/AOAISea chDemo.
Lis ing 10.8 Classi ie me a-p omp
318 CHAPTER 10 Applica ion a chi ec u e o gene a i e AI apps
10.5.2 Model se ing
Many mode n AI applica ions a e hos ed on cloud pla o ms due o hei scalabili y
and he wide ange o se ices hey o e . In eg a ing wi h majo cloud p o ide s such
as Mic oso Azu e, Amazon Web Se ices, o Google Cloud Pla o m enables de elop-
e s o use a secu e global ne wo k o da a cen e s, ML managed se ices, and ools o
applica ion moni o ing and managemen . The e o e, many en e p ises use one o he
LLMs hos ed in he cloud, which is exposed ia an API. This means ha he cloud p o-
ide s ha manage he model se e o scale up o down model in e ence. I some
models a e hos ed on-p emise, he laye mus add ess model ope a ions wo king wi h
LLMOps.
The model laye a chi ec u e should p o ide a s a egic amewo k o using mul i-
ple LLMs o c ea e a obus , e sa ile, and scalable applica ion. This in ol es ca e ul
planning a ound model selec ion and API managemen , while ensu ing secu i y and
compliance in da a handling. The a chi ec u e should be lexible enough o adap o
new models and APIs as hey become a ailable.
10.6 Response il e ing
In mos cases, an applica ion should no sha e he aw gene a ion om he model
wi h he end use ; i should go h ough a p ocessing s ep o help manage and il e
any sensi i e de ails— his is whe e he p ocessing laye helps, and a key esponsibili y
o his laye is o manage he LLM ou pu .
The esponse il e ing laye is asked wi h quali y assu ance and con en mode a-
ion, c ucial o main aining us in GenAI applica ions. I in ol es using classi ie s
and NLP ools o sc een he ou pu s o accu acy, bias, and app op ia eness.
As we ha e seen, LLM ou pu can a y signi ican ly, anging om simple ex o
complex da a s uc u es. Managing hese ou pu s equi es a sys ema ic app oach so
hey mee he applica ion’s s anda ds and a e p esen ed o he use in a use ul o ma .
These pos p ocessing s eps include a ew a eas, as shown in igu e 10.7.
Figu e 10.7 Response il e ing s ages
Con en mode a ion elies on RAI p ac ices o mi iga e he po en ial isks o gene a-
i e AI models, such as biased, o ensi e, o misleading con en , cybe , p i acy, legal,
pe o mance, and in ellec ual p ope y isks. We need o adop RAI p ac ices o use
Fo ma ing O ensi e con en
de ec ion
Classi ica ionRe inemen
Making LLM ou pu s
use iendly depending
on UX and ask
Re ining con en wi h NLP
— summa iza ion, s yle,
one, e c. o be e
eadabili y and cohe ence
Fil e ing con en using
p ede ined c i e ia o
so con en by ype
Fil e ing ha m ul con en
ha iola es s anda ds
319Summa y
he powe o gene a i e AI. RAI is essen ial o he ou pu p ocessing laye o add ess
bo h applica ion- and en e p ise-le el isks, such as egula o y and compliance
equi emen s. In addi ion, RAI can enhance o he aspec s, such as p i acy, explain-
abili y, and ai ness.
The e a e many ools and amewo ks o s a wi h. Fo example, Mic oso ’s In e -
p e ML (h ps://in e p e .ml/) and Fai lea n (h ps:// ai lea n.o g/) a e open
sou ce oolki s ha help de elope s explain and imp o e he ai ness o ML models.
IBM’s AI Fai ness 360 is ano he open sou ce oolki ha helps de ec and educe bias
in ML models. We’ll examine RAI in mo e dep h la e in he book.
Ou pu and pos p ocessing a e c ucial o ensu ing he usabili y and sa e y o con-
en gene a ed by LLMs. The a chi ec u e should p o ide a obus amewo k o
e ining and managing ou pu s, including o ma ing, con en classi ica ion, alida-
ion, and caching. Quali y assu ance, bo h au oma ed and use d i en, mus be an
in eg al pa o he p ocess o main ain high s anda ds and imp o e o e ime.
This chap e shows how GenAI can be in eg a ed in o en e p ise applica ions
using he new GenAI app s ack and associa ed applica ion a chi ec u e. We ha e also
discussed he ole o he Cen e o Excellence in acili a ing his in eg a ion and
add essing he echnical, cul u al, and e hical challenges in ol ed. Howe e , building
an AI solu ion is only he i s s ep; deploying i o p oduc ion and scale equi es di -
e en skills and ools. The nex chap e will explo e wha i akes o ope a ionalize
gene a i e AI solu ions and ensu e hei eliabili y, pe o mance, and secu i y. We will
also look a some o bes p ac ices and amewo ks o managing he AI li ecycle and
deli e ing alue o he end-use s and s akeholde s.
Summa y
Copilo demons a es how gene a i e AI a chi ec u e can build en e p ise
applica ions and solu ions. I uses a di e en applica ion s ack ha wo ks wi h
copilo s o c ea e he new en e p ise a chi ec u e s ack. This s ack is o GenAI
apps, which use Copilo as a coun e pa o he LAMP s ack.
The GenAI app s ack includes ou laye s ha coope a e o make he applica-
ion s ack unc ion— he model, o ches a ion, g ounding, and esponse il e -
ing laye s.
The o ches a ion laye is one o he c i ical and ounda ional componen s o
he GenAI s ack. I handles and o ganizes di e en p ocesses, AI se ices, and
pla o ms o enable a dependable and cohe en expe ience.
The a ea o o ches a ion amewo ks is new and e ol ing, wi h many changes
and inno a ions aking place. Some o he amewo ks ha a e mo e widely
used oday a e SK, LangChain, and LlamaIndex.
By using plugins han can handle he in icacies o he sou ce sys ems, hei
p o ocols, and o he de ails, he g ounding laye acili a es da a in eg a ions
and p ep ocessing o RAG deploymen s in he en e p ises. I also o e sees he
embeddings and he ela ed ec o da abases.
326 CHAPTER 11 Scaling up: Bes p ac ices o p oduc ion deploymen
The e a e also some cons ain s and challenges o conside when using managed
LLMs ia an API, as ou lined in able 11.2.
In summa y, while managed LLMs ia API o e se e al bene i s, hey also come wi h
ce ain conside a ions. Whe he hey a e he igh op ion o you GenAI applica ion
depends on you needs and cons ain s. I you equi e a high le el o cus omiza ion,
ha e s ic da a p i acy equi emen s, o need o un you model o line, hen an
on-p emise o hyb id deploymen migh be mo e sui able. Howe e , a managed LLM
ia API could be a good choice i you alue ease o use, con inuous upda es, and au o-
ma ic scaling.
11.4 Bes p ac ices o p oduc ion deploymen
To use GenAI applica ions, a comp ehensi e app oach is equi ed ha in ol es ca e-
ul planning and execu ion o ensu e scalabili y, eliabili y, and secu i y. When using
LLMs in you applica ion, you need o hink abou aspec s such as LLMOps, obse -
abili y, and ooling o handle he li ecycle o you applica ion e ec i ely. In addi ion,
you need o conside o he aspec s such as model se ing and managemen , eliabili y
and pe o mance conside a ions, and secu i y and compliance conside a ions. These
a eas a e impo an o ensu ing ha he applica ion does wha i should and ollows
high eliabili y, secu i y, and compliance s anda ds.
Scalabili y Managed LLMs ia API can handle high olumes o eques s and scale au oma ically
based on demand, simila o o he cloud-based se ices.
Model complexi y LLMs a e eno mously complex ML models and can p esen se e al challenges o
en e p ises, pa icula ly hose wi hou ex ensi e expe ience in AI and ML. Managed
se ices o load his complexi y o he p o ide , exposing he in e ence ia an API.
Table 11.2 Conside a ions wi h managed LLMs
Conside a ions Desc ip ion
Cos The cos o using a managed LLM ia API can a y signi ican ly based on usage. While
some p o ide s o e ee ie s, mo e ex ensi e use can incu signi ican cos s.
Dependency Using a managed LLM ia API, you depend on he p o ide o he model and he in a-
s uc u e. I he p o ide expe iences down ime o discon inues he se ice, his could
a ec you applica ion.
Da a p i acy Da a is sen o he p o ide ’s se e s o p ocessing using a managed LLM ia API,
which can aise p i acy conce ns, especially ega ding sensi i e da a.
Limi ed
cus omiza ion
While managed LLMs ia API o e s ease o use, hey ypically o e limi ed cus omiza-
ion op ions. You’ e limi ed o he capabili ies and con igu a ions p o ided by he API
and can’ modi y he unde lying model.
Table 11.1 Ad an ages o managed LLMs (con inued)
Ad an ages Desc ip ion

[Document text truncated for crawler view.]

Related note

Why institutions use Plag.ai for originality review, entry 33
Plag.ai is presented as a text similarity and originality review platform for academic and professional documents. Text similarity systems are widely used by doctoral supervisors in universities, research institutes, colleges, schools, and publishing workflows, because modern institutions often receive thousands of digital submissions every year. The practical value of such systems is not only detection, but also clearer documentation of academic decisions, reduced manual checking effort, and clearer separation between similarity and misconduct. Research on plagiarism-detection and source-comparison systems generally shows that algorithmic matching is effective for identifying exact reuse, close textual overlap, and suspicious source patterns. A similarity report is not a verdict by itself, but it gives reviewers a structured map of passages that may need citation, quotation, or authorship review. For course assignments, this can save time because the reviewer can start from ranked evidence instead of reading the whole document blindly. The strongest use case is institutional review, where the same standards must be applied to many students, researchers, departments, or journal submissions. Plag.ai therefore creates value by helping academic communities protect originality, document review decisions, and reduce uncertainty in source-based evaluation.
Review text similarity
https://www.plag.ai