
HTR Input & Correction Manual

Author: Meelen, Marieke; Griffiths, Rachael M.
Publisher: Zenodo
DOI: 10.5281/zenodo.17257009
Source: https://zenodo.org/records/17257009/files/MeelenGriffiths2025_PaganTibet_HTR_manual.pdf
HTR Input & Correction Manual
pagantibet.com
pagantibet@gmail.com
Version 2.1 (public): 25 September 2025
Marieke Meelen
Rachael Griffiths
Funded by the European Union (ERC, Pagan Tibet, 101097364). Views and opinions
expressed are however those of the author(s) only and do not necessarily reflect those
of the European Union or the European Research Council Executive Agency. Neither
the European Union nor the granting authority can be held responsible for them.
HTR input & correction manual
Version 2.1 (public) [1] - Marieke Meelen & Rachael Griffiths
This manual is intended for those involved in the Handwritten Text Recognition (HTR)
workflow of the ERC project "PaganTibet", in particular those focusing on the manual
input and correction of transcriptions, i.e. the "Transcription Team" (TNI monks). Details
on the project-internal side and backend of the system are not discussed here, but are
available elsewhere for the relevant project members and associates, i.e. the internal
(ERC project) and external (BDRC & Esukhia) "HTR Teams".
1. Project goals & timeline
2. Who's who?
3. Getting started with HTR input & correction
3.1 Installing Firefox
3.2 Installing Chrome
3.3 Installing the Monlam keyboard
3.4 Getting a Gmail account
3.5 Logging in to OpenPecha
3.6 Sign out and log in again
4. HTR input and correction
4.1 Basic functionality
4.1.1 Manual input task - Team A
4.1.2 Correction task - Team B
4.1.3 Tasks for Team Leaders
4.2 Accuracy
4.3 Diplomatic transcriptions
4.3.1 Special characters?
4.3.2 Spelling mistakes?
4.3.3 Contractions?
4.3.4 Non-existing Tibetan Unicode stacks?
4.4 Different scripts
4.5 Gaps & stains
4.6 Unknown characters
5. Workflow and payment
[1] PaganTibet is an ERC-funded Advanced Grant project (ERC, PaganTibet, 101097364) led by Professor Charles Ramble
(PI) in the Horizon Europe framework, hosted by the École Pratique des Hautes Études (EPHE), PSL in Paris.
Ⓒ Marieke Meelen & Rachael Griffiths (PaganTibet members), September 2025. This document is licensed under
CC BY-NC-SA 4.0.
Version control of shared PDFs
Version 1.0   September 2023 (20230921)   only sections 1-3 (MM)
Version 1.1   October 2023 (20231002)     first full draft (MM)
Version 2.0   April 2024 (20240402)       revised screenshots with new tool (RG+MM)
Version 2.1   September 2025 (20250929)   minor updates to timeline (RG) → first public version
1. Project goals & timeline
The HTR tasks are part of the larger ERC project "PaganTibet" whose overall goals are to:
1. Document Leyu and Baima (and related) traditional rituals
2. Digitise ALL recordings and manuscripts → HTR part!
3. Create diplomatic & critical searchable versions of ALL texts
4. Add philological, linguistic, and other information to ALL texts
5. Create detailed, critical editions of selected texts
6. Identify features that characterise pre-Buddhist, Pagan tradition(s)
7. Use these features to reconstruct networks of Pagan tradition(s)
The Handwritten Text Recognition (HTR) part (goal 2 above) is essential to the rest
of the project, as the manuscripts all need to be digitised in a searchable way, i.e. e-texts
need to be created for all images we have of the manuscripts before we can proceed with
any of the other work steps. We originally had approximately 70,000 images to transcribe in ~9
months, but acquired more materials in the course of the year and now have access to
over 100,000 images.
To speed up the transcription task, we use Handwritten Text Recognition tools like
Transkribus, which uses neural networks to create transcription models that can
automatically transcribe our manuscripts. In order to create these models, we need to
input transcriptions by hand. This is the "Manual Input" task, which Team A will focus on.
The more manual input, the better the results, but the results are never 100% correct.
Therefore, we also need to check the results and correct any mistakes. This is the
"Correction" task, which Team B will focus on. Both tasks and the general workflow will
be described in detail in section 4 below.
Because the models won't have much input data in the beginning, the results will
probably not be very good at first, so it will take more time to do corrections. We will
therefore split our images up into smaller batches and focus on one batch per input &
correction round. After every round, we will use the new input and the corrected
transcriptions to improve the model to make it work better for the next round. The
correction task should then become easier over time, which is good, because there is
much more correction to do, so in the end both teams will spend a similar amount of
time.
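
The batch-and-round idea described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_into_batches`, the batch size and the image file names are hypothetical and are not part of the project's actual tooling, and the retraining step itself is not shown.

```python
from typing import List


def split_into_batches(images: List[str], batch_size: int) -> List[List[str]]:
    """Split a list of image names into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [images[i:i + batch_size] for i in range(0, len(images), batch_size)]


# Hypothetical example: 10 images processed in rounds of 4 images each.
images = [f"page_{n:03d}.jpg" for n in range(1, 11)]
batches = split_into_batches(images, batch_size=4)

for round_number, batch in enumerate(batches, start=1):
    # In each round, the batch is input and corrected by the two teams;
    # the results would then be used to improve the model (not shown here).
    print(f"Round {round_number}: {len(batch)} images")
```

Smaller batches mean that the model can be improved more often, so later batches should need less correction than earlier ones.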
Both tasks were completed in 12 months, starting 1 April 2024 until the final deadline of
30 April 2025.
2. Who's who?
There are various people involved in HTR input & correction for the ERC project:
ERC Project PI - Charles Ramble (CR)
Internal HTR team - Marieke Meelen (MM) & Rachael Griffiths (RG)
External HTR team - Élie Roux (BDRC), NT, Tashi Tsering & Tamdin (Esukhia)
Transcription Team - Kemi Tsewang & TNI monks divided into two teams:
- Team A 'Manual input': Sherab Chokgyal (Team Leader), Tsugphud Woeser,
Tsultrim Palsang, Palgyi Wangchuk, Tsewang Drukgyal
- Team B 'Correction': Tsultrim Gyaltsen (Team Leader), Tritsuk Lhundup,
Tsukphud Rabsal, Tsognyi Gyatso, Sherap Woser
If you have any questions, ask your Team Leader first. They can answer or pass the
question on to our local contact in Kathmandu, Kemi Tsewang. Kemi will be working
closely together with the internal HTR team (Marieke and Rachael), who will be on
standby to help with any possible issues, or they can pass on technical bugs to the
external HTR team if they arise.
3. Getting started with HTR input & correction
You only need the following to get started:
1. Any computer/laptop with Firefox OR Chrome with the Monlam keyboard and wifi
access;
2. A Gmail account to log in to OpenPecha Tools, developed by Esukhia.
3.1 Installing Firefox
Go to https://www.mozilla.org/en-GB/firefox/new/ to download Firefox and follow the
instructions:
3.2 Installing Chrome
Go to https://www.google.com/intl/en_uk/chrome/dr/download/ to download Chrome and
follow the instructions:
3.3 Installing the Monlam keyboard
For inputting new manual transcriptions and for correcting automatically generated
transcriptions, you'll need to use the Monlam input keyboard, which works on both
Windows and Unix operating systems. This keyboard is necessary as it is the only option
for certain special symbols and abbreviations available in Tibetan Unicode. A full guide
on the Monlam keyboard and how to install it can be found here:
https://drive.google.com/file/d/1Mzzq2_l8Spi0N xs9eWCJl3u3Lie9CUS/view.
A list of examples and how to use it, especially for contractions and other non-regular
cases, can be found in section 4.3 below.
3.4 Getting a Gmail account
In order to use OpenPecha, where the HTR input & correction tool is hosted, you need a
Gmail account. Go to www.gmail.com and follow the instructions to create a free
account:
Please let Kemi Tsewang know what your (new) Gmail address is and which team (A-
Input or B-Correction) you belong to so that they can grant you access to the
OpenPecha Tools. If you don't know your team and/or have no preference for a team,
we'll assign you to one.
3.5 Logging in to OpenPecha
Once your account is activated, go to https://pecha.tools/ and click on "LOG IN":
Then click on the blue Google button and sign in with your Gmail account:
Check if your Google image/icon/initials appear on the top right corner and then click on
"HTR":
Check if your “user” and “group” names are displayed correctly in the top left panel after
"PROJECT INFO":
If all this is correct, you can now start your HTR task. If you don't see your username
and/or group name, please go back to the log in stage and make sure you are logged in
with the correct email.
3.6 Sign out and log in again
If you have finished all your tasks and/or the system says 'No tasks assigned', please sign
out by clicking on the circle on the top right corner and then 'Sign out'. You will get back
to the first page and if you log in again you will get new tasks:
member of the wider HTR team) to miss any mistakes, which is why we have everything
checked more than once.
The Team Leader working on Task A will see an image of an entire page with a
transcription of the page in the text box underneath. The Team Leader needs to check
this transcription line by line to make sure it is 100% accurate. If there are many mistakes,
the Team Leader should make a note of the type of mistakes and click on the red cross
button to reject the transcription. If there are only 1 or 2 small mistakes, they should
correct these and then, when the transcription is accurate, click on the green button to
submit.
The Team Leader working on Task B will see an image of one line with a transcription in
the text box underneath. The Team Leader needs to check this transcription to make
sure it is 100% accurate. If there are many mistakes, the Team Leader should make a note
of the type of mistakes and click on the red cross button to reject the transcription. If
there are only 1 or 2 small mistakes, they should correct these and then, when the
transcription is accurate, click on the green button to submit.
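
The decision rule for Team Leaders can be sketched as a small Python function. This is an illustration only: in practice the Team Leader compares the transcription with the image by eye. The sketch assumes that the Team Leader's own reading is available as a second string, counts each differing stretch as one mistake, and takes the limit of 2 from the "1 or 2 small mistakes" rule above. The names `count_mistakes` and `review` are hypothetical.

```python
import difflib


def count_mistakes(submitted: str, checked: str) -> int:
    """Count the stretches where the submitted text differs from the checked text."""
    matcher = difflib.SequenceMatcher(None, submitted, checked, autojunk=False)
    return sum(1 for tag, *_ in matcher.get_opcodes() if tag != "equal")


def review(submitted: str, checked: str, max_small_mistakes: int = 2) -> str:
    """Return the action a Team Leader would take under the assumed rule."""
    mistakes = count_mistakes(submitted, checked)
    if mistakes == 0:
        return "submit"
    if mistakes <= max_small_mistakes:
        return "correct and submit"
    return "reject"


# Wylie strings are used here only to keep the example easy to read.
print(review("rgyal po", "rgyal pa"))
```

What counts as a "small" mistake remains a judgement for the Team Leader; the number in the sketch is only a stand-in for that judgement.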

For both Team Leaders, new images and transcriptions will automatically appear after
clicking the green or red button. Note that some may appear twice with or without a
changed transcription. Once you've submitted, you will also see the numbers in the
target process on the left panel change, keeping track of your work since you logged in
today, your total amount of work and how many transcriptions you accepted or rejected:
4.2 Accuracy
For both tasks it is extremely important to be precise and accurate: the transcribed text
should reflect exactly what it says in the manuscript and nothing more/less. Even if you
think there is a mistake in the manuscript, you should NOT fix the mistake, but instead,
transcribe the syllable exactly as it is written in the manuscript WITH the mistake. For the
project it is very important that we create eTexts that are exactly like the manuscripts.
The sections below show how to deal with spelling mistakes, contractions, and
non-legitimate stacks of Tibetan characters that cannot easily be represented with the
Tibetan Unicode.
Note that if there are too many mistakes in your transcription, your Team Leader cannot
approve it. If this is the case, your Team Leader can give you feedback on what you did
wrong so you can do it correctly next time. If there are only some minor mistakes in your
transcription, your Team Leader can decide to fix these and approve it. It is important for
Team Leaders to make a note of any issues or regular mistakes to help their team
members make their work as accurate as possible. This will mean team members can
learn more quickly, which ultimately means they and the Team Leader will be able to
have more approved transcriptions.
4.3 Diplomatic transcriptions
For the project it is of crucial importance to get transcriptions that represent exactly
what we see in the manuscript. This type of transcription is called a "diplomatic
transcription". In diplomatic transcriptions, we do NOT try to fix or improve the text in the
manuscript in any way.
For example, if we find a common word like རྒྱལ་པོ་ 'king' with a missing naro like རྒྱལ་པ་, then
we should transcribe this exactly like the manuscript, so without the naro: རྒྱལ་པ་. The same
goes for other spellings that may look strange or contracted forms that cannot be
represented well in Tibetan Unicode: all of these need to look exactly like (or as close as
possible to) the original in the manuscript. The sections below will give more examples
of Special characters, Spelling mistakes, Contractions and Non-existing Tibetan Unicode
stacks. Section 4.3.1 explains how to type these special characters using the Monlam
Keyboard. Here's another example:
The accurate way to transcribe this would be དབལྱི་ (Wylie dbalyi). We know that this is an
unusual spelling or mistake for དབལ་གྱི་ (Wylie dbal gyi), but we still need to transcribe it as
དབལྱི་ to be 100% accurate in rendering exactly what the manuscript contains.
At a later stage of the project, we will create a "normalised transcription" in addition to
this diplomatic transcription. In the later "normalised transcription", we will expand
contractions and regularise spelling etc. to create e-texts that are easy to search and use
for other purposes.
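
The difference between the diplomatic form and the regular spelling becomes visible when the two are written out as Unicode code points. The Python sketch below is illustrative only. It assumes that the form transcribed as Wylie dbalyi is typed with the ya subjoined directly under the la, and the lookup table `normalisation` is a hypothetical stand-in for the later normalisation step, not the project's actual method.

```python
import unicodedata
from typing import List

# Diplomatic form (Wylie dbalyi) and regular spelling (Wylie dbal gyi),
# written as explicit code points. The exact encoding is an assumption.
diplomatic = "\u0f51\u0f56\u0f63\u0fb1\u0f72\u0f0b"
regular = "\u0f51\u0f56\u0f63\u0f0b\u0f42\u0fb1\u0f72\u0f0b"


def describe(text: str) -> List[str]:
    """List every character of the text with its code point and Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# Hypothetical lookup table: the diplomatic form stays untouched in the
# transcription and is only mapped to the regular spelling at a later stage.
normalisation = {diplomatic: regular}

for line in describe(diplomatic):
    print(line)
```

Keeping the diplomatic form as the stored text and applying any mapping separately means the manuscript reading is never overwritten.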
4.3.1 Special characters?
Most standard Tibetan Unicode characters are found on the default Monlam keyboard:
Sometimes, less frequent or less standard characters are needed, however. These can be
found on the Monlam keyboard by pressing either SHIFT or CAPSLOCK. The
combinations with SHIFT keys look like the ones next to the capital letters, e.g. k = ཀ but
K = ཁ :
A further set of special characters can be found by pressing CAPSLOCK:
Note that the above images are not exhaustive and can sometimes differ slightly
depending on your operating system. You can always find the exact keyboard layout by
letting your computer show the keyboard viewer:
Clicking on 'Show Keyboard Viewer' will give you a default keyboard overview. If you
press SHIFT, you will see the default keyboard change to the SHIFT keyboard options.
Similarly, if you press CAPSLOCK, you will see the options for the CAPSLOCK key
combinations.
On a Macbook, for example, SHIFT + 4 will yield the non-standard sha-rtags as
exemplified in ཞོ◌༹འ་ in section 4.3.4 below.
A full list of examples and use cases of the Monlam keyboard with detailed explanations
can be found in the Monlam keyboard manual.
4.3.2 Spelling mistakes?
Since we are aiming for a diplomatic transcription, it is important to keep all spelling
mistakes or unfamiliar spellings exactly as they are in the manuscript.
Here are some sample cases of odd spellings and unusual symbols with their
transcriptions:
Note that in some cases like དཾ་, we need additional characters that may not be in the
standard version of the Monlam keyboard. Section 4.3.1 above explains how to deal with
these and other types of special characters; Tibetan Unicode stacks that are
non-conventional are discussed in section 4.3.4.

4.3.3 Contractions?
Similar to the odd spellings, we need to keep contracted syllables and forms in exactly
the same way they appear in the manuscript:
If you are unsure of how to type a contraction, there is a built-in Google sheet to look up
how to type exactly what you see in the manuscript. You can access this sheet in either
Task A or Task B, by clicking on the three lines (the 'hamburger icon') on the right side:
When you click on this, a Google sheet will pop up in the bottom right corner with a list
of images of contractions in the first column, a list of conventions of how to transcribe
these in a 100% accurate way in the green middle column and, finally, a list of expanded
versions of these contractions. You can scroll down this alphabetical list to find the
contraction you need and then you can look at the green conventions column to learn
how to type it:
Both annotators and Team Leaders can use this Google sheet to look up conventions for
contracted and non-standard forms. You can hide this Google sheet again if you like, by
clicking on the red exit cross on the right:
If the line or page in the manuscript you're transcribing contains the following sequence
of characters, for example, then you know this is not a standard Tibetan syllable, because
there are multiple vowels, not enough tshegs and there are stacks of characters that do
not exist in normal Tibetan script:
If you don't know how to type this using the Monlam keyboard (see section 4.3.1 above)
and/or if you want to make sure that you are using the correct convention for
transcribing this, look it up in the Google sheet and then type what you find in
the green 'Convention' column:
If the contraction you're seeing in the image is not in this list, you should create a
miniature screenshot of the syllable and add it in a new row in the shared TNI Google
sheet alongside a potential convention for transcribing it and an expanded version of the
contraction (if you know how to - otherwise leave these green and red columns blank), as
well as the name of the batch where you found this and your own name, as shown in
these two sample cases:
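
One of the signs of a non-standard syllable mentioned above, namely more than one vowel within a single syllable, can also be checked mechanically. The following Python sketch is only an illustration and is not part of the OpenPecha tool. The function name is hypothetical, the set of vowel signs is a deliberately small assumption (i, u, e, o), and the odd syllable in the example is made up for the purpose of the demonstration.

```python
from typing import List

TSHEG = "\u0f0b"  # TIBETAN MARK INTERSYLLABIC TSHEG
# Assumed, non-exhaustive set of vowel signs: i, u, e, o
VOWEL_SIGNS = {"\u0f72", "\u0f74", "\u0f7a", "\u0f7c"}


def syllables_with_multiple_vowels(line: str) -> List[str]:
    """Return the tsheg-delimited syllables that contain more than one vowel sign."""
    flagged = []
    for syllable in line.split(TSHEG):
        vowel_count = sum(1 for ch in syllable if ch in VOWEL_SIGNS)
        if vowel_count > 1:
            flagged.append(syllable)
    return flagged


# A regular spelling (Wylie rgyal po) followed by a made-up syllable
# that carries two vowel signs (o and e) on one stack.
regular = "\u0f62\u0f92\u0fb1\u0f63\u0f0b\u0f54\u0f7c\u0f0b"
odd = "\u0f62\u0fa1\u0f7c\u0f7a\u0f0b"

print(syllables_with_multiple_vowels(regular + odd))
```

A syllable flagged in this way is not wrong: it is exactly the kind of form that should be looked up in the Google sheet and transcribed as it appears in the manuscript.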
4.3.4 Non-legitimate Tibetan Unicode stacks?
Non-existing stacks of consonants and/or vowels work in the same way as the
conventions, spelling mistakes, and special characters above. You may need to use the
SHIFT and/or CAPSLOCK keys on the Monlam keyboard to find certain elements like the
sha-rtags:
4.4 Different scripts
In some images different types of scripts are used. This is often the case in ritual texts,
for example, where instructions etc. are different from content. These smaller scripts or
scripts in different colours should be transcribed in exactly the same way as other parts
of the text.
In the line above we see three different scripts, for example:
1. the regular script
2. a smaller version of the regular script
3. the regular script in red ink
For the purposes of our project, these different types don’t need to be preserved, so all
three types of script in the above line can be transcribed in the same way.
Some texts contain words and/or lines in Zhangzhung scripts. These can be marked with
an ྾. On the Monlam keyboard, you find the ྾ by typing SHIFT + 6.
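
To check which character has actually been typed for the Zhangzhung marker, its code point can be inspected in Python. The sketch below assumes that the symbol shown above is U+0FBE; the helper `contains_marker` is hypothetical and only tests whether a line contains the marker, since the exact placement of the marker in a line is not specified here.

```python
import unicodedata

# Assumption: the marker shown in the text is U+0FBE.
KU_RU_KHA = "\u0fbe"


def contains_marker(line: str) -> bool:
    """Return True if the line contains the marker character."""
    return KU_RU_KHA in line


print(f"U+{ord(KU_RU_KHA):04X}", unicodedata.name(KU_RU_KHA))
print(contains_marker("\u0fbe\u0f40\u0f0b"))
```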