Automated cetacean detection in UAV imagery using AI models: a case study on Delphinid species

Author: Canelas, João; Clementino, Luana; Cid, André; Castro, Joana; Machado, Inês; Vieira, Susana

Publisher: Zenodo

DOI: 10.1007/s41060-024-00704-9

Source: https://zenodo.org/records/17664166/files/s41060-024-00704-9.pdf

International Journal of Data Science and Analytics (2025) 20:3965–3979
https://doi.org/10.1007/s41060-024-00704-9
REGULAR PAPER
Automated cetacean detection in UAV imagery using AI models: a case study on Delphinid species
João Canelas1,3 · Luana Clementino2 · André Cid3 · Joana Castro3,4 · Inês Machado2,5 · Susana Vieira1
Received: 30 June 2024 / Accepted: 13 December 2024 / Published online: 10 January 2025
© The Author(s) 2024
Abstract
The identification and quantification of marine mammals is crucial to understanding their abundance and ecology and to supporting their conservation. Traditional methods for detecting cetaceans, however, are often labor-intensive and limited in their accuracy. To overcome these challenges, this work explores the use of convolutional neural networks (CNNs) as a tool for automating the detection of cetaceans through aerial images from unmanned aerial vehicles (UAVs). Additionally, the study proposes the use of Long-Short-Term-Memory (LSTM)-based models for video detection using a CNN-LSTM architecture. Models were trained on a selected dataset of dolphin examples acquired from 138 online videos with the aim of testing methods that hold potential for practical field monitoring. The approach was effectively validated on field data, suggesting that the method shows potential for further applications in operational settings. The results show that image-based detection methods are effective in the detection of dolphins from aerial UAV images, with the best-performing model, based on a ConvNeXt architecture, achieving high accuracy and F1-score values of 83.9% and 82.0%, respectively, within field observations conducted. However, video-based methods showed more difficulties in the detection task, as LSTM-based models struggled with generalization beyond their training environments, achieving a top accuracy of 68%. By reducing the labor required for cetacean detection, thus improving monitoring efficiency, this research provides a scalable approach that can support ongoing conservation efforts by enabling more robust data collection on cetacean populations.
Keywords Unmanned aerial vehicles · Convolutional neural networks · Long-short-term-memory · Machine learning · Marine mammals detection · Photo identification
Luana Clementino, André Cid, Joana Castro, Inês Machado and Susana Vieira contributed equally to this work.
✉ João Canelas
[email protected]
Luana Clementino
luana.clementino@wavec.org
André Cid
[email protected]
Joana Castro
[email protected]
Inês Machado
ines.machado@wavec.org
Susana Vieira
[email protected]
1 IDMEC, Instituto Superior Técnico, Av. Rovisco Pais, 1049-001 Lisbon, Portugal
2 WavEC Offshore Renewables, Edifício Diogo Cão, Doca de Alcântara Norte, 1350-352 Lisbon, Portugal
3 AIMM - Associação para a Investigação do Meio Marinho, Rua Maestro Fred. Freitas N15-1, 1500-399 Lisbon, Portugal
4 MARE - Marine and Environmental Sciences Centre/ARNET - Aquatic Research Network, Laboratório Marítimo da Guia, Faculdade de Ciências da Universidade de Lisboa, Av. Nossa Senhora do Cabo, 939, 2750-374 Cascais, Portugal
5 MARE - Marine and Environmental Sciences Centre/ARNET - Aquatic Research Network, Faculdade de Ciências da Universidade de Lisboa, Campo Grande, 1749-016 Lisbon, Portugal
1 Introduction
Cetaceans play a key role in maintaining ecosystem stability, acting as sentinel or indicator species that reflect the overall state of the ocean’s health [1,2]. Monitoring and safeguarding the diversity and abundance of cetaceans is imperative to support conservation efforts (e.g., through conventions and agreements) and achieve Good Environmental Status (GES) in European waters [3]. Achieving GES in European waters is a key objective under the Marine Strategy Framework Directive (MSFD), which was adopted by the European Union to evaluate and maintain the health of the marine environment. GES is defined by eleven descriptors that assess various aspects of marine ecosystems, enabling a comprehensive evaluation of marine conditions and the pressures from human activities. This approach aligns with similar international conventions, such as the United Nations Sustainable Development Goal 14, which targets the conservation and sustainable use of oceans, seas, and marine resources, as well as the OSPAR Convention, which focuses on the protection of the North-East Atlantic marine environment. These frameworks collectively contribute to a more resilient and sustainably-managed global marine ecosystem. Monitoring and assessing the achievement of GES is particularly challenging given that cetaceans are highly mobile species, distributed over large areas, and moving across various marine habitats subject to diverse anthropogenic pressures. These pressures include incidental by-catch in fishing gear, bioaccumulation of pathogens and toxins, harmful algal blooms, collisions with ships, underwater noise and climate change [4–7]. More recently, the advancement of offshore renewable energy has further intensified these challenges. Such projects often target large marine areas, commonly overlapping with cetacean habitats, thereby escalating the pressure between conservation needs and energy exploitation [8]. The vulnerability of these species and the exploitation of their habitats underscore the importance of their conservation; thus, it is imperative to improve our current understanding of cetacean distribution patterns. However, such studies are excessively costly, posing a significant barrier to advancing conservation efforts. Traditional methods to study and monitor marine mammal populations involve visual surveys from a defined platform (e.g., aerial, ship-based, or land-based), acoustic surveys [10,11], observation of Very High Resolution (VHR) satellite images [12,13], and observation methods that allow a more thorough understanding (e.g., capture-recapture) [14]. Furthermore, emerging methodologies, such as remote sensing through photo detection and identification, present a promising tool for complementing such methods while reducing associated costs and risks [15–18].
Unmanned Aerial Vehicles (UAVs) are equipped with imaging sensors that can collect extremely high-resolution data, thus becoming an increasingly used tool for researchers to observe marine wildlife and study cetaceans. UAVs are a non-invasive method [19] that allows the detection of animals in subsurface waters, thus increasing the time available for detection [20]. UAVs have an increasing number of applications, such as monitoring abundance and distribution [21], photo identification [22], and behavioral studies [23], among others [20].
Nonetheless, this new technology still presents limitations and challenges, particularly associated with data management. The high volume of data generated requires efficient processing solutions, as manual inspection is often impractical and prone to human error [24,25]. As machine learning and computer vision advance rapidly, automated computer vision models present a promising solution for automating the inspection process. Since the scientific developments by Krizhevsky et al. [26] proving the efficacy of deep learning algorithms in image recognition, Convolutional Neural Networks (CNNs) have become the model of choice for image detection and identification, achieving results on par with human performance in detection and identification tasks [27]. These models have been successfully applied to individual identification of whales [28–30] and dolphins [31], with methods that can be adapted to other cetacean species [29]. There have also been applications for whale counting through satellite VHR images [32], where combining the usual counting procedure with an initial detection of whale presence has improved model accuracy and computational efficiency [32]. However, these are limited to species of large size due to the spatial resolution of images and are difficult to develop due to the lack of open VHR image datasets.
Models developed for image detection, however, are limited to the events occurring in one single frame, potentially missing important information and context from the video sequences obtained with UAVs. That said, algorithms capable of handling video frames, such as Recurrent Neural Networks (RNNs), leverage the temporal continuity and contextual information provided by video sequences while reducing information missed, thus improving detection capability. Still, the development of such models represents a higher degree of complexity, and studies exploring their efficacy on marine mammal detection are relatively limited [33–35].
The main objective of this study is to develop machine learning models capable of automating the detection of cetaceans, specifically delphinids, using UAV data. The present work explores the implementation of well-documented CNN architectures directed to image detection, while also proposing the use of a Deep Fake detection algorithm based on Guera and Delp’s “Deepfake video detection using recurrent neural networks” [36], applied to the detection of marine mammals in video sequences. This approach builds upon current methodologies, but also explores new avenues through the use of a Long-Short-Term-Memory (LSTM) network, a specific RNN model, seeking to harness the additional information provided through video analysis.
This study explores the synergies between deep learning and marine science, focusing on the potential to enhance environmental monitoring and impact assessment strategies in offshore environments, particularly for the conservation of dolphin populations. These findings provide a methodological basis for improving data quality, which can support future efforts aimed at advancing sustainable management and conservation in marine ecosystems.
2 Data acquisition
In-situ collection of a sufficient volume of remote sensing data suitable for the development of an efficient identification model is challenging due to the high cost of equipment and logistical constraints associated with ocean surveys. Furthermore, datasets on species with broad distribution ranges, such as cetaceans, are scarce, and publicly available datasets tailored to aerial detection are largely non-existent. As a result, the data used to build the models were obtained by scraping video files from various online sources (e.g., YouTube, Pexels, Dailymotion, etc.). Given the challenging nature of developing such datasets, data gathered for this study were limited to species of the family Delphinidae, as they are among the most accessible cetaceans in publicly available footage. With the intent of gathering as much data as possible, no pre-selection criteria such as location or time period were applied during collection.
2.1 Training dataset
The videos obtained were recorded in diverse locations and under varying conditions, leading to significant variability in water characteristics such as hue, brightness, and foam, as well as differences among delphinid species. This diversity enhances the model’s ability to generalize across different environmental settings. Furthermore, to achieve representative samples where no dolphins are present, the videos also include objects or subjects such as boats, boards, and swimmers, which served as potential confounding elements for the model.
The resulting data consisted of 138 aerial videos of varying durations and settings. Some videos exclusively contained cetacean footage, others solely featured water scenes, and some combined both elements. These 138 videos were then processed to create two distinct collections of data: one tailored to image classification and the other to video classification. For image classification, individual frames were extracted from the videos, providing static samples. For video classification, the original video segments were retained to capture dynamic features. While data for both methodologies were derived from the same set of videos, these were processed to suit the specific requirements of each classification type.
2.1.1 Image data
Image data were generated by deconstructing the original 138 raw videos into images by extracting frames at a specified rate of one frame every three seconds using the open software FFmpeg. This rate can be adjusted based on user needs and the source of extraction, as some videos may include more or less irrelevant data. In this study, images were categorized into two distinct classes, based on the presence or absence of cetaceans: “Cetacean” and “No Cetacean”. Images were initially filtered to exclude the frames that were poor representatives of their class, such as cases where subjects were obstructed or not in the frame. Additional manual selection was also conducted on frames that were good representatives. The resulting set of data consisted of 2451 images, divided into its respective classes. The “No Cetacean” class included images where no cetaceans were present, as well as images with other surface or subsurface artifacts that could lead the model to incorrectly label them as containing a cetacean. Including these artifacts within the “No Cetacean” class helps to correct for potential false positives by exposing the model to non-cetacean images that may resemble cetaceans. The “Cetacean” class included images where at least one cetacean was present. The classification process resulted in 776 images (approximately 31.1%) representing the “No Cetacean” class, and 1720 images (about 68.9%) representing the “Cetacean” class.
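The frame-extraction step described above can be sketched as a small helper that builds an FFmpeg command. This is only an illustration: the file names, output pattern, and the helper `build_ffmpeg_command` are hypothetical, and the paper does not state the exact FFmpeg options used. The `fps=1/3` filter value is one common way to request one frame every three seconds.

```python
from pathlib import Path


def build_ffmpeg_command(video_path, output_dir, seconds_per_frame=3):
    """Return an FFmpeg argument list that saves one frame every N seconds."""
    # Hypothetical naming scheme: <video stem>_00001.jpg, <video stem>_00002.jpg, ...
    out_pattern = str(Path(output_dir) / (Path(video_path).stem + "_%05d.jpg"))
    return [
        "ffmpeg",
        "-i", str(video_path),
        # fps=1/3 asks the fps filter for one output frame per three seconds
        "-vf", f"fps=1/{seconds_per_frame}",
        out_pattern,
    ]


cmd = build_ffmpeg_command("dolphins_01.mp4", "frames")
print(" ".join(cmd))
# To run it (requires FFmpeg on PATH): subprocess.run(cmd, check=True)
```

Keeping the rate as a parameter reflects the remark that the extraction rate can be adjusted per source video.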
The notable imbalance in the number of images per class is due to the limited variation in water surface patterns over time. Frames extracted within a few seconds of each other are often nearly identical, providing minimal additional value. On the other hand, a dataset heavily composed of cetacean images could bias model predictions, increasing the rate of false positives. To mitigate this, the number of images in the “No Cetacean” class was increased by artificially generating new sea images from existing ones. This was achieved by introducing random variations in brightness, hue, and saturation to all newly generated samples. Additionally, further transformations were applied with varying probabilities: sharpness enhancement (25%), random mirroring (25%), blurring (25%), random rotations (15%), and random cropping (30%).
The described set of transformations was applied a total of 944 times on randomly selected samples from the “No Cetacean” class, generating an additional 944 images. This augmentation was performed to equalize the number of samples with that of the “Cetacean” class. The resulting balanced data were composed of a total of 3440 images, with an equal distribution of 1720 (50%) images per class.
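As a rough sketch of the balancing logic, the snippet below samples which transformations would be applied to each synthetic image, using the probabilities quoted above. The operation names and the helper `sample_augmentation` are illustrative assumptions; the actual image operations (and the library used for them) are not specified in the text and are left out here.

```python
import random

# Optional transformations and the probabilities quoted in the text.
OPTIONAL_OPS = {
    "sharpen": 0.25,
    "mirror": 0.25,
    "blur": 0.25,
    "rotate": 0.15,
    "crop": 0.30,
}


def sample_augmentation(rng):
    """Pick the operations for one synthetic "No Cetacean" image."""
    ops = ["color_jitter"]  # brightness, hue, saturation: always applied
    for name, prob in OPTIONAL_OPS.items():
        if rng.random() < prob:
            ops.append(name)
    return ops


n_cetacean, n_no_cetacean = 1720, 776
n_to_generate = n_cetacean - n_no_cetacean  # 944 extra images needed

rng = random.Random(0)  # fixed seed so the sketch is reproducible
plans = [sample_augmentation(rng) for _ in range(n_to_generate)]
print(n_to_generate, n_no_cetacean + len(plans))  # 944 1720
```

Each optional transformation is drawn independently, so a single synthetic image may receive several of them or none beyond the color variation.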
2.1.2 Video data
Video data were generated by deconstructing the same 138 raw videos into several smaller videos (clips) of eight seconds each and subsequently extracting a total of 64 frames from each of these smaller videos. The initial fragmentation process of the original videos was performed using the software Adobe Premiere Pro 2020, version 14.0 (Adobe Inc., San Jose, California). Firstly, intervals of eight seconds were manually selected to accurately represent each class. Simple transformations, such as mirroring, cropping, and varying brightness and hue, were applied to some of the samples to introduce variation. Each segment was then exported to create new video samples.
This video length was selected based on careful analysis of the initial data acquired, balancing the goal of capturing comprehensive information on dolphin behavior and movement within a concise time frame. This interval proved effective for segmenting original videos with frequent transitions and various added content such as overlays, logos, or artifacts that could otherwise cause unwanted model responses. A longer interval would have significantly reduced the number of usable samples, while a shorter window risked losing contextual details, as many segments showed minimal movement over brief durations. The eight-second length, therefore, provided an optimal compromise, enabling ample sample quantity while retaining sufficient information for model training.
After this segmentation, each clip is processed using Python’s OpenCV library to extract frames at a rate of eight frames per second, resulting in a batch of images containing a total of 64 frames per clip. This frame extraction rate captures as much information as possible on variations in dolphin behavior over time while minimizing the number of images.
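The sampling arithmetic can be illustrated with a short sketch that selects which source frames to keep when an eight-second clip is resampled to eight frames per second. The native frame rate of 30 fps and the helper `sample_frame_indices` are assumptions for illustration; the paper does not describe how frames were selected inside OpenCV. With OpenCV, such indices could be used to seek in a `cv2.VideoCapture` via `CAP_PROP_POS_FRAMES`.

```python
def sample_frame_indices(native_fps, clip_seconds=8, target_fps=8):
    """Indices of the source frames to keep when resampling a clip."""
    n_frames = clip_seconds * target_fps  # 8 s x 8 fps = 64 frames
    step = native_fps / target_fps        # source frames per kept frame
    return [int(i * step) for i in range(n_frames)]


# Assumed native rate of 30 fps, i.e. 240 source frames in an 8 s clip.
indices = sample_frame_indices(native_fps=30)
print(len(indices), indices[:4])  # 64 [0, 3, 7, 11]
```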
The data resulting from these post-processing operations consisted of 1216 videos, of which 622 belong to the “Cetacean” class (approximately 51.2%), while the remaining 594 videos belong to the “No Cetacean” class (approximately 48.8%). This equates to 1216 batches of 64 images each, totaling 77,824 images spanning both classes.
2.2 Test dataset
To monitor and understand model performance over the course of training, models are tested on data not involved in their learning process. This practice provides insights into expected performance and generalization by evaluating samples the model’s parameters were not directly adjusted to, providing a general understanding of model progress and anticipated behavior within similar data samples.
In this study, the test data were derived from the original dataset outlined in Sect. 2.1, from which a small portion was set aside. This division creates two distinct subsets: training data, comprising 80% of the original dataset, used to teach the model to recognize class patterns, and test data, making up the remaining 20%, used to verify the state of the models. Empirical studies suggest optimal results when reserving 20–30% of data for testing while using 70–80% for training [37]. This separation was done randomly from all available samples while keeping a proportional number of samples from each class, resulting in 688 (20%) test and 2752 (80%) training samples for image classification, and 244 (20%) test and 972 (80%) training samples for video classification.
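A class-proportional random split of this kind can be sketched in plain Python as follows. The helper `stratified_split`, the seed, and the label strings are illustrative assumptions; the paper does not say which tool performed the split.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) keeping each class's share in both sets."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random selection within each class
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


# Balanced image dataset: 1720 samples per class.
labels = ["Cetacean"] * 1720 + ["No Cetacean"] * 1720
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 2752 688
```

Splitting within each class, rather than over the pooled samples, is what keeps the class proportions identical in the training and test subsets.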
2.3 Validation dataset
The validation data for this study were provided by Associação para a Investigação do Meio Marinho (AIMM), which supported the research by supplying UAV data from previous expeditions. Data were acquired in the coastal region of south Portugal within the Faro district. Specifically, the study area is located approximately 12 km offshore from the coastline of Albufeira, extending into the Atlantic Ocean. This region is a significant habitat for various cetacean species, especially delphinids such as common dolphins (Delphinus delphis) and bottlenose dolphins (Tursiops truncatus) [38–40].
A total of seven campaigns conducted between 2022 and 2023 were analyzed. One campaign was included in the training data to better adapt to local environmental conditions and UAV settings, while the remaining six campaigns were used for evaluation. These surveys were conducted in the morning, between 10:30 and 12:00, under favorable sea conditions defined by a sea state of ⩽3 according to the Beaufort scale, swells <1.5 m, good visibility (>5 km), and no precipitation. Figure 1 offers a comprehensive view of the region under study, providing information on various expeditions, including dates, times, and the precise locations of dolphin sightings.
The UAV-based remote sensing data used in this study were collected using a Mavic 2 Pro multi-rotor UAV (DJI, Shenzhen, China). The UAV captured videos at a resolution of 3840 × 2160 pixels using a 1-inch CMOS RGB imaging sensor with a maximum resolution of 20 megapixels, coupled with a 3-axis gimbal and a 28 mm equivalent, f/2.8–f/11 lens, providing a field of view of approximately 77°.
The drone missions were conducted at different flight altitudes, depending on several factors. These factors included whether there were any dolphin sightings at the time and the size of the group of dolphins, with higher flights preferably used for greater sea coverage when no sightings were present, and lower flights for a more detailed view when a group was located. Figure 2 presents a box plot of the flight altitudes recorded by the UAV during the different expeditions. Of the six flights, three were conducted at a maximum altitude of nearly 80 m, while the remaining three were flown below 50 m. In general, the UAV was observed to operate at an altitude of around 20 m, except for the second campaign, where flights predominantly occurred at higher altitudes.
Fig. 1 Overview of the study area for acquiring data
Fig. 2 Box plot: flight altitude distribution for each expedition
The UAV-based imagery collected resulted in 35 min and 40 s of footage. Similar to previously acquired data, this footage was processed to create two datasets from the same source, this time for validation. The first, tailored to the validation of image-based models, was obtained by extracting and labeling frames from the original raw video data at a rate of one frame every five seconds, effectively processing the entire footage. Each sample was labeled as belonging to either the “Cetacean” or “No Cetacean” class based on inference from the original video imagery captured, making it possible to discern the presence of dolphins in samples that would otherwise be challenging to identify correctly. The resulting data comprise a total of 428 image samples, with 247 (approximately 57.7%) classified as “Cetacean” and 181 (approximately 42.3%) as “No Cetacean”. These samples capture a variety of dolphin positions, camera angles, and distances, providing sufficient variation to support a robust and comprehensive assessment. The second dataset, designed for the validation of video-based models, was obtained by manually dividing the original video data into smaller eight-second clips, extracting them and subsequently converting them into 64 images. The resulting video data consist of 232 videos, 120 of which were classified as belonging to the “Cetacean” class (approximately 51.7%), and 112 classified as belonging to the “No Cetacean” class (approximately 48.3%).
Table 1 provides a summary of the sample distribution across the training, testing, and evaluation datasets for both image and video data. Each dataset is divided into “Dolphin” and “Ocean” samples, corresponding to the “Cetacean” and “No Cetacean” classes, with the training and testing sets holding 80% and 20% of the original data, respectively. The evaluation dataset includes an additional set of samples that covers 100% of its allocated data, ensuring comprehensive assessment of the models. This division maintains a balanced class representation within each subset, with a near-equal distribution between “Dolphin” and “Ocean” samples across the datasets, facilitating robust training and performance evaluation.
While the evaluation dataset was enriched with cetacean images to ensure sufficient data for testing the model, it is acknowledged that in real-world applications, ocean-only images are likely to be far more prevalent than cetacean sightings. Consequently, this enrichment may slightly underestimate the rate of false positives under operational conditions. However, by maintaining a balanced dataset for evaluation, the accuracy metric becomes more representative of the model’s true performance.
3 Implementation
The models and pipelines employed in this study were developed within the research platform Google Colab Pro, taking advantage of its cloud computing capabilities. The primary programming language used was Python version 3.9, with the PyTorch library as the foundation of this project’s machine learning framework, which allowed for an easy implementation of state-of-the-art deep learning techniques.
3.1 Image-based models
In order to analyze distinct image identification models, the CNN architectures outlined in Table 2 were individually employed. The selected models have a track record in the field and known performance in image classification [41–45]. It is worth noting that while certain models may display superior performance on average, real-world outcomes can vary significantly based on the specific nature of the problems being addressed. Alongside the model names, specifications such as the number of parameters and the top accuracies achieved when these architectures were trained on ImageNet are also provided.
Initially, these models were set up according to their predefined architectures and initialized with random parameters, making them essentially empty frameworks incapable of making meaningful predictions. However, through transfer learning, parameters from models with identical architectures that have been trained on extensive datasets, such as ImageNet, can be transferred to these models. ImageNet, for example, comprises over a million samples and covers a wide range of classes, including animals like gray whales, dugongs, orcas, and sea lions. While it does not encompass the specific “dolphin” class, the features that distinguish these related classes can be invaluable for the identification of dolphins.
The models presented are structured in two sections, the first being the feature extractor, mainly consisting of the input layer and a series of hidden layers. It constitutes the majority of the network and is designed to identify specific features within the input data through its convolutional layers. These layers have been previously trained on the ImageNet dataset; therefore, they already possess the ability to effectively recognize a wide range of characteristics from a thousand different classes. Consequently, their parameters are “frozen” during further training to ensure that the extracted features remain consistent.
Fig. 3 Combined CNN model pipeline
The second section of the model comprises the classifier, typically a fully connected linear layer with a softmax activation function at the end of the network. This classifier interprets the features obtained from the hidden layers and assigns a class label to each data sample. The classifier, originally designed to classify 1000 classes, was adapted to identify only two classes. This was achieved by replacing the final linear layer, which initially had 1000 output nodes, with a new linear layer containing only two output nodes, corresponding to the two target classes.
In addition to the individual architectures, a combined model was developed to leverage multiple feature extractors concurrently. In this approach, the feature extraction layers from VGG16, ConvNeXt, and a straightforward set of convolutional layers were merged into a unified feature extractor. The simple convolutional model implemented alongside VGG16 and ConvNeXt consists of three convolutional layers and three max-pooling layers, complemented by Rectified Linear Unit (ReLU) layers in between as activation functions. The intent behind this design was to tailor the model to identify dolphin-specific features from the training dataset, instead of those that represent other subjects as in the case of transfer learning.
Ultimately, a shared classifier with two layers and 1024 nodes in its middle layer was employed to receive and effectively process the features extracted from all architectures, resulting in a collective prediction. Notably, this implementation was carried out in two instances: Combined Model (1), which omitted the simple convolutional layers, and Combined Model (2), which used all three architectures as explained. Figure 3 provides a general overview of this combined model implementation and how data were shaped through it.
3.2 Video-based models
A video-based identification approach incorporating modern deepfake detection techniques was also adopted to leverage the unique temporal continuity feature of videos. Unlike images, videos are composed of a sequence containing numerous frames, where adjacent frames display a substantial correlation and temporal continuity. The method implemented, based on the work by Guera and Delp [36], involves using a CNN without a classifier to extract features from individual video frames and feeding the resulting sequence of features into an LSTM to analyze patterns in their temporal evolution.

Table 1 Dataset train-test split

                 Image dataset                               Video dataset
                 Train         Test         Eval             Train         Test          Eval
Total samples    2752 (80%)    688 (20%)    428 (100%)       972 (80%)     244 (20%)     232 (100%)
Dolphin          1376 (50%)    194 (50%)    247 (57.7%)      497 (51.1%)   125 (51.2%)   120 (51.7%)
Ocean            1367 (50%)    194 (50%)    181 (42.3%)      475 (48.9%)   119 (48.8%)   112 (48.3%)

Table 2 Common CNN architectures and respective performances on ImageNet dataset

Model             Top-1 Acc (%)   Top-5 Acc (%)   Parameters (M)
ResNet50          80.858          95.434          25.6
InceptionV3       77.294          93.450          27.2
VGG16_bn          73.360          91.516          138.4
ConvNeXt_Base     84.062          96.870          88.6
EfficientNet_B0   77.692          93.532          5.3

Two CNN architectures were used for this purpose: InceptionV3 and ConvNeXt. The InceptionV3 architecture was replicated from the original work, while the ConvNeXt architecture was selected based on results from the image-based classification counterpart. In line with the previous approach, models were established based on their respective architectures, initialized with random parameters, and subsequently refined by transferring parameters from models pre-trained on ImageNet. Since the CNNs within this CNN-LSTM pipeline are used exclusively for feature extraction, their parameters were “frozen” to maintain the consistency of the extracted features. Simultaneously, the classifiers were removed, enabling direct passage of the features identified from the hidden layers to the LSTM. Different CNN architectures have specific input size requirements: InceptionV3 and ConvNeXt have input sizes of 299 × 299 and 224 × 224 pixels, respectively, and extract 2048 and 1024 features, respectively.
To accommodate the distinct feature structures obtained from each CNN architecture, two distinct LSTM architectures were developed. Each was designed to handle a specific input size of the transferred features, aligned with its corresponding CNN. Both models were created with two recurrent layers of 256 nodes each. This means that for each model, two LSTMs were stacked together to form a stacked LSTM, with the second taking in the outputs from the first to compute a new output at each time step. This setup enabled the LSTMs to iteratively produce 256 values, representing their hidden states, for every frame in the video sequence.
To conclude the CNN-LSTM pipeline, a classifier was introduced to process the output produced by the LSTM cell and make predictions. The classifier implemented features a linear layer with 16,384 nodes on its input side. At each time step, the LSTM processes a frame from the input video, thus generating 256 values, which correspond to the 256 nodes in its hidden layers, representing the hidden state at that specific time step.
To maximize the amount of information used within the classifier, all the hidden states produced by the LSTM cell were aggregated. This aggregation results in a total of 16,384 nodes on the classifier’s input side (64 frames × 256 hidden values), since all input videos are pre-processed to consist of 64 frames. Furthermore, its output layer consists of two nodes representing the two available classes and utilizes a softmax activation to convert the raw output values into probabilities.
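The size of the classifier input follows directly from concatenating one 256-value hidden state per frame. The sketch below reproduces that bookkeeping with placeholder values; the helper `flatten_hidden_states` is hypothetical and the zeros merely stand in for real LSTM outputs.

```python
def flatten_hidden_states(hidden_states):
    """Concatenate per-frame hidden-state vectors into one feature vector."""
    return [value for state in hidden_states for value in state]


n_frames, hidden_size = 64, 256
# Placeholder hidden states (zeros) standing in for real LSTM outputs.
hidden_states = [[0.0] * hidden_size for _ in range(n_frames)]
features = flatten_hidden_states(hidden_states)
print(len(features))  # 64 * 256 = 16384
```

Using every time step, rather than only the last hidden state, is what fixes the clip length at 64 frames: a clip with a different number of frames would no longer match the 16,384 inputs of the linear layer.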
Figure 4 provides a general overview of the pipeline created, its inner workings, and how the data were shaped through this system.
Fig. 4 CNN-LSTM pipeline structure
3.3 Pre-processing data
Effective pre-processing is essential for preparing the training and testing datasets. Key steps include organizing data into manageable batches and applying transformations compatible with pre-trained models, which optimize learning and enhance model performance.
To maximize computational efficiency and improve learning precision, all data samples within the training and testing datasets were grouped into batches of 64 samples each. This aggregation allows the simultaneous processing of multiple samples, providing a more stable gradient during backpropagation by combining losses from diverse samples within each batch. A batch size of 64, in particular, is a commonly employed choice that often works well for various deep learning tasks.
Further transformations were applied to leverage the patterns
learned from pre-trained models. Given that weights
from these were trained on data with a specific distribution,
it is essential to normalize the input data accordingly. This
normalization involves subtracting the mean and dividing
by the standard deviation values of the dataset used to pre-train
the models. For ImageNet-trained models, these values
are (0.485, 0.456, 0.406) for the mean and (0.229, 0.224,
0.225) for the standard deviation across the three color channels.
Following normalization, the input data are resized and
cropped to match the dimensions required by each CNN
architecture, with most models accepting (224 × 224) input,
except InceptionV3, which requires (299 × 299).
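The normalization arithmetic can be sketched for a single pixel as follows. The sketch assumes channel values already scaled to [0, 1], which is the range these ImageNet statistics apply to; the helper names and the dictionary keys are hypothetical, and a real pipeline would normally delegate this to an image library.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

# Input sizes from the text; the dictionary keys are illustrative names only.
INPUT_SIZE = {"default": (224, 224), "InceptionV3": (299, 299)}

def normalize_pixel(rgb):
    """Normalize one RGB pixel whose channels are already scaled to [0, 1]."""
    return tuple((c - m) / s for c, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD))

def input_size_for(model_name):
    """Return the expected (height, width) for a given architecture name."""
    return INPUT_SIZE.get(model_name, INPUT_SIZE["default"])

white = normalize_pixel((1.0, 1.0, 1.0))
```

A pixel equal to the dataset mean maps to zero in every channel, which is the point of centering inputs on the pre-training distribution.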
3.4 Model training
The process of adapting a neural network to fit a specific
problem involves iteratively assessing its performance on the
training dataset, and adjusting its parameters each time to
achieve predictions as close as possible to the desired values.
To accomplish this, a Cross-Entropy Loss function, commonly
utilized in multiclass problems such as this one, was
defined. This function serves as a guide to determine how
the model's parameters should be updated. Subsequently, the
loss function is utilized to compute a loss value, produced
for each batch, which in turn is used to optimize the parameters
of the model. This optimization is conducted through
an Adam optimizer with an initial learning rate of 0.001,
due to its adaptive learning rate and ease of use with fewer
hyperparameters.
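To make the loss and update rule concrete, the sketch below computes a cross-entropy loss from raw outputs and applies one Adam step to a single scalar parameter. Only the learning rate of 0.001 comes from the text; the beta and epsilon values are the usual Adam defaults and are an assumption here, as are the function names and example numbers.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy of raw outputs against the index of the true class."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[target]

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter; returns (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for step t
    v_hat = v / (1 - beta2 ** t)
    return param - lr * m_hat / (math.sqrt(v_hat) + eps), m, v

loss = cross_entropy([2.0, 0.5], target=0)
param, m, v = adam_step(param=1.0, grad=0.5, m=0.0, v=0.0, t=1)
```

On the first step the bias-corrected ratio has magnitude close to 1, so the parameter moves by roughly the learning rate regardless of the gradient scale, which is one reason Adam needs little tuning.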
Additionally, a dropout layer with a dropout probability
of 20% was added to the classifier at the end of each model
during training, immediately before the final linear layer.
This layer randomly removes activations from the nodes
carrying features before they enter the classifier, with a probability
of 20% for each feature. This step proved to help the
learning process of all nodes in the classifier and reduce
overfitting significantly by allowing nodes of undervalued
importance to undergo large adjustments.
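A minimal sketch of dropout with a 20% probability is given below. It uses the common "inverted" formulation, in which surviving activations are rescaled by 1 / (1 - p) during training; whether the authors' implementation rescales in this way is an assumption, and the function name and seed are hypothetical.

```python
import random

def dropout(features, p=0.2, training=True, rng=random):
    """Zero each feature with probability p; rescale survivors by 1 / (1 - p)."""
    if not training or p == 0.0:
        return list(features)   # dropout is disabled at inference time
    keep_scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else x * keep_scale for x in features]

rng = random.Random(0)          # fixed seed so the sketch is reproducible
features = [1.0] * 10_000
dropped = dropout(features, p=0.2, rng=rng)
zero_fraction = dropped.count(0.0) / len(dropped)
```

With many features, the fraction of zeroed activations approaches the configured probability, while the rescaling keeps the expected activation unchanged.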
Finally, models underwent training by iteratively processing
the training dataset, one batch at a time, over
several iterations, continuously assessing predictions made
and adjusting their parameters accordingly. During this process,
the dataset is completely processed multiple times, and
models are stored for future use with their most up-to-date
parameters and key performance metrics after each iteration.
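The overall procedure, several full passes over the data with a stored snapshot after each pass, can be sketched as below. This is a toy stand-in rather than the authors' training code: the `train`, `update_fn`, and `evaluate_fn` names are hypothetical, and the "model" is a single weight fitted by gradient descent.

```python
import copy

def train(params, batches, update_fn, evaluate_fn, num_epochs):
    """Run several full passes over the data, keeping a checkpoint per pass."""
    checkpoints = []
    for epoch in range(1, num_epochs + 1):
        for batch in batches:                   # one batch at a time
            params = update_fn(params, batch)
        checkpoints.append({
            "epoch": epoch,
            "params": copy.deepcopy(params),    # most up-to-date parameters
            "metrics": evaluate_fn(params),     # key performance metrics
        })
    return params, checkpoints

# Toy stand-ins: fit a single weight w towards the mean of each batch.
def update_fn(params, batch, lr=0.1):
    target = sum(batch) / len(batch)
    return {"w": params["w"] - lr * 2 * (params["w"] - target)}

def evaluate_fn(params):
    return {"loss": (params["w"] - 3.0) ** 2}

final_params, checkpoints = train(
    {"w": 0.0}, [[3.0, 3.0], [3.0]], update_fn, evaluate_fn, num_epochs=5
)
```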
4 Results and discussion
The predictive performance of the trained models was initially
evaluated using the test dataset, offering an early
indication of their effectiveness before validation on the
evaluation dataset. This assessment includes metrics such
as accuracy (Acc), precision (Prec), recall (Rec), F1-score
(F1), and loss, providing a preliminary baseline for each
model's generalization capacity. Table 3 summarizes the
best-performing models, selected based on their F1-score,
reflecting the balance between precision and recall.
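For reference, the metrics named above follow from the counts of true and false positives and negatives, with the F1-score being the harmonic mean of precision and recall. The sketch below uses the standard definitions; the function names and the example counts are hypothetical, and no guard against empty classes is included.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from binary confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }

metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)   # hypothetical counts
table_check = round(f1_score(0.809, 0.875), 3)             # EfficientNet row of Table 3
```

Applying `f1_score` to the precision and recall reported for EfficientNet in Table 3 reproduces the tabulated F1 value after rounding to three decimals.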
Figure 5 shows the training curves for the top two models
from each classification approach, highlighting the trends in
loss and accuracy over epochs. These visualizations help clarify
model stability and learning dynamics, setting the stage
for a more detailed validation using the evaluation dataset in
the subsequent section.
4.1 Model validation
To validate the effectiveness of the models studied and confirm
the quality of their applicability, field observations were
simulated using data collected during fieldwork conducted by
AIMM. Subsequently, the performance of the pre-established
models in training was assessed within the evaluation dataset
detailed in Sect. 2.3. The quantitative results from this assessment
are presented in Table 4.
The results show a clear fall in the overall performance
of all models. This is to be expected since both training and
testing data share the same origin source, and therefore, bear
far more similarities. From the presented values it is possible
to infer that the image-based identification achieved better
performance than its video-based counterpart, establishing
it as the superior methodology for this task within applied
models.
Table 3 Performance on test dataset

Model                  Acc    Rec    Prec   F1     Loss
EfficientNet           0.834  0.875  0.809  0.841  0.006
ConvNext               0.969  0.965  0.974  0.969  0.001
ResNet50               0.898  0.916  0.885  0.900  0.004
InceptionV3            0.923  0.930  0.917  0.924  0.003
VGG16_bn               0.924  0.945  0.908  0.926  0.005
ConvolutionalLayers    0.850  0.863  0.841  0.852  0.018
CombinedModel (1)      0.956  0.965  0.949  0.957  0.004
CombinedModel (2)      0.955  0.945  0.964  0.954  0.006
CNN-LSTM (Incep.V3)    0.898  0.950  0.856  0.900  0.453
CNN-LSTM (ConvNext)    0.939  0.924  0.948  0.936  0.628

Fig. 5 Training curves from the most relevant models implemented

Models based on the ConvNext architecture experienced
a smaller drop in performance. In particular, the ConvNext
model stands out as the model of choice, achieving the highest
accuracy and F1-score with values of 83.9% and 82.0%,
respectively. This model is expected to produce overall good
predictions, having achieved a better balance between recall
and precision. Even though a decrease in recall was observed,
the value of 86.7% still provides the model with a reduced
number of false negatives, while the precision value of 77.7%
leads to a reduced number of false positives compared to
the remaining image-based models, meaning that predictions
where the model finds the presence of dolphins are more
trustworthy.
The notable performance of this model can be traced back
to its training and testing curves, represented in Fig. 5, which,
when compared to the curves of other models, display a
higher degree of similarity, to the extent that they overlap.
This suggests a great generalization ability by showing the
model's capacity to obtain high accuracy values without overfitting
training data.
Remaining ConvNext-based architectures have also demonstrated
good performance. Notably, the performance of CombinedModel
(2) improved compared to CombinedModel
(1), achieving the highest recall value of 90.6%, which