Appendix: ShapeToVec: Encoding Polygonal Shapes
with Extreme Area Variability for Effective
Approximate Jaccard Similarity Queries
1 Feature Vector Max Recall Rate
The recall rate measures the fraction of true positives (true nearest neighbors) that a search over the index is expected to return. Since our approach preprocesses the input polygonal data into feature vectors, errors can arise during both the polygon encoding and the indexing phases. We designed the following experiment to estimate an upper bound on the recall rate attainable by the encoding phase alone, independent of the indexing method. We used bit-encoded feature vectors in this experiment.
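As an illustration of how Jaccard similarity can be evaluated directly on bit-encoded feature vectors, the following minimal Python sketch stores each vector as an integer bitset. It assumes, for illustration only, that bit i is set when the polygon occupies grid cell i; the function names `popcount` and `jaccard_bits` are hypothetical, and the exact bit layout used by ShapeToVec is not specified here.

```python
def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer (works on any Python 3 version)."""
    return bin(x).count("1")


def jaccard_bits(a: int, b: int) -> float:
    """Jaccard similarity |A & B| / |A | B| of two bit-encoded feature vectors.

    Hypothetical layout: bit i is 1 when the polygon occupies grid cell i.
    """
    union = popcount(a | b)
    if union == 0:
        return 0.0  # both vectors empty; returning 0.0 is a convention chosen here
    return popcount(a & b) / union


# Cells {1, 2, 3} versus cells {0, 1, 2}: 2 shared cells out of 4 occupied in total.
print(jaccard_bits(0b1110, 0b0111))  # 0.5
```

With this representation a comparison reduces to a bitwise AND, a bitwise OR, and two population counts, which is what makes bit-encoded vectors cheap to compare.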
A brute-force all-to-all comparison would be the most suitable method for this evaluation; however, we did not use it because it is impractical on large datasets. Instead, we first retrieved the 500 most similar polygons to each test polygon from the ground truth data. We then computed the Jaccard similarity between each test polygon and the retrieved polygons using their corresponding feature vectors, and sorted the retrieved polygons in descending order of this vector-based similarity to identify the 50 most similar ones. Finally, we compared the top 50 similar polygons from the ground truth dataset with these 50 polygons ranked by vector similarity. The resulting recall rates, recorded in Table 1, indicate that the feature vectors produced using the quadtree-based approach are much more accurate than those encoded using the uniform grid-based approach (77% to 79% versus 11%).
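The re-ranking procedure above can be sketched as follows. This is a minimal Python illustration, not the authors' code: each feature vector is assumed to be given as the set of its nonzero cell indices, `gt_ranked` is assumed to list the ground-truth candidates for one test polygon (for example, the 500 most similar) in descending order of true Jaccard similarity, and all names are hypothetical.

```python
from typing import AbstractSet, Hashable, Mapping, Sequence


def set_jaccard(a: AbstractSet[int], b: AbstractSet[int]) -> float:
    """Jaccard similarity of two feature vectors given as sets of nonzero cell indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_recall_at_k(
    query_vec: AbstractSet[int],
    gt_ranked: Sequence[Hashable],
    vectors: Mapping[Hashable, AbstractSet[int]],
    k: int = 50,
) -> float:
    """Recall@k of vector-based re-ranking against the ground-truth top k.

    gt_ranked: ground-truth candidates for one query, most similar first.
    vectors:   feature vector of every candidate, keyed by polygon id.
    """
    truth_top_k = set(gt_ranked[:k])
    if not truth_top_k:
        raise ValueError("gt_ranked must not be empty")
    # Re-rank the candidate pool by Jaccard similarity of the feature vectors.
    # sorted() stays stable with reverse=True, so ties keep ground-truth order.
    reranked = sorted(
        gt_ranked,
        key=lambda pid: set_jaccard(query_vec, vectors[pid]),
        reverse=True,
    )
    vector_top_k = set(reranked[:k])
    return len(truth_top_k & vector_top_k) / len(truth_top_k)


# Toy example: four candidates, k = 2.
vectors = {"a": {1, 2, 3}, "b": {7, 8}, "c": {1, 2}, "d": {9}}
print(max_recall_at_k({1, 2, 3}, ["a", "b", "c", "d"], vectors, k=2))  # 0.5
```

Averaging this value over all test polygons gives a single per-dataset figure. Because candidates come only from the ground-truth pool and no index is consulted, the value isolates encoding error and is, if anything, optimistic.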
Table 1: Max recall rates independent of the index method.
Encoding method        Grid size    Recall for K=50
Uniform grid-based     6,084        11%
Uniform grid-based     18,225       11%
Quadtree-based         6,004        77%
Quadtree-based         18,220       79%
2 Number of Nonzero Elements in a Feature Vector
The number of nonzero (NNZ) elements in a feature vector matters when comparing two vectors, because these elements carry the shape information about the polygons. Vectors with few nonzero elements may provide insufficient information for the comparison operation, potentially compromising recall. Figure 1 plots the NNZ count of the feature vectors for 100 polygons from the Parks dataset. We maintain a grid size of approximately 35K cells for each polygon encoding technique. The feature vectors encoded using the uniform grid-based approach tend to have fewer nonzero elements, whereas those generated using the quadtree-based approach tend to have more. This illustrates that the quadtree-based method can incorporate more information into the feature vector representing a polygon, suggesting its suitability for handling real-world datasets.
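To make the NNZ argument concrete, the following Python sketch encodes a polygon as a bit vector over a uniform n × n grid and counts its nonzero elements. It marks a cell when the cell center falls inside the polygon (an even-odd ray-casting test); this cell-assignment rule, the grid extent, and the function names are assumptions for illustration, not the encoder used in the paper. The example shows how a polygon that is small relative to a uniform cell can end up with no nonzero elements at all.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, poly: Sequence[Point]) -> bool:
    """Even-odd ray-casting test: cast a ray towards +x and count edge crossings."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def uniform_grid_bits(
    poly: Sequence[Point], extent: Tuple[float, float, float, float], n: int
) -> List[int]:
    """Bit vector over an n x n uniform grid; a cell is 1 if its center lies inside poly."""
    xmin, ymin, xmax, ymax = extent
    dx = (xmax - xmin) / n
    dy = (ymax - ymin) / n
    bits = []
    for row in range(n):
        for col in range(n):
            cx = xmin + (col + 0.5) * dx
            cy = ymin + (row + 0.5) * dy
            bits.append(1 if point_in_polygon(cx, cy, poly) else 0)
    return bits


def nnz(bits: Sequence[int]) -> int:
    """Number of nonzero elements in a feature vector."""
    return sum(1 for b in bits if b)


extent = (0.0, 0.0, 1.0, 1.0)
small = [(0.01, 0.01), (0.05, 0.01), (0.05, 0.05), (0.01, 0.05)]
# Coarse 4 x 4 grid misses the small polygon; a 64 x 64 grid captures it.
print(nnz(uniform_grid_bits(small, extent, 4)), nnz(uniform_grid_bits(small, extent, 64)))  # 0 4
```

A uniform grid can only recover such polygons by refining every cell, which inflates the vector length everywhere. A quadtree-based grid can subdivide cells more finely where needed, which is consistent with the higher NNZ counts in Figure 1; that adaptive subdivision is not reproduced in this sketch.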
[Plot: number of nonzero vector elements (y-axis) for each of 100 polygons (x-axis: Polygons); series: uniform grid-based encoding and quadtree grid-based encoding.]
Figure 1: The number of nonzero elements in the feature vectors over 100 polygons from the Parks dataset. Grid resolution is 35K.
[Plot, panels (a) and (b): recall (0 to 1) and query throughput (queries per second, in hundreds, log scale) versus grid resolution (18K, 36K, 72K, 144K, 1M, 16M); series: recall and QPS for uniform grid-based and quadtree-based encodings.]
Figure 2: Recall rate and query throughput comparison for different grid sizes over uniform grid-based vectors and quadtree-based bit vectors using 64 threads. A subset of 50k records from the Parks dataset was used for indexing (80%) and testing (20%). (a) Recall at K=50. (b) Recall at K=500.
[Plot, six panels: recall versus grid size (21K, 27K, 39K) for uniform grid-based and quadtree-based encodings. Panels (a) to (c) show recall at K=50; panels (d) to (f) show recall at K=500.
(a), (d): area range from 5 × 10^-7 to 5 × 10^-5, sample size = 127,806
(b), (e): area range from 5 × 10^-6 to 5 × 10^-5, sample size = 31,143
(c), (f): area range from 5 × 10^-7 to 5 × 10^-6, sample size = 96,663]
Figure 3: Recall rate comparison over different area ranges.
[Plot: recall, precision, and F1 score (score axis, 0.7 to 1) and query throughput (queries per second axis, 300 to 480) versus Top K (5, 10, 20, 40, 80, 160, 320).]
Figure 4: Performance metrics versus the number of nearest neighbors (K) on the Parks dataset. The plot illustrates the trade-offs between accuracy (recall, precision, F1 score) and query throughput.
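For reference, the accuracy metrics in Figure 4 and Table 2 can be computed per query from a retrieved set and a set of true neighbors, as in the minimal Python sketch below; F1 is the harmonic mean of precision and recall. This appendix does not spell out how the relevant set is defined for precision in these experiments, so the inputs here are assumptions and the function name is hypothetical.

```python
from typing import Hashable, Iterable, Tuple


def precision_recall_f1(
    retrieved: Iterable[Hashable], relevant: Iterable[Hashable]
) -> Tuple[float, float, float]:
    """Set-based precision, recall, and F1 for a single query."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    tp = len(retrieved_set & relevant_set)  # true positives
    precision = tp / len(retrieved_set) if retrieved_set else 0.0
    recall = tp / len(relevant_set) if relevant_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


p, r, f1 = precision_recall_f1([1, 2, 3, 4], [1, 2, 5])
print(p, round(r, 3), round(f1, 3))  # 0.5 0.667 0.571
```

If F1 is computed per query and then averaged, the averaged F1 need not equal the F1 obtained from averaged precision and recall.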
Table 2: Evaluation of quadtree-based encoding (K=500).
                                   Parks                   Water bodies            Sports
                                   3k     6k     12k       3k     6k     12k       3k     6k     12k
Bit encoding
  Recall                           80%    81%    82%       76%    74%    78%       63%    66%    79%
  Precision                        63%    65%    65%       60%    58%    63%       62%    65%    76%
  F1 score                         68%    70%    70%       67%    65%    70%       63%    66%    77%
  Query throughput (queries/s)     1,642  1,692  1,572     1,876  1,642  1,192     3,937  2,865  1,199
Floating-point encoding
  Recall                           97%    97%    97%       97%    98%    98%       95%    96%    96%
  Precision                        79%    79%    79%       78%    78%    79%       91%    92%    92%
  F1 score                         85%    85%    85%       86%    87%    87%       93%    93%    93%
  Query throughput (queries/s)     1,115  696    357       246    512    286       974    657    410