Multilevel Annotation of Informativeness
in Linguistic Corpora: INFOLEXIS
Olga Batiukova, Keigh Rim & James Pustejovsky
Autonomous University of Madrid, Brandeis University
olha.ba siuko [email protected], krim@brandeis.edu, jamesp@brandeis.edu
Acknowledgements
Overview
•Semantic annotation of informativeness-based
constraints on predicate construction.
•Acceptable phrases must be consistent and
informative.
•Annotated corpora: COCA and Corpus del Español
Web/Dialects
Annotation scheme
Annotated constructions
(1) a. Eat ravioli – eat ??(organic) food
b. {Rewritten / published} novel – ??(recently) written novel
c. This novel was {rewritten / published} – This novel was written ??(recently)
d. This book is yellowing – This book reads ??(fast).
Annotation examples
STEP 1. All the examples in (1) are consistent. Inconsistent
example: The novel was devoured.
STEP 2. First member of each pair in (1): informative.
Second member: uninformative.
STEP 3. Additional modifiers in uninformative expressions:
‘recently read novel’, ‘this book reads fast’.
Contrastive interpretation of uninformative expressions: ‘a
WRITTEN novel’ is opposed to the novels that never came to
exist.
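The three steps can be read as a nested decision procedure: informativeness is judged only for consistent constructions, and additional modifiers are recorded only for uninformative ones. A minimal Python sketch of that dependency, assuming a hypothetical record layout (the `Annotation` class and its field names are illustrative and are not taken from the INFOLEXIS tools):

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Annotation:
    """Hypothetical record for one annotated construction."""
    text: str
    consistent: bool                     # STEP 1
    informative: Optional[bool] = None   # STEP 2, only if consistent
    modifiers: List[str] = field(default_factory=list)  # STEP 3

    def __post_init__(self) -> None:
        # STEP 2 applies only to constructions judged consistent in STEP 1.
        if not self.consistent and self.informative is not None:
            raise ValueError("STEP 2 applies only to consistent constructions")
        # STEP 3 applies only to constructions judged uninformative in STEP 2.
        if self.modifiers and self.informative is not False:
            raise ValueError("STEP 3 applies only to uninformative constructions")


# Labels follow the examples in (1) and STEP 1 above.
examples = [
    Annotation("published novel", consistent=True, informative=True),
    Annotation("written novel", consistent=True, informative=False,
               modifiers=["recently"]),
    Annotation("The novel was devoured", consistent=False),
]
```

Encoding the step order as validation makes it impossible to store, for instance, an informativeness label on an inconsistent construction; whether the actual annotation tools enforce this is not stated here.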
Verb classes: acquisition, caused motion,
communication, consumption, creation, physical change
of state, perception, psychological state, putting,
transfer, light.
INFOLEXIS Corpus Searcher (Step 1)
Main contributions of INFOLEXIS
Related to the CLARIN objectives:
1. Novel multilevel annotation scheme applicable to all
natural languages.
2. Three manually annotated datasets:
•Syntactic constructions classified as (in)consistent
and linked to their original syntactic context (STEP 1).
•Consistent constructions classified as (un)informative
(STEP 2).
•Uninformative constructions with syntactic
components that make them acceptable (STEP 3).
3. Annotation tools that can be reused in other manual
annotation projects.
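The three datasets in item 2 can be thought of as successive filters over one pool of annotated constructions. The sketch below illustrates that relationship under stated assumptions: the dictionary keys (`consistent`, `informative`, `modifiers`) and the function `split_datasets` are hypothetical and do not describe the actual INFOLEXIS file format.

```python
from typing import Dict, List, Optional, TypedDict


class Record(TypedDict):
    """Hypothetical layout of one annotated construction."""
    text: str
    consistent: bool
    informative: Optional[bool]  # None when the construction is inconsistent
    modifiers: List[str]


def split_datasets(records: List[Record]) -> Dict[str, List[Record]]:
    # STEP 1: every construction, labelled as (in)consistent.
    step1 = list(records)
    # STEP 2: only consistent constructions, labelled as (un)informative.
    step2 = [r for r in step1 if r["consistent"]]
    # STEP 3: uninformative constructions that carry a modifier.
    step3 = [r for r in step2 if r["informative"] is False and r["modifiers"]]
    return {"step1": step1, "step2": step2, "step3": step3}


records: List[Record] = [
    {"text": "published novel", "consistent": True,
     "informative": True, "modifiers": []},
    {"text": "written novel", "consistent": True,
     "informative": False, "modifiers": ["recently"]},
    {"text": "The novel was devoured", "consistent": False,
     "informative": None, "modifiers": []},
]
datasets = split_datasets(records)
```

With these three sample records, each dataset is a subset of the previous one, which mirrors the way STEP 2 and STEP 3 build on the output of the preceding step.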
Theoretical contribution:
•A valuable empirical tool for the study of
informativeness as an interface phenomenon, involving
lexical-semantic, syntactic, and pragmatic factors.
•INFOLEXIS contributes to the understanding of the
mechanisms underlying selectional constraints at the
phrasal and sentential level.
This presentation has been funded by:
•MCIN/AEI/10.13039/501100011033 and FSE+ under the project
PID2022-138135NB-I00.
•Comunidad de Madrid under the agreement “The node CLARIAH-
CM: Digital Humanities and Language Technologies” (4180134).