Quantum Criticism: an Analysis of Political News Reporting

Author: 'Adhim, Achmad Fauzil
Publisher: Zenodo
DOI: 10.5281/zenodo.17721668
Source: https://zenodo.org/records/17721668/files/7220mlaij01.pdf
Machine Learning and Applications: An International Journal (MLAIJ) Vol.7, No.1/2, June 2020
DOI:10.5121/mlaij.2020.7201
QUANTUM CRITICISM: AN ANALYSIS OF POLITICAL
NEWS REPORTING
Ashwini Badgujar1, Sheng Chen1, Pezanne Khambatta1, Tuethu Tran1,
Andrew Wang1, Kai Yu1, Paul Intrevado2 and David Guy Brizan1
1Department of Computer Science
University of San Francisco, San Francisco, CA, USA
2Department of Mathematics and Data Science
University of San Francisco, San Francisco, CA, USA
ABSTRACT
In this project, we continuously collect data from the RSS feeds of traditional news sources. We apply
several pre-trained implementations of named entity recognition (NER) tools, quantifying the success of
each implementation. We also perform sentiment analysis of each news article at the document, paragraph
and sentence level, with the goal of creating a corpus of tagged news articles that is made available to the
public through a web interface. We show how the data in this corpus could be used to identify bias in news
reporting, and also establish different quantifiable publishing patterns of left-leaning and right-leaning
news organisations.
KEYWORDS
Content Analysis, Named Entity Recognition, Sentiment Analysis, Politics, News
1. INTRODUCTION
Many of us implicitly believe that the news we consume is an important summary of the events
germane to our lives. Regardless of how we divide ourselves (by demographics, political
leaning, profession or other socioeconomic schism), we rely on trusted individual journalists and
the news organizations to which they belong to distill stories and provide unbiased context.
There are several organizations that attempt to address this need. USAFacts.org is a non-profit
organization and website which offers a non-partisan portrait of the US population, its
government’s finances, and government’s impact on society. Similar sites and outlets have had
the same mission, perhaps most prominently MIT’s Data USA and the US government’s
data.gov. These efforts, however, largely deal with quarterly or bi-annual government reports,
excluding day-to-day news analysis about business, politics, etc.
More timely news on these excluded topics can typically be found reported on by private news
organizations, often funded by a subscription or ad-based model. There is, however, a subset of
articles that are freely available to the public. News producers often promote selected articles
through their Really Simple Syndication (RSS) feeds, consumed by phone or web applications such
as Feedly, NewsBlur and FlowReader, among others.
News organizations should be a reflection of the populations they represent. Yet, despite ease of
access to news articles through RSS feeds, we find a dearth of resources supporting the analysis
of said news articles, e.g., how the news is reported or how it may be affecting our lives over
time. For example, observing climate change denial, one journalist from Vox, David Roberts, has
named the current American philosophical divide “tribal epistemology,” specifically discussing
the tribalism of information through the news [1]. While his presentation is compelling, the idea
of tribal epistemology is largely delivered without an analysis of the news from sources which he
critiques. Roberts’s lack of analysis could be the result of having no facile manner to find and
analyse daily news articles from multiple sources in a single corpus.
In our survey of existing news corpora (Section 2), we find existing corpora lacking in one or
more aspects, including cost, availability, coverage and/or analysis. We therefore create our own
corpus, Quantum Criticism, to address these issues. Specifics of the tools and approaches we use
to build our corpus are discussed in Section 3. We discuss the performance of our tools in Section
4. We aspire for our corpus to be used by journalists and by those in academic research to
establish trends, identify differences, and effect change in news reporting and its interpretation. In
Section 5, we demonstrate two ways in which our corpus can be used to uncover potential media
bias.
2. RELATED WORK
We begin the Related Work section by highlighting existing corpora that have some coverage or
analysis limitation, discussed in Section 2.1. Section 2.2 briefly reviews common tasks in natural
language processing, as well as some of the available tools for accomplishing those tasks. Lastly,
in Section 2.3, we explore several use-cases of existing news corpora.
2.1. Corpora
There are several outcomes of forming a news-based corpus. One may be the task of language
modelling. Journalists and news organizations can be barometers of when a word gets introduced
to a language. Another important use of news-based corpora is the derivation of large social
patterns from individual units of reporting.
The consumers of a news corpus must regard journalists and news organisations as imperfect
messengers. As far back as 1950, White [2] demonstrated that the news we read is frequently
collated by a set of “gate keepers” who filter candidate events. These gate keepers may have
biases based on ideological (liberal or conservative) leanings, race or gender [3], economic
interdependence [4] and geopolitical affiliation [5], likely only some of the many factors
influencing a news story’s selection. One use of a properly constructed corpus could be the
unearthing of selection bias or other biases.
Selection bias may be the result of the choices of not only the specific journalists but also the
news organizations and their owners [6]. In a large-scale study based on articles from the GDELT
database, [7] lays out the constraints under which the news organizations operate and quantifies the
selection bias of news organizations.
Prior to building our Quantum Criticism corpus, we considered a number of other corpora
assembled from news articles, all appearing online. The Linguistic Data Consortium (LDC) has
an extensive collection, including the New York Times corpus [8], which we use for validation of
our tools. (Details in Section 4.) This corpus contains 1.8 million news articles from the New
York Times over a period of more than 10 years, covering stories as diverse as political news and
restaurant reviews. Articles are provided in an XML format, with the majority of the articles
tagged for named entities (persons, places, organizations, titles and topics) so that these named
entities are consistent across articles.
The LDC also offers the North American News Corpus [9], assembled from varied sources,
including the New York Times, the Los Angeles Times, the Wall Street Journal and others. The
primary goals of this corpus are support for information retrieval and language modelling, so the
count of “words” (almost 350 million tokens) is more important than the number of articles.
Also offered by the LDC is the Treebank corpus [10], often called the Penn Treebank, which has
been an important and enduring language modelling resource; see [11] for an early use of this
corpus, and [12] for a more recent implementation.
Collectively, the LDC corpora and their like are excellent resources for news generated from a
discrete number of sources during a particular period of time. Because of their volume of articles
and tokens, and because they are mostly written in Standard American English, they are ideal for
building language models from the period during which they were collected. However, we find
the aforementioned corpora broadly lacking in a number of areas, chiefly in their static nature:
these corpora do not continuously collect new articles. Depending on the research being
conducted, researchers may require current articles as well as historic ones. We also find flaws in
the tagging of the articles in the New York Times Annotated Corpus, but leave the full treatment
of this to Section 4. Finally, we find that processing these articles requires a non-trivial cost and
effort. Finding articles in which a particular person, place or organisation is mentioned requires a
search through a considerable number of articles, for which there are no additional tags.
In contrast to the offerings by the LDC, the Global Database of Events, Language and Tone [13],
known as GDELT, has a dizzying array of tools for searching and analysing their corpus. With a
public, no-cost access to articles from 1979 to present, albeit offered at a 48-hour delay, and a
commitment to the continued collection of news from a wide variety of sources, GDELT’s
offerings have resulted in insightful results, some of which are explored herein.
One criticism of GDELT by Ward et al. [14] is that the collection effort has been optimized for
volume of news articles and speed of analysis through automated techniques, sacrificing the
careful curation of articles. This results in the improper classification of articles, erring mostly
toward false positives, i.e., presenting more news articles as related to an event than is warranted.
In terms of implementation, our Quantum Criticism corpus is closest to the News on the Web
(NOW) Corpus, itself a public-facing version of the Corpus of Contemporary American English
[15]. As of the time of this writing, this corpus reports containing 8.7 billion words from a
number of American English sources, including such varied sources as the Wall Street Journal
and tigerdroppings.com, the student newspaper of Louisiana State University. While the diversity
of our Quantum Criticism corpus is not as extensive as what we find in the NOW Corpus, our
initial version of the Quantum Criticism corpus contains one non-American English source and
allows the user to specify the source(s) for a query. We believe the power of our search and
presentation makes our corpus a better analysis tool.
2.2. Overview of NLP Tools
We analyse news articles in two ways: through named-entity recognition and sentiment analysis.
Our search tool exposes the results of these analyses simultaneously. In free-form text,
named-entity recognition (NER) seeks to locate and classify the names of (among other entities)
people, organisations and locations. Although there are other possible categories of named entities, we
selected these three classes based on available resources and commonality of model outputs.
Three powerful and oft-used NER tools include BERT (Bidirectional Encoder Representations
from Transformers, [16]), which uses BIO tagging, CoreNLP [17], which offers both IO and BIO
tagging, and spaCy [18], which employs IOB tagging.
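To make the tagging schemes concrete, the following is a minimal sketch of how a BIO (equivalently, IOB) tag sequence can be grouped into entity spans. The function name `bio_to_spans` and the short labels PER and LOC are illustrative assumptions, not the output format of any particular tool named above.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs.

    Hypothetical helper for illustration: tags look like "B-PER", "I-PER" or "O".
    """
    spans: List[Tuple[str, Optional[str]]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_type):
            # A new entity starts here; flush any entity that was still open.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], label
        elif prefix == "I":
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
        else:
            # "O": the token is outside any entity.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


tokens = ["Alexandria", "Ocasio-Cortez", "visited", "San", "Francisco", "."]
tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O"]
spans = bio_to_spans(tokens, tags)
```

Because an I- tag whose label differs from the open entity also starts a new span, the same sketch accepts IO-style input; under IO tagging, however, two adjacent entities of the same type cannot be told apart.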
2.3. Use Cases of Corpora
Using corpora and NLP tools, we can discover the biases of a journalist, a news organisation or
the target audience of the news. The effects of biases can effect change on the political or
sociological lives of a people. We see some interesting examples of these effects. While work by
Rafail and McCarthy [19] stops short of the claim that some news organizations made the Tea
Party (a small, right-leaning movement) a political force, there may be ample evidence to draw
such a conclusion. The suggestion is that the news media simplified the message of the party so
that it could be consumed by a wide audience, as well as amplified the coverage of the party’s
events beyond the size its supporters would normally warrant given their numbers.
A more pernicious effect may be seen in the coverage of the Persian Gulf “Crisis” and
subsequent war of the early 1990s [20]. Here, the media was focused on stories which, among
other effects, made readers inclined to favour military rather than diplomatic paths. In turn, this
had an effect on the political leadership of the time. The authors also find an interesting effect
wherein the selection bias of stories was proportional to public interest in such stories.
Interestingly, work by Soroka et al. [21] suggests the opposite effect may be at work. Here, the
“strength” of sentiment in social media reactions differs from the news media coverage in some
economic news coverage. As a result, the contexts and degree to which public opinion affects
news coverage or vice versa deserve additional study.
Systematic analysis of media coverage often involves framing the content from the point-of-view
of the reader. A paper by An and Gower [22] discusses five frames (attribution of responsibility,
human interest, etc.) and two “responsible parties” (individuals vs. organizations) in coverage of
crises, finding that some frames are more common than others. Similarly, Trumbo [23] examines
the differing reactions of scientists and politicians to climate change. While analysis approaches
tend to focus on the content produced, work by Ribeiro et al. [24] examines the political leanings
and demographics of the target audience through the advertising associated with the content. We
see this kind of side-channel investigation as promising, especially if applied systematically to a
large set of data.
Our Quantum Criticism corpus is designed with these types of analysis in mind. We tag each
article for named entities and sentiment and expose this corpus to the public. We expect this
corpus to have multiple purposes, including sociological research on influential people and
organizations, “framing” news articles and assigning responsible parties, and the detection of
selection bias and other biases in a media organization’s coverage. We provide details on how
each element of our pipeline is built, and quantify the performance using well-established
metrics. We conclude by validating the tools employed and discussing two use cases of our
corpus.
3. CORPUS AND DATA PROCESSING
The data used for our Quantum Criticism effort was collected, managed, and processed using a
proprietary system designed to scrape, parse, store and analyse the content of news articles from a
variety of sources. Several sentiment and named entity recognition tools were run against the
collected news articles. We also implemented a custom entity resolution algorithm, providing a
rich data set upon which to explore several hypotheses. A pictorial summary of the ingestion,
analysis and storage pipeline is shown in Figure 1.
Figure 1: A Summary of the Ingestion, Analysis and Storage Pipeline
3.1. News Scraper
Several custom web scrapers were created for retrieving news articles from various online news
organizations. All web scrapers were run every two hours to retrieve articles from the following
five news sites: the Atlantic, the British Broadcasting Corporation (BBC) News, Fox News, the
New York Times and Slate Magazine. Web scrapers continue to run every two hours in
perpetuity, scraping additional news articles. Collectively, the web scrapers used each news
organization’s RSS feed as input, storing the scraped output into a custom database. Article
URLs were used for disambiguation; where two scraped articles shared a URL, the most recently
retrieved article replaced previous versions of articles.
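The URL-based disambiguation rule can be sketched as follows. This is an illustration only, assuming a standard RSS 2.0 layout (`item`, `title`, `link`) and using an in-memory dictionary in place of the database; the function name `upsert_feed`, the stored fields and the example.org feed contents are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def upsert_feed(rss_xml: str, retrieved_at: str, store: Dict[str, dict]) -> None:
    """Parse one RSS document and store its items keyed by article URL.

    An item whose URL is already present replaces the earlier version,
    mirroring the "most recently retrieved article wins" rule.
    """
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        url = (item.findtext("link") or "").strip()
        if not url:
            continue  # without a URL the item cannot be disambiguated
        store[url] = {
            "title": (item.findtext("title") or "").strip(),
            "retrieved_at": retrieved_at,
        }


FEED_V1 = """<rss version="2.0"><channel>
<item><title>Story (first version)</title><link>https://example.org/story</link></item>
</channel></rss>"""

FEED_V2 = """<rss version="2.0"><channel>
<item><title>Story (updated)</title><link>https://example.org/story</link></item>
<item><title>Another story</title><link>https://example.org/other</link></item>
</channel></rss>"""

articles: Dict[str, dict] = {}
upsert_feed(FEED_V1, "2020-01-01T00:00", articles)
upsert_feed(FEED_V2, "2020-01-01T02:00", articles)  # two hours later
```

After the second run the store holds two articles, and the shared URL maps to the updated version only.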
As of July 2020, we had collected a total of 150,000 news articles from nine media organizations.
Figure 2 depicts the number of cumulative articles scraped for each news organization over time.
Even though articles from Fox News were regularly scraped four months later than other news
sources, the number of articles scraped rose quickly, and Fox News is now the news organization
with the most scraped articles. Given the news scrapers run at regularly scheduled two-hour
intervals for all news organizations, this suggests that Fox News updates its RSS feed with new
articles far more often than others, and the Atlantic updates its RSS feed far less frequently than
others.
Figure 2: Cumulative Quantity of Articles Scraped by News Organization
3.2. Data and Database Management
All scraped data is stored in a MariaDB relational database. We considered a NoSQL database,
especially one focused on storing documents, such as MongoDB; however, we found that a
relational database was appropriate for the needs of this project.
We constructed many “primary” tables to support the scraped articles. The most important of
these tables are the article, media (e.g., The Atlantic, BBC, etc. representing the news
organization) and entity (a named person, location or organization) tables. To support modelling
the many-to-many relationship between article and entity, we have one “join” table (article
entity). To support the work in sentiment analysis and named entity recognition, we also created
tables to store the outputs of the algorithms for these tasks. For sentiment analysis, we created a
table called “sentiment.” For named entity recognition, we created a table “entity.” Other tables in
our schema are omitted for brevity. Courtesy of dbdiagram.io, a schema appears as Figure 3.
Figure 3: Schema of the Quantum Criticism Database
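The many-to-many relationship between articles and entities can be illustrated with a small sketch. It uses SQLite from the Python standard library as a stand-in for MariaDB, and every column name, the join table name `article_entity` and the sample rows are assumptions made for illustration rather than the schema shown in Figure 3.

```python
import sqlite3

# Hypothetical, simplified schema: column names are illustrative only.
SCHEMA = """
CREATE TABLE media   (id INTEGER PRIMARY KEY, name TEXT, url TEXT);
CREATE TABLE article (id INTEGER PRIMARY KEY,
                      media_id INTEGER REFERENCES media(id),
                      url TEXT UNIQUE, modified TEXT);
CREATE TABLE entity  (id INTEGER PRIMARY KEY, name TEXT, type TEXT);
CREATE TABLE article_entity (
    article_id INTEGER REFERENCES article(id),
    entity_id  INTEGER REFERENCES entity(id),
    PRIMARY KEY (article_id, entity_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO media VALUES (1, 'Example Outlet', 'https://example.org')")
conn.execute("INSERT INTO article VALUES (1, 1, 'https://example.org/a', '2020-01-01')")
conn.execute("INSERT INTO article VALUES (2, 1, 'https://example.org/b', '2020-01-02')")
conn.execute("INSERT INTO entity VALUES (1, 'Alexandria Ocasio-Cortez', 'PER')")
conn.executemany("INSERT INTO article_entity VALUES (?, ?)", [(1, 1), (2, 1)])


def articles_mentioning(connection: sqlite3.Connection, name_part: str) -> list:
    """Return URLs of articles linked to any entity whose name contains name_part."""
    rows = connection.execute(
        """
        SELECT a.url
        FROM article a
        JOIN article_entity ae ON ae.article_id = a.id
        JOIN entity e ON e.id = ae.entity_id
        WHERE e.name LIKE ?
        ORDER BY a.url
        """,
        (f"%{name_part}%",),
    ).fetchall()
    return [row[0] for row in rows]
```

A join table of this kind lets one entity row serve every article that mentions it, which is what makes partial-name searches across articles inexpensive.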
3.3. Sentiment Analysis
For each news article, we generated a sentiment score. We employed both the VADER (Valence
Aware Dictionary and sEntiment Reasoner) [25] module, as implemented in NLTK [26] in
Python, as well as CoreNLP sentiment analysis. Sentiment scores in VADER are continuous
values between -1 (very negative) and +1 (very positive), with 0 representing neutral sentiment.
Sentiment scores in CoreNLP are integer values between 0 (very negative) and 4 (very positive),
with 2 representing neutral sentiment.
Sentiment analysis tools were run against each sentence and each paragraph in the article, as well
as on the entire article. For example, if an article contained two paragraphs, where paragraph 1
contains two sentences and paragraph 2 contains one sentence, we would have calculated six
different sentiment scores per sentiment analysis tool: one for each sentence (3), one for each
paragraph (2), and one for the article (1). This deconstructed approach allows researchers to
associate named entities with their associated sentiment at a quantum level. This granular level of
sentiment may help disambiguate the sentiment of an article with respect to the named entities.
For example, an article from a conservative news organization may be positive overall; however,
it is likely to be more critical of more liberal politicians, organizations or causes mentioned
therein, and more supportive of conservative organizations or causes. Our quantum approach to
sentiment analysis allows researchers to parse sentiment at the sentence level and associate that
sentiment with named entities, independently of the paragraph or article in the aggregate.
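The six-score example can be reproduced with a short sketch. The scorer below is a toy word-count placeholder, not VADER or CoreNLP, and the regular-expression sentence splitter, the blank-line paragraph convention and all function names are assumptions chosen to keep the example self-contained.

```python
import re
from typing import Callable, List


def toy_score(text: str) -> float:
    """Placeholder scorer (NOT VADER): counts a few marked words, clipped to [-1, 1]."""
    positive = {"good", "great", "praised"}
    negative = {"bad", "poor", "criticized"}
    words = re.findall(r"[a-z']+", text.lower())
    hits = sum((w in positive) - (w in negative) for w in words)
    return max(-1.0, min(1.0, hits / 3))


def split_sentences(paragraph: str) -> List[str]:
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def score_all_levels(article: str, score: Callable[[str], float] = toy_score) -> List[dict]:
    """Return one row per article, per paragraph and per sentence."""
    rows = [{"level": "article", "paragraph": None, "sentence": None,
             "score": score(article)}]
    paragraphs = [p for p in article.split("\n\n") if p.strip()]
    for p_idx, paragraph in enumerate(paragraphs, start=1):
        rows.append({"level": "paragraph", "paragraph": p_idx, "sentence": None,
                     "score": score(paragraph)})
        for s_idx, sentence in enumerate(split_sentences(paragraph), start=1):
            rows.append({"level": "sentence", "paragraph": p_idx, "sentence": s_idx,
                         "score": score(sentence)})
    return rows


article = "The senator praised the bill. Critics called it bad.\n\nThe vote was great."
rows = score_all_levels(article)
```

For this two-paragraph, three-sentence article the sketch yields six rows per scorer: one for the article, two for the paragraphs and three for the sentences.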
3.4. Named Entity Resolution
We employed eight named entity recognition (NER) models from CoreNLP, spaCy, and BERT
packages to identify PERSONs, ORGANIZATIONs and LOCATIONs in news articles. While
some models predict different NER categories, we sought only those entities which were tagged
using the algorithm depicted in Figure 4.
Figure 4: Pseudo-code for Named Entity Resolution Within a News Article
We store each NER model output individually in our database. In many articles, a named entity is
referenced by a complete name or title, and subsequently, by a shortened version. For example, a
recent New York Times Opinion article (Democrats’ Vulnerabilities? Elitism and Negativity)
first refers to politician Alexandria Ocasio-Cortez by the full name, and then subsequently as
Ocasio-Cortez. In order to connect references to the same named entity, we implemented a
custom entity resolution algorithm. Owing to the highly structured manner in which we observed
news articles were written, we expected to observe the pattern of an entity’s full name, followed
by partial name. Our algorithm therefore matched any name extracted in an article as a substring,
to the most recent instance of another name in the same article. Where a match occurred, the two
names are determined to be the same entity. Such an entry is matched or created by category
(PERSON, LOCATION, ORGANIZATION).
This process often failed for abbreviations such as the acronym F.B.I. (in reference to the
Federal Bureau of Investigation) with periods left out, resulting in FBI. We therefore also
created custom code to query a corpus of abbreviations and associate acronyms to their full
names. Only full names were stored in the database. We label each such instance of the full name
a resolved entity. The entity resolution algorithm is depicted in Figure 4. Entities are also
resolved across articles in a similar manner.
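A minimal sketch of the substring-matching idea, as described in the text, is given below. It is not the algorithm of Figure 4: the one-entry `ABBREVIATIONS` dictionary stands in for the abbreviation corpus, and the function names and matching details (exact substring test, same category required) are assumptions.

```python
from typing import List, Tuple

# Hypothetical abbreviation lookup; the abbreviation corpus used in the paper is not specified.
ABBREVIATIONS = {"FBI": "Federal Bureau of Investigation"}


def normalize(name: str) -> str:
    """Expand a known acronym after dropping periods (F.B.I. -> FBI -> full name)."""
    compact = name.replace(".", "")
    return ABBREVIATIONS.get(compact, name)


def resolve_entities(mentions: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Resolve (name, category) mentions, given in document order.

    A mention that is a substring of an earlier name in the same category is
    mapped to the most recent such name; otherwise it becomes a new entity.
    """
    seen: List[Tuple[str, str]] = []      # resolved names so far, in order
    resolved: List[Tuple[str, str]] = []
    for raw_name, category in mentions:
        name = normalize(raw_name)
        match = None
        for prior_name, prior_category in reversed(seen):  # most recent first
            if prior_category == category and name in prior_name:
                match = prior_name
                break
        full_name = match or name
        seen.append((full_name, category))
        resolved.append((full_name, category))
    return resolved


mentions = [
    ("Alexandria Ocasio-Cortez", "PERSON"),
    ("F.B.I.", "ORGANIZATION"),
    ("Ocasio-Cortez", "PERSON"),
    ("FBI", "ORGANIZATION"),
]
resolved = resolve_entities(mentions)
```

In this sketch both later mentions collapse onto the full names introduced earlier, so only full names would be written to storage.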
3.5. Web Interface
We designed and implemented a web interface for our corpus. Through this interface, a user can
specify basic search criteria for the articles, specifically: the entity name, in whole or part, of the
entity to be searched; the news source(s) to be searched from among those in our database; and
the date or date range of the articles. (See Figure 5.)
Figure 5: Search Screen of Web Interface
Additionally, advanced search criteria (not shown in Figure 5 but available on the live web
interface) allow the user to include additional filters for specific NER tools, the sentiment tools,
and/or the level of granularity (article, paragraph or sentence) to be reported. Upon executing a
successful query, a small subset of the results is displayed so that the user may perform a quick
validation. In addition, a link is provided to allow the user to download the full set of results in
comma-separated value (CSV) format. Each row in the result set contains the fields listed in
Table 1.
Table 1: Fields in the Result Set of the Web Interface
Field            Type     Notes
id               integer  Database table ID
entity           string   Full name of entity
entity id        int      Database table ID
type             Enum     PER, LOC or ORG
date             Date     Last modified date
url              String   Article’s URL
NER tool         String   Name of the NER tool
paragraph        int      Possibly NULL or empty
sentence         int      Possibly NULL or empty
sentiment score  float
sentiment tool   String   Name of the model
media name       String   “Fox News”, “Slate,” ...
media url        String   URL of the news org.
Notably absent from the columns in the result set are the contents of the article. This absence is
deliberate. While recent legal rulings have suggested that distributing content produced by third
parties is permissible, we are unsure about whether that ruling is the final word on this or whether
the ruling applies globally. As a result, we provide the URL of the source article, allowing the
user to download the content themselves.
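As an illustration of how a downloaded result set might be consumed, the sketch below averages sentence-level sentiment per outlet. The snake_case column headers, the convention that non-sentence rows leave the sentence field empty, and the sample rows are all assumptions; the actual CSV headers may differ from the field names in Table 1.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Fabricated sample rows for illustration; headers are assumed, not taken from the live export.
SAMPLE_CSV = """id,entity,entity_id,type,date,url,ner_tool,paragraph,sentence,sentiment_score,sentiment_tool,media_name,media_url
1,Example Person,7,PER,2020-01-01,https://example.org/a,spaCy,1,2,0.5,VADER,Outlet A,https://example.org
2,Example Person,7,PER,2020-01-01,https://example.org/a,spaCy,1,,0.1,VADER,Outlet A,https://example.org
3,Example Person,7,PER,2020-01-02,https://example.com/b,spaCy,3,1,-0.25,VADER,Outlet B,https://example.com
"""


def mean_sentence_sentiment(csv_text: str) -> dict:
    """Average the sentence-level scores per outlet, skipping coarser-grained rows."""
    by_outlet = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["sentence"]:  # an empty sentence field marks a paragraph- or article-level row
            by_outlet[row["media_name"]].append(float(row["sentiment_score"]))
    return {outlet: mean(scores) for outlet, scores in by_outlet.items()}


summary = mean_sentence_sentiment(SAMPLE_CSV)
```

Filtering on the granularity fields before aggregating avoids mixing sentence-level and paragraph-level scores for the same entity.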
4. VALIDATION
We employ well-studied tools with established performance benchmarks in our data ingestion
and processing pipeline. In this section, we describe how we evaluated the performance of those
tools. For the validations reported in this section, we used two news corpora: a historical New
York Times corpus and our Quantum Criticism corpus of scraped news articles.
4.1. Named Entity Recognition Validation
We test the efficacy of our eight NER models across three different NER tools using two
approaches. Firstly, we executed each of the models against the articles in the New York Times
Annotated Corpus with a 1st of December publication date across all 20 years covered by the
corpus. Secondly, we explore the fidelity of the NER tools by examining how well they identify
the 538 members of the U.S. Congress (Senate and House of Representatives), as identified by
Ballotpedia.
4.1.1. NER Tools vs. the LDC New York Times Corpus
To determine the fidelity of our results, we ran each NER model against the New York Times
Annotated Corpus [8], for which named entities are provided as an adjunct list. The corpus
contains 1.8 million articles from the New York Times from the years between 1987 and 2007.
While we found that the 4,713 articles from the 1st of December 1987–2007 were a sufficiently
ample volume from which to draw conclusions, we tested an additional ten months of data for the
spaCy and CoreNLP models, finding no significant deviation from the results we report here.
For each of the articles published on the 1st of December, we determined the mean (and standard
deviation) of the following in each article: the number of tokens per article: 587.2 (643.7); and
the number of named entities identified by the models per article: 31.8 (43.9). For each of the
models, we also computed the precision, recall and F1 score for each article. The BERT
bert.base.multilingual.cased model generated the highest mean precision and mean F1 scores of
0.1753 and 0.2549, respectively, whereas the highest mean recall score was obtained from the
CoreNLP english.all.3class.distsim.crf.ser model. We observed a consistently low F1 score
for all NER models, despite the variable number of entities identified by the classifiers. Some of
this poor performance may be explained by the models’ generation of improperly resolved
entities in the body of the article. However, we believe that this poor performance can be largely
attributed to errors in the labels of the source corpus.
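The per-article metrics can be sketched as set comparisons between the entities a model extracts and the corpus tags. The exact matching rule used in the evaluation is not stated here, so exact string equality is an assumption, and the "Entity A" to "Entity C" names are placeholders rather than entities from any real article.

```python
from typing import Set, Tuple


def precision_recall_f1(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 for one article from two sets of entity names."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Sparse gold labels: the model finds every tagged entity plus three untagged ones.
gold = {"Cara Buckley", "New York City", "Federal Bureau of Investigation"}
predicted = gold | {"Entity A", "Entity B", "Entity C"}
precision, recall, f1 = precision_recall_f1(predicted, gold)
```

With only three gold tags, every additional entity a model finds counts as a false positive, which lowers precision and F1 even when those extra entities are genuine. This is one way incomplete corpus labels could depress the reported scores.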
To confirm this hypothesis, we examined several articles from the New York Times Annotated
Corpus, and found disagreement with the named entities identified in the manual tagging of the
corpus. Filtering for the named entity classes PERSON, LOCATION and ORGANIZATION in
one of these examined articles, Homicides Up in New York; Other Crimes Keep Falling, we find
only three tags from the corpus: Cara Buckley, the article’s author; New York City, the location
being reported; and the Federal Bureau of Investigation. These instances identified by the corpus
are highlighted in blue. In contrast, one of the authors, a native English speaker who has
performed several annotation tasks on other projects, identified several other named entities.
organizations, as detailed in Section 3.4. Note here that certain media organizations for which we
only recently began collecting data have been omitted from this list, including Breitbart, CNN,
Reuters and The New Yorker.
Table 3: Mean Number of Entities per Article per Media Organization
News Organization   Mean Number of Entities per Article
BBC                 39.6
Fox News            29.8
New York Times      63.3
Slate               45.0
The Atlantic        65.4
When combined with data from Table 2, this suggests that Fox News publishes very frequently
in comparison to its media counterparts, but publishes articles that are far more focused, and
discuss roughly half as many people, places and/or organizations as the New York Times
does in each article.
To normalize this measure, the mean number of sentences per article was computed (see Table 4),
and the mean number of entities per article in Table 3 was divided by it to obtain the mean number
of entities discussed per sentence in Table 5. The results demonstrate that although Fox News
does produce articles with fewer mean sentences per article, each sentence discusses the
highest mean number of entities. Conversely, the BBC has the fewest named entities per
sentence.
Table 4: Mean Number of Sentences per Article per Media Organization
News Organization   Mean Number of Sentences per Article
BBC                 51.9
Fox News            32.8
New York Times      77.5
Slate               53.0
The Atlantic        82.8
Lastly, we explore mean sentiment associated with different news organizations in Figure 12
using both CoreNLP (scoring integer values from 0 to 4 inclusive, with 0 being negative and 4
being positive) and VADER (scoring from -1 to +1 inclusive, with -1 being negative and +1
being positive). Both the New York Times and Slate have statistically significantly higher mean
sentiment per article, when using both CoreNLP and VADER to measure sentiment. VADER
reports sentiment for these two media organizations as positive, whereas CoreNLP reports them as
negative, albeit less negative (i.e., more positive) than certain other news organisations.
Comparatively, Fox News is reported to have statistically significantly lower and negative
sentiment.
Table 5: Mean Number of Entities per Sentence per Media Organization
News Organization   Mean Number of Entities per Sentence
BBC                 0.764
Fox News            0.908
New York Times      0.816
Slate               0.849
The Atlantic        0.799
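The normalization behind Table 5 can be checked with a few lines of arithmetic: entities per sentence is the Table 3 value divided by the Table 4 value. The sketch below recomputes it from the rounded published means, so small differences from Table 5 are to be expected; the variable names are illustrative.

```python
# Values copied from Tables 3, 4 and 5.
entities_per_article = {
    "BBC": 39.6, "Fox News": 29.8, "New York Times": 63.3,
    "Slate": 45.0, "The Atlantic": 65.4,
}
sentences_per_article = {
    "BBC": 51.9, "Fox News": 32.8, "New York Times": 77.5,
    "Slate": 53.0, "The Atlantic": 82.8,
}
reported_entities_per_sentence = {
    "BBC": 0.764, "Fox News": 0.908, "New York Times": 0.816,
    "Slate": 0.849, "The Atlantic": 0.799,
}

# Entities per sentence = (entities per article) / (sentences per article).
entities_per_sentence = {
    outlet: entities_per_article[outlet] / sentences_per_article[outlet]
    for outlet in entities_per_article
}

densest = max(entities_per_sentence, key=entities_per_sentence.get)
sparsest = min(entities_per_sentence, key=entities_per_sentence.get)
```

Recomputed this way, each ratio lands within about 0.01 of the Table 5 figure, with Fox News highest and the BBC lowest, in line with the discussion above.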
Figure 12: Mean Sentiment per Article per News Organisation
A pattern begins to emerge here, where left-leaning organizations tend to write less frequent yet
significantly longer articles, with fewer references to named entities per paragraph, and an overall
increased level of sentiment. The longer articles also tend to include relatively many more named
entities in the articles, making them also more complex. Right-leaning organizations tend to write
frequent, shorter articles, with a high number of named entities per sentence with a negative mean
sentiment. However, they have fewer overall named entities per article, which implies the articles
are more focused on a smaller subset of ideas or connections. This recipe of shorter articles,
fewer named entities per article, negative sentiment, and higher frequency of publication seems to
be successful, with the Fox News network being the dominant news network on television in the
USA.
5.3. The August 2019 Mass Shooting in El Paso
We use an event as an exemplar so that we could investigate some phenomena more thoroughly.
The background of this event occurred in August 2019. The city of El Paso, TX was
unfortunately the site of a mass shooting. All of the media outlets covering the event (The
Atlantic, BBC, Fox News, the New York Times and Slate) acknowledged the shooter was racist
and that Mexicans were the primary target. All the outlets alluded to connections between this
event and two others (a mass shooting in Christchurch, New Zealand, and another in a
synagogue located in Dayton, OH) because of their time and motivations of the shooters.
Days after the event, the U.S. President, Donald Trump, visited El Paso. In addition to the
coverage of the shooting more broadly and the ensuing reactions from those in government and
entertainment, this visit was covered by three of the media outlets from which we collected data
(BBC, Fox News and the New York Times). We chose this visit because of its potential to
highlight the differences among the different media outlets, specifically as a large representation
of the coverage typical of each outlet. While other media outlets covered the background
shooting and related events, we could not find coverage of this specific event (Trump’s visit) in
the RSS feeds of the Atlantic or of Slate.
Table 6: Coverage of Trump’s Visit to El Paso following Mass Shooting
News Organization   Sentences   Standard Deviation of Sentiment (VADER)
BBC                 49          0.4287
Fox News            11          0.3634
New York Times      69          0.4087
We used NLTK’s sentence tokenizer to determine the number of sentences in each article. As
shown in Table 6, these resulted in very different lengths. The shortest, 11 sentences by Fox
News, might be characterised less as being about Trump’s actual visit and more as a description of a
contemporaneous discussion between New York City mayor Bill De Blasio and Fox News
commentator Sean Hannity, along with comments made by other political figures. By contrast, the
comparable articles in the BBC and the New York Times were longer (49 and 69 sentences
respectively) and mentioned more people overall (albeit fewer per sentence, as described in
Tables 4 and 5). These articles also used the extra space to provide additional context and
background, such as Trump’s history of being unable to console following a disaster and
politicians local to the El Paso area.
Using VADER as implemented in NLTK, we examined the overall sentiment as well as the
sentiment at the sentence level. The overall sentiment for each article is similar, hovering
between -0.2934 and -0.1057 for the BBC, Fox News and the NY Times. Figure 13 shows how
sentiment changes for each sentence through each article, scraped chronologically. We did
however observe a notably lower standard deviation of sentiment for sentences in Fox News
articles when compared with both the BBC and the New York Times (see Table 6).
Taken with the significantly lower sentiment for all articles, a profile of Fox News emerges. Their
articles are designed to be read quickly, contain more assertions by their commentators and other
popular figures and are written with a focus on the negative elements of newsworthy events.
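The spread statistic reported in Table 6 is a standard deviation over sentence-level scores. The sketch below shows the computation on two made-up score lists; the numbers are fabricated for illustration and are not scores from the articles discussed, and whether a population or sample standard deviation was used is not stated, so the population form here is an assumption.

```python
from statistics import pstdev
from typing import List


def sentiment_spread(sentence_scores: List[float]) -> float:
    """Population standard deviation of sentence-level sentiment scores."""
    return pstdev(sentence_scores)


# Illustrative toy data only: a uniformly negative article vs. one that swings widely.
steady_article = [-0.3, -0.2, -0.4, -0.1]
varied_article = [0.6, -0.8, 0.1, -0.5]

steady_spread = sentiment_spread(steady_article)
varied_spread = sentiment_spread(varied_article)
```

Two articles can share a similar overall sentiment while differing markedly in spread, which is why the sentence-level view adds information beyond the article-level score.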
Figu e 13: Sen imen pe Sen ence o a Rep esen a i e A icle pe Media O ganisa ion
5.4. Sen imen o José Se ano
Scouring left-leaning news organizations, we observed a peculiar pattern. When reporting on a
left-leaning politician (in America, typically a Democrat), the sentiment of the overall article
is lower than the sentiment of the paragraphs in which the politician in question is referenced,
which is itself lower than the sentiment of the sentence(s) in which the politician is mentioned.
In sum, the more the text focuses on the politician, the higher the sentiment. This has been shown
to be true of several left-leaning politicians when querying the Quantum Criticism corpus. This
rule, however, is violated when a seminal event occurs. For example, when José E. Serrano, the
Democrat representing the 15th district of New York, announced his retirement, the overall
sentiment of the article jumped to a value higher than either the paragraph-level or
sentence-level sentiment (see Figure 14), a change only detectable with the sentence-level
granularity provided by the Quantum Criticism corpus.
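The three-level comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: `toy_score` and `TOY_LEXICON` are hypothetical stand-ins for a real scorer such as VADER, the sentence splitter is deliberately naive, entity matching is a plain substring test rather than NER, and the article text and the name "Doe" are invented.

```python
import re
from statistics import mean

# Hypothetical word scores standing in for a real sentiment lexicon.
TOY_LEXICON = {"praised": 0.6, "respected": 0.5, "crisis": -0.6, "failed": -0.5}


def toy_score(text):
    """Average lexicon value of the known words in text; 0.0 if none match."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return mean(hits) if hits else 0.0


def split_sentences(paragraph):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]


def entity_sentiment(paragraphs, entity, score=toy_score):
    """Score one entity at the article, paragraph and sentence levels."""
    needle = entity.lower()
    para_hits = [p for p in paragraphs if needle in p.lower()]
    sent_hits = [s for p in para_hits
                 for s in split_sentences(p) if needle in s.lower()]
    return {
        "article": score(" ".join(paragraphs)),
        "paragraph": mean(score(p) for p in para_hits) if para_hits else None,
        "sentence": mean(score(s) for s in sent_hits) if sent_hits else None,
    }


# Invented two-paragraph article about a hypothetical politician named Doe.
paragraphs = [
    "The budget crisis deepened. Talks failed again.",
    "Colleagues praised Doe for her service. The crisis continued.",
]
levels = entity_sentiment(paragraphs, "Doe")
print(levels)
```

In this toy article the scores rise from article to paragraph to sentence, matching the usual pattern. An article in which the `article` value exceeds both of the others would correspond to the exception observed for Serrano's retirement.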
Figure 14: Sentiment of José E. Serrano at the Sentence, Paragraph and Article levels
6. CONCLUSION AND FUTURE WORK
We collected a database of news articles from five popular media organizations, placed each
article in a pipeline to identify named entities, and determined the affect for each named entity.
We identified interesting patterns and confirmed a geographic selection bias found by other
researchers. Collecting new news data every two hours, our platform shows great promise for
future research, and will further benefit from additional iterations.
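As a rough illustration of the periodic collection step, the sketch below extracts previously unseen items from an RSS 2.0 document using only the Python standard library. The function name, the in-memory `seen` set, and the sample feed are assumptions made for illustration; the paper does not describe its collector at this level of detail, and a real deployment would fetch each feed over HTTP and persist the seen links.

```python
import xml.etree.ElementTree as ET

POLL_INTERVAL_SECONDS = 2 * 60 * 60  # two hours, the interval stated in the text


def new_items(rss_xml, seen_links):
    """Return (title, link) pairs from an RSS 2.0 document not seen before."""
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        title = (item.findtext("title") or "").strip()
        if link and link not in seen_links:
            seen_links.add(link)  # remember the link so later polls skip it
            fresh.append((title, link))
    return fresh


# Invented feed used in place of a live RSS download.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example feed</title>
<item><title>Story A</title><link>https://example.com/a</link></item>
<item><title>Story B</title><link>https://example.com/b</link></item>
</channel></rss>"""

seen = set()
first_poll = new_items(SAMPLE_FEED, seen)
second_poll = new_items(SAMPLE_FEED, seen)
print(len(first_poll), len(second_poll))
```

Deduplicating on the item link keeps repeated polls from storing the same article twice, since feeds typically keep listing an item for longer than one polling interval.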
We aspire to make this tool even more useful through the addition of news articles from
additional news sources. Because news is sometimes underreported by organizations (see
Radiolab's Breaking Bongo [27] for one unusual case), we will also consider adding selected
tweets and other social media messages from individuals and organizations. We have already
collected hundreds of thousands of candidate tweets which we have not yet filtered for relevance
or made available. When coupled with better or customized tools for NER, sentiment and entity
resolution, we believe this project has the potential to uncover a wide range of phenomena.
The addition of one or more frameworks for coding event data, such as CAMEO, COPDAB or
others, would also increase the usefulness of the tool. Such frameworks would allow comparison
of the same set of events across different media outlets, communities and countries.
The authors acknowledge and thank culture critic Theodore Gioia, who originated the term Quantum
Criticism and was a guide and inspiration for this work. Software Engineer Nikhil Barapatre led
the effort to produce the web interface. The authors would also like to thank the Machine
Learning, Artificial and Gaming Intelligence, and Computing at Scale (MAGICS) Lab at the
University of San Francisco for supporting this research with mentorship and computational
infrastructure.
REFERENCES
[1] Roberts, D. (2017). Donald Trump and the rise of tribal epistemology. Vox Media.
[2] White, D. M. (1950). The “gate keeper”: A case study in the selection of news. Journalism Bulletin,
27(4):383–390.
[3] Gruenewald, J., Pizarro, J., and Chermak, S. M. (2009). Race, gender, and the newsworthiness of
homicide incidents. Journal of Criminal Justice, 37(3):262–272.
[4] Wu, H. D. (2000). Systemic determinants of international news coverage: A comparison of 38
countries. Journal of Communication, 50(2):110–130.
[5] Huiberts, E. and Joye, S. (2018). Close, but not close enough? Audiences' reactions to domesticated
distant suffering in international news coverage. Media, Culture & Society, 40(3):333–347.
[6] Baum, M. A. and Zhukov, Y. M. (2019). Media ownership and news coverage of international
conflict. Political Communication, 36(1):36–63.
[7] Bourgeois, D., Rappaz, J., and Aberer, K. (2018). Selection bias in news coverage: learning it,
fighting it. In Companion of the The Web Conference 2018 on The Web Conference 2018, pages
535–543. International World Wide Web Conferences Steering Committee.
[8] Sandhaus, E. (2008). The New York Times annotated corpus LDC2008T19.
[9] Graff, D. (1995). North American news text corpus.
[10] Marcus, M. P., Santorini, B., Marcinkiewicz, M. A., and Taylor, A. (1999). Treebank-3. Linguistic
Data Consortium, Philadelphia, 14.
[11] Bell, A., Jurafsky, D., Fosler-Lussier, E., Girand, C., Gregory, M., and Gildea, D. (2001). Form
variation of English function words in conversation. Submitted manuscript.
[12] Kann, K., Mohananey, A., Bowman, S., and Cho, K. (2019). Neural unsupervised parsing beyond
English. In Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP
(DeepLo 2019), pages 209–218.
[13] Leetaru, K. and Schrodt, P. A. (2013). GDELT: Global data on events, location, and tone, 1979–2012.
In ISA annual convention, volume 2, pages 1–49. Citeseer.
[14] Ward, M. D., Beger, A., Cutler, J., Dickenson, M., Dorff, C., and Radford, B. (2013). Comparing
GDELT and ICEWS event data. Analysis, 21(1):267–97.
[15] Davies, M. (2010). The corpus of contemporary American English as the first reliable monitor corpus
of English. Literary and Linguistic Computing, 25(4):447–464.
[16] Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of deep bidirectional
transformers for language understanding. arXiv preprint arXiv:1810.04805.
[17] Manning, C., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S., and McClosky, D. (2014). The Stanford
CoreNLP natural language processing toolkit. In Proceedings of 52nd annual meeting of the association
for computational linguistics: system demonstrations, pages 55–60.
[18] Honnibal, M. and Montani, I. (2017). spaCy 2: Natural language understanding with Bloom
embeddings, convolutional neural networks and incremental parsing. To appear.

[19] Rafail, P. and McCarthy, J. D. (2018). Making the tea party republican: Media bias and framing in
newspapers and cable news. Social Currents, 5(5):421–437.
[20] Iyengar, S. and Simon, A. (1993). News coverage of the gulf crisis and public opinion: A study of
agenda-setting, priming, and framing. Communication Research, 20(3):365–383.
[21] Soroka, S., Daku, M., Hiaeshutter-Rice, D., Guggenheim, L., and Pasek, J. (2018). Negativity and
positivity biases in economic news coverage: Traditional versus social media. Communication
Research, 45(7):1078–1098.
[22] An, S.-K. and Gower, K. K. (2009). How do the news media frame crises? A content analysis of crisis
news coverage. Public Relations Review, 35(2):107–112.
[23] Trumbo, C. (1996). Constructing climate change: claims and frames in US news coverage of an
environmental issue. Public Understanding of Science, 5:269–283.
[24] Ribeiro, F. N., Henrique, L., Benevenuto, F., Chakraborty, A., Kulshrestha, J., Babaei, M., and
Gummadi, K. P. (2018). Media bias monitor: Quantifying biases of social media news outlets at
large-scale. In Twelfth International AAAI Conference on Web and Social Media.
[25] Hutto, C. J. and Gilbert, E. (2014). VADER: A parsimonious rule-based model for sentiment analysis of
social media text. In Eighth International AAAI Conference on Weblogs and Social Media.
[26] Loper, E. and Bird, S. (2002). NLTK: The natural language toolkit. In Proceedings of the ACL-02
Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and
Computational Linguistics - Volume 1, ETMTNLP '02, pages 63–70, Stroudsburg, PA, USA.
Association for Computational Linguistics.
[27] Adler, S. “Breaking Bongo,” Radiolab, 26 November, 2019. New York City: WNYC Studios.
Available https://www.wnycstudios.org/podcasts/radiolab/articles/breaking-bongo. Accessed: 28
February, 2020.