https://doi.org/10.1177/1461444819885334
new media & society
2020, Vol. 22(10) 1868 –1884
© The Author(s) 2019
Article reuse guidelines:
sagepub.com/journals-permissions
DOI: 10.1177/1461444819885334
journals.sagepub.com/home/nms
Human-aided artificial
intelligence: Or, how to run
large computations in human
brains? Toward a media
sociology of machine learning
Rainer Mühlhoff
Technical University of Berlin, Germany
Abstract
Today, artificial intelligence (AI), especially machine learning, is structurally dependent
on human participation. Technologies such as deep learning (DL) leverage networked
media infrastructures and human-machine interaction designs to harness users to
provide training and verification data. The emergence of DL is therefore based on a
fundamental socio-technological transformation of the relationship between humans
and machines. Rather than simulating human intelligence, DL-based AIs capture human
cognitive abilities, so they are hybrid human-machine apparatuses. From a perspective
of media philosophy and social-theoretical critique, I differentiate five types of “media
technologies of capture” in AI apparatuses and analyze them as forms of power
relations between humans and machines. Finally, I argue that the current hype about
AI implies a relational and distributed understanding of (human/artificial) intelligence,
which I categorize under the term “cybernetic AI.” This form of AI manifests in socio-
technological apparatuses that involve new modes of subjectivation, social control, and
digital labor.
Keywords
Artificial intelligence, audience labor, commercial content moderation, cybernetics,
deep learning, human computation, human-computer interaction, social media,
tracking, training data, user experience design
Corresponding author:
Rainer Mühlhoff, Excellence Cluster Science of Intelligence, Technical University of Berlin, Straße des 17. Juni
135, 10623 Berlin, Germany.
Emails: [email protected]; [email protected]
885334NMS0010.1177/1461444819885334new media & societyMühlhoff
research-article2019
Article
Mühlhoff 1869
Introduction: a new era of AI?
In recent years, there has been a renewed hype about artificial intelligence (AI). AI tech-
nology is attracting immense public attention as more and more real and tangible appli-
cations are emerging in industry, consumer worlds, politics, and policy. At the
technological level, this trend is largely due to deep learning (DL) as one particular
approach within the heterogeneous field of AI research. The DL is a method based on
simulated artificial neural networks (ANNs) in the field of machine learning (ML)
(Bengio, 2009; Goodfellow et al., 2016; LeCun et al., 2015). Various hitherto difficult
computational problems such as object recognition in images, natural language process-
ing, and identification of patterns in large data sets can now be automated with DL.
While the breakthrough of DL is often seen as a “revolution,” the debate in media
studies shows that this is only a momentary—and above all economic—supremacy of
one of several AI paradigms that have long been running parallel (Sudmann, 2018). DL
is a “bottom-up” statistical approach based on the aggregation of empirical knowledge.
Since Alan Turing, learning-based AI has been contrasted with the paradigm of sym-
bolic AI, or “Good Old-Fashioned AI” (GOFAI) (Haugeland, 1985), which essentially
understands intelligence as the ability to manipulate symbols. GOFAI is modeled
around problems such as automated chess play or mathematical theorem proving
(Haugeland, 1981; Newell and Simon, 1976; see Brooks, 1991 for a historical over-
view). The current dominance of the ML paradigm over GOFAI is often explained by
strong developments in computing technology toward high-performance parallel com-
puting on graphical processing units (GPUs) during the last 10 years. That is, the cur-
rent progress of DL is attributed to a new generation of hardware architectures that is
better suited for the computational tasks related to ANNs that require processors differ-
ent from the classical von Neumann architectures (Bolz et al., 1994; Sudmann, 2018).
In this article, I would like to add another approach to explaining the success story of
DL: the diagnosis of an underlying socio-technological revolution. I will argue that DL’s
“breakthrough” required not only the development of high-performance parallel comput-
ing techniques, but a fundamental structural change in media culture and human-
computer interaction (HCI) at societal scale. I start from the observation that most indus-
trial DL implementations come with extensive media technological infrastructure for
capturing humans in distributed, human-machine computing networks, which as a whole
perform the intelligence capacity that is commonly attributed to the computer system as
“artificial intelligence.” Today, the scarce resource on which the success of a DL project
depends is neither algorithms nor computing power but rather the availability of training
and verification data, which is ultimately obtained through human participation. The
importance of this resource led to the emergence of new forms of exploitation and
implicit labor in the digital that build on existing socio-economic divides. Seen from the
angle of this article, DL is a form of distributed orchestration of human cognition through
networked media technology. The question of generating training data is so essential to
DL projects that at the core of any such project today lies a characteristic problem of
human-computer interaction (cf. Mühlhoff, 2019b): How does one design an interface,
a platform, or a medial environment that can serve as an infrastructure for obtaining
data through free and implitic human participation?
1870 new media & society 22(10)
Historical context
For many decades in the 20th century, the symbolic paradigm of AI (GOFAI) was deemed
more fruitful and received more research resources than ML approaches. This affected
not only AI research but also the conception of human intelligence itself which was
articulated in related fields such as cognitive science and psychology. The concept of
intelligence was at any time closely tied to current techniques of computation (Brooks,
1991). The concept of the universal Turing machine (cf. Turing, 1937) and its realization
in von Neumann processor architectures was not only better suited to the symbolic para-
digm than to ML but also influenced the general understanding of “intelligence” and
“cognition” of the time to focus on symbol manipulation and problem-solving. Despite
the fact that alternative paradigms both in AI and cognitive science, such as embodiment
and situatedness (Brooks, 1991), or distributed (Hutchins, 2001; Rumelhart and
McClelland, 1986) and connectionist (cf. Sun, 2014) approaches, have always been pur-
sued, it was not until the 2010s that ML based on ANNs made significant developments
that eventually lead to the current dominance of the learning paradigm over GOFAI. It is
common to explain this development by the discovery of the backpropagation training
algorithm (Rumelhart et al., 1986), which became effective only much later by the devel-
opment of high-performance parallel computing on GPUs.
Hence, the current boom of DL is largely seen as the product of a “hardware
revolution”—a claim that is also maintained in media studies (e.g. Bolz et al., 1994;
Sudmann, 2018). What is underrepresented in this description, however, is the funda-
mental shift of the relation between humans and machines that materializes in everyday
human-machine interaction designs (Mühlhoff, 2018) in the wake of “web 2.0”
(O’Reilly, 2005) and “ubiquitous computing” (Weiser, 1991). As I maintain in this arti-
cle, the media cultural transformations of modern user experience (UX) design are not
only a prerequisite for the success of DL but also instigated a shift of the conception of
intelligence itself, which is densely related to the media-technological relation of
humans and machines. In the DL paradigm, human cognitive skills are not simulated by
a machine anymore, but embedded in machine networks. DL is less about replacing
human cognitive labor by an intelligent machine but about embedding and harvesting
human cognition in computing networks through new forms of labor and machinized
power relations.
The perspective outlined in this article will stress this socio-technological dimension
of DL. I will proceed in three steps: In section “Introduction: a new era of AI?” I will use
two research contributions from 2006 and 2017 as examples to illustrate fundamental
transformations in consumer media that are a prerequisite to the success of DL. In sec-
tion “Hybrid processors: human-machine computing networks and AI,” I will differenti-
ate five forms of capturing human collaboration in hybrid human-machine AIs and point
to the different forms of power, subjectivation and labor engendered by these modes of
capture. In section “Conclusion: a cybernetic notion of AI,” I will debate the shift of the
understanding of intelligence that is implicit in DL, arguing that in order to accommo-
date recent developments, a “simulation-based” understanding must be differentiated
from a “cybernetic understanding of AI.”
Mühlhoff 1871
Hybrid processors: human-machine computing networks
and AI
The current, third era of AI technology is characterized by a new form of networked
technology that implements intelligent devices by incorporating humans as cognitive
agents. To make this historical thesis plausible, I will look at two exemplary research
contributions from 2006 and 2017 that illustrate this development. Both are lectures of
relevant scientists, which are available as videos.
Vignette 1: “games with a purpose”
In 2006, the computer scientist Luis von Ahn, a pioneer of “crowdsourcing” and founder of
the company reCAPTCHA ([Onl.1]), gave a Google Tech Talk under the title of “Human
Computation” ([Vid.1]). He says that his project started from the idea that the human brain
is actually “a pretty advanced processing unit . . . that can solve problems that computers
cannot yet solve” ([Vid.1]: 6 minutes 40 seconds), such as recognizing objects in images or
understanding spoken language. To this, he adds the sociological observation that there is
an immense number of “wasted human cycles”1 every day in the world, evident, for
instance, in “the 9 billion human-hours of Solitair [that were] played in 2003” ([Vid.1]: 7
minutes). Humans are not only good computing units, but their computing power is also
available in abundance. From these two premises, von Ahn put together the goal of his
research: “Running a computation in peoples’ brains instead of silicon processors” ([Vid.1]:
25 minutes). To this end, “we are going to consider all of humanity as an extremely
advanced, large-scale distributed processing unit that can solve large-scale problems that
computers cannot yet solve.” ([Vid.1]: 8 minutes; see also von Ahn, 2005)
One project of von Ahn and Laura Dabbish (2004) was the so-called “ESP game”—it
was later acquired by Google and became known as Google Image Labeler. Its purpose
was to obtain labels that describe images through the free participation of people on the
Internet. The ESP game is a two-person online game in which play partners are randomly
assigned to each other for the duration of a session and have no means of communica-
tion. In a game cycle, both players see the same image on their screens and are prompted
to enter keywords describing the image. They cannot see what the other is typing, but if
both enter the same keyword fast enough (“match”), they get points. In effect, these key-
words can be used as accurate labels for the image.
The ESP game has gained significant popularity after its launch in 2003. Over 1.3 mil-
lion labels for approx. 290,000 pictures were generated within four months (von Ahn and
Dabbish, 2004). The database of Google image search at that time contained about 425
million images, and von Ahn and Dabbish (2004) estimated that their game could com-
pletely index this stock in only 6 months by the free work of the players ([Vid.2]: 15
minutes 20 seconds). The labels could then be used to improve Google’s image search.
Notably, this came at a time when leading image search technology relied on file names,
HTML captions, and the surrounding text on the websites to associate images with search
keywords.
Luis von Ahn (2006) proposed the game-theoretical term “Games With a Purpose”
(GWAP) for games like this. He thus established what is commonly referred to as
1872 new media & society 22(10)
“gamification” and “human computation” (von Ahn, 2005) within HCI research. Remar-
kably, Amazon’s “Mechanical Turk” service was introduced at roughly the same time.
While Mechanical Turk allows repetitive but simple tasks to be outsourced to paid click-
workers, von Ahn’s vision was to turn an “extremely tedious task into a game that’s fun”
([Vid.1]: 32 minutes 40 seconds). Following this principle, von Ahn et al. (2006) have
developed several other online games that outsource computing problems to the free labor
of humans, for example, “Peek-a-Boom” for the spacial location of objects in images or
“verbosity” for the generation of a large knowledge base of common sense facts.
All these games are based on the idea of harnessing “human computing power” in
hybrid human-machine networks to perform a computational task that a silicon-based
computer cannot easily solve. The ultimate and best known application of this principle
is “reCAPTCHA”—a company founded by Luis von Ahn and later acquired by Google
([Onl.1]). reCAPTCHA combines the idea of CAPTCHA (von Ahn et al., 2003) with that
of “human computation.” A CAPTCHA is a small challenge that can be built into the
human-machine interaction here and there on the Internet to verify that the user is actu-
ally a “human user.” For this purpose, the user is asked to solve a small task such as
image recognition or text recognition, which is a low barrier for a human, but a high one
for a computer bot. reCAPTCHA extends on this principle by re-using the responses of
human users as training data for industrial Deep-Learning projects (von Ahn et al., 2008).
Vignette 2: the “eternal spring” of AI
A good 10 years after von Ahn’s Google Tech Talk, we are in the midst of the industrial
euphoria about learning based AI. In an exemplary form, this euphoria is visible in a talk
given by Andrew Ng in 2017 at the Stanford Graduate School of Business ([Vid.2]).
Andrew Ng is a leading AI expert, a Stanford professor, and former head of AI depart-
ments first at Google and then at Baidu. In his talk under the title “AI is the New
Electricity,” he explains that after the two “AI winters” in the late 1960s and 1980s, AI
technology is now in a phase of “eternal spring” ([Vid.2]: 1 hour 0 minutes). Today, he
says, AI has become a key technological component and transformative agent of our
civilization, similar to the indispensable role of silicon-based semiconductors or electric-
ity ([Vid.2]: 1 hour 0 minutes).
When Ng speaks of AI, he explicitly refers to the narrower category of DL in the vari-
ant of supervised learning, “because the massive economic value” of the industrial appli-
cation of AI is currently (in the future this could change) almost exclusively driven by
DL ([Vid.2]: 7 minutes 45 seconds). He also highlights that DL has become successful in
the last 10 years because of two independent factors. (1) The development of high per-
formance computing (HPC) on GPUs increased computing speed, and (2) DL requires an
enormous amount of training data, but sufficient data sets have only become available in
the last 10 years ([Vid.2], 21 minutes). This dependence on training data is because
supervised learning trains an ANN using a large set of known input and output pairs until
its internal parameters are calibrated so well that previously unseen input data are likely
to be connected to the correct output. For example, if an ANN is to recognize objects in
images, the input is an image and the output is a list of labels that designate the objects
in the image. A training data set would then be a database of labeled images. According
Mühlhoff 1873
to Ng, world-leading face recognition AIs are trained on more than 200 million facial
images; speech recognition AIs are build from more than 100,000 hours of transcribed
audio ([Vid.2]: 33 minutes).
Interestingly, from his business and application-oriented perspective Ng points out
that today only the second factor, the availability of training data, is genuinely a scarce
resource. This is to be seen in a context where computing power has been available as a
service on an industrial scale for several years now. Services such as Google’s “Cloud
AI” or IBM’s “Watson Machine Learning” allow any small company to bring their data
and train complex DL models “in the cloud” without having to maintain their own com-
puting infrastructure ([Onl.2]). Open source libraries such as TensorFlow ([Onl.3]) or
Keras ([Onl.4]) make algorithms for DL accessible via high-level application program-
ming interfaces (APIs), so industrial users often do not need to develop their own imple-
mentation of DL algorithms.
In a constellation where algorithms are public and computing power is for sale, the
core economic asset of “each defensible AI business” is training data ([Vid.2]: 30 min-
utes ff.). This is a fact that determines business strategies. Ng says, “I frequently launch
products where my motivation is not revenue, but is actually data; and we monetize the
data through a different product” ([Vid.2]: 33 minutes 40 seconds). The AI product
cycles are subject to a feedback loop that Ng calls the “virtuous circle of AI” ([Vid.2]: 35
minutes ff.): more users of an AI product typically generate more data using it; more data
make the AI and thus the product better; a better product in turn attracts more users.
Strategies for the introduction of new AI products on the market explicitly build on this
principle. In fact, it is not unheard of that in the early stages, human clickworkers instead
of intelligent computers sit “at the backend” of a new AI product. In this way, the virtu-
ous circle of AI can still be activated, even if no training data are available yet ([Onl.5]).
This trick allows to reverse the order of training and inference phases of an ANN.
Ng’s talk also mentions some limitations of DL that are useful to inform the critical
perspective I take in this article. First, Ng proposes a “rule of thumb” regarding which
types of problems he thinks could be expected to be automated by DL. “Anything that a
typical human can do in at most one second of thought, we can probably now or soon
automate with AI,” he says ([Vid.2]: 14 minutes). This statement includes image recogni-
tion and speech recognition tasks, but excludes, for example, the prediction of stock
market prices ([Vid.2]: 16 minutes). Second, Ng mentions the learning curve of DL AIs,
which is a graph that shows the performance as a function of the number of trained input
and output pairs. This curve rises steeply at the beginning, that is, with an increasing
amount of training data, DL makes strong progress in the accuracy of predictions; how-
ever, roughly at the point of “human-level performance,” this curve typically flattens
([Vid.2]: 18 minutes). Therefore, when an accuracy roughly equal to that of human cog-
nition is reached, additional training data have only minor effects and learning progress
slows down, according to Ng. Both observations suggest that the potentials of DL are
inherently tied to the cognitive skills of human beings.
Current commercial AIs do not replace human intelligence, they capture it
Remarkably, with his “rule of thumb,” Ng restricts the range of problems that can be
addressed by DL to exactly the same range that Luis von Ahn had envisaged 15 years
1874 new media & society 22(10)
earlier with his idea of “exploiting human brain cycles.” I argue that this correlation is no
coincidence. In the past 10 years, ML performed well precisely on the kind of tasks for
which there is now a comprehensive media infrastructure that involves human beings in
hybrid human-machine computing networks to obtain training data. In a development
that leads away from GOFAI, DL-based AI today is a product of harvesting human labor
and cognition in computing networks at large scale. ML is more than algorithms and
HPC: it is a media-cultural constellation involving human-machine interfaces and media
technology that makes people implicitly generate data that can be used as training data.
As we shall see in the next chapter, Luis von Ahn’s GWAPs are only one of several con-
temporary forms such a media technology of capture might take.
This shows that the emergence of DL is inherently tied to recent trends in HCI and UX
design (Mühlhoff, 2018). Any viable DL problem today is translated into a correspond-
ing problem in HCI. This problem is: How can a use case and a UX world be constructed
so that the data that is needed as training data can be obtained as behavioral data from the
“free labor” of a general audience of users (cf. Terranova, 2000; Fisher and Fuchs, 2015;
Fuchs, 2010)? The technology that solves this concurrent HCI problem must be seen as
an integral part of the technical apparatus that implements the AI. This makes building
an AI partly a problem of social engineering and interface design. The acquisition of
training data goes hand in hand with the creation of digital media infrastructures that take
the form of hybrid human-machine networks, which must themselves, as a whole, be
described as an entity in which the AI in question is to be located.
From a broader historical point of view, the commercial breakthrough of AI is there-
fore closely related to key developments of the “ubiquitous computing” paradigm (Weiser,
1991) and chiefly facilitated by the rise of the interactive “Web 2.0” and social media. It
was not until the end of 2006 that Facebook opened its service to the general public. The
idea of “Web 2.0,” which brought “design patterns and business models for the next gen-
eration of software” (O’Reilly, 2005), was popularized only in 2004. This creates an idea
of how remote the concept of harnessing human cognitive resources in distributed com-
puting networks must have appeared in 2003–2006 and earlier. Since then, however, vari-
ous infrastructures for capturing human cognitive resources in networked platforms have
de facto become a media cultural standard due to the penetration of the social world by
networked computers and graphical user interfaces. Today, a general convergence of
training data and everyday behavioral data can be observed. It has become relatively easy
to collect training data if this data are a by-product of everyday usage flows.
Media technologies of capture: five types of power relations
To show how this socio-technological analysis of DL spells out in relation to real appli-
cations today, I will now distinguish five different forms of capturing human cognitive
capacities in human-computer interfaces that feed into AI products. I will specifically
point out how the five forms differ in terms of human-computer power relations, subjec-
tivation of users, and new forms of labor in digital apparatuses.
The first form of capture has already been described above using the example of the
ESP game: It can be summarized under the term “gamification.” Gamification is a
method to engaging users in a playful interactive world in which they knowingly or
unknowingly perform tasks that originate from, and feed back into, a non-game context
Mühlhoff 1875
(Deterding et al., 2011). In this case, the form of power that shapes the relation of users
and computing machinery builds on playfulness and fun. Falling into the category of
“gamification-from-above” (Woodcock and Johnson, 2018), these examples show a
hierarchical extrication of “audience labor” (Fisher, 2015). By creating a subjective
experience of pleasure and harmlessness, this fact (which is known to many users) does
not dominate the user experience in a negative way.
A second form of harnessing human cognitive resources in computer networks can
be described as “trapping and tracking.” Its prototype is reCAPTCHA, fittingly
described as “Human-Based Character Recognition via Web Security Measures” by its
inventors (von Ahn et al., 2008). Through “trapping and tracking,” a (computing) task
that is to be outsourced to a human user is integrated into an interaction process so that
it must be completed in order for the user to achieve something else they want to
achieve. A more complex but less obvious example of this method of harnessing human
cognition is provided by the Google search engine. A list of Google search results is
not only the product of a calculation using AI, but it has embedded scripts that turn
each user into a data provider for further calibration and re-training of this AI. This is
facilitated by a click-tracking mechanism on the search engine result pages (SERPs)
that records every click on that page and reports it back to a Google server (Mühlhoff,
2019a). This infrastructure allows Google to register, among other things, which search
results users select and whether they return to the SERP after viewing one result (e.g.
using the back button) to click another one. Thus, by simply using Google Search,
users involuntarily generate a wealth of data providing information about the per-
ceived relevance of the results and enable detailed analyses of clicking behavior (which
website elements are more likely to be noticed; how far down users scroll; what bias
exists between ads and organic search results, etc.). If users are logged in to a Google
account, this data are linked to their personal user IDs and can be correlated with their
e-mail contents, YouTube activities, calendar dates, and so on (Mühlhoff, 2019a).
While I cannot go into the serious data protection issues arising from these tracking
techniques (Noble, 2018; O’Neil, 2016), in the present context my point is that the
real-time stream of usage data serves to continuously train and further calibrate the AI
that is responsible for generating the search results.
This example shows that in practice there is often no strong separation between the
training and inference phase of an ANN model. The collection of training data for con-
tinuous verification and recalibration of the Google search AI never stops. Through the
participation of users in Google’s search engine, a feedback loop is implemented, linking
the predictions back to reality. This feedback loop is a fixed infrastructural component of
Google Search necessary to make the search engine adapt to a dynamic world in which
it is regularly confronted with new pages, content, cultural, and political relevance con-
stellations, and so on. As a machine that is built to determine the relative relevance of
content with respect to search keywords, a search AI never finishes training. As a
dynamic process, its intelligence capacity lies in the immanence of a hybrid human-
computer information processing network. The involuntary involvement of humans as
data generators in Google Search creates a mediatized swarm principle, making the AI of
that search engine a performative product of implicit “audience labor” (Fisher, 2015;
Fuchs, 2010) in a networked infrastructure of human-machine interaction.
1876 new media & society 22(10)
The kind of power relation that is at work in the “trapping and tracking” class of
examples builds on a combination of two facts: first, most users are not aware that using
the respective service they contribute to a distributed computing network. Although data
collection is explicitly stated in Google’s terms of service, it is completely invisible at the
level of user interfaces; data collection happens in the background and as part of merely
consuming search results (see Fisher, 2015 who highlights that this is still a form of labor
that creates immediate value). Second, the strategy of “trapping and tracking” builds on
the fact that these services are perceived as indispensable by a majority of users. It is not
a realistic threat to those companies that users might abstain from using Google Search
or from solving a reCAPTCHA. Much unlike online games, neither service is used as an
end in itself, but rather is instrumental for the users to achieve another goal that they want
to reach, and in the case of reCAPTCHA, there is by definition no way of getting past it
without solving it.
A third form of harnessing human cognitive resources in AI systems is given by social
networking platforms such as Facebook. This form relies on the extrication of social
motivations, making the user unknowingly participate in a computing network by acting
socially. Labeling photos on Facebook is a good example for this kind of socially moti-
vated “free labor” in the digital (Terranova, 2000). Tagging someone on an uploaded
image is part of everyday social interaction on Facebook; in fact, Facebook as a medium
has created a UX world in which this is made an essential aspect of social communica-
tion.2 In this way, Facebook is aggregating a database of labeled facial images that could
be used to train a face recognition AI. Facebook has been building its face recognition AI
since 2010, and by 2017 it was pretty accurate [Onl.6]. In that year, Facebook began to
notify users when their face was automatically recognized on an uploaded photo [Onl.7].
The user could then select whether they want a label with their name added to the image,
whether they prefer to stay invisible, or whether it is not even them in the photo. Facebook
presents this “new feature” as a measure for better control of privacy, yet it obviously
serves another purpose. This is a good trick in the field of HCI design to obtain a constant
stream of verification data from free human labor to improve the predictions of the face
recognition AI.
A lack of built-in verification mechanisms for AI-based predictions is generally one
of the main sources of error and distortion in the real social use of predictive ML applica-
tions (O’Neil, 2016). A good AI needs feedback loops that help to align its predictions
with reality, otherwise false positives (in other circumstances, false negatives) will not be
discovered and controlled for by re-calibration of the AI. With the new “feature,”
Facebook set up such a feedback loop using UX design and taking advantage of a grow-
ing privacy sensitivity to capture human collaboration. The stream of training/verifica-
tion data generated by this infrastructure is an integral part of the apparatus, which, as a
whole, is referred to as Facebook’s face recognition AI. Similar to the example of Google
Search, this case shows that there is often no strict separation of the training and infer-
ence phases of an ANN model. DL models are often continuously re-calibrated in real
time using human-generated verification data; the training phase overlaps with the inter-
ference phase and training data often take the form of verification data.
In this socially motivated form of capture, the power relation between user and
machine can best be described as a social “exploit” (cf. Galloway and Thacker, 2007) in
Mühlhoff 1877
the rich sense of the term that includes its meaning in hacker culture: an exploit is a way
of taking advantage of a system through a loophole, by hijacking and subtly modulating
its functions. In this sense, Facebook is “hacking” itself into the social communication
habits of users to capture their cognitive capacities as free labor in a human-aided AI
apparatus for face recognition. This form of power operates in part through the produc-
tion of subjectivity insofar as Facebook created a social space in which such an unusual
activity as tagging faces is made an integral part of everyday interaction.
A fourth form of capturing human collaboration in hybrid computing networks is
given by information mining strategies that build on nudges and economic incentives.
An example is when a health insurance service offers discount to customers who use a
physical activity tracker, or a nutrition tracking app, to record step counts, movements,
dietary habits, and so on. Similarly, some auto insurances offer discounts for installing a
Global Positioning System (GPS) tracking and accelerometer device in ones car, track-
ing not only the individual location history but also the user’s “driving style” (O’Neil,
2016: 168–173). Insurance companies use this kind of data to correlate it with the per-
sonal medical record of that person (health insurance) or with the rate of damages and
incidents of the driver (car insurance). The idea is to use data analytics to predict diseases
or addictions, or respectively, to identify driving styles and routes that correlate with
higher risk of incidents. In both cases, behavioral data are used to train an AI that classi-
fies individual users in terms of (economic) risk categories, which then is used for indi-
vidual insurance pricing (O’Neil, 2016).
In order for this to qualify as an example of free human labor in a hybrid AI network,
one needs to point out in what way humans, by wearing activity trackers or equipping
their cars with GPS trackers, are providing a piece of computation to the machine net-
work. In fact, by providing their data, each user becomes part of a distributed routine by
means of which any other user can be classified as high-risk or low-risk. Slightly simpli-
fied, providing ones data amounts to enabling one more comparison between anyone and
oneself; it means one more computational operation that refines the outcome of the pre-
diction. This may seem an indirect way of contributing to a computational network, yet
it is significant because the AI in question does not have a built-in mechanism on its own
to distinguish safe from risky driving styles or healthy from unhealthy fitness habits. It
has to learn this from user data and each user, by providing their data, does a little bit of
the work of training the predictive system. At the same time, the negative consequences
resulting from high-risk classifications are visible only to some users as they are often
asymmetrically distributed to the disadvantage of the poor (O’Neil, 2016).
The power relation involved in this form of capture (and now I am referring only to
the moment of capture, not to the negative consequences one might suffer from being
classified as high-risk) is a soft one, often described as “nudging” that pushes the user in
a certain direction, for instance, by economic incentives.3 In these cases, the nudge is
further enabled by the fact that many users do not see the collective damage of providing
their data, but stick to their individual perspective in which it seems to them that they
“have nothing to hide.”
A fifth form of harnessing human cognitive resources in distributed computing net-
works is crowdsourcing on platforms such as Amazon Mechanical Turk (“MTurk”). This
platform for small, low-paid, on-screen tasks was publicly launched in 2005, around the
1878 new media & society 22(10)
same time Luis von Ahn and his team were developing their ideas to extract such work
for free through gamification. MTurk was originally developed for Amazon’s own pur-
poses, as an infrastructure to outsource a number of repetitive tasks related to maintain-
ing their product catalog, such as updating product information and identifying duplicates.
In the jargon of the platform, small tasks that can be processed by humans in a few sec-
onds for a few cents are called “HITs”—“Human Intelligence Tasks” ([Onl.8]). On
MTurk, there is always a worldwide community of casual workers available to process
HITs that are submitted by large companies or research institutions through an API. This
community of workers is mostly located in the Global South and often economically
precarious ([Onl.9–10]). Their deployment through MTurk is often cheaper than devel-
oping a full automation of the tasks; if automation is desired, these workers can be used
to create training or verification data.4
As the computer scientist Jaron Lanier (2014) puts it, MTurk really “allows you to
think of the people as software components.” Through an API that is available for all
major programming languages, processing HITs on a “human processor” can be inte-
grated smoothly into classical programming code (Figure 1). Such a programming code
is indeed partially executed on silicone-based processors and human brains. The access
to human workers through an API largely conceals the social dimension and social con-
sequences of this form of capture. This is particularly evident in commercial content
moderation (CCM), which is the outsourcing of moderation tasks from social media
platforms to service companies that rely on clickwork for manual reviews of
Figure 1. Sample algorithm for creating a sorted list of 50 tourist attractions in Berlin using
human cognitive resources via MTurk API-Call.
Source: The author / adapted from Little etal. (2010).
Mühlhoff 1879
user-generated content (UGC) (Roberts, 2016a). The army of CCM workers deployed by
Facebook, Twitter, Tinder, and so on to review UGC is conservatively estimated at more
than 100,000 people worldwide, more than double the number of Google employees and
14 times that of Facebook; many of them are located in low-wage areas and in the Global
South [Onl.11]. The task of CCM workers is to check UGC for compliance with laws and
platform guidelines. To do this, they review such content item by item, day by day, to sort
it into different risk categories. Most platforms do not send all images or posts uploaded
by users through such a manual review as this would be very expensive. Often UGC goes
live immediately and only when another user reports it as inappropriate (which is also a
form of capturing human collaboration) is it sent to CCM. In this way, as Sarah Roberts
(2016b) points out, CCM workers do not see the entire spectrum of uploaded material,
but a pre-selected list, which
“often focuses on content that is highly sexual or pornographic, depicts the abuse of adults, the
abuse of children (physical and/or sexual), the abuse and torture of animals, content coming
from war zones and other areas besieged by violent conflict, and any material that is designed
to be shocking, prurient or offensive by nature.”
Investigations by journalists and researchers point to psychological damage such as
post-traumatic stress disorder caused by this work. This is a form of social cost that adds
to the exploitative financial working conditions in the gig economy and is not covered in
the balance sheet of companies that use these kind of services ([Onl.11; Onl.12]). Hence,
clickwork shows how economic power relations shaped by precarious work conditions
and global economic disparities can be directly transformed into computing power.
Indeed, a clickwork platform is a machine that converts economic power differentials
into computing power.
In times when politics effectively force platform companies to install upload filters,
AI methods for the automatic classification of content are being developed [Onl.13]. At
present, these are not mature enough to allow a computer system alone to identify abu-
sive content with great accuracy [Onl.14]. A partial automation is still conceivable; by
combining silicon-based AI techniques with the selective use of clickwork, a human
decision is only necessary when the ML model delivers an uncertain result. This hybrid
form of automation is more efficient and cost-effective, for then it forms a hybrid human-
machine computing network that implements, as a whole, a human-aided AI for content
filtering.
Conclusion: a cybernetic notion of AI
I refer to the five forms of harnessing human cognitive, affective and social capacities in
hybrid human-machine computing networks, together with the various (commercial)
products and services that are built upon them, as the media-sociological dispositive of
“Human-Aided AI.” This concept aims to place the nexus of media technologies, social
interaction, the molding of end-user subjectivity, and new forms of labor in everyday
machinated power relations at the center of a discussion of AI. While I do not deny the
relevance of developments in computing technology for the success of DL, my point is
1880 new media & society 22(10)
to stress that today most commercially relevant AIs are emergent phenomena in hybrid
human-machine networks that rely on specific media-cultural prerequisites.
The term “Human-Aided AI” also aims at questioning the classical notion of “intel-
ligence” as an autonomous and sovereign rational capacity located within a physically
delineated apparatus or living being. Human-aided AI is an emergent and distributed
intelligence capacity of hybrid human-machine assemblages. To make this contrast clear,
I will refer to the classical (autonomous and confined) understanding of intelligence, if it
is applied to AI, as simulative understanding of AI. Simulative AI is strongly tied to the
idea of an intelligent system as a black box that passes the Turing Test (cf. Copeland,
2000; Turing, 1950). In this logic, intelligence is ascribed to, and located within, a sys-
tem if it can simulate human cognitive performance in its external interactions. (On the
semantics of “simulation” see Turing, 1996 [1951].) The symbolic paradigm of AI, or
GOFAI, is an example of a simulative conception of AI. It conceives of AI as a problem-
solving, language processing, or chess playing capability of a system that manifests in its
external relations and within the constraints of the mediality of its interactive channels to
the outside world. For instance, Joseph Weizenbaum’s (1966) ELIZA “chat bot” inter-
acted through a typewriter; a chess automaton interacts via a chess board, be it physically
present or visualized on a screen. By introducing the qualifier “simulative” to describe
this connotation of AI, I seek for an interface and media theoretical (rather than algorith-
mic) characterization of this type of AI. Regardless of its concrete algorithmic imple-
mentation, simulative AI assumes that intelligence is located within an apparatus and is
evident in its external interaction that resembles the intelligent behavior of humans, as it
can be tested by some variant of the Turing test.
I maintain that this principle of simulation is abandoned in the switch to DL. This is
because the media-theoretical logic of interaction between the intelligent device and
humans has changed: from simulation to immersion of human skills, from the machine
“growing into” human cognitive capacities to exploiting human cognitive capacities,
from the machine substituting human labor to the power strategy of capturing human
labor within distributed higher-order apparatuses. I refer to this as cybernetic under-
standing of AI—which is meant as an oppositional concept to simulative AI. I call it
cybernetic because the structural form of its relation to humans is that of feed-back loop
control (Rosenblueth et al., 1943): as we saw in the examples of Facebook and Google
Search, human action within the apparatus generates training and verification data that
feed into AI predictions; however, there is actually a double feedback effect as cybernetic
AIs also back-feed on the people who use them. By communicating through Facebook,
searching with Google, or providing data to one’s health insurance, the human-machine
network (aka “AI”) modulates the user’s movements, knowledge, well-being, and
affects. This double feedback effect of DL-based AI apparatuses subjugates users to a
mechanism of control. Control is a subtly modulating form of power that is central to the
sociotechnical mechanisms Norbert Wiener and others described under the title of
“cybernetics” (Ashby, 1957; Wiener, 1954).5 Using the term cybernetic AI stresses that
the AI apparatus is not just run by unilateral exploitation of free labor, but rather facili-
tates an emergent cognitive capacity of the apparatus that is regularly consulted by users
themselves. This leads to a reciprocal co-dependence of users and AI that is at the heart
of specific forms of mechanized power and control in the dispositive of human-aided AI.
Mühlhoff 1881
While this form of power is generally weak and non-repressive, it can still manifest in
strong forms of subordination that have recently been debated as “algorithmic discrimi-
nation,” “automated inequality,” or big data–based social selection (cf. Eubanks, 2018;
Noble, 2018; O’Neil, 2016).
The conceptual difference between simulative and cybernetic AI concerns the form of
mediated relations between machines and humans: In simulative AI, intelligence mani-
fests in a relation of comparison or resemblance of skills across external boundaries of
humans and machines. In cybernetic AI, intelligence is an emergent and distributed capac-
ity of the hybrid human-machine assemblages as a whole, while the single relations
between humans and machine are power relations that make the human a functional part
of that machine. Simulative AI reproduces human skills, while cybernetic AI embeds
them. This shows that recent developments in commercial applications of AI come with a
significant shift in the implicit conception of intelligence itself. In our specific media-
cultural context, this shift is related to concrete design principles and developments in the
field of HCI. The founding father of UX design, Donald Norman (1988), speaks of design
as a “psychology of everyday things.” Seen from this angle, interaction design is the busi-
ness of colonizing the cross-section of sociality and technology by a creative “will to
power.” In the media-cultural dispositive of human-aided AI, people are made to habitu-
ally attach to digital interfaces, which enable harnessing them as data servants and free
labor force. In consequence, human-aided AI is not just one technology among many, but
a historical formation. It is based on socio-economic conditions, technological standards,
political discourses, and specific habits, subjectivities and embodiments in the digital
world that are themselves a product of everyday interaction with digital media (Mühlhoff,
2018). In 2006, a specific online game had to be set up to gain training data for a specific
AI problem. With the emergence of the dispositive of human-aided AI in the years since
this relationship has turned upside down. Data are constantly generated and collected, and
its availability even tend to precede the concrete use for an AI problem.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/
or publication of this article: Research of this article has in part been supportet by the Collaborative
Research Center SFB1171 Affective Societies, project B05, at Freie Universität Berlin, funded by
the Deutsche Forschungsgemeinschaft (DFG), 2015–2019.
ORCID iD
Rainer Mühlhoff https://orcid.org/0000-0002-3936-9919
Notes
1. “Human cycles” allude to the term “processor cycles” in computer science, thus referring to
a fictitious unit of information processing power of the human brain.
2. Scholarship in the post-Marxist theoretical tradition compared Facebook to a “digital assem-
bly line,” where millions of free workers generate the economic value of the company
(Scholz, 2013). See also Fisher and Fuchs (2015), Fuchs (2010), and Terranova (2000). These
approaches start from extending the concept of work to the digital sphere in order to subject
the phenomenon to a (post-)Marxist strategy of economic critique.
1882 new media & society 22(10)
3. The term “nudging” originates from behavioral economics (see Thaler and Sunstein, 2008).
For a critical discussion in the context of interface design, see Mühlhoff (2018).
4. “Mechanical Turk” alludes to the (fake) chess computer of the Austro-Hungarian Baron von
Kempelen, who became known as the “chess Turk” in the 18th century, in whose generous
wooden housing a man was hiding, covertly playing the game (Levitt, 2000).
5. As a precursor to what we see in human-computer networks today, the notions of feedback
and control have been translated by the sociocybernetics movement into the sociological
framework of systems theory (Geyer, 1995).
References
Ashby WR (1957) An Introduction to Cybernetics. London: Chapman & Hall.
Bengio Y (2009) Learning deep architectures for AI. Foundations and Trends in Machine Learning
2(1): 1–127.
Bolz N, Kittler F and Tholen CG (1994) Computer als Medium. Munich: Fink.
Brooks R (1991) Intelligence Without Reason (A.I. Memo). Cambridge, MA: MIT Press.
Copeland J (2000) The Turing test. Minds and Machines 10(4): 519–539.
Deterding S, Dixon D, Khaled R, et al. (2011) From game design elements to gamefulness: defin-
ing gamification. In: Proceedings of the 15th international academic Mindtrek conference:
Envisioning future media environments, Tampere, 28–30 September, pp. 9–15. New York: ACM.
Eubanks V (2018) Automating Inequality: How High-Tech Tools Profile, Police, and Punish the
Poor. New York: St. Martin’s Press.
Fisher E (2015) Audience labour on social media: learning from sponsored stories. In: Fisher E
and Fuchs C (eds) Reconsidering Value and Labour in the Digital Age. New York: Palgrave
MacMillan, pp. 115–132.
Fisher E and Fuchs C (eds) (2015) Reconsidering Value and Labour in the Digital Age. New York:
Palgrave MacMillan.
Fuchs C (2010) Labor in informational capitalism and on the Internet. The Information Society
26(3): 179–196.
Galloway A and Thacker E (2007) The Exploit: A Theory of Networks. Minneapolis, MN:
University of Minnesota Press.
Geyer F (1995) The challenge of sociocybernetics. Kybernetes 24(4): 6–32.
Goodfellow I, Bengio Y and Courville A (2016) Deep Learning. Cambridge, MA: MIT Press.
Haugeland J (1981) Semantic engines: an introduction to mind design. In: Haugeland J (ed.) Mind
Design. Cambridge, MA: MIT Press, pp. 34–50.
Haugeland J (1985) Artificial Intelligence: The Very Idea. Cambridge, MA: MIT Press.
Hutchins E (2001) Distributed cognition. In: Smelser N and Baltes P (eds) International
Encyclopedia of the Social & Behavioral Sciences. Oxford: Pergamon, pp. 2068–2072.
Lanier J (2014) Who Owns the Future? New York: Simon & Schuster.
LeCun Y, Bengio Y and Hinton G (2015) Deep learning. Nature 521: 436–444.
Levitt G (2000) The Turk, Chess Automaton. Jefferson: McFarland.
Little G, Chilton LB, Goldman M, et al. (2010) Turkit: human computation algorithms on mechan-
ical turk. In: Proceedings of the 23nd annual ACM symposium on user interface software and
technology, New York, 3–6 October, pp. 57–66. New York: ACM.
Mühlhoff R (2018) Digitale Entmündigung und “User Experience Design.” Wie digitale Geräte
uns nudgen, tracken und zur Unwissenheit erziehen. Leviathan—Journal of Social Sciences
46(4), pp. 551–574.
Mühlhoff R (2019a) Big data is watching you. Digitale Entmündigung am Beispiel von Facebook
und Google. In: Mühlhoff R, Breljak A and Slaby J (eds) Affekt Macht Netz: Auf dem Weg zu
einer Sozialtheorie der Digitalen Gesellschaft. Bielefeld: Transcript, pp. 81–107.
Mühlhoff 1883
Mühlhoff R (2019b) Menschengestützte Künstliche Intelligenz: Über die soziotechnischen
Voraussetzungen von “Deep Learning”. ZfM – Zeitschrift für Medienwissenschaft 11
(2/2019): 56–64.
Newell A and Simon H (1976) Computer science as empirical enquiry: symbols and search.
Communications of the ACM 19: 113–126.
Noble SU (2018) Algorithms of Oppression: How Search Engines Reinforce Racism. New York:
NYU Press.
Norman D (1988) The Psychology of Everyday Things. New York: Basic Books.
O’Neil C (2016) Weapons of Math Destruction. London: Penguin.
O’Reilly T (2005) What is web 2.0. design patterns and business models for the next generation
of software. Availabe at: http://www.oreilly.com/pub/a/web2/archive/what-is-web-20.html
(accessed 7 March 2015).
Roberts ST (2016a) Commercial content moderation: digital laborers’ dirty work. Media Studies
Publications 12. Available at: https://ir.lib.uwo.ca/commpub/12
Roberts ST (2016b) Digital refuse: Canadian garbage, commercial content moderation and the
global circulation of social media’s waste. Wi: Journal of Mobile Media 10(1). Available at:
http://wi.mobilities.ca/digitalrefuse/
Rosenblueth A, Wiener N and Bigelow J (1943) Behavior, purpose and teleology. Philosophy of
Science 10(1): 18–24.
Rumelhart DE and McClelland JL (1986) Parallel Distributed Processing. Cambridge, MA: MIT
Press.
Rumelhart DE, Hinton GE and Williams RJ (1986) Learning representations by back-propagating
errors. Nature 323: 533–536.
Scholz T (ed.) (2013) Digital Labor: The Internet as Playground and Factory. London: Routledge.
Sudmann A (2018) Zur Einführung. In: Engemann C and Sudmann A (eds) Machine Learning.
Medien, Infrastrukturen und Technologien der künstlichen Intelligenz. Bielefeld: Transcript,
pp. 9–23.
Sun R (2014) Connectionism and neural networks. In: Frankish K and Ramsey W (eds) The
Cambridge Handbook of Artificial Intelligence. Cambridge: Cambridge University Press,
pp. 108–127.
Terranova T (2000) Free labor: producing culture for the digital economy. Social Text 18(2): 33–58.
Thaler RH and Sunstein CR (2008) Nudge: Improving Decisions about Health, Wealth, and
Happiness. New Haven, CT: Yale University Press.
Turing A (1937) On computable numbers, with an application to the Entscheidungsproblem.
Proceedings of the London Mathematical Society 2(1): 230–265.
Turing A (1950) Computing machinery and intelligence. Mind 59(236): 433–460.
Turing A (1996 [1951]) Intelligent machinery, a heretical theory. Philosophia Mathematica 4(3):
256–260.
von Ahn L (2005) Human computation. Doctoral Dissertation, School of Computer Science,
Carnegie Mellon University. Retrieved from http://reports-archive.adm.cs.cmu.edu/anon/
anon/usr/ftp/home/ftp/2005/CMU-CS-05-193.pdf
von Ahn L (2006) Games with a purpose. Computer 39(6): 92–94.
von Ahn L and Dabbish L (2004) Labeling images with a computer game. In: Proceedings of
the SIGCHI conference on human factors in computing systems, Vienna, 24–29 April, pp.
319–326. New York: ACM.
von Ahn L, Blum M, Hopper NJ, et al. (2003) Captcha: using hard AI problems for security. In:
Biham E (ed.) International Conference on the Theory and Applicationsof Cryptographic
Techniques. London: Springer, pp. 294–311.
1884 new media & society 22(10)
von Ahn L, Liu R and Blum M (2006) Peekaboom: a game for locating objects in images. In:
Proceedings of the SIGCHI conference on human factors in computing systems, Montreal,
QC, Canada, 22–28 April, pp. 55–64. New York: ACM.
von Ahn L, Maurer B, McMillen C, et al. (2008) reCAPTCHA: human-based character recogni-
tion via web security measures. Science 321(5895): 1465–1468.
Weiser M (1991) The computer for the 21st century. ACM SIGMOBILE Mobile Computing and
Communications Review 3: 3–11.
Weizenbaum J (1966) Eliza—a computer program for the study of natural language communica-
tion between man and machine. Communications of the ACM 9(1): 36–45.
Wiener N (1954) The Human Use of Human Beings. Boston, MA: Houghton Mifflin Harcourt.
Woodcock J and Johnson M (2018) Gamification: what it is, and how to fight it. The Sociological
Review 66(3): 542–558.
Online Sources
[Onl.1] https://www.google.com/recaptcha/
[Onl.2] https://cloud.google.com/products/ai/ and https://www.ibm.com/cloud/machine-learning
[Onl.3] https://www.tensorflow.org/
[Onl.4] https://keras.io/
[Onl.5] https://www.theguardian.com/technology/2018/jul/06/artificial-intelligence-ai-humans-bots-
tech-companies
[Onl.6] https://www.npr.org/sections/alltechconsidered/2013/10/28/228181778/a-look-into-face-
books-potential-to-recognize-anybodys-face
[Onl.7] https://www.wired.com/story/facebook-will-find-your-face-even-when-its-not-tagged/
[Onl.8] Amazon Mechanical Turk. API Reference. API Version 2017-01-17. Online (PDF):
https://docs.aws.amazon.com/AWSMechTurk/latest/AWSMturkAPI/amt-API.pdf
[Onl.9] http://techlist.com/mturk/global-mturk-worker-map.php
[Onl.10] https://www.theatlantic.com/business/archive/2018/01/amazon-mechanical-turk/551192/
[Onl.11] https://www.wired.com/2014/10/content-moderation/
[Onl.12] https://derstandard.at/2000035900517/
[Onl.13] https://www.washingtonpost.com/news/the-switch/wp/2018/04/11/ai-will-solve-facebooks-
most-vexing-problems-mark-zuckerberg-says-just-dont-ask-when-or-how/
[Onl.14] https://www.vice.com/en_au/article/wj7mv5/instagram-is-using-ai-to-filter-out-toxic-com-
ments
Videos
[Vid.1] von Ahn L (2006) Human computation. Google Tech Talk, 26 July. Available at: https://
www.youtube.com/watch?v=tx082gDwGcM
[Vid.2] Ng AY (2017) Artificial Intelligence is the new electricity. Talk at Stanford Graduate School
of Business, 25 January. Available at: https://www.youtube.com/watch?v=21EiKfQYZXc
Author biography
Rainer Mühlhoff is a postdoctoral research fellow in philosophy at the Cluster Science of
Intelligence at Technical University Berlin, where he works on “ethics of design in AI and
robotics”. Rainer’s research areas are social philosophy and critical theory of the digital society.
Rainer studied mathematics, philosophy, and gender studies in Heidelberg, Leipzig, and Belin
(http://rainermuehlhoff.de).