RESEARCH ARTICLE
How to interpret algorithmically constructed
topical structures of scientific fields? A case
study of citation-based mappings of the
research specialty of invasion biology
Matthias Held¹ and Theresa Velden²
¹ Social Studies of Science and Technology, TU Berlin, Berlin, Germany
² Deutsches Zentrum für Hochschul- und Wissenschaftsforschung (DZHW), Berlin, Germany
Keywords: algorithmic classification, bibliometrics, direct citation, field delineation, field mapping,
invasion biology
ABSTRACT
Often, bibliometric mapping studies remain at a very abstract level when assessing the validity
or accuracy of the generated maps. In this case study of citation-based mappings of a research
specialty, we dig deeper into the topical structures generated by the chosen mapping
approaches and examine their correspondence to a sociologically informed understanding of
the research specialty in question. Starting from a lexically delineated bibliometric field data
set, we create an internal map of invasion biology by clustering the direct citation network
with the Leiden algorithm. We obtain a topic structure that seems largely ordered by the
empirical objects studied (species and habitat). To complement this view, we generate an
external map of invasion biology by projecting the field data set onto the global Centre for
Science and Technology Studies (CWTS) field classification. To better understand the
representation of invasion biology by this global map, we use a manually coded set of invasion
biological publications and investigate their citation-based interlinking with the fields defined
by the global field classification. Our analysis highlights the variety of types of topical
relatedness and epistemic interdependency that citations can stand for. Unless we assume that
invasion biology is unique in this regard, our analysis suggests that global algorithmic field
classification approaches that use citation links indiscriminately may struggle to reconstruct
research specialties.
1. INTRODUCTION
It has become common practice to use algorithmic approaches to produce mappings of thematic
structures of science from bibliometric data. In spite of their promise and popularity, it
remains an open question how the algorithmically extracted structures relate to entities of
interest for sociological study, such as research topics and research specialties, which constitute
shared reference frames for researchers in the collective production of scientific knowledge
(Gläser, 2006; Held, Laudel, & Gläser, 2021).
The question of how citation-based maps relate to entities of sociological interest has
been critically addressed in the past, such as by Edge (1979), who disputed the capability
of citation analysis to effectively detect and map scientific communities. Today, the
relationship between citation-based clusters of publications and scientific research topics or
specialties is still unclear, not least because we lack ground truth data that would tell us what
bibliometric representations of actual research specialties or research topics look like (Gläser,
Glänzel, & Scharnhorst, 2017; Held et al., 2021).
Citation: Held, M., & Velden, T. (2022). How to interpret algorithmically constructed topical structures of scientific fields? A case study of citation-based mappings of the research specialty of invasion biology. Quantitative Science Studies, 3(3), 651–671. https://doi.org/10.1162/qss_a_00194
DOI: https://doi.org/10.1162/qss_a_00194
Supporting Information: https://doi.org/10.1162/qss_a_00194
Received: 30 June 2021
Accepted: 2 March 2022
Corresponding Author: Matthias Held
Handling Editor: Vincent Larivière
Copyright: © 2022 Matthias Held and Theresa Velden. Published under a Creative Commons Attribution 4.0 International (CC BY 4.0) license. The MIT Press
Downloaded from http://direct.mit.edu/qss/article-pdf/3/3/651/2057776/qss_a_00194.pdf by TU BERLIN UNIV BIBLIOTHEK user on 08 February 2023
In lieu of a ground truth that has been determined by direct empirical observation and measurement
of the object of interest, bibliometricians have constructed surrogates for a ground
truth by defining metrics that derive from alternative data sources and grouping sets of thematically
related publications together. Data sets used for such metrics include the reference lists
of review articles (Klavans & Boyack, 2017), the publications attributed to the same grant number
(Klavans & Boyack, 2017; Boyack, Newman et al., 2011), articles in specialty journals
(Sjögårde & Ahlgren, 2020), special issues of journals (Donner, 2021), and articles associated
by reading logs¹ (Donner, 2021). These metrics allow us to measure and compare clustering
solutions in terms of the accuracy with which they place the same items together compared to
the grouping of the respective data set used as reference.
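To make the idea of such an accuracy metric concrete, the following minimal sketch computes, for one reference grouping (for example, the reference list of a single review article), the share of its publication pairs that a clustering solution places in the same cluster. The function name `pair_agreement` and the toy data are illustrative assumptions; the studies cited above define their own, more elaborate metrics.

```python
from itertools import combinations

def pair_agreement(reference_group, cluster_of):
    """Share of publication pairs from one reference group that a clustering co-locates.

    reference_group: iterable of publication ids that belong together
                     according to the reference data (hypothetical input).
    cluster_of:      dict mapping publication id -> cluster id.
    """
    # Only publications that were actually clustered can be compared.
    members = [p for p in reference_group if p in cluster_of]
    pairs = list(combinations(members, 2))
    if not pairs:
        return 0.0
    same = sum(1 for a, b in pairs if cluster_of[a] == cluster_of[b])
    return same / len(pairs)

# Toy example: four papers cited by the same (made-up) review article.
clustering = {"p1": 0, "p2": 0, "p3": 1, "p4": 0}
score = pair_agreement(["p1", "p2", "p3", "p4"], clustering)
```

A score of 1 would mean that the clustering keeps the whole reference group together. The score says nothing about why the publications are related, which is the limitation discussed next.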
The extent to which the groupings of publications suggested by such reference data sets
relate to research topics or research specialties, however, remains vague. They are minimalist
prescriptions of a ground truth, as they merely define islands, subsets in a sea of publications,
that need to be grouped together by a clustering solution to score well. Hence, the accuracy
measured with the help of such metrics can provide a touchstone for thematic relatedness but
it provides little qualitative insight into the type and features of this thematic relatedness and
the correspondence between the grouping of publications produced and actual research communities
or research topics.
To shed more light on how algorithmic groupings of publications in citation-based networks
and the research done in research specialties are connected, we conduct a qualitative case
study that zooms in on one research specialty in particular, namely, invasion biology. Our
focus in this case study is on the meaning of citations in the context of the knowledge production
processes of a research specialty, because it is the pattern in citation links that is exploited
by clustering algorithms when generating citation-based mappings. We chose invasion biology
because this is a research specialty that we have studied in a related project², using qualitative
methods (ethnographic observations and expert interviews) to better understand processes of
knowledge production in this field. This enables us to support our analysis by what we term
sociologically informed domain knowledge to distinguish it from the type of domain knowledge
commanded by experts who are participants in the field, which will be deeper in many
respects, but less attuned to a theoretically grounded, sociological analysis of collective
knowledge production processes and their representation by bibliometric networks.
We proceed in this case study by delineating the research specialty of invasion biology
through a lexical query in a bibliometric database and producing two complementary algorithmic
mappings of this data: an “internal” view that uses only the citation links within the
field and an “external” view that additionally includes the citation links from outside the field
and seeks to capture the embedding of the research specialty into the global network of scientific
publications. Our findings show that the thematic structure constructed by clustering
the internal citation network is ordered primarily by the empirical object of study in invasion
biology: species and habitat. When we produce an external view onto the research specialty
by overlaying the field data set with a prominent global field classification that is based on the
clustering of a global citation network of science, we find that the topical ordering that is
imposed on the research specialty is less clear. Only a fraction of the literature in invasion
biology is grouped together; the majority of publications are dispersed into hundreds of
smaller subsets embedded in larger clusters, termed microlevel fields. The global field classification
clearly does not reconstruct the specialty in an intelligible way, echoing a finding by
Haunschild, Schier et al. (2018) for the research specialty of “overall water splitting.” Our
detailed qualitative examination indicates why that is the case.
¹ Digital traces of browsing behavior in bibliographic online databases that track the succession of items that a
user looks at or downloads.
² The qualitative studies of invasion biology are part of a field-comparative research project of the junior
research group “Field Specific Forms of Open Science” at the Deutsches Zentrum für Hochschul- und
Wissenschaftsforschung, which is led by one of the coauthors of this study, Theresa Velden (2018).
2. BACKGROUND
2.1. Mapping with Direct Citation Links
Especially in the last decade, it has become common practice in scientometrics to use direct
citation links to map the global structure of science at the article level (Boyack & Klavans,
2010; Klavans & Boyack, 2017; Sjögårde & Ahlgren, 2020; Šubelj, van Eck, & Waltman,
2016)³. This development succeeded earlier attempts in science mapping which used indirect
citation links in the form of cocitation or bibliographic coupling (Klavans & Boyack, 2017,
pp. 986–7). Those two indirect models were preferred because they are more amenable to
thresholding. This was not only needed for technical reasons (to keep network sizes small
due to restrictions in computational power) but also conceptually favored, as the focus at
the time was on detecting subsets of publications that represent emerging topics, rather than
on a field classification of an entire corpus of literature (Klavans & Boyack, 2017, p. 987).
With recent advances in computational power, the task of global field classification at
article level has become tractable and direct citation models have become a preferred choice
for global maps (Klavans & Boyack, 2017). This is in large part due to the sparseness of their
adjacency matrix, which increases the computational efficiency of clustering very large citation
networks (Sjögårde & Ahlgren, 2018, p. 5; Sjögårde & Ahlgren, 2020, p. 209; Šubelj et al.,
2016, p. 2; Van Eck & Waltman, 2017, p. 1056).
The theoretical justification for the use of direct citation models is the assumption that a direct
citation indicates a topical relatedness between the citing and cited publication. Waltman and
Van Eck (2012), who use direct citations for an article-level global mapping, speculate that
direct citations may “provide a stronger indication of the relatedness of publications” because
they represent a direct connection, whereas cocitation and bibliographic coupling are one step
removed, as both require two direct citations to happen before two studies are connected
by a cocitation or a bibliographic coupling link (Waltman & Van Eck, 2012, p. 2380). A
study by Klavans and Boyack (2017) that compares global maps built from direct citation,
cocitation, and bibliographic coupling data provides empirical support for this speculation.
Using an accuracy metric that is based on the groupings of publications referenced by the
same review article, they find that direct citation models outperform the other citation-based
models.
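The difference between the three link types can be illustrated with a small sketch that derives cocitation and bibliographic coupling pairs from a list of direct citations. It shows why both indirect relations need two direct citations before a link exists. The function name `citation_relations` and the toy data are assumptions made for illustration only.

```python
from collections import defaultdict
from itertools import combinations

def citation_relations(direct):
    """Derive cocitation and bibliographic coupling pairs from direct citations.

    direct: set of (citing, cited) tuples.
    Returns two dicts mapping an unordered pair (as a sorted tuple) to a count.
    """
    refs = defaultdict(set)      # citing -> publications it cites
    citers = defaultdict(set)    # cited  -> publications citing it
    for citing, cited in direct:
        refs[citing].add(cited)
        citers[cited].add(citing)

    cocitation = defaultdict(int)
    for cited_set in refs.values():
        for a, b in combinations(sorted(cited_set), 2):
            cocitation[(a, b)] += 1   # a and b appear in the same reference list

    coupling = defaultdict(int)
    for citing_set in citers.values():
        for a, b in combinations(sorted(citing_set), 2):
            coupling[(a, b)] += 1     # a and b share a cited reference
    return dict(cocitation), dict(coupling)

# C cites A and B; D cites A.
direct = {("C", "A"), ("C", "B"), ("D", "A")}
cocit, coupl = citation_relations(direct)
```

In this toy network A and B are cocited once (by C), and C and D are bibliographically coupled once (through A), although neither pair is connected by a direct citation.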
The interpretation of the clusters of publications that result from clustering direct citation
networks remains an open research question. The clusters obtained are referred to as research
areas (Šubelj et al., 2016; Waltman & Van Eck, 2012), taxonomic subjects (Klavans & Boyack,
2017), citation topics (Potter, 2020), and, depending on the level of resolution, research
specialties (Sjögårde & Ahlgren, 2020) and topics (Sjögårde & Ahlgren, 2018). Information
extracted from terms used in titles and abstracts, as well as journal titles, is frequently used
to describe the topical content of clusters. These labels corroborate a topical relatedness of
publications grouped together in a cluster, as well as a topical distinctiveness when compared
to publications in other clusters. However, it is not clear how these clusters of publications
relate to the social and cognitive formations of scientific communities and research specialties.
Relevant questions include the following: What do research processes that produce the publications
included in a cluster have in common? How are publications with an increased density
of citation links between them related to one another?
³ Current commercial products, such as Clarivate’s InCites classification (Potter, 2020), as well as Elsevier’s
SciVal Topic Clusters (Elsevier, 2022), are also based on this method.
2.2. Validation of Field Mapping
It has long been known among bibliometricians that a proper validation of the delineation
and mapping of scientific fields is extremely difficult (see e.g., Tijssen, 1993). Zitt, Lelu
et al. (2019, p. 26) recognize that “there is no ground truth basis for defining knowledge
domains.” The difficulty arises because creating a ground truth for a field would mean
reconstructing the shared perspectives of all researchers belonging to the field. Each scientific
field comprises topics that researchers are working on. Topics represent collective interpretations
of knowledge claims which matter for their research. However, even validating the
reconstruction of topics poses a serious challenge. Held et al. (2021) provide an attempt to
construct a ground truth for topics by combining qualitative data from interviewing individual
researchers with bibliometric data to reconstruct their perspective (“microlevel”), and from this
combine the individual perspectives to obtain a shared perspective (“mesolevel”). The validity
of these ground truths to represent topics (“mesolevel”), let alone an entire
field, can, however, be questioned.
Lacking a ground truth against which to validate algorithmically reconstructed
topical structures, studies have adopted different approaches to assess the appropriateness
of their results (Held et al., 2021, p. 4512), among them making use of experts’ opinions to
evaluate the structures (for a critique, see Gläser [2020]); comparing different solutions that use
the same data set as input (Velden, Boyack et al., 2017); or calculating the accuracy in terms of
agreement with an independent data set as a “gold standard” (Boyack et al., 2011; Donner,
2021). All of these vehicles to evaluate bibliometric structures are useful to learn about the
maps we create, and to help to assess their plausibility. Yet, these approaches remain limited
when it comes to validity. They contribute little to understanding in a principled way how the
structures constructed from the data relate to research topics and research specialties.
As it is, we can understand the clusters created as algorithmically detected fluctuations in
citation link density that show some degree of overlap with thematic groupings of publications
as defined by reference lists of review articles, by temporal succession in reading logs, or by
special issues of journals. This, along with the topic labels derived from publications grouped
together in a cluster, suggests a certain level of thematic relatedness of publications in a cluster.
But without further insight into what underlies this thematic relatedness, and hence what it
is that these clusters represent, it is difficult to base strong claims on them in a sociological or
evaluative context, as they still could represent “algorithmic artifacts” (Leydesdorff & Milojević,
2015, p. 201) rather than actual research topics or specialties.
What is further disconcerting is the fact that theoretical considerations imply that research
topics and research specialties not only vary in size, but overlap pervasively (Havemann,
Gläser, & Heinz, 2017). Current direct-citation-based mappings produce disjoint clusters
and sidestep the reconstruction of pervasively overlapping topic structures. This implies that
even in the best case only some topics or specialties are captured by a given map, while others
are left out. As of now, we cannot specify under what circumstances which types of topics or
specialties are reconstructed.
To better understand how clusters of publications in direct citation networks may be interpreted
and relate to research topics and research specialties, we propose to turn our attention
to the signal that enters clustering algorithms and take a closer look at the topical relatedness
that underlies the citation links that embed a given publication in the overall citation network.
2.3. Meaning of Citation Links
What today’s citation-based algorithmic mapping approaches share is that a citation is used as
a dimensionless entity, where the meaning contained in a citation receives a maximal simplification
(Wouters, 1999). This simplification masks the various meanings that the act of citing
confers on the citation link between two publications. Amsterdamska and Leydesdorff (1989)
show in a case study of the argumentative function of citations that citations to sources confer
rather distinct types of relevance on a given source and that even references to the same sentence
can signify different types of relevance, depending on the argumentative function of the
citation in the citing publications. They argue that the process of integration of knowledge
claims into the existing knowledge base via citation links is much more complex and multidimensional
than bibliometric uses of citations account for. The discourse in science studies
about what citations indicate about the relationship between publications, and the potential
and limitations of quantitative citation analysis, has been going on at least since the 1970s
(Amsterdamska & Leydesdorff, 1989; Edge, 1979; Erikson & Erlandson, 2014; Gläser, 2006,
pp. 141–7; Leydesdorff, 1998; Luukkonen, 1997; Nigel Gilbert, 1977; Zuckerman, 1987).
Falling short of arriving at a consistent picture that would comprise a citation theory, this discourse
has yielded “microtheories” (Gläser, 2006, p. 145) of citation, which look at the act of
citing a paper from many different perspectives. However, these different perspectives on the
scientists’ act of citing another paper have not found much regard in quantitative bibliometric
analyses, neither in structural bibliometrics (where every citation is taken as an equally relevant
link between two papers) nor in evaluative bibliometrics (where every citation is equally
used to assess a paper’s impact).
If citations are multidimensional, and the process of integration is more complex than the
simplified use of citations as dimensionless entities suggests, then a case study that explores
more deeply the distinct meanings of citation links in a science map may provide pointers to a
better understanding of the relationship between algorithmically generated clusters and
research topics or specialties.
For this study, our take on interpreting citation links and their informational value for the
topical grouping of publications is that, from a sociological perspective, citation links are
indicative of the interdependency between researchers in the production of scientific knowledge.
One way in which this interdependency becomes tangible is what we term here the
epistemic dimension of reference links, as they situate a given study relative to published
knowledge claims along a small set of fundamental dimensions of research: the research problem
treated, the empirical object(s) studied, the methods used, the theoretical resources used,
or the external relevancy of the study and its results (Seitz, Schmidt et al., 2021)⁴. Whereas
citation links cannot necessarily be taken to represent influence, as pointed out by Edge
(1979), we may attribute to them at least the signaling of some level of awareness of work that
is topically related along the fundamental dimensions listed above. By distinguishing such epistemic
dimensions of relatedness, we hope to get a better insight into how aggregate structures
of citations relate to knowledge production in a given research specialty.
⁴ Our selection of these epistemic dimensions builds on Law (1973), who has identified “methods,”
“theory,” “object,” or “problem” as potential distinct foci of specialties; and Gläser, Laudel et al. (2018),
who distinguish these as dimensions along which interdependencies within communities exist (to differing
degrees, due to differences in the epistemic conditions of research).
3. DATA AND METHODS
This case study focuses on invasion biology, a research specialty that started to develop in the
1980s, building on an increasing knowledge base about the biology and ecology of invasive
species (Davis, 2006; Reichard & White, 2003). It became institutionalized in the 1990s and
early 2000s, when specialized research journals, research centers, and conference series were
founded (Vaz, Kueffer et al., 2017, p. 433). Besides a maturing knowledge base, the emergence
of the specialty was also shaped by a growing policy interest in managing biological invasions,
due to a growing awareness of potential socioeconomic impacts (Vaz et al., 2017).
The focus of invasion biology is the human-induced spread of new organisms, and it
addresses the invasion process, pathways, causes, and factors for invasion success, as well
as invasion impact and invasion control or management. As such, it comprises fundamental
research as well as applied research (Vaz et al., 2017, p. 433). It is strongly rooted in ecology,
but—judging from the journals in which invasion biological research is published—it also
receives contributions from a range of fields, especially environmental sciences, but also biology
and geosciences, as well as social sciences and humanities (Vaz et al., 2017, p. 434).
3.1. Delineating the Field
We delineated the field of invasion biology by following a scheme outlined in Zitt et al. (2019,
p. 55), which includes (a) supervised information retrieval (lexical query), (b) the expansion of
the query, (c) the bibliometric expansion (in our case a shrinking), and (d) a final evaluation of
the outcome using our own field knowledge. For the first step (a), we rely on an existing lexical
query developed by field experts; then (b) we extend this lexical query (increasing the recall)
by using information from the metadata of researchers’ publications in the field; and (c) we reduce
the publication set again (increasing precision) using its citation network. For Step 1, we build
upon an approach developed by researchers in invasion biology (Vaz et al., 2017), which was
based on a lexical query intended to capture publications belonging to their research specialty:
[“Ecological invasion*” or “Biological invasion*” or “Invasion biology” or “Invasion ecology”
or “Invasive species” or “Alien species” or “Introduced species” or “Non-native species”
or “Nonnative species” or “Nonindigenous species” or “Non-indigenous species” or
“Allochthonous species” or “Exotic species”]
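As a rough illustration of how such a lexical query selects publications, the sketch below translates the phrases into a case-insensitive regular expression and tests it against title text. This is only an approximation under stated assumptions: the actual retrieval was run in the Web of Science, whose search semantics (fields searched, wildcard and stemming behavior) are not reproduced here, and the names `QUERY_TERMS`, `term_to_regex`, and `matches_query` are hypothetical.

```python
import re

QUERY_TERMS = [
    "ecological invasion*", "biological invasion*", "invasion biology",
    "invasion ecology", "invasive species", "alien species",
    "introduced species", "non-native species", "nonnative species",
    "nonindigenous species", "non-indigenous species",
    "allochthonous species", "exotic species",
]

def term_to_regex(term):
    # Escape the phrase, then turn the trailing wildcard into "any word characters".
    pattern = re.escape(term).replace(r"\*", r"\w*")
    return r"\b" + pattern + r"\b"

QUERY_RE = re.compile("|".join(term_to_regex(t) for t in QUERY_TERMS), re.IGNORECASE)

def matches_query(text):
    """True if any query phrase occurs in the text (e.g., a title or abstract)."""
    return QUERY_RE.search(text) is not None

hit = matches_query("Impacts of biological invasions on island ecosystems")
miss = matches_query("Spread of the invasive cane toad in northern Australia")
```

The second (made-up) title is not matched, because it names a specific invasive organism rather than using one of the generic phrases in the query.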
We used this query for field delineation in a previous study (Held & Velden, 2019). However,
through our continued engagement with researchers in the field⁵, we knew of their publication
lists and found that the above lexical query failed to cover a relevant proportion of
their publications in invasion biology. To assess how an increase in recall could be achieved,
we selected eight invasion biologists as a reference set, using a combination of criteria to
approximate a broad representation of the field (topic focus, long-term versus recent high
producers). We found that authors use terms more specific than the more general terms in the
above query (e.g., “invasive tree” or “invasive cane toad,” rather than just “invasive species”),
and concluded that the original lexical query needed to be extended (Step 2, expansion of the
query). We decided to stay with the lexical query as the main approach to delineate the field, as
the field seems to be delimitable using characteristic invasion-related terms, and we extended
the above query to cover the variation in term usage of researchers in the field. For this, we
queried a large, very coarsely delineated publication set in the Web of Science (WoS) to
identify and assess the frequency of common term phrases that characterize invasion
biology (using characteristic adjectives, and assessing their accompanying noun phrases,
which included taxonomic species and invasion biology concepts; see the full methodology
in Section S.1.1 of the Supplementary material). Thus, we iteratively extended our query with
the frequently occurring terms (the final lexical query “2.0,” which we used in delimiting
invasion biology, is provided in the Supplementary material, Section S.1.2). When assessing
the increase in recall using the reference set of eight authors, we found for seven that the
expanded query increased the recall of their invasion biological publications to above 90%.
For one reference author, whose work was initially particularly poorly represented with a recall
of less than 30%, recall increased to almost 80% (see Sections S.1.3–S.1.4 for details). Using this query, we
retrieved a set of 65,046 publications from the WoS online interface on June 22, 2019
(Figure 1). In the database of the Kompetenzzentrum Bibliometrie⁶ (stable version of 2019)
we were then able to retrieve full metadata for 63,967 publications. Finally (Step 3), to
increase the precision by reducing noise in the data, we first constructed the direct citation
network (using only internal links, decreasing the set to 55,474) and extracted its giant
component (from 619 network components). The final set consists of 53,524 publications,
which constitutes what in the following we call the invasion biology field data set. Seitz
et al. (2021) used a manually coded set of publications (see Section 3.6) to assess the
precision of this field data set with regard to capturing invasion biological publications. The
quality check conducted by Seitz et al. identified 6% of publications that formally matched the
lexical query but have no discernible link to invasion biology, in spite of the citation-based
shrinking of the data set in Step 3. The next two subsections describe how we use the field
data set as input for the Leiden algorithm to produce an internal mapping and for the
projection onto the Centre for Science and Technology Studies (CWTS) field classification
from 2019 to produce an external mapping (Figure 1).
Figure 1. Overall scheme depicting the workflow to produce the internal mapping (left side) and
the external mapping (right side) of the publication set.
⁵ The project of T. Velden compares four fields of science (one of them invasion biology) and uses bibliometric
maps of research specialties to support comparisons in ethnographic science studies.
⁶ Competence Centre for Bibliometrics: https://www.forschungsinfo.de/Bibliometrie/en/index.php?id=home.
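The citation-based shrinking in Step 3 can be sketched as follows: keep only citation links whose two endpoints are both in the retrieved set, drop publications without any internal link, and retain the largest connected component. This is a minimal standard-library sketch under our own assumptions about the input format. The function name `giant_component` and the toy identifiers are hypothetical, and the sketch is not the code used in the study.

```python
from collections import defaultdict, deque

def giant_component(field_ids, citations):
    """Return the largest connected component of the internal direct citation network.

    field_ids: set of publication ids retrieved by the lexical query.
    citations: iterable of (citing, cited) pairs; pairs with an endpoint outside
               field_ids are external links and are ignored.
    """
    neighbours = defaultdict(set)
    for citing, cited in citations:
        if citing in field_ids and cited in field_ids and citing != cited:
            neighbours[citing].add(cited)   # direction is dropped: connectivity only
            neighbours[cited].add(citing)

    seen, best = set(), set()
    for start in neighbours:                # publications without internal links never enter
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:                        # breadth-first search from an unseen publication
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in component:
                    component.add(nxt)
                    queue.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

field = {"a", "b", "c", "d", "e", "f"}
links = [("a", "b"), ("b", "c"), ("d", "e"), ("a", "x"), ("f", "y")]
core = giant_component(field, links)
```

In the toy data, publication f is dropped because it has only an external link, and the smaller component {d, e} is discarded in favor of {a, b, c}.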
3.2. Clustering of Field Data Set (Internal Mapping)
To produce a mapping of invasion biology from an internal perspective, we used as input the
generated field data set, which comprises 530,853 (internal) citation links. For clustering, we
chose the Leiden algorithm (Traag, Waltman, & van Eck, 2019), a community detection algorithm
that has been developed to overcome a decisive shortcoming of a widely used community
detection algorithm, the Louvain algorithm (Blondel, Guillaume et al., 2008),
namely, the production of badly connected clusters. It further allows us to choose between two
quality functions, the Constant Potts Model (CPM) and modularity. We chose CPM, which has been shown
to be resolution limit free (Traag, Van Dooren, & Nesterov, 2011; Traag et al., 2019). For the
resolution values and minimum cluster sizes, we selected two values each (Figure 1). Different
from the methodology introduced in Waltman and Van Eck (2012), we did not merge
clusters below the threshold, and instead discarded them. The publications from those discarded
clusters amount to less than 10% of the publications in both solutions Leiden 7 and
Leiden 15. The algorithm was started with random seed 0 and run for 100 iterations with 10
random starts each.
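The two choices described above, CPM as quality function and discarding rather than merging small clusters, can be sketched in a few lines. The quality function below follows the unweighted formulation we understand from Traag et al. (2019), in which each cluster contributes its number of internal links minus the resolution times its number of possible publication pairs. The function names, the toy graph, and the parameter values are illustrative assumptions and do not correspond to the settings of the solutions reported here.

```python
from collections import Counter

def cpm_quality(edges, cluster_of, resolution):
    """CPM quality: sum over clusters of (internal edges - resolution * possible pairs)."""
    internal = Counter()
    for u, v in edges:
        if cluster_of[u] == cluster_of[v]:
            internal[cluster_of[u]] += 1
    sizes = Counter(cluster_of.values())
    return sum(internal[c] - resolution * n * (n - 1) / 2 for c, n in sizes.items())

def discard_small_clusters(cluster_of, min_size):
    """Drop publications whose cluster has fewer than min_size members (no merging)."""
    sizes = Counter(cluster_of.values())
    return {p: c for p, c in cluster_of.items() if sizes[c] >= min_size}

edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
partition = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
quality = cpm_quality(edges, partition, resolution=0.5)
kept = discard_small_clusters(partition, min_size=3)
```

A cluster only adds to the quality if its internal link density exceeds the resolution value, which is why higher resolution values yield smaller and denser clusters.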
3.3. Projection onto Global Field Classification (External Mapping)
To complement this internal perspective on a research specialty with an external perspective
that takes the embedding of publications in the global citation network of science into account,
we projected the field data set onto the CWTS “microlevel field classification.” The global field
classification consists of 4,536 “microlevel fields” (MLFs) that have been extracted with the smart
local moving (SLM) algorithm on the weighted direct citation network of more than 23 million
publications published in 2000–2019 and indexed by WoS. Of the 53,524 publications included in the field
data set, 51,845 were found in the MLFs of the CWTS field classification. This global field classification
induces a subdivision of the field data set into groupings of publications that constitute
the intersection between field data set and MLFs. We refer to the publications included in
the intersection of MLFs with the field data set as projection clusters.
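The projection amounts to a simple grouping operation, sketched below under the assumption that the global classification is available as a mapping from publication id to MLF id. The names `projection_clusters` and `mlf_of` and the toy data are hypothetical stand-ins, not the CWTS data format.

```python
from collections import defaultdict

def projection_clusters(field_ids, mlf_of):
    """Group field publications by the microlevel field (MLF) they are assigned to.

    mlf_of: dict publication id -> MLF id for the global classification
            (hypothetical stand-in for the classification's assignment table).
    Returns (clusters, unmatched): clusters maps MLF id -> set of field publications.
    """
    clusters = defaultdict(set)
    unmatched = set()
    for pub in field_ids:
        if pub in mlf_of:
            clusters[mlf_of[pub]].add(pub)
        else:
            unmatched.add(pub)          # field publication not covered by the classification
    return dict(clusters), unmatched

field = {"p1", "p2", "p3", "p4"}
global_classification = {"p1": "MLF-17", "p2": "MLF-17", "p3": "MLF-204", "q9": "MLF-17"}
clusters, unmatched = projection_clusters(field, global_classification)
```

Each projection cluster is thus the part of one MLF that falls inside the field data set. Publications of the MLF that lie outside the field, such as q9 in the toy data, do not belong to it.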
3.4. Labeling
To find characteristic terms to describe the content of clusters, we extracted the noun phrases
from titles and abstracts of the publications of each cluster in each cluster solution. We consolidated
species names by tagging taxonomic species (because various terms referring to the
same invasive species are used in invasion biology) with the software Linnaeus (see Section
S.1.5). Finally, characteristic terms for each cluster were determined using the labeling
approach based on normalized mutual information described by Koopman and Wang
(2017). The full description of our labeling process is given in Section S.1.5.
3.5. Visualization
For visualizing the reconstructed topical structures of the internal and the external mapping,
we use topic affinity networks that evaluate the strength of citation links between topical
clusters to determine their affinity. The existence of a link between clusters in the affinity network
indicates a surplus of connectivity between the two compared to a random null model
(see Velden, Yan, & Lagoze [2017] for details).
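The notion of a surplus of connectivity can be illustrated with a simple degree-based expectation. In the sketch below, two clusters are linked if the observed number of citation links between them exceeds the number expected when link ends are paired at random in proportion to each cluster's total degree. This null model is our assumption for illustration and is not necessarily the one defined in the cited reference; the function name `affinity_links` is hypothetical.

```python
from collections import Counter

def affinity_links(edges, cluster_of):
    """Return cluster pairs whose observed link count exceeds a degree-based expectation.

    The returned value for each pair is the ratio observed / expected.
    """
    degree = Counter()            # summed degree of all publications in a cluster
    observed = Counter()          # links between two different clusters
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu != cv:
            observed[tuple(sorted((cu, cv)))] += 1
    total = len(edges)
    links = {}
    for (a, b), obs in observed.items():
        expected = degree[a] * degree[b] / (2 * total)   # random pairing of link ends
        if obs > expected:
            links[(a, b)] = obs / expected
    return links

cluster_of = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2, "f": 2}
edges = [("a", "b"), ("c", "d"), ("e", "f"), ("a", "c"), ("b", "c"), ("d", "e")]
links = affinity_links(edges, cluster_of)
```

In the toy network only clusters 0 and 1 are linked: they share two links where about 1.67 would be expected, whereas the single link between clusters 1 and 2 stays below its expectation of 1.25.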
3.6. Probing with Publications in the Field
To mediate between the macrolevel of algorithmically generated thematic structures from
direct citation networks and the microlevel of individual citation decisions, we look at the
topical meaning that citations carry due to the epistemic dimension of scientific knowledge
production that they pertain to. As a first step, we identify publications with an undeniably
strong focus on invasion biology. To this end, we take advantage of a sample of publications
from the field data set that has been coded by Seitz et al. (2021) by their degree of focus on
invasion biology. The approach to the classification of research focus developed by Seitz et al.
requires the intellectual assessment of the degree to which a publication is focused on con-
tributing to the field of invasion biology based on the information and framing provided in the
bibliographic metadata, especially title, abstract, and author keywords. The classification
scheme distinguishes four ordinal classes: core (studies with a singular focus on invasion biol-
ogy), boundary (studies with an additional focus on another field), periphery (studies with only
a weak link to invasion biology), and unrelated (studies with no discernible link to invasion
biology). From a field delineation perspective, core and boundary publications together can be
considered as constituting the field, whereas peripheral and unrelated publications are not
considered part of the field. The classification rests on an understanding of what constitutes
research in the field of invasion biology, which requires us to look into the substantive content
of the science. To make this—in the end subjective—understanding explicit and increase the
transparency of the classification process, Seitz et al. describe the research focus and overarch-
ing research questions that define the field in their mind, specify a decision tree, and document
decisions on where to draw the line⁷.
Seitz et al. have applied this classification to 100 publications, randomly sampled from
publications in the field data set published in the year 2018. They found that 49% of publi-
cations are core, 22% boundary, 23% periphery, and 6% unrelated. Taken together, the data
suggest that 71% (±9%) of publications in the field data set represent invasion biological
research while 29% (±9%) have only a weak link or no link to invasion biology. For the
purposes of our case study, we use a subset of 49 publications from the sample that were
identified as belonging to the category of core publications, which means that they have an
undeniably strong focus on invasion biology. This subset allows us to probe how well the
algorithmically generated clusters of the global science map represented by the CWTS-MLF
classification capture the research specialty of invasion biology. This is similar to the approach
taken in the study conducted by Haunschild et al. (2018) about the degree to which the
CWTS-MLF classification captures the research specialty of overall water splitting.
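The reported uncertainty of ±9% is consistent with a standard margin of error for a sample of 100 publications. The sketch below assumes a 95% confidence level and the normal approximation for a proportion; the text does not state how the interval was actually derived, so this is one plausible reading.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of the normal-approximation interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

n = 100                  # size of the coded sample
p_field = 0.49 + 0.22    # core + boundary publications
moe = margin_of_error(p_field, n)
print(f"{p_field:.0%} +/- {moe:.1%}")
```

Under these assumptions the half-width is roughly 0.089, that is, about 9 percentage points.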
In a second step, we move beyond a merely quantitative assessment of the agreement
between algorithmic field classification and intellectual assignment of publications to the field:
We inspect individual cases of core invasion-biological publications and explore how they are
embedded into the algorithmically generated MLFs by citation links that carry different kinds
of topical meaning.
7
For example, when a study examines an invasive species, without attending to any aspect of the invasion
process, it is classified as research that is not invasion biological, but peripheral to the research specialty of
invasion biology.
4. RESULTS
We have generated two complementary bibliometric maps of the research specialty of inva-
sion biology that are based on direct citation links. Both work with the same lexically delin-
eated field data set representing invasion biology, and they use similar clustering approaches
to detect structures in direct citation networks. The first one exploits the signal provided by
citation links inside the field data set to construct an internal perspective of the topic structure
of the field (Section 4.1). The second one provides an external perspective on the field data set,
one that is influenced by citation links inside the field data set as well as by links from a global
set of scientific publications (Section 4.2).
4.1. Internal Field Mapping
For the internal field mapping we produce two clustering solutions, using two different reso-
lution levels. Cluster sizes are given in Figure 2. The topical content of each cluster is indicated
by the cluster labels provided in the Supplementary files (cluster descriptions). The alluvial
diagram (Supplementary material) of the two cluster solutions indicates great stability between
the two solutions, in the sense that clusters in the low-resolution solution (Leiden 7) are split up
to form smaller clusters in the higher resolution solution (Leiden 15). Only occasionally do
subsets from different “parent” clusters combine to form new clusters.
Figure 3 visualizes the topic structure of invasion biology in the form of a topic affinity net-
work and is based on the seven-cluster solution Leiden 7. Node sizes are scaled by number of
publications, and links indicate a surplus in intercluster citations, or “topic affinity.” The map
suggests that, based on internal citation links, publications in invasion biology group together
by empirical object, that is, habitat (aquatic versus terrestrial) and, within the terrestrial habitat,
species (vertebrates versus insects versus plants). The largest topical cluster contains 35.7% of
publications. A predominant focus of this largest cluster seems to be invasive plants, which
aligns with the judgement of field experts that observational studies of terrestrial plants
dominate the empirical literature in invasion biology (Jeschke & Heger, 2018, p. 162; Pyšek,
Richardson et al., 2008).
4.2. External Field Mapping
Projecting the field data set onto the more than 4,000 CWTS MLFs results in a very uneven size
distribution of projection clusters, as can be seen from Figure 5. The largest cluster contains
18% of publications in the field data set. The second largest cluster is drastically smaller,
Figure 2. Number of publications in the major clusters of the Leiden 7 and Leiden 15 solutions.
capturing only 2.6% of publications in the field data set⁸. Overall, there are 873 projection
clusters (i.e., almost 20% of MLFs in the global field classification contain at least one
publication from the field data set). The topic affinity network depicted in Figure 4 is
based on the 56 largest projection clusters. It consists of the largest cluster, containing almost
10,000 publications, along with 55 small topical clusters ranging in size between 1,335 and
200 publications. The network depicted covers only 75% of publications in the field data
set. To obtain a mapping that includes at least 90% of publications, we would need to include
140 clusters, down to a size of 45 publications.
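How many of the largest clusters are needed to reach a given coverage can be computed with a simple cumulative sum. The following is a minimal sketch with hypothetical cluster sizes; it does not use the actual projection cluster sizes.

```python
def clusters_needed(sizes, total, target):
    """Smallest number of largest clusters covering `target` of `total`.

    Returns (number_of_clusters, size_of_smallest_included_cluster),
    or None if the clusters never reach the target share.
    """
    covered = 0
    for k, size in enumerate(sorted(sizes, reverse=True), start=1):
        covered += size
        if covered >= target * total:
            return k, size
    return None

# Hypothetical, strongly skewed size distribution
sizes = [50, 20, 10, 10, 5, 3, 2]
total = 100
three_quarters = clusters_needed(sizes, total, 0.75)
most = clusters_needed(sizes, total, 0.93)
```

With a skewed distribution like this one, each additional gain in coverage requires including ever smaller clusters, which is the pattern described above.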
Figure 5 shows that in most cases, the publications from the field data set constitute just a
very small portion of the publications in the corresponding MLFs. Almost two-thirds (63%) of
publications in the field data set are spread across more than 800 MLFs in which they have less
than a 10% share, such that they are marginalized within the respective MLFs.
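The share of marginalized publications can be computed from two size tables, as in the sketch below. The dictionaries `projection_sizes` (field publications per MLF) and `mlf_sizes` (all publications per MLF) are hypothetical names with toy values; the 10% threshold follows the text.

```python
def marginal_share(projection_sizes, mlf_sizes, threshold=0.10):
    """Fraction of field publications in MLFs where the field's share
    of the MLF is below `threshold`."""
    total = sum(projection_sizes.values())
    marginal = sum(
        n for mlf, n in projection_sizes.items()
        if n / mlf_sizes[mlf] < threshold
    )
    return marginal / total

# Toy values: one field-dominated MLF and two large "foreign" MLFs
projection_sizes = {"mfA": 60, "mfB": 30, "mfC": 10}
mlf_sizes = {"mfA": 100, "mfB": 1000, "mfC": 500}
share = marginal_share(projection_sizes, mlf_sizes)
```

In the toy data, 40 of the 100 field publications sit in MLFs where they make up less than 10% of the MLF.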
One possible explanation for such a high level of dispersion of the field data set when pro-
jected onto the CWTS field classification could be a lack of precision in the delineation of the
field data set. However, as mentioned above, according to the analysis of Seitz et al. (2021)
only 29% (±9%) of publications in the field data set have a weak or no link to invasion biology,
and hence could account for the spread. So while lack of precision may be one factor, alone it
is insufficient to explain the observed dispersion of 63% of the field data across MLFs. This
raises the question of why in the global field classification so many invasion biological
publications are part of MLFs where they constitute only a marginal subset and how they
are associated with the respective MLFs.
To answer this question, we take a closer look at a sample of publications from the field
data set that have a strong focus on invasion biology to see how they are embedded in MLFs.
Figure 3. Topic affinity network of invasion biology based on the Leiden 7 clustering solution. Node sizes reflect number of publications in a
topic cluster.
8
Restricting the field data set to those publications with a “strong signal” (many citations/references,
represented by the 10-core of the network) shows a similar distribution. Detailed results are given in Section S.2.1.
We take advantage of a sample of 49 publications manually classified by Seitz et al. (2021) as
core invasion biological publications (see Section 3.6 and a list of all 49 publications in the
supplementary files). We find that a quarter (24.5%) of those 49 core invasion biological pub-
lications are located in MLFs that overlap with the field data set by more than 50%, and hence
can be considered MLFs with an invasion biological orientation. More than half (53%) of the
sample of 49 core invasion biological publications, however, are included in MLFs that have a
Figure 4. Topic affinity network of the 56 largest projection clusters (defined as the intersection between field data set with CWTS MLFs). The
coloring of nodes derives from the Leiden 7 clusters (in Figure 3) and indicates when more than 50% of publications in a projection cluster
overlap with publications in the respective cluster from Leiden 7. Black/grey are clusters where the majority of publications are publications
that were in the residual cluster of the Leiden 7 solution (i.e., not part of one of the seven main clusters). Three nodes are highlighted with a
lighter shade, C5, C14, and C26 to indicate that the dominant share constitutes less than 50%.
Figure 5. The 56 CWTS MLFs with the highest number of publications from the field data set. The publications from the field data set (i.e.,
projection clusters) are indicated in green. The two MLFs with more than 40% of overlap with the field data set are highlighted.
marginal overlap of less than 10% with the field data set. When we examine the topical
orientation of those MLFs⁹, we find that the epistemic link between those MLFs and the embedded
core invasion biological publications is primarily given through either a shared empirical
object or a shared methodological approach. In the following Table 1, we provide five exam-
ples to illustrate these different associations.
Example 1 describes the invasion history of an aquatic species in Finland and discusses the
ecological and economic risks associated with it, as it can support the spread of parasites. It is
embedded in MLF mf3541, which relates to research on aquatic parasites and is approxi-
mately 1,400 publications strong. The second paper is published in the journal Gigascience
and has been classified by Seitz et al. as core invasion biology, as it presents a study of causal
factors for invasion success. Its empirical object is an invasive freshwater snail that, according
to the authors of the study, is one of the 100 worst invasive species worldwide. The publication
is embedded in mf1589, which consists of more than 5,600 publications and relates to zoo-
logical research on molluscs. Example 3 is published in the macroecological journal Global
Ecology and Biogeography and has been classified by Seitz et al. as core invasion biology
because it deals with predicting the geographic expansion of nonindigenous organisms which
is a central invasion biological concern. It proposes a novel method for the prediction of range
expansion for invasive species, by adopting a forecasting approach previously applied in the
biomedical field. This study is the only publication from the field data set that is included in
mf316, which consists of more than 12,000 publications and seems to be dominated by
research on flu vaccinations. This example cautions against discounting a minimal overlap
between the field data set and an MLF as an indication of the irrelevance of the respective
publication to invasion biology. The study in example 4.1 is published in Entomological
Research and has been classified by Seitz et al. as core invasion biology because, again, it
deals with predicting the geographic expansion of nonindigenous organisms, which is a cen-
tral invasion biological concern. Its empirical object is a highly invasive ant. This publication
is located in an MLF of more than 8,000 publications (mf913) that seems to share a method-
ological approach: species distribution modeling. Example 4.2 represents another core inva-
sion biological study of invasive fire ants. It examines factors explaining the distribution pattern
of fire ants in the United States, and is assigned to mf227, an MLF dealing with sociobiological
and behavioral research on ants.
The two fire-ant-related invasion-biological studies referred to above represent an interest-
ing case for further exploration, given that they are assigned to different MLFs, neither of which
seems to have an invasion biological focus. Only around 50% of the cited sources in the ref-
erence lists of these two articles are included in the direct citation network that the CWTS MLF
classification is based on. The other 50% of cited sources are either not indexed in the WoS
Core Collection or published outside of the 2000–2019 time window. Figure 6 shows those
MLFs that the majority of the cited sources are assigned to. An inspection of titles and abstracts
of those cited sources provides insight into the different bodies of knowledge that they repre-
sent and that the citing articles relate to. We find that references assigned to the same MLF tend
to share either the empirical object, a methodological approach, or a theoretical concern with
a certain research problem, such as invasiveness (mf913) or cold tolerance of insects (mf1624).
As we can see from Figure 6, the two invasion biological studies are assigned to the MLF that
9
For this we rely on the CWTS “microlevel” field labels of 2019, accessible at
https://www.leidenranking.com/Content/CWTS%20Leiden%20Ranking%202019%20-%20Micro-level%20fields.xlsx
(accessed July 4, 2021).
Table 1. Examples of five core invasion biology publications with their assignments to CWTS MLFs and identified epistemic link to the MLF

| Example | Title | Journal | MLF | Classification by Seitz et al. | Epistemic link |
| --- | --- | --- | --- | --- | --- |
| #1 | Invasion of Finnish inland waters by the alien moss animal Pectinatella magnifica Leidy, 1851 and associated potential risks | Management of Biological Invasions | mf3541 (aquatic parasites) | Core invasion biology | Empirical object |
| #2 | The genome of the golden apple snail Pomacea canaliculata provides insight into stress tolerance and invasive adaptation | Gigascience | mf1589 (molluscs) | Core invasion biology | Empirical object |
| #3 | Supervised forecasting of the range expansion of novel nonindigenous organisms: Alien pest organisms and the 2009 H1N1 flu pandemic | Global Ecology and Biogeography | mf316 (flu vaccinations) | Core invasion biology | Method |
| #4.1 | Predicting the potential distribution of an invasive species, Solenopsis invicta Buren (Hymenoptera: Formicidae), under climate change using species distribution models | Entomological Research | mf913 (species distribution modeling) | Core invasion biology | Method |
| #4.2 | Cuticular hydrocarbon chemistry, an important factor shaping the current distribution pattern of the imported fire ants in the USA | Journal of Insect Physiology | mf227 (sociobiological and behavioral research on ants) | Core invasion biology | Empirical object |
the relative majority of cited sources in their respective reference list is assigned to. For the
study in example 4.1, the number of method-related, empirical object-related, and theory-
related citations is fairly similar; for the study in example 4.2, the number of empirical
object-related citations clearly exceeds the number of theory-related citations.
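The observation that each study sits in the MLF holding the relative majority of its cited sources can be mimicked with a simple plurality count. This is only an illustrative simplification: the actual MLF assignment results from clustering the entire citation network, not from a per-paper rule. The function name is hypothetical, and the reference list below is toy data that reuses MLF labels from the text without reproducing the counts in Figure 6.

```python
from collections import Counter

def plurality_mlf(reference_mlfs):
    """Return (mlf, share) for the MLF holding the relative majority of
    a paper's classified references; None marks unclassified references."""
    counts = Counter(m for m in reference_mlfs if m is not None)
    if not counts:
        return None
    mlf, n = counts.most_common(1)[0]
    return mlf, n / sum(counts.values())

# Toy reference list: None stands for cited sources outside the network
refs = ["mf913", "mf913", "mf227", "mf1624", None, None, "mf913", "mf227"]
result = plurality_mlf(refs)
```

In the toy list, the winning MLF holds only half of the classified references, which illustrates how closely competing epistemic dimensions can make the outcome appear somewhat arbitrary.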
5. DISCUSSION
The internal and external mappings of invasion biology presented above are based on similar
clustering algorithms and the same data model (direct citation). However, they deliver rather
distinct perspectives on the target specialty due to the difference in the scope of the citation
signal that is used.
The organization of the topical structure of the internal map by empirical object echoes a
similar finding for a mapping of the field of astrophysics, which also used a direct citation
network as data model (Velden et al., 2017). This is an interesting result that raises the question
of whether the organization of the structure of the direct citation network by empirical object is
a signature of all empirically oriented scientific research specialties. Answering this question is
subject to future research.
For invasion biology, this finding is not entirely surprising. If we take citation links to signify
some form of communication (Vugteveen, Lenders, & Van den Besselaar, 2014), then the struc-
ture of the internal map indicates that the density of communicative links in invasion biology is
highest between publications pertaining to the same family of empirical objects. The majority
of field internal citation links, however, are not empirical object related but, according to the
analysis of Seitz et al. (2021), include a substantial proportion of theory-related and research-
problem-related citation links. Hence our finding of a citation-based topic structure that is
ordered by empirical object suggests that in this research specialty, research problems and
theoretical considerations are closely entangled with specific classes of empirical objects. This
would match observations of the field having a journal literature that is dominated by empir-
ical studies that use experimental and field-observational approaches to study specific invasive
species in specific geographical locations and habitats (Seitz, 2021). The predominance of
Figure 6. MLF assignment of cited sources of two core invasion biological journal articles related to invasive ants.
case studies and system-specific knowledge in invasion biology is also discussed by
researchers in the field. It can be seen as a weakness and an expression of a lack of theoretical
integration (Davis, 2006), or it can be interpreted as a strength, and the result of application-
oriented research efforts directed at invasion management that adopt the stance that “useful
predictions [can] only emerge from focused studies on particular species and environments”
(Williamson [1999], cited in Davis [2006]).
Obviously, the ordering of publications by empirical object of research, which the direct
citation network delivers, is only one of potentially many useful thematic perspectives on the
published knowledge base in invasion biology. An alternative thematic perspective that
focuses on methods might distinguish field-observational studies, experimental studies,
macroecological studies, and policy-oriented social studies. Yet another thematic perspective
might foreground the theoretical understanding of invasion processes and distinguish work by
conceptual focus: from invasion pathways and factors of invasion success, to forms of invasion
impact. In principle, by use of different approaches and types of data one should be able to
tease out different aspects of the topic structures in a research specialty (see the triangulated
mapping of water science by Wen, Horlings et al. [2017]). A recent mapping approach devel-
oped by researchers from within the field of invasion biology (Jeschke, Enders et al., 2020)
aims to address a perceived lack of integration of knowledge in the field. The researchers
reconstruct the theoretical backbone of the field in the form of a network of related hypotheses
about invasion success (Enders, Havemann, & Jeschke, 2019), and organize publications that
report empirical results with regard to the empirical support they provide for the hypotheses.
As we move to an external perspective on the field and take the signal from the global
citation network into account, the topic structure of the research specialty that is reconstructed
reflects communication between knowledge claims produced in the field with knowledge
claims produced outside. An example is the mapping of astrophysics by Boyack (2017), pro-
duced by projecting the Astro Data set (Gläser et al., 2017) onto a global field classification
with approximately 1,700 clusters. Different from Boyack’s result for astrophysics, though, our
projection of the invasion biology field data set onto a global field classification results in a
strong dispersion of the target field (Figure 4), highlighting the many connections of knowledge
claims in the field to knowledge claims produced outside, mediated by empirical objects,
methodological approaches, and theoretical concerns. The difference in dispersion could
be indicative of actual differences in the cohesion and insularity of the respective fields: astrophysics,
an old, established field with a rather distinct sphere of empirical objects, versus invasion
biology, a new field that shares many of its empirical objects with other fields. However,
technical differences in the construction of the global mapping solutions and the delineation of
the field could also play a role, such that the results of the two studies cannot be directly
compared.
Our results can more readily be compared with the results of Haunschild et al. (2018), who
work with a previous version of the global CWTS field classification and examine the representation
of the research specialty of “overall water splitting” by CWTS MLFs. The strong dispersion
that we observe for invasion biology reproduces their finding of a poor alignment of a
lexically delineated research specialty and the direct citation-based MLF classification. For the
specialty of overall water splitting, they find that the largest overlap of the field data set with an
MLF accounts for 41% of the publications in the field data set, which leads them to suggest
that the MLF classification may not have sufficient discriminatory power to delineate scientific
fields. In our study this largest proportion is even lower, with only 19% of publications from
the invasion biological field data set included in the MLF with the largest overlap.
Invasion biology and overall water splitting are only two cases of research specialties that
align poorly with the CWTS field classification, which includes more than 4,000 MLFs. From a quantitative
point of view, one would argue that this is too small an empirical basis to dismiss the
ability of the MLF classification to identify and capture research specialties. More research
is needed to determine what kind of entities MLFs do represent, and what characteristics
may distinguish specialties missed by the classification from those that may be found to
be well represented by it.
Although our case study cannot authoritatively settle the question of the suitability of the
CWTS field classification to capture research specialties, it at least provides qualitative insights
into what causes the observed misalignment: In our investigation of the citation links of core
invasion biological publications to sources in the global direct citation network, we find that
citation links to sources in the same MLF relate to the same epistemic dimensions (especially
empirical object, method, or theory), and that the tally of citation links relating to one such
dimension determines, relatively arbitrarily¹⁰, which MLF of a set of potential MLFs a publication
gets algorithmically assigned to. So, the dispersion observed seems to stem from the
variety of object-, method-, and theory-related bodies of knowledge that a typical invasion
biological study relates to, which the global microlevel field classification, based on generic
citation links, treats as distinct.
Perhaps some types of specialties fare better than others in being singled out by a direct
citation-based global mapping approach. Presumably, the less diverse a specialty is in terms
of objects studied and methods used, and the more integrated (or insulated from other fields)
theoretical concerns are, the more likely it is that citation links will cause related work to coa-
lesce into a common cluster of publications.
However, it is an open question how many research specialties fulfill such an “ideal” of a
cohesive and insulated base of relevant scientific knowledge. No observer of the sciences
today will deny that knowledge production processes are highly interconnected in most fields
of science. And we can point to features of knowledge production processes that become
apparent in our analysis of citation links in invasion biology here, and in Seitz et al. (2021),
that are at odds with features of global direct-citation-based mapping approaches. We observe
the “borrowing” of method-related and empirical knowledge, which, as we have shown, can
in the number of citation links it generates drown out the signal of citation links that are
directed at the discourse and knowledge produced within the research specialty that one is
analytically targeting. Using citation links as a generic signal is bound to conflate the topical
relatedness signal from publications in other research specialties that are used merely instru-
mentally, and the topical relatedness signal from publications within the specialty that a pub-
lication is seeking to contribute to. Furthermore, there is the pervasive overlap of research
specialties (Havemann et al., 2017)—a feature that is at odds with the hard clustering
approach used by global mapping approaches, which assigns publications to one cluster
only¹¹. Seitz et al. (2021, p. 1031) observe in their random sample of publications from
the invasion biology field data set several instances of such overlap, giving as an example
10
Arbitrarily insofar as several dimensions may closely compete, and that the maximal number of sources cited
with regard to any of these dimensions does not necessarily indicate the research focus of a study.
11
Havemann et al. (2017, p. 1091) promote the idea of mapping topics locally (allowing for overlaps) by
including only the citations close to the area of interest in the network, in contrast to the global approaches,
which include all citations in the network. This idea, in principle, should also be applicable to map
specialties.
publications about the interaction of invasive species with potential biological control agents.
The framing of such studies is decidedly invasion biological, identifying an invasive pest,
describing its harmful impact, and studying ways to control it, such that Seitz et al. (2021) clas-
sify them as core invasion biological. At the same time, such a study is unequivocally contrib-
uting to the field of pest management, hence showcasing an instance of pervasive overlap of
the two fields. Similarly, several of the core invasion biological publications discussed in this
study and listed in Table 1 exemplify pervasive overlap: for instance, example 2 on the genome
of the golden apple snail, which is framed by its authors as contributing to elucidating invasion
success mechanisms, but simultaneously contributes to the field of studies of molluscs, and,
arguably, the new field of giga science.
So what could be a way forward? There does not seem to be an easy fix: Neither epistemic
dimensions of citations as (meta)data for clustering algorithms to exploit to produce refined
maps, nor highly performing clustering algorithms that allow for pervasive overlap of clusters,
nor visualization tools that support the inspection of polyhierarchical topic structures are
readily available.
What is needed, in the meantime, is greater candor about the challenges of interpreting the
outcomes of algorithmic mapping approaches, together with more empirical evidence than is hitherto
available in the scientometric literature: evidence that is rooted in an understanding of
scientific knowledge production processes and that provides insight into what individual clusters
represent. With the approach we have taken in this study we seek to contribute to
the development of methods for producing such evidence and overcome the limitations of
expert evaluations of maps, which are susceptible to confirmation bias and tend to suffer from
the lack of a systematic contextualization of the view point of the expert (Gläser, 2020). An
important precondition for unleashing the power of a broader, interdisciplinary community to
envisage and pursue questions of validity and interpretation of global field classifications is to
make the respective data widely accessible, so they can be scrutinized in multiple case stud-
ies. We are grateful in this context for the readiness of our colleagues from the CWTS to share
their 2019 field classification with us.
6. CONCLUSIONS
Mapping structures of science using the citation signal has a decades-long history in bibliometrics.
For certain purposes such a map may be useful, while for others it may fail to represent
a suitable unit of analysis. Understanding what these mappings actually indicate (for
example, with regard to the delineation and topical structure of research specialties) is vital
if these maps are used for further analysis and evaluation.
To advance our understanding of how to interpret such mappings, we conducted a case
study of how direct-citation-based maps portray the research specialty of invasion biology.
Rather than ask field experts, who often are novices in the sociology of science, to provide
an ad hoc interpretation of a bibliometric map, we relied on sociologically informed domain
knowledge to support our analysis. Cornerstones of this analysis have been a transparent
definition of the field, a field focus classification (Seitz et al., 2021) that makes explicit the
decision criteria for which types of publications are included, and an understanding of citation
links as indicating awareness and topical relatedness along fundamental epistemic dimensions
of research.
Interdependencies between scientific fields are, generally, mediated to a large degree by
shared methods, empirical objects, or theoretical concerns (Gläser et al., 2018). In this case
study, we demonstrate how these interdependencies are reflected in the interlinked citation
structure of science. Our findings suggest that using direct citations to map scientific fields,
without any further differentiation regarding their epistemic dimension, results in a blurred
signal and impairs the transparency of these maps. Along with other studies (Haunschild et al.,
2018; Held et al., 2021), they suggest that global science maps are likely not adequate to capture
all specialties. Our scrutiny of algorithmically generated field structures draws attention to
what it means to constitute a field in today’s interdisciplinary science structure and to the
resulting challenges involved in mapping science.
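As a purely illustrative aside, the blurring effect described above can be sketched with toy data. The Python sketch below assumes hypothetical citation links, each labeled with the global-classification field of the cited publication and with a hand-coded epistemic dimension (object, method, or theory). The field names, labels, and function names are assumptions made for illustration; they are not the data or code used in this study.

```python
from collections import Counter, defaultdict

# Hypothetical toy data (not from the study): one tuple per citation link,
# (citing publication, field of the cited publication, epistemic dimension).
LINKS = [
    ("p1", "fieldA", "object"),
    ("p1", "fieldB", "method"),
    ("p2", "fieldA", "object"),
    ("p2", "fieldC", "theory"),
    ("p3", "fieldB", "method"),
    ("p3", "fieldA", "theory"),
]


def undifferentiated_shares(links):
    """Share of all citation links pointing to each field, ignoring dimension."""
    counts = Counter(field for _citing, field, _dimension in links)
    total = sum(counts.values())
    return {field: n / total for field, n in counts.items()}


def shares_by_dimension(links):
    """Per epistemic dimension, the share of links pointing to each field."""
    counts = defaultdict(Counter)
    for _citing, field, dimension in links:
        counts[dimension][field] += 1
    return {
        dimension: {field: n / sum(ctr.values()) for field, n in ctr.items()}
        for dimension, ctr in counts.items()
    }


if __name__ == "__main__":
    print(undifferentiated_shares(LINKS))
    print(shares_by_dimension(LINKS))
```

In this toy example the undifferentiated view spreads the links over three fields, whereas the differentiated view shows that method-related links all point to one field and object-related links to another. Whether real citation data separate this cleanly is an empirical question that the sketch does not answer.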
ACKNOWLEDGMENTS
We thank Vincent Traag for providing us with the 2019 CWTS MLF classification data, and two
anonymous reviewers for their rich comments.
AUTHOR CONTRIBUTIONS
Matthias Held: Formal analysis, Methodology, Software, Writing – original draft, Writing –
review & editing. Theresa Velden: Conceptualization, Formal analysis, Investigation,
Methodology, Visualization, Writing – original draft, Writing – review & editing.
COMPETING INTERESTS
The authors have no competing interests.
FUNDING INFORMATION
The work of MH was supported by the German Ministry of Education and Research (Grant
01PU17003). Furthermore, we acknowledge support by the German Research Foundation
and the Open Access Publication Fund of TU Berlin.
DATA AVAILABILITY
The data analyzed in this manuscript is subject to copyright (by Clarivate Analytics) and
cannot be made available.
REFERENCES
Amsterdamska, O., & Leydesdorff, L. (1989). Citations: Indicators of
significance? Scientometrics,15(5–6), 449–471. https://doi.org
/10.1007/BF02017065
Blondel, V. D., Guillaume, J.-L., Lambiotte, R., & Lefebvre, E.
(2008). Fast unfolding of communities in large networks. Journal
of Statistical Mechanics: Theory and Experiment,2008(10),
P10008. https://doi.org/10.1088/1742-5468/2008/10/P10008
Boyack, K. W. (2017). Investigating the effect of global data on
topic detection. Scientometrics,111(2), 999–1015. https://doi
.org/10.1007/s11192-017-2297-y
Boyack, K. W., & Klavans, R. (2010). Co-citation analysis, biblio-
graphic coupling, and direct citation: Which citation approach
represents the research front most accurately? Journal of the
American Society for Information Science and Technology,
61(12), 2389–2404. https://doi.org/10.1002/asi.21419
Boyack, K. W., Newman, D., Duhon, R. J., Klavans, R., Patek, M., …
Börner, K. (2011). Clustering more than two million biomedical
publications: Comparing the accuracies of nine text-based
similarity approaches. PLOS ONE,6(3), e18029. https://doi.org
/10.1371/journal.pone.0018029, PubMed: 21437291
Davis, M. A. (2006). Invasion biology 1958–2005: The pursuit of
science and conservation. In Conceptual ecology and invasion
biology: Reciprocal approaches to nature (pp. 35–64). Springer.
https://doi.org/10.1007/1-4020-4925-0_3
Donner, P. (2021). Validation of the astro dataset clustering solu-
tions with external data. Scientometrics,126(2), 1619–1645.
https://doi.org/10.1007/s11192-020-03780-3
Edge, D. (1979). Quantitative measures of communication in science:
A critical review. History of Science,17(2), 102–134. https://doi.org
/10.1177/007327537901700202, PubMed: 11610633
Elsevier. (2022). Topic prominence in science. https://www.elsevier
.com/solutions/scival/features/topic-prominence-in
-sciencemethodology (accessed February 28, 2022).
Enders, M., Havemann, F., & Jeschke, J. M. (2019). A citation-based
map of concepts in invasion biology. NeoBiota,47, 23–42.
https://doi.org/10.3897/neobiota.47.32608
Erikson, M. G., & Erlandson, P. (2014). A taxonomy of motives to
cite. Social Studies of Science,44(4), 625–637. https://doi.org/10
.1177/0306312714522871, PubMed: 25272615
Gläser, J. (2006). Wissenschaftliche produktionsgemeinschaften:
Die soziale ordnung der forschung (Vol. 906). Campus Verlag.
Gläser, J. (2020). Opening the black box of expert validation of bib-
liometric maps. In Lockdown bibliometrics: Papers not submitted
to the STI Conference 2020 in Aarhus (p. 27).
Gläser, J., Glänzel, W., & Scharnhorst, A. (2017). Same data—
Different results? Towards a comparative approach to the identi-
fication of thematic structures in science. Scientometrics,111(2),
981–998. https://doi.org/10.1007/s11192-017-2296-z
Gläser, J., Laudel, G., Grieser, C., & Meyer, U. (2018). Scientific
fields as epistemic regimes: New opportunities for comparative
science studies. Social Science Open Access Repository.
Haunschild, R., Schier, H., Marx, W., & Bornmann, L. (2018).
Algorithmically generated subject categories based on citation
relations: An empirical micro study using papers on overall water
splitting. Journal of Informetrics,12(2), 436–447. https://doi.org
/10.1016/j.joi.2018.03.004
Havemann, F., Gläser, J., & Heinz, M. (2017). Memetic search for
overlapping topics based on a local evaluation of link communi-
ties. Scientometrics,111(2), 1089–1118. https://doi.org/10.1007
/s11192-017-2302-5
Held, M., Laudel, G., & Gläser, J. (2021). Challenges to the validity
of topic reconstruction. Scientometrics,126, 4511–4536. https://
doi.org/10.1007/s11192-021-03920-3
Held, M., & Velden, T. (2019). How to interpret algorithmically
constructed topical structures of research specialties? A case
study comparing an internal and an external mapping of the
topical structure of invasion biology. arXiv preprint
arXiv:1905.03485. https://doi.org/10.48550/arXiv.1905.03485
Jeschke, J. M., & Heger, T. (2018). Invasion biology: Hypotheses
and evidence (Vol. 9). CABI. https://doi.org/10.1079
/9781780647647.0000
Jeschke, J. M., Enders, M., Bagni, M., Aumann, D., Jeschke, P., …
Heger, T. (2020). Hi-Knowledge.org, version 2.0. https://hi
-knowledge.org/.
Klavans, R., & Boyack, K. W. (2017). Which type of citation analysis
generates the most accurate taxonomy of scientific and technical
knowledge? Journal of the Association for Information Science and
Technology,68(4), 984–998. https://doi.org/10.1002/asi.23734
Koopman, R., & Wang, S. (2017). Mutual information based label-
ling and comparing clusters. Scientometrics,111(2), 1157–1167.
https://doi.org/10.1007/s11192-017-2305-2
Law, J. (1973). The development of specialties in science: The case
of X-ray protein crystallography. Science Studies,3(3), 275–303.
https://doi.org/10.1177/030631277300300303
Leydesdorff, L. (1998). Theories of citation? Scientometrics,43(1),
5–25. https://doi.org/10.1007/BF02458391
Leydesdorff, L., & Milojević, S. (2015). The citation impact of German
sociology journals: Some problems with the use of scientometric
indicators in journal and research evaluations. Soziale Welt,
193–204. https://doi.org/10.5771/0038-6073-2015-2-193
Luukkonen, T. (1997). Why has Latour’s theory of citations been
ignored by the bibliometric community? Discussion of socio-
logical interpretations of citation analysis. Scientometrics,38(1),
27–37. https://doi.org/10.1007/BF02461121
Nigel Gilbert, G. (1977). Referencing as persuasion. Social Studies of Sci-
ence,7(1), 113–122. https://doi.org/10.1177/030631277700700112
Potter, I. (2020). Introducing citation topics in InCites–Clarivate.
https://clarivate.com/blog/introducing-citation-topics (accessed
February 28, 2022).
Pyšek, P., Richardson, D. M., Pergl, J., Jarošík, V., Sixtova, Z., &
Weber, E. (2008). Geographical and taxonomic biases in inva-
sion ecology. Trends in Ecology & Evolution,23(5), 237–244.
https://doi.org/10.1016/j.tree.2008.02.002, PubMed:
18367291
Reichard, S. H., & White, P. S. (2003). Invasion biology: An
emerging field of study. Annals of the Missouri Botanical Garden,
90(1), 64–66. https://doi.org/10.2307/3298526
Seitz, C., Schmidt, M., Schwichtenberg, N., & Velden, T. (2021). A
case study of the epistemic function of citations—Implications for
citation-based science mapping. In Proceedings of the 18th
International Conference of the International Society for Sciento-
metrics and Informetrics (ISSI).
Seitz, C. L. (2021). Epistemische funktionen von zitierungen.
Master’s thesis. Humboldt-Universität zu Berlin.
Sjögårde, P., & Ahlgren, P. (2018). Granularity of algorithmically
constructed publication-level classifications of research publi-
cations: Identification of topics. Journal of Informetrics,12(1),
133–152. https://doi.org/10.1016/j.joi.2017.12.006
Sjögårde, P., & Ahlgren, P. (2020). Granularity of algorithmically
constructed publication-level classifications of research publica-
tions: Identification of specialties. Quantitative Science Studies,
1(1), 207–238. https://doi.org/10.1162/qss_a_00004
Šubelj, L., van Eck, N. J., & Waltman, L. (2016). Clustering scientific
publications based on citation relations: A systematic comparison
of different methods. PLOS ONE,11(4), e0154404. https://doi.org
/10.1371/journal.pone.0154404, PubMed: 27124610
Tijssen, R. (1993). A scientometric cognitive study of neural net-
work research: Expert mental maps versus bibliometric maps.
Scientometrics,28(1), 111–136. https://doi.org/10.1007
/BF02016288
Traag, V. A., Van Dooren, P., & Nesterov, Y. (2011). Narrow scope
for resolution-limit-free community detection. Physical Review E,
84(1), 016114. https://doi.org/10.1103/PhysRevE.84.016114,
PubMed: 21867264
Traag, V. A., Waltman, L., & van Eck, N. J. (2019). From Louvain
to Leiden: Guaranteeing well-connected communities. Scientific
Reports,9(1), 1–12. https://doi.org/10.1038/s41598-019-41695-z,
PubMed: 30914743
Van Eck, N. J., & Waltman, L. (2017). Citation-based clustering of
publications using CitNetExplorer and VOSviewer. Scientomet-
rics,111(2), 1053–1070. https://doi.org/10.1007/s11192-017
-2300-7, PubMed: 28490825
Vaz, A. S., Kueffer, C., Kull, C. A., Richardson, D. M., Schindler, S.,
… Honrado, J. P. (2017). The progress of interdisciplinarity in
invasion science. Ambio,46(4), 428–442. https://doi.org/10
.1007/s13280-017-0897-7, PubMed: 28150137
Velden, T. (2018). Junior research group: Open science. https://
www.dzhw.eu/en/forschung/projekt?pr_id=635
Velden, T., Boyack, K. W., Gläser, J., Koopman, R., Scharnhorst, A.,
& Wang, S. (2017). Comparison of topic extraction approaches
and their results. Scientometrics,111(2), 1169–1221. https://doi
.org/10.1007/s11192-017-2306-1
Velden, T., Yan, S., & Lagoze, C. (2017). Mapping the cognitive
structure of astrophysics by infomap clustering of the citation
network and topic affinity analysis. Scientometrics,111(2),
1033–1051. https://doi.org/10.1007/s11192-017-2299-9
Vugteveen, P., Lenders, R., & Van den Besselaar, P. (2014). The
dynamics of interdisciplinary research fields: The case of river
research. Scientometrics,100(1), 73–96. https://doi.org/10.1007
/s11192-014-1286-7
Waltman, L., & Van Eck, N. J. (2012). A new methodology for con-
structing a publication-level classification system of science.
Journal of the American Society for Information Science and Tech-
nology,63(12), 2378–2392. https://doi.org/10.1002/asi.22748
Wen, B., Horlings, E., van der Zouwen, M., & van den Besselaar,
P. (2017). Mapping science through bibliometric triangula-
tion: An experimental approach applied to water research.
Journal of the Association for Information Science and
Technology,68(3), 724–738. https://doi.org/10.1002/asi
.23696
Williamson, M. (1999). Invasions. Ecography,22(1), 5–12. https://
doi.org/10.1111/j.1600-0587.1999.tb00449.x
Wouters, P. (1999). Beyond the holy grail: From citation theory to
indicator theories. Scientometrics,44(3), 561–580. https://doi.org
/10.1007/BF02458496
Zitt, M., Lelu, A., Cadot, M., & Cabanac, G. (2019). Bibliometric
delineation of scientific fields. In Springer handbook of science
and technology indicators (pp. 25–68). Springer. https://doi.org
/10.1007/978-3-030-02511-3_2
Zuckerman, H. (1987). Citation analysis and the complex problem
of intellectual influence. Scientometrics,12(5), 329–338. https://
doi.org/10.1007/BF02016675