Two years of explicit CiTO annotations
Journal of Cheminformatics volume 15, Article number: 14 (2023)
Citations are an essential aspect of research communication and have become the basis of many evaluation metrics in the academic world. Some see citation counts as a mark of scientific impact or even quality, but in reality the reasons for citing other work are manifold, which makes the interpretation more complicated than a single citation count can reflect. Two years ago, the Journal of Cheminformatics proposed the CiTO Pilot for the adoption of a practice of annotating citations with their citation intentions. Basically, when you cite a journal article or dataset (or any other source), you also explain why specifically you cite that source. In particular, agreement, disagreement, and the reuse of methods and data are of interest. This article explores what happened after the launch of the pilot. We summarize how authors in the Journal of Cheminformatics used the pilot, show how citation annotations are distributed with Wikidata and visualized with Scholia, discuss adoption outside BMC, and finally present some thoughts on what needs to happen next.
Communicating new research findings is still primarily done by written texts in the form of scholarly articles, books, and book chapters. To avoid having to repeat past research by themselves or others, authors cite relevant research. However, the reasons why authors cite literature vary, which complicates how we use citations. Typing citations is therefore of interest: it allows us to navigate literature more easily, it points us to essential research methods and data, and it can warn us of research that cannot be reproduced or that others disagree with. Indeed, it helps us understand the history of science.
With citations increasingly being used by tools like scite.ai and Connected Papers to help researchers, having typed citations will help us explore literature. Therefore, the Journal of Cheminformatics started a pilot to explore capturing the intent of citations using annotations.
The Citation Typing Ontology Pilot
The pilot consisted of a couple of components, some of which are explained in the editorial. The Citation Typing Ontology was selected to express the intention. Each intention is written as a compact identifier wrapped in square brackets, also called a safe CURIE, a standard proposed by the W3C in 2010 [6, 7]. The cito prefix is registered in Bioregistry. The bibnotes concept of the Springer Nature publishing platform was used as carrier. Authors are guided by a landing page consisting of a BMC Collection at https://www.biomedcentral.com/collections/cito and author guidelines explaining to authors how they can add the annotations with their favorite editor at https://jcheminform.github.io/jcheminform-author-guidelines/cito.
Because the CiTO ontology has many terms for many different citation intentions, we made a selection of CiTO terms authors could use: [cito:citesAsDataSource] to indicate a source that provides data to back up a claim, [cito:usesDataFrom] to indicate that the authors reused data, [cito:usesMethodIn] when a method or protocol explained in that source is used, and a few more general intentions like [cito:discusses], [cito:extends], [cito:agreesWith], and [cito:disagreesWith]. The journal itself would adopt the following CiTO annotations: [cito:retracts], [cito:repliesTo], and [cito:updates]. The first would be used if an article was retracted from the journal; fortunately, it has not been used yet. The second would be used when a Letter to the Editor replies to an earlier published article, and [cito:updates] when a Correction was published.
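To make the safe CURIE notation concrete, the following is a minimal Python sketch of how a downstream tool might pick the annotations out of a single bibliography entry and expand them to full CiTO IRIs. The function names and the deliberately narrow regular expression are illustrative assumptions and not part of the pilot; only the bracketed notation and the CiTO namespace come from the standards involved.

```python
import re

# Namespace IRI of the Citation Typing Ontology.
CITO_NAMESPACE = "http://purl.org/spar/cito/"

# A safe CURIE is a CURIE wrapped in square brackets, e.g. [cito:extends].
# Assumption: this pattern is intentionally narrow. It only accepts the
# "cito" prefix followed by a purely alphabetic local name.
SAFE_CURIE = re.compile(r"\[cito:([A-Za-z]+)\]")


def extract_intentions(reference: str) -> list[str]:
    """Return the CiTO local names found in one bibliography entry."""
    return SAFE_CURIE.findall(reference)


def expand(local_name: str) -> str:
    """Expand a CiTO local name to a full IRI."""
    return CITO_NAMESPACE + local_name


# One entry from this article's own reference list.
entry = (
    "McCarron S, Birbeck M (2010) CURIE syntax 1.0. "
    "https://www.w3.org/TR/2010/NOTE-curie-20101216/ [cito:usesMethodIn]"
)
intentions = extract_intentions(entry)
iris = [expand(name) for name in intentions]
print(intentions, iris)
```

Because the brackets delimit each annotation, entries that carry several intentions back to back, such as [cito:agreesWith][cito:citesAsAuthority], are split correctly without any further tokenizing.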
Wikidata and Scholia
To track the uptake but also to demonstrate the impact, we extended Scholia to visualize citation intention data. Scholia is a graphical interface around the data stored in Wikidata and includes citations from OpenCitations and PubMed. Wikidata allows adding qualifiers to statements, which allowed us to define a data model for citations annotated with a CiTO intention. The Wikidata property P3712 has been used for this; it was labeled objective of project or action and was relabeled has goal in November 2022 (see Fig. 1).
This data in Wikidata can then be accessed in multiple ways, including REST APIs and a SPARQL interface. The latter is used by Scholia to tell us some overall statistics of the number of annotations, which we also reported on about a year ago. Since then, the number of annotations and the number of annotated citations, as recorded on August 25 2022, have almost doubled (from 377 to 603 and from 304 to 494, respectively). The first number is higher because one citation can have more than one citation intention. The currently annotated citations cite 387 articles in 141 different scholarly journals, and they are found in 98 articles in 48 different journals (see Fig. 2).
It must be noted that the Journal of Cheminformatics is only one possible source of CiTO annotations. As far as the author knows, it is still the only journal that uses CiTO annotation explicitly in the articles themselves. And with 335 annotated citations in 32 articles it is also the major source of CiTO annotations in Wikidata at the time of writing. However, CiTO intention annotations in Wikidata can come from other sources too and can be added both manually and automatically using the tools around Wikidata. When all annotations are combined, Scholia shows us that [cito:citesAsAuthority] is the most used intention, with 226 annotated citations (out of 603) in 38 articles. [cito:usesMethodIn] follows with 102 annotated citations.
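The data model can be illustrated with a SPARQL query against the Wikidata Query Service. The sketch below only builds the query text in Python. It assumes that the citation itself is stored with the Wikidata property P2860 (cites work) and that the intention is attached as a P3712 qualifier on that statement, as described above. The function name and variable names are hypothetical, and the queries Scholia actually runs may differ.

```python
def build_cito_query(limit: int = 10) -> str:
    """Build a SPARQL query listing citations that carry an intention qualifier.

    Assumptions: P2860 ("cites work") holds the citation statement and
    P3712 is the qualifier with the citation intention. The p:, ps: and pq:
    prefixes are predefined by the Wikidata Query Service.
    """
    if limit < 1:
        raise ValueError("limit must be positive")
    return f"""
SELECT ?citing ?cited ?intention WHERE {{
  ?citing p:P2860 ?statement .
  ?statement ps:P2860 ?cited ;
             pq:P3712 ?intention .
}}
LIMIT {limit}
""".strip()


query = build_cito_query(5)
print(query)
```

Going through the statement node (p: and ps:) rather than the direct wdt: shortcut is what makes the qualifier reachable: the intention belongs to one specific citation, not to the citing article as a whole.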
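The difference between the number of annotations and the number of annotated citations can be shown with a small worked example. The records below are hypothetical placeholders, not data from Wikidata; the sketch only demonstrates how the different totals are derived from the same list.

```python
from collections import Counter

# Hypothetical annotations: (citing article, cited article, intention).
annotations = [
    ("A", "B", "usesMethodIn"),
    ("A", "B", "citesAsAuthority"),  # same citation, second intention
    ("A", "C", "usesDataFrom"),
    ("D", "B", "citesAsAuthority"),
]

# Every record is one annotation.
n_annotations = len(annotations)

# A citation is a (citing, cited) pair, regardless of how many intentions it has.
n_annotated_citations = len({(citing, cited) for citing, cited, _ in annotations})

# Distinct cited and citing articles.
n_cited_articles = len({cited for _, cited, _ in annotations})
n_citing_articles = len({citing for citing, _, _ in annotations})

# How often each intention is used.
per_intention = Counter(intention for _, _, intention in annotations)

print(n_annotations, n_annotated_citations, n_cited_articles, n_citing_articles)
```

Here four annotations describe only three citations, because the citation from A to B carries two intentions.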
Adoption by Journal of Cheminformatics Authors
In the two years of the Pilot, including the seminal editorial, the Journal of Cheminformatics published fifteen articles with explicit CiTO annotation: three Editorials, four Research articles, two Database articles, and one each of Data Note, Software, Letter to the Editor, Letter Response, Educational, and Methodology. Ten were published in the first year (Table 1) and five in the second year (Table 2). Each article annotated one or more citations with CiTO intentions, and several articles annotated every citation, far exceeding the original anticipation.
Also exceeding expectations is the diversity of the chosen CiTO intention types. The original guidance focused on [cito:citesAsDataSource], [cito:usesDataFrom] (the first is used to cite an article with data, the second when you reused data and cite the article where it comes from), [cito:usesMethodIn], [cito:citesAsAuthority], [cito:discusses], [cito:extends], [cito:agreesWith], and [cito:disagreesWith]. Not only have all of these been used, authors also used [cito:citesForInformation], [cito:citesAsPotentialSolution], [cito:citesAsRelated], [cito:documents], and [cito:obtainsBackgroundFrom].
To make life easier for authors, and following a Twitter discussion in Spring 2021, a Markdown template was developed with native CiTO support: https://jcheminform.github.io/jcheminform-author-guidelines/cito-guidelines/markdown.html. Here, the author indicates the CiTO type when they cite the article, using a method introduced by Krewinkel et al. The manuscript can then be converted to a Microsoft Word file with Pandoc (https://pandoc.org/) for submission to the journal. For publishers it will be interesting to note that Pandoc can also convert the manuscript directly to the Journal Article Tag Suite (JATS) format [29, 30].
The Journal of Cheminformatics template is available from the journal’s GitHub organization, and authors and editors should feel free to adapt it to the needs of other journals. The BioHackrXiv (https://biohackrxiv.org/) preprint server also supports CiTO annotations, and its template can be found at https://github.com/biohackrxiv/publication-template.
Annotated citation networks
We already use citation networks in finding relevant literature, for example based on co-citation patterns. Such analyses become stronger when we know more about why articles are cited. For example, if an article cites a second article because it uses a method from it, and that method extends a method in a third article, then the first article indirectly uses the method in that third article, even if that third article is not directly cited. This reflects citation habits: authors always decide whether to cite all articles about a method, only the most recent, or only the oldest (or something else). After all, journals frequently have rules about the maximum length of reference sections. Moreover, some methods are so well established that we are not expected to cite that work at all.
The general availability of open citations allows us to recover such more complex reuse scenarios using the citation networks. Moreover, when the citations are annotated, we can zoom in on reuse networks. Figure 3 shows a method reuse network, with articles with explicit annotation in red. The network shows a few articles that use methods in two cited articles, e.g. Kohulan Rajan, 2021 and Henning Otto Brinkhaus, 2022 [23, 27]. Of course, if we do not limit the network to a subset of articles (here, Journal of Cheminformatics articles with CiTO annotation), the network becomes more interesting, but also much more complex. Network analysis approaches can then be used instead of network visualization.
When we combine the reuse of methods and data, we can summarize for a journal which articles are most reused. Analyses like this become a simple query that can be routinely performed for any journal. Figure 4 shows a tabular summary of articles whose methods or data are reused. Like citation counts, this data depends entirely on explicit citation data. However, these counts are based on actual reuse only and do not include citations as authority.
Because many publishing platforms currently do not support display of citation-level intention annotation, the simplest model only provides the annotation in the bibliography. The Pilot made this choice to be able to use the bibnotes approach, which allows giving additional notes to a reference in the bibliography. The CiTO Pilot uses the safe CURIE standard, which is compatible with the typesetting in the Journal of Cheminformatics. This makes it easier for downstream tools to text mine the annotations from the HTML, PDF, and JATS versions of the article.
However, this also points to a limitation of the current use of CiTO annotation: even when citation-level annotation is supported by an authoring system (such as Markdown), its depiction may not be. If we convert a Markdown file to a Microsoft Word file, the annotation is kept in the bibliography. However, when writing a manuscript in Microsoft Word, LibreOffice, or Google Docs, combined with a reference manager like Zotero, the CiTO annotation cannot easily be stored as part of the manuscript. The problem here is that CiTO annotations stored in the reference manager are not linked to the place where the reference is cited. The workaround is to add the CiTO annotation directly to the bibliography after completion of the Word document.
LaTeX users may find themselves in a situation between that of Markdown users and reference manager users: only if a manuscript-level Bib(La)TeX file is used can the CiTO annotation be added as notes to the BibTeX file. This way, the CiTO annotation is specific to this manuscript and each manuscript can use different annotations. This approach is explained in this guidance document: https://jcheminform.github.io/jcheminform-author-guidelines/cito-guidelines/latex.html.
From a use case perspective, it is easy to see how this kind of annotation can be used. First, we showed here examples of reuse of work, via [cito:usesMethodIn] and [cito:usesDataFrom]. Second, scite.ai is a clear use case of [cito:agreesWith] and [cito:disagreesWith] annotation, though it also shows how such citation intentions can be extracted with text mining instead. However, other use cases still need development and adoption, which brings us to the question: what is next?
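The indirect reuse described above can be recovered by following typed citations through the network. The following Python sketch works on a hypothetical list of typed citations and makes one strong assumption: every [cito:extends] link reached from a [cito:usesMethodIn] citation is treated as an extension of the method, which the annotation alone does not guarantee. The article names and the function name are placeholders.

```python
from collections import defaultdict

# Hypothetical typed citations: (citing, intention, cited).
typed_citations = [
    ("article1", "usesMethodIn", "article2"),
    ("article2", "extends", "article3"),
    ("article3", "extends", "article4"),
    ("article1", "citesAsAuthority", "article5"),
]


def indirect_method_use(citations):
    """Return (user, source) pairs linked by usesMethodIn followed by extends links."""
    extends = defaultdict(set)
    for citing, intention, cited in citations:
        if intention == "extends":
            extends[citing].add(cited)

    indirect = set()
    for citing, intention, cited in citations:
        if intention != "usesMethodIn":
            continue
        # Depth-first walk along "extends" links, starting at the cited method.
        # The "seen" set guards against cycles in the citation data.
        stack, seen = [cited], {cited}
        while stack:
            current = stack.pop()
            for earlier in extends[current]:
                if earlier not in seen:
                    seen.add(earlier)
                    stack.append(earlier)
                    indirect.add((citing, earlier))
    return indirect


pairs = indirect_method_use(typed_citations)
print(sorted(pairs))
```

In this example article1 only cites article2 for its method, yet the walk also reports article3 and article4 as indirect method sources, while the citation as authority to article5 is ignored.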
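Such a reuse summary can be sketched in a few lines of Python. The typed citations below are hypothetical, and the choice to count each citing article at most once per cited article is an assumption made for this sketch; the text does not state how the table in Figure 4 handles a citation that carries both reuse intentions.

```python
from collections import Counter

# Intentions that indicate actual reuse rather than, say, citing as authority.
REUSE_INTENTIONS = {"usesMethodIn", "usesDataFrom"}

# Hypothetical typed citations: (citing, intention, cited).
typed_citations = [
    ("article1", "usesMethodIn", "methodPaper"),
    ("article2", "usesMethodIn", "methodPaper"),
    ("article3", "citesAsAuthority", "methodPaper"),  # not counted as reuse
    ("article3", "usesDataFrom", "dataPaper"),
    ("article3", "usesMethodIn", "dataPaper"),  # same citation, counted once
]


def reuse_counts(citations):
    """Count, per cited article, the distinct citing articles that reuse it."""
    reusing_pairs = {
        (citing, cited)
        for citing, intention, cited in citations
        if intention in REUSE_INTENTIONS
    }
    return Counter(cited for _, cited in reusing_pairs)


ranking = reuse_counts(typed_citations).most_common()
print(ranking)
```

The citation as authority from article3 to methodPaper does not contribute, which is exactly the difference from a plain citation count.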
What is next?
With fifteen articles published in the CiTO Collection, the pilot triggered interest from authors. The Journal of Cheminformatics has already published a few more articles with CiTO annotation after August 2022, and a search on the preprint servers ResearchSquare and ChemRxiv shows a few more manuscripts. The support by BioHackrXiv is a nice example of adoption beyond BMC. Further citation intention annotations will come from literature studies where citation networks are characterized. For example, Duca et al. used CiTO annotations to describe the citation network to retracted COVID-19 articles.
Further uptake of this idea of typed citations depends on the combined willingness of journal editors, authors, publishers and indexing services alike. The rise of services like scite.ai shows that the research community is ready for this kind of information. Logical steps forward include support of distributing citation typing annotation via platforms like PlumX, Altmetric.com, CrossRef, or EuropePMC, and support of CiTO annotation in the JATS format.
Because all innovation requires a critical amount of adoption to be accepted, additional sources of CiTO annotation will be welcomed. For example, providing CiTO annotations in a standardized form, such as a spreadsheet-like format or nanopublications, would allow collections of annotations to be shared, such as that by Duca et al. Then, when archived on Zenodo and therefore citable, these annotations can be included in analyses as a trusted or at least citable source.
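As an illustration of what a spreadsheet-like exchange format could look like, the sketch below writes a few annotations to CSV and reads them back with Python's standard csv module. The three column names are hypothetical and not an agreed standard; the rows are the annotations this article gives to the Shotton (2010) reference.

```python
import csv
import io

# Hypothetical column names; no standard layout has been agreed on.
FIELDS = ["citing_doi", "cito_intention", "cited_doi"]

THIS_ARTICLE = "10.1186/s13321-023-00683-2"
SHOTTON_2010 = "10.1186/2041-1480-1-s1-s6"

rows = [
    {"citing_doi": THIS_ARTICLE, "cito_intention": "cito:agreesWith", "cited_doi": SHOTTON_2010},
    {"citing_doi": THIS_ARTICLE, "cito_intention": "cito:citesAsAuthority", "cited_doi": SHOTTON_2010},
    {"citing_doi": THIS_ARTICLE, "cito_intention": "cito:usesMethodIn", "cited_doi": SHOTTON_2010},
]


def to_csv(records):
    """Serialize annotation records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def from_csv(text):
    """Parse CSV text back into a list of annotation records."""
    return [dict(record) for record in csv.DictReader(io.StringIO(text))]


text = to_csv(rows)
restored = from_csv(text)
print(text)
```

One row per annotation, rather than one row per citation, keeps the format flat and lets a citation with several intentions be expressed without nested cells.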
Availability of data and materials
CiTO annotations in the Journal of Cheminformatics are available from the journal’s articles. CiTO annotation data in Scholia is available from Wikidata. Markdown templates that support CiTO are available from https://github.com/jcheminform/markdown-jcheminf and https://github.com/biohackrxiv/publication-template. An archive of all CiTO annotations in Wikidata is available from https://doi.org/10.5281/zenodo.7513573.
Shotton D (2010) CiTO, the citation typing ontology. J Biomed Semant 1:S6. https://doi.org/10.1186/2041-1480-1-s1-s6. [cito:agreesWith] [cito:citesAsAuthority] [cito:usesMethodIn]
Nicholson JM (2021) Smart(er) citations. Matter 4:756–758. https://doi.org/10.1016/j.matt.2021.02.007. [cito:agreesWith][cito:citesAsAuthority]
Park M, Leahey E, Funk RJ (2023) Papers and patents are becoming less disruptive over time. Nature 613:138–144. https://doi.org/10.1038/s41586-022-05543-x. [cito:obtainsBackgroundFrom]
Tarnavsky-Eitan A, Smolyansky E, Knaan-Harpaz I, Perets S (2022) Connected papers. https://connectedpapers.com. Accessed 28 Aug 2022. [cito:citesAsAuthority]
Willighagen E (2020) Adoption of the citation typing ontology by the journal of cheminformatics. J Cheminformatics 12:47. https://doi.org/10.1186/s13321-020-00448-1. [cito:discusses] [cito:extends] [cito:citesForInformation]
McCarron S, Birbeck M (2010) CURIE syntax 1.0. https://www.w3.org/TR/2010/NOTE-curie-20101216/ [cito:usesMethodIn]
Wimalaratne SM, Juty N, Kunze J et al (2018) Uniform resolution of compact identifiers for biomedical data. Sci Data 5:180029. https://doi.org/10.1038/sdata.2018.29. [cito:usesMethodIn]
Hoyt CT, Balk M, Callahan TJ et al (2022) Unifying the identification of biomedical entities with the bioregistry. Sci Data 9:714. https://doi.org/10.1038/s41597-022-01807-3. [cito:usesDataFrom]
(2020) Citation Typing Ontology (CiTO) Pilot. https://www.biomedcentral.com/collections/cito
Nielsen FÅ, Mietchen D, Willighagen E (2017) Scholia, scientometrics and wikidata. In: Blomqvist E, Hose K, Paulheim H et al (eds) The semantic web: ESWC 2017 satellite events. Springer International Publishing, Cham, pp 237–259. [cito:usesMethodIn]
Peroni S, Shotton D (2020) OpenCitations, an infrastructure organization for open scholarship. Quantitative Science Studies 1:428–444. https://doi.org/10.1162/qss_a_00023
Willighagen E (2021) Typed citations in the journal of cheminformatics. In: On physical sciences. https://blogs.biomedcentral.com/on-physicalsciences/2021/07/20/typed-citations-journal-of-cheminformatics/ [cito:citesAsDataSource]
(2022) Citation Typing Ontology Intentions—Scholia. https://scholia.toolforge.org/cito/. Accessed 26 Aug 2022 [cito:citesAsDataSource]
Ahmed L, Alogheli H, McShane SA et al (2020) Predicting target profiles with confidence as a service using docking scores. J Cheminformatics 12:62. https://doi.org/10.1186/s13321-020-00464-1. [cito:citesForInformation]
Schaub J, Zielesny A, Steinbeck C, Sorokina M (2020) Too sweet: Cheminformatics for deglycosylation in natural products. J Cheminformatics 12:67. https://doi.org/10.1186/s13321-020-00467-y. [cito:citesForInformation]
Tuerkova A, Zdrazil B (2020) A ligand-based computational drug repurposing pipeline using KNIME and programmatic data access: case studies for rare diseases and COVID-19. J Cheminformatics 12:71. https://doi.org/10.1186/s13321-020-00474-z. [cito:citesForInformation]
Sorokina M, Merseburger P, Rajan K et al (2021) COCONUT online: collection of open natural products database. J Cheminformatics 13:2. https://doi.org/10.1186/s13321-020-00478-9. [cito:citesForInformation]
Guha R, Willighagen E, Zdrazil B, Jeliazkova N (2021) What is the role of cheminformatics in a pandemic? J Cheminformatics 13:16. https://doi.org/10.1186/s13321-021-00491-6. [cito:citesForInformation]
Schymanski EL, Kondić T, Neumann S et al (2021) Empowering large chemical knowledge bases for exposomics: PubChemLite meets MetFrag. J Cheminformatics 13:19. https://doi.org/10.1186/s13321-021-00489-0. [cito:citesForInformation]
Galgonek J, Vondrášek J (2021) IDSM ChemWebRDF: SPARQLing small-molecule datasets. J Cheminformatics 13:38. https://doi.org/10.1186/s13321-021-00515-1. [cito:citesForInformation]
Schymanski EL, Bolton EE (2021) FAIR chemical structures in the journal of cheminformatics. J Cheminformatics 13:50. https://doi.org/10.1186/s13321-021-00520-4. [cito:citesForInformation]
Guha R, Jeliazkova N, Willighagen E, Zdrazil B (2021) Reply to “FAIR chemical structure in the journal of cheminformatics.” J Cheminformatics 13:49. https://doi.org/10.1186/s13321-021-00521-3. [cito:citesForInformation]
Rajan K, Zielesny A, Steinbeck C (2021) DECIMER 10: Deep learning for chemical image recognition using transformers. J Cheminformatics 13:61. https://doi.org/10.1186/s13321-021-00538-8. [cito:citesForInformation]
Liu X, Ye K, van Vlijmen HWT et al (2021) DrugEx v2: de novo design of drug molecules by pareto-based multi-objective reinforcement learning in polypharmacology. J Cheminformatics 13:85. https://doi.org/10.1186/s13321-021-00561-9. [cito:citesForInformation]
Ammar A, Cavill R, Evelo C, Willighagen E (2022) PSnpBind: a database of mutated binding site protein-ligand complexes constructed using a multithreaded virtual screening workflow. J Cheminformatics 14:8. https://doi.org/10.1186/s13321-021-00573-5. [cito:citesForInformation]
Zdrazil B, Guha R (2022) Diversifying cheminformatics. J Cheminformatics 14:25. https://doi.org/10.1186/s13321-022-00597-5. [cito:citesForInformation]
Brinkhaus HO, Zielesny A, Steinbeck C, Rajan K (2022) DECIMER—hand-drawn molecule images dataset. J Cheminformatics 14:36. https://doi.org/10.1186/s13321-022-00620-9. [cito:citesForInformation]
Krewinkel A, Winkler R (2017) Formatting open science: Agilely creating multiple document formats for academic manuscripts with pandoc scholar. PeerJ Comput Sci 3:e112. https://doi.org/10.7717/peerj-cs.112. [cito:usesMethodIn]
Beck J (2011) NISO Z39.96 the journal article tag suite (JATS): What happened to the NLM DTDs? J Electron Publ. https://doi.org/10.3998/3336451.0014.106. [cito:citesForInformation]
Krewinkel A, Bazán J, Smith AM (2022) JATS from markdown: Developer friendly single-source scholarly publishing. In: Journal article tag suite conference (JATS-con) proceedings 2022 [internet]. National Center for Biotechnology Information (US). [cito:citesAsPotentialSolution]
Prins P, Ohta T, Willighagen E (2021) Metadata for BioHackrXiv markdown publications. https://github.com/biohackrxiv/bhxiv-metadata/blob/main/doc/elixir_biohackathon2021/paper.md. [cito:citesAsDataSource] [cito:citesAsPotentialSolution]
Mudrak B, Bosshart S, Koch W et al (2022) Five years of ChemRxiv: Where we are and where we go from here. J Am Chem Soc 144:22333–22335. https://doi.org/10.1021/jacs.2c11417. [cito:citesAsAuthority]
Prins P, Ohta T, Castro LJ, Katayama T (2022) Metadata for BioHackrXiv markdown publications. BioHackrXiv. https://doi.org/10.37044/osf.io/ueqkj. [cito:citesAsAuthority]
Duca H, Pacheco I, Lopes GR et al (2022) Publications about COVID-19: a semantic analysis on retracted articles. ISys Braz J Inf Syst. https://doi.org/10.5753/isys.2022.2413. [cito:citesAsRecommendedReading]
MacKenzie C (2022) Add cito panel to work aspect with ask query. In: GitHub. https://github.com/WDscholia/scholia/commit/0af60cf7732bc8664d828cbe51f233aa63201df9. Accessed 26 Aug 2022. [cito:citesAsEvidence]
This work would not be possible without the support from Springer Nature, and Matthew Smyllie in particular, and the editors of the Journal of Cheminformatics, Rajarshi Guha, Nina Jeliazkova, and Barbara Zdrazil. Carlin MacKenzie is thanked for integrating the sections on CiTO use into the main Scholia journal aspects. Huge thanks go to Albert Krewinkel for developing the Markdown/Pandoc integration. Finally, thanks to all authors for including CiTO annotation in their articles.
Part of this work was supported by ELIXIR, the research infrastructure for life-science data.
The author declares that he has no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Willighagen, E. Two years of explicit CiTO annotations. J Cheminform 15, 14 (2023). https://doi.org/10.1186/s13321-023-00683-2