Cheminformatics Microservice: unifying access to open cheminformatics toolkits

Chandrasekhar, Venkata; Sharma, Nisha; Schaub, Jonas; Steinbeck, Christoph; Rajan, Kohulan

doi:10.1186/s13321-023-00762-4

Software
Open access
Published: 16 October 2023

Journal of Cheminformatics volume 15, Article number: 98 (2023)

This article has been updated

Abstract

In recent years, cheminformatics has experienced significant advancements through the development of new open-source software tools based on various cheminformatics programming toolkits. However, adopting these toolkits presents challenges, including proper installation, setup, deployment, and compatibility management. In this work, we present the Cheminformatics Microservice. This open-source solution provides a unified interface for accessing commonly used functionalities of multiple cheminformatics toolkits, namely RDKit, Chemistry Development Kit (CDK), and Open Babel. In addition, more advanced functionalities like structure generation and Optical Chemical Structure Recognition (OCSR) are made available through the Cheminformatics Microservice based on pre-existing tools. The software service also enables developers to extend the functionalities easily and to seamlessly integrate them with existing workflows and applications. It is built on FastAPI and containerized using Docker, making it highly scalable. An instance of the microservice is publicly available at https://api.naturalproducts.net. The source code is publicly accessible on GitHub, accompanied by comprehensive documentation, version control, and continuous integration and deployment workflows. All resources can be found at the following link: https://github.com/Steinbeck-Lab/cheminformatics-microservice.

Introduction

Open cheminformatics toolkits, large-scale chemical databases, and an increase in available computing power have led to significant advancements in the field of cheminformatics in recent years [1, 2]. As a result, large amounts of chemical data can be handled and analysed efficiently, which in turn benefits research fields like chemistry, drug discovery, and material design. Multiple prominent open-source cheminformatics toolkits are available, which include RDKit [3], Open Babel [4], Chemistry Development Kit (CDK) [5, 6], Indigo [7], and the recently developed Python-based Informatics Kit for Analysing CHemical Units (PIKAChU) [8]. A summary of their native programming language, latest version, and licence information is given in Table 1.

Table 1 Summary of widely used open-source cheminformatics toolkits

All cheminformatics toolkits mentioned above offer ready-to-use routines for common tasks like data format conversions, descriptor calculations, and structure editing. On top of these, every toolkit has an individual set of more advanced functionalities like coordinate generation, substructure analysis, or structure normalisation. For this reason, researchers often use multiple toolkits in their tools or workflows [9, 10]. To do so, they have to familiarise themselves with the specific requirements, syntax, and algorithms of each employed toolkit. Doing this efficiently requires familiarity with the underlying programming languages, such as Python, Java, or C++. It is also essential to have a thorough understanding of chemical concepts, molecular representations, and computational algorithms.

For developing cheminformatics workflows, machine learning models, or web applications, researchers and software developers need to set up a proper working environment for the toolkits in order to integrate them into their applications. These software tools can later become cumbersome to use and maintain if not set up properly or inadequately documented. Unfortunately, this is often the case due to the complexity of the setup, lack of documentation, and lack of good research data management practices [11]. Other challenges in developing cheminformatics software, databases, and web applications result from building on top of these toolkits due to factors such as various toolkit version management [12], dependencies management [13], and maintenance [14].

- Software Version Management: The purpose of this process is to effectively manage the versions of software and toolkits throughout their development and maintenance cycle. The objective is to organise and track changes, facilitate collaboration, and ensure the stability and integrity of software projects, thus increasing productivity and streamlining development processes. This process can be time-consuming and requires careful planning and organisation.
- Dependencies: The majority of cheminformatics tools require several interdependent third-party libraries to function. Managing these dependencies can be challenging since developers must ensure that each dependency is installed correctly and compatible with the other dependencies. Otherwise, this can lead to conflicts between dependencies, which may cause the software to malfunction.
- Maintenance: It is challenging to maintain software and databases as they require regular updates and bug fixes to remain up-to-date and functional. In the special case of open-source software, this usually requires a large community of active and committed users and developers.

Most cheminformatics open-source programming toolkits have a rather high entrance barrier due to the abovementioned aspects. Also, the quality of available documentation and tutorials varies. This is especially critical for young researchers new to the field who have to spend a lot of setup time before being able to work on their research projects. Therefore, online tools for cheminformatics are becoming increasingly popular due to their usual ease of use, adequate documentation, and the fact that no or only little programming knowledge is required to use them [15,16,17,18]. The use of web-based solutions or solutions that are driven by Application Programming Interfaces (APIs) [19] offers a lower entrance barrier. In addition, these services are platform-independent and can be easily integrated with software utilities, databases, repositories, and cheminformatics data management workflows [17].

To overcome the challenging integration of third-party libraries like cheminformatics toolkits into application code, two common software development techniques can be used: containerization and microservices. Microservices [20], also referred to as microservice architecture, is a collection of small, autonomous services that can be deployed, scaled, and maintained individually [21]. By leveraging well-defined APIs, each microservice carries out a specific function and interacts with other microservices. These services are characterised by their granular nature and employ lightweight protocols to enable the independent execution of each service [22]. The native installation of such services on a system makes maintenance difficult in the long run. The service may not function as a result of Operating System (OS) level updates or software environment conflicts. In addition, in the event of a service failure, it is necessary to reboot the entire system. In order to address these issues, containerization can be used for the deployment of software services and applications. Containers are lightweight isolated environments that enable applications and their dependencies to run consistently across a variety of systems, independent of the OS [23]. In addition, a container provides a consistent and reproducible execution environment across development, testing, and production environments. In this work, containerization was achieved using Docker [24]. Software components can be containerized using Docker and distributed publicly via the Docker Hub [25], a cloud-based registry provided by Docker that allows developers to store, share, and distribute Docker images.

This article presents the Cheminformatics Microservice, an open-source solution for handling chemical data and performing various cheminformatics tasks by employing multiple cheminformatics toolkits (CDK, RDKit, and Open Babel). These tasks include generating high-quality chemical structure depictions and 3D conformers, calculating molecular descriptors and IUPAC names, and converting SMILES representations of chemical structures into other machine-readable formats. The microservice can be accessed through a unified REST (REpresentational State Transfer) [26] interface, which is made available as a public server for anyone to access via https://api.naturalproducts.net/. Alternatively, it can be installed locally or on a private server using the provided Docker image. It can also be deployed and auto-scaled on a Kubernetes-managed private cluster in just a few steps via the Helm Charts provided [27]. These deployment options are designed to be user-friendly since they require no prior knowledge of the underlying toolkits or their setup environment. They make the presented software service suitable for a wide range of applications in academic and industrial environments. The entire Cheminformatics Microservice source code is made available to the public on GitHub: https://github.com/Steinbeck-Lab/cheminformatics-microservice. Users and researchers are encouraged to submit feature requests and contribute to the microservice to ensure its continued growth.

Implementation

Cheminformatics Microservice is developed using FastAPI, a web framework for generating RESTful APIs with Python. FastAPI was chosen for this project due to its speed, efficiency, and suitability for building advanced APIs. It enables the straightforward creation of robust and scalable APIs. Docker is used for containerization, and semantic versioning principles are applied to track code changes and toolkit versions. In the microservice container, the cheminformatics toolkits RDKit and Open Babel are accessed natively using Python, while the Chemistry Development Kit (CDK) is integrated using JPype [28]. Cheminformatics Microservice consists of five modules, namely chem, convert, depict, ocsr, and tools. Compliant with the OpenAPI [29] specification version 3.1.0, this work provides standard documentation, encourages interoperability, enables automatic code generation, simplifies validation, and integrates with a variety of tools and libraries to enhance the functionality of REST (REpresentational State Transfer) APIs. An overview of the software architecture is given in Fig. 1.

The chem module offers various functions such as descriptor calculation, stereoisomer enumeration, HOSE code [30] generation, NPLikeness score calculation [31], ClassyFire classification [32], molecular structure standardisation [33], and a preprocessing pipeline for the upcoming version of the COCONUT database as explained in the results section below. The convert module provides conversions from SMILES [34] to other molecular string representations such as InChI [35, 36], InChIKey [36], canonical SMILES [37], CXSMILES [38], SELFIES [39, 40], and IUPAC names. The latter is achieved via the Smiles TO iUpac Translator (STOUT) version 2.0 toolkit [41]. Additionally, MOL files can be generated with 2D and 3D coordinates from SMILES input. Using the depict module, one can generate 2D depictions of chemical structures with various settings, including an option to generate 2D representations with stereochemical annotations following the Cahn–Ingold–Prelog [42] (CIP) sequence rules. RDKit or CDK can be used to generate 2D representations. The depictions are generated in an SVG format and may be scaled to fit the needs of the user. It is also possible to generate a 3D depiction [43] using this module, which is useful as a chemical structure display option for databases or as a teaching aid. The underlying 3D conformer is generated using RDKit. The ocsr module incorporates Deep lEarning for Chemical ImagE Recognition (DECIMER) modules [44, 45] for translating images of chemical structures into machine-readable SMILES representations. These can be accessed via HTTP POST requests. Finally, the tools module is designated as a miscellaneous collection of advanced cheminformatics tools. It offers a function to generate chemical structures from a molecular formula given as input using the surge chemical graph generator [46]. 
Other functions of the tools module can be used to identify glycosidic moieties in input molecules and to remove them in order to generate the aglycone structure. These routines are implemented based on the Sugar Removal Utility (SRU) [47].

The functionalities offered by each main module described above can also be implemented via independent Python functions, enabling users to access Cheminformatics Microservice natively without the REST interface. Users can import and use it like any other package directly in their own Python code. Individual toolkit wrapper modules provide access to the three cheminformatics toolkits RDKit, CDK, and Open Babel. RDKit and Open Babel are natively accessible, while CDK, a Java library, has functionalities ported to Python using JPype. It is possible to extend the functionalities provided by cheminformatics toolkits in the future by using these wrapper modules. Separating them into individual modules is necessary to achieve granular control over the functions. This also ensures that the entire system will not be broken if one module is affected by potential software failures. The Python functions are documented separately, and the documentation can be accessed via: https://cheminformatics-microservice.readthedocs.io/en/latest/.

To ensure reproducibility, a consistent versioning system is essential. Best practices for research data management [48, 49] recommend documenting software and component versions separately, especially for tools like the presented microservice with multiple dependencies. Cheminformatics Microservice uses multi-level versioning to record API and software dependencies. The codebase undergoes bi-annual major releases, with corresponding documentation provided for the underlying toolkits, tools, and environment dependencies for each release. The API version updates are released only when significant changes to the API endpoints have been made. It is possible to update the underlying cheminformatics toolkits whenever new releases are published without having to update the entire code base since the REST API remains unchanged. The API usage can be logged, monitored, and visualised using Prometheus [50] and Grafana [51] in a standalone or distributed environment. Cheminformatics Microservice can also be deployed using a Continuous Integration and Continuous Deployment (CI/CD) pipeline via GitHub Actions [52]. Code integration, testing, and application deployment are automated using CI/CD, which fosters collaboration, minimises manual tasks, and enables timely feedback.
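As an illustration of the semantic versioning idea mentioned above, the sketch below parses MAJOR.MINOR.PATCH tags and classifies the step between two releases. It assumes tags of the form `v1.6.0` (the format of the current version listed for this project); the helper names and the tags other than `v1.6.0` are hypothetical and are not taken from the project's release tooling.

```python
import re
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


# Accepts "1.6.0" as well as "v1.6.0".
_PATTERN = re.compile(r"v?(\d+)\.(\d+)\.(\d+)")


def parse_version(tag: str) -> Version:
    """Turn a MAJOR.MINOR.PATCH tag into a comparable tuple of integers."""
    match = _PATTERN.fullmatch(tag.strip())
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH tag: {tag!r}")
    return Version(*(int(part) for part in match.groups()))


def change_level(old: str, new: str) -> str:
    """Classify the step from old to new as major, minor, patch, or none."""
    a, b = parse_version(old), parse_version(new)
    if b < a:
        raise ValueError(f"{new} is older than {old}")
    if b.major != a.major:
        return "major"
    if b.minor != a.minor:
        return "minor"
    if b.patch != a.patch:
        return "patch"
    return "none"
```

Comparing integer tuples rather than strings keeps the ordering correct once a component reaches two digits, for example when a hypothetical `v1.10.0` follows `v1.9.3`.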

Results and discussion

The presented Cheminformatics Microservice provides straightforward access to the open-source cheminformatics toolkits RDKit, CDK, and Open Babel, as well as deep learning-based tools such as Deep lEarning for Chemical ImagE Recognition (DECIMER) for OCSR and Smiles TO iUpac Translator (STOUT). Building on top of these tools and toolkits, it makes a diverse selection of important functionalities needed by cheminformaticians on a daily basis accessible via a RESTful interface. Additionally, Cheminformatics Microservice includes a number of widely used packages, like the ChEMBL [53] curation pipeline [33], CDK-based sugar removal functionalities [47], and the open-source structure generator surge [46]. The presented microservice is intended to facilitate the handling of large amounts of chemical structural data to enable the development of adaptable, scalable, and maintainable cheminformatics applications.

The main modules of Cheminformatics Microservice are well-documented and can be accessed via the following link: https://api.naturalproducts.net/. In order to obtain the output generated by the microservice, each API module uses either a GET or a POST HTTP request method [54]. Most of the services provided by the chem, convert, and depict modules can be accessed using SMILES as an input format for molecular structures. Where a specific functionality provided by the microservice can be achieved in a similar manner with RDKit, CDK, or Open Babel, the user has the option of employing the toolkit of their choice via an additional parameter. The chem module offers routines for structure manipulation and standardisation, descriptor calculation, and chemical classification. The standardize functionality is used via a POST method for the purpose of standardising molecules represented by MOL format tables through the ChEMBL structure curation pipeline. The convert module enables users to convert SMILES representations into other formats of their choice, using the GET method. Meanwhile, the depict module allows users to generate customised 2D depictions of molecular structures, offering options for coloured or black and white output images. The stereochemical annotations on the 2D depictions are generated using the Cahn-Ingold-Prelog (CIP) priority rules. To accomplish this, the Java package centres [55, 56] is used in conjunction with CDK. Additionally, the depict module is able to produce interactive 3D models of input structures. Figure 2A, B illustrates the application of the depict module to generate a 2D depiction with CIP annotations in colour on a scale of 512 × 512 pixels and with a 52° rotation. 
The depicted image with CIP annotations can be generated directly with the following API call: https://api.naturalproducts.net/latest/depict/2D?smiles=C%5BC@%5D12CC%5BC@H%5D3%5BC@H%5D(%5BC@@H%5D1CC%5BC@@H%5D2O)CCC4=CC(=O)CC%5BC@%5D34C&width=512&height=512&rotate=52&&CIP=true&unicolor=false&toolkit=cdk
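The same request can be assembled programmatically. The sketch below uses only the Python standard library and the endpoint and query parameters that appear in the URL quoted above; the function names and default values are illustrative assumptions. `urlencode` percent-encodes more characters than the quoted URL does (for example `@` and `=` inside the SMILES), which decodes to the same parameter values.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import urlopen

# Endpoint taken from the example URL in the text.
BASE_URL = "https://api.naturalproducts.net/latest/depict/2D"


def build_depict_url(smiles, width=512, height=512, rotate=0,
                     cip=True, unicolor=False, toolkit="cdk"):
    """Return a GET URL for a 2D depiction with a safely encoded SMILES."""
    params = {
        "smiles": smiles,
        "width": width,
        "height": height,
        "rotate": rotate,
        # The example URL spells booleans as lowercase strings.
        "CIP": str(cip).lower(),
        "unicolor": str(unicolor).lower(),
        "toolkit": toolkit,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def decode_smiles(url):
    """Recover the SMILES from a depiction URL (round-trip check)."""
    return parse_qs(urlsplit(url).query)["smiles"][0]


def fetch_depiction(url, timeout=30):
    """Download the depiction as text; requires network access."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# SMILES decoded from the example URL in the text.
SMILES = "C[C@]12CC[C@H]3[C@H]([C@@H]1CC[C@@H]2O)CCC4=CC(=O)CC[C@]34C"
url = build_depict_url(SMILES, rotate=52)
```

Calling `fetch_depiction(url)` against the public instance should return the depiction, which the text describes as an SVG image; the call is left out of the sketch so that it runs without network access.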

2D representations are produced in the form of SVG images, whereas 3D representations are returned as HTML files containing embedded JSMol objects [57, 58].

Through REST API calls, the tools and ocsr modules offer convenient access to the underlying functionalities of the integrated tools. For example, the tools module allows users to employ the open-source chemical structure generator surge to generate chemically valid structures based on a provided molecular formula. The generated structures are returned as a list of SMILES representations. In order to address the resource-intensive nature of the chemical structure generation process, a maximum limit of 10 heavy atoms in the given molecular formula has been imposed. This restriction applies exclusively to the public instance. The tools module also allows users to access the Sugar Removal Utility (SRU) to detect and remove linear and circular sugar moieties in/from input structures. Users can access the DECIMER toolkit through the ocsr module, which enables identification, segmentation, and translation into machine-readable representations of chemical structure depictions from the scientific literature.
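A client can check a molecular formula against the 10-heavy-atom limit of the public instance before sending a structure generation request. The following is a minimal sketch, assuming a flat formula without brackets, charges, or isotope labels and treating every element other than hydrogen as a heavy atom; how the service itself validates the formula is not described in the text, so the function names and parsing rules here are assumptions.

```python
import re

# Limit stated in the text for the public instance.
MAX_HEAVY_ATOMS = 10

# One element symbol (e.g. "C", "Cl") followed by an optional count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def heavy_atom_count(formula: str) -> int:
    """Count non-hydrogen atoms in a flat molecular formula such as 'C6H6'."""
    count = 0
    position = 0
    for match in _TOKEN.finditer(formula):
        if match.start() != position:
            # Something between two tokens could not be parsed.
            raise ValueError(f"cannot parse formula: {formula!r}")
        position = match.end()
        element, number = match.groups()
        if element != "H":
            count += int(number) if number else 1
    if not formula or position != len(formula):
        raise ValueError(f"cannot parse formula: {formula!r}")
    return count


def within_public_limit(formula: str) -> bool:
    """True if the formula has at most MAX_HEAVY_ATOMS heavy atoms."""
    return heavy_atom_count(formula) <= MAX_HEAVY_ATOMS
```

For example, `within_public_limit("C10H16")` is true because only the ten carbon atoms count, whereas `"C11H24"` exceeds the limit.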

Currently, Cheminformatics Microservice is available to the public via https://api.naturalproducts.net/latest/docs. In the back end, the container runs on a compute server with an Intel(R) Xeon(R) Gold 6226R CPU @ 2.90 GHz and 16 GB of RAM.

Use of Cheminformatics Microservice in the COCONUT database

Cheminformatics Microservice is extensively employed in the upcoming version of the COCONUT (COlleCtion of Open Natural prodUcTs) database that is currently under development. With the depict module, it becomes feasible to present all the natural product structure data entries within COCONUT in both 2D and 3D formats (Fig. 3). The microservice also includes the generation of molecular descriptors and further preprocessing steps for data submission to the COCONUT database.

Documentation

Cheminformatics Microservice offers detailed documentation alongside its source code, ensuring that users can easily access and navigate it without a high entry barrier. The documentation provides clear guidance on how to effectively use, deploy, and install the software. Figure 4 offers a glimpse of the documentation, which is deployed using GitHub Pages and can be accessed at https://docs.api.naturalproducts.net.

Performance and scalability

Scaling is particularly important for those who plan to use the API endpoints for database generation, large-scale descriptor calculation, or automated literature mining since it considerably reduces the overall computation time required. Cheminformatics Microservice is specifically designed to scale, both in a local deployment and in an orchestrated cluster. It can be configured to run on multiple workers when deployed independently. If deployed through Docker Compose or over a Kubernetes cluster, the microservice can be auto-scaled to handle incoming requests. The Helm Chart or the Docker Compose file provided in the codebase enables scaling without any additional setup. Prometheus and Grafana are used to monitor the request count over time. These logs and other performance indicators will facilitate the scaling of Cheminformatics Microservice workers based on user demand in the future.

To determine scalability, stress testing was performed using Vegeta [61]. To determine the maximum throughput Cheminformatics Microservice can handle when deployed on a machine with an Intel(R) Xeon(R) Gold 6226R CPU @ 2.90 GHz and 16 GB of RAM, the request rate was increased in small increments using Vegeta, and the delivered throughput was measured until a limit was reached (Fig. 5). This stress test indicated that the service reached its maximum capacity when handling approximately 2500 requests per second for echo requests, after which the success rate began to decline. When queried with the task of generating 2D coordinates for a molecular structure given as a SMILES string using RDKit, the microservice can handle request rates ranging from 50 to 500 per second (Fig. 5). With an increase in the input molecule size and number of requests per second, this rate starts to deteriorate. A detailed description of the stress test can be found in Additional file 1.
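The ramp-up procedure described above amounts to finding the highest request rate at which the success ratio still meets a chosen threshold. The sketch below shows that evaluation step only; it does not call Vegeta. The function name, the 0.99 threshold, and the sample measurements are invented for illustration and are not the values behind Fig. 5.

```python
from typing import Iterable, Optional, Tuple


def max_sustained_rate(measurements: Iterable[Tuple[int, float]],
                       min_success: float = 0.99) -> Optional[int]:
    """Return the highest request rate reached before the success ratio
    first drops below min_success, or None if even the lowest rate fails.

    Each measurement is a (requests_per_second, success_ratio) pair.
    """
    best = None
    for rate, success in sorted(measurements):
        if success < min_success:
            # Stop at the first failing step of the ramp.
            break
        best = rate
    return best


# Made-up toy measurements, NOT data from the article.
toy_ramp = [(10, 1.0), (20, 1.0), (30, 0.97), (40, 0.60)]
sustained = max_sustained_rate(toy_ramp)
```

Stopping at the first failing step, rather than taking the maximum over all passing steps, mirrors a ramp in which the rate is raised until the limit is reached.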

Nevertheless, the default configuration of Cheminformatics Microservice ensures excellent stability in handling user requests and effectively manages to execute the computation required.

However, limitations are imposed on the public instance for some specific tools and routines. For example, restrictions are imposed in the structure generator surge when dealing with molecules containing more than ten heavy atoms, as this service demands significant computational resources. Access to mass mining through the DECIMER endpoint is also restricted. These restrictions might be reconsidered in the future, depending on the user demand and computation resource availability.

Conclusion

The presented Cheminformatics Microservice is a self-contained, web-based service that operates independently of the operating system. This can be used by researchers with no or limited programming experience to perform daily cheminformatics tasks effortlessly via the API hosted at https://api.naturalproducts.net. This service allows a diverse range of open-source cheminformatics toolkits to be accessed without requiring any software installation or environment setups. To the best of our knowledge, it is currently the sole microservice in the field of cheminformatics offering users the ability to access multiple toolkits, as well as additional tools such as the open-source structure generator surge, Sugar Removal Utility (SRU), and the DECIMER OCSR tools. Cheminformatics Microservice is designed to be user-friendly, easily extendable, deployable, and scalable. It can be accessed through the public API or hosted on private clusters or single machines.

The integration of multiple cheminformatics toolkits in a centralised platform enhances user accessibility to cheminformatics utilities. By using industry-standard monitoring, deployment, documentation, and code quality standards, this project demonstrates software development in a user-focused manner. By making the source code and the documentation completely open and public, we aim to make valuable contributions to the scientific community while also enabling the submission of feature requests for future enhancements. Cheminformatics Microservice is anticipated to become an invaluable asset for cheminformaticians due to its inherent expandability. It follows a clear semantic versioning system, receives bi-annual updates, and includes comprehensive documentation. These characteristics significantly facilitate data reproduction and reuse for researchers, thereby fostering better collaboration among peers.

Availability and requirements

Project name: Cheminformatics Microservice
Project home page: https://github.com/Steinbeck-Lab/cheminformatics-microservice
Docker Image: https://hub.docker.com/r/nfdi4chem/cheminformatics-microservice
Helm Chart repo: https://nfdi4chem.github.io/repo-helm-charts/
Helm Chart GitHub: https://github.com/NFDI4Chem/repo-helm-charts
Current version: v1.6.0
DOI of archived current release: https://doi.org/10.5281/zenodo.7745987
Operating system(s): Independent
Programming language: Python 3, HTML
Requirements:
- API calls:
  - Internet connection and command line interface or a web browser
- Run locally:
  - Docker: to use Cheminformatics Microservice as a Docker container.
  - Conda environment: to use Cheminformatics Microservice natively without Docker.
- Dependencies (managed by Docker/Conda):
  - Python packages: uvicorn ≥ 0.15.0, < 0.16.0, fastapi ≥ 0.80.0, fastapi-pagination == 0.10.0, fastapi-versioning ≥ 0.10.0, prometheus-fastapi-instrumentator, jpype1 == 1.4.1, jinja2, pandas, chembl_structure_pipeline, HOSE_code_generator @ git+https://github.com/Ratsemaat/HOSE_code_generator, websockets == 10.4, pillow == 9.4.0, opencv-python == 4.7.0.68, matplotlib == 3.4.3, scikit-image, pdf2image == 1.16.2, IPython, pystow ≥ 0.4.9, unicodedata2 == 15.0.0, efficientnet, tensorflow == 2.12.0, pillow-heif == 0.10.0, selfies ≥ 2.1.1, httpx ≥ 0.24.1, keras_preprocessing == 1.1.2, decimer-segmentation ≥ 1.1.2, STOUT-pypi ≥ 2.0.5 and decimer ≥ 2.2.0
  - Java: OpenJDK for Java 11
  - Java Libraries: CDK 2.8.0, SRU 1.3.2 and Centres 1.0
Licence: MIT
Documentation:
- Home page: https://docs.api.naturalproducts.net/
- API: https://api.naturalproducts.net/latest/docs
- Python Documentation: https://cheminformatics-microservice.readthedocs.io/en/latest/
Any restrictions to use by non-academics: None.

Data availability

Not Applicable.

Change history

03 November 2023
We have replaced a link encoding on page 6

Abbreviations

2D: Two dimensional
3D: Three dimensional
APIs: Application Programming Interfaces
BSD: Berkeley Software Distribution
CC: Creative Commons
CDK: Chemistry Development Kit
CI/CD: Continuous Integration and Continuous Deployment
CIP: Cahn–Ingold–Prelog
COCONUT: COlleCtion of Open Natural prodUcTs
CPU: Central processing unit
CXSMILES: Chemaxon Extended Simplified Molecular Input Line Entry Specification
DECIMER: Deep lEarning for Chemical ImagE Recognition
GB: Gigabyte
GHz: Gigahertz
GNU: GNU’s Not Unix
GPL: General Public License
HOSE: Hierarchically ordered spherical environment
HTTP: Hypertext Transfer Protocol
HTML: Hypertext Markup Language
InChI: International Chemical Identifier
IUPAC: International Union of Pure and Applied Chemistry
JDK: Java Development Kit
JPype: Java for Python
JSMol: JavaScript molecular viewer
MIT: Massachusetts Institute of Technology
NPLikeness: Natural product likeness
OCSR: Optical chemical structure recognition
OS: Operating system
PC: Personal computer
PIKAChU: Python-based Informatics Kit for Analysing CHemical Units
RAM: Random access memory
REST: REpresentational state transfer
SELFIES: SELF-referencIng embedded strings
SMILES: Simplified molecular input line entry specification
SRU: Sugar Removal Utility
STOUT: SMILES-TO-IUPAC-name translator
SVG: Scalable Vector Graphics

References

Ambure P, Aher RB, Roy K (2014) Recent advances in the open access cheminformatics toolkits, software tools, workflow environments, and databases. In: Zhang Wei (ed) Methods in pharmacology and toxicology. New York, Springer, pp 257–296
Google Scholar
Wegner JK, Sterling A, Guha R, Bender A, Faulon J-L, Hastings J, O’Boyle N, Overington J, Van Vlijmen H, Willighagen E (2012) Cheminformatics. Commun ACM 55:65–75. https://doi.org/10.1145/2366316.2366334
Article Google Scholar
Landrum G, et al. RDKit: open-source cheminformatics software. 2016. http://www.rdkit.org/. https://github.com/rdkit/rdkit. Accessed 10 July 2023
O’Boyle NM, Banck M, James CA, Morley C, Vandermeersch T, Hutchison GR (2011) Open babel: an open chemical toolbox. J Cheminform 3:33. https://doi.org/10.1186/1758-2946-3-33
Article PubMed PubMed Central CAS Google Scholar
Willighagen EL, Mayfield JW, Alvarsson J, Berg A, Carlsson L, Jeliazkova N, Kuhn S, Pluskal T, Rojas-Chertó M, Spjuth O et al (2017) The chemistry development kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching. J Cheminf. https://doi.org/10.1186/s13321-017-0220-4
Article Google Scholar
Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen E (2003) The chemistry development kit (CDK): an open-source java library for chemo- and bioinformatics. J Chem Inf Comput Sci 43:493–500. https://doi.org/10.1021/ci025584y
Article PubMed PubMed Central CAS Google Scholar
Indigo Toolkit. https://lifescience.opensource.epam.com/indigo/. Accessed 25 June 2020.
Terlouw BR, Vromans SPJM, Medema MH (2022) PIKAChU: a python-based informatics kit for analysing chemical units. J Cheminform 14:34. https://doi.org/10.1186/s13321-022-00616-5
Article PubMed PubMed Central Google Scholar
Brinkhaus HO, Rajan K, Zielesny A, Steinbeck C (2022) RanDepict: random chemical structure depiction generator. J Cheminform 14:31. https://doi.org/10.1186/s13321-022-00609-4
Article PubMed PubMed Central Google Scholar
Zulfiqar M, Gadelha L, Steinbeck C, Sorokina M, Peters K (2023) MAW: the reproducible metabolome annotation workflow for untargeted tandem mass spectrometry. J Cheminform 15:32. https://doi.org/10.1186/s13321-023-00695-y
Article PubMed PubMed Central CAS Google Scholar
Ashiq M, Usmani MH, Naeem M (2022) A systematic literature review on research data management practices and services. Glob Knowl Mem Commun 71:649–671. https://doi.org/10.1108/gkmc-07-2020-0103
Article Google Scholar
Van Gurp J, Prehofer C. Version management tools as a basis for integrating product derivation and software product families. In: Proceedings of the proceedings of the workshop on variability management-working with variability mechanisms at SPLC; 2006; pp. 48–58.
Esparrachiari S, Reilly T, Rentz A (2018) Tracking and controlling microservice dependencies. ACM Queue 16:44–65. https://doi.org/10.1145/3277539.3277541
Article Google Scholar
Canfora G, Cimitile A (2001) Software maintenance. In: Chang SK (ed) Handbook of software engineering and knowledge engineering. World Scientific Publishing Company, Singapore, pp 91–120
Chapter Google Scholar
Huang Y-C, Tremouilhac P, Nguyen A, Jung N, Bräse S (2021) ChemSpectra: a web-based spectra editor for analytical data. J Cheminform 13:8. https://doi.org/10.1186/s13321-020-00481-0
Article PubMed PubMed Central CAS Google Scholar
Jablonka KM, Moosavi SM, Asgari M, Ireland C, Patiny L, Smit B (2020) A data-driven perspective on the colours of metal-organic frameworks. Chem Sci 12:3587–3598. https://doi.org/10.1039/d0sc05337f
Patiny L, Borel A (2013) ChemCalc: a building block for tomorrow’s chemical infrastructure. J Chem Inf Model 53:1223–1228. https://doi.org/10.1021/ci300563h
Patiny L, Zasso M, Kostro D, Bernal A, Castillo AM, Bolaños A, Asencio MA, Pellet N, Todd M, Schloerer N et al (2018) The C6H6 NMR repository: an integral solution to control the flow of your data from the magnet to the public. Magn Reson Chem 56:520–528. https://doi.org/10.1002/mrc.4669
Ofoeda J, Boateng R, Effah J (2019) Application programming interface (API) research. Int J Enterp Inf Syst 15:76–95. https://doi.org/10.4018/ijeis.2019070105
Newman S (2015) Building microservices. O’Reilly Media, Sebastopol
Wolff E (2017) Microservices: flexible software architecture. Addison-Wesley, Boston
Chen L. Microservices: architecting for continuous delivery and DevOps. In: Proceedings of the 2018 IEEE international conference on software architecture (ICSA); 2018; pp. 39–397.
Containerization explained. https://www.ibm.com/topics/containerization. Accessed 22 June 2023.
Turnbull J. The docker book: containerization is the new virtualization. James Turnbull. 2014
Cook J (2017) Docker hub. In: Cook J (ed) Docker for data science: building scalable and extensible data infrastructure around the Jupyter notebook server. Apress, Berkeley, pp 103–118
Sohan SM, Maurer F, Anslow C, Robillard MP. A study of the effectiveness of usage examples in REST API documentation. In: Proceedings of the 2017 IEEE symposium on visual languages and human-centric computing (VL/HCC). 2017; pp. 53–61.
Gokhale S, Poosarla R, Tikar S, Gunjawate S, Hajare A, Deshpande S, Gupta S, Karve K. Creating helm charts to ease deployment of enterprise application and its related services in kubernetes. In: Proceedings of the 2021 international conference on computing, communication and green engineering (CCGE); 2021; pp. 1–5.
Nelson KE, Scherer MK, et al. JPype; Lawrence Livermore National Lab. (LLNL), Livermore, CA (United States), 2020.
The OpenAPI Specification (3.1.0). https://www.openapis.org. Accessed 25 September 2023.
Bremser W (1978) Hose—a novel substructure code. Anal Chim Acta 103:355–365. https://doi.org/10.1016/S0003-2670(01)83100-7
Ertl P, Roggo S, Schuffenhauer A (2008) Natural product-likeness score and its application for prioritization of compound libraries. J Chem Inf Model 48:68–74. https://doi.org/10.1021/ci700286x
Djoumbou Feunang Y, Eisner R, Knox C, Chepelev L, Hastings J, Owen G, Fahy E, Steinbeck C, Subramanian S, Bolton E et al (2016) ClassyFire: automated chemical classification with a comprehensive, computable taxonomy. J Cheminform 8:61. https://doi.org/10.1186/s13321-016-0174-y
Bento AP, Hersey A, Félix E, Landrum G, Gaulton A, Atkinson F, Bellis LJ, De Veij M, Leach AR (2020) An open source chemical structure curation pipeline using RDKit. J Cheminform 12:51. https://doi.org/10.1186/s13321-020-00456-1
Weininger D (1988) SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci 28:31–36. https://doi.org/10.1021/ci00057a005
Heller SR, McNaught A, Pletnev I, Stein S, Tchekhovskoi D (2015) InChI, the IUPAC international chemical identifier. J Cheminform 7:23. https://doi.org/10.1186/s13321-015-0068-4
Heller SR, McNaught AD (2009) The IUPAC international chemical identifier (InChI). Chem Int Newsmag IUPAC 31:7–9. https://doi.org/10.1515/ci.2009.31.1.7
Weininger D, Weininger A, Weininger JL (1989) SMILES. 2. Algorithm for generation of unique SMILES notation. J Chem Inf Comput Sci 29:97–101. https://doi.org/10.1021/ci00062a008
Chemaxon Extended SMILES and SMARTS–CXSMILES and CXSMARTS. https://docs.chemaxon.com/display/docs/chemaxon-extended-smiles-and-smarts-cxsmiles-and-cxsmarts.md. Accessed 22 June 2023.
Krenn M, Häse F, Nigam A, Friederich P, Aspuru-Guzik A (2020) Self-referencing embedded strings (SELFIES): a 100% robust molecular string representation. Mach Learn Sci Technol 1:045024. https://doi.org/10.1088/2632-2153/aba947
Krenn M, Ai Q, Barthel S, Carson N, Frei A, Frey NC, Friederich P, Gaudin T, Gayle AA, Jablonka KM et al (2022) SELFIES and the future of molecular string representations. Patterns 3:100588. https://doi.org/10.1016/j.patter.2022.100588
Rajan K, Zielesny A, Steinbeck C (2021) STOUT: SMILES to IUPAC names using neural machine translation. J Cheminform 13:34. https://doi.org/10.1186/s13321-021-00512-4
Cahn RS, Ingold C, Prelog V (1966) Specification of molecular chirality. Angew Chem Int Ed Engl 5:385–415. https://doi.org/10.1002/anie.196603851
Rego N, Koes D (2015) 3Dmol.js: molecular visualization with WebGL. Bioinformatics 31:1322–1324. https://doi.org/10.1093/bioinformatics/btu829
Rajan K, Brinkhaus HO, Sorokina M, Zielesny A, Steinbeck C (2021) DECIMER-segmentation: automated extraction of chemical structure depictions from scientific literature. J Cheminform 13:20. https://doi.org/10.1186/s13321-021-00496-1
Rajan K, Brinkhaus HO, Isabel Agea M, Zielesny A, Steinbeck C (2023) DECIMER.ai—an open platform for automated optical chemical structure identification, segmentation and recognition in scientific publications. ChemRxiv. https://doi.org/10.26434/chemrxiv-2023-xhcx9
McKay BD, Yirik MA, Steinbeck C (2022) Surge: a fast open-source chemical graph generator. J Cheminform 14:24. https://doi.org/10.1186/s13321-022-00604-9
Schaub J, Zielesny A, Steinbeck C, Sorokina M (2020) Too sweet: cheminformatics for deglycosylation in natural products. J Cheminform 12:67. https://doi.org/10.1186/s13321-020-00467-y
Wilkinson MD, Dumontier M, Aalbersberg IJJ, Appleton G, Axton M, Baak A, Blomberg N, Boiten J-W, da Silva Santos LB, Bourne PE et al (2016) The FAIR guiding principles for scientific data management and stewardship. Sci Data 3:160018. https://doi.org/10.1038/sdata.2016.18
Hoyt CT, Zdrazil B, Guha R, Jeliazkova N, Martinez-Mayorga K, Nittinger E (2023) Improving reproducibility and reusability in the journal of cheminformatics. J Cheminform 15:62. https://doi.org/10.1186/s13321-023-00730-y
Prometheus Overview. https://prometheus.io/docs/introduction/overview/. Accessed 23 June 2023.
Chakraborty M, Kundan AP (2021) Grafana. In: Chakraborty M, Kundan AP (eds) Monitoring cloud-native applications: lead agile operations confidently using open source software. Apress, Berkeley, pp 187–240
Chandrasekara C, Herath P (2021) Introduction to GitHub Actions. In: Chandrasekara C, Herath P (eds) Hands-on GitHub actions: implement CI/CD with GitHub action workflows for your applications. Apress, Berkeley, pp 1–8
Gaulton A, Hersey A, Nowotka M, Bento AP, Chambers J, Mendez D, Mutowo P, Atkinson F, Bellis LJ, Cibrián-Uhalte E et al (2017) The ChEMBL database in 2017. Nucleic Acids Res 45:D945–D954. https://doi.org/10.1093/nar/gkw1074
Taneja S, Gupta PR (2014) Python as a tool for web server application development. JIMS8I Int J Inf 2:77–83
Hanson RM, Musacchio S, Mayfield JW, Vainio MJ, Yerin A, Redkin D (2018) Algorithmic analysis of Cahn–Ingold–Prelog rules of stereochemistry: proposals for revised rules and a guide for machine implementation. J Chem Inf Model 58:1755–1765. https://doi.org/10.1021/acs.jcim.8b00324
John M (2018) Centres: perception and labelling of stereogenic centres in chemical structures (Version 10) [Computer software]. Github, San Francisco
Herráez A (2006) Biomolecules in the computer: Jmol to the rescue. Biochem Mol Biol Educ 34:255–261. https://doi.org/10.1002/bmb.2006.494034042644
Hanson RM, Prilusky J, Renjian Z, Nakane T, Sussman JL (2013) JSmol and the next-generation web-based representation of 3D molecular structure as applied to Proteopedia. Isr J Chem 53:207–216. https://doi.org/10.1002/ijch.201300024
PubChem Testosterone. https://pubchem.ncbi.nlm.nih.gov/compound/6013. Accessed 23 June 2023.
Dai G, Sun J, Peng X, Shen Q, Wu C, Sun Z, Sui H, Ren X, Zhang Y, Bian X (2023) Astellolides R-W, drimane-type sesquiterpenoids from an Aspergillus parasiticus strain associated with an isopod. J Nat Prod. https://doi.org/10.1021/acs.jnatprod.3c00215
Senart T. Vegeta: HTTP load testing tool and library: it’s over 9000. Github.

Acknowledgements

The authors would like to thank Prof. Dr Achim Zielesny for his valuable input throughout the development of Cheminformatics Microservice and Mr Henning Otto Brinkhaus for his assistance with the JPype implementation.

Funding

Open Access funding enabled and organized by Projekt DEAL. This work was funded by the German Research Foundation under project number 239748522 - SFB 1127 ChemBioSys (Project INF) and NFDI4Chem under project number 441958208.

Author information

Authors and Affiliations

Institute for Inorganic and Analytical Chemistry, Friedrich Schiller University Jena, Lessingstr. 8, 07743, Jena, Germany
Venkata Chandrasekhar, Nisha Sharma, Jonas Schaub, Christoph Steinbeck & Kohulan Rajan

Contributions

VC and KR initiated the project and developed the software. NS developed, automated, and documented the work. JS helped with Java porting. KR designed the logo. VC, JS, and KR wrote the paper together. KR and CS supervised the study. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Kohulan Rajan.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

The authors have given their consent for the work to be published.

Competing interests

The authors do not have any competing interests to declare.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Performance/Stress test results.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Chandrasekhar, V., Sharma, N., Schaub, J. et al. Cheminformatics Microservice: unifying access to open cheminformatics toolkits. J Cheminform 15, 98 (2023). https://doi.org/10.1186/s13321-023-00762-4

Received: 07 July 2023
Accepted: 19 September 2023
Published: 16 October 2023
DOI: https://doi.org/10.1186/s13321-023-00762-4
