Table 1 Discussion of public data sources in terms of data governance

From: Data governance in predictive toxicology: A review

| Database | Data Accuracy | Data Completeness | Data Integrity | Metadata | Data Availability | Data Authorisation |
| --- | --- | --- | --- | --- | --- | --- |
| ChemSpider | manual and automated data curation, supported by community crowd-sourcing | claimed to be the richest source of structure-based chemistry | an aggregator of nearly 400 different data sources | includes data source metadata, provenance metadata (e.g. owner and time-stamp of data creation, curation and update) and user metadata | can be accessed through a web GUI and web services from PCs and mobile devices; query results can be downloaded as a set | publicly available; no multi-level user access supported |
| CEBS | manual data curation through collaboration between data depositors and internal curation staff | contains 132 chemicals and their responses in 34 detailed studies | permits users to integrate various data types and studies; the database schema is well designed and supported by controlled vocabularies | includes domain-specific metadata (e.g. owner, study details) and provenance metadata (e.g. time-stamp of the study start, curation updates) | allows users to retrieve and combine customised information and export it to various formats for download; supports up to 100 concurrent users | publicly available, but also provides a private data access mode to protect sensitive user data |
| CTD | manual data curation, supported by the scientific community | includes 1.4 million chemical-gene-disease data connections and has been widely recognised | employs community-accepted vocabularies and ontologies to capture data and is integrated with external resources | includes domain-specific and provenance metadata; supporting literature sources are recorded | provides access to the entire database (downloadable as a dump file) and to individual data sources; query results can be customised and exported to different formats | publicly available; no multi-level user access provided |
| DSSTox | manual and automated data curation with quality assurance log files recorded, but no curation of external data | contains over 8000 chemicals and has been incorporated into several external sources | integrates molecular structures and toxicity data into the standardised DSSTox SDF | includes domain-specific and provenance metadata | data sources and associated documents can be downloaded individually, and the included data is searchable via many options | publicly available; no user registration required |
| ToxCast | manual data curation, including internal and external review | covers various chemical classes and diverse mechanisms of action; 320 chemicals were collected in Phase I and 1000 more are currently being screened | well integrated into many other EPA databases | only domain-specific metadata is included; provenance metadata (e.g. curator and time-stamp) is not tracked | data sources are available for download individually on the ToxCast website; included data can be browsed and queried via the ToxCast DB web GUI | publicly available; no user registration required |
| ACToR | does not perform any data curation itself; data quality depends entirely on the original data sources | contains over 500,000 chemicals and associated toxicity data from nearly 500 sources | is itself a chemical toxicity data aggregator and employs a clear and flexible database schema for data integration | data source metadata and domain-specific information are well recorded; provenance metadata is not tracked | open-source implementation; the entire database can be downloaded | publicly available; no user registration required |
| OpenTox | allows for automated, manual and global data quality validation | contains different categories of public data sources that support predictive toxicology | employs ontologies to support efficient integration of data from different sources into a unified structure | data source metadata and domain-specific information are well recorded; provenance metadata is not tracked | provides APIs and RESTful web services for the included data, algorithms, models, ontologies and reports | publicly available, but employs OpenSSO for an initial implementation of multi-level user access |