Table 1 Discussion of public data sources in terms of data governance

From: Data governance in predictive toxicology: A review

| Database | Data Accuracy | Data Completeness | Data Integrity | Metadata | Data Availability | Data Authorisation |
| --- | --- | --- | --- | --- | --- | --- |
| ChemSpider | manual and automated data curation, supported by a crowd-sourcing community | claimed to be the richest source of structure-based chemistry | an aggregator of nearly 400 different data sources | includes data source metadata, provenance metadata (e.g. owner and time-stamp of data creation, curation and update) and user metadata | can be accessed by web GUI and web services via PC and mobile devices; query results can be downloaded as a set | publicly available, no multi-level user access supported |
| CEBS | manual data curation, collaboration between data depositors and internal curation staff | contains 132 chemicals and their response in 34 detailed studies | permits users to integrate various data types and studies; the database schema is well designed, supported by controlled vocabularies | includes domain-specific metadata (e.g. owner, study details) and provenance metadata (e.g. time-stamp of the study start, curation updates) | allows users to retrieve and combine customised information and export it to various formats for downloading; able to support up to 100 concurrent users | publicly available, but also provides a private data access mode to protect sensitive user data |
| CTD | manual data curation, supported by the scientific community | includes 1.4 million chemical-gene-disease data connections and has been widely recognised | employs community-accepted vocabularies and ontologies to capture data and is integrated with external resources | includes domain-specific and provenance metadata; supporting literature sources are recorded | access to the entire database (downloadable as a dump file) and to individual data sources; query results can be customised and exported to different formats | publicly available, no multi-level user access provided |
| DSSTox | manual and automated data curation, quality assurance log files recorded; but no curation for external data | contains over 8000 chemicals and has been incorporated into several external sources | integrates molecular structures and toxicity data into standardised DSSTox SDF | includes domain-specific and provenance metadata | data sources and associated documents can be downloaded individually; included data is searchable via many options | publicly available, no user registration is required |
| ToxCast | manual data curation, includes internal and external review | covers various chemical classes and diverse mechanisms of action; 320 chemicals have been collected in Phase I and 1000 more are currently being screened | well integrated into many other EPA databases | only domain-specific metadata is included, no recorded track of provenance metadata (e.g. curator and time-stamp) | data sources are available for download individually on the ToxCast website; included data can be browsed and queried via the ToxCast DB web GUI | publicly available, no user registration is required |
| ACToR | does not provide any data curation itself; data quality depends entirely on the original data sources | contains over 500,000 chemicals and associated toxicity data from nearly 500 sources | is itself a chemical toxicity data aggregator and employs a clear and flexible database schema for data integration | metadata of data source and domain-specific information is well recorded, no recorded track of provenance metadata | open-source implementation; the entire database can be downloaded | publicly available, no user registration is required |
| OpenTox | allows for automated, manual and global data quality validations | contains different categories of public data sources that support predictive toxicology | employs ontology to support efficient integration of data coming from different sources into a unifying structure | metadata of data source and domain-specific information is well recorded, no recorded track of provenance metadata | provides APIs and RESTful web services for included data, algorithms, models, ontologies and reports | publicly available, but employs OpenSSO for initial implementation of multi-level user access |