Data governance in predictive toxicology: A review

Fu, Xin; Wojak, Anna; Neagu, Daniel; Ridley, Mick; Travis, Kim

doi:10.1186/1758-2946-3-24

Journal of Cheminformatics

Table 1 Discussion of public data sources in terms of data governance

Database	Data Accuracy	Data Completeness	Data Integrity	Metadata	Data Availability	Data Authorisation
ChemSpider	manual and automated data curation, supported by a crowd-sourcing community	claimed to be the richest source of structure-based chemistry	an aggregator of nearly 400 different data sources	includes data source metadata, provenance metadata (e.g. owner and time-stamp of data creation, curation and update) and user metadata	can be accessed by web GUI and web services via PC and mobile devices; query results can be downloaded as a set	publicly available, no multi-level user access supported
CEBS	manual data curation, collaboration between data depositors and internal curation staff	contains 132 chemicals and their responses in 34 detailed studies	permits users to integrate various data types and studies; the database schema is well designed and supported by controlled vocabularies	includes domain-specific metadata (e.g. owner, study details) and provenance metadata (e.g. time-stamp of the study start, curation updates)	allows users to retrieve and combine customised information and export it to various formats for download; supports up to 100 concurrent users	publicly available, but also provides private data access mode to protect sensitive user data
CTD	manual data curation, supported by the scientific community	includes 1.4 million chemical-gene-disease data connections and has been widely recognised	employs community-accepted vocabularies and ontologies to capture data and is integrated with external resources	includes domain-specific and provenance metadata; supporting literature sources are recorded	access to entire database (downloadable as a dump file) and individual data sources; query results can be customised and exported to different formats	publicly available, no multi-level user access provided
DSSTox	manual and automated data curation, quality assurance log files recorded; but no curation for external data	contains over 8000 chemicals and has been incorporated into several external sources	integrates molecular structures and toxicity data into standardised DSSTox SDF files	includes domain-specific and provenance metadata	data sources and associated documents can be downloaded individually, and the included data is searchable via many options	publicly available, no user registration is required
ToxCast	manual data curation, includes internal and external review	covers various chemical classes and diverse mechanisms of action; 320 chemicals have been collected in Phase I and 1000 more are currently being screened	well integrated into many other EPA databases	only domain-specific metadata is included, no record of provenance metadata (e.g. curator and time-stamp)	data sources are available for download individually on the ToxCast website; included data can be browsed and queried via ToxCast DB web GUI	publicly available, no user registration is required
ACToR	does not provide any data curation itself; data quality depends entirely on the original data sources	contains over 500,000 chemicals and associated toxicity data from nearly 500 sources	is itself a chemical toxicity data aggregator and employs a clear and flexible database schema for data integration	data source metadata and domain-specific information are well recorded, no record of provenance metadata	open-source implementation and the entire database can be downloaded	publicly available, no user registration is required
OpenTox	allows for automated, manual and global data quality validations	contains different categories of public data sources that support predictive toxicology	employs ontology to support efficient integration of data coming from different sources into a unifying structure	data source metadata and domain-specific information are well recorded, no record of provenance metadata	provides APIs and RESTful web services for included data, algorithms, models, ontologies and reports	publicly available, but employs OpenSSO for initial implementation of multi-level user access

ISSN: 1758-2946
