Table 1 Number of compounds (CIDs) in the data sets employed in the present study

Data set    Associated filters^a     Series A      Series B      Ratio (B/A) (%)
PubChem     all                      36,017,715    31,776,025    88.2
MeSH        pccompound_mesh          82,446        62,217        75.5
Protein3D   pccompound_structure     22,753        17,387        76.4
PharmAct    pccompound_mesh_pharm    11,415        6,977         61.1
Drug        pccompound_drugs         1,773         950           53.6
The five data sets in Series A were generated using the associated Entrez filters, each of which restricts a search to a particular compound subset in PubChem. The five data sets in Series B were generated from their Series A counterparts by adding the parent compounds of the chemicals in the Series A data sets and then selecting those with a computed 3-D conformer description available.
^a PubChem Compound Entrez filters allow users to retrieve CIDs that have a particular annotation type. For example, CIDs with the "Drug" annotation can be retrieved via the URL: [filter]
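
The Ratio (B/A) column can be recomputed from the two count columns as a quick consistency check. The following is a minimal sketch, assuming the ratio is simply the Series B count divided by the Series A count, expressed as a percentage and rounded to one decimal place; the names `COUNTS` and `ratio_percent` are illustrative and do not come from the study.

```python
# Compound counts (Series A, Series B) copied from Table 1.
COUNTS = {
    "PubChem": (36_017_715, 31_776_025),
    "MeSH": (82_446, 62_217),
    "Protein3D": (22_753, 17_387),
    "PharmAct": (11_415, 6_977),
    "Drug": (1_773, 950),
}


def ratio_percent(series_a: int, series_b: int) -> float:
    """Return Series B as a percentage of Series A, rounded to one decimal."""
    return round(100.0 * series_b / series_a, 1)


# Map each data set name to its recomputed B/A ratio.
RATIOS = {name: ratio_percent(a, b) for name, (a, b) in COUNTS.items()}

if __name__ == "__main__":
    for name, value in RATIOS.items():
        print(f"{name:<10} {value:5.1f}")
```

Under that assumption, the recomputed values agree with the percentages listed in the table.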