Fig. 5
From: MolData, a molecular benchmark for disease and target based machine learning

Cumulative histogram of the largest Tanimoto similarity coefficient between each of 200,000 molecules in the MolData dataset and any other molecule in the dataset. More than 92% of the molecules have at least one other molecule in the dataset with a Tanimoto coefficient higher than 0.5, and more than 44% do so at a coefficient threshold of 0.7.
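
The quantity plotted in this figure can be illustrated with a minimal sketch. It assumes each molecule is represented as a set of on-bit indices from some binary fingerprint; the fingerprint type, the toy data, and the function names `tanimoto`, `max_similarity`, and `fraction_above` are hypothetical and not taken from the article. The Tanimoto coefficient of two bit sets is the size of their intersection divided by the size of their union.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient between two sets of on-bit indices."""
    union = len(a | b)
    # Two empty fingerprints share nothing; treat their similarity as 0.
    return len(a & b) / union if union else 0.0


def max_similarity(fps: List[Set[int]]) -> List[float]:
    """For each fingerprint, the largest coefficient to any *other* one."""
    return [
        max(
            (tanimoto(fp, other) for j, other in enumerate(fps) if j != i),
            default=0.0,
        )
        for i, fp in enumerate(fps)
    ]


def fraction_above(values: List[float], threshold: float) -> float:
    """Share of values strictly greater than the threshold."""
    return sum(v > threshold for v in values) / len(values)


# Toy fingerprints (illustrative only, not MolData molecules).
toy_fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {10, 11}, {20, 21, 22}, {20, 21, 23, 24}]
nearest = max_similarity(toy_fps)

for t in (0.5, 0.7):
    print(f"fraction with nearest neighbour above {t}: {fraction_above(nearest, t):.2f}")
```

This brute-force comparison is quadratic in the number of molecules, so at the scale of 200,000 molecules a cheminformatics toolkit with bulk similarity routines would likely be needed in practice.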