Fast rule-based bioactivity prediction using associative classification mining

Table 2. Characteristics of the data sets used in this paper

Data set	hERG	antiTB	Mutagenicity
Source	PKKB [32]	Prathipati et al. [2]	Jeroen et al. [35]
#Compounds	806	3,779	4,337
Diversity	0.90	0.90	0.93
Class	blocker/non-blocker	active/inactive	mutagen/non-mutagen

Note: The diversity of each data set is the average pairwise distance between its molecules, calculated from ECFP_6 fingerprints using Pipeline Pilot. The distance between two molecules is defined as (1 - similarity) for the specified fingerprint.
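
The diversity measure described in the note can be illustrated with a minimal sketch. It assumes the Tanimoto coefficient as the similarity measure (the note does not name one) and represents each fingerprint as a set of on-bit indices; the function names `tanimoto` and `diversity` and the toy fingerprints are hypothetical and are not Pipeline Pilot output.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices.

    Assumption: the source does not state which similarity measure was used.
    """
    union = len(a | b)
    if union == 0:
        return 1.0  # assumption: two empty fingerprints count as identical
    return len(a & b) / union


def diversity(fingerprints):
    """Mean of (1 - similarity) over every unordered pair of molecules."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        raise ValueError("need at least two fingerprints")
    return sum(1.0 - tanimoto(a, b) for a, b in pairs) / len(pairs)


# Toy fingerprints with made-up bit indices (not real ECFP_6 bits).
toy = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({7, 8})]
print(round(diversity(toy), 3))  # pair distances 0.5, 1.0, 1.0 -> 0.833
```

A value close to 1, as reported for all three data sets in the table, means that most pairs of molecules share few fingerprint bits. The pairwise loop grows quadratically with the number of compounds, so this sketch is meant for illustration rather than for data sets of several thousand molecules.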

ISSN: 1758-2946