Table 1 Selected datasets from the epigenomic database

From: Statistical-based database fingerprint: chemical space dependent representation of compound databases

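| Dataset | Number of compounds | Intra-set similarity median (Tc), MACCS^a | Intra-set similarity median (Tc), ECFP4^b | Average “1” bits, MACCS^a | Average “1” bits, ECFP4^b | Number of “1” bits in DFP, MACCS^a | Number of “1” bits in DFP, ECFP4^b | Number of “1” bits in SB-DFP, MACCS^a | Number of “1” bits in SB-DFP, ECFP4^b |
|---|---|---|---|---|---|---|---|---|---|
| BRD2 | 234 | 0.569 | 0.152 | 56.0 | 54.3 | 53 | 27 | 67 | 229 |
| BRD3 | 246 | 0.573 | 0.153 | 56.6 | 54.6 | 53 | 26 | 73 | 231 |
| BRD4 | 477 | 0.486 | 0.133 | 55.9 | 52.8 | 47 | 14 | 71 | 333 |
| CREBBP | 105 | 0.694 | 0.276 | 56.1 | 53.9 | 52 | 36 | 50 | 185 |
| DNMT1 | 127 | 0.403 | 0.115 | 55.4 | 51.7 | 50 | 13 | 62 | 281 |
| EHMT2 | 61 | 0.636 | 0.228 | 62.4 | 55.7 | 62 | 41 | 56 | 167 |
| EP300 | 57 | 0.425 | 0.106 | 58.2 | 55.7 | 53 | 11 | 56 | 285 |
| HDAC10 | 190 | 0.514 | 0.165 | 53.2 | 50.6 | 50 | 17 | 46 | 272 |
| HDAC11 | 137 | 0.494 | 0.156 | 51.2 | 50.8 | 48 | 16 | 42 | 229 |
| HDAC1 | 2740 | 0.453 | 0.149 | 53.2 | 51.4 | 51 | 15 | 63 | 499 |
| HDAC2 | 767 | 0.447 | 0.149 | 50.3 | 48.4 | 46 | 13 | 53 | 336 |
| HDAC3 | 669 | 0.474 | 0.147 | 52.6 | 50.3 | 49 | 13 | 54 | 356 |
| HDAC4 | 452 | 0.427 | 0.135 | 50.4 | 46.4 | 42 | 10 | 49 | 248 |
| HDAC5 | 112 | 0.455 | 0.153 | 47.3 | 44.1 | 39 | 13 | 26 | 176 |
| HDAC6 | 1374 | 0.474 | 0.149 | 54.3 | 49.8 | 48 | 13 | 62 | 415 |
| HDAC7 | 112 | 0.489 | 0.165 | 50.4 | 45.8 | 43 | 12 | 28 | 197 |
| HDAC8 | 864 | 0.500 | 0.153 | 54.9 | 51.2 | 50 | 12 | 52 | 398 |
| HDAC9 | 102 | 0.494 | 0.169 | 52.6 | 47.4 | 46 | 13 | 29 | 190 |
| KAT2B | 55 | 0.583 | 0.179 | 50.8 | 37.3 | 46 | 13 | 44 | 99 |
| KDM1A | 241 | 0.380 | 0.143 | 44.8 | 46.2 | 31 | 21 | 31 | 216 |
| KDM4C | 88 | 0.359 | 0.101 | 48.8 | 40.3 | 41 | 10 | 38 | 158 |
| L3MBTL1 | 50 | 0.804 | 0.551 | 42.2 | 36.8 | 37 | 27 | 37 | 56 |
| L3MBTL3 | 89 | 0.731 | 0.404 | 40.4 | 36.6 | 37 | 26 | 35 | 83 |
| MAP3K7 | 96 | 0.539 | 0.137 | 57.1 | 60.5 | 59 | 35 | 45 | 190 |
| MGEA5 | 67 | 0.683 | 0.316 | 54.2 | 39.6 | 48 | 19 | 42 | 126 |
| NCOA1 | 51 | 0.350 | 0.105 | 45.5 | 43.3 | 34 | 11 | 18 | 132 |
| NCOA3 | 157 | 0.368 | 0.109 | 47.7 | 44.6 | 39 | 10 | 26 | 166 |
| PRMT1 | 61 | 0.395 | 0.076 | 53.0 | 53.5 | 41 | 9 | 40 | 239 |
| Average | 350 | 0.507 | 0.178 | 52 | 48 | 46 | 18 | 46 | 232 |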

  a. MACCS keys, 166-bit
  b. ECFP4, 2048-bit