Maximum-score diversity selection for early drug discovery

Meinl, Thorsten; Ostermann, C; Nimz, O; Zaliani, A; Berthold, MR

doi:10.1186/1758-2946-2-S1-P33

Volume 2 Supplement 1

5th German Conference on Cheminformatics: 23. CIC-Workshop

Poster presentation
Open access
Published: 04 May 2010


Journal of Cheminformatics volume 2, Article number: P33 (2010)


Diversity selection is a common task in early drug discovery, whether for removing redundant molecules prior to high-throughput screening (HTS) or for reducing the number of molecules to synthesize from scratch. One drawback of the current approach, especially with regard to HTS, is that only structural diversity is taken into account; whether a molecule is likely to be highly active or completely inactive is usually ignored. This is remarkable, given how much research goes into improving virtual screening methods in order to forecast activity. We therefore present a modified version of diversity selection, which we term Maximum-Score Diversity Selection, that additionally takes the predicted activities of the molecules into account. Not surprisingly, the two objectives (maximizing activity and maximizing diversity in the selected subset) conflict, so we end up with a multiobjective optimization problem. We will show that the task of diversity selection is itself hard (it is NP-complete) and that heuristic approaches are therefore needed for typical dataset sizes.
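To make the conflict between the two objectives concrete, the following minimal sketch enumerates every subset of a toy dataset and keeps the Pareto-optimal ones. It rests on assumptions that the abstract does not state: the activity of a subset is taken as the sum of its predicted scores, its diversity as the smallest pairwise distance within the subset (the p-dispersion objective of [2]), and the four molecules, their scores, and their one-dimensional positions are invented purely for illustration.

```python
from itertools import combinations

def total_score(subset, scores):
    """Activity objective (assumed): sum of predicted scores in the subset."""
    return sum(scores[i] for i in subset)

def min_pairwise_distance(subset, dist):
    """Diversity objective (assumed): smallest distance between any two members."""
    return min(dist[i][j] for i, j in combinations(subset, 2))

def dominates(a, b):
    """True if objective pair a is at least as good as b in both and differs from b."""
    return a[0] >= b[0] and a[1] >= b[1] and a != b

def pareto_front(scores, dist, p):
    """Exhaustively evaluate all subsets of size p and keep the non-dominated ones."""
    candidates = [
        (total_score(s, scores), min_pairwise_distance(s, dist), s)
        for s in combinations(range(len(scores)), p)
    ]
    return [
        c for c in candidates
        if not any(dominates(o[:2], c[:2]) for o in candidates)
    ]

# Hypothetical toy data: four molecules placed on a line, with integer scores.
positions = [0, 1, 5, 10]
scores = [9, 8, 3, 1]
dist = [[abs(a - b) for b in positions] for a in positions]

for score, diversity, subset in pareto_front(scores, dist, p=2):
    print(subset, "score:", score, "min distance:", diversity)
```

In this toy case three of the six possible pairs are Pareto-optimal, ranging from the most active pair to the most spread-out pair, and no single pair is best on both objectives. Exhaustive enumeration grows with the number of subsets (n choose p) and is only feasible for very small inputs, which is why heuristics are needed at realistic dataset sizes.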

A common and popular approach is to use multiobjective genetic algorithms, such as NSGA-II [1], to optimize both objectives for the selected subsets. However, we will show that the usual implementations suffer from severe limitations that prevent them from finding many potentially interesting solutions. We therefore evaluated two other heuristics for maximum-score diversity selection. One is a special heuristic (called BB2) that was motivated by the aforementioned proof of NP-completeness [2]. The other is a novel heuristic called Score Erosion, which was developed specifically for this problem. Of the three heuristics, Score Erosion is by far the fastest, while finding solutions of equal quality to those of the genetic algorithm and BB2. This will be shown on several real-world datasets, both public and internal.
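The abstract does not describe how BB2 or Score Erosion work, so neither is reproduced here. As a generic illustration of what a heuristic for this problem can look like, the sketch below uses a simple greedy rule that is an assumption of this example and not one of the three evaluated methods: start from the highest-scoring molecule, then repeatedly add the molecule with the best weighted combination of its own score and its distance to the nearest molecule already selected. The function name `greedy_select`, the weight `alpha`, and the toy data are all hypothetical.

```python
def greedy_select(scores, dist, p, alpha=0.5):
    """Greedy baseline (illustrative only, not BB2 or Score Erosion).

    alpha = 1.0 considers only the score, alpha = 0.0 only the distance
    to the nearest already-selected molecule.
    """
    n = len(scores)
    # Seed with the single highest-scoring molecule.
    selected = [max(range(n), key=lambda i: scores[i])]
    while len(selected) < p:
        remaining = [i for i in range(n) if i not in selected]

        def gain(i):
            nearest = min(dist[i][j] for j in selected)
            return alpha * scores[i] + (1 - alpha) * nearest

        selected.append(max(remaining, key=gain))
    return selected

# Hypothetical toy data: four molecules placed on a line, with integer scores.
positions = [0, 1, 5, 10]
scores = [9, 8, 3, 1]
dist = [[abs(a - b) for b in positions] for a in positions]

for alpha in (1.0, 0.5, 0.0):
    print("alpha =", alpha, "->", greedy_select(scores, dist, p=2, alpha=alpha))
```

Varying `alpha` traces out different trade-offs between activity and diversity, but a weighted sum of this kind only makes sense when scores and distances are on comparable scales, and a greedy rule gives no guarantee of reaching a Pareto-optimal subset.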

All experiments were carried out using the data analysis platform KNIME [3]; we will therefore also show some examples of how maximum-score diversity selection can be performed inside workflow-based environments.

References

1. Deb K, Pratap A, Agarwal S, Meyarivan T: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation. 2002, 6: 182-197. 10.1109/4235.996017.
2. Erkut E: The discrete p-dispersion problem. European Journal of Operational Research. 1990, 46 (1): 48-60. 10.1016/0377-2217(90)90297-O.
3. [http://www.knime.org/]


Author information

Authors and Affiliations

University of Konstanz, 78457, Konstanz, Germany
Thorsten Meinl, C Ostermann, O Nimz, A Zaliani & MR Berthold


Rights and permissions

Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License ( https://creativecommons.org/licenses/by-nc/2.0 ), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.


About this article

Cite this article

Meinl, T., Ostermann, C., Nimz, O. et al. Maximum-score diversity selection for early drug discovery. J Cheminform 2 (Suppl 1), P33 (2010). https://doi.org/10.1186/1758-2946-2-S1-P33

