MaxMinSelection light


Michael Schmuker

michael.schmuker AT chemie DOT uni-frankfurt DOT de
Johann Wolfgang Goethe-Universität, Frankfurt am Main, Germany
Beilstein-Endowed Chair of Cheminformatics


Select a subset from a compound library using the MaxMin algorithm.

The algorithm works as follows:

  1. Choose the first compound from the library and put it in the selection
  2. Find the compound most distant from the compounds already in the selection and put it in the selection
  3. Repeat step 2 until the desired number of compounds is in the selection

In step 2, the compound most distant from those already in the selection is determined by max(min(d(i,j))), where i runs over the compounds in the library, j runs over the compounds in the selection, and d is the distance measure. In other words, the compound selected next is the one whose minimal distance to the compounds in the selection is the largest among all compounds in the library.
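
The three steps can be sketched in Python as follows. This is a minimal illustration, not the code behind this tool: the function name maxmin_select and its parameters are hypothetical, "first compound" is assumed to mean the first row of the library, and distances are Euclidean (computed with math.dist, Python 3.8 or later).

```python
import math


def maxmin_select(descriptors, subset_size):
    """Return the indices of the selected compounds, in order of selection.

    descriptors: list of equal-length lists of floats, one per compound.
    subset_size: number of compounds to select (at least 2).
    """
    if not 2 <= subset_size <= len(descriptors):
        raise ValueError("subset_size must be between 2 and the library size")

    # Step 1: seed the selection with the first compound (an assumption).
    selected = [0]
    # min_dist[i] is the distance from compound i to its nearest selected compound.
    min_dist = [math.dist(row, descriptors[0]) for row in descriptors]

    # Step 3: repeat step 2 until enough compounds are selected.
    while len(selected) < subset_size:
        # Step 2: take the compound whose nearest selected neighbour is farthest away.
        candidates = [i for i in range(len(descriptors)) if i not in selected]
        best = max(candidates, key=lambda i: min_dist[i])
        selected.append(best)
        # Update the cached minima to account for the newly selected compound.
        for i, row in enumerate(descriptors):
            min_dist[i] = min(min_dist[i], math.dist(row, descriptors[best]))

    return selected
```

Caching the minimal distance per compound avoids recomputing every library-to-selection distance in each round. Ties are broken in favour of the compound that appears first in the library, which is a choice of this sketch.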

By default, the Euclidean distance is used as the distance metric. At the moment, no other metrics are available. This may change in the future.

Library file name

Specify the name of the library file. This version is restricted to a maximum library size of 1000 compounds. See also File format.

Subset size

Specify the number of compounds to select from the library (minimum is 2).

Treat last column as activity value?

Check "yes" if the last column in your data file contains an activity value. This value will be ignored during calculation, and will reappear unchanged in the output file. See also File format.

File format

Compound libraries must be plain-text, whitespace-separated data files. Each line contains one compound.


Compound1       0.1234  0.5678  0.0000
Compound2       0.9876  0.5432  1.0000
Compound3       0.1357  0.2468  0.0000
# and so on...
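
A file in this layout could be read with a short Python helper like the one below. This is only a sketch under assumptions drawn from the example above, not the tool's own parser: the function name parse_library is hypothetical, the first column is assumed to be the compound name, and lines starting with "#" are assumed to be comments.

```python
def parse_library(text, last_column_is_activity=False):
    """Split library text into names, descriptor vectors and activity values."""
    names, descriptors, activities = [], [], []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and (by assumption) comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()  # split on any run of whitespace
        names.append(fields[0])  # assumed: first column is the compound name
        values = fields[1:]
        if last_column_is_activity:
            # Keep the activity as the original string so it can be
            # written back unchanged.
            activities.append(values.pop())
        descriptors.append([float(v) for v in values])
    return names, descriptors, activities
```

With last_column_is_activity set to True, the last column is kept out of the descriptor vectors, so it does not enter the distance calculation.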


Schmuker, M., Givehchi, A., and Schneider, G. (2004) Impact of different software implementations on the performance of the Maxmin method for diverse subset selection, Molecular Diversity 8:421-425.