Michael Schmuker
michael.schmuker AT chemie DOT uni-frankfurt DOT de
Perform subset selection from a compound library using the MaxMin algorithm.
The algorithm works as follows:
In step 2, the most distant compound from those already in the selection is determined by max(min(d(i,j))), with i the compounds in the library, j the compounds in the selection, and d the distance index. In other words, the compound to be selected next is the one for which the minimal distance to all compounds in the selection is maximal among all compounds in the library.
By default, the Euclidean distance is used as the distance metric. No other metrics are available at the moment; this may change in the future.
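The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind this tool: the function names `euclidean` and `maxmin_select` are hypothetical, and seeding the selection with the first compound in the library is an assumption made only so the example is deterministic.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def maxmin_select(library, n_select, seed_index=0):
    """Return the indices of n_select compounds chosen by the max(min(d)) rule.

    library    -- list of descriptor vectors, one per compound
    n_select   -- number of compounds to select (at least 2)
    seed_index -- index of the starting compound (an assumption of this sketch)
    """
    if not 2 <= n_select <= len(library):
        raise ValueError("n_select must be between 2 and the library size")

    selected = [seed_index]
    # min_dist[i] holds the distance from compound i to its nearest
    # compound in the current selection.
    min_dist = [euclidean(v, library[seed_index]) for v in library]

    while len(selected) < n_select:
        # Pick the unselected compound whose minimal distance to the
        # selection is maximal.
        candidates = [i for i in range(len(library)) if i not in selected]
        nxt = max(candidates, key=lambda i: min_dist[i])
        selected.append(nxt)
        # Update the nearest-selected distances with the new member.
        for i, v in enumerate(library):
            min_dist[i] = min(min_dist[i], euclidean(v, library[nxt]))

    return selected
```

For example, `maxmin_select([(0, 0), (1, 0), (10, 0), (5, 0)], 3)` starts from the first compound, then picks the compound at (10, 0), then the one at (5, 0). Keeping a running `min_dist` list avoids recomputing all distances to the selection in every iteration.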
Specify the filename of the library file. This version is restricted to a maximum library size of 1000 compounds. See also File format.
Specify the number of compounds to select from the library (minimum is 2).
Check "yes" if the last column in your data file contains an activity value. This value will be ignored during calculation, and will reappear unchanged in the output file. See also File format.
Compound libraries must be plain-text, whitespace-separated data files. Each line describes one compound.
Example:
#Example.dat
Compound1 0.1234 0.5678 0.0000
Compound2 0.9876 0.5432 1.0000
Compound3 0.1357 0.2468 0.0000
# and so on...
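A file in this format could be read along the following lines. This is only a sketch under assumptions drawn from the example: the first field is taken to be the compound name, lines starting with "#" are treated as comments, and the remaining fields are numeric descriptors. The function name `parse_library` and its `has_activity` flag are hypothetical and simply mirror the activity option described above.

```python
def parse_library(text, has_activity=False):
    """Parse whitespace-separated library text into (name, descriptors, activity).

    If has_activity is True, the last column is split off as the activity
    value and kept out of the descriptor vector; otherwise activity is None.
    """
    compounds = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines (assumed to start with '#').
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        name = fields[0]
        values = [float(x) for x in fields[1:]]
        activity = values.pop() if has_activity else None
        compounds.append((name, values, activity))
    return compounds
```

Called with `has_activity=True` on the example file, the first entry would be `("Compound1", [0.1234, 0.5678], 0.0)`; with the default `has_activity=False`, all three numbers are kept as descriptors.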
Schmuker, M., Givehchi, A., and Schneider, G. (2004) Impact of different
software implementations on the performance of the Maxmin method for
diverse subset selection, Molecular Diversity 8:421-425.