[Zinc-fans] About molecular descriptors and clustering.

John J. Irwin jji at cgl.ucsf.edu
Fri May 11 16:44:26 PDT 2007


Hi Yasset,

Thanks for your interest in ZINC, and for your questions.

yasset perez riverol wrote:
>   1. First of all, the previous version of the database
>      seems to contain some redundant compounds:
>         a. different ZINC identifier and same mol2.
>         b. different ZINC identifier and different mol2
>            but same structure. What program are you
>            using to identify the same structure from
>            different vendors?
>   
We are aware of duplication problems, which we aim to fix. Please send
details of problems you have found to support at docking.org. Duplication
arises either because our canonicalization algorithm fails to identify
that some molecules are really the same, or because of legacy problems
that have not yet been addressed. If you are doing virtual screening, a
small amount of duplication shouldn't be a big deal.
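
If you want to collapse duplicates yourself before screening, here is a
minimal sketch of the general idea, not the ZINC pipeline. It assumes you
have already computed a canonical string for each molecule with a toolkit
of your choice; the function name group_duplicates, the identifiers, and
the strings below are hypothetical and for illustration only.

```python
from collections import defaultdict


def group_duplicates(records):
    """Group identifiers that share the same canonical string.

    `records` is an iterable of (identifier, canonical_string) pairs.
    The canonical strings are assumed to come from an external toolkit.
    Returns only the groups that contain more than one identifier.
    """
    groups = defaultdict(list)
    for identifier, canonical in records:
        groups[canonical].append(identifier)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical example data, not real ZINC entries.
records = [
    ("ID_A", "CCO"),
    ("ID_B", "CCO"),
    ("ID_C", "c1ccccc1"),
]
duplicates = group_duplicates(records)
print(duplicates)
```

This only catches molecules whose canonical strings agree exactly, so it
is subject to the same canonicalization failures described above.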

>   2. What program are you using to calculate
>      molecular descriptors, and which molecular
>      descriptors are you calculating?
>   
The molecular descriptors available in ZINC are calculated using
Molinspiration's mitools.

>   3. What program do you use to cluster the
>      database, and which molecular descriptor is used to
>      calculate the Tanimoto coefficient?
>   
We use a variant of Benoit Bienfait's program SUBSET 1.0 (from his time
in Marc Nicklaus' lab). We use Cactvs hashed fingerprints that were
chosen to behave similarly to Daylight fingerprints.
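
For reference, the Tanimoto coefficient between two binary fingerprints
is the number of bits set in both divided by the number of bits set in
either. The sketch below illustrates that formula only; it is not SUBSET
or Cactvs code. It assumes each fingerprint is represented as a set of
on-bit positions, and the function name and example bit sets are made up
for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as on-bit positions.

    Computes |A & B| / |A | B|. Two empty fingerprints are scored 0.0
    here by convention to avoid dividing by zero (an assumption).
    """
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    if union == 0:
        return 0.0
    return len(a & b) / union


# Hypothetical fingerprints: 3 shared bits, 6 bits set in total.
fp1 = {1, 4, 7, 9}
fp2 = {1, 4, 8, 9, 12}
print(tanimoto(fp1, fp2))  # 0.5
```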

John

