[Zinc-fans] requesting peer suggestions on docking and duplication
Jens Auer
auer at bit.uni-bonn.de
Fri May 16 01:12:25 PDT 2008
On Fri, 2008-05-16 at 13:26 +0530, elite 158 wrote:
> Another question concerns handling duplicates in ZINC libraries. The
> automated docking protocol we are currently using requires all the
> ZINC compounds (if we want to dock whole subsets such as
> drug-like, fragment-like, etc.) to be present in one single SDF file.
> As John wrote a reply sometime back in July 2006 for a query on
> duplicate structures, one would definitely not be interested in
> removing the duplicates "... it is perfectly normal to have more
> than one representation of a molecule if you combine all the files.
> For example, imidazole would have one representation (e.g. protonated)
> in the p0 "reference" subset, and have the neutral form in the p1
> subset..." The question is: if we merge all the files into one
> giant SDF file, are the ZINC IDs unique across all entries, i.e., can
> we safely trace the ligand of interest back to its respective pH subset?
From my experience, the ZINC ids are unique, with some exceptions. We work
mostly with 2D search methods, which do not use much of the information
present in a 3D structure (e.g. chirality), and we have created our own
2D-unique subset of the ZINC database. While doing so, we encountered
a few thousand cases where compounds share a ZINC id but have different
structures. These are completely different compounds, not alternative
representations of the same molecule, yet they carry the same id. I can
provide you with a list if you are interested, but if you need to rely on
unique ids, it is probably better to generate new ids yourself.
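If you do need everything in one file, here is a rough sketch of what I mean, in Python. It assumes V2000 SD files whose title line holds the ZINC id (please check that against your own files). Each record gets a fresh sequential title; the `LIG0000001` scheme and the `ZINC_ID`, `SOURCE_SUBSET` and `ID_CONFLICT` field names are made up for illustration. The original ZINC id and the subset label are kept as SD data fields, so a ligand can be traced back to its pH subset, and ids whose records differ in heavy-atom composition are flagged. The heavy-atom formula is only a crude stand-in for a proper 2D comparison: because it ignores hydrogens, protonation variants of the same compound compare equal, but it will miss different compounds that happen to be isomers. A canonical SMILES or InChI from a cheminformatics toolkit would be the more reliable key.

```python
from collections import Counter, defaultdict


def iter_sdf_records(text):
    """Yield each SD record as a list of lines; records end with a '$$$$' line."""
    record = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            yield record
            record = []
        else:
            record.append(line)
    if any(line.strip() for line in record):
        yield record  # last record without a trailing '$$$$'


def heavy_atom_formula(record):
    """Crude 2D key: element counts of non-hydrogen atoms from a V2000 atom block."""
    n_atoms = int(record[3][0:3])  # counts line: first 3 columns = number of atoms
    elements = Counter()
    for line in record[4:4 + n_atoms]:
        symbol = line[31:34].strip()  # V2000 atom symbol columns
        if symbol not in ("H", "D", "T"):
            elements[symbol] += 1
    return "".join(f"{el}{elements[el]}" for el in sorted(elements))


def merge_and_check(labelled_sdf_texts):
    """Merge (subset_label, sdf_text) pairs; return records and conflicting ids."""
    merged = []
    formulas_by_id = defaultdict(set)
    for label, text in labelled_sdf_texts:
        for record in iter_sdf_records(text):
            zinc_id = record[0].strip()  # ASSUMPTION: title line holds the ZINC id
            formulas_by_id[zinc_id].add(heavy_atom_formula(record))
            merged.append((label, record))
    # An id is suspicious if its records disagree even after ignoring hydrogens.
    conflicts = {zid for zid, formulas in formulas_by_id.items() if len(formulas) > 1}
    return merged, conflicts


def write_merged_sdf(merged, conflicts):
    """Return one SDF string with fresh titles and traceability data fields."""
    out = []
    for i, (label, record) in enumerate(merged, start=1):
        zinc_id = record[0].strip()
        out.append(f"LIG{i:07d}")  # hypothetical sequential id replaces the title
        out.extend(record[1:])
        for field, value in (
            ("ZINC_ID", zinc_id),
            ("SOURCE_SUBSET", label),
            ("ID_CONFLICT", "yes" if zinc_id in conflicts else "no"),
        ):
            out.extend([f"> <{field}>", value, ""])
        out.append("$$$$")
    return "\n".join(out) + "\n"
```

For example, `merge_and_check([(p.stem, p.read_text()) for p in Path("subsets").glob("*.sdf")])` (with `from pathlib import Path`; the `subsets` directory is hypothetical) would label each record with the name of the file it came from. Keeping the original ZINC id as a data field rather than discarding it means you can still look a compound up in ZINC later.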