From jji at cgl.ucsf.edu Thu Sep 13 16:18:50 2007
From: jji at cgl.ucsf.edu (John J. Irwin)
Date: Thu, 13 Sep 2007 16:18:50 -0700
Subject: [Zinc-fans] [Dock-fans] more valences per nitrogen than they should be?
In-Reply-To:
References: <20070911083923.bnnr8udhc4kc0w4g@webmail.ualberta.ca>
Message-ID: <46E9C55A.6090406@cgl.ucsf.edu>

Hi -

Thanks for your email. We know there are some "broken" molecules in
ZINC. If you send me the list of molecules you found (to jji at
cgl.ucsf.edu) I will see what I can do to put it right.

John

Scott Brozell wrote:
> Hi,
>
> I am cc-ing Zinc-fans
> http://blur.compbio.ucsf.edu/mailman/listinfo/zinc-fans
>
> Scott
>
> On Tue, 11 Sep 2007 burgosgu at ualberta.ca wrote:
>
>> I'm just about to run a virtual screening and I just noticed that
>> most of the compounds I downloaded from the ZINC database have 4
>> valences for nitrogen instead of 3, e.g. a nitrogen is bound to two
>> carbons and two hydrogens. Is this correct? I'm afraid that this
>> will affect my dockings.
>>
> _______________________________________________
> Dock-fans mailing list
> Dock-fans at docking.org
> http://blur.compbio.ucsf.edu/mailman/listinfo/dock-fans
>

From baoilleach at gmail.com Fri Sep 14 02:23:36 2007
From: baoilleach at gmail.com (Noel O'Boyle)
Date: Fri, 14 Sep 2007 10:23:36 +0100
Subject: [Zinc-fans] [Dock-fans] more valences per nitrogen than they should be?
In-Reply-To: <46E9C55A.6090406@cgl.ucsf.edu>
References: <20070911083923.bnnr8udhc4kc0w4g@webmail.ualberta.ca>
 <46E9C55A.6090406@cgl.ucsf.edu>
Message-ID:

Hello all,

I came across this problem some time ago (see email below from Jul
18). I've placed the list of molecules at
http://www.redbrick.dcu.ie/~noel/dodgyNs.txt.

I am an OpenBabel developer, and if you need any help using OpenBabel
to sort out problems in ZINC, I'd be only too happy to volunteer some
time. We have some nice canonicalisation code donated by eMolecules,
etc., etc.

Noel

==== original email =====
Dear John,

I mentioned in passing some problems with some of the structures in
the MOL2 files. I've now followed this up. The main problem seems to
be that there are molecules containing nitrogen atoms of type N.3
(this is specified in the file) but which have four bonds.

This must be an error (right?). Either these are of type N.4 (and have
a positive charge), or they are of type N.3 and have at most three
bonds. I think that it is the latter, and that there has been some
mistake by a structure generation program at an earlier stage in your
pipeline.

I looked for all examples of this in ZINC and have attached the
result. There are 97211 molecules with atoms of type N.3 but with 4
bonds. That's almost 5% (out of 2021041).

Regards,
Noel

For the record, here's the Python code to create the attached file:

import glob

import pybel
import openbabel as ob

outputfile = open("dodgyNs.txt", "w")
for filename in glob.glob("gzipfiles/*.mol2"):
    for mol in pybel.readfile("mol2", filename):
        for atom in mol:
            if atom.type == "N3": # Internal OB atom type (equivalent to N.3)
                numbonds = len([1 for x in ob.OBAtomBondIter(atom.OBAtom)])
                if numbonds == 4:
                    print >> outputfile, mol.title
                    break # report each molecule only once
outputfile.close()

==== end of original email =====

> _______________________________________________
> Zinc-fans mailing list
> Zinc-fans at docking.org
> http://blur.compbio.ucsf.edu/mailman/listinfo/zinc-fans
>

From jji at cgl.ucsf.edu Fri Sep 14 12:19:32 2007
From: jji at cgl.ucsf.edu (John J. Irwin)
Date: Fri, 14 Sep 2007 12:19:32 -0700
Subject: [Zinc-fans] [Dock-fans] more valences per nitrogen than they should be?
In-Reply-To:
References: <20070911083923.bnnr8udhc4kc0w4g@webmail.ualberta.ca>
 <46E9C55A.6090406@cgl.ucsf.edu>
Message-ID: <46EADEC4.1040104@cgl.ucsf.edu>

Noel

Thank you for your kind offer of assistance. I would like to move
towards some kind of community-based system whereby helpful and
enterprising people like you could "fix" problems in ZINC, even
without contacting me (wikizincia?). Unfortunately, we don't have a
mechanism for that yet. For now, I gratefully accept notification of
problems, and will redouble my efforts to fix errors.

John

From lucioric at ibt.unam.mx Thu Sep 27 13:56:22 2007
From: lucioric at ibt.unam.mx (Lucio Montero)
Date: Thu, 27 Sep 2007 15:56:22 -0500
Subject: [Zinc-fans] ZINC database updates
Message-ID: <001601c80148$df746740$f520f884@x>

Hi. When the ZINC database is updated, can I retrieve only the new
records instead of having to re-download all records? Can the ZINC_ID
change for the old records?

Lucio Montero
Laboratorio de Federico Sánchez
Ext. 27666
Instituto de Biotecnología, UNAM
Cuernavaca, Morelos, México

From jji at cgl.ucsf.edu Thu Sep 27 14:14:52 2007
From: jji at cgl.ucsf.edu (John J. Irwin)
Date: Thu, 27 Sep 2007 14:14:52 -0700
Subject: [Zinc-fans] ZINC database updates
In-Reply-To: <001601c80148$df746740$f520f884@x>
References: <001601c80148$df746740$f520f884@x>
Message-ID: <46FC1D4C.3010307@cgl.ucsf.edu>

Hi Lucio

Thanks for your email and your interest in ZINC.

I'll take your second question first. ZINC IDs are never re-used for
other molecules. ZINC IDs do disappear in new versions, for example by
becoming unavailable ("depleted") or being "retired" when we realize
they are the same as another molecule. But we will never re-use a ZINC
ID.

Back to your first question. It would be nice to be able to download
just a delta of what is new instead of the entire database. We did
consider offering this, and it does seem to be feasible. I have the
information on hand, and I may see about offering it at some point.
At the moment, like the Red Queen to Alice, I find myself breathless
just to stay in the same place. So we ask you to download everything
you need again. Here is the good news: you can run one of our handy
c-shell scripts that invoke wget last thing in the evening, and ZINC
should be on your disk the next morning, if not before.

John
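
Until a delta download is offered, the comparison discussed in this thread can be approximated on the user's side by comparing the ZINC IDs found in two successive full downloads. Since ZINC IDs are never re-used, an ID that appears only in the newer download should be a new record, and an ID that appears only in the older one has presumably been depleted or retired. The following is a minimal Python sketch under that assumption; the function name `zinc_id_delta` and the example IDs are hypothetical illustrations, not part of any ZINC tooling.

```python
def zinc_id_delta(old_ids, new_ids):
    """Compare the ZINC IDs of two database snapshots.

    Relies on the rule stated in this thread that a ZINC ID is never
    re-used for a different molecule. Both arguments are iterables of
    ID strings; how the IDs are extracted from the downloaded files is
    left to the caller.
    """
    old, new = set(old_ids), set(new_ids)
    return {
        "added": sorted(new - old),    # records only in the newer snapshot
        "removed": sorted(old - new),  # depleted or retired records
        "kept": sorted(old & new),     # records present in both snapshots
    }


if __name__ == "__main__":
    # Made-up IDs, for illustration only.
    previous = ["ZINC00000001", "ZINC00000002", "ZINC00000003"]
    current = ["ZINC00000002", "ZINC00000003", "ZINC00000004"]
    delta = zinc_id_delta(previous, current)
    print("added:  ", delta["added"])
    print("removed:", delta["removed"])
    print("kept:   ", len(delta["kept"]), "records")
```

This only tells you which IDs differ between the two snapshots. It does not detect a record whose structure file changed while its ID stayed the same, so a full re-download remains the safe option.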