From elite158 at gmail.com Fri May 16 00:56:38 2008
From: elite158 at gmail.com (elite 158)
Date: Fri, 16 May 2008 13:26:38 +0530
Subject: [Zinc-fans] requesting peer suggestions on docking and duplication
Message-ID: <7811d3d20805160056y4c12500dtdf3fb7c75798ecac@mail.gmail.com>

*Dear Dr. Brian K. Shoichet and Dr. John J. Irwin,*

I would like to thank you for your efforts in making ZINC a noncommercial database accessible to docking enthusiasts and students worldwide.

*Dear docking enthusiasts and Zinc fans*,

Here I present two scenarios that some of you may have come across...

*Scenario 1*:
In a certain docking protocol, my primary concern is the protonation states of the ligands in the library (subsets with different pH ranges) downloaded from ZINC, as I have recently read an article on "The influence of protonation in protein-ligand docking":
http://www.journal.chemistrycentral.com/content/2/S1/P12

Considering an enzyme that is reported to be optimal at a pH of 7.6-8.0, and for which we intend to find inhibitors, which subset of ZINC compounds do I choose for docking against my target of interest?

(*Why was there a need to create these subsets based on the pH of ligands in the ZINC database? Please educate if time permits*)

*Scenario 2*:
Another question concerns handling duplicates in ZINC libraries. The automated docking protocol we are currently using requires all the ZINC compounds (if we are interested in docking whole subsets such as drug-like, fragment-like, etc.) to be present in one single SDF file.

As John wrote in a reply back in July 2006 to a query on duplicate structures, one would definitely not be interested in removing the duplicates:

"... it is perfectly normal to have more than one representation of a molecule if you combine all the files. For example, imidazole would have one representation (e.g. protonated) in the p0 "reference" subset, and have the neutral form in the p1 subset..."
The question is, if we merge all the files into one single giga SDF file, are the ZINC IDs unique across all the entries? In other words, can we safely trace the ligand of interest back to its representative pH subset?

With best regards,
Elite158
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://blur.compbio.ucsf.edu/pipermail/zinc-fans/attachments/20080516/722c30d6/attachment.html

From auer at bit.uni-bonn.de Fri May 16 01:12:25 2008
From: auer at bit.uni-bonn.de (Jens Auer)
Date: Fri, 16 May 2008 10:12:25 +0200
Subject: [Zinc-fans] requesting peer suggestions on docking and duplication
In-Reply-To: <7811d3d20805160056y4c12500dtdf3fb7c75798ecac@mail.gmail.com>
References: <7811d3d20805160056y4c12500dtdf3fb7c75798ecac@mail.gmail.com>
Message-ID: <1210925545.744.23.camel@lsi-08.bit.uni-bonn.de>

On Fri, 2008-05-16 at 13:26 +0530, elite 158 wrote:
> Another question is with handling duplicates in ZINC Libraries. The
> automated docking protocol we are currently using requires all the
> Zinc compounds (if we are interested to dock a whole subsets of
> drug-like, fragment-like etc,) to be present in one single sdf file.
> As John wrote a reply sometime back in July 2006 for a query on
> duplicate structures, one would definitely not be interested in
> removing the duplicates "... it is perfectly normal to have more
> than one representation of a molecule if you combine all the files.
> For example, imidazole would have one representation (e.g. protonated)
> in the p0 "reference" subset, and have the neutral form in the p1
> subset..." The question is, if we merge all the files into one
> single giga SDF file, are the ZINC IDs unique to all the entries,
> meaning, can we safely traceback the ligand of interest to its
> representative pH subset?

From my experience, the ZINC ids are unique with some errors. We work mostly on 2D search methods which do not use much of the information present in a 3D structure (e.g.
chirality) and have created our own 2D-unique subset of the ZINC database. When we did so, we encountered a few thousand compounds which share a ZINC id but have different structures. These are completely different compounds with the same id. I can provide you with a list if you are interested, but if you have to rely on unique ids, it is probably better to use a newly generated id.

From jji at cgl.ucsf.edu Fri May 16 09:31:16 2008
From: jji at cgl.ucsf.edu (John J. Irwin)
Date: Fri, 16 May 2008 09:31:16 -0700
Subject: [Zinc-fans] requesting peer suggestions on docking and duplication
In-Reply-To: <7811d3d20805160056y4c12500dtdf3fb7c75798ecac@mail.gmail.com>
References: <7811d3d20805160056y4c12500dtdf3fb7c75798ecac@mail.gmail.com>
Message-ID: <482DB6D4.9080507@cgl.ucsf.edu>

Dear Elite158

Thank you for your interest in ZINC.

> Considering an enzyme that is reported to be optimum at a pH of
> 7.6-8.0, which we intend to find inhibitors for, which subset of ZINC
> compounds do I choose for docking against my target of interest?

I recommend the reference form at pH 7 (called "ref") as well as the additional forms at physiological pH 5.75-8.25 (called "mid"). This is available for download via the "usual" download link on the ZINC download pages.

> (Why was there a need to create these subsets based on pH of ligands
> in ZINC database? Please educate if time permits)

Metalloenzymes deprotonate thiols, sulfonamides, and hydroxamic acids, for example. Thus you must create the deprotonated forms to "get the right answer for the right reasons". See Irwin JJ, Raushel FM, Shoichet BK, "Virtual screening against metalloenzymes for inhibitors and substrates", Biochemistry, 2005, 44(37), 12316-28.

At the other extreme, the W191G mutant of cytochrome C peroxidase requires compounds that are not normally protonated, such as anilines. See Brenk R, Vetter SW, Boyce SE, Goodin DB, Shoichet BK.
Probing molecular docking in a charged model binding site. J Mol Biol 357(5), 1449-70 (2006).

It is a small step from these two extremes to creating pH dependent subsets. We realize this is not an ideal solution, but it has helped in our own work.

> *Scenario 2*:
> Another question is with handling duplicates in ZINC Libraries. The
> automated docking protocol we are currently using requires all the
> Zinc compounds (if we are interested to dock a whole subsets of
> drug-like, fragment-like etc,) to be present in one single sdf file.
>
> As John wrote a reply sometime back in July 2006 for a query on
> duplicate structures, one would definitely not be interested in
> removing the duplicates
>
> "... it is perfectly normal to have more than one representation of a
> molecule if you combine all the files. For example, imidazole would
> have one representation (e.g. protonated) in the p0 "reference"
> subset, and have the neutral form in the p1 subset..."
>
> The question is, if we merge all the files into one single giga SDF
> file, are the ZINC IDs unique to all the entries, meaning, can we
> safely traceback the ligand of interest to its representative pH subset?

We have done extensive duplicate removal ("retirement") in ZINC 8 (http://zinc8.docking.org/). I won't say there are no duplicates, but the number is way, way down. ZINC 7 remains unchanged (including duplicates) for backward compatibility with projects already in progress. ZINC 8 is not the official release yet, but it will be soon.
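
For anyone who still wants a single merged file, one possible way to keep each entry traceable is to record the source subset in the record itself before concatenating. The sketch below is only an illustration and not part of ZINC: it assumes each SDF record starts with its ZINC ID on the title line and ends with a `$$$$` line, and the function names and the `SOURCE_SUBSET` tag are hypothetical choices made for this example.

```python
from collections import defaultdict


def iter_sdf_records(text):
    """Yield each SDF record as a list of lines, without its '$$$$' terminator."""
    record = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            if record:
                yield record
            record = []
        else:
            record.append(line)


def tag_and_merge(sources):
    """Merge SDF texts, tagging every record with the subset it came from.

    `sources` maps a subset label (e.g. "ref", "mid") to the SDF text of that
    subset. Returns the merged SDF text and an index of ZINC ID -> labels.
    """
    merged = []
    index = defaultdict(list)
    for label, text in sources.items():
        for record in iter_sdf_records(text):
            zinc_id = record[0].strip()  # assumption: title line holds the ZINC ID
            index[zinc_id].append(label)
            merged.extend(record)
            # SDF data item: header line, value line, blank line, then terminator.
            merged.extend(["> <SOURCE_SUBSET>", label, "", "$$$$"])
    return "\n".join(merged) + "\n", dict(index)
```

The returned index also shows which IDs occur in more than one subset, so shared IDs can be reviewed instead of being silently mixed together.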
Good luck

John
UCSF ZINC Team

> With best regards,
> Elite158
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Zinc-fans mailing list
> Zinc-fans at docking.org
> http://blur.compbio.ucsf.edu/mailman/listinfo/zinc-fans

From zsolt at simbiosys.ca Fri May 16 12:05:24 2008
From: zsolt at simbiosys.ca (Zsolt Zsoldos)
Date: Fri, 16 May 2008 15:05:24 -0400
Subject: [Zinc-fans] requesting peer suggestions on docking and duplication
Message-ID:

Dear Elite158 and John,

It is an excellent step to offer multiple protonation versions of the molecules to help docking/screening runs. My sincere thanks to the ZINC team for that!

However, I would like to note that the problem is much deeper: the proper choice of protonation really depends on the local environment in a protein-ligand binding scenario, which cannot be sufficiently modeled by a global property like the pH. I have written a response about it on my blog and plan to follow up with more details this weekend:
http://www.simbiosys.ca/blog/

Zsolt

From jji at cgl.ucsf.edu Fri May 16 12:35:00 2008
From: jji at cgl.ucsf.edu (John J. Irwin)
Date: Fri, 16 May 2008 12:35:00 -0700
Subject: [Zinc-fans] requesting peer suggestions on docking and duplication
In-Reply-To:
References:
Message-ID: <482DE1E4.7020402@cgl.ucsf.edu>

Hi Zsolt

Thanks for your comments. We agree, and look forward to seeing the results of a more sophisticated treatment of protonation/tautomerism. You absolutely should evaluate the internal energy due to the cost of deprotonation and/or tautomerism. Unfortunately, this is often a pretty subtle thing to calculate even at a modestly high level of theory. ZINC provides a way to get started and anticipate these effects *now*, however imperfectly.
Cheers

John
UCSF ZINC Team

From m.stoermer at imb.uq.edu.au Fri May 16 19:22:56 2008
From: m.stoermer at imb.uq.edu.au (Martin Stoermer)
Date: Sat, 17 May 2008 12:22:56 +1000
Subject: [Zinc-fans] requesting peer suggestions on docking and duplication
In-Reply-To: <7811d3d20805160056y4c12500dtdf3fb7c75798ecac@mail.gmail.com>
References: <7811d3d20805160056y4c12500dtdf3fb7c75798ecac@mail.gmail.com>
Message-ID:

Morning everyone,

Another thing to remember, if you do concatenate the ZINC files (as we do), is that some docking packages will run into memory problems after too many ligands. For example, we have had problems with GOLD. When I concatenated all 113 of the ZINC Druglike sdf "chunks", I got a 9.6GB SD file.
When I ran a GOLD job on our cluster across 8 nodes, it died after ~600K ligands with error messages like:

libpvm [t40002]: pvm_pklong(): Value too large
Pvm Function pvm_pklong( (long) start_point, 1, 1), called from GOLD_COMM_SendNextDock, caused an error: Value too large
******************************************************************************
warning: Slave process 786433 on host node21.***.***.***.*** failed with following error message:
Gold internal failure (3.2) in docking ligand 0 in file /chem_db/Drug_like/GOLD/Drug_like_3d.sdf
******************************************************************************
WARNING: pvm task problem 786433
Fatal error:
Received interrupt [signal number 15]

I can't speak for how Autodock, Dock and the others handle these large datasets; I haven't tried to bludgeon them with 2.5 million compounds yet. Maybe others will have that answer. So it may be more sensible to limit yourself to smaller sets, e.g. one concatenated vendor database at a time.

cheers,
Martin

From jji at cgl.ucsf.edu Fri May 16 23:14:07 2008
From: jji at cgl.ucsf.edu (John J. Irwin)
Date: Fri, 16 May 2008 23:14:07 -0700
Subject: [Zinc-fans] requesting peer suggestions on docking and duplication
In-Reply-To:
References: <7811d3d20805160056y4c12500dtdf3fb7c75798ecac@mail.gmail.com>
Message-ID: <482E77AF.4040100@cgl.ucsf.edu>

Hi Martin

That's interesting.
We like to keep database tranches small so we can run dock in parallel (coarse-grained parallel, completely separate processes) on dozens or even hundreds of cores, and then combine the results at the end. Most computers nowadays come with 4 or even 8 effective cores, so running dock in smaller chunks makes efficient use of your available hardware.

John
UCSF ZINC Team
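
The chunking described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the tooling used by the ZINC team: it assumes plain-text SDF input in which every record ends with a `$$$$` line, and the function names, output prefix, and chunk size are hypothetical.

```python
def split_sdf(lines, records_per_chunk):
    """Group an iterable of SDF lines into chunks of at most N records."""
    chunk, count = [], 0
    for line in lines:
        chunk.append(line)
        if line.strip() == "$$$$":  # end of one molecule record
            count += 1
            if count == records_per_chunk:
                yield chunk
                chunk, count = [], 0
    if chunk:  # final, possibly smaller, chunk
        yield chunk


def write_chunks(sdf_path, out_prefix, records_per_chunk):
    """Stream a large SDF file into numbered chunk files; return their paths."""
    paths = []
    with open(sdf_path) as handle:
        for i, chunk in enumerate(split_sdf(handle, records_per_chunk)):
            path = f"{out_prefix}_{i:04d}.sdf"
            with open(path, "w") as out:
                out.writelines(chunk)
            paths.append(path)
    return paths
```

Each chunk file can then be handed to a separate docking process and the scores combined afterwards. Because the input is read line by line, memory use stays flat even for a multi-gigabyte file.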
From elite158 at gmail.com Sat May 17 01:22:02 2008
From: elite158 at gmail.com (elite 158)
Date: Sat, 17 May 2008 13:52:02 +0530
Subject: [Zinc-fans] protonation and duplication in zinc database
Message-ID: <7811d3d20805170122t715f3806lf3f36358d0d11740@mail.gmail.com>

Hello all,

Thank you very much for the quick and very well-received suggestions. It is indeed a very lively and active fan mailing list :)

cheers,
elite158

From jji at cgl.ucsf.edu Sat May 17 07:18:51 2008
From: jji at cgl.ucsf.edu (John J. Irwin)
Date: Sat, 17 May 2008 07:18:51 -0700
Subject: [Zinc-fans] requesting peer suggestions on docking and duplication
In-Reply-To: <482E77AF.4040100@cgl.ucsf.edu>
References: <7811d3d20805160056y4c12500dtdf3fb7c75798ecac@mail.gmail.com> <482E77AF.4040100@cgl.ucsf.edu>
Message-ID: <482EE94B.3090201@cgl.ucsf.edu>

Hi Elite

>>> The question is, if we merge all the files into one single giga SDF
>>> file, are the ZINC IDs unique to all the entries, meaning, can we
>>> safely traceback the ligand of interest to its representative pH
>>> subset?
You may not be able to, since we use the same ZINC ID for all representations that can interconvert freely in buffer. If it is important to you to know, I suggest just merging _within_ each pH subset.

John
UCSF ZINC Team

From zsolt at simbiosys.ca Sat May 17 08:37:58 2008
From: zsolt at simbiosys.ca (Zsolt Zsoldos)
Date: Sat, 17 May 2008 11:37:58 -0400
Subject: [Zinc-fans] requesting peer suggestions on docking and duplication
In-Reply-To: <482EE94B.3090201@cgl.ucsf.edu>
References: <7811d3d20805160056y4c12500dtdf3fb7c75798ecac@mail.gmail.com> <482E77AF.4040100@cgl.ucsf.edu> <482EE94B.3090201@cgl.ucsf.edu>
Message-ID:

John,

I know it is not exactly the profile of ZINC/DUD to work with PDB complexes, but would you be interested in setting up a correctly protonated complex database for cognate docking benchmarks?

Egon Willighagen asked this on my blog in response to my detailed version of the protonation post:
http://www.simbiosys.ca/blog/2008/05/17/correct-protonation-state-for-docking/#comment-1377

"Is there a gold standard; a good training set to set approaches on? As you already said, crystallography is not going to help. Databases with experimental pKa database are often proprietary? Any suggestions?"

Since I do not know of any public database with validated, correct protonation assignments for specific complexes, it would be a very valuable thing to create. I am thinking of a community-based, peer-reviewed project, kind of like Wikipedia, where people could edit/comment/curate the data. I would be interested in helping by depositing PDB complexes processed by our software. But I think it would be better if it were not hosted by us (a commercial software vendor); I would not want it to be biased, so docking.org would be the perfect entity to house it.

Zsolt (aka ZZ)

From jji at cgl.ucsf.edu Mon May 19 11:49:35 2008
From: jji at cgl.ucsf.edu (John J. Irwin)
Date: Mon, 19 May 2008 11:49:35 -0700
Subject: [Zinc-fans] requesting peer suggestions on docking and duplication
In-Reply-To:
References: <7811d3d20805160056y4c12500dtdf3fb7c75798ecac@mail.gmail.com> <482E77AF.4040100@cgl.ucsf.edu> <482EE94B.3090201@cgl.ucsf.edu>
Message-ID: <4831CBBF.7080307@cgl.ucsf.edu>

Hi Zsolt

Thanks for your email and your recent blog entry
http://www.simbiosys.ca/blog/2008/05/17/correct-protonation-state-for-docking/#comment-1377
about preparing a benchmarking set focusing on protonation states for docking. As far as I know, there is currently no good benchmarking set for receptor protonation, and thus this would be a welcome contribution to the community.

As you know, the NIH recently sent out an RFA to fund a Drug Docking and Screening Data Resource project to assemble benchmarking sets for docking.
http://grants.nih.gov/grants/guide/rfa-files/RFA-GM-08-008.html
Although protonation per se was not one of the central criteria for the Resource, it certainly ought to figure prominently.
The deadline for applications was March 18th 2008, reviews will be in June/July, and Council Review is in October. May I suggest you lobby the successful applicant with your ideas? If for some reason the successful applicant does not wish to proceed along the lines you propose, then I think starting up your own "wiki-dock-protonation-benchmark" would be very helpful. Let me be clear: go ahead and do what you like. But perhaps you can work with the DDSD Resource to create something really excellent.

You asked about me personally (i.e. UCSF) getting involved. We would certainly be happy to host (via our wiki, or otherwise) community data for docking. You don't even need to ask - just create a wiki account and post your data. However, for deeper involvement, I prefer to wait for the outcome of the DDSD Resource award, which should be known, at least informally, in the second half of the summer of '08.

Finally, let me put in a word for the importance of prospective tests as well as any retrospective benchmarks. Paraphrasing Bohr, making predictions is particularly hard - and impressive - when they concern the future. I hope any benchmarking set for receptor protonation will be a living process that will continue to make testable predictions, and not just a "lookup table" based on historical precedent.

Good luck!

John
UCSF ZINC Team

From jji at cgl.ucsf.edu Tue May 20 13:36:37 2008
From: jji at cgl.ucsf.edu (John J. Irwin)
Date: Tue, 20 May 2008 13:36:37 -0700
Subject: [Zinc-fans] What dose the subset number represent for?
In-Reply-To: <551806.6375.qm@web15902.mail.cnb.yahoo.com>
References: <551806.6375.qm@web15902.mail.cnb.yahoo.com>
Message-ID: <48333655.2090104@cgl.ucsf.edu>

Hi Yolanda

The subset create feature works in ZINC 8 only, and has recently been upgraded, so there is a good chance it will work for you.

Sorry to take so long to get back to you. Hope this helps.

John
UCSF ZINC Team

Yolanda Guo wrote:
> Hi all,
>
> Excuse me, I have another question. In the results browser, there is a
> button named "created subset". When I clicked it, "Creating subset
> 1202.[1] 28022" was displayed, but I don't know what it means. What
> does the number represent? I have looked over the "user-created subset"
> page; however, I don't understand what the first row, named
> "Subset#", means.
> Sorry for asking so many questions.
> Thanks in advance for any answer.
>
> Best regards,
>
> Yolanda Guo
> Northeast Normal University

From mxs10 at case.edu Tue May 27 07:16:51 2008
From: mxs10 at case.edu (Menachem Shoham)
Date: Tue, 27 May 2008 17:16:51 +0300
Subject: [Zinc-fans] viewing large Excel files
Message-ID:

I have downloaded files 1_prop.xls and 1_purch.xls. Due to the limitations in Excel I can only view about 65,000 entries. How do I look up an entry that is beyond this number?

Thanks,

Menachem Shoham

Menachem Shoham, PhD
Department of Biochemistry
Case Western Reserve University
Cleveland, Ohio
Tel. 216-368-4665
E-mail: mxs10 at case.edu

From jji at cgl.ucsf.edu Tue May 27 07:26:28 2008
From: jji at cgl.ucsf.edu (John J. Irwin)
Date: Tue, 27 May 2008 07:26:28 -0700
Subject: [Zinc-fans] viewing large Excel files
In-Reply-To:
References:
Message-ID: <483C1A14.1040807@cgl.ucsf.edu>

Hi Menachem

You can use grep to extract the lines you want.

Good luck!

John
UCSF ZINC Team

From mxs10 at case.edu Tue May 27 08:46:33 2008
From: mxs10 at case.edu (Menachem Shoham)
Date: Tue, 27 May 2008 18:46:33 +0300
Subject: [Zinc-fans] cannot find information on a series of compounds
In-Reply-To: <483C1A14.1040807@cgl.ucsf.edu>
References: <483C1A14.1040807@cgl.ucsf.edu>
Message-ID: <864392F1-B4AD-49F5-80C7-BD364FB2121F@case.edu>

Hi John,

I would like to get purchasing and properties information on the following entries:

ZINC04417836
ZINC04417838
ZINC04417841
ZINC04417847

I could not find entries for these compounds in 1_prop.xls or 1_purch.xls.

How do I retrieve this information?

Thanks,
Menachem

Menachem Shoham, PhD
Department of Biochemistry
Case Western Reserve University
Cleveland, Ohio
Tel. 216-368-4665
E-mail: mxs10 at case.edu

From jji at cgl.ucsf.edu Tue May 27 08:58:26 2008
From: jji at cgl.ucsf.edu (John J. Irwin)
Date: Tue, 27 May 2008 08:58:26 -0700
Subject: [Zinc-fans] cannot find information on a series of compounds
In-Reply-To: <864392F1-B4AD-49F5-80C7-BD364FB2121F@case.edu>
References: <483C1A14.1040807@cgl.ucsf.edu> <864392F1-B4AD-49F5-80C7-BD364FB2121F@case.edu>
Message-ID: <483C2FA2.3010004@cgl.ucsf.edu>

Hi Menachem

Go to the ZINC search page. You can use the new version of ZINC, http://zinc8.docking.org/

Choose "Search and Browse" from the "Home" menu.
Paste the 4 ZINC IDs (one per line) into the "ZINC Codes" field on the middle left.
Click on "Query Database".

You should see the purchasing info for these compounds.

Good luck!

John
UCSF ZINC Team
From jji at cgl.ucsf.edu Wed May 28 00:49:54 2008
From: jji at cgl.ucsf.edu (John J. Irwin)
Date: Wed, 28 May 2008 00:49:54 -0700
Subject: [Zinc-fans] Some questions about the general information of the vendor
In-Reply-To: <838519.10154.qm@web15902.mail.cnb.yahoo.com>
References: <838519.10154.qm@web15902.mail.cnb.yahoo.com>
Message-ID: <483D0EA2.2010406@cgl.ucsf.edu>

Hi Yolanda

We will improve the formatting, but we use 1 for true, 0 for false. Thus the subset you cite is freely available (this is the only kind you get to see outside UCSF), is for sale, will not be uploaded to PubChem, and is not depleted (i.e. should still be available for sale according to the most recent information we have processed).

I hope this helps!

John
UCSF ZINC Team

Yolanda Guo wrote:
> Dear zinc-fans:
>
> I noticed that there is some information in the vendor subset page
> as follows. But I'm not sure what are the meanings of the red
> characters.
> What do the numbers "0,1" represent?
>
> "Catalog version: May 07, # in catalog: 250000
> Website: dtp.nci.nih.gov, email dtpinfo at mail.nih.gov, phone no phone,
> fax no fax
> free? 1, purchasable? 1, pubchem? 0, depleted? 0
> # filtered 0"
>
> Any help would be much appreciated!
> Thanks in advance.
>
> Best regards
>
> Yolanda Guo
> Northeast Normal University

From jji at cgl.ucsf.edu Wed May 28 00:58:32 2008
From: jji at cgl.ucsf.edu (John J. Irwin)
Date: Wed, 28 May 2008 00:58:32 -0700
Subject: [Zinc-fans] non-unique ZINC ids in Zinc7
In-Reply-To: <1203073920.4324.2.camel@lsi-08>
References: <1203073920.4324.2.camel@lsi-08>
Message-ID: <483D10A8.3040404@cgl.ucsf.edu>

Hi Jens

Thank you for your email. I believe this has been mostly if not entirely cleared up, database-wide, in ZINC 8, which is available in beta now (http://zinc8.docking.org/) and will become the default version soon.

John
UCSF ZINC Team

Jens Auer wrote:
> Hi,
>
> we have just found several molecules in the Zinc7 database which are
> different in structure but have the same ZINC id. The complete list includes
> ~23000 ids, where most of the time two or three entries correspond to the same
> SMILES string (it also includes the ZINC01278699 from the errata, which matches
> more than a hundred SMILES strings). I can send you the list of these ids if
> you're interested, but it is too large to be posted here on the mailing list.
> I've attached a sample sd-file with two compounds with the same id but
> different structure for illustration.
>
> Best regards,
> Jens
>

From jji at cgl.ucsf.edu Wed May 28 07:30:06 2008
From: jji at cgl.ucsf.edu (John J. Irwin)
Date: Wed, 28 May 2008 07:30:06 -0700
Subject: [Zinc-fans] total number of compounds in vendors subset -reg
In-Reply-To: <2bdc62b30804230914p293f2bcmf1d1f6a8c8ce2c9e@mail.gmail.com>
References: <2bdc62b30804230914p293f2bcmf1d1f6a8c8ce2c9e@mail.gmail.com>
Message-ID: <483D6C6E.1070101@cgl.ucsf.edu>

Hi Rafi

Thanks for your email and your interest in ZINC. Sorry to take so long
to get back to you.

I have recently exported a fresh copy of Sigma Aldrich in ZINC 8
(http://zinc8.docking.org). There are 17,931 molecules in the source
catalogs, and 15,186 in ZINC. We downloaded every SDF file we could
find on the Sigma Aldrich website. I've ordered the CD, and will
include any additional molecules that may be there.

Previously we included the "rare" library from Sigma Aldrich, based on
files we received perhaps 5 years ago. There were nearly 200K of these.
Since these are no longer available on the Sigma Aldrich website, they
have been removed from ZINC. I think this change may account for some
of the discrepancies you saw.

Good luck

John
UCSF ZINC Team

rafi A wrote:
> Hello,
>
> Where can we find the total number of compounds in a subset?
>
> For example, I want to download the vendors/Sigma Aldrich subset.
>
> In the table column, catalog information: Source entries; shows 295,562.
>
> Another column, ZINC information: Loaded; shows 115,595. So I expected
> the total number of molecules to be either 295,000 or 115,000.
>
> But when I downloaded the mid pH subset (SMILES or mol2), it shows only
> 14,449 molecules.
>
> Did I misunderstand something?
> Or can you tell me where I can find the
> total number of molecules in a subset before downloading?
>
> Thanks in advance.
>
> Best regards,
>
> Rafi
>

From jji at cgl.ucsf.edu Wed May 28 08:04:29 2008
From: jji at cgl.ucsf.edu (John J. Irwin)
Date: Wed, 28 May 2008 08:04:29 -0700
Subject: [Zinc-fans] All-purchasable subset
In-Reply-To: <20004.7885.qm@web55402.mail.re4.yahoo.com>
References: <20004.7885.qm@web55402.mail.re4.yahoo.com>
Message-ID: <483D747D.6000107@cgl.ucsf.edu>

Hi Josmar

I have exported "all purchasable" for ZINC 8, which now has 8.4M
molecules. It should have nearly 10M in a month or so after I get a
few more problems sorted out.

May I take this opportunity to point out that we have created several
fun new subsets in ZINC that we have found useful, and you may too!

#17 - neutral fragments (51K of these)
#29 - CNS permeable (209K of these)
#33 - goldilocks - not too big, not too small, not too polar, not too
      greasy - just right (almost 500K of these)
#50 - stiff-solubles - fairly rigid fragments that are probably quite
      soluble

Happy docking!

John
UCSF ZINC Team

Josmar R. da Rocha wrote:
> Dear Zinc-fans,
>
> I noticed that the subset "all-purchasable" that could be downloaded
> from Zinc 7 is no longer available in Zinc 8. I'd like to know if the
> only way to get this subset would be by downloading each one of the
> files found in the "By vendor / in stock" subsets, or is there any
> other way?
>
> Thanks in advance!
>
> Josmar Rocha
>