[OTDev] Are there some sample dataset services available ?

Jörg Kurt Wegner joerg.wegner at web.de
Tue Feb 16 09:29:31 CET 2010


Nina, thanks for the clarification.

> InChIKeys are available for most of the compounds, but are not used as unique identifiers. Just to note, an InChIKey is a hashed identifier and theoretically not unique, thus it was decided not to use it as a compound identifier within OpenTox. Links to ChemSpider, PubChem ID, ChemIDplus, IUCLID5 and other possible sources will be exposed in future releases.

Agreed, and you could say the same of many identifiers, which are often vendor-specific and rely on the vendor being capable of removing redundancy. Since InChIKeys can at least be calculated from structures (watch the protonation and tautomerization state), I would strongly encourage a defined processing workflow (which might change over time).

Finally, mapping hashed InChIKeys for "identical" structures is still possible in theory, whatever identical means. It is "just" a question of semantics and proper ontologies ;-)
BTW, this brings up an interesting question: at which pH do you calculate TOX species? Are the calculations robust enough for different protomeric and tautomeric forms? If not, multiple input structures, aka "identical" InChIKeys, should be used.
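
To make the "identical" InChIKeys idea concrete, here is a minimal sketch, not anything OpenTox implements. It relies on the standard InChIKey layout: a 14-letter first block hashed from the connectivity, a 10-letter second block covering the remaining layers plus flag characters, and a final letter indicating the protonation state. Grouping on the first block alone treats protonation variants (and stereo or isotope variants) of one skeleton as "identical". The helper names split_inchikey and group_by_skeleton are hypothetical, and the three keys are the ones I believe belong to acetic acid, acetate and ethanol, so please verify them with your own InChI software. Tautomers that standard InChI does not normalize would still end up in different groups.

```python
import re
from collections import defaultdict

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def split_inchikey(key):
    """Split an InChIKey into (skeleton block, second block, protonation letter)."""
    if not INCHIKEY_RE.match(key):
        raise ValueError("not a well-formed InChIKey: %r" % key)
    skeleton, rest, protonation = key.split("-")
    return skeleton, rest, protonation


def group_by_skeleton(records):
    """Group (name, InChIKey) pairs by the first (connectivity) block only."""
    groups = defaultdict(list)
    for name, key in records:
        skeleton, _, _ = split_inchikey(key)
        groups[skeleton].append(name)
    return dict(groups)


# Illustrative input; the keys are assumed values, not taken from OpenTox data.
records = [
    ("acetic acid", "QTBSBXVTEAMEQO-UHFFFAOYSA-N"),
    ("acetate", "QTBSBXVTEAMEQO-UHFFFAOYSA-M"),
    ("ethanol", "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"),
]
groups = group_by_skeleton(records)
for skeleton, names in groups.items():
    print(skeleton, names)
```

With this input, acetic acid and acetate share a group because only the final protonation letter differs, while ethanol stays on its own. Whether such a group should count as one compound for a TOX calculation is exactly the semantics question above.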

Cheers, Joerg



