[OTDev] Are there some sample dataset services available ?

Tue Feb 16 09:53:01 CET 2010

Joerg,

Good to have you in the discussion on this list!

Jörg Kurt Wegner wrote:
> Nina, thanks for the clarification.
>
>   
>> InChIKeys are available for most of the compounds, but are not used as unique identifiers.  Just to note, an InChIKey is a hashed identifier and theoretically not unique, so it was decided not to use it as a compound identifier within OpenTox.  Links to ChemSpider, PubChemID, ChemIdPlus, IUCLID5 and other possible sources will be exposed in future releases.
>>     
>
> Agreed, and you could say this for many identifiers, which are often vendor-specific, assuming the vendor is capable of removing redundancy.
Yes, it is indeed amazing how many incorrect structures, or simply wrong
mappings between identifiers and structures, can be found... and these
are then used in modeling.
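
One way to catch such mismatches before modeling is to group records by
identifier and report any identifier attached to more than one structure.
A minimal Python sketch follows. The record layout and the function name
find_conflicting_mappings are hypothetical (not part of any OpenTox
service), and the structure strings are compared literally, so they would
first have to be brought into one canonical form.

```python
from collections import defaultdict


def find_conflicting_mappings(records):
    """Return {identifier: sorted structures} for identifiers that are
    mapped to more than one distinct structure string.

    `records` is an iterable of (identifier, structure) pairs. Structures
    are compared as plain strings; no chemical canonicalization is done.
    """
    seen = defaultdict(set)
    for identifier, structure in records:
        seen[identifier].add(structure.strip())
    return {
        ident: sorted(structs)
        for ident, structs in seen.items()
        if len(structs) > 1
    }


# Hypothetical records: CAS number -> SMILES collected from two sources.
records = [
    ("64-17-5", "CCO"),        # ethanol
    ("64-17-5", "CCO"),        # same mapping from a second source
    ("71-43-2", "c1ccccc1"),   # benzene
    ("71-43-2", "C1CCCCC1"),   # cyclohexane filed under benzene's number
]
conflicts = find_conflicting_mappings(records)
print(conflicts)
```

Only the second identifier is reported here, because its two sources
disagree on the structure.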
>  Since InChIKeys can at least be calculated from structures (watch the protonation and tautomerization state), I would strongly encourage a defined processing workflow (which might change over time).
>
>   
Agreed. Structure processing is currently more or less hidden inside the
dataset, algorithm and model services, and it is a good idea to expose
it explicitly under a separate algorithm service.
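
To illustrate what an explicit, versioned processing workflow could look
like, here is a small Python sketch. Everything in it is an assumption
made for illustration: the step names, the WORKFLOW_VERSION label and the
result layout are hypothetical, and the "largest fragment" step is only a
crude string heuristic, not a real salt-stripping or neutralization
routine.

```python
def strip_whitespace(smiles):
    """Remove leading and trailing whitespace from a SMILES string."""
    return smiles.strip()


def keep_largest_fragment(smiles):
    """Toy salt stripping: keep the longest dot-separated fragment.

    String length is only a rough stand-in for fragment size.
    """
    return max(smiles.split("."), key=len)


# Hypothetical version label, so results can be traced back to the exact
# workflow that produced them even if the workflow changes later.
WORKFLOW_VERSION = "0.1"
WORKFLOW = [
    ("strip_whitespace", strip_whitespace),
    ("keep_largest_fragment", keep_largest_fragment),
]


def process(smiles, workflow=WORKFLOW):
    """Apply each step in order and record which steps were applied."""
    current = smiles
    applied = []
    for name, step in workflow:
        current = step(current)
        applied.append(name)
    return {
        "input": smiles,
        "output": current,
        "workflow_version": WORKFLOW_VERSION,
        "steps": applied,
    }


result = process(" CC(=O)[O-].[Na+] ")  # sodium acetate with stray spaces
print(result["output"], result["steps"])
```

Keeping the version and the list of applied steps next to the output is
the point of the sketch: two services can then tell whether they started
from structures that were prepared in the same way.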
> Finally, still, in theory, mapping hashed InChIKeys for "identical" structures is possible, whatever identical means. 
Interesting, do you have a reference?
> It is "just" a question of semantics and proper ontologies ;-)
>   
> BTW, this brings up an interesting question: at which pH do you calculate TOX species? Are the calculations robust enough for different protomeric and tautomeric forms?
It depends, actually, because OpenTox involves toxicity prediction
algorithms from multiple implementations (partners) running on remote
sites, and each calculation service may behave differently.
> If not, multiple input structures, aka "identical" InChIKeys, should get used.
>   
I am afraid we don't yet have a clear decision on how to proceed in
this case, but fortunately the project is still running :)  The dataset
service supports multiple structures per compound; we would need to
flag structures properly and agree on how they are processed by
calculation services.
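
As a rough illustration of the flagging idea, the Python sketch below
keeps several flagged structures under one compound and lets a
calculation service pick the forms it accepts. The class names, the flag
vocabulary ("neutral", "zwitterion") and the select method are all
hypothetical, since no convention has been agreed yet.

```python
from dataclasses import dataclass, field


@dataclass
class Structure:
    smiles: str
    flag: str  # hypothetical label describing the form of the structure


@dataclass
class Compound:
    name: str
    structures: list = field(default_factory=list)

    def select(self, accepted_flags):
        """Return the structures whose flag is in `accepted_flags`."""
        return [s for s in self.structures if s.flag in accepted_flags]


# One compound, two protonation forms of the same molecule.
glycine = Compound(
    "glycine",
    [
        Structure("NCC(=O)O", "neutral"),
        Structure("[NH3+]CC(=O)[O-]", "zwitterion"),
    ],
)

# A service that only handles neutral input asks for that form alone.
for_neutral_only_service = glycine.select({"neutral"})
print([s.smiles for s in for_neutral_only_service])
```

A service that is robust to different forms could instead request every
flag and run its calculation once per structure.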

Best regards,
Nina
> Cheers, Joerg
>