[OTDev] scripts to extract toxicity data from echa site

Barry Hardy barry.hardy at douglasconnect.com
Tue Jun 7 17:32:25 CEST 2011


One example use case we would like to solve:
1) For endpoint X, extract a matrix of REACH data [chemical structures, 
endpoint classification/values] for compounds similar to "our compounds"
2) Combine with "our data" [chemical structures, endpoint 
classification/values]
3) Build and validate a model (as done today with ToxCreate, but with a 
general choice of algorithm, perhaps running in the cloud on a large cluster)
4) Use the model to give predictions with confidence/applicability domain 
for "our structure(s)" of interest (similar to what is done today with 
ToxPredict).
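
As a rough illustration only, the four steps above could be sketched in 
Python along the following lines. Everything in it is an assumption made 
for the sake of the example: the compounds are toy sets of made-up 
structural features, similarity is plain Tanimoto on those sets, the 
"model" is a simple nearest-neighbour vote, and the function names and 
thresholds are hypothetical. It does not use ToxCreate, ToxPredict or any 
real REACH data.

```python
from collections import Counter

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def select_similar(reach, ours, threshold):
    """Step 1: keep REACH records similar to at least one of our compounds."""
    return [(fp, label) for fp, label in reach
            if any(tanimoto(fp, ofp) >= threshold for ofp, _ in ours)]

def predict(training, query, k=3, domain_threshold=0.3):
    """Step 4: nearest-neighbour vote plus a crude applicability-domain flag."""
    ranked = sorted(training, key=lambda rec: tanimoto(rec[0], query),
                    reverse=True)
    neighbours = ranked[:k]
    votes = Counter(label for _, label in neighbours)
    label, count = votes.most_common(1)[0]
    confidence = count / len(neighbours)
    # In domain only if the closest training compound is similar enough.
    in_domain = tanimoto(neighbours[0][0], query) >= domain_threshold
    return label, confidence, in_domain

def leave_one_out_accuracy(training, k=3):
    """Step 3: validate by predicting each record from all the others."""
    hits = 0
    for i, (fp, label) in enumerate(training):
        rest = training[:i] + training[i + 1:]
        predicted, _, _ = predict(rest, fp, k)
        hits += predicted == label
    return hits / len(training)

# Toy stand-ins for extracted REACH data and for "our data" (all invented).
reach = [
    (frozenset({"nitro", "aromatic", "amine"}), "toxic"),
    (frozenset({"nitro", "aromatic"}), "toxic"),
    (frozenset({"hydroxyl", "alkyl"}), "non-toxic"),
    (frozenset({"hydroxyl", "alkyl", "ether"}), "non-toxic"),
    (frozenset({"phosphate", "sulfonate"}), "toxic"),  # unlike our compounds
]
ours = [
    (frozenset({"nitro", "aromatic", "halogen"}), "toxic"),
    (frozenset({"hydroxyl", "alkyl", "ester"}), "non-toxic"),
]

selected = select_similar(reach, ours, threshold=0.4)   # step 1
combined = selected + ours                              # step 2
accuracy = leave_one_out_accuracy(combined)             # step 3
result = predict(combined, frozenset({"nitro", "aromatic", "ketone"}))  # step 4
print(len(selected), accuracy, result)
```

Nothing is persisted here: the combined dataset and the model exist only 
in memory for the duration of the run.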

We would not necessarily need to persist any resource, such as a newly 
created public dataset.

Barry

Dear All:
It might be worthwhile for the developer community to write scripts to 
extract public REACH dossier toxicity data from the ECHA website and make 
it available in a form more suitable for scientific purposes, including 
building and improving models. It should also be done in a way 
that is legal.
What do you think?
Barry


