[OTDev] ToxCreate integration of Ambit classification datasets

Wed Mar 23 14:06:23 CET 2011

> On 23 March 2011 14:25, Christoph Helma <helma at in-silico.ch> wrote:
> 
> > Nina,
> >
> > >
> > > If you look at the RDF representation of /dataset/9, there is
> > > ot:acceptValue, which lists the possible values for the feature. This was
> > > agreed for API 1.1, is in opentox.owl, and is used by the TUM/NTUA
> > > services as far as I know.
> > >
> > > <http://apps.ideaconsult.net:8080/ambit2/feature/21573>
> > >       a       ot:Feature , ot:NumericFeature , ot:NominalFeature ;
> > >       dc:creator "http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html" ;
> > >       dc:title "Canc" ;
> > >       ot:acceptValue "3.0" , "1.0" ;
> > >       ot:hasSource <http://apps.ideaconsult.net:8080/ambit2/dataset/ISSCAN_v3a_1153_19Sept08.1222179139.sdf> ;
> > >       ot:units "" ;
> > >       =       otee:Carcinogenicity .
> > >
> > > I would suggest modifying your implementation to use ot:acceptValue
> > > instead of a regexp.
> >
> > Oops, I forgot about this one. Is ot:acceptValue ordered, i.e. can I
> > trust that the first value always indicates true/active and the second
> > one false/inactive?
> >
> 
> No, there is no guarantee on that - the order is arbitrary, and there could
> be more than two values. I could easily enforce an alphabetical order, if
> this would help.

No need for that, I can do that internally.
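
A minimal sketch of what "internally" could mean here, assuming the ot:acceptValue literals have already been read from the feature's RDF into a plain list. The function names are hypothetical and not part of any OpenTox library. Note that an alphabetical order only makes the class indices reproducible; it says nothing about which value means active.

```python
def ordered_accept_values(accept_values):
    """Return the distinct accept values in alphabetical order."""
    return sorted(set(accept_values))


def class_index(accept_values):
    """Map each accept value to a stable integer index (hypothetical helper)."""
    return {value: i for i, value in enumerate(ordered_accept_values(accept_values))}


# Values as they appear, in arbitrary order, in the feature quoted above.
values = ["3.0", "1.0"]
print(ordered_accept_values(values))
print(class_index(values))
```
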
> 
> I guess your question is related to how the results are presented to the
> user; otherwise, for the classification method, it should not matter. Correct?

Yes, it is mainly for presenting results to the user, although I prefer
to have the correct (binary) assignments already in the model.
> 
> If the concern is indeed the user interface, I think we should introduce
> some additional property to represent some kind of toxicological meaning,
> independent of ot:acceptValue. This would help the user interface show e.g.
> green/red flags (or more colors). In the context of Bioclipse there was a
> suggestion of an ot:isToxic property, but it needs a bit more thought to
> handle the generic case of >2 classes.

Presently I would be happy with ot:isToxic binary classifications. For
multiple classes we might also have to distinguish between ordered (e.g.
weak, moderate, strong) and unordered (e.g. "Carcinogen",
"SkinSensitizer", ...) classes.

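To make the distinction concrete, here is a rough sketch of how a client could keep the class labels, an ordered/unordered flag, and an isToxic-like annotation side by side. Everything in it is an assumption for illustration: ClassScheme is a hypothetical container, ot:isToxic is only a suggestion at this point, and the labels and toxic assignments are made up.

```python
from dataclasses import dataclass, field


@dataclass
class ClassScheme:
    """Hypothetical description of a nominal feature; not part of the OpenTox API."""
    values: list                              # class labels, e.g. from ot:acceptValue
    ordered: bool = False                     # True if the position in `values` is meaningful
    toxic: set = field(default_factory=set)   # labels carrying an assumed isToxic-like flag

    def is_toxic(self, label):
        if label not in self.values:
            raise ValueError(f"unknown class label: {label!r}")
        return label in self.toxic

    def rank(self, label):
        """Position of a label in an ordered scheme, None for unordered schemes."""
        if not self.ordered:
            return None
        return self.values.index(label)


# Illustrative schemes only; the toxic assignments are invented for the example.
binary = ClassScheme(values=["active", "inactive"], toxic={"active"})
potency = ClassScheme(values=["weak", "moderate", "strong"], ordered=True)
endpoints = ClassScheme(values=["Carcinogen", "SkinSensitizer"])

print(binary.is_toxic("active"), potency.rank("strong"), endpoints.rank("Carcinogen"))
```
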
Best regards,
Christoph