[OTDev] descriptor recalculation

Tobias Girschick tobias.girschick at in.tum.de
Fri Apr 30 14:31:58 CEST 2010


Hi All,

a lot of conversation around here! :-)

> See my last reply to Christoph, as well as the API description at
> http://opentox.org/dev/apis/api-1.1/Algorithm  - this has been developed
> jointly with TUM and included in their fminer implementation.

Just to avoid misunderstandings: our "fminer" is FTM or gSpan. Both can
be used with (ii) (see below).

> 
> Best regards,
> Nina
> Andreas Maunz wrote:
> > Christoph Helma wrote on 04/29/2010 11:33 PM:
> >>>> According to our API the model knows about ot.Algorithm and
> >>>> ot.IndependentVariables, but it would need to know the service to
> >>>> calculate independent variables.
> >>> It does actually - every feature (variable) has ot:hasSource, which
> >>> points to the service it has been generated from (e.g. descriptor
> >>> calculation one) - and this is what we use in ToxPredict.
> >>
> >> True, but that makes sense only for "simple" descriptor calculation
> >> algorithms (i.e. descriptors that are independent of the training
> >> activities, like phys-chem properties, substructures). If we use e.g.
> >> supervised graph mining techniques we need
> >>
> >> (i) an algorithm (model because it is algorithm applied to data?) that
> >> mines features in the training dataset and creates a feature dataset
> >> (e.g. fminer)
> >>
> >> (ii) a simple substructure matching algorithm that determines if the
> >> mined features are present in the compound to be predicted (e.g.
> >> OpenBabel Smarts matcher)
> >
> > We need this not just for crossvalidation, but also for single
> > predictions.
> > The feature set is not fixed, but depends on properties inherent to
> > the (training) data set.
> > However, once the features are calculated for the training dataset,
> > the matching service (ii) may be seen as an ordinary feature
> > calculation service:
> >
> > f_i(mol) = 1   if feature i occurs in mol,
> > f_i(mol) = 0   else.
> >
> > I.e. it takes the same role as any other feature calculation service.
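
To make the definition above concrete: for a fixed list of mined
features, the matching service just produces a 0/1 vector per compound.
A minimal sketch, assuming features and compounds are given as strings
and the actual substructure test is passed in as a function. The names
feature_vector, occurs_in and toy_occurs_in are made up for
illustration, and toy_occurs_in is only a substring test, not a real
SMARTS/substructure matcher.

```python
from typing import Callable, List, Sequence


def feature_vector(
    mol: str,
    features: Sequence[str],
    occurs_in: Callable[[str, str], bool],
) -> List[int]:
    """Return [f_1(mol), ..., f_n(mol)] with f_i(mol) = 1 if feature i occurs in mol, else 0."""
    return [1 if occurs_in(feature, mol) else 0 for feature in features]


def toy_occurs_in(feature: str, mol: str) -> bool:
    # ASSUMPTION: stand-in only. A plain substring test on the strings is
    # NOT a chemically correct substructure match; a real service would
    # call a proper matcher here.
    return feature in mol


if __name__ == "__main__":
    mined_features = ["CO", "N"]  # illustrative feature list
    print(feature_vector("CCO", mined_features, toy_occurs_in))
```

The point is only that, once the feature list is fixed, the caller sees
the same interface as for any other descriptor calculation.
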
> >
> >> My interpretation was, that ot:hasSource should point to the graph
> >> mining algorithm, but the model would need the substructure matcher for
> >> predictions. How should we handle this?
> >
> > Set ot:hasSource to service (ii)?

That is what we did, but we have not had the chance to test it
intensively yet. The service should be reachable at
http://opentox.informatik.tu-muenchen.de:8080/OpenTox/algorithm/FTM/{smile}
or
http://opentox.informatik.tu-muenchen.de:8080/OpenTox/algorithm/gSpan/{smile}
where {smile} is the UTF-8 encoded SMILES string of the substructure.
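
For clients, building such a URI could look roughly like this. It is a
sketch only: it assumes that "encoded" means the SMILES is
percent-encoded (as UTF-8) before it is put into the path, which is my
reading and has not been checked against the running service. The
helper name matcher_uri is made up.

```python
from urllib.parse import quote

BASE = "http://opentox.informatik.tu-muenchen.de:8080/OpenTox/algorithm"


def matcher_uri(algorithm: str, smiles: str) -> str:
    """Build the matcher URI for one substructure SMILES."""
    if algorithm not in ("FTM", "gSpan"):
        raise ValueError("algorithm must be 'FTM' or 'gSpan'")
    # ASSUMPTION: percent-encode the UTF-8 bytes of the SMILES.
    # safe="" also escapes "/", "#", "=", "(" and ")", which occur in
    # SMILES and would otherwise be misread as URI syntax.
    return f"{BASE}/{algorithm}/{quote(smiles, safe='')}"


if __name__ == "__main__":
    print(matcher_uri("FTM", "c1ccccc1C(=O)O"))
```

Without the escaping, a triple bond ("#") would be cut off as a URI
fragment and a stereo bond ("/") would be read as a path separator.
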

Maybe we should also expose the service under a separate URI and allow
the substructure SMILES to be passed as a parameter?
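
To illustrate what the parameter variant might look like from the
client side, here is a rough sketch. Nothing of this exists yet: the
parameter name "smiles", the helper name and the example service URI
are all hypothetical.

```python
from urllib.parse import urlencode


def matcher_uri_with_param(service_uri: str, smiles: str) -> str:
    """Append the substructure SMILES as a query parameter."""
    # ASSUMPTION: the parameter is called "smiles"; this is not agreed on.
    # urlencode percent-encodes the value, including "+" in charged atoms.
    return f"{service_uri}?{urlencode({'smiles': smiles})}"


if __name__ == "__main__":
    # Hypothetical service URI, for illustration only.
    print(matcher_uri_with_param("http://example.org/algorithm/match", "[NH4+]"))
```

The advantage would be that the SMILES no longer has to survive as a
path segment, and the same URI identifies the service for all features.
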

Regards,
Tobias

> >
> > Regards
> > Andreas
> >
> >
> 
> _______________________________________________
> Development mailing list
> Development at opentox.org
> http://www.opentox.org/mailman/listinfo/development


-- 
Dipl.-Bioinf. Tobias Girschick

Technische Universität München
Institut für Informatik
Lehrstuhl I12 - Bioinformatik
Boltzmannstr. 3
85748 Garching b. München, Germany

Room: MI 01.09.042
Phone: +49 (89) 289-18002
Email: tobias.girschick at in.tum.de
Web: http://wwwkramer.in.tum.de/girschick



