[OTDev] NTUA WebServices

Christoph Helma helma at in-silico.ch
Mon Aug 23 20:08:12 CEST 2010


Excerpts from Nina Jeliazkova's message of Mon Aug 23 19:36:04 +0200 2010:
> Christoph,
> 
> Thinking again, it seems to me there is a confusion here between feature
> generation and feature selection.

True. I had already written a lengthy reply to your last email only to
come to the same conclusion.

> The first is independent of the learning
> model and its results can be cached safely, while the second is indeed not
> and is, of course, model/dataset-specific. The confusion arises because in
> BBRC there is apparently no clear split between the phases of feature
> generation and feature selection.

This is of course its major advantage (supervised feature mining). My
initial implementations separated both steps, but this works only for
simple (linear) substructures.
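The difference between the two kinds of features can be sketched as follows. This is a toy illustration under stated assumptions, not the BBRC or fminer algorithm: substring tests on SMILES strings stand in for unsupervised feature generation, and a simple frequency-difference filter stands in for supervised selection. `FRAGMENTS`, `generate_features` and `select_features` are hypothetical names.

```python
from functools import lru_cache

# Hypothetical fragment vocabulary; a real generator would enumerate or mine substructures.
FRAGMENTS = ("C", "O", "N", "c1ccccc1", "C=O")


@lru_cache(maxsize=None)
def generate_features(algorithm: str, smiles: str) -> tuple:
    """Unsupervised generation: the result depends only on (algorithm, compound),
    so it can be cached once and reused by every model."""
    # Toy descriptor: 1 if the fragment occurs as a substring of the SMILES string.
    # `algorithm` only serves as part of the cache key in this sketch.
    return tuple(int(fragment in smiles) for fragment in FRAGMENTS)


def select_features(algorithm: str, training: dict, min_diff: float = 0.5) -> list:
    """Supervised selection: keeps fragments whose frequency differs between actives
    and inactives, so the result is specific to one training dataset."""
    actives = [s for s, is_active in training.items() if is_active]
    inactives = [s for s, is_active in training.items() if not is_active]

    def frequency(compounds: list, index: int) -> float:
        # Fraction of compounds in which fragment `index` occurs.
        if not compounds:
            return 0.0
        return sum(generate_features(algorithm, s)[index] for s in compounds) / len(compounds)

    return [
        fragment
        for i, fragment in enumerate(FRAGMENTS)
        if abs(frequency(actives, i) - frequency(inactives, i)) >= min_diff
    ]
```

Two different training datasets yield two different selected feature sets, while `generate_features` is computed once per compound and then served from the cache. A supervised miner such as BBRC effectively fuses both functions, which is why its output falls into the dataset-specific category.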

> What I was trying to say in previous emails is that generated features
> should be referred to by dereferenceable feature URIs, the content of these
> URIs having an ot:Feature representation that points back to the generating
> algorithm via the ot:hasSource property. The features are then selected by a
> feature selection procedure; this information goes into the model (via the
> ot:independent parameter) and is later used by clients to figure out which
> algorithms to use for calculating descriptors.

This is how it works in our implementation, with the exception that the
model service (not the client) figures out how to calculate descriptors.
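The linkage described in the two paragraphs above can be sketched with plain dictionaries standing in for dereferenced RDF resources. Everything here is hypothetical: the example.org URIs, the `resolve` helper, the toy algorithms, and the property name `ot:independentVariables` (the thread only says "ot:independent"). Only `ot:Feature` and `ot:hasSource` come from the message itself. Feature 3 is produced by the same algorithm as feature 2 but was not selected, so it is never calculated.

```python
from collections import defaultdict

BASE = "http://example.org"  # hypothetical service root

# Hypothetical in-memory "web": URI -> the representation a GET on that URI would return.
RESOURCES = {
    f"{BASE}/feature/1": {"rdf:type": "ot:Feature", "ot:hasSource": f"{BASE}/algorithm/smiles-length"},
    f"{BASE}/feature/2": {"rdf:type": "ot:Feature", "ot:hasSource": f"{BASE}/algorithm/oxygen-count"},
    f"{BASE}/feature/3": {"rdf:type": "ot:Feature", "ot:hasSource": f"{BASE}/algorithm/oxygen-count"},
    f"{BASE}/model/1": {
        "rdf:type": "ot:Model",
        # Only the features kept by feature selection are listed here (assumed property name).
        "ot:independentVariables": [f"{BASE}/feature/1", f"{BASE}/feature/2"],
    },
}

# Hypothetical toy algorithms: each computes values for the feature URIs it is asked for.
ALGORITHMS = {
    f"{BASE}/algorithm/smiles-length": lambda smiles, feats: {u: float(len(smiles)) for u in feats},
    f"{BASE}/algorithm/oxygen-count": lambda smiles, feats: {u: float(smiles.count("O")) for u in feats},
}


def resolve(uri: str) -> dict:
    """Stand-in for dereferencing a URI (an HTTP GET plus RDF parsing in a real service)."""
    return RESOURCES[uri]


def calculation_plan(model_uri: str) -> dict:
    """Group the model's independent features by the algorithm named in ot:hasSource."""
    plan = defaultdict(list)
    for feature_uri in resolve(model_uri)["ot:independentVariables"]:
        plan[resolve(feature_uri)["ot:hasSource"]].append(feature_uri)
    return dict(plan)


def calculate_descriptors(model_uri: str, smiles: str) -> dict:
    """Call each required algorithm once and merge the resulting feature values."""
    values = {}
    for algorithm_uri, feature_uris in calculation_plan(model_uri).items():
        values.update(ALGORITHMS[algorithm_uri](smiles, feature_uris))
    return values
```

Because `calculate_descriptors` needs nothing but the model URI and a compound, the same lookup could run in a client or, as in the implementation described above, inside the model service.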

I will make feature URIs dereferenceable in one of the next versions
(this should be straightforward; the current feature URIs are a temporary
hack). I might come back to you to ask how to represent features that
depend on training datasets.

> This allows model services to be completely independent of descriptor
> calculation services, but nevertheless linked and transparent, and we do
> have such services.

Same here: the lazar model service does not depend on fminer; fminer is
just the default setting. In principle you can use any descriptor
calculation service, and since last Friday also services that provide
quantitative properties (e.g. phys/chem properties or a dataset service
with measured biological data (Toxcast ...)).
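This decoupling can be sketched as a model that depends only on a descriptor-service interface. Assumptions, all hypothetical: a `DescriptorService` is any callable from a SMILES string to named numeric values, `substructure_service` is a toy stand-in for a substructure service (not fminer), `make_dataset_service` wraps a caller-supplied table of measured values, and the 1-nearest-neighbour classifier is a toy, not lazar's prediction algorithm.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a descriptor service maps a compound (SMILES) to named numeric values.
DescriptorService = Callable[[str], Dict[str, float]]


def substructure_service(smiles: str) -> Dict[str, float]:
    """Toy stand-in for a substructure service: binary fragment occurrences."""
    return {f"has_{frag}": float(frag in smiles) for frag in ("O", "N", "c1ccccc1")}


def make_dataset_service(table: Dict[str, Dict[str, float]]) -> DescriptorService:
    """Toy stand-in for a dataset service returning stored quantitative values."""
    return lambda smiles: table.get(smiles, {})


class NearestNeighbourModel:
    """Toy 1-NN classifier; it relies only on the DescriptorService interface."""

    def __init__(self, training: List[Tuple[str, str]],
                 descriptor_service: DescriptorService = substructure_service):
        # Substructure descriptors are only the default; any service can be passed in.
        self.describe = descriptor_service
        self.training = [(self.describe(s), label) for s, label in training]

    @staticmethod
    def _distance(a: Dict[str, float], b: Dict[str, float]) -> float:
        # Euclidean distance; descriptors missing on one side count as 0.
        keys = set(a) | set(b)
        return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))

    def predict(self, smiles: str) -> str:
        query = self.describe(smiles)
        _, label = min(self.training, key=lambda item: self._distance(query, item[0]))
        return label
```

The model code is unchanged whether it receives binary substructure features or quantitative values; only the service passed to the constructor differs.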

Best regards,
Christoph


