[OTDev] NTUA WebServices

Mon Aug 23 14:50:51 CEST 2010

On Mon, Aug 23, 2010 at 1:27 PM, Nina Jeliazkova
<jeliazkova.nina at gmail.com> wrote:

> Christoph,
>
> On Mon, Aug 23, 2010 at 12:49 PM, Christoph Helma <helma at in-silico.ch>
> wrote:
>
> > Excerpts from Nina Jeliazkova's message of Fri Aug 20 23:07:22 +0200 2010:
> >
> > > My fault for not being clear - the superservice will not build a
> > > model; it can only apply a model. To build a model, just POST the
> > > dataset and prediction feature to the algorithm URI directly.
> >
> > OK, let's see if I understand correctly:
> >
> > To create a prediction model from scratch I would have to
> >
> > - create a dataset with structures and activities
> > - calculate (and optionally select) descriptors using one of the feature
> >  calculation (selection) algorithms
> > - apply one of the modelling algorithms to create a prediction model
> >
> > To make predictions I would use the superservice:
> >
> > - create a dataset with structures to be predicted
> > - submit the prediction dataset and the model to the superservice to
> >  obtain a dataset with the predictions
> >
> > Is this correct?
> >
> >
> Yes.
>
>
> > To simplify this procedure we are using for our services the following
> > convenience methods:
> >
> > Model creation:
> >
> >  curl -X POST -d dataset_uri={dataset_uri} -d feature_uri={feature_uri} \
> >    -d feature_generation_uri={feature_generation_uri} {model_algorithm_uri}
> >  returns task URI for the prediction model; feature_uri specifies the
> >  dependent variable
> >  - calls feature_generation_algorithm for dataset
> >  - creates prediction model from calculated descriptors and training
> >    activities (in dataset)
> >
> >
>
> This looks like "superservice" for model creation.
>
> 1) -d dataset_uri parameter is fine
> 2) -d feature_uri parameter is not documented and is not used by any of the
> IDEA, TUM or NTUA partners, nor (AFAIK) does it appear in the API
> documentation. Instead, the features inherent to the specified dataset are
> used. This allows for thousands of features.
> 3) According to the API, the dependent variable should be passed in the
> prediction_feature={featureuris} parameter, not feature_uri (see the wiki
> page for models).
>
> 4) feature_generation_uri is not specified anywhere in the API. @ALL,
> please give your opinions.
>
> Such a parameter essentially makes every model a "superservice" that
> should also be able to take care of descriptor calculation. From the point
> of view of modularity and task encapsulation, I am not sure this is a good
> idea. However, it could be very useful to have a "superservice" for model
> creation that accepts such parameters.
>
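
To make the parameter difference above concrete, here is a minimal sketch that
prints the two model-creation calls side by side: the documented form
(dataset_uri plus prediction_feature, with descriptors assumed to be already
present in the dataset) and the convenience form (feature_uri plus
feature_generation_uri). All example.org URIs and the function names are
hypothetical placeholders, and the calls are only printed, never sent.

```shell
#!/bin/sh
# Hypothetical placeholder URIs (assumptions, not taken from the thread).
DATASET_URI="http://example.org/dataset/1"
FEATURE_URI="http://example.org/feature/activity"
ALGORITHM_URI="http://example.org/algorithm/mlr"
FEATURE_GENERATION_URI="http://example.org/algorithm/descriptors"

# Documented form: the dependent variable goes in prediction_feature;
# the descriptors are whatever features the dataset already contains.
documented_call() {
  printf 'curl -X POST -d dataset_uri=%s -d prediction_feature=%s %s\n' \
    "$DATASET_URI" "$FEATURE_URI" "$ALGORITHM_URI"
}

# Convenience form from the thread: the dependent variable goes in
# feature_uri, and the model algorithm triggers descriptor calculation
# itself via feature_generation_uri.
convenience_call() {
  printf 'curl -X POST -d dataset_uri=%s -d feature_uri=%s -d feature_generation_uri=%s %s\n' \
    "$DATASET_URI" "$FEATURE_URI" "$FEATURE_GENERATION_URI" "$ALGORITHM_URI"
}

# Dry run: show both command lines.
documented_call
convenience_call
```

The only differences are the name of the dependent-variable parameter and the
extra feature_generation_uri, which is what moves descriptor calculation into
the model algorithm.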

Hello Nina, Christoph, All,

I think we had that discussion a while ago (see e.g.
http://www.opentox.org/pipermail/development/2010/000653.html).
I like the idea of models and algorithms being able to handle datasets
without features (-> Christoph's proposal). But as far as I remember, we
decided to use supermodels.
Therefore, I would vote for using supermodels (and extending the
supermodel functionality to build models).

Best regards,
Martin

>
>
> > I think this schema is rather generic, as it allows arbitrary modelling
> > algorithms to be combined with any supervised or unsupervised feature
> > generation algorithms. Additional parameters for modelling/feature
> > generation algorithms will be forwarded to these services.
> >
> >
> 5) There are also additional parameters that are _documented_ and
> implemented by IDEA, TUM and NTUA, namely "dataset_service", which sets the
> dataset service where the results (of prediction and descriptor
> calculation) should be stored.
>
>
> > Predictions:
> >
> > Predict a dataset (seems to be similar to superservice, but is included
> > in the model service)
> >
> >  curl -X POST -d dataset_uri={dataset_uri} {model_uri}
> >  returns task URI for prediction dataset
> >  - calls feature_generation_algorithm for dataset
> >  - uses model to create a prediction dataset
> >
> > Predict a compound (convenience method without storing a dataset)
> >
> >  curl -X POST -d compound_uri={compound_uri} {model_uri}
> >  returns prediction as rdf/xml or yaml
> >  - calls feature_generation_algorithm for compound
> >  - uses model to create a prediction for compound
> >
> > Do you think we should unify? I would like to keep our methods, because
> > I find them intuitive and handy, but can of course provide a
> > superservice-like interface.
> >
>
> I would like to keep things simple and not introduce descriptor calculation
> facilities into models that are not aware of them.
>
> We do have a documented API to comply with ... of course it could be
> modified.
>
> @ALL - please let us know your opinions.
>
> Best regards,
> Nina
>
> >
> > Best regards,
> > Christoph
> > _______________________________________________
> > Development mailing list
> > Development at opentox.org
> > http://www.opentox.org/mailman/listinfo/development
> >
>
>
>
> --
>
> Dr. Nina Jeliazkova
> Technical Manager
> 4 A.Kanchev str.
> IdeaConsult Ltd.
> 1000 Sofia, Bulgaria
> Phone: +359 886 802011
>
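
For reference, a minimal sketch of the two prediction calls discussed in the
quoted text, with the documented dataset_service parameter as an optional
extra. The example.org URIs and the function names are hypothetical
placeholders, and whether a particular model service honours dataset_service
is an assumption to check against that service. The calls are printed only.

```shell
#!/bin/sh
# Hypothetical placeholder URI (an assumption, not taken from the thread).
MODEL_URI="http://example.org/model/1"

# Predict a whole dataset; per the thread this returns a task URI.
# $1 = dataset URI, $2 = optional dataset service where results are stored.
predict_dataset_call() {
  if [ -n "${2:-}" ]; then
    printf 'curl -X POST -d dataset_uri=%s -d dataset_service=%s %s\n' \
      "$1" "$2" "$MODEL_URI"
  else
    printf 'curl -X POST -d dataset_uri=%s %s\n' "$1" "$MODEL_URI"
  fi
}

# Predict a single compound without storing a dataset.
# $1 = compound URI.
predict_compound_call() {
  printf 'curl -X POST -d compound_uri=%s %s\n' "$1" "$MODEL_URI"
}

# Dry run: show the command lines.
predict_dataset_call "http://example.org/dataset/2"
predict_dataset_call "http://example.org/dataset/2" "http://example.org/dataset"
predict_compound_call "http://example.org/compound/1"
```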

-- 
Dipl-Inf. Martin Gütlein
Phone:
+49 (0)761 203 8442 (office)
+49 (0)177 623 9499 (mobile)
Email:
guetlein at informatik.uni-freiburg.de