[OTDev] NTUA WebServices

Nina Jeliazkova jeliazkova.nina at gmail.com
Mon Aug 23 13:27:34 CEST 2010


Christoph,

On Mon, Aug 23, 2010 at 12:49 PM, Christoph Helma <helma at in-silico.ch> wrote:

> Excerpts from Nina Jeliazkova's message of Fri Aug 20 23:07:22 +0200 2010:
>
> > My fault for not being clear: the superservice will not build a
> > model; it can only apply a model. To build a model, just POST the
> > dataset and prediction feature to the algorithm URI directly.
>
> OK, let's see if I understand correctly:
>
> To create a prediction model from scratch I would have to
>
> - create a dataset with structures and activities
> - calculate (and optionally select) descriptors using one of the feature
>  calculation (selection) algorithms
> - apply one of the modelling algorithms to create a prediction model
>
> To make predictions I would use the superservice:
>
> - create a dataset with structures to be predicted
> - submit the prediction dataset and the model to the superservice to
>  obtain a dataset with the predictions
>
> Is this correct?
>
>
Yes.
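The two workflows confirmed above can be sketched in Python. This is a minimal sketch, not any partner's implementation: `post` is a hypothetical helper that sends a form-encoded POST and returns the URI of the result (real services return a task URI that has to be polled first), the descriptor step is assumed to return a dataset URI that carries the descriptors, and `model_uri` as the superservice parameter name is an assumption. The names dataset_uri and prediction_feature are the ones used in this thread.

```python
from typing import Callable, Dict

# Hypothetical helper: sends a form-encoded POST and returns the URI of the
# resulting resource. Real OpenTox services return a task URI that must be
# polled; that step is deliberately left out of this sketch.
Post = Callable[[str, Dict[str, str]], str]


def build_model(post: Post, dataset_uri: str, descriptor_algorithm_uri: str,
                model_algorithm_uri: str, prediction_feature: str) -> str:
    """Create a prediction model from a dataset with structures and activities."""
    # Calculate descriptors for the training dataset; the result is assumed
    # to be a dataset URI that also carries the calculated descriptors.
    descriptor_dataset = post(descriptor_algorithm_uri,
                              {"dataset_uri": dataset_uri})
    # Apply the modelling algorithm directly: the dependent variable goes in
    # prediction_feature, all other features come from the dataset itself.
    return post(model_algorithm_uri,
                {"dataset_uri": descriptor_dataset,
                 "prediction_feature": prediction_feature})


def predict(post: Post, superservice_uri: str, dataset_uri: str,
            model_uri: str) -> str:
    """Submit a prediction dataset and a model to the superservice."""
    # "model_uri" is an assumed parameter name for the superservice.
    return post(superservice_uri,
                {"dataset_uri": dataset_uri, "model_uri": model_uri})
```

Injecting `post` keeps the sketch free of network code, so it can be exercised with a stub that records the calls instead of contacting real services.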


> To simplify this procedure, we use the following convenience methods for
> our services:
>
> Model creation:
>
>  curl -X POST -d dataset_uri={dataset_uri} -d feature_uri={feature_uri} \
>       -d feature_generation_uri={feature_generation_uri} {model_algorithm_uri}
>  returns the task URI for the prediction model; feature_uri specifies the
>  dependent variable
>  - calls feature_generation_algorithm for dataset
>  - creates prediction model from calculated descriptors and training
>    activities (in dataset)
>
>

This looks like a "superservice" for model creation.

1) The -d dataset_uri parameter is fine.
2) The -d feature_uri parameter is not (AFAIK) in the API documentation and is
not used by any of the IDEA, TUM or NTUA partners.
Instead, the features used are those inherent to the specified dataset.
This allows a model to use thousands of features.
3) According to the API, the dependent variable should be passed in the
prediction_feature={featureuris} parameter, not in feature_uri (see the wiki
page for models).

4) feature_generation_uri is not specified anywhere in the API. @ALL, please
share your opinions.

Such a parameter essentially turns every model into a "superservice", which
would also have to take care of descriptor calculations. From the point of
view of modularity and task encapsulation, I am not sure this is a good idea.
However, it could be very useful to have a dedicated "superservice" for model
creation that accepts such parameters.


> I think this scheme is rather generic, as it allows arbitrary modelling
> algorithms to be combined with any supervised or unsupervised feature
> generation algorithms. Additional parameters for the modelling and feature
> generation algorithms will be forwarded to these services.
>
>
5) There are also additional parameters, _documented_ and implemented by IDEA,
TUM and NTUA, namely "dataset_service", which sets the dataset service where
the results (of prediction and descriptor calculation) should be stored.
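Points 2 to 5 amount to translating the convenience parameters into documented ones. A minimal sketch, assuming the documented set is just the three parameters named in this thread (the full API may define more): feature_uri is renamed to prediction_feature, and anything else, such as feature_generation_uri, is reported rather than forwarded.

```python
from typing import Dict, List, Tuple

# Parameters named in this thread as documented in the API. Assumption:
# the real API may document additional ones.
DOCUMENTED = {"dataset_uri", "prediction_feature", "dataset_service"}

# Convenience parameter names mapped onto their documented equivalents.
RENAMED = {"feature_uri": "prediction_feature"}


def to_documented(params: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Split request parameters into documented ones and undocumented leftovers."""
    documented: Dict[str, str] = {}
    undocumented: List[str] = []
    for name, value in params.items():
        name = RENAMED.get(name, name)  # e.g. feature_uri -> prediction_feature
        if name in DOCUMENTED:
            documented[name] = value
        else:
            undocumented.append(name)  # e.g. feature_generation_uri
    return documented, sorted(undocumented)
```

Reporting feature_generation_uri instead of acting on it keeps a model service free of descriptor calculation, in line with point 4.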


> Predictions:
>
> Predict a dataset (seems to be similar to the superservice, but is included
> in the model service)
>
>  curl -X POST -d dataset_uri={dataset_uri} {model_uri}
>  returns task URI for prediction dataset
>  - calls feature_generation_algorithm for dataset
>  - uses model to create a prediction dataset
>
> Predict a compound (convenience method without storing a dataset)
>
>  curl -X POST -d compound_uri={compound_uri} {model_uri}
>  returns prediction as rdf/xml or yaml
>  - calls feature_generation_algorithm for compound
>  - uses model to create a prediction for compound
>
> Do you think we should unify these? I would like to keep our methods,
> because I find them intuitive and handy, but I can of course provide a
> superservice-like interface.
>
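The two prediction calls above differ mainly in what comes back: a task URI for a dataset, the prediction itself for a single compound. A minimal sketch that only builds the requests with Python's urllib and does not send them; the example URIs, the `text/uri-list` media type for the task URI and `application/x-yaml` for YAML are assumptions, not confirmed by this thread.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Media types for the two response formats mentioned above; the exact
# strings a given service accepts are an assumption.
MEDIA_TYPES = {"rdf": "application/rdf+xml", "yaml": "application/x-yaml"}
FORM = "application/x-www-form-urlencoded"


def predict_dataset_request(model_uri: str, dataset_uri: str) -> Request:
    """POST a dataset to a model; the service is expected to answer with a task URI."""
    body = urlencode({"dataset_uri": dataset_uri}).encode("ascii")
    return Request(model_uri, data=body, method="POST",
                   headers={"Content-Type": FORM, "Accept": "text/uri-list"})


def predict_compound_request(model_uri: str, compound_uri: str,
                             fmt: str = "rdf") -> Request:
    """POST a single compound; the prediction itself comes back in the body."""
    body = urlencode({"compound_uri": compound_uri}).encode("ascii")
    return Request(model_uri, data=body, method="POST",
                   headers={"Content-Type": FORM, "Accept": MEDIA_TYPES[fmt]})
```

Passing a returned request to `urllib.request.urlopen` would actually send it; that is left out so the sketch runs without any service.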

I would like to keep things simple and not introduce descriptor calculation
facilities into models that are not aware of them.

We do have a documented API to comply with; of course, it could be
modified.
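Keeping models unaware of descriptor calculation means the composition has to live in a separate superservice. A minimal sketch of that idea, assuming a hypothetical `post` helper that returns the URI of each result (real services return a task URI to poll), and assuming the superservice already knows which descriptor algorithm a given model needs; how it would learn that is not settled in this thread.

```python
from typing import Callable, Dict, Optional

# Hypothetical helper: sends a form-encoded POST and returns the URI of the
# result. Task polling is omitted from this sketch.
Post = Callable[[str, Dict[str, str]], str]


def superservice_predict(post: Post, model_uri: str, dataset_uri: str,
                         descriptor_algorithm_uri: Optional[str] = None,
                         dataset_service: Optional[str] = None) -> str:
    """Calculate descriptors if needed, then apply the model.

    The model itself only receives a dataset that already holds its
    descriptors, so it needs no feature generation parameters.
    """
    # Forward dataset_service (where results are stored) to every step.
    extra = {"dataset_service": dataset_service} if dataset_service else {}
    if descriptor_algorithm_uri is not None:
        dataset_uri = post(descriptor_algorithm_uri,
                           {"dataset_uri": dataset_uri, **extra})
    return post(model_uri, {"dataset_uri": dataset_uri, **extra})
```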

@ALL, please let us know your opinions.

Best regards,
Nina

>
> Best regards,
> Christoph
> _______________________________________________
> Development mailing list
> Development at opentox.org
> http://www.opentox.org/mailman/listinfo/development
>



-- 

Dr. Nina Jeliazkova
Technical Manager
4 A.Kanchev str.
IdeaConsult Ltd.
1000 Sofia, Bulgaria
Phone: +359 886 802011


