[OTDev] NTUA WebServices
Nina Jeliazkova jeliazkova.nina at gmail.com
Mon Aug 23 15:17:00 CEST 2010
On Mon, Aug 23, 2010 at 3:50 PM, Martin Guetlein <martin.guetlein at googlemail.com> wrote:

> On Mon, Aug 23, 2010 at 1:27 PM, Nina Jeliazkova <jeliazkova.nina at gmail.com> wrote:
>
> > Christoph,
> >
> > On Mon, Aug 23, 2010 at 12:49 PM, Christoph Helma <helma at in-silico.ch> wrote:
> >
> > > Excerpts from Nina Jeliazkova's message of Fri Aug 20 23:07:22 +0200 2010:
> > >
> > > > My fault for not being clear - the superservice will not build a
> > > > model, it could only apply a model. To build a model, just POST the
> > > > dataset and prediction feature to the algorithm URI directly.
> > >
> > > OK, let's see if I understand correctly:
> > >
> > > To create a prediction model from scratch I would have to
> > >
> > > - create a dataset with structures and activities
> > > - calculate (and possibly select) descriptors using one of the feature
> > >   calculation (selection) algorithms
> > > - apply one of the modelling algorithms to create a prediction model
> > >
> > > To make predictions I would use the superservice:
> > >
> > > - create a dataset with structures to be predicted
> > > - submit the prediction dataset and the model to the superservice to
> > >   obtain a dataset with the predictions
> > >
> > > Is this correct?
> >
> > Yes.
> >
> > > To simplify this procedure we are using the following convenience
> > > methods for our services:
> > >
> > > Model creation:
> > >
> > > curl -X POST -d dataset_uri={dataset_uri} -d feature_uri={feature_uri} \
> > >   -d feature_generation_uri={feature_generation_uri} {model_algorithm_uri}
> > >
> > > returns a task URI for the prediction model; feature_uri specifies the
> > > dependent variable
> > > - calls feature_generation_algorithm for the dataset
> > > - creates a prediction model from the calculated descriptors and the
> > >   training activities (in the dataset)
> >
> > This looks like a "superservice" for model creation.
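For readers less used to curl: each -d name=value flag in the model-creation call above becomes one field of an application/x-www-form-urlencoded POST body. The Python sketch below builds (but does not send) the equivalent request using only the standard library. All URIs are hypothetical placeholders, and the helper name form_post is made up for illustration; whether a given OpenTox service accepts this exact parameter set is precisely what the thread is debating.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def form_post(target_uri: str, params: dict[str, str]) -> Request:
    """Build (without sending) a form-encoded POST, as `curl -d k=v ...` would.

    urlencode percent-encodes the values (curl -d sends them as given);
    a form decoder on the server recovers the same strings either way.
    """
    body = urlencode(params).encode("ascii")
    return Request(
        target_uri,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical URIs standing in for {model_algorithm_uri}, {dataset_uri}, etc.
req = form_post(
    "http://example.org/algorithm/model",
    {
        "dataset_uri": "http://example.org/dataset/1",
        "feature_uri": "http://example.org/feature/activity",
        "feature_generation_uri": "http://example.org/algorithm/descriptors",
    },
)

# Decode the body again to inspect what would be sent.
sent = parse_qs(req.data.decode("ascii"))
# Sending would be urllib.request.urlopen(req); per the thread, the service
# answers with a task URI that has to be polled (not shown here).
```

Building the request separately from sending it keeps the sketch runnable offline and makes the parameter names easy to inspect.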
> >
> > 1) The -d dataset_uri parameter is fine.
> > 2) The -d feature_uri parameter is not documented and is not used by any of
> > the IDEA, TUM or NTUA partners, nor (AFAIK) in the API documentation.
> > Instead, what is used are the features inherent to the specified dataset.
> > This allows having thousands of features.
> > 3) The dependent variable, according to the API, should be given in the
> > prediction_feature={featureuris} parameter, not feature_uri (see the wiki
> > page for models).
> >
> > 4) feature_generation_uri is not specified anywhere in the API. @ALL, please
> > tell your opinions.
> >
> > Such a parameter essentially makes every model a "superservice", which
> > would have to take care of descriptor calculations as well. From the point
> > of view of modularity and task encapsulation I am not sure this is a good
> > idea. However, it could be very useful to have a "superservice" for model
> > creation, which could take such parameters.
> >
>
> Hello Nina, Christoph, All,
>
> I think we had that discussion a while ago (see e.g.
> http://www.opentox.org/pipermail/development/2010/000653.html).
>

Indeed.

> I like the idea of models and algorithms being able to handle datasets
> without features (-> Christoph's proposal).

There are several disadvantages to calculating features on the fly:

- It is not practical for anything but the simplest features. For example, the
  TUM implementation of CDK descriptors can run for hours on a moderately sized
  dataset (at least when we tested before the Berlin meeting). The only
  reasonable way to overcome this is to store the calculated results and reuse
  them when requested. This is what we do now.
- One of the most important advantages of having a linked RDF representation
  is being able to provide links between data "columns" and the procedure that
  was used to generate that data.
  There is currently much discussion of this at the ACS RDF session in Boston
  (see http://egonw.github.com/acsrdf2010/). OpenTox already has working
  support for this via the ot:hasSource predicate of features (this is how the
  TUM, NTUA and IDEA calculations work, and ToxPredict makes use of it). If one
  does not use dereferenceable features for descriptors/fragments and
  calculates everything on the fly, this information is essentially lost.

Therefore I would ask the IST/ALU descriptor calculation and model services to
use a feature service (their own or an existing one). This will also solve the
problem Andreas Maunz mentioned in Oxford, namely the need to generate
fragments on each cross-validation run. This is easily solved by creating one
feature per fragment, which effectively allows caching any substructure; this
is how TUM fminer works.

> But as far as I remember we
> decided to use supermodels.
>

Yes.

> Therefore, I would vote for using supermodels (and extend the
> supermodel functionality to build models).
>

It looks like two different superservices: one for creating models and one
for prediction (currently existing). Any other thoughts?

IMHO, the superservices should live as close as possible to the dataset
services, to avoid unnecessary data transfer.

Best regards,
Nina

> Best regards,
> Martin
>
> > > I think this schema is rather generic as it allows to combine arbitrary
> > > modelling algorithms with any supervised and unsupervised feature
> > > generation algorithms. Additional parameters for modelling/feature
> > > generation algorithms will be forwarded to these services.
> >
> > 5) There are also additional parameters, _documented_ and implemented by
> > IDEA, TUM and NTUA, namely "dataset_service", which sets the dataset
> > service where the results (of prediction and descriptor calculation)
> > should be stored.
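Points 2) to 5) amount to a mapping from the convenience parameters onto the names the thread describes as documented. A minimal sketch, assuming that only the three names mentioned in this thread (dataset_uri, prediction_feature, dataset_service) count as documented for model building; the real API may define others. The helper to_documented_params is hypothetical.

```python
# Assumption: "documented" here means only the names mentioned in this thread,
# not the full OpenTox API parameter list.
DOCUMENTED_MODEL_PARAMS = {"dataset_uri", "prediction_feature", "dataset_service"}


def to_documented_params(params: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Hypothetical helper: split convenience parameters into (documented, undocumented).

    feature_uri is renamed to prediction_feature, which point 3) gives as the
    documented name for the dependent variable. Anything else outside
    DOCUMENTED_MODEL_PARAMS (e.g. feature_generation_uri, point 4) is returned
    separately so the caller can decide how to handle it.
    """
    renamed = dict(params)
    if "feature_uri" in renamed:
        renamed["prediction_feature"] = renamed.pop("feature_uri")
    documented = {k: v for k, v in renamed.items() if k in DOCUMENTED_MODEL_PARAMS}
    undocumented = {k: v for k, v in renamed.items() if k not in DOCUMENTED_MODEL_PARAMS}
    return documented, undocumented


# Usage with hypothetical URIs, mirroring the convenience call from the thread.
documented, undocumented = to_documented_params({
    "dataset_uri": "http://example.org/dataset/1",
    "feature_uri": "http://example.org/feature/activity",
    "feature_generation_uri": "http://example.org/algorithm/descriptors",
})
```

Returning the undocumented parameters instead of silently dropping them leaves the open question from point 4) (whether a model-creation superservice should accept them) to the caller.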
> >
> > > Predictions:
> > >
> > > Predict a dataset (seems to be similar to the superservice, but is
> > > included in the model service)
> > >
> > > curl -X POST -d dataset_uri={dataset_uri} {model_uri}
> > > returns a task URI for the prediction dataset
> > > - calls feature_generation_algorithm for the dataset
> > > - uses the model to create a prediction dataset
> > >
> > > Predict a compound (convenience method without storing a dataset)
> > >
> > > curl -X POST -d compound_uri={compound_uri} {model_uri}
> > > returns the prediction as RDF/XML or YAML
> > > - calls feature_generation_algorithm for the compound
> > > - uses the model to create a prediction for the compound
> > >
> > > Do you think we should unify? I would like to keep our methods, because
> > > I find them intuitive and handy, but can of course provide a
> > > superservice-like interface.
> >
> > I would like to keep things simple and not introduce descriptor calculation
> > facilities into models that are not aware of them.
> >
> > We do have a documented API to comply with ... of course it could be
> > modified.
> >
> > @ALL - please let us know your opinions.
> >
> > Best regards,
> > Nina
> >
> > > Best regards,
> > > Christoph
> > > _______________________________________________
> > > Development mailing list
> > > Development at opentox.org
> > > http://www.opentox.org/mailman/listinfo/development
> >
> > --
> >
> > Dr. Nina Jeliazkova
> > Technical Manager
> > 4 A.Kanchev str.
> > IdeaConsult Ltd.
> > 1000 Sofia, Bulgaria
> > Phone: +359 886 802011
>
> --
> Dipl-Inf. Martin Gütlein
> Phone:
> +49 (0)761 203 8442 (office)
> +49 (0)177 623 9499 (mobile)
> Email:
> guetlein at informatik.uni-freiburg.de

--
Dr. Nina Jeliazkova
Technical Manager
4 A.Kanchev str.
IdeaConsult Ltd.
1000 Sofia, Bulgaria
Phone: +359 886 802011
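Nina's argument against calculating features on the fly (store each descriptor or fragment as its own feature, record which algorithm produced it, as ot:hasSource does, and reuse the stored values, e.g. across cross-validation folds) can be sketched as an in-memory cache. This is a hypothetical stand-in for a feature service, not the IDEA, TUM, NTUA or ALU implementation; the class FeatureStore, its URI scheme and the toy compute function are all assumptions for illustration.

```python
from typing import Callable, Dict, Tuple

Values = Dict[str, float]


class FeatureStore:
    """Hypothetical in-memory stand-in for a feature service.

    Each descriptor or fragment gets its own feature URI, linked to the
    algorithm that produced it (mirroring ot:hasSource), and computed values
    are stored so repeated requests, e.g. from several cross-validation
    folds, reuse them instead of recalculating.
    """

    def __init__(self, base_uri: str, compute: Callable[[str, str], Values]):
        self.base_uri = base_uri.rstrip("/")
        self.compute = compute  # (algorithm URI, compound URI) -> {name: value}
        self.feature_ids: Dict[Tuple[str, str], str] = {}  # (algorithm, name) -> feature URI
        self.has_source: Dict[str, str] = {}  # feature URI -> generating algorithm URI
        self.values: Dict[Tuple[str, str], Values] = {}  # (algorithm, compound) -> {feature URI: value}
        self.compute_calls = 0

    def _feature_uri(self, algorithm_uri: str, name: str) -> str:
        """Create one feature per (algorithm, descriptor/fragment name), once."""
        key = (algorithm_uri, name)
        if key not in self.feature_ids:
            uri = f"{self.base_uri}/feature/{len(self.feature_ids) + 1}"
            self.feature_ids[key] = uri
            self.has_source[uri] = algorithm_uri
        return self.feature_ids[key]

    def descriptors(self, algorithm_uri: str, compound_uri: str) -> Values:
        """Return {feature URI: value}, computing only on the first request."""
        key = (algorithm_uri, compound_uri)
        if key not in self.values:
            self.compute_calls += 1
            raw = self.compute(algorithm_uri, compound_uri)
            self.values[key] = {self._feature_uri(algorithm_uri, n): v for n, v in raw.items()}
        return self.values[key]


# Toy usage: a fake fragment generator that always reports the same two fragments.
fminer = "http://example.org/algorithm/fminer"  # hypothetical URI
store = FeatureStore("http://example.org", lambda algorithm, compound: {"c1ccccc1": 1.0, "C=O": 0.0})
for fold in range(3):  # e.g. three cross-validation folds requesting the same compound
    store.descriptors(fminer, "http://example.org/compound/1")
```

Keying features by (algorithm, fragment name) rather than by compound is what lets the same fragment be a single, dereferenceable column across all compounds, which is the property the thread relies on for provenance and caching.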