[OTDev] validation and reporting workflow
Tobias Girschick tobias.girschick at in.tum.de
Mon Dec 7 14:02:20 CET 2009
- Previous message: [OTDev] validation and reporting workflow
- Next message: [OTDev] validation and reporting workflow (algorithm API)
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi Nina, Martin,

On Mon, 2009-12-07 at 10:13 +0200, Nina Jeliazkova wrote:
> Hi Martin,
>
> (apologies, I have to read older emails before replying to the more
> recent ones ...)
>
> Martin Guetlein wrote:
> > Hi Tobias, All,
> >
> > On Fri, Dec 4, 2009 at 8:41 AM, Tobias Girschick
> > <tobias.girschick at in.tum.de> wrote:
> >
> > > Hello Martin,
> > >
> > > thanks for the visualization of the Validation and Reporting Workflows.
> > > It would be interesting to see the "API version" (e.g. the sequence of
> > > curl calls) of the graphical overviews, too. This could also be helpful
> > > to check whether the API in its current state is capable of handling
> > > the full validation and reporting.
> >
> > That's a nice idea, I will add the curl calls.
>
> It would be good to have the same for ToxModel and Fastox as well.
>
> > On Fri, Dec 4, 2009 at 8:55 AM, Tobias Girschick
> > <tobias.girschick at in.tum.de> wrote:
> >
> > > Hello Martin,
> > >
> > > another thing that is not clear to me: you write "The following chart
> > > illustrates the possible working process of validating an algorithm"
> > > (http://www.opentox.org/data/documents/development/validation/validation-and-reporting-overview-and-data-flow)
> > > and further below you say the reports described are "reports for model
> > > validation".
> > > In my opinion, the OpenTox user will usually validate a model, not an
> > > algorithm. On the other hand, if you build "the same" model (everything
> > > except the algorithm identical) with two or three different algorithms
> > > (or algorithm parameters), you can validate the algorithms (with regard
> > > to this dataset/model).
> >
> > I'm not quite sure if I got your point right.
> > I use the term 'validate an algorithm' for the procedure 'use the
> > algorithm to build a model on the training set, make predictions on the
> > test set, compare predictions to actual values'.
> > And the term 'validate a model' for 'make predictions on the test set,
> > compare predictions to actual values'.
> > Both are of course possible with the validation web service (I just
> > sketched the first case on the web page, because it is more
> > complicated, and it includes the second case).
>
> Very useful discussion. It highlights the fact that the validation service
> is a client of the Algorithm service (in exactly the same way the ToxModel
> user interface is a client of the Algorithm service).
> In this case it will make sense to have a common Algorithm API, specifying
> e.g. dataset_uri, parameters, feature_uri-s, etc., which can be used by
> all clients to build a model.
>
> Then the validation service will either 1) take an existing model, or
> 2) build a model using the Algorithm service API, and then 3) use it for
> the validation procedures.
>
> Along the same line of thought, the Validation service is also a client
> of the Model service, using the Model prediction API with various datasets
> (and of course doing more specific tasks, such as gathering statistics).
> Looking at the proposed workflow
> http://www.opentox.org/data/documents/development/validation/validation-and-reporting-overview-and-data-flow ,
> the Model API so far seems to be sufficient, while the Algorithm API needs
> to be clarified.
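The two validation procedures and Nina's three-step workflow can be sketched as a small planning function. This is a hypothetical illustration only: the service URIs and the idea of representing each HTTP call as a (method, uri, body) tuple are assumptions for the sketch, not part of the OpenTox API.

```python
from urllib.parse import urlencode

def plan_validation(algorithm_uri=None, model_uri=None,
                    train_dataset_uri=None, test_dataset_uri=None,
                    prediction_feature=None):
    """Return the sequence of (method, uri, body) HTTP calls a
    validation service would issue. If no model_uri is given, a model
    is first built via the Algorithm service ('validate an algorithm');
    otherwise the existing model is used ('validate a model')."""
    calls = []
    if model_uri is None:
        # Build a model with the Algorithm service (steps 1-2).
        body = urlencode({"dataset_uri": train_dataset_uri,
                          "prediction_feature": prediction_feature})
        calls.append(("POST", algorithm_uri, body))
        # Placeholder for the model URI the Algorithm POST would return.
        model_uri = "<model_service>/model/<model_id>"
    # Use the model for predictions on the test set (step 3).
    calls.append(("POST", model_uri,
                  urlencode({"dataset_uri": test_dataset_uri})))
    return calls

# 'Validate an algorithm': build the model first, then predict (two calls).
algo_calls = plan_validation(
    algorithm_uri="<algorithm_service>/algorithm/<algorithm_id>",
    train_dataset_uri="<dataset_service>/dataset/<train_id>",
    test_dataset_uri="<dataset_service>/dataset/<test_id>",
    prediction_feature="<prediction_feature>")

# 'Validate a model': the model already exists (one call).
model_calls = plan_validation(
    model_uri="<model_service>/model/<model_id>",
    test_dataset_uri="<dataset_service>/dataset/<test_id>")
```

As the sketch shows, the first case strictly contains the second, which matches Martin's remark that the algorithm case "includes the second case".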
> Is it possible to fix the Algorithm API parameters as in Martin's example:
>
> curl -X POST -d dataset_uri="<dataset_service>/dataset/<train_dataset_id>" \
>   -d prediction_feature="<prediction_feature>" \
>   -d <alg_param_key1>="<alg_param_val1>" \
>   -d <alg_param_key2>="<alg_param_val2>" \
>   <algorithm_service>/algorithm/<algorithm_id>
> -> <model_service>/model/<model_id>
>
> Algorithm POST call parameters:
>
> 1) Training dataset:
>    dataset_uri="<dataset_service>/dataset/<train_dataset_id>"
> 2) Prediction feature(s):
>    prediction_feature="uri to prediction features (might be >=1)"
> 3) I would add parameters for the independent variables as well:
>    independent_variables="uri to independent variables"
> 4) Algorithm parameters, which could be as proposed above:
>    <alg_param_key1>="<alg_param_val1>"
>
> I am not sure what's the best way to handle parameters - for example, how
> to embed, in the key/value representation, Weka algorithm parameters of
> the form "-P -M10", or MOPAC keywords like "PM3 NOINTER NOMM BONDS MULLIK
> PRECISE GNORM=0.0"?
>
> Other suggestions/comments?

This is one of the reasons why we proposed to distinguish between learning and "non-learning" algorithms. The API for learning algorithms is accessed from very different services, and a very generic and unspecified API makes it hard to build structured workflows. And obviously the validation (as does the ToxModel use case) needs parameters like the dataset_uri and the prediction_feature for/from every learning algorithm. The other parameters differ for every learning algorithm. I would propose to leave the Algorithm API as it is for all non-learning algorithms and to introduce more structure for the learning algorithms, so that e.g. validation can be ensured.

best regards
Tobias

> > If a developer wants to compare his new algorithm to others, he could
> > use the 'validate an algorithm' command (with the new algorithm, as
>
> Then this is "compare an algorithm", not "validate an algorithm".
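One conceivable answer to Nina's open question about engine-specific option strings is to pass the whole string as the value of a single, agreed-upon key, so the key/value POST representation stays uniform across algorithms. The key name `raw_params` below is purely an assumption for illustration, not part of any OpenTox specification; the sketch just shows that form-urlencoding round-trips such strings intact.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical POST body for a Weka-backed algorithm: the Weka option
# string travels as one opaque value under an assumed key "raw_params".
body = urlencode({
    "dataset_uri": "<dataset_service>/dataset/<train_dataset_id>",
    "prediction_feature": "<prediction_feature>",
    "raw_params": "-P -M10",  # Weka-style options, passed verbatim
})

# MOPAC keywords could travel the same way:
mopac_body = urlencode({
    "dataset_uri": "<dataset_service>/dataset/<train_dataset_id>",
    "raw_params": "PM3 NOINTER NOMM BONDS MULLIK PRECISE GNORM=0.0",
})

# The receiving service recovers the string unchanged, spaces and
# embedded '=' included:
decoded = parse_qs(body)["raw_params"][0]
```

The alternative, splitting each option into its own key (`-M10` becoming e.g. `M=10`), would require a per-algorithm parameter schema, which is exactly the extra structure for learning algorithms proposed above.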
> I am not sure the latter term is generally accepted - perhaps Stefan / the
> TUM group could clarify?
>
> Best regards,
> Nina
>
> > well as other algorithms, maybe on a range of data sets). Other
> > techniques like cross-validation are possible as well, of course.
> >
> > If a developer has a model for a certain endpoint, he will use the
> > 'validate model' command.
> > Does that answer your question?
> >
> > Regards,
> > Martin
> >
> > > best regards,
> > > Tobias
> > >
> > > On Thu, 2009-12-03 at 18:35 +0100, Martin Guetlein wrote:
> > > > Hello All,
> > > >
> > > > as discussed in the virtual meeting yesterday, I prepared a web page
> > > > to give some insight into the validation and reporting services:
> > > >
> > > > http://www.opentox.org/data/documents/development/validation/validation-and-reporting-overview-and-data-flow
> > > >
> > > > (You will find a link to this page on the validation api site as well.)
> > > >
> > > > Comments and suggestions for improvement are highly appreciated.
> > > >
> > > > Regards,
> > > > Martin
> > >
> > > --
> > > Dipl.-Bioinf. Tobias Girschick
> > >
> > > Technische Universität München
> > > Institut für Informatik
> > > Lehrstuhl I12 - Bioinformatik
> > > Boltzmannstr. 3
> > > 85748 Garching b. München, Germany
> > >
> > > Room: MI 01.09.042
> > > Phone: +49 (89) 289-18002
> > > Email: tobias.girschick at in.tum.de
> > > Web: http://wwwkramer.in.tum.de/girschick
> > >
> > > _______________________________________________
> > > Development mailing list
> > > Development at opentox.org
> > > http://www.opentox.org/mailman/listinfo/development

--
Dipl.-Bioinf. Tobias Girschick

Technische Universität München
Institut für Informatik
Lehrstuhl I12 - Bioinformatik
Boltzmannstr. 3
85748 Garching b. München, Germany

Room: MI 01.09.042
Phone: +49 (89) 289-18002
Email: tobias.girschick at in.tum.de
Web: http://wwwkramer.in.tum.de/girschick