[OTDev] validation and reporting workflow

Tobias Girschick tobias.girschick at in.tum.de
Mon Dec 7 14:02:20 CET 2009


Hi Nina, Martin,

On Mon, 2009-12-07 at 10:13 +0200, Nina Jeliazkova wrote:
> Hi Martin,
> 
> (apologies, I have to read older emails before replying to the more
> recent ones ...)
> 
> Martin Guetlein wrote: 
> > Hi Tobias, All,
> > 
> > On Fri, Dec 4, 2009 at 8:41 AM, Tobias Girschick
> > <tobias.girschick at in.tum.de> wrote:
> >   
> > > Hello Martin,
> > > 
> > > thanks for the visualization of the Validation and Reporting Workflows.
> > > It would be interesting to see the "API version" (e.g. the sequence of
> > > curl calls) corresponding to the graphical overviews, too. This would
> > > also help to check whether the API in its current state can handle the
> > > full validation and reporting process.
> > >     
> > 
> > That's a nice idea, I will add the curl calls.
> >   
> It would be good to have the same for ToxModel and Fastox as well. 
> > On Fri, Dec 4, 2009 at 8:55 AM, Tobias Girschick
> > <tobias.girschick at in.tum.de> wrote:
> >   
> > > Hello Martin,
> > > 
> > > another thing that is not clear to me: you write "The following
> > > chart illustrates the possible working process of validating an
> > > algorithm" (http://www.opentox.org/data/documents/development/validation/validation-and-reporting-overview-and-data-flow),
> > > and further below you say the reports described are "reports for model
> > > validation".
> > > In my opinion, the OpenTox user will usually validate a model, not an
> > > algorithm. On the other hand, if you build "the same" model (everything
> > > except the algorithm identical) with two or three different algorithms
> > > (or algorithm parameters), you can validate the algorithms (with
> > > respect to this dataset/model).
> > >     
> > 
> > I'm not quite sure if I got your point right.
> > I use the term 'validate an algorithm' for the procedure 'use the
> > algorithm to build a model on the training set, make predictions on
> > the test set, compare predictions to actual values',
> > and the term 'validate a model' for 'make predictions on the test
> > set, compare predictions to actual values'.
> > Both are of course possible with the validation web service (I just
> > sketched the first case on the web page, because it is more
> > complicated and it includes the second case).
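> > 
> > As a rough sketch (URIs and parameter names are placeholders, not
> > the final API), 'validate an algorithm' could look like this:
> > 
> > curl -X POST -d algorithm_uri="<algorithm_service>/algorithm/<algorithm_id>" \
> >      -d training_dataset_uri="<dataset_service>/dataset/<train_dataset_id>" \
> >      -d test_dataset_uri="<dataset_service>/dataset/<test_dataset_id>" \
> >      -d prediction_feature="<prediction_feature>" \
> >      <validation_service>/validation
> >   -> <validation_service>/validation/<validation_id>
> > 
> > 'validate a model' would be the same call with a model_uri instead
> > of algorithm_uri and training_dataset_uri.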
> > 
> >   
> Very useful discussion. It highlights the fact that the validation
> service is a client of the Algorithm service (in exactly the same way
> the ToxModel user interface is a client of the Algorithm service).
> In this case it would make sense to have a common Algorithm API,
> specifying e.g. dataset_uri, parameters, feature URIs, etc., which
> can be used by all clients to build a model.
> 
> Then the validation service will either 1) take an existing model, or
> 2) build a model using the Algorithm service API, and then 3) use it
> for the validation procedures.
> 
> Along the same line of thought, the Validation service is also a
> client of the Model service, using the Model prediction API with
> various datasets (and of course performing more specific tasks such
> as gathering statistics).
> Looking at the proposed workflow
> http://www.opentox.org/data/documents/development/validation/validation-and-reporting-overview-and-data-flow , the Model API so far seems to be sufficient, while the Algorithm API needs to be clarified.
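> 
> E.g. a Model prediction call might look like this (a sketch, all URIs
> are placeholders):
> 
> curl -X POST -d dataset_uri="<dataset_service>/dataset/<test_dataset_id>" \
>      <model_service>/model/<model_id>
>   -> <dataset_service>/dataset/<prediction_dataset_id>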
> 
> Is it possible to fix the Algorithm API parameters as in Martin's example:
> 
> curl -X POST -d dataset_uri="<dataset_service>/dataset/<train_dataset_id>" \
>      -d prediction_feature="<prediction_feature>" \
>      -d <alg_param_key1>="<alg_param_val1>" \
>      -d <alg_param_key2>="<alg_param_val2>" \
>      <algorithm_service>/algorithm/<algorithm_id>
>   -> <model_service>/model/<model_id>
> 
> Algorithm POST call parameters:
> 
> 1) Training dataset:
> dataset_uri="<dataset_service>/dataset/<train_dataset_id>"
> 
> 2) Prediction feature(s):
> prediction_feature="URI(s) of the prediction feature(s), might be >= 1"
> 
> 3) I would add a parameter for the independent variables as well:
> independent_variables="URI(s) of the independent variables"
> 
> 4) Algorithm parameters:
> could be as proposed above, e.g.
> <alg_param_key1>="<alg_param_val1>"
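> 
> Putting 1)-4) together (just a sketch, the parameter names are not
> fixed):
> 
> curl -X POST -d dataset_uri="<dataset_service>/dataset/<train_dataset_id>" \
>      -d prediction_feature="<prediction_feature_uri>" \
>      -d independent_variables="<independent_variable_uris>" \
>      -d <alg_param_key1>="<alg_param_val1>" \
>      <algorithm_service>/algorithm/<algorithm_id>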
> 
> I am not sure what the best representation for the parameters is. For
> example, how would we embed Weka algorithm parameters of the form
> "-P -M10", or MOPAC keywords such as "PM3 NOINTER NOMM BONDS MULLIK
> PRECISE GNORM=0.0", in the key/value representation?
> 
> 
> Other suggestions/comments? 

This is one of the reasons why we proposed to distinguish between
learning and "non-learning" algorithms. The API for learning algorithms
is accessed by very different services, and a very generic, unspecified
API makes it hard to build structured workflows. Obviously the
validation (as well as the ToxModel use case) needs parameters like
dataset_uri and prediction_feature for every learning algorithm,
whereas the remaining parameters differ from one learning algorithm to
the next.
I would propose to leave the Algorithm API as it is for all
non-learning algorithms and to introduce more structure for the
learning algorithms, so that e.g. the validation can rely on it (see
the sketch below).
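
A rough sketch of what I have in mind (only a suggestion, the parameter
names are not fixed): every learning algorithm accepts the two common
parameters, and all algorithm-specific settings are passed as
additional key/value pairs.

curl -X POST -d dataset_uri="<dataset_service>/dataset/<train_dataset_id>" \
     -d prediction_feature="<prediction_feature_uri>" \
     -d <alg_specific_key>="<alg_specific_value>" \
     <algorithm_service>/algorithm/<learning_algorithm_id>
  -> <model_service>/model/<model_id>

This might also answer Nina's question about tool-specific parameter
strings: a Weka setting like "-M10" would become -d M="10", so clients
do not have to parse tool-specific syntax.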

best regards
Tobias

> 
> > If a developer wants to compare his new algorithm to others, he could
> > use the 'validate an algorithm' command (with the new algorithm, as
> Then this is "compare an algorithm", not "validate an algorithm". I
> am not sure the latter term is generally accepted - perhaps Stefan /
> the TUM group could clarify?
> 
> Best regards,
> Nina 
> > well as other algorithms, maybe on a range of data sets). Other
> > techniques like cross-validation are possible as well, of course.
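> > A cross-validation might look like this (just a sketch, the resource
> > name and parameters are not fixed):
> > 
> > curl -X POST -d algorithm_uri="<algorithm_service>/algorithm/<algorithm_id>" \
> >      -d dataset_uri="<dataset_service>/dataset/<dataset_id>" \
> >      -d prediction_feature="<prediction_feature>" \
> >      -d num_folds=10 \
> >      <validation_service>/crossvalidation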
> > 
> > If a developer has a model for a certain endpoint, he will use the
> > 'validate model' command.
> > Does that answer your question?
> > 
> > Regards,
> > Martin
> > 
> > 
> > 
> > 
> >   
> > > best Regards,
> > > Tobias
> > > 
> > > On Thu, 2009-12-03 at 18:35 +0100, Martin Guetlein wrote:
> > >     
> > > > Hello All,
> > > > 
> > > > as discussed in the virtual meeting yesterday, I prepared a web page
> > > > to give some insight into the validation and reporting services:
> > > > 
> > > > http://www.opentox.org/data/documents/development/validation/validation-and-reporting-overview-and-data-flow
> > > > 
> > > > (You will find a link to this page on the validation API site as well.)
> > > > 
> > > > Comments and suggestions for improvement are highly appreciated.
> > > > 
> > > > Regards,
> > > > Martin
> > > > 


-- 
Dipl.-Bioinf. Tobias Girschick

Technische Universität München
Institut für Informatik
Lehrstuhl I12 - Bioinformatik
Bolzmannstr. 3
85748 Garching b. München, Germany

Room: MI 01.09.042
Phone: +49 (89) 289-18002
Email: tobias.girschick at in.tum.de
Web: http://wwwkramer.in.tum.de/girschick



