[OTDev] validation and reporting workflow

Nina Jeliazkova nina at acad.bg
Mon Dec 7 09:13:23 CET 2009


Hi Martin,

(apologies, I have to read older emails before replying to the more
recent ones ...)

Martin Guetlein wrote:
> Hi Tobias, All,
>
> On Fri, Dec 4, 2009 at 8:41 AM, Tobias Girschick
> <tobias.girschick at in.tum.de> wrote:
>   
>> Hello Martin,
>>
>> thanks for the visualization of the Validation and Reporting Workflows.
>> It would be interesting to see the "API-Version" (e.g. sequence of curl
>> calls) of the graphical overviews, too. This could also be helpful to
>> check if the API in its current state is capable of handling the full
>> validation and reporting.
>>     
>
> That's a nice idea; I will add the curl calls.
>   
It would be good to have the same for ToxModel and Fastox as well.
> On Fri, Dec 4, 2009 at 8:55 AM, Tobias Girschick
> <tobias.girschick at in.tum.de> wrote:
>   
>> Hello Martin,
>>
>> another thing, that is not clear to me is that you write "The following
>> chart illustrates the possible working process of validating an
>> algorithm" (http://www.opentox.org/data/documents/development/validation/validation-and-reporting-overview-and-data-flow)
>> and further below you say the reports described are "reports for model
>> validation".
>> In my opinion, the OpenTox user usually will validate a model, not an
>> algorithm. On the other hand, if you build "the same" (everything except
>> algorithm identical) model with two or three different algorithms (or
>> algorithm parameters), you can validate the algorithms (regarding this
>> dataset/model).
>>     
>
> I'm not quite sure I got your point right.
> I use the term 'validate an algorithm' for the procedure 'use
> algorithm to build model on training set, make predictions on test
> set, compare predictions to actual values'.
> And the term 'validate a model' to 'make predictions on test set,
> compare predictions to actual values'.
> Both are of course possible with the validation webservice (I just
> sketched the first case on the web page, because it is more
> complicated, and it includes the second case).
>
>   
Very useful discussion. It highlights the fact that the validation
service is a client of the Algorithm service (in exactly the same way
the ToxModel user interface is a client of the Algorithm service).
In that case it makes sense to have a common Algorithm API, specifying
e.g. dataset_uri, parameters, feature URIs, etc., which can be used by
all clients to build a model.

Then the validation service will either 1) take an existing model, or
2) build a model using the Algorithm service API, and then 3) use it
for the validation procedures.

Along the same line of thought, the Validation service is also a client
of the Model service, using the Model prediction API with various
datasets (and of course performing more specific tasks such as
gathering statistics). Looking at the proposed workflow
http://www.opentox.org/data/documents/development/validation/validation-and-reporting-overview-and-data-flow
the Model API so far seems to be sufficient, while the Algorithm API
needs to be clarified.
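
For the prediction step this would amount to something like the
following (again just a sketch, following the pattern of Martin's
example below; the returned URI is hypothetical):

curl -X POST -d dataset_uri="<dataset_service>/dataset/<test_dataset_id>" \
             <model_service>/model/<model_id>
  -> <dataset_service>/dataset/<prediction_dataset_id>

The validation service would then compare the prediction dataset with
the actual values and compute the statistics.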

Is it possible to fix the Algorithm API parameters as in Martin's example:

curl -X POST -d dataset_uri="<dataset_service>/dataset/<train_dataset_id>" \
             -d prediction_feature="<prediction_feature>" \
             -d <alg_param_key1>="<alg_param_val1>" \
             -d <alg_param_key2>="<alg_param_val2>" \
             <algorithm_service>/algorithm/<algorithm_id>
  -> <model_service>/model/<model_id>
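
Just for illustration, a hypothetical instantiation of this call (the
host name, algorithm name and parameter key k are made up):

curl -X POST -d dataset_uri="http://example.org/dataset/112" \
             -d prediction_feature="http://example.org/feature/LC50" \
             -d k="5" \
             http://example.org/algorithm/kNN
  -> http://example.org/model/42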


Algorithm POST call parameters:

1) Training dataset:
   dataset_uri="<dataset_service>/dataset/<train_dataset_id>"

2) Prediction feature(s):
   prediction_feature="uri to prediction feature(s) (might be >=1)"

3) I would add parameters for the independent variables as well:
   independent_variables="uri to independent variables"

4) Algorithm parameters, which could be passed as proposed above (a
   combined call is sketched below):
   <alg_param_key1>="<alg_param_val1>"
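
Putting 1)-4) together, a model-building call could then look like this
(independent_variables is my proposed addition and not yet part of the
API; repeating the parameter is one possible way to pass several
feature URIs):

curl -X POST -d dataset_uri="<dataset_service>/dataset/<train_dataset_id>" \
             -d prediction_feature="<feature_service>/feature/<feature_id>" \
             -d independent_variables="<feature_service>/feature/<feature_id_1>" \
             -d independent_variables="<feature_service>/feature/<feature_id_2>" \
             -d <alg_param_key1>="<alg_param_val1>" \
             <algorithm_service>/algorithm/<algorithm_id>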

I am not sure what the best way to handle such parameters is: for
example, how would one embed, in the key/value representation, Weka
algorithm parameters of the form "-P -M10", or MOPAC keywords like
"PM3 NOINTER NOMM BONDS MULLIK PRECISE GNORM=0.0"?
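
Two obvious options, just to make the question concrete (neither is
part of any agreed API, and J48 is only a hypothetical algorithm id):

# option A: one opaque string, parsed by the algorithm service/wrapper
curl -X POST -d dataset_uri="<dataset_service>/dataset/<train_dataset_id>" \
             -d params="-P -M10" \
             <algorithm_service>/algorithm/J48

# option B: one key/value pair per switch
curl -X POST -d dataset_uri="<dataset_service>/dataset/<train_dataset_id>" \
             -d P="true" \
             -d M="10" \
             <algorithm_service>/algorithm/J48

Option A would also fit MOPAC keyword strings and is easy for wrappers
around existing command-line tools; option B is easier to document and
validate.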


Other suggestions/comments?

> If a developer wants to compare his new algorithm to others, he could
> use the 'validate an algorithm' command (with the new algorithm, as
>   
Then this is "compare an algorithm", not "validate an algorithm". I am
not sure the latter term is generally accepted - perhaps Stefan / the
TUM group could clarify?

Best regards,
Nina
> well as other algorithms, maybe on a range of data sets). Other
> techniques like cross-validation are possible as well, of course.
>
> If a developer has a model for a certain endpoint, he will use the
> 'validate model' command.
> Does that answer your question?
>
> Regards,
> Martin
>
>
>
>
>   
>> best Regards,
>> Tobias
>>
>> On Thu, 2009-12-03 at 18:35 +0100, Martin Guetlein wrote:
>>     
>>> Hello All,
>>>
>>> as discussed in the virtual meeting yesterday, I prepared a web page
>>> to give some insight into the validation and reporting services:
>>>
>>> http://www.opentox.org/data/documents/development/validation/validation-and-reporting-overview-and-data-flow
>>>
>>> (You will find a link to this page on the validation api site as well.)
>>>
>>> Comments and suggestions for improvement are highly appreciated.
>>>
>>> Regards,
>>> Martin
>>>
>>>       
>> --
>> Dipl.-Bioinf. Tobias Girschick
>>
>> Technische Universität München
>> Institut für Informatik
>> Lehrstuhl I12 - Bioinformatik
>> Bolzmannstr. 3
>> 85748 Garching b. München, Germany
>>
>> Room: MI 01.09.042
>> Phone: +49 (89) 289-18002
>> Email: tobias.girschick at in.tum.de
>> Web: http://wwwkramer.in.tum.de/girschick
>>
>> _______________________________________________
>> Development mailing list
>> Development at opentox.org
>> http://www.opentox.org/mailman/listinfo/development
>>
>>     
>
>
>
>   



