[OTDev] descriptor recalculation

Tue Apr 20 19:52:43 CEST 2010

Hi Tobias, All,

I am trying to work out what API extension or change is needed to include
the intermediate descriptor calculation service (preferably without making
it mandatory).

My suggestions:
1) For models and algorithms, there might be an (optional) parameter
pointing to the URI of the new calculation service. If the parameter is
missing and descriptor calculation is necessary, the model either
initiates the calculations itself or returns an error.

2) The API of the new calculation service (is 'recalculation' the right
name for it? Perhaps 'proxy calculation service' is closer in meaning):

GET: RDF representation of the ot:Algorithm object, with the algorithm
type as in 3)

POST
parameters:
dataset_uri - as usual
model_uri - the URI of the model
or (alternatively) algorithm_uris[] - list of descriptor calculation
algorithms

POST result - returns the URI of the dataset with the calculated descriptors

3) Possibly include a new type in the Algorithm types ontology for the
proxy calculation service, and use it in the RDF representation of the
proxy service; that way one could find out whether such a service exists
within the list of algorithms.
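
To make 2) a bit more concrete, here is a minimal client-side sketch of how
the POST could be assembled. It only builds the request and does not send
it. The function name, the example URIs and the form encoding are my
assumptions, not part of any agreed API; only the parameter names
dataset_uri, model_uri and algorithm_uris[] come from the proposal above.
I also assume that exactly one of model_uri and algorithm_uris[] is given.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def build_calculation_request(service_uri, dataset_uri,
                              model_uri=None, algorithm_uris=None):
    """Build (but do not send) the POST to the proxy calculation service.

    Hypothetical helper: the service URI and the form encoding are
    assumptions made for illustration only.
    """
    has_model = model_uri is not None
    has_algorithms = bool(algorithm_uris)
    # The proposal offers model_uri OR algorithm_uris[] as alternatives,
    # so this sketch insists on exactly one of them.
    if has_model == has_algorithms:
        raise ValueError("give either model_uri or algorithm_uris, not both or neither")

    params = [("dataset_uri", dataset_uri)]
    if has_model:
        params.append(("model_uri", model_uri))
    else:
        # One form field per descriptor calculation algorithm.
        params.extend(("algorithm_uris[]", uri) for uri in algorithm_uris)

    body = urlencode(params).encode("ascii")
    return Request(
        service_uri,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Example with made-up URIs; prints the decoded form fields.
request = build_calculation_request(
    "http://example.org/proxy-calculation",
    "http://example.org/dataset/1",
    model_uri="http://example.org/model/2",
)
print(request.get_method(), parse_qs(request.data.decode("ascii")))
```

The response would then carry the URI of the dataset with the calculated
descriptors, as described under "POST result"; how that URI is returned
(and whether via a task) is left open here.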

Will this be sufficient? I am not sure whether I am missing something
important; discussion welcome!

Best regards,
Nina

Tobias Girschick wrote:
> Hi Nina,
>
> the green and the black lines are two possibilities to go through the
> workflow. In the pdf the workflow has to be read from bottom to top
> (more or less). Everything starts with some prediction application (e.g.
> ToxPredict or a ValidationService,...) that needs descriptors to be
> recalculated for prediction. I added the third variant as red arrows and
> split the one slide into three to make it easier to read.
>
> In version 1 (black) no descriptor recalculation service is needed and
> every model service has to delegate the descriptor recalculation to all
> descriptor calculation services.
> In version 2 (green) the descriptor recalculation service is called by
> the model service. The recalc service delegates the necessary descriptor
> calculations. In both cases the model service gets a dataset that does
> not have all the descriptors needed to use the model for predicting the
> dataset.
> In version 3 (red) the descriptor recalculation service is called
> directly by the application, delegates the descriptor calculations and
> updates the dataset. This updated dataset is then submitted by the
> application itself to the model service.
>
> I hope this clarifies my rough sketch from last week.
>
> regards,
> Tobias
>
> On Tue, 2010-04-20 at 14:42 +0300, Nina Jeliazkova wrote: 
>   
>> Hi Tobias,
>>
>> Could you tell what's the difference between black and green lines in
>> your schema?
>>
>> I would suggest starting a new wiki page under API to discuss descriptor
>> calculator and its API.
>>
>> Best regards,
>> Nina
>>
>> Tobias Girschick wrote:
>>     
>>> Hi All,
>>>
>>> I attached one slide which illustrates the problem from my point of
>>> view. The green and the black lines are the two possibilities. Note that
>>> the "descriptor recalculator" has to be implemented only once (if it is
>>> generic). Otherwise, every new algorithm that learns models has to
>>> provide the whole functionality of calling all the different descriptor
>>> calculation services. 
>>>
>>> I think that wrapping the distribution to the different descriptor
>>> calculation services makes things a lot easier. 
>>>
>>> Just to kick off the discussion again.
>>> regards,
>>> Tobias
>>>
>>>   
>>>       