[OTDev] Task history

Nina Jeliazkova nina at acad.bg
Fri Feb 26 12:52:29 CET 2010


Hi Surajit,

surajit ray wrote:
> Hi Nina,
>
> The situation is :
>
> I am generating a task which does the following
>
> a) Read a molecule from the ambit website
> b) load the dictionary from a mysql table
> c) fingerprint the molecule using this dictionary
> d) load the R engine interface and RandomForest package
> e) send fingerprint to the Rserve instance and retrieve a prediction
>   
From the outside world, it looks like a single task of generating a
prediction for a given molecule/dataset, and the steps might only be
reflected in the errors / error descriptions thrown.

For example, if MaxTox is used by ToxPredict, I would not want to receive
details of how exactly the prediction is generated; an error report if
something fails is sufficient.  For internal processing you might take
any reasonable approach, but this should not be reflected in the API.
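A minimal sketch of that idea, assuming the internal steps are plain Java callables run inside one task (the class, method, and step names are illustrative, not part of any OpenTox API):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;

// Sketch: run named steps inside one task; on failure the error report
// names the failing step, but no per-step tasks are exposed in the API.
public class SingleTask {
    public static String run(Map<String, Callable<Void>> steps) {
        for (Map.Entry<String, Callable<Void>> e : steps.entrySet()) {
            try {
                e.getValue().call();
            } catch (Exception ex) {
                // One error report covers the whole task.
                return "Error in step '" + e.getKey() + "': " + ex.getMessage();
            }
        }
        return "Completed";
    }

    public static void main(String[] args) {
        Map<String, Callable<Void>> steps = new LinkedHashMap<>();
        steps.put("read molecule", () -> null);
        steps.put("fingerprint", () -> { throw new Exception("dictionary missing"); });
        steps.put("predict", () -> null); // never reached after a failure
        System.out.println(run(steps)); // Error in step 'fingerprint': dictionary missing
    }
}
```

The caller only ever sees one task URI and, on failure, one error string naming the step.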
> If I generate an atomic task for each step, it would lead to a huge URI
> redirection route ... and I would also have to keep all the atomic tasks in
> memory (so they can redirect to the next step).
Redirection makes sense if the processing tasks are on different and/or
remote services. For internal processing, it indeed doesn't make sense.
> Although in principle this may
> sound elegant, the code involved and the cost in terms of memory
> for retaining atomic tasks for redirection would be too much to justify this
> simple set of steps. Also, if an atomic task fails, it would be a hassle
> to chain DELETEs on all the preceding tasks.
>   
No need to chain DELETEs; it might even be impossible if tasks are on remote
services. The completed/failed tasks might simply expire after a certain time.
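One way to sketch such expiry, assuming an in-memory task map and an arbitrary TTL (the store, the status strings, and the TTL value are all illustrative assumptions, not part of the Task API):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: instead of chained DELETEs, finished tasks expire after a TTL.
public class TaskStore {
    static final long TTL_MS = 60_000; // assumed expiry window

    record Task(String status, long finishedAtMs) {}

    final Map<String, Task> tasks = new ConcurrentHashMap<>();

    // Called periodically (e.g. from a ScheduledExecutorService):
    // drop completed/failed tasks older than the TTL, keep running ones.
    void purge(long nowMs) {
        tasks.entrySet().removeIf(e ->
            !"Running".equals(e.getValue().status())
            && nowMs - e.getValue().finishedAtMs() > TTL_MS);
    }
}
```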
> I like Christoph's suggestion to have a tree for a task. But the question is
> to what level within the tree the user can have control (in terms of DELETE
> requests) in case of failed tasks. Also, threads may get locked if the
> user accidentally deletes a subtask whose result is being awaited by
> another thread. Right now I find it convenient to run a single main task
> and maintain a history. Later we could modify the API to have a querying
> mechanism for a tree of tasks - to retrieve histories as well as
> intermediate results obtained before the failed subtask.
>   
The problem with more complex structures, rather than atomic tasks, is that
one indeed dives into thread management, deadlocks, etc. In a
distributed setting this is a nightmare and certainly not the main topic
of this project.  Tasks arranged as trees might be fine if everything
runs on the same site, but if not, it seems like additional trouble.
We should, however, extend the API to be able to cancel atomic tasks.

My approach up to now is to use language-provided tools for thread
handling (e.g. the java.util.concurrent package in our case) and not
introduce thread-management complexity into the API.   Workflows might be
a better means of managing task sequences.
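For example, internal step sequencing with java.util.concurrent might look like the sketch below; the step bodies are placeholders standing in for the real work (reading the molecule, fingerprinting, querying Rserve), and the names are illustrative:

```java
import java.util.concurrent.CompletableFuture;

// Sketch: sequence the internal steps with java.util.concurrent
// (CompletableFuture here) so the REST API still sees one task.
public class Pipeline {
    public static String predict(String moleculeUri) throws Exception {
        return CompletableFuture
            .supplyAsync(() -> "molecule:" + moleculeUri) // a) read the molecule
            .thenApply(m -> m + "|fingerprint")           // b)-c) fingerprint it
            .thenApply(f -> "prediction(" + f + ")")      // d)-e) query Rserve
            .get(); // the single task's result (or one exception to report)
    }

    public static void main(String[] args) throws Exception {
        System.out.println(predict("m1")); // prediction(molecule:m1|fingerprint)
    }
}
```

If any stage throws, `get()` surfaces a single exception, which maps naturally onto one task-level error report.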

Regards,
Nina
> Cheers
> Surajit
>
> On Fri, Feb 26, 2010 at 3:01 PM, Nina Jeliazkova <nina at acad.bg> wrote:
>
>   
>> surajit ray wrote:
>>     
>>> Hi Nina,
>>>
>>> Should we not have some sort of task history parameter/value in the Task
>>> API? Otherwise, in cases where there are multiple steps to a single task,
>>> the user may not be able to see which step is failing ... or why ...
>>>
>>>
>>>       
>> Well, the Task object was assumed to encapsulate an atomic job, which
>> does not consist of steps.   With the current redirection API, it is
>> quite easy to achieve a series of tasks in a transparent manner.  As an
>> example, this is how the TUM algorithm and model services currently work:
>> 1) A dataset URI is POSTed to the Model service.
>> 2) It returns a Task URI at the TUM service.
>> 3) The TUM Model service runs the calculations and posts the results
>> into the IDEA dataset service.  When querying the TUM Task service for
>> the Task URI from step 2, it redirects (303) to the IDEA Task service,
>> which returns a new Task URI on the IDEA server.
>> 4) Subsequent GETs on the IDEA Task URI will return 200 OK once the
>> results are stored in the database.
>>
>> This is a very elegant approach to automatic workflows using HTTP
>> redirects, and it is not restricted to tasks on a single server.   The
>> most interesting part is that we did not design it intentionally; it
>> just happened by using REST style and proper HTTP codes.
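The redirect chain in steps 1)-4) could be sketched as a polling client. The HTTP fetch is abstracted behind a function so the status-code handling stays visible; the `Response` record and the example URIs are illustrative assumptions, and a real client would use an HTTP library and sleep between 202 polls:

```java
import java.util.function.Function;

// Sketch of the client side of the redirect workflow described above:
// 303 -> follow Location to the task's new home, 202 -> still running,
// 200 -> result ready.
public class TaskPoller {
    public record Response(int code, String location) {}

    public static String poll(String uri, Function<String, Response> get) {
        while (true) {
            Response r = get.apply(uri);
            switch (r.code()) {
                case 303: uri = r.location(); break; // task handed off to another service
                case 202: break;                     // still running; retry (sleep in real code)
                case 200: return uri;                // results available at this URI
                default: throw new IllegalStateException("task failed: HTTP " + r.code());
            }
        }
    }
}
```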
>>
>> We should strive to keep things as simple as possible.  The TUM group
>> might also be willing to share their experience of arranging workflows
>> around OpenTox services.
>>
>> Best regards,
>> Nina
>>
>> _______________________________________________
>> Development mailing list
>> Development at opentox.org
>> http://www.opentox.org/mailman/listinfo/development
>>
>>     



