[OTDev] Task history

Nina Jeliazkova nina at acad.bg
Fri Feb 26 14:29:06 CET 2010


Christoph Helma wrote:
> Excerpts from Nina Jeliazkova's message of Fri Feb 26 12:52:29 +0100 2010:
>   
>> Hi Surajit,
>>
>> surajit ray wrote:
>>     
>>> Hi Nina,
>>>
>>> The situation is:
>>>
>>> I am generating a task which does the following
>>>
>>> a) Read a molecule from the ambit website
>>> b) load the dictionary from a mysql table
>>> c) fingerprint the molecule using this dictionary
>>> d) load the R engine interface and RandomForest package
>>> e) send fingerprint to the Rserve instance and retrieve a prediction
>>>   
>>>       
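[The five steps above can be sketched as a single atomic task. This is a hypothetical illustration: all callables are stand-ins, not the real Ambit/MaxTox/Rserve APIs.]

```python
# Hypothetical sketch of the five steps wrapped in one atomic task; any
# step failing fails the whole task, which is all the caller needs to see.
def run_prediction_task(molecule_uri, get_molecule, load_dictionary,
                        fingerprint, predict):
    mol = get_molecule(molecule_uri)      # a) read the molecule
    dictionary = load_dictionary()        # b) load the fragment dictionary
    fp = fingerprint(mol, dictionary)     # c) fingerprint the molecule
    return predict(fp)                    # d)+e) R engine / Rserve prediction

# Toy usage with stand-in callables:
result = run_prediction_task(
    "http://example.org/molecule/1",
    get_molecule=lambda uri: "CCO",
    load_dictionary=lambda: ["C", "CC"],
    fingerprint=lambda mol, d: [frag in mol for frag in d],
    predict=lambda fp: sum(fp),
)
```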
>> From the outside world, it looks like a single task that generates a
>> prediction for a given molecule/dataset, and the individual steps might
>> only be reflected in the errors/error descriptions thrown.
>>
>> For example, if MaxTox is used by ToxPredict, I would not like to receive
>> details of how exactly the prediction is generated; an error report if
>> something fails is sufficient.  For internal processing you might take
>> any reasonable approach, but this will not be reflected in the API.
>>     
>>> If I generate an atomic task for each step, it would lead to a huge URI
>>> redirection route, and I would also have to keep all the atomic tasks in
>>> memory (so they can redirect to the next step).
>>>       
>> Redirection makes sense if the processing tasks are on different and/or
>> remote services. For internal processing, it indeed doesn't make sense.
>>     
>>> Although in principle this may
>>> sound elegant, the code involved and the cost in terms of memory
>>> for retaining atomic tasks for redirection would be too much to justify this
>>> simple set of steps. Also, if an atomic task fails, it would be a hassle
>>> to chain-DELETE all the preceding tasks.
>>>   
>>>       
>> No need to chain; it might even be impossible if the tasks are on remote
>> services. Completed/failed tasks might just expire after a certain time.
>>     
>>> I like Christoph's suggestion to have a tree for a task. But the question
>>> is to what level within the tree the user can have control (in terms of
>>> DELETE requests) in case of failed tasks. Also, threads may get locked if
>>> the user accidentally deletes a subtask whose result is being awaited by
>>> another thread. Right now I find it convenient to run a single main task
>>> and maintain a history. Later we could modify the API to have a querying
>>> mechanism for a tree of tasks, to retrieve histories as well as
>>> intermediate results obtained before the failed subtask.
>>>   
>>>       
>> The problem with more complex structures, rather than atomic tasks, is
>> that one indeed dives into thread management, deadlocks, etc. In a
>> distributed setting this is a nightmare and certainly not the main topic
>> of this project.  Tasks arranged as trees might be fine if everything
>> runs on the same site, but if not, it seems like additional trouble.
>> We should, however, extend the API to be able to cancel atomic tasks.
>>     
>
> I am not sure if we can avoid it altogether (although I would like to -
> it is indeed a nightmare).
> Let's assume the following scenario from model validation:
>
> task: model validation (validation service)
> 	split dataset
> 	n times do
> 		task: create dataset (dataset service)
> 		wait_for_task
> 	n times do
> 		task: create model (algorithm service)
> 			task: parse input and create dataset (dataset service)
> 			wait_for_task
> 			task: create features (algorithm service)
> 				task: create feature dataset (dataset service)
> 				wait_for_task
> 			wait_for_task
> 			task: create model (algorithm service)
> 			wait_for_task
> 		task: predict test set (model service)
> 		wait_for_task
> 	calculate statistics
> wait_for_task
> create report
>
> It involves at least 4 services (it gets more complicated when modeling and
> feature calculation algorithms come from different services) and we
> cannot expect that everything runs on the same machine.
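[A minimal client-side sketch of the wait_for_task step used throughout the scenario above. The `poll` signature, returning an HTTP status and body, is an assumption for illustration, not part of the API.]

```python
import time

# Hypothetical sketch: block until a task URI reports completion, raising
# on a failure status so errors propagate up to the parent task.
def wait_for_task(poll, task_uri, interval=0.001, timeout=5.0):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status, body = poll(task_uri)
        if status == 200:                  # task finished; body is the result
            return body
        if status >= 400:                  # task failed somewhere downstream
            raise RuntimeError(f"{task_uri}: {body}")
        time.sleep(interval)               # 202: still running, poll again
    raise TimeoutError(task_uri)

# Toy usage: a fake task service that needs three polls to finish.
state = {"polls": 0}
def poll(uri):
    state["polls"] += 1
    return (200, "model-uri") if state["polls"] >= 3 else (202, "running")

result = wait_for_task(poll, "http://example.org/task/1")
```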
I agree, there are a number of complex scenarios, and it is very good that
we are discussing specific ones.

One such scenario in ToxPredict is finding whether descriptors are required
by a model, calculating those descriptors, and finally launching the model.
A lot of the time in our integration efforts with TUM was spent not on API
inconsistencies themselves, but on some intermediate step failing for weird
reasons.
> Furthermore we
> cannot make any assumptions about calculation times (might take hours or
> even days for large datasets) - setting expiration times is not an
> option here.
>   
Well, I agree it might not be possible to estimate the time for task
completion, but the user might have set some preferences, like not wanting
to wait more than 10 minutes for calculations. Then, if you are at
"task: predict test set (model service)" and the timeout happens, simply
send DELETE for the pending tasks and break out of the loop. Will that work
in the validation scenario if we introduce task DELETE?
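[A hypothetical sketch of those proposed cancel semantics: poll each pending task, and once the user's time budget is exceeded, DELETE whatever is left and break out. The `poll`/`delete` callables are stand-ins for HTTP calls.]

```python
import time

# Sketch, under the assumption that DELETE on a task URI cancels it:
# run through pending tasks, cancelling the remainder when time runs out.
def run_with_budget(task_uris, poll, delete, budget_s):
    deadline = time.monotonic() + budget_s
    done = []
    for i, uri in enumerate(task_uris):
        while True:
            if time.monotonic() > deadline:   # user is not willing to wait
                cancelled = task_uris[i:]
                for pending in cancelled:
                    delete(pending)           # DELETE each pending task
                return done, cancelled        # break out of the loop
            if poll(uri) == 200:              # this task has completed
                done.append(uri)
                break
            time.sleep(0.001)                 # 202: still running
    return done, []

# Toy usage: the first task finishes at once, the others never do.
statuses = {"t1": 200, "t2": 202, "t3": 202}
deleted = []
done, cancelled = run_with_budget(
    ["t1", "t2", "t3"], poll=statuses.get, delete=deleted.append,
    budget_s=0.05)
```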
> In my experience tasks tend to fail at the most unlikely places and for
> the stupidest reasons (e.g. exceeding database size limits, all kinds of
> timeouts, redland memory leaks, ...), so we need to have a mechanism to
> communicate errors (maybe also progress) back to parent tasks. Not sure
> about the best mechanism (I favor simplicity), but we should avoid
> one-way tickets. Suggestions are very welcome!
>   
It is synchronization via callbacks that I would like to avoid, because it
is an easy way into deadlocks.  The tasks should indeed provide
progress/error information via their GET interface, but it is the client
who is responsible for querying them (a service can of course be considered
a client of another service, e.g. the validation service is a client of the
algorithm service).  Do simple GETs on task URIs work for the scenario
above?

Error communication needs some more attention in the API; it seems HTTP
codes alone might not be sufficient, since one can wrap multiple failure
reasons under the HTTP Bad Request code... perhaps SOAP web services
invented error statuses for a reason...
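[One way to carry more than a bare HTTP code is a structured error report that nests the failing subtask's report as its cause. This is a hypothetical sketch; the field names and URIs are made up for illustration.]

```python
# Hypothetical sketch: a structured error report that wraps the upstream
# failure, so a client can see which step failed and why, not just "400".
def error_report(actor, message, http_status, cause=None):
    report = {"actor": actor, "message": message, "http_status": http_status}
    if cause is not None:
        report["cause"] = cause          # the downstream report, verbatim
    return report

# Toy usage: a prediction task fails because a descriptor subtask failed.
downstream = error_report("http://tum.example/task/7",
                          "descriptor calculation failed", 500)
report = error_report("http://idea.example/task/42",
                      "prediction aborted", 502, cause=downstream)
```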

BTW, it would be good to have the validation scenario above in the WP2
report :)

Best regards,
Nina
> Best regards,
> Christoph
>
>   
>> My approach up to now is to use language-provided tools for thread
>> handling (e.g. the java.util.concurrent package in our case) and not to
>> introduce thread-management complexity into the API.   Workflows might be
>> a better means of managing task sequences.
>>
>> Regards,
>> Nina
>>     
>>> Cheers
>>> Surajit
>>>
>>> On Fri, Feb 26, 2010 at 3:01 PM, Nina Jeliazkova <nina at acad.bg> wrote:
>>>
>>>   
>>>       
>>>> surajit ray wrote:
>>>>     
>>>>         
>>>>> Hi Nina,
>>>>>
>>>>> Should we not have some sort of task history parameter/value in the Task
>>>>> API? Otherwise, in cases where there are multiple steps to a single task
>>>>>       
>>>>>           
>>>> -
>>>>     
>>>>         
>>>>> the user may not be able to see which step is failing .... or why...
>>>>>
>>>>>
>>>>>       
>>>>>           
>>>> Well, the Task object was assumed to encapsulate an atomic job, which
>>>> does not consist of steps.   With the current redirection API, it is
>>>> quite easy to achieve a series of tasks in a transparent manner.  As an
>>>> example, this is how the TUM algorithm and model services currently work:
>>>> 1) A dataset URI is posted to the Model service.
>>>> 2) It returns a Task URI at the TUM service.
>>>> 3) The TUM Model service runs the calculations and posts the results
>>>> into the IDEA dataset service.  When querying the TUM Task service for
>>>> the Task URI from step 2, it redirects (303) to the IDEA Task service,
>>>> which returns a new Task URI on the IDEA server.
>>>> 4) Subsequent GETs on the IDEA Task URI will return 200 OK once the
>>>> results are stored into the database.
>>>>
>>>> This is a very elegant approach to automatic workflows using HTTP
>>>> redirects, and it is not restricted to tasks on a single server.   The
>>>> most interesting part is that we did not design it intentionally; it
>>>> just happened from using REST style and proper HTTP codes.
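[A hypothetical client sketch of that redirect chain: follow 303 responses from one task service to the next until a final status arrives. `get` returns (status, payload), where a 303 payload is taken to be the next task URI.]

```python
# Sketch, assuming 303 hands a task over to another service and the
# Location payload is the new task URI to query.
def resolve_task(get, task_uri, max_hops=10):
    for _ in range(max_hops):
        status, payload = get(task_uri)
        if status == 303:            # task moved to the next service
            task_uri = payload
            continue
        return status, payload       # 200 done, 202 running, 4xx/5xx failed
    raise RuntimeError("too many redirects")

# Toy usage: a TUM task redirects to an IDEA task that has finished.
responses = {
    "http://tum.example/task/5": (303, "http://idea.example/task/9"),
    "http://idea.example/task/9": (200, "http://idea.example/dataset/3"),
}
status, result = resolve_task(responses.get, "http://tum.example/task/5")
```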
>>>>
>>>> We should strive to keep things as simple as possible.  The TUM group
>>>> might also be willing to share their experience arranging workflows
>>>> around OpenTox services.
>>>>
>>>> Best regards,
>>>> Nina
>>>>
>>>> _______________________________________________
>>>> Development mailing list
>>>> Development at opentox.org
>>>> http://www.opentox.org/mailman/listinfo/development
>>>>
>>>>     
>>>>         
>>>
>>>   
>>>       
>   
