[OTDev] Wrapper/Super Services and Descriptor Calculation

Nina Jeliazkova jeliazkova.nina at gmail.com
Mon Jan 31 09:17:44 CET 2011


Hi Christoph, All,

On 26 January 2011 16:56, Christoph Helma <helma at in-silico.ch> wrote:

> Hi Pantelis, All,
>
> Your proposal for passing parameters seems to be generic and
> straightforward - I would suggest to move it into API 1.2.
>
> Let me try to explain my conceptions about algorithms, models and
> superservices once again to make it clearer and to avoid further
> confusion. I will try to look at it from a client point of view without
> caring about implementation details:
>
> Algorithms:
>
> Almost every algorithm depends on other algorithms (either through
> library calls or by using external REST services). For this reason it
> does not make much sense to separate "Superalgorithms" from algorithms
> (I think we have agreed on that for API 1.2).
>
> For the ToxCreate and model validation use cases we need algorithms that
> take
>  - a training dataset (with optional parameters) as input and
>  - provide a prediction model (more on its properties below) as output.
>
> As a client I do not care if the "Supermodel" is a one trick pony (with
> hardcoded sub-algorithms) or a generic workflow system as in your
> proposal, as long as it creates a prediction model from a training
> dataset. For this reason there will be no generic "Superalgorithm"
> interface; model parameters and usage will have to be documented by the
> service developers.
>

OK, this actually leaves the door open for different implementations -
either a "black box" superalgorithm with no internal details exposed, or a
more generic superalgorithm with configurable algorithms.
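Whichever implementation is chosen, the client-facing contract stays the same: POST a training dataset URI (plus optional parameters) to an algorithm URI, get a model URI back. A minimal sketch, with invented URIs and a local stub standing in for the HTTP call:

```python
# Sketch of the client-facing "superalgorithm" contract. All URIs and
# the helper below are invented for illustration; a real client would
# POST over HTTP to an OpenTox algorithm service and poll the returned
# task until the model URI is ready.

def build_model(algorithm_uri, training_dataset_uri, **parameters):
    """Stand-in for POSTing a training dataset (plus optional
    parameters) to an algorithm URI; returns the URI of the
    resulting prediction model."""
    query = "&".join(f"{k}={v}" for k, v in sorted(parameters.items()))
    # Fabricated, deterministic model URI so the contract is visible.
    return f"{algorithm_uri}/model?dataset={training_dataset_uri}&{query}"

model_uri = build_model("http://example.org/algorithm/superalgorithm",
                        "http://example.org/dataset/42",
                        min_frequency=5)
```

The point is that the client never needs to know whether the service behind `algorithm_uri` is a one-trick pony or a configurable workflow.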


>
> Models:
>
> For the ToxCreate and model validation use cases we need models that
>  - take chemical structure(s) (without additional information) as input and
>  - create a prediction dataset as output
>  - are *immutable*, i.e. there should be no possibility to modify models
> once they are created (everything else would invalidate validation results,
> and would open possibilities for cheating)
>
OK
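The immutability requirement is easy to state in code. A minimal sketch (the class and all URIs are invented; the algorithm/dataset references mirror what the model representation is said to store):

```python
from dataclasses import dataclass, FrozenInstanceError

# Sketch of the immutability requirement: once created, a model's
# identity (its URI and the algorithm/dataset it was built from)
# cannot be changed. Names and URIs are illustrative only.

@dataclass(frozen=True)
class Model:
    uri: str
    algorithm_uri: str         # reference to the training algorithm
    training_dataset_uri: str  # reference to the training dataset

m = Model("http://example.org/model/7",
          "http://example.org/algorithm/lazar",
          "http://example.org/dataset/42")

try:
    m.uri = "http://example.org/model/8"  # must fail: models are immutable
except FrozenInstanceError:
    pass  # any mutation attempt is rejected
```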


> A model can use a variety of algorithms (internal or through
> webservices), it might use other models (e.g. consensus models) or
> datasets (instance based predictions).  But as a client I do not want to
> be bothered with these details (we store references to algorithms and
> datasets in the model representation, but YMMV).


Most QSAR developers/users would like to know these details, though.


> All I need is a straightforward
> interface with compound(s) as input and a dataset as output. Can we
> agree on this interface for API 1.2?
>

This is of course useful and we can agree this is the minimal requirement
for a supermodel for API 1.2.

In addition, I would prefer the algorithm to be transparent about what it is
doing (well, to a certain extent): keeping track of which algorithms from
the OpenTox API are used internally, and making intermediate calculations
(e.g. descriptors) addressable via URIs. This will definitely help, at least
in generating QMRF reports.

Lazar may not be the most generic example of such a workflow, as AFAIK only
one feature generation algorithm is used in a Lazar model. By contrast,
descriptor-based QSARs may involve several descriptor calculation
algorithms, as well as preprocessing algorithms, and it is important to keep
track of these.
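For illustration, the bookkeeping could be as simple as attaching an ot:hasSource link to each computed feature, so that a QMRF report can enumerate every algorithm used. ot:hasSource appears in the API already; all URIs below are invented:

```python
# Sketch of provenance bookkeeping: each computed feature carries an
# ot:hasSource link back to the algorithm that produced it. URIs are
# invented examples.

features = {
    "http://example.org/feature/logP":
        {"ot:hasSource": "http://example.org/algorithm/xlogp"},
    "http://example.org/feature/tpsa":
        {"ot:hasSource": "http://example.org/algorithm/tpsa"},
    "http://example.org/feature/scaled_logP":
        {"ot:hasSource": "http://example.org/algorithm/scaling"},
}

def algorithms_used(features):
    """Collect the distinct algorithm URIs referenced via ot:hasSource,
    e.g. for a QMRF report."""
    return sorted({meta["ot:hasSource"] for meta in features.values()})
```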


> Pantelis: Your proposal seems to be focused on a generic (linear)
> workflow implementation. While it would be worthwhile to have such an
> implementation, I do not think we have to specify workflow systems at the
> API level.
> (BTW: Parallel workflows (e.g. for creating consensus models) and
> generic DAG workflows


As the proposal actually describes the "materialized" run that resulted in a
model, not the workflow description, it covers DAG workflows as well: a
single path within a directed acyclic graph is linear.

> (for experimental/data analysis that
> involves merging, splitting) could also be interesting).
>
>
Well, to be generic, workflows may not only be unidirectional but may also
contain loops, forks, joins, etc.; that, however, would lead us into the
land of workflow languages, which is better left to specific client
implementations.
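The "materialized run" point above can be sketched in a few lines: take a DAG of processing steps (step names invented here) and extract a single path through it, which is necessarily a linear chain:

```python
# A materialized run is one path through a DAG of processing steps,
# and any single path in a directed acyclic graph is linear.
# Step names are invented for illustration.

dag = {  # edges: step -> downstream steps
    "dataset": ["descriptors", "fingerprints"],
    "descriptors": ["scaling"],
    "fingerprints": ["model"],
    "scaling": ["model"],
    "model": [],
}

def one_path(dag, start, goal):
    """Depth-first search returning a single start->goal path."""
    if start == goal:
        return [start]
    for nxt in dag[start]:
        tail = one_path(dag, nxt, goal)
        if tail:
            return [start] + tail
    return None

run = one_path(dag, "dataset", "model")
# -> ['dataset', 'descriptors', 'scaling', 'model'], a linear chain
```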

To summarize: my preference would be, regardless of the superalgorithm
implementation, to keep track of which algorithms have been used (e.g. to
calculate features or to transform data in any way), via ot:hasSource or via
new properties, if necessary.


Best regards,
Nina


> Best regards,
> Christoph
> _______________________________________________
> Development mailing list
> Development at opentox.org
> http://www.opentox.org/mailman/listinfo/development
>


