[OTDev] Wrapper/Super Services and Descriptor Calculation

Christoph Helma helma at in-silico.ch
Thu Feb 3 15:08:35 CET 2011


> Hi Christoph, All,
> 
> On 26 January 2011 16:56, Christoph Helma <helma at in-silico.ch> wrote:
> 
> > Hi Pantelis, All,
> >
> > Your proposal for passing parameters seems to be generic and
> > straightforward - I would suggest moving it into API 1.2.
> >
> > Let me try to explain my conception of algorithms, models and
> > superservices once again, to make it clearer and to avoid further
> > confusion. I will try to look at it from a client's point of view,
> > without caring about implementation details:
> >
> > Algorithms:
> >
> > Almost every algorithm depends on other algorithms (either through
> > library calls or by using external REST services). For this reason it
> > does not make much sense to separate "Superalgorithms" from algorithms
> > (I think we have agreed on that for API 1.2).
> >
> > For the ToxCreate and model validation use cases we need algorithms that
> > take
> >  - a training dataset (with optional parameters) as input and
> >  - provide a prediction model (more on its properties below) as output.
> >
> > As a client I do not care if the "Superalgorithm" is a one-trick pony (with
> > hardcoded sub-algorithms) or a generic workflow system as in your
> > proposal, as long as it creates a prediction model from a training
> > dataset. For this reason there will be no generic "Superalgorithm"
> > interface; model parameters and usage will have to be documented by the
> > service developers.
> >
> 
> OK, this actually leaves the door open for different implementations -
> either a "black box" superalgorithm with no internal details exposed, or a
> more generic superalgorithm with configurable algorithms.

Yes, that was the intention.
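
To illustrate the intended client view, here is a minimal sketch in
Python (using the requests library). The service URI and the parameter
names (dataset_uri, prediction_feature) are assumptions for
illustration, not a fixed part of the API:

    import requests

    # Hypothetical algorithm service; a "black box" superalgorithm and a
    # configurable workflow system should look the same from here.
    algorithm_uri = "http://example.org/algorithm/lazar"

    # POST a training dataset (plus optional parameters) to the algorithm
    # and receive the URI of the newly created prediction model.
    response = requests.post(algorithm_uri, data={
        "dataset_uri": "http://example.org/dataset/training",
        "prediction_feature": "http://example.org/feature/LC50",
    })
    model_uri = response.text.strip()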
> 
> 
> >
> > Models:
> >
> > For the ToxCreate and model validation use cases we need models that
> >  - take chemical structure(s) (without additional information) as input and
> >  - create a prediction dataset as output
> >  - are *immutable*, i.e. there should be no possibility to modify models
> > once they are created (everything else would invalidate validation results
> > and would open possibilities for cheating)
> >
> OK
> 
> 
> > A model can use a variety of algorithms (internal or through
> > webservices), it might use other models (e.g. consensus models) or
> > datasets (instance-based predictions). But as a client I do not want to
> > be bothered with these details (we store references to algorithms and
> > datasets in the model representation, but YMMV).
> 
> 
> Most QSAR developers/users would like to know details though.

Which can be delivered in the model representation.

> > All I need is a straightforward
> > interface with compound(s) as input and a dataset as output. Can we
> > agree on this interface for API 1.2?
> >
> 
> This is of course useful, and we can agree that this is the minimal
> requirement for a supermodel for API 1.2.
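
As a minimal sketch of that interface (again in Python with requests;
the parameter name compound_uri and all URIs are assumptions):

    import requests

    # Hypothetical, immutable model created by a training run as above.
    model_uri = "http://example.org/model/1"

    # POST compound(s) to the model and receive the URI of a freshly
    # created prediction dataset.
    response = requests.post(model_uri, data={
        "compound_uri": "http://example.org/compound/benzene",
    })
    prediction_dataset_uri = response.text.strip()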
> 
> In addition I would prefer the algorithm to be transparent in what it is
> doing (well, to a certain extent): keeping track of which algorithms from
> the OpenTox API are used internally, and making intermediate
> calculations (e.g. descriptors) addressable via URIs. This will
> definitely help at least in generating QMRF reports.

I think this information should be part of the model's metadata.
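
For example, the model representation could carry references like the
following (sketched here as a plain Python dict; ot:algorithm and
ot:trainingDataset exist in the OpenTox ontology, the URIs are
hypothetical):

    # Metadata a client could read from the model representation to see
    # which algorithm and training data the model was built from, and
    # which descriptors (addressable via URIs) it uses.
    model_metadata = {
        "ot:algorithm": "http://example.org/algorithm/lazar",
        "ot:trainingDataset": "http://example.org/dataset/training",
        "ot:independentVariables": [
            "http://example.org/feature/XLogP",
        ],
    }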

> 
> Lazar may not be the most generic example for such a workflow, as AFAIK a
> single Lazar model uses only one feature generation algorithm.

This is no longer true: the most recent lazar version has a fixed
(read-across-like) prediction workflow:

search for similar compounds -> create a local model with the similar compounds -> predict the query compound with the local model

But the algorithms for similarity calculation (which includes feature
calculation) and (local) model creation are freely configurable during
model creation. So you can use e.g. Euclidean similarity with phys/chem
properties for the similarity calculations and ANNs for the local
regression models.
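
As a (simplified, hypothetical) sketch of this workflow, with the
similarity and local-model algorithms passed in as configurable
functions:

    def lazar_predict(query, training_data, similarity, build_local_model,
                      threshold=0.3):
        """Fixed lazar prediction workflow; only the similarity and
        local-model algorithms are configurable, the workflow is not."""
        # 1. search for compounds similar to the query compound
        neighbors = [(compound, activity)
                     for compound, activity in training_data
                     if similarity(query, compound) > threshold]
        # 2. create a local model from the similar compounds only
        local_model = build_local_model(neighbors)
        # 3. predict the query compound with the local model
        return local_model(query)

Here similarity could be e.g. a Euclidean similarity over phys/chem
properties, and build_local_model could train an ANN regression model.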

> In contrast, descriptor-based QSARs may involve several descriptor
> calculation algorithms, as well as preprocessing algorithms, and it
> is important to keep track of these.

> 
> > Pantelis: Your proposal seems to be focused on a generic (linear)
> > workflow implementation. While it would be worthwhile to have such an
> > implementation, I do not think we have to specify workflow systems at
> > the API level.
> > (BTW: Parallel workflows (e.g. for creating consensus models) and
> > generic DAG workflows (for experimental/data analysis that involves
> > merging, splitting) could also be interesting.)
> 
> As the proposal actually describes the "materialized" run that resulted in a
> model, not the workflow description, it covers DAG workflows as well, as a
> single path within a directed acyclic graph is a linear one.
> 
> Well, to be generic, workflows may not only be unidirectional, but may
> contain loops, forks, joins, etc., but this will lead us to the land of
> workflow languages, which is better left to specific client implementations.
> 
> To summarize: my preference would be, regardless of the superalgorithm
> implementation, to keep track of which algorithms have been used (e.g. to
> calculate features or to transform data in any way) - via ot:hasSource,
> or via new properties, if necessary.
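
For illustration, such tracking could look like this at the feature
level (ot:hasSource is the existing property; all URIs and the scaling
step are hypothetical):

    # Each calculated feature points back to the algorithm that produced
    # it, so a client can reconstruct the descriptor calculation and
    # preprocessing steps behind a model.
    feature_metadata = {
        "http://example.org/feature/XLogP": {
            "ot:hasSource": "http://example.org/algorithm/XLogP",
        },
        "http://example.org/feature/XLogP_scaled": {
            "ot:hasSource": "http://example.org/algorithm/scaling",
        },
    }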


> 
> 
> Best regards,
> Nina
> 
> 
> > Best regards,
> > Christoph


