[OTDev] TUM open questions

Christoph Helma helma at in-silico.de
Fri Dec 4 10:21:30 CET 2009


Excerpts from Tobias Girschick's message of Fri Dec 04 09:46:19 +0100 2009:
> Dear All,
> 
> in our meeting yesterday some questions/unresolved issues came up. To
> make it easier to discuss them later in the meeting I will give a short
> overview:
> 
> (1) Could one of you (maybe Nina or Christoph) shortly repeat the
> rationale behind the DataEntry in the RDF? (Will there be API
> access?)

Nina has explained her rationale in previous posts - I am not sure if I
understand all of her arguments correctly.

> (2) About the API: Is there (will there be) a Feature API (the current
> state "obsolete with RDF" contains a lot of stuff from version 1.0, e.g.
> feature_definitions).

I do not think that we need a separate service for feature values, as
these can be written as literals in the RDF, which is served through
the dataset service. We do need a service to look up features
(feature_definitions in API 1.0) - this should be done through an
ontology service. Well-established features are covered e.g. by the
Blue Obelisk ontologies, but we need a mechanism for new developments.
This can be done either through the ot: ontology or by the algorithms
themselves; fminer, e.g., will provide metadata for its features.
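To make the "feature values as literals" point concrete, here is a
minimal Turtle sketch of how a data entry could carry a feature value
inline in the dataset RDF. All URIs are illustrative, and the ot:
property names are assumptions based on the API drafts, not the final
vocabulary:

```turtle
# Hypothetical sketch - URIs and ot: property names are illustrative.
@prefix ot:  <http://www.opentox.org/api/1.1#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://example.org/dataset/1/dataentry/1>
    ot:compound <http://example.org/compound/1> ;
    ot:values [
        ot:feature <http://example.org/feature/XLogP> ;
        # the value itself is just a typed literal, no extra service needed
        ot:value "2.45"^^xsd:double
    ] .
```

The feature URI can then be resolved against the ontology service for
its definition, while the value travels with the dataset.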

> 
> (3) Don't we need a (REST) API to query the ontology?

Yes. 

> There is currently
> no way to access the ontology via REST services. E.g. how do I (or the
> GUI) get all the Algorithms (their URIs) for calculating
> physico-chemical descriptors? We lost this functionality in the
> 1.0->1.1 transition
> 
> (4) We propose to reintroduce one level of hierarchy to the algorithm
> API to make a clearer statement about input and output of a POST
> to /algorithm possible. We prefer to distinguish algorithms that learn a
> model from algorithms that merely alter a dataset (adding or selecting
> descriptors, ...). 

I do not think that we have to expose that through the API. Just ask
the algorithm service for an RDF representation with metadata about
your algorithms and you can make very flexible queries.
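As an example of such a query, a client could select all descriptor
calculation algorithms from the service's RDF with SPARQL. This is a
sketch only - the ota: class name is an assumption based on
AlgorithmTypes.owl, not a confirmed URI:

```sparql
# Hypothetical sketch: find all algorithms typed as descriptor
# calculation algorithms in the algorithm service's RDF metadata.
PREFIX ota: <http://www.opentox.org/algorithmTypes.owl#>

SELECT ?algorithm WHERE {
    ?algorithm a ota:DescriptorCalculation .
}
```

This would also answer the GUI use case from (3) without hard-coding a
hierarchy into the REST paths.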

> (5) At the moment we see the workflow of predicting (applying a model)
> like this
>        1 - POST /model/3    dataset/1      (the dataset 1 may not have
> all the necessary descriptors needed to apply the model)
>        2 - ModelWS checks which descriptors need to be calculated
>        3 - POST /algorithm/<calcDesc> dataset/1       -> dataset/1
>        4 - calculate predictions for dataset/1 based on model/3   
>        5 - POST/PUT dataset/1
>      This is fine. But in case we want to use the same test dataset
> (dataset/1) with several models (e.g. same algo but different
> parameters) we will have to recalculate the missing descriptors every
> time. Could we add a method/algorithm/service that transfers the
> features/descriptors from one (training) dataset to another (test)
> dataset to avoid this? Does this make sense?

I use the following workflow:

POST /descriptor_calculation training_dataset                 # creates feature_dataset
POST /algorithm              training_dataset feature_dataset # creates model
POST /model                  compound_uri                     # creates prediction
or
POST /model                  prediction_dataset               # creates dataset with predictions

This is fairly straightforward and allows you to reuse/exchange descriptors.
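Regarding the recalculation concern in (5): if datasets expose their
features in RDF, the model service could query for the features that
still need to be calculated instead of recomputing everything. A
hedged SPARQL sketch (the direct dataset-to-feature ot:feature link is
a simplification of the real DataEntry structure, and the URIs are
illustrative):

```sparql
# Hypothetical sketch: features in the training feature_dataset that
# are missing from the test dataset, i.e. the only ones to calculate.
PREFIX ot: <http://www.opentox.org/api/1.1#>

SELECT ?feature WHERE {
    <http://example.org/feature_dataset/1> ot:feature ?feature .
    FILTER NOT EXISTS {
        <http://example.org/dataset/2> ot:feature ?feature .
    }
}
```

With this, running several models against the same test dataset only
triggers descriptor calculation once per missing feature.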
> 
> (6) Regarding the AlgorithmTypes.owl: Could you explain why
> ClassificationEagerSingleTarget, ... are Individuals and not an
> instantiation of it, like WekaJ48? Furthermore we feel that it would be
> better called Multiple not Many, but this is a minor thing.

Nina?

Best regards,
Christoph


