[OTDev] TUM open questions

Fri Dec 4 09:46:19 CET 2009

Dear All,

in yesterday's meeting some questions/unresolved issues came up. To
make it easier to discuss them at a later meeting, I will give a short
overview:

(1) Could one of you (maybe Nina or Christoph) briefly recap the
rationale behind the DataEntry in the RDF? (Will there be API
"access" to it?)

(2) About the API: Is there (or will there be) a Feature API? The
current state, "obsolete with RDF", still contains a lot of stuff from
version 1.0, e.g. feature_definitions.

(3) Don't we need a (REST) API to query the ontology? There is currently
no way to access the ontology via REST services. E.g. how do I (or the
GUI) get all the Algorithms (their URIs) for calculating
physico-chemical descriptors? We lost this functionality in the
1.0 -> 1.1 transition.
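
To make the kind of query meant here concrete, a minimal sketch
follows. It is not part of the current API: the in-memory registry
stands in for the ontology, and the example.org URIs, the type names
and the function name are all hypothetical.

```python
# Hypothetical stand-in for the ontology: algorithm URI -> set of type names.
# Neither the URIs nor the type names come from the actual OpenTox ontology.
ALGORITHM_TYPES = {
    "http://example.org/algorithm/1": {"DescriptorCalculation", "PhysicoChemical"},
    "http://example.org/algorithm/2": {"Learning", "Classification"},
    "http://example.org/algorithm/3": {"DescriptorCalculation", "Topological"},
}


def algorithms_of_type(registry, *required_types):
    """Return the sorted URIs of all algorithms that carry every requested type."""
    wanted = set(required_types)
    return sorted(uri for uri, types in registry.items() if wanted <= types)


# All algorithms for calculating physico-chemical descriptors.
print(algorithms_of_type(ALGORITHM_TYPES, "DescriptorCalculation", "PhysicoChemical"))
```

A REST service could expose the same filter, for example as query
parameters on a GET request; how exactly is one of the open points.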

(4) We propose to reintroduce one level of hierarchy to the algorithm
API to make a clearer statement about input and output of a POST
to /algorithm possible. We prefer to distinguish algorithms that learn a
model from algorithms that merely alter a dataset (adding or selecting
descriptors, ...). 
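
Description                | Method | URI                      | Parameters             | Result
---------------------------+--------+--------------------------+------------------------+-------------------------
Get URIs of all available  | GET    | /algorithm/learning      | -                      | List of algorithm URIs
learning algorithms        |        |                          |                        |
---------------------------+--------+--------------------------+------------------------+-------------------------
Get URIs of all available  | GET    | /algorithm/.../{id}      | -                      | List of algorithm URIs
non-learning algorithms    |        |                          |                        |
---------------------------+--------+--------------------------+------------------------+-------------------------
Get the ontology           | GET    | /algorithm/learning/{id} | -                      | Algorithm representation
representation of a        |        |                          |                        | in one of the supported
learning algorithm         |        |                          |                        | MIME-types
---------------------------+--------+--------------------------+------------------------+-------------------------
Get the ontology           | GET    | /algorithm/.../{id}      | -                      | Algorithm representation
representation of a        |        |                          |                        | in one of the supported
non-learning algorithm     |        |                          |                        | MIME-types
---------------------------+--------+--------------------------+------------------------+-------------------------
Learn a model with an      | POST   | /algorithm/learning/{id} | dataset_URI, algorithm | model URI of the learned
algorithm (regression,     |        |                          | parameters specified   | model (or task URI in
classification,            |        |                          | by service provider    | case of time consuming
clustering)                |        |                          |                        | computation)
---------------------------+--------+--------------------------+------------------------+-------------------------
Apply the algorithm        | POST   | /algorithm/.../{id}      | dataset_URI, algorithm | dataset URI (or task URI
                           |        |                          | parameters specified   | in case of time
                           |        |                          | by service provider    | consuming computation)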
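
The proposal above boils down to a few dispatch rules, which the
following sketch spells out. It only describes what each call would
return and does not implement a service. The function name is
hypothetical, and since the name of the non-learning branch is still
open, the sketch treats every segment other than "learning" as
non-learning ("preprocessing" in the usage lines is a made-up
placeholder).

```python
def describe_request(method, path):
    """Return the kind of result a request to the proposed algorithm API yields."""
    parts = [p for p in path.split("/") if p]
    if len(parts) not in (2, 3) or parts[0] != "algorithm":
        raise ValueError("expected /algorithm/<category> or /algorithm/<category>/<id>")

    is_learning = parts[1] == "learning"  # any other category counts as non-learning
    has_id = len(parts) == 3

    if method == "GET":
        # With an id: one algorithm's representation; without: the list of URIs.
        return "algorithm representation" if has_id else "list of algorithm URIs"
    if method == "POST" and has_id:
        # Learning algorithms produce a model, all others a (modified) dataset.
        return "model URI (or task URI)" if is_learning else "dataset URI (or task URI)"
    raise ValueError("unsupported combination: %s %s" % (method, path))


print(describe_request("GET", "/algorithm/learning"))
print(describe_request("POST", "/algorithm/learning/42"))
print(describe_request("POST", "/algorithm/preprocessing/7"))
```

The point of the extra hierarchy level is visible in the POST branch:
the client can tell from the URI alone whether it will get a model or
a dataset back.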

(5) At the moment we see the workflow of predicting (applying a model)
like this:
       1 - POST /model/3    dataset/1      (dataset/1 may not have all
           the descriptors needed to apply the model)
       2 - ModelWS checks which descriptors need to be calculated
       3 - POST /algorithm/<calcDesc> dataset/1       -> dataset/1
       4 - calculate predictions for dataset/1 based on model/3
       5 - POST/PUT dataset/1
This is fine. But if we want to use the same test dataset (dataset/1)
with several models (e.g. same algo but different parameters), we will
have to recalculate the missing descriptors every time. Could we add a
method/algorithm/service that transfers the features/descriptors from
one (training) dataset to another (test) dataset to avoid this? Does
this make sense?
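
A minimal sketch of the idea, assuming the calculated descriptors are
stored back into the test dataset so that later models can reuse them.
Datasets are plain dictionaries here, and the two "descriptors" are
toy stand-ins rather than real chemical descriptors; all names are
hypothetical.

```python
def ensure_descriptors(dataset, required, calculators):
    """Add missing descriptor values in place; return how many were computed."""
    computed = 0
    for compound, features in dataset.items():
        for name in required:
            if name not in features:
                features[name] = calculators[name](compound)
                computed += 1
    return computed


# Toy stand-ins for descriptor calculation services.
calculators = {
    "n_chars": len,
    "n_upper_c": lambda smiles: smiles.count("C"),
}

# Test dataset: compound (as a SMILES string) -> already available features.
test_dataset = {"CCO": {}, "CCN": {"n_chars": 3}}
required = ["n_chars", "n_upper_c"]  # descriptors used by the training dataset

first = ensure_descriptors(test_dataset, required, calculators)   # first model
second = ensure_descriptors(test_dataset, required, calculators)  # second model
print(first, second)  # 3 0
```

The second call does no work because the first one already filled in
the descriptors, which is the saving we are after when several models
share one test dataset.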

(6) Regarding AlgorithmTypes.owl: Could you explain why
ClassificationEagerSingleTarget, ... are individuals rather than classes
that a concrete algorithm, like WekaJ48, would instantiate? Furthermore,
we feel that "Multiple" would be a better name than "Many", but this is
a minor thing.

Best regards,
Tobias

-- 
Dipl.-Bioinf. Tobias Girschick

Technische Universität München
Institut für Informatik
Lehrstuhl I12 - Bioinformatik
Boltzmannstr. 3
85748 Garching b. München, Germany

Room: MI 01.09.042
Phone: +49 (89) 289-18002
Email: tobias.girschick at in.tum.de
Web: http://wwwkramer.in.tum.de/girschick