[OTDev] TUM open questions

Nina Jeliazkova nina at acad.bg
Fri Dec 4 12:39:56 CET 2009


Dear All,

I hope there will still be time for OT partners to read my reply before
the meeting starts.

Christoph Helma wrote:
> Excerpts from Tobias Girschick's message of Fri Dec 04 09:46:19 +0100 2009:
>   
>> Dear All,
>>
>> in yesterday's meeting some questions/unresolved issues came up. To
>> make it easier to discuss them later in the meeting I will give a short
>> overview:
>>
>> (1) Could one of you (maybe Nina or Christoph) shortly repeat the
>> rationale behind the DataEntry in the RDF? (Will there be an API
>> "access")
>>     
>
> Nina has explained her rationale in previous posts - I am not sure if I
> understand all of her arguments correctly.
>   
I'll try again very briefly and without formal syntax. 

To model a binary relationship, one uses

Subject-Predicate-Object

To model a ternary or higher-order relation, e.g. Compound - hasFeature
- Feature - hasValue - Value, one uses intermediate objects to group
binary relationships and then establishes another binary relationship.
Thus, the example becomes

Compound - hasFeature - [ Feature - hasValue - Value ]

or

Compound - hasFeature - FeatureValue

where FeatureValue is [ Feature - hasValue - Value ]

Modeling a Dataset is similar. We have

Dataset - hasMember - [ Compound - hasFeature - [ Feature - hasValue - Value ] ]

or

Dataset - hasMember - DataEntry

where DataEntry is [ Compound - hasFeature - FeatureValue ]

All this resolves to triples:

FeatureValue: Feature - hasValue - Value.
DataEntry: Compound - hasFeature - FeatureValue.
Dataset - hasMember - DataEntry.
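
For illustration, here is how this structure could be built with Jena (just
a sketch: the grouping follows the triples above, but the exact local names
of the ot: properties and the example URIs are my assumptions, not the
agreed API; package names are Jena 2.x):

import com.hp.hpl.jena.rdf.model.*;

public class DataEntryExample {
    static final String OT = "http://www.opentox.org/api/1.1#";

    public static void main(String[] args) {
        Model m = ModelFactory.createDefaultModel();
        m.setNsPrefix("ot", OT);

        Property hasMember  = m.createProperty(OT, "hasMember");
        Property hasFeature = m.createProperty(OT, "hasFeature");  // assumed name
        Property hasValue   = m.createProperty(OT, "hasValue");    // assumed name

        Resource compound = m.createResource("http://example.org/compound/1");
        Resource feature  = m.createResource("http://example.org/feature/1");

        // FeatureValue groups [ Feature - hasValue - Value ] as an anonymous node
        Resource fv = m.createResource(m.createResource(OT + "FeatureValue"))
            .addProperty(m.createProperty(OT, "feature"), feature)  // assumed name
            .addProperty(hasValue, m.createTypedLiteral(3.14));

        // DataEntry groups [ Compound - hasFeature - FeatureValue ]
        Resource entry = m.createResource(m.createResource(OT + "DataEntry"))
            .addProperty(m.createProperty(OT, "compound"), compound)  // assumed name
            .addProperty(hasFeature, fv);

        // Dataset - hasMember - DataEntry
        m.createResource("http://example.org/dataset/1",
                m.createResource(OT + "Dataset"))
            .addProperty(hasMember, entry);

        m.write(System.out, "N3");
    }
}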


Pantelis proposed another option yesterday, but I didn't have time to
find out how it differs; perhaps he could explain it in a similar way.


Alternatively, we could define Features as predicates, as in Compound -
Feature - Value, but this would not allow us to refer to a Feature as a
Resource later.

The components of a triple (S-P-O) should not be regarded as just
URIs, but as objects of a particular type, which happen to have
names/addresses in the form of URIs.

The subject (S) and object (O) are resources (of type rdfs:Resource) and
the predicate (P) is of type rdf:Property. In OWL, there are the derived
types owl:Class and owl:ObjectProperty / owl:DatatypeProperty. All these
terms come from the area of predicate logic. In OWL-Lite and OWL-DL
(which have stricter rules and where we can do reasoning) resources
cannot be predicates and vice versa. I have to check whether this is
allowed in OWL-Full, but the general recommendation is not to use
constructs from OWL-Full.

I think Ivelina has already got a lot of hints about what needs to be
further explained in the Protege tutorial :)

>   
>> (2) About the API: Is there (will there be) a Feature API (the current
>> state "obsolete with RDF" contains a lot of stuff from version 1.0, e.g.
>> feature_definitions).
>>     
>
> I do not think that we need a separate service for feature values, as
> these can be written as literals in the RDF - which is served through
> the dataset service. We need a service to look up features (or feature
> definitions in API 1.0) - this should be done through an ontology
> service (well-established features are covered e.g. in Blue Obelisk, but
>   
Find some ideas below.
> we need a mechanism for new developments. This can be done either through
> the ot: ontology or by the algorithms themselves; e.g. fminer will provide
> metadata for its features).
>
>   
I would propose bringing back /feature/{featureid}, returning an
http://www.opentox.org/api/1.1#Feature representation,

and being able to use these URIs as query arguments, e.g.
/dataset/1?feature_uri[]=/feature/1
>> (3) Don't we need a (REST) API to query the ontology?
>>     
>
> Yes. 
>
>   
Yes. I have been playing with some ideas today. (Disclaimer: not a
proposal yet, just a feasibility study.)

Example of a (simple) REST interface to an ontology:

http://ambit.uni-plovdiv.bg:8080/ambit2/ontology?subject=SUBJECT_QUERY&predicate=PREDICATE_QUERY&object=OBJECT_QUERY

Example of using the above query to retrieve all electronic descriptors
from the Blue Obelisk ontology:

http://ambit.uni-plovdiv.bg:8080/ambit2/ontology?object=http%3A%2F%2Fwww.blueobelisk.org%2Fontologies%2Fchemoinformatics-algorithms%2F%23electronicDescriptor

Physicochemical effects from the ECHA endpoints ontology:

http://ambit.uni-plovdiv.bg:8080/ambit2/ontology?object=http%3A%2F%2Fwww.opentox.org%2FechaEndpoints.owl%23PhysicoChemicalEffects
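
For what it's worth, composing such query URLs from code is trivial; a
small sketch (the service URL and parameter names are the ones above):

import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class OntologyQuery {
    static final String SERVICE = "http://ambit.uni-plovdiv.bg:8080/ambit2/ontology";

    // Build a query URL for the subject/predicate/object interface; null means "any"
    static String query(String subject, String predicate, String object)
            throws UnsupportedEncodingException {
        StringBuilder url = new StringBuilder(SERVICE);
        char sep = '?';
        if (subject != null) {
            url.append(sep).append("subject=").append(URLEncoder.encode(subject, "UTF-8"));
            sep = '&';
        }
        if (predicate != null) {
            url.append(sep).append("predicate=").append(URLEncoder.encode(predicate, "UTF-8"));
            sep = '&';
        }
        if (object != null) {
            url.append(sep).append("object=").append(URLEncoder.encode(object, "UTF-8"));
        }
        return url.toString();
    }

    public static void main(String[] args) throws Exception {
        // All electronic descriptors from the Blue Obelisk ontology
        System.out.println(query(null, null,
            "http://www.blueobelisk.org/ontologies/chemoinformatics-algorithms/#electronicDescriptor"));
    }
}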

It could later be extended with the ability to merge these ontologies with
an arbitrary RDF graph and then query over the entire graph. The current
implementation is very simple: just one Jena OntModel, holding all three
ontologies in memory. It is straightforward to read other RDF graphs
from elsewhere into the same model. Of course, for this to be scalable,
the in-memory model should be replaced with something else (e.g. a
persistent store).

There is another illustration that one can read and query an arbitrary RDF
graph - the RDF playground at
http://ambit.uni-plovdiv.bg:8080/ambit2-www/ontology/test
It reads RDF output from any URL, displays the statements and verifies
whether there are objects from the OpenTox namespace (i.e.
http://www.opentox.org/api/1.1# ).

http://ambit.uni-plovdiv.bg:8080/ambit2/ontology/test?search=http%3A%2F%2Fambit.uni-plovdiv.bg%3A8080%2Fambit2%2Fmodel

You can paste any URI into the search text box and verify whether there
are OpenTox objects declared in the RDF.
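
What the test page does can be approximated in a few lines of Jena (a
sketch of the idea, not the actual implementation):

import com.hp.hpl.jena.rdf.model.*;
import com.hp.hpl.jena.vocabulary.RDF;

public class OpenToxCheck {
    static final String OT = "http://www.opentox.org/api/1.1#";

    public static void main(String[] args) {
        Model m = ModelFactory.createDefaultModel();
        // Read RDF output from any URL into the model
        m.read("http://ambit.uni-plovdiv.bg:8080/ambit2/model");

        // List all rdf:type statements whose class is in the OpenTox namespace
        StmtIterator it = m.listStatements(null, RDF.type, (RDFNode) null);
        while (it.hasNext()) {
            Statement s = it.nextStatement();
            if (s.getObject().isURIResource()
                    && s.getResource().getURI().startsWith(OT)) {
                System.out.println(s.getSubject() + " rdf:type " + s.getResource());
            }
        }
    }
}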

To summarize:

A very quick possible implementation of an Ontology service:

1) An ontology service containing fixed ontologies (Blue Obelisk, toxicity
endpoints, algorithm types). This can be as simple as exposing the RDF of
these under a single URL - a straightforward implementation; it can be one
service or multiple services across the partners, provided the ontologies
are the same.
2) Everything which is dynamic (e.g. Features, Algorithms, Models) is
handled via REST services, as currently.
3) The lookup (e.g. for an algorithm type) is a three-step process (a
SPARQL sketch for step (a) is given after the list):
a) Query the ontology service for a particular type of algorithm (simple
query or SPARQL, user interface)
http://ambit.uni-plovdiv.bg:8080/ambit2/ontology?object=http%3A%2F%2Fwww.opentox.org%2Falgorithms.owl%23AlgorithmType

b) Find the desired algorithm type (via the user interface, or
automatically)
http://ambit.uni-plovdiv.bg:8080/ambit2/ontology?subject=http%3A%2F%2Fwww.opentox.org%2Falgorithms.owl%23RegressionEagerSingleTarget

c) Query the Algorithm service for algorithm instances that are owl:sameAs
or opentox:isA opentox:RegressionEagerSingleTarget. This is a very easy
query - it need not involve any RDF processing, but that is up to the
implementation. We would just need a simple query extension for the
Algorithm API, as well as for Model and Feature.
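
As promised above, step (a) in SPARQL via Jena ARQ could look roughly like
this (a sketch; it assumes algorithms.owl has already been read into the
model):

import com.hp.hpl.jena.query.*;
import com.hp.hpl.jena.rdf.model.Model;

public class AlgorithmTypeLookup {
    // List everything related to the given type URI, i.e. the ?object= query above
    static void listOfType(Model model, String typeUri) {
        String sparql = "SELECT ?s ?p WHERE { ?s ?p <" + typeUri + "> }";
        QueryExecution qe =
            QueryExecutionFactory.create(QueryFactory.create(sparql), model);
        try {
            ResultSet rs = qe.execSelect();
            while (rs.hasNext()) {
                QuerySolution row = rs.nextSolution();
                System.out.println(row.getResource("s") + " " + row.getResource("p"));
            }
        } finally {
            qe.close();
        }
    }
}

Calling listOfType(model, "http://www.opentox.org/algorithms.owl#AlgorithmType")
would then mirror the first query URL above.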

A more advanced implementation would be to gather everything from the OT
services into a single ontology space, but IMHO that needs more thought
to be designed right.

>> There is currently
>> no way to access the ontology via REST services. E.g. how do I (or the
>> GUI) get all the Algorithms (their URIs) for calculating
>> physico-chemical descriptors? We lost this functionality in 1.0->1.1
>> transition
>>
>> (4) We propose to reintroduce one level of hierarchy to the algorithm
>> API to make a clearer statement about input and output of a POST
>> to /algorithm possible. We prefer to distinguish algorithms that learn a
>> model from algorithms that merely alter a dataset (adding or selecting
>> descriptors, ...). 
>>     
>
> I do not think that we have to expose that through the API. Just ask
> the algorithm service for RDF with metadata about your algorithms and
> you can make very flexible queries.
>
>   
I have a slight preference for keeping the distinction - these are just
different types of resources (maybe subclasses in opentox.owl?).
>> (5) At the moment we see the workflow of predicting (applying a model)
>> like this
>>        1 - POST /model/3    dataset/1      (the dataset 1 may not have
>> all the necessary descriptors needed to apply the model)
>>        2 - ModelWS checks which descriptors need to be calculated
>>        3 - POST /algorithm/<calcDesc> dataset/1       -> dataset/1
>>        4 - calculate predictions for dataset/1 based on model/3
>>        5 - POST/PUT dataset/1
>>      This is fine. But in case we want to use the same test dataset
>> (dataset/1) with several models (e.g. same algo but different
>> parameters) we will have to recalculate the missing descriptors every
>> time. Could we add a method/algorithm/service that transfers the
>> features/descriptors from one (training) dataset to another (test)
>> dataset to avoid this? Does this make sense?
>>     
>
> I use the following workflow:
>
> POST /descriptor_calculation training_dataset                # creates feature_dataset
> POST /algorithm              training_dataset feature_dataset # creates model
> POST /model                  compound_uri                     # creates prediction
> or
> POST /model                  prediction_dataset               # creates dataset with predictions
>
> This is fairly straightforward and allows you to reuse/exchange descriptors.
>   
Yes, but a straightforward implementation duplicates information (the
training and feature datasets are not very different).
>> (6) Regarding the AlgorithmTypes.owl: Could you explain why
>> ClassificationEagerSingleTarget, ... are Individuals and not an
>> instantiation of it, like WekaJ48? Furthermore we feel that it would be
>> better called Multiple not Many, but this is a minor thing.
>>     
>
> Nina?
>   
Yes. In fact I was playing with both options (subclasses and
individuals). The reason for individuals is that if one needs to relate
an Algorithm via the property owl:sameAs or opentox:isA, this can be
done only through individuals, not classes. One can't say "Class_X
owl:sameAs Class_Y" (at least not in OWL-DL).

There are also ways to use subclasses (as in the Protege OWL tutorial);
perhaps Ivelina will include some examples in her presentation. It was
just my perception that relationships via the properties owl:sameAs and
ot:isA, together with individuals, are a better fit for our case.
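
A tiny Jena sketch of the individuals option (the second individual and
its URI are hypothetical, only for illustration):

import com.hp.hpl.jena.ontology.*;
import com.hp.hpl.jena.rdf.model.ModelFactory;
import com.hp.hpl.jena.vocabulary.OWL;

public class SameAsExample {
    static final String ALG = "http://www.opentox.org/algorithms.owl#";

    public static void main(String[] args) {
        OntModel m = ModelFactory.createOntologyModel();
        OntClass algorithmType = m.createClass(ALG + "AlgorithmType");

        // Individuals, not subclasses: owl:sameAs relates individuals in OWL-DL
        Individual a = m.createIndividual(ALG + "RegressionEagerSingleTarget",
                algorithmType);
        Individual mine = m.createIndividual("http://example.org/myRegression",
                algorithmType); // hypothetical
        mine.addProperty(OWL.sameAs, a);

        m.write(System.out, "N3");
    }
}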


Finally, I would like to stress that using a common namespace and
correctly declaring OpenTox objects when generating RDF is crucial. The
proposed namespace is http://www.opentox.org/api/1.1# , but this is of
course open. Please declare the objects you use with rdf:type and the
relevant class, otherwise we might again end up with incompatible
applications. (In Jena one just needs to use getOntClass - see the
examples on the site and the sketch below.)
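
For example (a minimal sketch; in real code the class would come from
reading opentox.owl into the model, here it is created so the snippet is
self-contained):

import com.hp.hpl.jena.ontology.*;
import com.hp.hpl.jena.rdf.model.ModelFactory;

public class DeclareDataset {
    static final String OT = "http://www.opentox.org/api/1.1#";

    public static void main(String[] args) {
        OntModel m = ModelFactory.createOntologyModel();

        // getOntClass returns the class if the ontology is loaded; fall back to creating it
        OntClass dataset = m.getOntClass(OT + "Dataset");
        if (dataset == null) dataset = m.createClass(OT + "Dataset");

        // The crucial part: declare the resource with rdf:type and the OpenTox class
        m.createIndividual("http://example.org/dataset/1", dataset);

        m.write(System.out, "RDF/XML-ABBREV");
    }
}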

Best regards,
Nina
> Best regards,
> Christoph
> _______________________________________________
> Development mailing list
> Development at opentox.org
> http://www.opentox.org/mailman/listinfo/development
>   



