[OTDev] Some things have to be clarified

Mon Feb 8 20:25:04 CET 2010

Hello Pantelis,

chung wrote:
> Hi all,
>  In the OpenTox API specifications it is documented that the GET method
> on /model should return "List of model URIs or RDF representation".
> However it is not clear what we mean by "RDF representation"; is it just
> a URI list in RDF, or a set of model resources along with all/some of
>   
Just to clarify, there is no notion of a URI list in RDF.  Recall that
RDF is not a format but a data model, so what it holds are descriptions
of objects and their relationships.  URIs in RDF are identifiers of
objects.  Therefore, there are the following options:
- provide only the minimum information about each object: its
identifier (rdf:about) and its rdf:type
- provide the above plus a set (subset) of the object's properties.
> their meta data. In order to be more specific, the following meta
> information are included in the RDF representation of a single model
> (at /model/{id} ):
>
> 1. title
> 2. creator
> 3. date
> 4. audience
> 5. rights
> 6. provenance
> 7. description
> 8. source
> 9. relation
> 10. identifier
> 11. language
> 12. publisher
> 13. subject
> 14. type
> 15. A list of the algorithm tuning parameters used to train the model
> (if any)
> 16. A list of independent features   
> 17. dependent feature
> 18. predicted feature
> 19. status (NEW)
>   
20. algorithm

I would say the properties defined in OpenTox.owl that describe
relations to algorithms and features, plus status and title, should be
there; the others are optional.
ot:algorithm
ot:dependentVariables
ot:independentVariables
ot:predictedVariables
dc:title

The first ones will enable querying by feature ontologies; the title is
for presenting something nice in the user interface.
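
To make the two options described earlier concrete, here is a minimal
sketch (not the agreed OpenTox representation) that builds an RDF/XML
description of one model, either with identifier and type only or with
some of the properties listed above. The ot: namespace URI, the ot:Model
class name, the function name and the example.org URIs are assumptions
for illustration; the real names should be taken from OpenTox.owl.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
OT = "http://www.opentox.org/api/1.1#"  # assumed; check OpenTox.owl

for prefix, ns in (("rdf", RDF), ("dc", DC), ("ot", OT)):
    ET.register_namespace(prefix, ns)


def model_rdf(model_uri, title=None, algorithm_uri=None, independent=()):
    """Return RDF/XML for one model (hypothetical helper)."""
    root = ET.Element(f"{{{RDF}}}RDF")
    # A typed node element: <ot:Model rdf:about="..."/> carries both the
    # identifier and the statement "rdf:type ot:Model".
    model = ET.SubElement(root, f"{{{OT}}}Model",
                          {f"{{{RDF}}}about": model_uri})
    if title is not None:
        ET.SubElement(model, f"{{{DC}}}title").text = title
    if algorithm_uri is not None:
        ET.SubElement(model, f"{{{OT}}}algorithm",
                      {f"{{{RDF}}}resource": algorithm_uri})
    for feature_uri in independent:
        ET.SubElement(model, f"{{{OT}}}independentVariables",
                      {f"{{{RDF}}}resource": feature_uri})
    return ET.tostring(root, encoding="unicode")


# Option 1: identifier and type only.
minimal = model_rdf("http://example.org/model/1")

# Option 2: the same, plus a subset of the properties.
full = model_rdf(
    "http://example.org/model/1",
    title="Demo model",
    algorithm_uri="http://example.org/algorithm/svm",
    independent=["http://example.org/feature/10",
                 "http://example.org/feature/11"],
)
```

The minimal form is what a URI list would degenerate to in RDF; only the
second form gives a store something to query on.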

> * Should all these meta (or some subset of these ) be available
> under /model ?
>
> What is more if a client performs a very specific query like:
>
> http://opentox.ntua.gr:3000/model?user=chung&dependent_feature=feature1&independent_feature[]=feature10&independent_feature[]=feature11&independent_feature[]=feature12&kernel=RBF&gamma_min=1&gamma_max=10  (Let me mention that queries like this or even like ?algorithm={...} are not documented or included in the API)
>
>   
Exactly, so far the API doesn't specify queries for almost any of the
objects.
> the server will return, say, 10 models, so its OK to include all meta
> data in the RDF representation. But if you navigate to
> http://opentox.ntua.gr:3000/model you'll find yourself in front of
> something like 56000 models. The representation of all these including
> all meta data will be HUGE!!! Is there a point to do this? The reasoning
> behind the current API, and behind REST in general, is that a client can
> get the information it needs assuming it knows where to find it. So one
> can first ask for a URI list from the corresponding service and then,
> for every URI in the list, perform another GET request to get the
> representation of the underlying resource.
>   
There are RDF storage systems that can handle millions of triples.  If
one wants to use RDF for its original purpose (reasoning/querying based
on links between objects), then all of the data should be retrieved and
available in a single place.

From my point of view, there is no reason to duplicate the functionality
of uri-list in RDF. If a client would like to use URI lists only, then
there is text/uri-list for this purpose. The RDF representation should
include all the information available.  For example, there is not much
sense in registering only the URI of a model in the ontology service,
since there will be no relations to other objects (properties,
algorithms) and no reasoning/querying will be possible. Therefore, in
order to do meaningful queries, the information about the models should
be retrieved from the model service, either by one request or by
multiple ones.
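
The "multiple requests" route can be sketched as follows: read the
text/uri-list body, then fetch the RDF of every listed model. This is
only an illustration; the function names are hypothetical, and the
fetch callable stands in for whatever HTTP client is used, so nothing
here is tied to a particular service.

```python
def parse_uri_list(body):
    """Extract URIs from a text/uri-list body.

    One URI per line; lines starting with '#' are comments.
    """
    uris = []
    for line in body.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            uris.append(line)
    return uris


def fetch_all(list_body, fetch):
    """Map each listed URI to the representation returned by fetch(uri).

    fetch is supplied by the caller, e.g. a function doing an HTTP GET
    with an RDF media type in the Accept header.
    """
    return {uri: fetch(uri) for uri in parse_uri_list(list_body)}


listing = (
    "# models\r\n"
    "http://example.org/model/1\r\n"
    "http://example.org/model/2\r\n"
)
# Stand-in fetcher for the example; a real one would go over HTTP.
models = fetch_all(listing, lambda uri: "<rdf for %s>" % uri)
```

The alternative, a single request returning the full RDF of all models,
trades these N round trips for one larger response.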

It might be reasonable to agree on a query syntax for models, as well
as for other objects.
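
As a starting point for that discussion, here is a small sketch of how
a client could build such a query. The parameter names are simply the
ones from Pantelis's example URL; none of them is part of the API, and
the helper name is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def model_query_url(base, **params):
    """Build a query URL; list values become repeated parameters."""
    # doseq=True turns {"k": ["a", "b"]} into k=a&k=b.
    return base + "?" + urlencode(params, doseq=True)


url = model_query_url(
    "http://opentox.ntua.gr:3000/model",
    **{
        "dependent_feature": "feature1",
        # urlencode percent-encodes the brackets as %5B%5D.
        "independent_feature[]": ["feature10", "feature11"],
        "kernel": "RBF",
    },
)

# What a server would see after decoding the query string.
decoded = parse_qs(urlsplit(url).query)
```

Whatever syntax is agreed on, repeated parameters for multi-valued
properties (as above) are one option; a single comma-separated value
would be another.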

And finally, please note that these 56000 models are not real ones, but
were generated automatically by simulated requests.  In reality there
will never be such a number of models in a single service.
BTW, just for comparison, in the ambit database there are currently
48783 features, 268813 structures, 4414311 feature values and 1666758
unique string values.  Don't be afraid of numbers :)

RDF is indeed quite verbose, but it turns out there are tricks to reduce
the size (credits to Egon).  These are worth another email, though.
> Well I think some things have to be clarified because after all we
> encounter serious problems designing our services (the database and web
> part of it) since the specifications are unclear(!).
>   
Yes, a moving target. I guess the Fastox development this week will be
a real test of the services working together.

Regards,
Nina
> BRs,
> Pantelis Sopasakis
> Charalampos Chomenides