[OTDev] Some things have to be clarified

Tue Feb 9 09:18:14 CET 2010

Hello All,

I would like to propose another way of handling large lists, applicable
to all OpenTox objects and independent of the amount of detail in the
representation: paging of the results.

Query parameters: pagesize and page

Thus, we'll be able to request

First 10 models:
/model?page=1&pagesize=10

Compounds #26 to #50 of /dataset/1:
/dataset/1?page=2&pagesize=25

This is a convenient and widely used technique, and it will greatly
simplify clients, which are now forced to handle the long lists and
paging themselves. It could be used together with query parameters and
any of the representation formats.
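
To make the arithmetic concrete, here is a minimal sketch in Python. It
assumes pages are numbered from 1, which is what the two examples above
imply. The function names are hypothetical and are not part of the
OpenTox API.

```python
from urllib.parse import urlencode


def page_bounds(page, pagesize):
    """Return the 1-based (first, last) item numbers covered by a page."""
    # Assumption: page numbering starts at 1, as in the examples above.
    if page < 1 or pagesize < 1:
        raise ValueError("page and pagesize must be positive integers")
    first = (page - 1) * pagesize + 1
    last = page * pagesize
    return first, last


def page_slice(items, page, pagesize):
    """Return the items on the given page (shorter or empty near the end)."""
    first, last = page_bounds(page, pagesize)
    return items[first - 1:last]


def paged_uri(path, page, pagesize):
    """Append the proposed page and pagesize query parameters to a path."""
    return path + "?" + urlencode({"page": page, "pagesize": pagesize})


print(paged_uri("/model", 1, 10))
print(paged_uri("/dataset/1", 2, 25))
print(page_bounds(2, 25))
```

With these definitions, page_bounds(2, 25) gives (26, 50), matching the
/dataset/1 example. A server could apply the same slice to whatever
list it would otherwise return in full.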

Best regards,
Nina

Nina Jeliazkova wrote:
> Hello Pantelis,
>
> chung wrote:
>   
>> Hi all,
>>  In the OpenTox API specifications it is documented that the GET method
>> on /model should return "List of model URIs or RDF representation".
>> However it is not clear what we mean by "RDF representation"; is it just
>> a URI list in RDF, or a set of model resources along with all/some of
>>   
>>     
> Just to clarify, there is no notion of a URI list in RDF. Recall that
> RDF is not a format but a data model, so it contains descriptions of
> objects and their relationships. URIs in RDF are identifiers of
> objects. Therefore, there are the following options:
> - provide only the minimum information about each object: its
> identifier (rdf:about) and rdf:type
> - provide the above plus a subset of the object's properties.
>   
>> their meta data. To be more specific, the following meta
>> information is included in the RDF representation of a single model
>> (at /model/{id} ):
>>
>> 1. title
>> 2. creator
>> 3. date
>> 4. audience
>> 5. rights
>> 6. provenance
>> 7. description
>> 8. source
>> 9. relation
>> 10. identifier
>> 11. language
>> 12. publisher
>> 13. subject
>> 14. type
>> 15. A list of the algorithm tuning parameters used to train the model
>> (if any)
>> 16. A list of independent features   
>> 17. dependent feature
>> 18. predicted feature
>> 19. status (NEW)
>>   
>>     
> 20. algorithm
>
> I would say the properties defined in OpenTox.owl that describe
> relations to algorithms and features, plus status and title, should be
> there; the others are optional.
> ot:algorithm
> ot:dependentVariables
> ot:independentVariables
> ot:predictedVariables
> dc:title
>
> The ot: properties will enable querying by feature ontologies; the
> title is for presenting something nice in the user interface.
>
>   
>> * Should all of this meta data (or some subset of it) be available
>> under /model?
>>
>> What is more, if a client performs a very specific query like:
>>
>> http://opentox.ntua.gr:3000/model?user=chung&dependent_feature=feature1&independent_feature[]=feature10&independent_feature[]=feature11&independent_feature[]=feature12&kernel=RBF&gamma_min=1&gamma_max=10  (Let me mention that queries like this or even like ?algorithm={...} are not documented or included in the API)
>>
>>   
>>     
> Exactly, so far the API doesn't specify queries for almost any of the objects.
>   
>> the server will return, say, 10 models, so it's OK to include all meta
>> data in the RDF representation. But if you navigate to
>> http://opentox.ntua.gr:3000/model you'll find yourself in front of
>> something like 56000 models. The representation of all these including
>> all meta data will be HUGE!!! Is there any point in doing this? The reasoning
>> behind the current API, and behind REST in general, is that a client can
>> get the information it needs assuming it knows where to find it. So one
>> can first ask for a URI list from the corresponding service and then,
>> for every URI in the list, perform another GET request to get the
>> representation of the underlying resource.
>>   
>>     
> There are RDF storage systems that can handle millions of triples. If
> one wants to use RDF for its original purpose (reasoning/querying based
> on links between objects), then all of the data should be retrieved and
> available in a single place.
>
> From my point of view, there is no reason to duplicate the functionality
> of uri-list in RDF. If a client would like to use uri-lists only, then
> there is text/uri-list for this purpose. The RDF representation should
> include all the information available. For example, there is not much
> sense in registering only the URI of a model in an ontology service,
> since there will be no relations to other objects (properties,
> algorithms) and no reasoning/querying will be possible. Therefore, in
> order to do meaningful queries, the information about models should be
> retrieved from the model service, either by one request or by multiple
> ones.
>
> It might be reasonable to agree on a query syntax for models, as well
> as for other objects.
>
> And finally, please note these 56000 models are not real ones, but
> were generated automatically by simulated requests. In reality there
> will never be such a number of models in a single service.
> BTW, just for comparison, in the ambit database there are currently
> 48783 features, 268813 structures, 4414311 feature values, and 1666758
> unique string values. Don't be afraid of numbers :)
>
> RDF is indeed quite verbose, but it turns out there are tricks to reduce
> the size (credits to Egon).  These are worth another email, though.
>   
>> Well I think some things have to be clarified because after all we
>> encounter serious problems designing our services (the database and web
>> part of it) since the specifications are unclear(!).
>>   
>>     
> Yes, a moving target. I guess Fastox development this week will be a
> real test of the services working together.
>
> Regards,
> Nina
>   
>> BRs,
>> Pantelis Sopasakis
>> Charalampos Chomenides
>>
>> _______________________________________________
>> Development mailing list
>> Development at opentox.org
>> http://www.opentox.org/mailman/listinfo/development
>>   
>>     