[OTDev] Some Questions

chung chvng at mail.ntua.gr
Mon Dec 21 11:01:23 CET 2009


Hi Nina,

On Mon, 2009-12-21 at 01:36 +0200, Nina Jeliazkova wrote:

> Dear Pantelis, All,
> 
> chung wrote: 
> 
> > On Fri, 2009-12-18 at 11:59 +0200, Nina Jeliazkova wrote:
> >   
> > 
> > > Dear Pantelis, All,
> > > 
> > > chung wrote:
> > >     
> > > 
> > > > However...
> > > > 
> > > > * When building a new model, the service has to generate a new feature
> > > > that will be used as a pointer to the predictions made by this model. Is
> > > > this operation implemented?
> > > > 
> > > >   
> > > >       
> > > 
> > > Since we currently have no API for creating new features (these
> > > are more or less embedded in datasets), there are two options I can
> > > think of:
> > > 
> > > 1) Extend the API to include Feature creation (independent of a
> > > dataset), e.g. /feature POST rdf-representation-of-a-feature  (similar
> > > to feature_definition POST in API 1.0)
> > > 
> > >     
> > 
> > 
> > The current version of the API (see
> > http://opentox.org/dev/apis/api-1.1/Feature ) includes that operation.
> > The client posts an RDF representation and creates a new feature URI.
> > 
> >   
> 
> OK. This is now implemented in
> http://ambit.uni-plovdiv.bg:8080/ambit2/feature . POST any RDF
> representation of a Feature object, as specified in opentox.owl, with
> application/rdf+xml or text/n3 content type to create a new
> feature.
> 
> > > 2) Embed the RDF representation of the new feature into the RDF
> > > representation of the dataset, which is then POST-ed to a dataset service. 
> > > 
> > > e.g. if a feature is described with triples and there is no feature
> > > URI, the dataset service will assume a new feature is to be created;
> > > 
> > >     
> > 
> > 
> > That's also a solution, but the embedded feature will not be globally
> > accessible. I think the first solution is better. 
> > 
> >   
> 
> I was assuming the dataset service would create a feature with a globally
> accessible URI. This could also be used with
> http://ambit.uni-plovdiv.bg:8080/ambit2/dataset
> 
> To create a dataset, post an RDF representation of a dataset to
> http://ambit.uni-plovdiv.bg:8080/ambit2/dataset with content-type
> application/rdf+xml , or text/n3. 
> 
> > > In fact the two options are not mutually exclusive, with the first more
> > > generic and the latter requiring a single POST call to the dataset
> > > service, instead of two (one for feature creation and one for posting a
> > > dataset).
> > > 
> > >     
> > 
> > 
> > Indeed in the first case we need two requests but the feature creation
> > will not be very time consuming and the Content-length the client has to
> > post is not that big, so I think the first option is not prohibitive. 
> > 
> >   
> > 
> > > Are there any preferences?
> > > 
> > >     
> > 
> > 
> > I'm in favor of the first solution. So if there are no objections I
> > think we should proceed this way (because we're running out of time...)
> > 
> >   
> 
> OK, please test any of the above.
> 


I managed to create some features on the ambit server using the
following curl command:

curl -v -i -X POST -H 'Content-type:application/rdf+xml' -d '
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"  
xmlns:ot="http://www.opentox.org/api/1.1#"    
xmlns:j.0="http://purl.org/net/nknouf/ns/bibtex#"    
xmlns:owl="http://www.w3.org/2002/07/owl#"     
xmlns:dc="http://purl.org/dc/elements/1.1/"     
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"     
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" >    
<rdf:Description
rdf:about="http://www.opentox.org/api/1.1#hasSource">      
<rdf:type
rdf:resource="http://www.w3.org/2002/07/owl#ObjectProperty"/>   
</rdf:Description>   
<rdf:Description rdf:about="http://www.opentox.org/api/1.1#units">     
<rdf:type
rdf:resource="http://www.w3.org/2002/07/owl#DatatypeProperty"/>   
</rdf:Description>    
<rdf:Description
rdf:about="http://ambit.uni-plovdiv.bg:8080/ambit2/reference/11889">    
<rdfs:seeAlso>http://sth.com/feature/0</rdfs:seeAlso>     
<dc:title>http://sth.com/feature/0</dc:title>      
<dc:identifier
rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">http://ambit.uni-plovdiv.bg:8080/ambit2/reference/11889</dc:identifier>   
<rdf:type
rdf:resource="http://purl.org/net/nknouf/ns/bibtex#Entry"/>    
</rdf:Description>   
<rdf:Description
rdf:about="http://www.opentox.org/api/1.1#Feature">     
<rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Class"/>   
</rdf:Description>   
<rdf:Description
rdf:about="http://purl.org/net/nknouf/ns/bibtex#Entry">     
<rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Class"/>   
</rdf:Description>   
<rdf:Description
rdf:about="http://ambit.uni-plovdiv.bg:8080/ambit2/feature/13001">     
<dc:type>http://www.w3.org/2001/XMLSchema#string</dc:type>     
<ot:hasSource
rdf:resource="http://ambit.uni-plovdiv.bg:8080/ambit2/reference/11889"/>     
<owl:sameAs>http://www.opentox.org/api/1.1#http://sth.com/feature/0</owl:sameAs>     
<ot:units></ot:units>     
<dc:title>http://sth.com/feature/0</dc:title>     
<dc:identifier
rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">http://ambit.uni-plovdiv.bg:8080/ambit2/feature/13001</dc:identifier>     
<rdf:type rdf:resource="http://www.opentox.org/api/1.1#Feature"/>   
</rdf:Description> </rdf:RDF>' \
http://ambit.uni-plovdiv.bg:8080/ambit2/feature

This creates a feature under a name (URI) that I specify myself (that is
http://ambit.uni-plovdiv.bg:8080/ambit2/feature/13001 ). Is it possible
for the service to assign this name automatically to the created
feature resource instead?


> 
> After several trials and errors, I finally managed to use an ambit
> dataset to create an MLR model, as specified here
> https://opentox.ntua.gr/index.php?p=guide 
> 
> It seems the NTUA algorithm service expects the parameters dataset_uri and
> target to be within the posted content, rather than in the URL (my
> initial assumption). Do we have this specified in the API?


I think this is compliant with http://opentox.org/dev/apis/api-1.1/Model
(is it?). I assume that the target is a parameter of the algorithm,
defined within the RDF representation of the algorithm.
These parameters are provided within the posted content (-d
'dataset_uri=...&target=...').
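
For clients that do not use curl, the same kind of request could be put together roughly as follows. This is only a sketch using Python's standard library: the helper name build_training_request is made up for illustration, the URIs are the ones from the successful call quoted below, and nothing is actually sent. It also percent-encodes the parameter values, which assumes the service decodes them.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_training_request(algorithm_uri, dataset_uri, target):
    """Build (but do not send) a POST whose body carries the parameters.

    Hypothetical helper, not part of any OpenTox library.
    """
    # urlencode() percent-encodes the values and joins them with '&',
    # giving an application/x-www-form-urlencoded body.
    body = urlencode({"dataset_uri": dataset_uri, "target": target})
    return Request(
        algorithm_uri,
        data=body.encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_training_request(
    "http://opentox.ntua.gr:3000/algorithm/mlr",
    "http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/30",
    "http://ambit.uni-plovdiv.bg:8080/ambit2/feature/12913",
)
print(req.get_method(), req.full_url)
print(req.data.decode("ascii"))
```

Passing the returned object to urllib.request.urlopen() would send it; that step is left out here on purpose.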

> 
> It would help with troubleshooting if, in case of missing input, the
> service returned client_error_bad_request (400) with some explanation, rather than
> an internal server error (500).
> 
> Here is the successful call
> 1) curl -X POST -d
> 'dataset_uri=http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/30&target=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/12913' http://opentox.ntua.gr:3000/algorithm/mlr
> 
> The dataset itself is a copy of http://opentox.ntua.gr/ds.rdf, created
> via POSTing its RDF/XML representation to
> http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/


This request fails for SVM models on opentox.ntua.gr, but it
works fine on my localhost. I will deploy the latest version and I think
this will fix the problem.

> 
> 2) Unsuccessful call - here the dataset contains not only numerical
> but also string columns.
> 
> ambit:/home/nina# curl -X POST -d
> 'dataset_uri=http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/6&target=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11951' http://opentox.ntua.gr:3000/algorithm/mlr -v
> * About to connect() to opentox.ntua.gr port 3000 (#0)
> *   Trying 147.102.82.32... connected
> * Connected to opentox.ntua.gr (147.102.82.32) port 3000 (#0)
> > POST /algorithm/mlr HTTP/1.1
> > User-Agent: curl/7.18.2 (x86_64-pc-linux-gnu) libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.8 libssh2/0.18
> > Host: opentox.ntua.gr:3000
> > Accept: */*
> > Content-Length: 122
> > Content-Type: application/x-www-form-urlencoded
> >
> < HTTP/1.1 500 empty String
> < Content-Type: text/html; charset=ISO-8859-1
> < Content-Length: 284
> < Date: Sun, 20 Dec 2009 23:09:53 GMT
> < Server: Noelios-Restlet/2.0m3
> < Connection: close
> <
> <html>
> <head>
>    <title>Status page</title>
> </head>
> <body>
> <h3>empty String</h3><p>You can get technical details <a
> href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.5.1">here</a>.<br>
> Please continue your visit at our <a href="/">home page</a>.
> </p>
> </body>
> </html>
> * Closing connection #0


The dataset http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/6 is not
treated as purely numerical because its values are declared to be of type
xsd:string, so internally I handle them as strings, not as numbers.
Modifying this dataset to change these datatypes to xsd:double
would fix this problem. In any case, I should return an explanatory message
and a proper status code.
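
Something along the following lines is the kind of error handling meant here. It is a rough Python sketch, not the actual service code: the class and function names are made up, and answering with 400 for a non-numeric value is an assumption about which status code would be appropriate.

```python
class BadRequest(Exception):
    """Raised when the client's input, not the service, is at fault."""


def parse_numeric_column(feature_uri, values):
    """Convert the lexical values of one feature to floats."""
    numbers = []
    for value in values:
        try:
            numbers.append(float(value))
        except ValueError:
            raise BadRequest(
                "Feature %s has non-numeric value %r; MLR needs numeric features"
                % (feature_uri, value)
            )
    return numbers


def respond(feature_uri, values):
    """Return an (HTTP status code, message) pair instead of a bare 500."""
    try:
        parse_numeric_column(feature_uri, values)
    except BadRequest as error:
        # The explanation travels back to the client with a 4xx status.
        return 400, str(error)
    return 200, "OK"


# The feature URI is a placeholder; "NC" is the value from the error above.
status, message = respond("http://example.org/feature/1", ["1.5", "NC"])
print(status, message)
```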

The text/x-arff representations you provide include some string and
numeric declarations for the features of the dataset. So I think we
should do something like that in the RDF.

RDF representations, structurally, contain much more (meta)information
about the objects they describe than ARFFs, so this piece of information
in the text/x-arff (the datatype of each feature) IMHO has to be
included in the RDF or at least - in order not to modify the RDF
standards we adopted in API 1.1 - we should use proper XSD datatypes for
every value. After all, 1^^double, 1^^string and 1^^nominal are
not the same and won't (shouldn't) be handled the same way by a training
algorithm.
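
To make the point concrete, here is a small Python sketch of how a client or service might treat the same lexical value differently depending on its declared XSD datatype. The function names and the set of datatypes counted as numeric are assumptions for illustration; nominal values are left out because API 1.1 has no agreed datatype for them.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Assumed set of datatypes to be treated as numeric attributes.
NUMERIC_TYPES = {
    XSD + "double",
    XSD + "float",
    XSD + "decimal",
    XSD + "integer",
    XSD + "int",
}


def attribute_kind(datatype_uri):
    """Map an XSD datatype URI to an ARFF-like attribute kind."""
    return "numeric" if datatype_uri in NUMERIC_TYPES else "string"


def convert_literal(lexical, datatype_uri):
    """Turn a typed literal into a Python value according to its datatype."""
    if attribute_kind(datatype_uri) == "numeric":
        return float(lexical)
    return lexical


print(repr(convert_literal("1", XSD + "double")))  # 1.0
print(repr(convert_literal("1", XSD + "string")))  # '1'
```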

> 
> 3) Unsuccessful call:
> If the dataset URI contains query parameters (in this case specifying
> that only 3 numerical features be included), I am not sure whether it is parsed
> correctly by the NTUA service, or whether each feature_uris[] parameter is
> treated as separate from the dataset_uri parameter. The entire
> dataset URI should read:
>  'dataset_uri=http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/6?feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11938&feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11947&feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11951'
> 
> The entire (unsuccessful) call :
> ambit:/home/nina# curl -X POST -d
> 'dataset_uri=http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/6?feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11938&feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11947&feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11951&target=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11951' http://opentox.ntua.gr:3000/algorithm/mlr -v
> 
> * About to connect() to opentox.ntua.gr port 3000 (#0)
> *   Trying 147.102.82.32... connected
> * Connected to opentox.ntua.gr (147.102.82.32) port 3000 (#0)
> > POST /algorithm/mlr HTTP/1.1
> > User-Agent: curl/7.18.2 (x86_64-pc-linux-gnu) libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.8 libssh2/0.18
> > Host: opentox.ntua.gr:3000
> > Accept: */*
> > Content-Length: 329
> > Content-Type: application/x-www-form-urlencoded
> >
> < HTTP/1.1 500 empty String


I haven't implemented those feature_uris[]=... yet :-)

> 
> 4) Unsuccessful call (same as above, but with dataset_uri URL encoded)
> 
> ambit:/home/nina# curl -X POST -d
> 'dataset_uri=http%3A%2F%2Fambit.uni-plovdiv.bg%3A8080%2Fambit2%2Fdataset%2F6%3Ffeature_uris%5B%5D%3Dhttp%3A%2F%2Fambit.uni-plovdiv.bg%3A8080%2Fambit2%2Ffeature%2F11938%26feature_uris%5B%5D%3Dhttp%3A%2F%2Fambit.uni-plovdiv.bg%3A8080%2Fambit2%2Ffeature%2F11947%26feature_uris%5B%5D%3Dhttp%3A%2F%2Fambit.uni-plovdiv.bg%3A8080%2Fambit2%2Ffeature%2F11951&target=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11951' http://opentox.ntua.gr:3000/algorithm/mlr -v
> * About to connect() to opentox.ntua.gr port 3000 (#0)
> *   Trying 147.102.82.32... connected
> * Connected to opentox.ntua.gr (147.102.82.32) port 3000 (#0)
> > POST /algorithm/mlr HTTP/1.1
> > User-Agent: curl/7.18.2 (x86_64-pc-linux-gnu) libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.8 libssh2/0.18
> > Host: opentox.ntua.gr:3000
> > Accept: */*
> > Content-Length: 409
> > Content-Type: application/x-www-form-urlencoded
> >
> < HTTP/1.1 500 For input string: "NC"
> 
> The most important question so far is: is the way of specifying
> parameters as ASCII data content, using syntax like the one below, agreed
> and sufficient?
> 
>         dataset_uri=aaaa&target=bbbbb  
> 
> Do the services expect these parameter values to be URL encoded - 


As far as I know, you may use non-URL encoded parameters.


> otherwise it is impossible to use e.g. URIs with query parameters.


I guess you can do that but I have to check this out.
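
To illustrate why the encoding matters for URIs with query parameters, here is a small Python sketch using the standard library's form parser on the two request bodies from calls 3 and 4. It only shows how one common parser splits the body; that the NTUA service's own parsing behaves the same way is an assumption.

```python
from urllib.parse import parse_qs, quote

dataset_uri = (
    "http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/6"
    "?feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11938"
    "&feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11947"
    "&feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11951"
)
target = "http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11951"

# Unencoded (call 3): every '&' inside the dataset URI starts a new
# form parameter, so dataset_uri is cut off after the first feature.
raw = parse_qs("dataset_uri=" + dataset_uri + "&target=" + target)

# Encoded (call 4): the whole dataset URI survives as a single value.
encoded = parse_qs(
    "dataset_uri=" + quote(dataset_uri, safe="") + "&target=" + target
)

print(sorted(raw))      # ['dataset_uri', 'feature_uris[]', 'target']
print(sorted(encoded))  # ['dataset_uri', 'target']
print(encoded["dataset_uri"] == [dataset_uri])  # True
```

With curl, the --data-urlencode option does the encoding on the client side, which avoids writing the percent-escapes by hand.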


Best Regards,
Pantelis

> 
> Best regards,
> Nina
> 
> 
> > Best Regards
> > Pantelis
> > 
> >   
> > 
> > > > * When a client posts a dataset to a model to make a prediction,
> > > > the service generates a new dataset which (according to the API) should
> > > > be posted to a dataset service. Is this operation available?
> > > >   
> > > >       
> > > 
> > > ambit services accept SDF datasets on POST currently, and RDF upload
> > > will be available later today (if everything works right).
> > >     
> > > 
> > > > * How can I calculate a feature value for a certain compound URI? Is
> > > > there an example (e.g. curl command)?
> > > > 
> > > >   
> > > >       
> > > 
> > > Perhaps we need a "compound_uri" parameter for the algorithm API, similar to
> > > the Model API?
> > > 
> > > AFAIK TUM are developing a descriptor calculation service, so it would make
> > > sense to synchronize parameter names.
> > > 
> > > Hope this helps,
> > > Nina
> > > 
> > > 
> > >     
> > > 
> > > > Best Regards,
> > > > Pantelis
> > > > 
> > > > _______________________________________________
> > > > Development mailing list
> > > > Development at opentox.org
> > > > http://www.opentox.org/mailman/listinfo/development
> > > >   
> > > >       
> > > 




