[OTDev] Some Questions

Nina Jeliazkova nina at acad.bg
Mon Dec 21 14:43:09 CET 2009


Hi Pantelis,

> Yes, I totally agree on that but I did create features for which I
> specified the URI. Could you give an example of how could one create a
> new feature without specifying or suggesting its URI. What I do not
> understand is what should I write instead of:
>
> <rdf:Description
> rdf:about="http://ambit.uni-plovdiv.bg:8080/ambit2/feature/13001">    
> ...
>
>   
You could simply create an anonymous Feature node - via
createIndividual(Feature-class)
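
In terms of the RDF/XML you post, this amounts to leaving out the rdf:about attribute. A rough sketch of what the serialized result could look like, built here with Python's standard library only for illustration; the ot namespace URI and the dc:title value are my assumptions, not something the API fixes:

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OT = "http://www.opentox.org/api/1.1#"  # assumed OpenTox namespace URI
DC = "http://purl.org/dc/elements/1.1/"

ET.register_namespace("rdf", RDF)
ET.register_namespace("ot", OT)
ET.register_namespace("dc", DC)

root = ET.Element(f"{{{RDF}}}RDF")
# No rdf:about attribute: the Feature is an anonymous (blank) node, so the
# service receiving it is free to assign the URI.
feature = ET.SubElement(root, f"{{{OT}}}Feature")
title = ET.SubElement(feature, f"{{{DC}}}title")
title.text = "example feature"  # hypothetical title, for illustration only

rdf_xml = ET.tostring(root, encoding="unicode")
print(rdf_xml)
```

The point is only that the node carries a type and properties but no URI of its own.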
>>>> After several trials and errors, I finally managed to use an ambit
>>>> dataset to create MLR model, as specified here
>>>> https://opentox.ntua.gr/index.php?p=guide 
>>>>
>>>> It seems the NTUA algorithm service expects parameters dataset_uri
>>>> and target to be within the posted content, rather than in the URL
>>>> (my initial assumption).  Do we have this specified in the API ?  
>>>>         
>>> I think this is compliant with
>>> http://opentox.org/dev/apis/api-1.1/Model  (Is it?). I assume that
>>> the target is a parameter of the algorithm defined within the RDF
>>> representation of the algorithm. 
>>> These parameters are provided within the posted content (-d
>>> 'dataset_uri=...&target=... ).
>>>       
>> I have to check as well if it is compliant with the description from
>> http://opentox.org/dev/apis/api-1.1
>>
>>         Parameters are posted with a
>>         "Content-Type:application/x-www-form-urlencoded" HTTP header.
>>         Parameter names are typed in bold letters in the API
>>         definitions. Square brackets (e.g. compound_uris[]) indicate
>>         that a list of arguments is expected.
>>     
>>>> It would help with troubleshooting if, in case of missing input, the
>>>> service returned client_error_bad_request with some explanation,
>>>> rather than an internal server error (500). 
>>>>
>>>> Here is the successful call
>>>> 1) curl -X POST -d
>>>> 'dataset_uri=http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/30&target=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/12913' http://opentox.ntua.gr:3000/algorithm/mlr
>>>>
>>>> The dataset itself is a copy of http://opentox.ntua.gr/ds.rdf,
>>>> created via POSTing its RDF/XML representation to
>>>> http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/
>>>>         
>>> This request fails in the case of svm models on opentox.ntua.gr but
>>> it works fine on my localhost. I will deploy the latest version and
>>> I think this will fix any bugs.
>>>       
>> Yes, we haven't managed to create any models other than MLR via the
>> NTUA service.
>>     
>>>> 2)Unsuccessful call  - here the dataset contains not only
>>>> numerical, but also string columns. 
>>>>
>>>> ambit:/home/nina# curl -X POST -d
>>>> 'dataset_uri=http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/6&target=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11951' http://opentox.ntua.gr:3000/algorithm/mlr -v
>>>> * About to connect() to opentox.ntua.gr port 3000 (#0)
>>>> *   Trying 147.102.82.32... connected
>>>> * Connected to opentox.ntua.gr (147.102.82.32) port 3000 (#0)
>>>>         
>>>> > POST /algorithm/mlr HTTP/1.1
>>>> > User-Agent: curl/7.18.2 (x86_64-pc-linux-gnu) libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.8 libssh2/0.18
>>>> > Host: opentox.ntua.gr:3000
>>>> > Accept: */*
>>>> > Content-Length: 122
>>>> > Content-Type: application/x-www-form-urlencoded
>>>> >
>>>> < HTTP/1.1 500 empty String
>>>> < Content-Type: text/html; charset=ISO-8859-1
>>>> < Content-Length: 284
>>>> < Date: Sun, 20 Dec 2009 23:09:53 GMT
>>>> < Server: Noelios-Restlet/2.0m3
>>>> < Connection: close
>>>> <
>>>> <html>
>>>> <head>
>>>>    <title>Status page</title>
>>>> </head>
>>>> <body>
>>>> <h3>empty String</h3><p>You can get technical details <a
>>>> href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.5.1">here</a>.<br>
>>>> Please continue your visit at our <a href="/">home page</a>.
>>>> </p>
>>>> </body>
>>>> </html>
>>>> * Closing connection #0
>>>>         
>>> The dataset http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/6 does
>>> not contain purely numerical entries because they are declared to be
>>> of type xsd:string, so internally I handle these as strings, not as
>>> numbers. A modification of this dataset, changing these datatypes to
>>> xsd:double would fix this problem. However, I should return an
>>> explanatory message and a proper Status Code. 
>>>
>>>       
>> There is a mix of numeric and string entries.  I expected that you
>> might ignore the string entries (this is how I am going to proceed for
>> a clustering algorithm), but it might indeed be better to return an
>> error code.
>>     
>>> The text/x-arff representations you provide include some string and
>>> numeric declarations for the features of the dataset. So I think we
>>> should do something like that in the RDF.
>>>       
>> Currently I am using, as you suggested, dc:type for the features
>> (see e.g. http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/6), but of
>> course we might decide to introduce something else.
>>
>>
>> <http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11946>
>>       a       ot:Feature ;
>>       dc:type "http://www.w3.org/2001/XMLSchema#double" .
>>
>> <http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11950>
>>       a       ot:Feature ;
>>       dc:type "http://www.w3.org/2001/XMLSchema#double" .
>>
>> <http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11948>
>>       a       ot:Feature ;
>>       dc:type "http://www.w3.org/2001/XMLSchema#string" .
>>     
>
> I retrieve the type of each feature by picking an arbitrary value and
> checking its datatype, so I have to change that. I agree that we might have
> to establish something better and more generic - maybe an extension of
> XSD types (for example http://www.w3.org/TR/xmlschema11-2/ , section 4 )
>
>
>
>   
Thanks, will have a look at that.
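
For what it is worth, reading the declared dc:type instead of sampling a value could look roughly like the sketch below. It is only an illustration: the dictionary stands in for whatever your RDF parser returns, and the set of XSD types treated as numeric is my assumption.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Assumption: the XSD datatypes a regression algorithm would treat as numeric.
NUMERIC_TYPES = {
    XSD + name for name in ("double", "float", "decimal", "integer", "int", "long")
}


def split_features(declared_types):
    """Split feature URIs into (numeric, other) by their declared dc:type."""
    numeric, other = [], []
    for uri, dc_type in declared_types.items():
        (numeric if dc_type in NUMERIC_TYPES else other).append(uri)
    return numeric, other


# dc:type values as quoted above for dataset/6.
features = {
    "http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11946": XSD + "double",
    "http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11950": XSD + "double",
    "http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11948": XSD + "string",
}
numeric, other = split_features(features)
```

With such a split the service could either skip the non-numeric columns or reject the request with an explanatory message, whichever we agree on.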
>>> RDF representations, structurally, contain much more
>>> (meta)information about the objects they describe than ARFFs, so
>>> this piece of information in the text/x-arff (the datatype of each
>>> feature) IMHO has to be included in the RDF or at least - in order
>>> not to modify the RDF standards we adopted in API 1.1 - we should
>>> use proper XSD datatypes for every value. After all, 1^^double,
>>> 1^^string and 1^^nominal are not the same and won't (shouldn't) be
>>> handled the same way by a training algorithm.
>>>       
>> Yes, especially for nominals, it would be better to introduce a subclass
>> of Feature, rather than using XSD types for denoting the types.  I
>> might try to extend opentox.owl in the next few days.
>>     
>>>> 3)  Unsuccessful call:
>>>> If the dataset URI contains query parameters (in this case
>>>> specifying to include only 3 numerical features), I am not sure
>>>> whether it is parsed correctly by the NTUA service, or whether the
>>>> feature_uris[] parameter is perceived as separate from the
>>>> dataset_uri parameter. The entire dataset URI should read:
>>>>  'dataset_uri=http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/6?feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11938&feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11947&feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11951'
>>>>
>>>> The entire (unsuccessful) call :
>>>> ambit:/home/nina# curl -X POST -d
>>>> 'dataset_uri=http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/6?feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11938&feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11947&feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11951&target=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11951' http://opentox.ntua.gr:3000/algorithm/mlr -v
>>>>
>>>> * About to connect() to opentox.ntua.gr port 3000 (#0)
>>>> *   Trying 147.102.82.32... connected
>>>> * Connected to opentox.ntua.gr (147.102.82.32) port 3000 (#0)
>>>>         
>>>> > POST /algorithm/mlr HTTP/1.1
>>>> > User-Agent: curl/7.18.2 (x86_64-pc-linux-gnu) libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.8 libssh2/0.18
>>>> > Host: opentox.ntua.gr:3000
>>>> > Accept: */*
>>>> > Content-Length: 329
>>>> > Content-Type: application/x-www-form-urlencoded
>>>> >
>>>> < HTTP/1.1 500 empty String
>>>>         
>>> I haven't implemented those feature_uris[]=... yet :-)
>>>       
>> But in this case feature_uris[] are parameters to the ambit dataset
>> call, not to http://opentox.ntua.gr:3000/algorithm/mlr - the
>> problem is that there is no way to say the dataset_uri is this entire one:
>>
>> dataset_uri=http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/6?feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11938&feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11947&feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11951
>>
>>     
>
> Yes, you are right. I think an acceptable solution would be to create a
> new dataset URI for that dataset or better parse the feature_uris
> parameters (as well as other dataset related parameters) in the
> algorithm service.
>   
The feature_uris parameters of the dataset service mean that only the
specified data columns will be transferred to the algorithm service. If
you parse feature_uris yourself, they will have a slightly different meaning.

While in the dataset example above the difference might only affect
performance, there are other cases as well. Let's say I have a
dataset-generating service searching for similar compounds - it would
not be possible to use it as the dataset input for the algorithm service,
since it doesn't make sense for the algorithm service to parse similarity
parameters ...

http://ambit.uni-plovdiv.bg:8080/ambit2/query/similarity?search=c1ccccc1Oc2ccccc2&threshold=0.9

I do think we need a way to allow arbitrary URIs as parameters; otherwise
we are imposing more restrictions than HTTP itself.
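
One way this could work with the current form-encoded POST is for the client to percent-encode the whole dataset URI before placing it in the body, and for the service to decode each value exactly once. A minimal Python sketch using only the standard library; the variable names are illustrative and nothing here is taken from the NTUA implementation:

```python
from urllib.parse import urlencode, parse_qs

dataset_uri = (
    "http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/6"
    "?feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11938"
    "&feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11947"
    "&feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11951"
)
target = "http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11951"

# Client side: urlencode() percent-encodes every value, so the '?' and '&'
# inside dataset_uri no longer look like separators of the POST body.
body = urlencode({"dataset_uri": dataset_uri, "target": target})

# Service side: parsing the form body decodes each value once and yields
# exactly two parameters, with the nested query string left intact.
params = parse_qs(body)
print(sorted(params))
```

The service would then dereference params["dataset_uri"][0] as an opaque URI, and feature_uris[] would be seen only by the dataset service.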

Best regards,
Nina
> Best regards,
> Pantelis
>
>   
>>>> 4)Unsuccessful call  (same as above, but with dataset_uri URL
>>>> encoded)
>>>>
>>>> ambit:/home/nina# curl -X POST -d
>>>> 'dataset_uri=http%3A%2F%2Fambit.uni-plovdiv.bg%3A8080%2Fambit2%2Fdataset%2F6%3Ffeature_uris%5B%5D%3Dhttp%3A%2F%2Fambit.uni-plovdiv.bg%3A8080%2Fambit2%2Ffeature%2F11938%26feature_uris%5B%5D%3Dhttp%3A%2F%2Fambit.uni-plovdiv.bg%3A8080%2Fambit2%2Ffeature%2F11947%26feature_uris%5B%5D%3Dhttp%3A%2F%2Fambit.uni-plovdiv.bg%3A8080%2Fambit2%2Ffeature%2F11951&target=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11951' http://opentox.ntua.gr:3000/algorithm/mlr -v
>>>> * About to connect() to opentox.ntua.gr port 3000 (#0)
>>>> *   Trying 147.102.82.32... connected
>>>> * Connected to opentox.ntua.gr (147.102.82.32) port 3000 (#0)
>>>>         
>>>> > POST /algorithm/mlr HTTP/1.1
>>>> > User-Agent: curl/7.18.2 (x86_64-pc-linux-gnu) libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.8 libssh2/0.18
>>>> > Host: opentox.ntua.gr:3000
>>>> > Accept: */*
>>>> > Content-Length: 409
>>>> > Content-Type: application/x-www-form-urlencoded
>>>> >
>>>> < HTTP/1.1 500 For input string: "NC"
>>>>
>>>> The most important question so far: is specifying parameters as
>>>> ASCII data content, using syntax like the one below, agreed and
>>>> sufficient? 
>>>>         dataset_uri=aaaa&target=bbbbb  
>>>> Do the services expect these parameter values to be URL encoded - 
>>>>         
>>> As far as I know, you may use non-URL encoded parameters.
>>>       
>> Yes, but how could I specify the following line as the value of
>> dataset_uri=, without feature_uris[] being perceived as parameters
>> of the algorithm service call?
>>
>> http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/6?feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11938&feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11947&feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11951
>>
>>     
>>>> otherwise it is impossible to use e.g. URIs with query parameters.
>>>>         
>>> I guess you can do that but I have to check this out.
>>>
>>>       
>> ok
>> Best regards,
>> Nina
>>     
>>> Best Regards,
>>> Pantelis
>>>       
>>>> Best regards,
>>>> Nina
>>>>
>>>>         
>>>>> Best Regards
>>>>> Pantelis
>>>>>
>>>>>   
>>>>>       
>>>>>           
>>>>>>> * When a client posts a dataset on a model to make a prediction, then
>>>>>>> the service generates a new dataset which (according to the API) should
>>>>>>> be posted to a dataset service. Is this operation available?
>>>>>>>   
>>>>>>>       
>>>>>>>           
>>>>>>>               
>>>>>> ambit services accept SDF datasets on POST currently, and RDF upload
>>>>>> will be available later today (if everything works right).
>>>>>>     
>>>>>>         
>>>>>>             
>>>>>>> * How can I calculate a feature value for a certain compound URI? Is
>>>>>>> there an example (e.g. curl command)?
>>>>>>>
>>>>>>>   
>>>>>>>       
>>>>>>>           
>>>>>>>               
>>>>>> Perhaps we need a "compound_uri" parameter for the algorithm API, similar to
>>>>>> the Model API?
>>>>>>
>>>>>> AFAIK TUM are developing a descriptor calculation service; it would make
>>>>>> sense to synchronize parameter names.
>>>>>>
>>>>>> Hope this helps,
>>>>>> Nina
>>>>>>
>>>>>>
>>>>>>     
>>>>>>         
>>>>>>             
>>>>>>> Best Regards,
>>>>>>> Pantelis
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Development mailing list
>>>>>>> Development at opentox.org
>>>>>>> http://www.opentox.org/mailman/listinfo/development
>>>>>>>   
>>>>>>>       
>>>>>>>           
>>>>>>>               
>
>