[OTDev] Some Questions
Nina Jeliazkova nina at acad.bg
Mon Dec 21 14:43:09 CET 2009
Hi Pantelis,

> Yes, I totally agree on that but I did create features for which I
> specified the URI. Could you give an example of how could one create a
> new feature without specifying or suggesting its URI. What I do not
> understand is what should I write instead of:
>
> <rdf:Description
> rdf:about="http://ambit.uni-plovdiv.bg:8080/ambit2/feature/13001">
> ...
>

You could simply create an anonymous Feature node - via
createIndividual(Feature-class)

>>>> After several trials and errors, I finally managed to use an ambit
>>>> dataset to create MLR model, as specified here
>>>> https://opentox.ntua.gr/index.php?p=guide
>>>>
>>>> It seems the NTUA algorithm service expects parameters dataset_uri
>>>> and target to be within the posted content, rather than in the URL
>>>> (my initial assumption). Do we have this specified in the API ?
>>>>
>>> I think this is compliant with
>>> http://opentox.org/dev/apis/api-1.1/Model (Is it?). I assume that
>>> the target is a parameter of the algorithm defined within the RDF
>>> representation of the algorithm.
>>> These parameters are provided within the posted content (-d
>>> 'dataset_uri=...&target=... ).
>>>
>> I have to check as well if it is compliant with the description from
>> http://opentox.org/dev/apis/api-1.1
>>
>> Parameters are posted with a
>> "Content-Type:application/x-www-form-urlencoded" HTTP header.
>> Parameter names are typed in bold letters in the API
>> definitions. Square brackets (e.g. compound_uris[]) indicate
>> that a list of arguments is expected.
>>
>>>> It would help with troubleshooting if, in case of missing input, the
>>>> service returned client_error_bad_request with some explanation,
>>>> rather than internal server error (500).
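A minimal sketch of the behaviour requested above, assuming a service that receives dataset_uri and target in a form-encoded body. The helper name check_training_request and the message text are hypothetical and are not part of the NTUA service or the OpenTox API; the point is only that a missing parameter maps to a 400 with an explanation instead of a 500.

```python
from urllib.parse import parse_qs

# Parameter names as used in the thread; everything else here is illustrative.
REQUIRED = ("dataset_uri", "target")


def check_training_request(body: str):
    """Return (status, message) for a form-encoded training request body.

    Hypothetical helper, not taken from any OpenTox implementation.
    """
    # keep_blank_values=True so that "dataset_uri=" is seen as present but empty.
    params = parse_qs(body, keep_blank_values=True)
    missing = [name for name in REQUIRED if not params.get(name, [""])[0]]
    if missing:
        # Client error: say which parameters are missing.
        return 400, "Missing required parameter(s): " + ", ".join(missing)
    return 200, "OK"


if __name__ == "__main__":
    print(check_training_request("dataset_uri=http://example.org/dataset/1"))
```

A real service would go on to fetch the dataset and train the model; the sketch stops at input validation.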
>>>>
>>>> Here is the successful call
>>>> 1) curl -X POST -d
>>>> 'dataset_uri=http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/30&target=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/12913' http://opentox.ntua.gr:3000/algorithm/mlr
>>>>
>>>> The dataset itself is a copy of http://opentox.ntua.gr/ds.rdf,
>>>> created via POSTing its RDF/XML representation to
>>>> http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/
>>>>
>>> This request fails in the case of svm models on opentox.ntua.gr but
>>> it works fine on my localhost. I will deploy the latest version and
>>> I think this will fix any bugs.
>>>
>> Yes, we haven't managed to create any other models than MLR via NTUA
>> service.
>>
>>>> 2) Unsuccessful call - here the dataset contains not only
>>>> numerical, but also string columns.
>>>>
>>>> ambit:/home/nina# curl -X POST -d
>>>> 'dataset_uri=http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/6&target=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11951' http://opentox.ntua.gr:3000/algorithm/mlr -v
>>>> * About to connect() to opentox.ntua.gr port 3000 (#0)
>>>> * Trying 147.102.82.32... connected
>>>> * Connected to opentox.ntua.gr (147.102.82.32) port 3000 (#0)
>>>> > POST /algorithm/mlr HTTP/1.1
>>>> > User-Agent: curl/7.18.2 (x86_64-pc-linux-gnu) libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.8 libssh2/0.18
>>>> > Host: opentox.ntua.gr:3000
>>>> > Accept: */*
>>>> > Content-Length: 122
>>>> > Content-Type: application/x-www-form-urlencoded
>>>> >
>>>> < HTTP/1.1 500 empty String
>>>> < Content-Type: text/html; charset=ISO-8859-1
>>>> < Content-Length: 284
>>>> < Date: Sun, 20 Dec 2009 23:09:53 GMT
>>>> < Server: Noelios-Restlet/2.0m3
>>>> < Connection: close
>>>> <
>>>> <html>
>>>> <head>
>>>> <title>Status page</title>
>>>> </head>
>>>> <body>
>>>> <h3>empty String</h3><p>You can get technical details <a
>>>> href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.5.1">here</a>.<br>
>>>> Please continue your visit at our <a href="/">home page</a>.
>>>> </p>
>>>> </body>
>>>> </html>
>>>> * Closing connection #0
>>>>
>>> The dataset http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/6 does
>>> not contain purely numerical entries because they are declared to be
>>> of type xsd:string, so internally I handle these as strings, not as
>>> numbers. A modification of this dataset, changing these datatypes to
>>> xsd:double would fix this problem. However, I should return an
>>> explanatory message and a proper Status Code.
>>>
>> There is a mix of numeric and string entries. I expected that you
>> might ignore the string entries (this is how I am going to proceed for
>> a clustering algorithm), but it might be better to return error code
>> indeed.
>>
>>> The text/x-arff representations you provide include some string and
>>> numeric declarations for the features of the dataset. So I think we
>>> should do something like that in the RDF.
>>>
>> Currently, I am using, as you suggested, dc:type for the features
>> (see e.g.
>> http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/6), but of
>> course we might decide to introduce something else.
>>
>> <http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11946>
>>     a ot:Feature ;
>>     dc:type "http://www.w3.org/2001/XMLSchema#double" .
>>
>> <http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11950>
>>     a ot:Feature ;
>>     dc:type "http://www.w3.org/2001/XMLSchema#double" .
>>
>> <http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11948>
>>     a ot:Feature ;
>>     dc:type "http://www.w3.org/2001/XMLSchema#string" .
>>
>
> I retrieve the type of each feature by picking an arbitrary value and
> check its datatype, so I have to change that. I agree that we might have
> to establish something better and more generic - maybe an extension of
> XSD types (for example http://www.w3.org/TR/xmlschema11-2/ , section 4 )
>

Thanks, will have a look at that.

>>> RDF representations, structurally, contain much more
>>> (meta)information about the objects they describe than ARFFs, so
>>> this piece of information in the text/x-arff (the datatype of each
>>> feature) IMHO has to be included in the RDF or at least - in order
>>> not to modify the RDF standards we adopted in API 1.1 - we should
>>> use proper XSD datatypes for every value. After all,
>>> 1^^double, 1^^string and 1^^nominal are not the same and won't
>>> (shouldn't) be handled the same way by a training algorithm.
>>>
>> Yes, especially for nominals, it would be better to introduce subclass
>> of Feature, rather than using XSD types for denoting the types. I
>> might try to extend opentox.owl next days.
>>
>>>> 3) Unsuccessful call:
>>>> If the dataset URI contains query parameters (in this case
>>>> specifying to include only 3 numerical features), I am not sure
>>>> if it is parsed correctly by the NTUA service, or feature_uris[]
>>>> parameter is perceived as a separate one to the dataset_uri
>>>> parameter.
>>>> The entire dataset URI should read:
>>>> 'dataset_uri=http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/6?feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11938&feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11947&feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11951'
>>>>
>>>> The entire (unsuccessful) call :
>>>> ambit:/home/nina# curl -X POST -d
>>>> 'dataset_uri=http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/6?feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11938&feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11947&feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11951&target=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11951' http://opentox.ntua.gr:3000/algorithm/mlr -v
>>>>
>>>> * About to connect() to opentox.ntua.gr port 3000 (#0)
>>>> * Trying 147.102.82.32... connected
>>>> * Connected to opentox.ntua.gr (147.102.82.32) port 3000 (#0)
>>>> > POST /algorithm/mlr HTTP/1.1
>>>> > User-Agent: curl/7.18.2 (x86_64-pc-linux-gnu) libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.8 libssh2/0.18
>>>> > Host: opentox.ntua.gr:3000
>>>> > Accept: */*
>>>> > Content-Length: 329
>>>> > Content-Type: application/x-www-form-urlencoded
>>>> >
>>>> < HTTP/1.1 500 empty String
>>>>
>>> I haven't implemented those feature_uris[]=... yet :-)
>>>
>> But in this case feature_uris[] are parameters to the ambit dataset
>> call, not to the http://opentox.ntua.gr:3000/algorithm/mlr - the
>> problem is there is no way to say the dataset_uri is this entire one
>>
>> dataset_uri=http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/6?feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11938&feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11947&feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11951
>>
>
> Yes, you are right.
> I think an acceptable solution would be to create a
> new dataset URI for that dataset or better parse the feature_uris
> parameters (as well as other dataset related parameters) in the
> algorithm service.
>

The feature_uris parameters for the dataset service mean that only the
specified data columns will be transferred to the algorithm service. If
you parse feature_uris yourself, they will have a slightly different
meaning. While in the dataset example above it might be only related to
performance, there are other cases as well. Let's say I have a
dataset-generating service searching for similar compounds - it would
not be possible to use it as a dataset entry for the algorithm service -
it doesn't make sense for the algorithm service to parse similarity
parameters ...

http://ambit.uni-plovdiv.bg:8080/ambit2/query/similarity?search=c1ccccc1Oc2ccccc2&threshold=0.9

I do think we need a way to allow arbitrary URIs as parameters;
otherwise we are imposing more restrictions than HTTP itself.

Best regards,
Nina

> Best regards,
> Pantelis
>
>>>> 4) Unsuccessful call (same as above, but with dataset_uri URL
>>>> encoded)
>>>>
>>>> ambit:/home/nina# curl -X POST -d
>>>> 'dataset_uri=http%3A%2F%2Fambit.uni-plovdiv.bg%3A8080%2Fambit2%2Fdataset%2F6%3Ffeature_uris%5B%5D%3Dhttp%3A%2F%2Fambit.uni-plovdiv.bg%3A8080%2Fambit2%2Ffeature%2F11938%26feature_uris%5B%5D%3Dhttp%3A%2F%2Fambit.uni-plovdiv.bg%3A8080%2Fambit2%2Ffeature%2F11947%26feature_uris%5B%5D%3Dhttp%3A%2F%2Fambit.uni-plovdiv.bg%3A8080%2Fambit2%2Ffeature%2F11951&target=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11951' http://opentox.ntua.gr:3000/algorithm/mlr -v
>>>> * About to connect() to opentox.ntua.gr port 3000 (#0)
>>>> * Trying 147.102.82.32... connected
>>>> * Connected to opentox.ntua.gr (147.102.82.32) port 3000 (#0)
>>>> > POST /algorithm/mlr HTTP/1.1
>>>> > User-Agent: curl/7.18.2 (x86_64-pc-linux-gnu) libcurl/7.18.2 OpenSSL/0.9.8g zlib/1.2.3.3 libidn/1.8 libssh2/0.18
>>>> > Host: opentox.ntua.gr:3000
>>>> > Accept: */*
>>>> > Content-Length: 409
>>>> > Content-Type: application/x-www-form-urlencoded
>>>> >
>>>> < HTTP/1.1 500 For input string: "NC"
>>>>
>>>> Most important question so far is - is the way of specifying
>>>> parameters as ascii data content and using syntax like below
>>>> agreed and sufficient?
>>>> dataset_uri=aaaa&target=bbbbb
>>>> Do the services expect these parameter values to be URL encoded -
>>>>
>>> As far as I know, you may use non-URL encoded parameters.
>>>
>> Yes, but how could I specify as a value to dataset_uri= the
>> following line, without feature_uris[] being perceived as parameters
>> of the algorithm service call?
>>
>> http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/6?feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11938&feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11947&feature_uris[]=http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11951
>>
>>>> otherwise it is impossible to use e.g. URIs with query parameters.
>>>>
>>> I guess you can do that but I have to check this out.
>>>
>> ok
>> Best regards,
>> Nina
>>
>>> Best Regards,
>>> Pantelis
>>>
>>>> Best regards,
>>>> Nina
>>>>
>>>>> Best Regards
>>>>> Pantelis
>>>>>
>>>>>>> * When a client posts a dataset on a model to make a prediction, then
>>>>>>> the service generates a new dataset which (according to the API) should
>>>>>>> be posted to a dataset service. Is this operation available?
>>>>>>>
>>>>>> ambit services accept SDF datasets on POST currently, and RDF upload
>>>>>> will be available later today (if everything works right).
>>>>>>
>>>>>>> * How can I calculate a feature value for a certain compound URI? Is
>>>>>>> there an example (e.g. curl command)?
>>>>>>>
>>>>>> Perhaps we need "compound_uri" parameter for algorithm API, similar to
>>>>>> Model API ?
>>>>>>
>>>>>> AFAIK TUM are developing descriptor calculation service, it will make
>>>>>> sense to synchronize parameter names.
>>>>>>
>>>>>> Hope this helps,
>>>>>> Nina
>>>>>>
>>>>>>> Best Regards,
>>>>>>> Pantelis
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Development mailing list
>>>>>>> Development at opentox.org
>>>>>>> http://www.opentox.org/mailman/listinfo/development
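
The open question in calls 3 and 4 of the message above is how to pass a dataset URI that carries its own query string as the value of dataset_uri. The following is a minimal sketch using only the Python standard library; it shows how a generic form parser would split the unencoded and the percent-encoded bodies. It is an illustration of application/x-www-form-urlencoded parsing in general, not a description of how the NTUA service actually parses its input. The URIs are copied from the thread; the variable names are illustrative.

```python
from urllib.parse import parse_qs, urlencode

BASE = "http://ambit.uni-plovdiv.bg:8080/ambit2"  # host and path from the thread

# Dataset URI with its own query string, as in calls 3 and 4.
dataset_uri = (
    f"{BASE}/dataset/6"
    f"?feature_uris[]={BASE}/feature/11938"
    f"&feature_uris[]={BASE}/feature/11947"
    f"&feature_uris[]={BASE}/feature/11951"
)
target = f"{BASE}/feature/11951"

# Unencoded body (as in call 3): the inner '&' characters are read as
# separators of the outer form, so feature_uris[] leaks to the top level.
raw_body = f"dataset_uri={dataset_uri}&target={target}"
raw_params = parse_qs(raw_body)

# Encoded body: urlencode percent-encodes every value, so the inner
# '?', '&', '=', '[' and ']' are no longer confused with outer syntax.
encoded_body = urlencode({"dataset_uri": dataset_uri, "target": target})
encoded_params = parse_qs(encoded_body)

if __name__ == "__main__":
    print("unencoded keys:", sorted(raw_params))
    print("encoded keys:  ", sorted(encoded_params))
    print("round trip ok: ", encoded_params["dataset_uri"][0] == dataset_uri)
```

With the unencoded body the parser sees feature_uris[] as a parameter of the outer request and truncates dataset_uri after the first feature. With the encoded body only dataset_uri and target appear, and the decoded dataset_uri is the full URI again, so any URI (including the similarity query) can be passed unchanged as long as the receiving service decodes form values before dereferencing them.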