[OTDev] Significant milestone reached -- MLR model training

Vedrin Jeliazkov vedrin.jeliazkov at gmail.com
Fri Jan 1 18:44:37 CET 2010


Hi Pantelis,

2009/12/31 chung <chvng at mail.ntua.gr>:

>  The command:
>
> curl -v http://opentox.ntua.gr:3000/model/20761/predicted
>
> returns the predicted uri:
>
> http://ambit.uni-plovdiv.bg:8080/ambit2/feature/http%3A%2F%2Fsomeserver.com%2Ffeature%2F101Default
>
> But then it seems that this resource returns a status code 404 - not
> found. This URI was returned by a post on your feature creation service.
> Could you take a look at that?

I've just checked that the above-mentioned URI returns code 200 and
the associated RDF when accessed by curl, FF or IE:

D:\curl-7.19.6-ssl-sspi-zlib-static-bin-w32>curl -iv http://ambit.uni-plovdiv.bg:8080/ambit2/feature/http%3A%2F%2Fsomeserver.com%2Ffeature%2F101Default
* About to connect() to ambit.uni-plovdiv.bg port 8080 (#0)
*   Trying 194.141.27.28... connected
* Connected to ambit.uni-plovdiv.bg (194.141.27.28) port 8080 (#0)
> GET /ambit2/feature/http%3A%2F%2Fsomeserver.com%2Ffeature%2F101Default HTTP/1.1
> User-Agent: curl/7.19.6 (i386-pc-win32) libcurl/7.19.6 OpenSSL/0.9.8k zlib/1.2.3
> Host: ambit.uni-plovdiv.bg:8080
> Accept: */*
>
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< Server: Apache-Coyote/1.1
Server: Apache-Coyote/1.1
< Date: Fri, 01 Jan 2010 16:49:16 GMT
Date: Fri, 01 Jan 2010 16:49:16 GMT
< Vary: Accept-Charset, Accept-Encoding, Accept-Language, Accept
Vary: Accept-Charset, Accept-Encoding, Accept-Language, Accept
< Accept-Ranges: bytes
Accept-Ranges: bytes
< Server: Restlet-Framework/2.0m6
Server: Restlet-Framework/2.0m6
< Content-Type: application/rdf+xml;charset=UTF-8
Content-Type: application/rdf+xml;charset=UTF-8
< Transfer-Encoding: chunked
Transfer-Encoding: chunked

<
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns:j.0="http://purl.org/net/nknouf/ns/bibtex#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" >

[RDF contents skipped]

> Note that the same RDF representation for feature, when posted to your
> feature creation service, sometimes returns a valid URI for the created
> features while once every now and then the above URI is returned and no
> feature seems to be created. Nina made some amendments including the
> upgrade from Restlet m3 to m6 which improved the performance of the
> service (such issues have become much more rare) but it seems there is
> still a problem.

Yes indeed, there are some further problems that we still have to fix
(please see below); however, they are not due to the Restlet POST bug,
which we have hopefully solved with the upgrade from 2.0M3 to 2.0M6.

> Another issue is that curl
> http://opentox.ntua.gr:3000/model/20767/predicted returns the URI
> http://ambit.uni-plovdiv.bg:8080/ambit2/feature/29065 which is not
> included in the uri-list at
> http://ambit.uni-plovdiv.bg:8080/ambit2/feature
>
> Check out:
>
> curl -H 'Accept:text/uri-list' http://ambit.uni-plovdiv.bg:8080/ambit2/feature | grep 29065

Well, you're both right and wrong. We have defined a default maximum
number of returned resources, which is currently set to 100. The
rationale is to avoid overloading our development server with queries
that could return unexpectedly large responses (e.g. consider the
EINECS dataset, which has more than 100000 records...). This default
limit can be overridden by adding a max=<some number> parameter to the
URI, which makes it possible to retrieve the full uri-list in this
particular case:

curl -H 'Accept:text/uri-list' http://ambit.uni-plovdiv.bg:8080/ambit2/feature?max=100000 | grep 29065
http://ambit.uni-plovdiv.bg:8080/ambit2/feature/29065

It might be worth mentioning that:

1) we have to document this URI parameter better;
2) we should perhaps consider applying such a policy only to a subset
of URIs, in particular avoiding any limits for uri-lists;
3) we're planning to set up a (more scalable) production server by the
end of Feb 2010 and might revise this policy or remove it altogether
at that time.
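
As a rough illustration of the max parameter, here is a minimal Python
sketch of the same lookup. The helper names (with_max, fetch_uri_list,
contains_uri) are hypothetical and not part of the AMBIT service; only
the base URI, the Accept header and the max parameter come from the
commands above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import Request, urlopen

FEATURE_LIST = "http://ambit.uni-plovdiv.bg:8080/ambit2/feature"


def with_max(uri, max_results):
    """Return uri with its 'max' query parameter set to max_results."""
    parts = urlsplit(uri)
    # Drop any existing 'max' so the parameter is not sent twice.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "max"]
    query.append(("max", str(max_results)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def fetch_uri_list(uri):
    """Fetch a resource as text/uri-list (performs a network request)."""
    request = Request(uri, headers={"Accept": "text/uri-list"})
    with urlopen(request) as response:
        return response.read().decode("utf-8")


def contains_uri(uri_list_text, wanted):
    """Exact-match lookup in a uri-list body (one URI per line)."""
    for line in uri_list_text.splitlines():
        line = line.strip()
        # Lines starting with '#' are comments in the text/uri-list format.
        if line and not line.startswith("#") and line == wanted:
            return True
    return False
```

Calling contains_uri(fetch_uri_list(with_max(FEATURE_LIST, 100000)),
FEATURE_LIST + "/29065") would then play the role of the curl | grep
pipeline. Unlike grep 29065, the exact comparison does not also match
longer ids that merely contain 29065.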

> P.S. See the following for reference:
>
> A. Buggy:
> curl http://opentox.ntua.gr:3000/model/20761/predicted
> curl http://opentox.ntua.gr:3000/model/20763/predicted
> curl http://opentox.ntua.gr:3000/model/20764/predicted
> curl http://opentox.ntua.gr:3000/model/20765/predicted
>
> B. Correct:
> curl http://opentox.ntua.gr:3000/model/20766/predicted
> curl http://opentox.ntua.gr:3000/model/20767/predicted
> curl http://opentox.ntua.gr:3000/model/20760/predicted
> curl http://opentox.ntua.gr:3000/model/20762/predicted

The problem here is even more subtle. When our service receives a
feature POST request, it first checks whether this particular feature
already exists in the database. If it does, the service returns a
URI like those in the "correct" set, pointing to the existing feature
(e.g. http://ambit.uni-plovdiv.bg:8080/ambit2/feature/29064). If the
feature doesn't exist, then the service creates it and returns a URI
like those in the "buggy" set (e.g.
http://ambit.uni-plovdiv.bg:8080/ambit2/feature/http%3A%2F%2Fsomeserver.com%2Ffeature%2F101Default).
In fact both URIs are correct (they point to the relevant resource);
however, there's still one big problem: all of the above-mentioned
features probably should have been recognized as identical (because
they've been generated by identical operations, run by SmokePing), so
perhaps only one feature should have been created in the database and
returned to all subsequent POST requests. In this sense all of the
above-mentioned URIs could be considered buggy. The difference is that
for those from the second set, some features have been found to be
identical only by chance...
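
To make the intended behaviour concrete, here is a toy Python sketch of
the check-then-create logic in which identical POSTs always resolve to
one feature. It is not AMBIT code: the FeatureStore class, the in-memory
dictionary and, above all, the identity criterion (title, sameAs and
units compared after trimming whitespace) are assumptions made only for
illustration.

```python
class FeatureStore:
    """Toy in-memory stand-in for the feature table (hypothetical)."""

    def __init__(self, base_uri):
        self.base_uri = base_uri
        self._ids_by_key = {}
        self._next_id = 1

    @staticmethod
    def identity_key(title, same_as, units=""):
        # Assumption: two features are identical when these three fields
        # match after trimming whitespace. The real criterion may differ.
        return tuple(part.strip() for part in (title, same_as, units))

    def post(self, title, same_as, units=""):
        """Return (uri, created); create the feature only if it is new."""
        key = self.identity_key(title, same_as, units)
        created = key not in self._ids_by_key
        if created:
            self._ids_by_key[key] = self._next_id
            self._next_id += 1
        # Existing and newly created features get the same URI form.
        return "%s/%d" % (self.base_uri, self._ids_by_key[key]), created
```

With such a scheme, repeated identical requests would all receive the
URI of the single feature created by the first one, regardless of
whether the feature was found or had to be created.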

In order to solve this issue it would be very helpful if you could
send us an example RDF for the feature POST request you're sending
and/or the code that generates it.

Last but not least, it would be nice if you could put some relevant
value in the dc:title property. When this value is absent (as it is
currently in your feature POST requests), we assign the RDF node id
as the feature name. This is a bug that we're going to fix (we'll
assign the sameAs URI you're providing instead). However, for the user
interface it would be much better to have an appropriate dc:title
value.
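
For illustration only, a feature description carrying a dc:title could
be generated along these lines in Python. The namespace URIs are the
ones shown in the RDF header above; the element layout (an ot:Feature
node with dc:title and owl:sameAs children), the function name and the
example values are assumptions, not a statement of what your client
sends or what our service requires.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ot": "http://www.opentox.org/api/1.1#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
# Keep readable prefixes in the serialized XML instead of ns0, ns1, ...
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def qname(prefix, name):
    """Build the {namespace}name form that ElementTree expects."""
    return "{%s}%s" % (NS[prefix], name)


def feature_rdf(title, same_as):
    """Serialize a feature with a human-readable dc:title (assumed layout)."""
    root = ET.Element(qname("rdf", "RDF"))
    feature = ET.SubElement(root, qname("ot", "Feature"))
    ET.SubElement(feature, qname("dc", "title")).text = title
    ET.SubElement(feature, qname("owl", "sameAs"),
                  {qname("rdf", "resource"): same_as})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder values; a real request would use a meaningful title.
    print(feature_rdf("Example predicted value",
                      "http://example.org/descriptor/1"))
```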

Kind regards,
Vedrin


