[OTDev] Significant milestone reached -- MLR model training

chung chvng at mail.ntua.gr
Fri Jan 1 01:06:31 CET 2010


Hello Vedrin,

On Fri, 2010-01-01 at 19:44 +0200, Vedrin Jeliazkov wrote:
> Hi Pantelis,
> 
> 2009/12/31 chung <chvng at mail.ntua.gr>:
> 
> >  The command:
> >
> > curl -v http://opentox.ntua.gr:3000/model/20761/predicted
> >
> > returns the predicted uri:
> >
> > http://ambit.uni-plovdiv.bg:8080/ambit2/feature/http%3A%2F%2Fsomeserver.com%2Ffeature%2F101Default
> >
> > But then it seems that this resource returns a status code 404 - not
> > found. This URI was returned by a post on your feature creation service.
> > Could you take a look at that?
> 
> I've just checked that the above mentioned URI returns code 200 and
> the associated rdf when accessed either by curl, FF or IE:
> 
> D:\curl-7.19.6-ssl-sspi-zlib-static-bin-w32>curl -iv http://ambit.uni-plovdiv.bg:8080/ambit2/feature/http%3A%2F%2Fsomeserver.com%2Ffeature%2F101Default
> * About to connect() to ambit.uni-plovdiv.bg port 8080 (#0)
> *   Trying 194.141.27.28... connected
> * Connected to ambit.uni-plovdiv.bg (194.141.27.28) port 8080 (#0)
> > GET /ambit2/feature/http%3A%2F%2Fsomeserver.com%2Ffeature%2F101Default HTTP/1.1
> > User-Agent: curl/7.19.6 (i386-pc-win32) libcurl/7.19.6 OpenSSL/0.9.8k zlib/1.2.3
> > Host: ambit.uni-plovdiv.bg:8080
> > Accept: */*
> >
> < HTTP/1.1 200 OK
> HTTP/1.1 200 OK
> < Server: Apache-Coyote/1.1
> Server: Apache-Coyote/1.1
> < Date: Fri, 01 Jan 2010 16:49:16 GMT
> Date: Fri, 01 Jan 2010 16:49:16 GMT
> < Vary: Accept-Charset, Accept-Encoding, Accept-Language, Accept
> Vary: Accept-Charset, Accept-Encoding, Accept-Language, Accept
> < Accept-Ranges: bytes
> Accept-Ranges: bytes
> < Server: Restlet-Framework/2.0m6
> Server: Restlet-Framework/2.0m6
> < Content-Type: application/rdf+xml;charset=UTF-8
> Content-Type: application/rdf+xml;charset=UTF-8
> < Transfer-Encoding: chunked
> Transfer-Encoding: chunked
> 
> <
> <rdf:RDF
>     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>     xmlns:ot="http://www.opentox.org/api/1.1#"
>     xmlns:j.0="http://purl.org/net/nknouf/ns/bibtex#"
>     xmlns:owl="http://www.w3.org/2002/07/owl#"
>     xmlns:dc="http://purl.org/dc/elements/1.1/"
>     xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
>     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" >
> 
> [RDF contents skipped]

Sorry, you're right. I probably mistyped the command!
> 
> > Note that the same RDF representation for feature, when posted to your
> > feature creation service, sometimes returns a valid URI for the created
> > features while once every now and then the above URI is returned and no
> > feature seems to be created. Nina made some amendments including the
> > upgrade from Restlet m3 to m6 which improved the performance of the
> > service (such issues have become much more rare) but it seems there is
> > still a problem.
> 
> Yes indeed, there are some further problems that we still have to fix
> (please see below), however they are not due to the restlet POST bug,
> which we have hopefully solved with the upgrade from 2.0M3 to 2.0M6.
> 
> > Another issue is that curl
> > http://opentox.ntua.gr:3000/model/20767/predicted returns the URI
> > http://ambit.uni-plovdiv.bg:8080/ambit2/feature/29065 which is not
> > included in the uri-list at
> > http://ambit.uni-plovdiv.bg:8080/ambit2/feature
> >
> > Check out:
> >
> > curl -H 'Accept:text/uri-list' http://ambit.uni-plovdiv.bg:8080/ambit2/feature | grep 29065
> 
> Well, you're both right and wrong. What happens here is that we have
> defined a default maximum number of returned resources, which is
> currently set to 100. The rationale is that we're trying to avoid
> overloading our development server with queries which could return
> unexpectedly large responses (e.g. consider the EINECS dataset, which
> has more than 100000 records...). This default limit can be further
> tuned by adding a max=<some number> parameter to the URI, which would
> help retrieving the full uri-list in this particular case:
> 
> curl -H 'Accept:text/uri-list' http://ambit.uni-plovdiv.bg:8080/ambit2/feature?max=100000 | grep 29065
> http://ambit.uni-plovdiv.bg:8080/ambit2/feature/29065
> 
> It might be worth mentioning that:
> 
> 1) we have to better document this URI parameter;
> 2) we should perhaps consider applying such policy only to a subset of
> URIs, in particular avoiding any limits for uri-lists;
> 3) we're planning to set up a (more scalable) production server by the
> end of Feb 2010 and might revise this policy or remove it altogether
> at that time;
> 

It's true, I didn't know that this parameter existed.
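
For reference, here is a minimal sketch of how a client could add the
max parameter before requesting the uri-list and then look for a feature
URI in the response body. The class and method names (UriListCheck,
withMax, containsUri) and the sample body are hypothetical; only the max
parameter and the text/uri-list format come from the discussion above.

```java
import java.util.Arrays;

// Hypothetical helper: only the `max` query parameter and the
// text/uri-list media type are taken from the thread.
class UriListCheck {

    /** Appends max=<limit> to a URI, respecting an existing query string. */
    static String withMax(String uri, int limit) {
        String separator = uri.contains("?") ? "&" : "?";
        return uri + separator + "max=" + limit;
    }

    /** Returns true if the uri-list body holds the wanted URI on a line of its own. */
    static boolean containsUri(String uriListBody, String wanted) {
        return Arrays.stream(uriListBody.split("\\r?\\n"))
                .map(String::trim)
                .filter(line -> !line.startsWith("#")) // skip uri-list comment lines
                .anyMatch(line -> line.equals(wanted));
    }

    public static void main(String[] args) {
        String base = "http://ambit.uni-plovdiv.bg:8080/ambit2/feature";
        System.out.println(withMax(base, 100000));
        // Made-up sample body standing in for the real service response.
        String body = base + "/29064\r\n" + base + "/29065\r\n";
        System.out.println(containsUri(body, base + "/29065"));
    }
}
```

Unlike `grep 29065`, the exact line comparison does not match longer
identifiers that merely contain the same digits.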

> > P.S. See the following for reference:
> >
> > A. Buggy:
> > curl http://opentox.ntua.gr:3000/model/20761/predicted
> > curl http://opentox.ntua.gr:3000/model/20763/predicted
> > curl http://opentox.ntua.gr:3000/model/20764/predicted
> > curl http://opentox.ntua.gr:3000/model/20765/predicted
> >
> > B. Correct:
> > curl http://opentox.ntua.gr:3000/model/20766/predicted
> > curl http://opentox.ntua.gr:3000/model/20767/predicted
> > curl http://opentox.ntua.gr:3000/model/20760/predicted
> > curl http://opentox.ntua.gr:3000/model/20762/predicted
> 
> The problem here is even more subtle. When our service receives a
> feature POST request it first checks whether this particular feature
> already exists in the database. In case that it exists it returns a
> URI like those in the "correct" set, pointing to the existing feature
> (e.g. http://ambit.uni-plovdiv.bg:8080/ambit2/feature/29064). In case
> that the feature doesn't exist, then it creates it and returns a URI
> like those in the "buggy" set (e.g.
> http://ambit.uni-plovdiv.bg:8080/ambit2/feature/http%3A%2F%2Fsomeserver.com%2Ffeature%2F101Default).
> In fact both URIs are correct (they point to the relevant resource),
> however there's still one big problem. It consists in the fact that
> all these above mentioned features probably should have been
> recognized as identical (because they've been generated by identical
> operations, run by SmokePing) and perhaps only one feature should have
> been created in the database and returned to all subsequent POST
> requests. So in this sense all of the above mentioned URIs could be
> considered buggy. The difference is that for those from the second
> set, some features have been found to be identical only by chance...
> 
> In order to solve this issue it would be very helpful if you could
> send us an example RDF for the feature POST request you're sending
> and/or the code that generates it.
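
To picture the lookup-or-create behaviour described above, here is a
minimal in-memory sketch in which identical POST requests always receive
the same feature URI. It is not the AMBIT implementation: the class
FeatureRegistry, the choice of the owl:sameAs value as the identity key,
and the numeric URI scheme are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a feature service, not the actual AMBIT code.
class FeatureRegistry {
    // Assumption: two features are identical when their owl:sameAs values match.
    private final Map<String, String> uriBySameAs = new LinkedHashMap<>();
    private final String baseUri;
    private int nextId = 1;

    FeatureRegistry(String baseUri) {
        this.baseUri = baseUri;
    }

    /** Returns the URI of the feature with this sameAs value, creating it on the first POST. */
    synchronized String post(String sameAs) {
        return uriBySameAs.computeIfAbsent(sameAs, key -> baseUri + "/" + nextId++);
    }

    /** Number of distinct features stored so far. */
    synchronized int size() {
        return uriBySameAs.size();
    }

    public static void main(String[] args) {
        FeatureRegistry registry = new FeatureRegistry("http://example.org/feature");
        String first = registry.post("http://other.com/feature/200");
        String second = registry.post("http://other.com/feature/200");
        // Both requests resolve to one stored feature.
        System.out.println(first.equals(second) + " " + registry.size());
    }
}
```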

Here is the code that generates the feature representation and posts it
to your server:

    /**
     * Generates a new Feature and POSTs it to a feature service.
     * @param sameAs Declares a same-as relationship between this feature
     * and some other feature.
     * @param featureService Some feature service where the generated
     * feature should be stored.
     * @return The response of the feature service to the request for
     * feature creation.
     */
    public Response createNewFeature(String sameAs, URI featureService)
            throws ResourceException, IOException {
        // Serialize the feature as RDF/XML into an in-memory buffer.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Feature feature = new Feature();
        feature.createNewFeature(sameAs, out);
        Representation featureToPost = new StringRepresentation(out.toString());
        featureToPost.setMediaType(MediaType.APPLICATION_RDF_XML);
        Client cli = new Client(Protocol.HTTP);
        int n_RETRY = 5, i = 0;
        boolean success = false;
        Response response = new Response(null);

        // Repeat the POST until the service answers 200 OK, at most n_RETRY times.
        while (!success && i < n_RETRY) {
            response = cli.post(featureService.toString(), featureToPost);
            success = (response.getStatus().equals(Status.SUCCESS_OK));
            i++;
        }

        return response;
    }

I attach an example of such a feature.

> 
> Last but not least, it would be nice if you could put some relevant
> value in the dc:title property. In cases when this value is absent (as
> it is currently in your feature POST requests), we assign the RDF node
> id as feature name. This is a bug that we're going to fix (we'll
> assign the sameAs URI you're providing instead). However, for the user
> interface it would be much better to have an appropriate dc:title
> value.
> 

I think an acceptable solution would be to set the dc:title of the
predicted feature equal to the dc:title of the model's dependent
variable, because the two features are closely related. If the dependent
feature doesn't have a title or cannot be located, we can name the
predicted feature after the model for which it was generated, e.g.
<dc:title>model-123-predictedFeature</dc:title>.
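
A minimal sketch of this fallback rule, assuming the title of the
dependent feature has already been looked up (null when it is missing or
the feature cannot be located). The class name PredictedFeatureTitle and
the method titleFor are hypothetical.

```java
// Hypothetical helper illustrating the proposed naming rule.
class PredictedFeatureTitle {

    /**
     * Returns the dc:title to use for a predicted feature.
     * dependentTitle is the dc:title of the model's dependent variable,
     * or null if it is absent or the feature could not be located.
     */
    static String titleFor(String dependentTitle, String modelId) {
        if (dependentTitle != null && !dependentTitle.trim().isEmpty()) {
            return dependentTitle.trim();
        }
        // Fall back to a name derived from the model that produced the feature.
        return "model-" + modelId + "-predictedFeature";
    }

    public static void main(String[] args) {
        System.out.println(titleFor(null, "123"));
    }
}
```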

Regards,
Pantelis

> Kind regards,
> Vedrin
> _______________________________________________
> Development mailing list
> Development at opentox.org
> http://www.opentox.org/mailman/listinfo/development
> 

-------------- next part --------------
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" > 
  <rdf:Description rdf:about="http://purl.org/dc/elements/1.1/creator">
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#AnnotationProperty"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://other.com/feature/200">
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://purl.org/dc/elements/1.1/identifier">
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#AnnotationProperty"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://www.opentox.org/api/1.1#Feature">
    <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Class"/>
  </rdf:Description>
  <rdf:Description rdf:nodeID="A0">
    <owl:sameAs rdf:resource="http://other.com/feature/200"/>
    <dc:identifier rdf:datatype="http://www.w3.org/2001/XMLSchema#string">http://opentox.ntua.gr/feature/10000</dc:identifier>
    <dc:creator rdf:datatype="http://www.w3.org/2001/XMLSchema#string">http://opentox.ntua.gr</dc:creator>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </rdf:Description>
</rdf:RDF>

