[OTDev] Significant milestone reached -- MLR model training

Vedrin Jeliazkov vedrin.jeliazkov at gmail.com
Thu Dec 31 20:41:48 CET 2009


Hi Pantelis,

I've just finished a more thorough experiment, which involved running
10 concurrent instances of the following bash script:

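#!/bin/bash

# Send 1000 sequential MLR training requests and append each full
# response (headers included, because of -i) to curl.log.
for i in $(seq 1 1000); do
    curl -i -X POST \
        -d 'dataset_uri=http://opentox.ntua.gr/ds.rdf&target=http://sth.com/feature/1' \
        http://opentox.ntua.gr:3000/algorithm/mlr >> curl.log
done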
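
For anyone who wants to repeat the run, here is a minimal sketch of one way to start several copies of a script at once. The helper name run_concurrent and the file name ./train_mlr.sh are assumptions made for illustration; the message does not say how the 10 instances were actually launched.

```shell
# Sketch: start N copies of a command in the background and wait for all
# of them. run_concurrent is a hypothetical helper, not part of the
# original test setup.
run_concurrent() {
    local n="$1" i=1
    shift
    while [ "$i" -le "$n" ]; do
        "$@" &              # each copy runs as a background job
        i=$(( i + 1 ))
    done
    wait                    # block until every background job has exited
}

# Example (assumes the loop script is saved as ./train_mlr.sh):
#   run_concurrent 10 ./train_mlr.sh
```

Since all copies append to the same curl.log, output from different instances may interleave; writing one log file per instance would be a possible way around that.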

Each of the script instances should have created 1000 models, so the
expected grand total is 10000. The total running time was
approximately 2329 seconds (between Thu, 31 Dec 2009 17:51:22 GMT and
Thu, 31 Dec 2009 18:30:11 GMT).
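
As a quick check of the arithmetic, the sketch below recomputes the elapsed time from the two timestamps and derives the aggregate request rate, which works out to roughly 4.3 requests per second. The helper name hms_to_seconds is an assumption, and the calculation relies on both timestamps falling on the same day.

```shell
# Convert an HH:MM:SS time of day to seconds since midnight.
# ${h#0} strips one leading zero so values such as "08" are not
# misread as octal by the shell's arithmetic.
hms_to_seconds() {
    echo "$1" | {
        IFS=: read -r h m s
        echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
    }
}

start=$(hms_to_seconds 17:51:22)
end=$(hms_to_seconds 18:30:11)
elapsed=$(( end - start ))      # valid because both times are on the same day
requests=10000                  # 10 instances x 1000 requests each

echo "elapsed: ${elapsed} s"
# Aggregate request rate across all instances
awk -v n="$requests" -v t="$elapsed" \
    'BEGIN { printf "throughput: %.2f requests/s\n", n / t }'
```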

The results are as follows:

Response code      Number of responses

HTTP/1.1 200       9915
HTTP/1.1 400          5
HTTP/1.1 500          1
Empty response       79
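
A tally like this can be derived from curl.log with standard tools. The sketch below is one possible way to do it, not necessarily how the numbers above were obtained. The helper name tally_status is hypothetical, and it assumes that every answered request left exactly one status line at the start of a line in the log.

```shell
# Sketch: count HTTP status lines in a log written by `curl -i ... >> curl.log`.
# tally_status is a hypothetical helper: $1 is the log file, $2 is the
# number of requests that were sent.
tally_status() {
    local log="$1" expected="$2" answered
    # A status line looks like "HTTP/<version> <3-digit code> ...".
    grep -oE '^HTTP/[0-9.]+ [0-9]{3}' "$log" | sort | uniq -c
    answered=$(grep -cE '^HTTP/[0-9.]+ [0-9]{3}' "$log")
    # Requests that left no status line are counted as empty responses.
    echo "Empty response: $(( expected - answered ))"
}
```

For the run described here the call would be `tally_status curl.log 10000`. If output from concurrent instances interleaved in the log, some status lines may not start at the beginning of a line and would be missed by this pattern.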

The associated error reports for HTTP/1.1 400 are like the one below:

HTTP/1.1 400 The request could not be understood by the server due to
malformed syntax
Content-Type: text/plain
Date: Thu, 31 Dec 2009 17:58:49 GMT
Accept-Ranges: bytes
Server: Noelios-Restlet/2.0m3
Connection: close
Transfer-Encoding: chunked

Error Report.
TimeStamp: Thu Dec 31 12:58:49 EST 2009

Error #1
Exception Details: java.lang.NullPointerException
Explanation: Probably this exception is thrown because the dataset or
target uri you provided is not valid or some other internal server
error happened! Please verify that the target uri you specified is an
attribute of the dataset and is not of type 'string'!
For debugging reasons we provide a brief list of the exceptions:
- org.opentox.algorithm.trainer.MlrTrainer.train(MlrTrainer.java:81)
- org.opentox.resource.Algorithm.post(Algorithm.java:143)
- org.restlet.resource.ServerResource.doHandle(ServerResource.java:340)
- org.restlet.resource.ServerResource.doNegotiatedHandle(ServerResource.java:592)
- org.restlet.resource.ServerResource.doConditionalHandle(ServerResource.java:260)

The associated error report for HTTP/1.1 500 is:

HTTP/1.1 500 The server encountered an unexpected condition which
prevented it from fulfilling the request
Content-Type: text/plain
Date: Thu, 31 Dec 2009 18:15:34 GMT
Accept-Ranges: bytes
Server: Noelios-Restlet/2.0m3
Connection: close
Transfer-Encoding: chunked

Error Report.
TimeStamp: Thu Dec 31 13:15:34 EST 2009

Error #1
Exception Details: com.hp.hpl.jena.shared.PropertyNotFoundException:
http://www.opentox.org/api/1.1#value
Explanation: Severe Error while trying to build an MLR model.
For debugging reasons we provide a brief list of the exceptions:
- com.hp.hpl.jena.rdf.model.impl.ModelCom.getRequiredProperty(ModelCom.java:992)
- com.hp.hpl.jena.rdf.model.impl.ResourceImpl.getRequiredProperty(ResourceImpl.java:145)
- com.hp.hpl.jena.rdf.model.impl.StatementImpl.getProperty(StatementImpl.java:89)
- org.opentox.rdf.Dataset.getWekaDatasetForTraining(Dataset.java:295)
- org.opentox.algorithm.trainer.MlrTrainer.train(MlrTrainer.java:80)

In summary, there appear to be three distinct problems: the
NullPointerException behind the HTTP 400 responses, the
PropertyNotFoundException behind the HTTP 500 response, and the empty
responses. They seem to occur more often when the model service is
under heavier load, and they seem to be independent of the dataset and
feature services in use (in this particular case the dataset and
features were local to the service). The issues might share a common
cause, but I can only guess what it is. I'm also attaching the full
log in case you would like to compare it against the model service log.
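
Because an empty reply leaves nothing in curl.log, one possible refinement for a future run is to record curl's exit status for each request. The sketch below is an assumption about how that could be done; describe_curl_exit and curl-status.log are hypothetical names. The exit codes themselves are curl's documented ones (52 means the server returned nothing).

```shell
# Sketch: map a curl exit status to a short label.
# describe_curl_exit is a hypothetical helper.
describe_curl_exit() {
    case "$1" in
        0)  echo "ok" ;;
        7)  echo "failed to connect" ;;
        28) echo "operation timed out" ;;
        52) echo "empty reply from server" ;;
        *)  echo "curl error $1" ;;
    esac
}

# Inside the request loop, directly after the curl call:
#   rc=$?
#   echo "request $i: $(describe_curl_exit "$rc")" >> curl-status.log
```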

Hope this helps!

Kind regards,
Vedrin

2009/12/30 chung <chvng at mail.ntua.gr>:
> Hi Vedrin,
>  Could you please perform that scalability test with some concurrent
> requests, using the following command:
>
> curl -X POST -d \
> 'dataset_uri=http://opentox.ntua.gr/ds.rdf&target=http://sth.com/feature/1' http://opentox.ntua.gr:3000/algorithm/mlr
>
> I would like to know if you get an empty reply from the server (curl
> exit code 52) or if any other problems arise. I tried that at home using
> my low-speed connection, so I couldn't flood the server with requests.
>
> * We could use 'ab' to perform scalability tests; it provides some
> useful information. An example of use is:
>
> ab -n 50000 -c 5  http://147.102.82.32:3000/algorithm/mlr
>
> This will perform 50000 requests to the server (5 at a time).
>
> Best Regards,
> Pantelis
>
> On Thu, 2009-12-31 at 16:43 +0200, Vedrin Jeliazkov wrote:
>> Hi Pantelis, All,
>>
>> 2009/12/30 chung <chvng at mail.ntua.gr>:
>>
>> > When my RDF parser detects an empty string or a value that should be
>> > numeric but cannot be cast as such, it overrides it and treats it
>> > as missing. I'll try to fix that - I just need one more check in a
>> > critical step of the RDF parsing, so I think it won't be very difficult.
>> > The message "Can't Handle class attribute" (because the class attribute
>> > is of type String) appears because there is some non-numeric entry for
>> > the specified target feature which characterizes the whole feature as
>> > String.
>>
>> I just want to clarify that this error occurs (very) rarely and is
>> difficult to reproduce. It might show up only during periods of
>> high load on the services (e.g. when they receive 10 consecutive
>> requests for model training), though I'm not sure whether high load is
>> the real trigger. However, it is important to keep in mind that the
>> error occurs with the same dataset/feature URIs for which model
>> training succeeds in the majority of cases. So, perhaps, in this
>> particular case, it is not due to any persistent (source) data issues,
>> but rather has something to do with the way data is transmitted and/or
>> processed at lower levels. The POST under load issue that Nina has
>> solved with code migration to Noelios-Restlet/2.0m6 might be the
>> culprit (I've noticed you're still using Noelios-Restlet/2.0m3),
>> however this is just a (very) wild guess...
>>
>> > Many issues will be fixed if we establish datatype declarations
>> > for features.
>>
>> Agreed -- in any case this is something we should discuss and decide
>> upon during the next virtual meeting.
>>
>> Best regards,
>> Vedrin
>> _______________________________________________
>> Development mailing list
>> Development at opentox.org
>> http://www.opentox.org/mailman/listinfo/development
>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: curl.log.gz
Type: application/x-gzip
Size: 54888 bytes
Desc: not available
URL: <http://lists.opentox.org/pipermail/development/attachments/20091231/a6de202d/attachment.bin>

