[OTDev] OWL-DL performance/scalability problems
Christoph Helma helma at in-silico.ch
Fri Sep 3 15:37:07 CEST 2010
- Previous message: [OTDev] How to upload a dataset ...
- Next message: [OTDev] OWL-DL performance/scalability problems
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Dear all,

I have been investigating several problems that we had with creating and serving OWL-DL representations:

- slow responses
- gateway timeouts
- memory allocation problems

All of these problems depend, of course, on the size and complexity of the datasets. Most problematic are datasets with tuples; here we run into trouble even for medium-sized datasets (several hundred compounds) with fewer than 100 features. It took e.g. 20 minutes to create http://webservices.in-silico.ch/dataset/112.rdf. If datasets grow larger, we may run into memory allocation problems.

All of this can be quite annoying, because

- long-running processes eat CPU time, slowing down other processes
- tasks may time out before processes have finished
- users expect a response without getting one
- users get impatient and restart processes, which slows down the system even more
- memory allocation failures may crash the dataset service
- ...

What is probably _not_ responsible:

- The RDF/XML representation: the same problem occurs for Turtle, JSON and triples
- Iteration over our internal data structures: takes only 0.3% of the total processing time
- The Redland libraries: I have tried another library (not too many choices in Ruby); it takes 5 times longer than Redland

What _could_ be responsible:

- A wrong/inefficient OWL-DL representation: could one of the OWL experts please have a look at e.g. http://webservices.in-silico.ch/dataset/112.rdf?
- The OpenTox OWL-DL/triple representation itself. Symptoms: our internal representation (http://webservices.in-silico.ch/dataset/112.yaml) needs 90K (while still keeping redundant information for efficient searches), whereas OWL-DL as RDF/XML needs 15M (still 6.1M in Turtle) for the same information.
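As a rough sanity check, the blow-up factor can be computed directly from those file sizes (a minimal Ruby sketch; the byte figures are the ones quoted above, everything else is illustrative):

```ruby
# Compute how many times larger one serialization is than another.
def size_ratio(big_bytes, small_bytes)
  (big_bytes.to_f / small_bytes).round(1)
end

# Figures quoted above for dataset/112: ~15M as RDF/XML vs ~90K as YAML.
rdfxml_bytes = 15 * 1024 * 1024
yaml_bytes   = 90 * 1024
puts size_ratio(rdfxml_bytes, yaml_bytes)  # roughly 170, matching the tuple-dataset ratio below
```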
I have the impression that our OWL-DL does not scale well, especially for tuples. Here are some measured figures for the rdf/yaml size ratio (maybe one of the computer scientists can have a closer look):

- small dataset (85 compounds), 1 feature/compound: 6.5
- medium dataset (580 compounds), 1 feature/compound: 7.4
- small dataset (85 compounds), 56 features as tuples: 32
- medium dataset (580 compounds), 55 features as tuples: 170

Possible solutions:

Curing the symptoms:

- Lazy generation/caching of OWL-DL representations: implemented, but you might still get timeouts at the first request or have to wait a long time for the OWL-DL to finish; this does not solve the memory allocation problems
- Use a persistent store instead of the in-memory store: might solve the memory allocation problems, but will slow things down even further
- Get faster hardware

Curing the cause (I am at a loss here, please help):

- Tell us what goes wrong in our OWL-DLs
- Improve the scalability of the OpenTox OWL-DL, especially with respect to tuples (I definitely need a method to represent "complex" features)

IMHO it does not make much sense to proceed with further developments until we have resolved this substantial issue. I am looking forward to hearing your ideas!

Best regards,
Christoph

PS: Martin mentioned that he has also experienced performance problems in accessing the parsed OWL-DL data structure (parsing the file itself seems to be OK) - also for external (i.e. non-IST/ALU) datasets. I have always blamed the Redland libraries, but maybe this is a related issue.
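For the archives, here is a minimal Ruby sketch of the lazy-generation/caching idea mentioned under "curing the symptoms". The class and method names (Dataset, #rdfxml, #expensive_serialization) are purely illustrative, not the actual OpenTox API: the point is just that the expensive serialization runs once per dataset version and is served from a file cache afterwards.

```ruby
require 'digest'
require 'tmpdir'

# Illustrative dataset with a cached OWL-DL (RDF/XML) representation.
class Dataset
  def initialize(id, content)
    @id = id
    @content = content
  end

  # Serialize once, cache the result on disk, and reuse it until the
  # dataset content changes (the content hash is part of the cache key).
  def rdfxml
    key = Digest::SHA1.hexdigest(@content)
    cache = File.join(Dir.tmpdir, "dataset-#{@id}-#{key}.rdf")
    return File.read(cache) if File.exist?(cache)
    rdf = expensive_serialization
    File.write(cache, rdf)
    rdf
  end

  private

  # Stand-in for the slow OWL-DL serialization step.
  def expensive_serialization
    "<rdf:RDF>#{@content}</rdf:RDF>"
  end
end
```

Note that this only hides the cost: the first request still pays the full serialization time, which is why it does not help with timeouts on freshly created datasets or with the memory problems.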