[OTDev] OWL-DL performance/scalability problems
Nina Jeliazkova jeliazkova.nina at gmail.com
Fri Sep 3 16:22:32 CEST 2010
Christoph,

In Jena one can set options for the triple memory storage, e.g.
ModelFactory.createOntologyModel(OntModelSpec):
http://jena.sourceforge.net/javadoc/com/hp/hpl/jena/ontology/OntModelSpec.html

The OntModelSpec variants differ in memory efficiency and reasoning
capabilities. Perhaps Redland has something similar to use?

The dataset size you are reporting seems rather small; on the other hand,
in-memory storage has limits in any representation, just the boundaries
are different.

Nina

On Fri, Sep 3, 2010 at 4:37 PM, Christoph Helma <helma at in-silico.ch> wrote:

> Dear all,
>
> I have been investigating several problems that we had with creating and
> serving OWL-DL representations:
>
> - slow response
> - gateway timeouts
> - memory allocation problems
>
> These problems depend of course on the size and complexity of the
> datasets. Most problematic are datasets with tuples; here we run into
> trouble even for medium-sized datasets (several hundred compounds)
> with fewer than 100 features. It took e.g. 20 minutes to create
> http://webservices.in-silico.ch/dataset/112.rdf. If datasets grow
> larger, we may run into memory allocation problems. All of this can be
> quite annoying, because
>
> - long-running processes eat CPU time, slowing down other processes
> - tasks may time out before processes have finished
> - users expect a response without getting one
> - users get impatient and restart processes, which slows down the system
>   even more
> - memory allocation failures may crash the dataset service
> - ...
>
> What is probably _not_ responsible:
>
> RDF/XML representation: Same problem for turtle, json, triples
> Iteration over our internal data structures: Takes only 0.3% of the total
> processing time
> Redland libraries: I have tried another library (not too many choices in
> Ruby), it takes 5 times longer than Redland.
>
> What _could_ be responsible:
>
> Wrong/inefficient OWL-DL representation: Can one of the OWL experts please
> have a look at e.g. http://webservices.in-silico.ch/dataset/112.rdf?
>
> OpenTox OWL-DL/Triple representation:
>
> Symptoms:
> Our internal representation (
> http://webservices.in-silico.ch/dataset/112.yaml) needs 90K (still keeping
> redundant information for efficient searches), OWL-DL as RDF/XML needs 15M
> (which is still 6.1M in Turtle) for the same information.
> I have the impression that our OWL-DL does not scale well, especially for
> tuples. Here are some measured figures for the rdf/yaml size ratio (maybe
> one of the computer scientists can have a closer look):
>
> small dataset (85 compounds), 1 feature/compound: 6.5
> medium dataset (580 compounds), 1 feature/compound: 7.4
> small dataset (85 compounds), 56 features as tuples: 32
> medium dataset (580 compounds), 55 features as tuples: 170
>
> Possible solutions:
>
> Curing symptoms:
>
> Lazy generation/caching of OWL-DL representations: Implemented; you
> might still get timeouts at the first request or have to wait a long time
> for OWL-DL to finish; does not solve memory allocation problems
> Use a persistent store instead of memory store: might solve memory
> allocation problems, but will slow things down even further
> Get more/faster hardware
>
> Curing the cause (I am at a loss here, please help):
>
> Tell us what goes wrong in our OWL-DLs
> Improve scalability of OpenTox OWL-DL, especially with respect to tuples
> (I definitely need a method to represent "complex" features)
>
> IMHO it does not make much sense to proceed with further developments
> until we have resolved this substantial issue. I am looking forward to
> hearing your ideas!
>
> Best regards,
> Christoph
>
> PS: Martin mentioned that he has also experienced performance problems in
> accessing the parsed OWL-DL data structure (parsing the file seems to be ok)
> - also for external (i.e. non IST/ALU) datasets. I have always blamed
> Redland libraries, but maybe this is a related issue.
> _______________________________________________
> Development mailing list
> Development at opentox.org
> http://www.opentox.org/mailman/listinfo/development
>

--
Dr. Nina Jeliazkova
Technical Manager
4 A.Kanchev str.
IdeaConsult Ltd.
1000 Sofia, Bulgaria
Phone: +359 886 802011
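
Note on the quoted figures: the rdf/yaml ratios in Christoph's message can be compared directly to see how they change between the small and the medium dataset. The sketch below is only arithmetic on the four numbers quoted above; the names `measurements` and `ratio_growth` are made up for illustration and are not part of any OpenTox code. If the RDF and YAML sizes grew in proportion to each other, the ratio would stay roughly constant as the number of compounds grows.

```python
# Figures copied from the quoted message:
# (number of compounds, features per compound, rdf/yaml size ratio)
measurements = {
    "single feature": [(85, 1, 6.5), (580, 1, 7.4)],
    "tuples": [(85, 56, 32.0), (580, 55, 170.0)],
}


def ratio_growth(rows):
    """Return (compound growth factor, rdf/yaml ratio growth factor)
    between the small and the medium dataset."""
    (n_small, _, r_small), (n_medium, _, r_medium) = rows
    return n_medium / n_small, r_medium / r_small


for kind, rows in measurements.items():
    compounds_x, ratio_x = ratio_growth(rows)
    print(f"{kind}: compounds x{compounds_x:.1f}, rdf/yaml ratio x{ratio_x:.2f}")
```

With these inputs the single-feature ratio grows by a factor of about 1.14 while the compound count grows by about 6.8, whereas the tuple ratio grows by about 5.3 over the same step. That is consistent with the impression in the message that the tuple representation is what scales poorly, although two data points per case cannot establish a growth law.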