[OTDev] OWL-DL performance/scalability problems
Nina Jeliazkova jeliazkova.nina at gmail.com
Mon Sep 6 10:33:05 CEST 2010
Christoph,

Are there options in Redland to set up prefixes in RDF? It would look like

  ...
  xmlns:dataset="http://webservices.in-silico.ch/dataset"
  ...
  <ot:Dataset rdf:about="dataset/112">

instead of "http://webservices.in-silico.ch/dataset/112" everywhere.
Prefixes can be defined for all objects.

Nina

On Mon, Sep 6, 2010 at 11:24 AM, Christoph Helma <helma at in-silico.ch> wrote:

> Dear all,
>
> Excerpts from Nina Jeliazkova's message of Fri Sep 03 16:22:32 +0200 2010:
> > In Jena one can set options for the triple memory storage - e.g.
> >
> > ModelFactory.createOntologyModel(OntModelSpec),
> > http://jena.sourceforge.net/javadoc/com/hp/hpl/jena/ontology/OntModelSpec.html
> >
> > These differ in memory efficiency and reasoning capabilities. Perhaps
> > Redland has something similar to use?
>
> AFAIK Redland does not have such options (apart from choosing a triple
> store) - reasoning is done by a separate library (Rasqal).
>
> > The dataset sizes you are reporting seem rather small; on the other hand
> > in-memory storage has limits in any representation, just the boundaries
> > are different.
>
> We can switch to another triple store, but that does not solve the
> general scalability problem.
>
> Contrary to my initial assumptions I do not think that Redland libraries
> are the cause for our problems. Based on my measurements I am pretty
> convinced that our OWL-DL representation does not scale well,
> especially when it comes to complex features that require tuples
> (computing times seem to correspond to the resulting file sizes).
>
> > > I have the impression that our OWL-DL does not scale well especially
> > > for Tuples, here are some measured figures for the rdf/yaml ratio
> > > (maybe one of the computer scientists can have a closer look):
> > >
> > > small dataset (85 compounds), 1 feature/compound: 6.5
> > > medium dataset (580 compounds), 1 feature/compound: 7.4
> > > small dataset (85 compounds), 56 features as tuples: 32
> > > medium dataset (580 compounds), 55 features as tuples: 170
>
> I have also tried to switch to another library (RDF.rb), which did not
> resolve the problem.
>
> So we are either making a mistake in our (IST/ALU) OWL-DL implementation
> (any help is greatly appreciated - maybe the redundant representation of
> features is the culprit) or our OpenTox OWL in general does not scale well
> for larger datasets (especially with complex features).
>
> If you want to have a look:
> http://webservices.in-silico.ch/dataset/112
> (cached to save you the timeouts), Accept:application/x-yaml or
> http://webservices.in-silico.ch/dataset/112.yaml gives you our internal
> representation.
>
> Thanks!
> Christoph
>
> > Christoph,
> >
> > Nina
> >
> > On Fri, Sep 3, 2010 at 4:37 PM, Christoph Helma <helma at in-silico.ch> wrote:
> >
> > > Dear all,
> > >
> > > I have been investigating several problems that we had with creating
> > > and serving OWL-DL representations:
> > >
> > > - slow response
> > > - gateway timeouts
> > > - memory allocation problems
> > >
> > > All of these problems depend of course on the size and complexity of
> > > the datasets. Most problematic are datasets with tuples, here we run
> > > into trouble even for medium sized datasets (several hundreds of
> > > compounds) with fewer than 100 features. It took e.g. 20 minutes to
> > > create http://webservices.in-silico.ch/dataset/112.rdf. If datasets
> > > grow larger, we may run into memory allocation problems.
> > > All of this can be quite annoying, because
> > >
> > > - long running processes eat CPU time, slowing down other processes
> > > - tasks may time out before processes have finished
> > > - users expect a response without getting one
> > > - users get impatient, restarting processes, which slows down the
> > >   system even more
> > > - memory allocation failures may crash the dataset service
> > > - ...
> > >
> > > What is probably _not_ responsible:
> > >
> > > - RDF/XML representation: Same problem for turtle, json, triples
> > > - Iteration over our internal data structures: Takes only 0.3% of the
> > >   total processing time
> > > - Redland libraries: I have tried another library (not too many
> > >   choices in Ruby), takes 5 times longer than with Redland.
> > >
> > > What _could_ be responsible:
> > >
> > > - Wrong/inefficient OWL-DL representation: Can one of the OWL experts
> > >   please have a look at e.g.
> > >   http://webservices.in-silico.ch/dataset/112.rdf?
> > > - OpenTox OWL-DL/Triple representation:
> > >
> > > Symptoms:
> > > Our internal representation
> > > (http://webservices.in-silico.ch/dataset/112.yaml) needs 90K (still
> > > keeping redundant information for efficient searches), OWL-DL as
> > > RDF/XML needs 15M (which is still 6.1M in Turtle) for the same
> > > information.
> > > I have the impression that our OWL-DL does not scale well especially
> > > for Tuples, here are some measured figures for the rdf/yaml ratio
> > > (maybe one of the computer scientists can have a closer look):
> > >
> > > small dataset (85 compounds), 1 feature/compound: 6.5
> > > medium dataset (580 compounds), 1 feature/compound: 7.4
> > > small dataset (85 compounds), 56 features as tuples: 32
> > > medium dataset (580 compounds), 55 features as tuples: 170
> > >
> > > Possible solutions:
> > >
> > > Curing symptoms:
> > >
> > > - Lazy generation/caching of OWL-DL representations: Implemented, you
> > >   might still get timeouts at the first request/have to wait a long
> > >   time for OWL-DL to finish, does not solve memory allocation problems
> > > - Use a persistent store instead of memory store: might solve memory
> > >   allocation problems, but will slow down things even further
> > > - Get more/faster hardware
> > >
> > > Curing the cause (I am at a loss here, please help):
> > >
> > > - Tell us what goes wrong in our OWL-DLs
> > > - Improve scalability of OpenTox OWL-DL especially with respect to
> > >   tuples (I definitely need a method to represent "complex" features)
> > >
> > > IMHO it does not make much sense to proceed with further developments
> > > until we have resolved this substantial issue. I am looking forward to
> > > hearing your ideas!
> > >
> > > Best regards,
> > > Christoph
> > >
> > > PS: Martin mentioned that he has also experienced performance problems
> > > in accessing the parsed OWL-DL data structure (parsing the file seems
> > > to be ok) - also for external (i.e. non IST/ALU) datasets. I have
> > > always blamed Redland libraries, but maybe this is a related issue.
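
As a rough cross-check of the sizes quoted above (not part of the original measurements), the 90K / 15M / 6.1M figures can be turned into ratios and compared with the reported rdf/yaml ratios. This is only a sketch: it assumes K and M are binary units, which the message does not state, and all variable names are illustrative.

```python
# Sizes quoted in the thread for dataset 112; binary units are an assumption.
KIB = 1024
MIB = 1024 * KIB

yaml_bytes = 90 * KIB      # internal YAML representation
rdfxml_bytes = 15 * MIB    # OWL-DL serialized as RDF/XML
turtle_bytes = 6.1 * MIB   # same data serialized as Turtle


def ratio(numerator: float, denominator: float) -> float:
    """Return how many times larger `numerator` is than `denominator`."""
    return numerator / denominator


rdfxml_ratio = ratio(rdfxml_bytes, yaml_bytes)
turtle_ratio = ratio(turtle_bytes, yaml_bytes)

# Reported rdf/yaml ratios for the two tuple datasets (85 and 580 compounds).
small_tuples, medium_tuples = 32, 170
compound_growth = 580 / 85
ratio_growth = medium_tuples / small_tuples

print(f"RDF/XML vs YAML: {rdfxml_ratio:.0f}x")
print(f"Turtle vs YAML:  {turtle_ratio:.0f}x")
print(f"compounds grew {compound_growth:.1f}x, "
      f"rdf/yaml ratio grew {ratio_growth:.1f}x")
```

Under that assumption the RDF/XML ratio comes out close to the 170 reported for the medium tuple dataset, which suggests (but does not confirm) that dataset 112 is that dataset. The last line only restates the reported figures: the rdf/yaml ratio itself grows with the number of compounds, so the RDF output grows faster than the YAML it is derived from.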
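
The prefix idea raised at the top of this message can be tried out without any RDF library. What follows is a minimal sketch under several assumptions. In RDF/XML, xmlns prefixes shorten element and attribute names only, while relative URIs in rdf:about and rdf:resource are resolved against xml:base, so the sketch uses xml:base for the shortening. The ot namespace URI, the ot:compound property and the compound URIs are placeholders chosen for illustration; they are not taken from the actual Redland output.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OT_NS = "http://www.opentox.org/api/1.1#"   # assumed namespace URI
BASE = "http://webservices.in-silico.ch/"   # service root from the thread
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def build_rdfxml(n_compounds: int, use_base: bool) -> str:
    """Serialize one dataset with n_compounds entries as RDF/XML.

    With use_base=True the document declares xml:base and writes relative
    URIs; otherwise every rdf:about / rdf:resource is written in full.
    """
    prefix = "" if use_base else BASE
    base_attr = f' xml:base="{BASE}"' if use_base else ""
    lines = [
        f'<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:ot="{OT_NS}"{base_attr}>',
        f'  <ot:Dataset rdf:about="{prefix}dataset/112">',
    ]
    for i in range(n_compounds):
        # Hypothetical compound URIs; the real ones may look different.
        lines.append(f'    <ot:compound rdf:resource="{prefix}compound/{i}"/>')
    lines += ["  </ot:Dataset>", "</rdf:RDF>"]
    return "\n".join(lines)


def resolved_uris(rdfxml: str) -> list[str]:
    """Parse the XML and return every subject/object URI in absolute form."""
    root = ET.fromstring(rdfxml)
    base = root.get(XML_BASE, "")
    uris = []
    for elem in root.iter():
        for attr in ("about", "resource"):
            value = elem.get(f"{{{RDF_NS}}}{attr}")
            if value is not None:
                # Plain concatenation suffices here because every relative
                # URI in this sketch is a simple path under BASE.
                uris.append(value if value.startswith("http") else base + value)
    return uris


full = build_rdfxml(580, use_base=False)
short = build_rdfxml(580, use_base=True)

print(len(full.encode("utf-8")), "bytes with absolute URIs")
print(len(short.encode("utf-8")), "bytes with xml:base")
```

Both documents resolve to the same absolute URIs, so the triples are unchanged and only the serialized text shrinks. Since relative URIs are expanded again at parse time, this would reduce file size but not the number of triples, so by itself it may not help with memory use or generation time.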