[OTDev] OWL-DL performance/scalability problems

Christoph Helma helma at in-silico.ch
Mon Sep 6 10:24:50 CEST 2010


Dear all,

Excerpts from Nina Jeliazkova's message of Fri Sep 03 16:22:32 +0200 2010:
> In Jena one can set options for the triple memory storage - e.g.
> 
> ModelFactory.createOntologyModel(OntModelSpec),
> http://jena.sourceforge.net/javadoc/com/hp/hpl/jena/ontology/OntModelSpec.html
> 
> These differ in memory efficiency and reasoning capabilities. Perhaps
> Redland has something similar to use?

AFAIK Redland does not have such options (apart from choosing a triple
store) - reasoning is done by a separate library (Rasqal).

> The dataset size you are reporting seems rather small; on the other hand,
> in-memory storage has limits in any representation, just the boundaries are
> different.

We can switch to another triple store, but that does not solve the
general scalability problem.

Contrary to my initial assumptions, I do not think that the Redland libraries
are the cause of our problems. Based on my measurements, I am fairly
convinced that our OWL-DL representation does not scale well,
especially for complex features that require tuples
(computing times seem to correspond to the resulting file sizes).

> >    I have the impression that our OWL-DL does not scale well especially for
> > Tuples, here are some measured figures for the rdf/yaml ratio (maybe one of
> > the computer scientists can have a closer look):
> >
> >      small dataset (85 compounds), 1 feature/compound:        6.5
> >      medium dataset (580 compounds), 1 feature/compound:      7.4
> >      small dataset (85 compounds), 56 features as tuples:    32
> >      medium dataset (580 compounds), 55 features as tuples: 170
> >
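
As a rough cross-check of these ratios, the file sizes given for dataset 112
in the original message quoted further down (90K YAML, 15M RDF/XML, 6.1M
Turtle) can be plugged into a few lines of Python. This is only a sketch: it
assumes that K and M are binary units (1M = 1024K), and the function name
size_ratio is made up for illustration.

```python
# Sizes reported for dataset 112 in the original message, in kilobytes.
# Assumption: "K" and "M" are binary units, i.e. 1M = 1024K.
YAML_KB = 90
RDFXML_KB = 15 * 1024
TURTLE_KB = 6.1 * 1024


def size_ratio(serialized_kb, yaml_kb=YAML_KB):
    """Return how many times larger a serialization is than the YAML file."""
    return serialized_kb / yaml_kb


rdfxml_ratio = size_ratio(RDFXML_KB)
turtle_ratio = size_ratio(TURTLE_KB)

print(f"RDF/XML vs YAML: {rdfxml_ratio:.1f}")
print(f"Turtle  vs YAML: {turtle_ratio:.1f}")
```

Under that unit assumption the RDF/XML ratio comes out at roughly 170, in
line with the figure for the medium dataset with tuples. Whether dataset 112
is exactly that dataset is itself an assumption.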

I have also tried to switch to another library (RDF.rb), which did not
resolve the problem.

So we are either making a mistake in our (IST/ALU) OWL-DL implementation
(any help is greatly appreciated - maybe the redundant representation of
features is the culprit) or our OpenTox OWL in general does not scale well for
larger datasets (especially with complex features).
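
To illustrate why a redundant feature representation could matter, here is a
toy triple count in Python. It is not the OpenTox schema: it only compares a
layout where every value repeats the description of its feature with one
where every value points at a single shared description. The compound and
feature counts are those of the medium dataset above; the number of metadata
triples per feature (4) and both function names are hypothetical.

```python
def triples_inlined(n_compounds, n_features, meta_per_feature):
    """Every value carries its own copy of the feature description."""
    n_values = n_compounds * n_features
    return n_values * (1 + meta_per_feature)


def triples_shared(n_compounds, n_features, meta_per_feature):
    """Every value points at one shared feature description."""
    n_values = n_compounds * n_features
    return n_values + n_features * meta_per_feature


# 580 compounds and 55 features come from the medium dataset above;
# 4 metadata triples per feature is a made-up number for illustration.
inlined = triples_inlined(580, 55, 4)
shared = triples_shared(580, 55, 4)
print(inlined, shared, round(inlined / shared, 2))
```

In this toy model the inlined layout grows with compounds x features x
metadata, while the shared layout pays for the metadata only once per
feature. If our real output resembles the first layout, that alone could
explain a large part of the blow-up.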

If you want to have a look:
http://webservices.in-silico.ch/dataset/112
(cached to save you the timeouts). Requesting it with
Accept: application/x-yaml, or fetching
http://webservices.in-silico.ch/dataset/112.yaml, gives you our internal
representation.
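
For those who prefer a script to a browser, a minimal Python sketch of the
content-negotiated request follows. It uses only the standard library; the
function names, the 60 second timeout and the UTF-8 decoding are assumptions
for illustration, and the service may of course be slow or unavailable.

```python
import urllib.request

DATASET_URL = "http://webservices.in-silico.ch/dataset/112"


def build_yaml_request(url=DATASET_URL):
    """Build (but do not send) a GET request asking for the YAML form."""
    return urllib.request.Request(url, headers={"Accept": "application/x-yaml"})


def fetch_yaml(url=DATASET_URL, timeout=60):
    """Send the request and return the response body as text."""
    # Assumption: the service answers in UTF-8.
    with urllib.request.urlopen(build_yaml_request(url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Only the request object is built here; call fetch_yaml() to hit the service.
request = build_yaml_request()
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Changing the Accept header to another media type should select a different
serialization of the same dataset, assuming the service supports it.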

Thanks!
Christoph

> Christoph,
> 
> 
> Nina
> 
> On Fri, Sep 3, 2010 at 4:37 PM, Christoph Helma <helma at in-silico.ch> wrote:
> 
> > Dear all,
> >
> > I have been investigating several problems that we had with creating and
> > serving OWL-DL representations:
> >
> > - slow response
> > - gateway timeouts
> > - memory allocation problems
> >
> > All of these problems depend of course on the size and complexity of the
> > datasets. Most problematic are datasets with tuples; here we run into
> > trouble even for medium-sized datasets (several hundred compounds)
> > with fewer than 100 features. For example, it took 20 minutes to create
> > http://webservices.in-silico.ch/dataset/112.rdf. If datasets grow
> > larger, we may run into memory allocation problems. All of this can be
> > quite annoying, because
> >
> > - long-running processes eat CPU time, slowing down other processes
> > - tasks may time out before processes have finished
> > - users expect a response but do not get one
> > - users get impatient and restart processes, which slows down the system
> >  even more
> > - memory allocation failures may crash the dataset service
> > - ...
> >
> > What is probably _not_ responsible:
> >
> >  RDF/XML representation: Same problem for Turtle, JSON, triples
> >  Iteration over our internal data structures: Takes only 0.3% of the total
> > processing time
> >  Redland libraries: I have tried another library (not too many choices in
> > Ruby), which takes 5 times longer than Redland.
> >
> > What _could_ be responsible:
> >
> >  Wrong/inefficient OWL-DL representation: Can one of the OWL experts please
> > have a look at e.g. http://webservices.in-silico.ch/dataset/112.rdf?
> >
> >  OpenTox OWL-DL/Triple representation:
> >
> >  Symptoms:
> >    Our internal representation (
> > http://webservices.in-silico.ch/dataset/112.yaml) needs 90K (still keeping
> > redundant information for efficient searches), OWL-DL as RDF/XML needs 15M
> > (which is still 6.1M in Turtle) for the same information.
> >    I have the impression that our OWL-DL does not scale well especially for
> > Tuples, here are some measured figures for the rdf/yaml ratio (maybe one of
> > the computer scientists can have a closer look):
> >
> >      small dataset (85 compounds), 1 feature/compound:        6.5
> >      medium dataset (580 compounds), 1 feature/compound:      7.4
> >      small dataset (85 compounds), 56 features as tuples:    32
> >      medium dataset (580 compounds), 55 features as tuples: 170
> >
> > Possible solutions:
> >
> >  Curing symptoms:
> >
> >    Lazy generation/caching of OWL-DL representations: Implemented, you
> > might still get timeouts at the first request/have to wait a long time for
> > OWL-DL to finish, does not solve memory allocation problems
> >    Use a persistent store instead of memory store: might solve memory
> > allocation problems, but will slow down things even further
> >    Get more/faster hardware
> >
> >  Curing the cause (I am at a loss here, please help):
> >
> >    Tell us what goes wrong in our OWL-DLs
> >    Improve scalability of OpenTox OWL-DL, especially with respect to tuples (I
> > definitely need a method to represent "complex" features)
> >
> > IMHO it does not make much sense to proceed with further developments
> > until we have resolved this substantial issue. I am looking forward to
> > hearing your ideas!
> >
> > Best regards,
> > Christoph
> >
> > PS: Martin mentioned that he has also experienced performance problems in
> > accessing the parsed OWL-DL data structure (parsing the file seems to be ok)
> > - also for external (i.e. non-IST/ALU) datasets. I have always blamed
> > the Redland libraries, but maybe this is a related issue.
> > _______________________________________________
> > Development mailing list
> > Development at opentox.org
> > http://www.opentox.org/mailman/listinfo/development
> >
> 


