[OTDev] OWL-DL performance/scalability problems

Nina Jeliazkova jeliazkova.nina at gmail.com
Fri Sep 3 16:22:32 CEST 2010


Christoph,

In Jena, one can set options for the in-memory triple storage, e.g. via

ModelFactory.createOntologyModel(OntModelSpec),
http://jena.sourceforge.net/javadoc/com/hp/hpl/jena/ontology/OntModelSpec.html

The specs differ in memory efficiency and reasoning capabilities. Perhaps
Redland has something similar you could use?

The dataset sizes you are reporting seem rather small; on the other hand,
in-memory storage has limits in any representation, only the boundaries
differ.

Nina


On Fri, Sep 3, 2010 at 4:37 PM, Christoph Helma <helma at in-silico.ch> wrote:

> Dear all,
>
> I have been investigating several problems that we had with creating and
> serving OWL-DL representations:
>
> - slow response
> - gateway timeouts
> - memory allocation problems
>
> All of these problems depend, of course, on the size and complexity of the
> datasets. The most problematic are datasets with tuples; here we run into
> trouble even for medium-sized datasets (several hundred compounds)
> with fewer than 100 features. For example, it took 20 minutes to create
> http://webservices.in-silico.ch/dataset/112.rdf. If datasets grow
> larger, we may run into memory allocation problems. All of this can be
> quite annoying, because
>
> - long running processes eat CPU time, slowing down other processes
> - tasks may timeout before processes have finished
> - users expect a response without getting one
> - users get impatient and restart processes, which slows down the system
>  even more
> - memory allocation failures may crash the dataset service
> - ...
>
> What is probably _not_ responsible:
>
>  RDF/XML representation: the same problem occurs with Turtle, JSON and
> triples
>  Iteration over our internal data structures: takes only 0.3% of the total
> processing time
>  Redland libraries: I have tried another library (there are not many choices
> in Ruby); it takes 5 times longer than with Redland.
>
> What _could_ be responsible:
>
>  Wrong/inefficient OWL-DL representation: Could one of the OWL experts please
> have a look at e.g. http://webservices.in-silico.ch/dataset/112.rdf?
>
>  OpenTox OWL-DL/Triple representation:
>
>  Symptoms:
>    Our internal representation (
> http://webservices.in-silico.ch/dataset/112.yaml) needs 90 KB (still keeping
> redundant information for efficient searches), while OWL-DL as RDF/XML needs
> 15 MB (still 6.1 MB in Turtle) for the same information.
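
As a quick check of these sizes, the ratio of the RDF/XML and Turtle sizes to the YAML size can be computed directly. This is a minimal sketch that assumes binary units (1 MB = 1024 KB); the message does not say which convention was used, and the class name `SizeRatio` is hypothetical.

```java
// Minimal sketch: size ratios of the serializations reported for dataset 112.
// Assumption: "K" and "M" are binary units (1 MB = 1024 KB).
public class SizeRatio {

    /** Ratio of a serialization size (in MB) to the YAML size (in KB). */
    public static double ratio(double serializationMb, double yamlKb) {
        return serializationMb * 1024.0 / yamlKb;
    }

    public static void main(String[] args) {
        double yamlKb = 90.0;    // internal YAML representation
        double rdfXmlMb = 15.0;  // OWL-DL as RDF/XML
        double turtleMb = 6.1;   // same information in Turtle
        System.out.printf("RDF/XML / YAML: %.1f%n", ratio(rdfXmlMb, yamlKb)); // prints 170.7
        System.out.printf("Turtle  / YAML: %.1f%n", ratio(turtleMb, yamlKb)); // prints 69.4
    }
}
```

Under this assumption RDF/XML is roughly 171 times the YAML size, close to the figure of 170 given for the medium dataset with tuples; with decimal units (1 MB = 1000 KB) it would be about 167.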
>    I have the impression that our OWL-DL does not scale well, especially for
> tuples. Here are some measured figures for the RDF/YAML size ratio (maybe one
> of the computer scientists can have a closer look):
>
>      small dataset (85 compounds), 1 feature/compound:        6.5
>      medium dataset (580 compounds), 1 feature/compound:      7.4
>      small dataset (85 compounds), 56 features as tuples:    32
>      medium dataset (580 compounds), 55 features as tuples: 170
>
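
The ratios above can be turned into a rough growth estimate. Assuming the YAML size grows linearly with the number of compounds n (an assumption; only ratios are given) and RDF size = ratio x YAML size, RDF size grows like n^k with k = 1 + ln(r2/r1) / ln(n2/n1). The sketch below (hypothetical class and method names) computes k from the two pairs of measurements.

```java
// Minimal sketch: estimate how RDF size scales with the number of compounds,
// using the RDF/YAML ratios measured for two dataset sizes.
// Assumption: YAML size is proportional to the number of compounds n,
// so RDF size ~ ratio(n) * n ~ n^k.
public class ScalingExponent {

    /** Exponent k such that RDF size grows like n^k between (n1, r1) and (n2, r2). */
    public static double exponent(double n1, double r1, double n2, double r2) {
        return 1.0 + Math.log(r2 / r1) / Math.log(n2 / n1);
    }

    public static void main(String[] args) {
        // 1 feature per compound: 85 -> 580 compounds, ratio 6.5 -> 7.4
        double kSingle = exponent(85, 6.5, 580, 7.4);
        // ~55 features as tuples: 85 -> 580 compounds, ratio 32 -> 170
        double kTuples = exponent(85, 32, 580, 170);
        System.out.printf("1 feature/compound: k = %.2f%n", kSingle);
        System.out.printf("features as tuples: k = %.2f%n", kTuples);
    }
}
```

With these numbers k is about 1.07 for one feature per compound (close to linear) but about 1.87 for tuples (close to quadratic), which is consistent with the impression that the tuple representation does not scale. It is only a two-point estimate, and the two tuple datasets have 56 and 55 features respectively.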
> Possible solutions:
>
>  Curing symptoms:
>
>    Lazy generation/caching of OWL-DL representations: implemented, but you
> may still get timeouts on the first request or have to wait a long time for
> the OWL-DL to finish; it does not solve the memory allocation problems
>    Use a persistent store instead of a memory store: might solve memory
> allocation problems, but will slow things down even further
>    Get more/faster hardware
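
Lazy generation/caching can be combined with request de-duplication, so that impatient users who resend a request attach to the generation already in progress instead of starting a new one. The following is a minimal sketch in plain Java, not the Ruby implementation used by the dataset service; `RepresentationCache` and the `generator` function are hypothetical names, with the generator standing in for whatever builds the OWL-DL for a dataset URI. It does not address the memory problem.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

// Minimal sketch (hypothetical names): lazily generated, cached representations
// where concurrent or repeated requests for the same dataset share one generation.
public class RepresentationCache {
    private final Map<String, CompletableFuture<String>> cache = new ConcurrentHashMap<>();
    private final Function<String, String> generator; // builds a representation for a dataset URI

    public RepresentationCache(Function<String, String> generator) {
        this.generator = generator;
    }

    /**
     * Returns the representation if it is ready within the timeout, otherwise empty
     * (a server could then answer with a task URI instead of blocking the gateway).
     * Generation starts at most once per key and keeps running after a timeout.
     */
    public Optional<String> get(String datasetUri, long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException {
        CompletableFuture<String> future = cache.computeIfAbsent(
                datasetUri, uri -> CompletableFuture.supplyAsync(() -> generator.apply(uri)));
        try {
            return Optional.of(future.get(timeout, unit));
        } catch (TimeoutException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) throws Exception {
        RepresentationCache cache = new RepresentationCache(uri -> {
            try { Thread.sleep(200); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            return "<rdf:RDF/> for " + uri; // placeholder for the real OWL-DL
        });
        String uri = "http://webservices.in-silico.ch/dataset/112";
        // The first call may time out while generation is still running.
        System.out.println(cache.get(uri, 10, TimeUnit.MILLISECONDS));
        // The repeated request waits on the same generation instead of starting a new one.
        System.out.println(cache.get(uri, 5, TimeUnit.SECONDS));
    }
}
```

A failed generation stays in the map as a failed future in this sketch; a real service would evict it so that a later request can retry.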
>
>  Curing the cause (I am at a loss here, please help):
>
>    Tell us what goes wrong in our OWL-DLs
>    Improve scalability of OpenTox OWL-DL, especially with respect to tuples (I
> definitely need a method to represent "complex" features)
>
> IMHO it does not make much sense to proceed with further developments
> until we have resolved this substantial issue. I look forward to
> hearing your ideas!
>
> Best regards,
> Christoph
>
> PS: Martin mentioned that he has also experienced performance problems in
> accessing the parsed OWL-DL data structure (parsing the file seems to be OK),
> also for external (i.e. non-IST/ALU) datasets. I have always blamed the
> Redland libraries, but maybe this is a related issue.
> _______________________________________________
> Development mailing list
> Development at opentox.org
> http://www.opentox.org/mailman/listinfo/development
>



-- 

Dr. Nina Jeliazkova
Technical Manager
4 A.Kanchev str.
IdeaConsult Ltd.
1000 Sofia, Bulgaria
Phone: +359 886 802011


