[OTDev] OWL-DL performance/scalability problems

Nina Jeliazkova jeliazkova.nina at gmail.com
Mon Sep 6 10:49:46 CEST 2010


Christoph,

On Mon, Sep 6, 2010 at 11:33 AM, Nina Jeliazkova
<jeliazkova.nina at gmail.com> wrote:

> Christoph,
>
> Are there options in Redland to set up prefixes in RDF?
> It would look like
> ...
>  xmlns:dataset="http://webservices.in-silico.ch/dataset"
> ...
> <ot:Dataset rdf:about="dataset/112">
>
>
> instead of "http://webservices.in-silico.ch/dataset/112"
> everywhere. Prefixes can be defined for all objects.
>
> Nina
>
>
It looks like there are settings for the base URI and namespaces:
http://nxg.me.uk/dist/racket-librdf/docs/rdf.html

base-uri : base URI of output – #f for "don’t care"

Just setting the base URI to http://webservices.in-silico.ch/ should help
a lot. The RDF will be similar to the example below:


<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:ot="http://www.opentox.org/api/1.1#"
  xml:base="http://ambit.uni-plovdiv.bg:8080/ambit2/">

  <ot:Dataset rdf:about="dataset/1">
    <ot:dataEntry>
      <ot:DataEntry>
        <ot:values>
          <ot:FeatureValue>
            <ot:value rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
            >formaldehyde</ot:value>
            <ot:feature rdf:resource="feature/9"/>
          </ot:FeatureValue>
        </ot:values>
        <ot:values>
          <ot:FeatureValue>
            <ot:value rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
            >DSL,TSCA</ot:value>
            <ot:feature rdf:resource="feature/10"/>
          </ot:FeatureValue>
        </ot:values>
        <ot:values>
          <ot:FeatureValue>
            <ot:value rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
            >formaldehyde</ot:value>
            <ot:feature rdf:resource="feature/2"/>
          </ot:FeatureValue>
        </ot:values>
      </ot:DataEntry>
    </ot:dataEntry>
  </ot:Dataset>
</rdf:RDF>
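
To illustrate what the base URI buys us, here is a minimal Python sketch
(not Redland code). It strips a common base from absolute URIs, the way a
serializer with a base URI setting would write them, and checks that a
parser resolving against xml:base gets the original URIs back. The helper
name relativize and the list of feature URIs are made up for illustration;
only the base URI and the dataset/112 path come from this thread.

```python
from urllib.parse import urljoin

# Base URI suggested in this message.
BASE = "http://webservices.in-silico.ch/"


def relativize(uri: str, base: str = BASE) -> str:
    """Return uri relative to base if it lies under base, else unchanged."""
    if uri.startswith(base) and len(uri) > len(base):
        return uri[len(base):]
    return uri


# Hypothetical absolute URIs following the dataset/feature pattern above.
uris = [BASE + "dataset/112"] + [BASE + f"feature/{i}" for i in range(1, 56)]

absolute_chars = sum(len(u) for u in uris)
relative_chars = sum(len(relativize(u)) for u in uris)

# A parser resolves every relative reference against xml:base again.
assert all(urljoin(BASE, relativize(u)) == u for u in uris)

print(f"absolute: {absolute_chars} chars, relative: {relative_chars} chars")
```

Note that this only shortens the serialized file. The number of triples,
and therefore the work done inside the triple store, stays the same.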


Hope this helps,
Nina

>
>
>
> On Mon, Sep 6, 2010 at 11:24 AM, Christoph Helma <helma at in-silico.ch> wrote:
>
>> Dear all,
>>
>> Excerpts from Nina Jeliazkova's message of Fri Sep 03 16:22:32 +0200 2010:
>> > In Jena one can set options for the triple memory storage - e.g.
>> >
>> > ModelFactory.createOntologyModel( OntModelSpec)  ,
>> >
>> http://jena.sourceforge.net/javadoc/com/hp/hpl/jena/ontology/OntModelSpec.html
>> >
>> > These differ in memory efficiency and reasoning capabilities.   Perhaps
>> > Redland has something similar to use?
>>
>> AFAIK Redland does not have such options (apart from choosing a triple
>> store) - reasoning is done by a separate library (Rasqal).
>>
>> > The dataset size you are reporting seems rather small; on the other hand,
>> > in-memory storage has limits in any representation, just the boundaries
>> > are different.
>>
>> We can switch to another triple store, but that does not solve the
>> general scalability problem.
>>
>> Contrary to my initial assumptions, I do not think that the Redland
>> libraries are the cause of our problems. Based on my measurements I am
>> pretty convinced that our OWL-DL representation does not scale well,
>> especially when it comes to complex features that require tuples
>> (computing times seem to correspond to the resulting file sizes).
>>
>> > >    I have the impression that our OWL-DL does not scale well,
>> > > especially for tuples. Here are some measured figures for the rdf/yaml
>> > > ratio (maybe one of the computer scientists can have a closer look):
>> > >
>> > >      small dataset (85 compounds), 1 feature/compound:        6.5
>> > >      medium dataset (580 compounds), 1 feature/compound:      7.4
>> > >      small dataset (85 compounds), 56 features as tuples:    32
>> > >      medium dataset (580 compounds), 55 features as tuples: 170
>> > >
>>
>> I have also tried to switch to another library (RDF.rb), which did not
>> resolve the problem.
>>
>> So we are either making a mistake in our (IST/ALU) OWL-DL implementation
>> (any help is greatly appreciated - maybe the redundant representation of
>> features is the culprit) or our OpenTox OWL in general does not scale well
>> for larger datasets (especially with complex features).
>>
>> If you want to have a look:
>> http://webservices.in-silico.ch/dataset/112
>> (cached to save you the timeouts), Accept:application/x-yaml or
>> http://webservices.in-silico.ch/dataset/112.yaml gives you our internal
>> representation.
>>
>> Thanks!
>> Christoph
>>
>> > Christoph,
>> >
>> >
>> > Nina
>> >
>> > On Fri, Sep 3, 2010 at 4:37 PM, Christoph Helma <helma at in-silico.ch> wrote:
>> >
>> > > Dear all,
>> > >
>> > > I have been investigating several problems that we had with creating
>> > > and serving OWL-DL representations:
>> > >
>> > > - slow response
>> > > - gateway timeouts
>> > > - memory allocation problems
>> > >
>> > > All of these problems depend of course on the size and complexity of
>> > > the datasets. Most problematic are datasets with tuples; here we run
>> > > into trouble even for medium-sized datasets (several hundred
>> > > compounds) with fewer than 100 features. It took e.g. 20 minutes to
>> > > create http://webservices.in-silico.ch/dataset/112.rdf. If datasets
>> > > grow larger, we may run into memory allocation problems. All of this
>> > > can be quite annoying, because
>> > >
>> > > - long running processes eat CPU time, slowing down other processes
>> > > - tasks may timeout before processes have finished
>> > > - users expect a response but do not get one
>> > > - users get impatient and restart processes, which slows down the
>> > >  system even more
>> > > - memory allocation failures may crash the dataset service
>> > > - ...
>> > >
>> > > What is probably _not_ responsible:
>> > >
>> > >  RDF/XML representation: Same problem for Turtle, JSON, triples
>> > >  Iteration over our internal data structures: Takes only 0.3% of the
>> > > total processing time
>> > >  Redland libraries: I have tried another library (not many choices in
>> > > Ruby), which takes 5 times longer than Redland.
>> > >
>> > > What _could_ be responsible:
>> > >
>> > >  Wrong/inefficient OWL-DL representation: Can one of the OWL experts
>> > > please have a look at e.g.
>> > > http://webservices.in-silico.ch/dataset/112.rdf?
>> > >
>> > >  OpenTox OWL-DL/Triple representation:
>> > >
>> > >  Symptoms:
>> > >    Our internal representation
>> > > (http://webservices.in-silico.ch/dataset/112.yaml) needs 90K (still
>> > > keeping redundant information for efficient searches), OWL-DL as
>> > > RDF/XML needs 15M (which is still 6.1M in Turtle) for the same
>> > > information.
>> > >    I have the impression that our OWL-DL does not scale well,
>> > > especially for tuples. Here are some measured figures for the rdf/yaml
>> > > ratio (maybe one of the computer scientists can have a closer look):
>> > >
>> > >      small dataset (85 compounds), 1 feature/compound:        6.5
>> > >      medium dataset (580 compounds), 1 feature/compound:      7.4
>> > >      small dataset (85 compounds), 56 features as tuples:    32
>> > >      medium dataset (580 compounds), 55 features as tuples: 170
>> > >
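
As a quick sanity check, the file sizes quoted above for dataset 112 can be
turned into the same kind of ratio. This is only a sketch: reading "K" and
"M" as binary units is an assumption, and the message does not say whether
dataset 112 is the medium dataset in the table.

```python
# Sizes as quoted in the message; binary units (1M = 1024K) are an assumption.
KIB = 1024
yaml_size = 90 * KIB            # internal YAML representation, "90K"
rdfxml_size = 15 * KIB * KIB    # OWL-DL as RDF/XML, "15M"
turtle_size = 6.1 * KIB * KIB   # same information as Turtle, "6.1M"

rdfxml_ratio = rdfxml_size / yaml_size
turtle_ratio = turtle_size / yaml_size

print(f"RDF/XML to YAML: {rdfxml_ratio:.1f}")
print(f"Turtle to YAML:  {turtle_ratio:.1f}")
```

Under these assumptions the RDF/XML ratio comes out just above 170, close
to the figure reported for the medium dataset with tuples, and the Turtle
file is still roughly 70 times the size of the YAML file.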
>> > > Possible solutions:
>> > >
>> > >  Curing symptoms:
>> > >
>> > >    Lazy generation/caching of OWL-DL representations: Implemented; you
>> > > might still get timeouts at the first request or have to wait a long
>> > > time for OWL-DL to finish. Does not solve memory allocation problems.
>> > >    Use a persistent store instead of a memory store: might solve memory
>> > > allocation problems, but will slow things down even further
>> > >    Get more/faster hardware
>> > >
>> > >  Curing the cause (I am at a loss here, please help):
>> > >
>> > >    Tell us what goes wrong in our OWL-DLs
>> > >    Improve scalability of OpenTox OWL-DL, especially with respect to
>> > > tuples (I definitely need a method to represent "complex" features)
>> > >
>> > > IMHO it does not make much sense to proceed with further developments
>> > > until we have resolved this substantial issue. I am looking forward
>> > > to hearing your ideas!
>> > >
>> > > Best regards,
>> > > Christoph
>> > >
>> > > PS: Martin mentioned that he has also experienced performance
>> > > problems in accessing the parsed OWL-DL data structure (parsing the
>> > > file seems to be ok) - also for external (i.e. non-IST/ALU) datasets.
>> > > I have always blamed the Redland libraries, but maybe this is a
>> > > related issue.
>> > > _______________________________________________
>> > > Development mailing list
>> > > Development at opentox.org
>> > > http://www.opentox.org/mailman/listinfo/development
>> > >
>> >