[OTDev] OWL-DL performance/scalability problems
Nina Jeliazkova jeliazkova.nina at gmail.com
Mon Sep 6 10:49:46 CEST 2010
Christoph,

On Mon, Sep 6, 2010 at 11:33 AM, Nina Jeliazkova <jeliazkova.nina at gmail.com> wrote:

> Christoph,
>
> Are there options in Redland to set up prefixes in RDF?
> It will look like
> ...
> xmlns:dataset="http://webservices.in-silico.ch/dataset"
> ...
> <ot:Dataset rdf:about="dataset/112">
>
> instead of "http://webservices.in-silico.ch/dataset/112" everywhere.
> Prefixes can be defined for all objects.
>
> Nina
>

It looks like there are settings for the base URI and namespaces:
http://nxg.me.uk/dist/racket-librdf/docs/rdf.html

  base-uri : base URI of output – #f for "don’t care"

Just setting the base URI to http://webservices.in-silico.ch/ should help a lot;
the RDF will then be similar to the example below:

<rdf:RDF xml:base="http://ambit.uni-plovdiv.bg:8080/ambit2/">
  <ot:Dataset rdf:about="dataset/1">
    <ot:dataEntry>
      <ot:DataEntry>
        <ot:values>
          <ot:FeatureValue>
            <ot:value rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
              >formaldehyde</ot:value>
            <ot:feature rdf:resource="feature/9"/>
          </ot:FeatureValue>
        </ot:values>
        <ot:values>
          <ot:FeatureValue>
            <ot:value rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
              >DSL,TSCA</ot:value>
            <ot:feature rdf:resource="feature/10"/>
          </ot:FeatureValue>
        </ot:values>
        <ot:values>
          <ot:FeatureValue>
            <ot:value rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
              >formaldehyde</ot:value>
            <ot:feature rdf:resource="feature/2"/>
          </ot:FeatureValue>
        </ot:values>
      </ot:DataEntry>
    </ot:dataEntry>
  </ot:Dataset>
</rdf:RDF>

Hope this helps,
Nina

>
>
> On Mon, Sep 6, 2010 at 11:24 AM, Christoph Helma <helma at in-silico.ch> wrote:
>
>> Dear all,
>>
>> Excerpts from Nina Jeliazkova's message of Fri Sep 03 16:22:32 +0200 2010:
>> > In Jena one can set options for the triple memory storage - e.g.
>> >
>> > ModelFactory.createOntologyModel(OntModelSpec),
>> > http://jena.sourceforge.net/javadoc/com/hp/hpl/jena/ontology/OntModelSpec.html
>> >
>> > These differ in memory efficiency and reasoning capabilities. Perhaps
>> > Redland has something similar to use?
>>
>> AFAIK Redland does not have such options (apart from choosing a triple
>> store) - reasoning is done by a separate library (Rasqal).
>>
>> > The dataset sizes you are reporting seem rather small; on the other hand,
>> > in-memory storage has limits in any representation, just the boundaries
>> > are different.
>>
>> We can switch to another triple store, but that does not solve the
>> general scalability problem.
>>
>> Contrary to my initial assumptions, I do not think that the Redland
>> libraries are the cause of our problems. Based on my measurements, I am
>> pretty convinced that our OWL-DL representation does not scale well,
>> especially when it comes to complex features that require tuples
>> (computing times seem to correspond to the resulting file sizes).
>>
>> > > I have the impression that our OWL-DL does not scale well, especially
>> > > for tuples. Here are some measured figures for the RDF/YAML size ratio
>> > > (maybe one of the computer scientists can have a closer look):
>> > >
>> > > small dataset (85 compounds), 1 feature/compound: 6.5
>> > > medium dataset (580 compounds), 1 feature/compound: 7.4
>> > > small dataset (85 compounds), 56 features as tuples: 32
>> > > medium dataset (580 compounds), 55 features as tuples: 170
>> > >
>>
>> I have also tried to switch to another library (RDF.rb), which did not
>> resolve the problem.
>>
>> So we are either making a mistake in our (IST/ALU) OWL-DL implementation
>> (any help is greatly appreciated - maybe the redundant representation of
>> features is the culprit) or our OpenTox OWL in general does not scale
>> well for larger datasets (especially with complex features).
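The two pairs of RDF/YAML ratios quoted above can be turned into a rough scaling estimate. The sketch below assumes, which the thread does not state, that the YAML size grows linearly with the number of compounds, so the RDF size is proportional to compounds * ratio; it then solves rdf_size ~ compounds**k for k from the 85- and 580-compound data points. With only two points per case (and 56 vs. 55 tuple features treated as equal), read the result as a hint, not a measurement.

```python
import math

# (number of compounds, RDF/YAML size ratio), copied from the figures above.
ONE_FEATURE = ((85, 6.5), (580, 7.4))
TUPLES = ((85, 32.0), (580, 170.0))  # 56 and 55 tuple features, treated as equal


def implied_exponent(small, medium):
    """Solve rdf_size ~ compounds**k for k from two (compounds, ratio) points.

    ASSUMPTION (not stated in the thread): the YAML size grows linearly with
    the number of compounds, so rdf_size ~ compounds * ratio.
    """
    (n1, r1), (n2, r2) = small, medium
    rdf_growth = (n2 / n1) * (r2 / r1)  # how much larger the RDF file got
    return math.log(rdf_growth) / math.log(n2 / n1)


if __name__ == "__main__":
    print(f"1 feature/compound:  k = {implied_exponent(*ONE_FEATURE):.2f}")
    print(f"features as tuples:  k = {implied_exponent(*TUPLES):.2f}")
```

Run as a script, it prints k = 1.07 for one feature per compound and k = 1.87 for features stored as tuples: under the stated assumption, the single-feature RDF grows close to linearly with dataset size, while the tuple-based RDF grows roughly quadratically.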
>>
>> If you want to have a look:
>> http://webservices.in-silico.ch/dataset/112
>> (cached to save you the timeouts); Accept: application/x-yaml or
>> http://webservices.in-silico.ch/dataset/112.yaml gives you our internal
>> representation.
>>
>> Thanks!
>> Christoph
>>
>> > Christoph,
>> >
>> >
>> > Nina
>> >
>> > On Fri, Sep 3, 2010 at 4:37 PM, Christoph Helma <helma at in-silico.ch> wrote:
>> >
>> > > Dear all,
>> > >
>> > > I have been investigating several problems that we had with creating
>> > > and serving OWL-DL representations:
>> > >
>> > > - slow response
>> > > - gateway timeouts
>> > > - memory allocation problems
>> > >
>> > > All of these problems depend, of course, on the size and complexity of
>> > > the datasets. Most problematic are datasets with tuples; here we run
>> > > into trouble even for medium-sized datasets (several hundred compounds)
>> > > with fewer than 100 features. It took e.g. 20 minutes to create
>> > > http://webservices.in-silico.ch/dataset/112.rdf. If datasets grow
>> > > larger, we may run into memory allocation problems. All of this can be
>> > > quite annoying, because
>> > >
>> > > - long-running processes eat CPU time, slowing down other processes
>> > > - tasks may time out before processes have finished
>> > > - users expect a response without getting one
>> > > - users get impatient and restart processes, which slows down the
>> > >   system even more
>> > > - memory allocation failures may crash the dataset service
>> > > - ...
>> > >
>> > > What is probably _not_ responsible:
>> > >
>> > > RDF/XML representation: same problem for Turtle, JSON, triples
>> > > Iteration over our internal data structures: takes only 0.3% of the
>> > >   total processing time
>> > > Redland libraries: I have tried another library (not too many choices
>> > >   in Ruby); it takes 5 times longer than with Redland.
>> > >
>> > > What _could_ be responsible:
>> > >
>> > > Wrong/inefficient OWL-DL representation: Can one of the OWL experts
>> > >   please have a look at e.g. http://webservices.in-silico.ch/dataset/112.rdf?
>> > >
>> > > OpenTox OWL-DL/triple representation:
>> > >
>> > > Symptoms:
>> > > Our internal representation
>> > > (http://webservices.in-silico.ch/dataset/112.yaml) needs 90K (still
>> > > keeping redundant information for efficient searches), while OWL-DL as
>> > > RDF/XML needs 15M (still 6.1M in Turtle) for the same information.
>> > > I have the impression that our OWL-DL does not scale well, especially
>> > > for tuples. Here are some measured figures for the RDF/YAML size ratio
>> > > (maybe one of the computer scientists can have a closer look):
>> > >
>> > > small dataset (85 compounds), 1 feature/compound: 6.5
>> > > medium dataset (580 compounds), 1 feature/compound: 7.4
>> > > small dataset (85 compounds), 56 features as tuples: 32
>> > > medium dataset (580 compounds), 55 features as tuples: 170
>> > >
>> > > Possible solutions:
>> > >
>> > > Curing symptoms:
>> > >
>> > > - Lazy generation/caching of OWL-DL representations: implemented, but
>> > >   you might still get timeouts at the first request or have to wait a
>> > >   long time for the OWL-DL to finish; it does not solve the memory
>> > >   allocation problems
>> > > - Use a persistent store instead of a memory store: might solve the
>> > >   memory allocation problems, but will slow things down even further
>> > > - Get more/faster hardware
>> > >
>> > > Curing the cause (I am at a loss here, please help):
>> > >
>> > > - Tell us what goes wrong in our OWL-DLs
>> > > - Improve the scalability of OpenTox OWL-DL, especially with respect to
>> > >   tuples (I definitely need a method to represent "complex" features)
>> > >
>> > > IMHO it does not make much sense to proceed with further developments
>> > > until we have resolved this substantial issue. I am looking forward to
>> > > hearing your ideas!
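The "lazy generation/caching" option listed above can be sketched as a per-dataset memoizing cache. This is a minimal Python illustration under stated assumptions, not the IST/ALU implementation: `render` is a hypothetical callback standing in for whatever builds the OWL-DL document, and the per-dataset lock is an added design choice so that repeated requests for the same dataset (for example from impatient users) wait for one rendering instead of starting new ones. As noted above, caching alone does not help the first request or the memory footprint.

```python
import threading
from typing import Callable, Dict


class LazyRepresentationCache:
    """Render each dataset's serialization at most once and serve it from memory.

    `render` is a hypothetical callback (dataset id -> serialized bytes); it is
    not part of Redland or any OpenTox API.
    """

    def __init__(self, render: Callable[[str], bytes]) -> None:
        self._render = render
        self._cache: Dict[str, bytes] = {}
        self._locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()  # protects the _cache and _locks dicts

    def _lock_for(self, dataset_id: str) -> threading.Lock:
        with self._guard:
            return self._locks.setdefault(dataset_id, threading.Lock())

    def get(self, dataset_id: str) -> bytes:
        with self._guard:
            cached = self._cache.get(dataset_id)
        if cached is not None:
            return cached
        # Only one thread renders a given dataset; concurrent requests for the
        # same id block here and then reuse the stored result.
        with self._lock_for(dataset_id):
            with self._guard:
                cached = self._cache.get(dataset_id)
            if cached is None:
                cached = self._render(dataset_id)  # slow part; only this id's lock is held
                with self._guard:
                    self._cache[dataset_id] = cached
            return cached

    def invalidate(self, dataset_id: str) -> None:
        """Drop the cached document, e.g. after the dataset has changed."""
        with self._guard:
            self._cache.pop(dataset_id, None)


if __name__ == "__main__":
    import time

    calls = []

    def slow_render(dataset_id: str) -> bytes:
        calls.append(dataset_id)
        time.sleep(0.1)  # stands in for minutes of OWL-DL generation
        return f"<rdf:RDF><!-- dataset {dataset_id} --></rdf:RDF>".encode()

    cache = LazyRepresentationCache(slow_render)
    threads = [threading.Thread(target=cache.get, args=("112",)) for _ in range(5)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(calls)  # five concurrent requests, one render: ['112']
```

Keeping one lock per dataset, rather than one global lock, lets different datasets render in parallel while still collapsing duplicate requests for the same one.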
>> > >
>> > > Best regards,
>> > > Christoph
>> > >
>> > > PS: Martin mentioned that he has also experienced performance problems
>> > > in accessing the parsed OWL-DL data structure (parsing the file seems to
>> > > be ok) - also for external (i.e. non-IST/ALU) datasets. I have always
>> > > blamed the Redland libraries, but maybe this is a related issue.
>> > > _______________________________________________
>> > > Development mailing list
>> > > Development at opentox.org
>> > > http://www.opentox.org/mailman/listinfo/development
>> > >
>> >
>>
>
>
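Nina's base-URI suggestion can be sanity-checked by parsing her RDF/XML example and resolving the relative rdf:about and rdf:resource values against xml:base. The Python sketch below handles only the striped subset of RDF/XML that the example uses (typed node elements, rdf:about, rdf:resource, nested nodes and text literals, with rdf:datatype dropped); it is not a conforming RDF/XML parser. The namespace declarations, in particular the ot: namespace URI, are assumptions added so the excerpt parses as standalone XML. Counting the triples also makes the per-value overhead visible.

```python
import itertools
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# Nina's example with namespace declarations added; the ot: URI is an assumption.
EXAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ot="http://www.opentox.org/api/1.1#"
    xml:base="http://ambit.uni-plovdiv.bg:8080/ambit2/">
  <ot:Dataset rdf:about="dataset/1">
    <ot:dataEntry>
      <ot:DataEntry>
        <ot:values><ot:FeatureValue>
          <ot:value rdf:datatype="http://www.w3.org/2001/XMLSchema#string">formaldehyde</ot:value>
          <ot:feature rdf:resource="feature/9"/>
        </ot:FeatureValue></ot:values>
        <ot:values><ot:FeatureValue>
          <ot:value rdf:datatype="http://www.w3.org/2001/XMLSchema#string">DSL,TSCA</ot:value>
          <ot:feature rdf:resource="feature/10"/>
        </ot:FeatureValue></ot:values>
        <ot:values><ot:FeatureValue>
          <ot:value rdf:datatype="http://www.w3.org/2001/XMLSchema#string">formaldehyde</ot:value>
          <ot:feature rdf:resource="feature/2"/>
        </ot:FeatureValue></ot:values>
      </ot:DataEntry>
    </ot:dataEntry>
  </ot:Dataset>
</rdf:RDF>"""


def expand(tag: str) -> str:
    """ElementTree '{ns}local' notation -> full URI 'nslocal'."""
    return tag[1:].replace("}", "", 1) if tag.startswith("{") else tag


def striped_triples(root: ET.Element) -> list:
    """Triples for the striped RDF/XML subset above, URIs resolved against xml:base."""
    base = root.get(XML_BASE, "")
    triples = []
    blank_ids = itertools.count(1)

    def walk(node: ET.Element) -> str:
        about = node.get(f"{{{RDF}}}about")
        subject = urljoin(base, about) if about is not None else f"_:b{next(blank_ids)}"
        if node.tag != f"{{{RDF}}}Description":  # typed node element -> rdf:type triple
            triples.append((subject, RDF + "type", expand(node.tag)))
        for prop in node:
            resource = prop.get(f"{{{RDF}}}resource")
            children = list(prop)
            if resource is not None:
                obj = urljoin(base, resource)
            elif children:
                obj = walk(children[0])  # nested node element (a blank node here)
            else:
                obj = prop.text or ""  # literal value; rdf:datatype is ignored
            triples.append((subject, expand(prop.tag), obj))
        return subject

    for top in root:
        walk(top)
    return triples


if __name__ == "__main__":
    for s, p, o in striped_triples(ET.fromstring(EXAMPLE)):
        print(s, p, o)
```

For the three feature values this yields 15 triples: 3 for the dataset and its data entry, plus 4 per FeatureValue (the ot:values link, its rdf:type, ot:value and ot:feature). Absolute URIs such as http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/1 are recovered exactly, so an xml:base shortens the RDF/XML text without changing the graph. Relative resolution replaces the last path segment of the base, so the trailing slash matters: resolving dataset/1 against http://ambit.uni-plovdiv.bg:8080/ambit2 (no slash) gives http://ambit.uni-plovdiv.bg:8080/dataset/1.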
More information about the Development mailing list