[OTDev] OWL-DL performance/scalability problems
Christoph Helma helma at in-silico.ch
Wed Sep 8 13:49:31 CEST 2010
- Previous message: [OTDev] OWL-DL performance/scalability problems
- Next message: [OTDev] OpenTox presentations at SMi ADMET and IUTOX conferences
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Excerpts from Egon Willighagen's message of Mon Sep 06 18:29:44 +0200 2010:
> On Mon, Sep 6, 2010 at 6:27 PM, Christoph Helma <helma at in-silico.ch> wrote:
> >> Otherwise, I am still not sure I understand where the exact bottleneck
> >> is... this exercise seems to indicate it is the volume of the RDF/XML
> >> serialization...
> >
> > I have the impression that the bottleneck is the insertion of statements
> > into the RDF graph, not serialization or the volume of data. I use
> > the volume of data only as an indicator for the size of the RDF graph.
> >
> > BTW: Who knows the (theoretical) complexity of inserting statements into
> > an RDF graph?
>
> Indeed. There might be indexing ongoing in the background...
>
> >> What generates the data and how do you create the RDF? Would it be
> >> possible to skip RedLand and any other RDF library at all?
> >
> > In principle yes, but I would hate to reinvent the wheel and write
> > RDF/XML "by hand".
>
> Fair, but a simple XML library would get you very far, and would allow
> a streaming approach...

Writing our own serializer (N-Triples at the moment, because it was easier to implement for testing) did the trick, reducing e.g. processing time from ~20 minutes to ~10 seconds (but only with a fast string concatenation operator ("<<"); with a slow one ("+="), processing took much longer than with Redland ;-)). I would like to see whether other implementations (e.g. Jena) perform much more efficiently than Redland. Maybe we can run some benchmarks in Rhodes.

We have also observed that processing time depends to a large extent on the depth of the RDF tree, not only on the size of the dataset. And OWL-DL (especially with tuples) of course produces larger and deeper trees than "plain" RDF. If the complexity of datasets really has such a dramatic impact on processing time and resources, I am not very optimistic that we can process complex biological data with OWL-DL. Let's assume e.g. HTS gene expression datasets linked with experimental conditions, data analysis procedures, phenotype or other -omics measurements, pathway information, ... This would result in very large, deeply linked datasets.

I would be very happy if one of the computer scientists could have a look at the theoretical properties and scalability of the data structures that are used to represent OWL-DL (not OWL-DL as a knowledge representation language). If there are theoretical limitations we would have to think about alternatives (not sure what they could be ...)

Christoph
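For illustration, a minimal hand-rolled N-Triples serializer along the lines described above might look like the following Ruby sketch. The class and method names are hypothetical (not from the OpenTox codebase); the point is the use of the destructive "<<" append, which is amortized O(1) per call, whereas "+=" copies the whole buffer on every append and makes the overall serialization quadratic:

```ruby
# Hypothetical streaming N-Triples writer: a sketch of bypassing a full
# RDF library (Redland/Jena) by emitting triples as plain strings.
class NTriplesWriter
  def initialize
    @buffer = String.new
  end

  # Append one triple. The << operator mutates @buffer in place (fast);
  # @buffer += "..." would allocate and copy the entire buffer each time.
  def add(subject, predicate, object)
    @buffer << "<#{subject}> <#{predicate}> "
    if object.start_with?("http")
      @buffer << "<#{object}> .\n"       # URI object
    else
      @buffer << "\"#{object}\" .\n"     # plain literal; escaping omitted
    end
  end

  def to_ntriples
    @buffer
  end
end

# Usage: serialize one statement of a (hypothetical) dataset.
w = NTriplesWriter.new
w.add("http://example.org/compound/1",
      "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
      "http://www.opentox.org/api/1.1#Compound")
puts w.to_ntriples
```

Because each triple is written out as soon as it is produced, this approach also streams naturally and never builds an in-memory RDF graph, which is where the insertion cost discussed above seems to accumulate.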