[OTDev] OWL-DL performance/scalability problems

Christoph Helma helma at in-silico.ch
Mon Sep 6 18:27:10 CEST 2010


Excerpts from Egon Willighagen's message of Mon Sep 06 11:23:31 +0200 2010:
> Further reduction of the below RDF/XML can be done with:
> 
> On Mon, Sep 6, 2010 at 10:49 AM, Nina Jeliazkova
> <jeliazkova.nina at gmail.com> wrote:
> > <rdf:RDF
> 
> xmlns="http://what.ever.matches/ot:"
> xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
> 
> >  xml:base="http://ambit.uni-plovdiv.bg:8080/ambit2/">
> >
> >  <ot:Dataset rdf:about="dataset/1">
> 
> including that first @xmlns on the root element would make this simply:
> 
> <Dataset rdf:about="dataset/1">
> 
> >      <ot:dataEntry>
> >      <ot:DataEntry>
> >        <ot:values>
> >          <ot:FeatureValue>
> >            <ot:value rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
> >            >formaldehyde</ot:value>
> 
> The first + second @xmlns:xsd would convert this line into:
> 
> <value rdf:datatype="xsd:string">formaldehyde</value>
> 
> Though you could also argue that the rdf:datatype can be left out, as
> that knowledge would be encoded in the ontology...
> 
> Otherwise, I am still not sure I understand where the exact bottleneck
> is... this exercise seems to indicate it is the volume of the RDF/XML
> serialization...

I have the impression that the bottleneck is the insertion of statements
into the RDF graph, not serialization or the volume of data. I use
the volume of data only as an indicator of the size of the RDF graph.

BTW: Does anyone know the (theoretical) complexity of inserting statements
into an RDF graph?
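
As a rough illustration of why the answer depends on the store: with hash
indexes an insertion is expected amortized O(1), with sorted-tree indexes
O(log n), and with an unindexed list plus a duplicate check O(n) per
statement (so O(n^2) for n statements). A minimal sketch, assuming a
hypothetical in-memory store (the class and method names are made up for
illustration and say nothing about how Redland's storage backends work):

```python
from collections import defaultdict


class TripleStore:
    """Hypothetical in-memory triple store with three hash indexes."""

    def __init__(self):
        # subject -> predicate -> set of objects, plus two rotated indexes
        self.spo = defaultdict(lambda: defaultdict(set))
        self.pos = defaultdict(lambda: defaultdict(set))
        self.osp = defaultdict(lambda: defaultdict(set))
        self.size = 0

    def add(self, s, p, o):
        """Insert one statement; expected amortized O(1) via hashing."""
        if o in self.spo[s][p]:
            return False  # duplicate statement, graph unchanged
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)
        self.size += 1
        return True

    def objects(self, s, p):
        """Return all objects for a given subject and predicate."""
        return set(self.spo[s][p])
```

Each insertion touches a constant number of hash tables, so the total cost
for n statements grows roughly linearly, at the price of storing every
statement three times.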

> Christoph, what does your full dataflow look like? How often is the
> RDF/XML serialized and deserialized?

Only once in each direction; internally we work with our own
representation.

> What generates the data and how
> do you create the RDF? Would it be possible to skip RedLand and any
> other RDF library at all?

In principle yes, but I would hate to reinvent the wheel and write
RDF/XML "by hand".
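
For illustration only, a minimal sketch of what streaming the RDF/XML
straight from an internal representation, without building an RDF graph
first, might look like. The function name, the flat list of string values
and the ot: namespace URI are assumptions; the element nesting follows the
example quoted above.

```python
import io
import xml.etree.ElementTree as ET
from xml.sax.saxutils import XMLGenerator

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OT = "http://www.opentox.org/api/1.1#"  # assumed namespace URI
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def write_dataset(out, base, dataset_uri, values):
    """Stream one ot:Dataset with string feature values to a text stream."""
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement(
        "rdf:RDF", {"xmlns:rdf": RDF, "xmlns:ot": OT, "xml:base": base}
    )
    gen.startElement("ot:Dataset", {"rdf:about": dataset_uri})
    nesting = ["ot:dataEntry", "ot:DataEntry", "ot:values", "ot:FeatureValue"]
    for text in values:
        for tag in nesting:
            gen.startElement(tag, {})
        gen.startElement("ot:value", {"rdf:datatype": XSD_STRING})
        gen.characters(text)  # escapes <, > and & in the literal
        gen.endElement("ot:value")
        for tag in reversed(nesting):
            gen.endElement(tag)
    gen.endElement("ot:Dataset")
    gen.endElement("rdf:RDF")
    gen.endDocument()


# Usage: serialize two values, then parse the result back as a sanity check.
buf = io.StringIO()
write_dataset(buf, "http://example.org/", "dataset/1", ["formaldehyde", "a<b"])
root = ET.fromstring(buf.getvalue().encode("utf-8"))
print([e.text for e in root.iter("{%s}value" % OT)])
```

Memory use stays constant because nothing is kept per statement, but all
the validation an RDF library would do is lost, which is the wheel I would
rather not reinvent.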

> That is, is it merely serialization, or do
> you process the data too somewhere?

Yes, just serialisation of our internal representation. We had
an implementation based on OWL-DL, but had to revert to our own
data structures for efficiency reasons.

PS: Thanks, Nina, for the prefix/namespace tip. I will try it.

Best regards,
Christoph


