[OTDev] Experiments with RDF

Wed Oct 6 15:52:15 CEST 2010

Hi Pantelis,

On Wed, Oct 6, 2010 at 4:32 PM, chung <chvng at mail.ntua.gr> wrote:

> Hi Nina,
>
> On Wed, 2010-10-06 at 15:52 +0300, Nina Jeliazkova wrote:
>
> > Hi Pantelis,
> >
> > I guess there is a typo in your report, where you say "Jena was about 14
> > seconds faster than StAX based on 32 successive measurements that are
> > presented in the following figure", but on the figure response times
> > using Jena (red line) are higher than StAX.
>
>
>   You're right, that was just a typo! Indeed your implementation of StAX
> is faster and I'm also interested in using it for serializing datasets
> and other objects into RDF. Could you send me a link and maybe some
> hints on how to use your source code? If possible let me know of any
> dependencies I need.
>
>
The code is here, although it is heavily dependent on internal ambit data
objects. I hope you can get the idea regardless of this dependency.

https://ambit.svn.sourceforge.net/svnroot/ambit/trunk/ambit2-all/ambit2-www/src/main/java/ambit2/rest/dataset/DatasetRDFStaxReporter.java
https://ambit.svn.sourceforge.net/svnroot/ambit/trunk/ambit2-all/ambit2-www/src/main/java/ambit2/rest/RDFStaXConvertor.java
https://ambit.svn.sourceforge.net/svnroot/ambit/trunk/ambit2-all/ambit2-www/src/main/java/ambit2/rest/QueryStaXReporter.java

It only uses javax.xml.stream.XMLStreamWriter
http://download.oracle.com/javase/6/docs/api/javax/xml/stream/XMLStreamWriter.html
(these java oracle links look weird, don't they :)
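
To give an idea of the approach without the ambit dependencies, here is a minimal sketch of writing RDF/XML directly with XMLStreamWriter. It is not the code from the reporters linked above: the class name, the method name, the example URI and the use of dc:title are assumptions made purely for illustration.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical illustration class, not part of ambit.
public class RdfStaxSketch {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    // Streams one rdf:Description with a dc:title; no in-memory RDF model is built.
    static String writeDataset(String datasetUri, String title) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartDocument();
            w.writeStartElement("rdf", "RDF", RDF_NS);
            w.writeNamespace("rdf", RDF_NS);   // declare prefixes on the root element
            w.writeNamespace("dc", DC_NS);
            w.writeStartElement("rdf", "Description", RDF_NS);
            w.writeAttribute("rdf", RDF_NS, "about", datasetUri);
            w.writeStartElement("dc", "title", DC_NS);
            w.writeCharacters(title);          // escapes &, < and > as needed
            w.writeEndElement();               // dc:title
            w.writeEndElement();               // rdf:Description
            w.writeEndElement();               // rdf:RDF
            w.writeEndDocument();
            w.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeDataset("http://example.org/dataset/1", "Example dataset"));
    }
}
```

In a real reporter the writer would wrap the response output stream rather than a StringWriter, so each entry is written as soon as it is read.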

>
> > Also, the statement "It was shown that StAX outperforms the internal
> > implementation of Jena for parsing RDF documents" is not entirely
> > correct, as currently StAX is used for writing (serializing), not for
> > parsing RDF documents.
> >
>
> Yes, I will rephrase that.
>
> > Finally, as we discussed off-list, it would be good to split the response
> > time into a download time and an RDF parse time.
> >
>
> That is the next step...
>
>
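
One way to separate the two times is to buffer the whole response first and only then hand it to the parser. The sketch below is an assumption about how this could be measured, not the procedure used in the report: the class and method names are hypothetical, and the JDK's StAX reader stands in for whichever RDF parser is actually being benchmarked.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration only.
public class SplitTiming {
    // Returns {downloadNanos, parseNanos, elementCount}.
    static long[] measure(InputStream source) {
        try {
            // Phase 1: pull all bytes into memory (the "download" time).
            long t0 = System.nanoTime();
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = source.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            long t1 = System.nanoTime();

            // Phase 2: parse from memory only (stand-in for the RDF parse time).
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new ByteArrayInputStream(buf.toByteArray()));
            long elements = 0;
            while (r.hasNext()) {
                if (r.next() == XMLStreamReader.START_ELEMENT) {
                    elements++;
                }
            }
            r.close();
            long t2 = System.nanoTime();

            return new long[] {t1 - t0, t2 - t1, elements};
        } catch (IOException | XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For a remote dataset the source would be the stream of the HTTP response; buffering costs memory, but it keeps network latency out of the parse figure.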

As a next next step, are you interested in trying to parse JSON (provided
we can manage to serialize the datasets in JSON)?

One of the links Egon just sent looks straightforward to try,
http://code.google.com/p/linked-data-api/ , as the package works with Jena
models.

Best regards,
Nina

> Best regards,
> Pantelis
>
> > Best regards,
> > Nina
> >
> > On Mon, Oct 4, 2010 at 6:12 PM, chung <chvng at mail.ntua.gr> wrote:
> >
> > > Hi Christoph,
> > >   What is the dimension (number of features and compounds) of this dataset?
> > > By the way, I have made some more measurements that you will find
> > > attached.
> > >
> > > Best regards,
> > > Pantelis
> > >
> > > On Mon, 2010-10-04 at 10:21 +0200, Christoph Helma wrote:
> > >
> > > > Dear all,
> > > >
> > > > I have just returned from holidays. In the attachment I am sending you a
> > > > few benchmarks for OWL-DL serialisation for various libraries (RDF.rb,
> > > > Redland with Ruby bindings, Redland with SWIG/Ruby bindings, direct
> > > > serialisation to strings (ntriples)) that I made before our meeting.
> > > > All of them use some internal housekeeping to avoid duplicate triples
> > > > (Triples creation ...). Algorithms are not 100% comparable (Objects are
> > > > sometimes created during triple creation, sometimes during triples
> > > > insertion), but in general the bottleneck is the creation of the RDF
> > > > graph (Triples insertion into model ...). Serialisation itself is rarely
> > > > a problem (also not parsing).
> > > >
> > > > For future experiments I would suggest sharing some benchmark datasets
> > > > (large, medium, small) - I will still have to read all the
> > > > messages/attachments of this thread in detail.
> > > >
> > > > Best regards,
> > > > Christoph
>