[OTDev] Experiments with RDF

Fri Oct 1 12:20:04 CEST 2010

Hi Nina,
  Thanks for your input!

On Fri, 2010-10-01 at 08:32 +0300, Nina Jeliazkova wrote:

> Hi Pantelis, All,
> 
> 
> On Thu, Sep 30, 2010 at 9:33 PM, chung <chvng at mail.ntua.gr> wrote:
> 
> > Hi all,
> >  During a round (rectangle in fact) table discussion in Rhodes we
> > questioned the efficiency of web services based on RDF and in particular
> > its OWL-DL variant. I gathered some statistics using ToxOtis while
> > experimenting with downloading and parsing datasets. Also we've tested
> > the performance of ToxOtis in converting dataset objects into Weka
> > objects (weka.core.Instances); the latter are useful to users of Weka.
> > These are preliminary results and we must not jump to conclusions, but
> > we can start a discussion around some performance issues. Java
> > developers may use ToxOtis as a kind of client-profiler for their
> > services. Find attached a draft report that attempts to correlate the
> > size of a dataset with the computational effort needed for its parsing.
> >
> 
> 
> Would it be possible to run further experiments - in particular:
> 
> - Split the reported time into the time needed to download the RDF
> representation from the server and the time needed to parse and load the
> RDF as a Jena object. The reason for asking is that these two parts can be
> optimized by different approaches (minimizing file size by prefixing or
> compression for the former, and exploiting different Jena storage models for
> the latter).

Yes, that was in my plans.
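
A minimal sketch of how the two phases could be timed separately, using only
the Java standard library. The class and method names (PhaseTimer, Timings,
measure) are hypothetical and are not part of ToxOtis or Jena; the parser is
passed in as a plain function, so that any RDF reader could be plugged in.
The idea is to drain the stream into memory first, so that network time does
not leak into the parsing measurement.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.function.Function;

// Hypothetical helper, not part of ToxOtis or Jena.
final class PhaseTimer {

    /** Elapsed time of each phase in nanoseconds, plus the payload size in bytes. */
    static final class Timings {
        final long downloadNanos;
        final long parseNanos;
        final int bytes;

        Timings(long downloadNanos, long parseNanos, int bytes) {
            this.downloadNanos = downloadNanos;
            this.parseNanos = parseNanos;
            this.bytes = bytes;
        }
    }

    /** Reads the whole stream into a byte array (the "download" phase). */
    static byte[] readAll(InputStream in) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Times the download and the parsing separately. The parser receives an
     * in-memory stream, so its timing excludes any network latency.
     */
    static Timings measure(InputStream source, Function<InputStream, ?> parser) {
        long t0 = System.nanoTime();
        byte[] payload = readAll(source);
        long t1 = System.nanoTime();
        parser.apply(new ByteArrayInputStream(payload));
        long t2 = System.nanoTime();
        return new Timings(t1 - t0, t2 - t1, payload.length);
    }
}
```

With Jena on the classpath, the parser argument could be a lambda that reads
the in-memory stream into a model, and the source could be the stream opened
on the dataset URI. Buffering the whole payload costs extra memory, so for
very large datasets this approach would slightly distort the memory picture.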

> 
> - Report the time to parse RDF into different in-memory Jena models (the ones from
> http://jena.sourceforge.net/javadoc/com/hp/hpl/jena/ontology/OntModelSpec.html ;
> not sure which is being used in the tests now).

That would also be an interesting experiment.

> 
> - Report timings using a slightly different approach to convert to Weka
> instances, namely, retrieve the URIs of the compounds first and then retrieve
> the features for each compound in subsequent calls.
> 

Well, the time needed to convert the dataset object into Instances is
relatively small. Do you think this needs to be optimized further? We
can run the experiment anyway.

> - Report timings when using Jena persistent storage instead of the in-memory
> one (http://openjena.org/TDB/, http://openjena.org/SDB/).

I don't think that persistent storage will outperform in-memory storage in
terms of computation time, but it will probably use less memory.
Apart from that, I don't think that such persistence is needed on the
client side, since the data are already persistent on the server.

> 
> If we find an optimal setting after these experiments, the next step would
> be trying to work with datasets comparable in size to the raw malaria
> data. Ideally, it would be nice to compare with RDF libraries other than
> Jena, but this may require more effort.
> 

We might find such a comparison online; otherwise we could run some
tests ourselves!

> Best regards,
> Nina
> 
> 
> 
> >
> > Best regards,
> > Pantelis S.
> >
> > _______________________________________________
> > Development mailing list
> > Development at opentox.org
> > http://www.opentox.org/mailman/listinfo/development
> >
> >