[OTDev] Experiments with RDF

Fri Oct 1 12:40:27 CEST 2010

Hi Pantelis,

On Fri, Oct 1, 2010 at 1:20 PM, chung <chvng at mail.ntua.gr> wrote:

> Hi Nina,
> Thanks for your input!
>
>
> On Fri, 2010-10-01 at 08:32 +0300, Nina Jeliazkova wrote:
>
> > Hi Pantelis, All,
> >
> >
> > On Thu, Sep 30, 2010 at 9:33 PM, chung <chvng at mail.ntua.gr> wrote:
> >
> > > Hi all,
> > > During a round (rectangle, in fact) table discussion in Rhodes we
> > > questioned the efficiency of web services based on RDF, and in
> > > particular on its OWL-DL variant. I gathered some statistics using
> > > ToxOtis while experimenting with downloading and parsing datasets. We
> > > also tested the performance of ToxOtis in converting dataset objects
> > > into Weka objects (weka.core.Instances); the latter are useful to Weka
> > > users. These are preliminary results and we must not jump to
> > > conclusions, but we can start a discussion around some performance
> > > issues. Java developers may use ToxOtis as a kind of client-side
> > > profiler for their services. Find attached a draft report that
> > > attempts to correlate the size of a dataset with the computational
> > > effort needed to parse it.
> > >
> >
> >
> > Would it be possible to run further experiments - in particular:
> >
> > - Split the reported time into the time needed to download the RDF
> > representation from the server and the time needed to parse and load
> > the RDF as a Jena object. The reason for asking is that these two parts
> > can be optimized by different approaches (minimizing file size by
> > prefixing or compression for the former, and exploiting different Jena
> > storage models for the latter).
>
>
> Yes, that was in my plans.
>

OK, I'll be looking out for the results.
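A minimal sketch of how the two phases could be timed separately. This is a hypothetical helper, not part of ToxOtis: the class name `PhaseTimer` is an assumption, and the download and parse steps are passed in as lambdas. The response is buffered completely into a byte array before parsing, so network time and parse time do not overlap.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/**
 * Hypothetical helper: times the download phase and the parse phase separately.
 * The two steps are supplied as lambdas, so the same harness can wrap an HTTP GET
 * of the RDF and whatever parsing call the tests actually use.
 */
final class PhaseTimer {

    /** Result of a two-phase run: the parsed value plus both timings in nanoseconds. */
    static final class Result<T> {
        final T value;
        final long downloadNanos;
        final long parseNanos;

        Result(T value, long downloadNanos, long parseNanos) {
            this.value = value;
            this.downloadNanos = downloadNanos;
            this.parseNanos = parseNanos;
        }
    }

    static <T> Result<T> time(Supplier<byte[]> download, Function<byte[], T> parse) {
        long t0 = System.nanoTime();
        byte[] raw = download.get();      // phase 1: fetch the whole serialized RDF
        long t1 = System.nanoTime();
        T parsed = parse.apply(raw);      // phase 2: parse/load the buffered bytes
        long t2 = System.nanoTime();
        return new Result<>(parsed, t1 - t0, t2 - t1);
    }
}
```

As an illustration, the parse lambda could wrap Jena's `ModelFactory.createDefaultModel()` followed by `model.read(new ByteArrayInputStream(bytes), null)`. The download lambda would be something like a hypothetical `fetchBytes(datasetUri)`. Buffering costs memory for the raw bytes, but it is what makes the split clean: a streaming parse would fold both phases into a single number.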

> >
> > - Report the time to parse RDF into the different in-memory Jena models
> > listed at
> > http://jena.sourceforge.net/javadoc/com/hp/hpl/jena/ontology/OntModelSpec.html
> > (not sure which one is used in the tests now).
>
>
> That would be also an interesting experiment.
>
> >
> > - Report timings using a slightly different approach to converting to
> > Weka instances, namely retrieving the URIs of the compounds first and
> > then retrieving the features of each compound in subsequent calls.
> >
>
>
> Well, the time needed to convert the dataset object into Instances is
> relatively small. Do you think it needs to be optimized further? We
> can run the experiment, however.
>
>
In my experience, Weka instances work fine up to a certain limit (depending
on the memory available to the JVM), but it is not possible to work with Weka
for large datasets. Moreover, I assume that in the current test setup the
data is at least duplicated (once in the RDF model and once as Weka
instances), which makes memory consumption worse. Hence the suggestion to
load the RDF in small chunks (e.g. per compound), create the Weka instances,
and immediately discard the RDF.

>
> > - Report timings when using Jena persistent storage instead of the
> > in-memory one (http://openjena.org/TDB/, http://openjena.org/SDB/).
>
>
> I don't think that persistence will outperform in-memory storage in
> terms of computational time, but it will probably allocate less memory.
>

As memory consumption is the bottleneck, it surely will; persistent stores
are usually optimized for exactly that. There are existing benchmarks
showing that in-memory Jena performs worst.

> Apart from that, I don't think such persistence is needed on the
> client side, since the data are persistent on the server.
>

IMHO in-memory Jena models will simply not work for datasets larger than a
few thousand entries, especially if the code runs as a server application
(e.g. ToxPredict) that has to support multiple simultaneous users.

Besides time, could you also record memory-related statistics?
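To collect memory figures alongside the timings, the standard `java.lang.management` API is probably enough. Below is a minimal sketch in which the `MemoryStats` class and its methods are hypothetical names. `System.gc()` is only a hint, and summing per-pool peaks over-estimates the true simultaneous peak because pools may peak at different moments, so the numbers are indicative rather than exact.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

/** Hypothetical helper for recording heap usage around a parsing/conversion step. */
final class MemoryStats {

    /** Heap currently in use, as reported by the platform MemoryMXBean. */
    static long usedHeapBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
    }

    /** Sum of per-pool heap peaks since the last reset (an upper bound on the real peak). */
    static long peakHeapBytes() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage peak = pool.getPeakUsage();   // null if the pool is no longer valid
            if (pool.getType() == MemoryType.HEAP && peak != null) {
                total += peak.getUsed();
            }
        }
        return total;
    }

    /** Runs a task and returns {usedBefore, peakSinceReset, usedAfter} in bytes. */
    static long[] measure(Runnable task) {
        System.gc();                                  // only a hint; the JVM may ignore it
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            pool.resetPeakUsage();                    // peaks now start from current usage
        }
        long before = usedHeapBytes();
        task.run();
        long peak = peakHeapBytes();
        long after = usedHeapBytes();
        return new long[] {before, peak, after};
    }
}
```

Wrapping the parse step, or each storage strategy (in-memory vs. TDB/SDB), in `measure` would give a rough memory counterpart to each timing.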

>
> >
> > If we find an optimal setting after these experiments, the next step
> > would be to try working with datasets comparable in size to the raw
> > malaria data. Ideally, it would be nice to compare with RDF libraries
> > other than Jena, but this may require more effort.
> >
>
> We might find such a comparison online; otherwise, we could run some
> tests!
>

Yes indeed.

Nina

>
>
> > Best regards,
> > Nina
> >
> >
> >
> > >
> > > Best regards,
> > > Pantelis S.
> > >
> > > _______________________________________________
> > > Development mailing list
> > > Development at opentox.org
> > > http://www.opentox.org/mailman/listinfo/development
> > >
> > >
> >
>
>