[OTDev] RDF in OpenTox

Nina Jeliazkova jeliazkova.nina at gmail.com
Sat May 28 07:31:24 CEST 2011


Hi All,

You might check the benchmarking we did last summer, which I included in my
talk [1] at the ACS RDF session in Aug 2010 (slides 44-46; slide 45 is
attached to this email).

Regards,
Nina

[1] N. Jeliazkova, RESTful RDF services for predictive toxicology,
ACS RDF Symposium at 240th ACS National Meeting
<http://portal.acs.org/portal/Navigate?nodeid=2061>,
Boston, MA, Aug 22-26, 2010.
http://vedina.users.sourceforge.net/publications/2010/ACS-RDF-NJ.pdf

On 28 May 2011 00:53, chung <chvng at mail.ntua.gr> wrote:

> Hi Christoph,
>    I can say that our benchmark results agree with yours (and one thumbs
> up from me for "bushiness").
>
> On Fri, 2011-05-27 at 20:04 +0200, Christoph Helma wrote:
>
> > Dear All,
> >
> > Some time ago I ran some benchmark tests. If I remember correctly, the
> > main results were:
> >
> > - the most resource-intensive task was to build and maintain the
> >   internal RDF tree of the library (this is at least true for the redland
> >   and RDF.rb libraries). I suppose that a lot of indexing is going on to
> >   make tree traversal more efficient. Resource usage (CPU and memory)
> >   scaled very unfavorably with the size _and_ the "bushiness" (i.e.
> >   branching degree) of the RDF tree
> >
> > - the library implementation (redland in C, RDF.rb in Ruby) has a small
> >   impact, but both versions were unusable for our datasets
> >
>
>
> That is expected, I think. RDF has a different purpose IMHO, which it
> serves perfectly. But maybe the time has come to turn to some other
> serialization. But to be realistic... not in this project!
>
>
> > - RDF format (RDF/XML, Turtle, N3, JSON, ...) had only a minor impact
> >
>
>
> I confirm that too.
>
>
> > - parsing time was reduced by several orders of magnitude with a custom
> >   parser that avoids complex data structures and indexing (not much fun
> >   to write and maintain). I am also not sure how well it scales; it
> >   still has the limitation that everything has to fit into memory.
> >
> > Best regards,
> > Christoph
> >
> >
> > On Fri, 27 May 2011 19:27:24 +0200, Egon Willighagen <egon.willighagen at gmail.com> wrote:
> > > Dear Pantelis,
> > >
> > > On Fri, May 27, 2011 at 5:50 PM, chung <chvng at mail.ntua.gr> wrote:
> > > > Some criticism on RDF from the experience we've gained in OpenTox :
> > > > http://is.gd/qLJG3h . The article is not complete yet and will be
> > > > enriched with more facts and diagrams.
> > >
> > > Please do, because right now you left out so much detail on what you
> > > are in fact doing. I do appreciate your frustration, and the
> > > difference is unacceptable.
> > >
> > > I have these questions:
> > >
> > > > * RDF is not a format, while ARFF is a file format; you mix RDF and
> > > > RDF/XML as if they are the same thing; why?
> > > * what RDF file format have you used? RDF/XML, as you later refer to?
> > > * are you using reasoning, and if so why? moreover, you should not
> > > compare a reasoning environment with a non-reasoning one (of course,
> > > you'd see differences)
> > > * what information is specified in the ARFF header?
> > > * why aren't you using a vector annotation in RDF?
> > > > * how large is the file, and what are you doing to use 2GB of heap
> > > > space?
> > > * how large is your data set?
> > > * what does your code look like?
> > >
> > > > A fair comment would be that ARFF takes a shortcut: it imposes
> > > > additional structure on the data, something you identify in your
> > > > report. RDF does not do that by itself. A vector environment does.
> > > > That does not mean that this is not possible with RDF. Have you
> > > > considered what options there are to introduce this vector restriction
> > > > into the computational framework, forced to use RDF? Do you believe it
> > > > is impossible to achieve that with RDF? Would you see it impossible to
> > > > define an ontology to capture vector notation, allowing you to specify
> > > > what each column in that vector represents?
> > >
> > > > Now, given that you do see that option too, you would probably end up
> > > > with an ontology looking very much like the ARFF specification, but
> > > > then in RDF.
> > >
> > > > In short, based on your report I really cannot judge if RDF is the
> > > > problem, because your results do not make such a conclusion possible.
> > > Instead, I rather think that you are running into a highly confounded
> > > analysis where it is not possible to assign the slowness to any
> > > factor. I think you are comparing two widely different data models,
> > > one optimized for computation (ARFF) and one not (your current RDF/XML
> > > file). Would that perhaps be the significant factor in the difference
> > > in speed?
> > >
> > > I am looking forward to a more detailed report on the various involved
> > > factors that determine the speed here,
> > >
> > > Egon
> > >
> > > --
> > > Dr E.L. Willighagen
> > > Postdoctoral Researcher
> > > Institutet för miljömedicin
> > > Karolinska Institutet (http://ki.se/imm)
> > > Homepage: http://egonw.github.com/
> > > LinkedIn: http://se.linkedin.com/in/egonw
> > > Blog: http://chem-bla-ics.blogspot.com/
> > > PubList: http://www.citeulike.org/user/egonw/tag/papers
> > > _______________________________________________
> > > Development mailing list
> > > Development at opentox.org
> > > http://www.opentox.org/mailman/listinfo/development
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rdf-acs-slide45.png
Type: image/png
Size: 415482 bytes
Desc: not available
URL: <http://lists.opentox.org/pipermail/development/attachments/20110528/3ec6a4b8/attachment.png>

