[OTDev] RDF in OpenTox

Christoph Helma helma at in-silico.ch
Fri May 27 20:04:53 CEST 2011


Dear All,

Some time ago I ran some benchmark tests. If I remember correctly, the
main results were:

- the most resource-intensive task was building and maintaining the
  internal RDF tree of the library (this is at least true for the redland
  and RDF.rb libraries). I suppose that a lot of indexing is going on to
  make tree traversal more efficient. Resource usage (CPU and memory)
  scaled very unfavorably with the size _and_ the "bushiness" (i.e.
  branching degree) of the RDF tree

- the library implementation (redland in C, RDF.rb in Ruby) had only a
  small impact, and both versions were unusable for our datasets

- RDF format (RDF/XML, Turtle, N3, JSON, ...) had only a minor impact

- parsing time was reduced by several orders of magnitude with a custom
  parser that avoids complex data structures and indexing (not much fun
  to write and maintain). I am also not sure how well it scales; it
  still has the limitation that everything has to fit into memory.
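
To illustrate the last point, here is a minimal sketch of what a parser
that avoids building an indexed RDF tree could look like. It is not the
parser used in the benchmarks. It assumes the data is available in a
line-based N-Triples-like serialization, and the names iter_triples and
build_table as well as the example.org URIs are hypothetical. Each line
is split into a flat (subject, predicate, object) tuple and no graph
structure is created.

```python
from typing import Dict, Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]


def iter_triples(lines: Iterable[str]) -> Iterator[Triple]:
    """Yield flat (subject, predicate, object) tuples, one per input line.

    Assumes one statement per line, terminated by '.', as in N-Triples.
    Terms are returned as raw strings; nothing is indexed or validated.
    """
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if not line.endswith("."):
            raise ValueError(f"missing terminating '.': {line!r}")
        body = line[:-1].rstrip()
        # Split on the first two runs of whitespace only, so that
        # literal objects containing spaces stay in one piece.
        parts = body.split(None, 2)
        if len(parts) != 3:
            raise ValueError(f"expected three terms: {line!r}")
        yield parts[0], parts[1], parts[2]


def build_table(triples: Iterable[Triple]) -> Dict[str, Dict[str, str]]:
    """Collect triples into a plain dict of dicts: subject -> predicate -> object."""
    table: Dict[str, Dict[str, str]] = {}
    for subject, predicate, obj in triples:
        table.setdefault(subject, {})[predicate] = obj
    return table


# Hypothetical sample data, for illustration only.
SAMPLE = """
# compound feature values
<http://example.org/compound/1> <http://example.org/feature/logP> "1.5" .
<http://example.org/compound/1> <http://example.org/feature/name> "ethyl acetate" .
<http://example.org/compound/2> <http://example.org/feature/logP> "0.3" .
"""

table = build_table(iter_triples(SAMPLE.splitlines()))
```

iter_triples itself works line by line and can read directly from an
open file, but the table built by build_table still has to fit into
memory, which is the same limitation as mentioned above.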

Best regards,
Christoph


On Fri, 27 May 2011 19:27:24 +0200, Egon Willighagen <egon.willighagen at gmail.com> wrote:
> Dear Pantelis,
> 
> On Fri, May 27, 2011 at 5:50 PM, chung <chvng at mail.ntua.gr> wrote:
> > Some criticism on RDF from the experience we've gained in OpenTox :
> > http://is.gd/qLJG3h . The article is not complete yet and will be
> > enriched with more facts and diagrams.
> 
> Please do, because right now you left out so much detail on what you
> are in fact doing. I do appreciate your frustration, and the
> difference is unacceptable.
> 
> I have these questions:
> 
> * RDF is not a format, while ARFF is a file format; you mix RDF and
> RDF/XML as if they are the same thing; why?
> * what RDF file format have you used? RDF/XML, as you later refer to?
> * are you using reasoning, and if so why? moreover, you should not
> compare a reasoning environment with a non-reasoning one (of course,
> you'd see differences)
> * what information is specified in the ARFF header?
> * why aren't you using a vector annotation in RDF?
> * how large is the file, and what are you doing to use 2GB of heap space?
> * how large is your data set?
> * what does your code look like?
> 
> A fair comment would be that ARFF takes a shortcut: it imposes
> additional structure on the data, something you identify in your
> report. RDF does not do that by itself. A vector environment does.
> That does not mean that such is not possible with RDF. Have you
> considered what options there are to introduce this vector restriction
> into the computational framework, forced to use RDF? Do you believe it
> is impossible to achieve that with RDF? Would you see it impossible to
> define an ontology to capture vector notation, allowing you to specify
> what each column in that vector represents?
> 
> Now, given that you do see that option too, you would probably end up
> with an ontology looking very much like the ARFF specification, but
> then in RDF.
> 
> In short, based on your report I really cannot judge if RDF is the
> problem, because your results do not make such a conclusion possible.
> Instead, I rather think that you are running into a highly confounded
> analysis where it is not possible to assign the slowness to any
> factor. I think you are comparing two widely different data models,
> one optimized for computation (ARFF) and one not (your current RDF/XML
> file). Would that perhaps be the significant factor in the difference
> in speed?
> 
> I am looking forward to a more detailed report on the various involved
> factors that determine the speed here,
> 
> Egon
> 
> -- 
> Dr E.L. Willighagen
> Postdoctoral Researcher
> Institutet för miljömedicin
> Karolinska Institutet (http://ki.se/imm)
> Homepage: http://egonw.github.com/
> LinkedIn: http://se.linkedin.com/in/egonw
> Blog: http://chem-bla-ics.blogspot.com/
> PubList: http://www.citeulike.org/user/egonw/tag/papers


