[OTDev] RDF in OpenTox

Egon Willighagen egon.willighagen at gmail.com
Fri May 27 19:27:24 CEST 2011


Dear Pantelis,

On Fri, May 27, 2011 at 5:50 PM, chung <chvng at mail.ntua.gr> wrote:
> Some criticism on RDF from the experience we've gained in OpenTox :
> http://is.gd/qLJG3h . The article is not complete yet and will be
> enriched with more facts and diagrams.

Please do, because right now you have left out so much detail on what
you are in fact doing. I do appreciate your frustration, and the
difference in speed is unacceptable.

I have these questions:

* RDF is not a file format, while ARFF is; you mix RDF and RDF/XML as
if they are the same thing; why?
* what RDF file format have you used? RDF/XML, as you later refer to?
* are you using reasoning, and if so why? moreover, you should not
compare a reasoning environment with a non-reasoning one (of course,
you'd see differences)
* what information is specified in the ARFF header?
* why aren't you using a vector annotation in RDF?
* how large is the file, and what are you doing to use 2GB of heap space?
* how large is your data set?
* what does your code look like?

A fair comment would be that ARFF takes a shortcut: it imposes
additional structure on the data, something you identify in your
report. RDF does not do that by itself. A vector environment does.
That does not mean that this is not possible with RDF. Have you
considered what options there are to introduce this vector restriction
into the computational framework, if forced to use RDF? Do you believe
it is impossible to achieve that with RDF? Would you see it impossible
to define an ontology to capture vector notation, allowing you to
specify what each column in that vector represents?

Now, given that you do see that option too, you would probably end up
with an ontology looking very much like the ARFF specification, but
then in RDF.
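
To make the vector idea concrete, here is a minimal sketch in Python. It
is only an illustration under stated assumptions: the namespace and the
terms index, feature, compound, column and value are hypothetical and
not part of any existing OpenTox ontology, the numbers are made up, and
the triples are plain tuples rather than a real RDF store.

```python
from collections import defaultdict

# Hypothetical namespace; these terms are NOT an existing ontology.
EX = "http://example.org/vector#"

# Triples as (subject, predicate, object) tuples. Values are illustrative.
triples = [
    # "Header": which feature each column position holds,
    # comparable to the @attribute lines in ARFF.
    ("ex:col0", EX + "index", 0),
    ("ex:col0", EX + "feature", "ex:logP"),
    ("ex:col1", EX + "index", 1),
    ("ex:col1", EX + "feature", "ex:molWeight"),
    # "Data": one cell per (compound, column) pair.
    ("ex:cell0", EX + "compound", "ex:benzene"),
    ("ex:cell0", EX + "column", "ex:col0"),
    ("ex:cell0", EX + "value", 2.13),
    ("ex:cell1", EX + "compound", "ex:benzene"),
    ("ex:cell1", EX + "column", "ex:col1"),
    ("ex:cell1", EX + "value", 78.11),
    ("ex:cell2", EX + "compound", "ex:toluene"),
    ("ex:cell2", EX + "column", "ex:col0"),
    ("ex:cell2", EX + "value", 2.73),
    ("ex:cell3", EX + "compound", "ex:toluene"),
    ("ex:cell3", EX + "column", "ex:col1"),
    ("ex:cell3", EX + "value", 92.14),
]


def to_matrix(triples):
    """Rebuild an ARFF-like (header, rows) pair from the vector triples."""
    # Group properties by subject, keyed by local name (namespace stripped).
    props = defaultdict(dict)
    for s, p, o in triples:
        props[s][p[len(EX):]] = o

    # Column descriptions are the subjects that carry an index.
    columns = {s: d for s, d in props.items() if "index" in d}
    header = [None] * len(columns)
    for d in columns.values():
        header[d["index"]] = d["feature"]

    # Place every cell value at the position its column prescribes.
    rows = {}
    for d in props.values():
        if "value" in d:
            row = rows.setdefault(d["compound"], [None] * len(header))
            row[columns[d["column"]]["index"]] = d["value"]
    return header, rows


header, rows = to_matrix(triples)
```

The header list plays the role of the ARFF attribute declarations, and
each entry in rows is a fixed-length vector, so the structure ARFF
imposes is recovered from the triples alone.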

In short, based on your report I really cannot judge if RDF is the
problem, because your results do not make such a conclusion possible.
Instead, I rather think that you are running into a highly confounded
analysis where it is not possible to assign the slowness to any
factor. I think you are comparing two widely different data models,
one optimized for computation (ARFF) and one not (your current RDF/XML
file). Would that perhaps be the significant factor in the difference
in speed?
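
One way to separate the factors is to time each stage on its own. The
following is a rough Python sketch, not a description of the code used
in the report: the stage functions parse, to_vectors and compute are
hypothetical stand-ins for reading the file, building the vectors and
running the actual computation.

```python
import time


def time_stages(stages, data):
    """Run stages in order, feeding each output into the next stage.

    Returns the final result and the wall-clock seconds spent per stage.
    """
    timings = {}
    for name, fn in stages:
        start = time.perf_counter()
        data = fn(data)
        timings[name] = time.perf_counter() - start
    return data, timings


# Hypothetical stand-ins for the real pipeline stages.
def parse(text):
    # Stand-in for reading RDF/XML or ARFF into records.
    return [line.split(",") for line in text.splitlines()]


def to_vectors(records):
    # Stand-in for turning records into numeric vectors.
    return [[float(x) for x in rec] for rec in records]


def compute(vectors):
    # Stand-in for the actual computation (here: the mean of each vector).
    return [sum(v) / len(v) for v in vectors]


result, timings = time_stages(
    [("parse", parse), ("to_vectors", to_vectors), ("compute", compute)],
    "1,2,3\n4,5,6",
)
```

Running the same stages once on the RDF input and once on the ARFF input
would show whether the time goes into parsing, into building the
vectors, or into the computation itself.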

I am looking forward to a more detailed report on the various involved
factors that determine the speed here,

Egon

-- 
Dr E.L. Willighagen
Postdoctoral Researcher
Institutet för miljömedicin
Karolinska Institutet (http://ki.se/imm)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers


