[OTDev] ARFF mime type

Rajarshi Guha rajarshi.guha at gmail.com
Mon Sep 28 15:11:26 CEST 2009


On Mon, Sep 28, 2009 at 8:31 AM, Andreas Maunz <andreas at maunz.de> wrote:

>
>
> I wonder if the data interchange format could be based on existing work
> in order to make it compatible with existing standards.
>
> For example, the blue obelisk descriptor ontology that is used by CDK
> and OB could be a start, as pointed out by Egon Willighagen in an
> earlier post to this list.
>

I've been lurking on the list for a few weeks, but I support the points
raised by Andreas and Nina. While a new YAML format may be simple to
read, convert, etc., it is still yet another format. Given that there are
pre-existing formats that do cover much of these issues, it'd make sense to
reuse (and extend/modify if necessary) currently available formats.

I think Egon might have pointed out that the Bioclipse project has also
addressed this whole issue from the point of view of packaging QSAR models.
That approach is based on CML, the BO descriptor ontology and some extra
features and seems to cover a lot of what is being discussed here.

I don't know whether this has been discussed before - but has PMML been
considered as a model exchange format? While it doesn't directly support
descriptors, CML, etc., it is an XML format, and with the use of namespaces
one could easily include CML fragments and descriptor ontology fragments. A
nice feature of PMML is that it is supported by a number of industry
heavyweights (as well as R).
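
To make the namespace idea concrete, here is a rough sketch of what I mean,
using Python's standard library. It hangs a small CML fragment off a PMML
Extension element. The namespace URIs are the ones I believe PMML 4.0 and CML
use, the "structure" extension name and the one-atom molecule are made up for
illustration, and I have not validated the result against the PMML schema, so
treat it as an assumption-laden outline rather than a working model file.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for PMML 4.0 and CML; check them against the specs.
PMML_NS = "http://www.dmg.org/PMML-4_0"
CML_NS = "http://www.xml-cml.org/schema"

# Ask ElementTree to use readable prefixes when serializing.
ET.register_namespace("pmml", PMML_NS)
ET.register_namespace("cml", CML_NS)


def build_pmml_with_cml():
    """Build a skeletal PMML document carrying a CML fragment.

    The Extension name and the molecule are hypothetical placeholders.
    """
    pmml = ET.Element(f"{{{PMML_NS}}}PMML", {"version": "4.0"})
    ET.SubElement(pmml, f"{{{PMML_NS}}}Header",
                  {"description": "hypothetical QSAR model"})

    # PMML's Extension element is the hook for foreign content.
    ext = ET.SubElement(pmml, f"{{{PMML_NS}}}Extension",
                        {"name": "structure"})

    # The CML fragment lives in its own namespace inside the extension.
    mol = ET.SubElement(ext, f"{{{CML_NS}}}molecule", {"id": "m1"})
    atoms = ET.SubElement(mol, f"{{{CML_NS}}}atomArray")
    ET.SubElement(atoms, f"{{{CML_NS}}}atom",
                  {"id": "a1", "elementType": "C"})
    return pmml


xml_text = ET.tostring(build_pmml_with_cml(), encoding="unicode")
print(xml_text)
```

A consumer that only understands PMML can skip the extension, while a
chemistry-aware one can look up the cml-namespaced elements by their
qualified names. Descriptor ontology references could be attached the same
way.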

PS. Having just joined this mailing list, I'm pretty excited to see all this
activity. Very interesting stuff going on!


-- 
Rajarshi Guha
NIH Chemical Genomics Center


