[OTDev] ARFF mime type

Egon Willighagen egon.willighagen at gmail.com
Sat Sep 26 18:18:09 CEST 2009


Hi Joerg,

I am on the list :)

On Sat, Sep 26, 2009 at 6:01 PM, Jörg Kurt Wegner <joerg.wegner at web.de> wrote:
>>It would be sensible to choose a chemically oriented format (ARFF may be
>>too generic).
>
> I totally agree, maybe you could use the ARFF molecule instances we created
> a few years ago, the only thing you now need are converters or wrappers ;-)
> http://joelib.svn.sourceforge.net/viewvc/joelib/trunk/src/joelib2/algo/datamining/weka/
>
> Maybe YAML and CML could get married ?
> I would check with Egon Willighagen and Rajarshi Guha (CCed).

CML is always a good choice, as it mixes very well with other
XML-based namespaces...

That said, I invite everyone to have a look at Ola Spjuth's work on
QSAR data sets for Bioclipse, which currently involves molecules and
calculated descriptors... it (re)uses the Blue Obelisk Descriptor
Ontology used by the CDK and JOELib too... the manuscript is in
preparation, but the code is all online...

Please have a look at Bioclipse 2.1, install the QSAR feature from the
update site, start a new QSAR project, and look at the resulting ML...

http://pele.farmbio.uu.se/bioclipse-devel/

I have lost track of this thread a bit, but note at least that Ola's ML
does not do models yet.

BTW, I much prefer R over Weka for any machine learning.

Egon


-- 
Post-doc @ Uppsala University
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers


