[OTDev] ARFF mime type

Nina Jeliazkova nina at acad.bg
Sat Sep 26 15:18:48 CEST 2009


Jörg,

Jörg Kurt Wegner wrote:
> Have you seen already the BlueDesc package with Molecule2ARFF output?
> http://www.ra.cs.uni-tuebingen.de/software/bluedesc/welcome_e.html
>
>   
Thanks for the pointer. I like the non-destructive way of including
molecule IDs as comments in the ARFF file. Could we use compound
URIs instead?

@DATA
% NAME OF MOLECULE 1: BindingDB_12662
50.0, 0.0, 0.0, 22.0, 0.0, 4.0, 4.0, 0.0, 7.0, 2.0, 0.0, 0.0, 0.1538, 54.0, 39.0, 26.0, 485.401, 3.0, 26.6009, 10.6366, 5.5786, 71.6461, 2.047, 37.0269, 5.6569, 21.6395, 17.0, 16.0, 12.1962, 5.3868, 211.0, 0.0, 0.0, 0.3103, 0.7135, 0.0, 0.0, 0.085, 0.1809, 3.6642, 0.2887, 1.0992, 0.1443, 1.0236, 0.0135, 0.1689, 0.0035, 24.905, 16.6132, 16.5093, 13.6322, 11.21, 8.768, 6.5839, 4.6398, 17.8379, 10.0274, 7.5832, 5.2551, 3.5409, 2.2338, 1.2768, 0.7538, 7.432, 10.7843, 14.4263, 1.9217, 2.3736, 2.502, 3847.0, 58.0, 14.2148, 7.5378, 925.0, 5.0, 50.0, 29.0, 1.0222, -0.5327, -0.0657, 0.3505, -0.3961, 45.1827, 46.1996, 74.3703, 73.9339, 71.536, 29.3641, 60.2539, 3382.5475, 58.1597, 15.0112, 3546.3866, 59.5515, 15.2497, 8114.332, 90.0796, 20.0948, 1.5611, 1.5611, 11485.7858, 8926.5776, 3405.5854, 1.2867, 3.3726, 2.6212, 9.5507, 15.4607, 1.009, 1.295, 5.8001, 19.113, 0.0494, 0.2213,  ?,  ?,  ?, 0.3216, 0.412, 0.5462, 26.2081, 143.1203, 312.8887, 0.297,  ?, 1.2798, 21.0, 9.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 132.7647, 7.5384, 116.79, 1.8941, 416.2712, 1167.5065, 37.8629, 393.5322, -1103.7309, -58.2347, 22.739, 2271.2373, 96.0976, 0.514, 1.4417, 0.0468, 0.486, -1.363, -0.0719, 337.0978, 945.4506, 30.6615, 318.6837, -893.805, -47.1586, 0.155, 0.1014, 2.8621, 1.5707, 664.7586, 145.0448, 0.8209, 0.1791, 2.325, 0.0, 3.5263605246161616
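A minimal sketch of the compound-URI variant, in Python. It assumes a hypothetical "% COMPOUND URI: <uri>" comment line before each @DATA row, modeled on the "% NAME OF MOLECULE" comment above; the marker text, the attribute names (d1, d2, d3) and the example.org URIs are placeholders, not an agreed OpenTox convention. Because ARFF readers skip lines starting with "%", the file remains valid ARFF, while a client that knows the convention can pair every row with its compound again:

```python
# Sketch: carry compound URIs in ARFF comment lines (hypothetical convention).
# ARFF readers such as Weka skip lines starting with "%", so the file stays valid ARFF.

URI_COMMENT = "% COMPOUND URI:"  # assumed marker, not an agreed OpenTox format


def write_arff(relation, attribute_names, rows):
    """rows: list of (compound_uri, values); None is written as the ARFF missing value '?'."""
    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE {name} NUMERIC" for name in attribute_names]
    lines += ["", "@DATA"]
    for uri, values in rows:
        lines.append(f"{URI_COMMENT} {uri}")
        lines.append(", ".join("?" if v is None else str(float(v)) for v in values))
    return "\n".join(lines) + "\n"


def read_rows_with_uris(arff_text):
    """Return (compound_uri, values) pairs; uri is None if no URI comment precedes a row."""
    rows, pending_uri, in_data = [], None, False
    for line in arff_text.splitlines():
        s = line.strip()
        if not s:
            continue
        if not in_data:
            # Skip the header until the @DATA line has been seen.
            in_data = s.upper() == "@DATA"
            continue
        if s.startswith(URI_COMMENT):
            pending_uri = s[len(URI_COMMENT):].strip()
        elif s.startswith("%"):
            continue  # any other comment, e.g. BlueDesc-style molecule names
        else:
            values = [None if v.strip() == "?" else float(v) for v in s.split(",")]
            rows.append((pending_uri, values))
            pending_uri = None
    return rows


if __name__ == "__main__":
    rows = [
        ("http://example.org/compound/12662", [50.0, 0.0, None]),
        ("http://example.org/compound/12663", [48.0, 1.0, 0.3216]),
    ]
    text = write_arff("descriptors", ["d1", "d2", "d3"], rows)
    print(text)
    assert read_rows_with_uris(text) == rows
```

Writing the two example rows and reading them back returns the same (URI, values) pairs; a generic ARFF loader would simply ignore the URI comments.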

Best regards,
Nina
> Here is a list of the included 'chemical encodings':
> http://www.ra.cs.uni-tuebingen.de/mitarb/hinselmann/software/Descriptors_overview.txt
>
> Joerg Kurt Wegner
> http://miningdrugs.blogspot.com/
>
>
> -----Original Message-----
> From: development-bounces at opentox.org
> [mailto:development-bounces at opentox.org] On Behalf Of Nina Jeliazkova
> Sent: Friday, 25 September 2009 17:34
> To: opentox development mailing list
> Subject: Re: [OTDev] ARFF mime type
>
> Hello Richard, Tobias, All,
>
> richard apodaca wrote:
>   
>> Hello Tobias,
>>
>> Please pardon some naive comments below - I'm new to the discussion...
>
> Good to have you on the OpenTox list; it is important for us to hear
> fresh views from outside the project (this is what "Open" is for ;)
>
> OpenTox partners, please excuse me for repeating some of my thoughts I
> have already shared during the Rome meeting.
>   
>> Is this the format you're interested in:
>>
>> http://www.cs.waikato.ac.nz/~ml/weka/arff.html
>>
>> What kind of support exists for it? If support is sparse, who benefits
>> most from exposing resources in that representation?
>
> ARFF files are very popular in machine learning, mostly because Weka is
> the de facto standard open-source software for machine learning. ARFF
> files would be a perfect choice if OpenTox's objective were a generic
> platform for machine learning.
>
> However, since the aim is predictive toxicology, with molecules as the
> objects being modeled, the situation is somewhat different. ARFF files
> have no standard support for identifying objects, let alone complex
> ones such as molecules. Even if we invent a convention, e.g. putting
> molecule identifiers of a certain type in the first column, the result
> will be an "OpenTox ARFF file" rather than a generic ARFF file.
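To make the point concrete, here is a Python sketch of such a hypothetical first-column convention: the first attribute is declared as STRING and holds the compound URI. It assumes numeric descriptors only and URIs without commas, and the attribute names and example.org URIs are placeholders. Nothing in ARFF itself marks that column as an identifier, so the reader below only works because it is told about the convention out of band:

```python
# Sketch of a hypothetical "OpenTox ARFF" convention: attribute 1 is a STRING
# column holding the compound URI. A generic ARFF consumer would treat it as an
# ordinary attribute; only a reader that knows the convention splits it off.


def split_id_column(arff_text):
    """Return (descriptor_names, [(compound_uri, values), ...])."""
    attributes, rows, in_data = [], [], False
    for line in arff_text.splitlines():
        s = line.strip()
        if not s or s.startswith("%"):
            continue  # blank lines and comments
        keyword = s.split(None, 1)[0].upper()
        if keyword == "@ATTRIBUTE":
            _, name, arff_type = s.split(None, 2)
            attributes.append((name, arff_type.upper()))
        elif keyword == "@DATA":
            in_data = True
        elif in_data:
            # Simplification: assumes the URI itself contains no comma.
            first, rest = s.split(",", 1)
            values = [None if v.strip() == "?" else float(v) for v in rest.split(",")]
            rows.append((first.strip().strip("'\""), values))
    if not attributes or attributes[0][1] != "STRING":
        raise ValueError("first attribute is not a STRING identifier column")
    return [name for name, _ in attributes[1:]], rows
```

A file that does not follow the convention is rejected, even though it may be perfectly valid ARFF.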
>   
>> After a quick peek, there doesn't seem to be anything there that can't be
>> done with good ol' XHTML and JSON. Worse, ARFF doesn't look like it supports
>> hypertext, a cornerstone of all RESTful APIs:
>>
>> http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven
>
> Exactly. This is one of my main objections to having ARFF as the
> standard format in OpenTox: there is no standard way to include
> molecule identifiers (preferably in the form of URIs) in ARFF, nor to
> refer by URI to the descriptors/endpoints used in the model.
>
> We might have it as an "export" format, in the same way we could have,
> e.g., Microsoft Excel. When defining the MIME type, it would be good to
> coordinate with the WEKA developers if possible, or at least to send a
> message to the WEKA mailing list.
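Whatever name is eventually agreed, a service can offer ARFF as one export representation of /dataset/{id} alongside its default format through ordinary HTTP content negotiation. A minimal Python sketch, assuming the provisional text/arff type discussed in this thread, with application/xml as the default and text/csv as a further alternative chosen only for illustration. It honours q-values but, unlike a complete HTTP implementation, ignores specificity between equally weighted ranges and does not let an explicit q=0 override a matching wildcard:

```python
# Content-negotiation sketch for GET /dataset/{id}. "text/arff" is the
# provisional, unregistered type from this thread; the other types and their
# order (first = default) are assumptions for illustration.
SUPPORTED = ["application/xml", "text/arff", "text/csv"]


def parse_accept(header):
    """Split an Accept header into (media_range, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        ranges.append((fields[0].lower(), q))
    return sorted(ranges, key=lambda r: -r[1])  # stable: ties keep header order


def negotiate(accept_header, supported=SUPPORTED):
    """Return the media type to send, or None (the server would answer 406)."""
    if not accept_header:
        return supported[0]  # no preference stated: use the default format
    for media_range, q in parse_accept(accept_header):
        if q <= 0:
            continue
        for media_type in supported:
            # Simplification: exact match, "*/*", or a "type/*" prefix match.
            if media_range in ("*/*", media_type) or (
                media_range.endswith("/*") and media_type.startswith(media_range[:-1])
            ):
                return media_type
    return None
```

For example, a WEKA-based client would send "Accept: text/arff", while a browser sending "*/*" would get the default XML representation.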
>
> Another point for discussion is whether a format that is tied to a
> single implementation (WEKA) is appropriate; for example, there could
> be services providing the same machine learning methods as WEKA, but
> based on R, Matlab, etc. Currently almost all services providing
> machine learning algorithms are based on WEKA, hence the preference for
> ARFF, but this might change in the future.
>
> Perhaps it would help if Tobias or other partners could explain the use
> cases that would benefit from ARFF as a communication format between
> services, rather than only as an internal format for services that are
> based on WEKA. For example, how would ARFF fit into the (very)
> simplified use case below?
>
> 1) The end user specifies molecules, e.g. in SDF format. These are
> uploaded and become available as a dataset URI.
> 2) The dataset URI is submitted to a service that calculates descriptors.
> The descriptors have to be assigned to the molecules in the dataset.
> 3) The dataset URI (molecules and descriptors) is submitted to a
> service offering a predictive model.
> 4) The model generates predictions, which need to be assigned to the
> molecules in the dataset from 1).
> 5) The results (molecules and predictions) are reported in a
> user-friendly format.
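The essential requirement in these steps is that each service's output can be assigned back to the molecules from step 1. A toy in-memory Python sketch of the data flow, in which every remote service is replaced by a hypothetical local function, the descriptor and model are made up, and the URIs are example.org placeholders; everything is keyed by compound URI:

```python
from typing import Callable, Dict, List

# compound URI -> feature name -> value (hypothetical stand-in for a dataset URI)
Dataset = Dict[str, Dict[str, float]]


def upload(compound_uris: List[str]) -> Dataset:
    """Step 1: the molecules become a dataset with no features yet."""
    return {uri: {} for uri in compound_uris}


def calculate_descriptors(ds: Dataset, calc: Callable[[str], Dict[str, float]]) -> Dataset:
    """Step 2: descriptors are attached to each molecule by its URI."""
    return {uri: {**features, **calc(uri)} for uri, features in ds.items()}


def predict(ds: Dataset, model: Callable[[Dict[str, float]], float],
            target: str = "prediction") -> Dataset:
    """Steps 3-4: the model output is assigned back to the same URIs."""
    return {uri: {**features, target: model(features)} for uri, features in ds.items()}


def report(ds: Dataset, target: str = "prediction") -> List[str]:
    """Step 5: one tab-separated line per molecule."""
    return [f"{uri}\t{features[target]:.2f}" for uri, features in sorted(ds.items())]


if __name__ == "__main__":
    uris = ["http://example.org/compound/1", "http://example.org/compound/2"]
    toy_descriptors = {uris[0]: {"logP": 1.5}, uris[1]: {"logP": 3.0}}  # made-up values
    ds = calculate_descriptors(upload(uris), toy_descriptors.__getitem__)
    ds = predict(ds, lambda f: 0.5 * f["logP"])  # toy linear model
    print("\n".join(report(ds)))
```

With a format that carries no object identifiers, steps 2 and 4 would have to rely on row order instead of URIs.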
>
>
>   
>> It's only been recently that folks have started to pay attention to
>> hypertext when developing RESTful APIs (although browsers have worked this
>> way from the beginning). A lot of the discussion is pretty abstract. For
>> some examples that apply to science, see:
>>
>> The RESTful Chemical Tracking System Series:
>> http://depth-first.com/articles/2009/08/07/the-restful-chemical-tracking-system-part-1-introduction
>>
>> The Chemcaster API:
>> http://chemcaster.com/rest
>>
>> See also the Sun Cloud API:
>> http://kenai.com/projects/suncloudapis/pages/Home
>>
>> In all three examples, you'll notice a high priority placed on crafting
>> domain-specific media types based on standard data formats like XHTML and
>> JSON.
>>
>> How does OpenTox approach this issue?
>
> Have a look at the dataset API (http://opentox.org/dev/apis/dataset): in
> the current version, the dataset XML format is essentially a set of URIs
> referring to molecules and features (descriptors, etc.).
>
> There was a proposal to include <link ref="URI"/> in every XML document
> representing an object in the OpenTox API. The current API may not do
> this yet, but it is to be updated soon, and I do agree that hyperlinks
> are essential for a fully RESTful design.
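As an illustration of what URIs in the representation buy a client, here is a Python sketch that reads a dataset document in which every compound and feature carries a <link ref="URI"/> element, following the proposal above. The element names (dataset, compound, feature, value), the value and the example.org URIs are assumptions made up for this sketch, not the actual OpenTox dataset schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical dataset document: only the <link ref="..."/> idea comes from the
# proposal; the rest of the structure and the value are invented for this sketch.
SAMPLE = """\
<dataset>
  <link ref="http://example.org/dataset/1"/>
  <compound>
    <link ref="http://example.org/compound/12662"/>
    <feature>
      <link ref="http://example.org/feature/d1"/>
      <value>0.5</value>
    </feature>
  </compound>
</dataset>
"""


def read_dataset(xml_text):
    """Map compound URI -> {feature URI: value}, using the link elements as identifiers."""
    root = ET.fromstring(xml_text)
    table = {}
    for compound in root.findall("compound"):
        compound_uri = compound.find("link").get("ref")
        table[compound_uri] = {
            feature.find("link").get("ref"): float(feature.findtext("value"))
            for feature in compound.findall("feature")
        }
    return table


def all_links(xml_text):
    """Every URI a client could follow from this representation, in document order."""
    return [link.get("ref") for link in ET.fromstring(xml_text).iter("link")]
```

Each compound and feature is identified by a dereferenceable URI rather than by its position in the document.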
>
> Best regards,
> Nina
>   
>> Best,
>> Rich
>>
>> ___________________________________
>>
>> Richard L. Apodaca
>>
>> http://depth-first.com      Blog
>> http://metamolecular.com    Company
>>
>>
>> --- On Fri, 9/25/09, Tobias Girschick <tobias.girschick at in.tum.de> wrote:
>>
>>   
>>     
>>> From: Tobias Girschick <tobias.girschick at in.tum.de>
>>> Subject: [OTDev] ARFF mime type
>>> To: development at opentox.org
>>> Date: Friday, September 25, 2009, 3:53 AM
>>> Dear all,
>>>
>>> in Rome we talked very briefly about a (currently non-existent) MIME
>>> type for ARFF files. Would it be possible to agree on something like
>>> text/arff
>>> although this type does not officially exist? Is there an alternative?
>>> This MIME type would be useful, for example, if I want to retrieve a
>>> dataset in ARFF format via GET from /dataset/{id}.
>>>
>>> Any comments?
>>>
>>> Regards,
>>> Tobias
>>> -- 
>>> Dipl.-Bioinf. Tobias Girschick
>>>
>>> Technische Universität München
>>> Institut für Informatik
>>> Lehrstuhl I12 - Bioinformatik
>>> Boltzmannstr. 3
>>> 85748 Garching b. München, Germany
>>>
>>> Room: MI 01.09.042
>>> Phone: +49 (89) 289-18002
>>> Email: tobias.girschick at in.tum.de
>>> Web: http://wwwkramer.in.tum.de/people/girschic
>>>
>>> _______________________________________________
>>> Development mailing list
>>> Development at opentox.org
>>> http://www.opentox.org/mailman/listinfo/development
>>>