[OTDev] ARFF mime type

Tobias Girschick tobias.girschick at in.tum.de
Wed Sep 30 13:53:55 CEST 2009


Hello Nina,

On Tue, 2009-09-29 at 16:42 +0300, Nina Jeliazkova wrote:
> Hello Christoph,
> 
> Christoph Helma wrote:
> > Excerpts from Nina Jeliazkova's message of Mon Sep 28 14:48:33 +0200 2009:
> >   
> >>> I think at the present stage we should focus on finalizing and using our
> >>> internal data exchange format (which should contain URIs, not raw data).
> >>> At a later stage of the project we may cater for a better communication
> >>> with the outside world, by providing import/export facilities (which
> >>> may include arff, cml, sdf, ...). These conversion facilities can run as
> >>> a separate webservice, which would avoid multiple implementations of the
> >>> same feature in our webservices.
> >>>   
> >>>       
> >> One would need to be able to dereference links.
> >>     
> >
> > Yes - we have the compound and feature services for this purpose.
> > Dereferencing can be done lazily only when real data is needed. I am
> > presently passing only URIs (using such a format) and did not experience
> > any performance problems due to dereferencing (needing a lot of dataset
> > operations with my neighbour based approach).
> >
> Yes, of course, but still the client should be aware of what format to
> expect/request.
> >
> >   
> >> At least one standard
> >> format needs to be handled by the services themselves, otherwise no
> >> client or a separate service would be able to read the content
> >> referenced by the links.   IMHO a separate webservice for converting
> >> between formats doesn't seem like a RESTful approach to me, but I might
> >> be wrong.
> >>     
> >
> > Well, a dataset service could also do format conversions. I just want
> > to avoid that every webservice has to have its own import/export
> > facilities.
> >
> >   
> You mean even if we have multiple implementations of dataset services,
> the conversion functionality should be restricted to e.g. the dataset service?
> 
> The initial dataset proposal actually does assume conversion facilities,
> simply by being able to return different formats by specifying MIME
> types. For example, a dataset with format SDF is POST-ed, but then can
> be retrieved with format CML (or ARFF, YAML, etc.)

This is what I intended in my initial question on this thread. What I
wanted to do is specify a MIME type (which does not exist for ARFFs) and
retrieve the dataset information (that I might have POSTed earlier in
SDF to the /dataset resource). 
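
To make the request side concrete, here is a minimal Python sketch of what
I mean. It only builds the GET request and does not send it. The dataset
URI and the MIME type name text/x-arff are assumptions of mine: no MIME
type is registered for ARFF, so we would have to agree on one.

```python
from urllib.request import Request

# Assumption: ARFF has no registered MIME type, so this name is a placeholder.
ARFF_MIME = "text/x-arff"


def dataset_request(dataset_uri: str, mime_type: str) -> Request:
    """Build a GET request asking for one representation of a dataset."""
    return Request(dataset_uri, headers={"Accept": mime_type}, method="GET")


# Hypothetical dataset URI; nothing is sent until urlopen() is called on it.
req = dataset_request("http://example.org/dataset/1", ARFF_MIME)
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

The same dataset URI could then be requested with a different Accept value
to get SDF or CML instead.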

> 
> Do you think there is a need for a specific "conversion service" rather
> than relying on Content-type?

I like the Content-type approach so far.
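
A rough sketch of how a dataset service might pick the output format from
the requested MIME type, so that no separate conversion service is needed.
Everything here is hypothetical: the row layout, the function names and the
text/x-arff type are placeholders, feature names are assumed to contain no
spaces or commas, and a real service would also have to handle Accept
headers with several types and q-values.

```python
from typing import Callable, Dict, List

# Hypothetical internal representation: one dict per compound,
# mapping feature name -> numeric value.
Rows = List[Dict[str, float]]


def to_arff(name: str, rows: Rows) -> str:
    """Serialise numeric rows as a minimal ARFF document."""
    features = sorted({key for row in rows for key in row})
    lines = [f"@relation {name}", ""]
    lines += [f"@attribute {feature} numeric" for feature in features]
    lines += ["", "@data"]
    for row in rows:
        # ARFF marks a missing value with "?".
        lines.append(",".join(str(row.get(f, "?")) for f in features))
    return "\n".join(lines) + "\n"


def to_csv(name: str, rows: Rows) -> str:
    """Serialise the same rows as CSV; missing values become empty cells."""
    features = sorted({key for row in rows for key in row})
    lines = [",".join(features)]
    for row in rows:
        lines.append(",".join(str(row.get(f, "")) for f in features))
    return "\n".join(lines) + "\n"


# One registry per dataset service; adding a format means adding one entry.
SERIALIZERS: Dict[str, Callable[[str, Rows], str]] = {
    "text/x-arff": to_arff,  # placeholder type, not a registered one
    "text/csv": to_csv,
}


def render(name: str, rows: Rows, accept: str) -> str:
    """Return the dataset in the requested format (exact match only)."""
    try:
        serializer = SERIALIZERS[accept.strip().lower()]
    except KeyError:
        # A web framework would turn this into "406 Not Acceptable".
        raise LookupError(f"unsupported MIME type: {accept}") from None
    return serializer(name, rows)


rows = [{"logP": 1.5, "mw": 78.11}, {"logP": 0.2}]
print(render("demo", rows, "text/x-arff"))
```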

> 
> 
> >>> A question to the XML guys: Is there a canonical way to represent such a
> >>> datastructure in XML?
> >>>
> >>>   
> >>>       
> >> Without going into much detail, the XML below would handle your
> >> structure and is pretty close to the current dataset/compounds/feature
> >> proposal.  
> >>     
> >
> > Just to confirm my own understanding: There is no "standardised" XML way
> > to (de)serialise common data structures - I need to know the schema to
> > reconstruct a data structure from XML.
> >
> >   
> Well, no, the purpose of XML is to serialize domain specific objects,
> not necessarily data structures. The aim is to provide an implementation
> neutral description of domain objects, which is quite flexible IMHO (the
> same set of molecules and properties might not be implemented as a hash
> table in a third party service, yet serialized to the same data format).
> >> <dataset>
> >>
> >>     <compound>
> >>         <link ref="uri"/>
> >>         <feature>
> >>             <link ref="uri"/>  
> >>         </feature>
> >>         <feature>
> >>             <link ref="uri"/>  
> >>         </feature>
> >>     </compound>
> >>     <compound>
> >>         <link ref="uri"/>
> >>         <feature>
> >>             <link ref="uri"/>  
> >>         </feature>
> >>     </compound>
> >>
> >> </dataset>
> >>
> >> Note that in both your YAML format and the XML above, it is not clear
> >> whether "feature" means a feature value or a feature definition (name,
> >> link to ontology, etc.) and, if a feature value, how it is linked to the
> >> feature definitions.  I would suggest not going into another round of
> >> proposing formats, but first commenting on the API web pages about what
> >> should be modified in the current 1.0 API.
> >>
> >> The current 1.0 proposal looks like
> >> <dataset>
> >>     <features>
> >>        <feature_definition>uri</feature_definition>
> >>        <feature_definition>uri</feature_definition>
> >>     </features>
> >>     <compound>uri</compound>
> >>     <compound>uri</compound>
> >> </dataset>
> >>
> >> and feature values can be accessed by 
> >> /compound/{cid}/feature_definition/{fid}  , which allows referencing
> >> any feature value of any compound defined in the particular dataset.
> >>
> >> Could you tell me what is missing/inappropriate in the current dataset
> >> API 1.0 XML?
> >>     
> >
> > Sorry, I was thinking in terms of my own feature API proposal (it is now
> > in the new API version on the website). Basically my main suggestion is to
> > move the feature-definition part into the feature-ontology and keep only
> > a very minimal feature API.
> >   
> I still have to read it. I guess if feature-ontology is read/write it
> would not be much different from the initial idea on feature-definitions.
> 
> Best regards,
> Nina
> > Best regards,
> > Christoph
> > _______________________________________________
> > Development mailing list
> > Development at opentox.org
> > http://www.opentox.org/mailman/listinfo/development
> >   
> 
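
For reference, a small sketch of how a client might read the 1.0 dataset
XML quoted above and build the feature value URIs from the
/compound/{cid}/feature_definition/{fid} pattern. The example.org URIs are
invented, and I am assuming that {cid} and {fid} are the last path segments
of the compound and feature definition URIs, which the proposal does not
state explicitly.

```python
import xml.etree.ElementTree as ET
from typing import Iterator

# Dataset in the 1.0 layout quoted above; the URIs are made up.
DATASET_XML = """
<dataset>
    <features>
        <feature_definition>http://example.org/feature_definition/7</feature_definition>
        <feature_definition>http://example.org/feature_definition/9</feature_definition>
    </features>
    <compound>http://example.org/compound/1</compound>
    <compound>http://example.org/compound/2</compound>
</dataset>
"""


def feature_value_uris(xml_text: str) -> Iterator[str]:
    """Yield one feature value URI per (compound, feature definition) pair."""
    root = ET.fromstring(xml_text.strip())
    # Assumption: {fid} is the last path segment of the definition URI.
    feature_ids = [
        element.text.strip().rstrip("/").rsplit("/", 1)[-1]
        for element in root.findall("./features/feature_definition")
    ]
    for compound in root.findall("./compound"):
        # Assumption: the compound URI already ends in /compound/{cid}.
        base = compound.text.strip().rstrip("/")
        for fid in feature_ids:
            yield f"{base}/feature_definition/{fid}"


for uri in feature_value_uris(DATASET_XML):
    print(uri)
```

Nothing is dereferenced here; the URIs are only constructed, so the values
can still be fetched lazily when they are needed.
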
-- 
Dipl.-Bioinf. Tobias Girschick

Technische Universität München
Institut für Informatik
Lehrstuhl I12 - Bioinformatik
Bolzmannstr. 3
85748 Garching b. München, Germany

Room: MI 01.09.042
Phone: +49 (89) 289-18002
Email: tobias.girschick at in.tum.de
Web: http://wwwkramer.in.tum.de/people/girschic



