[OTDev] ARFF mime type

Christoph Helma helma at in-silico.de
Tue Sep 29 14:46:03 CEST 2009


Excerpts from Nina Jeliazkova's message of Mon Sep 28 14:48:33 +0200 2009:
> > I think at the present stage we should focus on finalizing and using our
> > internal data exchange format (which should contain URIs, not raw data).
> > At a later stage of the project we may cater for a better communication
> > with the outside world, by providing import/export facilities (which
> > may include arff, cml, sdf, ...). These conversion facilities can run as
> > a separate webservice, which would avoid multiple implementations of the
> > same feature in our webservices.
> >   
> One would need to be able to dereference links.

Yes - we have the compound and feature services for this purpose.
Dereferencing can be done lazily, only when real data is needed. I am
presently passing only URIs (using such a format) and have not experienced
any performance problems due to dereferencing, even though my
neighbour-based approach needs a lot of dataset operations.
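
As a rough illustration of lazy dereferencing (a sketch only: the class
name LazyResource, the http_fetch helper and the example URI are
hypothetical and not part of any existing service), an object can carry
just the URI and fetch the representation the first time the data is
actually requested:

```python
from urllib.request import urlopen


def http_fetch(uri):
    """Fetch the representation behind a URI (assumes a plain HTTP GET)."""
    with urlopen(uri) as response:
        return response.read().decode("utf-8")


class LazyResource:
    """Holds a URI and dereferences it only when the data is first needed."""

    def __init__(self, uri, fetch=http_fetch):
        self.uri = uri            # cheap to pass around between services
        self._fetch = fetch       # injectable, so it can be replaced in tests
        self._loaded = False
        self._data = None

    @property
    def data(self):
        # Dereference on first access, then reuse the cached result.
        if not self._loaded:
            self._data = self._fetch(self.uri)
            self._loaded = True
        return self._data


if __name__ == "__main__":
    # Hypothetical URI; nothing is fetched until .data is read.
    compound = LazyResource("http://example.org/compound/1")
    print(compound.uri)
```

Passing such objects (or simply the bare URIs) between services costs
nothing until some algorithm really reads the data.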

> At least one standard
> format needs to be handled by the services themselves, otherwise no
> client or a separate service would be able to read the content
> referenced by the links. IMHO a separate webservice for converting
> between formats doesn't seem like a RESTful approach to me, but I might
> be wrong.

Well, a dataset service could also do format conversions. I just want
to avoid every webservice having to implement its own import/export
facilities.

> > A question to the XML guys: Is there a canonical way to represent such a
> > datastructure in XML?
> >
> >   
> Without going into much detail, the XML below would handle your
> structure and is pretty close to the current dataset/compounds/feature
> proposal.

Just to confirm my own understanding: There is no "standardised" XML way
to (de)serialise common data structures - I need to know the schema to
reconstruct a data structure from XML.
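
To illustrate with the XML quoted below, here is a minimal Python
sketch. It assumes the element names dataset/compound/feature/link and
the "ref" attribute from that example; the example.org URIs and the
dataset_to_dict helper are made up. The parser only returns a generic
element tree, so mapping it back to a "compound -> features" structure
requires knowing which element means what:

```python
import xml.etree.ElementTree as ET

# Same shape as the quoted example, with hypothetical URIs filled in.
XML = """<dataset>
    <compound>
        <link ref="http://example.org/compound/1"/>
        <feature><link ref="http://example.org/feature/1"/></feature>
        <feature><link ref="http://example.org/feature/2"/></feature>
    </compound>
    <compound>
        <link ref="http://example.org/compound/2"/>
        <feature><link ref="http://example.org/feature/1"/></feature>
    </compound>
</dataset>"""


def dataset_to_dict(xml_text):
    """Rebuild {compound_uri: [feature_uri, ...]} from the XML.

    Every tag and attribute name used here is schema knowledge that
    the XML itself does not convey.
    """
    root = ET.fromstring(xml_text)
    result = {}
    for compound in root.findall("compound"):
        # find() only looks at direct children, so this is the
        # compound's own link, not one nested inside a feature.
        compound_uri = compound.find("link").get("ref")
        feature_uris = [
            feature.find("link").get("ref")
            for feature in compound.findall("feature")
        ]
        result[compound_uri] = feature_uris
    return result


if __name__ == "__main__":
    print(dataset_to_dict(XML))
```

With a different schema (for instance URIs as element text instead of
"ref" attributes) the same function would have to be rewritten.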

> 
> <dataset>
> 
>     <compound>
>         <link ref="uri"/>
>         <feature>
>             <link ref="uri"/>  
>         </feature>
>         <feature>
>             <link ref="uri"/>  
>         </feature>
>     </compound>
>     <compound>
>         <link ref="uri"/>
>         <feature>
>             <link ref="uri"/>  
>         </feature>
>     </compound>
> 
> </dataset>
> 
> Note that in both your (YAML) format and the XML above, it is not clear
> whether "feature" means feature value or feature definition (name, link to
> ontology, etc.) and, if it is a feature value, how it is linked to the
> feature definitions.  I would suggest not going into another round of
> proposing formats, but first commenting on the API web pages about what
> should be modified in the current 1.0 API.
> 
> The current 1.0 proposal looks like
> <dataset>
>     <features>
>        <feature_definition>uri</feature_definition>
>        <feature_definition>uri</feature_definition>
>     </features>
>     <compound>uri</compound>
>     <compound>uri</compound>
> </dataset>
> 
> and feature values can be accessed by 
> /compound/{cid}/feature_definition/{fid}  , thus allowing to reference
> any feature value of any compound defined in the particular dataset.
> 
> Could you tell us what is missing/inappropriate in the current dataset API
> 1.0 XML?

Sorry, I was thinking in terms of my own feature API proposal (it is now
in the new API version on the website). Basically my main suggestion is to
move the feature-definition part into the feature-ontology and keep only
a very minimal feature API.

Best regards,
Christoph


