[OTDev] ARFF mime type

Nina Jeliazkova nina at acad.bg
Tue Sep 29 15:42:19 CEST 2009


Hello Christoph,

Christoph Helma wrote:
> Excerpts from Nina Jeliazkova's message of Mon Sep 28 14:48:33 +0200 2009:
>   
>>> I think at the present stage we should focus on finalizing and using our
>>> internal data exchange format (which should contain URIs, not raw data).
>>> At a later stage of the project we may cater for a better communication
>>> with the outside world, by providing import/export facilities (which
>>> may include arff, cml, sdf, ...). These conversion facilities can run as
>>> a separate webservice, which would avoid multiple implementations of the
>>> same feature in our webservices.
>>>   
>>>       
>> One would need to be able to dereference links.
>>     
>
> Yes - we have the compound and feature services for this purpose.
> Dereferencing can be done lazily only when real data is needed. I am
>   
Yes, of course, but the client still needs to know which format to
expect/request.
> presently passing only URIs (using such a format) and did not experience
> any performance problems due to dereferencing (needing a lot of dataset
> operations with my neighbour based approach).
>
>   
>> At least one standard
>> format needs to be handled by the services themselves, otherwise no
>> client or a separate service would be able to read the content
>> referenced by the links. IMHO a separate webservice for converting
>> between formats doesn't seem to me to be a RESTful approach, but I might
>> be wrong.
>>     
>
> Well, a dataset service could also do format conversions. I just want
> to avoid that every webservice has to have its own import/export
> facilities.
>
>   
You mean that even if we have multiple implementations of dataset
services, the conversion functionality should be restricted to e.g. the
dataset service?

The initial dataset proposal actually does assume conversion facilities,
simply by being able to return different formats by specifying MIME
types. For example, a dataset with format SDF is POST-ed, but then can
be retrieved with format CML (or ARFF, YAML, etc.)

Do you think there is a need for a specific "conversion service" rather
than relying on Content-Type?
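
To make the MIME-type idea concrete, here is a minimal Python sketch of
a dataset resource picking its representation from the formats the
client asks for. Everything in it is an assumption for illustration
only: the function names, the example.org URIs, and the two output
formats (a plain URI list and a small XML document) are placeholders
and not part of the dataset API. Real SDF/CML/ARFF converters would
simply be further entries in the same table.

```python
from xml.sax.saxutils import escape


def parse_accept(header):
    """Return the MIME types of an Accept header, most preferred first."""
    ranked = []
    for part in header.split(","):
        fields = part.strip().split(";")
        mime = fields[0].strip()
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                q = float(value)
        ranked.append((q, mime))
    # sorted() is stable, so entries with equal q keep their header order
    return [mime for q, mime in sorted(ranked, key=lambda pair: -pair[0])]


def to_uri_list(dataset):
    """Serialize the compound URIs one per line."""
    return "\n".join(dataset["compounds"]) + "\n"


def to_xml(dataset):
    """Serialize the compound URIs as a small XML document."""
    rows = "".join(
        f"    <compound>{escape(uri)}</compound>\n" for uri in dataset["compounds"]
    )
    return f"<dataset>\n{rows}</dataset>\n"


# Hypothetical registry: one serializer per MIME type the service supports.
SERIALIZERS = {
    "text/uri-list": to_uri_list,
    "application/xml": to_xml,
}


def represent(dataset, accept_header):
    """Return (mime, body) for the best supported type, or (None, None)."""
    for mime in parse_accept(accept_header):
        if mime in SERIALIZERS:
            return mime, SERIALIZERS[mime](dataset)
    return None, None  # a real service would answer 406 Not Acceptable here
```

With such a table the conversion stays inside the dataset service, and
adding a format means registering one more serializer rather than
running a separate conversion service.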


>>> A question to the XML guys: Is there a canonical way to represent such a
>>> datastructure in XML?
>>>
>>>   
>>>       
>> Without going into much detail, the XML below would handle your
>> structure and is pretty close to the current dataset/compounds/feature
>> proposal.
>>     
>
> Just to confirm my own understanding: There is no "standardised" XML way
> to (de)serialise common datastructures - I need to know the schema to
> reconstruct a datastructure from XML.
>
>   
Well, no, the purpose of XML is to serialize domain-specific objects,
not necessarily data structures. The aim is to provide an
implementation-neutral description of domain objects, which is quite
flexible IMHO (the same set of molecules and properties might not be
implemented as a hash table in a third-party service, yet it would be
serialized to the same data format).
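
As a rough illustration of that point, the Python sketch below reads
one XML document of the nested compound/feature shape into two
different in-memory structures: a hash table and a flat list of rows.
The example.org URIs and the function names are made up for the
example. The only thing both readers share is knowledge of the
element names, i.e. the schema.

```python
import xml.etree.ElementTree as ET

# Sample document; all URIs are placeholders.
XML = """<dataset>
  <compound>
    <link ref="http://example.org/compound/1"/>
    <feature><link ref="http://example.org/feature/a"/></feature>
    <feature><link ref="http://example.org/feature/b"/></feature>
  </compound>
  <compound>
    <link ref="http://example.org/compound/2"/>
    <feature><link ref="http://example.org/feature/a"/></feature>
  </compound>
</dataset>"""


def as_hash(xml_text):
    """Hash-table view: compound URI -> list of feature URIs."""
    result = {}
    for compound in ET.fromstring(xml_text).findall("compound"):
        # find("link") only looks at direct children, so this is the
        # compound's own link, not one nested inside a feature.
        uri = compound.find("link").get("ref")
        result[uri] = [
            feature.find("link").get("ref")
            for feature in compound.findall("feature")
        ]
    return result


def as_pairs(xml_text):
    """Table view: flat list of (compound URI, feature URI) rows."""
    return [
        (compound.find("link").get("ref"), feature.find("link").get("ref"))
        for compound in ET.fromstring(xml_text).findall("compound")
        for feature in compound.findall("feature")
    ]
```

Neither structure is dictated by the XML itself; each service is free
to pick whichever suits its implementation.
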
>> <dataset>
>>
>>     <compound>
>>         <link ref="uri"/>
>>         <feature>
>>             <link ref="uri"/>  
>>         </feature>
>>         <feature>
>>             <link ref="uri"/>  
>>         </feature>
>>     </compound>
>>     <compound>
>>         <link ref="uri"/>
>>         <feature>
>>             <link ref="uri"/>  
>>         </feature>
>>     </compound>
>>
>> </dataset>
>>
>> Note that in both your (YAML) format and the XML format above, it is
>> not clear if "feature" means feature value or feature definition (name,
>> link to ontology, etc.) and, if it is a feature value, how it is linked
>> to the feature definitions.  I would suggest not going into another
>> round of proposing formats, but first commenting on the API web pages
>> on what should be modified in the current 1.0 API.
>>
>> The current 1.0 proposal looks like
>> <dataset>
>>     <features>
>>        <feature_definition>uri</feature_definition>
>>        <feature_definition>uri</feature_definition>
>>     </features>
>>     <compound>uri</compound>
>>     <compound>uri</compound>
>> </dataset>
>>
>> and feature values can be accessed via
>> /compound/{cid}/feature_definition/{fid}  , which allows referencing
>> any feature value of any compound defined in the particular dataset.
>>
>> Could you tell me what is missing/inappropriate in the current dataset
>> API 1.0 XML?
>>     
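
For reference, a small Python sketch of the lookup described in the
quoted text: it reads a dataset in the 1.0 XML shape and lists the
path of every feature value using the
/compound/{cid}/feature_definition/{fid} pattern. Taking {cid} and
{fid} to be the last path segment of each URI is an assumption made
here for illustration, as are the example.org URIs and the function
names; the API pages do not state how the identifiers are derived.

```python
import xml.etree.ElementTree as ET

# Sample dataset in the 1.0 shape; all URIs are placeholders.
XML_1_0 = """<dataset>
    <features>
       <feature_definition>http://example.org/feature_definition/7</feature_definition>
       <feature_definition>http://example.org/feature_definition/9</feature_definition>
    </features>
    <compound>http://example.org/compound/1</compound>
    <compound>http://example.org/compound/2</compound>
</dataset>"""


def last_segment(uri):
    """Assumed convention: the identifier is the final path segment."""
    return uri.rstrip("/").rsplit("/", 1)[-1]


def feature_value_paths(xml_text):
    """Return one path per (compound, feature definition) combination."""
    root = ET.fromstring(xml_text)
    fids = [
        last_segment(element.text.strip())
        for element in root.findall("features/feature_definition")
    ]
    cids = [
        last_segment(element.text.strip())
        for element in root.findall("compound")
    ]
    return [
        f"/compound/{cid}/feature_definition/{fid}"
        for cid in cids
        for fid in fids
    ]
```

A client can then dereference only the paths it actually needs.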
>
> Sorry, I was thinking in terms of my own feature API proposal (it is now
> in the new API version on the website). Basically my main suggestion is to
> move the feature-definition part into the feature-ontology and keep only
> a very minimal feature API.
>   
I still have to read it. I guess if the feature-ontology is read/write, it
would not be much different from the initial idea of feature-definitions.

Best regards,
Nina
> Best regards,
> Christoph



