[OTDev] ARFF mime type

Nina Jeliazkova nina at acad.bg
Mon Sep 28 14:48:33 CEST 2009


Hi Christoph, All,

Christoph Helma wrote:
> Hi all,
>
> I think at the present stage we should focus on finalizing and using our
> internal data exchange format (which should contain URIs, not raw data).
> At a later stage of the project we may cater for a better communication
> with the outside world, by providing import/export facilities (which
> may include arff, cml, sdf, ...). These conversion facilities can run as
> a separate webservice, which would avoid multiple implementations of the
> same feature in our webservices.
>   
One would need to be able to dereference links. At least one standard
format needs to be handled by the services themselves; otherwise neither
a client nor a separate service would be able to read the content
referenced by the links. IMHO a separate webservice for converting
between formats doesn't seem like a RESTful approach, but I might
be wrong.
> For the format I would prefer YAML (lightweight, human readable, easy
> (de)serialisation of datastructures), but I think we will have to
> provide XML too.
>
> As datastructure I would suggest a hash with compound_uris as keys and
> arrays of feature_uris as values. In YAML this would look like:
>
> compound1_uri:
>   - feature1_uri
>   - feature2_uri
>   - ...
> compound2_uri:
>   - feature1_uri
>   - feature3_uri
>   - ...
> ...
>   
Again IMHO, a hash is an implementation detail, and exchange formats
should be independent of implementation details, allowing different
implementations.


> A question to the XML guys: Is there a canonical way to represent such a
> datastructure in XML?
>
>   
Without going into much detail, the XML below would handle your
structure and is pretty close to the current dataset/compounds/feature
proposal.

<dataset>

    <compound>
        <link ref="uri"/>
        <feature>
            <link ref="uri"/>  
        </feature>
        <feature>
            <link ref="uri"/>  
        </feature>
    </compound>
    <compound>
        <link ref="uri"/>
        <feature>
            <link ref="uri"/>  
        </feature>
    </compound>

</dataset>
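
To illustrate that the nested XML above carries the same information as
the compound-to-features hash, here is a minimal Python sketch that reads
it back into a mapping. The example.org URIs and the function name
compounds_to_features are hypothetical placeholders, not part of any
OpenTox API.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the URIs are placeholders.
SAMPLE = """
<dataset>
    <compound>
        <link ref="http://example.org/compound/1"/>
        <feature><link ref="http://example.org/feature/1"/></feature>
        <feature><link ref="http://example.org/feature/2"/></feature>
    </compound>
    <compound>
        <link ref="http://example.org/compound/2"/>
        <feature><link ref="http://example.org/feature/1"/></feature>
    </compound>
</dataset>
"""


def compounds_to_features(xml_text):
    """Map each compound URI to the list of feature URIs nested under it."""
    root = ET.fromstring(xml_text.strip())
    mapping = {}
    for compound in root.findall("compound"):
        # find("link") only matches direct children, so this is the
        # compound's own link, not a link nested inside a <feature>.
        compound_uri = compound.find("link").get("ref")
        mapping[compound_uri] = [
            feature.find("link").get("ref")
            for feature in compound.findall("feature")
        ]
    return mapping


if __name__ == "__main__":
    for uri, features in compounds_to_features(SAMPLE).items():
        print(uri, features)
```

A client that prefers a hash internally can build one this way, while the
exchange format itself stays independent of that choice.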

Note that in both your YAML format and the XML above, it is not clear
whether "feature" means a feature value or a feature definition (name,
link to ontology, etc.) and, if it is a feature value, how it is linked
to the feature definitions. I would suggest not going into another round
of proposing formats, but first commenting on the API web pages about
what should be modified in the current 1.0 API.

The current 1.0 proposal looks like this:
<dataset>
    <features>
       <feature_definition>uri</feature_definition>
       <feature_definition>uri</feature_definition>
    </features>
    <compound>uri</compound>
    <compound>uri</compound>
</dataset>

and feature values can be accessed via
/compound/{cid}/feature_definition/{fid}, thus allowing any feature
value of any compound defined in the particular dataset to be referenced.
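
As a rough illustration of how a client could enumerate those feature
value paths from a 1.0-style dataset document, here is a short Python
sketch. It assumes that {cid} and {fid} are the last path segments of the
compound and feature_definition URIs; that convention, the example.org
URIs, and the function names are assumptions made only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical 1.0-style dataset; the URIs are placeholders.
SAMPLE_DATASET = """
<dataset>
    <features>
       <feature_definition>http://example.org/feature_definition/10</feature_definition>
       <feature_definition>http://example.org/feature_definition/20</feature_definition>
    </features>
    <compound>http://example.org/compound/1</compound>
    <compound>http://example.org/compound/2</compound>
</dataset>
"""


def last_segment(uri):
    """Return the last path segment of a URI (assumed here to be the id)."""
    return uri.strip().rstrip("/").rsplit("/", 1)[-1]


def feature_value_paths(xml_text):
    """List /compound/{cid}/feature_definition/{fid} for every pair."""
    root = ET.fromstring(xml_text.strip())
    fids = [last_segment(e.text)
            for e in root.findall("features/feature_definition")]
    cids = [last_segment(e.text) for e in root.findall("compound")]
    # One path per (compound, feature definition) combination.
    return [f"/compound/{cid}/feature_definition/{fid}"
            for cid in cids for fid in fids]


if __name__ == "__main__":
    for path in feature_value_paths(SAMPLE_DATASET):
        print(path)
```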

Could you tell us what is missing or inappropriate in the current
dataset API 1.0 XML?


Best regards,
Nina
> Best regards,
> Christoph
> _______________________________________________
> Development mailing list
> Development at opentox.org
> http://www.opentox.org/mailman/listinfo/development
>   