[OTDev] On the ARFF mime type
Christoph Helma helma at in-silico.de
Thu Oct 1 22:10:43 CEST 2009
- Previous message: [OTDev] On the ARFF mime type
- Next message: [OTDev] RDF as a common exchange format for OpenTox ?
Excerpts from Nina Jeliazkova's message of Thu Oct 01 19:37:50 +0200 2009:
> >> What are the advantages of having a separate service for data conversion,
> >> rather than being able to request the ARFF mime type from the Dataset resource
> >> (as Tobias suggested initially)?
> >> IMHO the latter sounds more RESTful.

I would not say that a data conversion service is un-RESTful per se:

  POST /data-conversion file=filename, Content-type:text/arff
  => returns a dataset_uri (converts the input file to our internal dataset
     representation and posts it to a dataset service)

  GET /data-conversion/{dataset_uri}, Accept:text/arff
  => returns ARFF for dataset_uri

If the same data conversion routines are required in more than one place, it would make sense to factor them out into a separate service to avoid duplication. But if we focus on exchanging dataset_uris and requesting data in our (not yet decided) canonical data exchange format, the conversion could just as well be tied to the dataset component.

I would still insist that these conversion features are only for communication with the outside world. For internal data exchange we should use a single common format. If a lot of developers need to convert it into another format, say ARFF, this would be an argument for a separate conversion service.

> What I am concerned about is how these should be used by the client
> application. Let's look at the FastTox case. The user specifies the
> dataset (by drawing compounds, searching, uploading SDF, etc.).
> Then the application shows the list of models (I am intentionally
> skipping the endpoint selection step). The user selects a few models to
> be applied to his compounds and then presses the "Predict" button.
>
> This should initiate POSTs on Models resources, with the dataset URI as a
> parameter.
> Now the Models need to dereference the dataset URI,
> transform the content into their internal format, do the calculations
> and (according to the current API) return a URI to the newly calculated
> features (prediction results). Here are the caveats:
>
> If Models expect a format X that is not supported by the Dataset,
> everything will fail, unless
> 1) There is logic in the Model that, on failure, submits the
> dataset to a transformation service. The Model would have to know where such
> a transformation service exists and hope it will do the conversion.
> 2) It is not typical for such logic to be in the Model; the other
> place (besides the dataset resource itself) is the client. That means
> the client application has to handle the case where a Model fails to apply a
> dataset because it doesn't understand the format. The Client App
> would have to find the transformation service for each Model (provided there
> are several for different formats), get the results of the conversion and
> submit them to the Models.
>
> I would prefer the case where the Dataset supports several formats; then the
> Model can first ask for its preferred format, provided it is more
> efficient for processing, and as a fallback rely on a single common
> format. The Client App then becomes quite simple :)

If we define a model API like

  POST /model/{id} dataset_id={dataset_id}
  => prediction_uri

and a dataset API

  GET /dataset/{dataset-id}
  => internal dataset representation

then the model should be able to work with the internal representation. How it achieves this goal (working with the internal representation directly, converting it internally to another format, or using a format-conversion service) is up to the developers of the model webservice. Neither the client nor the dataset service should have to know (or assume) anything about the internals of the model webservice (even if they want to make their life easier ;-)).

Best regards,
Christoph
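The interfaces proposed in this thread can be illustrated with a small in-memory Python sketch. It assumes that the dataset service stores a single internal representation (here simply a list of dicts with numeric values), that ARFF is spoken only at the data-conversion boundary, and that the model service is free to convert internally. The class names, the example.org URIs, the numeric-only ARFF subset and the toy "prediction" are all hypothetical; no HTTP is involved, and each method merely stands in for the POST or GET named in the message.

```python
from dataclasses import dataclass, field
from itertools import count


def to_arff(rows, relation="dataset"):
    """Serialize non-empty rows (dicts with identical numeric keys) as ARFF."""
    names = list(rows[0])
    lines = [f"@RELATION {relation}"]
    lines += [f"@ATTRIBUTE {name} NUMERIC" for name in names]
    lines.append("@DATA")
    lines += [",".join(str(row[n]) for n in names) for row in rows]
    return "\n".join(lines) + "\n"


def from_arff(text):
    """Parse the numeric-only ARFF subset produced by to_arff into rows."""
    names, rows, in_data = [], [], False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        upper = line.upper()
        if upper.startswith("@ATTRIBUTE"):
            names.append(line.split()[1])
        elif upper.startswith("@DATA"):
            in_data = True
        elif in_data:
            values = [float(v) for v in line.split(",")]
            rows.append(dict(zip(names, values)))
    return rows  # the @RELATION line is ignored


@dataclass
class DatasetService:
    """Hypothetical stand-in for GET /dataset/{dataset-id}."""
    base: str = "http://example.org/dataset"
    store: dict = field(default_factory=dict)
    ids: count = field(default_factory=lambda: count(1))

    def post(self, rows):
        # Store rows in the single internal representation; return a new URI.
        uri = f"{self.base}/{next(self.ids)}"
        self.store[uri] = [dict(r) for r in rows]
        return uri

    def get(self, uri):
        return self.store[uri]


@dataclass
class DataConversionService:
    """Hypothetical /data-conversion: ARFF only at the outside boundary."""
    datasets: DatasetService

    def post(self, arff_text):
        # POST /data-conversion, Content-type:text/arff -> dataset_uri
        return self.datasets.post(from_arff(arff_text))

    def get(self, dataset_uri):
        # GET /data-conversion/{dataset_uri}, Accept:text/arff -> ARFF text
        return to_arff(self.datasets.get(dataset_uri))


@dataclass
class ModelService:
    """Hypothetical POST /model/{id} dataset_id=... -> prediction_uri."""
    datasets: DatasetService

    def post(self, dataset_uri):
        rows = self.datasets.get(dataset_uri)  # always the internal format
        # Private choice of this model, invisible to clients: round-trip
        # through ARFF, as a wrapped ARFF-based tool might require.
        rows = from_arff(to_arff(rows))
        # Toy "prediction": the sum of all feature values per compound.
        return self.datasets.post([{"prediction": sum(r.values())} for r in rows])


ARFF_IN = (
    "@RELATION demo\n"
    "@ATTRIBUTE logP NUMERIC\n"
    "@ATTRIBUTE mw NUMERIC\n"
    "@DATA\n"
    "1.5,180.0\n"
    "2.0,250.0\n"
)

datasets = DatasetService()
dataset_uri = DataConversionService(datasets).post(ARFF_IN)
prediction_uri = ModelService(datasets).post(dataset_uri)
print(datasets.get(prediction_uri))  # [{'prediction': 181.5}, {'prediction': 252.0}]
```

The client only passes dataset URIs around; whether ModelService.post converts to ARFF internally, or calls a conversion service, is not visible to it or to the dataset service.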
More information about the Development mailing list