[OTDev] Missing values [was Re: DataSet]
Christoph Helma helma at in-silico.de
Tue Oct 6 16:23:51 CEST 2009
Excerpts from Nina Jeliazkova's message of Mon Oct 05 08:40:08 +0200 2009:
> Dear Pantelis,
>
> chung wrote:
> > Hi Nina,
> >
> > On Fri, 2009-10-02 at 17:43 +0300, Nina Jeliazkova wrote:
> >
> >> Hi Pantelis,
> >>
> >> chung wrote:
> >>
> >>> Hi Nina,
> >>> Once we define the RESTful operation in the new version of the API, we
> >>> will have to start developing. Yet from the API 1.0, models are trained
> >>> provided a dataset URI, so we need such a dataset to do some experiments
> >>> (build an Instances object, train a model, perform some predictions
> >>> using the trained model). Is it possible for you to provide us a dataset
> >>> URI?
> >>>
> >> I am not sure what is the question - can you please clarify?
> >>
> >
> > I mean that we need a dataset for which all RESTful operations specified
> > in API 1.0 or API 1.1 are implemented and for every operation a status
> > code 200 is normally expected. We need a dataset, say:
> >
> > http://someserver.com/dataset/123 (i)
> >
> > such that, for any compound in that, e.g.
> >
> > http://someserver.com/compound/55 (ii)
> >
> > and every feature definition in it:
> >
> > http://someserver.com/feature_definition/10 (iii)
> >
> > the following URI returns the value of the feature definition (iii) for
> > the compound (ii):
> >
> > http://someserver.com/feature/compound/55/feature_definition/10
> >
> > and will not return "NULL" or an error code (e.g. 404).
> > We need that dataset to develop model training web services. The input
> > parameters to our services will be the dataset uri and probably a URI
> > for the target feature. Will it be possible for you to provide us a
> > complete dataset object with all RESTful operations implemented? I mean,
> > we don't need a huge one, 20 compounds and some feature definitions will
> > be ok, but we need every compound/feature_definition pair to correspond
> > to a feature value!
>
> I understand your reasoning, but please note in a generic setup some
> feature values might be missing and it is not the dataset provider's job
> to fix that. Handling missing values is usually done by the modeller;
> we still need to think about how to cast this process into the REST scheme.
>
> For example in the Toxcast dataset there are plenty of entries with
> missing values; one might address the issue by creating a "derived"
> dataset that ignores the entries without values, but one could also
> replace missing values with e.g. averages or use more complicated
> methods. I am copying this discussion to the development list as well,
> because it is a generic question - should the OpenTox framework provide
> an API to handle missing values, where is the best place for this
> (preprocessing algorithms?), what API do we need?

My first impression is that we do not need a separate API (or a
convention) for missing values - it should be the developer's task to deal
with "missing values". With a clear separation between features and
feature annotations, we also would not run into the problem that values
for feature definitions are missing: a dataset representation would
contain only the features that are available, not feature definitions
with possibly empty values.

Best regards,
Christoph
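
To make the two options from this thread concrete, here is a minimal Python sketch, not part of any OpenTox API. It assumes a dataset is held in memory as a sparse mapping from compound URI to the feature values that actually exist, so a missing value is simply an absent key. The compound URIs 56 and 57, the feature definition 11, the numeric values, and the helper names `drop_incomplete` and `impute_mean` are all hypothetical and chosen only for illustration.

```python
from statistics import mean

# Hypothetical sparse dataset: compound URI -> {feature definition URI: value}.
# Only available features are stored; there are no empty placeholders.
FD10 = "http://someserver.com/feature_definition/10"
FD11 = "http://someserver.com/feature_definition/11"  # assumed second feature
dataset = {
    "http://someserver.com/compound/55": {FD10: 1.0, FD11: 2.0},
    "http://someserver.com/compound/56": {FD10: 3.0},  # FD11 missing
    "http://someserver.com/compound/57": {FD11: 4.0},  # FD10 missing
}


def drop_incomplete(data, features):
    """Derived dataset that keeps only compounds with a value for every feature."""
    return {
        compound: dict(values)
        for compound, values in data.items()
        if all(f in values for f in features)
    }


def impute_mean(data, features):
    """Derived dataset where each missing value is replaced by the feature mean."""
    means = {}
    for f in features:
        observed = [values[f] for values in data.values() if f in values]
        if observed:  # a feature with no observations stays missing
            means[f] = mean(observed)
    return {
        compound: {
            f: values.get(f, means.get(f))
            for f in features
            if f in values or f in means
        }
        for compound, values in data.items()
    }


complete_only = drop_incomplete(dataset, [FD10, FD11])
imputed = impute_mean(dataset, [FD10, FD11])
```

Both helpers return a new "derived" dataset and leave the original untouched, which matches the idea that the modeller, not the dataset provider, decides how missing values are treated.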