[OTDev] Descriptor Calculation Services
Nina Jeliazkova nina at acad.bg
Thu Jan 14 14:04:31 CET 2010
- Previous message: [OTDev] Descriptor Calculation Services
- Next message: [OTDev] Descriptor Calculation Services
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Christoph Helma wrote:
> Excerpts from Nina Jeliazkova's message of Wed Jan 13 09:06:37 +0100 2010:
>
>>> returns the URI of the generated cleaned-up dataset. This is the way
>>> feature selection services should work too.
>>>
>>> First of all, I think we need a cleanup service that removes all string
>>> features from the dataset
>>>
>> You can do this right now with the following steps:
>>
>> 1) Get features for a dataset via /dataset/{id}/feature or any other
>> means (e.g. looking through the entire dataset)
>> 2) Select string features (numerics are denoted, as in the latest OpenTox
>> ontology, as ot:NumericFeature)
>> 3) Form the URL for the reduced dataset as
>> /dataset/{id}?feature_uris[]=/mynumericfeature1&feature_uris[]=/mynumericfeature2&feature_uris[]=/mynumericfeature3&feature_uris[]=etc
>>
>> String feature dropping service will be just a convenient wrapper for
>> the steps above.
>>

I will use the chance to remind all developers that the dataset API 1.1 allows specifying feature URIs and compound URIs (this is an improvement over API 1.0, thanks to Christoph).

http://opentox.org/dev/apis/api-1.1/dataset

Query a dataset
GET /dataset/{id}
*compound_uris[]* and/or *feature_uris[]* to select compounds and features;

These are very flexible means to get slices of a dataset (features = columns, compounds = rows), or to merge data across different datasets, without the need to download/upload dataset content.

The above functionality is especially relevant for feature selection algorithms and data cleanup algorithms.

Would it make sense for these kinds of algorithms to specify the output of the algorithm as a set of feature URIs, instead of a dataset?

e.g. FeatureSelection algorithm : input parameter dataset_uri ; output parameter feature_uri[]

>>> and a second service that handles the missing
>>> values of the dataset substituting them with the average or the median
>>> of all other values for the same feature in the dataset.
>>>
>> This functionality is indeed missing.
>>
> What is the purpose of this, why do we need/want this functionality?
>

It refers to one of the methods for handling missing values in machine learning (there are also more complex solutions than taking an average). We might check whether we have included such methods in the list of planned ones, and with what priority.

Best regards,
Nina
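
P.S. The three steps quoted above (list the features, keep the numeric ones, form the feature_uris[] query) can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name reduced_dataset_url, the in-memory mapping of feature URI to type, and the "ot:StringFeature" label are hypothetical placeholders, and no request is actually sent to a dataset service. Only ot:NumericFeature and the feature_uris[] parameter come from the message itself.

```python
from urllib.parse import urlencode

# Type label for numeric features, as named in the message above.
NUMERIC_TYPE = "ot:NumericFeature"


def reduced_dataset_url(dataset_uri, feature_types):
    """Build the URL of a dataset slice that keeps only numeric features.

    dataset_uri   -- e.g. "/dataset/1" (hypothetical value)
    feature_types -- mapping of feature URI -> type label; assumed to have
                     been collected beforehand, e.g. from /dataset/{id}/feature
    """
    # Step 2: drop everything that is not marked as numeric.
    numeric = [uri for uri, ftype in feature_types.items()
               if ftype == NUMERIC_TYPE]
    if not numeric:
        return dataset_uri

    # Step 3: one feature_uris[] parameter per retained feature.
    # "[]", "/" and ":" are left unescaped so the result looks like the URL
    # in the message; a strict client may prefer full percent-encoding.
    query = urlencode([("feature_uris[]", uri) for uri in numeric],
                      safe="[]/:")
    return f"{dataset_uri}?{query}"


if __name__ == "__main__":
    features = {
        "/mynumericfeature1": "ot:NumericFeature",
        "/mystringfeature": "ot:StringFeature",  # placeholder label
        "/mynumericfeature2": "ot:NumericFeature",
    }
    print(reduced_dataset_url("/dataset/1", features))
```

A feature selection service that returned a list of feature URIs, as suggested above, could hand that list straight to step 3 without creating a new dataset.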