[OTDev] Descriptor Calculation Services
Christoph Helma helma at in-silico.de
Thu Jan 14 12:56:41 CET 2010
- Previous message: [OTDev] Descriptor Calculation Services
- Next message: [OTDev] Descriptor Calculation Services
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Excerpts from Nina Jeliazkova's message of Wed Jan 13 09:06:37 +0100 2010:
> > returns the URI of the generated cleaned-up dataset. This is the way
> > feature selection services should work too.
> >
> > First of all, I think we need a cleanup service that removes all string
> > features from the dataset
>
> You can do this right now with the following steps:
>
> 1) Get features for a dataset via /dataset/{id}/feature or any other
> means (e.g. looking through the entire dataset)
> 2) Select string features (numerics are denoted in the latest OpenTox
> ontology as ot:NumericFeature)
> 3) Form the URL for the reduced dataset as
> /dataset/{id}?feature_uris[]=/mynumericfeature1&feature_uris[]=/mynumericfeature2&feature_uris[]=/mynumericfeature3&feature_uris[]=etc
>
> A string feature dropping service will be just a convenient wrapper for
> the steps above.
>
> > and a second service that handles the missing
> > values of the dataset, substituting them with the average or the median
> > of all other values for the same feature in the dataset.
> This functionality is indeed missing.

What is the purpose of this? Why do we need/want this functionality?

> >>> I would be extremely careful with the addition of missing features for
> >>> several reasons:
> >>>
> >>> - Sometimes there are good physical/chemical/biological/algorithmic reasons why
> >>> features are missing - calculating these features might give
> >>> you a number, but it is very likely that it is meaningless.
> >>>
> >>>
> >> Agree.
> >>
> >
> > Yes, sometimes indeed. What about all the other times?
> It might be an interesting topic to think about how we distinguish the two
> cases :)

A descriptor calculation service can write this information to OWL-DL (see Nina's proposal for calculation errors), but I am not very optimistic about getting the same information reliably for measured values (e.g. if a compound has poor solubility, high volatility, ... most experimenters would rather enter nothing than state their difficulties with a compound).

> > For instance, how
> > useful is a dataset which contains a set of compounds and values for one
> > and only one feature (the target) without a service that calculates the
> > values for the other features?
>
> Information about the descriptor calculation services used to generate
> existing values should be available for each Feature via the ot:hasSource
> property. It is then straightforward to use the URL of the service to
> launch a remote or local calculation.

Agreed.

> > I believe that there are lots of reasons
> > to have a service which searches for missing values in the dataset and
> > tries to calculate them; after all, that service will not be bundled with
> > the model training and its use would be optional.
> >
> >
> Why not just use descriptor calculation services, as they currently
> exist? It is an implementation detail whether the service will prefer to
> calculate existing values once again or only perform calculations where
> these are not available (I would actually prefer the latter as the default
> implementation, purely for performance reasons).

Agreed. I do this e.g. for Tox predictions: if the service finds a measured value, it returns the measured value; otherwise a prediction is calculated.

Thanks Nina for your detailed explanations.

Best regards,
Christoph
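Nina's three steps for dropping string features can be sketched in Python as follows. This is a minimal illustration, not an OpenTox client: it assumes the feature list from /dataset/{id}/feature has already been parsed into a hypothetical dict mapping each feature URI to its set of type names, and it only builds the reduced-dataset URL. Note that `urlencode` percent-encodes the brackets and slashes, which a server should decode back to the same `feature_uris[]` parameters.

```python
from urllib.parse import urlencode

# OpenTox ontology class for numeric features, as mentioned in the thread.
NUMERIC_FEATURE = "ot:NumericFeature"


def numeric_feature_uris(feature_types):
    """Return the feature URIs typed as ot:NumericFeature, in input order.

    feature_types: hypothetical dict {feature_uri: set of type names},
    assumed to be parsed beforehand from /dataset/{id}/feature.
    """
    return [uri for uri, types in feature_types.items() if NUMERIC_FEATURE in types]


def reduced_dataset_url(dataset_url, feature_uris):
    """Build {dataset_url}?feature_uris[]=...&feature_uris[]=... (step 3)."""
    query = urlencode([("feature_uris[]", uri) for uri in feature_uris])
    return f"{dataset_url}?{query}"


# Illustrative input: /feature/2 is not numeric, so it is dropped.
features = {
    "/feature/1": {"ot:NumericFeature"},
    "/feature/2": set(),
    "/feature/3": {"ot:NumericFeature"},
}
print(reduced_dataset_url("http://example.org/dataset/42", numeric_feature_uris(features)))
```

As Nina notes, a dedicated "string feature dropping" service would only wrap this kind of filtering plus URL construction.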
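The proposed (and, per Nina, still missing) cleanup service would substitute missing values with a summary statistic of the same feature. A minimal sketch of median substitution, assuming a dataset represented as a hypothetical list of per-compound dicts where a missing value is either an absent key or `None`:

```python
from statistics import median


def impute_median(rows, feature):
    """Fill missing values of one feature with the median of its observed values.

    rows: hypothetical list of dicts {feature_name: value}; a value counts as
    missing if the key is absent or maps to None. Returns new dicts and leaves
    the input unchanged. If no value is observed, rows are copied as-is.
    """
    observed = [r[feature] for r in rows if r.get(feature) is not None]
    if not observed:
        return [dict(r) for r in rows]  # nothing to compute a median from
    fill = median(observed)
    return [
        {**r, feature: r[feature] if r.get(feature) is not None else fill}
        for r in rows
    ]
```

Christoph's caution applies directly: after imputation, a substituted value is indistinguishable from a measured or calculated one, even where the value was missing for a good physical, chemical, or biological reason.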
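Nina's preferred default (only calculate values that are not already available) and Christoph's Tox prediction behaviour (return a measured value if one exists, otherwise predict) follow the same pattern. A minimal sketch, where `calculate` is a hypothetical callable standing in for a call to the descriptor or prediction service whose URL would be found via ot:hasSource; no HTTP is performed here:

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


def fill_missing(
    compounds: Iterable[str],
    stored: Dict[str, Optional[float]],
    calculate: Callable[[str], float],
) -> Dict[str, Tuple[float, str]]:
    """Return {compound: (value, origin)}, origin being "stored" or "calculated".

    Stored values (measured or previously calculated) are reused as-is;
    `calculate` is only called for compounds without a stored value.
    """
    result: Dict[str, Tuple[float, str]] = {}
    for compound in compounds:
        value = stored.get(compound)
        if value is not None:
            result[compound] = (value, "stored")
        else:
            result[compound] = (calculate(compound), "calculated")
    return result
```

Recording the origin alongside each value keeps measured and calculated values distinguishable, which is the distinction the thread identifies as hard to recover later.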
More information about the Development mailing list