[OTDev] Descriptor Calculation Services
Christoph Helma helma at in-silico.de
Tue Jan 12 18:16:52 CET 2010

Excerpts from Tobias Girschick's message of Mon Jan 11 10:05:23 +0100 2010:
> Hi Pantelis, All,
>
> On Thu, 2010-01-07 at 18:49 +0200, chung wrote:
> > Hi Tobias, All,
> > While trying to train a model, the service is possible to "find" some
> > missing values for a specific feature.
>
> To obviate misunderstandings: You want to train a model with a data set
> that contains missing values for a specific feature and the service
> detects the missing features before training, right?
>
> > Is there a way to use your
> > services to obtain the missing value?
>
> If the feature with the missing values was produced from our descriptor
> calculation service, yes. But you would have to build a dataset with all
> the compounds where the value is missing and submit it to the descriptor
> calculation service.
> The question is, if a model training service should automatically
> provide the functionality of "filling up" missing values. I think this
> is something that should be done in the preprocessing phase - in a
> preprocessing/data cleaning service.

I would be extremely careful with the addition of missing features for
several reasons:

- Sometimes there are good physical/chemical/biological/algorithmic
  reasons why features are missing. Calculating these features might
  give you a number, but it is very likely that it is meaningless.
- A sameAs relationship does not guarantee that (calculated and
  measured) feature values are comparable (very frequently they are
  not).
- Even if you find a measured value for the same feature, there is a
  good chance that it has been obtained by a different protocol and that
  it is not comparable with the other feature values.

I would suggest adding features only

- if you have a clear understanding of why a feature is missing
- if you can prove that the feature calculation algorithm creates values
  that are comparable with the original measurements (or calculation
  algorithm)
- if you clearly document how and why the original dataset has been
  modified

Best regards,
Christoph
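
The last suggestion above (document how and why the dataset was modified) could look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the in-memory data layout, the names `FeatureValue`, `compounds_missing` and `fill_documented`, the `origin` and `reason` fields, and the example values. None of it is part of an existing OpenTox service or API. The sketch first collects the compounds that lack a value (the subset one would submit to a descriptor calculation service), then fills only those gaps in a copy of the dataset and records every change.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FeatureValue:
    value: Optional[float]      # None marks a missing value
    origin: str = "original"    # hypothetical provenance label


def compounds_missing(dataset, feature):
    """Return the compounds that have no value for `feature`."""
    return [
        compound
        for compound, features in dataset.items()
        if features.get(feature) is None or features[feature].value is None
    ]


def fill_documented(dataset, feature, calculated, origin, reason):
    """Return a filled copy of `dataset` plus a log of every change.

    Only missing values are filled; existing values are never overwritten,
    and the input dataset is left unmodified.
    """
    filled = {compound: dict(features) for compound, features in dataset.items()}
    log = []
    for compound in compounds_missing(dataset, feature):
        if compound not in calculated:
            continue  # no calculated value available: keep the gap
        filled[compound][feature] = FeatureValue(calculated[compound], origin)
        log.append({"compound": compound, "feature": feature,
                    "origin": origin, "reason": reason})
    return filled, log


# Illustrative placeholder data, not real measurements.
dataset = {
    "compound_a": {"feature_x": FeatureValue(1.2)},
    "compound_b": {"feature_x": FeatureValue(None)},
    "compound_c": {},
}

# The subset that would be submitted for descriptor calculation.
missing = compounds_missing(dataset, "feature_x")

# Stand-in for values returned by a calculation service (made up here).
calculated = {"compound_b": 0.7}

filled, log = fill_documented(
    dataset, "feature_x", calculated,
    origin="calculated", reason="value absent in source dataset",
)
```

Keeping an `origin` label on every value and a separate change log means calculated and original values stay distinguishable later. Whether the two are actually comparable remains a question the sketch cannot answer; it only makes the modification visible.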