[OTDev] Missing values [was Re: DataSet]
Christoph Helma helma at in-silico.de
Wed Oct 7 09:40:50 CEST 2009
> > IMO, a dataset with missing values is a valid one. There exist algorithms
> > in machine learning that can deal with missing values, usually
> > preprocessing ones (there are Weka implementations as well).
> > In any case, one might need to create a dataset without missing values
> > from a dataset with missing values (by ignoring empty entries or
> > applying something else). I am not sure if there should be an API on
> > dataset level, on algorithms level, or model level. As I understand,
> > Christoph is in favour of a dataset level API - am I right?

I do not think that we need API modifications to deal with missing values.
The whole problem can be solved

- by choosing an appropriate dataset representation (see below)
- by algorithm/model developers: they have to find a way to deal with
  missing values (calculate, ignore, ...)

> I like the idea of having missing values represented within the dataset.
>
> One thing that would be useful, would be to have consistent notation to
> indicate a missing value. Something like 'NA' etc

I disagree. My impression is that the whole concept of missing values
originates from the fact that we (and a lot of software) are trained to
think in terms of tables. Having a fixed number of columns of course
requires a method to indicate missing values. As soon as we represent a
dataset differently, e.g. like

compound1_uri:
  - feature1_uri
  - feature2_uri
  ...
compound2_uri:
  - feature1_uri
  - feature3_uri
  ...
...

or in XML

<dataset>
  <compound>
    <link ref="uri"/>
    <feature>
      <link ref="uri"/>
    </feature>
    <feature>
      <link ref="uri"/>
    </feature>
  </compound>
  <compound>
    <link ref="uri"/>
    <feature>
      <link ref="uri"/>
    </feature>
  </compound>
</dataset>

we do not have to indicate missing features - they are just not there
(and it is up to the model developer to deal with this situation).

Best regards,
Christoph
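
The per-compound representation described above could be sketched in Python roughly as follows. This is only an illustration under assumptions: the URIs are the placeholders from the message, and the names has_feature and to_table are hypothetical helpers, not part of any existing API. The sketch stores just the features that are present, and shows that a fixed-column table with a missing-value marker can still be derived by an algorithm or model developer who needs one.

```python
# Sparse dataset: each compound lists only the features it actually has.
# All URIs are placeholders taken from the message, not real resources.
dataset = {
    "compound1_uri": {"feature1_uri", "feature2_uri"},
    "compound2_uri": {"feature1_uri", "feature3_uri"},
}


def has_feature(data, compound, feature):
    """Return True if the compound lists the feature, False otherwise."""
    return feature in data.get(compound, set())


def to_table(data, missing=None):
    """Derive a fixed-column table (hypothetical helper).

    The missing-value marker only appears here, at the point where a
    table is requested; it is never stored in the dataset itself.
    """
    columns = sorted({f for features in data.values() for f in features})
    rows = {
        compound: [True if f in features else missing for f in columns]
        for compound, features in data.items()
    }
    return columns, rows


columns, rows = to_table(dataset)
```

Whether an absent feature is ignored, calculated, or mapped to a marker such as 'NA' stays a decision of the code that consumes the dataset, for example by calling to_table(dataset, missing="NA").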