[OTDev] Validation: Efficiency
Christoph Helma helma at in-silico.ch
Fri Feb 25 17:16:05 CET 2011
- Previous message: [OTDev] Receiving task
- Next message: [OTDev] Validation: Efficiency
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
> Nina Jeliazkova wrote on 02/25/2011 01:32 PM:
>
> > On 25 February 2011 14:26, Martin Guetlein
> > <martin.guetlein at googlemail.com> wrote:
> >
> > * Still 10 models would be created (and 10 validations, but I could
> > try to solve this internally), so we would not end up with 1 policy
> > for a crossvalidation.
>
> Unless predictions are stored in the same dataset.
>
> It sounds feasible to me. What do you think, Christoph?

For efficiency reasons (and implementation simplicity) I prefer to keep
datasets in small and manageable chunks. I am quite convinced that
aggregating everything in a single dataset will not scale well. Let's
assume a larger dataset with several 1000 compounds and several
1000-10000 class-sensitive descriptors. Adding features for each
validation fold would increase the dataset 11 times, and at such a size
I assume that all search/subset operations will be extremely slow (a
rough estimate follows below). I do not even dare to think about
serialising such a monster to RDF/XML.

@Martin: Would it help with AA to have "sets of datasets" accessible
through URIs like /dataset/{set_id}/{dataset_id} (sketched below)?

Best regards,
Christoph
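To make the size concern concrete, here is a rough back-of-envelope
sketch in Python. The per-value byte costs (about 8 bytes for a numeric
value in memory, about 200 bytes for the same value serialised as
RDF/XML) are assumptions for illustration only, not measurements from
any OpenTox service:

    # Back-of-envelope estimate for the dataset sizes discussed above.
    # Byte costs per value are illustrative assumptions, not measurements.
    compounds = 1_000        # "several 1000 compounds"
    descriptors = 10_000     # upper end of "1000-10000 class-sensitive descriptors"
    folds = 10               # 10-fold crossvalidation

    base = compounds * descriptors       # values in the original dataset
    aggregated = base * (1 + folds)      # original + one feature set per fold = 11x

    print(f"original dataset:   {base:,} values")
    print(f"aggregated dataset: {aggregated:,} values")
    print(f"in memory  (~8 B/value):   {aggregated * 8 / 1e9:.1f} GB")
    print(f"as RDF/XML (~200 B/value): {aggregated * 200 / 1e9:.1f} GB")

Even with generous rounding this puts the aggregated dataset around
0.9 GB in memory and in the tens of gigabytes as RDF/XML, which is the
scaling problem the mail argues against.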
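And a minimal sketch of the proposed "sets of datasets" URI layout; the
regular expression and all identifiers (the set id "cv42", the fold
names) are invented for illustration and are not part of the OpenTox
API:

    import re

    # Hypothetical sketch of the /dataset/{set_id}/{dataset_id} layout.
    SET_MEMBER = re.compile(r"^/dataset/(?P<set_id>[^/]+)/(?P<dataset_id>[^/]+)$")

    def resolve(path: str):
        """Split a set-member URI into its set_id and dataset_id parts."""
        match = SET_MEMBER.match(path)
        return match.groupdict() if match else None

    # All 10 fold datasets of one crossvalidation share a common prefix,
    # so a single AA policy on the /dataset/cv42/ prefix could cover the
    # whole set while each fold stays a small, separately addressable dataset.
    for fold in range(1, 4):
        print(resolve(f"/dataset/cv42/fold{fold}"))
    # {'set_id': 'cv42', 'dataset_id': 'fold1'} ...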