[OTDev] Validation: Efficiency
Andreas Maunz andreas at maunz.de
Fri Feb 25 14:12:20 CET 2011

Nina Jeliazkova wrote on 02/25/2011 12:53 PM:
> Andreas,
>
> On 25 February 2011 13:28, Andreas Maunz <andreas at maunz.de> wrote:
>
>     Nina,
>
>     you are right (I think it still is the case that datasets are
>     redundant).
>     However, with different model parameters, which will probably be
>     used a lot in validation, new datasets will be created.
>     I think it would be definitely necessary to not store data
>     redundantly (as you indicated), but that might be only part of the
>     solution.
>     So it may still be necessary to compress the amount of policies needed.
>
>
> Well, thinking further
>
> 1) I would implement validation splits (at least at our services) as
> logical splits of the same dataset, assigning some tags, similar to
> what is in the mutagenicity Benchmark dataset (look for column "Set"
> http://apps.ideaconsult.net:8080/ambit2/feature/28956 )
>
> http://apps.ideaconsult.net:8080/ambit2/dataset/2344?max=100
>
> and introduce searching similar to the queries below (restricted to the
> property in question)
>
> Training set
> http://apps.ideaconsult.net:8080/ambit2/dataset/2344?search=TRAIN
>
> Crossvalidation sets
> http://apps.ideaconsult.net:8080/ambit2/dataset/2344?search=CV1
> http://apps.ideaconsult.net:8080/ambit2/dataset/2344?search=CV2
> http://apps.ideaconsult.net:8080/ambit2/dataset/2344?search=CV3
> ...
>
>
> Thus, everything is in the original dataset (or a single copy of it on
> another dataset service) and no need of additional policies.
>
>
> Different features, calculated during validation run would be specified
> via feature_uris[] parameter on the same dataset URI.
>
> http://apps.ideaconsult.net:8080/ambit2/dataset/2344?search=CV3?feature_uris[]=....

This approach is optimal in terms of avoiding redundancy. It imposes
structure without adding more than the minimum required information,
specifically without being redundant.
I opt for us to definitely (try to) go a similar way.

Andreas
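
For illustration, the logical-split scheme Nina describes could be sketched from the client side roughly as below. This is only a sketch under assumptions: the helper names `split_uri` and `crossvalidation_uris` are hypothetical, the tags TRAIN and CV1..CVn are taken from the example URLs in the quoted message, and the parameters are joined with "&" (the usual query-string convention) rather than the second "?" written in the example. Whether a given dataset service accepts exactly this form is not confirmed here.

```python
from urllib.parse import urlencode

# Dataset URI taken from the quoted message; every split is a query on it.
DATASET_URI = "http://apps.ideaconsult.net:8080/ambit2/dataset/2344"


def split_uri(dataset_uri, tag, feature_uris=()):
    """Build the URI of one logical split (hypothetical helper).

    tag          -- split tag to search for, e.g. "TRAIN" or "CV1"
    feature_uris -- optional feature URIs, passed as repeated
                    feature_uris[] parameters on the same dataset URI
    """
    params = [("search", tag)]
    params += [("feature_uris[]", uri) for uri in feature_uris]
    # safe="[]" keeps the brackets of "feature_uris[]" unescaped;
    # the feature URI values themselves are percent-encoded.
    return dataset_uri + "?" + urlencode(params, safe="[]")


def crossvalidation_uris(dataset_uri, folds):
    """Return the URIs of the folds CV1..CVn (assumed tag naming)."""
    return [split_uri(dataset_uri, f"CV{i}") for i in range(1, folds + 1)]


if __name__ == "__main__":
    print(split_uri(DATASET_URI, "TRAIN"))
    for uri in crossvalidation_uris(DATASET_URI, 3):
        print(uri)
```

No new dataset is created for any split in this sketch: each training or crossvalidation set is just a different query on the one dataset URI, which is why no additional policies would be needed.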