[OTDev] acceptValues (again)
Christoph Helma helma at in-silico.ch
Tue May 10 18:39:08 CEST 2011
- Previous message: [OTDev] Today's presentation
- Next message: [OTDev] acceptValues (again)
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Dear All,

I have to bring up the topic of acceptValues for classification datasets once again. I will use the "stringified" version of the ISSCAN dataset (http://apps.ideaconsult.net:8080/ambit2/dataset/429390) as an example. This dataset has two nominal features:

- "Canc" (http://apps.ideaconsult.net:8080/ambit2/feature/530584) with acceptValues: "carcinogen", "noncarcinogen"
- "SAL" (http://apps.ideaconsult.net:8080/ambit2/feature/530585) with acceptValues: "ND", "equivocal", "mutagen", "nonmutagen"

The second example in particular makes it clear that acceptValues are presently a mixed bag. Applying classification algorithms without regard for the semantics of acceptValues would also create "ND" and "equivocal" predictions, which is of course nonsense.

Generally speaking, we need mechanisms to

- indicate classes that should not be used for modelling (e.g. "ND", "equivocal", "inconclusive", ...)
- distinguish between ordered classes (e.g. weak, medium, strong) and unordered classes (e.g. toxic mechanisms like narcotic, alkylating, ...)
- indicate ranks in ordered classes (or "positives" vs "negatives" in binary classifications)

This information is necessary not only for the graphical depiction of prediction results (coloring "toxic" classes in green would not be very intuitive), but also for selecting algorithms (regression can make sense for ordered classes, but not for unordered ones), for the generation of reports, and for validation (how can we determine sensitivity/specificity if we do not know which classes are positive and which are negative?).

I am aware that adding such information will require (documented) human intervention (WP3?), but I think it is worth the additional effort. I also think that such information should be added at the source (i.e. in the datasets) and not through guesswork/hacks at the GUI/report/validation level. I would also like to retain the original information (e.g. equivocal classifications) in the dataset, because it can be useful for exploration and comparison purposes.
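
To make the three requirements more concrete, here is a minimal Python sketch of how such annotations might sit next to acceptValues. Everything in it is an assumption made for illustration: the NominalFeature class, its field names (excluded, ordered, positive), the "potency" feature and the choice of "mutagen" as the positive class are hypothetical and not part of any existing dataset representation. The full list of acceptValues is kept, so the original information is retained, and the excluded classes are only filtered out for modelling and validation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class NominalFeature:
    """Hypothetical annotation of a nominal feature (illustration only)."""
    name: str
    accept_values: Tuple[str, ...]        # all values, kept as in the source
    excluded: frozenset = frozenset()     # classes not to be used for modelling
    ordered: bool = False                 # True: accept_values order is a rank
    positive: Optional[str] = None        # positive class of a binary endpoint

    def modelling_classes(self) -> Tuple[str, ...]:
        """acceptValues minus the classes excluded from modelling."""
        return tuple(v for v in self.accept_values if v not in self.excluded)

    def rank(self, value: str) -> int:
        """Zero-based rank of a class; only defined for ordered features."""
        if not self.ordered:
            raise ValueError(f"{self.name} has unordered classes")
        return self.modelling_classes().index(value)


def sensitivity_specificity(
    feature: NominalFeature,
    actual: Sequence[str],
    predicted: Sequence[str],
) -> Tuple[Optional[float], Optional[float]]:
    """Sensitivity and specificity, skipping compounds whose measured class
    is excluded from modelling. Returns None where a value is undefined."""
    if feature.positive is None or len(feature.modelling_classes()) != 2:
        raise ValueError("needs a binary feature with a declared positive class")
    tp = fn = tn = fp = 0
    for a, p in zip(actual, predicted):
        if a in feature.excluded:
            continue  # e.g. "ND" or "equivocal": kept in the data, not scored
        if a == feature.positive:
            if p == feature.positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == feature.positive:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else None
    specificity = tn / (tn + fp) if tn + fp else None
    return sensitivity, specificity


# "SAL" from the example dataset; excluded/positive are assumed annotations.
SAL = NominalFeature(
    name="SAL",
    accept_values=("ND", "equivocal", "mutagen", "nonmutagen"),
    excluded=frozenset({"ND", "equivocal"}),
    positive="mutagen",
)

# A made-up ordered feature using the weak/medium/strong example.
POTENCY = NominalFeature(
    name="potency",
    accept_values=("weak", "medium", "strong"),
    ordered=True,
)

# Invented values, only to exercise the functions.
actual = ["mutagen", "mutagen", "nonmutagen", "nonmutagen", "ND", "equivocal"]
predicted = ["mutagen", "nonmutagen", "nonmutagen", "nonmutagen", "mutagen", "nonmutagen"]

print(SAL.modelling_classes())                            # ('mutagen', 'nonmutagen')
print(POTENCY.rank("strong"))                             # 2
print(sensitivity_specificity(SAL, actual, predicted))    # (0.5, 1.0)
```

With annotations of this kind, a classifier would never be trained to predict "ND" or "equivocal", an algorithm selector could check the ordered flag before offering regression, and validation would know which class counts as positive.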

If we can agree on these requirements, we can proceed to discuss their implementation in the dataset representation.

Best regards,
Christoph