[OTDev] ToxCreate integration of Ambit classification datasets
Nina Jeliazkova jeliazkova.nina at gmail.com
Tue Mar 22 19:55:45 CET 2011
Dear Christoph,

On 22 March 2011 20:43, Christoph Helma <helma at in-silico.ch> wrote:

> Dear Nina, Vedrin, All,
>
> I had a look at feature
> http://apps.ideaconsult.net:8080/ambit2/feature/21573 from
> http://apps.ideaconsult.net:8080/ambit2/dataset/9, which raises some
> interesting questions: IMHO "Canc" is clearly a nominal feature, but its
> representation tells me that it is both a nominal and a numeric feature
> (maybe due to the fact that classes are represented as "1.0", "2.0" and
> "3.0").

Yes.

> In order to call the correct (classification or regression)
> algorithms I need however to know unambiguously:
>
> 1. the feature type (Numeric or Nominal)
> 2. "true" and "false" classes for binary classifications
>
> I assume that 1. can be easily solved, by making NumericFeature and
> NominalFeature disjunct.

Currently, a numeric feature can also be nominal, which is useful in this
case, and I don't think it is contradictory.

> Guessing "true" and "false" classes is harder, because there are many
> possibilities to indicate them in real world datasets. In our services
> we are currently checking with regular expressions for common cases
> (e.g. active/inactive, 1/0, toxic/nontoxic, ...), but this will not work
> for all possible feature values.

If you look at the RDF representation of /dataset/9, there is
ot:acceptValue, which lists the possible values for the feature. This was
agreed for API 1.1, is in opentox.owl, and is used by the TUM/NTUA
services as far as I know.

<http://apps.ideaconsult.net:8080/ambit2/feature/21573>
      a       ot:Feature , ot:NumericFeature , ot:NominalFeature ;
      dc:creator "http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html" ;
      dc:title "Canc" ;
      ot:acceptValue "3.0" , "1.0" ;
      ot:hasSource <http://apps.ideaconsult.net:8080/ambit2/dataset/ISSCAN_v3a_1153_19Sept08.1222179139.sdf> ;
      ot:units "" ;
      =       otee:Carcinogenicity .

I would suggest modifying your implementation to use ot:acceptValue
instead of regexp.
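
As a rough illustration of the suggestion above, the following sketch picks the learning task from the feature's RDF types and takes the class labels from ot:acceptValue rather than guessing them with regular expressions. The Feature class and the learning_task function are hypothetical names invented for this example and are not part of any OpenTox service. Giving ot:NominalFeature precedence when a feature carries both types is an assumption. The values are copied by hand from the RDF quoted above; a real client would read them with an RDF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical in-memory view of an ot:Feature (not an OpenTox API)."""
    uri: str
    title: str
    rdf_types: frozenset
    accept_values: tuple = ()


def learning_task(feature):
    """Return ("classification", classes) or ("regression", None).

    Assumption: a feature typed as both numeric and nominal is treated
    as nominal, and its classes are exactly its ot:acceptValue entries.
    """
    if "ot:NominalFeature" in feature.rdf_types:
        if not feature.accept_values:
            raise ValueError(f"{feature.uri}: nominal feature without ot:acceptValue")
        return "classification", sorted(feature.accept_values)
    if "ot:NumericFeature" in feature.rdf_types:
        return "regression", None
    raise ValueError(f"{feature.uri}: neither nominal nor numeric")


# Values copied from the RDF representation of feature/21573 quoted above.
canc = Feature(
    uri="http://apps.ideaconsult.net:8080/ambit2/feature/21573",
    title="Canc",
    rdf_types=frozenset({"ot:Feature", "ot:NumericFeature", "ot:NominalFeature"}),
    accept_values=("3.0", "1.0"),
)

task, classes = learning_task(canc)
print(task, classes)  # classification ['1.0', '3.0']
```

With this rule the "Canc" feature goes to a classification algorithm even though it is also typed as numeric, so the two types do not need to be made disjoint for the choice to be unambiguous.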

> I have no definitive solution for problem 2, a few thoughts:
>
> a) Present a list of classes and let the user assign true and false
>    classes
>    + can be used for all datasets/features (also for the discretization
>      of NumericFeatures)
>    - needs human intervention (not suited for automated model creation)
>    - same step has to be repeated every time a dataset is used
>    - might be error prone, might lead to suboptimal results from
>      inexperienced users
>
> b) Standardize allowed values for NominalFeatures
>    + unambiguous, automated processing possible
>    - needs human curation of imported datasets
>
> I tend to favor b) as a long term solution, what's your opinion?

This was the reason to introduce ot:acceptValue. It allows specifying
which values are allowed. Setting the feature as nominal does indeed need
manual intervention.

> Another question:
>
> If I expand our regexp hack and implement a) as a fallback, I would need
> to write new feature values into a dataset. Would you prefer to
>
> - overwrite the old values in the original dataset (original
>   information is lost)
> - add a new feature (with modified values) to the original dataset
>   (original information untouched, but might destroy the dataset if
>   handled improperly)
> - create a new consolidated dataset (IMHO safest)

No problem to create new datasets, but the preferred option is to use
ot:acceptValue, as regexp will not work for other datasets with different
values.

Best regards,
Nina

> Best regards,
> Christoph
> _______________________________________________
> Development mailing list
> Development at opentox.org
> http://www.opentox.org/mailman/listinfo/development
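
The "new consolidated dataset" option discussed above can be sketched as follows: the original rows are left untouched and a copy is returned with the class values replaced by the user's true/false assignment. The consolidate function, the row layout, and the compound identifiers are all hypothetical, and the mapping of "3.0" to True and "1.0" to False is purely illustrative; it says nothing about what these codes mean in the actual dataset.

```python
def consolidate(rows, feature_uri, class_map):
    """Return new rows with the values of feature_uri mapped via class_map.

    rows is a list of dicts, one per compound. The input is not modified;
    a value missing from class_map raises ValueError instead of being
    guessed.
    """
    unknown = {row[feature_uri] for row in rows} - set(class_map)
    if unknown:
        raise ValueError(f"no class assigned for values: {sorted(unknown)}")
    return [{**row, feature_uri: class_map[row[feature_uri]]} for row in rows]


CANC = "http://apps.ideaconsult.net:8080/ambit2/feature/21573"

# Hypothetical rows; the compound identifiers are placeholders.
original = [
    {"compound": "compound-A", CANC: "3.0"},
    {"compound": "compound-B", CANC: "1.0"},
]

# Illustrative user assignment, keyed by the ot:acceptValue entries.
user_assignment = {"3.0": True, "1.0": False}

consolidated = consolidate(original, CANC, user_assignment)
print(consolidated[0][CANC], original[0][CANC])  # True 3.0
```

Keying the assignment by the ot:acceptValue entries means a dataset containing a value the user never classified fails loudly rather than being silently mislabelled.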