[OTDev] ToxCreate integration of Ambit classification datasets
Martin Guetlein martin.guetlein at googlemail.com
Fri May 6 14:57:30 CEST 2011
- Previous message: [OTDev] ToxCreate integration of Ambit classification datasets
- Next message: [OTDev] ToxCreate integration of Ambit classification datasets
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Tue, Mar 22, 2011 at 7:55 PM, Nina Jeliazkova
<jeliazkova.nina at gmail.com> wrote:

> Dear Christoph,
>
> On 22 March 2011 20:43, Christoph Helma <helma at in-silico.ch> wrote:
>
> > Dear Nina, Vedrin, All,
> >
> > I had a look at feature
> > http://apps.ideaconsult.net:8080/ambit2/feature/21573 from
> > http://apps.ideaconsult.net:8080/ambit2/dataset/9, which raises some
> > interesting questions: IMHO "Canc" is clearly a nominal feature, but its
> > representation tells me that it is both a nominal and a numeric feature
> > (maybe due to the fact that classes are represented as "1.0", "2.0" and
> > "3.0").
> >
>
> Yes.
>
> > In order to call the correct (classification or regression)
> > algorithms I need however to know unambiguously:
> >
> > 1. the feature type (Numeric or Nominal)
> > 2. "true" and "false" classes for binary classifications
> >
> > I assume that 1. can be easily solved, by making NumericFeature and
> > NominalFeature disjunct.
> >
>
> Currently, a numeric feature can be nominal, which is useful in this case,
> and I don't think it is contradictory.
>
> >
> > Guessing "true" and "false" classes is harder, because there are many
> > possibilities to indicate them in real world datasets. In our services
> > we are currently checking with regular expressions for common cases
> > (e.g. active/inactive, 1/0, toxic/nontoxic, ...), but this will not work
> > for all possible feature values.
> >
>
> If you look at the /dataset/9 RDF representation, there is ot:acceptValue,
> which lists possible values for the feature. This was agreed for API 1.1
> and is in the opentox.owl, and is used by TUM/NTUA services as far as I
> know.
>
> <http://apps.ideaconsult.net:8080/ambit2/feature/21573>
>       a       ot:Feature , ot:NumericFeature , ot:NominalFeature ;
>       dc:creator "http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html" ;
>       dc:title "Canc" ;
>       ot:acceptValue "3.0" , "1.0" ;
>       ot:hasSource <http://apps.ideaconsult.net:8080/ambit2/dataset/ISSCAN_v3a_1153_19Sept08.1222179139.sdf> ;
>       ot:units "" ;
>       = otee:Carcinogenicity .
>
> I would suggest modifying your implementation to use ot:acceptValue,
> instead of regexp.
>

Hi Nina,

I noticed the following: the property ot:acceptValue of feature/21573 is
available here:

curl http://apps.ideaconsult.net:8080/ambit2/dataset/9 -H "accept:application/turtle"

  <ot:Feature rdf:about="http://apps.ideaconsult.net:8080/ambit2/feature/21573">
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#NominalFeature"></rdf:type>
    <ot:acceptValue>3.0</ot:acceptValue>
    <ot:acceptValue>1.0</ot:acceptValue>
    <ot:acceptValue>2.0</ot:acceptValue>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#NumericFeature"></rdf:type>
    <dc:creator>http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html</dc:creator>
    <ot:hasSource>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</ot:hasSource>
    <owl:sameAs rdf:resource="http://www.opentox.org/echaEndpoints.owl#Carcinogenicity"></owl:sameAs>
    <ot:units></ot:units>
    <dc:title>Canc</dc:title>

but it's missing here:

curl http://apps.ideaconsult.net:8080/ambit2/dataset/9/features -H "accept:application/turtle"

  <ot:NumericFeature rdf:about="feature/21573">
    <dc:creator>http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html</dc:creator>
    <ot:hasSource rdf:resource="dataset/ISSCAN_v3a_1153_19Sept08.1222179139.sdf"/>
    <owl:sameAs rdf:resource="http://www.opentox.org/echaEndpoints.owl#Carcinogenicity"/>
    <ot:units></ot:units>
    <dc:title>Canc</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#NominalFeature"/>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </ot:NumericFeature>

I would need acceptValue for validation purposes (and our dataset parsing
routine uses the latter request for feature metadata). Could you fix that?

Regards,
Martin

> >
> > I have no definitive solution for problem 2, a few thoughts:
> >
> > a) Present a list of classes and let the user assign true and false
> >    classes
> >    + can be used for all datasets/features (also for the discretization
> >      of NumericFeatures)
> >    - needs human intervention (not suited for automated model creation)
> >    - same step has to be repeated every time a dataset is used
> >    - might be error prone, might lead to suboptimal results from
> >      inexperienced users
> >
> > b) Standardize allowed values for NominalFeatures
> >    + unambiguous, automated processing possible
> >    - needs human curation of imported datasets
> >
> > I tend to favor b) as a long term solution, what's your opinion?
> >
>
> This was the reason to introduce ot:acceptValue. It allows specifying
> which values are allowed. Setting the feature as nominal needs manual
> intervention indeed.
>
> >
> > Another question:
> >
> > If I expand our regexp hack and implement a) as a fallback, I would need
> > to write new feature values into a dataset. Would you prefer to
> >
> > - overwrite the old values in the original dataset (original
> >   information is lost)
> > - add a new feature (with modified values) to the original dataset
> >   (original information untouched, but might destroy the dataset if
> >   handled improperly)
> > - create a new consolidated dataset (IMHO safest)
> >
>
> No problem to create new datasets, but the preferred option is to use
> ot:acceptValue, as regexp will not work for other datasets with different
> values.
>
> Best regards,
> Nina
>
> >
> > Best regards,
> > Christoph
> > _______________________________________________
> > Development mailing list
> > Development at opentox.org
> > http://www.opentox.org/mailman/listinfo/development
> >
> _______________________________________________
> Development mailing list
> Development at opentox.org
> http://www.opentox.org/mailman/listinfo/development
>

-- 
Dipl-Inf. Martin Gütlein
Phone:
+49 (0)761 203 8442  (office)
+49 (0)177 623 9499  (mobile)
Email:
guetlein at informatik.uni-freiburg.de
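
The validation need described in the message (a feature typed as
NominalFeature should come with ot:acceptValue entries) could be checked
roughly as follows. This is only a minimal sketch, not code from ToxCreate
or Ambit. It assumes the RDF/XML shape of the two responses quoted in the
message, trimmed to the relevant properties and wrapped in an rdf:RDF
element so that they parse as one document. The function names
read_features and nominal_without_values are hypothetical, and only the
Python standard library is used.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OT = "http://www.opentox.org/api/1.1#"

# Assumption: trimmed copies of the two feature descriptions from the message,
# wrapped in rdf:RDF with namespace declarations so the snippet is well-formed.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ot="http://www.opentox.org/api/1.1#">
  <ot:Feature rdf:about="http://apps.ideaconsult.net:8080/ambit2/feature/21573">
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#NominalFeature"/>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#NumericFeature"/>
    <ot:acceptValue>3.0</ot:acceptValue>
    <ot:acceptValue>1.0</ot:acceptValue>
    <ot:acceptValue>2.0</ot:acceptValue>
  </ot:Feature>
  <ot:NumericFeature rdf:about="feature/21573">
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#NominalFeature"/>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </ot:NumericFeature>
</rdf:RDF>"""


def read_features(rdf_xml):
    """Return {feature URI: {"types": set of names, "accept_values": list}}."""
    features = {}
    for node in ET.fromstring(rdf_xml):
        uri = node.get(f"{{{RDF}}}about")
        if uri is None:
            continue
        types = set()
        # A typed node element such as <ot:NumericFeature> declares a type itself.
        if node.tag.startswith(f"{{{OT}}}"):
            types.add(node.tag.split("}")[1])
        # Every explicit rdf:type child adds another type.
        for t in node.findall(f"{{{RDF}}}type"):
            types.add(t.get(f"{{{RDF}}}resource", "").strip().split("#")[-1])
        values = [v.text.strip()
                  for v in node.findall(f"{{{OT}}}acceptValue") if v.text]
        features[uri] = {"types": types, "accept_values": values}
    return features


def nominal_without_values(features):
    """List URIs of nominal features that carry no ot:acceptValue."""
    return sorted(uri for uri, f in features.items()
                  if "NominalFeature" in f["types"] and not f["accept_values"])


if __name__ == "__main__":
    for uri in nominal_without_values(read_features(SAMPLE)):
        print("nominal feature without ot:acceptValue:", uri)
```

Run against the sample, the check flags only the entry shaped like the
/dataset/9/features response, which is the gap reported in the message.
A real client would more likely use a proper RDF parser, since the same
graph can be serialized in many different ways.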