[OTDev] ToxCreate integration of Ambit classification datasets

Martin Guetlein martin.guetlein at googlemail.com
Fri May 6 14:57:30 CEST 2011


On Tue, Mar 22, 2011 at 7:55 PM, Nina Jeliazkova
<jeliazkova.nina at gmail.com> wrote:

> Dear Christoph,
>
> On 22 March 2011 20:43, Christoph Helma <helma at in-silico.ch> wrote:
>
> > Dear Nina, Vedrin, All,
> >
> > I had a look at feature
> > http://apps.ideaconsult.net:8080/ambit2/feature/21573 from
> > http://apps.ideaconsult.net:8080/ambit2/dataset/9, which raises some
> > interesting questions: IMHO "Canc" is clearly a nominal feature, but its
> > representation tells me that it is both a nominal and a numeric feature
> > (maybe due to the fact that classes are represented as "1.0", "2.0" and
> > "3.0").
>
>
> Yes.
>
>
> > In order to call the correct (classification or regression)
> > algorithms I need however to know unambiguously:
> >
> >  1. the feature type (Numeric or Nominal)
> >  2. "true" and "false" classes for binary classifications
> >
> > I assume that 1. can be easily solved by making NumericFeature and
> > NominalFeature disjoint.
> >
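To make the ambiguity concrete, here is a minimal sketch of how a client might pick an algorithm family from the rdf:type URIs declared on a feature. The function name `learning_task` and its return strings are hypothetical and not part of any OpenTox service; only the namespace is taken from the type URIs quoted in this thread.

```python
# Namespace as it appears in the rdf:type URIs quoted in this thread.
OT = "http://www.opentox.org/api/1.1#"


def learning_task(rdf_types):
    """Return 'classification', 'regression', or 'ambiguous' for a feature.

    rdf_types: iterable of rdf:type URIs declared on the feature.
    Hypothetical helper for illustration only.
    """
    types = set(rdf_types)
    nominal = OT + "NominalFeature" in types
    numeric = OT + "NumericFeature" in types
    if nominal and numeric:
        return "ambiguous"  # both types declared, as for feature/21573
    if nominal:
        return "classification"
    if numeric:
        return "regression"
    return "ambiguous"  # neither type declared
```

With both types declared, as for "Canc", the sketch cannot decide and the caller needs another rule or further metadata.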
>
> Currently, a numeric feature can be nominal, which is useful in this case,
> and I don't think it is contradictory.
>
>
> >
> > Guessing "true" and "false" classes is harder, because there are many
> > possibilities to indicate them in real world datasets. In our services
> > we are currently checking with regular expressions for common cases
> > (e.g. active/inactive, 1/0, toxic/nontoxic, ...), but this will not work
> > for all possible feature values.
> >
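A rough sketch of such a regular-expression check follows, assuming the feature values arrive as plain strings. The patterns are illustrative: only active/inactive, 1/0 and toxic/nontoxic are mentioned above, the "true"/"false" alternatives are an added assumption, and `guess_binary_classes` is a hypothetical name rather than the code running in any service.

```python
import re

# Illustrative patterns only; a real dataset may use entirely different labels.
TRUE_RE = re.compile(r"^(active|toxic|1(\.0)?|true)$", re.IGNORECASE)
FALSE_RE = re.compile(r"^(inactive|non-?toxic|0(\.0)?|false)$", re.IGNORECASE)


def guess_binary_classes(values):
    """Return (true_label, false_label), or None if no safe guess exists."""
    labels = sorted({v.strip() for v in values})
    if len(labels) != 2:
        return None  # not a binary feature
    true = [v for v in labels if TRUE_RE.match(v)]
    false = [v for v in labels if FALSE_RE.match(v)]
    if len(true) == 1 and len(false) == 1:
        return true[0], false[0]
    return None  # labels not recognised by either pattern
```

For the three classes "1.0", "2.0" and "3.0" of "Canc" the guess fails and returns None, which is the limitation described above.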
>
> If you look at the RDF representation of /dataset/9, it includes
> ot:acceptValue, which lists the possible values for the feature. This was
> agreed for API 1.1, is in opentox.owl, and is used by the TUM/NTUA
> services as far as I know.
>
> <http://apps.ideaconsult.net:8080/ambit2/feature/21573>
>       a       ot:Feature , ot:NumericFeature , ot:NominalFeature ;
>      dc:creator "http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html" ;
>      dc:title "Canc" ;
>      ot:acceptValue "3.0" , "1.0" ;
>      ot:hasSource <http://apps.ideaconsult.net:8080/ambit2/dataset/ISSCAN_v3a_1153_19Sept08.1222179139.sdf> ;
>      ot:units "" ;
>      =       otee:Carcinogenicity .
>
> I would suggest modifying your implementation to use ot:acceptValue
> instead of regexp.
>


Hi Nina,

I noticed the following: the property ot:acceptValue of feature/21573 is
available here:

curl http://apps.ideaconsult.net:8080/ambit2/dataset/9 -H "accept:application/turtle"

   <ot:Feature rdf:about="http://apps.ideaconsult.net:8080/ambit2/feature/21573">
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#NominalFeature"></rdf:type>
    <ot:acceptValue>3.0</ot:acceptValue>
    <ot:acceptValue>1.0</ot:acceptValue>
    <ot:acceptValue>2.0</ot:acceptValue>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#NumericFeature"></rdf:type>
    <dc:creator>http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html</dc:creator>
    <ot:hasSource>ISSCAN_v3a_1153_19Sept08.1222179139.sdf</ot:hasSource>
    <owl:sameAs rdf:resource="http://www.opentox.org/echaEndpoints.owl#Carcinogenicity"></owl:sameAs>
    <ot:units></ot:units>
    <dc:title>Canc</dc:title>

but it's missing here:

curl http://apps.ideaconsult.net:8080/ambit2/dataset/9/features -H "accept:application/turtle"

  <ot:NumericFeature rdf:about="feature/21573">
    <dc:creator>http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html</dc:creator>
    <ot:hasSource rdf:resource="dataset/ISSCAN_v3a_1153_19Sept08.1222179139.sdf"/>
    <owl:sameAs rdf:resource="http://www.opentox.org/echaEndpoints.owl#Carcinogenicity"/>
    <ot:units></ot:units>
    <dc:title>Canc</dc:title>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#NominalFeature"/>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#Feature"/>
  </ot:NumericFeature>

I would need acceptValue for validation purposes (and our dataset parsing
routine uses the latter request for feature metadata). Could you fix that?
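
For illustration, a minimal sketch of the kind of lookup a dataset parser could perform on an RDF/XML response, using only the Python standard library. The function name `accept_values` and the `rdf:RDF` wrapper in `SAMPLE` are assumptions made for this example (the excerpts above show only the feature element), and relative `rdf:about` values such as "feature/21573" are not resolved here.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "ot": "http://www.opentox.org/api/1.1#",
}


def accept_values(rdf_xml, feature_uri):
    """Collect ot:acceptValue literals of the element whose rdf:about matches.

    Hypothetical helper; returns an empty list if the property is absent.
    """
    root = ET.fromstring(rdf_xml)
    about = "{%s}about" % NS["rdf"]
    values = []
    for node in root.iter():
        if node.get(about, "").strip() == feature_uri:
            for child in node.findall("ot:acceptValue", NS):
                values.append((child.text or "").strip())
    return values


# Assumed wrapper around the feature fragment quoted in this message.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ot="http://www.opentox.org/api/1.1#">
  <ot:Feature rdf:about="http://apps.ideaconsult.net:8080/ambit2/feature/21573">
    <ot:acceptValue>3.0</ot:acceptValue>
    <ot:acceptValue>1.0</ot:acceptValue>
    <ot:acceptValue>2.0</ot:acceptValue>
  </ot:Feature>
</rdf:RDF>"""

FEATURE = "http://apps.ideaconsult.net:8080/ambit2/feature/21573"
print(accept_values(SAMPLE, FEATURE))
```

An empty result from the /features response is what prevents validation from knowing the class values.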

Regards,
Martin



>
>
> > I have no definitive solution for problem 2, a few thoughts:
> >
> > a) Present a list of classes and let the user assign true and false
> >   classes
> >   + can be used for all datasets/features (also for the discretization
> >   of NumericFeatures)
> >   - needs human intervention (not suited for automated model creation)
> >   - same step has to be repeated every time a dataset is used
> >   - might be error prone, might lead to suboptimal results from
> >   inexperienced users
> >
> > b) Standardize allowed values for NominalFeatures
> >   + unambiguous, automated processing possible
> >   - needs human curation of imported datasets
> >
>
>
> >
> > I tend to favor b) as a long term solution, what's your opinion?
> >
>
> This was the reason for introducing ot:acceptValue. It allows specifying
> which values are allowed. Setting the feature as nominal does indeed need
> manual intervention.
>
>
> >
> > Another question:
> >
> > If I expand our regexp hack and implement a) as a fallback, I would need
> > to write new feature values into a dataset. Would you prefer to
> >
> >  - overwrite the old values in the original dataset (original
> >    information is lost)
> >  - add a new feature (with modified values) to the original dataset
> >    (original information untouched, but might destroy the dataset if
> >    handled improperly)
> >  - create a new consolidated dataset (IMHO safest)
> >
>
> No problem to create new datasets, but preferred option is to use
> ot:acceptValue, as regexp will not work for other datasets with different
> values.
>
> Best regards,
> Nina
>
>
> >
> > Best regards,
> > Christoph
> > _______________________________________________
> > Development mailing list
> > Development at opentox.org
> > http://www.opentox.org/mailman/listinfo/development
> >
>



-- 
Dipl-Inf. Martin Gütlein
Phone:
+49 (0)761 203 8442 (office)
+49 (0)177 623 9499 (mobile)
Email:
guetlein at informatik.uni-freiburg.de


