[OTDev] Validation: classification statistics for non-binary class values

Martin Guetlein martin.guetlein at googlemail.com
Wed Dec 9 13:45:09 CET 2009


Hi Nina,

On Wed, Dec 9, 2009 at 12:38 PM, Nina Jeliazkova <nina at acad.bg> wrote:
> Hi Martin,
>
> Martin Guetlein wrote:
>
> Hi Nina, All,
>
> Very good point. Here is how it could look:
>
> [[
> default:confusionmatrix
>   a ot:ConfusionMatrix ;
>
>   # contains numClassValues**2 entries like the following
>
>   ot:confusionMatrixValue
>   [
>     a ot:ConfusionMatrixValue ;
>     dc:value "25"^^xsd:int ;
>     ot:confusionMatrixCoordinates
>     [
>       a ot:ConfusionMatrixCoordinate ;
>       dc:predictedValue "active"^^xsd:string ;
>       dc:actualValue "moderately_active"^^xsd:string ;
>     ]
>   ]
>   ...
> ]]
>
> I think we will end up with quite a lot of Classes in our opentox.owl.
>
>
> Having a large number of classes should be fine, provided we are not
> replicating things under different names.
>
> Here is an idea of how we can reuse some of the existing classes:
>
> #This is a cell in a confusion matrix
>     <owl:Class rdf:ID="ConfusionMatrixCell">
>         <rdfs:subClassOf rdf:resource="#OpentoxResource"/>
>     </owl:Class>
> #the cell is linked to the Feature and the actual value via FeatureValue class
>     <owl:ObjectProperty rdf:ID="confusionMatrixActual">
>         <rdf:type rdf:resource="&owl;FunctionalProperty"/>
>         <rdfs:domain rdf:resource="#ConfusionMatrixCell"/>
>         <rdfs:range rdf:resource="#FeatureValue"/>
>     </owl:ObjectProperty>
>
> #the cell is linked to the Feature and the predicted value via FeatureValue class
>     <owl:ObjectProperty rdf:ID="confusionMatrixPredicted">
>         <rdf:type rdf:resource="&owl;FunctionalProperty"/>
>         <rdfs:domain rdf:resource="#ConfusionMatrixCell"/>
>         <rdfs:range rdf:resource="#FeatureValue"/>
>     </owl:ObjectProperty>
> #and the numeric value itself
>     <owl:DatatypeProperty rdf:ID="confusionMatrixValue">
>         <rdf:type rdf:resource="&owl;FunctionalProperty"/>
>         <rdfs:domain rdf:resource="#ConfusionMatrixCell"/>
>         <rdfs:range rdf:resource="&xsd;int"/>
>     </owl:DatatypeProperty>
>
> #the above is to be added in opentox.owl
>
> #instances elsewhere (generated by services)
>     <ConfusionMatrixCell rdf:ID="ConfusionMatrixCell_7">
>         <confusionMatrixActual rdf:resource="#FeatureValue_8"/>
>         <confusionMatrixPredicted rdf:resource="#FeatureValue_9"/>
>         <confusionMatrixValue rdf:datatype="&xsd;int">25</confusionMatrixValue>
>     </ConfusionMatrixCell>
>
>     <FeatureValue rdf:ID="FeatureValue_8">
>         <feature rdf:resource="#Feature_10"/>
>         <value rdf:datatype="&xsd;string">active</value>
>     </FeatureValue>
>     <FeatureValue rdf:ID="FeatureValue_9">
>         <feature rdf:resource="#Feature_10"/>
>         <value rdf:datatype="&xsd;string">moderate</value>
>     </FeatureValue>
>
> (as a side effect, visualising a confusion matrix with relevant links for
> predicted/actual will be straightforward :)
>
> #and using ConfusionMatrixCell to denote a ConfusionMatrix as in your proposal.
>
> Any comments?
>

Looks good, linking to feature values really makes sense.
I will try to integrate this into the opentox ontology today.
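
To make the cell-based layout concrete, here is a rough Python sketch of
how a service might produce the numClassValues**2 cells (one per pair of
actual and predicted class value) from its predictions. This is only an
illustration, not code from any OpenTox service: the function name
confusion_cells and the sample labels are made up for the example.

```python
from collections import Counter
from itertools import product


def confusion_cells(actual, predicted, class_values):
    """Return one (actual, predicted, count) cell per pair of class values.

    Hypothetical helper: each tuple corresponds to one ConfusionMatrixCell
    (actual value, predicted value, numeric value).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter(zip(actual, predicted))
    # numClassValues**2 cells, including pairs that never occurred (count 0)
    return [(a, p, counts[(a, p)]) for a, p in product(class_values, repeat=2)]


# Made-up sample data, for illustration only
classes = ["inactive", "moderately_active", "active"]
actual = ["active", "active", "moderately_active", "inactive", "moderately_active"]
predicted = ["active", "moderately_active", "active", "inactive", "moderately_active"]

cells = confusion_cells(actual, predicted, classes)
for a, p, n in cells:
    print(f"actual={a} predicted={p} value={n}")
```

Emitting the zero-count cells as well keeps the matrix complete, so a
client does not have to guess which class values exist.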

Best regards,
Martin


> Best regards,
> Nina
>
> Best Regards,
> Martin
>
>
>
> On Tue, Dec 8, 2009 at 12:57 PM, Nina Jeliazkova <nina at acad.bg> wrote:
>
>
> Hi Martin,
>
> Do we have a confusion matrix somewhere in the classification statistics?
> It provides more information than just true positives.
>
> Best regards,
> Nina
>
>
> Martin Guetlein wrote:
>
>
> Hello All,
>
> as Harry noted in one of the last meetings, the classification
> statistics in the validation object only take binary classification
> into account so far. There can of course be more than one class value
> (e.g. inactive, moderately-active, active).
> Hence, some classification results (e.g. numTruePositives) are now
> available multiple times (once for each class-value).
>
> As collections are not allowed in OWL-DL, I had to create
> intermediate classes (following the scheme Nina proposed for the
> dataset). Here is what an example of the Classification Statistics
> object may look like:
>
> [[
> default:thisClassificationStatistics
>   a ot:ClassificationStatistics ;
>
>   ot:accuracy "99.0"^^xsd:float ; # accuracy is only available once
>   ot:numberUnclassified "26"^^xsd:int ;
>   ...
>
>   ot:classStatisticEntry
>     [ a ot:ClassStatisticEntry ;
>       ot:classValue "moderately_active"^^xsd:string ;
>       ot:classStatisticValue
>         [ a ot:ClassStatisticValue ;
>           ot:classStatistic default:areaUnderRocCurve ;
>           ot:value "0.77"^^xsd:float ;
>         ] ;
>       ot:classStatisticValue
>         [ a ot:ClassStatisticValue ;
>           ot:classStatistic default:numTruePositives ;
>           ot:value "123"^^xsd:int ;
>         ] ;
>       ot:classStatisticValue
>       ...
>
>   ot:classStatisticEntry
>     [ a ot:ClassStatisticEntry ;
>       ot:classValue "inactive"^^xsd:string ;
>       ...
>
>   ot:classStatisticEntry
>     [ a ot:ClassStatisticEntry ;
>       ot:classValue "active"^^xsd:string ;
>       ...
> ]]
>
> Here is the old classification statistics object (I renamed it from
> ClassificationInformation to ClassificationStatistics):
> http://www.opentox.org/data/documents/development/RDF%20files/Validation/#-ot-classificationinfo-rdf
>
> Any comments or corrections before I add that to opentox.owl?
>
> Best regards,
> Martin
>
>
>
>
>
>
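
Regarding the per-class entries in the original proposal quoted above
(e.g. numTruePositives once for each class value): they can be derived
from a multi-class confusion matrix by treating each class value in turn
as "positive" and all others as "negative". Below is a minimal Python
sketch under that one-vs-rest assumption. Only numTruePositives appears
in the proposal; the other key names, the function names and the sample
counts are assumptions made up for the example.

```python
def per_class_counts(matrix, class_values):
    """Derive one-vs-rest counts per class value.

    matrix maps (actual, predicted) -> count; missing pairs count as 0.
    Hypothetical helper; key names other than numTruePositives are assumed.
    """
    total = sum(matrix.values())
    stats = {}
    for c in class_values:
        tp = matrix.get((c, c), 0)
        # actual class c, predicted as something else
        fn = sum(matrix.get((c, p), 0) for p in class_values if p != c)
        # predicted as c, actual class is something else
        fp = sum(matrix.get((a, c), 0) for a in class_values if a != c)
        stats[c] = {
            "numTruePositives": tp,
            "numFalsePositives": fp,
            "numFalseNegatives": fn,
            "numTrueNegatives": total - tp - fn - fp,
        }
    return stats


def accuracy(matrix, class_values):
    """Overall accuracy: correctly classified instances over all instances."""
    correct = sum(matrix.get((c, c), 0) for c in class_values)
    return correct / sum(matrix.values())


# Made-up counts, for illustration only
classes = ["inactive", "moderately_active", "active"]
matrix = {
    ("active", "active"): 40,
    ("active", "moderately_active"): 5,
    ("moderately_active", "active"): 25,
    ("moderately_active", "moderately_active"): 20,
    ("inactive", "inactive"): 10,
}

stats = per_class_counts(matrix, classes)
acc = accuracy(matrix, classes)
```

This also shows why accuracy is only available once while the counts
exist once per class value.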



-- 
Dipl-Inf. Martin Gütlein
Phone:
+49 (0)761 203 8442 (office)
+49 (0)177 623 9499 (mobile)
Email:
guetlein at informatik.uni-freiburg.de


