[OTDev] Validation: classification statistics for non-binary class values

Nina Jeliazkova nina at acad.bg
Wed Dec 9 12:38:59 CET 2009

Hi Martin,

Martin Guetlein wrote:
> Hi Nina, All,
> very good point. Here is what it could look like:
> [[
> default:confusionmatrix
>   a ot:ConfusionMatrix ;
>   # contains numClassValues**2 entries like the following
>   ot:confusionMatrixValue
>   [
>     a ot:ConfusionMatrixValue ;
>     dc:value "25"^^xsd:int ;
>     ot:confusionMatrixCoordinates
>     [
>       a ot:ConfusionMatrixCoordinate ;
>       dc:predictedValue "active"^^xsd:string ;
>       dc:actualValue "moderately_active"^^xsd:string ;
>     ]
>   ]
>   ...
> ]]
> I think we will end up with quite a lot of classes in our opentox.owl.
Having a large number of classes should be fine, provided we are not
replicating things under different names.

Here is an idea of how we can reuse some of the existing classes:

    <!-- A cell in a confusion matrix -->
    <owl:Class rdf:ID="ConfusionMatrixCell">
        <rdfs:subClassOf rdf:resource="#OpentoxResource"/>
    </owl:Class>

    <!-- The cell is linked to the Feature and the actual value via FeatureValue -->
    <owl:ObjectProperty rdf:ID="confusionMatrixActual">
        <rdf:type rdf:resource="&owl;FunctionalProperty"/>
        <rdfs:domain rdf:resource="#ConfusionMatrixCell"/>
        <rdfs:range rdf:resource="#FeatureValue"/>
    </owl:ObjectProperty>

    <!-- The cell is linked to the Feature and the predicted value via the FeatureValue class -->
    <owl:ObjectProperty rdf:ID="confusionMatrixPredicted">
        <rdf:type rdf:resource="&owl;FunctionalProperty"/>
        <rdfs:domain rdf:resource="#ConfusionMatrixCell"/>
        <rdfs:range rdf:resource="#FeatureValue"/>
    </owl:ObjectProperty>

    <!-- ... and the numeric value itself -->
    <owl:DatatypeProperty rdf:ID="confusionMatrixValue">
        <rdf:type rdf:resource="&owl;FunctionalProperty"/>
        <rdfs:domain rdf:resource="#ConfusionMatrixCell"/>
        <rdfs:range rdf:resource="&xsd;int"/>
    </owl:DatatypeProperty>

The above is to be added to opentox.owl.

Instances elsewhere (generated by services):
    <ConfusionMatrixCell rdf:ID="ConfusionMatrixCell_7">
        <confusionMatrixActual rdf:resource="#FeatureValue_8"/>
        <confusionMatrixPredicted rdf:resource="#FeatureValue_9"/>
    </ConfusionMatrixCell>

    <FeatureValue rdf:ID="FeatureValue_8">
        <feature rdf:resource="#Feature_10"/>
        <value rdf:datatype="&xsd;string">active</value>
    </FeatureValue>
    <FeatureValue rdf:ID="FeatureValue_9">
        <feature rdf:resource="#Feature_10"/>
        <value rdf:datatype="&xsd;string">moderate</value>
    </FeatureValue>

(As a side effect, visualising the confusion matrix with links to the
relevant predicted/actual values will be straightforward :)
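A minimal sketch of how a service could fill the numClassValues**2 cells and print them as a table, assuming class values are plain strings and the actual and predicted value of each compound are available as two parallel lists. The function names are hypothetical and not part of opentox.owl or any OpenTox service API.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

# (actual value, predicted value); one key per ConfusionMatrixCell
Cell = Tuple[str, str]


def confusion_matrix_cells(
    actual: Sequence[str], predicted: Sequence[str], class_values: Sequence[str]
) -> Dict[Cell, int]:
    """Count each (actual, predicted) pair, emitting all len(class_values)**2 cells.

    Pairs involving a value outside class_values (e.g. an unclassified
    compound) are not counted in any cell.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter(zip(actual, predicted))
    # Empty cells are kept explicitly with a count of 0 (Counter returns 0 for missing keys).
    return {(a, p): counts[(a, p)] for a in class_values for p in class_values}


def render_table(cells: Dict[Cell, int], class_values: Sequence[str]) -> str:
    """Plain-text matrix: rows are actual values, columns are predicted values."""
    width = max(len(v) for v in class_values) + 2
    header = " " * width + "".join(v.rjust(width) for v in class_values)
    rows = [
        a.ljust(width) + "".join(str(cells[(a, p)]).rjust(width) for p in class_values)
        for a in class_values
    ]
    return "\n".join([header] + rows)


if __name__ == "__main__":
    values = ["inactive", "moderately_active", "active"]
    actual = ["active", "active", "inactive", "moderately_active"]
    predicted = ["active", "moderately_active", "inactive", "moderately_active"]
    cells = confusion_matrix_cells(actual, predicted, values)
    print(cells[("active", "moderately_active")])  # 1
    print(render_table(cells, values))
```

Each key/value pair maps onto one ConfusionMatrixCell: the key supplies the actual and predicted FeatureValue, and the count would become confusionMatrixValue.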

And ConfusionMatrixCell would then be used to compose a ConfusionMatrix,
as in your example.

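Once a ConfusionMatrix holds every (actual, predicted) count, the per-class values from the original proposal (numTruePositives and the like, once per class value) can be derived from it. A sketch, assuming the matrix is given as a dict keyed by (actual, predicted) pairs; the one-vs-rest definitions are the usual ones, and the keys other than numTruePositives are hypothetical names.

```python
from typing import Dict, Tuple

Cells = Dict[Tuple[str, str], int]  # (actual, predicted) -> count


def per_class_counts(cells: Cells, class_value: str) -> Dict[str, int]:
    """One-vs-rest counts for class_value, derived from the confusion matrix cells."""
    tp = fp = fn = tn = 0
    for (actual, predicted), n in cells.items():
        if actual == class_value and predicted == class_value:
            tp += n
        elif predicted == class_value:
            fp += n  # predicted as class_value, actually another value
        elif actual == class_value:
            fn += n  # actually class_value, predicted as another value
        else:
            tn += n
    return {
        "numTruePositives": tp,
        "numFalsePositives": fp,  # hypothetical name
        "numFalseNegatives": fn,  # hypothetical name
        "numTrueNegatives": tn,   # hypothetical name
    }


def accuracy(cells: Cells) -> float:
    """Matrix-wide fraction of correct predictions (a single value, not per class)."""
    total = sum(cells.values())
    correct = sum(n for (a, p), n in cells.items() if a == p)
    return correct / total if total else 0.0
```

Calling per_class_counts once per class value yields one set of entries per class (one ot:classStatisticEntry each), while accuracy stays a single number for the whole matrix.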
Any comments?

Best regards,
> Best Regards,
> Martin
> On Tue, Dec 8, 2009 at 12:57 PM, Nina Jeliazkova <nina at acad.bg> wrote:
>> Hi Martin,
>> Do we have a confusion matrix somewhere in the classification statistics?
>> It provides more information than just the number of true positives.
>> Best regards,
>> Nina
>> Martin Guetlein wrote:
>>> Hello All,
>>> as Harry noted in one of the last meetings, the classification
>>> statistics in the validation object so far only take binary
>>> classification into account. There can, of course, be more than two
>>> class values (e.g. inactive, moderately_active, active).
>>> Hence, some classification results (e.g. numTruePositives) are now
>>> available multiple times (once for each class value).
>>> As collections are not allowed in OWL-DL, I had to create
>>> intermediate classes (following the scheme Nina proposed for the
>>> dataset). Here is an example of what the ClassificationStatistics
>>> object may look like:
>>> [[
>>> default:thisClassificationStatistics
>>>   a ot:ClassificationStatistics ;
>>>   ot:accuracy "99.0"^^xsd:float ; # accuracy is only available once
>>>   ot:numberUnclassified "26"^^xsd:int ;
>>>   ...
>>>   ot:classStatisticEntry
>>>     [ a ot:ClassStatisticEntry ;
>>>       ot:classValue "moderately_active"^^xsd:string ;
>>>       ot:classStatisticValue
>>>         [ a ot:ClassStatisticValue ;
>>>           ot:classStatistic default:areaUnderRocCurve ;
>>>           ot:value "0.77"^^xsd:float ;
>>>         ] ;
>>>       ot:classStatisticValue
>>>         [ a ot:ClassStatisticValue ;
>>>           ot:classStatistic default:numTruePositives ;
>>>           ot:value "123"^^xsd:int ;
>>>         ] ;
>>>       ot:classStatisticValue
>>>       ...
>>>     ] ;
>>>   ot:classStatisticEntry
>>>     [ a ot:ClassStatisticEntry ;
>>>       ot:classValue "inactive"^^xsd:string ;
>>>       ...
>>>     ] ;
>>>   ot:classStatisticEntry
>>>     [ a ot:ClassStatisticEntry ;
>>>       ot:classValue "active"^^xsd:string ;
>>>       ...
>>>     ] .
>>> ]]
>>> Here is the old classification statistics object (I renamed it from
>>> ClassificationInformation to ClassificationStatistics):
>>> http://www.opentox.org/data/documents/development/RDF%20files/Validation/#-ot-classificationinfo-rdf
>>> Any comments or corrections before I add this to opentox.owl?
>>> Best regards,
>>> Martin
>> _______________________________________________
>> Development mailing list
>> Development at opentox.org
>> http://www.opentox.org/mailman/listinfo/development
