[OTDev] Validation: classification statistics for non-binary class values

Nina Jeliazkova nina at acad.bg
Wed Dec 9 12:38:59 CET 2009


Hi Martin,

Martin Guetlein wrote:
> Hi Nina, All,
>
> very good point. Here is how it could look:
>
> [[
> default:confusionmatrix
>   a ot:ConfusionMatrix ;
>
>   # contains numClassValues**2 entries like the following
>
>   ot:confusionMatrixValue
>   [
>     a ot:ConfusionMatrixValue ;
>     dc:value "25"^^xsd:int ;
>     ot:confusionMatrixCoordinates
>     [
>       a ot:ConfusionMatrixCoordinate ;
>       dc:predictedValue "active"^^xsd:string ;
>       dc:actualValue "moderately_active"^^xsd:string ;
>     ]
>   ]
>   ...
> ]]
>
> I think we will end up with quite a lot of Classes in our opentox.owl.
>   
Having a large number of classes should be fine, provided we are not
replicating things under different names.

Here is an idea of how we can reuse some of the existing classes:

#This is a cell in a confusion matrix
    <owl:Class rdf:ID="ConfusionMatrixCell">
        <rdfs:subClassOf rdf:resource="#OpentoxResource"/>
    </owl:Class>
#the cell is linked to the Feature and the actual value via the FeatureValue class
    <owl:ObjectProperty rdf:ID="confusionMatrixActual">
        <rdf:type rdf:resource="&owl;FunctionalProperty"/>
        <rdfs:domain rdf:resource="#ConfusionMatrixCell"/>
        <rdfs:range rdf:resource="#FeatureValue"/>
    </owl:ObjectProperty>

#the cell is linked to the Feature and the predicted value via the FeatureValue class
    <owl:ObjectProperty rdf:ID="confusionMatrixPredicted">
        <rdf:type rdf:resource="&owl;FunctionalProperty"/>
        <rdfs:domain rdf:resource="#ConfusionMatrixCell"/>
        <rdfs:range rdf:resource="#FeatureValue"/>
    </owl:ObjectProperty>
#and the numeric value itself
    <owl:DatatypeProperty rdf:ID="confusionMatrixValue">
        <rdf:type rdf:resource="&owl;FunctionalProperty"/>
        <rdfs:domain rdf:resource="#ConfusionMatrixCell"/>
        <rdfs:range rdf:resource="&xsd;int"/>
    </owl:DatatypeProperty>

#the above is to be added to opentox.owl

#instances elsewhere (generated by services)
    <ConfusionMatrixCell rdf:ID="ConfusionMatrixCell_7">
        <confusionMatrixActual rdf:resource="#FeatureValue_8"/>
        <confusionMatrixPredicted rdf:resource="#FeatureValue_9"/>
        <confusionMatrixValue rdf:datatype="&xsd;int">25</confusionMatrixValue>
    </ConfusionMatrixCell>

    <FeatureValue rdf:ID="FeatureValue_8">
        <feature rdf:resource="#Feature_10"/>
        <value rdf:datatype="&xsd;string">active</value>
    </FeatureValue>
    <FeatureValue rdf:ID="FeatureValue_9">
        <feature rdf:resource="#Feature_10"/>
        <value rdf:datatype="&xsd;string">moderate</value>
    </FeatureValue>
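
To illustrate how a service could produce the counts that go into such cells,
here is a minimal Python sketch. It is not part of opentox.owl; the function
name confusion_cells and the sample labels are assumptions made up for this
example. It yields one entry per (actual, predicted) pair of class values,
i.e. numClassValues**2 entries, including the empty ones.

```python
from collections import Counter
from itertools import product


def confusion_cells(actual, predicted, class_values):
    """Map every (actual, predicted) pair of class values to its count.

    Pairs that never occur get a count of 0, so the result always has
    len(class_values) ** 2 entries. Labels that are not listed in
    class_values are ignored.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = Counter(zip(actual, predicted))
    return {pair: counts.get(pair, 0)
            for pair in product(class_values, repeat=2)}


# Hypothetical example data, not taken from any real validation run.
class_values = ["inactive", "moderately_active", "active"]
actual = ["active", "active", "inactive",
          "moderately_active", "moderately_active"]
predicted = ["active", "moderately_active", "inactive",
             "active", "moderately_active"]

cells = confusion_cells(actual, predicted, class_values)
for (actual_value, predicted_value), count in cells.items():
    print(actual_value, predicted_value, count)
```

Each printed triple corresponds to one ConfusionMatrixCell instance: the first
two values would become the two FeatureValue resources and the count would go
into confusionMatrixValue.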

(As a side effect, visualising the confusion matrix with relevant links for
the predicted/actual values will be straightforward :)

#and a ConfusionMatrix would then consist of ConfusionMatrixCell entries, as in
your proposal.
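
As a sketch of why the confusion matrix carries more information than the
per-class counts, the following Python snippet derives numTruePositives (and
its counterparts) for one class value from the cells. The cell counts are
invented for illustration, and the names numFalsePositives, numFalseNegatives
and numTrueNegatives are assumed by analogy with numTruePositives; they are
not confirmed property names.

```python
def per_class_counts(cells, class_value):
    """Derive one-vs-rest counts for class_value from confusion matrix cells.

    cells maps (actual, predicted) pairs to counts.
    """
    true_pos = cells.get((class_value, class_value), 0)
    # Actually class_value, but predicted as something else.
    false_neg = sum(n for (a, p), n in cells.items()
                    if a == class_value and p != class_value)
    # Predicted as class_value, but actually something else.
    false_pos = sum(n for (a, p), n in cells.items()
                    if a != class_value and p == class_value)
    true_neg = sum(cells.values()) - true_pos - false_neg - false_pos
    return {
        "numTruePositives": true_pos,
        "numFalsePositives": false_pos,   # assumed name
        "numFalseNegatives": false_neg,   # assumed name
        "numTrueNegatives": true_neg,     # assumed name
    }


# Hypothetical counts, keyed by (actual, predicted).
cells = {
    ("active", "active"): 50,
    ("active", "moderate"): 25,
    ("active", "inactive"): 5,
    ("moderate", "active"): 10,
    ("moderate", "moderate"): 30,
    ("moderate", "inactive"): 8,
    ("inactive", "active"): 2,
    ("inactive", "moderate"): 7,
    ("inactive", "inactive"): 60,
}

stats = per_class_counts(cells, "active")
print(stats)
```

The reverse direction does not work: from the per-class counts alone one
cannot tell which other class a misclassified compound was assigned to.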

Any comments?

Best regards,
Nina
> Best Regards,
> Martin
>
>
>
> On Tue, Dec 8, 2009 at 12:57 PM, Nina Jeliazkova <nina at acad.bg> wrote:
>   
>> Hi Martin,
>>
>> Do we have a confusion matrix somewhere in the classification statistics?
>> It provides more information than just true positives.
>>
>> Best regards,
>> Nina
>>
>>
>> Martin Guetlein wrote:
>>     
>>> Hello All,
>>>
>>> as Harry noted in one of the last meetings, the classification
>>> statistics in the validation object only take binary classification
>>> into account so far. There can of course be more than one class value
>>> (e.g. inactive, moderately-active, active).
>>> Hence, some classification results (e.g. numTruePositives) are now
>>> available multiple times (once for each class-value).
>>>
>>> As collections are not allowed in OWL-DL, I had to create
>>> intermediate classes (following the scheme Nina proposed for the
>>> dataset). Here is what an example of the Classification Statistics
>>> Object may look like:
>>>
>>> [[
>>> default:thisClassificationStatistics
>>>   a ot:classificationStatistics ;
>>>
>>>   ot:accuracy "99.0"^^xsd:float ; # accuracy is only available once
>>>   ot:numberUnclassified "26"^^xsd:int ;
>>>   ...
>>>
>>>   ot:classStatisticEntry
>>>     [ a ot:classStatisticEntry ;
>>>       ot:classValue "moderately_active"^^xsd:string ;
>>>       ot:classStatisticValue
>>>         [ a ot:ClassStatisticValue ;
>>>           ot:classStatistic default:areaUnderRocCurve ;
>>>           ot:value "0.77"^^xsd:float ;
>>>         ] ;
>>>       ot:classStatisticValue
>>>         [ a ot:ClassStatisticValue ;
>>>           ot:classStatistic default:numTruePositives ;
>>>           ot:value "123"^^xsd:int ;
>>>         ] ;
>>>       ot:classStatisticValue
>>>       ...
>>>
>>>   ot:classStatisticEntry
>>>     [ a ot:classStatisticEntry ;
>>>       ot:classValue "inactive"^^xsd:string ;
>>>       ...
>>>
>>>   ot:classStatisticEntry
>>>     [ a ot:classStatisticEntry ;
>>>       ot:classValue "active"^^xsd:string ;
>>>       ...
>>> ]]
>>>
>>> Here is the old classification statistics object (I renamed it from
>>> ClassifcationInformation to ClassificationStatistics):
>>> http://www.opentox.org/data/documents/development/RDF%20files/Validation/#-ot-classificationinfo-rdf
>>>
>>> Any comments, corrections before I add that to the opentox.owl?
>>>
>>> Best regards,
>>> Martin
>>>
>>>
>>>
>>>       



