[OTDev] Java Examples for Dataset Creation/statistics/validation - questions

Nina Jeliazkova nina at acad.bg
Wed Dec 9 16:56:43 CET 2009


Tobias, Martin, All,

Tobias Girschick wrote:
> Hi Martin,
>
> On Wed, 2009-12-09 at 15:24 +0200, Nina Jeliazkova wrote: 
>   
>> Martin Guetlein wrote:
>>     
>>> Hi Nina, All,
>>>
>>> On Wed, Dec 9, 2009 at 8:26 AM, Nina Jeliazkova <nina at acad.bg> wrote:
>>>   
>>>       
>>>> Hi Pantelis,
>>>>
>>>> chung wrote:
>>>>     
>>>>         
>>>>> Hi Nina,
>>>>>  At http://www.opentox.org/data/documents/development/RDF%20files/AlgorithmTypes/view?searchterm=Algorithm%20Types%20ontology
>>>>> (the ontology for all algorithm types we use in OT), all algorithm
>>>>> types appear to be Resources, not Literals. However, in
>>>>> http://www.opentox.org/data/documents/development/RDF%20files/JavaOnly/JenaExamples
>>>>> the object which answers the question:
>>>>>
>>>>> <http://myservice.com/algorithm/id>
>>>>> <http://www.opentox.org/api/1.1#isA> ?obj
>>>>>
>>>>> is a literal. The corresponding triple is:
>>>>>
>>>>> Subject:
>>>>> http://opentox.ntua.gr:3000/algorithm/mlr
>>>>> Predicate:
>>>>> http://www.opentox.org/api/1.1#isA
>>>>> Object:
>>>>> "http://www.opentox.org/algorithmTypes.owl#RegressionEagerSingleTarget"
>>>>>
>>>>> Is this correct? If yes, should we use literals in that case or
>>>>> resources?
>>>>>
>>>>>
>>>>>       
>>>>>           
>>>> Should be resources, not literals (the range of the isA property is a
>>>> resource). I'll update the example ASAP.
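To make the difference concrete, here is a minimal, dependency-free sketch (plain Java rather than Jena, so it runs without extra libraries) that writes the ot:isA statement from the example above in N-Triples form both ways. The class and method names are hypothetical; only the URIs come from this thread. In Jena terms, the resource form is what you get by passing a Resource to Resource.addProperty(Property, RDFNode), whereas addProperty(Property, String) creates a literal.

```java
/** Hypothetical helper that prints one RDF statement as an N-Triples line. */
final class IsATriples {
    static final String SUBJECT = "http://opentox.ntua.gr:3000/algorithm/mlr";
    static final String IS_A = "http://www.opentox.org/api/1.1#isA";
    static final String TYPE =
        "http://www.opentox.org/algorithmTypes.owl#RegressionEagerSingleTarget";

    /** Object is a resource (IRI): N-Triples writes it in angle brackets. */
    static String withResourceObject(String s, String p, String o) {
        return "<" + s + "> <" + p + "> <" + o + "> .";
    }

    /** Object is a plain literal: written in double quotes (no escaping done here). */
    static String withLiteralObject(String s, String p, String o) {
        return "<" + s + "> <" + p + "> \"" + o + "\" .";
    }

    public static void main(String[] args) {
        // What the ontology expects: the range of ot:isA is a resource.
        System.out.println(withResourceObject(SUBJECT, IS_A, TYPE));
        // What the current example produces: the same URI as a string literal.
        System.out.println(withLiteralObject(SUBJECT, IS_A, TYPE));
    }
}
```

Only the first line tells an RDF/OWL reader that the algorithm is linked to a class from algorithmTypes.owl; the second is just a string that happens to look like a URI.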
>>>>     
>>>>         
>>>>> The same holds for the supported statistics. The Java code snippet
>>>>> produces RDF which includes the triple:
>>>>>
>>>>> http://opentox.ntua.gr:3000/algorithm/mlr
>>>>> http://www.opentox.org/api/1.1#statisticsSupported
>>>>> "statistics-1"^^http://www.w3.org/2001/XMLSchema#string
>>>>>
>>>>> Shouldn't the object
>>>>> ("statistics-1"^^http://www.w3.org/2001/XMLSchema#string) be a Resource
>>>>> instead of a string? If it were a Resource, one could assign properties
>>>>> to it, for example declare its type.
>>>>>
>>>>>       
>>>>>           
>>>> In the current opentox.owl, supported statistics are simply literals
>>>> (just to follow the old XML spec), but I agree it would be better if
>>>> "statisticsSupported" values were resources.
>>>> On another note, statisticsSupported is closely related to the
>>>> Validation service, which already defines several statistics
>>>> specific to classification and regression models. I am not sure how or
>>>> whether the Validation service (or another client or service) uses the
>>>> information from the "statisticsSupported" field, but it would be good if
>>>> this information were exploited somehow.
>>>>
>>>> Tobias, Martin, what do you think?
>>>>     
>>>>         
>>> I don't quite understand what the statisticsSupported flag is about.
>>> In the example on the overview page
>>> (http://www.opentox.org/data/documents/development/RDF%20files/Overview)
>>> the svm algorithm supports all the regression statistics listed in the
>>> validation object so far. Would it not be enough to state that it is a
>>> regression algorithm (then the RegressionStatistics object in the
>>> validation result will be set)?
>>> Or do I misinterpret the functionality? If so, could you give an example?
>>>   
>>>       
>> I guess TUM/NTUA could answer this better.
>>     
>
> I think that at the moment all regression algorithms support all the
> statistics. But this might not always be the case. Some algorithm might,
> for example, supply only an RMSE and no other quality measure.
>
>   
I would like to post a few questions to clarify the
Algorithm - Model - Validation relationship:

- What is the intended usage of "statisticsSupported"?
- Currently "statisticsSupported" values are declared as arbitrary strings.
Is this sufficient, or do we need a stricter specification, either as a
Resource or as a list of allowed values? (This was the original
question in this thread by Pantelis.)
- How could one retrieve statistics (e.g. RMSE) directly from an
existing model? Should we have a specific API?
- Does the Validation service rely on such functionality, or does it
calculate the relevant statistics itself?
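As an illustration of the second question only (not a proposal for opentox.owl), a minimal sketch of statisticsSupported as a closed list of allowed values, where each value maps to a resource URI instead of a free string. The enum constants, the namespace http://example.org/statistics# and all class names are hypothetical; the "only RMSE" algorithm mirrors Tobias's example above and its URI is made up.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical controlled vocabulary of statistics (not defined in opentox.owl). */
enum Statistic {
    RMSE, MAE, R_SQUARED, ACCURACY;

    /** Hypothetical namespace, so that each statistic is a resource, not a free string. */
    static final String NS = "http://example.org/statistics#";

    String uri() {
        return NS + name();
    }
}

/** Hypothetical description of an algorithm and the statistics it can report. */
final class AlgorithmDescription {
    final String uri;
    final Set<Statistic> statisticsSupported;

    AlgorithmDescription(String uri, Set<Statistic> statisticsSupported) {
        this.uri = uri;
        // EnumSet.copyOf rejects an empty non-EnumSet collection, so handle that case.
        this.statisticsSupported = statisticsSupported.isEmpty()
                ? EnumSet.noneOf(Statistic.class)
                : EnumSet.copyOf(statisticsSupported);
    }

    boolean supports(Statistic s) {
        return statisticsSupported.contains(s);
    }

    public static void main(String[] args) {
        // An algorithm that reports RMSE and no other quality measure.
        AlgorithmDescription onlyRmse = new AlgorithmDescription(
                "http://example.org/algorithm/1", EnumSet.of(Statistic.RMSE));
        System.out.println(onlyRmse.supports(Statistic.RMSE));      // true
        System.out.println(onlyRmse.supports(Statistic.R_SQUARED)); // false
        System.out.println(Statistic.RMSE.uri()); // http://example.org/statistics#RMSE

        // A free string such as "statistics-1" is not part of the vocabulary.
        try {
            Statistic.valueOf("statistics-1");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: statistics-1");
        }
    }
}
```

The point of the closed list is that a client (e.g. Validation) can check supports(RMSE) reliably, while a value like "statistics-1" is rejected instead of silently accepted.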

Best regards,
Nina
>   
>>> This leads to another question regarding validation. AFAIK there is no
>>> regression/classification flag in prediction models(?). That's why I'm
>>>   
>>>       
>> This is supposed to be handled via the AlgorithmTypes ontology
>> http://opentox.org/data/documents/development/RDF%20files/AlgorithmTypes/view
>>
>> and each Algorithm is supposed to declare a link to that ontology via
>> the ot:isA or owl:sameAs property.
>>
>>     
>>> planning to distinguish between regression and classification via data
>>> type of the prediction feature (numerical -> regression, else
>>> classification). Do you think that's sufficient?
>>>   
>>>       
>> This might not be sufficient to handle e.g. Toxtree or clustering
>> algorithms.
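A minimal sketch of the order of checks discussed here: use the algorithm type linked via ot:isA when it is declared, and fall back to the data type of the prediction feature only when it is not. All names are hypothetical, and the prefix matching on the type's local name is an assumption for illustration that has not been checked against algorithmTypes.owl (only "RegressionEagerSingleTarget" appears in this thread).

```java
/** Hypothetical task categories a validation client might distinguish. */
enum Task { REGRESSION, CLASSIFICATION, CLUSTERING, UNKNOWN }

final class TaskDetector {
    /**
     * @param algorithmType URI linked via ot:isA, or null if the algorithm declares none
     * @param predictionIsNumeric whether the prediction feature is numeric
     */
    static Task detect(String algorithmType, boolean predictionIsNumeric) {
        if (algorithmType != null) {
            // Local name after '#' (the whole URI if there is no '#').
            String local = algorithmType.substring(algorithmType.indexOf('#') + 1);
            // Assumption: algorithm type names start with the task name,
            // as "RegressionEagerSingleTarget" does.
            if (local.startsWith("Regression")) return Task.REGRESSION;
            if (local.startsWith("Classification")) return Task.CLASSIFICATION;
            if (local.startsWith("Clustering")) return Task.CLUSTERING;
            return Task.UNKNOWN; // declared but not recognized: do not guess
        }
        // Fallback heuristic proposed in the thread:
        // numeric prediction feature -> regression, otherwise classification.
        return predictionIsNumeric ? Task.REGRESSION : Task.CLASSIFICATION;
    }

    public static void main(String[] args) {
        String mlrType =
            "http://www.opentox.org/algorithmTypes.owl#RegressionEagerSingleTarget";
        System.out.println(detect(mlrType, true)); // REGRESSION
        // A clustering model without a declared type and with a nominal
        // output (cluster label) falls through to the heuristic:
        System.out.println(detect(null, false));   // CLASSIFICATION, i.e. misdetected
    }
}
```

The second call shows the limitation: from the prediction feature alone, a nominal clustering output looks exactly like a classification result, so the declared algorithm type has to take precedence.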
>>
>> Best regards,
>> Nina
>>     
>>> Best regards,
>>> Martin
>>>
>>>
>>>   
>>>       
>>>> Best regards,
>>>> Nina
>>>>     
>>>>         
>>>>> These phenomena do not appear in the RDF representation of a dataset
>>>>> where most elements are Resources instead of Literals.
>>>>>
>>>>> On Tue, 2009-12-08 at 13:45 +0200, Nina Jeliazkova wrote:
>>>>>
>>>>>       
>>>>>           
>>>>>> Hi Pantelis,
>>>>>>
>>>>>> In principle yes (the Class of the resource should be defined, and this
>>>>>> is done via RDFType), but the example already contains
>>>>>>
>>>>>>      OT.OTClass.Dataset.createOntClass(jenaModel);
>>>>>>
>>>>>> which does the same, if jenaModel is an OntModel.
>>>>>>
>>>>>>         
>>>>>>             
>>>>> When I add this piece of code, the following triple is additionally
>>>>> included in the representation:
>>>>>
>>>>> * http://sth.com/dataset/1
>>>>> * http://www.w3.org/1999/02/22-rdf-syntax-ns#type
>>>>> * http://www.opentox.org/api/1.1#Dataset
>>>>>
>>>>> which is absent if
>>>>> dataset.addRDFType(OT.OTClass.Dataset.createProperty(datasetModel)); is
>>>>> not included. Otherwise the only triple present is this:
>>>>>
>>>>> * http://www.opentox.org/api/1.1#Dataset
>>>>> * http://www.w3.org/1999/02/22-rdf-syntax-ns#type
>>>>> * http://www.w3.org/2002/07/owl#Class
>>>>>
>>>>> which simply states that Dataset is of type Class. This doesn't provide
>>>>> information about the dataset itself as an instance, but only about the
>>>>> resource http://www.opentox.org/api/1.1#Dataset. So this way, we have
>>>>> not stated that http://sth.com/dataset/1 is of type
>>>>> http://www.opentox.org/api/1.1#Dataset, which in turn is a Class.
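The difference between the two triples can be checked mechanically. Below is a minimal in-memory sketch (plain Java, no Jena; the class and method names are hypothetical) that asks whether http://sth.com/dataset/1 is typed as ot:Dataset. Following Pantelis's observation, the class declaration alone gives false; the instance-level rdf:type triple, as produced by dataset.addRDFType(...), makes it true. A real reasoner could infer more, but no inference yields the instance typing from the class declaration alone.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class TypeCheck {
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String OWL_CLASS = "http://www.w3.org/2002/07/owl#Class";
    static final String OT_DATASET = "http://www.opentox.org/api/1.1#Dataset";
    static final String DATASET = "http://sth.com/dataset/1";

    /** True only if the graph contains the triple (subject, rdf:type, cls). */
    static boolean isInstanceOf(Set<List<String>> graph, String subject, String cls) {
        return graph.contains(List.of(subject, RDF_TYPE, cls));
    }

    public static void main(String[] args) {
        // Each triple is a (subject, predicate, object) list of IRIs.
        Set<List<String>> graph = new HashSet<>();

        // Class declaration only: ot:Dataset is an owl:Class.
        graph.add(List.of(OT_DATASET, RDF_TYPE, OWL_CLASS));
        System.out.println(isInstanceOf(graph, DATASET, OT_DATASET)); // false

        // Instance typing: dataset/1 is an ot:Dataset.
        graph.add(List.of(DATASET, RDF_TYPE, OT_DATASET));
        System.out.println(isInstanceOf(graph, DATASET, OT_DATASET)); // true
    }
}
```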
>>>>>
>>>>>
>>>>> Best Regards,
>>>>> Pantelis
>>>>>
>>>>>
>>>>>       
>>>>>           
>>>>>> Regards,
>>>>>> Nina
>>>>>> chung wrote:
>>>>>>
>>>>>>         
>>>>>>             
>>>>>>> Hi Nina,
>>>>>>>  I think we have to include the following line in the code for the
>>>>>>> creation of an RDF representation for datasets:
>>>>>>>
>>>>>>> dataset.addRDFType(OT.OTClass.Dataset.createProperty(datasetModel));
>>>>>>>
>>>>>>> This declares that the Resource under consideration is a Dataset.
>>>>>>>
>>>>>>> P.S. Thanks for the snippets!
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Pantelis
>>>>>>>
>>>>>>>
>>>>>>>           
>>>>>>>               
>>>>> _______________________________________________
>>>>> Development mailing list
>>>>> Development at opentox.org
>>>>> http://www.opentox.org/mailman/listinfo/development
>>>>>
>>>>>       
>>>>>           
>>>>
>>>>     
>>>>         
>>>
>>>   
>>>       
>>     
>
>
>   



