[OTDev] Java Examples for Dataset Creation/statistics/validation

Nina Jeliazkova nina at acad.bg
Wed Dec 9 14:24:17 CET 2009


Martin Guetlein wrote:
> Hi Nina, All,
>
> On Wed, Dec 9, 2009 at 8:26 AM, Nina Jeliazkova <nina at acad.bg> wrote:
>   
>> Hi Pantelis,
>>
>> chung wrote:
>>     
>>> Hi Nina,
>>> At http://www.opentox.org/data/documents/development/RDF%20files/AlgorithmTypes/view?searchterm=Algorithm%20Types%20ontology
>>> (the ontology for all algorithm types we use in OT), all algorithm types
>>> appear to be Resources, not Literals. However, in the Jena examples at
>>> http://www.opentox.org/data/documents/development/RDF%20files/JavaOnly/JenaExamples
>>> the object that answers the question:
>>>
>>> <http://myservice.com/algorithm/id>
>>> <http://www.opentox.org/api/1.1#isA> ?obj
>>>
>>> is a literal. The corresponding triple is:
>>>
>>> Subject:
>>> http://opentox.ntua.gr:3000/algorithm/mlr
>>> Predicate:
>>> http://www.opentox.org/api/1.1#isA
>>> Object:
>>> "http://www.opentox.org/algorithmTypes.owl#RegressionEagerSingleTarget"
>>>
>>> Is this correct? If yes, should we use literals in that case or
>>> resources?
>>>
>>>
>>>       
>> It should be resources, not literals (the range of the isA property is a
>> resource). I'll update the example ASAP.
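A minimal sketch of the difference, using only the Java standard library (not the Jena API) so that it runs without dependencies; the class and record names are hypothetical. It renders the isA triple from the question in N-Triples syntax, once with the algorithm type as a string literal and once as an IRI resource. In Jena, the resource form would come from `Model.createResource(String)` rather than from a literal factory.

```java
public class IsATriple {
    static final String XSD_STRING = "http://www.w3.org/2001/XMLSchema#string";

    // Hypothetical model: an RDF object is either an IRI (a resource) or a typed literal.
    interface Obj {}
    record Iri(String uri) implements Obj {}
    record TypedLiteral(String lexical, String datatype) implements Obj {}

    // Render one triple in N-Triples syntax: IRIs in <...>, literals as "..."^^<datatype>.
    static String nTriple(String subject, String predicate, Obj object) {
        String o;
        if (object instanceof Iri i) {
            o = "<" + i.uri() + ">";
        } else if (object instanceof TypedLiteral l) {
            // Escape backslashes and quotes inside the literal's lexical form.
            String escaped = l.lexical().replace("\\", "\\\\").replace("\"", "\\\"");
            o = "\"" + escaped + "\"^^<" + l.datatype() + ">";
        } else {
            throw new IllegalArgumentException("unknown object kind: " + object);
        }
        return "<" + subject + "> <" + predicate + "> " + o + " .";
    }

    public static void main(String[] args) {
        String alg = "http://opentox.ntua.gr:3000/algorithm/mlr";
        String isA = "http://www.opentox.org/api/1.1#isA";
        String type = "http://www.opentox.org/algorithmTypes.owl#RegressionEagerSingleTarget";
        // Problematic: the type is only a string, so nothing links it to the ontology class.
        System.out.println(nTriple(alg, isA, new TypedLiteral(type, XSD_STRING)));
        // Preferred: the object is the ontology class itself, matching a resource range for isA.
        System.out.println(nTriple(alg, isA, new Iri(type)));
    }
}
```

The second line printed is the form the thread agrees on; only that form lets a client dereference or reason over the algorithm type.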
>>     
>>> The same holds for the supported statistics. The Java code snippet
>>> produces RDF that includes the triple:
>>>
>>> http://opentox.ntua.gr:3000/algorithm/mlr
>>> http://www.opentox.org/api/1.1#statisticsSupported
>>> "statistics-1"^^http://www.w3.org/2001/XMLSchema#string
>>>
>>> Shouldn't the object
>>> ("statistics-1"^^http://www.w3.org/2001/XMLSchema#string ) be a Resource
>>> instead of a string? If it were a Resource, one could assign properties
>>> to it; for example, one could declare its type.
>>>
>>>       
>> In the current opentox.owl, supported statistics are simply literals
>> (just to follow the old XML spec), but I agree it would be better if
>> the "statisticsSupported" values were resources.
>> On another note, statisticsSupported is closely related to the
>> Validation service, which already defines several statistics
>> specific to classification and regression models. I am not sure how, or
>> whether, the Validation service (or another client or service) uses the
>> information from the "statisticsSupported" field, but it would be good
>> if this information were exploited somehow.
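A stdlib-only Java sketch of what statisticsSupported could look like if its values were resources, which is what would allow properties to be attached to them. The statistic IRI (`http://example.org/statistics/1`) and its class (`http://example.org/ns#Statistic`) are hypothetical placeholders, not terms from opentox.owl; only the ot:statisticsSupported predicate and the RDF/RDFS vocabulary are taken as given.

```java
import java.util.ArrayList;
import java.util.List;

public class StatisticAsResource {
    static final String OT = "http://www.opentox.org/api/1.1#";
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label";
    // Hypothetical IRIs for illustration only; neither is defined in opentox.owl.
    static final String STAT = "http://example.org/statistics/1";
    static final String STAT_CLASS = "http://example.org/ns#Statistic";

    static String iri(String u) { return "<" + u + ">"; }
    static String lit(String s) { return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""; }
    static String triple(String s, String p, String o) { return s + " " + p + " " + o + " ."; }

    // N-Triples describing an algorithm that supports one statistic, modelled as a resource.
    static List<String> describe(String algorithm) {
        List<String> out = new ArrayList<>();
        // The algorithm points at the statistic resource instead of at a string.
        out.add(triple(iri(algorithm), iri(OT + "statisticsSupported"), iri(STAT)));
        // Because the statistic is a resource, it can carry its own type and label.
        out.add(triple(iri(STAT), iri(RDF_TYPE), iri(STAT_CLASS)));
        out.add(triple(iri(STAT), iri(RDFS_LABEL), lit("statistics-1")));
        return out;
    }

    public static void main(String[] args) {
        describe("http://opentox.ntua.gr:3000/algorithm/mlr").forEach(System.out::println);
    }
}
```

With this shape, a validation client could look up the statistic's class instead of parsing a free-text string.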
>>
>> Tobias, Martin, what do you think?
>>     
>
> I don't quite understand what the statisticsSupported flag is about.
> In the example on the overview page
> (http://www.opentox.org/data/documents/development/RDF%20files/Overview)
> the svm algorithm supports all the regression statistics listed in the
> validation object so far. Would it not be enough to state that it is a
> regression algorithm (then the RegressionStatistics object in the
> validation result will be set)?
> Or do I misinterpret the functionality? If so, could you give an example?
>   
I guess TUM/NTUA could answer this better.
> This leads to another question regarding validation. AFAIK there is no
> regression/classification flag in prediction models(?). That's why I'm
>   
This is supposed to be handled via the AlgorithmTypes ontology
http://opentox.org/data/documents/development/RDF%20files/AlgorithmTypes/view

and each Algorithm is supposed to declare a link to that ontology via
the ot:isA or owl:sameAs property.

> planning to distinguish between regression and classification via data
> type of the prediction feature (numerical -> regression, else
> classification). Do you think that's sufficient?
>   
This might not be sufficient to handle, e.g., Toxtree or clustering
algorithms.
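A hedged sketch of how a validation client might combine the two ideas above: prefer the algorithm type declared via ot:isA, and fall back to Martin's datatype heuristic (numerical prediction feature means regression, anything else classification) only when no recognizable type is declared. The class, the `Task` enum and the assumption that AlgorithmTypes class names start with "Regression", "Classification" or "Clustering" are all hypothetical; only `RegressionEagerSingleTarget` appears in this thread.

```java
import java.util.Optional;
import java.util.Set;

public class TaskGuess {
    enum Task { REGRESSION, CLASSIFICATION, CLUSTERING, UNKNOWN }

    // XSD numeric datatypes treated as "numerical" by the fallback heuristic.
    static final Set<String> NUMERIC = Set.of("double", "float", "decimal", "integer", "int", "long");

    // Part of the IRI after '#' (the whole string if there is no '#').
    static String localName(String iri) {
        return iri.substring(iri.lastIndexOf('#') + 1);
    }

    // Assumption: AlgorithmTypes class names start with Regression/Classification/Clustering.
    static Task fromAlgorithmType(String typeIri) {
        String local = localName(typeIri);
        if (local.startsWith("Regression")) return Task.REGRESSION;
        if (local.startsWith("Classification")) return Task.CLASSIFICATION;
        if (local.startsWith("Clustering")) return Task.CLUSTERING;
        return Task.UNKNOWN;
    }

    // Fallback heuristic: numerical prediction feature -> regression, else classification.
    static Task fromFeatureDatatype(String xsdTypeIri) {
        return NUMERIC.contains(localName(xsdTypeIri)) ? Task.REGRESSION : Task.CLASSIFICATION;
    }

    // Prefer the declared type; use the heuristic only when it is missing or unrecognized.
    static Task guess(Optional<String> declaredType, String featureDatatype) {
        return declaredType.map(TaskGuess::fromAlgorithmType)
                .filter(t -> t != Task.UNKNOWN)
                .orElseGet(() -> fromFeatureDatatype(featureDatatype));
    }

    public static void main(String[] args) {
        String xsd = "http://www.w3.org/2001/XMLSchema#";
        String mlrType = "http://www.opentox.org/algorithmTypes.owl#RegressionEagerSingleTarget";
        System.out.println(guess(Optional.of(mlrType), xsd + "string"));
        System.out.println(guess(Optional.empty(), xsd + "double"));
        System.out.println(guess(Optional.empty(), xsd + "string"));
    }
}
```

The limitation Nina points out remains visible here: a clustering algorithm or Toxtree that declares no recognizable type falls through to the heuristic and is labelled regression or classification, so the declared type has to be checked first.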

Best regards,
Nina
> Best regards,
> Martin
>
>
>   
>> Best regards,
>> Nina
>>     
>>> These phenomena do not appear in the RDF representation of a dataset
>>> where most elements are Resources instead of Literals.
>>>
>>> On Tue, 2009-12-08 at 13:45 +0200, Nina Jeliazkova wrote:
>>>
>>>       
>>>> Hi Pantelis,
>>>>
>>>> In principle yes (the Class of the resource should be defined and this
>>>> is done via RDFType), but there is already in the example
>>>>
>>>>      OT.OTClass.Dataset.createOntClass(jenaModel);
>>>>
>>>> which does the same, if jenaModel is an OntModel.
>>>>
>>>>         
>>> When I add this piece of code, the following triple is additionally
>>> included in the representation:
>>>
>>> * http://sth.com/dataset/1
>>> * http://www.w3.org/1999/02/22-rdf-syntax-ns#type
>>> * http://www.opentox.org/api/1.1#Dataset
>>>
>>> which is absent if
>>> dataset.addRDFType(OT.OTClass.Dataset.createProperty(datasetModel)); is
>>> not included. Without it, the only triple present is:
>>>
>>> * http://www.opentox.org/api/1.1#Dataset
>>> * http://www.w3.org/1999/02/22-rdf-syntax-ns#type
>>> * http://www.w3.org/2002/07/owl#Class
>>>
>>> which simply states that Dataset is of type Class. This provides no
>>> information about the dataset itself as an instance, only about the
>>> resource http://www.opentox.org/api/1.1#Dataset. So we have not
>>> declared that http://sth.com/dataset/1 is of type
>>> http://www.opentox.org/api/1.1#Dataset, which in turn is a Class.
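A stdlib-only Java sketch of this distinction, modelling the graph as a plain list of hypothetical `Triple` records rather than a Jena model, and without any inference. The question "is http://sth.com/dataset/1 a Dataset?" can only be answered once the instance-level rdf:type triple is present; the class declaration alone does not answer it. In Jena itself, that instance triple can be added with `Resource.addProperty(RDF.type, ...)` or with `OntResource.addRDFType(...)`, as in the snippet discussed in this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class DatasetTyping {
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String OWL_CLASS = "http://www.w3.org/2002/07/owl#Class";
    static final String OT_DATASET = "http://www.opentox.org/api/1.1#Dataset";

    // Hypothetical minimal triple type; records give value-based equals().
    record Triple(String s, String p, String o) {}

    // True if the graph directly states "subject rdf:type cls" (no reasoning).
    static boolean isTypedAs(List<Triple> graph, String subject, String cls) {
        return graph.contains(new Triple(subject, RDF_TYPE, cls));
    }

    public static void main(String[] args) {
        String ds = "http://sth.com/dataset/1";
        List<Triple> graph = new ArrayList<>();
        // Statement about the class itself: ot:Dataset rdf:type owl:Class.
        graph.add(new Triple(OT_DATASET, RDF_TYPE, OWL_CLASS));
        System.out.println(isTypedAs(graph, ds, OT_DATASET)); // false
        // Statement about the individual: dataset/1 rdf:type ot:Dataset.
        graph.add(new Triple(ds, RDF_TYPE, OT_DATASET));
        System.out.println(isTypedAs(graph, ds, OT_DATASET)); // true
    }
}
```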
>>>
>>>
>>> Best Regards,
>>> Pantelis
>>>
>>>
>>>       
>>>> Regards,
>>>> Nina
>>>> chung wrote:
>>>>
>>>>         
>>>>> Hi Nina,
>>>>>  I think we have to include the following line in the code for the
>>>>> creation of an RDF representation for datasets:
>>>>>
>>>>> dataset.addRDFType(OT.OTClass.Dataset.createProperty(datasetModel));
>>>>>
>>>>> This declares that the Resource under consideration is a Dataset.
>>>>>
>>>>> P.S. Thanks for the snippets!
>>>>>
>>>>> Best regards,
>>>>> Pantelis
>>>>>
>>>>>
>>>>>           
>>> _______________________________________________
>>> Development mailing list
>>> Development at opentox.org
>>> http://www.opentox.org/mailman/listinfo/development
>>>
>>>       
>>
>>     
>
>
>
>   