[OTDev] Java Examples for Dataset Creation/statistics/validation

Tobias Girschick tobias.girschick at in.tum.de
Wed Dec 9 16:26:35 CET 2009


Hi Martin,

On Wed, 2009-12-09 at 15:24 +0200, Nina Jeliazkova wrote: 
> Martin Guetlein wrote:
> > Hi Nina, All,
> >
> > On Wed, Dec 9, 2009 at 8:26 AM, Nina Jeliazkova <nina at acad.bg> wrote:
> >   
> >> Hi Pantelis,
> >>
> >> chung wrote:
> >>     
> >>> Hi Nina,
> >>>  At
> >>> http://www.opentox.org/data/documents/development/RDF%20files/AlgorithmTypes/view?searchterm=Algorithm%20Types%20ontology
> >>> (the ontology for all algorithm types we use in OT), all algorithm types
> >>> appear to be Resources, not Literals. However, in
> >>> http://www.opentox.org/data/documents/development/RDF%20files/JavaOnly/JenaExamples
> >>> the object which answers the question:
> >>>
> >>> <http://myservice.com/algorithm/id>
> >>> <http://www.opentox.org/api/1.1#isA> ?obj
> >>>
> >>> is a literal. The corresponding triple is:
> >>>
> >>> Subject:
> >>> http://opentox.ntua.gr:3000/algorithm/mlr
> >>> Predicate:
> >>> http://www.opentox.org/api/1.1#isA
> >>> Object:
> >>> "http://www.opentox.org/algorithmTypes.owl#RegressionEagerSingleTarget"
> >>>
> >>> Is this correct? If yes, should we use literals in that case or
> >>> resources?
> >>>
> >>>
> >>>       
> >> They should be resources, not literals (the range of the isA property is
> >> a resource). I'll update the example ASAP.
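
To make the difference concrete, here is a minimal sketch that prints the same statement twice in N-Triples notation, once with the object as a resource (IRI in angle brackets) and once as a plain literal (quoted string). It uses plain Java strings rather than Jena so that it stays self-contained; the class and method names are hypothetical, and the literal escaping is simplified to backslashes and double quotes only.

```java
// Hypothetical helper, not part of the OpenTox examples or of Jena.
class NTripleSketch {

    /** Serializes a triple whose object is a resource (an IRI). */
    static String resourceTriple(String subject, String predicate, String objectUri) {
        return "<" + subject + "> <" + predicate + "> <" + objectUri + "> .";
    }

    /** Serializes a triple whose object is a plain literal (simplified escaping). */
    static String literalTriple(String subject, String predicate, String lexical) {
        String escaped = lexical.replace("\\", "\\\\").replace("\"", "\\\"");
        return "<" + subject + "> <" + predicate + "> \"" + escaped + "\" .";
    }

    public static void main(String[] args) {
        String s = "http://opentox.ntua.gr:3000/algorithm/mlr";
        String p = "http://www.opentox.org/api/1.1#isA";
        String o = "http://www.opentox.org/algorithmTypes.owl#RegressionEagerSingleTarget";
        // Intended form: the object is a node that can carry further statements.
        System.out.println(resourceTriple(s, p, o));
        // Form produced by the current example: the object is only a string.
        System.out.println(literalTriple(s, p, o));
    }
}
```

Only the first form lets a client follow the object into the AlgorithmTypes ontology; the second is an opaque string that happens to look like a URI.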
> >>     
> >>> The same holds for the supported statistics. The java code snippet
> >>> produces an RDF which includes the triple:
> >>>
> >>> http://opentox.ntua.gr:3000/algorithm/mlr
> >>> http://www.opentox.org/api/1.1#statisticsSupported
> >>> "statistics-1"^^http://www.w3.org/2001/XMLSchema#string
> >>>
> >>> Shouldn't the object
> >>> ("statistics-1"^^http://www.w3.org/2001/XMLSchema#string) be a Resource
> >>> instead of a string? If it were a Resource, one could assign properties
> >>> to it; for example, one could declare its type.
> >>>
> >>>       
> >> In the current opentox.owl, supported statistics are simply literals
> >> (just to follow the old XML spec), but I agree it would be better if
> >> the "statisticsSupported" values were resources.
> >> On another note, statisticsSupported is closely related to the
> >> Validation service, which already defines several statistics
> >> specific to classification and regression models.  I am not sure how/if
> >> the Validation service (or another client or service) uses the
> >> information from the "statisticsSupported" field, but it would be good
> >> if this information were somehow exploited.
> >>
> >> Tobias, Martin, what do you think?
> >>     
> >
> > I don't quite understand what the statisticsSupported flag is about.
> > In the example on the overview page
> > (http://www.opentox.org/data/documents/development/RDF%20files/Overview)
> > the svm algorithm supports all the regression statistics listed in the
> > validation object so far. Would it not be enough to state that it is a
> > regression algorithm (then the RegressionStatistics object in the
> > validation result will be set)?
> > Or do I misinterpret the functionality? If so, could you give an example?
> >   
> I guess TUM /NTUA could answer better.

I think that at the moment all regression algorithms support all the
statistics. But this might not always be the case. Some algorithms might,
for example, supply only an RMSE and no other quality measure.
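
A minimal sketch of how a client such as the validation service could use this information, assuming statisticsSupported holds resource URIs rather than string literals. The class name, the namespace, the statistic URIs, and the helper method are all hypothetical and not taken from opentox.owl; plain Java collections stand in for an RDF model.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these names come from the OpenTox API.
class SupportedStatistics {
    // Assumed namespace and statistic names, for illustration only.
    static final String NS = "http://example.org/statistics#";
    static final String RMSE = NS + "RMSE";
    static final String R_SQUARE = NS + "RSquare";
    static final String MAE = NS + "MeanAbsoluteError";

    /** Returns the requested statistics that the algorithm declares, in request order. */
    static Set<String> reportable(List<String> requested, Set<String> declared) {
        Set<String> result = new LinkedHashSet<>(requested);
        result.retainAll(declared); // drop everything the algorithm does not declare
        return result;
    }

    public static void main(String[] args) {
        // An algorithm that supplies only an RMSE.
        Set<String> declared = Set.of(RMSE);
        List<String> requested = List.of(RMSE, R_SQUARE, MAE);
        System.out.println(reportable(requested, declared));
    }
}
```

With such a declaration, a client would know in advance which statistics to expect instead of assuming that every regression algorithm reports all of them.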


> > This leads to another question regarding validation. AFAIK there is no
> > regression/classification flag in prediction models(?). That's why I'm
> >   
> This is supposed to be handled via the AlgorithmTypes ontology 
> http://opentox.org/data/documents/development/RDF%20files/AlgorithmTypes/view 
> 
> and each Algorithm is supposed to declare a link to that ontology via
> ot:isA or owl:sameAs property. 
> 
> > planning to distinguish between regression and classification via data
> > type of the prediction feature (numerical -> regression, else
> > classification). Do you think that's sufficient?
> >   
> This might not be sufficient to handle e.g. Toxtree or clustering
> algorithms.
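
One way to combine the two ideas, sketched under stated assumptions: prefer the algorithm types declared via ot:isA, and fall back to the data type of the prediction feature only when nothing is declared. The class, enum, and method names are hypothetical. Matching on the local-name prefixes "Regression" and "Classification" is a simplification; only RegressionEagerSingleTarget appears in this thread, and a real client would more likely resolve subclass relations in the ontology.

```java
import java.util.Set;

// Hypothetical sketch; not part of the validation service.
class TaskResolver {
    enum Task { REGRESSION, CLASSIFICATION, UNKNOWN }

    static final String TYPES_NS = "http://www.opentox.org/algorithmTypes.owl#";

    /** Inspects declared type URIs; UNKNOWN if none names a regression or classification type. */
    static Task fromDeclaredTypes(Set<String> typeUris) {
        for (String uri : typeUris) {
            if (!uri.startsWith(TYPES_NS)) {
                continue; // not from the AlgorithmTypes ontology
            }
            String local = uri.substring(TYPES_NS.length());
            if (local.startsWith("Regression")) {
                return Task.REGRESSION;
            }
            if (local.startsWith("Classification")) {
                return Task.CLASSIFICATION;
            }
        }
        return Task.UNKNOWN;
    }

    /** Declared types win; the data-type heuristic is used only when no type is declared. */
    static Task resolve(Set<String> typeUris, boolean numericPredictionFeature) {
        Task declared = fromDeclaredTypes(typeUris);
        if (declared != Task.UNKNOWN) {
            return declared;
        }
        if (typeUris.isEmpty()) {
            return numericPredictionFeature ? Task.REGRESSION : Task.CLASSIFICATION;
        }
        // Types are declared but are neither regression nor classification
        // (e.g. clustering), so do not guess from the data type.
        return Task.UNKNOWN;
    }
}
```

For an algorithm that declares a type outside both families, resolve returns UNKNOWN instead of guessing, which is the case the data-type heuristic alone would get wrong.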
> 
> Best regards,
> Nina
> > Best regards,
> > Martin
> >
> >
> >   
> >> Best regards,
> >> Nina
> >>     
> >>> These phenomena do not appear in the RDF representation of a dataset
> >>> where most elements are Resources instead of Literals.
> >>>
> >>> On Tue, 2009-12-08 at 13:45 +0200, Nina Jeliazkova wrote:
> >>>
> >>>       
> >>>> Hi Pantelis,
> >>>>
> >>>> In principle yes (the Class of the resource should be defined and this
> >>>> is done via RDFType), but there is already in the example
> >>>>
> >>>>      OT.OTClass.Dataset.createOntClass(jenaModel);
> >>>>
> >>>> which does the same, if jenaModel is an OntModel.
> >>>>
> >>>>         
> >>> When I add this piece of code, the following triple is additionally
> >>> included in the representation:
> >>>
> >>> * http://sth.com/dataset/1
> >>> * http://www.w3.org/1999/02/22-rdf-syntax-ns#type
> >>> * http://www.opentox.org/api/1.1#Dataset
> >>>
> >>> which is absent if
> >>> dataset.addRDFType(OT.OTClass.Dataset.createProperty(datasetModel)); is
> >>> not included. In that case, the only triple present is this:
> >>>
> >>> * http://www.opentox.org/api/1.1#Dataset
> >>> * http://www.w3.org/1999/02/22-rdf-syntax-ns#type
> >>> * http://www.w3.org/2002/07/owl#Class
> >>>
> >>> which simply implies that Dataset is of type Class. This doesn't provide
> >>> information about the dataset itself, as an instance, but only about the
> >>> resource http://www.opentox.org/api/1.1#Dataset. So this way, we have
> >>> not defined that http://sth.com/dataset/1 is of type
> >>> http://www.opentox.org/api/1.1#Dataset, which in turn is a Class.
> >>>
> >>>
> >>> Best Regards,
> >>> Pantelis
> >>>
> >>>
> >>>       
> >>>> Regards,
> >>>> Nina
> >>>> chung wrote:
> >>>>
> >>>>         
> >>>>> Hi Nina,
> >>>>>  I think we have to include the following line in the code for the
> >>>>> creation of an RDF representation for datasets:
> >>>>>
> >>>>> dataset.addRDFType(OT.OTClass.Dataset.createProperty(datasetModel));
> >>>>>
> >>>>> This declares that the Resource under consideration is a Dataset.
> >>>>>
> >>>>> P.S. Thanks for the snippets!
> >>>>>
> >>>>> Best regards,
> >>>>> Pantelis
> >>>>>
> >>>>>
> >>>>>           
> >>> _______________________________________________
> >>> Development mailing list
> >>> Development at opentox.org
> >>> http://www.opentox.org/mailman/listinfo/development
> >>>
> >>>       


-- 
Dipl.-Bioinf. Tobias Girschick

Technische Universität München
Institut für Informatik
Lehrstuhl I12 - Bioinformatik
Boltzmannstr. 3
85748 Garching b. München, Germany

Room: MI 01.09.042
Phone: +49 (89) 289-18002
Email: tobias.girschick at in.tum.de
Web: http://wwwkramer.in.tum.de/girschick



