[OTDev] Java Examples for Dataset Creation/statistics/validation - questions
Nina Jeliazkova nina at acad.bg
Wed Dec 9 16:56:43 CET 2009
- Previous message: [OTDev] Java Examples for Dataset Creation/statistics/validation
- Next message: [OTDev] Java Examples for Dataset Creation/statistics/validation - questions
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Tobias, Martin, All,

Tobias Girschick wrote:
> Hi Martin,
>
> On Wed, 2009-12-09 at 15:24 +0200, Nina Jeliazkova wrote:
>> Martin Guetlein wrote:
>>> Hi Nina, All,
>>>
>>> On Wed, Dec 9, 2009 at 8:26 AM, Nina Jeliazkova <nina at acad.bg> wrote:
>>>> Hi Pantelis,
>>>>
>>>> chung wrote:
>>>>> Hi Nina,
>>>>> At
>>>>> http://www.opentox.org/data/documents/development/RDF%20files/AlgorithmTypes/view?searchterm=Algorithm%20Types%20ontology
>>>>> (the ontology for all algorithm types we use in OT), all algorithm
>>>>> types appear to be Resources, not Literals. However, in
>>>>> http://www.opentox.org/data/documents/development/RDF%20files/JavaOnly/JenaExamples ,
>>>>> the object which answers the question:
>>>>>
>>>>> <http://myservice.com/algorithm/id>
>>>>> <http://www.opentox.org/api/1.1#isA> ?obj
>>>>>
>>>>> is a literal. The corresponding triple is:
>>>>>
>>>>> Subject:
>>>>> http://opentox.ntua.gr:3000/algorithm/mlr
>>>>> Predicate:
>>>>> http://www.opentox.org/api/1.1#isA
>>>>> Object:
>>>>> "http://www.opentox.org/algorithmTypes.owl#RegressionEagerSingleTarget"
>>>>>
>>>>> Is this correct? If yes, should we use literals in that case or
>>>>> resources?
>>>>>
>>>> Should be resources, not literals (the range of the isA property is a
>>>> resource). I'll update the example ASAP.
>>>>
>>>>> The same holds for the supported statistics. The Java code snippet
>>>>> produces an RDF which includes the triple:
>>>>>
>>>>> http://opentox.ntua.gr:3000/algorithm/mlr
>>>>> http://www.opentox.org/api/1.1#statisticsSupported
>>>>> "statistics-1"^^http://www.w3.org/2001/XMLSchema#string
>>>>>
>>>>> Shouldn't the object
>>>>> ("statistics-1"^^http://www.w3.org/2001/XMLSchema#string) be a Resource
>>>>> instead of a string? If it were a Resource, one could assign properties
>>>>> to it; for example, one could declare its type.
>>>>>
>>>> In the current opentox.owl supported statistics are simply literals
>>>> (just to follow the old XML spec), but I agree it will be better if
>>>> "statisticsSupported" are indeed resources.
>>>> On another note, statisticsSupported are closely related to the
>>>> Validation service, which has already defined several statistics
>>>> specific to classification and regression models. I am not sure how/if
>>>> the Validation service (or another client or service) uses the
>>>> information from the "statisticsSupported" field, but it would be good
>>>> if this information is somehow exploited.
>>>>
>>>> Tobias, Martin, what do you think?
>>>>
>>> I don't quite understand what the statisticsSupported flag is about.
>>> In the example on the overview page
>>> (http://www.opentox.org/data/documents/development/RDF%20files/Overview)
>>> the svm algorithm supports all the regression statistics listed in the
>>> validation object so far. Would it not be enough to state that it is a
>>> regression algorithm (then the RegressionStatistics object in the
>>> validation result will be set)?
>>> Or do I misinterpret the functionality? If so, could you give an example?
>>>
>> I guess TUM/NTUA could answer better.
>>
>
> I think that at the moment all regression algorithms support all the
> statistics. But this might not always be the case. Some algorithms might,
> for example, only supply an RMSE and no other quality measure.
>

I would like to post a few questions to clarify the
Algorithm - Model - Validation relationship:

- What is the assumed usage of "statisticsSupported"?
- Currently "statisticsSupported" are declared as arbitrary strings. Is this
  sufficient, or do we need a stricter specification: either a Resource or a
  list of allowed values? (This was the original question in this thread by
  Pantelis.)
- How could one retrieve statistics (e.g. RMSE) directly from an existing
  model? Should we have a specific API?
- Does the Validation service rely on such functionality, or does it
  calculate the relevant statistics itself?

Best regards,
Nina

>
>>> This leads to another question regarding validation. AFAIK there is no
>>> regression/classification flag in prediction models(?). That's why I'm
>>>
>> This is supposed to be handled via the AlgorithmTypes ontology
>> http://opentox.org/data/documents/development/RDF%20files/AlgorithmTypes/view
>>
>> and each Algorithm is supposed to declare a link to that ontology via
>> the ot:isA or owl:sameAs property.
>>
>>> planning to distinguish between regression and classification via the
>>> data type of the prediction feature (numerical -> regression, else
>>> classification). Do you think that's sufficient?
>>>
>> This might not be sufficient to handle e.g. Toxtree or clustering
>> algorithms.
>>
>> Best regards,
>> Nina
>>
>>> Best regards,
>>> Martin
>>>
>>>> Best regards,
>>>> Nina
>>>>
>>>>> These phenomena do not appear in the RDF representation of a dataset,
>>>>> where most elements are Resources instead of Literals.
>>>>>
>>>>> On Tue, 2009-12-08 at 13:45 +0200, Nina Jeliazkova wrote:
>>>>>
>>>>>> Hi Pantelis,
>>>>>>
>>>>>> In principle yes (the Class of the resource should be defined and this
>>>>>> is done via RDFType), but there is already in the example
>>>>>>
>>>>>> OT.OTClass.Dataset.createOntClass(jenaModel);
>>>>>>
>>>>>> which does the same, if jenaModel is an OntModel.
>>>>>>
>>>>> When I add this piece of code, the following triple is additionally
>>>>> included in the representation:
>>>>>
>>>>> * http://sth.com/dataset/1
>>>>> * http://www.w3.org/1999/02/22-rdf-syntax-ns#type
>>>>> * http://www.opentox.org/api/1.1#Dataset
>>>>>
>>>>> which is absent if
>>>>> dataset.addRDFType(OT.OTClass.Dataset.createProperty(datasetModel)); is
>>>>> not included. Otherwise the only triple present is this:
>>>>>
>>>>> * http://www.opentox.org/api/1.1#Dataset
>>>>> * http://www.w3.org/1999/02/22-rdf-syntax-ns#type
>>>>> * http://www.w3.org/2002/07/owl#Class
>>>>>
>>>>> which simply implies that Dataset is of type Class (this doesn't provide
>>>>> information about the dataset itself, as an instance, but only about the
>>>>> resource http://www.opentox.org/api/1.1#Dataset). So this way, we have
>>>>> not defined that http://sth.com/dataset/1 is of type
>>>>> http://www.opentox.org/api/1.1#Dataset, which in turn is a Class.
>>>>>
>>>>> Best Regards,
>>>>> Pantelis
>>>>>
>>>>>> Regards,
>>>>>> Nina
>>>>>> chung wrote:
>>>>>>
>>>>>>> Hi Nina,
>>>>>>> I think we have to include the following line in the code for the
>>>>>>> creation of an RDF representation for datasets:
>>>>>>>
>>>>>>> dataset.addRDFType(OT.OTClass.Dataset.createProperty(datasetModel));
>>>>>>>
>>>>>>> This declares that the Resource under consideration is a Dataset.
>>>>>>>
>>>>>>> P.S. Thanks for the snippets!
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Pantelis
>>>>>>>
>>>>> _______________________________________________
>>>>> Development mailing list
>>>>> Development at opentox.org
>>>>> http://www.opentox.org/mailman/listinfo/development
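
The two points discussed in this thread (a URI written as a literal versus as a resource, and declaring the Dataset class versus typing the dataset instance) can be illustrated with a minimal, library-free sketch. It deliberately does not use Jena: TripleSketch, Node, Triple, describeDataset and algorithmType are hypothetical names introduced only for illustration, and the output is a simplified N-Triples-like rendering. The URIs are the ones quoted in the thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, library-free sketch; these types are NOT the Jena API.
final class TripleSketch {
    static final String OT = "http://www.opentox.org/api/1.1#";
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    static final String OWL_CLASS = "http://www.w3.org/2002/07/owl#Class";

    /** Object of a triple: either a resource (URI) or a plain literal. */
    static final class Node {
        final String value;
        final boolean isResource;

        private Node(String value, boolean isResource) {
            this.value = value;
            this.isResource = isResource;
        }

        static Node resource(String uri) { return new Node(uri, true); }
        static Node literal(String text) { return new Node(text, false); }

        /** Resources render as <uri>, literals as "text". */
        String toNTriples() {
            if (isResource) {
                return "<" + value + ">";
            }
            String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
            return "\"" + escaped + "\"";
        }
    }

    /** Subject and predicate are always URIs in this sketch. */
    static final class Triple {
        final String subject;
        final String predicate;
        final Node object;

        Triple(String subject, String predicate, Node object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        String toNTriples() {
            return "<" + subject + "> <" + predicate + "> " + object.toNTriples() + " .";
        }
    }

    /** ot:isA with a resource object, as agreed in the thread. */
    static Triple algorithmType(String algorithmUri, String typeUri) {
        return new Triple(algorithmUri, OT + "isA", Node.resource(typeUri));
    }

    /**
     * The class declaration alone says nothing about the instance;
     * the second triple is the one that types the dataset itself.
     */
    static List<Triple> describeDataset(String datasetUri, boolean typeInstance) {
        List<Triple> triples = new ArrayList<>();
        triples.add(new Triple(OT + "Dataset", RDF_TYPE, Node.resource(OWL_CLASS)));
        if (typeInstance) {
            triples.add(new Triple(datasetUri, RDF_TYPE, Node.resource(OT + "Dataset")));
        }
        return triples;
    }

    public static void main(String[] args) {
        String mlr = "http://opentox.ntua.gr:3000/algorithm/mlr";
        String type = "http://www.opentox.org/algorithmTypes.owl#RegressionEagerSingleTarget";

        // Literal object (what the example produced) versus resource object.
        System.out.println(new Triple(mlr, OT + "isA", Node.literal(type)).toNTriples());
        System.out.println(algorithmType(mlr, type).toNTriples());

        // Dataset description with the instance typed as ot:Dataset.
        for (Triple t : describeDataset("http://sth.com/dataset/1", true)) {
            System.out.println(t.toNTriples());
        }
    }
}
```

In this rendering the only visible difference between the two isA triples is the quoting of the object, yet only the resource form can itself be the subject of further triples, which is the argument made above for turning statisticsSupported values into resources as well.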