[OTDev] Some questions on the RDF for Datasets

chung chvng at mail.ntua.gr
Thu Dec 10 15:41:04 CET 2009


Hi Nina, All,
 First of all, thanks a lot for the code! I'm almost done with the
parser, but I still have a few questions about the RDF. To retrieve the
datatype of a feature, I currently pick one "values" node for that
feature and read the XSD datatype of its value. [This is more or less
fine for regression algorithms.]

* I think that if the values of a feature appear in different data
entries with incompatible datatypes (e.g. date and double), this should
be treated as an error. For example, it makes no sense to say that the
molecular weight of compound A is 100 and that of compound B is "XYZ".
What do you think?
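
A minimal sketch of such a check, written in Python for brevity (the
parser itself is Java/Jena). The input shape (one (feature URI, XSD
datatype URI) pair per FeatureValue node), the function name and the
exception class are assumptions made for illustration and are not part
of the OpenTox API. The sketch is also stricter than strictly needed: it
treats any two distinct datatypes on the same feature as incompatible.

```python
from collections import defaultdict

XSD = "http://www.w3.org/2001/XMLSchema#"


class InconsistentDatatypeError(ValueError):
    """Raised when one feature carries values of more than one datatype."""


def infer_feature_datatypes(feature_values):
    """Map each feature URI to the single XSD datatype shared by its values.

    feature_values: iterable of (feature_uri, datatype_uri) pairs, one per
    FeatureValue node (hypothetical input shape, assumed for this sketch).
    """
    seen = defaultdict(set)
    for feature_uri, datatype_uri in feature_values:
        seen[feature_uri].add(datatype_uri)

    datatypes = {}
    for feature_uri, types in seen.items():
        if len(types) > 1:
            # Any two distinct datatypes count as a conflict here.
            raise InconsistentDatatypeError(
                f"{feature_uri} has values of incompatible datatypes: "
                f"{sorted(types)}"
            )
        datatypes[feature_uri] = next(iter(types))
    return datatypes
```

With this, a feature whose values are all xsd:double maps to xsd:double,
while a feature mixing xsd:double and xsd:string raises the exception
instead of silently taking the datatype of whichever node was read first.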

* If every value of a feature has the same datatype (across all data
entries), then the datatype can be regarded as a property of the feature
as well. So I think it would be convenient to declare the datatype on
every feature, e.g.
<http://someservder.com/feature/100> <dc:type> <XSD type URI>
This will not disturb the structure of the RDF for the Datasets. The
benefit becomes clearer in the following case:

* When training a classification model, both Weka and LibSVM (as well as
other libraries) need to know the range of the dependent variable a
priori. One solution would of course be to collect the different values
of that variable one by one (iterating over all values for that
feature). However, it would again be more convenient if the datatype
were a property of the feature itself.
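
To make the "one by one" option concrete, here is a rough sketch in
Python (again only for brevity). The representation of a data entry as a
dict from feature URI to value and the function name are assumptions
made for this example. It shows the full pass over the dataset that a
declaration on the feature itself would make unnecessary.

```python
def nominal_range(data_entries, feature_uri):
    """Collect the distinct values of one feature, in order of first appearance.

    data_entries: iterable of dicts mapping feature URI -> value
    (hypothetical shape). Entries that lack the feature are skipped.
    """
    values = []
    seen = set()
    for entry in data_entries:
        if feature_uri not in entry:
            continue  # no value for this feature in this data entry
        value = entry[feature_uri]
        if value not in seen:
            seen.add(value)
            values.append(value)
    return values
```

The cost is one full iteration over the data entries before training can
even start, and the result only reflects the values that happen to occur
in this particular dataset.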

* Do we have a formal way of denoting missing values, or will they
simply not appear at all?

Opinions?

Best Regards,
Pantelis

On Wed, 2009-12-09 at 23:21 +0200, Nina Jeliazkova wrote:
> chung wrote:
> > Dear Nina, Tobias,
> >  I'm trying to get access to anonymous Resources of an RDF document
> > using Jena. I want to iterate over all FeatureValue nodes and read their
> > value and the URI of the corresponding compound. Do you have any idea
> > how I could do that? 
> >
> > Is there a way to retrieve this information in a List or via an
> > ExtendedIterator somehow?
> >
> >   
> Jena example at
> http://opentox.org/data/documents/development/RDF%20files/JavaOnly/JenaExamples/#section-22
> 
> Best regards,
> Nina
> > Do you think it would be more convenient if we didn't use anonymous
> > nodes? For example, dataEntries could be URIs of the form
> > http://opentox.org/dataEntry/xyz ... (Well, I'm not sure about that
> > either.)
> >
> > Tobias,  I believe we're also working on parsing RDF documents (i.e.
> > using RDF representations to generate weka.core.Instances objects), so
> > we could collaborate on that. 
> >
> > My source code can be found at http://github.com/sopasakis/yaqp .
> >
> > Best Regards,
> > Pantelis
> >
> > _______________________________________________
> > Development mailing list
> > Development at opentox.org
> > http://www.opentox.org/mailman/listinfo/development
> >   
> 