[OTDev] Some questions on the RDF for Datasets

Nina Jeliazkova nina at acad.bg
Thu Dec 10 16:16:47 CET 2009


Hi Pantelis, All,

chung wrote:
> Hi Nina, All,
>  First of all, thanks a lot for the code!!! I'm almost done with the
>   
welcome :)
> parser but I still have a query about the RDF. What I currently do, to
> retrieve the datatype of a feature is that for each feature, I pick a
> "values" node and get the XSD datatype of its value. [This is more or
> less fine for Regression Algorithms! ]
>   
You could check whether the value is a Literal and then use
((Literal)value).getInt() or similar accessors such as getDouble() or
getString().
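
If only the XSD datatype URI of a value is at hand, as in the approach
described in the quoted question, the numeric / non-numeric decision can be
sketched in plain Java without Jena. The class name XsdTypes and the choice
of which XSD types count as numeric are assumptions made for illustration;
the list is deliberately partial.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Jena or of the OpenTox API.
final class XsdTypes {
    // Namespace of the XML Schema datatypes.
    private static final String XSD = "http://www.w3.org/2001/XMLSchema#";

    // Illustrative, incomplete selection of numeric XSD datatypes.
    private static final Set<String> NUMERIC = new HashSet<>(Arrays.asList(
            XSD + "int", XSD + "integer", XSD + "long", XSD + "short",
            XSD + "float", XSD + "double", XSD + "decimal"));

    // True if the datatype URI is one of the numeric types listed above.
    static boolean isNumeric(String datatypeUri) {
        return datatypeUri != null && NUMERIC.contains(datatypeUri);
    }
}
```

A value typed as xsd:double would then be treated as numeric, while
xsd:string or an untyped value (null URI) would not.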
> * I think that in case a featureValue appears in different data entries
> with incompatible datatypes (e.g. date and double) then this should be
> an exception. For example it is not normal to say that the Molecular
> weight of the compound A is 100 and for the compound B is "XYZ". What do
> you think?
>   
In a way it is normal, because for some compounds one can have a range of
molecular weights rather than a single value. There are a lot of such
examples in the ambit datasets.
The real question is whether a modeler would like to deal with compounds
that do not have a single value for molecular weight (these will most
probably not be fixed structures).

There are also a lot of examples where a value can be either numeric or
a string; I have already shown an example with the skin sensitisation
dataset. This is the real data. If it doesn't fit a modeling procedure,
then use a preprocessing / data cleaning service and create a new
dataset that complies with your requirements.

Then there is the case where a measured value is not a single value but
is denoted by an inequality (e.g. > 1000). One could have a mix of fixed
numbers and inequalities in a single dataset.
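
The three kinds of values mentioned above (plain numbers, inequalities such
as "> 1000", and free text) could be told apart in a preprocessing step
roughly as follows. This is only a sketch under stated assumptions: the
class RawValue, its fields, and the accepted number syntax are hypothetical
and not taken from ambit or the OpenTox API; ranges such as "100-200" are
not handled and fall through to TEXT.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class RawValue {
    enum Kind { NUMBER, INEQUALITY, TEXT }

    final Kind kind;
    final String qualifier; // "<", "<=", ">" or ">=" for INEQUALITY, otherwise ""
    final double number;    // Double.NaN for TEXT
    final String text;      // trimmed original lexical form

    // Optional inequality sign, optional spaces, then a decimal number.
    private static final Pattern NUMERIC = Pattern.compile(
            "(<=|>=|<|>)?\\s*([-+]?(?:\\d+\\.?\\d*|\\.\\d+)(?:[eE][-+]?\\d+)?)");

    private RawValue(Kind kind, String qualifier, double number, String text) {
        this.kind = kind;
        this.qualifier = qualifier;
        this.number = number;
        this.text = text;
    }

    // Classifies one lexical value; anything that is not a number or an
    // inequality over a number is kept as TEXT.
    static RawValue parse(String lexical) {
        String s = lexical.trim();
        Matcher m = NUMERIC.matcher(s);
        if (!m.matches()) {
            return new RawValue(Kind.TEXT, "", Double.NaN, s);
        }
        double n = Double.parseDouble(m.group(2));
        String q = m.group(1);
        return (q == null)
                ? new RawValue(Kind.NUMBER, "", n, s)
                : new RawValue(Kind.INEQUALITY, q, n, s);
    }
}
```

With this sketch, RawValue.parse("100") is a NUMBER, RawValue.parse("> 1000")
is an INEQUALITY with qualifier ">" and number 1000.0, and
RawValue.parse("XYZ") is TEXT. What a model does with the last two kinds is
left to the cleaning service.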

> * If a feature value shares the same data type with all other values of
> the same feature (in all dataentries) then the data type can be thought
> of as a property of the feature too. So I think that it would be
> convenient to declare the datatype on every feature, e.g.
> <http://someservder.com/feature/100> <dc:type> <XSD type URI>
> This will not perturb the structure of the RDF for the Datasets. It
> becomes more clear in the following case:
>   
See examples above.
> * When training a classification model, both Weka and LibSVM (as well as
> other libraries) need to know the range of the dependent variable a
> priori. A solution would of course be to get the different values of
> that variable one by one (Iterating over all values for that feature).
> However it would be again more convenient if the datatype was a property
> of the feature itself. 
>   
The range of values is specific to a dataset, not to a feature, so
what's wrong with getting the values one by one?
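
Collecting the range by going through the values one by one is a single
pass over the data entries. A minimal sketch, assuming the dataset has
already been parsed into one map per data entry keyed by feature URI (this
in-memory layout and the class name NominalRange are assumptions, not an
OpenTox or Weka type):

```java
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper for illustration only.
final class NominalRange {
    // Returns the distinct values that the given feature takes in this
    // dataset, in sorted order. Entries that carry no value for the
    // feature are skipped.
    static SortedSet<String> distinctValues(
            List<Map<String, String>> dataEntries, String featureUri) {
        SortedSet<String> values = new TreeSet<>();
        for (Map<String, String> entry : dataEntries) {
            String v = entry.get(featureUri);
            if (v != null) {
                values.add(v);
            }
        }
        return values;
    }
}
```

The resulting set is what a classifier would need as the list of class
labels for this particular dataset.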
> * Do we have a formal way of denoting missing values, or will they not
> appear at all?
>   
A FeatureValue will simply not exist if a value is missing ;)
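
On the consumer side this means a missing value shows up as an absent key,
not as a special literal. A minimal sketch of turning one data entry into a
fixed-length numeric row, assuming the entry was parsed into a map keyed by
feature URI; the class name DenseRows and the use of Double.NaN as the
"missing" marker are assumptions of this sketch:

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only.
final class DenseRows {
    // Builds one row with a slot per feature, in the order of featureUris.
    // A feature without a FeatureValue in this entry becomes Double.NaN.
    static double[] toRow(Map<String, Double> entry, List<String> featureUris) {
        double[] row = new double[featureUris.size()];
        for (int i = 0; i < row.length; i++) {
            Double v = entry.get(featureUris.get(i));
            row[i] = (v == null) ? Double.NaN : v;
        }
        return row;
    }
}
```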

Best regards,
Nina
> Opinions?
>
> Best Regards,
> Pantelis
>
> On Wed, 2009-12-09 at 23:21 +0200, Nina Jeliazkova wrote:
>   
>> chung wrote:
>>     
>>> Dear Nina, Tobias,
>>>  I'm trying to get access to anonymous Resources of an RDF document
>>> using Jena. I want to iterate over all FeatureValue nodes and read their
>>> value and the URI of the corresponding compound. Do you have any idea
>>> how I could do that? 
>>>
>>> Is there a way to retrieve this information in a List or via an
>>> ExtendedIterator somehow? 
>>>
>>>   
>>>       
>> Jena example at
>> http://opentox.org/data/documents/development/RDF%20files/JavaOnly/JenaExamples/#section-22
>>
>> Best regards,
>> Nina
>>     
>>> Do you think that it would be more convenient if we didn't use anonymous
>>> nodes? For example, dataEntries could be URIs in the form
>>> http://opentox.org/dataEntry/xyz ... (Well, I'm not sure about
>>> that either). 
>>>
>>> Tobias,  I believe we're also working on parsing RDF documents (i.e.
>>> using RDF representations to generate weka.core.Instances objects), so
>>> we could collaborate on that. 
>>>
>>> My source code can be found at http://github.com/sopasakis/yaqp .
>>>
>>> Best Regards,
>>> Pantelis
>>>
>>> _______________________________________________
>>> Development mailing list
>>> Development at opentox.org
>>> http://www.opentox.org/mailman/listinfo/development
>>>   
>>>       