[OTDev] RDF for dataset representation

Nina Jeliazkova nina at acad.bg
Thu Oct 29 13:56:39 CET 2009


Christoph, All,

Christoph Helma wrote:
> Excerpts from Nina Jeliazkova's message of Wed Oct 28 14:45:27 +0100 2009:
>
>   
>> - Do we need to make distinction between different e.g. XLogP
>> implementations (I would say yes) ?  Is it possible to handle this via
>> BO ontology, or we need an extension?
>>     
>
> Egon? 
>
>   
>> - What would be the best way to extend BO ontology (this is more a
>> question to Egon)?
>> - How would we handle quantities, defined in existing data sets (e.g.
>> all LogP flavours available in EPA DSSTOX), not calculated via OpenTox,
>>     
>
> For DSSTOX we can use the URI of the field definitions, e.g.
>
> <http://www.epa.gov/ncct/dsstox/StandardChemFieldDefTable.html#STRUCTURE_MolecularWeight> or
> <http://www.epa.gov/ncct/dsstox/CentralFieldDef.html#ActivityOutcome_CPDBAS_Rat>
>
>   
The URI solves the "data source" issue, but if we also want to map it to a
"Molecular Weight" ontology term, we arrive again at "semantics" + "data
source" attributes for a property.

It looks like we need an object to denote a "data source", where one can
record information about the author, license, origin of the data, etc.
Alternatively, such properties can be attached to the objects themselves.
The Dublin Core ontology should fit nicely in both cases.
http://dublincore.org/

Currently the OpenTox Reference object serves a similar purpose, but we have
it attached only to Features, not to e.g. Datasets, Algorithms or Models.
It has also already been noted that we currently have no way to state the
origin and license of the data (the same holds for models).
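
A minimal Jena sketch of how Dublin Core could carry such provenance (the
dataset URI below is a placeholder, not an agreed OpenTox URI; the same
statements could equally be attached to a Model or Algorithm resource):

import com.hp.hpl.jena.rdf.model.*;
import com.hp.hpl.jena.vocabulary.DC;

public class ProvenanceSketch {
    public static void main(String[] args) {
        Model m = ModelFactory.createDefaultModel();
        // placeholder dataset URI; author, origin and license go on via Dublin Core
        Resource dataset = m.createResource("http://example.org/dataset/ISSCAN_v3a");
        dataset.addProperty(DC.creator, "EPA DSSTox");
        dataset.addProperty(DC.source,
                "http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html");
        dataset.addProperty(DC.rights, "license statement would go here");
        m.write(System.out, "N3");   // print the provenance triples
    }
}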

>> or an user uploaded dataset.
>>     
>
> If the user is unable to link to an existing ontology, (s)he still can
> use local links (e.g. <#my_new_algorithm>) as predicates (although that
> will not be very useful to put the results into a meaningful context,
> but it can be sufficient for computational experiments).
>   
Eventually it could be implemented as follows:
- upon upload, default semantics are attached;
- then the user should be able to specify links to an existing ontology
(manually or automatically), or even define a new one (see the sketch below).
Does this make sense?
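
A very rough sketch of that workflow, assuming a made-up local feature URI
and a placeholder for the BO term: on upload the service mints a local URI
(the "default semantics"), and in a second step an owl:sameAs mapping to the
chosen ontology term is added.

import com.hp.hpl.jena.rdf.model.*;
import com.hp.hpl.jena.vocabulary.OWL;

public class UploadMappingSketch {
    public static void main(String[] args) {
        Model m = ModelFactory.createDefaultModel();
        // step 1: default semantics - a local, service-generated feature URI
        Resource feature = m.createResource("http://example.org/feature/user_logP");
        // step 2: the user (or an automatic matcher) links it to an existing ontology term
        feature.addProperty(OWL.sameAs,
                m.createResource("http://example.org/bodo#xlogp"));   // placeholder for the BO term
        m.write(System.out, "N3");
    }
}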
>   
>> - How to handle quantities, calculated via some algorithm, but with
>> different parameters (e.g. eHOMO calculated with AM1 or PM3). 
>>     
>
> I think that this could be resolved at the ontology level (e.g.
> ontology:eHOMO/AM1 vs ontology:eHOMO/PM3 or
> ontology:eHOMO?parameters=AM1). See also the next point.
>   
Well, I can think of at least two more approaches:
- assign the parameters in the algorithm ontology (subclasses of the
algorithm for given parameters);
- or, even better, attach the AM1/PM3/etc. parameters to the 3D structure of
the chemical and then link eHOMO/eLUMO to that specific structure (see the
sketch below).

(have to look if ChemAxiom handles similar issues)
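
A rough sketch of the second approach, with made-up ot: predicates: the
quantum-chemistry settings hang off the 3D conformer, and eHOMO is attached
to that conformer rather than to the abstract chemical.

import com.hp.hpl.jena.rdf.model.*;

public class ConformerParameterSketch {
    public static void main(String[] args) {
        Model m = ModelFactory.createDefaultModel();
        String ot = "http://example.org/ot#";   // placeholder namespace
        Resource chemical  = m.createResource("http://example.org/compound/benzene");
        Resource conformer = m.createResource("http://example.org/compound/benzene/conformer/1");
        conformer.addProperty(m.createProperty(ot, "derivedFrom"), chemical);
        conformer.addProperty(m.createProperty(ot, "hamiltonian"), "AM1");
        // eHOMO belongs to this specific 3D structure (illustrative value, eV)
        conformer.addProperty(m.createProperty(ot, "eHOMO"), m.createTypedLiteral(-9.65));
        m.write(System.out, "N3");
    }
}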

>   
>> I would prefer that the property (e.g. blueobelisk:xlogp) refer to a
>> specific implementation, rather to the algorithm itself  (same concept
>> as algorithm/model split we already invented).
>> The implementation itself will be linked to the algorithm.
>>     
>
> The predicate (i.e. property) could be the URI of the service that has
> calculated the value. To make the process completely reproducible, we
> would need to provide the POST URI together with all parameters - I am
> not sure if RDF supports this.
>
>   
If a predicate is the URI of the service and not a term from an ontology, it
seems to me we do indeed need an additional object. Then we could have two
triples:

LogPService - implements - LogPAlgorithm
Chemical - LogPService - Value

Or, in another style, it could be:

Chemical - hasProperty - "Octanol-water Partition Coefficient"
"Octanol-water Partition Coefficient" - isCalculatedBy - "Software X"
"Software X" - implements - "Algorithm XLogP"
"Algorithm XLogP" - describedIn (or the proper predicate from Dublin Core) - publication-for-XLogP-algorithm

and further:

"Octanol-water Partition Coefficient" - sameAs - ToxML/IUCLID5_entry_for_the_property
"Octanol-water Partition Coefficient" - isA - PhysicochemicalProperty (now we enter the ECHA classification of endpoints)
PhysicochemicalProperty - isA - Endpoint

(Sorry for the free style of the triples, I still need to get accustomed to
these. Perhaps an online collaborative RDF/OWL editor would help.)
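
For what it is worth, the second chain above could be written with Jena
roughly like this (all URIs are placeholders, dc:source stands in for
"described in", and rdf:type / rdfs:subClassOf approximate "is a"):

import com.hp.hpl.jena.rdf.model.*;
import com.hp.hpl.jena.vocabulary.DC;
import com.hp.hpl.jena.vocabulary.RDF;
import com.hp.hpl.jena.vocabulary.RDFS;

public class XLogPChainSketch {
    public static void main(String[] args) {
        Model m = ModelFactory.createDefaultModel();
        String ex = "http://example.org/ot#";
        Resource chemical  = m.createResource(ex + "compound_1");
        Resource property  = m.createResource(ex + "OctanolWaterPartitionCoefficient");
        Resource software  = m.createResource(ex + "SoftwareX");
        Resource algorithm = m.createResource(ex + "XLogPAlgorithm");

        chemical.addProperty(m.createProperty(ex, "hasProperty"), property);
        property.addProperty(m.createProperty(ex, "isCalculatedBy"), software);
        software.addProperty(m.createProperty(ex, "implements"), algorithm);
        algorithm.addProperty(DC.source, m.createResource(ex + "publication_XLogP"));
        property.addProperty(RDF.type, m.createResource(ex + "PhysicochemicalProperty"));
        m.createResource(ex + "PhysicochemicalProperty")
         .addProperty(RDFS.subClassOf, m.createResource(ex + "Endpoint"));
        m.write(System.out, "N3");
    }
}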

There is a nice application for visualizing triples at
http://simile.mit.edu/welkin/ ; one can load the BO OWL file and explore the
relationships.

>> Looking into the current list of feature definitions in Ambit
>> (http://ambit.uni-plovdiv.bg:8080/ambit2/feature_definition ), most of
>> them can be mapped to existing or to-be-developed ontologies, but we
>> need to extend your proposal in a way to keep track of the source of the
>> data.
>>
>> For example it is important to know that feature MolWeight 
>> <http://ambit.uni-plovdiv.bg:8080/ambit2/feature_definition/12109>is
>> representing Molecular weight, but I would not want to lose the
>> information it came from ISSCAN_v3a_1153_19Sept08.1222179139.sdf
>> <http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html>
>>
>> http://ambit.uni-plovdiv.bg:8080/ambit2/feature_definition/12109
>> <http://ambit.uni-plovdiv.bg:8080/ambit2/feature_definition/11945>
>> This was the primary reason to invent feature definition to consist of
>> name + reference - I am sure this can be described in RDF as well.
>>
>>     
>
> Ah, now I get the idea behind the feature-definition.
>   
That means I've been very bad at explaining the idea over the last
months ...
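
To illustrate the name + reference split in RDF with Dublin Core (the Ambit
and DSSTOX URIs are the real ones quoted above, the ontology term is a
placeholder, and whether owl:sameAs or rdf:type is the right link is still
open):

import com.hp.hpl.jena.rdf.model.*;
import com.hp.hpl.jena.vocabulary.DC;
import com.hp.hpl.jena.vocabulary.OWL;

public class FeatureDefinitionSketch {
    public static void main(String[] args) {
        Model m = ModelFactory.createDefaultModel();
        Resource feature = m.createResource(
                "http://ambit.uni-plovdiv.bg:8080/ambit2/feature_definition/12109");
        feature.addProperty(DC.title, "MolWeight");                 // the name
        feature.addProperty(DC.source,                              // the reference (data origin)
                "http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html");
        // the semantics: link to a placeholder "Molecular Weight" ontology term
        feature.addProperty(OWL.sameAs,
                m.createResource("http://example.org/ontology#MolecularWeight"));
        m.write(System.out, "N3");
    }
}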
>   
>> Actually I was thinking of an  (extensible) ontology for SMARTS defined
>> fragments;  ChEBI ontology  has lot of predefined groups that can be
>> used.  Read across use case will benefit from that :)
>>     
>
> Yes, but this should support also arbitrary SMARTS
> substructures that come e.g. from supervised graph mining.
>   
Yes, of course (also structures drawn by a user) - this is one of the
reasons I would like to learn how to extend ontologies in a dynamic way.
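
A crude sketch of a dynamically minted fragment feature (the smarts
predicate and the parent class are placeholders; a ChEBI class could take
the parent's place):

import com.hp.hpl.jena.rdf.model.*;
import com.hp.hpl.jena.vocabulary.RDFS;

public class SmartsFragmentSketch {
    public static void main(String[] args) {
        Model m = ModelFactory.createDefaultModel();
        String ex = "http://example.org/fragments#";
        // a substructure found by graph mining, or drawn by a user, minted on the fly
        Resource fragment = m.createResource(ex + "fragment_42");
        fragment.addProperty(RDFS.label, "benzene ring");
        fragment.addProperty(m.createProperty(ex, "smarts"), "c1ccccc1");
        // hook it under a predefined parent class
        fragment.addProperty(RDFS.subClassOf, m.createResource(ex + "SubstructureFeature"));
        m.write(System.out, "N3");
    }
}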
>   
>> We would need a way to handle dynamically defined properties and even
>> ontologies.  I am particularly thinking of user-defined datasets.
>>     
>
> I agree, but I am not sure how to keep user defined ontologies
> consistent. We would need a curation process (who is responsible?), but
> maybe a simple tagging system could also work.
>   
I have a feeling ontology consistency is an open problem ... let alone data
consistency, especially as related to chemical structures and their
associated data.

On a related note, the following thread from this morning on the Linked Open
Data mailing list,
http://lists.w3.org/Archives/Public/public-lod/2009Oct/0165.html , is quite
interesting, especially this paragraph:

Q. If sameAs indicates that two URI references contain information about
the same thing; how do we assert that two URI's contain the same
information about the same thing (ie identical data)?
A. You don't want to assert that they have the same data. You are
asserting co-reference i.e. the URIs are about the same Entity. Thus,
you can then perform union style expansion from the co-reference URIs to
get a bigger picture of a  given entity e.g., London, from a variety of
data sources.
>   
>> There are several Java libraries , even Restlet in 2.x has some support
>> (no querying) - graph structure with serialization to several formats.
>>     
>
> RDF support in Ruby could be better. Redland (http://librdf.org) seems
> to be fairly powerful and has Ruby (as well as Perl, PHP, Python and C)
> bindings, but it requires manual compilation of at least 3 libraries
> (i.e. no convenient 'gem install redland').
>
>   
Actually there is plenty of Java software for RDF, including powerful RDF
stores like Sesame. An overview (23 entries) is available at
http://java-source.net/open-source/rss-rdf-tools .
Jena (http://java-source.net/open-source/rss-rdf-tools/jena) would be one of
the first candidates to try.
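
A minimal example, just to show how little code is needed to read and walk a
graph with Jena (the URL is a placeholder for any RDF/XML document):

import com.hp.hpl.jena.rdf.model.*;

public class ReadGraphSketch {
    public static void main(String[] args) {
        Model m = ModelFactory.createDefaultModel();
        m.read("http://example.org/dataset/1.rdf");   // placeholder URL, RDF/XML by default
        // iterate over all triples in the graph
        for (StmtIterator it = m.listStatements(); it.hasNext(); ) {
            Statement s = it.nextStatement();
            System.out.println(s.getSubject() + " " + s.getPredicate() + " " + s.getObject());
        }
    }
}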

>>> I suspect that RDF could be also useful for the representation of other
>>> OpenTox objects (Algorithms, Models, ...).
>>>   
>>>       
>> Yes.  Could we have a closer look into Algorithm object in BO dictionary
>> and decide if it can be reused in OpenTox  ?
>>     
>
> Munich ?
>   

Could we indeed start using some collaborative tool for developing RDF or
OWL? Integration with Plone would be nice, but anything else would do. As a
side effect we would get a nice graph of objects and their interdependencies.

Plone gurus - could you tell us whether this module,
http://plone.org/products/ploneontology , is relevant?

Best regards,
Nina
> Best regards,
> Christoph
> _______________________________________________
> Development mailing list
> Development at opentox.org
> http://www.opentox.org/mailman/listinfo/development
>   



