[OTDev] RDF for dataset representation

Nina Jeliazkova nina at acad.bg
Wed Oct 28 14:45:27 CET 2009


Hi Christoph,

Sounds good, see more comments inline, mostly based on use cases I am
interested in.

Christoph Helma wrote:
> Hi all,
>
> I had a closer look at the RDF stuff, and from my superficial understanding up
> to now (no practical experience yet) it seems to be a good exchange format for
> the dataset component.  I would suggest the following convention for creating
> RDF triples that represent a dataset:
>
> Subject:    Compound URI
> Predicate:  Measurement/Algorithm definition URI
> Object:     Feature value
>
> An example could look like (in Notation 3 http://www.w3.org/2000/10/swap/Primer):
>
>   @prefix algorithm: <http://www.opentox.org/ontologies/algorithm/> .
>   @prefix toxicity: <http://www.opentox.org/ontologies/toxicity/> .
>   @prefix blueobelisk: <http://blueobelisk.sourceforge.net/ontologies/chemoinformatics-algorithms/#> .
>
>   # Examples:
>
>   # a calculated logP
>   <http://webservices.in-silico.ch/compound/InChI=1S/H4N2/c1-2/h1-2H2>  blueobelisk:xlogP -2.20 .
>   

- Do we need to make a distinction between different implementations,
e.g. of XLogP (I would say yes)?  Is it possible to handle this via the
BO ontology, or do we need an extension?
- What would be the best way to extend the BO ontology (this is more a
question to Egon)?
- How would we handle quantities defined in existing data sets (e.g.
all LogP flavours available in EPA DSSTOX) that were not calculated via
OpenTox, or quantities in a user-uploaded dataset?
- How do we handle quantities calculated via some algorithm, but with
different parameters (e.g. eHOMO calculated with AM1 or PM3)?

I would prefer that the property (e.g. blueobelisk:xlogP) refer to a
specific implementation, rather than to the algorithm itself (the same
concept as the algorithm/model split we already invented).
The implementation itself will be linked to the algorithm.
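
A rough sketch of how this could look in N3, only to illustrate the idea.
The impl: namespace and the terms algorithm:implementationOf,
algorithm:parameter and algorithm:eHOMO are placeholders I made up; none
of them exists in BO or in any OpenTox ontology yet.

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix algorithm: <http://www.opentox.org/ontologies/algorithm/> .
@prefix blueobelisk: <http://blueobelisk.sourceforge.net/ontologies/chemoinformatics-algorithms/#> .
# hypothetical namespace for implementation-specific properties
@prefix impl: <http://example.org/implementation/> .

# one property per implementation, linked back to the algorithm
impl:xlogP_A  a rdf:Property ;
    rdfs:label "XLogP, implementation A" ;
    algorithm:implementationOf  blueobelisk:xlogP .

# same algorithm, different parameters: two distinct properties
impl:eHOMO_AM1  a rdf:Property ;
    algorithm:implementationOf  algorithm:eHOMO ;
    algorithm:parameter  [ rdfs:label "method" ; rdf:value "AM1" ] .

impl:eHOMO_PM3  a rdf:Property ;
    algorithm:implementationOf  algorithm:eHOMO ;
    algorithm:parameter  [ rdfs:label "method" ; rdf:value "PM3" ] .

# the dataset triple then uses the implementation-specific property
<http://webservices.in-silico.ch/compound/InChI=1S/H4N2/c1-2/h1-2H2>  impl:xlogP_A  -2.20 .
```

With this layout a query for "all XLogP values" would follow the
implementationOf link, while values from different implementations or
parameter sets stay distinguishable.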

Looking into the current list of feature definitions in Ambit
(http://ambit.uni-plovdiv.bg:8080/ambit2/feature_definition ), most of
them can be mapped to existing or to-be-developed ontologies, but we
need to extend your proposal in a way to keep track of the source of the
data.

For example, it is important to know that the feature MolWeight
<http://ambit.uni-plovdiv.bg:8080/ambit2/feature_definition/12109>
represents molecular weight, but I would not want to lose the
information that it came from ISSCAN_v3a_1153_19Sept08.1222179139.sdf
<http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html>

http://ambit.uni-plovdiv.bg:8080/ambit2/feature_definition/12109
<http://ambit.uni-plovdiv.bg:8080/ambit2/feature_definition/11945>
This was the primary reason for having a feature definition consist of
name + reference. I am sure this can be described in RDF as well.
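
For instance, name + reference might be expressed roughly like this (a
sketch only: dc:source and dc:title are Dublin Core terms, while
ex:molecularWeight stands for whatever ontology term we would eventually
map to and is purely hypothetical):

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
# hypothetical namespace for the ontology term the feature maps to
@prefix ex: <http://example.org/ontology/> .

<http://ambit.uni-plovdiv.bg:8080/ambit2/feature_definition/12109>
    # the "name" part of the feature definition
    rdfs:label "MolWeight" ;
    # mapping to a (placeholder) ontology term for molecular weight
    rdfs:subPropertyOf ex:molecularWeight ;
    # the "reference" part: where the values came from
    dc:source [
        dc:title "ISSCAN_v3a_1153_19Sept08.1222179139.sdf" ;
        rdfs:seeAlso <http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html>
    ] .
```

The feature definition URI could then be used directly as the predicate
in the dataset triples, so the source travels with every value.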

>   # toxicological classification
>   <http://webservices.in-silico.ch/compound/InChI=1S/H4N2/c1-2/h1-2H2>  toxicity:multi_cell_call "active" .
>
>   # a class sensitive structural feature, calculated by an algorithm that does not yet exist in an established ontology
>   <http://webservices.in-silico.ch/compound/InChI=1S/H4N2/c1-2/h1-2H2>  algorithm:backbone_refinement_class  [ <#smarts> "N-N"; <#p_value> 0.9998; <#effect> "activating" ] .
>
>   

Actually, I was thinking of an (extensible) ontology for SMARTS-defined
fragments; the ChEBI ontology has a lot of predefined groups that can be
used.  The read-across use case will benefit from that :)


> or in RDF/XML:
>
>   <rdf:RDF xmlns="file:///home/ch/ontologies/tmp#"
>       xmlns:algorithm="http://www.opentox.org/ontologies/algorithm/"
>       xmlns:blueobelisk="http://blueobelisk.sourceforge.net/ontologies/chemoinformatics-algorithms/#"
>       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>       xmlns:toxicity="http://www.opentox.org/ontologies/toxicity/">
>
>       <rdf:Description rdf:about="http://webservices.in-silico.ch/compound/InChI=1S/H4N2/c1-2/h1-2H2">
>           <blueobelisk:xlogP rdf:datatype="http://www.w3.org/2001/XMLSchema#decimal">-2.2</blueobelisk:xlogP>
>           <algorithm:backbone_refinement_class rdf:parseType="Resource">
>               <effect>activating</effect>
>               <p_value rdf:datatype="http://www.w3.org/2001/XMLSchema#decimal">0.9998</p_value>
>               <smarts>N-N</smarts>
>           </algorithm:backbone_refinement_class>
>           <toxicity:multi_cell_call>active</toxicity:multi_cell_call>
>       </rdf:Description>
>   </rdf:RDF>
>
> Advantages: 
>
>   - We do not need a separate feature webservice (at least for simple feature values and moderately complex features, like the tuples in the BBRC example)
>   - We do not need necessarily a feature-ontology (or feature-definition) webservice, if we use, expand and combine existing ontologies
>   
We would need a way to handle dynamically defined properties and even
ontologies.  I am particularly thinking of user-defined datasets.
>   - It can help us to solve the problem of unique IDs, by using URIs 
>   
AFAIK, that will require an RDF store for the ontology service
(a centralised one?) - am I right?  It would be good if there is a
distributed solution.
>   - Plays well with REST
>   - Established standard
>   - Facilitates queries/reasoning (especially useful for building GUIs)
>   

> Possible disadvantages:
>   
>   - Support in programming languages?
>   
There are several Java libraries; even Restlet 2.x has some support
(no querying): a graph structure with serialization to several formats.
> I suspect that RDF could be also useful for the representation of other
> OpenTox objects (Algorithms, Models, ...).
>   
Yes.  Could we have a closer look at the Algorithm object in the BO
dictionary and decide if it can be reused in OpenTox?


> Any opinions?
>   
Needless to say, I am in favour of trying to cast OpenTox objects to RDF
instead of custom formats.

BTW, a couple of weeks ago I started a list of potentially useful
ontologies at
http://opentox.org/dev/apis/api-1.1/feature_ontology/ontologies_existing/onto_list/?searchterm=existing%20ontologies
If there is a better place for the list on the site, please feel free to
move it.

Best regards,
Nina

> Best regards,
> Christoph
> _______________________________________________
> Development mailing list
> Development at opentox.org
> http://www.opentox.org/mailman/listinfo/development
>   



