[OTDev] RDF for dataset representation

Christoph Helma helma at in-silico.de
Wed Oct 28 13:51:05 CET 2009


Hi all,

I had a closer look at the RDF stuff, and from my superficial understanding so
far (no practical experience yet) it seems to be a good exchange format for
the dataset component.  I would suggest the following convention for creating
RDF triples that represent a dataset:

Subject:    Compound URI
Predicate:  Measurement/Algorithm definition URI
Object:     Feature value

An example could look like this (in Notation 3, see http://www.w3.org/2000/10/swap/Primer):

  @prefix algorithm: <http://www.opentox.org/ontologies/algorithm/> .
  @prefix toxicity: <http://www.opentox.org/ontologies/toxicity/> .
  @prefix blueobelisk: <http://blueobelisk.sourceforge.net/ontologies/chemoinformatics-algorithms/#> .

  # Examples:

  # a calculated logP
  <http://webservices.in-silico.ch/compound/InChI=1S/H4N2/c1-2/h1-2H2>  blueobelisk:xlogP -2.20 .

  # toxicological classification
  <http://webservices.in-silico.ch/compound/InChI=1S/H4N2/c1-2/h1-2H2>  toxicity:multi_cell_call "active" .

  # a class-sensitive structural feature, calculated by an algorithm that does not yet exist in an established ontology
  <http://webservices.in-silico.ch/compound/InChI=1S/H4N2/c1-2/h1-2H2>  algorithm:backbone_refinement_class  [ <#smarts> "N-N"; <#p_value> 0.9998; <#effect> "activating" ] .

or in RDF/XML:

  <rdf:RDF xmlns="file:///home/ch/ontologies/tmp#"
      xmlns:algorithm="http://www.opentox.org/ontologies/algorithm/"
      xmlns:blueobelisk="http://blueobelisk.sourceforge.net/ontologies/chemoinformatics-algorithms/#"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:toxicity="http://www.opentox.org/ontologies/toxicity/">

      <rdf:Description rdf:about="http://webservices.in-silico.ch/compound/InChI=1S/H4N2/c1-2/h1-2H2">
          <blueobelisk:xlogP rdf:datatype="http://www.w3.org/2001/XMLSchema#decimal">-2.2</blueobelisk:xlogP>
          <algorithm:backbone_refinement_class rdf:parseType="Resource">
              <effect>activating</effect>
              <p_value rdf:datatype="http://www.w3.org/2001/XMLSchema#decimal">0.9998</p_value>
              <smarts>N-N</smarts>
          </algorithm:backbone_refinement_class>
          <toxicity:multi_cell_call>active</toxicity:multi_cell_call>
      </rdf:Description>
  </rdf:RDF>
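
To illustrate the convention, here is a minimal Python sketch (standard
library only; Python is an assumption, not a project decision) that turns
(compound, feature, value) rows into N-Triples lines. The helper names
`literal` and `triple` are hypothetical, and numbers are typed as xsd:decimal
to mirror the RDF/XML above. The nested BBRC feature is left out, since it
would need a blank node.

```python
# Namespaces and compound URI taken from the example above.
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
BLUEOBELISK = "http://blueobelisk.sourceforge.net/ontologies/chemoinformatics-algorithms/#"
TOXICITY = "http://www.opentox.org/ontologies/toxicity/"
COMPOUND = "http://webservices.in-silico.ch/compound/InChI=1S/H4N2/c1-2/h1-2H2"


def literal(value):
    """Format a feature value as an N-Triples literal (hypothetical helper)."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        # Assumption: every numeric feature value is serialized as xsd:decimal.
        return '"%s"^^<%s>' % (value, XSD_DECIMAL)
    # Plain string literal; escape backslashes and double quotes.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return '"%s"' % escaped


def triple(compound_uri, feature_uri, value):
    """Subject: compound URI, predicate: feature definition URI, object: value."""
    return "<%s> <%s> %s ." % (compound_uri, feature_uri, literal(value))


# One row per (compound, feature, value) statement in the dataset.
dataset = [
    (COMPOUND, BLUEOBELISK + "xlogP", -2.2),
    (COMPOUND, TOXICITY + "multi_cell_call", "active"),
]

lines = [triple(*row) for row in dataset]
print("\n".join(lines))
```

Each row of the dataset becomes exactly one statement, so a dataset service
could stream its content line by line in this form.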

Advantages: 

  - We do not need a separate feature webservice (at least for simple feature values and moderately complex features, like the tuples in the BBRC example)
  - We do not necessarily need a feature-ontology (or feature-definition) webservice, if we use, expand and combine existing ontologies
  - It can help us to solve the problem of unique IDs, by using URIs
  - Plays well with REST
  - Established standard
  - Facilitates queries/reasoning (especially useful for building GUIs)
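
The query point can be sketched without any RDF library: once a dataset is a
list of (subject, predicate, object) triples, a lookup is a pattern match with
wildcards. The following Python sketch is only an illustration; the `match`
function and the second compound URI are hypothetical, and a real triple store
would more likely be queried with something like SPARQL.

```python
XLOGP = "http://blueobelisk.sourceforge.net/ontologies/chemoinformatics-algorithms/#xlogP"
CALL = "http://www.opentox.org/ontologies/toxicity/multi_cell_call"
COMPOUND = "http://webservices.in-silico.ch/compound/InChI=1S/H4N2/c1-2/h1-2H2"
OTHER = "http://example.org/compound/other"  # hypothetical placeholder compound

# In-memory dataset following the compound / feature / value convention.
triples = [
    (COMPOUND, XLOGP, -2.2),
    (COMPOUND, CALL, "active"),
    (OTHER, CALL, "inactive"),  # made-up value, for illustration only
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    pattern = (subject, predicate, obj)
    return [
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]


# Which compounds were classified as active?
active_compounds = [s for s, _, _ in match(triples, predicate=CALL, obj="active")]

# Which feature values do we have for one compound (e.g. to fill a GUI table)?
features = {p: o for _, p, o in match(triples, subject=COMPOUND)}

print(active_compounds)
print(features)
```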

Possible disadvantages:
  
  - Support in programming languages?

I suspect that RDF could also be useful for the representation of other
OpenTox objects (Algorithms, Models, ...).

Any opinions?

Best regards,
Christoph


