[OTDev] RDF for dataset representation

Christoph Helma helma at in-silico.de
Thu Oct 29 12:34:19 CET 2009


Excerpts from Nina Jeliazkova's message of Wed Oct 28 14:45:27 +0100 2009:

> - Do we need to make distinction between different e.g. XLogP
> implementations (I would say yes) ?  Is it possible to handle this via
> BO ontology, or we need an extension?

Egon? 

> - What would be the best way to extend BO ontology (this is more a
> question to Egon)?
> - How would we handle quantities, defined in existing data sets (e.g.
> all LogP flavours available in EPA DSSTOX), not calculated via OpenTox,

For DSSTOX we can use the URI of the field definitions, e.g.

<http://www.epa.gov/ncct/dsstox/StandardChemFieldDefTable.html#STRUCTURE_MolecularWeight> or
<http://www.epa.gov/ncct/dsstox/CentralFieldDef.html#ActivityOutcome_CPDBAS_Rat>

> or an user uploaded dataset.

If the user is unable to link to an existing ontology, (s)he can still
use local links (e.g. <#my_new_algorithm>) as predicates. That will not
help much in putting the results into a meaningful context, but it can
be sufficient for computational experiments.
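
As a rough illustration of the two options (a Python sketch; the
compound URI, the dataset URI and the values are made up, only the
DSSTOX field URI comes from the text above), the same kind of statement
can be written with either a DSSTOX predicate or a local one:

```python
# Sketch: one value stated with a DSSTOX predicate, one with a local link.
# The compound URI, dataset URI and both values are hypothetical.

DSSTOX_MW = ("http://www.epa.gov/ncct/dsstox/"
             "StandardChemFieldDefTable.html#STRUCTURE_MolecularWeight")


def triple(subject, predicate, value):
    """Format one N-Triples statement with a plain literal as object."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return '<%s> <%s> "%s" .' % (subject, predicate, escaped)


def local_predicate(base_uri, name):
    """Resolve a local link such as <#my_new_algorithm> against a dataset URI."""
    return base_uri.split("#")[0] + "#" + name


compound = "http://example.org/compound/1"   # hypothetical
dataset = "http://example.org/dataset/42"    # hypothetical

print(triple(compound, DSSTOX_MW, 78.11))
print(triple(compound, local_predicate(dataset, "my_new_algorithm"), 0.73))
```

The local predicate is only meaningful relative to the dataset it was
defined in, which is exactly the limitation mentioned above.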

> - How to handle quantities, calculated via some algorithm, but with
> different parameters (e.g. eHOMO calculated with AM1 or PM3). 

I think that this could be resolved at the ontology level (e.g.
ontology:eHOMO/AM1 vs ontology:eHOMO/PM3 or
ontology:eHOMO?parameters=AM1). See also the next point.
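
The two URI styles could be generated along these lines (a Python
sketch; the ontology namespace is a placeholder, not an existing
ontology):

```python
from urllib.parse import quote, urlencode

ONTOLOGY = "http://example.org/ontology/"  # hypothetical namespace


def path_style(prop, method):
    """ontology:eHOMO/AM1 style: the method becomes a path segment."""
    return ONTOLOGY + quote(prop, safe="") + "/" + quote(method, safe="")


def query_style(prop, **params):
    """ontology:eHOMO?parameters=AM1 style: parameters go into the query."""
    # Sorting keeps the URI stable regardless of keyword order.
    return ONTOLOGY + quote(prop, safe="") + "?" + urlencode(sorted(params.items()))


print(path_style("eHOMO", "AM1"))
print(query_style("eHOMO", parameters="PM3"))
```

Either way, AM1 and PM3 values end up with distinct predicate URIs, so
they cannot be confused in a dataset.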

> I would prefer that the property (e.g. blueobelisk:xlogp) refer to a
> specific implementation, rather to the algorithm itself  (same concept
> as algorithm/model split we already invented).
> The implementation itself will be linked to the algorithm.

The predicate (i.e. property) could be the URI of the service that has
calculated the value. To make the process completely reproducible, we
would need to provide the POST URI together with all parameters - I am
not sure if RDF supports this.
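
One conceivable workaround, given purely as an assumption and not as
an agreed OpenTox design, is to describe the calculation as a node of
its own and attach the service URI and each parameter to that node. A
Python sketch that emits such statements as N-Triples (the vocabulary
namespace and all property names are invented for illustration):

```python
EX = "http://example.org/vocab#"   # hypothetical vocabulary


def describe_calculation(compound, service, params, value, node="_:calc1"):
    """Return N-Triples lines linking a value to the service call behind it.

    No literal escaping is done here; values are assumed to be simple.
    """
    lines = [
        "<%s> <%shasCalculation> %s ." % (compound, EX, node),
        "%s <%sservice> <%s> ." % (node, EX, service),
        '%s <%svalue> "%s" .' % (node, EX, value),
    ]
    # One statement per POST parameter, sorted so the output is stable.
    for name, val in sorted(params.items()):
        lines.append('%s <%sparameter> "%s=%s" .' % (node, EX, name, val))
    return lines


for line in describe_calculation(
    "http://example.org/compound/1",        # hypothetical
    "http://example.org/algorithm/eHOMO",   # hypothetical service URI
    {"method": "AM1"},
    "-9.6",                                 # made-up value
):
    print(line)
```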

> Looking into the current list of feature definitions in Ambit
> (http://ambit.uni-plovdiv.bg:8080/ambit2/feature_definition ), most of
> them can be mapped to existing or to-be-developed ontologies, but we
> need to extend your proposal in a way to keep track of the source of the
> data.
> 
> For example it is important to know that feature MolWeight 
> <http://ambit.uni-plovdiv.bg:8080/ambit2/feature_definition/12109>is
> representing Molecular weight, but I would not want to lose the
> information it came from ISSCAN_v3a_1153_19Sept08.1222179139.sdf
> <http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html>
> 
> http://ambit.uni-plovdiv.bg:8080/ambit2/feature_definition/12109
> <http://ambit.uni-plovdiv.bg:8080/ambit2/feature_definition/11945>
> This was the primary reason to invent feature definition to consist of
> name + reference - I am sure this can be described in RDF as well.
> 

Ah, now I get the idea behind the feature-definition.
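
A name + reference pair of this kind might look as follows in RDF (a
Python sketch producing N-Triples; using the Dublin Core "title" and
"source" properties is an assumption made here, not something the
Ambit feature definitions are known to use):

```python
DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core element set


def feature_definition(uri, name, source):
    """Return N-Triples lines giving a feature definition a name and a source."""
    return [
        '<%s> <%stitle> "%s" .' % (uri, DC, name),
        "<%s> <%ssource> <%s> ." % (uri, DC, source),
    ]


for line in feature_definition(
    "http://ambit.uni-plovdiv.bg:8080/ambit2/feature_definition/12109",
    "MolWeight",
    "http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html",
):
    print(line)
```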

> Actually I was thinking of an  (extensible) ontology for SMARTS defined
> fragments;  ChEBI ontology  has lot of predefined groups that can be
> used.  Read across use case will benefit from that :)

Yes, but this should also support arbitrary SMARTS
substructures that come, e.g., from supervised graph mining.
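
For arbitrary SMARTS, one simple option is to derive the feature URI
from the pattern itself by percent-encoding it (a Python sketch; the
base URI is a placeholder and the scheme is only an assumption):

```python
from urllib.parse import quote, unquote

FRAGMENT_BASE = "http://example.org/fragment/"   # hypothetical namespace


def smarts_uri(smarts):
    """Build a feature URI by percent-encoding the whole SMARTS pattern."""
    return FRAGMENT_BASE + quote(smarts, safe="")


def smarts_from_uri(uri):
    """Recover the SMARTS pattern from a URI built by smarts_uri()."""
    return unquote(uri[len(FRAGMENT_BASE):])


uri = smarts_uri("[#6]-[#7]")
print(uri)
print(smarts_from_uri(uri))
```

Since the pattern can be recovered from the URI, mined substructures
need no registration step, although two different SMARTS strings for
the same substructure would still get different URIs.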

> We would need a way to handle dynamically defined properties and even
> ontologies.  I am particularly thinking of user-defined datasets.

I agree, but I am not sure how to keep user-defined ontologies
consistent. We would need a curation process (who is responsible?), but
maybe a simple tagging system could also work.

> There are several Java libraries , even Restlet in 2.x has some support
> (no querying) - graph structure with serialization to several formats.

RDF support in Ruby could be better. Redland (http://librdf.org) seems
to be fairly powerful and has Ruby (as well as Perl, PHP, Python and C)
bindings, but it requires manual compilation of at least 3 libraries
(i.e. no convenient 'gem install redland').

> > I suspect that RDF could be also useful for the representation of other
> > OpenTox objects (Algorithms, Models, ...).
> >   
> Yes.  Could we have a closer look into Algorithm object in BO dictionary
> and decide if it can be reused in OpenTox  ?

Munich?

Best regards,
Christoph


