[OTDev] RDF for dataset representation

Mon Nov 2 12:15:53 CET 2009

Hi Christoph,

On Thu, 2009-10-29 at 12:34 +0100, Christoph Helma wrote: 
> Excerpts from Nina Jeliazkova's message of Wed Oct 28 14:45:27 +0100 2009:
> 
> > - Do we need to make distinction between different e.g. XLogP
> > implementations (I would say yes) ?  Is it possible to handle this via
> > BO ontology, or we need an extension?
> 
> Egon? 
> 
> > - What would be the best way to extend BO ontology (this is more a
> > question to Egon)?
> > - How would we handle quantities, defined in existing data sets (e.g.
> > all LogP flavours available in EPA DSSTOX), not calculated via OpenTox,
> 
> For DSSTOX we can use the URI of the field definitions, e.g.
> 
> <http://www.epa.gov/ncct/dsstox/StandardChemFieldDefTable.html#STRUCTURE_MolecularWeight> or
> <http://www.epa.gov/ncct/dsstox/CentralFieldDef.html#ActivityOutcome_CPDBAS_Rat>
> 
> > or an user uploaded dataset.
> 
> If the user is unable to link to an existing ontology, (s)he still can
> use local links (e.g. <#my_new_algorithm>) as predicates (although that
> will not be very useful to put the results into a meaningful context,
> but it can be sufficient for computational experiments).
> 
> > - How to handle quantities, calculated via some algorithm, but with
> > different parameters (e.g. eHOMO calculated with AM1 or PM3). 
> 
> I think that this could be resolved at the ontology level (e.g.
> ontology:eHOMO/AM1 vs ontology:eHOMO/PM3 or
> ontology:eHOMO?parameters=AM1). See also the next point.
> 
> > I would prefer that the property (e.g. blueobelisk:xlogp) refer to a
> > specific implementation, rather to the algorithm itself  (same concept
> > as algorithm/model split we already invented).
> > The implementation itself will be linked to the algorithm.
> 
> The predicate (i.e. property) could be the URI of the service that has
> calculated the value. To make the process completely reproducible, we
> would need to provide the POST URI together with all parameters - I am
> not sure if RDF supports this.
> 
> > Looking into the current list of feature definitions in Ambit
> > (http://ambit.uni-plovdiv.bg:8080/ambit2/feature_definition ), most of
> > them can be mapped to existing or to-be-developed ontologies, but we
> > need to extend your proposal in a way to keep track of the source of the
> > data.
> > 
> > For example it is important to know that feature MolWeight 
> > <http://ambit.uni-plovdiv.bg:8080/ambit2/feature_definition/12109> is
> > representing Molecular weight, but I would not want to lose the
> > information it came from ISSCAN_v3a_1153_19Sept08.1222179139.sdf
> > <http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html>
> > 
> > http://ambit.uni-plovdiv.bg:8080/ambit2/feature_definition/12109
> > <http://ambit.uni-plovdiv.bg:8080/ambit2/feature_definition/11945>
> > This was the primary reason to invent feature definition to consist of
> > name + reference - I am sure this can be described in RDF as well.
> > 
> 
> Ah, now I get the idea behind the feature-definition.
> 
> > Actually I was thinking of an  (extensible) ontology for SMARTS defined
> > fragments;  ChEBI ontology  has lot of predefined groups that can be
> > used.  Read across use case will benefit from that :)
> 
> Yes, but this should support also arbitrary SMARTS
> substructures that come e.g. from supervised graph mining.
> 
> > We would need a way to handle dynamically defined properties and even
> > ontologies.  I am particularly thinking of user-defined datasets.
> 
> I agree, but I am not sure how to keep user defined ontologies
> consistent. We would need a curation process (who is responsible?), but
> maybe a simple tagging system could also work.
> 
> > There are several Java libraries , even Restlet in 2.x has some support
> > (no querying) - graph structure with serialization to several formats.
> 
> RDF support in Ruby could be better. Redland (http://librdf.org) seems
> to be fairly powerful and has Ruby (as well as Perl, PHP, Python and C)
> bindings, but it requires manual compilation of at least 3 libraries
> (i.e. no convenient 'gem install redland').
> 
> > > I suspect that RDF could be also useful for the representation of other
> > > OpenTox objects (Algorithms, Models, ...).

Regarding RDF: since it is less a format than a concept for describing
knowledge, this should be possible; otherwise I'd say RDF doesn't deliver
on its promises.
And if we use RDF for features and feature_definitions, it would be
nice to describe algorithms and models in a consistent way.
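
To make "consistent" a bit more concrete, here is a minimal sketch in
plain Python (no RDF library) that describes a feature definition, an
algorithm and a model with the same three statements each and prints them
as N-Triples. The Ambit and DSSTox URIs are the ones quoted above; the
http://example.org/ namespace, the class names and the algorithm/model
URIs are placeholders I made up for illustration, not an agreed OpenTox
vocabulary. Using dc:title and dc:source for name + reference is also
just an assumption.

```python
# Sketch only: the OT namespace and the class names are hypothetical.
OT = "http://example.org/opentox#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
DC_SOURCE = "http://purl.org/dc/elements/1.1/source"


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples expects."""
    return "<%s>" % value


def literal(value):
    """Quote a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"%s"' % escaped


def describe(subject, rdf_class, title, source):
    """Return three N-Triples lines: type, name and reference."""
    s = uri(subject)
    return [
        "%s %s %s ." % (s, uri(RDF_TYPE), uri(OT + rdf_class)),
        "%s %s %s ." % (s, uri(DC_TITLE), literal(title)),
        "%s %s %s ." % (s, uri(DC_SOURCE), uri(source)),
    ]


triples = []

# Feature definition = name + reference (URIs taken from this thread).
triples += describe(
    "http://ambit.uni-plovdiv.bg:8080/ambit2/feature_definition/12109",
    "FeatureDefinition",
    "MolWeight",
    "http://www.epa.gov/NCCT/dsstox/sdf_isscan_external.html",
)

# Algorithm and model use the same pattern (placeholder URIs).
triples += describe(
    "http://example.org/algorithm/xlogp",
    "Algorithm",
    "XLogP",
    "http://qsar.sourceforge.net/dicts/blue-obelisk/index.xhtml",
)
triples += describe(
    "http://example.org/model/1",
    "Model",
    "Example model",
    "http://example.org/algorithm/xlogp",
)

for line in triples:
    print(line)
```

The point is only that all three object types share one pattern, so a
client that can read a feature definition can read an algorithm or model
description in the same way.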

> > >   
> > Yes.  Could we have a closer look into Algorithm object in BO dictionary
> > and decide if it can be reused in OpenTox  ?

You were thinking of this dictionary, I suppose:
http://qsar.sourceforge.net/dicts/blue-obelisk/index.xhtml
As far as I can see, the algorithms listed and described there are solely
for descriptor/property calculation (except maybe 2D Layout and
3D Geometry). At the moment there is no categorization for learning
algorithms and related methods, and I don't see any mapping or category
overlap between this dictionary and the algorithms we have implemented so
far. Some descriptor calculations in CDK and JOELib2 probably do
map to the dictionary.
On the other hand, this does not mean it can't be reused, but we would
have to add a lot of categories and classifications (e.g. regression).

Best Regards,
Tobias 
> 
> Munich ?

> 
> Best regards,
> Christoph
> _______________________________________________
> Development mailing list
> Development at opentox.org
> http://www.opentox.org/mailman/listinfo/development
-- 
Dipl.-Bioinf. Tobias Girschick

Technische Universität München
Institut für Informatik
Lehrstuhl I12 - Bioinformatik
Boltzmannstr. 3
85748 Garching b. München, Germany

Room: MI 01.09.042
Phone: +49 (89) 289-18002
Email: tobias.girschick at in.tum.de
Web: http://wwwkramer.in.tum.de/people/girschic