[OTDev] RDF, APIs and ontologies

Tobias Girschick tobias.girschick at in.tum.de
Mon Nov 16 11:43:06 CET 2009


Hi everyone,

On Fri, 2009-11-13 at 13:49 +0200, Nina Jeliazkova wrote: 
> 
> Christoph Helma wrote:
> > Excerpts from Nina Jeliazkova's message of Wed Nov 11 14:54:42 +0100 2009:
> >   
> >> Dear Christoph, All,
> >>
> >> I would suggest starting with an example.  Before Friday's meeting it
> >> would be good to have a specific idea of how to represent features in
> >> RDF.  We can consider the BO ontology for descriptors and the
> >> preliminary carcinogenicity ontology Olga Tcheremenskaia showed
> >> yesterday during the online meeting.
> >>
> >> So far we have identified the following information as necessary to
> >> describe a feature:
> >>
> >> 1) Name
> >> 2) Units
> >> 3) Data type (numeric, string, etc.)
> >> 4) Where the feature originates from: this can be an algorithm used to
> >> calculate it, a model, a measurement protocol, a literature reference,
> >> or another data source.
> >>
> >> RDF suggestions to represent this information are welcome. 
> >>     
> >
> > I would represent feature values in the dataset RDF as follows:
> >
> > 	@prefix compound: <http://webservices.in-silico.ch/compound/> .
> > 	@prefix feature: <http://opentox.org/ontologies/features/> .
> >
> > 	compound:{compound_id} feature:{feature_id} {feature_value} .
> >
> > Examples:
> >
> > 	# Carcinogenicity classification
> > 	# if we are happy with the DSSTOX definition
> > 	compound:InChI=1S/C6H5NO2/c8-7(9)6-4-2-1-3-5-6/h1-5H <http://www.epa.gov/ncct/dsstox/CentralFieldDef.html#ActivityOutcome_CPDBAS_MultiCellCall> true . # true and false are boolean literals in N3, you can also define datatypes explicitly (http://www.w3.org/TR/rdf-mt/#dtype_interp)
> >
> > 	# if we want to manage our own definitions
> > 	compound:InChI=1S/C6H5NO2/c8-7(9)6-4-2-1-3-5-6/h1-5H feature:multi_cell_call true . 
> >
> > 	# Rat TD50
> > 	compound:InChI=1S/C6H5NO2/c8-7(9)6-4-2-1-3-5-6/h1-5H feature:rat_td50_mmol 0.207 . # implies numeric values
> >
> > 	# BBRC structural feature from supervised graph mining
> > 	compound:InChI=1S/C6H5NO2/c8-7(9)6-4-2-1-3-5-6/h1-5H feature:bbrc_representative  [ <#smarts> "NO"; <#p_value> 0.99;  <#effect> "activating"  ]. # a more complex feature with name/value pairs
> > 	
> > 	...
> >
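To make the pattern above concrete, here is a minimal Python sketch that writes one such statement per (compound, feature, value). It is only an illustration: the helper names are made up, and it assumes the compound and feature URIs are written out in full with the identifier percent-encoded, because characters such as "/", "=" and "(" in an InChI are not valid in an N3 prefixed name. Whether the compound service expects this encoding is an open assumption.

```python
from urllib.parse import quote

# Base URIs taken from the proposal above.
COMPOUND_BASE = "http://webservices.in-silico.ch/compound/"
FEATURE_BASE = "http://opentox.org/ontologies/features/"


def literal(value):
    """Render a Python value as an N3 literal (hypothetical helper)."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)  # finite numbers only; inf/nan are not handled
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def feature_statement(compound_id, feature_id, value):
    """Build one 'compound feature value .' statement with full URIs."""
    # Assumption: identifiers are percent-encoded into a single path segment.
    subject = f"<{COMPOUND_BASE}{quote(compound_id, safe='')}>"
    predicate = f"<{FEATURE_BASE}{quote(feature_id, safe='')}>"
    return f"{subject} {predicate} {literal(value)} ."


inchi = "InChI=1S/C6H5NO2/c8-7(9)6-4-2-1-3-5-6/h1-5H"
print(feature_statement(inchi, "multi_cell_call", True))
print(feature_statement(inchi, "rat_td50_mmol", 0.207))
```

Booleans and numbers are emitted as bare literals, matching the examples above; anything else is written as a quoted string.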
> > GET http://opentox.org/ontologies/features/{feature_id} should return the feature definitions in RDF like:
> >
> > 	@prefix feature: <http://opentox.org/ontologies/features/> .
> > 	@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
> >
> > 	feature:{feature_id} rdfs:label {feature_name} .
> > 	feature:{feature_id} whatever:unit {feature_unit} . # I would have to find an ontology entry, maybe there is something in blueobelisc or chemaxon
> > 	feature:{feature_id} whatever:source {uri_for_algorithm_or_model_or_protocol_or_reference} . # have to find a suitable ontology
> > 	# if we need to specify algorithm/model/... parameters
> > 	{uri_for_algorithm_or_model_or_protocol_or_reference} whatever:parameters {parameter_value} . # have to find a suitable ontology
> >
> > Examples:
> >
> > 	feature:multi_cell_call rdfs:label "DSSTOX/CPDB Multi Cell Call" .
> > 	# no unit - nothing to define here
> > 	feature:multi_cell_call whatever:source <http://www.epa.gov/ncct/dsstox/StructureDataFiles/CPDBAS_DownloadFiles/CPDBAS_v5d_1547_20Nov2008.zip> . # source file
> > 	# http://www.epa.gov/ncct/dsstox/CentralFieldDef.html#TD50_Rat_mmol
> > 	feature:rat_td50_mmol whatever:unit "mmol/kg-bw/day" .
> > 	feature:rat_td50_mmol whatever:source <http://www.epa.gov/ncct/dsstox/StructureDataFiles/CPDBAS_DownloadFiles/CPDBAS_v5d_1547_20Nov2008.zip> . # source file
> > 	feature:bbrc_representative rdfs:label "Backbone refinement class representatives" .
> > 	feature:bbrc_representative whatever:source <http://webservices.in-silico.ch/algorithms/fminer> .
> > 	<http://webservices.in-silico.ch/algorithms/fminer> whatever:parameters [ <#dataset_uri> <http://webservices.in-silico.ch/dataset/3> ] .
> >
> > POSTing the same RDF to http://opentox.org/ontologies/features/ should
> > create http://opentox.org/ontologies/features/{feature_id}. PUT and
> > DELETE would work in analogy.
> >
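A rough sketch of how a client could form these four requests, using only Python's standard library. The requests are built but not sent. The media type "text/n3" and the helper name are assumptions for illustration; the proposal does not fix either, and "whatever:" style placeholders are left out of the payload.

```python
from urllib.request import Request

FEATURE_BASE = "http://opentox.org/ontologies/features/"
N3_MEDIA_TYPE = "text/n3"  # assumption: the service may prefer another RDF type


def feature_request(method, feature_id=None, rdf=None):
    """Build (but do not send) a request against the feature ontology URI."""
    url = FEATURE_BASE if feature_id is None else FEATURE_BASE + feature_id
    headers = {"Accept": N3_MEDIA_TYPE}
    data = None
    if rdf is not None:
        data = rdf.encode("utf-8")
        headers["Content-Type"] = N3_MEDIA_TYPE
    return Request(url, data=data, headers=headers, method=method)


definition = (
    "@prefix feature: <http://opentox.org/ontologies/features/> .\n"
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .\n"
    'feature:multi_cell_call rdfs:label "DSSTOX/CPDB Multi Cell Call" .\n'
)

create = feature_request("POST", rdf=definition)                      # new feature
read = feature_request("GET", "multi_cell_call")                      # definition
replace = feature_request("PUT", "multi_cell_call", rdf=definition)   # overwrite
delete = feature_request("DELETE", "multi_cell_call")                 # remove

for req in (create, read, replace, delete):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to urllib.request.urlopen would actually perform the call, which only makes sense once the service exists.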
> >   
> For everybody's convenience, I am gathering links to existing
> ontologies at
> http://opentox.org/dev/apis/api-1.1/feature_ontology/ontologies_existing/onto_list
> There are links to various ontologies related to chemistry and data
> mining, as well as generic ones such as Dublin Core and measurement units.
> 
> The proposal sounds reasonable as a start.  We will no doubt be refining
> a lot of things when going into implementation.
> 
> I would propose
> 1) Every OpenTox object makes use of the Dublin Core ontology to define
> title, subject, description, type, source, relation, creator and
> publisher.  An excerpt from the Dublin Core elements is below:
> http://dublincore.org/documents/usageguide/elements.shtml
> 4.1. Title
> 4.2. Subject
> 4.3. Description
> 4.4. Type
> 4.5. Source
> 4.6. Relation
> 4.8. Creator
> 4.9. Publisher
> 4.10. Contributor
> 4.11. Rights
> 4.12. Date
> 4.13. Format
> 4.14. Identifier
> 4.16. Audience
> 4.17. Provenance
> 
> For example, the "Source" element can be used to refer to the algorithm
> used to generate a feature, or could refer to the original data source or
> publication.  The "Relation" element can be used to denote that the
> feature is, e.g., a carcinogenicity endpoint, by referring to a
> carcinogenicity ontology.
> 
> 
> 2) Does the proposal mean we abandon the API that allows retrieving
> feature values, given compound and feature identifiers?

Another question: Does the proposal imply that features are now coupled
to datasets? That would mean that we cannot store features for a
compound that is not in a dataset. Or am I missing something?

If I calculate descriptors with the new API, do I update the dataset
or do I create a new one? The latter might lead to a huge number of
datasets and maybe even redundancy.
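To make the question concrete, here is a small Python sketch of the decoupled alternative as I picture it: feature values hang on the compound, and a dataset is only a set of compound URIs. This is purely my illustration under that assumption, not something the proposal specifies; the class, its methods and the short URIs are hypothetical.

```python
from collections import defaultdict


class FeatureStore:
    """Hypothetical in-memory store; values are keyed by compound, not dataset."""

    def __init__(self):
        self._values = defaultdict(dict)  # compound URI -> {feature URI: value}
        self.datasets = defaultdict(set)  # dataset URI -> set of compound URIs

    def add_value(self, compound, feature, value):
        """Attach a feature value to a compound, in or out of any dataset."""
        self._values[compound][feature] = value

    def add_to_dataset(self, dataset, compound):
        """Record dataset membership without copying any feature values."""
        self.datasets[dataset].add(compound)

    def get_value(self, compound, feature):
        """Look up one value by compound and feature; None if unknown."""
        return self._values.get(compound, {}).get(feature)

    def dataset_values(self, dataset, feature):
        """Collect the values of one feature for all members of a dataset."""
        return {
            c: self._values[c][feature]
            for c in sorted(self.datasets[dataset])
            if feature in self._values.get(c, {})
        }


store = FeatureStore()
store.add_value("compound:1", "feature:rat_td50_mmol", 0.207)
print(store.get_value("compound:1", "feature:rat_td50_mmol"))   # no dataset yet
store.add_to_dataset("dataset:3", "compound:1")
print(store.dataset_values("dataset:3", "feature:rat_td50_mmol"))
```

In this picture a newly calculated descriptor only adds values to existing compounds, so neither a dataset update nor a new dataset is needed. If features really are tied to datasets, this sketch does not apply.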

I have to admit that removing the feature API has led to some confusion
on my/our side. I hope this clears up as soon as the ontology API is
there, which should contain a follow-up for the feature_definitions and
references, if I am understanding things right (and something for the
algorithm ontology). Did we set a time frame for that? (I am not sure
anymore; it was a long meeting.)

Best Regards
Tobias

> 
> Best regards,
> Nina
> 
> > Best regards,
> > Christoph
> > _______________________________________________
> > Development mailing list
> > Development at opentox.org
> > http://www.opentox.org/mailman/listinfo/development
> >   
> 


-- 
Dipl.-Bioinf. Tobias Girschick

Technische Universität München
Institut für Informatik
Lehrstuhl I12 - Bioinformatik
Boltzmannstr. 3
85748 Garching b. München, Germany

Room: MI 01.09.042
Phone: +49 (89) 289-18002
Email: tobias.girschick at in.tum.de
Web: http://wwwkramer.in.tum.de/girschick



