[OTDev] RDF, APIs and ontologies

Nina Jeliazkova jeliazkova.nina at gmail.com
Mon Nov 16 11:59:25 CET 2009


Hi Tobias,


 Hi everyone,

On Fri, 2009-11-13 at 13:49 +0200, Nina Jeliazkova wrote:

Christoph Helma wrote:

Excerpts from Nina Jeliazkova's message of Wed Nov 11 14:54:42 +0100 2009:

Dear Christoph, All,

I would suggest starting with an example.  Before the Friday meeting it
would be good to have a specific idea of how to represent features in RDF.
We can consider the BO ontology for descriptors and the preliminary ontology
for carcinogenicity that Olga Tcheremenskaia showed yesterday during the
online meeting.

So far we have identified the following information as necessary to
describe a feature:

1) Name
2) Units
3) Data type (numeric, string, etc.)
4) Where the feature originates from: this can be an algorithm used to
calculate it, a model, a measurement protocol, a literature reference, or
another data source.

RDF suggestions to represent this information are welcome.

I would represent feature values in the dataset RDF as follows:

	@prefix compound: <http://webservices.in-silico.ch/compound/> .
	@prefix feature: <http://opentox.org/ontologies/features/> .

	compound:{compound_id} feature:{feature_id} {feature_value} .

Examples:

	# Carcinogenicity classification
	# if we are happy with the DSSTOX definition
	compound:InChI=1S/C6H5NO2/c8-7(9)6-4-2-1-3-5-6/h1-5H <http://www.epa.gov/ncct/dsstox/CentralFieldDef.html#ActivityOutcome_CPDBAS_MultiCellCall> true .
	# true and false are boolean literals in N3; you can also define
	# datatypes explicitly (http://www.w3.org/TR/rdf-mt/#dtype_interp)

	# if we want to manage our own definitions
	compound:InChI=1S/C6H5NO2/c8-7(9)6-4-2-1-3-5-6/h1-5H feature:multi_cell_call true .

	# Rat TD50
	compound:InChI=1S/C6H5NO2/c8-7(9)6-4-2-1-3-5-6/h1-5H feature:rat_td50_mmol 0.207 . # implies numeric values

	# BBRC structural feature from supervised graph mining
	# (a more complex feature with name/value pairs)
	compound:InChI=1S/C6H5NO2/c8-7(9)6-4-2-1-3-5-6/h1-5H feature:bbrc_representative [ <#smarts> "NO"; <#p_value> 0.99; <#effect> "activating" ] .
	
	...
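
A minimal sketch of how statements following the pattern above might be
generated, assuming Python. The helper names n3_literal and feature_triple
are hypothetical and not part of any OpenTox API. Full IRIs in angle
brackets are used instead of prefixed names, on the assumption that InChI
characters such as "=", "/" and "(" are easier to handle that way.

```python
COMPOUND_NS = "http://webservices.in-silico.ch/compound/"
FEATURE_NS = "http://opentox.org/ontologies/features/"


def n3_literal(value):
    """Render a Python value as an N3/Turtle literal (hypothetical helper)."""
    # bool must be tested before int/float because bool is a subclass of int
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    # Anything else is written as a quoted string with minimal escaping
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def feature_triple(compound_id, feature_id, value):
    """Return one 'compound feature value .' statement using full IRIs."""
    return (
        f"<{COMPOUND_NS}{compound_id}> "
        f"<{FEATURE_NS}{feature_id}> "
        f"{n3_literal(value)} ."
    )


inchi = "InChI=1S/C6H5NO2/c8-7(9)6-4-2-1-3-5-6/h1-5H"
print(feature_triple(inchi, "multi_cell_call", True))
print(feature_triple(inchi, "rat_td50_mmol", 0.207))
```

Structured values such as the bbrc_representative blank node are left out
of this sketch; they would need their own serialisation.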

GET http://opentox.org/ontologies/features/{feature_id} should return
the feature definitions in RDF like:

	@prefix feature: <http://opentox.org/ontologies/features/> .
	@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

	feature:{feature_id} rdfs:label {feature_name} .
	# I would have to find an ontology entry for units, maybe there is
	# something in Blue Obelisk or ChemAxon
	feature:{feature_id} whatever:unit {feature_unit} .
	# have to find a suitable ontology
	feature:{feature_id} whatever:source {uri_for_algorithm_or_model_or_protocol_or_reference} .
	# if we need to specify algorithm/model/... parameters
	# (have to find a suitable ontology)
	{uri_for_algorithm_or_model_or_protocol_or_reference} whatever:parameters {parameter_value} .

Examples:

	feature:multi_cell_call rdfs:label "DSSTOX/CPDB Multi Cell Call" .
	# no unit - nothing to define here
	feature:multi_cell_call whatever:source <http://www.epa.gov/ncct/dsstox/StructureDataFiles/CPDBAS_DownloadFiles/CPDBAS_v5d_1547_20Nov2008.zip> . # source file

	# http://www.epa.gov/ncct/dsstox/CentralFieldDef.html#TD50_Rat_mmol
	feature:rat_td50_mmol whatever:unit "mmol/kg-bw/day" .
	feature:rat_td50_mmol whatever:source <http://www.epa.gov/ncct/dsstox/StructureDataFiles/CPDBAS_DownloadFiles/CPDBAS_v5d_1547_20Nov2008.zip> . # source file

	feature:bbrc_representative rdfs:label "Backbone refinement class representatives" .
	feature:bbrc_representative whatever:source <http://webservices.in-silico.ch/algorithms/fminer> .
	<http://webservices.in-silico.ch/algorithms/fminer> whatever:parameters [ <#dataset_uri> <http://webservices.in-silico.ch/dataset/3> ] .

POSTing the same RDF to http://opentox.org/ontologies/features/ should
create http://opentox.org/ontologies/features/{feature_id}. PUT and
DELETE would work analogously.
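
The GET/POST/PUT/DELETE scheme described above could be exercised from a
client roughly as follows. This is only a sketch using Python's standard
library: the function name feature_request is hypothetical, the
"text/n3" content type is an assumption (the thread does not fix a media
type), and the requests are built but not sent.

```python
from urllib.request import Request

FEATURE_BASE = "http://opentox.org/ontologies/features/"


def feature_request(method, feature_id=None, rdf_body=None):
    """Build (but do not send) an HTTP request for the feature collection
    (feature_id is None) or for a single feature definition."""
    url = FEATURE_BASE if feature_id is None else FEATURE_BASE + feature_id
    data = rdf_body.encode("utf-8") if rdf_body is not None else None
    # Assumed media type; the service may expect something else
    headers = {"Content-Type": "text/n3"} if data is not None else {}
    return Request(url, data=data, headers=headers, method=method)


rdf = 'feature:rat_td50_mmol whatever:unit "mmol/kg-bw/day" .'

create = feature_request("POST", rdf_body=rdf)             # create a feature
read = feature_request("GET", "rat_td50_mmol")             # fetch definition
update = feature_request("PUT", "rat_td50_mmol", rdf)      # replace definition
remove = feature_request("DELETE", "rat_td50_mmol")        # delete definition

for req in (create, read, update, remove):
    print(req.get_method(), req.full_url)
```

Passing a request object to urllib.request.urlopen would actually send it;
that step is omitted here because the service behaviour is still under
discussion.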


For everybody's convenience, I am gathering links to existing
ontologies at
http://opentox.org/dev/apis/api-1.1/feature_ontology/ontologies_existing/onto_list
There are links to various ontologies related to chemistry and data mining,
as well as generic ones such as Dublin Core and measurement units.

The proposal sounds reasonable as a start.  We will no doubt be refining a
lot of things when going into implementation.

I would propose:
1) Every OpenTox object should make use of the Dublin Core ontology to
define title, subject, description, type, source, relation, creator and
publisher.  An excerpt from the Dublin Core elements is below:
http://dublincore.org/documents/usageguide/elements.shtml
4.1. Title
4.2. Subject
4.3. Description
4.4. Type
4.5. Source
4.6. Relation
4.8. Creator
4.9. Publisher
4.10. Contributor
4.11. Rights
4.12. Date
4.13. Format
4.14. Identifier
4.16. Audience
4.17. Provenance

For example, the "Source" element can be used to refer to the algorithm
used to generate a feature, or could refer to the original data source or
publication.  The "Relation" element can be used to denote that the feature
is, e.g., a carcinogenicity endpoint, by referring to a carcinogenicity
ontology.
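
As a rough illustration of the Source and Relation usage above, the
following Python sketch emits Dublin Core statements for a feature. The
function name dc_statements is hypothetical; the namespace is that of the
Dublin Core Metadata Element Set 1.1, and the feature and algorithm URIs
are the ones used earlier in this thread.

```python
DC_NS = "http://purl.org/dc/elements/1.1/"


def dc_statements(subject_uri, title=None, source_uri=None, relation_uri=None):
    """Return N3/Turtle statements describing subject_uri with Dublin Core
    title, source and relation (only those that are given)."""
    statements = []
    if title is not None:
        escaped = title.replace("\\", "\\\\").replace('"', '\\"')
        statements.append(f'<{subject_uri}> <{DC_NS}title> "{escaped}" .')
    if source_uri is not None:
        # e.g. the algorithm, data source or publication behind the feature
        statements.append(f"<{subject_uri}> <{DC_NS}source> <{source_uri}> .")
    if relation_uri is not None:
        # e.g. an entry in an endpoint ontology
        statements.append(f"<{subject_uri}> <{DC_NS}relation> <{relation_uri}> .")
    return statements


lines = dc_statements(
    "http://opentox.org/ontologies/features/bbrc_representative",
    title="Backbone refinement class representatives",
    source_uri="http://webservices.in-silico.ch/algorithms/fminer",
)
print("\n".join(lines))
```

No relation is given in the usage example because the thread does not yet
name a URI for the carcinogenicity ontology.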


2) Does the proposal mean we abandon the API that allows retrieving
feature values, given compound and feature identifiers?


Another question: does the proposal imply that features are now coupled
to datasets? That would mean that we cannot store features for a compound
that is not in a dataset. Or am I missing something?

 This is what was left without discussion (IMHO).  I am not sure this is a
good option; there are a lot of compound properties which are independent of
any dataset.


If I calculate descriptors with the new API... do I update the dataset
or do I create a new one? The latter might lead to a huge number of
datasets and maybe even redundancy.

 Exactly.  From my point of view compounds are separate entities, compounds
have features, and datasets are purely for denoting subsets of compounds and
features.  Hence my disagreement with the proposal to abandon the feature API.
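
The view described above (compounds own their features, datasets only
select subsets) can be sketched as a small data model. This is an
illustration in Python with hypothetical class names, not a description of
any existing OpenTox implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Compound:
    uri: str
    # feature URI -> value; stored on the compound, independent of datasets
    features: dict = field(default_factory=dict)


@dataclass
class Dataset:
    # A dataset only names which compounds and which features it covers
    compound_uris: set
    feature_uris: set

    def view(self, compounds):
        """Return {compound URI: {feature URI: value}} restricted to the
        compounds and features this dataset selects."""
        return {
            c.uri: {f: v for f, v in c.features.items() if f in self.feature_uris}
            for c in compounds
            if c.uri in self.compound_uris
        }


nitrobenzene = Compound(
    "compound:1",
    {"feature:multi_cell_call": True, "feature:rat_td50_mmol": 0.207},
)
other = Compound("compound:2", {"feature:rat_td50_mmol": 1.5})

td50_only = Dataset({"compound:1"}, {"feature:rat_td50_mmol"})
print(td50_only.view([nitrobenzene, other]))
```

Under this model, calculating a new descriptor adds an entry to the
compound's features without creating a new dataset, and a compound can
carry features without belonging to any dataset. The compound identifiers
and the value 1.5 are made up for the example.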


I have to admit that removing the feature API has led to some confusion
on my/our side... I hope this clears up as soon as the ontology API is
there, which should contain a follow-up for the feature_definitions and
references, if I am understanding things right (and something for the
algorithm ontology). Did we set a time frame for that? (I am not sure
anymore... it was a long meeting.)

 There is a meeting scheduled for Friday.

Best regards,
Nina


Best Regards
Tobias

Best regards,
Nina

Best regards,
Christoph
_______________________________________________
Development mailing list
Development at opentox.org
http://www.opentox.org/mailman/listinfo/development
