[OTDev] Datasets with Features for multi entity relationships ? Models & Algorithms

Mon Dec 6 12:53:36 CET 2010

Surajit,

I am resending the examples in the attachment. Here is my previous
comment:

Here is the promised example of a complete nearest neighbor prediction.
It is more complicated than the previous substructure example and
contains:

1. The query compound
(http://localhost/compound/InChI=1S/C6H8N2/c7-8-6-4-2-1-3-5-6/h1-5,8H,7H2)
2. A lazar prediction for the query compound
(http://localhost/dataset/582/feature/prediction/Hamster%20Carcinogenicity/0);
among other annotations, this feature also has a confidence value
3. Neighbors of the query compound as features
(http://localhost/dataset/582/feature/neighbor/*); these features have
(among other annotations) a compound, a similarity and a measured
activity
4. Substructures of the query compound
(http://localhost/dataset/583/feature/descriptor/*); these are the same
type of fminer/bbrc substructures as in the previous substructure
dataset example
5. Substructures of the neighbors (in data entries of the neighbor
compounds)

In addition to the proposed substructure additions, we would need the
following entries in the OpenTox ontology:

To distinguish between different types of features: ot:Neighbor,
ot:Substructure, ot:ModelPrediction (ot:MeasuredFeature could be useful
for values that come from database searches)
To represent neighbors: ot:measuredFeature, ot:similarity
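
As a rough illustration of how these proposed terms might be used, here is a
minimal Python sketch that tags each feature with one of the proposed types
and attaches ot:similarity and ot:measuredFeature annotations to a neighbor.
The Feature class, the features_of_type helper, the neighbor index 0, the
placeholder compound URI and all numeric values are assumptions made up for
illustration; they are not taken from the attached files or from the OpenTox
API.

```python
from dataclasses import dataclass, field

# Type names proposed in the message; not confirmed ontology entries.
FEATURE_TYPES = {
    "ot:Neighbor",
    "ot:Substructure",
    "ot:ModelPrediction",
    "ot:MeasuredFeature",
}


@dataclass
class Feature:
    """Hypothetical in-memory stand-in for a typed, annotated feature."""
    uri: str
    feature_type: str
    annotations: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject types outside the proposed list.
        if self.feature_type not in FEATURE_TYPES:
            raise ValueError(f"unknown feature type: {self.feature_type}")


def features_of_type(features, feature_type):
    """Return the features carrying the given type."""
    return [f for f in features if f.feature_type == feature_type]


features = [
    Feature(
        "http://localhost/dataset/582/feature/prediction/Hamster%20Carcinogenicity/0",
        "ot:ModelPrediction",
        {"confidence": 0.5},  # invented value
    ),
    Feature(
        "http://localhost/dataset/582/feature/neighbor/0",  # index 0 is assumed
        "ot:Neighbor",
        {
            "compound": "http://localhost/compound/EXAMPLE",  # placeholder URI
            "ot:similarity": 0.8,        # invented value
            "ot:measuredFeature": True,  # invented measured activity
        },
    ),
]

neighbors = features_of_type(features, "ot:Neighbor")
```

With explicit types, a client could pick out the neighbors or the prediction
without having to parse feature URIs.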

The main lesson I learnt from this exercise is to use feature
annotations to represent anything more complex than a single value,
and to indicate the presence of such a "complex" feature by a boolean
value in the data entries. This also allows us to represent multiple
occurrences of the same feature without having to modify the API.
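
The idea can be sketched in a few lines of Python, assuming a plain
dictionary layout. The neighbor indices 0 and 1, the annotation keys, the
similarity and activity values and the complex_features helper are all
hypothetical; only the query compound URI and the neighbor URI pattern come
from the example above.

```python
# Annotations carry the "complex" part of each feature.
# URIs follow the neighbor/* pattern; indices and values are invented.
feature_annotations = {
    "http://localhost/dataset/582/feature/neighbor/0": {
        "similarity": 0.8,
        "measured_activity": True,
    },
    "http://localhost/dataset/582/feature/neighbor/1": {
        "similarity": 0.6,
        "measured_activity": False,
    },
}

QUERY = (
    "http://localhost/compound/"
    "InChI=1S/C6H8N2/c7-8-6-4-2-1-3-5-6/h1-5,8H,7H2"
)

# Data entries stay simple: compound -> {feature URI: boolean}.
# Each neighbor is its own feature, so several occurrences can coexist.
data_entries = {QUERY: {uri: True for uri in feature_annotations}}


def complex_features(compound):
    """Return the annotations of every feature flagged True for a compound."""
    flags = data_entries.get(compound, {})
    return {
        uri: feature_annotations[uri]
        for uri, present in flags.items()
        if present
    }
```

Here the data entry for the query compound holds only boolean flags, and the
details of each neighbor are looked up from the annotations of the
corresponding feature.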

Best regards,
Christoph

Excerpts from surajit ray's message of Mon Dec 06 12:18:03 +0100 2010:
> Hi Christoph,
> 
> Could you please paste the post here ... or at least the subject line
> of the email - so I can search previous mails ?
> 
> Regards
> Surajit
> 
> On 6 December 2010 16:22, Christoph Helma <helma at in-silico.ch> wrote:
> > Dear Surajit,
> > Excerpts from surajit ray's message of Sat Dec 04 13:26:10 +0100 2010:
> >> Hi Nina,
> >>
> >> Here's a question that Christoph asked in the comments under Model API,
> >> which makes a good case for having feature sets and assigning them to
> >> datasets.
> >>
> >> To Quote ---->
> >>
> >> URI returned on Model POST
> >> Posted by Helma Christoph at Oct 01, 2009 09:07 PM
> >> My predictions return not only a prediction_feature, but a lot of
> >> additional information (similarities, neighbors, substructures with
> >> statistical significance, etc) that do not fit very well into our
> >> dataset definition (they are in fact an aggregation of datasets and
> >> features). Any suggestions how to deal with such a situation?
> >
> > In one of my previous posts I have attached an example with such a
> > prediction. It works within the current API and needs only a few ontology
> > additions.
> >
> > Best regards,
> > Christoph
> >
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nearest_neighbor_prediction.turtle
Type: application/octet-stream
Size: 44935 bytes
Desc: not available
URL: <http://lists.opentox.org/pipermail/development/attachments/20101206/5dbfa58a/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nearest_neighbor_prediction.rdfxml
Type: application/octet-stream
Size: 160482 bytes
Desc: not available
URL: <http://lists.opentox.org/pipermail/development/attachments/20101206/5dbfa58a/attachment-0001.obj>