[OTDev] Uploading non-standard datasets

Tue Sep 28 07:05:17 CEST 2010

On 27 September 2010 20:33, chung <chvng at mail.ntua.gr> wrote:
> On Mon, 2010-09-27 at 20:03 +0530, surajit ray wrote:
>> Hi,
>>
>> Well, having them as features will not cut it, for the simple reason
>> that the "feature" in this case belongs to the input dataset (or
>> whichever set is being worked upon).
>
> As far as I know, both as generally conceived in OpenTox and as
> implemented in AMBIT, features are separate from datasets and can
> stand alone. That is, you can have a feature that does not appear in
> any dataset. You might only have a pointer to a dataset through the
> object property 'ot:hasSource', but this does not bind the feature to
> the dataset. However, I'm not sure I understood you correctly.
>

In the current framework a feature may or may not be part of a dataset,
but it has to be assigned to a compound. The question is: can a feature
be assigned to a dataset?

To be clear, let's consider an example: Dataset X has 20 compounds. I
create a feature which is assigned to Dataset X (that is, Dataset X will
have a property which is this feature). The compounds within the dataset
will not have this feature. In our case the feature could be another
dataset, Dataset B, of fragments which belong to Dataset X (when it is
examined as a whole in a pairwise fashion). Individual compounds of
Dataset X may have some fragments of Dataset B. However, the same
compounds (of Dataset X) may have a different set of fragments as
features when examined in the context of Dataset Y (containing these
compounds along with other, newer compounds).

>> The compound itself may not have
>> a substructure, but it may be part of a dataset which, when examined,
>> will have the substructure appearing during the pairwise
>> comparisons.
>
> If you need to establish a relationship between such a feature and a
> compound (so that, given the feature, you can retrieve the
> fragment/compound to which it refers in any supported MIME type), then
> we can extend the range of the property ot:hasSource to also include
> ot:Compound and assign a compound URI to such features, i.e. something
> like:
>
> /feature/123
>        a ot:Feature
>        ot:hasSource /compound/435
>
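Pantelis's proposal can be sketched as plain N-Triples. This is a minimal Python sketch, not the AMBIT implementation: it assumes `http://www.opentox.org/api/1.1#` as the ot: namespace and uses the hypothetical server base `http://someserver.com`; the helper `feature_triples` is illustrative only. Passing two compound IDs shows how a fragment found by a pairwise comparison would carry two ot:hasSource values.

```python
# Minimal sketch: serialise an ot:Feature and its ot:hasSource links as N-Triples.
# Assumptions: the ot: namespace and the server base URI below are illustrative.
OT = "http://www.opentox.org/api/1.1#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://someserver.com"  # hypothetical server


def feature_triples(feature_id, compound_ids):
    """Return N-Triples lines declaring an ot:Feature with one ot:hasSource per compound."""
    feature = f"<{BASE}/feature/{feature_id}>"
    lines = [f"{feature} <{RDF_TYPE}> <{OT}Feature> ."]
    for cid in compound_ids:
        # Each source compound gets its own ot:hasSource triple.
        lines.append(f"{feature} <{OT}hasSource> <{BASE}/compound/{cid}> .")
    return lines


if __name__ == "__main__":
    # One source, mirroring the /feature/123 example.
    print("\n".join(feature_triples(123, [435])))
    # Two sources (hypothetical IDs): a fragment produced by one pairwise comparison.
    print("\n".join(feature_triples(124, [435, 436])))
```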

A substructure in our case is the result of a pairwise comparison, so
if treated as a feature it should have two hasSource values. Also, a
single pairwise comparison is of little value. The set of substructures
obtained by doing this operation over a whole dataset can provide more
valuable information once it is used in a mathematical model such as an
SVM. Thus the feature (representing all the substructures found by
pairwise comparisons within a dataset) belongs to the dataset rather
than to individual compounds. Alternatively, if we try to use the
present framework, this feature (the set of substructures) belongs to a
compound only when the compound is part of the dataset that was
examined to create it, AND the compound itself cannot be the direct
owner of this feature (since the feature will contain substructures
that are not part of this compound!).

> But then I'm not sure whether the following are also needed:
>
> 1. Declare that the feature above is an ot:SubstructureFeature (new), or
> at least declare that it is boolean.
>
> 2. Make it explicit that the above compound is an ot:Fragment (new).
>
> Maybe we can go without introducing extra classes.
>

I do not entirely agree with that. Although it makes sense to cover as
much ground as possible with the present classes, it would be really
short-sighted to fit in "everything" by arm-twisting definitions.

>>
>> Using the features system in this manner is not (IMHO) the solution to
>> this problem. It will really be cumbersome to maintain the feature
>> URIs, which may number in the many thousands and will be extremely
>> transient. In effect, a lot of resources will be hogged by a system
>> which could make do with a much simpler implementation.
>
> That is not really a problem. A feature is a very small entry in a
> database. There are enterprises that maintain databases of some tens of
> terabytes or even more.
>

A small fact we are forgetting is that we are not operating on VPNs or
dedicated networks here. Maintaining terabytes of data is what Oracle
specializes in, and even they would quail if asked to deliver such
services as a networked resource over the internet (as opposed to an
intranet or a local Ethernet network). Apart from this, the other
important fact is that if we treat the fragments as "features", we will
have to maintain a vast array of relationships among such substructure
"features" to prevent redundant "features" appearing on the fly over
many algorithm runs.
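One way to keep redundant fragment "features" from piling up across runs is to look a feature up by its structure before creating a new one. The following is a minimal in-memory sketch under stated assumptions: fragments arrive as already-canonical SMILES strings (real deduplication would need a chemistry toolkit to canonicalise them), and the class `FragmentFeatureRegistry` and its feature URI scheme are hypothetical.

```python
class FragmentFeatureRegistry:
    """In-memory map from fragment SMILES to a single feature URI (illustrative only)."""

    def __init__(self, base="http://someserver.com/feature"):  # hypothetical base URI
        self.base = base
        self._by_smiles = {}

    def get_or_create(self, smiles):
        """Return the existing feature URI for smiles, creating one only if it is new.

        Assumes smiles is already canonical; two spellings of the same
        structure would otherwise produce two separate features.
        """
        if smiles not in self._by_smiles:
            self._by_smiles[smiles] = f"{self.base}/{len(self._by_smiles) + 1}"
        return self._by_smiles[smiles]

    def __len__(self):
        return len(self._by_smiles)


registry = FragmentFeatureRegistry()
for run in (["C=O", "CO"], ["C=O", "CN"]):  # two hypothetical algorithm runs
    for smi in run:
        registry.get_or_create(smi)
print(len(registry))  # 3 distinct features, not 4
```

The cost Surajit describes is visible even here: every run must consult shared state before it can mint a feature URI.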

>> Moreover, a certain feature in such a system will be part of a
>> compound if it is part of Dataset A, and may not be part of the same
>> compound when examined in Dataset B.
>>
>
> This is true. For example, if a compound does not contain C=O, it
> obviously will not contain CC=O or, in general, RC=O.
>

Let me be clearer here.

Let's say we have Compounds A and B in Dataset C, and a comparison
produces C=O as a substructure common to both. This fragment will then
belong to the set of fragments obtained after examining the whole of
Dataset C. Now we have Compounds A and X in another dataset, Dataset D.
Since there is no Compound B in this dataset, C=O will not appear in
the fragment set (assume for the moment that none of the other
comparisons within Dataset D produces this fragment).

So the feature C=O is assigned to Compound A only when there is a
Compound B to compare it with or, more generally, when Compound A is
examined within the context of Dataset C. This feature cannot be
assigned to Compound A if there is no Compound B to compare it with.
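The Dataset C / Dataset D example can be made concrete by computing a compound's fragment features per (dataset, compound) pair rather than per compound. In this minimal sketch the fragment sets for A, B and X are hypothetical, the function name `context_fragments` is illustrative, and set intersection stands in for the real pairwise substructure comparison.

```python
def context_fragments(compound, dataset, fragments_by_compound):
    """Fragments of `compound` that it shares with at least one other compound in `dataset`.

    The result depends on the dataset, so it is keyed by (dataset, compound),
    not by the compound alone. Set intersection stands in for a real
    pairwise substructure comparison.
    """
    own = fragments_by_compound[compound]
    shared = set()
    for other in dataset:
        if other != compound:
            shared |= own & fragments_by_compound[other]
    return shared


# Hypothetical fragment sets: A and B share C=O, X does not contain it.
fragments = {"A": {"C=O", "CC"}, "B": {"C=O", "CN"}, "X": {"CCl"}}
dataset_c = ["A", "B"]
dataset_d = ["A", "X"]

print(context_fragments("A", dataset_c, fragments))  # {'C=O'}
print(context_fragments("A", dataset_d, fragments))  # set()
```

The same compound A yields C=O in Dataset C and nothing in Dataset D, so a single compound-level feature value cannot capture both cases.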

>> Summing up, here are a few things I would like in the next API:
>>
>> a) Ability to upload bulk compounds from scratch, using a dataset
>> construct (and not posting single compounds)
>
> I think this is supported. You can POST a dataset with a set of new
> compounds. If one or more compounds are not found in the database of the
> server they should be created.
>
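POSTing a dataset that contains new compounds might look like the following Python sketch. It only builds the request with the standard library and does not send it. The URL reuses the someserver.com placeholder from this thread, and the `chemical/x-mdl-sdfile` content type and `text/uri-list` Accept header are assumptions to check against the AMBIT dataset service documentation; `build_dataset_upload` is a hypothetical helper.

```python
import urllib.request

# Hypothetical service URL; replace with the real AMBIT dataset service.
DATASET_SERVICE = "http://someserver.com/dataset"

# Single-atom SD record (methane), for illustration only.
METHANE_SDF = (
    "methane\n"
    "  sketch\n"
    "\n"
    "  1  0  0  0  0  0  0  0  0  0999 V2000\n"
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n"
    "M  END\n"
    "$$$$\n"
)


def build_dataset_upload(sdf_text):
    """Build (but do not send) a POST request that uploads an SD file as a new dataset.

    Assumes the service accepts chemical/x-mdl-sdfile and answers with a
    URI (dataset or task) when asked for text/uri-list.
    """
    return urllib.request.Request(
        DATASET_SERVICE,
        data=sdf_text.encode("utf-8"),
        headers={
            "Content-Type": "chemical/x-mdl-sdfile",
            "Accept": "text/uri-list",
        },
        method="POST",
    )


request = build_dataset_upload(METHANE_SDF)
# Sending is deliberately left out; urllib.request.urlopen(request) would perform the POST.
```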

Are we sure of this? Maybe Nina can verify it.

>> b) Ability to assign features to datasets
>
> You mean "to append" features or have some structured meta information
> about the dataset itself?
>

Yes, assign features to datasets and not to individual compounds.

Regards
Surajit

>> c) Ability to have non-standard datasets/compounds which contain
>> substructures rather than molecules.
>>
>> Regards
>> Surajit
>
> Best regards,
> Pantelis
>>
>> On 27 September 2010 18:31, chung <chvng at mail.ntua.gr> wrote:
>> > Hi Surajit,
>> >   As far as I can understand, you have a problem similar to the one I
>> > was discussing with Alexey from IBMC. You need a way to define which
>> > substructures are present in a certain structure. For this purpose you
>> > have to use features and not compounds, so you need a collection of
>> > features, each of which corresponds to a certain substructure.
>> > However, in Ambit you can create a new compound by POSTing it
>> > to /compound in a supported MIME type (e.g. SMILES, SDF, etc.), for example
>> > 'curl -X POST --data-binary @/path/to/file.sdf -H Content-type:blah/blah+sdf
>> > http://someserver.com/compound'. What is needed in OpenTox, though,
>> > is a collection of substructures in a feature service and a way to
>> > look up a certain feature by its structure (e.g. by providing
>> > its SMILES representation).
>> >
>> > Best Regards,
>> > Pantelis
>> >
>> > On Mon, 2010-09-27 at 14:18 +0530, surajit ray wrote:
>> >
>> >> Hi Nina,
>> >>
>> >> I need to upload some fragments (I have SMILES representations) into a
>> >> dataset. Is this possible in the current framework?
>> >>
>> >> To be more elaborate -
>> >> Currently I am uploading a dataset with compounds as links to the
>> >> respective compound URIs (which happens at the end of the online
>> >> MaxtoxTest service). How would I upload new compounds (with SMILES/MOL
>> >> representations)? And secondly, if these (the upload set) happen to be
>> >> fragments (and not molecules), is there a way to store such information
>> >> using the AMBIT dataset service?
>> >>
>> >> Thanx
>> >> Surajit
>> >> _______________________________________________
>> >> Development mailing list
>> >> Development at opentox.org
>> >> http://www.opentox.org/mailman/listinfo/development
>> >>
>> >
>> >