[OTDev] Uploading non-standard datasets
Nina Jeliazkova jeliazkova.nina at gmail.com
Thu Sep 30 13:48:14 CEST 2010
Hi All,

On Mon, Sep 27, 2010 at 6:03 PM, chung <chvng at mail.ntua.gr> wrote:
> On Mon, 2010-09-27 at 20:03 +0530, surajit ray wrote:
> > Hi,
> >
> > Well, having them as features will not cut it, for the simple reason
> > that the "feature" in this case belongs to the input dataset (or
> > whichever set is being worked upon).
>
> No, a feature does not belong to a dataset, but might be associated with
> one or more datasets (or none). As far as I know, as it is generally
> conceived in OpenTox and as far as the implementation in AMBIT is
> concerned, features are separated from datasets and can be standalone.

Exactly. Features are standalone objects and can be created by POSTing to
the /feature service; the only connection is via ot:DataEntry, used in
ot:Dataset. A feature can be used by multiple datasets.

> That is, you can have a feature that does not appear in any datasets. You
> might only have a pointer to a dataset using the object property
> 'ot:hasSource', but this does not somehow bind the feature to the
> dataset. However, I'm not sure if I understood well.

This is correct.

> > The compound itself may not have a substructure, but it may be a part
> > of a dataset which, when examined, will have the substructure appearing
> > while doing the pairwise comparisons.

In the current setup, the substructures are features, like any other
properties. If a compound has no such substructure, it simply has no such
feature assigned via ot:DataEntry. This gives universal access to learning
algorithms, which can work with any kind of features, including
substructures (e.g. there is no need to tell an SVM algorithm whether a
feature is a substructure or anything else).
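To make the decoupling concrete, here is a minimal Python sketch of the idea (all class and attribute names are illustrative, not the actual OpenTox API): features live on their own, and a dataset only references them through its data entries.

```python
# Minimal sketch: features are standalone; datasets reference them
# via data entries, mirroring the ot:Feature / ot:DataEntry split.
# All names here are illustrative, not the actual OpenTox API.

class Feature:
    def __init__(self, uri, title):
        self.uri = uri          # e.g. "http://host/feature/123"
        self.title = title      # e.g. a substructure pattern such as "C=O"

class Dataset:
    def __init__(self, uri):
        self.uri = uri
        self.entries = []       # list of (compound_uri, {feature_uri: value})

    def add_entry(self, compound_uri, values):
        self.entries.append((compound_uri, values))

    def feature_uris(self):
        # The dataset's feature set is derived from its entries,
        # not stored as ownership of the features themselves.
        used = set()
        for _, values in self.entries:
            used.update(values)
        return used

# A feature can exist without belonging to any dataset...
f = Feature("http://host/feature/123", "C=O")
ds = Dataset("http://host/dataset/1")
assert f.uri not in ds.feature_uris()

# ...and the same feature may be referenced by several datasets.
ds.add_entry("http://host/compound/435", {f.uri: True})
ds2 = Dataset("http://host/dataset/2")
ds2.add_entry("http://host/compound/77", {f.uri: False})
print(sorted(ds.feature_uris()))  # the feature is referenced, not owned
```

Deleting a dataset in such a model leaves the feature intact, which matches the "standalone objects" behaviour described above.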
> If you need to establish a relationship between such a feature and a
> compound (so that, given the feature, you can retrieve the
> fragment/compound to which it refers in any supported MIME type), then
> we can extend the range of the property ot:hasSource to include also
> ot:Compound and assign a compound URI to such features, i.e. something
> like:
>
> /feature/123
>   a ot:Feature
>   ot:hasSource /compound/435

We can simply use the current construct and declare that ot:hasSource
points to the algorithm that verifies the presence of substructures, or
finds substructures by any other means. Recall that the purpose of
ot:hasSource is to make it possible to regenerate the feature when applied
to a new compound. If it points to an algorithm, regenerating the feature
is straightforward; if it points to a compound, this will not be possible.

> But then I'm not sure whether the following are also needed:
>
> 1. Declare that the feature above is an ot:SubstructureFeature (new), or
> at least declare that it is boolean.

Extending ot:Feature is reasonable, but it would be better if the
algorithms generating substructures were described in an ontology, like
any other algorithms (e.g. the descriptor calculation ones).

> 2. Make it explicit that the above compound is an ot:Fragment (new)

Maybe we can go without introducing extra classes.

> > Using the features system in this manner is not (IMHO) the solution to
> > this problem. It will really be cumbersome to maintain the feature
> > URIs, which may number in many thousands and will be extremely
> > transient. In effect it will be a lot of resources being hogged by a
> > system which could do with a much simpler implementation.
>
> That is not really a problem. A feature is a very small entry in a
> database. There are enterprises that maintain databases of some tens of
> terabytes or even more.

Indeed.
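The regeneration argument can be sketched in a few lines of Python (a toy model with made-up names; the substring match stands in for a real substructure matcher): when ot:hasSource points to an algorithm, the feature value can be recomputed for any new compound, whereas a bare compound URI gives you nothing to apply.

```python
# Toy model of ot:hasSource pointing to an algorithm: the feature can
# be regenerated for a new compound by re-running that algorithm.
# All URIs and the naive matcher are illustrative only.

def substructure_algorithm(smiles, pattern):
    # Stand-in for a real substructure matcher: naive substring test.
    return pattern in smiles

# Registry resolving an algorithm URI to an executable implementation.
ALGORITHMS = {
    "http://host/algorithm/substructure": substructure_algorithm,
}

feature = {
    "uri": "http://host/feature/123",
    "hasSource": "http://host/algorithm/substructure",
    "pattern": "C=O",
}

def regenerate(feature, compound_smiles):
    # Look up the generating algorithm and apply it to the new compound.
    algo = ALGORITHMS[feature["hasSource"]]
    return algo(compound_smiles, feature["pattern"])

print(regenerate(feature, "CC=O"))   # True: acetaldehyde contains C=O
print(regenerate(feature, "CCO"))    # False: ethanol does not
```

If `hasSource` held only a compound URI instead, `regenerate` would have no procedure to invoke, which is exactly the objection raised above.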
Besides, having substructures as features effectively introduces caching
of substructures, which allows us to:

1) avoid calculating the same feature multiple times;
2) see whether the same substructures are used/generated by different
algorithms, and even do some comparison;
3) run visualisation and statistics over features without having to keep
all of them in memory.

> > Moreover, a certain feature in such a system will be a part of a
> > compound if it is a part of Dataset A, and may not be a part of the
> > same compound when examined in Dataset B.
>
> This is true. For example, if a compound does not contain C=O, it is
> obvious it will not contain CC=O or, in general, RC=O.

This is no problem currently; features might be present in one dataset and
not in another.

> > Summing up, here are a few things I would like in the next API:
> >
> > a) Ability to upload bulk compounds from scratch, using a dataset
> > construct (and not posting single compounds)
>
> I think this is supported. You can POST a dataset with a set of new
> compounds. If one or more compounds are not found in the database of the
> server, they should be created.

Indeed, this is the operation most often used now. Compounds are created
if not found in the database.

> > b) Ability to assign features to datasets
>
> Do you mean "to append" features, or to have some structured meta
> information about the dataset itself?

If you PUT an RDF containing new features to an existing dataset, they
will be appended to the dataset, and /dataset/id/feature will return the
full set of features.

> > c) Ability to have non-standard datasets/compounds which contain
> > substructures rather than molecules.

In fact, if SMILES/MOL/SDF files containing substructures are uploaded,
they will be saved as they are, and will be available via the /compound
services.
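The caching benefit in point 1 can be sketched as follows (illustrative code, not AMBIT internals): the first request for a (compound, feature) pair computes and stores the value, and later requests, possibly from other datasets or algorithms, reuse it.

```python
# Illustrative sketch of feature-value caching: the first request for a
# (compound, feature) pair computes and stores the value; later requests
# reuse it instead of recomputing.

calls = {"count": 0}
cache = {}

def compute_substructure(compound_smiles, pattern):
    calls["count"] += 1                 # track how often we really compute
    return pattern in compound_smiles   # naive stand-in matcher

def feature_value(compound_smiles, pattern):
    key = (compound_smiles, pattern)
    if key not in cache:
        cache[key] = compute_substructure(compound_smiles, pattern)
    return cache[key]

# Two "datasets" asking for the same value trigger only one computation.
v1 = feature_value("CC=O", "C=O")
v2 = feature_value("CC=O", "C=O")
print(v1, v2, calls["count"])  # True True 1
```

Persisting the cached values as feature entries is also what makes points 2 and 3 possible: the stored values can be compared across algorithms and summarised without recomputation.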
However, here are some discussion points:

- I don't think substructures and compounds should use the same dataset
and compound API; these are semantically different resources.
- If we consider substructures to be properties of the compounds, it is
more logical to have substructures as features, as we try to model any
properties via this construct.
- What would be the benefit of having substructures as a dataset? I am not
sure there is a standard way to distinguish whether a given SMILES should
be represented as a substructure rather than an entire compound.
- As I understood, the substructures dataset is something used internally
by the MaxTox algorithm, and not necessarily exposed to the end user.
Thus, is it really necessary to have it available via a dataset service?
Perhaps this is explained better in the manual Tobias is preparing, as it
covers a similar case.

Best regards,
Nina

> > Regards
> > Surajit
>
> Best regards,
> Pantelis
>
> > On 27 September 2010 18:31, chung <chvng at mail.ntua.gr> wrote:
> > > Hi Surajit,
> > > As far as I can understand, you have a problem similar to the one I
> > > was discussing with Alexey from IBMC. You need a way to define which
> > > substructures are present in a certain structure. For this purpose
> > > you have to use features and not compounds. So you need a collection
> > > of features, each one of which corresponds to a certain substructure.
> > > However, in AMBIT you can create a new compound by POSTing it
> > > to /compound in a supported MIME type (e.g. SMILES, SDF etc.), for
> > > example:
> > > curl -X POST --data-binary @/path/to/file.sdf -H "Content-type:blah/blah+sdf" http://someserver.com/compound
> > > What is needed in OpenTox, though, is a collection of substructures
> > > in a feature service and a way to look up a certain feature according
> > > to its structure (e.g. by providing its SMILES representation).
> > > Best Regards,
> > > Pantelis
> > >
> > > On Mon, 2010-09-27 at 14:18 +0530, surajit ray wrote:
> > > > Hi Nina,
> > > >
> > > > I need to upload some fragments (I have SMILES representations)
> > > > into a dataset. Is this possible in the current framework?
> > > >
> > > > To be more elaborate: currently I am uploading a dataset with
> > > > compounds as links to the respective compound URIs (which happens
> > > > at the end of the online MaxtoxTest service). How would I upload
> > > > new compounds (with SMILES/MOL representations)? And secondly, if
> > > > these (the upload set) happen to be fragments (and not molecules),
> > > > is there a way to store such information using the AMBIT dataset
> > > > service?
> > > >
> > > > Thanx
> > > > Surajit
> > > > _______________________________________________
> > > > Development mailing list
> > > > Development at opentox.org
> > > > http://www.opentox.org/mailman/listinfo/development

--
Dr. Nina Jeliazkova
Technical Manager
IdeaConsult Ltd.
4 A.Kanchev str.
1000 Sofia, Bulgaria
Phone: +359 886 802011
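As a postscript to the point quoted above about looking up a feature by its structure, here is a minimal sketch of such an index (all names are hypothetical, and the normalisation step is a placeholder; a real service would produce canonical SMILES with a chemistry toolkit).

```python
# Sketch of a structure-based feature lookup: index substructure
# features by a normalised form of their SMILES so a client can ask
# "is there already a feature for this fragment?".
# The normalisation is a placeholder, not real canonicalisation.

def normalise(smiles):
    # Placeholder: strip surrounding whitespace only.
    return smiles.strip()

class FeatureIndex:
    def __init__(self):
        self._by_smiles = {}

    def register(self, smiles, feature_uri):
        self._by_smiles[normalise(smiles)] = feature_uri

    def lookup(self, smiles):
        # Returns the feature URI for this structure, or None.
        return self._by_smiles.get(normalise(smiles))

idx = FeatureIndex()
idx.register("C=O", "http://host/feature/123")
print(idx.lookup(" C=O "))   # http://host/feature/123
print(idx.lookup("N#N"))     # None
```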