[OTDev] Datasets with Features for multi entity relationships ?
surajit ray mr.surajit.ray at gmail.com
Wed Nov 24 05:11:57 CET 2010
Hi Christoph,

On 23 November 2010 22:05, Christoph Helma <helma at in-silico.ch> wrote:
> Excerpts from surajit ray's message of Tue Nov 23 09:12:16 +0100 2010:
>> Hi Christoph,
>>
>> Scrolling through the mile long RDF file - I could barely make out
>> whats going on !
>
> One of the big advantages of XML ;-)
>
>> Could you please outline in a graphical/intuitive
>> description as to what exactly is implemented in the RDF ?
>
> You are better off, if you have a look at the Turtle representation (was
> attached in the previous post) which is easier to read.
>
> A dataset with 3 compounds and 3 substructures would have the following
> basic structure (assuming that
> - featureX occurs in compound1 and compound2
> - featureY occurs in compound2
> - featureZ occurs in compound1 and compound3
> )
>
> compounds:
> - compound1
> - compound2
> - compound3
>
> data_entries:
> - compound1:
>     featureX: true
>     featureZ: true
> - compound2:
>     featureY: true
>     featureX: true
> - compound3:
>     featureZ: true
>
> features:
>   featureX:
>     ot:smarts: cN
>     ot:pValue: 0.97
>     ot:effect: activating
>     ot:hasSource: http://webservices.in-silico.ch/algorithm/fminer/bbrc
>     ot:parameters:
>       dataset_uri: http://webservices.in-silico.ch/dataset/1
>   featureY:
>     ot:smarts: ccc
>     ot:pValue: 0.96
>     ot:effect: deactivating
>     ot:hasSource: http://webservices.in-silico.ch/algorithm/fminer/bbrc
>     ot:parameters:
>       dataset_uri: http://webservices.in-silico.ch/dataset/1
>   featureZ:
>     ot:smarts: N(O)=O
>     ot:pValue: 0.99
>     ot:effect: activating
>     ot:hasSource: http://webservices.in-silico.ch/algorithm/fminer/bbrc
>     ot:parameters:
>       dataset_uri: http://webservices.in-silico.ch/dataset/1
>

For a large dataset, the number of substructures mined by a given
algorithm may be large (in the range of thousands). According to this
representation, a substructure which occurs in 80% of the compounds
will have to be associated with 80% of the dataset, vastly increasing
the size of the dataset representation.
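The size concern can be made concrete with a small sketch. It copies the data_entries structure quoted above into a plain Python dictionary and counts the stored entries. The function names, the 10,000-compound dataset and the 1,000 features at 80% occurrence are hypothetical figures chosen only for illustration; none of this is part of the OpenTox API.

```python
# Sparse data entries copied from the quoted example:
# only the "true" occurrences are stored.
data_entries = {
    "compound1": {"featureX": True, "featureZ": True},
    "compound2": {"featureY": True, "featureX": True},
    "compound3": {"featureZ": True},
}


def count_entries(entries):
    """Number of (compound, feature) pairs stored in the dataset."""
    return sum(len(features) for features in entries.values())


def estimated_entries(n_compounds, occurrence_fractions):
    """Estimate stored entries when each feature occurs in a given
    fraction of the compounds (one entry per occurrence)."""
    return sum(round(n_compounds * f) for f in occurrence_fractions)


small = count_entries(data_entries)

# Hypothetical large case: 10,000 compounds, 1,000 mined substructures,
# each occurring in 80% of the compounds.
large = estimated_entries(10_000, [0.8] * 1_000)

print(small)  # 5
print(large)  # 8000000
```

Under these assumed numbers the entry count grows with the product of compounds and features, which is the growth described above.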
Iterating over all the substructures may yield a dataset of gigantic
proportions. For our use case we do not really need this, as we are
anyway fingerprinting each compound with the occurrence of the
substructures mined. Furthermore, the present representation cannot be
called a fingerprint (of the compounds) with respect to the
substructures, as we would then have to fit in the "FALSE" occurrences
as well (the features which do not occur would have to be mentioned
with a value of false). Therefore this representation does not serve
the fingerprint functionality either, without additional processing.

I still suggest having a FeatureSet/SubstructureSet type object within
the API to make it convenient to club features together without
compound representations.

>> Also I have a question about mutually common relationships like MCSS.
>> MCSS is common to both compounds (being compared). So in your
>> representation would it be necessary to represent the relationship
>> twice ? That is once for each compound - or can it be represented just
>> once and be associated with both compounds ?
>
> I would do it like this:
>
> compounds:
> - compound1
> - compound2
>
> data_entries:
> - compound1:
>     mcss_feature: true
> - compound2:
>     mcss_feature: true
>
> features:
>   mcss_feature:
>     ot:smarts: c1cccc1(CC)
>     ot:hasSource: your_mcss_service_uri
>

Does this imply that the dataset will be locked? Without locking the
dataset onto the two compounds (whose MCSS is being represented), this
representation will not work, as it does not show the three-way
relationship. MCSS can have a value of a SMARTS string and "occur" in a
compound. But MCSS has to have a third entry, which is the second
compound it is being compared to. The above representation can "imply"
this relationship only if the dataset is locked on the two compounds,
which essentially brings us back to the original premise of assigning
such "relationship" features to locked datasets.
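Both points can be sketched in a few lines of Python. The first part expands the sparse entries into a fingerprint with explicit false values, which is the additional processing mentioned above. The second part contrasts per-compound MCSS flags with a record that names both compounds. All names here (dense_fingerprint, PairwiseFeature, compounds_with) and the third compound in the unlocked dataset are hypothetical illustrations, not existing OpenTox classes or data.

```python
from dataclasses import dataclass


def dense_fingerprint(entries, feature_order):
    """Expand sparse 'true' entries into one boolean per feature."""
    return {
        compound: [features.get(name, False) for name in feature_order]
        for compound, features in entries.items()
    }


sparse = {
    "compound1": {"featureX": True, "featureZ": True},
    "compound2": {"featureY": True, "featureX": True},
    "compound3": {"featureZ": True},
}
fingerprints = dense_fingerprint(sparse, ["featureX", "featureY", "featureZ"])
print(fingerprints["compound3"])  # [False, False, True]


@dataclass(frozen=True)
class PairwiseFeature:
    """A relationship feature that names both compounds explicitly."""
    smarts: str
    source: str
    compounds: frozenset


mcss = PairwiseFeature(
    smarts="c1cccc1(CC)",
    source="your_mcss_service_uri",
    compounds=frozenset({"compound1", "compound2"}),
)


def compounds_with(entries, feature):
    """Compounds flagged with a feature in per-compound data entries."""
    return sorted(c for c, f in entries.items() if f.get(feature))


# Assumed unlocked dataset: a third compound also matches the substructure,
# so the flags alone no longer say which pair the MCSS was computed for.
unlocked = {
    "compound1": {"mcss_feature": True},
    "compound2": {"mcss_feature": True},
    "compound3": {"mcss_feature": True},
}
flagged = compounds_with(unlocked, "mcss_feature")
print(flagged)                 # ['compound1', 'compound2', 'compound3']
print(sorted(mcss.compounds))  # ['compound1', 'compound2']
```

With per-compound flags the pair is recoverable only while the dataset holds exactly the two compared compounds; the explicit record keeps the pair regardless of what else the dataset contains.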
Regards
Surajit

> Best regards,
> Christoph
> _______________________________________________
> Development mailing list
> Development at opentox.org
> http://www.opentox.org/mailman/listinfo/development
>