[OTDev] Datasets with Features for multi entity relationships ?
Christoph Helma helma at in-silico.ch
Thu Nov 25 17:13:01 CET 2010
Excerpts from surajit ray's message of Thu Nov 25 14:49:19 +0100 2010:
> > This type of representation (we are using it internally) has served well
> > for our datasets, which might also contain several (10-100) thousand
> > substructures for a few thousand compounds. I also do not think that
> > the representation is redundant:
> >
> > - each compound is represented once
> > - each substructure is represented once
> > - each association between compound and substructure is represented once
> >
> > Please correct me if I am missing something obvious.
>
> According to this representation each dataEntry for a compound will
> have to have all substructure features that were found in it.
> Therefore each dataEntry may have 1000-10000 feature/featureValue
> pairs. For 500 dataEntries that means on average
> 500*5000 (assuming 5000 substructures) = 2,500,000 feature/featureValue
> pairs - that's 2.5 million!

In our case it is a lot less (not completely sure about your feature
types), because only a very small subset of features occurs in a single
compound.

> versus just having a featureset with
> 5000 feature entries. You can imagine the difference in cost of
> bandwidth, computation etc.

I am not sure if I get you right, but where do you want to store the
relationships between features and compounds? If there are really 2.5
million associations, you have to assert them somewhere. And having
features without compounds seems quite useless to me.

> >
> > Adding "false" occurrences would not violate the current API (but would
> > add redundant information). Keep in mind that the dataset representation
> > is mainly for exchanging datasets between services - internally you can
> > use any data structure that is efficient for your purposes (we also do
> > that in our services). So if you need fingerprints internally, extract
> > them from the dataset.
>
> Internalizing an intermediate step completely serves the purpose but
> leads to less flexible design paradigms. If we internalize the
> workflow from substructure extraction to fingerprinting - we will lose
> the ability to provide the data to a third party server for an
> independent workflow. Of course the reasoning could be "who needs it?"
> - well, you never know!

I am very interested in exchanging "fingerprints" with other services,
but that can already be done with the current API. I see fingerprints as
sets of features that are present in a compound (also using set
operations to calculate similarities), and find it fairly
straightforward to parse/serialize them to/from datasets.

> >> I still suggest having a FeatureSet/SubstructureSet type object within
> >> the API to make it convenient to club features without compound
> >> representations.
> >
> > I prefer to keep the API as generic as possible and not to introduce
> > ad-hoc objects (or optimizations) for special purposes - otherwise it
> > will be difficult to maintain services in the long term. Why don't you
> > use ontologies for grouping features?
>
> Grouping features using ontologies is clubbing the features, not the
> feature values.

But you cannot have feature values without relating features to
compounds. If you use the representation I proposed, feature values are
"true" anyway.

> So how do we know with respect to which compound mcss3 occurring in
> compound X was found? As you said, we can have arbitrary fields in the
> feature definitions (for MCSS) - but that would be outside API definitions.

features:
  mcss3:
    ot:componds:
      - compound2
      - compound3
    ot:smarts: smarts3

In my understanding you can add any annotation you want to a feature.

Best regards,
Christoph
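
The points made in the message (only "true" associations are stored, and fingerprints are sets of features compared with set operations) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the dictionary layout, the compound and feature names other than mcss3/compound2/compound3, the helper functions, and the choice of Tanimoto as the similarity measure. None of it is taken from the OpenTox API or from the services discussed in the thread.

```python
# Hypothetical sparse dataset: each compound maps to the set of
# substructure features that occur in it (only "true" values are stored).
dataset = {
    "compound1": {"mcss1", "mcss2"},
    "compound2": {"mcss2", "mcss3"},
    "compound3": {"mcss3"},
}


def association_count(data):
    """Number of compound/feature associations actually stored."""
    return sum(len(features) for features in data.values())


def dense_count(data):
    """Number of entries if every feature were stored for every compound."""
    all_features = set().union(*data.values())
    return len(data) * len(all_features)


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def compounds_with(data, feature):
    """Invert the mapping: list the compounds that contain a feature."""
    return sorted(c for c, feats in data.items() if feature in feats)


if __name__ == "__main__":
    print(association_count(dataset), "stored vs", dense_count(dataset), "dense")
    print(tanimoto(dataset["compound1"], dataset["compound2"]))
    print(compounds_with(dataset, "mcss3"))
```

In this toy data five associations are stored where a dense table would hold nine entries. The gap grows when, as described above, only a small subset of features occurs in any single compound.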