[OTDev] Clustering and Scaling Algorithms
chung chvng at mail.ntua.gr
Fri Sep 3 19:14:10 CEST 2010
- Previous message: [OTDev] Contents of Development digest
- Next message: [OTDev] Clustering and Scaling Algorithms
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Dear All,

We had a discussion here about new kinds of algorithms, such as clustering and scaling ones, and I think we need to clarify some details before proceeding with the implementation of such functionality.

First of all, some algorithms (e.g. SVM) produce reliable results only when they are fed with scaled data as training sets, i.e. data whose values vary between -1 and 1, or between 0 and 1 in some cases. This requires more than a "scaling service" that accepts a dataset as input and creates a new dataset with the scaled data: the minimum and maximum values per feature of the dataset also have to be stored somewhere. These values could be saved in the model, stored in some new kind of resource (e.g. under /scaling_parameters/123), or retrieved dynamically from an existing dataset (e.g. from /dataset/{id}/minmax or something equivalent). So this is something to be discussed. Note that the SVM training algorithm produces high quality results only if it uses scaled data as input, and that a test dataset applied to a model for prediction needs to be scaled with respect to the min and max values of the training dataset. For synchronization and data consistency reasons, I would suggest that getting min/max from /dataset/{id}/minmax is the best way to go.

Second, clustering algorithms are of high importance in predictive toxicology, but it is unclear how a cluster can be represented in OpenTox. We plan to implement a new training algorithm whose vital component is a clustering routine, and we are wondering how this could be materialized as an OpenTox web service. It needs to be mentioned that a cluster is not just a dataset: a client should be able to tell (using some web service) whether a compound belongs to a given cluster, which is different from its belonging to a dataset.

There are lots of algorithms that could be introduced in OT, and these could also be part of our discussion in Rhodes about new services.
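To make the scaling point concrete, here is a minimal sketch of min/max scaling in which the parameters are computed once from the training dataset and then reused for a test dataset. It is only an illustration under assumptions: the function names (fit_minmax, scale), the feature names (logP, MW) and the in-memory representation of a dataset as a list of dictionaries are hypothetical and are not part of any OpenTox API.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]                     # one compound: feature name -> value
MinMax = Dict[str, Tuple[float, float]]    # feature name -> (min, max)


def fit_minmax(rows: List[Row]) -> MinMax:
    """Compute per-feature (min, max) from the training dataset."""
    features = rows[0].keys()
    return {f: (min(r[f] for r in rows), max(r[f] for r in rows)) for f in features}


def scale(rows: List[Row], params: MinMax, lo: float = -1.0, hi: float = 1.0) -> List[Row]:
    """Scale rows to [lo, hi] using previously stored (min, max) values."""
    scaled = []
    for r in rows:
        out = {}
        for f, (fmin, fmax) in params.items():
            if fmax == fmin:
                # Constant feature in the training set: map to the lower bound
                # (an arbitrary choice made for this sketch).
                out[f] = lo
            else:
                out[f] = lo + (hi - lo) * (r[f] - fmin) / (fmax - fmin)
        scaled.append(out)
    return scaled


# Hypothetical training and test data.
training = [{"logP": 1.0, "MW": 100.0}, {"logP": 3.0, "MW": 300.0}]
test = [{"logP": 2.0, "MW": 400.0}]

params = fit_minmax(training)        # this is what would have to be stored somewhere
scaled_training = scale(training, params)
scaled_test = scale(test, params)    # test data reuses the training min/max
```

Note that a test compound whose value lies outside the training range (MW = 400.0 here) is mapped outside [-1, 1]. This is expected when the training parameters are reused, and it is one reason the stored min/max values must stay consistent with the dataset the model was trained on.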
Best Regards,
NTUA development team :-)