[OTDev] Clustering and Scaling Algorithms
chung chvng at mail.ntua.gr
Fri Sep 3 19:37:14 CEST 2010
- Previous message: [OTDev] Clustering and Scaling Algorithms
- Next message: [OTDev] Techie Table
On Fri, 2010-09-03 at 20:22 +0300, Nina Jeliazkova wrote:
> On Fri, Sep 3, 2010 at 8:14 PM, chung <chvng at mail.ntua.gr> wrote:
>
> > Dear All,
> > We had a discussion here about new kinds of algorithms, such as
> > clustering and scaling ones, and I think we need to clarify some
> > details to proceed with the implementation of such functionalities.
> > First of all, some algorithms (e.g. SVM) have to be fed with scaled
> > data (whose values vary between -1 and 1, or 0 and 1 in some cases)
> > as training sets in order to produce reliable results. This requires
> > not just a "scaling service" that accepts a dataset as input and
> > creates a new dataset with the scaled data; the minimum and maximum
> > values per feature of the dataset also have to be stored somewhere.
> > These values could be saved in the model, stored in some new kind
> > of resource (e.g. under /scaling_parameters/123) or be retrieved
> > (dynamically) from an existing dataset (e.g. from
> > /dataset/{id}/minmax or something equivalent). So this is something
> > to be discussed. Note that the SVM training algorithm produces high
> > quality results only if it uses scaled data as input, and note also
> > that a test dataset applied to a model for prediction needs to be
> > scaled with respect to the min and max values of the training
> > dataset. For synchronization and data consistency reasons I would
> > suggest that getting min/max from /dataset/{id}/minmax is the best
> > way to go.
> >
>
> I would prefer minmax to be per feature, not per dataset, and URIs like
> /feature/{id}/minmax to return the min/max values (or other relevant
> statistics). There might be a "statistics" algorithm that could return
> min, max, average, standard deviation, etc. Of course, it would be best
> if it could run close to the dataset service, for performance reasons.
>

What we need is a min-max value for the values of the feature restricted
to a specific dataset. The range of values for the feature could be
declared in the RDF of the feature as well. We could easily build a
service for getting min and max values out of a dataset, but this would
involve downloading and parsing the dataset, so, as you said, running
close to the dataset service will be more effective.

> For scaling one could have a "scaling" algorithm that produces a "scaled"
> dataset with the current API
>
> e.g.
> curl -X POST -d "dataset_uri=" /algorithm/scaling
>
> returns the dataset URI of the scaled dataset.
>
> > Second, clustering algorithms are of high importance in predictive
> > toxicology, but it is unclear how a cluster can be represented in
> > OpenTox. We plan to implement a new training algorithm whose vital
> > component is a clustering routine, and we are wondering how this
> > could be materialized as an OpenTox web service. It needs to be
> > mentioned that a cluster is not just a dataset, and a client should
> > be able to tell (using some web service) whether a compound belongs
> > to a given cluster (which is something different from its belonging
> > to a dataset). There are lots of algorithms that could be introduced
> > in OT, and these could also be part of our discussion in Rhodes
> > about new services.
> >
>
> We have been running a clustering algorithm since January; a cluster is
> just a feature, linked to the algorithm.
>
> http://apps.ideaconsult.net:8080/ambit2/algorithm/SimpleKMeans
>

Great! That will be very useful...

> Nina
>
> > Best Regards,
> > NTUA development team :-)
> >
> > _______________________________________________
> > Development mailing list
> > Development at opentox.org
> > http://www.opentox.org/mailman/listinfo/development
> >
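
For readers of the archive, here is a minimal sketch of the scaling scheme discussed in the message above: per-feature min/max values are computed once from the training dataset, and both the training and the test dataset are then scaled with those same stored values. It is an illustration only, not OpenTox code; the function names (fit_minmax, scale), the feature names (logP, mw) and the in-memory data layout are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]                      # feature name -> value (hypothetical layout)
MinMax = Dict[str, Tuple[float, float]]     # feature name -> (min, max)


def fit_minmax(rows: List[Row]) -> MinMax:
    """Collect the per-feature min and max over a (training) dataset."""
    params: MinMax = {}
    for row in rows:
        for feature, value in row.items():
            lo, hi = params.get(feature, (value, value))
            params[feature] = (min(lo, value), max(hi, value))
    return params


def scale(rows: List[Row], params: MinMax,
          low: float = -1.0, high: float = 1.0) -> List[Row]:
    """Linearly map each feature from [min, max] to [low, high].

    The min/max values come from `params`, not from `rows`, so a test
    dataset is scaled with respect to the training dataset.
    """
    scaled: List[Row] = []
    for row in rows:
        out: Row = {}
        for feature, value in row.items():
            lo, hi = params[feature]
            if hi == lo:
                # Constant feature in the training data: map to the lower bound.
                out[feature] = low
            else:
                out[feature] = low + (value - lo) * (high - low) / (hi - lo)
        scaled.append(out)
    return scaled


# Hypothetical training and test data.
train = [{"logP": 1.0, "mw": 100.0}, {"logP": 3.0, "mw": 300.0}]
test = [{"logP": 2.0, "mw": 400.0}]

params = fit_minmax(train)            # what a minmax resource would have to expose
scaled_train = scale(train, params)
scaled_test = scale(test, params)     # reuses the training min/max
```

Note that a test value lying outside the training range (mw = 400.0 here) is mapped outside [-1, 1]; whether to clip such values is a separate design decision that the thread does not address.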