[OTDev] Clustering and Scaling Algorithms

Fri Sep 3 19:14:10 CEST 2010

Dear All,
  We had a discussion here about new kinds of algorithms, such as
clustering and scaling, and I think we need to clarify some details
before proceeding with their implementation. First of all, some
algorithms (e.g. SVM) produce reliable results only when they are
trained on scaled data (whose values vary between -1 and 1, or between
0 and 1 in some cases). This requires not just a "scaling service" that
accepts a dataset as input and creates a new dataset with the scaled
data; the minimum and maximum values per feature of the dataset also
have to be stored somewhere. These values could be either
saved in the model, stored in some new kind of resource (e.g.
under /scaling_parameters/123) or be retrieved (dynamically) from an
existing dataset (e.g. from /dataset/{id}/minmax or something
equivalent). So this is something to be discussed. Note that the SVM
training algorithm produces high-quality results only if it uses scaled
data as input, and that a test dataset submitted to a model for
prediction needs to be scaled with respect to the min and max values of
the training dataset. For synchronization and data consistency reasons,
I would suggest that getting min/max from /dataset/{id}/minmax is the
best way to go.
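
To make the scaling requirement concrete, here is a minimal sketch in
Python. It is only an illustration under assumptions: the function
names, the feature names (logP, MW) and the in-memory row format are
hypothetical and not part of any existing OpenTox API. It shows the
min/max values being computed once from the training dataset and then
reused to scale a test dataset.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]                    # feature name -> value (hypothetical format)
MinMax = Dict[str, Tuple[float, float]]   # feature name -> (min, max)


def feature_minmax(rows: List[Row]) -> MinMax:
    """Compute the per-feature (min, max) over a dataset."""
    features = rows[0].keys()
    return {f: (min(r[f] for r in rows), max(r[f] for r in rows)) for f in features}


def scale(rows: List[Row], minmax: MinMax, lo: float = -1.0, hi: float = 1.0) -> List[Row]:
    """Linearly map each feature from [min, max] to [lo, hi]."""
    scaled = []
    for r in rows:
        out = {}
        for f, (fmin, fmax) in minmax.items():
            span = fmax - fmin
            if span == 0:
                # Constant feature: assumption, map it to the middle of the range.
                out[f] = (lo + hi) / 2
            else:
                out[f] = lo + (hi - lo) * (r[f] - fmin) / span
        scaled.append(out)
    return scaled


train = [{"logP": 1.0, "MW": 100.0}, {"logP": 3.0, "MW": 300.0}]
test = [{"logP": 2.0, "MW": 400.0}]

params = feature_minmax(train)       # what /dataset/{id}/minmax could return
scaled_train = scale(train, params)
scaled_test = scale(test, params)    # scaled with the TRAINING min/max
```

Note that a test value lying outside the training range (MW = 400
here) ends up outside [-1, 1] after scaling. This is expected when the
training parameters are reused.
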
   Second, clustering algorithms are of high importance in predictive
toxicology, but it is unclear how a cluster can be represented in
OpenTox. We plan to implement a new training algorithm whose vital
component is a clustering routine, and we are wondering how this could
be exposed as an OpenTox web service. It should be noted that a cluster
is not just a dataset: a client should be able to tell (using some web
service) whether a compound belongs to a given cluster, which is
different from asking whether it belongs to a dataset. There are many
other algorithms that could be introduced in OT, and these could also
be part of our discussion in Rhodes about new services.
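
To illustrate the difference between a cluster and a dataset, here is
a rough Python sketch. Everything in it is an assumption made for the
sake of discussion: the Cluster class, the function names and the
choice of a nearest-centroid rule (as in k-means) are hypothetical and
do not describe the algorithm we plan to implement. The point is that
a cluster can answer a membership question for a compound it has never
seen, which a plain list of compounds cannot.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Cluster:
    """Hypothetical cluster resource: described by a centroid, not by a compound list."""
    cluster_id: str
    centroid: Sequence[float]


def nearest_cluster(clusters: List[Cluster], descriptors: Sequence[float]) -> Cluster:
    """Return the cluster whose centroid is closest (Euclidean distance)."""
    return min(clusters, key=lambda c: math.dist(c.centroid, descriptors))


def belongs_to(cluster: Cluster, clusters: List[Cluster], descriptors: Sequence[float]) -> bool:
    """Assumed membership rule: a compound belongs to its nearest cluster."""
    return nearest_cluster(clusters, descriptors).cluster_id == cluster.cluster_id


clusters = [Cluster("a", (0.0, 0.0)), Cluster("b", (10.0, 10.0))]
new_compound = (1.0, 2.0)   # descriptor vector of a compound in no training dataset

in_a = belongs_to(clusters[0], clusters, new_compound)
in_b = belongs_to(clusters[1], clusters, new_compound)
```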

Best Regards,
NTUA development team :-)
