[OTDev] descriptor recalculation
Christoph Helma helma at in-silico.ch
Thu Apr 29 16:18:23 CEST 2010
Dear All,

I am not sure if this helps, but here is my conceptualisation of the situation.

From a user's point of view the situation is quite clear: the user wants to submit a dataset and obtain predictions for new compounds:

  Input: training-data, compound(s)
  Output: prediction(s)

Most real-world users don't care how predictions are obtained (as long as they are correct), so at the end-user level (ToxPredict, ToxCreate) we should basically expose the functionality predict(training-dataset, compound).

Although it might be possible to make predictions without explicit models (e.g. k-nn with graph similarities), we have decided (with good reason) to break the procedure into two steps in our implementation:

  Model learning:
    Input: training-data
    Output: model

  Prediction:
    Input: model, compound(s)
    Output: prediction(s)

As a ToxCreate developer (and I think Martin will agree for the validation service) I would like to work at this level with basically two operations:

  model = training-algorithm(training-dataset, [parameters])
  prediction = model(compound)

At this level I do not want to be bothered with implementation details; that is the job of the webservice developers. They might decide, e.g., to implement the training algorithm as

  descriptors = cdk-descriptors(training-dataset)
  pca-descriptors = pca(descriptors)
  mlr-model = mlr(training-dataset, pca-descriptors)
  model.model = mlr-model
  model.descriptors = pca-descriptors

and the prediction as

  compound-descriptors = cdk-descriptors(compound, model.descriptors)
  prediction = model.model(compound-descriptors)

or as

  model = svm-with-graph-kernel(training-dataset)
  prediction = model(compound)

The model-learning task and the prediction task may use one or more algorithms (or models; the separation blurs once again), but at the high level I want to use only the "super" algorithms/models.
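A minimal Python sketch of this two-level view, under loudly stated assumptions: `toy_descriptors` is a hypothetical stand-in for cdk-descriptors (it just counts characters in a SMILES string), a single-descriptor selection step replaces PCA, and simple linear regression replaces MLR. Only `train` and the returned `Model` are visible to the caller; the model stores which descriptors it was trained on, so prediction recomputes exactly those for a new compound, mirroring `model.descriptors` and `model.model` in the pseudocode above.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional, Sequence, Tuple


def toy_descriptors(smiles: str, names: Optional[Sequence[str]] = None) -> Dict[str, float]:
    """Hypothetical stand-in for a descriptor service such as cdk-descriptors."""
    values = {
        "n_C": float(smiles.count("C")),
        "n_O": float(smiles.count("O")),
        "n_N": float(smiles.count("N")),
        "length": float(len(smiles)),
    }
    # At prediction time only the descriptors stored in the model are returned.
    return values if names is None else {n: values[n] for n in names}


def select_descriptor(rows: List[Dict[str, float]], y: List[float]) -> str:
    """Stand-in for PCA: keep the single descriptor most correlated with y."""
    my = mean(y)
    best_name, best_score = None, -1.0
    for name in rows[0]:
        xs = [r[name] for r in rows]
        mx = mean(xs)
        sx = sum((x - mx) ** 2 for x in xs)
        if sx == 0:
            continue  # a constant descriptor carries no information
        sxy = sum((x - mx) * (v - my) for x, v in zip(xs, y))
        score = abs(sxy) / sx ** 0.5  # proportional to |Pearson r| for a fixed y
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None:
        raise ValueError("no informative descriptor in training data")
    return best_name


@dataclass
class Model:
    descriptor_names: List[str]  # plays the role of model.descriptors
    intercept: float             # intercept and slope play the role of model.model
    slope: float

    def __call__(self, compound: str) -> float:
        d = toy_descriptors(compound, self.descriptor_names)
        return self.intercept + self.slope * d[self.descriptor_names[0]]


def train(training_dataset: List[Tuple[str, float]]) -> Model:
    """The "super" training algorithm: descriptors -> selection -> regression."""
    y = [float(v) for _, v in training_dataset]
    rows = [toy_descriptors(s) for s, _ in training_dataset]
    name = select_descriptor(rows, y)
    xs = [r[name] for r in rows]
    mx, my = mean(xs), mean(y)
    sx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (v - my) for x, v in zip(xs, y)) / sx
    return Model([name], my - slope * mx, slope)


if __name__ == "__main__":
    model = train([("CCO", 2.0), ("CCCO", 3.0), ("CCCCO", 4.0)])
    print(model.descriptor_names, model("CCCCCO"))
```

The caller only ever touches `train(...)` and `model(compound)`; replacing the internal steps (for example with a graph-kernel learner) would leave that interface unchanged.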
As a GUI developer I still want access to the underlying algorithms, but they can be provided as parameters (our existing API is quite flexible in this respect). An algorithm webservice could provide, e.g., a high-level regression algorithm that lets the user choose descriptor calculation, feature selection and modelling parameters by setting parameters (and it should document and check internally which algorithms work together). Future lazar versions, e.g., will be able to switch freely between descriptor calculation services or to use datasets with biological measurements.

Maybe we should add the facility to represent the sub-algorithms of "super" algorithms in OWL-DL. According to our API the model knows about ot.Algorithm and ot.IndependentVariables, but it would also need to know the service that calculates the independent variables. This could be inferred from the ot.Algorithm's sub-algorithms or stated explicitly. More importantly, the service would have to be able to call the necessary services (of course this has to be implemented if you are using stock ML/DM tools, but OpenTox should be more than just wrapping existing programs in a REST interface). It would be a large waste of effort if every developer had to implement descriptor calculation separately in their webservice clients.

To sum up my personal opinion: for ToxCreate I would like to handle two high-level objects/services: the training algorithm (for creating models) and the model (for predictions). I do not want to care about implementation details of model training and prediction, but would like to have access to the underlying algorithms through parameters. We might need minor API changes for representing "super" algorithm services (i.e. algorithm services that call other algorithm services) and for informing the model service about the right descriptor calculation service.

Best regards,
Christoph
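The "super" algorithm idea, with sub-algorithms chosen through parameters, an internal compatibility check, and a model that knows its own descriptor calculation service, can be sketched in Python as below. Everything here is an assumption for illustration: the service names (`atom-counts`, `length`, `nearest-neighbour`), the `COMPATIBLE` table and `super_algorithm` are hypothetical, 1-nearest neighbour stands in for a real learner, and in-process functions stand in for REST calls to separate services.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical sub-algorithms that a "super" algorithm may delegate to.
DESCRIPTOR_SERVICES: Dict[str, Callable[[str], List[float]]] = {
    "atom-counts": lambda s: [float(s.count("C")), float(s.count("O"))],
    "length": lambda s: [float(len(s))],
}

# Which descriptor services each learner is documented to work with.
COMPATIBLE: Dict[str, Set[str]] = {"nearest-neighbour": {"atom-counts", "length"}}


@dataclass
class Model:
    descriptor_service: str  # recorded so the model can recalculate descriptors itself
    learner: str
    training: List[Tuple[List[float], float]]

    def __call__(self, compound: str) -> float:
        x = DESCRIPTOR_SERVICES[self.descriptor_service](compound)
        # 1-nearest neighbour by squared Euclidean distance in descriptor space
        _, y = min(self.training,
                   key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], x)))
        return y


def super_algorithm(dataset: List[Tuple[str, float]],
                    descriptor_service: str = "atom-counts",
                    learner: str = "nearest-neighbour") -> Model:
    # The super algorithm checks internally which sub-algorithms play together.
    if descriptor_service not in COMPATIBLE.get(learner, set()):
        raise ValueError(f"{descriptor_service!r} cannot be combined with {learner!r}")
    calc = DESCRIPTOR_SERVICES[descriptor_service]
    return Model(descriptor_service, learner, [(calc(s), y) for s, y in dataset])


if __name__ == "__main__":
    data = [("CCO", 1.0), ("CCCCCCO", 5.0)]
    model = super_algorithm(data, descriptor_service="length")
    print(model.descriptor_service, model("CCCCCO"))
```

Because the model carries its `descriptor_service`, a client only calls `model(compound)` and never computes descriptors itself. In OpenTox that reference would presumably be a service URI in the model's OWL-DL representation; how to express it is the API question the message leaves open.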
More information about the Development mailing list