[OTDev] In house XML schemas for Model Objects vs the PMML schema

Tobias Girschick tobias.girschick at in.tum.de
Mon Oct 5 11:22:14 CEST 2009


Hi Pantelis,

On Fri, 2009-10-02 at 17:39 +0300, chung wrote:
> Dear All,
>  In API 1.0 we accepted an XML schema for the representation of our
> models. This XML is small and simple and contains all meta-information
> about the model (user, id, name, tuning parameters, dataset uri) but no
> information about the parameters of the trained model. 

Well, as discussed in Rome with, e.g., Nina, the XML schemas of version 1.0
were insufficient, as they don't provide any information on the
feature_definitions used to build the model (unless we say that all the
feature_definitions in the dataset have to be in the model, i.e. a new
dataset-uri after feature selection).
We updated the XMLs and put them on the website as a proposal.

Regarding problems with PMML: at the moment it seems we are not
able to give a PMML representation for every type of model (e.g.
Toxtree), so in my opinion we should stick to the XMLs for those cases
until there is either a PMML solution or another acceptable alternative.
The question is, do we want a mixed solution: PMML where possible (and
XML on explicit user request) and XML where PMML is not possible?

Regards,
Tobias

> I'm not sure if
> this is a real problem or not, since a client can use this model to
> perform predictions without caring about these parameters, but it's very
> easy to build such models and internally store a model in any file
> format (serialized weka file, PMML, LibSVM DSD files, etc.). So do we
> have to provide this PMML file?
>   On the other hand, as Jorg mentioned, PMML files are widely accepted
> in industry, while others (including me) have reported difficulty in
> building such models. Indeed, generating a PMML model is not
> straightforward in some cases, and I still can't figure out how I can
> convert the LibSVM output into PMML format (I'm talking about SVM
> models).
>    So I'm wondering whether we need to provide those models as PMML or if it's
> OK (at least for now) to provide our in-house XMLs for Model Objects.
> 
> Any Suggestions/Objections/Alternative ideas/Proposals (SOAP)?
> 
> Best Regards,
> Pantelis
-- 
Dipl.-Bioinf. Tobias Girschick

Technische Universität München
Institut für Informatik
Lehrstuhl I12 - Bioinformatik
Boltzmannstr. 3
85748 Garching b. München, Germany

Room: MI 01.09.042
Phone: +49 (89) 289-18002
Email: tobias.girschick at in.tum.de
Web: http://wwwkramer.in.tum.de/people/girschic
