[OTDev] In house XML schemas for Model Objects vs the PMML schema

Nina Jeliazkova nina at acad.bg
Mon Oct 5 11:28:25 CEST 2009


Tobias Girschick wrote:
> Hi Pantelis,
>
> On Fri, 2009-10-02 at 17:39 +0300, chung wrote:
>   
>> Dear All,
>>  In API 1.0 we accepted an XML schema for the representation of our
>> models. This XML is small and simple and contains all meta-information
>> about the model (user, id, name, tuning parameters, dataset uri) but no
>> information about the parameters of the trained model. 
>>     
>
> Well, as discussed in Rome with, e.g., Nina, the XML schemas of version 1.0
> were insufficient, as they don't provide any information on the
> feature_definitions used to build the model (except if we say all the
> feature_definitions in the dataset have to be in the model = new
> dataset-uri after feature selection).
> We updated the XMLs and put them on the website as a proposal.
>
> Regarding problems with PMML. At the moment it seems like we are not
> able to give a PMML representation for every type of model (e.g.
> Toxtree), so in my opinion we should stick to the XMLs for those cases
> until there is either a PMML solution or another acceptable alternative.
> The question is, do we want a mixed solution: PMML where possible (and
> XML on explicit user request) and XML where PMML is not possible?
>
>   
I would suggest PMML as the export/import format where possible, in order
to allow for interoperability with different software solutions. We don't
want the OpenTox platform to provide only closed custom solutions, do we?
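
A minimal sketch of the mixed policy discussed above (PMML where possible,
XML on explicit request or where PMML is not available). Everything here is
an assumption for illustration only: the element names, the PMML_CAPABLE
set and both function names are hypothetical and are not taken from the
API 1.0 schema or from any OpenTox service.

```python
import xml.etree.ElementTree as ET

# Hypothetical: model types for which a PMML export is assumed to exist.
PMML_CAPABLE = {"regression", "tree"}


def in_house_xml(meta):
    """Build a metadata-only XML document (element names are illustrative)."""
    root = ET.Element("model", id=str(meta["id"]))
    for key in ("user", "name", "dataset_uri"):
        ET.SubElement(root, key).text = str(meta[key])
    params = ET.SubElement(root, "tuning_parameters")
    for pname, pvalue in sorted(meta.get("tuning_parameters", {}).items()):
        ET.SubElement(params, "parameter", name=pname).text = str(pvalue)
    return ET.tostring(root, encoding="unicode")


def choose_format(model_type, requested=None):
    """Return "pmml" where possible, "xml" on request or as the fallback."""
    if requested == "xml":
        return "xml"
    return "pmml" if model_type in PMML_CAPABLE else "xml"
```

With these assumptions, a Toxtree-like model would fall back to the
in-house XML, while a model type listed in PMML_CAPABLE would be exported
as PMML unless the client explicitly asks for XML.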

Ideally, it would be good if the internal XML formats for models and
everything else followed some established format, e.g. the Resource
Description Framework (RDF). I realize it might be too demanding
to move to RDF for API 1.1 - we need to keep the deadlines we have set -
but let's think about it for the next API, 1.2.

Regards,
Nina
> Regards,
> Tobias
>
>   
>> I'm not sure if
>> this is a real problem or not since a client can use this model to
>> perform predictions without caring about these parameters, but it's very
>> easy to build such models and internally store a model in any file
>> format (serialized weka file, PMML, LibSVM DSD files, etc...). So do we
>> have to provide this PMML file?
>>   On the other hand, as Jorg mentioned, PMML files are widely accepted
>> in industry while others (including me) have reported difficulty in
>> building such models. Indeed, generating a PMML model is not
>> straightforward in some cases and I still can't figure out how I can
>> convert the LibSVM output into a PMML format (I'm talking about SVM
>> models).
>>    So I'm wondering if we need to provide those models as PMML or if it's
>> ok (at least for now) to provide our in-house XMLs for Model Objects.
>>
>> Any Suggestions/Objections/Alternative ideas/Proposals (SOAP)?
>>
>> Best Regards,
>> Pantelis
>>
>>


-- 
---------------------------------
Dr. Nina Jeliazkova
Technical Manager
IdeaConsult Ltd.
1000 Sofia, Bulgaria
Tel: +359 886 802011
ICQ: 10705013
www: http://ambit.sourceforge.net
---------------------------------                          
PGP Public Key
http://cert.acad.bg/pgp-keys/keys/nina-nikolova-0xEEABA669.asc
	8E99 8BAD D804 1A43 27B7  7F87 CF04 C7D1 EEAB A669
---------------------------------------------------------------



