[OTDev] ARFF mime type

Nina Jeliazkova nina at acad.bg
Wed Sep 30 16:55:07 CEST 2009


Christoph Helma wrote:
> Excerpts from Nina Jeliazkova's message of Tue Sep 29 16:01:03 +0200 2009:
>   
>> Jörg Kurt Wegner wrote:
>>     
>>>> I don't know whether this has been discussed before - but has PMML been
>>>> considered as a model exchange format? While it doesn't directly support
>>>>     
>>>>         
>> Since nobody has answered so far, I will try to. PMML has been discussed
>> and agreed to be a good choice of model exchange format.  The main
>> issue currently is Weka's limited support for PMML models (read-only,
>> and not for all types of models).  With some effort this might be
>> resolved; what is more challenging is how to make use of PMML for
>> specific SAR and (Q)SAR models, for example those based on structural
>> alerts. In particular, it is not clear to me how to describe Toxtree in
>> PMML in a generic way, and I am curious whether Christoph has a solution
>> for the lazar system.
>>     
>
> Having a quick look at PMML I have found a TreeModel
> (http://www.dmg.org/v4-0/TreeModel.html), but I have no idea if that
> could work for Toxtree. 
It probably will not work without modifications. Toxtree is tricky in
the sense that its nodes are not just rules. Some nodes involve
"reactions" (e.g. hydrolysis), and the reaction products are then passed
down the tree. Other nodes involve descriptor calculations and linear
discriminant models.

Not to mention that most of the "decision trees" in Toxtree are not
really trees but directed graphs
(http://toxtree.sourceforge.net/images/cramer/tree.jpg).
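
To illustrate why a plain rule tree is not enough, here is a minimal
Python sketch of such a decision graph. It is not Toxtree code and not
PMML: the node names, the dict-based "molecule", the hydrolysis stand-in,
the discriminant weights and the class labels are all hypothetical and
chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical stand-in: a "molecule" is just a dict of numeric properties.
Molecule = Dict[str, float]


@dataclass
class Node:
    name: str
    test: Callable[[Molecule], bool]  # a rule, or a thresholded model score
    yes: str                          # next node name, or a final label
    no: str
    # Optional "reaction": its product replaces the molecule downstream.
    transform: Optional[Callable[[Molecule], Molecule]] = None


def hydrolyse(mol: Molecule) -> Molecule:
    """Toy reaction: the ester is consumed and an acid is produced."""
    product = dict(mol)
    product["has_ester"] = 0.0
    product["has_acid"] = 1.0
    return product


def discriminant(mol: Molecule) -> bool:
    """Toy linear discriminant over two made-up descriptors."""
    score = 0.5 * mol.get("logP", 0.0) - 1.0 * mol.get("n_rings", 0.0) + 0.2
    return score > 0.0


# Q3 is reachable from both Q1 and Q2, so this is a directed graph, not a tree.
GRAPH: Dict[str, Node] = {
    "Q1": Node("Q1", lambda m: m.get("has_ester", 0.0) > 0, yes="Q2", no="Q3"),
    "Q2": Node("Q2", lambda m: m.get("has_acid", 0.0) > 0, yes="Q3",
               no="Class III", transform=hydrolyse),
    "Q3": Node("Q3", discriminant, yes="Class I", no="Class III"),
}


def classify(graph: Dict[str, Node], start: str, mol: Molecule) -> str:
    """Walk the graph until a name that is not a node (a label) is reached."""
    current = start
    while current in graph:
        node = graph[current]
        if node.transform is not None:
            mol = node.transform(mol)  # the product is passed down the graph
        current = node.yes if node.test(mol) else node.no
    return current
```

A format that only stores a predicate per node and a tree of children has
no obvious place for the `transform` step or for the shared node `Q3`.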

To clarify, I would really like to have a standard PMML-based model
exchange format, but it is not clear how to do that for specific cases.
Still, it would be better to support PMML at least for regression and
neural network models.
> I have found nothing about the representation of
> nearest neighbor models, and it also lacks graph mining models.
>   
Well, the lack of support for nearest neighbor models is understandable;
one needs to store all data points.  Would it be possible to extend PMML
to include the training dataset, addressable as a URI?
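
As a rough sketch of the idea (plain Python, not PMML syntax): the model
description would carry only k and a URI for the training data, and
whoever applies the model resolves the URI. The `KnnModelRef` class, the
`fetch` callback and the example.org URI below are hypothetical; an
in-memory dict stands in for a real dataset service.

```python
from collections import Counter
from dataclasses import dataclass
from math import dist
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[Sequence[float], str]  # (descriptor vector, class label)


@dataclass(frozen=True)
class KnnModelRef:
    """Model description: no data points inside, only a reference to them."""
    k: int
    training_data_uri: str


def predict(model: KnnModelRef,
            query: Sequence[float],
            fetch: Callable[[str], List[Row]]) -> str:
    """Majority vote among the k training rows closest to the query."""
    rows = fetch(model.training_data_uri)
    nearest = sorted(rows, key=lambda row: dist(row[0], query))[: model.k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical stand-in for a dataset service reachable by URI.
FAKE_STORE: Dict[str, List[Row]] = {
    "http://example.org/dataset/1": [
        ((0.0, 0.0), "inactive"),
        ((0.1, 0.2), "inactive"),
        ((1.0, 1.0), "active"),
        ((0.9, 1.1), "active"),
        ((1.2, 0.8), "active"),
    ],
}

model = KnnModelRef(k=3, training_data_uri="http://example.org/dataset/1")
prediction = predict(model, (1.0, 0.9), FAKE_STORE.__getitem__)
```

The exchanged model stays small this way, but it is only usable as long
as the dataset behind the URI remains available and unchanged.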

Best regards,
Nina
> So my first impression is that PMML is presently limited to a few
> "standard" techniques (well, k-nn should be standard, too). I am not sure
> if we can/want to extend PMML for additional algorithms, and most of our
> specialised/domain-specific algorithms might contain too many implementation
> details to run on general-purpose systems.
>
> Stefan, Andreas and Andreas: Do you have any idea how PMML would work
> for your developments?
>
> Best regards,
> Christoph



