[OTDev] Preprocessing

Nina Jeliazkova nina at acad.bg
Mon Nov 2 14:31:22 CET 2009


Hi Tobias, All,

Tobias Girschick wrote:
> Hi All,
>
> I had a discussion with Nina and Pantelis about what best to do with
> Feature Definitions that have a mixed data type. This can happen if, e.g.,
> EC50_RAT sometimes has "ND" or similar instead of a numeric value. Here
> is part of the discussion:
>
>   
>> Yes, indeed.  I don't recall that we planned a service for preprocessing,
>> but one might be necessary at some point.
>>     
> I think we will definitely need it, sooner rather than later. And we did
> introduce the category in our Algorithm ontology:
> http://opentox.org/dev/apis/api-1.1/Algorithms
>
> Preprocessing 
>       * Feature selection 
>               * supervised 
>               * unsupervised 
>       * Discretization 
>               * supervised 
>               * unsupervised 
>       * Data cleanup 
>       * Normalization
>
> But apart from Feature/Descriptor selection there's nothing there yet.
> Discretization and Normalization can be easily integrated... I am not
> sure about "Data cleanup". This might be tricky, and it is not clearly
> defined what exactly it covers. It might include ways to handle missing
> values for algorithms that can't cope with them. It might also handle
> cases like the one we discussed with mixed data types...
>   

An easy route would be to look into the Weka implementations of these and,
if not directly wrap them as services, at least borrow some ideas on how
preprocessing functionality might be organized.
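
To make the mixed data type case concrete, here is a minimal sketch in
Python of what a "Data cleanup" step could do with a feature like EC50_RAT.
It is only an illustration, not an OpenTox or Weka API: the function names,
the set of placeholder tokens (only "ND" comes from this thread) and the
choice of mean imputation are all assumptions.

```python
import math
from typing import Iterable, List, Optional

# Hypothetical placeholders meaning "no numeric value". Only "ND" is taken
# from the discussion; the others are assumptions for illustration.
MISSING_TOKENS = {"ND", "NA", "N/A", ""}


def clean_mixed_feature(raw_values: Iterable[str]) -> List[Optional[float]]:
    """Convert a mixed numeric/text column to floats, using None for missing."""
    cleaned: List[Optional[float]] = []
    for raw in raw_values:
        token = raw.strip()
        if token.upper() in MISSING_TOKENS:
            cleaned.append(None)
            continue
        try:
            value = float(token)
        except ValueError:
            # Unknown non-numeric entry: mark as missing rather than guess.
            cleaned.append(None)
            continue
        # float() accepts "nan"; treat that as missing as well.
        cleaned.append(None if math.isnan(value) else value)
    return cleaned


def impute_mean(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace missing entries with the mean of the present ones (assumed policy)."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to impute from
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


# Example: an EC50_RAT-like column with mixed content.
ec50_rat = ["1.5", "ND", "2.5", " nd ", "abc"]
cleaned = clean_mixed_feature(ec50_rat)
imputed = impute_mean(cleaned)
```

Keeping the conversion step separate from the imputation step means that
algorithms which can handle missing values could consume the cleaned column
directly, while the others get the imputed one.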


Regards,
Nina
> Any opinions?
>
> Tobias
>
>   



