[OTDev] Preprocessing

Tobias Girschick tobias.girschick at in.tum.de
Mon Nov 2 13:59:11 CET 2009


Hi All,

I had a discussion with Nina and Pantelis about how best to handle Feature
Definitions that have a mixed data type. This can happen if, e.g.,
EC50_RAT sometimes contains "ND" or similar instead of a numeric value. Here
is part of the discussion:

> Yes, indeed.  I don't recall we planned a service for preprocessing,
> but this might be necessary at certain point.
I think we will definitely need it, sooner rather than later. And we did
introduce the category in our Algorithm ontology:
http://opentox.org/dev/apis/api-1.1/Algorithms

Preprocessing 
      * Feature selection 
              * supervised 
              * unsupervised 
      * Discretization 
              * supervised 
              * unsupervised 
      * Data cleanup 
      * Normalization

But apart from Feature/Descriptor selection there is nothing there yet.
Discretization and Normalization can be easily integrated... I am not
sure about "Data cleanup". This might be tricky, and it is not clearly
defined what exactly it covers. It might include ways to handle missing
values for algorithms that can't cope with them. It might also handle cases
like the one we discussed with mixed data types...
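
To make the mixed data type case a bit more concrete, here is a rough
Python sketch of what such preprocessing steps could look like. It is only
an illustration under my own assumptions, not a proposal for the API: the
function names (clean_feature, min_max_normalize, equal_width_bins) and the
list of tokens treated as missing (MISSING_TOKENS) are hypothetical and not
part of any OpenTox specification. Missing or non-numeric entries such as
"ND" become None, and the later steps simply pass None through.

```python
import math

# Hypothetical tokens treated as "no value"; an assumption for illustration only.
MISSING_TOKENS = {"", "nd", "n/a", "na", "nan"}


def clean_feature(raw_values):
    """Convert a mixed-type column to floats; missing or unparseable entries become None."""
    cleaned = []
    for raw in raw_values:
        text = str(raw).strip()
        if text.lower() in MISSING_TOKENS:
            cleaned.append(None)
            continue
        try:
            value = float(text)
        except ValueError:
            # Anything that is not a number (e.g. free text) is treated as missing.
            cleaned.append(None)
            continue
        # Reject infinities and NaN so later arithmetic stays well defined.
        cleaned.append(value if math.isfinite(value) else None)
    return cleaned


def min_max_normalize(values):
    """Scale the non-missing values to [0, 1]; None entries are kept as None."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)
    lo, hi = min(present), max(present)
    span = hi - lo
    return [
        None if v is None else (0.0 if span == 0 else (v - lo) / span)
        for v in values
    ]


def equal_width_bins(values, n_bins):
    """Unsupervised discretization into n_bins equal-width bins (indices 0..n_bins-1)."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)
    lo, hi = min(present), max(present)
    width = (hi - lo) / n_bins
    binned = []
    for v in values:
        if v is None:
            binned.append(None)
        elif width == 0:
            binned.append(0)
        else:
            # The maximum value would land in bin n_bins, so clamp it to the last bin.
            binned.append(min(int((v - lo) / width), n_bins - 1))
    return binned


ec50_rat = ["1.5", "ND", 3.5, " 2.5 ", "n/a", "abc"]  # made-up example values
cleaned = clean_feature(ec50_rat)
normalized = min_max_normalize(cleaned)
binned = equal_width_bins(cleaned, 2)
```

Whether "ND" should really be mapped to a missing value, or kept as a
separate nominal category, is exactly the kind of decision a "Data cleanup"
service would have to make configurable.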

Any opinions?

Tobias

-- 
Dipl.-Bioinf. Tobias Girschick

Technische Universität München
Institut für Informatik
Lehrstuhl I12 - Bioinformatik
Boltzmannstr. 3
85748 Garching b. München, Germany

Room: MI 01.09.042
Phone: +49 (89) 289-18002
Email: tobias.girschick at in.tum.de
Web: http://wwwkramer.in.tum.de/people/girschic



