[OTDev] [Fwd: Re: Feature Generation Algorithms: Avoiding duplicates]

Tobias Girschick tobias.girschick at in.tum.de
Tue Jan 19 15:35:35 CET 2010


Hi Nina,

> It might help if you try to define your descriptors in a way similar 
> to BO ontology.

We have thought about that. But I am not sure that this makes sense or
is even possible, at least not if I consider, e.g., the following as one
descriptor:

C(C) (minSup: 0.7, dataset: http://somedataset, hasSource/algo: FTM)

How should I describe this in an ontology? What I can do is use the
information from some of the parameters (e.g. path, not tree) for
categorization. But if I am right, a single descriptor is to be
understood as a unique mapping: a function that takes a molecule and
maps it to a real, integer or boolean value. For physico-chemical
descriptors, for example, the owl:sameAs relation gives the definition
of that function, right?
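A minimal Python sketch of why the parameters matter for avoiding duplicates, assuming a feature is identified by its definition plus whatever generation context it needs. The class `FeatureKey`, the helpers `make_key` and `register`, and the feature URIs used with them are hypothetical illustrations, not part of any OpenTox API.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class FeatureKey:
    """Hypothetical identity of a feature, used to detect duplicates.

    A physico-chemical descriptor is identified by its definition alone
    (e.g. its owl:sameAs target); a mined fragment additionally needs the
    algorithm, dataset and parameters that produced it.
    """
    definition: str                               # pattern such as "C(C)" or a descriptor URI
    algorithm: Optional[str] = None               # e.g. "FTM"; None for simple descriptors
    dataset: Optional[str] = None                 # training dataset URI, if any
    parameters: Tuple[Tuple[str, str], ...] = ()  # sorted (name, value) pairs


def make_key(definition: str,
             algorithm: Optional[str] = None,
             dataset: Optional[str] = None,
             parameters: Optional[Dict[str, str]] = None) -> FeatureKey:
    # Sort the parameters so that their order never creates a "new" feature.
    params = tuple(sorted((parameters or {}).items()))
    return FeatureKey(definition, algorithm, dataset, params)


# Maps each known feature identity to the URI it was first registered under.
registry: Dict[FeatureKey, str] = {}


def register(key: FeatureKey, uri: str) -> str:
    """Return the URI already stored for an equal key, else store and return uri."""
    return registry.setdefault(key, uri)
```

With this key, `C(C)` mined by FTM with minSup 0.7 and with minSup 0.5 become two different features, while registering the 0.7 variant a second time returns the URI stored first.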

Clearly we need to define a way to store parameters for (some) features,
and if I remember your last email to Fabian and the last meeting
correctly, you agree on that. The question is how.
I still don't like the idea of declaring this type of feature as some
kind of model, although from a modelling point of view it seems the same
or very similar. From a semantic point of view, however, these are two
totally different things. What do you think about extending the Feature
instead of the Model? We could have simple Features (as at the moment)
and ComplexFeatures that have an ot:Algorithm with ot:parameters and an
ot:dataset.
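A rough Python sketch of what the proposed ComplexFeature could look like as RDF, emitted as Turtle text without any RDF library. It is only an illustration of the proposal: `ot:ComplexFeature` does not exist yet, the layout of parameters as blank nodes with `dc:title` and `ot:paramValue` is an assumption, the `ot` namespace URI is assumed, and the example.org URIs are placeholders.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ot": "http://www.opentox.org/api/1.1#",  # assumed OpenTox namespace
}


def literal(value: str) -> str:
    # Minimal Turtle string literal: escape backslashes and double quotes only.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def complex_feature_triples(feature_uri: str, title: str, algorithm_uri: str,
                            dataset_uri: str,
                            parameters: Dict[str, str]) -> List[Triple]:
    """Describe a mined fragment as a hypothetical ot:ComplexFeature."""
    s = f"<{feature_uri}>"
    triples: List[Triple] = [
        (s, "rdf:type", "ot:Feature"),         # still a plain Feature for old clients
        (s, "rdf:type", "ot:ComplexFeature"),  # proposed subclass (hypothetical)
        (s, "dc:title", literal(title)),
        (s, "ot:hasSource", f"<{algorithm_uri}>"),
        (s, "ot:dataset", f"<{dataset_uri}>"),
    ]
    # One blank node per parameter; labels are only unique within one document.
    for i, (name, value) in enumerate(sorted(parameters.items())):
        node = f"_:param{i}"
        triples += [
            (s, "ot:parameters", node),
            (node, "rdf:type", "ot:Parameter"),
            (node, "dc:title", literal(name)),
            (node, "ot:paramValue", literal(value)),
        ]
    return triples


def to_turtle(triples: List[Triple]) -> str:
    head = [f"@prefix {p}: <{uri}> ." for p, uri in PREFIXES.items()]
    body = [f"{s} {p} {o} ." for s, p, o in triples]
    return "\n".join(head + body)


print(to_turtle(complex_feature_triples(
    "http://example.org/feature/1", "C(C)", "http://example.org/algorithm/FTM",
    "http://somedataset", {"minSup": "0.7"})))
```

Typing the resource as both ot:Feature and ot:ComplexFeature is one way to keep existing clients that only look for ot:Feature working.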


> BTW, it seems you are not using owl:sameAs in RDF description of
> features, or at least they do not appear in the database. Can we
> verify? It might be parsing error from my side as well.

No, we are not using them so far, so it's not a parsing error ;) We were
not sure what to put there. In the CDK and JOELib2 case we will have to
map the descriptors to the BO ontology by hand and use this mapping (or
extend the ontology), right? In the FTM or gSpan case, we run into the
problems described above...
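A small Python sketch of the by-hand mapping idea, assuming the mapping is kept as a plain table from implementation-specific descriptor names to ontology entries. The keys and the target URIs are hypothetical placeholders, not real BO identifiers; only the owl:sameAs URI is the standard OWL one.

```python
from typing import Dict, Optional, Tuple

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical hand-made mapping; keys and target URIs are placeholders.
BY_HAND_MAPPING: Dict[str, str] = {
    "cdk/xlogp": "http://example.org/bo#xlogP",
    "joelib2/xlogp": "http://example.org/bo#xlogP",
}


def same_as_triple(feature_uri: str, implementation_name: str,
                   mapping: Dict[str, str]) -> Optional[Tuple[str, str, str]]:
    """Return an owl:sameAs triple (N-Triples terms) if the descriptor is mapped.

    Mined fragments (FTM, gSpan) have no entry, so None is returned and their
    meaning has to come from their parameters instead.
    """
    target = mapping.get(implementation_name)
    if target is None:
        return None
    return (f"<{feature_uri}>", f"<{OWL_SAME_AS}>", f"<{target}>")
```

Both placeholder entries point to the same concept, which is what would let a client notice that a CDK feature and a JOELib2 feature describe the same descriptor.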

Best regards,
Tobias



-- 
Dipl.-Bioinf. Tobias Girschick

Technische Universität München
Institut für Informatik
Lehrstuhl I12 - Bioinformatik
Boltzmannstr. 3
85748 Garching b. München, Germany

Room: MI 01.09.042
Phone: +49 (89) 289-18002
Email: tobias.girschick at in.tum.de
Web: http://wwwkramer.in.tum.de/girschick