[OTDev] descriptor recalculation

Fri Apr 30 10:49:39 CEST 2010

Christoph Helma wrote on 04/29/2010 11:33 PM:
>>> According to our API the model knows about ot.Algorithm and
>>> ot.IndependentVariables, but it would need to know the service to
>>> calculate independent variables.
>> It does actually - every feature (variable) has ot:hasSource, which
>> points to the service it has been generated from (e.g. descriptor
>> calculation one) - and this is what we use in ToxPredict.
>
> True, but that makes sense only for "simple" descriptor calculation
> algorithms (i.e. descriptors that are independent of the training
> activities, like phys-chem properties, substructures). If we use e.g.
> supervised graph mining techniques we need
>
> (i) an algorithm (model because it is algorithm applied to data?) that
> mines features in the training dataset and creates a feature dataset
> (e.g. fminer)
>
> (ii) a simple substructure matching algorithm that determines if the
> mined features are present in the compound to be predicted (e.g.
> OpenBabel SMARTS matcher)

We need this not just for cross-validation but also for single predictions:
the feature set is not fixed, but depends on properties inherent to the
(training) dataset.
However, once the features have been calculated for the training dataset,
the matching service (ii) can be seen as an ordinary feature calculation
service:

f_i(mol) = 1   if feature i occurs in mol,
f_i(mol) = 0   else.

That is, it plays the same role as any other feature calculation service.
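
To make the indicator function above concrete, here is a minimal Python
sketch. It is only an illustration: the names feature_vector and
naive_substring_match are made up for this example, and the plain substring
test on SMILES strings is a stand-in for a real substructure matcher such as
the OpenBabel one mentioned under (ii). It is not chemically correct.

```python
from typing import Callable, List, Sequence

# Hypothetical matcher signature: (pattern, molecule) -> bool.
Matcher = Callable[[str, str], bool]


def naive_substring_match(pattern: str, smiles: str) -> bool:
    """Stand-in for a real substructure matcher (assumption for this sketch).

    This is a plain substring test on SMILES strings and is NOT a
    chemically correct substructure search.
    """
    return pattern in smiles


def feature_vector(
    mined_features: Sequence[str],
    mol: str,
    matches: Matcher = naive_substring_match,
) -> List[int]:
    """Return f_i(mol) for every mined feature i: 1 if it occurs, else 0."""
    return [1 if matches(feature, mol) else 0 for feature in mined_features]


if __name__ == "__main__":
    # Features as they might come out of the mining step (i); made-up values.
    mined = ["CC", "N", "O"]
    print(feature_vector(mined, "CCO"))
```

The matcher is passed in as a parameter so that the same function works for
both the training compounds and a single compound to be predicted, which is
the point made above: after mining, only the matching step is needed.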

> My interpretation was, that ot:hasSource should point to the graph
> mining algorithm, but the model would need the substructure matcher for
> predictions. How should we handle this?

Could we set ot:hasSource to service (ii), i.e. the substructure matcher?
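
As a rough sketch of what this would mean for a model at prediction time
(Python, with everything hypothetical: the Feature class, the has_source
field standing in for ot:hasSource, and the example.org URIs are invented
for illustration and are not part of the API):

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical service URI, used only for this illustration.
MATCHER_URI = "http://example.org/algorithm/substructure-matcher"


@dataclass(frozen=True)
class Feature:
    uri: str          # hypothetical feature URI
    pattern: str      # the mined substructure
    has_source: str   # stands in for ot:hasSource


def services_needed(features: Iterable[Feature]) -> List[str]:
    """Collect the services a model must call to recompute its features."""
    return sorted({f.has_source for f in features})


if __name__ == "__main__":
    # Mined features whose has_source points at the matcher (ii),
    # not at the mining algorithm (i).
    feats = [
        Feature("http://example.org/feature/1", "CC", MATCHER_URI),
        Feature("http://example.org/feature/2", "O", MATCHER_URI),
    ]
    print(services_needed(feats))
```

With this convention a client would only need to follow has_source to
recalculate the independent variables for a new compound, the same way it
does for simple descriptors.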

Regards
Andreas

-- 
http://www.maunz.de
