[OTDev] Descriptor Calculation Services

Nina Jeliazkova nina at acad.bg
Tue Jan 12 18:23:57 CET 2010


Christoph Helma wrote:
> Excerpts from Tobias Girschick's message of Mon Jan 11 10:05:23 +0100 2010:
>   
>> Hi Pantelis, All,
>>
>> On Thu, 2010-01-07 at 18:49 +0200, chung wrote: 
>>     
>>> Hi Tobias, All,
>>>  While trying to train a model, the service may "find" some
>>> missing values for a specific feature.
>>>       
>> To obviate misunderstandings: You want to train a model with a data set
>> that contains missing values for a specific feature and the service
>> detects the missing features before training, right?
>>
>>     
>>> Is there a way to use your
>>> services to obtain the missing value? 
>>>       
>> If the feature with the missing values was produced from our descriptor
>> calculation service, yes. But you would have to build a dataset with all
>> the compounds where the value is missing and submit it to the descriptor
>> calculation service.
>> The question is whether a model training service should automatically
>> provide the functionality of "filling up" missing values. I think this
>> is something that should be done in the preprocessing phase - in a
>> preprocessing/data cleaning service.
>>     
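A minimal sketch of the step Tobias describes (collecting the compounds
whose value is missing, so that they can be submitted as a new dataset).
The in-memory dataset layout, the URIs and the function name are
assumptions made for illustration only; they are not part of the OpenTox
API, and the actual submission to the descriptor calculation service is
left out.

```python
# Hypothetical in-memory dataset: compound URI -> {feature URI: value or None}.
# This layout is an assumption for illustration, not an OpenTox format.


def compounds_missing_feature(dataset, feature):
    """Return the compounds whose value for `feature` is absent or None."""
    return [
        compound
        for compound, values in dataset.items()
        if values.get(feature) is None
    ]


dataset = {
    "compound/1": {"feature/logP": 2.1},
    "compound/2": {"feature/logP": None},  # value explicitly missing
    "compound/3": {},                      # feature not present at all
}

# These compounds would form the dataset sent to the calculation service.
missing = compounds_missing_feature(dataset, "feature/logP")
```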
>
> I would be extremely careful with the addition of missing features for
> several reasons:
>
> - Sometimes there are good physical/chemical/biological/algorithmic reasons why
>   features are missing - calculating these features might give
>   you a number but it is very likely that it is meaningless. 
>   
Agree.
> - A sameAs relationship does not guarantee that (calculated and
>   measured) feature values are comparable (very frequently they are
>   not).
>   
Right, this is the reason for having ot:hasSource for features, which
makes it possible to identify exactly the descriptor calculation service
used.
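
A rough sketch of how a client could use this, assuming a deliberately
conservative rule: two features are treated as directly comparable only
if they are linked by sameAs and also report the same ot:hasSource. The
Feature class, its field names and the example URIs are hypothetical;
only sameAs and ot:hasSource come from the discussion above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    uri: str         # hypothetical feature URI
    same_as: str     # target of the feature's sameAs link
    has_source: str  # value of ot:hasSource (service, algorithm or dataset)


def directly_comparable(a, b):
    """Conservative rule (an assumption): same concept AND same source."""
    return a.same_as == b.same_as and a.has_source == b.has_source


measured = Feature("feature/1", "concept/logP", "dataset/measurements")
calculated = Feature("feature/2", "concept/logP", "algorithm/descriptor-calc")
recalculated = Feature("feature/3", "concept/logP", "algorithm/descriptor-calc")

# sameAs alone is not enough: measured and calculated values differ in source.
mixed_ok = directly_comparable(measured, calculated)
same_service_ok = directly_comparable(calculated, recalculated)
```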
> - Even if you find a measured value for the same feature, there is a
>   good chance that it has been obtained by a different protocol and
>   that it is not comparable with the other feature values.
>   
Agree.
> I would suggest adding features only
>
> - if you have a clear understanding of why a feature is missing
> - if you can prove that the feature calculation algorithm creates values
>   that are comparable with the original measurements (or calculation
>   algorithm)
> - if you clearly document how and why the original dataset has been
>   modified
>   
A user interface supporting the above (e.g. allowing the user to
document why something is modified) would be relevant for both Fastox
and Toxmodel.
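
As an illustration only, the record behind such an interface could look
roughly like the sketch below. It assumes that every filled-in value must
carry its source and a non-empty reason; the Modification class, the
record_modification function and all field names are hypothetical and do
not describe an existing Fastox or Toxmodel component.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _utc_now():
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class Modification:
    compound: str
    feature: str
    new_value: float
    source: str   # service or algorithm that produced the value
    reason: str   # why the value was missing and why filling it is justified
    timestamp: str = field(default_factory=_utc_now)


def record_modification(log, compound, feature, new_value, source, reason):
    """Append a documented change to `log`; refuse undocumented changes."""
    if not reason.strip():
        raise ValueError("a modification must be documented with a reason")
    entry = Modification(compound, feature, new_value, source, reason)
    log.append(entry)
    return entry


log = []
record_modification(
    log,
    compound="compound/2",
    feature="feature/logP",
    new_value=1.8,
    source="algorithm/descriptor-calc",
    reason="value absent in the original file; same algorithm as other rows",
)
```

Rejecting an empty reason is a design choice of this sketch: it forces
the "how and why" of each modification to be written down at the moment
the dataset is changed.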

Best regards,
Nina
> Best regards,
> Christoph
> _______________________________________________
> Development mailing list
> Development at opentox.org
> http://www.opentox.org/mailman/listinfo/development
>   



