[OTDev] On confidences

Andreas Maunz andreas at maunz.de
Fri Jun 3 14:35:10 CEST 2011


Nina Jeliazkova wrote on 06/02/2011 08:59 AM:
>> >  I read the material Andreas Maunz sent over just before the AXLR8 meeting,
>> >  and felt afterwards that I at least had a reasonable (although admittedly
>> >  superficial) understanding of the maths of your generalisation to
>> >  significance-weighted Tanimoto similarities, the smoothing of nearest-neighbour
>> >  similarities, and the Gaussian smoothing exponential. However, making the next
>> >  steps needs further work and interaction:
>> >  a) How to understand the above maths (and others) more clearly and deeply?
>> >  An interactive discussion along the path "developer - communicator - user"
>> >  is probably needed. Otherwise I worry that converting the maths used into a
>> >  simplified explanation in English may result in incorrect statements. They
>> >  will at least need review.
>> >  b) Even then it is hard to understand the values in practice, so we need
>> >  several examples with several models to get a better feel for the meaning of
>> >  the numbers
>> >  c) The ToxCreate help says that "For most models confidence > 0.025 is a
>> >  sensible (hard) cutoff to distinguish between reliable and unreliable
>> >  predictions." We can tell people that, but a user's first reaction to a
>> >  prediction with a confidence of 0.026 being called reliable is often
>> >  confusion, or even the opposite interpretation. So redefining the index
>> >  (even as 1-x?) would help with first impressions. Could we even map the
>> >  index onto a classification - strongly confident ... very unconfident -
>> >  that users could understand more easily?
>> >  d) Then we also have to prepare help explaining the maths and concepts in a
>> >  way that is easy to understand (probably leaving out the maths itself).
>> >
>> >  Another issue is that different models using different methods to
>> >  communicate confidence in predictions will also be difficult for users to
>> >  grasp. Could a classification approach on diverse confidences somehow
>> >  "normalise meanings" for users?
>> >
> If I understand correctly, the Lazar confidence value is specific to the
> Lazar algorithm/models and not really comparable with the notion of
> confidence intervals in statistics. Ideally, algorithms should use
> established statistical terms for communicating their performance, and if
> something is specific to the algorithm, a distinct term should be used to
> avoid confusion.
>
> Nina
>

Hi Nina, Barry, all,
I have posted some thoughts on how to improve the situation here and would 
welcome comments:
http://www.maunz.de/wordpress/opentox/2011/lazar-confidence-a-probabilistic-view

It might also be of interest to non-Lazar models.
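
To make the maths behind a) and b) a bit more concrete, here is a rough
sketch in Python of what a significance-weighted Tanimoto similarity with a
Gaussian smoothing exponential could look like. This is only an illustration
of the idea, not the actual lazar code; the example fragments, their weights
and the kernel width sigma are placeholders.

from math import exp

def weighted_tanimoto(features_a, features_b, weight):
    # Tanimoto similarity where each feature contributes its
    # significance weight instead of a plain 0/1 count.
    shared = features_a & features_b
    union = features_a | features_b
    denominator = sum(weight[f] for f in union)
    if denominator == 0:
        return 0.0
    return sum(weight[f] for f in shared) / denominator

def gaussian_smoothing(similarity, sigma=0.3):
    # Gaussian smoothing exponential: similarities close to 1 keep
    # most of their weight, lower similarities are damped quickly.
    # sigma is a placeholder kernel width, not a lazar default.
    return exp(-((1.0 - similarity) ** 2) / (2.0 * sigma ** 2))

# Example: two compounds sharing one of three weighted fragments.
a = {"frag1", "frag2"}
b = {"frag2", "frag3"}
w = {"frag1": 0.9, "frag2": 0.5, "frag3": 0.1}
sim = weighted_tanimoto(a, b, w)
print(sim, gaussian_smoothing(sim))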
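
Regarding the classification idea in c): mapping the numeric confidence onto
a small verbal scale is straightforward once the cutoffs are agreed on. The
sketch below uses the 0.025 cutoff quoted from the ToxCreate help; the other
boundaries are invented purely for illustration and would have to be
calibrated per model, which might also help to "normalise meanings" across
models.

def confidence_label(confidence, cutoff=0.025):
    # Map a numeric lazar confidence onto a verbal scale.
    # Only the 0.025 cutoff comes from the ToxCreate help text;
    # the remaining boundaries are illustrative placeholders.
    if confidence >= 4 * cutoff:
        return "strongly confident"
    if confidence >= cutoff:
        return "confident"
    if confidence >= cutoff / 2:
        return "weakly confident"
    return "unconfident"

print(confidence_label(0.026))  # -> "confident"
print(confidence_label(0.01))   # -> "unconfident"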

Andreas


