[OTDev] On confidences

Thu Jun 2 08:59:29 CEST 2011

On 2 June 2011 09:51, Barry Hardy <barry.hardy at douglasconnect.com> wrote:

> (I am expanding this comms exchange on lazar model confidences with
> Christoph as it would probably benefit from further discussion on what our
> general framework and approach to confidences and communications of
> confidences should be)
>
> I read the material Andreas Maunz sent over just before the AXLR8 meeting,
> and felt after reading it that I had at least a reasonable (although
> superficial) understanding of the maths of your generalising to
> significance-weighted Tanimotos, smoothing nearest neighbour similarities,
> and adding the Gaussian smoothing exponential. However, making the next
> steps needs further work and interaction:
> a) How to understand the above maths (and others) more clearly and deeply?
> An interactive discussion along the path "developer - communicator - user"
> is probably needed. Otherwise I worry that converting the maths used into a
> simplified explanation in English may result in incorrect statements. They
> will at least need a review.
> b) Even then it is hard to understand the values in practice, so we need
> several examples with several models to get a better feel for the meaning of
> the numbers
> c) The ToxCreate help says that "For most models confidence > 0.025 is a
> sensible (hard) cutoff to distinguish between reliable and unreliable
> predictions." and you can tell people that, but the first reaction to
> hearing that a prediction with a confidence of 0.026 is reliable is
> confusion, and often the opposite is understood.
> So redefining the index (even 1-x?) would help with first-time
> comprehension. Could we even have a classification of the index
> (strongly confident .... very unconfident) that users could understand
> more easily?
> d) Then we also have to prepare help explaining the maths and concepts in a
> way that is easy to understand (probably leaving out the maths)
>
> Another issue is that different models using different methods to
> communicate confidences in predictions will also be difficult for users to
> grasp. Could a classification approach on diverse confidences somehow
> "normalise meanings" for users?
>
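
To make point (c) above concrete, here is a minimal Python sketch of attaching
a verbal label to a lazar confidence value. Only the 0.025 cutoff comes from
the ToxCreate help text quoted above; the function name `label_confidence` and
the two label strings are hypothetical and are not part of lazar or ToxCreate.

```python
# Hypothetical helper: not part of lazar or ToxCreate.
# The default cutoff is the value quoted from the ToxCreate help text.
TOXCREATE_CUTOFF = 0.025


def label_confidence(confidence: float, cutoff: float = TOXCREATE_CUTOFF) -> str:
    """Return a verbal label for a lazar confidence value.

    Follows the quoted rule "confidence > 0.025" as a hard cutoff,
    so a value exactly equal to the cutoff counts as unreliable.
    """
    if confidence > cutoff:
        return "reliable"
    return "unreliable"


if __name__ == "__main__":
    # 0.026 and 0.08 are the example values mentioned in this thread.
    for value in (0.02, 0.026, 0.08):
        print(f"{value}: {label_confidence(value)}")
```

A finer-grained scale (strongly confident ... very unconfident) would need
additional cutoffs, which would have to come from the model developers.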

If I understand correctly, the Lazar confidence value is specific to the Lazar
algorithm/models and not really comparable with the notion of confidence
intervals in statistics. Ideally, algorithms should use established
statistical terms for communicating their performance, and if something is
specific to an algorithm, then using a distinct term should avoid confusion.

Nina

> Barry
>
>  - Also, the developers will have to communicate with the "tutorial
>>> developer", and perhaps even react. For example, try to explain to
>>> someone what a confidence of 0.08 for a lazar model prediction means. I
>>> am not even sure yet from the material how I would redefine it to be
>>> somewhat intuitive.
>>>
>> The last point is a good example why developers should not write
>> tutorials. I have explained lazar confidence ~1000 times (and an
>> explanation pops up if you click on the word in ToxCreate) so I simply
>> do not notice when a proper definition is missing in a tutorial. For this
>> reason I need someone who is less involved to spot such problems. I can of
>> course try to give an explanation (e.g. as in ToxCreate), but I cannot
>> judge if it is understandable for someone with a different background.
>>
>> Mapping lazar confidences to something more intuitive (i.e. real
>> probabilities) is possible and it is on my list, but we just have not had
>> the time to implement it.
>>
>> Best regards,
>> Christoph
>>
>>
>>
> _______________________________________________
> Development mailing list
> Development at opentox.org
> http://www.opentox.org/mailman/listinfo/development
>