[OTDev] Fwd: Predicted variables and confidence --- was: [OTP] Lazar models

Tue May 24 20:46:18 CEST 2011

Hi Martin, All,

On 24 May 2011 21:27, Martin Guetlein <martin.guetlein at googlemail.com> wrote:

> This would probably be better posted to the development list...
>
> ---------- Forwarded message ----------
> From: Martin Guetlein <martin.guetlein at googlemail.com>
> Date: Tue, May 24, 2011 at 8:26 PM
> Subject: Predicted variables and confidence --- was: [OTP] Lazar models
> To: opentox partners mailing list <partners at opentox.org>, Nina Jeliazkova
> <jeliazkova.nina at gmail.com>
> Cc: Christoph Helma <helma at in-silico.ch>
>
>
> Hi all,
>
> I just managed to produce the first validation report that utilizes
> non-lazar 'confidence' values, with a j48 model from ambit:
> http://local-ot/validation/report/validation/47
> (Once again, this is just a proof of concept: it is a training data
> validation, and the confidence value is the class-probability value coming
> from WEKA. I asked Nina to add this information to the model predictions
> some time ago.)
>

Good to have both services working :)

>
> Both model services (ambit and lazar) now add the confidence as a separate
> feature to the prediction dataset, which is nice. I think we should keep it
> that way.
>
> One deviation is that Ambit adds both features (prediction and confidence)
> to Model#predictedVariables while IST puts them into
> PredictionDataset#features. IST does this because we do not have a
> feature service; features exist only in datasets (which makes A&A
> easier). I am fine with both solutions, but maybe we should agree on a
> common way to do it?
>
>
What about combining both solutions? Features could be in the dataset, as
in the IST services, or exist as separate resources, but in addition models
would provide a list of predicted variables via /model/id/predicted. This way
there would still be no need for a separate feature service on your side.

It's quite convenient to know how many and which features are generated by a
model. We use this to find out whether the predictions are already
cached or need to be calculated anew. It also gives a straightforward
way to check whether a dataset indeed contains features from a particular
model. Finally, if <dependentVariable | predictedVariable> owl:sameAs <endpoint>
is set, then the model will appear under one of the endpoint categories in
ToxPredict, and not as a model with an unknown endpoint, as it does now.
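
The cache check described above could be sketched roughly as follows. This
is only an illustration, assuming the model's predicted variables and the
dataset's features have already been fetched as collections of URIs; the
function names and the example URIs are hypothetical and not part of any
service API.

```python
def missing_predicted_features(predicted_variables, dataset_features):
    """Return the predicted-variable URIs of a model that the dataset lacks."""
    return set(predicted_variables) - set(dataset_features)


def is_prediction_cached(predicted_variables, dataset_features):
    """True if every predicted variable of the model is already in the dataset."""
    predicted = set(predicted_variables)
    # A model that declares no predicted variables cannot be matched at all.
    return bool(predicted) and predicted <= set(dataset_features)


if __name__ == "__main__":
    # Hypothetical URIs, for illustration only.
    model_vars = ["http://example.org/feature/1", "http://example.org/feature/2"]
    dataset_vars = ["http://example.org/feature/1"]
    print(is_prediction_cached(model_vars, dataset_vars))
    print(missing_predicted_features(model_vars, dataset_vars))
```

If the set of missing features is empty, the predictions can be reused;
otherwise the model has to be applied to the dataset again.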

> The second deviation is what the actual prediction and confidence features
> look like. To unify this, my proposal would be:
> * The predicted feature is of type OT:ModelPredictionFeature (subclass of
> OT:Feature)
> * The confidence feature is of type OT:ModelConfidenceFeature (subclass of
> OT:Feature)
> * The confidence feature has a property OT:confidenceOf which points to the
> predicted feature (in case a model has more than one prediction feature)
>
>
Agree.
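
To illustrate how a client might use the proposed OT:confidenceOf link, here
is a minimal sketch. It assumes the features of a prediction dataset have
already been parsed into simple records; the Feature class, the function
name and the URIs are hypothetical, and the type names are the ones proposed
above, not an agreed part of the ontology yet.

```python
from dataclasses import dataclass
from typing import Optional

PREDICTION_TYPE = "OT:ModelPredictionFeature"   # proposed type name
CONFIDENCE_TYPE = "OT:ModelConfidenceFeature"   # proposed type name


@dataclass(frozen=True)
class Feature:
    uri: str
    rdf_type: str
    confidence_of: Optional[str] = None  # value of the proposed OT:confidenceOf


def confidence_by_prediction(features):
    """Map each prediction feature URI to its confidence feature URI (or None)."""
    mapping = {f.uri: None for f in features if f.rdf_type == PREDICTION_TYPE}
    for f in features:
        if f.rdf_type == CONFIDENCE_TYPE and f.confidence_of in mapping:
            mapping[f.confidence_of] = f.uri
    return mapping


if __name__ == "__main__":
    # Hypothetical features of a model with two predicted variables.
    features = [
        Feature("http://example.org/feature/pred1", PREDICTION_TYPE),
        Feature("http://example.org/feature/conf1", CONFIDENCE_TYPE,
                confidence_of="http://example.org/feature/pred1"),
        Feature("http://example.org/feature/pred2", PREDICTION_TYPE),
    ]
    print(confidence_by_prediction(features))
```

With the explicit link, a validation service would not have to guess which
confidence belongs to which prediction when a model has more than one
prediction feature.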

Nina

> Best regards,
> Martin
>
>
> --
> Dipl-Inf. Martin Gütlein
> Phone:
> +49 (0)761 203 8442 (office)
> +49 (0)177 623 9499 (mobile)
> Email:
> guetlein at informatik.uni-freiburg.de