[OTDev] LogP modeling challenge

Nina Jeliazkova jeliazkova.nina at gmail.com
Wed Feb 23 11:32:30 CET 2011


On 23 February 2011 12:21, Egon Willighagen <egon.willighagen at gmail.com> wrote:

> Hej Nina,
>
> On Wed, Feb 23, 2011 at 10:53 AM, Nina Jeliazkova
> <jeliazkova.nina at gmail.com> wrote:
> > In an exercise to reproduce an ECOSAR model, I've found that the current
> > implementation of XLogP (CDK) performs a bit differently from KOWWIN [1].
> >
> > http://tinyurl.com/xlogp-kowwin
> >
> > This makes it hard to reproduce the ECOSAR model, since it depends on LogP.
> > Also, LogP is an important parameter in many toxicity prediction models.
>
> Yeah, this nicely reflects how 'reproducible' QSAR modeling is. Most
> models are so numerically unstable that exchanging a variable with a
> highly correlated one (KOWWIN and CDK LogPs) ruins the prediction...
> which says more about the QSAR model than about the LogP descriptors...
>


>
> > This is why I think it is an opportunity for everybody within OpenTox
> > (and outside) to create a better LogP prediction model from a dataset
> > that has recently been made available to OpenTox. Models would
> > preferably be built via the OpenTox API, but not necessarily (in that
> > case we could consider wrapping the models into the OpenTox API later).
> >
> > Models can then be validated by the OpenTox validation service at
> > ALU-Freiburg and the best one(s) selected.
>
> Does OpenTox also provide the training data?
>
> > The dataset is available via the OpenTox dataset service (several formats
> > via the HTTP Accept: mime-type header)
> >
> > http://apps.ideaconsult.net:8080/ambit2/dataset/181563
>
> This is data to be used as training data? Do you have information on
> how it was curated? How tautomers were selected? Etc...
>
> It's about 2300 compounds with experimental LogP values... what's the
> license?
>
> wget --header="Accept: application/rdf+xml"
> http://apps.ideaconsult.net:8080/ambit2/dataset/181563/metadata
>
> did not reveal license/copyright or modify/redistribution rights...
>
>
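For anyone who prefers a script, the same content negotiation can be sketched
in Python. This is only a minimal sketch: the helper names build_request and
fetch are made up for illustration, the only MIME type confirmed in this
thread is application/rdf+xml, and the server may not be reachable when you
try it.

```python
import urllib.request

# URLs as given in this thread.
DATASET_URL = "http://apps.ideaconsult.net:8080/ambit2/dataset/181563"
METADATA_URL = DATASET_URL + "/metadata"


def build_request(url, mime_type):
    """Build a GET request that asks for one representation via Accept.

    Hypothetical helper for illustration, not part of any OpenTox client.
    """
    return urllib.request.Request(url, headers={"Accept": mime_type})


def fetch(url, mime_type, timeout=30):
    """Download the resource in the requested format and return raw bytes."""
    request = build_request(url, mime_type)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling fetch(METADATA_URL, "application/rdf+xml") would then request the
metadata as RDF/XML. Other formats would be requested by changing the
mime_type argument, assuming the service supports them.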
We need to further clarify the exact license. The data was sent to us
recently to be used within the OpenTox project, for a modeling exercise
similar to this one (the data is claimed to be mostly public). The
structures were originally in an SDfile; I don't have information on how
exactly they were selected.
I guess it could be another interesting experiment to find out how/if the
models change when different tautomers are used.

As the original exercise is to reproduce the ECOSAR model, we can hardly do
that without a LogP model comparable to KOWWIN.
(In)validating the KOWWIN model could of course be part of the exercise :)
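
To illustrate why the choice of LogP model matters here, a rough sketch,
assuming a simple linear QSAR of the form log(LC50) = slope * LogP +
intercept. The coefficients and LogP values below are hypothetical and are
NOT taken from ECOSAR, KOWWIN or CDK; they only show how a difference in
LogP propagates into the prediction.

```python
def predict_log_lc50(logp, slope, intercept):
    """Linear QSAR: log10(LC50) = slope * logP + intercept."""
    return slope * logp + intercept


def lc50_fold_change(delta_logp, slope):
    """Factor by which the predicted LC50 changes if logP shifts by delta_logp."""
    return 10 ** (slope * delta_logp)


# Hypothetical coefficients, for illustration only (not ECOSAR values).
SLOPE = -0.9
INTERCEPT = 1.7

# Hypothetical LogP estimates for the same compound from two different models.
logp_model_a = 3.0
logp_model_b = 3.5

pred_a = predict_log_lc50(logp_model_a, SLOPE, INTERCEPT)
pred_b = predict_log_lc50(logp_model_b, SLOPE, INTERCEPT)
factor = lc50_fold_change(logp_model_b - logp_model_a, SLOPE)

print("log LC50 with model A:", pred_a)
print("log LC50 with model B:", pred_b)
print("fold change in predicted LC50:", factor)
```

With a slope close to -1, a difference of half a log unit between two LogP
models already changes the predicted LC50 by roughly a factor of three.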

Nina


> Egon
>
> --
> Dr E.L. Willighagen
> Postdoctoral Researcher
> Institutet för miljömedicin
> Karolinska Institutet
> Homepage: http://egonw.github.com/
> LinkedIn: http://se.linkedin.com/in/egonw
> Blog: http://chem-bla-ics.blogspot.com/
> PubList: http://www.citeulike.org/user/egonw/tag/papers
>


