[OTDev] Performance testing and monitoring

Nina Jeliazkova nina at acad.bg
Tue Apr 20 17:15:10 CEST 2010


Hi Surajit,

Sorry for the late reply, my travel last week was a bit delayed by the
volcano crisis in Europe.

surajit ray wrote:
> Hi Nina,
>
> Was trying to make sense of the changes mentioned  here .....
>
> On Mon, Mar 22, 2010 at 3:40 PM, Nina Jeliazkova <nina at acad.bg> wrote:
>
>
>
>
>  Hi Surajit,
>   
>> The main issue is the code has some deviation in representing objects and
>> properties, compared to those defined in
>> http://opentox.org/api/1.1/opentox.owl .
>>
>> IMHO, the most convenient way to familiarize oneself with objects and
>> relationships is to open opentox.owl with Protege and explore OWLClasses tab
>> with properties view.
>>
>> The list of properties , defined for ot:Model object are in the middle
>> panel.
>>
>>
>> surajit ray wrote:
>>
>> Hi Nina,
>> this is them code generating the curl output for model
>>
>>              OntModel jenaModel = createJenaRDFModel();
>>             OT.OTClass.Model.createOntClass(jenaModel);
>>
>>              Individual model =
>> jenaModel.createIndividual(MaxtoxApplicationSettings.getServerRootPath() +
>> "/model/" + model_number, OT.OTClass.Model.getOntClass(jenaModel));
>>             model.addLiteral(DC.title, jenaModel.createTypedLiteral("Model
>> Number : " + model_number, XSDDatatype.XSDstring));
>>             model.addLiteral(DC.description,
>> jenaModel.createTypedLiteral(modelDetails.get("description"),
>> XSDDatatype.XSDstring));
>>             model.addLiteral(DC.identifier,
>> jenaModel.createTypedLiteral(MaxtoxApplicationSettings.getServerRootPath() +
>> "/model/" + model_number, XSDDatatype.XSDanyURI));
>>
>>
>> links to other ontologies are to be established between ot:Algorithm, not
>> ot:Model , so a statement below is not relevant for ot:Model
>>
>>              model.addProperty(OT.isA, "
>> http://www.opentox.org/modelTypes.owl#MCSSBasedToxicityPredictor");
>>
>> I guess your perspective is correct, from the way Ambit is built from
>>     
> ground up. But in our case the algorithm is a part of the model and not a
> separate entity. 
No, an algorithm and a model are separate entities by definition.  An
algorithm is an abstract series of steps used to achieve something
(http://en.wikipedia.org/wiki/Algorithm).

One can apply an algorithm to create several models, by using different
parameters and datasets.  I guess you are not restricting your software
to work with a single dataset and producing a single model.

> I could make a REST interface for just a description of the
> algorithm, but we dont intend at this stage, to give separate access to the
> algorithm (at least that was not mentioned in our use-case requirement
> agreed upon at the project onset). Also our algorithm is quite multi-layered
> so exposing the algorithm( piece by piece) would be a time consuming task.
>   
Exposing the algorithm step by step is not necessary at all (IMHO; I
already noted this in previous emails).  From the API point of view, an
algorithm is just a URI which accepts a POST with a set of parameters and
a dataset, and returns a URL that points to a model.
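As a sketch, creating a model is then a single call (all hosts and paths
below are hypothetical, and echo is used so the sketch runs without a live
server):

```shell
# Hypothetical URIs -- substitute your own algorithm and dataset services.
ALGORITHM="http://myhost.com/algorithm/maxtox"
DATASET="http://otherhost.com/dataset/42"

# POST the training dataset (plus any algorithm parameters) to the
# algorithm URI; the service replies with the URI of the new model.
echo curl -X POST -d "dataset_uri=$DATASET" "$ALGORITHM"
```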
> Also there is the question of storing the intermediate data that the first
> layer of the  algorithm will generate before the other layers can work on
> it.
>   
Storing intermediate data need not be exposed via the API and should not
be a concern.
> Right now we can provide access to the model as a whole (as agreed in the
> use-case). In that perspective the sub-components need only have
> descriptions.
>
>
>   
Let's first agree on what is considered a model or algorithm.  

Will it be possible to use your service to create a MaxTox model with an
arbitrary OpenTox dataset, or is it restricted to some dataset of your
preference?

If it is the first, then you need separate URLs for the algorithm and the
model.  If it is the second, then providing only a Model URL might be OK.
>>              Individual dictionaryProducingDataset =
>> jenaModel.createIndividual(OT.OTClass.Dataset.getOntClass(jenaModel));
>>             dictionaryProducingDataset.addLiteral(DC.title,
>> jenaModel.createTypedLiteral("dictionaryProducingDataset",
>> XSDDatatype.XSDstring));
>>             dictionaryProducingDataset.addLiteral(DC.description,
>> jenaModel.createTypedLiteral("The dataset which was used to get the
>> fragments in the dictionary for this model", XSDDatatype.XSDstring));
>>             dictionaryProducingDataset.addLiteral(DC.identifier,
>> jenaModel.createTypedLiteral(modelDetails.get("dataset_uri"),
>> XSDDatatype.XSDanyURI));
>>             model.addProperty(OT.trainingDataset,
>> dictionaryProducingDataset);
>>
>>              Individual fragmentDataset =
>> jenaModel.createIndividual(OT.OTClass.Dataset.getOntClass(jenaModel));
>>             fragmentDataset.addLiteral(DC.title,
>> jenaModel.createTypedLiteral("fragmentDataset", XSDDatatype.XSDstring));
>>             fragmentDataset.addLiteral(DC.description,
>> jenaModel.createTypedLiteral("The dataset which shows all the fragments in
>> the dictionary", XSDDatatype.XSDstring));
>>             fragmentDataset.addLiteral(DC.identifier,
>> jenaModel.createTypedLiteral(modelDetails.get("fragmentset_uri"),
>> XSDDatatype.XSDanyURI));
>>             model.addProperty(OT.trainingDataset, fragmentDataset);
>>
>> In our case we wish to reveal two kinds of datasets. That is because the
>>     
> intermediate data happens to be a descriptor for the molecule. The training
> data obviously fits right in with the ot:trainingdataset. How about the MCSS
> fragment set for the model ? Where do you thinks that fits in ?
>   
MCSS fragment sets are clearly descriptors, which might be stored as
ot:Features, with corresponding information about the descriptor
calculation algorithm.
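As a sketch (all URIs below are hypothetical, assuming the ot namespace
from opentox.owl), one MCSS fragment could look like this in Turtle, with
ot:hasSource pointing at the fragment-generation algorithm:

```turtle
@prefix ot:  <http://www.opentox.org/api/1.1#> .
@prefix dc:  <http://purl.org/dc/elements/1.1/> .

# Hypothetical URIs for illustration only.
<http://myhost.com/feature/mcss_fragment_1>
    a ot:Feature ;
    dc:title "MCSS fragment 1" ;
    # which algorithm produced this descriptor
    ot:hasSource <http://myhost.com/algorithm/mcss_fragment_generator> .
```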

IMHO it is very similar to the FMiner descriptors, which are also a set
of fragments with some weights (numeric properties) assigned.  FMiner
descriptors also depend on a dataset, and how this is described in
OpenTox RDF has already been discussed on the list and implemented by
TUM.  I don't see a reason to introduce a different approach for the
same task.  Please do study how the TUM implementation works; for example,
they store FMiner features in the IDEA web service.

By the way, the IBMC MNA descriptors seem to be not too different
conceptually.
> Also right now the dataset we are providing as training dataset is not from
> those available as a service within Opentox(since at the time we started we
> were unaware of such datasets).
>
>   Endpoints are assigned via ot:Feature, not as parameters of ot:Model
>   
>>               Individual endpoint =
>> jenaModel.createIndividual(OT.OTClass.Parameter.getOntClass(jenaModel));
>>             endpoint.addLiteral(DC.title,
>> jenaModel.createTypedLiteral("ToxicityEndpoint", XSDDatatype.XSDstring));
>>             endpoint.addLiteral(DC.description,
>> jenaModel.createTypedLiteral(modelDetails.get("endpoint"),
>> XSDDatatype.XSDstring));
>>             //endpoint.addLiteral(DC.identifier,
>> jenaModel.createTypedLiteral(MaxtoxApplicationSettings.getServerRootPath() +
>> "/parameter/moleculeSizeCutoffParameter", XSDDatatype.XSDanyURI));
>>             model.addProperty(OT.parameters, endpoint);
>>
>>
>>     
> Accepted. Does that mean we have to a build a REST interface for a feature
> as well ? 
As usual, there are two options: either you build a feature service, or
you use one hosted elsewhere.  TUM is using the AMBIT feature service, but
your choice might be different.
> Once the other services are up, we intend to use datasets from the
> Opentox interfaces - so in the long run we do not need to host a REST
> interface for the features.. I'd rather code in the direction of more
> utilization of existing interfaces than provide redundant feature REST
> interfaces. Once we make a model from Opentox dataset (which will take some
> research and time) - I think the problem will automatically get solved.
>
>   
Can you provide a list of the obstacles to using the OpenTox dataset
service at this moment?
>>   URI of the compound used for prediction is not  parameter of the model
>>
>>              Individual compound_uri =
>> jenaModel.createIndividual(OT.OTClass.Parameter.getOntClass(jenaModel));
>>             compound_uri.addLiteral(DC.title,
>> jenaModel.createTypedLiteral("compound_uri", XSDDatatype.XSDstring));
>>             compound_uri.addLiteral(DC.description,
>> jenaModel.createTypedLiteral("URI of compound whose toxicity needs to be
>> predicted.", XSDDatatype.XSDstring));
>>             compound_uri.addLiteral(DC.identifier,
>> jenaModel.createTypedLiteral(MaxtoxApplicationSettings.getServerRootPath() +
>> "/parameter/compound_uri", XSDDatatype.XSDanyURI));
>>             compound_uri.addLiteral(OT.paramValue,
>> jenaModel.createTypedLiteral("Any URI", XSDDatatype.XSDstring));
>>             compound_uri.addLiteral(OT.paramScope,
>> jenaModel.createTypedLiteral("mandatory", XSDDatatype.XSDstring));
>>             model.addProperty(OT.parameters, compound_uri);
>>
>> Ok. So what is it ? Is it ot:hasInput ? But that seems to be dataset as a
>>     
> API spec. What if it is a compound URI ? I guess the API does not take care
> of the variations to the central theme.
>
>   
The URI of the compound used for prediction is provided as a parameter
of the POST operation to the model URI.

Example: if you have http://seascape.com/model/maxtox , then a compound
is submitted for prediction by

curl -X POST -d "compound_uri=http://myhost.com/compound/mycompound" 
http://seascape.com/model/maxtox

This is completely independent of the model RDF representation.  The
result of the prediction (the POST operation above) is assigned as a
feature of the compound above and should be available at

http://myhost.com/compound/mycompound?feature_uris[]=http://somefeature.service.com/feature/{maxtox_prediction}

The model itself should have ot:predictedFeatures pointing to
http://somefeature.service.com/feature/{maxtox_prediction}
>   
>>  and finally ot:algorithm, ot:independentVariables, ot:dependentVariables,
>> ot:predictedVariables properties are missing.
>>
>>
>> Must we fill in ALL the blanks to be compliant. I must say the API is way
>>     
> too inflexible then.
>   
Filling in ot:algorithm, ot:independentVariables, ot:dependentVariables
and ot:predictedVariables is crucial to establish a generic interface to
prediction algorithms and models.

If a model doesn't use independent or dependent variables (as with some
Toxtree models, which rely on fixed rules), then it is not necessary to
fill them in.

But ot:predictedVariables and ot:algorithm are crucial for an external
prediction application to learn which algorithm has been used and where
to find the result.
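A minimal sketch of what this adds to the model representation, in Turtle
(the URIs are hypothetical; the property names are those listed above):

```turtle
@prefix ot: <http://www.opentox.org/api/1.1#> .

# Hypothetical URIs for illustration only.
<http://myhost.com/model/maxtox>
    a ot:Model ;
    # which algorithm created this model
    ot:algorithm <http://myhost.com/algorithm/maxtox> ;
    # where an external application can find the prediction results
    ot:predictedVariables <http://myhost.com/feature/maxtox_prediction> .
```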
>
>   
>>  Could you tell me how to do it correctly ?
>>
>> Please have a look at
>>
>>
>> https://ambit.svn.sourceforge.net/svnroot/ambit/branches/opentox/opentox-client/src/main/java/org/opentox/rdf/representation/ModelRepresentation.java
>>
>> The code at
>> https://ambit.svn.sourceforge.net/svnroot/ambit/branches/opentox/opentox-clientdepends only on Jena and Restlet and can be directly used in others
>> projects.
>>
>> Best regards,
>> Nina
>>
>>
>>     
> The links don't work.  Could you please give me some alternates ?
>   
The links on SourceForge work fine; they have recently been verified and
used by other partners as well.

https://ambit.svn.sourceforge.net/svnroot/ambit/branches/opentox/opentox-client/src/main/java/org/opentox/rdf/representation/ModelRepresentation.java

https://ambit.svn.sourceforge.net/svnroot/ambit/branches/opentox/opentox-client



> And lastly :
> We have upgraded the website to have three fully analysed and validated
> models. The list of models are at :
> http://opentox2.informatik.uni-freiburg.de:8080/MaxtoxTest/model (can be
> opened in a browser)
>
> As to alpha testing mentioned by Barry, I would like to know what would
> constitute an alpha test. Since we are predicting a single molecule's
> activity with the model (the same as what Vedrin is running as a performance
> test) - I fathom from our perspective it is pretty much what is already
> happening. We could compare the results with other algorithms. That would be
> validation/testing.
>
>   
Compliance means:
- if a service claims to implement the API of object X, it should follow
the description of that object's API on the API wiki page; if some part
of it is not implemented, this should be documented;
- GET representations of objects should comply with opentox.owl;
- custom solutions should not be introduced without discussing them with
the other partners.

> As to the API , I still believe that it has arose from a single application
> reference (ambit) and needs to include other possibilities which may not
> follow the same patterns. Otherwise the load of "compliance coding" is going
> to severely hamper independent developers from joining in - a serious issue
> from the sustainability perspective.
>   
The API has been discussed and agreed upon by all partners, with the
purpose of making it generic enough to cover all possible use cases in
model development and usage.

AMBIT was not a template for the API; on the contrary, the AMBIT web
interface was built after the API was defined in June 2009.

Regarding models, AMBIT currently has a generic wrapper to offer
arbitrary Weka algorithms, all Toxtree modules, all CDK descriptors,
MOPAC-generated descriptors, as well as several applicability domain
algorithms, which pretty much covers what OpenTox is about (validation
excluded).

I sincerely doubt that MaxTox is so different from all other QSAR
algorithms that it could not fit the current API.  Besides, the MaxTox,
FMiner and MNA algorithms do a similar thing (generating fragments by
their respective methods), and FMiner already has an API-compliant
implementation.  What makes MaxTox and MNA different?

> Datasets are easier to provide, just a question of putting the interface in
> place. 
Same for models and algorithms :) 
> Models and the way it is conceptualized withing the API will lead to
> huge "redesign sessions" for any one who simply intends to provide a
> prediction model that he/she has built. 
Not necessarily; just think of a model as an entity which produces
results when given a set of parameters.  In fact it is still not quite
clear to me what the reason for the MaxTox redesign was, since a wrapper
should normally be sufficient.
> Essentially leaving most amateur
> coders out in the cold (since they will not have the bandwidth to make such
> huge code changes).
>   
On the contrary.  The purpose of an API is to be generic, not to change
with each use case.

The API has a very simple and generic architecture:

The algorithm, model, feature, dataset and compound are all objects,
identified by URIs and their representations.  GET operations return the
RDF representation of an object; POST operations create new objects.

1) POST on an algorithm creates a model.  The POST parameters are
dataset_uri and the algorithm parameters;
2) POST on a model creates predictions; dataset_uri or compound_uri is
used as a parameter.  The results are stored as features;
3) Any compound properties are stored as features.  Whether a property
is generated by a descriptor algorithm or a prediction model is described
by the ot:hasSource property of the feature.
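The three steps above can be sketched with curl (all hosts, paths and the
returned model URI below are hypothetical; echo is used so the sketch runs
without a live server):

```shell
# Hypothetical URIs -- substitute the real services.
ALGORITHM="http://myhost.com/algorithm/maxtox"
MODEL="http://myhost.com/model/1"          # as returned by step 1
COMPOUND="http://otherhost.com/compound/7"

# 1) POST on the algorithm creates a model (the response is the model URI).
echo curl -X POST -d "dataset_uri=http://otherhost.com/dataset/42" "$ALGORITHM"

# 2) POST on the model creates predictions for a compound (or a dataset).
echo curl -X POST -d "compound_uri=$COMPOUND" "$MODEL"

# 3) The prediction is stored as a feature of the compound.
echo curl "$COMPOUND?feature_uris[]=http://myhost.com/feature/maxtox_prediction"
```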

> And one final question ... Is this API going to last the next six months  ?
>   
We've agreed that minor changes are possible.  But the generic concepts
(models created by POST on algorithms, predictions generated by POST on
models, and compound properties stored as features) are not going to
change.
> If not, compliance could mean a different set of rules in the future.
> Normally in the domain of software dev and testing, compliance testing
> happens when the API is matured enough (and stable). Our AA system will make
> more changes to the API - and this far from finalized.
>   
If we design the AA system right, it should not change the existing API;
it will only impose restrictions, transparent to the current API.

Best regards,
Nina
> Cheers
> Surajit
> _______________________________________________
> Development mailing list
> Development at opentox.org
> http://www.opentox.org/mailman/listinfo/development
>   



