[OTDev] API 1.1. extensions - Numeric and Nominal data type implemented

Thu Feb 4 08:37:41 CET 2010

Hi Surajit,

A few more questions, to help me understand MaxTox:

Tobias Girschick wrote:
> Hi Surajit,
>
> On Wed, 2010-02-03 at 23:02 +0530, surajit ray wrote: 
>   
>> First of all - Thanks Ivelina for your inputs !
>>
>> You are right about the RatTD50 being an individual - I was a little confused
>> there. I have made the correction. The rest of the ontology follows from our
>> workflow, which is:
>>
>> a) We generate a dictionary of overlaps found from a dataset of molecules.
>> (Basically we run a pairwise graph comparison of all the molecules in a
>> dataset and collect all the overlaps found in all the comparisons.)
>>     
Is the dictionary a set of fragments obtained from a particular dataset,
and not a dataset of molecules?

E.g. if you compare "CCC" with "CC", will the overlap be "CC"?
>> b) This dictionary is then used to generate a fingerprint of a test molecule
>> (for the endpoint of the particular dataset which generated the
>> dictionary).
>>
>> c) In the back end we have modeled the fingerprints of the molecules ( of
>> the dataset) against the dictionary generated from the dataset. For our
>> internal testing and validation - we then use it in a RandomForest model to
>> predict toxicity of future molecules.
>>
>> The question I am thinking about is how this ontology fits in with the
>> API. The API has a dataset class - but what about the dictionary we
>> generate - would it be a dataset also - or a dataset-derived object of some
>> sort ?
>>     
>
> I am not completely sure, but I have the impression that your dictionary
> is a set of features. Not simple features like a logP but features that
> have parameters. We had some discussion on this topic going on on the
> mailing list. The most important parameter of your dictionary features
> would be the dataset from which it is created. 
It seems to me the MaxTox approach can indeed be handled the same way we
discussed recently in the context of the TUM algorithms.

1) There is a descriptor calculation algorithm, with a dataset as an input
parameter (and possibly other parameters).
2) The descriptor calculation algorithm generates descriptor values
(e.g. YES/NO/count for each fragment from the fragment set).
3) The descriptor values are used by the random forest algorithm for building
prediction models.
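
The first two steps above can be sketched roughly as follows. This is only a toy illustration under strong assumptions: molecules are plain SMILES-like strings, and the "overlap" of two molecules is taken to be their longest common substring, which merely stands in for the real pairwise graph comparison. The function names (longest_common_substring, build_dictionary, fingerprint) are made up for this example and are not part of MaxTox or the OpenTox API.

```python
from itertools import combinations


def longest_common_substring(a: str, b: str) -> str:
    """Toy stand-in for the pairwise graph comparison of two molecules."""
    best = ""
    # prev[j] = length of the common suffix of a[:i-1] and b[:j]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > len(best):
                    best = a[i - cur[j]:i]
        prev = cur
    return best


def build_dictionary(dataset):
    """Step 1: collect the overlaps found in all pairwise comparisons."""
    overlaps = set()
    for a, b in combinations(dataset, 2):
        overlap = longest_common_substring(a, b)
        if overlap:
            overlaps.add(overlap)
    return sorted(overlaps)


def fingerprint(molecule, dictionary):
    """Step 2: one count per dictionary fragment (0 plays the role of NO).

    str.count counts non-overlapping occurrences only.
    """
    return [molecule.count(fragment) for fragment in dictionary]


dataset = ["CCC", "CCO", "CO"]
dictionary = build_dictionary(dataset)
print(dictionary)                       # ['C', 'CC', 'CO']
print(fingerprint("CCCO", dictionary))  # [3, 1, 1]
```

The resulting count vectors are what step 3 would pass to the random forest algorithm; that part is left out here because it belongs to a separate modeling service.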

> You can have a look here 
> http://www.opentox.org/dev/apis/api-1.1/Algorithm in the section where
> the descriptor calculation algorithms are explained in more detail. 
>   
To summarize:
- There is a generic MaxTox fingerprint calculation algorithm:
    /algorithm/maxtoxfingerprints
- Calculation of fingerprints for a particular dataset is done by POSTing
  a dataset to the algorithm:
    curl -X POST -d /dataset/ABCD /algorithm/maxtoxfingerprints

This operation returns the URL of a new dataset with descriptor values,
but also creates a new URL for the algorithm with its specific parameters, e.g.

    /algorithm/maxtoxfingerprints1
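
From the client side, the POST of a dataset to the algorithm could look roughly like the Python sketch below. It only builds the request and does not send it. The host http://maxtox.in is borrowed from the RDF examples in this mail, the form parameter name dataset_uri is an assumption about how the dataset URL is passed, and build_fingerprint_request is a made-up helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed full URL of the generic algorithm (host taken from the RDF examples).
ALGORITHM_URI = "http://maxtox.in/algorithm/maxtoxfingerprints"


def build_fingerprint_request(dataset_uri: str) -> Request:
    """Prepare (but do not send) a POST of a dataset URI to the algorithm."""
    # 'dataset_uri' is an assumed form parameter name.
    body = urlencode({"dataset_uri": dataset_uri}).encode("ascii")
    return Request(
        ALGORITHM_URI,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


request = build_fingerprint_request("http://maxtox.in/dataset/ABCD")
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
# Sending it would be: urllib.request.urlopen(request)
```

How the server hands back the URL of the new dataset (response body, redirect, or an asynchronous task) is deliberately not covered by this sketch.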

The new dataset contains features like

    <Feature rdf:about="http://maxtox.in/feature/1">
        <dc:title rdf:datatype="&xsd;string">CCCC</dc:title>
        <hasSource rdf:resource="http://maxtox.in/algorithm/maxtoxfingerprints1"/>
    </Feature>

The new algorithm entry /algorithm/maxtoxfingerprints1 will provide an RDF
representation as follows:

    <Algorithm rdf:about="http://maxtox.in/algorithm/maxtoxfingerprints1">
        <hasInput rdf:resource="http://maxtox.in/dataset/ABCD"/>
        <parameters rdf:resource="#Parameter_4"/>
        <parameters rdf:resource="#Parameter_3"/>
        <owl:sameAs rdf:resource="http://www.blueobelisk.org/ontologies/chemoinformatics-algorithms/#maxtox"/>
    </Algorithm>

    <Parameter rdf:ID="Parameter_3">
        <paramValue rdf:datatype="&xsd;string"></paramValue>
    </Parameter>
    <Parameter rdf:ID="Parameter_4">
        <paramValue rdf:datatype="&xsd;double">0.7</paramValue>
    </Parameter>
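
To illustrate how a client could recover the input dataset and the parameter values from such a representation, here is a small Python sketch using only the standard library. It works on a trimmed, self-contained version of the RDF above. The default namespace http://www.opentox.org/api/1.1# is an assumption, describe_algorithm is a made-up name, and a real client would more likely use a proper RDF library, since the same graph can be serialized as RDF/XML in several different ways.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OT = "http://www.opentox.org/api/1.1#"  # assumed OpenTox namespace

# Trimmed version of the algorithm representation, made self-contained.
DOCUMENT = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns="{OT}">
  <Algorithm rdf:about="http://maxtox.in/algorithm/maxtoxfingerprints1">
    <hasInput rdf:resource="http://maxtox.in/dataset/ABCD"/>
    <parameters rdf:resource="#Parameter_4"/>
  </Algorithm>
  <Parameter rdf:ID="Parameter_4">
    <paramValue rdf:datatype="http://www.w3.org/2001/XMLSchema#double">0.7</paramValue>
  </Parameter>
</rdf:RDF>"""


def describe_algorithm(rdf_xml: str):
    """Return (input dataset URI, {parameter id: value}) for the first Algorithm."""
    root = ET.fromstring(rdf_xml)
    algorithm = root.find(f"{{{OT}}}Algorithm")
    dataset = algorithm.find(f"{{{OT}}}hasInput").get(f"{{{RDF}}}resource")
    # Index all Parameter nodes by their rdf:ID.
    values = {
        p.get(f"{{{RDF}}}ID"): p.findtext(f"{{{OT}}}paramValue")
        for p in root.iter(f"{{{OT}}}Parameter")
    }
    # Follow each '#id' reference from the algorithm to its Parameter node.
    params = {}
    for ref in algorithm.findall(f"{{{OT}}}parameters"):
        key = ref.get(f"{{{RDF}}}resource").lstrip("#")
        params[key] = values.get(key)
    return dataset, params


dataset_uri, parameters = describe_algorithm(DOCUMENT)
print(dataset_uri)
print(parameters)
```

The point of the sketch is only that the dataset which produced the dictionary stays discoverable from the algorithm URL itself.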

This approach requires only extending the Blue Obelisk ontology with the
MaxTox descriptor calculation algorithm, and no further changes, neither
in the OpenTox ontology nor in the API.

@Tobias and Fabian - if the examples sound familiar to you, it is
intentional :)

Finally, to align MaxTox with the API, there should be a clear split
between the descriptor calculation algorithm (fingerprints) and modeling
(RandomForest). For the random forest algorithm, the representation as a
machine learning algorithm should be used (see the AlgorithmTypes ontology).

Does this make sense for MaxTox?

Best regards,
Nina

> I hope this helps just a little bit
> Regards,
> Tobias
>
>   
>> Thanks
>> Surajit
>>
>> On Wed, Feb 3, 2010 at 7:36 PM, Ivelina Nikolova <iva at lml.bas.bg> wrote:
>>
>>     
>>> Dear Surajit,
>>>
>>> I'm looking at the MaxTox.owl you attached earlier this week. The
>>> reasoner classifies it well, so technically it is correct, but I'm
>>> lacking some additional knowledge about the problem you wish to solve
>>> with this ontology. Could you give me some more explanation, so
>>> that I can understand your intent in creating it?
>>>
>>> It is surprising for me to see that you have chosen to create a class
>>> called RatTD50 as a Dataset subclass. Normally the concrete datasets or
>>> features are individuals and they are not part of the ontology, but of
>>> some external resource and they have their URL. What is your reason to
>>> design it this way?
>>>
>>> Best,
>>> ivelina
>>>
>>>
>>>
>>>
>>>
>>>
>>> surajit ray wrote:
>>>       
>>>> Hi Nina,
>>>>
>>>> I have created a basic ontology for the Maxtox model. Could you please go
>>>> through it briefly when you have the time - and suggest improvements as well
>>>> as how we can fit this into the existing Ontology of Opentox. Just
>>>> indications will do (I understand you are already under a deadline
>>>> pressure...)
>>>>
>>>> I have just started learning protege so a few concepts may have gone awry. I
>>>> have tried my best to get the gist of our prediction system into the
>>>> attached ontology.
>>>>
>>>> Thanks in advance
>>>>
>>>> Cheers
>>>> Surajit
>>>>
>>>> On Tue, Feb 2, 2010 at 11:45 AM, Nina Jeliazkova <nina at acad.bg> wrote:
>>>>
>>>>
>>>>         
>>>>>  Hi Surajit,
>>>>>
>>>>>
>>>>> surajit ray wrote:
>>>>>
>>>>> Hi Nina,
>>>>>
>>>>>  Are we officially supposed to use the restlet2 m7 ?
>>>>>
>>>>>  The library and the particular release are the choice of the developer, so the
>>>>> answer is - it is up to you.
>>>>>
>>>>>  At this point I have a few questions about this interaction
>>>>> a) Once a time consuming calculation is over - should the server notify an
>>>>> attached client about the result OR just sit with the data at a specified
>>>>> URL till it is fetched by the client ?
>>>>>
>>>>> The client sends GET requests and verifies if the result is ready (200 OK).
>>>>> Have a look at the API at http://opentox.org/dev/apis/api-1.1/AsyncTask
>>>>>
>>>>> When returning 303 (redirect) for an incomplete task and making use of the
>>>>> Refresh header field, it is very easy to implement a browser-like client that
>>>>> periodically checks the task status. Browsers will automatically try to fetch
>>>>> the content after the time interval specified in Refresh.
>>>>>
>>>>>  (In the REST interface is it allowable for the server to contact a
>>>>> connected client ?)
>>>>>
>>>>> No. REST uses the HTTP protocol for communication and there is no
>>>>> notion of an "attached" client. The client sends a GET/PUT/POST/DELETE
>>>>> request, receives an answer, and there is no permanent connection between
>>>>> them after the response is sent.
>>>>> Unless you implement your client to behave as a server as well, there is no
>>>>> way for the server to tell the client anything, besides answering a client
>>>>> request.
>>>       
>>>>>  b) Should the server identify the client requesting the computation and
>>>>> authenticate before delivering the data OR give it to any client requesting
>>>>> the URI of the predicted/computed values ?
>>>>>
>>>>> We've decided to postpone everything concerning authentication and
>>>>> authorization until after the end of February, and currently all the services
>>>>> are open to everybody (I know it sounds scary :)
>>>>>
>>>>>  c) How long should the computed values be retained on the server OR
>>>>> should there be REST interface for destroying (and hence freeing up
>>>>> resources) a set of computed values ?
>>>>>
>>>>> It depends on your implementation.
>>>>>
>>>>> For example, the ambit services store everything in a database and keep the
>>>>> results forever, unless a delete operation is performed on a specific
>>>>> resource. The REST way of deleting a resource is sending a DELETE request
>>>>> (instead of POST or PUT, which are generally for create/update). For most
>>>>> OpenTox resources a DELETE operation is defined in the API (see wiki), but not
>>>>> everybody has implemented the full set of the API.
>>>>>
>>>>> d) Should there be an "account" like system for every client on the server
>>>>> See answer in (b).
>>>>>
>>>>> ? If yes - should the data generated by a client be attached to that
>>>>> "account" or available to all ?
>>>>>
>>>>> There might be public and private data, but at this moment we consider
>>>>> everything is public and will decide on details after finalizing
>>>>> deliverables at the end of this month.
>>>>>
>>>>>  Should the computed data persist indefinitely in each of these "accounts" ?
>>>>>
>>>>>  Again, it depends on your implementation.
>>>>>
>>>>> Best regards,
>>>>> Nina
>>>>>
>>>>>  Cheers
>>>>> Surajit
>>>>>
>>>>> On Tue, Feb 2, 2010 at 3:31 AM, Nina Jeliazkova <nina at acad.bg> wrote:
>>>>>
>>>>>
>>>>>           
>>>>>> Hi Pantelis,
>>>>>>
>>>>>> There is a standard Java interface, java.util.concurrent.ExecutorService; it
>>>>>> can be configured to work as a pool with a fixed or variable number of
>>>>>> threads.
>>>>>>
>>>>>> There is a Restlet TaskService, which wraps the ExecutorService.
>>>>>> I found that it behaved oddly and switched to the standard Java one.
>>>>>>
>>>>>> You might look at the ambit code at
>>>>>>
>>>>>> https://ambit.svn.sourceforge.net/svnroot/ambit/trunk/ambit2-all/ambit2-www/src/main/java/ambit2/rest/AmbitApplication.java
>>>>>> https://ambit.svn.sourceforge.net/svnroot/ambit/trunk/ambit2-all/ambit2-www/src/main/java/ambit2/rest/task
>>>>>>
>>>>>> For each asynchronous task, it creates a Callable class, which returns a
>>>>>> Reference. Each task has a unique identifier (UUID) and the set of
>>>>>> tasks is stored in a ConcurrentMap. There is a timer, which removes
>>>>>> completed tasks a few hours after completion.
>>>>>>
>>>>>> Hope this helps,
>>>>>> Nina
>>>>>>
>>>>>>
>>>>>> chung wrote:
>>>>>>
>>>>>>             
>>>>>>> Hi Nina,
>>>>>>>  I'm trying to make some improvements on the services so except for the
>>>>>>> migration from restlet2 m3 to m7 I was thinking of introducing some
>>>>>>> execution pool (e.g. an ExecutorService or -why not- something
>>>>>>> 'homemade') and establish a queue for the incoming requests (especially
>>>>>>> those characterized as time-consuming and memory-consuming ones). This
>>>>>>> way I will be able to manage all running tasks on the server and make
>>>>>>> some performance improvements I hope. Is there some standard way of
>>>>>>> doing this? Could you suggest some executor or some utility to manage
>>>>>>> the running threads and do you know if there is some way to specify the
>>>>>>> maximum number of running threads for Restlet?
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Pantelis
>>>>>>>
>>>>>>>
>>>>>>> On Tue, 2010-01-26 at 14:21 +0200, Nina Jeliazkova wrote:
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>>> Hello All,
>>>>>>>>
>>>>>>>> Following the data type discussions and proposal earlier this month,
>>>>>>>> support for NumericFeature and NominalFeature is now implemented in the IDEA
>>>>>>>> services.
>>>>>>>>
>>>>>>>> Please note all features are explicitly typed as ot:Feature as well.
>>>>>>>> While this is redundant and can be derived from the
>>>>>>>> ontology with the help of a reasoner, it does make the client
>>>>>>>> implementation somewhat easier.
>>>>>>>>
>>>>>>>> Examples from CPDBAS dataset at
>>>>>>>> http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/9
>>>>>>>>
>>>>>>>> <http://ambit.uni-plovdiv.bg:8080/ambit2/feature/12122>
>>>>>>>>       a       ot:Feature , ot:NominalFeature ;
>>>>>>>>       dc:identifier "http://ambit.uni-plovdiv.bg:8080/ambit2/feature/12122"^^xsd:anyURI ;
>>>>>>>>       dc:title "ActivityOutcome_CPDBAS_SingleCellCall" ;
>>>>>>>>       ot:acceptValue "inactive" , "active" ;
>>>>>>>>       ot:hasSource <http://ambit.uni-plovdiv.bg:8080/ambit2/reference/11847> ;
>>>>>>>>       ot:units "" ;
>>>>>>>>       =       ot:ActivityOutcome_CPDBAS_SingleCellCall .
>>>>>>>>
>>>>>>>> <http://ambit.uni-plovdiv.bg:8080/ambit2/feature/12124>
>>>>>>>>       a       ot:Feature , ot:NumericFeature ;
>>>>>>>>       dc:identifier "http://ambit.uni-plovdiv.bg:8080/ambit2/feature/12124"^^xsd:anyURI ;
>>>>>>>>       dc:title "STRUCTURE_MolecularWeight" ;
>>>>>>>>       ot:hasSource <http://ambit.uni-plovdiv.bg:8080/ambit2/reference/11847> ;
>>>>>>>>       ot:units "" ;
>>>>>>>>       =       ot:STRUCTURE_MolecularWeight .
>>>>>>>>
>>>>>>>>
>>>>>>>> Bug reports are of course welcome at the usual place
>>>>>>>>
>>>>>>>> http://sourceforge.net/tracker/?group_id=191756
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Nina
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> 1) Feature data types:
>>>>>>>>> Proposal (based on Pantelis' suggestions and the Protege guide) at
>>>>>>>>> http://opentox.org/data/documents/development/RDF%20files/Datatypes .
>>>>>>>>> Updated opentox.owl at
>>>>>>>>> http://opentox.org/data/documents/development/RDF%20files/OpenToxOntology/view
>>>>>>>>>                   
>>>>>>>> _______________________________________________
>>>>>>>> Development mailing list
>>>>>>>> Development at opentox.org
>>>>>>>> http://www.opentox.org/mailman/listinfo/development
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>
>>>>>>
>>>>>>             
>>>>> --
>>>>> Surajit Ray
>>>>> Partner
>>>>> www.rareindianart.com
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>
>>>>
>>>>         
>>>
>>>       
>>
>>     
>
>
>