[OTDev] API extension summary

Tobias Girschick tobias.girschick at in.tum.de
Mon Jan 18 14:12:00 CET 2010


Hi Pantelis, All,

> > [...] Starting with a brief definition, a DoA service tells whether a
> > compound can be used by a model.

I don't agree here. The service should tell whether a model can be used
to predict a compound that is in the model's applicability domain (AD).

> > So, at first, a POST operation at :
> >
> > /algorithm/{doa_id}
> >
> > of a model_uri, will return another model uri. Clients can now post
> > datasets on that model uri to get another dataset which has an extra
> > feature (call it for instance http://sth.com/feature/doa ) which is
> > boolean and 1 corresponds to "compound belongs to the doa of the
> > underlying models" and 0 to the opposite.

I am not sure I completely understand the process you propose. Can
you give a detailed workflow? The extra feature you are proposing will
depend on the training dataset of the prediction model, and this
dependency will have to be encoded.

> >   
> Seems fine, besides that output of AD model might be probability -based,
> rather than yes/no.   

Agreed.

> This could be handled via multiple feature_uris,
> returned by the model.
> > This proposal assumes no modification of the current API and if there is
> > no objection on that we could implement a doa service based on the
> > method of "leverages".
> >   
> If accepted, algorithm types ontology will need to be extended with a
> subclass for applicability domain.

Agreed.
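
For reference, the "leverages" method mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a proposal for the service interface: the function names (fit_doa, leverage, in_domain) are hypothetical, and the 3p/n cut-off is a commonly used convention that nobody in this thread has fixed. The leverage of a compound with descriptor vector x is h = x^T (X^T X)^-1 x, where X is the descriptor matrix of the training set, which is why the result depends on the training dataset.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def invert(a: Matrix) -> Matrix:
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment the matrix with the identity: [A | I] becomes [I | A^-1].
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; leverages are undefined")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_doa(train: Matrix) -> Tuple[Matrix, float]:
    """Return ((X^T X)^-1, threshold) for the training descriptor matrix X."""
    n, p = len(train), len(train[0])
    xtx = [[sum(row[i] * row[j] for row in train) for j in range(p)]
           for i in range(p)]
    return invert(xtx), 3.0 * p / n  # ASSUMPTION: 3p/n cut-off


def leverage(x: List[float], xtx_inv: Matrix) -> float:
    """Leverage h = x^T (X^T X)^-1 x of one compound."""
    p = len(x)
    return sum(x[i] * xtx_inv[i][j] * x[j] for i in range(p) for j in range(p))


def in_domain(x: List[float], xtx_inv: Matrix, threshold: float) -> int:
    """Boolean DoA feature: 1 if the compound is inside the domain, else 0."""
    return 1 if leverage(x, xtx_inv) <= threshold else 0
```

A service that prefers a non-boolean output could return the value of leverage() as a second feature instead of, or in addition to, the 0/1 flag.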

Best regards,
Tobias

> 
> Best regards,
> Nina
> 
> 
> > Best regards,
> > Pantelis
> >
> >
> >
> > On Mon, 2010-01-18 at 11:46 +0200, Nina Jeliazkova wrote:
> >   
> >> Hello All,
> >>
> >> Some discussion points for today's meeting:
> >>
> >> 1. Data processing Algorithms.  All algorithms are subclasses of 
> >> http://www.opentox.org/api/1.1#Algorithm
> >>
> >> Generic input parameters:
> >> dataset_uri (as with other algorithms)
> >> parameters
> >>
> >> a) Data cleanup algorithms, subclass of
> >> http://www.opentox.org/algorithms.owl#DataCleanup
> >> input parameters: generic
> >> output parameters: dataset_uri
> >>
> >> b) Feature selection algorithms, subclass of
> >> http://www.opentox.org/algorithmTypes.owl#FeatureSelection
> >> input parameters: generic
> >> output parameters: feature_uris[]
> >>
> >> c) Supervised learning algorithms, subclass of
> >> http://www.opentox.org/algorithmTypes.owl#Supervised
> >> input parameters: prediction_feature
> >> output parameters: dataset_uri
> >>
> >> d) Descriptor calculation algorithms, subclass of
> >> http://www.opentox.org/algorithms.owl#DescriptorCalculation
> >> input parameters: generic
> >> output parameters: dataset_uri
> >>
> >> http://opentox.org/dev/apis/api-1.1/Algorithm  entry is (partially) updated
> >>
> >>
> >> 2. How to identify features generated by an algorithm and a specific set
> >> of parameters:
> >>
> >> According to current opentox.owl, a Feature can be assigned Algorithm,
> >> Model or Dataset as its origin (via property ot:hasSource).   There is
> >> no support for Algorithm + Parameters, except if the specific case of a
> >> Model can be regarded as Algorithm + Parameter instance.
> >>
> >> One possible solution could be:
> >> - define superclass A, which is determined by Algorithm + Parameters
> >> - Make Model subclass  of A
> >> - define  domain of ot:hasSource  as classes A and Dataset
> >> - Find a nice name for the superclass A
> >>
> >> This will be searchable via ontology service.
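
The class relationships in this proposal can be pictured with a small sketch. It is only an illustration in Python rather than OWL, and the name ParameterizedAlgorithm is a hypothetical placeholder for "superclass A", for which no name has been agreed.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Dataset:
    uri: str


@dataclass(frozen=True)
class ParameterizedAlgorithm:
    """Hypothetical stand-in for superclass A: an algorithm plus parameter values."""
    algorithm_uri: str
    # (name, value) pairs kept as a tuple so that instances are hashable
    parameters: Tuple[Tuple[str, str], ...] = ()


@dataclass(frozen=True)
class Model(ParameterizedAlgorithm):
    """A Model as one special case of Algorithm + Parameters."""
    uri: str = ""


@dataclass(frozen=True)
class Feature:
    uri: str
    # Mirrors ot:hasSource restricted to class A and Dataset
    has_source: Union[ParameterizedAlgorithm, Dataset]
```

With this shape, two features produced by the same algorithm with different parameters have different sources, which is the distinction the current ontology cannot express.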
> >>
> >> Question: Can we directly use Model to denote descriptors, especially
> >> descriptors, which require datasets to be calculated?
> >>
> >> 3. Dataset API
> >> Reminder: the dataset API 1.1 allows specifying feature URI and compound
> >> URI on GET operations:
> >>
> >> http://opentox.org/dev/apis/api-1.1/dataset
> >> Query a dataset
> >>   Method/URI:   GET /dataset/{id}
> >>   Parameters:   *compound_uris[]* and/or *feature_uris[]* to select compounds and features
> >>
> >> This is a very flexible way to get slices of a dataset (features = columns, compounds = rows), or to merge data across different datasets, without the need to download/upload dataset content.
> >>
> >> However, there have been some concerns regarding the length of the URL. The proposal is to extend the same approach to allow POST and PUT operations to specify datasets via dataset_uri, compound_uris and feature_uris.
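
To make the URL length concern concrete, here is a minimal sketch of how a client might build such a GET request with Python's standard library. The helper name dataset_slice_url and the http://myservice URIs are illustrative assumptions; only the parameter names come from the API page.

```python
from urllib.parse import urlencode


def dataset_slice_url(dataset_uri, compound_uris=(), feature_uris=()):
    """Build a GET URL selecting rows (compounds) and columns (features)."""
    params = [("compound_uris[]", c) for c in compound_uris]
    params += [("feature_uris[]", f) for f in feature_uris]
    # urlencode percent-encodes both the "[]" in the names and the URI values
    return dataset_uri + ("?" + urlencode(params) if params else "")


# Selecting 200 feature columns by URI already gives a very long URL.
url = dataset_slice_url(
    "http://myservice/dataset/1",
    feature_uris=[f"http://myservice/feature/{i}" for i in range(200)],
)
print(len(url))
```

Every selected feature or compound adds its full percent-encoded URI to the query string, so the length grows linearly with the size of the slice.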
> >>
> >>
> >> Create a new dataset
> >>   Method/URI:   POST /dataset
> >>   Parameters:   Dataset representation in a supported MIME type. MIME type
> >>                 to be specified via *Content-type* header.
> >>   Result:       New URI /dataset/{id} or redirect to task URI (for large uploads)
> >>   Status codes: 200, 202, 400, 503
> >>
> >> Update a dataset
> >>   Method/URI:   PUT /dataset/{id}
> >>   Parameters:   Data representation in a supported MIME type; entries for
> >>                 existing compound/feature pairs will be overwritten, entries
> >>                 for new compound/feature pairs will be added
> >>   Result:       Dataset or task URI
> >>   Status codes: 200, 202, 400, 404, 503
> >>
> >>
> >> *Proposal:*
> >> 3.1.  If the MIME type is *application/x-www-form-urlencoded*, allow
> >> dataset_uri, feature_uris[] and compound_uris[] as input parameters for
> >> PUT and POST operations.  This will facilitate assigning a new dataset
> >> id to client-specified subsets of data.  URL length is no longer an
> >> issue, since parameters are passed in the request body.
> >>
> >> example: 
> >> POST /dataset 
> >> dataset_uri=http://myservice/dataset/1 
> >> feature_uris[]=/selectedfeature1
> >> feature_uris[]=/selectedfeature2
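
The example request above could be assembled on the client side roughly like this. It is a sketch using Python's standard urllib; the helper name new_dataset_request is hypothetical, and the request object is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def new_dataset_request(service, dataset_uri, feature_uris=(), compound_uris=()):
    """Build (but do not send) a form-encoded POST that creates a dataset subset."""
    fields = [("dataset_uri", dataset_uri)]
    fields += [("feature_uris[]", f) for f in feature_uris]
    fields += [("compound_uris[]", c) for c in compound_uris]
    # Repeated keys carry the list values; everything travels in the body,
    # so the request URL stays short regardless of the subset size.
    body = urlencode(fields).encode("ascii")
    return Request(
        service + "/dataset",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


req = new_dataset_request(
    "http://myservice",
    "http://myservice/dataset/1",
    feature_uris=["/selectedfeature1", "/selectedfeature2"],
)
print(req.full_url)
print(req.data.decode("ascii"))
```

Passing the request to urllib.request.urlopen would actually send it; according to the table above, the response would then carry the new dataset URI or a task URI.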
> >>
> >> 3.2.  For file uploads, agree on a fixed name for the file upload parameter
> >> in *application/x-www-form-urlencoded*, e.g. *file_upload*.
> >> When uploading content other than RDF (e.g. MOL, SDF, SMILES), there is
> >> currently no way to assign metadata (even the file name is not
> >> available when POSTing content other than RDF).
> >>
> >> 4. Query API.  There is currently no agreed API on querying for   .
> >> There are some custom implementations:
> >>
> >> Query for property/identifier  value
> >> http://ambit.uni-plovdiv.bg:8080/ambit2/compound?property=CAS&search=50-00-0
> >> <http://ambit.uni-plovdiv.bg:8080/ambit2/compound?search=55-55-0>
> >> or
> >> /compound?search=phenolphthalein
> >> <http://ambit.uni-plovdiv.bg:8080/ambit2/compound?search=phenolphthalein>
> >>
> >> Proposal:  /compound?search=value&sameas=http://url_from_an_ontology , e.g.
> >>
> >>
> >> /compound?search=50-00-0&sameas=http://www.opentox.org/api/1.1#CASRN
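
One practical detail of the sameas parameter: the ontology URL contains a "#", so it has to be percent-encoded, otherwise everything after the "#" is treated as a URL fragment and is never sent to the server. A minimal sketch with Python's standard library follows; the helper name compound_search_url and the http://myservice host are illustrative assumptions.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def compound_search_url(service, value, sameas=None):
    """Build /compound?search=...&sameas=... with properly encoded values."""
    params = {"search": value}
    if sameas is not None:
        params["sameas"] = sameas
    return service + "/compound?" + urlencode(params)


url = compound_search_url(
    "http://myservice", "50-00-0", "http://www.opentox.org/api/1.1#CASRN"
)
print(url)

# The encoded form round-trips to the original ontology URL ...
decoded = parse_qs(urlsplit(url).query)["sameas"][0]

# ... whereas the unencoded form loses "CASRN" to the fragment part.
naive = urlsplit(
    "http://myservice/compound?search=50-00-0"
    "&sameas=http://www.opentox.org/api/1.1#CASRN"
)
print(decoded, naive.fragment)
```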
> >>
> >> Substructure
> >> /query/smarts?search=c1ccccc1O&max=100
> >> <http://ambit.uni-plovdiv.bg:8080/ambit2/query/smarts?search=c1ccccc1O&max=100>
> >>
> >> Similarity
> >> /query/similarity?search=c1ccccc1&threshold=0.8
> >> <http://ambit.uni-plovdiv.bg:8080/ambit2/query/similarity?search=c1ccccc1&threshold=0.8>
> >>
> >>
> >>
> >> AFAIK, the IST implementation uses the /compound/{id} API, which seems
> >> reasonable for the first two cases, but there might be issues with embedding
> >> symbols that are not URL-safe in {id} (e.g. InChI, SMILES)
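
The embedding problem can be illustrated with a short sketch: identifiers such as InChI or SMILES contain characters like "/", "=" and "#" that have a special meaning in a URL path, so they would have to be percent-encoded before being placed in {id}. The helper name compound_path and the sample identifiers are illustrative assumptions.

```python
from urllib.parse import quote, unquote


def compound_path(identifier):
    """Place an identifier in /compound/{id}, percent-encoding every reserved character."""
    # safe="" also encodes "/", which quote() would otherwise leave untouched
    return "/compound/" + quote(identifier, safe="")


for ident in ("c1ccccc1O", "C#N", "InChI=1S/CH2O/c1-2/h1H2"):
    path = compound_path(ident)
    # Decoding the last path segment recovers the original identifier
    assert unquote(path[len("/compound/"):]) == ident
    print(path)
```

Even with correct encoding on the client, some servers and proxies reject or rewrite encoded slashes in paths, which is one reason a query parameter may be more robust than {id} for such identifiers.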
> >>
> >> Best regards,
> >> Nina
> >>
> >>
> >>
> >> _______________________________________________
> >> Development mailing list
> >> Development at opentox.org
> >> http://www.opentox.org/mailman/listinfo/development
> >>


-- 
Dipl.-Bioinf. Tobias Girschick

Technische Universität München
Institut für Informatik
Lehrstuhl I12 - Bioinformatik
Boltzmannstr. 3
85748 Garching b. München, Germany

Room: MI 01.09.042
Phone: +49 (89) 289-18002
Email: tobias.girschick at in.tum.de
Web: http://wwwkramer.in.tum.de/girschick



