[OTDev] API extension summary

Mon Jan 18 10:46:03 CET 2010

Hello All,

Some discussion points for today's meeting:

1. Data processing Algorithms.  All algorithms are subclasses of 
http://www.opentox.org/api/1.1#Algorithm

Generic input parameters:
dataset_uri (as with other algorithms)
parameters

a) Data cleanup algorithms, subclass of
http://www.opentox.org/algorithms.owl#DataCleanup
input parameters: generic
output parameters: dataset_uri

b) Feature selection algorithms, subclass of
http://www.opentox.org/algorithmTypes.owl#FeatureSelection
input parameters: generic
output parameters:  feature_uris[]

c) Supervised learning algorithms, subclass of
http://www.opentox.org/algorithmTypes.owl#Supervised
input parameter:   prediction_feature
output parameters:  dataset_uri

d) Descriptor calculation algorithms, subclass of
http://www.opentox.org/algorithms.owl#DescriptorCalculation

input parameters: generic
output parameters:  dataset_uri

http://opentox.org/dev/apis/api-1.1/Algorithm  entry is (partially) updated
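
To make the four cases above easier to compare, here is a minimal sketch that writes them down as a lookup table. The dictionary layout and the helper name expected_outputs are only illustrative and not part of the API. For supervised learning I assume the generic parameters apply in addition to prediction_feature, which the list above does not state explicitly.

```python
# Illustrative summary of the algorithm types listed above.
# The structure and names here are assumptions, not part of the OpenTox API.
GENERIC_INPUTS = ["dataset_uri", "parameters"]

ALGORITHM_TYPES = {
    "DataCleanup": {
        "type_uri": "http://www.opentox.org/algorithms.owl#DataCleanup",
        "inputs": GENERIC_INPUTS,
        "outputs": ["dataset_uri"],
    },
    "FeatureSelection": {
        "type_uri": "http://www.opentox.org/algorithmTypes.owl#FeatureSelection",
        "inputs": GENERIC_INPUTS,
        "outputs": ["feature_uris[]"],
    },
    "Supervised": {
        "type_uri": "http://www.opentox.org/algorithmTypes.owl#Supervised",
        # Assumption: generic inputs plus prediction_feature.
        "inputs": GENERIC_INPUTS + ["prediction_feature"],
        "outputs": ["dataset_uri"],
    },
    "DescriptorCalculation": {
        "type_uri": "http://www.opentox.org/algorithms.owl#DescriptorCalculation",
        "inputs": GENERIC_INPUTS,
        "outputs": ["dataset_uri"],
    },
}


def expected_outputs(type_name):
    """Return the output parameter names for one algorithm type."""
    return ALGORITHM_TYPES[type_name]["outputs"]
```

Only feature selection returns something other than a dataset URI, which a client would need to handle separately.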

2. How to identify features generated by an algorithm with a specific set
of parameters:

According to the current opentox.owl, a Feature can be assigned an
Algorithm, Model or Dataset as its origin (via the property ot:hasSource).
There is no support for Algorithm + Parameters, unless a Model can be
regarded as a specific case of an Algorithm + Parameters instance.

One possible solution could be:
- define a superclass A, which is determined by Algorithm + Parameters
- make Model a subclass of A
- define the range of ot:hasSource as classes A and Dataset
- find a nice name for the superclass A

This will be searchable via ontology service.
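
As a rough illustration of the proposed hierarchy, here is a sketch in Python classes rather than OWL. The name ParameterizedAlgorithm is just a placeholder for the still unnamed superclass A, and the example.org URIs in the usage are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    uri: str


@dataclass
class ParameterizedAlgorithm:
    """Placeholder for superclass A: an Algorithm plus a parameter set."""
    algorithm_uri: str
    parameters: dict = field(default_factory=dict)


@dataclass
class Model(ParameterizedAlgorithm):
    """A Model is treated as a special case of Algorithm + Parameters."""
    uri: str = ""


@dataclass
class Feature:
    uri: str
    has_source: object  # stands in for ot:hasSource

    def __post_init__(self):
        # Only A (including Model) or Dataset is accepted as a source.
        if not isinstance(self.has_source, (ParameterizedAlgorithm, Dataset)):
            raise TypeError("has_source must be ParameterizedAlgorithm or Dataset")


# Usage: a feature produced by an algorithm with specific parameters.
source = ParameterizedAlgorithm("http://example.org/algorithm/pca", {"components": 3})
feature = Feature("http://example.org/feature/1", source)
```

With this shape, two features computed by the same algorithm with different parameters get distinguishable sources.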

Question: Can we use Model directly to denote descriptors, especially
descriptors that require datasets to be calculated?

3. Dataset API
Reminder: the dataset API 1.1 allows specifying feature URI and compound
URI on GET operations:

http://opentox.org/dev/apis/api-1.1/dataset
Query a dataset 	GET 	/dataset/{id} 	*compound_uris[]* and/or *feature_uris[]* to select compounds and features;

This is a very flexible way to get slices of a dataset (features = columns, compounds = rows), or to merge data across different datasets, without the need to download/upload dataset content.
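
A minimal client-side sketch of building such a slice URL, using only the Python standard library. The helper name dataset_slice_url and the myservice URIs are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def dataset_slice_url(dataset_uri, compound_uris=(), feature_uris=()):
    """Append compound_uris[] / feature_uris[] selectors to a dataset URI."""
    params = [("compound_uris[]", u) for u in compound_uris]
    params += [("feature_uris[]", u) for u in feature_uris]
    if not params:
        return dataset_uri
    # urlencode percent-encodes both the brackets and the nested URIs.
    return dataset_uri + "?" + urlencode(params)


# Usage: select two columns (features) of one dataset.
url = dataset_slice_url(
    "http://myservice/dataset/1",
    feature_uris=["http://myservice/feature/2", "http://myservice/feature/3"],
)
selected = parse_qs(urlsplit(url).query)["feature_uris[]"]
```

Every selected URI is repeated, percent-encoded, in the query string, so the URL grows with each compound or feature added.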

However, there have been some concerns regarding the length of the URL. The proposal is to extend the same approach to POST and PUT operations, allowing them to specify datasets via dataset_uri, compound_uris[] and feature_uris[].

Create a new dataset 	POST 	/dataset 	
	Dataset representation in a supported MIME type. MIME type to be specified via *Content-type* header.
	New URI /dataset/{id} or redirect to task URI (for large uploads)
	200,202,400,503

Update a dataset 	PUT 	/dataset/{id} 	
	Data representation in a supported MIME type; entries for existing compound/feature pairs will be overwritten, entries for new compound/feature pairs will be added
	Dataset or task URI
	200,202,400,404,503

*Proposal:*
3.1.  If the MIME type is *application/x-www-form-urlencoded*, allow
dataset_uri, feature_uris[] and compound_uris[] as input parameters for
PUT and POST operations.  This will make it easy to assign a new dataset
id to client-specified subsets of data.  URL length is no longer an
issue, since the parameters are passed in the request body.

example: 
POST /dataset 
dataset_uri=http://myservice/dataset/1 
feature_uris[]=/selectedfeature1
feature_uris[]=/selectedfeature2
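
For illustration, the example above could be assembled on the client side roughly as follows. This is only a sketch: the helper name build_subset_request is made up, the request is built but never sent, and how a service would respond is not covered.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_subset_request(service_url, dataset_uri, feature_uris=(), compound_uris=()):
    """Build (but do not send) a form-encoded POST that defines a dataset subset."""
    fields = [("dataset_uri", dataset_uri)]
    fields += [("feature_uris[]", u) for u in feature_uris]
    fields += [("compound_uris[]", u) for u in compound_uris]
    body = urlencode(fields).encode("ascii")
    return Request(
        service_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Usage: the same parameters as in the example above.
req = build_subset_request(
    "http://myservice/dataset",
    "http://myservice/dataset/1",
    feature_uris=["/selectedfeature1", "/selectedfeature2"],
)
```

The request URL stays /dataset no matter how many URIs are selected; only the body grows.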

3.2.  For file uploads, agree on a fixed name for the file upload
parameter in *application/x-www-form-urlencoded*, e.g. *file_upload*.
When uploading content other than RDF (e.g. MOL, SDF, SMILES), there is
currently no way to assign metadata (even the file name is not
available when POSTing content other than RDF).

4. Query API.  There is currently no agreed API for querying.
There are some custom implementations:

Query for property/identifier  value
http://ambit.uni-plovdiv.bg:8080/ambit2/compound?property=CAS&search=50-00-0
<http://ambit.uni-plovdiv.bg:8080/ambit2/compound?search=55-55-0>
or
/compound?search=phenolphthalein
<http://ambit.uni-plovdiv.bg:8080/ambit2/compound?search=phenolphthalein>

Proposal:  /compound?search=value&sameas=http://url_from_an_ontology , e.g.

/compound?search=50-00-0&sameas=http://www.opentox.org/api/1.1#CASRN
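
One detail to keep in mind with this proposal: the ontology URL contains a '#', which has to be percent-encoded, otherwise everything after it is treated as a URL fragment and never reaches the server. A small sketch of building the query (the helper name compound_search_url and the myservice host are assumptions):

```python
from urllib.parse import urlencode


def compound_search_url(base_url, value, sameas=None):
    """Build /compound?search=...&sameas=... with proper percent-encoding."""
    params = {"search": value}
    if sameas is not None:
        params["sameas"] = sameas
    return base_url + "/compound?" + urlencode(params)


# Usage: look up a compound by CAS registry number.
cas_url = compound_search_url(
    "http://myservice",
    "50-00-0",
    sameas="http://www.opentox.org/api/1.1#CASRN",
)
```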

Substructure
/query/smarts?search=c1ccccc1O&max=100
<http://ambit.uni-plovdiv.bg:8080/ambit2/query/smarts?search=c1ccccc1O&max=100>

Similarity
/query/similarity?search=c1ccccc1&threshold=0.8
<http://ambit.uni-plovdiv.bg:8080/ambit2/query/similarity?search=c1ccccc1&threshold=0.8>

AFAIK, the IST implementation uses the /compound/{id} API, which seems
reasonable for the first two cases, but there might be issues with
embedding symbols that are not URL-safe in {id} (e.g. InChI, SMILES)
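
To illustrate the concern, here is a sketch of what a client would have to do to put an InChI or SMILES string into the {id} path segment. The helper names are made up. Whether a given server accepts encoded slashes (%2F) inside a path segment is a separate question, and some are known to reject them by default.

```python
from urllib.parse import quote, unquote

PREFIX = "/compound/"


def compound_path(identifier):
    """Percent-encode an identifier so it fits into one path segment."""
    # safe="" forces '/' to be encoded too; the default would leave it as is.
    return PREFIX + quote(identifier, safe="")


def identifier_from_path(path):
    """Reverse of compound_path: recover the original identifier."""
    return unquote(path[len(PREFIX):])


# Usage: InChI contains '=' and '/', SMILES can contain '#'.
inchi_path = compound_path("InChI=1S/CH2O/c1-2/h1H2")
smiles_path = compound_path("C#N")
```

Plain aromatic SMILES such as c1ccccc1O pass through unchanged, so the problem only shows up with certain structures.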

Best regards,
Nina