[OTDev] Missing values [was Re: DataSet]

Nina Jeliazkova nina at acad.bg
Mon Oct 5 08:40:08 CEST 2009


Dear Pantelis,

chung wrote:
> Hi Nina,
>
> On Fri, 2009-10-02 at 17:43 +0300, Nina Jeliazkova wrote: 
>   
>> Hi Pantelis,
>>
>> chung wrote:
>>     
>>> Hi Nina,
>>>  Once we define the RESTful operation in the new version of the API, we
>>> will have to start developing. Yet from the API 1.0, models are trained
>>> provided a dataset URI, so we need such a dataset to do some experiments
>>> (build an Instances object, train a model, perform some predictions
>>> using the trained model). Is it possible for you to provide us a dataset
>>> URI? 
>>>       
>> I am not sure what the question is - can you please clarify?
>>     
>
> I mean that we need a dataset for which all RESTful operations specified
> in API 1.0 or API 1.1 are implemented and for every operation a status
> code 200 is normally expected. We need a dataset, say:
>
> http://someserver.com/dataset/123 (i)
>
> such that, for any compound in that, e.g.
>
> http://someserver.com/compound/55 (ii)
>
> and every feature definition in it:
>
> http://someserver.com/feature_definition/10 (iii)
>
> the following URI returns the value of the feature definition (iii) for
> the compound (ii):
>
> http://someserver.com/feature/compound/55/feature_definition/10 
>
> and will not return "NULL" or an error code (e.g. 404). 
> We need that dataset to develop model training web services. The input
> parameters to our services will be the dataset uri and probably a URI
> for the target feature. Will it be possible for you to provide us a
> complete dataset object with all RESTful operations implemented? I mean,
> we don't need a huge one; 20 compounds and some feature definitions will
> be ok, but we need every compound/feature_definition pair to correspond
> to a feature value!  
>   
I understand your reasoning, but please note that in a generic setup some
feature values might be missing, and it is not the dataset provider's job
to fix that.  Handling missing values is usually done by the modeller;
we still need to think about how to cast this process into the REST scheme.

For example, in the Toxcast dataset there are plenty of entries with
missing values; one might address the issue by creating a "derived"
dataset that ignores the entries without values, but one could also
replace missing values with, e.g., averages, or use more complicated
methods.  I am copying this discussion to the development list as well,
because it is a generic question - should the OpenTox framework provide
an API to handle missing values, where is the best place for this
(preprocessing algorithms?), and what API do we need?
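
To make the two simplest options concrete, here is a rough sketch in
Python. It is only an illustration, not a proposed API: the in-memory
dictionary layout, the function names drop_incomplete and impute_mean,
and the toy URIs are all assumptions made for this example, and None
stands in for a missing feature value.

```python
from statistics import mean
from typing import Dict, List, Optional

# Assumed layout (hypothetical): compound URI -> {feature definition URI -> value or None}
Dataset = Dict[str, Dict[str, Optional[float]]]


def drop_incomplete(dataset: Dataset, features: List[str]) -> Dataset:
    """Derived dataset keeping only compounds that have a value for every feature."""
    return {
        compound: dict(values)
        for compound, values in dataset.items()
        if all(values.get(f) is not None for f in features)
    }


def impute_mean(dataset: Dataset, features: List[str]) -> Dataset:
    """Derived dataset where each missing value is replaced by the feature's mean."""
    derived = {compound: dict(values) for compound, values in dataset.items()}
    for f in features:
        known = [v[f] for v in dataset.values() if v.get(f) is not None]
        if not known:
            continue  # no observed value at all: leave the feature missing
        fill = mean(known)
        for values in derived.values():
            if values.get(f) is None:
                values[f] = fill
    return derived


# Toy data; the URIs are placeholders in the style used in this thread.
features = ["/feature_definition/10", "/feature_definition/11"]
toy: Dataset = {
    "/compound/1": {"/feature_definition/10": 1.0, "/feature_definition/11": 4.0},
    "/compound/2": {"/feature_definition/10": None, "/feature_definition/11": 6.0},
    "/compound/3": {"/feature_definition/10": 3.0},  # feature 11 absent entirely
}

complete_only = drop_incomplete(toy, features)
imputed = impute_mean(toy, features)
```

In both cases the original dataset is left untouched and a new "derived"
one is returned, which is the behaviour a preprocessing service would
probably need if it is to publish the result under a new dataset URI.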

Regarding the implementation, I'll try to put a new version online in the
next couple of days, fixing the issues you've reported.

Best regards,
Nina
> Best Regards,
> Pantelis
>
>   
>>> I see you have done much work but the method:
>>>
>>> GET 'Accept:text/uri-list' /dataset/{id}/feature_definition 
>>>
>>> is not implemented yet. 
>>>   
>>>       
>> Right, please submit bug/feature requests to the issue tracker:
>> http://sourceforge.net/tracker/?group_id=191756 
>>     
>>> I also noticed that, for every compound in dataset 8, the
>>> corresponding feature_definitions are different.
>>>   
>>>       
>> Yes, the dataset content at the server is just what was imported from
>> the original SDF (or other) files.  It might well happen that compounds
>> have different properties.  I will be introducing the possibility to
>> create new datasets, with user-selected compounds and feature
>> definitions.  One compound can belong to different datasets, and
>> properties for all datasets are listed.
>>
>> Most probably we'll need something like  
>> /dataset/8/compound/413/feature_definition  in order to retrieve only
>> features from dataset 8.
>>     
>>> Furthermore the request:
>>>
>>> GET 'Accept:text/uri-list' /dataset/{id}/compound
>>>
>>> returns the URIs of all compounds in a dataset. Would it be possible
>>> for you to add a newline "\n" after every compound URI? It would be
>>> very helpful.
>>>   
>>>       
>> Argh, old bug.  Please report it to the issue tracker as above.
>>
>> Best regards,
>> Nina
>>     
>>> Best Regards,
>>> Pantelis
>>>   
>>>       



