[OTDev] Validation: Efficiency

Mon Feb 28 10:44:37 CET 2011

On Sun, Feb 27, 2011 at 11:32 AM, Nina Jeliazkova
<jeliazkova.nina at gmail.com> wrote:
> Christoph, All,
>
>
>> > Mapped to our services, there is a need for a top-level "noun":
>> >
>> > http://host:port/ambit2/{set_id}/{dataset_id}
>> >
>> > http://host:port/ambit2/dataset/{set_id}/{dataset_id}
>>
>> This is what I had in mind. I guess we will need a slight API
>> modification to create dataset sets (e.g. POST
>> http://host:port/ambit2/dataset/set to create a set, which can be the
>> target of a further POST to create a dataset).
>>
>> I am not sure if such a solution fits well into the framework, as the
>> OpenTox way to group datasets would be through ontology entries, but
>> that does not reduce the number of policies. Let's hear Martin's and
>> Andreas' opinions first; maybe someone else has another idea of how to
>> reduce the number of validation policies.
>>
>>
> If the above will change the current pattern of /dataset/id, I am not much
> in favour of it (testing compliance across all partners is very time
> consuming, and at this stage it is better to avoid any such changes). If it
> only adds a new resource without changing the current API, it's fine.

Hi Nina, Christoph, All,

I just had a short discussion with Andreas Maunz, and we both think
that sets are a good solution.
Just a few points:

Backward compatibility should be ensured: the dataset service should
work as it does now.
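One way to keep the existing /dataset/{id} URIs working alongside set-scoped /dataset/{set_id}/{dataset_id} URIs is to tell them apart by the number of path segments. The function below is a hypothetical sketch of such a routing rule, not part of any partner's service.

```python
import re
from typing import Optional, Tuple

# Hypothetical routing rule: one segment after /dataset/ is the existing
# form, two segments address a dataset inside a set.
_DATASET_PATH = re.compile(r"^/dataset/([^/]+)(?:/([^/]+))?/?$")


def resolve(path: str) -> Optional[Tuple[Optional[str], str]]:
    """Return (set_id, resource_id) for a dataset path, or None if no match.

    set_id is None for the existing /dataset/{id} form, so clients that
    never use sets keep seeing exactly the same URIs as before.
    """
    m = _DATASET_PATH.match(path)
    if m is None:
        return None
    first, second = m.group(1), m.group(2)
    if second is None:
        return (None, first)
    return (first, second)
```

The one-segment form can name either an existing dataset or a set, so a real service would still have to look the ID up; that lookup is only unambiguous if sets and datasets share one ID space.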

The set concept would be needed for models too, as the number of
models grows with the number of folds, and so does the number of
policies (so far).
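To make the growth concrete, here is a rough count under an assumed setup: each fold of an N-fold crossvalidation creates a training dataset, a test dataset, a model and a prediction dataset, each with its own policy, whereas the set variant needs one dataset set and one model set. The per-fold resource list is an assumption, not taken from the validation service.

```python
# Assumed resources created per fold; the actual list depends on the
# validation service implementation.
PER_FOLD = ("training dataset", "test dataset", "model", "prediction dataset")


def count_policies(folds: int, use_sets: bool) -> int:
    """Rough number of policies a crossvalidation would register."""
    if use_sets:
        # One policy for the dataset set plus one for the model set,
        # independent of the number of folds.
        return 2
    # One policy per created resource: grows linearly with the fold count.
    return folds * len(PER_FOLD)
```

Under these assumptions, a 10-fold crossvalidation registers 40 policies without sets and 2 with sets.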

This point is more for my own understanding of how the whole thing
would work: a set would only contain resources of the local service, e.g.
<model-service>/set/<set-id> would only contain models from the same
service, with URIs like <model-service>/set/<set-id>/model/<model-id>.
The model service uses the set URI for checking user rights at the
policy service (no wildcards needed at the policy service). When
creating a model (or a dataset), the set is given as the 'destination
location' parameter. Is this how it could work?
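A sketch of the rights check outlined above: for a resource inside a set, the service asks the policy service about the set URI instead of the resource URI, so no wildcard rules are needed. The URI layout follows the `<model-service>/set/<set-id>/model/<model-id>` pattern; the function name and the example host are hypothetical.

```python
import re

# Hypothetical layout: <service>/set/<set-id>/(model|dataset)/<id>
_IN_SET = re.compile(r"^(?P<set>.+/set/[^/]+)/(?:model|dataset)/[^/]+/?$")


def policy_uri(resource_uri: str) -> str:
    """URI to check at the policy service when resource_uri is accessed.

    Resources inside a set are protected by the set's single policy;
    resources outside any set keep their own per-resource policy.
    """
    m = _IN_SET.match(resource_uri)
    return m.group("set") if m else resource_uri
```

Falling back to the resource URI itself keeps existing resources, which are not in any set, checked exactly as before.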

>
> What about the following:
>
> The validation service starts a validation procedure. At this point it
> already knows it should split the dataset into N subsets and that there will
> be N more datasets holding prediction results. It could allocate placeholders
> (empty datasets with known URIs) for all the necessary resources and create
> one policy involving all URIs (as Andreas noted, one policy can cover many
> URIs), then proceed with the calculations.
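A rough sketch of the placeholder step proposed above: for N folds, allocate N empty datasets for the subsets and N for the prediction results, then register a single policy listing all of their URIs. `create_empty_dataset` and the `Policy` record are hypothetical stand-ins; how the empty datasets actually get created is left open here.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Policy:
    """Hypothetical record for one access policy covering several URIs."""
    owner: str
    uris: List[str] = field(default_factory=list)


def allocate_placeholders(
    create_empty_dataset: Callable[[], str], folds: int, owner: str
) -> Policy:
    """Allocate empty datasets for all folds and return one policy for them.

    create_empty_dataset is a hypothetical callable returning the URI of a
    new, empty dataset; the dataset service chooses that URI.
    """
    uris: List[str] = []
    for _ in range(folds):
        uris.append(create_empty_dataset())  # placeholder for the subset
        uris.append(create_empty_dataset())  # placeholder for predictions
    # A single policy for every placeholder, created before any calculation.
    return Policy(owner=owner, uris=uris)
```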
>
> This will require an option to tell the model where to store the results
> (in the empty dataset created as above). Such an option was already
> discussed before in the context of descriptor calculation (to be able to
> POST/PUT results into a given dataset URI; it was added as optional in the
> API). Your implementation will only need to be slightly extended to accept a
> POST (or is PUT better in this case?) to an empty dataset (I assume you
> could easily check whether a dataset is empty). Finally, as there is only one
> policy, the policy deletion issue should be resolved.
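The extension asked for above, accepting results only into an empty dataset, could look roughly like this on the dataset service side. The class, method and exception names are hypothetical; rejecting a write to a non-empty dataset maps naturally to HTTP 409 Conflict.

```python
from typing import List


class ConflictError(Exception):
    """Raised when results would overwrite a non-empty dataset (HTTP 409)."""


class Dataset:
    """Hypothetical in-memory dataset illustrating the write-if-empty rule."""

    def __init__(self, uri: str) -> None:
        self.uri = uri
        self.rows: List[object] = []

    def put_results(self, rows: List[object]) -> None:
        # Only fill a placeholder; a dataset that already holds data
        # is left untouched.
        if self.rows:
            raise ConflictError(f"{self.uri} is not empty")
        self.rows = list(rows)
```

Whether the request arrives as a POST or a PUT does not change this check; PUT is the usual HTTP method for writing to a URI the client already knows, which is the placeholder situation.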
>
> Will this work?

Nice idea. I would still favor the set concept, though, because IMHO this
approach has some drawbacks:

Allocating the empty datasets would require some
create-empty-dataset-without-policy mechanism, because you do not know
the dataset URI beforehand. This mechanism would require either an API
extension, or it would limit the validation service to only working with
'its own' dataset service.

I don't know how this would work for models.

Best regards,
Martin

>
>
> Best regards,
> Nina
>
>> Best regards,
>> Christoph
>>
> _______________________________________________
> Development mailing list
> Development at opentox.org
> http://www.opentox.org/mailman/listinfo/development
>

-- 
Dipl-Inf. Martin Gütlein
Phone:
+49 (0)761 203 8442 (office)
+49 (0)177 623 9499 (mobile)
Email:
guetlein at informatik.uni-freiburg.de