[OTDev] Datatype of features

chung chvng at mail.ntua.gr
Tue Dec 29 15:15:41 CET 2009


Hi Nina, All,
 I tried to use the dataset
http://ambit.uni-plovdiv.bg:8080/ambit2/dataset/6 to build some models.
I managed to build some MLR and SVM models, but I ran into problems
handling the datatypes of features. For example, this dataset includes
the following contradictory entries:


## line: 1406
  </rdf:Description>
  <rdf:Description rdf:nodeID="A400">
    <ot:value
rdf:datatype="http://www.w3.org/2001/XMLSchema#double">6.2426</ot:value>
    <ot:feature
rdf:resource="http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11954"/>
    <rdf:type
rdf:resource="http://www.opentox.org/api/1.1#FeatureValue"/>
  </rdf:Description>

and

## line: 1386
  <rdf:Description rdf:nodeID="A396">
    <ot:value rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
</ot:value>
    <ot:feature
rdf:resource="http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11954"/>
    <rdf:type
rdf:resource="http://www.opentox.org/api/1.1#FeatureValue"/>
  </rdf:Description>

This means that the same feature appears both as a string and as a double
in the same dataset. My understanding is that the second entry is an empty
string, i.e. a missing value, but I think it would be better if missing
values were simply omitted. This would also lead to a smaller RDF
representation. What do you think?
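
A minimal sketch of how a client could detect such contradictions, using
only the Python standard library. It assumes that the ot: prefix is bound
to http://www.opentox.org/api/1.1# (the namespace seen in the rdf:type
entries above) and that the dataset is serialized as flat rdf:Description
nodes like the two quoted here. The function names are made up for
illustration and are not part of any OpenTox service.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OT = "http://www.opentox.org/api/1.1#"  # assumed binding of the ot: prefix

# The two entries quoted above, wrapped in an rdf:RDF root element.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ot="http://www.opentox.org/api/1.1#">
  <rdf:Description rdf:nodeID="A400">
    <ot:value rdf:datatype="http://www.w3.org/2001/XMLSchema#double">6.2426</ot:value>
    <ot:feature rdf:resource="http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11954"/>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#FeatureValue"/>
  </rdf:Description>
  <rdf:Description rdf:nodeID="A396">
    <ot:value rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
</ot:value>
    <ot:feature rdf:resource="http://ambit.uni-plovdiv.bg:8080/ambit2/feature/11954"/>
    <rdf:type rdf:resource="http://www.opentox.org/api/1.1#FeatureValue"/>
  </rdf:Description>
</rdf:RDF>"""


def datatypes_by_feature(rdf_xml, skip_empty=False):
    """Map each feature URI to the set of datatypes used for its values.

    With skip_empty=True, whitespace-only literals are treated as missing
    values and ignored.
    """
    root = ET.fromstring(rdf_xml)
    seen = defaultdict(set)
    for node in root.iter(f"{{{RDF}}}Description"):
        value = node.find(f"{{{OT}}}value")
        feature = node.find(f"{{{OT}}}feature")
        if value is None or feature is None:
            continue  # not a feature value node
        if skip_empty and not (value.text or "").strip():
            continue
        uri = feature.get(f"{{{RDF}}}resource")
        seen[uri].add(value.get(f"{{{RDF}}}datatype"))
    return dict(seen)


def conflicting_features(rdf_xml, skip_empty=False):
    """Return only the features that appear with more than one datatype."""
    found = datatypes_by_feature(rdf_xml, skip_empty)
    return {uri: types for uri, types in found.items() if len(types) > 1}
```

On the two entries above, conflicting_features(SAMPLE) reports feature
11954 with both datatypes, while conflicting_features(SAMPLE,
skip_empty=True) reports nothing, which is consistent with reading the
empty string as a missing value.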

- Is there some query for the dataset with which we could retrieve only
the non-string features?
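
Until such a query exists, a client-side filter is one possible
workaround. The sketch below assumes the feature values have already been
extracted from the RDF as (feature URI, datatype URI, lexical value)
tuples. The set of XSD types counted as numeric and the second feature URI
in the sample are assumptions made for illustration only.

```python
from collections import defaultdict

XSD = "http://www.w3.org/2001/XMLSchema#"

# Assumed set of datatypes to treat as numeric; adjust as needed.
NUMERIC_TYPES = {
    XSD + "double", XSD + "float", XSD + "decimal",
    XSD + "integer", XSD + "int", XSD + "long",
}


def numeric_features(triples):
    """Return {feature_uri: [float, ...]} for purely numeric features.

    Whitespace-only literals are treated as missing values and skipped.
    A feature with any non-empty, non-numeric value is dropped entirely.
    """
    values = defaultdict(list)
    rejected = set()
    for feature, datatype, lexical in triples:
        text = lexical.strip()
        if not text:
            continue  # empty literal: treat as missing
        if datatype in NUMERIC_TYPES:
            values[feature].append(float(text))
        else:
            rejected.add(feature)
    return {f: v for f, v in values.items() if f not in rejected}


BASE = "http://ambit.uni-plovdiv.bg:8080/ambit2/feature/"
SAMPLE_TRIPLES = [
    (BASE + "11954", XSD + "double", "6.2426"),
    (BASE + "11954", XSD + "string", "\n"),      # empty, i.e. missing
    (BASE + "99999", XSD + "string", "active"),  # hypothetical feature
]
```

Here numeric_features(SAMPLE_TRIPLES) keeps feature 11954 with its single
double value and drops the hypothetical string-valued feature.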

I have the feeling we're moving towards some kind of integration and
that's quite encouraging. I deployed the new version today to let you do
some tests. Working components are:

* MLR and SVM model creation, provided that the target attribute is
declared as numeric.
* All GET methods on /model, /model/{id}, /model/{id}/predicted,
/model/{id}/dependent and /model/{id}/independent.
* POST on /model/{id} is supported only for MLR models, and it returns
just an ARFF representation of the predicted data.


I attach some draft documentation reports for the services...


Best regards,
Pantelis
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Algorithm_GET.doc
Type: application/msword
Size: 36352 bytes
Desc: not available
URL: <http://lists.opentox.org/pipermail/development/attachments/20091229/6f934f93/attachment.doc>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MLR_Regression_POST.doc
Type: application/msword
Size: 35840 bytes
Desc: not available
URL: <http://lists.opentox.org/pipermail/development/attachments/20091229/6f934f93/attachment-0001.doc>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SVM_Regression_POST.doc
Type: application/msword
Size: 43520 bytes
Desc: not available
URL: <http://lists.opentox.org/pipermail/development/attachments/20091229/6f934f93/attachment-0002.doc>

