[OTDev] Dataset RDF
Nina Jeliazkova nina at acad.bg
Thu Dec 3 13:49:02 CET 2009
Dear Christoph, All,

Christoph Helma wrote:
> Dear Nina, all,
>
> My main point was not the representation of multiple features/compound
> (my example was too simplified), but how to
>
> - indicate that a collection of triples (i.e. graph) belongs to a certain dataset
>

Add a property (a predicate) relating the dataset and the compound-feature-value triple (or the equivalent of a data entry).

> - represent metadata about a dataset
>

If all metadata can be represented as e.g. DC properties, this is as simple as adding DC.title, DC.creator, etc. properties to the dataset object.

> Maybe some of my confusion arises also from the fact that
>
> - I have to insert triples (and create anonymous nodes) "by hand" with
>   Redland (AFAIK there is no automated mechanism to create more complex
>   statements - but the documentation is very sketchy)
>

The same holds for other languages. Over the last few days I have put some examples at
http://opentox.org/data/documents/development/RDF%20files/JavaOnly/JenaExamples
These should be more or less similar for all languages.

> - I have problems to translate the syntactic sugar of your examples into
>   bare-bones triples
>

Well, this is a good point, I can add examples in N-Triples format. Personally, I switch into "triple" mode in Protege to examine triples.

> I have e.g. a feature generation service, that creates a dataset with
> features and sends it to the dataset service.
> If I understand your
> example and Redland correctly I would have to do the following steps to
> create the proposed structure:
>
> - create an anonymous node for each compound
>   - assert that the compound is a ot:compound
>   - set the identifier URI for the compound
>
> - create an anonymous node for each feature
>   - assert that the feature is a ot:feature
>   - set the identifier URI for the feature
>   - set the title of the feature
>   - set the source of the feature
>
> - create an anonymous node for each feature value
>   - assert that the value is a ot:FeatureValue
>   - define the ot:feature of the value
>   - assert the literal value
>
> - create an anonymous node for each data entry
>   - assert that the data entry is a ot:dataEntry
>   - assert that the compound is a ot:compound
>   - assert for each feature value that it is a ot:values
>
> - create an anonymous node for the dataset
>   - assert that the dataset is a dataset
>   - set the identifier URI for the dataset (this has to be rewritten by
>     the dataset service!)
>   - insert all data entry nodes
>

More or less yes (you might use anonymous nodes or named ones).

> All in all this is quite a lengthy and complicated procedure for a rather
> simple task (I hope I finally have got the idea while writing this down).
>

Well, yes, but the flexibility of triples comes with verbosity.

> I am proposing two things to reduce the complexity:
>
> The most straightforward solution to handle sets of graphs (i.e.
> multiple datasets) is to use named graphs, context, quadruples (you name
> it - the concepts are more or less the same). Most RDF
> libraries/datastores support this, but it is not straightforward to
> express these concepts in RDF/XML. Instead of using a workaround that
> complicates things, I would suggest to let the dataset service handle
>

It is the recommended way to create data models with triples; one could model a lot more complicated things with simple predicate logic...

> datasets (see my previous post).
> The beneficial side effect is, that
> we can simplify the RDF model to a large extent. The first dataset
>

Thus we simplify the syntax, at the expense of losing an essential functionality, which was the original reason to use RDF.

Regarding the quads, IMHO they complicate the setup, because we can't use the most popular serialization formats and not all libraries support contexts. And we have a rather simple data structure (a set with some structured entries within), which needs just one additional predicate to be modeled without involving named graphs.

In fact I've tried a couple of times to simplify the current proposal in Protege, but without success. This is just a non-binary relationship, which can't be modeled with a single predicate. One can try using rdfs:Containers for the dataset, instead of a predicate relating dataset and data entry, but this results in going into the OWL-Full language, where automatic reasoning is much harder than in OWL-DL. Advice from experts is highly appreciated.

> example can be e.g. rewritten without any loss of information as
>
> # multiple features/compound, simple features
> <http://myservice/compound/{id1}> dsstox:MultiCellCall "true"^^xsd:boolean .
> <http://myservice/compound/{id1}> lazar:MultiCellCallPredicted "true"^^xsd:boolean .
>
> (assuming that dsstox:MultiCellCall, lazar:MultiCellCallPredicted
> provides the feature definitions).
>

This is what I have been trying to say for a while: the assumption is wrong. One can't mix predicates and objects. Once you have used dsstox:MultiCellCall in the place of a predicate (property), it can't be considered a resource anymore; you can't have statements such as dsstox:MultiCellCall owl:sameAs something, dsstox:MultiCellCall dc:title "something", or dsstox:MultiCellCall ot:units "something". You can't relate this feature to Models, Validation objects, etc.
If we go in this direction, we simply abandon the power of RDF/OWL (querying, reasoning) for features/datasets and treat it as a pure serialization format, not much different from ARFF or MS Excel. We could have stayed with XML as well and not lost a couple of months educating ourselves. If it is fine for the other partners, OK. Implementation-wise there is no problem for ambit; I am not changing the internal structures anyway, just adding more code to generate different serializations. But we just lose a lot of nice querying options, the ability to link to external ontologies, etc.

> It can be retrieved by asking for GET /dataset/{id}. The corresponding
> meta-information from GET /dataset/{id}/metadata would be
>
> dc:identifier "http://myservice/dataset/{id}"^^xsd:string ;
> dc:title "Multi Cell Call prediction from lazar"^^xsd:string ;
>
>
> The expression of more complex features is also straightforward:
>
> # multiple features/compound, more complex features
> <http://myservice/compound/{id1}>
>   fminer:BBRC [
>     fminer:smarts "NN" ;
>     fminer:p_value "0.97" ;
>     fminer:effect "activating"
>   ];
>   fminer:BBRC [
>     fminer:smarts "CO" ;
>     fminer:p_value "0.95" ;
>     fminer:effect "deactivating"
>   ].
>
> # in explicit notation with anonymous nodes
> <http://myservice/compound/{id1}> fminer:BBRC _:feature1 .
> _:feature1 fminer:smarts "NN" .
> _:feature1 fminer:p_value "0.97" .
> _:feature1 fminer:effect "activating" .
> <http://myservice/compound/{id1}> fminer:BBRC _:feature2 .
>
>

I would prefer if you could define a data model in RDFS or OWL with your proposal, with the ability to link features to other ontologies. This will help us avoid a lot of misunderstanding.

I think it would be best to leave the final decision (at least until the February deadline) to the other partners.
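
To make the two points above concrete (one predicate relating the dataset to its data entries, and features kept as resources that can carry their own statements), here is a minimal sketch in plain Python. It stores triples as tuples rather than using Redland or Jena, and the class and property names (ot:Dataset, ot:DataEntry, ot:FeatureValue, ot:value, ...) as well as the /1 and /2 URIs are assumptions for illustration only, not an agreed vocabulary.

```python
from itertools import count

RDF_TYPE = "rdf:type"
_counter = count(1)


def bnode(kind):
    """Return a fresh anonymous-node label, e.g. '_:entry1'."""
    return f"_:{kind}{next(_counter)}"


def build_dataset(dataset_uri, feature_titles, rows):
    """Build the triples for one dataset.

    feature_titles: {feature_uri: title}
    rows: {compound_uri: {feature_uri: literal_value}}
    All ot:/dc: names below are assumed, for illustration.
    """
    triples = {(dataset_uri, RDF_TYPE, "ot:Dataset")}
    # Features are named resources, so metadata can be attached to them.
    for feature, title in feature_titles.items():
        triples.add((feature, RDF_TYPE, "ot:Feature"))
        triples.add((feature, "dc:title", title))
    for compound, values in rows.items():
        triples.add((compound, RDF_TYPE, "ot:Compound"))
        # One data entry per compound; ot:dataEntry ties it to the dataset.
        entry = bnode("entry")
        triples.add((entry, RDF_TYPE, "ot:DataEntry"))
        triples.add((dataset_uri, "ot:dataEntry", entry))
        triples.add((entry, "ot:compound", compound))
        for feature, literal in values.items():
            value = bnode("value")
            triples.add((value, RDF_TYPE, "ot:FeatureValue"))
            triples.add((entry, "ot:values", value))
            triples.add((value, "ot:feature", feature))
            triples.add((value, "ot:value", literal))
    return triples


def match(triples, s=None, p=None, o=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Illustrative data only; the URIs are placeholders.
DATASET = "http://myservice/dataset/1"
FEATURE = "http://myservice/feature/MultiCellCall"
graph = build_dataset(
    DATASET,
    {FEATURE: "Multi Cell Call"},
    {"http://myservice/compound/1": {FEATURE: True},
     "http://myservice/compound/2": {FEATURE: False}},
)

# Because the feature is a resource, its metadata is one pattern away.
titles = [o for _, _, o in match(graph, s=FEATURE, p="dc:title")]
# Data entries of the dataset, found through the relating predicate.
entries = [o for _, _, o in match(graph, s=DATASET, p="ot:dataEntry")]
```

The same patterns would be SPARQL queries in a real triple store; the sketch is only meant to show that no named graphs are needed to find the entries of a dataset, and that a feature used as an object can still be described further.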

Best regards,
Nina

> Best regards,
> Christoph
> _______________________________________________
> Development mailing list
> Development at opentox.org
> http://www.opentox.org/mailman/listinfo/development
>