[OTDev] Experiments with RDF
Nina Jeliazkova jeliazkova.nina at gmail.com
Fri Oct 1 12:40:27 CEST 2010
Hi Pantelis,

On Fri, Oct 1, 2010 at 1:20 PM, chung <chvng at mail.ntua.gr> wrote:

> Hi Nina,
> Thanks for your input!
>
> On Fri, 2010-10-01 at 08:32 +0300, Nina Jeliazkova wrote:
> > Hi Pantelis, All,
> >
> > On Thu, Sep 30, 2010 at 9:33 PM, chung <chvng at mail.ntua.gr> wrote:
> > > Hi all,
> > > During a round (rectangle in fact) table discussion in Rhodes we
> > > questioned the efficiency of web services based on RDF and in
> > > particular its OWL-DL variant. I gathered some statistics using
> > > ToxOtis while experimenting with downloading and parsing datasets.
> > > Also we've tested the performance of ToxOtis in converting dataset
> > > objects into weka objects (weka.core.Instances); the latter are
> > > useful to users of Weka. These are preliminary results and we must
> > > not jump to conclusions but we can start a discussion around some
> > > performance issues. Java developers may use ToxOtis as a kind of
> > > client-profiler for their services. Find attached a draft report
> > > that attempts to correlate the size of a dataset with the
> > > computational effort needed for its parsing.
> >
> > Would it be possible to run further experiments - in particular:
> >
> > - Split the reported time into the time necessary to download the
> >   RDF representation from the server and the time necessary to parse
> >   and load the RDF as a Jena object. The reason for asking is that
> >   these two parts can be optimized by different approaches
> >   (minimizing file size by prefixing or compression for the former
> >   and exploiting different Jena storage models for the latter).
>
> Yes, that was in my plans.

OK, will be looking forward to the results.
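One way the download/parse split discussed above could be measured is sketched here. This is only an illustration and not ToxOtis code: the class `PhaseTimer`, its `Result` type and the `measure` method are hypothetical names, and the parser is passed in as a plain function so that any RDF library (Jena or another) could be plugged in. The sketch buffers the whole response in memory before parsing, which briefly costs extra memory but keeps the two timings cleanly separated.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical helper: times "download" and "parse" as two separate phases.
final class PhaseTimer {

    /** Timings (nanoseconds) and payload size for one measured run. */
    static final class Result<T> {
        final long downloadNanos;
        final long parseNanos;
        final int bytes;
        final T parsed;

        Result(long downloadNanos, long parseNanos, int bytes, T parsed) {
            this.downloadNanos = downloadNanos;
            this.parseNanos = parseNanos;
            this.bytes = bytes;
            this.parsed = parsed;
        }
    }

    /** Drains the stream into memory so that transfer time ends before parsing starts. */
    static byte[] readFully(InputStream in) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Phase 1: read all bytes from the source. Phase 2: hand them to the parser. */
    static <T> Result<T> measure(InputStream source, Function<byte[], T> parser) {
        long t0 = System.nanoTime();
        byte[] body = readFully(source);
        long t1 = System.nanoTime();
        T parsed = parser.apply(body);
        long t2 = System.nanoTime();
        return new Result<>(t1 - t0, t2 - t1, body.length, parsed);
    }

    public static void main(String[] args) {
        // Stand-in payload and parser; a real test would read from the
        // dataset service and build an RDF model from the bytes instead.
        byte[] fake = "<rdf:RDF/>".getBytes(StandardCharsets.UTF_8);
        Result<Integer> r = measure(new ByteArrayInputStream(fake), body -> body.length);
        System.out.printf("bytes=%d download=%dns parse=%dns%n",
                r.bytes, r.downloadNanos, r.parseNanos);
    }
}
```

Reporting the payload size next to the two timings makes it easier to see whether compression or prefixing (smaller download) or a different storage model (faster parse) is the more promising optimization for a given dataset.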
> > - Report the time to parse RDF into different in-memory Jena models
> >   (ones from
> >   http://jena.sourceforge.net/javadoc/com/hp/hpl/jena/ontology/OntModelSpec.html;
> >   not sure which is being used in the tests now).
>
> That would also be an interesting experiment.
>
> > - Report timings using a slightly different approach to convert to
> >   weka instances, namely, retrieve URIs of compounds first and then
> >   retrieve features for each compound in subsequent calls.
>
> Well, the time needed to convert the dataset object into Instances is
> relatively small. Do you think this needs to be optimized further? We
> can do the experiment however.

In my experience, Weka instances work fine up to a certain limit
(depending on the memory available to the JVM), but it is not possible
to work with Weka for large datasets. Moreover, I assume that in the
current test setup the data is at least duplicated (once in the RDF
model and once as Weka instances), which makes the memory consumption
worse. Thus the suggestion to load RDF in small chunks (e.g. per
compound), create weka instances and immediately discard the RDF
instances.

> > - Report timings when using Jena persistent storages instead of the
> >   in-memory one (http://openjena.org/TDB/, http://openjena.org/SDB/)
>
> I don't think that persistence will outperform the memory storage in
> terms of computational time but will probably allocate less memory.

As memory consumption is the bottleneck, it surely will; persistent
storages are usually optimized for this. There are existing benchmarks
showing in-memory Jena performs worst.

> Apart from that, I don't think that such persistence is needed on the
> client side since the data are persistent on the server.

IMHO in-memory Jena models will simply not work for datasets larger than
a few thousand entries, especially if the code runs as a server
application (e.g. ToxPredict) and should support multiple simultaneous
users. Besides time, could you also record memory-related stats?

> > If we find an optimal setting after these experiments, the next step
> > would be trying to work with datasets comparable in size to the raw
> > malaria data. Ideally, it would be nice to compare with RDF
> > libraries other than Jena, but this may require more effort.
>
> We might find such a comparison online, otherwise we could run some
> tests!

Yes indeed.

Nina

> > Best regards,
> > Nina
> >
> > > Best regards,
> > > Pantelis S.
> > >
> > > _______________________________________________
> > > Development mailing list
> > > Development at opentox.org
> > > http://www.opentox.org/mailman/listinfo/development
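For the memory-related stats requested above, a rough client-side probe could look like the sketch below. It is an assumption-laden illustration, not part of ToxOtis: `HeapProbe`, `Sample` and `probe` are hypothetical names, and the numbers it yields are approximate because `System.gc()` is only a hint to the JVM and other threads may allocate concurrently.

```java
import java.util.function.Supplier;

// Hypothetical helper: estimates how much heap a loaded object keeps alive.
final class HeapProbe {

    /** Heap currently in use by this JVM, in bytes (approximate). */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** The loaded object and the observed change in used heap. */
    static final class Sample<T> {
        final T value;
        final long deltaBytes;

        Sample(T value, long deltaBytes) {
            this.value = value;
            this.deltaBytes = deltaBytes;
        }
    }

    /** Runs the loader and reports used heap after minus used heap before. */
    static <T> Sample<T> probe(Supplier<T> loader) {
        System.gc(); // a hint only; the JVM is free to ignore it
        long before = usedHeapBytes();
        T value = loader.get();
        System.gc(); // try to drop temporaries so mostly retained data remains
        long after = usedHeapBytes();
        return new Sample<>(value, after - before);
    }

    public static void main(String[] args) {
        // Stand-in for "download and parse a dataset": allocate a large array.
        Sample<double[]> s = probe(() -> new double[1_000_000]);
        System.out.printf("loaded %d values, heap changed by about %d KiB%n",
                s.value.length, s.deltaBytes / 1024);
    }
}
```

Running the same probe once around building the RDF model and once around the conversion to Weka instances would give a first indication of how much the duplicated representation costs. A profiler or JMX memory beans would give more reliable figures than this sketch.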