[OTDev] R Web Interfaces

Mon Jul 5 17:17:47 CEST 2010

Excerpts from Vedrin Jeliazkov's message of Mon Jul 05 16:17:49 +0200 2010:
> Hi All,
> 
> Today's SWDT conf call prompted me to have a look at the various
> options for making R functionalities available via a web interface. I
> assume you're already familiar with the corresponding FAQ entry:
> 
> http://cran.r-project.org/doc/FAQ/R-FAQ.html#R-Web-Interfaces
> 
> Among the plethora of options, my attention was attracted by the
> solution that has been developed at Uni-Augsburg's Dept. of Computer
> Oriented Statistics and Data Analysis:
> 
> http://stats.math.uni-augsburg.de/Rserve/
> 
> More up-to-date information for Rserve is available at:
> 
> http://www.rforge.net/Rserve/
> 
> In particular, the following Rserve configuration options should
> address (at least to some extent) Surajit's worries on memory
> allocation:
> 
> - - - - - - - - 8< - - - - - - - -
> maxinbuf and maxsendbuf are rather special. Previous versions of
> Rserve had fixed buffer sizes. Since 0.1-9 internal buffers change
> per-connection automatically. The maxinbuf specifies (in kilobytes)
> the maximal allowable size of the input buffer, that is the maximal
> size of data transported from the client to the server. Analogously
> maxsendbuf sets the maximum size of the send buffer, that is the size
> of data sent from Rserve to the client. If your server is likely to
> process very many parallel connections you may want to lower this
> setting for security reasons. On the other hand if the server will
> process only few connections in parallel and you expect very large
> data, raise the value according to your computer's memory. Basically
> the settings are present to prevent malicious users from crashing your
> server by supplying too large data. 0 has a special meaning telling
> Rserve to allow unlimited use.
> - - - - - - - - 8< - - - - - - - -
> 
> This doesn't seem to do exactly what Surajit was asking for (an upper
> limit for R's memory pool), however it does limit R's input and output
> size, thus providing a mechanism for a fair resource distribution
> among a larger number of requests being processed in parallel. It
> might be inconvenient that only Java, C, C++ and R clients are
> available currently for Rserve, which uses its own binary protocol for
> data transfer and needs a client implementing this protocol. In
> particular, a native Ruby client doesn't seem to be available,

Yes, that's the reason why I did not consider it for my purposes.

> however
> I guess it should be possible to invoke one of the C/C++/R clients
> from Ruby.

Possible, yes, but going Ruby -> Java/C/C++ -> R -> Java/C/C++ -> Ruby is
far from straightforward. That's why I prefer RinRuby (RSRuby had
threading problems).

> I find particularly attractive and well suited to our needs
> the following Rserve features, which enhance R with server-friendly
> capabilities:
> 
> -- fast - no initialization of R is necessary;
> -- binary transport - the transport protocol sends R objects as binary
> data, not just R text output;
> -- automatic type conversion - most R data types are converted into
> native data types, e.g. the result of rnorm(10) will be double[10] in
> C/Java. Java client also provides classes for new R types such as
> RBool, RList etc;
> -- persistent - each connection has its own namespace and working
> directory. Every object you create is persistent until the connection
> is closed. The client doesn't have to fetch or store intermediate
> results;
> -- client independence - since the client is not linked to R there are
> no threading issues like in RSJava etc;
> -- security - Rserve provides some basic security by supporting
> encrypted user/password authentication with server challenge. Rserve
> can be also configured to accept local connections only;
> -- file transfer - the Rserve protocol allows files to be transferred
> between the client and the server. This way Rserve can be used as a
> remote server even for tasks such as generating plot images etc;
> 
> It might be worth considering a REST webservice, acting as a wrapper
> of Rserve client functionality and relying on an instance of R
> running on dedicated hardware with sufficient CPU and RAM resources.
> One could even imagine a cluster of R servers, called in (e.g.)
> round-robin sequence by the wrapping webservice, if/when the typical
> load is greater than any single server can process in a reasonable
> time. Some of these R servers could be configured to accept only
> small input/output, while others might be dedicated to processing
> large data sets. The latter might require specific user
> authorization, etc...
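The routing idea in the quoted paragraph could look roughly like the Python sketch below. It only illustrates the dispatch logic under stated assumptions: the host names, the 1 MB cut-off and the RoundRobinRouter class are all hypothetical, and no actual Rserve connection is opened (6311 is Rserve's default port).

```python
import threading
from itertools import cycle

# Hypothetical backend addresses; 6311 is Rserve's default port.
SMALL_POOL = ["r-small-1:6311", "r-small-2:6311"]
LARGE_POOL = ["r-large-1:6311"]

# Assumed cut-off between "small" and "large" payloads, in bytes.
SMALL_LIMIT_BYTES = 1024 * 1024


class RoundRobinRouter:
    """Pick an R backend per request, rotating within each pool."""

    def __init__(self, small, large, small_limit):
        self._small = cycle(small)
        self._large = cycle(large)
        self._small_limit = small_limit
        # Guard the iterators so a threaded web service can share the router.
        self._lock = threading.Lock()

    def pick(self, payload_size, authorized_for_large=False):
        with self._lock:
            # Small requests rotate over the small-input servers.
            if payload_size <= self._small_limit:
                return next(self._small)
            # Large requests need explicit authorization.
            if not authorized_for_large:
                raise PermissionError("large payloads require authorization")
            return next(self._large)


router = RoundRobinRouter(SMALL_POOL, LARGE_POOL, SMALL_LIMIT_BYTES)
```

A REST handler would call router.pick(len(body)) and forward the request to the returned address. How the size limits are enforced on the R side (e.g. via maxinbuf) is a separate matter.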

That would be my preferred solution! I am, however, not sure how type
conversion could/should work with a REST interface (especially for
complex datatypes like dataframes). I am pretty sure that OWL-DL is too
slow for our purposes (tons of statistical significance calculations for
supervised graph mining, tons of local SVM models for regression
predictions/validations). File transfer is a must for creating
reports.
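One possible answer to the type conversion question, as a sketch only: send a dataframe as column-oriented JSON. JSON is an assumption here and not something the thread settles on. The function names are hypothetical, and mapping JSON null to R's NA is a convention the R side would have to implement.

```python
import json


def encode_frame(columns):
    """Serialize a column-oriented table (dict of name -> list) to JSON."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of rows")
    return json.dumps({"columns": columns})


def decode_frame(text):
    """Inverse of encode_frame: return the dict of columns."""
    return json.loads(text)["columns"]


# Example payload; None is written as JSON null (assumed to mean NA in R).
frame = {"compound": ["c1", "c2"], "activity": [0.5, None]}
payload = encode_frame(frame)
```

This covers plain numeric and string columns. Factors, dates and nested lists would need an agreed encoding on both ends.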

> Does anyone have some experience with Rserve or some of the other
> alternatives? Any feedback/comments/suggestions would be most welcome!

I am working with RinRuby, which communicates with an R server through a
TCP/IP socket. I am, however, not sure whether this implementation is
100% thread-safe.
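While thread safety is unclear, one conservative workaround is to serialize all access to the shared R connection with a lock. The Python sketch below shows the pattern only: SerializedSession and FakeSession are hypothetical names, and FakeSession merely stands in for a real R connection.

```python
import threading


class SerializedSession:
    """Wrap a session so that only one thread talks to it at a time."""

    def __init__(self, session):
        self._session = session
        self._lock = threading.Lock()

    def eval(self, expression):
        # Hold the lock for the whole request/response exchange.
        with self._lock:
            return self._session.eval(expression)


class FakeSession:
    """Hypothetical stand-in for an R connection; it only records calls."""

    def __init__(self):
        self.calls = []

    def eval(self, expression):
        self.calls.append(expression)
        return len(self.calls)


shared = SerializedSession(FakeSession())
```

The price is that requests are processed one at a time per connection, so throughput would have to come from several R processes rather than several threads.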

Best regards,
Christoph