[OTDev] R Web Interfaces

Vedrin Jeliazkov vedrin.jeliazkov at gmail.com
Mon Jul 5 16:17:49 CEST 2010


Hi All,

Today's SWDT conf call prompted me to have a look at the various
options for making R functionalities available via a web interface. I
assume you're already familiar with the corresponding FAQ entry:

http://cran.r-project.org/doc/FAQ/R-FAQ.html#R-Web-Interfaces

Among the plethora of options, my attention was attracted by the
solution that has been developed at Uni-Augsburg's Dept. of Computer
Oriented Statistics and Data Analysis:

http://stats.math.uni-augsburg.de/Rserve/

More up-to-date information for Rserve is available at:

http://www.rforge.net/Rserve/

In particular, the following Rserve configuration options should
address (at least to some extent) Surajit's worries on memory
allocation:

- - - - - - - - 8< - - - - - - - -
maxinbuf and maxsendbuf are rather special. Previous versions of
Rserve had fixed buffer sizes. Since 0.1-9 internal buffers change
per-connection automatically. The maxinbuf specifies (in kilobytes)
the maximal allowable size of the input buffer, that is the maximal
size of data transported from the client to the server. Analogously
maxsendbuf sets the maximum size of the send buffer, that is the size
of data sent from Rserve to the client. If your server is likely to
process very many parallel connections you may want to lower this
setting for security reasons. On the other hand if the server will
process only few connections in parallel and you expect very large
data, raise the value according to your computer's memory. Basically
the settings are present to prevent malicious users from crashing your
server by supplying too large data. 0 has a special meaning telling
Rserve to allow unlimited use.
- - - - - - - - 8< - - - - - - - -
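
To make the two options quoted above a bit more concrete, here is a
minimal sketch in Python that writes them out in the "option value"
per-line form used by Rserve's configuration file. The helper name
render_rserve_conf and the buffer sizes are my own assumptions, chosen
only for illustration; the option names maxinbuf, maxsendbuf and remote
are the ones documented for Rserve.

```python
# Hypothetical helper (not part of Rserve): renders configuration text
# with one "option value" pair per line.
# The sizes used below are illustrative assumptions, not recommendations.

def render_rserve_conf(maxinbuf_kb, maxsendbuf_kb, local_only=True):
    """Return config text; limits are in kilobytes, 0 means unlimited."""
    if maxinbuf_kb < 0 or maxsendbuf_kb < 0:
        raise ValueError("buffer limits must be >= 0 (0 = unlimited)")
    lines = [
        # Upper bound on data sent from the client to the server.
        f"maxinbuf {maxinbuf_kb}",
        # Upper bound on data sent from the server to the client.
        f"maxsendbuf {maxsendbuf_kb}",
        # "remote disable" restricts Rserve to local connections.
        f"remote {'disable' if local_only else 'enable'}",
    ]
    return "\n".join(lines) + "\n"


# A tightly limited profile and a generous one (assumed sizes).
small = render_rserve_conf(1024, 1024)
large = render_rserve_conf(262144, 262144)
print(small)
```

The resulting text would be saved as the server's configuration file;
a server meant for many parallel connections would get the smaller
limits.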

This doesn't seem to do exactly what Surajit was asking for (an upper
limit for R's memory pool). However, it does limit R's input and output
size, thus providing a mechanism for fair resource distribution among
a larger number of requests being processed in parallel. It might be
inconvenient that only Java, C, C++ and R clients are currently
available for Rserve, which uses its own binary protocol for data
transfer and needs a client implementing this protocol. In particular,
a native Ruby client doesn't seem to be available, but I guess it
should be possible to invoke one of the C/C++/R clients from Ruby. I
find the following Rserve features, which enhance R with
server-friendly capabilities, particularly attractive and well suited
to our needs:

-- fast - no initialization of R is necessary;
-- binary transport - the transport protocol sends R objects as binary
data, not just R text output;
-- automatic type conversion - most R data types are converted into
native data types, e.g. the result of rnorm(10) will be double[10] in
C/Java. Java client also provides classes for new R types such as
RBool, RList etc;
-- persistent - each connection has its own namespace and working
directory. Every object you create is persistent until the connection
is closed. The client doesn't have to fetch or store intermediate
results;
-- client independence - since the client is not linked to R there are
no threading issues like in RSJava etc;
-- security - Rserve provides some basic security by supporting
encrypted user/password authentication with server challenge. Rserve
can be also configured to accept local connections only;
-- file transfer - the Rserve protocol allows files to be transferred
between the client and the server. This way Rserve can be used as a
remote server even for tasks such as generating plot images etc.

It might be worth considering a REST webservice, acting as a wrapper
of Rserve client functionalities and relying on an instance of R
running on dedicated hardware with sufficient CPU and RAM resources.
One could even imagine a cluster of R servers, called in (e.g.)
round-robin sequence by the wrapping webservice, if/when the typical
load is greater than any single server can manage to process in a
reasonable time. Some of these R servers could be configured to accept
only small input/output, while others might be dedicated to
processing large data sets. The latter might require specific user
authorization, etc.
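
The dispatching idea above can be sketched roughly as follows, in
Python. Everything here is hypothetical: the class name RservePool,
the host names and the 1024 KB threshold are assumptions made for
illustration, and the sketch only decides which server to call; it
does not talk to Rserve itself. Port 6311 is Rserve's default.

```python
from itertools import cycle


class RservePool:
    """Pick a backend in round-robin order, split by payload size."""

    def __init__(self, small_hosts, large_hosts, small_limit_kb):
        if not small_hosts or not large_hosts:
            raise ValueError("each pool needs at least one host")
        # cycle() yields the hosts in order, forever.
        self._small = cycle(small_hosts)
        self._large = cycle(large_hosts)
        self._limit = small_limit_kb

    def pick(self, payload_kb, authorized_for_large=False):
        """Return the host that should process a payload of this size."""
        if payload_kb <= self._limit:
            return next(self._small)
        # Large data sets go to dedicated servers and need authorization.
        if not authorized_for_large:
            raise PermissionError("large jobs need specific authorization")
        return next(self._large)


# Hypothetical hosts: two small-job servers, one large-job server.
pool = RservePool(
    ["r-small-1:6311", "r-small-2:6311"],
    ["r-large-1:6311"],
    small_limit_kb=1024,
)
picks = [pool.pick(10) for _ in range(3)]
print(picks)
```

A wrapping webservice would call pick() once per request and then
hand the job to an Rserve client connected to the returned host.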

Does anyone have some experience with Rserve or some of the other
alternatives? Any feedback/comments/suggestions would be most welcome!

Kind regards,
Vedrin


