[OTDev] OpenAM performance

Vedrin Jeliazkov vedrin.jeliazkov at gmail.com
Thu Jul 7 03:02:59 CEST 2011


Hi Folks,

We decided to change our test setup by moving the config store to
OpenAM's built-in OpenDS instance, while keeping the user store in a
separate OpenDJ instance. This setup follows the OT production AA
instance configuration more closely. The preliminary test results are
quite encouraging -- you can see some graphs here:

http://ambit.uni-plovdiv.bg/cgi-bin/smokeping.cgi?target=IDEA-DEV.AA

At the moment we already have 300K policies registered, and the
observed latencies under heavy stress (up to ~50 requests per second)
are quite acceptable for a production service. In particular, the
workflow that would be executed most frequently (submit user/pass, get
a token, and test authorisation for a policy-protected resource) has an
average latency of 18.3 milliseconds right now, which is perfect. Here
is a link to the corresponding graph, which also confirms that this
latency doesn't increase as the number of defined policies grows
from 0 up to 300,000:

http://ambit.uni-plovdiv.bg/cgi-bin/smokeping.cgi?target=IDEA-DEV.AA.TestWorkflow-3a
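
For reference, the sketch below shows roughly what this measured
workflow looks like when driven through OpenAM's legacy OpenSSO-style
identity REST services; the base URL, credentials and protected
resource are placeholders, and this is an illustration rather than the
actual test driver we use:

# Minimal sketch of TestWorkflow-3a: submit user/pass, get an SSO token,
# then test authorisation for a policy-protected resource.
# Assumes the legacy /identity/authenticate and /identity/authorize
# endpoints; URL, credentials and resource below are placeholders.
import requests

OPENAM = "http://aa.example.org:8080/opensso"   # hypothetical deployment URI

def run_workflow(username, password, resource):
    # 1. Submit user/pass and obtain an SSO token ("token.id=AQIC...")
    r = requests.post(OPENAM + "/identity/authenticate",
                      data={"username": username, "password": password})
    r.raise_for_status()
    token = r.text.strip().split("token.id=", 1)[1]

    # 2. Ask the policy engine whether the token's owner may GET the resource
    r = requests.get(OPENAM + "/identity/authorize",
                     params={"uri": resource, "action": "GET",
                             "subjectid": token})
    r.raise_for_status()
    return r.text.strip() == "boolean=true"

if __name__ == "__main__":
    ok = run_workflow("guest", "guest",
                      "http://services.example.org/dataset/1")
    print("authorised" if ok else "denied")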

The workflow that also includes policy creation has a somewhat
higher, but still acceptable, latency:

http://ambit.uni-plovdiv.bg/cgi-bin/smokeping.cgi?target=IDEA-DEV.AA.TestWorkflow-2
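
(For completeness, a rough sketch of the policy-creation step is shown
below, assuming the Pol service accepts a policy XML document via HTTP
POST with the SSO token passed in a "subjectid" header; the endpoint
and the simplified policy XML are placeholders and may differ from the
actual Pol service interface.)

# Rough sketch of the policy-creation step measured in TestWorkflow-2.
# The Pol service URL, the "subjectid" header and the policy XML below
# are assumptions for illustration only.
import requests

POL_SERVICE = "http://aa.example.org:8080/pol"   # hypothetical Pol service URL

POLICY_XML = """<?xml version="1.0" encoding="UTF-8"?>
<!-- simplified placeholder, not a complete policy definition -->
<Policies>
  <Policy name="example-policy" active="true">
    ...
  </Policy>
</Policies>
"""

def create_policy(token, policy_xml=POLICY_XML):
    # POST the policy document on behalf of the authenticated user
    r = requests.post(POL_SERVICE,
                      headers={"subjectid": token,
                               "Content-Type": "application/xml"},
                      data=policy_xml)
    r.raise_for_status()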

The statistics we've gathered so far confirm that this latency
increases as the number of defined policies grows; however, given that
we're already in the range of several hundred thousand policies, this
looks more or less acceptable, at least for the time being.

Unsurprisingly, the highest latency is observed for the workflow that
also includes policy deletion:

http://ambit.uni-plovdiv.bg/cgi-bin/smokeping.cgi?target=IDEA-DEV.AA.TestWorkflow-1a

The graph also suggests that this latency grows faster with the number
of registered policies than the policy-creation latency does (see
above). This is less encouraging; however, a possible workaround would
be to schedule bulk policy deletions for periods of low service
utilisation.
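
A minimal sketch of that workaround might look like the following,
assuming a quiet period between 02:00 and 05:00 and a hypothetical
deletion endpoint on the Pol service (neither of which is taken from
the current code):

# Sketch of the proposed workaround: queue policy deletions instead of
# performing them immediately, and flush the queue in bulk during a
# low-utilisation window. Quiet hours and the deletion endpoint are
# assumptions for illustration only.
import datetime
import queue

import requests

POL_SERVICE = "http://aa.example.org:8080/pol"   # hypothetical Pol service URL
pending_deletions = queue.Queue()

def schedule_deletion(policy_id):
    # Called by the application instead of deleting the policy right away.
    pending_deletions.put(policy_id)

def in_quiet_window(now=None):
    # Assume 02:00-05:00 local time is a low-utilisation period.
    now = now or datetime.datetime.now()
    return 2 <= now.hour < 5

def flush_deletions(token):
    # Run periodically (e.g. from a scheduler); deletes only in the window.
    while in_quiet_window() and not pending_deletions.empty():
        policy_id = pending_deletions.get()
        r = requests.delete(POL_SERVICE + "/" + str(policy_id),
                            headers={"subjectid": token})
        r.raise_for_status()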

I had a look at the built-in (and stripped-down) version of OpenDS --
it dates from 15 Sep 2010, which happens to be before OpenDJ's first
version was released. However, it does not correspond to any of the
official releases of OpenDS that are available here:

http://www.opends.org/

It looks like it was specially tweaked for OpenSSO (and, more
recently, OpenAM). In addition to the differences mentioned so far,
other important changes in our setup include:

-- changed the MySQL table definition as follows (the 767-character
key prefixes match InnoDB's 767-byte index prefix limit with the
single-byte ascii charset; a sketch of the kind of lookup these
indexes serve appears after this list):

CREATE TABLE `pol` (
  `pol` varchar(4096) default NULL,
  `user` varchar(255) default NULL,
  `res` varchar(4096) default NULL,
  KEY `index_pol` USING BTREE (`pol`(767)),
  KEY `index_user` USING BTREE (`user`),
  KEY `index_res` USING BTREE (`res`(767))
) ENGINE=InnoDB DEFAULT CHARSET=ascii;

-- changed JVM settings as follows:

-Dorg.opends.server.LockManagerConcurrencyLevel=24
-XX:MaxPermSize=128m
-XX:+UseCompressedOops
-XX:+UnlockExperimentalVMOptions
-XX:+UseG1GC
-XX:MaxGCPauseMillis=50
-XX:GCPauseIntervalMillis=1000
-XX:MaxTenuringThreshold=1
-XX:+AggressiveOpts
-XX:+UseBiasedLocking
-XX:+UseFastAccessorMethods
-Xverify:none
-Xmn1024m
-Xms6122m
-Xmx6122m
-Xss128k

(these settings are optimised for a server with 8 CPU cores and 8 GB of RAM)

-- improved several critical sections of the Pol service source code
(Nina could provide further details on this -- there are a few more
optimisations that still have to be implemented);
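
As mentioned above, here is a purely hypothetical illustration of the
kind of lookup the indexes on the pol table are meant to serve; the
connection details and query shape are assumptions for illustration
and are not taken from the Pol service code:

# Hypothetical example of a lookup using the indexes defined above:
# find the policies that mention a given user and resource.
# Connection details and query shape are illustrative assumptions only.
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="pol",
                               password="secret", database="poldb")
cur = conn.cursor()
cur.execute("SELECT `pol` FROM `pol` WHERE `user` = %s AND `res` = %s",
            ("guest", "http://services.example.org/dataset/1"))
for (policy_name,) in cur:
    print(policy_name)
cur.close()
conn.close()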

We still have to check how stable this setup would be over a longer
period of time and under a more varied workload; however, the results
obtained so far are quite encouraging. It might be time to start
considering an upgrade of the server that will host the next instance
of the OpenTox AA. Based on my findings so far, I would recommend at
least 8 GB of RAM (16 GB would be much better) and at least 8 CPU
cores (again, more would be better) for this server. Such a setup
would provide enough room for the volume of resources we're currently
dealing with, while also allowing more time to investigate alternative
AAA solutions that could satisfy our needs better in the future.

We'll be running more tests in the coming days to see what the limits
are for the number of registered policies and the associated query
latencies on a machine with 8 GB of RAM and 8 CPU cores.

Kind regards,
Vedrin

PS: It would be great if you could also check the results of running
your code against this test AA instance and report any issues you
encounter while doing so. We'll keep this service online for at least
a few days, but not indefinitely, so please don't rely on it in the
longer term.
