Additional testing of Mercury with 2GB and 512MB RAM


My name is Greg Coit, sysadmin for Chapter 3, and I've been helping with Mercury development and testing.

We wanted to get a quick idea of how hard we could push Mercury under more "real world" circumstances, so I combined siege and ab to generate a broad spectrum of hits. ab (short for Apache Benchmark, part of the apache2-utils package) lets you generate a very large number of hits on a single URL, while siege (which ships in a self-titled Debian/Ubuntu package) lets you spread the hits across many URLs, most of which won't be cached. This mixed load is a much more nuanced and accurate way of looking at performance than peak throughput on a single URL.
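Both tools are available from the standard repositories, so setting up a Debian/Ubuntu source server is a one-liner:

#!/bin/bash
# install both load generators on the (Debian/Ubuntu) source server
sudo apt-get install apache2-utils siege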

For this test we used two types of target servers:

1) An Amazon Web Services (AWS) small 32-bit instance running Mercury Alpha 0.6. These come with roughly 2GB of RAM.

2) To test a tighter resource environment, we also used a Slicehost.com VPS with 512MB of RAM. This Ubuntu 9.04 32-bit server was set up exactly like Mercury Alpha 0.6.

Both target servers had 2000 nodes created using the devel module, with up to 5 comments per node. After Solr had had a chance to index all 2000 nodes (and after turning on Performance logging), memory usage (output of free, in kB) looked like this:

The 2GB RAM target server:

             total       used       free     shared    buffers     cached
Mem:       1747764     357164    1390600          0       9352     221924
-/+ buffers/cache:     125888    1621876
Swap:       917496          0     917496

The 512MB RAM target server:

             total       used       free     shared    buffers     cached
Mem:        524508     395948     128560          0       6052     205684
-/+ buffers/cache:     184212     340296
Swap:      1048568          0    1048568

We booted up a second (source) server on each network to run the tests from (all tests were run against the internal network IP to reduce the effects of network lag). We generated a list of URLs for siege to use:

#!/bin/bash
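# print one URL per devel-generated node; redirect the output to urls.txt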

for ((a=1; a<=2000 ; a++))
do echo "http://internal_url/node/$a"
done

and redirected the output to urls.txt. We then ran the following command:

siege -c 32 -i -t 5m -d 5 -f urls.txt

This creates 32 concurrent users hitting any of the 2000 nodes at random, each with a random sleep of up to 5 seconds, for 5 minutes.

While that was running, we ran the following command from the same server:

ab -c 100 -n 50000 http://internal_url/

This command generates 100 concurrent users hitting the front page, up to a maximum of 50,000 requests.
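Putting the two together on the source server looked roughly like this; a minimal sketch, assuming the urls.txt generated above (log file names are just illustrative):

#!/bin/bash
# run the mixed siege load in the background while ab hammers the front page
siege -c 32 -i -t 5m -d 5 -f urls.txt > siege.log 2>&1 &
ab -c 100 -n 50000 http://internal_url/ > ab.log
wait    # let siege finish its full 5 minutes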

ab produces the following result against the 2GB target server:

Concurrency Level:      100
Time taken for tests:   38.985 seconds
Complete requests:      50000
Failed requests:        0
Write errors:           0
Total transferred:      827324795 bytes
HTML transferred:       802179441 bytes
Requests per second:    1282.55 [#/sec] (mean)
Time per request:       77.970 [ms] (mean)
Time per request:       0.780 [ms] (mean, across all concurrent requests)
Transfer rate:          20724.37 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0   20  86.6      9    3048
Processing:     2   58  34.9     56     576
Waiting:        0   22  25.6     11     563
Total:          3   78  92.0     71    3103

Percentage of the requests served within a certain time (ms)
  50%     71
  66%     91
  75%     98
  80%    103
  90%    121
  95%    134
  98%    152
  99%    165
100%   3103 (longest request)

The results of siege are:

Transactions:                    3645 hits
Availability:                  99.64 %
Elapsed time:                 300.39 secs
Data transferred:               3.08 MB
Response time:                  0.16 secs
Transaction rate:              12.13 trans/sec
Throughput:                     0.01 MB/sec
Concurrency:                    1.95
Successful transactions:           0
Failed transactions:              13
Longest transaction:            6.63
Shortest transaction:           0.00

The load average briefly hit 10, but available RAM never dropped below 1GB and the target server used no swap during this test. Two things are clear: we could push much harder (though we'd need multiple source servers, since it's too easy to overwhelm the network socket of a single source server), and an AWS small instance running Mercury can handle a huge spike in traffic.

ab generates the following result against the 512MB target server:

Concurrency Level:      100
Time taken for tests:   258.123 seconds
Complete requests:      50000
Failed requests:        744
   (Connect: 0, Receive: 0, Length: 742, Exceptions: 2)
Write errors:           0
Total transferred:      787021946 bytes
HTML transferred:       762116931 bytes
Requests per second:    193.71 [#/sec] (mean)
Time per request:       516.245 [ms] (mean)
Time per request:       5.162 [ms] (mean, across all concurrent requests)
Transfer rate:          2977.56 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0  163 552.0     60    9041
Processing:     0  349 709.0    240    9249
Waiting:        0  103 391.4     60    8480
Total:         20  513 900.0    300    9361

Percentage of the requests served within a certain time (ms)
  50%    300
  66%    340
  75%    364
  80%    380
  90%    440
  95%   3196
  98%   4200
  99%   5100
100%   9361 (longest request)

and the results of siege are:

Transactions:                    1692 hits
Availability:                  98.89 %
Elapsed time:                 299.92 secs
Data transferred:               4.09 MB
Response time:                  2.84 secs
Transaction rate:               5.64 trans/sec
Throughput:                     0.01 MB/sec
Concurrency:                   16.01
Successful transactions:        1692
Failed transactions:              19
Longest transaction:           30.78
Shortest transaction:           0.05

The 512MB target server used 10MB of swap in this test and had a peak load average of 2.1. I suspect the internal network on this VPS kept the number of hits (and the peak load average) much lower than in the first test, but an internal network is still many times faster, and carries far more data, than anything you could push over an external connection.

These tests indicate we are heading in the right direction. Next up (after our beta release of Mercury) is adapting Jacob Singh's great JMeter test suite, which will allow us to do much more in-depth (and more real-world) testing of Mercury.

Comments

SeanBannister:

Nice work Greg, this is really kicking ass. I can't wait to benchmark this on larger instance sizes; that new 68GB instance would make an awesome DB server.

brianmercer:

Awesome work. I'm gonna copy it for my nginx tests.

A couple of questions: I've read that Slicehost doesn't currently offer 32-bit kernels, but that you can run a 32-bit chroot on your 64-bit kernel to save memory. Would a 64-bit kernel account for the almost 50% greater memory use on Slicehost (184212 vs. 125888 kB)?

I'd be interested to see the results of a longer siege. Varnish must have been passing requests through on the first request of a particular node, caching it, and then occasionally serving from the cache on the second random request to a given node. In those 5 minutes, it made 3645 requests. Maybe a statistician could tell us, out of 3645 requests to 2000 random nodes, how many were Varnish cache hits and how many were misses; or maybe pull a hit/miss stat from the Varnish log.
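Varnish does expose those hit/miss counters directly via varnishstat; a minimal check, with counter names varying between Varnish releases:

# one-shot dump of Varnish counters, filtered to the hit/miss stats
# (cache_hit / cache_miss on Varnish 2.x; MAIN.cache_hit on later releases)
varnishstat -1 | egrep 'cache_(hit|miss)'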

It suggests another test for fully static sites. In that case you could precrawl all 2000 nodes to get them into the Varnish cache first and then measure performance. Or come up with a ratio of static to dynamic pages, say 4:1, and tell Varnish not to cache a certain 20% of the site. Then run a random siege test knowing that, over a long enough siege, about 80% of the requests would come from the Varnish cache and the other 20% would be dynamically generated.
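A pre-crawl along those lines is just a loop over the node URLs; a minimal sketch, assuming the urls.txt file from the post above:

#!/bin/bash
# warm the Varnish cache: request every node once before benchmarking
while read url; do
  curl -s -o /dev/null "$url"
done < urls.txt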

Great work. Thanks.

Varnish is varnish

joshk:

"It suggests another test for fully static sites. In that case you could precrawl all 2000 nodes to get them into the Varnish cache first and then measure performance."

That's a great idea in terms of pre-warming the cache (and there's actually a way to do this from the back side via the Varnish control channel), but once the cache is warm, the bottleneck once again becomes the network interface. There are ways to squeeze more out of that end, and there are commercial products that can push static file serving even further, but once you're up to thousands of requests per second, your problems for those pages are likely solved.

The really critical question for speed is how to speed up the cases where the whole page isn't cached in Varnish. That breaks down to two things, I think:

1) Implementing more next-gen features to push caching up the stack: ESI is the Next Big Thing here.

2) Making Drupal itself faster faster faster with better system tuning, PHP configuration, backend caching, and database speed. The frontier here is rockier, as a lot of this depends on how you build your Drupal site. However, there are some exciting things on the horizon like Drizzle, which could lead to much better database performance, and next-gen PHP optimization methods like Quercus.