APC reduces Apache Thread Size: unlocking the power of top to grok

joshk's picture

At Do It With Drupal last December in New Orleans, I was very rude to the inestimable Matt Westgate. During his "Drupal under Pressure" high-performance session I made a contradictory comment about how APC decreases rather than increases total memory utilization in Apache with mod_php. Then my phone rang with an emergency client call, and I had to walk out of the room without explaining what I meant. Poor form, to say the least. Ever since, I've been meaning to write up the full point, and finally here it is.

Basically, the confusion results from what the ultra-handy Linux utility top measures by default. As this great Server Fault answer explains, the default measurement you see in top is an aggregate memory number that includes shared resources. Since the main benefit of APC is a shared, pre-compiled opcode cache for application code, that aggregate number counts the same shared memory again for every thread listed in top.

Here's the proof. I went into top and added the column for S: DATA = Data+Stack size (kb), which shows the actual/hard memory use for that thread. To do this, start top, hit "f", and then "S" (shift+s) to include the column. I then sorted by that column by returning to the main screen and hitting "F" (shift+f) to pull up the sorting config screen. A picture is worth a thousand words, so here are some screenshots. First, a simple stress test against a vanilla Mercury site with APC disabled (no_apc.gif):

As you can see, we've got the load up to 2.21, and a number of Apache threads at 18 MB of memory. Now take a look at the same server, same test, with APC enabled and Apache restarted (yes_apc.gif):

Not only is the effect of APC on performance clear (load = 0.21 and the test went about 4x faster), but the thread sizes are smaller: the first thread that ran pulled more memory since it was responsible for loading Drupal the very first time and storing it in APC, but the others are much smaller.
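For anyone who wants a similar per-process number without top's interactive screens, here is a minimal shell sketch. It assumes a Linux /proc filesystem and sums the VmData and VmStk fields of /proc/<pid>/status, which should be close to, though not necessarily identical to, what top shows in its DATA column. The function names and the default process name apache2 are assumptions for illustration (some distributions call the process httpd).

```shell
# Sum VmData and VmStk (both reported in kB) from a /proc/<pid>/status-style
# file. This approximates the private data+stack size of one process.
data_stack_kb() {
    awk '/^VmData:|^VmStk:/ { sum += $2 } END { print sum + 0 }' "$1"
}

# Print PID and data+stack size for every process with the given name.
# "apache2" is an assumed default; pass "httpd" or similar if needed.
report_apache() {
    for pid in $(pgrep -x "${1:-apache2}"); do
        [ -r "/proc/$pid/status" ] || continue
        printf '%s\t%s kB\n' "$pid" "$(data_stack_kb "/proc/$pid/status")"
    done
}
```

Running report_apache once with APC disabled and once with it enabled, under the same load, should give a comparison along the lines of the two screenshots.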

Hopefully this will help balance my Karma for disrupting Matt's presentation, and help more Drupalists explore the power and intricacies of top and APC!

Attachments:
no_apc.gif (26.27 KB)
yes_apc.gif (26.41 KB)

Comments

Nicely! It's safe to say most

matt westgate's picture

Nicely! It's safe to say most geeks know to use APC or some other opcode cache, but now the memory proof's in the pudding. Thanks for the puddin' ;)

It's not quite as simple as that...

owen barton's picture

Just to contradict your contradiction ;)

I am guessing you used a mod_php based setup for your tests, and in this case you should indeed see a reduction in thread size, because APC will be able to use a single RAM allocation for its cache (since it is started at the same time as Apache).

For a cgi/fcgid/fastcgid (*cgi) setup you should actually see an increase (compared to mod_php+apc) in the total thread size for the apache+*cgi threads, because the threads cannot share a single cache (unfortunately APC cannot make these threads cooperate through shared memory, although we did a little work in that direction a while ago). Depending on the APC cache size you have set, I think this could be higher than the memory usage with APC disabled.

That said, even with a *cgi setup the total memory usage at a challenging req/sec rate will be lower than for the same setup with APC disabled (and hence the max req/sec rate the system can handle will be higher). Although each apache+*cgi thread combo is larger, the cache still allows much faster processing of requests, so fewer total threads are spawned.
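The reasoning above can be put into rough numbers. The sketch below assumes the number of busy processes is roughly request rate times average request time (Little's law), and that total memory is that count times per-process size. The function name and all figures are hypothetical, chosen only to illustrate the argument; they are not measurements from this thread.

```shell
# Estimate total PHP memory as (concurrent processes) x (size per process).
#   $1 = requests per second
#   $2 = average request time in milliseconds
#   $3 = memory per process in MB
total_mem_mb() {
    procs=$(( ($1 * $2 + 999) / 1000 ))   # round up to whole processes
    echo $(( procs * $3 ))
}

# Hypothetical figures for illustration only:
total_mem_mb 50 400 18   # APC off: slower requests, smaller processes
total_mem_mb 50 100 30   # APC on under *cgi: faster requests, larger processes
```

With these made-up inputs the first call prints 360 and the second prints 150: the larger per-process footprint is outweighed by needing a quarter as many processes.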

Off topic, but in terms of mod_php vs. *cgi: I would say that for sites that normally use a regular volume of code for each request they are pretty much equivalent; however, *cgi has the advantage if you have particular pages that load a much larger volume of code (e.g. CiviCRM). This is because the *cgi processes can respawn periodically, whereas mod_php Apache processes just inflate and hang around indefinitely and hold on to all of that RAM. *cgi also has a lot of security advantages on a shared server, of course.

I'm not sure I understand

brianmercer's picture

I'm not sure I understand this. I use nginx with one php5-cgi that has 6 children. Each child process takes requests and shares the same APC cache.

Other than running under different users, what is the advantage of running multiple php5-cgi instances instead of using child processes?

Process Management

nnewton's picture

The issue with that is process management, mainly that PHP sucks at it. There is work to fix this (php-fpm.org) which will soon be merged upstream; for now, however, PHP's management of its own child processes leaves a lot to be desired and can often result in random 503s in high-load situations. Then again, if you're dealing with load at that level you are probably not that concerned about lack of memory.

Because of this, many people have nginx/Apache/lighty manage the PHP processes, giving each PHP process one child or at the very least limiting each to 5 or so children.

Yes!

kbahey's picture

Yes, this is why fcgid is a net win and our favored configuration at present: it manages the processes well, is stable, and is a net gain in memory too. The downside is no shared APC cache.

Drupal performance tuning, development, customization and consulting: 2bits.com, Inc.
Personal blog: Baheyeldin.com.

Thanks for the info.

brianmercer's picture

Thanks for the info.

I don't know how nginx works

owen barton's picture

I don't know how nginx works in terms of spawning threads, but unless it is doing something very different from Apache I would be surprised if they were really sharing the same cache. How are you testing this?

Here are a couple of requests/bug reports for APC - this is a long standing issue:
http://pecl.php.net/bugs/bug.php?id=11988
http://pecl.php.net/bugs/bug.php?id=11666

nginx isn't doing it. It's

brianmercer's picture

nginx isn't doing it. It's just speaking to the upstream php processes. But as the above guys have stated, it leaves php to do the process management.

I don't run high traffic sites, so I haven't run into the deficiencies of php as a process manager. I've also been following the progress of php-fpm as it makes its way into the official release of php. I'm glad that is going to happen.

FCGI_CHILDREN?

kbahey's picture

Are you using PHP_FCGI_CHILDREN? If so, then yes, you can share APC across the php-cgi processes. But do you have high-watermark protection on the maximum number of CGI processes? fcgid has one, and Apache handles the reaping of children and all that. Not sure if other FastCGI setups have it ...

I had forgotten about

owen barton's picture

I had forgotten about PHP_FCGI_CHILDREN - it's not a setup I have used. Google turned up http://www.brandonturner.net/blog/2009/07/fastcgi_with_php_opcode_cache/ which was a good refresher. From the sound of it, PHP_FCGI_CHILDREN has the potential to cause some tricky performance issues/bugs of its own, but perhaps it will be a nice solution in the future.

Runaway processes

kbahey's picture

I tried that a while back and encountered runaway processes, which overflowed memory and caused swapping. That is the exact reason I abandoned mod_php, only here there was no clear, stable, or easy way to enforce an absolute maximum.

Nginx and PHP_FCGI_CHILDREN

mbutcher's picture

IIRC, the sort of "typical" Nginx configuration of PHP FastCGI is to use a single PHP FastCGI server and use PHP_FCGI_CHILDREN to set the number of children. (This is the way the Nginx wiki suggests setting things up, and we've done so with great success).

Up to this point, I had simply assumed that the APC cache was shared between children. Our APC stats seem to look the same over repeated requests, and I would think they would look quite different if the cache were not shared.

After benchmarking, we have found 8 FastCGI children to 4 nginx processes to be about the optimal configuration on a four-core system. I can't remember, though, if we have set up anything regarding maximum number of requests per child. I think 1000 is about standard.

You hit the nail on the head!

kbahey's picture

Owen, as usual, you hit the nail on the head.

The issue with mod_php is that each Apache process has PHP inside it, whether it is serving static content or PHP requests. This is fine for small to medium sites, but once you start having several hundred Apache processes, that overhead adds up and can strain memory.

The solution is PHP as FastCGI. Each CGI process will have its own APC cache, and hence total memory size across all PHP processes will be more than total memory size across the same number of Apache processes.

However, when you run FastCGI, you need fewer PHP processes than Apache processes, and when you run Apache threaded (MPM Worker), the savings are even greater.

So the net result is that you save memory on large sites by using FastCGI.

Look at the memory graphs in this article and you will see considerable gains
http://2bits.com/articles/apache-fcgid-acceptable-performance-and-better...

You can also run FastCGI in a configuration that will share APC across processes using PHP_FCGI_CHILDREN, but that would be mod_fastcgi (which has stability issues with certain configurations), not fcgid (which we find to be more stable). So, if you can find a stable combination, you can have your cake and eat it too.

CDN or Varnish?

BartVB's picture

Sounds like splitting off static content to a CDN (or just a separate domain) would be a more scalable and less problematic solution? Or use something like Varnish/Nginx/etc. in front of Apache to handle the static content. That way Apache can be used with a shared APC cache and would only handle the PHP content, and you can keep Apache under control with a fairly low MaxClients setting. You don't want 200 Apache processes processing PHP pages and wreaking havoc on your CPUs.

MaxRequestsPerChild?

justintime's picture

@Owen - regarding your statement

whereas mod_php apache processes just inflate and hang around indefinitely and hold on to all of that RAM

Does setting MaxRequestsPerChild in Apache help mitigate this?

Not really

kbahey's picture

This has the effect of making each Apache process die after it serves a set number of requests (say 1,000 or 5,000). This helps when you have memory leaks or other subtle issues: each Apache process will terminate after some time.

However, if you have an avalanche of requests, say your site is on Digg's front page, that parameter will not help one bit: Apache will create new processes, and since none of them has yet served its 1,000 or so requests, they all stay around.

The parameter that may help is MaxClients, which provides an absolute maximum on the number of Apache processes.

But with mod_php, MaxClients applies to all Apache processes, regardless of whether they serve static content or PHP requests. Moreover, each Apache process will have a connection to MySQL as well, so you use more connections.

With FastCGI, you can run Apache with the threaded MPM Worker. You have fewer PHP processes, fewer connections to MySQL, and static content serving is nimbler.

I use this in my spawn-fcgi

brianmercer's picture

I use this in my spawn-fcgi init script for nginx.

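# Recycle each PHP child after it has served 1000 requests.
PHP_FCGI_MAX_REQUESTS="1000"; export PHP_FCGI_MAX_REQUESTS
# Number of child processes the php-cgi parent spawns.
PHP_FCGI_CHILDREN="6"; export PHP_FCGI_CHILDREN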

Six children is based on my little Linode 360. Total memory minus max mysql, minus other small stuff (collectd, four nginx processes, sshd, postfix, kernel, etc...) minus apc cache, and then divide the rest by php max memory, equals number of children.
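The sizing rule described above can be written as a small shell calculation. This is only a sketch: the function name is made up, and the numbers in the example call are hypothetical values picked so that the arithmetic lands on six children; they are not the actual figures from this server.

```shell
# children = (total RAM - MySQL max - other daemons - APC cache) / PHP max memory
# All arguments are in MB; integer division rounds down, which errs on the
# side of fewer children.
fcgi_children() {
    total_mb=$1; mysql_mb=$2; other_mb=$3; apc_mb=$4; php_max_mb=$5
    echo $(( (total_mb - mysql_mb - other_mb - apc_mb) / php_max_mb ))
}

# Hypothetical example: 360 MB box, 100 MB for MySQL, 40 MB for everything
# else, 28 MB APC cache, 32 MB PHP memory limit.
fcgi_children 360 100 40 28 32
```

With those assumed inputs the call prints 6. Rounding down leaves a little headroom rather than risking swap.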

Max Requests per client ...

kbahey's picture

This is the maximum number of requests per child process, not a maximum number of processes. So it is like Apache's MaxRequestsPerChild. What is needed is the equivalent of MaxClients, but for FastCGI. fcgid has that; mod_fastcgi does not.

yes

joshk's picture

I am guessing you used a mod_php based setup for your tests

Or from the part where I said I was using mod_php. ;)

Yes, it's all different under cgi/fcgi. APC is not very useful there.

I don't have the numbers in

mikey_p's picture

I don't have the numbers in front of me, but in my previous testing (and I think Kahlid's will back this up as well) eAccelerator has even greater memory savings, at the cost of slightly increased request processing time. This is an awkward tradeoff, but one to be aware of if you are trying to run a handful of small sites on a shared VPS with mod_php: you'll be able to run more Apache processes before paging out.

eAccelerator vs. APC

kbahey's picture

eAccelerator is both faster and saves more memory than APC, when used with mod_php at least.

Here are the detailed numbers, from some time ago:

http://2bits.com/articles/benchmarking-apc-vs-eaccelerator-using-drupal....

http://2bits.com/articles/benchmarking-drupal-with-php-op-code-caches-ap...

The issue with eAccelerator is that it is not maintained and tended to crash a lot on our platform of choice (Ubuntu 8.04 LTS Server Edition).

P.S. Khalid it is ...

On the original post...

jburnett's picture

Josh,
I would be interested to see how your memory usage changed as the load progressed over time. We do see the memory increase with APC enabled on production sites, but I just replicated your test and your results here in our lab, so I don't have any data to support it currently. I am curious how that memory footprint might change over time, and whether the increase in memory usage may be a byproduct of the cache expiring or changing with new content.

On the off topic discussions...

jburnett's picture

Just my 2 cents: we have deployed most of the opcode cache configurations in our hosting environment over time and repeatedly come back to APC and mod_php. These are not just Drupal sites, but many thousands of PHP-based sites. The combination of APC and mod_php seems to be the most stable configuration and provides extremely good speed in very high load environments. That being said, with the resources available, preserving memory when measured in small increments is not a huge concern for us.
On the topic of MySQL connections, again system resources are not something that become a major issue in most of our environments because these are large enterprise type environments, but we have yet to see any performance degradation in an APC/mod_php environment that was not also present when utilizing the other configurations. We didn't capture resource utilization metrics or numbers, but based on pure stability and meeting customer expectations we currently use APC/mod_php in all of our environments.

mod_php vs cgi

joshk's picture

The combination of APC and mod_php seems to be the most stable configuration and provide extremely good speed in very high load environments.

This is my experience also. I tend to find that at the concurrency level where the thread benefits of cgi come into play (e.g. there are 20+ active Drupal threads) I need more webheads, because by then I'm CPU bound.

But I don't manage shared hosting setups, or servers which run lots of different sites, or really huge multiprocessor systems. My expertise is in commodity servers (or clusters of servers) dedicated to single applications with all anonymous/plain-apache traffic offloaded to Varnish.

As always, your mileage will vary based on your use-case. :)

Used to think like that ...

kbahey's picture

Agree on the APC part. We occasionally try others, but come back to APC for stability.

As for mod_php, I used to think that too, but on sites getting a sudden rush, FastCGI is just much more resource efficient. See the graphs I posted for the difference.

memory use vs page response time

joshk's picture

I'll be watching the fcgi-php project with interest as that should raise performance consistently. For me, the stability and speed of mod_php are still usually reason enough to trade off a bit against memory utilization.

php-fpm

kbahey's picture

Apache fcgid already exists, and works well.

I guess you meant php-fpm, which is now a third party project, but work is being done on integrating it with PHP core. Yes, something to watch for sure.

Yes

joshk's picture

php-fpm is what I was looking at. There's a potential architecture with Nginx talking to a php-fpm backend that I find really interesting, but I am waiting for the whole thing to stabilize before abandoning good old mod_php.

High performance
