A list of best practices for getting decent performance on high-traffic sites

gdtechindia's picture

Hi Everyone.

Very important topics have been discussed in the high performance group in the past, and I would like to ask all members to help compile a list of the important factors that can boost the performance of a Drupal website. I will list a few that I know work very well. Please add your suggestions.

  1. Use the Boost module to serve static content to users.
  2. Cache Router can also be used for caching.
  3. If you are on dedicated hosting, use the LiteSpeed web server. Some companies even offer LiteSpeed hosting in shared environments.
  4. If you have really high traffic and static content, use a CDN. (I haven't tried it yet.)
  5. Use the Block Cache module (in Drupal 5); block caching is already built into Drupal 6.
  6. Use separate dedicated servers for the database and for content delivery (if your website project makes enough money to justify a multi-server setup).
  7. If you own a dedicated server, go for SCSI or SSD drives (more expensive, but they offer a speed boost) instead of ordinary drives. It would be even better to use two disks under RAID 0.
  8. Use Apache Solr (I am not sure whether it will work with the LiteSpeed web server).

I welcome other members and experts to share their ideas and tips on increasing performance.

Thanks
Dhaliwal

Comments

Use BOOST module to serve

gchaix's picture

"Use the Boost module to serve static content to users."

Boost is an excellent choice. So is Varnish (as in Project Mercury). Both have their benefits and costs. The "right" solution is going to vary depending on your particular site design and user population. Authcache may also be a good option, especially if your traffic is primarily authenticated users.

"Cache Router can also be used for caching."

"Also" is not really the right word here. Cacherouter and memcached are used for Drupal's internal caching of variables, menus, etc., while Boost and Varnish cache pages and content externally. They do very different things. Ideally, you'd use both an internal cache (cacherouter or memcache) and an external cache (like Boost or Varnish). Also, don't forget to install an opcode cache like APC so that the system doesn't have to keep compiling the PHP code on every page load.

"If you are on dedicated hosting, use the LiteSpeed web server. Some companies even offer LiteSpeed hosting in shared environments."

While Apache can be a resource hog, I wouldn't necessarily recommend going with an alternate web server daemon. In my experience, while alternative web servers like lighttpd and nginx can provide a performance boost, the increase is small enough that it may not be worth the additional complexity. We do use nginx for some special-purpose use cases, like serving static content, but I'm not convinced there's a compelling reason to drop Apache as the primary dynamic PHP content server.

"If you own a dedicated server, go for SCSI or SSD drives (more expensive, but they offer a speed boost) instead of ordinary drives."

Drive speed is important, but I think sufficient RAM is even more important. With enough RAM and a well-tuned cache you can keep much, if not all, of the site's data in vastly faster RAM and only rarely have to resort to reading from disk. I'd take the money that you could spend on upgrading to SSDs and put it into RAM instead.

"It would be even better to use two disks under RAID 0."

ACK!!! Never ever run a server on RAID 0. By doing so, you not only remove any sort of redundancy in your storage, you actually double the likelihood of a catastrophic disk failure (or triple it if you have three disks in the array). In RAID 0, if any drive in the array fails, the entire array is lost. Go with RAID 1 (with two disks) or RAID 5 (with three or more disks). You could even go with RAID 10 or RAID 50 if you have a lot of disks available. With any of the higher RAID levels (pretty much any RAID but RAID 0) you get the performance benefits of spreading the read/write traffic over multiple drive spindles, but you also get redundancy in case a drive fails.
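
The "double or triple" figure above can be checked with a little arithmetic. The following is only a rough sketch in Python: it assumes disks fail independently, the 3% annual failure probability is a made-up number for illustration, and the function names are hypothetical. It also ignores rebuild windows and correlated failures, which matter in practice.

```python
def raid0_loss_probability(p_disk: float, n_disks: int) -> float:
    """RAID 0 (striping) loses all data if at least one disk fails."""
    return 1.0 - (1.0 - p_disk) ** n_disks


def raid1_loss_probability(p_disk: float, n_disks: int = 2) -> float:
    """RAID 1 (mirroring) loses data only if every disk in the mirror fails."""
    return p_disk ** n_disks


# Hypothetical per-disk annual failure probability, for illustration only.
P_DISK = 0.03

for disks in (1, 2, 3):
    striped = raid0_loss_probability(P_DISK, disks)
    print(f"RAID 0, {disks} disk(s): {striped:.4f}")

print(f"RAID 1, 2 disks:   {raid1_loss_probability(P_DISK):.4f}")
```

For small per-disk failure rates the RAID 0 figure grows almost linearly with the number of disks, which is where "double" and "triple" come from.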

boost plus some others

akucharski's picture

I agree with gchaix on all his points.

  • The more memory, the better.
  • RAID 0 will probably improve your performance a little, but it may not be worth it unless you are already in a high-availability environment.
  • Offloading search from your main Drupal install is generally a good idea. Solr and Google Custom Search Engine are two ways to do that.

Some other things I would add to the best performance practices:
- Cache as much as possible.
- Split your static images and other assets across subdomains. A browser will download only four objects concurrently from the same hostname; you can raise this limit by spreading the files over several hostnames.
- Making sure APC or another PHP accelerator is running is a big performance boost, especially if it is configured properly.
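
The subdomain-splitting tip above can be sketched roughly as follows. This is an illustration in Python rather than Drupal code: `example.com`, the `static0` to `static3` host names, and the `shard_host` and `shard_url` helpers are all hypothetical. The one design point worth noting is that the mapping is deterministic, so a given file is always served from the same host and stays cacheable in the browser.

```python
import zlib


def shard_host(path: str, base_domain: str = "example.com", shards: int = 4) -> str:
    """Pick one of `shards` static subdomains for an asset path.

    CRC32 is used only as a cheap, stable hash: the same path always
    maps to the same host, regardless of process or machine.
    """
    index = zlib.crc32(path.encode("utf-8")) % shards
    return f"static{index}.{base_domain}"


def shard_url(path: str) -> str:
    """Build the full URL for an asset path (path must start with '/')."""
    return f"http://{shard_host(path)}{path}"


for asset in ("/files/logo.png", "/files/banner.jpg", "/misc/drupal.js"):
    print(shard_url(asset))
```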

Of course, monitoring your server with tools like Cacti is essential: it gives you a visual record of how your site changes affect server performance, so you can spot changes that degrade page load times.

Dev and Support: prometsource.com

This could get complex...

jburnett's picture

I agree with gchaix on all points. This is actually something that we do quite often, and we've got it down pretty well. Cache as much data as possible, give the server plenty of RAM, and make sure that MySQL is configured properly. These are the three areas where we are able to gain the most improvement when working with high-traffic sites. I see MySQL as a bottleneck for a lot of sites just because whoever set it up didn't have the necessary experience to get everything right. Take the time to find an expert; our last big event on a Drupal site actually did 13,000 queries per second against one of the MySQL servers in the cluster.

What about Core Cache and Views cache and other tools?

KarimB's picture

What about:

  • Drupal core cache
  • Views2 caching
  • Throttle module
  • DB Maintenance module
  • php e-accelerator

Are they on the list? Do you recommend them?

Other Tools

mikeytown2's picture
  • Drupal core cache puts an unnecessary load on MySQL; since it only works with anonymous users an "external" cache like boost/varnish is ideal.
  • Views 2 caching is a great tool
  • Throttle module is a bad idea for 99% of the users out there
  • DB Maintenance module - Optimizing tables when no one is using the site can help to improve performance, issue being the "no one is using" part.
  • php e-accelerator - Same goal as APC.

Out of all of the above I recommend views2 caching; it's simple yet effective.

Thank you but...

KarimB's picture

Thank you for your fast response and the cool tips. But...

Drupal Core Cache and Boost
Do you mean that if we are using Boost for anonymous users, it's better not to activate the Drupal core cache?

DB Maintenance module
OK. But how or when do you optimize your tables?

Thx and best regards

Karim

Drupal core cache

mikeytown2's picture

Drupal's core cache only works for anonymous users. Boost only works for anonymous users. For most sites, it is sorta pointless to have them both enabled.

Optimizing tables: If you know there won't be any logged in users at a given time then run it at that time. My understanding is the Optimize command locks the table and creates a new one on the hard drive. Once done, table access is a lot faster, but while optimize is going you can stall your site.
http://www.mysqlperformanceblog.com/2008/05/01/learning-about-mysql-tabl...

Thx Mikey, here is my list so far....

KarimB's picture

Implemented.

  1. Optimize views queries

  2. Page and block Views cache (views2 caching)

  3. Core Cache or boost for anonymous users (http://drupal.org/project/boost)
    Boost is really awesome. As you recommended ("Drupal's core cache only works for anonymous users. Boost only works for anonymous users. For most sites, it is sorta pointless to have them both enabled."), I didn't enable Drupal's core cache, since I use the Boost module. But I'm wondering whether this causes any performance issues, since without the core cache enabled the cache_page table is empty. Does memcache ignore this table too?

  4. Memcache API (http://drupal.org/project/memcache)
    I'm still wondering how it works, but I suppose that instead of storing cache data in the DB, it stores it in memory. I can see a huge performance gain with this module.
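
The "memory instead of the DB" idea can be sketched roughly like this. It is a generic illustration in Python and says nothing about how the Memcache API module is actually implemented: `MemoryCache`, `cached_lookup`, and `load_from_db` are made-up names, and a plain dictionary stands in for a memcached server.

```python
import time


class MemoryCache:
    """A tiny in-process cache with optional expiry, standing in for memcached."""

    def __init__(self, clock=time.monotonic):
        self._items = {}      # key -> (value, expires_at or None)
        self._clock = clock   # injectable so expiry can be tested

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._items[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._items[key] = (value, expires_at)


def cached_lookup(cache, key, load_from_db, ttl=60):
    """Return the cached value, hitting the (slow) database only on a miss.

    Note: a value of None is treated as a miss and is reloaded every time.
    """
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)
        cache.set(key, value, ttl)
    return value
```

With this shape, repeated requests for the same key are answered from RAM and the database sees only the first one (until the entry expires).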

Not implemented yet. What do you think about the following strategies?

  1. Memcache for sessions (http://www.hyperionreactor.net/blog/storing-drupal-sessions-memcache)

  2. Memcache path (http://drupal.org/project/pathcache)

  3. No Anonymous Sessions (http://drupal.org/project/no_anon)

  4. Authcache (http://drupal.org/project/authcache)

Thx again for the tips

Using Pressflow instead of

brianmercer's picture

Using Pressflow instead of stock Drupal will also give you the anonymous lazy sessions. There may be an issue using Memcache for sessions with Pressflow.

cache_page table

mikeytown2's picture

cache_page stores the full HTML version of the page, the same thing that Boost stores in its cache folder. Don't worry if that table is empty with Boost enabled: they are caching the same thing, so you don't need the cache_page table. Memcache can use the cache_page table, but you are still booting up PHP when doing this, so Boost is usually faster even in this case.

Parallel

nirad's picture

What do you guys think of Parallel? Are there any potential conflicts when using it along with the other methods mentioned here? Does using good "internal" (memcached) and "external" (Boost) caching obviate the need for this kind of delivery system? Should I just use a "real" external CDN?

-nirad

Parallel addresses a

brianmercer's picture

Parallel addresses a different problem. Take a look at this from the Yslow dev and Yahoo optimizer guy:

http://www.stevesouders.com/blog/2008/03/20/roundup-on-parallel-connecti...

You also get the benefit of cookie-free domains for static content which is another one of Souders' Yslow optimization recommendations.

There is some added overhead from doing a few more DNS lookups. So if you're already optimizing your site by having aggregated CSS/JS files and aggregated image files using CSS sprites, then there's less need to use Parallel.

Parallel is really good if you're using a lot of images and many css and js files on a page.

Of course a real CDN gives you geographically diverse servers that will be closer to your visitors, but they have an extra cost and take quite a bit more setup than Parallel.

So if you're already

dalin's picture

"So if you're already optimizing your site by having aggregated CSS/JS files and aggregated image files using CSS sprites, then there's less need to use Parallel."

I disagree. By getting static content off of your server you have the benefit of being able to tune your webserver as an application server only rather than a mixed application/static content server.

"Of course a real CDN gives you geographically diverse servers that will be closer to your visitors, but they have an extra cost and take quite a bit more setup than Parallel."

A good geographically dispersed origin-pull CDN should only set you back about $30-50 / month depending of course on your bandwidth usage. Origin-pull being the easy-to-setup flavour of CDNs.

You can use Parallel to get running with an origin-pull CDN fairly quickly.

--


Dave Hansen-Lange
Director of Technical Strategy, Advomatic.com
Pronouns: he/him/his

Apache tuning

rob d's picture

I've been tuning my Apache instance rather than moving to a different HTTP server.

The default Apache config for the MPM module can be improved significantly.

First, work out which MPM is in use:

sysadmin@WebSrv01:~$ apache2 -V
Server version: Apache/2.2.11 (Ubuntu)
Server MPM:     Prefork

These were the defaults for apache, installed from Ubuntu repos:

Timeout 300                  <-- wait up to 300 seconds on a stalled connection before giving up
KeepAlive On                 <-- keepalives allow multiple requests per connection, holding connections open on the web server
MaxKeepAliveRequests 100     <-- up to 100 requests per kept-alive connection
KeepAliveTimeout 15          <-- wait up to 15 seconds for the next request before closing the connection
<IfModule mpm_prefork_module>
    StartServers          5  <-- start 5 child processes
    MinSpareServers       5  <-- keep at least 5 idle children
    MaxSpareServers      10  <-- and at most 10 idle children
    MaxClients          150  <-- serve at most 150 simultaneous requests (one child process each under prefork)
    MaxRequestsPerChild   0  <-- 0 means a child process is never recycled
</IfModule>

We are now running as follows:

Timeout 10                    <-- connections time out after 10 seconds
KeepAlive Off                 <-- no keepalive, new connection per request
HostnameLookups Off           <-- don't wait on DNS/reverse DNS lookups
<IfModule mpm_prefork_module>
    StartServers          50  <-- start 50 child processes
    MinSpareServers       15  <-- keep at least 15 idle children
    MaxSpareServers       30  <-- and at most 30 idle children
    MaxClients           225  <-- serve at most 225 simultaneous requests
    MaxRequestsPerChild 4000  <-- recycle each child after 4000 requests (limits the damage from memory leaks)
</IfModule>
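
A common rule of thumb for choosing MaxClients under prefork is to divide the memory you can spare for Apache by the size of a typical child process, so the server never has to swap. The sketch below shows that arithmetic in Python. The figures (8 GB of RAM, 2 GB reserved for MySQL and the OS, 27 MB per child) and the function name are hypothetical and are not taken from the server described above; measure your own process sizes before using any number.

```python
def max_clients_for_memory(total_ram_mb: float, reserved_mb: float, per_child_mb: float) -> int:
    """Estimate a prefork MaxClients value that fits in RAM.

    total_ram_mb  - physical memory on the box
    reserved_mb   - memory kept aside for the database, OS, caches, etc.
    per_child_mb  - resident size of one Apache child running PHP
    """
    if per_child_mb <= 0:
        raise ValueError("per_child_mb must be positive")
    available = total_ram_mb - reserved_mb
    if available <= 0:
        return 0
    return int(available // per_child_mb)


# Hypothetical figures, for illustration only.
print(max_clients_for_memory(total_ram_mb=8192, reserved_mb=2048, per_child_mb=27))
```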

For reference: http://www.ibm.com/developerworks/linux/library/l-tune-lamp-2.html?ca=dg...

KeepAlive isn't a bad thing

dalin's picture

KeepAlive isn't a bad thing; it allows you to "stream" multiple files over one connection, thus eliminating the overhead of creating a new connection for every request. So unless you serve all CSS/JS/images/etc. through a CDN, it is still useful. The problem is that the defaults are absurd for a high-traffic Drupal site. My standard starting point is:

KeepAlive On
MaxKeepAliveRequests 20
KeepAliveTimeout 3

For B-grade browsers that make 3 concurrent connections to download a page, that gives each connection 3 seconds to download 20 resources. Adjust if your page takes longer than 3 seconds to load, or if you have more than 60 resources on a page (at which point you probably have bigger issues anyway).
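
The 60-resource figure above is just connections times MaxKeepAliveRequests. A small Python sketch of that arithmetic follows; the function names are hypothetical, and the numbers are the ones quoted in this comment.

```python
def page_resource_budget(concurrent_connections: int, max_keepalive_requests: int) -> int:
    """Resources a browser can fetch without ever reopening a connection."""
    return concurrent_connections * max_keepalive_requests


def connections_needed(resources: int, max_keepalive_requests: int) -> int:
    """Connections required to fetch `resources` files (ceiling division)."""
    return -(-resources // max_keepalive_requests)


# 3 concurrent connections, MaxKeepAliveRequests 20
print(page_resource_budget(3, 20))
# A page with 75 resources needs more connections than the browser opens at once.
print(connections_needed(75, 20))
```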


stayin' alive

rob d's picture

thanks for the pointer dalin.
