Posted by gdtechindia on February 13, 2010 at 5:36pm
Hi Everyone.
Very important topics have been discussed in the high-performance group in the past, and I would like to ask all members to help compile a list of factors that can boost performance for a Drupal website. I will list a few that I know work very well. Please add your suggestions.
- Use the BOOST module to serve static content to users.
- The Cacherouter module can also be used for caching.
- If you are on dedicated hosting, use the LiteSpeed web server. There are even some companies offering LiteSpeed hosting in shared environments.
- If you have really high traffic and static content, use a CDN. (I haven't tried one yet.)
- Use the Block Cache module (in Drupal 5); the functionality is already part of Drupal 6.
- Use separate dedicated servers for the database and for content delivery (if your website project makes enough money for such a multi-server setup).
- If you own a dedicated server, go for SCSI or SSD drives instead of simple drives (more expensive, but faster). It may even be better to use two disks under RAID 0.
- Use Apache Solr (I am not sure whether it will work with the LiteSpeed web server).
I welcome other members and experts to share their ideas and tips on increasing performance.
Thanks
Dhaliwal
Comments
Use BOOST module to serve
Boost is an excellent choice, as is Varnish (as used in Project Mercury). Both have their benefits and costs. The "right" solution is going to vary depending on your particular site design and user population. Authcache may also be a good option, especially if your traffic is primarily authenticated users.
"Also" is not really the right word here. Cacherouter and memcached are used for Drupal's internal caching of variables, menus, etc., while Boost and Varnish cache pages and content externally. They do very different things. Ideally, you'd use an internal cache (Cacherouter or Memcache) in addition to an external cache (like Boost or Varnish). Also, don't forget to install an opcode cache like APC so that the system doesn't have to recompile the PHP code on every pageload.
While Apache can be a resource hog, I wouldn't necessarily recommend going with an alternate web server daemon. In my experience, while alternative web servers like lighttpd and nginx can provide a performance boost the increase is small enough that it may not be worth the additional complexity. We do use nginx for some special-purpose use cases - like serving static content - but I'm not convinced there's a compelling reason to drop Apache as the primary dynamic PHP content server.
Drive speed is important, but I think sufficient RAM is even more important. With enough RAM and a well-tuned cache you can keep much, if not all, of the site's data in vastly faster RAM and only rarely have to read from disk. I'd take the money you could spend on upgrading to SSDs and put it into RAM instead.
ACK!!! Never ever run a server on RAID 0. By doing so, you not only remove any redundancy in your storage, you actually double the likelihood of a catastrophic disk failure (or triple it if you have three disks in the array). In RAID 0, if any drive in the array fails, the entire array is lost. Go with RAID 1 (with two disks) or RAID 5 (with three or more disks). You could even go with RAID 10 or RAID 50 if you have a lot of disks available. With any of the higher RAID levels (pretty much anything but RAID 0) you get the performance benefit of spreading read/write traffic over multiple drive spindles, but you also get redundancy in case a drive fails.
boost plus some others
I agree with gchaix on all his points.
Some other things I would add to the best performance practices:
- cache as much as possible
- split up your static images or other assets across subdomains. Browsers will only download a few objects concurrently from the same hostname, so splitting assets up increases parallelism
- making sure APC or another PHP accelerator is running is a big performance boost, especially if it is configured properly
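The subdomain split above can be sketched as a static-only Apache vhost. This is a minimal illustration, not a drop-in config: static.example.com, the DocumentRoot path, and the two-week expiry are assumptions, and it relies on mod_expires being enabled.

```apache
# Hypothetical vhost for a cookie-free static subdomain (mod_expires assumed).
<VirtualHost *:80>
    ServerName static.example.com
    DocumentRoot /var/www/drupal/sites/default/files
    <Directory /var/www/drupal/sites/default/files>
        # Far-future expiry so browsers cache static assets aggressively
        ExpiresActive On
        ExpiresDefault "access plus 2 weeks"
    </Directory>
</VirtualHost>
```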
Of course, monitoring your server with tools like Cacti is essential, as it gives you a visual on how your site changes affect server performance, which in turn affects page load times.
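On the APC point above, a sketch of what "configured properly" might look like; the file path and the shm_size value are assumptions to tune against your own codebase, and exact syntax varies by APC version.

```ini
; Hypothetical /etc/php5/conf.d/apc.ini settings
apc.enabled = 1
apc.shm_size = 64M   ; shared memory for compiled opcodes; watch apc.php for fragmentation
apc.stat = 1         ; re-check file mtimes; 0 is faster, but then restart PHP on each deploy
```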
Dev and Support: prometsource.com
This could get complex...
I agree with gchaix on all points. This is actually something we do quite often, and we've got it down pretty well. Cache as much data as possible, get plenty of RAM, and make sure your MySQL is configured properly. These are the three areas where we are able to gain the most improvement when working with high-traffic sites. I see MySQL as a bottleneck for a lot of sites simply because whoever set it up didn't have the experience to get everything right. Take the time and find an expert; our last big event on a Drupal site actually did 13,000 queries per second against one of the MySQL servers in the cluster.
--
www.neospire.net
What about core cache and Views cache and other tools?
Are they on the list? Do you recommend them?
KarimB - Read the blog Le blog en français
Other Tools
Out of all of the above I recommend views2 caching; it's simple yet effective.
Thank you but...
Thank you for your fast response and the cool tips. But...
Drupal Core Cache and Boost
Do you mean that if we are using Boost for anonymous users, it's better not to activate the Drupal core cache?
DB Maintenance module
OK. But how or when do you optimize your tables?
Thx and best regards
Karim
KarimB - Read the blog Le blog en français
Drupal core cache
Drupal's core cache only works for anonymous users, and Boost only works for anonymous users. For most sites, it is sorta pointless to have them both enabled.
Optimizing tables: if you know there won't be any logged-in users at a given time, run it then. My understanding is that the OPTIMIZE TABLE command locks the table and rebuilds it on disk. Once done, table access is a lot faster, but while the optimize is running it can stall your site.
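One low-traffic-window approach is a nightly cron job. This is only a sketch: the database name, schedule, and log path are assumptions.

```shell
# Hypothetical crontab entry: optimize all tables in the 'drupal' database
# at 03:30, when few (ideally no) users are logged in.
# MySQL credentials are assumed to live in ~/.my.cnf.
30 3 * * * mysqlcheck --optimize drupal >> /var/log/mysql-optimize.log 2>&1
```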
http://www.mysqlperformanceblog.com/2008/05/01/learning-about-mysql-tabl...
Thx Mikey, here is my list so far....
Implemented.
Optimize views queries
Page and block Views cache (views2 caching)
Core Cache or boost for anonymous users (http://drupal.org/project/boost)
Boost is really awesome. As you recommended, I didn't enable Drupal's core cache, since I use the Boost module instead. But I'm wondering whether there are performance issues, since without the core cache enabled the cache_page table is empty. Does Memcache ignore this table too?
Memcache API (http://drupal.org/project/memcache)
I'm still wondering how it works, but I suppose that instead of storing cache data in the DB, it stores it in memory. I can see a huge performance gain with this module.
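Roughly, the memcache module swaps Drupal's database cache tables for memcached. A sketch of a Drupal 6 settings.php fragment, assuming memcached listens on localhost:11211 and the module lives in sites/all/modules/memcache; check the module's README for your version.

```php
<?php
// Hypothetical settings.php additions for the memcache module (Drupal 6).
$conf['cache_inc'] = './sites/all/modules/memcache/memcache.inc';
// One memcached instance on the local box, used for the default cache bin.
$conf['memcache_servers'] = array('127.0.0.1:11211' => 'default');
```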
Not implemented yet. What do you think about the following strategies?
Memcache for sessions (http://www.hyperionreactor.net/blog/storing-drupal-sessions-memcache)
Memcache path (http://drupal.org/project/pathcache)
No Anonymous Sessions (http://drupal.org/project/no_anon)
Authcache (http://drupal.org/project/authcache)
Thx again for the tips
KarimB - Read the blog Le blog en français
Using Pressflow instead of
Using Pressflow instead of stock Drupal will also give you anonymous lazy sessions. Note that there may be an issue using Memcache for sessions with Pressflow.
cache_page table
cache_page stores the full HTML version of the page; it's the same thing that Boost stores in its cache folder. Don't worry if that table is empty with Boost enabled; they cache the same thing, so you don't need the cache_page table. Memcache can hold the cache_page data, but you are still booting up PHP to serve it, so Boost is usually faster even in this case.
Parallel
What do you guys think of Parallel? Are there any potential conflicts when using it along with the other methods mentioned here? Does using good "internal" (memcached) and "external" (Boost) caching obviate the need for this kind of delivery system? Should I just use a "real" external CDN?
-nirad
Parallel addresses a
Parallel addresses a different problem. Take a look at this from Steve Souders, the YSlow developer and Yahoo performance guy:
http://www.stevesouders.com/blog/2008/03/20/roundup-on-parallel-connecti...
You also get the benefit of cookie-free domains for static content which is another one of Souders' Yslow optimization recommendations.
There is some added overhead from doing a few more DNS lookups. So if you're already optimizing your site with aggregated CSS/JS files and images combined via CSS sprites, there's less need for Parallel.
Parallel is really good if you're using a lot of images and many CSS and JS files on a page.
Of course, a real CDN gives you geographically diverse servers that will be closer to your visitors, but CDNs have an extra cost and take quite a bit more setup than Parallel.
So if you're already
I disagree. By getting static content off of your server, you gain the ability to tune your web server purely as an application server rather than a mixed application/static-content server.
A good geographically dispersed origin-pull CDN should only set you back about $30-50/month, depending of course on your bandwidth usage. Origin-pull is the easy-to-set-up flavour of CDN.
You can use Parallel to get running with an origin-pull CDN fairly quickly.
--
Dave Hansen-Lange
Director of Technical Strategy, Advomatic.com
Pronouns: he/him/his
Apache tuning
I've been tuning my Apache instance rather than moving to a different HTTP server.
The default Apache configuration for the MPM can be improved significantly.
First, work out which MPM I'm using:
sysadmin@WebSrv01:~$ apache2 -V
Server version: Apache/2.2.11 (Ubuntu)
Server MPM: Prefork
These were the defaults for apache, installed from Ubuntu repos:
Timeout 300 <-- this holds the server in a wait of up to 300 seconds for a connection to time out
KeepAlive On <-- keepalives allow multiple requests per connection, holding sessions open on the web server
MaxKeepAliveRequests 100 <-- 100 KeepAlive requests
KeepAliveTimeout 15 <-- which time out after 15 seconds
<IfModule mpm_prefork_module>
StartServers 5 <-- start 5 servers
MinSpareServers 5 <-- keep 5 spare
MaxSpareServers 10 <-- to a max of 10 spare
MaxClients 150 <-- serve at most 150 concurrent clients
MaxRequestsPerChild 0 <-- unlimited requests per child (children are never recycled)
</IfModule>
We are now running as follows:
Timeout 10 <-- connections time out after 10 seconds
KeepAlive Off <-- no keepalive, new connection per request
HostnameLookups Off <-- don't block on DNS/rDNS lookups
<IfModule mpm_prefork_module>
StartServers 50 <-- start 50 servers
MinSpareServers 15 <-- keep 15 spare
MaxSpareServers 30 <-- to a max of 30 spare
MaxClients 225 <-- serve at most 225 concurrent clients
MaxRequestsPerChild 4000 <-- stop at 4000 requests per child (stops memory leaks)
</IfModule>
For reference: http://www.ibm.com/developerworks/linux/library/l-tune-lamp-2.html?ca=dg...
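As a sanity check on a value like MaxClients under prefork, the usual back-of-the-envelope arithmetic is RAM available to Apache divided by average child size. All figures below are assumptions for illustration, not measurements from this server:

```python
# Rough MaxClients sizing for prefork Apache (illustrative numbers only).
ram_mb = 4096        # total RAM on the web server (assumed)
reserved_mb = 1024   # headroom for the OS, MySQL, memcached, etc. (assumed)
child_rss_mb = 12    # average resident size of one Apache child (assumed)

max_clients = (ram_mb - reserved_mb) // child_rss_mb
print(max_clients)   # the highest MaxClients that still fits in RAM
```

If MaxClients is set higher than this, a traffic spike can push the box into swap, at which point everything slows to a crawl.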
KeepAlive isn't a bad thing
KeepAlive isn't a bad thing: it allows you to "stream" multiple files over one connection, eliminating the overhead of creating a new connection for every request. So unless you serve all CSS/JS/images/etc. through a CDN, it is still useful. The problem is that the defaults are absurd for a high-traffic Drupal site. My standard starting point is:
KeepAlive On
MaxKeepAliveRequests 20
KeepAliveTimeout 3
For B-grade browsers that make three concurrent connections to download a page, that gives each connection 3 seconds to download up to 20 resources. Adjust if your page takes longer than 3 seconds to load, or if you have more than 60 resources on a page (at which point you probably have bigger issues anyway).
--
Dave Hansen-Lange
Director of Technical Strategy, Advomatic.com
Pronouns: he/him/his
stayin' alive
thanks for the pointer dalin.