I'm in the process of setting up a Drupal 6 site for a reasonably large urban public library in Canada, with an expectation that the site will receive about 5,000 visits a day. It's going on a Windows box that also hosts two other applications, each on a separate installed server: a library events application on IIS, and EZproxy. We have a separate database server in place to hopefully give us plenty of room to grow.
I'm making fairly liberal use of CCK fields, which I hazily understand can have performance implications. At the same time, I understand that Drupal scales very well. My questions are:
- How much do I need to worry/think about performance tuning Drupal?
- What are the best resources for learning about Drupal performance issues?
- Should I be thinking about load testing, and what are the best ways to go about that?
Thanks in advance.
Devin Crawley
Library Web Services and Systems
Ottawa Public Library
Comments
Logged in users or anonymous users?
Of the 5,000 visits a day, how many do you expect to come from non-logged-in (anonymous) users, and how many from logged-in (authenticated) users?

Optimizing performance for anonymous users is pretty straightforward. In "Administer -> Site configuration -> Performance", within the "Page cache" section, set the "caching mode" to "normal", and that will enable Drupal to serve pages to anonymous users very efficiently (regardless of how liberally you use CCK fields).

Optimizing performance for authenticated users isn't quite as easy. There are many things you can do, but the effectiveness of each one depends on the particulars of your site. At a minimum, make sure you have a PHP opcode cache enabled (e.g., APC, eAccelerator, or XCache). For Linux, my favorite is APC, but I don't have enough experience with Windows servers to make a meaningful recommendation. Also make sure PHP is running inside IIS using either ISAPI or FastCGI (not regular CGI).

Once you've done that, you may find that your server can handle the load you need just fine. If not, some additional resources are http://groups.drupal.org/taxonomy/term/165 and http://2bits.com/articles/drupal-performance-tuning-and-optimization-for....
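For illustration, enabling APC typically comes down to a few php.ini lines like the following. This is a sketch: the extension filename, path, and sizes vary by platform and APC version, so treat the values as assumptions to verify against your own install.

```ini
; Load the APC opcode cache (on Windows builds of PHP this would be
; php_apc.dll located in extension_dir, not apc.so).
extension=apc.so
apc.enabled=1
; Shared memory for cached opcodes; 64-128 MB is a common starting point.
apc.shm_size=64
; Check file mtimes so edited scripts are re-cached automatically.
apc.stat=1
```

After restarting the web server, `phpinfo()` should show an APC section confirming the cache is active.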
Windows works
But you will get better performance, particularly measured against cost, with Linux. Even Apache on Windows would help, but I think that it is useful to look at your Drupal servers as appliances, as they don't generally need to be a tightly integrated part of your core IT.
If you think that CCK fields or views are hurting performance, you can take the code generated by those modules and move it into custom modules. Even though views are cached, you can save a lot of database traffic by putting them in code. That said, I don't recommend that most libraries spend their resources on this unless they have positively identified a problem.
Another approach to scaling is to isolate logged in and anonymous traffic on separate servers. You can then aggressively cache the anonymous server, and if that's not enough, you can put a caching reverse proxy in front of it.
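As a sketch of what that reverse proxy might look like in front of the anonymous-traffic server, here is an Apache 2.2 virtual host using mod_proxy plus mod_cache/mod_disk_cache. The hostnames and cache path are placeholders, and the modules are assumed to be enabled:

```apache
# Caching reverse proxy for anonymous traffic (hostnames are placeholders).
<VirtualHost *:80>
    ServerName www.example.org
    ProxyPass        / http://anon-backend.internal/
    ProxyPassReverse / http://anon-backend.internal/
    # Cache responses on disk, honouring the backend's Cache-Control/Expires.
    CacheEnable disk /
    CacheRoot /var/cache/apache2/proxy
    # Strip Set-Cookie from stored responses so sessions are never cached.
    CacheIgnoreHeaders Set-Cookie
</VirtualHost>
```

Because Drupal's "normal" page cache already sends cache headers for anonymous pages, the proxy can serve most of that traffic without touching PHP at all.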
For the sake of conversation, one of my clients has a multisite installation with nine public-facing sites sharing content, in between one and eleven languages each. Those sites use CSS aggregation and normal caching. All services run on one dual 2-core-processor server. They get (I can't say exactly, but considerably more than you are looking at) hits per day and the machine doesn't breathe hard. The point is that, unless you have reason to believe you might be swamped on day one, you should give it a try before you start looking for more expensive solutions to problems you might not be having. I guess the first question, which of course I ask last, is: what are those 5k visitors doing during their visits to your site?
Queries/second
Unless your server is already hammered from the other applications - even on Windows using IIS - it should be able to handle at least 5 requests/second (http://buytaert.net/drupal-vs-joomla-performance). The main question is whether you are going to be inundated by those 5k users with very strong peaking (e.g. evening hours) or spread out during the day.
I do agree that a linux/apache box will be more efficient at serving pages and easier to troubleshoot if anything goes wrong.
So at minimum
it sounds like a PHP accelerator is essential. I'll definitely pursue that. What I didn't mention is that Drupal is running on Apache; the server also hosts a second application on IIS, plus EZproxy. Should I be concerned about the processing demands of running Apache and IIS on the same box? From what I know, the server itself is a 2 GHz quad-core with 2 GB of RAM. Does that RAM sound sufficient? The database server is the same, with more disk storage, obviously.
As far as usage, currently we're peaking at 60,000 page views/day. The site usage is not intense in itself (averaging 2-3 pages per visit). My estimate is probably about half of the visits would be authenticated when we switch to Drupal.
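To put those numbers in perspective, a back-of-the-envelope calculation shows how far 60,000 page views/day sits from the roughly 5 requests/second benchmark cited above. The 5x peak multiplier here is an assumption, not measured data:

```python
# Rough capacity check: 60,000 page views/day vs. a ~5 req/s benchmark.
# The peak multiplier is an assumption; measure your own traffic curve.

PAGE_VIEWS_PER_DAY = 60_000
SECONDS_PER_DAY = 24 * 60 * 60
PEAK_MULTIPLIER = 5  # assume busy evening hours run ~5x the daily average

average_rps = PAGE_VIEWS_PER_DAY / SECONDS_PER_DAY
peak_rps = average_rps * PEAK_MULTIPLIER

print(f"average: {average_rps:.2f} req/s")  # average: 0.69 req/s
print(f"peak:    {peak_rps:.2f} req/s")     # peak:    3.47 req/s
```

Even with strong evening peaking, the projected load stays under that benchmark, which supports the "launch and measure" advice in this thread.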
Re. Linux -- I wish we had more flexibility to be able to deploy more servers as needed (and linux would definitely be a nice option). It's frankly a miracle to me given our stereotypically restrictive IT environment that we're able to have our own servers at all. Up till now we've been hosting our static site on a shared ISP to at least have the ability to ftp, etc. Many thanks again.
We are edging into the trick question category
If you are running Apache and IIS on the same box, I guess that you have somewhat limited resources, and that only reinforces my earlier suggestion that you launch your site using what you have, then adjust as necessary. Since your projected load seems moderate, I wouldn't invest a huge amount of time in load testing (which is not to say that I wouldn't do any), but I would do the basics, including tuning your Drupal, PHP, MySQL and Apache configurations, and adding an accelerator, which, while not essential, has little downside.
This will give you the opportunity to get real usage data that you can use to do more informed planning.
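For basic load testing without dedicated tooling, even a short script can give ballpark throughput numbers. This is a sketch, not a substitute for a real tool such as ApacheBench; the URL and request counts are placeholders:

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def load_test(url, total_requests=100, concurrency=10):
    """Fire GET requests at `url` and return observed requests/second."""
    def fetch(_):
        with urllib.request.urlopen(url) as resp:
            resp.read()  # drain the body so timing covers the full page

    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # list() forces all requests to finish before we stop the clock.
        list(pool.map(fetch, range(total_requests)))
    elapsed = time.monotonic() - start
    return total_requests / elapsed

# Example (placeholder URL):
# print(load_test("http://localhost/", total_requests=200, concurrency=20))
```

Running it once against a cached anonymous page and once against an authenticated page will show concretely how much the page cache is buying you.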
FWIW, we spec servers with a minimum of 1 GB of RAM per core. My experience is that RAM is cheap and you can't have too much -- although you can have more than your applications can use. Internally, our base box is now a high-performance dual quad-core machine with 32 GB of RAM and dual SAS drives under hardware RAID. We chop these up with Xen for application servers and run them OOTB as database servers. We use a SAN for storage.