Can Drupal support 400,000 page views per hour?

tarun_nagpal's picture

Hi All,
I am in a challenging situation. I have been working with Drupal and prefer it for all my projects. Now I have to build a project that will get high traffic. I want to ask a simple question: "Can Drupal support 400,000 page views per hour?"

Any help will be appreciated.

Regards
Tarun Nagpal

Comments

dynamic content? static

Andre-B's picture

dynamic content? static content? anonymous users? logged in users?

more questions

neeravbm's picture

In addition to what andre posted, what are the hardware constraints?

Hardware is basically your

Brian Altenhofel's picture

Hardware is basically your limiting factor.

With all the respect, cannot

PlayfulWolf's picture

With all due respect, I could not disagree more*. Once upon a time (around 2004-2005, to be precise) a Pentium 4 server handled a project with ~0.5M hits per hour. It was 100% authenticated users, but the PHP app was custom... built for WAP :) Some heavy black magic was done with caching tricks, but the server load still stayed under 0.5.
Drupal is probably ten times heavier, but caching technologies are also head and shoulders above those used 10 years ago.

*If we are talking about a vanilla Drupal project with no caching or only the basic built-in caching techniques

drupal+me: jeweler portfolio

Using a multi-tier structure

Brian Altenhofel's picture

Using a multi-tier structure (DB on separate servers from web heads), I've got a client where we load tested the ordering process on a Drupal Commerce site. A 1GB/1vCPU/SSD VM yielded 41 orders per second without ESI and microcaching, 45 orders per second with ESI, and 77 orders per second with ESI and 1s microcaching.

OP wants to serve 111 pageviews per second. That's really not a lot to ask, especially of a single-tenant dedicated server.
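The 111 figure is just the hourly target divided by 3600. A minimal sketch of that arithmetic follows; the peak factor is a hypothetical assumption (nothing in the thread says how bursty the traffic is), included only because averages hide spikes.

```python
# Convert an hourly page-view target into an average per-second rate.
PAGE_VIEWS_PER_HOUR = 400_000  # figure from the original question
SECONDS_PER_HOUR = 3600

average_rps = PAGE_VIEWS_PER_HOUR / SECONDS_PER_HOUR

# Assumption (not from the thread): peak traffic is some multiple of the average.
PEAK_FACTOR = 3  # hypothetical value, adjust to your own traffic pattern
peak_rps = average_rps * PEAK_FACTOR

print(f"average: {average_rps:.1f} requests/second")
print(f"peak at {PEAK_FACTOR}x average: {peak_rps:.1f} requests/second")
```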

Yes, drupal can do it!

technicalknockout's picture

As mentioned above, hardware is a limiting factor. But proper application and database tuning can help you squeeze more out of the hardware. The question posted is, IMHO, not a simple question at all. For the sake of having an answer, I say "Yes, Drupal can do it!" But really that's a naive answer. I'm guessing you're not going to be running a site with Drupal core only. Your site's functionality, configuration, and infrastructure will likely determine whether it stands up to your traffic's needs.

hardware is not the limiting

Andre-B's picture

Hardware is not the limiting factor; the limiting factor is money and knowledge. You can buy or rent as much hardware as you like, but you will still have to know how to set it up properly to have a scaling application. If you don't have the knowledge, you can buy it.

Doing this on static content is fairly easy, and so is scaling it: a simple LAMP stack plus a few Varnish nodes should be capable of managing that. But since the thread author did not give any detail, it's hard to predict. After all, 400,000 views per hour are 111 views per second, which is easy to handle.

Simple question, but no simple answer

Andy_Read's picture

In line with the previous comments, you may be asking a simple question, but there is no simple answer. If you expect a simple answer then it's simply "No" - Drupal will not support 400,000 page views per hour - on its own.

Sites of this capacity and greater have been built with Drupal as their core CMS platform, but there are some essential elements that you need to research:
0) Configuring the server(s) with best-practice components like the Nginx web server, APC and memcache.
1) Caching - at many levels, both inside Drupal, caching data, blocks, pages and in front of Drupal typically using Varnish and/or a CDN. But the hard question is what can and can't be cached and when the cache needs clearing, which depends on many things previously mentioned like whether users are logged in. This requires a thorough analysis of your application.
2) Optimisation - Test and measure which aspects of the application are really taking the time - measure response times, look at the mysql slow-query log, etc. Don't waste time optimising things that aren't really a problem. But there will be places in the architecture or code where you will need to restructure and do things in a more efficient way.
3) Scaling - which means sharing the computing load across multiple web-servers, database servers and a shared file system. If money is no object, you can try just throwing hardware at the problem, but putting your effort first into steps 1 & 2 can gain orders of magnitude more performance before you waste money on hardware/cloudware.
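To illustrate why the caching step matters so much, here is a rough sketch of how the cache hit ratio decides how many requests actually reach Drupal. The hit ratios are hypothetical values chosen for illustration, and `backend_rps` is a made-up helper name, not part of Drupal or Varnish.

```python
def backend_rps(total_rps: float, hit_ratio: float) -> float:
    """Requests per second that miss the cache and must be rendered by Drupal."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return total_rps * (1.0 - hit_ratio)


TOTAL_RPS = 400_000 / 3600  # about 111 requests/second, from the original question

# Hypothetical hit ratios; the real value depends on what can and can't be cached.
for hit_ratio in (0.0, 0.5, 0.9, 0.99):
    load = backend_rps(TOTAL_RPS, hit_ratio)
    print(f"hit ratio {hit_ratio:.0%}: {load:6.1f} requests/second reach Drupal")
```

At a 90% hit ratio only about 11 requests per second would need a full Drupal page build, which is why analysing what can be cached usually pays off before buying more hardware.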

Googling Drupal high performance and reading plenty of the readily available material is a better starting point than asking a simplistic question on a forum like this.

@Andy_Read 3) scaling is only

Andre-B's picture

@Andy_Read 3) Scaling is only possible if the application is horizontally scalable. If that's not done correctly, adding hardware only goes so far: there will be a CPU limit, a RAM limit and/or a disk speed limit (even with SSDs)...

Wrong question; here's how to better frame it

FluxSauce's picture

Can the stack deliver 400,000 pages per hour?

Drupal is merely a part of the larger architecture. There are many, many enterprise sites that run Drupal that serve way more pages than that. However, the architecture of the entire stack is what's really important.

Are you talking about 400k anonymous page views? Authenticated? REST or other API calls? AJAX? What's the weight of the page? What other assets are being included? Using a CDN? Reverse proxy? Caching layers? Database configuration? HA? Etc.

Many media companies serve more traffic

rkarajgi's picture

Tarun: The simple answer is "Yes, a Drupal-based solution is capable of supporting such traffic and providing 2~3 second response times. But the scaling solution depends on your application."

First you will need to fine tune your Drupal application for 2~3 second response times. Steps suggested by Andy Read are a good starting point.

For anonymous use cases, it is a lot easier to serve 400,000+ pages/hr than when user login is required. It is just a question of adding more reverse proxy servers, using a CDN and adding more "Drupal Application Servers". Most media company sites use this approach.

When it comes to high traffic from logged-in users, the approach has to be more sophisticated than just throwing iron at the problem. As Andy Read suggested above, you will need to fine tune your Drupal application for this, and figure out a strategy for single or multiple memcached instances and for keeping the sessions table in memcached.

Thanks

Rajeev

  • Drupaler in Cupertino, SF Bay Area

Configuration and the users

tarun_nagpal's picture

Thank you all for the great help and support.

My site traffic is basically anonymous users. As for the hardware and server configuration, we have private dedicated servers.

Our team has Linux/Unix server experts, and I am ready to adopt any new technique such as Boost or memcache.

use varnish

neeravbm's picture

If it's mostly anonymous users, use Varnish and you'll be able to handle that traffic.

For anonymous traffic, you

dionesku's picture

For anonymous traffic, you won't have any problem.
Use Boost (https://www.drupal.org/project/boost) and you'll be serving static HTML pages, literally, which an average server can deliver at a rate of thousands per second.

Boost is great if you can't

Sam Moore's picture

Boost is great if you can't use Varnish, but in Tarun's case Varnish is a better choice.
Boost caches fully-rendered HTML pages on the server's filesystem for easy retrieval, which is great for shared-hosting accounts. It's all done via a module (with a little tweaking of settings.php), so it doesn't require a powerful admin account on the server.
Varnish, on the other hand, is a reverse-proxy server that caches in RAM, and is thus orders of magnitude faster. You need full control of your stack to install and configure it, though.

Agree 100%. A couple of years

Jamie Holly's picture

Agree 100%. A couple of years ago I was getting almost 40,000 hits/hour on a post on my D7 site running on a Linode 512 VPS (the smallest one) and it just hummed along.

The only thing I would add is that if your site is image intensive, then I would look at possibly adding in a CDN, just to keep those requests off the main server. When you start getting up into traffic like this, running off a single web head, then the more load you can outsource, the better.


HollyIT - Grab the Netbeans Drupal Development Tool at GitHub.

Varnish vs boost

mikeytown2's picture

If Boost can do 8k requests a second, then Varnish can do about 10k requests a second on that server. In terms of serving traffic they are closer than most people think. I can max out a gigabit LAN with either Boost or Varnish on decent hardware. Cache clears are where Varnish wins: removing files from the disk is slow compared to RAM.
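A back-of-the-envelope sketch of why the network becomes the bottleneck at these rates. It assumes the "gb LAN" above means a 1 Gbit/s link and ignores TCP/HTTP overhead, so real numbers would be somewhat lower; `max_response_bytes` is a hypothetical helper name.

```python
LINK_BITS_PER_SECOND = 1_000_000_000  # assumption: a 1 Gbit/s link


def max_response_bytes(link_bps: int, requests_per_second: int) -> float:
    """Largest average response size that fits through the link at a given rate.

    Ignores protocol overhead, so this is an upper bound.
    """
    return link_bps / 8 / requests_per_second


for label, rps in (("Boost", 8_000), ("Varnish", 10_000)):  # rates from the comment above
    size = max_response_bytes(LINK_BITS_PER_SECOND, rps)
    print(f"{label}: {rps} requests/second saturates the link at about {size:,.0f} bytes per response")
```

Under these assumptions, any page much larger than roughly 12 to 16 KB would saturate the link before either cache reaches its request-rate ceiling.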

SSL requirements will also

pingers's picture

SSL requirements will also affect your software stack. Varnish doesn't deal with HTTPS, so you need something to sit in front of Varnish to terminate SSL and pass on the requests, which can then be cached by Varnish. Nginx and Pound are two fairly commonly used tools for SSL termination in front of Varnish. E.g. http://www.acquia.com/blog/why-pound-awesome-front-varnish

As mentioned above, the stack

typhonius's picture

As mentioned above, the stack in its entirety is really what needs to be taken into account here. Consider the analogy:

Can an engine go 200kph?

On its own, an engine will sit and churn away quite happily, but without the rest of the car it won't budge from its initial position.

From my perspective, a site with mainly anonymous traffic on dedicated servers can make use of a well written VCL and Varnish to serve many pages, quickly, to many users. The rest of the stack will need to fill in for uncached pages, new content and editing. Each component of the stack forms a link in the chain to keep the site online, taking requests and delivering content.

It would be nice to break things down to a simple question of yes or no here, but going back to our car analogy, it's like asking if an engine can go so fast with no knowledge of the rest of the car, the road conditions or the driver.

Benchmarks

wwhurley's picture

For a recent project we subjected a site that was intended to support a large number of concurrent connections to load testing. It was a fairly simple setup, a c3.xlarge server for the web server and Varnish and another m3.large for the database. We spidered the site to retrieve all possible variations of content and then ran soak tests against the totality of possible URLs as well as more targeted tests against sections of the site that could not be cached effectively.

With a warm cache we could serve 1000 concurrent connections returning results in just under a third of a second, so our total throughput was ~ 3400 requests per second. Against uncached content it wasn't quite as rosy. We could support bursts of up to 250 concurrent connections to uncached content at about 80 requests per second, so for a cold start it could get dicey if traffic just appeared. Against content that could not be cached at all we could support only 50 concurrent connections at 17 requests per second.

We stopped the anonymous testing at 1000 concurrent requests since the math worked out to ~ 200k requests per minute, but we were only seeing a load of 1 on the server so we had a lot of headroom there. Anecdotal evidence suggests that a single Varnish node is more than capable of serving 3000 concurrent requests as long as you have sufficient memory allocated to cache all the objects without a whole lot of optimization.
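The figures above hang together via Little's law (throughput = concurrency / latency). The sketch below rechecks them under the assumption that the load test was closed-loop, with every connection keeping exactly one request in flight; the thread does not say how the test tool was configured, and `implied_latency` is a hypothetical helper name.

```python
def implied_latency(concurrency: int, requests_per_second: float) -> float:
    """Average seconds per request implied by Little's law."""
    return concurrency / requests_per_second


# (concurrent connections, requests per second) as reported in the benchmark above
scenarios = {
    "warm cache": (1000, 3400),
    "uncached (cold) content": (250, 80),
    "uncacheable content": (50, 17),
}

for name, (concurrency, rps) in scenarios.items():
    seconds = implied_latency(concurrency, rps)
    print(f"{name}: about {seconds:.2f} s per request")

# Per-minute throughput for the warm-cache case
warm_requests_per_minute = 3400 * 60
print(f"warm cache: {warm_requests_per_minute} requests per minute")
```

Under that assumption, the warm-cache case works out to just under a third of a second per request, while uncached and uncacheable pages take around three seconds each.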

We didn't put any time into optimizing the authenticated user process since there were very few places where it would be an issue. Had it been an issue, doing more caching at the Drupal layer, as well as playing with vcl_hash and maybe even some ESI work, could have made that significantly better.

High performance
