News site with Drupal

cola's picture

Hi All,

We are building a news site with Drupal 6 (20,000 visitors per day).
We use a load balancer with 5 Apache servers and a separate DB server (all virtual).
For the cron job (and also memcache and Solr) we use a separate server.

For caching we use the following modules:
- path cache
- memcache
- views content cache

We have disabled the page cache for now (the news is not up to date when new stories are saved).

What do you think we can do for more speed?
Does anyone have good suggestions?

Thanks for your input.

regards,
cola

Comments

For this type of traffic, a

fgm's picture

For this type of traffic, a single non-virtual server is typically quite sufficient, and likely even more efficient than setting up all these multiple virtual servers on the same physical host.

The page cache should always be enabled, though, typically in external mode if you are using Pressflow and a varnish proxy: you can always invalidate the appropriate entries when new content is generated. See expire.module and varnish.module for this.
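To make the "invalidate the appropriate entries" idea concrete, here is a minimal Python sketch. It is a toy model of the idea only, not how expire.module or varnish.module are implemented; the `PageCache` class and the `paths_affected_by_new_article` rule are hypothetical names made up for illustration.

```python
class PageCache:
    """Toy page cache: renders a path once, then serves the stored copy."""

    def __init__(self, render):
        self.render = render   # callable that builds a page for a path
        self.pages = {}        # path -> rendered page
        self.renders = 0       # how often the backend had to render

    def get(self, path):
        if path not in self.pages:
            self.renders += 1
            self.pages[path] = self.render(path)
        return self.pages[path]

    def purge(self, paths):
        # Drop only the listed entries; everything else stays cached.
        for path in paths:
            self.pages.pop(path, None)


def paths_affected_by_new_article(section):
    # Hypothetical rule: a new article changes the front page and its section page.
    return ["/", f"/{section}"]


if __name__ == "__main__":
    articles = []
    cache = PageCache(lambda path: f"{path}: {len(articles)} articles")

    cache.get("/")                       # rendered and cached
    articles.append("Breaking news")     # new content is saved
    cache.purge(paths_affected_by_new_article("sports"))
    print(cache.get("/"))                # rendered again, now up to date
```

The point is that the cache can stay enabled: saving new content purges only the pages that list it, so visitors never see a stale front page and the rest of the site keeps being served from cache.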

I fully agree

kbahey's picture

I fully agree that 5 Apache instances, virtualized or not, are overkill. One well configured physical host should be able to handle this level of traffic.

You can even do away with Varnish, and only use memcache with the page cache enabled. Ideally you will be running Apache with the worker MPM and fcgid for PHP to reduce memory usage.

This way you have fewer components to configure.

One thing we have found repeatedly on client sites where we do performance audits is that Pressflow may be incompatible with some modules, which then prevent the page cache from working because of how they handle sessions. In this case you may want to revert to Drupal core instead of Pressflow, and use the memcache page cache.

Drupal performance tuning, development, customization and consulting: 2bits.com, Inc..
Personal blog: Baheyeldin.com.

Just in short: You want to

Fabianx's picture

Just in short:

  • You want to additionally use: APC, Pressflow and Varnish.

You absolutely want some kind of caching, even if it is just 5 min. And you absolutely want all of your media to either live on a CDN or be served through Varnish as well.

TL;DR:

  • 500 requests to frontpage per second, frontpage has 30 static assets
  • Without Varnish: 500 dynamic (PHP) requests per second + 15000 static requests per second
  • With Varnish and 5min caching of pages and 24h for static assets: 1 request every 5 min, 30 requests per 24h

=> With Varnish there are thousands fewer requests per second.

=> You want Varnish.
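The numbers above can be checked with a few lines of Python. This is just a back-of-the-envelope sketch for a single page under the stated assumptions (500 hits per second, 30 static assets, 5 min page TTL, 24 h static TTL); the function names are made up for illustration.

```python
HITS_PER_SECOND = 500
STATIC_ASSETS_PER_PAGE = 30
PAGE_TTL_SECONDS = 5 * 60
STATIC_TTL_SECONDS = 24 * 60 * 60


def backend_load_without_cache(hits_per_second, assets_per_page):
    """Requests per second that reach Apache when nothing is cached."""
    dynamic = hits_per_second
    static = hits_per_second * assets_per_page
    return dynamic, static


def backend_load_with_cache(assets_per_page, page_ttl, static_ttl):
    """Average requests per second that reach Apache once the cache is warm.

    Each cached object is fetched from the backend once per TTL window,
    no matter how many clients ask for it.
    """
    dynamic = 1 / page_ttl
    static = assets_per_page / static_ttl
    return dynamic, static


if __name__ == "__main__":
    dyn, stat = backend_load_without_cache(HITS_PER_SECOND, STATIC_ASSETS_PER_PAGE)
    print(f"without cache: {dyn} dynamic + {stat} static requests per second")

    dyn, stat = backend_load_with_cache(
        STATIC_ASSETS_PER_PAGE, PAGE_TTL_SECONDS, STATIC_TTL_SECONDS
    )
    print(f"with cache: {dyn:.5f} dynamic + {stat:.5f} static requests per second")
```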

Scenario:

1 Front page, around 22 images, 4 CSS and 4 JS files. (Aggregation enabled) == 30 objects on the front page

500 hits go to the front page in one second: this means Apache needs to serve 500 requests, and all of these requests need to go to the database. Additionally, Apache needs to serve 30 static objects per hit, fetch them from the filesystem, etc.

Apache is kind of slow.

And the next second you have another 500 hits, etc.

It is easy to see that even with a powerful setup the server will soon go into contention (requests queueing up) or will be permanently under high load.

That makes: 500 dynamic + 15000 static requests per second

Now let's add Varnish with static resources cached for 24h (2 weeks is normally better, as media seldom changes) and the front page for just 5 min:

The 500 requests come in, first all want the frontpage:

Varnish is requesting the backend servers just once per page:

=> 1 request to Apache

That means, Varnish is hitting Apache with 1 request and then delivering this page to all 500 clients.

Then the clients will request the images and again Varnish will make just one connection per image:

=> 30 static requests to Apache

Once it got them from the backend, they'll be delivered to clients.

Now the next second, we have zero requests to the backend as everything is still cached, etc.

That means your Apache is free for 5 min and then varnish will hit it just with one request again.
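If you want to see this effect in numbers, here is a small Python simulation of a TTL cache sitting in front of a backend. It is only a sketch: `TtlCache` and `simulate` are invented names, the requests are processed one after another (no real concurrency), and it says nothing about how Varnish works internally.

```python
class TtlCache:
    """Toy cache that keeps each response for ttl_seconds."""

    def __init__(self, fetch, ttl_seconds):
        self.fetch = fetch        # callable that asks the backend for a URL
        self.ttl = ttl_seconds
        self.store = {}           # url -> (expires_at, body)
        self.backend_hits = 0

    def get(self, url, now):
        entry = self.store.get(url)
        if entry is not None and now < entry[0]:
            return entry[1]       # still fresh: the backend is not touched
        self.backend_hits += 1
        body = self.fetch(url)
        self.store[url] = (now + self.ttl, body)
        return body


def simulate(seconds, hits_per_second, ttl_seconds):
    """Send hits_per_second front page requests for each second; count backend hits."""
    cache = TtlCache(lambda url: f"<html>{url}</html>", ttl_seconds)
    for second in range(seconds):
        for _ in range(hits_per_second):
            cache.get("/", now=second)
    return cache.backend_hits


if __name__ == "__main__":
    # 10 minutes of traffic at 500 hits per second with a 5 min TTL.
    print(simulate(seconds=600, hits_per_second=500, ttl_seconds=300))
```

In this model, 300,000 client requests over 10 minutes turn into one backend request per 5 min window.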

Which makes: 1 request every 5 min + 30 requests every 24 h.

Of course your system will have more pages and you'll need to handle non-anonymous pages differently, but still for parallel requests to the same page, your Apache will be really glad not to hand out the same pages again and again.

And depending on your policy you might even be able to cache it for longer than 5 min, depending on URL, or even add some expiration logic to Drupal itself using expire and varnish modules.

But even with just 5 min, your Apache has thousands fewer requests per second to handle.

I hope that helps and explains a little the magic of caching!

Best Wishes,

Fabian

You could also

perusio's picture

use something like Nginx proxy_cache or fastcgi_cache (requires php-fpm or php-cgi) and serve that many users with a fraction of the resources currently used. Even with Apache as your backend serving PHP with mod_php, the response should be faster for pages that are not cached or not cacheable, since Nginx enables proxy buffering by default.

@cola Some of the suggestions

dalin's picture

@cola Some of the suggestions given by others might be useful for you (especially Pressflow + Varnish). But I would recommend ignoring them for now. By randomly choosing various "performance enhancements" you can get yourself into trouble very quickly; you may even make performance worse. If you went to see a doctor because you didn't feel well, you wouldn't want the doctor to just cut you open and start "fixing things" inside. The doctor would do tests and analysis to find out what the problem is and then prescribe an appropriate remedy. The same goes for performance tuning. Find out what your bottleneck is, choose an appropriate solution, then repeat.

We have disabled the page cache for now (the news is not up to date when new stories are saved).

This sounds like a bug with the way that you are creating news. If you are using Drupal's standard node system then the page cache is cleared whenever a new node is created.

--


Dave Hansen-Lange
Director of Technical Strategy, Advomatic.com
Pronouns: he/him/his

Hi All, thx for your

cola's picture

Hi All,

Thanks for your answers!
We are now looking into running tests on physical servers to see whether they perform much better.
We have also enabled the Drupal cache functionality. It is faster now!

What did you mean about changing some tables to InnoDB? Right now all tables are MyISAM.

Regards,
cola

Tables with lot of writing

exlin's picture

Tables with a lot of writes do not perform so well with MyISAM.

Write operations lock the whole table in MyISAM, versus a single row in InnoDB.
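A rough way to picture the difference, as a Python sketch. This is a deliberately simplified model (every write takes the same time, all writes arrive at once, nothing else is modelled) and the function name is made up; it is not a benchmark of either storage engine.

```python
from collections import Counter


def total_write_time_ms(row_ids, write_ms, lock_granularity):
    """Time until all concurrent writes are done, in a simplified lock model.

    row_ids: the row each concurrent write touches.
    lock_granularity: "table" (one writer at a time for the whole table)
                      or "row" (only writes to the same row wait for each other).
    """
    if lock_granularity == "table":
        # Every write queues behind every other write.
        return len(row_ids) * write_ms
    if lock_granularity == "row":
        # Writes to different rows run in parallel; the busiest row decides.
        busiest = max(Counter(row_ids).values(), default=0)
        return busiest * write_ms
    raise ValueError(f"unknown lock granularity: {lock_granularity}")


if __name__ == "__main__":
    writes = [1, 2, 3, 4]  # four concurrent writes to four different rows
    print(total_write_time_ms(writes, 10, "table"))
    print(total_write_time_ms(writes, 10, "row"))
```

With four writers hitting four different rows, the table lock makes them wait in line while the row lock lets them proceed side by side. That is why busy tables with many writes tend to benefit most from InnoDB.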

Re: News site with Drupal

alexpavlovic's picture

You did not mention what the majority of the website traffic is (anonymous users, logged-in users, etc.). If this is a news website, chances are the majority is anonymous traffic. Using a load balancer and 5 Apache servers seems, at a glance, like a poor use of the resources at hand. For news websites we generally use Boost + nginx. Boost integrates with D6 a little better than Varnish does and works wonders with nginx static file serving. Also it is easier to set up. On some websites where we use cheaper hardware we can easily pull ~2800 requests per second for static content. This is on a single node. You may need to re-evaluate your strategy.

Boost integrates with Drupal

dalin's picture

Boost integrates with Drupal better than Varnish does...

I strongly disagree. Many of the sites that consult with us on performance that are running Boost are having severe issues during cache clearing. We get them switched over to Varnish and they see drastic improvements in scalability.

Also it is easier to set up.

Again I disagree. I'd say they require about the same amount of effort.

--


Dave Hansen-Lange
Director of Technical Strategy, Advomatic.com
Pronouns: he/him/his

Re: Boost integrates with Drupal

alexpavlovic's picture

@dalin,

I strongly disagree. Many of the sites that consult with us on performance that are running Boost are having severe issues during cache clearing. We get them switched over to Varnish and they see drastic improvements in scalability.

I actually edited my comment before you commented; I realized I had not mentioned D6. Please re-read. None of our setups have experienced any cache clearing issues. I am not sure where or how you got these.
How did you measure these drastic improvements? Were these on D7 or D6?

Again I disagree. I'd say they require about the same amount of effort.

Not true, boost works out of the box on D6, and Varnish requires patching of core on D6. D7 or PressFlow is another story. @cola mentioned that they are using stock D6.
This is why I mentioned what I said above and it still holds.

Additionally if you read: http://drupal.org/project/varnish

"At the moment, using Varnish to any effect at all requires Drupal 7 or PressFlow."

Cheers.

Hi All, thx for all your

cola's picture

Hi All,

thx for all your answers!

We are now running some tests with physical servers; if they are faster than VMs, we plan to switch to physical.
Right now we are testing Pressflow (we currently use Drupal 6). If we have no problems with our modules, then I think we will switch to Pressflow and also use a master/slave DB.

How does Varnish work with logged-in users?

regards and thx,
cola