Pressflow+APC+Varnish: External Cache or Drupal Default Dilemma

soyarma's picture

Hey Folks;

I've been doing a lot of performance tweaking on some of the sites I run (specifically http://latam.kaspersky.com). We have been running Varnish for some time, and I recently managed to convince the powers-that-be that APC is perfectly safe and we're nuts for not running it.

After turning APC on I saw admin page generation times drop from 2s(ish) to around 300-500ms. This was awesome, but nowhere near as fast as my dev environment, where they were down in the sub-100ms range (non-admin pages were even as fast as 20ms). After some tweaking I found that this was (of course) because my production server was behind Varnish and we had the caching mode set to External on Drupal's performance page.

Changing this to Drupal Default and then testing with ApacheBench (I know, not the best real-world test, but good for hammering), I started to get the 100ms-ish numbers for authenticated users.

So this is my dilemma: Swapping to Drupal Default causes me to create a SESS cookie for folks who fill out forms (I believe--correct me if I'm wrong--this is the only added cookie instance one gets when switching to Drupal Default from External) as opposed to the shorter-lived NO_CACHE cookie they get with External Cache. This means those visitors keep passing through to the backend for longer and add load there.

However, I have as many as 5 form submissions a second on some of the sites I run, so all of those folks are cookied one way or another and get dynamic content served to them after form submission for (usually) the remainder of their visit anyway. It seems to me that in my scenario using Drupal Default for my cache is better than External Cache because they will then get faster page loads after being cookied, and non-cookied users will still get their content served out of Varnish.

Is there some aspect I'm missing where I really do lose out, or is this a logical solution?

Comments

Ignore cookies?

steve.colson's picture

It sounds like your site was working perfectly fine with Varnish in front of all of these form submissions previously. Unless I am missing something, the only thing keeping your setup from putting the load on Varnish instead of the webhead is the presence of a cookie, correct?

There is a fairly easy solution I can think of, and it isn't terribly hard.

People who use Google Analytics face this problem too. You can keep the cookie around, but have Varnish strip it off for any pages but the ones you want that cookie for. Since the GA cookie is really only used client-side (JavaScript), most people don't distinguish what page it is for but instead just ignore it outright. In your Varnish config file, you could include:

sub vcl_recv {
  // Remove has_js and Google Analytics cookies.
  set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(__[a-z]+|has_js)=[^;]*", "");
  ...
}
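One possible way to round out a snippet like this (not necessarily what the `...` above stands for) is to tidy the header after the removal and drop it entirely when nothing is left, so the request becomes cacheable again. This is only a sketch, assuming Varnish 2.1/3-style VCL; other versions differ slightly in syntax:

```vcl
sub vcl_recv {
  if (req.http.Cookie) {
    # Same pattern as above: drop has_js and double-underscore (Google Analytics) cookies.
    set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(__[a-z]+|has_js)=[^;]*", "");
    # Remove a leading "; " that the substitution can leave behind.
    set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
    # If only whitespace remains, unset the header so Varnish treats the request as cookie-free.
    if (req.http.Cookie ~ "^\s*$") {
      unset req.http.Cookie;
    }
  }
}
```

Without the final check, an empty Cookie header would still be present on the request and could keep it from being served out of cache.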

If you strip the cookie everywhere because you don't want to put page-detection code into your Varnish config, you would obviously break the ability for administrators to log in. The best solution in that case is to run another domain or sub-domain (admin.example.com instead of www.example.com) and have Varnish ignore anything at that domain that isn't a static object (gif|png|jpg|txt|js|css|...). It can still all be running on exactly the same stack, but it allows you to segment it out onto different webheads later if you really wanted, or to apply significantly different caching rules to the admins than are applied to anon.
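The sub-domain idea could be sketched roughly as follows. It assumes Varnish 2.1/3-style VCL, uses the placeholder hostname admin.example.com from above, and lists only the extensions named above; adjust all of these to your own setup:

```vcl
sub vcl_recv {
  # admin.example.com is a placeholder; an optional port is tolerated in the Host header.
  if (req.http.host ~ "^admin\.example\.com(:[0-9]+)?$") {
    # Anything that is not a static object goes straight to the backend, cookies intact.
    if (req.url !~ "\.(gif|png|jpg|txt|js|css)(\?.*)?$") {
      return (pass);
    }
    # Static objects on the admin host don't need cookies, so they can still be cached.
    unset req.http.Cookie;
  }
}
```

Keeping this rule keyed on the hostname means the cookie stripping for www.example.com can stay aggressive without affecting admin logins.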

soyarma's picture

Hey Stephen;

I actually already use whitelisting for cookies. I strip all cookies except for NO_CACHE, SESS and DRUPAL_UID. However, I need to provide non-cached content on form submit for a variety of reasons. Cookies aside, there are two other reasons why I don't want to use Varnish as a crutch and why I need the webserver to handle the real load.

  1. POST requests aren't cached regardless of cookies, so all of that traffic still flows through to the backend.
  2. Attacks from systems that generate dozens of new URLs that all have to pass through to the backend anyway (as they aren't in Varnish's cache). A great example of this would be Acunetix. It can take a server down quite nicely, Varnish or no--especially since it is recommended to pipe HTTP 1.0 requests straight to the backend with Varnish and many tools/attacks use HTTP 1.0.

My admins already do use a subdomain for their work, but I do have some authenticated users as well, so I need to be able to serve them dynamic content through Varnish.
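For anyone following along, a cookie whitelist of the kind described at the top of this comment might look something like the sketch below. It is an illustration, not a copy of a production config: it assumes Varnish 3-style VCL (string concatenation with `+`), and it assumes session cookie names of the form SESS followed by a hash.

```vcl
sub vcl_recv {
  if (req.http.Cookie) {
    # Normalise the header so every cookie is preceded by a bare ";".
    set req.http.Cookie = ";" + req.http.Cookie;
    set req.http.Cookie = regsuball(req.http.Cookie, "; +", ";");
    # Mark the whitelisted cookies by putting a space after their ";".
    set req.http.Cookie = regsuball(req.http.Cookie, ";(SESS[a-z0-9]+|NO_CACHE|DRUPAL_UID)=", "; \1=");
    # Delete every cookie that was not marked.
    set req.http.Cookie = regsuball(req.http.Cookie, ";[^ ][^;]*", "");
    # Trim leftover separators at either end.
    set req.http.Cookie = regsuball(req.http.Cookie, "^[; ]+|[; ]+$", "");

    if (req.http.Cookie == "") {
      # Nothing whitelisted remains: treat the request as anonymous and cacheable.
      unset req.http.Cookie;
    } else {
      # A whitelisted cookie is present: send the request to the backend for dynamic content.
      return (pass);
    }
  }
}
```

For example, a header of `has_js=1; SESSabc=xyz; __utma=1` would be reduced to `SESSabc=xyz` and the request passed, while `has_js=1; __utma=1` would end up with no Cookie header at all.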

cookie_cache_bypass module

burningdog's picture

If you're using Pressflow, take a look at the cookie_cache_bypass module - it allows anonymous users who have submitted forms to bypass reverse proxy caches and database replication to get fresh pages, presumably containing results of their form postings.

https://wiki.fourkitchens.com/display/PF/Using+database+replication+with...

soyarma's picture

I do already use that; the Pressflow cookie_cache_bypass module is what generates the NO_CACHE cookie that I whitelist in my Varnish config.

My core question seems to have gotten a bit lost.

Since the webserver with Drupal set to 'Normal' cache can deliver pages in under 100ms (which is not significantly slower than Varnish), am I better off swapping from External to Normal cache, so that users who don't get Varnish-cached pages (by virtue of cookies or POST requests) get the benefit of the faster performance?

Is there some other caveat to using Normal cache in combination with a reverse proxy that I am missing?
