Cache thrashing solution - minimize cached urls?

ebrueggeman's picture

I'm working on an unusual problem with the relaunch of a high-traffic site (2-5 million page views per day) and want to run an idea by the group. A single landing page will receive a significant amount of traffic from an external source, and each request will carry a unique query string. Ex: /landing-page?uid=438474383487348&foo=bah.

We definitely want to use Drupal's cache_get() and cache_set() (though the specific implementation is still up in the air) because of the high traffic, and we need to capture the query string information and process it if the user creates an account. The problem with Drupal's cache framework is that it stores cached pages based on $_SERVER['REQUEST_URI'], which includes the query string args. Because the query args will be unique per visitor, every request would create a new cache entry that is never reused, which would likely overload our cache tables without providing any value.

Does anyone have experience spoofing $_SERVER['REQUEST_URI'] to remove certain query arguments? I've successfully been able to strip all query args, store them in a cookie, and then overwrite $_SERVER['REQUEST_URI'] with the stripped URI before caching kicks in. Drupal then thinks the page is the same one the previous visitor requested and serves up the cached copy. When the user registers, we process the cookie.
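A minimal sketch of that idea in plain PHP, for illustration only. The function names, the `landing_args` cookie name, and the one-day cookie lifetime are all hypothetical, and the sketch assumes it runs before Drupal's page cache lookup (where exactly to hook that in, and how to detect an anonymous visitor that early, is left open).

```php
<?php
/**
 * Split a request URI into its path and its parsed query arguments.
 * Returns array($path, $args).
 */
function landing_split_request_uri($request_uri) {
  $parts = explode('?', $request_uri, 2);
  $args = array();
  if (isset($parts[1])) {
    parse_str($parts[1], $args);
  }
  return array($parts[0], $args);
}

/**
 * Move the query args into a cookie and rewrite REQUEST_URI without them.
 *
 * $server is normally $_SERVER. $set_cookie is injectable only so the
 * function can be exercised without sending headers. Returns TRUE if the
 * URI was rewritten.
 */
function landing_normalize_request(array &$server, $is_anonymous, $set_cookie = 'setcookie') {
  if (!$is_anonymous) {
    return FALSE;
  }
  list($path, $args) = landing_split_request_uri($server['REQUEST_URI']);
  if (empty($args)) {
    return FALSE;
  }
  // Hypothetical cookie name and lifetime; processed later at registration.
  call_user_func($set_cookie, 'landing_args', http_build_query($args), time() + 86400, '/');
  $server['REQUEST_URI'] = $path;
  return TRUE;
}
```

Called as `landing_normalize_request($_SERVER, $is_anonymous)`, every anonymous hit on /landing-page?uid=... would then look like a plain /landing-page request to the cache.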

Does anyone see any issues with this approach, or has anyone done anything similar? I would only enable the processing logic for anonymous users (!$user->uid).

Comments

Does anyone have experience

dalin's picture

Does anyone have experience spoofing the $_SERVER['REQUEST_URI'] to remove certain query arguments? I've successfully been able to remove all query args, and set a cookie with them, then overwrite $_SERVER['REQUEST_URI'] before caching has kicked in without the additional query args. Drupal then thinks the page is the same as the visitor before it, and serves a cached page up. When the user registers, we process the cookie.

Brilliant idea. Do you do this on the webserver level (via .htaccess or the like)? I see it working fine as long as you don't need to alter the content of the page based on the query string.

--


Dave Hansen-Lange
Director of Technical Strategy, Advomatic.com
Pronouns: he/him/his

I've done something similar

mbutcher's picture

I've done something similar on an Nginx server, where we basically stripped off the GET parameters before passing the request on to PHP for processing. In our case, though, the data was just used for analytics, so we didn't need to store it (the fact that we could get it from the access log was enough). It seems that with the right web server configuration you should be able to dump the value into a cookie.

Another way of reducing cache impact.

mbutcher's picture

Another thing we've done, which might make sense in your case, is to offload the process of serving from the cache to Nginx. I documented it here: http://technosophos.com/content/53900-speedup-nignx-drupal-and-memcache-... (Wow... long URL...). It's a little... outside the box, and it certainly won't work for everyone. But it dropped our load average way down.

In this scenario, we actually munged the cache key: we stripped the GET params from the URL before using it as the key in the cache, so both of these URLs would be served from the same cache entry:

http://example.com?a=b
http://example.com?b=c
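As a rough illustration of that kind of key munging (not the code from the linked write-up), here is a sketch in plain PHP. The function name `page_cache_key` and the optional `$keep` whitelist are assumptions added for the example; with an empty whitelist, all GET params are dropped.

```php
<?php
/**
 * Build a page-cache key from a URL, ignoring GET params.
 *
 * $keep is a hypothetical whitelist of params that do change the page
 * content and so must stay part of the key.
 */
function page_cache_key($url, array $keep = array()) {
  $parts = explode('?', $url, 2);
  $base = $parts[0];
  $kept = array();
  if (isset($parts[1]) && $keep) {
    parse_str($parts[1], $args);
    // Keep only whitelisted params, in a stable order.
    $kept = array_intersect_key($args, array_flip($keep));
    ksort($kept);
  }
  return $kept ? $base . '?' . http_build_query($kept) : $base;
}
```

With no whitelist, `page_cache_key('http://example.com?a=b')` and `page_cache_key('http://example.com?b=c')` produce the same key, so the second request is a cache hit.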

I see from your link that you

ebrueggeman's picture

I see from your link that you allocated 8gb of memory to memcached - do you know approx how much memory the average nginx request took to serve?

Rough answer

mbutcher's picture

I'll give you some basic info and you can extrapolate from there, or ask more specific questions.

We actually only dedicated 5G to the page cache. The additional memory was allocated to other caches for other things.

Each cache entry took no more than 1M of memory. Nginx consumes a pretty constant amount of memory: 2.5M for our servers. I think that's per-process (we run four, one bound to each processor). Serving a page request does not substantially inflate that number.

PHP-FastCGI takes around 32M per thread on average for us, but this will vary considerably based on what parts of Drupal you are using. Of course, for a direct cache hit in my configuration, PHP isn't invoked (though other simultaneous requests obviously require that I keep that value in the calculations).

Thus, for one server, we can purchase a system with 8G of RAM, install Nginx/PHP/Memcached, fire off a huge load test, and still have 1G of RAM remaining basically untouched.

(caveat: this is assuming that the database lives on a different server. MySQL is a totally different story.)

Boost

mikeytown2's picture

You can operate Boost in this way: serve the cached URL while completely ignoring the query string. To get it to work exactly how you want would require some small modifications to the module and some quick .htaccess tricks, though. My guess is less than half an hour to get it working with Boost (5 min if the code works correctly on the first try).

High performance
