Handling a large spike of un-cacheable requests

brad.curnow

Hi all.

I've been tasked with building a high-performance, high-capacity website for a client, so I've been reading up on best practices. I'm getting familiar with the cache-based approach for both authenticated and anonymous users, APC, Varnish, etc. However, I'm finding it difficult to find best practices regarding non-cacheable requests.

-- EXAMPLE --

Suppose my website were to give away 500 free donuts to the first 500 people who correctly answer a multiple choice question. This figure would be stored in a field called "QTY" on the "freebie" content type.

Also suppose that this "offer" goes live at 1pm and that as many as 12,000 authenticated users try to claim all at once...

-- END --

Initially I thought about simply loading the node, reducing QTY by 1 and repeating until QTY reaches 0. Aside from being very inefficient, though, this method seems like it would place a massive load on the server.

I guess bypassing loading the node would be a start? Even better if we could bypass the entire bootstrap? Perhaps a custom module could load the initial QTY value and then handle all the incoming requests? So many potential options and I have no idea where to start!

What would be considered best practice for handling these kinds of situations? I suppose simply adding a layer of web servers behind a load balancer would help, but what could one do in terms of coding and as a general approach to such situations?

Thanks!

Attachment: process.jpg (51.45 KB)

Comments

I believe you should think of

vaibhavjain

I believe you should think of doing as much of the work as possible in Memcache.

A user comes to the page and gives their answers, which can be verified via Memcache; if correct, take the user to the freebie page. The counting can be done in Memcache too. Lock the Memcache entry while it's in use, or, as I did in one case, start using a lock only once the number of items left falls below a specified threshold.

For the database storage, you can either hit the DB every minute to update the value, or hit the DB on every successful attempt, which can be a PITA.
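The atomic-counter idea can be sketched as follows. This is a minimal, illustrative example in which a tiny in-memory stub stands in for a real Memcache connection; in production the stub's `decrement()` would be `Memcached::decrement()`, which is atomic on the server side (note that real memcache stops at zero rather than failing, so the exhausted case needs explicit handling there). All names here are made up for illustration.

```php
<?php
// Illustrative stand-in for a Memcache connection. A real setup would
// call Memcached::decrement(), which is atomic server-side; this stub
// just models the semantics: decrement succeeds while stock remains.
class CounterStub {
    private $values = [];

    public function set($key, $value) {
        $this->values[$key] = $value;
    }

    // Returns the new value after decrementing, or false once the
    // counter is exhausted (a real memcache decrement stops at 0,
    // so production code would check for that explicitly).
    public function decrement($key) {
        if (!isset($this->values[$key]) || $this->values[$key] <= 0) {
            return false;
        }
        return --$this->values[$key];
    }
}

// A claim succeeds only while the counter is still above zero.
function claim_freebie($cache, $key) {
    return $cache->decrement($key) !== false;
}

$cache = new CounterStub();
$cache->set('freebie_qty', 3);

$results = [];
for ($i = 0; $i < 5; $i++) {
    $results[] = claim_freebie($cache, 'freebie_qty');
}
var_export($results);  // first 3 claims succeed, the rest fail
```

Because the decrement itself is a single atomic operation on the cache server, no application-level lock is needed until you want the below-threshold behaviour Vaibhav describes.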

Vaibhav Jain

Thanks Vaibhav. Memcache

brad.curnow

Thanks Vaibhav. Memcache usage certainly seems to be the way to go. Hitting the DB on every successful attempt would greatly reduce the benefit of using memcache in the first place right? Better to hit the DB only when, say, the free items run out? Or perhaps when there's a lull in requests... hmm.

SSDs for the DB server would

dalin

SSDs for the DB server would also help. Avoiding making them authenticated users would be better as well, but if you must, then take a look at AuthCache.

--


Dave Hansen-Lange
Director of Technical Strategy, Advomatic.com
Pronouns: he/him/his

Indeed! Good thing to keep in

brad.curnow

Indeed! Good thing to keep in mind when choosing hosting provider. I was thinking about using Pantheon, or the Rackspace Cloud (Sydney servers for Australian website). Pantheon is attractive because everything is managed, but I'm not sure how flexible they can be?

First quantify, then decide how radical you need to be!

andy_read

Hi. An interesting problem. But probably the first step is to quantify exactly what you mean by "12,000 authenticated users try to claim all at once". Specifically what time period are they coming in over and what is the peak/average rate? And what is your target response time?

If they really are all coming in over just a few seconds, then you probably need to serve a completely static page from a Drupal cache and make the "first 500" decision with a little bit of AJAX to a standalone lightweight server like node.js. Just record the session ID or similar and return 'win' to the first 500, then sort out the details in the background later!

Andy

Thanks Andy. I always like to

brad.curnow

Thanks Andy. I always like to plan for the absolute worst case scenario, so in this instance I'd like to pretend we have 12k users hammering the system within a very short time period - seconds or tens of seconds. Your suggestions seem to line up with the kind of approach I was musing over, so I'm glad I was on the right track!

Would you say this kind of approach could scale easily? I would have thought yes but this high performance stuff is very new to me still.

Lets assume that a user needs

fabianx

Let's assume that a user needs to click "Claim" before anything happens.

A good architecture would be, for example, to AJAX/POST to a PHP-only script, which then decrements a memcache counter - a fast, atomic operation.

In that case the site can be completely static and the response of the AJAX request is either "yay!" or "nay!".

Depending on the outcome, you forward the user to a "you have won" page with a "token" from your PHP script (stored, for example, in MySQL, or calculated via a secret key and the user's number (1-500), like drupal_get_token() does), or to a "better luck next time" page.

The "you have won" page is only ever seen by 500 users, so it can be "authenticated": check the token and let the user enter her contact details.
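The drupal_get_token()-style variant of that token can be sketched with an HMAC over the winner's number. The secret key and function names below are illustrative, not Drupal's actual implementation:

```php
<?php
// Sketch of a drupal_get_token()-style winner token: an HMAC over the
// winner's number (1-500) using a server-side secret. The key value
// and function names are placeholders for illustration only.
const SECRET_KEY = 'replace-with-a-real-private-key';

function winner_token($winner_number) {
    return hash_hmac('sha256', (string) $winner_number, SECRET_KEY);
}

// The "you have won" page recomputes the HMAC and compares in
// constant time, so tokens cannot be forged without the secret.
function verify_winner_token($winner_number, $token) {
    return hash_equals(winner_token($winner_number), $token);
}

$token = winner_token(42);
var_dump(verify_winner_token(42, $token));   // bool(true)
var_dump(verify_winner_token(43, $token));   // bool(false)
```

The nice property is that the claim script needs no database write at all: the winner's number plus the secret is enough for the next page to verify the claim.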

That would be a pretty simple, but very scalable architecture as both PHP and memcache on the "API" server can be scaled out easily horizontally. (api.mydomain.com)

I have implemented something like that once for a Voting System and it worked very very well.

Have Fun and thanks for posting such an interesting problem!

Thanks,

Fabian

Nice. Thanks Fabian.

brad.curnow

Nice. Thanks Fabian. A reasonably common line of thought certainly seems to be developing here! There are so many variables and potential bottlenecks to deal with at this level, but the muddy waters are starting to clear for me. Thanks!

So to summarise...

brad.curnow

Excellent info, thanks lads!

So if I were to abstract a little in an effort to formulate a general approach to this kind of situation, could I say that the following process would be a good start:

  • Serve cached pages, then AJAX smaller snippets of info in/out as required to reduce server load.

  • Skip Drupal and use very lightweight, optimised PHP (or perhaps other) code to execute whatever can't be cached.

  • Do anything that can possibly be done on the client side, on the client side.

  • Write code that progressively screens incoming requests, directing users to cached assets at every possible juncture.

  • When the user does hit a stage where further processing is required, do as much as possible in memcache.

Obviously the details will vary in each use case, but does this seem like a good starting point/check list to you guys?
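As a rough illustration of the second and last points, the whole un-cacheable part of the flow can boil down to one atomic decrement plus a tiny JSON response from a standalone script (no Drupal bootstrap). The memcache call is shown as a comment so the sketch stays self-contained, and all names are hypothetical:

```php
<?php
// Hypothetical standalone claim endpoint: the cached page POSTs here
// via AJAX, the only real work is one atomic counter decrement, and
// the response is a tiny JSON snippet. $remaining stands in for the
// result of a production call such as:
//   $remaining = $memcached->decrement('freebie_qty');
function handle_claim_request($remaining) {
    if ($remaining === false || $remaining < 0) {
        // Stock exhausted: a small, cacheable-looking "nay" response.
        return json_encode(['result' => 'nay']);
    }
    // Claim succeeded; a real script would also issue the winner token.
    return json_encode(['result' => 'yay', 'remaining' => $remaining]);
}

echo handle_claim_request(12) . "\n";     // {"result":"yay","remaining":12}
echo handle_claim_request(false) . "\n";  // {"result":"nay"}
```

Everything else on the page stays static, which is what keeps the 12k-users-in-seconds case survivable.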

Brad, sounds like you've got

andy_read

Brad, sounds like you've got a great starting strategy there. As you say, Fabian and I are suggesting the same basic idea - the only difference being the possibility of using something non-PHP like node.js for the 'little fast bit'.

I've got no real experience with node.js to recommend it, but various bits of info I've picked up recently suggest it may be very good for anything lean and mean and close to real time. A quick Google search on node.js vs PHP brought up this article, which is particularly interesting in terms of being hit by lots of clients: http://blog.loadimpact.com/2013/02/01/node-js-vs-php-using-load-impact-t....

You might also want to combine it with MongoDB (no-SQL) as a leaner place to initially store your winners.

Andy

I have not recommended

fabianx

I have not recommended node.js for two reasons:

a) It is more difficult to do the form submission / memcache integration there. NodeJS is great for persistent state and real time, though.

b) More importantly, it is not easy to scale NodeJS horizontally.

Edit: However, this is mostly true for the Drupal NodeJS solution. If using a separate node.js web server like in the article, there is no state and it would obviously work well. I still maintain that PHP is more than enough for this job, though, and easier to deploy on most standard infrastructures.

With the PHP/Memcache solution, you just add another box and add it to the LB: done. With NodeJS holding state, this might be more complicated. Serving this PHP via nginx should be really lightweight, but it should also work from its own Apache box.

The only bottleneck / SPOF is the memcache instance itself, but one could use heartbeat for that if needed.

MongoDB is overkill, as there are only 500 writes for the winners, which is marginal; and given that they need to fill out a form and user input is "slow", one bigger web node and a normal DB are probably more than enough.

Thanks,

Fabian

Makes sense. What kind of

brad.curnow

Makes sense. What kind of numbers (or other relevant factors) do you think would need to be in play before nodeJS/mongo would become the optimal choice?

Thanks mate, that's a very

brad.curnow

Thanks mate, that's a very interesting article too. I didn't realise nodeJS had issues scaling on multicore processors. Good to know! Could still be a good option to handle massive loads, though.

Great info - thanks

brad.curnow

Great info - thanks gents!

Sounds like the basic approach is okay.

I would imagine that NodeJS and Mongo could be a step up if required for a truly massive influx (I'm thinking McDonald's giving away free burgers or something) where you may have 100k users smashing the system at or around the same time - especially if they're all getting the multiple-choice question correct!

If I understand correctly, the basic difference between approaches is that a PHP/SQL solution could more easily be scaled out as needed, whereas something like nodeJS/Mongo would be better to start off on machine(s) that are capable of handling the maximum forecast load and then not have to scale at all.

Another thought I had was to have a number of servers handling the requests - for example, if we give away 5000 donuts, perhaps we have 5 servers each handling 1000 prizes (i.e. the memcache counter is 1000 on each machine, for a total of 5000). I'd expect that to have some benefits in terms of alleviating queuing issues, max connections, etc.

Of course doing it this way would mean having a module divvy up the prizes and then delegate the info out to the process servers... probably overkill but worth keeping in mind I guess.
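The divvying-up step itself is just arithmetic; here is a hedged sketch, with made-up server names, of how a module might split a prize pool across several counter servers:

```php
<?php
// Sketch of splitting a prize pool across several counter servers, as
// described above: 5000 donuts over 5 servers = 1000 per counter.
// Server names are illustrative placeholders.
function divvy_prizes($total, array $servers) {
    $per_server = intdiv($total, count($servers));
    $allocation = array_fill_keys($servers, $per_server);
    // Hand any remainder to the first server so no prizes are lost.
    $allocation[$servers[0]] += $total % count($servers);
    return $allocation;
}

$allocation = divvy_prizes(5000, ['mc1', 'mc2', 'mc3', 'mc4', 'mc5']);
var_export($allocation);  // each of the five servers gets 1000
```

Each web node would then decrement only its own shard's counter, so no single memcache instance sees all 12k requests.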

Anyway, looks like the groundwork has been laid - time to push forward towards an actual solution for my application. Thanks again for your input everyone!

High performance
