Posted by repoman on February 12, 2013 at 1:46pm
Yesterday, an internal staff member was running two XML sitemap scans on our site, and we saw loads of 12-15 and higher on the Linux web head servers. We are running a Varnish/Pound proxy in front of the Drupal web head, with the MySQL DB on a third server. We asked them to stop and the load disappeared.
My question is this: since anyone could conceivably do the same from the outside, what is the best practice for making our server more resilient to a scan of every page on the site from a single IP? Is there a way to throttle the connections so that the load is more manageable?
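For illustration only, per-IP throttling of this kind is often described as a token bucket: each IP gets a small budget of requests that refills at a fixed rate. A minimal sketch in Python, assuming a hypothetical class name and made-up limits (this is not something running on our servers, and in practice it would live in the proxy or web server rather than in application code):

```python
import time


class PerIpTokenBucket:
    """Allow each client IP `rate` requests per second, with bursts up to `burst`.

    Hypothetical illustration; the name and parameters are assumptions.
    """

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)      # tokens added per second
        self.burst = float(burst)    # maximum tokens an IP can hold
        self.clock = clock           # injectable clock, handy for testing
        self.buckets = {}            # ip -> (tokens, last_seen_timestamp)

    def allow(self, ip):
        """Return True if this request may proceed, False if it should be throttled."""
        now = self.clock()
        # A new IP starts with a full bucket.
        tokens, last = self.buckets.get(ip, (self.burst, now))
        # Refill according to elapsed time, never beyond the burst size.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1.0:
            self.buckets[ip] = (tokens - 1.0, now)
            return True
        self.buckets[ip] = (tokens, now)
        return False
```

A scanner hammering the site from one IP would burn through its bucket quickly and be refused until it refills, while other visitors keep their own separate budgets.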
Thanks!

Comments
Yes you can
I would not suggest trying this from the Drupal side.
On the other hand, you can handle it on the web server.
But first you may need to find out where the work was actually being done. You say you are using a reverse proxy, but you were scanning the whole site, right?
If all the requests were being answered from the reverse proxy's cache, there would be no problem: Varnish can handle those easily, with no real heavy processing involved.
The problem comes when all these requests reach the web server and then the application (Drupal), because that is where all the threads, DB requests, file processing, etc. happen.
Apache, for example, lets you set how many threads are allowed at the same time: the minimum, the maximum, how many may sit idle, and other similar settings. The right values depend on how your web server is configured.
Also, Varnish has an option to set how many requests it will send at the same time to each web server you configure.
So you have two places where you can limit how many requests the web server receives at the same time, and you really should set both. The exact configuration starts with a little theory and is then adjusted by trial and error; you can use tools like the ab command or JMeter to see how much your server and application can handle without problems.
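As a rough illustration of the "little theory" step: a common rule of thumb (not something measured on your servers) is to cap Apache's concurrency at the memory you can spare divided by the memory one Apache process uses. A minimal sketch in Python, where the function name and all the numbers are made-up assumptions:

```python
def estimate_max_clients(total_ram_mb, reserved_mb, per_process_mb):
    """Rule-of-thumb starting value for Apache's concurrency limit.

    total_ram_mb   -- RAM on the web head (assumed figure)
    reserved_mb    -- RAM kept back for the OS, caches and other services (assumed)
    per_process_mb -- typical resident size of one Apache process running Drupal (assumed)
    """
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    usable_mb = total_ram_mb - reserved_mb
    # Never go below one worker, even on a badly undersized box.
    return max(1, usable_mb // per_process_mb)


# Example with invented numbers: 8 GB box, 2 GB reserved, 60 MB per process.
print(estimate_max_clients(8192, 2048, 60))  # prints 102
```

The result is only a starting point; the ab or JMeter runs then tell you whether to raise or lower it.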
~Nestor
Nestor Mata
http://nestor.profesional.co.cr/es
Throttling
Hi Nestor - Thanks for your input!
Yes, we have a reverse proxy (Varnish for port 80 and Pound for port 443). However, since the scan was hitting every page on the site, only the pages that had previously been hit and cached were served from Varnish, and those seem to be primarily the landing pages rather than the articles. So yes, Drupal had to bootstrap for all those uncached pages.
We will look into adjusting Apache on the application (Drupal) server to limit the number of connections; I recall the worker and prefork sections of httpd.conf.
We will also look into doing the same in Varnish, as that would probably be best.
We also have APC and Memcache enabled on Pressflow to help, but perhaps they need some tweaking. Do you know of any ab or JMeter settings that basically every web site should be able to handle? We have quite a lot of resources for our site, and a simple XML sitemap scan should not be able to lock it up so easily.
The problem is we have a small staff to deal with this and just need a good start in the right direction and then we will be able to find our way. There is just so much information out there that we don't know where to begin.
Also, what is your opinion on this info: http://www.netmagazine.com/features/top-15-drupal-performance-tips
Thanks again for your input and anyone else's.
Performance tweaks
Here's the wiki
D6: http://groups.drupal.org/node/187209
D7: http://groups.drupal.org/node/210683
When your site is under load, how does MySQL look?