Posted by dpardo on December 21, 2010 at 5:40pm
When Drupal serves a 404 page, it has to load the entire Drupal stack.
On sites with a lot of modules, as in my case, this means PHP needs 60 MB of RAM just to serve a "page not found"!
That is a problem if, for example, you have software probing your site with random URLs, such as McAfee.
So we made a small hack to index.php that checks whether a page exists before loading the full Drupal stack.
Here is the code. Does anyone know a clean way to fix this problem?
<?php
/**
 * @file
 * The PHP page that serves all page requests on a Drupal installation.
 *
 * The routines here dispatch control to the appropriate handler, which then
 * prints the appropriate page.
 *
 * All Drupal code is released under the GNU General Public License.
 * See COPYRIGHT.txt and LICENSE.txt.
 */

// Cheap pre-check with a raw MySQL connection, before any Drupal code loads.
$conex = mysql_connect('localhost', 'my_user', 'my_pwd') or die('Could not connect: ' . mysql_error());
mysql_select_db('my_db') or die('Could not select database');
// First path segment, without the query string ('/node?page=1' -> 'node').
$uri = explode('/', parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH));
// Warning: $uri[1] is concatenated into the SQL without any escaping.
$query = mysql_query("SELECT * FROM menu_router WHERE path = '" . $uri[1] . "'", $conex);
$num = mysql_num_rows($query);
mysql_close($conex);

// The front page ('/') has no menu_router row, so always let it through.
if ($uri[1] === '' || $num > 0) {
  // Known path: run the normal Drupal request.
  require_once './includes/bootstrap.inc';
  drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
  $return = menu_execute_active_handler();

  // Menu status constants are integers; page content is a string.
  if (is_int($return)) {
    switch ($return) {
      case MENU_NOT_FOUND:
        drupal_not_found();
        break;
      case MENU_ACCESS_DENIED:
        drupal_access_denied();
        break;
      case MENU_SITE_OFFLINE:
        drupal_site_offline();
        break;
    }
  }
  elseif (isset($return)) {
    // Print any value (including an empty string) except NULL or undefined:
    print theme('page', $return);
  }
  drupal_page_footer();
}
else {
  // Unknown path: send a static 404 page without loading Drupal.
  header("HTTP/1.0 404 Not Found");
  header("Status: 404 Not Found");
  include("404.html");
}

Thanks,
Daniel

Comments
"Standard" way to do it
http://2bits.com/drupal-planet/reducing-server-resource-utilization-busy...
beat me to it :)
Thanks for the link, but I
Thanks for the link, but I think this snippet doesn't work against brute-force attacks like this example:
208.131.175.* - - [01/Feb/2010:00:48:55 +0100] "GET /horde/ HTTP/1.1" 404 204
208.131.175.* - - [01/Feb/2010:00:48:57 +0100] "GET /horde2/ HTTP/1.1" 404 205
208.131.175.* - - [01/Feb/2010:00:48:58 +0100] "GET /horde3/ HTTP/1.1" 404 205
208.131.175.* - - [01/Feb/2010:00:48:59 +0100] "GET /horde-3.0.9/ HTTP/1.1" 404 210
81.171.96.* - - [30/Jan/2010:12:08:31 +0100] "GET /thisdoesnotexistahaha.php HTTP/1.1" 404 223
81.171.96.* - - [30/Jan/2010:12:08:31 +0100] "GET /cmd.php HTTP/1.1" 404 205
81.171.96.* - - [30/Jan/2010:12:08:31 +0100] "GET /cacti/cmd.php HTTP/1.1" 404 211
81.171.96.* - - [30/Jan/2010:12:08:31 +0100] "GET /portal/cacti/cmd.php HTTP/1.1" 404 218
81.171.96.* - - [30/Jan/2010:12:08:32 +0100] "GET /portal/cmd.php HTTP/1.1" 404 212
81.171.96.* - - [30/Jan/2010:12:08:32 +0100] "GET /stats/cmd.php HTTP/1.1" 404 211
This kind of access causes a full Drupal bootstrap...
I think what you have is a
I think what you have is a very good way to handle 404s for pages that Drupal would normally handle with a full page load. If you want to do it without altering index.php, you could also write a module and do the same thing in its hook_boot(). Then, after installing the module, find it in the system table and set bootstrap to 1 and its weight to something like -1000. That will make it execute very, very early in Drupal's stack.
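A minimal sketch of that module approach for Drupal 6, assuming a hypothetical module named `fast404_example` (every function name here is illustrative, not an existing module). `hook_enable()` sets the `bootstrap` and `weight` columns as described above, and `hook_boot()` answers with a bare 404 when no menu router item starts with the first path segment. Like the index.php hack, it does not look at path aliases, so aliased pages would still need an extra check. The segment parsing is a plain function so it can be tested without Drupal.

```php
<?php
// fast404_example.module: hypothetical Drupal 6 module, sketch only.

/**
 * Returns the first path segment of a request URI, ignoring the query string.
 */
function fast404_example_first_segment($request_uri) {
  $path = trim((string) parse_url($request_uri, PHP_URL_PATH), '/');
  $parts = explode('/', $path);
  return $parts[0];
}

/**
 * Implements hook_enable(): load in the bootstrap phase, ahead of other modules.
 */
function fast404_example_enable() {
  db_query("UPDATE {system} SET bootstrap = 1, weight = -1000 WHERE name = '%s' AND type = 'module'", 'fast404_example');
}

/**
 * Implements hook_boot(): send a bare 404 if no menu item matches.
 */
function fast404_example_boot() {
  $first = fast404_example_first_segment(request_uri());
  if ($first === '' || $first === 'index.php') {
    return;  // Front page or non-clean URL: let Drupal handle it.
  }
  // db_query() escapes %s arguments; %% is a literal % for LIKE.
  // Matching 'first/%' too keeps routes registered only under longer paths.
  $match = db_result(db_query("SELECT COUNT(*) FROM {menu_router} WHERE path = '%s' OR path LIKE '%s/%%'", $first, $first));
  if (!$match) {
    header('HTTP/1.0 404 Not Found');
    print '<html><head><title>404 Not Found</title></head><body><h1>Not Found</h1></body></html>';
    exit();
  }
}
```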
There is a very strong reason for doing it the way I've outlined above: you have written an SQL query that is begging to have SQL injection spliced into it. With the query you've written above, I could create a new user for myself on your SQL server and get full access to your database, all by splicing things into my URL. If you use the Drupal database functions, you protect yourself from injection attacks.
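If the check has to stay in index.php, before Drupal's database layer is loaded, the same protection comes from a prepared statement. This is a swapped-in technique (PDO rather than Drupal's db_query()), sketched with an in-memory SQLite table standing in for `menu_router`; it assumes the pdo_sqlite extension is available, and against MySQL you would pass a `mysql:` DSN instead. The function name `menu_path_exists` is hypothetical.

```php
<?php
// Sketch: look up a menu path with a bound parameter instead of string
// concatenation. SQLite in memory stands in for the real menu_router table.

function menu_path_exists(PDO $db, $path) {
  // The placeholder sends $path as data, so quotes in it cannot change the SQL.
  $stmt = $db->prepare('SELECT COUNT(*) FROM menu_router WHERE path = ?');
  $stmt->execute(array($path));
  return (int) $stmt->fetchColumn() > 0;
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE menu_router (path TEXT PRIMARY KEY)');
$db->exec("INSERT INTO menu_router (path) VALUES ('node'), ('user')");
```

With concatenation, a path such as `x' OR '1'='1` would turn the WHERE clause into an always-true condition; with the bound parameter it is just a string that matches no row.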
Also, you can take the code linked above a bit further (you may need to remove the xml extension from the list, or modify the check, if you have other XML RSS feeds). Instead of just using it to serve fast 404s for images, you can use it to whitelist pages. This does require you to know all the pages that your modules use (some modules, like the ad module, load other PHP pages, as I've shown below).
// List of extensions for static files
$exts = 'txt|png|gif|jpe?g|shtml?|css|js|ico|swf|flv|cgi|bat|pl|dll|exe|asp|xml|php';
$allowed_pages = array('index.php', 'adserve.php', 'rss.xml');
// With clean URLs, QUERY_STRING looks like "q=rss.xml"; $_GET['q'] holds just
// the Drupal path, so the whitelist comparison can actually match.
$q = isset($_GET['q']) ? $_GET['q'] : '';
// It is not an imagecache path, which we allow to go through Drupal
// (strpos() can return 0, so compare against FALSE strictly).
if (strpos($q, 'imagecache') === FALSE) {
  // It is not our main feed page or an allowed php page
  if (!in_array($q, $allowed_pages)) {
    // Is it a static file? The dot is escaped so it only matches a literal '.'.
    if (preg_match('/\.(' . $exts . ')$/', $q)) {
      // Just send a 404 right now ...
      header('HTTP/1.0 404 Not Found');
      print '<html>';
      print '<head><title>404 Not Found</title></head>';
      print '<body><h1>Not Found</h1>';
      print '<p>The requested URL was not found on this server.</p>';
      print '</body></html>';
      exit();
    }
  }
}
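The same decision can be pulled out into a pure function, which makes it easy to check which requests from the log above would be rejected early. This is a sketch; the function name `fast404_is_static_miss` is hypothetical, and `$path` is assumed to be the Drupal path (what ends up in `$_GET['q']`).

```php
<?php
// Sketch: the settings.php filter as a testable function. Returns TRUE when
// the request should get an immediate 404 without bootstrapping Drupal.

function fast404_is_static_miss($path, $exts, array $allowed_pages) {
  if (strpos($path, 'imagecache') !== FALSE) {
    return FALSE;  // imagecache derivatives are generated by Drupal
  }
  if (in_array($path, $allowed_pages, TRUE)) {
    return FALSE;  // whitelisted feed or PHP entry point
  }
  // Only a real file extension at the very end of the path counts.
  return preg_match('/\.(' . $exts . ')$/', $path) === 1;
}

$exts = 'txt|png|gif|jpe?g|shtml?|css|js|ico|swf|flv|cgi|bat|pl|dll|exe|asp|xml|php';
$allowed = array('index.php', 'adserve.php', 'rss.xml');
```

Note that directory-style probes such as `horde3/` end in no extension, so this filter lets them through to Drupal, which is exactly the limitation raised earlier in the thread.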
You may also want to see this thread: http://drupal.org/node/76824
Next, I'd consider a reverse proxy (like Varnish) if you find your site sensitive to load. It won't help with the issues outlined above (since it has to generate the 404 before it caches it), but a lot of brute-force attacks hit the same URLs, so you can save yourself some pain, either with them or by serving cached pages to your legitimate users.
Lastly, you may want to look into getting an IDS (Intrusion Detection System) from your host/provider. Most firewalls can have one added, and it can spot and block brute-force attacks without you needing to change anything on your web server/backend.
Thanks a lot for your post
I think the best approach for brute-force attacks is your suggestion to install an IDS.
But I also love the idea of a module with a very low weight that checks the URLs. I think I saw a similar module for Drupal 5, and I will try to write one for Drupal 6.
I'd be interested in seeing
I'd be interested in seeing it when you're done; the notion of a 'Fast 404' module strikes me as very interesting. I know some things are being done in D7 to make this better, but there is nothing for D6 other than hacks and patches. While those may ultimately offer the best bang for the buck, they don't help folks who aren't serious coders (or folks who don't want to hack core).
What was the D5 module that did this? I was looking today and couldn't find any modules that addressed this issue.
new module
Was just created http://drupal.org/project/http_reject
sounds interesting.
That module rejects based on
That module rejects requests based on UA/HTTP method... that doesn't help much here :( A similar module is http://drupal.org/project/badbehavior
First version of module
I have installed http_reject, but its functionality is different: http_reject only focuses on which HTTP requests Drupal accepts.
Anyway, I have written a very, very simple module. Only two functions! :)
hook_enable
hook_boot
Name: light_not_found (I don't know if it is a good name)
I followed soyarma's instructions, so the module installs itself with a very low weight.
You can download it at http://www.phpia.net/light_not_found.tar.gz
I tested the memory consumption and it only needs 1 MB to serve the 404 page (Drupal needs about 8 MB in a fresh install), so if you have a lot of modules it is a real improvement.
It would probably be interesting to add some admin tools where the user could enter path exceptions, set the default 404 page...
What do you think?
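The path-exceptions idea could be sketched as one pattern per line with `*` as a wildcard, the same convention Drupal uses for block visibility paths. Everything here is hypothetical: the function name, and the assumption that the patterns would be stored as a single text setting.

```php
<?php
// Sketch: admin-entered path exceptions, one pattern per line, '*' = wildcard.
// Returns TRUE if $path matches any pattern and should skip the fast 404.

function light_not_found_is_exception($path, $patterns_text) {
  foreach (preg_split('/\R/', trim($patterns_text)) as $pattern) {
    $pattern = trim($pattern);
    if ($pattern === '') {
      continue;
    }
    // Quote regex metacharacters, then turn the escaped '\*' back into '.*'.
    $regex = '/^' . str_replace('\*', '.*', preg_quote($pattern, '/')) . '$/';
    if (preg_match($regex, $path)) {
      return TRUE;
    }
  }
  return FALSE;
}
```

For example, with the text `"feeds/*\nadserve.php"`, the paths `feeds/news` and `adserve.php` are exceptions while `cacti/cmd.php` is not.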
Hey dpardo; I couldn't help
Hey dpardo;
I couldn't help myself last night either and started on a module that incorporates both your menu router checking as well as the missing image checking (which can be done earlier and lighter) for the best of both worlds.
I still have to write the install and make it so that if you have path checking running from settings.php it doesn't run again at hook_boot.
Here it is in its current state: http://nosquaresoftware.com/content/fast-404-module
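One way to make the settings.php check and the hook_boot() check cooperate is a constant: settings.php defines it once its check has passed, and hook_boot() skips the work when it is already set. This is only a sketch; the constant and both function names are hypothetical, and in practice the settings.php side could simply call define() inline.

```php
<?php
// Sketch: avoid running the same path check twice in one request.

// Called from settings.php after its path check has passed.
function fast_404_mark_checked() {
  if (!defined('FAST_404_PATH_CHECKED')) {
    define('FAST_404_PATH_CHECKED', TRUE);
  }
}

// Called at the top of the module's hook_boot(): TRUE means "still check".
function fast_404_needs_boot_check() {
  return !defined('FAST_404_PATH_CHECKED');
}
```

A constant is used rather than a function defined in settings.php because the module file is loaded later, and declaring the same function twice would be a fatal error.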
I made a change for include
I made a change to include the alias table and updated the module. Here is the code:
It must be improved and tested!
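// Runs early (from hook_boot()), when Drupal's database layer is available.
// Check the path without its query string, so '/node?page=1' counts as 'node'.
$path = trim(parse_url(request_uri(), PHP_URL_PATH), '/');
$uri = explode('/', $path);
if ($uri[0] != '' && $uri[0] != 'index.php') {
  $nuri = end($uri);  // last path segment
  // Is the first segment a registered menu path?
  $match = db_result(db_query("SELECT COUNT(*) FROM {menu_router} WHERE path = '%s'", $uri[0]));
  if ($match == 0) {
    // If not, is there a path alias containing the last segment?
    $match = db_result(db_query("SELECT COUNT(*) FROM {url_alias} WHERE dst LIKE '%s'", '%' . $nuri . '%'));
    if ($match == 0) {
      header("HTTP/1.0 404 Not Found");
      header("Status: 404 Not Found");
      print '<html>';
      print '<head><title>404 Not Found</title></head>';
      print '<body><h1>Not Found</h1>';
      print '<p>The requested URL was not found on this server.</p>';
      print '<p><small>By Light Not Found module</small></p>';
      print '</body></html>';
      exit();
    }
  }
}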
Hey Soyarma, if you want, we
Hey Soyarma, if you want, we can work together on developing a module and perhaps move this thread to http://groups.drupal.org/contributed-module-ideas
I think that's a great idea.
I think that's a great idea. I've kicked off the thread here: http://groups.drupal.org/node/114729
The proper long term solution
The proper long term solution is to get #76824 in core.
Until then, you can do this with a simple change in settings.php, as others have pointed out. This avoids hacking core and all the maintenance headaches that ensue. Read more on reducing server resource utilization on busy sites by implementing fast 404s in Drupal. You can extend it for paths that are not Drupal-ish.
Drupal performance tuning, development, customization and consulting: 2bits.com, Inc..
Personal blog: Baheyeldin.com.
It only partially solves the
It only partially solves the problem: it only covers missing static files. Luckily, most of the files in my 404 log are static files ;) Is anyone brave enough to do a real benchmark? Take a one-day access log and replay all requests with and without the patch to see how much performance improves. Of course, the improvement will vary case by case. In my case, 404s account for 0.06% of all HTTP requests.
However, it also looks like a micro-optimization, and it may prevent other modules, like imagecache, from working their magic.
Maybe a "long long term" solution is a menu system rebuilding.
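A first step toward that benchmark is measuring how large the 404 share actually is. This sketch assumes Apache common or combined log lines like the ones quoted earlier in the thread, where the status code is the first field after the quoted request line; the function name is hypothetical.

```php
<?php
// Sketch: percentage of 404 responses in a list of access-log lines.

function share_of_404(array $lines) {
  $total = 0;
  $not_found = 0;
  foreach ($lines as $line) {
    // The status code follows the quoted request line: "GET /x HTTP/1.1" 404
    if (!preg_match('/"[^"]*" (\d{3}) /', $line, $m)) {
      continue;  // skip lines that don't look like log entries
    }
    $total++;
    if ($m[1] === '404') {
      $not_found++;
    }
  }
  return $total ? 100.0 * $not_found / $total : 0.0;
}
```

Usage would be something like `share_of_404(file($log_path))`, where `$log_path` points at your own access log.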
You might
consider changing your web server: use Nginx instead of Apache. A decent config will serve all static files (returning 404 if they don't exist) without involving Drupal at all.
Can't run Drupal
Can't run Drupal with that server ;)
jcsio You can try the module
jcsio, you can try (at your own risk :)) the module that we are building in http://groups.drupal.org/node/114729
It improves performance and works without a patch.