Low Memory/CPU 404 Error Pages

soyarma's picture

This is a continuation of the discussion at http://groups.drupal.org/node/114124, which stemmed from how much memory and CPU Drupal can consume to serve a 404 error page (60MB+ if you have a decent module load).

While several methods have been proposed for dealing with this, most are core hacks (though some have made it into Drupal 7). Most of the hacks deal with 404s for missing image/CSS/JS files. They do not handle other URLs that look like valid paths rather than links to missing files.

Discussions referenced before work on the proposed module(s) started:
* http://groups.drupal.org/node/114124
* http://2bits.com/drupal-planet/reducing-server-resource-utilization-busy...
* http://drupal.org/node/76824

dpardo hit upon checking the menu_router and url_alias tables very early in the site startup, and soyarma proposed building the check into a module that runs during hook_boot.

dpardo built this module and has it here (http://www.phpia.net/light_not_found.tar.gz) for examination.
soyarma expanded upon what dpardo did and merged the url/menu table checking with the image/file checking other hacks have proposed and created a combined module: http://nosquaresoftware.com/content/fast-404-module

Our thought is to merge these two and test them to see whether this is a workable solution for production servers that see many 404s.

Comments

soyarma's picture

hey dpardo;

I encourage you to take a look at the module I put together--be sure to note the readme.txt which contains the additions to the settings.php. While those settings aren't strictly necessary to get the functionality you have written in your module, they allow the image/file URL checking to be done first, which is lighter-weight as it does not require a db query. That way any bad images/files can be identified sooner and the 404 delivered.
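
As a rough illustration of why the file check can run before any database query, here is a minimal sketch that decides from the URL alone. The function name and the extension list are assumptions made for this example and are not taken from either module. It relies on the web server already serving files that exist, so a static-looking path that reaches Drupal is presumed missing; any path Drupal itself serves under such an extension would need an exception.

```php
<?php
// Hypothetical helper, not the module's actual code: decides from the
// URL alone whether a request targets a static file, with no db query.
function sketch_is_static_file_request(string $path, array $extensions): bool {
  // With PATHINFO_EXTENSION, pathinfo() returns '' when there is no extension.
  $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
  return $ext !== '' && in_array($ext, $extensions, true);
}

// Assumed extension list, for illustration only.
$static_exts = ['gif', 'jpg', 'jpeg', 'png', 'css', 'js', 'ico'];

var_dump(sketch_is_static_file_request('sites/default/files/missing.PNG', $static_exts));
var_dump(sketch_is_static_file_request('node/1/edit', $static_exts));
```

The first call reports a static file request and the second does not, so only the second would need to go on to the table lookups.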

I also created the ability for the 404 path check to be done sooner in bootstrap by allowing for the path check (checking against the tables) to be done directly from the settings.php. I got that idea from a discussion here (http://drupal.org/node/76824).

I have identified some caveats and issues and questions:

  1. If the path in menu_router has a % in it (such as node/%/edit), then a path like node/1/edit will fail. This can be fixed quite simply by reversing the operands in the WHERE clause to 'node/1/edit' LIKE path. I did this in my module and it worked like a charm.
  2. I'm curious whether you ran into an issue with $_GET['q'] that caused you to use request_uri(). I am using $_GET['q'] and it is working quite well (so far).
  3. The path_redirect module creates another table of aliases. We'll need to check whether the module is enabled and, if so, look at its table too.
  4. In working with my dev site after putting my module in place I found myself getting 404s on ajax autocompletes. I'm not sure what the issue is and have more testing to do.

Let me know what you think of the features I added in with my module and if you think it would be a good place to begin collaboration.

Also, I do have a CVS account on drupal.org, so if we think this module has wings we can create a project for it.

Be careful

mikeytown2's picture

I would be careful with matching anything past the first arg (like node). Views, and I'm sure some other modules, rely on the fact that /frontpage/junk gets sent to the frontpage view. I took a stab at cracking Views and was semi-successful: http://drupal.org/project/views404. Just a warning that you might break some stuff if you get too aggressive with issuing a 404.

New Version

dpardo's picture

Hi,

I have rewritten the module to fix points 1 and 2. The new version is in the same place, at http://phpia.net/light_not_found.tar.gz

I got some code from Drupal core functions:
menu_get_item()
menu_get_ancestors()

I still have to check points 3 and 4, and I also want to borrow some things from includes/path.inc for the front page issues.

I do not have a Drupal CVS account. If you create the project, could I apply as co-maintainer?

Daniel

soyarma's picture

That's a good call. I was lying in bed last night thinking about that issue. I've written many a view where I didn't bother putting the % on the end of the path.

A possible solution may be to write an sql query like this:

  $sql = "SELECT path FROM {menu_router} WHERE '%s' LIKE CONCAT(path,'%')";

You may get some false positives, but it will avoid breaking things.

dpardo's picture

Hi, I'm back again after a couple of days of Christmas holidays.

Sorry soyarma, where can I find your code? I'm not able to download it from http://nosquaresoftware.com/content/fast-404-module (good name, fast-404 :)

Ready

dpardo's picture

Hi all, I finished the module. It now checks:
- First, whether the path is an alias; if so, it is translated to the real Drupal route.
- Whether the path is a valid route (I use the same method Drupal does, to avoid breaking things, as mikeytown2 suggested).
- Whether the path_redirect module is active; if so, its table is checked too.

I tested in a couple of sites, and it seems to work very fast!
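
The order of checks described above can be sketched in plain PHP as follows. This is only an illustration under stated assumptions: the arrays are sample data standing in for the url_alias, menu_router and path_redirect tables, the function names are hypothetical, and the candidate generation is a simplified take on the idea behind menu_get_ancestors(), not a copy of it.

```php
<?php
// Builds candidate router paths by replacing arguments with '%' and by
// dropping trailing arguments (simplified; not Drupal's exact algorithm).
function sketch_candidates(string $path): array {
  // Illustrative cap on the number of parts so 2^n stays small.
  $parts = array_slice(explode('/', $path), 0, 7);
  $candidates = [];
  for ($len = count($parts); $len >= 1; $len--) {
    // Each bit of $mask decides whether a part is kept (1) or wildcarded (0).
    for ($mask = (1 << $len) - 1; $mask >= 0; $mask--) {
      $candidate = [];
      for ($i = 0; $i < $len; $i++) {
        $candidate[] = ($mask & (1 << ($len - 1 - $i))) ? $parts[$i] : '%';
      }
      $candidates[] = implode('/', $candidate);
    }
  }
  return $candidates;
}

// Returns TRUE when the path should be handed on to Drupal, FALSE for a 404.
function sketch_path_exists(string $path, array $aliases, array $router, array $redirects): bool {
  // 1. Translate an alias to its internal path, if there is one.
  $internal = $aliases[$path] ?? $path;
  // 2. Accept the path when any candidate is a known router path.
  foreach (sketch_candidates($internal) as $candidate) {
    if (in_array($candidate, $router, true)) {
      return true;
    }
  }
  // 3. Fall back to the redirect table.
  return in_array($path, $redirects, true);
}

// Sample data only.
$aliases = ['about-us' => 'node/12'];
$router = ['node/%', 'node/%/edit', 'frontpage'];
$redirects = ['old-page'];

var_dump(sketch_path_exists('frontpage/junk', $aliases, $router, $redirects));
var_dump(sketch_path_exists('no/such/page', $aliases, $router, $redirects));
```

Because trailing arguments are dropped when building candidates, frontpage/junk still resolves to the frontpage router entry, which is the behaviour mikeytown2 warned about breaking.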

Last version: http://phpia.net/light_not_found.tar.gz

coveryoureyes's picture

Hi! I caught a bug while testing on my site: aliases like /media work fine, but links like /media/ show as 404, although both worked before this module was installed. So I added one line to the function that checks the validity of the URL to correct the bug.

function light_not_found_menu_get_item() {
  $path = $_GET['q'];
  // Strip one trailing slash so that /media/ is treated like /media.
  if (substr($path, -1) == '/') $path = substr($path, 0, strlen($path) - 1);  // <== THIS ONE I MEAN
  if ($path == "" || $path == "index.php") return TRUE; // How does Drupal handle this? See /includes/path.inc for the front page

Thank you for your work!

Another error fixed

coveryoureyes's picture

I also found a lot of error messages in the server log generated by the line $path = $_GET['q'];. Each page load logs the same error four times: Undefined index:  q in /var/www/..... on the same line with the same referer. So I added another line of code. It was probably only needed to guard against some unnecessary repeated requests.

function light_not_found_menu_get_item() {
  // Avoid the "Undefined index: q" notice when no path was requested.
  if (!isset($_GET['q'])) return FALSE;  // <= it works both with TRUE and FALSE
  $path = $_GET['q'];
  if (substr($path, -1) == '/') $path = substr($path, 0, strlen($path) - 1);
  if ($path == "" || $path == "index.php") return TRUE; // How does Drupal handle this? See /includes/path.inc for the front page

dpardo's picture

Thanks for both fixes, coveryoureyes. I have included the lines, but I changed the first one: it should return TRUE, otherwise I get a "not found" on the front page.
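
The two fixes and the front page case can be folded into one small normalization step. The following is a minimal sketch, assuming a hypothetical helper that returns an empty string for the front page; it is not the code in light_not_found. Note that rtrim() strips every trailing slash, whereas the patch above strips only one.

```php
<?php
// Hypothetical helper: returns the request path to check, or '' when the
// front page was requested. Assumes $get['q'] is a string when it is set.
function sketch_request_path(array $get): string {
  // A missing 'q' means the bare site root was requested.
  if (!isset($get['q'])) {
    return '';
  }
  // Treat /media/ like /media.
  $path = rtrim($get['q'], '/');
  return ($path === 'index.php') ? '' : $path;
}

var_dump(sketch_request_path(['q' => 'media/']));
var_dump(sketch_request_path([]));
```

A caller would treat an empty result as the front page and let the request through, which matches the "return TRUE" behaviour described above.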

soyarma's picture

Hey dpardo;

Sorry, I forgot to check off anonymous access to files. I'll look at what you've put together too.

dpardo's picture

Great, soyarma. Take a look at the module, and after your integration we can upload it to Drupal modules.

soyarma's picture

I took a look at the module you wrote and also at the patches above. I handle the front page check with this line in my 'fast_404_path_check()' function:

if (variable_get('fast_404_path_check',FALSE) && isset($_GET['q'])) {

That is where I determine whether the module is switched on and whether the request is for something other than the front page.

I'm somewhat uncertain about the ancestors stuff. I'm going to look that over in more detail. I think that some of the need for it may be fixed by reversing the query. I'm also working on adding a check for path_redirect items.

soyarma's picture

I've looked at the menu ancestors function as well as how you recreated it for our purposes, and I think that my simple reversal of the query will work.

The reason is this:

select * from menu_router WHERE path LIKE 'node/1234/edit'

will fail. However, if you reverse the WHERE clause, the % signs already in the Drupal path will work as wildcards for SQL:

select * from menu_router WHERE 'node/1234/edit' LIKE path

Which actually looks like this to the sql server

select * from menu_router WHERE 'node/1234/edit' LIKE 'node/%/edit'

This will return true and the path will be valid. One more change is to do what I outlined above for things like views:

select * from menu_router WHERE '%s' LIKE CONCAT(path,'%')

Which, when fully parsed, comes out like this in the SQL server:

select * from menu_router WHERE 'my-view/arg1/arg2' LIKE CONCAT(path,'%')

That way if the path that is in menu router is just 'my-view' it will still match.
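
To see the effect of reversing the operands without a database, the matching can be imitated in plain PHP. This is a sketch only: sketch_sql_like() is a hypothetical helper that approximates the SQL LIKE operator with a regular expression, it compares case-sensitively (MySQL's common collations do not), and the router paths are sample data.

```php
<?php
// Approximates SQL LIKE: '%' matches any run of characters, '_' exactly one.
function sketch_sql_like(string $subject, string $pattern): bool {
  $regex = '';
  foreach (str_split($pattern) as $char) {
    if ($char === '%') {
      $regex .= '.*';
    }
    elseif ($char === '_') {
      $regex .= '.';
    }
    else {
      $regex .= preg_quote($char, '#');
    }
  }
  return preg_match('#^' . $regex . '$#s', $subject) === 1;
}

// Checks a request against stored router paths with the operands reversed:
// the request is the subject and the stored path is the pattern.
function sketch_path_matches(string $request, array $router_paths, bool $prefix): bool {
  foreach ($router_paths as $path) {
    // $prefix imitates CONCAT(path, '%').
    $pattern = $prefix ? $path . '%' : $path;
    if (sketch_sql_like($request, $pattern)) {
      return true;
    }
  }
  return false;
}

// Sample router paths, for illustration only.
$router_paths = ['node/%/edit', 'my-view', 'admin'];

var_dump(sketch_sql_like('node/%/edit', 'node/1234/edit'));                  // path LIKE request
var_dump(sketch_sql_like('node/1234/edit', 'node/%/edit'));                  // request LIKE path
var_dump(sketch_path_matches('my-view/arg1/arg2', $router_paths, true));     // with CONCAT(path, '%')
var_dump(sketch_path_matches('administrator-typo', $router_paths, true));    // a false positive
```

The last call shows the kind of false positive mentioned earlier in the thread: with the trailing wildcard, any request that merely starts with a stored path such as admin is accepted.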

Do you think that covers the issues you used the ancestors setup for?

soyarma's picture

The module definitely needs more review, and we still need to confirm that my way of writing the SQL queries does the same thing as the ancestors method, but it is up on D.O. here: http://drupal.org/project/fast_404

A dev build is in queue to be created, but one can also pull the sucker out of CVS as well.
