Performance impact of .htaccess and rewrite rules?

SerenityNow's picture

Hi-

I am new to Drupal, but I have always avoided use of .htaccess files and mod_rewrite because I was under the impression that they could negatively impact performance.

Here are some excerpts from Wrox's 'Professional Apache' in the chapter on improving Apache's performance:

On .htaccess:

"If AllowOverride is set to anything other than 'None', Apache will check for directives in .htaccess files for each directory from the root all the way down to the directory in which the requested resource resides, after aliasing has been taken into account. This can be extremely time consuming since Apache does this check every time a URL is requested, so unless absolutely needed, always (set AllowOverride to 'None')"

"If .htaccess files are not required in a location or the server generally, we can produce a much more efficient server by turning them off. If the possibility of an .htaccess file exists, Apache must check to see if one is present in every directory above the one the URL refers to, while overrides are enabled."

On mod_rewrite:

"Any use of mod_rewrite's URL rewriting capabilities can cause significant performance loss, especially for complex rewriting strategies. The RewriteEngine directive can be specified on a per-directory or per-virtual host, so it is worth enabling and disabling mod_rewrite selectively if the rules are complex and needed only in some cases.

In addition, certain rules can cause additional performance problems by making internal HTTP requests to the server. Pay special attention to the NS flag, and be wary of using the -F and especially -U conditional tests."

I just installed Drupal and have been exploring the system. One thing I noticed is that the only .htaccess file included with the distribution is at the root of the Drupal files, so I am wondering is it really necessary to set AllowOverride to 'All' or can I move the .htaccess directives into the httpd.conf?

Also, I noticed that the RewriteEngine is turned on for the entire Drupal directory structure and there are some RewriteRules enabled by default in the .htaccess file. This Drupal install will be deployed behind a company firewall, so I'm not concerned with having "clean" or "search engine friendly" URLs. So I was wondering if it is necessary to have any RewriteRules at all for Drupal to run? If some URL rewriting is required (like maybe for private files), I would prefer to enable the RewriteEngine only for the specific directories and cases where it would be needed and not for the entire Drupal directory structure - is that possible?

Has anyone done anything similar to what I describe? Suggestions?

Any info appreciated, thanks.

Comments

The best approach I have

slantview's picture

The best approach I have seen on this for high performance is to Include the .htaccess files in the VirtualHost definition and turn the Overrides to none. This is especially critical if you have your root directory running over NFS as it will cause a ton of lookups and all over NFS. If you don't need "clean urls" you can turn off the rewrite engine, but I have not seen that to be especially effective for performance.

http://www.slantview.com/

It's all fine from an

stewsnooze's picture

It's all fine from an academic point of view to discuss whether Apache is slow, but if you are behind a corporate firewall you are quite likely to have low traffic. Drupal currently ships with .htaccess and the rewrites turned on, etc., because it tries to work on all sorts of hosting platforms. I wouldn't even look at changing this if you are doing less than a Drupal request per second averaged out over the working day. I would first look at page caching, memcached, and db optimisation before this.

However please don't let me derail this conversation as I am simply trying to help point out what performance problems you may encounter before Apache performance problems.

Full Fat Things ( http://fullfatthings.com ), my Drupal consultancy that makes sites fast.

Imagecache & files directory

Jax's picture

Imagecache requires clean_urls to be on. Since that is an often-used module you usually need to turn on mod_rewrite.

There's also a .htaccess in the files directory which is automatically generated. Don't forget to include that info in your httpd.conf.

Thanks - I will keep an eye

SerenityNow's picture

Thanks - I will keep an eye out for that additional .htaccess file and any imagecache dependencies.

Are there any benchmarks?

chriscohen's picture

Are there any benchmarks to show the effect of using mod_rewrite or .htaccess files (with modest rules) on a typical Drupal site? While I don't question the book's assertion that these things will make requests slower, terms like 'extremely time-consuming' and 'significant performance loss' can mean different things to different people without real figures.

A quick google turned up

Jax's picture

A quick google turned up http://www.fubra.com/blog/2008/01/htaccess-vs-httpdconf/ which concludes:

That gives us a difference of around 6.6% less requests per second while .htaccess is turned on.

Thanks this is great data!

SerenityNow's picture

Thanks this is great data!

Benchmarks

Jamie Holly's picture

This is on Drupal 6 with all cache primed

Rewrites located in VHost:

Concurrency Level:      10
Time taken for tests:   3.252 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      998200 bytes
HTML transferred:       933900 bytes
Requests per second:    30.75 [#/sec] (mean)
Time per request:       325.210 [ms] (mean)
Time per request:       32.521 [ms] (mean, across all concurrent requests)
Transfer rate:          299.75 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        3    9   8.1      5      52
Processing:    47  314 186.9    323     832
Waiting:       42  253 156.1    247     738
Total:         53  323 189.1    330     836

Percentage of the requests served within a certain time (ms)
  50%    330
  66%    410
  75%    444
  80%    461
  90%    559
  95%    697
  98%    834
  99%    836
100%    836 (longest request)

Rewrites located in .htaccess:

Concurrency Level:      10
Time taken for tests:   3.800 seconds
Complete requests:      100
Failed requests:        0
Write errors:           0
Total transferred:      998200 bytes
HTML transferred:       933900 bytes
Requests per second:    26.31 [#/sec] (mean)
Time per request:       380.018 [ms] (mean)
Time per request:       38.002 [ms] (mean, across all concurrent requests)
Transfer rate:          256.52 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        4    8   9.3      6      67
Processing:    47  368 430.7    236    1777
Waiting:       42  323 410.6    191    1663
Total:         53  377 436.9    242    1798

Percentage of the requests served within a certain time (ms)
  50%    242
  66%    284
  75%    338
  80%    359
  90%   1528
  95%   1661
  98%   1753
  99%   1798
100%   1798 (longest request)

**NOTE: this is just loading Drupal. CSS, JS and images aren't loaded by ab. If you are running a site without JS and CSS optimization/aggregation enabled then your performance would really start suffering with the depth of the directories involved with stylesheets and javascript files included in modules.

In my experience moving the rewrites to the vhost directive isn't the best option. If your server is getting to the point you need to do that to increase performance then you will most likely be looking at upgrading in the very near future anyways. Also we have occasional Drupal releases that include changed .htaccess files. It's easy to overlook those, and the chance of that happening rises sharply if you don't use .htaccess.

What I actually do is a combination though. On larger sites I manage I actually modify the core to utilize a static url. All JS, CSS and images are loaded from a straight static server (no php at all). There is a small rewrite in that vhost to check if the file exists. If not then the request is sent back to the main site. That helps with uploads, etc. until rsync has a chance to catch up. It's also good if you are at the point you can run on a single server with Drupal and don't want to bother with NFS. This gives us the best of both worlds. We don't have to worry about the performance loss of .htaccess when it comes to sub-directories of the document root and can still use it to handle the main site.
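A rough sketch of what such a static vhost could look like is below. The hostnames, the document root path and the choice of a redirect (rather than a proxy) to reach the main site are all assumptions for illustration, not the actual configuration described above.

```apache
# Hypothetical static-asset vhost: no .htaccess lookups, no PHP.
# static.example.com, www.example.com and /var/www/static are placeholders.
<VirtualHost *:80>
    ServerName static.example.com
    DocumentRoot /var/www/static

    <Directory /var/www/static>
        # Never look for .htaccess files under the static tree
        AllowOverride None
        Options None
    </Directory>

    RewriteEngine On
    # In vhost context the URL has not been mapped to a file yet,
    # so build the path from DOCUMENT_ROOT and REQUEST_URI.
    RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
    # File not synced to the static server yet: send the client to the main site
    RewriteRule ^/(.*)$ http://www.example.com/$1 [R=302,L]
</VirtualHost>
```

With something like this, a freshly uploaded file that rsync has not copied yet is still served by the main site, and everything that does exist is served without any per-directory override checks.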


HollyIT - Grab the Netbeans Drupal Development Tool at GitHub.

Thanks for some

SerenityNow's picture

Thanks for some drupal-specific data.

What's interesting to me is that even though this data (and the data from the Fubra test previously posted) does not take into account the depth of Drupal's directory structure, there is a substantial impact on performance.

FUBRA HTACCESS DATA
0-level deep htaccess diff = 6% diff

INTOXINATION REWRITE DATA
Time taken for tests: 3.252 seconds vhost vs. 3.800 seconds htaccess = 17% diff
Requests per second: 30.75 vhost vs. 26.31 htaccess = 14% diff
Connection Times (ms): 323 vhost vs. 377 htaccess = 17% diff

I have to say that to me, it seems like a pretty good deal to get roughly a 20% (or more for requests to an n-level deep directory structure like Drupal with requests for CSS and JS files also factored in) improvement in web server efficiency at runtime, simply by moving some directives from one file to another at design time, and making a design decision to minimize/eliminate use of URL rewriting. It's true that this customization would need to be tracked, but when I think about being able to eliminate work done by apache with every single request, it seems like something to consider.

And stepping away from a short term benchmark test, the real-world benefit would grow larger over time since there is an efficiency savings that is additive (since with every single request served by apache there is no need to go to disk to do .htaccess lookups or burn CPU rewriting URLs) until you restart the apache server.

I realize that everyone's situation is different, but for me, if there is little risk, it seems like moving the directives may make sense for this project.

This benchmark does not make sense

nirai's picture

The benchmark referenced above (http://www.fubra.com/blog/2008/01/htaccess-vs-httpdconf/) measured a performance decrease of about 6%, from 4500 requests per second to 4200 requests per second.

This means that at the rate of 30 GET requests per second limited to the HTML resource where Drupal is clearly the bottleneck there should be 0 impact by .htaccess!

Testing for a period of 3 seconds is misleading.
You should repeat with a 30 seconds test and you will see that there is 0 difference in results.

On the other hand as you say, your test only retrieved the shallow Drupal index.php.
I conducted a similar test with the ab benchmark which you seem to have used to retrieve a reasonably deep file:
/sites/mydomain.org/files/imagecache/thumbnail_big/daniel/images/default-thumbnail.jpg
At that depth the impact was more serious, and Apache served 2200 requests per second instead of about 3700.

The overhead for enabling .htaccess in that case was about 0.2 ms per request.
If you calculate 10 hits per page view you are looking at about 2 ms overhead per page view.

Therefore, for a Drupal website serving 30 page views per second with all the (deep) resources served, you should see about 5% performance impact.

For 99.9% of sites which usually operate well below this level, the performance impact should be negligible.
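The arithmetic behind those figures can be written out as below. This is only a back-of-the-envelope check that assumes the ab requests-per-second numbers can be inverted into a mean time per request, and it uses the 10 hits per page view assumed above.

```latex
\begin{align*}
t_{\text{with .htaccess}} &= \frac{1}{2200\ \text{req/s}} \approx 0.45\ \text{ms} \\
t_{\text{without}} &= \frac{1}{3700\ \text{req/s}} \approx 0.27\ \text{ms} \\
\Delta t &= t_{\text{with .htaccess}} - t_{\text{without}} \approx 0.18\ \text{ms} \approx 0.2\ \text{ms per request} \\
\text{overhead per page view} &\approx 10 \times 0.2\ \text{ms} = 2\ \text{ms} \\
\text{overhead at 30 page views/s} &\approx 30 \times 2\ \text{ms} = 60\ \text{ms per second} \approx 6\%
\end{align*}
```

Using the unrounded 0.18 ms gives about 55 ms per second, so the result lands between 5% and 6%.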

Thanks

chriscohen's picture

Thanks for that. It's pretty interesting to see some real figures. In light of this I think I would just be tempted to throw more power at the situation, or look at other performance-enhancing options, rather than move .htaccess rules to the VHost.

The best thing you can do

Jamie Holly's picture

The best thing you can do (if you haven't already) is install APC on your server. Also alternative caching mechanisms like cache_router or Boost will give you the biggest bang for the buck.

Another thing that also helps is using CSS sprites to handle the template images. Instead of having a request for each image that makes up the template, put them all in one file and use the CSS background position to display the appropriate part of the sprite image.



Just to be clear, you don't

ldpm's picture

Just to be clear, you don't actually have to move the rules themselves to your vhost. Just make your <Directory> look something like this:

<Directory /path/to/DocumentRoot/>
AllowOverride None
Order allow,deny
allow from all
Include /path/to/DocumentRoot/.htaccess
</Directory>

You'll need to bounce apache whenever you make a change to your rewrite rules, but you won't have to actually store them in the main httpd.conf.

This is an excellent and

ducdebreme's picture

This is an excellent and simple idea!
Thanks!
Stefan

This is exactly what I said

slantview's picture

This is exactly what I said to do in the very first reply ;)

One thing I didn't mention is you actually want this for the files directory as well as the site root.

don't forget you may have

morningtime's picture

don't forget you may have multiple .htaccess files, e.g. in /sites/default/files/.htaccess

then you must add a separate

AllowOverride None
Include /var/www/vhosts/domain.com/httpdocs/sites/default/files/.htaccess

for each extra htaccess file
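Spelled out as a complete block, and assuming the same example path as above, that could look something like this (a sketch only; adjust the path to your own files directory):

```apache
# One <Directory> block per extra .htaccess file.
# The path is the example path from this comment, not a required location.
<Directory /var/www/vhosts/domain.com/httpdocs/sites/default/files>
    # Stop Apache from looking for .htaccess files at request time
    AllowOverride None
    # Read the directives once, when Apache starts or reloads
    Include /var/www/vhosts/domain.com/httpdocs/sites/default/files/.htaccess
</Directory>
```

Because the file is only read at startup, Apache needs a restart or reload whenever that .htaccess file changes.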

multi-site

recrit's picture

If you have a multi-site configuration and want to AllowOverride None, then the following code makes it more maintainable so you don't have to modify Apache configs every time you add a site. Again, the performance benefits of loading this all in the main config or sites-available config might just be a feel-good type of benefit.

# Files and Temp dirs
# handles root (/var/www/sites) and sub dirs (/var/www/sub/sites)
<Directory ~ "^/var/www/(.+/)*sites/.+/(files|tmp)/">
    SetHandler Drupal_Security_Do_Not_Remove_See_SA_2006_006
    Options None
    Options +FollowSymLinks
</Directory>

update book page

mikeytown2's picture

care to update the book page?
http://drupal.org/node/43788

I updated the book page with

recrit's picture

I updated the book page with a new section "Multisite + Sub Directories + files + tmp". This extends the current multisite directives to cover subdirectories, so if they do not have subdirectories they can use the simpler one.

Not a good idea

mukesh.agarwal17's picture

I'm using ckeditor module and it has 2 different htaccess files.. but I cannot just include them coz their htaccess files read something like this:

AddType application/x-javascript .js
AddType text/css .css
AddType text/xml .xml

and

<IfModule mod_php5.c>
  php_value engine off
</IfModule>
<IfModule mod_php4.c>
  php_value engine off
</IfModule>

if i try to include them straight away, it will force the users to download a link... and not render it, if you know what I mean..

Should we instead add another directory directive to the same virtualhost? something like

<Directory /var/www/XXX/sites/all/modules/contrib/ckeditor/ckeditor>
Include /var/www/XXX/sites/all/modules/contrib/ckeditor/ckeditor/.htaccess
</Directory>
<Directory /var/www/XXX/sites/all/modules/contrib/ckeditor/ckfinder/userfiles>
Include /var/www/XXX/sites/all/modules/contrib/ckeditor/ckfinder/userfiles/.htaccess
</Directory>

Any help will be appreciated.

Btw, the change in AllowOverride settings has not made much of a difference to my performance, every authenticated process is still consuming more than 50Mb although I've boost and cdn enabled. (I use apache and mod_php)..

Cheers,
Mukesh Agarwal
www.innoraft.com

This AllowOverride trick will

dalin's picture

This AllowOverride trick will have zero effect on performance, only increased scalability. But as noted it is minimal unless you are using NFS.

--


Dave Hansen-Lange
Director of Technical Strategy, Advomatic.com
Pronouns: he/him/his

Thanks everyone for the

SerenityNow's picture

Thanks everyone for the excellent feedback!

Still lacking hard numbers ...

kbahey's picture

I have long seen this "move .htaccess to the vhost" recommendation, and its promise of increasing performance, but little hard evidence that it does indeed benefit performance (i.e. before and after benchmarks, ...etc.)

We have a site that gets over 3 million page views a day, 92 million per month. Above 2 million on a slow day.

And these are Google Analytics only numbers. The real numbers due to non Javascript crawlers are higher.

The site uses .htaccess in the normal way and has not been slow at all.

This is not to say removing .htaccess has no value. I am sure it does have some impact, because there is no check on .htaccess files going up the path for each directory. But what REAL impact does it have?

Also, remember that .htaccess is needed in the "files" directory to prevent nasty attacks. See here for details http://drupal.org/node/65409. If you remove it, you are opening yourself to that kind of attack.

Drupal performance tuning, development, customization and consulting: 2bits.com, Inc..
Personal blog: Baheyeldin.com.

I think I posted some numbers

dalin's picture

I think I posted some numbers a while back somewhere but I can't find them. I had run some benchmarks and found a 1-2% increase on EXT3, higher on NFS. So if you mount your files directory over NFS, and have a deep file tree, you might want to investigate further.


micheas's picture

Cloud file systems also tend to have high latency, so if you are on EC2, Rackspace Cloud, or one of the other providers with a virtualized filesystem, the gain from disabling .htaccess lookups might be significant.