Pressflow/Drupal 6 performance-related patches & replacements for core functionality


Core Patches

Big Performance Gains - Low Risk

Cache module_implements() - 300ms
Optimize element_children() - 400ms
_menu_link_translate() might avoid calling _menu_load_objects() - 250ms (if using views module)
Clean up drupal_get_schema_versions() - 2000ms (admin status page).

Big Performance Gains - High Risk

Cache url() - 400ms (warning: breaks menu in Open Atrium)

Small Performance Gains

drupal_lookup_path() optimization - skip looking up certain paths in drupal_lookup_path() -- Not needed in Pressflow.
COUNT(*) is an expensive query in InnoDB.
taxonomy_get_vocabularies needs static caching
node_access issues four queries on default install node/1 and has no static cache
optimize _trigger_get_hook_aids()
Be more sparing with locale cache clears
Backport DRUPAL_ROOT to 6.x
bootstrap_invoke_all() queries bootstrap modules twice
Backport $conf['blocked_ips'] to D6
Performance fix in system_region_list() avoid an SQL query by using globals
Make drupal_attributes() faster
If item is hidden in _menu_tree_check_access() skip it right away.
Replace strtr() with str_replace() for db prefixing - Saves 300ms with 134 prefixes (CiviCRM) and around 579 DB queries
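
As a rough illustration of the last item above (swapping strtr() for str_replace() when prefixing tables), here is a minimal sketch. The function name example_prefix_tables() and its parameters are hypothetical and not taken from the patch; core's db_prefix_tables() reads its prefixes from the global $db_prefix instead of taking arguments.

```php
<?php
/**
 * Hypothetical sketch: prefix {table} placeholders using str_replace().
 *
 * @param string $sql      Query text with {table} placeholders.
 * @param array  $prefixes Per-table prefixes, e.g. array('users' => 'shared_').
 * @param string $default  Prefix for every table not listed in $prefixes.
 */
function example_prefix_tables($sql, array $prefixes, $default = '') {
  // Build the search and replace lists once, then make a single call
  // instead of one strtr() call per prefixed table.
  $search = array();
  $replace = array();
  foreach ($prefixes as $table => $prefix) {
    $search[] = '{' . $table . '}';
    $replace[] = $prefix . $table;
  }
  $sql = str_replace($search, $replace, $sql);
  // Any placeholder still wrapped in braces gets the default prefix.
  return str_replace(array('{', '}'), array($default, ''), $sql);
}
```

Whether this helps on a given site depends on how many per-table prefixes are configured; the 300ms figure above comes from a setup with 134 prefixes.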

Race Conditions

InnoDB - MySQL transient error handling - Fix for "Deadlock found when trying to get lock; try restarting transaction query"
Fixes "Warning: unlink() No such file or directory in file_delete()"
Rewrite module_rebuild_cache() and system_theme_data()
menu_masks variable is empty
Prevent variable_init from returning an empty array
Race Condition when using file_save_data FILE_EXISTS_REPLACE
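
The general idea behind the first item above (handling transient InnoDB errors by retrying) can be sketched as follows. This is not the code from the patch: Drupal 6's database layer does not throw exceptions, so the helper name example_retry_transient(), the use of RuntimeException, and the back-off values are all assumptions made for illustration. The error numbers are MySQL's own (1213 for a deadlock, 1205 for a lock wait timeout).

```php
<?php
/**
 * Hypothetical helper: run $operation and retry it on transient lock errors.
 *
 * $operation is assumed to throw a RuntimeException whose code is the
 * MySQL error number when the query fails.
 */
function example_retry_transient($operation, $max_attempts = 3) {
  // MySQL error numbers: 1213 = deadlock, 1205 = lock wait timeout.
  $transient = array(1213, 1205);
  for ($attempt = 1; ; $attempt++) {
    try {
      return call_user_func($operation);
    }
    catch (RuntimeException $e) {
      // Give up on non-transient errors or once the attempts are used up.
      if ($attempt >= $max_attempts || !in_array($e->getCode(), $transient, TRUE)) {
        throw $e;
      }
      // Back off briefly (10ms, 20ms, ...) before trying again.
      usleep(10000 * $attempt);
    }
  }
}
```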

Fix Bugs

book_get_books() cache becomes stale when batch-inserting book pages
MySQL silently fails: Got a packet bigger than "max_allowed_packet" bytes
Ensure that entries are written to watchdog table
Impossible to lock two MySQL tables
node_view() incompatible to php5
Impossible to insert serialized data with update_sql()
db_set_active errors out if db_connect fails. Add in the ability to handle this gracefully (don't always call _db_error_page)

Fix PHP Notices

PHP notice in form_builder function
Remove PHP warnings from legacy PHP4 XML parser for new update status release history XML
Notice: Undefined index: key in format_xml_elements()
Undefined index: configurable in actions_synchronize()
Notice: Undefined index: type in dblog_build_filter_query()
Function split() is deprecated in _filter_autop()

Contrib Patches

Performance

Views
views_get_default_view() - race conditions and memory usage
Persistent caching for unpack_options() calls from building displays

Imagecache
use lock.inc instead of a file lock

Replace Core's Functionality

Cache Backends

Memcache API and Integration (cache.inc, lock.inc, session.inc)
Cache Router (cache.inc)
Cache Backport (D7 to D6) allows use on d6 of various cache modules from d7 (memcache, apc, redis)

Cron

Elysia Cron
SuperCron
Ultimate Cron

Page only

Boost
Varnish HTTP Accelerator Integration
Authenticated User Page Caching (Authcache)

Search

Apache Solr Search Integration

Statistics

jStats
Google Analytics Statistics
Boost - Stats block.

CSS/JS file aggregation/optimization

Javascript Aggregator
Advanced CSS/JS Aggregation
CSS Embedded Images
Fork of CSS Embedded Images (for AdvAgg)

Watchdog

File logging
Rawlog
Redis Watchdog
Graylog2 GELF logging

HTTP Requests

cURL
HTTP Parallel Request Library - Replacement for drupal_http_request()

Block/Region/Pane Alt Rendering

Ajax Blocks
Ajaxify Regions
Views Javascript Random
Edge Side Includes integration - 2.x

3rd party performance analysis (SAAS)

New Relic
Tracelytics

Others

Imageinfo Cache - Cache image_get_info()
Fast 404s for static content. Be sure to view the README of the advagg module if you use this with advagg.
Taxonomy Edge - Overrides taxonomy_get_tree() and taxonomy_select_nodes().
views_optimized - Optimizes certain views queries into subqueries. See complete thread for details.
_content_type_info() memory usage improvements - Reduce CCK memory usage
views_calcpager - Uses the SQL_CALC_FOUND_ROWS hint to calculate counts for views pagers. On some data sets with some queries, this can be significantly faster (~50%) than executing a separate COUNT() query as is done in the standard views pager.
spaces_og_accelerated - Provides performance improvements for spaces_og by delaying or eliminating spaces' group node_load.

Comments

With patch Cache

superfedya's picture

With the "Cache module_implements()" patch: saves about 30-80ms.
With both "Cache module_implements()" and "Cache url()": the same time saving as with "Cache module_implements()" alone, but 20 fewer queries on the front page.

YMMV

mikeytown2's picture

Your Mileage May Vary. The numbers I put up were from a heavy site; it's usually easier to find gains on larger sites.

Why this?

perusio's picture

<?php
$script = (strpos($_SERVER['SERVER_SOFTWARE'], 'Apache') === FALSE) ? 'index.php' : '';
?>

This is Apache centric. Any decent Nginx config can handle ?q=<URI> request URIs, so why this?
I was unaware that this was in core. Needs fixing.
<?php
$script = in_array($_SERVER['SERVER_SOFTWARE'], array('Nginx', 'Apache')) == FALSE ? 'index.php' : '';
?>
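
One caveat with the in_array() version: $_SERVER['SERVER_SOFTWARE'] usually carries a version suffix (for example "nginx/1.0.11" or "Apache/2.2.22 (Ubuntu)"), so an exact match against 'Nginx' or 'Apache' would rarely succeed. A minimal sketch of a substring-based check is below; the function name example_script_name() is hypothetical and this is not code from core or Pressflow.

```php
<?php
/**
 * Hypothetical sketch: return '' for servers assumed to route ?q=<URI>
 * requests without an explicit script name, and 'index.php' otherwise.
 */
function example_script_name($server_software) {
  foreach (array('apache', 'nginx') as $name) {
    // Case-insensitive substring match, so version suffixes are ignored.
    if (stripos($server_software, $name) !== FALSE) {
      return '';
    }
  }
  return 'index.php';
}

// Usage: $script = example_script_name($_SERVER['SERVER_SOFTWARE']);
```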

Should be submitted to core patches and Pressflow

aruna.kulatunga's picture

Hi Perusio,

Might be an idea to submit this to core and also to the Pressflow issue queue on GitHub or pressflow.org

Another one

NITEMAN's picture

Please consider adding: http://drupal.org/node/881344

Best Regards

Not a patch

mikeytown2's picture

There is no patch in that issue.

--
Linux: Web Developer
Peter Bowey Computer Solutions
Australia: GMT+9:30
(¯`·..·[ Peter ]·..·´¯)

Pressflow

mikeytown2's picture

Pressflow already has the CDN patch applied.

CDN patch

rjbrown99's picture

FWIW, the current CDN patch in Pressflow core is not the same as the one included in the CDN module patches dir. There were some changes that have not been merged back to Pressflow.

http://drupalcode.org/project/cdn.git/blob/refs/heads/6.x-2.x:/patches/d...

cocomore

mikeytown2's picture

Just discovered http://drupal.cocomore.com/projects/cocomore-drupal-core-cdc so I will be looking through the patches for anything of interest.

Added Optimize element_children()

mikeytown2's picture

Gives me over 300ms improvement on a front page load.
http://drupal.org/node/1345204#comment-5259212

Another

rjbrown99's picture

This is a good one:
http://drupal.org/node/1061348

Potentially removes a non-cached database query from EVERY request, including requests that would result in an external cache hit since this is called higher in the bootstrap order than the external cache.

Another one, optimize menu_router_build() and _menu_router_save():
http://drupal.org/node/512962#comment-4487890

And another, cache menu_get_item():
http://drupal.org/node/643984

Yet another, static cache for bootstrap_invoke_all():
http://drupal.org/node/1103910

Small fix for system_region_list():
http://drupal.org/node/941334#comment-4751192

Make drupal_attributes() faster:
http://drupal.org/node/961908#comment-3673582

Index for menu_links:
http://drupal.org/node/371521

db_placeholders uses wasteful implode/array_fill functions:
http://drupal.org/node/485618#comment-4249368

I also run a slightly modified implementation of taxonomy caching, since I do not do any type of taxonomy access control and my entire site is basically built around taxonomy. This helped quite a lot and my hit rate is high for that bin:
http://drupalcode.org/project/advcache.git/blob/refs/heads/6.x-1.x:/DRUP...

Perhaps we should consider a Pressflow fork and merge request with some of these changes?

More patches that need work

mikeytown2's picture

Race Condition when using file_save_data FILE_EXISTS_REPLACE
http://drupal.org/node/818818

_menu_link_translate() might avoid calling _menu_load_objects()
http://drupal.org/node/753064

Regression: Unify and rewrite module_rebuild_cache() and system_theme_data()
http://drupal.org/node/147000

Better, multi-site friendly "www." addition/removal in .htaccess
http://drupal.org/node/352180

Replace watchdog

rjbrown99's picture

For another "replace watchdog", I use a variation of this:
http://drupal.org/project/gelf

This is the GELF log format for Graylog2, a scalable logging framework. Graylog2 currently uses MongoDB for storage but will soon use ElasticSearch in the next version. It also enables you to send arbitrary structured data and perform searches on it. It's very cool.

wiki

mikeytown2's picture

This is a wiki so feel free to edit it :)
I did add some of the patches you mentioned; none of them showed a big improvement, but on other systems they might. Also added in a new section called "Block/Region/Pane Alt Rendering". I encourage people to test out my sandbox project "Fork of Edge Side Includes integration" if this is something you're looking for.

Thanks!

omega8cc's picture

I created a Pressflow 6.22 clone with all patches from "Big Performance Gains" and "Small Performance Gains" sections:

https://github.com/omega8cc/pressflow6

Enjoy!

Great idea! I'd considered

cashwilliams's picture

Great idea! I'd considered doing this before, just never taken the time. Thanks, can't wait to check it out

Excellent!! I'll take a look

killua99's picture

Excellent!! I'll take a look :D

[at]killua99 ~~

thanks very much, I hear that

gateway69's picture

Thanks very much. I hear that while some patches may improve performance, they could also increase memory usage. Is there a breakdown of which patches eat more memory but also provide a boost?

My preliminary tests in dev

rodricels's picture

My preliminary tests in dev environment:

large drupal 6 site
php 5.2 and 5.3
memcache
drupal no cache
mysql percona 5.5

gains ~100ms execution time :D

"Official" pressflow 6

justindavis's picture

It looks like you've patched this release to Drupal 6.24 -- the "official" Pressflow release on github still shows up as 6.22. Would you consider your branch safe for "production" sites?

Thank you regardless!

Official Pressflow is already at 6.23

omega8cc's picture

But we are already using this 6.24 +Extra fork in production. Note that it also includes two extra patches to fix major issues with 6.24:

http://drupal.org/node/1425868#comment-5565092
http://drupal.org/node/1425260#comment-5550350

We also removed one patch previously applied - 'Cache url() - 400ms' - because it breaks menus in Open Atrium.

I hope this helps.

Cache url

rjbrown99's picture

I removed 'Cache url()' as well, but for a different reason - it was causing a tremendous amount of activity around locking and the semaphore table. My Newrelic graphs showed a huge spike with locks, and this is with the 'make lock_wait() wait less' applied and backing the locking subsystem in Memcache. Backing out that patch put me back to normal.

Issue

mikeytown2's picture

Can you provide more details in the issue?
http://drupal.org/node/1327720#comment-5607160

Not a lot...

rjbrown99's picture

Unfortunately I don't have a lot to show for it. It didn't show any issues in dev or test, then it rolled to prod during a low point for our traffic. We started to see lots of locking activity and rolled it back. Our dev/test load testing tools are not a perfect representation of prod, plus we don't have NewRelic hooked to those environments so it is more likely than not that we wouldn't have noticed.

Core Statistics Module

mikeytown2's picture

Added in a Statistics section with jStats, Google Analytics Statistics, and Boost. Any others? I included GA Stats because of this issue: "Google Analytics Statistics - Drupal 6 feasibility".

As far as I got my benchmarks

tugis's picture

If my benchmarks are correct, both Pressflow and the patches included in this wiki mostly improve a system with a "warmed cache". I saw big gains on pages that had already been visited (around 1500ms average per page, for authenticated users!).

If you are visiting a page for the first time and your system is complex, you still have to carefully look at your code and the contributed modules you added to it.

Cck

rjbrown99's picture

Here's one for CCK that I had not seen before. I realize this page is mostly focused on core but it looked interesting. I have not tested it myself so I am not posting to the wiki above yet.

http://drupal.org/node/1009596

A few additional patches

rjbrown99's picture

Other new patches, that I have not tested:

Replace strtr() with str_replace() for db prefixing
Reduce {system} database hits (needs backport from 7)
Add caching for system_list() (needs backport from 7)
Optimize menu_tree_page_data() performance

I am also using the following patch on my pressflow fork with no problems. It allowed me to update all of my custom (non-released) modules to using the constant which keeps them more in-line with what I'd need for D7. Here is what my commit looks like.
Define REQUEST_TIME as a constant (needs backport from 7)

Interesting

rjbrown99's picture

And for a completely different approach, how about just implementing some of the core Drupal functions as a PHP extension?

Drupal PHP Extension

There is code in the tree if you check it out directly. I have not tested it, but it's an interesting idea. Less intrusive than just rolling the entire Drupal PHP environment into something like HipHop.

git clone --branch master http://git.drupal.org/project/drupal_php_ext.git
phpize
./configure
make

I always thought this was a

gateway69's picture

I always thought this was a good approach: take parts of a framework and compile them in C as a PHP extension or similar. This can definitely help speed things up, which is why Facebook went the HipHop route.

Agree

rjbrown99's picture

I agree - HipHop has been a bit of a pain to work with, at least for me. Just getting the base packages up and running with the patched curl and libevent takes a bit of time and effort. It does not seem that the Drupal PHP extension has any type of momentum so this is more just my admiration for the concept. I also remember C about as well as 11th grade Spanish so I'm not going to be any help in trying to move the ball forward.

Unfortunately I see this a

Spechal's picture

Unfortunately I see this a lot with Drupal. "Let's not fix the issue, let's just mask it. We can use memcached, opcode caching, huge sql buffer pools, varnish/nginx, oh, and we can even turn various Drupal parts into a PHP extension!"

How about making a crappy site A LOT better and then making it even better with the aforementioned tools.

Not bashing Drupal, just those that say to mask the problem and move on.

Let the negative votes and flamers commence...

For me...

rjbrown99's picture

I can tell you that for me, I very much like to fix my issues and do so regularly. I have fixed countless issues on my primary site, almost all of the major ones caused by me implementing things incorrectly. But after spending much time with NewRelic and friends, there comes a point where the extra speed boosts aren't going to come from fixing my own code or undoing things like embedded views, pages with tons of uncached node teasers, and the like since they have already been optimized.

So that's where we come to caching and the other tricks. Why do I care about this? Easy - it costs $$ to scale a large Drupal site in the cloud, and squeezing out that extra bit of performance may lower the number of instances needed or the type of machines required to run the site. So we hack about and test/try/fix. Most of this is just applying things that have already been done in D7 or elsewhere.

At least for me, it's not masking the problem as much as it's eking out that extra bit of performance. No negativity or flames here - this issue and wiki isn't for everyone, but it seems like it has found an audience so others have identified a similar need. That's the great thing about Drupal - community. I couldn't do what I have done so far without all of the help and contributions of the other folks out there.

Premature optimization ...

kbahey's picture

Premature optimization is the root of all evil -- Donald Knuth.

Drupal performance tuning, development, customization and consulting: 2bits.com, Inc..
Personal blog: Baheyeldin.com.

Thank you! That is what

Spechal's picture

Thank you! That is what development and scaling is all about. Getting everything you can from the code before throwing money/hardware at it.

Fork

rjbrown99's picture

I also did mostly the same thing as omega8cc, which is to fork off Pressflow with these patches. My tree is here:
https://github.com/rjbrown99/6

There are a few differences which you should be able to read about on the wiki page on that site. It has links to all of the patches that were applied, and I'm also keeping up with both Pressflow and the 6.25-pre tree which as of today has no real commits after the 6.24 release. I also have a few extra unapplied but otherwise ready-to-go patches in the patches directory, including the "Cache url()" which could break Open Atrium per the above notes. Those patches should cleanly apply in the top of the tree with a patch -p0 if you want them.

I have also started to implement a few NewRelic-specific calls in the tree to optimize my results. Right now, this just focuses on ignoring batchapi calls for purposes of the Apdex calculation since my daily batch runs were throwing off my numbers. I'm going to add more as I find opportunities.

I plan to keep this tree updated as I am using it for prod, but your mileage may vary.

Issue

rjbrown99's picture

I ran into an issue with Cache menu_get_item() backport to Drupal 6, and it seems as if I am not the only one. This patch was reverted in Drupal 7. I backed it out of my tree, and I'm going to remove it from the wiki above as well.

hmm when updating with your

gateway69's picture

hmm when updating with your fork today I got the following errors on update

Update #6006
Failed: ALTER TABLE comment DROP INDEX comment_uid
Failed: ALTER TABLE comment DROP INDEX nid
Failed: ALTER TABLE comment DROP INDEX pid
Failed: ALTER TABLE comment DROP INDEX status
Failed: ALTER TABLE comment ADD INDEX comment_pid_status (pid, status)
Failed: ALTER TABLE comment ADD INDEX comment_num_new (nid, timestamp, status)

Patch

rjbrown99's picture

Ah yes, typo with this patch. Fixed.

By the way, I'm not encouraging folks to use this fork. It's me experimenting with lots of the performance patches and some PHP 5.2/5.3 only features. You are welcome to, but keep in mind that it may break. The wiki should be reasonably up-to-date with the changes.

Equally interesting

perusio's picture

is the move by MediaWiki to adopt Lua as a templating macro language:

http://www.reddit.com/r/programming/comments/p4sve/wikipedia_chooses_lua...

http://svn.wikimedia.org/viewvc/mediawiki/trunk/php/luasandbox/

Lua is small, fast and is integrated with Nginx.

Need to know when pressflow-6.24 will be released

fulgent's picture

Hi All,

I have a site running on the latest Drupal version, 6.24.
I need to migrate it to Pressflow, so my questions are:

1) Will moving to Pressflow have any negative impact on my site?
2) When will Pressflow 6.24 be released?
3) How can I update my system to Pressflow?

Thanks in advance!!

The reason Pressflow is not

Mark Theunissen's picture

The reason Pressflow is not upgraded to 6.24 is because there is a merge conflict with some changes to the bootstrap process (and time, of course). David is working on this, as you can see in the Github issue here.

Fork Pressflow if you absolutely have to, but please create pull requests on Github. Just like with Drupal, it makes sense to work on a common platform instead of everyone maintaining their own forks.

All the discussion going on here, around which new patches should be included, is great. Though it should really be moved to the Github issue queue so that we can follow them.

Thanks everyone!

Mark Theunissen
(Four Kitchens)

module_list

mikeytown2's picture

Been doing some benchmarks and module_list can be an issue sometimes. In D7 this is cached. Eventually I would like to back port this as it might save around 100ms per page (if mysql is acting up). Under normal conditions the query returns in about 50ms so this should be a nice speed improvement either way.

D6: module_list()
D7: module_list() -> system_list()

Does anyone know which issues were created that changed this in D7?
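
To make the idea concrete, here is a minimal sketch of caching the module list so the query only runs on a cache miss. It is not the D7 implementation: the function name example_system_list(), the cache ID, and the $GLOBALS array standing in for a persistent cache bin are all hypothetical. A real backport would presumably use cache_get() and cache_set() so the result survives across requests.

```php
<?php
// Hypothetical stand-in for a persistent cache bin. A real backport would
// use cache_get()/cache_set() instead of a per-request array.
$GLOBALS['example_cache'] = array();

/**
 * Hypothetical sketch: return the module list, querying only on a miss.
 *
 * @param callable $query_callback Runs the expensive {system} query.
 */
function example_system_list($query_callback) {
  $cid = 'example:system_list';
  if (isset($GLOBALS['example_cache'][$cid])) {
    // Cache hit: no database round trip.
    return $GLOBALS['example_cache'][$cid];
  }
  // Cache miss: run the query once and remember the result.
  $list = call_user_func($query_callback);
  $GLOBALS['example_cache'][$cid] = $list;
  return $list;
}
```

The cached entry would need to be cleared whenever modules are enabled or disabled; that part is left out of the sketch.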

Ubercart

mikeytown2's picture

I have a core patch of interest for people who might be using Ubercart. It's a backport of an issue I found in D8/D7 for D6. If item is hidden in _menu_tree_check_access() skip it right away. There are reports that this patch reduced 1200 queries down to 200 queries.
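
The shape of that change can be sketched roughly as below. This is a simplified illustration and not the patch itself: example_check_access() and its $access_callback parameter are hypothetical, and the real _menu_tree_check_access() works on a nested tree by reference.

```php
<?php
/**
 * Hypothetical sketch: filter menu items, skipping hidden ones before
 * running the (potentially expensive) access check.
 */
function example_check_access(array $tree, $access_callback) {
  $visible = array();
  foreach ($tree as $key => $item) {
    // Hidden links are never rendered, so bail out before the access
    // check, which may trigger object loads and queries.
    if (!empty($item['hidden'])) {
      continue;
    }
    if (call_user_func($access_callback, $item)) {
      $visible[$key] = $item;
    }
  }
  return $visible;
}
```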

There is no 6.28 version on

superfedya's picture

There is no 6.28 version on this +Extra?

Thanks

It is in a separate branch:

omega8cc's picture

It is in a separate branch: https://github.com/omega8cc/pressflow6/tree/pressflow-plus-6.28.1 and download available via tags page: https://github.com/omega8cc/pressflow6/tags

I will merge it in master there, to avoid confusion.
