Hi all,
We're relatively new to Nginx but have a few sites running on it without problems. We're now porting another site over, and this one shows a WSOD (white screen of death) on any Views pages (as well as the Views admin) and refuses to load the Admin module or the Rubik theme. It's truly odd. Linux permissions seem OK (triple-checked), and we can't see any other issues. We're not getting any meaningful errors in the error log, just the white screen. We installed dtools to get more information, and the errors coming back look exactly like this:
http://drupal.org/node/907594
But that's where it ends. Everyone reporting this found that a path_to_theme() call was the cause. None of our custom modules contain such a call, and just in case we've disabled all contrib modules that make one, to no avail.
The setup is Nginx, FastCGI and Varnish on a CentOS 5.5 server. To repeat, other sites on the same server don't have these issues. We suspect the server setup, because we can't reproduce these problems on local machines running Apache2. Moving the site between Apache2 instances on different distros causes no problems at all (same code, same DB), but on this one server it will not fly!
Has anyone seen anything like this before? We're totally stumped!

Comments
To debug a WSOD on the fly, you could put something like this in index.php:
<?php
// Debugging only: show every PHP error on the page instead of a white screen.
error_reporting(E_ALL);
ini_set('display_errors', TRUE);
ini_set('display_startup_errors', TRUE);
?>
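If displaying errors still leaves a blank page, a shutdown handler can write the last fatal error to PHP's configured error log instead. This is a minimal sketch, not part of Drupal or Pressflow: the function names wsod_describe_fatal() and wsod_log_fatal() are hypothetical, and it assumes you can add it temporarily near the top of index.php. It only uses the standard PHP functions register_shutdown_function() and error_get_last() (available since PHP 5.2).

```php
<?php
// Hypothetical helper: returns a one-line description when $error (as
// returned by error_get_last()) is a fatal error, or NULL otherwise.
function wsod_describe_fatal($error) {
  $fatal_types = E_ERROR | E_PARSE | E_CORE_ERROR | E_COMPILE_ERROR;
  if (!is_array($error) || !($error['type'] & $fatal_types)) {
    return NULL;
  }
  return sprintf('PHP fatal error: %s in %s on line %d',
    $error['message'], $error['file'], $error['line']);
}

// Shutdown functions still run after a fatal error, so this records the
// error behind the white screen even when display_errors is off.
function wsod_log_fatal() {
  $description = wsod_describe_fatal(error_get_last());
  if ($description !== NULL) {
    error_log($description);
  }
}

register_shutdown_function('wsod_log_fatal');
```

Remove it again once the cause is found, since it runs on every request.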
In this case it may be caused by insufficiently tuned FastCGI settings in your Nginx configuration.
Or you may need to raise the header buffer sizes, since the default values can cause (random!) WSODs as the amount of data stored in cookies changes. Some themes store really large values there, which can cause cascading errors that are hard to debug.
I would recommend changing one value at a time and restarting Nginx to see whether that helps.
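To see how close the cookies are getting to those limits, a quick diagnostic can measure each request header line as PHP received it. This is a sketch under assumptions: the helper names are hypothetical, and the byte count only approximates Nginx's own accounting. Nginx needs each header line to fit into one buffer of large_client_header_buffers (documented default "4 8k"), and a request that overflows is normally answered by Nginx itself before it reaches PHP, so this can only show requests that are getting close to the limit.

```php
<?php
// Hypothetical diagnostic helpers (not part of Drupal, Pressflow or Nginx).

// Rebuilds request header names from the HTTP_* keys PHP puts in $_SERVER,
// e.g. HTTP_COOKIE => "Cookie", HTTP_USER_AGENT => "User-Agent".
function request_headers_from_server(array $server) {
  $headers = array();
  foreach ($server as $key => $value) {
    if (strpos($key, 'HTTP_') === 0) {
      $words = strtolower(str_replace('_', ' ', substr($key, 5)));
      $headers[str_replace(' ', '-', ucwords($words))] = (string) $value;
    }
  }
  return $headers;
}

// Returns the header lines ("Name: value\r\n") longer than $buffer_bytes,
// keyed by header name, with their approximate size in bytes.
function oversized_header_lines(array $headers, $buffer_bytes) {
  $oversized = array();
  foreach ($headers as $name => $value) {
    $line_bytes = strlen($name) + strlen(': ') + strlen($value) + strlen("\r\n");
    if ($line_bytes > $buffer_bytes) {
      $oversized[$name] = $line_bytes;
    }
  }
  return $oversized;
}
```

For example, print_r(oversized_header_lines(request_headers_from_server($_SERVER), 8 * 1024)); on a temporary test page lists any header larger than one default 8k buffer; pass 1024 instead to see which headers no longer fit the default 1k client_header_buffer_size.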
Something to compare with: https://github.com/omega8cc/provision/blob/master/http/nginx/server.tpl.php
You could start with the values in the "Size Limits" section there.
Thanks!
However, we have made progress on this. Just as a test, our developer swapped out Pressflow for standard Drupal core, and it works fine! So something in Pressflow, combined with our particular set of modules and settings, seems to break this site badly, but only on this server. Standard core works; Pressflow breaks (badly!).
Does that change anything? Or would you still advise starting in the same place?
What version of PHP do you run on the server?
5.3
PHP 5.3.3:
$ php -v
PHP 5.3.3 (cli) (built: Jul 22 2010 17:12:45)
Copyright (c) 1997-2010 The PHP Group
Zend Engine v2.3.0, Copyright (c) 1998-2010 Zend Technologies
$ uname -a
Linux enigma1 2.6.35.7 #1 SMP Mon Oct 4 20:11:56 BST 2010 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/*-release
CentOS release 5.5 (Final)
Edit: I should note that other servers with similar module sets work fine; both Admin and Views are OK on the sister site on the same server... =/
Same issue
Or almost:
Setup: Pressflow 6.x (current Git), latest authcache-dev, latest cacherouter-dev, PHP-FPM 5.3.3, Nginx 0.8.54 (gzip enabled), APC (130 MB), memcache (64 MB for sessions) on a Linode 1024 VPS.
- with no cacherouter/authcache, pages are generated correctly
- with caching enabled, pages are generated correctly while the cache is being built (the first time the page is requested after emptying the cache)
- but after that, pages are rendered without the central content (only the header); sometimes some blocks are also rendered, but never everything.
I tried increasing the memory dedicated to APC/memcache... no luck.
Something really strange is that when the "cached" version is requested, a theme include generates an error (sites/all/modules/lecourrier/theme.inc). I had to add an (ugly)
require_once "/var/www/pressflow/" . drupal_get_path('module', 'lecourrier') . "/theme.inc";
to remove the error.
...
This could still be the same (or a related) issue if anything (module or theme) on this site relies on anonymous sessions/cookies (not used in Pressflow), as other differences between vanilla and Pressflow core shouldn't break things so badly.
I would suggest trying the header buffer values in Nginx anyway. After that, the list of modules you have there would help diagnose the problem, which is really interesting, since only a few modules are known to cause issues with Pressflow (and Varnish):
https://wiki.fourkitchens.com/display/PF/Modules+that+break+caching%2C+a...
Note: the Admin module is on that list (oops!), which explains why it fails when combined with Pressflow and Varnish. Still, it may also be related to the cookie issues mentioned there and to Nginx header buffer values that are too low, so try increasing those first.
Great link
Thank you again! Will look through.
Works with Pressflow on Apache2 with no Varnish, so it's a fair bet it's something like this...
No luck; it would have been too easy. The only "conflicting" module I had was Masquerade. I disabled it, with no effect.
fastcgi_intercept_errors on;
fastcgi_ignore_client_abort off;
fastcgi_connect_timeout 60;
fastcgi_send_timeout 180;
fastcgi_read_timeout 180;
fastcgi_buffer_size 256k;
fastcgi_buffers 4 256k;
fastcgi_busy_buffers_size 512k;
fastcgi_temp_file_write_size 512k;
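As a sanity check when tuning the fastcgi_* values above, the size rules Nginx enforces for these directives when loading its configuration (as far as I know them) can be written as a small function, along with the worst-case buffer memory per request. This is a planning sketch, not an Nginx tool; the function names are hypothetical and all sizes are in bytes.

```php
<?php
// Hypothetical sketch of the size rules Nginx applies to the fastcgi_*
// buffer directives when it loads its configuration (sizes in bytes).
function fastcgi_buffer_problems($buffer_size, $buffers_number, $buffers_size,
                                 $busy_buffers_size, $temp_file_write_size) {
  $problems = array();
  // Largest single buffer: fastcgi_buffer_size or one of fastcgi_buffers.
  $largest = max($buffer_size, $buffers_size);
  if ($busy_buffers_size < $largest) {
    $problems[] = 'fastcgi_busy_buffers_size is smaller than the largest single buffer';
  }
  // Busy buffers may not exceed all fastcgi_buffers minus one buffer.
  if ($busy_buffers_size > ($buffers_number - 1) * $buffers_size) {
    $problems[] = 'fastcgi_busy_buffers_size exceeds all fastcgi_buffers minus one buffer';
  }
  if ($temp_file_write_size < $largest) {
    $problems[] = 'fastcgi_temp_file_write_size is smaller than the largest single buffer';
  }
  return $problems;
}

// Rough upper bound on memory used to buffer one FastCGI response:
// the fastcgi_buffer_size buffer plus every fastcgi_buffers buffer.
function fastcgi_max_buffer_memory($buffer_size, $buffers_number, $buffers_size) {
  return $buffer_size + $buffers_number * $buffers_size;
}

$k = 1024;
// Values posted above: fastcgi_buffer_size 256k; fastcgi_buffers 4 256k;
// fastcgi_busy_buffers_size 512k; fastcgi_temp_file_write_size 512k.
print_r(fastcgi_buffer_problems(256 * $k, 4, 256 * $k, 512 * $k, 512 * $k));
echo fastcgi_max_buffer_memory(256 * $k, 4, 256 * $k), "\n"; // prints 1310720 (1.25 MB)
```

With the posted values no rule is violated, and each request can buffer up to 256k + 4 x 256k = 1.25 MB in memory, which is worth multiplying by the expected number of concurrent requests on a small VPS.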
What I don't understand is that I don't get any errors, either in the system log or in the dblog reports.
Try increasing the header buffers (not the FastCGI buffers) and watch the Nginx (not PHP) error log.
See https://github.com/omega8cc/provision/blob/master/http/nginx/server.tpl.... and increase the values of
client_header_buffer_size and large_client_header_buffers.
No luck
I set
client_body_buffer_size 128k;
client_header_buffer_size 64k;
client_max_body_size 100m;
large_client_header_buffers 32 64k;
with no change.
I went a bit further:
- Pages with a node in the content area display the node, but no blocks at all. Probably a problem with block caching; I'll dive into that code...
APC was the bad guy
It took me far too much time... umpf! I wonder why nobody else has hit that problem.
Ubuntu Maverick (64-bit) provides APC 3.1.3, which was buggy for me. I had to upgrade to (recompile) APC 3.1.7 from http://pecl.php.net/package/APC/3.1.7 (I didn't test 3.1.6).
=> now it works ;-)
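To check which APC version a server is actually running before chasing Pressflow or Nginx settings, phpversion('apc') returns the loaded extension's version string, or FALSE when APC isn't loaded. The sketch below compares it with 3.1.7, the release that fixed the problem in the report above; the helper name and its 'ok'/'upgrade'/'not loaded' labels are hypothetical, and 3.1.6 is flagged only because it wasn't tested.

```php
<?php
// Hypothetical helper: classifies an APC version string as returned by
// phpversion('apc'). 3.1.7 is the version reported to work in this thread;
// anything older (including the untested 3.1.6) is flagged for upgrade.
function apc_upgrade_status($version) {
  if ($version === FALSE || $version === NULL || $version === '') {
    return 'not loaded';
  }
  return version_compare($version, '3.1.7', '>=') ? 'ok' : 'upgrade';
}

// The CLI and PHP-FPM can load different php.ini files, so run this
// through the web server (e.g. a temporary test page) to be sure.
echo apc_upgrade_status(phpversion('apc')), "\n";
```

version_compare() compares each dotted component numerically, so a later release such as 3.1.13 still counts as newer than 3.1.7.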
Ah, then it (APC issue) was discussed here earlier: http://groups.drupal.org/node/113594