I was not sure where to post this question. Apologies if this is not the correct place.
I've been looking at PressFlow and it looks nice that it already comes packaged with several performance related patches. Thanks to those who made it available.
However, compared to the latest Drupal snapshot, it seems to be missing the latest Drupal core patches. If this only affects minor fixes that's OK, but it seems it could also happen with security-related issues?
I'm considering using PressFlow for a high-traffic site I'm porting to Drupal, but I'm wondering how often PressFlow is updated and how PressFlow users are notified of such updates.
Also, is there any public resource for PressFlow support requests, bug reports, or documentation of the patches included in PressFlow, so that developers and site administrators can take them further should they need to for their particular use case?
Would it be possible to release the performance-related patches in PressFlow separately? That way we could simply get Drupal when a new version comes out, and then apply the PressFlow patches.

Comments
In addition...
What is the current "version" of PressFlow?
Does it contain the latest 6.13 security fixes?
How quickly is PressFlow updated after a security release/patch is made public?
In all honesty...
PressFlow is a performance-based modified derivative of the Drupal CMS platform. You should really look at contacting someone at Four Kitchens on these questions.
At Morris Communications, we are currently using the PressFlow 6 platform for a number of sites we are migrating to it, but we are also making a number of changes to the platform to work better with load balancing, to integrate the memcache/authcache modules, and to support unique squid/squirm patterns.
Just so you know, you cannot take a D6 patch and apply it to a PressFlow 6 build. It won't work.
-- Michael
Thanks, I already contacted Four Kitchens
I figured I would have to ask these questions using their contact form, but that was a while after posting this here.
As for maintenance, 4
As for maintenance, Four Kitchens has already committed to immediately backporting all drupal.org security patches. economist.com and others required this level of service before adopting Pressflow. As a developer on economist.com, I can tell you that Pressflow now has a track record of fulfilling this promise. I would not worry about this.
The pressflow patches are not itemized on the web AFAIK but you can discern them by diffing the code. Note their bzr repository is public. See http://fourkitchens.com/blog/2009/01/17/distributed-version-control-prov...
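A minimal sketch of the "diff the code" approach, assuming you have an unpacked vanilla Drupal 6 tree and a Pressflow 6 checkout of the matching version side by side. The function name `changed_files` is hypothetical, and Python is used only for illustration; a plain recursive `diff` does the same job. It lists the files whose contents differ or that exist on one side only, which is the set you would then read patch by patch.

```python
import filecmp
import os


def changed_files(vanilla_dir, patched_dir):
    """Return sorted relative paths that differ between two source trees."""
    changed = []

    def walk(cmp, prefix):
        # Compare contents byte by byte (shallow=False) instead of
        # trusting size and modification time alone.
        _, mismatch, errors = filecmp.cmpfiles(
            cmp.left, cmp.right, cmp.common_files, shallow=False
        )
        for name in mismatch + errors:
            changed.append(os.path.join(prefix, name))
        # Files or directories that exist in only one of the trees.
        for name in cmp.left_only + cmp.right_only:
            changed.append(os.path.join(prefix, name))
        # Recurse into directories present on both sides.
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(vanilla_dir, patched_dir), "")
    return sorted(changed)
```

Calling `changed_files()` with the two directory paths gives the list of files to inspect; comparing against a Drupal release other than the one Pressflow was based on would mix upstream changes into the result.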
It would be nice to have a forum for pressflow users. Until we have that, I think this High Performance group is a pretty good proxy.
Pressflow takes great pains to remain API and schema compatible with core. There is little reason to avoid using it IMO.
Moshe, Thank you for your
Moshe,
Thank you for your reply. This is exactly what I wanted to hear.
I'm happy to hear the code is being maintained to such a high standard.
Sure, Thanks for the input
In fact, my interest in PF was more from an open source POV than a commercial interest.
About a couple of years ago we launched a site based on Drupal 5 that I had to tweak for performance and to support reverse proxy caching for anonymous users, as well as certain pages for authenticated users. I wrote about it here at g.d.o.
Since then, I participated in several Drupal core issues related to reverse proxy caching, in one way or another (see #147310 Implement better cache headers for reverse proxies). And my focus here was mostly related to Drupal 6. Why? Because we're working to port the main site (which is currently using a proprietary CMS) to Drupal 6 (too early for D7) with additional SN features for our user base. The problem seems to be that it is perhaps too late to apply certain patches to Drupal 6, even if these patches "take great pains to remain API and schema compatible with core", so now that we're at the latest stages of the project, I'm starting to research a way to use a Drupal 6 installation that works well with reverse proxies (Squid in our case). The site gets about 2 million pageviews a day, more or less, and we expect to raise our current numbers with the new site, so these things are pretty important for us.
I have 2 options: work out my own patch to Drupal 6 based on the work done in the Drupal issue queue, or... guess what, this has already been done here for PF, which is great. However, since PF is not a community initiative, I had some doubts about using PF directly or just as a source of stuff that has, more or less, been cooked in the Drupal issue queue.
The problem I see in PF, from my POV, is that it seems to be managed from a commercial approach, which is perfectly reasonable. But I have the feeling it will potentially benefit fewer people because the features it includes are not openly managed as they would be in a community initiative.
I'm wondering if the page caching stuff in PF could be committed to the Drupal 6 branch. Maybe it would be possible if these enhancements are really "API and schema compatible with core". But I'm afraid Drupal core maintainers are busy with other things right now. Anyway, it would be nice if this approach was analyzed, I think.
If it were possible, then it would also be easy to share additional enhancements others could potentially add, for example to manage page caching for certain pages of the site for authenticated users, or to share the tricks to do so with the currently distributed page caching patch.
[EDIT] Following a link to Drupal.org Now With Caching I've found this issue in the Infra queue (#466444 Reverse Proxy Patch) where the page caching and lazy session patches are being used at d.o. I posted there about the idea of managing these patches using community shared resources, so it's easy for others to benefit from them, as well as to keep contributing back based on experience, which could then be easily shared, I think. And this could potentially benefit D7 as well. The more of us who use this for D6, the more chances to return value to the community, and that means D7 and beyond.
@Swampcritter: I have to
@Swampcritter: I have to disagree when you say "you cannot take a D6 patch and apply it to a PressFlow 6 build. It won't work." On the contrary I'd say that usually core patches apply fine to Pressflow; only in edge cases there might be incompatibilities. As a rule of thumb I'd say that a core patch that doesn't apply cleanly to Pressflow should be inspected twice before suspecting Pressflow.
For Pressflow users that are not clients of Four Kitchens, the main problem with Pressflow is that it's pretty opaque what happens, and when; e.g. there is no central platform where releases are announced or packaged releases can be downloaded; Pressflow used to be on Launchpad but seems to have abandoned this platform in favour of GitHub. The changes from Drupal 6.23 (SA-CORE-2012-001) seem to have made it into GitHub, but I cannot find any trace of the changes from Drupal 6.24 so far. I don't doubt that these changes will make it into Pressflow sometime; it's simply more complicated to follow since there doesn't seem to be a consistent release or announcement procedure (at least none I have understood so far).
You can also try our
You can also try our Pressflow 6.24 +Extra fork available on GitHub: https://github.com/omega8cc/pressflow6
We merged in all latest vanilla 6.24 changes without issues. It includes also some extra patches discussed here: http://groups.drupal.org/node/187209#comment-650678
Enjoy!
Thanks for the pointer!
Thanks for the pointer! I'm still confused since I can't find a reference to Pressflow 6.24 anywhere; GitHub currently gives me 'pressflow-6-6.23-0-gf5c736a.zip' from master. From which codebase are you operating, and how safe is it to use 'omega8cc-pressflow6-pressflow-6.24-plus-5-g4b05a21.zip' in production? If I were to use your fork, what kind of testing should I do prior to deploying? Or would it be advisable to just go from Pressflow 6.22.x to 'pressflow-6-6.23-0-gf5c736a.zip' and run 'omega8cc-pressflow6-pressflow-6.24-plus-5-g4b05a21.zip' in sandbox environments?
One WSOD
I deployed 'omega8cc-pressflow6-pressflow-6.24-plus-5-xxx.zip' on a number of production sites; so far all is working well on roughly two dozen sites, except for one site that dies with a WSOD; from Apache's error.log:
[Sat Feb 11 02:32:17 2012] [error] [client 123.456.789.123] PHP Fatal error: Allowed memory size of 262144000 bytes exhausted (tried to allocate 24 bytes) in /var/www/drupal/modules/taxonomy/taxonomy.module on line 908
I raised the PHP memory limit to 1024M (default: 250M), which did not resolve this issue. I also downgraded to a vanilla Drupal core 6.24, with the same result, so it's a problem with one of the newly introduced bugfixes.
Too many terms
First, try 300M and see if it works, before you go all the way to 1024M.
You probably have too many terms, most likely free tagging, with lots of terms.
How many rows are in the term_data table?
On a site with over 26,000 rows, we used this to clean the tables.
CAUTION! UNTESTED! DANGEROUS!
Backup the database first, test on a development server, then run them, and see what they do, specifically the row count in term_data and term_node before/after.
If all is well, then document the process and then we can run it on the live server.
Delete long tags
delete from term_node where tid in (select tid from term_data where length(name) > 40);
delete from term_hierarchy where tid in (select tid from term_data where length(name) > 40);
delete from term_data where length(name) > 40;
Delete tags used only once
1 - First get the TIDs
mysql -uroot -p -e "select tid, ',' from term_node group by tid having count(nid) <= 1 order by tid" live > tids.txt
2 - Use the tids in the following statements
delete from term_node where tid in ();
delete from term_hierarchy where tid in ();
delete from term_data where tid in ();
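The two steps above can be sketched end to end as follows. This is only an illustration under stated assumptions: it uses Python with an in-memory SQLite database standing in for MySQL, the table layout is reduced to the columns the queries above touch (the real Drupal 6 tables have more), and the helper names `single_use_tids`, `delete_terms` and `make_demo_db` are made up for this example. The same caution applies: try it on a copy of the database first.

```python
import sqlite3


def single_use_tids(conn):
    """Return tids that are attached to at most one node, as in step 1."""
    rows = conn.execute(
        "SELECT tid FROM term_node GROUP BY tid "
        "HAVING COUNT(nid) <= 1 ORDER BY tid"
    ).fetchall()
    return [row[0] for row in rows]


def delete_terms(conn, tids):
    """Delete the given tids from the three taxonomy tables, as in step 2."""
    if not tids:
        return  # "IN ()" is not valid SQL, so skip empty lists
    placeholders = ",".join("?" for _ in tids)
    # Table names come from this fixed tuple, never from user input.
    for table in ("term_node", "term_hierarchy", "term_data"):
        conn.execute(f"DELETE FROM {table} WHERE tid IN ({placeholders})", tids)
    conn.commit()


def make_demo_db():
    """Build a tiny in-memory database with a simplified, assumed schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE term_data (tid INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE term_hierarchy (tid INTEGER, parent INTEGER);
        CREATE TABLE term_node (nid INTEGER, tid INTEGER);
        INSERT INTO term_data VALUES (1, 'drupal'), (2, 'typo-tag'), (3, 'squid');
        INSERT INTO term_hierarchy VALUES (1, 0), (2, 0), (3, 0);
        INSERT INTO term_node VALUES (10, 1), (11, 1), (12, 2), (13, 3), (14, 3);
    """)
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    tids = single_use_tids(conn)
    delete_terms(conn, tids)
    print("deleted tids:", tids)
```

Because the selection reads term_node, terms that are attached to no node at all are not picked up by it. For a list of many thousands of tids it would probably be safer to delete in batches than in one statement.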
Drupal performance tuning, development, customization and consulting: 2bits.com, Inc..
Personal blog: Baheyeldin.com.
Core Replacement for taxonomy_get_tree
Here is line 908. It's inside of taxonomy_get_tree(). If removing terms is not an option for you then take a look at http://drupal.org/project/taxonomy_edge (link to this module is near the bottom of the wiki). Note that the module requires one to patch core for it to work correctly.
Good Luck!
What is "too many terms"?
Hi Khalid,
thanks for your suggestions; the site has around 30k nodes and a couple of hundred users, which is far from "large" in my understanding (Wikipedia is large, with several million nodes, running on the same software stack as Drupal); I have managed larger sites even with Microsoft Frontpage, purely file system based and without any database backing; I simply refuse to believe that Drupal scales significantly worse than a Windows-based HTML editor.
The term_data table for the WSOD site reports 6,536 rows, which I'd consider quite small; I have other D6 sites with significantly more items in the term_data table (>15k rows) which run smoothly; and even 15k rows in a database table are peanuts, a database does nothing but manage large sets of data. If a relational database management system freaked out at 15k, or 25k, or 50k, or even 150k rows of data, it'd be total crap. So I'm not sure what "too many terms" means; I have never read about hard limits for the size of Drupal's taxonomy; if there were known limits, they would have to be documented prominently - very prominently, IMHO. Actually, such limits would be total showstoppers.
Last but not least - even if I accepted taxonomy limits as a fact - this wouldn't explain why the site dies after a minor maintenance update (which is actually supposed to have resolved memory issues; see "taxonomy_get_tree() memory issues").
When I'm running into potentially memory related WSODs, I'm increasing the memory in steps like 250, 275, 300, 350, 512, 1024, 2048, 4096. In this case, the stepping absolutely doesn't matter.
Long tags:
mysql> select * from term_node where tid in (select tid from term_data where length(name) > 40);
239 rows in set (0.51 sec)
mysql> select * from term_hierarchy where tid in (select tid from term_data where length(name) > 40);
74 rows in set (0.02 sec)
mysql> select * from term_data where length(name) > 40;
72 rows in set (0.00 sec)
Do you see a problem here? I don't. For amounts of data like this I don't even need spreadsheet software, just some sheets of paper. I simply cannot believe that Drupal has any problems with this, and I might be forced to start castrating the site.
I apologize if this posting sounds sceptical; I am no expert in Drupal programming, but I have been using databases for over two decades and believe I know what they were able to do when dBase was considered high tech. I simply cannot imagine that Drupal regresses below what was possible 25 years ago with megabytes of system memory. Even more, I'm scared shi**ess that I may have invested years of my life in software that might not even be able to process the data equivalent of a couple of books. I was - and I still am - expecting Drupal to be able to scale to the data equivalent of a full-grown library. Like, for example, the MediaWiki software does.
False assumption
You can't compare MediaWiki to Drupal. They are totally different platforms. Yes, they use PHP, but they are different.
The issue here is that taxonomy module was written initially without Free Tagging at all, and then free tagging was added, and people use it in ways that were never imagined. Same thing when we cached all URL aliases in memory, and then changed it to be a database lookup in the next release, because people used the software in ways the developers never imagined.
This is not the database that is blowing out, it is the data structure within the taxonomy_get_tree() function that is using up too much memory building a tree. Here is the source code.
The issue you mentioned fixes the problem in Drupal 7.x, but you are using Drupal 6.x still. That is why. Try to backport the patch.
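To illustrate the general idea of building a term tree without recursion (this is not Drupal's code, just a Python sketch under assumptions), the flattening can be done with one pass to group children by parent and an explicit stack to walk them. The function name `flatten_tree` and the `(tid, parent)` row shape are hypothetical, and the sketch assumes the hierarchy contains no cycles.

```python
from collections import defaultdict


def flatten_tree(rows, root=0):
    """Flatten (tid, parent_tid) rows into [(tid, depth)] in depth-first order."""
    # One pass over the rows: group child tids under their parent.
    children = defaultdict(list)
    for tid, parent in rows:
        children[parent].append(tid)

    result = []
    # Explicit stack of (tid, depth); reversed so the first child is popped first.
    stack = [(tid, 0) for tid in reversed(children[root])]
    while stack:
        tid, depth = stack.pop()
        result.append((tid, depth))
        for child in reversed(children.get(tid, [])):
            stack.append((child, depth + 1))
    return result


if __name__ == "__main__":
    demo_rows = [(1, 0), (2, 1), (3, 1), (4, 0), (5, 3)]
    for tid, depth in flatten_tree(demo_rows):
        print("  " * depth + str(tid))
```

The work done here grows with the number of rows, and the stack holds only tids and depths rather than nested function calls, which is the kind of saving a non-recursive build is after.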
I mentioned one case where we had 26,000 rows and faced memory issues. We trimmed the term_data table to 14,522 rows and the site runs with PHP's memory_limit set to 150MB, without any issues. YMMV.
This is just one way. I can't believe that 1024MB is not enough. We seldom see sites needing 256MB.
@Khalid: You seldom see sites
@Khalid: You seldom see sites needing a PHP memory limit of 256M, because you are willing to sacrifice data (like 10k taxonomy terms) and power (like 'Views' vs. manually coded SQL) in favour of slim and fast setups. That's a valid approach, but it certainly isn't mine. As Mike put it, removing terms from live sites is not really an option for me. If we have to start deleting data from our Drupal sites, that would simply mean "Goodbye Drupal" for us. However, a lack of memory is not the issue here, at least not within sane boundaries. I can set the PHP memory limit even to 4096M, and the site still WSOD's; it just takes longer, and emits more PHP fatal errors into Apache's watchdog log (actually I don't know which PHP memory limit is really used by Drupal; the "Suhosin" crap might interfere (it has before), but I don't see traces of Suhosin interventions in my logs).
The mentioned issue taxonomy_get_tree() memory issues is listed in the changelog for Drupal core 6.24 ("#556842 by mh86, catch, bangpound, wojtha, deviantintegral: optimize taxonomy_get_tree() by building the tree internally instead of recursively to improve performance; especially good for bigger taxonomies"), that's why I stumbled over it. The backport against 6.x-dev was accomplished by 'wojtha' on April 6, 2011, and it passed 190 MySQL simpletests; an enhanced patch by 'deviantintegral' followed last October, and was finally committed by Gábor in January (#119; I did not check the commit logs because I still trust what core maintainers are saying in the issue queues). So if the taxonomy_get_tree() issue is actually related to my WSOD, it should be resolved for D6, or the patch has introduced a major regression.
(Off topic: Yes, IMHO I can and I will continue to compare Drupal and MediaWiki. They are not "totally different platforms", they are closely related and share more than just the application language. They share, for example, the complete software stack they're based on, they're both developed as Free software, they both follow the node paradigm of hypertext, etc.; and I'm even using MediaWiki markup in my Drupal sites, btw).
@Mike: Thanks for the pointer; I read the reference to 'taxonomy_edge' in the taxonomy_get_tree() issue and installed the module on the WSOD site. Drush reports:
Taxonomy edges rebuilt: 20573 processed with depth 8 in 0.324 seconds
The core patch is labeled "taxonomy-6.20.patch" and might not be rolled against a current Drupal core release; applying it against 6.24 fails:
# patch < taxonomy-6.20.patch
patching file taxonomy.module
Hunk #1 FAILED at 835.
Hunk #2 FAILED at 1130.
2 out of 2 hunks FAILED -- saving rejects to file taxonomy.module.rej
Without the patch (according to README.txt the patch is not required, but recommended), 'taxonomy_edge' does not resolve the WSOD. However, the module also has other requirements, like "Elysia Cron or Parallel Cron [...] for cronjob to work" which my site doesn't match. Well it was worth a try (follow-up issue).
Generally: The "Pressflow 6.24 +Extra fork" referenced in http://groups.drupal.org/node/25689#comment-684948 so far hasn't caused similar issues on roughly two dozen other Drupal sites (directly updated from Pressflow 6.22.x to 'omega8cc-pressflow6-pressflow-6.24-plus-5'). However, at least for logged-in users, the sites perform significantly slower (according to my stats, currently with 99.9% hits for the APC opcode cache, and 67.7% hits for memcached); e.g. saving a node now takes 8-14 seconds, as opposed to 2-4 seconds with Pressflow 6.22. That's probably one of the reasons why there isn't an official Pressflow 6.24 yet.
You are missing the point
If you need 1024MB for many pages, then with an average of 35 PHP processes, how big is the memory required for a server? What about a large site with 50+ processes? You see where this is going?
All this says something is seriously wrong, and there are many ways to deal with it, backport a patch, trim the dataset from spelling mistakes and such, ...etc.
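The arithmetic behind that question, as a small Python sketch. It assumes the worst case where every PHP process reaches its memory limit at the same time, which is an upper bound rather than typical usage; the function name is made up for this example.

```python
def worst_case_memory_mb(memory_limit_mb, php_processes):
    """Upper bound on PHP memory if every process reaches its limit at once."""
    return memory_limit_mb * php_processes


if __name__ == "__main__":
    # (limit in MB, number of PHP processes) taken from the figures in this thread
    for limit_mb, processes in [(1024, 35), (1024, 50), (150, 35)]:
        total_mb = worst_case_memory_mb(limit_mb, processes)
        print(f"{limit_mb} MB x {processes} processes = {total_mb / 1024:.1f} GB")
```

With a 1024 MB limit, 35 processes could claim up to 35 GB and 50 processes up to 50 GB, before counting the database, the opcode cache or the operating system.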
Quick update: Downgraded to a
Quick update: Downgraded to a vanilla Drupal core 6.23, and the site was back immediately (Pressflow 6.23.x works smoothly, as well); so far I only get a WSOD when accessing the 'Status report' page. This might be a different WSOD since it isn't logged in Apache's error log, but in syslog:
Feb 11 15:06:19 {myhost} suhosin[18545]: ALERT - script tried to increase memory_limit to 262144000 bytes which is above the allowed value (attacker 'REMOTE_ADDR not set', file '/var/www/drupal/sites/default/settings.php', line 140)
Feb 11 15:06:30 {myhost} kernel: grsec: From {myip}: Segmentation fault occurred at 00000021000047e4 in /usr/lib/apache2/mpm-prefork/apache2[apache2:18404] uid/euid:33/33 gid/egid:33/33, parent /usr/lib/apache2/mpm-prefork/apache2[apache2:28247] uid/euid:0/0 gid/egid:0/0
According to phpinfo(), the site runs with 250M, and has suhosin.memory_limit = 0:
; As long scripts are not running within safe_mode they are free to change the
; memory_limit to whatever value they want. Suhosin changes this fact and
; disallows setting the memory_limit to a value greater than the one the script
; started with, when this option is left at 0. A value greater than 0 means
; that Suhosin will disallows scripts setting the memory_limit to a value above
; this configured hard limit. This is for example usefull if you want to run
; the script normaly with a limit of 16M but image processing scripts may raise
; it to 20M.
Now: suhosin.memory_limit = 512M, and the "Status report" page is back on Drupal 6.23, but syslog still reports a Segmentation fault at various addresses. This is beyond my league.
One step ahead, two steps back ;)
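For what it's worth, the rule described in the quoted suhosin.ini comment can be modelled like this. It is a Python sketch of that comment only, not of Suhosin's actual source; the function name is invented, and the 128 MB starting limit in the demo is a hypothetical value, since the thread does not say what limit the script started with.

```python
MB = 1024 * 1024


def suhosin_allows(requested, initial, hard_limit=0):
    """Model of the rule in the quoted comment; all values are in bytes.

    hard_limit == 0: a script may not raise memory_limit above its starting value.
    hard_limit  > 0: a script may not raise memory_limit above hard_limit.
    """
    ceiling = initial if hard_limit == 0 else hard_limit
    return requested <= ceiling


if __name__ == "__main__":
    requested = 262144000   # value from the syslog ALERT line (250 MB)
    initial = 128 * MB      # hypothetical starting memory_limit
    print(suhosin_allows(requested, initial, hard_limit=0))
    print(suhosin_allows(requested, initial, hard_limit=512 * MB))
```

Under this reading, a request for 250M made from settings.php is refused whenever the script started with less than that and suhosin.memory_limit is 0, and is accepted once the hard limit is raised to 512M.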
is there a safe way to convert pressflow to drupal
Hi all,
I don't know if this is the right place to write this, but I hope it is.
I have a website that was built on Pressflow and the client wants to upgrade it to D7. My question is: is there a way to convert Pressflow Drupal to D6?
thanks
keep it simple
I would suggest that you upgrade to D7 Pressflow.
In fact, you can follow these instructions and be sure that, instead of using D7 core, you use Pressflow 7 core.
http://gerardmcgarry.com/blog/my-drupal-6-7-upgrade-process
Depending on the complexity of your site and/or the presence of custom modules, you may need to build out the functionality in a fresh D7 Pressflow site then migrate the content over.
It's up to you.
Of course, if you really want to downgrade from pressflow to D6 before you upgrade, then you should be fine by taking your sites/ directory and placing it on a fresh D6 code-base then running update.php
Let us know what you end up doing and how it goes!
Going from Pressflow6 to
Going from Pressflow6 to Drupal6 or Pressflow6 to Drupal7 is pretty straightforward - basically no different from doing a normal core upgrade. Of course it should be done on a test site first :)
Pressflow7 has very few changes compared to Drupal7. I suggest not using Pressflow7 at this point unless there is something in it you specifically know that you need.
knaddison blog | Morris Animal Foundation