Pages hit by Boost Crawler are not being cached on Nginx Reverse Proxy

carn1x's picture

I am running an Nginx reverse proxy on port 80, with Apache behind it on port 81. I do not have Nginx set up to serve Boost cache files, only static assets.

If I hit port 80 from my browser, or with curl on the box itself, a cached file is generated. However, the crawler fails to generate any files, even though the Apache access log shows the crawler regularly hitting pages (the site is not public; access is only allowed for IP exceptions).

I have set up cron to hit either Nginx or Apache directly, and neither makes a difference. Both show hits from 127.0.0.1 to cron.php, /boost-crawler and various pages that must be originating from the crawler, but at no point are cache files generated.

After a couple of hours /boost-crawler is still being hit, but no crawling is happening at all now.

I am using Debug 9, but nothing in the dblog seems to point to any error; it's simply full of the following messages:

boost 2012-07-17 10:32 Crawler - Thread 2 of 2 Done. Anonymous
boost 2012-07-17 10:32 Crawler - Thread 1 of 2 started Anonymous
cron 2012-07-17 10:32 Cron run completed. Anonymous
boost 2012-07-17 10:32 Crawler Sleep for 15 seconds Anonymous
boost 2012-07-17 10:32 Crawler Start ... Anonymous
boost 2012-07-17 10:32 Expired stale files from static page cache. Anonymous
boost 2012-07-17 10:32 Debug: boost_cache_expire_all_db() Following ...

According to the various buttons on the Boost Settings page, there are:

  • Boost cached data: 0 pages
  • Boost expired data: 0 pages
  • Database records: 0
  • Files: 0
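
To double-check those zero counts independently of the settings page, a small script can map a URL to the path where Boost would write its static copy and test whether that file exists. This is only a sketch: it assumes the Boost 6.x default layout of cache/normal/<host>/<path>_.html, with "_" standing in for "?" when there is a query string. The function names and the /var/www docroot are hypothetical; adjust them to match your Boost settings.

```python
import os
import posixpath
from urllib.parse import urlsplit


def expected_boost_path(url, docroot, cache_dir="cache/normal", char="_", ext=".html"):
    """Guess where Boost would store the static copy of `url`.

    Assumption: default Boost 6.x layout, <docroot>/cache/normal/<host>/<path>_<query>.html.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Drupal paths have no leading or trailing slash; the front page is empty.
    path = parts.path.strip("/")
    # The separator character is appended even when there is no query string.
    filename = path + char + parts.query + ext
    return posixpath.join(docroot, cache_dir, host, filename)


def is_cached(url, docroot):
    """Return True if the guessed cache file exists on disk."""
    return os.path.isfile(expected_boost_path(url, docroot))


if __name__ == "__main__":
    # Hypothetical docroot and host, for illustration only.
    print(expected_boost_path("http://example.com/node/1", "/var/www"))
```

Running is_cached() right after a crawler hit shows up in the access log would tell you whether the request produced a file at all, or whether the file was written and then expired.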

I have also posted this to the Boost Issue Queue: http://drupal.org/node/1688456

Thanks for any advice!

EDIT: I'm running D6

Comments

Try HTTPRL

mikeytown2's picture

Try switching over to http://drupal.org/project/httprl/ for crawling purposes. It's on my infinitely long to-do list. The code in HTTPRL is much more solid than what is in Boost.

Or you can try

this is not related but i'm curious:

janton's picture

I'm curious about cache_warmer. Can you also clear the cache? What if you also use cache_warmer for authenticated users and want to do some development on the site, will this be a pain in the ...

Ideally

perusio's picture

Yes. Cache warmer is just a crawler. The cache control has to be done elsewhere. The whole idea of microcaching is that you don't care about a cache invalidation mechanism. You just set a low TTL and it just works.
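
To illustrate the low-TTL idea, here is a minimal in-memory sketch in Python. It is not how Nginx implements its cache, and the class name MicroCache and the injectable clock are hypothetical, added only so the behaviour can be shown. The point is that entries are never invalidated explicitly; they simply stop being served once they are older than the TTL.

```python
import time


class MicroCache:
    """Toy microcache: no invalidation, entries just age out after `ttl_seconds`."""

    def __init__(self, ttl_seconds=1.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock   # injectable so the expiry can be demonstrated
        self._store = {}     # key -> (time stored, value)

    def get(self, key, render):
        """Return the cached value for `key`, calling `render()` if it is missing or stale."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]          # still fresh: serve from cache
        value = render()             # stale or missing: rebuild the page
        self._store[key] = (now, value)
        return value


if __name__ == "__main__":
    cache = MicroCache(ttl_seconds=1.0)
    print(cache.get("/", lambda: "rendered front page"))
```

With a TTL of a second or so, a busy page is rebuilt at most about once per TTL window no matter how many requests arrive, and stale content corrects itself without anyone having to purge anything.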

There have been some discussions about the best (more Nginxy) way to do that, instead of aping Varnish. It's on my TODO list.

Each authenticated user has his/her own cache, so there is not much point in using cache warmer unless you can guarantee that there are pages that are identical for everyone. It's complicated. For authenticated users you can use microcaching without the crawler, or decompose your page and use SSI/ESI.
