I've just released a drush command that implements a crawler to keep the cache warm. Although it was developed with microcaching in mind, you can use it to keep any type of external cache warm: Varnish, or whatever caching proxy you decide to use.
- To keep a cache warm, set up a cron job that invokes the command on a schedule that suits the site's traffic patterns.
- To prime a cache, run the command once with the proper options to get all the desired content into the cache.
The command is quite light (although PHP's CLI is usually quite cumbersome), since it only bootstraps the site to the DB phase.
Try it out and report back your findings. The first item on the TODO list is providing an alternative to Lua Socket using the cosocket facility of the Embedded Lua Module.

Comments
Looks Interesting
Left you a message: http://drupal.org/node/1426856
Blank pages
I use both microcaching and the cache warmer module, but the result is sometimes blank cached pages, especially the front page.
Apparently the page stored in nginx's cache is empty, but checking the header with curl returns a hit.
I have never had this problem warming the cache "manually".
Any idea what the cause might be?
Well, what status
does cURL return?
My suggestion is to narrow the list down to a few pages and see what happens.
Also try disabling the Drupal cache and see how it behaves.
Thanks
I've disabled Drupal's own cache but the problem persists on the front page and some term pages.
cURL returns:
{"":{"timestamp":1332762301,"status":200,"time":0.384295}}
for the front page.
Try this:
while true; do curl -s http://yoursite > /tmp/r.txt; [ ! -s /tmp/r.txt ] && echo "problem" && exit 1; sleep 5; done
Let's see if at any time the returned page has 0 length.
Use ctrl+C to interrupt.
Thanks
I solved the problem by having less content on the "problematic" pages such as /node and some terms with lots of teasers.
Individual nodes do not seem to cause any problems.
Thanks again :)
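For anyone who wants to repeat the blank-page check suggested earlier in this thread, here is a slightly more readable sketch of the same polling loop. The function names, the default interval and the temporary file handling are my own assumptions, not part of the cache warmer command. Note that a failed request (for example, a network error) also leaves an empty file, so it is reported the same way as a blank page.

```shell
#!/bin/sh
# Sketch: poll a URL and stop as soon as an empty body comes back.
# Function names and the default interval are illustrative assumptions.

# Succeeds (exit 0) when the file exists and has a size greater than zero.
body_is_nonempty() {
    [ -s "$1" ]
}

# Fetch the URL ($1) every $2 seconds (default 5) until the body is empty.
# Returns 1 after reporting the first empty response.
watch_for_blank() {
    url=$1
    interval=${2:-5}
    out=$(mktemp) || return 2
    while true; do
        # -s keeps curl quiet; the body goes to the temporary file.
        curl -s "$url" > "$out"
        if ! body_is_nonempty "$out"; then
            echo "problem: empty response at $(date)"
            rm -f "$out"
            return 1
        fi
        sleep "$interval"
    done
}
```

Call it as `watch_for_blank http://yoursite 5` and interrupt with ctrl+C if no blank page ever shows up.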