Lazy sessions with PF5 and Varnish problem

cdoyle

I'm having trouble with a Pressflow install and its lazy session handling.

The issue
Anonymous page views are still getting a session cookie, and thus the Cache-Control max-age is being set to 0, making it difficult for Varnish to cache pages.

Things to note
I am using the Drupal 5 version of Pressflow (5.21.47) and thus there is no "external cache" setting on the performance tab, but the caching mechanism has been set to Aggressive.

The memcache module D5 patch was applied to PF.

I disabled almost all the modules and this was still a problem. The only remaining modules were: memcache, the required core modules, an ad serving module (with no references to SESSION or COOKIE), jquery update, and maybe one other. At any rate, I tried this with a stock PF5 install with only the memcache module and the Garland theme and had the same issue. It is possible, but I don't believe the modules or the theme are at fault.

The varnish config I'm using is based on gchaix's http://blogs.osuosl.org/gchaix/2010/01/23/varnish-config-defaultvcl/

It works, sort of
The problem seems to be that line 790 (or so) in bootstrap.inc requires that there be no session for max-age to be set correctly.
$max_age = variable_get('cache', CACHE_DISABLED) == CACHE_AGGRESSIVE && (!isset($_COOKIE[session_name()]) || isset($hook_boot_headers['vary'])) ? variable_get('cache_lifetime', 0) : 0;
It seemed to me that, since a session cookie was being set for anonymous users, the max-age would never be set correctly. To see whether this was the only issue preventing Varnish from caching pages, I threw a hack into my bootstrap.inc: I put unset($_COOKIE[session_name()]); on the line preceding the max-age code and, lo and behold, max-age was set correctly. This allowed Varnish to begin caching pages. However, I'm sure I shouldn't have to write a bootstrap hack to get caching to work in PF5, so I would appreciate any advice as to what I'm doing wrong or what else I can try. Thanks!
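To make the condition on that bootstrap.inc line easier to reason about, here is a minimal C sketch of the same boolean logic. It is an illustration only: the enum and function names are made up, and the real code is the PHP line quoted above. It shows why a session cookie on an anonymous request forces max-age to 0 unless a hook_boot implementation has already sent a Vary header:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for Drupal's cache modes; values are illustrative. */
enum cache_mode { CACHE_MODE_DISABLED, CACHE_MODE_NORMAL, CACHE_MODE_AGGRESSIVE };

/* Mirrors the PHP ternary: returns the max-age that would be sent. */
long pf5_max_age(enum cache_mode mode, bool has_session_cookie,
                 bool hook_boot_sent_vary, long cache_lifetime)
{
    assert(cache_lifetime >= 0);

    /* Only aggressive mode ever advertises a non-zero max-age. */
    if (mode != CACHE_MODE_AGGRESSIVE)
        return 0;

    /* No session cookie, or a Vary header already sent from hook_boot. */
    if (!has_session_cookie || hook_boot_sent_vary)
        return cache_lifetime;

    /* Any session cookie (even an anonymous one) zeroes max-age. */
    return 0;
}
```

With has_session_cookie set and no Vary header, the result is 0 whatever cache_lifetime is, which matches the symptom described at the top of this post.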

Greg suggested using http://varnish-cache.org/wiki/VCLExampleLongerCaching and ignoring max-age, which is a possibility, but I would rather have the correct headers and not have to hack around them.

Comments

We had the same issue with

adrifl

We had the same issue with Pressflow 5 / Varnish and tried the following solution which seems to work so far.

In bootstrap.inc, inside
function drupal_page_cache_header(stdClass $cache) { }
we declare
global $user;
and, in place of the original max-age line (left commented out below), set a DRUPAL_LOGGED_IN cookie for logged-in users:

// $max_age = variable_get('cache', CACHE_DISABLED) == CACHE_AGGRESSIVE && (!isset($_COOKIE[session_name()]) ||  isset($hook_boot_headers['vary'])) ? variable_get('cache_lifetime', 0) : 0;

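  if ($user->uid > 0) {
    // Flag logged-in users so Varnish can pass their requests.
    // Path '/' makes the cookie apply site-wide, not just to the current directory.
    setcookie('DRUPAL_LOGGED_IN', 1, 0, '/');
    $max_age = 0;
  }
  else {
    // Anonymous users: allow caching for 5 minutes.
    $max_age = 300;
  }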

Our varnish .vcl file reads as follows:

#This is a basic VCL configuration file for varnish.  See the vcl(7)
#man page for details on VCL syntax and semantics.
#
#Default backend definition.  Set this to point to your content
#server.
#
backend webserver {
    .host = "127.0.0.1";
    .port = "8080";
    .connect_timeout = 600s;
    .first_byte_timeout = 600s;
    .between_bytes_timeout = 600s;
    .max_connections = 250;
}


#If you redefine any of the built-in subroutines, the built-in logic will be
#appended to your code.
#
#
sub vcl_recv {
    # Use the default backend for all other requests
    set req.backend = webserver; 

    # Allow a grace period for offering "stale" data in case backend lags
    #set req.grace = 60s;
    set req.grace = 5m;

    remove req.http.X-Forwarded-For;
    set req.http.X-Forwarded-For = client.ip;

    # Properly handle different encoding types
    if (req.http.Accept-Encoding) {
        if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$") {
            # No point in compressing these
            remove req.http.Accept-Encoding;
        } elsif (req.http.Accept-Encoding ~ "gzip") {
            set req.http.Accept-Encoding = "gzip";
        } elsif (req.http.Accept-Encoding ~ "deflate") {
            set req.http.Accept-Encoding = "deflate";
        } else {
            # unknown algorithm
            remove req.http.Accept-Encoding;
        }
    }

    # Pass (skip the cache) if the client sent a no-cache request
    if (req.http.Cache-Control ~ "no-cache") {
        return (pass);
    }

    ## Default request checks
    if (req.request != "GET" &&
        req.request != "HEAD" &&
        req.request != "PUT" &&
        req.request != "POST" &&
        req.request != "TRACE" &&
        req.request != "OPTIONS" &&
        req.request != "DELETE") {
            # Non-RFC2616 or CONNECT which is weird.
            return (pipe);
    }
    if (req.request != "GET" && req.request != "HEAD") {
        # We only deal with GET and HEAD by default
        return (pass);
    }

    ## Modified from default to allow caching if cookies are set, but not http auth
    if (req.http.Authorization) {
        /* Not cacheable by default */
        return (pass);
    }

    # ORVSD tweaks
    ## Remove has_js and Google Analytics cookies.
    set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(__[a-z]+|has_js)=[^;]*", "");
    ## Remove a ";" prefix, if present.
    set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
    ## Remove empty cookies.
    if (req.http.Cookie ~ "^\s*$") {
        unset req.http.Cookie;
    }
    ## Catch Drupal theme files  - THIS BREAKS UPDATE.PHP DO NOT USE
    #if (req.url ~ "^/sites/") {
    #    unset req.http.Cookie;
    #}
    # Catch Drupal misc files (like drupal.js and jquery.js)
    #if (req.url ~ "^/misc/") {
    #    unset req.http.Cookie;
    #}
 
    # Drupal js/css doesn't need cookies, cache them
    if (req.url ~ "^/modules/.*\.(js|css)\?") {
        unset req.http.Cookie;
    }

    ## Pass cron jobs and server-status
    if (req.url ~ "cron.php") {
       return (pass);
    }
    if (req.url ~ ".*/server-status$") {
       return (pass);
    }

    ## Don't cache install
    if (req.url ~ "install.php") {
        return (pass);
    }

    ## Don't cache Drupal logged-in user sessions
    if (req.http.Cookie ~ "(VARNISH|DRUPAL_LOGGED_IN)") {
        return (pass);
    }

    return (lookup);
}

# More ORVSD tweaks
# Per-session cache
sub vcl_hash { if (req.http.Cookie) { set req.hash += req.http.Cookie; } }

sub vcl_deliver {
#    return (deliver);
   #add cache hit data
   if (obj.hits > 0) {
     #if hit add hit count
     set resp.http.X-Cache = "HIT";
     set resp.http.X-Cache-Hits = obj.hits;
   } else {
     set resp.http.X-Cache = "MISS";
   }
}

sub vcl_error {
    if (obj.status == 503 && req.restarts < 5) {
        set obj.http.X-Restarts = req.restarts;
        restart;
    }
}

It would be great to find a solution without hacking bootstrap.inc. Any hints and further tuning tips are welcome.
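For readers comparing configs, the decision order in the vcl_recv above can be sketched in plain C. This is a simplified model, not Varnish code: the function and enum names are hypothetical, substring matching with strstr() stands in for the VCL regexes, and the URL-based rules (cron.php, server-status, install.php, the /modules/ js/css cookie drop) are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum recv_action { RECV_LOOKUP, RECV_PASS, RECV_PIPE };

/* Methods the VCL treats as ordinary RFC 2616 requests. */
static const char *const known_methods[] = {
    "GET", "HEAD", "PUT", "POST", "TRACE", "OPTIONS", "DELETE"
};

static bool is_known_method(const char *method)
{
    size_t i;
    for (i = 0; i < sizeof known_methods / sizeof known_methods[0]; i++) {
        if (strcmp(method, known_methods[i]) == 0)
            return true;
    }
    return false;
}

/* cache_control and cookie may be NULL when the header is absent. */
enum recv_action classify_request(const char *method, const char *cache_control,
                                  bool has_authorization, const char *cookie)
{
    assert(method != NULL);

    /* Client asked for a fresh copy: skip the cache. */
    if (cache_control && strstr(cache_control, "no-cache"))
        return RECV_PASS;
    /* Non-RFC 2616 methods (e.g. CONNECT) are piped straight through. */
    if (!is_known_method(method))
        return RECV_PIPE;
    /* Only GET and HEAD are cacheable. */
    if (strcmp(method, "GET") != 0 && strcmp(method, "HEAD") != 0)
        return RECV_PASS;
    /* HTTP auth responses are user-specific. */
    if (has_authorization)
        return RECV_PASS;
    /* Marker cookie from the bootstrap.inc hack (or VARNISH): logged-in user. */
    if (cookie && (strstr(cookie, "DRUPAL_LOGGED_IN") || strstr(cookie, "VARNISH")))
        return RECV_PASS;
    return RECV_LOOKUP;
}
```

Note that a bare SESS cookie still returns RECV_LOOKUP here; with the per-session vcl_hash in this config, such requests are then cached separately per cookie value.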

PF5 Varnish cache without hacking bootstrap.inc

cdoyle

I ended up going with the Varnish example that Greg suggested (http://varnish-cache.org/wiki/VCLExampleLongerCaching) which will ignore the max-age and set TTLs as it sees fit. The following config is working for me with no bootstrap.inc hacking. The "magic" is in the vcl_fetch and vcl_deliver under the commented sections.

# Based on http://blogs.osuosl.org/gchaix/2010/01/23/varnish-config-defaultvcl/
# This is a basic VCL configuration file for varnish.  See the vcl(7)
# man page for details on VCL syntax and semantics.

#
#Default backend definition.  Set this to point to your content
#server.
#
backend default {
    .host = "127.0.0.1";
    .port = "8080";
    .connect_timeout = 600s;
    .first_byte_timeout = 600s;
    .between_bytes_timeout = 600s;
}

sub vcl_recv {
    set req.backend = default;

    # Allow a grace period for offering "stale" data in case backend lags
    set req.grace = 5m;

    remove req.http.X-Forwarded-For;
    set req.http.X-Forwarded-For = client.ip;

    # Properly handle different encoding types
    if (req.http.Accept-Encoding) {
        if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$") {
            # No point in compressing these
            remove req.http.Accept-Encoding;
        } elsif (req.http.Accept-Encoding ~ "gzip") {
            set req.http.Accept-Encoding = "gzip";
        } elsif (req.http.Accept-Encoding ~ "deflate") {
            set req.http.Accept-Encoding = "deflate";
        } else {
            # unknown algorithm
            remove req.http.Accept-Encoding;
        }
    }

    # Pass (skip the cache) if the client sent a no-cache request
    if (req.http.Cache-Control ~ "no-cache") {
        pass;
    }

    ## Default request checks
    if (req.request != "GET" &&
        req.request != "HEAD" &&
        req.request != "PUT" &&
        req.request != "POST" &&
        req.request != "TRACE" &&
        req.request != "OPTIONS" &&
        req.request != "DELETE") {
            # Non-RFC2616 or CONNECT which is weird.
            pipe;
    }
    if (req.request != "GET" && req.request != "HEAD") {
        # We only deal with GET and HEAD by default
        pass;
    }

    ## Remove has_js and Google Analytics cookies.
    set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(__[a-z]+|has_js)=[^;]*", "");
    ## Remove a ";" prefix, if present.
    set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
    ## Remove empty cookies.
    if (req.http.Cookie ~ "^\s*$") {
        unset req.http.Cookie;
    }
    # Drupal js/css doesn't need cookies, cache them
    if (req.url ~ "^/modules/.*\.(js|css)\?") {
        unset req.http.Cookie;
    }

    ## Pass cron jobs and server-status
    if (req.url ~ "cron.php") {
       pass;
    }
    if (req.url ~ ".*/server-status$") {
       pass;
    }
    ## Pass on admin pages
    if (req.url ~ "^/admin/.*$") {
       pass;
    }

    ## Don't cache install
    if (req.url ~ "install.php") {
        pass;
    }

    if (req.url ~ "file.php") {
        pass;
    }

    ## Don't cache Drupal logged-in user sessions
    if (req.http.Cookie ~ "(VARNISH|DRUPAL_UID)") {
        pass;
    }

    # special cookie set in custom Drupal module for logged in users
    if (req.http.Cookie ~ "phpbb2mysql_data") {
      pass;
    }

    lookup;
}

# Per-session cache
sub vcl_hash { if (req.http.Cookie) { set req.hash += req.http.Cookie; } }

sub vcl_fetch {
    # These status codes should always pass through and never cache.
    if (obj.status == 404 || obj.status == 503 || obj.status == 500) {
        set obj.http.X-Cacheable = "NO: obj.status";
        set obj.http.X-Cacheable-status = obj.status;
        pass;
    }

    # Grace to allow varnish to serve content if backend is lagged
    set obj.grace = 5m;

    # don't cache
    if (obj.http.Pragma ~ "no-cache" || obj.http.Cache-Control ~ "no-cache" || obj.http.Cache-Control ~ "private") {
      pass;
    }

    if (!obj.cacheable) {
        set obj.http.X-Cacheable = "NO: !obj.cacheable";
        pass;
    } else {
      # From http://varnish-cache.org/wiki/VCLExampleLongerCaching
      /* Remove Expires from backend, it's not long enough */
      unset obj.http.expires;

      /* Set the clients TTL on this object and set how long Varnish will keep it (could be different) */
          set obj.http.cache-control = regsub(obj.http.X-Drupal-Cache-Lifetime, "(.*)", "public, max-age=\1");
## These TTLs are based on the specific paths we're using and may not apply
## to your site so feel free to modify as you see fit.  You could just set a single default TTL if you want.
          if (req.url ~ "\.(js|css)$") {
            set obj.ttl = 30m;               // js and css files ttl 30 minutes
          } else if (req.url ~ "(^/articles/)|(^/tags/)|(^/taxonomy/)") {
            set obj.ttl = 10m;              // list page ttl 10 minutes
          } else if (req.url ~ "^/article/") {
            set obj.ttl = 5m;             // article ttl 5 minutes
          } else {
            set obj.ttl = 5m;               // default ttl 5 minutes
          }

      /* marker for vcl_deliver to reset Age: */
      set obj.http.magicmarker = "1";
 
      # All tests passed, therefore item is cacheable
      set obj.http.X-Cacheable = "YES";
    }
    deliver;
}

sub vcl_deliver {

  # From http://varnish-cache.org/wiki/VCLExampleLongerCaching
  if (resp.http.magicmarker) {
     /* Remove the magic marker */
     unset resp.http.magicmarker;

     /* By definition we have a fresh object */
     set resp.http.age = "0";
   }

   #add cache hit data
   if (obj.hits > 0) {
     #if hit add hit count
     set resp.http.X-Cache = "HIT";
     set resp.http.X-Cache-Hits = obj.hits;
   } else {
     set resp.http.X-Cache = "MISS";
   }
}

sub vcl_error {
    if (obj.status == 503 && req.restarts < 5) {
        set obj.http.X-Restarts = req.restarts;
        restart;
    }
}
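The path-based TTL table in vcl_fetch above boils down to an ordered list of URL tests. A rough C equivalent using POSIX <regex.h> follows; the helper names are hypothetical, and the js/css pattern is written as \.(js|css)$ so the dot is matched literally. Compiling the pattern on every call keeps the sketch short but would be wasteful in real code.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if url matches the POSIX extended regex pattern. */
static bool url_matches(const char *url, const char *pattern)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;                  /* treat a bad pattern as "no match" */
    rc = regexec(&re, url, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}

/* TTL in seconds, checked in the same order as the vcl_fetch branches. */
long ttl_for_url(const char *url)
{
    assert(url != NULL);

    if (url_matches(url, "\\.(js|css)$"))
        return 30 * 60;                /* js and css: 30 minutes */
    if (url_matches(url, "^/(articles|tags|taxonomy)/"))
        return 10 * 60;                /* list pages: 10 minutes */
    if (url_matches(url, "^/article/"))
        return 5 * 60;                 /* articles: 5 minutes */
    return 5 * 60;                     /* default: 5 minutes */
}
```

The last two branches both return 5 minutes; they are kept separate only to mirror the VCL, where you would tune them per path.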

The part I couldn't get working was this:

      /* Set the clients TTL on this object and set how long Varnish will keep it (could be different) */
      if (obj.http.X-Drupal-Cache-Lifetime ~ "[0-9]+") {
          set obj.http.cache-control = regsub(obj.http.X-Drupal-Cache-Lifetime, "(.*)", "public, max-age=\1");
          C{
             char *ttl;
             ttl = VRT_GetHdr(sp, HDR_REQ, "\030X-Drupal-Cache-Lifetime:");   //\030 is string length (24) in octal... is this in HDR_OBJ or HDR_REQ?
             VRT_l_obj_ttl(sp, atof(ttl));
          }C
     } else {
        # Couldn't find a Drupal cache lifetime so don't cache
        set obj.http.cache-control = "no-cache, must-revalidate, post-check=0, pre-check=0, max-age=0";
        set obj.ttl = 0m;   // you could cache for a longer default time if you like
      }

The VRT_GetHdr call just wouldn't return what I wanted, and I didn't have a good way to debug the VCL, so I went ahead and configured static TTLs. The X-Drupal-Cache-Lifetime header was being set in a custom module's hook_menu, in a no-cache block, with:
    header('X-Drupal-Cache-Lifetime: ' .  variable_get('cache_lifetime', 0));

That way, you could still control the TTL using the Drupal cache settings. If anyone can get the VRT_GetHdr part to work correctly, please let me know.
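One hedged observation on the C block above: it passes whatever VRT_GetHdr returns straight to atof(). If the header is not found where the code looks, VRT_GetHdr is expected to return NULL, and atof(NULL) is undefined behavior; atof() also cannot signal a malformed value. A plausible (untested) reason for the empty result is that X-Drupal-Cache-Lifetime is a backend response header, so HDR_REQ is the wrong place to look; the Drupal 6 config later in this thread reads backend headers with HDR_BERESP on Varnish 2.1. The plain-C sketch below (no Varnish API; parse_lifetime is a hypothetical name) shows stricter parsing that reports failure so the caller can fall back to the no-cache branch:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parses a decimal TTL such as "300" from a header value.
 * Returns the TTL in seconds, or -1 if the value is missing or invalid,
 * in which case the caller should fall back to not caching. */
long parse_lifetime(const char *value)
{
    char *end;
    long ttl;

    if (value == NULL)
        return -1;                     /* header not found */
    errno = 0;                         /* clear any stale ERANGE */
    ttl = strtol(value, &end, 10);
    if (errno == ERANGE || end == value || ttl < 0 || ttl > INT_MAX)
        return -1;
    while (*end == ' ' || *end == '\t')
        end++;                         /* allow trailing whitespace only */
    if (*end != '\0')
        return -1;
    return ttl;
}
```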

file.php

gchaix

You can drop the file.php directive in there:

if (req.url ~ "file.php") {
    pass;
}

That's my hacking around Moodle behavior (it really doesn't like to be cached).

Whoops, missed that one,

cdoyle

Whoops, missed that one, thanks.

Oh, I should also note that

cdoyle

Oh, I should also note that you could probably leave the set obj.http.cache-control = regsub(obj.http.X-Drupal-Cache-Lifetime, "(.*)", "public, max-age=\1"); line out of your code, or set it to a static value, so there would be no changes needed on the Drupal side.

The per session cache was

cdoyle

The per session cache was causing us problems with hit rate. Commenting this out raised our hit rates from somewhere in the 20-30% range to around 80%. This could be related to the session cookie still being set in PF5. We also explicitly started caching RSS feeds because they were being sent from Drupal with the no-cache cache-control header.

# Per-session cache
#sub vcl_hash { if (req.http.Cookie) { set req.hash += req.http.Cookie; } }
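Why the per-session hash hurts: set req.hash += req.http.Cookie makes every distinct Cookie header its own cache entry, so two anonymous visitors carrying different session cookies never share a cached page. The C sketch below is a simplified model of the hash input (URL + Host, plus the Cookie header when per-session caching is on); cache_key and the '#' separator are illustrative, not how Varnish builds its hash internally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Writes the string that would be hashed for a request into buf.
 * cookie may be NULL when the request has no Cookie header. */
void cache_key(char *buf, size_t len, const char *url, const char *host,
               const char *cookie, bool per_session)
{
    assert(buf != NULL && len > 0);

    if (per_session && cookie)
        snprintf(buf, len, "%s#%s#%s", url, host, cookie);  /* one entry per cookie */
    else
        snprintf(buf, len, "%s#%s", url, host);             /* shared entry */
}
```

With per_session enabled, the same page requested with two different session cookies yields two keys, which is consistent with the hit-rate jump reported above after commenting the hash line out.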

Good point

gchaix

That per-session cache pre-dates my attempts to cache even when a SESS cookie is set. Removing it may well help ... thanks for pointing it out. I'll have to do a bit of testing to see.

-Greg

Drupal 6 varnish config

cdoyle

Just as an FYI, here's the new default.vcl I've been working on for Drupal 6. This config is designed to work with Varnish 2.1. If you're using a version prior to that you will probably have to change the beresp variable back to obj and may have to remove the return(). I'll note that you need to make sure you're removing all cookies on anonymous pages and check (via httpfox) that you're actually caching the pages you want. There is some site specific cookie removal in this config including removing phpbb3 cookies that you should take a look at.
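As an illustration of that cookie clean-up, here is a C sketch of what the has_js / Google Analytics regsuball() lines accomplish: split the Cookie header on ';', drop cookies that do not affect anonymous pages, and rejoin the rest. strip_cookies is a hypothetical helper, and its '__' prefix test is slightly broader than the VCL's __[a-z0-9_]+ pattern. Whatever is left (typically a SESS cookie) is what can still make a page uncacheable:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Cookies that never change what Drupal renders for anonymous users. */
static bool is_ignorable(const char *name, size_t len)
{
    if (len >= 2 && name[0] == '_' && name[1] == '_')
        return true;                              /* __utma, __utmz, ... */
    return len == 6 && strncmp(name, "has_js", 6) == 0;
}

/* Copies the Cookie header into out, dropping ignorable cookies.
 * An empty result means the request can be treated as cookieless. */
void strip_cookies(const char *in, char *out, size_t outlen)
{
    const char *p = in;
    size_t used = 0;

    assert(in != NULL && out != NULL && outlen > 0);
    out[0] = '\0';
    while (*p) {
        const char *start, *end, *eq;

        while (*p == ' ' || *p == ';')
            p++;                                  /* skip separators */
        start = p;
        while (*p && *p != ';')
            p++;
        end = p;                                  /* one "name=value" pair */
        if (end == start)
            continue;
        eq = memchr(start, '=', (size_t)(end - start));
        if (is_ignorable(start, (size_t)((eq ? eq : end) - start)))
            continue;
        used += (size_t)snprintf(out + used, outlen - used, "%s%.*s",
                                 used ? "; " : "", (int)(end - start), start);
        if (used >= outlen)
            break;                                /* output truncated */
    }
}
```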

This config assumes you're using Pressflow 6 with the Varnish module installed and the site configured for external caching (on the performance tab). The main upgrade here is using C to set the TTL based on the max-age value in the Cache-Control header sent by Drupal.

# Based on http://blogs.osuosl.org/gchaix/2011/01/23/varnish-config-defaultvcl/
# This is a basic VCL configuration file for varnish.  See the vcl(7)
# man page for details on VCL syntax and semantics.

#
#Default backend definition.  Set this to point to your content
#server.
#
backend default {
    .host = "127.0.0.1";
    .port = "8080";
    .connect_timeout = 600s;
    .first_byte_timeout = 600s;
    .between_bytes_timeout = 600s;
#    .max_connections = 250;
}

C{
#include <errno.h>
#include <limits.h>
}C

sub vcl_recv {
    set req.backend = default;

    # Allow a grace period for offering "stale" data in case backend lags
    set req.grace = 5m;

    # discard X-Forwarded-For header and use real IP (for use in Apache logs, etc.)
    remove req.http.X-Forwarded-For;
    set req.http.X-Forwarded-For = client.ip;

    # Properly handle different encoding types
    # as per: http://varnish-cache.org/wiki/FAQ/Compression
    if (req.http.Accept-Encoding) {
        if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$") {
            # No point in compressing these
            remove req.http.Accept-Encoding;
        } elsif (req.http.Accept-Encoding ~ "gzip") {
            set req.http.Accept-Encoding = "gzip";
        } elsif (req.http.Accept-Encoding ~ "deflate") {
            set req.http.Accept-Encoding = "deflate";
        } else {
            # unknown algorithm
            remove req.http.Accept-Encoding;
        }
    }

    # Pass (skip the cache) if the client sent a no-cache request
    if (req.http.Cache-Control ~ "no-cache") {
        return (pass);
    }

    ## Default request checks
    if (req.request != "GET" &&
        req.request != "HEAD" &&
        req.request != "PUT" &&
        req.request != "POST" &&
        req.request != "TRACE" &&
        req.request != "OPTIONS" &&
        req.request != "DELETE") {
            # Non-RFC2616 or CONNECT which is weird.
            return (pipe);
    }
    if (req.request != "GET" && req.request != "HEAD") {
        # We only deal with GET and HEAD by default
        return (pass);
    }

    # Drupal js/css doesn't need cookies, cache them
    if (req.url ~ "^/modules/.*\.(js|css)\?") {
        unset req.http.Cookie;
    }

    ## Don't cache Drupal logged-in user sessions
    if (req.http.Cookie ~ "(VARNISH|DRUPAL_UID)") {
        return (pass);
    } else {
        ## Remove phpbb3 cookies for all non-logged in users.
        set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(phpbb3_[a-z0-9\-_]*)=[^;]*", "");
    }

    ## Remove interstitial cookies.  These are site specific but you may need to do something similar
    #set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(ns_session|ns_cookietest|Visited)=[^;]*", "");
    ## Remove has_js and Google Analytics cookies.
    set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(__[a-z0-9_]+|has_js)=[^;]*", "");
    ## Remove imce module cookies.
    set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(icid|idsc|1h1|1h2|iw1|iw2)=[^;]*", "");
    ## Remove a ";" prefix, if present.
    set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");
    ## Remove empty cookies.
    if (req.http.Cookie ~ "^\s*$") {
        unset req.http.Cookie;
    }

    ## Pass cron jobs and server-status
    if (req.url ~ "cron.php") {
       return (pass);
    }
    if (req.url ~ ".*/server-status$") {
       return (pass);
    }
    ## Pass on admin pages
    if (req.url ~ "^/admin/.*$") {
       return (pass);
    }

    ## Pass on forum pages
    if (req.url ~ "^/forums/.*$") {
       return (pass);
    }

    ## Don't cache install
    if (req.url ~ "install.php") {
        return (pass);
    }

    return (lookup);
}

sub vcl_fetch {
    # Grace to allow varnish to serve content if backend is lagged
    set beresp.grace = 5m;

    # These status codes should always pass through and never cache.
    if (beresp.status == 404 || beresp.status == 503 || beresp.status == 500) {
        set beresp.http.X-Cacheable = "NO: beresp.status";
        set beresp.http.X-Cacheable-status = beresp.status;
        return (pass);
    }

    # if no-cache headers set, don't cache unless it's a feed
    if ((!req.url ~ "/feed$") && (beresp.http.Pragma ~ "no-cache" || beresp.http.Cache-Control ~ "no-cache" || beresp.http.Cache-Control ~ "private")) {
        return (pass);
    }

    # cache static assets for a long time regardless of headers
    if (req.url ~ "\.(jpg|jpeg|png|gif|mp3|ogg)$") {
        set beresp.ttl = 1w;
    } else if (beresp.cacheable) {
        /* set ttl based on Cache-Control header set in Drupal */
        /* from http://www.varnish-cache.org/trac/wiki/VCLExampleExtendingCacheControl */
        call extended_cache_control;

        # All tests passed, therefore item is cacheable
        set beresp.http.X-Cacheable = "YES";
    } else {
        set beresp.http.X-Cacheable = "NO: !beresp.cacheable";
        return (pass);
    }
    return (deliver);
}

sub extended_cache_control
{  
    if (beresp.http.Cache-Control ~ "max-age=[0-9]+") {

        /* Copy the ttl part from original header */
        set beresp.http.X-Cache-Control-TTL = regsub(beresp.http.Cache-Control, ".*max-age=([0-9]+).*", "\1");
C{
        {  
            char *x_end = 0;
            const char *x_hdr_val = VRT_GetHdr(sp, HDR_BERESP, "\024X-Cache-Control-TTL:");
            if (x_hdr_val) {
                long x_cache_ttl;
                errno = 0;    /* clear any stale ERANGE before strtol() */
                x_cache_ttl = strtol(x_hdr_val, &x_end, 10);    /* max-age is decimal */
                if (ERANGE != errno && x_end != x_hdr_val && x_cache_ttl >= 0 && x_cache_ttl < INT_MAX) {
                    VRT_l_beresp_ttl(sp, (x_cache_ttl * 1));
                }
            }
        }
}C
        unset beresp.http.X-Cache-Control-TTL;
    }
}

sub vcl_deliver {
    # http://www.varnish-cache.org/trac/wiki/VCLExampleHitMissHeader
    # add cache hit data
    if (obj.hits > 0) {
        # if hit add hit count
        set resp.http.X-Cache = "HIT";
        set resp.http.X-Cache-Hits = obj.hits;
    } else {
        set resp.http.X-Cache = "MISS";
    }
}

sub vcl_error {
    if (obj.status == 503 && req.restarts < 5) {
        set obj.http.X-Restarts = req.restarts;
        restart;
    }
}

# Added to let users force refresh
sub vcl_hit {
    if (!obj.cacheable) {
        return (pass);
    }

    if (req.http.Cache-Control ~ "no-cache") {
        # Ignore requests via proxy caches,  IE users and badly behaved crawlers
        # like msnbot that send no-cache with every request.
        if (! (req.http.Via || req.http.User-Agent ~ "bot|MSIE")) {
            set obj.ttl = 0s;
            return (restart);
        }
    }
    return (deliver);
}
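The extended_cache_control subroutine first copies the max-age digits into a temporary header with regsub() and then parses them with strtol(). The same extraction done directly on the Cache-Control string looks roughly like this in plain C (max_age_from_cache_control is a hypothetical name; it finds the first max-age, whereas the greedy VCL regex would pick the last if there were several):

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Returns the max-age value from a Cache-Control header,
 * or -1 if it is absent, malformed, or out of range. */
long max_age_from_cache_control(const char *cc)
{
    const char *p;
    char *end;
    long ttl;

    if (cc == NULL || (p = strstr(cc, "max-age=")) == NULL)
        return -1;
    p += strlen("max-age=");
    if (*p < '0' || *p > '9')
        return -1;                     /* the VCL regex requires a digit here */
    errno = 0;
    ttl = strtol(p, &end, 10);
    if (errno == ERANGE || end == p || ttl < 0 || ttl >= INT_MAX)
        return -1;                     /* same bounds as the C{ }C block */
    return ttl;
}
```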

Regionalization using Varnish

cdoyle

As a quick addendum to my last Drupal 6 Varnish config, here's a regionalization trick I added to one of our sites. This allows you to serve a regionalized version of the site through varnish and set a header so Drupal can tell which version of the uncached site it should be serving. It uses the https://github.com/meetup/varnish-geoip-plugin geoip plugin for varnish.

The header X-GeoIP-Region is being set and sent to Drupal so the proper version of the page can be generated and cached (with vcl_hash taking the region into account). This means that each region you want to serve for will have its own version of each page stored in varnish but it worked for our application.

I had to modify /usr/local/plugins/geoip_plugin.vcl to add another header with the country (inserted next to the city and lat/long) in the version of geoip that I used.

VRT_SetHdr(sp, HDR_REQ, "\020X-GeoIP-Country:", country, vrt_magic_string_end);
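The leading escape in strings like "\020X-GeoIP-Country:" is the length of the header name, colon included, written in octal (16 here), which is what the comment in the earlier X-Drupal-Cache-Lifetime snippet is getting at. A small C helper for computing it when adding your own header (header_len_prefix is a hypothetical name, not part of Varnish):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes the octal escape for a header name such as "X-GeoIP-Country:"
 * (16 characters -> "\020") into out and returns the name's length. */
size_t header_len_prefix(const char *name, char *out, size_t outlen)
{
    size_t n;

    assert(name != NULL && out != NULL && outlen > 0);
    n = strlen(name);                               /* includes the colon */
    snprintf(out, outlen, "\\%03o", (unsigned)n);   /* e.g. "\020" */
    return n;
}
```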

Here's the default.vcl:

# Based on http://blogs.osuosl.org/gchaix/2011/01/23/varnish-config-defaultvcl/
# This is a basic VCL configuration file for varnish.  See the vcl(7)
# man page for details on VCL syntax and semantics.

# GeoIP plugin from https://github.com/meetup/varnish-geoip-plugin
include "/usr/local/plugins/geoip_plugin.vcl";

#
#Default backend definition.  Set this to point to your content
#server.
#
backend default {
    .host = "127.0.0.1";
    .port = "8080";
    .connect_timeout = 20s;
    .first_byte_timeout = 20s;
    .between_bytes_timeout = 5s;
    .max_connections = 68;
}

C{
#include <errno.h>
#include <limits.h>
}C

sub vcl_recv {
    set req.backend = default;

    # Allow a grace period for offering "stale" data in case backend lags
    set req.grace = 5m;

    # discard X-Forwarded-For header and use real IP (for use in Apache logs, etc.)
    remove req.http.X-Forwarded-For;
    set req.http.X-Forwarded-For = client.ip;

    ## Pass cron jobs and server-status
    if (req.url ~ "cron.php") {
       return (pass);
    }
    if (req.url ~ ".*/server-status$") {
       return (pass);
    }
    ## Pass on admin pages
    if (req.url ~ "^/admin/.*$") {
       return (pass);
    }

    ## Pass on forum pages
    if (req.url ~ "^/forums/.*$") {
       return (pass);
    }

    ## Don't cache install
    if (req.url ~ "install.php") {
        return (pass);
    }

    # Check to see if the geo targeting cookie is present on the client and
    # set a header if it isn't
    ## was if (req.http.Cookie ~ "((?!GeoIP-Country).)*") {
    if (!req.http.Cookie ~ "GeoIP-Country") {
        call geocode_header;        ## do GeoIP lookup and set X-GeoIP-Country header
    } else {
        # set request header from cookie before cookie is removed
        set req.http.X-GeoIP-Country = regsub(req.http.Cookie, "^.*GeoIP-Country=([^;]*).*$", "\1");
        set req.http.X-GeoIP-Cookie-Set = "1";    # set temporary header to pass info to vcl_deliver
    }
    # set Region header based on either cookie or GeoIP lookup
    if (req.http.X-GeoIP-Country ~ "(GB|UK|BE|SE|AU|CY|DK|FI|FR|DE|GR|IE|IT|MT|NL|NZ|NO|PT|ES)") {
        set req.http.X-GeoIP-Region = "UK";
    } else {
        set req.http.X-GeoIP-Region = "US";         # Default region
    }

    # Properly handle different encoding types
    # as per: http://varnish-cache.org/wiki/FAQ/Compression
    if (req.http.Accept-Encoding) {
        if (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$") {
            # No point in compressing these
            remove req.http.Accept-Encoding;
        } elsif (req.http.Accept-Encoding ~ "gzip") {
            set req.http.Accept-Encoding = "gzip";
        } elsif (req.http.Accept-Encoding ~ "deflate") {
            set req.http.Accept-Encoding = "deflate";
        } else {
            # unknown algorithm
            remove req.http.Accept-Encoding;
        }
    }

    # Pass (skip the cache) if the client sent a no-cache request
    if (req.http.Cache-Control ~ "no-cache") {
        return (pass);
    }

    ## Default request checks
    if (req.request != "GET" &&
        req.request != "HEAD" &&
        req.request != "PUT" &&
        req.request != "POST" &&
        req.request != "TRACE" &&
        req.request != "OPTIONS" &&
        req.request != "DELETE") {
        # Non-RFC2616 or CONNECT which is weird.
        return (pipe);
    }
    if (req.request != "GET" && req.request != "HEAD") {
        # We only deal with GET and HEAD by default
        return (pass);
    }

    # Drupal js/css doesn't need cookies, cache them
    if (req.url ~ "^/modules/.*\.(js|css)\?") {
        unset req.http.Cookie;
    }

    ## Don't cache Drupal logged-in user sessions
    if (req.http.Cookie ~ "(VARNISH|DRUPAL_UID|SESS)") {
        return (pass);
    } else {
        ## Remove phpbb3 cookies for all non-logged in users.
        set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(phpbb3_[a-z0-9\-_]*)=[^;]*", "");
    }

    ## check with varnishlog -b -I Cookie
    ## Remove "bonus" cookies.
    set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(CP|ebNewBandWidth.*|scorecardresearch|get_iphone_app|punbb_cookie|IPE[0-9]+)=[^;]*", "");
    ## Remove interstitial cookies.
    set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(fbs[a-z0-9]+|ns_session|ns_cookietest|Visited)=[^;]*", "");
    ## Remove has_js and Google Analytics cookies.
    set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(__[a-zA-Z0-9_]+|has_js)=[^;]*", "");
    ## Remove imce module cookies.
    set req.http.Cookie = regsuball(req.http.Cookie, "(^|;\s*)(icid|idsc|1h1|1h2|iw1|iw2)=[^;]*", "");
    ## Remove geo targeting cookie
    set req.http.Cookie = regsub(req.http.Cookie, "(^|;\s*)GeoIP-Country=[^;]*", "");
    ## Remove a ";" prefix, if present.
    set req.http.Cookie = regsub(req.http.Cookie, "^;\s*", "");

    ## Remove empty cookies.
    if (req.http.Cookie ~ "^\s*$") {
        unset req.http.Cookie;
    }

    return (lookup);
}

## change the varnish cache hash to add in geo information so pages will be regionalized
## this could double the cache size
sub vcl_hash {

    ### these 2 entries are the default ones used for vcl. Below we add our own.
    set req.hash += req.url;
    set req.hash += req.http.host;

    ### add the region to create a per-region hash, as long as it's set and the URL is not a static asset
    if (req.http.X-GeoIP-Region && !req.url ~ "\.(jpg|jpeg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$") {
        set req.hash += req.http.X-GeoIP-Region;
    }
}

sub vcl_error {
    if (obj.status == 503 && req.restarts < 5) {
        set obj.http.X-Restarts = req.restarts;
        restart;
    }
}

sub vcl_fetch {
    # Grace to allow varnish to serve content if backend is lagged
    set beresp.grace = 5m;

    # These status codes should always pass through and never cache.
    if (beresp.status == 404 || beresp.status == 503 || beresp.status == 500) {
        set beresp.http.X-Cacheable = "NO: beresp.status";
        set beresp.http.X-Cacheable-status = beresp.status;
        return (pass);
    }

    # if no-cache headers set, don't cache unless it's a feed
    if ((!req.url ~ "/feed$") && (beresp.http.Pragma ~ "no-cache" || beresp.http.Cache-Control ~ "no-cache" || beresp.http.Cache-Control ~ "private")) {
        return (pass);
    }

    # cache static assets for a long time regardless of headers
    if (req.url ~ "\.(jpg|jpeg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$") {
        set beresp.ttl = 1w;
    } else if (beresp.cacheable) {
        /* set ttl based on Cache-Control header set in Drupal */
        /* from http://www.varnish-cache.org/trac/wiki/VCLExampleExtendingCacheControl */
        call extended_cache_control;

        # All tests passed, therefore item is cacheable
        set beresp.http.X-Cacheable = "YES";
    } else {
        set beresp.http.X-Cacheable = "NO: !beresp.cacheable";
        return (pass);
    }
    return (deliver);
}

sub extended_cache_control
{
    if (beresp.http.Cache-Control ~ "max-age=[0-9]+") {

        /* Copy the ttl part from original header */
        set beresp.http.X-Cache-Control-TTL = regsub(beresp.http.Cache-Control, ".*max-age=([0-9]+).*", "\1");
C{
        {
            char *x_end = 0;
            const char *x_hdr_val = VRT_GetHdr(sp, HDR_BERESP, "\024X-Cache-Control-TTL:");
            if (x_hdr_val) {
                long x_cache_ttl;
                errno = 0;    /* clear any stale ERANGE before strtol() */
                x_cache_ttl = strtol(x_hdr_val, &x_end, 10);    /* max-age is decimal */
                if (ERANGE != errno && x_end != x_hdr_val && x_cache_ttl >= 0 && x_cache_ttl < INT_MAX) {
                    VRT_l_beresp_ttl(sp, (x_cache_ttl * 1));
                }
            }
        }
}C

        unset beresp.http.X-Cache-Control-TTL;
    }
}

sub vcl_deliver {
    # http://www.varnish-cache.org/trac/wiki/VCLExampleHitMissHeader
    # add cache hit data
    if (obj.hits > 0) {
        # if hit add hit count
        set resp.http.X-Cache = "HIT";
        set resp.http.X-Cache-Hits = obj.hits;
    } else {
        set resp.http.X-Cache = "MISS";
    }

    # As long as there was no geo cookie set on the incoming request,
    # tell the client to set the geo cookie for the next request.
    ##if (req.http.Cookie ~ "((?!GeoIP-Country).)*") {
    if (!req.http.X-GeoIP-Cookie-Set && req.http.X-GeoIP-Country && !req.http.X-GeoIP-Unavailable) {
        #
        ## TODO: is there an issue with the cookie name on prod vs stg... add in hostname?
        #
        set resp.http.Set-Cookie = "GeoIP-Country=" req.http.X-GeoIP-Country "; Path=/";
        unset req.http.X-GeoIP-Cookie-Set;      # remove temporary header
    }
}

# Added to let users force refresh
sub vcl_hit {
    if (!obj.cacheable) {
        return (pass);
    }

    if (req.http.Cache-Control ~ "no-cache") {
        # Ignore requests via proxy caches,  IE users and badly behaved crawlers
        # like msnbot that send no-cache with every request.
        if (! (req.http.Via || req.http.User-Agent ~ "bot|MSIE|Dealio")) {
            set obj.ttl = 0s;
            return (restart);
        }
    }
    return (deliver);
}
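The regionalization logic above reduces to two small decisions: map the country code to a region, and leave the region out of the hash for static assets so they are stored only once. A plain-C sketch of both, with hypothetical function names; it compares country codes exactly, which for well-formed two-letter codes behaves like the unanchored VCL regex:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Country codes that the VCL maps to the "UK" region. */
static const char *const uk_region[] = {
    "GB", "UK", "BE", "SE", "AU", "CY", "DK", "FI", "FR", "DE",
    "GR", "IE", "IT", "MT", "NL", "NZ", "NO", "PT", "ES"
};

const char *region_for_country(const char *country)
{
    size_t i;

    if (country != NULL) {
        for (i = 0; i < sizeof uk_region / sizeof uk_region[0]; i++) {
            if (strcmp(country, uk_region[i]) == 0)
                return "UK";
        }
    }
    return "US";                                  /* default region */
}

/* Static assets are identical for every region, so skip the region hash. */
bool region_in_hash(const char *url)
{
    static const char *const exts[] = {
        ".jpg", ".jpeg", ".png", ".gif", ".gz", ".tgz", ".bz2", ".tbz", ".mp3", ".ogg"
    };
    size_t i, ulen, elen;

    assert(url != NULL);
    ulen = strlen(url);
    for (i = 0; i < sizeof exts / sizeof exts[0]; i++) {
        elen = strlen(exts[i]);
        if (ulen >= elen && strcmp(url + ulen - elen, exts[i]) == 0)
            return false;
    }
    return true;
}
```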

Change for varnish-3.0.2

chmac

Thanks for this thread. Totally unrelated, I was looking to set a cookie with the client IP address, and it turns out to be quite easy. I copied the code above; however, I ran into one challenge. I'm using varnishd (varnish-3.0.2 revision cbf1284).

My working code is thus:

sub vcl_deliver {
    set resp.http.Set-Cookie = "user_ip=" + client.ip + "; Path=/";
}

The code above had no + symbols between the strings; on Varnish 3.0.2 I needed them to get it to work. Thanks again.