Nginx, CDN and SEO

superfedya's picture

Hi,

I use the CDN module to parallelize requests.
But it creates an SEO problem. The same article is available from several subdomains:

mysite.com/article
cdn1.mysite.com/article
cdn2.mysite.com/article
cdn3.mysite.com/article

...

It isn't good for SEO.

Right now I add the subdomains like this:

server {
    listen 80; # IPv4
    server_name mysite.com cdn1.mysite.com cdn2.mysite.com cdn3.mysite.com;
    limit_conn arbeit 32;
...

Is there a way to make the CDN hostnames serve only images/js/css and 301-redirect every other request to the main domain?

With Apache I did it like this:

RewriteCond %{HTTP_HOST} ^mysite.com$ [NC]
RewriteRule ^(.*)$ http://www.mysite.com/site/$1 [L,R=301]

RewriteCond %{HTTP_HOST} ^cdn1.mysite.com$ [NC]
RewriteRule ^(.*)$ http://www.mysite.com/site/$1 [L,R=301]

RewriteCond %{HTTP_HOST} ^cdn2.mysite.com$ [NC]
RewriteRule ^(.*)$ http://www.mysite.com/site/$1 [L,R=301]

RewriteCond %{HTTP_HOST} ^cdn3.mysite.com$ [NC]
RewriteRule ^(.*)$ http://www.mysite.com/site/$1 [L,R=301]

http://drupal.org/node/1060358

Any suggestions?

Thanks

Comments

Use the Nginx 'map'

Peter Bowey's picture

Use the Nginx 'map' feature plus a permanent rewrite. This sample is from my own server:

    map $host $cdn_caller {
        default                         '';    # default is no divert
        ~^(?:cdn1.static.com.au|cdn2.static.com.au|cdn3.static.com.au|cdn4.static.com.au) 1;  # CDN caller
    }
    map $host $cdn_check {
        default                         1;  # default is block
        ~^(?:cdn1.static.com.au|cdn2.static.com.au|cdn3.static.com.au|cdn4.static.com.au) 0;  # CDN caller
    }
##
##
    # Divert CDN Host headers
    if ($cdn_caller) {      # Remote CDN should NOT come here
        rewrite ^ $scheme://cdn1.static.com.au$request_uri? permanent;      # send it to the correct static host
    }

--
Linux: Web Developer
Peter Bowey Computer Solutions
Australia: GMT+9:30
(¯`·..·[ Peter ]·..·´¯)

Something like this might do it

perusio's picture

Try this. At the http level:

map $uri $redirect_cdn {
    default 0;
    ~\.(?:css|gif|js|jpe?g|png)$ 1;
}

At the server level:
if ($redirect_cdn) {
   return 301 http://mycdn.com$request_uri;
}


superfedya's picture

Peter Bowey,

Thanks, it's exactly what I need. But there is one problem. In the browser everything is OK: all images are available from the CDN.

But http://www.webpagetest.org/result/120601_3B_J7D/1/details/ shows many 301 redirects.
That isn't good for SEO. How can I remove the 301 redirects for images/css/js on cdn1/2/3?

perusio,

Thanks.
Your rule redirects each static file to the CDN, but everything is already available from the CDN. The idea is to redirect everything on the CDN hostnames to the main domain except images/css/js, to get rid of duplicate content.

How can I add an exception for jpg, gif, css and js files to this line?

~^(?:cdn1.mysite.com|cdn2.mysite.com|cdn3.mysite.com) 1; # CDN caller

Thanks

I didn't understand your initial request

perusio's picture

This is the simplest approach:
At the server level or inside the / location, do:

location ~* \.(?:css|gif|js|jpe?g|png)$ {
    return 301 http://$http_host$request_uri;
}

This means that you have to modify the location that handles static files. Leave the server_name as you have it above with all the cdn hostnames. This should do the trick.

if you want to redirect only

perusio's picture

for requests that specify the cdn* hostnames, do:

## http level map directive.
map $host $redirect_cdn {
    default 0;
    cdn1.mysite.com 1;
    cdn2.mysite.com 1;
    cdn3.mysite.com 1;
}

And modify the above location to:
location ~* \.(?:css|gif|js|jpe?g|png)$ {
   if ($redirect_cdn) { 
       return 301 $scheme://$host$request_uri;
   }
}


superfedya's picture

That will create redirects for the static files too.
The idea is to issue a 301 redirect for every request except images and other static files.
http://www.webpagetest.org/result/120601_K3_NRG/1/details/

It should look like this:
http://www.webpagetest.org/result/120601_DP_NSM/1/details/

Visitors will download all static files (mostly images) in parallel without any 301 redirect. But all other content, for example
cdn1.site.com/article, will redirect to site.com/article

Thanks

Ok so you want the other way around

perusio's picture

for that you need to remove the cdn hostnames from server_name.

Create a separate vhost with the cdn hostnames only. And use a map directive.

## At the http level
map $uri $no_cdn {
    default 1;
    ~\.(?:css|gif|js|jpe?g|png)$ 0;
}

server { ## cdn vhost
    server_name cdn1.mysite.com cdn2.mysite.com cdn3.mysite.com;

    if ($no_cdn) {
        return 301 $scheme://mysite.com$request_uri;
    }
}
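
For completeness, here is a minimal sketch of what the main vhost might look like once the cdn hostnames are removed from its server_name. It assumes the server block from the original question; everything not shown there is elided rather than guessed.

```nginx
## Main vhost: only the canonical hostname, no cdn* names.
server {
    listen 80; # IPv4
    server_name mysite.com;
    limit_conn arbeit 32;

    ## ... the rest of the existing site configuration stays as it was ...
}
```

With the hostnames split this way, requests for cdn1/cdn2/cdn3 are handled by the cdn vhost above, and only non-static URIs on those hosts are sent back to mysite.com.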


superfedya's picture

It's just perfect! Thank you very much Perusio!

Can I add these lines to this vhost?
limit_conn arbeit 32; (to protect against DDoS)

and

    ## All static files will be served directly.
    location ~* ^.+\.(?:css|js|jpe?g|gif|png)$ {
        access_log off;
        expires 30d;
        ## fell swoop.
        tcp_nodelay off;
        ## Set the OS file cache.
        open_file_cache max=3000 inactive=120s;
        open_file_cache_valid 45s;
        open_file_cache_min_uses 2;
        open_file_cache_errors off;
    }

Yes

perusio's picture

Place it inside a / location:

location / {
    ## All static files will be served directly.
    location ~* ^.+\.(?:css|js|jpe?g|gif|png)$ {
        limit_conn arbeit 96;

        access_log off;
        expires 30d;
        ## fell swoop.
        tcp_nodelay off;
        ## Set the OS file cache.
        open_file_cache max=3000 inactive=120s;
        open_file_cache_valid 45s;
        open_file_cache_min_uses 2;
        open_file_cache_errors off;
    }
}

I would triple the connection limit, as shown above, since the client can download files in parallel from each of the three cdn hostnames.
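
Note that limit_conn refers to a shared memory zone that has to be declared at the http level. The declaration of the arbeit zone is not shown in this thread, so the following is only a sketch: keying the zone on the client address and the 10m size are assumptions, not values taken from the actual configuration.

```nginx
## At the http level.
## Assumed: one counter per client IP address, 10 MB of shared memory.
limit_conn_zone $binary_remote_addr zone=arbeit:10m;
```

If the zone is keyed per client address like this, limit_conn arbeit 96 allows up to 96 simultaneous connections per client in the location where it is set.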


superfedya's picture

Thank you. It works perfectly.


superfedya's picture

Except for imagecache.

I added:

    ## If accessing an image generated by imagecache, serve it directly if
    ## available, if not relay the request to Drupal to (re)generate the
    ## image.
    location ~* /imagecache/ {
        ## Image hotlinking protection. If you want hotlinking
        ## protection for your images uncomment the following line.
        #include sites-available/hotlinking_protection.conf;

        access_log off;
        expires 30d;
        try_files $uri /index.php?q=$no_slash_uri&$args;
    }

It didn't help.
http://cdn1.madfanboy.com/sites/default/files/imagecache/screenshot/imag...

http://madfanboy.com/sites/default/files/imagecache/screenshot/imagecach...

Any idea?

Imagecache

perusio's picture

works as a 404 error handler: if the image doesn't exist, the request is forwarded to Drupal so that the image can be (re)generated.

So I guess what you need is:

  1. Check if image exists on the CDN.

  2. If not then forward to the main domain for generating the images.

Since you need to redirect the request to another domain, you can't use try_files.

## Inside the / location of the cdn vhost.
location ~* /imagecache/ {
    ## Image hotlinking protection. If you want hotlinking
    ## protection for your images uncomment the following line.
    #include sites-available/hotlinking_protection.conf;

    access_log off;
    expires 30d;

    error_page 404 = http://mysite.com/index.php?q=$no_slash_uri&$args;
}
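
The $no_slash_uri variable used above is not defined anywhere in this thread. One plausible definition, assuming it is simply the request URI without its leading slash (as Drupal's q parameter expects), is a map with a named capture. Treat the sketch below as an assumption, not as the configuration actually in use here.

```nginx
## At the http level.
## Assumed definition: $uri with the leading slash stripped,
## e.g. /sites/default/files/a.png -> sites/default/files/a.png
map $uri $no_slash_uri {
    ~^/(?<no_slash>.*)$ $no_slash;
}
```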


superfedya's picture

The CDN is just a mirror of the main domain.

I think imagecache needs access to PHP to generate thumbnails. All other imagecache images are available; only the thumbnails are missing.

Thanks


superfedya's picture

Any news? Does nobody use Imagecache with Nginx and a CDN?

Thanks


omega8cc's picture

You could check how it is done, fully automated, in the https://github.com/omega8cc/provision_cdn Aegir extension. It creates a separate CDN-only vhost that includes all the locations and redirects required to support imagecache, AdvAgg and CDN farfuture. We use it on our own website for a fake-CDN setup, and the CDN-only vhost it creates for us is shown here: https://gist.github.com/4574584


superfedya's picture

It seems to be very complicated.

My CDN vhost works, but thumbnail generation fails in one place: it works everywhere except in the node view.

I'm sure I'm missing something:

server { ## cdn vhost
    server_name cdn1.mysite.com cdn2.mysite.com cdn3.mysite.com;
    root /var/www/mystie.com;

    if ($no_cdn) {
       return 301 $scheme://mysite.com$request_uri;
   }

    ## See the blacklist.conf file at the parent dir: /etc/nginx.
    ## Deny access based on the User-Agent header.
    if ($bad_bot) {
        return 444;
    }
    ## Deny access based on the Referer header.
    if ($bad_referer) {
        return 444;
    }

    ## Protection against illegal HTTP methods. Out of the box only HEAD,
    ## GET and POST are allowed.
    if ($not_allowed_method) {
        return 405;
    }

    ## Inside the / location of the cdn vhost.
    location ~* /imagecache/ {
        ## Image hotlinking protection. If you want hotlinking
        ## protection for your images uncomment the following line.
        #include sites-available/hotlinking_protection.conf;
        ## Include the FastCGI config.
        include fastcgi_drupal.conf;
        fastcgi_pass phpcgi;
        access_log off;
        expires 30d;

        error_page 404 = http://mysite.com/index.php?q=$no_slash_uri&$args;
    }

    location / {
        ## All static files will be served directly.
        location ~* ^.+\.(?:css|js|jpe?g|gif|png)$ {
            limit_conn arbeit 96;

            access_log off;
            expires 30d;
            ## fell swoop.
            tcp_nodelay off;
            ## Set the OS file cache.
            open_file_cache max=3000 inactive=120s;
            open_file_cache_valid 45s;
            open_file_cache_min_uses 2;
            open_file_cache_errors off;
        }
    }

}


superfedya's picture

And if I manually paste a thumbnail link into the browser, the thumbnail shows up fine the next time...

The thumbnail links look like this:
http://madfanboy.com/sites/default/files/imagecache/screenshot/imagecach...

Maybe it's because of the "/imagecache/screenshot/imagecache/screenshots" path, and I need to adjust something in my vhost config?

Thanks


superfedya's picture

It seems that the problem was resolved by adding:
try_files $uri /index.php?q=$no_slash_uri&$args;

to the imagecache config.
