RFC: Parallel downloading via subdomains

mikeytown2's picture

Create a module that automatically rewrites various tags with a src (or href) attribute to use a subdomain. This would turn /sites/all/files/css/mystyle.css into //data1.example.com/sites/all/files/css/mystyle.css. Have it handle the following HTML tags:

  • link
  • style
  • script
  • img
  • iframe
  • embed
  • ???

If it could also pull the src references out of the code inside script tags, that would be cool as well. According to http://www.websiteoptimization.com/speed/tweak/parallel/, 2 to 3 hostnames seem to give the best results. The subdomain points to the same folder as the main domain to make life easy. It could use some kind of smart auto-assignment based on file size or something similar (this needs a db table to keep track of assignments, so one can still take advantage of the browser cache). We also need to address any multisite issues that might arise.
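To make the proposal concrete, here is a minimal sketch of the rewriting step in Python (not Drupal code). It assumes a single static host, data1.example.com, and only touches root-relative src/href values on link, script, img, iframe and embed tags. The function name and the regex-based approach are illustrative assumptions; a real module would more likely hook into the theme layer.

```python
import re

# Hypothetical static host; the thread uses data1.example.com as its example.
STATIC_HOST = "data1.example.com"

# Only these tags are rewritten, so ordinary <a href="..."> links stay untouched.
TAG_RE = re.compile(r"<(?:link|script|img|iframe|embed)\b[^>]*>", re.IGNORECASE)

# Root-relative values only: one leading slash, so //host/... and http://... are skipped.
ATTR_RE = re.compile(r"""\b(src|href)=(["'])(/(?!/)[^"']*)\2""", re.IGNORECASE)


def rewrite_static_urls(html, host=STATIC_HOST):
    """Prefix root-relative src/href values in selected tags with //host."""

    def fix_tag(tag_match):
        return ATTR_RE.sub(
            lambda m: f"{m.group(1)}={m.group(2)}//{host}{m.group(3)}{m.group(2)}",
            tag_match.group(0),
        )

    return TAG_RE.sub(fix_tag, html)
```

Regex rewriting of HTML is fragile (unquoted attributes and inline scripts are not handled), so treat this as an outline of the idea rather than something to deploy.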

Thoughts, Comments, Ideas, Code?

Comments

I started to look at

zeropaper's picture

I started to look at http://api.drupal.org/api/function/custom_url_rewrite_outbound/6 (I will sooner or later need to use parallel downloads), but I'm not 100% sure this would do the trick for everything. Still, I guess it may be a good starting point.
The biggest problem with custom_url_rewrite_outbound() is that it has to be placed in the settings.php file, which would not be the most elegant solution for a module, in my opinion.

CDN module

jcmarco's picture

Probably the best existing solution could be Wim Leers' CDN
http://drupal.org/project/cdn

Another option is patching, as in:
http://tag1consulting.com/patches

and
http://tedserbinski.com/tags/drupal/getting-drupal-play-nice-your-cdn

Ted Serbinski is pointing to:
http://drupal.org/node/499156

But this last patch is the same one Wim uses for the CDN module.

CSS & JS is easy

mikeytown2's picture

Getting/rewriting the CSS & JS files is fairly easy to do. Use the same hook that CSS Gzip and Javascript Aggregator use: template_preprocess_page(). $variables is passed by reference, so any changes one makes to $variables['styles'] and $variables['scripts'] will end up in the final output. Code for parsing that is in both projects. If you want to do it via the fully generated HTML document, Boost does that.

Other objects like images etc. are also in the $variables array, but that is semi theme-dependent, and they are hidden throughout various array keys:

content
left
right
header
footer
tabs

I think system_region_list might list them all though.

Got it working

mikeytown2's picture

Does css & js
http://drupal.org/project/parallel

Was actually quite simple to do. The concept works.

Images too!

mikeytown2's picture

Got it to do just about all the images inside the HTML document. Each element type can have its own subdomain, for a total of 3 now (css, js, img). I'm running this on a live server now, using a total of 2+1, so css and images point to the same subdomain. A dev release should be out in about 3 hrs or so (they go out every 12 hrs).

Wim Leers's picture

You haven't explained the pros and cons of each module properly. Please see http://drupal.org/node/550744.

SEO

patataur's picture

Hello

What is the impact of this module on SEO?

Edit:

OK, I saw this: http://drupal.org/node/579624

Coming from here : http://drupal.org/node/597178

That's nice :)

There will probably be some

brianmercer's picture

There will probably be some temporary drop in rankings for the specific images.

You'd definitely want to set up a 301 redirect from the old URL to the new one.

Updated rules

mikeytown2's picture

Took a quick peek and wrote some untested rules; hopefully they cover JavaScript now.
http://drupal.org/node/579624#comment-2713530

He said something about SEO

brianmercer's picture

He said something about SEO on Google image search, so I was thinking perhaps of doing a 301 redirect from www.example.com/files/image.jpg to cdn1.example.com/files/image.jpg. That's what you'd normally do if you're changing the domain of regular text content, because you don't want the dreaded duplicate content penalty from Google finding identical content at two locations.

Google also supports a "canonical" tag (a link element with rel="canonical") which you can insert on the duplicate content; it points back to the main content so that the Google bot doesn't have to guess which of the two you want to be ranked. But you can't put such a tag on a direct image link.

Parallel doesn't prevent the web server from serving the file from www.example.com/files/image.jpg though; it just rewrites the URL on the dynamic pages, and the file is served from both hosts. So direct links would still work, and the current listing of the image in Google image search would still work. The question remains whether it'll hurt Google image search rankings if the Google crawler finds an identical image at cdn1.example.com/files/image.jpg. If it works like normal page ranking, then it may, but the image search algorithm may be a bit different.

It might hurt on the older

Jamie Holly's picture

It might hurt on the older images, but the newer ones would be fine. Google's only going to index images it finds referenced in other files, so if all your files reference imgs1.example.com/files (CDN), then they won't ever index example.com/files (base Drupal).

But that does tie in with the point I brought up before about ensuring that each image is assigned a permanent server (and it is something I hadn't thought of). If you have a dynamically changing subdomain, then Google will see the same image on numerous servers and punish you.


HollyIT - Grab the Netbeans Drupal Development Tool at GitHub.

Specific subdomains for specific element

patataur's picture

Hi Mikey

You made it so that css would use CDN1, images CDN3, js CDN2.

For me that is not much help: the browser downloads different kinds of elements at the beginning (where parallel downloading helps), but later it only downloads images, about 8-10 of them. Those are all on CDN3, so there is no parallel download there.

Would it be possible to have each file downloaded from a different CDN, alternating the CDNs one after another for each element requested?

Or, alternatively, to be able to put some images on CDNx and other images on CDNy?

Thanks ;)

Complexity

mikeytown2's picture

This would add a lot of complexity to the module; right now it's fairly dumb. I'm more than willing to accept patches, but ATM I'm still too busy with the Boost module to worry about this one.

Just simply alternating

Jamie Holly's picture

Just simply alternating domains isn't going to help that much. You really need a way to keep track of what image is going to what domain and keep it there. So if thisimage.jpg comes off of imgs1.example.com, then that image should always come off that same host. If not then you are going to be defeating the purpose by adding additional overhead to the browser on subsequent page views, since the browser would consider the new host as an entirely different image and reload it, thereby negating the browser's cache.

For example, your page has myhome.jpg as an image. On one page view it comes from:

http://imgs1.example.com/myhome.jpg

Next page view it comes from:

http://imgs2.example.com/myhome.jpg

The browser now sees two totally different files and loads them from the server, instead of just loading it from the cache.

A much better solution would be to use absolute URLs when the image is added into the node or whatever. Once an image is added, it is assigned a server and the absolute URL to that image is used. That way each image is consistently served from its assigned subdomain. You could even come up with a system based off of the FID in {files} - say a base 4 system if you wanted 4 different subdomains.
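The FID idea can be sketched in a few lines of Python (illustrative only, not Drupal code). The host list is an assumption: imgs1 and imgs2 appear above, imgs3 and imgs4 are made up to get to four. Because the host depends only on the file ID, the same image keeps the same URL on every page view and the browser cache stays useful.

```python
# Hypothetical host list; imgs1 and imgs2 appear in the thread, the rest are assumed.
HOSTS = [
    "imgs1.example.com",
    "imgs2.example.com",
    "imgs3.example.com",
    "imgs4.example.com",
]


def host_for_fid(fid, hosts=HOSTS):
    """Pick a host from the file ID so a given file always maps to the same host."""
    return hosts[fid % len(hosts)]


def image_url(fid, path, hosts=HOSTS):
    """Build the absolute URL that would be stored when the image is added."""
    return f"http://{host_for_fid(fid, hosts)}/{path.lstrip('/')}"
```

Note that changing the number of hosts later would move most files to a different host, so the list should be fixed up front.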



FID wouldn't be a good choice

dalin's picture

FID wouldn't be a good choice because you then can't do redirects in .htaccess. What you probably need to do is assign to a different CDN domain based on subdirectory in /files.
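A directory-based assignment could look roughly like the following Python sketch. The subdirectory names and the host mapping are hypothetical; the point is only that the host can be derived from the path alone, which is also something a path-matching redirect rule can do.

```python
# Hypothetical mapping from the first directory under /files to a CDN host.
SUBDIR_HOSTS = {
    "images": "cdn1.example.com",
    "css": "cdn2.example.com",
}
DEFAULT_HOST = "www.example.com"


def host_for_files_path(path, files_root="/files/"):
    """Choose a host from the first subdirectory under files_root."""
    if not path.startswith(files_root):
        return DEFAULT_HOST
    # "/files/images/a.jpg" -> "images"; a file directly in /files yields its own name.
    subdir = path[len(files_root):].split("/", 1)[0]
    return SUBDIR_HOSTS.get(subdir, DEFAULT_HOST)
```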

--


Dave Hansen-Lange
Director of Technical Strategy, Advomatic.com
Pronouns: he/him/his

Well I wasn't actually

Jamie Holly's picture

Well, I wasn't actually considering the old images. In all honesty I would leave those alone and just focus on newly added items. In that case FID would work fine (I've done similar things before and it works great). Of course you could even come up with a filter that uses an algorithm to assign a subdomain based off the filename. Something as simple as adding up all the ASCII values of the filename and then taking the result modulo the number of subdomains.

It's one of those situations where there are about a million possibilities, and the right one is going to depend upon the current system and layout. As long as you can keep the subdomain the same for each individual image, you will be doing a lot more good.
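As a rough Python illustration of the filename-sum idea (the function name and host names are placeholders, not anything from the Parallel module):

```python
def host_for_filename(filename, hosts):
    """Sum the byte values of the filename and reduce modulo the host count."""
    total = sum(filename.encode("utf-8"))
    return hosts[total % len(hosts)]


# Example: "a.jpg" sums to 464, and 464 % 3 == 2, so the third host is chosen.
example_hosts = ["imgs1.example.com", "imgs2.example.com", "imgs3.example.com"]
chosen = host_for_filename("a.jpg", example_hosts)
```

A plain sum is order-insensitive, so "ab.jpg" and "ba.jpg" always land on the same host; that is harmless here, since the only requirement is that each name maps to one fixed host.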



Not all sites use the files

jcisio's picture

Not all sites use the files table (IMCE, for example). I think we can use the first character of the filename to decide, and that's easy to do with .htaccess, too. To go further, for a perfect balance, use the first character of the md5 hash.
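A small Python sketch of the md5 variant, assuming the hash is taken over the file path (the thread does not say whether the name or the full path should be hashed):

```python
import hashlib


def host_for_path(path, hosts):
    """Use the first hex digit of the path's md5 hash to pick a host."""
    first_hex = hashlib.md5(path.encode("utf-8")).hexdigest()[0]
    return hosts[int(first_hex, 16) % len(hosts)]
```

The first hex digit has 16 possible values, so the split is only even when the number of hosts divides 16 (2 or 4 hosts, for example); with 3 hosts one of them gets a slightly larger share.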

IMCE insert Image

biru's picture

Hi all,
I've tried installing the Parallel module, ver. 6.x-1.0-beta2. It looks great, but I have a problem when I create an article using the IMCE module: I can't insert an image, because when I click the insert image button, the popup goes blank and stops responding.
But when I go to the performance page and leave the JS text box blank, everything goes back to normal.
Any help would be greatly appreciated.

Regards

Cross Domain JS

mikeytown2's picture

I've heard of JavaScript having issues running correctly when it's cross-domain. IMCE might have this issue?

FWIW I've got Parallel 1.0-beta2

dalin's picture

FWIW I've got Parallel 1.0-beta2 and IMCE 1.3 working fine together. I've set things up for images and JS to be pulled from the same CDN domain. The CDN domain is not a subdomain.


Oh and this is using WYSIWYG

dalin's picture

Oh and this is using WYSIWYG 2.0 with FckEditor 2.6.5 and IMCE Bridge 1.0.


Cross Domain JS IMCE issue

biru's picture

Thanks Mike
I will try asking about this IMCE issue on the forum.

Regards

How to set this up on a hosted server

vishalchavda's picture

Hello
I am a newbie and am trying to set up the CDN module to work in a hosted environment.
For my site I am using the latest DEV versions of CDN and ImageCache and have patched Drupal 6.22 core.

I have created a CNAME via cPanel called cdn1 that points to my main domain. I am on a hosted service using Apache servers.

I have configured the CDN module to use origin pull mode and have mapped http://cdn1.example.com|.jpg .jpeg .png .gif.

With this setup, the images on my site are not being displayed.

In the display statistics, I get something like this:

http://cdn1.example.com.com/sites/default/files/imagecache/Custom_100_100/[type]/[nid]/myImage.jpg (port 80)
...
On clicking the link I get a 404 Not found error. Basically cdn1 is not fetching data from example.com.com/sites/...

Any help would really be appreciated.

Regards

Vishal

couple of options

mikeytown2's picture

http://drupal.org/project/imageinfo_cache will generate all presets in the background so you shouldn't get a 404 any more.

What happens when you go to cdn1.example.com? Is Drupal there?

Hello

vishalchavda's picture

Hello and thanks for your reply.

I have tried to set up the CDN module as explained in the setup section of the Parallel module page, i.e. http://drupal.org/project/parallel

I have simply created the "cdn1" CNAME in cPanel and pointed it to my main domain, i.e. example.com. This created an empty folder called cdn1. When I go to cdn1.example.com, I get the default website page from cPanel, something like this:

Default Web Site Page

If you feel you have reached this page in error, please contact the web site owner:

webmaster@cdn1.example.com
It may be possible to restore access to this site by following these instructions for clearing your dns cache.
...

The problem is not just with ImageCache. When I try to map the .css files, they are not getting mapped either,
i.e. cdn1.example.com|.css

Do I need to make any changes in the .htaccess file?

Regards

Vishal

It sounds like your problem

dalin's picture

It sounds like your problem has nothing to do with Drupal or Drupal files. It looks like you haven't set up your CDN domain properly. First make sure that
http://www.example.com/CHANGELOG.txt
and
http://cdn.example.com/CHANGELOG.txt
return the same thing. Only then can you set up Drupal to use the CDN.
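That check can be scripted. The Python sketch below fetches the same path from each host and compares the response bodies; the function names are made up for illustration, and the fetcher is a parameter so the comparison logic can be exercised without network access.

```python
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Return the response body; urlopen raises HTTPError on a 404."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def hosts_serve_same_file(hosts, path="/CHANGELOG.txt", fetcher=fetch):
    """True if every host returns an identical body for the given path."""
    bodies = [fetcher(f"http://{host}{path}") for host in hosts]
    return all(body == bodies[0] for body in bodies)


# Usage against real hosts (performs network requests):
# hosts_serve_same_file(["www.example.com", "cdn.example.com"])
```

If the CDN host serves a hosting-panel default page instead of the Drupal directory, the bodies differ or the request fails, which points to a DNS or document root problem rather than a Drupal one.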


Thanks for your answer

vishalchavda's picture

Hello Dalin

Thanks for your help.

I had to ask my hosting company to set this up for me; they said it required some setup on their side which they could not do :(. But they suggested that I create a subdomain "cdn1" and point it to "/public_html". This seems to have done the trick. I have set up the CDN module now and am getting the images from the CDN.

I also tried to do the same with .css files from a new subdomain i.e. cdn2, but this is not working at the moment.

Please do feel free to look at my website and give me your feedback if you do not mind.

www.thatscookedby.com

Regards

Vishal

Setting a cookie-less domain

vishalchavda's picture

Hello

I am trying to set up my CDN subdomain to be cookie-less.
- I have updated my .htaccess file to redirect all traffic to the www site, i.e. http://example.com to http://www.example.com
- In the settings.php file I have set $cookie_domain = '.example.com';
But when I use YSlow, it tells me that cdn1.example.com is setting cookies for static content, namely images.
Can you please help with setting up the subdomain cdn1 to be cookie-less?

Regards

Vishal

Peter Bowey's picture

1) You do not mention the setup for your CDN sub-domain!
2) Why are you directing all GET traffic to http://www.example.com
when you state that your $cookie_domain is '.example.com'?

Normally you would set www.example.com to be the cookie domain (and the source of all PHP GETs), then create a new static-only DNS sub-domain, say static.example.com or cdn1.example.com. It does work; I have done this many times!

In settings.php, set: $cookie_domain = 'www.example.com';

Then, using your server's DNS setup (Bind, NSD, or whatever CP), use:

www.example.com.          IN      A       xxx.xxx.xxx.xxx
cdn1.example.com.         IN      A       xxx.xxx.xxx.xxx

I am sure you know what to do in .htaccess
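The reasoning about cookie domains can be illustrated with a simplified Python version of the domain-matching rule browsers apply. It ignores IP addresses, host-only cookies and public-suffix checks, and the function name is made up for illustration. It shows why a cookie scoped to .example.com also travels with requests to cdn1.example.com, while one scoped to www.example.com does not.

```python
def cookie_is_sent(request_host, cookie_domain):
    """Simplified domain match: exact host or any subdomain of the cookie domain."""
    domain = cookie_domain.lstrip(".").lower()  # a leading dot is ignored
    host = request_host.lower()
    return host == domain or host.endswith("." + domain)


# A cookie for .example.com matches the CDN host as well.
sent_to_cdn_wide = cookie_is_sent("cdn1.example.com", ".example.com")
# A cookie for www.example.com does not match cdn1.example.com.
sent_to_cdn_www = cookie_is_sent("cdn1.example.com", "www.example.com")
```

Whether the site actually ends up emitting the narrower cookie domain depends on how the CMS processes the setting, so it is worth checking the Set-Cookie header in the browser after changing it.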

--
Linux: Web Developer
Peter Bowey Computer Solutions
Australia: GMT+9:30
(¯`·..·[ Peter ]·..·´¯)

Thanks for your reply:

vishalchavda's picture

Thank you so much for your response. I am still having some difficulties in setting this up. I have the following setup:

  1. For the CDN module I have created a subdomain called cdn1 in cPanel and pointed it to /public_html (not set to redirect). I then mapped this in the CDN module to serve .jpeg, .png etc. This works and I can see that my images are being served from cdn1, but with the overhead of cookies.

  2. I then set up an A record for the cdn1 subdomain via cPanel as follows:
    www.cdn1.example.com. A xx.yy.zzz.aa

    Note: I tried to set the subdomain as you suggested above:
    "cdn1.example.com. A xx.yy.zzz.aa"
    but I got this message in cPanel: "You must have the zoneedit feature to take this action. You currently only have the simplezoneedit feature."

  3. In the .htaccess file, I have redirected all traffic to www.example.com.

  4. In settings.php I have set $cookie_domain = 'www.example.com';

This is all I have done. I am a newbie at all of this, and whatever I have done so far has been through implementing various suggestions found online. I don't know what else needs to be done in the .htaccess file.

Regards

Vishal

Cookieless domain setup

vishalchavda's picture

Hello

Can someone please share their expertise on setting up a cookieless domain, as per the question above?

Regards

Vishal

Hi