Perhaps someone can help me, or at least point me in the right direction. If one needed to scale up a couple of Apache servers but share the docroot from a synchronized common source, GlusterFS seems like a good solution. I have set up an experimental GlusterFS replicated system (2 instances, each both master and client) with Linux (Ubuntu), Apache, and PHP on them. I got GlusterFS working fine, but I am a little unclear how to "link" /var/www to the GlusterFS client mount point (e.g. /mnt/point). I'm not sure how to bring Drupal and GlusterFS together using best practices.
After googling, there seem to be a few options, but I'm not entirely sure which counts as "best practice":
1) Could one simply make /var/www the GlusterFS client mount point? Is it really that simple?
2) Would one use symbolic links between /var/www and the GlusterFS client mount point? I'm not exactly sure how this would be done; I am not a Linux systems administrator.
3) Would one use mount --bind between /var/www and the GlusterFS client mount point? Again, not exactly sure how this would be done.
4) Or perhaps there is another best practice?
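For what it's worth, the first three options can be sketched as shell commands. This is only an illustration; the hostname "gluster1", the volume name "wwwvol", and the paths are placeholders you would adjust to your own setup:

```shell
# Option 1: mount the Gluster volume directly at /var/www
mount -t glusterfs gluster1:/wwwvol /var/www

# Option 2: mount at /mnt/point and symlink /var/www to it
mount -t glusterfs gluster1:/wwwvol /mnt/point
mv /var/www /var/www.bak          # keep the old docroot around
ln -s /mnt/point /var/www

# Option 3: mount at /mnt/point and bind-mount it over /var/www
mount -t glusterfs gluster1:/wwwvol /mnt/point
mount --bind /mnt/point /var/www
```

All three end up serving the same files; the differences are mostly about what survives reboots (fstab entries) and what other tooling expects to find at /var/www.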
Any help would be appreciated, since I am a bit of a newbie in this area. I also assumed that one would want to replicate all of /var/www in Drupal 7, not just the sites or files upload directory, since some modules, profiles, etc., may also save things on disk, and HTTP is a stateless protocol.
Comments
Have you thought of using a
Have you thought of using a plain old NFS share? With APC cache enabled, the code is only accessed once anyway (switch off apc.stat so that it's not checked every hit), and the impact of having a network share is pretty much zero (assuming enough APC memory to cache the whole site).
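For reference, the APC settings being described might look something like this in php.ini (values are examples; apc.shm_size in particular is a placeholder you would tune until the whole codebase fits):

```ini
; Example only -- tune apc.shm_size to fit your whole codebase
apc.enabled=1
apc.stat=0        ; don't re-stat files on every hit (requires a cache clear on deploy)
apc.shm_size=128M ; must be large enough to hold all cached opcodes
```

Note that with apc.stat=0 you must restart PHP (or clear APC) after every code change, since file modification times are no longer checked.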
Varnish can serve your static assets like image files and cache them for ages.
Any particular reason for using Gluster?
Question on Drupal NFS
Hi,
I have a follow-up question on using NFS with APC. I understand that with APC enabled and apc.stat turned off, the code files are not read in when they're executed, but Drupal also does some directory and file access via file_scan_directory(). Wouldn't that still go to the file system, or is it somehow optimized by APC?
thanks,
/jeremy
Still hits the file system
APC doesn't prevent functions like file_scan_directory(), file_exists(), etc. from hitting the disk.
It depends
on what you are trying to accomplish. To be honest, I see very little upside in using something like GlusterFS for your site. The module files are static in the sense that, unless you upgrade modules or install new ones, they stay the same.
For the files directory, using a distributed FS is an option. Another is using NFS and/or something like
lsyncd, which is scriptable in Lua. There's also no need to use Varnish: if you're using something like Nginx you can stay in the wonderland of evented loops and use the Nginx cache for whatever type of assets you want. No need to introduce more moving parts by placing a threaded engine in front of an evented one.
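As a sketch of what serving static assets straight from Nginx with long client-side expiry could look like (the extensions and expiry times here are illustrative, not a recommendation):

```nginx
# Example only -- long expiry for static assets
location ~* \.(jpg|jpeg|png|gif|css|js|ico)$ {
    expires 30d;
    add_header Cache-Control "public";
    access_log off;
}
```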
Elaborate on what you intend to achieve. Tell us your requirements and we'll be able to help you better.
On the topic of horizontal scaling: using a load balancer with several cheap backends is an option. You can use, e.g., the Nginx proxy cache to have a central cache and distribute the load over several backends.
thanks
Thank you for your answers. Like I said, I am a bit new to this. The reason for the question is that HTTP is a stateless protocol, hence one cannot control which application server a client will hit on each request (yes, I know load balancers can introduce sticky sessions, etc., but that is not what I am looking into). Therefore every client must see the same state. To clarify (and I am asking a question here): does this mean that no Drupal module or theme can EVER (even via an inexperienced programmer, or by mistake) store state under /var/www other than in the files folder or DB? What prevents that?
Is there an NFS or lsyncd practical Drupal tutorial (with exact steps) that you might recommend? I had looked into GlusterFS because people like Acquia seemed to be interested in it. See http://www.cloudave.com/21/acquia-uses-gluster-storage-for-drupal-gardens-saas-offering/ for example.
Version 3.2.5 of GlusterFS, in addition to synchronous master-master replication (distributed or striped) and master-client, also offers master-slave async backup to a different region via geo-replication, which uses rsync. (I believe Lua scripting is also offered, although that's not something I am looking into.)
P.S. While I have used Varnish, another option for handling images is to enable Apache's mod_expires so that on the second hit to your website the client's cache does all the work. One could use CloudFront, Akamai, or another CDN for the first hit.
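A minimal mod_expires config along those lines might look like this (MIME types and lifetimes are examples; enable the module first, e.g. `a2enmod expires` on Debian/Ubuntu):

```apache
# Example only -- far-future expiry for static assets
<IfModule mod_expires.c>
    ExpiresActive On
    ExpiresByType image/png  "access plus 1 month"
    ExpiresByType image/jpeg "access plus 1 month"
    ExpiresByType text/css   "access plus 1 week"
    ExpiresByType application/javascript "access plus 1 week"
</IfModule>
```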
That's correct - there is no
That's correct - there is no state stored with the module or theme files. Database handles it - and typically sites/all/files is only static assets.
Apache should not be able to write to any folder except the sites/all/files, which is what prevents anyone from doing that.
Any NFS tutorial should work; there's nothing special about it, just create a share and mount it at the correct mount point. Make sure to follow guides on optimizing it.
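The bare-bones version of that, as a sketch (hostname "nfs1", paths, and the subnet are placeholders):

```shell
# On the NFS server: export the shared files directory to the web tier.
# /etc/exports would contain a line like:
#   /srv/drupal-files  192.168.1.0/24(rw,sync,no_subtree_check)
exportfs -ra

# On each web server: mount the share at the Drupal files directory
mount -t nfs nfs1:/srv/drupal-files /var/www/sites/default/files

# Mount options often worth experimenting with for performance:
#   noatime, rsize=32768, wsize=32768
```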
Here's a tutorial on Drupal
Here's a tutorial on Drupal and NFS
http://www.johnandcailin.com/blog/john/scaling-drupal-step-one-dedicated...
As Mark states, it's pretty much just setting up NFS - nothing Drupal specific about it. You can also do things like SSHFS/FUSE for shared file directories (again - nothing Drupal specific, just get SSHFS/FUSE set up).
HollyIT - Grab the Netbeans Drupal Development Tool at GitHub.
lsyncd
is for when you want to have replication. NFS doesn't replicate; it just allows you to mount remote filesystems on a given machine. It's the easiest way to set up a central asset server and a bunch of frontends that use the same Drupal filesystem. But bear in mind: no replication.
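A minimal lsyncd config, for the record, is a small Lua file. This sketch mirrors a local files directory to one other host over rsync/ssh; the hostname and paths are placeholders:

```lua
-- /etc/lsyncd/lsyncd.conf.lua -- example only
settings {
    logfile    = "/var/log/lsyncd/lsyncd.log",
    statusFile = "/var/log/lsyncd/lsyncd.status",
}

sync {
    default.rsyncssh,
    source    = "/var/www/sites/default/files",
    host      = "web2.example.com",
    targetdir = "/var/www/sites/default/files",
}
```

Note this is one-way: writes on the source propagate to the target, not the reverse.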
GlusterFS is nice if you have a NAS or any type of storage cluster. If you're going to host a lot of sites, or if your data has very demanding requirements, then it's a good option. For most people out there, IMO, it's overkill.
In my sites I never use the
/var/www directory. I run all sites under a regular user account and just give the server user permission to write in the files directory. I want to have the minimal possible interaction with the root account. Setting an Expires or Cache-Control header doesn't provide any server-side caching; you're totally dependent on the client-side cache policy. Most of the config is in the DB, although if using Features or CTools you could have a lot of stuff in code, as long as you don't alter the features being provided, which in Features parlance is referred to as overriding a feature.
ok
Thanks for your replies. Jamie, your link was quite useful and helped answer one of my original questions. Mark, you refer to the importance of optimization; what is the single most important optimization you have found (in practice) for NFS? perusio, to clarify your reference to "overriding a feature": I am somewhat unclear, is that a reference to dynamically overriding a feature (code)? I have not (yet) done any Drupal programming, but if one can dynamically change a feature configuration stored in a file, is not "state" being stored in a place other than the files folder or the DB? Have I missed something?
Even with APC, NFS being a major performance hog
We've got a similar set up, and are using NFS to share the entire drupal directory (code + files directory) between instances in our cluster.
I was expecting that with APC, NFS shouldn't really result in any performance hit for dynamic pages, since all the code should be cached in memory. However, running synthetic benchmarks with ab shows a major performance hit when loading a page (just the dynamic HTML). When serving PHP off of NFS vs. "local" disk (an EBS volume on AWS), performance is 25-75% slower (depending on the number of simultaneous connections, varying from 1 to 10).
Any thoughts as to what might be causing this performance hit? APC has plenty of cache, and is set to stat=0. What else might be at work here, and how do we go about identifying (and fixing) the bottleneck?
We were also looking into GlusterFS, since beyond the redundancy advantage it provides (we have a single point of failure at the moment with the NFS server), it seems it might also help with performance if we chose to replicate files locally rather than serve them over the wire. Thoughts on this?
Thanks!
I don't think
that having your code files on NFS is a good idea. Those files are hit continuously. APC just caches the bytecode; the files are still hit on every request. Static files are different, because:
They're cached in the client.
The number of files hit in each request is usually smaller.
I would suggest that you either use a real distributed filesystem, or just use git to clone the site's repo across all frontend machines. You can set up git hooks to automatically push from a master repo to the slaves.
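One way to sketch the git approach: a post-receive hook on the master repo that mirrors to a bare repo on each frontend, whose own hook then checks the code out into the docroot. Hostnames and paths here are placeholders:

```shell
#!/bin/sh
# hooks/post-receive on the master repo -- example only
# Mirror the updated refs to a bare repo on each frontend...
for host in web1 web2 web3; do
    git push --mirror "ssh://$host/var/git/site.git"
done
# ...where each frontend's own post-receive hook does something like:
#   GIT_WORK_TREE=/var/www git checkout -f master
```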
Then GlusterFS is the way to go?
We were originally planning to just host the files directory on NFS and setting up a separate system to push code updates to the machines in the cluster.
However, we're using Aegir to host a number of sites and it's actually quite a pain to carve out the files directory for each hosted site and host only that directory on NFS.
So hosting the whole install on NFS was the "quick win" solution to kill both issues with one stone. No need to carve out the files directory, and no need to write something new to manage code updates.
But like many "quick wins", the win isn't as big as we had hoped, and we pay the price in terms of performance. (We're not very clear on what APC/PHP/Apache is actually doing in terms of accessing code files, but benchmarking would seem to confirm that even if the bytecode is cached in memory, there's still file access going on for each load.)
So we're now back to square one. Either we come up with a better-performing alternative to NFS (does Gluster qualify as what you call a real distributed filesystem?), or we go back to rolling a full, if complicated, system to host the individual files directories via NFS and push code updates to each machine in the cluster.
Thoughts?
Thanks!
Yes and no
it depends what price you're willing to pay for a solution. There's also DRBD, as referenced below. And there's also
lsyncd. My advice would be for you to start as simple as possible. You already tried NFS and had problems. Next, try something like lsyncd. Last, I would try a distributed FS.
You could still use symlinks
You could still use symlinks for all the files directories, which would also be quick to do with Aegir, and should allow you to share files differently from code...
However, if you want to stick with NFS, using CacheFS and FS-Cache:
http://www.linux-mag.com/id/7378/
might be a good idea, too.
Especially on AWS with the local instance disk (which should be faster than EBS), this could work really well.
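A sketch of what enabling FS-Cache for an NFS mount involves (package names and init commands vary by distro; "nfs1" and the paths are placeholders):

```shell
# Example only: cache NFS reads on local disk via FS-Cache
apt-get install cachefilesd               # or: yum install cachefilesd
# On Debian/Ubuntu, also set RUN=yes in /etc/default/cachefilesd
service cachefilesd start

# Remount the NFS share with the "fsc" option to enable caching for it
mount -t nfs -o fsc nfs1:/srv/drupal-files /var/www/sites/default/files
```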
Best Wishes,
Fabian
DRBD
Provided the static assets you need to share are not too big, DRBD (RAID 1 over the network; all I/O is on the local disks) sounds like it would be a good solution without the performance issues that come with NFS.
http://en.wikipedia.org/wiki/DRBD
http://www.linkedin.com/in/linuxwizard
Lots of great suggestions
Thanks for the suggestions!
CacheFS looked promising, but doesn't seem particularly well supported, and I couldn't get it working on CentOS (it kept giving me kernel permissions errors).
I looked at lsyncd, but am not sure how easily it handles two-way synchronization. If I have a cluster with 3 web servers, each of which can update the files directory, will lsyncd be able to determine, on each machine, whether our own edits need to be synced to the other machines (or a central server), and vice versa, whether there are edits on other machines which need to be synced from the server (or other machines) to the local instance? Rsync seems really best suited to one-way changes, such as updating the code base, and I've known it to run into issues if changes have been made to both the local and remote targets.
I looked briefly at DRBD, but it seems more like a solution for providing networked redundancy to a "locally" accessed array; it is not specifically designed to provide "common" access to files for 3-4 different machines in a cluster. Perhaps I'm missing something, though.
Looking again at GlusterFS, I'm not sure its performance improvement over NFS is really that great, as it seems to also come with a lot of overhead.
CacheFS would probably have been ideal, had I been able to get it working, but right now I'm leaning back towards carving out the files directory, hosting that on NFS, and putting together an lsyncd or rsync option to keep the codebase up to date.
Did you ever get CacheFS
Did you ever get CacheFS working? What issues did you have?
We have a huge (1.5TB) files directory (of small files) and currently run it over NFS. But I'm wondering if something like CacheFS might help us too.
I've also been hearing that Samba might be a better choice than NFS: http://serverfault.com/questions/372151/nas-performance-nfs-vs-samba-vs-...
Which FS depends on your use
Which FS depends on your use case. Is it going to be read-heavy or write-heavy?
I use GlusterFS in a load balanced read-heavy web environment and it works really well for me. If you expect to be write-heavy, then be aware that GlusterFS has bad performance with many small file changes. They don't support write aggregation yet. Rsync can have bad performance with GlusterFS because of this.
I don't like Gluster. (But,
I don't like Gluster. (But I'm happy to be told I'm missing the point.)
It (to my knowledge) requires 100% copies on each webhead/brick.
Which is fine if you have a few hundred MB of stuff, but when you have 2TB+ it makes each webhead very expensive and a huge waste of resources.
Not to mention that, when you're at your highest load and need to scale horizontally by spinning up a new webhead, you have to sync your entire files folder, over the already stressed network, from already stressed webheads. I've seen this kill sites and servers; it's not pretty.
The only way I can see Gluster making any kind of sense is:
a) small volumes of read-heavy data
b) as a replication/redundancy filesystem which is then still mounted via NFS by the consumers, and just uses Gluster to keep 2 or 3 redundant filesystems in sync.
And (b) still has the NFS/CacheFS/Samba question.
Gluster does not require all the data on each disk
Gluster does not require all the data to be on one node. When you run the "gluster volume create" command, you can give it "replicate" (all the data on every node), or "stripe" or "distribute", which are various ways of placing some of the data on every node (and maybe each block on multiple nodes as well, depending on how you set it up).
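As a sketch, the alternatives look like this at volume-creation time (hostnames, brick paths, and the volume name are placeholders; the two `create` commands are alternatives, not a sequence):

```shell
# Replicate: every file exists on both nodes (a full copy per node)
gluster volume create wwwvol replica 2 \
    gluster1:/export/brick1 gluster2:/export/brick1

# Distribute: files are spread across the nodes (no redundancy)
gluster volume create wwwvol \
    gluster1:/export/brick1 gluster2:/export/brick1

# With 4 bricks and "replica 2" you get a distributed-replicated volume:
# data spread across two mirrored pairs.
gluster volume start wwwvol
```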
In theory that's correct
In theory that's correct: http://gluster.org/community/documentation/index.php/GlusterFS_Concepts
But.
Stripe is meant for large files.
Distribute will fail if a single node goes down.
Replicate has the issues i described.
I know Gluster is the flavor of the month, but it just doesn't seem (to me) to be cut out for this (specifically, our) use case. I want to be proven wrong, so if anyone has a link to an example setup that sidesteps these issues, I would love to read it.
Which is why the likes of Pantheon and Aberdeen have gone out and built their own FS solutions to the problem. Just a shame they didn't open source them!
So, back to my original question. Has anyone used CacheFS to speed up a NAS connection from webheads... and has anyone seen, in production, whether Samba is faster than NFS?
GlusterFS has been popular in
GlusterFS has been popular in this type of environment for a while now. Is it a panacea? No. A big reason companies like Pantheon didn't go with GlusterFS is that their use case required supporting many clusters of applications servers in a huge scaling environment. GlusterFS is awesome when you are running one (or a few) clusters of application servers.
GlusterFS doesn't require you to replicate or stripe data across your application servers. You can just run the GlusterFS client on the application servers:
http://gluster.org/community/documentation/index.php/Gluster_3.2:Accessing_Data-_Setting_Up_GlusterFS_Client
You mention 1.5TB of small files but didn't mention access patterns. CacheFS is great when you have a read-heavy environment that is not very random-access. If each file gets hit infrequently, any cache is worthless and only adds overhead. Additionally, you may be running Varnish in front of everything, so files can also be cached at that layer.
I could easily see GlusterFS used in your situation. You could have a dual GlusterFS server replication config for redundancy. Your application servers could access this via GlusterFS client, NFS, or SMB.
Without further information it's hard to point you in the right direction.
Replicate AND distribute
I think they replicate and distribute, like one of the higher-order RAIDs.
http://community.gluster.org/a/how-are-distribution-and-replication-rela...
However, I have not built a gluster that complex myself, let alone run a big Drupal site on it.
As other comments have noted, your actual access patterns and read / write ratios can matter a lot in setting this up.
Capistrano
Have you taken a look at something like Capistrano? I use it in my production environment to push code to multiple load balanced apache servers. It will pull from something like a git repository, setup any symlinks you need for file directories, and then update the 'current' symlink across all the servers at the same time. It works flawlessly for me. The file directories still need to be on a shared FS like NFS, but at least all the code and regular static content is on a local fast FS on each server.
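The pattern Capistrano automates is roughly the following (a hand-rolled sketch only; the directory layout mimics Capistrano's conventions, and the repo URL and paths are placeholders):

```shell
# Each deploy checks out into a timestamped release dir,
# then flips a single "current" symlink that Apache serves from.
DEPLOY=/var/www/site
RELEASE="$DEPLOY/releases/$(date +%Y%m%d%H%M%S)"

git clone --depth 1 https://example.com/site.git "$RELEASE"
ln -sfn /mnt/nfs/files "$RELEASE/sites/default/files"  # shared files dir on NFS
ln -sfn "$RELEASE" "$DEPLOY/current"                   # near-atomic switchover
```

Rolling back is then just pointing "current" at the previous release directory.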
We're using Aegir
I've heard lots of good stuff about Capistrano. We're using Aegir though, which handles a bunch of the same tasks.
I finally got cachefilesd up and running (I needed to enable SELinux), but performance was actually worse than straight NFS. I'm guessing that since the issue was mainly the overhead involved in loading dozens of files, and not the file transfer times, caching actually made the problem worse by adding overhead with no major gain in transfer times, since the files were so small.
I'm currently rolling an Aegir-specific solution to split out the code using symlinks to local files, maintain that via rsync, and only share configs and user-generated content (files, etc.) out via NFS. Initial benchmarking shows major performance improvements.
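That split can be sketched roughly as follows: code lives on local disk and is kept current with rsync, while only the files directory is a symlink into the NFS mount. (Demonstrated under /tmp here so it runs unprivileged; in practice CODE would be under /var/www and SHARED would be an actual NFS mount point.)

```shell
# Sketch of the local-code / shared-files layout -- paths are hypothetical
CODE=/tmp/demo/var/www/drupal
SHARED=/tmp/demo/mnt/nfs/site1/files

mkdir -p "$CODE/sites/default" "$SHARED"
# Code is synced to local disk, e.g.:
#   rsync -a --delete master:/var/www/drupal/ "$CODE/"
# Only the files directory points at the shared storage:
ln -sfn "$SHARED" "$CODE/sites/default/files"

ls -ld "$CODE/sites/default/files"
```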