Multi-colo site architecture options / example sites

ctoomey's picture

Hi,

I'm an architect working on a project to rebuild our current consumer site, possibly using Drupal. I'm new to Drupal and learning as much as I can as fast as I can. We currently serve all users of our site out of a single colocation facility (colo), but in the near future we are planning to add colos in other geographic locations and serve the site concurrently from all of them, with users being routed to the nearest colo.

I'm wondering if and how folks have architected Drupal websites that are hosted in multiple colos, e.g., east, central, and west US. Do you guys 1) know of or work on any such sites, and 2) if so, know what database architecture they're using?

It seems like there are only 2 options for how to handle multi-colo Drupal database writes: 1) have the writes in each colo go to a local master DB, and use multi-master replication across the N colos, and 2) have all the writes go to a master DB at a single colo, and replicate to read slaves in each colo.

Are Drupal core and most modules written such that a multi-master setup would be feasible and not result in database inconsistencies from concurrent writes to different masters, e.g., from auto-incrementing fields and such? What about when external caching (memcache) is enabled -- is the assumption generally that all DB writes happen through the Drupal stack and thus flush or update caches on write, which would be violated if the writes happen in a remote colo and are only replicated locally at the database level (not through the Drupal stack)?
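To make the auto-increment concern concrete, here is a rough Python simulation (not Drupal or MySQL code). It assumes each master hands out IDs independently, and it models the usual mitigation of giving every master the same step and a different starting offset, which is what MySQL's `auto_increment_increment` and `auto_increment_offset` server variables are for. The colo names and helper functions are made up for illustration.

```python
from itertools import islice


def id_stream(offset, increment):
    """Yield the IDs one master would assign: offset, offset + increment, ..."""
    next_id = offset
    while True:
        yield next_id
        next_id += increment


def first_ids(offset, increment, count):
    """Return the first `count` IDs from a master with the given settings."""
    return list(islice(id_stream(offset, increment), count))


# Naive setup: two masters both start at 1 and step by 1, so every ID collides.
naive_east = first_ids(1, 1, 5)
naive_west = first_ids(1, 1, 5)
naive_collisions = set(naive_east) & set(naive_west)

# Interleaved setup: N masters share the step N and use offsets 1..N.
COLOS = ["east", "central", "west"]  # hypothetical colo names
interleaved = {
    name: first_ids(offset, len(COLOS), 5)
    for offset, name in enumerate(COLOS, start=1)
}
all_ids = [i for ids in interleaved.values() for i in ids]
no_collisions = len(all_ids) == len(set(all_ids))

print("naive collisions:", sorted(naive_collisions))
print("interleaved:", interleaved)
print("collision-free:", no_collisions)
```

This only removes collisions on generated keys. It does nothing for two masters updating the same row at the same time, or for caches that are not told about writes replicated in from another colo.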

Option 2), writing to a single colo, has major limitations in terms of latency and scalability so doesn't seem like much of an alternative either.

Comments / thoughts?

thanks,
Chris

Comments

Implementations vary because

vividgates's picture

Implementations vary because of the modules that people use, in terms of code.

Some links that may help:

http://groups.drupal.org/node/27820
http://drupal.org/node/22754
http://drupal.org/node/469274

Sharding for distributing load, but not for small use cases: http://axonflux.com/mysql-sharding-for-5-billion-p

Multi-colo Drupal site examples?

ctoomey's picture

Thanks for the links. I've looked through them but found nothing about multi-master setups.

Do you or anyone else here work on or know of any Drupal sites using multi-master setups, or where else I should post besides this group and the database group?

thanks,
Chris

One way to handle this is

jmccaffrey's picture

One way to handle this is with a CDN. If your site has lots of static content and/or anonymous users, locally targeted edge caches will be smarter than decentralizing your DB. Have you explored this option yet?

CDN suitable for publishing

vividgates's picture

A CDN is suitable for publishing sites, such as news and magazine sites with a lot of static pages to distribute, but I think what's being asked about is also a write-intensive setup, because it is for a consumer site (ecommerce?). Sharding was my first suggestion because of the replication issues with multi-master setups, but that reason alone doesn't seem to satisfy.

I know MySQL Cluster is out there, but do you really need all this stuff, and why would this match your use-case?

It's sort of like designing rotating gears, where you can round-robin multi-master and/or sharded setups at each Colo, and the replicas can be sent across the colos in a round robin fashion as well. But the similarity ends because with coding you can also address fallback databases for HA.

Draw some diagrams to resolve the architectural issue and get a mental picture of what you need to do. I'm not aware of any example code for multi-master setups.

Cheers.

Paul

Any known multi-colo Drupal sites?

ctoomey's picture

I don't yet have a specific site issue I'm trying to solve, I'm just interested in knowing how people are scaling Drupal sites beyond a single colo.

Are there any multi-colo Drupal sites? If so, what's the write intensity and the DB setup for them?

thx,
Chris

To put this in perspective

dalin's picture

To put this in perspective, the Grammys site, which has peak traffic of tens of millions of page views in one weekend (listen to the Lullabot podcast to corroborate my numbers), is all served from one location (plus Akamai). So it can be done even at scale without issue.

--
Dave Hansen-Lange
Technical Lead
Advomatic LLC
Great White North Office
Canada

Even better

jcisio's picture

There is a Drupal site that has as many pageviews as that site (if "tens of" = "20"), and it uses only ONE server, without any CDN. So multi-site hosting is far beyond what most sites really need.

If you want to run something

soyarma's picture

If you want to run something like master-master MySQL DB replication, you just need to be sure that the pipe between the two masters is fast enough and big enough for the masters not to get out of sync. However, you would need to have both colos talking to the same master and be able to fail over. Drupal code is not written to run against multiple database servers at the same time (at least not for writing; reading is definitely possible).

However, since a coast-to-coast round trip is only 70ms (if you have a good pipe and it can handle the data volume), that is all you would have to add to the time it takes to execute your DB queries. If your Drupal config is screamin and only takes a base of 70ms to load a page (definitely doable), then you are talking 140ms to deliver a page to Apache to serve.

This delay needs to be added

jason.fisher's picture

This delay needs to be added to each query executed. They would not execute in parallel, and the results of each query will block the PHP that follows...

... and it's pretty easy to end up with 250+ queries per page these days.

True, but any site that gets

soyarma's picture

True, but any site that gets that sort of load should be using memcache for most cache queries. If you use a path alias cache in memcache then you are actually down to under 20 queries per page (though that can depend a lot).

Any known multi-colo Drupal sites?

ctoomey's picture

It's great to hear that some Drupal sites are able to scale to serve lots of requests from a single colo, but I'd still like to hear if there are in fact any multi-colo Drupal sites. Does anybody in this group know of any such sites and if so, will you please post the sites and the Drupal developer contact for them if known?

There are important reasons to serve from multiple colos besides increasing scale, namely 1) being closer to the end users to reduce latency and 2) providing continued availability when a colo or its internet connectivity become temporarily unavailable.

Note that the latency increase to construct a Drupal page in a remote colo compared to the DB-hosting colo would be (interColoLatency*numberQueriesToBuildPage), not just interColoLatency, since the queries would be executed serially and not concurrently during page construction. Of course, local query caching in memcache in remote colos would alleviate that to the extent that the queries were cacheable.
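As a back-of-the-envelope check, here is that formula in Python using the figures mentioned earlier in this thread (a 70ms base page time, a 70ms coast-to-coast round trip, and 20 or 250 queries per page). It assumes every uncached query costs one full inter-colo round trip and that the base time covers everything else; the function name is just for illustration.

```python
def remote_page_time_ms(base_ms, inter_colo_rtt_ms, uncached_queries):
    """Estimate page build time in a remote colo.

    Queries run one after another, so each uncached query adds one
    full inter-colo round trip on top of the base page time.
    """
    return base_ms + inter_colo_rtt_ms * uncached_queries


BASE_MS = 70  # base page build time quoted in the thread
RTT_MS = 70   # coast-to-coast round trip quoted in the thread

for label, queries in [("1 query", 1), ("20 queries", 20), ("250 queries", 250)]:
    total = remote_page_time_ms(BASE_MS, RTT_MS, queries)
    print(f"{label}: {total} ms")
```

Under those assumptions the estimate is 140 ms for a single round trip, 1470 ms at 20 queries, and 17570 ms at 250 queries, which is why the number of queries that actually cross the link matters far more than the link latency itself.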

Tag1 has at least two clients

catch's picture

Tag1 has at least two clients (I'm not sure if I'm supposed to say which ones but I'll find out) serving content from more than one data centre.

One data centre is the 'master' serving all authenticated requests and some anonymous. The second data centre only serves anonymous traffic.

Once a request is routed, the full request is served from a single location - no read requests across data centres (MySQL is configured master/slave with the slave in read-only mode, memcache and varnish are set up/patched to read locally, but set/delete/expire to both locations).
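A minimal sketch of the "read locally, set/delete to both locations" idea, in Python with in-memory dictionaries standing in for the memcache pools. This is not the actual patch used on those sites; the class names are hypothetical, and a real setup would also need to handle a remote pool being slow or unreachable.

```python
class DictCache:
    """In-memory stand-in for one data centre's cache pool."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


class MultiSiteCache:
    """Read from the local pool only; send writes and deletes to every pool."""

    def __init__(self, local, remotes):
        self.local = local
        self.all_pools = [local] + list(remotes)

    def get(self, key):
        # Reads never cross data centres.
        return self.local.get(key)

    def set(self, key, value):
        for pool in self.all_pools:
            pool.set(key, value)

    def delete(self, key):
        for pool in self.all_pools:
            pool.delete(key)


# Two data centres, each with its own view of the same pair of pools.
dc1_pool, dc2_pool = DictCache(), DictCache()
cache_in_dc1 = MultiSiteCache(local=dc1_pool, remotes=[dc2_pool])
cache_in_dc2 = MultiSiteCache(local=dc2_pool, remotes=[dc1_pool])

cache_in_dc1.set("node:1", "rendered page")
seen_in_dc2 = cache_in_dc2.get("node:1")       # served from the local pool
cache_in_dc1.delete("node:1")
after_delete_in_dc2 = cache_in_dc2.get("node:1")
```

The point of writing to both pools is that a cache clear triggered in the master data centre also invalidates the copy in the second one, so neither location serves stale entries after an update.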

This means you can use the second data centre for HA, while not leaving it completely redundant until there's a failover. It's not the same as trying to do master-master. I've seen that discussed previously, but I'm not aware of places where it's implemented (and I agree with others that it's unlikely to be necessary for the very large majority of sites).

That sounds like a great

dalin's picture

That sounds like a great architecture. Minimizing complexity while maximizing scalability.


OK I double checked and it's

catch's picture

OK I double checked and it's fine to name at least one of the clients using this setup - Symantec Connect, there's a d.o case study at http://drupal.org/node/1061630.

Many questions

jcisio's picture

I don't know what a DB-hosting colo is, but I'm sure you don't want to put your web servers and DB servers in different data centers. I have never heard of anyone doing that, for the reason you pointed out: latency between web server and DB server.

And now you are trying to solve a problem that you don't have.

  • Latency: except for heavy AJAX applications, you don't want to reduce latency at that kind of expense. A pageview consists of a few dozen HTTP requests. Using a CDN for static content already reduces latency for more than 95% of your requests.

  • Connectivity: a DC always has multiple connections. Only a natural disaster (like a magnitude 8.9 earthquake) can cause trouble.

Well, and if you still want to solve that "problem", you need to find out the characteristics of your application, then define a strategy: is it read-intensive or write-intensive, what is the best DB replication... (read one of the latest posts on the Facebook Engineering page about read globally, write locally and vice versa).

We're talking about multi-colo sites

ctoomey's picture

and the option to have a writable DB in only one of N colos. This would be the "DB-hosting" colo, while the other N-1 colos would be "remote" colos w.r.t. the database.

This was option 2) from my original post: "It seems like there are only 2 options for how to handle multi-colo Drupal database writes: 1) have the writes in each colo go to a local master DB, and use multi-master replication across the N colos, and 2) have all the writes go to a master DB at a single colo, and replicate to read slaves in each colo.".

Okay, I can understand your

Garrett Albright's picture

Okay, I can understand your frustration that you're asking about multi-colo sites and we're not answering you about that. (And no, I don't know of any multi-colo sites or sites using multi-master databases, either. In fact, I didn't even know that the latter was possible until recently, when I heard about it in the context of "don't even bother going through the trouble of trying to do multi-master databases.")

But at the same time, I'd encourage you to consider our experiences here and consider that, in reality, unless (or perhaps even if) your site is going to truly be a monster Drupal site like the Grammys or Examiner or such, you seem to be thinking you need much more in terms of hosting resources than you really will. As I've always tried to stress here, throwing more hardware at a problem should not be your first attempt at a solution.

Start small. Once you start hitting your limits, look for places where you can optimize. Only after your site has become so prolific that there's no longer a cost benefit to further optimization should you start worrying about colocated DBs and such.

Need to understand Drupal's scalability story

ctoomey's picture

Thanks Garrett. Here's some more context on why I'm asking about this. We've got an existing, successful site (www.coupons.com) that already gets a lot of traffic and that we hope and expect will continue to grow traffic significantly going forward. It's currently running on a Windows stack using .NET, but we're rewriting it for LAMP and hoping to use Drupal for content management, theming, etc.

Regardless of whether we use Drupal or not, we're planning to expand our hosting from a single colo as we are today to multiple colos around the U.S. and world. Hence we need to know now if and how we can make Drupal work well in that kind of deployment and so I was hoping to hear from others here about their experiences doing this.

After digesting the comments here and thinking more about it, I think an active/active multi-master setup shouldn't be necessary and instead we can get by with all writes (and non-slave-safe reads-after-writes) going to a single master with all slave-safe reads hitting local caches or slaves, with failover to the secondary master (in another colo) for availability.
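A rough Python sketch of that routing rule, for illustration only. It is not Drupal's database layer; the class, the `slave_safe` flag, and the hostnames are all hypothetical. Writes and reads that must see a just-completed write go to the active master, everything else goes to the slave in the local colo, and failover simply promotes the secondary master.

```python
class QueryRouter:
    """Pick a database host for each query under a single-active-master plan."""

    def __init__(self, primary_master, secondary_master, local_slave):
        self.secondary_master = secondary_master
        self.local_slave = local_slave
        self.active_master = primary_master

    def route(self, is_write, slave_safe=True):
        # Writes, and reads that cannot tolerate replication lag,
        # must go to whichever master is currently active.
        if is_write or not slave_safe:
            return self.active_master
        # Slave-safe reads stay inside the local colo.
        return self.local_slave

    def fail_over(self):
        # Promote the secondary master (in another colo) for availability.
        self.active_master = self.secondary_master


# Hypothetical hostnames for a router running in the west colo.
router = QueryRouter(
    primary_master="db-master.east.example.com",
    secondary_master="db-master.west.example.com",
    local_slave="db-slave.west.example.com",
)

read_host = router.route(is_write=False)
write_host = router.route(is_write=True)
read_after_write_host = router.route(is_write=False, slave_safe=False)

router.fail_over()
write_host_after_failover = router.route(is_write=True)
```

The hard part this sketch skips is deciding which reads are actually slave-safe, which depends on the modules in use and on how much replication lag the site can tolerate.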

If / when the single master setup becomes too detrimental to our latency or scaling, we can look at sharding via MySQL cluster or similar.

Multi-master / master-master

exlin's picture

A multi-master / master-master configuration of MySQL is what I have seen most. If you are worried about the load that queries are causing, you should also consider memcached, which greatly reduces load on the DB.

Definitely memcache for reads

ctoomey's picture

I was talking about scaling for writes.

Multi-master

mbutcher's picture

We've been working for several months on multi-master setups across datacenters. It is far from easy, and we've done some substantial hacking already. We do get the traffic to justify multiple datacenters, but we also have the developer resources to take on a daunting project.

As it stands now, we're doing all writes to one master, which replicates to its slaves, and to another "master" with its own set of slaves. But we don't have the downstream master replicating back up, and we probably never will. We do (or should) have the ability to swap the relation between datacenters if need be. In that case, we will swap the relationship between the two masters, and all writes will go to the second master.

Latency is a HUGE issue for our setup, even over a very fast link.

Using D6 or D7 / what sorts of hacking needed?

ctoomey's picture

Matt, are you running D6 or D7 or Pressflow6, and what areas have you had to hack -- contributed modules so that they properly support master / slave queries, or something in core (what?)?

And your configuration has 2 masters, each of which has its own slaves and which also replicate to each other, but only 1 of the 2 masters ever takes writes? I.e., it's dual-master but with an active and a passive master vs. 2 active masters?

2) providing continued

dalin's picture

2) providing continued availability when a colo or its internet connectivity become temporarily unavailable.

We've done this with fail-over, but not two active datacentres.


We have a couple clients who

highermath's picture

We have a couple clients who do this. One is master-slave, which is fine for them as their sites are entirely anon. for regular users. The other uses a commercial middleware product that provides replicated data management and caching services on top of a scalable clustering protocol. The solution is very effective (and expensive). Unfortunately, I can't give details, but it isn't hard to find.
