Hello
We have a busy site currently running Pressflow 6.15 across 4 web servers pointing to a single Master/Slave DB setup.
One of the web servers is the CMS server that web publishers use to update the site. The other 3 servers are public facing and server the corporate website.
The public site is anonymous in terms of Drupal logins, although a few pages need to connect to a 3rd party site via SOAP, and so we need to maintain some basic session information.
So, given that the site is 95% 'read only' and can be served by Boost/Varnish I wanted to explore the idea of placing slave databases on each of the public facing web servers (so PHP connects to the DB via localhost), and set the database replication to ignore the sessions table, and we'd then just use a single instance of memcache to look after session management (and even other cache tables). Or, I could just allow sessions to be saved individually on each server's slave DB instance, since the load balancer always keeps users on the same box - and even if it didn't, or a box went down, the loss of session is not a major issue.
The CMS server that the web publishing team uses would be connected to the Master database which, in turn,would replicate all changes to web servers A,B,C except session data. Drupal running on web servers A, B and C would connect locally to their 'own' instances of the slave DB.
This is just an initial thought and needs a lot of work, but I just wanted to get it out there and see if anyone has achieved something similar?
Thanks, James
Comments
We do this...
We setup high traffic Drupal sites with a read only slave on each web server and it works really well. The problem is that Pressflow 6 does not make alot of use of the read only slave, that is supposed to change in Pressflow 7. One of our sites was doing 13,000 queries per second against on of the master servers on their big traffic day and the read only slaves got less than 5% of that. So, your mileage may vary, it's going to depend a lot on the type of queries your site is utilizing. I have not gone code diving to see what queries pressflow sends to the master and which ones go to the slaves, so I can't help you much there. Based on some very non-scientific guessing, it looks like simple selects go to read only, but any queries with more complexity than that get sent back to the master.
1 major advantage we have found to this setup is the redundancy, if your master db server breaks down you have a local copy of the data you can use, it means locking out any changes on your site and not having the same experience, but at least the site will be up while you fix your master server(s).
--
www.neospire.net
Thanks Our plan was to
Thanks
Our plan was to actually tell the webservers to treat the 'local slave' DB as if it were the master i.e. they would know nothing about the real master database, that only the publishing team's CMS is pointing to.
In this scenario, 100% of queries on each of the three web server would got to the slaves databases running on localhost on each server. Very few - if any - writes would be made to the databases running locally on the web servers. These databases are treated as practically 'read only' DBs, purely for the generation of content. Customers never submit data that is stored in the DB, it's all anonymous.
100% of queries on the CMS server would go to the master.
The master would then replicate the new data to the three slaves, but would ignore certain tables during the replication process (sessions, for example) so that only only updates that affect the website layout/content would be pushed to the slaves.
it's purely anonymous traffic hitting the three web servers except for a tiny bit of session management that is needed from a third party SOAP API, but this session info could easily be held in a central memcache bin, to which all three web servers point.
Just wondering if anyone has tried this at all?
Thanks, James
Questionable
This is not a common architecture or solution. Tread cautiously; while it's possible to do this (anything's possible) it's not something Drupal supports very well.
To wit, Drupal will want to make write queries even in a read-only type application (e.g. updates to sesssion, user, history tables). You'll need to code around all those, and who knows what might be going on in contrib space.
You can maybe solve this via creative database prefixing, or by creating your own DB driver, or even by having the mysql user access for the read only DBs be "select" only, but be prepared to deal with a lot of errors and wonky side-effects as you try to do things that are supposed to be "easy".
In general, if all your traffic is anonymous, Varnish should offer a lot of bang for your buck without the potential for odd side-effects.
https://pantheon.io | http://www.chapterthree.com | https://www.outlandishjosh.com
Varnish (or other whole-page cache)
I see you're already considering this. Unless you think you're going to have a lot of traffic that's spread extremely wide/shallow (e.g. 1M requests over 100,000 unique urls), a static caching system is what to focus on. This will be orders of magnitude faster than anything you'll get with Slave DBs.
Slave DBs theoretically come in most handy when you have a lot of sorting/filtering/drilling-down in your application. If that's the case, you might want to explore the potential of building that part of your app around Solr, which can be scaled/culstered more easily than MySQL.
If you're primarily in the businesses of publishing pages, focus on caching the whole pages. :)
https://pantheon.io | http://www.chapterthree.com | https://www.outlandishjosh.com
Josh is correct...
Based on your further description, joshk is correct, you really should concentrate more on your cache and just optimize your MySQL master to handle the load needed. If I understand the scenario you are describing correctly then you could run into some problems. If you do write to the same tables on the local db that the CMS would be pushing to you have a real potential for running into conflicts and your replication just falling over. To make that work properly you would need to run circular replication between all the "slaves" and include the master in that replication. This would be a scary scenario to maintain in an environment with any size to it, a replication failure in that type of setup could cascade and you could have all of your servers writing un-synchronized data that you will then have to help them catch up on.
--
www.neospire.net