Drupal suffers from database scalability issues due to the fact that reads and writes to the database are executed on the same server(s). In custom applications, database reads and writes may be separated to different database servers.
If you are going to make the move from one database server to two database servers, should you buy more hardware, or configure each database to be optimized for the kind of load (write, read, back-up) it has?
For example, MyISAM does table locks during writes which causes delays for reads from the same tables. One option is to use InnoDB tables for high volume write tables. This should reduce the waits.
Buy more hardware, or patch Drupal to separate reads and writes?
Cheers,
Kieran

Comments
patch and coordinate with others...
Buying new hardware benefits one site - patching benefits all sites. Hopefully you could find others interested in creating the patch and getting it applied to core - or would this be something that couldn't coexist with core?
Greg
--
Knaddison Family | mmm Beta Burritos
knaddison blog | Morris Animal Foundation
This would have to be a patch
For 5.0 and earlier this would have to be patch. In effect you could have to check every query and see if it is a select (read) or a insert, update, delete (write). I don't think would be necessarily generally useful. It could also be very expensive processor wise. For sites like Drupal.org that have a lot of writes it would be useful, but I suspect the majority of sites have a very high read to write ratio.
I have been thinking about some further optimizations for the databases. On the read database, you could tune the database to have a very large MySQL query cache to store as many reads as possible from memory. On the write database you could reduce the query cache and allocate memory to other buffers which would support writes.
Cheers,
Kieran
To seek, to strive, to find, and not to yield
< a href="http://www.youtube.com/watch?v=COg-orloxlY">Support the Drupal installer, Install profiles, and module install forms
<a href="http://ia310107.us.archive.org/1/items/organicgroups_og2list/dru
There could be a simple
There could be a simple configuration options in the settings.php file that allowed you to configure multiple database servers such as master-slave or master-slave * n or master-master. Then create a new database.mysql-cluster.inc and do all of the hard work in here. Then the database abstraction remains and this can ship with the default drupal distribution.
Any thoughts of why this wouldn't work?
Steve
Slantview Media http://www.slantviewmedia.com/ | Blog http://www.slantview.com/
Yeah
No part of Drupal distinguishes between READ and WRITE queries. It's a non-negotiable prerequesite for the kind of 'shared nothing' setup we're looking for here.
Until we go through all of core and replace every db_query* call with db_read(), db_write() (and optionally db_replace() ), the "real" way of doing this eludes us.
That "real" way is to allow READs to come from slave DB instances while WRITEs go to a master, or insert your own preferred clustering method here.
reading and writing
you don't need to be able to distinguish at that level. a simple preg_match or better yet stristr($SQL,"SELECT") and you would know what each call was and be able to split them up accordingly.
If you did it your way it's not just the "core" you would need to go through, it is every module that you use. Not that this would actually be that hard. For a serious site that needs database clustering, you likely have the staff already to do this and could probably get this done in a day, maybe two.
But that would then break upgradability. So it would not be very good. The way described above would be faster and be more scalable for widespread usage.
There are probably a dozen or so ways I can think of to do this, but the thing is that you would still want to use the write db for reads initially since both database servers will still be getting an equal amount of writes. Every slave server still has to receive that write command (insert,update) during it's replication. So just adding a single slave server in the replication chain may not give you the complete performance increase you are looking for.
Slantview Media http://www.slantviewmedia.com/ | Blog http://www.slantview.com/
No, again.
Ya know, I could have sworn you just posted a comment (INSERT) which contained the word 'SELECT' in it...
I beg to differ...
Any INSERT is going to have to be done on the master server regardless of a sub-select or if the content contained has certain key words. All key words have to be quoted in order to not break the SQL syntax. If the SQL command has the word INSERT or UPDATE in it, that is routed to master servers. So, again, this can be accomplished with regular expressions. The logic to differentiate SQL code is not difficult although i was definitely over simplifying with just doing stristr command. It was not meant as an actual finished solution, rather as an idea to spur conversation to come up with a complete solution. The SQL language is not so complicated that there could not be logic written to determine what actions are being performed; thus, i still stand by my answer that the entire core and every contributed module would not need to be rewritten in order to accomplish what is being described here.
As a matter of fact, we are going to be writing it this way for a very large site we are converting to Drupal as the current traffic levels for the site are > 30m hits per month with expectations to grow to over 100m per month. I will definitely be documenting what we do in order to contribute back to the community with the solutions that we come up with. Hopefully, other people who have a vested interest in these technologies will also help to give options for extremely large sites that wish to use Drupal as their framework and cms.
The other thing to look at is mysql clustering. With a cluster you would not need to distinguish reads and writes, as you could just load balance between servers for all SQL commands. There are obviously some issues with this method as mysql clustering is still considered beta type software which may turn some people off.
Slantview Media http://www.slantviewmedia.com/ | Blog http://www.slantview.com/
I am a bit concerned that
I am a bit concerned that the neccessary syncing of master and slave creates more overhead than it is worth. it would be helpful if people with experience in master/slave setups could add to this discussion.
Setting up master-slave experience
Hi, we have been setting up master-master and master-slave MySQL configurations over the last few months.
We do it primarily for replication and HA and not for load. One thing we did find was that backing up lots of data means accessing every row in the database. This is particularly burdensome when MySQL tries to load everything into memory as part of accessing all the data. This can causing swapping when all the memory for the database is used. By using a slave to do replication it helps reduce the load on the live server.
If you decide you want to implement this we can share detailed documentation on how we set this up.
I have also been talking to a MySQL admin at one of the largest hosting providers and he recommended that the first thing they do is move applications to a master (write) , slave (read), and slave(back-up) configuration. We have have discussed implementing this fix for Drupal, but haven't implemented it yet.
If you want live examples Jeremy Zawodny's blog has lots of good examples.
Cheers,
Kieran
To seek, to strive, to find, and not to yield
< a href="http://www.youtube.com/watch?v=COg-orloxlY">Support the Drupal installer, Install profiles, and module install forms
<a href="http://ia310107.us.archive.org/1/items/organicgroups_og2list/dru
It's worth it
It's worth the trouble if you're mostly querying the database instead of updating it. It's also lower overhead to have a master and slave than a multi-master setup.
initial patch
Perhaps you already knew about this, but here is an old patch that I found that I believe achieves this:
http://cvs.drupal.org/viewcvs/drupal/contributions/sandbox/crackerjm/rep...
It probably isn't working anymore (or even very close) but maybe it would be useful.
Greg
--
Knaddison Family | mmm Beta Burritos
knaddison blog | Morris Animal Foundation
If i may
This is a 5.x speciffic code. i apologize in advance as i am not familiar with rolling out patches, i am very sorry. but i hope this bit of code can be helpful for the discussion.
<?php
//database.inc
function db_query($query) {
$args = func_get_args();
array_shift($args);
$query = db_prefix_tables($query);
if (isset($args[0]) and is_array($args[0])) { // 'All arguments in one array' syntax
$args = $args[0];
}
_db_query_callback($args, TRUE);
$query = preg_replace_callback(DB_QUERY_REGEXP, '_db_query_callback', $query);
//load balancing
if(strpos(strtolower($_GET['q']),"admin") !== false)
db_set_active('write'); //its important that all admin gets access to the most recent data
else
if(strpos(strtolower($query),"select") === 0){
db_set_active('read'); //this will not contain any data from the master (write) database untill replication happens
}
else {
db_set_active('write');
}
return _db_query($query);
}
//sites/default/settings.php
//$db_url = 'mysql://username:password@localhost/databasename';
$db_url = array(
'default' => 'mysql://username:password@localhost/databasename',
'read' => 'mysql://username:password@localhost/databasename',
'write' =>'mysql://username:password@localhost/databasename'
);
?>
also if i may be so bold as to say drupal should not be responsible in managing slaves and masters on larger scale than this. if certain setup requires multiple slaves then its the network administrator's job to install load balancer. then the load balancer's address will be filled into one of the above 'read','write', or 'default' fields.
if you rely on code such as the comment http://groups.drupal.org/node/2147#comment-5917 mentions then you are not doing much different than the current approach in 5.x, because drupal might return the same slave all the time which will achieve you nothing. only hardware load ballancer can really know which slave should be routed to during the request.
also this is just a proof of concept code. it shows that breaking select and other queries and sending them to different hosts is possible!
now this opens a few problems, for example: When user logs in to his site, that login session will be written in the "write" host, but immedialy on the next page refresh his session data will be read from the "read" host, so its like the user never logged in, for this issue to be resolved replication will have to take place instantly which defeats the purpose of this code.
I do not believe that current drupal infrastructure can be adapted to a master-slave database approach without MAJOR rewrite of some core functionality, a tip of an iceberg would be what i mentioned in the example.
------------------
Sometimes interesting things appears on http://litwol.com
create temporary table
Hi,
you should add a detection for "CREATE TEMPORARY TABLE" in your code too, because temporary tables will be used for search (at least in 5.3) but temporary tables will not be replicated on slave. it is save to create this tables on slave servers, because they are only visible in "connection scope".