Database Scalability
This group is an attempt to get a serious discussion started around database scalability for Drupal sites.
Right now, Drupal relies mostly on direct connections to MySQL to generate Web pages. There is an internal caching mechanism that can help alleviate load, but it introduces some problems of its own (such as the user login problem, where users have to reload a page after logging in). Typically, MySQL optimization and hardware become the primary areas of focus once Drupal has reached its caching limits. While there are not many massively trafficked Drupal sites around yet (massive meaning more than 100k hits an hour, or roughly 28 requests per second on average), they are coming, and it would be great to have some defined paths people could follow to build up their sites.
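To make the page-cache trade-off concrete, here is a minimal Python sketch of a generic full-page cache that serves one shared copy per URL to anonymous visitors and always builds a fresh page for logged-in users. It is not Drupal's code: the `PageCache` class, the `render` function, and identifying a logged-in user by a plain name are all hypothetical.

```python
from typing import Callable, Dict, Optional


class PageCache:
    """Toy full-page cache keyed by URL path (hypothetical; not Drupal's code)."""

    def __init__(self, render: Callable[[str, Optional[str]], str]) -> None:
        self._render = render             # builds a page, e.g. by querying the database
        self._pages: Dict[str, str] = {}  # path -> cached anonymous HTML
        self.renders = 0                  # how many pages were built from scratch

    def get(self, path: str, user: Optional[str] = None) -> str:
        # A logged-in user always gets a freshly built page,
        # so they never see the shared anonymous copy.
        if user is not None:
            self.renders += 1
            return self._render(path, user)
        # Anonymous visitors share one cached copy per path; only the first
        # request for a path pays the cost of building it.
        if path not in self._pages:
            self.renders += 1
            self._pages[path] = self._render(path, None)
        return self._pages[path]


def render(path: str, user: Optional[str]) -> str:
    # Hypothetical stand-in for the expensive, database-backed page build.
    return f"<h1>{path}</h1> viewed by {user or 'anonymous'}"


cache = PageCache(render)
cache.get("/node/1")                  # built and cached
cache.get("/node/1")                  # served from the cache
page = cache.get("/node/1", "alice")  # logged in: bypasses the cache
print(cache.renders)                  # 2
```

The cache only pays off for anonymous traffic, so a site whose visitors are mostly logged in gets little relief from a cache like this and ends up back at the database.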
I think about scalability all the time, and find it a fairly intimidating topic to begin with. Most shops do not have the hardware resources to build a server farm and test various configurations of the stock distribution, or of the optimized versions floating around among back-alley Drupal hackers. For that matter, most shops do not work on anything that could qualify as a high-volume Web site, so rarity is working against this project from the start. I could share some horror stories about what happens when a site gets Slashdotted or MoveOn'ed, but the basic lesson is simple: it is better to be prepared than to be left wondering what will happen the next time there is a big spike.
So, to begin with, there are 3 basic levels at which scalability work can occur -
1) Application - refers to the way Drupal accesses data. At the time of this writing, several projects are dedicated to controlling how Drupal accesses data in MySQL (I am unsure about PostgreSQL).
2) Database - refers to how databases serving Drupal are configured and tweaked.
3) Hardware - refers to the machines hosting Drupal sites, their optimal configurations, and anything that can make the service easier to use.
There are a couple of things I am particularly interested in that others may want to look at.
1) memcached - http://www.danga.com/memcached/ - useful for taking load off the database. It runs as a separate in-memory caching daemon alongside MySQL, and the application checks it through its own set of functions before falling back to the database. It has a good reputation over at LiveJournal as a way to kill database issues altogether, but it would take a fair amount of coding to get it into Drupal. It affects both the application and database levels, in that it requires rewriting parts of Drupal and installing the daemon on the database server.
2) SSL Authentication - about 60% of the load on a high-traffic Web site I currently host comes from SSL processing. Offloading that processing to a server separate from the Web server can reduce the load significantly; in the ideal case, removing all of that 60% would let the same server handle about 2.5 times the traffic (1 / 0.4). I don't know the actual amount in practice, but at my last job I had an Intel product designed specifically for this, and it worked beautifully.
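The usual way memcached is put in front of a database is the cache-aside pattern: check the cache first, and only on a miss run the query and store the result for later requests. The Python sketch below assumes a hypothetical in-process `FakeMemcache` class in place of a real client (a real memcached daemon is a separate network service; only its get/set/delete operations are mirrored here), and the `node:1` key and `load_node_1` query are made up for illustration.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class FakeMemcache:
    """In-process stand-in for a memcached client (hypothetical)."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expiry; 0 = never)

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at and time.monotonic() >= expires_at:
            del self._data[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl: float = 0) -> None:
        expires_at = time.monotonic() + ttl if ttl else 0.0
        self._data[key] = (value, expires_at)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


def cached_query(cache: FakeMemcache, key: str, query: Callable[[], Any], ttl: float = 300) -> Any:
    """Cache-aside read: try the cache first, fall back to the database on a miss."""
    value = cache.get(key)
    if value is None:          # miss (note: a None result is never cached)
        value = query()        # the expensive MySQL query
        cache.set(key, value, ttl)
    return value


db_queries = 0


def load_node_1() -> Dict[str, Any]:
    # Hypothetical stand-in for a SELECT against the node table.
    global db_queries
    db_queries += 1
    return {"nid": 1, "title": "Hello"}


cache = FakeMemcache()
cached_query(cache, "node:1", load_node_1)  # miss: hits the database
cached_query(cache, "node:1", load_node_1)  # hit: served from memory
cache.delete("node:1")                      # invalidate after the node is edited
cached_query(cache, "node:1", load_node_1)  # miss again: reloads from the database
print(db_queries)                           # 2
```

Most of the coding effort in a real integration goes into the `delete` side: every place that writes data has to invalidate the matching cache keys, or readers will see stale results until the TTL runs out.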
Anyway, I am hoping a lot of people are interested in upsizing.
M



Row-level locking vs. table locking
On a site we deployed, which got a fair number of hits but not a huge amount, the sessions table was getting corrupted quite often. We decided that converting the sessions table to InnoDB was worthwhile, since the sessions table is accessed twice during every page load (one read, one write), and InnoDB's row-level locking avoids locking up the whole table.
This fixed the corruption problem; we did not really get to the speedup issues :)
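The difference between the two locking schemes can be modeled with a toy lock manager in Python. This is a hypothetical sketch, not how MySQL or InnoDB actually implement locks: under table locking every row of `sessions` maps to one lock, so a second request writing a different session row has to wait; under row locking each row gets its own lock and it does not. The `LockManager` class and the `sid-a`/`sid-b` session IDs are made up for illustration.

```python
import threading
from collections import defaultdict
from typing import Hashable


class LockManager:
    """Toy model of lock granularity (hypothetical; not MySQL's implementation)."""

    def __init__(self, granularity: str) -> None:
        if granularity not in ("table", "row"):
            raise ValueError("granularity must be 'table' or 'row'")
        self.granularity = granularity           # "table" ~ MyISAM, "row" ~ InnoDB
        self._locks = defaultdict(threading.Lock)  # lock key -> lock object
        self._guard = threading.Lock()           # protects the lock table itself

    def lock_for(self, table: str, row_id: Hashable):
        # Table locking maps every row of a table to the same lock;
        # row locking gives each row its own lock.
        key = (table,) if self.granularity == "table" else (table, row_id)
        with self._guard:
            return self._locks[key]


def second_writer_blocked(granularity: str) -> bool:
    """Must a write to session B wait while session A's row is being written?"""
    mgr = LockManager(granularity)
    first = mgr.lock_for("sessions", "sid-a")
    first.acquire()  # request A is mid-write
    try:
        second = mgr.lock_for("sessions", "sid-b")
        acquired = second.acquire(blocking=False)  # request B tries its own row
        if acquired:
            second.release()
        return not acquired
    finally:
        first.release()


print(second_writer_blocked("table"))  # True: B waits for A
print(second_writer_blocked("row"))    # False: B proceeds
```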
Another group?
There is already a high performance group. Do we need a 'database scalability' group? I've yet to see anything interesting come out of the high performance group, though.
High performance group...
High performance includes database performance. Let's centralize there and shut down this group. Sound good?
I like multiple groups
I like multiple groups as they really focus on detailed aspects of performance. There is a difference between optimizing application performance at the code level and optimizing database performance at the table definition and server configuration level. Getting groups together around really specific parts of performance is probably going to accomplish more than a mashup of issues from all different levels.
That said, I see where all the users are going, and will simply ask my objections be noted for the... uh... record.
M