I have been searching high and low for the best advice on stack setup for a site I want to release in the coming weeks/months. My issue is that nearly all discussion of performance deals only with anonymous, and therefore easily cacheable, sites. I then stumbled upon this group, which appears to be the answer to my prayers, although there is a lot of info in here to digest and make sensible use of.
The case in question is a social networking site, so it has to handle high concurrent throughput from authenticated users only (which seems to make most tuners/optimizers shiver in fear) and needs to be inherently scalable.
I throw open a call for the members of this group to discuss what the ideal setup for this scenario would be. And would Mercury/Pressflow etc. be up to this job?
Based upon this discussion there is also the potential for paid consultancy work to put the best solution into practice.
I'm genuinely excited by this group and its efforts and where it is heading.
Comments
re: A valuable test case for discussion on the 'ideal' setup
The things that Mercury provides that will help with a high number of authenticated users are:
1) Opcode caching of PHP using APC
2) Caching of MySQL query results using Memcached
3) Improved (and less resource-intensive) search using Solr
4) Minimizing the impact of any anonymous traffic using Varnish
5) Modularization, i.e. MySQL, Solr and Varnish can all be moved to another server when loads increase.
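To illustrate point 2, here is a minimal cache-aside sketch in Python. It is not Mercury's or Drupal's actual code: `TTLCache` is a hypothetical in-process stand-in for a Memcached client, and `cached_query` and its parameters are names made up for this example. The idea is simply that a query result is looked up in the cache first and the database is hit only on a miss.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Hypothetical in-process stand-in for Memcached (get/set with expiry)."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._store[key] = (time.monotonic() + ttl, value)


def cached_query(cache: TTLCache, key: str,
                 run_query: Callable[[], Any], ttl: float = 60.0) -> Any:
    """Return the cached result for key, running the query only on a miss."""
    value = cache.get(key)
    if value is None:
        value = run_query()          # the expensive database round trip
        cache.set(key, value, ttl)
    return value
```

For authenticated users the cache key would normally need to include whatever the result depends on (user ID, role and so on), otherwise one user's data could be served to another. Note also that this sketch treats a `None` result as a miss, so such results are never cached.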
Hope this helps,
Greg
--
Greg Coit
Systems Administrator
http://www.chapterthree.com
What about hardware architecture?
OK, so let's assume Mercury is the ideal software setup. What about hardware?
There seems to be a lot of attention on Amazon's cloud services as the best host to use, but I'm having trouble being convinced of that. Amazon's services seem to be really expensive, mainly because they charge you for traffic, plus all of the hidden costs. If you created a site that was an instant success and drew a massive amount of traffic, it could end up bankrupting you.
From my research I have found a host (1and1.com) that provides cloud servers with configurable specs and no charge for unlimited traffic. If you want to add an extra 1 GB of RAM it's just £5 (~$7.50), and the same for an extra 100 GB of storage, etc.
I'd prefer this option because you know exactly how much you are going to spend each month and you can easily up/downgrade whenever you need to.
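As a rough sketch of that pricing, here is the upgrade arithmetic in Python. The £5 per unit comes from the figures quoted above; the function name and parameters are made up for illustration, and the base server price is deliberately left out because I have not quoted it here.

```python
def upgrade_cost_gbp(extra_ram_gb: int = 0, extra_storage_100gb: int = 0,
                     unit_price_gbp: float = 5.0) -> float:
    """Monthly cost of add-ons on top of the (unspecified) base server price.

    Assumes each extra 1 GB of RAM and each extra 100 GB of storage
    costs the same flat unit price, as quoted in the post.
    """
    if extra_ram_gb < 0 or extra_storage_100gb < 0:
        raise ValueError("upgrade quantities must not be negative")
    return unit_price_gbp * (extra_ram_gb + extra_storage_100gb)


# Example: 4 GB more RAM plus 200 GB more storage.
print(upgrade_cost_gbp(extra_ram_gb=4, extra_storage_100gb=2))  # 30.0
```

Because traffic is not metered, this number does not change with load, which is the predictability I am after.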
Using this host, I would then think the best setup would be to lease two such servers: one set up as the web server, the other as a CDN (see the CDN integration module) to store and serve files from.
You could give the web server plenty of RAM and cores but little storage space, and the other plenty of storage space but little RAM and few cores. Is this sensible? And what about the database? Would you suggest a database on each with replication, only a single DB on one of them, or maybe even a 3rd server just for the DB?
Pros for Amazon
All things considered, for any site with more than 100,000 daily visitors (not visits), of whom a majority will be authenticated users pulling in resource-heavy sessions, I found it difficult to do better than an Amazon m1 instance. There are some case studies available at Acquia and 2bits which indicate that at this level the main bottleneck would be bandwidth, and one cannot beat Amazon on that.
There are no hidden costs with Amazon. Each cost is very well documented, and you also have the costing worksheet available to work out in advance how much you will spend. We had a recent DDoS attack on one of our servers (a 10x spike for several days), and the cost went up by just 4 pounds for a server cluster that costs us around 1000 pounds per month.
Also, I have had no issues starting small and then growing using EBS volumes and snapshots. If you can anticipate spikes, even better: just fire up another instance with a load balancer in front. A sensible solution would be one of the micro-servers (available as of the beginning of September 2010) as a load balancer in front, then two small web servers running Mercury (Varnish/APC/Memcache/Apache2), and then a replicated master/slave MySQL, or its drop-in replacement Percona (recommended, it made a major difference for my work), for the database. You can have the master running in an m1 instance for writes only and the slave in a smaller instance for the reads.
There is a single point of failure in this: the micro load balancer. One can start up another using Heartbeat....
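The master-for-writes, slave-for-reads split described above could be sketched roughly as follows. This is an illustrative Python sketch only, not how Pressflow or any particular database layer does it: `ReadWriteRouter` and the server names are hypothetical, and the rule "only plain SELECT statements go to a slave" is a simplifying assumption.

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Pick a database server for a statement: reads to slaves, rest to master."""

    def __init__(self, master: str, slaves: List[str]) -> None:
        self.master = master
        # Round-robin over the slaves; None means "no slaves configured".
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql: str) -> str:
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        if verb == "SELECT" and self._slaves is not None:
            return next(self._slaves)
        # Writes, DDL and anything unrecognised go to the master.
        return self.master


router = ReadWriteRouter("db-master", ["db-slave-1", "db-slave-2"])
print(router.route("SELECT name FROM users WHERE uid = 1"))
print(router.route("UPDATE users SET name = 'x' WHERE uid = 1"))
```

One caveat with any such split: replication is asynchronous, so a read sent to a slave immediately after a write may return stale data. On a site of logged-in users, reads that must reflect the user's own latest write are safer sent to the master.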
So many possibilities and so little time....