Hi everybody,
First of all, I am very new to Drupal and actually not actively coding myself but more involved as a project manager. However, as I've done some programming before and had some computer science modules at university I think I do have a basic grasp on the bigger issues around cloud computing.
We're just finishing a new system which we of course hope will become a huge hit but right now I am more concerned about general hosting issues.
We created a custom search which is pretty heavy on the database, with a lot of different joins and queries. Whilst there is of course always some room for improvement in the algorithm itself, my main concern right now is: what would you suggest as the best hosting in a scenario like ours? We are working with a very skilled Drupal coder, but he has never used AWS before, so he doesn't have any experience when it comes to this.
Basically, we have a lot of read queries, a few write queries and of course some basic traffic. Right now, though, I am most worried about the speed of performing queries.
We are officially deploying the system for beta this Friday (currently in late alpha) on a dedicated server (decent basic package).
Would you recommend moving to EC2/S3 in a scenario like this? What further factors should we consider, and roughly how much customization will a whole migration from an old host to EC2/S3 need?
Thank you very much in advance. I will try and share our experiences here in the future so that others can benefit from whatever we learn/mistakes we make on the way.
Jonas
Comments
Hi Jonas, We're currently
Hi Jonas,
We're currently trialling something similar. There's no straight answer to your question I suspect, but some thoughts...
You should expect to see a ~30% reduction in speed on an Amazon EC2 instance compared with equivalent bare metal, due to the impact of a hypervisor on CPU speed and disk IO. You can mitigate this in part by upsizing your instance, but this will cost more of course.
You should make sure your DB data store is sitting on an Elastic Block Store (EBS) volume attached to the EC2 instance. This will both ensure that your data persists even when your EC2 instance is terminated (rather important!) and speed up your DB, since disk IO on an EBS volume is faster than native disk IO on an EC2 instance. If you haven't seen it, there's a great write-up of how to do this (and install Drupal) at http://www.sunsetlakesoftware.com/2008/09/13/running-drupal-website-amaz....
The decision of what deployment scenario to choose is a complex one and performance (particularly performance of specific queries) is only one factor here. It is generally wiser to look at DB optimization (indexes, my.cnf settings, EXPLAIN etc.) than worry too much about the hardware. This will also help you scale out with load (which will be another part of the performance equation).
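To make the index/EXPLAIN point concrete, here is a minimal sketch of checking a query plan before and after adding an index. It uses Python's built-in sqlite3 and SQLite's `EXPLAIN QUERY PLAN` purely so it runs anywhere; on MySQL you would put `EXPLAIN` in front of your real search query instead. The `listing` table, its columns and the index name are made up for illustration and are not taken from the site discussed here.

```python
import sqlite3

# Hypothetical schema and data, only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE listing (id INTEGER PRIMARY KEY, country TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO listing (country, title) VALUES (?, ?)",
    [("Malaysia", "beach"), ("Spain", "beach"), ("Malaysia", "city")],
)


def query_plan(connection, sql, params=()):
    """Return SQLite's plan description for a query as one string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


SEARCH = "SELECT title FROM listing WHERE country = ?"

# Without an index the filter has to scan the whole table.
plan_before = query_plan(conn, SEARCH, ("Malaysia",))

# After indexing the filtered column the planner can use the index.
conn.execute("CREATE INDEX idx_listing_country ON listing (country)")
plan_after = query_plan(conn, SEARCH, ("Malaysia",))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan differs between SQLite versions and looks different again in MySQL, but the thing to look for is the same: a full table scan on a large table in a frequently run query is usually the first candidate for an index.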
But if for some reason you can't optimize your queries any further, and replication etc. won't help, it's worth remembering that the advantage of EC2/S3 is flexibility in being able to provision hardware quickly, not performance. If your hardware requirements are relatively constant over a 6-12 month period, you may well be better off simply with dedicated hosting on bare metal, picking a box with really fast disk, a stack of RAM for your DB server, and a fat pipe to your webserver.
In summary, it shouldn't really make a difference; always look at the software before worrying too much about the hardware. But if you had to choose between Amazon and dedicated hosting on performance alone, dedicated hosting is your choice.
Hope that helps!
Thank you very much for your
Thank you very much for your comments ajessup. We soft launched yesterday (www.PlanetAbroad.com) and as you can see performance is far from ideal.
Bottom line, you say, is Software > Hardware, and the point of EC2/S3 is flexibility rather than pure power?
By the way, what kind of tools do you use to analyze the speed of your sites? For example, how in the above website would I analyze what takes forever in the query "beach, Malaysia"? (Or whether it is actually done very quickly on our server but the connection between the server and my host is too slow?)
I know this is a very fundamental question which will require a lot of investigation from our side over time but maybe you could provide me with an overview link similar to the "running a drupal website on Amazon EC2" overview you gave me?
YSlow, Devel Module, Slow query log
Try:
1) YSlow
2) Drupal devel module to see what's taking so long, PHP exec or MySQL
3) MySQL slow query log
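As a rough sketch of what to do with the slow query log once it is switched on, the following pulls out each entry's `Query_time` and SQL and lists the slowest first. The sample log text, the table names in it and the function name `parse_slow_log` are invented for illustration; real logs vary a little between MySQL versions and usually also contain `SET timestamp=...;` lines, which this simple version would just fold into the SQL text.

```python
import re

# Matches the per-query statistics header line of a slow log entry.
HEADER = re.compile(
    r"^# Query_time:\s*([\d.]+)\s+Lock_time:\s*([\d.]+)"
    r"\s+Rows_sent:\s*(\d+)\s+Rows_examined:\s*(\d+)"
)

# Illustrative sample only, not output from any real server.
SAMPLE_LOG = """\
# Time: 090115 12:00:01
# User@Host: drupal[drupal] @ localhost []
# Query_time: 2.5  Lock_time: 0.0  Rows_sent: 10  Rows_examined: 50000
SELECT n.nid FROM node n WHERE n.title LIKE '%beach%';
# Time: 090115 12:00:09
# User@Host: drupal[drupal] @ localhost []
# Query_time: 11.2  Lock_time: 0.1  Rows_sent: 20  Rows_examined: 900000
SELECT n.nid FROM node n
JOIN term_node t ON t.nid = n.nid;
"""


def parse_slow_log(text):
    """Return log entries as dicts, slowest query first."""
    entries = []
    current = None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            current = {
                "query_time": float(match.group(1)),
                "rows_examined": int(match.group(4)),
                "sql": "",
            }
            entries.append(current)
        elif line.startswith("#"):
            # Any other comment line starts the next entry's header.
            current = None
        elif current is not None and line.strip():
            # Join multi-line statements into one string.
            current["sql"] = (current["sql"] + " " + line.strip()).strip()
    return sorted(entries, key=lambda e: e["query_time"], reverse=True)


for entry in parse_slow_log(SAMPLE_LOG):
    print(entry["query_time"], entry["rows_examined"], entry["sql"])
```

A query that examines far more rows than it sends back is usually the one worth running through EXPLAIN first.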
Kieran
Drupal community adventure guide, Acquia Inc.
Drupal events, Drupal.org redesign
Hey Jonas, Bottom line you
Hey Jonas,
Bottom line you say is Software > Hardware and the point in EC2/S3 is rather flexibility than pure power?
Correct. I wouldn't worry about the performance of the hardware at this stage (unless it's REALLY slow for some reason). If you want your website to handle any serious amount of traffic (i.e. several users requesting pages simultaneously), then even on a slow machine a single page requested by a single user must come back with a pretty snappy response time.
By the way, what kind of tools do you use to analyze the speed of your sites? For example, how in the above website would I analyze what takes forever in the query "beach, Malaysia"? (Or whether it is actually done very quickly on our server but the connection between the server and my host is too slow?)
If you want to see why a page is taking too long to load, get Firebug (a Firefox plug-in). It has a tab called Net that will tell you whether latency between your machine and the site is the issue, or whether the wait is in the server's processing time to handle requests.
I suspect the latter. Assuming it is, then website performance optimization is a book or two in itself. But the relevant chapter in 'Pro Drupal Development' (VanDyk & Westgate 2007) would be a great place to start.
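The split that Firebug's Net tab shows (network latency versus time spent waiting on the server) can also be approximated from a script. This is only a rough sketch under some assumptions: it speaks plain HTTP only, the name `time_request` is made up, the connect time includes the DNS lookup, and the time to first byte still contains one network round trip on top of the server's processing time.

```python
import socket
import time
from urllib.parse import urlparse


def time_request(url, timeout=10.0):
    """Return (connect_seconds, first_byte_seconds) for a plain-HTTP GET.

    connect_seconds approximates network latency (DNS + TCP handshake);
    first_byte_seconds is how long the server took to start answering
    after the connection was open.
    """
    parts = urlparse(url)
    host = parts.hostname
    port = parts.port or 80
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    start = time.perf_counter()
    sock = socket.create_connection((host, port), timeout=timeout)
    connected = time.perf_counter()
    try:
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        sock.recv(1)  # blocks until the first byte of the response arrives
        first_byte = time.perf_counter()
    finally:
        sock.close()
    return connected - start, first_byte - connected
```

If the connect time is a few tens of milliseconds and the wait for the first byte is several seconds, the time is going into PHP and MySQL on the server rather than into the connection.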
In the meantime, some other techniques and tools to try:
There's a stack of other tools and techniques. For example, Xdebug's profiling tool will help you identify precisely which PHP functions are taking a while to run, but you probably don't need all that just yet.
Sounds great - had a look at
Sounds great - had a look at YSlow - seems like our front page needs to combine all of its JavaScript files and external stylesheets, among other things...
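For a quick check of how many separate files a page pulls in (each one normally means another HTTP request), something like the following sketch could be run against the page's HTML before and after turning on aggregation. It uses only Python's standard `html.parser`; the class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class AssetCounter(HTMLParser):
    """Collects external script and stylesheet URLs from a page."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self.stylesheets = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "script":
            # Inline scripts have no src and cost no extra request.
            if attributes.get("src"):
                self.scripts.append(attributes["src"])
        elif tag == "link":
            rel = (attributes.get("rel") or "").lower().split()
            if "stylesheet" in rel and attributes.get("href"):
                self.stylesheets.append(attributes["href"])


def count_assets(html):
    """Return (number of external scripts, number of external stylesheets)."""
    counter = AssetCounter()
    counter.feed(html)
    return len(counter.scripts), len(counter.stylesheets)
```

Calling `count_assets(page_html)` on the saved source of the front page gives the two counts; the fewer separate files, the fewer round trips the browser has to make.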
Now I'm going to have a look at our Database as I suspect there is an awful lot going wrong there as well. Thanks!