EC2/S3 for DB-intensive drupal system

Jonas E's picture

Hi everybody,

First of all, I am very new to Drupal and actually not actively coding myself but more involved as a project manager. However, as I've done some programming before and had some computer science modules at university I think I do have a basic grasp on the bigger issues around cloud computing.

We're just finishing a new system which we of course hope will become a huge hit but right now I am more concerned about general hosting issues.

We created a custom search which is pretty heavy on the database, with a lot of different joins and queries. Whilst there is of course always some room for improvement in the algorithm itself, my main concern right now is: what would you suggest is the best hosting in a scenario like ours? We are working with a very skilled Drupal coder, but he has never used AWS before, so he doesn't have any experience when it comes to this.

Basically, we have a lot of read queries, a few write queries and of course some basic traffic. Currently, I am however most worried about the speed of performing queries.

We are officially deploying the system for beta this Friday (currently in late alpha) on a dedicated server (decent basic package).

Would you recommend moving to EC2/S3 in a scenario like this? What further factors are there for us to consider, and roughly how much customization will a whole migration from an old host to EC2/S3 need?

Thank you very much in advance. I will try and share our experiences here in the future so that others can benefit from whatever we learn/mistakes we make on the way.

Jonas

Comments

ajessup's picture

Hi Jonas,

We're currently trialling something similar. There's no straight answer to your question I suspect, but some thoughts...

  • You should expect to see an ~30% reduction in speed on an Amazon EC2 instance compared with equivalent bare metal, due to the impact of the hypervisor on CPU speed and disk IO. You can mitigate this in part by upsizing your instance, but this will of course cost more.

  • You should make sure your DB data store is sitting on an Elastic Block Store (EBS) volume attached to the EC2 instance. This will both ensure that your data persists even when your EC2 instance is terminated (rather important!) and speed up your DB, since disk IO on an EBS volume is faster than native disk IO on an EC2 instance. If you haven't seen it, there's a great write-up of how to do this (and install Drupal) at http://www.sunsetlakesoftware.com/2008/09/13/running-drupal-website-amaz....

  • The decision of what deployment scenario to choose is a complex one and performance (particularly performance of specific queries) is only one factor here. It is generally wiser to look at DB optimization (indexes, my.cnf settings, EXPLAIN etc.) than worry too much about the hardware. This will also help you scale out with load (which will be another part of the performance equation).

  • But if for some reason you can't optimize your queries any further, and replication etc. won't help, it's worth remembering that the advantage of EC2/S3 is flexibility in being able to provision hardware quickly, not performance. If your hardware requirements are relatively constant over a 6-12 month period, you may well be better off simply with dedicated hosting on bare metal, picking a box with really fast disk, a stack of RAM for your DB server, and a fat pipe to your webserver.
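To make the "indexes and EXPLAIN" advice above concrete, here is a minimal sketch of checking a query plan before and after adding an index. It uses Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN purely as a stand-in so the example runs anywhere; MySQL's EXPLAIN prints different output, and the listing table, its columns and the index name are made up for illustration, not taken from the actual site.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the plan steps SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable plan step.
    return [row[-1] for row in rows]


# Hypothetical table standing in for the data behind a custom search.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE listing (id INTEGER PRIMARY KEY, country TEXT, tag TEXT)")
conn.executemany(
    "INSERT INTO listing (country, tag) VALUES (?, ?)",
    [("Malaysia", "beach"), ("Spain", "city"), ("Malaysia", "jungle")],
)

search = "SELECT id FROM listing WHERE country = ? AND tag = ?"
args = ("Malaysia", "beach")

# Without an index the database has to scan the whole table.
plan_before = query_plan(conn, search, args)

# Index the columns used in the WHERE clause, then look at the plan again.
conn.execute("CREATE INDEX idx_listing_country_tag ON listing (country, tag)")
plan_after = query_plan(conn, search, args)

print("before:", plan_before)
print("after: ", plan_after)
```

The thing to look for is the plan changing from a full table scan to a lookup that names the new index. On MySQL the equivalent check is to run EXPLAIN on the slow query and look at which key, if any, each table uses.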

In summary, it shouldn't really make a difference: always look at the software before worrying too much about the hardware. But if you are choosing between Amazon and dedicated hosting on performance alone, dedicated hosting is your choice.

Hope that helps!

Jonas E's picture

Thank you very much for your comments ajessup. We soft launched yesterday (www.PlanetAbroad.com) and as you can see performance is far from ideal.

Bottomline you say is Software > Hardware and the point in EC2/S3 is rather flexibility than pure power?

By the way, what kind of tools do you use to analyze the speed of your sites? For example, on the above website, how would I analyze what takes forever in the query "beach, Malaysia"? (Or whether it is actually done very quickly on our server but the connection between the server and my host is too slow?)

I know this is a very fundamental question which will require a lot of investigation from our side over time but maybe you could provide me with an overview link similar to the "running a drupal website on Amazon EC2" overview you gave me?

YSlow, Devel Module, Slow query log

Amazon's picture

Try:

1) YSlow
2) Drupal devel module to see what's taking so long, PHP exec or MySQL
3) MySQL slow query log
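Once the slow query log is switched on, it helps to rank its entries by how long they took. The following is only a rough sketch: it assumes the usual "# Query_time: ... Rows_examined: ..." header line that MySQL writes before each logged statement, and the sample log text, including its queries and numbers, is invented for illustration.

```python
import re

# Header line MySQL writes just before each logged statement.
HEADER = re.compile(r"^# Query_time: (?P<secs>[\d.]+).*Rows_examined: (?P<rows>\d+)")


def slowest_queries(log_text, top=3):
    """Return the `top` slowest entries found in slow query log text."""
    entries = []
    current = None
    for line in log_text.splitlines():
        match = HEADER.match(line)
        if match:
            current = {
                "seconds": float(match["secs"]),
                "rows_examined": int(match["rows"]),
                "sql": [],
            }
            entries.append(current)
        elif line.startswith("#"):
            # Any other header line (# Time, # User@Host) ends the previous entry.
            current = None
        elif current is not None and line.strip():
            current["sql"].append(line.strip())
    for entry in entries:
        entry["sql"] = " ".join(entry["sql"])
    return sorted(entries, key=lambda e: e["seconds"], reverse=True)[:top]


# Invented sample; a real log would be read from the file MySQL writes.
SAMPLE_LOG = """\
# User@Host: drupal[drupal] @ localhost []
# Query_time: 3  Lock_time: 0  Rows_sent: 1  Rows_examined: 9000
SELECT COUNT(*) FROM node;
# User@Host: drupal[drupal] @ localhost []
# Query_time: 12  Lock_time: 0  Rows_sent: 20  Rows_examined: 184532
SELECT n.nid FROM node n JOIN term_node t ON t.nid = n.nid WHERE t.tid = 7;
"""

ranked = slowest_queries(SAMPLE_LOG)
for entry in ranked:
    print(entry["seconds"], entry["rows_examined"], entry["sql"])
```

A query that examines far more rows than it sends back is usually the first candidate for an index or a rewrite.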

Kieran

Drupal community adventure guide, Acquia Inc.
Drupal events, Drupal.org redesign

ajessup's picture

Hey Jonas,

Bottom line you say is Software > Hardware and the point in EC2/S3 is rather flexibility than pure power?

Correct. I wouldn't worry about the performance of the hardware at this stage (unless it's REALLY slow for some reason). If you want your website to handle any serious amount of traffic (i.e. several users requesting pages simultaneously), then even on a slow machine a single page request from a single user must get a pretty snappy response.

By the way, what kind of tools do you use to analyze the speed of your sites? For example, on the above website, how would I analyze what takes forever in the query "beach, Malaysia"? (Or whether it is actually done very quickly on our server but the connection between the server and my host is too slow?)

If you want to see why a page is taking too long to load, get Firebug (a Firefox plug-in). It has a tab called Net that will tell you whether latency between your machine and the site is the issue, or whether the wait is in the server's processing time to handle requests.

I suspect the latter. Assuming it is, then website performance optimization is a book or two in itself. But the chapter on it in 'Pro Drupal Development' (VanDyk & Westgate 2007) would be a great place to start.
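The split that Firebug's Net tab shows (network latency versus time spent waiting on the server) can also be approximated from a script. This is a rough sketch rather than a replacement for Firebug: it times the TCP connect separately from the wait for the first response, and it runs against a small local test server with an artificial 0.2 second delay so the example is self-contained. The time_request helper and the SlowHandler class are names invented here; to try it on a real site you would pass that site's host and port 80 instead.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


def time_request(host, port, path="/"):
    """Return (connect_seconds, wait_seconds) for one GET request."""
    conn = http.client.HTTPConnection(host, port, timeout=30)
    start = time.perf_counter()
    conn.connect()                  # TCP handshake: roughly one network round trip
    connected = time.perf_counter()
    conn.request("GET", path)
    response = conn.getresponse()   # returns once the status line and headers arrive
    first_response = time.perf_counter()
    response.read()
    conn.close()
    return connected - start, first_response - connected


class SlowHandler(BaseHTTPRequestHandler):
    """Local stand-in for a page that takes a while to build."""

    def do_GET(self):
        time.sleep(0.2)             # pretend the server is busy running queries
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass                        # keep the example output quiet


server = HTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

connect_s, wait_s = time_request("127.0.0.1", server.server_address[1])
print("connect: %.3fs  wait for response: %.3fs" % (connect_s, wait_s))

server.shutdown()
server.server_close()
```

If the connect time is small and the wait is large, the time is going into server-side processing rather than the network, which is the case to investigate with the tools below.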

In the meantime, some other techniques and tools to try:

  • Make sure there's nothing dumb happening on the client side, like 404 images (if you request an asset that doesn't exist, Drupal often serves up its own 404 page, which puts extra load on your machine as it renders that page) or making your page depend on the JavaScript having loaded and parsed.
  • Make sure your site is compatible with Drupal's CSS and page caching. If it is, turn it on. It will help a lot.
  • Almost always, poor server performance is actually due to the DB being loaded up. Use MySQL's slow query log to see which DB queries are performing poorly. Go over those and review any complex queries you're doing: use EXPLAIN to look at the queries and make sure you've indexed all the right columns. See if you can find other ways of optimizing the query. If you can't get it down to a few milliseconds, think about building a query cache or using an alternative search system geared for performance, like Lucene. Most 'searches' performed on most websites are actually pre-cached.
  • Put your web and DB on separate machines. Make sure your DB has lots of RAM (and fast disk if possible)
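As an illustration of the "build a query cache" idea above, here is a minimal sketch of an in-memory cache that keeps search results for a fixed time. It is not how Drupal's own caching works, and QueryCache, slow_search and the 300 second lifetime are all invented for the example; a real site would more likely store results in a cache table or in memcached.

```python
import time


class QueryCache:
    """Keep search results for a short time so repeated searches skip the database."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    def get(self, search_terms, run_query):
        # Normalize case and whitespace so equivalent searches share one entry.
        key = " ".join(search_terms.lower().split())
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        results = run_query()
        self._store[key] = (now, results)
        return results


calls = []


def slow_search(terms):
    """Hypothetical stand-in for the expensive multi-join search query."""
    calls.append(terms)
    return ["result for " + terms]


cache = QueryCache(ttl_seconds=300)
first = cache.get("beach, Malaysia", lambda: slow_search("beach, Malaysia"))
second = cache.get("Beach,  malaysia", lambda: slow_search("beach, Malaysia"))
print(first == second, len(calls))
```

The trade-off is staleness: results can be up to ttl_seconds old, which is usually acceptable for a read-heavy search with few writes.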

There's a stack of other tools and techniques. For example, Xdebug's profiling tool will help you identify precisely which PHP calls are taking a while to run, but you probably don't need all that just yet.

Jonas E's picture

Sounds great - had a look at YSlow - seems like our front page needs to combine all of the javascripts and external stylesheets amongst others...

Now I'm going to have a look at our Database as I suspect there is an awful lot going wrong there as well. Thanks!
