Could a good-sized Drupal newspaper site run on a cloud-based infrastructure?

geoffb's picture

We are currently looking at changing our hosting company so that we can ramp up our resources.

One of the options we looked at was Amazon's EC2 (Elastic Compute Cloud). However, we backed away when our initial tests showed that transfer speeds weren't very fast.

But, I was just wondering if anyone in the Newspaper on Drupal group has successfully trialled Cloud Computing and, if so, whether there were any tricks and traps if you wanted to run a good-sized Drupal newspaper site on a cloud-based infrastructure.

Cheers,

Geoff

Comments

EC2

benjamin birkenhake's picture

We did evaluate EC2. Although it looks attractive at first, it's a hell of a lot of work, because you have to set up the whole load-balancing environment yourself. The machines are also much slower than our own, so we decided to do the load balancing ourselves, if necessary. With a quad-core machine, we have no problem serving half a million PIs (page impressions) a day, and with a simple load-balancing setup we could even triple that amount.
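To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. The daily totals come from the comment above; the peak factor of 4 is purely an assumption for illustration, since the thread gives no information about how bursty the traffic actually is.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(page_impressions_per_day: int) -> float:
    """Average page impressions per second, spread evenly over a day."""
    return page_impressions_per_day / SECONDS_PER_DAY


def peak_rate(page_impressions_per_day: int, peak_factor: float) -> float:
    """Estimated peak rate, given an assumed peak-to-average ratio."""
    return average_rate(page_impressions_per_day) * peak_factor


current_daily = 500_000            # "half a million PIs a day" (from the post)
tripled_daily = 3 * current_daily  # "we could even triple that amount"
ASSUMED_PEAK_FACTOR = 4.0          # ASSUMPTION: not stated anywhere in the thread

for label, daily in (("current", current_daily), ("tripled", tripled_daily)):
    print(
        f"{label}: {average_rate(daily):.1f} PI/s on average, "
        f"{peak_rate(daily, ASSUMED_PEAK_FACTOR):.1f} PI/s at the assumed peak"
    )
```

Half a million page impressions a day averages out to fewer than six per second, which helps explain why a single well-specified machine can cope with it.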

You might want to take a look at this:
http://www.johnandcailin.com/blog/john/scaling-drupal-open-source-infras...

You might also want to look into Drupal on Media Temple's Grid System:
http://www.mediatemple.net/webhosting/gs/
http://www.google.de/search?q=media+temple+grid+drupal

--
anmut und demut


Server Info

geoffb's picture

Benjamin,

Would you mind giving me the details of your server (RAM, CPU, etc.) and if you've configured it in any special way?

Thanks,

Geoff

Of Course

benjamin birkenhake's picture

Right now we have a "Hewlett Packard ProLiant DL380 G5 High Performance" with 2 x Dual-Core Xeon 5160 / 3 GHz and 12 GB RAM for http://www.netz-gegen-nazis.com, and the same machine with only 8 GB RAM for http://kommentare.zeit.de. Both run Ubuntu Linux.

We're also running Zend Platform for better performance, but it's only a small boost, as Drupal is usually extremely fast. It is great, though, for monitoring and for finding performance leaks within Drupal. We actually did quite a lot of performance optimization within Drupal. Some tables are not optimally indexed. Some queries, even ones in core, get slow with bigger datasets. But we could replace them all without touching core, just by replacing the blocks or functions with our own modules.

Besides that, we did no further special configuration or optimization, if I remember correctly. ;)

--
anmut und demut


Thanks for post! I would

Doktor.Science's picture

Thanks for the post! I would also like to know how you optimized performance.

This is really exciting

johsw@drupal.org's picture

Could you let us in on which tables you made new indexes for, and which core queries you have substituted?

Best,
/Johs. W.
http://www.information.dk

Puh

benjamin birkenhake's picture

I am not sure anymore where we added indexes, because we did that via phpMyAdmin and didn't document it (shame on me). But I'm pretty sure that one of them was on the users table, because I know that we recoded most of the user blocks, like "new users" and "currently logged in users".

We have about 180,000 registered users, which made queries on the users table slow.

I'll take a closer look tomorrow, when I am back at work. :)
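As an illustration of the kind of fix described here, the sketch below shows how an index can change the plan for a "newest users" query. It is only a sketch under loud assumptions: it uses SQLite from Python's standard library rather than MySQL, and the table layout, the index name `users_created`, and the query are simplified stand-ins, not Drupal's real schema or the indexes that were actually added on the site.

```python
import sqlite3

# ASSUMPTION: a simplified, hypothetical users table (not Drupal's schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    " uid INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " status INTEGER NOT NULL,"
    " created INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO users (name, status, created) VALUES (?, ?, ?)",
    ((f"user{i}", 1, 1_200_000_000 + i) for i in range(10_000)),
)

# The kind of query a "new users" block might run.
NEWEST_USERS = (
    "SELECT uid, name FROM users WHERE status = 1 "
    "ORDER BY created DESC LIMIT 5"
)


def query_plan(connection: sqlite3.Connection, sql: str) -> str:
    """Return SQLite's query plan as one line (detail is the last column)."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


plan_before = query_plan(conn, NEWEST_USERS)

# With an index on the sort column, the database can read rows in order
# instead of scanning the whole table and sorting the result.
conn.execute("CREATE INDEX users_created ON users (created)")
plan_after = query_plan(conn, NEWEST_USERS)

newest = conn.execute(NEWEST_USERS).fetchall()
print("before:", plan_before)
print("after: ", plan_after)
```

On MySQL the equivalent check would be running `EXPLAIN` on the slow query before and after adding the index.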

--
anmut und demut


we're running ec2

netaustin's picture

The New York Observer runs politicker.com and its network of sites on Amazon EC2. It's a pretty interesting configuration--politicker.com is a portal into a network of state-level political web sites (e.g. politickernj.com, politickerca.com). We split traffic based on the domain name--only the Apache servers differ from site to site; all other servers are shared. The states that are new to the network (we're at about 20 right now) and have low readership share large (8 GB RAM) servers in groups of four. New Jersey (the original state site) has an extra large instance to itself, and we have an extra large database server sitting behind the whole cluster, plus a small NFS server.

We're building small caching servers with Varnish; once deployed, we'll automatically turn off all but one actual web server at night and point the Varnish servers at that one web server. Since all web servers have an identical configuration, they can all serve any of the sites at any time. We cache heavily already, but Varnish should allow us to burst to a very high number of impressions per minute when needed.

There's no actual printed newspaper behind it, as there is on Observer.com (FWIW: running on meatware with a CDN) but for the type of content and frequency of publication, there might as well be.

Also, I think if you needed load balancing, running Pound on an extra large instance with a minimal Linux build and nothing else installed would be very fast and fairly easy to configure. We've had no trouble with latency or server speed.
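The domain-based split described above could be sketched roughly as follows. This is not the Observer's actual configuration: the pool names, the `build_routing` and `pool_for` helpers, and every hostname other than politickernj.com and politickerca.com are hypothetical, made up only to show the idea of one dedicated pool plus shared pools filled in groups of four.

```python
def build_routing(dedicated: dict, shared_sites: list, group_size: int = 4) -> dict:
    """Map each hostname to a web-server pool.

    Sites listed in `dedicated` keep their own pool; the remaining sites
    are packed onto shared pools, `group_size` sites per pool.
    """
    routing = dict(dedicated)
    for index, host in enumerate(sorted(shared_sites)):
        routing[host] = f"shared-large-{index // group_size + 1}"
    return routing


def pool_for(host_header: str, routing: dict, default_pool: str) -> str:
    """Pick the pool for a request from its Host header."""
    host = host_header.split(":")[0].lower()  # drop an optional port
    if host.startswith("www."):
        host = host[len("www."):]
    return routing.get(host, default_pool)


# ASSUMPTION: only the first two hostnames appear in the thread; the rest
# are placeholders for other low-traffic state sites.
routing = build_routing(
    dedicated={"politickernj.com": "nj-extra-large"},
    shared_sites=[
        "politickerca.com",
        "state-b.example",
        "state-c.example",
        "state-d.example",
        "state-e.example",
    ],
)

print(pool_for("www.politickernj.com", routing, "portal"))
print(pool_for("politickerca.com:80", routing, "portal"))
```

Because every web server carries an identical configuration, changing which pool serves a site only means editing this mapping, not rebuilding a server.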

What's the advantage to

eli's picture

What's the advantage to using EC2 over a dedicated server? More flexible in terms of scalability? If it's cheaper through EC2, it can't be by very much.

Theoretically ....

yelvington's picture

Theoretically it gives you arbitrary scalability, because if you need more processing power to handle traffic spikes you can just instantiate more clones of your machine, then release them when things settle down.
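A minimal sketch of that scale-up and scale-down decision, assuming you already measure the request rate and know roughly what one instance can handle. The function name, the per-instance capacity, and the limits are illustrative assumptions, not figures from this thread or from Amazon.

```python
import math


def instances_needed(
    requests_per_second: float,
    capacity_per_instance: float,
    minimum: int = 1,
    maximum: int = 20,
) -> int:
    """Number of identical web instances to run for the current load.

    The result is clamped between `minimum` (always keep something up)
    and `maximum` (a budget ceiling).
    """
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    wanted = math.ceil(requests_per_second / capacity_per_instance)
    return max(minimum, min(maximum, wanted))


# ASSUMPTION: each clone comfortably serves 10 requests per second.
for load in (5, 95, 1000):
    print(load, "req/s ->", instances_needed(load, capacity_per_instance=10))
```

Launching or releasing the clones themselves would go through whatever tooling you use to manage EC2; that part is deliberately left out here.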

mindlace's picture

I'm currently investigating cloud-based deployment approaches, and I can say that my experience with EC2 is that it performs as advertised, though of course the "units" you're getting with EC2 are smaller than the servers you could be deploying yourself.

I am currently using s3fs for the /files directory. I have to do more testing before I can say whether or not this is an adequate solution.

The largest engineering obstacles mostly revolve around persisting your database. S3 is pretty much an obligatory part of that experience. The system I've been using persists LVM snapshots of MySQL in order to provide consistency when the DB server goes down.

I would also suspect that there would be problems down the road when your DB grows to the point where it needs more than 15 GB of memory, but ... that's a lot.

As to the "why" - The fact that you can deploy new machines in a few minutes is a good thing; another big draw is that you don't have to make a year+ investment in order to deploy a cluster, like you do with real hardware.
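For readers unfamiliar with the LVM snapshot approach mentioned above, here is a hedged sketch of the usual sequence of steps, written as a Python function that only builds the plan and does not execute anything. The volume names, paths, and snapshot size are hypothetical, the S3 upload is left as a placeholder step because the thread does not say which tool was used, and this is not necessarily how the system described above works.

```python
def snapshot_backup_plan(
    volume_group: str,
    logical_volume: str,
    snapshot_name: str,
    snapshot_size: str,
    mount_point: str,
    archive_path: str,
) -> list:
    """Return the ordered steps for a consistent snapshot-based backup.

    Each step is a (kind, payload) pair. Nothing is run here.
    """
    origin = f"/dev/{volume_group}/{logical_volume}"
    snapshot = f"/dev/{volume_group}/{snapshot_name}"
    return [
        # The read lock must be held by one MySQL session until the
        # snapshot exists; it is released when that session ends.
        ("sql", "FLUSH TABLES WITH READ LOCK"),
        ("shell", ["lvcreate", "--snapshot", "--size", snapshot_size,
                   "--name", snapshot_name, origin]),
        ("sql", "UNLOCK TABLES"),
        # Copy the frozen data while the live database keeps running.
        ("shell", ["mount", "-o", "ro", snapshot, mount_point]),
        ("shell", ["tar", "-czf", archive_path, "-C", mount_point, "."]),
        ("shell", ["umount", mount_point]),
        ("shell", ["lvremove", "--force", snapshot]),
        # Placeholder: push the archive to S3 with the tool of your choice.
        ("upload", archive_path),
    ]


# ASSUMPTION: all names and sizes below are made up for illustration.
plan = snapshot_backup_plan(
    volume_group="vg0",
    logical_volume="mysql",
    snapshot_name="mysql-snap",
    snapshot_size="1G",
    mount_point="/mnt/mysql-snap",
    archive_path="/tmp/mysql-snap.tar.gz",
)
for kind, payload in plan:
    print(kind, payload)
```

The point of the ordering is that the database is only locked for the moment it takes to create the snapshot, not for the whole copy.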

Really Depends on Your Requirements

TheDude's picture

I hate to answer a post this old, since I'm sure you've long since made your decision, but for the sake of those that come across this post looking for their own answers, as well as the poster, I'll offer a reply.

The short answer is yes, you can certainly use EC2 for a "good size" newspaper site, a social networking site or a blog site or any combination thereof. We have done so for well over a year and were relatively early adopters of the EC2 service. Across our organization we deliver close to a hundred million page views a month at peak times and EC2 has served us well and afforded us dynamic scalability and lower TCO and ongoing infrastructure costs.

Does EC2 have its downsides and challenges (even today in 2010)? Yes, absolutely, but for us the benefits very much outweigh the disadvantages.

The downsides are few:

  • In our experience write performance across the board is very poor when compared to iSCSI or even SATA write performance on traditional hardware. In addition to this, write performance has been reported as an issue with the RDS service.

  • EC2 is open in some senses but closed in others. As such, you'll experience a bit of vendor lock-in. That is to say, you can't just go into the data center, pull out your servers and send them to another colo. You may actually have to refactor your software, operational protocols and workflow to migrate your systems AWAY from EC2 if you ever choose to do so.

  • There is a very real learning curve to adopting EC2 across an organization. This isn't such a big problem for small companies with one or two site administrators but when you have a fairly diverse organization with a lot of support staff and developers, learning the ins and outs of EC2 migration and ongoing administration can have a non-trivial cost to your organization.

On the other hand, the upsides are many:

  • Over the last year and a half Amazon has repeatedly reduced pricing across important segments of their service.

  • For us the TCO was far lower than colocation or managed hosting.

  • Instant and essentially infinite scalability

  • Diverse product offering and simple integration of more advanced services to help your site performance (mapping S3 buckets to CloudFront CDN, Elastic Load Balancing, Dynamic Scaling, Hadoop Analytics, etc.)

  • A unified management interface for all of the above services.

  • In many ways, EC2 gives us a much easier workflow for upgrading and patching (or even just trying out new operating system optimizations, versions and configurations) since you can abstract your application data from your operating system via EBS volumes and simply migrate application snapshots to newer AMI versions and migrate Elastic IPs to the new instances seamlessly.

There are also some interesting and potentially exciting evolutions right here in the Drupal community with EC2. You may want to check out Project Mercury which is an Amazon AMI running Pressflow, Varnish and other performance enhancements for Drupal. http://groups.drupal.org/node/25065

Have you made a choice? If so, what did you decide?

Cheers,
Dan

All of our Drupal sites are

morisy's picture

All of our Drupal sites are still fairly low traffic, but this was a great overview of Drupal cloud hosting and food for thought for the future. Thanks for writing it up.

Web guy, SpareChangeNews.net
Twitter: @morisy / @sparechangenews
