Case study: France 24 migration to Drupal 6, a brand new codebase to be open-sourced

ndeschildre's picture

France 24 is a public 24/7 international news channel that broadcasts in three languages: French, English and Arabic. Its mission is to cover international current events from a French perspective and to convey French values throughout the world. The channel provides keys to understanding ever more complex events through in-depth analysis. France 24 also puts culture at the forefront of its programming.
France 24 is part of the AEF ("Audiovisuel Extérieur de la France", French foreign media), along with RFI (a radio station) and TV5 (a TV station).

Launched in December 2006, the website was originally based on a Java CMS, Magnolia. Due to stability problems, we switched to Drupal 5 in mid-2008. Last Tuesday, we migrated to Drupal 6 and a brand new codebase. I will now cover this migration, focusing on the technical part.
FYI, the monthly traffic of the France 24 websites is around 5 million unique visitors. A more geeky metric is 300-400 concurrent active Apache threads at all times.


The migration scope

This was not a simple migration. Since the first migration to Drupal 5, some lessons had been learned, and a few technical choices had turned out to have scalability issues. So we decided to restart the code from scratch.
We also wanted to add much more flexibility to frontpages and stories. The structure was quite rigid, especially in the frontpages.
On top of that, we were not doing one migration but two at the same time: France 24 and RFI. The RFI website was based on ASP.NET with a homegrown CMS. And all of it had to be done in 6 months.

This became an interesting exercise in applying the object specialization paradigm to full-scale website development in Drupal: how can we work faster by sharing some of the code?

Developing two websites at once

Well, that works quite well. We finished the websites on time (RFI is to go live in a couple of weeks), and we gain time every day by working on both websites at the same time.

Basically, we have three sets of modules: the "AEF" set, shared by the two websites, and the RFI and France24 sets. In the AEF set, we define Views, content types, basic templates, taxonomies, and so on. We build the big common features there: menu, tabs, easy views, externodes, ... We then specialize them where necessary in the RFI and France24 modules. E.g. we add a field to a content type, we add a filter to a view, or we add an RFI/France24-specific process such as the fetching of videos for the France24 video-on-demand, or the RFI radio editions.

As a result, more than half of the code is common, and a quarter is specific to each of RFI and France24.
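
To make the shared-versus-specialized split concrete, here is a minimal sketch in Python rather than Drupal code. It is not how the AEF modules are implemented; the field and filter names (`vod_video`, `radio_edition`, `language`) are hypothetical and only illustrate a shared definition being extended per site without touching the common base.

```python
import copy

# Shared "AEF" definition of a content type.
# Field and filter names are hypothetical, for illustration only.
AEF_STORY = {
    "fields": ["title", "body", "tags"],
    "view_filters": ["published"],
}


def specialize(base, extra_fields=(), extra_filters=()):
    """Return a site-specific copy of a shared definition.

    The shared base is deep-copied so that one site's additions
    never leak into the other site's definition.
    """
    site = copy.deepcopy(base)
    site["fields"].extend(extra_fields)
    site["view_filters"].extend(extra_filters)
    return site


# Each site only declares what differs from the common set.
france24_story = specialize(AEF_STORY, extra_fields=["vod_video"])
rfi_story = specialize(
    AEF_STORY,
    extra_fields=["radio_edition"],
    extra_filters=["language"],
)
```

The point of the design is that a fix to the shared definition benefits both sites, while each site-specific set stays small.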

The project development

For this project, we were a team of 9 developers/project leaders and 2 sysadmins.
We worked using the Scrum methodology, with short 3-week sprints: 2.5 weeks of development, then a demo to the journalists for feedback. That way, they could easily monitor the progress of the project and provide feedback *early*.

Main concepts

Multimedia Element: One thing we learned with the first version is that new features are often requested that already exist, hardcoded, somewhere else in the website. E.g. a carousel of images is requested for the frontpage, but it is already available within a story, hardcoded to the story content type.
So we created the concept of the "Multimedia Element", which is a mix of videos, sound, slideshows, carousels, links to stories, Twitter, external links, text, quotes, ... Everything on the website is either a story or a multimedia element. So everything you see in a box on the website is a multimedia element, and it can be placed anywhere: in a frontpage, in a story, in a special report, ...

CCK Formatters: The usual way to theme nodes in Drupal is to theme the node page template and the Views item template. Unfortunately, with this approach it is not easy to reuse the themes elsewhere (e.g. in a multimedia element). So we make heavy use of the standard CCK formatter concept: we create node themes as CCK formatters for each content type, and we can reuse them easily.
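
The reuse argument can be sketched outside Drupal. The Python below is only an analogy for the formatter idea, not the CCK API: rendering functions are registered once per content type under a name, and any caller (a frontpage, a story, a multimedia element) can ask for them by that name. The `story` type, the `teaser` and `headline` names and the node fields are all hypothetical.

```python
from html import escape

# Registry of rendering functions keyed by (content type, formatter name).
FORMATTERS = {}


def formatter(content_type, name):
    """Decorator that registers a rendering function in the registry."""
    def register(func):
        FORMATTERS[(content_type, name)] = func
        return func
    return register


@formatter("story", "teaser")
def story_teaser(node):
    return f"<h3>{escape(node['title'])}</h3><p>{escape(node['summary'])}</p>"


@formatter("story", "headline")
def story_headline(node):
    return f"<h2>{escape(node['title'])}</h2>"


def render(node, name):
    """Render a node with the named formatter, whatever the calling context."""
    return FORMATTERS[(node["type"], name)](node)
```

Because the markup lives in the registered function and not in a page-level template, the same `teaser` can be rendered from a list, a box or a story body without duplicating it.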

Contrib modules and homegrown modules

We use quite a lot of contrib modules: 35, in fact. Among the classics, the lightweight Composite is used instead of Panels for the frontpage.

We also developed quite a lot of homegrown modules, which you can preview in this DrupalCon presentation. These include:

- AEF Easy View: Views is a very powerful tool, but too complicated for journalists: can you picture them creating their own view and somehow including the result in their story? This module is a CCK field that lets you easily configure an existing view and put it in your story. You can even choose the theme of the results, and whether they are shown as a carousel. E.g. say you want to show all the latest stories with the tag France. Select the tag, select the number of stories you want to show, pick the theme, preview it, and you're done!
You can also reorder the results of the automatic list, making it a manual list. Or you can keep it automatic even if you have reordered results.

- AEF Multimedia Element: The multimedia element content type as described earlier, plus the FCKEditor plugin to insert them in the body of a story. Very powerful module.
- AEF Externodes: A module allowing one Drupal to access another Drupal's nodes remotely, using a Nid/Fid address space abstraction, and to execute Views remotely. E.g. if you're looking at node 100000001 on Drupal 1, you are in fact looking at node 1 on Drupal 2. This is truly powerful, allowing you to 1) save SQL CPU time by querying nodes on a remote Drupal, and 2) share a set of nodes. A good example of what we're going to use it for is image nodes. A collection of 20,000 image nodes will be put on a separate Drupal, which we will be able to plug into other Drupals. And the heavy SQL full-text search on these images by journalists will run on a database other than the main one. I strongly recommend watching the videos to see it live.

- AEF Image: A very powerful image CCK field. When you upload an image, you see it in all the different imagecache presets used on the website. You can scale/crop each of the presets differently using JCrop: you are overriding an automatic imagecache preset. Useful when the automatic crop is cutting off a head :) Plus, you can even upload and scale/crop another image for a given preset. Finally, this field supports both the direct image-upload approach and the image-as-a-node approach. See it live in the video!

- AEF Editor Toolbox: A small fixed frame where you can search stuff, manage bookmarks, history, and search your image collection.

- AEF Embedded Edit: Do everything in a single window! Creating an image, then searching for it, inserting it in a multimedia element, saving it, then going to your article, searching for your multimedia element and inserting it... can take quite long. With this module, you create your image directly on the same page in an iframe, and when you save, the nodereference of the multimedia element is automatically filled with the result. And you can edit/view every node referenced from a nodereference.
- AEF Formatter Selector: Have you ever been frustrated by the fact that there is no way in Drupal to select a theme in the node edit form? And no contrib module for it? Well, this module does it! It lets you select a theme from a list of themes under each nodereference you selected.

...
and 39 more "AEF" modules, meaning they are generic and could be reused by another website.
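
To illustrate the address space idea behind AEF Externodes described above, here is a minimal Python sketch. It is not the module's code: the block size of 100,000,000 is an assumption inferred from the single example given (node 100000001 mapping to node 1 on the second Drupal), and the function names are hypothetical.

```python
# Assumption: each remote Drupal gets a block of 100,000,000 node IDs,
# inferred from the "100000001 -> node 1 on Drupal 2" example.
ADDRESS_SPACE_SIZE = 100_000_000


def resolve_nid(nid):
    """Split a node ID into (remote_index, local_nid).

    remote_index 0 means the node lives in the local database;
    1 means the first remote Drupal, and so on.
    """
    return divmod(nid, ADDRESS_SPACE_SIZE)


def to_address_space(remote_index, local_nid):
    """Build the node ID under which a remote node is seen locally."""
    if not 0 < local_nid < ADDRESS_SPACE_SIZE:
        raise ValueError("local_nid does not fit in one address space block")
    return remote_index * ADDRESS_SPACE_SIZE + local_nid


print(resolve_nid(100000001))  # (1, 1): node 1 on the first remote Drupal
print(resolve_nid(42))         # (0, 42): an ordinary local node
```

With a mapping like this, code that loads a node only needs to check the first element of the pair to decide whether to query the local database or the remote site.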

The server architecture

Our server infrastructure is basically laid out as follows:
First, the Akamai CDN, which acts as a giant reverse proxy and saves us from 90% of the hits.
Then 4 load-balanced Apache servers, all sharing the same webroot over an NFS mount.
Finally, a replicated MySQL database linked to the Apache servers at 1 Gbit/s.

Problems we encountered, Lessons we learned

Problem encountered: Before the migration, on the first version, a cron job we wrote triggered slow MySQL queries, sometimes crashing the database.
Lesson learned: Choose your data model very carefully. Be very careful with the database. It is the only part that is not scalable: you can add as many Apache servers as you want, but there is only one database.

Problem encountered: We were working on reducing the amount of traffic between the database and Apache, and we found out that 80% of the traffic was due to Lightbox2, which was generating thousands of CCK formatters unnecessarily. These formatter definitions were stored in cache tables and transferred to Apache on each page load. Had we not found that out, the server infrastructure would probably have collapsed, with 1.2 Gbit/s of traffic on a 1 Gbit/s wire between Apache and MySQL.
Lesson learned: Take your average number of active Apache threads, multiply it by the SQL *data size* transferred for a page, and check your wire capacity.
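
The back-of-the-envelope check from this lesson can be written down as a few lines of Python. This is a sketch under stated assumptions: the lesson only gives threads times data size, so the time each thread spends on a page (`seconds_per_page`) is an extra parameter added here to turn a volume into a rate. The 400 threads come from the traffic figures earlier in the post; the 375,000 bytes per page and the 1 second per page are hypothetical values picked only so the arithmetic lands near the 1.2 Gbit/s mentioned above.

```python
def db_link_utilisation(active_threads, sql_bytes_per_page,
                        seconds_per_page, link_bits_per_second):
    """Return the fraction of the Apache-to-MySQL link used by SQL results.

    A value above 1.0 means the link cannot carry the traffic.
    """
    if seconds_per_page <= 0:
        raise ValueError("seconds_per_page must be positive")
    bits_per_second = active_threads * sql_bytes_per_page * 8 / seconds_per_page
    return bits_per_second / link_bits_per_second


# 400 threads is the upper figure quoted in the post;
# bytes per page and seconds per page are hypothetical.
utilisation = db_link_utilisation(
    active_threads=400,
    sql_bytes_per_page=375_000,
    seconds_per_page=1.0,
    link_bits_per_second=1_000_000_000,
)
print(f"link utilisation: {utilisation:.0%}")  # prints: link utilisation: 120%
```

Measuring the real bytes per page (for instance from the database server's network counters) is the part that matters; the formula itself is trivial.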

Problem encountered: When we hit the "migration" button, all the Apache servers went crazy, with a load of 200. After some investigation, we found out that they were simply swapping like hell.
Lesson learned: Take your average number of active Apache threads, multiply it by the average memory usage of a page (in our case 30 MB), and check that your Apache servers have enough RAM.
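
The same kind of check for memory, as a small Python sketch. The 30 MB per page comes from the lesson above. The thread count per server, the RAM sizes and the amount reserved for the OS are hypothetical, and whether the 300-400 threads quoted earlier are per server or across all four servers is not stated, so treat the numbers as placeholders.

```python
def apache_ram_needed_mb(active_threads, mb_per_thread):
    """Memory needed if every active thread holds one page's worth of RAM."""
    return active_threads * mb_per_thread


def fits_in_ram(active_threads, mb_per_thread, ram_mb, reserved_mb=0):
    """True if the threads fit in RAM after setting aside reserved_mb
    for the OS, file cache and other processes."""
    needed = apache_ram_needed_mb(active_threads, mb_per_thread)
    return needed <= ram_mb - reserved_mb


# 400 threads at 30 MB each: 12,000 MB in total.
print(apache_ram_needed_mb(400, 30))

# Hypothetical: 100 threads on one server with 4 GB of RAM, 1 GB reserved.
print(fits_in_ram(100, 30, ram_mb=4096, reserved_mb=1024))
```

If the check fails, the usual levers are fewer worker threads per server, more servers, or a lower memory footprint per page.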

Problem encountered: Before the migration, on the first version, loading a France 24 page from a browser was quite slow. I am talking here about the user experience in the browser: the total loading time you can see in the network tab of Firebug, once all JS, CSS and images are loaded. We were at about 6-8s, and the user experience was not that good. That may sound weird since we have Akamai caching our files.
In fact, part of it was due to quite a large number of JS files, and also to a tracking JS that was loading 9 more JS files. Now, on the second version, we have a *much* better loading time: 2-4s.
Lesson learned: Aggregate your JS files!! Your browser can load images and CSS concurrently, but it will load JS files *sequentially*! And aggregate your CSS files too.
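
Drupal 6 can aggregate JS and CSS itself from its performance settings, so in practice no custom code is needed. Purely to show what aggregation means, here is a minimal Python sketch that concatenates script files, in order, into a single file. The function name and the file names in the usage note are hypothetical, and there is no minification or cache-busting here.

```python
from pathlib import Path


def aggregate_js(sources, target):
    """Concatenate JS files, in the given order, into one file.

    A ';' is added after each file so that a script missing its final
    semicolon cannot run into the next one. Returns the number of
    bytes written.
    """
    parts = []
    for source in sources:
        code = Path(source).read_text(encoding="utf-8")
        parts.append(f"/* {Path(source).name} */\n{code.rstrip()}\n;")
    bundle = "\n".join(parts) + "\n"
    Path(target).write_bytes(bundle.encode("utf-8"))
    return len(bundle.encode("utf-8"))
```

For example, `aggregate_js(["jquery.js", "carousel.js", "tracking.js"], "all.js")` would turn three requests into one. Order matters: a file must come after the libraries it depends on.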

Problem encountered: As we were developing the website, the Apache computing time got longer and longer, and we had to reduce it. Very often, we were able to reduce the loading time dramatically by commenting out a line or two.
Lesson learned: There is always room to reduce your loading time, and the cause is usually a simple mistake. Put timers in your code, display the time needed to generate each part of the page, and narrow down the part eating most of the CPU time.
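
The "put timers in your code" advice can be sketched as follows. In Drupal you would typically reach for its own `timer_start()` and `timer_read()` helpers; the Python below only illustrates the pattern. The section labels are hypothetical and the `time.sleep()` calls stand in for real page-building work.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock time per labelled section, in seconds.
TIMINGS = {}


@contextmanager
def timed(label):
    """Measure the wall-clock time of a block and add it to TIMINGS."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        TIMINGS[label] = TIMINGS.get(label, 0.0) + elapsed


def slowest_first():
    """Return (label, seconds) pairs, most expensive section first."""
    return sorted(TIMINGS.items(), key=lambda item: item[1], reverse=True)


# Stand-ins for real page sections.
with timed("menu"):
    time.sleep(0.01)
with timed("frontpage_view"):
    time.sleep(0.05)

for label, seconds in slowest_first():
    print(f"{label}: {seconds * 1000:.1f} ms")
```

Printing the sorted list at the bottom of a page during development makes it easy to see which section to look at first.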

The open-sourcing

We announced we were going to open-source this code... and we will do it. All 45 AEF modules. First we need to release RFI, then we need to package the code, and probably remove the bits of non-generic stuff that may remain in the AEF modules.
So that will probably be around the end of the year.
Meanwhile, you can see it live in this DrupalCon presentation.

Conclusion

Every development team has a given level of expertise in development. Having an excellent base such as Drupal for a project simply raises the final quality of that project.
By contributing (soon) these new Drupal modules, we hope to help strengthen the newspaper module base, and we want to thank the Drupal community for this wonderful product.

Comments

DeeZone's picture

It appears you've developed a wonderful in house toolbox of modules. Of course, you must know there are more than a few of the readers of your case study that are drooling to have access to any or all of your AEF modules. AEF Externodes sounds like a dream come true to me. Congrats on your efforts, I hope you share some of your code along with your wisdom.

jdidelet's picture

Congrats on your project! Just a question: did you choose the InnoDB or the MyISAM engine for MySQL?


Julien Didelet
Founder
Weblaa.com

MyISAM

ndeschildre's picture

Hello and thanks.

After the initial slow queries we had on the first version, we ran a benchmark with InnoDB to see whether it behaved better, and how it behaved on slow queries. The results were quite similar, so we stayed with MyISAM.

mig5's picture

A good read, thanks!

You mentioned that the database is the only thing that doesn't scale - I noticed you mentioned a shared NFS directory across the apache nodes, presumably for shared file uploads. From past experience, this is also another difficult-to-scale or single-point-of-failure part of the architecture.

Have you considered implementing any replication at the filesystem level to make it redundant/highly available? I.e. replicating the data with DRBD and using a Heartbeat shared IP or something like that. I have found that that's as much as one can do with NFS, and perhaps other fileserver solutions like GlusterFS or CouchDB etc. might be the way to go.

Also,

Finally a replicated Mysql database linked with the apaches at 1Gbits.

Linked in what way: is this a master > slave replication ring, and are you perhaps pointing each Apache server to a specific MySQL slave? Do you see any latency in the replication / slave IO?

Is there a way for an Apache server to use a different MySQL slave server (if there are several), i.e. using keepalived or anything for IP failover? Or is it not yet at the scale where you would use multiple slave DB servers? Sorry, just curious questions from a sysadmin :)

Looking forward to seeing and using RFI when it goes live :) bravo

ndeschildre's picture

I'll let our super-sysadmin answer that :)

bonobo's picture

This would be a great writeup/case study for the homepage of drupal.org -- have you considered reposting this on d.o?

Cheers,

Bill

ndeschildre's picture

Good idea.
But I will probably have to reformat some stuff and add some pictures...

jcisio's picture

Great story! Thank you. I'm quite eager to see the AEF Image source code, as we're using the image-upload approach (we usually have more than 5-10 images per article).

Question: do you have some benchmark on Panels and Composite? If there is not much difference, I'd go for the more popular.

ndeschildre's picture

Thanks.
Unfortunately, we did not do a benchmark between the two. We just looked at what we needed and chose the less complex one. At the time, we were afraid there would never be a stable version of Panels (there is one now), and Composite, while also still in beta, was much smaller and easier for us to debug if necessary. Which, in fact, was never necessary.

It's now on frontpage

ndeschildre's picture

As requested above, I submitted this post for the homepage, and it's now there: http://drupal.org and http://drupal.org/node/614014

When is the source to be released?

firemyst's picture

Hi, thank you very much for this writeup. I'm working on a similar case for a not-for-profit organization. Any idea when you will release the code? I'm about to go with the OpenPublish distro of Drupal, but thought I'd ask you first because what you did is exactly what we need (probably with 1/3rd the traffic) and could streamline our deployment process to a point I'd never thought of.

ndeschildre's picture

A few modules have already been open-sourced; you can find them on my profile at: http://drupal.org/user/353500

Unfortunately, due to lack of time, we are "only" releasing the modules; we are not planning to maintain a full distro.

hi

ngocthao's picture

Thank you for a great case study. I'm interested in France 24's theme. Would you be willing to send it to me? Thanks again for the case study :)
Happy new year

theme

ndeschildre's picture

Hello,

We are not currently considering open-sourcing the theme used on the website.

Cheers,
Nicolas

sorry

ngocthao's picture

Thanks for your reply. I apologize for my impolite question.
Best regards
