Hi!
First, let me apologize if my English hurts; I'm not a native speaker.
Thanks for the existence of this group. Though I'm here for the first time, I'm sure it has helped people many times before.
I'd like to share my own concerns and worries about Drupal. I'm in charge of a website which has about:
- 11,000,000 page views a month
- 300,000 unique visitors a month
- 100,000 registered users
The setup is:
- each user can post content
Now, as I'm a Drupal addict because of its power, versatility and ease of configuration (after a short read, of course lol), I asked my boss to run our site on Drupal. But I worry about performance. We have more than 600,000 nodes, and the number is going to increase with the new version.
I have read many articles on optimization in general and Drupal optimization in particular, but I lack practical experience. Has anyone implemented a Drupal website with more than 600,000 nodes and lots of users? Is there any way we could do this at low cost?
It would be very helpful if drupal.org could tell us how they planned and implemented their site.
P.S.: we have about 6,000 pieces of content added per day, and the site is mostly accessed by unique (anonymous) visitors.
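To size hardware, it helps to turn the monthly figures into per-second rates. The sketch below assumes a 30-day month: 11,000,000 page views then average out to a little over 4 per second, and 6,000 new nodes a day to roughly 4 per minute. Real traffic is bursty, so `PEAK_FACTOR` is a hypothetical placeholder to replace with measured numbers.

```python
# Back-of-the-envelope traffic rates (assumption: 30-day month).
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30

# Hypothetical: the busiest hour is assumed to carry 3x the average rate.
PEAK_FACTOR = 3


def per_second(count: float, days: float) -> float:
    """Average number of events per second over a period of `days` days."""
    return count / (days * SECONDS_PER_DAY)


page_views_per_month = 11_000_000
new_nodes_per_day = 6_000

avg_pv = per_second(page_views_per_month, DAYS_PER_MONTH)
print(f"average page views/s:      {avg_pv:.1f}")
print(f"assumed peak page views/s: {avg_pv * PEAK_FACTOR:.1f}")
print(f"new nodes per minute:      {new_nodes_per_day / (24 * 60):.1f}")
```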

Comments
one of the biggest
One of the biggest installations is drupal.org.
I see no problems. With a good setup you can handle this.
Willy: You are right to be
Willy:
You are right to be concerned. Until the end of 2008 I managed a Drupal site with 200 modules, 2M nodes, 350K unique visitors each month, 2M page views a month, and a 10%/90% authenticated vs. anonymous site visitor mix. We had to jump through all sorts of hoops (caching, load-balanced web servers, master-slave-slave-slave DB servers, and a CDN) to get the site to offer an acceptable level of performance. I am not sure what you consider to be "low cost", but we were running two load balancers, four low-end servers (web and image), three mid-range servers (DB slaves), and one high-end server (DB master), and I had a very smart and motivated (and not inexpensive) Director of Operations who worked 24x7 keeping it all running.
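A master/slave database setup like the one above usually means sending writes to the master and spreading reads across the slaves. Below is a minimal sketch of that routing rule, assuming plain SQL strings and hypothetical host names. It deliberately ignores transactions and replication lag, so a read issued right after a write may not see that write yet.

```python
import itertools
from typing import List


class ReplicatedRouter:
    """Route SQL statements in a master/slave setup: writes go to the master,
    reads are spread round-robin over the slaves."""

    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP"}

    def __init__(self, master: str, slaves: List[str]):
        self.master = master
        # With no slaves configured, every read falls back to the master.
        self._slaves = itertools.cycle(slaves or [master])

    def route(self, sql: str) -> str:
        # The first word of the statement decides where it goes.
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        if verb in self.WRITE_VERBS:
            return self.master
        return next(self._slaves)


router = ReplicatedRouter("db-master", ["db-slave1", "db-slave2", "db-slave3"])
print(router.route("SELECT nid, title FROM node"))
print(router.route("UPDATE node SET status = 1 WHERE nid = 42"))
```

Round-robin is the simplest balancing choice; a real setup would also pin reads to the master for the rest of a request that has just written.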
When we got the rights to more data and realized that the 2M nodes were going to jump to 25M, we gave up on Drupal and moved to a custom Grails-based solution. Don't get me wrong: I like Drupal and think it is the right solution for many problems, but it is not the right solution for all problems.
What functionality is your site going to have, e.g. what modules do you expect to use? What is the mix of authenticated vs. anonymous site visitors going to be? Do you (or someone on your staff) have a deep understanding of caching? Who are your site visitors, and will you lose traffic and revenue if site performance is lacking? Or is your site compelling enough that site visitors will tolerate poor performance and be forgiving of the occasional speed bump? There are some very talented Drupal performance consultants available, but they are not cheap; can you afford to purchase help when it is needed? Are you comfortable outsourcing your site search to Acquia? Do you have PHP development staff who can recode poorly performing modules without waiting for the community to fix any performance problems?
Andy
Andy Forbes
CEO
Yeaton Partners
(e) aforbes@yeatonpartners.com
(w) http://www.yeatonpartners.com/
200 Modules......
Willy, here's the presentation Narayan and I gave at DrupalCon DC about how we have Drupal.org handle over 1.5M unique visitors, 25M page views, and a low anonymous-to-authenticated page view ratio. http://dc2009.drupalcon.org/session/drupalorg-infrastructure-status-and-...
I know something about Andy's project, which was based on Drupal 5. Certainly the decision to run 200 modules that hadn't been tested at that level of performance is an architectural decision that has consequences. Some of the most popular modules, like Views and CCK in Drupal 5, were not proven to scale from millions to tens of millions. Drupal 6 brings significant improvements, and there are alternatives that are known to scale better. Lower-level frameworks don't give you as much out of the box, but they start from a point that allows fine programmatic tuning that GUI tools do not.
Willy, it is important for your business to understand how much performance and scalability matter versus features. It may make sense to get some early architectural scoping to make sure your needs are being met.
Drupal 6 made some significant performance improvements in caching and added extensible support for other scalability layers. Andy is right that you want your operations person to be well versed in scaling the LAMP stack, including tuning MySQL, memcache, file systems, web servers and PHP. Your site's unique traffic pattern, third-party integrations, and feature set can put considerable strain across 8-10 layers of your stack, only one of which will be Drupal. If your operations folks don't understand that stack, or are learning by trial and error, it can be problematic.
The good news is that the LAMP stack is very common for Internet applications, and it's the same stack that Yahoo and Facebook use to scale to billions of page views. If you need help, don't hesitate to get in touch through my contact tab.
Cheers,
Kieran
Drupal community adventure guide, Acquia Inc.
Drupal events, Drupal.org redesign
I think that frustrates a
I think it frustrates a lot of people, myself included, that an ASP.NET site is able to handle 1.2 billion page views per month, most of them from authenticated users (it's a dating site), with only a handful of servers (7-8), and still with room to grow.
That's about 500-600 page views per second. Last I heard, it is close to 2 billion page views per month now.
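A quick check of those rates, assuming a 30-day month: 1.2 billion page views a month averages about 463 per second and 2 billion about 772 per second, so the 500-600 figure sits between the two. Peak rates will of course be higher than these averages.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: 30-day month


def avg_page_views_per_second(page_views_per_month: float) -> float:
    """Average request rate implied by a monthly page view total."""
    return page_views_per_month / SECONDS_PER_MONTH


for monthly in (1.2e9, 2.0e9):
    rate = avg_page_views_per_second(monthly)
    print(f"{monthly / 1e9:.1f} billion/month -> about {rate:.0f} page views/s on average")
```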
http://highscalability.com/plentyoffish-architecture
With Drupal, I haven't been amazed by the figures. Even a site with only 25 million page views per month needs a few servers. PoF, on the other hand, aims to serve 1 billion page views per month per server.
http://plentyoffish.wordpress.com/2006/12/07/scaling-sql-server-2005-nat...
That practically puts Digg (which is LAMP) to shame.
Markus claimed to have solved the scalability issue, but no one really knows how he did it.
I know, it's entirely different. But Kieran, while Drupal is just one layer of the stack, its efficiency determines speed. I've seen a Drupal page run close to a thousand queries. Even with memcached and other caching methods, it still helps to optimize and reduce the queries.
Assuming everything else is equal, the other layers of the stack are known to perform very well. But the question is: how does Drupal compare to the alternatives? I think that's what most people try to figure out first before picking it as the content framework for their project.
Will choosing Drupal cause scalability issues in the future, or will developing an app with a lower-level framework save a lot of cost later? After all, we are all in business for the long haul.
I personally use Drupal in various projects. But as I now have ideas for several bigger projects, I am reconsidering whether Drupal is the right tool for the job.
Does anyone else find it
Does anyone else find it ironic that http://highscalability.com is timing out?
Apples and apple martinis
Hendry, PlentyOfFish was clearly built by a person with tremendous talent. Comparing this ASP.NET implementation with Drupal is not realistic. With ASP.NET you are programming almost all the features yourself from scratch, rather than downloading 100 to 200 modules and configuring them. If you have the talent to program all that functionality from scratch, then optimizing for performance is just another wondrous skill you happen to have. But try taking the PoF specifications to your favorite ASP.NET shop and ask them how much it will cost to implement. Then take the same spec to a Drupal shop and compare.
There are a couple of details you should look at so you are comparing accurately. They use a CDN, as do most high-traffic sites. I recently had a performance discussion with someone who argued that Drupal couldn't scale. I asked them a few questions, and they finally confessed that they were just pushing static pages of their site to Akamai servers and delivering hundreds of millions of pages from the CDN. If you want to do the same, try the Boost module. If you are generating revenue, then you can afford a CDN and scale to billions of anonymous pages relatively easily. So while PoF is only using 2 web servers, it is probably using dozens of servers in the CDN. You can do the same with Drupal. On Drupal.org we don't use a CDN, but we could.
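The Boost module serves pre-rendered pages to anonymous visitors, so most of those requests never reach PHP or MySQL. The sketch below is not Boost's code; it is an in-memory Python stand-in for the same idea, assuming a hypothetical `render` function and a fixed expiry time. Only anonymous GET requests are cached; logged-in users always get a freshly rendered page.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class AnonymousPageCache:
    """Full-page cache: anonymous GET requests are answered from stored HTML,
    everything else (logged-in users, POSTs) is rendered fresh."""

    def __init__(self, render: Callable[[str], str], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._render = render          # hypothetical page renderer: path -> HTML
        self._ttl = ttl_seconds        # how long a stored page stays valid
        self._clock = clock
        self._pages: Dict[str, Tuple[float, str]] = {}

    def get(self, path: str, method: str = "GET", user_id: Optional[int] = None) -> str:
        # Never cache personalized pages or requests that change data.
        if method != "GET" or user_id is not None:
            return self._render(path)
        now = self._clock()
        entry = self._pages.get(path)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]            # cache hit: the renderer is not called
        html = self._render(path)
        self._pages[path] = (now, html)
        return html

    def invalidate(self, path: str) -> None:
        """Drop a stored page, e.g. after the node behind it is edited."""
        self._pages.pop(path, None)
```

The same split (anonymous pages cached, authenticated pages dynamic) is what makes a CDN or reverse proxy effective for mostly anonymous traffic.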
I could go through in detail and point out how many of the choices are hardware architecture choices and not software limitations of Drupal. But let's look at the post.
So let me summarize the keys to success.
E.g. Apple rocks because of Objective-C!!! (not Steve Jobs)
Let me know when you get past stage one.
Cheers,
Kieran
LOL - don't hold back
Kieran:
LOL - don't hold back; tell us how you really feel :-) . If I had the previous company to do over again, I'd still start with Drupal. I'd start with a development firm that knew what they were doing, I'd develop a custom module to serve up the dataset that was 99% of the nodes in my system, I'd use SOLR from day one, and I'd be careful to design the pages so I could cache at the page-part level and make aggressive use of a CDN. Hindsight being 20/20, and all that.
I think mostly I am reacting to wilmar81 asking the right questions and his question getting a simple "yeah - Drupal can do that" answer when the right answer was "it depends".
Also, I stand by my recommendation that he spend some money on you, or someone like you. I certainly wish I had when I was getting started with Drupal.
Andy
Note I also mentioned that
Note that I also said ASP.NET and Drupal are two completely different things; I'm just trying to make a point. Drupal is just one part of the stack, but it uses all the other components: memcached, Apache, the CDN, etc.
In the case of 1,000 queries per page, optimizing Drupal itself would surely improve speed. Perhaps you could make it twice as fast by cutting the queries to 500 per page, assuming everything else stays the same and every query takes the same time. You get my idea.
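The "twice as fast" estimate holds only if queries are the whole page cost. A simple linear model (an assumption, matching the post's premise that every query costs the same) makes that visible: with hypothetical 0.5 ms queries, cutting 1,000 queries to 500 gives exactly 2x when nothing else takes time, but only about 1.7x if 100 ms of other work (PHP execution, theming) remains.

```python
def page_time_ms(queries: int, ms_per_query: float, other_ms: float) -> float:
    """Page generation time: fixed non-query work plus a constant cost per query."""
    return other_ms + queries * ms_per_query


def speedup(before: int, after: int, ms_per_query: float, other_ms: float) -> float:
    """How many times faster the page gets when the query count drops."""
    return page_time_ms(before, ms_per_query, other_ms) / page_time_ms(after, ms_per_query, other_ms)


# Hypothetical figures: 0.5 ms per query, with and without 100 ms of other work.
print(speedup(1000, 500, ms_per_query=0.5, other_ms=0.0))    # queries only
print(speedup(1000, 500, ms_per_query=0.5, other_ms=100.0))  # with other work
```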
I'm not saying it's entirely Drupal. A strong background in and knowledge of the whole stack is certainly required, but I have often seen Drupal be the bottleneck, though, as you say, that is because of improper deployment of the content framework.
So the whole point for the OP is: assess your needs. If hiring a firm to implement Drupal costs you a certain amount of money, and you then need 3-20 times as many servers to serve the same number of people, could you still compete in the market?
In PoF's case, if he had to deploy that many servers and hire that many staff, a few million dollars would not be enough. But since his architecture scales well, PoF survives and can afford its free model while he figures out how to monetize the traffic.
P.S. You can only view a few PoF pages before you need to log in, so they very likely have a high number of authenticated users.
+1 on what Kieran and Hendry are saying
Willy:
When I started my Drupal project I turned to a firm that I thought had solid Drupal experience. Unfortunately, it turned out they were mostly learning at the same time I was. It was a false economy on my part not to start my project with one of the established Drupal firms or consultants. If I have one piece of advice for you, it is to spend a little money up front and buy a week or two of someone like Kieran. Having someone who has "been there and done that" brain-dump their experience to you will save you a ton of time and money.
As Kieran observed, I ended up making some architecture choices that cost me dearly, and that I could have avoided if I had started my development with someone that really knew Drupal inside and out.
Andy
Oula!
Wow!
I'm a bit scared reading your posts, guys. Yes, the first comment gave me a "quantum of hope", and so did many posts I read elsewhere (I still can't find them again).
Let's start with Andy. I must admit this is my first real project with a caching system. A few days after we launched version 2.0 of our website, we had load issues on our server, and I fixed them with cache_lite, which is a good tool, but not as "complex" as the things you described. The most I will install is 100 modules, and I think that's already enormous for us. Right now we have at most about 500 members logged in at the same time, so we have far more anonymous than authenticated visitors. But all content is moderated or posted by moderators.
Modules > Because of the mix of content types, I will use CCK and its related modules. To allow real versatility (we're a classified ads site, but we have other content that generates good traffic), I want to use Views and Panels. For friendly URLs: Token, Pathauto and Global Redirect. Those are the main modules I want, plus some of the defaults.
Caching > No, nobody here has a deep understanding of caching. But as I said, I read some articles which helped me understand the need for a good architecture (7 stages of scaling web applications). And I think the guys managing our server can handle it.
Traffic, Revenue > Of course, losing traffic would be dramatic for us, and so would losing revenue.
Kieran, I went to your link, but couldn't understand anything. You speak too fast for me lol. I'm taking an English class soon, so my ears should be trained by the end of October lol.
Let me summarize where we are now. We have one dedicated server which handles everything: MySQL, HTTP, files... everything. So although it is a dedicated server, it behaves like shared hosting, since 3 or 4 other sites run on the same machine.
For the move to Drupal, I intended to install a reverse proxy (like Squid), APC and memcache. I also planned to ask for another server for MySQL only, if necessary. That's all I have planned for optimization, but I'm afraid it's not enough.
Andy, I think we can afford more servers. It doesn't depend on me, but on my boss. I forgot to add that even though I'm not a Drupal expert, I know a bit of PHP, and I can read the Drupal documentation lol.
Currently I have two sites running on Drupal (http://www.bamena.com/ and http://www.journaux.ma/). The first one runs Drupal 6 and the second Drupal 5. I'm satisfied with both, though bamena.com causes me some worries.
Right. This is where we are.
Thank you for your posts.
--Willy
Life is worth living, no matter what you're going through. Only your weakness can overcome your strength.
--Sites
http://www.telecomaroc.com/
http://www.squaresystems.co.ma/
Sorry, you just can't
Sorry, you just can't compare ASP.NET to Drupal. It's a completely different beast.
What wilmar really should answer is how many authenticated users he has. I haven't seen an answer to that yet.
I think I did answer it on
I think I did answer it in my second post. Right now the maximum we have is 500 users authenticated at the same time (active within the last 15 minutes). And I agree, we can't compare ASP.NET to Drupal.
--Willy
Traffic pattern
Apologies to Kieran. I didn't understand when you mentioned "traffic pattern". I've just found out what it means.
Unique visitors: 23,000/weekday, 15,000/weekend day
Page views: 400,000/weekday, 200,000/weekend day
Our server is a quad-core Xeon with 4 GB of memory.
As I said before, much of the traffic comes from anonymous visitors. But we moderate about 4,000 nodes/day; about 2,500 are published and the rest deleted.
I've got to leave for today. It's evening here.
--Willy
The most you can squeeze is
You squeeze out the most when you build custom scripts with a custom database structure. You cannot compare Drupal to a custom script. For example, if I use Drupal for a personal blog with most of the features disabled, I can write custom PHP scripts which do the same but get >10x the performance. I see Drupal as a tool to get your idea running; then you will have the funds for a very, very custom solution. PlentyOfFish is an example of the latter: the hard work of building a system from scratch.
I had a social site with ~1,000,000 page views per day, which was designed and coded by a very gifted teenager who, at that time, was good at social engineering but a very poor PHP programmer. The site grew pretty quickly, and we moved from shared hosting to a dedicated server with 2 GB of RAM. It was overloaded in minutes... I started to look at:
After all that, the same server was running at a load of 0.2-0.4 without any problems for months. Yes, it was a ~1,000,000 page views/day site, with mostly registered users (I do not remember the exact count)!!!
It was not a Drupal experience, but the principles are the same:
drupal+me: jeweler portfolio
Nice points, adding some more
First, I do not think we need to compare Drupal to ASP.NET. There are plenty of examples of systems in PHP itself which perform great.
And Drupal is one of them. "Great" is a big word; there are many, many things to consider. As Kieran says, you cannot drop modules into ASP.NET and see them work wonders.
But as playfulwolf says, Drupal is meant for many uses, so it is bound to support the whole module system, theme system, ACL, etc., blah, blah... Many social websites may not need the whole ACL, just 3 user types (admin, authenticated, visitors), all hardcoded into the system.
We have been doing custom PHP for the last 3 years because we have a very low budget. I am only about 25 now. We started with the money from Google Summer of Code 2006, which I did for Drupal. Then I got an offer for a social website. At that time (and still now), I was more experienced in custom PHP than in Drupal specifically, so I decided to go that way, and we are still mostly custom PHP. We delivered 9M+ page views/month from one single server, mostly authenticated users, with 33 page views per user, 18 minutes per visit, etc., on average. The site is currently unmaintained, though: there are no buyers in the social segment in Latin America. Anyway, the point is, as Kieran states: experience.
Drupal lets you start fast and easily. But in my opinion, if you have to tweak Drupal to the core and hack it so much that you lose update compatibility, then custom could be a better option. That is the choice we made, so we use Drupal for projects where we do not have scaling nightmares. High performance, big numbers and scaling are pure bulls***t for sites just starting up. Unless the concept is as unique as Twitter, you will have the chance to grow your website along with the user numbers. But in your case you already have a high-volume website, so your choices are limited.
For sites where we need to handle a few million page views for less than USD 500 a month in hosting, we build custom apps. Yes, all hosting + CDN (all on AWS) within that price. To be honest, there is never a perfect way. No single architecture will blindly suit another site unless the sites are clones from every angle.
But one thing you should never do, is take example of PoF in hopes of doing same. Treat your case wisely. Brainstorm with experienced people like Kieran and other maintainers of high-traffic Drupal websites. It's always easy to compare their numbers with yours. If they match, great. If they don't, you could go as far as developing a custom app in ASP.NET too (don't go that way... evil :P).
brainless,
Sumit Datta,
brainlessphp.blogspot.com
Web Developer
Its all true what you say,
It's all true what you say; it just wasn't the point of the posts above.
It's clear that you get Drupal's power at one price, just as you get ASP's speed at another.
But it would be interesting to see how things would look IF a comparable non-PHP CMS came along.
At the moment, no other platform offers a free, complete, out-of-the-box CMS like Drupal, Joomla or TYPO3.
They are a good 3-5 years behind, but there are already some good commercial non-PHP CMSs out there.
I am sure we will see other full-featured CMSs beyond PHP in the near future, and I am also sure that will be good for the PHP world too. Competition is always good.
Tweaking is another problem: custom code will not save you from the bottlenecks of the core. For example, Drupal turns off the block cache when you install custom node access control (even when your blocks don't use any node data), and user profiles and file paths are not cached, which scales badly... A small patch there works wonders.
We have ONE script, which is invoked to create a site from a repository of modules, core files and media. It does all the copying and applies the patches. The goal really is to have a kind of "robot script". To update a site, I just copy the new files into the file/module base directory, then invoke back, create and install: 3 scripts.
I only have to check things manually when a patched module or core file has changed, but in 9 out of 10 cases you can see at first glance whether an adjustment is needed or not. It works like a charm, I must say.
Any install step that can't be automated is changed so that it fits the 3-script process. That way I avoid the human error factor, and the only extra work is the one-time effort of writing the script lines and the patches.
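A minimal sketch of the "create" step of such a robot script, assuming a hypothetical repository layout (`core/` and `modules/<name>/`) and patches that apply with `patch -p1` from the site root. The backup and install steps are left out, and the function and directory names are illustrative, not the author's actual script.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Iterable, List


def build_site(repo: Path, site: Path, modules: Iterable[str],
               patches: Iterable[Path] = ()) -> List[str]:
    """Assemble a site from a repository of core files and modules, then
    apply local patches. Returns a log of the steps taken.

    Hypothetical layout: repo/core holds the core files and
    repo/modules/<name> holds each module."""
    log: List[str] = []
    if site.exists():
        shutil.rmtree(site)  # start from a clean copy so no stale files survive
    shutil.copytree(repo / "core", site)
    log.append("copied core")
    for name in modules:
        target = site / "sites" / "all" / "modules" / name
        shutil.copytree(repo / "modules" / name, target)  # creates parent dirs
        log.append(f"copied module {name}")
    for patch_file in patches:
        # -p1 strips the first path component, e.g. the a/ and b/ prefixes of git diff.
        subprocess.run(["patch", "-p1", "-i", str(patch_file.resolve())],
                       cwd=site, check=True)
        log.append(f"applied {patch_file.name}")
    return log
```

With `check=True`, a patch that no longer applies stops the build, which is exactly the moment where a manual look at a changed core file or module is needed.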
@michtoen The tweaking is
What is this "small patch" that you mentioned?
Thanks.
Checkout www.daimonin.org, i
Check out www.daimonin.org; I moved our project page to Drupal five weeks ago.
For a site like that, the block cache is the most important thing. As you can see, the site is reasonably fast.
We don't have many nodes at the moment, but we have 9k users.
Even as a registered user, you can fetch a profile with a blog overview, friend list and pictures in 250-280 ms when all blocks are served from the cache. So the second request will be fast, and, importantly, so will requests while the site is under load, because then the chance is high that each block was refreshed recently.
If you count the blocks, you will see that this only works thanks to good caching. So it is a must-have: on a site like ours it cuts the access time by half or more.
Sadly, the core disables all block caching as soon as you install any access control system, which is a must-have for a social site. Read more here:
http://drupal.org/node/186636
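The point of patching this is that a block which does not depend on node access can stay cached even when a node access module is installed. Below is a hypothetical per-role fragment cache illustrating that rule; the `uses_node_access` flag, the block names and the expiry time are assumptions for illustration, not Drupal's API.

```python
import time
from typing import Callable, Dict, FrozenSet, Tuple


class BlockCache:
    """Hypothetical per-role block (fragment) cache.

    Entries are keyed by (block_id, roles), so every user with the same role
    set shares one rendered copy. Only blocks flagged as depending on node
    access are rendered fresh every time; all other blocks stay cached even
    when a node access module is active."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Tuple[str, FrozenSet[str]], Tuple[float, str]] = {}

    def render(self, block_id: str, roles: FrozenSet[str],
               build: Callable[[], str], uses_node_access: bool = False) -> str:
        if uses_node_access:
            # Content may differ per user's grants, so never share it between users.
            return build()
        key = (block_id, roles)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        html = build()
        self._entries[key] = (now, html)
        return html
```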
Another one is caching the user profiles. I found a patch on the site (I lost the link) which removes the dreaded multiple DB calls for the profile. On a social website you will easily load the user profile 15-25 times. Personally I count that as a design flaw, but well.
If anyone wants it, drop me a note. It was posted here on drupal.org as a D7 patch; I ported it to D6, and at least for our site it works like a charm.
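The patch itself is not reproduced here. As a sketch of the underlying idea, assuming a hypothetical `fetch` function that performs one database query: memoize user records for the duration of a single request, so loading the same profile 15-25 times costs one query instead of 15-25.

```python
from typing import Callable, Dict


class RequestUserCache:
    """Per-request memo for user records.

    The first load of a uid calls `fetch` (hypothetical: one database query);
    repeated loads of the same uid during the request come from memory.
    Create a fresh instance per request so later edits are picked up."""

    def __init__(self, fetch: Callable[[int], dict]):
        self._fetch = fetch
        self._users: Dict[int, dict] = {}

    def load(self, uid: int) -> dict:
        if uid not in self._users:
            self._users[uid] = self._fetch(uid)
        return self._users[uid]

    def reset(self, uid: int) -> None:
        """Forget one user, e.g. right after that user is saved."""
        self._users.pop(uid, None)
```

Keeping the cache per request avoids the invalidation problems of a long-lived cache while still removing the repeated loads within one page.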
I also replaced "trivial" modules, for example flat comments, with direct core patches. These modules often do things that you can patch directly in the core by changing a single value or line. I prefer to write a patch for that: a bit more work once, and again on each update, but one hook call less.
Also, check out this page: http://www.daimonin.org/developers/daimonin-project
As you can see, there are badges under the user avatars. These badges represent the user's 2 most valued roles. But the 2 badges are calculated only once. I added a small patch to the user save/load code. When you save a user, a subfunction works out the roles and generates the badge code. Then I store that in the... signature of the user. The signature is part of the Drupal user table, and it is loaded on every page for nothing, because we don't use it (I attached SMF as our forum through a custom session bridge to Drupal). Really for nothing, because user_load() always loads the user table entry with a *.
Only the comment code, when fetching the user data for a comment, doesn't grab the whole table entry, for whatever reason. I had to patch core comments there so the signature is fetched too.
Small changes, 2 core patches, and voilà: a full-featured role badge system.
Well, it's custom and of course not for everyone (not if you use the signature somewhere else), but that's something I count as customizing.
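A sketch of the "compute once on save" idea, with a hypothetical role ranking and a plain `badges` key standing in for the reused signature column. Page views then only read a stored string instead of recalculating roles.

```python
from typing import Dict, Iterable

# Hypothetical ranking: a higher number means a more valued role.
ROLE_RANK: Dict[str, int] = {
    "administrator": 100,
    "developer": 80,
    "artist": 60,
    "tester": 40,
}


def badge_markup(roles: Iterable[str], limit: int = 2) -> str:
    """Build badge HTML for the `limit` most valued ranked roles of a user."""
    ranked = sorted((r for r in roles if r in ROLE_RANK),
                    key=lambda r: ROLE_RANK[r], reverse=True)
    return "".join(f'<span class="badge badge-{r}">{r}</span>' for r in ranked[:limit])


def save_user(user: dict) -> dict:
    """Compute badges once at save time and store them on the user record."""
    user["badges"] = badge_markup(user.get("roles", []))
    return user
```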
I did some more things. Most are pretty minor and safe to do, but together they give you the extra milliseconds saved that make the difference. But honestly, that's the same for every CMS. In 6 years I have never run a system I hadn't pimped up a bit with core patches.
Sumit, I took PoF just an
Sumit, I took PoF just as an example.
This is the last time I'll be saying this. It is a business decision. Drupal is not always the solution, as much as I love Drupal. Period.
How could you brainstorm with a Drupal person and still be objective?
In business, ASP.NET vs Drupal is a valid comparison. We have NOT YET come to a conclusion about a CMS, a CMF, a lower-level framework, or even straight PHP or ASP, so every benefit and drawback should be considered.
If your site is designed to be around in 10 years, ease of implementation is NOT always an advantage. It may become a drawback, especially because flexibility always comes at a cost too.
PoF example
Hendry, I think you got me wrong. The statement was meant for users.
I meant that users out there do take such cues, and they are not to be blamed for that either. Technology is complex as it is, and it is difficult for people to understand so many terms. What I meant was a general statement that people should not imagine what works for PoF will work the same way for them. "But one thing you should never do, is take example of PoF in hopes of doing same.": that is advice for users.
brainless,
Sumit Datta,