Drupal makes an insane number of queries. Can that be considered good?

gdtechindia's picture

I am a big fan of Drupal and have used it on more than 50 websites so far. My team has noticed many performance issues on a website that serves high traffic. We used modules like Boost for static file caching.

Some programmers I consulted told me that we should look into database optimization, so I hired a person for DB optimization. He made a profiling tool to record all the queries our Drupal site makes.

The site now serves around 10k visitors a day, and the number of queries for the last 12 hours is 6.1 million.

Some Facts important to be mentioned here

We are using very limited Modules

Views >> We recently stopped using the "most popular" and "recent popular" views due to performance issues, but Views is still enabled.
AcidFree Galleries >> Showing the last three images posted in galleries

We do not log access files due to performance reasons.

Server has 6 GB RAM, Intel Quad Core, Fast Hard Drive

Now I want to ask: why does Drupal make so many queries? Even though we have not installed many modules (in our case around 3-4, in addition to the core modules of Drupal), this number is too big. 12 million in a day for 10k visitors?

Drupal may work fine for small websites, but for websites that start getting some traffic, this CMS doesn't seem to work as well. Does the Drupal team need to look at the performance side?

Regards

Comments

That seems absurdly high. I

slantview's picture

That seems absurdly high. I don't know about stats for database queries, but we do anywhere from 500k - 1m unique visitors per day and we are running on one database server (with a hot spare doing replication), 4 web servers and 2 memcache servers, with most of the machines sitting idle.

I would suggest you look at cache router / memcache / boost.

http://drupal.org/project/cacherouter
http://drupal.org/project/memcache
http://drupal.org/project/boost

Steve

hi thanks for your reply. We

gdtechindia's picture

Hi,
thanks for your reply.
We are already using Boost.
I will try Cacherouter now.

We tried memcache, but for some reason, it didn't work well for us.

I saw your case study for performance about divx.com, hope to see more results.
BTW, we are running drupal 5.14

Acid free galaxies

Johnny vd Laar's picture

For your Acidfree gallery block you might have a look at the http://drupal.org/project/blockcache module, which gives you a cached version of the block.

A couple of comments about caching and queries

Amazon's picture

First, who cares how many queries Drupal has? If your database can handle it, why worry how many? Find your slowest queries, using MySQL's slow log and make sure the slowest queries in aggregate aren't slowing down total database performance time. If you believe the individual page load times are being hampered by excessive queries then use the devel module to identify which queries are performing slowly and tune those specific queries. Tune your database accordingly.

Second, when you say you were having problems with most viewed and most popular think about what this means. For every page load, you are recalculating all your views and the most popular. The more content you have and the more viewers you have this obviously won't scale. The solution is to use block caching so that all these dynamic features are cached to be calculated every 5 minutes instead of a hundred times a second.
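
A minimal sketch of the idea in this second point, written in Python rather than Drupal code: an expensive "most popular" computation is wrapped in a time-based cache so that it runs at most once per interval instead of on every page load. The names `TimedCache` and `compute_most_popular` are hypothetical, and the 5-minute lifetime is simply the figure mentioned above; Drupal's own block caching is not shown here.

```python
import time


class TimedCache:
    """Cache one computed value and rebuild it only after ttl seconds."""

    def __init__(self, compute, ttl_seconds, clock=time.monotonic):
        self._compute = compute      # expensive function, e.g. a "most popular" query
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the behaviour can be tested
        self._value = None
        self._expires_at = None      # None means "nothing cached yet"
        self.rebuilds = 0            # how many times the expensive work actually ran

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._compute()
            self._expires_at = now + self._ttl
            self.rebuilds += 1
        return self._value


def compute_most_popular():
    # Stand-in for the real database work (hypothetical data).
    return ["node/12", "node/7", "node/3"]


block = TimedCache(compute_most_popular, ttl_seconds=300)
for _ in range(100):          # 100 page loads inside the same 5 minutes
    block.get()
print(block.rebuilds)         # the expensive computation ran once
```

The trade-off is staleness: visitors may see a list that is up to five minutes old, which is usually acceptable for this kind of block.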

Third, do you know you have a database problem? Did you use the devel module to compare the total page generation time with the total database query time? I frequently see people worrying about their database performance when in fact the database accounts for only 0.3s of a page that is measured at 3-5s in total. Page generation and load times frequently have to do with optimizing other parts of the LAMP stack.

For detailed analysis of how to tune your Drupal site start here: http://tag1consulting.com/performance_checklist

Cheers,
Kieran

Going by number of queries

Jamie Holly's picture

Going by number of queries doesn't really help. As an example, a site I managed used to be on Wordpress. The most queries needed for any page was 17. If we had more than about 15,000 page impressions in an hour it would crash our database. I moved the site to Drupal and our pages average between 80-100 queries. Now we have had periods with over 50,000 page impressions in an hour and the database server sat near idle.

What you need to do is look at the slow queries. I would enable the query log in the Devel module and select to sort it by duration. Take your longer queries and run an explain on them in MySQL and see if they are doing anything bad like full table scans or filesorts.

You do seem to have an abnormally high number of queries (averaging about 600 per page). Using Devel you can narrow that number down further - such as if one page is generating a lot more queries than another.
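
As a rough cross-check of a per-page figure, the numbers from the original post can be worked through directly. This is only a sketch: the post gives visitors, not page views, so `pages_per_visitor` is an assumed value, and the 12-hour count is simply doubled to get a daily total, as the original poster did.

```python
def queries_per_page(queries_per_day, visitors_per_day, pages_per_visitor):
    """Average queries per page view, given an assumed pages-per-visitor ratio."""
    page_views = visitors_per_day * pages_per_visitor
    return queries_per_day / page_views


queries_per_day = 2 * 6_100_000      # 6.1 million in 12 hours, doubled
visitors_per_day = 10_000

for pages in (1, 2, 5):              # assumed values, not from the thread
    avg = queries_per_page(queries_per_day, visitors_per_day, pages)
    print(f"{pages} page(s) per visitor -> {avg:.0f} queries per page")
```

With two page views per visitor this works out to about 610 queries per page; fewer page views per visitor would push the per-page number higher still.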

Another thing that generates a lot of queries, depending on the site's layout, is the path module. There are various patches around that help eliminate many of those queries, either by ignoring lookups for certain paths (ie: admin), or by utilizing another caching backend for them like memcache.
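
The caching idea behind those path patches can be sketched in a few lines: remember each lookup result for the rest of the request, so that repeated links to the same path cost one query instead of many. The Python below is an illustration only; `PathLookup` and its alias table are hypothetical and do not mirror the internals of Drupal's path handling.

```python
class PathLookup:
    """Resolve path aliases, hitting the (simulated) database once per path."""

    def __init__(self, alias_table):
        self._alias_table = alias_table   # stand-in for the stored aliases
        self._seen = {}                   # per-request cache of resolved paths
        self.queries = 0                  # number of simulated database queries

    def _query_database(self, path):
        self.queries += 1
        return self._alias_table.get(path, path)

    def lookup(self, path):
        if path not in self._seen:
            self._seen[path] = self._query_database(path)
        return self._seen[path]


lookup = PathLookup({"node/1": "about-us", "taxonomy/term/3": "tags/drupal"})
for path in ["node/1", "taxonomy/term/3", "node/1", "node/1", "admin"]:
    lookup.lookup(path)
print(lookup.queries)   # 3 queries for 5 lookups
```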


HollyIT - Grab the Netbeans Drupal Development Tool at GitHub.

Totally agree - not the

playfulwolf's picture

Totally agree - it's not the number of queries but the quality of the database design that matters. I once reviewed the database of a (non-Drupal) site with ~1 million pageviews/day, averaging ~40-100 queries/second, built on a custom PHP script, which had... a database with NO indexes at all!

Of course, Drupal is very smartly designed, but the slow query log should be very high on any performance checklist.

---
naslenas.com. Something not interresting about Drupal.

drupal+me: jeweler portfolio

On a heavy site

kbahey's picture

On a heavy site, you can see up to 1000 questions per second, as measured by

show status like 'Questions';

So that is around 86.4 million a day. Even with memcache enabled for anonymous users.

However, it may be easier just to enable devel and see what it reports as the number of queries for some representative pages, and see if it is high.
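
The server-wide `Questions` value is a running total, so a per-second rate comes from reading it twice and dividing the difference by the elapsed time. The sketch below does only that arithmetic in Python; the sample readings are made up, and fetching the counter from MySQL is left out.

```python
def questions_per_second(first_sample, second_sample, seconds_between):
    """Rate of statements between two readings of the Questions counter."""
    if seconds_between <= 0:
        raise ValueError("seconds_between must be positive")
    return (second_sample - first_sample) / seconds_between


def per_day(rate_per_second):
    """Project a per-second rate over 24 hours (86,400 seconds)."""
    return rate_per_second * 86_400


# Hypothetical readings taken 60 seconds apart.
rate = questions_per_second(5_000_000, 5_060_000, 60)
print(rate)           # 1000.0
print(per_day(rate))  # 86400000.0
```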

Drupal performance tuning, development, customization and consulting: 2bits.com, Inc..
Personal blog: Baheyeldin.com.


By 1000 questions per second

borgo's picture

By 1000 questions per second do you mean db queries?

Anyway 1000 per sec = 86.4 mil queries per day. Now isn't that an amazing number...

We are running such a website, and using devel I have measured over 1000 per sec many times. We have experienced MySQL overload and crashes a few times. Boost helped a lot, but we still don't have a real fix.

Litenode

Amazon's picture

Here's an interesting take on reducing load time.

http://www.developmentseed.org/blog/2009/feb/4/litenode

Kieran

Optimize code?

damienmckenna's picture

Have you written any template code that's running unnecessary queries, or module code that could be improved? Yesterday I found some code I wrote ages ago which was very sloppy as I wasn't aware of how Drupal could automate some things for me, so half an hour rewriting it and updating related code dropped about twenty queries off my site's homepage.

200-300 queries per typical page

escoles's picture

That's what I'm seeing on one client site right now, at least according to the Performance logs & Devel module. That's on a node page, viewed by an anonymous user, displaying 7 blocks (search, Primary Nav, another menu block, one CCK block, a Print/Email block, and two text blocks), with page caching in "normal" mode. The node on that page has 2 CCK fields, both filtered textareas.

"Tune your stack" is really not a very useful answer for the vast majority of sites. I realize this is a "high performance" group, but when you are running 300 queries per typical anonymous page, it seems to me that there's a problem right out of the gate.

Misbehaving Module

kwinters's picture

A single misbehaving module could explode the number of queries.

If you can narrow it down to what module / block / etc. contributes the most queries, that will go a long way towards fixing it.

Ken Winters

I wasn't trying to hijack the

escoles's picture

I wasn't trying to hijack the thread w/ a support request, FWIW. But since you're asking:

reptag is the single biggest offender. It runs 14 queries, each of them 13 times (182 queries). (I'm getting these details from a logged-in user, but the math makes it unlikely that reptag is doing much less for anons.) So, yes, that's pretty bad. I made heavier use of reptag on that site than I have since, which could help explain why the newer sites are so much faster. The real mystery is why it's showing up on this page at all, since I don't use any replacement tags there.

The most expensive queries are ones that I suspect aren't getting run for anons. (Curiously, though, the perceived load speed is about the same for anon and admin.) I'm doing some work on this site tonight, so maybe I'll enable log visibility for anons for a little while and get some #s from that to be sure what I'm really looking at.

What's especially interesting to me is that there are still so many unique queries. Leaving reptag aside, that's about 180 queries, most of them unique. (Based on the anon figure of 368, I think I said. The actual admin-login page I'm getting details from has about 398 queries.)

A cached page should only be

dalin's picture

A cached page should only be generating a handful of queries (mainly to load the session, check that the user isn't blocked, and load the cache). If you are getting 300-400 for an anonymous user requesting a page that is in the cache then there is something seriously wrong with your site.

--


Dave Hansen-Lange
Director of Technical Strategy, Advomatic.com
Pronouns: he/him/his

Looks as though that's very difficult to verify

escoles's picture

I think you must be talking about Aggressive Mode caching. I've got normal mode caching (not aggressive mode), no block caching.

In that scenario, I'm seeing 314 queries. Some highlights: cache_get, 43 (at least 15 of those are non-unique queries); drupal_lookup_path, 31; menu_get_item, 13.

If I switch to aggressive mode caching, I see 332 queries on the first anon load of that page after the change in cache settings, and then I don't see that page show up in the Performance Log again, because the page loads don't get recorded there.

So, I switch back to normal caching and reload the page. Again, nothing shows up in Performance Log. Clear browser cache; try a different browser; still no page load.

Only when I clear the site caches do I see an anon page load recorded in the Performance Log. (This is with normal caching, not aggressive.) Then I don't see another until the next time I clear the site caches.

If I make Devel data visible to anon users, I can see the queries, but again, I see 312 on the page (311 in Performance Log).

So what I'm seeing is that Devel at least doesn't provide a means to ascertain how many queries are being run when any caching is switched on, and that if the devel data is exposed to anon users, caching is effectively switched off.

Are there any tools that would give a view onto the number of queries that are actually run per page load with caching on?

Sounds a lot like something

Jamie Holly's picture

Sounds a lot like something odd is going on with a module or something. With devel output enabled for Anon and the options to display query output enabled, I do see the queries on the anon page view. First load after cache dump has all the queries and is the standard formatted table. Once the cache is primed then I get a standard print_r output of the global $queries. On a site with about 60 modules enabled, on a standard cached anon page it only has 7 queries (session and module load). Technically that is 10 queries though as there are 3 queries that execute before queries have the option to be saved (load variables from cache, access rules checking and session). That's because Drupal won't save queries unless the variable dev_query is set to 1.

On aggressive cache you won't see this information. That's because of how aggressive cache works. Aggressive cache won't load any modules, even if they are set to bootstrap. Check out _drupal_bootstrap in bootstrap.inc and you'll see the logic under the DRUPAL_BOOTSTRAP_LATE_PAGE_CACHE section, but I can tell you that on the site I have, Drupal does 4 queries total. Here's a trick to find this out.

Disable all the query stuff in the devel module. In your settings.php override the dev_query variable:

$conf['dev_query']=1;

In _drupal_bootstrap in bootstrap.inc, make $queries a global. Look for :

        // We are done.
        exit;

Under the DRUPAL_BOOTSTRAP_LATE_PAGE_CACHE of _drupal_bootstrap. Before the exit add:

echo '<pre>';
print_r($queries);
echo '</pre>';

Now you get a dump of every single query that runs, including the 3 that are missed during bootstrap. Just be warned that everyone will see these queries on every single page, so if you are on a production site I really wouldn't do it.



I think I'll build your debug

escoles's picture

I think I'll build your debug suggestions into my themes henceforth. I can easily enable them as-needed and do a couple minutes of testing. Not a good idea for high-traffic sites, but we don't generally build those. Plus, if it's a server-load issue, I'm thinking I should be able to get representative performance by creating a second website on the same server, in a subdomain that's got HTTP auth to prevent spidering. So, not such a bad idea, if handled with caution.

You might want to review the

dalin's picture

You might want to review the bootstrap process or use a debugger to get a better idea of how this works. For normal-mode caching there are seriously just a handful of queries when there is a cache hit (unless you have a contrib module that is making excessive use of hook_boot(), hook_exit(), or shutdown functions). Aggressive mode caching happens before hook_boot() is called so that handful of queries gets reduced to just a couple.

I'm not quite sure why Devel is mis-reporting information during normal caching. In theory it should still work because it registers its shutdown function during hook_boot().

And of course, if the page is not in the cache, it will require those 312 queries to build it.

Also keep in mind, as mentioned elsewhere in this thread, that the number of queries per page is a fairly useless metric.


Thanks for the feedback. But

escoles's picture

Thanks for the feedback.

But w.r.t. the number of queries as a useless metric: Doesn't each query introduce new overhead? So, forget about the total number of queries, and consider total cost of queries, which is going to be the sum of the execution times. Which is going to be more with more queries. So yes, raw # of queries is not as helpful as you might think, but the # of queries translates into a higher execution time than if queries are omitted. (E.g. by caching.)

And if you have high latency, doesn't a very large number of queries compound that problem?
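
The concern about per-query overhead can be put into a simple model: if every query pays a fixed round-trip cost on top of its execution time, the total grows linearly with the query count. A sketch with assumed numbers (the round-trip latencies and the 0.5 ms execution time are illustrative, not measurements from any site in this thread):

```python
def total_db_time_ms(query_count, avg_exec_ms, round_trip_ms):
    """Total database time when each query pays execution plus one round trip."""
    return query_count * (avg_exec_ms + round_trip_ms)


for round_trip_ms in (0.1, 1.0, 5.0):     # same host, nearby server, distant server
    for query_count in (10, 300):
        total = total_db_time_ms(query_count, avg_exec_ms=0.5, round_trip_ms=round_trip_ms)
        print(f"{query_count:>3} queries, {round_trip_ms} ms round trip: {total:.0f} ms")
```

Under this model, 300 fast queries over a slow link cost far more than 10, even when no single query is slow, which is why both the slow-query view and the query-count view can matter.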

If your database resides on a

dalin's picture

If your database resides on a separate server, and you have optimized all your slow queries, then yes the latency between web server and database server does come into play. But I find that by enabling block caching, plus a bit of manual caching if necessary, the database time becomes manageable. YMMV.


Core Patch

mikeytown2's picture

Here's a core patch that should start to reduce the # of queries needed. This doesn't target cache hits though.
http://drupal.org/node/512962#comment-2463820
I have a list of functions that could benefit from the db_multi functionality in the next comment. Thoughts on this would be appreciated.

Insane number of Queries: ~2500...can it be???

maksfeltrin's picture

I'm quite new to Drupal and usually more interested in PHP frameworks such as Kohana, Yii, Symfony and CI. However, I felt I needed to know more about Drupal because the concept of a "node" and its extension using CCK is very useful for building complex data type management and presentation. The modularity and granularity of this CMF is astonishing (I've been experimenting for a couple of months, reading everything I could find on improving site performance and patching the modules I needed accordingly).

Nevertheless some facts made me think about performance...

::Number of queries per page (250~400) for authenticated users with about 20 modules installed (tagadelic and taxonomy_redirect have been patched to cache results in static variables). 20 may seem a lot, but they just give the site common CMS functionality such as WYSIWYG, multilanguage and image galleries.

::Number of queries for deactivating a module (in this case the "taxonomy_redirect" module) is 2513 ????????

..reporting Devel module output...

Page execution time was 16451.83 ms. Executed 2513 queries in 2452 milliseconds.
Memory usage:
Memory used at: devel_init()=2.01 MB, devel_shutdown()=35.21 MB.

I'm used to improving web applications by reducing the number of queries from 13 to 9... of course that's nothing compared to Drupal's capabilities... but 2500 queries to deactivate something seems a bit too much to me...

So, am I doing something wrong??? Can anybody explain to me what I am doing wrong? Is anybody else having the same problems?

::Reactivating the module:

Page execution time was 14881.73 ms. Executed 2632 queries in 3087.63 milliseconds.
Memory usage:
Memory used at: devel_init()=2.01 MB, devel_shutdown()=35.49 MB.

Yes...I'm definitely doing something wrong....

::Trying to deactivate another module (tagadelic)...

Page execution time was 17027.04 ms. Executed 2469 queries in 2830.52 milliseconds.
Memory usage:
Memory used at: devel_init()=2.01 MB, devel_shutdown()=35.19 MB.

????
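
One way to read these Devel figures is to ask what share of the page time was spent in the database. The sketch below just divides the two numbers Devel reports for each run; the function name `db_share` is hypothetical.

```python
def db_share(page_ms, query_ms):
    """Fraction of total page execution time spent running queries."""
    return query_ms / page_ms


# (page execution time, query time) in ms, copied from the Devel output above.
runs = {
    "disable taxonomy_redirect": (16451.83, 2452.0),
    "enable taxonomy_redirect": (14881.73, 3087.63),
    "disable tagadelic": (17027.04, 2830.52),
}
for label, (page_ms, query_ms) in runs.items():
    print(f"{label}: {db_share(page_ms, query_ms):.0%} of page time in queries")
```

In all three runs the queries account for roughly 15-21% of the page time, so most of the 14-17 seconds is being spent outside the database.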

loose-coupledness / related question re. D7

escoles's picture

1: Isn't a lot of this query proliferation due to the loose-coupling philosophy -- i.e., build an implementation by integration of many small independent parts? When you build one big app, you can consolidate the queries; but when you build an application by integrating many small, independent apps, you don't have an easy way to do that. (This doesn't explain the extraordinary number of queries required to enable/disable modules, though.)

2: Aren't there some query consolidation features in D7? How would they affect these ##s? My understanding is that D7 will actually degrade performance for an implementation like maksfeltrin's*; are those reasons related to PHP or SQL?

--
* maksfeltrin's execution times seem to me to clearly indicate he's on a shared server -- that his site doesn't have many resources available to it. Either that or it's a local dev site (I see similar execution times on MAMP).

loose-coupling philosophy

maksfeltrin's picture

Thanks for reply (escoles and dalin)

Yes... I wanted to test it on a shared server (a popular Italian hosting provider used by many of my customers). I still get poor performance on my own web server (FreeBSD 4.9... yes, still 4.9... very stable and lightweight, server apps compiled from source excluding MySQL, ~40 websites) and on my laptop test server (FreeBSD 8.0).

I agree that the loose-coupling philosophy inevitably leads to performance degradation. In my opinion database related stuff (dao,orm) should have an internal caching mechanism (like Doctrine does as an example). This is even more important in a project to which thousands of (good) people are known to contribute. I also think that many configuration options, like module activation/deactivation, which are admin-related, should be put in configuration files. In my experience database caching is faster than filesystem caching, so I agree with Drupal's choice. So... I hope that the next releases of Drupal (>=8) will focus more on performance improvements than on admin interface enhancements. Once you separate admin roles from editors, the D6 interface may appear simpler than Joomla to the average end user who just needs to input content, pictures and tags/categories.

...just to make it clear: my point is not to criticize anything, just to outline the problems i encountered. I always consider problem reporting as the first step for improvement.

Hi, you mentioned: " In my

meba's picture

Hi,

you mentioned: " In my opinion database related stuff (dao,orm) should have an internal caching mechanism (like Doctrine does as an example)" - but didn't you just experience that? There is a lot of caching and because you effectively flushed the cache (by enabling/disabling functionality), you see a lot of queries.

Well (de)activating modules

dalin's picture

Well (de)activating modules isn't something that you do on a regular basis. Many caches get cleared at this point in time (though there are a few issues open to reduce that. Search for them). And here is where a lot of caches and registries get rebuilt. The expensive stuff gets offloaded here so that it doesn't happen on every page load.


Cut that number in half with a patch

mikeytown2's picture

The patch linked right above your first message will greatly reduce the number of queries; I estimate by at least 1/3, maybe close to 1/2.
http://drupal.org/node/512962#comment-2463820 - First one
http://drupal.org/node/512962#comment-2819410 - Latest one

I plan on making this patch less aggressive (only push 1MB of data at a time), and thus it will require fewer functions.

Thanks, i will try your

maksfeltrin's picture

Thanks, I will try your solution, and I think I will also start again from the basic core installation and look into the code of the modules as I activate them... one by one... to see what happens, and whether things can be done differently when performance is affected... It will take time, but I think it's worth it.

Reducing the number of queries

maksfeltrin's picture

I was able to reduce the number of queries from 224 to 139.

In my case the problem was in the tagadelic module theme function. I have two tag cloud blocks, ~40 terms in total across two vocabularies (per-language terms). After a deep analysis of the tagadelic and taxonomy_redirect modules I found that calling the Drupal l(..) function issued drupal_lookup_path(..) twice for each term.
Overriding or patching (as shown below) the theme_tagadelic_weighted(..) function will do the trick.
In my case vocabulary terms are either specific to a language or have no language, and they are linked to a view which filters nodes by terms (language and no-language); my tagadelic module likewise displays only the tags for the current language or no language.

Replaced code is commented out....

<?php
// Theme override: builds the tag links by hand instead of calling l(),
// which avoids the drupal_lookup_path() calls issued for every term.
function theme_tagadelic_weighted($terms) {
  global $language;

  $output = '';
  if (module_exists('i18ntaxonomy')) {
    $terms = i18ntaxonomy_localize_terms($terms);
  }
  foreach ($terms as $term) {
    // Replaced code:
    //$weight = $term->weight;
    //$output .= l($term->name, taxonomy_term_path($term), array('attributes' => array('class' => 'tagadelic level'.$weight, 'rel' => 'tag'))) ." \n";

    // l() escaped the link text, so check_plain() is applied here to keep that.
    $output .= '<a href="'.base_path().$language->language.'/'.taxonomy_term_path($term).'" class="tagadelic level'.$term->weight.'" rel="tag">'.check_plain($term->name).'</a> ';
  }
  return $output;
}
?>

MySQL Database Engineering Suggestions....

jdonson's picture

The number of redundant queries is the issue here.
Why beat around the bush?

Drupal is VERY database-intensive.

Eliminating data and process redundancy is what transactional relational database technology is for...

So why not look at how to eliminate read query redundancies per page??!

Memcached and varnish can really help, but why not max out what the database can do?

Several MySQL architectures can confront database bottlenecks.

  1. Gather, Collect and Parse MySQL Slow Query Logs

    ** => Parse for cumulative impact of small waits ('misdemeanors') and large hog queries ('felonious offenders').

    ** => Watch MySQL Processlist Count for Sleeping Connections

  2. Provision Dedicated Physical Database Server(s)

  3. MySQL Replication Between Master and Slave Database Servers

    • Dedicated Slaves for RealTime Analytics and Database Backups?

  4. Network Data Source Connection Management & Streamlined Default and Configurable Caching Strategies

    ** => Avoids sleeping DB connections and supports effective caching, evidenced by cache performance ratios (request:hit).

  5. Stored Routines and UDFs

  6. Note that there is now the option of encapsulating the functionality of memcached:

    http://www.bluegecko.net/mysql/memcached-functions-for-mysql-1-1-released/


After properly implementing MySQL, you will find that all of your caching strategies
will become easier to implement and to manage.

Jeremy Donson
http://www.urbanspectra.com/resume
jjdonson@gmail.com

Jeremy Donson
Database and Systems Engineer
New York City

Drupal is not intended for that.

Riki_tiki_tavi's picture

Drupal is a CMS. You can put together a good set of modules and build a nice node constructor, but once you have more than 50 users online you can't run the site without the page cache enabled.

So your site should be static for the most part. After you create a node, the entire node page goes into the page cache and is then served only from the cache with minimal database queries. There will still be several queries for access rules, some bootstrap calls, flood control and throttle.

With good database optimization, when your whole database fits into RAM, you will get good performance, but there is one catch. Drupal uses a lot of JOINs for content output, and JOINs require a lot of memory. The bigger your tables get, the more memory they take. At some point your free RAM will run out and your database will get stuck.

I saw a Drupal installation where a single page made 10k+ database requests. It was on a dedicated server and could not handle more than 10 users online. That is a terrible practice.

So it's not really Drupal's fault; it's more a bad choice of engine/tool for a site with a large number of visitors. That's true for all versions of Drupal.

Right now I'm working on Drupal optimization. The goal is to make caching a part of the core and of the Drupal architecture. The site should be dynamic, without aggressive page caching, but with as few database calls as possible. In the best case it shouldn't make any database queries at all and should serve all content from cache.

That avoids making a database connection, and the speed in that case will be almost the same as when the entire page is returned from cache, but the page will be dynamic, not static!

The main idea is to precache all cacheable data only when the data changes, e.g. when a node is created/updated. Almost all data on the site is cacheable. You can create a node object with node_load() and put it into memcache, and from then on get it only from cache without database calls. AFAIK that is already implemented in D8 as "EntityCache", but I'm working with D6 and don't really know a lot about D8 internals.

There would also be a cache warm-up. When your memcache (and/or web server) starts, you can run a PHP script that fills the cache and only then start serving HTTP requests.
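
The "precache on change" idea described here is essentially a write-through cache: the cache entry is refreshed at save time, so reads do not need the database. A minimal Python sketch, with plain dictionaries standing in for the database and memcache; all class and method names are hypothetical, and this is not the poster's implementation.

```python
class WriteThroughStore:
    """Keep a cache in step with a backing store by updating both on write."""

    def __init__(self):
        self._database = {}     # stand-in for the real database
        self._cache = {}        # stand-in for memcache
        self.database_reads = 0

    def save(self, key, value):
        # Write to the database and refresh the cache in the same step.
        self._database[key] = value
        self._cache[key] = value

    def load(self, key):
        if key in self._cache:
            return self._cache[key]
        # Cache miss (e.g. after a cache restart): fall back to the database.
        self.database_reads += 1
        value = self._database[key]
        self._cache[key] = value
        return value

    def warm_up(self):
        # Fill the cache from the database before serving requests.
        self._cache = dict(self._database)


store = WriteThroughStore()
store.save("node:1", {"title": "Hello"})
store.load("node:1")
store.load("node:1")
print(store.database_reads)   # 0: both reads were served from the cache
```

The cost moves to the write path: every save touches both stores, which matches the remark that content updates become slower in exchange for cheap reads.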

Everything above makes a geo-distributed cluster of Drupal web servers possible. The database can be installed on one server with a fast HDD (without a lot of RAM), with many web frontends with local memcache in different DCs. Yes, content updates will take quite long, but it's worth it and it's much cheaper than one big expensive server or several expensive medium servers.

The most part of the work is almost done but It's a pity that I can't share it because it's forbidden by Drupal trademark restrictions :-(

The most part of the work is

Garrett Albright's picture

The most part of the work is almost done but It's a pity that I can't share it because it's forbidden by Drupal trademark restrictions :-(

wat

wat https://www.drupal.com/t

Riki_tiki_tavi's picture

wat

https://www.drupal.com/trademark

Examples which do not qualify as "fostering the Drupal software":

  • creating a Drupal fork "ImprovedDrupal";

Thank you for the reply. Andy clarified the meaning of the trademark rule below. No need to answer.

    Drupal is for Sharing!

    andy_read's picture

    Drupal trademark restrictions have never stopped people sharing good ideas or suggested patches with the Drupal community. We even have a history of patched versions of core for performance improvements (most notably PressFlow).

    The trademark protection simply allows Dries to control how the name Drupal is used - typically to prevent it being used in a commercial way that claims some official backing of the community when there is none.

    So how would you like to share your work? Let's start a discussion. And if we need Dries to permit some use of the Drupal trademark (unlikely) we can ask Dries and he will advise on whether it's needed or what the best route is make this work available.

    The community has always been open to proposed patches to core to improve anything.

    What is new?

    andy_read's picture

    Riki, what has prompted you to write your post on this thread? I've just read more detail and your technical points are good, but you are posting on a thread started in 2009/10 and only updated once more in 2012. You say that all versions of Drupal are the same, but then acknowledge that D8 has significant improvements in EntityCache (based on render_cache module in Drupal 7).

    The technical issues you mention are well understood and have been worked on in many ways over the last 6 years. I assume you noticed that Drupal D8 was released last week and that Drupal 6 (which was very current when this thread started) is now in end-of-life support (which ends in Feb '16) - it really should not be the basis of future work.

    You seem to criticise technology choices that were made many years ago for Drupal 6 (when they were probably the best balance of many competing factors) but then proudly announce that right now you're working on Drupal optimization, but only for Drupal 6. That is probably one of the most out-of-date choices you will ever make in your career.

    Yes. Everything is correct.

    Riki_tiki_tavi's picture

    Yes. Everything is correct. Sorry for necroposting.

    Here is the story.

    I started working with D6 5-6 years ago, when D7 was in alpha. My company has a medium-load web site (15-20k visitors a day, approx. 10 views per visitor, and 1.5m+ hits overall with background processing, search engine crawlers, etc.). After 6 years of hard work I now have a heavily modified engine that can do all the work on one cheap server and can scale horizontally with minimal effort. But I was using Drupal not as a CMS to publish nodes, but more as an engine with a robust API for building my own modules. It's something like an e-store with a lot of custom-written modules (~19 of our own modules, 8 core modules and 45 contrib modules). The site runs without aggressive caching turned on because the pages are dynamic.

    IMO, over the past 6 years Drupal has been moving toward being more of a CMS than a CMF. The Drupal API is changing significantly, the complexity of Drupal core and the core modules is rising accordingly, and there is no "LTS Drupal API" that developers can use without fear of having to rewrite everything from scratch every 2-3 years. I understand that Drupal is trying to stay on the "cutting edge" of technology and that creating an LTS API is easier said than done. But unfortunately I can't do anything about it.

    So Drupal is moving on and we're left on our own island now. Yes, you're right, it looks like I'm a little upset about the D8 release and the end of D6 support.

    I should probably try to port some features to D8.1, but the problem is that some of them are too radical, e.g. keeping the entire bootstrap registry in cache, or not loading all the modules for a single page request. Another problem is that some of the changes are probably suitable only for my installation and cannot be used on ordinary Drupal sites because of technical issues that I don't see yet. Also, I'm not familiar with D8, and I'm not sure I will be able to handle the task.

    A migration plan will also be a very hard task for us. My company has its own problems that the development department has to solve every day, and making a good upgrade plan would take a lot of resources.

    So the easiest way is to do nothing and keep working on our own fork of D6.

    Thank you for the reply, Andy.

    Drupal 8 is the way forward

    andy_read's picture

    Riki, I sympathise with your situation. There are many who have invested a lot in D6 and I still have a couple of small client sites running it. But D6 has had 5 years of support since D7 was released and I think that's about as long term support as we can reasonably expect, especially from an open source project which has traditionally been completely reliant on volunteer time (although that is beginning to change in good ways, with safeguards against commercial monopolies).

    The more you describe the improvements you've made though, the more I feel that D8 is the way forward - it has the same idea built in of only loading the modules needed, but it does this with the widely accepted PHP standard autoload mechanism (PSR-4) and in general is based on widely supported Symfony 2 framework components. Drupal has always been both a CMS and a CMF and many of the D8 changes are geared towards using it for services, including headless, not just traditional web front-ends.

    You have probably achieved great things on your own over the last 6 years, but if you've done it your own way by modifying core, then you really can't expect any support from the Drupal community. There is nothing to stop you continuing to use D6 and others will too - for a while. There will probably be companies specialising in supporting older D6 sites that can't be migrated, but they will struggle to support a site where the core has been heavily modified. I've heard this story from developers numerous times over the years: "We took on a project to support an existing site (D6 or D7) but the core has been modified so it's impossible to maintain".

    What the Drupal community has heavily committed to though, is to support migration from older versions of Drupal to D8, making migration from D6 a priority. You'll see lots more of this over the coming weeks and I'm still getting up to speed with it, but there is the migrate upgrade module (https://www.drupal.org/node/2257723), which does a lot of the legwork, plus Drupal Console (https://www.drupal.org/project/console) which has tools to create skeleton custom modules. There is also the Module Upgrader module (https://www.drupal.org/project/drupalmoduleupgrader) but this currently seems to be focussed on upgrading from D7, not D6 (yet).

    All of that migrating helpers

    Riki_tiki_tavi's picture

    All of those migration helpers look promising, and I assume that migrating from D6 to D8 is not as big a deal as I thought.

    But I'm sure that my problem is not unique. Using Drupal as a CMS and using it as a CMF are two completely different things. You can't satisfy both Webmasters (who build sites for their customers using only a web browser) and Developers (who create large custom environments for small and medium business needs).

    Here is how I see it (DCT - Drupal Core Team):

    DCT: hey, we want to make a green button on that page. Do you need it?
    Webmasters: Yes! That's a wonderful idea! Can you also add a red and a blue one?
    DCT: yes, easily

    DCT: hey, we want to make a green button on that page. Do you need it?
    Developers: Will it improve performance?
    DCT: hmm, probably not.
    Developers: Is it a security fix?
    DCT: no, certainly not.
    Developers: Maybe it will increase API reliability?
    DCT: not sure about that
    Developers: So we don't need it. We can make buttons ourselves.

    I'm not sure it's true, because I can only speak from my own side (as a developer), but that's how I see it. How many work-hours were spent on the "exporting and importing settings" feature? Yes, it's a killer feature for webmasters, but it's absolutely useless when you have a server farm and automated deployment practices.

    Another problem that bothers me is performance. Once all our modules are upgraded to D8 coding standards, everything works, all tests are green and we release to production, I'm sure our database will handle the drastically increased number of SELECTs, but pages will be 100-200 milliseconds slower (30 core database requests + the connection, of course). You may say that's not important, but I believe it may significantly hurt our SEO page ranking in the long term.

    So at this point I can see four options for the upgrade:

    1. Contribute a huge patch to Drupal core (I'm not sure that's realistic: the code may be very ugly, and some solutions may be crude and considered unacceptable by the Drupal Core Team)
    2. Create a module that implements caching and warming up of the Drupal bootstrap engine with a small number of patches or even none (I'm also not sure that's possible in a clean way, i.e. that D8 core has all the necessary entry points)
    3. Repeat the story by modifying D8 core (not very wise and less preferable)
    4. Migrate to another framework and technology stack (we have a number of experienced RoR developers and some backend tools written in RoR, but even well-written Ruby code cannot match the speed of well-optimized PHP code)

    As you can see, the choice is not obvious. Perhaps someone from the D8 core team could assist with option 1 or 2?

    Riki, Thanks for your ideas

    fabianx's picture

    Riki,

    Thanks for your ideas and input.

    This is exactly what we are trying to do with D8:

    • Cache everything and make D8 behave as if it was a static page.

    • All pages in Drupal 8 are by default cached in dynamic_page_cache (authenticated and anonymous).

    • All anonymous user pages are additionally cached in page_cache.

    All blocks are themselves render cached. Only when something changes, the cache is invalidated. (cache tags)

    It is perfectly possible to serve outdated information while regenerating the cache in the background (even contrib could do this in around 20-30 lines of code).

    Drupal 8 knows the difference between dynamic and static parts of the page and splits it up accordingly.

    All information is cached on each server (APCu) and can even be cached nearer to the user in Varnish or CDNs like Fastly.
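
    To make the cache tag idea above concrete, here is a minimal sketch in Python. It is not Drupal's implementation; the class name and the tag strings such as "node:5" are only illustrative assumptions. Each cached item carries a set of tags, and invalidating a tag drops every item that carries it.

```python
class TaggedCache:
    """Toy cache whose entries are invalidated through tags (illustration only)."""

    def __init__(self):
        # cache id -> (value, frozenset of tags)
        self._items = {}

    def set(self, cid, value, tags=()):
        self._items[cid] = (value, frozenset(tags))

    def get(self, cid):
        entry = self._items.get(cid)
        return entry[0] if entry else None

    def invalidate_tags(self, tags):
        """Drop every item carrying at least one of the given tags."""
        tags = set(tags)
        stale = [cid for cid, (_, item_tags) in self._items.items()
                 if item_tags & tags]
        for cid in stale:
            del self._items[cid]
        return stale


cache = TaggedCache()
# A rendered block that depends on node 5 and on the list of comments.
cache.set("block:recent_comments", "<ul>...</ul>", tags={"node:5", "comment_list"})
cache.set("block:footer", "<p>...</p>", tags={"config:footer"})

# Saving node 5 invalidates only the items tagged with it.
cache.invalidate_tags({"node:5"})
```

    Everything not tagged with the changed object stays cached, which is what lets a mostly dynamic page be served almost as cheaply as a static one.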

    For more information please see:

    https://events.drupal.org/barcelona2015/sessions/making-drupal-fly-faste...

    All new ideas are obviously appreciated, so if you want to share something, please open a new topic with ideas here in the High Performance group.

    There is no such thing as a "too radical" idea. All you need to do is: Share it. (or even open a core issue and ping one of us (Fabianx, WimLeers or catch) on IRC)

    Thanks,

    Fabian (D8 Core Developer)

    Thank you Fabian, at last I

    Riki_tiki_tavi's picture

    Thank you Fabian, at last I got something concrete about D8.

    Everything that I got before was something like "Don't ask. Just upgrade. D8 is better and will solve all your problems." (Sorry, Andy) :-)

    Cache everything and make D8 behave as if it was a static page.

    How soon do you plan to make it happen? What is the state of this in 8.0 and 8.1? That's exactly what I need. It was so indispensable for me 6 years ago that I had no other choice but to hack D6's core.

    It's already in D8 release -

    andy_read's picture

    It's already in D8 release - that's what I tried to say earlier - end of story :-)

    Do watch Fabian's Barcelona presentation - I was there and it covers this whole subject of how we can cache everything and know that it's always cached and cleared as expected.

    In general D8 is very well documented on d.o. I've been googling a lot of D8 info over the last few weeks and much is out of date, but the d.o. documentation is by far the most reliable and quite comprehensive at this stage: https://www.drupal.org/8

    Ok. Today I've spent a whole

    Riki_tiki_tavi's picture

    Ok. Today I spent a whole working day inspecting D8's capabilities and gotchas.

    Here is the list of cons that I found:

    • There is NO native memcache support in D8.
    • There are two active modules for 8.1 that implement MemcacheStorage: memcache and memcache_storage. Neither is ready for production use: there is no documentation on how to install them on D8, and the issue tracker contains some reports of poor performance.
    • I've discovered some "KeyValue" storage mechanics in D8 that still use the database (Why aren't Memcache and Redis appropriate key-value stores? Why should cache storage and KeyValueStorage be different entities?)
    • I was not able to find any information on how to force "cachetags" to use memcache storage instead of the database.
    • There is NO support for a Redis storage implementation in D8 at all.
    • With all the OOP, the complexity of the D8 code becomes almost unmanageable for an unprepared developer (i.e. for me).
    • Manually tracing and debugging through D8 core code becomes extremely hard. That was a very annoying problem in Symfony. Our company dropped Symfony in favor of RoR 4 because of it. You do get a speed-up at the start of development, but each serious core problem puts you into a deadlock where you are stuck for weeks looking for workarounds. That is exactly what happened to our team on Symfony.

    The pros for me:

    • It looks like modules can override any part of core you want, if you know how (there is no documentation, though).
    • Support for different environments (dev, staging, production). This will simplify deployment and testing.

    So the only thing I can conclude is that D8 is not ready for production yet. And our team is not ready for the upgrade either.

    I think we should stick with our modified D6 core for some time until all these problems are resolved, and I'm afraid that will be well beyond February 2016.

    I don't want to offend anyone, but D8 looks like a good product of Architecture Astronautics (there is an article about that on Joel on Software).
    You have flown too high into space :-(

    I've discovered some

    Garrett Albright's picture

    I've discovered some "KeyValue" storage mechanics in D8 that still use the database (Why aren't Memcache and Redis appropriate key-value stores? Why should cache storage and KeyValueStorage be different entities?)

    Because a SQL database is a simpler requirement to fulfill than a key-value store. A $5/mo web host will give you a MySQL database, but probably not a Redis… um… instance. So it makes sense for Drupal to just use the database for key-value storage until it is told otherwise.

    Incidentally, you can do this in D7 too; as the cache and variable storage systems are amenable to being used with key-value store systems, it's possible to use them with such instead of using the SQL database.
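
    As an illustration of "use the database until told otherwise", here is a small Python sketch (not Drupal code; every class and function name is a hypothetical stand-in). The calling code only sees a key-value interface; the default backend is SQL (SQLite stands in for MySQL here), and a different backend can be swapped in by configuration.

```python
import sqlite3
from abc import ABC, abstractmethod


class KeyValueStore(ABC):
    """The abstraction: callers never know where the data lives."""

    @abstractmethod
    def get(self, key, default=None): ...

    @abstractmethod
    def set(self, key, value): ...


class SqlKeyValueStore(KeyValueStore):
    """Default backend: works anywhere a SQL database is available."""

    def __init__(self, connection):
        self._db = connection
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS key_value "
            "(name TEXT PRIMARY KEY, value TEXT)"
        )

    def get(self, key, default=None):
        row = self._db.execute(
            "SELECT value FROM key_value WHERE name = ?", (key,)
        ).fetchone()
        return row[0] if row else default

    def set(self, key, value):
        self._db.execute(
            "INSERT OR REPLACE INTO key_value (name, value) VALUES (?, ?)",
            (key, value),
        )
        self._db.commit()


class MemoryKeyValueStore(KeyValueStore):
    """Stand-in for an external key-value service such as Redis."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value


def make_store(backend="sql"):
    """Pick a backend from configuration; SQL is the default."""
    if backend == "memory":
        return MemoryKeyValueStore()
    return SqlKeyValueStore(sqlite3.connect(":memory:"))


store = make_store()            # a cheap host gets the SQL backend
store.set("site_name", "Example")
```

    The point of the sketch is only the shape: a site that has a dedicated key-value service changes one setting, and nothing that calls get() or set() has to change.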

    MySQL Performance

    mikeytown2's picture

    Not to mention that with MySQL 5.6+ InnoDB is really fast (5.7 is GA and even quicker); we ended up abandoning memcache due to high availability issues. If you can write non locking queries then MySQL is quick; 100,000 queries per second is no longer unattainable.

    Yes, that is correct. With

    Riki_tiki_tavi's picture

    Yes, that is correct. With all those HA issues I always forget about $5/mo hosting. KeyValueStore is not a big deal. One of the memcache storage modules could implement KeyValueFactoryInterface. The big deal is that all that stuff is not ready for production (for my production) yet :-(

    I think it's obvious that Drupal core for HA sites should be a little different from Drupal for $5/mo hosting. I believe community members could resolve this dilemma if they wanted to.

    There is NO native memcache

    fabianx's picture
    • There is NO native memcache support in D8.

    Would you expect this to be in Core directly?

    • There are two active modules for 8.1 that implement MemcacheStorage: memcache and memcache_storage. Neither is ready for production use: there is no documentation on how to install them on D8, and the issue tracker contains some reports of poor performance.

    The performance of the module depends mainly on the PHP extension used.

    Both could be stable by now - it is more a matter of releasing a new version.

    • I've discovered some "KeyValue" storage mechanics in D8 that still use the database (Why aren't Memcache and Redis appropriate key-value stores? Why should cache storage and KeyValueStorage be different entities?)

    The key_value is the abstraction. Drupal 8 does not need anything other than a DB out of the box. Why should D8 force users to install a KV store?

    However, because it is abstracted you can store it wherever you want.

    As KV is meant for persistent data however, memcache would not be a good match.

    • I was not able to find any information on how to force "cachetags" to use memcache storage instead of the database.

    Cache tags by definition need persistent storage, which memcache cannot provide. For redis the internal cache tags mechanism can be used.

    • There is NO support for Redis Storage implementation in D8 at all.

    There is a redis project in contrib - though probably only on GitHub right now.

    • With all the OOP, the complexity of the D8 code becomes almost unmanageable for an unprepared developer (i.e. for me).

    There is a learning curve, but overall the framework used does not matter as much as the abstraction of services.

    • Manually tracing and debugging through D8 core code becomes extremely hard.

    Or extremely easy. Your mileage may vary.

    • "I don't want to offend anyone, but D8 looks like a good product of Architecture Astronautics (there is an article about that on Joel on Software).
      You have flown too high into space :-("

    lol, could say that, but overall the code has become cleaner and more abstracted.

    Now for D9 we need to remove the unused layers and simplify where possible.

    precache data

    mikeytown2's picture

    I'm starting to do a lot of this in D7.

    https://www.drupal.org/project/apdqc fixes all the database deadlocking issues I've encountered so that MySQL is just as fast as Memcache and can handle 200+ concurrent users (open connections). It will also prefetch cache data so that cache reads are "free"; writes are async so that they are "free" as well.

    In terms of precaching the dev version of https://drupal.org/project/httprl/ has an awesome "function cache" function; we use it for semi-delayed constantly updated, always available stats that take over a minute to generate. httprl_call_user_func_array_cache() is similar to call_user_func_array() but it returns the cached value of the function and then re-generates the cache value in the background. This gives you a 100% cache hit rate with minimal lag in terms of how old the cache is.
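
    The "return the cached value, then regenerate in the background" behaviour described above can be sketched in a few lines of Python. This is not httprl's code; the class below and its method names are hypothetical, and a thread stands in for whatever background mechanism is really used.

```python
import threading


class BackgroundRefreshCache:
    """Serve the last known result at once and recompute it in the background."""

    def __init__(self):
        self._values = {}
        self._workers = {}            # key -> running refresh thread
        self._lock = threading.Lock()

    def call(self, func, *args):
        key = (func.__name__, args)   # args must be hashable
        with self._lock:
            has_value = key in self._values
            cached = self._values.get(key)
        if not has_value:
            # Very first call: nothing to serve yet, so compute synchronously.
            value = func(*args)
            with self._lock:
                self._values[key] = value
            return value
        # Later calls: return the (possibly stale) value and refresh it.
        self._refresh_in_background(key, func, args)
        return cached

    def _refresh_in_background(self, key, func, args):
        def worker():
            value = func(*args)
            with self._lock:
                self._values[key] = value
                self._workers.pop(key, None)

        with self._lock:
            if key in self._workers:
                return                # a refresh is already running
            thread = threading.Thread(target=worker, daemon=True)
            self._workers[key] = thread
            thread.start()

    def wait(self):
        """Block until the refreshes running at this moment have finished."""
        with self._lock:
            threads = list(self._workers.values())
        for thread in threads:
            thread.join()
```

    After the first call every caller gets an answer without waiting for the slow function, at the price of seeing a value that is one refresh behind.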

    Thank you for reply. Very

    Riki_tiki_tavi's picture

    Thank you for the reply.

    Very interesting approach with MEMORY tables.

    It seems that such a setup is not very HA when you use Master+Slave with load balancing between MySQL instances. MySQL becomes a single point of failure. Here are the cons that I can see:

    • According to the MySQL docs, if your Master fails then all MEMORY tables will be erased. That may lead to a very high load until one of your standbys becomes the Master. There is a chance that the whole system will freeze and you will have to restart it, in which case your databases may be corrupted.
    • With several frontends, it looks like they should write to MEMORY tables only on the Master, so you get a network delay on those write operations. Stock Drupal has a lot of expirable cache entries, so the number of write-to-cache operations may be high.
    • In a Master+Slave setup, MEMORY tables have to be replicated between instances too. Cache-generated MySQL replication traffic between Master and Slaves may be very high and very hard to manage, especially if you have a distributed cluster.
    • If you plan to use Master+Master, you have to write your own database cache handler. Drupal cannot run on an M+M setup out of the box.

    I'm going to use a scheme based on a "reversed memcache cluster", where the frontends rely only on a local memcache instance and can even work without the database in "read-only mode" if the database goes down.
    When some data changes (e.g. a node is created), the caching engine should spread the modified cache items across all online instances. This can be done on a client POST, either synchronously or asynchronously via a queue in the database.
    When one of the frontends has to be rebooted for maintenance, its local cache is warmed up on system start, and only then is it ready to handle clients. One of the cons is that such a scheme is limited by the memcache storage size, but for my needs 1G on each frontend is quite enough. Another con (of course) is that Drupal core has to be modified to implement the cache warm-up mechanics too.
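
    A rough sketch of that scheme in Python, with plain dictionaries standing in for the local memcache instances and for the database. All class and method names here are hypothetical, and only the synchronous variant is shown; the queue-based asynchronous variant would replace the loop in write().

```python
class Frontend:
    """A web frontend that answers reads only from its local cache."""

    def __init__(self, name):
        self.name = name
        self.online = False
        self.local_cache = {}

    def read(self, key):
        return self.local_cache.get(key)


class ReversedCacheCluster:
    """Pushes every change to the local cache of each online frontend."""

    def __init__(self, frontends):
        self.frontends = frontends
        self.database = {}            # stand-in for the authoritative database

    def write(self, key, value):
        self.database[key] = value
        for frontend in self.frontends:
            if frontend.online:
                frontend.local_cache[key] = value

    def take_offline(self, frontend):
        frontend.online = False
        frontend.local_cache = {}     # a reboot loses the local memcache

    def bring_online(self, frontend):
        # Warm-up: load the cache before the frontend serves any client.
        frontend.local_cache = dict(self.database)
        frontend.online = True
```

    Reads never touch the database, so a frontend keeps answering from its local copy even while the database is unreachable; the cost is that every frontend must hold the whole working set.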

    we ended up abandoning memcache due to high availability issues

    Can you describe those issues?

    Master will fail then all

    mikeytown2's picture

    if your Master fails then all MEMORY tables will be erased

    This is only for the semaphore table and only if you're using MySQL 5.5 or lower. InnoDB performance in 5.6+ makes memory tables not that useful. Even still I don't see the big downside because it's only used for that table. Usually when things go bad you want to clear out all previous locks because your site is now waiting for something to be done and it's not going to happen until the lock times out (30s).

    With several frontends, it looks like they should write to MEMORY tables only on the Master, so you get a network delay on those write operations.

    Not 100% sure what you're saying here. We use multiple webheads that then point to a master/slave MySQL setup. Network delay is inevitable in a high availability setup. I'm not worried about the one MEMORY table that holds locks losing its data when the master crashes; from what I've seen this will actually speed up recovery time as locks will be acquired again instead of waiting.

    Cache-generated MySQL replication traffic between Master and Slaves may be very high and very hard to manage, especially if you have a distributed cluster.

    This can be true. Same problem usually happens with most other key value stores if it's in a high availability cluster.

    We ended up abandoning memcache due to high availability issues.
    Can you describe those issues?

    Not a post by me, but it talks about some of the issues we've had. It mainly has to do with cache clears and how the whole memcache cluster gets rebalanced when one node drops out (couchbase), resulting in oddities in the cached data. Since going with APDQC we've had zero db issues.
    http://dev.mlsdigital.net/posts/Cloud-Native-Drupal/#remote-data-center-...

    Thank You mikeytown2 I love

    tinohuda's picture

    Thank You mikeytown2

    I love your awesome https://www.drupal.org/project/apdqc
    It saved me from a slow Drupal 7.

    Now my authcache and apc modules are uninstalled, and my Drupal 7 site runs much faster.

    Thanks.

    BR,

    D8 + Redis

    Jamie Holly's picture

    I've been playing with the D8 Redis module (it's on GitHub). Normal page loads execute only 6 queries for authenticated users. 3 are to load the user (session, roles and fields) and 3 are for states.

    I haven't gone through real hard testing yet, but so far this seems like a pretty nice boost.

    One thing I'm wondering about is how memcache will handle the tag checksums for cache bins. The only way I can see that working is to keep the tags in the database, since you want those to persist. One thing is for sure: with Drupal 8 we have opened a whole new world of performance enhancements.


    HollyIT - Grab the Netbeans Drupal Development Tool at GitHub.

    Write through is a must for

    fabianx's picture
    • Write through is a must for memcache cache tag checksums obviously.

    But then there are two possibilities:

    • Treat every missing key as a miss and get those from the database instead.
    • Write back the checksums for missing or never expired tags during retrieval of the item. (which is "bad" as it is writing on a get())

    But yes, it is still a 'hot topic'.
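
    For readers following along, here is one way to read the two possibilities, as a Python sketch. It is an interpretation under stated assumptions, not the memcache module's code: dictionaries stand in for the database and for memcache, the class names are hypothetical, and the write_back flag switches between "treat a missing key as a miss and ask the database" and "also write the checksum back during the get".

```python
class TagChecksums:
    """Invalidation counters per tag, written through to persistent storage."""

    def __init__(self, write_back=False):
        self.persistent = {}          # stand-in for a database table
        self.volatile = {}            # stand-in for memcache; may lose keys
        self.write_back = write_back

    def invalidate(self, tag):
        count = self.persistent.get(tag, 0) + 1
        self.persistent[tag] = count  # write-through: durable copy first
        self.volatile[tag] = count

    def checksum(self, tags):
        total = 0
        for tag in tags:
            if tag in self.volatile:
                count = self.volatile[tag]
            else:
                # Possibility 1: a missing key is a miss, ask the database.
                count = self.persistent.get(tag, 0)
                if self.write_back:
                    # Possibility 2: repopulate on read (a write during get()).
                    self.volatile[tag] = count
            total += count
        return total


class ChecksummedCache:
    """Cache items remember the tag checksum they were written with."""

    def __init__(self, checksums):
        self.checksums = checksums
        self.items = {}

    def set(self, cid, value, tags):
        tags = tuple(tags)
        self.items[cid] = (value, tags, self.checksums.checksum(tags))

    def get(self, cid):
        entry = self.items.get(cid)
        if entry is None:
            return None
        value, tags, stored = entry
        if stored != self.checksums.checksum(tags):
            return None               # a tag was invalidated since the write
        return value
```

    The sketch also shows why the counters have to persist: if they lived only in the volatile store and were lost, they would restart at zero and an invalidated item could look valid again.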