Slow queries - Problem with accessing Content management -> content
We have just installed the latest version of Drupal 5 on a new site and loaded it up with content, about 4.1 million nodes. These nodes are split between three types of content types, which amounts to around 2.6 mill, 1.5 mill and 70.000 nodes for each type.
Hitting the administrator menu : Content management -> content destroys performance beyond anything I've seen :)
This is the query which totally smother the performance, taken from the developer module:
337659.39 0 pager_query SELECT n.*, u.name, u.uid FROM node n INNER JOIN users u ON n.uid = u.uid ORDER BY n.changed DESC LIMIT 0, 50
Why?
My opinion is that such an expensive query shouldn't be allowed to be in a default installation of Drupal. This has the implications that Drupal can not grow beyond the scope of "small sites" without tweaking the core code.
I'm not even sure what the query is doing. Sorting the node table on field changed and then fetching data for the 50 first nodes should be pretty straight forward. Why the join with table users? The node table already has a field uid. It seems to me that getting the name of the user for each node later would be much faster (it is a lot faster), so what is the real reason for that join?
Running this query a few times in a row brings it down to 2.5 minutes:
SELECT n.*, u.name, u.uid FROM node n INNER JOIN users u ON n.uid = u.uid ORDER BY n.changed DESC LIMIT 0, 50;
This query returns it result set in 0.00 seconds (after having been run once - first run completed in 9.5 seconds - query cache kicks in):
SELECT * FROM node ORDER BY changed DESC LIMIT 0, 50;
The in the code, which loops through the result set, you could easily get the name of the user for each uid (and it would be SO much faster!)
I feel a default installation of Drupal should not buckle under even with semi-large number of nodes in it's database, especially when we aren't doing anything fancy besides clicking on a few administration links.
How do other people with semi-large sites handle this? - Or are we doing something wrong?
I tried searching for this, since I figured this would be a quite common problem for anyone with a mature number of nodes, but I couldn't find anything... So maybe this is not a problem but something we haven't configured or done correctly.
How many other cumbersome sql queries will we stumble on later?
Right now I see no other solution than to change the code so standard administration will actually work...
Settings for PHP:
max_execution_time = 800
max_input_time = 260
memory_limit = 50M
Total memory is 8GB, with another 8GB of swap, running on Dual Quad Core Intel-server from Dell
Running Ubuntu server 7.04
PS.
"The Performance and Scalability forum on Drupal.org is where else to look."
Maybe that line should be removed from the groups header, since that forum is deprecated?


Yes, needs improvement
That page has a lot to improve on as a "core" component. Most large sites end up building views per roles or other custom content management overview / filter pages. So, that's one answer -- don't use that core functionality, use Views to build dashboards as needed.
Also, sounds like you've already identified a solution -- a quick patch for D6 / D7 sounds in order?
PS. I removed the link in the header, and did some maintenance to add a couple of other people as admins.
I think this is a problem
I think this is a problem with MySQL rather than Drupal per se - in my experience, optimisation of inner joins is really poor. I've gotten around the problem in other modules by splitting SQL statements into 2 - the first to return the nids of the node using the "LIMIT x, 50" clause, then the second - using "WHERE n.nid IN (n1, n2, n3, ...)" instead of the LIMIT clause - to run the inner join.
I guess it's something that needs to be addressed in Drupal, assuming vast majority of us use MySQL - but mainly I think it's a serious shortcoming of MySQL that it buckles under a standard (and very simple) SQL statement.
That said, other SQL formats are optimised quite impressively in MySQL that don't perform so well on other databases - but the result is that it's difficult to standardise the SQL in drupal code to cater for multiple back-end options.
Have you tried to EXPLAIN that statement on your system?
On my system, this query works like a charm. Here's my EXPLAIN output:
EXPLAIN SELECT n.*, u.name, u.uidFROM drupal_node n
INNER JOIN drupal_users u ON n.uid = u.uid
ORDER BY n.changed DESC
It runs in just a few millisecons.
Maybe there is something odd somewhere in your MySQL or something...
Thanks for the heads up!
Thanks for the heads up!
I did run EXPLAIN on the query earlier. From which I got the result "possible keys" : uid for node and PRIMARY for users. So I just figured that everything was ok there. Didn't look to deep into it.
Then, looking at the result from your EXPLAIN I could see something was wrong with my result. Because MySQL liste table u (users) first, then table n (node), and under "Extra" gave me this information : "Using temporary; Using filesort"
My EXPLAIN gave this result at first:
id select_type table type possible_keys key key_len ref rows Extra1 SIMPLE u ALL PRIMARY NULL NULL NULL 19 Using temporary; Using filesort
1 SIMPLE n ref uid uid 4 drupal.u.uid 11 Using where
So I did an ANALYZE on table node. Running EXPLAIN again on the same query gave this result:
id select_type table type possible_keys key key_len ref rows Extra1 SIMPLE n index uid node_changed 4 NULL 4138957
1 SIMPLE u eq_ref PRIMARY PRIMARY 4 drupal.n.uid 1 Using where
Running the query now, took only 0.71 seconds with empty query cache, and 0.00 seconds after. Obviously ANALYZE is an important statement and I should learn to read past column 5 of the EXPLAIN statement :)
Just curious, which version of MySQL are you using?
Erlend Strømsvik - erlend@nymedia.no
Ny Media AS - http://www.nymedia.no
MySQL 5.0.51a with InnoDB tables
I also optimize tables daily, just after dump. Maybe that made the difference.
ANALYZE on MyISAM
Does this approach work on MyISAM tables? My node table (D6.4) uses the MyISAM engine, when I run the EXPLAIN command I can see the query uses filesort but running the ANALYZE command seems to make no difference when I rerun EXPLAIN against the select.
I've a feeling I've missed something but can't really see what it is!
TIA
Yes it does
It works for MyISAM. Can you paste the results from your explain? It sounds like you might be missing an index or two.
This is the result for the
This is the result for the query (apologies for the formatting). It's being run against an vanilla D6.4 install.
Make sure you have an index
Make sure you have an index on changed in the node table. If not then add it. That will get rid of the filesort. If it is there, try an analyze table again and then an optimize.
I can't remember if that is a key that Drupal adds on install, or if it's one I have added doing my own tuning. I usually end up redoing some of the indexes from a vanilla Drupal install. You need to really try and weigh the benefits of a new index. If it's a very common query that is run, then add it. If it isn't then don't. You don't want to go index crazy since that will cause performance issues every time a node is updated or created.
But in a table without that key, containing 30,000 nodes, it takes 1.3536 sec. With that key added it only takes 0.0017 sec. That's off my development server that has all MySQL caching disabled.
Thanks for the speedy
Thanks for the speedy answer. I'm really curious about this more than anything, I'm trying to learn as much as I can.
Using Navicat I've looked at the indexes on the node table and one exists for "changed" already called "node_changed". I've done as you suggested and run analyze followed by optimze again (using both the inbuilt Navicat commands and the SQL commands) but explain still shows me using filesort...
Is there some configuration change I need to make directly on the MySQL install? FYI I'm using vanilla MySql 5.0.51b on Windows.
TIA.
my.cnf? available memory? database size?
Maybe you need to increase join_buffer_size or something else in your my.cnf based on database size, server load, available memory, etc.
Thanks indeed!
OMFG I take back everything I just said. I'd run SQL profiling, forced indexes, repaired indexes, just never quite worked out that ANALYZE was the command I was looking for... (blush)
The EXPLAIN command
For those of you new to the EXPLAIN command, there's some info at mysql.com. But basically "using temporary" is bad and "using filesort" is bad.