Optimizing for many nodes and comments - database view

Events happening in the community are now at Drupal community events on www.drupal.org.
meba's picture

Hello, i am currently trying to optimize a website with 82000 nodes and 749414 comments. Unfortunately, it's badly developed, many modules, etc. but I can't do anything with it right now.

As far as I've been searching, frontpage took 800 (!) SQL queries to display. I used:

and I was able to reduce the frontpage to 357 queries, but still, it's too slow. I've eliminated all potentially slow queries and now I am getting highlighted (>5ms) queries just randomly, depending on the load on database server (still, one of the slowest is cache_filter, having around 239 000 rows)

Database is now:
Dual core Intel, 2.4GHz
1GB RAM (could be more, but swap is not used at all so I guess it may be enough.)

my.cnf:
key_buffer = 64M
key_buffer_size = 64M
max_allowed_packet = 16M
thread_stack = 128K
thread_cache_size = 32
max_connections = 256
table_cache = 220
tmp_table_size = 64M
skip-bdb
skip-innodb
query_cache_limit = 6M
query_cache_size = 256M

The site is typical community site - lots of nodes, comments, forums, galleries.
Also, even if I eliminated table locking, i am usually seeing some Locked queries in show processlist and Table_locks_immediate = 9763474, Table_locks_waited = 228974 (There is no other site running on the server).

Do anybody have any idea for optimizing a little bit more or is the database server just too slow for this? (According to AwStats, the website is loaded, but not too much, average 123 000 pages a day). Thanks!

Comments

SELECT DISTINCT(node.nid),

meba's picture

SELECT DISTINCT(node.nid), node_comment_statistics.last_comment_timestamp AS node_comment_statistics_last_comment_timestamp_last_comment_timestamp, node.type AS node_type, node.title AS node_title, node.changed AS node_changed, users.name AS users_name, users.uid AS users_uid, node_comment_statistics.comment_count AS node_comment_statistics_comment_count, node_comment_statistics.last_comment_timestamp AS node_comment_statistics_last_comment_timestamp FROM node node LEFT JOIN comments comments ON node.nid = comments.nid AND comments.uid = '7794' LEFT JOIN node_comment_statistics node_comment_statistics ON node.nid = node_comment_statistics.nid INNER JOIN users users ON node.uid = users.uid WHERE (node.status = '1') AND (node.uid = '7794' OR comments.uid = '7794') ORDER BY node_comment_statistics_last_comment_timestamp_last_comment_timestamp DESC LIMIT 0, 25;

also, MySQL server really likes this kind of query in slow log...

We seem to be hitting the same wall

mikeytown2's picture

http://drupal.org/node/624000

Above site can not run without Boost. Stats collection brought down MySQL very quickly when Boost disappeared for a short time. So I also recommend using Boost as it will free up MySQL & CPU time on your web server.

Memory Caching

Jamie Holly's picture

I would suggest setting up a memory caching system that CacheRouter supports. It gets rid of a lot of those cache queries.


HollyIT - Grab the Netbeans Drupal Development Tool at GitHub.

800 is a lot, but not excessive

kbahey's picture

Having seen a lot of sites in need of optimization, I would say that 400-500 queries per page is normal. 800 queries is a lot but not excessive. I have seen 1,000 too.

The 5 ms threshold is just a guideline, not a hard number.

What matters is which queries, not just the number of queries ...

What we need to see is how many milliseconds total and how many for database per page. If you are under 500 ms, then it is OK. If it is more, you have to know how much is spent in the database, then tune from that.

You may want to check performance logging and monitoring module but it is most effective when APC is being used, not eAccelerator.

Drupal performance tuning, development, customization and consulting: 2bits.com, Inc..
Personal blog: Baheyeldin.com.

Drupal performance tuning, development, customization and consulting: 2bits.com, Inc..
Personal blog: Baheyeldin.com.

I've copied the site to my

meba's picture

I've copied the site to my laptop and doing tests there, so:

Node view, first time:
Executed 283 queries in 1504.6 milliseconds. Queries taking longer than 5 ms and queries executed more than once, are highlighted. Page execution time was 9739.76 ms.
Node view, second time:
Executed 241 queries in 235.46 milliseconds. Queries taking longer than 5 ms and queries executed more than once, are highlighted. Page execution time was 548.72 ms.

Attaching a summary performance log. Above you can see that execution time is 9700ms, this is not usual as far as i know, but I should also look into optimizing code run, not only the database.

Hmm, the performance log didn't attach, you can see it here: http://temp.jakubsuchy.cz/performance.png
Also, this is mysql slow log: http://temp.jakubsuchy.cz/mysql-slow.log.gz

Accelerator and MySQL Cache

kbahey's picture

Jakub

Laptops have slower hard drives, and the power saving settings can really slow things down.

The discrepancy between first hit and subsequent ones is due to two things: accelerator, and mysql query cache. 9700 ms total and only 1500 of them in the database vs. 548 and 235 only next time. First is due to PHP eAccelerator, and second is due to mysql query cache.

The rest of the report has only one hit, so it would be a worse case. Look at the second line in the performance report: 5000+ ms total, and 1722 ms for queries.

I'd rather you put this on a server similar to what you will deploy on, and then start optimizing rather than going on a wild goose chase, and optimizing the wrong things.

Drupal performance tuning, development, customization and consulting: 2bits.com, Inc..
Personal blog: Baheyeldin.com.

Drupal performance tuning, development, customization and consulting: 2bits.com, Inc..
Personal blog: Baheyeldin.com.

It's already deployed, it's

meba's picture

It's already deployed, it's an existing website. MySQL above, web server is some quadcore (which is running several websites and wasn't performing well due to concurrency, that's why we made second mysql server before we buy dedicated just for the website). I can easily optimize there, but I guess running performance module would kill the server...(I will try)

APC

kbahey's picture

You can safely run it on high traffic sites if APC is enabled.

I would not recommend running it on a busy site with database writes, neither summary or detail mode.

I am also looking forward for patches with memcache. There is an issue in the queue for this.

Drupal performance tuning, development, customization and consulting: 2bits.com, Inc..
Personal blog: Baheyeldin.com.

Drupal performance tuning, development, customization and consulting: 2bits.com, Inc..
Personal blog: Baheyeldin.com.

So we have moved to

meba's picture

So we have moved to dedicated quad core (4x 2GHz AMD), 4GB RAM, 2x SATA in RAID1

I have tried to use cacherouter but unfortunately and I don't know why, the system takes too much processor load (we are using APC).

This is standard scenario:

9103 www-data  17   0  534m  38m  19m R   72  1.0   0:02.47 apache2                                                                                                            
9048 www-data  18   0  534m  40m  21m R   70  1.0   0:06.35 apache2                                                                                                            
9022 mysql     15   0 1071m 343m 3552 S   68  8.7 488:18.59 mysqld                                                                                                             
9005 www-data  16   0  534m  42m  22m S   29  1.1   0:10.88 apache2                                                                                                            
9038 www-data  15   0  541m  47m  22m S   29  1.2   0:09.95 apache2                

With cacherouter + memcache/apc it goes much higher and load goes over 30, with no iowait, so disk is not a bottleneck. I wasn't able to identify why the website takes too much processor load (and even on the old server, it wasn't this bad - old was 4x 2.4GHz Intel but with 150 more websites)

Memcached

Maxime Topolov's picture

Well we're running several huge sites with almost same config :

  1. Apc
  2. Memcached
  3. AuthCache (if authenticated user are using your site)
  4. Boost (very powerfull)
  5. Lighthattpd rather than apache
  6. Make sure you have a 1 (or 2) Front ends with high CPU, 1 DB with huge memory and 1 memcached with REALLY huge memory
  7. Make sure to remove Drupal Search, and replace by Google or Apache
  8. Take in account that until you reach 80-90% of hit rate on memcached you'll have a huge load average on your DB
  9. Try to reduce modules & views usage, specially views take many CPU and sometimes SQL

Maxime TOPOLOV
Adyax / http://www.adyax.com
+33 1 76 77 34 60

High performance

Group notifications

This group offers an RSS feed. Or subscribe to these personalized, sitewide feeds: