Fixes for MySQL Deadlocks in D7

mikeytown2's picture

I've been working on eliminating deadlocks and speeding up the database layer of Drupal. This is what I've come up with after helping out here: http://drupal.stackexchange.com/questions/104880/drupal-7-database-selec... Not everyone can use an alternative cache backend; thus getting the best performance out of MySQL is desired. Noted that converting some cache tables to memory would only be useful for tables that don't get written to very often & low concurrently due to table level locking. Also noted that this would only be useful on MySQL boxes that have the Improved MEMORY Storage Engine patch and if the the data column has been moved to be the last column due to Memory not supporting BLOB/TEXT column types; thus the recommendations below do not use the memory engine for cache tables.

Use READ COMMITTED for MySQL transactions
Patch for new D7 installs: https://drupal.org/node/1650930#comment-8599921
Code to add to settings.php for current installs: https://drupal.org/node/1650930#comment-8437127

Below are patches that are mainly helpful if you are using the database for the cache backend.
Do not use InnoDB for the semaphore table, use Memory
https://drupal.org/node/1898204
Due to limitations with core's database api creating a patch to fix this looks difficult to do. I want to use a BTREE index for memory tables due to this issue: http://www.mysqlperformanceblog.com/2008/02/01/performance-gotcha-of-mys...
Below is the SQL to switch this over to a memory table and change the primary key to be name, value instead of name & add a unique index on name.

ALTER TABLE semaphore ENGINE = MEMORY;
ALTER TABLE semaphore DROP PRIMARY KEY;
ALTER TABLE semaphore ADD PRIMARY KEY (name, value) USING BTREE;
ALTER TABLE semaphore ADD UNIQUE name (name) USING BTREE;
ALTER TABLE semaphore DROP INDEX value;
ALTER TABLE semaphore ADD INDEX value (value) USING BTREE;
ALTER TABLE semaphore DROP INDEX expire;
ALTER TABLE semaphore ADD INDEX expire (expire) USING BTREE;

Deadlocks occur in cache_field table.
Thinking about this and it seems like the cache_field table needs to be split up into 2 different caches, call one cache_field which stores entity field data and another called cache_field_info for storing cached info about fields. Splitting this up has already been fixed and changing this in D7 would be very hard to support; so for now here is a D7 patch that puts locks around field.info changes in order to prevent transactional rollbacks when they conflict.
https://drupal.org/node/2193149#comment-8607729

Waiting for table metadata lock on cache_field table
TRUNCATE is a DDL statement and thus locks the table. Renaming tables fixes this.
https://drupal.org/node/2222635#comment-8682483

Use asynchronous MySQLi connection to avoid RECORD LOCKS (Deadlocks) on cache tables
Writes to the same cid by different processes causes a record lock. By setting the deadlock timeout to 2 seconds instead of the default of 50; the huge slowdown caused by record locks can be avoided. Bonus points is that cache_set is now faster than cache_get.
https://www.drupal.org/node/2336521#comment-9255467

Comments

Wow, this looks like some

dalin's picture

Wow, this looks like some great scalability research Mike.

--
Dave Hansen-Lange
Technical Manager
Advomatic LLC
Great White North Office
Canada

Agreed. I've been looking

AshleyGerard's picture

Agreed. I've been looking through the archives and found this scalability research.

Voted up!

Slides

mikeytown2's picture

I've put together this and more information into a presentation called DIY Drupal 7 Performance. I'll be giving this talk remotely on August 19th and hopefully at the PNWDS October 17-19.

Looking forward to it,

christefano's picture

Looking forward to it, mikeytown2!

If anyone would like to attend in person or tune in remotely for mikeytown2's presentation, the meetup is the EXTREME DRUPAL MEETUP (yes, in all caps) from 7-9pm Pacific on August 19th:

   https://groups.drupal.org/node/434623  

Space is limited, so if attending in person please be sure to RSVP on Meetup.com:

   http://www.meetup.com/greater-los-angeles-drupal/events/196699562/

Thanks very much mikeytown2.

W.M.'s picture

Thanks very much mikeytown2.

numbers?

Jānis Bebrītis's picture

This seems fairly complicated for average user, can you give any approximate numbers on increased pageload when doing this? Thanks!

Numbers

mikeytown2's picture

These all have around zero impact on performance for a single request.

They have a big positive impact on scalability, which can help the performance stay the same when you have a lot of concurrent users.

I'm working on a module that will make following these guidelines simpler: https://www.drupal.org/project/apdqc. Once I have a proper release of APDQC, I will update my DIY Drupal 7 Performance slides to take advantage of this module.

Extremely interesting and

paean99's picture

Extremely interesting and useful.

I would share a link to the recording of one of your latest presentations DIY Drupal 7 Performance to go along with your slides.

Recorded at the Extreme Drupal Meetup (High Performance, DevOps & Large Scale Drupal) in Marina Del Rey, CA, on August 19, 2014.

no joy

brad.bulger's picture

applied all of these changes (except putting table in memory, which i can't do in this environment) and still getting deadlock errors on cache_field.

is there anything i can do to work around this?

Module

mikeytown2's picture

I created a module for this: https://www.drupal.org/project/apdqc

READ COMMITTED causing errors

pbattino's picture

After first step ("Use READ COMMITTED for MySQL transactions") I started to get errors every minute (must b every elysia cron run):

WD php: PDOException: SQLSTATE[HY000]: General error: 2014 Cannot        [error]
execute queries while other unbuffered queries are active.  Consider
using PDOStatement::fetchAll().  Alternatively, if your code is only
ever going to run against mysql, you may enable query buffering by
setting the PDO::MYSQL_ATTR_USE_BUFFERED_QUERY attribute.: SELECT 1
FROM {blocked_ips} WHERE ip = :ip; Array
(
    [:ip] => 127.0.0.1
)
in drupal_is_denied() (line 1917 of
../htdocs/includes/bootstrap.inc).
PDOException: SQLSTATE[HY000]: General error: 2014 Cannot execute queries while other unbuffered queries are active.  Consider using PDOStatement::fetchAll().  Alternatively, if your code is only ever going to run against mysql, you may enable query buffering by setting the PDO::MYSQL_ATTR_USE_BUFFERED_QUERY attribute.: SELECT 1 FROM {blocked_ips} WHERE ip = :ip; Array
(
    [:ip] => 127.0.0.1
)
in drupal_is_denied() (line 1917 of ../htdocs/includes/bootstrap.inc).
Drush command terminated abnormally due to an unrecoverable error.       [error]

Mysql 5.1.73 / Php 5.5.19 / Drupal 7.36

update mysql to mysqlnd

mikeytown2's picture

http://stackoverflow.com/questions/13159518/how-to-enable-mysqlnd-for-php

Interesting thing is drupal sets the PDO::MYSQL_ATTR_USE_BUFFERED_QUERY to TRUE in DatabaseConnection_mysql::__construct() as the error msg says that this is one way of fixing it. Long story short the mysql package is bad mysqlnd is good.

PDO::MYSQL_ATTR_USE_BUFFERED_QUERY

pbattino's picture

Thanks, I was getting to the same conclusions, but since we have such a complicate set up on the stack I was not able to quickly change the mysql package to check. Good, I'll have a try.

I spoke too soon.... we ARE

pbattino's picture

I spoke too soon....
we ARE using mysqlnd !!!
what could be wrong???

mysql
MySQL Support   enabled
Active Persistent Links     0
Active Links  0
Client API version    mysqlnd 5.0.8-dev - 20102224 - $Id: 731e5b87ba42146a687c29995d2dfd8b4e40b325 $

Directive   Local Value Master Value
mysql.allow_local_infile   On  On
mysql.allow_persistent   On  On
mysql.connect_timeout    60  60
mysql.default_host   no value    no value
mysql.default_password no value    no value
mysql.default_port no value    no value
mysql.default_socket   /mysql/mysqld.sock  /mysql/mysqld.sock
mysql.default_user   no value    no value
mysql.max_links    Unlimited   Unlimited
mysql.max_persistent  Unlimited   Unlimited
mysql.trace_mode  Off Off

mysqlnd
mysqlnd    enabled
Version     mysqlnd 5.0.8-dev - 20102224 - $Id: 731e5b87ba42146a687c29995d2dfd8b4e40b325 $
Compression  supported
SSL   supported
Command buffer size   4096
Read buffer size   32768
Read timeout  31536000
Collecting statistics  Yes
Collecting memory statistics    Yes
Tracing     n/a

update mysql 5.1 to 5.6

mikeytown2's picture

All I can offer at this point then is to update MySQL to 5.6 :/

Do you think this can help?

pbattino's picture

Do you think this can help? I'll see if we can do it, in fact other parts of our organization looked at us in horror when I said "we use mysql 5.1", noting that lock related problems are much better in newer versions.

Anyway I ended up on this page because of a lock problem but ultimately it looks like it's the last versions Rules (2.8 and 2.9) that are causing it, reverting to 2.7 fixed the problem.

Congrats for your excellent work anyway! I look forward to try apdqc !

READ-COMMITTED hosing site

ChaseOnTheWeb's picture

I tried adding the line

$databases['default']['default']['init_commands']['isolation'] = "SET SESSION tx_isolation='READ-COMMITTED'";

to an existing site on our test server (RHEL 6, PHP 5.3, MySQL 5.1, APC, memcached), but when I do, the site just hangs indefinitely. Same for drush. No error message; it just doesn't do anything. Nothing in the logs.

Take out the line and all's good again.

Any ideas?

DeadLock on advancedqueue table

enorniel's picture

I have a Deadlock on the advancedqueue table:

PDOException: SQLSTATE[HY000]: General error: 1205 Lock wait timeout exceeded; try restarting transaction: UPDATE {advancedqueue} SET status=:db_update_placeholder_0, expire=:db_update_placeholder_1
WHERE  (item_id = :db_condition_placeholder_0) AND (expire = :db_condition_placeholder_1) ; Array
(
    [:db_update_placeholder_0] => 0
    [:db_update_placeholder_1] => 1432376555
    [:db_condition_placeholder_0] => 646
    [:db_condition_placeholder_1] => 0
)
in AdvancedQueue->claimItem() (line 67 of /var/www/sites/all/modules/advancedqueue/advancedqueue.queue.inc).

It happens because (I suppose) I have multiple queues running and 2 or more of them hit the row to claim the item at the same time.

Do you have any idea how to avoid that ?

Thanks.

db_merge

mikeytown2's picture

db_merge would most likely help to minimize the occurrence of this issue. You can also look at the latest deadlock in detail by running SHOW ENGINE innodb STATUS (from slide #102 of my DIY Drupal 7 Performance presentation). Just a heads up that the default timeout is 50 seconds; you can make this smaller by adjusting innodb_lock_wait_timeout.

enorniel's picture

My infra is based on mariadb galera and I have advancedqueue on multiple web servers in an attempt to parrallelize processes and avoid failures. This error occurs when 2 advancedqueues try to claim an Item at the same time. They lock the row together to update it.

Is there a good workaround ?

Thanks.

use db_merge instead of db_update

mikeytown2's picture

If I were you I'd change the php code so it uses db_merge instead of db_update and post the patch to the modules issue queue. If that doesn't fix it I'd look into using lock_acquire and friends to prevent the database locking from happening. Both solutions require creating a patch for the module.

Faisability

enorniel's picture

I am looking at the faisablity.

The problem is that the claimItem function needs to return values from the select which I think is impossible with db_merge.

lock_acquire

mikeytown2's picture

<?php
     
if ($item) {
       
// Acquire the lock without releasing it to enforce the lease time.
       
if (lock_acquire('advancedqueue:claimItem:' . $item->item_id, $lease_time)) {
         
// Try to update the item. Only one thread can succeed in UPDATEing the
          // same row. We cannot rely on REQUEST_TIME because items might be
          // claimed by a single consumer which runs longer than 1 second. If we
          // continue to use REQUEST_TIME instead of the current time(), we steal
          // time from the lease, and will tend to reset items before the lease
          // should really expire.
         
$update = db_update('advancedqueue')
            ->
fields(array(
             
'status' => ADVANCEDQUEUE_STATUS_PROCESSING,
             
'expire' => time() + $lease_time,
            ))
            ->
condition('item_id', $item->item_id)
            ->
condition('expire', 0);
         
// If there are affected rows, this update succeeded.
         
if ($update->execute()) {
           
$item->data = unserialize($item->data);
            return
$item;
          }
        }
      }
?>

This will mean that only 1 process can update that item and it can only be updated every 30 seconds. To make this more robust I would release the lock where claimItem gets called like so; but this is not required, it will work with just the change above.

<?php
   
while ($item = $queue->claimItem()) {
      if (
time() > $end) {
       
// We've reached max execution time.
       
lock_release('advancedqueue:claimItem:' . $item->item_id);
        return;
      }
     
advancedqueue_process_item($queue, $queue_name, $queue_info, $item, $end);
     
lock_release('advancedqueue:claimItem:' . $item->item_id);
    }
?>

I had an interesting case

dgtlmoon's picture

I had an interesting case recently, turns out that even when your site is in 'maintenance mode' this does NOT mean that your database is totally unaffected by public traffic. I was getting the dreaded lock on cache_fields whilst doing an upgrade on a site which was in maintenance_mode

If there is a request for a page that is NOT in the cache, it causes Drupal to execute all hook_init implementations across your active modules, many commonly used modules use this opportunity to update 'cache_field' either directly on indirectly

On a large installation there can be over 100 different hook implementations that are fired! even when in maintenance!

Including hook_init, hook_node_info, field_info_alter, entity_load etc etc and importantly hook_init

Fortunately the website I was working on is mostly used as a headless Drupal install to feed some other websites, and that hook_boot runs before any caching or page processing runs

// Implements HOOK_BOOT
function sadhack_boot() {
  if (variable_get('maintenance_mode', FALSE)) {
    if(preg_match('/^(node|json|component|etcetc)/i', $_GET['q'])) {
      header($_SERVER['SERVER_PROTOCOL'] . ' 503 Service Unavailable', TRUE, 503);
      exit();
    }
  }
}

The issue for me was that whilst I was reverting features (to update with a new field), requests were hitting the site and causing those hooks to fire entries (yes, page cache was on but sometimes they hit an uncached page, it only took one!) and then whammo, cache_field deadlock whilst my poor feature was trying to revert

Moral of the story - Drupal does not do what you think it does, always UNPLUG YOUR DRUPAL :) and then run your updates if its critical (fields etc)

One workaround for this is

dalin's picture

One workaround for this is instead of enabling maintenance mode, deploy an index.html file (and some modifications to .htaccess) that shows the maintenance page.

--
Dave Hansen-Lange
Technical Manager
Advomatic LLC
Great White North Office
Canada

marko_mmc's picture

You my friend are a genius! - This solved PDOException: SQLSTATE[HY000]: General error: 1205 Lock wait problem for me after hours of going through various fixes.

deadlock in openscholar dist. over drupal7

gonchiponchi's picture

hi,

Thanks it worked for us!!
We have 500 websites on openscholar using a strong MYSQL server 5.7 . we've hundreds of users working together and we started to get deadlock for each cache clean in the last months and became impossible to work. I fixed standard mysql def. to avoid the server fail on deadlock. this def. makes rollback on timeout, it only permits us keep control over the database, before this fix we lost the server.

innodb_deadlock_detect = 0
innodb_lock_wait_timeout = 20
innodb_rollback_on_timeout = 1

I altered the semaphore table and it's really worked, the cache clean still work slow but we don't have deadlock. I hope it's enough because we can't add patches to the drupal code because we are using a special distribution, not a clean drupal7. This distribution run OG modules, they change many of the standard drupal7 features but the cache management is standard.
Also i'm checking the transaction isolation change in the setting file ,
perhaps it improves the performance .

Sounds like instead of using

dalin's picture

Sounds like instead of using the DB for the cache, you should switch the cache back-end to Redis
https://www.drupal.org/project/memcache
or Memcache
https://www.drupal.org/project/redis

Redis in particular is fairly easy to get set up, and the performance gain on a site like yours should be quite significant.

--
Dave Hansen-Lange
Technical Manager
Advomatic LLC
Great White North Office
Canada

memcache

gonchiponchi's picture

yes, i read we need to work with memcache as less , thanks, i 'll check redis.

Before this i didn't work on drupal7 with a heavy traffic and users working on data .

The standard configuration was enough to keep the sites working also with a lot of

traffic but not like this , It was catastrophic.

Use READ COMMITTED for MySQL transactions

gonchiponchi's picture

i changed the default transaction def. in the setting file , it improve the performance a lot, using this
$databases['default']['default']['init_commands'] = array(
'isolation' => "SET SESSION tx_isolation='READ-COMMITTED'"
);
we reduce the qps on the database from over 1000 , usually around 1500 to less from 1000 around 600 qps. The transactions default slowed the database work and it caused to the websites load be slowest than the expected for us.
Next change will be the memcached support , i tested it in develop environment is very easy to configure it in drupal. i expect to reduce the qps by half as less perhaps more.

memcache

gonchiponchi's picture

We started to work with memcache and it cut the qps to half almost and also reduce the CPU use. We were before with constant 1/3 until 2/3 CPU power in use after optimization changes in the tables and transactions. Now the database use 20% CPU most the time and it doesn't grown more than 50% . The sites are working quickly in particular for occasional users reading information . i think we can hold more than 1000 websites without performance problems with this configuration.
thanks to the high performance people they wrote this article and users for their comments. I hope this post will be util for another users having performance problems in drupal 7 or drupal 7 distributions.

disable the memcache

gonchiponchi's picture

After 3 weeks i disable the memcache due to problems with the data updates through the CMS . Users managers had all kid of problems due to the cache. I though that drupal doesn't make cache with admin status but it made and bring us a lot of problems. May be that the problem is the in php memcache client, it's not stable , i'm working with php 7.0.22 and memcache client 3.0.9-dev, perhaps this is a bad library.
Also still we have once in the day dead lock, i must reboot the database. i'll change
innodb_lock_wait_timeout to 2 seconds , i hope this may help .
We are using MYSQL 5.7 on UBUNTU 16