Mercury on Ubuntu Lucid

sreyas

Hi,

I have installed Pantheon on Ubuntu (Lucid). Please note that I downgraded PHP to 5.2.10 (the Karmic version), since the site was giving some errors with Drupal 6.13.

Server software

Ubuntu Lucid
Apache/2.2.14
PHP 5.2.13 (cli)
Mysql 5.1.41-3ubuntu12.3
Varnish-2.1
Mercury 1.1

The site is working well, but from time to time the load increases tremendously and the server just hangs. It seems like PHP connections are not getting closed.
This is from error.log:

[Mon Jul 12 20:47:16 2010] [error] child died with signal 9
[Mon Jul 12 20:47:24 2010] [error] child died with signal 9
[Mon Jul 12 20:47:24 2010] [error] child died with signal 9
[Mon Jul 12 20:47:36 2010] [error] child died with signal 9
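
Signal 9 is SIGKILL, and Apache never sends that to its own children; on Linux the usual sender is the kernel OOM killer. A quick way to confirm (the log path varies by distro, and `dmesg` works as well):

```shell
#!/bin/sh
# Were the signal-9 deaths caused by the kernel OOM killer?
# On this Ubuntu box the kernel log goes to /var/log/messages.
grep -iE "oom-killer|out of memory|killed process" /var/log/messages
```

If lines like `apache2 invoked oom-killer` come back, the box is running out of RAM plus swap, not leaking connections.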

My server tunables

/etc/apache2/apache2.conf

export APACHE_MAXCLIENTS="10"

/etc/apparmor.d/usr.sbin.mysqld

export APPARMOR_MYSQLD=""

/etc/default/tomcat6

export TOMCAT_MEMORY="128"

/etc/default/varnish

export VARNISH_MEMORY="64"

/etc/memcached.conf

export MEMCACHED_MEMORY="128"

/etc/mysql/my.cnf

export INNODB_BUFFER_POOL_SIZE="64"

in bytes (i.e., 1 GB = 1073741824 bytes)

export INNODB_LOG_FILE_SIZE="1073741824"
export KEY_BUFFER_SIZE="8"
export MYSQL_MAX_CONNECTIONS="20"

/etc/php5/apache2/php.ini

export PHP_MEMORY="96"

/etc/php5/conf.d/apc.ini

export APC_MEMORY="128"
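
A quick sanity check on these numbers: with MaxClients at 10 and a 96 MB PHP limit, the worst-case committed memory already exceeds the RAM on this box. A rough back-of-envelope sketch, using the tunables above; the 1024 MB RAM figure is an assumption derived from the kernel log below (262144 pages × 4 kB):

```shell
#!/bin/sh
# Worst-case memory budget for the stack above (all values in MB).
# RAM_MB=1024 is an assumption based on "262144 pages RAM" in the log.
RAM_MB=1024
APACHE_MAXCLIENTS=10
PHP_MEMORY=96          # per-process PHP memory_limit
TOMCAT_MEMORY=128
VARNISH_MEMORY=64
MEMCACHED_MEMORY=128
APC_MEMORY=128
INNODB_BUFFER_POOL=64

PHP_TOTAL=$((APACHE_MAXCLIENTS * PHP_MEMORY))
TOTAL=$((PHP_TOTAL + TOMCAT_MEMORY + VARNISH_MEMORY + MEMCACHED_MEMORY + APC_MEMORY + INNODB_BUFFER_POOL))
echo "worst-case PHP: ${PHP_TOTAL} MB"           # 960 MB
echo "worst-case total: ${TOTAL} MB of ${RAM_MB} MB RAM"  # 1472 MB
```

Not every Apache child reaches the full 96 MB at once, but under sustained traffic the sum can approach this ceiling, which would push the box into swap.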

Could anyone tell me how I can make sure client connections are closed correctly? Because of this server load, my site and even my server go down. I think this happens after around 3-4 hours of running.

Regards
Sreyas

Comments

Update

sreyas

Hi,

Well, I think this has something to do with the OS itself. I am attaching some more errors from /var/log/messages:

Jul 16 12:04:31 srv kernel: [53987.519079] 247206 pages non-shared
Jul 16 12:16:55 srv kernel: [54736.388249] apache2 invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0
Jul 16 12:16:55 srv kernel: [54736.388257] Pid: 5376, comm: apache2 Not tainted 2.6.33.5-rscloud #2
Jul 16 12:16:55 srv kernel: [54736.388261] Call Trace:
Jul 16 12:16:55 srv kernel: [54736.388275] [] ? T.429+0x4f/0x145
Jul 16 12:16:55 srv kernel: [54736.388284] [] ? _raw_spin_unlock_irqrestore+0xf/0x10
Jul 16 12:16:55 srv kernel: [54736.388291] [] ? ___ratelimit+0xe2/0xfc
Jul 16 12:16:55 srv kernel: [54736.388296] [] ? T.428+0x37/0xfe
Jul 16 12:16:55 srv kernel: [54736.388301] [] ? __out_of_memory+0x140/0x157
Jul 16 12:16:55 srv kernel: [54736.388306] [] ? out_of_memory+0x82/0xac
Jul 16 12:16:55 srv kernel: [54736.388312] [] ? __alloc_pages_nodemask+0x489/0x57c
Jul 16 12:16:55 srv kernel: [54736.388319] [] ? read_swap_cache_async+0x54/0xe9
Jul 16 12:16:57 srv kernel: [54736.388324] [] ? swapin_readahead+0x57/0x98
Jul 16 12:16:57 srv kernel: [54736.388330] [] ? __raw_callee_save_xen_pte_val+0x11/0x1e
Jul 16 12:16:57 srv kernel: [54736.388336] [] ? handle_mm_fault+0x39a/0x6e0
Jul 16 12:16:57 srv kernel: [54736.388341] [] ? xen_force_evtchn_callback+0x9/0xa
Jul 16 12:16:57 srv kernel: [54736.388347] [] ? check_events+0x12/0x20
Jul 16 12:16:57 srv kernel: [54736.388351] [] ? check_events+0x12/0x20
Jul 16 12:16:57 srv kernel: [54736.388357] [] ? do_page_fault+0x277/0x293
Jul 16 12:16:57 srv kernel: [54736.388363] [] ? page_fault+0x25/0x30
Jul 16 12:16:57 srv kernel: [54736.388367] Mem-Info:
Jul 16 12:16:58 srv kernel: [54736.388369] DMA per-cpu:
Jul 16 12:16:58 srv kernel: [54736.388372] CPU 0: hi: 0, btch: 1 usd: 0
Jul 16 12:16:58 srv kernel: [54736.388375] CPU 1: hi: 0, btch: 1 usd: 0
Jul 16 12:16:58 srv kernel: [54736.388378] CPU 2: hi: 0, btch: 1 usd: 0
Jul 16 12:16:58 srv kernel: [54736.388381] CPU 3: hi: 0, btch: 1 usd: 0
Jul 16 12:16:58 srv kernel: [54736.388384] DMA32 per-cpu:
Jul 16 12:16:58 srv kernel: [54736.388387] CPU 0: hi: 186, btch: 31 usd: 58
Jul 16 12:16:58 srv kernel: [54736.388390] CPU 1: hi: 186, btch: 31 usd: 108
Jul 16 12:16:58 srv kernel: [54736.388393] CPU 2: hi: 186, btch: 31 usd: 116
Jul 16 12:16:58 srv kernel: [54736.388396] CPU 3: hi: 186, btch: 31 usd: 28
Jul 16 12:16:58 srv kernel: [54736.388402] active_anon:114370 inactive_anon:115510 isolated_anon:69
Jul 16 12:16:58 srv kernel: [54736.388403] active_file:123 inactive_file:2128 isolated_file:1
Jul 16 12:16:58 srv kernel: [54736.388405] unevictable:0 dirty:0 writeback:14 unstable:0
Jul 16 12:16:58 srv kernel: [54736.388407] free:1946 slab_reclaimable:2171 slab_unreclaimable:3364
Jul 16 12:16:58 srv kernel: [54736.388408] mapped:4531 shmem:7643 pagetables:12401 bounce:0
Jul 16 12:16:58 srv kernel: [54736.388416] DMA free:4020kB min:52kB low:64kB high:76kB active_anon:4708kB inactive_anon:4928kB active_file:24kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:13812kB mlocked:0kB dirty:0kB writeback:0kB mapped:148kB shmem:252kB slab_reclaimable:16kB slab_unreclaimable:228kB kernel_stack:104kB pagetables:12kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jul 16 12:16:58 srv kernel: [54736.388425] lowmem_reserve[]: 0 994 994 994
Jul 16 12:16:58 srv kernel: [54736.388437] DMA32 free:3764kB min:4004kB low:5004kB high:6004kB active_anon:452772kB inactive_anon:457112kB active_file:468kB inactive_file:8512kB unevictable:0kB isolated(anon):276kB isolated(file):4kB present:1018080kB mlocked:0kB dirty:0kB writeback:56kB mapped:17976kB shmem:30320kB slab_reclaimable:8668kB slab_unreclaimable:13228kB kernel_stack:2272kB pagetables:49592kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1586 all_unreclaimable? yes
Jul 16 12:16:58 srv kernel: [54736.388447] lowmem_reserve[]: 0 0 0 0
Jul 16 12:16:58 srv kernel: [54736.388454] DMA: 1*4kB 4*8kB 3*16kB 1*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 4020kB
Jul 16 12:16:58 srv kernel: [54736.388472] DMA32: 913*4kB 2*8kB 6*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3764kB
Jul 16 12:16:58 srv kernel: [54736.388490] 49539 total pagecache pages
Jul 16 12:16:58 srv kernel: [54736.388492] 39596 pages in swap cache
Jul 16 12:16:58 srv kernel: [54736.388495] Swap cache stats: add 14019230, delete 13979634, find 2273103/3914685
Jul 16 12:16:58 srv kernel: [54736.388499] Free swap = 648kB
Jul 16 12:16:58 srv kernel: [54736.388501] Total swap = 2097144kB
Jul 16 12:16:58 srv kernel: [54736.395069] 262144 pages RAM
Jul 16 12:16:58 srv kernel: [54736.395072] 6476 pages reserved
Jul 16 12:16:58 srv kernel: [54736.395075] 71705 pages shared
Jul 16 12:16:58 srv kernel: [54736.395077] 247738 pages non-shared
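
The log above shows the real problem: swap is exhausted (Free swap = 648 kB of a 2 GB total) at the moment apache2 invokes the oom-killer. A small snapshot script run every minute makes the run-up visible before the next kill; the file paths and cron schedule here are only suggestions:

```shell
#!/bin/sh
# mem-snapshot: one snapshot of memory/swap pressure and the biggest
# resident processes. Run from cron, e.g.:
#   * * * * * /usr/local/bin/mem-snapshot >> /var/log/mem-watch.log
date
free -m                                     # RAM and swap usage, in MB
ps -eo pid,rss,comm --sort=-rss | head -6   # top 5 processes by RSS
```

Watching which processes grow in the minutes before "Free swap" hits zero tells you whether it is the Apache/PHP children, MySQL, or the Java (Tomcat) process that eats the memory.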

Regards
Sreyas

Downgraded Apache

sreyas

Downgraded Apache to 2.2.10, but the server load still increases beyond control. Looking for other options! :(

Have you tried to profile

everyz

Have you tried to profile Drupal to understand what takes so long?

Swap

joshk

It looks like you're going into a swap spiral. You've also got a non-standard stack. You should profile your application to figure out what is happening with it, but this looks like a problem with Drupal and your traffic.
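
Profiling, as suggested here, can be done on a test copy of the site with Xdebug's profiler. A sketch of the php.ini settings, trigger-based so that only requests carrying XDEBUG_PROFILE are profiled; the zend_extension path is an assumption and varies by PHP build:

```ini
; /etc/php5/conf.d/xdebug.ini -- extension path is hypothetical, adjust to your build
zend_extension=/usr/lib/php5/20060613/xdebug.so
xdebug.profiler_enable = 0
; profile only requests that pass XDEBUG_PROFILE (GET/POST/cookie)
xdebug.profiler_enable_trigger = 1
xdebug.profiler_output_dir = /tmp/xdebug
```

The resulting cachegrind files can be opened in KCachegrind or webgrind to see which functions dominate time and memory.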

Hi josh

sreyas

Hi josh,

Actually, I tried this same website on Karmic as well. There my server configuration was:

Ubuntu Karmic
Apache/2.2.12 (Ubuntu)
PHP Version 5.2.10-2ubuntu6.4
Mysql 5.0.83-0ubuntu3 (Ubuntu)
varnish-2.0.4
Mercury 1.0

I followed these installation instructions: http://groups.drupal.org/node/50408

Earlier I had a perfectly working server for another site, so I think this one is due to Drupal. But I am not sure where it goes wrong, as there is not much information from dblog either.

Regards
Sreyas

Some more information

sreyas

Some more information.

This is a live site, and I have been moving it between multiple servers to get the best performance. While testing the website on a newly installed server, everything works perfectly; there is essentially no load (server load always less than 1). I even tested it using JMeter with 1000 users continuously for 4 hours, and the load never went above 1. The JMeter report also shows only a 0.4% error rate (which I think is normal).

Earlier the site was on a low-resource VPS with bare Drupal; now I have moved it to Rackspace Cloud with the Pantheon/Mercury profile, so it should definitely be giving the site a performance boost. Instead, the site was going down every 3-4 hours due to server load.

So testing is not working out here, as I do not have any problem with the site in the test environment. Server load increases only when the site is brought live.

Regards
Sreyas

xdebug report

sreyas

Showing the 20 most costly calls sorted by 'memory-own'.

                                                                        Inclusive        Own
function                                                        #calls  time     memory  time     memory
--------------------------------------------------------------------------------------------------------
MemcachePool->get                                                   88  0.2388 18694216  0.2388 18694216
module_list                                                        298  0.0718  9739040  0.0388  9304408
views_plugin_display->option_definition                            288  0.0374  9317408  0.0321  9295864
ob_start                                                           168  0.0040  7017848  0.0040  7017848
array_keys                                                        2200  0.0424  5640824  0.0424  5640824
func_get_args                                                     2284  0.0456  5168312  0.0456  5168312
str_replace                                                       5019  0.0886  5168008  0.0886  5168008
array_merge                                                        506  0.0130  4716592  0.0130  4716592
views_handler_field->option_definition                             250  0.0197  3983032  0.0154  3904872
drupal_load                                                        218  0.0624  4112664  0.0329  3890432
module_hook                                                      10694  0.5985  3513888  0.4189  3513888
t                                                                 2926  0.3052  3711784  0.1357  3484616
date_part_extract                                                  537  0.0274  3321384  0.0193  3291096
unserialize                                                        301  0.0192  2948712  0.0192  2948712
explode                                                           1449  0.0237  2667432  0.0237  2667432
variable_get                                                      4294  0.0896  2350280  0.0896  2350280
preg_replace                                                      3938  0.0864  1945864  0.0864  1945864
mysql_fetch_object                                                 915  0.0239  1840104  0.0239  1840104
_db_query_callback                                                3037  0.1914  1867120  0.1152  1757128
url                                                                651  5.0089  2557576  0.0876  1601952
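
One way to read the top line of this profile: MemcachePool->get moves roughly 18.7 MB across only 88 calls, i.e. around 212 kB per fetch on average, so a few large cached objects (the Views and module/variable caches, for instance) dominate memory rather than call count. The arithmetic, straight from the figures above:

```shell
#!/bin/sh
# Average payload per MemcachePool->get, from the xdebug report:
# 18,694,216 bytes of own memory over 88 calls.
BYTES=18694216
CALLS=88
echo "avg per get: $((BYTES / CALLS)) bytes"   # ~212 kB per fetch
```

The url() row is also worth a look: about 5 s of inclusive time across 651 calls, which in Drupal 6 is often path-alias lookup cost rather than url() itself.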

subscribing

bennos

subscribing
