Are we squeezing too much?

chrisarusso's picture

Since attending DrupalCon Chicago (and really well before), we have tried to absorb as much advice as we could from the performance community and implement it.

We haven't yet implemented Varnish, Memcache, or APC, but we would like to take Varnish as our next step. Memory is a bit of a concern for us: we have a handful of services all running on one 8GB machine with a fair amount of daily traffic. I apologize for the loads of data ahead, but I figured it was better to write too much than not enough.

We are primarily concerned with:
1) Are we asking for too much out of one box? Is it RAM, CPU, something else? What is our bottleneck?
2) If we aren't asking too much, what are realistic targets we can achieve in terms of page load time, and what might be a safe resource allocation for Varnish (and everything else, really) when we turn it on?
3) If we are asking too much, what would be the best use of additional hardware?

Our environment

Traffic Metrics

Average Per Day (pageviews) 48,786
Average Per Visit (views per visitor) 1.4
99.9% traffic is anonymous (may be growing soon, but always overwhelmingly anon)
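
For a rough sense of scale, the daily pageview figure above can be converted into an average request rate. The sketch below is back-of-the-envelope arithmetic only: the peak factor of 3 is an assumed multiplier, not a measured value, and the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(pageviews_per_day):
    """Average pageviews per second, spread evenly over a full day."""
    return pageviews_per_day / SECONDS_PER_DAY


def peak_rate(pageviews_per_day, peak_factor=3.0):
    """Assumed busy-hour rate: the average scaled by a guessed peak factor."""
    return average_rate(pageviews_per_day) * peak_factor


if __name__ == "__main__":
    daily = 48786  # average pageviews per day, from the metrics above
    print(f"average pageviews/s: {average_rate(daily):.2f}")
    print(f"assumed peak pageviews/s: {peak_rate(daily):.2f}")
```

Even with a generous peak factor this works out to only a few page builds per second, which suggests the number of resources requested per page matters more than the raw pageview count.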

Quick breakdown of services all on one box

Apache
Solr
MySQL
Soon to be varnish?

Server Stats

2xE5[34]*; 8GB; 4 drive bays; 80GB+80GB+80GB+80GB disks; Ubuntu 8.04 (LTS); CP: none; DT: 1000GB of Data Transfer;

chris@hefty:~/w/files$ sudo cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU E5410 @ 2.33GHz
stepping : 6
cpu MHz : 2333.406
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr dca sse4_1 lahf_lm
bogomips : 4669.56
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU E5410 @ 2.33GHz
stepping : 6
cpu MHz : 2333.406
cache size : 6144 KB
physical id : 1
siblings : 4
core id : 0
cpu cores : 4
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr dca sse4_1 lahf_lm
bogomips : 4666.88
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:

processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU E5410 @ 2.33GHz
stepping : 6
cpu MHz : 2333.406
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 4
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr dca sse4_1 lahf_lm
bogomips : 4666.87
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:

processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU E5410 @ 2.33GHz
stepping : 6
cpu MHz : 2333.406
cache size : 6144 KB
physical id : 1
siblings : 4
core id : 1
cpu cores : 4
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr dca sse4_1 lahf_lm
bogomips : 4666.88
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:

processor : 4
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU E5410 @ 2.33GHz
stepping : 6
cpu MHz : 2333.406
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 2
cpu cores : 4
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr dca sse4_1 lahf_lm
bogomips : 4699.07
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:

processor : 5
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU E5410 @ 2.33GHz
stepping : 6
cpu MHz : 2333.406
cache size : 6144 KB
physical id : 1
siblings : 4
core id : 2
cpu cores : 4
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr dca sse4_1 lahf_lm
bogomips : 4666.88
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:

processor : 6
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU E5410 @ 2.33GHz
stepping : 6
cpu MHz : 2333.406
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr dca sse4_1 lahf_lm
bogomips : 4666.86
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:

processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU E5410 @ 2.33GHz
stepping : 6
cpu MHz : 2333.406
cache size : 6144 KB
physical id : 1
siblings : 4
core id : 3
cpu cores : 4
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr dca sse4_1 lahf_lm
bogomips : 4666.88
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:

chris@hefty:~/w/files$ sudo cat /proc/meminfo
MemTotal: 8190564 kB
MemFree: 2412312 kB
Buffers: 569200 kB
Cached: 3065432 kB
SwapCached: 340012 kB
Active: 3712860 kB
Inactive: 1595000 kB
SwapTotal: 2048276 kB
SwapFree: 1531252 kB
Dirty: 4984 kB
Writeback: 324 kB
AnonPages: 1670104 kB
Mapped: 66756 kB
Slab: 401712 kB
SReclaimable: 326596 kB
SUnreclaim: 75116 kB
PageTables: 23068 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 6143556 kB
Committed_AS: 2525548 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 25220 kB
VmallocChunk: 34359712907 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
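
The /proc/meminfo figures above are easier to read once the reclaimable page cache is separated from memory that is really in use. The sketch below parses meminfo-style text and reports two numbers; treating Buffers and Cached as fully reclaimable is a common approximation, not an exact accounting, and the function names are illustrative.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integers (kB or counts)."""
    values = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def summarize(mem):
    """Return (free + buffers + cached, swap in use), both in kB."""
    available = mem["MemFree"] + mem["Buffers"] + mem["Cached"]
    swap_used = mem["SwapTotal"] - mem["SwapFree"]
    return available, swap_used


# Subset of the values posted above.
SAMPLE = """MemTotal: 8190564 kB
MemFree: 2412312 kB
Buffers: 569200 kB
Cached: 3065432 kB
SwapTotal: 2048276 kB
SwapFree: 1531252 kB"""

if __name__ == "__main__":
    available, swap_used = summarize(parse_meminfo(SAMPLE))
    print(f"free + buffers + cached: {available // 1024} MB")
    print(f"swap in use: {swap_used // 1024} MB")
```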

Web server

Apache/2.2.8 (Ubuntu)
PHP 5.2.4-2ubuntu5.14
KeepAlive Off (recently switched... seemed to be hurting us on)

Database server

MySQL 5.0.51a (is this too old?)
Storage Engine: InnoDB (recent switch... though 2 of ~130 tables are still MyISAM)

From drupal admin/reports/status/sql

SQL
Command counters
Variable Value Description
Com_select 8 The number of SELECT-statements.
Com_insert 0 The number of INSERT-statements.
Com_update 0 The number of UPDATE-statements.
Com_delete 0 The number of DELETE-statements.
Com_lock_tables 0 The number of table locks.
Com_unlock_tables 0 The number of table unlocks.
Query performance
Variable Value Description
Select_full_join 0 The number of joins without an index; should be zero.
Select_range_check 0 The number of joins without keys that check for key usage after each row; should be zero.
Sort_scan 0 The number of sorts done without using an index; should be zero.
Table_locks_immediate 166401438 The number of times a lock could be acquired immediately.
Table_locks_waited 2529 The number of times the server had to wait for a lock.
Query cache information

The MySQL query cache can improve performance of your site by storing the result of queries. Then, if an identical query is received later, the MySQL server retrieves the result from the query cache rather than parsing and executing the statement again.
Variable Value Description
Qcache_queries_in_cache 4396 The number of queries in the query cache.
Qcache_hits 272403531 The number of times MySQL found previous results in the cache.
Qcache_inserts 102608966 The number of times MySQL added a query to the cache (misses).
Qcache_lowmem_prunes 66772564 The number of times MySQL had to remove queries from the cache because it ran out of memory. Ideally should be zero.
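
The query cache counters above can be reduced to two ratios that are easier to compare over time. The sketch below uses one simple definition, hits divided by hits plus inserts, which is an assumption for illustration and is not necessarily the formula any particular tuning tool uses.

```python
def query_cache_ratios(hits, inserts, lowmem_prunes):
    """Return (hit ratio, share of inserted queries later pruned for memory)."""
    lookups = hits + inserts
    hit_ratio = hits / lookups if lookups else 0.0
    prune_ratio = lowmem_prunes / inserts if inserts else 0.0
    return hit_ratio, prune_ratio


if __name__ == "__main__":
    # Qcache_hits, Qcache_inserts, Qcache_lowmem_prunes from the table above
    hit, prune = query_cache_ratios(272403531, 102608966, 66772564)
    print(f"hit ratio: {hit:.1%}")
    print(f"pruned per insert: {prune:.1%}")
```

A high prune-per-insert share suggests cached results are being evicted for lack of memory before they can be reused, which is the usual argument for a larger cache.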

From Drupal contrib module DB Tuner

Queries
Uptime in seconds: 880393
Uptime: 10d 4h 33m 13s
Questions: 415406057
% slow queries: 0.00580203383987
slow query rate: 0.000569399942713 per day
Long query time: 1
Slow query logging: ON
% reads: 79.6215234635
% writes: 20.3784765365
qps: 471.841617323
reads per sec: 0.0781389632499 per day
writes per sec: 0.0199990273975 per day
Queries: 471.841617323 per second
Connections: 2 Million
Bytes sent: 3751 Billion
Bytes received: 97 Billion

versions
Supported Version: 5
Release Series: 5.0
version less then 5.1, upgrade!
(substr("version",0,3) ne "5.1")
(substr("5.0.51a-3ubuntu5.8-log",0,3) '5.0' !== "5.1")
Minor Version: 51
Distribution: (Ubuntu)

MySQL Architecture: x86_64

Query cache
Query cache efficiency (%): 71.5765302197
% query cache used: 26.983165741
The query cache is not being fully utilized.
(Qcache_free_memory / query_cache_size * 100 <80)
(4527024 / 16777216 * 100 26.983165741<80)
Query cache low memory prunes: 75.9171688098 per second
Increase query_cache_size -- there are too many low memory prunes.
(&hr_bytime(Qcache_lowmem_prunes/Uptime_since_flush_status) =~ /second|minute/)
(dbtuner_hr_bytime(66836944/880393) dbtuner_stristr('75.9171688098 per second', array('second', 'minute')))
Query cache size: 16.0 Mb
Query cache min result size: 1.0 Mb
The max size of the result set in the query cache is the default of 1 Mb. Changing this (usually by increasing) may increase efficiency.
(&hr_bytes(query_cache_limit) eq "1.0 Mb")
(dbtuner_hr_bytes(1048576) '1.0 Mb' === "1.0 Mb")

Sorts
Total sorts: 10036528
% sorts that cause temporary tables: 0.14190166161
rate of sorts that cause temporary tables: 58.2367192833 per hour
sort_buffer_size: 2.0 Mb
read_rnd_buffer_size: 256.0 Kb
Sort rows: 12444.3930256 per second
There are lots of rows being sorted. Consider using indexes in more queries to avoid sorting too often.
(&hr_bytime(Sort_rows/Uptime_since_flush_status) =~ /second|minute/)
(dbtuner_hr_bytime(10955956509/880393) dbtuner_stristr('12444.3930256 per second', array('second', 'minute')))

Joins,scans
rate of joins without indexes: 2.17890873735 per second
There are too many joins without indexes -- this means that joins are doing full table scans.
(&hr_bytime((Select_range_check + Select_scan + Select_full_join)/Uptime_since_flush_status) =~ /second|minute/)
(dbtuner_hr_bytime((0 + 1801031 + 117265)/880393) dbtuner_stristr('2.17890873735 per second', array('second', 'minute')))
rate of reading first index entry: 4.73679595363 per second
The rate of reading the first index entry is high; this usually indicates frequent full index scans.
(&hr_bytime(Handler_read_first/Uptime_since_flush_status) =~ /second|minute/)
(dbtuner_hr_bytime(4170242/880393) dbtuner_stristr('4.73679595363 per second', array('second', 'minute')))
rate of reading fixed position: 519.470768168 per second
The rate of reading data from a fixed position is high; this indicates many queries need to sort results and/or do a full table scan, including join queries that do not use indexes.
(&hr_bytime(Handler_read_rnd/Uptime_since_flush_status) =~ /second|minute/)
(dbtuner_hr_bytime(457338428/880393) dbtuner_stristr('519.470768168 per second', array('second', 'minute')))
rate of reading next table row: 20625.7471254 per second
The rate of reading the next table row is high; this indicates many queries are doing full table scans.
(&hr_bytime(Handler_read_rnd_next/Uptime_since_flush_status) =~ /second|minute/)
(dbtuner_hr_bytime(18158763389/880393) dbtuner_stristr('20625.7471254 per second', array('second', 'minute')))
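
Each rate in this report is a status counter divided by the uptime, expressed in the smallest time unit where the value reaches at least 1, and a check fires when that unit is seconds or minutes. The sketch below reproduces that behaviour as inferred from the report output; it is not taken from the module's source, and the function names are hypothetical.

```python
UNITS = (("second", 1), ("minute", 60), ("hour", 3600), ("day", 86400))


def human_rate(count, uptime_seconds):
    """Scale count/uptime to the smallest unit where the rate is >= 1.

    Falls back to "per day" when the rate is below 1 even at that scale.
    """
    per_second = count / uptime_seconds
    for name, seconds in UNITS:
        value = per_second * seconds
        if value >= 1 or name == "day":
            return value, name


def is_flagged(count, uptime_seconds):
    """Assumed rule: a rate counts as high when it is per second or per minute."""
    _, unit = human_rate(count, uptime_seconds)
    return unit in ("second", "minute")


if __name__ == "__main__":
    uptime = 880393
    for label, count in (("Handler_read_rnd_next", 18158763389),
                         ("Table_locks_waited", 2529)):
        value, unit = human_rate(count, uptime)
        print(f"{label}: {value:.2f} per {unit}, flagged={is_flagged(count, uptime)}")
```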

temp tables
tmp_table_size-max_heap_table_size: 16777216
tmp_table_size and max_heap_table_size are not the same.
(tmp_table_size-max_heap_table_size !=0)
(33554432-16777216 16777216!=0)
tmp_table_size: 32.0 Mb
max_heap_table_size: 16.0 Mb
% temp disk tables: 33.6876432074
Too many temporary tables are being written to disk. Increase max_heap_table_size and tmp_table_size.
(Created_tmp_disk_tables / (Created_tmp_tables + Created_tmp_disk_tables) * 100 >25)
(4037397 / (7947404 + 4037397) * 100 33.6876432074>25)
temp disk rate: 4.58590311372 per second
Too many temporary tables are being written to disk. Increase max_heap_table_size and tmp_table_size.
(&hr_bytime(Created_tmp_disk_tables/Uptime_since_flush_status) =~ /second|minute/)
(dbtuner_hr_bytime(4037397/880393) dbtuner_stristr('4.58590311372 per second', array('second', 'minute')))
temp table rate: 9.02710948406 per second
Too many intermediate temporary tables are being created; consider increasing sort_buffer_size (sorting), read_rnd_buffer_size (random read buffer, ie, post-sort), read_buffer_size (sequential scan).
(&hr_bytime(Created_tmp_tables/Uptime_since_flush_status) =~ /second|minute/)
(dbtuner_hr_bytime(7947404/880393) dbtuner_stristr('9.02710948406 per second', array('second', 'minute')))

MyISAM index cache
MyISAM key buffer size: 16.0 Mb
max % MyISAM key buffer ever used: 81.7626953125
MyISAM key buffer (index cache) % used is low. You may need to decrease the size of key_buffer_size, re-examine your tables to see if indexes have been removed, or examine queries and expectations about what indexes are being used.
((Key_blocks_used)*key_cache_block_size/key_buffer_size * 100 <95)
((13396)*1024/16777216 * 100 81.7626953125<95)
% MyISAM key buffer used: 19.9279785156
MyISAM key buffer (index cache) % used is low. You may need to decrease the size of key_buffer_size, re-examine your tables to see if indexes have been removed, or examine queries and expectations about what indexes are being used.
((1-Key_blocks_unused*key_cache_block_size/key_buffer_size) * 100 <95)
((1-13119*1024/16777216) * 100 19.9279785156<95)
% index reads from memory: 96.0059189303

other caches
table open cache size (5.1+): table_open_cache
Size of the table cache
(table_open_cache >-1)
(table_open_cache table_open_cache>-1)
rate of table open: 8.8613959902 per second
The rate of opening tables is high, increase table_open_cache to avoid this.
(&hr_bytime(Opened_tables/Uptime_since_flush_status) =~ /second|minute/)
(dbtuner_hr_bytime(7801511/880393) dbtuner_stristr('8.8613959902 per second', array('second', 'minute')))
% open files: 0.68359375
rate of open files: 0.686965934532 per day
Immediate table locks %: 99.9984796862
Table lock wait rate: 10.3535580133 per hour
thread cache: 8
Total threads created: 16083
thread cache hit rate %: 0.00642780835435
Threads that are slow to launch: 4
There are too many threads that are slow to launch
(Slow_launch_threads >0)
(4 4>0)
Slow launch time: 2

Connections
% connections used: 22.6666666667
Max connections used: 102
Max connections limit: 450
% aborted connections: 3.99664761198E-5
rate of aborted connections: 0.0981379906474 per day
% aborted clients: 0.013348803024
rate of aborted clients: 1.36575370318 per hour

InnoDB
Is InnoDB enabled?: YES
% innoDB log size: 62.5
InnoDB log file size is not an appropriate size, in relation to the InnoDB buffer pool. Consider changing either innodb_log_file_size or innodb_buffer_pool_size
(innodb_log_file_size / innodb_buffer_pool_size * 100 >=0)
(5242880 / 8388608 * 100 62.5>=0)
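
The two byte counts in the InnoDB check above are easier to judge in megabytes. The sketch below only converts units and repeats the ratio from the report; the helper names are illustrative.

```python
MB = 1024 * 1024


def to_mb(size_bytes):
    """Convert a byte count to megabytes."""
    return size_bytes / MB


def log_to_pool_percent(log_file_bytes, buffer_pool_bytes):
    """InnoDB log file size as a percentage of the buffer pool size."""
    return log_file_bytes / buffer_pool_bytes * 100


if __name__ == "__main__":
    log_file, pool = 5242880, 8388608  # values from the report above
    print(f"innodb_log_file_size: {to_mb(log_file):.0f} MB")
    print(f"innodb_buffer_pool_size: {to_mb(pool):.0f} MB")
    print(f"log / pool: {log_to_pool_percent(log_file, pool):.1f}%")
```

An 8 MB buffer pool on an 8 GB machine means InnoDB can keep very little table data in memory, so the pool size itself is probably a more interesting number here than the ratio.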

other
MyISAM concurrent inserts: 1

INSERT DELAYED USAGE

Delayed_errors 0

Delayed_insert_threads 0

Delayed_writes 0

Not_flushed_delayed_rows

Drupal environment

Pressflow 6.20

Modules

148 enabled including 23 custom
CDN 6.x-2.1 (origin pull, two different IPs and subdomains...cookie free... all pointing to the same box)
Boost 6.x-1.18 (I've read dev is better)
Solr 6.x-1.2

GTMetrix Analysis:
Homepage on average about 2MB and 200 resources (yes I know that's a lot... we have a lot of media).
I have used image reduction software (jpegoptim) to help with size
Our scores are roughly 83 and 60 for Google Page Speed and YSlow respectively

Thanks so much for your advice!
- Chris

Comments

dalin's picture

From the information that you've given, it appears that some tweaking should go a long way. You do have a really big box here. Some quick thoughts:

  • Between APC, Varnish, and Memcache, I would start with APC. It's the easiest to set up and will offer you the biggest overall improvement.
  • From the output of dbtuner, it looks like you are running a fairly stock MySQL configuration, which may be what's hurting you the most (I'm just guessing here, as I can't see how many queries are run per page, how long those take, or how hard MySQL is running). I would try swapping my.cnf for my-innodb-heavy-4G.cnf (though maybe decrease innodb_buffer_pool_size to 1GB for now to keep the server out of swap, and increase it later if necessary). I also recommend the following InnoDB settings:
    innodb_flush_log_at_trx_commit = 0
    innodb_flush_method = O_DIRECT
  • You might increase the MySQL query cache to 64M.

But I also don't recommend blindly implementing my recommendations above. Do some research to understand what the config options are before you tweak them.

My jaw drops when you say 200 resources. Media heavy sites that I work with sometimes start at about 150, but with spriting and aggregating we often get that down to < 60. I'm guessing that your initial page load time on IE7 is upwards of 15 seconds. No one will stick around for that. They'll start leaving at around the 4 second mark.

--


Dave Hansen-Lange
Director of Technical Strategy, Advomatic.com
Pronouns: he/him/his

chrisarusso's picture

Dave,

Will the my-innodb-heavy-4G.cnf be problematic given I still have two MyISAM tables?

I am also a little fearful that giving APC 256MB of RAM, which it seems it'll need, may be pushing it. We hover around swap during busy parts of the day. Though I imagine this memory will likely end up being used more efficiently in APC and save us system-wide, yeah?
Here's an example of today's load:

[Image: server load graph, not available]

Additionally, our hosting provider suggested we run this:
ps -aylC $APACHE | grep "$APACHE" | awk '{print $8}' | sort -n | tail -n 1
This yielded 136MB per Apache process (I imagine this is terribly high).

Additionally when apache was off, we only had 3.6GB free (of the 8)
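
To see why that per-process figure matters, multiply it by the number of processes Apache is allowed to start. The sketch below is a rough ceiling, not a measurement: resident size includes pages shared between processes, so the real footprint is lower, and the process limit of 80 and the 3.6GB budget are used here only as illustrative inputs.

```python
def worst_case_apache_mb(max_processes, rss_mb_per_process):
    """Upper bound on Apache memory if every process reached the given size."""
    return max_processes * rss_mb_per_process


def processes_for_budget(budget_mb, rss_mb_per_process):
    """How many processes of the given size fit in a memory budget."""
    return budget_mb // rss_mb_per_process


if __name__ == "__main__":
    rss_mb = 136                  # largest Apache process reported above
    budget_mb = int(3.6 * 1024)   # memory free with Apache stopped
    print("80 processes, worst case:", worst_case_apache_mb(80, rss_mb), "MB")
    print("processes that fit in the budget:", processes_for_budget(budget_mb, rss_mb))
```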

To give you an idea of the resources... we're very image and ad heavy. I'm pulling the 200 resources number from gtmetrix, which I believe double counts image scripts that turn into redirects that turn into images.
Just now the homepage yielded 125 image requests and 91 for anonymous and authenticated respectively.
Breakdown of those is:

  • ~50-60 imagecache images
  • ~15-30 1px images
  • ~10 ad images
  • handful of theme images

I have optimized the imagecache images with jpegoptim.
Spriting really won't do much for us because our theme is simple.
We're just heavy on Views-based image thumbnail lists.

I was blind a few months back, now my vision's roughly 5/20. Hopefully it'll go up from here. Thanks for the input.

Garrett Albright's picture

~15-30 1px images

BbbWHAAAAA? Fire your themers immediately.

If you're still using one-pixel images in 2011, is it at least the same image and not ~15-30 different ones?

Also, maybe this is a dumb question, but did you check that ImageCache is working properly - that it's caching the images it creates, and that the server is serving the cached images instead of bootstrapping Drupal the second time?

Understatement

chrisarusso's picture

It actually seems this was an understatement. When viewing the waterfall images load on FF with the net panel, I see the 15-30 from googleadservices.com alone. We have several other ad services running, so the real number probably approaches 50.

These aren't images that we are using for the theme, but rather the underlying technology that the ad service implements. Unless I'm missing something, there is no way to avoid this when including ad code (other than serving significantly fewer ads). Am I?

As for imagecache... not a dumb question (as far as you know... I may be dumb (; ). However, yes, it is working properly in that it's creating derivative images and then serving them statically, without bootstrapping Drupal when they are served from our pseudo CDN. I did have to pull a little trickery to create derivative images that hadn't yet been created but were being served by the CDN (which is just separate vhosts on the same server symlinked to the main directory). It's mostly working now, other than a potential transliteration fix which is leaving out some images with funky URLs (a known issue in Drupal).

Garrett Albright's picture

Okay, yeah, I was thinking those were placement adjustment images in your themes - a particularly nasty remnant of the days before CSS caught on, and encouraged by GoLive and its ilk. I'm not sure why Google is serving images like that, but either way, they're being served from Google's servers, so they won't be affecting your server at all.

Anyway, I'm out of ideas, so I will defer to the more knowledgeable folk in this thread. Good luck.

Apache

nickteagle's picture

Hi, I was thinking about your Apache memory size. It seems large.

What OS are you running?

What do you get from apache2ctl -M?

What are the details of your prefork config in the Apache config file?

What does ps aux | grep apache (or http, depending on OS) show?

How many Apache processes are running when the system is running slow?

Could you send the top part of a top command when the system is running slow?

Also, what is the memory limit set to in your php.ini? 128M?

Cheers Nick

Many questions

chrisarusso's picture

Hey Nick, I like where your head's at. Here goes.

OS

chris@hefty:~$ cat /etc/issue
Ubuntu 8.04.4 LTS \n \l
chris@hefty:~$ cat /proc/version
Linux version 2.6.24-19-server (buildd@king) (gcc version 4.2.3 (Ubuntu 4.2.3-2ubuntu7)) #1 SMP Wed Aug 20 18:43:06 UTC 2008

Loaded mods

chris@hefty:~$ apache2ctl -M
Loaded Modules:
core_module (static)
log_config_module (static)
logio_module (static)
mpm_prefork_module (static)
http_module (static)
so_module (static)
alias_module (shared)
auth_basic_module (shared)
authn_file_module (shared)
authz_default_module (shared)
authz_groupfile_module (shared)
authz_host_module (shared)
authz_user_module (shared)
autoindex_module (shared)
cgi_module (shared)
deflate_module (shared)
dir_module (shared)
env_module (shared)
expires_module (shared)
headers_module (shared)
mime_module (shared)
negotiation_module (shared)
php5_module (shared)
rewrite_module (shared)
setenvif_module (shared)
status_module (shared)
Syntax OK

Apache config directives

Below I'll list pertinent (and maybe some not so pertinent) directives along with the notes we have as to when/why we changed them. Disclaimer: we didn't necessarily do the right thing, but it's at least somewhat documented here. We've since moved to using git to track everything in /etc

#
# Timeout: The number of seconds before receives and sends time out.
# - 2011-5-5 change from 100 to 60.
#
Timeout 60

#
# KeepAlive: Whether or not to allow persistent connections (more than
# one request per connection). Set to "Off" to deactivate.
# - 5/10/11 setting to Off to get comparison.
#
KeepAlive Off

#
# MaxKeepAliveRequests: The maximum number of requests to allow
# during a persistent connection. Set to 0 to allow an unlimited amount.
# We recommend you leave this number high, for maximum performance.
#
MaxKeepAliveRequests 300

#
# KeepAliveTimeout: Number of seconds to wait for the next request from the
# same client on the same connection.
# - 2011-5-6 reduce from 5 to 2.
# - 2011-5-8 reduce from 2 to 1 after hefty thrashed.
#
KeepAliveTimeout 1

#
# Server-Pool Size Regulation (MPM specific)
#

ServerLimit 500

# prefork MPM
# StartServers: number of server processes to start
# MinSpareServers: minimum number of server processes which are kept spare
# MaxSpareServers: maximum number of server processes which are kept spare
# MaxClients: maximum number of server processes allowed to start
# MaxRequestsPerChild: maximum number of requests a server process serves
StartServers 10
MinSpareServers 7
MaxSpareServers 10
MaxClients 80
MaxRequestsPerChild 4000

# 8/23/09 set maxclients to 150 for troubleshooting hard crashes
# 8/24/09 set maxclients to 250. hard crash problem found. un-neutering server
# 9/23/09 set maxclients to 200. 250 seems to be too high (pushes us into swap)
# 10/25/09 set maxclients to 250 to handle high traffic today
# 10/28/09 set maxclients to 200. 250 seems to be too high (pushes us into swap)
# 2/21/10 set maxclients to 150 for troubleshooting going into swap this morning
# 2/22/10 set maxclients back to 200 after fix
# 2/25/10 set maxclients to 150 since we increased php limit to 250mb for dev site work.
# 5/4/11 change maxclients from 100 to 50 during network troubles.
# 1:33pm change maxclients from 50 to 80.
# 5/6/11 change maxclients to 100.
# 5/8/11 change maxclients to 80 after went into swap.

# worker MPM
# StartServers: initial number of server processes to start
# MaxClients: maximum number of simultaneous client connections
# MinSpareThreads: minimum number of worker threads which are kept spare
# MaxSpareThreads: maximum number of worker threads which are kept spare
# ThreadsPerChild: constant number of worker threads in each server process
# MaxRequestsPerChild: maximum number of requests a server process serves
StartServers 2
MaxClients 150
MinSpareThreads 25
MaxSpareThreads 75
ThreadsPerChild 25
MaxRequestsPerChild 0

AccessFileName .htaccess

#
# The following lines prevent .htaccess and .htpasswd files from being
# viewed by Web clients.
#
Order allow,deny
Deny from all

#
# DefaultType is the default MIME type the server will use for a document
# if it cannot otherwise determine one, such as from filename extensions.
# If your server contains mostly text or HTML documents, "text/plain" is
# a good value. If most of your content is binary, such as applications
# or images, you may want to use "application/octet-stream" instead to
# keep browsers from trying to display binary files as though they are
# text.
#
DefaultType text/plain
HostnameLookups Off
LogLevel warn
ServerTokens Prod

#
# Optionally add a line containing the server version and virtual host
# name to server-generated pages (internal error documents, FTP directory
# listings, mod_status and mod_info output etc., but not CGI generated
# documents or custom error documents).
# Set to "EMail" to also include a mailto: link to the ServerAdmin.
# Set to one of: On | Off | EMail
#
ServerSignature Off

ps

chris@hefty:/etc/apache2/sites-enabled$ ps aux | grep apache2
www-data 4734 0.0 1.4 323132 116092 ? S May18 0:07 /usr/sbin/apache2 -k start
www-data 6820 4.1 1.6 319572 132488 ? S 01:31 0:10 /usr/sbin/apache2 -k start
www-data 6840 4.4 1.5 311284 124456 ? S 01:32 0:10 /usr/sbin/apache2 -k start
www-data 7195 0.6 0.8 268004 70956 ? S 01:32 0:01 /usr/sbin/apache2 -k start
www-data 7226 1.9 1.4 308456 121104 ? S 01:33 0:03 /usr/sbin/apache2 -k start
www-data 7239 7.6 1.4 302328 115148 ? S 01:33 0:09 /usr/sbin/apache2 -k start
www-data 7243 7.1 1.2 288876 102216 ? S 01:33 0:09 /usr/sbin/apache2 -k start
www-data 7244 9.8 1.3 294216 107688 ? S 01:33 0:12 /usr/sbin/apache2 -k start
www-data 7248 2.2 1.5 321344 124176 ? S 01:33 0:02 /usr/sbin/apache2 -k start
www-data 7253 5.3 1.7 327436 140160 ? S 01:33 0:06 /usr/sbin/apache2 -k start
www-data 7256 5.3 1.5 317616 130768 ? S 01:34 0:05 /usr/sbin/apache2 -k start
www-data 7262 12.7 1.7 326596 139492 ? R 01:34 0:11 /usr/sbin/apache2 -k start
www-data 7265 6.3 1.5 311400 124688 ? S 01:34 0:05 /usr/sbin/apache2 -k start
www-data 7267 3.1 1.5 311508 124596 ? S 01:34 0:02 /usr/sbin/apache2 -k start
www-data 7269 0.7 1.0 279296 88440 ? S 01:34 0:00 /usr/sbin/apache2 -k start
www-data 7278 3.7 1.2 289848 102504 ? S 01:34 0:03 /usr/sbin/apache2 -k start
www-data 7280 3.5 1.3 299132 112268 ? S 01:34 0:03 /usr/sbin/apache2 -k start
www-data 7288 8.7 1.8 339864 152692 ? R 01:34 0:07 /usr/sbin/apache2 -k start
www-data 7291 1.3 1.1 305476 96368 ? S 01:34 0:01 /usr/sbin/apache2 -k start
www-data 7293 6.3 1.6 318452 131412 ? S 01:34 0:04 /usr/sbin/apache2 -k start
www-data 7383 5.5 1.7 329580 142144 ? R 01:35 0:03 /usr/sbin/apache2 -k start
www-data 7656 4.0 1.2 294636 105964 ? S 01:35 0:02 /usr/sbin/apache2 -k start
www-data 7658 6.4 1.3 301820 114204 ? S 01:35 0:03 /usr/sbin/apache2 -k start
www-data 7660 7.1 1.8 335868 148100 ? R 01:35 0:03 /usr/sbin/apache2 -k start
www-data 7661 7.0 1.3 298088 110560 ? S 01:35 0:03 /usr/sbin/apache2 -k start
www-data 7662 2.6 1.5 316008 126700 ? S 01:35 0:01 /usr/sbin/apache2 -k start
www-data 7663 8.2 1.5 320880 129376 ? S 01:35 0:01 /usr/sbin/apache2 -k start
www-data 7664 5.7 1.2 290352 98580 ? S 01:35 0:01 /usr/sbin/apache2 -k start
www-data 7665 5.4 0.0 0 0 ? Z 01:35 0:01 [apache2]
www-data 7670 7.5 1.0 287736 83396 ? R 01:35 0:01 /usr/sbin/apache2 -k start
www-data 7672 0.8 0.2 247808 22064 ? S 01:35 0:00 /usr/sbin/apache2 -k start
www-data 7673 0.0 0.0 241160 4588 ? S 01:35 0:00 /usr/sbin/apache2 -k start
www-data 7674 0.0 0.0 241156 4556 ? S 01:35 0:00 /usr/sbin/apache2 -k start
www-data 7675 0.0 0.0 241160 4572 ? S 01:35 0:00 /usr/sbin/apache2 -k start
www-data 7677 5.7 1.3 307996 110204 ? S 01:35 0:01 /usr/sbin/apache2 -k start
www-data 7678 0.0 0.0 241624 4944 ? S 01:35 0:00 /usr/sbin/apache2 -k start
www-data 7680 14.8 1.4 320372 115520 ? S 01:35 0:01 /usr/sbin/apache2 -k start
www-data 7681 28.8 1.1 285760 93576 ? R 01:35 0:01 /usr/sbin/apache2 -k start
www-data 7682 0.8 0.1 242896 14660 ? S 01:35 0:00 /usr/sbin/apache2 -k start
www-data 7684 4.3 0.2 244180 22736 ? S 01:35 0:00 /usr/sbin/apache2 -k start
chris 7700 0.0 0.0 3944 672 pts/2 R+ 01:36 0:00 grep apache2
root 22294 0.0 0.0 241020 8180 ? Ss May11 5:28 /usr/sbin/apache2 -k start
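
A listing like the one above is easier to reason about as a few summary numbers. The sketch below pulls the RSS column (the sixth field of ps aux, in kB) for the www-data Apache workers and skips the zombie, the grep itself, and the root parent process; the function names and the three-line sample are illustrative.

```python
def apache_rss_kb(ps_output, user="www-data"):
    """Collect RSS in kB for live apache2 workers from `ps aux` output."""
    sizes = []
    for line in ps_output.splitlines():
        # USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11 or "apache2" not in fields[10]:
            continue
        if fields[0] != user or fields[7].startswith("Z"):
            continue
        sizes.append(int(fields[5]))
    return sizes


def summarize_mb(sizes_kb):
    """Return (process count, mean MB, max MB) for a list of kB sizes."""
    if not sizes_kb:
        return 0, 0.0, 0.0
    mean = sum(sizes_kb) / len(sizes_kb) / 1024
    return len(sizes_kb), mean, max(sizes_kb) / 1024


SAMPLE = """www-data 4734 0.0 1.4 323132 116092 ? S May18 0:07 /usr/sbin/apache2 -k start
www-data 7665 5.4 0.0 0 0 ? Z 01:35 0:01 [apache2]
chris 7700 0.0 0.0 3944 672 pts/2 R+ 01:36 0:00 grep apache2"""

if __name__ == "__main__":
    count, mean_mb, max_mb = summarize_mb(apache_rss_kb(SAMPLE))
    print(f"{count} workers, mean {mean_mb:.0f} MB, max {max_mb:.0f} MB")
```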

top

This was not at a busy time. A busy time might have twice as many Apache processes and a higher load. But you can see that memory is quite occupied, with about 1GB free. I'll try to save a snapshot if I can remember to be on the server during the rush tomorrow.

Tasks: 200 total, 4 running, 196 sleeping, 0 stopped, 0 zombie
Cpu(s): 38.4%us, 2.5%sy, 0.0%ni, 55.8%id, 2.9%wa, 0.0%hi, 0.4%si, 0.0%st
Mem: 8190564k total, 7128712k used, 1061852k free, 472456k buffers
Swap: 2048276k total, 518452k used, 1529824k free, 2889412k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7737 www-data 20 0 303m 120m 50m S 35 1.5 0:02.35 apache2
7718 www-data 20 0 296m 113m 50m S 32 1.4 0:06.23 apache2
7748 www-data 20 0 313m 108m 29m S 30 1.4 0:00.91 apache2
7752 www-data 20 0 316m 132m 50m S 30 1.7 0:02.09 apache2
7680 www-data 20 0 320m 136m 50m R 29 1.7 0:03.31 apache2
7244 www-data 20 0 287m 105m 51m S 28 1.3 0:16.61 apache2
7728 www-data 20 0 303m 117m 46m S 27 1.5 0:03.35 apache2
7749 www-data 20 0 292m 108m 50m S 22 1.4 0:02.96 apache2
3363 mysql 20 0 458m 157m 4416 S 18 2.0 4545:25 mysqld
7732 www-data 20 0 304m 117m 46m S 18 1.5 0:02.86 apache2
7670 www-data 20 0 278m 95m 51m S 16 1.2 0:04.35 apache2
7746 www-data 20 0 302m 115m 46m R 13 1.4 0:01.46 apache2
7662 www-data 20 0 308m 126m 51m S 12 1.6 0:04.41 apache2
7741 www-data 20 0 262m 62m 31m R 8 0.8 0:00.29 apache2
7280 www-data 20 0 292m 109m 51m S 4 1.4 0:06.52 apache2
7743 www-data 20 0 302m 115m 47m S 4 1.4 0:01.44 apache2
7726 www-data 20 0 297m 104m 40m S 2 1.3 0:01.08 apache2
7742 www-data 20 0 303m 119m 49m S 2 1.5 0:02.85 apache2
7269 www-data 20 0 277m 95m 51m S 1 1.2 0:04.00 apache2
7716 www-data 20 0 300m 118m 51m S 1 1.5 0:03.69 apache2
7677 www-data 20 0 300m 117m 51m S 1 1.5 0:04.54 apache2
3105 root 15 -5 0 0 0 S 0 0.0 98:10.93 md2_raid1
7288 www-data 20 0 308m 126m 51m S 0 1.6 0:10.69 apache2
7682 www-data 20 0 270m 85m 48m S 0 1.1 0:03.94 apache2
7750 www-data 20 0 237m 12m 7288 S 0 0.2 0:00.05 apache2
1 root 20 0 4020 176 92 S 0 0.0 0:19.96 init
2 root 15 -5 0 0 0 S 0 0.0 0:11.44 kthreadd
3 root RT -5 0 0 0 S 0 0.0 0:18.43 migration/0
4 root 15 -5 0 0 0 S 0 0.0 0:38.06 ksoftirqd/0
5 root RT -5 0 0 0 S 0 0.0 0:01.01 watchdog/0
6 root RT -5 0 0 0 S 0 0.0 0:18.02 migration/1
7 root 15 -5 0 0 0 S 0 0.0 0:35.91 ksoftirqd/1
8 root RT -5 0 0 0 S 0 0.0 0:00.92 watchdog/1
9 root RT -5 0 0 0 S 0 0.0 0:10.65 migration/2

php memory limit

memory_limit = 150M

What do ya think? And thank you for taking an interest and helping out.

Apache

nickteagle's picture

Ubuntu 8.04 is old, but I'm sure you know that. Upgrading is of course a big job, but it would be nice to be on 10.04.

On the Apache front, I've been slimming down my Apache config to get the minimum set of loaded modules that Drupal will still work with. I've been using this: https://boxpanel.bluebox.net/public/the_vault/index.php/High_Performance....

My present Apache has loaded:
Loaded Modules:
core_module (static)
log_config_module (static)
logio_module (static)
mpm_prefork_module (static)
http_module (static)
so_module (static)
authz_host_module (shared)
deflate_module (shared)
dir_module (shared)
env_module (shared)
mime_module (shared)
php5_module (shared)
reqtimeout_module (shared)
rewrite_module (shared)

But I did have to delete some alias stuff out of the config to get this to work. Note that I've not run this config on a high-traffic website.

Are you loading a number of .so modules in your Apache config?

On the mpm prefork side, it does look like you have played around with this and got it to settle. What you could try is bringing MaxRequestsPerChild down; this will kill the older processes sooner, and those should be the ones using the most memory.

You could also play with your PHP memory limit and see what you can bring it down to; that should stop the Apache processes getting so big. Do it in steps and check the logs until PHP fails with a memory error, then step back up.

Of course, if you slim down Apache and it recycles its processes more often, then you might be able to settle on a slightly higher figure for MaxClients.
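As a rough way to reason about a MaxClients figure, the sketch below divides the memory left over for Apache by the private memory of one process. Everything numeric here is an assumption: the 115 MB RES and 50 MB SHR are read loosely off the top output earlier in the thread, the 4000 MB reserved for MySQL, Solr, Varnish and the OS is purely hypothetical, and treating RES minus SHR as private memory is only an approximation.

```python
def estimate_max_clients(total_mb, reserved_mb, res_mb, shr_mb):
    """Rough upper bound on prefork MaxClients that fits in RAM."""
    # Private memory per child: resident size minus pages shared with siblings.
    private_mb = res_mb - shr_mb
    if private_mb <= 0:
        raise ValueError("resident size must exceed shared size")
    available_mb = total_mb - reserved_mb
    # Shared pages are counted once for the whole pool, not once per child.
    return max(0, int((available_mb - shr_mb) // private_mb))


# Assumed figures only: 8 GB box, 4 GB kept for everything else (hypothetical),
# roughly 115 MB RES and 50 MB SHR per apache2 process.
print(estimate_max_clients(total_mb=8000, reserved_mb=4000, res_mb=115, shr_mb=50))
```

Treat the result as a ceiling to test against, not a setting to copy; slimmer processes (fewer modules, an opcode cache) raise it, and a busier MySQL or Solr lowers it.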

I saw your comments on APC. I would really give that a rethink; I think you would get good results even from using it with, say, 64 MB. It only stores your PHP files, and only once they have been through the opcode compiler. I've got it running a small site with about 10 modules at 20 MB. Make sure you copy the apc.php file that comes as part of the download over to your web server; then you can see how much memory it uses, and you may be able to bring that down too.

I saw you're running Solr. Is that on Tomcat or Jetty? What memory is set aside for the JVM? (Can you show ps -aux?)

I think MySQL 5.0 is old now (they have released 5.6?). I would seriously consider moving to 5.5, especially now that the majority of your tables are InnoDB. Also, 5.5 will handle your multi-core processor better. But it might be worth having a look at your my.cnf file first?

Cheers Nick

Knowing is half the battle

joshk's picture

Overall, given your traffic numbers, the hardware should be more than adequate. If you're not seeing the results you want, you should identify where the bottlenecks are happening:

  • Is it transfer/render time to the browser? Do you have CSS/JS aggregation off? Can you sprite your theme images? Can you convince your designers to make something lighter-weight? ;)
  • Do you have high page execution times? APC can help a lot there (be sure to give it enough memory to cache all 160 modules!). Beyond that, code profiling can help flush out lemons or unnecessary processing.
  • On the back-end, since you have 8GB of RAM, if you haven't tuned your database you're not giving it enough memory. Be prepared to spend some time learning how this works, but you're definitely going to need to adjust your buffer pool.

Be sure to work on this in a focused and controlled fashion. Trace the lifetime of a page request from browser to server and back and see where time is being spent along the way. Work on one aspect at a time, and make sure you have a good way to test whether your changes are helping or hurting your numbers. This is an area where following a rigorous experimental procedure will save you a huge amount of time in the end.
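One way to make "test whether your changes are helping or hurting" concrete is to time the same request repeatedly before and after each change and compare a few summary numbers. The following is a minimal sketch, not a replacement for a real profiler or load tester; `time_calls` and `summarize` are hypothetical helper names, and `fetch` is whatever callable performs one request.

```python
import statistics
import time


def time_calls(fetch, runs=20):
    """Call fetch() repeatedly and return the wall-clock duration of each call in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Return min, median and max so runs can be compared before and after a change."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }
```

For a page on your own site, `fetch` could be something like `lambda: urllib.request.urlopen(url).read()` with `url` set to the page under test. Comparing medians rather than single measurements keeps one slow outlier from deciding the result.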

Most of all, good luck! :)

Good advice

chrisarusso's picture

Josh,

When the server isn't loaded too much, the Boost anonymous experience is pretty snappy. However, when we hit our busy times, the server gets bogged down pretty easily as the concurrent users swell memory and system resource usage much more than they "should". Before we switched to InnoDB, this would occasionally (okay... daily) lead to downtime: the system started writing MySQL temp tables to disk, they would lock up, and then the box ran out of memory, along with probably a host of other issues. The authenticated experience has never been that fast, but it's not terrible, and we're primarily (though not entirely) focused on improving the anonymous experience.

I don't think spriting will do much as our theme is quite simple with few images. CSS/JS agg is on.

Something lighter weight... now you're talking. Thanks for both you and Dave going to bat for me ;)

Code profiling: I started to work with xhprof, but it sort of fell by the wayside as we got more excited about trying out Varnish and the like. It was also going to be my go-to for benchmarking improvements to the system in general, by collecting the stats from requesting the same page after each deployment of an intended performance improvement.

Having said that, how would you recommend doing it? I have read a bit about New Relic within the group's postings, and at DrupalCon a fair number of people seemed to be using that service. Currently, we're sort of relying on response time from pingdom.com, gtmetrix charts over time, and server load from munin monitoring. They're all good services, but I fear they leave too many variables out there to rely on as good evidence.

Thanks for the luck!

From my point of view your

vegardx's picture

From my point of view your database seems to be working fine. You have very few slow queries, and that is a really good sign for the overall performance of MySQL. Sure, there are things that /could/ be done to squeeze every little bit of performance out of it, but with all that anonymous traffic you're seeing I'd focus more on caching, especially opcode caching (APC, XCache, etc.) and reverse proxy caching (Varnish!).

--
Vegard

Hey now...

chrisarusso's picture

You're supposed to yell at me and tell me how to fix it, not compliment me!

Ha, I appreciate it. The db was actually where I started, and I used devel to identify some supremely expensive views queries. So the slow queries have been on the radar for a while. They're still not perfect, but by no means the biggest fish to fry at this point. But the others are right in saying that it is, for the most part, a stock my.cnf that would almost certainly improve with the right tweaking.

What you've said is what I had in mind for the next few weeks. I believe the two pros above have helped lean me towards APC, but Varnish has been on the plate for a bit now. We shall see.

Thanks for your input.

nginx

ogi's picture

I would go for nginx + PHP5-FPM. It saves a significant chunk of memory on a low-memory VPS; I don't know about your case. Apache mpm_prefork means that each request ties up an Apache process that contains mod_php, regardless of whether it's for a static file or PHP! KeepAlive makes this worse.

If you are willing to try, you can start by studying https://github.com/perusio/drupal-with-nginx

Hmmmm

chrisarusso's picture

Certainly some competing advice at this point. As stated above, we have 8GB of RAM, so as I see it, we don't fit into the "low memory" category. I have read some good things about nginx, but that's an entirely new set of things to learn.

I think that before switching the web server away from Apache, which I know more about after having worked with it for a few years and which has more documentation and a larger community around it, it's likely we'd move towards Varnish (though I suppose Varnish is an entirely new part of the stack as well).

Check out what these guys did on one server http://2bits.com/drupal-performance/presentation-34-million-page-views-d... ...and allegedly without some of the complexities that I'm preparing to add. In some ways the hardware was superior, but it seems they cached very well and had a SIGNIFICANTLY lighter Drupal footprint.

It seems plenty can be done without switching a core technology, the web server, to nginx, but maybe I will go down your path. Ultimately, I don't know, and that's why I'm asking.

Thanks for taking the time to reply.

Varnish vs. nginx

ogi's picture

I don't know how Varnish is usually configured for Drupal, but Varnish serving static files is a very big improvement on its own, and this makes it very close to nginx + PHP5-FPM. But again, I don't know how much improvement you'll get relative to your CDN usage, e.g. whether your CDN serves anything beyond files/, the theme's static files in particular. It would be great if Varnish could serve your Boost cached files directly without reaching the slow Apache mpm_prefork. I don't have any experience with Varnish, but I hope my reasoning is of some help.

I have limited varnish knowledge...

chrisarusso's picture

compared to some of the heavy weights in this group, but I'm quite sure you can customize it at a very granular level. In short, yes you could serve those static html files with varnish instead, and even pngs but not jpegs (if you had any reason to do this). It seems boost is recommended for shared hosting where you don't have the control over the server, but can really only enable a module. Varnish will replace boost once we properly configure it. Our plan is to introduce it slowly... so only serving files from our files/* directory. This alone will be serving somewhere in the neighborhood of 30 - 70 images per page for us, so we should see marked improvements with that alone.

Boosted Varnish to the rescue :-)

fabianx's picture

Hey,

I have got a proposed session for DrupalCon London on using Varnish seamlessly with already boosted sites (without Pressflow!). It takes the load off Apache and works very similarly to Boost, without worrying about SESSION throwing modules, etc. We managed to get the response times down on one site from 4s to around 0.3 ms on the high traffic days (less on the others).

If you don't have any specific Boost configuration (excluded sites), it is even a drop in enhancement. If you have, you need to change just one line. So it is really easy to install and configure.

Contact me and I'll give you a sneak preview of the config for you to use.
(It still needs a little clean-up before it is ready for release to the community, but it will probably be released shortly before DrupalCon London.)

This config also already includes the latest and greatest Varnish config pointed to by dalin from Lullabot (thanks guys), so you can always seamlessly go to "real" Pressflow+Varnish, but I personally like the Boost crawler better ...

Best Wishes,

Fabian

PS: And if you are interested in the session, you can read more about this approach and its benefits here:

http://london2011.drupal.org/conference/sessions/boosted-varnish-how-inc...

PPS: To make this a little more useful to your original question, with 99.9% anonymous traffic you should absolutely do the following:

FIRST: Allow fewer Apache clients => lower memory usage and lower system load

Yes, I propose scaling the system down to make it faster :-).

Now we have plenty of memory to install APC and memcache.

As you are on Ubuntu just do:

apt-get install php5-apc
apt-get install php5-memcache
apt-get install memcached

If that gives you problems, because you are on PHP 5.2, follow one of the tutorials to install the versions from karmic (compatible with PHP 5.2).
For example: http://randyfay.com/node/63

Setup APC memory size:

256 MB is large and for most sites 128 MB is enough. Just watch the stats and adjust as needed with apc.php.

Then do the usual memcache setup:

drush dl memcache # dl memcache module

and add two lines to settings.php (second line needed for varnish setup later):

$conf['cache_inc'] = './sites/all/modules/memcache/memcache.db.inc';
$conf['memcache_servers'] = array('localhost:11211' => 'default');

(or use memcache.inc if you do not want to write-through to DB)

2/3 items done

This alone should help much much more, because now you can slowly scale up Apache again as it uses less memory due to the opcode cache. Also requests are faster due to the opcode cache.

Varnish is also just a:

apt-get install varnish

edit /etc/default/varnish to enable it and then drop in your boosted varnish config to /etc/varnish/ and point to it.

Test it out via www.example.com:6081

Then when ready, move apache to 8080 and let varnish listen on 80.

And adjust settings.php to set:

$conf['reverse_proxy'] = TRUE;
$conf['reverse_proxy_addresses'] = array('127.0.0.1');

3/3: DONE.

Possibly fire up a similar EC2 test instance to test these steps before going live, and replay some real live logs by grabbing hit URLs from your boost_cache DB table. (I also use this approach for cache pre-warming.)
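The replay and pre-warming idea can be sketched as a small loop over a URL list. This is only a sketch under assumptions: `prewarm` and `fetch` are hypothetical names, and how the URLs are exported from the boost_cache table is left out entirely.

```python
def prewarm(urls, fetch):
    """Request each distinct URL once, in first-seen order, and return the ones that failed."""
    failed = []
    for url in dict.fromkeys(urls):  # de-duplicate while keeping order
        try:
            fetch(url)
        except Exception:
            failed.append(url)
    return failed
```

Here `fetch` could be as simple as `lambda u: urllib.request.urlopen(u).read()` pointed at the test instance; passing it in as a parameter makes it easy to swap in a throttled or logging version.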

memcache.db.inc should not be

dalin's picture

memcache.db.inc should not be used. It's unmaintained and will be removed from the next version of Memcache module.

--


Dave Hansen-Lange
Director of Technical Strategy, Advomatic.com
Pronouns: he/him/his

Good to know

fabianx's picture

Good to know :-).

Thanks and Best Wishes,

Fabian

Info

chrisarusso's picture

Wow,

Lots of information here:

Update on what I've done

I have since updated apc from 48MB to 192MB of memory (didn't even realize we had it enabled before). It now has a continuous 100% hit rate, and uses much of that memory when fragmented, so I haven't scaled down the memory, and probably won't need to given our excess. As Dalin said, this has had a nice impact on apache memory needs, and has "smoothed out" memory usage according to our munin graphs.

I have installed and configured varnish to the point that it is serving basically everything besides html pages now, which boost is. My plan was to further configure it properly so that we could get rid of boost, as I had read that varnish is faster/better at serving from memory rather than reading/writing static files all the time.

A couple things about that:

  1. I have not changed settings.php to list the reverse proxy. What is the effect of this? It seems perfectly happy without it, but I'm sure I'm missing something.
  2. How about telling drupal "external cache"? I haven't changed this yet either, and am not sure of the effect there.
  3. When moving from Boost to Varnish serving my HTML (we have loads of cookies that we strip on files, which is preventing HTML sharing currently), how do I control when the content expires? We have a fair amount of user comments, and don't want these cached pages to live too long. Is that taken care of on the performance page, which controls the HTTP headers, which then control Varnish's expiry? Is it a Varnish setting? Can we choose to purge a cached page upon comment submission, or is it much easier to just give a TTL to Varnish?
  4. Does anyone use munin to monitor their varnish instance(s), and if so, have any problem with the memory allocated graph? I want to see how much memory it's using, so I'll know if the 1GB I'm currently giving it is sufficient?

Thanks for the amazing input and good luck with the session.

While I think Boost is great

dalin's picture

While I think Boost is great for shared hosting, I don't really like it for more substantial sites. The load that it causes during a cache clear can be quite severe. The performance gain of Varnish serving the HTML vs. Boost is only noticeable under high traffic load. In these cases Varnish can hold up very well.

  1. If you haven't set up the reverse proxy in settings.php, you'll notice that the hostname column in the sessions table only lists one IP address: that of Varnish. The reverse proxy settings tell Drupal which X-Forwarded-For headers to trust (these list the originating IP address).

  2. That's how to get Varnish to cache your pages. By default Drupal sends HTTP headers that force proxies to re-validate the page on every request. By enabling this (and increasing the max age), Varnish will start caching HTML pages.

  3. If you enable the Cookie Cache Bypass module (bundled with Pressflow), then whenever an anonymous user submits a form (i.e. a comment) they'll get a 5min cookie which causes Varnish to send their requests straight to Drupal (so they'll always see fresh content). As for general expiry you can either just go with the max age (as set on admin/settings/performance) or you can use the Varnish Drupal module (and possibly the Expire module) to integrate with Drupal's normal cache clearing mechanisms.

  4. I find Varnish's memory usage to be very consistent, so I just check it with varnishstat until I get it in a good place and then leave it.
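To illustrate point 2, here is a simplified model of how a generic shared cache could derive a lifetime from a Cache-Control header. It is a sketch under assumptions and not Varnish's actual logic, which depends on its version and VCL; `ttl_from_cache_control` is a hypothetical name and the header values mentioned afterwards are illustrative rather than copied from Drupal.

```python
import re


def ttl_from_cache_control(header, default_ttl=0):
    """Return how many seconds a shared cache may keep a response (simplified)."""
    directives = [d.strip().lower() for d in header.split(",") if d.strip()]
    # Directives that stop a shared cache from reusing the response as-is.
    if any(d in ("no-cache", "no-store", "private") for d in directives):
        return 0
    ages = {}
    for d in directives:
        match = re.fullmatch(r"(s-maxage|max-age)=(\d+)", d)
        if match:
            ages[match.group(1)] = int(match.group(2))
    # s-maxage is aimed at shared caches and takes precedence over max-age.
    if "s-maxage" in ages:
        return ages["s-maxage"]
    return ages.get("max-age", default_ttl)
```

With a header such as `public, max-age=300` the model allows five minutes of caching, while a header containing `no-cache` yields zero, which is the "re-validate on every request" situation described above.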


Hi, I have installed and

fabianx's picture

Hi,

I have installed and configured varnish to the point that it is serving basically everything besides html pages now, which boost is. My plan was to further configure it properly so that we could get rid of boost, as I had read that varnish is faster/better at serving from memory rather than reading/writing static files all the time.

That is exactly what Boosted Varnish is for :-). My offer stands: Just drop me a mail. I am curious what it could do for you.

Varnish is definitely faster from memory, but as you noticed, boosted files are not served by Varnish (as they set the Expires header to 1978). However, the files are not the bottleneck; Apache is. If you configure Varnish now to cache each item for, say, one hour or 5 min or 10 min, then hot items will be requested from Apache once and then served by Varnish for the next 10 min.

That means that every item is requested just once by Varnish (and Varnish knows that a backend request on that item is pending), and whenever the backend is not available the cached copy is used (even if it's expired). That way you get a lot more reliability.

See here for more information:

http://www.varnish-cache.org/trac/wiki/VCLExampleLongerCaching

And even 5 min can make a difference.
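The behaviour described above (serve from cache while fresh, ask the backend once the item expires, and fall back to the expired copy if the backend is down) can be modelled in a few lines. This is a toy sketch, not how Varnish is implemented: `GraceCache` is a hypothetical name, `fetch` stands in for Apache, and the coalescing of simultaneous requests for the same item is not modelled.

```python
import time


class GraceCache:
    """Tiny in-memory cache: fresh for ttl seconds, stale copies kept as a fallback."""

    def __init__(self, fetch, ttl, clock=time.monotonic):
        self.fetch = fetch    # callable(url) -> body; stands in for the Apache backend
        self.ttl = ttl
        self.clock = clock    # injectable so the behaviour can be tested without sleeping
        self.store = {}       # url -> (body, time stored)

    def get(self, url):
        now = self.clock()
        cached = self.store.get(url)
        if cached is not None and now - cached[1] < self.ttl:
            return cached[0]          # still fresh: the backend is not contacted
        try:
            body = self.fetch(url)
        except Exception:
            if cached is not None:
                return cached[0]      # backend down: serve the expired copy
            raise
        self.store[url] = (body, now)
        return body
```

With a ttl of 600 seconds, a hot page costs the backend at most one request every 10 minutes no matter how many visitors ask for it.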

I have not changed settings.php to list the reverse proxy. What is the effect of this? It seems perfectly happy without it, but I'm sure I'm missing something.

Drupal's ip_address() function will give back 127.0.0.1 always.
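The effect of the reverse proxy settings can be illustrated with a simplified model of the decision. This is a sketch of the general idea only, not a port of Drupal's ip_address(); `client_ip` and its parameters are hypothetical names, and the addresses used when trying it out are documentation examples.

```python
def client_ip(remote_addr, forwarded_for, reverse_proxy=False, trusted_proxies=()):
    """Pick the address to attribute a request to (simplified model)."""
    if not reverse_proxy or not forwarded_for:
        return remote_addr
    # Only trust the header when the request really came from a known proxy.
    if remote_addr not in trusted_proxies:
        return remote_addr
    # The last entry is the one appended by the proxy closest to the server.
    return forwarded_for.split(",")[-1].strip()
```

Without the settings, every request behind a local Varnish is attributed to 127.0.0.1; with `reverse_proxy=True` and `trusted_proxies=("127.0.0.1",)`, the address from X-Forwarded-For is used instead.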

How about telling drupal "external cache"? I haven't changed this yet either, and am not sure of the effect there.

It seems to be similar to aggressive caching. There is a great presentation video from DrupalCon Chicago on that topic from Kenny Silanskas:

http://chicago2011.drupal.org/sessions/failure-launch-drupal-performance...

When moving from Boost to Varnish serving my HTML (we have loads of cookies that we strip on files, which is preventing HTML sharing currently), how do I control when the content expires? We have a fair amount of user comments, and don't want these cached pages to live too long. Is that taken care of on the performance page, which controls the HTTP headers, which then control Varnish's expiry? Is it a Varnish setting? Can we choose to purge a cached page upon comment submission, or is it much easier to just give a TTL to Varnish?

I don't yet completely understand what you are doing here. But if Boost is re-generating the page on comments (not sure if it does that), then with your 5 min Varnish cache solution you should be fine.

If you are just using Varnish, you'll need to send custom headers to Varnish via varnish, purge and expire modules on the comment submission.

Else it'll live until the maximum time is over.

Does anyone use munin to monitor their varnish instance(s), and if so, have any problem with the memory allocated graph? I want to see how much memory it's using, so I'll know if the 1GB I'm currently giving it is sufficient?

Nope, memory stats work really fine with Varnish. Depends of course on your working set if 1 GB is enough.

How have page load times improved so far? Do you see the results you expected?

Best Wishes,

Fabian

varnish

ingard's picture

If you have 99.9% anon users, the single thing that will help you the most will definitely be Varnish! You need to try this out asap :)
However, Varnish by default will pass all requests with cookies to the backend, so you will need to use D7 or Pressflow, or implement changes that remove the session stuff for anon users. But trust me, you will like Varnish, and if you have D7 or Pressflow already then implementing Varnish is as simple as installing it, changing the Apache listening port to, let's say, 88, configuring the backend localhost:88 in Varnish, and you're ready to rock!

Also if you have stock D6 you can unset cookies with some varnish config foo for all requests to static files. This will offload apache a great deal as well :)
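The rule behind "unset cookies for static files" is usually a match on the file extension of the request. The sketch below models that decision in Python purely for illustration; in practice it lives in the Varnish config, `strip_cookies_for` is a hypothetical name, and the extension list is an assumption to adapt to your site.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed list of static file types; adjust to whatever your site serves.
STATIC_EXTENSIONS = {".css", ".js", ".png", ".gif", ".jpg", ".jpeg", ".ico"}


def strip_cookies_for(url):
    """Return True when the request looks like a static file and cookies can be dropped."""
    path = urlsplit(url).path  # ignore the query string, e.g. cache-busting suffixes
    extension = posixpath.splitext(path)[1].lower()
    return extension in STATIC_EXTENSIONS
```

Requests for which this returns True can be cached and shared between visitors, because nothing in the response depends on the session cookie.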

But trust me, you will like

dalin's picture

But trust me, you will like Varnish, and if you have D7 or Pressflow already then implementing Varnish is as simple as installing it, changing the Apache listening port to, let's say, 88, configuring the backend localhost:88 in Varnish, and you're ready to rock!

And configure the caching mode and page cache maximum age at admin/settings/performance. And finding the community's latest/greatest .vcl file (here I'll save you the legwork). And setting up your settings.php file to get the true IP address. And do some testing to make sure that it's actually working, and working well. And figuring out whether you want to just let content naturally expire, or to hook in with Drupal's cache clearing mechanisms (you'll need Varnish module and possibly Expire module for this).

APC is definitely quicker and easier and should drop the memory usage of Apache by at least 50%.

