Good query execution time but page execution time too high

We encourage users to post events happening in the community to the community events group on https://www.drupal.org.
ghankstef's picture

We have a fairly high traffic site 1.5 million page views a month. The site uses a fairly standard Panels node override for displaying individual nodes. Left side gets node content such as title, content body and node comments. Right sidebar gets a latest stories block, an ad, and related stories block. Actually panels panes in most cases not blocks. We do run memcache and Varnish but we find that the traffic pattern always includes a number of uncached requests and scaling the site up mean delivering a single page as fast as possible.

Devel outputs this.
Executed 208 queries in 112.01 ms. Queries exceeding 5 ms are highlighted. Page execution time was 2039.75 ms. This is too slow

Slowest query is 5.24 ms so queries are not the biggest problem here.

Attached is a screen grab from XHprof and a callgraph.

It seems to me this is death by a thousand cuts. The slowest function takes 165 ms but there are 80, 000 functions called and the total of that just takes too long.

I'm not sure how to speed this up. Do I need to ditch panels, views and all the other nice Drupal tools and just code this up from scratch making sure each part is performant?

Any advice is much appreciated.

Thanks

AttachmentSize
XFProf by Wall time exc174.38 KB
Callgraph9.94 MB

Comments

Dblog

ghankstef's picture

Note - database logging is off in production - saves maybe 100 milleseconds

Acquia Agent & Entity Cache?

mikeytown2's picture

Looks like the acquia agent module is causing the slow http request
acquia_agent_get_subscription()
What can you do to limit the number of http requests this module does?

Have you tried entity cache?
https://drupal.org/project/entitycache

I'll see if I can get the

ghankstef's picture

I'll see if I can get the Acquia Agent module shut off I think its required by Acquia ApacheSolr module though.

I'll try entitycache too I'm not using it currently. I might need to use a cache warmer though to see a lot of benefit. That would be ok though as long as it is performant.

Do you agree though? There

ghankstef's picture

Do you agree though? There are a few things to firm up perhaps but outside of those this seems like death by 1,000 cuts. There is just too much code running to be fast

death by 1,000 cuts analysis

mikeytown2's picture

xdebug's cachegrind output when analyzed with wincachegrind (works under wine) can be very helpful when trying to find the slow points. I find KcacheGrind to not be as user friendly when it comes to finding out the slow points.

Just to check are you using php 5.4 or grater with some kind of op code cache like APC installed? If not, do it as it will help a lot.

Acquia hosting

ghankstef's picture

Acquia hosting 5.3.2-1ubuntu4.21 with APC

APC settings:

apc.cache_by_default On On
apc.canonicalize On On
apc.coredump_unmap Off Off
apc.enable_cli On On
apc.enabled On On
apc.file_md5 Off Off
apc.file_update_protection 2 2
apc.filters no value no value
apc.gc_ttl 3600 3600
apc.include_once_override Off Off
apc.lazy_classes Off Off
apc.lazy_functions Off Off
apc.max_file_size 1M 1M
apc.mmap_file_mask /dev/zero /dev/zero
apc.num_files_hint 1024 1024
apc.preload_path no value no value
apc.report_autofilter Off Off
apc.rfc1867 Off Off
apc.rfc1867_freq 0 0
apc.rfc1867_name APC_UPLOAD_PROGRESS APC_UPLOAD_PROGRESS
apc.rfc1867_prefix upload_ upload_
apc.rfc1867_ttl 3600 3600
apc.shm_segments 1 1
apc.shm_size 192 192
apc.stat On On
apc.stat_ctime Off Off
apc.ttl 7200 7200
apc.use_request_time On On
apc.user_entries_hint 4096 4096
apc.user_ttl 7200 7200
apc.write_lock On On

It's not the Acquia Agent

jchin1968's picture

It's not the Acquia Agent module itself that is slow but your network or the Acquia server. The Exclusive Wall-Time is negligible for the various acquia_agent functions but quite high for the call to stream_socket_client() at 164.835 ms to open a connection.

The 3 calls to fread() at 135.390ms may or may not be an issue depending on the file size of the response and the network latency (i.e. distance) between your server and the Acquia server. 45ms on average per HTTP request doesn't really seem that bad.

HTTP requests, however, only account for 15% of the total execution time. There's other things that are contributing to the slowness.

Try running the Block Timer module (https://drupal.org/sandbox/jchin1968/2061005) on your development environment. It will highlight slow rendering blocks on a page and even blocks that are rendered in the background but not displayed.

I think this is due to

ghankstef's picture

I think this is due to running xhprof in my Dev environment. Presumably in production these functions will run faster because the distance to the Acquia agent server will be much less.

I'll give Block Timer a try

Do you have Panel cache and

Sagar Ramgade's picture

Do you have Panel cache and View cache enabled ? Seems Memcached gets are really few.

Need Drupal help?
Reach Me

Share your Posts, Url, Sites
www.sociopost.com

I this case this is a node

ghankstef's picture

I this case this is a node page. Panels cache is active for things that are 'global'. Activating it for node content really isn't that helpful. Once a page is hit once it gets cached by varnish for 10 minutes. Most of the traffic is anonymous.

Maybe entity cache with a cache warmer would help

Also we have had to move

ghankstef's picture

Also we have had to move parts of the cache back to the Database. Memcache seems to run out of memory if we leave the entire cache memcache. Page cache and views cache use the db cache.

what's the peak traffic?

tloudon's picture

1.5 million pageviews/mo may or may not break down into a lot of concurrent page requests/sec. is the site slow all the time or just during peak loads?

also, have you verified that varnish is caching or identified what URLs are missing? or could you increase the ttl or use content based caching (a la acquia purge)?

fwiw, unless you have some super traffic spikes, on a normal and properly configured site; i would think 99.9% of anon visitors are only having a speedy varnish cached experience.

if the site is marketplace.org, the ttl is 600 seconds. just clicking around, i got mostly quick response times but hit a number of cache misses. i noticed a couple of cache counts around 200, but most were lower in the 10s. i realize this is a news site, but if the traffic covers a normal distribution of URLs--i'd guess quite a few visitors get cache misses b/c of the 10m ttl. varnish could effectively be covering the homepage and a few top links here.

Yeah 10 minute ttl seems to

ghankstef's picture

Yeah 10 minute ttl seems to be the most I can get away with for political reasons much more than technical ones. I could try for much longer cache and then selective clearing but the real problem is 200 + modules and scores of features that are not built for performance - the site needs an audit and a significant rebuild with performance in mind. The current build was more of a situation where they made mockups and threw all the modules they could at it to make it work, look and act like the mock up.

Could you write a crawler to verify cache hits/misses?

tloudon's picture

If you write a crawler that hits your top 100 URLs (say 1 every 3 seconds 6 times a day) and stores the HEAD, in particular:
X-Age:
X-Cache-Hits:

You would be able to tell how effective your caching policy is. I don't think it would be super difficult/time-consuming to write. I could post a gist of something close in ruby...but it might be nice to hijack some boost?? code so it could have a dynamic URL list instead of a static one.

I'm not sure you're going to be able to do a whole lot to improve the speed. Try running ab or whatever benchmarking tool you like against a file that just bootstraps Drupal--no menu routing, no rendering, etc. I would look at better caching policies, but YMMV.

Cheers

I actually did a very similar

Jamie Holly's picture

I actually did a very similar thing in NodeJS a few months back. It would load the front page every couple of hours, then hit the 6 tops stories, all in different threads (using the cluster module). I also recorded the stat output from Apache stats as well as various server metrics (load, memory, network and disk IO, running processes). Took less than 30 minutes to write (thanks to about a half dozen node modules) and helped me track down a problem that plagued me for a couple of months.


HollyIT - Grab the Netbeans Drupal Development Tool at GitHub.

Selective cache clearing can

dalin's picture

Selective cache clearing can go a long way (if a new blog post is written, only clear the blog home page, etc.).

And then ESI (Edge Side Includes) to have different caching rules for different parts of a single page (e.g. the 'most popular' block probably doesn't need to be updated more than once an hour and is the same for every page on the site).

--


Dave Hansen-Lange
Director of Technical Strategy, Advomatic.com
Pronouns: he/him/his

ESI Resources/Tips?

tloudon's picture

Hey Dave,

I haven't seen too many references to sites using the ESI module or sites using ESI in the wild period.

I set it up for one client that had huge traffic demands, but it took a lot of effort and was very difficult to debug. I ran into issues w/ module compatibility w/ ESI blocks, ESI panels, and IIRC context, etc--it's been over a year at this point, so I can't totally remember all the details. It was just a beast.

Anyway, I did see authcache 2-x is geared toward ESI tho and seem to remember hearing Symfony/D8 has solid ESI support. So it looks like there's some cool stuff on the horizon :)

Do you know of many sites running Drupal w/ ESI or any good resources? I'd like to catch up on the current-state--I may need it on a project here in the near future.

Thanks!