Site performance (Complex front page layout + Views)

kostajh's picture

Hi all,

I am currently working on a project to convert a static HTML newspaper site (with about 1700 articles in English and Spanish) to Drupal 6. Most of the data is imported and the site is pretty much assembled, but I am running into some major performance issues that I need some help with.

The basic content type used throughout the site is "Article", which has about a dozen CCK fields.

Some background about how the site is set up. I have used Views to build most of the site as follows. The newspaper has several core Departments. For example, "Music", "Politics", "Society", "International", etc.

For each Department, I created a View with two displays: "Page" and "Block". The Page display lists content (with exposed filters and a pager limiting each list to 20 items). The Block display is used on the front page and shows just the titles of the three most recent articles in that Department. I have a Department Taxonomy to file each article, plus a second Taxonomy (with the same terms as Department) to pick a "Primary Term". For now the Primary Term decides which front page block an article appears in (since the same article is often filed in multiple Departments, I don't want it showing up in multiple blocks on the front page), but it will probably be useful in other circumstances too.

Then I have several Views associated with Node Queues to display the three most recent articles entered with title, author, date, description; a View of a Node Queue to display the lead story with Photo, and a couple other Views also tied to Node Queues to display various featured items, etc.

So here are my questions:

1. What is the best strategy to reduce the number of queries for the front page? I have read that the maximum limit of queries one should aim for on the front page is 1000. Running the devel module last night, I accessed the front page with the following result:

Executed 4879 queries [!!] in 4284.22 milliseconds. Queries taking longer than 5 ms and queries executed more than once, are highlighted. Page execution time was 16505.7 ms

That is not good. From there I navigated to a single article node with the result:

Executed 326 queries in 4264.23 milliseconds. Queries taking longer than 5 ms and queries executed more than once, are highlighted. Page execution time was 5483.45 ms

I navigated back to the Front page which should have been cached by now:

Executed 481 queries in 3319.64 milliseconds. Queries taking longer than 5 ms and queries executed more than once, are highlighted. Page execution time was 9122.86 ms

And, looking at a listing of content (i.e. departments/society) yields:

Executed 344 queries in 2112.89 milliseconds. Queries taking longer than 5 ms and queries executed more than once, are highlighted. Page execution time was 5470.19 ms

Now clearly this is not very good. I can only imagine what the picture will look like when there are hundreds of logged in users and thousands of page views each day...!

Right now I am more concerned about the number of queries than the page execution time, as I'm fairly certain the latter is heavily affected by an incorrectly configured VPS (which our tech person is going to sort out).

So how can I reduce the number of queries? Am I taking the right approach in creating a View for each Department? Or does it make more sense to create a View called "Department Blocks" and add displays for each Department taxonomy term, and likewise, a View called "Department Pages" and add displays for each term?

What are my options for front page displays? I do not want to use Panels as it seems too buggy and prone to crashing (from my own experience with it). I don't feel comfortable using it on a production site.

2. What is the best practice for using Taxonomies with custom content types?

For example, I have this option checked: "Save values additionally to the core taxonomy system (into the 'term_node' table).
If this option is set, saving of terms is additionally handled by the taxonomy module. So saved terms from Content Taxonomy fields will appear as any other terms saved by the core taxonomy module. Set this option if you are using any other taxonomy application, like tagadelic. Otherwise terms are only saved in the cck tables and can only be accessed via the node or a view."

Now I am not using Tagadelic, though I am using the Related Content module to automatically generate related content blocks when the user is viewing an article. Is it a better practice performance-wise to not have these terms saved to the term_node table, or does it not make much of a difference? If it makes a big difference I could change that and also ditch the Related Content module.

3. Views and Depth filters

The Department "International" uses depth filters to display all terms that are children of the parent term, International. I do this for regions as well. For example, I have created a display called "Africa" for the International View which uses a Taxonomy depth filter to include all countries that are in Africa. Does this slow things down significantly? For the Africa display, would it be better practice to use the Taxonomy filter without depth and simply select all the taxonomy terms I want displayed, i.e. "Africa", "Sudan", "Egypt", "Nigeria", etc.?
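For illustration, here is a minimal sketch (in Python, not Drupal code) of what a depth filter has to work out: the set of terms at or below a parent, which is also the list one would otherwise tick by hand. The term names come from the example above; the `PARENT_OF` mapping and the `terms_within_depth` helper are hypothetical stand-ins for the taxonomy hierarchy. It says nothing about which option is faster; only the Devel query log can show that.

```python
from collections import defaultdict

# Hypothetical hierarchy: child term -> parent term (illustration only).
PARENT_OF = {
    "Africa": "International",
    "Sudan": "Africa",
    "Egypt": "Africa",
    "Nigeria": "Africa",
    "Europe": "International",
    "Spain": "Europe",
}


def terms_within_depth(root, parent_of, depth):
    """Return root plus every descendant at most `depth` levels below it."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)

    found = [root]
    frontier = [root]
    for _ in range(depth):
        # Step one level further down the hierarchy.
        frontier = [c for term in frontier for c in children[term]]
        if not frontier:
            break
        found.extend(frontier)
    return found


africa_terms = terms_within_depth("Africa", PARENT_OF, depth=1)
print(africa_terms)
```

The explicit list has to be maintained by hand whenever a country is added, whereas the depth filter picks new child terms up automatically.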

4. URL aliases

Part of this project entailed mapping the old article URL aliases to the new site. Thus each one of the 1700 articles has a URL alias that copies the old site's URL and ends with ".htm". This is consistently one of the red flags from Devel's page load output (usually about 72 ms or so). Anything that can be done about this? Is there another, faster method?

5. Any other suggestions?

I am not really looking for server optimization advice so much as tips for fine tuning my Drupal site. I want to get those queries down, then deal with server optimization. Any advice or pointers would be extremely welcome.

Thanks in advance for your responses and help!

Comments

A number of things come to mind

ken hawkins's picture

But I'll start with just a couple.

First off, yes that's a lot of queries, but even then the execution time seems pretty high for a test site. Are you running on a particularly slow box?

How are you controlling the display of your lists on the home page? Panels or via the php template?

Our homepage is pretty complex but is still relatively quick (with no cache):
Page execution time was 1921.85 ms. Executed 1206 queries in 816.16 milliseconds.

And that's for what's probably 20+ views calls.

Oh, and don't forget to check your variable table (if you have devel on: /devel/variable). I've seen massive 15,000+ variables get in there and really slow things down.

The best strategy ...

yelvington's picture

The best strategy to reduce the number of queries is to cache the completely built page. http://drupal.org/project/boost is a good simple solution for a smaller site with a single server. An anonymous visitor executes zero database queries if the page is in the cache.

The number of queries isn't as important as the nature of the queries. Turning on query logging (on a test site) in mysql will give you some clues. You can inadvertently create some heinous queries with Views.

We are using Panels on our production sites, in both Drupal 5 (Panels 2) and Drupal 6 (Panels 3) installations, at sites delivering up to a million unique users and 10 million pageviews a month per site.

To accommodate the traffic, both anonymous and authenticated, we brought in Four Kitchens as consultants and tuned the following configuration:

  • External load balancer/firewall.

  • Two reverse-proxy caching servers (Squid/Squirm) facing the outside world. If an item can be cached, it's served directly without touching the Drupal application layer. This covers media objects, scripts, CSS and any pages delivered to anonymous visitors.

  • Another load balancer/firewall.

  • Multiple Drupal application servers. Anything Squid can't directly handle lands here. Primarily this means "first recent view" of any page and any static object, plus every authenticated pageview.

  • Multiple Memcached servers. All Drupal cache tables have been moved from MySQL into Memcached. This is a phenomenal performance enhancer.

  • One fairly muscular MySQL database server with most tables altered from MyISAM to InnoDB. Generally MyISAM is better for read-intensive and InnoDB is better for write-intensive processes (due to differences in the way write locks are handled). Our caching configuration has radically cut the load from reads, so tuning the server for writes is a good move.

Make sure you have properly tuned your MySQL server and that you don't turn on query logs on a production box.

Looking again

ken hawkins's picture

All good points, yelvington, but I think he's got some problems way before he hits the caching, load balancing optimizations.

When you've got 326 queries executing in 4264.23 milliseconds it probably means you've got a bad view query in there.
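The arithmetic behind that hunch is easy to check. The sketch below (Python, used purely as a calculator) takes the Devel summary lines quoted in this thread and works out the average time per query; the `average_query_ms` helper and the page labels are made up for the example.

```python
import re

# Devel summary lines quoted earlier in this thread.
DEVEL_LINES = {
    "front page (first load)": "Executed 4879 queries [!!] in 4284.22 milliseconds.",
    "single article": "Executed 326 queries in 4264.23 milliseconds.",
    "front page (second load)": "Executed 481 queries in 3319.64 milliseconds.",
    "departments/society": "Executed 344 queries in 2112.89 milliseconds.",
    "comparison homepage (no cache)": "Executed 1206 queries in 816.16 milliseconds.",
}

SUMMARY = re.compile(r"Executed (\d+) queries.*? in ([\d.]+) milliseconds")


def average_query_ms(line):
    """Return the mean time per query, in milliseconds, for one Devel summary line."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError(f"not a Devel summary line: {line!r}")
    queries, total_ms = int(match.group(1)), float(match.group(2))
    return total_ms / queries


averages = {page: average_query_ms(line) for page, line in DEVEL_LINES.items()}
for page, avg in sorted(averages.items(), key=lambda item: item[1], reverse=True):
    print(f"{page}: {avg:.2f} ms per query")
```

By this measure the article page comes out at roughly 13 ms per query, while the 4879-query front page load averages under 1 ms, which points at a few slow queries rather than sheer volume.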

If you could paste in the query log that devel is spitting out (or just look for long queries) I bet you could nail it down.

Recently we had a comment query that was taking nearly 200ms to execute.


Thinking outside of the module opens many paths.

Be careful with Views

jdidelet's picture

Views can be really nice if you understand and master the UI, but be careful with the queries that Views generates. Look at the devel output and the queries sent by your Views pages or blocks. Sometimes they're crazy. My advice: create your own module with your own functions, and try to optimise your queries there. Personally, I use Views to generate a query, then I copy and paste that query into a function and try to optimise it.

Of course yelvington's advice is really great, but before that, try to optimise your code and your queries (and reduce the number of queries).

For caching, you can also try APC: http://pecl.php.net/package/APC.

Good luck!
Julien Didelet
Founder
Weblaa.com

Update

kostajh's picture

Hello all-

Thanks for the comments and suggestions.

I think Julien and Ken are right that a big part of the problem, if not most of it, is with poorly configured Views. I'm going to take some time to go through each one and clean up what I've got, then test again. I also need to enable caching in each view.

One thing I did try was Pressflow. The performance boost is immediately noticeable.

Hi, As we worked towards

shyamala's picture

Hi,

As we worked towards some exciting performance gains, here are some of our learnings:

  1. Use Drupal's Page Cache.
  2. Run MySQLTuner to understand how MySQL responds to your site.
  3. We never rendered a view directly in a page, panel, or block; all our view results were cached. The Lullabot article on caching, "The Beginners guide to caching", could be a very good starting point.
  4. Do not panic over a large number of queries; the queries with longer execution times are the critical ones. We focused on the ones hitting our large tables: node, session, user, log, and taxonomy tables.
  5. Queries with inner joins or a DISTINCT clause were also marked and reviewed.
  6. We saw large benefits from restricting the log size and periodically clearing the session tables.
  7. We did add a patch for Drupal's path lookup and URL redirect; I will append that to this list.
  8. We moved from Drupal search to Google search!
  9. Use of a PHP accelerator is a must.
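Point 3 (cache every view result rather than rendering it fresh) can be sketched as a small time-based cache. This is an illustration in Python, not Drupal or Views code; `TimedResultCache`, `latest_music_titles`, the 300-second lifetime, and the fake clock are all assumptions made for the example.

```python
import time


class TimedResultCache:
    """Cache computed results for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expiry time, value)

    def get(self, key, compute):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the expensive work
        value = compute()
        self._entries[key] = (now + self._ttl, value)
        return value


runs = []


def latest_music_titles():
    """Hypothetical stand-in for running one department block's view query."""
    runs.append("query")
    return ["Title A", "Title B", "Title C"]


fake_now = [0.0]  # controllable clock so the example does not need to sleep
cache = TimedResultCache(ttl_seconds=300, clock=lambda: fake_now[0])

cache.get("music_block", latest_music_titles)  # runs the query
cache.get("music_block", latest_music_titles)  # served from the cache
fake_now[0] = 301.0
cache.get("music_block", latest_music_titles)  # expired, runs the query again
print(len(runs))
```

The trade-off is staleness: a newly published article may not appear in a block until the cached entry expires, so the lifetime should match how often each block really changes.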

Check out:

  1. Server tuning considerations: http://drupal.org/node/2601
  2. http://2bits.com/articles/performance-logging-module-log-performance-sta...
  3. http://blamcast.net/articles/speed-up-drupal
  4. http://buytaert.net/drupal-performance

What kind of traffic do you expect? Do you have analytics installed on the current HTML site?

All the best, Happy performance Tuning...

Netlink Technologies Ltd
I blog and Twitter :)

Thanks Shyamala! The

kostajh's picture

Thanks Shyamala!

The articles on 2bits.com are fantastic. One thing I did yesterday was look at APC, following the notes in this article: http://2bits.com/articles/importance-tuning-apc-sites-high-number-drupal.... When I took a look at apc.php on my site, sure enough, it resembled the first chart in the 2bits article. I doubled the memory to 80 M and the site was immediately much faster: page execution times were halved.

I'm going to take some time this weekend to go through all the articles on this subject on 2bits and the links folks posted here as well.

Does anyone have experience using APC with Boost? Are the two compatible?
