Scalability Presentation Notes - SFDUG July 2008

I'm posting these notes, based on Neil's presentation that we attended, as a wiki so others can add notes or links. I would suggest that conversation, disagreement, or further discussion about the recommendations herein happen elsewhere, e.g. in the comments. :-)

My notes are now up -Neil

Notes on Scalability
Neil Drumm
SF Drupal User's Group
July 14, 2008

OVERVIEW

  1. Solve the easy problems first.
    •  E.g. setting the right Drupal configurations, CSS, etc
    •  More below
  2. Buy more hardware
    •  It's cheaper than programmers
  3.  Solve the hard problems
    •  Hard problems = a few hours to a few days to solve

Scalability is not just fixing Drupal issues
- Whole LAMP stack
- Also CSS, HTML, Javascript problems (what it takes to render a page)

A Complex Process
- Find slowest part
  (whether it's the database or JavaScript)
- Fix
- Repeat
(Ba-dum tshhhh....)
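
The find / fix / repeat loop above starts with measuring. A minimal sketch, assuming each layer of the stack can be wrapped in a callable; the stage names and the sleeps standing in for real work are hypothetical:

```python
import time


def time_stages(stages):
    """Run each named stage once and return {name: elapsed seconds}."""
    timings = {}
    for name, run in stages.items():
        start = time.perf_counter()
        run()
        timings[name] = time.perf_counter() - start
    return timings


def slowest_stage(timings):
    """Return the name of the stage that took the longest."""
    return max(timings, key=timings.get)


# Hypothetical stages: the sleeps stand in for real database, PHP and
# front-end work.
stages = {
    "database": lambda: time.sleep(0.1),
    "php": lambda: time.sleep(0.01),
    "javascript": lambda: time.sleep(0.02),
}
timings = time_stages(stages)
bottleneck = slowest_stage(timings)  # fix this one, then measure again
```

After fixing the reported bottleneck, rerun the measurement, because the slowest part is usually a different one the second time around.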

Scaling PHP is always the same

  • You usually start w/ one server
    (or Amazon Web Services, or a VPS, "virtual private server")

  • Split DB server and web server
    Drupal DB likes to have a lot of RAM
    Apache - take off of DB server

  • Add more web servers
    Round robin DNS
    Load balancer

  • Eventually DB clustering
    When?  For a sense of scale: Drupal.org has DB clustering
    More on clustering below
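
Round robin, as used by both round robin DNS and simple load balancers, just hands requests to each web server in turn. A minimal sketch of that rotation, assuming three hypothetical servers named web1 to web3; a real load balancer would also do health checks, which this omits:

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out web servers in a fixed rotation."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("need at least one web server")
        self._rotation = cycle(servers)

    def next_server(self):
        """Return the server that should take the next request."""
        return next(self._rotation)


# Hypothetical server names.
balancer = RoundRobinBalancer(["web1", "web2", "web3"])
assigned = [balancer.next_server() for _ in range(6)]
```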

EASY FIXES

  • Turn on Drupal caching
    This makes anon page requests = 1 DB query
    Otherwise typical Drupal page request is 60+ DB queries

  • "Minimum cache lifetime" setting - make it longer

  • Enable block cache (Drupal 6)

  • Enable the "Optimize CSS" & "Optimize JavaScript" settings (the JavaScript setting is in Drupal 6)
    Merges the CSS and JavaScript from all modules into one file each (one CSS file, one JavaScript file)

  • Watchdog slows down sites; Drupal 6 allows for swapping out with other logging mechanism
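
The query arithmetic behind the page cache can be sketched as a toy model. This is not Drupal's implementation; it only assumes the two figures from the notes (1 query for a cached anonymous page, 60 as a stand-in for "60+" on a full build), and the class and path names are hypothetical:

```python
class PageCache:
    """Toy model of a page cache for anonymous visitors."""

    FULL_BUILD_QUERIES = 60  # stand-in for the "60+ DB queries" in the notes

    def __init__(self):
        self._pages = {}
        self.queries = 0  # running total of simulated DB queries

    def _build_page(self, path):
        """Simulate a full page build and its query cost."""
        self.queries += self.FULL_BUILD_QUERIES
        return f"<html>rendered {path}</html>"

    def get(self, path, anonymous=True):
        if not anonymous:
            # Logged-in users bypass the page cache in this model.
            return self._build_page(path)
        self.queries += 1  # the single cache lookup
        if path not in self._pages:
            self._pages[path] = self._build_page(path)
        return self._pages[path]


cache = PageCache()
cache.get("/node/1")                    # first hit: lookup plus full build
cost_of_miss = cache.queries
cache.get("/node/1")                    # second hit: lookup only
cost_of_hit = cache.queries - cost_of_miss
```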

"Database clustering is not too fun"

  • What you can do instead
    Optimizing MySQL is key
    MySQL default is configured for a laptop (i.e. not for a server)!

  • MySQL Report (from hackmysql.com)
    Extensive report, extensive documentation at hackmysql.com

  • mysqlsla
    Checks the slow query log file
    Put "EXPLAIN" in front of your query (on the command line)
    Returns the query plan from MySQL
    Look for:
      • Key column always filled in
      • Rows should be low
    Dev.mysql.com explains this "EXPLAIN" table

  • Devel module can show queries
    Shows how long they take
    Can show queries that take longer than a set time threshold
    Shows how many times each query was called
    These queries will highlight:
      • What is taking a long time
      • What is getting queried all the time (i.e. not optimized)
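
The "EXPLAIN" check described above can be tried without a MySQL server. A minimal sketch, assuming SQLite's EXPLAIN QUERY PLAN (through Python's sqlite3 module) as a stand-in for MySQL's EXPLAIN: the output format is not the same as MySQL's table, but the idea of checking whether a key (index) is used carries over. The users table and users_mail index are hypothetical:

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # detail is the last column


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT, mail TEXT)"
)
sql = "SELECT name FROM users WHERE mail = ?"

# No index on mail yet: the plan is a full table scan.
plan_without_index = query_plan(conn, sql, ("a@example.com",))

# With an index, the plan names the index it searches.
conn.execute("CREATE INDEX users_mail ON users (mail)")
plan_with_index = query_plan(conn, sql, ("a@example.com",))
```

In MySQL the equivalent signal is an empty "key" column and a large "rows" estimate before the index is added.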

Watch out for:

  • Views

  • Anything executing too many queries
      • E.g. Views calling other views
      • Views usually perform 7-8 queries each
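
The two things a query log highlights, slow queries and queries that run over and over, can be sketched as follows. This is not the Devel module's code; the class, the threshold and the sample queries are hypothetical:

```python
from collections import defaultdict


class QueryLog:
    """Count how often each query runs and track its slowest run."""

    def __init__(self, slow_threshold_ms):
        self.slow_threshold_ms = slow_threshold_ms
        self._stats = defaultdict(lambda: {"count": 0, "max_ms": 0.0})

    def record(self, sql, elapsed_ms):
        entry = self._stats[sql]
        entry["count"] += 1
        entry["max_ms"] = max(entry["max_ms"], elapsed_ms)

    def slow_queries(self):
        """Queries with at least one run over the threshold."""
        return sorted(
            sql for sql, s in self._stats.items()
            if s["max_ms"] > self.slow_threshold_ms
        )

    def repeated_queries(self, min_count=2):
        """Queries executed at least min_count times in this request."""
        return sorted(
            sql for sql, s in self._stats.items()
            if s["count"] >= min_count
        )


log = QueryLog(slow_threshold_ms=5)
for elapsed in (0.4, 0.5, 0.3):
    log.record("SELECT * FROM node WHERE nid = 1", elapsed)
log.record("SELECT COUNT(*) FROM comments", 42.0)

slow = log.slow_queries()
repeated = log.repeated_queries()
```

A query that shows up in the repeated list is a candidate for caching; one in the slow list is a candidate for an index or a rewrite.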

Side conversation:
Stored procedures are not used in standard Drupal development (e.g. not in core) because they are implemented differently in different database systems (e.g. MS SQL vs. MySQL vs. PostgreSQL).

PHP / APACHE FIXES

  • Install op-code cache
    This is a PHP extension
    Provides a good Apache speed improvement
    E.g. APC or eAccelerator
    Note:  Can cause site faults
    But can be configured to automate fixes (i.e. restart Apache)

  • Optimize calling external web services
    Set up proxy to cache these external services
    E.g. Squid
    Installing Squid can add a lot of complexity
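
What a caching proxy does for external web service calls can be sketched in a few lines. This is not Squid, only an illustration of the idea, assuming a fixed time-to-live per URL; the class name, the fetch callable and the injectable clock are all hypothetical:

```python
import time


class CachingProxy:
    """Cache responses from a slow external service for ttl_seconds."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch          # callable: url -> response body
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so tests can fake time
        self._entries = {}           # url -> (expires_at, body)

    def get(self, url):
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]          # still fresh: no external call
        body = self._fetch(url)      # missing or expired: call the service
        self._entries[url] = (now + self._ttl, body)
        return body


# Usage with a fake service and a fake clock.
calls = []


def fake_fetch(url):
    calls.append(url)
    return f"response {len(calls)}"


fake_now = [0.0]
proxy = CachingProxy(fake_fetch, ttl_seconds=60, clock=lambda: fake_now[0])
first = proxy.get("http://example.com/feed")
second = proxy.get("http://example.com/feed")   # served from cache
fake_now[0] = 61.0
third = proxy.get("http://example.com/feed")    # expired, fetched again
```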

OPTIMIZING FRONT-END

  • Test w/ Firebug
    Net tab: shows all requests used to build page
    YSlow (Yahoo add-on):
      • Letter grading of various services
      • Used for large installations

Your aim is to reduce HTTP requests
Javascript profiler (in Firebug) - more JS, slower page load
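
Merging files is the most direct way to cut HTTP requests. A minimal sketch of the idea behind CSS aggregation, not Drupal's actual aggregator; the file names and rules are hypothetical:

```python
def aggregate(files):
    """Concatenate {filename: contents} into one bundle, keeping order."""
    parts = []
    for name, contents in files.items():
        # Keep a comment with the source file name to ease debugging.
        parts.append(f"/* {name} */\n{contents.strip()}\n")
    return "\n".join(parts)


# Hypothetical per-module stylesheets.
module_css = {
    "system.css": "body { margin: 0; }",
    "node.css": ".node { padding: 1em; }",
    "user.css": ".profile { clear: both; }",
}
bundle = aggregate(module_css)
requests_before = len(module_css)  # one HTTP request per file
requests_after = 1                 # one request for the merged bundle
```

Order is preserved on purpose, since later CSS rules override earlier ones.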

MORE COMPLEX FIXES

  • Using MemCache
    Caches whole "objects" (user / node object)
    I.e. caching one object saves the multiple queries needed to build it
    Used in addition to op-code cache
    Run it as close to web server as possible
    Requires code patch
    Hard to debug

  • DB Clustering
    Structure:
    1st db server - all rights (read and write)
    2nd "slave" db server - only read rights
     (DB clustering ability is built-in to Drupal 6)
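
The MemCache item above describes caching whole objects so that the queries needed to build them run only once. A minimal sketch of that pattern, assuming a plain dict as a stand-in for a memcached server; the class, the loader function and the key format are hypothetical and this is not the Drupal memcache patch:

```python
class ObjectCache:
    """Cache whole objects (e.g. a node or user) keyed by type and id."""

    def __init__(self, load_from_db):
        self._load = load_from_db   # callable: (kind, object_id) -> object
        self._store = {}            # stand-in for a memcached server
        self.db_loads = 0           # how often we had to go to the database

    def get(self, kind, object_id):
        key = f"{kind}:{object_id}"
        if key not in self._store:
            self.db_loads += 1
            self._store[key] = self._load(kind, object_id)
        return self._store[key]

    def invalidate(self, kind, object_id):
        """Drop a cached object, e.g. after it has been edited."""
        self._store.pop(f"{kind}:{object_id}", None)


def load_node(kind, object_id):
    # Hypothetical loader; in practice this is several queries.
    return {"type": kind, "id": object_id, "title": f"{kind} {object_id}"}


objects = ObjectCache(load_node)
objects.get("node", 1)
objects.get("node", 1)          # served from cache, no database load
loads_after_two_reads = objects.db_loads
objects.invalidate("node", 1)
objects.get("node", 1)          # reloaded after invalidation
loads_after_invalidate = objects.db_loads
```

Forgetting to invalidate after a write is one way stale objects appear, which fits the note that this setup is hard to debug.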

Example:  Drupal.org
2 load balancers, w/ Squid
3 web servers
DB servers:
- DB master - read/write
- Slaves - search / read only
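
The master / slave structure implies a routing rule: writes go to the master, reads can go to a slave. A minimal sketch of that rule, assuming queries can be classified by their first keyword; the class and server names are hypothetical and this is not Drupal's database layer:

```python
from itertools import cycle


class ReplicatedDatabase:
    """Route SELECTs to read-only slaves and everything else to the master."""

    def __init__(self, master, slaves):
        self.master = master
        slaves = list(slaves)
        self._slaves = cycle(slaves) if slaves else None

    def route(self, sql):
        """Return the server that should run this statement."""
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        if verb == "SELECT" and self._slaves is not None:
            return next(self._slaves)   # spread reads across the slaves
        return self.master              # writes and anything unrecognized


# Hypothetical server names.
db = ReplicatedDatabase("master", ["slave1", "slave2"])
read_1 = db.route("SELECT * FROM node WHERE nid = 1")
write = db.route("UPDATE node SET title = 'x' WHERE nid = 1")
read_2 = db.route("select count(*) from comments")
```

Sending unrecognized statements to the master is the conservative choice, since a slave only has read rights.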

OTHER NOTES

  • VPS "virtual private server" recommendations
    Advomatic uses Voxel.net (all Xen machines)
    Groups.drupal.org/highperformance  (node/229)
    "Anything with Shack in the name is a bad idea"

  • Tag1Consulting.com/drupal
    Drupal performance checklist