Scalability Presentation Notes - SFDUG July 2008


I'm posting this as a wiki for others to add notes or links - based on Neil's presentation we attended. I would suggest conversation / disagreement / further discussion about the recommendations herein happen elsewhere - i.e. in the comments, etc. :-)

My notes are now up -Neil

Notes on Scalability
Neil Drumm
SF Drupal User's Group
July 14, 2008

OVERVIEW

  1. Solve the easy problems first
     •  E.g. setting the right Drupal configurations, CSS, etc.
     •  More below
  2. Buy more hardware
     •  It's cheaper than programmers
  3. Solve the hard problems
     •  Hard problems = a few hours to a few days to solve

    Scalability is not just fixing Drupal issues
    - Whole LAMP stack
    - Also CSS, HTML, Javascript problems (what it takes to render a page)

    A Complex Process
    - Find slowest part
      (whether it's the database or JavaScript)
    - Fix
    - Repeat
    (Ba-dum tshhhh....)
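The find / fix / repeat loop above only works if each part of a request is actually measured. A minimal sketch of that idea, assuming made-up stage names and sleeps standing in for real work (nothing here is Drupal-specific):

```python
import time
from contextlib import contextmanager

# Total seconds spent per named stage (stage names are hypothetical).
timings = {}

@contextmanager
def timed(stage):
    """Record wall-clock time spent inside one named stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[stage] = timings.get(stage, 0.0) + elapsed

def slowest_stage(recorded):
    """Return the name of the stage with the largest total time."""
    return max(recorded, key=recorded.get)

# The sleeps below stand in for real database, PHP and browser work.
with timed("database"):
    time.sleep(0.03)
with timed("php_render"):
    time.sleep(0.01)
with timed("javascript"):
    time.sleep(0.005)

print("fix this first:", slowest_stage(timings))
```

After fixing the reported stage, run the measurement again; the slowest part is usually somewhere else the second time around.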

    Scaling PHP is always the same

    • You usually start w/ one server
      (or Amazon web services, or VPS "virtual private server")

    • Split DB server and web server
      Drupal DB likes to have a lot of RAM
      Apache - take off of DB server

    • Add more web servers
      Round robin DNS
      Load balancer

    • Eventually DB clustering
      When? For a sense of scale: Drupal.org uses DB clustering
      More on clustering below
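The "add more web servers" step relies on spreading requests evenly, whether via round robin DNS or a load balancer. A rough sketch of the round-robin rotation itself, assuming three hypothetical server names (a real load balancer would also do health checks, which this ignores):

```python
from itertools import cycle

class RoundRobinBalancer:
    """Hand out web servers in a fixed rotation, one per request."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("need at least one web server")
        self._rotation = cycle(servers)

    def next_server(self):
        """Return the server that should handle the next request."""
        return next(self._rotation)

# Hypothetical server names.
balancer = RoundRobinBalancer(["web1", "web2", "web3"])
assigned = [balancer.next_server() for _ in range(5)]
print(assigned)  # ['web1', 'web2', 'web3', 'web1', 'web2']
```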

    EASY FIXES

    • Turn on Drupal caching
      This makes anon page requests = 1 DB query
      Otherwise typical Drupal page request is 60+ DB queries

    • "Minimum cache lifetime" setting - make it longer

    • Enable block cache (Drupal 6)

    • Select "Optimize CSS" & "Optimize JavaScript" settings (the JavaScript option is in Drupal 6)
      Merges the CSS and JavaScript from all modules into one file of each type

    • Watchdog slows down sites; Drupal 6 allows swapping it out for another logging mechanism
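To see why page caching matters so much for anonymous traffic, here is a simplified sketch. It is not Drupal's implementation: `PageCache`, `build_page` and the 300-second lifetime are made up for illustration, and the lifetime is treated as a plain expiry time, which is only loosely modelled on the "minimum cache lifetime" setting.

```python
import time

class PageCache:
    """Keep rendered pages for a fixed lifetime (simplified expiry model)."""

    def __init__(self, lifetime_seconds, clock=time.time):
        self.lifetime = lifetime_seconds
        self.clock = clock
        self._pages = {}  # path -> (stored_at, html)

    def get(self, path, build_page):
        """Return the cached page, rebuilding it only when missing or expired."""
        now = self.clock()
        entry = self._pages.get(path)
        if entry is not None and now - entry[0] < self.lifetime:
            return entry[1]
        html = build_page(path)
        self._pages[path] = (now, html)
        return html

queries_run = 0

def build_page(path):
    """Stand-in for a full page build (the notes say 60+ queries)."""
    global queries_run
    queries_run += 60
    return f"<html>{path}</html>"

cache = PageCache(lifetime_seconds=300)
for _ in range(3):
    cache.get("/node/1", build_page)

print(queries_run)  # 60: only the first request paid for the full build
```

A longer lifetime means fewer rebuilds, at the cost of anonymous visitors seeing slightly older pages.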

    "Database clustering is not too fun"

    • What you can do instead
      Optimizing MySQL is key
      MySQL default is configured for a laptop (i.e. not for a server)!

    • MySQL Report (from hackmysql.com)
      Extensive report, extensive documentation at hackmysql.com

    • mysqlsla (also from hackmysql.com)
      Analyzes the slow query log file
      Put "EXPLAIN" in front of a slow query (in the mysql command-line client)
      Returns the query plan from MySQL
      Look for:
      - "key" column always filled in
      - "rows" should be low
      dev.mysql.com explains the "EXPLAIN" output table

    • Devel module can show queries
      Shows how long they take
      Can flag queries that take longer than a configurable time threshold
      Shows how many times each query was called
      These queries will highlight:
      - What is taking a long time
      - What is getting queried all the time (i.e. not optimized)
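The two EXPLAIN checks listed above (key filled in, rows low) are easy to automate once the EXPLAIN output has been fetched. A small sketch, assuming each EXPLAIN row has already been turned into a dict with MySQL's `table`, `key` and `rows` columns; the 1000-row threshold and the sample plan are invented for illustration:

```python
def flag_explain_rows(plan, max_rows=1000):
    """Return warnings for EXPLAIN rows with no index or a large row estimate."""
    warnings = []
    for step in plan:
        table = step.get("table")
        if step.get("key") is None:
            warnings.append(f"{table}: no key used")
        rows = step.get("rows") or 0
        if rows > max_rows:
            warnings.append(f"{table}: {rows} rows examined (estimate)")
    return warnings

# Hypothetical EXPLAIN output, one dict per row of the result table.
plan = [
    {"table": "node", "type": "ALL", "key": None, "rows": 250000},
    {"table": "users", "type": "eq_ref", "key": "PRIMARY", "rows": 1},
]

for warning in flag_explain_rows(plan):
    print(warning)
```

What counts as "low" for rows depends on table size and how often the query runs, so treat the threshold as a starting point only.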

    Watch out for:

    • Views

    • Anything executing too many queries
      E.g. Views calling other views
      Views usually perform 7-8 queries each
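Spotting "too many queries" comes down to counting and timing them, which is what the Devel query log provides. A sketch of the same bookkeeping, assuming a log of (SQL text, milliseconds) pairs; the log format, sample queries and 5 ms threshold are assumptions, not Devel's actual output:

```python
from collections import Counter

def summarize_queries(log, slow_ms=5.0):
    """Split a query log into slow queries and queries that run repeatedly."""
    counts = Counter()
    slowest = {}
    for sql, ms in log:
        counts[sql] += 1
        slowest[sql] = max(ms, slowest.get(sql, 0.0))
    slow = sorted(sql for sql, ms in slowest.items() if ms > slow_ms)
    repeated = [(sql, n) for sql, n in counts.most_common() if n > 1]
    return slow, repeated

# Hypothetical log for a single page request.
log = [
    ("SELECT * FROM node WHERE nid = %d", 0.4),
    ("SELECT * FROM node WHERE nid = %d", 0.5),
    ("SELECT COUNT(*) FROM comments", 42.0),
    ("SELECT * FROM node WHERE nid = %d", 0.3),
    ("SELECT * FROM variable", 1.1),
]

slow, repeated = summarize_queries(log)
print("slow:", slow)
print("repeated:", repeated)
```

Slow queries are candidates for EXPLAIN and indexing; repeated ones are candidates for caching.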

    Side conversation:
    Stored procedures are not used in standard Drupal development (e.g. not in core) because they are implemented differently in different database systems (e.g. MS SQL vs. MySQL vs. PostgreSQL).

    PHP / APACHE FIXES

    • Install op-code cache
      This is a PHP extension
      Provides a good Apache speed improvement
      E.g. APC or eAccelerator
      Note:  Can cause site faults
      But can be configured to automate fixes (i.e. restart Apache)

    • Optimize calling external web services
      Set up proxy to cache these external services
      E.g. Squid
      Installing Squid can add a lot of complexity
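Where a full Squid setup is too much complexity, the same idea (do not call the external service again for an answer you already have) can be approximated inside the application. This is a swapped-in technique, not the proxy approach from the talk: `cached_for`, `fetch_feed` and the 60-second lifetime are hypothetical, and the fetch function only pretends to make an HTTP request.

```python
import time
from functools import wraps

def cached_for(seconds, clock=time.monotonic):
    """Decorator: reuse a fetch result for the same URL for `seconds`."""
    def decorator(fetch):
        store = {}  # url -> (fetched_at, body)

        @wraps(fetch)
        def wrapper(url):
            now = clock()
            hit = store.get(url)
            if hit is not None and now - hit[0] < seconds:
                return hit[1]
            body = fetch(url)
            store[url] = (now, body)
            return body

        return wrapper
    return decorator

upstream_calls = []

@cached_for(seconds=60)
def fetch_feed(url):
    """Stand-in for a slow call to an external web service."""
    upstream_calls.append(url)
    return f"response for {url}"

fetch_feed("http://example.com/feed")
fetch_feed("http://example.com/feed")
print(len(upstream_calls))  # 1: the second call was served from the cache
```

Unlike a shared proxy, this cache lives in one process, so each web server would keep its own copy.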

    OPTIMIZING FRONT-END

    • Test w/ Firebug
      Net tab: shows all requests used to build page
      YSlow (Yahoo add-on):
      - Letter grading of various services
      - Used for large installations

    Your aim is to reduce HTTP requests
    Javascript profiler (in Firebug) - more JS, slower page load
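Merging stylesheets is one of the simplest ways to cut HTTP requests, and it is roughly what the "Optimize CSS" setting does. A sketch of the idea, not Drupal's actual aggregation code: the file paths, CSS contents and hash-based file name are assumptions for illustration.

```python
import hashlib

def aggregate_css(stylesheets):
    """Merge several stylesheets into one and name it by content hash."""
    combined = "\n".join(css.strip() for _, css in stylesheets)
    digest = hashlib.sha256(combined.encode("utf-8")).hexdigest()[:16]
    return f"css_{digest}.css", combined

# Hypothetical (path, contents) pairs, one per module or theme stylesheet.
stylesheets = [
    ("modules/node/node.css", ".node { margin: 0; }"),
    ("modules/user/user.css", ".profile { clear: both; }"),
    ("themes/garland/style.css", "body { font-size: 12px; }"),
]

filename, merged = aggregate_css(stylesheets)
print(f"{len(stylesheets)} requests become 1: {filename}")
```

Naming the merged file after its contents means the name changes whenever any stylesheet changes, so browsers do not keep serving a stale cached copy.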

    MORE COMPLEX FIXES

    • Using MemCache
      Caches whole "objects" (user / node object)
      I.e. caching one object saves the multiple queries needed to build it
      Used in addition to op-code cache
      Run it as close to web server as possible
      Requires code patch
      Hard to debug

    • DB Clustering
      Structure:
      1st db server - all rights (read and write)
      2nd "slave" db server - only read rights
       (DB clustering ability is built-in to Drupal 6)

    Example:  Drupal.org
    - 2 load balancers, w/ Squid
    - 3 web servers
    - DB master - read/write
    - Slaves - search / read only
    OTHER NOTES

    • VPS "virtual private server" recommendations
      Advomatic uses Voxel.net (all Xen machines)
      Groups.drupal.org/highperformance  (node/229)
      "Anything with Shack in the name is a bad idea"

    • Tag1Consulting.com/drupal
      Drupal performance checklist
