(From webchick:)
Hi there, performance team! :)
Back at DrupalCon Chicago, Dries outlined a strategy for Drupal 8 involving a series of "gates" that would help ensure core code quality in a number of different categories: Documentation, Performance, Accessibility, Usability, and Testing. The purpose of the gates is essentially to define a set of "checkboxes" outlining the most important aspects of each category, in a way that is both not overwhelming to core developers (the list of hoops to jump through is kept as small as possible) and creates very little additional burden on the team in question (it contains sample links to documentation/resources so developers/reviewers can help themselves).
Since we have already traditionally had requirements around documentation, it made sense to start there. Jennifer Hodgdon and some other folks from the documentation team have put together the Documentation gate, which is available at http://drupal.org/node/1203498. What we need is a similar table for each of the other gate areas. And that's where you come in! :D
So I'd love to kick off a brainstorm about what the top performance gates might be (please, no more than 5 if possible), with the goal being to fill out the following table. (This is a wiki page, so be bold!)
(edit below here)
SQL queries
When is this needed?:
If a patch adds a new SQL query or changes an existing one
Details:
When a patch adds database calls (including indirect calls via API functions), the issue should include an indication of how often the query is likely to run. If queries are added or modified that are not identical to existing queries, EXPLAIN output should additionally be posted to ensure they use indexes as well as possible. The goal is to avoid adding database calls unless necessary, and to ensure that queries scale when run against large data sets.
Resources:
http://drupal.org/project/devel / http://dev.mysql.com/doc/refman/5.0/en/explain.html
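The index check this gate asks for can be demonstrated in miniature. The sketch below uses SQLite's EXPLAIN QUERY PLAN (available in Python's standard library) as a stand-in for MySQL's EXPLAIN, with a made-up node table; the table, index, and query are illustrative, not Drupal's actual schema:

```python
import sqlite3

# Hypothetical schema loosely shaped like a node table; purely illustrative.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, type TEXT, status INTEGER)")
con.execute("CREATE INDEX node_type ON node (type)")

# The kind of query a patch might add: look up nodes by type.
rows = con.execute(
    "EXPLAIN QUERY PLAN SELECT nid FROM node WHERE type = 'article'"
).fetchall()
plan = " ".join(str(r[-1]) for r in rows)

# With the index in place, the plan should search via node_type rather
# than scanning the whole table. With MySQL you would look for a 'key'
# column naming the index and a small 'rows' estimate instead.
print(plan)
```

As catch notes below, the rows-scanned estimate depends heavily on the cardinality of the test data, so EXPLAIN output should be generated against a realistically sized data set where possible.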
Memory
When is this needed?:
When large arrays or objects are added or modified, lots of PHP files are included, or static caches are added.
Details:
Loading large arrays or objects into memory, including a very large number of PHP files, and adding static caches can all negatively affect memory usage. Issues should be accompanied by before/after xhprof screenshots or similar data detailing the impact of these changes.
Resources:
xhprof / (devel)
CPU
When is this needed?:
If a patch adds or changes any code that is frequently executed during a 'normal' request to Drupal, especially up to and including full bootstrap, or during the page rendering process
Details:
Ideally, post before/after xhprof or cachegrind detail for the area of code being modified. If the change is to a single function, 'unit benchmarks' may be appropriate. Performance results from load-testing tools such as ab/siege/jmeter are discouraged, since they often test the hardware rather than the application and are easy to get wrong.
Resources:
xhprof / xdebug
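The 'unit benchmarks' mentioned above are usually just a tight loop around the function under test, timed with microtime() in PHP. A minimal sketch of the same idea, here in Python using the standard timeit module, with two made-up candidate implementations standing in for a before/after patch:

```python
import timeit

# Two hypothetical implementations of the same lookup, to compare.
def lookup_in_list(items, key):
    return key in items           # linear scan every call

def lookup_in_set(items, key):
    return key in set(items)      # pays to build a set each call

data = list(range(1000))

# Time many repetitions so per-call noise averages out, exactly as a
# PHP microtime() loop around a single function would.
t_list = timeit.timeit(lambda: lookup_in_list(data, 999), number=2000)
t_set = timeit.timeit(lambda: lookup_in_set(data, 999), number=2000)
print(t_list, t_set)
```

The point is to isolate the function being changed, rather than benchmarking a whole page request where the change is lost in the noise.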
System calls
When is this needed?:
If a patch adds or modifies include/require, file_exists()/filemtime(), or time() calls during normal operation or cache rebuilds. This does not include simply adding new files or modules; however, it does apply to refactoring of any centralized loading or scanning functions.
Details:
Post before/after xhprof, cachegrind or strace detail for the area of code being modified.
Resources:
http://en.wikipedia.org/wiki/Strace / xhprof / xdebug
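One common pattern behind this gate: every file_exists()-style check costs a stat() system call, so centralized loaders usually memoize the result for the duration of a request. A rough Python sketch of that idea (the function names and path are illustrative, not Drupal APIs):

```python
import os

stat_calls = {"count": 0}

def file_exists_raw(path):
    # Each call here corresponds to one stat() system call,
    # which is what strace -c would count.
    stat_calls["count"] += 1
    return os.path.exists(path)

_exists_cache = {}

def file_exists_cached(path):
    # Memoize per request so repeated checks for the same path
    # cost exactly one stat().
    if path not in _exists_cache:
        _exists_cache[path] = file_exists_raw(path)
    return _exists_cache[path]

# A scanning loop that would otherwise issue 50 stat() calls issues one.
for _ in range(50):
    file_exists_cached("sites/all/modules/example/example.module")
print(stat_calls["count"])
```

Running the real code under strace -c before and after a refactor makes this kind of difference visible directly in the syscall counts.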
Caching
When is this needed?:
If a patch adds, removes or modifies static or persistent caching.
Details:
Caching is a trade-off between memory, CPU, storage and developer experience. Before/after xhprof screenshots are recommended to show the impact of changes made. Changes that modify or add large 'rebuild' operations should post similar performance data regarding cache misses.
Front end
When is this needed?:
If a patch makes significant changes to JavaScript or CSS
Details:
If refactoring commonly used JavaScript, post profiling data. If changing JavaScript and/or CSS aggregation logic, the size and number of aggregates across several page requests should be posted to the issue.
Resources:
http://www.webpagetest.org/, Google Chrome developer tools, Firebug, Speed Tracer
Comments
added some
OK I added some to this, tried to keep it to five:
SQL queries are relatively easy to measure; the only problems are that not everyone knows how to use EXPLAIN, and that it is not always easy to generate realistic data to test against (whatever else an EXPLAIN says, rows scanned can really depend on the cardinality of the data). However, that's not too bad with a bit of docs.
Memory, CPU and System calls - I only trust xhprof/xdebug/strace for these; benchmarks with ab/siege/jmeter are useless for 99% of core issues, so we shouldn't recommend them. This will need some standardization on how to post that kind of data to the issue, and probably Drupal-specific docs on how to use the tools.
Front end - Owen did a lot of work on this for Drupal 7 in terms of aggregation/gzip etc. logic so would be good to get his comments here. Casey did some very good JavaScript optimization for the overlay so would also be good to get thoughts on that.
I have not included anything about automated performance testing here for the following reasons:
pasting here
Updating the wiki directly maybe wasn't such a good idea, so I'm pasting what I added as a comment. If you have radically different suggestions, feel free to remove mine from the wiki or also post them as comments, and then we can update as we go along:
Thoughts
Caches - Static & Persistent
Proper usage of drupal_static and the cache_* functions. Deciding what and how to cache is almost always done after the code is fully functional; profiling is a good way to discover the bottlenecks, and A/B testing is a good way to compare possible solutions.
Cache bin and key names should follow a standard naming convention.
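The static/persistent split above amounts to a two-tier lookup: check the cheap per-request static first, then the persistent backend, and rebuild only on a full miss. A minimal Python sketch of the pattern, with dicts standing in for drupal_static() and cache_get()/cache_set() (these are analogies, not the real APIs):

```python
static_cache = {}       # per-request tier, like drupal_static()
persistent_cache = {}   # stands in for cache_get()/cache_set()
rebuild_count = {"n": 0}

def expensive_rebuild(cid):
    # Placeholder for whatever costly work populates the cache item.
    rebuild_count["n"] += 1
    return {"cid": cid, "data": "..."}

def cache_lookup(cid):
    if cid in static_cache:                      # cheapest tier first
        return static_cache[cid]
    if cid in persistent_cache:                  # persistent hit: promote
        static_cache[cid] = persistent_cache[cid]
        return static_cache[cid]
    data = expensive_rebuild(cid)                # full miss: rebuild once
    persistent_cache[cid] = data
    static_cache[cid] = data
    return data

cache_lookup("menu:links")
cache_lookup("menu:links")   # second call never reaches the rebuild
print(rebuild_count["n"])
```

Profiling before and after shows whether each tier actually pays for itself; as noted elsewhere in this thread, hitting the cache layer isn't free either.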
Other Ideas
Assume writes are expensive. Example: only call variable_set() if the value in the DB is different.
Don't use the variable table as a key value lookup for things other than settings.
Check for memory leaks and excessive memory usage.
Proper usage of locking to prevent stampedes and to ensure atomic operations.
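The locking point can be illustrated with a try-lock around the rebuild: whoever acquires the lock rebuilds once, and everyone else backs off instead of piling onto the same expensive operation. A minimal single-process sketch using Python's threading.Lock (Drupal's lock_acquire() follows the same principle across processes; the cache item here is made up):

```python
import threading

rebuild_lock = threading.Lock()
cache = {}
rebuilds = []

def get_or_rebuild(cid, rebuild):
    if cid in cache:
        return cache[cid]
    # Non-blocking acquire: only one caller performs the rebuild.
    if rebuild_lock.acquire(blocking=False):
        try:
            if cid not in cache:     # re-check after winning the lock
                cache[cid] = rebuild()
                rebuilds.append(cid)
        finally:
            rebuild_lock.release()
        return cache.get(cid)
    # Someone else holds the lock: serve a fallback (or stale data)
    # rather than joining the stampede.
    return None

get_or_rebuild("variables", lambda: {"site_name": "example"})
get_or_rebuild("variables", lambda: {"site_name": "example"})
print(len(rebuilds))
```

The rebuild runs exactly once however many callers arrive, which is the property stampede protection is meant to guarantee.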
Great job getting this going.
Great job getting this going. A couple thoughts.
Lots of patches have minimal
This is true, but we need criteria for what that means. http://drupal.org/node/1203766, http://drupal.org/node/1061924, http://drupal.org/node/1014130 and http://drupal.org/node/1064212 are all issues documenting quite serious regressions in core that resulted from 'innocent' patches in the first place. On only one of the issues that introduced those regressions did anyone review the code in a way that would have caught the problem (and I reviewed at least two of the original regression-causing patches and asked for benchmarks in one). I think we need to shift at least part of the burden onto those involved in an issue to indicate why it doesn't cause a regression. I'm fed up with coming across regressions months after issues are committed; not all are avoidable, but a lot are.
This is related to the memory section about adding new static caches (or changing the content of static caches); if we can increase to 6 items then I think this is worth adding. While I'm in favour of adding caching where we should, I've also seen issues try to add pre-emptive persistent caching for things that only call a few cheap PHP functions in core, just in case a contrib module does something expensive. That adds both DX and actual performance overhead to core (hitting the cache layer isn't free either) for hypotheticals. I've also seen cases where we had marginal static caches that were later changed from static to drupal_static(), and the drupal_static() was as expensive as, or worse than, the code being statically cached in the first place.
There's some bits of documentation we'll need for this:
a section in the handbook on performance testing and profiling tools, basic install howtos, how to use them, how to share results in the issue queue.
a handbook page with example code for microbenchmarks.
maybe a new 'performance' section of coding standards, to try to document patterns where we have them.
@mikeytown2 - I added a note about variable_set() to the docs. We don't have documented patterns for using the locking framework in core, and there are several core issues outstanding on that, so we'd need to resolve those and come up with a repeatable pattern before it can be added here IMO, although a section on race conditions and stampedes might well be worthwhile.
@catch
it's mikeytown2, mikeyP is someone else. Bringing this up because it's not the first time you've confused the 2 of us http://drupal.org/node/1014086#comment-4615740
arggh
Sorry, edited my comment to correct this, I know exactly who you both are but irc has eradicated my ability to type consciously past the first 4-5 characters of a username :(
incorporated some of this
While I disagree that we can determine up front which patches will or won't have minimal performance impact (maybe docs and interface-text changes; for anything else you at least need to be aware of how often the code executes to know whether it's worth optimizing), I reworked the memory section a fair bit and made it more about caching in general. Does that look any better?
I'm not sure that moving the
I'm not sure that moving the emphasis from memory management to caching was a good thing. One of Drupal's biggest scalability issues ATM is memory usage. That is often related to cache misses, but not always: having a huge array will use significant memory regardless of whether we are pulling it from cache_get() or rebuilding it on the fly.
On the other hand we need to have a balance between the ease of getting code committed, and good performance. It's already damn hard to get code into core. And by having complex gates we decrease the number of people who are willing to pass through them all. Perhaps we need to rely more heavily on the automated performance tests. If a patch gets committed that causes an increase in memory usage, our automated performance tests need to be able to detect that.
--
Dave Hansen-Lange
Director of Technical Strategy, Advomatic.com
Pronouns: he/him/his
It also occurs to me that for
It also occurs to me that for many/most of the things on this list, people are just going to go "huh?!?!?". For those people, perhaps what we need is a tag "needs performance review", like we already have for accessibility.
--
Dave Hansen-Lange
Director of Technical Strategy, Advomatic.com
Pronouns: he/him/his
split
I split memory management and caching; that brings us up to 6, but that's better than confusing/incomplete.
Automated tests are good for ensuring that what's tested is not getting any worse, but at least for the short-medium term I don't think the tests can be the gate.
To ensure full test coverage we'd need a framework in core, then make people write patches to test the functionality they're patching same as SimpleTest, that seems a much higher barrier than running EXPLAIN or profiling a request to me. Not to mention there's nothing running yet, and it's likely D8 patches (as opposed to D7 backports) will start getting committed within the next few weeks.
Once it's up and running, we could definitely offload some of the gate to the automated testing - but regression tests don't necessarily tell you why something regressed, just that it did, so it will still be important to have a lot more people familiar with profiling and other tools to identify what the bottlenecks are. If we run branch tests, this will all be post-commit/push too. They really feel like different things to me, even if we should work really hard to bring them as close as possible once they get established.
Testbots should show history of performance of tests
I know this has been mentioned before, but there has to be a way for us to gather the historical performance characteristics of our test suite. This would allow early notice of specific problems as well.
It's a complication that we have the testbots on unknown hardware with unspecified performance characteristics, but it's my bet that valid information can be gathered. I don't believe that SimpleTest currently even gives us actual test duration. That would be easy to get. Interpreting it, now that's harder.
What can we do to help us automate the performance gate?
@rfay see
@rfay see here:
http://drupal.org/sandbox/catch/1186744
There's already discussions in that queue.
--
Dave Hansen-Lange
Director of Technical Strategy, Advomatic.com
Pronouns: he/him/his
I did create Capture duration
I did create "Capture duration for test runs" in the PIFR queue. Although that information is difficult to use well, I just can't imagine why we don't at least capture it. Note that I'm talking about capturing the duration of each SimpleTest test, not the overall duration of the suite (although that's interesting too).
Jenkins grinder plugin keeps historical test data
A goal of ours on a project is to track historical data for performance test runs to identify regressions/improvements. The plugin has seen no recent activity, but maybe it's an option:
https://wiki.jenkins-ci.org/display/JENKINS/Grinder+Plugin
I haven't yet evaluated this tool, however.
Why not use jenkins
Why not use Jenkins' out-of-the-box test run history? It provides regression/improvement trends, and Jenkins can store as many test results as necessary.
Simpletest isn't run from
SimpleTest isn't run from Jenkins for qa.drupal.org; it's run on PIFR, which is a Drupal-based test runner written in PHP.
When we have specific automated performance tests to run on branches, it's very likely these will be Jenkins jobs - but that framework doesn't exist yet. However, to deal with potential differences in hardware between test runs, we've come up with a design that does not rely on wall time but rather measures resources used; you can read more about that at http://groups.drupal.org/node/133679
Front end
I am not very familiar with JavaScript profiling tools. What I do know is that for performant JavaScript it is most important to access the DOM only selectively. For example, before I optimized Overlay, it was adding event handlers to every link, when one event handler on the document object was enough. DOM traversal/modification is pretty heavy, even in modern browsers.
This is great discussion so far...
As you discuss these points, please keep an eye out for places where existing docs are sub-standard or even missing altogether. It needs to be crystal clear to developers, reviewers, and committers when these gates are needed to be passed, and how to pass them when they do. We can't set up a gate without a "Resources" column, or we're right back to "Hey catch, can you benchmark my patch?"
suggested docs
Yep, these are things we need docs for; some may exist, but I have a feeling a lot don't. I am not going to personally write all this documentation, although heyrocker already agreed to be a guinea pig and help with this, so it looks like that may not be a problem. I'm very happy to stop doing profiling/benchmarks and show people how in exchange for documentation too :)
This looks like a long list but ideally we want granular pages for this stuff, since some is generic, some not so much:
How to:
Use devel query log in D7/8 and understand EXPLAIN output
Install xdebug on $environment * 3
Install xhprof on $environment * 3
Profile Drupal using xhprof
Profile Drupal using xdebug
View xdebug reports (webgrind/kcachegrind and others)
View xhprof reports (devel UI + xhprof's own)
Interpret xdebug results.
Interpret xhprof results.
Use strace and interpret the results.
Run microbenchmarks on simple functions with a custom bootstrap script and microtime.
Common pitfalls when profiling, stracing and benchmarking.
Also similar docs for front-end tools.
Other docs:
- caching conventions in Drupal (static and persistent)
- stampede protection and ensuring atomic operations using the lock framework (needs core to be consistent first, there is no pattern at the moment and several outstanding issues).
More docs
These should not be pre-requisites, but:
Initial feedback...
(I'm reviewing the text in the node, assuming that's the most up to date.)
The "SQL queries" gate sounds good. It seems to make it pretty easy to tell when I need to do that thing. I'd like to see the Resources section fleshed out with a http://drupal.org/writing-secure-code -esque one-pager of how to interpret and fix EXPLAIN results too, for more novice developers.
"Memory" could stand to be fleshed out more. What qualifies as "large" objects/arrays? What qualifies as "lots" of PHP files/includes? Does "static caches" refer to the static keyword, or drupal_static(), or both?
"CPU" still needs work, as well. Most Drupal developers have not stepped through index.php in a debugger, so they have no idea what "especially up to and including full bootstrap, or during the page rendering process" means. Can you cite some examples? Would "touches includes/bootstrap.inc" suffice?
Also, I have no idea what 'unit benchmarks' and ab/siege/jmeter mean. I don't think you need to outline what people should not do; just outline what they should.
"System calls" -- centralized loading or scanning functions: could you cite some examples here? (I assume you mean things like module_invoke_all() or file_scan_directory(), but not sure.)
"Front end" is especially vague and could really use a tutorial or something; I'd have no idea how to interpret those results if they were posted.
What qualifies as "large"
There is no easy answer to this, especially between full bootstrap and page caching; that's part of why page caching broke so many times, and in so many different ways, during D7 - people benchmarked full page requests to validate changes to code that runs during cached page requests. Also, the size of objects and arrays can vary massively between sites in a lot of cases.
"lots" of PHP files/includes could probably be "more than a couple" - since usually you are changing one or many but not often in between.
We could possibly say "any code in /includes, hook_boot() or hook_init()", except there is lots of rarely called code in /includes that we probably don't care much about. I would say that if no one in an issue knows where the code they are modifying actually gets called, or how frequently, that is a serious problem (both for pointless micro-optimizations that make no difference, and for regressions in the critical path). If this gates thing is going to work, it will only be because people are actually considering questions like that; the examples are broad because Drupal code paths are not predictable in most cases.
Yeah that's right, examples would be good to add.
Front end is vague because I have least experience there, hoping someone who does can flesh that out.
Large objects/arrays
Anything over 1 MB once it's been serialized. The default Memcache and MySQL configurations have issues with items over 1 MB. Trying to figure out whether a code change will hit this limit is hard because, as catch says, different configurations lead to different pain points.
Yeah 1mb is a good rule of
Yeah, 1 MB is a good rule of thumb for maximum size. Memcache usually compresses, so it can store more without raising the slab size, and on MySQL max_allowed_packet is often higher, but 1 MB is still a lot for any one single item. This assumes the array is written back to the database, which is the case some of the time but not all of it.
But yeah, there's no way to tell which are which: most of the caches in core that can reach 1-4 MB each start off at 100-700 KB in a default install, and they may depend both on which modules are installed and on site-specific configuration. The idea of the bullet point is to make people think about the changes they're making in terms of memory; I don't think we can make hard rules about what you can and can't do (and if we did, core would be breaking all of them at the moment).
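A quick way to apply the 1 MB rule of thumb is to measure the serialized size of a candidate cache item before writing it. A rough sketch, with Python's pickle standing in for PHP's serialize() and a made-up cache item shaped loosely like a big router array:

```python
import pickle

ONE_MB = 1024 * 1024

# Hypothetical cache item, roughly shaped like a large registry/router
# array; real sizes depend on installed modules and site configuration.
cache_item = {
    "paths": [("node/%d/edit" % i, "node_page_edit") for i in range(5000)],
}

# serialize() in PHP, pickle here: measure what would actually be
# written to the cache backend, not the in-memory size.
serialized_size = len(pickle.dumps(cache_item))
print(serialized_size, serialized_size < ONE_MB)
```

As the comments above note, the same item can start at a few hundred KB on a default install and grow past the limit on a large site, so the measurement is only meaningful against realistic data.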
Started collating a list of
Started collating a list of the worst D7 performance regressions (where either the gate and/or automated performance testing may have helped) at http://drupal.org/node/1215136
I'm concerned about the
I'm concerned about the documentation burden here; at the rate this is going, the gate is going to require 30-odd pages of in-depth documentation. Which tools get used for performance testing varies over time - xhprof didn't exist in early 2008, services like Browsermob are even younger, many tools are only available on certain operating systems, etc.
So any documentation that is tied too closely to how to use specific tools risks getting dated quite fast, or being counterproductive. There are also plenty of docs around the internet on how to use most of this stuff; we don't necessarily need to duplicate all of it on Drupal.org.
On the other side there is 'how to write performant code' - but most of this makes no sense in isolation, and most pages I see about it are stupid voodoo micro-optimization that doesn't take context into account. We can get around that with "what not to do" and "Drupal performance regressions we love to hate" kinds of things, but anything that just says "do this, don't do that" is unlikely to be useful - you can still mess things up, or be fine, whether you follow particular guidelines or not.
If we're going to get anything from this gate it should be this:
when people are working on core patches, they bear performance and scalability in mind, at all.
when changing code, the frequency of evaluation of that code should be part of the criteria for assessing performance impact.
the performance impact of changes is measured using appropriate tools (not guessing, not ab -c1 -n500 for every single change regardless of whether it gives meaningful results or not)
the performance impact of changes is understood in terms of how it relates to resource usage (rather than raw requests per second in a limited benchmark) - this means memory, CPU, file system I/O, database, network calls from within the application, cache invalidation, stampedes, network use between the server and the browser, and browser performance itself.
the performance impact of changes is understood in terms of how it relates to how Drupal sites may run into issues (large data sets, lots of pages, lots of modules, cold starts etc.)
If those start to come into place, then we should see fewer regressions, but I don't think it's possible to come up with a set of rules that will prevent regressions going in - if we tried to enforce it like code style, core would already be failing massively.
Measure and report... and that's all
IMO if we just have a way to measure, everything will work out. The "gate" should just be modest, consistent information that can be used to judge whether there is a performance impact, and if so, whether it's worth it. If we have the feedback everything will work out. The problem in the past is that the only feedback we've had is catch!
I don't think we should even worry about anything else, for the very reasons you're saying: We will just turn everything into a morass of paperwork.
yes!
i totally agree with this.
to that end i've started some really early hacking on collecting xhprof data for known setups via simpletest:
http://drupalcode.org/sandbox/justinrandell/1212436.git/commit/3bc9ebcba...
moved sandbox into xhprof.module
i've moved the sandbox code into xhprof.module and cleaned it up a bit, so we now have raw data collection capabilities via simpletest with that module.
next steps include:
Front-end ideas
I think the existing text is pretty good. I think the main areas we are likely to see regressions are:
- Aggregation logic changes (even "small" ones) - needs extremely careful, methodical testing (in real browsers, with typical latency)
- Any new CSS/JS inclusions need to be assessed if they should be part of the global aggregate(s), or the page one(s) [we have some docs on this]
- Changes in CSS/JS inclusion (tags/loaders) need to have careful timing runs (in real browsers, with typical latency)
- JS code that does significant DOM walking or other potentially CPU, RAM or network intensive activity (e.g. overlay changes, browser-side table sorting, AJAX page construction) should be profiled for browser performance
Any others that come to mind? Should we add this list to the front-end part of the gate?
Images
I think we should mention image optimization, such as image compression and using sprites where appropriate.
@dcmouyard on Twitter
Possible to wrap up by Friday?
Since Dries wants to announce these gates at DrupalCon London, which is in 3 weeks, we basically need to wrap this up by the end of the week in order to give ample time for review and revision. Does this sound doable? How can I help?
Moved to http://drupal.org/core-gates
Thanks so much for your hard work on this, everyone. This is a really solid first stab, so I went ahead and moved it to http://drupal.org/core-gates, where we can always refine it later if necessary.