Posted by webchick on April 2, 2015 at 9:54pm
Start:
2015-04-13 (All day) - 2015-04-19 (All day) Europe/Paris Organizers:
Event type:
Sprint
The goal of this sprint is to uncover the "unknown unknowns" blocking Drupal 8 from shipping, performance-wise. The main focus will be on profiling Drupal 8 under a variety of scenarios to determine the next places to start optimizing, then creating issues for these things so that a wider group of folks can work on them.
We're specifically looking for two types of sprinters:
- Finders: Those with profiling experience, using tools like XHProf, Blackfire, and flame graphs, who can analyze profiling data to flag problems and come up with recommendations on optimization targets. Your goal is to keep drilling down until we get to the bottom of what's making Drupal 8 slow.
- Fixers: Those who can both work on existing known performance issues, as well as take direction from the first group and create/review patches to improve areas of performance. We're especially looking for those with experience with things like the routing system, bootstrapping, container services, route access checking, and asset handling.
The Sprint Planning Doc has everything you need to know, including stuff to do before the sprint and a nice list of all the outstanding critical issues to tackle.
If you're planning to attend (remote attendees welcome!) fill your details in the DevDays Sprint Attendance sheet.
Hope to see you there!
Comments
Update from the sprint: Monday, April 13
profiling the costly asset handling as uncovered in comment 36 of that issue.
following people working on it:
find a bug, which is great, because it points out that our cache tag coverage
is already very complete
were still arriving; but initial discussions already happened amongst yched,
dawehner, Alex Pott, amateescu, znerol, Berdir, xjm and Wim Leers. We'll
start diving into that deeply tomorrow.
Update from the sprint: Tuesday, April 14
Today, we really got up to speed!
On the page cache front
(i.e. for closing the https://www.drupal.org/node/2467071 meta issue.)
On fixing performance issues other than those related to the page cache
On finding more performance issues
Several "novice profiling" issues are there too, which should be a good fit for people new at profiling, but still have a nice potential :)
Food for thought for the next days.
Update from the sprint: Wednesday, April 15
Still steaming ahead. Progress on many, many fronts at once. 4 patches committed, 6 RTBC, and several more approaching RTBC.
On the page cache front
(i.e. for closing the https://www.drupal.org/node/2467071 meta issue.)
On fixing performance issues other than those related to the page cache
On finding more performance issues
We didn't do further profiling today. dawehner opened https://www.drupal.org/node/2470926 and https://www.drupal.org/node/2471657 also based on yesterday's profiling. Both of those have already been picked up by a contributor at the sprint!
EDIT: I stupidly forgot penyaskito's issue. Ping me if I forgot to mention another issue or your name! So many things going on, that it's easy to miss something!
Update from the sprint: Thursday, April 16
Still steaming ahead. Progress on many, many fronts at once. 5 patches committed, 5 RTBC, and several more approaching RTBC.
On the page cache front
(i.e. for closing the https://www.drupal.org/node/2467071 meta issue.)
Enormous progress here!
So, in other words, almost all known remaining ways to break the page cache have already been fixed, or are RTBC!
On fixing performance issues other than those related to the page cache
On finding more performance issues
Update from the sprint: Friday, April 17
Still steaming ahead. Progress on many, many fronts at once. 4 patches committed, 1 RTBC, and several more approaching RTBC.
On the page cache front
(i.e. for closing the https://www.drupal.org/node/2467071 meta issue.)
Note: "Move internal page caching to a module to avoid relying on config get on runtime" (https://www.drupal.org/node/2368987) was committed (listed below), which made Drupal 8's page cache significantly faster! And "All stack middlewares are constructed at the same time even for cached pages" (https://www.drupal.org/node/2473113) has the potential to make it faster still!
On fixing performance issues other than those related to the page cache
On finding more performance issues
We didn't do further profiling today.
Post-most-of-the-sprint profiling
THIS IS NOW OUTDATED, SEE COMMENT BELOW!
Monday vs. Saturday.
700499c2598122c973909799e18c118661bd136e
cbc03163b937bd6c5de70ed88cee062286680431
+ https://www.drupal.org/node/2395143 + https://www.drupal.org/node/2471218Test site is a fresh install of D8, with one node that contains an image and 3 terms. PHP 5.5.11, OpCache on.
We test both out-of-the-box performance (the "noapcMON" prefix, which stands for "no APC classloader, Monday's HEAD) and the performance with the APC classloader manually enabled for Monday's HEAD, because that is now enabled by default (the "apcMON" prefix).
Everything is tested with authenticated user 2, unless otherwise specified. Whenever SmartCache is used, we use [#2429617-50] for Saturday, and [#2429617-44] for Monday.
Page cache (anon)
(
ab -c 1 -n 1000 http://host/
)21% faster.
First, Monday's HEAD versus Saturday's, out of the box
/node/1
/node/1
+ SmartCacheSecond, Monday's HEAD versus Saturday's, with the APC classloader manually enabled for Monday
/node/1
/node/1
+ SmartCacheFixing performance regressions
We found 2 performance regressions that are not part of the above benchmarks:
With both applied we save:
-144 function calls
-0.5%
( @todo Add final benchmark numbers )
Also cold cache performance is much better with Yaml File Cache now:
Monday: 5.7-6.5 seconds
Saturday+Plus Some: 3.9 - 4.5 seconds
Hence saving on average 30% after a 'drush cr'.
Performance before vs. after sprint
The profiling I did above was flawed: I was selecting the fastest out of 10 runs, due to time constraints. That lead to unreliable numbers. Especially because it meant rendering each page (and the XHProf results page) in the browser.
I redid that, this time using benchmarking, which doesn't suffer from variability quite as much.
Before vs. After
700499c2598122c973909799e18c118661bd136e
5f987ee0acfb7843c4050a2f7b8ff9da274945b3
Test site is a fresh install of D8, with one node that contains an image and 3 terms. PHP 5.5.11, OpCache on.
We test both out-of-the-box performance (
before
, which stands for "no APC classloader", which was the default before the sprint) and the performance with the APC classloader manually enabled for HEAD before the sprint, because that is now enabled by default (before-apc
).Everything is tested with authenticated user 2, unless otherwise specified.
Before each test, Apache is restarted. Benchmarked using
ab -n 1000 -c 1 -C <sessioncookie> <URL>
. If the standard deviation looks suspiciously high, thus pointing to other apps on my development laptop taking up too much CPU or I/O time, then I re-ran it. Usually it took 2 to 3 runs.Percentage deltas in the conclusions are taken by looking at the mean, not the median, because the median does not have more granularity than the millisecond level. Comparing the differences in mean is acceptable here because the standard deviation is always low.
Observed performance
Page cache (anon)
(
ab -c 1 -n 1000 http://host/
)before
Requests per second: 117.77 [#/sec] (mean)
Time per request: 8.491 [ms] (mean)
Time per request: 8.491 [ms] (mean, across all concurrent requests)
Transfer rate: 1498.69 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 7 8 0.5 8 12
Waiting: 6 8 0.5 8 11
Total: 7 8 0.5 9 12
before-apc
Requests per second: 121.77 [#/sec] (mean)
Time per request: 8.212 [ms] (mean)
Time per request: 8.212 [ms] (mean, across all concurrent requests)
Transfer rate: 1549.66 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 7 8 0.5 8 12
Waiting: 6 7 0.5 7 12
Total: 7 8 0.5 8 13
after
Requests per second: 154.76 [#/sec] (mean)
Time per request: 6.462 [ms] (mean)
Time per request: 6.462 [ms] (mean, across all concurrent requests)
Transfer rate: 1969.06 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 1
Processing: 5 6 0.6 6 21
Waiting: 5 6 0.4 6 8
Total: 5 6 0.6 6 21
Front page (auth)
before
Requests per second: 11.75 [#/sec] (mean)
Time per request: 85.076 [ms] (mean)
Time per request: 85.076 [ms] (mean, across all concurrent requests)
Transfer rate: 151.58 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 82 85 1.5 85 91
Waiting: 73 75 1.5 75 83
Total: 82 85 1.5 85 92
before-apc
Requests per second: 12.30 [#/sec] (mean)
Time per request: 81.273 [ms] (mean)
Time per request: 81.273 [ms] (mean, across all concurrent requests)
Transfer rate: 158.91 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 78 81 1.3 81 89
Waiting: 69 72 1.3 72 80
Total: 79 81 1.3 81 89
after
Requests per second: 12.66 [#/sec] (mean)
Time per request: 78.984 [ms] (mean)
Time per request: 78.984 [ms] (mean, across all concurrent requests)
Transfer rate: 163.45 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 1
Processing: 75 79 1.7 79 89
Waiting: 66 70 1.6 70 80
Total: 75 79 1.7 79 89
7% faster out of the box. 2.5% faster if you had manually enabled APC.
/node/1
(auth)before
Requests per second: 9.95 [#/sec] (mean)
Time per request: 100.480 [ms] (mean)
Time per request: 100.480 [ms] (mean, across all concurrent requests)
Transfer rate: 189.13 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 97 100 1.8 100 110
Waiting: 86 90 1.7 89 98
Total: 97 100 1.8 100 110
before-apc
Requests per second: 10.28 [#/sec] (mean)
Time per request: 97.304 [ms] (mean)
Time per request: 97.304 [ms] (mean, across all concurrent requests)
Transfer rate: 195.51 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 91 97 3.6 97 152
Waiting: 83 86 3.4 86 138
Total: 91 97 3.6 97 153
after
Requests per second: 10.63 [#/sec] (mean)
Time per request: 94.053 [ms] (mean)
Time per request: 94.053 [ms] (mean, across all concurrent requests)
Transfer rate: 202.65 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 89 94 1.9 94 120
Waiting: 79 84 1.8 84 111
Total: 90 94 1.9 94 120
6.4% faster out of the box. 3.3% faster if you had manually enabled APC.
Bootstrap/routing/route access checking performance
Here, we measure with the SmartCache patch applied, which reduces the time spent in rendering significantly, thus mostly leaving the bootstrap/routing/route access checking performance.
before
is omitted, onlybefore-apc
is tested, because we only care about actual performance gains in bootstrap/routing/route access checking, if any.We use [#2429617-50] for
before
, and [#2429617-44] forafter
.Front page + SmartCache (auth)
before-apc
Requests per second: 22.63 [#/sec] (mean)
Time per request: 44.186 [ms] (mean)
Time per request: 44.186 [ms] (mean, across all concurrent requests)
Transfer rate: 292.29 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 41 44 1.1 44 57
Waiting: 37 39 1.1 38 51
Total: 41 44 1.1 44 57
after
Requests per second: 22.60 [#/sec] (mean)
Time per request: 44.246 [ms] (mean)
Time per request: 44.246 [ms] (mean, across all concurrent requests)
Transfer rate: 291.78 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 41 44 1.1 44 50
Waiting: 36 39 1.0 39 45
Total: 41 44 1.1 44 50
Unchanged.
/node/1
+ SmartCache (auth)before-apc
Requests per second: 13.70 [#/sec] (mean)
Time per request: 73.003 [ms] (mean)
Time per request: 73.003 [ms] (mean, across all concurrent requests)
Transfer rate: 260.45 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 68 73 1.4 72 81
Waiting: 62 65 1.4 64 73
Total: 68 73 1.4 73 81
after
Requests per second: 13.77 [#/sec] (mean)
Time per request: 72.637 [ms] (mean)
Time per request: 72.637 [ms] (mean, across all concurrent requests)
Transfer rate: 262.30 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 70 72 1.3 72 80
Waiting: 62 64 1.3 64 72
Total: 70 73 1.3 72 80
Unchanged.
Cold cache performance
See Fabianx' comment above.