D8 Accelerate Performance Sprint at Drupal Developer Days in Montpellier, France

webchick's picture
Start: 
2015-04-13 (All day) - 2015-04-19 (All day) Europe/Paris
Organizers: 
Event type: 
Sprint

The goal of this sprint is to uncover the "unknown unknowns" blocking Drupal 8 from shipping, performance-wise. The main focus will be on profiling Drupal 8 under a variety of scenarios to determine the next places to start optimizing, then creating issues for these things so that a wider group of folks can work on them.

We're specifically looking for two types of sprinters:

  1. Finders: Those with profiling experience, using tools like XHProf, Blackfire, and flame graphs, who can analyze profiling data to flag problems and come up with recommendations on optimization targets. Your goal is to keep drilling down until we get to the bottom of what's making Drupal 8 slow.
  2. Fixers: Those who can both work on existing known performance issues, as well as take direction from the first group and create/review patches to improve areas of performance. We're especially looking for those with experience with things like the routing system, bootstrapping, container services, route access checking, and asset handling.

The Sprint Planning Doc has everything you need to know, including stuff to do before the sprint and a nice list of all the outstanding critical issues to tackle.

If you're planning to attend (remote attendees welcome!) fill your details in the DevDays Sprint Attendance sheet.

Hope to see you there!

Comments

Update from the sprint: Monday, April 13

Wim Leers's picture

Update from the sprint: Tuesday, April 14

Wim Leers's picture

Today, we really got up to speed!

On the page cache front

(i.e. for closing the https://www.drupal.org/node/2467071 meta issue.)

On fixing performance issues other than those related to the page cache

  • dpovshed has been profiling the asset handling since yesterday, and has made lots of progress:
  • fgm was ALL over all the class loading-related issues:
  • m1r1k got https://www.drupal.org/node/2458763 to a green patch, and he's quickly getting close to an RTBC!
  • likin is rerolling https://www.drupal.org/node/2368987, znerol and Wim Leers did benchmarking, it is almost RTBC! That will make Drupal 8's internal page cache ~20% faster!
  • Yesterday, plach, Fabianx, dawehner and Wim Leers discussed next steps, Fabianx already rolled an initial version. plach is pushing it further. The current numbers: table views with 50 fields are 55% faster. Fabianx and plach are sprinting along from home!

On finding more performance issues

  • dawehner, amateescu, berdir, yched, znerol, pwolanin and others did a profiling deep dive. The goal was to find performance areas outside of rendering performance (where we already know how to move forward). We specifically focused on routing and the access checking that happens as part of routing. We were unable to identify any significant parts that we can speed up significantly. After hours of analyzing, we concluded that the only way forward was to not look for low-hanging or even high-hanging fruit, but to just make many small incremental improvements.
  • See https://www.drupal.org/node/2470679#comment-9823490 for all relevant issues: for all of the things we have identified as being improvements or worthy of investigation, we've resurfaced existing issues, or opened new ones.
    Several "novice profiling" issues are there too, which should be a good fit for people new at profiling, but still have a nice potential :)
  • At the dinner table, Alex Pott made an excellent point: we can dump many more things into PHP. We already have a compiled container. There's no reason why we can't also do this for loading classes (class map, see above), routing information, asset library information, and even compile an entity type class with all its handler classes into a single PHP file.
    Food for thought for the next days.

Update from the sprint: Wednesday, April 15

Wim Leers's picture

Still steaming ahead. Progress on many, many fronts at once. 4 patches committed, 6 RTBC, and several more approaching RTBC.

On the page cache front

(i.e. for closing the https://www.drupal.org/node/2467071 meta issue.)

  • mr.baileys and jan.stoeckler got "max-age on HTML responses wrongly set to `max-age=0, private` instead of `max-age=N, public` (breaks reverse proxies and client-side caching)" (https://www.drupal.org/node/2467041) committed!
  • mr.baileys did the above as part of working on "Bubbling of elements' max-age to the page's headers and the page cache" (https://www.drupal.org/node/2352009), which was determined to be blocked on other work.
  • swentel and borisson_ fixed the test failures for "Search results should bubble rendered entity cache tags and set list cache tags" (https://www.drupal.org/node/2464409), it is now approaching RTBC
  • borisson_ and swentel got "Adding/updating interface translations should invalidate page & render caches" (https://www.drupal.org/node/2428837) to RTBC!
  • klausi & pwolanin discovered and fixed "REST responses should have proper cache tags" (https://www.drupal.org/node/2471473) in the same day!
  • GoZ got "File/Image field formatters don't add a cache tag for the file they display" (https://www.drupal.org/node/2388023) to RTBC!

On fixing performance issues other than those related to the page cache

On finding more performance issues

We didn't do further profiling today. dawehner opened https://www.drupal.org/node/2470926 and https://www.drupal.org/node/2471657 also based on yesterday's profiling. Both of those have already been picked up by a contributor at the sprint!

EDIT: I stupidly forgot penyaskito's issue. Ping me if I forgot to mention another issue or your name! So many things going on, that it's easy to miss something!

Update from the sprint: Thursday, April 16

Wim Leers's picture

Still steaming ahead. Progress on many, many fronts at once. 5 patches committed, 5 RTBC, and several more approaching RTBC.

On the page cache front

(i.e. for closing the https://www.drupal.org/node/2467071 meta issue.)

Enormous progress here!

  • borisson_ continued pushing "Search results should bubble rendered entity cache tags and set list cache tags" (https://www.drupal.org/node/2464409), forward, it is now *this* close to RTBC!
  • GoZ got "File/Image field formatters don't add a cache tag for the file they display" (https://www.drupal.org/node/2388023) committed!
  • borisson_ and swentel got "Adding/updating interface translations should invalidate page & render caches" (https://www.drupal.org/node/2428837) back to RTBC, after it was pushed back by a core committer.
  • pwolanin and klausi discovered "404/403 responses for non-existing nodes are cached in Page Cache/reverse proxy, are not invalidated when the node is created" yesterday, I filed it today, and pwolanin got a patch going and committed today!
  • amateescu started working on "Non-declarative (configurable) routes must be able to associate cache tags (was: Views access plugins alter route's access requirements, altered routes must have associated cache tags)" (https://www.drupal.org/node/2464657) this morning, and already got it to RTBC!

So, in other words, almost all known remaining ways to break the page cache have already been fixed, or are RTBC!

On fixing performance issues other than those related to the page cache

On finding more performance issues

Update from the sprint: Friday, April 17

Wim Leers's picture

Still steaming ahead. Progress on many, many fronts at once. 4 patches committed, 1 RTBC, and several more approaching RTBC.

On the page cache front

(i.e. for closing the https://www.drupal.org/node/2467071 meta issue.)

  • borisson_ and swentel got "Adding/updating interface translations should invalidate page & render caches" (https://www.drupal.org/node/2428837) committed!
  • Arla pushed "Drupal 8 only allows one user every 6 hours to register when page caching is enabled — caused by entity UUID in form state" (https://www.drupal.org/node/2465053) forward, a very tricky issue, with a bug that was discovered thanks to page caching now being enabled by default.

Note: "Move internal page caching to a module to avoid relying on config get on runtime" (https://www.drupal.org/node/2368987) was committed (listed below), which made Drupal 8's page cache significantly faster! And "All stack middlewares are constructed at the same time even for cached pages" (https://www.drupal.org/node/2473113) has the potential to make it faster still!

On fixing performance issues other than those related to the page cache

On finding more performance issues

We didn't do further profiling today.

Post-most-of-the-sprint profiling

Wim Leers's picture

THIS IS NOW OUTDATED, SEE COMMENT BELOW!
Monday vs. Saturday.

Test site is a fresh install of D8, with one node that contains an image and 3 terms. PHP 5.5.11, OpCache on.

We test both out-of-the-box performance (the "noapcMON" prefix, which stands for "no APC classloader, Monday's HEAD) and the performance with the APC classloader manually enabled for Monday's HEAD, because that is now enabled by default (the "apcMON" prefix).


Everything is tested with authenticated user 2, unless otherwise specified. Whenever SmartCache is used, we use [#2429617-50] for Saturday, and [#2429617-44] for Monday.

Page cache (anon)

(ab -c 1 -n 1000 http://host/)

  • Monday: 8.2 ms/request, 122 requests/s
  • Saturday: 6.5 ms/request, 154 requests/s

21% faster.

First, Monday's HEAD versus Saturday's, out of the box

Front page
Run #noapcMON_uid2_frontpage Run #SAT_uid2_frontpage Diff Diff%
Number of Function Calls 50,401 42,805 -7,596 -15.1%
Incl. Wall Time (microsec) 115,684 107,829 -7,855 -6.8%
Incl. MemUse (bytes) 19,902,800 19,879,976 -22,824 -0.1%
Incl. PeakMemUse (bytes) 20,059,608 20,037,960 -21,648 -0.1%
/node/1
Run #noapcMON_uid2_node1 Run #SAT_uid2_node1 Diff Diff%
Number of Function Calls 68,984 62,432 -6,552 -9.5%
Incl. Wall Time (microsec) 152,125 141,233 -10,892 -7.2%
Incl. MemUse (bytes) 22,170,216 22,102,152 -68,064 -0.3%
Incl. PeakMemUse (bytes) 22,345,368 22,277,408 -67,960 -0.3%
Front page + SmartCache
Run #noapcMON_uid2_frontpage_smartcache Run #SAT_uid2_frontpage_smartcache Diff Diff%
Number of Function Calls 24,166 20,547 -3,619 -15.0%
Incl. Wall Time (microsec) 61,401 57,484 -3,917 -6.4%
Incl. MemUse (bytes) 11,795,520 11,766,952 -28,568 -0.2%
Incl. PeakMemUse (bytes) 11,847,136 11,819,488 -27,648 -0.2%
/node/1 + SmartCache
Run #noapcMON_uid2_node1_smartcache Run #SAT_uid2_node1_smartcache Diff Diff%
Number of Function Calls 50,713 45,061 -5,652 -11.1%
Incl. Wall Time (microsec) 112,181 106,017 -6,164 -5.5%
Incl. MemUse (bytes) 17,929,200 17,906,960 -22,240 -0.1%
Incl. PeakMemUse (bytes) 18,018,312 17,996,720 -21,592 -0.1%

Second, Monday's HEAD versus Saturday's, with the APC classloader manually enabled for Monday

Front page
Run #apcMON_uid2_frontpage Run #SAT_uid2_frontpage Diff Diff%
Number of Function Calls 41,837 42,805 968 2.3%
Incl. Wall Time (microsec) 106,594 107,829 1,235 1.2%
Incl. MemUse (bytes) 19,565,368 19,879,976 314,608 1.6%
Incl. PeakMemUse (bytes) 19,721,856 20,037,960 316,104 1.6%
/node/1
Run #apcMON_uid2_node1 Run #SAT_uid2_node1 Diff Diff%
Number of Function Calls 61,364 62,432 1,068 1.7%
Incl. Wall Time (microsec) 139,801 141,233 1,432 1.0%
Incl. MemUse (bytes) 21,938,992 22,102,152 163,160 0.7%
Incl. PeakMemUse (bytes) 22,114,760 22,277,408 162,648 0.7%
Front page + SmartCache
Run #apcMON_uid2_frontpage_smartcache Run #SAT_uid2_frontpage_smartcache Diff Diff%
Number of Function Calls 19,784 20,547 763 3.9%
Incl. Wall Time (microsec) 57,131 57,484 353 0.6%
Incl. MemUse (bytes) 11,850,104 11,766,952 -83,152 -0.7%
Incl. PeakMemUse (bytes) 11,901,624 11,819,488 -82,136 -0.7%
/node/1 + SmartCache
Run #apcMON_uid2_node1_smartcache Run #SAT_uid2_node1_smartcache Diff Diff%
Number of Function Calls 44,275 45,061 786 1.8%
Incl. Wall Time (microsec) 106,685 106,017 -668 -0.6%
Incl. MemUse (bytes) 17,982,224 17,906,960 -75,264 -0.4%
Incl. PeakMemUse (bytes) 18,071,824 17,996,720 -75,104 -0.4%

Fixing performance regressions

Fabianx's picture

We found 2 performance regressions that are not part of the above benchmarks:

With both applied we save:

-144 function calls
-0.5%

( @todo Add final benchmark numbers )

Also cold cache performance is much better with Yaml File Cache now:

Monday: 5.7-6.5 seconds
Saturday+Plus Some: 3.9 - 4.5 seconds

Hence saving on average 30% after a 'drush cr'.

Performance before vs. after sprint

Wim Leers's picture

The profiling I did above was flawed: I was selecting the fastest out of 10 runs, due to time constraints. That lead to unreliable numbers. Especially because it meant rendering each page (and the XHProf results page) in the browser.

I redid that, this time using benchmarking, which doesn't suffer from variability quite as much.

Before vs. After

  • Before = 700499c2598122c973909799e18c118661bd136e
  • After = 5f987ee0acfb7843c4050a2f7b8ff9da274945b3

Test site is a fresh install of D8, with one node that contains an image and 3 terms. PHP 5.5.11, OpCache on.

We test both out-of-the-box performance (before, which stands for "no APC classloader", which was the default before the sprint) and the performance with the APC classloader manually enabled for HEAD before the sprint, because that is now enabled by default (before-apc).

Everything is tested with authenticated user 2, unless otherwise specified.

Before each test, Apache is restarted. Benchmarked using ab -n 1000 -c 1 -C <sessioncookie> <URL>. If the standard deviation looks suspiciously high, thus pointing to other apps on my development laptop taking up too much CPU or I/O time, then I re-ran it. Usually it took 2 to 3 runs.

Percentage deltas in the conclusions are taken by looking at the mean, not the median, because the median does not have more granularity than the millisecond level. Comparing the differences in mean is acceptable here because the standard deviation is always low.


Observed performance

Page cache (anon)

(ab -c 1 -n 1000 http://host/)

before
Requests per second:    117.77 [#/sec] (mean)
Time per request:       8.491 [ms] (mean)
Time per request:       8.491 [ms] (mean, across all concurrent requests)
Transfer rate:          1498.69 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     7    8   0.5      8      12
Waiting:        6    8   0.5      8      11
Total:          7    8   0.5      9      12
before-apc
Requests per second:    121.77 [#/sec] (mean)
Time per request:       8.212 [ms] (mean)
Time per request:       8.212 [ms] (mean, across all concurrent requests)
Transfer rate:          1549.66 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     7    8   0.5      8      12
Waiting:        6    7   0.5      7      12
Total:          7    8   0.5      8      13
after
Requests per second:    154.76 [#/sec] (mean)
Time per request:       6.462 [ms] (mean)
Time per request:       6.462 [ms] (mean, across all concurrent requests)
Transfer rate:          1969.06 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       1
Processing:     5    6   0.6      6      21
Waiting:        5    6   0.4      6       8
Total:          5    6   0.6      6      21
Conclusion
24% faster out of the box. 21% faster if you had manually enabled APC.

Front page (auth)

before
Requests per second:    11.75 [#/sec] (mean)
Time per request:       85.076 [ms] (mean)
Time per request:       85.076 [ms] (mean, across all concurrent requests)
Transfer rate:          151.58 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    82   85   1.5     85      91
Waiting:       73   75   1.5     75      83
Total:         82   85   1.5     85      92
before-apc
Requests per second:    12.30 [#/sec] (mean)
Time per request:       81.273 [ms] (mean)
Time per request:       81.273 [ms] (mean, across all concurrent requests)
Transfer rate:          158.91 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    78   81   1.3     81      89
Waiting:       69   72   1.3     72      80
Total:         79   81   1.3     81      89
after
Requests per second:    12.66 [#/sec] (mean)
Time per request:       78.984 [ms] (mean)
Time per request:       78.984 [ms] (mean, across all concurrent requests)
Transfer rate:          163.45 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       1
Processing:    75   79   1.7     79      89
Waiting:       66   70   1.6     70      80
Total:         75   79   1.7     79      89
Conclusion

7% faster out of the box. 2.5% faster if you had manually enabled APC.

/node/1 (auth)

before
Requests per second:    9.95 [#/sec] (mean)
Time per request:       100.480 [ms] (mean)
Time per request:       100.480 [ms] (mean, across all concurrent requests)
Transfer rate:          189.13 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    97  100   1.8    100     110
Waiting:       86   90   1.7     89      98
Total:         97  100   1.8    100     110
before-apc
Requests per second:    10.28 [#/sec] (mean)
Time per request:       97.304 [ms] (mean)
Time per request:       97.304 [ms] (mean, across all concurrent requests)
Transfer rate:          195.51 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    91   97   3.6     97     152
Waiting:       83   86   3.4     86     138
Total:         91   97   3.6     97     153
after
Requests per second:    10.63 [#/sec] (mean)
Time per request:       94.053 [ms] (mean)
Time per request:       94.053 [ms] (mean, across all concurrent requests)
Transfer rate:          202.65 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    89   94   1.9     94     120
Waiting:       79   84   1.8     84     111
Total:         90   94   1.9     94     120
Conclusion

6.4% faster out of the box. 3.3% faster if you had manually enabled APC.

Bootstrap/routing/route access checking performance

Here, we measure with the SmartCache patch applied, which reduces the time spent in rendering significantly, thus mostly leaving the bootstrap/routing/route access checking performance. before is omitted, only before-apc is tested, because we only care about actual performance gains in bootstrap/routing/route access checking, if any.

We use [#2429617-50] for before, and [#2429617-44] for after.

Front page + SmartCache (auth)

before-apc
Requests per second:    22.63 [#/sec] (mean)
Time per request:       44.186 [ms] (mean)
Time per request:       44.186 [ms] (mean, across all concurrent requests)
Transfer rate:          292.29 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    41   44   1.1     44      57
Waiting:       37   39   1.1     38      51
Total:         41   44   1.1     44      57
after
Requests per second:    22.60 [#/sec] (mean)
Time per request:       44.246 [ms] (mean)
Time per request:       44.246 [ms] (mean, across all concurrent requests)
Transfer rate:          291.78 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    41   44   1.1     44      50
Waiting:       36   39   1.0     39      45
Total:         41   44   1.1     44      50
Conclusion

Unchanged.

/node/1 + SmartCache (auth)

before-apc
Requests per second:    13.70 [#/sec] (mean)
Time per request:       73.003 [ms] (mean)
Time per request:       73.003 [ms] (mean, across all concurrent requests)
Transfer rate:          260.45 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    68   73   1.4     72      81
Waiting:       62   65   1.4     64      73
Total:         68   73   1.4     73      81
after
Requests per second:    13.77 [#/sec] (mean)
Time per request:       72.637 [ms] (mean)
Time per request:       72.637 [ms] (mean, across all concurrent requests)
Transfer rate:          262.30 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    70   72   1.3     72      80
Waiting:       62   64   1.3     64      72
Total:         70   73   1.3     72      80
Conclusion

Unchanged.

Cold cache performance

See Fabianx' comment above.

France

Group categories

Chantiers

Group notifications

This group offers an RSS feed. Or subscribe to these personalized, sitewide feeds:

Hot content this week