Performance of Behat tests

greggles's picture

When we started out using Behat, things were great: tests were easy to write and ran quickly. Then, over time, the suite grew until a full run took an hour. Which is too long.

We tried a few strategies:

  • Throwing more hardware at the problem - this cut the test time in half, but further gains were constrained by CPU speed.
  • Reducing steps in afterScenario - we had placed a handful of different test cleanup steps into the afterScenario and those ran every time, even if they weren't needed. Optimizing those cut 30% off the test time for us.
  • Using single drush commands instead of multiple - wherever we had 2 or more consecutive steps with "when I run drush..." we could save a fair bit of time by combining them into just 1 step. But the benefit was limited by how often that specific sequence occurred, which was not very often.
  • Combining two scenarios - Many of our tests involved 5 setup steps and then 2 things that were actually being tested. We saved time by taking 3 different scenarios that had the same 5 setup steps and turning them into one scenario so that the 5 setup steps would be run once.
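As an illustration of that last point, three scenarios that share the same setup can be collapsed into one scenario with several When/Then pairs. The step and page names below are invented for the sketch; the trade-off is that a failure early in the merged scenario skips the later checks:

```gherkin
Scenario: List, sort, and export widgets (formerly 3 scenarios)
  Given a clean test environment
  And I am logged in as a user with the "administrator" role
  When I visit "/admin/widgets"
  Then I should see "Widget list"
  # Formerly scenario 2, which repeated the setup steps above:
  When I click "Sort by name"
  Then I should see "Aardvark widget"
  # Formerly scenario 3:
  When I click "Export"
  Then I should see "Export complete"
```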

Things that didn't work:

  • changing our api_driver from "drush" to "drupal" (or vice versa) - in spite of docs saying one is faster, this didn't make a noticeable change for us
  • increasing hardware...beyond a point - having more CPUs/RAM/disk speed didn't help past a certain point; we weren't taking advantage of it
  • we spent a bit of time trying to create a home-grown way to parallelize the test runs and then combine the JUnit XML files, but once we upsized the hardware and fixed the afterScenario issues, spending more time on this didn't seem worthwhile. We didn't try out the parallelRunner which, it seems, has been discussed here before.

Things we're still working on and/or considering:

  • Running the automated tests on feature branches before they merge - we currently run them automatically when a feature branch is merged to our integration branch. This is easier to set up, but runs the risk of merging something that won't pass; then you either revert or have to fix it quickly, neither of which is fun. If tests were run on feature branches, the amount of time they take would be slightly less important.

So...
* What have other people found?
* How long do tests take to run?
* At what point do you start looking to optimize them?
* What strategies have worked well to optimize your tests?

Comments

Test your domain logic not Drupal

jpd4nt's picture

The point of testing is to cover the code you have written, as that is what you have the most control over.

Drupal is bad for this due to the way it works with the database, and browser testing is slow as well. So try to keep your BDD tests testing just your code, not the Drupal system.

Konstantin did a talk which was quite a good explanation of why he dropped parallel runners:
http://www.meetup.com/BDDLondon/events/181454232/
They did film it, but I'm not sure where the video has gone.

For us, system tests and smoke tests are slow, but we only spend about 5 minutes on them. The rest is all in-code, so it's very quick. Our full build (including deploy to staging) is about 35 minutes.

What if I told you that this

greggles's picture

What if I told you that this is only testing our domain logic? ;)

That event does look right on topic! I hope the video will get posted.

The basics of the talk was to

jpd4nt's picture

The basics of the talk were to mock out dependencies and strip everything back.

So I guess you have to look at what is in your setup steps to see if you really need them; otherwise all you can do is throw kit at it.

We try to avoid real database calls (a huge speed-up), and when we do make them, the database runs off a RAM disk with most of the safeties off (we don't care about the data).

Again, you don't say how big your test suite is or what you are trying to solve with it.

I still feel that you are testing more than just your domain logic; if it were just PHP code, it should not have any dependencies that take more than milliseconds to reset (just dump the memory).

Great points. We have several

greggles's picture

Great points. We have several tests that interact with web apis and those are definitely slow. Several of them we've mocked the responses and indeed it's a big improvement. Mocking the DB interaction would be another avenue that I could definitely see helping us.

In terms of "how big", I'm not sure how to quantify that since "Given I visit /" is much simpler than "Given all data is setup for the data synch". But...some easy stats:

  • 31 .feature files
  • 93 scenarios
  • 1710 steps (i.e. any line that begins with Given, When, Then, or And)
dsnopek's picture

This is something that Panopoly has been struggling with too, especially recently. We run our tests on Travis-CI, which has a 50 minute limit on test runs. So, we tend to go through cycles like: (a) tests take 20-30 minutes to run, (b) we add more tests until they're hitting the 50 minute limit and we can't add more, (c) we optimize something and return to (a). :-)

Since we're on Travis-CI, we've got some unique problems and limitations. For example, we can't throw hardware at the problem! We're stuck with what Travis-CI gives us. Also, Travis-CI rebuilds the environment in which the tests run every time, so we have to deal with setup time on each run, and it's a lot harder to optimize the environment the tests run in, since we can't just SSH in and tweak settings.

Here's a meta-issue that we've been using for the most recent round of optimizations:

https://www.drupal.org/node/2437927

To sort of summarize the types of things we've been trying so far:

  • Optimizing the performance of the Drupal site in the normal ways. So, enabling APC, tweaking MySQL settings, enabling CSS/JS aggregation, etc.
  • Profiling/optimizing our Behat steps and @afterStep, @afterScenario, etc. The last round of this reduced the test run time by about 10 minutes: https://www.drupal.org/node/2447839. We also recently had a Javascript error in WYSIWYG (unrelated to what we were testing) cause one of our tests to take an extra 15 minutes: https://www.drupal.org/node/2449495. These sorts of inefficiencies sneak in over time, and unfortunately, we don't take the time to really dig into them until we hit that 50 minute limit.
  • "Focusing" the tests on just what needs testing, and making "Given" steps for the rest. It's clicking through forms and stuff in Selenium that takes the most time for us. If we are testing using something or editing it, we don't need the test to step through the creation of the thing - we can bake that into a "Given" step and only test the actual creation one time.
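One way to do that kind of afterScenario pruning is to gate each cleanup on a scenario tag, so only scenarios that actually dirtied something pay for the teardown. A rough sketch against Behat 3's tag-filtered hooks; the tag name and the cleanup body are made up:

```php
use Behat\Behat\Hook\Scope\AfterScenarioScope;
use Drupal\DrupalExtension\Context\RawDrupalContext;

class FeatureContext extends RawDrupalContext {

  /**
   * Runs only after scenarios tagged @cleanfiles, instead of after
   * every scenario.
   *
   * @AfterScenario @cleanfiles
   */
  public function cleanUpTestFiles(AfterScenarioScope $scope) {
    // Hypothetical teardown - replace with your real cleanup.
    // file_unmanaged_delete_recursive('public://behat-fixtures');
  }
}
```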

In response to something you wrote above:

wherever we had 2 or more steps with "when I run drush..." right after each other we could save a fair bit of time combining those into just 1 step.

... and then:

changing our api_driver from "drush" to "drupal" (or vice versa) - in spite of docs saying one is faster, this didn't make a noticeable change for us

So, the reason two drush calls take so long is that each one bootstraps Drupal separately. If you change the api_driver to "drupal", it'll bootstrap Drupal once per Scenario, directly inside of Behat. So, if you do what you would do with drush directly in a "Given" step in your FeatureContext (because you now have access to the full Drupal API, in a fully bootstrapped Drupal site), it should definitely make your tests run faster!
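For reference, the driver is chosen in behat.yml; a minimal sketch of the Drupal Extension part of the config, with the root path as a placeholder:

```yaml
default:
  extensions:
    Drupal\DrupalExtension:
      api_driver: 'drupal'             # bootstrap once per scenario, in-process
      drupal:
        drupal_root: '/var/www/mysite' # placeholder path
```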

Of course, this isn't just switching the api_driver - you also have to stop calling drush directly in your tests and write some custom "Given" steps. But it really sounds like you could make some performance gains there.
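A sketch of what one of those custom "Given" steps might look like once the api_driver is "drupal" and the FeatureContext can call the Drupal API directly; the helper function here is a stand-in for whatever your drush command wraps:

```php
/**
 * Replaces: Given I run drush "some-command" "-y"
 *
 * @Given a clean test environment
 */
public function aCleanTestEnvironment() {
  // With api_driver set to "drupal", Drupal is already bootstrapped,
  // so we can call the same function the drush command would call.
  mymodule_reset_test_environment(); // hypothetical helper
}
```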

Things I'd like to experiment with in Panopoly in the future:

  • Finding ways to test what we need to test without using Selenium. Unfortunately, Panopoly's whole experience depends on lots of Javascript magic (via Panels and CTools). So, even though what we're testing is just a Drupal form, you need to do a bunch of Javascript stuff for that form to even appear (i.e. in a CTools modal). Maybe we could find some way to expose the form (just for the tests) without the Javascript layer, and then run those tests with Goutte?
  • Breaking up the tests into smaller units that can be tested independently. Panopoly is made up of a bunch of Feature modules. We're currently testing them all together as the whole distribution. But if we can decouple the modules and tests enough to run them independently, then we can run just the tests that exercise "panopoly_admin", for example, and get much shorter test runs with the same feedback to the developer.
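On the first point, Mink already lets you mix drivers in one suite: keep the headless Goutte driver as the default session and reserve Selenium for scenarios tagged @javascript. A minimal MinkExtension sketch (base_url is a placeholder):

```yaml
default:
  extensions:
    Behat\MinkExtension:
      base_url: 'http://localhost'    # placeholder
      goutte: ~
      selenium2: ~
      default_session: 'goutte'       # fast, no-JS scenarios
      javascript_session: 'selenium2' # used for @javascript scenarios
```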

I'm interested to hear what other ideas other folks have to increase their test performance!

Wow, awesome insights and

greggles's picture

Wow, awesome insights and feedback. Thanks, dsnopek!

Your advice on moving multiple steps into a single given matches our experience for sure. We've mostly done that for brevity in writing tests. There are probably a few places with ~20 steps across 3 web pages that could be replaced by an API call or two inside of a single custom step which no doubt would be faster.

Your explanation of using the Drupal driver to avoid bootstrapping Drupal repeatedly makes a ton of sense as well.

I think your additional ideas about using a faster driver and breaking tests apart are definitely interesting. We've written several simpletests that are contributed to modules on drupal.org to offload testing there where possible. That's an easy win :)

I got curious about the

greggles's picture

I got curious about the potential impact of this change and finally did a test. We have a drush command that we call often to create a clean test environment. We also have a step defined in our FeatureContext.php that directly calls the same function. I wrote a scenario that does the drush command 21 times and a scenario that does the step 21 times.

So, they look like:

  @api @drupaldriverz
  Scenario: Test drupal driver
    Given a clean test environment
    And a clean test environment
...

And:

  @api @drushalushlush
  Scenario: Run a drush command
    Given I run drush "beginners-mind" "-y"
    And I run drush "beginners-mind" "-y"
...

And then when I run them:

root@7d7a0f7664a5:/var/www/card/sites/all/tests# bin/behat --tags @drushalushlush --format=progress
.....................

1 scenario (1 passed)
21 steps (21 passed)
0m13.231s
root@7d7a0f7664a5:/var/www/card/sites/all/tests# bin/behat --tags @drupaldriverz --format=progress
.....................

1 scenario (1 passed)
21 steps (21 passed)
0m0.519s
root@7d7a0f7664a5:/var/www/card/sites/all/tests#

So if you have a lot of "I run drush ..." and those things don't absolutely need to be drush commands, it's a lot faster to call them from a custom step definition.

Cool! Thanks for sharing your

dsnopek's picture

Cool! Thanks for sharing your profiling results!

SQLite

marcus_clements's picture

I've just finished a 9 month Laravel project where we used Behat extensively for functional tests. After mocking all APIs, our test suite was still taking 30 minutes to run. I switched the DB to SQLite and the execution time dropped to less than 4 minutes.
The application was much less DB-intensive than Drupal, so I'm not sure how SQLite will pan out as the Drupal DB in testing.

I've just started a Drupal 7 project which needs BDD tests so I'll try SQLite in the next few days and report back.
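For anyone following along, pointing a Drupal 7 test site at SQLite is a settings.php change; a sketch for the test environment only, with the database file placed on tmpfs so it lives in RAM (the path is a placeholder):

```php
// settings.php for the test environment only.
$databases['default']['default'] = array(
  'driver' => 'sqlite',
  // tmpfs-backed path so the database file never touches disk.
  'database' => '/dev/shm/behat-test.sqlite',
);
```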

Contrib modules vs sqlite

jpd4nt's picture

We tried to use sqlite on a ram drive to speed things up but we ran into too many contrib modules that only worked with MySQL.

I am hoping that when the Drupal test bot is updated to cover all the supported databases, this can start to be addressed.

Did you try running just some

greggles's picture

Did you try running just some tests on core with sqlite? I have a vague memory that people tried using sqlite as a testbot database and it slowed the testing process. Would be good to have a more recent confirmation of that theory.

Yes

jpd4nt's picture

I did start off with core tests to make sure the test harness worked.

I did not see any noticeable slowdowns from using SQLite; the main speed-up was in database setup, which is quick since the database is just a file.

Login performance

jhedstrom's picture

It was discussed during a BoF in Barcelona that the current method of checking for a logged-in user results in several unnecessary bootstraps, since it just checks for markup in a rendered page (e.g., the logout link).

The proposed resolution here is to a) log the user in directly via session manipulation, and then b) use that session to check if a user is logged in or not. There is a somewhat abandoned PR here: https://github.com/jhedstrom/drupalextension/pull/131 and it might make sense to move much of that into the DrupalDrivers.
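For Drupal 7, a very rough sketch of part (a) - logging in via the API and handing the resulting session cookie to Mink instead of driving the login form. This is illustrative only (the step name is invented, and it skips details like cookie domains and driver differences that the PR deals with):

```php
/**
 * Hypothetical fast-login step; not the PR's actual implementation.
 *
 * @Given I am quickly logged in as :name
 */
public function quickLogin($name) {
  global $user;
  // Authenticate in-process via the Drupal 7 API.
  $user = user_load_by_name($name);
  drupal_session_regenerate();
  // Reuse the fresh session cookie in the browser session.
  $this->getSession()->setCookie(session_name(), session_id());
}
```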
