We're down to under 60 core test failures, and there are at least four RTBC patches in the queue which will reduce that further, meaning we'll soon be down to five or six unresolved issues dealing with core test failures.
At the moment, there's only one test that fails by design: the one for the url filter. However, that core bug is likely to get fixed before all tests are passing again, so it's unlikely to have much impact.
However, once we do have all tests working and some coverage gaps filled, there are bound to be tests we'll want to write that fail by design. When just running all the tests, it's almost impossible to tell that a failure is intentional, so I'm wondering how best to deal with it.
We could put a note in the code comments, or maybe in the assertion message text, linking to the issue nid for the bug (that could be left in as documentation for the test even once the bug is fixed). Either way, I'd be interested to hear what people think about this, and about different ways to handle it. I'm also not sure how it'll play with DrupalTestBed, which I guess assumes a 100% pass rate?
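For illustration only, a minimal sketch of what an annotated assertion could look like; the issue URL is a hypothetical placeholder, and $expected_behavior stands in for whatever the test actually checks:

// Known failure: documents http://drupal.org/node/NNNNNN (placeholder nid).
// The note can be removed once the bug is fixed; the assertion itself stays.
$this->assertTrue($expected_behavior, t('Fails by design until the linked issue is fixed.'));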

Comments
There should always be a
There should always be a 100% pass rate. If you write a test that uncovers a bug, then the test should be committed at the same time as the bug fix.
-
www.alanpritt.com
hmm, makes sense
I guess for something like #6162 that took years to fix, you could have a patch in the queue for the test, but not commit it until it goes in with the bug fix.
I am with alpritt and don't
I am with alpritt and don't get the "fail by design" idea. A test should succeed. If it fails, either it or the code it tests needs to be rewritten. You can do test-first development, but once a test gets committed to core, that should signify that the absence of the feature it is testing for is a critical issue.
At least that is the way I would set it up.
Definitely the code it tests
Definitely a failing test should indicate a critical bug in the code it tests - what I'm asking is whether we knowingly commit failing tests before there's a working patch fixing the bug, and if so, how to make it clear that's what we're doing.
Consider a potential workflow
We've used this internally on projects that use unit tests (not necessarily Drupal, but it's relevant here).
With that in mind, I think alpritt's comment about a 100% success rate is important, so long as two things are maintained (and this requires developer diligence).
It occurs to me that there's another case we're not covering here: what if the tests pass on the developer's system, but for whatever reason, after a commit and a subsequent checkout on another developer's box, the tests start failing again? I assume that part of this is handled by the patch review process, and that patches will never go to RTBC until the results are validated. Am I correct about that?
well...
upload.test currently passes on at least three platforms, but fails on one: http://drupal.org/node/267683. I don't know how we can get around situations like that. Hopefully they'll be pretty rare, though.
One thought that comes to
One thought that comes to mind is automating this kind of test in a CI environment (Hudson comes to mind), but I think that might involve a discussion with the folks at OSU. I know Selenium-RC supports running its own tests against target platforms, but how to wrap it in a way that makes sense, and how to generate some sanity at the user level, somewhat escapes me.
Is this a line of thought worth pursuing?
It's good to have tests that fail
Having tests that fail by design is useful for showing that it's actually possible for tests to fail; wouldn't it be awful if all tests passed no matter what the results, because of a bug in SimpleTest? However, I understand that there is a difficulty with DrupalTestBed; perhaps we should put tests "designed to fail" in a separate test case so it's easy not to run them? In any case, having tests that just fail ("Make sure failing works!") would be a good design practice. Hopefully we won't get to the point where we need to have tests that just pass ("Make sure passing works!") ;)
That sounds like a plan.
That sounds like a plan. Like a 'known issues' test case. And get DrupalTestBed to ignore it. Then when one of those tests passes, we make sure it's moved. Certainly I can think of core bugs where a relatively straightforward test could be written, but fixing the bug would take a lot more effort.
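As a rough sketch of that idea (assuming the getInfo() metadata convention SimpleTest uses for grouping tests; the class, test, and group name here are hypothetical), a 'known issues' grouping could look something like this:

class KnownIssueExampleTestCase extends DrupalWebTestCase {
  function getInfo() {
    return array(
      'name' => t('Known issue example'),
      'desc' => t('Fails by design until the underlying bug is fixed.'),
      // DrupalTestBed could skip or report this group separately.
      'group' => t('Known issues'),
    );
  }
  function testKnownFailure() {
    // Documents the correct behavior; fails until the bug is fixed.
    $this->assertTrue(FALSE, t('Placeholder assertion for a known bug.'));
  }
}

When a test in this group starts passing, it gets moved into the normal suite.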
Er. Am I missing something?
Why can't your tests pass even if they're checking for failures?
$node = node_load('banana');
$this->assertFalse($node, t('banana is not a valid node ID'));
A little bit yeah ;)
OK, here's an example.
http://drupal.org/node/206820 - for a number of years, we've been leaving orphan nodes in the database when deleting a forum - now the (last-minute string freeze) help text reflects this, although the underlying code hasn't changed.
Assume that this wasn't a known issue: if we want full test coverage, then ideally someone would have come along to write a test for deleting a forum, checking that its nodes were deleted alongside it. Do we commit the test, or do we keep it in the patch queue until it's committed along with the bug fix?
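To make that concrete, here's a rough sketch of what such a test might check; the helper methods (createForum(), createForumTopic(), deleteForum()) are hypothetical, not real API:

class ForumDeleteTestCase extends DrupalWebTestCase {
  function testForumDeleteRemovesNodes() {
    // Hypothetical helpers: set up a forum with one topic in it.
    $forum = $this->createForum();
    $topic = $this->createForumTopic($forum);
    // Delete the forum, then check the topic went with it.
    // Expected to fail while http://drupal.org/node/206820 is open.
    $this->deleteForum($forum);
    $this->assertFalse(node_load($topic->nid), t('Forum topics are deleted along with their forum.'));
  }
}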
Additionally, the url filter has also been broken for some time, and a test patch was committed which shows the breakage, ahead of the working fix (which is still RTBC in the queue). AFAIK this is the first committed test which breaks by design.
In both cases, the test is perfectly good, but the code is broken in some way. With something like http://drupal.org/node/6162 we knew it was broken for a couple of years before it got fixed - had that bug arisen now, we could've had a test for it. My question is what happens to the test if there's no immediate fix.
One advantage is that if a known bug has a test available to everyone, in core, it might save duplicate issues - and it would certainly make it more obvious which issues are stale when their tests start passing.
For example, PHP has failures in its test suite - and presumably they've been doing this longer than us.
Because...
Because, say, 'banana' was by mistake a valid node ID: $this->assertFalse($node, ...) should then fail. However, say there was a bug in SimpleTest that caused it to pass anyway (i.e. nothing ever fails). Then we would never know whether anything was failing, completely defeating the purpose of automated testing. :)
If you look at the original SimpleTest library, you will see that it included a test case to check that passes, fails, exceptions, and errors all worked (and it got 1 pass, 1 fail, 1 exception, and 1 error). It's a good idea to have such tests around, just in case. :)
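In Drupal SimpleTest terms, a sanity check along those lines might look like this sketch (hypothetical class, meant to be run and inspected by hand, since one assertion fails on purpose):

class FrameworkSanityTestCase extends DrupalWebTestCase {
  function testPassAndFail() {
    // Running this case should report exactly one pass and one fail.
    // Any other result means the test framework itself is broken.
    $this->assertTrue(TRUE, t('Make sure passing works!'));
    $this->assertTrue(FALSE, t('Make sure failing works!'));
  }
}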
Tests to define solutions
To follow a test-driven approach to development, which is one of our goals, we need to write tests before writing code, and that implies that these tests will fail until the related issues are fixed. It seems that the question is really whether those failing tests should be committed or should simply be part of the patch.
IMO, we should be able to commit (or to have Dries commit) tests that fail by design, but we also need to keep random tests from being committed. The line should be drawn not only at the priority of the issue (critical issues only), but mainly at the clarity of the correct behavior. I don't believe that having the URL filter process URLs inside HTML comments is really a critical issue, but I can write a simple test (no pun intended) that easily defines the correct behavior.
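For example, a rough sketch of such a test; it assumes the URL filter is applied via the internal _filter_url() helper (the exact call may differ), and the expectation that comments pass through untouched is itself the behavior being proposed:

class UrlFilterCommentTestCase extends DrupalWebTestCase {
  function testUrlInsideHtmlComment() {
    // Proposed correct behavior: a URL inside an HTML comment
    // should pass through the filter untouched.
    $text = '<!-- see http://example.com/ for details -->';
    $output = _filter_url($text, variable_get('filter_default_format', 1));
    $this->assertIdentical($output, $text, t('URLs inside HTML comments are not turned into links.'));
  }
}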
Tests that fail by design are not about pointing out the failure, but about outlining the criteria for a correct solution, so that nothing is missed when we actually have one. Has it ever happened that your comment at the beginning of a thread was forgotten by the time a patch got committed at comment #75?
A few concerns
If tests that fail by design are committed, they would certainly have to be given a separate status from those not broken by design. Otherwise we land ourselves in the muddle we're in currently, where we can't clearly see whether the patch we've created breaks anything. The tests become less useful if we have to wade through lots of broken ones in order to find which breakages are pertinent to our particular patch. That will only lead to mistakes.
Still, I don't really see the point in any case. The suggestion here is basically to turn Drupal itself into part of the issue queue. But if you keep the test patch in the issue for the bug, you can keep track of things a lot more easily; everything (comments, status, and patches) is all in one place. Plus, it can be easier to write better tests once you've spent time working on fixing the bug, because you learn that section of code better. It also avoids going through two review-and-commit cycles, which would slow down development. Also, five failed tests does not mean five bugs, so I don't think it works well as a bug tracker in any case. Finally, we will end up with a hybrid: some broken tests committed, and the majority still stuck in the issue queue. Confusion will arise.
The strategy should be simple: you find a bug issue, you check for related issues, you write a test. At that point another developer could take over, apply the patch, see exactly where the test fails (everything that breaks is related to this bug), fix the bug, run all the tests, and roll a patch. Simple. Perfect. Where's the problem?
The alternative: you find a bug issue, you check for related issues, you write a test. Another developer comes along and reviews the test, and it gets committed (remember, getting reviewed is a huge bottleneck at the moment). Another developer hits the bug bingo link, finds the bug with a test already written, then loads up Drupal HEAD and searches through the sea of failed tests for the specific tests that relate to this issue. They change the appropriate tests' statuses so they are no longer part of the 'fail-by-design' section, run them, and everything is green. Except for the test they didn't see in that big ol' sea of failing tests.
To sum up: -1 to creating a new haystack to lose our pins in.
-
www.alanpritt.com
I think this makes sense.
I think this makes sense. However, I still think it'd be useful to be able to download a bunch of failing tests and run them without hunting through the issue queue. Maybe a contrib module?
Simpletest.test
Since some of the comments dealt with ensuring that SimpleTest can fail a test, I would point you toward simpletest.test, which checks that SimpleTest outputs errors. Unless something extremely weird happens, that test isn't going to pass if the framework breaks.
Having a collection of tests that fail and point out specific issues would be nice, but I don't think that should be committed to core.