We're down to under 60 core test failures, and there are at least four RTBC patches in the queue which will reduce that further, meaning we'll soon be down to five or six unresolved issues dealing with core test failures.
At the moment, there's only one test that fails by design: the one for the url filter. However, that core bug is likely to get fixed before all tests are passing again, so it's unlikely to have much impact.
However, once we do have all tests working and some coverage gaps filled, there are bound to be tests we'll want to write that fail by design. When just running all the tests, it's almost impossible to tell that a failure is intentional, so I'm wondering how best to deal with it.
We could put a note in the code comments, or maybe in the assertion message text, linking to the issue nid for the bug (that could be left in as documentation for the test even once the bug is fixed). Either way, I'd be interested to hear what people think about this, and about different ways to handle it. I'm also not sure how it'll play with DrupalTestBed, which I guess assumes a 100% pass rate?
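For illustration only, a minimal sketch of what an annotated assertion could look like; the issue URL is a hypothetical placeholder, and $expected_behavior stands in for whatever the test actually checks:

// Known failure: documents http://drupal.org/node/NNNNNN (placeholder nid).
// The note can be removed once the bug is fixed; the assertion itself stays.
$this->assertTrue($expected_behavior, t('Fails by design until the linked issue is fixed.'));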

Comments
There should always be a
There should always be a 100% pass rate. If you write a test that uncovers a bug, then the test should be committed at the same time as the bug fix.
-
www.alanpritt.com
hmm, makes sense
I guess for something like #6162 that took years to fix, you could have a patch in the queue for the test, but not commit it until it goes in with the bug fix.
I am with alpritt and don't
I am with alpritt and don't get the "fail by design" idea. A test should succeed. If it fails, either it or the code it tests needs to be rewritten. You can do test-first development, but once a test gets committed to core, that should signify that the absence of the feature it is testing for is a critical issue.
At least that is the way I would set it up.
Definitely the code it tests
Definitely a failing test should indicate a critical bug in the code it tests - what I'm asking is whether we knowingly commit failing tests before there's a working patch fixing the bug, and if so, how to make it clear that's what we're doing.
Consider a potential workflow
We've used this internally on projects that use unit tests (not necessarily Drupal, but it's relevant here).
With that in mind, I think alpritt's comment about a 100% success rate is important, so long as two things are maintained (and this requires developer diligence).
It occurs to me that there's another case we're not covering here: what if the tests pass on the developer's system, but for whatever reason, after a commit and a subsequent checkout on another developer's box, the tests start failing again? I assume that part of this is handled by the patch review process, and that patches will never go to RTBC until the results are validated. Am I correct about that?
well...
upload.test currently passes on at least three platforms, but fails on one: http://drupal.org/node/267683. I don't know how we can get around situations like that. Hopefully they'll be pretty rare, though.
One thought that comes to
One thought that comes to mind is automating this kind of test in a CI environment (Hudson comes to mind), but I think that might involve a discussion with the folks at OSU. I know Selenium-RC supports running its own tests against target platforms, but how to wrap it in a way that makes sense, and how to generate some sanity at the user level, somewhat escapes me.
Is this a line of thought worth pursuing?
It's good to have tests that fail
Having tests that fail by design is useful for showing that it's actually possible for tests to fail; wouldn't it be awful if all tests passed no matter what the results, because of a bug in SimpleTest? However, I understand that there is a difficulty with DrupalTestBed; perhaps we should put tests "designed to fail" in a separate test case so it's easy not to run them? In any case, having tests that just fail ("Make sure failing works!") would be a good design practice. Hopefully we won't get to the point where we need to have tests that just pass ("Make sure passing works!") ;)
That sounds like a plan.
That sounds like a plan. Like a 'known issues' test case. And get DrupalTestBed to ignore it. Then when one of those tests passes, we make sure it's moved. Certainly I can think of core bugs where a relatively straightforward test could be written, but fixing the bug would take a lot more effort.
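As a rough sketch of that idea (assuming the getInfo() metadata convention SimpleTest uses for grouping tests; the class, test, and group name here are hypothetical), a 'known issues' grouping could look something like this:

class KnownIssueExampleTestCase extends DrupalWebTestCase {
  function getInfo() {
    return array(
      'name' => t('Known issue example'),
      'desc' => t('Fails by design until the underlying bug is fixed.'),
      // DrupalTestBed could skip or report this group separately.
      'group' => t('Known issues'),
    );
  }
  function testKnownFailure() {
    // Documents the correct behavior; fails until the bug is fixed.
    $this->assertTrue(FALSE, t('Placeholder assertion for a known bug.'));
  }
}

When a test in this group starts passing, it gets moved into the normal suite.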
Er. Am I missing something?
Why can't your tests pass even if they're checking for failures?
$node = node_load('banana');
$this->assertFalse($node, t('banana is not a valid node ID'));
A little bit yeah ;)
OK, here's an example.
http://drupal.org/node/206820 - for a number of years, we've been leaving orphan nodes in the database when deleting a forum - now the (last-minute string freeze) help text reflects this, although the underlying code hasn't changed.
Assume that this wasn't a known issue: if we want full test coverage, then ideally someone would have come along to write a test for deleting a forum, checking that its nodes were deleted alongside it. Do we commit the test, or do we keep it in the patch queue until it's committed along with the bug fix?
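To make that concrete, here's a rough sketch of what such a test might check; the helper methods (createForum(), createForumTopic(), deleteForum()) are hypothetical, not real API:

class ForumDeleteTestCase extends DrupalWebTestCase {
  function testForumDeleteRemovesNodes() {
    // Hypothetical helpers: set up a forum with one topic in it.
    $forum = $this->createForum();
    $topic = $this->createForumTopic($forum);
    // Delete the forum, then check the topic went with it.
    // Expected to fail while http://drupal.org/node/206820 is open.
    $this->deleteForum($forum);
    $this->assertFalse(node_load($topic->nid), t('Forum topics are deleted along with their forum.'));
  }
}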
Additionally, the url filter has also been broken for some time, and a test patch was committed which shows the breakage, ahead of the working fix (which is still RTBC in the queue). AFAIK this is the first committed test which breaks by design.
In both cases, the test is perfectly good, but the code is broken in some way. With something like http://drupal.org/node/6162 we knew it was broken for a couple of years before it got fixed - had that bug arisen now, we could've had a test for it. My question is what happens to the test if there's no immediate fix.
One advantage is that if a known bug has a test available to everyone, in core, it might save duplicate issues - and it would certainly make it more obvious which issues are stale when their tests start passing.
For example, PHP has failures in its test suite - and presumably they've been doing this longer than us.
Because...
Because, say, 'banana' was by mistake a valid node ID: $this->assertFalse($node, ...) should then fail. However, say there was a bug in SimpleTest that caused it to pass anyway (i.e. nothing ever fails). Then we would never know whether anything was failing, completely defeating the purpose of automated testing. :)
If you look at the original SimpleTest library, you will see that it included a test case to check that passes, fails, exceptions, and errors all worked (and it got 1 pass, 1 fail, 1 exception, and 1 error). It's a good idea to have such tests around, just in case. :)
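In Drupal SimpleTest terms, a sanity check along those lines might look like this sketch (hypothetical class, meant to be run and inspected by hand, since one assertion fails on purpose):

class FrameworkSanityTestCase extends DrupalWebTestCase {
  function testPassAndFail() {
    // Running this case should report exactly one pass and one fail.
    // Any other result means the test framework itself is broken.
    $this->assertTrue(TRUE, t('Make sure passing works!'));
    $this->assertTrue(FALSE, t('Make sure failing works!'));
  }
}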
Tests to define solutions
To follow a test-driven approach to development, which is one of our goals, we need to write tests before writing code, and that implies that these tests will fail until the related issues are fixed. It seems that the question is really whether those failing tests should be committed or should simply be part of the patch.
IMO, we should be able to commit (or to have Dries commit) tests that fail by design, but we also need to keep random tests from being committed. The line should be drawn not only at the priority of the issue (critical issues only), but mainly at the clarity of the correct behavior. I don't believe that having the URL filter process URLs inside HTML comments is really a critical issue, but I can write a simple test (no pun intended) that easily defines the correct behavior.
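For example, a rough sketch of such a test; it assumes the URL filter is applied via the internal _filter_url() helper (the exact call may differ), and the expectation that comments pass through untouched is itself the behavior being proposed:

class UrlFilterCommentTestCase extends DrupalWebTestCase {
  function testUrlInsideHtmlComment() {
    // Proposed correct behavior: a URL inside an HTML comment
    // should pass through the filter untouched.
    $text = '<!-- see http://example.com/ for details -->';
    $output = _filter_url($text, variable_get('filter_default_format', 1));
    $this->assertIdentical($output, $text, t('URLs inside HTML comments are not turned into links.'));
  }
}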
Tests that fail by design are not about pointing out the failure, but about outlining the criteria for a correct solution, so that nothing is missed when we actually have one. Has it ever happened that your comment at the beginning of a thread was forgotten by the time a patch got committed at comment #75?
A few concerns
If tests that fail by design are committed, they would certainly have to be given a separate status from those not broken by design. Otherwise we land ourselves in the muddle we're in currently, where we can't clearly see whether the patch we've created breaks anything. The tests become less useful if we have to wade through lots of broken ones in order to find which breakages are pertinent to our particular patch. That will only lead to mistakes.
Still, I don't really see the point in any case. The suggestion here is basically to turn Drupal itself into part of the issue queue. But if you keep the test patch in the issue for the bug, you can keep track of things a lot more easily; everything (comments, status, and patches) is all in one place. Plus, it can be easier to write better tests once you've spent time working on fixing the bug, because you learn that section of code better. It also avoids going through two review-and-commit cycles, which would slow down development. Also, five failed tests does not mean five bugs, so I don't think it works well as a bug tracker in any case. Finally, we will end up with a hybrid: some broken tests committed, and the majority still stuck in the issue queue. Confusion will arise.
The strategy should be simple: you find a bug issue, you check for related issues, you write a test. At that point another developer could take over, apply the patch, see exactly where the test fails (everything that breaks is related to this bug), fix the bug, run all the tests, and roll a patch. Simple. Perfect. Where's the problem?
The alternative: you find a bug issue, you check for related issues, you write a test. Another developer comes along and reviews the test, and it gets committed (remember, getting reviewed is a huge bottleneck at the moment). Another developer hits the bug bingo link, finds the bug with a test already written, then loads up Drupal HEAD and searches through the sea of failed tests for the specific tests that relate to this issue. They change the appropriate tests' statuses so they are no longer part of the 'fail-by-design' section, run them, and everything is green. Except for the test they didn't see in that big ol' sea of failing tests.
To sum up: -1 to creating a new haystack to lose our pins in.
-
www.alanpritt.com
I think this makes sense.
I think this makes sense. However, I still think it'd be useful to be able to download a bunch of failing tests and run them without hunting through the issue queue. Maybe a contrib module?
Simpletest.test
Since some of the comments dealt with ensuring that SimpleTest can fail a test, I would point you toward simpletest.test, which checks that SimpleTest outputs errors. Unless something extremely weird happens, that test isn't going to pass if the framework breaks.
Having a collection of tests that fail and point out specific issues would be nice, but I don't think that should be committed to core.