Szalapski.com: The ongoing struggle for high-quality unit tests

My team at work has been struggling lately with how ambitious we should be with unit tests. Everyone agrees that we need automated unit tests, and we need to separate them from automated integration tests and run them often. However, I must ask:

How robust should our unit tests be?

There are definitely a few different ways to think about unit testing. At one extreme is full-on test-driven design (TDD), where every unit of one's software is designed via the unit tests; the unit is responsible for passing all the tests and nothing more. I've never done an actual project using TDD, and I'd like to try, but most teams don't go this far. The other extreme is to avoid unit tests entirely, so that everything is tested using integration tests or perhaps only manual testing. This is unlikely to be good enough for a project of any complexity.

Seems to me that nearly all development projects fall somewhere in between, and my current project is no exception. So we struggle with how robust to make the tests--what functionality needs tests? Is there non-trivial functionality where one might decide to forgo testing in order to work on higher priority things?

First serious attempt: a "do it right" policy for unit tests

We knew we weren't going to do full TDD. After we realized we needed a better testing policy, we made some rules and guidelines that go something like this:

Every non-trivial public member of a class or module must have robust unit tests written for it.
Test every distinct edge case you can think of.
Refactor classes so that all dependencies are injectable, so that mock instances can be injected for true unit testing.

Use a mocking framework like Moq or Microsoft Moles and Stubs to create more fully featured mock objects with less refactoring.

Submit every test for code review by at least two other developers (we use CodeCollaborator).

Reviewers should suggest test cases they think are missing.
The author should follow up with new tests.

Consider using Pex to help you write tests with better coverage.
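The injection and edge-case guidelines above are language-neutral. The tools named in them are .NET tools, so the following is only an illustrative sketch in Python, not code from our project. The `Clock`, `SystemClock`, `FixedClock`, and `LateFeeCalculator` names are hypothetical. The sketch shows a dependency passed in through the constructor, a hand-written fake that fills the role a mocking framework would otherwise play, and a few boundary cases of the kind the "test every distinct edge case" rule asks for.

```python
from datetime import date
from typing import Protocol


class Clock(Protocol):
    """Hypothetical dependency: anything that can report today's date."""
    def today(self) -> date: ...


class SystemClock:
    """Production implementation that reads the real system date."""
    def today(self) -> date:
        return date.today()


class LateFeeCalculator:
    """Hypothetical unit under test. The clock is injected through the
    constructor instead of being created inside the class, so a test can
    substitute a fake and get deterministic results."""

    def __init__(self, clock: Clock, fee_per_day: int = 5) -> None:
        self._clock = clock
        self._fee_per_day = fee_per_day

    def fee_for(self, due: date) -> int:
        # Days past the due date; paying early or on time costs nothing.
        days_late = (self._clock.today() - due).days
        return max(days_late, 0) * self._fee_per_day


class FixedClock:
    """Hand-rolled fake standing in for a mocking-framework object."""
    def __init__(self, fixed: date) -> None:
        self._fixed = fixed

    def today(self) -> date:
        return self._fixed


def test_late_fee_edge_cases() -> None:
    calc = LateFeeCalculator(FixedClock(date(2010, 9, 25)))
    # Edge cases: paid early, due today, one day late, across a month boundary.
    assert calc.fee_for(date(2010, 9, 30)) == 0
    assert calc.fee_for(date(2010, 9, 25)) == 0
    assert calc.fee_for(date(2010, 9, 24)) == 5
    assert calc.fee_for(date(2010, 8, 31)) == 125


if __name__ == "__main__":
    test_late_fee_edge_cases()
    print("all edge cases pass")
```

In production the calculator would be constructed with `SystemClock()`. Only the test substitutes `FixedClock`. Creating that kind of seam is the purpose of the injectability rule, and it is also where the extra refactoring cost comes from.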

Writing high-quality unit tests is difficult, tedious, and unenjoyable

Writing unit tests well means spending a lot of time thinking about every possible kind of input, and that is thinking that developers would rather invest in their actual implementation. The tools involved (we used MSTest, Moq, Pex, and Microsoft Moles) all have a learning curve and are not what most developers want to learn.

But it's worth it, we all say, so we soldier on and do our best. Say I've implemented a new feature, and I begin writing unit tests for my new classes. I stub out each test for a class. By the time I finish tests for a few properties, I have stubbed out each of them and given each one focused, detailed thought. My brain is tired, because this thought process is tedious and is not the kind of problem-solving that keeps me engaged and interested. If software design and implementation is akin to doing a jigsaw puzzle, then writing unit tests for that software is akin to doing one's taxes.

I've now done much of the work for the tests I've written, and my software will be better as a result. Maybe I've even exposed some defects or missing features just by thinking about these tests. I am fatigued, but it's worth it, right? But hold on; I'm only done with about a third of the testable members of this class. After I write tests for the remaining two-thirds, I have to move on and write tests for several other classes that I've added.

But it's worth it, right? Hold on; I haven't considered the existing classes that I've added to. I have to open their existing tests and understand all of the tests that might relate to my new feature. Then I have to write a few new unit tests for what I've added to those classes, repeating the process all over again. Hopefully this doesn't take as long, since I've made only a few changes.

But it's worth it, right? Now my new feature is unit-tested, and from now on we will have good assurance that each class works well. Not so fast: I haven't yet updated the tests for the existing behavior I changed to support the new feature. I have to examine those existing tests so I can change their preconditions, actions, and/or assertions. I also have to consider whether the tests reveal that my behavior change breaks any existing features. Maybe my change to existing behavior is naive and we need to reconsider it.

But it's worth it, right? Suppose I am now done writing unit tests. What a relief! But now I need to write integration tests, which feels like starting over on much of the same ground. And I need to consider whether any new backwards-compatibility tests are needed for units that work with older software that can't be upgraded at the same time.

I put my tests through a code review, and the reviewers recommended a few additional cases. Fortunately, it is usually pretty easy to implement tests for these cases after someone else points them out.

The outcome of this is great, but the approach still has a big flaw: executing it properly takes a lot of time and determination. Tasks that we previously estimated at 4 hours (including perhaps an hour to write tests) and finished within a day now take perhaps 12 mythical man-hours spread over more than a week. Do we have better assurance that our software works well and according to our intent? Without question. But how much better? Is it worth taking perhaps double or triple the time to develop something? We had been developing the "old way" for several months, and our impression was that the business was tentatively happy with our results, even counting the support and patching needed for the few defects that arose.

After a few months of this style, we begin to wear down. Writing tests this way feels like it takes too much time, and the business seems more and more unhappy with our decreased velocity (and increased estimates). Team morale is down and excitement for the future wanes.

The perfect is the enemy of the good: tweaking the approach

We are keeping the guidelines, but we will now try to be wiser in writing tests: we pick and choose which tests to write, which tests to stub out but leave for writing later, and which units to avoid thinking of tests for at all. We always try to be "better" (writing more, higher-quality, and better-covering tests) than we were before, yet we try not to go all-out to do it "right" in every possible way.

So my new emphasis is for every team member to write robust tests where they are beneficial and to avoid spending effort on robust tests where they won't be. I plan to "err on the side of writing more tests", since it is sometimes difficult to know which tests are actually unnecessary, but one lesson from this seems clear to me: we simply don't have the resources to write "complete" or "really good" unit tests as a hard-and-fast rule.

Relaxing our unit-test vigilance might create some technical debt

I fully acknowledge that there is potential for problems here. My biggest concern with this approach is that it demands individual judgment by each team member on "how good" to make their tests. Code review should mitigate this somewhat, but with an amorphous guideline to avoid writing unit tests that aren't beneficial, no one will know exactly how good "good enough" is. There is a big risk that, as a team, we will neglect to write beneficial unit tests for a class because no one who read that code thought the tests were necessary or felt they should speak up.

Another good point is that if we don't spend the time to write unit tests now, we will spend time troubleshooting later. This is obviously true to an extent; the sooner you find a bug, the fewer resources it takes to fix or otherwise overcome it. However, I cannot accept that skipping any one of the unit tests one could possibly write will always leave behind an error that we have to find later. The hope is that we can avoid spending time on the trivial and "almost trivial" unit tests. The business has already demonstrated that it is somewhat error-tolerant, as long as we can fix errors reasonably quickly; can we be okay with that as well?

I admit that I am a long way from figuring out a "sweet spot" approach to this difficult issue. How does one distinguish trivial functionality that needs no testing from non-trivial functionality? How can we ensure that needed unit tests exist? Is there a better, more objective goal that we can aim for in our tests?

If we aim to make our unit tests not "really good" but instead "good enough" given the business demands and resource constraints, will they ever be really good enough? I'd love to hear all your thoughts and recommended articles; please leave your comments.

Saturday, September 25, 2010
