The price of software: Unit testing

Test automation is one of the best investments an engineering team can make when developing software. The industry accepts that unit testing is the fastest and simplest way to automate tests. But does that mean we should unit test our whole codebase? How much of our code should we cover with unit tests? How can we be most efficient?

Measuring Tests Effectiveness

Finding program bugs when developing is cheaper than fixing them afterward. Handling user concerns, maintaining transaction consistency, isolating, and then releasing a patch for a defect is expensive.

The goal of testing is to reduce the cost of software.

The real return on investment of writing tests is the difference between the cost of the bugs they prevent and the price of the testing process itself.

Code Coverage

The traditional way to measure how much we are testing is the so-called code coverage. We obtain the value of coverage as the percentage of source code that runs when executing tests.

[Figure: code coverage]

In addition to the number of lines of code, different criteria can be used in the coverage calculation: function coverage, statement coverage, and so on. However, they all rely on the same measurement: the ratio of code executed when running tests to total code.
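A minimal sketch of the line-based variant of this ratio could look as follows. The function name and the line numbers are made up for illustration; real coverage tools collect the executed lines automatically while the tests run.

```python
from __future__ import annotations


def coverage_percent(executed: set[int], executable: set[int]) -> float:
    """Percentage of executable lines that ran at least once during the tests."""
    if not executable:
        return 100.0  # convention assumed here: nothing to cover
    return 100.0 * len(executed & executable) / len(executable)


# Hypothetical module with 8 executable lines, 6 of which ran under test.
executable_lines = {1, 2, 3, 4, 5, 6, 7, 8}
executed_lines = {1, 2, 3, 4, 6, 7}
print(coverage_percent(executed_lines, executable_lines))  # 75.0
```

Function or statement coverage would follow the same shape, with functions or statements counted instead of lines.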

The assumption behind measuring code coverage is that a program that is entirely tested will have fewer defects. In practice, aiming for 100% coverage is not very realistic.

Testing Fatigue

The cost of writing unit tests increases exponentially as we get closer to 100% coverage. The number of defects detected decreases at the same rate.

Writing tests to get the test coverage from 90% to 100% is expensive and highly ineffective in preventing bugs.

[Figure: unit test fatigue]

Aiming for complete test coverage couples the tests with the implementation to a level where the development effort for a feature can even double. We slowly fall out of sync with product development when we spend too much time testing implementation details.

The Cost of Unit Testing

Because unit testing is the fastest way to test, we fall into the trap of overusing it. To increase code coverage, we test code that does not correspond to a specific unit but ties many units together. The effort of mocking and stubbing becomes higher than that of writing the business logic itself. At that point, we are writing integration tests using the unit test approach and tools.

The return on investment of those unit tests is minimal because they rely heavily on mocking their dependencies and will pass even if those dependencies have changed their interfaces. Making mocks and fakes aware of changes in the constructs they replace results in heavy maintenance work.
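The sketch below illustrates the problem with Python's standard `unittest.mock` module. `PaymentGateway` and `checkout` are hypothetical names invented for this example: the dependency renamed a method, the calling code was not updated, and a loosely mocked unit test still passes.

```python
from unittest import mock


class PaymentGateway:
    """Hypothetical dependency. Its method was renamed from `pay` to `charge`."""

    def charge(self, amount: int) -> bool:
        raise NotImplementedError("talks to a real service in production")


def checkout(gateway, amount: int) -> bool:
    # Still calls the old method name: this is a real bug.
    return gateway.pay(amount)


# A plain Mock accepts any attribute, so the stale call goes unnoticed.
loose = mock.Mock()
loose.pay.return_value = True
assert checkout(loose, 10) is True

# A mock built from the real class rejects attributes the class lacks.
strict = mock.create_autospec(PaymentGateway, instance=True)
try:
    checkout(strict, 10)
    stale_call_detected = False
except AttributeError:
    stale_call_detected = True
print(stale_call_detected)  # True
```

Spec-based mocks like this one only catch changes in names and signatures. A change in the behavior of the dependency still goes undetected, which is where an integration test earns its cost.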

Abusing unit testing adds overhead to the development process without reducing defect density.

It is usually much more cost-effective to write an integration test than several unit tests if we already have a decent integration test setup.

Using traditional code coverage as a synonym for program correctness or code health will lead to a negative return on test efforts and increase development time rather than decrease it.

The Effective Code Coverage

Tests are beneficial if they prevent the maximum number of bugs per unit of time invested. Code coverage is only useful if we include the context in the calculation. A better parameter to quantify our test efforts is something we can call the effective code coverage: of the code that the development team decided should be unit tested, the percentage that executes when running tests.

[Figure: effective test coverage]

Depending on the project and the programming language, the code this metric targets will not amount to more than 20% to 40% of the total source code. That is because we target code that:

  1. identifies as a unit
  2. is ideally reused
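One way to picture the effective code coverage is to restrict the usual ratio to the modules the team has chosen to unit test. The sketch below assumes line-level data per module. The function, the module names, and the numbers are all hypothetical.

```python
from __future__ import annotations


def effective_coverage(
    executed: dict[str, set[int]],
    executable: dict[str, set[int]],
    targeted: set[str],
) -> float:
    """Coverage percentage computed only over the modules chosen for unit testing."""
    total = sum(len(executable[name]) for name in targeted)
    if total == 0:
        return 100.0  # convention assumed here: nothing was targeted
    hit = sum(len(executed.get(name, set()) & executable[name]) for name in targeted)
    return 100.0 * hit / total


# Hypothetical project: two reusable units plus glue code left to integration tests.
executable = {
    "pricing": set(range(1, 11)),              # 10 lines
    "date_utils": set(range(1, 11)),           # 10 lines
    "checkout_controller": set(range(1, 31)),  # 30 lines
}
executed = {
    "pricing": set(range(1, 10)),     # 9 of 10 lines ran
    "date_utils": set(range(1, 11)),  # 10 of 10 lines ran
}

print(effective_coverage(executed, executable, {"pricing", "date_utils"}))  # 95.0
print(effective_coverage(executed, executable, set(executable)))            # 38.0
```

In this made-up project the traditional figure is 38%, while the effective figure is 95%, because the controller is deliberately excluded from unit testing.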

Because tests do not suffocate development, the return on investment of this method is much higher. The approach frees up testing time that would normally go into unit assertions, which can then be invested in integration or end-to-end tests.

This testing philosophy requires flexibility in thinking and relies heavily on context: the kind of software we are writing and its purpose, and the language, framework, and architecture choices we make. Therefore, the parts of the source code that must be unit tested change as frequently as our product does.

Although counter-intuitive, decreasing the amount of unit testing can improve the overall testing coverage.

Testing efforts can then be focused on other types of tests, like integration or end-to-end tests.

Summing up

We should avoid writing unit tests that are too close to implementation details. Instead, focusing on reusable modules and their interfaces will maximize the number of defects identified per unit of time.

The remaining testing time can be invested in more sophisticated testing methods that are closer to user expectations and change less frequently. The testing setup with the highest return on investment is a mix of unit, integration, end-to-end, and manual tests. The perfect blend depends on the product, the tech stack, and the team.

We should approach unit testing in the same way we approach any form of automation. Traditional process analysis, including cost and time, helps us make a much better decision than absolute measures of coverage.