There is a group of software engineers (and full disclosure: not so long ago I was part of that group) who, when asked about the test coverage of their software product, answer along these lines:
Yeah, we have some here and there, but we’re not in it for the numbers. If we encounter some particularly complex piece of code then we test it, but otherwise it’s at the author’s discretion to decide
They are not denying the benefits of automated testing, but at the same time, by their own admission, they limit its usefulness to particular cases. In my experience, this is usually due to friction that appears while writing tests: the team lacks the experience to write them or to build the architecture that supports them, the application design makes clean test cases hard to develop, and so on.
After you learn a few tricks to overcome the aforementioned obstacles, the process of building your automated test suite becomes a joy, and you can swiftly improve your 10-20% coverage into 70-90% coverage, increasing your benefits by an order of magnitude. So if a team already recognizes the value of automated testing, with fairly little effort you can turn it from one that only occasionally writes tests into one that does so by default. And this article is meant to show you some of those tricks.
The purpose of the tests
In my opinion, there are primary and secondary purposes for having an automated test suite, but if I had to pick one, that’d be:
An automated test suite is there to give you confidence in some assumptions about how your code behaves in a given scenario and environment
I believe the only thing a suite can give you guarantees about is that something is probably broken when the tests fail, but not the other way around: a green result doesn’t necessarily mean that everything is working smoothly.
At the same time, the larger your suite is, the more well-crafted your scenarios, and the better you understand what your tests cover, the more closely this confidence resembles guarantees. And for many practical purposes that is enough: there are situations in which my trust in the automated tests is so high that a passing run is sufficient for me to push a change to production.
Isolating the thing we want to test
In every automated test, there is a system under test (SUT). It’s easiest to illustrate this in the case of a unit test, where the SUT will often just be a single class with some behaviour we’d like to get guarantees about. But neither in unit tests nor in higher-level tests should we equate the SUT with a class; rather, it is a black box, a part of the system (either very small or very big) with clear boundaries. In each test, we’d like to act from outside of these boundaries, observe the results and side effects outside of the box, and ignore whatever is happening inside the box.
Here is how it looks:
We can also see from this diagram that every test scenario starts with an incoming message. For low-level (unit) tests that will probably be a method call, while for higher-level tests it would be simulating an HTTP request or dispatching a command on the command bus.
What’s important here: we need to isolate the SUT at the appropriate abstraction level. This means:
- When testing a single unit, like a class or a small collection of classes, we don’t want them to depend on the whole application, backing storage or environment. This means, for example, that we don’t want the class to depend on the Service Locator (or Dependency Injection Container), nor on any global state. We want to pass every dependency explicitly, because we need to control the environment outside the box (see the sketch after this list).
- Application-level testing is very similar, only the box is larger. How large and what shape it takes is up to you: do you want to include the persistent storage or not? Caches? Live external APIs or mocks? The filesystem?
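As a minimal sketch of the first point (all class and interface names here are made up for illustration), the class below receives its collaborator explicitly instead of reaching into a container, which lets a test substitute a hand-rolled double and stay in full control of everything outside the box:

```php
<?php
// Illustrative sketch only; Mailer, Greeter and SpyMailer are hypothetical names.

interface Mailer
{
    public function send(string $to, string $subject): void;
}

// The class does not know where the mailer comes from: it is passed in,
// so the environment outside the box stays under the test's control.
final class Greeter
{
    public function __construct(private Mailer $mailer) {}

    public function welcome(string $email): void
    {
        $this->mailer->send($email, 'Welcome aboard!');
    }
}

// A hand-rolled test double that simply records the outgoing messages.
final class SpyMailer implements Mailer
{
    /** @var list<array{string, string}> */
    public array $sent = [];

    public function send(string $to, string $subject): void
    {
        $this->sent[] = [$to, $subject];
    }
}
```

In a test we would construct `new Greeter(new SpyMailer())` and inspect `$sent`; had the class pulled the mailer from a Service Locator, there would be no seam to do that.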
Unit vs integration vs functional
Depending on the size and shape of the box you choose, you will get different guarantees from your test suite. The smaller it is, the more detailed the test scenarios can be and the easier it is to test edge cases. At the same time, you move farther from your real production environment, because a large portion of your app is replaced by test doubles or a staged environment that may have little in common with your real-world use cases.
A larger surface area, on the other hand, exercises more of your application’s components together, usually with real backing storage (like a SQL database), but such tests are slower and it is harder to iron out those gritty details.
This is where the concept of the classic testing pyramid comes from: a lot of unit tests at the bottom, and fewer higher-level tests as we go up the layers. This is also something I usually implement in my projects: push as much logic as possible from outer layers into inner ones (think: from controllers to services, from the application layer to the domain layer, from the imperative shell to the functional core, etc.), add a lot of unit tests, and then just a few functional tests to see if the components are correctly glued together.
I find arguing over naming and rules counterproductive. I don’t really know what an integration test means to another person, but as long as we can agree on the purpose and characteristics of any given suite, we can create them and use them to our advantage. Similarly, I was once on a team that had lengthy discussions about whether a unit test should mock every immediate dependency of a class, or if it was allowed to use concrete implementations in some cases. That discussion was secondary to the fact that the team simply stalled and wrote no unit tests (or only a minimal amount).
My advice is simple: design your own test suites, name them so that your team understands them, and then choose for yourself whether they will form a pyramid, an hourglass or an inverted pyramid, all depending on what results you think will be beneficial for your business.
Deciding what assertions to make
Getting back to the system under test diagram:
The test starts with an incoming message, and every process has its inputs and outputs. For automated software testing, we can differentiate three kinds of outputs, and each SUT will have between one and all of them:
- The return value: whatever you get back from your method call or the HTTP response you get from your HTTP request.
- Side effects: outgoing messages to any explicit or implicit dependencies. To put it in simpler terms, those are method calls to other services/classes/APIs made with the intent to modify state. Some examples include writing to a cache or a database, changing the value of a global variable, saving a file to disk, executing a POST API call, sending an email, etc.
- Changed internal state of the SUT: on some occasions, the side effects are observable on the SUT itself, for example when executing a mutation on an entity. But keep in mind that we’re only talking about publicly visible state, so unless it has a getter (simply speaking), we don’t care about changes to the private state of the SUT, because that is inside the black box and covered by the fog of war.
In addition, each incoming message that starts the process can have one of two characteristics (or on rare occasions, both):
- Queries: we generally don’t expect any side effects, and only care about the return value.
- Commands: the opposite, usually not returning a value (or returning one we don’t want to make any assertions about); their main purpose is to create side effects.
Here is a matrix of how we’d like to make assertions for each kind of message and output (credit for the idea: Sandi Metz):
In short:
- When executing a query, we only make assertions about the return value and ignore all the rest
- When executing a command, we make sure the SUT makes outgoing calls or changes its internal state in a way we can observe, and ignore all the rest
This means we can explicitly ignore all of the calls to self (for example, calling private methods or setting private state), which happen inside of the black box. In a test scenario, we don’t care about those: they are implementation details. If we break this rule, we will end up with a test suite that is more fragile, and it will break more often during refactoring, even when the behaviour does not change. That’s friction we’d like to avoid.
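Here is a rough PHPUnit sketch of that matrix in action (the ShoppingCart and PaymentGateway names are made up for illustration): for the query we assert only on the return value, for the command we assert only on the outgoing message, and in neither case do we peek inside the box:

```php
<?php
// Hypothetical SUT: a cart that answers a query and sends a command to a gateway.

interface PaymentGateway
{
    public function charge(int $amountInCents): void;
}

final class ShoppingCart
{
    /** @var int[] */
    private array $items = [];

    public function add(int $priceInCents): void
    {
        $this->items[] = $priceInCents;
    }

    // Query: returns a value, no side effects.
    public function total(): int
    {
        return array_sum($this->items);
    }

    // Command: produces a side effect (an outgoing message), returns nothing.
    public function checkout(PaymentGateway $gateway): void
    {
        $gateway->charge($this->total());
    }
}

final class ShoppingCartTest extends \PHPUnit\Framework\TestCase
{
    public function testQueryAssertsOnlyOnTheReturnValue(): void
    {
        $cart = new ShoppingCart();
        $cart->add(500);
        $cart->add(250);

        self::assertSame(750, $cart->total());
    }

    public function testCommandAssertsOnlyOnTheOutgoingMessage(): void
    {
        $gateway = $this->createMock(PaymentGateway::class);
        $gateway->expects($this->once())->method('charge')->with(750);

        $cart = new ShoppingCart();
        $cart->add(500);
        $cart->add(250);
        $cart->checkout($gateway);
    }
}
```

Notice that neither test asserts anything about the private `$items` array: that stays under the fog of war.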
On a higher level, say in application testing:
- We don’t care if a particular service was called.
- We might want to test whether an email was sent, since by our definition that probably reaches outside of our app.
- We shouldn’t make assertions about what is saved to the database, because it’s considered private state. Instead, we’d want to use a public endpoint to query for that data (see the sketch below). Think of it this way: your customers don’t care if the record they created is saved in the database. What they do care about is whether it’s returned on a list of records, or more importantly, whether it’s then used in the business processes.
This is a powerful concept:
The output of the process of creating a reminder is not a record in the database. It is the fact of an alarm going off at the time specified in the input.
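To make the database point concrete, here is a sketch of an application-level test, assuming a Symfony-style functional test setup and a hypothetical /api/reminders endpoint (adapt the client calls and routes to whatever your framework provides):

```php
<?php
// A sketch, not a drop-in test: the routes and payload shape are invented.

use Symfony\Bundle\FrameworkBundle\Test\WebTestCase;

final class CreateReminderTest extends WebTestCase
{
    public function testCreatedReminderIsVisibleThroughThePublicApi(): void
    {
        $client = static::createClient();

        // Act from outside the box: simulate the incoming HTTP request.
        $client->request(
            'POST',
            '/api/reminders',
            [],
            [],
            ['CONTENT_TYPE' => 'application/json'],
            json_encode(['message' => 'Pay the invoice', 'at' => '2030-01-01T09:00:00Z'])
        );
        $this->assertResponseIsSuccessful();

        // Assert through another public endpoint, not by inspecting database rows.
        $client->request('GET', '/api/reminders');
        $list = json_decode($client->getResponse()->getContent(), true);

        self::assertSame('Pay the invoice', $list[0]['message']);
    }
}
```

The test survives a change of ORM, table layout or even storage engine, because it only relies on the behaviour the customer can observe.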
Crafting the black-box
To help with controlling the environment outside of the box, we need to make sure the SUT relinquishes control over it. This is commonly known as the Inversion of Control principle. In practice, for our automated test suite, this means that instead of the SUT knowing its dependencies and state, we make them explicit and pass them in.
Examples of dependencies and state:
- Dependency injection of regular class dependencies: instead of creating a mailer inside of the class, we expect an object implementing a specific interface to be passed in, either via the constructor or as part of the incoming message (e.g. a method argument)
- Same for querying system state: instead of using global state (singletons, static methods, global variables), we pass the state in, either as a method argument or using dependency injection in conjunction with the repository pattern.
- An often overlooked part of the global state is time. Some of our behaviour will yield different results based on what time it is. A ClockInterface is meant to provide the system time, but since it’s an abstraction, it allows us to provide a test double that behaves exactly the way we want in a given test scenario, making the tests independent of the system time itself (see the sketch after this list).
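For instance, here is a rough sketch using the PSR-20 ClockInterface; the FrozenClock and DiscountPolicy names are illustrative, not taken from any library:

```php
<?php
// Assumes the psr/clock package; FrozenClock and DiscountPolicy are made-up names.

use Psr\Clock\ClockInterface;

// A test double that always reports the same moment in time.
final class FrozenClock implements ClockInterface
{
    public function __construct(private \DateTimeImmutable $frozenAt) {}

    public function now(): \DateTimeImmutable
    {
        return $this->frozenAt;
    }
}

// Behaviour that depends on "what time is it?" receives the clock explicitly.
final class DiscountPolicy
{
    public function __construct(private ClockInterface $clock) {}

    public function isHappyHour(): bool
    {
        $hour = (int) $this->clock->now()->format('G');

        return $hour >= 17 && $hour < 19;
    }
}

// In a test we freeze time instead of depending on the machine's clock:
// $policy = new DiscountPolicy(new FrozenClock(new \DateTimeImmutable('2030-01-01 17:30')));
// self::assertTrue($policy->isHappyHour());
```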
In broader terms, the idea of ports and adapters from hexagonal architecture can be used. Ports are holes in our SUT that we want to fill with different-shaped pegs (adapters). These adapters are either real-world implementations (DatabaseEntityRepository, SystemClock) or test doubles (InMemoryEntityRepository, FrozenClock). Ports live on the edge of our black box, and allow us to connect it with our pre-defined environment.
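As a minimal sketch (the ReminderRepository port and its adapters loosely mirror the names above, but are invented for this example), both the production adapter and the in-memory test double plug into the same port:

```php
<?php
// Illustrative sketch of a port with a test-double adapter; all names are made up.

final class Reminder
{
    public function __construct(public readonly string $id, public readonly string $message) {}
}

// The port: a hole in the black box, shaped by what the SUT needs.
interface ReminderRepository
{
    public function save(Reminder $reminder): void;

    public function byId(string $id): ?Reminder;
}

// A test-double adapter: fast, deterministic, no infrastructure required.
final class InMemoryReminderRepository implements ReminderRepository
{
    /** @var array<string, Reminder> */
    private array $reminders = [];

    public function save(Reminder $reminder): void
    {
        $this->reminders[$reminder->id] = $reminder;
    }

    public function byId(string $id): ?Reminder
    {
        return $this->reminders[$id] ?? null;
    }
}

// The real adapter (e.g. one backed by a SQL database) would implement the same
// interface and only be wired in for the production environment.
```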
What’s next?
- Read more articles in the automated testing category
- A demo project showcasing test suite organization, arrange/act/assert pattern, how to create and use test doubles, gherkin syntax for unit tests, and more
- Does the article make sense? Hire me to help you with your test suite
- 👇 Or book a call, so we can talk about how I can help your team