I'm Done with Unit and Integration Tests
What's the difference between them? Why does it matter? I discuss why I use the terms 'I/O-Free' and 'I/O-Dependent' tests instead.
I’ve been writing developer tests for a very long time. Lately, I’ve been reflecting on the types of tests I write, and why some are easier than others. When teaching and coaching others how to write tests, I almost always explain what I mean by “Unit Tests” and “Integration Tests”: Unit Tests don’t touch hardware, don’t do I/O, etc.[1], and test against a single object or group of objects[2]. I’d then demonstrate what I meant, like this:
@Test
public void fullDeckHas52Cards() {
    Deck deck = new Deck();
    assertThat(deck.size())
            .isEqualTo(52);
}

@Test
public void drawCardFromDeckReducesDeckSizeByOne() {
    Deck deck = new Deck();
    deck.draw();
    assertThat(deck.size())
            .isEqualTo(51);
}
These are both “unit” tests (neither touches hardware nor does I/O) and are Solitary (the tests only reference the Deck class).
Later on, we’d need to write tests against code that may do I/O, often a database or an external service (usually over the network). I’d explain that those were “Integration Tests”, because we were integrating our code with someone else’s code (the database code or the other service’s code) through I/O. However, calling someone else’s code that’s supplied as a library (e.g., a JAR file) that doesn’t do I/O (such as Caffeine, a caching library) can also be considered integration. Even using Java’s Collection classes (e.g., ArrayList) is using someone else’s code, though we don’t usually think of that as integration.
Hard to Redefine Terms With Lots of Baggage
Over the past few years, I’ve become more frustrated with the terms “unit” and “integration” because:
- I have to explain what Unit and Integration mean before I can use them
- Everyone has their own internal definition of what they mean, which is often different from mine and from those of other folks in the room
- Differences in definitions often lead to long discussions[3] that aren’t useful
- Folks don’t remember how I define the terms, and fall back to their own definitions
I finally decided to do something about it and come up with different names. But first, I had to answer the question: why does it matter? What about Unit (doesn’t do I/O) and Integration (may do I/O) is important for the way I approach development?
No I/O = Predictable & Fast
If you create an object and call a method that only accesses internal properties (fields) and any parameters to the method, the result must be predictable. Any logic or calculations the code is doing are deterministic. What makes code not predictable? I/O. Accessing a file is unpredictable, because the file could’ve changed without you knowing, the drive could fail intermittently, it could be out of space, etc. Accessing a remote service involves not only the network (unreliable), but also the remote service (unpredictable). I count access to anything outside of memory, such as random number generation and the current date & time, as I/O, because those are also unpredictable. By eliminating all I/O, you make the code under test, and therefore the test, deterministic.
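To make the date & time case concrete, here is a minimal sketch. TrialPeriod and FixedClocks are hypothetical names invented for this illustration. The class receives a java.time.Clock instead of asking the system for the current date, so a test can hand it a fixed clock and get the same answer on every run.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical example class: decides whether a trial period has expired.
final class TrialPeriod {
    private final LocalDate lastDay;
    private final Clock clock;

    TrialPeriod(LocalDate lastDay, Clock clock) {
        this.lastDay = lastDay;
        this.clock = clock;
    }

    // "Today" comes from the injected Clock, never from the system clock
    // directly, so the result is as predictable as the clock supplied.
    boolean isExpired() {
        return LocalDate.now(clock).isAfter(lastDay);
    }
}

// Hypothetical test helper: a clock frozen at the start of the given day (UTC).
final class FixedClocks {
    static Clock atStartOf(String isoDate) {
        Instant instant = LocalDate.parse(isoDate)
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant();
        return Clock.fixed(instant, ZoneOffset.UTC);
    }
}
```

Production code would pass in something like Clock.systemDefaultZone(), while an I/O-Free test passes a fixed clock and asserts on isExpired() for dates on either side of it.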
Not accessing I/O also means your tests will run extremely fast[4], with most of the time spent getting the tests ready to run: compiling, starting, etc. Everything else, from instantiating objects to running the code, is almost instantaneous. There’s no reason you can’t have many thousands of tests run in a few seconds. By the way, this speed is critical for doing test-driven development, which is why I focus on these kinds of tests.
Now, when it comes to code that does interact with the outside world, such as getting the current date & time, or fetching information from a database or external service, I still want tests that don’t do I/O. This is where Test Doubles come into play. In Hexagonal Architecture, that might mean a Stub or a Fake in place of a concrete Adapter, or I might use Nullable Infrastructure Wrappers, which are Stub-like implementations embedded in your production code.
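As a rough sketch of the Hexagonal Architecture option, assume a hypothetical PlayerRepository port whose production Adapter would talk to a database. An in-memory Fake implements the same interface, so the application code can be exercised without any I/O. All of the names below are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Port (hypothetical): what the application needs from the outside world.
interface PlayerRepository {
    void save(String playerId, int chips);

    Optional<Integer> findChips(String playerId);
}

// Fake: a working in-memory implementation standing in for the database Adapter.
final class InMemoryPlayerRepository implements PlayerRepository {
    private final Map<String, Integer> chipsByPlayer = new HashMap<>();

    @Override
    public void save(String playerId, int chips) {
        chipsByPlayer.put(playerId, chips);
    }

    @Override
    public Optional<Integer> findChips(String playerId) {
        return Optional.ofNullable(chipsByPlayer.get(playerId));
    }
}

// Application code under test: depends only on the port, not on an Adapter.
final class BettingService {
    private final PlayerRepository repository;

    BettingService(PlayerRepository repository) {
        this.repository = repository;
    }

    // Deducts the bet from the player's chips; rejects unknown players
    // and bets larger than the player's remaining chips.
    void placeBet(String playerId, int amount) {
        int chips = repository.findChips(playerId)
                .orElseThrow(() -> new IllegalArgumentException("Unknown player: " + playerId));
        if (amount > chips) {
            throw new IllegalStateException("Not enough chips");
        }
        repository.save(playerId, chips - amount);
    }
}
```

A test would save a player with 100 chips into the Fake, place a bet of 30 through BettingService, and check that the Fake now reports 70. Nothing leaves memory at any point.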
The idea is that we’re still not touching I/O, so everything is still Predictable & Fast. These are a form of Sociable tests, where we’re testing a larger set of collaborating classes, but explicitly not testing the I/O itself.
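The Nullable Infrastructure Wrapper option could look something like the following. This is a simplified sketch in the spirit of that pattern, not a full rendering of it, and DieRoller is a hypothetical class. The point is that the Stub-like behavior lives inside the production class and is selected through a factory method, rather than through a separate Test Double.

```java
import java.util.Random;
import java.util.function.IntUnaryOperator;

// Hypothetical infrastructure wrapper around random number generation.
final class DieRoller {
    // Maps the number of sides to a roll; the only part that differs
    // between the real and the nulled instance.
    private final IntUnaryOperator rollSource;

    private DieRoller(IntUnaryOperator rollSource) {
        this.rollSource = rollSource;
    }

    // Production factory: uses real (unpredictable) randomness.
    static DieRoller create() {
        Random random = new Random();
        return new DieRoller(sides -> random.nextInt(sides) + 1);
    }

    // Nulled factory: an embedded stub that always returns the configured roll.
    static DieRoller createNull(int fixedRoll) {
        return new DieRoller(sides -> fixedRoll);
    }

    int roll(int sides) {
        return rollSource.applyAsInt(sides);
    }
}
```

Code that depends on DieRoller gets DieRoller.create() in production and DieRoller.createNull(4) in an I/O-Free test, where every roll is then a predictable 4.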
I want 80-90% of my tests to be these kinds of tests. Of course, it’ll vary widely depending on how much your system is doing things vs. integrating with other things.
I/O = Unpredictable & Slower
At some point, though, you want to have some sort of test that touches I/O. Things like:
- Check the database schema is valid
- Ensure the ORM can read and write from the database and create well-formed objects
- Call an external service API and ensure you get a valid response
These are hard to write because, like I said, they’re unpredictable. Databases are often managed (“owned/defined”) by the application, so for those you can use tools like Docker and Testcontainers to run your tests against a real database. These tests are slower, but you won’t be running them nearly as often as the other tests.
When it comes to calling external services, if it’s a service that you can run inside a container (maybe it’s Kafka, or it’s a custom service created by another team that provides an executable), then you can do the same thing as with the database above. But if it’s a public service (like GitHub or Google), or any service that you can’t run locally or in a container, you’re not going to be able to run automated tests that are predictable. Yes, you could run against a “sandbox” environment, but for me that falls under “unpredictable” based on my experiences (maybe yours work better?). So, if I write these tests, I run them manually, or have other ways of checking that my code works (like good monitoring and observability in production).
Naming Attempts
Naming is hard, as we all know. Trying to find a name that is descriptive and memorable, but with little or no baggage, is quite a task.
First Attempt
During a discussion of the problem with Willem Larsen, he proposed using words from a different language (e.g., Japanese) for the same terms (or at least what they meant to me). However, not being able to speak Japanese, I had to rely on less than perfect translation systems. Not only did I want to find a word or phrase that had the intent I wanted, but it also had to be relatively short. For months, I played with different translations, but wasn’t happy with the results, so I shelved the idea.
Pure and Impure
In Functional Programming, the terms “Pure” and “Impure” have somewhat similar meanings to my usage of “Unit” and “Integration”. In FP, a pure function is a function that doesn’t cause any side effects, and always produces the same output for a given input. I tried using this terminology for a while, but it had two problems:
- I still had to define “pure” as not accessing I/O
- I got objections from FP folks, because of the misuse of the term “pure” in the context of testing stateful object-oriented code (methods accessing internal state could never be “pure”)
Honestly, the first problem was the more annoying one for me, though the term did have the benefit of not coming with much baggage (except for folks familiar with FP).
I do use the term “purify” when I’m talking about the process of separating the I/O code from the logic, e.g., “let’s purify this code, making it I/O-Free”.
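Here is a small sketch of what that separation might look like, using a hypothetical ScoreReport class. The logic works only on data already in memory, and a thin shell does the file reading and then delegates. The file format (one non-negative integer score per line) is an assumption made up for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical example: finding the highest score in a file of scores.
final class ScoreReport {
    // I/O-Free: logic over lines that are already in memory.
    // Blank lines are skipped; returns 0 when there are no scores
    // (scores are assumed to be non-negative integers).
    static int highestScoreIn(List<String> lines) {
        int highest = 0;
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                highest = Math.max(highest, Integer.parseInt(trimmed));
            }
        }
        return highest;
    }

    // I/O-Dependent: a thin shell that reads the file and delegates to the logic.
    static int highestScoreInFile(Path file) throws IOException {
        return highestScoreIn(Files.readAllLines(file));
    }
}
```

All the interesting cases (blank lines, padding, an empty list) can be covered by I/O-Free tests against highestScoreIn, leaving only the one-line shell for an I/O-Dependent test.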
I/O-Free and I/O-Dependent
So here we are. The two main kinds of tests I write. The majority are I/O-Free tests, and when I hit the I/O boundary and need to test against some real external service, I use I/O-Dependent tests (or I/O-Based).
I further split I/O-Free tests into Domain, Application, and other buckets, depending on the application architecture (e.g., Domain tests never have Test Doubles), but that’s another article.
What do you think? Let me know on my Discord, on Mastodon, or on Twitter.
[1] This should sound familiar to some of you, as it’s the same idea as the set of unit testing rules by Michael Feathers written in 2005.
[2] I use the terms Sociable and Solitary to differentiate between the different kinds of “unit” tests, as defined by Jay Fields in Working Effectively with Unit Tests, with an excerpt found here.
[3] If I have one more debate over what “unit” means, I’m gonna scream. It’s especially annoying, since I don’t think defining a “unit” is a useful exercise.
[4] Yes, there is code that performs lengthy calculations that run completely in memory, but I’m not talking about that.