RSpec shared examples unmasked

Better Specs is a popular source of guidance for writing RSpec tests. Unfortunately, much of the advice there is dubious, and some of it is downright bad. One of these pieces of bad advice has to do with RSpec's shared examples feature. The shared example section reads:

Making tests is great and you get more confident day after day. But in the end you will start to see code duplication coming up everywhere. Use shared examples to DRY your test suite up. [DRY = Don't Repeat Yourself]

"Use shared examples to DRY your test suite up." Hmm. Is this a good idea? Is test code supposed to be DRY? Why or why not? We'll discuss this question in a moment, but first let's look at Better Specs' supposed "bad" and "good" way to write a test.

Better Specs' "good" way

In the following test, a very brief one, we can see that we're apparently testing a listable resource, a paginable resource, a searchable resource and a filterable list. At the beginning of the test we're defining a resource and a uri. If you're not sure what the meaning of this test is, me neither. But let's just hold this test in memory for a moment and try to understand it once we've seen both examples.

describe 'GET /devices' do
  let!(:resource) { FactoryBot.create :device, created_from: user.id }
  let!(:uri)      { '/devices' }

  it_behaves_like 'a listable resource'
  it_behaves_like 'a paginable resource'
  it_behaves_like 'a searchable resource'
  it_behaves_like 'a filterable list'
end

So that's apparently the good way to write such a test, using shared examples. What does the bad way look like?

Better Specs' "bad" way

We can see that the first couple of lines of this second example are the same as the first example, the lines that define a resource and a uri. After that it gets a little less clear (at least to me) what the relationship between the good version and the bad version is. It seems like maybe the test below has to do with pagination. Other than that I'm not too sure.

describe 'GET /devices' do
  let!(:resource) { FactoryBot.create :device, created_from: user.id }
  let!(:uri)      { '/devices' }

  context 'when shows all resources' do
    let!(:not_owned) { FactoryBot.create factory }

    it 'shows all owned resources' do
      page.driver.get uri
      expect(page.status_code).to be(200)
      contains_owned_resource resource
      does_not_contain_resource not_owned
    end
  end

  describe '?start=:uri' do
    it 'shows the next page' do
      page.driver.get uri, start: resource.uri
      expect(page.status_code).to be(200)
      contains_resource resources.first
      expect(page).to_not have_content resource.id.to_s
    end
  end
end

Honestly, there are so many things wrong with these examples from a technical perspective, and so much wrong with this "lesson" from a pedagogical perspective, that I'm not even sure where to begin. Let's start with the mistaken supposition that it's a good idea to DRY up your test suite, and then let's take a closer look at what shared examples actually are.

Duplication in tests: okay or not okay?

The essence of Don't Repeat Yourself is that if you have a piece of knowledge or behavior that appears in a codebase multiple times, that means that one of the copies of that behavior could get changed without the others getting changed, leading to inconsistencies and therefore bugs.
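
Here is a minimal sketch of the kind of drift DRY is meant to prevent, in plain Ruby. The class names and the password-length rule are hypothetical, invented purely for illustration. The same piece of knowledge (the minimum password length) is written down twice, and only one copy was updated when the rule changed.

```ruby
# Hypothetical example: one rule ("minimum password length") written down twice.

class SignupForm
  # Updated when the requirement changed from 8 to 12 characters.
  def valid_password?(password)
    password.length >= 12
  end
end

class PasswordResetForm
  # Forgotten during the same change, so it still enforces the old rule.
  def valid_password?(password)
    password.length >= 8
  end
end

password = "tenletters" # 10 characters

puts SignupForm.new.valid_password?(password)        # false
puts PasswordResetForm.new.valid_password?(password) # true
```

The two forms now disagree about what a valid password is, which is exactly the inconsistency described above.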

Some programmers take DRY too far and apply it in ways that don't make sense, unifying things that actually make more sense to be separate. Many people have spotted these misguided attempts to "DRY" up code and erroneously concluded that the DRY principle itself is to blame. In fact, there has been a whole mini-movement against DRY, which is where we get hare-brained ideas like Write Everything Twice (WET) and the Rule of Three. But misapplications of the DRY principle are not the fault of the DRY principle itself. Inexperienced or unthoughtful people misunderstanding a good idea does not turn the good idea into a bad idea.

Why test code is different

"Tolerate some duplication, and go easy on the DRY" is more or less the message of the backlash against DRY, which is a bit like saying "tolerate some poison in your food, and try not to be overly healthy". This misguided advice has unfortunately taken particularly strong hold in the realm of testing. A commonly-given reason is that, in testing, "clarity is more important than DRY". Okay, well if that's true, why isn't clarity more important than DRY everywhere? What's different about test code that makes DRY apply differently? This explanation doesn't address this question, and it's false.

DRY does indeed apply to test code differently from application code, but not because clarity is more important than DRY in testing. The meaningful difference is that, generally, test code is arbitrary whereas application code is not. Tests (also known to some as executable specifications) specify the desired behavior of the application code. In order for the application code to be correct, it must adhere to the specifications described in the tests. Tests (which are, again, executable specifications) are subject to no such constraint. Whatever the tests say the correct behavior is, that's what the correct behavior is. This is the sense in which test code is arbitrary. Because test code is arbitrary, two pieces of identical test code don't necessarily constitute duplication.

How to identify real duplication in tests

If one piece of duplicated code changes but its twin does not, does that constitute a bug? If the duplicated code is application code, there's an easy way to tell (and let's assume full test coverage for the sake of the example): if the change causes a test to fail, then the change created a bug. In other words, if the change causes the program no longer to conform to its specifications, then the change caused a bug.

But if the piece of duplicated code is test code, there's no way to tell whether changing one copy creates a bug or not, other than to ask the author of the change what their intention was. The change that created the inconsistency could be a mistake, or it could be that the requirements changed and what used to be one behavior has now branched into two different behaviors.
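
A small framework-free sketch may make this more concrete. Everything in it is hypothetical: the two checkout classes, the shipping rule, and the `check` helper, which is only a stand-in for a real test framework's assertion so that the snippet runs with plain Ruby.

```ruby
# Hypothetical application code: two checkout channels that, today,
# happen to follow the same shipping rule.
class WebCheckout
  def shipping_cost(subtotal)
    subtotal >= 50 ? 0 : 5
  end
end

class PhoneCheckout
  def shipping_cost(subtotal)
    subtotal >= 50 ? 0 : 5
  end
end

# Tiny stand-in for a test framework's assertion (hypothetical helper).
def check(description, actual, expected)
  raise "FAILED: #{description}" unless actual == expected
  puts "ok: #{description}"
end

# Two specifications that are textually almost identical.
check "web orders of 50 ship free",   WebCheckout.new.shipping_cost(50),   0
check "phone orders of 50 ship free", PhoneCheckout.new.shipping_cost(50), 0

# If someone later edits only the second check to expect free shipping at 30,
# nothing in this file can say whether that edit is a mistake or a new
# requirement for phone orders. Only the author's intent settles it.
```

The two checks look like duplicates, but each one specifies a separate behavior that is free to change on its own.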

(One brief nuance before we move on: in test helper code, because test helpers are not specifications, the same principles of DRY apply to helpers as to application code.)

All this is to say that Better Specs' assertion that you can use shared examples "to DRY your test suite up" is questionable at best. There is still an unanswered question to contend with, though. What about cases where several features share a large amount of identical behavior? It feels dumb to write a large number of identical tests. The options we're left with are to write extremely duplicative tests or to write just one such test and then accept gaping holes in our test coverage. What's one to do?

This is when it's important to remember that there's no "right" or "wrong" way to do testing. There's only what's smart and less smart, and what's advantageous and disadvantageous. The goal is to get a positive return on the investment made in the tests. When there are several features which share similar behavior, going from zero tests to one makes a huge difference. The behavior in those features used to not be covered at all, and now it is. Going from one test to two makes a smaller difference. You've gained a bit more assurance, but if the code that the second test covers shares most of the same code with what the first test covers (as it of course should), then the gain is not that great. The gain for the third test written is even smaller, and so on.

What's behind the mask?

What exactly is a shared example, anyway? Would you believe me if I told you it's a function? I don't mean literally, but for all practical purposes that's what it is. Let's take another look at the Better Specs example:

describe 'GET /devices' do
  let!(:resource) { FactoryBot.create :device, created_from: user.id }
  let!(:uri)      { '/devices' }

  it_behaves_like 'a listable resource'
  it_behaves_like 'a paginable resource'
  it_behaves_like 'a searchable resource'
  it_behaves_like 'a filterable list'
end

What's effectively happening, from the perspective of how easy the code is to understand, is that we're assigning values to global variables and then calling a function that uses those variables. Here's an illustration of code that's conceptually equivalent to the shared example above.

$resource = FactoryBot.create :device, created_from: user.id
$uri = '/devices'

it_behaves_like('a listable resource')
it_behaves_like('a paginable resource')
it_behaves_like('a searchable resource')
it_behaves_like('a filterable list')

Again, to be clear, I'm not saying that my illustration is what the shared example is doing behind the scenes or anything like that. I'm only saying that using a shared example is, from a code understandability perspective, just as bad as defining global variables and then calling functions that use those variables. Of course we've all been taught that global variables are bad, but it's worth reviewing why. Why are global variables bad again?

The cost of global variables

Just like with duplication, it's not exactly that global variables are bad, it's just that they have a cost. The cost, in this case, is that in the place where the global variables are defined, you can't see all the places where they're used, and so you don't know if it's safe to change anything about the global variable definitions. And in the place where the global variables are used, you can't see where they're defined, and you also don't know everything that might happen to them in between where they've been defined and where they're used. Basically the scope is wide open. It's the opposite of encapsulation.

When, conversely, variables are narrowly scoped, it's easy to trace their entire lifecycle. You can see where the variable is defined and where it's used. It's easy to predict the consequences of any particular change, and easy to be justifiably confident that your change won't have any surprising side effects.
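
The contrast can be sketched in plain Ruby, outside RSpec entirely. All the names here (`$resource`, `listed_globally?`, `listed?`) are hypothetical stand-ins for illustration, not anything taken from Better Specs or from RSpec itself.

```ruby
# Wide scope: the method silently depends on a global set somewhere else.
$resource = { id: 1, name: "thermostat" }

def listed_globally?(response_body)
  # Nothing at the call site reveals that $resource is involved.
  response_body.include?($resource[:name])
end

# Narrow scope: everything the method needs arrives as an argument.
def listed?(response_body, resource)
  response_body.include?(resource[:name])
end

body = "thermostat, doorbell"

puts listed_globally?(body)                      # true
puts listed?(body, { id: 2, name: "doorbell" })  # true
puts listed?(body, { id: 3, name: "sprinkler" }) # false
```

With the second method, the call site shows exactly what the result depends on. With the first, reassigning `$resource` anywhere in the program changes the answer without the call site changing at all.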

It gets worse

The design of shared examples, where you define global variables, then use them in some distant place later, is already bad enough, but on top of that is the maddening way that the tests for shared behavior are called. Instead of defining a function and then calling it, shared examples get "called" by passing a string into the it_behaves_like method. Below is an example that I got from this blog post, which I've slightly modified for clarity.

# Definition of "a normal dog" "function".
# It has to be registered before any group that includes it is evaluated.
shared_examples 'a normal dog' do
  it { is_expected.to be_able_to_growl }
  it { is_expected.to be_able_to_bark }
  it { is_expected.to be_able_to_jump }
  it { is_expected.to be_able_to_flee }
end

describe 'Dogs behavior' do
  context 'Snuff' do
    # Assignment of global variable.
    # Declared as the subject so that is_expected refers to this dog.
    subject(:snuff) { Dog.new(true, true, false, true) }

    # Call to "global function" called "a normal dog"
    it_behaves_like 'a normal dog'
  end
end

The reason this design choice is so frustrating is that when people choose not-very-unique names like shared_examples "login", it becomes surprisingly difficult to find the shared examples you're looking for. Shared examples bring all these compromises and frustration for what benefit? To DRY up your test suite, an idea that doesn't even make sense in the first place. No thank you!

Alternatives to shared examples

Even though shared examples are a bad solution, and even though it's not a good idea to try to de-duplicate code that's not actually duplication in the first place, there is still a real problem that remains to be solved. What do you do when you have the choice between writing the same tests repeatedly and leaving holes in your test coverage?

As I alluded to earlier, the aim of testing is not to achieve 100% test coverage for its own sake or to piously observe religious rules, but to enjoy the practical benefits that testing brings. If there are two areas of an application that are slightly different but share most of the same code, then a test for one area exercises most of the code for the other area too. So one approach to testing multiple similar features is to choose to test a "representative sample" of the features and decide that that's good enough.

Sometimes the bulk of (apparent or real) duplication in tests takes the form of CSS selectors and such, not only creating duplication but also obscuring the meaning of the test with noise. Such annoyances can be mitigated using Page Objects, a topic I've written about here, which can be used to raise the level of abstraction in tests, allowing the tests to read more easily and achieve the same thing with less code. Applying this technique has the happy side effect of reducing the amount of duplicative code in tests, but without obscuring the meaning the way shared examples do.

The RSpec library contains a mix of decent and questionable ideas, but I think some of its features are purely rotten and deserve to go straight in the garbage. The worst idea of them all, which I would undoubtedly toss in the trash first, is shared examples.
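
As a rough sketch of the Page Object idea mentioned above, here is a plain-Ruby version that runs without a browser or Capybara. `FakeSession`, `LoginPage` and the field locators are all hypothetical. In a real suite the session would typically be a Capybara session, whose `fill_in` and `click_button` methods the fake loosely imitates.

```ruby
# Hypothetical stand-in for a browser session; it only records what it was asked to do.
class FakeSession
  attr_reader :actions

  def initialize
    @actions = []
  end

  def fill_in(locator, with:)
    @actions << [:fill_in, locator, with]
  end

  def click_button(locator)
    @actions << [:click_button, locator]
  end
end

# The Page Object: locators live here, once, behind an intention-revealing method.
class LoginPage
  def initialize(session)
    @session = session
  end

  def log_in(email:, password:)
    @session.fill_in "user_email", with: email
    @session.fill_in "user_password", with: password
    @session.click_button "Log in"
  end
end

session = FakeSession.new
LoginPage.new(session).log_in(email: "a@example.com", password: "secret")

# Print the recorded interactions to show what the page object did.
session.actions.each { |action| p action }
```

A test that calls `log_in` states its intent in one line, and the locators can change in a single place. Unlike a shared example, the page object receives everything it needs explicitly and is called like any other method.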