Introducing a Powerful New Way to Write Tests in Python

TL;DR - I just released the first version of a new kind of testing framework, called Sundew. It's early days, but if you're ready to try something new in the world of testing, you should check it out! You can find the GitHub repo here and the documentation here.

If you're interested in learning more about what makes sundew interesting, keep on reading!

Sundew is a new kind of testing framework for Python that applies a genuinely new approach to writing tests (at least as far as I can tell). It requires you to write tests in a new and unfamiliar way, but in exchange, you'll unlock exciting new capabilities for your test suite like finding failing tests faster, detecting functions without tests, and automatically writing regression tests for those functions.

However, while these new capabilities are exciting, breaking new ground in an established field like testing comes with some sharp edges. Aside from being a new, unproven library, the nature of this approach also makes sundew strongly opinionated. I like to compare it to the Python autoformatter black: if your first experience was like mine, you might have been a bit put off by how opinionated it was, but once you tried it out you realized the benefits far outweighed the initial adjustment. My hope is that your experience with sundew will be much the same.

Let's start by looking at what drives sundew to be so opinionated, so we can explore how that unlocks functionality you can't find in other testing frameworks today.

Writing Tests with Sundew

This is an example of what a test written with sundew might look like:

import io

from sundew import test
from my_module import my_function
from tests import fixtures

test(my_function)(
    setup={fixtures.setup_class},
    kwargs={"a": "123"},
    patches={"sys.stdout": io.StringIO()},
    returns="return_123",
    side_effects=[
        lambda _: _.patches["sys.stdout"].getvalue() == "123\n",
        lambda _: _.a == "return_123",
    ],
)

There's a lot to unpack here, and most of the details I'll leave to the documentation, but let's try to understand why our tests look like this. Every test written in sundew starts with test(my_function), where test is the function sundew uses to declare a test, and my_function is some function in your code that you want to test. This simple format encodes something powerful, and it's the reason why I call sundew's approach "function-based testing". With this format we enforce a very important concept:

Every test has a strict relationship with a single function from your code.

Now, we all know tests are supposed to be written this way anyway. The single responsibility principle applies to tests as much as, if not more than, it does to the code we write. However, unlike other testing frameworks, sundew strictly enforces this relationship. This is where sundew starts to feel opinionated, maybe even a bit... dogmatic 👀. Hang in there though, this strict relationship is what unlocks the really cool stuff soon.

Once we tell sundew what we're testing we need to provide context such as:

  • "What arguments do I want to test this function with?"

  • "What should the function return?"

  • "What setup do I need to do before I call the function?"

  • and "What side effects do I expect to happen when the function is called?".

Most of the arguments above are optional, and simple tests may only use the kwargs and returns parameters. You can check out the documentation if you want a deeper dive into how to use all the arguments sundew supports for defining tests.
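
To make that concrete, here's a minimal sketch of what such a simple test could look like using only kwargs and returns (add is a hypothetical function standing in for one of your own):

from sundew import test
from my_module import add  # hypothetical function under test

# Call add(a=1, b=2) and assert that it returns 3.
test(add)(
    kwargs={"a": 1, "b": 2},
    returns=3,
)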

Alright, so we've got ourselves an opinionated, tightly-coupled-to-a-single-function test that feels less expressive than our usual "write whatever function you want and we'll check if it runs" framework. As we'll soon see, this trade-off becomes worth it because it enables one simple, but powerful, fact about tests written with sundew:

Our tests know what they are testing

Let's look at why that's such a powerful notion...

Fail Fast!

One thing I can't stand when working with large, slow test suites is running the suite and not seeing a failure until I've waited through minutes of successful tests. The feedback loop is too slow, and my flow is broken. If my test suite has a failure, I want to know about it as soon as possible. I want it to fail fast.

Now, while sundew can't guarantee that the very last test you run won't be the broken one, it can do some pretty smart ordering of your tests to reduce that likelihood. Because sundew knows exactly what function each test is running, it can also inspect those functions and see the sub-functions they call. And if those sub-functions have tests (and we hope they do, but we'll come back to that) then sundew can build a dependency graph before it runs a single test. Then we can take that dependency graph and apply Kahn's algorithm to topologically sort it. This means our tests are ordered from the functions with the fewest sub-function dependencies (i.e. zero) to the functions with the most.
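
To illustrate the idea, here's a simplified sketch of Kahn's algorithm applied to a tiny dependency graph (this is just the general technique, not sundew's internals):

from collections import defaultdict, deque

def topological_order(dependencies):
    # dependencies maps each function name to the set of sub-functions it calls.
    in_degree = {fn: len(deps) for fn, deps in dependencies.items()}
    callers = defaultdict(set)
    for fn, deps in dependencies.items():
        for dep in deps:
            callers[dep].add(fn)

    # Start with functions that call no other tested functions.
    queue = deque(fn for fn, degree in in_degree.items() if degree == 0)
    order = []
    while queue:
        fn = queue.popleft()
        order.append(fn)
        for caller in callers[fn]:
            in_degree[caller] -= 1
            if in_degree[caller] == 0:
                queue.append(caller)
    return order

# function_b has no dependencies, so its test runs before function_a's.
print(topological_order({"function_a": {"function_b"}, "function_b": set()}))
# -> ['function_b', 'function_a']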

How does this lead to test suites that are likely to fail faster? Here are two examples that demonstrate:

  1. If function_a calls function_b, then function_a is guaranteed to take at least as long as function_b. By topologically sorting our dependency graph we have approximately ordered our tests from fastest to slowest. This ordering is not guaranteed (no one says a function without any dependencies can't be extremely slow), but it's approximately the case for any reasonably complex codebase.

  2. If function_a calls function_b and function_a has a bug, there's a good chance the bug actually originates in function_b. So once again, while this approach offers no guarantees, testing the functions that others depend on first increases our likelihood of finding bugs faster. For a test suite of sufficient quality, the results can also point to where your bugs are: if function_a is passing its tests but function_b isn't, that tells you a bit about where to look for your issue.

Failing fast is cool, and it's certainly an improvement to the quality of life for larger test suites, but this is just the beginning of the functionality we've unlocked.

Detect Missing Tests

As we discussed in the last section, we now know what functions have tests and any sub-functions that those functions call. While we can use tests for those sub-functions to build our dependency graph, we can also detect and report when those sub-functions don't have tests! This is a great, high-level step toward assessing your test suite's code coverage. While it's often not worth it to ensure you've "tested" every line of code, I'd assert it's almost always beneficial to make sure you're at least testing every function of your code.
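
As a rough illustration of how a tool can discover which sub-functions a function calls, here's a sketch using Python's ast and inspect modules (this is the general idea, not necessarily how sundew implements it; helper and function_a are made-up examples):

import ast
import inspect
import textwrap

def helper(x):
    return x * 2

def function_a(x):
    return helper(x) + 1

def called_function_names(func):
    # Parse func's source and collect the names of functions it calls directly.
    tree = ast.parse(textwrap.dedent(inspect.getsource(func)))
    return {
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }

tested = {"function_a"}  # imagine only function_a has a sundew test registered
print(called_function_names(function_a) - tested)  # {'helper'} -> a sub-function with no test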

"But wait, that sounds like a lot of work. I already wrote good tests for a top-level function; now I have to write tests for all the sub-functions it depends on?", I hear you say. Don't worry we've saved the best for last because with sundew you don't! Because sundew can:

Automatically Write Regression Tests (For Free!)

So at this point, sundew knows:

  1. Which functions have tests

  2. What sub-functions are called by those functions

  3. Which of those sub-functions don't have tests

If we combine all this knowledge we can do something really cool. When sundew runs the tests you have written, as long as they pass, it can patch the sub-functions without tests and automatically write regression tests for them. If we know function_a passed its test, and while we ran that test function_a called function_b with the arguments (a=123, b="456") and function_b returned "123456", then we've got all we need to automatically write a basic sundew test for function_b. This happens recursively for all sub-function calls.
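
The core trick is simple enough to sketch in a few lines: wrap the untested sub-function so its inputs and output are recorded while the parent function's test runs (again, this is just the general idea, not sundew's implementation; function_a and function_b are stand-ins):

import functools

recorded_calls = []

def record_calls(func):
    # Wrap an untested sub-function so its inputs and output are captured.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        recorded_calls.append((func.__name__, args, kwargs, result))
        return result
    return wrapper

def function_b(a, b):
    return f"{a}{b}"

def function_a():
    return function_b(123, b="456")

# With function_b patched, running function_a's "test" captures everything
# needed to generate a basic regression test for function_b.
function_b = record_calls(function_b)
function_a()
print(recorded_calls)  # [('function_b', (123,), {'b': '456'}, '123456')]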

What's even better is we know:

  1. The test for function_a is going to run the same function with the same arguments as the automatic test we just wrote for function_b

  2. Sundew automatically orders our tests by function dependencies

This allows us to cache the result of function_b's call, so we don't waste time running the same function with the same arguments twice in our test suite. This means the run time for your test suite does not increase at all, even though its coverage does with the newly added tests 🎉.
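
A minimal sketch of that kind of caching, keyed on the function and its arguments (an illustration of the idea rather than sundew's actual code):

import functools

_result_cache = {}

def reuse_recorded_result(func):
    # If we already observed func run with these arguments during a parent
    # function's test, return the recorded result instead of re-running it.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        key = (func.__name__, repr(args), repr(sorted(kwargs.items())))
        if key not in _result_cache:
            _result_cache[key] = func(*args, **kwargs)
        return _result_cache[key]
    return wrapper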

Now, these auto-written tests don't know what side effects to test, nor are we running property tests to check other possible inputs that could still fail, which is why I refer to them strictly as "regression tests". They take the effort and cost out of writing tests that ensure your future changes don't break what's working today. I believe there is room to make these tests even more effective in the future; after all, this is only v0.1.0!

Intangible Wins

The sections above are the standout features we get from the way sundew models tests and enforces the strict relationship with your functions. However, I've also found some intangible wins that come from this structure that I think are worth considering:

  • The Single Responsibility Principle is a well-established pattern for designing your codebase, and sundew inherently promotes this principle in your code as well as requiring it for your tests.

  • Isolation is easier for sundew to control because we know what we're isolating. I haven't eliminated the need for patches to isolate side effects, but sundew does automatically isolate function inputs and outputs, which gives you less to worry about.

  • Tests don't silently fail. A forgotten pass statement, a coroutine run without an await, or even just a mistyped assertion can cause tests to pass when they shouldn't. With sundew, the relative rigidity in how we define tests avoids many of these possibilities (see the sketch below for an example).
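
To make that last point concrete, here's a classic silent pass in a hand-rolled, pytest-style test (not a sundew example):

async def fetch_value():
    return 42

def test_fetch_value():
    # Missing an await / asyncio.run: fetch_value() returns a coroutine object,
    # which is truthy, so this assertion always passes without testing anything.
    assert fetch_value()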

What's Next?

Sundew is very new. I expect we're going to find edge cases and issues as early adopters try it out in the wild. My priority is tackling those issues and improving the reliability of the core functionality I've built so far. Once things mature and stabilize, though, I think we've just scratched the surface of what sundew unlocks, and there are some very cool features in its future, like:

  • Detecting which functions had code changes and only re-running the relevant tests, rather than running the whole test suite every time. This would be similar to pytest-testmon (an amazing plugin if you use pytest today 👍), but I believe sundew can detect changes to functions and their dependencies more easily, without relying on coverage.py.

  • I believe mutation testing is ready to have its time to shine in the Python world but is held back by the huge computational cost of testing mutations across your code. Sundew's function-based testing method could reduce mutation testing to just the relevant function tests, which might lower that barrier enough to give us a better coverage metric than "lines of code run".

  • I think there is more we can do with auto-generated tests, and I'd love to continue exploring making them more robust than just regression tests.

  • I don't know yet how property-based testing libraries like hypothesis fit into sundew, but I'm eager to find out!

If you're excited about what I introduced in this blog post, or the features I've outlined above, I'd love it if you gave sundew a try! It's early days, and while I've done my best to ensure a smooth launch, if you find a new edge case just drop an issue and I'll address it as soon as I can. If you're interested but it's too soon for you to take the plunge, you can subscribe to this blog for future updates as the project matures and improves. Otherwise, if you have any thoughts, questions, or ideas for sundew, feel free to drop a comment! Thanks for checking out the introductory post for sundew; I'm excited to see where this thing goes.

Happy Testing!
