Automating Integration Testing

Hello everyone,

I would like to know how you approach integration testing of your plugins. Does it always come down to designing isolated unit tests that assert whether a function’s output matches the expected result? Or are there better ways of evaluating how the whole system performs?

Let’s imagine our plugin generates urban layouts based on an input curve and simulates daylighting conditions of the resulting geometry. We could break the process down into a few steps:

  • Subdivide the input curve
  • Generate geometry
  • Run daylight simulation

Now, I would like to perform all these steps sequentially with various permutations of data sets:

  • concave/convex curves; planar/non-planar curves; some vertices below/above 0 elevation etc.
  • various rules for geometry generation (only high-rise; only residential; high/low density; mixed use etc.)
  • run the simulation with a few curated weather files including purposely introduced errors in the data set

This could allow me to spot edge-cases uncaught by individual unit tests. It might be that the curve subdivision step always succeeds, but it results in very small cells, which trip my geometry generation logic. Or the resulting building geometry can’t be handled by the simulation module.
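
To make it concrete, a parametrized test along these lines is roughly what I have in mind. This is just a pytest sketch: the pipeline functions (`subdivide_curve`, `generate_geometry`, `run_daylight_simulation`), the fixture names and the asserted properties are placeholders for my actual plugin code.

```python
# Sketch only: pipeline functions, fixtures and assertions are placeholders for the real plugin code.
import itertools
import pytest

from myplugin import subdivide_curve, generate_geometry, run_daylight_simulation  # hypothetical API

CURVE_FIXTURES = ["concave_planar", "convex_nonplanar", "below_zero_elevation"]
GENERATION_RULES = ["high_rise_only", "residential_only", "high_density_mixed_use"]
WEATHER_FILES = ["tested_city.epw", "corrupted_header.epw"]


@pytest.mark.parametrize(
    "curve_name, rule_name, weather_file",
    itertools.product(CURVE_FIXTURES, GENERATION_RULES, WEATHER_FILES),
)
def test_pipeline_end_to_end(curve_name, rule_name, weather_file, load_curve, load_rules):
    # Run the three steps in sequence, exactly as a user would.
    cells = subdivide_curve(load_curve(curve_name))
    assert all(cell.area > 0 for cell in cells), "subdivision produced degenerate cells"

    buildings = generate_geometry(cells, load_rules(rule_name))
    assert buildings, "no geometry generated for a valid rule set"

    results = run_daylight_simulation(buildings, weather_file)
    assert results.is_valid, "simulation rejected the generated geometry"
```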

Currently I am performing these steps manually but would love to hear about your experience with automating similar integration tests and how best to approach it.

Thanks!

Hi,

I’m not currently developing a GH plugin, and back when I was, I had almost no experience with this topic. My plugin was quite buggy. But that was okay, because I had no desire to spend most of my free time on charity programming…
Nowadays, I would approach plugin development differently. I think testing is mandatory, but it can also slow development down a lot. In other words, it’s always important to limit the effort to the truly important things.

I would claim that integration testing is a bit more useful, because it gives you more freedom to refactor code, compared to unit testing with high code coverage. I think anything about testing boils down to the structure of your code. Decoupling code with tight interfaces allows you to test the interfaces, treating anything inside as a black box.
In general, the key is always to limit the input and output of an isolated system and test it as well as possible. So integration testing is not about testing the whole chain; it’s more about testing the interaction of some of your software components, based on a realistic use case. Some speak of behavioural testing because of this.
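
Just to illustrate what I mean by a tight interface treated as a black box - a rough Python sketch, where the interface, the method name and the “empty scene scores zero” rule are all invented:

```python
# Rough sketch: interface, method name and the "empty scene scores zero" rule are invented.
from typing import Protocol, Sequence


class DaylightSimulator(Protocol):
    """The only surface the rest of the plugin is allowed to touch."""

    def simulate(self, meshes: Sequence[object], weather_file: str) -> float:
        """Return an average daylight factor for the given geometry."""
        ...


def check_simulator_contract(simulator: DaylightSimulator) -> None:
    """Black-box check: anything behind the interface must obey these rules."""
    assert simulator.simulate(meshes=[], weather_file="any.epw") == 0.0
```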

So do you need to reproduce what you do manually? No, you don’t need to for an integration test. You can work with raw data, even using a unit testing framework.
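
For example, the “raw data” can simply be control points stored in a JSON file that a normal unit testing framework loads and pushes through the component chain, instead of you clicking through Grasshopper. A sketch, where the file name and the plugin functions are placeholders:

```python
# Sketch: push stored raw data through the component chain instead of reproducing manual steps.
import json

from myplugin import subdivide_curve, generate_geometry  # hypothetical plugin API


def test_generation_from_raw_points():
    # "fixtures/site_curve_01.json" is a placeholder for a stored test fixture.
    with open("fixtures/site_curve_01.json") as f:
        control_points = json.load(f)  # e.g. [[x, y, z], ...]

    cells = subdivide_curve(control_points)
    buildings = generate_geometry(cells, rules={})
    assert buildings
```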

Apart from that, if you worry that the subdivision of a curve can be too fine or too coarse, it doesn’t make sense to invest time in testing for it; just limit the parameters of your system. The same goes for all those null exceptions: guard against null states, and you don’t have to test for them all the time.
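
In code that can be as trivial as this (names invented): clamp or reject the parameter once at the boundary, and then none of the downstream components needs a test for it.

```python
# Sketch (names invented): validate once at the system boundary,
# so downstream code never sees a bad value and doesn't need a test for it.
def validated_subdivision_count(count, minimum=2, maximum=500):
    if count is None:
        raise ValueError("subdivision count must not be None")
    return max(minimum, min(maximum, int(count)))
```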

Also, there are bugs and there are bugs. Some are okay to fix later, based on an error report backed by solid logging. Others are showstoppers and should never occur.

So you see, I believe testing is a tradeoff: you trade feature development time against reliability. A bug-free piece of software might be a great piece of software, but it’s likely a feature-less one. Sure, many claim that writing tests saves you time in the long run, but that is an idealistic view, because software development can also slow down due to bad code structure or bad ideas, and that happens far more often, no matter whether your test coverage is exceptionally high or not.

Thanks for a very elaborate answer @TomTom!

So far I’ve mainly been developing plugins with one or two other contributors, where I had an overview of the entire code base. It was possible to keep track of all commits and anticipate which parts of the logic could be indirectly affected. This made manual testing procedures painful but possible.

Now we’re gradually expanding the team and looking into ways of parallelizing development without compromising quality or introducing regressions. Intuitively, the idea of modularizing everything into components with known, stable interfaces makes a lot of sense. When a module gets refactored, its interfaces need to be verified for compatibility and - at least in theory - we should be good to go.
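
Concretely, I imagine something like a shared contract test that every implementation of a module interface has to pass, so a refactored module can be swapped in once it satisfies the same suite. A rough sketch with invented names:

```python
# Rough sketch: one contract suite, run against both the old and the refactored module.
import pytest

from myplugin.generation import LegacyGenerator, RefactoredGenerator  # hypothetical modules


@pytest.fixture(params=[LegacyGenerator, RefactoredGenerator])
def generator(request):
    return request.param()


def test_generator_respects_density_limit(generator, make_test_cells):
    # make_test_cells is a placeholder fixture returning a small set of valid cells.
    layout = generator.generate(cells=make_test_cells(), max_density=0.5)
    assert layout.density <= 0.5
```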

Thanks for the pointer on limiting the parameters of the system to prevent exceptions. Seems obvious in hindsight, but I see this discussion as a way of structuring information in my head and reading your thoughts definitely helps me better understand the concept.

Somehow it feels like unit tests safeguard us from the known bugs. I guess my question is about trying to find the unknown ones. Following the idea that the whole is more than the sum of its parts, there will always be bugs we had not anticipated. User reports are one way of finding these. Any other - preferably automated - ideas?

Now that I type it out, I’m picturing a generative framework, similar to Galapagos, that stress-tests the plugin to identify its limitations. Does anyone know of similar concepts in software development?
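
Something in that spirit could look like the sketch below: generate random but bounded inputs and only check broad invariants such as “never throws” and “every cell has a positive area”. The plugin function and the value ranges are invented; I’m using the hypothesis library here only as an example of such a generative test runner.

```python
# Sketch: randomized stress test that only checks broad invariants, not exact outputs.
from hypothesis import given, strategies as st

from myplugin import subdivide_curve  # hypothetical plugin function

# Random vertices as (x, y, z) tuples; the ranges are arbitrary choices for illustration.
points = st.tuples(
    st.floats(-1000, 1000, allow_nan=False),
    st.floats(-1000, 1000, allow_nan=False),
    st.floats(-50, 50, allow_nan=False),
)


@given(st.lists(points, min_size=3, max_size=50))
def test_subdivision_never_produces_degenerate_cells(vertices):
    cells = subdivide_curve(vertices)
    assert all(cell.area > 0 for cell in cells)
```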

Maybe one more thing about interfaces. Ideally, interfaces consist only of basic value types and simple collections. If you need to pass in complex objects that reference other complex objects, then your interface might look slim, but it is not. Basically, you are just dumping many more parameters into objects, which defeats the purpose of creating interfaces. So if you manage to keep a truly slim interface, there is not much you need to test.
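
To put the same thing in code (an invented example): the first signature looks slim but drags a whole object graph along; the second is genuinely slim because it only takes plain values.

```python
# Invented example: a "slim-looking" interface versus a genuinely slim one.

def generate_layout(project):  # project -> site -> context -> settings -> ...
    ...

# versus

def generate_layout_slim(vertices: list[tuple[float, float, float]],
                         target_density: float,
                         max_height: float):
    ...
```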

So what you are referring to as stress testing is testing basic parameters at extreme values and just seeing whether an exception is thrown or whether the output still makes sense. But does it even make sense to automate that?

It is not so much about how you test; it’s about having testable code, with predictable data input and output. If your software component is an “oven” and you put in meat and turn it on at 220 °C for 300 minutes, you expect it to output roasted meat, no matter how complex the oven is or how well the meat is actually roasted. You can even (“stress”) test what happens if you try to cook at positive infinity °C. Ideally, any data input should have boundaries (e.g. no NaN allowed, only values within a domain of 50 °C - 300 °C), but the only thing you can test is that you get an appropriate exception/response.
It’s not your task to test at which parameters a perfect chicken is made. Users might use the oven to bake bread…

Simple interfaces, clear user stories and on-the-point testing are the key in my opinion. Stressing a system with nonsense values is something you can do, but it’s better to actually validate any input in your code and unit-test that validation…
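
In other words, something like this sketch (reusing the oven domain from above; everything else is invented): the validation lives in the production code, and the unit test only confirms that out-of-range or nonsense input is rejected.

```python
# Sketch: validation lives in production code; the test only covers the validation.
import math
import pytest


def validated_temperature(celsius: float) -> float:
    if math.isnan(celsius) or not (50.0 <= celsius <= 300.0):
        raise ValueError("temperature must lie within 50 °C - 300 °C")
    return celsius


@pytest.mark.parametrize("bad_value", [float("nan"), float("inf"), -20.0, 10_000.0])
def test_out_of_range_temperature_is_rejected(bad_value):
    with pytest.raises(ValueError):
        validated_temperature(bad_value)
```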

If you can build your integration tests as GH files, you can use Tenrec to run those tests as Visual Studio tests.
