What’s in a Test?
Here at spriteCloud, we offer our testing services and expertise to our clients. In our day-to-day conversations with them, we don’t draw many distinctions between one type of testing and another, but this uniform treatment belies the fact that some of our activities try to achieve fundamentally different things. In this post, we want to explore that a little, by examining the characteristics of testing activities.
Let’s first look at the steps all of our testing activities have in common, before turning to the differences:
- The client provides us with the software we’re testing, most likely installed on a test environment.
- We exercise the software’s functions, taking notes on how it behaves.
- We report something about the software’s behaviour back to our client.
The difference lies in what we report back:
- For functional testing, we report back whether the software behaves correctly.
- For load testing, we likewise report back whether the software behaves correctly, in this case under a required amount of load.
- For stress testing, we report back how well the software behaves as load increases.
The fundamental difference between these reports, then, is whether we can judge for ourselves that the software behaves correctly, or whether we must leave that judgement to the client.
With that in mind, let’s turn to Wikipedia’s definition of an experiment and compare it with our testing activities:
An experiment is an orderly procedure carried out with the goal of verifying, refuting, or establishing the validity of a hypothesis.
Our testing activities certainly are orderly procedures, and they all share the goal of establishing the software’s behaviour. But what’s this bit about a hypothesis? How does that fit in?
Well, if our hypothesis is phrased as “does the software behave according to acceptance criteria?”, then we can also claim that our testing activities have the goal of verifying, refuting or establishing the validity of that hypothesis.
The crux lies in the presence or absence of acceptance criteria.
Wikipedia’s article on experiments recognizes this, and includes the following passage, which distinguishes controlled experiments from other kinds of experiments and from other methods of gaining insight into a topic:
A controlled experiment often compares the results obtained from experimental samples against control samples.
There we have it. Control samples and acceptance criteria are different names for the same thing in different settings: a reference against which observed results are compared. If we have acceptance criteria, we can turn any testing activity into a controlled experiment.
What happens in the absence of acceptance criteria?
From a particularly strict perspective, an absence of acceptance criteria implies that we cannot actually establish the validity of a hypothesis, because the hypothesis is too vaguely phrased for that. The best we can do is gather data, which will hopefully let us inch closer to refining our hypothesis by setting some acceptance criteria.
In science, the term for that is an observational study, and of the various types of observational study, the cross-sectional study comes closest to our testing activities. To quote Wikipedia again:
A cross-sectional study (also known as a cross-sectional analysis, transversal study, prevalence study) is a type of observational study that involves the analysis of data collected from a population, or a representative subset, at one specific point in time—that is, cross-sectional data.
For our purposes, we can simplify the definitions to the following:
- In an observational study, we exercise software to gather data about its behaviour.
- In an experiment, we conduct an observational study and additionally compare the gathered data to acceptance criteria in order to establish whether the software works as intended.
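To make the two definitions concrete, here is a minimal Python sketch. The `Observation` record, its fields, and the functions `observational_study`, `experiment` and `fake_run` are hypothetical names chosen for illustration, and "exercising the software" is reduced to a callable that returns the gathered data; a real test harness would look quite different.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Observation:
    """Hypothetical record of data gathered while exercising the software."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        # Fraction of requests that failed; 0.0 if nothing was sent.
        return self.errors / self.requests if self.requests else 0.0


def observational_study(exercise: Callable[[], Observation]) -> Observation:
    """Exercise the software and return the gathered data, without judging it."""
    return exercise()


def experiment(exercise: Callable[[], Observation],
               acceptance_criteria: Callable[[Observation], bool]) -> bool:
    """Conduct an observational study, then compare its data to acceptance criteria."""
    data = observational_study(exercise)
    return acceptance_criteria(data)


# Hypothetical stand-in for actually running the software under test.
def fake_run() -> Observation:
    return Observation(requests=1000, errors=3)


print(observational_study(fake_run))  # data only: Observation(requests=1000, errors=3)
print(experiment(fake_run, lambda o: o.error_rate < 0.01))  # verdict: True
```

The only difference between the two functions is the final comparison: the study hands back raw data for someone else to interpret, while the experiment hands back a verdict.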
All of our testing activities fall into one category or the other. The most vivid example of this is the difference between a load test and a stress test; in both cases, you exercise the software, paying less attention to deep functionality than to error occurrence rates.
The crucial difference between the two types of test?
- A load test verifies that error occurrence rates are below a threshold (acceptance criteria) for a required amount of load (acceptance criteria), making it an experiment.
- A stress test establishes the amount of load at which error occurrence rates exceed a threshold (acceptance criteria), making it an observational study.
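The contrast can be sketched in a few lines of Python, assuming (purely for illustration) that the system under test can be reduced to a hypothetical function mapping a load level to the error rate observed at that load. The names `load_test`, `stress_test`, `simulated_system` and the stepping parameters are assumptions, not a real tool’s API.

```python
from typing import Callable, Optional

# Hypothetical model of the system under test: maps a load level
# (for example, concurrent users) to the error rate observed at that load.
ErrorRateAt = Callable[[int], float]


def load_test(error_rate_at: ErrorRateAt, required_load: int,
              max_error_rate: float) -> bool:
    """Experiment: the required load and the threshold are both acceptance
    criteria, so the outcome is a pass/fail verdict."""
    return error_rate_at(required_load) < max_error_rate


def stress_test(error_rate_at: ErrorRateAt, max_error_rate: float,
                start: int = 100, step: int = 100,
                limit: int = 10_000) -> Optional[int]:
    """Observational study: only the threshold is given, so the outcome is a
    measurement, i.e. the first tested load at which the error rate exceeds it."""
    load = start
    while load <= limit:
        if error_rate_at(load) > max_error_rate:
            return load
        load += step
    return None  # threshold not exceeded anywhere in the tested range


# Hypothetical system that starts failing once load goes above 500.
def simulated_system(load: int) -> float:
    return max(0.0, (load - 500) / 1000)


print(load_test(simulated_system, required_load=400, max_error_rate=0.05))  # True
print(stress_test(simulated_system, max_error_rate=0.05))  # 600
```

Note the asymmetry in return types: `load_test` returns a verdict the client can act on directly, whereas `stress_test` returns a number whose significance is left to the client to judge, unless further acceptance criteria are added. The measured load is also only as precise as `step`.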
The example not only illustrates the difference between the two testing activities, but also shows that the mere presence or absence of acceptance criteria is not quite what differentiates them. Rather, what matters is whether the acceptance criteria are complete enough to verify the hypothesis that the software works as intended.
It is important to point out that both kinds of testing activity are useful and have their place, but only the kind that fulfils the criteria of an experiment can assure software quality. It is therefore in clients’ interest that activities resembling observational studies be conducted rarely, and mainly to gather enough data from which acceptance criteria can be inferred.
Or, to rephrase it more simply: the purpose of all testing activities is to eventually reduce all questions about software to “does it work as intended?”.