In recent weeks I have been reading a little about cognitive biases and human fallibility. Earlier this week I stumbled upon the work of clinical psychologist Paul E. Meehl.
In his 1954 work on expert intuition titled Clinical versus Statistical Prediction: A Theoretical Analysis and a Review of the Evidence, Meehl reviewed the results of several clinical studies and concluded that in many low-validity environments, experts are less accurate at predicting outcomes than simple statistical analysis. As an example, one of the studies examined the accuracy of predictions of college students’ honor points (i.e. their grades). The comparison was between the predictions of student counsellors, who had access to an array of information on the students, and a simple calculation based on two variables (school rank and aptitude test score). The statistical calculation came out ahead of the clinical assessments. The difference was in fact not statistically significant, but either way, it is striking that the intuition of experienced “experts” with access to lots of data was no better than a simple calculation based on just two of the available pieces of data.
On first reading, these findings are both surprising and disconcerting. Is Meehl’s report actually suggesting that someone with vast academic achievements and a career’s worth of industry experience will be no more accurate (or even less accurate) at making predictions than a few numbers fed into a simple algorithm?
My brain kicked into gear and I began to imagine the implications this 60-year-old study might have for the profession of software testing.
Surely the development of any non-trivial software application could be considered a low-validity environment, simply due to the lack of time available and the impossibility of exhaustive testing. Given the uncertainty about how production-ready an application may be, perhaps the opinions of a software tester are less accurate than an analysis of a few available statistics (e.g. bug rates, unit test coverage, previous release failure rates, etc.).
The black mist was descending over me. “My career is over”, I thought. Previously I had thought that my 10+ years of experience in software development and testing enabled me to provide “professional assessments” of software quality, but now I was reading (or at least I thought I was) that all of my knowledge and skills were redundant and would soon be surpassed by some kind of algorithm.
Thankfully my mind did not stop there. As I delved further I soon discovered the error in my thinking.
Obviously the profession of software testing was not dead. In fact, on further contemplation I realised that Meehl’s findings actually pointed to the exact opposite. Of course my intuitive judgement of the quality of a software application was unreliable. But intuition is not the currency of software testing – no – we software testers are not mystics, we’re scientists. We don’t make predictions based on our gut feelings; we make critical assessments based on observations from laboratory-style experiments.
What the study has taught me, however, is that as software testers we must tread a fine line between intuitive assessments and objective measurements. By this I mean that as experts in our field we are able to look at complex situations and make instinctive decisions about software quality based on internal heuristics. This can be helpful when we are asked to make quick decisions in high-pressure situations, but we must not let ourselves become complacent.
We must not lose sight of the fact that our role is the pursuit of information. We must continue to find new ways to measure the systems we are testing, and to collect richer and more diverse data to analyse, in order to provide objective and statistically robust predictions about their future performance. We can then use our social competence to combine the statistics with our knowledge and experience to provide detailed assessments of software quality. This approach is not only in the best interests of the customers we serve, but also helps to protect our integrity and credibility as software testing professionals.