Failures of the scientific method and the waterfall method (again)

I originally discussed the waterfall method in a previous blog post. Pranav just made a comment that I feel needs some further discussion:

I disagree with the following statement: “This is how science (unfortunately) often works – researchers just cite something, because everyone else does so as well, and don’t really read the publications that they refer to. So eventually an often cited claim becomes “fact”.”

I would like to say that this is generally not how science works. Claims(predictions) made by any scientific theory is verified by multiple laboratories/people over a period of time and they are always falsifiable.

The problem with software development is that it is not based on a scientific theory yet (I am not talking about computer science here, only software development). It has been said many times that it is more of an art than science.

Problems with science

Pranav, thanks for your comment. Of course you are right in that science normally should work by verifying findings, and falsifying those that are found to not hold up to evidence. However, my experience as a researcher has shown that this is unfortunately not often what happens. I’ll present a few examples.

First, the convention of citing another article. If you follow the rules strictly, you should only cite someone when you are referring to their empirical findings, or to the conclusions they draw from that. Nothing else. Nothing. But, unfortunately, most scientific articles do not follow these rules, but liberally make citations like “X used a similar research design (X, 2006)”. Wrong! While X may very well have done so, you’re not supposed to cite that. You only cite empirical results and conclusions thereof. But this is what happens. Because citation practice has degenerated across all fields, it is quite difficult to see the quality of a citation – is the claim (that is backed by a citation) based on empirical evidence or is it just some fluff? It has become too easy to build “castles in the sky” – researchers citing each other and building up a common understanding, but with no empirical evidence in what everyone is citing to. This is what has happened to the Waterfall model – researchers in the whole field have cited to Royce (1970) because that’s what everyone else has done. And the citation isn’t valid – there’s no empirical evidence to back the claim that a linear process works, and actually there’s not even a fluffy “i think so” claim to that effect, but rather Royce considers the idea unworkable. This hasn’t stopped hundreds of researchers from citing Royce. I consider this an excellent example of the failure of the scientific method. Of course the method in theory is still solid, but the degenerate practices have made it prone to failure.

In my field of educational technology I see this only too often. Cloud castles being built as researchers cite each other in a merry-go-round, with no-one realizing that whoever started the idea did not in fact have empirical evidence for it, but just a fluffy idea. And shooting down these established cloud castles is not easy, because the whole scientific publishing industry is also skewed:

  • You usually can’t get an article published unless to cite the editors and reviewers of the journal, and support their findings. Therefore contraversial results are hard to publish in the journal where they would be most useful.
  • Researchers often do not publish zero-findings. That is, when they try to find a certain pattern, and they fail, then they just scrap the whole thing and move on. But the scientific method actually needs also these zero-findings, because there are many fluffy theories that may occasionally get statistically significant empirical evidence (because with a 95% margin of confidence there’s 5% of studies that will find a connection even if there is none), but cannot be easily proved wrong. Therefore the numerous studies that fail to find a connection would show that the theory isn’t very strong. But these almost never get published.

And let’s make it clear that this is not a problem of “soft science” alone. Let’s take for example statistical science. Those who do statistics will know Cronbach’s alpha as a reliability estimator. It’s the most commonly used estimator, being in wide use for decades. Unfortunately, it doesn’t work for multidimensional data, which most data actually is. It’s still being used for that, because “That’s what everybody is using”. Here’s the article where professor Tarkkonen proves that Cronback’s alpha is invalid for multidimensional data, and in fact most data (pdf). You’ll notice that it is not published in a peer-reviewed article. I’m told it was submitted years ago to a journal, and the editor replied something to this effect:

“Look, your article is sound and I cannot find any fault within it, but we can’t publish this. I mean we’ve been using Cronback’s alpha for decades. Publishing this would make us all look silly.”

Waterfall and software engineering

OK, back to the waterfall method and software engineering. I like Pranav’s comment that making software is more of an art than science. And I agree. Creating new software is not like producing another car off the assembly line, it’s like designing a new car. Creating copies in bits is easy and cheap, since the engineering of computers is good enough. But making software is more of an art, and a design process.

In fact I have a problem with the term “software engineering”. Because software isn’t engineered, it’s envisioned, explored, protyped, tried out, iterated, redone, and designed. Researchers and managers have been trying to get software engineering to work for several decades now, and what are the outcomes? Can we build software as reliably as we can build a new bridge? No. But, if the bridge builder was given for each new bridge requirements like “use rubber as the construction material”, changing the requirements, tools and crew each time, maybe building a bridge would not be so predictable either.

But this doesn’t mean that software development can’t be a scientific method. If we can do empirical data gathering (like metrics of source code, and budget/schedule overruns), and can find patterns there, then there is a science. I mean there’s science in arts as well – compatible colour combinations, the golden ratio and other rules (of thumb) are based on experience, and currently many of them have proper studies to back them up, not just folklore.

So also software development can be studied scientifically, although much of it is unpredictable and challenging. My understanding is that most of the studies published in the last decade quite uniformly find that practices such as iterative life cycle models, lean management, intuitive expert estimation instead of automated model estimation, and common sense, actually are well correlated with successful software development projects. So there are some rules of thumb with real empirical backing.

Even the waterfall method has empirical studies to show that it is a really horrible way to do software except in some extremely specific circumstances, which almost never coincide in a single project. If they would, it would be the most boring project in the world. I would not want to have any part it it. Remember, NASA flew to the moon in the 1960s with iteratively developed software and hardware. And I’d bet the software guidance system for the Patriot missiles (which never hit anything in the 1990s) were based on a linear model.

About Tarmo Toikkanen

Design-researcher, entrepreneur, author. Psychology of learning, engagement design, educational technology, prototyping, participatory design, copyright and Creative Commons.


  1. “If you follow the rules strictly, you should only cite someone when you are referring to their empirical findings, or to the conclusions they draw from that. Nothing else. Nothing.”

    You shouldn’t cite a mathematical proof?

  2. Juha says:

    But, unfortunately, most scientific articles do not follow these rules, but liberally make citations like “X used a similar research design (X, 2006)”. Wrong! While X may very well have done so, you’re not supposed to cite that.

    I don’t understand. It is an important piece of information that the writer is avare of previous experiances of other scientists. How could he say it without a cite?

    Describing connections makes it possible to combine different studies. In this example we could get better evidence of the suitability of different research designs.

  3. Good comments, 123 and Juha. Mathematics is of course a bit different in that it is purely theoretical. So maybe the word “empirical” is a bit wrong there, or misleading. The point is that you should only cite evidence about something. In mathematics it can be a theoretical proof, in most other sciences it is empirical findings.

    Juha has a good point about referring to previous studies, which of course is needed. My understanding is that citations are not the correct method for doing that. You can of course just make a reference in plain text, as in “X used a similar research design in his paper YYY”, but not make it an official sitation. If X used a novel design and actually had empirical evidence that it worked, then you could do a real citation, as in “X demonstrated the viability of his research design approach (X, 2006)”. But that’s a different thing.

    If we get really strict, the wording of citations should actually differentiate the cases when you cite someone’s empirical findings (eg. “X showed that…”) and someone’s conclusions (eg. “X claims that…”). Because the empirical findings (or theoretical proofs) are more objective than the conclusions drawn thereof.

    But this debate is a bit academic (sorry for the pun) because the practice of citations has changed in these decades. It may be that there is a valid need to cite others on more than just the evidence they’ve presented (like the method they used), but for the sake of keeping science scientific, it would help if we had a system of classifying citations so we could easily and without error see which citations cite evidence, which cite conclusions, and which just site something else. Because these citations have different strengths and when readers mix them up, science becomes gossip, hearsay, cloud castles, and story telling.

Speak Your Mind