H0, null hypothesis – challenge your preconceived notions

After listening to Göte Nyman’s TEDxHelsinki talk about the null hypothesis (H0), I just had to write down some thoughts it stirred up. First, a recap:

Göte presented examples from popular news media, such as “Coke and Pepsi activate different areas of the brain”. Well duh – if they are experienced differently, one might assume that the experience manifests itself as brain activity. So the null hypothesis behind the news item was that Coke and Pepsi affect the brain in the same way. The point is that looking at the assumptions behind such “revelations” tells you something about the naïve assumptions people generally hold about the topic. After all, it isn’t news unless it challenges current thinking. So each scandalous headline reveals something about what the editor assumes the audience thinks about the topic.

Failures of the scientific method and the waterfall method (again)

I originally discussed the waterfall method in a previous blog post. Pranav just made a comment that I feel needs some further discussion:

I disagree with the following statement: “This is how science (unfortunately) often works – researchers just cite something, because everyone else does so as well, and don’t really read the publications that they refer to. So eventually an often cited claim becomes “fact”.”

I would like to say that this is generally not how science works. Claims (predictions) made by any scientific theory are verified by multiple laboratories/people over a period of time, and they are always falsifiable.

The problem with software development is that it is not based on a scientific theory yet (I am not talking about computer science here, only software development). It has been said many times that it is more of an art than science.

Problems with science

Pranav, thanks for your comment. Of course you are right that science should normally work by verifying findings, and falsifying those that do not hold up to the evidence. However, my experience as a researcher has shown that this is unfortunately often not what happens. I’ll present a few examples.

First, the convention of citing another article. If you follow the rules strictly, you should only cite someone when you are referring to their empirical findings, or to the conclusions they draw from them. Nothing else. Nothing. But, unfortunately, most scientific articles do not follow these rules, and liberally make citations like “X used a similar research design (X, 2006)”. Wrong! While X may very well have done so, you’re not supposed to cite that – you only cite empirical results and the conclusions drawn from them. Yet this is what happens. Because citation practice has degenerated across all fields, it is quite difficult to judge the quality of a citation: is the claim (that is backed by a citation) based on empirical evidence, or is it just some fluff? It has become too easy to build “castles in the sky” – researchers citing each other and building up a common understanding, but with no empirical evidence behind what everyone is citing.

This is what has happened to the Waterfall model: researchers across the whole field have cited Royce (1970) because that’s what everyone else has done. And the citation isn’t valid – there is no empirical evidence to back the claim that a linear process works, and in fact there isn’t even a fluffy “I think so” claim to that effect; rather, Royce considers the idea unworkable. This hasn’t stopped hundreds of researchers from citing Royce. I consider this an excellent example of the failure of the scientific method. The method in theory is still solid, but the degenerate practices have made it prone to failure.

In my field of educational technology I see this all too often. Cloud castles are built as researchers cite each other in a merry-go-round, with no one realizing that whoever started the idea did not in fact have empirical evidence for it, just a fluffy notion. And shooting down these established cloud castles is not easy, because the whole scientific publishing industry is also skewed:

  • You usually can’t get an article published unless you cite the editors and reviewers of the journal, and support their findings. Therefore controversial results are hard to publish in the very journal where they would be most useful.
  • Researchers often do not publish zero-findings (null results). That is, when they try to find a certain pattern and fail, they just scrap the whole thing and move on. But the scientific method actually needs these zero-findings too, because there are many fluffy theories that will occasionally get statistically significant empirical support (at a 95% confidence level, about 5% of studies will find a connection even if there is none), yet cannot easily be proved wrong. The numerous studies that fail to find a connection would show that the theory isn’t very strong – but these almost never get published (see the sketch after this list).
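
The arithmetic in that second point is easy to check with a small simulation. Here is a minimal sketch of my own (not taken from any of the studies mentioned here): it runs many “studies” of an effect that does not exist and counts how many cross the conventional 5% significance threshold anyway.

```python
# Toy simulation (illustrative only): many studies test an effect that is
# actually zero, and we count how many come out "significant" by chance.
import random

random.seed(1)

def one_study(n=30):
    """Compare two groups drawn from the SAME distribution, i.e. no real effect."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / (n - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (n - 1)
    se = ((var_a + var_b) / n) ** 0.5
    z = (mean_a - mean_b) / se
    return abs(z) > 1.96          # "significant" at roughly the 5% level

studies = 1000
false_positives = sum(one_study() for _ in range(studies))
print(f"{false_positives} of {studies} studies 'found' an effect that does not exist")
# If only those ~5% get published and the rest are scrapped, the literature
# looks like confirmation of a theory that has no real support.
```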

And let’s make it clear that this is not a problem of “soft science” alone. Let’s take for example statistical science. Those who do statistics will know Cronbach’s alpha as a reliability estimator. It’s the most commonly used estimator, being in wide use for decades. Unfortunately, it doesn’t work for multidimensional data, which most data actually is. It’s still being used for that, because “That’s what everybody is using”. Here’s the article where professor Tarkkonen proves that Cronback’s alpha is invalid for multidimensional data, and in fact most data (pdf). You’ll notice that it is not published in a peer-reviewed article. I’m told it was submitted years ago to a journal, and the editor replied something to this effect:

“Look, your article is sound and I cannot find any fault in it, but we can’t publish this. I mean, we’ve been using Cronbach’s alpha for decades. Publishing this would make us all look silly.”
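
To make the multidimensionality problem concrete, here is a toy illustration of my own (it is not Tarkkonen’s proof, and the data are simulated): Cronbach’s alpha is computed for a scale whose items all measure one underlying factor, and for a scale whose items measure two independent factors almost without noise. The second scale is highly reliable in any meaningful sense, yet alpha penalizes it simply for being two-dimensional.

```python
# Toy illustration (not Tarkkonen's argument): Cronbach's alpha on
# one-dimensional vs. two-dimensional item sets, both nearly noise-free.
import random

random.seed(2)

def cronbach_alpha(items):
    """items: list of per-item score lists, all of equal length."""
    k, n = len(items), len(items[0])
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    totals = [sum(item[i] for item in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(it) for it in items) / var(totals))

n = 500
factor_a = [random.gauss(0, 1) for _ in range(n)]
factor_b = [random.gauss(0, 1) for _ in range(n)]   # independent of factor_a

def item(factor, noise=0.1):
    """An item that measures its factor almost perfectly."""
    return [f + random.gauss(0, noise) for f in factor]

one_dimensional = [item(factor_a) for _ in range(6)]
two_dimensional = [item(factor_a) for _ in range(3)] + [item(factor_b) for _ in range(3)]

print("alpha, one-dimensional scale:", round(cronbach_alpha(one_dimensional), 2))
print("alpha, two-dimensional scale:", round(cronbach_alpha(two_dimensional), 2))
# The second scale measures two things very reliably, yet alpha comes out
# noticeably lower - treating it as "the" reliability would be misleading.
```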

Waterfall and software engineering

OK, back to the waterfall method and software engineering. I like Pranav’s comment that making software is more of an art than science. And I agree. Creating new software is not like producing another car off the assembly line; it’s like designing a new car. Creating copies in bits is easy and cheap, since the engineering of computers is good enough for that. But making software is more of an art, and a design process.

In fact I have a problem with the term “software engineering”, because software isn’t engineered: it’s envisioned, explored, prototyped, tried out, iterated, redone, and designed. Researchers and managers have been trying to get software engineering to work for several decades now, and what are the outcomes? Can we build software as reliably as we can build a new bridge? No. But if bridge builders were given requirements like “use rubber as the construction material” for each new bridge, with the requirements, tools and crew changing every time, maybe building a bridge would not be so predictable either.

But this doesn’t mean that software development can’t be studied scientifically. If we can gather empirical data (like metrics of source code, and budget/schedule overruns) and find patterns there, then there is a science. There is science in the arts as well – compatible colour combinations, the golden ratio and other rules (of thumb) are based on experience, and by now many of them have proper studies to back them up, not just folklore.
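
As a sketch of what that pattern-hunting could look like in practice, here is a minimal example of my own; the project figures in it are invented purely for illustration, not real measurements.

```python
# Hypothetical example: correlate a source-code metric (average cyclomatic
# complexity) with schedule overruns across a handful of invented projects.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# One entry per (made-up) project.
avg_complexity = [4.1, 6.3, 9.8, 5.2, 12.4, 7.7, 3.5, 10.9]
schedule_overrun_pct = [5, 20, 65, 15, 90, 40, 0, 70]

r = pearson(avg_complexity, schedule_overrun_pct)
print(f"correlation between code complexity and schedule overrun: {r:.2f}")
# A correlation that held up across many real projects would be exactly the
# kind of empirical evidence that turns a rule of thumb into science.
```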

So software development, too, can be studied scientifically, although much of it is unpredictable and challenging. My understanding is that most of the studies published in the last decade quite uniformly find that practices such as iterative life cycle models, lean management, intuitive expert estimation instead of automated model estimation, and plain common sense correlate well with successful software development projects. So there are some rules of thumb with real empirical backing.

Even the waterfall method has empirical studies to show that it is a really horrible way to do software except in some extremely specific circumstances, which almost never coincide in a single project. If they did, it would be the most boring project in the world, and I would not want any part in it. Remember, NASA flew to the moon in the 1960s with iteratively developed software and hardware. And I’d bet the guidance software for the Patriot missiles (which never hit anything in the 1990s) was based on a linear model.

Don't draw diagrams of wrong practices – or: Why people still believe in the Waterfall model

The Waterfall model was originally presented by Winston W. Royce in 1970. He wrote a scientific article that contained his personal views on software development. In the first half of the article, he discusses a process that he calls “grandiose”. He even drew a figure of the model, and another showing why it doesn’t work (because the requirements always change). This model is the waterfall. He used it as an example of a process that simply does not work. In the latter half of the article he describes an iterative process that he deems much better.

OK, so why do people still advocate the waterfall? If you look at the scientific articles on software engineering that discuss the waterfall, they all cite Royce’s article. In other words, they’re saying something like “The waterfall is a proven method (Royce, 1970).” So they base their claims on an article that actually says the opposite: that the model does not work.

This is how science (unfortunately) often works – researchers just cite something, because everyone else does so as well, and don’t really read the publications that they refer to. So eventually an often cited claim becomes “fact”.

Agile methods and the linear method were both already in use in the 1950s. The scientific mis-citing boosted the waterfall’s popularity, but by the 1980s it was being phased out – because people had found out (the hard way) that it does not work.

But then things change: a busy engineer in the US defense organization is asked to come up with a standard for military-grade software projects. He doesn’t know what a good process would be, and he’s told: “Hey, Royce already came up with the correct method: the waterfall. Use it.” So the waterfall becomes the US military standard DoD-2167.

A bit later the NATO countries notice that they too need a software engineering standard. So they ask the US for theirs – no need to reinvent the wheel, right? And thus the waterfall is again used everywhere, because if the military uses it, it must be good. Right?

Here in Finland we had already dropped the waterfall in the 1980s, but after DoD-2167 was sent over the Atlantic like a weapon of mass destruction, it became the standard again.

What have we learned here? The agile (iterative, incremental) methods are not new – they’re as old as the waterfall. There is no scientific evidence that the waterfall works; on the contrary, most waterfall projects fail. There is, however, massive evidence that agile methods improve productivity, quality and success rates.

So what should we really have learned? People remember images. And people often read the beginning of an article, but not the end.

So: Don’t draw figures or diagrams of wrong models, because people will remember them. And in the worst case that could cause hundreds of thousands of failed projects and billions of euros and dollars wasted for nothing.

Update (February 2010): Commenters have been asking for references to back up my claims, so I added links to the most central documents referred in this blog post.