Statistical Hypothesis Testing

Fri Mar 6 09:06:58 2015

Even in our enlightened modern society, every now and then dubious theories, claims, practices or institutions arise that appear to be scientific, but fail to meet the requirements.

Often they simply violate the basic rule of science, "The truth you claim to have uncovered must also be testable by others," by making statements that inherently defy objective verification. Take, for example: "Terrestrial radiation, despite neither being perceptible nor otherwise measurable, makes it harder to stay focused, although sensitivity varies among individuals. Our inexpensive protective amulet neutralizes this harmful radiation." Such claims are frequently accompanied by fanciful theories about arbitrarily curious forms of energy, often sententiously phrased yet scientifically baseless, because they are not testable.

Nevertheless, whether with water vitalizers, lunar calendars, divining rods or sidereal pendulums, occasionally there also appear claims quite accessible to verification. For example, "Product XY is a physical water vitalizer, which further improves the taste."

To systematically examine such statements for their truth, this site helps to conduct a so-called statistical hypothesis test, a method of making substantiated decisions using experimental data. Simply put, a controlled experiment is conducted several times, and then the probability that chance alone would produce an overall result at least as favorable as the one observed is calculated. If this probability – the so-called p-value – is very low, and if the overall result supports the original statement, then coincidence is an implausible explanation and the result counts as evidence for the statement. A high p-value, on the other hand, means the result is readily explained by mere chance, and therefore no reason exists to accept the statement as true, even if the overall result seems to support it. A grossly contradictory result obviously falsifies the statement directly.

By way of illustration let's now investigate the above statement about the water vitalizer XY, though the general approach is applicable to similar situations. Say the experiment includes four visually identical glasses of water, placed in a line in a random order, one of which is filled with "vitalized" water, the remaining three glasses with ordinary water. A test subject, convinced that vitalized water tastes different from conventional water, has to identify the vitalized glass. The probability of a hit by pure guessing evidently equals ¼ and that of a miss ¾ accordingly. The probabilities of all possible totals of random hits after several repetitions of this experiment follow a binomial distribution. Now, a statistical hypothesis test could consist of ten such repetitions, i.e. of ten rounds. If the test subject obtained seven hits overall, the probability of scoring seven or more hits by guessing alone would be about 0.35 %, as shown in the example evaluation on this site. The original claim could therefore be accepted as true with fairly high certainty because the probability is very low that the outcome occurred just by chance.
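The p-value for the seven-hits example can be computed directly from the binomial distribution. The following is only a minimal sketch: the function name `binomial_tail` is made up for illustration, and it assumes the p-value is the one-sided probability of obtaining at least the observed number of hits by guessing. The evaluation on this site may be computed differently.

```python
from math import comb


def binomial_tail(hits: int, rounds: int, p_hit: float) -> float:
    """Probability of at least `hits` random hits in `rounds` independent rounds.

    Hypothetical helper for illustration; p_hit is the chance of a hit
    by pure guessing in a single round.
    """
    return sum(
        comb(rounds, k) * p_hit**k * (1 - p_hit) ** (rounds - k)
        for k in range(hits, rounds + 1)
    )


# Example from the text: four glasses (p_hit = 1/4), ten rounds, seven hits.
p_value = binomial_tail(hits=7, rounds=10, p_hit=0.25)
print(f"p-value: {p_value:.4f}")  # prints "p-value: 0.0035"
```

By comparison, three hits in ten rounds give a tail probability of roughly 0.47, so such a result would be entirely unremarkable for someone who is merely guessing.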

Naturally, with these kinds of experiments there is the undesired possibility that test subjects, whether consciously or unconsciously, reach their answer in unforeseen ways. In the previous example, not only the sense of taste but also other sources of information could be consulted. The glasses might appear identical at first sight but actually differ slightly yet perceptibly. A subtle yet visible pattern might emerge in the seemingly random arrangement of glasses. Additional information might unknowingly be transmitted through the facial expressions or body language of other persons present in the room. Controlling and minimizing such confounding factors is ultimately the responsibility of the experimenter. In the above example, for instance, the light could be dimmed to compensate for minor visible differences between the glasses, their order could be determined by a sufficiently random mechanism to prevent pattern formation, and only one person could be allowed to stay in the room at any given time.

This site tries to encourage participants to meet said requirements in the form of a "computer-aided blind study". Each round consists of three steps:

  1. The experimenter enters the room, places the glasses according to a computer-generated arrangement and leaves the room.
  2. The test subject enters the room, arrives at an answer, enters it into the computer and leaves the room.
  3. The computer compares the given answer with the generated one, stores the result and generates a new random arrangement.
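The three steps above can be sketched in a few lines of code. This is only an illustration under stated assumptions, not the implementation used by this site: the names `new_arrangement` and `run_rounds` are hypothetical, the arrangement is reduced to the position of the single vitalized glass, and the subject's answer is supplied by a callback.

```python
import secrets


def new_arrangement(num_glasses: int = 4) -> int:
    """Return the 0-based position of the vitalized glass.

    Uses the operating system's random source so that no pattern
    can form across rounds.
    """
    return secrets.randbelow(num_glasses)


def run_rounds(get_answer, rounds: int = 10, num_glasses: int = 4) -> int:
    """Run all rounds and return the total number of hits.

    `get_answer` stands in for the test subject: it is called once per
    round and must return the position the subject chose.
    """
    hits = 0
    for _ in range(rounds):
        target = new_arrangement(num_glasses)  # step 1: experimenter sets up
        answer = get_answer()                  # step 2: subject answers blind
        if answer == target:                   # step 3: compare and store
            hits += 1
    return hits


# A subject who guesses at random is expected to score about rounds / 4 hits.
print(run_rounds(lambda: secrets.randbelow(4)))
```

Because the subject's callback never receives `target`, the sketch mirrors the blind setup: the answer can only be compared with the arrangement after it has been entered.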

After all rounds are completed, the computer generates a detailed evaluation report and displays the result. Every statistical hypothesis test conducted on this site is assigned a unique identifier and a static URL, which allows you to interrupt the experiment and return to it at any time. Once a test is completed, the unique URL will point directly to the evaluation report.

If necessary, additional persons can be involved, e.g. to monitor and ensure the correct execution of steps 1 and 2. In that case, however, the same person must not be present during more than one step, in order to minimize the number of potential confounding factors.

It shall remain an open question whether the false statements we encounter are due to innocent naivety or plain old charlatanry; this website merely aims to help you expose false claims scientifically and autonomously. And also, of course, to substantiate genuine claims, for the magic of today is the science of tomorrow.

