We’re deluged with claims that we should do this, that or the other thing because some study has a “statistically significant” result. But don’t let this particular use of the word “significant” trip you up: when it’s paired with “statistically”, it doesn’t mean it’s necessarily important. Nor is it a magic number that means that something has been proven to work (or not to work).
The p-value on its own really tells you very little. It is one way of trying to tell whether a result is more likely to be “signal” than “noise”. If a study sample is very small, only a very big difference is likely to reach statistical significance; in a bigger study, a much smaller difference can.
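To see what that means in practice, here’s a rough sketch in Python. The numbers (a 2-point difference on a made-up 0–100 scale, with a spread of about 10) are invented purely for illustration, not taken from any real study:

```python
# Illustration only: invented numbers, not data from any real study.
# The same true difference of 2 points (on a made-up 0-100 scale, spread ~10)
# is tested once with small groups and once with large groups.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def p_value_for(n_per_group):
    group_a = rng.normal(loc=50, scale=10, size=n_per_group)
    group_b = rng.normal(loc=52, scale=10, size=n_per_group)
    # Standard two-sample t-test: returns the p-value for the observed difference.
    return stats.ttest_ind(group_a, group_b).pvalue

print("20 people per group:   p =", round(p_value_for(20), 3))
print("2000 people per group: p =", round(p_value_for(2000), 3))
# With 20 per group, a difference this small will usually not reach p < 0.05;
# with 2000 per group, it almost always will.
```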
But statistical significance is not a way to prove the “truth” of a claim or hypothesis. What’s more, you don’t even need the p-value, because other measures tell you everything the p-value can tell you, and more useful things besides.
This is roughly how the statistical test behind the p-value works. The test starts from the assumption that the study’s hypothesis is not true – the “null hypothesis”. The calculation then estimates how likely you would be to see this result, or one even further from “null”, if the null hypothesis were true. The threshold for calling a result statistically significant is usually a p-value below 0.05 – common practice, but still a bit arbitrary.
If the p-value is below 0.05 (less than 5%), that means a result at least this far from “null” would be expected less than 5% of the time if the null hypothesis were true – so the result is called statistically significant. But the finding itself can still be a fluke. You can’t conclude too much based on that alone.
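One way to make that logic concrete is a simple simulation. This sketch uses invented measurements (not real data) and a shuffle test, which is one way of estimating a p-value: pretend there is no real difference and see how often chance alone matches the observed one.

```python
# Illustration only: invented numbers. A shuffle (permutation) test makes the
# p-value logic visible: assume there is no real difference (the null hypothesis),
# reshuffle who is in which group, and count how often chance alone produces a
# difference at least as far from "null" as the one actually observed.
import numpy as np

rng = np.random.default_rng(1)

group_a = np.array([23.0, 19.5, 21.0, 24.5, 22.0, 20.5, 23.5, 21.5])
group_b = np.array([19.0, 20.0, 18.5, 21.0, 19.5, 18.0, 20.5, 19.0])
observed = abs(group_a.mean() - group_b.mean())

pooled = np.concatenate([group_a, group_b])
at_least_as_extreme = 0
n_shuffles = 10_000
for _ in range(n_shuffles):
    shuffled = rng.permutation(pooled)
    diff = abs(shuffled[:8].mean() - shuffled[8:].mean())
    if diff >= observed:
        at_least_as_extreme += 1

p_value = at_least_as_extreme / n_shuffles
print("Approximate p-value:", p_value)
# If this comes out below 0.05, the result would be called "statistically
# significant" - but the finding could still be a fluke.
```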
Always keep in mind that a statistically significant result is not necessarily significant in the sense of “important”. It’s “significant” only in the sense of signifying something. A sliver of a difference can reach statistical significance if a study is big enough. For example, if one group of people sleeps a tiny bit longer a night, on average, than another group, that could be statistically significant. But it wouldn’t be enough for one group to feel more rested than the other.

This is why people will often say something was statistically significant, but clinically unimportant, or not clinically significant. Clinical significance is a value judgment, often implying a difference big enough to change the decision a clinician or patient would make. Others speak of a minimal clinically important difference (MCID or MID). That can mean they are talking about the minimum difference a patient could detect – but there is a lot of confusion around these terms.
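Here is the sleep example above as a rough sketch, with invented numbers: in huge groups, a difference of roughly two minutes of sleep a night comes out as statistically significant, even though nobody would feel it.

```python
# Illustration only: invented numbers, not a real study.
# Two groups of 100,000 people whose average nightly sleep differs by about
# 2 minutes (7.00 vs 7.03 hours), with a spread of about an hour.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 100_000

group_a = rng.normal(loc=7.00, scale=1.0, size=n)
group_b = rng.normal(loc=7.03, scale=1.0, size=n)

result = stats.ttest_ind(group_a, group_b)
print("Difference in means (minutes):", round((group_b.mean() - group_a.mean()) * 60, 1))
print("p-value:", result.pvalue)
# The p-value will almost certainly come out far below 0.05 - statistically
# significant - even though a couple of minutes of extra sleep is not a
# difference anyone would notice.
```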