STATISTICAL PARADOXES AND FALSE MYTHS
The advent of increasingly sophisticated scientific models, such as those of quantum mechanics or social psychology, has led to the fading of the Promethean striving towards inexhaustible and infallible knowledge. It has plunged us into a dimension dominated by uncertainty, in which many dynamics are beyond our control and escape our understanding. The inner distress that derives from this has been projected and sublimated in the arts, nourishing, among others, the masterpieces of existentialism and absurdism.
When faced with large populations (of particles or people, for instance), it is impossible to formulate satisfactory deterministic theories, namely theories that do not involve random variables. The single element escapes a univocal characterization and sometimes acquires ontological status only through its interactions with others. As quantum mechanics teaches us, the simple act of measuring a property tends to affect it (an idea popularly associated with the well-known Heisenberg uncertainty principle). Similarly, in order to predict and interpret the behavior of an individual, it is necessary to frame it within a society shaped by a disorienting variety of historical, cultural and economic factors, among others.
This is where probability theory and statistical surveys inevitably come to the aid of every discipline. If, on the one hand, they have allowed us to make dizzying progress in our understanding of reality, on the other hand they hide pitfalls that should not be underestimated.
Imagine that you have a sweet tooth (it shouldn’t require too much effort…) but you are allergic to nuts, and that you have a friend who takes pleasure in tormenting you for this reason. He puts two boxes in front of you: a white box containing about seventy assorted candies and about ten almonds, and a black box containing two hundred chocolates and about sixty peanuts. We can also suppose that this friend is a pastry chef; otherwise, we cannot explain where he got all of this. In any case, you are allowed to draw only one item from one of the boxes (without looking inside, of course), on the condition that you have to eat whatever comes your way. After a quick calculation, you decide to pick from the white box, because it gives you a higher probability of getting a sweet (70/80 = 87.5% versus 200/260 ≈ 77%). You draw a candy and savor it with relish.
But you are not satisfied: you crave another candy, and you ask your tormentor to repeat the game. To spice things up, the white box now contains a hundred candies and five hundred almonds, while the black box contains twenty chocolates and two hundred peanuts. Again you decide to pick from the white box (100/600 ≈ 16.7% versus 20/220 ≈ 9.1%) and, with fate as your accomplice, you pick a sweet to pamper your palate.
Since it is your lucky day and your appetite is insatiable, you submit to the bet one last time. Your friend, irritated, pours the contents of the two white boxes into a single white box and the contents of the two black boxes into a single black box. You rush to rummage through the white box, convinced that it is the best option in terms of probability, since it combines the two winning boxes. But you are very wrong! The black box is now the more reasonable choice, and not by a small margin (220/480 ≈ 46% versus 170/680 = 25%). You have run into the so-called Simpson’s paradox, in which a comparison that favors one option within each of several groups is reversed when the groups are combined.
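It is relatively easy to understand what has happened once we recall that probability is defined as the ratio of the number of favorable cases to the total number of possible cases. The pooled white box is weighted heavily toward the almond-laden second game (600 of its 680 items), while the pooled black box is weighted toward the chocolate-rich first game (260 of its 480 items). Nevertheless, it is just as easy to be misled, since the result is counterintuitive.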
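To check the arithmetic of the three games, here is a minimal Python sketch that recomputes each probability exactly with `fractions.Fraction`. The box contents are the approximate counts from the story; the names `p_sweet`, `ROUNDS` and `pooled` are purely illustrative.

```python
from fractions import Fraction

def p_sweet(sweets: int, nuts: int) -> Fraction:
    """Probability of drawing a sweet: favorable cases over all possible cases."""
    return Fraction(sweets, sweets + nuts)

# (sweets, nuts) in each box, using the approximate counts from the story
ROUNDS = [
    {"white": (70, 10), "black": (200, 60)},    # first game
    {"white": (100, 500), "black": (20, 200)},  # second game
]

def pooled(color: str) -> tuple[int, int]:
    """Pour the boxes of one color from every game into a single box."""
    return (sum(r[color][0] for r in ROUNDS), sum(r[color][1] for r in ROUNDS))

if __name__ == "__main__":
    for i, boxes in enumerate(ROUNDS, start=1):
        w, b = p_sweet(*boxes["white"]), p_sweet(*boxes["black"])
        print(f"game {i}: white {float(w):.1%}, black {float(b):.1%}")
    w, b = p_sweet(*pooled("white")), p_sweet(*pooled("black"))
    print(f"pooled: white {float(w):.1%}, black {float(b):.1%}")
```

Using exact fractions rather than floats makes it easy to see that the white box wins each game separately while the pooled black box wins overall (11/24 against 1/4).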
Another paradox quite common in statistics is the Will Rogers paradox, which occurs when moving an element from one set to another raises the average of both sets. It is named after the comedian Will Rogers and one of his jokes bordering on the politically incorrect: “When Okies left Oklahoma to move to California, the average intelligence increased in both states”. Here too the explanation is simple: if the element that moves is below the average of the set it leaves but above the average of the set it joins, then, by the definition of the average, both averages increase.
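The condition just described can be checked directly. The following is a minimal sketch in Python; the two lists are hypothetical scores chosen for illustration, not data about any real population, and the helper names `move` and `raises_both_means` are our own.

```python
from statistics import mean

def move(src, dst, value):
    """Return new lists with one occurrence of value moved from src to dst."""
    new_src = list(src)
    new_src.remove(value)
    return new_src, list(dst) + [value]

def raises_both_means(src, dst, value):
    """True when value is below the mean of src and above the mean of dst.

    Assumes src has more than one element, so it is not emptied by the move.
    """
    return mean(dst) < value < mean(src)

# Hypothetical scores for two groups
a = [1, 2, 3, 4, 5]  # mean 3
b = [0, 1, 2]        # mean 1

if __name__ == "__main__":
    new_a, new_b = move(a, b, 2)  # 2 is below 3 but above 1
    print(mean(a), "->", mean(new_a))
    print(mean(b), "->", mean(new_b))
```

Moving the value 2 raises the first average from 3 to 3.25 and the second from 1 to 1.25, while moving 4 (which is above the average of the set it leaves) would not have this effect.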
A relatively frequent mistake in investigations carried out with little expertise is confusing correlation between variables with causation. Leaving aside an exhaustive discussion, which would require defining a whole series of formal theoretical notions, it should be remembered that the simultaneous occurrence of two events does not allow us to conclude that one is the cause of the other, even when it seems reasonable. For example, in the absence of a properly designed experiment (which goes well beyond a simple statistical survey!), we cannot infer that spending many hours in front of a computer causes an increase in blood pressure just because the data from a sample suggest it. In fact, there may be so-called lurking (or confounding) variables, not considered in the study, that influence and determine both phenomena. In this case it is physical activity: people who do less physical activity spend, on average, more time in front of the computer and have higher blood pressure.
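The effect of a lurking variable can be reproduced with a small simulation. This is a hypothetical model, not data from any study: physical activity is drawn at random, and both screen time and blood pressure depend on it, while blood pressure does not depend on screen time at all. The coefficients (-0.8, with noise of standard deviation 0.6) are arbitrary assumptions. The raw correlation between screen time and blood pressure comes out strong, whereas the partial correlation, which controls for activity, is close to zero.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    rxy, rxz, ryz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

def simulate(n=10_000, seed=42):
    """Hypothetical population in which activity drives both other variables."""
    rng = random.Random(seed)
    activity, screen, pressure = [], [], []
    for _ in range(n):
        a = rng.gauss(0, 1)  # standardized physical activity
        activity.append(a)
        # Less activity -> more screen time
        screen.append(-0.8 * a + rng.gauss(0, 0.6))
        # Less activity -> higher pressure; screen time plays no role here
        pressure.append(-0.8 * a + rng.gauss(0, 0.6))
    return activity, screen, pressure

if __name__ == "__main__":
    activity, screen, pressure = simulate()
    print(f"raw correlation: {pearson(screen, pressure):.2f}")
    print(f"controlling for activity: {partial_corr(screen, pressure, activity):.2f}")
```

Partial correlation is only one simple way to control for a known confounder; it assumes linear relationships and, of course, cannot help with lurking variables that were never measured.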
The problem arises when high-impact statistical studies are invalidated by such procedural errors, whether through negligence or through the questionable intentions of those who conducted them. For instance, following the logic of the candy boxes, it can happen in clinical trials that a drug that is effective in treating a condition both in a group of elderly people and in a group of young people appears ineffective, or even harmful, when the data are pooled into a single set, for example because the two age groups are unevenly split between treated and untreated patients.
In other cases there are no real errors in the survey, but the results are presented in a way that deviously amplifies their emotional impact on readers and, consequently, their media resonance. A study conducted in Australia found an increase of about 18% in the probability of developing colon cancer with a diet rich in bacon. What is not made explicit is that the increase is relative: roughly 9 people out of 100 developed cancer, compared with roughly 8 out of 100 who, statistically, develop it without eating 50 grams of processed meat per day. Similarly, people who tan under artificial lamps (and overuse them, at that) are 50% more likely to develop skin cancer. But that is 50% more than the risk associated with normal sun exposure, which amounts to about 0.2%: in other words, the absolute risk rises to only 0.3%.
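The gap between relative and absolute risk is simple arithmetic. The sketch below takes the figures in the text at face value (a baseline of about 8% with an 18% relative increase for colon cancer, and a baseline of 0.2% with a 50% relative increase for skin cancer) and reports the absolute change together with the number of people who would need to be exposed for one extra case, a quantity usually called the number needed to harm. The function names are illustrative.

```python
def absolute_risk(baseline: float, relative_increase: float) -> float:
    """Risk in the exposed group, given a baseline risk and a relative increase."""
    return baseline * (1 + relative_increase)

def number_needed_to_harm(baseline: float, exposed: float) -> float:
    """How many people must be exposed, on average, for one extra case."""
    return 1 / (exposed - baseline)

# Figures taken at face value from the text: (baseline risk, relative increase)
CASES = {
    "colon cancer, processed meat": (0.08, 0.18),
    "skin cancer, tanning lamps": (0.002, 0.50),
}

if __name__ == "__main__":
    for label, (baseline, rel) in CASES.items():
        exposed = absolute_risk(baseline, rel)
        nnh = number_needed_to_harm(baseline, exposed)
        print(f"{label}: +{rel:.0%} relative, "
              f"{baseline:.2%} -> {exposed:.2%} absolute, "
              f"about 1 extra case per {nnh:.0f} people")
```

An 18% relative increase on an 8% baseline gives about 9.4%, consistent with the "roughly 9 out of 100" above, and the tanning example works out to one extra case for about every thousand people exposed.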
To conclude, here are a couple of quotes that I hope will prompt some self-reflection.
“The real difficulty lies in whether the events to be analyzed are completely independent or partially so.” George Yule
“Logic is bound up with this condition: to suppose desperately that identical cases are given, because without constants man could not survive.” Friedrich Nietzsche
Hi, my name is Martina and I have a degree in Languages. My main interests have always revolved around literature, culture, linguistics and travel, so I decided to pursue a career in translation, specializing through a Master’s program. I love reading, I play beach volleyball and I love spending time in nature.