It is said that there is a correlation between the number of storks’ nests found on Danish houses and the number of children born in those houses. Could the old story about babies being delivered by storks really be true? No. Correlation is not causation. Storks do not deliver children but larger houses have more room both for children and for storks.
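The stork story is a textbook case of confounding, and a small simulation makes the mechanism concrete. The sketch below is illustrative only: the houses are synthetic, and the size classes and probabilities are invented assumptions, not Danish data. House size drives both the number of nests and the number of children, while nests never influence children.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_houses(n=20000, seed=0):
    """Synthetic houses as (size, nests, children) tuples.

    Assumption: each unit of house size gives one independent chance of a
    nest (20%) and one independent chance of a child (40%). Both numbers
    are made up for illustration.
    """
    rng = random.Random(seed)
    houses = []
    for _ in range(n):
        size = rng.randint(1, 5)  # hypothetical size class, 1 = small
        nests = sum(rng.random() < 0.2 for _ in range(size))
        children = sum(rng.random() < 0.4 for _ in range(size))
        houses.append((size, nests, children))
    return houses


houses = simulate_houses()

# Across all houses, nests and children move together.
overall = pearson([h[1] for h in houses], [h[2] for h in houses])

# Within a single size class the confounder is held fixed.
within = {}
for size in range(1, 6):
    group = [h for h in houses if h[0] == size]
    within[size] = pearson([h[1] for h in group], [h[2] for h in group])

print(f"all houses: r = {overall:.2f}")
for size, r in within.items():
    print(f"size class {size}: r = {r:+.2f}")
```

Across all houses the correlation is clearly positive, yet among houses of the same size it hovers around zero. Holding the suspected common cause fixed is one simple way to test whether a correlation survives.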
This much-loved statistical anecdote seems less amusing when you consider how it was used in a US Senate committee hearing in 1965. The expert witness giving testimony was arguing that while smoking may be correlated with lung cancer, a causal relationship was unproven and implausible. Pressed on the statistical parallels between storks and cigarettes, he replied that they “seem to me the same”.
The witness’s name was Darrell Huff, a freelance journalist beloved by generations of geeks for his wonderful and hugely successful 1954 book How to Lie with Statistics. His reputation today might be rather different had the proposed sequel made it to print. How to Lie with Smoking Statistics used a variety of stork-style arguments to throw doubt on the connection between smoking and cancer, and it was supported by a grant from the Tobacco Institute. It was never published, for reasons that remain unclear. (The story of Huff’s career as a tobacco consultant was brought to the attention of statisticians in articles by Andrew Gelman in Chance in 2012 and by Alex Reinhart in Significance in 2014.)
Indisputably, smoking causes lung cancer and various other deadly conditions. But the problematic relationship between correlation and causation in general remains an active area of debate and confusion. The “spurious correlations” compiled by Harvard law student Tyler Vigen and displayed on his website (tylervigen.com) should be a warning. Did you realise that consumption of margarine is strongly correlated with the divorce rate in Maine?
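Coincidences like margarine and divorce are easy to find when enough unrelated series are compared. The following sketch is a simplified illustration, not a description of how the tylervigen.com charts were produced. It assumes 200 independent series of pure random noise, ten "yearly" points each, and simply searches for the most strongly correlated pair.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_pair(num_series=200, length=10, seed=1):
    """Generate unrelated noise series and return the best-correlated pair."""
    rng = random.Random(seed)
    # Every series is independent Gaussian noise: no real relationship exists.
    series = [[rng.gauss(0, 1) for _ in range(length)]
              for _ in range(num_series)]
    best = max(
        combinations(range(num_series), 2),
        key=lambda pair: abs(pearson(series[pair[0]], series[pair[1]])),
    )
    return best, pearson(series[best[0]], series[best[1]])


pair, r = strongest_pair()
num_pairs = 200 * 199 // 2
print(f"best of {num_pairs} pairs: series {pair[0]} and {pair[1]}, r = {r:+.2f}")
```

With short series and thousands of candidate pairs, a very strong correlation turns up by chance alone. That is why a striking correlation found by trawling through data deserves more suspicion than one predicted in advance.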
We cannot rely on correlation alone, then. But insisting on absolute proof of causation is too exacting a standard (arguably, an impossible one). Between those two extremes, where does the right balance lie between trusting correlations and looking for evidence of causation?
Scientists, economists and statisticians have tended to demand causal explanations for the patterns they see. It’s not enough to know that college graduates earn more money — we want to know whether the college education boosted their earnings, or if they were smart people who would have done well anyway. Merely looking for correlations was not the stuff of rigorous science.
But with the advent of “big data” this argument has started to shift. Large data sets can throw up intriguing correlations that may be good enough for some purposes. (Who cares why price cuts are most effective on a Tuesday? If it’s Tuesday, cut the price.) Andy Haldane, chief economist of the Bank of England, recently argued that economists might want to take mere correlations more seriously. He is not the first big-data enthusiast to say so.
This brings us back to smoking and cancer. When the British epidemiologist Richard Doll first began to suspect the link in the late 1940s, his analysis was based on a mere correlation. The causal mechanism was unclear, as most of the carcinogens in tobacco had not been identified; Doll himself suspected that lung cancer was caused by fumes from tarmac roads, or possibly cars themselves.
Doll’s early work on smoking and cancer with Austin Bradford Hill, published in 1950, was duly criticised in its day as nothing more than a correlation. The great statistician Ronald Fisher repeatedly weighed into the argument in the 1950s, pointing out that it was quite possible that cancer caused smoking — after all, precancerous growths irritated the lung. People might smoke to soothe that irritation. Fisher also observed that some genetic predisposition might cause both lung cancer and a tendency to smoke. (Another statistician, Joseph Berkson, observed that people who were tough enough to resist adverts and peer pressure were also tough enough to resist lung cancer.)
Hill and Doll showed us that correlation should not be dismissed too easily. But they also showed that we shouldn’t give up on the search for causal explanations. The pair painstakingly continued their research, and evidence of a causal association soon mounted.
Hill and Doll took a pragmatic approach in the search for causation. For example, is there a dose-response relationship? Yes: heavy smokers are more likely to suffer from lung cancer. Does the timing make sense? Again, yes: smokers develop cancer long after they begin to smoke. This contradicts Fisher’s alternative hypothesis that people self-medicate with cigarettes in the early stages of lung cancer. Do multiple sources of evidence add up to a coherent picture? Yes: when doctors heard about what Hill and Doll were finding, many of them quit smoking, and it became possible to see that the quitters were at lower risk of lung cancer. We should respect correlation but it is a clue to a deeper truth, not the end of our investigations.
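The dose-response question can be phrased as a simple check: does the disease rate rise at each step up in exposure? The sketch below shows the shape of that check on a synthetic cohort. The exposure levels and risk figures are invented assumptions for illustration and are not taken from Hill and Doll's studies; with real data the rates would come from observed counts.

```python
import random

# Hypothetical risks per smoking level, invented purely for illustration.
ASSUMED_RISK = {"none": 0.005, "light": 0.02, "moderate": 0.05, "heavy": 0.10}


def simulate_cohort(people_per_group=20000, seed=2):
    """Return the observed disease rate in each synthetic exposure group."""
    rng = random.Random(seed)
    return {
        level: sum(rng.random() < risk for _ in range(people_per_group))
        / people_per_group
        for level, risk in ASSUMED_RISK.items()
    }


def shows_dose_response(rates_in_dose_order):
    """True if the rate strictly increases at every step up in dose."""
    return all(
        lower < higher
        for lower, higher in zip(rates_in_dose_order, rates_in_dose_order[1:])
    )


rates = simulate_cohort()
# Dicts keep insertion order, so this list runs from lowest to highest dose.
ordered = [rates[level] for level in ASSUMED_RISK]

for level, rate in rates.items():
    print(f"{level:>8}: {rate:.3%}")
print("dose-response pattern:", shows_dose_response(ordered))
```

A monotonic rise does not prove causation on its own, but a confounder would have to track the dose just as closely to explain it, which makes the alternative explanations harder to sustain.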
It’s not clear why Huff and Fisher were so fixated on the idea that the growing evidence on smoking was a mere correlation. Both of them were paid as consultants by the tobacco industry and some will believe that the consulting fees caused their scepticism. It seems just as likely that their scepticism caused the consulting fees. We may never know.