Bad Statistics Will Be the Death of Us All!
Note from Andrew: I’m experimenting with publishing content from smart people I trust who don’t have another outlet for their intelligent writing. This is one of those essays, written by a wiser man than I.
Bad statistics bother me everywhere I see them, but it makes me near homicidal when bad statistics are used in ways that will kill people. I’m not faulting people who are statistically illiterate. I know plenty of people who are literate in math and engineering who are STILL statistically illiterate. Statistics is hard, and not just for math reasons. It requires bending your mind around ideas like the null hypothesis, and it uses words you thought you understood in ways that differ from what you believe they mean. It would almost be easier if statistics had entirely its own technical jargon, rather than reusing familiar words in ways that lead people to be wrong. The upshot is that somebody could use statistics to propagate lies, and those lies would look pretty good to someone without a strong statistical background. With a little more effort, they could even lie to a person with that strong background, if that person didn’t look closely at the data and methodology.
So let me say this right now: I’m not going to bore you with statistics or math. I’ve linked the source papers and data below if you’re inclined to see and check the math and methodology yourself. I will, however, give you some simplistic definitions just so we can have this conversation. A population is a group you want to make statements or generalizations about. A parameter is a fact about that population, the thing those statements or generalizations are about (for example, what fraction of it is infected). A sample is a subset of the population that you know some things about for sure. A statistic is something you calculate from what you know about that sample and use to guesstimate a parameter. In general, you need the sample to be pretty random to make any good guesstimates. That’s all we are doing for now, but if you are interested, the link below gives you a pretty full glossary of statistical terms. If you choose to read and learn them all, even without learning any of the math, you will know more about statistics than most professionals who use it every day.
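As a quick illustration of those four definitions, here is a minimal Python sketch. The population, its 3% trait rate, and the sample size are all invented for illustration; the only point is that a statistic computed from a random sample lands close to the population parameter.

```python
import random

# A made-up population of 100,000 people, each flagged True/False for some trait.
# The 3% rate is an arbitrary illustrative number, not real data.
random.seed(42)
POPULATION_SIZE = 100_000
TRUE_RATE = 0.03
population = [random.random() < TRUE_RATE for _ in range(POPULATION_SIZE)]

# The parameter: the fraction of the WHOLE population with the trait.
parameter = sum(population) / len(population)

# The statistic: the same fraction, computed only on a random sample of 2,000.
sample = random.sample(population, 2_000)
statistic = sum(sample) / len(sample)

print(f"parameter = {parameter:.4f}, statistic = {statistic:.4f}")
```

Rerun it with different seeds and the statistic wobbles around the parameter but stays close, because every person had the same chance of landing in the sample.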
On March 27, New York State had only 728 cumulative reported deaths from covid19. We now know that number was probably low because of a host of reporting issues, and the real number could have been much higher. You can see this in the revised numbers recently coming out of NYC. China also recently reevaluated its count and moved it significantly upward. This is pretty normal in pandemics, guerrilla warfare, earthquakes, and other disasters: as more and better information comes in, you get better data. I am only really talking about deaths because it doesn’t matter how many covid19 “cases” we reported then, or even now. Our testing is so inadequate and so NONrandomly distributed that the covid19 “cases” number is meaningless. In other words, it is a bad sample, skewed by restricting testing to certain circumstances, and we cannot use it to make generalizations about the population as a whole. (There are small, specific exceptions to this. I might look at South Korea’s or Singapore’s total numbers and make some generalizations from them, and I think the testing from the USS Theodore Roosevelt is particularly useful for making generalizations about populations that are trapped together, like the crews of naval vessels or prisoners.) Every other statistic you hear that is based on that case rate is similarly garbage. The decision to limit and restrict testing was probably a good one under triage circumstances, but it doesn’t let us use the information to make large generalizations. Deaths, however, we CAN use. Even knowing that a certain number of deaths go unreported, it is a safe generalization that the initial underreporting will be similar in extent everywhere, simply because disasters are essentially similar in nature. Right now (this was written on 4/21/2020) the state of Georgia has around 818 cumulative reported deaths, and the slope of its curve through 4/10/2020 is essentially identical to the slope of NYS’s curve through 3/27/2020, but they are gonna open all their businesses up.
They are idiots, and you don’t need a strong background in statistics to know that. All of the needless extra deaths to come will lie squarely on the shoulders of the idiots involved in that decision.
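To make the earlier point about nonrandom testing concrete, here is a small Python simulation. It assumes a made-up population with a 5% true infection rate, assumes infected people are far more likely to show symptoms than uninfected people, and assumes a perfect test; every one of those numbers is hypothetical. It compares the positive rate you get by testing a random sample with the rate you get by testing only people who have symptoms.

```python
import random

# Every number below is a hypothetical assumption chosen for illustration.
random.seed(0)
N = 200_000
TRUE_INFECTION_RATE = 0.05
P_SYMPTOMS_IF_INFECTED = 0.6
P_SYMPTOMS_IF_NOT = 0.05  # colds, flu, allergies that look similar

# Each person is a pair (infected, symptomatic).
people = []
for _ in range(N):
    infected = random.random() < TRUE_INFECTION_RATE
    p_symptoms = P_SYMPTOMS_IF_INFECTED if infected else P_SYMPTOMS_IF_NOT
    symptomatic = random.random() < p_symptoms
    people.append((infected, symptomatic))


def positive_rate(group):
    """Fraction of a tested group that is infected (assumes a perfect test)."""
    return sum(infected for infected, _ in group) / len(group)


# Random testing: every person is equally likely to be tested.
random_tested = random.sample(people, 5_000)
# Restricted testing: only people with symptoms get a test.
symptomatic_tested = [p for p in people if p[1]][:5_000]

print(f"true infection rate:       {positive_rate(people):.3f}")
print(f"random testing estimate:   {positive_rate(random_tested):.3f}")
print(f"symptomatic-only estimate: {positive_rate(symptomatic_tested):.3f}")
```

The random sample lands near the true 5%, while the symptomatic-only group comes out somewhere around 39% (Bayes’ rule predicts 0.03 / 0.0775 ≈ 0.39 under these assumptions). Neither that number nor the raw count of positives tells you anything reliable about the population as a whole.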
Let’s talk instead about the pressure people feel to open back up for business. It’s also wrong. Just flat-out wrong. Showing that it’s wrong takes a long train of data analysis and statistics, though, and that doesn’t fit in a sound bite or even a print news article. Let’s talk about the 1918/1919 flu pandemic. It’s not a perfect analog to our current situation, but then, as now, different parts of our country adopted different strategies. We know that some of those strategies, Philadelphia’s for example, definitely made things worse, and that other places adopted strategies that resulted in fewer deaths. JAMA published a paper on the subject back in 2007. In general, it can be said that the sooner and longer things were shut down, the fewer people died. That is a broad generalization, but I encourage you to read the paper, check the data, and confirm it for yourself. I am confident you will come to the same conclusion.
There are, and have been, people looking at the economic impacts of pandemics. I link below a paper by the St. Louis Fed. It is a little horrifying how closely its demographic data from 1918/1919 mirrors what we are seeing now. Give it a quick read; if you think the people who run the Federal Reserve are heartless, you will be surprised. There is another paper out right now for peer review from a group that includes some VERY top-notch statisticians from MIT. I link it below as well. They build on some of the work already done and use records from that era that are accessible now. They look at manufacturing output and banking assets from that time, because those are records we can localize to specific parts of the country, and they cross-reference them against which cities and regions had more successful strategies for reducing deaths. We can see that the places that shut down earlier and longer not only had fewer deaths but also had stronger and faster economic recoveries after the shutdowns, and they continued to show those stronger recoveries well into 1923. The takeaway I want you to have is that places with strong, sustained shutdowns had a 4-year head start on neighboring economies in recovering from the pandemic. You don’t have to believe me. You can check the data yourself. If you took my advice above and read the glossary, you now know as much as the “peers” who will be reviewing it.
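One standard way to turn “cross it against which cities had more successful strategies” into a single number is a difference-in-differences comparison. The sketch below is a textbook two-group version with invented index values; it is a swapped-in illustration of the general idea, not the MIT paper’s actual model or data.

```python
# Invented index values (think: manufacturing output) for two hypothetical
# groups of cities. These are NOT figures from any of the linked papers.
outcomes = {
    "early_long_shutdown": {"before": 95.0, "after": 110.0},
    "late_short_shutdown": {"before": 100.0, "after": 106.0},
}


def change(group):
    """How much a group's outcome moved from before to after the pandemic."""
    return outcomes[group]["after"] - outcomes[group]["before"]


# Difference-in-differences: compare the *changes*, not the levels, so a
# group's pre-existing advantage or disadvantage cancels out.
did = change("early_long_shutdown") - change("late_short_shutdown")
print(did)  # 9.0
```

Notice that the early-shutdown group starts out behind (95 vs. 100) yet still shows the larger change. The comparison only isolates the effect of the shutdowns if the two groups would otherwise have moved in parallel, and any real study using this approach has to argue for that; the toy numbers here don’t.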
With that in mind: every single person who is telling you that we have to reopen the economy fast is wrong. Not necessarily idiots, because not all of them have this information in hand yet, but still wrong. The best way to help the people who are wrong but not idiots is to spread good information along with the tools to verify it.
The very succinctly titled paper, not yet published but available for review, that started me thinking, researching, and verifying data and claims: Pandemics Depress the Economy, Public Health Interventions Do Not: Evidence from the 1918 Flu https://papers.ssrn.com/sol3/Papers.cfm?abstract_id=3561560
JAMA Nonpharmaceutical Interventions Implemented by US Cities During the 1918–1919 Influenza Pandemic https://jamanetwork.com/journals/jama/fullarticle/208354
Info on NYS numbers, with super easy links to the data in .csv format so you can put it into R, Minitab, Excel, or whatever tool you like for manipulating data. https://www.syracuse.com/coronavirus-ny/
Georgia DPH numbers. Not as Minitab-friendly as NYS’s, but at least published.
2007 report on pandemic flu from the Federal Reserve Bank of St. Louis. https://www.stlouisfed.org/~/media/files/pdfs/community-development/research-reports/pandemic_flu_report.pdf
Berkeley Statistics glossary
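The NYS .csv data linked above can go into any tool you like. As one example, here is a minimal Python sketch of the kind of slope comparison made earlier for Georgia and NYS: fit a straight line to the logarithm of cumulative deaths and convert that slope into a doubling time. The column names `date` and `cumulative_deaths` and the inline numbers are hypothetical stand-ins, not the real data; a downloaded file would need its actual column names substituted.

```python
import csv
import io
import math
from datetime import date

# Synthetic stand-in for a downloaded CSV. Column names and values are
# hypothetical; they are NOT the real NYS or Georgia numbers.
SAMPLE_CSV = """date,cumulative_deaths
2020-03-01,100
2020-03-03,200
2020-03-05,400
2020-03-07,800
"""


def load_series(text):
    """Return (day offsets from the first row, cumulative deaths) from CSV text."""
    rows = list(csv.DictReader(io.StringIO(text)))
    start = date.fromisoformat(rows[0]["date"])
    days = [(date.fromisoformat(r["date"]) - start).days for r in rows]
    deaths = [int(r["cumulative_deaths"]) for r in rows]
    return days, deaths


def log_slope(days, deaths):
    """Ordinary least-squares slope of ln(cumulative deaths) against day number."""
    ys = [math.log(d) for d in deaths]
    mean_x = sum(days) / len(days)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, ys))
    den = sum((x - mean_x) ** 2 for x in days)
    return num / den


days, deaths = load_series(SAMPLE_CSV)
slope = log_slope(days, deaths)
doubling_time = math.log(2) / slope
print(f"growth rate per day: {slope:.3f}, doubling time: {doubling_time:.1f} days")
```

Working on the log scale means two places with very different totals can still be compared on how fast they are growing. Run the same function on two regions over matching windows and compare the slopes; this assumes both curves are still in a roughly exponential phase over those windows.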