The Daily - Counting the Infected

Episode Date: July 8, 2020

For months, the U.S. government has been quietly collecting information on hundreds of thousands of coronavirus cases across the country. Today, we tell the story of how The Times got hold of that data, and what it says about the nation’s outbreak. Plus: a conversation with three U.S. astronauts aboard the International Space Station. Guests: Robert Gebeloff, a reporter for The New York Times specializing in data analysis. Bob Behnken, Doug Hurley and Chris Cassidy, NASA astronauts aboard the International Space Station. For more information on today’s episode, visit nytimes.com/thedaily. Background reading: The C.D.C. figures provide the fullest and most extensive look yet at the racial inequity of the coronavirus. A Times analysis published in late May found that Democrats were far more likely to live in counties that had been ravaged by the virus, while Republicans were more likely to live in counties that had been relatively unscathed. A team of New York Times journalists is also working to track every coronavirus case in the United States, and The Times has made its data open to the public.

Transcript
Starting point is 00:00:00 From The New York Times, I'm Michael Barbaro. This is The Daily. Today. For months, the U.S. government has been quietly collecting information on hundreds of thousands of coronavirus cases across the country. My colleague, Robert Gebeloff, on the story of how the Times obtained that data. It's Wednesday, July 8th. Robert, you live in a corner of the Times, the data team, that I'm not sure most people understand all that well. So when the pandemic starts, how do you all respond? So by training, my goal is to find stories that can best be told through data, which is not every story, but there's a lot of
Starting point is 00:01:00 stories out there. So if you go back to early March, the pandemic is starting. And I know that, you know, our job as The New York Times is to really get our arms around what's going on. And by that, to start collecting the data that is starting to come out about cases and deaths around the country. So my colleagues set up a team of people across different departments whose primary job would be to monitor all the states, all the major counties, and gather the information and start to build a database. Start to say, we're getting information from New York over here
Starting point is 00:01:47 and California over here, but let's put it into one database just for the purpose of tracking where the cases were, where the deaths were. You're saying it's not coming out on a national level. There's no big clearinghouse that's going to hand you data every day about exactly where the virus is all across the country. Correct. And at that point, we assumed that some kind of federal system may be in the offing, but we weren't going to wait for it. And part of
Starting point is 00:02:20 our report every day, you'll see on our website, are maps showing where the cases are, where new cases are, where deaths are, where the new hotspots are. That all emanated from these early days of creating this ground-level system for being able to collect this data. And I wonder if you can take me into the process of that a little bit. I mean, what does it look like? Where exactly is the information coming from? Well, it's really like a hive of activity. I mean, that's the way I like to think of it. You have, at any given time, a team of clerks, reporters, editors,
Starting point is 00:02:58 all assigned to monitor what gets announced in various parts of the country. So at one moment, you could have somebody wrestling with new data that was put out by California and trying to get it into a format that matches our data standards. And you could have somebody in Mississippi confused about whether the new data announced is cumulative or is it new cases for the day. And often that involves basic reporting of going back to the state and asking questions. Then while all this is going on and people are collecting this data, we have other people trying to put the data into context. It's, you know, truly this whole new full-time operation just devoted to trying to track what is really happening with the pandemic
Starting point is 00:03:49 and to do some surveillance on the national picture. Right, this sounds very tedious, incremental, gathering up tiny bits of data, cleaning it, making sure it all lines up. Not sexy. It is not sexy at all. When you're a data journalist, the fun part is doing what we call the queries,
Starting point is 00:04:19 asking questions of the data and seeing what it shows. But we all know job one is to make sure your data is good. Otherwise, the questions you ask won't mean anything. Hmm. And what do you begin to learn through this data? Right. Part of what my personal job is to do is to look at this data and try and help understand what it tells us.
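One of the cleaning questions mentioned above, whether a state's newly announced figures are cumulative totals or new cases for the day, can be illustrated with a minimal Python sketch. The numbers and function names here are made up for illustration and are not taken from the Times database; the monotonicity check is only a heuristic and would not replace going back to the state and asking.

```python
def looks_cumulative(series):
    """Heuristic check: a cumulative case count should never decrease."""
    return all(later >= earlier for earlier, later in zip(series, series[1:]))


def daily_from_cumulative(cumulative):
    """Convert running totals into new cases per day."""
    daily = []
    previous = 0
    for total in cumulative:
        daily.append(total - previous)
        previous = total
    return daily


# Hypothetical figures for one state over four days (not real data).
reported = [10, 25, 45, 70]

if looks_cumulative(reported):
    print(daily_from_cumulative(reported))  # [10, 15, 20, 25]
```

A series that drops from one day to the next fails the check, which is a hint that it is either a daily count or contains a correction that needs a follow-up question.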
Starting point is 00:04:42 So, for example, one of the early findings we had when we were looking at the pandemic in March was it seemed to be hitting mostly in big cities. And New York, New Orleans, Detroit. Seattle. Seattle. It seemed to be in places with a lot of population density. But there was also another class of place that seemed to be popping up, and it was resort counties, places with ski resorts. And so that led us to this insight that it wasn't just population density, that there were other possible explanations for why places got hit. Then as the weeks went on, we began to see the fill-in, what I call a fill-in, which is there were all of these new counties that were starting to get cases. And so by having this
Starting point is 00:05:41 record, what we were able to then report is there are now hundreds of rural counties getting their first cases. How were they preparing and how were they talking to people? Then another thing we've been monitoring is there seems to be this ideological difference, or at least there has been, about how serious a problem is it? How soon should government reopen or allow businesses to reopen? Right, kind of a red state, blue state divide over shutting down and reopening. Right.
Starting point is 00:06:17 But our reporting showed that there was this additional element involved, which was for the first six to eight weeks of the pandemic, there were hardly any red counties with high infection rates. And most of the hard hit places were in blue counties. And so we were able to raise the specter of if you live in a place that doesn't have firsthand experience with the virus. You don't have your emergency rooms being overflowed. Maybe that also contributes to your belief that, you know what, we should open the economy. This is not worth shutting down the economy for. And all of these types of stories are,
Starting point is 00:07:00 again, driven by the idea that in the first place, we had good county-level data that we couldn't get anywhere else, that allowed us to look at the world through these different prisms and ask different questions about how the pandemic was playing out. You're laying out clear examples of why data like this is important and what it lets us understand, but I'm curious what the limitations of this kind of a database are. What does it not tell us? Yeah, so think of it this way.
Starting point is 00:07:31 A data set, we think of like any other source that we're going to interview. And we think of, what might this source be able to tell us about something? And so we think of questions that we're going to ask the source. So the problem became, we had this data set and we knew where the cases were and the deaths were, but we couldn't ask it any other questions. We couldn't ask, who were the people actually becoming infected in these counties? Were they old? Were they young? Were they rich? Were they poor? Were they frontline workers? Were they white? Were they black? Were they Latino? So all these questions we had, we couldn't really ask the
Starting point is 00:08:11 data set we had. So what did you end up doing? So along the way, we learned that the CDC actually had some information that would be helpful in this, in that every time a person was confirmed to have a coronavirus infection, the local health agency would fill out a report that would have characteristics of the case, the person, the age, the race. And the form actually asked dozens of questions. You know, was the person at work?
Starting point is 00:08:49 Was the person staying home? What were the symptoms? And that these forms ultimately ended up at the CDC. And if we could get our hands on this data, we could ask a lot more questions about how this pandemic is playing out. And so we decided to approach the CDC and request access. And here's why we needed that data. So many people in this country are getting sick.
Starting point is 00:09:25 So many people are dying. And our job is to try and explain who is it that is getting sick, who is dying, and why. And if we had any chance of getting answers to those questions, we need the best data. And if the CDC had that data, we wanted to get a copy ourselves. And so how do you go about trying to get it? Well, in this case, we ended up suing them. We'll be right back. So Robert, why did the New York Times sue the CDC?
Starting point is 00:10:30 So suing the CDC sounds very dramatic. But in fact, many, many times in the course of a year, we go to court to establish our rights to get public information. It's somewhat more routine than most people would realize. And sometimes it's because the government out and out refuses to give up the information. But in this case, it was more to do with the timing. Without going to court and putting pressure on the agency, we were looking at the prospect of waiting months to get our hands on this information. But by going to court, it sort of put the clock on, and we had the agency's full attention. And so what ends up happening once this clock is ticking and a judge is looking over the shoulders of the CDC? So the CDC tells us that they will comply. They just need to do a little more research as to what they can possibly produce,
Starting point is 00:11:22 taking into consideration the privacy of people who are in the database and stripping out personally identifiable information. But ultimately, the day comes where they say, okay, New York Times, here is a database of 1.45 million cases that we have collected from state and local authorities. And we were then free to have a new interview subject and be able to ask it a whole lot more interesting and detailed questions. Right. I mean, this quite literally sounds like the motherlode of data on this pandemic in the United States. Well, in many ways it was.
Starting point is 00:12:07 What we were able to see from this was detailed information about individuals who had become infected and died. And for each individual, we were able to look at their age, the county they lived in, their race, and their ethnicity. And that is far more information than we had before. And in the end, we ended up being able to break down cases for nearly a thousand counties covering more than half of the U.S. population. And this number, 1.5 million Americans, how big a proportion of all cases of the virus is that?
Starting point is 00:12:52 So for the time period covered by the data, it was all cases through the end of May. It was about 88% of all cases that we had some information about. So when you get this massive data dump, what do you do? What do you find? So when we finally had our hands on this data, we were checking what types of information were included, how complete the information was, and just looking at the data many different ways to see what it could tell us. And eventually, three main trends emerged.
Starting point is 00:13:42 And so what were those trends? So the first was just how pervasive the racial disparity was with this pandemic. Whatever knowledge people had that African Americans and Latinos were becoming infected at a higher rate, a lot of that was tied to big cities that had released data. But what we found is that this racial disparity pervades everywhere, whether you go from cities to suburbs, even into rural places.
Starting point is 00:14:13 In fact, any place we found where there was a significant African-American population, for almost all of them, African-American infection rates were higher than the rate for whites. Same thing with Latinos. Any place we found where there was a significant Latino population, for almost all of them, the infection rate was higher for Latinos. Hmm.
Starting point is 00:14:37 The second big takeaway is what is driving these racial disparities. So most of the earliest explanations of the racial disparity were focused on death rates. And one of the explanations for the disparities in death rates that's commonly offered is something called comorbidities. The idea that African Americans might be dying at a higher rate because they were more likely to have preexisting conditions or to be in poorer health to begin with. But in our analysis, we focus mostly on the actual infection rates. And the reason for that is that gets us out of the question of whether comorbidities is driving it and puts us more on the question of who is most at risk to become infected in the first place.
Starting point is 00:15:35 And so when we see disparities in the infection rates, we can then raise the question of why are people in certain groups more likely to become infected? And that led us to looking at where do people work? Where do people live and what is their housing situation? And if you look at where people work and look at what the data shows, it shows that African Americans and Latinos in the U.S. are far less likely to have the kind of job where you can do it at home. They are more likely instead to have a job in the production sector, in a factory, or in the service sector. All of that combined would
Starting point is 00:16:19 increase your risk of becoming infected. And with housing, what we found is that Latinos in particular are far more likely to live either with more people in the household or with less space in the household, both of which would also increase the odds that a person might become infected. So the second discovery very much helps understand the first. There are kind of structural issues around how Black and Latino Americans work and live that contribute to this racial disparity in the pandemic. That's correct. And the third takeaway from this is what you learn by looking at the pandemic through the prism of age. Right now,
Starting point is 00:17:09 most of what we know about the disparity is all cases of people of all age groups, and that's how the rates are calculated. But if you realize something about this pandemic, it's that older people are far more likely to get sick and die. Right. And in the U.S. right now, the older population is very disproportionately white, non-Hispanic. So if you don't account for age, you're by definition almost understating the disparity. So what we did, what some epidemiologists call age adjusting, is looked at infection rates across age groups. And when you look at, say, what the infection rate is for people who are in their 40s or in their 50s, the disparity is much bigger than you'll ever see
Starting point is 00:18:07 in numbers without age adjustment. So when you accounted for the fact that so many older people have died from the coronavirus and that the older population in this country skews white, you found that the racial disparity actually gets even greater. Correct. In fact, if you look at some of the younger age groups,
Starting point is 00:18:28 the death rate for Latinos is about 10 times higher than that for whites. Now, the caveat to that, of course, is you're much, much less likely to die at those age groups, but it's still, among the people who do die in those age groups, it's very heavily Black and Latino. Mm-hmm. I mean, these insights, once again, seem to highlight just how important it is to have this kind of information.
Starting point is 00:18:55 Because from what you're saying, we have been, in some sense, misunderstanding the racial disparities of this virus, the causes of the racial disparities, because we haven't had access to this data. Well, at minimum, you could say we didn't know the extent to which these problems existed. And getting data like this helps us sort of define what the ground truth is about how this pandemic is playing out. That being said, there's still a lot more that we would like to know.
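The age-adjustment point made above can be shown with a small Python sketch. All numbers are invented for illustration, and the groups "A" and "B" are hypothetical; the transcript says only that the Times compared infection rates within age groups, so the direct standardization step below is a common epidemiological technique swapped in here as an assumption, not a description of the Times's method.

```python
# Hypothetical populations and case counts by age band (not real data).
# Group A skews older; group B skews younger.
populations = {
    "A": {"under_65": 600, "65_plus": 400},
    "B": {"under_65": 900, "65_plus": 100},
}
cases = {
    "A": {"under_65": 6, "65_plus": 20},
    "B": {"under_65": 27, "65_plus": 10},
}


def crude_rate(group):
    """Cases divided by population, ignoring age entirely."""
    return sum(cases[group].values()) / sum(populations[group].values())


def age_specific_rates(group):
    """Cases divided by population within each age band."""
    return {
        band: cases[group][band] / populations[group][band]
        for band in populations[group]
    }


def standard_weights():
    """Share of the combined population that falls in each age band."""
    bands = populations["A"].keys()
    totals = {band: sum(populations[g][band] for g in populations) for band in bands}
    grand_total = sum(totals.values())
    return {band: totals[band] / grand_total for band in bands}


def age_adjusted_rate(group):
    """Direct standardization: weight each age-specific rate by the standard population."""
    weights = standard_weights()
    rates = age_specific_rates(group)
    return sum(weights[band] * rates[band] for band in weights)


crude_ratio = crude_rate("B") / crude_rate("A")
adjusted_ratio = age_adjusted_rate("B") / age_adjusted_rate("A")
print(f"crude ratio B/A:    {crude_ratio:.2f}")
print(f"adjusted ratio B/A: {adjusted_ratio:.2f}")
```

In this made-up example, group B's rate is higher than group A's in both age bands, but because group A has more older people, who are infected at higher rates, the crude comparison shrinks the gap. Weighting both groups to the same age mix makes the larger disparity visible.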
Starting point is 00:19:29 The database had 1.45 million records, and it had, for each record, more than 100 columns or 100 pieces of information. Most of those were blank. And that leaves us in the dark about a lot of questions that we'd like answered, like how many people are contracting the virus at work or how many are getting it from traveling or being at bars. So still a lot of room for improvement. And hopefully knowing what can be done, the power of having this data to answer questions will help inspire the CDC to collect the information better. Mm-hmm.
Starting point is 00:20:07 And perhaps release it more quickly. I have to think that suing the CDC, getting this data, and reporting out these insights on race has increased pressure on the federal government to make this information more available. Is that true? I would like to think so. There is still some mystery as to what will ultimately happen.
Starting point is 00:20:32 Our case is still pending. The status is the CDC at this point believes they satisfied our request. Our lawyers are still investigating whether or not there was more information that should have been released or more types of information. And once that is resolved, the question will be, what does the CDC do going forward? And a lot of people in reaction to the story that published were asking me, do you think they'll just start posting this on their own? And I would think that whether or not the information is complete, it's still better than anything else out there. And so hopefully we will see more of this type of information made public.
Starting point is 00:21:20 That would definitely be beneficial to not just us, but to researchers around the nation and the world to have access to more complete and better information. But until that happens, we're going to keep doing what we've been doing. We're going to go out every day, go to every, and collect data on coronavirus cases and deaths. Rob, thank you very much. Thanks, Michael. On Tuesday, the latest updates to the Times database found that the virus has infected more than 3 million Americans and has killed more than 130,000 deaths, including 65,000 in Brazil, where the country's president, Jair Bolsonaro, who has repeatedly downplayed the pandemic and avoided wearing a mask,
Starting point is 00:22:36 announced that he had tested positive for the virus. We'll be right back. Station, this is Houston. Are you ready for the event? Hello, Houston. We're ready for the event. 38 days ago, NASA and SpaceX launched two U.S. astronauts into space on a mission to the International Space Station, where they joined a fellow American. It was the first time that a manned spacecraft has left American soil in nearly a decade. The New York Times, this is Mission Control Houston. Please call station for a voice check.
Starting point is 00:23:31 On Tuesday, I spoke with the three U.S. astronauts now aboard the space station. Hello, New York Times. New York Times, this is the International Space Station. How do you hear us? Bob Behnken and Doug Hurley, who arrived a few weeks ago, along with Chris Cassidy, who has been there since April. We hear you loud and clear. How do you hear us? We hear you loud and clear as well. Good afternoon. Welcome aboard, and we're happy to talk to you.
Starting point is 00:23:59 Of course, their time in space is precious, and so NASA gave us six minutes on the dot. If I might boldly call you by your first names, Doug, Chris, and Bob, thank you very much for making time for us. I wonder if you can start by telling us exactly where you are in space relative to us right now. Well, while I kick things off, Bob's going to pull up our mapping program. Right at the moment, we didn't have it on the computer.
Starting point is 00:24:32 Sorry about that. But we're orbiting 250 miles above the Earth, and it looks like we are abeam of Baja California, just a little bit out into the Pacific Ocean. So over America, the U.S.-Mexican border. Right, yeah, we're just over the Pacific Ocean. We're just past California, heading south. If you'll indulge me for a minute,
Starting point is 00:24:55 I want to talk a little bit about feelings. Knowing I was going to be talking to you, I have been thinking a lot about this moment back on Earth and wondering, with so much turmoil here, and you looking down on all of it from such a distance, what that feels like to look down on a planet that's truly in the midst of
Starting point is 00:25:17 some really challenging, tumultuous times. Well, it certainly is challenging to hear either by secondhand or when we get the opportunity to see some news up here, all the turmoil that's going on, the challenges with the pandemic and the strife in the cities and all the different challenges that people are going through on a day-to-day basis. It is, you know, emotionally, it does take a toll on us, certainly. And I think the other thing that really resonates with me personally is just when you look out the window, when you see the planet below, you don't see borders, you don't see the strife, you see this beautiful planet that we need to take care of. And, you know, hopefully as technology advances and as this commercial space travel gets going, more people will get that opportunity. Because I think if you get the
Starting point is 00:26:14 chance to look out the window from space and look back on our planet, it will change you. It will change you for the better. And you'll realize that this is one big world rather than all these different little countries or cities or factions that we have on the planet. And I think it will make it a better place. Well, that's really interesting. And I wonder if you could say a little bit more about that. Because in the time since I believe you've all last been in space, there actually have been changes on Earth. You know, major ice shelves have broken off in Antarctica. Huge fires have swept across Australia, California. The Great Barrier Reef has essentially died.
Starting point is 00:26:57 Well, I think one of the things that we see from up here is that the Earth is not a stagnant place. It continues to change, whether it's a fire, whether it's the seasons, whether it's different things happening further out. You know, we just saw a comet become visible in the pre-dawn era. So it's definitely a lot of things happening with the Earth and that continuous change. I have to apologize. Now I need for you to tell me what it means for a comet to become visible in the pre-dawn era and what that actually looks like. The comet that I'm referring to was really close to the sun. And so it needed to get far enough
Starting point is 00:27:43 away from the sun that we could actually look at it and see its dim little light that was visible in darkness, but kind of blinded by the sun, if you will, if you look too closely at it. And so if we got to a situation at dawn, right before the sun came up, that comet became visible during that short period of time when it was still close to the sun, but the sun was still hidden by the earth.
Starting point is 00:28:05 It was just an awesome sight to be able to see and something that we try to capture in the few moments that we do have to look out the window, we try to capture those changes, capture the exciting things that we can see to try to share our view with, you know, the folks back home,
Starting point is 00:28:25 the folks that are still down on Earth and just try to give them an appreciation for, you know, just how beautiful our planet is and how important it is that we do our best to take care of it. But in terms of that turmoil... This is Houston ACR. That concludes the New York Times portion of the event. Please stand by for a voice check from Fox News. Thank you all. We appreciate it. Bill Hemmer with Fox News. How do you hear me? This is Fox News. How do you hear me? Hi, Bill. Loud and clear. Welcome to the space station. Excellent.
Starting point is 00:29:07 Thank you. That's it for The Daily. I'm Michael Barbaro. See you tomorrow.