This is a tale of two companies, TeleTracking and [cue ominous music] Palantir, and their roles in data processing for the United States Department of Health and Human Services. For TeleTracking, I’ll do an update and a happy dance (because my contrarian, unpopular stance in July’s “Hysteria Ensues as Trump Administration Orders Hospitals to Send COVID-19 Data to HHS, not the CDC” proved out, insofar as anything can be said to have proved out these days). For Palantir, the alpha dog of Surveillance Valley, I’ll do a quick summary of their role at HHS, and add a little speculation. But first, lest anybody doubt that data management in the United States during Covid needed to be jolted out of its Third World-level status, I’ll take a quick look at public health agencies and CDC, states, and hospitals (along with some entertaining hospital whinging). Then I’ll look at TeleTracking, and then at Palantir.
First, CDC data. From Politico, “Virus hunters rely on faxes, paper records as more states reopen”:
“Public health departments are unable to share data on cases, persons under investigation, laboratory tests and person-to-person transmission with the CDC seamlessly — instead they are forced to rely on a combination of methods: ,” a group of nine senators led by Richard Blumenthal (D-Conn.) wrote to Senate leaders…. [D]isease trackers say they’re drowning in paper reports and using outdated spreadsheets for critical tasks like contact tracing, or determining how many people were exposed to an infected individual. “Our ability to do the detection work we need to do is hampered,” said Raquel Bono, the coordinator of Washington state’s coronavirus response. “We don’t have a single data repository for tracing per se,” she said, adding that record-keeping and reporting is “primarily manual.” That’s playing out nationwide. And even when officials can tap data, like cell phone location tracking, they can’t connect the dots for an up-to-the-minute picture of disease spread. So they comb over unconnected, at times incomplete, bundles of information — including health provider reports of symptoms like respiratory distress or lab test results.
Hence, the enormous HHS effort to update and centralize Covid-19 data collection.
Frieden is now president and CEO of Resolve to Save Lives, an initiative designed to prevent epidemics and cardiovascular disease, which recommended Tuesday that states release 15 categories of information deemed “essential” to understanding the pandemic.
The categories include things like a rolling average of new cases and deaths, hospitalizations per capita, testing turnaround time, number of contacts of infected people traced within 48 hours, and percentage of people wearing masks in indoor settings such as stores and on mass transit. No state provides all 15 categories of data, Frieden said in a video call with reporters.
Hence, the enormous HHS effort to update and centralize Covid-19 data collection.
The federal government is preparing to crack down aggressively on hospitals for not reporting complete COVID-19 data daily into a federal data system, according to internal documents obtained by NPR.
The draft guidance, expected to be sent to hospitals this week, also adds new reporting requirements, asking hospitals to provide daily information on influenza cases, along with COVID-19. It’s the latest twist in what hospitals describe as a maddening flurry of changing requirements as they deal with the strain of caring for patients during a pandemic.
Allow me to pause and admire the whinging. First, good data is caring for patients. Analysts are worried about the effects of covid during the flu season. So how do we track those effects without any data? Second, are the hospitals really worried about the “strain” on health care workers? Or are they worried that their billing departments might be diverted from upcoding to data entry with an actual health care purpose? More:
The Centers for Medicare & Medicaid Services rule enforcement is a blunt tool that could be intended to fulfill HHS’ determination to get each and every hospital to report. As part of the justification for the switch to the new system, then-HHS chief spokesperson Michael Caputo complained that the
If the draft enforcement guidance went into effect today, around three-quarters of hospitals could be subject to receiving a warning. according to an internal CDC presentation given at a daily pandemic response meeting on Wednesday, obtained by NPR.
How are we supposed to fight a pandemic with no data? And a lagniappe [*chef’s kiss*] from Health Care IT News:
A draft letter from the agency would require hospitals to report data, including the number of COVID-19 patients, “for all seven days, including weekends.” Failure to do so after multiple warnings “will result in a termination of the Medicare provider agreement,” according to NPR.
Data on the weekends?!?!?! Oh, the humanity!
Hence, the enormous HHS effort to update and centralize Covid-19 data collection. Needless to say, I don’t love Trump. But the situation at the CDC, State, and hospital levels was beyond absurd. Somebody had to fix it, and the Democrats didn’t do that on their watch. (One might also note that it’s reasonable to think that a national single payer system would have eliminated many of these difficulties by centralizing administration. That option, however, was foreclosed by the Obama administration back in 2009.)
First, let’s look at TeleTracking. When the data collection contract for HHS Protect was issued to TeleTracking, there was a good deal of concern that this was done so that the Trump administration could manipulate the data. I summarized the controversy at the time, back in July, and concluded:
First, from the perspective of somebody in the trenches trying to move data, descriptions of the CDC’s dataflow raise red flags.
Second, the TeleTracking System is better than [CDC’s] NHSN at handling data…. If you want to interchange COVID-19 data electronically between many organizations running different systems, a formal and machine-readable definition of the data fields is the way to go (as opposed to human-readable documentation).
And in footnote :
 The current moral panic is that the Administration will use HHS Protect to jigger the data, as Florida seems to have done. For one thing, the use of a schema means more transparency, not less. For another, there are too many eyes on the dataflow. For a third, both Big Pharma and the hospitals would be very unhappy were revenue to be taken away from them via undercounts, and they would be very willing to share their unhappiness with others. And for a fourth, CDC is a branch of HHS, hence part of the executive branch. If Trump wants to damage the dataflow, he doesn’t have to set up a new system to do it; he can use the one he already controls.
For the first two points, HHS is certainly behaving as if they were true. For the manipulation point in footnote , the moral panic does seem to have been wrong, now that we have more data. (I was sticking my neck out at the time!) To see this, we can look to statements from TeleTracking executives, the results of a recent House hearing, and technical aspects of the HHS Protect datastore (especially in contrast to the CDC’s).
“Unequivocally zero,” Christopher Johnson, president of TeleTracking Technologies in Pittsburgh, said in an exclusive interview with MedPage Today. “It’s been very, very clear since the beginning that the goal has always been transparency…. A lot of the raw data is being published and it’s clearly traceable. There’s been no indication, no intent, no inkling of that, at least from my perspective. I have zero question about the integrity, ethics, or moral fiber of the people I’ve encountered.”
Of course, Mandy Rice-Davis applies, and if Johnson’s statement were anything other than self-serving, he shouldn’t be an executive. That said, his statement is remarkably free from weasel wording and equivocation, so make of it what you will. (We will discuss the “traceable” issue below.)
Subcommittee Chairman Bill Foster (D-Ill.) said he is also concerned about ensuring the integrity of the data, particularly after the Trump administration redirected all COVID-19 related hospital data from the CDC’s National Healthcare Safety Network to the Department of Health and Human Services (HHS) TeleTracking system or to individual state health departments, which would then funnel the information to HHS.
[Dr. Lisa Maragakis, MD, MPH, Senior Director of Infection Prevention, Johns Hopkins] said she was most concerned about “irregularities” and “inconsistencies” in the data, particularly because CDC officials are no longer validating the information in the new system before it’s seen elsewhere, including by White House policymakers.
As you can see, the MedPage headline is deceptive. The only person who raises the “manipulation” question is Bill Foster, the Subcommittee chair, who was a businessman and physicist, not a “public health expert.” Presumably the public health expert, Maragakis, would have used the word “manipulation” had she meant it. As for “irregularities” and “inconsistencies” in the data, it’s not clear what their source would be (and CDC is by no means blameless in that regard; see footnote ). If the source is the hospitals, then yes: garbage in, garbage out. However, given that the data is transparent and publicly accessible (and nobody at the hearing is saying it isn’t), it can be cross-checked by anyone: Johns Hopkins, for example, which has been using the data for some months without complaint.
With the new system, HHS also has the ability to verify everything that has been done to the original data submitted on the platform, ensuring the validity of the data and that the single sources of truth exist within one secure location.
“We’re using a hashing technology with a time stamp that literally records every activity that occurs as it relates to that data so that you can revert to the raw format with which we received the data,” [then-HHS CIO Jose Arrieta] explained. “Within every single second, we can track curation, parsing, sharing and who’s accessed the data.”
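Arrieta doesn’t say which technology this is, so what follows is only a minimal sketch of the general idea, assuming a hash-chained, append-only log: each entry records an action, a time stamp, and the SHA-256 hash of the previous entry, so altering any earlier entry breaks the chain from that point on. The `AuditLog` class, its field names, and the sample actions are hypothetical, not anything taken from HHS Protect.

```python
import hashlib
import json
from datetime import datetime, timezone


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON rendering of an entry body."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class AuditLog:
    """Hypothetical append-only log; every entry is chained to the one before it."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self):
        self.entries = []

    def append(self, actor: str, action: str, payload: dict) -> dict:
        body = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "action": action,  # e.g. "received", "parsed", "shared"
            "payload": payload,
            "prev_hash": self.entries[-1]["hash"] if self.entries else self.GENESIS,
        }
        entry = dict(body, hash=_digest(body))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; False means some entry was altered or reordered."""
        prev = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


log = AuditLog()
log.append("hospital-123", "received", {"icu_beds_occupied": 17})
log.append("analyst-7", "parsed", {"icu_beds_occupied": 17})
print(log.verify())  # the untouched chain verifies

log.entries[0]["payload"]["icu_beds_occupied"] = 3  # tamper with the raw submission
print(log.verify())  # the tampering is detected
```

A log like this does not prevent anyone from changing the data; it only makes a change impossible to hide from somebody who re-runs the check, which is the property that matters for the “traceable” claim.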
Fourth and finally, it seems that manipulation is only bad when the wrong people do it. From the House hearing charter, “Data for Decision Making” once more:
Adding to some experts’ mistrust of TeleTracking data is that unlike with the CDC NHSN system, administrators cannot correct or update errors in data inputs retroactively.
One might not wish to call “retroactive” data “updates” manipulation, but the capacity certainly opens the door to that. (As Econintersect says of the leading indicators: “Most of the leading indicators are based on factors that are known to have significant backward revisions – and one cannot take any of their trends to the bank.”)
Now let us turn to the second vendor, Palantir. This article — which has nothing to do with TeleTracking — from the Center for Public Integrity is getting virtually no play at all, which seems odd to me. From “New, secretive data system shaping federal pandemic response,” the state of play that led to HHS Protect. At the start of this post, we listed the horrid state of pandemic data, and here is what the government has been trying to do about it:
The U.S. government knew for more than a decade it needed a comprehensive system to collect, analyze and share data in real time if a pandemic reached America’s shores. The 2006 Pandemic and All-Hazards Preparedness Act directed federal health officials to build such a system; in 2010 the Government Accountability Office found that they hadn’t. A 2013 version of the law required the same thing; in 2017 the GAO found again that it hadn’t happened. Congress passed another law in 2019 calling for the system yet again.
Comes Covid-19. In short form, CDC butchers the job, and HHS takes over:
“Our goal was to create the best view of what’s occurring in the United States as it relates to COVID-19,” said Arrieta, a career civil servant who has worked for both Republicans and Democrats, speaking for the first time since his sudden departure from HHS in August. He said, and a friend confirmed, that he left his job primarily to spend more time with his young children after months of round-the-clock work. “It changes public health forever.” Through HHS Protect, we have access to hospital-specific data, like inpatient bed utilization, ICU bed utilization, percentage of inpatient beds occupied by COVID-19 patients, and number of COVID-19 cases. We also have insight into the supply chains of large healthcare distributors. By integrating this data together into one system, we can help federal, state and local leaders make strategic decisions and maximize resources.
That’s good data.
I’m going to skip over the “secrecy” aspect of the Center for Public Integrity story. For one thing, it’s not an issue with the HHS Platform as such. For another, many of the “secrecy” accusations in the article are really about not being able to find stuff; that’s a site design/search issue that will come out in the wash, or can be made to. Instead, I’m going to go to what I feel is the real problem [cue ominous music]: Palantir, the vendor:
[HHS CIO Jose] Arrieta’s team assembled the [HHS Protect] platform from eight pieces of commercial software, including one purchased via sole-source contracts worth $24.9 million from Palantir Technologies, a controversial company known for its work with U.S. intelligence agencies and founded by Trump donor Peter Thiel. CDC used the Palantir software for both the HHS Protect prototype and DCIPHER, and it works well, Arrieta said; contracting documents cited the coronavirus emergency when justifying the quick purchase.
(One wonders why Palantir, with its intelligence connection, had to be the vendor; perhaps the United States is more like the collapsing Soviet Union than we would like to think, in that the most competent people were to be found in the organs of state security.) Vox makes the Surveillance Valley origins of Palantir clear:
[T]he CIA was one of Palantir’s earliest investors through its venture capital arm, In-Q-Tel (yes, the CIA has a venture capital arm). It was Palantir’s only customer for years as the company refined and improved its technology, according to Forbes. By 2010, Palantir’s customers were mostly government agencies, though there were some private companies in the mix. Having managed to quietly work its way toward a $1 billion valuation, it was then one of the most valuable startups in Silicon Valley. By 2015, Palantir was valued at $20 billion. Despite its high valuation and lucrative contracts, however, Palantir’s financial documents show the company has never made a profit….
Odd. Perhaps they’re getting a subsidy from somewhere. Here are EFF’s concerns with how Palantir has implemented HHS Protect.
HHS issued two new Systems of Records Notices (SORNs) about [HHS Protect]. The federal Privacy Act requires federal agencies to issue SORNs to advise people about personally identifiable information that the government maintains about them.
Unfortunately, HHS Protect poses a grave threat to the data privacy of all Americans. As set forth in the SORNs, it would greatly expand how the federal government collects, uses, maintains, and shares all manner of personal information. We highlighted the following ways that HHS Protect would substantially burden privacy without a necessary or proportionate benefit to protecting public health.
New data collection. The SORNs would allow collection of personal information about physical and psychological health history, drug and alcohol use, diet, employment, and more. [although Palantir claims it would be]. Data would be collected not just about people who test positive, but also about their family members, as well as people who test negative, and perhaps people who have not tested at all. Data would be collected from countless different sources, including federal, state, and local governments, their contractors, the healthcare industry, and patients’ family members.
New data sharing. The SORNs would allow sharing of these vast sets of data with additional federal agencies, unspecified outside contractors, and even “student volunteers.” These additional federal agencies would be allowed, in turn, to share the data with their contractors. Patient consent would not be required for this sharing.
New data use. The SORNs would allow use of this data in litigation and “other proceedings” whenever the federal government has “an interest” in them (such use now is allowed only when HHS is a defendant in litigation).
New data storing. The SORNs would allow permanent retention of data with “significant historical and/or research value” (retention now is limited to four years).
As a data guy, I can see a medical use for all of those fields. “Trust the science,” which needs those fields! For example, will it help us to avoid ruin in a pandemic to know the effect of diet on Covid-19? I think it will. Do we need to know about whether children infect family members? Again, we do. Do we need to know about employment, say meatpacking? Again, we do. (Of course, I’m assuming that we can get to actionable medical decisions more quickly with this data approach, rather than from epidemiological studies in the field. That’s a dangerous assumption for a data guy. Bad to fall in love with the technology!)
Clearly, as a data collection system, HHS Protect is vastly superior to anything CDC was doing or could do. And some of the EFF issues, like data sharing, can be addressed by legislation.
What concerns me most, however, is that the data will not be de-identified. For example, “Person A in hospital B of age and sex whatever with symptoms X, Y, and Z and treatment protocol Ω” would not be that difficult to identify, especially if the “geospatial records” went down to the wing or even the room level, even if their name and SSN weren’t fielded.
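To make the re-identification worry concrete, here is a minimal sketch, with entirely made-up records and hypothetical field names, that counts how many records share each combination of “harmless” fields. Any combination that occurs only once points at a single person, name or no name (this is the idea behind k-anonymity).

```python
from collections import Counter

# Made-up records: no name, no SSN, only the kind of fields described above.
records = [
    {"hospital": "B", "wing": "East", "age": 67, "sex": "F", "symptom": "fever"},
    {"hospital": "B", "wing": "East", "age": 67, "sex": "F", "symptom": "cough"},
    {"hospital": "B", "wing": "West", "age": 34, "sex": "M", "symptom": "fever"},
    {"hospital": "C", "wing": "East", "age": 34, "sex": "M", "symptom": "fever"},
]


def smallest_group(rows, quasi_identifiers):
    """Return k: the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[f] for f in quasi_identifiers) for row in rows)
    return min(groups.values())


# Coarse fields: people can hide in a group.
print(smallest_group(records, ["sex"]))  # 2
# Add hospital, wing, and age, and someone is alone in their group.
print(smallest_group(records, ["hospital", "wing", "age", "sex"]))  # 1
```

The finer the location and demographic fields, the smaller the groups get, which is why room-level “geospatial records” would be worrying even with the obvious identifiers removed.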
Of what use would such information be to Palantir? Speculating freely, Covid-19 is becoming a pre-existing condition for tens of millions of Americans. Data that identified individuals who’d had covid, therefore, would be of enormous value to health insurance companies, since they could play very profitable adverse selection games with it. Palantir could either sell the data outright, or contract with health insurance firms to teach them how to mine the data on their own.
* * *
I don’t see how we fight Covid successfully without collecting and managing data on the scale of the HHS Protect project. I also think we should regard Covid as a “dry run” for a virus and a pandemic that’s really lethal, and that makes systems like HHS Protect even more necessary. Still, it would be nice if we didn’t have to turn into the neoliberal equivalent of the People’s Republic of China in terms of surveillance to get to that point. Unfortunately, neither party will raise that issue in time for the election; Trump, because HHS Protect is needed to decide on vaccine distribution, if and when that happy time comes [ka-ching]; Biden, because Democrats are so closely tied to Surveillance Valley and Palantir in particular. All we can do, at this point, is watch and try to focus on what can really harm us (moral panics not included).
 I’m sick of CDC hagiography, which is heavily politicized (CDC is taken as a proxy for “trust science”) and so thickly laid on it approaches bad faith. From The Atlantic: “We’ve learned that the CDC is making, at best, a debilitating mistake: combining test results that diagnose current coronavirus infections with test results that measure whether someone has ever had the virus. The upshot is that the government’s disease-fighting agency is overstating the country’s ability to test people who are sick with COVID-19.” And from The Atlantic: “[T]he CDC botched its own test development. It sent testing kits to state public-health labs with a nonfunctioning ingredient. And by then, the virus was already spreading.” Of course, under neoliberalism, public health isn’t really a thing, so CDC has been cut and cut and cut, along with public health funding at the state level. Nevertheless, test manufacturing and test results are central to mission for CDC, and I can’t think of one person — readers, please correct me — who foresaw either unforced error; certainly not in official Washington or the press.
 HHS Protect didn’t collapse on launch, either, unlike the Obama administration’s signature project, the ACA Marketplace.
 The hearing charter includes this material, not relevant to data manipulation, but relevant to hospital data gathering:
This new requirement and abrupt timeline…
It’s a pandemic ffs!
… placed significant stress on hospitals. Pivoting to a new system required hospital administrators to learn how to use an entirely new database….
Oh the humanity!
…with many datapoints that had not been previously requested by the CDC.
Yes, the CDC did not request enough datapoints. And more datapoints reduce risk for patients ffs (ceteris paribus of course, but again, it’s a pandemic).
Experts who spoke with Committee staff estimated that the new system asked for approximately 50 percent more datapoints.
Furthermore, multiple experts expressed that the terminology used in the TeleTracking system was ill-defined, leading to confusion over what exactly was being requested. When hospitals sought clarification on these terms, they were unable to reach experts at TeleTracking or HHS. Experts who spoke to Committee staff had not seen any updated guidance documents clarifying these terms issued by HHS since the July 10 announcement.
First, the HHS Protect schema, as I show here, was based on the original CDC schema, so if the CDC fields were defined, then most of the formal schema was defined, too. Second, “issued by HHS” is doing a lot of work. Did anybody check with the contractor? It is extremely hard for me to believe that Dan Brickley, who is absolutely tops in his field, defined a schema without documenting its fields. Third, everybody hates documentation and screwing it up is sadly normal. All this will come out in the wash, given that this is a fast-moving process. In summary, all this has the stench of bureaucratic infighting and whinging by hospitals looking for excuses not to get with the program. I would want to know who these “experts who spoke to Committee staff” were. Was there nothing in writing?
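For readers who haven’t worked with one, here is a minimal sketch of what a machine-readable field definition buys you. The field names, types, and descriptions below are invented for illustration and are not taken from the actual HHS Protect or CDC schemas; the point is only that a program, rather than a human reading a guidance document, can check each submission and say exactly what is wrong with it.

```python
# Hypothetical schema: field name -> (expected type, required?, description)
SCHEMA = {
    "hospital_id": (str, True, "Facility identifier assigned by the reporting system"),
    "report_date": (str, True, "Date of report, ISO 8601 (YYYY-MM-DD)"),
    "inpatient_beds": (int, True, "Staffed inpatient beds"),
    "inpatient_beds_covid": (int, True, "Inpatient beds occupied by COVID-19 patients"),
    "icu_beds_covid": (int, False, "ICU beds occupied by COVID-19 patients"),
}


def validate(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means the record conforms."""
    problems = []
    for field, (ftype, required, _description) in SCHEMA.items():
        if field not in record:
            if required:
                problems.append(f"missing required field: {field}")
            continue
        value = record[field]
        # bool is a subclass of int in Python, so reject it explicitly for integer fields
        if not isinstance(value, ftype) or (ftype is int and isinstance(value, bool)):
            problems.append(f"{field}: expected {ftype.__name__}, got {type(value).__name__}")
        elif ftype is int and value < 0:
            problems.append(f"{field}: must not be negative")
    for field in record:
        if field not in SCHEMA:
            problems.append(f"unknown field: {field}")
    return problems


good = {"hospital_id": "H-001", "report_date": "2020-10-01",
        "inpatient_beds": 200, "inpatient_beds_covid": 31}
bad = {"hospital_id": "H-002", "inpatient_beds": "lots", "beds_total": 150}

print(validate(good))  # []
for problem in validate(bad):
    print(problem)
```

Note that the descriptions live in the schema itself, next to the types, so the documentation and the validation cannot drift apart as easily as a separate guidance PDF can.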
 I don’t know how the CDC did its retroactive updates in its database; hopefully they didn’t simply overwrite the original data! It certainly sounds like the controls in the HHS Protect database are far superior to anything CDC had (and HHS could surely implement a “retroactive” update if that were seen as a requirement). Perhaps the experts’ objection is that they, personally, cannot make the changes. In general (and I’ve worked for such shops), I picture the CDC data process as both partial and extremely collegial; the data is seen more as content than as fields, and so CDC civil servant A has the Alabama desk and handles the data for their state counterpart, C has Connecticut, and everybody has their unique issues handled on a personal level, and that’s how retroactive changes get made. That doesn’t scale. I think it makes more sense to threaten to take away the hospital’s money if they don’t fill out their fields in a timely and correct manner.
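Retroactive corrections and an intact original are not mutually exclusive. Here is a minimal sketch, assuming an append-only design (nothing in the sources says whether either CDC or HHS Protect works this way, and the class and field names are hypothetical): a correction is stored as a new version alongside the original, with who made it and why, and nothing is ever overwritten.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Version:
    value: int
    author: str
    reason: str
    recorded_at: str


@dataclass
class Datapoint:
    """Hypothetical reported figure whose history is only ever appended to."""
    name: str
    versions: list = field(default_factory=list)

    def record(self, value: int, author: str, reason: str) -> None:
        stamp = datetime.now(timezone.utc).isoformat()
        self.versions.append(Version(value, author, reason, stamp))

    @property
    def original(self) -> Version:
        return self.versions[0]

    @property
    def current(self) -> Version:
        return self.versions[-1]


beds = Datapoint("icu_beds_covid")
beds.record(71, "hospital-123", "initial submission")
beds.record(17, "hospital-123", "correction: digits transposed")

print(beds.current.value)   # 17, the corrected figure
print(beds.original.value)  # 71, still on file
print(len(beds.versions))   # 2, so the revision itself is visible
```

With a design like this, the “cannot correct errors retroactively” complaint and the “revisions open the door to manipulation” worry are both answered: anyone can fix a typo, and everyone can see that they did.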
 Trump has form. We often see the grandiose puffery and bullshit combined with lethal and effective back-office programs nobody knows about until well after the event. This was true for the Trump 2016 campaign’s data operations, and also true for Trump’s real lawyers in the Impeachment saga, who were hidden away in a dingy office park somewhere,