NHS England will go fishing in a “Data Lake”, but says “let them eat APIs” to doctors

An “Emerging Target Architecture” from NHS England aims to direct all NHS patient data into a new “national data lake” (page 14). This involves taking genomic, GP, and other health data for direct care, and then going fishing in that dataset for commissioning purposes, while keeping such actions secret from the patients whose data they access.

The inclusion of the data lake, and the claim that it serves ‘direct care’, show that NHS England has no faith that the tools it proposes to doctors will work. The fig-leaf of “localisation” is undermined by the “national” “data lake”, and it seems unlikely that DH and NHS England will stop meddling if a local area decides not to rifle through patient records.

NHS England’s approach does not fix any problem that exists: there is no analysis that should be done, and that this model would allow, which cannot already be done today by anyone who cared to do it. The approach does, however, do away with patient privacy, safeguards and oversight, and it allows nefarious access that is currently prohibited. This model does nothing to solve the actual problem, which is the need for more analysis. There is already an excess of data that no one is looking at; this simply creates more data. And no matter how much data there is, “more data” will remain an analyst’s desire. Patients, and the clinicians who treat them, don’t have such luxuries.

Conflating direct care and secondary uses is a legacy of the thinking behind care.data, and it will cause pain throughout the NHS for as long as it persists.

Direct care?

For direct care, the idea of patient-visible, audited, “near real time” access to records held elsewhere is neither novel nor necessarily problematic in principle (though the details often fall short).

The Lefroy Act from 2015 requires hospitals to use the NHS number to identify patients, which makes data easy to link. Near-real-time access where there is a clinical need is not necessarily a problem, but there are clearly some areas where very great care is needed, and the ‘Emerging Target Architecture’ document shows none at all.

There are benefits to using FHIR APIs (or equivalent) as the definition of a “paperless” NHS (a term that is currently, and conveniently, undefined). But this “target architecture” is not about that, and notably doesn’t claim to be. The APIs proposed can help patients, but they do not require new data pools; the “national data lake” assumes the APIs will not work, and is included to allow fishing expeditions by NHS England itself and its customers: an “NHS England datamart”.

NHS England’s desire for unlimited access to data for direct care is a means of getting unlimited access for other purposes. The document claims that “privacy by design” is important, but goes no further than words and leaves privacy out of its worldview entirely.

Where is the transparency?

Access to records to provide direct care is valid, but at the scale of the entire NHS, how will a patient know whether their records have been accessed by someone on the other side of the country? The proposed architecture says nothing about transparency to patients.

While such an architecture can do good, it can also be abused, and the worldview of NHS England leaves no room for dissent.

Open Data and dashboards on current status are necessary for transparency in the NHS. However, paragraph 3.29.3 of ‘Emerging Target Architecture’ implies that open data can be recombined into a patient record, which indicates that something has gone very wrong in the “understanding” behind the document.

NHS England will go fishing in the genetic data lake

Because all patients’ records will be included in the data lake, NHS England will be able to extract anything for which it can provide a plausible justification. But, as the care.data Expert Reference Group showed, anything can be justified if you use the right words (e.g. “national data lake” and “privacy by design”) and no one asks questions.

The existence of a data lake means people will go fishing. You can put up “no fishing” signs, but we all know how that plays out with people who have good intentions but priorities that undermine the larger goals.

The paper does not talk about genomic data, but Genomics England (GEL) is envisaged as an inflow. Was this a deliberate choice?

This free-for-all stands in contrast to the transparency of the current NHS Digital processes. We may fundamentally disagree with some of those decisions, but there is at least transparency on what decision was made and why.

The idea of a “datamart” is the clear reappearance of the care.data principle of taking all the data from patients and clinicians, and selling it to anyone who might offer a few beans to get the detailed medical histories of patients.

The conflation of direct care and (dissentable) secondary uses now looks less accidental, and more like an end-state goal, for which ignoring patient opt-outs was a necessary means to an end.

There must continue to be rigorous and transparent processes for accessing patient-level data, and that should include transparency to patients about which organisations have accessed their data. APIs may help care, but they also help those with other intentions.

This proposal also does nothing to reduce the administrative overhead of the NHS billing bureaucracy, nor does it reduce the requirement for identifiable information to be shown to accountants at multiple NHS bodies simply because those bodies don’t trust each other. A “national data bus” architecture could address that problem, but NHS England has chosen not to concern itself with reducing the burden on others.

There should be no third-party access protocols. Instead, statistics should be published, or data to solve a specific problem should be made available, within a safe setting, to appropriate analysts whose questions have received appropriate review, who are given the data appropriate to answer them, and who publish their results.

Drug companies should be prevented from changing the questions they ask after they know what the results of their trials are. And CQC shouldn’t be allowed to pretend they never asked a question purely because they don’t like the answer they got. Analysis of the data may lead to new questions, but it should never lead to the original question going unanswered. And all questions asked of the data should be published.

The future of (Fax) Machines

There is still no clarity on what will replace the fax machine when one clinician sends information along a care pathway to a department in another organisation. The desire to abolish fax machines isn’t unwise, but they serve a clinical purpose that e-mail demonstrably doesn’t fulfil.

Whither Summary Care Record?

The Summary Care Record could perform many of the direct care functions, had NHS England not decided upon an “all or nothing” approach to having an SCR. Had the enhancements to Summary Care Records been made on an iterative and consented basis, it would have been simpler to widen SCR to the new areas proposed. But NHS England, with the bureaucratic arrogance and technical mediocrity that pervades this proposal, simply insisted on the same “all or nothing” approach to the enhanced SCR. As a result, it insists on all patient data being included in a data lake, as the source of data of last resort for clinicians.

Some of the proposals in this document clearly have merit, but when claims are made for “privacy by design” alongside a notion as fundamentally misconceived, and as diametrically opposed to privacy, as a “national data lake”, the vision articulated is shown to be incoherent at best.

Prioritising a data-copying exercise over actual care repeats exactly the same errors in thinking that set care.data on its path to failure. And, published just weeks after it emerged that patients’ objections to their data being used for purposes beyond their care are being ignored, this looks even more like a deliberate attempt to ignore the fact that there are, and always will be, valid objections.

Ignoring the past in this way puts at risk access to the data of those who would be happy for their medical records to be used, given sufficient safeguards and transparency. Unfortunately, a data lake can never meet those requirements.

The “Emerging Target Architecture” document is here, and NHS England is taking comments until the end of the week…