Analysis and Inputs Reporting

[The 2020 update to our ongoing series on data usage reports (2014–2019)]

The need for, and consequences of, data usage reporting are something medConfidential has worked through for a long time.

You have the right to know how data about you is used, but what does that look like in practice? We’ve mocked up a data usage report for the NHS, and the equivalent for Government – but what about the analyses that are run on any data? What should responsible data analysts be able to say (and prove) about the analyses they have run?

The new, eighth Caldicott Principle is “Inform patients and service users about how their confidential information is used”. In future work we will look at how this goes beyond existing legal requirements under the Data Protection Act 2018, what Data Usage Reports (or Data Release Statements) should look like for the NHS in 2021, and what patients should see. For now, though, we want to take a look at the other end of the process.

Analyses, Analysts, and their readers

Public bodies (and indeed everyone) buying AI and Machine Learning products need to know what it is they are buying, and how it has been developed and tested. Ethically, they must be able to verify the equivalent of “This was not tested on animals”, i.e. “No data subject was harmed in the making of this AI”.

We covered a lot of the procurement side of this in our recent work on AI, data and business models. But that raised a question: what is it that procurers should ask for when procuring data-driven products and services? And what does good look like – or, at a bare minimum, what does adequate look like?

At the most practical level, what should someone wanting to follow best practice actually do?

And just as importantly, who should do what?

In a world of the Five Safes, Trusted Research Environments (TREs) and OpenSAFELY, and as the role of independent third parties becomes increasingly viable, those who wish to follow more dangerous ‘legacy practices’ with data will be unable to provide and evidence equivalent assurances – and their offerings will therefore be at a significant disadvantage in the market.

A trustworthy TRE records exactly what data was used in each analysis, and can report that back to its users and to those who read their analyses. Academic journals often require copies of data to be published alongside an academic paper, which is not possible for health data (and were someone to make that mistake, it would be catastrophic) – but this certificate could act as a sufficient proxy for confidence and reproducibility.
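As a purely illustrative sketch (there is no standard schema for this, and every name and field below is a hypothetical assumption, not a real TRE interface): such a certificate could be as simple as a manifest recording a fingerprint of the analysis code and of each input dataset, so that readers can check exactly what was used without the data itself ever leaving the safe setting.

```python
# Hypothetical sketch of an "Analysis and Inputs" certificate.
# Dataset names, fields and the function itself are illustrative only.
import hashlib
import json
from datetime import datetime, timezone

def sha256_of(content: bytes) -> str:
    """Fingerprint an input without revealing the data itself."""
    return hashlib.sha256(content).hexdigest()

def make_certificate(analysis_id: str, script: bytes, inputs: dict) -> str:
    """Record exactly which script ran over exactly which inputs.

    `inputs` maps a dataset label to its raw bytes; only hashes appear in
    the certificate, so it can be published alongside a paper in place of
    the (confidential) data.
    """
    record = {
        "analysis_id": analysis_id,
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "script_sha256": sha256_of(script),
        "inputs": {name: sha256_of(data) for name, data in inputs.items()},
    }
    return json.dumps(record, indent=2, sort_keys=True)

# Example: two (fake) extracts feeding one analysis script.
cert = make_certificate(
    "proj-042-run-7",
    script=b"SELECT age_band, outcome FROM cohort;",
    inputs={
        "cohort_extract_v3": b"...pseudonymised rows...",
        "lookup_table_2020": b"...reference data...",
    },
)
print(cert)
```

Anyone holding the same inputs can recompute the hashes and confirm they match; anyone without them learns only that a specific, fixed set of inputs was used – which is exactly the assurance the paragraph above asks for.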

If you are running analyses on data ‘in your own basement’, there’s no way for anyone to know what you did with it beyond simply trusting you. In health analyses and with health and care data, that isn’t enough – and it should certainly not be the basis for procurement decisions.

So, as before, we decided to mock something up.

Trusted Research Environments which facilitate transparent data assurance like this, and which automate the provision of evidence of compliance with the rules – Data Protection, Information Governance, Equality, or otherwise – will offer their users advantages over those which do not. And any TRE that does not report back to its users how its safety measures were used will clearly not be helping its users build confidence in the entire research process.

While they may claim to be “trusted”, organisations that fail to provide every project with an ‘Analysis and Input report’ cannot be seen as genuinely trustworthy.