The University of Surrey has unveiled an impressive artificial intelligence system designed to revolutionise how we interact with justice. By tailoring AI to the unique language of British courtrooms, the tool promises to make justice ‘more transparent and accessible’ than ever before. It achieves this by automatically transcribing UK Supreme Court hearings and semantically linking the resulting text directly to written judgments. For everyone involved in the justice system, from barristers to the public, this is a significant win, reducing hours of manual searching to seconds.

But as this powerful new technology moves toward adoption by bodies like the UK Supreme Court and the National Archives, we must address a crucial, often overlooked, privacy challenge: What happens when public court data, which is inherently sensitive, is encoded permanently into a highly efficient, proprietary AI system?


How has AI been used in the justice system recently?
AI is currently being used in courts in a limited, supportive capacity. In the recent case Evans v Revenue and Customs Commissioners [2025] UKFTT 01112 (TC), a First-tier Tribunal judge used AI (Microsoft Copilot Chat) to help summarise documents while preparing a decision. The judge was transparent about this use, explaining that the case was well suited to AI assistance as it involved no hearings or credibility assessments, just written submissions. The AI was used to create a first draft of the summaries, which the judge then reviewed and edited, ensuring the final judgment was based on their own independent reasoning. This use aligns with official judiciary guidance, which permits secure and private AI tools to support judicial work, so long as the integrity of justice is maintained.


The AI’s Appetite for Data
The technical success of this system rests on its deep training. Researchers developed a custom speech recognition system trained on 139 hours of Supreme Court hearings and legal documents. This specialist training allows the system to overcome common hurdles, such as mishearing ‘my lady’ as ‘melody’ or ‘inherent vice’ as ‘in your advice’. The resulting model is highly accurate, reducing transcription errors by up to 9% compared with leading commercial tools. It is particularly reliable at capturing crucial entities such as provisions, case names and judicial titles, and the text-to-video linking mechanism correctly correlates arguments in judgments with the precise hearing timestamp, achieving an F1 score (the harmonic mean of a classification model’s precision and recall) of 0.85. It is the very effectiveness of this process, and the reliable capture of specific information, that introduces a significant data privacy paradox, particularly concerning GDPR.
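To make the 0.85 figure concrete: F1 balances precision (how many of the predicted links are correct) against recall (how many of the true links are found). The short Python sketch below shows the calculation; the precision and recall values in it are hypothetical, since only the combined score has been published.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical values: the published result is F1 = 0.85, but the underlying
# precision and recall are not reported, so these numbers are illustrative.
print(f1_score(0.88, 0.82))  # ~0.849, i.e. roughly the reported 0.85
```

A score of 0.85 means the linker is simultaneously precise and comprehensive, which is exactly what makes it so effective at tying names to case content.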


Encoding Special Category Data into AI Systems
While Supreme Court hearings are public proceedings intended to increase transparency, the data contained within the 139 hours of training material is not merely factual; courts deal with ‘some of the most important questions in society’, which often involve deeply personal and sensitive details. The core privacy issue lies in the re-processing of this material. When an AI model is trained, it doesn’t just learn the language; it encodes the specific characteristics of the data it consumes.

Encoding sensitive information from Supreme Court cases presents significant risks under Regulation (EU) 2016/679 (GDPR). These cases often involve details that fall within the definition of special category personal data under GDPR Article 9(1), such as health data or political opinions. When an AI system is fine-tuned to reliably capture ‘crucial entities’, it may inadvertently encode the names of natural persons alongside these highly sensitive and private case details. Once embedded in the model, these associations may become effectively permanent, raising concerns around compliance with the principles of lawfulness, fairness and transparency (Article 5(1)(a)), purpose limitation (Article 5(1)(b)), and data minimisation (Article 5(1)(c)). These principles require that personal data is processed only where strictly necessary, for specified purposes, and in a manner that respects the rights of the data subject.

The creation of a permanent, searchable index raises further compliance challenges. The AI tool’s main utility lies in transforming long and complex video records into instantly searchable links, reducing hours of manual searching ‘into seconds’. While this delivers efficiency, it effectively generates a perpetual index that can correlate an individual’s name, captured as a ‘crucial entity’, with the sensitive context of their involvement in a court case. This scenario introduces risks of breaching storage limitation (Article 5(1)(e)) and integrity and confidentiality (Article 5(1)(f)), as it may lead to the indefinite retention and exposure of sensitive information beyond what is necessary. It also raises questions about compliance with Article 25 (data protection by design and by default), which requires controllers to implement appropriate safeguards, and Article 32 (security of processing), which mandates protection against unauthorised or unlawful processing.
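To see the concern concretely, consider the shape such an index might take. The sketch below is not the Surrey system’s actual architecture, which has not been published in implementation detail; it is a simplified illustration, with an invented person and case, of why a perpetual entity index makes sensitive associations instantly and permanently searchable.

```python
from collections import defaultdict

# Minimal sketch of the concern: once an entity index exists, a person's name
# resolves to sensitive case context in a single lookup. All entries below are
# invented for illustration; none comes from a real hearing or from the Surrey
# system itself.
entity_index: dict[str, list[dict]] = defaultdict(list)

def index_entity(name: str, case: str, timestamp: str, context: str) -> None:
    """Record where an entity appears in a hearing recording."""
    entity_index[name].append(
        {"case": case, "timestamp": timestamp, "context": context}
    )

index_entity(
    name="Jane Doe",                          # hypothetical natural person
    case="Example v Example [2024] UKSC 0",   # hypothetical citation
    timestamp="01:14:32",
    context="submissions concerning the appellant's medical history",
)

# Hours of manual searching collapse into one dictionary lookup, which is
# exactly what makes the association both useful and hard to ever 'forget'.
print(entity_index["Jane Doe"])
```

The same lookup that saves a barrister hours also resolves a private individual’s name to sensitive case context indefinitely, unless retention and access controls are designed in from the outset.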

Beyond these obligations, there are direct implications for data subject rights. If personal data is permanently encoded within the AI model or indexed in ways that cannot easily be separated from case details, individuals may find it difficult or impossible to exercise their right of access (Article 15), right to rectification (Article 16), or right to erasure (Article 17). The creation of a perpetual index could also undermine the right to restriction of processing (Article 18) and the right to object (Article 21), since individuals would have limited ability to prevent or challenge further processing once their data has been embedded. Furthermore, the automated extraction and correlation of entities may, depending on the design of the system, engage the right not to be subject to automated decision-making, including profiling (Article 22). Taken together, these risks highlight the potential for the technology to erode fundamental data protection rights by making sensitive associations both permanent and highly searchable.


The Impacts of Exemption
While the risks under GDPR are significant, the UK Data Protection Act 2018 introduces exemptions that could alter how these obligations apply in practice.

Schedule 2, Part 1, Paragraph 5 provides an exemption for personal data processed in connection with the administration of justice. Where applying data subject rights or other GDPR principles would prejudice the ability of the courts to fulfil their functions, requests such as those for erasure or objection may be lawfully declined. Importantly, the exemption is not absolute: it applies only to the extent that compliance would cause such prejudice, and controllers would still be required to demonstrate both necessity and proportionality.

Schedule 2, Part 6 allows further exemptions where personal data is processed for archiving in the public interest, or for research or statistical purposes. Training and deploying AI on historic court data could be characterised as serving one or more of these functions, particularly where the aim is to preserve and make accessible the legal record. In these circumstances, rights such as rectification may be limited if exercising them would compromise the research outputs or the public archive. These exemptions, however, carry conditions: controllers must implement appropriate safeguards, such as pseudonymisation, to ensure that fundamental rights remain respected.
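What an ‘appropriate safeguard’ might look like in practice is worth spelling out. One commonly used technique, offered here purely as an illustration rather than as anything the Surrey researchers have said they do, is keyed pseudonymisation: names are replaced with stable but opaque tokens before indexing, and the re-identification key is held separately by the controller.

```python
import hashlib
import hmac

# Sketch of one possible safeguard: names are replaced with keyed hashes before
# indexing, so the archive stays searchable while re-identification requires a
# separately held secret. The key below is a placeholder, not a real secret.
SECRET_KEY = b"held-separately-by-the-data-controller"

def pseudonymise(name: str) -> str:
    """Return a stable but opaque token for a person's name."""
    digest = hmac.new(SECRET_KEY, name.encode("utf-8"), hashlib.sha256)
    return "PERSON_" + digest.hexdigest()[:12]

print(pseudonymise("Jane Doe"))  # stable across documents, but opaque without the key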

Whilst these provisions may soften the application of GDPR rights in the context of judicial AI, they do not provide carte blanche. The principles of lawfulness, fairness, transparency, and data protection by design remain fully applicable, and if these exemptions are to be relied upon, careful justification and accountability will be needed.


Stateside Considerations
Concerns about judicial AI are not confined to the UK. In the United States, judges and court administrators have started to issue cautious guidance on the use of generative AI within court processes. The central issue is less about efficiency, which these tools can undoubtedly offer, and more about whether sensitive legal information can be entrusted to systems whose use may carry consequences such as the reuse of data for model training or the production of outputs that cannot be verified.

This is particularly pressing when considering confidentiality. Courts routinely handle privileged and highly sensitive information, and any disclosure, accidental or systemic, risks undermining trust in judicial institutions. Publicly available AI tools, designed to improve themselves through continuous training, cannot guarantee the level of confidentiality expected in the justice system. Legal technology vendors have tried to fill this gap by offering bespoke, ‘confidentiality-first’ AI services for law firms and courts, but adoption remains tentative.

The other challenge is reliability. Generative AI has a well-documented tendency to produce ‘hallucinated’ citations and fabricated references, sometimes delivered with convincing authority. Even where these errors are caught before they reach the record, the possibility that a filing could contain invented law highlights the limits of depending on unverified outputs. Some U.S. judges have begun requiring lawyers to certify that every citation has been checked independently, signalling that the risks of overreliance are already material.

What links the U.S. experience with the UK’s emerging AI deployments is the recognition that trust cannot rest on technical performance alone. A tool that increases accessibility but undermines confidentiality or generates unreliable information does not advance justice in any meaningful sense.


Balancing Accessibility with Fundamental Rights
The re-processing of personal data for a new purpose (AI model training) demands rigorous accountability under GDPR. Without clear protocols detailing how the researchers anonymised, pseudonymised, or filtered sensitive personal data from the 139 hours of input, there is a tangible risk that this system, built for transparency, inadvertently creates a new, highly efficient mechanism for accessing and correlating highly sensitive, historic, private information.

The move to digitise and accelerate access to justice is commendable, and user tests confirm the AI dramatically increases productivity (a legal expert was able to validate 220 links in three hours with AI support, compared with 15 hours to validate just 10 links without it). However, the success of this system depends not only on its technical accuracy but also on ensuring that the pursuit of accessibility does not inadvertently erode the fundamental privacy rights of those whose experiences populate the very data set that makes the AI function.
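A back-of-the-envelope reading of those user-test figures shows just how large the efficiency gain is, and therefore how strong the pressure to adopt will be; the short calculation below simply restates the reported numbers as hourly throughput.

```python
# Throughput implied by the reported user test. The input figures come from the
# study; the comparison itself is only a back-of-the-envelope calculation.
links_per_hour_with_ai = 220 / 3      # ~73 validated links per hour
links_per_hour_without_ai = 10 / 15   # ~0.7 validated links per hour
speedup = links_per_hour_with_ai / links_per_hour_without_ai
print(f"~{speedup:.0f}x faster link validation")  # ~110x on these figures
```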
