Justice transparency: A Computer Weekly Downtime Upload podcast – ComputerWeekly.com
We speak to The National Archives about how it is using semantic data to make court and tribunal judgments searchable
The National Archives has deployed a semantic data platform, powered by MarkLogic, to support the UK government’s drive to improve transparency in the justice system.
John Sheridan, digital director at The National Archives, defines semantic data as a formal conceptual model to describe data. At The National Archives, the MarkLogic-powered platform is being used for the Find Case Law service. “We are preserving court judgments in the digital archive for the future,” he said.
Describing the challenge in making court judgments searchable, Sheridan said: “You can imagine that if you are just publishing court documents such as PDF documents, this doesn’t tell you very much about what the judgment is about. But if you add clearly defined pieces of information, such as the way the court identifies judgments, then you have a document that tells you that ‘I’m not just any old document – I’m a judgment and I was given by this court and I was given on this date’.” In effect, the document becomes self-describing.
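As a rough illustration of what a self-describing document could look like, the Python sketch below reads the court, date and citation from a small XML fragment. The element names, the court and the citation are invented for this example and are not the schema or data used by Find Case Law.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: the element names and values are
# illustrative assumptions, not the schema used by Find Case Law.
SAMPLE = """<judgment>
  <meta>
    <court>Example Court</court>
    <date>2022-01-01</date>
    <cite>[2022] EXC 1</cite>
  </meta>
  <body><p>Text of the judgment.</p></body>
</judgment>"""


def describe(xml_text: str) -> dict:
    """Return the metadata fields the document carries about itself."""
    root = ET.fromstring(xml_text)
    meta = root.find("meta")
    return {child.tag: (child.text or "").strip() for child in meta}


info = describe(SAMPLE)
print(f"I am a judgment given by {info['court']} on {info['date']}")
```

Because the metadata travels inside the document, a search index can pick up the court and date without any separate catalogue record.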
Looking at the process for populating the Find Case Law archive, Nicki Welch, service owner for access to digital records at The National Archives, said: “The information that you need about the judgment is in the judgment.”
The judgment starts as a Microsoft Word document. A microservice is then used to convert it into Legal Document Markup Language, an open standard for legal documents. This XML document is then published and also transformed into HTML.
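The final step of that pipeline, turning the XML into HTML, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the XML structure and the `to_html` function are hypothetical, and The National Archives' own transformation is not described in this article.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical, simplified XML standing in for a converted judgment.
JUDGMENT_XML = """<judgment>
  <meta><cite>[2022] EXC 1</cite></meta>
  <body>
    <p num="1">The claimant appeals.</p>
    <p num="2">The appeal is <b>dismissed</b>.</p>
  </body>
</judgment>"""


def to_html(xml_text: str) -> str:
    """Render the citation as a heading and each paragraph as numbered HTML."""
    root = ET.fromstring(xml_text)
    title = root.findtext("meta/cite", default="Judgment")
    lines = [f"<h1>{escape(title)}</h1>"]
    for para in root.iter("p"):
        # itertext() flattens inline elements, so bold is lost in this sketch.
        text = escape("".join(para.itertext()).strip())
        num = para.get("num")
        prefix = f'<span class="num">{escape(num)}.</span> ' if num else ""
        lines.append(f"<p>{prefix}{text}</p>")
    return "\n".join(lines)


print(to_html(JUDGMENT_XML))
```

Note that this sketch deliberately discards inline styling such as bold, which shows how easily presentation detail can be lost when one format is transformed into another.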
The National Archives has been running the government’s legislation.gov.uk service for several years. Sheridan said this meant it had a pretty good idea of the model that could be used for representing the judgments. “We wanted to align the data model with what we were doing with legislation.gov.uk and so it made sense to align the technology,” he said.
As a result, MarkLogic was chosen as the database on which to base the Find Case Law service. Sheridan said that by using MarkLogic: “We can store the documents. We can store the semantic data that we extract from the documents, and we can use it to run our search. It’s technology that we were familiar with as an organisation.”
The next phase for The National Archives is to integrate the Find Case Law service with another microservice called the Judgment Enrichment Pipeline, to mark up other cases and pieces of legislation cited in the judgment. Sheridan said: “This will help users navigate between related judgments and related pieces of legislation which are published on our sister service, legislation.gov.uk.”
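To give a flavour of what marking up cited cases might involve, the Python sketch below finds strings shaped like neutral citations and wraps them in a link element. It is an assumption-laden illustration, not the Judgment Enrichment Pipeline itself: the regular expression covers only a simplified citation pattern, the `ref` element and the example.org URL scheme are hypothetical, and the case name in the usage line is invented.

```python
import re

# Simplified pattern for citations shaped like "[2020] EWCA Civ 100" or
# "[2021] UKSC 5". Real citation formats are more varied than this.
CITATION = re.compile(r"\[(\d{4})\] ([A-Z]{2,}(?: [A-Z][a-z]+)?) (\d+)")


def enrich(text: str) -> str:
    """Wrap each recognised citation in a hypothetical <ref> link element."""

    def link(match: re.Match) -> str:
        year, court, number = match.groups()
        slug = court.lower().replace(" ", "/")
        # example.org is a placeholder, not a real service address.
        href = f"https://example.org/{slug}/{year}/{number}"
        return f'<ref href="{href}">{match.group(0)}</ref>'

    return CITATION.sub(link, text)


print(enrich("See Smith v Jones [2020] EWCA Civ 100 at para 5."))
```

Once citations are marked up this way, each one becomes a navigable link between related judgments rather than plain text.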
For an archive, there is a question of how long data in a particular format will remain readable. Asked how The National Archives assesses the long-term viability of storing digital information in a particular file format, Sheridan said: “The records need to survive any technology that we might use for managing them or to produce them.”
The technologies being used today may no longer exist in the future. “We need to view that through a risk lens,” he added.
The challenge is to ensure the survivability of the record and to mitigate the risks. One approach is to ensure the document can be migrated to a different file format. The other relies on the pervasiveness of the Microsoft Word file format: because it is so widely used, documents stored in it are highly unlikely to become unreadable over the next few decades.
But what is stored in the document and how it is rendered are two very different things, said Welch. “There is a different way of presenting and rendering information within a file,” she said. “So, for example, when we are building digital services for the web and HTML, CSS is a common standard format that you would expect to see if you were accessing information on a web browser.”
Applying this to long-term document archives, The National Archives has the concept of a surrogate that offers a different way to render the information stored in a file. “Surrogate creation allows us to pick the best format,” she added.
But this conversion process can be difficult, said Welch: “One of the issues we found with the judgments is that they are highly styled documents.”
Judges may use indentation, bold, italics and paragraph numbering when drafting judgments, in order to add meaning and structure, she said, adding: “When you take all that nuanced styling that Word encodes in a certain way and you try converting it to XML and transform it into HTML, little gremlins can creep in.”
This means the document may not look the way the judge originally intended. Welch said The National Archives is currently trying to balance the difference between the original document and how it is presented once transformed.
Part of: Computer Weekly IT innovations podcast series 2022
All Rights Reserved, Copyright 2000 – 2022, TechTarget