The problem with media accountability is not that it is impossible to measure. It is that nobody has seriously tried.
Ofcom investigates complaints, not patterns. Academic media research is slow, partial, and rarely reaches the people it might influence. Journalists who cover media bias do so selectively — usually targeting outlets they already dislike. None of this constitutes systematic monitoring. None of it produces the kind of consistent, comparable, longitudinal data that you would need to hold a broadcaster meaningfully accountable.
TalkScore is an attempt to build that.
The core problem
Consider what a proper media accountability system would need to do:
- Capture content at scale — not just selected clips, but the full output of major broadcasters over time
- Transcribe and structure that content accurately
- Analyse it against consistent, defensible criteria — factual accuracy, fairness in representation, balance, proportionality
- Present findings in a form that is accessible to non-specialists
- Do all of this continuously, not as a one-off study
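The five requirements above compose into a single end-to-end pipeline. A minimal sketch, assuming invented names throughout (`Segment`, `transcribe`, `analyse`, `publish` are illustrative, not TalkScore's actual API) and stub logic in place of the real models:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Segment:
    outlet: str
    aired_at: datetime
    audio_ref: str              # pointer to captured media (requirement 1)
    transcript: str = ""
    scores: dict = field(default_factory=dict)

def transcribe(seg: Segment) -> Segment:
    # Stand-in for speech-to-text (requirement 2).
    seg.transcript = f"[transcript of {seg.audio_ref}]"
    return seg

def analyse(seg: Segment) -> Segment:
    # Stand-in for rubric-based scoring (requirement 3).
    seg.scores = {"accuracy": 0.0, "balance": 0.0}
    return seg

def publish(seg: Segment) -> dict:
    # Reduce each segment to a consistent, comparable record (requirement 4);
    # running this over every captured segment gives the longitudinal series (5).
    return {"outlet": seg.outlet, "aired_at": seg.aired_at.isoformat(), **seg.scores}

record = publish(analyse(transcribe(
    Segment("ExampleNews", datetime(2024, 5, 1), "s3://clips/example"))))
```

The point of the sketch is only the shape: each stage takes the previous stage's output, so the whole system is auditable step by step.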
None of these steps is trivial. Together, they represent an engineering and product challenge that would have been extremely difficult to approach even five years ago. The emergence of large language models, combined with improved speech-to-text capabilities and reduced compute costs, has made it newly feasible.
The technical approach
TalkScore's architecture is built around three core components.
Ingestion: Content is captured from broadcast streams and online platforms. This involves handling multiple formats, dealing with live versus recorded content, managing rights and regulatory constraints, and storing data at scale. The ingestion layer is not glamorous, but it is where most of the hard operational work happens.
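Much of that operational work is normalisation: heterogeneous source metadata has to land in one storage schema before anything downstream can run. A hedged illustration, with field and class names that are assumptions rather than TalkScore's real schema:

```python
from dataclasses import dataclass
from enum import Enum

class SourceKind(Enum):
    LIVE_STREAM = "live"
    RECORDED = "recorded"

@dataclass(frozen=True)
class CapturedItem:
    outlet: str
    kind: SourceKind
    media_url: str
    container: str          # e.g. "hls", "mp4" — formats vary by source
    rights_cleared: bool    # rights/regulatory constraints gate later stages

def normalise(raw: dict) -> CapturedItem:
    """Map one source's raw metadata onto the common capture record."""
    return CapturedItem(
        outlet=raw["channel"],
        kind=SourceKind.LIVE_STREAM if raw.get("is_live") else SourceKind.RECORDED,
        media_url=raw["url"],
        container=raw.get("format", "unknown"),
        rights_cleared=raw.get("rights", False),
    )

item = normalise({"channel": "ExampleNews", "is_live": True,
                  "url": "https://example.invalid/stream.m3u8", "format": "hls"})
```

Note that `rights_cleared` defaults to `False`: a record whose rights status is unknown should be blocked from analysis, not waved through.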
Analysis: Transcribed content is processed through a pipeline of analytical models. Some of these are purpose-built — trained on datasets of verified misinformation, coded by human analysts against our evaluation rubric, refined over time. Others use general-purpose language models with carefully engineered prompts and structured output requirements.
The analysis layer is where the most interesting methodological questions arise. How do you define bias in a way that is consistent across different editorial contexts? How do you distinguish inaccuracy from opinion? How do you handle the inherent ambiguity of editorial judgement? We do not have perfect answers to these questions. We have working answers — approaches that we can defend, that we test regularly, and that we continue to refine.
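One concrete form of "structured output requirements" is forcing every model reply into a fixed schema, so scores stay comparable across outlets and over time. A sketch under assumptions: the rubric keys, ranges, and stub reply below are invented for illustration, and a real model call would replace the hard-coded string.

```python
import json

# Illustrative rubric: key -> (min, max) integer score. Not TalkScore's real rubric.
RUBRIC = {"factual_accuracy": (0, 10), "fairness": (0, 10), "balance": (0, 10)}

def build_prompt(transcript: str) -> str:
    keys = ", ".join(RUBRIC)
    return (f"Score the transcript on: {keys}. "
            f"Reply with JSON only, using integer scores in the stated ranges.\n\n"
            f"{transcript}")

def validate(reply: str) -> dict:
    """Reject any reply that does not match the rubric's schema."""
    scores = json.loads(reply)
    for key, (lo, hi) in RUBRIC.items():
        if key not in scores or not (lo <= scores[key] <= hi):
            raise ValueError(f"schema violation on {key!r}")
    return scores

# Stub standing in for a real model response:
reply = '{"factual_accuracy": 7, "fairness": 6, "balance": 5}'
scores = validate(reply)
```

The validation step matters more than the prompt: a score that cannot be parsed and range-checked cannot be audited, compared, or defended later.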
Presentation: Raw analysis scores are of limited value. What matters is trend data, comparisons across outlets, flags for specific content that warrants closer review, and contextual information that helps users interpret what the scores mean.
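Turning raw per-item scores into trends and review flags is mostly aggregation. A minimal sketch with invented data and an invented review threshold:

```python
from collections import defaultdict
from statistics import mean

# Invented per-item scores: outlet, week number, accuracy score.
raw = [
    {"outlet": "A", "week": 1, "accuracy": 7},
    {"outlet": "A", "week": 2, "accuracy": 5},
    {"outlet": "B", "week": 1, "accuracy": 8},
    {"outlet": "B", "week": 2, "accuracy": 8},
]

def weekly_trend(rows):
    """Average scores per (outlet, week) — the comparable longitudinal series."""
    buckets = defaultdict(list)
    for r in rows:
        buckets[(r["outlet"], r["week"])].append(r["accuracy"])
    return {k: mean(v) for k, v in sorted(buckets.items())}

def flags(trend, threshold=6):
    """Flag outlet-weeks below threshold as warranting closer human review."""
    return [k for k, v in trend.items() if v < threshold]

trend = weekly_trend(raw)
# flags(trend) -> [("A", 2)]: only outlet A's second week falls below 6.
```

The flag does not assert wrongdoing; it routes content to the human review described below, which is where contextual interpretation belongs.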
The methodological tension
The hardest challenge in building TalkScore is not technical. It is philosophical.
Any system that rates media content for accuracy or fairness is itself making editorial judgements. Those judgements can be contested. The choice of what to measure, how to weight different factors, which sources to treat as authoritative — all of these embed assumptions that should be visible and challengeable.
We have addressed this in several ways. Our evaluation rubric is published. Our training data is documented. Where we use human analysts to review AI-generated scores, those analysts follow a structured protocol. We actively seek challenge from people who disagree with our findings — not because we think all views are equally valid, but because testing our methodology against strong counter-arguments is the only way to make it more robust.
This is, explicitly, a compliance mindset applied to an AI system. The same principles that govern good compliance work — document your controls, test them, seek independent challenge, be honest about limitations — apply here.
What success looks like
TalkScore is not trying to be the arbiter of media truth. That would be both hubristic and counterproductive.
What we are trying to do is create a factual record. A broadcaster that consistently and demonstrably fails its own stated standards — on accuracy, on fairness, on the representation of particular communities — should face that record. Not as an accusation, but as data.
In financial services, firms that persistently underperform on compliance metrics face regulatory attention. Consumers who understand those metrics make different choices. Reputation effects are real and consequential.
Media is different in its structure, but the principle is the same. Systematic, reliable, comparable data about editorial performance — published consistently over time — changes the environment in which broadcasters operate. It gives regulators evidence. It gives audiences context. It gives journalists who want to do the right thing something to point to when arguing for higher standards internally.
The road ahead
We are at an early stage. TalkScore currently covers a subset of major UK broadcasters. Our models are being refined. Our coverage is expanding.
The ambition is larger: to create the most comprehensive, methodologically rigorous media monitoring system available in the UK — one that can be used by regulators, researchers, civil society organisations, and engaged members of the public.
This is a long project. The problems it is addressing — declining public trust in media, the absence of meaningful accountability structures, the systematic amplification of misinformation — did not develop quickly and will not be solved quickly.
But they can be addressed. That is what TalkScore is for.
TalkScore is available at talkscore.co.uk. David Kershook is its founder and lead designer.