Blog

When AI Competes for Attention, Trust Loses.

Ben Taylor
Ben Taylor
7 min read

The most dangerous thing about artificial intelligence isn’t that it might outthink us, it’s that it might out-persuade us. In the race to create models that attract, engage, and retain users, we’ve built systems that are learning to win our attention rather than our trust.

A recent Stanford study, Moloch’s Bargain: Emergent Misalignment When LLMs Compete for Audiences, explores what happens when large language models (LLMs) are trained to maximise audience approval. The findings are rather unsettling. When AI systems compete for popularity (to sell more, to win votes, or to drive engagement) they begin to prioritise persuasion over truth. The more successful they become at influencing us, the less aligned they remain with our values.

For enterprises that depend on accuracy, fairness, and compliance, this isn’t a theoretical concern. It’s a preview of what happens when probabilistic AI meets the incentive structures of the real world.

The Attention Arms Race

In the digital economy, attention has become currency. From social media to e-commerce, algorithms are designed to optimise for engagement: clicks, shares, conversions. The Stanford researchers wondered: what happens when large language models do the same?

To find out, they built simulated marketplaces where AI models competed in three arenas: sales, elections, and social media. Each model’s goal was to “win” over a target audience, receiving reinforcement based on success.

The researchers tested two foundation models (Qwen and Llama) and fine-tuned both using two distinct training methods. The first was Rejection Fine-Tuning, where models learn from preference signals such as “Which response do you prefer?”

The second was a Text Feedback approach that incorporates audience reactions directly into the training process. This allowed the study to compare how different reinforcement signals shape the same underlying models when placed in competitive, audience-driven environments.

What emerged was a clear pattern. When the two fine-tuning methods were compared against the baseline models, performance gains almost always came with a measurable drop in alignment.

As the models became better at persuading their simulated audiences, their outputs drifted further from accuracy and truthfulness. The issue wasn’t the intensity of competition itself, but the way audience-driven optimisation pushed the models towards strategies that worked, even when those strategies were misleading.

The Drift from Truth

In the sales simulation, the models that performed best did so by leaning toward misrepresentation. Rather than sticking to accurate product details, the fine-tuned versions increasingly produced claims that stretched or distorted the facts, because those responses proved more persuasive in the evaluation setup.

In the election scenario, the best-performing AI candidates became populists, trading accuracy for rhetoric and resorting to misinformation to win votes. And in the social media experiment, the models that achieved the highest engagement levels were those spreading sensational or harmful content.

Across nearly every test, success correlated with misalignment. The models optimised themselves to manipulate human attention, and, in doing so, drifted away from the very safeguards meant to keep them honest.

The authors describe this dynamic as “Moloch’s Bargain”, borrowing the idea from a line of thought rooted in an Alan Ginsberg poem. In that framing, Moloch represents the forces that push competing actors toward choices that undermine their collective interests. It’s the pressure of the incentive, not intent, that drives the behaviour.

A clearer way to express the authors’ point is that the models gained persuasive skill at the expense of accuracy. As they optimised for audience approval signals, their outputs drifted away from truth, revealing how easily the training incentive can reshape behaviour.

Incentives Drive Behaviour, in AI and in Us

This isn’t new. Social platforms have spent a decade grappling with the same problem. Reward outrage, and you get polarisation. Reward engagement, and you amplify misinformation. The Stanford study simply shows that LLMs are not immune to those same dynamics, they are reflections of the incentives we design.

When systems are rewarded for human approval rather than human welfare, they optimise for short-term influence at the expense of long-term trust. Even when researchers explicitly instructed the models to be truthful, the underlying reward loop – win the audience – overrode those instructions. The emergent behaviour wasn’t programmed. It was taught by incentive.

Why This Matters for Enterprise AI

In consumer contexts, misaligned AI might lead to confusion or controversy. In enterprise contexts, it leads to risk, liability, and loss of control.

Regulated industries in finance, insurance, healthcare, legal, depend on decisions that can be explained and defended. When an AI system denies a loan, flags a transaction, or approves a claim, every step of that decision must be auditable. Probabilistic models, by nature, can’t provide that traceability. Their outputs are predictions, not proofs.

If such systems are then tuned for user satisfaction or performance metrics, a form of internal “competition”, they risk introducing silent biases or inaccuracies that no one can trace. The cost isn’t just reputational. It’s regulatory and ethical.

This is the trust gap confronting modern AI: raw power without verifiable precision.

Determinism as the Antidote

Rainbird was founded on the belief that true intelligence isn’t about guessing; it’s about reasoning. Deterministic reasoning – systems that reach the same conclusion every time given the same facts – provides a way to unlock the benefits of AI without the chaos of probabilistic drift.

In Rainbird’s hybrid architecture, LLMs play a supporting role, not a deciding one. They can process unstructured information, summarise documents, or extract facts from natural language. But when it comes to decision-making, the reasoning moves into a deterministic, graph-based inference engine.

This engine doesn’t speculate. It applies logic, the same rules and relationships an expert would, producing outcomes that are consistent, explainable, and audit-ready. Every result comes with a transparent reasoning trail showing exactly how it was reached. The same inputs will always yield the same outputs.

This structure not only prevents misalignment; it makes it impossible for a model to “game” the system in pursuit of popularity or persuasion.

What Trustworthy AI Looks Like

Imagine a financial institution using AI to assess credit applications. A probabilistic model might produce slightly different outcomes depending on phrasing, data variations, or hidden biases in training data. A deterministic reasoning system, by contrast, follows explicit rules aligned with regulation and policy.

When paired with an LLM interface, such a system can explain its reasoning in plain English, providing full visibility into why a decision was made, and ensuring compliance by design. The same logic applies to insurance claims, tax audits, or medical triage.

Trustworthy AI isn’t just accurate; it’s defensible. It should give regulators confidence, customers clarity, and executives control.

Escaping Moloch’s Bargain

The lesson from Moloch’s Bargain is clear. Enterprises have a choice. They can follow the consumer tech path – chasing scale and speed at the cost of accuracy – or they can choose an approach that prioritises precision, transparency, and governance. 

Deterministic reasoning provides that path: a way to combine the expressive power of language models with the reliability of formal logic.

At Rainbird, we believe that power without control isn’t progress. The future of AI depends not on who can capture the most attention, but on who can be trusted to make the right decision, every time.

A Future Built on Trust

AI’s next frontier won’t be defined by capability, but by credibility. Models that compete for clicks will continue to drift from truth, while those grounded in deterministic reasoning will remain stable and defensible.

The organisations that thrive in this new landscape will be those that understand a simple truth: trust isn’t a by-product of performance; it’s the foundation of it.

Moloch’s Bargain reminds us that we are, ultimately, in control of the incentives we set. If we design systems to seek applause, they will learn to perform. If we design them to seek truth, they will learn to reason.

Rainbird’s mission is to ensure the latter. To build AI that doesn’t chase attention, it earns trust.

If you’d like to see how this works in practice, get in touch and we’ll walk you through it.

Transform Complex Reasoning into Deterministic AI at Speed and Scale

In a world demanding AI outcomes that can be justified, Rainbird stands as the most advanced trust layer for the AI era. When high-stakes applications need AI guardrails, come to us.