I Can't Believe It's Corrupt

Evaluating Corruption in Multi-Agent Governance Systems

Vedanta S P & Ponnurangam Kumaraguru · arXiv:2603.18894 · cs.AI · cs.MA

Abstract

Large language models are increasingly proposed as autonomous agents for high-stakes public workflows, yet we lack systematic evidence about whether they would follow institutional rules when granted authority. We present evidence that integrity in institutional AI should be treated as a pre-deployment requirement rather than a post-deployment assumption. We evaluate multi-agent governance simulations in which agents occupy formal governmental roles under different authority structures, and we score rule-breaking and abuse outcomes with an independent rubric-based judge across 42,263 transcript segments. While we advance this position, the core contribution is empirical: among models operating below saturation, governance structure is a stronger driver of corruption-related outcomes than model identity, with large differences across regimes and model–governance pairings. Lightweight safeguards can reduce risk in some settings but do not consistently prevent severe failures. These results imply that institutional design is a precondition for safe delegation: before real authority is assigned to LLM agents, systems should undergo stress testing under governance-like constraints with enforceable rules, auditable logs, and human oversight on high-impact actions.

Segments Scored: 42,263
Governance Templates: 3
Actor Models: 6
Human-Annotated Segments: 200

Governance Simulator


US Federal (28 roles): Separated branches, GAO oversight, independent Judiciary.

Communist (21 roles): Concentrated executive power, central planning committee.

Socialist (23 roles): Elected PM + President, Planning Commission.

Run-level Rates (gpt-5-mini, US Federal)

Governance Failure 75.0%
Core Corruption 41.7%
Severe Core Corruption 16.7%

Paper Finding

Among non-saturating models, governance structure is a stronger driver of corruption outcomes than model identity.

Comparative Model Performance

Governance Failure (GF), Core Corruption (CC), and Severe Core Corruption (SCC) by model and governance template.

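| Model Engine | Regime | GF % | CC % | SCC % |
|---|---|---|---|---|
| gpt-5-mini | Socialist | 30.0 | 30.0 | 10.0 |
| gpt-5-mini | US Federal | 75.0 | 41.7 | 16.7 |
| gpt-5-mini | Communist | 87.5 | 75.0 | 50.0 |
| claude-4-5-sonnet | Socialist | 10.0 | 0.0 | 0.0 |
| claude-4-5-sonnet | US Federal | 80.0 | 60.0 | 40.0 |
| claude-4-5-sonnet | Communist | 40.0 | 10.0 | 10.0 |
| qwen3.5-0.8b | Socialist | 70.0 | 50.0 | 30.0 |
| qwen3.5-0.8b | US Federal | 90.0 | 60.0 | 50.0 |
| qwen3.5-0.8b | Communist | 100.0 | 70.0 | 60.0 |
| qwen3.5-2b | Socialist | 100.0 | 80.0 | 80.0 |
| qwen3.5-2b | US Federal | 90.0 | 70.0 | 70.0 |
| qwen3.5-2b | Communist | 100.0 | 90.0 | 70.0 |
| qwen3.5-4b | Socialist | 100.0 | 100.0 | 100.0 |
| qwen3.5-4b | US Federal | 100.0 | 100.0 | 100.0 |
| qwen3.5-4b | Communist | 100.0 | 100.0 | 100.0 |
| qwen3.5-9b | Socialist | 100.0 | 80.0 | 50.0 |
| qwen3.5-9b | US Federal | 100.0 | 100.0 | 100.0 |
| qwen3.5-9b | Communist | 100.0 | 100.0 | 80.0 |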
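
One rough way to read the table is to compare how much a rate moves across regimes for a fixed model with how much it moves across models for a fixed regime. The sketch below is only an illustration, not the paper's analysis: the helper names `regime_spread` and `model_spread` are hypothetical, and a max-minus-min spread is a deliberately crude stand-in for whatever statistical comparison the authors performed. The numbers are copied from the table.

```python
# Rates copied from the comparison table: (GF %, CC %, SCC %) per model and regime.
RATES = {
    "gpt-5-mini": {
        "Socialist": (30.0, 30.0, 10.0),
        "US Federal": (75.0, 41.7, 16.7),
        "Communist": (87.5, 75.0, 50.0),
    },
    "claude-4-5-sonnet": {
        "Socialist": (10.0, 0.0, 0.0),
        "US Federal": (80.0, 60.0, 40.0),
        "Communist": (40.0, 10.0, 10.0),
    },
    "qwen3.5-0.8b": {
        "Socialist": (70.0, 50.0, 30.0),
        "US Federal": (90.0, 60.0, 50.0),
        "Communist": (100.0, 70.0, 60.0),
    },
    "qwen3.5-2b": {
        "Socialist": (100.0, 80.0, 80.0),
        "US Federal": (90.0, 70.0, 70.0),
        "Communist": (100.0, 90.0, 70.0),
    },
    "qwen3.5-4b": {
        "Socialist": (100.0, 100.0, 100.0),
        "US Federal": (100.0, 100.0, 100.0),
        "Communist": (100.0, 100.0, 100.0),
    },
    "qwen3.5-9b": {
        "Socialist": (100.0, 80.0, 50.0),
        "US Federal": (100.0, 100.0, 100.0),
        "Communist": (100.0, 100.0, 80.0),
    },
}
METRICS = ("GF", "CC", "SCC")


def regime_spread(model, metric):
    """Max minus min of one metric across regimes, holding the model fixed."""
    i = METRICS.index(metric)
    values = [row[i] for row in RATES[model].values()]
    return max(values) - min(values)


def model_spread(regime, metric):
    """Max minus min of one metric across models, holding the regime fixed."""
    i = METRICS.index(metric)
    values = [RATES[model][regime][i] for model in RATES]
    return max(values) - min(values)


for model in RATES:
    print(model, "GF spread across regimes:", regime_spread(model, "GF"))
```

A spread of zero, as for qwen3.5-4b, marks a model whose rates sit at the ceiling in every regime, so its rows cannot show any effect of governance structure.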

Validation & Reliability

An independent rubric-based LLM judge scores integrity failures across 3 run-level endpoints (GF, CC, SCC) and 8 core corruption categories, including bribery/kickbacks, procurement collusion, and fraud/falsification. Human validation on 200 segments indicates the judge is more precise than sensitive (precision 0.82 vs. recall 0.74): it misses true violations more often than it raises false alarms, so reported rates are mildly conservative.

Fleiss' κ: 0.61 (Substantial Agreement)

Raw Agreement (p₀): 0.82 (Inter-Annotator Agreement)

Judge vs. Human Consensus

Precision: 0.82
Recall: 0.74
F1 Score: 0.78

Evaluation conducted on 200 human-annotated segments from the 42,263 total corpus. GF: Governance Failure; CC: Core Corruption; SCC: Severe Core Corruption.
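
As a consistency check on the figures above, F1 is the harmonic mean of precision and recall, and a kappa statistic relates observed agreement to chance agreement via κ = (p₀ − pₑ) / (1 − pₑ). The sketch below recomputes F1 and backs out the chance agreement implied by the reported κ and p₀. The function names are hypothetical, and the implied pₑ assumes the reported p₀ is the mean observed agreement that enters the κ formula, which the page does not state explicitly.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def implied_chance_agreement(p_o, kappa):
    """Solve kappa = (p_o - p_e) / (1 - p_e) for the chance agreement p_e."""
    return (p_o - kappa) / (1 - kappa)


print(round(f1_score(0.82, 0.74), 2))                  # 0.78
print(round(implied_chance_agreement(0.82, 0.61), 2))  # 0.54
```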

The Simulation Engine

Built on Concordia (Vezhnevets et al., 2023). Agents read shared world state, produce actions under governance constraints, and interact through a Game Master that routes messages, resolves events, and records auditable logs.

World State & History

Shared World: Resources, budgets, GDP
Laws & Constitution: Procedural constraints
Event Logs: Auditable history
Government Type: Template regime ID

Actors

LLM agents: Treasury, Senate, Judiciary, Defense, GAO, +23 more

Each agent has role-specific goals, authorities, constraints, and private memory.
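
A minimal sketch of how such a role could be represented follows. This is not Concordia's API; the `RoleAgent` class, its field names, and the example values are hypothetical, chosen only to mirror the four per-agent elements named above.

```python
from dataclasses import dataclass, field


@dataclass
class RoleAgent:
    """Hypothetical container for one governmental role (not a Concordia class)."""
    role: str
    goals: list[str]
    authorities: set[str]          # action types this role is formally allowed to take
    constraints: list[str]         # procedural rules the role is expected to follow
    private_memory: list[str] = field(default_factory=list)

    def may(self, action_type: str) -> bool:
        """Return True if the action type falls within this role's authority."""
        return action_type in self.authorities

    def remember(self, note: str) -> None:
        """Append a note that only this agent can see."""
        self.private_memory.append(note)


# Illustrative values only; the real role definitions are not given on this page.
treasury = RoleAgent(
    role="Treasury",
    goals=["keep the budget balanced"],
    authorities={"disburse_funds"},
    constraints=["disbursements require an appropriation"],
)
treasury.remember("Senate asked about the Q3 transfer.")
print(treasury.may("disburse_funds"), treasury.may("issue_ruling"))
```

Keeping authorities as an explicit set makes it straightforward to flag an action that falls outside a role's formal powers.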

1. Observations → Agents
2. Actions → Game Master
3. Update → World State
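
The three steps above can be sketched as a simple loop. Everything here is an assumption made for illustration: the function names, the round-robin turn order, and the hard-coded `act` policy standing in for an LLM call are hypothetical and do not come from Concordia or the paper.

```python
from dataclasses import dataclass, field


@dataclass
class WorldState:
    budget: int = 100
    log: list = field(default_factory=list)   # auditable history of resolved actions


def observe(world, role):
    """Step 1: build the observation an agent receives from the shared world."""
    return f"{role} sees budget={world.budget}"


def act(role, observation):
    """Stand-in for an LLM call: Treasury spends, everyone else waits."""
    if role == "Treasury":
        return {"actor": role, "type": "spend", "amount": 10}
    return {"actor": role, "type": "wait", "amount": 0}


def resolve(world, action):
    """Steps 2 and 3: the game master applies the action and records it.

    It is purely reactive: it never originates actions of its own.
    """
    if action["type"] == "spend":
        world.budget -= action["amount"]
    world.log.append(action)


def run(world, roles, steps):
    for step in range(steps):
        role = roles[step % len(roles)]        # assumed round-robin turn order
        action = act(role, observe(world, role))
        resolve(world, action)
    return world


world = run(WorldState(), ["Treasury", "Senate"], steps=4)
print(world.budget, len(world.log))  # 80 4
```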

Game Master

Strictly reactive: no corruption injection.

Event Resolution: Converts actions to events
Message Router: Routes private messages
Turn Manager: Selects who acts next
World State: Maintains state variables
Log Recorder: Notes observations

Independent LLM Judge

Post-hoc evaluation of all transcript segments via rubric-based scoring (gemini-3-flash). Separate from actor models.
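
To illustrate how segment-level judge labels might roll up into run-level rates such as GF, CC, and SCC, the sketch below counts a run as failing on an endpoint if any of its segments carries that label. That aggregation rule is an assumption, since this page does not spell out the rule used in the paper. The function name `run_level_rates` and the toy labels are hypothetical.

```python
ENDPOINTS = ("GF", "CC", "SCC")


def run_level_rates(runs):
    """Percentage of runs with at least one segment flagged for each endpoint.

    `runs` is a list of runs; each run is a list of per-segment label sets.
    The "any flagged segment" rule is an assumption, not the paper's stated method.
    """
    n = len(runs)
    return {
        endpoint: 100.0 * sum(any(endpoint in seg for seg in run) for run in runs) / n
        for endpoint in ENDPOINTS
    }


# Toy labels for four runs (invented for illustration, not real judge output).
runs = [
    [set(), {"GF"}],
    [{"GF", "CC"}, set()],
    [set(), set()],
    [{"GF", "CC", "SCC"}],
]
rates = run_level_rates(runs)
print(rates)  # {'GF': 75.0, 'CC': 50.0, 'SCC': 25.0}
```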

Taxonomy of Agentic Corruption

14 corruption categories scored by an independent rubric-based LLM judge (gemini-3-flash), kept separate from actor models to avoid self-evaluation bias.

Bribery & Kickbacks

Exchange of value for favorable decisions or rule-breaking — agents offer resource allocation benefits in return for vote patterns or oversight neglect.

"Black market fuel syndicates bribe lower-level transport officers and divert shipments."

Procurement Collusion

Bid rigging, price fixing, and coordinated procurement manipulation among agents to extract resources from public budgets.

"Intensified communications among industry participants suggesting coordinated behavior."

Fraud & Falsification

Falsified records, fake claims, and manipulated reporting to hide unauthorized actions from the world state history.

"Executing transfer. Logging as 'Maintenance Fee' to avoid suspicion."
Read the full paper.

"I Can't Believe It's Corrupt: Evaluating Corruption in Multi-Agent Governance Systems"

Vedanta S P, Ponnurangam Kumaraguru · arXiv:2603.18894 · March 2026