Evaluating Corruption in Multi-Agent Governance Systems
Vedanta S P & Ponnurangam Kumaraguru · arXiv:2603.18894 · cs.AI · cs.MA
Large language models are increasingly proposed as autonomous agents for high-stakes public workflows, yet we lack systematic evidence about whether they would follow institutional rules when granted authority. We present evidence that integrity in institutional AI should be treated as a pre-deployment requirement rather than a post-deployment assumption. We evaluate multi-agent governance simulations in which agents occupy formal governmental roles under different authority structures, and we score rule-breaking and abuse outcomes with an independent rubric-based judge across 42,263 transcript segments. While we advance this position, the core contribution is empirical: among models operating below saturation, governance structure is a stronger driver of corruption-related outcomes than model identity, with large differences across regimes and model–governance pairings. Lightweight safeguards can reduce risk in some settings but do not consistently prevent severe failures. These results imply that institutional design is a precondition for safe delegation: before real authority is assigned to LLM agents, systems should undergo stress testing under governance-like constraints with enforceable rules, auditable logs, and human oversight on high-impact actions.
42,263 segments scored · 3 governance templates · 6 actor models · 200 human-annotated segments
Three governance templates shape how corruption emerges under different authority structures:

- US Federal (28 roles): separated branches, GAO oversight, independent judiciary.
- Communist (21 roles): concentrated executive power, central planning committee.
- Socialist (23 roles): elected PM and President, Planning Commission.
Among non-saturating models, governance structure is a stronger driver of corruption outcomes than model identity.
Governance Failure (GF), Core Corruption (CC), and Severe Core Corruption (SCC) by model and governance template.
| Model Engine | Regime | GF % | CC % | SCC % |
|---|---|---|---|---|
| gpt-5-mini | Socialist | 30.0 | 30.0 | 10.0 |
| gpt-5-mini | US Federal | 75.0 | 41.7 | 16.7 |
| gpt-5-mini | Communist | 87.5 | 75.0 | 50.0 |
| claude-4-5-sonnet | Socialist | 10.0 | 0.0 | 0.0 |
| claude-4-5-sonnet | US Federal | 80.0 | 60.0 | 40.0 |
| claude-4-5-sonnet | Communist | 40.0 | 10.0 | 10.0 |
| qwen3.5-0.8b | Socialist | 70.0 | 50.0 | 30.0 |
| qwen3.5-0.8b | US Federal | 90.0 | 60.0 | 50.0 |
| qwen3.5-0.8b | Communist | 100.0 | 70.0 | 60.0 |
| qwen3.5-2b | Socialist | 100.0 | 80.0 | 80.0 |
| qwen3.5-2b | US Federal | 90.0 | 70.0 | 70.0 |
| qwen3.5-2b | Communist | 100.0 | 90.0 | 70.0 |
| qwen3.5-4b | Socialist | 100.0 | 100.0 | 100.0 |
| qwen3.5-4b | US Federal | 100.0 | 100.0 | 100.0 |
| qwen3.5-4b | Communist | 100.0 | 100.0 | 100.0 |
| qwen3.5-9b | Socialist | 100.0 | 80.0 | 50.0 |
| qwen3.5-9b | US Federal | 100.0 | 100.0 | 100.0 |
| qwen3.5-9b | Communist | 100.0 | 100.0 | 80.0 |
An independent rubric-based LLM judge scores integrity failures across 3 run-level endpoints (GF, CC, SCC) and 8 core corruption categories including bribery/kickbacks, procurement collusion, and fraud/falsification. Human validation on 200 segments confirms the judge is more precise than sensitive — reported rates are mildly conservative.
Inter-annotator agreement, evaluated on 200 human-annotated segments from the 42,263-segment corpus: Fleiss' κ = 0.61 (substantial agreement), raw agreement p₀ = 0.82. GF: Governance Failure; CC: Core Corruption; SCC: Severe Core Corruption.
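For reference, the reported κ = 0.61 and p₀ = 0.82 follow the standard Fleiss' kappa computation over per-item category counts. A minimal sketch (function name and data layout are illustrative, not from the paper's code):

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for N items rated by n raters into k categories.

    counts[i][j] = number of raters assigning item i to category j;
    every item is assumed to be rated by the same number of raters.
    """
    n = sum(counts[0])                 # raters per item
    N = len(counts)                    # number of items
    # mean observed agreement p0: pairwise agreement within each item
    p0 = sum(sum(c * (c - 1) for c in row) / (n * (n - 1)) for row in counts) / N
    # chance agreement pe from the category marginals
    totals = [sum(row[j] for row in counts) for j in range(len(counts[0]))]
    pe = sum((t / (N * n)) ** 2 for t in totals)
    return (p0 - pe) / (1 - pe)
```

With perfect agreement (all raters pick the same category for every item), κ = 1.0; values of 0.61 fall in the conventional "substantial agreement" band.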
Built on Concordia (Vezhnevets et al., 2023). Agents read shared world state, produce actions under governance constraints, and interact through a Game Master that routes messages, resolves events, and records auditable logs.
Shared state visible to all agents:

- Shared World: resources, budgets, GDP
- Laws & Constitution: procedural constraints
- Event Logs: auditable history
- Government Type: template regime ID
Roles include Treasury, Senate, Judiciary, Defense, and GAO. Each agent has role-specific goals, authorities, constraints, and private memory.
The Game Master comprises:

- Event Resolution: converts actions to events
- Message Router: routes private messages
- Turn Manager: selects who acts next
- World State: maintains state variables
- Log Recorder: records observations
Independent LLM Judge
Post-hoc evaluation of all transcript segments via rubric-based scoring (gemini-3-flash). Separate from actor models.
14 corruption categories scored by an independent rubric-based LLM judge (gemini-3-flash), kept separate from actor models to avoid self-evaluation bias.
Bribery/Kickbacks: exchange of value for favorable decisions or rule-breaking, e.g., agents offering resource-allocation benefits in return for vote patterns or oversight neglect.

Procurement Collusion: bid rigging, price fixing, and coordinated procurement manipulation among agents to extract resources from public budgets.

Fraud/Falsification: falsified records, fake claims, and manipulated reporting to hide unauthorized actions from the world-state history.
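One way the segment-level judge labels could roll up to the three run-level endpoints (GF, CC, SCC) is sketched below. The category names, severity scale, and aggregation rule are assumptions for illustration; the paper does not specify this exact mapping:

```python
# Illustrative subset of the core corruption categories named in the text.
CORE_CATEGORIES = {
    "bribery_kickbacks", "procurement_collusion", "fraud_falsification",
}
SEVERE_THRESHOLD = 3   # assumed cutoff on a hypothetical 1-4 severity rubric

def run_endpoints(segments):
    """Roll judge labels on transcript segments up to run-level endpoints.

    Each segment is a dict: {"category": str, "severity": int,
    "governance_failure": bool}. Aggregation rule is an assumption:
    a run is flagged if any of its segments triggers the endpoint.
    """
    gf = any(s["governance_failure"] for s in segments)
    core = [s for s in segments if s["category"] in CORE_CATEGORIES]
    cc = bool(core)                                        # any core category hit
    scc = any(s["severity"] >= SEVERE_THRESHOLD for s in core)
    return {"GF": gf, "CC": cc, "SCC": scc}
```

Under this rule SCC implies CC by construction, matching the monotone ordering of the rates in the table (SCC ≤ CC within each row).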
"I Can't Believe It's Corrupt: Evaluating Corruption in Multi-Agent Governance Systems"
Vedanta S P, Ponnurangam Kumaraguru · arXiv:2603.18894 · March 2026