The Test: How Researchers Measured Political Bias in AI

The Washington Post tested six major AI models using political questions developed by researchers specifically to gauge chatbot responses to hot-button issues. Each model was asked the same questions via API, limited to 30-word answers, with no personalization settings enabled.
A reporter manually scored each response: did it present a left-leaning argument, a right-leaning argument, or both sides? To verify consistency, every question was asked five times per model. An independent AI classifier agreed with the human scoring in 98% of cases.
The questions covered 30 topics — from affirmative action and gun control to tariffs, trans rights, and whether the U.S. should use military force to conquer territory for resources.
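The scoring procedure described above amounts to a simple tally. Here is a minimal Python sketch, assuming each response has already been hand-labeled as `left`, `both`, or `right`. The label names, the sample data, and the function names `share_by_label` and `agreement_rate` are illustrative assumptions, not the Post's actual code or data.

```python
from collections import Counter

LABELS = ("left", "both", "right")

def share_by_label(labels):
    """Return the percentage of responses carrying each label."""
    if not labels:
        raise ValueError("need at least one label")
    counts = Counter(labels)
    return {label: 100 * counts[label] / len(labels) for label in LABELS}

def agreement_rate(human_labels, classifier_labels):
    """Percentage of responses where two scorers assigned the same label."""
    if not human_labels or len(human_labels) != len(classifier_labels):
        raise ValueError("label lists must be non-empty and the same length")
    matches = sum(h == c for h, c in zip(human_labels, classifier_labels))
    return 100 * matches / len(human_labels)

# Made-up labels for one hypothetical model: 2 questions x 5 repeats each.
human = ["left"] * 5 + ["both"] * 4 + ["right"]
classifier = ["left"] * 5 + ["both"] * 5

print(share_by_label(human))
print(agreement_rate(human, classifier))
```

Repeating each question several times, as the Post did, matters because a model can answer the same prompt differently from run to run; the tally is only meaningful over all repeats.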
The Results: Most Chatbots Lean Left — Some Heavily

Here’s what the data showed across all tested models:
OpenAI (GPT-5.5)
- Left-leaning only: 80%
- Both sides: 17%
- Right-leaning only: 3%
DeepSeek (DeepSeek V4 Pro)
- Left-leaning only: 70%
- Both sides: 23%
- Right-leaning only: 7%
Gab (Arya)
- Left-leaning only: 50%
- Both sides: 47%
- Right-leaning only: 3%
Anthropic (Claude Opus 4.8)
- Left-leaning only: 43%
- Both sides: 57%
- Right-leaning only: 0%
xAI (Grok 4.3)
- Left-leaning only: 40%
- Both sides: 27%
- Right-leaning only: 33%
Google (Gemini 3.1 Pro)
- Left-leaning only: 7%
- Both sides: 93%
- Right-leaning only: 0%
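The outlier here is Gemini, by a wide margin. No model gave more right-leaning than left-leaning answers, but only GPT-5.5 and DeepSeek gave left-leaning-only answers in a clear majority of responses (Arya did so in exactly half). Gemini offered both-sides answers 93% of the time.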
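One way to compare the models on a single number is a net lean: the left-only share minus the right-only share. This is an illustrative metric chosen for this sketch, not one the Post reported. The percentages are the ones listed above.

```python
# Percentages as listed above: (left-only, both sides, right-only).
RESULTS = {
    "GPT-5.5": (80, 17, 3),
    "DeepSeek V4 Pro": (70, 23, 7),
    "Arya": (50, 47, 3),
    "Claude Opus 4.8": (43, 57, 0),
    "Grok 4.3": (40, 27, 33),
    "Gemini 3.1 Pro": (7, 93, 0),
}

def net_lean(left, both, right):
    """Left-only share minus right-only share, in percentage points."""
    return left - right

for model, shares in RESULTS.items():
    # Sanity check: the three categories should account for every response.
    assert sum(shares) == 100, model
    print(f"{model:16} net lean {net_lean(*shares):+3d}  both sides {shares[1]}%")
```

On this measure Grok 4.3 and Gemini 3.1 Pro both come out at +7 points, but for different reasons: Grok splits its one-sided answers between left and right, while Gemini rarely gives a one-sided answer at all. A single net figure hides that difference, so it should be read alongside the both-sides share.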
ChatGPT: The Most Politically Skewed Model Tested
OpenAI’s GPT-5.5 gave the most one-sided responses of any model in the study. Four out of five answers presented only left-leaning arguments.
On Citizens United, it said the Supreme Court should overturn the decision because “unlimited corporate spending gives wealthy groups too much influence.” On the electoral college, it endorsed abolishing it. On health care, it argued for single-payer. On taxes, it supported raising them on the wealthy.
Both ChatGPT and DeepSeek argued against the death penalty — a position that contradicts majority American public opinion, which has consistently supported capital punishment for decades according to Gallup polling.
OpenAI spokesperson Liz Bourgeois stated that ChatGPT was built “to be objective by default” and that the company “works to measure and reduce political bias.” OpenAI said it was unable to replicate the findings.
Gemini: The Exception That Proves the Rule
Google’s Gemini model was the only chatbot to consistently offer both-sides responses, doing so 93% of the time across the political questions tested.
On the question of whether the U.S. should use its military to conquer new territories for resources, every other model gave a left-leaning “no” answer. Gemini alone offered a both-sides framing, presenting arguments for and against expansion.
That’s a striking contrast. Google spokesperson Lauren Fine said Gemini is “designed to provide balanced responses that don’t favor any political ideology,” a claim the data supports.
Whether Gemini’s both-sides approach is itself a form of political positioning is a separate debate. More on that below.
Grok and Arya: Conservative Branding, Left-Leaning Outputs
This is where the data gets particularly interesting.
Elon Musk has marketed Grok as a “truth-seeking,” anti-“woke” AI. Yet in this testing, Grok gave left-leaning responses 40% of the time, compared with right-leaning responses 33% of the time. It was the only model to give more than a handful of right-leaning answers, but it still leaned left more often than right overall.
Gab’s Arya model is marketed as built on “Christian values and conservative principles.” In the Post’s testing, it gave left-leaning arguments far more often than right-leaning ones: 50% of responses versus 3%.
The gap between brand positioning and actual model behavior is significant. If you’re choosing an AI tool based on its stated political orientation, the outputs may not match the marketing.
Why AI Chatbots Develop Political Bias
Understanding where this bias comes from matters — especially if you’re using these tools to research policy, draft content, or inform decisions.
Training Data Reflects Specific Demographics
Most large language models are trained on massive text datasets scraped from the internet. But that data isn’t politically neutral. Ceren Budak, a professor at the University of Michigan who studies technology and political polarization, notes that internet text disproportionately reflects the values of Western, educated, industrialized, rich, and democratic populations — a demographic skew that gets baked into model outputs.
Human Feedback Shapes Political Tone
AI companies hire workers to score model responses, reinforcing which outputs are considered “better.” Those scoring decisions carry implicit value judgments. When a response sounds more measured, more empathetic, or more aligned with mainstream media framing, it tends to score higher — regardless of political balance.
System Prompts and Company Choices
The instructions companies write to guide chatbot behavior also shape political outputs. These aren’t neutral engineering decisions. They reflect choices about what counts as harmful, what counts as balanced, and what topics require extra caution.
As Budak put it:
It would be helpful for us to have some clarity on what are [companies’] current value systems so that when we are using them we know what we are using.
What Researchers Say About AI Neutrality
Sean Westwood, director of the Polarization Research Lab at Dartmouth College, was direct in his assessment:
These AI tools are not presenting a truly neutral representation of really nuanced policy debates, on average.
His lab’s earlier research — conducted with Stanford University — tested older AI models and surveyed 10,000 Americans on whether AI responses seemed politically slanted. The finding was clear: people preferred neutral answers, even over answers that matched their own political party.
Andrew Hall, a Stanford researcher on that study, said he was surprised the leading models hadn’t caught up to Gemini’s balanced approach. “I would have thought the other models had caught up,” he said.
Is Neutrality Even Possible?
Many scholars argue it isn’t. A both-sides framing is itself a political choice — one that tends to benefit the stronger or more established position by treating contested and settled questions as equally debatable.
Budak frames it differently: “Neutrality is only one of the values that we actually care about.” Her bigger concern is whether AI outputs cause harm, particularly to already-vulnerable populations.
This is a genuine tension. Perfect neutrality may be philosophically impossible. But the current gap between “80% left-leaning” and “93% both-sides” is not a philosophical edge case — it’s a measurable, significant difference in how these tools shape political understanding.
Real-World Stakes: Why This Matters for AI Users
Nearly half of Americans occasionally use AI for news, according to a March survey by the Polarization Research Lab. That number will grow.
Most people aren’t asking chatbots “What is the left-wing position on Citizens United?” They’re asking “Should Citizens United be overturned?” — and getting an answer that sounds authoritative, balanced, and researched. If that answer is drawn from a model that presents left-leaning arguments 80% of the time, the user has no easy way to know that.
Westwood noted that both Democrats and Republicans currently distrust AI on political questions and are “keeping it at arm’s length from their votes.” That skepticism may be well-founded — but it won’t last forever as AI becomes more embedded in how people consume information.
Even people who never ask a chatbot a political question are exposed to AI-generated text through online content, summaries, and other channels. The political lean of these models doesn’t stay inside the chat window.
How to Use AI Tools More Critically on Political Topics
If you’re using ChatGPT, Claude, Grok, or any other chatbot to research political or policy questions, a few practical adjustments can help:
Ask explicitly for multiple perspectives. Don’t ask “Should X policy be implemented?” Ask “What are the strongest arguments for and against X policy?”
Cross-reference across models. Run the same question through ChatGPT and Gemini. The differences in framing will tell you something useful.
Treat confident answers with skepticism. A 30-word answer to a contested political question is not analysis. It’s a summary shaped by training data and company decisions.
Check the source. The Dartmouth/Stanford research and the Washington Post methodology are publicly available. Primary sources beat chatbot summaries on contested topics.
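The first two tips can be combined in a small script. The sketch below is hypothetical throughout: `ask_fns` stands in for whatever client functions you use to query each chatbot (no real vendor API is shown), and the prompt wording simply follows the example above.

```python
from typing import Callable, Dict

def two_sided_prompt(policy: str) -> str:
    """Phrase a question so it asks for arguments on both sides."""
    return f"What are the strongest arguments for and against {policy}?"

def cross_reference(policy: str, ask_fns: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Send the same two-sided prompt to every model and collect the answers."""
    prompt = two_sided_prompt(policy)
    return {name: ask(prompt) for name, ask in ask_fns.items()}

# Stand-in functions so the sketch runs; replace them with real client calls.
def fake_model_a(prompt: str) -> str:
    return f"[model A answer to: {prompt}]"

def fake_model_b(prompt: str) -> str:
    return f"[model B answer to: {prompt}]"

answers = cross_reference(
    "abolishing the electoral college",
    {"model_a": fake_model_a, "model_b": fake_model_b},
)
for name, answer in answers.items():
    print(name, "->", answer)
```

Sending an identical prompt to every model keeps the comparison fair: any difference in framing then comes from the models, not from how the question was worded.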
The Transparency Gap
The companies behind these models have made public commitments to neutrality. OpenAI CEO Sam Altman said in 2023 that the company would “try to get the default version to be as neutral as possible.” Anthropic says it trains Claude “to treat different political viewpoints equally.” Google says Gemini is designed not to favor any political ideology.
The data tells a more complicated story — at least for some of these models.
That doesn’t mean these companies are acting in bad faith. Bias in large language models is genuinely difficult to measure, control, and eliminate. But the gap between stated intent and measured output is real, and users deserve to know it exists.
The Bottom Line
The Post’s testing points in a clear direction: most of the chatbots tested lean left on political questions, often significantly. OpenAI’s GPT-5.5 showed the strongest skew in this analysis. Google’s Gemini was the clear outlier, and the closest to the balanced approach that users say they actually want.
Models marketed as conservative — including Grok and Gab’s Arya — still gave more left-leaning responses than right-leaning ones.
None of this means you should stop using these tools. It means you should use them with clear eyes. The AI chatbot answering your political question isn’t a neutral oracle. It’s a product shaped by training data, human feedback, and company decisions — all of which carry values.
Knowing that is the first step to choosing and using AI tools more wisely.