The Core Finding: Identical Output, Divergent Judgment

A 2026 study by Zehra Chatoo, founder of Code For Good Now and former strategist at Meta, placed this dynamic under direct empirical scrutiny. Chatoo presented 1,000 UK adults with AI-assisted resumes for a marketing position. The resumes were identical in every respect except for the candidate’s name. Half of the evaluators reviewed a resume attributed to Emily Clarke. The other half reviewed the same document attributed to James Clark.
The results were stark. Evaluators who saw Emily’s resume were twice as likely to question her competence. Evaluators who saw James’s resume were twice as likely to credit him with initiative. The same AI-assisted document signaled inability in a woman and pragmatic problem-solving in a man.
The trustworthiness dimension was equally revealing. Evaluators were 22% more likely to doubt the candidate’s integrity when the resume carried a woman’s name. As Chatoo summarized: “When men use AI, we question their effort. When women use AI, we question their integrity.”
Prior Evidence: The 13% Competence Penalty in Software Engineering

The 2026 study does not stand alone. It reinforces findings from a 2025 investigation conducted at a global technology company, in collaboration with researchers from Peking University and Hong Kong Polytechnic University.
The trigger for that study was telling in itself. Despite a year-long company-wide campaign to incentivize AI adoption, only 31% of female software engineers were using the promoted AI tool — compared to significantly higher rates among male colleagues. Company leadership wanted to understand why.
Researchers asked 1,026 software engineers to evaluate an identical piece of computer code. Variables were systematically altered: evaluators were told the code was written with or without AI assistance, and attributed to either a female or male engineer. The objective quality ratings of the code remained consistent across all conditions. The competence ratings of the engineers did not.
Male engineers who used AI received competence ratings 6% lower than non-AI users — a measurable but modest penalty. Female engineers who used AI received ratings 13% lower than non-AI users, despite producing the same output. The penalty for women was more than double.
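The "more than double" comparison can be checked with simple arithmetic. The sketch below uses only the two percentages reported above; the variable names are illustrative and not taken from the study.

```python
# Competence-rating penalties reported for AI users relative to non-AI users,
# as stated in the text (percent lower).
male_penalty_pct = 6.0
female_penalty_pct = 13.0

# How many times larger the penalty for women is than the penalty for men.
penalty_ratio = female_penalty_pct / male_penalty_pct

print(f"Penalty ratio (women / men): {penalty_ratio:.2f}")
```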
The most severe bias came from a specific subgroup: male evaluators who did not use AI themselves. This group rated women engineers who used AI a full 26% more harshly than men who used AI to produce the identical work product.
Why Women’s Caution Is Rational, Not Irrational
The standard narrative positions women’s lower AI adoption as a problem of hesitancy to be corrected. The data reframes it as a problem of accurate risk assessment.
In a follow-up survey of 919 engineers at the same company, women were more likely than men to express concern that using AI would reduce their manager’s evaluation of their ability. That concern, the research confirmed, was well-founded. Women were not misreading the environment. They were reading it correctly.
“Women’s hesitation is not a skills gap,” said Chatoo. “It is an accurate read of an uneven environment. When the same output is evaluated differently based on the name at the top, caution is the logical response.”
This reframing has direct implications for how organizations diagnose and address the gender AI gap. Adoption metrics alone cannot reveal whether low usage reflects lack of interest, lack of access, or a rational calculation that the professional cost of using AI outweighs its productivity benefit.
Intersectionality: The Penalty Likely Compounds
The 2026 study examined gender as a single variable. Chatoo explicitly acknowledges this as a limitation with serious downstream implications.
Race, ethnicity, age, and socioeconomic background all interact with gender to shape how professional competence is perceived. The AI competence penalty documented for women as a category is likely to intensify for women who face multiple intersecting sources of bias — older women, women of color, women from lower socioeconomic backgrounds.
One finding from the 2026 study illustrates how this can manifest across demographic lines. Among Gen Z male evaluators (a group with high personal AI familiarity), 97% rated James as a strong candidate, while only 76% rated Emily as strong. That 21-percentage-point gap was larger than the gap observed among older evaluators. Personal AI use did not reduce gender bias in evaluation. In this cohort, it appeared to amplify it.
Three Organizational Interventions That Address the Root Cause
Closing the gender AI gap requires more than upskilling programs. Organizations must directly address the evaluation biases that make AI adoption professionally risky for women. Three evidence-informed practices offer a structured starting point.
1. Disaggregate AI Adoption Data by Gender and Other Demographics
Aggregate AI usage metrics obscure the patterns that matter most. If adoption data is not broken down by gender, race, age, and other relevant characteristics, the gap remains invisible — and unaddressable.
Measurement alone is insufficient. Anonymized surveys should accompany usage data to identify whether concerns about competence penalties are driving lower adoption among specific groups. The cause of a gap determines the appropriate response.
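As a rough illustration of what disaggregation reveals, the sketch below computes an overall adoption rate and a per-group rate from a small set of records. The records, field names (`gender`, `uses_ai`), and the `adoption_rates` helper are all hypothetical; the numbers are made up for illustration and are not data from either study.

```python
from collections import defaultdict

# Hypothetical employee records (illustrative values only, not study data).
records = (
    [{"gender": "men", "uses_ai": True}] * 7
    + [{"gender": "men", "uses_ai": False}] * 3
    + [{"gender": "women", "uses_ai": True}] * 3
    + [{"gender": "women", "uses_ai": False}] * 7
)


def adoption_rates(rows, group_key):
    """Return (overall adoption rate, {group: adoption rate})."""
    totals = defaultdict(int)  # employees per group
    users = defaultdict(int)   # AI users per group
    for row in rows:
        group = row[group_key]
        totals[group] += 1
        users[group] += int(row["uses_ai"])
    overall = sum(users.values()) / sum(totals.values())
    by_group = {group: users[group] / totals[group] for group in totals}
    return overall, by_group


overall, by_group = adoption_rates(records, "gender")
print(f"Overall adoption: {overall:.0%}")
for group, rate in by_group.items():
    print(f"  {group}: {rate:.0%}")
```

In this made-up sample the aggregate rate sits midway between two very different group rates, which is the pattern an aggregate metric alone would hide. The same helper could be called with another field, such as an age band, if the records carried one.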
2. Evaluate the Work Product, Not the Worker
The 2025 software engineering study produced a methodologically important finding. Evaluators rated the objective quality of identical code consistently across all conditions. Bias emerged only when they were asked to assess the competence and contributions of the engineer who wrote it.
This distinction points toward a concrete intervention: blind review processes that remove personally identifying information from work product evaluations. Where blind review is not feasible, training managers to assess output quality — rather than making inferences about the worker’s capability or effort — can meaningfully reduce bias.
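One way to picture a blind review step is a function that strips identifying fields from a submission before it reaches the reviewer. This is a minimal sketch under stated assumptions: the field names in `IDENTIFYING_FIELDS`, the `blind_copy` helper, and the sample submission are all hypothetical, and a real review system would define its own list of fields to withhold.

```python
# Hypothetical set of fields treated as personally identifying.
IDENTIFYING_FIELDS = {"author_name", "author_email", "gender"}


def blind_copy(submission):
    """Return a copy of the submission without identifying fields.

    The original dictionary is left unchanged so the author can be
    re-linked to the work after the review is complete.
    """
    return {
        key: value
        for key, value in submission.items()
        if key not in IDENTIFYING_FIELDS
    }


# Illustrative submission; all values are invented.
submission = {
    "submission_id": "S-001",
    "author_name": "Example Author",
    "author_email": "author@example.com",
    "gender": "unspecified",
    "work_product": "def add(a, b):\n    return a + b\n",
}

blinded = blind_copy(submission)
print(sorted(blinded))
```

The reviewer sees only the opaque `submission_id` and the work product, so the assessment has to rest on the output itself.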
3. Replace Subjective Criteria with Objective Metrics
Evaluative language such as “competence,” “strength,” or “trustworthiness” is inherently susceptible to stereotype-driven interpretation. These terms carry different connotations depending on who is being assessed.
Employers should anchor hiring and performance evaluations to specific, measurable criteria: skills directly relevant to job tasks, productivity outputs, and the demonstrable quality of work product. Objective metrics reduce the interpretive space in which bias operates.
What This Means for AI Tool Strategy
For organizations tracking AI adoption as a performance indicator, these findings introduce a necessary complication. A workforce adoption rate that looks healthy in aggregate may conceal a deeply uneven distribution — one where the employees most penalized for AI use are also the most likely to avoid it.
The practical implication is clear. Investment in AI tools, training programs, and adoption incentives will not deliver equitable returns if the evaluation environment remains biased. Women will continue to rationally underuse AI tools as long as using them carries a measurable professional cost that men do not face.
“You cannot upskill people out of structural bias,” said Chatoo. “Closing the AI adoption gap means addressing not just how people use AI, but how that use is evaluated.”
Conclusion
The 13% competence penalty is not an abstract statistic. It is a measurable distortion in how professional judgment operates — one that shapes career decisions, suppresses tool adoption, and ultimately limits the productivity gains organizations expect from AI investment. Addressing it is not a diversity initiative. It is a prerequisite for any serious AI adoption strategy.