
How AI Models Write Differently: A Data-Driven Comparison

We ran the same prompts through Claude Opus, Claude Sonnet, Claude Haiku, GPT-5.2, and Gemini 3 Pro, then measured six dimensions of writing style. The results surprised us.


Ask Claude, ChatGPT, and Gemini to write the same email and you'll get three noticeably different results. One sounds cautious and measured. Another reads like it's genuinely excited to help. The third writes as though it's preparing a government report.

These differences aren't random. They're quantifiable, reproducible, and consistent across hundreds of samples. We know because we measured them.


We Tested 5 Models With the Same 320 Prompts

We gave five AI models (Claude Opus 4.6, Claude Sonnet 4.5, Claude Haiku 4.5, GPT-5.2, and Gemini 3 Pro) identical writing tasks: formal emails, casual emails, business reports, social media posts, blog introductions, Slack messages, presentation notes, and meeting follow-ups. Eight prompt types, four languages, multiple variants of each: 320 samples per model.
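For concreteness, here's a minimal sketch of the collection loop, assuming a caller-supplied `generate(model, prompt)` function standing in for each provider's SDK. The model identifiers and the function are illustrative, not our actual harness:

```python
from typing import Callable

# Illustrative model identifiers; each provider's API uses its own names.
MODELS = ["claude-opus-4.6", "claude-sonnet-4.5", "claude-haiku-4.5",
          "gpt-5.2", "gemini-3-pro"]

def collect_samples(prompts: list[str],
                    generate: Callable[[str, str], str]) -> dict[str, list[str]]:
    """Run every prompt through every model and keep the raw text.

    `generate(model, prompt)` is a caller-supplied wrapper around
    whichever provider SDK serves that model.
    """
    return {model: [generate(model, p) for p in prompts] for model in MODELS}
```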

Then we ran every sample through computational stylometry: deterministic formulas that score writing across six independent dimensions. The same analysis we use to create Writing DNA Snapshots for humans — applied to AI output.

The result is a writing fingerprint for each model. Not "this one is better" — but "this one writes this way."


The Six Dimensions We Measured

Dimension | What It Captures | Scale
Sentence Complexity | Structural density; long, nested sentences score higher | 0-100
Vocabulary Richness | Lexical diversity; more unique words score higher | 0-100
Expressiveness | Rhetorical energy; questions, exclamations, emphasis markers | 0-100
Formality | Function words, hedging, cautious structure | 0-100
Consistency | Sentence length uniformity; steady cadence vs. dynamic variation | 0-100
Conciseness | Inverse of sentence length; shorter sentences score higher | 0-100

Each axis is independent: a model can score high on formality and low on conciseness, or read as expressive but inconsistent. The combination creates the personality.
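To make the axes concrete, here is a simplified sketch of how deterministic scores along these lines can be computed. The formulas, word list, and scaling constants below are illustrative stand-ins, not the actual Writing DNA formulas:

```python
import re
from statistics import mean, pstdev

# A tiny hedging/connective word list; a real formality measure
# would use a much larger function-word inventory.
HEDGES = {"perhaps", "arguably", "generally", "typically", "however",
          "moreover", "therefore", "furthermore", "relatively"}

def clamp(x: float) -> float:
    """Pin a raw score to the 0-100 scale."""
    return max(0.0, min(100.0, x))

def snapshot(text: str) -> dict[str, float]:
    """Score one non-empty sample on simplified versions of the six axes."""
    sentences = [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    lengths = [len(re.findall(r"[a-z']+", s.lower())) for s in sentences]
    avg_len = mean(lengths)
    return {
        # structural density: long sentences score higher
        "complexity": clamp(avg_len * 3),
        # lexical diversity: unique words over total words
        "richness": clamp(100 * len(set(words)) / len(words)),
        # rhetorical energy: questions and exclamations per sentence
        "expressiveness": clamp(200 * (text.count("?") + text.count("!"))
                                / len(sentences)),
        # hedging and connective density per word
        "formality": clamp(2000 * sum(w in HEDGES for w in words) / len(words)),
        # steady cadence: low variation in sentence length scores higher
        "consistency": clamp(100 - pstdev(lengths) * 5),
        # inverse of sentence length: shorter sentences score higher
        "conciseness": clamp(100 - avg_len * 3),
    }

print(snapshot("Short reply. Works great! Want changes?"))
```

Averaging scores like these over all 320 samples per model is what produces the fingerprints below.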


The Results: Five Models, Five Personalities

Here's what surprised us. These aren't subtle differences. Each model has a distinct and consistent writing personality.

Claude Opus 4.6 — "The Architect"

Opus writes the longest, most structurally complex sentences of any model we tested. It hedges carefully, deploys precise vocabulary, and rarely shifts into casual register even when the prompt invites it. Formality and vocabulary richness run high. Conciseness runs low.

Best for: Thorough analysis, detailed reports, nuanced arguments where precision matters more than brevity.

Watch out for: You'll likely need to trim. Opus doesn't do "short" by default.

Claude Sonnet 4.5 — "The All-Rounder"

Sonnet scores near the middle on every dimension — and that's its strength. Moderate complexity, moderate formality, moderate expressiveness. It adapts more responsively to prompt type than Opus, shifting register between an email and a Slack message more fluidly.

Best for: Everyday professional communication. The emails, updates, and messages most people need most of the time.

Watch out for: "Balanced" can mean "not distinctive." Sonnet's versatility comes at the cost of personality.

Claude Haiku 4.5 — "The Bullet Point"

Haiku writes short. Its conciseness score towers above every other model's. Vocabulary richness drops correspondingly; Haiku reaches for familiar words rather than precise ones.

The surprise: Haiku's expressiveness is strong. Despite shorter sentences, it uses questions, exclamation points, and attitude markers freely. It reads as punchy, not terse.

Best for: Slack messages, quick replies, anything where "get to the point" is the goal.

Watch out for: Complex arguments lose nuance at Haiku's compression level.

GPT-5.2 — "The Enthusiast"

GPT-5.2 stands out immediately on expressiveness. It scores at the top of the range, deploying rhetorical questions, exclamation marks, and warm language more liberally than any other model. If ChatGPT outputs feel "warmer" to you, the data confirms your instinct.

Its sentence rhythm is the most dynamic — more variation in length, creating a conversational flow. Formality sits below average. Vocabulary richness is solid but not exceptional.

Best for: Marketing copy, social media, persuasive content, anything that benefits from energy and enthusiasm.

Watch out for: Professional contexts where restraint matters. GPT-5.2's enthusiasm can read as unprofessional in formal reports.

Gemini 3 Pro — "The Report Writer"

Gemini writes dense, structured prose. High sentence complexity (close to Opus) but low expressiveness. It scores among the highest on formality — heavy function word usage, careful hedging, measured tone.

Gemini's output reads like a well-edited report. Professional, thorough, cautious — and notably different from GPT-5.2's energy.

Best for: Business reports, policy documents, regulatory writing, anything that needs to sound institutional.

Watch out for: Casual content feels stiff. Gemini doesn't do "conversational" without explicit prompting.


Side-by-Side: The Numbers

Dimension | Opus | Sonnet | Haiku | GPT-5.2 | Gemini 3 Pro | Average AI
Sentence Complexity | High | Medium | Low | Medium | High | 65
Vocabulary Richness | High | Medium | Low | Medium | Medium | 48
Expressiveness | Low | Medium | Medium-High | Very High | Low | 76
Formality | High | Medium | Medium | Below Avg | High | 58
Consistency | High | Medium | Medium | Low | Medium-High | 53
Conciseness | Low | Medium | High | Medium | Low | 42

Three patterns jump out of this table:

Expressiveness is the biggest differentiator. The gap between GPT-5.2 (very high) and Gemini/Opus (low) is the widest divergence across any dimension. If you care about tone, this is where your model choice matters most.

Conciseness separates the Claude family internally. Haiku's conciseness score towers over Opus's. Same company, same training philosophy, dramatically different output length; the pattern suggests model size itself shapes verbosity.

Formality clusters into two camps. Opus and Gemini write formally; GPT-5.2 writes casually, with Sonnet and Haiku sitting between them. Notably, no model pairs high formality with high expressiveness: the models that hedge are the ones that don't exclaim.


What This Actually Means for You

Here's the practical takeaway most AI comparison articles miss: the model you choose sets a default personality for everything you produce with it.

If you use ChatGPT for work emails, your emails will default to enthusiastic and expressive — even when you want measured and concise. If you use Gemini, your Slack messages will default to formal and dense — even when you want casual and quick.

Most people try to fix this with prompts. "Be more concise." "Sound more professional." "Be casual." And it helps, somewhat. But you're fighting the model's training, and you're doing it from scratch with every conversation.

The data suggests a different approach:

  1. Pick the model whose defaults are closest to your natural style. If you're a concise writer, start with Haiku. If you write expressively, GPT-5.2 is already closer to where you want to be.

  2. Know the specific gaps. Don't just say "write more like me." The data shows exactly which dimensions you need to adjust. Maybe you need GPT-5.2's expressiveness but with higher formality — that's a specific, measurable instruction.

  3. Use a style profile instead of per-message prompts. A Writing DNA Snapshot measures your style across all six dimensions and compares it to the model's defaults. The delta between your scores and the model's scores becomes a reusable set of instructions, as in the sketch below.
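Here's a sketch of that third step. The axis names, threshold, and phrasing table are hypothetical illustrations of the delta idea, not the Snapshot's actual output format:

```python
def style_delta(user: dict[str, float],
                model: dict[str, float]) -> dict[str, float]:
    """Positive gap: the model overshoots you on that axis."""
    return {axis: model[axis] - user[axis] for axis in user}

def to_instructions(delta: dict[str, float],
                    threshold: float = 15.0) -> list[str]:
    """Turn large gaps into a reusable set of prompt instructions."""
    phrasing = {  # (model scores too high, model scores too low)
        "expressiveness": ("use fewer exclamations and rhetorical questions",
                           "write with more energy and direct questions"),
        "conciseness": ("use longer, more developed sentences",
                        "keep sentences short and direct"),
        "formality": ("write more casually", "write more formally"),
    }
    instructions = []
    for axis, gap in delta.items():
        if axis in phrasing and abs(gap) >= threshold:
            too_high, too_low = phrasing[axis]
            instructions.append(too_high if gap > 0 else too_low)
    return instructions

# Illustrative numbers: a measured, concise writer vs. an expressive default.
me = {"expressiveness": 30, "conciseness": 70, "formality": 55}
model_profile = {"expressiveness": 90, "conciseness": 50, "formality": 45}
print(to_instructions(style_delta(me, model_profile)))
# -> ['use fewer exclamations and rhetorical questions',
#     'keep sentences short and direct']
```

The point is reusability: compute the deltas once and the same instruction block travels with every conversation, instead of being reinvented per message.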


The Bigger Picture: All AI Writing Has a "Sound"

The most important finding from this benchmark isn't that models are different from each other. It's that every model is different from you.

The average human professional doesn't score 76 on expressiveness or 65 on sentence complexity. Most people write shorter sentences than any of these models. Most people use fewer rhetorical questions. Most people are less formal or more formal than the AI average — rarely right at the model's default.

This is the Median User Problem. AI models are trained to write in a way that satisfies the average evaluator. But you're not average. Your writing has specific patterns — a fingerprint of sentence lengths, vocabulary choices, and punctuation habits that no AI model shares by default.

That gap between your fingerprint and the model's fingerprint is what makes AI writing sound "off." Not wrong, exactly. Just not yours.


Find Out How You Compare

Curious where your writing falls on these six dimensions? Try a free Writing DNA Snapshot — submit a few writing samples and see exactly how your style compares to the AI models above.

No credit card required. Just your writing.

Get Your Free Writing DNA Snapshot