How Every AI Model Writes: A Stylometric Comparison
We analyzed 320 samples across Claude, GPT, and Gemini to measure six dimensions of writing style. Here's what the data reveals about each model's personality.
You've probably noticed that ChatGPT, Claude, and Gemini don't write the same way. One drafts emails that feel polished and measured. Another leans toward enthusiasm. A third sounds like it's writing a textbook.
These aren't random impressions. They're measurable differences, and they point at why AI writing sounds generic in the first place: every model ships with a default style, and that default isn't yours. We have the data to prove it.
The Experiment
We generated 320 writing samples across five major AI models: Claude Opus 4.6, Claude Sonnet 4.5, Claude Haiku 4.5, GPT-5.2, and Gemini 3 Pro. Each model received the same eight prompt types — formal emails, casual emails, business reports, social media posts, blog introductions, Slack messages, presentation content, and meeting follow-ups — in four languages, with two variants per combination.
Every sample was then analyzed using computational stylometry: deterministic formulas that measure sentence complexity, vocabulary richness, expressiveness, formality, consistency, and conciseness. The same formulas we use for Writing DNA Snapshots, applied to AI output instead of human writing. (For a deeper look at how style extraction works, see our technical explainer.)
For the full methodology, see How We Measure "Average AI".
The result is a six-dimensional profile for each model — a writing personality measured in numbers, not adjectives.
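To make the design concrete, here's a minimal Python sketch of the sample grid. The identifier strings are our own shorthand and the four language codes are placeholders (the languages aren't named in this post); the point is the arithmetic behind the 320 samples.

```python
from itertools import product

# Illustrative reconstruction of the sample grid. Identifiers and
# language codes are placeholders, not the exact values we used.
MODELS = ["claude-opus-4.6", "claude-sonnet-4.5", "claude-haiku-4.5",
          "gpt-5.2", "gemini-3-pro"]
PROMPT_TYPES = ["formal_email", "casual_email", "business_report",
                "social_post", "blog_intro", "slack_message",
                "presentation", "meeting_followup"]
LANGUAGES = ["lang_1", "lang_2", "lang_3", "lang_4"]
VARIANTS = [1, 2]

grid = list(product(MODELS, PROMPT_TYPES, LANGUAGES, VARIANTS))
assert len(grid) == 5 * 8 * 4 * 2 == 320  # 64 samples per model
```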
The Six Dimensions
Before comparing models, a quick refresher on what each axis captures:
- Sentence Complexity (0-100): Structural density. Long, nested sentences with clauses and qualifiers push the score up.
- Vocabulary Richness (0-100): Lexical diversity via Type-Token Ratio. Higher means more unique words relative to total words.
- Expressiveness (0-100): Emotional and rhetorical energy — exclamation marks, questions, attitude markers, em-dashes, ellipses.
- Formality (0-100): Function word density, hedging language, semicolon usage. Higher means more structured, cautious prose.
- Consistency (0-100): Uniformity of sentence lengths. High consistency means steady cadence; low means dynamic variation.
- Conciseness (0-100): Inverse of mean sentence length. Shorter sentences push the score up.
Each dimension is independent. A model can be highly formal yet concise. Expressive yet consistent. The six axes together create a fingerprint.
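Three of these axes have definitions simple enough to sketch directly in code. The snippet below is a simplified Python illustration, not our production formulas: the sentence splitter is naive, and the scaling constants (the 40-word ceiling for conciseness, the coefficient-of-variation mapping for consistency) are assumptions chosen to keep scores in the 0-100 range.

```python
import re
import statistics

def split_sentences(text: str) -> list[str]:
    # Naive splitter on terminal punctuation; production code would
    # use a real sentence tokenizer.
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]

def vocabulary_richness(text: str) -> float:
    # Type-Token Ratio: unique words / total words, scaled to 0-100.
    words = re.findall(r"[a-zA-Z']+", text.lower())
    return 100 * len(set(words)) / len(words) if words else 0.0

def conciseness(text: str) -> float:
    # Inverse of mean sentence length: shorter sentences push the
    # score up. The 40-word ceiling is an illustrative assumption.
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    mean_len = statistics.mean(len(s.split()) for s in sentences)
    return max(0.0, min(100.0, 100 * (1 - mean_len / 40)))

def consistency(text: str) -> float:
    # Uniformity of sentence lengths: a low coefficient of variation
    # means a steady cadence and therefore a high score.
    lengths = [len(s.split()) for s in split_sentences(text)]
    if len(lengths) < 2:
        return 100.0
    cv = statistics.stdev(lengths) / statistics.mean(lengths)
    return max(0.0, min(100.0, 100 * (1 - cv)))
```

Expressiveness, formality, and sentence complexity follow the same pattern: count the relevant markers (exclamation points and questions, hedges and function words, clauses and qualifiers), normalize per sentence, and scale to 0-100.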
Model-by-Model Breakdown
Claude Opus 4.6: The Careful Architect
Opus is the most capable model in the Claude family, and its writing reflects that. It produces the longest, most structurally complex sentences of any model we tested, scoring at the high end of sentence complexity. Its formality runs high — Opus hedges carefully, uses function words densely, and rarely drops into casual register even when the prompt invites it.
Where Opus distinguishes itself is vocabulary richness. It deploys a broader range of words than any other Claude model, choosing precise terms over common ones. The tradeoff: its conciseness score is among the lowest. Opus writes expansively. If you need thoroughness, this is your model. If you need brevity, you'll be editing.
Claude Sonnet 4.5: The Balanced Professional
Sonnet sits in the middle of the Claude family by design, and the data confirms it. It scores moderately across all six dimensions — not the most complex sentences, not the simplest. Not the most expressive, not the most restrained.
But "moderate" doesn't mean "generic." Sonnet's balance is its strength. Its formality score sits close to the overall AI average of 58, making it the most versatile for professional communication. It adapts well across prompt types, shifting register more responsively than Opus. For everyday business writing — the emails, updates, and messages most professionals need — Sonnet is the closest thing to a universal default.
Claude Haiku 4.5: The Efficient Communicator
Haiku is built for speed, and it writes like it. Its sentences are the shortest across the Claude family, pushing its conciseness score well above the 42 average. Vocabulary richness drops correspondingly — Haiku reuses familiar words rather than reaching for precision.
The surprise in the data is Haiku's expressiveness. Despite shorter sentences, Haiku maintains strong expressive markers — questions, exclamation points, attitude words. It reads as punchy rather than terse. Think of Haiku as the Slack-message model: direct, energetic, compact.
GPT-5.2: The Enthusiastic Communicator
GPT-5.2 stands out immediately on one axis: expressiveness. It scores at the top of the range, deploying rhetorical questions, exclamation marks, and attitude markers more liberally than any other model. If you've ever noticed that ChatGPT outputs feel "warmer" or more conversational, the data confirms your intuition.
GPT-5.2's sentence complexity is moderate — it builds structures that are readable without being simple. Its formality sits slightly below the average, giving it a conversational edge. Vocabulary richness is solid but not exceptional — GPT-5.2 favors accessible language over precise terminology.
The distinctive GPT trait is its consistency score, which runs lower: GPT-5.2 varies its sentence lengths more than Claude models do, creating a rhythm that feels more dynamic but less predictable. This is a stylistic choice baked into the model's training, and it's measurable.
Gemini 3 Pro: The Structured Pragmatist
Gemini 3 Pro writes differently from both the Claude family and GPT-5.2. Its sentence complexity runs high — close to Claude Opus — but its expressiveness is notably lower. Gemini writes dense, structured prose without the rhetorical flourishes that characterize GPT output.
Formality is where Gemini distinguishes itself. It scores among the highest of all models on formality measures, driven by heavy function word usage and careful hedging. Gemini's output reads like a well-edited report. Professional, thorough, cautious.
Conciseness is low. Gemini shares Opus's tendency toward expansive sentences, though the content within those sentences is more structured and less exploratory. Where Opus meanders intellectually, Gemini builds systematically.
What the Averages Tell Us
When you average all five models together, you get the "Average AI" baseline we use in Writing DNA Snapshots:
| Axis | Average AI (English) |
|---|---|
| Sentence Complexity | 65 |
| Vocabulary Richness | 48 |
| Expressiveness | 76 |
| Formality | 58 |
| Consistency | 53 |
| Conciseness | 42 |
[Chart: Sample Writing DNA Radar Chart, showing how one writer's style compares to Average AI on all six axes.]
Three things stand out in this aggregate:
Expressiveness is disproportionately high. At 76, it's the highest-scoring axis. Every model defaults to energetic, persuasive prose — questions, exclamations, emphatic markers. This is likely a direct consequence of RLHF training, where human raters prefer text that feels engaged and dynamic.
Conciseness is disproportionately low. At 42, it's the lowest. AI models universally write long. Average sentence lengths across all models exceed what most human professionals produce in emails and Slack messages. If you're a concise writer, every model's output will feel bloated compared to your natural style.
Consistency clusters in the middle. At 53, all models produce moderate variation in sentence length — not monotonous, not chaotic. This middle-ground consistency is another sign of RLHF optimization: evaluators probably penalized both extremes.
The Key Insight: Each Model Has a Personality
The most important finding isn't that models differ — it's that each model's differences are consistent and predictable. Claude Opus reliably writes longer, more complex sentences. GPT-5.2 reliably writes more expressively. Gemini reliably writes more formally.
These aren't random fluctuations. They're stable stylistic signatures, baked into each model's training. And they matter for a practical reason: whichever model you use, its default writing personality is not your writing personality.
[Chart: AI Model Writing Style Comparison, comparing Claude Opus 4.6 vs GPT-5.2 in English.]
Claude Opus might match your complexity but miss your conciseness. GPT-5.2 might match your expressiveness but overshoot your formality preferences. Gemini might match your structure but strip out your conversational tone.
The solution isn't to pick the "best" model. It's to teach whatever model you prefer to write like you. That's what a Writing DNA Snapshot measures — the specific gaps between your style and the model's defaults, across all six dimensions.
Cross-Model Convergence and Divergence
Perhaps the most interesting pattern in the data is where models agree and where they diverge.
Models converge on: vocabulary richness (clustered between 44 and 49), consistency (between 52 and 55), and formality (between 42 and 59). These axes show the least variation across models, suggesting that RLHF training pushes all models toward similar baselines on these dimensions.
Models diverge on: expressiveness (the widest spread), conciseness (meaningful differences between the Claude family and others), and sentence complexity (Opus and Gemini versus the rest). These divergences become even more pronounced across languages.
The convergence zones are where the Median User Problem is strongest. When all five models produce similar vocabulary richness scores, it means switching models won't solve your vocabulary problem. Only a style profile will.
The divergence zones are where model choice matters most. If you're a low-expressiveness writer — someone who lets ideas speak without rhetorical embellishment — choosing a model with lower default expressiveness gives you a smaller gap to bridge. But you'll still need calibration on the other five axes.
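One way to reason about that gap quantitatively: compute the average per-axis distance between your profile and each model's defaults, and treat the closest model as your starting point. The per-model numbers below are hypothetical placeholders (this post publishes only the cross-model averages), so read the sketch as the shape of the calculation, not as real data.

```python
AXES = ["complexity", "vocabulary", "expressiveness",
        "formality", "consistency", "conciseness"]

# Hypothetical per-model defaults, for illustration only. The post
# reports the Average AI baseline, not these per-model values.
MODEL_DEFAULTS = {
    "claude-opus-4.6": {"complexity": 72, "vocabulary": 49, "expressiveness": 70,
                        "formality": 59, "consistency": 55, "conciseness": 32},
    "gpt-5.2":         {"complexity": 62, "vocabulary": 47, "expressiveness": 85,
                        "formality": 52, "consistency": 52, "conciseness": 44},
}

# A hypothetical low-expressiveness, high-conciseness writer.
writer = {"complexity": 45, "vocabulary": 60, "expressiveness": 35,
          "formality": 50, "consistency": 60, "conciseness": 68}

def style_gap(writer_profile: dict, model_profile: dict) -> float:
    # Mean absolute per-axis gap: smaller means less calibration to do.
    return sum(abs(writer_profile[a] - model_profile[a]) for a in AXES) / len(AXES)

for model, defaults in MODEL_DEFAULTS.items():
    print(f"{model}: average gap of {style_gap(writer, defaults):.1f} points")
```

Whichever model wins, the gap never reaches zero; that residual is what calibration has to close.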
What This Means for Your AI Writing
Three practical takeaways from the data:
- No single model matches any single human. The odds that your writing personality aligns with any model's default across all six dimensions are effectively zero. Model comparison is interesting but insufficient; you need per-dimension calibration.
- Model choice is a starting point, not a solution. Picking a model whose defaults are closer to your style reduces the work a style profile has to do. But it doesn't eliminate it. Our head-to-head model comparison explores this tradeoff in detail. And these dynamics shift across languages: the best model in English may not be the best in French or Japanese.
- The data makes personalization precise. Instead of vague instructions like "write more concisely," a style profile informed by this data can specify: "target conciseness of 68 versus the model's default of 42." That's a 26-point delta the AI can act on, as the sketch below shows. We've built writing profiles for every model to show exactly how this works.
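Here's what that per-dimension precision looks like in practice: a minimal sketch that turns the gaps between a writer's targets and the Average AI baseline into explicit instructions. The conciseness pair (68 versus 42) comes from the third takeaway; the other writer targets and the 10-point threshold are illustrative assumptions.

```python
# Average AI baseline (English) from the table above.
AVERAGE_AI = {"complexity": 65, "vocabulary": 48, "expressiveness": 76,
              "formality": 58, "consistency": 53, "conciseness": 42}

# Hypothetical writer targets; conciseness 68 matches the example above.
WRITER_TARGETS = {"complexity": 50, "vocabulary": 55, "expressiveness": 40,
                  "formality": 52, "consistency": 60, "conciseness": 68}

def calibration_instructions(targets: dict, defaults: dict,
                             threshold: int = 10) -> list[str]:
    # Emit an instruction only where the gap is large enough to matter;
    # the 10-point threshold is an illustrative assumption.
    lines = []
    for axis, target in targets.items():
        delta = target - defaults[axis]
        if abs(delta) >= threshold:
            direction = "increase" if delta > 0 else "decrease"
            lines.append(f"{direction} {axis}: target {target} "
                         f"(model default {defaults[axis]}, delta {delta:+d})")
    return lines

for line in calibration_instructions(WRITER_TARGETS, AVERAGE_AI):
    print(line)
# Prints, among others:
#   increase conciseness: target 68 (model default 42, delta +26)
```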
See How Your Writing Compares to AI
Curious how your writing compares to these five models? Try your free Writing DNA Snapshot — submit a few writing samples and see exactly where you diverge from Average AI on all six dimensions. No credit card required.
Your writing has a fingerprint. We measure it. My Writing Twin turns that measurement into instructions that make any AI write like you.