
Which AI Model Writes Best in Each Language? A Data-Driven Answer

We compared GPT-5.2, Claude, and Gemini across English, French, Spanish, and Japanese. The best AI model depends on your language.

Research · Multilingual · AI Writing

When someone asks "Which AI model writes best?", the honest answer is "it depends." But it depends on more than just your personal preference or use case. It depends on what language you're writing in.

Our benchmark data shows that AI models don't maintain the same relative strengths across languages. A model that excels in English might underperform in Japanese. The "best" model for French business communication isn't necessarily the best for Spanish.

Here's what the data says.


The Setup

This analysis draws from the same 320-sample benchmark described in How We Measure "Average AI": five models (Claude Opus 4.6, Claude Sonnet 4.5, Claude Haiku 4.5, GPT-5.2, Gemini 3 Pro), eight prompt types, four languages, and two variants per prompt. Every sample was measured on the same six stylometric axes.
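As a sanity check, the sample count is just the size of the full grid. A quick sketch (the model list comes from the study; the prompt types and variants are represented here only as counts):

```python
# Benchmark grid: every combination of model, prompt type,
# language, and variant is one sample.
from itertools import product

models = ["Claude Opus 4.6", "Claude Sonnet 4.5", "Claude Haiku 4.5",
          "GPT-5.2", "Gemini 3 Pro"]
prompt_types = range(8)              # eight prompt types in the study
languages = ["en", "fr", "es", "ja"]
variants = range(2)                  # two variants per prompt

samples = list(product(models, prompt_types, languages, variants))
print(len(samples))  # 320
```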

For the English-only model comparison, see ChatGPT vs Claude vs Gemini. For model personality profiles, see Writing Profiles for Every AI Model. This post focuses on how those model dynamics shift across languages.


English: The Familiar Landscape

English is where most users have the strongest intuitions, so let's start here as a reference point.

The AI average in English:

| Axis | Score |
| --- | --- |
| Sentence Complexity | 65 |
| Vocabulary Richness | 48 |
| Expressiveness | 76 |
| Formality | 58 |
| Consistency | 53 |
| Conciseness | 42 |

In English, the model dynamics work roughly as you'd expect from marketing:

  • GPT-5.2 leads on expressiveness and conversational accessibility. It writes warm, engaging prose with dynamic sentence rhythm.
  • Claude Sonnet 4.5 leads on consistency and vocabulary richness. It produces balanced, professional output.
  • Gemini 3 Pro leads on formality and structural complexity. It writes like a well-organized report.

Best for English business writing: No single winner, but Claude Sonnet's balance makes it the most versatile default. GPT-5.2 for marketing and customer-facing content. Gemini for formal documents and analysis.


French: Where Complexity Rules

The AI average in French:

| Axis | Score |
| --- | --- |
| Sentence Complexity | 75 |
| Vocabulary Richness | 49 |
| Expressiveness | 74 |
| Formality | 42 |
| Consistency | 52 |
| Conciseness | 32 |

French AI writing is defined by two extremes: the highest complexity (75) and the lowest conciseness (32) of any language. All models write longer, more structurally dense sentences in French than in English.

What shifts in French:

Formality drops significantly. French AI output averages 42 on formality compared to 58 in English. This is a 16-point swing. French AI writes with less hedging, fewer function-word-dense constructions, and a more personal tone — despite the structural complexity. For French professionals who need formal output (banking, legal, government), this gap is a problem regardless of model choice.

Expressiveness stays high. At 74, French AI expressiveness is close to English (76). French models maintain the same rhetorical energy — questions, emphasis markers, attitude words — even in a language that structures them differently.

Conciseness plummets. At 32, French AI is 10 points less concise than English AI (42). Every model writes expansively in French. If you're a French professional who values brevity — and many do, especially in technical and startup contexts — no model will give you concise output by default.

Best for French writing: The model choice matters less than calibration in French, because the language-level effects dominate. That said, models with higher baseline conciseness in English tend to maintain a slight advantage in French. The key challenge for French users is the complexity-conciseness combination: French AI writes sentences that are both structurally complex and verbally long. A style profile targeting French output needs to address both dimensions simultaneously.


Spanish: The Restrained Communicator

The AI average in Spanish:

| Axis | Score |
| --- | --- |
| Sentence Complexity | 71 |
| Vocabulary Richness | 44 |
| Expressiveness | 59 |
| Formality | 46 |
| Consistency | 55 |
| Conciseness | 36 |

Spanish AI output has a distinctive profile: moderate-to-high complexity, low expressiveness, and the highest consistency of any language.

The expressiveness drop is dramatic. Spanish AI scores 59 on expressiveness — 17 points below English (76) and 15 points below French (74). In our data, Spanish AI uses fewer rhetorical questions, fewer exclamation marks, and fewer attitude markers than any other language. This creates output that reads as measured and professional, but potentially flat.

For Spanish professionals who write expressively — and many do, particularly in marketing, media, and creative industries — the gap between their natural expressiveness and the AI's reserved default is larger than in any other language.

Vocabulary richness is the second lowest, ahead of only Japanese. At 44, Spanish AI produces less lexically diverse output than English or French AI. Spanish-language professionals with strong vocabularies will notice this limitation across all models.

Consistency is the highest of any language. At 55, Spanish AI writes with the most uniform sentence rhythm. This is subtle but noticeable — Spanish AI output has a steady, measured cadence that can feel monotonous in long-form content.

Best for Spanish writing: Models with naturally higher expressiveness (like GPT-5.2) can partially offset the language-level expressiveness drop. For Spanish marketing copy, social media, and customer communication, GPT-5.2's expressiveness advantage becomes more valuable than in English, where it might be too much. For Spanish formal documents — legal, financial, governmental — the expressiveness drop is actually desirable, making any model suitable with minimal calibration on that axis.


Japanese: A Different World

The AI average in Japanese:

| Axis | Score |
| --- | --- |
| Sentence Complexity | 62 |
| Vocabulary Richness | 37 |
| Expressiveness | 100 |
| Formality | 59 |
| Consistency | 53 |
| Conciseness | 45 |

Japanese AI writing is the most distinctive of any language in our study. Three features stand out.

Expressiveness at 100. As discussed in How AI Writes Differently Across Languages, Japanese business writing uses polite markers, question forms, and epistemic hedges that the expressiveness formula measures. This score reflects the structure of professional Japanese, not emotional intensity. Both AI output and human writing in Japanese score high on this axis, so the comparison within Japanese remains valid.

Vocabulary richness at 37. The lowest of any language by a significant margin. This is a morphological effect — Japanese tokenization produces fewer unique tokens relative to total tokens. But within Japanese, the relative differences between models still matter. A model that scores 40 versus one that scores 34 is producing meaningfully more lexically diverse Japanese output.

Highest conciseness of any language. At 45, Japanese AI is more concise than English (42), French (32), or Spanish (36). Japanese grammar naturally allows compact expression — subject omission, agglutination, and context-dependent shortening all contribute. This means Japanese users face a smaller conciseness gap than users of other languages.

Sentence complexity is the lowest. At 62, Japanese AI writes simpler sentence structures than any other language's AI output. This might reflect how models handle Japanese syntax — shorter clausal structures, fewer nested subordinates — or it might reflect the prompt types we tested. Either way, Japanese users who write complex prose will find a larger gap on this axis than English users would.

Best for Japanese writing: The model dynamics in Japanese differ enough from English that English-based recommendations don't transfer. The axes that matter most for Japanese users are vocabulary richness (where models diverge the most within Japanese) and the nuance within the expressiveness ceiling (how each model handles keigo and politeness levels, even though all score near 100 on the aggregate measure).

Japanese professionals choosing between models should weight vocabulary richness and formality handling more heavily than expressiveness (which is maxed across all models) or conciseness (which is already relatively high).


The Cross-Language Matrix

Here's the full picture in one table:

| Axis | English | French | Spanish | Japanese |
| --- | --- | --- | --- | --- |
| Sentence Complexity | 65 | **75** | 71 | 62 |
| Vocabulary Richness | 48 | **49** | 44 | 37 |
| Expressiveness | 76 | 74 | 59 | **100** |
| Formality | 58 | 42 | 46 | **59** |
| Consistency | 53 | 52 | **55** | 53 |
| Conciseness | 42 | 32 | 36 | **45** |

[Interactive chart: AI Model Writing Style Comparison, Claude Opus 4.6 vs GPT-5.2 in English]

Bold values mark the highest score per axis. Three observations:

No language leads everywhere. Japanese leads on expressiveness, formality, and conciseness. French leads on complexity and vocabulary. Spanish leads on consistency. English is average across the board — which makes sense, as it's the primary RLHF training language and therefore the most "median."

The spread is dramatic on some axes. Expressiveness spans from 59 (Spanish) to 100 (Japanese) — a 41-point range. Conciseness spans from 32 (French) to 45 (Japanese) — a 13-point range. These are not subtle differences. A French user and a Japanese user using the same model will get output with fundamentally different Writing DNA.

Consistency is nearly language-independent. With a spread of only 3 points (52-55), consistency appears to be the axis most driven by model training rather than language structure. RLHF creates consistent sentence rhythm regardless of language.
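The spread and leader claims above fall straight out of the matrix. A quick sketch, with the values copied from the table:

```python
# Cross-language AI averages (axis -> per-language score), from the matrix.
matrix = {
    "Sentence Complexity": {"English": 65, "French": 75, "Spanish": 71, "Japanese": 62},
    "Vocabulary Richness": {"English": 48, "French": 49, "Spanish": 44, "Japanese": 37},
    "Expressiveness":      {"English": 76, "French": 74, "Spanish": 59, "Japanese": 100},
    "Formality":           {"English": 58, "French": 42, "Spanish": 46, "Japanese": 59},
    "Consistency":         {"English": 53, "French": 52, "Spanish": 55, "Japanese": 53},
    "Conciseness":         {"English": 42, "French": 32, "Spanish": 36, "Japanese": 45},
}

spreads = {}
for axis, scores in matrix.items():
    leader = max(scores, key=scores.get)          # language with the top score
    spreads[axis] = max(scores.values()) - min(scores.values())
    print(f"{axis}: leader={leader}, spread={spreads[axis]}")
```

Running this reproduces the observations: expressiveness has a 41-point spread, conciseness 13, and consistency only 3.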


Practical Recommendations by Language

For English Writers

Choose based on personal style fit. English has the most balanced AI baselines, so model choice is a genuine differentiator. Start with Claude Sonnet for versatility, GPT-5.2 for warmth, Gemini for formality.

For French Writers

Prioritize conciseness calibration above model choice. At 32, every model will produce output that's too long. A style profile that targets conciseness of 50+ will do more for your output quality than switching models.

For Spanish Writers

Consider expressiveness needs carefully. If your professional context requires warmth and energy (marketing, sales, leadership communication), the AI default will feel too restrained. GPT-5.2's naturally higher expressiveness helps, but calibration will still be necessary.

For Japanese Writers

Focus on vocabulary richness and formality nuance. The expressiveness ceiling affects all models equally, and conciseness is already the least problematic. Your highest-impact calibration point is getting the model to use more precise, diverse vocabulary — which a style profile extracted from your actual Japanese writing will target automatically.
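Across all four languages, the recommendation pattern is the same: rank the axes by the gap between your measured profile and your language's AI baseline, then calibrate the largest gaps first. A minimal sketch, using the French baseline from this post and a purely hypothetical user profile (the user_profile numbers below are made up for illustration):

```python
# French AI baseline (from this post) vs a hypothetical user's measured profile.
ai_baseline_fr = {"Sentence Complexity": 75, "Vocabulary Richness": 49,
                  "Expressiveness": 74, "Formality": 42,
                  "Consistency": 52, "Conciseness": 32}
user_profile = {"Sentence Complexity": 60, "Vocabulary Richness": 55,
                "Expressiveness": 70, "Formality": 65,
                "Consistency": 50, "Conciseness": 58}

# Signed gap per axis (positive: the user scores higher than the AI baseline),
# ranked by absolute size so the biggest calibration targets come first.
gaps = sorted(((axis, user_profile[axis] - ai_baseline_fr[axis])
               for axis in ai_baseline_fr),
              key=lambda pair: -abs(pair[1]))
for axis, gap in gaps:
    print(f"{axis}: {gap:+d}")
```

For this hypothetical French user, conciseness (+26) and formality (+23) top the list, matching the language-level finding that French AI output is too long and too informal by default.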


The Bottom Line

"Which model writes best?" is the wrong question. The right question is: "Which model writes best in my language, for my style, in my context?"

The data shows that language shifts the playing field so dramatically that English-based model recommendations may not apply. A French user's optimal model choice and calibration strategy are genuinely different from an English user's.

This is why a Writing DNA Snapshot measures per-locale baselines. Your comparison is against AI output in your language, using the same formulas applied to your text. The gaps you see are real, language-appropriate, and actionable.

Your writing has a fingerprint. We measure it. My Writing Twin turns that measurement into instructions that make any AI — ChatGPT, Claude, Gemini — write like you. In any language.

Get Your Free Writing DNA Snapshot