How AI Writes Differently in Japanese, French, and Spanish

Our data from 320 samples across four languages reveals dramatic stylometric differences in AI output. Japanese expressiveness hits 100; French conciseness drops to 32.

Research · Multilingual · AI Writing

Most AI writing research focuses on English. Most AI writing products are built for English. Most comparisons, benchmarks, and style guides assume English.

This is a problem, because AI writes very differently in other languages.

Not just different words. Different style. Different sentence structures, different expressiveness patterns, different formality defaults. The same model, given the same prompt in Japanese and English, produces output with measurably different stylistic DNA.

We know this because we measured it.


The Data

Our benchmark study generated 320 writing samples across five AI models (Claude Opus 4.6, Claude Sonnet 4.5, Claude Haiku 4.5, GPT-5.2, Gemini 3 Pro) in four languages: English, French, Spanish, and Japanese. Eight prompt types, two variants per combination: 5 × 4 × 8 × 2 = 320. For how these models compare head-to-head, see AI Model Writing Styles Compared.

Every sample was analyzed using the same computational stylometry framework — the six-axis system behind Writing DNA Snapshots. The formulas are language-aware: Japanese text uses morphological tokenization via Intl.Segmenter, and each language's baseline is measured independently.
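As a minimal sketch of what language-aware tokenization looks like, here is how Intl.Segmenter splits text into word-like segments. The helper name and filtering are illustrative, not our production pipeline, and it requires a runtime with full ICU data (recent Node.js ships with it):

```javascript
// Illustrative sketch of language-aware tokenization, not the
// production pipeline. Intl.Segmenter needs full ICU data.
function tokenize(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  return [...segmenter.segment(text)]
    .filter((seg) => seg.isWordLike) // drop whitespace and punctuation
    .map((seg) => seg.segment);
}

// English splits on spaces; Japanese has no spaces, so the segmenter
// falls back to dictionary-based morphological boundaries.
const en = tokenize("I would be grateful if you could consider this.", "en");
const ja = tokenize("ご検討いただけますと幸いです。", "ja");
```

The point of measuring each language with its own segmentation is that token counts, and every axis built on them, are only comparable within a language.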

For the full methodology: How We Measure "Average AI".

Here are the per-language AI baselines:

Axis                  English  French  Spanish  Japanese
Sentence Complexity        65      75       71        62
Vocabulary Richness        48      49       44        37
Expressiveness             76      74       59       100
Formality                  58      42       46        59
Consistency                53      52       55        53
Conciseness                42      32       36        45

Every number in this table tells a story about how language structure shapes AI output. Let's unpack the biggest differences.


Japanese: The Expressiveness Outlier

The most striking number in the table is Japanese expressiveness: 100 out of 100.

This isn't a bug or an error. It's a consequence of how Japanese business communication works.

Japanese writing — even in professional contexts — relies heavily on polite markers (丁寧語), question forms, and epistemic hedges that the expressiveness formula picks up. Phrases like "いかがでしょうか" (how about this?), "ご検討いただけますと幸いです" (I would be grateful if you could consider this), and "よろしくお願いいたします" (thank you for your consideration) are standard business Japanese. They're not expressive in the English sense — they're not enthusiastic or emotional. But they use the linguistic structures (questions, markers, emphatic forms) that the expressiveness formula measures.

This matters for two reasons:

1. The comparison is still valid. Both AI-generated Japanese text and human-written Japanese text are measured using the same formula. If AI Japanese scores 100 on expressiveness and a Japanese user scores 85, the delta (15 points) is meaningful. It means the user uses fewer of these markers than the AI default — perhaps they write in a more direct, Western-influenced style.

2. You can't compare across languages directly. Saying "Japanese AI is more expressive than English AI" would be misleading. The formulas measure language-internal patterns. Japanese expressiveness at 100 and English expressiveness at 76 reflect different linguistic features, not different amounts of emotion.

Beyond expressiveness, Japanese AI shows the lowest vocabulary richness (37) — significantly below the other three languages. This reflects how Japanese is tokenized: morphological segmentation yields many short, frequently repeated tokens (particles, inflectional endings), so the Type-Token Ratio runs lower. Japanese AI also scores the highest on conciseness (45) of any language — Japanese naturally allows more compact expression through agglutinative grammar and context-dependent omission of subjects.


French: Complex, Expansive, and Informal

French AI output tells a completely different story.

Sentence complexity: 75. The highest of any language, four points above Spanish (71) and ten above English (65). French AI builds longer, more structurally dense sentences than English, Spanish, or Japanese AI. This reflects French prose traditions — subordinate clauses, relative constructions, and complex temporal structures that are standard in written French but would feel baroque in English.

Conciseness: 32. The lowest of any language, ten points below English's 42. French AI writes long. Combined with high complexity, this makes French AI output dense and expansive — the kind of prose that suits a well-crafted business report but feels out of place in a Slack message.

Formality: 42. Surprisingly, the lowest of any language. French AI defaults to a register that's less formal than English or Japanese, despite the structural complexity. This might seem contradictory, but it makes sense: French business communication often combines elaborate sentence structure with a relatively warm, personal tone. The formality formula picks up this paradox — function word density is lower in French than in English, and hedging patterns differ.

The practical implication: a French professional using AI without style calibration will get output that's structurally complex but tonally casual. If they need formal, concise prose — a common requirement in French business contexts — every dimension needs adjustment. Understanding how style extraction works is the first step toward fixing this.


Spanish: The Moderate Outlier

Spanish AI occupies an interesting middle ground, but with one notable exception.

Expressiveness: 59. This is dramatically lower than English (76), French (74), and Japanese (100). Spanish AI is the least expressive by far. It uses fewer rhetorical questions, fewer exclamation marks, and fewer attitude markers than AI output in any other language.

This was one of our more surprising findings. Spanish-language communication is often associated with warmth and expressiveness in casual contexts. But the prompt types in our study — professional emails, reports, presentations — may elicit a more formal register in Spanish AI, one that suppresses the expressive markers the formula measures.

Consistency: 55. The highest of any language, though only marginally. Spanish AI produces the most uniform sentence lengths — a steady rhythm that makes the output feel measured and predictable.

Vocabulary richness: 44. The lowest after Japanese. Spanish AI reuses words more frequently than English or French AI, producing output with lower lexical diversity. This could reflect how AI models handle Spanish morphology — Spanish verb conjugation creates many surface forms of the same word, and depending on how tokenization handles these, the Type-Token Ratio can be affected.


English: The "Default" Isn't Default

English baselines are treated as the reference in most AI research. But our data shows that English AI output has its own distinct profile — it's not neutral.

Expressiveness: 76. English AI is the second most expressive language after Japanese, and far above Spanish (59). The RLHF training process is predominantly English, and English RLHF raters appear to prefer energetic, engaged prose. This bias carries into the model's English output more strongly than into other languages.

Formality: 58. The second highest after Japanese (59). English AI defaults to a moderately formal register that reflects the professional communication context of most RLHF training data.

Conciseness: 42. Middle of the pack. English AI writes shorter sentences than French or Spanish AI but longer ones than Japanese AI.

The key insight: English AI baselines are specific to English, not universal. If you've been comparing your French or Japanese writing to English AI baselines (as most competitors force you to do), your radar chart was lying to you. The differences you saw reflected language structure, not your personal style.

This is why My Writing Twin measures per-locale baselines. Your Japanese writing is compared to Japanese AI. Your French writing is compared to French AI. The gaps on your radar chart represent your actual style deviations, not translation artifacts.
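To make the per-locale idea concrete, here is a minimal sketch using the expressiveness baselines from the table above. The object and function names are hypothetical illustrations, not the My Writing Twin API:

```javascript
// Per-locale AI baselines from the table above (expressiveness axis
// only, for brevity). Names are illustrative, not a real API.
const AI_EXPRESSIVENESS = { en: 76, fr: 74, es: 59, ja: 100 };

// A user's delta is only meaningful against the baseline for the
// language they actually wrote in.
function expressivenessDelta(userScore, locale) {
  return userScore - AI_EXPRESSIVENESS[locale];
}

// A Japanese writer scoring 85 is 15 points below the Japanese AI
// default, but would look 9 points *above* default if compared to
// the English baseline. Same text, opposite conclusions.
const vsJapanese = expressivenessDelta(85, "ja"); // -15
const vsEnglish = expressivenessDelta(85, "en");  // +9
```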


Cross-Language Patterns

Despite the differences, a few patterns hold across all four languages:

All Languages Write Long

Conciseness is below 50 in every language. The highest (Japanese at 45) is still below the midpoint. AI models are universally verbose — a consequence of RLHF training that rewards thoroughness. This bias transcends language.

Consistency Is Universal

Consistency scores cluster between 52 and 55 across all four languages — the tightest clustering of any axis. Regardless of language, AI models produce moderate sentence-length variation. This may be the strongest evidence of RLHF's homogenizing effect: the training process doesn't just standardize English output. It standardizes rhythm patterns across all languages.
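Sentence-length variation is simple to sketch. The production formula isn't spelled out here, so treat this coefficient-of-variation score as an illustrative stand-in, not our actual consistency axis:

```javascript
// Illustrative consistency measure: lower sentence-length variation
// means a higher score. Not the production formula.
function sentenceLengths(text) {
  return text.split(/[.!?]+/)
    .map((s) => s.trim())
    .filter(Boolean)
    .map((s) => s.split(/\s+/).length); // words per sentence
}

function consistencyScore(text) {
  const lens = sentenceLengths(text);
  const mean = lens.reduce((a, b) => a + b, 0) / lens.length;
  const variance = lens.reduce((a, b) => a + (b - mean) ** 2, 0) / lens.length;
  const cv = Math.sqrt(variance) / mean;        // coefficient of variation
  return Math.round(100 * Math.max(0, 1 - cv)); // 100 = perfectly uniform
}

const steady = consistencyScore("One two three. Four five six. Seven eight nine.");
const varied = consistencyScore(
  "Short. This sentence is considerably longer than the first one. Ok."
);
// steady scores higher than varied
```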

Vocabulary Richness Varies by Morphology

Vocabulary richness shows the widest spread tied to linguistic structure. French and English (both analytic languages with relatively simple morphology) score highest (48-49). Japanese (agglutinative) and Spanish (with complex verb morphology) score lower (37-44). This isn't about the models — it's about how the Type-Token Ratio interacts with different morphological systems.
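The morphology effect is easy to see with the Type-Token Ratio itself. In this toy example (illustrative, not our measurement code), the Spanish verb hablar surfaces as several conjugated forms; whether those count as four types or one depends entirely on whether tokenization lemmatizes, which is exactly why TTR is not comparable across languages:

```javascript
// Type-Token Ratio: distinct word forms divided by total tokens.
function typeTokenRatio(tokens) {
  return new Set(tokens).size / tokens.length;
}

// Surface forms of "hablar": four distinct types out of four tokens.
const surface = ["hablo", "hablas", "habla", "hablamos"];
// The same tokens lemmatized: one type out of four tokens.
const lemmas = ["hablar", "hablar", "hablar", "hablar"];

typeTokenRatio(surface); // 1.0
typeTokenRatio(lemmas);  // 0.25
```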


Why This Matters for Multilingual Professionals

If you write in more than one language — and many professionals do — this data has practical implications:

1. Your style shifts across languages, and so does AI's. A bilingual French-English professional likely writes differently in each language. Longer, more complex sentences in French. More concise, more direct in English. AI does the same thing. A valid writing comparison must account for both shifts.

2. Model choice matters differently by language. A model that matches your style in English might miss your style in French. The relative strengths of GPT, Claude, and Gemini shift across languages. For the model-by-language breakdown, see Which AI Model Writes Best in Each Language.

3. One style profile isn't enough for multilingual writers. A style profile calibrated for your English writing won't work for your Japanese writing. The AI baselines are different, your patterns are different, and the calibration deltas are different. This is why My Writing Twin supports per-language style extraction.


The Bilingual Advantage

We've written about bilingual professionals and AI writing before. The cross-language baseline data adds a new dimension to that story.

Bilingual writers have a measurable advantage in a world of per-locale AI baselines. They already know — intuitively if not quantitatively — that their communication style shifts between languages. They code-switch naturally. A style profile that captures both registers, calibrated against the correct per-language AI baselines, produces AI output that code-switches with them.

Monolingual AI products can't do this. They apply English baselines to non-English text and call it personalization. Our data shows why AI writing sounds generic — and why it's worse across languages: a French writer compared to English baselines appears artificially high on complexity and low on conciseness. Those aren't style features. They're language features.


See Your Writing DNA in Your Language

Curious how your writing compares to AI in your language — not the English default? Try your free Writing DNA Snapshot. Submit writing samples in any of our supported languages (English, French, Spanish, Japanese), and see your radar chart against the correct per-locale AI baseline.

The gaps you see will be about your style, not your language.

Get Your Free Writing DNA Snapshot