
How to Select Your Best Writing Samples for AI Analysis

Not all writing samples are equal. Learn what makes a good sample for AI style analysis and why sample quality determines whether AI captures your voice.

Style Profiles · Custom GPT

You've decided to train AI on your writing style. Smart move. Now comes the crucial part: selecting samples.

Most people grab whatever's convenient. Recent emails. The first document in their Google Drive. A blog post they're proud of. The result? AI that captures a sliver of their voice—or worse, patterns they don't actually want to replicate.

Sample selection isn't glamorous. But it's the difference between AI that writes like you and AI that writes like some vaguely professional version of you that nobody recognizes.

This guide breaks down exactly what to look for, what to avoid, and where to find samples that actually represent how you communicate.


Why Sample Quality Determines Everything

The principle is simple: AI learns your patterns from the examples you provide.

If you upload five emails where you were exhausted and cutting corners, AI learns that sloppy version of you. If you provide heavily edited corporate communications, AI learns the editor's voice, not yours. If your samples are all from one context—say, formal client emails—AI won't know how you write to colleagues, direct reports, or in casual settings.

This is the garbage in, garbage out problem. AI style analysis works exactly as well as your sample selection. No algorithm can extract your authentic voice from samples that don't represent it.

Think of it like training an athlete by showing them highlight reels versus actual game footage. Highlights look impressive, but they don't show the full skillset. For AI to capture your writing voice, it needs the equivalent of game footage—authentic communication in real contexts where you were solving actual problems.

The good news? You already have this writing. You've been producing it for years. The challenge is knowing which pieces to select and which to skip.


The Five Criteria of a Good Writing Sample

Not all your writing is suitable for AI analysis. Here's what separates samples that train AI effectively from samples that confuse it.

1. Authenticity: You Wrote It

This sounds obvious, but it's the most commonly violated rule.

Good samples:

  • Emails you drafted and sent yourself
  • Documents you wrote from scratch
  • Messages where you explained something in your own words
  • First drafts (before heavy editing by others)

Bad samples:

  • Collaborative documents where you can't identify your sections
  • Content written by others that you lightly edited
  • AI-generated drafts (even if you edited them)
  • Templates you filled in
  • Publications edited heavily by an editor

The "edited by an editor" point needs emphasis. If you write for formal publications, the final version isn't your voice—it's your voice filtered through editorial guidelines. Use your submitted draft instead. That's the raw version that sounds like you.

2. Representative: This Is How You Actually Write

A sample might be authentic but not representative. Maybe you wrote it, but it doesn't sound like your typical communication.

Representative samples:

  • Communication you'd stand behind as "yeah, that's how I write"
  • Contexts you encounter regularly
  • Audiences you frequently address
  • Topics within your normal range

Unrepresentative samples:

  • Writing from early in your career that no longer matches your current style
  • Communication written under extreme stress or emotion
  • One-off formal submissions (legal documents, official complaints)
  • Deliberately experimental writing

The test: Would someone who works with you regularly recognize this as your voice? If not, skip it.

3. Substantial: Enough Text to Show Patterns

AI needs enough content to identify patterns. A two-sentence email doesn't reveal much. A 300-word explanation does.

Minimum viable lengths:

  • 200 words — bare minimum for a single sample
  • 300-500 words — ideal range for most samples
  • 1,000+ words — excellent if available (reports, articles, long emails)

Short samples aren't useless, but you need more of them before patterns become statistically meaningful. Five 200-word samples tell AI less than three 500-word samples.
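If you're triaging a folder of candidate samples, a few lines of code can apply these thresholds for you. A minimal Python sketch; the bucket names are ours, not a standard:

```python
# A rough triage pass over candidate samples, using the thresholds above:
# under 200 words = skip, 200-299 = usable, 300-500 = ideal, 500+ = excellent.

def classify_sample(text: str) -> str:
    """Bucket a writing sample by word count."""
    words = len(text.split())
    if words < 200:
        return "skip"        # too short for pattern extraction
    if words < 300:
        return "usable"      # bare minimum
    if words <= 500:
        return "ideal"       # the sweet spot for most samples
    return "excellent"       # reports, articles, long emails

# Placeholder text standing in for real samples:
print(classify_sample("word " * 450))  # a 450-word sample
print(classify_sample("word " * 12))   # a quick reply
```

Run it over each candidate file and keep everything that lands in "usable" or better.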

What counts as "words":

  • Prose paragraphs (high value)
  • Bullet points (medium value)
  • Your writing in email threads (high value)
  • Subject lines alone (skip these)

4. Recent: Reflects Your Current Style

Your writing evolves. The way you communicated five years ago might not match how you write today. Recent samples capture your current voice.

Ideal recency:

  • Last 6 months — best case
  • Last 1-2 years — perfectly acceptable
  • 3+ years ago — only if your style hasn't changed significantly

How do you know if your style has changed? Read something you wrote three years ago. If it sounds like you, include it. If it feels dated or makes you cringe, skip it.

5. Context-Clear: AI Knows What It's Looking At

AI needs context to understand why you wrote this way. An email to your CEO reads differently than a Slack message to your team—both are valid, but the context explains the formality shift.

Context AI needs:

  • Audience: Who were you writing to? (Executive, peer, client, public)
  • Purpose: Why did you write this? (Inform, persuade, request, thank)
  • Medium: What platform? (Email, Slack, document, social media)

Most AI analysis tools ask you to label samples by category. Don't skip this step. "Email to client" and "Slack to team" might both be authentic, but they're different snapshots of your range. AI needs to know which is which to map your full spectrum.


What to Avoid: The Red Flags

Some samples actively hurt AI analysis. They introduce noise, confuse pattern recognition, or train AI on voices that aren't yours. Skip these entirely.

AI-Written or AI-Edited Content

If you've used AI to write or heavily edit something, don't use it as a sample. The patterns aren't yours—they're the AI's, filtered through your edits.

This includes:

  • ChatGPT drafts you lightly revised
  • Claude-generated emails you tweaked
  • Any content where you started with AI output

Exception: If you rewrote AI output so heavily that less than 20% of the original remains, it might qualify. But be honest with yourself about how much is truly your voice.

Collaborative Documents

Co-authored reports, wiki pages edited by multiple people, documents with track changes from three colleagues—these contain everyone's voice, not just yours.

How to tell if it's too collaborative:

  • Can you identify which paragraphs you wrote? If yes, extract only those.
  • Did someone else do structural edits that changed your flow? If yes, skip it.
  • Are half the sentences someone else's phrasing? If yes, too contaminated.

Too Short (Under 200 Words)

Quick replies, acknowledgments, one-line approvals—these don't contain enough text for pattern extraction.

Yes, you write these regularly. But they're not training material. Think of them as data points too small to be statistically meaningful.

Heavily Formatted Technical Content

Code snippets, spreadsheets exported to text, legal contracts with boilerplate, form letters with fill-in-the-blank sections—these aren't prose. AI trained on them learns formatting quirks, not your writing voice.

Exception: Technical documentation you wrote in narrative form (explaining how something works) is fine. It's the raw code or structured data that doesn't work.

Formal Publications (Unless You Have the Pre-Edit Draft)

Articles, white papers, blog posts that went through an editorial process—the final version isn't your voice anymore. Editors standardize, smooth out personality, and enforce house style.

If you write for formal publications, dig up your submitted draft. That's the version before editorial filters.


Where to Find Your Best Samples

You've written thousands of documents over your career. Where should you look?

Email Archives: The Gold Mine

Email is where most professionals do their real writing. It's authentic, contextual, and covers multiple audiences. Here's how to extract good samples.

Gmail Search Operators

from:me after:2024/01/01 -to:me

This finds emails you sent since the start of 2024, excluding messages you also addressed to yourself. (For a rolling one-year window, use newer_than:1y instead of the date.)

Refine further:

from:me to:client@company.com after:2024/01/01 larger:20000

This finds longer emails to a specific client. Gmail can't filter by word count, but larger: (message size in bytes) is a rough proxy for length.

Advanced filtering:

from:me -subject:re -subject:fwd after:2024/01/01

This excludes replies and forwards by subject line, leaving you with original communications.
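If you've exported your sent mail with Google Takeout, you can apply the same filters offline. A minimal sketch using Python's standard mailbox module; it assumes a Takeout-style .mbox file and keeps only messages over the 200-word floor:

```python
import mailbox

def long_sent_emails(mbox_path, min_words=200):
    """Yield (subject, body) for messages over the minimum word count.

    Assumes a Gmail Takeout-style .mbox export of your sent mail.
    """
    for msg in mailbox.mbox(mbox_path):
        if msg.is_multipart():
            # Take the first plain-text part, if any.
            parts = [p for p in msg.walk()
                     if p.get_content_type() == "text/plain"]
            raw = (parts[0].get_payload(decode=True) or b"") if parts else b""
        else:
            raw = msg.get_payload(decode=True) or b""
        body = raw.decode("utf-8", errors="replace")
        if len(body.split()) >= min_words:
            yield msg.get("Subject", "(no subject)"), body
```

Point it at the .mbox file inside your Takeout archive and skim what it yields; anything that survives the word-count filter is worth a closer look against the other criteria.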

Outlook Search

In Outlook, open your Sent Items folder and type in the search bar:

sent:last year

Then sort by size (larger messages are usually longer) or manually scan subject lines for substantial communications.

Slack/Teams Message History

Workplace chat contains authentic voice—often more casual than email. Look for:

  • Long explanations: Where you walked someone through a concept
  • Decision rationales: Where you explained why you chose an approach
  • Project updates: Longer messages summarizing progress
  • Threads where you dominated: Conversations where you wrote 60%+ of the content

How to export Slack messages:

  1. Export via Slack's built-in export feature (workspace admins only), or copy/paste individual threads into a document
  2. Label each thread clearly ("Slack to engineering team re: API design")

Teams export: Teams doesn't offer an easy bulk export; manual copy/paste is usually the practical path.
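If you do get an admin export from Slack, the standard layout is one folder per channel containing one JSON array of messages per day. A minimal Python sketch to pull your longer messages out of such an export; the folder layout and the "user"/"text" field names assume Slack's standard export format:

```python
import glob
import json
import os

def my_long_messages(export_dir, my_user_id, min_words=100):
    """Collect your longer messages from a Slack workspace export.

    Assumes the standard export layout: one folder per channel,
    one JSON array of messages per day inside it.
    """
    hits = []
    for path in glob.glob(os.path.join(export_dir, "*", "*.json")):
        with open(path, encoding="utf-8") as f:
            for msg in json.load(f):
                text = msg.get("text", "")
                if msg.get("user") == my_user_id and len(text.split()) >= min_words:
                    hits.append(text)
    return hits
```

Your user ID is visible in your Slack profile. The 100-word floor here is looser than the 200-word email minimum because chat messages run shorter; raise it if you get too much noise.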

Internal Documents and Reports

These tend to be longer and more structured. Ideal candidates:

  • Project proposals you wrote (before someone edited them)
  • Status reports you drafted regularly
  • Internal memos explaining decisions
  • Onboarding documentation you created
  • Post-mortems you authored

Where to find them:

  • Google Drive: Filter by "Owner: me" and sort by date modified
  • Notion: Search your authored pages
  • Confluence: Filter by author
  • SharePoint: Check your document library

Personal Blog Posts or Articles

If you write publicly, these are usually strong samples—assuming they're not heavily edited.

Good sources:

  • Personal blog posts
  • LinkedIn articles (longform, not feed posts)
  • Medium posts you wrote
  • Guest posts on industry sites (if lightly edited)

Check recency: Use posts from the last 1-2 years unless your style hasn't changed.

Social Media (With Caution)

Platform constraints shape your writing. A Twitter thread reads differently than an email. But social content can work if:

  • It's substantial (LinkedIn posts over 300 words)
  • It reflects how you'd write in other contexts
  • It's not performative (some people write differently when "on stage")

Best sources:

  • LinkedIn longform posts
  • Twitter/X threads where you explained something
  • Facebook posts to private groups (more authentic than public)

Skip:

  • One-liner tweets
  • Heavily curated Instagram captions
  • Content written for engagement farming

Sample Selection by Context: The Full Spectrum

Your voice isn't monolithic. You shift formality, directness, and structure based on who you're writing to and why. Good AI analysis captures this range.

Here's how to build a sample set that covers your full spectrum.

Audience-Based Sampling

Aim to include samples from at least 3-5 different audience types:

  • Executives/leadership: captures your formal, concise mode (3-5 samples)
  • Peers/colleagues: your baseline professional voice (5-7 samples)
  • Direct reports: more supportive, explanatory tone (3-5 samples)
  • Clients/external: professional but warm (3-5 samples)
  • Casual/internal team: your most relaxed register (3-5 samples)

Why range matters: If you only provide executive-facing samples, AI learns formal-you. It won't know how to shift for casual contexts. The goal is to train AI on your full range so it can choose the right register for each situation.

Purpose-Based Sampling

Different writing purposes reveal different dimensions of your style:

  • Informational (status updates, FYIs): Shows your clarity and structure
  • Persuasive (proposals, recommendations): Reveals your argumentation style
  • Directive (delegating, instructing): Captures your leadership voice
  • Supportive (feedback, coaching): Shows your empathy and tone
  • Difficult (bad news, pushback): Reveals how you handle tension

Try to include at least 2-3 samples from each purpose category you use regularly.

Medium-Based Sampling

If you write across multiple platforms, represent each:

  • Email: Formal to semi-formal
  • Slack/Teams: Casual, quick
  • Documents: Structured, detailed
  • Public writing: Polished, audience-aware

AI can learn that your Slack voice is more casual than your email voice—but only if you give it examples of both.


How Many Samples Do You Need?

The answer depends on your use case and the complexity of your voice.

Minimum Viable Sample Set

For basic AI style training:

  • 5-10 samples covering 2-3 audience types
  • Total word count: 2,000-3,000 words
  • Contexts: Mix of email, documents, and messages

This gives AI enough data to identify your baseline patterns: sentence rhythm, punctuation habits, formality level, common phrases.

Recommended Sample Set

For comprehensive style analysis:

  • 15-25 samples covering 4-5 audience types
  • Total word count: 5,000-7,000 words
  • Contexts: Full spectrum of your communication

This allows AI to map your range: how you shift between audiences, how you adapt to different purposes, where your boundaries are.

Advanced Sample Set

For capturing nuanced, context-aware voice:

  • 30-50 samples covering 5+ audience types
  • Total word count: 8,000-15,000 words
  • Contexts: Everything you write, including niche situations

This is the level where AI can replicate subtle shifts—like how you write differently to new clients versus long-term partners, or how your urgency affects your sentence length.

The 5-per-section rule: If your AI analysis tool asks for samples by category (email, reports, messages), aim for at least 5 samples per category. Why five? Because pattern recognition needs multiple data points to separate signal from noise. One sample might be an outlier. Five samples reveal the pattern.
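The 5-per-section rule is easy to verify programmatically once you've labeled your samples. A minimal sketch that flags under-covered categories; the category names are illustrative:

```python
from collections import Counter

def coverage_gaps(labels, categories, min_per_category=5):
    """Return how many more samples each under-covered category needs."""
    counts = Counter(labels)
    return {c: min_per_category - counts[c]
            for c in categories if counts[c] < min_per_category}

# One label per sample you've already gathered:
labels = ["email"] * 6 + ["report"] * 2 + ["message"] * 5
print(coverage_gaps(labels, ["email", "report", "message"]))  # {'report': 3}
```

Passing the full category list (rather than inferring it from the labels) ensures a category with zero samples still shows up as a gap.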


Sample Quality Checklist: Before You Submit

Before uploading samples to an AI analysis tool, run this checklist:

  • Authenticity: I wrote this myself, not collaboratively
  • Representative: This sounds like how I normally write
  • Substantial: At least 200 words, ideally 300-500+
  • Recent: Written in the last 1-2 years
  • Context-clear: I can label the audience, purpose, and medium
  • No AI content: This wasn't written or heavily edited by AI
  • No heavy editing: This wasn't significantly revised by someone else
  • Typical communication: I write like this regularly, not just once

If a sample fails two or more criteria, skip it. Better to have fewer high-quality samples than more low-quality ones.


Common Sample Selection Mistakes

Mistake 1: Prioritizing Pride Over Representativeness

You wrote a brilliant proposal that won a major client. It's your best work. But it's also 10% more formal than your usual style, and a colleague edited the structure.

The trap: You want AI to learn from your "best" work. But "best" often means polished, edited, and atypical. AI should learn from your normal work—the version of you that people encounter every day.

Fix: Include the brilliant proposal if it's truly representative. If it's an outlier, skip it or balance it with 5-7 typical samples.

Mistake 2: Sampling Only One Context

You spend 60% of your writing time on emails to clients. So you upload 20 client emails and call it done.

The trap: AI learns client-you. It doesn't know peer-you, team-you, or executive-you exist. When you ask it to draft an internal memo, it applies the client-facing formality—which feels wrong.

Fix: Even if one context dominates, include at least 3-5 samples from secondary contexts. This teaches AI that you have range.

Mistake 3: Including Recent But Unrepresentative Work

You just wrote a formal letter to a legal team—something you do once a year. It's recent, so you include it.

The trap: Recency doesn't equal representativeness. That legal letter is authentic but not typical. It skews AI toward formality that doesn't match your baseline.

Fix: Ask yourself: "Do I write like this regularly?" If the answer is "no" or "rarely," skip it—even if it's recent.

Mistake 4: Overthinking It

You've spent two hours analyzing whether a 400-word email is "authentic enough" and "truly representative." You've reread it five times. You're paralyzed.

The trap: Perfectionism kills progress. No sample is perfect. AI needs volume more than perfection.

Fix: Use the 10-second test. Read the sample. Does it sound like you? If yes, include it. If no or unsure, skip it. Move on. You need 15-20 samples, not 15-20 perfectly curated museum pieces.


Advanced Tips by Sample Type

Email Samples

What works:

  • Explanatory emails where you walked someone through a concept
  • Follow-ups that required context and reasoning
  • Difficult conversations handled via email
  • Longer project updates (300+ words)

What doesn't:

  • Two-line "Thanks!" replies
  • Forwarded content with a one-sentence intro
  • Automated responses or templates
  • Email threads where you wrote less than 50% of the content

Pro tip: Look for emails where the recipient replied with clarifying questions or appreciation. Those indicate you were communicating naturally and effectively—the gold standard for samples.

Report Samples

What works:

  • Your submitted first draft (before edits)
  • Internal reports with minimal editing
  • Documentation you wrote solo
  • Project proposals in your voice

What doesn't:

  • Final published versions edited by others
  • Collaborative sections you can't isolate
  • Template-based reports where only data changed
  • Heavily formatted tables and charts

Pro tip: If you can't find your pre-edit draft, check version history in Google Docs or track changes in Word. Your original version is in there somewhere.

Message Samples (Slack/Teams)

What works:

  • Threads where you explained a decision or approach
  • Long-form updates posted to channels
  • Technical explanations to teammates
  • Conversations where you dominated the dialogue (60%+ of text)

What doesn't:

  • Quick back-and-forth banter
  • Single-sentence acknowledgments
  • GIFs and emoji-only responses
  • Threads with 10 people where your contribution was minimal

Pro tip: Export entire threads where you were the primary voice. Label them clearly ("Slack to engineering team re: API design decision"). Context helps AI understand the casual register.


What About Multilingual Samples?

If you write professionally in multiple languages, sample selection gets more complex. Your English voice might be direct and concise while your Japanese voice is more formal and circumspect.

The principle: AI needs samples in each language you want it to replicate.

How to sample multilingually:

  1. Separate by language: Provide distinct sample sets for each language
  2. Match context across languages: If you provide executive emails in English, provide executive emails in Japanese too
  3. Expect different baseline styles: Your formality might shift between languages—that's normal and should be captured

Word count targets per language:

  • If you write equally in both languages: 5,000-7,000 words per language
  • If one language is secondary: 3,000-4,000 words for the secondary language

For more on multilingual style analysis, see our bilingual professionals guide.


The Bottom Line

Sample quality determines AI quality. No algorithm can extract your authentic voice from inauthentic samples.

The best samples are the ones you forgot you wrote—the emails you sent without overthinking, the reports you drafted in your natural flow, the Slack messages where you explained something clearly because the team needed to understand.

These aren't your highlight reels. They're your game footage. They capture how you actually communicate when you're solving real problems for real people.

That's the version of you AI needs to learn.


Ready to Analyze Your Writing?

Selecting good samples is half the work. The other half is analysis—extracting the patterns from those samples and turning them into AI instructions.

You can do this manually (expect 3-5 hours) or use a systematic extraction tool like My Writing Twin.

Option 1: DIY Sample Analysis

Read through your selected samples and document:

  • Common sentence structures and rhythms
  • Punctuation patterns (do you use em-dashes frequently?)
  • Opening and closing signatures
  • Phrases you use often
  • Phrases you avoid
  • Formality shifts between audiences

Then build that into custom instructions for ChatGPT or system prompts for Claude.
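If you go the DIY route, a short script can give you a head start on the punctuation and rhythm items above. A minimal sketch; the metrics are illustrative, not a standard style fingerprint:

```python
import re
from collections import Counter

def style_stats(text):
    """Rough first-pass stats: sentence length, dash habit, frequent words."""
    sentences = [s for s in re.split(r"[.!?]+\s+", text) if s.strip()]
    words = text.split()
    cleaned = [w.lower().strip(".,!?;:\"'()") for w in words]
    return {
        "avg_sentence_words": round(len(words) / max(len(sentences), 1), 1),
        "dashes_per_1000_words": round(1000 * text.count("—") / max(len(words), 1), 1),
        "top_words": [w for w, _ in Counter(c for c in cleaned if c).most_common(5)],
    }
```

Run it over each sample, compare the numbers across contexts, and describe what you see in plain language in your custom instructions ("short, punchy sentences in Slack; 20-word average in reports").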

Option 2: Automated Extraction

My Writing Twin analyzes your samples systematically, extracting:

  • Sentence rhythm patterns
  • Punctuation fingerprint
  • Vocabulary preferences and anti-patterns
  • Formality spectrum by context
  • Transition architecture
  • Opening/closing signatures

You upload your samples (following the guidelines in this article), and we generate a Master Prompt—the complete instruction set that makes any AI write like you. Paste it into ChatGPT, Claude, Gemini, or any AI tool. Your samples become your Writing Twin.

Start your free Writing DNA Snapshot →