
Why the mail history matters (and what 1y actually means)

The 1y "backpop scope" is not a retention ceiling. Michael's correction 2026-06-14, verbatim: "1y meant test the oauth connections for stability by polling a large date range." A year-wide IMAP/Graph pull is a stability probe for the OAuth lane on CT102 — it exercises pagination, throttle handling, and credential refresh against a meaningfully deep window. It is not a statement about how long mail is kept.
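As a rough illustration of what a year-wide stability probe exercises, the sketch below walks a 365-day range in 30-day slices and retries with exponential backoff when the server throttles. It is not the poller's actual code: `fetch`, `Throttled`, the 30-day slice size, and the retry limits are all hypothetical stand-ins, and credential refresh is assumed to happen inside whatever `fetch` wraps.

```python
import time
from datetime import date, timedelta
from typing import Callable, Dict, Iterator, Tuple


class Throttled(Exception):
    """Hypothetical signal that the server asked the client to slow down."""


def windows(end: date, days: int = 365, step: int = 30) -> Iterator[Tuple[date, date]]:
    """Yield contiguous (start, stop) slices covering the last `days` days, oldest first."""
    start = end - timedelta(days=days)
    while start < end:
        stop = min(start + timedelta(days=step), end)
        yield start, stop
        start = stop


def probe(
    fetch: Callable[[date, date], int],  # hypothetical: returns messages fetched for a slice
    end: date,
    max_retries: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, int]:
    """Pull every slice once, backing off on throttling, and report simple counters."""
    stats = {"windows": 0, "messages": 0, "retries": 0}
    for start, stop in windows(end):
        for attempt in range(max_retries + 1):
            try:
                stats["messages"] += fetch(start, stop)
                break
            except Throttled:
                if attempt == max_retries:
                    raise  # the lane is not stable enough; surface the failure
                stats["retries"] += 1
                sleep(base_delay * 2 ** attempt)  # exponential backoff
        stats["windows"] += 1
    return stats
```

The counters are the point of the probe: a clean run over the whole range with few retries says something about the OAuth lane, and nothing about how long mail is kept.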

The retention model is single-surface, append-only, on CT102:

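  • /mnt/data/hinata/mail-archive/ (bind-mounted into CT102) is the canonical archive. Everything the poller has ever fetched lives here indefinitely. Writes are idempotent because each message is filed under its SHA-256 hash, so re-fetching the same message does not create a duplicate.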
  • The Mac local archive (~/Sandpit/hinata/resources/email-poller/, 14,682 messages, Oct 2016 → Jun 2026) is moving into that same surface during the consolidation (per how-to_mail-poller-consolidation-and-backpop.md §1.1) — not held in a parallel "historical" tree. There is no two-surface model.
  • Streaming continues the history. Every 15-minute CT102 poll appends; nothing is rotated or pruned.
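The append-only, idempotent-by-hash behaviour described above can be sketched as follows. This is a minimal illustration, not the poller's implementation: the function name, the `.eml` suffix, the flat directory layout, and hashing the raw message bytes are all assumptions.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Tuple


def archive_message(raw: bytes, root: Path) -> Tuple[Path, bool]:
    """Store one raw message under its SHA-256 digest; return (path, newly_written)."""
    digest = hashlib.sha256(raw).hexdigest()
    target = root / f"{digest}.eml"  # assumption: ".eml" suffix, flat layout
    if target.exists():
        return target, False  # same bytes already archived: nothing to do
    root.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then rename, so a crash
    # never leaves a half-written message under its final name.
    fd, tmp = tempfile.mkstemp(dir=root, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(raw)
        os.replace(tmp, target)
    except BaseException:
        Path(tmp).unlink(missing_ok=True)
        raise
    return target, True
```

Because the filename is derived from the content, a repeated poll that sees the same message is a no-op, and nothing in the write path ever deletes or overwrites an existing file.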

That collapses what looked like a tension in this file's earlier draft: full corpus (consumers like Heimerdinger, Bulma's lookback) and rolling stream (consumers like Zuko, Madara) are served by the same archive.

Downstream consumers

| Consumer | Reference / evidence | What they need from mail | Served by CT102 archive? | Notes |
|---|---|---|---|---|
| Heimerdinger (NLP embeddings + classifier) | reference_heim-embeddings-spec.md line 75; understanding_email-intelligence-evolution-plan.md Phase 0–1 | Full corpus for BERTopic one-time fit + classifier training + incremental refits | Yes | Reads /mnt/data/hinata/mail-archive/. Incremental embeddings as new mail lands; periodic full refit against the whole tree. |
| Bulma (receipts / statements / tax) | bulma-finances context; dense T212/NatWest/Amazon/Trainline traffic in 2026-06 sample; HMRC lookback (4–6y non-fraud, 20y fraud) | Reconstruct spend, locate statements, evidence purchases, tax-year audit | Yes | The full corpus on CT102 covers current-year reconstruction AND the multi-year HMRC lookback. No tax-year escape hatch needed. |
| Zuko (job search) | zuko-career context line 83; Tickblaze/Goodman Masson/Thu Ha Hong/LinkedIn density in sample | Recruiter thread continuity, application timeline, salary benchmarks | Yes | Recency-biased; reads the most recent N months from the same archive. |
| Madara (surveillance / threat) | Sample shows 12+ Microsoft "sign-in detected", Google security alerts, Kraken KYC, LMCU threads | Phishing patterns, breach notifications, account-takeover signals | Yes | Live-signal consumer; reads the tail of the stream. |
| Kurapika / Lelouch (networking + introductions) | colonel_makima-flow_lelouch-networking_context.md; sample shows academia mentions, Ollie Garlick coach intros | Connection pipeline, intro thread state | Yes | Recency-biased; same archive. |
| Dragonite (calendar reconstruction) | supreme-court/runtime/calendar-architecture.md; Trainline booking, jazz-jam confirmations | Meeting confirmations, travel itineraries, RSVPs | Yes | Reads the same archive. |
| Brook / Hange (cultural signal) | Sample shows Spotify (Kelela), MIDI Music, Neon Naked, LCCM events | Newsletter / cultural-tempo extraction | Yes | Recent-window consumer; same archive. |
| Mufasa (legacy / estate) | No mail grep hit | Important contracts, estate-relevant correspondence | Yes | Deep retrieval served by the full corpus on CT102. |
| Itachi (credentials / 2FA evidence) | Itachi context (credential discipline); dense Microsoft / Google security mails | Account-change audit trail, 2FA confirmations | Yes | Recency-biased; same archive. |
| Iroh / Urahara (digest + contingency) | mail-30d-digest.py; my own digest production | 30-day rolling for behaviour-change loop; priority-routing flags for the dashboard | Yes | Digest window is a moving 30d slice of the same archive. |
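To make the "same archive, different window" idea concrete, here is a hedged sketch of how a recency-biased consumer might take its slice. It assumes file modification time approximates arrival time, which the source does not state; a real reader would more likely use the message Date header or an index. The function name is hypothetical.

```python
import time
from pathlib import Path
from typing import Iterator, Optional


def recent_messages(root: Path, days: int, now: Optional[float] = None) -> Iterator[Path]:
    """Yield archive files modified within the last `days` days, newest first."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    hits = []
    for path in root.rglob("*"):
        if path.is_file():
            mtime = path.stat().st_mtime  # assumption: mtime ~ arrival time
            if mtime >= cutoff:
                hits.append((mtime, path))
    for _, path in sorted(hits, reverse=True):
        yield path
```

Under this sketch a digest-style consumer would call `recent_messages(root, 30)` while a corpus consumer simply walks the whole tree; neither needs a separate copy of the mail.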

Top 3 consumers by strategic value:

  1. Heimerdinger — the only consumer that uses mail as a corpus rather than as a stream. Drives the email-intelligence end-state (Phase 0–4 of the evolution plan).
  2. Bulma — financial reconstruction + HMRC lookback. Full corpus on CT102 covers it directly.
  3. Zuko — active job hunt (£80k+ target is Active Project 1).

Pattern-of-mail observations

Sample basis: /Users/nnamdi/Sandpit/hinata/mail-archive/bodies/2026-06/ directory listing, June 2026 fragment (~140 messages visible). Categories ranked by density:

| Category | Density | Criticality | Notes |
|---|---|---|---|
| Account security / 2FA (Microsoft, Google) | Dominant, ~25% of June fragment | Low individually, high in aggregate (breach-detection baseline) | Microsoft account team alone fires 15+/month: the noise floor that Heimerdinger must learn to suppress |
| Marketing / promo (Adidas, StockX, Sports Direct, BT, Saguaro, Lumosity, Hand-Picked Hotels) | High, ~30% | Low | Direct SERVER-DELETE candidates per reference_email-sweep-rules-review.md |
| Newsletters / cultural (Patreon, Backthen, Tickblaze, MIDI Music, Neon Naked, LCCM, Spotify) | High, ~20% | Medium | Brook/Hange signal but heavy noise; Tickblaze alone fires 7+ in the fragment |
| Receipts / commerce (Amazon, Sports Direct, Evri tracking, Trainline) | Steady, ~10% | High for Bulma | Bulma's spend ground truth; UGREEN hard drive purchase appears 4× across order → dispatch → out-for-delivery → delivered |
| Financial transactional (T212, NatWest statement, Kraken KYC, London Mutual Credit Union) | Steady, ~5% | Critical | Lowest volume, highest signal. T212 withdrawals pair to Hashirama; NatWest statement is monthly anchor for Bulma. London Mutual Credit Union thread closed 2026-06-14: loan agreement sent to Bulma for debt-datapoint assimilation. Bulma now owns the debt datapoint. |
| Recruiter / jobs (Tickblaze, Goodman Masson, Thu Ha Hong/Softwire, LinkedIn Spotify Analytics Engineer II) | Moderate, ~5% | Critical for Zuko | Active pipeline. |
| Music / community events (jazz-jams Vortex, love-to-drum, guitar-social, Sean Wilson tutorial) | Moderate, ~5% | Medium for Squidward / Killua | Wedding-musician adjacency; live community signal |
| Housing (Spareroom, Guardian Care) | Moderate, ~5% | Medium | Active live signal; Spareroom fires 2–3×/week |

Density vs criticality inversion: the densest categories (security noise, marketing) carry the lowest signal-per-message. The sparsest (financial transactional, including compliance mail such as Kraken KYC, and recruiter) carry the highest. The CT102 archive holds all of it: Heimerdinger needs the dense noise distribution to learn suppression, while the sparse high-criticality threads are where Bulma, Zuko, Madara, and Iroh draw signal.

Verdict

ENDORSE the locked plan. Single canonical archive on CT102 at /mnt/data/hinata/mail-archive/, append-only, full history preserved (Mac archive folds in during consolidation). Every downstream consumer above reads from this one surface. The 1y window in the dispatch is an OAuth stability probe — retention is full history.

Rationale collapsed to three points:

  1. One surface, all consumers. Heimerdinger trains incrementally against the full tree. Bulma's HMRC lookback resolves against the full tree. Recency-biased consumers (Zuko, Madara, Itachi, Dragonite, Brook/Hange) read the recent tail of the same tree. No fork.
  2. The Mac archive moves in, it does not stay parallel. Per the consolidation how-to §1.1, ~/Sandpit/hinata/resources/email-poller/ migrates to the CT102-mounted archive under _mac-import/ with sha1→sha256 re-hash on the way in. The deep history is preserved by virtue of moving — not by being held in a separate "historical" tree.
  3. OAuth stability is the right thing to probe with a 1y pull. Pagination, refresh-token rotation, throttle handling, and credential-cache discipline are the surfaces a year-wide pull exercises. That probe is independent of retention policy.
  • Commanders affected: Heimerdinger (training corpus), Bulma (tax lookback + debt datapoint), Zuko (recruiter timeline), Madara (surveillance), Itachi (credentials), Dragonite (calendar), Iroh/Urahara (digest + surfacing)
  • Projects gated: email-intelligence evolution plan (Phase 1 classifier training against full CT102 corpus); £80k+ job search (Zuko's recruiter timeline served by the recent tail)
  • Doctrine touchpoints: supreme-court/runtime/email-deletion-heuristic.md; supreme-court/runtime/container-storage-strategy.md; supreme-court/runtime/calendar-architecture.md
  • Coordination: Trunks owns the technical backpop plan (how-to_mail-poller-consolidation-and-backpop.md). Iroh owns the dashboard surfacing spec (how-to_mail-poller-context-advisory.md).
  • Closed threads: London Mutual Credit Union (loan agreement → Bulma debt-datapoint assimilation, 2026-06-14).
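The Mac-archive fold-in from rationale point 2 (move under _mac-import/ with a sha1→sha256 re-hash) could look roughly like the sketch below. The authoritative procedure is the consolidation how-to §1.1, which is not reproduced here. This sketch assumes the Mac poller named each file by the SHA-1 of its content and that hashing raw file bytes is the right identity; the function name and counters are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Dict


def import_mac_archive(src: Path, archive_root: Path) -> Dict[str, int]:
    """Copy every file under `src` into archive_root/_mac-import/, renamed to its SHA-256."""
    dest = archive_root / "_mac-import"
    dest.mkdir(parents=True, exist_ok=True)
    counts = {"copied": 0, "skipped": 0, "sha1_mismatch": 0}
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        raw = path.read_bytes()
        # Assumption: the old filename stem is the SHA-1 of the content.
        # A mismatch is counted for review but the file is still imported.
        if path.stem != hashlib.sha1(raw).hexdigest():
            counts["sha1_mismatch"] += 1
        target = dest / (hashlib.sha256(raw).hexdigest() + path.suffix)
        if target.exists():
            counts["skipped"] += 1  # already imported: re-running is safe
            continue
        shutil.copy2(path, target)  # copy2 keeps the original timestamps
        counts["copied"] += 1
    return counts
```

Copying rather than moving keeps the source tree intact until the import counts have been checked, and the existence check makes a second run a no-op, in keeping with the append-only model.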