Why the mail history matters (and what 1y actually means)
The 1y "backpop scope" is not a retention ceiling. Michael's correction 2026-06-14, verbatim: "1y meant test the oauth connections for stability by polling a large date range." A year-wide IMAP/Graph pull is a stability probe for the OAuth lane on CT102 — it exercises pagination, throttle handling, and credential refresh against a meaningfully deep window. It is not a statement about how long mail is kept.
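This is a minimal sketch of what such a probe exercises. It assumes a paged mail source that can throttle the caller or invalidate its token partway through a run. `FakeMailbox`, `Throttled`, `TokenExpired` and `run_probe` are hypothetical names. The fake stands in for a real IMAP or Microsoft Graph client so the control flow runs offline. The point is that pagination, backoff and token refresh all have to hold up across one long pull, and a shallow window rarely reaches those cases.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


class Throttled(Exception):
    """Server asked the client to slow down (for example an HTTP 429)."""


class TokenExpired(Exception):
    """Access token is no longer valid; the client must refresh it."""


@dataclass
class ProbeStats:
    messages: int = 0
    pages: int = 0
    throttles: int = 0
    refreshes: int = 0


@dataclass
class FakeMailbox:
    """Hypothetical offline stand-in for an IMAP/Graph client.

    Serves `total` messages in pages, throttles on the listed call numbers,
    and rotates its valid token on the listed call numbers.
    """
    total: int
    page_size: int = 50
    throttle_on_calls: Tuple[int, ...] = (3,)
    expire_on_calls: Tuple[int, ...] = (5,)
    calls: int = 0
    valid_token: str = "t0"

    def refresh_token(self) -> str:
        return self.valid_token

    def fetch(self, token: str, cursor: int) -> Tuple[List[int], Optional[int]]:
        self.calls += 1
        if self.calls in self.throttle_on_calls:
            raise Throttled()
        if self.calls in self.expire_on_calls:
            self.valid_token = f"t{self.calls}"  # server-side token rotation
        if token != self.valid_token:
            raise TokenExpired()
        end = min(cursor + self.page_size, self.total)
        next_cursor = end if end < self.total else None
        return list(range(cursor, end)), next_cursor


def run_probe(box: FakeMailbox, sleep: Callable[[float], None],
              max_retries: int = 5) -> ProbeStats:
    """Walk every page, backing off on throttles and refreshing stale tokens."""
    stats = ProbeStats()
    token = box.refresh_token()
    cursor: Optional[int] = 0
    while cursor is not None:
        for attempt in range(max_retries):
            try:
                batch, next_cursor = box.fetch(token, cursor)
                break
            except Throttled:
                stats.throttles += 1
                sleep(2 ** attempt)  # exponential backoff: 1s, 2s, 4s, ...
            except TokenExpired:
                stats.refreshes += 1
                token = box.refresh_token()
        else:
            raise RuntimeError(f"gave up after {max_retries} attempts at cursor {cursor}")
        stats.pages += 1
        stats.messages += len(batch)
        cursor = next_cursor
    return stats
```

Against a real lane, the counters are what matter. Suppose a year-wide pull finishes with the expected message count and has survived at least one throttle and one refresh along the way. That is the kind of evidence the probe is after.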
The retention model is single-surface, append-only, on CT102:
- `/mnt/data/hinata/mail-archive/` (bind-mounted into CT102) is the canonical archive. Everything the poller has ever fetched lives here indefinitely, idempotent by SHA-256 filename.
- The Mac local archive (`~/Sandpit/hinata/resources/email-poller/`, 14,682 messages, Oct 2016 → Jun 2026) is moving into that same surface during the consolidation (per `how-to_mail-poller-consolidation-and-backpop.md` §1.1). It is not held in a parallel "historical" tree. There is no two-surface model.
- Streaming continues the history. Every 15-minute CT102 poll appends; nothing is rotated or pruned.
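One way to get "append-only, idempotent by SHA-256 filename" is content addressing: each message is named by the hash of its raw bytes, and nothing is ever overwritten. This is a minimal sketch that assumes one raw message per file in a flat directory of `.eml` files. Both are assumptions, and the real poller's layout may differ. `archive_message` is a hypothetical helper, not the poller's actual code.

```python
import hashlib
from pathlib import Path


def archive_message(raw: bytes, archive_root: Path) -> Path:
    """Store one raw message under its SHA-256 name.

    Re-fetching the same bytes is a no-op, so overlapping polls and
    backfills never duplicate or overwrite a stored message.
    """
    archive_root.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(raw).hexdigest()
    target = archive_root / f"{digest}.eml"
    if not target.exists():
        # Write under a temporary name, then rename, so a crash mid-write
        # never leaves a truncated file under the final name.
        tmp = archive_root / f"{digest}.tmp"
        tmp.write_bytes(raw)
        tmp.replace(target)
    return target
```

The name is derived from the content, so the 15-minute poll and the 1y pull can overlap freely. Whichever writes a message first wins, and the other write is skipped. Nothing in the sketch deletes anything.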
That resolves what looked like a tension in this file's earlier draft. Full-corpus consumers (Heimerdinger, Bulma's HMRC lookback) and rolling-stream consumers (Zuko, Madara) are served by the same archive.
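This is a minimal sketch of "same archive, different windows". A corpus consumer asks for everything, and a stream consumer asks for a trailing window over the same list. `ArchivedMessage` and `window` are hypothetical. The sketch assumes each archived message carries a receive timestamp. The example messages are made up, apart from the Oct 2016 start month.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple, Optional


class ArchivedMessage(NamedTuple):
    received_at: datetime
    sha256: str  # archive filename stem


def window(messages: Iterable[ArchivedMessage], now: datetime,
           days: Optional[int] = None) -> List[ArchivedMessage]:
    """Return messages oldest-first.

    days=None gives the whole archive (corpus consumers such as Heimerdinger);
    otherwise only the trailing `days` up to `now` (stream consumers, the
    30-day digest). Both views come from one source; there is no second store.
    """
    if days is None:
        return sorted(messages)
    cutoff = now - timedelta(days=days)
    return sorted(m for m in messages if cutoff <= m.received_at <= now)


archive = [
    ArchivedMessage(datetime(2016, 10, 3), "aaa"),
    ArchivedMessage(datetime(2026, 5, 1), "bbb"),
    ArchivedMessage(datetime(2026, 6, 10), "ccc"),
]
now = datetime(2026, 6, 14)
full_corpus = window(archive, now)          # all three messages
digest_30d = window(archive, now, days=30)  # only the 2026-06-10 message
```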
Downstream consumers
| Consumer | Reference / evidence | What they need from mail | Served by CT102 archive? | Notes |
|---|---|---|---|---|
| Heimerdinger (NLP embeddings + classifier) | reference_heim-embeddings-spec.md line 75; understanding_email-intelligence-evolution-plan.md Phase 0–1 | Full corpus for BERTopic one-time fit + classifier training + incremental refits | Yes | Reads /mnt/data/hinata/mail-archive/. Incremental embeddings as new mail lands; periodic full refit against the whole tree. |
| Bulma (receipts / statements / tax) | bulma-finances context; dense T212/NatWest/Amazon/Trainline traffic in 2026-06 sample; HMRC lookback (4–6y non-fraud, 20y fraud) | Reconstruct spend, locate statements, evidence purchases, tax-year audit | Yes | The full corpus on CT102 covers current-year reconstruction AND the multi-year HMRC lookback. No tax-year escape hatch needed. |
| Zuko (job search) | zuko-career context line 83; Tickblaze/Goodman Masson/Thu Ha Hong/LinkedIn density in sample | Recruiter thread continuity, application timeline, salary benchmarks | Yes | Recency-biased; reads the most recent N months from the same archive. |
| Madara (surveillance / threat) | Sample shows 12+ Microsoft "sign-in detected", Google security alerts, Kraken KYC, LMCU threads | Phishing patterns, breach notifications, account-takeover signals | Yes | Live-signal consumer; reads the tail of the stream. |
| Kurapika / Lelouch (networking + introductions) | colonel_makima-flow_lelouch-networking_context.md; sample shows academia mentions, Ollie Garlick coach intros | Connection pipeline, intro thread state | Yes | Recency-biased; same archive. |
| Dragonite (calendar reconstruction) | supreme-court/runtime/calendar-architecture.md; Trainline booking, jazz-jam confirmations | Meeting confirmations, travel itineraries, RSVPs | Yes | Reads the same archive. |
| Brook / Hange (cultural signal) | Sample shows Spotify (Kelela), MIDI Music, Neon Naked, LCCM events | Newsletter / cultural-tempo extraction | Yes | Recent-window consumer; same archive. |
| Mufasa (legacy / estate) | No mail grep hit | Important contracts, estate-relevant correspondence | Yes | Deep retrieval served by the full corpus on CT102. |
| Itachi (credentials / 2FA evidence) | Itachi context — credential discipline; dense Microsoft / Google security mails | Account-change audit trail, 2FA confirmations | Yes | Recency-biased; same archive. |
| Iroh / Urahara (digest + contingency) | mail-30d-digest.py; my own digest production | 30-day rolling for behaviour-change loop; priority-routing flags for the dashboard | Yes | Digest window is a moving 30d slice of the same archive. |
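Bulma's row rests on arithmetic worth making explicit. The archive can only answer a lookback as far back as its oldest message, and the Mac archive puts that at Oct 2016. This is a minimal sketch of the check. It assumes a 2026-06-14 reference date and treats the oldest message as 2016-10-01, because the source gives only the month. `years_before` and `lookback_covered` are hypothetical helpers. The check concerns the age of the oldest mail ever received, not retention, since nothing is pruned.

```python
from datetime import date


def years_before(d: date, years: int) -> date:
    """Same calendar day `years` earlier; 29 Feb falls back to 28 Feb."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def lookback_covered(today: date, years: int, archive_start: date) -> bool:
    """True if the oldest archived message is no later than the lookback cutoff."""
    return archive_start <= years_before(today, years)


ARCHIVE_START = date(2016, 10, 1)  # assumption: source says only "Oct 2016"
TODAY = date(2026, 6, 14)          # assumption: date of the 1y correction

for label, years in [("non-fraud, 4y", 4), ("non-fraud, 6y", 6), ("fraud, 20y", 20)]:
    cutoff = years_before(TODAY, years)
    print(f"{label}: cutoff {cutoff}, covered={lookback_covered(TODAY, years, ARCHIVE_START)}")
```

Under those assumptions, the 4y and 6y cutoffs (2022-06-14 and 2020-06-14) fall inside the archive. The 20y fraud cutoff (2006-06-14) predates the oldest archived message, so mail can evidence only part of that window.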
Top 3 consumers by strategic value:
- Heimerdinger — the only consumer that uses mail as a corpus rather than as a stream. Drives the email-intelligence end-state (Phase 0–4 of the evolution plan).
- Bulma — financial reconstruction + HMRC lookback. Full corpus on CT102 covers it directly.
- Zuko — active job hunt (£80k+ target is Active Project 1).
Pattern-of-mail observations
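Sample basis: /Users/nnamdi/Sandpit/hinata/mail-archive/bodies/2026-06/ directory listing, June 2026 fragment (~140 messages visible). Categories are listed with approximate density. Shares are rough, and the categories overlap (Tickblaze and Sports Direct each appear in two rows), so they sum to slightly over 100%: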
| Category | Density | Criticality | Notes |
|---|---|---|---|
| Account security / 2FA (Microsoft, Google) | Dominant — ~25% of June fragment | Low individually, high in aggregate (breach-detection baseline) | Microsoft account team alone fires 15+/month — noise floor that Heimerdinger must learn to suppress |
| Marketing / promo (Adidas, StockX, Sports Direct, BT, Saguaro, Lumosity, Hand-Picked Hotels) | High — ~30% | Low | Direct SERVER-DELETE candidates per reference_email-sweep-rules-review.md |
| Newsletters / cultural (Patreon, Backthen, Tickblaze, MIDI Music, Neon Naked, LCCM, Spotify) | High — ~20% | Medium | Brook/Hange signal but heavy noise — Tickblaze alone fires 7+ in the fragment |
| Receipts / commerce (Amazon, Sports Direct, Evri tracking, Trainline) | Steady — ~10% | High for Bulma | Bulma's spend ground truth; UGREEN hard drive purchase appears 4× across order → dispatch → out-for-delivery → delivered |
| Financial transactional (T212, NatWest statement, Kraken KYC, London Mutual Credit Union) | Steady — ~5% | Critical | Lowest volume, highest signal. T212 withdrawals pair to Hashirama; NatWest statement is monthly anchor for Bulma. London Mutual Credit Union thread closed 2026-06-14 — loan agreement sent to Bulma for debt-datapoint assimilation. Bulma now owns the debt datapoint. |
| Recruiter / jobs (Tickblaze, Goodman Masson, Thu Ha Hong/Softwire, LinkedIn Spotify Analytics Engineer II) | Moderate — ~5% | Critical for Zuko | Active pipeline. |
| Music / community events (jazz-jams Vortex, love-to-drum, guitar-social, Sean Wilson tutorial) | Moderate — ~5% | Medium for Squidward / Killua | Wedding-musician adjacency; live community signal |
| Housing (Spareroom, Guardian Care) | Moderate — ~5% | Medium | Active live signal — Spareroom fires 2–3×/week |
Density vs criticality inversion: the densest categories (security noise, marketing) carry the lowest signal-per-message. The sparsest (compliance, financial transactional, recruiter) carry the highest. The CT102 archive holds all of it — Heimerdinger needs the dense noise distribution to learn suppression; the sparse high-criticality threads are where Bulma/Zuko/Madara/Iroh draw signal.
Verdict
ENDORSE the locked plan. Single canonical archive on CT102 at /mnt/data/hinata/mail-archive/, append-only, full history preserved (Mac archive folds in during consolidation). Every downstream consumer above reads from this one surface. The 1y window in the dispatch is an OAuth stability probe — retention is full history.
Rationale collapsed to three points:
- One surface, all consumers. Heimerdinger trains incrementally against the full tree. Bulma's HMRC lookback resolves against the full tree. Recency-biased consumers (Zuko, Madara, Itachi, Dragonite, Brook/Hange) read the recent tail of the same tree. No fork.
- The Mac archive moves in; it does not stay parallel. Per the consolidation how-to §1.1, `~/Sandpit/hinata/resources/email-poller/` migrates to the CT102-mounted archive under `_mac-import/`, with a sha1→sha256 re-hash on the way in. The deep history is preserved by moving, not by being held in a separate "historical" tree.
- OAuth stability is the right thing to probe with a 1y pull. Pagination, refresh-token rotation, throttle handling, and credential-cache discipline are the surfaces a year-wide pull exercises. That probe is independent of retention policy.
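This is a minimal sketch of the `_mac-import/` step. It assumes each file in the Mac tree is one raw message named by the SHA-1 hex digest of its bytes. That is an assumption; the consolidation how-to is the authority on the real layout. `migrate_mac_archive` is hypothetical. It checks each file's old SHA-1 name before trusting it and writes the file under its new SHA-256 name. It returns the sha1→sha256 map so anything that referenced the old names can be rewritten.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Dict


def migrate_mac_archive(src: Path, archive_root: Path) -> Dict[str, str]:
    """Copy SHA-1-named message files into <archive_root>/_mac-import/ under SHA-256 names.

    Hypothetical layout: every non-dot regular file under `src` is one raw
    message, named by the SHA-1 hex digest of its bytes (suffix kept as-is).
    """
    dest = archive_root / "_mac-import"
    dest.mkdir(parents=True, exist_ok=True)
    mapping: Dict[str, str] = {}
    for path in sorted(p for p in src.rglob("*") if p.is_file()):
        if path.name.startswith("."):
            continue  # skip dotfiles such as .DS_Store
        raw = path.read_bytes()
        sha1 = hashlib.sha1(raw).hexdigest()
        if path.stem != sha1:
            raise ValueError(f"{path} does not match its SHA-1 name")
        sha256 = hashlib.sha256(raw).hexdigest()
        target = dest / f"{sha256}{path.suffix}"
        if not target.exists():  # idempotent: re-running the import is a no-op
            shutil.copy2(path, target)
        mapping[sha1] = sha256
    return mapping
```

The sketch copies rather than moves. That way the Mac tree can be checked against the returned map before anything is removed.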
Cross-links
- Commanders affected: Heimerdinger (training corpus), Bulma (tax lookback + debt datapoint), Zuko (recruiter timeline), Madara (surveillance), Itachi (credentials), Dragonite (calendar), Iroh/Urahara (digest + surfacing)
- Projects gated: email-intelligence evolution plan (Phase 1 classifier training against full CT102 corpus); £80k+ job search (Zuko's recruiter timeline served by the recent tail)
- Doctrine touchpoints: `supreme-court/runtime/email-deletion-heuristic.md`; `supreme-court/runtime/container-storage-strategy.md`; `supreme-court/runtime/calendar-architecture.md`
- Coordination: Trunks owns the technical backpop plan (`how-to_mail-poller-consolidation-and-backpop.md`). Iroh owns the dashboard surfacing spec (`how-to_mail-poller-context-advisory.md`).
- Closed threads: London Mutual Credit Union (loan agreement → Bulma debt-datapoint assimilation, 2026-06-14).