How to consolidate the mail-poller onto CT102 and backpop history

CT102 is the canonical mail-poller (Michael ruling, 2026-06-14, verbatim: "why would z2 host be poller. i want it containerised."). Anywhere else mail-polling code lives, it is a duplicate to verify and delete. This guide is the technical playbook Jimmy Neutron executes.

Locked decisions (Michael, 2026-06-14)

Critical reframing (Michael verbatim 2026-06-14): "misinformation occured in prompting. 1y meant test the oauth connections for stability by polling a large date range." The 1y window is an OAuth stability test, not a retention scope. Retention = full history, append-only. CT102 holds one continuous corpus: the moved Mac archive (~14.7K messages, Oct 2016 → Jun 2026) plus every subsequent live poll, in one intelligently-organised data plane. Heimerdinger reads only canonical CT102 paths; there is no separate _mac-import/ consumer surface.

| Question | Locked answer |
| --- | --- |
| Canonical instance | CT102, containerised; Z2-host poller retired entirely |
| Retention scope | FULL HISTORY, append-only. Nothing is discarded. CT102 holds the complete archive (Mac 10y dump + all live streaming) as ONE continuous corpus. Streaming polls append to the same canonical paths the Mac corpus lands in. |
| OAuth stability test (Gmail) | 1y poll-window (since 2025-06-14): stress-tests the OAuth connection across a sizeable date range during cutover. Not a retention boundary. |
| OAuth stability test (Outlook ×3) | 1y poll-window (since 2025-06-14): same stress test, per account, against Microsoft Graph |
| Mac local archive (~/Sandpit/hinata/resources/email-poller/) | MOVE into CT102's canonical data-object structure (not a separate _mac-import/ subtree). In-place re-hash sha1[:12] → sha256[:16] during the move, envelope normalised to the CT102 schema. Source path is read-only during the move. |
| Format homogenisation | JSON (single envelope schema per reference_mail-poller-z2.md). Mac archive imports converted on ingest. |
| Container + mount discipline | Nothing at filesystem root. Everything lives under dedicated container subdirs + bind-mounts. Michael verbatim: "nothing should be on root should be in dedicated container and mounts i think." |
| Heimerdinger read surface | CT102 canonical paths only. No Mac-import subtree, no historical-archive read separately. Incremental retrain on appended messages (not full-corpus retrain per poll). |
| Output surface | Iroh's dashboard: fully formed alerts + information packets per the existing scope. |
| Cutover ceremony | NONE, direct deploy. Michael: "why is a cutover window needed?" Ship it. |
| Live-poll / archive overlap | INTENTIONAL. Michael: "archive has all emails i just made an overlap so standardisation to live polls has history overlap of both data forms." Overlap is a verification window across formats, not a bug. Idempotent filenames make it safe. |

0. CT102 canonical data-object organisation (new)

Schema chosen: {account}/{YYYY}/{MM}/{sha256[:16]}.json — by account, then year, then month. Justification:

  1. Already canonical. This is the schema CT102's live mail-poller.py writes (per reference_mail-poller-z2.md § Archive Message Format). The Mac archive (currently sha1[:12]) re-hashes into the same shape — one tree, two ingest histories.
  2. Account-first matches consumer access patterns. Bulma reads gmail/ for receipts; Madara reads cross-account for security signal; Heim consumes the whole tree but partitions naturally on account during incremental fit.
  3. Year/month gives O(1) date-range queries without an external index. Iroh's dashboard and the OAuth stability test poll both window-query by month.
  4. By-thread was rejected — thread reconstruction is a derived view, not a storage primitive. Heim builds threads in its embedding space; raw storage stays flat-by-date.
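
A minimal sketch of how a writer could derive a path under this schema. The function name `canonical_path` and the use of the message's received date for `{YYYY}/{MM}` are assumptions for illustration, not the deployed poller's code; hashing the Message-ID follows the filename rule stated in §4.3.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import PurePosixPath

# Archive root as seen from inside CT102.
ARCHIVE_ROOT = PurePosixPath("/mail-archive")


def canonical_path(account: str, message_id: str, received: datetime) -> PurePosixPath:
    """Build {account}/{YYYY}/{MM}/{sha256[:16]}.json for one message.

    Assumption: `received` is the date that decides the year/month bucket.
    """
    digest = hashlib.sha256(message_id.encode("utf-8")).hexdigest()[:16]
    return (
        ARCHIVE_ROOT
        / account
        / f"{received.year:04d}"
        / f"{received.month:02d}"
        / f"{digest}.json"
    )


if __name__ == "__main__":
    when = datetime(2026, 6, 14, tzinfo=timezone.utc)
    print(canonical_path("gmail", "<example-id@mail.example>", when))
```

Because the path depends only on the account, the date bucket and the Message-ID, the same message arriving through any ingest stream resolves to the same file.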

Canonical paths under CT102 (all bind-mounted from /mnt/data/hinata/mail-archive/ on the Z2 host, never on filesystem root):

/mail-archive/                                 # CT102 bind of /mnt/data/hinata/mail-archive
├── gmail/{YYYY}/{MM}/{sha256[:16]}.json       # one continuous corpus: Mac history + live polls
├── hotmail-michael-asolo/{YYYY}/{MM}/{sha256[:16]}.json
├── outlook-michael-nnamah/{YYYY}/{MM}/{sha256[:16]}.json
├── outlook-n-nnamah/{YYYY}/{MM}/{sha256[:16]}.json
├── _journal/                                  # append-only ingest journal
│   ├── live-poll.jsonl                        # every successful live-poll write
│   ├── mac-import.jsonl                       # every rehash-mac-import.py write
│   └── rehash-state.json                      # idempotency tracker (Mac → CT102 progress)
└── _state/                                    # state cursors live inside the archive root
    ├── state.json                             # live-poll cursor (per account / per folder)
    └── state-oauth-test.json                  # 1y OAuth stability test cursor (separate from live)

_journal/ and _state/ use a leading underscore so they sort ahead of the lowercase account dirs in byte-order (C locale) listings and never collide with an account named journal or state. They are part of the canonical surface; Heimerdinger and Iroh's dashboard ignore them by glob (*/{YYYY}/{MM}/*.json only).
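
As an illustration of the consumer-side glob, the sketch below walks only message files and skips the underscore-prefixed siblings. The helper name `iter_message_files` is hypothetical, and the explicit underscore guard is an extra safety assumption on top of the glob described above.

```python
from pathlib import Path

# Account dir / four-digit year / two-digit month / any .json file.
MESSAGE_GLOB = "*/[0-9][0-9][0-9][0-9]/[0-9][0-9]/*.json"


def iter_message_files(archive_root):
    """Yield message envelopes under the archive root, in sorted order."""
    root = Path(archive_root)
    for path in sorted(root.glob(MESSAGE_GLOB)):
        account = path.relative_to(root).parts[0]
        # _journal/ and _state/ hold no year/month dirs by design, but
        # skip any underscore-prefixed top-level dir defensively.
        if account.startswith("_"):
            continue
        yield path


if __name__ == "__main__":
    for message in iter_message_files("/mail-archive"):
        print(message)
```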

The Mac archive does not land in a separate _mac-import/ subtree. It rehashes directly into gmail/{YYYY}/{MM}/ under the canonical scheme. The only _mac-import artefact is the journal line in _journal/mac-import.jsonl recording provenance for audit.

1. Inventory of mail-poller instances (scope first, do not delete yet)

1.1 Mac surfaces

| Path | Type | Status | Notes |
| --- | --- | --- | --- |
| /Users/nnamdi/Sandpit/hinata/scripts/mail-poller.py | live script | DUPLICATE | Multi-IMAP, writes to ~/Sandpit/hinata/resources/mail/ + resources/email-poller/. Not invoked by any current Mac LaunchAgent (no com.hinata.mail-poller.plist exists). Legacy Mac-era poller: same product class as CT102, must die. |
| /Users/nnamdi/Sandpit/hinata-sandpit/scripts/mail-poller.py | live script | DUPLICATE | Sister copy of above, in the git-tracked Sandpit repo. |
| /Users/nnamdi/Sandpit/hinata-sandpit/scripts/ct102-mail-poller.py | mirror | KEEP (read-only) | Reference-only mirror of CT102's deployed script. Per reference_mail-poller-z2.md line 10 it is the documented read-only mirror. Not executed on Mac. |
| /Users/nnamdi/Sandpit/hinata-sandpit/scripts/gmail-api-poller.py | live script | VERIFY | Listed in reference_mail-poller-architecture.md as part of Z2-host /opt/jimmy-brain-ops/scripts/; same code mirrored to Sandpit. Active on Z2 host. |
| /Users/nnamdi/Sandpit/hinata-sandpit/scripts/outlook-graph-poller.py | live script | VERIFY | As above. |
| /Users/nnamdi/Sandpit/hinata-sandpit/scripts/mail-30d-digest.py | digest script | KEEP | Iroh's digest, not a poller. |
| ~/Library/LaunchAgents/com.hinata.sync-mail-digest.plist | launchd | KEEP | Digest sync, not poller. |
| ~/Library/LaunchAgents/com.hinata.mail-poller.plist | launchd | NOT PRESENT | Already removed. Confirms Mac-side poller plist is dead. |

1.2 Z2 surfaces

| Location | Unit | Status | Notes |
| --- | --- | --- | --- |
| /opt/hinata/mail-poller/mail-poller.py (inside CT102) | hinata-mail-poller.service + .timer (CT102, every 15 min) | CANONICAL | Per bau-registry.md line 90. Reads /opt/itachi/credentials/, writes /mail-archive/ (bind of /mnt/data/hinata/mail-archive) and state.json. |
| Z2 host /opt/jimmy-brain-ops/scripts/gmail-api-poller.py + outlook-graph-poller.py | hinata-mail-poller.service + .timer (Z2 host, every 5.5 h) | DUPLICATE | Per bau-registry.md line 50 the host unit shares the same name as the CT102 unit but is independent. Polls the same accounts from a different code path. This is the host-level duplicate to retire. |
| /mnt/data/hinata/mail-archive/ (Z2 host) | data | SHARED (keep) | Single archive surface. Both CT102 (via bind) and host pollers write here; idempotent SHA-256 filenames mean dual writes haven't corrupted anything, but they have wasted IMAP calls. |
| /mnt/data/hinata/secrets/mail/*.json (Z2 host) | credential cache | RETIRE WITH HOST POLLER | Used by host pollers only. CT102 has its own /opt/itachi/credentials/. |

1.3 Verification commands Jimmy must run before any deletion

```bash
# CT102 poller is live
ssh hinata-z2 'pct exec 102 -- systemctl is-active hinata-mail-poller.service'
ssh hinata-z2 'pct exec 102 -- systemctl list-timers hinata-mail-poller.timer'
ssh hinata-z2 'pct exec 102 -- journalctl -u hinata-mail-poller.service --since "24 hours ago" | tail -50'

# CT102 is writing to the archive
ssh hinata-z2 'pct exec 102 -- find /mail-archive -name "*.json" -mtime -1 | wc -l'

# Z2 host poller is also writing (proving the duplicate)
ssh hinata-z2 'systemctl is-active hinata-mail-poller.service'
ssh hinata-z2 'systemctl list-timers hinata-mail-poller.timer'
ssh hinata-z2 'journalctl -u hinata-mail-poller.service --since "24 hours ago" | tail -50'

# Mac scripts confirmed not invoked by any launchd plist
ls -la ~/Library/LaunchAgents/ | grep -i mail
launchctl list | grep -i mail-poll
```

2. Containerisation completeness check

2.1 What lives in CT102

  • Code: /opt/hinata/mail-poller/mail-poller.py (stdlib only — imaplib, email, urllib, json, hashlib)
  • State: /opt/hinata/mail-poller/state.json
  • Credentials: /opt/itachi/credentials/ (mode 600, local cache; canon in Vaultwarden CT103)
  • Archive write target: /mail-archive/ (bind mount of /mnt/data/hinata/mail-archive on Z2 host)
  • Systemd unit: hinata-mail-poller.service + hinata-mail-poller.timer (15-min cadence)
  • Auth flows: XOAUTH2 (Gmail, primary), app-password (Gmail, fallback), Microsoft Graph refresh-token (Outlook x3) — all token refresh happens inside CT102, refreshed tokens written back to local credential files

2.2 What still ties to Mac (must stay Mac-side)

  • Gmail OAuth one-time mint: scripts/request-gmail-access.py (redirect URI http://localhost:8765). The interactive consent flow needs a browser. After mint, the resulting gmail_oauth_token.json is hand-uploaded to CT102 + Vaultwarden. Verdict: stays Mac-side, runs only on token revocation. No automation required.
  • Microsoft tokens re-mint: documented today as Mac-side via msal library, then pct push to ct102 (reference_mail-poller-z2.md § Graph API Token Expired). Same story as Gmail — only runs on revocation. Verdict: stays Mac-side.
  • Nothing else. All recurring polling, refresh, archiving, state management, classification tagging is self-contained inside CT102.

2.3 Gap: Host-level poller breaks the containerisation claim

The Z2 host hinata-mail-poller.service runs gmail-api-poller.py + outlook-graph-poller.py from /opt/jimmy-brain-ops/scripts/. These bypass CT102 entirely. This is the consolidation target.

3. Deletion plan (per-surface)

3.1 Pre-deletion gates (mandatory)

  1. CT102 timer ACTIVE and last successful run within 30 min — journalctl -u hinata-mail-poller.service
  2. CT102 archive write within the last 1 h — find /mail-archive -mmin -60 -name "*.json" | head
  3. Compare last-24h archive deltas between CT102's known-good days (pre-consolidation) and the day-of — count must be ≥ steady-state baseline; if it crashes to zero, abort and re-enable the host poller
  4. Vaultwarden CT103 holds all 7 credential files under canonical names (read with bw get notes [name] keys-only)
  5. CT102 /opt/itachi/credentials/ populated with all 7 files, mode 600
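
Gate 5 can be checked mechanically. The sketch below is one possible check, not an existing script: the helper name `check_credentials` is hypothetical, and it assumes each credential file is named after its Vaultwarden note (the seven names used in the execution sequence) plus a `.json` suffix.

```python
import stat
from pathlib import Path

# The seven canonical names from the Vaultwarden check in the execution sequence.
EXPECTED_CREDENTIALS = [
    "gmail_oauth_client",
    "gmail_oauth_token",
    "mail_imap_credential",
    "outlook-graph-credentials",
    "outlook-tokens-hotmail-michael-asolo",
    "outlook-tokens-outlook-michael-nnamah",
    "outlook-tokens-outlook-n-nnamah",
]


def check_credentials(cred_dir, names=EXPECTED_CREDENTIALS, suffix=".json"):
    """Return {name: problem} for every missing or non-0600 credential file.

    Assumption: files on disk are named <name><suffix>.
    """
    problems = {}
    for name in names:
        path = Path(cred_dir) / f"{name}{suffix}"
        if not path.is_file():
            problems[name] = "missing"
            continue
        mode = stat.S_IMODE(path.stat().st_mode)
        if mode != 0o600:
            problems[name] = f"mode {mode:03o}"
    return problems


if __name__ == "__main__":
    issues = check_credentials("/opt/itachi/credentials")
    print("gate 5 OK" if not issues else f"gate 5 FAILED: {issues}")
```

An empty result means the gate passes; anything else should block the deletion steps.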

3.2 Surfaces to retire (in order)

| Order | Surface | Verification before delete | Rollback |
| --- | --- | --- | --- |
| 1 | Z2 host hinata-mail-poller.timer → STOP + DISABLE first, leave files | 24-h soak: CT102 alone covers all accounts (archive count parity check) | systemctl enable --now hinata-mail-poller.timer on host |
| 2 | Z2 host /opt/jimmy-brain-ops/scripts/gmail-api-poller.py + outlook-graph-poller.py + sync-mail-creds.sh | Step 1 soak complete, no Iroh consumers reading host-poller artefacts | git checkout from hinata-sandpit mirror |
| 3 | Z2 host /mnt/data/hinata/secrets/mail/*.json (credential cache) | Step 2 complete, no other host script references it (grep -r "secrets/mail" /opt/) | restore from Vaultwarden CT103 |
| 4 | /Users/nnamdi/Sandpit/hinata/scripts/mail-poller.py | No launchd plist references it (already confirmed), no other script imports it (grep -r "mail-poller" /Users/nnamdi/Sandpit/) | git restore from hinata-sandpit |
| 5 | /Users/nnamdi/Sandpit/hinata-sandpit/scripts/mail-poller.py | Same as 4; this is the git-tracked sister of the file above | git revert |
| 6 | /Users/nnamdi/Sandpit/hinata-sandpit/scripts/gmail-api-poller.py + outlook-graph-poller.py | Step 2 complete; these are read-only mirrors of the host scripts | git revert |

Do not retire: ct102-mail-poller.py (documented read-only mirror), mail-30d-digest.py (Iroh digest), request-gmail-access.py (Mac-side mint), com.hinata.sync-mail-digest.plist (digest sync, not poller), the Mac-side Outlook MSAL helper (only used at revocation).

4. Ingest architecture (OAuth stability test + Mac history move + live polls)

CT102 ingests three streams into the same canonical paths. The "1y" window is an OAuth stability test — it stresses the refresh-token + Graph token paths across a sizeable date range during cutover. It is not a retention boundary. Retention is full history, append-only. See explanation_mail-poller-historical-use-cases for downstream-consumer impact.

4.1 Three ingest streams (priority order)

| Priority | Stream | Coverage | Role |
| --- | --- | --- | --- |
| 1 | Live poll (15-min timer, existing) | New messages since state.json cursor | Steady-state. Appends to {account}/{YYYY}/{MM}/. |
| 2 | Mac local archive MOVE (rehash-mac-import.py, one-shot) | ~14.7K messages, Oct 2016 → Jun 2026 (per reference_email-sweep-rules-review.md) | Lands the deep history into CT102 canonical paths. See how-to_rehash-mac-import for the executable spec. |
| 3 | OAuth stability test poll (--oauth-test, one-shot per account) | 1y window (since 2025-06-14) | Cutover stress test of XOAUTH2 (Gmail) and Graph token refresh (Outlook ×3). Confirms the connection holds across a large date range. Appends any messages the Mac archive did not already capture. |

4.2 Target (unified)

  • Write surface (all three streams): /mail-archive/{account}/{YYYY}/{MM}/{sha256[:16]}.json — one continuous corpus. Run rehash-mac-import.py and live polls into the same tree.
  • Cursors (separated by purpose, co-located in _state/):
    • _state/state.json — live-poll cursor. Mutated only by the 15-min timer.
    • _state/state-oauth-test.json — OAuth stability test cursor. Mutated only by --oauth-test runs. Never touch the live cursor.
  • Journals (append-only, co-located in _journal/):
    • _journal/live-poll.jsonl — one line per successful write from the live timer.
    • _journal/mac-import.jsonl — one line per successful write from rehash-mac-import.py.
    • _journal/rehash-state.json — idempotency tracker for the Mac move (see how-to_rehash-mac-import.md § Idempotency).
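
A minimal sketch of an append-only journal writer for the `.jsonl` files above. The record fields (`ts`, `source`, `path`) and the helper name `append_journal` are assumptions; only the one-line-per-write shape and the `source` tag come from this guide.

```python
import json
import os
from datetime import datetime, timezone


def append_journal(journal_path, source, archive_path):
    """Append one JSON line describing a successful archive write."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),  # hypothetical field
        "source": source,            # e.g. "live-poll", "oauth-test", "mac-import"
        "path": str(archive_path),   # hypothetical field
    }
    # Mode "a" never truncates, so earlier lines are preserved.
    with open(journal_path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, sort_keys=True) + "\n")
        handle.flush()
        os.fsync(handle.fileno())  # make the line durable before returning
    return record
```

One complete JSON object per line keeps the journal greppable and lets a resumed run read it back line by line.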

4.3 Idempotency

  • Archive filenames are SHA-256(message_id)[:16]. Re-runs of any stream cannot duplicate — if Path(archive).exists(), the writer returns False.
  • The OAuth test stream and the Mac-import stream both target the same canonical paths the live poller writes. Overlap with live polls is intentional (Michael ruling 2026-06-14, "history overlap of both data forms" — verification window).
  • Aborting any stream mid-run is safe. Resume from the relevant cursor / journal.
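
The exists-then-skip rule can be sketched as follows. This is an illustration under assumptions, not the deployed writer: the name `write_once` and the temp-file-then-rename step (so that an aborted run never leaves a half-written `.json` under its final name) are additions for the example.

```python
import json
import os
from pathlib import Path


def write_once(target: Path, envelope: dict) -> bool:
    """Write the envelope unless the target already exists.

    Returns True when a new file was written, False when skipped.
    """
    if target.exists():
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    # Assumption: a ".json.tmp" sibling is acceptable as scratch space;
    # it does not match the consumers' *.json glob.
    tmp = target.with_suffix(".json.tmp")
    tmp.write_text(json.dumps(envelope), encoding="utf-8")
    os.replace(tmp, target)  # atomic rename on the same filesystem
    return True
```

Calling it twice with the same target writes once and returns False the second time, which is what makes overlapping streams safe.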

4.4 Chunking + rate limits

| Channel | Chunk | Pacing | Rationale |
| --- | --- | --- | --- |
| Gmail IMAP (OAuth test) | 1 month per SINCE/BEFORE window, 50 messages per UID FETCH batch | 200 ms sleep between batches | Matches the existing Sandpit mail-poller.py backfill_account() shape. Avoids Gmail's 2500-MB/day IMAP soft cap. |
| Microsoft Graph (OAuth test) | $top=50 page, @odata.nextLink follow | 500 ms sleep between pages | Graph throttles at ~10000 req/10 min per app. A 500 ms sleep caps us at ~120 req/min (~1200 per 10 min). |
| Mac-import rehash | 500 messages per chunk, fsync per chunk, journal append per file | No remote calls; pace by disk only | See how-to_rehash-mac-import § Chunking. |
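
The Gmail row can be sketched as two small helpers: one that splits a date range into calendar-month SINCE/BEFORE windows, one that batches UIDs in groups of 50. This is not the existing backfill_account() code; the names `month_windows`, `search_criteria` and `batches` are hypothetical, and the end date is assumed to be the day the test runs.

```python
from datetime import date

# Fixed English month names: IMAP dates must not depend on the process locale.
_IMAP_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def month_windows(since: date, until: date):
    """Yield (start, end) pairs, end exclusive, one per calendar month."""
    start = since
    while start < until:
        if start.month == 12:
            next_month = date(start.year + 1, 1, 1)
        else:
            next_month = date(start.year, start.month + 1, 1)
        end = min(next_month, until)
        yield start, end
        start = end


def imap_date(day: date) -> str:
    return f"{day.day:02d}-{_IMAP_MONTHS[day.month - 1]}-{day.year:04d}"


def search_criteria(start: date, end: date) -> str:
    """IMAP SEARCH criteria: SINCE is inclusive, BEFORE is exclusive."""
    return f"(SINCE {imap_date(start)} BEFORE {imap_date(end)})"


def batches(uids, size=50):
    """Split a UID list into fetch batches of at most `size`."""
    for index in range(0, len(uids), size):
        yield uids[index:index + size]


if __name__ == "__main__":
    for start, end in month_windows(date(2025, 6, 14), date(2026, 6, 14)):
        print(search_criteria(start, end))
```

Note that a one-year window starting mid-month touches 13 calendar-month buckets (a partial June 2025 through a partial June 2026), so the archive count check should expect 13 month directories per account.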

4.5 Rollback

If the OAuth stability test or Mac-import corrupts state:

  1. Stop the offending process (pkill -f 'mail-poller.*--oauth-test' or pkill -f rehash-mac-import).
  2. Restore live _state/state.json from _state/state.json.pre-cutover (snapshot taken at §5 step 1).
  3. For the OAuth test stream only: if corruption is suspected in a date range, rm -rf /mail-archive/{account}/{YYYY}/{MM}/ for that range. Live polls will not re-fill — they fetch forward from the live cursor only. Re-run --oauth-test --since YYYY-MM to restore.
  4. For the Mac-import stream: see how-to_rehash-mac-import § Rollback — the Mac source path remains read-only until the journal confirms completion, so the original sha1[:12] tree is the recovery source.
  5. Idempotency guarantees re-running any stream will not double up.

5. Execution sequence (Jimmy Neutron carries this out)

Each step lists: action, verify, rollback. State cursors and journals live under /mail-archive/_state/ and /mail-archive/_journal/, never on filesystem root.

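  1. Pre-flight inventory snapshot: ssh hinata-z2 'pct exec 102 -- cp /mail-archive/_state/state.json /mail-archive/_state/state.json.pre-cutover'; record archive count (find /mail-archive -path /mail-archive/_journal -prune -o -path /mail-archive/_state -prune -o -name "*.json" -print | wc -l) into /mail-archive/_journal/baseline-pre-cutover.txt. If step 4 has not yet run, state.json may still live at /opt/hinata/mail-poller/state.json (see §2.1) and _state/ and _journal/ may not exist yet; in that case create both directories first and snapshot from the old path. Verify: snapshot present, count written. Rollback: N/A (nothing live is modified).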

  2. Confirm CT102 canonical health — run §1.3 verification commands. Verify: service active, archive growth within last 60 min. Rollback: N/A — if unhealthy, ABORT and re-prompt Michael.

  3. Confirm Vaultwarden has all 7 credential files: for n in gmail_oauth_client gmail_oauth_token mail_imap_credential outlook-graph-credentials outlook-tokens-hotmail-michael-asolo outlook-tokens-outlook-michael-nnamah outlook-tokens-outlook-n-nnamah; do ssh hinata-z2 "bw get notes $n >/dev/null && echo ok $n || echo MISSING $n"; done. Verify: seven ok lines, zero MISSING. Rollback: if any missing, ABORT; Michael must add to Vaultwarden manually per feedback_no-auto-vaultwarden-writes.

  4. Migrate CT102 state + journal into canonical layout — if state.json currently lives at /opt/hinata/mail-poller/state.json, move it to /mail-archive/_state/state.json and update the poller to read from the new path. Create /mail-archive/_journal/. Verify: poller dry-run reads the new path; ls /opt/hinata/mail-poller/ no longer carries state. Rollback: restore state.json to old path, revert poller patch.

  5. Stop + disable Z2 host poller: ssh hinata-z2 'systemctl stop hinata-mail-poller.timer && systemctl disable hinata-mail-poller.timer'. Verify: systemctl is-enabled hinata-mail-poller.timer returns disabled. Rollback: systemctl enable --now hinata-mail-poller.timer.

  6. 24-hour soak — CT102 alone — wait 24 h. Archive growth check: (find /mail-archive -path /mail-archive/_journal -prune -o -path /mail-archive/_state -prune -o -mmin -1440 -name "*.json" -print | wc -l) ≥ baseline. Verify: count present, no IMAP/Graph error spikes in journalctl -u hinata-mail-poller.service. Rollback: if growth drops below baseline, re-enable host poller (step 5 rollback) and stop here.

  7. Snapshot state for OAuth stability test: ssh hinata-z2 'pct exec 102 -- cp /mail-archive/_state/state.json /mail-archive/_state/state.json.pre-oauth-test'. Verify: snapshot present. Rollback: N/A.

  8. Add --oauth-test mode + separate cursor to CT102 poller — extend mail-poller.py to accept --oauth-test --since YYYY-MM [--account X], using _state/state-oauth-test.json instead of _state/state.json. Mirror the monthly chunking pattern from the retired Sandpit mail-poller.py backfill_account(). Every successful write appends one line to _journal/live-poll.jsonl tagged source=oauth-test. Verify: --dry-run --oauth-test --since 2025-06 --account gmail lists months without writing, logs [oauth-test] prefix. Rollback: revert the patch in CT102.

  9. OAuth stability test, Gmail first (cheapest): ssh hinata-z2 'pct exec 102 -- /opt/hinata/mail-poller/mail-poller.py --oauth-test --since 2025-06-14 --account gmail'. Verify: XOAUTH2 refresh succeeded across the full 12-month window without falling back to app password; archive count delta = journal report; _state/state-oauth-test.json written; _state/state.json UNCHANGED (md5 matches pre-test snapshot). Rollback: delete /mail-archive/gmail/{YYYY}/{MM}/ folders for affected range; live polls fetch forward only, so they do not refill the gap (re-run --oauth-test for that range to restore, per §4.5).

  10. OAuth stability test — the 3 Outlook accounts (one at a time) — repeat step 9 per account with --since 2025-06-14, pacing 500 ms between Graph pages. Verify: per-account Graph refresh succeeded across the full window; per-account growth matches expectation. Rollback: per-account folder delete.

  11. Move Mac local archive into canonical CT102 paths — run rehash-mac-import.py per how-to_rehash-mac-import. The script rehashes ~/Sandpit/hinata/resources/email-poller/**/*.json (sha1[:12] → sha256[:16]), normalises envelope to the CT102 schema, and writes into /mail-archive/{account}/{YYYY}/{MM}/. The Mac source path stays read-only during the move; the journal at _journal/mac-import.jsonl records every write. Verify: _journal/rehash-state.json reports status: complete, processed count matches source count; sample 3 rehashed files render correctly under the canonical paths. Rollback: see how-to_rehash-mac-import § Rollback — Mac source path is the recovery surface.

  12. Delete Z2 host poller scripts — once steps 6+9+10 are green, rm /opt/jimmy-brain-ops/scripts/gmail-api-poller.py /opt/jimmy-brain-ops/scripts/outlook-graph-poller.py /opt/jimmy-brain-ops/scripts/sync-mail-creds.sh and the host unit files /etc/systemd/system/hinata-mail-poller.service + .timer. Verify: systemctl daemon-reload; find /opt/jimmy-brain-ops/scripts/ -name "*mail*" empty. Rollback: git restore from hinata-sandpit mirror.

  13. Delete Z2 host credential cache: rm -rf /mnt/data/hinata/secrets/mail/. Verify: grep -r "secrets/mail" /opt/ returns zero refs. Rollback: restore from Vaultwarden CT103.

  14. Delete Mac Sandpit dead pollers: rm /Users/nnamdi/Sandpit/hinata/scripts/mail-poller.py /Users/nnamdi/Sandpit/hinata-sandpit/scripts/mail-poller.py /Users/nnamdi/Sandpit/hinata-sandpit/scripts/gmail-api-poller.py /Users/nnamdi/Sandpit/hinata-sandpit/scripts/outlook-graph-poller.py. Verify: git status in hinata-sandpit shows 4 deletions; grep -r "mail-poller.py" /Users/nnamdi/Sandpit/ shows only the kept ct102-mail-poller.py mirror. Rollback: git restore per file.

  15. Update reference_mail-poller-architecture.md — remove the Z2-host poller boxes from the ASCII diagram; CT102 is now the sole executor. Reflect the canonical layout (single tree, _state/ + _journal/ siblings). Verify: file no longer references /opt/jimmy-brain-ops/scripts/{gmail-api,outlook-graph}-poller.py. Rollback: git restore.

  16. Update bau-registry.md — drop the Z2-host hinata-mail-poller.service row (line ~50); add note that CT102 is sole owner. Verify: open-questions section §1 closed. Rollback: git restore.

  17. Wire output to Iroh's dashboard — confirm mail-30d-digest.py and any forthcoming dashboard consumer read only from /mail-archive/{account}/{YYYY}/{MM}/*.json (ignoring _journal/ and _state/ by glob). Iroh's dashboard spec is the downstream contract — see explanation_mail-poller-historical-use-cases. Verify: grep digest + dashboard scripts for any reference to _mac-import/ (must return zero) or to historical/archive subtrees outside the canonical tree. Rollback: N/A — read-side only.

  18. Commit + push — vault + sandpit double push per dispatch-lifecycle. Tag tasks.json closure for 800143 (JXA parity / consumer repoint) if applicable.
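
The "live cursor UNCHANGED" check in step 9 compares an md5 of _state/state.json against the pre-test snapshot from step 7. A small sketch of that comparison, with hypothetical helper names (`file_md5`, `cursor_unchanged`); md5 is used here only as a change detector, not for security.

```python
import hashlib


def file_md5(path) -> str:
    """Hex md5 of a file, read in chunks so size does not matter."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cursor_unchanged(live_path, snapshot_path) -> bool:
    """True when the live cursor is byte-identical to its snapshot."""
    return file_md5(live_path) == file_md5(snapshot_path)


if __name__ == "__main__":
    same = cursor_unchanged(
        "/mail-archive/_state/state.json",
        "/mail-archive/_state/state.json.pre-oauth-test",
    )
    print("live cursor untouched" if same else "LIVE CURSOR CHANGED")
```

Keep in mind that the 15-min live timer legitimately advances state.json, so the comparison is only meaningful if no live poll ran between the snapshot and the check.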

VERIFICATION