How to consolidate the mail-poller onto CT102 and backpop history
CT102 is the canonical mail-poller (Michael ruling, 2026-06-14, verbatim: "why would z2 host be poller. i want it containerised."). Anywhere else mail-polling code lives, it is a duplicate to verify and delete. This guide is the technical playbook Jimmy Neutron executes.
Locked decisions (Michael, 2026-06-14)
Critical reframing (Michael verbatim 2026-06-14): "misinformation occured in prompting. 1y meant test the oauth connections for stability by polling a large date range." The 1y window is an OAuth stability test, not a retention scope. Retention = full history, append-only. CT102 holds one continuous corpus: the moved Mac archive (~14.7K messages, Oct 2016 → Jun 2026) plus every subsequent live poll, in one intelligently-organised data plane. Heimerdinger reads only canonical CT102 paths; there is no separate _mac-import/ consumer surface.
| Question | Locked answer |
|---|---|
| Canonical instance | CT102 — containerised; Z2-host poller retired entirely |
| Retention scope | FULL HISTORY, append-only. Nothing is discarded. CT102 holds the complete archive (Mac 10y dump + all live streaming) as ONE continuous corpus. Streaming polls append to the same canonical paths the Mac corpus lands in. |
| OAuth stability test (Gmail) | 1y poll-window (since 2025-06-14) — stress-tests the OAuth connection across a sizeable date range during cutover. Not a retention boundary. |
| OAuth stability test (Outlook ×3) | 1y poll-window (since 2025-06-14) — same stress test, per account, against Microsoft Graph |
| Mac local archive (~/Sandpit/hinata/resources/email-poller/) | MOVE into CT102's canonical data-object structure (not a separate _mac-import/ subtree). In-place re-hash sha1[:12] → sha256[:16] during the move, envelope normalised to the CT102 schema. Source path is read-only during the move. |
| Format homogenisation | JSON (single envelope schema per reference_mail-poller-z2.md). Mac archive imports converted on ingest. |
| Container + mount discipline | Nothing at filesystem root. Everything lives under dedicated container subdirs + bind-mounts. Michael verbatim: "nothing should be on root should be in dedicated container and mounts i think." |
| Heimerdinger read surface | CT102 canonical paths only. No Mac-import subtree, no historical-archive read separately. Incremental retrain on appended messages (not full-corpus retrain per poll). |
| Output surface | Iroh's dashboard — fully formed alerts + information packets per the existing scope. |
| Cutover ceremony | NONE — direct deploy. Michael: "why is a cutover window needed?" — ship it. |
| Live-poll / archive overlap | INTENTIONAL — Michael: "archive has all emails i just made an overlap so standardisation to live polls has history overlap of both data forms." Overlap is a verification window across formats, not a bug. Idempotent filenames make it safe. |
0. CT102 canonical data-object organisation (new)
Schema chosen: {account}/{YYYY}/{MM}/{sha256[:16]}.json — by account, then year, then month. Justification:
- Already canonical. This is the schema CT102's live mail-poller.py writes (per reference_mail-poller-z2.md § Archive Message Format). The Mac archive (currently sha1[:12]) re-hashes into the same shape: one tree, two ingest histories.
- Account-first matches consumer access patterns. Bulma reads gmail/ for receipts; Madara reads cross-account for security signal; Heim consumes the whole tree but partitions naturally on account during incremental fit.
- Year/month gives O(1) date-range queries without an external index. Iroh's dashboard and the OAuth stability test poll both window-query by month.
- By-thread was rejected: thread reconstruction is a derived view, not a storage primitive. Heim builds threads in its embedding space; raw storage stays flat-by-date.
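The schema above can be sketched as a small path builder. This is a minimal illustration, not the deployed mail-poller.py code: the function names are hypothetical, and it assumes the digest is taken over the UTF-8 bytes of the Message-ID and that the partition date is the message's received date.

```python
import hashlib
from datetime import date
from pathlib import PurePosixPath


def canonical_path(account: str, message_id: str, received: date) -> PurePosixPath:
    """Build {account}/{YYYY}/{MM}/{sha256[:16]}.json relative to /mail-archive."""
    # Assumption: the filename is the first 16 hex chars of SHA-256 over the Message-ID.
    digest = hashlib.sha256(message_id.encode("utf-8")).hexdigest()[:16]
    return PurePosixPath(
        account, f"{received.year:04d}", f"{received.month:02d}", f"{digest}.json"
    )


def month_prefix(account: str, year: int, month: int) -> PurePosixPath:
    """A month window resolves to exactly one directory, so no index is needed."""
    return PurePosixPath(account, f"{year:04d}", f"{month:02d}")
```

Because the month is part of the path, a consumer that wants "gmail, June 2025" lists one directory instead of scanning or indexing the corpus.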
Canonical paths under CT102 (all bind-mounted to /mnt/data/hinata/mail-archive/ on the Z2 host — never on filesystem root):
/mail-archive/ # CT102 bind of /mnt/data/hinata/mail-archive
├── gmail/{YYYY}/{MM}/{sha256[:16]}.json # one continuous corpus: Mac history + live polls
├── hotmail-michael-asolo/{YYYY}/{MM}/{sha256[:16]}.json
├── outlook-michael-nnamah/{YYYY}/{MM}/{sha256[:16]}.json
├── outlook-n-nnamah/{YYYY}/{MM}/{sha256[:16]}.json
├── _journal/ # append-only ingest journal
│ ├── live-poll.jsonl # every successful live-poll write
│ ├── mac-import.jsonl # every rehash-mac-import.py write
│ └── rehash-state.json # idempotency tracker (Mac → CT102 progress)
└── _state/ # state cursors live inside the archive root
  ├── state.json # live-poll cursor (per account / per folder)
  └── state-oauth-test.json # 1y OAuth stability test cursor (separate from live)
_journal/ and _state/ use a leading underscore so they sort above account dirs and never collide with an account named journal or state. They are part of the canonical surface; Heimerdinger and Iroh's dashboard ignore them by glob (*/{YYYY}/{MM}/*.json only).
The Mac archive does not land in a separate _mac-import/ subtree. It rehashes directly into gmail/{YYYY}/{MM}/ under the canonical scheme. The only _mac-import artefact is the journal line in _journal/mac-import.jsonl recording provenance for audit.
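A consumer-side reader that honours the glob rule might look like the sketch below. It is an assumption about how Heimerdinger or the dashboard could enumerate messages, not their actual code; `iter_messages` is a hypothetical name.

```python
from pathlib import Path


def iter_messages(root: Path):
    """Yield message files only: {account}/{YYYY}/{MM}/*.json under root."""
    # Three directory levels are required, so _state/state.json (one level)
    # and _journal/*.jsonl never match. The digit classes also keep any
    # non-date subdirectory out of the result.
    pattern = "*/[0-9][0-9][0-9][0-9]/[0-9][0-9]/*.json"
    for path in sorted(root.glob(pattern)):
        # parts[-4] is the account directory; skip reserved underscore dirs.
        if not path.parts[-4].startswith("_"):
            yield path
```

The explicit underscore check is belt and braces: the depth of the pattern already excludes today's _journal/ and _state/ contents, and the check keeps that true if either ever grows dated subdirectories.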
1. Inventory of mail-poller instances (scope first, do not delete yet)
1.1 Mac surfaces
| Path | Type | Status | Notes |
|---|---|---|---|
| /Users/nnamdi/Sandpit/hinata/scripts/mail-poller.py | live script | DUPLICATE | Multi-IMAP, writes to ~/Sandpit/hinata/resources/mail/ + resources/email-poller/. Not invoked by any current Mac LaunchAgent (no com.hinata.mail-poller.plist exists). Legacy Mac-era poller, same product class as CT102; must die. |
| /Users/nnamdi/Sandpit/hinata-sandpit/scripts/mail-poller.py | live script | DUPLICATE | Sister copy of above, in the git-tracked Sandpit repo. |
| /Users/nnamdi/Sandpit/hinata-sandpit/scripts/ct102-mail-poller.py | mirror | KEEP (read-only) | Reference-only mirror of CT102's deployed script. Per reference_mail-poller-z2.md line 10 it is the documented read-only mirror. Not executed on Mac. |
| /Users/nnamdi/Sandpit/hinata-sandpit/scripts/gmail-api-poller.py | live script | VERIFY | Listed in reference_mail-poller-architecture.md as part of Z2-host /opt/jimmy-brain-ops/scripts/; same code mirrored to Sandpit. Active on Z2 host. |
| /Users/nnamdi/Sandpit/hinata-sandpit/scripts/outlook-graph-poller.py | live script | VERIFY | As above. |
| /Users/nnamdi/Sandpit/hinata-sandpit/scripts/mail-30d-digest.py | digest script | KEEP | Iroh's digest, not a poller. |
| ~/Library/LaunchAgents/com.hinata.sync-mail-digest.plist | launchd | KEEP | Digest sync, not poller. |
| ~/Library/LaunchAgents/com.hinata.mail-poller.plist | launchd | NOT PRESENT | Already removed. Confirms Mac-side poller plist is dead. |
1.2 Z2 surfaces
| Location | Unit | Status | Notes |
|---|---|---|---|
| /opt/hinata/mail-poller/mail-poller.py (inside CT102) | hinata-mail-poller.service + .timer (CT102, every 15 min) | CANONICAL | Per bau-registry.md line 90. Reads /opt/itachi/credentials/, writes /mail-archive/ (bind of /mnt/data/hinata/mail-archive) and state.json. |
| Z2 host /opt/jimmy-brain-ops/scripts/gmail-api-poller.py + outlook-graph-poller.py | hinata-mail-poller.service + .timer (Z2 host, every 5.5 h) | DUPLICATE | Per bau-registry.md line 50 the host unit shares the same name as the CT102 unit but is independent. Polls the same accounts from a different code path. This is the host-level duplicate to retire. |
| /mnt/data/hinata/mail-archive/ (Z2 host) | data | SHARED, keep | Single archive surface. Both CT102 (via bind) and host pollers write here; idempotent SHA-256 filenames mean dual writes haven't corrupted anything, but they have wasted IMAP calls. |
| /mnt/data/hinata/secrets/mail/*.json (Z2 host) | credential cache | RETIRE WITH HOST POLLER | Used by host pollers only. CT102 has its own /opt/itachi/credentials/. |
1.3 Verification commands Jimmy must run before any deletion
bash
# CT102 poller is live
ssh hinata-z2 'pct exec 102 -- systemctl is-active hinata-mail-poller.service'
ssh hinata-z2 'pct exec 102 -- systemctl list-timers hinata-mail-poller.timer'
ssh hinata-z2 'pct exec 102 -- journalctl -u hinata-mail-poller.service --since "24 hours ago" | tail -50'
# CT102 is writing to the archive
ssh hinata-z2 'pct exec 102 -- find /mail-archive -name "*.json" -mtime -1 | wc -l'
# Z2 host poller is also writing (proving the duplicate)
ssh hinata-z2 'systemctl is-active hinata-mail-poller.service'
ssh hinata-z2 'systemctl list-timers hinata-mail-poller.timer'
ssh hinata-z2 'journalctl -u hinata-mail-poller.service --since "24 hours ago" | tail -50'
# Mac scripts confirmed not invoked by any launchd plist
ls -la ~/Library/LaunchAgents/ | grep -i mail
launchctl list | grep -i mail-poll
2. Containerisation completeness check
2.1 What lives in CT102
- Code: /opt/hinata/mail-poller/mail-poller.py (stdlib only: imaplib, email, urllib, json, hashlib)
- State: /opt/hinata/mail-poller/state.json
- Credentials: /opt/itachi/credentials/ (mode 600, local cache; canon in Vaultwarden CT103)
- Archive write target: /mail-archive/ (bind mount of /mnt/data/hinata/mail-archive on Z2 host)
- Systemd unit: hinata-mail-poller.service + hinata-mail-poller.timer (15-min cadence)
- Auth flows: XOAUTH2 (Gmail, primary), app-password (Gmail, fallback), Microsoft Graph refresh-token (Outlook x3). All token refresh happens inside CT102, and refreshed tokens are written back to the local credential files.
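For orientation, the Gmail XOAUTH2 flow named in the list above can be sketched with the standard library alone. This is an illustrative outline, not the code in mail-poller.py; the function names are hypothetical, and the access token is assumed to have already been refreshed elsewhere.

```python
import imaplib


def xoauth2_string(user: str, access_token: str) -> bytes:
    """SASL XOAUTH2 initial response: user=<addr>^Aauth=Bearer <token>^A^A."""
    return f"user={user}\x01auth=Bearer {access_token}\x01\x01".encode("utf-8")


def login_xoauth2(host: str, user: str, access_token: str) -> imaplib.IMAP4_SSL:
    """Open an IMAP session authenticated with an OAuth access token."""
    conn = imaplib.IMAP4_SSL(host)
    # imaplib base64-encodes whatever bytes the callback returns.
    conn.authenticate("XOAUTH2", lambda _challenge: xoauth2_string(user, access_token))
    return conn
```

If `authenticate` raises `imaplib.IMAP4.error`, the caller would be the place to decide whether to refresh the token or fall back to the app password.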
2.2 What still ties to Mac (must stay Mac-side)
- Gmail OAuth one-time mint: scripts/request-gmail-access.py (redirect URI http://localhost:8765). The interactive consent flow needs a browser. After mint, the resulting gmail_oauth_token.json is hand-uploaded to CT102 + Vaultwarden. Verdict: stays Mac-side, runs only on token revocation. No automation required.
- Microsoft tokens re-mint: documented today as Mac-side via the msal library, then pct push to ct102 (reference_mail-poller-z2.md § Graph API Token Expired). Same story as Gmail: it only runs on revocation. Verdict: stays Mac-side.
- Nothing else. All recurring polling, refresh, archiving, state management, classification tagging is self-contained inside CT102.
2.3 Gap: Host-level poller breaks the containerisation claim
The Z2 host hinata-mail-poller.service runs gmail-api-poller.py + outlook-graph-poller.py from /opt/jimmy-brain-ops/scripts/. These bypass CT102 entirely. This is the consolidation target.
3. Deletion plan (per-surface)
3.1 Pre-deletion gates (mandatory)
- CT102 timer ACTIVE and last successful run within 30 min: journalctl -u hinata-mail-poller.service
- CT102 archive write within the last 1 h: find /mail-archive -mmin -60 -name "*.json" | head
- Compare last-24h archive deltas between CT102's known-good days (pre-consolidation) and the day-of. The count must be ≥ the steady-state baseline; if it crashes to zero, abort and re-enable the host poller.
- Vaultwarden CT103 holds all 7 credential files under canonical names (read with bw get notes [name], keys-only)
- CT102 /opt/itachi/credentials/ populated with all 7 files, mode 600
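The last gate (all files present, mode 600) is easy to check mechanically. The sketch below is one possible helper, not an existing script; `check_credentials` is a hypothetical name, and the exact on-disk filenames to pass in are left to the operator because this guide lists the Vaultwarden note names rather than the file names.

```python
import stat
from pathlib import Path


def check_credentials(cred_dir: Path, expected: list[str]) -> dict[str, str]:
    """Return {filename: problem} for every missing or over-permissive file."""
    problems = {}
    for name in expected:
        path = cred_dir / name
        if not path.is_file():
            problems[name] = "missing"
            continue
        # S_IMODE strips the file-type bits and leaves the permission bits.
        mode = stat.S_IMODE(path.stat().st_mode)
        if mode != 0o600:
            problems[name] = f"mode {mode:o}, expected 600"
    return problems
```

An empty dictionary means the gate passes; anything else names the file and the reason to abort.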
3.2 Surfaces to retire (in order)
| Order | Surface | Verification before delete | Rollback |
|---|---|---|---|
| 1 | Z2 host hinata-mail-poller.timer → STOP + DISABLE first, leave files | 24-h soak: CT102 alone covers all accounts (archive count parity check) | systemctl enable --now hinata-mail-poller.timer on host |
| 2 | Z2 host /opt/jimmy-brain-ops/scripts/gmail-api-poller.py + outlook-graph-poller.py + sync-mail-creds.sh | Step 1 soak complete, no Iroh consumers reading host-poller artefacts | git checkout from hinata-sandpit mirror |
| 3 | Z2 host /mnt/data/hinata/secrets/mail/*.json (credential cache) | Step 2 complete, no other host script references it (grep -r "secrets/mail" /opt/) | restore from Vaultwarden CT103 |
| 4 | /Users/nnamdi/Sandpit/hinata/scripts/mail-poller.py | No launchd plist references it (already confirmed), no other script imports it (grep -r "mail-poller" /Users/nnamdi/Sandpit/) | git restore from hinata-sandpit |
| 5 | /Users/nnamdi/Sandpit/hinata-sandpit/scripts/mail-poller.py | Same as 4; this is the git-tracked sister of the file above | git revert |
| 6 | /Users/nnamdi/Sandpit/hinata-sandpit/scripts/gmail-api-poller.py + outlook-graph-poller.py | Step 2 complete; these are read-only mirrors of the host scripts | git revert |
Do not retire: ct102-mail-poller.py (documented read-only mirror), mail-30d-digest.py (Iroh digest), request-gmail-access.py (Mac-side mint), com.hinata.sync-mail-digest.plist (digest sync, not poller), the Mac-side Outlook MSAL helper (only used at revocation).
4. Ingest architecture (OAuth stability test + Mac history move + live polls)
CT102 ingests three streams into the same canonical paths. The "1y" window is an OAuth stability test — it stresses the refresh-token + Graph token paths across a sizeable date range during cutover. It is not a retention boundary. Retention is full history, append-only. See explanation_mail-poller-historical-use-cases for downstream-consumer impact.
4.1 Three ingest streams (priority order)
| Priority | Stream | Coverage | Role |
|---|---|---|---|
| 1 | Live poll (15-min timer, existing) | New messages since state.json cursor | Steady-state. Appends to {account}/{YYYY}/{MM}/. |
| 2 | Mac local archive MOVE (rehash-mac-import.py, one-shot) | ~14.7K messages, Oct 2016 → Jun 2026 (per reference_email-sweep-rules-review.md) | Lands the deep history into CT102 canonical paths. See how-to_rehash-mac-import for the executable spec. |
| 3 | OAuth stability test poll (--oauth-test, one-shot per account) | 1y window (since 2025-06-14) | Cutover stress test of XOAUTH2 (Gmail) and Graph token refresh (Outlook ×3). Confirms the connection holds across a large date range. Appends any messages the Mac archive did not already capture. |
4.2 Target (unified)
- Write surface (all three streams): /mail-archive/{account}/{YYYY}/{MM}/{sha256[:16]}.json, one continuous corpus. Run rehash-mac-import.py and live polls into the same tree.
- Cursors (separated by purpose, co-located in _state/):
  - _state/state.json: live-poll cursor. Mutated only by the 15-min timer.
  - _state/state-oauth-test.json: OAuth stability test cursor. Mutated only by --oauth-test runs, which never touch the live cursor.
- Journals (append-only, co-located in _journal/):
  - _journal/live-poll.jsonl: one line per successful write from the live timer.
  - _journal/mac-import.jsonl: one line per successful write from rehash-mac-import.py.
  - _journal/rehash-state.json: idempotency tracker for the Mac move (see how-to_rehash-mac-import.md § Idempotency).
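An append-only journal line per write can be sketched as follows. The record fields (`ts`, `source`, `path`) are assumptions chosen for illustration; the real journal schema is whatever mail-poller.py and rehash-mac-import.py define.

```python
import json
import os
from datetime import datetime, timezone
from pathlib import Path


def append_journal(journal: Path, source: str, rel_path: str) -> None:
    """Append one JSON line describing a successful archive write."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "source": source,  # e.g. "live-poll", "oauth-test", "mac-import"
        "path": rel_path,  # archive path relative to /mail-archive
    }
    journal.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" only ever adds to the end of the file; nothing is rewritten.
    with journal.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
        fh.flush()
        os.fsync(fh.fileno())
```

One line per write keeps the journal greppable and lets a count of lines be compared directly against a count of archive files.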
4.3 Idempotency
- Archive filenames are SHA-256(message_id)[:16]. Re-runs of any stream cannot duplicate: if Path(archive).exists(), the writer returns False.
- The OAuth test stream and the Mac-import stream both target the same canonical paths the live poller writes. Overlap with live polls is intentional (Michael ruling 2026-06-14, "history overlap of both data forms": a verification window).
- Aborting any stream mid-run is safe. Resume from the relevant cursor / journal.
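The write-once rule can be illustrated with an exclusive-create open. Note this is a variant of the exists-then-write check described above, not the poller's own code: mode "x" folds the check and the create into one step, so two streams racing on the same filename cannot both write. `write_once` is a hypothetical name.

```python
import json
from pathlib import Path


def write_once(path: Path, envelope: dict) -> bool:
    """Write the envelope unless the file already exists. True means written."""
    path.parent.mkdir(parents=True, exist_ok=True)
    try:
        # Mode "x" raises FileExistsError instead of overwriting.
        with path.open("x", encoding="utf-8") as fh:
            json.dump(envelope, fh)
    except FileExistsError:
        return False
    return True
```

Whichever stream arrives first wins, and later streams see `False` and move on, which is what makes the live-poll / archive overlap harmless.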
4.4 Chunking + rate limits
| Channel | Chunk | Pacing | Rationale |
|---|---|---|---|
| Gmail IMAP (OAuth test) | 1 month per SINCE/BEFORE window, 50 messages per UID FETCH batch | 200 ms sleep between batches | Matches the existing Sandpit mail-poller.py backfill_account() shape. Avoids Gmail's 2500-MB/day IMAP soft cap. |
| Microsoft Graph (OAuth test) | $top=50 page, @odata.nextLink follow | 500 ms sleep between pages | Graph throttles at ~10000 req/10 min per app. 500 ms keeps us at ~120 req/min. |
| Mac-import rehash | 500 messages per chunk, fsync per chunk, journal append per file | No remote calls — pace by disk only | See how-to_rehash-mac-import § Chunking. |
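The one-month SINCE/BEFORE chunking in the first row can be sketched like this. It is an illustration of the windowing only, not the Sandpit backfill_account() code; the function names are hypothetical, and the month names are hard-coded so the IMAP date text does not depend on the process locale.

```python
from datetime import date

_MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def imap_date(d: date) -> str:
    """Format a date the way IMAP SEARCH expects, e.g. 14-Jun-2025."""
    return f"{d.day:02d}-{_MONTHS[d.month - 1]}-{d.year}"


def month_windows(start: date, end: date):
    """Yield (since, before) pairs, one per calendar month, covering [start, end)."""
    cursor = start
    while cursor < end:
        if cursor.month == 12:
            next_month = date(cursor.year + 1, 1, 1)
        else:
            next_month = date(cursor.year, cursor.month + 1, 1)
        before = min(next_month, end)
        yield cursor, before
        cursor = before


def search_criteria(since: date, before: date) -> str:
    """SINCE is inclusive and BEFORE is exclusive, so adjacent windows never overlap."""
    return f"(SINCE {imap_date(since)} BEFORE {imap_date(before)})"
```

On the Graph side the pacing arithmetic is simple: one request per 500 ms is at most 2 per second, so about 1,200 per 10 minutes, well under the ~10,000 figure in the table.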
4.5 Rollback
If the OAuth stability test or Mac-import corrupts state:
- Stop the offending process (pkill -f 'mail-poller.*--oauth-test' or pkill -f rehash-mac-import).
- Restore live _state/state.json from _state/state.json.pre-cutover (snapshot taken at §5 step 1).
- For the OAuth test stream only: if corruption is suspected in a date range, rm -rf /mail-archive/{account}/{YYYY}/{MM}/ for that range. Live polls will not re-fill it, because they fetch forward from the live cursor only. Re-run --oauth-test --since YYYY-MM to restore.
- For the Mac-import stream: see how-to_rehash-mac-import § Rollback. The Mac source path remains read-only until the journal confirms completion, so the original sha1[:12] tree is the recovery source.
- Idempotency guarantees re-running any stream will not double up.
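The cursor restore in the second bullet can be done so that the poller never sees a half-written state.json. The sketch below is one way to do it and is not part of the existing tooling; `restore_state` and the temporary filename are hypothetical.

```python
import os
import shutil
from pathlib import Path


def restore_state(state: Path, snapshot: Path) -> None:
    """Replace state.json with the snapshot's content in one atomic rename."""
    tmp = state.with_name(state.name + ".restore-tmp")
    # Copy first, so the snapshot itself stays in place for a later retry.
    shutil.copy2(snapshot, tmp)
    # os.replace is atomic when source and target share a filesystem.
    os.replace(tmp, state)
```

Keeping the temporary file next to state.json is what guarantees both paths are on the same filesystem.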
5. Execution sequence (Jimmy Neutron carries this out)
Each step lists: action → verify → rollback. State cursors and journals live under /mail-archive/_state/ and /mail-archive/_journal/ — never on filesystem root.
1. Pre-flight inventory snapshot: ssh hinata-z2 'pct exec 102 -- cp /mail-archive/_state/state.json /mail-archive/_state/state.json.pre-cutover'; record archive count (find /mail-archive -path /mail-archive/_journal -prune -o -path /mail-archive/_state -prune -o -name "*.json" -print | wc -l) into /mail-archive/_journal/baseline-pre-cutover.txt. If step 4 has not yet run, the live cursor is still at /opt/hinata/mail-poller/state.json and _state/ and _journal/ do not exist: create both directories first and copy from the old path. Verify: snapshot present, count written. Rollback: N/A (adds a snapshot and a baseline file; changes nothing live).
2. Confirm CT102 canonical health: run §1.3 verification commands. Verify: service active, archive growth within last 60 min. Rollback: N/A. If unhealthy, ABORT and re-prompt Michael.
3. Confirm Vaultwarden has all 7 credential files: for n in gmail_oauth_client gmail_oauth_token mail_imap_credential outlook-graph-credentials outlook-tokens-hotmail-michael-asolo outlook-tokens-outlook-michael-nnamah outlook-tokens-outlook-n-nnamah; do ssh hinata-z2 "bw get notes $n >/dev/null && echo ok $n || echo MISSING $n"; done. Verify: seven ok lines, zero MISSING. Rollback: if any missing, ABORT. Michael must add to Vaultwarden manually per feedback_no-auto-vaultwarden-writes.
4. Migrate CT102 state + journal into canonical layout: if state.json currently lives at /opt/hinata/mail-poller/state.json, move it to /mail-archive/_state/state.json and update the poller to read from the new path. Create /mail-archive/_journal/. Verify: poller dry-run reads the new path; ls /opt/hinata/mail-poller/ no longer carries state. Rollback: restore state.json to old path, revert poller patch.
5. Stop + disable Z2 host poller: ssh hinata-z2 'systemctl stop hinata-mail-poller.timer && systemctl disable hinata-mail-poller.timer'. Verify: systemctl is-enabled hinata-mail-poller.timer returns disabled. Rollback: systemctl enable --now hinata-mail-poller.timer.
6. 24-hour soak, CT102 alone: wait 24 h. Archive growth check: (find /mail-archive -path /mail-archive/_journal -prune -o -path /mail-archive/_state -prune -o -mmin -1440 -name "*.json" -print | wc -l) ≥ baseline. Verify: count present, no IMAP/Graph error spikes in journalctl -u hinata-mail-poller.service. Rollback: if growth drops below baseline, re-enable host poller (step 5 rollback) and stop here.
7. Snapshot state for OAuth stability test: ssh hinata-z2 'pct exec 102 -- cp /mail-archive/_state/state.json /mail-archive/_state/state.json.pre-oauth-test'. Verify: snapshot present. Rollback: N/A.
8. Add --oauth-test mode + separate cursor to CT102 poller: extend mail-poller.py to accept --oauth-test --since YYYY-MM [--account X], using _state/state-oauth-test.json instead of _state/state.json. Mirror the monthly chunking pattern from the retired Sandpit mail-poller.py backfill_account(). Every successful write appends one line to _journal/live-poll.jsonl tagged source=oauth-test. Verify: --dry-run --oauth-test --since 2025-06 --account gmail lists months without writing, logs [oauth-test] prefix. Rollback: revert the patch in CT102.
9. OAuth stability test, Gmail first (cheapest): ssh hinata-z2 'pct exec 102 -- /opt/hinata/mail-poller/mail-poller.py --oauth-test --since 2025-06-14 --account gmail'. Verify: XOAUTH2 refresh succeeded across all 12 months without falling back to app password; archive count delta = journal report; _state/state-oauth-test.json written; _state/state.json UNCHANGED (md5 matches pre-test snapshot). Rollback: delete /mail-archive/gmail/{YYYY}/{MM}/ folders for affected range; live polls fetch forward only and do not refill the gap.
10. OAuth stability test, the 3 Outlook accounts (one at a time): repeat step 9 per account with --since 2025-06-14, pacing 500 ms between Graph pages. Verify: per-account Graph refresh succeeded across the full window; per-account growth matches expectation. Rollback: per-account folder delete.
11. Move Mac local archive into canonical CT102 paths: run rehash-mac-import.py per how-to_rehash-mac-import. The script rehashes ~/Sandpit/hinata/resources/email-poller/**/*.json (sha1[:12] → sha256[:16]), normalises envelope to the CT102 schema, and writes into /mail-archive/{account}/{YYYY}/{MM}/. The Mac source path stays read-only during the move; the journal at _journal/mac-import.jsonl records every write. Verify: _journal/rehash-state.json reports status: complete, processed count matches source count; sample 3 rehashed files render correctly under the canonical paths. Rollback: see how-to_rehash-mac-import § Rollback. The Mac source path is the recovery surface.
12. Delete Z2 host poller scripts: once steps 6+9+10 are green, rm /opt/jimmy-brain-ops/scripts/gmail-api-poller.py /opt/jimmy-brain-ops/scripts/outlook-graph-poller.py /opt/jimmy-brain-ops/scripts/sync-mail-creds.sh and the host unit files /etc/systemd/system/hinata-mail-poller.service + .timer. Verify: systemctl daemon-reload; find /opt/jimmy-brain-ops/scripts/ -name "*mail*" empty. Rollback: git restore from hinata-sandpit mirror.
13. Delete Z2 host credential cache: rm -rf /mnt/data/hinata/secrets/mail/. Verify: grep -r "secrets/mail" /opt/ returns zero refs. Rollback: restore from Vaultwarden CT103.
14. Delete Mac Sandpit dead pollers: rm /Users/nnamdi/Sandpit/hinata/scripts/mail-poller.py /Users/nnamdi/Sandpit/hinata-sandpit/scripts/mail-poller.py /Users/nnamdi/Sandpit/hinata-sandpit/scripts/gmail-api-poller.py /Users/nnamdi/Sandpit/hinata-sandpit/scripts/outlook-graph-poller.py. Verify: git status in hinata-sandpit shows 4 deletions; grep -r "mail-poller.py" /Users/nnamdi/Sandpit/ shows only the kept ct102-mail-poller.py mirror. Rollback: git restore per file.
15. Update reference_mail-poller-architecture.md: remove the Z2-host poller boxes from the ASCII diagram; CT102 is now the sole executor. Reflect the canonical layout (single tree, _state/ + _journal/ siblings). Verify: file no longer references /opt/jimmy-brain-ops/scripts/{gmail-api,outlook-graph}-poller.py. Rollback: git restore.
16. Update bau-registry.md: drop the Z2-host hinata-mail-poller.service row (line ~50); add note that CT102 is sole owner. Verify: open-questions section §1 closed. Rollback: git restore.
17. Wire output to Iroh's dashboard: confirm mail-30d-digest.py and any forthcoming dashboard consumer read only from /mail-archive/{account}/{YYYY}/{MM}/*.json (ignoring _journal/ and _state/ by glob). Iroh's dashboard spec is the downstream contract; see explanation_mail-poller-historical-use-cases. Verify: grep digest + dashboard scripts for any reference to _mac-import/ (must return zero) or to historical/archive subtrees outside the canonical tree. Rollback: N/A, read-side only.
18. Commit + push: vault + sandpit double push per dispatch-lifecycle. Tag tasks.json closure for 800143 (JXA parity / consumer repoint) if applicable.
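The baseline and soak counts in steps 1 and 6 are taken with find; a per-account breakdown of the same count can make a parity failure easier to localise. The helper below is a hypothetical sketch, not an existing script, and mirrors find's -mmin filter with a modification-time cutoff.

```python
import time
from collections import Counter
from pathlib import Path
from typing import Optional


def count_messages(root: Path, newer_than_minutes: Optional[int] = None) -> Counter:
    """Count archive messages per account, skipping _journal/ and _state/."""
    cutoff = None
    if newer_than_minutes is not None:
        cutoff = time.time() - newer_than_minutes * 60
    counts = Counter()
    for path in root.glob("*/*/*/*.json"):
        account = path.parts[-4]
        if account.startswith("_"):
            continue  # reserved directories are not message data
        if cutoff is not None and path.stat().st_mtime < cutoff:
            continue  # older than the window, like find -mmin -N
        counts[account] += 1
    return counts
```

For the 24-hour soak, `count_messages(Path("/mail-archive"), 1440)` gives the per-account growth to hold against the baseline; an account that drops to zero while the others hold points at that account's credentials rather than at the poller as a whole.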