Mail Poller → Z2 Migration Strategy
Date: 2026-06-04
Owner: Jimmy Neutron (migration execution) + Trunks (architecture review)
Status: Strategy design (ready for Michael sign-off)
Executive Summary
Current state: Mail poller runs on Mac, fires via LaunchAgent 4x daily, archives 47K+ emails to ~/Sandpit/hinata/resources/email-poller/. Phase 0 (archive) complete; Phase 1 (classification ML) not yet built.
Migration goal: Move mail-poller service to Z2 (Proxmox-hosted container), gain:
- Decoupling: Mac is no longer a single point of failure for email ingestion
- Scale: Z2 can run the 4-layer pipeline locally (BGE embeddings, BERTopic, Layer 3 scorer, FastAPI)
- Resilience: Z2 state-checked daily by Pi cron; auto-recovery on failure
- Cost: Zero LLM tokens at steady state; all ML runs locally
Timeline: 3 phases, ~6 weeks total (gated on NUC delivery #840029, then Z2 install + hardening)
Current Architecture (Mac-only)
┌─────────────────────────────────────────────────────────────────┐
│ macOS (Michael's MBP) │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ LaunchAgent: com.hinata.mail-poller.plist │ │
│ │ • Fires 06:30, 12:04, 20:06, 21:51 UTC daily │ │
│ │ • Runs: python3 mail-poller.py (24h lookback) │ │
│ ├───────────────────────────────────────────────────────────┤ │
│ │ mail-poller.py │ │
│ │ • IMAP: Gmail, Outlook, Hotmail, iCloud │ │
│ │ • Credentials: itachi-digital-security/mail_imap_cred│ │
│ │ • Output: ~/Sandpit/hinata/resources/email-poller/ │ │
│ │ • State cursor: ~/Sandpit/hinata/data/mail/*.json │ │
│ ├───────────────────────────────────────────────────────────┤ │
│ │ ~/Sandpit/hinata/resources/ (email archive + state) │ │
│ │ → Monthly: YYYY-MM/{account}/{hash}.json │ │
│ │ → State: data/mail/imap-state-{account}.json │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │ │
└─────────────────────────────┼───────────────────────────────────┘
│
(local filesystem only)
│
Studio API reads from ~/Sandpit
(not yet built: classification Layer 1–4)
Pain points:
- Mail poller is tied to Mac presence → if laptop offline, ingestion stops
- Sandpit/resources is Mac-local → no automated backup or redundancy
- Phase 1+ (classification) will be compute-heavy; the MacBook starves under load
Z2 Target Architecture (Post-Migration)
┌─────────────────────────────────────────────────────────────────────────┐
│ Z2 Mini G4 (Proxmox VE 9.2) │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ LXC: hinata-poller (mail-poller + state cursor) │ │
│ │ • OS: Debian 12 (lightweight, <512MB) │ │
│ │ • Port: 22 (SSH) + 9090 (monitoring) │ │
│ │ • Tailscale: hinata-poller.jimmy-z2.ts.net │ │
│ │ ├─ mail-poller.py (same as Mac, updated cron target) │ │
│ │ ├─ Credentials: /etc/hinata/mail_imap_cred.json (vaultwarden) │ │
│ │ ├─ State: /data/mail/imap-state-*.json │ │
│ │ └─ /mnt/archive → symlink to shared storage (Sandpit NAS) │ │
│ ├────────────────────────────────────────────────────────────────────┤ │
│ │ LXC: hinata-ml (classification pipeline — Phase 1+) │ │
│ │ • OS: Debian 12 + Python 3.11 + CUDA (if GPU passthrough) │ │
│ │ • Port: 8000 (FastAPI — Layer 4 endpoints) │ │
│ │ • Input: /mnt/archive/email-poller/ (read-only mount) │ │
│ │ • Models: BGE-small, BERTopic, spaCy, FAISS │ │
│ │ • State: /data/ml/{bertopic_model, logistic_regression.pkl} │ │
│ │ └─ Output: /mnt/archive/resources/ml/ (classification JSON) │ │
│ ├────────────────────────────────────────────────────────────────────┤ │
│ │ Shared storage: /mnt/archive │ │
│ │ • mounted via NFS from Sandpit NAS (or direct SSD if available) │ │
│ │ └─ email-poller/, resources/ml/, data/mail state files │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │ │
└─────────────────────────────┼──────────────────────────────────────────┘
│
(Pi5 cron orchestration)
│
Pi calls Z2 daily: "check-poller-state"
(Tailscale to hinata-poller:22, SSH health check)
If failed: retry + alert Telegram
Migration Phases (3 total)
Phase 1: Prepare (NOW → before NUC delivery · ~2 weeks)
Deliverables: Design frozen, credentials secured, state export plan confirmed.
Credential audit + hardening
- [ ] Confirm IMAP credentials in itachi-digital-security/mail_imap_credential.json (plaintext today)
- [ ] Decision: Move to Vaultwarden (840058), OR use temporary local .json with strict perms on Z2
- [ ] Generate new app-password for each provider (safer than account password)
- [ ] Test credentials manually (connect to IMAP, list folders)
State export + validation
- [ ] Export current imap-state-*.json from Mac (timestamp each)
- [ ] Validate cursor integrity: UIDs match live server UID+modseq for last-fetched messages
- [ ] Document any gaps or out-of-order UIDs (should be none)
- [ ] Back up to vault: projects/brain/mail-poller-z2/state-export-2026-06-04.json
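
The cursor integrity check above could be sketched as follows. The `Cursor` fields are assumptions (the real imap-state-*.json schema is not shown here), and the server-side values are passed in as plain integers; with the standard library they could be read via `imaplib.IMAP4.status(mailbox, "(UIDVALIDITY UIDNEXT)")`. The check relies on two IMAP rules: stored UIDs are only meaningful while UIDVALIDITY is unchanged, and every existing UID is below UIDNEXT.

```python
from dataclasses import dataclass


@dataclass
class Cursor:
    # Hypothetical field names; adapt to the real state-file schema.
    uidvalidity: int
    last_uid: int


def check_cursor(cursor: Cursor, server_uidvalidity: int, server_uidnext: int) -> list:
    """Return a list of problems; an empty list means the cursor looks sane."""
    problems = []
    if cursor.uidvalidity != server_uidvalidity:
        # The server renumbered the mailbox, so stored UIDs cannot be trusted.
        problems.append("UIDVALIDITY changed since export")
    if cursor.last_uid >= server_uidnext:
        # A fetched UID can never be at or beyond the next UID to be assigned.
        problems.append("last_uid is at or beyond server UIDNEXT")
    return problems
```

Any non-empty result for an account would be a reason to document the gap before the export is treated as the migration baseline.
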
Archive manifest
- [ ] Generate inventory: email-poller/ total size, file count per account, date range coverage
- [ ] Check for corrupted JSON files (parse all, flag any failures)
- [ ] Output: projects/brain/mail-poller-z2/archive-manifest-2026-06-04.html
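
A minimal sketch of the inventory and corruption check, assuming the archive follows the YYYY-MM/{account}/{hash}.json layout shown in the current-architecture diagram. The function name and the returned keys are illustrative only; rendering the result to HTML is left out.

```python
import json
from collections import Counter
from pathlib import Path


def build_manifest(archive_root: Path) -> dict:
    """Walk YYYY-MM/{account}/{hash}.json and summarise the archive."""
    per_account = Counter()
    months = set()
    corrupted = []
    total_bytes = 0
    for path in sorted(archive_root.glob("*/*/*.json")):
        month, account = path.parts[-3], path.parts[-2]
        months.add(month)
        per_account[account] += 1
        total_bytes += path.stat().st_size
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except ValueError:
            # Covers both invalid JSON and undecodable bytes.
            corrupted.append(str(path.relative_to(archive_root)))
    return {
        "total_bytes": total_bytes,
        "files_per_account": dict(per_account),
        "first_month": min(months) if months else None,
        "last_month": max(months) if months else None,
        "corrupted": corrupted,
    }
```

Running it once on the Mac before cutover and once on Z2 after the copy gives two manifests that can be compared directly.
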
Z2 install prerequisite
- [ ] Complete Z2 pre-flight checklist (RAM verify, BIOS update, MAC address noted)
- [ ] Confirm Proxmox VE 9.2 ISO available
- [ ] Plan static IP (suggest .50 in your LAN range)
- [ ] Prepare Tailscale join token for hinata-poller LXC
Phase 2: Install & Test (After Z2 hardware online · ~2 weeks)
Deliverables: Mail poller running on Z2, state synced from Mac, classification pipeline scaffolding live.
Z2 base install
- [ ] Boot Proxmox VE 9.2 from USB
- [ ] Install to NVMe, configure static IP, Tailscale join
- [ ] Verify: ssh jimmy-z2.${TAILNET}.ts.net → Proxmox UI at https://jimmy-z2:8006
Create hinata-poller LXC
- [ ] Use Proxmox UI → LXC → Debian 12 template
- [ ] Mount shared storage (NFS or direct SSD path) at /mnt/archive
- [ ] Install: Python 3.11, openssh-server, git, curl
- [ ] Copy mail-poller.py → /opt/hinata/mail-poller.py
- [ ] Copy itachi credentials → /etc/hinata/mail_imap_cred.json (0600 perms)
Sync state from Mac
- [ ] SCP imap-state-*.json from Mac ~/Sandpit/hinata/data/mail/ to Z2 /data/mail/
- [ ] Verify UIDs: run mail-poller.py --account user@gmail.com --dry-run → should fetch 0 new
- [ ] Test first live run: python3 mail-poller.py → should fetch last 24h, no duplicates
- [ ] Verify output: ls /mnt/archive/email-poller/2026-06/ | wc -l → confirm new files
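
The "no duplicates" check after the first live run could look like the sketch below. It assumes the {hash} in each filename identifies a message within an account, so the same hash appearing under more than one month directory would indicate a re-fetch; if the real hash is derived differently, the key needs adjusting.

```python
from collections import defaultdict
from pathlib import Path


def find_duplicates(archive_root: Path) -> dict:
    """Map (account, hash) to the months it appears in, for hashes seen more than once."""
    seen = defaultdict(list)
    for path in archive_root.glob("*/*/*.json"):
        month, account = path.parts[-3], path.parts[-2]
        seen[(account, path.stem)].append(month)
    return {key: sorted(months) for key, months in seen.items() if len(months) > 1}
```

An empty result after the first Z2 run is the expected outcome.
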
Create hinata-ml LXC
- [ ] Debian 12 + Python 3.11 + pip3 install: torch, sentence-transformers, bertopic, spacy, scikit-learn, fastapi, uvicorn
- [ ] Download models (one-time): BGE-small, spacy en_core_web_sm
- [ ] Create /opt/hinata/email-classifier.py (scaffold Phase 1)
- [ ] Start FastAPI mock at 0.0.0.0:8000 (returns 200 OK for now)
Cron integration (Z2 from Pi)
- [ ] Add to Pi cron: 0 6,12,20,21 * * * ssh hinata-poller@hinata-poller.ts.net "/opt/hinata/mail-poller.py" >> /var/log/hinata-poller.log 2>&1
- [ ] Verify: Cron executes, mail-poller runs, new emails archived
- [ ] Test failure recovery: kill mail-poller mid-run, confirm state cursor recovers
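
For the kill-mid-run test to pass, the state cursor must never be left half-written. One common way to get that property is to write to a temporary file and swap it into place; whether mail-poller.py already does this is not confirmed here, so the following is a sketch of the technique rather than a description of the existing code. `save_cursor` is a hypothetical helper name.

```python
import json
import os
import tempfile
from pathlib import Path


def save_cursor(path: Path, cursor: dict) -> None:
    """Write the cursor to a temp file, then atomically swap it into place."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(cursor, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes reach the disk first
        # os.replace is atomic on POSIX when both paths are on the same filesystem.
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)
        raise
```

With this pattern a killed run leaves either the previous cursor or the new one on disk, never a truncated file. A stray .tmp file may remain if the process is killed outright.
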
Decommission Mac mail-poller
- [ ] Disable LaunchAgent: launchctl unload ~/Library/LaunchAgents/com.hinata.mail-poller.plist
- [ ] Remove from cron (if any)
- [ ] Keep mail-poller.py on Mac as backup (do NOT delete Sandpit)
Phase 3: Classification Pipeline + Hardening (After Phase 2 stable · ~2 weeks)
Deliverables: Phase 1 classifier live, training complete, monitoring + alerting in place.
Phase 1 — Structured classifier
- [ ] Build email-classifier.py Layer 1 (deterministic rules from architecture doc)
- [ ] Test on existing corpus: process all 47K emails
- [ ] Output: JSON with {category, confidence=1.0, source="structured"} per email
- [ ] Verify: >90% of emails classified (rest → Layer 2 fallback)
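
A sketch of what Layer 1 might look like. The three rules and the email dict keys ("subject", "headers") are placeholders invented for illustration; the real rules come from the architecture doc. Only the output shape {category, confidence=1.0, source="structured"} and the coverage target are taken from the checklist above.

```python
from typing import Callable, Optional

Email = dict  # assumed keys: "subject" (str), "headers" (dict of header name to value)

# Placeholder rules, evaluated in order; the first match wins.
RULES: list = [
    ("newsletter", lambda e: "list-unsubscribe" in {h.lower() for h in e.get("headers", {})}),
    ("billing", lambda e: any(w in e.get("subject", "").lower() for w in ("invoice", "receipt"))),
    ("calendar", lambda e: e.get("subject", "").lower().startswith("invitation:")),
]


def classify_structured(email: Email) -> Optional[dict]:
    """Return a Layer 1 label, or None so the email falls through to Layer 2."""
    for category, matches in RULES:
        if matches(email):
            return {"category": category, "confidence": 1.0, "source": "structured"}
    return None


def coverage(emails) -> float:
    """Fraction of emails that Layer 1 classified."""
    emails = list(emails)
    if not emails:
        return 0.0
    return sum(classify_structured(e) is not None for e in emails) / len(emails)
```

Running `coverage` over the full corpus gives the number to compare against the >90% target.
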
Phase 2 — BERTopic fit + Layer 2 unstructured
- [ ] Download BERTopic on Z2 ML container
- [ ] Fit on full corpus (split personal/professional): ~overnight, 8GB RAM
- [ ] Topic → commander domain mapping (Meruem reviews)
- [ ] Train LogReg classifier (embedding + topic_id → final category)
- [ ] Test on held-out 10% of corpus: aim for >80% accuracy
Layer 3 recommendation engine
- [ ] Build feedback loop: Studio actions (delete, archive, flag) → historical dataset
- [ ] Train similarity scorer: email embedding similarity to acted-on emails
- [ ] Persist as FAISS index + pickle on Z2
- [ ] Output: priority=1–5 per email
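
The scoring step could be sketched as below. This is a brute-force stand-in for the planned FAISS index, and it assumes that 5 is the highest priority and that priority is derived from the best cosine similarity to any acted-on email; neither is specified above.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def priority(embedding, acted_on, buckets: int = 5) -> int:
    """Map the best similarity to the acted-on set onto 1..buckets (assumed: higher = more urgent)."""
    if not acted_on:
        return 1  # no feedback yet, so everything gets the lowest priority
    best = max(cosine(embedding, ref) for ref in acted_on)
    best = max(0.0, min(1.0, best))  # negative similarity counts as no match
    return min(buckets, 1 + int(best * buckets))
```

Swapping the linear scan for a FAISS lookup would change only how `best` is computed.
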
Layer 4 FastAPI endpoints (live)
- [ ] GET /api/email/feed?account=gmail&limit=50 → latest classified emails, sorted by priority
- [ ] GET /api/email/leads?type=career|side_hustle → filtered inbound/outbound lead signals
- [ ] GET /api/email/missed-opportunities → cold leads >14d no reply
- [ ] POST /api/email/{message_id}/action → log delete/archive/flag, retrain feedback
Monitoring + alerts
- [ ] Z2 health check: daily SSH from Pi, run /opt/hinata/health-check.sh
- [ ] Poller success/failure logged to Telegram (Jimmy)
- [ ] Classification latency dashboard (Trunks monitors model speed)
- [ ] Failed credential → alert immediately (Itachi monitors)
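
One check the health script could perform is state-file freshness. The sketch below assumes the poller rewrites each imap-state-{account}.json on every successful run, which is not confirmed above. The threshold is left to the caller; with four runs a day, something above the longest scheduled gap (about nine hours overnight) would avoid false alarms.

```python
import time
from pathlib import Path
from typing import Optional


def stale_state_files(state_dir: Path, max_age_s: float, now: Optional[float] = None) -> list:
    """Return names of imap-state-*.json files not modified within max_age_s seconds."""
    now = time.time() if now is None else now
    return sorted(
        p.name
        for p in state_dir.glob("imap-state-*.json")
        if now - p.stat().st_mtime > max_age_s
    )
```

A non-empty result would map to a non-zero exit code, which the Pi-side cron can turn into a Telegram alert.
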
Backfill old emails (optional)
- [ ] If time permits: process all 47K existing emails through Layer 1–4
- [ ] Store classification output: /mnt/archive/resources/ml/classifications/
- [ ] Studio reads and displays retroactively
Implementation Checklist
Pre-migration (Phase 1)
- [ ] Export imap-state-*.json from Mac, validate UIDs
- [ ] Audit + export mail_imap_credential.json
- [ ] Generate archive manifest (size, counts, date range)
- [ ] Finalize Z2 static IP, Tailscale name
- [ ] Confirm Proxmox VE 9.2 ISO path
- [ ] Document any custom mail-poller.py patches (should be none)
Installation (Phase 2)
- [ ] Z2 BIOS updated, hardware pre-flight complete
- [ ] Proxmox VE 9.2 installed, Tailscale joined
- [ ] hinata-poller LXC: Debian, Python, mail-poller.py, credentials, shared mount
- [ ] State synced from Mac, first run tested
- [ ] hinata-ml LXC: scaffolding live, FastAPI 200 OK
- [ ] Pi cron pointed to Z2 (not Mac)
- [ ] Mac mail-poller.plist disabled
- [ ] 7-day smoke test: no missed emails, no duplicates
Classification (Phase 3)
- [ ] Layer 1 classifier built, >90% coverage test passed
- [ ] BERTopic fit complete, topic → domain mapping approved
- [ ] LogReg trained, >80% accuracy on held-out set
- [ ] Layer 3 similarity scorer live with FAISS index
- [ ] FastAPI endpoints live, documented
- [ ] Monitoring dashboard (health, latency, error rate)
- [ ] Backfill (optional): 47K emails processed through Layer 1–4
Risk Register
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| State cursor out of sync between Mac + Z2 during migration | Medium | Medium — duplicate or missed emails | Dry-run all fetches before cutover; validate UIDs match server |
| Credential plaintext on Z2 before Vaultwarden (CT103, 192.168.1.250) ready | High (credentials stored plaintext in itachi today) | High — if Z2 compromised, all 4 email accounts exposed | Minimum: 0600 file perms, isolated LXC, Tailscale-only SSH. Full fix: move to Vaultwarden at 840058 |
| Mail.Read timeout during backfill (47K emails, IMAP throttling) | Medium | Low — backfill can retry, no data loss | Use --backfill --since YYYY-MM to chunk. Implement exponential backoff on 429/timeout |
| Classification ML model OOM on 16GB Z2 (if 8GB, escalates to High) | Low (BERTopic optimized for small RAM) | Medium — pipeline stalls | Segment corpus by account (4 smaller fits), or defer to NUC+GPU if available |
| Sync latency between Sandpit + Z2 storage | Low (if NFS direct) | Low — acceptable 1–2s lag | Document mount type; monitor NFS latency; alert if >5s |
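
The exponential backoff named in the backfill mitigation could be sketched like this. `with_backoff` and its parameters are hypothetical, and `fetch` stands in for whatever callable performs one chunk of the backfill. Retrying on `OSError` covers socket timeouts and connection resets, and `imaplib.IMAP4.abort` covers server-side disconnects.

```python
import imaplib
import random
import time


def with_backoff(fetch, retries=5, base_delay=1.0, max_delay=60.0,
                 retry_on=(OSError, imaplib.IMAP4.abort), sleep=time.sleep):
    """Call fetch(), retrying with exponentially growing, jittered delays."""
    for attempt in range(retries):
        try:
            return fetch()
        except retry_on:
            if attempt == retries - 1:
                raise  # out of retries: surface the error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter keeps parallel account fetches from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

Wrapping each --since chunk separately means a throttled month is retried on its own instead of restarting the whole backfill.
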
Storage Architecture (Shared vs. Local)
Option A: NFS mount (Sandpit as NAS)
- Z2 mounts Sandpit over network: nfs.sandpit.local:/resources → /mnt/archive
- Pro: single source of truth, automatic backup via iCloud, visible from Mac
- Con: network dependency, latency, requires Sandpit to be always-on
- Recommendation: If Sandpit is stable, prefer this.
Option B: Direct SSD on Z2
- Email archive lives on Z2 NVMe SSD (256GB, plenty for 47K emails)
- NFS mount Sandpit for backup sync (cron rsync nightly)
- Pro: fast local I/O, no network dependency
- Con: manual sync, single disk failure = data loss until backup
- Recommendation: If Sandpit unreliable, prefer this.
Decision needed from Michael: Which storage strategy?
Maturity Criteria (Definition of Done)
Phase 1 complete when:
- [ ] State synced, first Z2 run fetches correctly
- [ ] 7-day smoke test: no missed emails, no duplicates, no credential errors
Phase 2 complete when:
- [ ] Layer 1 classifier >90% email coverage
- [ ] BERTopic fit done, topic → domain mapping approved
Phase 3 complete when:
- [ ] Layer 1–4 pipeline live
- [ ] FastAPI endpoints tested
- [ ] Monitoring dashboard live
- [ ] Backfill (47K emails) classified (optional but recommended)
Mail poller migration SHIPPED when:
- All 3 phases complete
- 14-day stability run (no errors, no manual restarts)
- Credentials rotated (if still plaintext, at least app-passwords not account passwords)
- Phase 1 gates (if any) cleared by Michael
Open Questions for Michael
- Storage backend: NFS from Sandpit, or local SSD with sync?
- Credentials: Accept plaintext .json on Z2 until Vaultwarden (840058), or implement temporary secure store?
- Backfill priority: Classify all 47K emails Phase 3, or defer to later release?
- Mail.Send gate: Do you want Phase 5 (send capability) investigated in parallel, or defer until Phase 1–4 stable?
- Timeline: Preferred order — mail poller first, then NUC ML research (#z2f2), then other infrastructure tasks?