Mail Poller → Z2 Migration Strategy

Date: 2026-06-04
Owner: Jimmy Neutron (migration execution) + Trunks (architecture review)
Status: Strategy design (ready for Michael sign-off)


Executive Summary

Current state: The mail poller runs on the Mac, fired by a LaunchAgent four times daily, and has archived 47K+ emails to ~/Sandpit/hinata/resources/email-poller/. Phase 0 (archive) is complete; Phase 1 (classification ML) is not yet built.

Migration goal: Move the mail-poller service to Z2 (a Proxmox-hosted container) to gain:

  • Decoupling: Mac is no longer a single point of failure for email ingestion
  • Scale: Z2 can run the 4-layer pipeline locally (BGE embeddings, BERTopic, Layer 3 scorer, FastAPI)
  • Resilience: Z2 state-checked daily by Pi cron; auto-recovery on failure
  • Cost: Zero LLM tokens at steady state; all ML runs locally

Timeline: 3 phases, ~6 weeks total (gated on NUC delivery #840029, then Z2 install + hardening)


Current Architecture (Mac-only)

┌─────────────────────────────────────────────────────────────────┐
│  macOS (Michael's MBP)                                          │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │ LaunchAgent: com.hinata.mail-poller.plist               │  │
│  │   • Fires 06:30, 12:04, 20:06, 21:51 UTC daily         │  │
│  │   • Runs: python3 mail-poller.py (24h lookback)         │  │
│  ├───────────────────────────────────────────────────────────┤  │
│  │ mail-poller.py                                          │  │
│  │   • IMAP: Gmail, Outlook, Hotmail, iCloud              │  │
│  │   • Credentials: itachi-digital-security/mail_imap_cred│  │
│  │   • Output: ~/Sandpit/hinata/resources/email-poller/   │  │
│  │   • State cursor: ~/Sandpit/hinata/data/mail/*.json    │  │
│  ├───────────────────────────────────────────────────────────┤  │
│  │ ~/Sandpit/hinata/resources/ (email archive + state)    │  │
│  │   → Monthly: YYYY-MM/{account}/{hash}.json              │  │
│  │   → State: data/mail/imap-state-{account}.json          │  │
│  └───────────────────────────────────────────────────────────┘  │
│                             │                                   │
└─────────────────────────────┼───────────────────────────────────┘

                    (local filesystem only)

                    Studio API reads from ~/Sandpit
                    (not yet built: classification Layer 1–4)

Pain points:

  • Mail poller is tied to Mac presence → if laptop offline, ingestion stops
  • Sandpit/resources is Mac-local → no automated backup or redundancy
  • Phase 1+ (classification) will be compute-heavy; the MacBook would starve under that load

Z2 Target Architecture (Post-Migration)

┌─────────────────────────────────────────────────────────────────────────┐
│  Z2 Mini G4 (Proxmox VE 9.2)                                           │
│  ┌────────────────────────────────────────────────────────────────────┐ │
│  │ LXC: hinata-poller (mail-poller + state cursor)                   │ │
│  │  • OS: Debian 12 (lightweight, <512MB)                            │ │
│  │  • Port: 22 (SSH) + 9090 (monitoring)                            │ │
│  │  • Tailscale: hinata-poller.jimmy-z2.ts.net                      │ │
│  │  ├─ mail-poller.py (same as Mac, updated cron target)           │ │
│  │  ├─ Credentials: /etc/hinata/mail_imap_cred.json (vaultwarden)   │ │
│  │  ├─ State: /data/mail/imap-state-*.json                         │ │
│  │  └─ /mnt/archive → symlink to shared storage (Sandpit NAS)      │ │
│  ├────────────────────────────────────────────────────────────────────┤ │
│  │ LXC: hinata-ml (classification pipeline — Phase 1+)              │ │
│  │  • OS: Debian 12 + Python 3.11 + CUDA (if GPU passthrough)       │ │
│  │  • Port: 8000 (FastAPI — Layer 4 endpoints)                      │ │
│  │  • Input: /mnt/archive/email-poller/ (read-only mount)           │ │
│  │  • Models: BGE-small, BERTopic, spaCy, FAISS                     │ │
│  │  • State: /data/ml/{bertopic_model, logistic_regression.pkl}     │ │
│  │  └─ Output: /mnt/archive/resources/ml/ (classification JSON)     │ │
│  ├────────────────────────────────────────────────────────────────────┤ │
│  │ Shared storage: /mnt/archive                                      │ │
│  │  • mounted via NFS from Sandpit NAS (or direct SSD if available)  │ │
│  │  └─ email-poller/, resources/ml/, data/mail state files          │ │
│  └────────────────────────────────────────────────────────────────────┘ │
│                             │                                          │
└─────────────────────────────┼──────────────────────────────────────────┘

                      (Pi5 cron orchestration)

          Pi calls Z2 daily: "check-poller-state"
          (Tailscale to hinata-poller:22, SSH health check)
          If failed: retry + alert Telegram
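
The "check, retry, then alert Telegram" flow above can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's actual orchestration: the function names, the retry count, and the delay are hypothetical, and the SSH host alias `hinata-poller` is assumed to resolve on the Pi. The remote script path `/opt/hinata/health-check.sh` is the one named later in this document; the Telegram call uses the Bot API `sendMessage` method with a token and chat ID that would have to be supplied.

```python
import subprocess
import time
import urllib.parse
import urllib.request
from typing import Callable


def ssh_health_check(host: str = "hinata-poller", timeout_s: int = 15) -> bool:
    """Return True if the remote health script exits 0 over SSH (host alias is an assumption)."""
    try:
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", "-o", f"ConnectTimeout={timeout_s}",
             host, "/opt/hinata/health-check.sh"],
            capture_output=True,
            timeout=timeout_s * 2,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def send_telegram(token: str, chat_id: str, text: str) -> None:
    """Post a message through the Telegram Bot API (token and chat_id are placeholders)."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    body = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode()
    urllib.request.urlopen(url, data=body, timeout=10).close()


def check_with_retry(
    check: Callable[[], bool],
    alert: Callable[[str], None],
    attempts: int = 3,          # hypothetical default
    delay_s: float = 60.0,      # hypothetical default
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run `check` up to `attempts` times; call `alert` once if every attempt fails."""
    for attempt in range(1, attempts + 1):
        if check():
            return True
        if attempt < attempts:
            sleep(delay_s)
    alert(f"hinata-poller health check failed after {attempts} attempts")
    return False
```

On the Pi, the daily cron job could call `check_with_retry(ssh_health_check, lambda msg: send_telegram(TOKEN, CHAT_ID, msg))`. Passing `check`, `alert`, and `sleep` as callables keeps the retry logic testable without a network.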

Migration Phases (3 total)

Phase 1: Prepare (NOW → before NUC delivery · ~2 weeks)

Deliverables: Design frozen, credentials secured, state export plan confirmed.

  1. Credential audit + hardening

    • [ ] Confirm IMAP credentials in itachi-digital-security/mail_imap_credential.json (plaintext today)
    • [ ] Decision: Move to Vaultwarden (840058), OR use a temporary local .json with strict perms on Z2
    • [ ] Generate new app-password for each provider (safer than account password)
    • [ ] Test credentials manually (connect to IMAP, list folders)
  2. State export + validation

    • [ ] Export current imap-state-*.json from Mac (timestamp each)
    • [ ] Validate cursor integrity: stored UIDs (and MODSEQ, where tracked) match the live server for the last-fetched messages, and UIDVALIDITY is unchanged
    • [ ] Document any gaps or out-of-order UIDs (should be none)
    • [ ] Back up to vault: projects/brain/mail-poller-z2/state-export-2026-06-04.json
  3. Archive manifest

    • [ ] Generate inventory: email-poller/ total size, file count per account, date range coverage
    • [ ] Check for corrupted JSON files (parse all, flag any failures)
    • [ ] Output: projects/brain/mail-poller-z2/archive-manifest-2026-06-04.html
  4. Z2 install prerequisite

    • [ ] Complete Z2 pre-flight checklist (RAM verify, BIOS update, MAC address noted)
    • [ ] Confirm Proxmox VE 9.2 ISO available
    • [ ] Plan static IP (suggest .50 in your LAN range)
    • [ ] Prepare Tailscale join token for hinata-poller LXC
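
The archive manifest step (inventory plus corrupted-JSON check) could be sketched as below. It assumes the `YYYY-MM/{account}/{hash}.json` layout shown in the current-architecture diagram; the function name and the shape of the returned dictionary are hypothetical, and rendering the result to the planned .html report is left out.

```python
import json
from collections import Counter
from pathlib import Path


def build_manifest(archive_root: Path) -> dict:
    """Summarise an archive laid out as YYYY-MM/{account}/{hash}.json."""
    files_per_account: Counter = Counter()
    months = set()
    corrupted = []
    total_bytes = 0

    for path in sorted(archive_root.glob("*/*/*.json")):
        months.add(path.parent.parent.name)        # YYYY-MM directory
        files_per_account[path.parent.name] += 1   # account directory
        total_bytes += path.stat().st_size
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            # Flag the file instead of aborting, so one bad file does not hide others.
            corrupted.append(path.relative_to(archive_root).as_posix())

    return {
        "file_count": sum(files_per_account.values()),
        "total_bytes": total_bytes,
        "files_per_account": dict(files_per_account),
        # YYYY-MM strings sort chronologically, so min/max give the coverage range.
        "date_range": (min(months), max(months)) if months else None,
        "corrupted": corrupted,
    }
```

Running `build_manifest(Path.home() / "Sandpit/hinata/resources/email-poller")` on the Mac before the export gives a baseline to compare against the same call on Z2 after the copy.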

Phase 2: Install & Test (After Z2 hardware online · ~2 weeks)

Deliverables: Mail poller running on Z2, state synced from Mac, classification pipeline scaffolding live.

  1. Z2 base install

    • [ ] Boot Proxmox VE 9.2 from USB
    • [ ] Install to NVMe, configure static IP, Tailscale join
    • [ ] Verify: ssh jimmy-z2.${TAILNET}.ts.net → Proxmox UI at https://jimmy-z2:8006
  2. Create hinata-poller LXC

    • [ ] Use Proxmox UI → LXC → Debian 12 template
    • [ ] Mount shared storage (NFS or direct SSD path) at /mnt/archive
    • [ ] Install: Python 3.11, openssh-server, git, curl
    • [ ] Copy mail-poller.py → /opt/hinata/mail-poller.py
    • [ ] Copy itachi credentials → /etc/hinata/mail_imap_cred.json (0600 perms)
  3. Sync state from Mac

    • [ ] SCP imap-state-*.json from Mac ~/Sandpit/hinata/data/mail/ to Z2 /data/mail/
    • [ ] Verify UIDs: run mail-poller.py --account user@gmail.com --dry-run → should fetch 0 new
    • [ ] Test first live run: python3 mail-poller.py → should fetch last 24h, no duplicates
    • [ ] Verify output: ls /mnt/archive/email-poller/2026-06/ | wc -l → confirm new files
  4. Create hinata-ml LXC

    • [ ] Debian 12 + Python 3.11 + pip3 install: torch, sentence-transformers, bertopic, spacy, scikit-learn, fastapi, uvicorn
    • [ ] Download models (one-time): BGE-small, spacy en_core_web_sm
    • [ ] Create /opt/hinata/email-classifier.py (scaffold Phase 1)
    • [ ] Start FastAPI mock at 0.0.0.0:8000 (returns 200 OK for now)
  5. Cron integration (Z2 from Pi)

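    • [ ] Add to Pi cron: 0 6,12,20,21 * * * ssh hinata-poller@hinata-poller.jimmy-z2.ts.net "python3 /opt/hinata/mail-poller.py" >> /var/log/hinata-poller.log 2>&1 (runs on the hour in the Pi's local time and logs on the Pi; use four separate entries if the Mac's 06:30, 12:04, 20:06, 21:51 UTC slots must be kept)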
    • [ ] Verify: Cron executes, mail-poller runs, new emails archived
    • [ ] Test failure recovery: kill mail-poller mid-run, confirm state cursor recovers
  6. Decommission Mac mail-poller

    • [ ] Disable LaunchAgent: launchctl unload ~/Library/LaunchAgents/com.hinata.mail-poller.plist
    • [ ] Remove from cron (if any)
    • [ ] Keep mail-poller.py on Mac as backup (do NOT delete Sandpit)
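
The failure-recovery test in step 5 (kill the poller mid-run, confirm the state cursor recovers) depends on the cursor file never being left half-written. Whether mail-poller.py already writes state this way is not stated here; the sketch below shows one common approach, an atomic write through a temporary file and `os.replace`. The function names and the cursor fields are hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def save_cursor(state_path: Path, cursor: dict) -> None:
    """Write the cursor atomically: readers see the old file or the new one, never a partial one."""
    state_path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file must live in the same directory so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=state_path.parent, prefix=state_path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(cursor, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())   # make sure the bytes are on disk before the rename
        os.replace(tmp_name, state_path)
    except BaseException:
        Path(tmp_name).unlink(missing_ok=True)
        raise


def load_cursor(state_path: Path, default: dict) -> dict:
    """Read the cursor, falling back to `default` when no state file exists yet."""
    try:
        return json.loads(state_path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return dict(default)
```

If the cursor is saved only after a batch of messages has been archived, a kill mid-run leaves the previous cursor in place and the next run re-fetches that batch; the hash-named archive files then decide whether duplicates appear.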

Phase 3: Classification Pipeline + Hardening (After Phase 2 stable · ~2 weeks)

Deliverables: Phase 1 classifier live, training complete, monitoring + alerting in place.

  1. Classifier Phase 1: structured classifier (Layer 1)

    • [ ] Build email-classifier.py Layer 1 (deterministic rules from architecture doc)
    • [ ] Test on existing corpus: process all 47K emails
    • [ ] Output: JSON with {category, confidence=1.0, source="structured"} per email
    • [ ] Verify: >90% of emails classified (rest → Layer 2 fallback)
  2. Classifier Phase 2: BERTopic fit + Layer 2 unstructured

    • [ ] Download BERTopic on Z2 ML container
    • [ ] Fit on full corpus (split personal/professional): ~overnight, 8GB RAM
    • [ ] Topic → commander domain mapping (Meruem reviews)
    • [ ] Train LogReg classifier (embedding + topic_id → final category)
    • [ ] Test on held-out 10% of corpus: aim for >80% accuracy
  3. Layer 3 recommendation engine

    • [ ] Build feedback loop: Studio actions (delete, archive, flag) → historical dataset
    • [ ] Train similarity scorer: email embedding similarity to acted-on emails
    • [ ] Persist as FAISS index + pickle on Z2
    • [ ] Output: priority=1–5 per email
  4. Layer 4 FastAPI endpoints (live)

    • [ ] GET /api/email/feed?account=gmail&limit=50 → latest classified emails, sorted by priority
    • [ ] GET /api/email/leads?type=career|side_hustle → filtered inbound/outbound lead signals
    • [ ] GET /api/email/missed-opportunities → cold leads >14d no reply
    • [ ] POST /api/email/{message_id}/action → log delete/archive/flag, retrain feedback
  5. Monitoring + alerts

    • [ ] Z2 health check: daily SSH from Pi, run /opt/hinata/health-check.sh
    • [ ] Poller success/failure logged to Telegram (Jimmy)
    • [ ] Classification latency dashboard (Trunks monitors model speed)
    • [ ] Failed credential → alert immediately (Itachi monitors)
  6. Backfill old emails (optional)

    • [ ] If time permits: process all 47K existing emails through Layer 1–4
    • [ ] Store classification output: /mnt/archive/resources/ml/classifications/
    • [ ] Studio reads and displays retroactively
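
To make the Layer 1 output shape and the >90% coverage gate concrete, here is a minimal sketch of a deterministic rule classifier. The rules, category names, and email field names are hypothetical placeholders (the real rules live in the architecture doc); only the output keys `category`, `confidence=1.0`, and `source="structured"` come from the plan above.

```python
import re
from typing import Iterable, Optional

# (category, email field, pattern). All three rules are illustrative assumptions.
RULES = [
    ("receipts", "from", re.compile(r"@(paypal|stripe)\.com$", re.I)),
    ("newsletters", "list_unsubscribe", re.compile(r".+")),
    ("calendar", "subject", re.compile(r"^(invitation|accepted|declined):", re.I)),
]


def classify_structured(email: dict) -> Optional[dict]:
    """Return the first matching rule's result, or None to hand off to Layer 2."""
    for category, field, pattern in RULES:
        value = email.get(field) or ""
        if pattern.search(value):
            return {"category": category, "confidence": 1.0, "source": "structured"}
    return None


def coverage(emails: Iterable[dict]) -> float:
    """Fraction of emails that Layer 1 classifies; compare against the 0.90 gate."""
    total = matched = 0
    for email in emails:
        total += 1
        if classify_structured(email) is not None:
            matched += 1
    return matched / total if total else 0.0
```

The sketch assumes `from` holds a bare address; a display-name form such as `Name <addr>` would need parsing first. First-match-wins ordering keeps the result deterministic when several rules apply.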

Implementation Checklist

Pre-migration (Phase 1)

  • [ ] Export imap-state-*.json from Mac, validate UIDs
  • [ ] Audit + export mail_imap_credential.json
  • [ ] Generate archive manifest (size, counts, date range)
  • [ ] Finalize Z2 static IP, Tailscale name
  • [ ] Confirm Proxmox VE 9.2 ISO path
  • [ ] Document any custom mail-poller.py patches (should be none)
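
The "validate UIDs" item can be sketched with the standard-library `imaplib`. This is a sketch under assumptions: the cursor fields `uidvalidity` and `last_uid` are a guessed schema for imap-state-*.json, and the function names are hypothetical. One IMAP detail matters: a `UID n:*` search always returns the highest-UID message even when its UID is below `n`, so results must be filtered against the cursor.

```python
import imaplib  # the conn argument below is expected to be an imaplib.IMAP4_SSL instance


def parse_search_uids(data: list) -> list[int]:
    """Flatten an imaplib SEARCH response such as [b'121 123'] into integers."""
    uids: list[int] = []
    for chunk in data:
        if chunk:
            uids.extend(int(token) for token in chunk.split())
    return uids


def pending_uids(search_data: list, last_uid: int) -> list[int]:
    """UIDs strictly newer than the cursor (drops the echo of the last message from 'n:*')."""
    return sorted(uid for uid in parse_search_uids(search_data) if uid > last_uid)


def check_cursor(conn, mailbox: str, cursor: dict) -> dict:
    """Compare a saved cursor with the live mailbox without changing any flags."""
    typ, _ = conn.select(mailbox, readonly=True)
    if typ != "OK":
        raise RuntimeError(f"cannot select {mailbox}")
    _, validity = conn.response("UIDVALIDITY")
    server_validity = int(validity[0]) if validity and validity[0] else None
    typ, search_data = conn.uid("SEARCH", None, f"UID {cursor['last_uid'] + 1}:*")
    if typ != "OK":
        raise RuntimeError("UID SEARCH failed")
    return {
        # A changed UIDVALIDITY means the stored UIDs are meaningless and a resync is needed.
        "uidvalidity_ok": server_validity == cursor["uidvalidity"],
        "pending": pending_uids(search_data, cursor["last_uid"]),
    }
```

Immediately after a poll, `pending` should be empty, which corresponds to the "--dry-run should fetch 0 new" check in Phase 2. A real connection would be opened with `imaplib.IMAP4_SSL(host)` followed by `login(user, app_password)`.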

Installation (Phase 2)

  • [ ] Z2 BIOS updated, hardware pre-flight complete
  • [ ] Proxmox VE 9.2 installed, Tailscale joined
  • [ ] hinata-poller LXC: Debian, Python, mail-poller.py, credentials, shared mount
  • [ ] State synced from Mac, first run tested
  • [ ] hinata-ml LXC: scaffolding live, FastAPI 200 OK
  • [ ] Pi cron pointed to Z2 (not Mac)
  • [ ] Mac mail-poller.plist disabled
  • [ ] 7-day smoke test: no missed emails, no duplicates

Classification (Phase 3)

  • [ ] Layer 1 classifier built, >90% coverage test passed
  • [ ] BERTopic fit complete, topic → domain mapping approved
  • [ ] LogReg trained, >80% accuracy on held-out set
  • [ ] Layer 3 similarity scorer live with FAISS index
  • [ ] FastAPI endpoints live, documented
  • [ ] Monitoring dashboard (health, latency, error rate)
  • [ ] Backfill (optional): 47K emails processed through Layer 1–4
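
As an illustration of what sits behind the planned GET /api/email/missed-opportunities endpoint (cold leads with no reply for more than 14 days), the filter could look like the sketch below. The record fields `received_at` and `replied` and the function name are assumptions; the FastAPI wiring is omitted so the logic stays testable on its own.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional

COLD_AFTER = timedelta(days=14)  # threshold taken from the endpoint description


def missed_opportunities(leads: Iterable[dict], now: Optional[datetime] = None) -> list[dict]:
    """Return unreplied leads older than 14 days, oldest first."""
    now = now or datetime.now(timezone.utc)
    cold = [
        lead for lead in leads
        if not lead["replied"] and now - lead["received_at"] > COLD_AFTER
    ]
    return sorted(cold, key=lambda lead: lead["received_at"])
```

Timestamps are expected to be timezone-aware; mixing naive and aware datetimes would raise a `TypeError` in the subtraction.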

Risk Register

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| State cursor out of sync between Mac + Z2 during migration | Medium | Medium: duplicate or missed emails | Dry-run all fetches before cutover; validate UIDs match server |
| Credential plaintext on Z2 before Vaultwarden (CT103, 192.168.1.250) ready | High (credentials stored plaintext in itachi today) | High: if Z2 compromised, all 4 email accounts exposed | Minimum: 0600 file perms, isolated LXC, Tailscale-only SSH. Full fix: move to Vaultwarden at 840058 |
| Mail.Read timeout during backfill (47K emails, IMAP throttling) | Medium | Low: backfill can retry, no data loss | Use --backfill --since YYYY-MM to chunk. Implement exponential backoff on 429/timeout |
| Classification ML model OOM on 16GB Z2 (if 8GB, escalates to High) | Low (BERTopic optimized for small RAM) | Medium: pipeline stalls | Segment corpus by account (4 smaller fits), or defer to NUC+GPU if available |
| Sync latency between Sandpit + Z2 storage | Low (if NFS direct) | Low: acceptable 1–2s lag | Document mount type; monitor NFS latency; alert if >5s |
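
The backfill mitigation (chunk by month, back off exponentially on throttling or timeouts) could be sketched as follows. The helper names, retry count, and delay values are hypothetical; only the `--backfill --since YYYY-MM` flags come from the register above. The retried exception types are an assumption too, and a real poller may also need to catch `imaplib.IMAP4.abort`.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def month_chunks(since: str, until: str) -> list[str]:
    """Inclusive list of YYYY-MM strings, one per backfill run."""
    year, month = map(int, since.split("-"))
    end = tuple(map(int, until.split("-")))
    chunks = []
    while (year, month) <= end:
        chunks.append(f"{year:04d}-{month:02d}")
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return chunks


def with_backoff(
    fn: Callable[[], T],
    retries: int = 5,
    base_s: float = 2.0,
    cap_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    retry_on: tuple = (TimeoutError, ConnectionError),
) -> T:
    """Call fn, retrying with exponential backoff plus jitter; re-raise after the last retry."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == retries:
                raise
            delay = min(cap_s, base_s * 2 ** attempt)
            # Jitter spreads retries out so the four accounts do not hammer servers in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

A backfill driver could then loop `for month in month_chunks("2024-01", "2026-06")` and wrap each month's fetch in `with_backoff`, so a throttled month is retried without restarting the whole run.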

Storage Architecture (Shared vs. Local)

Option A: NFS mount (Sandpit as NAS)

  • Z2 mounts Sandpit over network: nfs.sandpit.local:/resources → /mnt/archive
  • Pro: single source of truth, automatic backup via iCloud, visible from Mac
  • Con: network dependency, latency, requires Sandpit to be always-on
  • Recommendation: If Sandpit is stable, prefer this.

Option B: Direct SSD on Z2

  • Email archive lives on Z2 NVMe SSD (256GB, plenty for 47K emails)
  • NFS mount Sandpit for backup sync (cron rsync nightly)
  • Pro: fast local I/O, no network dependency
  • Con: manual sync, single disk failure = data loss until backup
  • Recommendation: If Sandpit unreliable, prefer this.

Decision needed from Michael: Which storage strategy?


Maturity Criteria (Definition of Done)

Phase 1 complete when:

  • [ ] State synced, first Z2 run fetches correctly
  • [ ] 7-day smoke test: no missed emails, no duplicates, no credential errors

Phase 2 complete when:

  • [ ] Layer 1 classifier >90% email coverage
  • [ ] BERTopic fit done, topic → domain mapping approved

Phase 3 complete when:

  • [ ] Layer 1–4 pipeline live
  • [ ] FastAPI endpoints tested
  • [ ] Monitoring dashboard live
  • [ ] Backfill (47K emails) classified (optional but recommended)

Mail poller migration SHIPPED when:

  • All 3 phases complete
  • 14-day stability run (no errors, no manual restarts)
  • Credentials rotated (if still stored in plaintext, they are at least app passwords, not account passwords)
  • Phase 1 gates (if any) cleared by Michael

Open Questions for Michael

  1. Storage backend: NFS from Sandpit, or local SSD with sync?
  2. Credentials: Accept plaintext .json on Z2 until Vaultwarden (840058), or implement temporary secure store?
  3. Backfill priority: Classify all 47K emails Phase 3, or defer to later release?
  4. Mail.Send gate: Do you want Phase 5 (send capability) investigated in parallel, or defer until Phase 1–4 stable?
  5. Timeline: Preferred order — mail poller first, then NUC ML research (#z2f2), then other infrastructure tasks?