Mail Poller — Z2 ct102 Email Archiver

Multi-account email polling service for Hinata Z2 infrastructure. Polls 4 email accounts (Gmail via IMAP, 3 Outlook accounts via Microsoft Graph API), archives messages incrementally, and maintains polling state.

The deployed script on ct102 is the canon. A read-only mirror lives in the sandpit repo at scripts/ct102-mail-poller.py (refreshed 2026-06-11). The older scripts/mail-poller.py in the same repo is the retired Mac-era multi-IMAP poller — a different product.

Accounts

| Account | Protocol | Credential Source |
| --- | --- | --- |
| gmail | IMAP (XOAUTH2 preferred, app-password fallback) | gmail_oauth_client.json + gmail_oauth_token.json (OAuth2); mail_imap_credential.json (fallback) |
| hotmail-michael-asolo | Microsoft Graph API | outlook-tokens-hotmail-michael-asolo.json |
| outlook-michael-nnamah | Microsoft Graph API | outlook-tokens-outlook-michael-nnamah.json |
| outlook-n-nnamah | Microsoft Graph API | outlook-tokens-outlook-n-nnamah.json |

Architecture

Directory Structure

On ct102 (mail poller):

/opt/hinata/mail-poller/
├── mail-poller.py              # Main script (XOAUTH2 patch 2026-06-11; --oauth-test mode 2026-06-14)
└── rehash-mac-import.py        # One-shot Mac archive importer (see how-to_rehash-mac-import)

/mail-archive/                  # Canonical archive — bind mount of /mnt/data/hinata/mail-archive (Z2 host)
├── gmail/{YYYY}/{MM}/{sha256[:16]}.json
├── hotmail-michael-asolo/{YYYY}/{MM}/{sha256[:16]}.json
├── outlook-michael-nnamah/{YYYY}/{MM}/{sha256[:16]}.json
├── outlook-n-nnamah/{YYYY}/{MM}/{sha256[:16]}.json
├── _state/
│   ├── state.json              # Live-poll cursor (last UID/datetime per account)
│   └── state-oauth-test.json   # OAuth stability test cursor (separate from live)
└── _journal/
    ├── live-poll.jsonl         # Append-only journal of live-poll writes
    ├── mac-import.jsonl        # Append-only journal of rehash-mac-import.py writes
    └── rehash-state.json       # Idempotency tracker for Mac archive move

Container + mount discipline: nothing on filesystem root. All state, cursors, journals, and archive content live under /mail-archive/ (bind of /mnt/data/hinata/mail-archive). The script directory /opt/hinata/mail-poller/ holds code only — no data, no state.

On ct102 (credentials — local):

/opt/itachi/credentials/
├── gmail_oauth_client.json     # hinata-brain OAuth web client (added 2026-06-11)
├── gmail_oauth_token.json      # Gmail refresh token (minted + live 2026-06-11)
├── mail_imap_credential.json   # Gmail app password (fallback)
├── outlook-graph-credentials.json
├── outlook-tokens-hotmail-michael-asolo.json
├── outlook-tokens-outlook-michael-nnamah.json
└── outlook-tokens-outlook-n-nnamah.json

The poller reads credentials from this ct102-local directory. CT103 has a same-named directory but it is empty (flat files deleted 2026-06-09) — Vaultwarden on CT103 holds the canonical copies; ct102's directory is the runtime cache.

State Format

state.json tracks incremental polling per account:

json
{
  "gmail": {
    "folders": {
      "INBOX": {"last_uid": 12345},
      "Archive": {"last_uid": 9876},
      ...
    },
    "last_poll": "2026-06-05T10:30:45.123456"
  },
  "hotmail-michael-asolo": {
    "last_received": "2026-06-05T10:30:45Z",
    "last_poll": "2026-06-05T10:30:45.123456"
  },
  ...
}
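
As a rough illustration of how these cursors could drive an incremental fetch, the sketch below turns a state dict into an IMAP UID range and a Graph `$filter` expression. The function names `next_uid_range` and `graph_filter` are hypothetical, and the assumption that the deployed poller queries this way is mine; only the state keys come from the format above.

```python
from typing import Optional


def next_uid_range(state: dict, account: str, folder: str) -> str:
    """Return an IMAP UID range covering messages newer than the stored cursor.

    A folder with no cursor yet starts from UID 1 (assumed default).
    """
    last_uid = (
        state.get(account, {})
        .get("folders", {})
        .get(folder, {})
        .get("last_uid", 0)
    )
    return f"{last_uid + 1}:*"


def graph_filter(state: dict, account: str) -> Optional[str]:
    """Return an OData filter for messages received after the cursor, if any."""
    last = state.get(account, {}).get("last_received")
    return f"receivedDateTime gt {last}" if last else None
```

Note that an IMAP search over `N:*` always returns at least the highest existing UID, even when it is below `N`, so a caller would still need to discard UIDs at or below the stored `last_uid`.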

Archive Message Format

Each archived email is a JSON file at /mail-archive/{account}/{YYYY}/{MM}/{message_hash}.json. The filename is the first 16 hex chars of the SHA-256 of the message ID — writes are idempotent, so state-cursor rewinds re-fetch without duplicating archive files:

json
{
  "account": "gmail",
  "email": "michael.asolo1@gmail.com",
  "message_id": "<unique-id@gmail.com>",
  "message_hash": "abcd1234ef567890",
  "date": "2026-06-05T10:30:45+00:00",
  "subject": "Subject line",
  "from": "sender@example.com",
  "to": "michael.asolo1@gmail.com",
  "body_text": "Plain text body...",
  "body_html": "<html>...</html>",
  "year_month": "2026/06"
}
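
The filename rule above can be sketched as follows. This is a minimal illustration, not the deployed code: it assumes the message ID is hashed as UTF-8 exactly as stored in `message_id` (angle brackets included) and that the year and month come from the record's `date` field. The helper names are hypothetical.

```python
import hashlib
from datetime import datetime
from pathlib import PurePosixPath

ARCHIVE_ROOT = PurePosixPath("/mail-archive")  # archive root from the directory layout


def message_hash(message_id: str) -> str:
    """First 16 hex chars of the SHA-256 of the message ID (UTF-8 assumed)."""
    return hashlib.sha256(message_id.encode("utf-8")).hexdigest()[:16]


def archive_path(account: str, message_id: str, date_iso: str) -> PurePosixPath:
    """Build {root}/{account}/{YYYY}/{MM}/{hash}.json from a record's fields."""
    when = datetime.fromisoformat(date_iso)
    filename = f"{message_hash(message_id)}.json"
    return ARCHIVE_ROOT / account / f"{when:%Y}" / f"{when:%m}" / filename
```

Because the path depends only on the account, the date, and the message ID, fetching the same message twice resolves to the same file, which is what makes cursor rewinds safe.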

Deployment

Prerequisites

  • Z2 Proxmox VE running, /mnt/data mounted (the archive bind depends on it)
  • ct102 (Debian 12, Python 3.11) running with credentials at ct102-local /opt/itachi/credentials/
  • SSH access to Z2: ssh hinata-z2 'pct exec 102 -- echo ok'

Deploy

deploy.sh and the code-asset copy of mail-poller.py were deleted 2026-06-11 (dead-script law). Deployment is now manual: pct push the script to ct102. Consumer repoint and JXA parity are tracked under 800143.

Usage

Automated (via systemd timer)

Once deployed, the timer runs mail-poller every 15 minutes automatically:

bash
# Check timer status
ssh hinata-z2 'pct exec 102 -- systemctl status hinata-mail-poller.timer'

# View timer schedule
ssh hinata-z2 'pct exec 102 -- systemctl list-timers hinata-mail-poller.timer'

# View recent runs
ssh hinata-z2 'pct exec 102 -- journalctl -u hinata-mail-poller.service -n 50'

# Stop timer (if needed)
ssh hinata-z2 'pct exec 102 -- systemctl stop hinata-mail-poller.timer'

Manual Execution

bash
# Single run (full execution)
ssh hinata-z2 'pct exec 102 -- /opt/hinata/mail-poller/mail-poller.py'

# Test mode (dry-run, no writes)
ssh hinata-z2 'pct exec 102 -- /opt/hinata/mail-poller/mail-poller.py --dry-run'

# Single account (debug)
ssh hinata-z2 'pct exec 102 -- /opt/hinata/mail-poller/mail-poller.py --account gmail'

# Verbose output
ssh hinata-z2 'pct exec 102 -- /opt/hinata/mail-poller/mail-poller.py --verbose'

Credentials

All paths below are ct102-local /opt/itachi/credentials/ (mode 600). Canonical copies live in Vaultwarden (CT103) under the same names without .json — see reference_itachi-credential-store.

Gmail (IMAP XOAUTH2 — primary, 2026-06-11)

Files: gmail_oauth_client.json (OAuth web client, GCP project hinata-brain) + gmail_oauth_token.json (refresh token, scope https://mail.google.com/).

At each run the poller exchanges the refresh token for an access token at https://oauth2.googleapis.com/token, then authenticates with AUTHENTICATE XOAUTH2. If either file is absent the poller silently falls back to app-password login. IMAP XOAUTH2 requires the full https://mail.google.com/ scope — gmail.readonly is valid only for the Gmail REST API.
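
The exchange and login described above can be sketched with stdlib only. This is an illustration under assumptions: the function names are hypothetical, and the client ID, client secret, and refresh token are passed in as plain arguments because the key layout of the two JSON files is not shown here. The token endpoint, the XOAUTH2 string format, and `imaplib`'s `authenticate()` call are standard.

```python
import urllib.parse
import urllib.request

TOKEN_URL = "https://oauth2.googleapis.com/token"


def build_refresh_request(client_id: str, client_secret: str,
                          refresh_token: str) -> urllib.request.Request:
    """Build (but do not send) the refresh-token exchange request."""
    body = urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
    }).encode("ascii")
    return urllib.request.Request(TOKEN_URL, data=body, method="POST")


def xoauth2_string(email: str, access_token: str) -> bytes:
    """SASL XOAUTH2 initial response: fields separated by Ctrl-A (0x01)."""
    return f"user={email}\x01auth=Bearer {access_token}\x01\x01".encode("ascii")


def authenticate(imap, email: str, access_token: str):
    """Log in on an imaplib.IMAP4_SSL connection.

    imaplib base64-encodes whatever the callback returns, so the raw
    string is passed through unencoded.
    """
    return imap.authenticate(
        "XOAUTH2", lambda _challenge: xoauth2_string(email, access_token)
    )
```

Sending the request with `urllib.request.urlopen()` returns a JSON body whose `access_token` field feeds `authenticate()`.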

The one-time token mint ran 2026-06-11 via scripts/request-gmail-access.py (redirect URI http://localhost:8765, consent app "In production", unverified-app warning accepted). First XOAUTH2 service connect: 2026-06-11 08:00 BST. Re-mint when the refresh token is revoked (a Google password change revokes Gmail-scoped tokens) or when token-watchdog alerts "Gmail OAuth refresh failed".

Auth state is watched daily by token-watchdog.py (Z2 host timer 08:30) — OAuth fallback to app password alerts via Telegram; see reference_itachi-credential-store § Token death watch.

Gmail (IMAP app password — fallback)

File: mail_imap_credential.json

json
{
  "email": "michael.asolo1@gmail.com",
  "password": "xxxx xxxx xxxx xxxx"
}

The password field holds the app-specific password generated in Gmail Security settings, not the account password.

Microsoft Graph API (Outlook)

App Registration: outlook-graph-credentials.json

json
{
  "client_id": "uuid",
  "client_secret": "secret",
  "tenant_id": "consumers"
}

OAuth2 Tokens: outlook-tokens-{account}.json

json
{
  "access_token": "token",
  "refresh_token": "token",
  "updated": "2026-06-05T10:30:45.123456"
}

Mail-poller automatically refreshes expired access tokens and writes updated tokens back to disk.
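
A minimal sketch of such a refresh-and-write-back cycle, using the two file formats above. The function names are hypothetical and the deployed script may differ; the token endpoint is the standard Microsoft identity platform v2.0 URL. The write goes through a temporary file so a crash mid-write cannot leave a truncated token file.

```python
import json
import os
import tempfile
import urllib.parse
import urllib.request
from datetime import datetime


def build_graph_refresh_request(app: dict, refresh_token: str) -> urllib.request.Request:
    """Build (but do not send) the refresh request from the app registration."""
    url = f"https://login.microsoftonline.com/{app['tenant_id']}/oauth2/v2.0/token"
    body = urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "client_id": app["client_id"],
        "client_secret": app["client_secret"],
        "refresh_token": refresh_token,
    }).encode("ascii")
    return urllib.request.Request(url, data=body, method="POST")


def save_tokens(path: str, access_token: str, refresh_token: str) -> None:
    """Atomically rewrite the token file with mode 600."""
    record = {
        "access_token": access_token,
        "refresh_token": refresh_token,
        "updated": datetime.now().isoformat(),
    }
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(record, handle, indent=2)
    os.chmod(tmp_path, 0o600)
    os.replace(tmp_path, path)  # atomic rename on the same filesystem
```

The token response may include a new `refresh_token`; when it does, that value should replace the old one in the saved file.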

Monitoring

Check Archive Growth

bash
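# Count messages per account (excludes _state/ and _journal/)
ssh hinata-z2 'pct exec 102 -- find /mail-archive -name "*.json" -not -path "/mail-archive/_*" | cut -d/ -f3 | sort | uniq -c'

# Total count (excludes _state/ and _journal/)
ssh hinata-z2 'pct exec 102 -- find /mail-archive -name "*.json" -not -path "/mail-archive/_*" | wc -l'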

# Archive size
ssh hinata-z2 'pct exec 102 -- du -sh /mail-archive'

View State Cursor

bash
ssh hinata-z2 'pct exec 102 -- cat /mail-archive/_state/state.json | python3 -m json.tool'

Check Recent Logs

bash
ssh hinata-z2 'pct exec 102 -- journalctl -u hinata-mail-poller.service --since "1 hour ago"'

Troubleshooting

Credentials Not Found

If mail-poller fails with "credential file not found":

  1. Verify ct102's local store (the directory the poller reads):

    bash
    ssh hinata-z2 'pct exec 102 -- ls -la /opt/itachi/credentials/'
  2. If a file is missing, restore it from Vaultwarden (canonical copies, same names without .json) via bw get notes [name] on the Z2 host, then pct push to ct102 with mode 600. CT103's /opt/itachi/credentials/ is empty and is not the source.

IMAP Connection Failed

  1. Inspect credential files keys-only — never print values:

    bash
    ssh hinata-z2 'pct exec 102 -- python3 -c "import json; print(sorted(json.load(open(\"/opt/itachi/credentials/gmail_oauth_token.json\")).keys()))"'
  2. Run the poller itself as the connection test (logs auth method and failures without printing secrets):

    bash
    ssh hinata-z2 'pct exec 102 -- /opt/hinata/mail-poller/mail-poller.py --dry-run --verbose --account gmail'
  3. XOAUTH2 failure plus app-password failure in the same run means both the refresh token and the app password are dead — re-mint via request-gmail-access.py (see Credentials section).

Graph API Token Expired

Mail-poller automatically refreshes Microsoft tokens before each poll. If refresh fails:

  1. Check token file freshness (keys + updated timestamp only):

    bash
    ssh hinata-z2 'pct exec 102 -- python3 -c "import json; d=json.load(open(\"/opt/itachi/credentials/outlook-tokens-hotmail-michael-asolo.json\")); print(sorted(d.keys()), d.get(\"updated\"))"'
  2. Manually refresh tokens by regenerating them on the Mac with the Python msal library, then pct push to ct102.

No New Messages Fetched

  1. Check state.json for last poll time:

    bash
    ssh hinata-z2 'pct exec 102 -- jq .gmail.last_poll /mail-archive/_state/state.json'
  2. Manually poll with verbose output:

    bash
    ssh hinata-z2 'pct exec 102 -- /opt/hinata/mail-poller/mail-poller.py --verbose --account gmail'
  3. Check server is not rate-limiting (Gmail: 429 errors, Outlook: 429/503).

Known Issues

  • [gmail] Skipping folder [Gmail]/Sent Mail: SELECT command error: BAD — folder names containing spaces are not quoted in the SELECT command; Sent Mail is never polled. Pre-existing, present in the deployed script as of 2026-06-11.
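
One possible fix, sketched here rather than taken from the deployed script, is to wrap mailbox names in an IMAP quoted string before passing them to `select()`. The helper name `quote_mailbox` is hypothetical.

```python
def quote_mailbox(name: str) -> str:
    """Wrap a mailbox name in an IMAP quoted string.

    Backslashes and double quotes are escaped, so names with spaces
    such as "[Gmail]/Sent Mail" travel as a single argument.
    """
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'
```

With an `imaplib.IMAP4_SSL` connection `imap`, the call would then be `imap.select(quote_mailbox("[Gmail]/Sent Mail"), readonly=True)`. Folder names with non-ASCII characters additionally need modified UTF-7 encoding, which this sketch does not cover.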

Incident Log

2026-06-11 — data-plane disk transport drop

  • 07:06:13 UTC: the Z2 data disk (WDC WD2000FYYZ, LABEL hinata-data) dropped off the SATA bus (transport errors, not media — SMART clean, 0 reallocated/pending sectors). ext4 journal aborted; /mnt/data unmounted on the host; container binds (including ct102 /mail-archive) went stale, archive writes failed with Errno 5.
  • Poller runs at 07:07 and 07:22 UTC advanced state cursors while archiving 0 messages.
  • Recovery: e2fsck -p (journal replay, clean) → remount → all 10 containers rebooted to refresh binds → state.json cursors rewound (gmail per-folder UIDs, Graph last_received) → verification runs archived 8/8 then 63/63. Rewind is safe because archive filenames are derived from the message-ID hash, so re-fetched messages land on the same files (idempotent writes).
  • Backup of pre-rewind state: ct102 /tmp/state.json.pre-rewind.
  • Watch item: tasks.json 800148 — recurrence means reseat SATA cable/power or replace the drive (62,600 power-on hours).
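
The cursor drift in this incident (cursors advancing while 0 messages were archived) suggests tying cursor movement to successful writes. The sketch below shows one way to do that; it is not a description of the deployed script, and `archive_batch` and its `write` callback are hypothetical names.

```python
from typing import Callable, Iterable, Tuple


def archive_batch(
    messages: Iterable[Tuple[int, dict]],
    write: Callable[[dict], None],
    last_uid: int,
) -> Tuple[int, int]:
    """Write (uid, record) pairs in UID order.

    Returns (new cursor, number written). The cursor only moves past a
    message once its write has succeeded, and processing stops at the
    first failure so no message is skipped.
    """
    written = 0
    for uid, record in sorted(messages, key=lambda pair: pair[0]):
        try:
            write(record)
        except OSError:
            # e.g. Errno 5 on a stale bind mount: keep the cursor at the
            # last message that was actually archived.
            break
        last_uid = uid
        written += 1
    return last_uid, written
```

With a failing `write`, the function returns the original cursor and a count of 0, so the next run retries the same messages instead of silently skipping them.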

Performance Notes

  • IMAP polling: Scales with the number of folders and the number of new messages since the last UID. Typical: 50–200ms per account.
  • Graph API polling: Single request per account. Typical: 200–500ms per account.
  • Total run time (all 4 accounts): 500ms–2s (depends on message volume and network latency).
  • Archive I/O: Writing 100 messages ≈ 50–100ms.
  • Memory usage: <50MB.

Development

Testing Locally (Mac)

Before deploying to Z2, test on your Mac:

bash
# Point CREDENTIALS_PATH at a local copy of the ct102 credential files
export CREDENTIALS_PATH=/tmp/mail-poller-test/credentials   # never inside the vault

# Create local state directory
mkdir -p /tmp/mail-poller-test/{archive,state,credentials}
export STATE_DIR=/tmp/mail-poller-test

# Run test against the repo mirror
python3 scripts/ct102-mail-poller.py --dry-run --verbose --account gmail

Modifying the Script

The script uses stdlib only (no pip packages). Key modules:

  • imaplib — IMAP client
  • email.parser — RFC2822 parsing
  • urllib — HTTP requests (for Graph API)
  • json — State and credential file I/O
  • hashlib — Message ID hashing

To add new accounts or functionality, edit:

  1. ACCOUNTS dict — add account definition
  2. poll_*() functions — add protocol-specific poller
  3. extract_*_fields() — add field extraction logic

Future Work

  • [x] OAuth stability test poll mode (--oauth-test --since YYYY-MM). 1y window since 2025-06-14. Cursor: _state/state-oauth-test.json. Retention is full history, append-only. CT102 holds one continuous corpus of Mac history + live polls under {account}/{YYYY}/{MM}/{sha256[:16]}.json. Mac local archive at ~/Sandpit/hinata/resources/email-poller/ rehashes directly into the canonical tree via how-to_rehash-mac-import. Procedure: how-to_mail-poller-consolidation-and-backpop.
  • [ ] Implement exponential backoff for rate limits (429/503)
  • [ ] Add message deduplication (check if message_id already archived)
  • [ ] Support other email providers (ProtonMail, Yahoo, etc.)
  • [ ] Encrypt state.json at rest (Vaultwarden on CT103 now available)
  • [ ] Add FastAPI endpoint for state + archive queries
  • [ ] Integration with classification pipeline (Heimerdinger/heim-nlp)
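
For the backoff item above, a minimal stdlib sketch might look like the following. It is not part of the current script: `with_backoff` and its parameters are hypothetical, and it only handles HTTP errors raised by `urllib` (the Graph API path), not IMAP throttling.

```python
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on HTTP 429/503, doubling the delay after each attempt.

    Any other error, or a rate-limit error on the final attempt, is
    re-raised to the caller.
    """
    for attempt in range(retries + 1):
        try:
            return call()
        except urllib.error.HTTPError as err:
            if err.code not in (429, 503) or attempt == retries:
                raise
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise RuntimeError("unreachable: loop either returns or raises")
```

The `sleep` parameter is injectable so the retry schedule can be checked without waiting. A fuller version would also honor a `Retry-After` response header when the server sends one.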

License

Part of Hinata infrastructure. See parent LICENSE.