Mail Poller — Z2 ct102 Email Archiver
Multi-account email polling service for Hinata Z2 infrastructure. Polls 4 email accounts (Gmail via IMAP, 3 Outlook accounts via Microsoft Graph API), archives messages incrementally, and maintains polling state.
The deployed script on ct102 is canonical. A read-only mirror lives in the sandpit repo at scripts/ct102-mail-poller.py (refreshed 2026-06-11). The older scripts/mail-poller.py in the same repo is the retired Mac-era multi-IMAP poller, which is a different product.
Accounts
| Account | Protocol | Credential Source |
|---|---|---|
| gmail | IMAP — XOAUTH2 preferred, app-password fallback | gmail_oauth_client.json + gmail_oauth_token.json (OAuth2); mail_imap_credential.json (fallback) |
| hotmail-michael-asolo | Microsoft Graph API | outlook-tokens-hotmail-michael-asolo.json |
| outlook-michael-nnamah | Microsoft Graph API | outlook-tokens-outlook-michael-nnamah.json |
| outlook-n-nnamah | Microsoft Graph API | outlook-tokens-outlook-n-nnamah.json |
Architecture
Directory Structure
On ct102 (mail poller):
/opt/hinata/mail-poller/
├── mail-poller.py # Main script (XOAUTH2 patch 2026-06-11; --oauth-test mode 2026-06-14)
└── rehash-mac-import.py # One-shot Mac archive importer (see how-to_rehash-mac-import)
/mail-archive/ # Canonical archive — bind mount of /mnt/data/hinata/mail-archive (Z2 host)
├── gmail/{YYYY}/{MM}/{sha256[:16]}.json
├── hotmail-michael-asolo/{YYYY}/{MM}/{sha256[:16]}.json
├── outlook-michael-nnamah/{YYYY}/{MM}/{sha256[:16]}.json
├── outlook-n-nnamah/{YYYY}/{MM}/{sha256[:16]}.json
├── _state/
│ ├── state.json # Live-poll cursor (last UID/datetime per account)
│ └── state-oauth-test.json # OAuth stability test cursor (separate from live)
└── _journal/
  ├── live-poll.jsonl # Append-only journal of live-poll writes
  ├── mac-import.jsonl # Append-only journal of rehash-mac-import.py writes
  └── rehash-state.json # Idempotency tracker for Mac archive move
Container + mount discipline: nothing on filesystem root. All state, cursors, journals, and archive content live under /mail-archive/ (bind of /mnt/data/hinata/mail-archive). The script directory /opt/hinata/mail-poller/ holds code only: no data, no state.
On ct102 (credentials — local):
/opt/itachi/credentials/
├── gmail_oauth_client.json # hinata-brain OAuth web client (added 2026-06-11)
├── gmail_oauth_token.json # Gmail refresh token (minted + live 2026-06-11)
├── mail_imap_credential.json # Gmail app password (fallback)
├── outlook-graph-credentials.json
├── outlook-tokens-hotmail-michael-asolo.json
├── outlook-tokens-outlook-michael-nnamah.json
└── outlook-tokens-outlook-n-nnamah.json
The poller reads credentials from this ct102-local directory. CT103 has a same-named directory but it is empty (flat files deleted 2026-06-09). Vaultwarden on CT103 holds the canonical copies; ct102's directory is the runtime cache.
State Format
state.json tracks incremental polling per account:
json
{
  "gmail": {
    "folders": {
      "INBOX": {"last_uid": 12345},
      "Archive": {"last_uid": 9876},
      ...
    },
    "last_poll": "2026-06-05T10:30:45.123456"
  },
  "hotmail-michael-asolo": {
    "last_received": "2026-06-05T10:30:45Z",
    "last_poll": "2026-06-05T10:30:45.123456"
  },
  ...
}
Archive Message Format
Each archived email is a JSON file at /mail-archive/{account}/{YYYY}/{MM}/{message_hash}.json. The filename is the first 16 hex chars of the SHA-256 of the message ID — writes are idempotent, so state-cursor rewinds re-fetch without duplicating archive files:
json
{
  "account": "gmail",
  "email": "michael.asolo1@gmail.com",
  "message_id": "<unique-id@gmail.com>",
  "message_hash": "0123456789abcdef",
  "date": "2026-06-05T10:30:45+00:00",
  "subject": "Subject line",
  "from": "sender@example.com",
  "to": "michael.asolo1@gmail.com",
  "body_text": "Plain text body...",
  "body_html": "<html>...</html>",
  "year_month": "2026/06"
}
Deployment
Prerequisites
- Z2 Proxmox VE running, with /mnt/data mounted (the archive bind depends on it)
- ct102 (Debian 12, Python 3.11) running, with credentials at ct102-local /opt/itachi/credentials/
- SSH access to Z2: ssh hinata-z2 'pct exec 102 -- echo ok'
Deploy
deploy.sh and the code-asset copy of mail-poller.py were deleted 2026-06-11 (dead-script law). Deployment is manual, via pct push to ct102. Consumer repoint and JXA parity are tracked under 800143.
Usage
Automated (via systemd timer)
Once deployed, the timer runs mail-poller every 15 minutes automatically:
bash
# Check timer status
ssh hinata-z2 'pct exec 102 -- systemctl status hinata-mail-poller.timer'
# View timer schedule
ssh hinata-z2 'pct exec 102 -- systemctl list-timers hinata-mail-poller.timer'
# View recent runs
ssh hinata-z2 'pct exec 102 -- journalctl -u hinata-mail-poller.service -n 50'
# Stop timer (if needed)
ssh hinata-z2 'pct exec 102 -- systemctl stop hinata-mail-poller.timer'
Manual Execution
bash
# Single run (full execution)
ssh hinata-z2 'pct exec 102 -- /opt/hinata/mail-poller/mail-poller.py'
# Test mode (dry-run, no writes)
ssh hinata-z2 'pct exec 102 -- /opt/hinata/mail-poller/mail-poller.py --dry-run'
# Single account (debug)
ssh hinata-z2 'pct exec 102 -- /opt/hinata/mail-poller/mail-poller.py --account gmail'
# Verbose output
ssh hinata-z2 'pct exec 102 -- /opt/hinata/mail-poller/mail-poller.py --verbose'
Credentials
All paths below are ct102-local /opt/itachi/credentials/ (mode 600). Canonical copies live in Vaultwarden (CT103) under the same names without .json — see reference_itachi-credential-store.
Gmail (IMAP XOAUTH2 — primary, 2026-06-11)
Files: gmail_oauth_client.json (OAuth web client, GCP project hinata-brain) + gmail_oauth_token.json (refresh token, scope https://mail.google.com/).
At each run the poller exchanges the refresh token for an access token at https://oauth2.googleapis.com/token, then authenticates with AUTHENTICATE XOAUTH2. If either file is absent the poller silently falls back to app-password login. IMAP XOAUTH2 requires the full https://mail.google.com/ scope — gmail.readonly is valid only for the Gmail REST API.
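The token exchange and XOAUTH2 step described above can be sketched roughly as follows. This is an illustration under assumptions, not an excerpt from the deployed script: the function names are hypothetical, and the caller is assumed to have already read the client ID, client secret, and refresh token out of the two JSON files (their key names are not documented here). Only the token endpoint and the AUTHENTICATE XOAUTH2 mechanism come from the text above.

```python
import imaplib
import urllib.parse
import urllib.request

TOKEN_URL = "https://oauth2.googleapis.com/token"  # endpoint named in this section


def build_refresh_request(client_id, client_secret, refresh_token):
    """Build the POST that exchanges a refresh token for an access token."""
    body = urllib.parse.urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
        "grant_type": "refresh_token",
    }).encode("ascii")
    return urllib.request.Request(TOKEN_URL, data=body, method="POST")


def xoauth2_string(user, access_token):
    """SASL XOAUTH2 initial client response; fields are separated by Ctrl-A."""
    return f"user={user}\x01auth=Bearer {access_token}\x01\x01"


def gmail_connect(user, access_token):
    """Hypothetical helper: open an IMAP connection and authenticate with XOAUTH2."""
    conn = imaplib.IMAP4_SSL("imap.gmail.com")
    # imaplib base64-encodes whatever bytes the callback returns.
    conn.authenticate(
        "XOAUTH2",
        lambda _challenge: xoauth2_string(user, access_token).encode("utf-8"),
    )
    return conn
```

The JSON response from the token endpoint carries the short-lived access token that is then passed to the IMAP step. How the deployed script handles errors at either step is not shown here.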
The one-time token mint ran 2026-06-11 (scripts/request-gmail-access.py; redirect URI http://localhost:8765, consent app "In production", unverified-app warning accepted). First XOAUTH2 service connect: 2026-06-11 08:00 BST. Re-mint when the refresh token is revoked (a Google password change revokes Gmail-scoped tokens) or when token-watchdog reports that the Gmail OAuth refresh failed.
Auth state is checked daily by token-watchdog.py (Z2 host timer, 08:30). A fallback from OAuth to the app password triggers a Telegram alert; see reference_itachi-credential-store § Token death watch.
Gmail (IMAP app password — fallback)
File: mail_imap_credential.json
json
{
  "email": "michael.asolo1@gmail.com",
  "password": "xxxx xxxx xxxx xxxx"
}
The password field is the app-specific password generated in Gmail Security settings. Not the account password.
Microsoft Graph API (Outlook)
App Registration: outlook-graph-credentials.json
json
{
  "client_id": "uuid",
  "client_secret": "secret",
  "tenant_id": "consumers"
}
OAuth2 Tokens: outlook-tokens-{account}.json
json
{
  "access_token": "token",
  "refresh_token": "token",
  "updated": "2026-06-05T10:30:45.123456"
}
Mail-poller automatically refreshes expired access tokens and writes updated tokens back to disk.
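A minimal sketch of that refresh-and-write-back cycle is below. It assumes the standard Microsoft identity platform v2.0 token endpoint for the tenant in outlook-graph-credentials.json. The function names are hypothetical, and the atomic write (temp file plus rename) is a design choice of this sketch; the source does not say how the deployed script writes the file.

```python
import json
import os
import tempfile
import urllib.parse
import urllib.request
from datetime import datetime


def build_graph_refresh_request(app, refresh_token):
    """Build the token-endpoint POST; `app` mirrors outlook-graph-credentials.json."""
    url = f"https://login.microsoftonline.com/{app['tenant_id']}/oauth2/v2.0/token"
    body = urllib.parse.urlencode({
        "client_id": app["client_id"],
        "client_secret": app["client_secret"],
        "refresh_token": refresh_token,
        "grant_type": "refresh_token",
    }).encode("ascii")
    return urllib.request.Request(url, data=body, method="POST")


def save_tokens(path, access_token, refresh_token):
    """Write the token file atomically so a crash never leaves it half-written."""
    record = {
        "access_token": access_token,
        "refresh_token": refresh_token,
        "updated": datetime.now().isoformat(),
    }
    # mkstemp creates the temp file readable and writable by the owner only.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as fh:
        json.dump(record, fh)
    os.replace(tmp, path)  # atomic rename on the same filesystem
    return record
```

Microsoft may return a new refresh token with each refresh, so the sketch always stores whichever refresh token the response carried rather than reusing the old one.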
Monitoring
Check Archive Growth
bash
# Count messages per account (excludes the _state/ and _journal/ bookkeeping files)
ssh hinata-z2 'pct exec 102 -- find /mail-archive -name "*.json" -not -path "/mail-archive/_*" | cut -d/ -f3 | sort | uniq -c'
# Total count
ssh hinata-z2 'pct exec 102 -- find /mail-archive -name "*.json" -not -path "/mail-archive/_*" | wc -l'
# Archive size
ssh hinata-z2 'pct exec 102 -- du -sh /mail-archive'
View State Cursor
bash
ssh hinata-z2 'pct exec 102 -- cat /mail-archive/_state/state.json | python3 -m json.tool'
Check Recent Logs
bash
ssh hinata-z2 'pct exec 102 -- journalctl -u hinata-mail-poller.service --since "1 hour ago"'
Troubleshooting
Credentials Not Found
If mail-poller fails with "credential file not found":
Verify ct102's local store (the directory the poller reads):
bash
ssh hinata-z2 'pct exec 102 -- ls -la /opt/itachi/credentials/'
If a file is missing, restore it from Vaultwarden (canonical copies, same names without .json) via bw get notes [name] on the Z2 host, then pct push to ct102 with mode 600. CT103's /opt/itachi/credentials/ is empty and is not the source.
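The presence and mode check above could also be scripted. The sketch below is a hypothetical helper, not part of the deployed poller. It takes the file names from the credentials listing earlier in this README and inspects only file metadata, never file contents.

```python
import os
import stat

CREDENTIALS_DIR = "/opt/itachi/credentials"
EXPECTED_FILES = [
    "gmail_oauth_client.json",
    "gmail_oauth_token.json",
    "mail_imap_credential.json",
    "outlook-graph-credentials.json",
    "outlook-tokens-hotmail-michael-asolo.json",
    "outlook-tokens-outlook-michael-nnamah.json",
    "outlook-tokens-outlook-n-nnamah.json",
]


def check_credentials(directory=CREDENTIALS_DIR, expected=EXPECTED_FILES):
    """Return (filename, problem) pairs for missing files or modes other than 600."""
    problems = []
    for name in expected:
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            problems.append((name, "missing"))
            continue
        mode = stat.S_IMODE(os.stat(path).st_mode)
        if mode != 0o600:
            problems.append((name, f"mode {mode:o}, expected 600"))
    return problems


if __name__ == "__main__":
    for name, problem in check_credentials():
        print(f"{name}: {problem}")
```

An empty result means every expected file exists with mode 600. It says nothing about whether the tokens inside are still valid.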
IMAP Connection Failed
Inspect credential files keys-only — never print values:
bash
ssh hinata-z2 'pct exec 102 -- python3 -c "import json; print(sorted(json.load(open(\"/opt/itachi/credentials/gmail_oauth_token.json\")).keys()))"'
Run the poller itself as the connection test (it logs the auth method and failures without printing secrets):
bash
ssh hinata-z2 'pct exec 102 -- /opt/hinata/mail-poller/mail-poller.py --dry-run --verbose --account gmail'
XOAUTH2 failure plus app-password failure in the same run means both the refresh token and the app password are dead. Re-mint the refresh token via request-gmail-access.py (see Credentials section).
Graph API Token Expired
Mail-poller automatically refreshes Microsoft tokens before each poll. If refresh fails:
Check token file freshness (keys + updated timestamp only):
bash
ssh hinata-z2 'pct exec 102 -- python3 -c "import json; d=json.load(open(\"/opt/itachi/credentials/outlook-tokens-hotmail-michael-asolo.json\")); print(sorted(d.keys()), d.get(\"updated\"))"'
Manually refresh tokens by regenerating them on the Mac with the Python msal library, then pct push to ct102.
No New Messages Fetched
Check state.json for last poll time:
bash
ssh hinata-z2 'pct exec 102 -- jq .gmail.last_poll /mail-archive/_state/state.json'
Manually poll with verbose output:
bash
ssh hinata-z2 'pct exec 102 -- /opt/hinata/mail-poller/mail-poller.py --verbose --account gmail'
Check that the server is not rate-limiting (Gmail: 429 errors, Outlook: 429/503).
Known Issues
[gmail] Skipping folder [Gmail]/Sent Mail: SELECT command error: BAD. Folder names containing spaces are not quoted in the SELECT command, so Sent Mail is never polled. Pre-existing; present in the deployed script as of 2026-06-11.
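One possible fix is to quote mailbox names before passing them to SELECT, since imaplib sends the name as given. The helper below is a hypothetical sketch. How the deployed script builds its SELECT call is not shown in this README, so treat the call site in the usage note as an assumption.

```python
def quote_mailbox(name):
    """Wrap an IMAP mailbox name in double quotes, escaping backslashes and quotes."""
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'
```

With an authenticated imaplib connection `conn`, the call would then look like `conn.select(quote_mailbox("[Gmail]/Sent Mail"), readonly=True)`. Folder names with non-ASCII characters additionally need IMAP modified UTF-7 encoding, which this sketch does not cover.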
Incident Log
2026-06-11 — data-plane disk transport drop
- 07:06:13 UTC: the Z2 data disk (WDC WD2000FYYZ, LABEL hinata-data) dropped off the SATA bus (transport errors, not media: SMART clean, 0 reallocated/pending sectors). The ext4 journal aborted; /mnt/data unmounted on the host; container binds (including ct102 /mail-archive) went stale, and archive writes failed with Errno 5.
- Poller runs at 07:07 and 07:22 UTC advanced state cursors while archiving 0 messages.
- Recovery: e2fsck -p (journal replay, clean) → remount → all 10 containers rebooted to refresh binds → state.json cursors rewound (gmail per-folder UIDs, Graph last_received) → verification runs archived 8/8 then 63/63. Rewind is safe because archive filenames are hashes of the message ID (idempotent writes).
- Backup of pre-rewind state: ct102 /tmp/state.json.pre-rewind.
- Watch item: tasks.json 800148. Recurrence means reseat SATA cable/power or replace the drive (62,600 power-on hours).
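The rewind in the recovery step relies on archive writes being idempotent. A minimal sketch of that property is below, assuming the filename is the first 16 hex characters of the SHA-256 of the UTF-8 encoded message ID. The function names, the exact encoding, and the skip-if-exists behaviour are assumptions of this sketch; the deployed script may overwrite instead of skipping.

```python
import hashlib
import json
import os


def archive_path(root, account, message_id, year_month):
    """Build {root}/{account}/{YYYY}/{MM}/{sha256[:16]}.json for a message."""
    digest = hashlib.sha256(message_id.encode("utf-8")).hexdigest()[:16]
    year, month = year_month.split("/")
    return os.path.join(root, account, year, month, f"{digest}.json")


def archive_once(root, record):
    """Write the record unless its file already exists; return True if written."""
    path = archive_path(
        root, record["account"], record["message_id"], record["year_month"]
    )
    if os.path.exists(path):
        return False  # re-fetched after a cursor rewind: nothing to do
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(record, fh)
    return True
```

Because the path depends only on the account, the month, and the message ID, fetching the same message a second time maps to the same file, so a rewound cursor cannot create duplicates.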
Performance Notes
- IMAP polling: Scales with the number of folders and the number of new messages since the last stored UID. Typical: 50–200ms per account.
- Graph API polling: Single request per account. Typical: 200–500ms per account.
- Total run time (all 4 accounts): 500ms–2s (depends on message volume and network latency).
- Archive I/O: Writing 100 messages ≈ 50–100ms.
- Memory usage: <50MB.
Development
Testing Locally (Mac)
Before deploying to Z2, test on your Mac:
bash
# Point CREDENTIALS_PATH at a local copy of the ct102 credential files
export CREDENTIALS_PATH=/tmp/mail-poller-test/credentials # never inside the vault
# Create local state directory
mkdir -p /tmp/mail-poller-test/{archive,state,credentials}
export STATE_DIR=/tmp/mail-poller-test
# Run test against the repo mirror
python3 scripts/ct102-mail-poller.py --dry-run --verbose --account gmail
Modifying the Script
The script uses stdlib only (no pip packages). Key modules:
- imaplib: IMAP client
- email.parser: RFC 2822 parsing
- urllib: HTTP requests (for Graph API)
- json: State and credential file I/O
- hashlib: Message ID hashing
To add new accounts or functionality, edit:
- ACCOUNTS dict: add account definition
- poll_*() functions: add protocol-specific poller
- extract_*_fields(): add field extraction logic
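As a rough illustration of how those three edit points could fit together, here is a sketch with invented shapes. The real ACCOUNTS dict, its keys, and the poll_*() signatures are not shown in this README, so every field name and function body below is hypothetical.

```python
# Hypothetical shapes: the real ACCOUNTS dict and poller signatures may differ.
ACCOUNTS = {
    "gmail": {
        "protocol": "imap",
        "credentials": "gmail_oauth_token.json",
    },
    "hotmail-michael-asolo": {
        "protocol": "graph",
        "credentials": "outlook-tokens-hotmail-michael-asolo.json",
    },
}


def poll_imap(name, account):
    """Placeholder for the protocol-specific IMAP poller."""
    return f"{name}: polled via IMAP using {account['credentials']}"


def poll_graph(name, account):
    """Placeholder for the protocol-specific Graph API poller."""
    return f"{name}: polled via Graph using {account['credentials']}"


# Dispatch table: supporting a new protocol means adding one entry here.
POLLERS = {"imap": poll_imap, "graph": poll_graph}


def poll_account(name):
    """Look up the account definition and route it to the matching poller."""
    account = ACCOUNTS[name]
    protocol = account["protocol"]
    if protocol not in POLLERS:
        raise ValueError(f"no poller registered for protocol {protocol!r}")
    return POLLERS[protocol](name, account)
```

Under this layout, adding an account on an existing protocol is a single new ACCOUNTS entry, while a new provider also needs its own poller and field-extraction function.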
Future Work
- [x] OAuth stability test poll mode (--oauth-test --since YYYY-MM). 1y window since 2025-06-14. Cursor: _state/state-oauth-test.json. Retention is full history, append-only. CT102 holds one continuous corpus of Mac history + live polls under {account}/{YYYY}/{MM}/{sha256[:16]}.json. Mac local archive at ~/Sandpit/hinata/resources/email-poller/ rehashes directly into the canonical tree via how-to_rehash-mac-import. Procedure: how-to_mail-poller-consolidation-and-backpop.
- [ ] Implement exponential backoff for rate limits (429/503)
- [ ] Add message deduplication (check if message_id already archived)
- [ ] Support other email providers (ProtonMail, Yahoo, etc.)
- [ ] Encrypt state.json at rest (Vaultwarden on CT103 now available)
- [ ] Add FastAPI endpoint for state + archive queries
- [ ] Integration with classification pipeline (Heimerdinger/heim-nlp)
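For the exponential backoff item, one stdlib-only approach is sketched below. It is a proposal rather than existing poller code: the function names, retry count, base delay, cap, and jitter factor are all assumptions to be tuned.

```python
import random
import time
import urllib.error
import urllib.request

RETRYABLE = {429, 503}


def backoff_delays(retries=5, base=1.0, cap=60.0):
    """Yield capped exponential delays: base, 2*base, 4*base, ..."""
    for attempt in range(retries):
        yield min(cap, base * (2 ** attempt))


def fetch_with_backoff(request, opener=urllib.request.urlopen,
                       sleep=time.sleep, retries=5):
    """Call opener(request), retrying on 429/503 with backoff and jitter."""
    for delay in backoff_delays(retries):
        try:
            return opener(request)
        except urllib.error.HTTPError as err:
            if err.code not in RETRYABLE:
                raise
            # Honour a numeric Retry-After header when the server sends one.
            retry_after = err.headers.get("Retry-After") if err.headers else None
            wait = float(retry_after) if retry_after and retry_after.isdigit() else delay
            sleep(wait + random.uniform(0, 0.1 * wait))
    return opener(request)  # final attempt; any error propagates to the caller
```

The `opener` and `sleep` parameters exist so the retry logic can be exercised without network access or real waiting. With a 15-minute timer, a small retry budget is probably enough, since the next scheduled run retries anyway.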
License
Part of Hinata infrastructure. See parent LICENSE.