Mail Poller Z2 — Complete Implementation Index

Quick Start

  1. First time? Read README.md for overview
  2. Setting up credentials? Follow INSTALLATION.md
  3. Deploying to Z2? Run ./deploy.sh
  4. Testing locally? Use LOCAL_TESTING.md
  5. Verifying after deploy? Check DEPLOYMENT_CHECKLIST.md

Files in This Directory

Executable Scripts

| File | Purpose | Usage |
|------|---------|-------|
| mail-poller.py | Main polling script | python3 mail-poller.py [--account NAME] [--dry-run] [--verbose] |
| deploy.sh | Deployment to ct102 | ./deploy.sh [--dry-run] [--test-only] [--verbose] |
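
The usage line for mail-poller.py can be illustrated with `argparse` from the standard library. This is a minimal sketch, not the script's actual parser: the flag names come from the table above, while `build_parser`, the help texts, and the "all accounts by default" behaviour are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(
        prog="mail-poller.py",
        description="Poll mail accounts and archive new messages.",
    )
    # Assumption: omitting --account polls every configured account.
    parser.add_argument("--account", metavar="NAME", default=None,
                        help="poll only the named account")
    parser.add_argument("--dry-run", action="store_true",
                        help="fetch messages but do not write state or archive")
    parser.add_argument("--verbose", action="store_true",
                        help="enable debug-level logging")
    return parser


# Parse an explicit argument list instead of sys.argv for illustration.
args = build_parser().parse_args(["--account", "gmail", "--dry-run"])
print(args.account, args.dry_run, args.verbose)
```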

Documentation

| File | Audience | Content |
|------|----------|---------|
| README.md | Everyone | Overview, architecture, usage, monitoring, troubleshooting |
| INSTALLATION.md | DevOps/Setup | Step-by-step credential setup + Z2 deployment |
| LOCAL_TESTING.md | Developers | Testing mail-poller locally on Mac before Z2 deploy |
| ARCHITECTURE.md | Engineers | Deep dive: system design, protocols, error handling, scaling |
| DEPLOYMENT_CHECKLIST.md | QA/Verification | Pre/during/post-deployment verification steps (Phase 1–3) |
| INDEX.md | Navigator | This file |

Configuration

| File | Purpose | Format |
|------|---------|--------|
| .gitignore | Prevent credential commits | Standard .gitignore |
| test-credentials-TEMPLATE.json | Credential structure reference | JSON template + comments |

File Relationships

mail-poller.py
    ↑ (deployed by)
deploy.sh
    ↑ (uses info from)
INSTALLATION.md, DEPLOYMENT_CHECKLIST.md
    ↓ (references)
README.md, ARCHITECTURE.md

LOCAL_TESTING.md
    ↓ (tests before deploy)
mail-poller.py

Workflows

Setup From Scratch

1. Read README.md (overview)
2. Follow INSTALLATION.md Step 1 (prepare credentials on ct103)
3. Follow INSTALLATION.md Step 2 (run deploy.sh)
4. Use DEPLOYMENT_CHECKLIST.md Phase 2 (verify deployment)

Development/Testing

1. Modify mail-poller.py locally
2. Follow LOCAL_TESTING.md (test on Mac)
3. Run deploy.sh --dry-run (preview Z2 changes)
4. Run deploy.sh (deploy to Z2)
5. Manual test: ssh z2 'pct exec 102 -- /opt/hinata/mail-poller/mail-poller.py --verbose'

Troubleshooting

1. Check README.md § Troubleshooting
2. Review ARCHITECTURE.md § Error Handling
3. Check logs: ssh z2 'pct exec 102 -- journalctl -u hinata-mail-poller.service'
4. Manual test: ssh z2 'pct exec 102 -- /opt/hinata/mail-poller/mail-poller.py --verbose --account ACCOUNT'

Migration Monitoring (Phase 3)

1. Deploy to Z2 (Phase 2)
2. Monitor for 7–14 days using DEPLOYMENT_CHECKLIST.md Phase 3
3. Disable Mac poller
4. Verify Z2 continues stable for another 7 days

Key Concepts

Accounts

4 email accounts across 2 protocols:

| Account | Protocol | Credential Files |
|---------|----------|------------------|
| gmail | IMAP | mail_imap_credential.json |
| hotmail-michael-asolo | Graph API | outlook-graph-credentials.json + outlook-tokens-hotmail-michael-asolo.json |
| outlook-michael-nnamah | Graph API | outlook-graph-credentials.json + outlook-tokens-outlook-michael-nnamah.json |
| outlook-n-nnamah | Graph API | outlook-graph-credentials.json + outlook-tokens-outlook-n-nnamah.json |

State Tracking

  • Gmail (IMAP): Track last UID per folder → only fetch newer
  • Outlook (Graph API): Track last received datetime → only fetch newer
  • Persistence: state.json on ct102 at /opt/hinata/mail-poller/state.json
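
The incremental-fetch idea above can be sketched as follows. This is a hedged illustration, not the code in mail-poller.py: all function names and the atomic-write approach are assumptions. The IMAP range and the Graph `$filter` expression follow the respective protocols.

```python
import json
import os
import tempfile
from pathlib import Path


def load_state(path: Path) -> dict:
    """Return the saved state, or an empty dict on the first run."""
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def save_state(path: Path, state: dict) -> None:
    """Write to a temp file, then rename, so a crash cannot truncate state.json."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2, sort_keys=True)
    os.replace(tmp_name, path)


def imap_search_criterion(last_uid: int) -> str:
    """UID range for an IMAP 'UID SEARCH' covering messages after last_uid."""
    return f"UID {last_uid + 1}:*"


def newer_uids(uids: list, last_uid: int) -> list:
    """Filter client-side: 'n:*' always matches the highest UID, even if below n."""
    return sorted(uid for uid in uids if uid > last_uid)


def graph_filter(last_received: str) -> str:
    """OData $filter for Graph messages received after an ISO 8601 timestamp."""
    return f"receivedDateTime gt {last_received}"
```

With `imaplib`, the criterion would typically be passed as `conn.uid("SEARCH", None, imap_search_criterion(last_uid))`, and the returned UIDs passed through `newer_uids` before fetching.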

Archive

  • Location: /opt/hinata/mail-poller/archive/
  • Structure: {account}/{YYYY}/{MM}/{message_hash}.json
  • Content: Email metadata + body (text + HTML)
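
The archive layout above maps directly onto `pathlib`. The sketch below assumes the hash is the SHA-256 of the Message-ID and that records carry a `message_id` key. Neither detail is confirmed here, and the function names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

ARCHIVE_ROOT = Path("/opt/hinata/mail-poller/archive")


def message_hash(message_id: str) -> str:
    """Assumption: the file name is the SHA-256 hex digest of the Message-ID."""
    return hashlib.sha256(message_id.encode("utf-8")).hexdigest()


def archive_path(root: Path, account: str, received: datetime, message_id: str) -> Path:
    """Build {account}/{YYYY}/{MM}/{message_hash}.json under root."""
    return (root / account / f"{received:%Y}" / f"{received:%m}"
            / f"{message_hash(message_id)}.json")


def archive_message(root: Path, account: str, received: datetime, message: dict) -> Path:
    """Write one message record as JSON and return the path written."""
    path = archive_path(root, account, received, message["message_id"])
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(message, ensure_ascii=False, indent=2), encoding="utf-8")
    return path
```

Hashing the Message-ID keeps file names filesystem-safe and makes a re-archived message overwrite its earlier copy instead of creating a second file.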

Automation

  • Trigger: systemd timer (every 15 minutes)
  • Service: hinata-mail-poller.service
  • Timer: hinata-mail-poller.timer
  • Logs: journalctl (searchable by service name)

Development Notes

Stdlib Only

The script uses only the Python standard library; no pip packages are required.

Core modules:

  • imaplib — IMAP client
  • email.parser — RFC2822 parsing
  • urllib — HTTP requests (Graph API)
  • json — State/credential I/O
  • hashlib — Message hashing
  • logging — Structured logging
  • pathlib — Path operations
  • datetime — ISO8601 handling

Extensibility Points

To add new features:

New email account:

  1. Edit ACCOUNTS dict (add account definition)
  2. Implement poll_*() function (IMAP or Graph)
  3. Implement extract_*_fields() (field mapping)
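
As an illustration of step 3, a field-extraction function for IMAP messages might look like the sketch below. The name `extract_imap_fields` and the output keys are hypothetical; only the `extract_*_fields()` naming pattern comes from the steps above.

```python
from email import policy
from email.parser import BytesParser


def _header(msg, name):
    """Return a header as a plain str, or None when it is absent."""
    value = msg[name]
    return str(value) if value is not None else None


def extract_imap_fields(raw: bytes) -> dict:
    """Map a raw RFC 5322 message to a flat dict (key names are assumptions)."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    # get_body() returns None when no part of the requested type exists.
    text_part = msg.get_body(preferencelist=("plain",))
    html_part = msg.get_body(preferencelist=("html",))
    return {
        "message_id": _header(msg, "Message-ID"),
        "from": _header(msg, "From"),
        "subject": _header(msg, "Subject"),
        "date": _header(msg, "Date"),
        "body_text": text_part.get_content() if text_part else None,
        "body_html": html_part.get_content() if html_part else None,
    }
```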

New protocol (ProtonMail, Yahoo, etc.):

  1. Add protocol handler in ACCOUNTS
  2. Implement poller function
  3. Update field extraction

Caching/deduplication:

  1. Load archive index on startup
  2. Check if message_id exists before archiving
  3. Skip duplicates with debug log
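
The three deduplication steps above can be sketched as follows. This assumes each archived JSON record stores its Message-ID under a `message_id` key, which is an illustrative choice. The function names are hypothetical.

```python
import json
import logging
from pathlib import Path

log = logging.getLogger("mail-poller")


def load_archive_index(root: Path) -> set:
    """Step 1: collect the message_id of every record under {account}/{YYYY}/{MM}/."""
    index = set()
    for path in root.glob("*/*/*/*.json"):
        try:
            record = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            log.warning("Skipping unreadable archive file: %s", path)
            continue
        if isinstance(record, dict) and record.get("message_id"):
            index.add(record["message_id"])
    return index


def should_archive(message_id: str, index: set) -> bool:
    """Steps 2 and 3: return False for known messages, otherwise record and accept."""
    if message_id in index:
        log.debug("Duplicate message skipped: %s", message_id)
        return False
    index.add(message_id)
    return True
```

Reading every file on startup is linear in archive size. For a large archive, a persisted index or the database backend described next would likely scale better.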

Database backend:

  1. Replace state.json with SQLite schema
  2. Replace archive/ filesystem with SQL queries
  3. Add indices for fast lookups
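
One possible shape for such a backend, using the standard-library `sqlite3` module, is sketched below. The table names, column names, and helper functions are all assumptions made for illustration. No schema has been defined for v2.0.

```python
import sqlite3

# Hypothetical schema: one table replaces state.json, one replaces archive/.
SCHEMA = """
CREATE TABLE IF NOT EXISTS poll_state (
    account   TEXT NOT NULL,
    folder    TEXT NOT NULL DEFAULT '',
    position  TEXT NOT NULL,          -- last UID or last received datetime
    last_poll TEXT NOT NULL,
    PRIMARY KEY (account, folder)
);
CREATE TABLE IF NOT EXISTS messages (
    message_hash TEXT PRIMARY KEY,
    account      TEXT NOT NULL,
    message_id   TEXT NOT NULL,
    received     TEXT NOT NULL,       -- ISO 8601 timestamp
    subject      TEXT,
    body_text    TEXT,
    body_html    TEXT
);
CREATE INDEX IF NOT EXISTS idx_messages_account_received
    ON messages (account, received);
CREATE INDEX IF NOT EXISTS idx_messages_message_id
    ON messages (message_id);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create tables and indices if they are missing."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def insert_message(conn: sqlite3.Connection, row: dict) -> bool:
    """Insert one message; return False when the hash is already stored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO messages "
        "(message_hash, account, message_id, received, subject, body_text, body_html) "
        "VALUES (:message_hash, :account, :message_id, :received, "
        ":subject, :body_text, :body_html)",
        row,
    )
    conn.commit()
    return cur.rowcount == 1
```

Using the message hash as the primary key gives deduplication for free: `INSERT OR IGNORE` silently skips a message that is already stored.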

Testing Strategies

Unit Testing

Not implemented (no dependencies to mock). Instead, use:

  • --dry-run flag (fetches but doesn't write)
  • Manual credential testing (see INSTALLATION.md)
  • State inspection (jq queries)

Integration Testing

  1. Local testing on Mac (LOCAL_TESTING.md)
  2. Test on Z2 ct102 (DEPLOYMENT_CHECKLIST.md Phase 2)
  3. 7-day smoke test (DEPLOYMENT_CHECKLIST.md Phase 3)

Performance Testing

  1. Time full run: time python3 mail-poller.py
  2. Check network latency: ping graph.microsoft.com
  3. Monitor ct102 resources: htop, disk I/O

Disaster Scenarios

Scenario: Credential Leaked

Action:

  1. Immediately regenerate password/tokens
  2. Update credential files on ct103
  3. Deploy to ct102: ./deploy.sh
  4. Monitor logs for auth errors

Time to recover: 5 minutes

Scenario: Archive Corrupted

Action:

  1. Restore from backup (if available)
  2. Or delete state.json to force a full refetch of all messages on the next run

Time to recover: 5–10 minutes

Scenario: ct102 Disk Full

Action:

  1. Check archive size: du -sh /opt/hinata/mail-poller/archive
  2. Remove an old year of archives (here 2025, across all accounts): rm -rf /opt/hinata/mail-poller/archive/*/2025
  3. Restart the timer: systemctl restart hinata-mail-poller.timer

Time to recover: 5 minutes

Scenario: Token Refresh Failing

Action:

  1. Check token file: cat /opt/itachi/credentials/outlook-tokens-{account}.json
  2. Regenerate tokens (see INSTALLATION.md)
  3. Test: python3 mail-poller.py --account hotmail-michael-asolo --verbose

Time to recover: 10 minutes

Performance Baseline

Typical execution (4 accounts, ~5–10 new emails):

  • Total time: 2–5 seconds
  • IMAP polling: ~500ms
  • Graph API polling: ~1.5s (3 accounts)
  • Archive I/O: ~100ms
  • Network latency: varies (0–2s)

With 100+ new messages:

  • Total time: 5–15 seconds
  • Bottleneck: IMAP download speed (depends on message size)

Operations Checklist

Daily

  • [ ] Check logs for errors: journalctl -u hinata-mail-poller.service --since "24 hours ago"
  • [ ] Verify archive growth: find archive -type f -newermt "24 hours ago" | wc -l

Weekly

  • [ ] Check state.json is advancing: jq '.[] | .last_poll' state.json
  • [ ] Monitor disk usage: du -sh archive/
  • [ ] Verify all 4 accounts ran successfully
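
The weekly checks on state.json can also be scripted. The sketch below assumes state.json is an object keyed by account, each with a `last_poll` ISO 8601 timestamp, as the jq query above implies. The 45-minute threshold (three missed 15-minute timer runs) and the function name are assumptions.

```python
from datetime import datetime, timedelta, timezone


def stale_accounts(state: dict, now: datetime,
                   max_age: timedelta = timedelta(minutes=45)) -> list:
    """Return accounts whose last_poll is missing or older than max_age."""
    stale = []
    for account, entry in state.items():
        raw = entry.get("last_poll")
        if raw is None:
            stale.append(account)
            continue
        # Accept a trailing "Z" on Python versions where fromisoformat rejects it.
        last = datetime.fromisoformat(raw.replace("Z", "+00:00"))
        if last.tzinfo is None:
            last = last.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
        if now - last > max_age:
            stale.append(account)
    return sorted(stale)
```

A typical call would pass `json.load()` output for state.json together with `datetime.now(timezone.utc)`. An empty result means every account has polled recently.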

Monthly

  • [ ] Rotate tokens (if approaching expiry)
  • [ ] Review logs for patterns (rate limits, timeouts)
  • [ ] Backup state.json (keep copies of successful states)

Connections to Other Systems

Upstream (Credentials)

  • ct103/itachi: Stores credential files
    • Mail-poller reads from here on each run
    • Future: Migrate to Vaultwarden (#840058)

Downstream (Archive Usage)

  • Studio API: Reads archive/ for email display (future)
  • Heimerdinger classifier: Reads archive/ for classification (future)
  • iCloud sync: Archive backed up via Sandpit → iCloud (future)

Version History

Current Version: 1.0 (2026-06-05)

  • Initial implementation
  • 4 accounts (Gmail IMAP, 3x Outlook Graph)
  • Incremental polling with state persistence
  • Systemd timer automation

Future Versions:

  • v1.1: Add backfill mode (--since DATE)
  • v2.0: Database backend (SQLite)
  • v2.1: FastAPI endpoints for archive queries
  • v3.0: Classification pipeline integration

Related Documents

  • projects/brain/understanding_mail-poller-z2-migration-strategy.md — Full migration plan
  • projects/infrastructure/reference_hinata-z2-repo-specification.md — Z2 infrastructure design
  • projects/infrastructure/understanding_z2-sandpit-sync-migration-strategy.md — Archive sync strategy
  • the-government/feedback/ — Governance + operational laws