Apple Health Daily Extraction Pipeline

Goal

Daily extraction of Apple Health data, normalised into per-category CSVs (or Parquet), with optional webhook sync. The pipeline runs on the Mac: no iOS app to build and no Apple Developer account needed.

Constraint

Apple Health data is sandboxed on iPhone/Apple Watch. macOS cannot directly query HealthKit. Therefore:

Origin must be iPhone (export or Shortcuts)
Processing can be Mac (cron / launchd)
Storage / webhook delivery can be Mac or remote

Recommended architecture

iPhone Health Data
    ↓  (export.zip via Health app, OR Shortcuts auto-export)
iCloud Drive / synced folder
    ↓
Mac watcher (launchd, NOT cron — better with sleep/wake)
    ↓
Python ETL (pandas + streaming XML parser: ElementTree or lxml)
    ↓
Per-category outputs:
    /apple_health/activity/{steps,distance,calories}.csv
    /apple_health/vitals/{heart_rate,hrv,spo2}.csv
    /apple_health/sleep/{sleep_sessions,sleep_stages}.csv
    /apple_health/workouts/{workouts,running_routes}.csv
    /apple_health/body/{weight,bmi}.csv
    /apple_health/nutrition/{water,macros}.csv
    ↓
Webhook POST to jimmy-vps /allmight/health-sync
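
The Mac-side watcher can be sketched as a LaunchAgent. This is a minimal sketch that builds the plist with Python's standard plistlib instead of hand-writing XML. The label, ETL script path, inbox folder under iCloud Drive and log paths are all hypothetical placeholders. It relies on two real launchd triggers: WatchPaths fires when the synced folder changes, and StartCalendarInterval adds a daily fallback that launchd runs after wake if the Mac was asleep at the scheduled time.

```python
import plistlib
from pathlib import Path

# Hypothetical names and locations: replace with the real ETL script and
# the iCloud Drive folder the Shortcut actually writes to.
LABEL = "local.apple-health.etl"
ETL_SCRIPT = Path.home() / "apple_health" / "etl.py"
INBOX = (Path.home() / "Library" / "Mobile Documents"
         / "com~apple~CloudDocs" / "HealthExports")


def build_launch_agent(python: str = "/usr/bin/python3") -> bytes:
    """Return the XML plist bytes for a LaunchAgent that runs the ETL."""
    job = {
        "Label": LABEL,
        "ProgramArguments": [python, str(ETL_SCRIPT)],
        # Start the job whenever the synced inbox folder changes...
        "WatchPaths": [str(INBOX)],
        # ...and once a day at 02:00 as a fallback. A run missed while the
        # Mac slept is started on wake instead of being skipped.
        "StartCalendarInterval": {"Hour": 2, "Minute": 0},
        "StandardOutPath": "/tmp/apple-health-etl.log",
        "StandardErrorPath": "/tmp/apple-health-etl.err",
    }
    return plistlib.dumps(job)


if __name__ == "__main__":
    print(build_launch_agent().decode("utf-8"))
```

Save the output as `~/Library/LaunchAgents/local.apple-health.etl.plist` and load it with `launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/local.apple-health.etl.plist`.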

Categories available

Activity (steps, distance, flights, active/basal energy, exercise minutes, stand hours)
Vitals (heart rate, resting HR, HRV, respiratory rate, SpO2, BP, body temp)
Sleep (stages, in-bed, asleep, REM/core/deep)
Body (weight, BMI, body fat %, lean mass, height)
Workouts (type, duration, calories, route, pace, elevation)
Mobility (walking asymmetry, double support, step length, walking speed)
Nutrition (calories, protein, carbs, fat, water, caffeine)
Mindfulness (meditation, mindful minutes)
ECG / AFib (classifications, history)
Environmental (headphone exposure, noise exposure)
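
To route records into the per-category folders shown in the architecture, the ETL needs a lookup from HealthKit type identifiers to categories. This is a minimal sketch assuming a hand-maintained table. The identifiers are real HealthKit names, but the table is deliberately partial, and both the `output_path` helper and the `other/` fallback folder for unmapped types are hypothetical choices.

```python
from pathlib import PurePosixPath

# Hypothetical, deliberately partial routing table:
# HealthKit type identifier -> (category folder, file stem).
CATEGORY_BY_TYPE = {
    "HKQuantityTypeIdentifierStepCount": ("activity", "steps"),
    "HKQuantityTypeIdentifierDistanceWalkingRunning": ("activity", "distance"),
    "HKQuantityTypeIdentifierActiveEnergyBurned": ("activity", "calories"),
    "HKQuantityTypeIdentifierHeartRate": ("vitals", "heart_rate"),
    "HKQuantityTypeIdentifierHeartRateVariabilitySDNN": ("vitals", "hrv"),
    "HKQuantityTypeIdentifierOxygenSaturation": ("vitals", "spo2"),
    "HKCategoryTypeIdentifierSleepAnalysis": ("sleep", "sleep_stages"),
    "HKQuantityTypeIdentifierBodyMass": ("body", "weight"),
    "HKQuantityTypeIdentifierBodyMassIndex": ("body", "bmi"),
    "HKQuantityTypeIdentifierDietaryWater": ("nutrition", "water"),
}

TYPE_PREFIXES = ("HKQuantityTypeIdentifier", "HKCategoryTypeIdentifier")


def output_path(record_type: str, root: str = "/apple_health") -> PurePosixPath:
    """Return the CSV path for a record type, falling back to other/<name>.csv."""
    if record_type in CATEGORY_BY_TYPE:
        category, stem = CATEGORY_BY_TYPE[record_type]
    else:
        stem = record_type
        for prefix in TYPE_PREFIXES:
            stem = stem.removeprefix(prefix)  # Python 3.9+
        category = "other"
    return PurePosixPath(root) / category / f"{stem}.csv"
```

Keeping unmapped types in `other/` means a new HealthKit type never gets silently dropped; it just waits for a table entry.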

XML Extraction (the heavy-lift path)

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path

import pandas as pd

EXPORT_XML = "export.xml"
OUTPUT_DIR = Path("apple_health_csvs")

records_by_type = defaultdict(list)

# Stream with iterparse instead of ET.parse(): exports can reach several GB,
# and building the whole tree in memory does not scale.
depth = 0
for event, elem in ET.iterparse(EXPORT_XML, events=("start", "end")):
    if event == "start":
        depth += 1
        continue
    depth -= 1
    if depth != 1:
        continue  # only handle direct children of the root <HealthData> element
    if elem.tag == "Record":
        records_by_type[elem.attrib.get("type", "unknown")].append({
            "source": elem.attrib.get("sourceName"),
            "startDate": elem.attrib.get("startDate"),
            "endDate": elem.attrib.get("endDate"),
            "value": elem.attrib.get("value"),
            "unit": elem.attrib.get("unit"),
        })
    elem.clear()  # free the element's attributes and children once read

OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
for record_type, rows in records_by_type.items():
    # e.g. HKQuantityTypeIdentifierStepCount -> StepCount
    safe_name = record_type.replace("HKQuantityTypeIdentifier", "").replace("HKCategoryTypeIdentifier", "")
    pd.DataFrame(rows).to_csv(OUTPUT_DIR / f"{safe_name}.csv", index=False)
```
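
Daily figures such as "steps today" come from summing samples by calendar day. This sketch assumes rows shaped like the per-type CSVs written above (`source`, `startDate`, `value`) and Apple's export timestamp format, e.g. `2026-05-19 07:12:03 +0100`. It filters to a single source because the export keeps raw samples from every device, so summing iPhone and Apple Watch steps together double counts. (The Health app merges sources by priority instead; picking one source is a simpler approximation.) Samples are attributed to the day of their `startDate`, which is a simplification for intervals that cross midnight.

```python
from collections import defaultdict
from datetime import datetime

# Timestamp format used in export.xml, e.g. "2026-05-19 07:12:03 +0100".
APPLE_DATE_FMT = "%Y-%m-%d %H:%M:%S %z"


def daily_totals(rows, source=None):
    """Sum the numeric 'value' of each row per calendar day of its startDate.

    rows: iterable of dicts with 'source', 'startDate' and 'value' keys,
    e.g. csv.DictReader over one of the per-type CSVs.
    source: keep only rows from this device/app; None keeps every row.
    """
    totals = defaultdict(float)
    for row in rows:
        if source is not None and row["source"] != source:
            continue
        # .date() uses the offset stored in the timestamp, i.e. the local day.
        day = datetime.strptime(row["startDate"], APPLE_DATE_FMT).date()
        totals[day.isoformat()] += float(row["value"])
    return dict(sorted(totals.items()))
```

Feed it a `csv.DictReader` over `apple_health_csvs/StepCount.csv` and pass the device name exactly as it appears in the `source` column.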

Shortcuts path (lighter, daily-sync friendly)

iPhone Shortcut runs at 01:00:

Reads health samples (steps today, sleep last night, resting HR, workouts)
Converts to JSON
Saves to iCloud Drive → launchd watcher on the Mac picks up the file → ETL runs

Example output:

```json
{"date": "2026-05-19", "steps": 10342, "resting_hr": 57, "sleep_hours": 7.4}
```

Coverage is narrower than the full export.xml, but the daily sync runs unattended, with no manual export step.
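
On the Mac, the first ETL step for this path is reading whatever JSON files the Shortcut has dropped into the synced folder. This is a minimal sketch that assumes one file per day with the keys from the example payload. Files that fail to parse (for instance, a half-written file) or lack a key are skipped so a later run can pick them up, and when two files carry the same date, the one later in filename order wins. The `load_daily_json` name is hypothetical.

```python
import json
from pathlib import Path

# Keys from the example Shortcut payload; adjust if the Shortcut changes.
REQUIRED_KEYS = {"date", "steps", "resting_hr", "sleep_hours"}


def load_daily_json(folder):
    """Return one payload dict per date, sorted by date, from folder/*.json."""
    by_date = {}
    for path in sorted(Path(folder).glob("*.json")):
        try:
            payload = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            # Unreadable or half-written file (JSONDecodeError is a
            # ValueError); it is retried on the next run.
            continue
        if not isinstance(payload, dict) or not REQUIRED_KEYS.issubset(payload):
            continue
        by_date[payload["date"]] = payload  # later file for the same date wins
    return [by_date[day] for day in sorted(by_date)]
```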

Phased rollout

Phase	Scope
1	Manual export.xml → Python ETL → CSVs (proves the pipeline)
2	Shortcuts daily JSON → iCloud Drive → Mac watcher → incremental ETL
3	Webhook upload to jimmy-vps allmight schema (parallel to existing fit-sync)
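
Phase 3's upload can stay dependency-free with the standard library. This is only a sketch: the `jimmy-vps` host and `/allmight/health-sync` path come from the plan, while the HTTPS scheme, bearer-token header and `{"rows": [...]}` body are assumptions about an endpoint that does not exist yet. Building the request separately from sending it keeps the payload testable offline.

```python
import json
import urllib.request

# Hypothetical endpoint: host and path are from the rollout plan; the HTTPS
# scheme, bearer token and {"rows": [...]} body shape are assumptions.
SYNC_URL = "https://jimmy-vps/allmight/health-sync"


def build_sync_request(rows, token=None, url=SYNC_URL):
    """Build (but do not send) a JSON POST carrying the given rows."""
    body = json.dumps({"rows": rows}).encode("utf-8")
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("Content-Type", "application/json")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    return req


def send_sync_request(req, timeout=30):
    """Send the request; urlopen raises HTTPError on 4xx/5xx responses."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

If the server upserts on a natural key (type, source, start and end time), a failed run can simply resend the same rows.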

Important constraints

Health exports grow large (XML files of 1–10 GB and millions of rows are possible).
Do NOT reprocess the full export.xml daily. Track the latest processed timestamp plus UUIDs or row hashes, and append only new rows.
Use launchd, not cron, on macOS: launchd starts a scheduled job that was missed during sleep once the Mac wakes, whereas cron skips it.
Long term, DuckDB over Parquet beats CSV for analytics. Hybrid: raw → CSV → Parquet.
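
The "track what was processed, append only new rows" rule can combine a timestamp watermark with row hashes. The sketch below is one way to do it, not a settled design: the key fields, the two-day look-back and the in-memory `seen_hashes` set (which would need persisting between runs, e.g. in a small SQLite table) are all assumptions, and `incremental_rows` is a hypothetical name. The look-back exists because a Watch can sync samples some time after they were recorded, so a strict `startDate > watermark` test could miss them; hashing stops the overlap window from being appended twice.

```python
import hashlib
from datetime import datetime, timedelta

APPLE_DATE_FMT = "%Y-%m-%d %H:%M:%S %z"
# Assumed natural key for a sample; missing fields hash as "".
KEY_FIELDS = ("type", "source", "startDate", "endDate", "value")


def row_hash(row):
    """Stable SHA-256 of the key fields, joined with a unit-separator char."""
    joined = "\x1f".join(str(row.get(field, "")) for field in KEY_FIELDS)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def incremental_rows(rows, watermark, seen_hashes, lookback=timedelta(days=2)):
    """Return rows starting at or after (watermark - lookback) with unseen hashes.

    watermark must be timezone-aware; seen_hashes is updated in place.
    """
    cutoff = watermark - lookback
    fresh = []
    for row in rows:
        if datetime.strptime(row["startDate"], APPLE_DATE_FMT) < cutoff:
            continue  # older than the look-back window: assumed already stored
        digest = row_hash(row)
        if digest in seen_hashes:
            continue
        seen_hashes.add(digest)
        fresh.append(row)
    return fresh
```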

Storage shape (recommended)

/apple_health/
├── raw/           # original export.xml snapshots
├── processed/
│   └── parquet/   # partitioned by date
└── exports/
    └── csv/       # per-category daily writes
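
The storage shape above can be pinned down as path helpers so every writer agrees on the layout. This is a sketch with hypothetical file naming for the raw snapshots and daily CSVs; the `date=YYYY-MM-DD` directory names follow the Hive partitioning convention, which DuckDB reads when `hive_partitioning` is enabled.

```python
from datetime import date
from pathlib import PurePosixPath

ROOT = PurePosixPath("/apple_health")  # root from the storage shape above


def raw_snapshot_path(export_day: date) -> PurePosixPath:
    """raw/: one dated copy per original export.xml (naming is hypothetical)."""
    return ROOT / "raw" / f"export_{export_day.isoformat()}.xml"


def parquet_partition_dir(category: str, day: date) -> PurePosixPath:
    """processed/parquet/: Hive-style date=YYYY-MM-DD partition per category."""
    return ROOT / "processed" / "parquet" / category / f"date={day.isoformat()}"


def csv_export_path(category: str, day: date) -> PurePosixPath:
    """exports/csv/: one CSV per category per day (naming is hypothetical)."""
    return ROOT / "exports" / "csv" / category / f"{day.isoformat()}.csv"
```

With that layout, `SELECT * FROM read_parquet('/apple_health/processed/parquet/vitals/*/*.parquet', hive_partitioning = true)` in DuckDB exposes `date` as a queryable column.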

Existing Hinata integration points

~/Sandpit/hinata/scripts/fetch-fit-daily.py — Google Fit equivalent, runs daily.
AllMight (FOUNDATION) — consumer for HRV, sleep, mindfulness. Context: federation/colonel_saitama-foundation_allmight-health_context.md
Zoro (FOUNDATION) — consumer for workouts, body. Context: federation/colonel_saitama-foundation_zoro-fitness_context.md
Z2 could host the allmight schema as the webhook target (a sibling of the musicmastery, football and bulma tenants).

Pickup: Jimmy Neutron, once the health pipeline is prioritised. No active loop yet.
