
Apple Health Daily Extraction Pipeline

Goal

Extract Apple Health data daily and normalise it into per-category CSVs (or Parquet), with optional webhook sync. The pipeline runs on the Mac: no iOS app and no developer account are needed.

Constraint

Apple Health data is sandboxed on iPhone/Apple Watch. macOS cannot directly query HealthKit. Therefore:

  • Origin must be iPhone (export or Shortcuts)
  • Processing can be Mac (cron / launchd)
  • Storage / webhook delivery can be Mac or remote

iPhone Health Data
    ↓  (export.zip via Health app, OR Shortcuts auto-export)
iCloud Drive / synced folder
    ↓
Mac watcher (launchd, NOT cron; better with sleep/wake)
    ↓
Python ETL (pandas + lxml)
    ↓
Per-category outputs:
    /apple_health/activity/{steps,distance,calories}.csv
    /apple_health/vitals/{heart_rate,hrv,spo2}.csv
    /apple_health/sleep/{sleep_sessions,sleep_stages}.csv
    /apple_health/workouts/{workouts,running_routes}.csv
    /apple_health/body/{weight,bmi}.csv
    /apple_health/nutrition/{water,macros}.csv
    ↓
Webhook POST to jimmy-vps /allmight/health-sync
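
The "Mac watcher" step could be wired up with a launchd job that fires whenever the synced folder changes. The sketch below builds such a job definition with Python's plistlib. `WatchPaths` is a standard launchd key; the job label, the ETL script path, the log paths, and the iCloud Drive folder are all assumptions made for illustration, not values from this project.

```python
import plistlib
from pathlib import Path


def build_watcher_plist(label: str, script: Path, watch_dir: Path) -> bytes:
    """Return an XML launchd job that runs `script` when `watch_dir` changes."""
    job = {
        "Label": label,
        # Assumes the system python3; point this at a virtualenv if needed.
        "ProgramArguments": ["/usr/bin/python3", str(script)],
        # launchd starts the job when anything under these paths is modified.
        "WatchPaths": [str(watch_dir)],
        "StandardOutPath": "/tmp/apple_health_etl.log",   # hypothetical log path
        "StandardErrorPath": "/tmp/apple_health_etl.err",  # hypothetical log path
    }
    return plistlib.dumps(job)


plist_bytes = build_watcher_plist(
    label="com.example.apple-health-etl",  # hypothetical label
    script=Path.home() / "Sandpit/hinata/scripts/apple_health_etl.py",  # hypothetical script
    watch_dir=Path.home()
    / "Library/Mobile Documents/com~apple~CloudDocs/apple_health",  # assumed folder
)
print(plist_bytes.decode("utf-8"))
```

The printed XML would be saved under ~/Library/LaunchAgents/ and loaded with launchctl. Because launchd runs a missed trigger after the Mac wakes, the watcher does not depend on the machine being awake at a fixed time.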

Categories available

  • Activity: steps, distance, flights, active/basal energy, exercise minutes, stand hours
  • Vitals: heart rate, resting HR, HRV, respiratory rate, SpO2, BP, body temp
  • Sleep: stages, in-bed, asleep, REM/core/deep
  • Body: weight, BMI, body fat %, lean mass, height
  • Workouts: type, duration, calories, route, pace, elevation
  • Mobility: walking asymmetry, double support, step length, walking speed
  • Nutrition: calories, protein, carbs, fat, water, caffeine
  • Mindfulness: meditation, mindful minutes
  • ECG / AFib: classifications, history
  • Environmental: headphone exposure, noise exposure
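
One way to connect these categories to the per-category output files is a lookup table keyed by HealthKit type identifier. The sketch below covers only a handful of identifiers. The mapping, the `uncategorised` fallback folder, and the relative `apple_health` root are illustrative assumptions.

```python
from pathlib import Path

# Partial, illustrative mapping from HealthKit identifiers to output files.
CATEGORY_ROUTES = {
    "HKQuantityTypeIdentifierStepCount": "activity/steps",
    "HKQuantityTypeIdentifierDistanceWalkingRunning": "activity/distance",
    "HKQuantityTypeIdentifierActiveEnergyBurned": "activity/calories",
    "HKQuantityTypeIdentifierHeartRate": "vitals/heart_rate",
    "HKQuantityTypeIdentifierHeartRateVariabilitySDNN": "vitals/hrv",
    "HKQuantityTypeIdentifierOxygenSaturation": "vitals/spo2",
    "HKCategoryTypeIdentifierSleepAnalysis": "sleep/sleep_stages",
    "HKQuantityTypeIdentifierBodyMass": "body/weight",
    "HKQuantityTypeIdentifierBodyMassIndex": "body/bmi",
    "HKQuantityTypeIdentifierDietaryWater": "nutrition/water",
}

HK_PREFIXES = ("HKQuantityTypeIdentifier", "HKCategoryTypeIdentifier")


def output_path(record_type: str, root: Path = Path("apple_health")) -> Path:
    """Return the CSV path for a record type, with a fallback for unmapped types."""
    route = CATEGORY_ROUTES.get(record_type)
    if route is None:
        # Unmapped types keep their identifier, minus the HK prefix.
        name = record_type
        for prefix in HK_PREFIXES:
            name = name.removeprefix(prefix)
        route = f"uncategorised/{name}"
    return root / f"{route}.csv"


print(output_path("HKQuantityTypeIdentifierStepCount"))
print(output_path("HKQuantityTypeIdentifierFlightsClimbed"))
```

Keeping the routing in one table means a new category only needs a new entry, and unmapped types still land somebody can find them.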

XML Extraction (the heavy-lift path)

python
import xml.etree.ElementTree as ET
import pandas as pd
from collections import defaultdict
from pathlib import Path

EXPORT_XML = "export.xml"
OUTPUT_DIR = Path("apple_health_csvs")

# Parse the whole export into memory: fine for a first run, heavy for multi-GB files.
tree = ET.parse(EXPORT_XML)
root = tree.getroot()
records_by_type = defaultdict(list)

# Group top-level <Record> elements by their HealthKit type identifier.
for record in root.findall("Record"):
    record_type = record.attrib.get("type", "unknown")
    records_by_type[record_type].append({
        "source": record.attrib.get("sourceName"),
        "startDate": record.attrib.get("startDate"),
        "endDate": record.attrib.get("endDate"),
        "value": record.attrib.get("value"),
        "unit": record.attrib.get("unit"),
    })

# Write one CSV per record type, named after the identifier minus its HK prefix.
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
for record_type, rows in records_by_type.items():
    safe_name = (
        record_type
        .replace("HKQuantityTypeIdentifier", "")
        .replace("HKCategoryTypeIdentifier", "")
    )
    df = pd.DataFrame(rows)
    df.to_csv(OUTPUT_DIR / f"{safe_name}.csv", index=False)
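
For multi-gigabyte exports, a streaming variant of the extraction above may be preferable, since it never holds the full tree in memory. This is a minimal sketch using the standard library's `ET.iterparse` and `csv` rather than pandas or lxml. The helper names and the sample XML are made up for the demo.

```python
import csv
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

FIELDS = ["type", "sourceName", "startDate", "endDate", "value", "unit"]
HK_PREFIXES = ("HKQuantityTypeIdentifier", "HKCategoryTypeIdentifier")


def stream_records(xml_path):
    """Yield one dict per top-level <Record>, freeing elements as it goes."""
    context = ET.iterparse(str(xml_path), events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    depth = 0
    for event, elem in context:
        if event == "start":
            depth += 1
            continue
        depth -= 1
        if depth == 0:  # a direct child of the root has just closed
            if elem.tag == "Record":
                yield {field: elem.attrib.get(field) for field in FIELDS}
            root.clear()  # drop processed children to keep memory flat


def write_csvs(xml_path, out_dir):
    """Stream records into one CSV per type; return row counts per type."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    writers, handles, counts = {}, [], {}
    try:
        for row in stream_records(xml_path):
            record_type = row["type"] or "unknown"
            if record_type not in writers:
                name = record_type
                for prefix in HK_PREFIXES:
                    name = name.removeprefix(prefix)
                handle = open(out_dir / f"{name}.csv", "w", newline="")
                handles.append(handle)
                writers[record_type] = csv.DictWriter(handle, fieldnames=FIELDS)
                writers[record_type].writeheader()
            writers[record_type].writerow(row)
            counts[record_type] = counts.get(record_type, 0) + 1
    finally:
        for handle in handles:
            handle.close()
    return counts


# Tiny stand-in for export.xml; the values are invented for the demo.
SAMPLE = """<HealthData>
  <Record type="HKQuantityTypeIdentifierStepCount" sourceName="iPhone" startDate="2026-05-19 08:00:00 +0000" endDate="2026-05-19 08:05:00 +0000" value="120" unit="count"/>
  <Record type="HKQuantityTypeIdentifierHeartRate" sourceName="Watch" startDate="2026-05-19 08:01:00 +0000" endDate="2026-05-19 08:01:00 +0000" value="57" unit="count/min"/>
  <Workout workoutActivityType="HKWorkoutActivityTypeRunning"/>
</HealthData>"""

with tempfile.TemporaryDirectory() as tmp:
    xml_file = Path(tmp) / "export.xml"
    xml_file.write_text(SAMPLE)
    counts = write_csvs(xml_file, Path(tmp) / "csv")
    step_lines = (Path(tmp) / "csv" / "StepCount.csv").read_text().splitlines()

print(counts)
```

The depth counter restricts output to direct children of the root, matching `root.findall("Record")` in the in-memory version, so records nested inside other elements are not emitted a second time.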

Shortcuts path (lighter, daily-sync friendly)

iPhone Shortcut runs at 01:00:

  1. Reads health samples (steps today, sleep last night, resting HR, workouts)
  2. Converts to JSON
  3. Saves the file to iCloud Drive; the Mac watcher sees it and runs the ETL

Example output:

json
{"date": "2026-05-19", "steps": 10342, "resting_hr": 57, "sleep_hours": 7.4}

Coverage is narrower than the full export.xml, but the daily sync needs no manual export.
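
On the Mac side, ingesting these daily JSON files could look like the sketch below: each file is folded into a single summary CSV keyed by date, so re-running the job on the same files does not create duplicate rows. The field names follow the example output above; the inbox folder, the CSV name, and the function names are assumptions.

```python
import csv
import json
import tempfile
from pathlib import Path

FIELDS = ["date", "steps", "resting_hr", "sleep_hours"]


def load_daily(csv_path: Path) -> dict:
    """Read existing rows keyed by date, or return an empty dict on first run."""
    if not csv_path.exists():
        return {}
    with csv_path.open(newline="") as handle:
        return {row["date"]: row for row in csv.DictReader(handle)}


def ingest(inbox: Path, csv_path: Path) -> int:
    """Merge every JSON file in `inbox` into the CSV; return the row count."""
    rows = load_daily(csv_path)
    for json_file in sorted(inbox.glob("*.json")):
        payload = json.loads(json_file.read_text())
        # A later file for the same date replaces the earlier row.
        rows[payload["date"]] = {field: payload.get(field) for field in FIELDS}
    with csv_path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        for date in sorted(rows):
            writer.writerow(rows[date])
    return len(rows)


with tempfile.TemporaryDirectory() as tmp:
    inbox = Path(tmp) / "inbox"  # stands in for the synced iCloud Drive folder
    inbox.mkdir()
    sample = {"date": "2026-05-19", "steps": 10342, "resting_hr": 57, "sleep_hours": 7.4}
    (inbox / "2026-05-19.json").write_text(json.dumps(sample))
    daily_csv = Path(tmp) / "daily_summary.csv"
    ingest(inbox, daily_csv)
    row_count = ingest(inbox, daily_csv)  # second run must not duplicate the row
    csv_lines = daily_csv.read_text().splitlines()

print(row_count, csv_lines)
```

Keying on the date keeps the job idempotent, which matters because a folder watcher can fire more than once for the same file.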

Phased rollout

Phase  Scope
1      Manual export.xml → Python ETL → CSVs (proves the pipeline)
2      Shortcuts daily JSON → iCloud Drive → Mac watcher → incremental ETL
3      Webhook upload to jimmy-vps allmight schema (parallel to existing fit-sync)
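
The phase 3 upload could be sketched with the standard library alone. Only the /allmight/health-sync path comes from this document; the host name, the JSON payload shape, and the absence of authentication are assumptions, and a real endpoint would likely need a token header.

```python
import json
import urllib.request

# Hypothetical host; only the /allmight/health-sync path is from this document.
WEBHOOK_URL = "https://jimmy-vps.example/allmight/health-sync"


def build_request(payload: dict, url: str = WEBHOOK_URL) -> urllib.request.Request:
    """Wrap a payload in a JSON POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(payload: dict, url: str = WEBHOOK_URL, timeout: float = 10.0) -> int:
    """POST the payload and return the HTTP status code."""
    with urllib.request.urlopen(build_request(payload, url), timeout=timeout) as resp:
        return resp.status


# Build (but do not send) a request for one daily summary.
req = build_request({"date": "2026-05-19", "steps": 10342})
print(req.get_method(), req.full_url)
```

Separating `build_request` from `send` lets the payload and headers be checked offline before the VPS endpoint exists.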

Important constraints

  • Health exports become huge (1–10 GB XML possible, millions of rows).
  • Do NOT reprocess the full export.xml daily. Track the latest processed timestamp, UUIDs, and row hashes, then append only new rows.
  • Use launchd, not cron, on macOS (more reliable across sleep/wake).
  • Long-term: DuckDB/Parquet beats CSV for analytics. Hybrid: raw → CSV → Parquet.
/apple_health/
├── raw/           # original export.xml snapshots
├── processed/
│   └── parquet/   # partitioned by date
└── exports/
    └── csv/       # per-category daily writes
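
The append-only rule above could be implemented with content hashes. The sketch below hashes a few identifying fields per row and keeps the set of seen hashes in a small state file, so a later run passes through only rows it has not seen. The choice of key fields, the state file name, and the function names are assumptions.

```python
import hashlib
import json
import tempfile
from pathlib import Path

# Fields assumed to identify a sample uniquely; adjust to the real schema.
KEY_FIELDS = ("type", "sourceName", "startDate", "endDate", "value")


def row_hash(row: dict) -> str:
    """Return a stable SHA-256 digest over the identifying fields of a row."""
    key = "|".join(str(row.get(field, "")) for field in KEY_FIELDS)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def filter_new(rows, state_path) -> list:
    """Return only rows not seen in earlier runs, then update the state file."""
    state_path = Path(state_path)
    seen = set(json.loads(state_path.read_text())) if state_path.exists() else set()
    fresh = []
    for row in rows:
        digest = row_hash(row)
        if digest not in seen:
            seen.add(digest)
            fresh.append(row)
    state_path.write_text(json.dumps(sorted(seen)))
    return fresh


with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "seen_hashes.json"  # hypothetical state file
    day_one = [{"type": "StepCount", "startDate": "2026-05-18", "value": "9000"}]
    day_two = day_one + [{"type": "StepCount", "startDate": "2026-05-19", "value": "10342"}]
    first = filter_new(day_one, state)
    second = filter_new(day_two, state)  # only the 2026-05-19 row is new

print(len(first), second)
```

With millions of rows the hash set itself grows large, so combining it with a timestamp watermark (skip anything older than the last processed date, hash only the remainder) would keep the state file small.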

Existing Hinata integration points

  • ~/Sandpit/hinata/scripts/fetch-fit-daily.py — Google Fit equivalent, runs daily.
  • AllMight (FOUNDATION) — consumer for HRV, sleep, mindfulness. Context: federation/colonel_saitama-foundation_allmight-health_context.md
  • Zoro (FOUNDATION) — consumer for workouts, body. Context: federation/colonel_saitama-foundation_zoro-fitness_context.md
  • Z2 could host the allmight schema for the webhook target (sibling of the musicmastery, football, and bulma tenants).

Pickup: Jimmy Neutron when health pipeline gets prioritised. No active loop yet.