Apple Health Daily Extraction Pipeline
Goal
Daily extraction of Apple Health data, normalised into per-category CSVs (or Parquet), with optional webhook sync. The pipeline runs on the Mac: no iOS app and no developer account are needed.
Constraint
Apple Health data is sandboxed on iPhone/Apple Watch. macOS cannot directly query HealthKit. Therefore:
- Origin must be iPhone (export or Shortcuts)
- Processing can be Mac (cron / launchd)
- Storage / webhook delivery can be Mac or remote
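As a rough sketch of the Mac-side processing step, a launchd agent can be described as a property list and written with Python's standard `plistlib`. The label, script path, watched folder name, and log paths below are placeholders rather than values from this project; the 02:00 fallback run is an assumption chosen to follow a 01:00 Shortcut run.

```python
import plistlib
from pathlib import Path

# All three values are illustrative placeholders, not project settings.
LABEL = "com.example.apple-health-etl"
ETL_SCRIPT = Path.home() / "apple_health" / "etl.py"
WATCH_DIR = (
    Path.home() / "Library/Mobile Documents/com~apple~CloudDocs" / "HealthExport"
)


def build_launch_agent(label: str, script: Path, watch_dir: Path) -> dict:
    """Return a launchd job definition that runs the ETL script."""
    return {
        "Label": label,
        # Assumes a system Python at /usr/bin/python3; adjust to your interpreter.
        "ProgramArguments": ["/usr/bin/python3", str(script)],
        # launchd starts the job when any of these paths is modified.
        "WatchPaths": [str(watch_dir)],
        # Daily fallback run at 02:00 (assumed time).
        "StartCalendarInterval": {"Hour": 2, "Minute": 0},
        "StandardOutPath": "/tmp/apple-health-etl.log",
        "StandardErrorPath": "/tmp/apple-health-etl.err",
    }


def write_launch_agent(job: dict, dest_dir: Path) -> Path:
    """Serialise the job as <Label>.plist inside dest_dir and return the path."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    plist_path = dest_dir / f"{job['Label']}.plist"
    with plist_path.open("wb") as fh:
        plistlib.dump(job, fh)
    return plist_path
```

Calling `write_launch_agent(build_launch_agent(LABEL, ETL_SCRIPT, WATCH_DIR), Path.home() / "Library/LaunchAgents")` would place the file where per-user agents normally live; it still has to be loaded with `launchctl` before it takes effect.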
Recommended architecture
iPhone Health Data
↓ (export.zip via Health app, OR Shortcuts auto-export)
iCloud Drive / synced folder
↓
Mac watcher (launchd, NOT cron — better with sleep/wake)
↓
Python ETL (pandas + lxml)
↓
Per-category outputs:
/apple_health/activity/{steps,distance,calories}.csv
/apple_health/vitals/{heart_rate,hrv,spo2}.csv
/apple_health/sleep/{sleep_sessions,sleep_stages}.csv
/apple_health/workouts/{workouts,running_routes}.csv
/apple_health/body/{weight,bmi}.csv
/apple_health/nutrition/{water,macros}.csv
↓
Webhook POST to jimmy-vps /allmight/health-sync
Categories available
- Activity (steps, distance, flights, active/basal energy, exercise minutes, stand hours)
- Vitals (heart rate, resting HR, HRV, respiratory rate, SpO2, BP, body temp)
- Sleep (stages, in-bed, asleep, REM/core/deep)
- Body (weight, BMI, body fat %, lean mass, height)
- Workouts (type, duration, calories, route, pace, elevation)
- Mobility (walking asymmetry, double support, step length, walking speed)
- Nutrition (calories, protein, carbs, fat, water, caffeine)
- Mindfulness (meditation, mindful minutes)
- ECG / AFib (classifications, history)
- Environmental (headphone exposure, noise exposure)
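One way to connect these categories to the per-category output layout is a plain lookup from HealthKit type identifier to a relative CSV path. This is a sketch only: the mapping covers a handful of types, the choice of which identifier feeds which file is an assumption, and the `other/` fallback folder is hypothetical.

```python
from pathlib import Path

# Assumed mapping from HealthKit type identifiers to the output layout.
# Deliberately partial; extend it as more categories are needed.
CATEGORY_PATHS = {
    "HKQuantityTypeIdentifierStepCount": "activity/steps.csv",
    "HKQuantityTypeIdentifierDistanceWalkingRunning": "activity/distance.csv",
    "HKQuantityTypeIdentifierActiveEnergyBurned": "activity/calories.csv",
    "HKQuantityTypeIdentifierHeartRate": "vitals/heart_rate.csv",
    "HKQuantityTypeIdentifierHeartRateVariabilitySDNN": "vitals/hrv.csv",
    "HKQuantityTypeIdentifierOxygenSaturation": "vitals/spo2.csv",
    "HKCategoryTypeIdentifierSleepAnalysis": "sleep/sleep_stages.csv",
    "HKQuantityTypeIdentifierBodyMass": "body/weight.csv",
    "HKQuantityTypeIdentifierBodyMassIndex": "body/bmi.csv",
    "HKQuantityTypeIdentifierDietaryWater": "nutrition/water.csv",
}

HK_PREFIXES = ("HKQuantityTypeIdentifier", "HKCategoryTypeIdentifier")


def output_path(record_type: str, base: Path = Path("apple_health")) -> Path:
    """Resolve the CSV path for a record type; unmapped types go to other/."""
    relative = CATEGORY_PATHS.get(record_type)
    if relative is None:
        # Hypothetical fallback: strip the HK prefix and file under other/.
        name = record_type
        for prefix in HK_PREFIXES:
            if name.startswith(prefix):
                name = name[len(prefix):]
        relative = f"other/{name}.csv"
    return base / relative
```

Keeping the mapping as data rather than branching logic means adding a category is a one-line change.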
XML Extraction (the heavy-lift path)
```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from pathlib import Path

import pandas as pd

EXPORT_XML = "export.xml"
OUTPUT_DIR = Path("apple_health_csvs")

# Parse the whole export into memory (acceptable for a first manual run).
tree = ET.parse(EXPORT_XML)
root = tree.getroot()

# Group the top-level <Record> elements by their HealthKit type identifier.
records_by_type = defaultdict(list)
for record in root.findall("Record"):
    record_type = record.attrib.get("type", "unknown")
    records_by_type[record_type].append({
        "source": record.attrib.get("sourceName"),
        "startDate": record.attrib.get("startDate"),
        "endDate": record.attrib.get("endDate"),
        "value": record.attrib.get("value"),
        "unit": record.attrib.get("unit"),
    })

# Write one CSV per record type, with the HK prefix stripped from the filename.
for record_type, rows in records_by_type.items():
    safe_name = (
        record_type
        .replace("HKQuantityTypeIdentifier", "")
        .replace("HKCategoryTypeIdentifier", "")
    )
    df = pd.DataFrame(rows)
    out = OUTPUT_DIR / f"{safe_name}.csv"
    out.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(out, index=False)
```
Shortcuts path (lighter, daily-sync friendly)
iPhone Shortcut runs at 01:00:
- Reads health samples (steps today, sleep last night, resting HR, workouts)
- Converts to JSON
- Saves to iCloud Drive → Mac sees → ETL runs
Example output:
```json
{"date": "2026-05-19", "steps": 10342, "resting_hr": 57, "sleep_hours": 7.4}
```
Coverage is lighter than the full export.xml, but daily sync is free.
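On the Mac side, the ETL step for this path could be as small as the sketch below. It assumes one JSON file per day with the keys from the example payload, an inbox folder that the Shortcut writes to, and a single summary CSV; the function name and file layout are illustrative, not part of the existing tooling.

```python
import csv
import json
from pathlib import Path

# Column order follows the keys in the example payload.
FIELDS = ["date", "steps", "resting_hr", "sleep_hours"]


def ingest_daily_json(inbox: Path, summary_csv: Path) -> int:
    """Append each not-yet-seen daily JSON file to summary_csv; return rows added."""
    # Collect the dates already present so reruns do not duplicate rows.
    seen = set()
    if summary_csv.exists():
        with summary_csv.open(newline="") as fh:
            seen = {row["date"] for row in csv.DictReader(fh)}

    new_rows = []
    for path in sorted(inbox.glob("*.json")):
        payload = json.loads(path.read_text())
        if payload["date"] in seen:
            continue
        seen.add(payload["date"])
        # Missing keys become empty cells rather than raising.
        new_rows.append({key: payload.get(key) for key in FIELDS})

    if new_rows:
        summary_csv.parent.mkdir(parents=True, exist_ok=True)
        write_header = not summary_csv.exists()
        with summary_csv.open("a", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=FIELDS)
            if write_header:
                writer.writeheader()
            writer.writerows(new_rows)
    return len(new_rows)
```

Because the function skips dates it has already written, the watcher can fire as often as it likes without producing duplicates.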
Phased rollout
| Phase | Scope |
|---|---|
| 1 | Manual export.xml → Python ETL → CSVs (proves the pipeline) |
| 2 | Shortcuts daily JSON → iCloud Drive → Mac watcher → incremental ETL |
| 3 | Webhook upload to jimmy-vps allmight schema (parallel to existing fit-sync) |
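For Phase 3, the upload could be done with the standard library alone. The sketch below is hedged heavily: the source names the host (jimmy-vps) and path (/allmight/health-sync) but not the scheme, domain, port, or authentication, so the URL and the bearer-token header are assumptions, and the payload shape simply reuses the daily JSON example.

```python
import json
import urllib.request

# Placeholder URL; only the path component comes from the pipeline description.
WEBHOOK_URL = "https://jimmy-vps.example/allmight/health-sync"


def build_request(payload, url=WEBHOOK_URL, token=None):
    """Build a JSON POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumed auth scheme; replace with whatever the endpoint expects.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def post(payload, url=WEBHOOK_URL, token=None, timeout=30):
    """Send the payload and return the HTTP status code."""
    request = build_request(payload, url=url, token=token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Splitting request construction from sending keeps the payload and headers inspectable without touching the network.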
Important constraints
- Health exports become huge (1–10 GB XML possible, millions of rows).
- Do NOT reprocess the full export.xml daily. Track what has already been processed (latest timestamp, UUIDs, or row hashes) and append only new rows.
- Use launchd, not cron, on macOS (more reliable with sleep/wake).
- Long-term: DuckDB/Parquet beats CSV for analytics. Hybrid: raw → CSV → Parquet.
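The first two constraints suggest streaming the XML and keeping a watermark. A minimal sketch, assuming the export's date strings look like `2026-05-19 07:30:00 +0100` and that `startDate` is an acceptable watermark; the state-file helpers and function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from pathlib import Path

# Assumed timestamp format of the startDate attribute.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S %z"


def load_watermark(state_file: Path) -> datetime:
    """Return the last processed timestamp, or the epoch if none is stored."""
    if state_file.exists():
        return datetime.strptime(state_file.read_text().strip(), DATE_FORMAT)
    return datetime(1970, 1, 1, tzinfo=timezone.utc)


def save_watermark(state_file: Path, latest: datetime) -> None:
    """Persist the newest processed timestamp for the next run."""
    state_file.write_text(latest.strftime(DATE_FORMAT))


def iter_new_records(xml_source, since: datetime):
    """Yield attribute dicts for <Record> elements starting after `since`.

    Uses iterparse so the file is read incrementally instead of being
    loaded as one tree.
    """
    context = ET.iterparse(xml_source, events=("start", "end"))
    _event, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "Record":
            start = datetime.strptime(elem.attrib["startDate"], DATE_FORMAT)
            if start > since:
                yield dict(elem.attrib)
            # Drop already-processed children so memory stays bounded.
            root.clear()
```

A watermark on `startDate` alone can miss samples that sync late with an earlier start time, which is presumably why the note above also lists UUIDs and hashes; treat this as the simplest variant, not the complete one.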
Storage shape (recommended)
/apple_health/
├── raw/                # original export.xml snapshots
├── processed/
│   └── parquet/        # partitioned by date
└── exports/
    └── csv/            # per-category daily writes
Existing Hinata integration points
- ~/Sandpit/hinata/scripts/fetch-fit-daily.py: Google Fit equivalent, runs daily.
- AllMight (FOUNDATION): consumer for HRV, sleep, mindfulness. Context: federation/colonel_saitama-foundation_allmight-health_context.md
- Zoro (FOUNDATION): consumer for workouts, body. Context: federation/colonel_saitama-foundation_zoro-fitness_context.md
- Z2 could host the allmight schema for the webhook target (sibling of musicmastery, football, bulma tenants).
Pickup: Jimmy Neutron when health pipeline gets prioritised. No active loop yet.