SYNTHESIS Voice-Memo Pipeline Plan
Colonel: Sung Jinwoo — SYNTHESIS
Trigger: Training run 2026-05-29 (8 substantive memos; explicit override of DO-NOT-PARSE rule)
Version: 1.1 — updated after re-pass (5 missed transcripts recovered; pre-flight verification step added)
1. End-State
When mature, every voice memo Michael records lands in inbox/Voice Memos/ as a transcript within minutes, is ingested by the SYNTHESIS pipeline automatically, fanned in parallel to all six commanders for domain extraction, and collapses into a single synthesis sheet per memo under the-government/information_reference/reference_voice-memo-syntheses/[date]/[memo-slug].md. Each sheet is a six-section dossier — Heimerdinger's structured entity/theme YAML, Nujabes' audio-feature YAML (when raw audio is retained), Zepile's marketplace/value flags, Deidara's scene/aesthetic capture, Meruem's evolution-axis flags, and L's research-rigour spotlight — consolidated by Sung Jinwoo into a one-paragraph cross-commander synthesis and routed outward to the relevant cross-pillar colonels. The daily memory fold consumes these sheets directly; no manual triage is required.
2. Pipeline Stages
Stage 0 TRANSCRIPTION (already live — Apple Whisper / existing tooling)
Input: inbox/Voice Memos/<date>_<location>.m4a
Output: inbox/Voice Memos/<date>_<location>_transcript.txt
Stage 0.5 PRE-FLIGHT VERIFICATION (NEW — see §10)
Trigger: immediately after transcription, before Stage 1
Check: wc -c transcript.txt > 0 AND first 100 chars are not all whitespace
Output: manifest entry marked VALID or SKIP (with reason)
Rule: a file that passes wc -c > 0 but fails the content check
is flagged as WHITESPACE-ONLY (a 0-byte file is EMPTY-TRANSCRIPT); neither is silently skipped
Stage 1 INGESTION DETECTION (Jimmy Neutron / Canary inbox sweep)
Trigger: new *_transcript.txt in inbox/Voice Memos/ with VALID status
Output: routing manifest at
the-government/information_reference/reference_voice-memo-syntheses/<date>/routing-manifest.yaml
(lists memo slugs, file paths, batch number, timestamp, verification status)
Stage 2 PER-MEMO FAN-OUT (Sung Jinwoo — this file's routing rules govern)
Input: routing-manifest.yaml
Output: 6 × N commander task slots (see §5 for dispatch shape)
Stage 3 PER-COMMANDER EXTRACTION (6 commanders run in parallel per batch)
Each commander reads its assigned transcripts, writes per-memo YAML/section
to a staging subfolder:
the-government/information_reference/reference_voice-memo-syntheses/<date>/<memo-slug>/
heimerdinger.yaml
nujabes.yaml (if raw audio exists; else nujabes-transcript.yaml)
zepile.yaml
deidara.yaml
meruem.yaml
l-research.yaml
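The staging layout above can be sketched as a small path builder. This is a minimal illustration, not the pipeline's actual code: the function name `staging_paths` and the `has_raw_audio` flag are hypothetical, and the root folder is copied from the paths listed in this plan.

```python
from pathlib import PurePosixPath
from typing import List

# Root folder as written in this plan.
ROOT = PurePosixPath(
    "the-government/information_reference/reference_voice-memo-syntheses"
)

# One staging file per commander, in the order listed for Stage 3.
COMMANDER_FILES = [
    "heimerdinger.yaml",
    "nujabes.yaml",
    "zepile.yaml",
    "deidara.yaml",
    "meruem.yaml",
    "l-research.yaml",
]


def staging_paths(date: str, memo_slug: str, has_raw_audio: bool) -> List[str]:
    """Return the six staging file paths for one memo (hypothetical helper)."""
    folder = ROOT / date / memo_slug
    paths = []
    for name in COMMANDER_FILES:
        # Nujabes falls back to a transcript-only file when no raw audio exists.
        if name == "nujabes.yaml" and not has_raw_audio:
            name = "nujabes-transcript.yaml"
        paths.append(str(folder / name))
    return paths
```

Because every path embeds both the memo slug and the commander name, the six results are distinct for any memo.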
Stage 4 CROSS-COMMANDER CONSOLIDATION (Sung Jinwoo)
Input: all 6 staging YAMLs per memo slug
Output: the-government/information_reference/reference_voice-memo-syntheses/<date>/<memo-slug>.md
(the canonical synthesis sheet — 6 sections + colonel synthesis paragraph)
Stage 5 OUTPUT ROUTING (Sung Jinwoo routing plan written to)
the-government/information_reference/reference_voice-memo-syntheses/<date>/cross-pillar-routing.md
Cross-pillar colonels are notified via their inbox.
3. Per-Commander Remit on Voice Memos
Heimerdinger — NLP extraction
Reads every transcript. Extracts:
- Named entities (people, orgs, places, tools, products)
- Themes (top 5, ranked by frequency + emphasis)
- Action verbs bound to decisions ("we need to", "I will", "let's")
- Decisions and commitments (Michael's first-person assertions)
- Open questions (rhetorical + genuine)
Output: heimerdinger.yaml with keys entities, themes, action_items, decisions, open_questions, classifier_tags[]
Nujabes — Audio analysis
When raw .m4a is preserved alongside transcript:
- Tone mapping (calm / agitated / energised / fatigued)
- Pace (WPM estimated from transcript length + duration)
- Emphasis markers (pauses, volume spikes inferred from repetition patterns in transcript)
- Background context (outdoor / indoor / moving / static — inferred from ambient markers)
Output: nujabes.yaml with keys tone, pace_wpm, emphasis_moments[], ambient_context, energy_level
If no raw audio: nujabes-transcript.yaml with transcript-only proxies (repetition count, sentence fragmentation index as agitation proxy).
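Two of the Nujabes measures can be illustrated in a few lines. This is a sketch under stated assumptions: the plan does not define how repetition is counted, so `repetition_count` below (repeated three-word phrases) is one hypothetical interpretation, and the sentence fragmentation index is left out because its formula is not given here.

```python
import re
from collections import Counter


def pace_wpm(transcript: str, duration_seconds: float) -> float:
    """Estimate words per minute from transcript length and audio duration."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    word_count = len(transcript.split())
    return word_count * 60.0 / duration_seconds


def repetition_count(transcript: str, n: int = 3) -> int:
    """Count distinct n-word phrases that occur more than once.

    Assumption: repeated phrases stand in for emphasis; the plan does not
    specify the exact measure.
    """
    words = re.findall(r"[a-z']+", transcript.lower())
    grams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return sum(1 for count in grams.values() if count > 1)
```

For example, a six-word transcript spoken over two seconds gives `pace_wpm(...) == 180.0`.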
Zepile — Marketplace / sourcing / value-spotting
Filters for:
- Any property address, valuation, or viewing mention
- Second-hand or sourced items flagged
- Arbitrage seams (price differentials named, "why does X cost Y")
- Any market or vendor named
Output: zepile.yaml with keys property_signals[], sourcing_flags[], value_gaps[], markets_named[]
If none: signal: none — N/A
Deidara — Visual / aesthetic capture
Filters for:
- Scene or place descriptions (colours, textures, spatial layout)
- Aesthetic judgements ("looks like", "design of", "style of")
- Any visual reference Michael makes that could become a reference image or mood-board input
Output: deidara.yaml with keys scene_descriptions[], aesthetic_notes[], mood_board_seeds[]
If none: signal: none — N/A
Meruem — Evolution-axis flags
Filters for:
- Any dependency on a tool, platform, employer, or system Michael does not control
- Organisational/system seams that reveal leverage or friction points
- Any capability gap named (Michael's or a system's)
- Independence-reducing constraints mentioned
Output: meruem.yaml with keys dependencies_named[], leverage_points[], capability_gaps[], independence_flags[]
L — Research-rigour spotlight
Filters for:
- Any empirical claim Michael makes that is asserted without evidence
- Any methodology named or implied (OKR frameworks, scoring models, survey instruments)
- Any research-worthy question that surfaces
- Any technology or approach named that warrants scouting/verification
Output: l-research.yaml with keys unverified_claims[], methodologies_named[], research_questions[], tech_to_scout[]
4. Routing Rules
Default: fan ALL 6 commanders to EVERY memo. Bias is high-recall extraction. Each commander self-filters — returns signal: none — N/A if nothing relevant exists. This is cheaper than a pre-filter that drops signal.
Exception (single-commander route): Only when Sung Jinwoo reads the transcript and judges it trivially domain-specific with no recombination potential (e.g., a 5-second ambient recording with no speech). Even then, Heimerdinger always runs; it is the base layer.
Escalation trigger: If Heimerdinger's classifier_tags includes cross-pillar: [FOUNDATION | FORGE | FLOW | FIRE], Sung Jinwoo writes a routing note to the relevant colonel in Stage 5.
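The three rules above can be sketched as two small functions. This is illustrative only: the function names are hypothetical, and the tag format (strings such as `"cross-pillar: FORGE"`) is an assumption about how `classifier_tags` entries are written.

```python
from typing import List, Optional

COMMANDERS = ("heimerdinger", "nujabes", "zepile", "deidara", "meruem", "l-research")
PILLARS = ("FOUNDATION", "FORGE", "FLOW", "FIRE")


def commanders_for(single_route: Optional[str] = None) -> List[str]:
    """Default: all six commanders. Exception: one commander plus Heimerdinger."""
    if single_route is None:
        return list(COMMANDERS)
    if single_route not in COMMANDERS:
        raise ValueError(f"unknown commander: {single_route}")
    # Heimerdinger is the base layer and always runs; dict.fromkeys removes
    # the duplicate when the single route is Heimerdinger itself.
    return list(dict.fromkeys(["heimerdinger", single_route]))


def escalation_targets(classifier_tags: List[str]) -> List[str]:
    """Return the pillars that need a Stage 5 routing note (assumed tag format)."""
    targets = []
    for tag in classifier_tags:
        key, _, value = tag.partition(":")
        pillar = value.strip().upper()
        if key.strip() == "cross-pillar" and pillar in PILLARS and pillar not in targets:
            targets.append(pillar)
    return targets
```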
5. 6× Volume Parallelism — UPDATED: 48 memos / batch
Correction from v1.0: First-batch actual volume was 8 substantive memos, not 3. At the 6× scaling assumption, the projected batch size is 48 memos, not 18.
48 memos × 6 commanders = 288 extraction units.
Dispatch shape:
Commander run-time: each commander processes 48 memos in 8 micro-batches of 6.
Micro-batch duration: ~90 seconds per batch (transcript read + YAML write).
Total per-commander elapsed: ~12 minutes.
All 6 commanders run concurrently (parallel subagent dispatch).
Total pipeline elapsed: ~13 minutes (dominated by slowest commander's 8th micro-batch).
Dispatch manifest per micro-batch:
batch_1: memos 01–06
batch_2: memos 07–12
batch_3: memos 13–18
batch_4: memos 19–24
batch_5: memos 25–30
batch_6: memos 31–36
batch_7: memos 37–42
batch_8: memos 43–48
Each commander receives the full 48-memo list but processes in ordered micro-batches
to cap per-call context window usage and allow incremental writes.
Write contention mitigation: each commander writes to its own <memo-slug>/<commander>.yaml file (for example <memo-slug>/zepile.yaml), so no two commanders write to the same file path. Consolidation (Stage 4) is a single-writer step run after all commander writes complete.
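The dispatch arithmetic can be checked with a short sketch. The memo IDs are placeholders and `micro_batches` is a hypothetical helper; the batch size of 6 and the ~90-second estimate come from the figures above.

```python
from typing import List


def micro_batches(memo_ids: List[str], size: int = 6) -> List[List[str]]:
    """Split an ordered memo list into consecutive micro-batches."""
    return [memo_ids[i:i + size] for i in range(0, len(memo_ids), size)]


memos = [f"{n:02d}" for n in range(1, 49)]   # placeholder IDs 01..48
batches = micro_batches(memos)

extraction_units = len(memos) * 6            # 48 memos x 6 commanders
per_commander_seconds = len(batches) * 90    # 8 micro-batches x ~90 s each
per_commander_minutes = per_commander_seconds / 60
```

With 48 memos this yields 8 batches, 288 extraction units, and 720 seconds (12 minutes) per commander, matching the estimates above.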
6. Output Aggregation
Per-memo canonical synthesis sheet:
the-government/information_reference/reference_voice-memo-syntheses/<date>/<memo-slug>.md
Structure of each sheet:
markdown
# <memo-slug> — Synthesis Sheet
date: <date>
location: <inferred from slug>
## Heimerdinger
[structured extract or embedded YAML block]
## Nujabes
[audio features or transcript proxies]
## Zepile
[marketplace signals or N/A]
## Deidara
[scene/aesthetic or N/A]
## Meruem
[evolution flags or N/A]
## L
[research-rigour spotlight or N/A]
## Colonel Synthesis (Sung Jinwoo)
[1 paragraph: what the 6 commanders reveal together; cross-pillar routing decision]
Batch-level index:
the-government/information_reference/reference_voice-memo-syntheses/<date>/index.md
Lists all memo slugs, their primary signal tags, and routing destinations.
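As a rough sketch of Stage 4, the sheet structure above could be assembled as follows. The function name `render_sheet` and the `sections` dictionary shape are hypothetical; a commander with no entry is rendered as N/A, which is an assumption about how missing output should appear.

```python
from typing import Dict

# Section order as listed in the sheet structure.
SECTION_ORDER = ["Heimerdinger", "Nujabes", "Zepile", "Deidara", "Meruem", "L"]


def render_sheet(
    memo_slug: str,
    date: str,
    location: str,
    sections: Dict[str, str],
    colonel_synthesis: str,
) -> str:
    """Build one synthesis sheet as markdown text (illustrative only)."""
    lines = [
        f"# {memo_slug} \u2014 Synthesis Sheet",  # \u2014 is the dash used in the template
        f"date: {date}",
        f"location: {location}",
    ]
    for name in SECTION_ORDER:
        lines.append(f"## {name}")
        lines.append(sections.get(name, "N/A"))  # assumed fallback for no signal
    lines.append("## Colonel Synthesis (Sung Jinwoo)")
    lines.append(colonel_synthesis)
    return "\n".join(lines) + "\n"
```

Keeping this as a single function reflects the single-writer design: only the consolidation step produces the final sheet.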
7. First Training Run — Seed Entries (2026-05-29)
Corrected from v1.0: 8 substantive memos were processed across two passes (3 in first pass; 5 recovered in re-pass).
Seed synthesis sheets (8 memo slugs: 13-banfield-road-12 · grove-vale · heathbrook-park · work-thomas-personal-training · work-zameel-claude-onboarding · work-trust · work-trust-2 · work-trust-catchup) were staged in sandpit voice-memo-syntheses/2026-05-29/; that staging copy was deleted 2026-06-13 after assimilation. The §8 cross-pillar handoffs are the surviving distillation.
8. Cross-Pillar Handoffs
From this training run (and as standing routing rules):
| Signal type | Owning pillar / commander | Routing action |
|---|---|---|
| Property address + viewing notes | FOUNDATION → Bulma (financial) + Simba (shelter) | Write to foundation/ colonel routing note |
| Workplace org-design insight (OKR frameworks, silo critique) | FORGE → Levi (career/work systems) | Write to forge/ routing note |
| Emotional/interpersonal friction at work (management culture) | FLOW → Makima (emotional regulation, relationships) | Write to flow/ routing note |
| Music / sonic moment captured | SYNTHESIS → Nujabes (already internal) | No cross-pillar needed |
| Creative or aesthetic scene | SYNTHESIS → Deidara (already internal) | No cross-pillar needed |
| Technology / tool named that warrants scouting | SYNTHESIS → L, then broadcast if relevant | L scouts first, then routes |
| Michael as sole DE (team capacity crisis) | FORGE → Levi | Flag when 3+ memos in a batch confirm single-point-of-failure pattern |
| Claude onboarding methodology being taught | SYNTHESIS → Heimerdinger | Extract as reusable onboarding template |
9. Scaling Risks + Mitigations — UPDATED for 48-memo volume
| Risk | Trigger volume | Mitigation |
|---|---|---|
| Token cost explosion | 48 memos × 6 commanders = 288 LLM calls at full transcript length | Heimerdinger produces a compressed entity/theme summary first; downstream commanders read the summary + only relevant transcript excerpts, not full text |
| Commander context-window overflow | Memos > 3000 words each (confirmed: Zameel and Thomas memos are ~49K and ~47K bytes respectively — single-line transcripts, not line-separated) | Stage 1 chunking: transcripts > 2000 words are split into 1000-word overlapping chunks; each commander sees chunks, not full text. CRITICAL: one-line transcripts must be handled by word-split, not line-split |
| Write contention on syntheses folder | 288 concurrent writes | Already mitigated by per-commander namespaced file paths; Stage 4 is single-writer |
| Redundant fanning (irrelevant memos) | Ambient / non-speech recordings | Heimerdinger runs first as gate; if classifier_tags: [no-speech], remaining 5 commanders skip |
| Nujabes without raw audio | All current memos | Nujabes runs transcript-proxy mode by default; audio mode is opt-in when .m4a is explicitly retained |
| L over-flagging | Dense analytical memos | L applies a relevance threshold: only flags claims Michael makes in first person with assertion confidence; background conversation is lower priority |
| Routing noise to cross-pillar colonels | High-volume batches | Cross-pillar routing note is written only once per batch, not once per memo — batched into cross-pillar-routing.md |
| NEW — Thematic clustering at 48-memo volume | Batches with many related memos (e.g., 3 Trust sessions, 2 Thomas sessions) | Sung Jinwoo writes a cluster-synthesis note grouping related memos before individual cross-pillar routing — avoids 3 separate FORGE routing notes for the same signal |
| NEW — File-size variance | Memos ranging from 10K to 49K bytes (5× difference in this batch) | Stage 1 routes memos into size tiers: small (<15K), medium (15-30K), large (>30K); large memos get dedicated micro-batch slots to avoid context crowding |
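Two mitigations in the table, word-split chunking and size tiers, can be sketched as below. Several values are assumptions and are labeled in the comments: the table does not state the overlap size (100 words is used here) or whether "K" means 1000 or 1024 bytes (1000 is used here). Both function names are hypothetical.

```python
from typing import List


def chunk_words(
    text: str,
    chunk_size: int = 1000,   # from the table: 1000-word chunks
    overlap: int = 100,       # ASSUMPTION: overlap size is not specified
    threshold: int = 2000,    # from the table: only split above 2000 words
) -> List[str]:
    """Split on words, not lines, so single-line transcripts chunk correctly."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    if len(words) <= threshold:
        return [" ".join(words)]
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the transcript
    return chunks


def size_tier(n_bytes: int) -> str:
    """Map a file size to a tier. ASSUMPTION: 1K = 1000 bytes."""
    if n_bytes < 15_000:
        return "small"
    if n_bytes <= 30_000:
        return "medium"
    return "large"
```

Splitting with `str.split()` ignores line structure entirely, which is why a 49K-byte one-line transcript is handled the same way as a line-separated one.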
10. Pre-Flight Verification Step (NEW — v1.1)
Why added: In the 2026-05-29 training run, 5 of 8 substantive memos were initially missed. The root cause was a classification error: transcript files with multi-KB content were treated as empty because the ingestion check relied on assumptions about file state instead of verifying actual content.
Pre-flight rule (mandatory from v1.1 onwards):
For every *_transcript.txt file in inbox/Voice Memos/:
Step A — Size check:
wc -c <file> → must be > 0 bytes
If 0: classify as EMPTY-TRANSCRIPT; log and skip (do not silently ignore)
Step B — Content check:
Read first 100 characters of file
If all whitespace or null bytes: classify as WHITESPACE-ONLY; log and skip
If non-trivial text present: classify as VALID
Step C — Single-line check:
If file has 1 line but > 500 bytes: flag as SINGLE-LINE-TRANSCRIPT
This memo requires word-split chunking at Stage 1, not line-split
Log the flag in routing-manifest.yaml under field: transcript_format: single-line
Output:
All VALID files proceed to Stage 1
All EMPTY-TRANSCRIPT and WHITESPACE-ONLY files are logged to
the-government/information_reference/reference_voice-memo-syntheses/<date>/skipped-transcripts.log
A re-pass is triggered if any file is classified EMPTY-TRANSCRIPT and was
previously present (i.e., it may have been written but not yet populated by Whisper)
Re-pass trigger: If a VALID file was present at the time of Stage 1 ingestion but was not included in the batch (i.e., discovered only in a later pass), the pipeline emits a RE-PASS-REQUIRED flag and reruns Stage 1 for the missed files. The prior batch is not reprocessed; only the missed memos are added.
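Steps A to C of the pre-flight rule can be sketched as a single classifier. This is a minimal illustration rather than the pipeline's actual check: the function names and the returned dictionary shape are hypothetical, and decoding as UTF-8 is an assumption about how the transcripts are encoded.

```python
from pathlib import Path
from typing import Dict, Optional


def classify_bytes(data: bytes) -> Dict[str, Optional[str]]:
    """Classify raw transcript bytes following Steps A, B, and C."""
    # Step A: size check (equivalent to wc -c reporting 0).
    if len(data) == 0:
        return {"status": "EMPTY-TRANSCRIPT", "transcript_format": None}

    # Step B: content check on the first 100 characters.
    head = data.decode("utf-8", errors="replace")[:100]  # ASSUMPTION: UTF-8
    if head.replace("\x00", "").strip() == "":
        return {"status": "WHITESPACE-ONLY", "transcript_format": None}

    # Step C: a single line longer than 500 bytes needs word-split chunking.
    single_line = len(data.splitlines()) == 1 and len(data) > 500
    return {
        "status": "VALID",
        "transcript_format": "single-line" if single_line else None,
    }


def classify_file(path: str) -> Dict[str, Optional[str]]:
    """Read a transcript from disk and classify it (hypothetical wrapper)."""
    return classify_bytes(Path(path).read_bytes())
```

Separating the byte-level logic from the file read keeps the rule easy to test against the cases that caused the original miss, such as a multi-KB single-line transcript.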