SYNTHESIS Voice-Memo Pipeline Plan

Colonel: Sung Jinwoo — SYNTHESIS
Trigger: Training run 2026-05-29 (8 substantive memos; explicit override of DO-NOT-PARSE rule)
Version: 1.1 — updated after re-pass (5 missed transcripts recovered; pre-flight verification step added)


1. End-State

When mature, every voice memo Michael records lands in inbox/Voice Memos/ as a transcript within minutes, is ingested by the SYNTHESIS pipeline automatically, fanned in parallel to all six commanders for domain extraction, and collapses into a single synthesis sheet per memo under the-government/information_reference/reference_voice-memo-syntheses/[date]/[memo-slug].md. Each sheet is a six-section dossier — Heimerdinger's structured entity/theme YAML, Nujabes' audio-feature YAML (when raw audio is retained), Zepile's marketplace/value flags, Deidara's scene/aesthetic capture, Meruem's evolution-axis flags, and L's research-rigour spotlight — consolidated by Sung Jinwoo into a one-paragraph cross-commander synthesis and routed outward to the relevant cross-pillar colonels. The daily memory fold consumes these sheets directly; no manual triage is required.


2. Pipeline Stages

Stage 0  TRANSCRIPTION          (already live — Apple Whisper / existing tooling)
         Input:  inbox/Voice Memos/<date>_<location>.m4a
         Output: inbox/Voice Memos/<date>_<location>_transcript.txt

Stage 0.5 PRE-FLIGHT VERIFICATION  (NEW — see §10)
         Trigger: immediately after transcription, before Stage 1
         Check:  wc -c transcript.txt > 0 AND first 100 chars are not all whitespace or null bytes
         Output: manifest entry marked VALID or SKIP (with reason)
         Rule:   a file that passes wc -c > 0 but fails the content check
                 is flagged as WHITESPACE-ONLY (per §10), not silently skipped

Stage 1  INGESTION DETECTION    (Jimmy Neutron / Canary inbox sweep)
         Trigger: new *_transcript.txt in inbox/Voice Memos/ with VALID status
         Output:  routing manifest at
                  the-government/information_reference/reference_voice-memo-syntheses/<date>/routing-manifest.yaml
                  (lists memo slugs, file paths, batch number, timestamp, verification status)

Stage 2  PER-MEMO FAN-OUT       (Sung Jinwoo — this file's routing rules govern)
         Input:  routing-manifest.yaml
         Output: 6 × N commander task slots (see §5 for dispatch shape)

Stage 3  PER-COMMANDER EXTRACTION  (6 commanders run in parallel per batch)
         Each commander reads its assigned transcripts, writes per-memo YAML/section
         to a staging subfolder:
         the-government/information_reference/reference_voice-memo-syntheses/<date>/<memo-slug>/
             heimerdinger.yaml
             nujabes.yaml         (if raw audio exists; else nujabes-transcript.yaml)
             zepile.yaml
             deidara.yaml
             meruem.yaml
             l-research.yaml

Stage 4  CROSS-COMMANDER CONSOLIDATION  (Sung Jinwoo)
         Input:  all 6 staging YAMLs per memo slug
         Output: the-government/information_reference/reference_voice-memo-syntheses/<date>/<memo-slug>.md
                 (the canonical synthesis sheet — 6 sections + colonel synthesis paragraph)

Stage 5  OUTPUT ROUTING         (Sung Jinwoo routing plan written to)
         the-government/information_reference/reference_voice-memo-syntheses/<date>/cross-pillar-routing.md
         Cross-pillar colonels are notified via their inbox.
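
The folder layout used by Stages 3 and 4 can be sketched as a pair of path helpers. This is a minimal illustration, not the pipeline's actual code: the function names `staging_paths` and `sheet_path` and the `has_raw_audio` parameter are hypothetical; only the folder root and file names come from the stage listing above.

```python
from pathlib import PurePosixPath

# Root folder and commander file names are taken from the stage listing.
SYNTHESES_ROOT = PurePosixPath(
    "the-government/information_reference/reference_voice-memo-syntheses"
)
COMMANDER_FILES = [
    "heimerdinger.yaml",
    "nujabes.yaml",
    "zepile.yaml",
    "deidara.yaml",
    "meruem.yaml",
    "l-research.yaml",
]


def staging_paths(date: str, memo_slug: str, has_raw_audio: bool = True) -> list[PurePosixPath]:
    """Stage 3 outputs: one file per commander inside the memo's staging folder."""
    folder = SYNTHESES_ROOT / date / memo_slug
    names = [
        # Without raw audio, Nujabes writes its transcript-only file instead.
        "nujabes-transcript.yaml" if (name == "nujabes.yaml" and not has_raw_audio) else name
        for name in COMMANDER_FILES
    ]
    return [folder / name for name in names]


def sheet_path(date: str, memo_slug: str) -> PurePosixPath:
    """Stage 4 output: the canonical synthesis sheet for one memo."""
    return SYNTHESES_ROOT / date / f"{memo_slug}.md"
```

Because every commander has its own file name inside the memo folder, the six Stage 3 writes for a memo never target the same path.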

3. Per-Commander Remit on Voice Memos

Heimerdinger — NLP extraction

Reads every transcript. Extracts:

  • Named entities (people, orgs, places, tools, products)
  • Themes (top 5, ranked by frequency + emphasis)
  • Action verbs bound to decisions ("we need to", "I will", "let's")
  • Decisions and commitments (Michael's first-person assertions)
  • Open questions (rhetorical + genuine)

Output: heimerdinger.yaml with keys entities, themes, action_items, decisions, open_questions, classifier_tags[]

Nujabes — Audio analysis

When raw .m4a is preserved alongside transcript:

  • Tone mapping (calm / agitated / energised / fatigued)
  • Pace (WPM estimated from transcript length + duration)
  • Emphasis markers (pauses, volume spikes inferred from repetition patterns in transcript)
  • Background context (outdoor / indoor / moving / static — inferred from ambient markers)

Output: nujabes.yaml with keys tone, pace_wpm, emphasis_moments[], ambient_context, energy_level
If no raw audio: nujabes-transcript.yaml with transcript-only proxies (repetition count, sentence fragmentation index as agitation proxy).
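
The pace estimate described above reduces to simple arithmetic. The sketch below assumes the clip duration in seconds is already known (for example from the .m4a metadata) and counts words by whitespace splitting; the function name `pace_wpm` mirrors the output key, but the signature is hypothetical.

```python
def pace_wpm(transcript: str, duration_seconds: float) -> float:
    """Words per minute from a whitespace-split word count and the clip duration."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    word_count = len(transcript.split())
    return word_count * 60.0 / duration_seconds
```

For example, a 4-word transcript spoken over 2 seconds gives 4 × 60 / 2 = 120 WPM.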

Zepile — Marketplace / sourcing / value-spotting

Filters for:

  • Any property address, valuation, or viewing mention
  • Second-hand or sourced items flagged
  • Arbitrage seams (price differentials named, "why does X cost Y")
  • Any market or vendor named

Output: zepile.yaml with keys property_signals[], sourcing_flags[], value_gaps[], markets_named[]
If none: signal: none — N/A

Deidara — Visual / aesthetic capture

Filters for:

  • Scene or place descriptions (colours, textures, spatial layout)
  • Aesthetic judgements ("looks like", "design of", "style of")
  • Any visual reference Michael makes that could become a reference image or mood-board input

Output: deidara.yaml with keys scene_descriptions[], aesthetic_notes[], mood_board_seeds[]
If none: signal: none — N/A

Meruem — Evolution-axis flags

Filters for:

  • Any dependency on a tool, platform, employer, or system Michael does not control
  • Organisational/system seams that reveal leverage or friction points
  • Any capability gap named (Michael's or a system's)
  • Independence-reducing constraints mentioned

Output: meruem.yaml with keys dependencies_named[], leverage_points[], capability_gaps[], independence_flags[]

L — Research-rigour spotlight

Filters for:

  • Any empirical claim Michael makes that is asserted without evidence
  • Any methodology named or implied (OKR frameworks, scoring models, survey instruments)
  • Any research-worthy question that surfaces
  • Any technology or approach named that warrants scouting/verification

Output: l-research.yaml with keys unverified_claims[], methodologies_named[], research_questions[], tech_to_scout[]
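
One way to enforce the six output contracts above is a key check run on each parsed YAML file before consolidation. This is a sketch under assumptions: the helper `missing_keys` is hypothetical, the payload is assumed to be already parsed into a dict, and a commander that found nothing is assumed to return a single `signal` key whose value starts with `none`.

```python
# Key lists copied from the per-commander "Output:" lines.
REQUIRED_KEYS = {
    "heimerdinger": ["entities", "themes", "action_items", "decisions",
                     "open_questions", "classifier_tags"],
    "nujabes": ["tone", "pace_wpm", "emphasis_moments", "ambient_context",
                "energy_level"],
    "zepile": ["property_signals", "sourcing_flags", "value_gaps", "markets_named"],
    "deidara": ["scene_descriptions", "aesthetic_notes", "mood_board_seeds"],
    "meruem": ["dependencies_named", "leverage_points", "capability_gaps",
               "independence_flags"],
    "l-research": ["unverified_claims", "methodologies_named",
                   "research_questions", "tech_to_scout"],
}


def missing_keys(commander: str, payload: dict) -> list[str]:
    """Return the required keys absent from a commander's parsed output."""
    # Assumption: a "nothing found" result carries only a `signal` marker.
    if str(payload.get("signal", "")).startswith("none"):
        return []
    return [key for key in REQUIRED_KEYS[commander] if key not in payload]
```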


4. Routing Rules

Default: fan ALL 6 commanders to EVERY memo. Bias is high-recall extraction. Each commander self-filters — returns signal: none — N/A if nothing relevant exists. This is cheaper than a pre-filter that drops signal.

Exception (single-commander route): Only when Sung Jinwoo reads the transcript and judges it trivially domain-specific with no recombination potential (e.g., a 5-second ambient recording with no speech). Even then, Heimerdinger always runs, because it is the base layer.

Escalation trigger: If Heimerdinger's classifier_tags includes cross-pillar: [FOUNDATION | FORGE | FLOW | FIRE], Sung Jinwoo writes a routing note to the relevant colonel in Stage 5.
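
The three rules above can be sketched as two small functions. The names `commanders_for` and `escalation_pillars` are hypothetical, and the tag format `cross-pillar:<PILLAR>` is an assumption about how Heimerdinger serialises its classifier tags; the source only states that the tag names one of the four pillars.

```python
from typing import Optional

COMMANDERS = ["heimerdinger", "nujabes", "zepile", "deidara", "meruem", "l-research"]
PILLARS = {"FOUNDATION", "FORGE", "FLOW", "FIRE"}


def commanders_for(single_route: Optional[str] = None) -> list[str]:
    """Default: all six commanders. Exception: one commander plus Heimerdinger."""
    if single_route is None:
        return list(COMMANDERS)
    if single_route not in COMMANDERS:
        raise ValueError(f"unknown commander: {single_route}")
    # Heimerdinger is the base layer and always runs.
    if single_route == "heimerdinger":
        return ["heimerdinger"]
    return ["heimerdinger", single_route]


def escalation_pillars(classifier_tags: list[str]) -> list[str]:
    """Pillars that need a Stage 5 routing note, in first-seen order."""
    found: list[str] = []
    for tag in classifier_tags:
        prefix, _, value = tag.partition(":")
        pillar = value.strip().upper()
        if prefix.strip() == "cross-pillar" and pillar in PILLARS and pillar not in found:
            found.append(pillar)
    return found
```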


5. 6× Volume Parallelism — UPDATED: 48 memos / batch

Correction from v1.0: First-batch actual volume was 8 substantive memos, not 3. At the 6× scaling assumption, the projected batch size is 48 memos, not 18.

48 memos × 6 commanders = 288 extraction units.

Dispatch shape:

Commander run-time: each commander processes 48 memos in 8 micro-batches of 6.
Micro-batch duration: ~90 seconds per batch (transcript read + YAML write).
Total per-commander elapsed: ~12 minutes.
All 6 commanders run concurrently (parallel subagent dispatch).
Total pipeline elapsed: ~13 minutes (dominated by slowest commander's 8th micro-batch).

Dispatch manifest per micro-batch:
  batch_1:  memos 01–06
  batch_2:  memos 07–12
  batch_3:  memos 13–18
  batch_4:  memos 19–24
  batch_5:  memos 25–30
  batch_6:  memos 31–36
  batch_7:  memos 37–42
  batch_8:  memos 43–48

Each commander receives the full 48-memo list but processes in ordered micro-batches
to cap per-call context window usage and allow incremental writes.
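
The dispatch arithmetic above can be checked with a short sketch. The helper `micro_batches` is hypothetical; the constants (48 memos, 6 commanders, micro-batches of 6, ~90 seconds per micro-batch) are the figures stated in this section.

```python
def micro_batches(memo_count: int, batch_size: int = 6) -> list[range]:
    """Split memo numbers 1..memo_count into ordered micro-batches."""
    return [
        range(start, min(start + batch_size, memo_count + 1))
        for start in range(1, memo_count + 1, batch_size)
    ]


MEMOS = 48
COMMANDER_COUNT = 6
SECONDS_PER_MICRO_BATCH = 90  # approximate figure from the plan

batches = micro_batches(MEMOS)
extraction_units = MEMOS * COMMANDER_COUNT  # 48 x 6 = 288
per_commander_minutes = len(batches) * SECONDS_PER_MICRO_BATCH / 60  # 8 x 90 s = 12 min

for number, batch in enumerate(batches, start=1):
    print(f"batch_{number}: memos {batch[0]:02d}-{batch[-1]:02d}")
```

Because the six commanders run concurrently, the per-commander figure of about 12 minutes is also roughly the pipeline's elapsed time, plus dispatch overhead.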

Write contention mitigation: each commander writes only to its own file, <memo-slug>/<commander>.yaml (for example <memo-slug>/zepile.yaml), so no two commanders write to the same file path. Consolidation (Stage 4) is a single-writer step run after all commander writes complete.


6. Output Aggregation

Per-memo canonical synthesis sheet:

the-government/information_reference/reference_voice-memo-syntheses/<date>/<memo-slug>.md

Structure of each sheet:

# <memo-slug> — Synthesis Sheet
date: <date>
location: <inferred from slug>

## Heimerdinger
[structured extract or embedded YAML block]

## Nujabes
[audio features or transcript proxies]

## Zepile
[marketplace signals or N/A]

## Deidara
[scene/aesthetic or N/A]

## Meruem
[evolution flags or N/A]

## L
[research-rigour spotlight or N/A]

## Colonel Synthesis (Sung Jinwoo)
[1 paragraph: what the 6 commanders reveal together; cross-pillar routing decision]
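
A consolidation step that emits this structure could look like the sketch below. The function `render_sheet` and its parameters are hypothetical, and defaulting a missing section to `N/A` is an assumption (the template only shows `N/A` for some sections).

```python
SECTION_ORDER = ["Heimerdinger", "Nujabes", "Zepile", "Deidara", "Meruem", "L"]


def render_sheet(memo_slug: str, date: str, location: str,
                 sections: dict[str, str], colonel_synthesis: str) -> str:
    """Assemble one synthesis sheet from the six commander sections."""
    lines = [
        f"# {memo_slug} \u2014 Synthesis Sheet",  # \u2014 is the dash used in the sheet title
        f"date: {date}",
        f"location: {location}",
        "",
    ]
    for name in SECTION_ORDER:
        # Assumption: a commander with nothing to report is rendered as N/A.
        lines += [f"## {name}", sections.get(name, "N/A"), ""]
    lines += ["## Colonel Synthesis (Sung Jinwoo)", colonel_synthesis]
    return "\n".join(lines) + "\n"
```

Keeping the section order in one list means the sheet layout and the commander roster cannot drift apart silently.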

Batch-level index:

the-government/information_reference/reference_voice-memo-syntheses/<date>/index.md

Lists all memo slugs, their primary signal tags, and routing destinations.


7. First Training Run — Seed Entries (2026-05-29)

Corrected from v1.0: 8 substantive memos were processed across two passes (3 in first pass; 5 recovered in re-pass).

Seed synthesis sheets (8 memo slugs: 13-banfield-road-12 · grove-vale · heathbrook-park · work-thomas-personal-training · work-zameel-claude-onboarding · work-trust · work-trust-2 · work-trust-catchup) were staged in sandpit voice-memo-syntheses/2026-05-29/; that staging copy was deleted 2026-06-13 after assimilation. The §8 cross-pillar handoffs are the surviving distillation.


8. Cross-Pillar Handoffs

From this training run (and as standing routing rules):

| Signal type | Owning pillar / commander | Routing action |
|---|---|---|
| Property address + viewing notes | FOUNDATION → Bulma (financial) + Simba (shelter) | Write to foundation/ colonel routing note |
| Workplace org-design insight (OKR frameworks, silo critique) | FORGE → Levi (career/work systems) | Write to forge/ routing note |
| Emotional/interpersonal friction at work (management culture) | FLOW → Makima (emotional regulation, relationships) | Write to flow/ routing note |
| Music / sonic moment captured | SYNTHESIS → Nujabes (already internal) | No cross-pillar needed |
| Creative or aesthetic scene | SYNTHESIS → Deidara (already internal) | No cross-pillar needed |
| Technology / tool named that warrants scouting | SYNTHESIS → L, then broadcast if relevant | L scouts first, then routes |
| Michael as sole DE (team capacity crisis) | FORGE → Levi | Flag when 3+ memos in a batch confirm single-point-of-failure pattern |
| Claude onboarding methodology being taught | SYNTHESIS → Heimerdinger | Extract as reusable onboarding template |
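
The "3+ memos in a batch" rule for the sole-DE signal can be expressed as a simple count over per-memo tags. This is only a sketch: the function name, the tag `sole-de`, and the input shape (memo slug mapped to a set of tags) are all hypothetical; the threshold of 3 is the one stated in the handoff rule.

```python
def pattern_confirmed(memo_tags: dict[str, set[str]], tag: str = "sole-de",
                      threshold: int = 3) -> bool:
    """True when at least `threshold` memos in the batch carry the given tag."""
    matching = sum(1 for tags in memo_tags.values() if tag in tags)
    return matching >= threshold
```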

9. Scaling Risks + Mitigations — UPDATED for 48-memo volume

| Risk | Trigger volume | Mitigation |
|---|---|---|
| Token cost explosion | 48 memos × 6 commanders = 288 LLM calls at full transcript length | Heimerdinger produces a compressed entity/theme summary first; downstream commanders read the summary + only relevant transcript excerpts, not full text |
| Commander context-window overflow | Memos > 3000 words each (confirmed: Zameel and Thomas memos are ~49K and ~47K bytes respectively; both are single-line transcripts, not line-separated) | Stage 1 chunking: transcripts > 2000 words are split into 1000-word overlapping chunks; each commander sees chunks, not full text. CRITICAL: one-line transcripts must be handled by word-split, not line-split |
| Write contention on syntheses folder | 288 concurrent writes | Already mitigated by per-commander namespaced file paths; Stage 4 is single-writer |
| Redundant fanning (irrelevant memos) | Ambient / non-speech recordings | Heimerdinger runs first as gate; if classifier_tags: [no-speech], remaining 5 commanders skip |
| Nujabes without raw audio | All current memos | Nujabes runs transcript-proxy mode by default; audio mode is opt-in when .m4a is explicitly retained |
| L over-flagging | Dense analytical memos | L applies a relevance threshold: only flags claims Michael makes in first person with assertion confidence; background conversation is lower priority |
| Routing noise to cross-pillar colonels | High-volume batches | Cross-pillar routing note is written only once per batch, not once per memo; it is batched into cross-pillar-routing.md |
| NEW: Thematic clustering at 48-memo volume | Batches with many related memos (e.g., 3 Trust sessions, 2 Thomas sessions) | Sung Jinwoo writes a cluster-synthesis note grouping related memos before individual cross-pillar routing, which avoids 3 separate FORGE routing notes for the same signal |
| NEW: File-size variance | Memos ranging from 10K to 49K bytes (5× difference in this batch) | Stage 1 routes memos into size tiers: small (<15K), medium (15-30K), large (>30K); large memos get dedicated micro-batch slots to avoid context crowding |
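
Two of the mitigations above, word-split chunking and size tiers, are mechanical enough to sketch. The function names are hypothetical. The 2000-word threshold, 1000-word chunk size and tier boundaries come from the table; the 100-word overlap and reading "K" as 1000 bytes are assumptions, since the plan does not specify them.

```python
def needs_chunking(text: str, threshold_words: int = 2000) -> bool:
    """Transcripts longer than the threshold are chunked before fan-out."""
    return len(text.split()) > threshold_words


def chunk_words(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split on whitespace, not on lines, so single-line transcripts chunk correctly."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the transcript
    return chunks


def size_tier(size_bytes: int) -> str:
    """small (<15K), medium (15-30K), large (>30K); K assumed to mean 1000 bytes."""
    if size_bytes < 15_000:
        return "small"
    if size_bytes <= 30_000:
        return "medium"
    return "large"
```

Splitting on whitespace rather than newlines is what makes the single-line Zameel and Thomas transcripts safe to chunk; a line-based splitter would return each of them as one oversized chunk.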

10. Pre-Flight Verification Step (NEW — v1.1)

Why added: In the 2026-05-29 training run, 5 of 8 substantive memos were initially missed (3 were processed in the first pass; 5 were recovered in the re-pass). The root cause was a classification error: transcript files with multi-KB content were treated as empty because the ingestion check never read the files; it relied on assumptions about file state.

Pre-flight rule (mandatory from v1.1 onwards):

For every *_transcript.txt file in inbox/Voice Memos/:

  Step A — Size check:
    wc -c <file> → must be > 0 bytes
    If 0: classify as EMPTY-TRANSCRIPT; log and skip (do not silently ignore)

  Step B — Content check:
    Read first 100 characters of file
    If all whitespace or null bytes: classify as WHITESPACE-ONLY; log and skip
    If non-trivial text present: classify as VALID

  Step C — Single-line check:
    If file has 1 line but > 500 bytes: flag as SINGLE-LINE-TRANSCRIPT
    This memo requires word-split chunking at Stage 1, not line-split
    Log the flag in routing-manifest.yaml under field: transcript_format: single-line

  Output:
    All VALID files proceed to Stage 1
    All EMPTY-TRANSCRIPT and WHITESPACE-ONLY files are logged to
      the-government/information_reference/reference_voice-memo-syntheses/<date>/skipped-transcripts.log
    A re-pass is triggered if any file is classified EMPTY-TRANSCRIPT and was
    previously present (i.e., it may have been written but not yet populated by Whisper)
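
Steps A to C above can be sketched as a single classifier. The function names and the returned dict shape are hypothetical; the status labels, the 100-character window, the 500-byte threshold and the `transcript_format: single-line` field are taken from the rule. UTF-8 decoding with replacement of invalid bytes is an assumption about the transcript encoding.

```python
from pathlib import Path


def classify_bytes(data: bytes) -> dict:
    """Apply the pre-flight rule to the raw contents of one transcript file."""
    # Step A: size check.
    if len(data) == 0:
        return {"status": "EMPTY-TRANSCRIPT"}

    # Step B: content check on the first 100 characters.
    text = data.decode("utf-8", errors="replace")
    head = text[:100]
    if head.replace("\x00", "").strip() == "":
        return {"status": "WHITESPACE-ONLY"}

    # Step C: single-line check, recorded for word-split chunking at Stage 1.
    result = {"status": "VALID"}
    if len(text.splitlines()) == 1 and len(data) > 500:
        result["transcript_format"] = "single-line"
    return result


def classify_transcript(path: Path) -> dict:
    """Read the file itself; never infer its state from name or timestamp."""
    return classify_bytes(path.read_bytes())
```

Reading the bytes directly is the point of the step: the training-run failure came from assuming file state instead of inspecting content.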

Re-pass trigger: If a VALID file was present at the time of Stage 1 ingestion but was not included in the batch (i.e., discovered only in a later pass), the pipeline emits a RE-PASS-REQUIRED flag and reruns Stage 1 for the missed files. The prior batch is not reprocessed — only the missed memos are added.