Trunks Scout — Flagship Researcher Hub

  owner: Trunks
  first_touched: 2026-05-25
  last_touched: 2026-05-25

Flagship Researcher Hub Restructure + UCL Seed Expansion

source: documentation/activities/trunks-scout-flagship-researcher.md

Phase 1 — Restructure

  • Vault inbox/football_scraper/ deleted (no unique content vs Sandpit copy)

  • Sandpit trunks-scout/ reshaped into a HUB with hub-level context.md/overview.md/README.md and tenant trunks-scout/football-researcher/

  • git rm --cached removed 200+ wrongly-tracked Chrome browser_profile files

Trunks API (Phase 2 — scaffolded, not yet deployed)

  • Schema: football.player_profiles — FBref player_id as PK, 7-dim profile, position, competitions, last_match_at, csv_scraped_at, plus data_meta key-value

  • Router: 5 endpoints — POST /profiles (auth, bulk upsert), GET /profiles[/{id}] (public read), POST /match (weighted Euclidean nearest-N), GET /meta (freshness)

  • Auto-cleanup: every sync POST deletes rows whose last_match_at is more than 540 days old

  • Sync client: football-researcher/sync-to-collector.py with --dry-run, --out, --api-url, --key flags

  • Install patch: jimmy-vps-add-football-tenant.sh — idempotent additive deploy
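The POST /match endpoint above is described as a weighted Euclidean nearest-N lookup over the 7-dim profile. A minimal sketch of that idea follows, assuming profiles are plain lists of 7 floats keyed by player_id. The function names, the weights, and the sample ids are hypothetical and are not taken from the router code.

```python
import math
from typing import Dict, List, Sequence, Tuple


def weighted_distance(a: Sequence[float], b: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Weighted Euclidean distance: sqrt(sum(w_i * (a_i - b_i)^2))."""
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("profile vectors and weights must have equal length")
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def nearest_n(target: Sequence[float],
              profiles: Dict[str, Sequence[float]],
              weights: Sequence[float],
              n: int = 5) -> List[Tuple[str, float]]:
    """Return the n closest (player_id, distance) pairs, nearest first."""
    scored = [(pid, weighted_distance(target, vec, weights))
              for pid, vec in profiles.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:n]


if __name__ == "__main__":
    # Hypothetical ids and values, for illustration only.
    pool = {
        "a": [1, 0, 0, 0, 0, 0, 0],
        "b": [0, 2, 0, 0, 0, 0, 0],
        "c": [0, 0, 0, 0, 0, 0, 3],
    }
    print(nearest_n([0] * 7, pool, weights=[1] * 7, n=2))
```

Raising a weight makes differences on that dimension count for more, so the same pool can rank differently depending on which traits the caller cares about.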

6 Resilience gaps closed

  # | Gap                                   | Fix
  1 | Slugged player_id breaks on transfers | Use FBref's own player_id as PK
  2 | Profile builder PL-only               | COMPETITIONS_DEFAULT covers 12 major comps
  3 | Knockout/group/final rounds skipped   | Dropped round-name filter; competition filter alone gates rows
  4 | Stale profiles accumulate             | Auto-DELETE on sync POST for last_match_at > 540 days
  5 | FBref renames a column silently       | _check_schema() warns loudly if expected columns vanish
  6 | Consistency math breaks on multi-comp | consistency = apps / max_apps
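Fixes 4 and 5 can be illustrated with two small helpers. This is a sketch under stated assumptions, not the project's code: the column names in EXPECTED_COLUMNS are hypothetical, check_schema only mirrors the described behaviour of _check_schema(), and is_stale expresses the 540-day rule in Python rather than as the DELETE the API runs.

```python
import warnings
from datetime import datetime, timedelta, timezone
from typing import Iterable, List

# Hypothetical column names; the real scraper's expected set is not shown here.
EXPECTED_COLUMNS = {"Date", "Comp", "Min"}
STALE_AFTER_DAYS = 540  # cutoff stated in the notes above


def check_schema(header: Iterable[str],
                 expected: Iterable[str] = EXPECTED_COLUMNS) -> List[str]:
    """Warn if any expected column is absent; return the missing names."""
    missing = sorted(set(expected) - set(header))
    if missing:
        warnings.warn(
            f"expected columns missing from scraped CSV: {missing}",
            RuntimeWarning,
            stacklevel=2,
        )
    return missing


def is_stale(last_match_at: datetime, now: datetime,
             max_age_days: int = STALE_AFTER_DAYS) -> bool:
    """True when the most recent match is more than max_age_days before now."""
    return now - last_match_at > timedelta(days=max_age_days)


if __name__ == "__main__":
    check_schema(["Date", "Comp"])  # emits a RuntimeWarning about "Min"
    now = datetime(2026, 5, 25, tzinfo=timezone.utc)
    print(is_stale(now - timedelta(days=541), now))
```

A row exactly 540 days old is kept in this sketch, since the comparison is strict; whether the deployed cleanup treats the boundary the same way is not stated in the notes.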

UCL Seed Expansion (2026-05-25 evening)

Widened the seed from "PL roster only" to "PL union UCL roster" (~1,100 unique players vs ~600 PL-only). Code changes touched config.py, scraper.py, and run.py. The LaunchAgent was split in two: com.hinata.football-pl.plist (Mon 03:00) and com.hinata.football-ucl.plist (Tue 03:00).
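The "PL union UCL" seed amounts to merging two rosters and de-duplicating on player_id, so a player who appears in both competitions is scraped once. A minimal sketch, assuming each roster is a mapping of FBref player_id to display name; build_seed and the sample ids are hypothetical and do not reflect the actual structure in config.py or scraper.py.

```python
from typing import Dict, Mapping


def build_seed(*rosters: Mapping[str, str]) -> Dict[str, str]:
    """Union several rosters keyed by player_id; the first name seen wins."""
    seed: Dict[str, str] = {}
    for roster in rosters:
        for player_id, name in roster.items():
            seed.setdefault(player_id, name)
    return seed


if __name__ == "__main__":
    # Hypothetical ids, for illustration only.
    pl = {"p1": "Player One", "p2": "Player Two"}
    ucl = {"p2": "Player Two", "p3": "Player Three"}
    print(len(build_seed(pl, ucl)))
```

Keying on player_id rather than on name follows the same reasoning as gap 1: the id is stable across transfers, while a name or slug may not be.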

Raw-lake decision (Option A shipped)

Treat CSVs as a raw lake. Filter at the consumer, not at scrape. Added EXTRA_KEEP_FILTERS for La Liga, Bundesliga, Serie A, Ligue 1. Zero new HTTP requests.
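Filtering at the consumer means the scraped CSV keeps every row and the reader decides which competitions to use. The sketch below assumes EXTRA_KEEP_FILTERS is a tuple of competition names matched against a "Comp" column. Both the shape of that constant and the column name are assumptions, and the COMPETITIONS_DEFAULT values shown are placeholders rather than the real 12-competition list.

```python
import csv
import io
from typing import Dict, Iterable, List

# Placeholder values; the real COMPETITIONS_DEFAULT covers 12 competitions.
COMPETITIONS_DEFAULT = ("Premier League", "Champions Lg")
# Names taken from the note above; the real constant's shape is assumed.
EXTRA_KEEP_FILTERS = ("La Liga", "Bundesliga", "Serie A", "Ligue 1")


def keep_rows(csv_text: str,
              keep: Iterable[str] = COMPETITIONS_DEFAULT + EXTRA_KEEP_FILTERS,
              comp_column: str = "Comp") -> List[Dict[str, str]]:
    """Read raw CSV text and return only rows whose competition is wanted."""
    wanted = set(keep)
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row.get(comp_column) in wanted]


if __name__ == "__main__":
    raw = (
        "Date,Comp,Min\n"
        "2026-05-01,La Liga,90\n"
        "2026-05-02,Friendlies (M),45\n"
        "2026-05-03,Premier League,90\n"
    )
    for row in keep_rows(raw):
        print(row["Date"], row["Comp"])
```

Because the raw file is untouched, widening the filter later only changes the consumer and needs no new scrape, which matches the "zero new HTTP requests" point.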

Open follow-ups

* #400004 — Phase 2 jimmy-vps deploy

* #400005 — Retire inbox/financial_literacy/HTML Test/football.py

* #400006 — Historical tournament backfill (Euros 2024, WC 2022, etc.)

* #400007 — Live seed-count verification on first PL run (Mon 03:00)

◆ hinata · trunks-scout · folded from documentation/activities/ phase-19