Trunks Scout — Flagship Researcher Hub
owner: Trunks
first_touched: 2026-05-25
last_touched: 2026-05-25
Flagship Researcher Hub Restructure + UCL Seed Expansion
source: documentation/activities/trunks-scout-flagship-researcher.md
Phase 1 — Restructure
Vault
* inbox/football_scraper/ deleted (no unique content vs. the Sandpit copy)
Sandpit
* trunks-scout/ reshaped into a hub with hub-level context.md, overview.md, and README.md, and the tenant trunks-scout/football-researcher/
* git rm --cached removed 200+ wrongly tracked Chrome browser_profile files
Trunks API (Phase 2 — scaffolded, not yet deployed)
Schema: football.player_profiles, with FBref's player_id as PK, a 7-dim profile, position, competitions, last_match_at, and csv_scraped_at, plus data_meta (key-value)
Router: 5 endpoints
* POST /profiles (authenticated bulk upsert)
* GET /profiles[/{id}] (public read)
* POST /match (weighted Euclidean nearest-N)
* GET /meta (data freshness)
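A minimal sketch of the nearest-N lookup behind POST /match, assuming each profile is a 7-element float vector and the request supplies one non-negative weight per dimension. The helper names, the in-memory dict of profiles, and tie-breaking by player_id are hypothetical; the router's actual code may differ.

```python
import heapq
import math
from typing import Mapping, Sequence


def weighted_euclidean(a: Sequence[float], b: Sequence[float], w: Sequence[float]) -> float:
    """Return sqrt(sum_i w_i * (a_i - b_i) ** 2); all three vectors must match in length."""
    if not (len(a) == len(b) == len(w)):
        raise ValueError("query, profile, and weights must have the same length")
    return math.sqrt(sum(wi * (ai - bi) ** 2 for ai, bi, wi in zip(a, b, w)))


def match_nearest(
    query: Sequence[float],
    profiles: Mapping[str, Sequence[float]],
    weights: Sequence[float],
    n: int = 5,
) -> list[tuple[str, float]]:
    """Return up to n (player_id, distance) pairs, nearest first.

    Hypothetical helper: `profiles` maps an FBref player_id to its 7-dim vector.
    Equal distances are ordered by player_id so results are deterministic.
    """
    if any(wi < 0 for wi in weights):
        raise ValueError("weights must be non-negative")
    # Score every profile, then keep the n smallest (distance, player_id) tuples.
    scored = ((weighted_euclidean(query, vec, weights), pid) for pid, vec in profiles.items())
    return [(pid, dist) for dist, pid in heapq.nsmallest(n, scored)]
```

Setting a weight to 0 drops that dimension from the comparison, which is one way a caller could match on a subset of the profile.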
Auto-cleanup: every sync POST deletes rows whose last_match_at is more than 540 days old
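One way to express the 540-day retention rule, sketched against an in-memory SQLite table. The real store, table layout, and timestamp format are not given here, so the two-column player_profiles table, the fixed-width UTC text timestamps, and the helper names below are assumptions.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Optional

STALE_AFTER = timedelta(days=540)  # retention window stated in this note


def to_utc_text(dt: datetime) -> str:
    """Fixed-width UTC text, so string order equals time order (assumed storage format)."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def purge_stale_profiles(conn: sqlite3.Connection, now: Optional[datetime] = None) -> int:
    """Hypothetical cleanup step: delete rows older than the window, return the count."""
    now = now or datetime.now(timezone.utc)
    cutoff = to_utc_text(now - STALE_AFTER)
    # Strict "<": a row exactly 540 days old is kept.
    cur = conn.execute("DELETE FROM player_profiles WHERE last_match_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount
```

Running this right after the bulk upsert in the same request mirrors "every sync POST deletes"; rows exactly 540 days old survive until the next sync.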
Sync client: football-researcher/sync-to-collector.py, with --dry-run, --out, --api-url, and --key flags
Install patch: jimmy-vps-add-football-tenant.sh, an idempotent, additive deploy
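The flag set of sync-to-collector.py could be wired up roughly as below. Only the four flag names come from this note; the flag semantics, the payload shape, the placeholder default URL, the Bearer-token header, and the sync() entry point are assumptions made for illustration.

```python
import argparse
import json
import sys
import urllib.request
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Push player profiles to the Trunks API (sketch).")
    parser.add_argument("--dry-run", action="store_true", help="build the payload but send nothing")
    parser.add_argument("--out", help="also write the JSON payload to this path")
    # Placeholder default; the real collector URL is not given in this note.
    parser.add_argument("--api-url", default="http://localhost:8000", help="API base URL")
    parser.add_argument("--key", help="API key for the authenticated POST /profiles")
    return parser


def sync(profiles: list, argv: Optional[Sequence[str]] = None) -> int:
    """Hypothetical entry point: serialize profiles, optionally save them, then POST."""
    args = build_parser().parse_args(argv)
    # Assumed payload shape: {"profiles": [...]} for the bulk upsert.
    payload = json.dumps({"profiles": profiles}).encode("utf-8")
    if args.out:
        with open(args.out, "wb") as fh:
            fh.write(payload)
    if args.dry_run:
        print(f"dry run: {len(profiles)} profiles, {len(payload)} bytes, nothing sent")
        return 0
    if not args.key:
        print("--key is required unless --dry-run is set", file=sys.stderr)
        return 2
    request = urllib.request.Request(
        args.api_url.rstrip("/") + "/profiles",
        data=payload,
        method="POST",
        # Assumed auth scheme; the real API may expect a different header.
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {args.key}"},
    )
    with urllib.request.urlopen(request) as response:
        print(f"POST /profiles -> HTTP {response.status}")
    return 0
```

For example, `sync(profiles, ["--dry-run", "--out", "payload.json"])` would save the payload for inspection without touching the network.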
6 resilience gaps closed

| # | Gap | Fix |
|---|-----|-----|
| 1 | Slugged player_id breaks on transfers | Use FBref's own player_id as PK |
| 2 | Profile builder was PL-only | COMPETITIONS_DEFAULT covers 12 major competitions |
| 3 | Knockout, group, and final rounds were skipped | Dropped the round-name filter; the competition filter alone gates rows |
| 4 | Stale profiles accumulate | Auto-DELETE on sync POST for rows whose last_match_at is more than 540 days old |
| 5 | FBref silently renames a column | _check_schema() warns loudly if expected columns vanish |
| 6 | Consistency math breaks on multi-competition data | consistency = apps / max_apps |

UCL Seed Expansion (2026-05-25 evening)
Widened the seed from the PL roster only to the union of the PL and UCL rosters (~1,100 unique players vs. ~600 PL-only). Code changes touched config.py, scraper.py, and run.py. The LaunchAgent was split into com.hinata.football-pl.plist (Mon 03:00) and com.hinata.football-ucl.plist (Tue 03:00).
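Deduplicating the seed by FBref player_id is what turns two overlapping rosters into one list of unique players. A minimal sketch, where build_seed is a hypothetical name rather than a function from run.py, and rosters are assumed to be iterables of player_id strings:

```python
from typing import Iterable


def build_seed(*rosters: Iterable[str]) -> list[str]:
    """Union player_ids across competition rosters, keeping first-seen order.

    Hypothetical helper: a player listed in both the PL and UCL rosters
    appears once in the result.
    """
    seen: dict[str, None] = {}  # dicts preserve insertion order
    for roster in rosters:
        for pid in roster:
            seen.setdefault(pid, None)
    return list(seen)
```

Keying on player_id rather than a name slug also keeps a transferred player from being seeded twice.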
Raw-lake decision (Option A shipped)
Treat the CSVs as a raw lake: filter at the consumer, not at scrape time. Added EXTRA_KEEP_FILTERS for La Liga, Bundesliga, Serie A, and Ligue 1; this required zero new HTTP requests.
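A sketch of consumer-side filtering over a raw CSV, assuming each row carries a competition column and that EXTRA_KEEP_FILTERS is a set of competition names; CORE_COMPETITIONS and both helper functions are hypothetical, and the real config may shape these filters differently.

```python
import csv
import io
from typing import Iterable, Iterator

# Hypothetical shapes; the real config may key filters differently.
CORE_COMPETITIONS = {"Premier League", "Champions League"}
EXTRA_KEEP_FILTERS = {"La Liga", "Bundesliga", "Serie A", "Ligue 1"}


def keep_rows(rows: Iterable[dict[str, str]], keep: set[str]) -> Iterator[dict[str, str]]:
    """Yield rows whose 'competition' value is in keep; the raw CSV is never rewritten."""
    for row in rows:
        if row.get("competition", "").strip() in keep:
            yield row


def load_filtered(raw_csv: str) -> list[dict[str, str]]:
    """Read the raw lake file and apply the filter only at read time."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    return list(keep_rows(reader, CORE_COMPETITIONS | EXTRA_KEEP_FILTERS))
```

Because the scrape keeps every row, widening coverage is a change to the keep-set in the consumer, with no extra crawl.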
Open follow-ups
* #400004 — Phase 2 jimmy-vps deploy
* #400005 — Retire inbox/financial_literacy/HTML Test/football.py
* #400006 — Historical tournament backfill (Euros 2024, WC 2022, etc.)
* #400007 — Live seed-count verification on first PL run (Mon 03:00)
◆ hinata · trunks-scout · folded from documentation/activities/ phase-19