# Validation Report — 2026-04-09 Meeting DB-lock fix and manual phase4 timeout follow-up

- Date: 2026-04-09
- Project root: `/home/clawdbot/clawd/Event_management`
- Validation run session: `golden-spruce-leaf`
- Validation report source: `/home/clawdbot/clawd/Event_management/reports/pipeline-golden-spruce-leaf.md`

## Goal

Validate the remediation implemented for the Meeting Vienna DB-lock and scrape-timeout incident, then determine the next remaining failure mode, if any.

## Result summary

The validation run produced a **mixed but useful** result.

### Validated as fixed/mitigated

- `meeting:scrape` passed
- `meeting:deep_dive` passed
- no recurrence of the old `database is locked` hard stage failure was observed in the validated run
- DB runtime remained in the hardened state:
  - `journal_mode = wal`
  - `busy_timeout = 30000`

### Still failing

- `meeting:manual_phase4` still failed
  - failure class: `infra_timeout`
  - duration: `2400.1s`

## Stage outcomes from validation run

From `pipeline-golden-spruce-leaf.md`:

- `meeting:scrape` → `ok` in `2174.22s`
- `meeting:deep_dive` → `ok` in `367.39s`
- `meeting:manual_phase4` → `failed` in `2400.1s` (timeout)

## What this validates

### 1. Original scrape timeout + DB-lock bug is materially improved

This is the main good news. Yesterday's failing Meeting path looked like:

- scrape timeout at 2400s
- later manual phase4 crash on `database is locked`

Today's validation run shows:

- scrape no longer times out at the old 2400s ceiling
- deep_dive completes normally
- the lock-crash pattern is no longer the observed failure mode

That means the DB hardening and scrape-timeout change did real work.

### 2. A second bottleneck is now exposed cleanly

Once the lock/crash noise was removed, the next real bottleneck became visible:

- the `meeting:manual_phase4` runtime design is too heavy for the current timeout budget

## Root cause of the remaining manual phase4 failure

### Culprit

The core problem is a **runtime-budget mismatch inside `_manual_phase4_common.py`**.

The code claims:

- `MAX_EVENT_SECONDS = 60`
- `MAX_PAGE_VISITS_PER_EVENT = 8`

But in reality each page visit can block for:

- `pg.goto(..., timeout=60000)`
- `pg2.goto(..., timeout=60000)`

That means the real worst case is not 60s per event. It can be up to roughly:

- 1 target page × 60s
- plus up to 7 hop pages × 60s
- = **up to ~480s per event** before the caps stop it

So the code enforces the 60-second event budget only **between** navigations, not **around** them.

### Why that matters

The timeout check happens before each navigation call:

- `if time.time() - event_t0 > MAX_EVENT_SECONDS:`

But once a `goto(..., timeout=60000)` starts, that single call can consume the entire event budget by itself. So the event budget is not actually a hard budget. It is more like a polite suggestion.

## Evidence from the validation run

### 1. Manual phase4 candidates

The current validation run started with:

- `candidates = 30`

### 2. Heavy per-event runtime behavior

For the current run:

- several events took ~60–70s each
- repeated blocked/form-only cases hit page-visit caps and/or many follow-up pages

Examples seen in the live phase4 logs:

- `Favoriten in der Kardiologie`
  - `69.34s`
  - `page_visits = 8`
  - `phase4_tag = manual_blocked`
- `10. D-A-C-H – Symposium ...`
  - `66.03s`
  - `phase4_tag = manual_blocked`
- `25th Annual Conference on European Tort Law`
  - `65.74s`
  - `phase4_tag = manual_recovered`
- `REAL CORP 2026 ...`
  - `63.98s`
  - `page_visits = 8`
  - `phase4_tag = manual_form_only`

### 3. The shape of expensive cases

The worst cases are not necessarily the ones that recover useful data. They are often:

- blocked targets
- Cloudflare-ish or anti-bot targets
- targets with many contact-like fallback pages
- domains like `registration.maw.co.at` or other conference microsites where multiple fallback hops are attempted

So the script burns a lot of budget even when the outcome is just `manual_blocked` or `manual_form_only`.

## Practical interpretation

The validated situation is now:

### Fixed enough to count as progress

- DB lock contention no longer appears to be the active killer
- the scrape budget problem is mitigated

### Newly exposed remaining bug

- manual phase4 still has a structural runtime bug
- its per-event caps are weaker than they look
- the stage timeout of 2400s is not realistic for 30 expensive candidates under the current navigation model

## Recommended next remediation direction

1. Add a real per-event deadline around browser navigation calls
   - the effective timeout for each `goto()` should shrink based on the remaining event budget
2. Reduce `MAX_PAGE_VISITS_PER_EVENT` for Meeting Vienna, or make it source-specific
3. Prioritize low-cost/high-value targets first and stop early on low-value fallback paths
4. Add checkpoint/resume for manual phase4 so a timeout does not waste partial progress
5. Revisit the 2400s stage timeout after runtime shaping, not by blindly increasing it

## Related bug records

Existing mitigated bug:

- DB lock + scrape timeout issue
- `/home/clawdbot/clawd/Event_management/docs/bugs.jsonl`

The new runtime bug should be tracked separately because it is a distinct root cause:

- manual phase4 per-event timeout mismatch / stage budget exhaustion

## Bottom line

The first remediation worked well enough to expose the next real culprit. That culprit is no longer DB locking. It is the mismatch between:

- the nominal event budget (`60s`)
- actual browser wait behavior (`60s` per navigation, repeated multiple times)

In short: the script thinks in fox-time, but the browser waits in glacier-time.
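
---

Appendix: the deadline-aware navigation timeout from recommendation 1 can be sketched in a few lines. This is a minimal sketch, not the project's code: the constants mirror the report's description of `_manual_phase4_common.py`, but `MIN_NAV_TIMEOUT_MS` and the helper `remaining_nav_timeout_ms` are hypothetical names introduced here for illustration.

```python
import time

# Constant names mirror the report's description of _manual_phase4_common.py;
# MIN_NAV_TIMEOUT_MS and the helper below are hypothetical additions.
MAX_EVENT_SECONDS = 60      # nominal per-event budget
NAV_TIMEOUT_MS = 60_000     # current fixed goto() timeout
MIN_NAV_TIMEOUT_MS = 2_000  # below this, a navigation is not worth starting

def remaining_nav_timeout_ms(event_t0, now=None):
    """Shrink the goto() timeout to whatever is left of the event budget.

    Returns 0 when the remaining budget is too small to justify another
    navigation; the caller should treat that as "stop visiting pages".
    """
    now = time.time() if now is None else now
    remaining_ms = int((MAX_EVENT_SECONDS - (now - event_t0)) * 1000)
    if remaining_ms < MIN_NAV_TIMEOUT_MS:
        return 0
    return min(NAV_TIMEOUT_MS, remaining_ms)

# Hypothetical call site, with pg being the Playwright-style page object:
#
#     t = remaining_nav_timeout_ms(event_t0)
#     if t == 0:
#         break  # hard per-event deadline reached, even mid-hop
#     pg.goto(url, timeout=t)
```

With this shape, a slow navigation can overshoot the event budget by at most one truncated timeout rather than a full 60s, so the worst case per event collapses from the ~480s described above back toward the nominal 60s.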