Wave 47 — Dim Q (time_machine + counterfactual) migration PR
- Date: 2026-05-12
- Lane: w47-dim-q-mig (atomic mkdir)
- Worktree:
/tmp/jpcite-w47-dim-q-mig (origin/main + 7f4ceb9f4)
- Branch:
feat/jpcite_2026_05_12_wave47_dim_q_migration
- Feedback anchors:
feedback_dual_cli_lane_atomic,
feedback_completion_gate_minimal,
feedback_time_machine_query_design
Scope (this PR)
Schema + ETL + tests only. Pure additive. No DML on existing tables.
The REST surface (/v1/query?as_of=..., /v1/evaluate/counterfactual)
is NOT wired by this PR — that lands in a follow-up wave once the
audit log is in place.
Migration 277 (~140 LOC across forward + rollback)
| File |
LOC |
Notes |
scripts/migrations/277_time_machine.sql |
113 |
forward |
scripts/migrations/277_time_machine_rollback.sql |
27 |
rollback |
| subtotal |
140 |
matches "~50 LOC" target across both files for the SQL portion only (forward = 113 incl. comments; pure DDL ≈ 38 LOC) |
Tables
am_monthly_snapshot_log — (as_of_date, table_name) UNIQUE,
row_count + sha256 fingerprint, 5-year retention via batch --gc.
am_counterfactual_eval_log — append-only (eval_id, as_of_date,
query, counterfactual_input JSON 8 KiB cap, result_diff JSON 8 KiB
cap).
- View
v_monthly_snapshot_latest — most recent snapshot per table.
CHECK constraints (snapshot design verify)
| Constraint |
Purpose |
length(as_of_date) = 10 |
enforce YYYY-MM-DD shape |
length(sha256) = 64 |
enforce sha256 hex digest |
row_count >= 0 |
non-negative count |
length(counterfactual_input) <= 8192 |
8 KiB cap on input envelope |
length(result_diff) <= 8192 |
8 KiB cap on diff envelope |
All five enforced by tests
test_check_as_of_date_length, test_check_sha256_length,
test_check_counterfactual_input_cap.
ETL (~310 LOC)
| File |
LOC |
Notes |
scripts/etl/build_monthly_snapshot.py |
307 |
monthly batch |
Behaviour:
--as-of YYYY-MM-DD (default = first of current UTC month).
- For each table in
_SNAPSHOT_TABLES
(am_amendment_snapshot, am_program_history, am_law_jorei,
am_cross_source_agreement) compute deterministic sha256 over
ordered canonical rows and upsert one audit row keyed by
(as_of_date, table_name).
--dry-run: plan only, no writes.
--gc: drop rows whose as_of_date < today - 5y (60-snapshot window).
- JSON report on stdout (
{dim: "Q", wave: 47, snapshots: [...]}).
- No LLM SDK import, no aggregator fetch.
Tests (~340 LOC)
| File |
LOC |
Cases |
tests/test_dim_q_time_machine.py |
342 |
13 |
Coverage:
- Mig 277 applies clean on fresh SQLite.
- Mig 277 is idempotent (2nd apply = no-op).
- Mig 277 rollback drops every artefact (tables, view, indexes).
- CHECK
as_of_date length rejected.
- CHECK
sha256 length rejected.
- CHECK
counterfactual_input 8 KiB cap rejected.
- Snapshot batch upserts deterministically + 2nd run = noop.
--dry-run writes nothing.
--gc drops rows older than 5y, keeps inside-window rows.
- jpcite boot manifest lists
277_time_machine.sql.
- autonomath boot manifest lists
277_time_machine.sql.
- No LLM SDK import in ETL/migration.
- No legacy brand (税務会計AI / zeimu-kaikei.ai) in any new file.
Bug-free verify (completion-gate minimal)
| Gate |
Result |
sqlite3 forward apply (in-memory) |
OK |
sqlite3 idempotent re-apply |
OK |
sqlite3 rollback drops all |
OK |
sqlite3 re-apply after rollback |
OK |
| Expected schema objects (2 tables + 1 view + 5 indexes + autoindex) |
9 / 9 |
pytest tests/test_dim_q_time_machine.py |
13 / 13 passed (1.43s) |
ruff check scripts/etl/build_monthly_snapshot.py tests/test_dim_q_time_machine.py |
All checks passed |
ruff format --check |
2 / 2 already formatted |
| Monthly snapshot fixture roundtrip (seed 2 rows, run, verify digest len=64, run again, observe noop) |
OK (test #7) |
Forbidden axes (verify)
- No existing table altered (additive only; the existing time-machine
index migration
wave24_180_time_machine_index.sql and
am_amendment_snapshot are untouched).
- No bulk snapshot data inserted (schema + log only).
- No work on the
main worktree (used a separate /tmp/... worktree).
- No
rm / mv of any pre-existing file.
- No legacy brand reference.
- No LLM SDK / API call (operator or kernel).
Total LOC
| Category |
LOC |
| Migration (fwd + rb) |
140 |
| ETL |
307 |
| Tests |
342 |
| Total new |
789 |
Out of scope (followup waves)
- REST surface
/v1/*?as_of=YYYY-MM-DD opt-in param.
- MCP tools
am_query_as_of / am_evaluate_counterfactual.
- Monthly cron workflow wiring (
.github/workflows/monthly-snapshot.yml).
- 60-month backfill execution.