If you think AI is ready to one-shot complex systems, you’re wrong. Unless, of course, you think wrapping a Todo list in FastAPI and calling it an “Asynchronous Event-Driven Agent” counts as production engineering.
Take a look at this triage report. Ready for this?
Even after the most sophisticated AI coder in the world took a stab at this atomic task, and after nearly 90 deterministic checks across runtime, pre-commit, and CI, our Bot Triage (Copilot is a little grumpy today) still flagged 9 findings on the diff.
The Good, the Bad, and the Ugly.
The Ugly
That’s how we opened this post. The false sense of security masked in “revolutionary” vernacular.
The Bad
I have a few nits in here that could have been picked up with more aggressive pre-commit hooks. We’ll tune the rails and move on.
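As one possible tightening, assuming the pre-commit stage runs ESLint (the post does not say which linter is in use), the two lint-level findings in the table, variable shadowing and an unused variable, map onto the core `no-shadow` and `no-unused-vars` rules. A hypothetical flat-config fragment, where the file glob and project layout are assumptions:

```javascript
// eslint.config.js (hypothetical; assumes ESLint flat config in a CommonJS project)
const config = [
  {
    files: ["**/*.js"], // assumed glob, adjust to the real layout
    rules: {
      // Flags a local `const checks = ...` that hides a module-level `checks`.
      "no-shadow": "error",
      // Flags variables that are declared but never read, such as an unused `task`.
      "no-unused-vars": "error",
    },
  },
];

module.exports = config;
```

Setting both rules to `"error"` rather than `"warn"` is what would make a pre-commit hook actually block the commit.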
The Good
6 of those 9 findings were pure logic nits. We’re talking about zone-boundary violations, non-worker role enforcement, and missing ownership guards.
KEY POINT: These are architectural gaps that cannot be “shifted left.” You can’t lint for a logic mismatch against acceptance criteria that exist only in the context of the feature’s intent.
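To make that concrete, here is a minimal sketch of the non-worker role gap from the triage table. Both functions below are syntactically clean and would pass a linter; only the second matches the #3141 acceptance criterion of blocking writes by non-worker roles. The function names, the `role` and `zone` fields, and the `{ continue: false, reason }` shape are assumptions for illustration, not the actual contents of vscode-pre.js.

```javascript
// Hypothetical sketch: names and object shapes are illustrative assumptions,
// not the real vscode-pre.js.

// Assumed zone rule for the sketch: a worker may only write under its zone prefix.
function insideZone(taskConfig, filePath) {
  return filePath.startsWith(taskConfig.zone);
}

// As flagged: any role that is not "worker" is waved through.
function preHookAsFlagged(taskConfig, filePath) {
  if (!taskConfig) return { continue: true }; // fail-open when no task config
  if (taskConfig.role !== "worker") return { continue: true }; // the gap
  return insideZone(taskConfig, filePath)
    ? { continue: true }
    : { continue: false, reason: "zone-boundary violation" };
}

// Per the acceptance criterion: non-worker roles are blocked from writing.
function preHookPerCriteria(taskConfig, filePath) {
  if (!taskConfig) return { continue: true }; // still fail-open
  if (taskConfig.role !== "worker") {
    return { continue: false, reason: `writes blocked for role "${taskConfig.role}"` };
  }
  return insideZone(taskConfig, filePath)
    ? { continue: true }
    : { continue: false, reason: "zone-boundary violation" };
}

const reviewer = { role: "reviewer", zone: "src/hooks/" };
console.log(preHookAsFlagged(reviewer, "src/hooks/vscode-pre.js").continue);   // true
console.log(preHookPerCriteria(reviewer, "src/hooks/vscode-pre.js").continue); // false
```

The only difference between the two is one return value, and nothing in the code itself says which one is right. That information lives in the acceptance criteria, which is why a reviewer (human or bot) with the issue in hand catches it and a static check does not.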
The Result
In a standard workflow where your developers use AI under this kind of governance, the “grunt work” is gone. As the human reviewer, you aren’t wasting cycles on type errors or variable shadowing. You are focused on senior-level architectural integrity.
Our system ships 30+ PRs per day, not because we “one-shot,” but because we trust highly tuned guardrails. Our governance rails are available for you to leverage, and yes, they support true multi-agent orchestration, not the “weekend warrior” version.
Bot finding triage — 9 findings (5 HIGH, 2 MEDIUM, 2 LOW)
| # | Source | Finding | File |
|---|---|---|---|
| 1 | Copilot | Test name/assertion mismatch: `blocks writes by non-worker roles` test passes `taskConfig=null` and asserts allow. It doesn’t test role blocking; rename to `allows writes when no worker task config (fail-open)` or fix the test to actually test role enforcement. | vscode-entry-points.test.js |
| 2 | Copilot | Pre-hook skips non-worker enforcement: `checkVscodePre()` returns `{ continue: true }` for non-worker roles. #3141 acceptance criteria requires blocking writes by non-worker roles. | vscode-pre.js |
| 3 | Copilot | Pre-hook only runs zone-boundary: `checkVscodePre()` only executes zone-boundary check even though MatcherRouter’s write chain includes budget/monolith/hardcode/test-gate. Either run the full chain or update the docs to reflect the actual scope. | vscode-pre.js |
| 4 | Copilot | Post-hook missing monolith/hardcode: `checkVscodePost()` only runs `checkBudget()`. #3141 acceptance criteria also requires monolith violation and hardcoded secret detection. | vscode-post.js |
| 5 | Copilot | Stop hook missing test-gate: `checkVscodeStop()` only checks receipt token but #3141 requires stop test-gate (block exit if source written without tests). | vscode-stop.js |

| # | Source | Finding | File |
|---|---|---|---|
| 6 | Copilot | Stop hook missing session ownership guard: `loadTaskConfig()` loads singleton task config without checking session ownership, so it could incorrectly block unrelated sessions. | vscode-stop.js |
| 7 | Copilot | Missing post-hook tests: Tests only cover budget enforcement, not monolith/hardcode paths. | vscode-entry-points.test.js |

| # | Source | Finding | File |
|---|---|---|---|
| 8 | Copilot | Variable shadowing: module-level `checks` import shadowed by local `const checks = routeToChecks(...)`. Rename inner to `routedChecks`. | vscode-pre.js |
| 9 | CodeQL | Unused variable `task` in role-blocking test. Remove it. | vscode-entry-points.test.js |
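
For a sense of what closing the two stop-hook findings could look like, here is a rough sketch that combines a session ownership guard with the stop test-gate. Everything in it is an assumption made for illustration: the `sessionId` and `writtenFiles` fields, the `.test.js` naming convention, and the return shape are hypothetical, and the receipt-token check mentioned in the table is left out.

```javascript
// Hypothetical sketch: field names, the `.test.js` convention, and the return
// shape are assumptions, not the real vscode-stop.js.

function isTestFile(path) {
  return path.endsWith(".test.js");
}

function isSourceFile(path) {
  return path.endsWith(".js") && !isTestFile(path);
}

function stopHookSketch(taskConfig, session) {
  // Ownership guard: a task config owned by another session (or none at all)
  // must not block this session from exiting.
  if (!taskConfig || taskConfig.sessionId !== session.id) {
    return { continue: true };
  }

  // Test gate: block exit if source was written without any test being written.
  const wroteSource = session.writtenFiles.some(isSourceFile);
  const wroteTests = session.writtenFiles.some(isTestFile);
  if (wroteSource && !wroteTests) {
    return { continue: false, reason: "source written without tests" };
  }
  return { continue: true };
}

const task = { sessionId: "a" };

// Owning session wrote source only: blocked.
console.log(stopHookSketch(task, { id: "a", writtenFiles: ["vscode-stop.js"] }).continue); // false

// Unrelated session: the guard lets it through untouched.
console.log(stopHookSketch(task, { id: "b", writtenFiles: ["vscode-stop.js"] }).continue); // true

// Owning session wrote source and a test: allowed.
console.log(
  stopHookSketch(task, {
    id: "a",
    writtenFiles: ["vscode-stop.js", "vscode-entry-points.test.js"],
  }).continue
); // true
```

Running the ownership check first is a deliberate choice in this sketch: it keeps a singleton task config from ever reaching the gate logic for a session it does not belong to.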