290 Commits in 5 Days: How AI Agents Cleared 10,000+ Quality Issues in CrewHub
Between Tuesday February 24 and Saturday February 28, 2026, the CrewHub repository received 290 commits. Not from a team of engineers pulling all-nighters. From dozens of isolated AI agent sessions, each doing one small job, pushing one branch, and terminating.
This post is an engineering retrospective on how that worked — the orchestration, the phases, the self-healing loop, and the honest mess that came with it.
The Starting Point
CrewHub went from v0.15 to v0.17 in weeks. Streaming chat, voice messages, 3D prop generation, mobile views — all shipping fast, all shipping without tests. The SonarQube dashboard told the story:
- 10,000+ quality issues across dozens of rules
- New code coverage at 22.1% against an 80% threshold
- Quality Gate: red. Embarrassingly red.
For a one-person team, fixing this manually wasn’t realistic. So we didn’t. We built a system that could fix it while we slept.
How the System Works
This is the most important part of this post. The results are impressive, but the mechanism is what makes it repeatable.
The Core Principle: Stateless, Focused Sessions
Every AI agent session gets exactly one job:
- Fix SonarQube rule S6759 across all frontend files, or
- Write tests for `MobileCreatorView` and `MobileSettingsPanel`, or
- Run `eslint --fix` on the new test files from last night
That’s it. The session doesn’t know about any other session. It doesn’t know 40 other agents ran before it. It doesn’t need to. It reads the relevant files, does its work, commits, pushes its branch, and exits.
This is the key insight: isolation eliminates coordination overhead. No merge conflicts. No shared state. No agent-to-agent communication. Each session is a pure function: codebase in, branch out.
Orchestration via OpenClaw Cron Jobs
OpenClaw manages the scheduling. A cron job fires at the designated time, spawning a new agent session with a prompt that defines:
- The target — which SonarQube rule or which modules to cover
- The branch name — unique per session, following a convention like `fix/sonar-S6759` or `test/mobile-views`
- The exit criteria — “commit and push when done, don’t open a PR”
Sessions are staggered to avoid resource contention. Each gets its own isolated context. The orchestrator doesn’t track progress mid-session — it just fires and forgets.
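To make the fire-and-forget loop concrete, here is a minimal Python sketch of what a cron-fired spawn might look like. `SessionJob` and `spawn_session` are illustrative names, not the actual OpenClaw API:

```python
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class SessionJob:
    target: str         # which SonarQube rule or modules to cover
    branch: str         # unique branch name for this session
    exit_criteria: str  # how the session should finish

def spawn_session(job: SessionJob) -> str:
    # Build the one-shot prompt for an isolated agent session.
    # In the real system this would be handed to OpenClaw; here we return it.
    return (f"{job.target}\n"
            f"Work on branch {job.branch}.\n"
            f"{job.exit_criteria}")

jobs = [
    SessionJob("Fix SonarQube rule S6759 across all frontend files",
               "fix/sonar-S6759",
               "Commit and push when done; don't open a PR."),
    SessionJob("Write tests for MobileCreatorView and MobileSettingsPanel",
               "test/mobile-views",
               "Commit and push when done; don't open a PR."),
]

prompts = []
for job in jobs:
    prompts.append(spawn_session(job))
    time.sleep(0)  # stagger sessions in production (minutes apart, not zero)
```

Each prompt is self-contained on purpose: the session that receives it never needs to know any other session exists.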
The Morning Merge
Every morning, the human reviews what landed overnight:
- Check which branches were pushed
- Review the diffs (skim — the patterns are repetitive by design)
- Merge into main in sequence
- Run the full CI pipeline once
This is the human-in-the-loop moment. The AI does the volume work; the human validates the result. In practice, most branches merge cleanly because they touch different files.
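The steps above can be expressed as a simple command plan. This is a hedged sketch of the routine, not the actual merge script; the `vitest` call is a stand-in for the real CI pipeline:

```python
def plan_morning_merge(branches: list[str]) -> list[list[str]]:
    # Merge every overnight branch into main in sequence,
    # then run the full CI pipeline exactly once at the end.
    cmds = [["git", "checkout", "main"]]
    for branch in branches:
        cmds.append(["git", "merge", "--no-ff", branch])  # after skimming the diff
    cmds.append(["npx", "vitest", "run"])  # stand-in for the full CI run
    return cmds

plan = plan_morning_merge(["fix/sonar-S6759", "test/mobile-views"])
```

Running CI once at the end, rather than per branch, is what keeps the morning review down to minutes.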
The 4:30 AM Self-Healing Check
Here’s the part that makes the system adaptive. A conditional cron job runs at 4:30 AM:
```
if [ coverage < 80% ]; then
  spawn 4 more targeted sessions
fi
```
The system checks its own progress. If the overnight sessions didn’t hit the threshold, it identifies the modules with the lowest coverage and spawns additional targeted sessions. No human intervention needed until morning.
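The selection step can be sketched like this. The module names and the flat-average simplification are ours for illustration; SonarQube's real gate is measured on new-code coverage:

```python
def pick_targets(coverage_by_module: dict[str, float],
                 threshold: float = 80.0, n: int = 4) -> list[str]:
    # If overall coverage is below the threshold, return the n
    # lowest-covered modules; each gets its own targeted session.
    overall = sum(coverage_by_module.values()) / len(coverage_by_module)
    if overall >= threshold:
        return []
    return sorted(coverage_by_module, key=coverage_by_module.get)[:n]

# Hypothetical module-level numbers for illustration.
report = {"world3d": 31.0, "mobile": 55.0, "onboarding": 72.0,
          "connections": 88.0, "generators": 64.0}
targets = pick_targets(report)
```

The orchestrator then spawns one session per returned module, same as any other cron-fired job.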
In our case, the first four coverage sessions were enough. But the infrastructure exists for the nights when they’re not.
Why This Works
Traditional approaches to tech debt cleanup fail for predictable reasons:
- “Let’s all stop and fix it” — nobody wants to, momentum dies
- “One big PR” — merge conflicts, review fatigue, context overload
- “Gradually chip away” — never happens, features always win
The agent approach sidesteps all of this:
- No developer time spent — it runs overnight
- No big PRs — dozens of small, focused branches
- No coordination — each session is independent
- No willpower required — it’s a cron job, not a commitment
Phase 1: SonarQube Quality Gate Cleanup (139 Commits)
The first phase targeted SonarQube rules directly. Each AI session received one rule ID and a list of affected files. The session would read the rule description, understand the fix pattern, and apply it systematically across all flagged locations.
Here’s what was cleaned up, rule by rule:
| Rule | Issues Fixed | What It Was |
|---|---|---|
| S6759 | 610 | React prop interfaces not marked readonly |
| S6479 | 422 | Array index used as React key — replaced with stable IDs |
| S3358 | 246 | Nested ternary expressions — extracted to named variables or early returns |
| S8415 | 260 | FastAPI HTTPException missing documented status codes |
| S8410 (BLOCKER) | 85 | FastAPI Depends() missing proper type hints |
| S7748/S7781/S7723 | 319 | Combined: optional chaining, nullish coalescing, logical assignment |
| S7735/S7747 | 165 | Negated conditions in if/else — flipped for readability |
| S5145 | 44 | User input passed directly to logging — sanitized |
| S1082/S6848 | ~100 | Missing keyboard event handlers + ARIA roles on interactive elements |
| S6853 | 43 | Form inputs missing associated labels |
| S2933/S6819 | 125 | Class members not marked readonly + <img> tags missing alt attributes |
| S3776 | Multiple batches | Cognitive complexity reduction — extracted helper functions, simplified control flow |
Each rule got its own session (sometimes multiple sessions for high-count rules like S6759 and S6479). Each session pushed its own branch. Total: 139 commits across the week.
The pattern was remarkably consistent. An agent fixing S6759 would:
- Get a list of files with violations
- Open each file
- Find every `interface` or `type` defining React props
- Add `readonly` modifiers
- Run the type checker to confirm nothing broke
- Commit and push
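The steps above can be sketched as a naive text transform. A real session relied on the agent actually reading each file plus the type checker; this regex version only illustrates the shape of the fix, and the `...Props` naming convention is an assumption:

```python
import re

# Prefix each property line with `readonly` unless it already has it.
PROP_LINE = re.compile(r"^(\s*)(?!readonly\b)(\w+\??\s*:)", re.MULTILINE)

def add_readonly(body: str) -> str:
    return PROP_LINE.sub(r"\1readonly \2", body)

def fix_props_interfaces(source: str) -> str:
    # Assumption: React prop interfaces follow a `FooProps` naming convention.
    pattern = re.compile(r"(interface \w*Props \{)(.*?)(\})", re.DOTALL)
    return pattern.sub(
        lambda m: m.group(1) + add_readonly(m.group(2)) + m.group(3), source)

sample = """interface ButtonProps {
  label: string;
  onClick?: () => void;
}"""
fixed = fix_props_interfaces(sample)
```

The negative lookahead makes the transform idempotent, so re-running a session over already-fixed files is harmless.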
Multiply that by a dozen rules. That’s Phase 1.
Phase 2: The Coverage Sprint (82 New Test Files, 23,000+ Lines)
With the Quality Gate rules handled, the next problem was coverage. Four overnight sessions, each targeting a different area:
Night 1: Backend Connections + Prop Services
The integration layer — modules managing connections to external AI providers. claude_code, codex, device_identity, and the prop services pipeline. Complex async behavior with WebSocket handling and retry logic. The agent wrote pytest suites covering happy paths, error handling, timeout scenarios, and reconnection logic.
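The retry paths are the interesting part to test. A condensed illustration of the pattern, where the `Connection` class is a stand-in for the real connection modules, not their actual API:

```python
import asyncio
from unittest.mock import AsyncMock

class Connection:
    """Minimal stand-in: send with bounded retries on connection errors."""
    def __init__(self, transport, retries: int = 3):
        self.transport, self.retries = transport, retries

    async def send(self, msg: str):
        for attempt in range(self.retries):
            try:
                return await self.transport.send(msg)
            except ConnectionError:
                if attempt == self.retries - 1:
                    raise
                await asyncio.sleep(0)  # real code backs off here

# First call fails, second succeeds: the reconnection path under test.
transport = AsyncMock()
transport.send.side_effect = [ConnectionError(), "ok"]
result = asyncio.run(Connection(transport).send("ping"))
```

`AsyncMock` makes this cheap: one `side_effect` list covers the failure and the recovery in a single test.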
Night 2: Mobile Views
Frontend mobile components: MobileCreatorView, MobileSettingsPanel, MobileAgentChat, and layout components. Vitest + React Testing Library. The agent mocked the API layer and wrote behavior tests — user interactions, state transitions, conditional rendering. Not just “does it render” but “does it do the right thing when the user taps X.”
Night 3: Onboarding + Generators
Two areas in one session. Frontend: the onboarding wizard and developer tools. Backend: the generator modules — multi_pass, style_transfer, prop_generator — the creative engine that orchestrates multiple AI calls to produce 3D props. Complex async chains with streaming responses, requiring careful mock setup for each stage of the pipeline.
Night 4: World3D
The hardest target. FullscreenPropMaker, GridRoomRenderer, FirstPersonController. Three.js and React Three Fiber components — canvas mocking, WebGL stubs, animation frame handling. This session also ran second passes on modules from earlier nights where coverage was still below threshold.
Result: 82 new test files, 23,000+ lines of test code. All tests passing. Coverage clearing the 80% threshold.
Phase 3: The Cleanup (The Honest Part)
Here’s where we stop pretending everything was clean.
The AI agent sessions that wrote 82 test files did not have our ESLint or ruff configs in their context. They wrote code that passed the test runner but not the linter. The aftermath:
- 615 ESLint issues — unused imports, `any` types where proper mock types should have been, inconsistent formatting, missing return type annotations
- 14 ruff errors — import ordering violations, style inconsistencies
This is the honest trade-off. AI-generated code is functional, systematic, and thorough. It is also sloppy about style. The agents optimized for “tests pass” not “linter is happy.”
We spawned one more AI session focused entirely on lint cleanup. `npx eslint --fix` and `ruff check --fix` handled the bulk automatically. The session committed the fixes and pushed. The remaining handful of issues needed human judgment — mostly cases where the `any` type needed a real interface definition.
Lesson for next time: include lint configs in the agent context from the start. Those 615 issues were entirely avoidable.
The Numbers
| Metric | Value |
|---|---|
| Total commits | 290 |
| Calendar days | 5 (Tue–Sat) |
| SonarQube rules addressed | 15+ |
| Individual issues fixed | 10,000+ |
| New test files | 82 |
| Lines of test code | 23,000+ |
| ESLint issues cleaned up | 615 |
| Ruff errors cleaned up | 14 |
| Coverage before | 22.1% |
| Coverage after | >80% |
| Quality Gate | 🔴 → 🟢 |
| Developer hours spent | ~4 (setup + morning reviews) |
What We’d Do Differently
Lint configs in context. The 615-issue cleanup was a self-inflicted wound. Every agent session should have the project’s lint configuration from the start.
Smaller batches for high-count rules. S6759 (610 issues) worked fine in a single session, but it pushed the context window. Two sessions of 300 would have been safer.
Dry-run the self-healing check. We set up the 4:30 AM conditional spawn but only tested it in production. Should have done a dry run the night before.
Second passes from the start. Night 4 included second-pass work on earlier modules. Planning for two-pass coverage from the beginning would have produced cleaner results.
The Meta-Point
This is exactly what CrewHub is built to enable — orchestrating AI agents for systematic work. We used our own tooling to fix our own technical debt. The pattern is not specific to test coverage or SonarQube. It applies to any maintenance work that’s:
- Repetitive — the same fix pattern across many files
- Systematic — clear rules, not judgment calls
- Parallelizable — different files, no shared state
- Low-risk — a bad test is easy to delete, unlike a bad migration
Documentation generation. Dependency updates. Accessibility fixes. Security audit remediation. Migration scripts. Any of these could use the same approach: focused sessions, isolated branches, overnight execution, morning review.
The machines did the overnight shift. The Quality Gate is green. And the pattern is ready for next time.
CrewHub is open source under AGPL-3.0. Join the Discord or view on GitHub.