290 Commits in 5 Days: How AI Agents Cleared 10,000+ Quality Issues in CrewHub
Between Tuesday February 24 and Saturday February 28, 2026, the CrewHub repository received 290 commits. Not from a team of engineers pulling all-nighters. From dozens of isolated AI agent sessions, each doing one small job, pushing one branch, and terminating.
This post is an engineering retrospective on how that worked — the orchestration, the phases, the self-healing loop, and the honest mess that came with it.
The Starting Point
CrewHub went from v0.15 to v0.17 in weeks. Streaming chat, voice messages, 3D prop generation, mobile views — all shipping fast, all shipping without tests. The SonarQube dashboard told the story:
- 10,000+ quality issues across dozens of rules
- New code coverage at 22.1% against an 80% threshold
- Quality Gate: red. Embarrassingly red.
For a one-person team, fixing this manually wasn’t realistic. So we didn’t. We built a system that could fix it while we slept.
How the System Works
This is the most important part of this post. The results are impressive, but the mechanism is what makes it repeatable.
The Core Principle: Stateless, Focused Sessions
Every AI agent session gets exactly one job:
- Fix SonarQube rule S6759 across all frontend files, or
- Write tests for `MobileCreatorView` and `MobileSettingsPanel`, or
- Run `eslint --fix` on the new test files from last night
That’s it. The session doesn’t know about any other session. It doesn’t know 40 other agents ran before it. It doesn’t need to. It reads the relevant files, does its work, commits, pushes its branch, and exits.
This is the key insight: isolation eliminates coordination overhead. No merge conflicts. No shared state. No agent-to-agent communication. Each session is a pure function: codebase in, branch out.
Orchestration via OpenClaw Cron Jobs
OpenClaw manages the scheduling. A cron job fires at the designated time, spawning a new agent session with a prompt that defines:
- The target — which SonarQube rule or which modules to cover
- The branch name — unique per session, following a convention like `fix/sonar-S6759` or `test/mobile-views`
- The exit criteria — “commit and push when done, don’t open a PR”
Sessions are staggered to avoid resource contention. Each gets its own isolated context. The orchestrator doesn’t track progress mid-session — it just fires and forgets.
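To make the fire-and-forget loop concrete, here is a minimal Python sketch of what a cron-fired spawn might look like. `SessionJob` and `spawn_session` are illustrative names, not the actual OpenClaw API:

```python
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class SessionJob:
    target: str         # which SonarQube rule or modules to cover
    branch: str         # unique branch name for this session
    exit_criteria: str  # how the session should finish

def spawn_session(job: SessionJob) -> str:
    # Build the one-shot prompt for an isolated agent session.
    # In the real system this would be handed to OpenClaw; here we return it.
    return (f"{job.target}\n"
            f"Work on branch {job.branch}.\n"
            f"{job.exit_criteria}")

jobs = [
    SessionJob("Fix SonarQube rule S6759 across all frontend files",
               "fix/sonar-S6759",
               "Commit and push when done; don't open a PR."),
    SessionJob("Write tests for MobileCreatorView and MobileSettingsPanel",
               "test/mobile-views",
               "Commit and push when done; don't open a PR."),
]

prompts = []
for job in jobs:
    prompts.append(spawn_session(job))
    time.sleep(0)  # stagger sessions in production (minutes apart, not zero)
```

Each prompt is self-contained on purpose: the session that receives it never needs to know any other session exists.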
The Morning Merge
Every morning, the human reviews what landed overnight:
- Check which branches were pushed
- Review the diffs (skim — the patterns are repetitive by design)
- Merge into main in sequence
- Run the full CI pipeline once
This is the human-in-the-loop moment. The AI does the volume work; the human validates the result. In practice, most branches merge cleanly because they touch different files.
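The steps above can be expressed as a simple command plan. This is a hedged sketch of the routine, not the actual merge script; the `vitest` call is a stand-in for the real CI pipeline:

```python
def plan_morning_merge(branches: list[str]) -> list[list[str]]:
    # Merge every overnight branch into main in sequence,
    # then run the full CI pipeline exactly once at the end.
    cmds = [["git", "checkout", "main"]]
    for branch in branches:
        cmds.append(["git", "merge", "--no-ff", branch])  # after skimming the diff
    cmds.append(["npx", "vitest", "run"])  # stand-in for the full CI run
    return cmds

plan = plan_morning_merge(["fix/sonar-S6759", "test/mobile-views"])
```

Running CI once at the end, rather than per branch, is what keeps the morning review down to minutes.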
The 4:30 AM Self-Healing Check
Here’s the part that makes the system adaptive. A conditional cron job runs at 4:30 AM:
```
if [ coverage < 80% ]; then
  spawn 4 more targeted sessions
fi
```
The system checks its own progress. If the overnight sessions didn’t hit the threshold, it identifies the modules with the lowest coverage and spawns additional targeted sessions. No human intervention needed until morning.
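The selection step can be sketched like this. The module names and the flat-average simplification are ours for illustration; SonarQube's real gate is measured on new-code coverage:

```python
def pick_targets(coverage_by_module: dict[str, float],
                 threshold: float = 80.0, n: int = 4) -> list[str]:
    # If overall coverage is below the threshold, return the n
    # lowest-covered modules; each gets its own targeted session.
    overall = sum(coverage_by_module.values()) / len(coverage_by_module)
    if overall >= threshold:
        return []
    return sorted(coverage_by_module, key=coverage_by_module.get)[:n]

# Hypothetical module-level numbers for illustration.
report = {"world3d": 31.0, "mobile": 55.0, "onboarding": 72.0,
          "connections": 88.0, "generators": 64.0}
targets = pick_targets(report)
```

The orchestrator then spawns one session per returned module, same as any other cron-fired job.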
In our case, the first four coverage sessions were enough. But the infrastructure exists for the nights when they’re not.
Why This Works
Traditional approaches to tech debt cleanup fail for predictable reasons:
- “Let’s all stop and fix it” — nobody wants to, momentum dies
- “One big PR” — merge conflicts, review fatigue, context overload
- “Gradually chip away” — never happens, features always win
The agent approach sidesteps all of this:
- No developer time spent — it runs overnight
- No big PRs — dozens of small, focused branches
- No coordination — each session is independent
- No willpower required — it’s a cron job, not a commitment
Phase 1: SonarQube Quality Gate Cleanup (139 Commits)
The first phase targeted SonarQube rules directly. Each AI session received one rule ID and a list of affected files. The session would read the rule description, understand the fix pattern, and apply it systematically across all flagged locations.
Here’s what was cleaned up, rule by rule:
| Rule | Issues Fixed | What It Was |
|---|---|---|
| S6759 | 610 | React prop interfaces not marked readonly |
| S6479 | 422 | Array index used as React key — replaced with stable IDs |
| S3358 | 246 | Nested ternary expressions — extracted to named variables or early returns |
| S8415 | 260 | FastAPI HTTPException missing documented status codes |
| S8410 (BLOCKER) | 85 | FastAPI Depends() missing proper type hints |
| S7748/S7781/S7723 | 319 | Combined: optional chaining, nullish coalescing, logical assignment |
| S7735/S7747 | 165 | Negated conditions in if/else — flipped for readability |
| S5145 | 44 | User input passed directly to logging — sanitized |
| S1082/S6848 | ~100 | Missing keyboard event handlers + ARIA roles on interactive elements |
| S6853 | 43 | Form inputs missing associated labels |
| S2933/S6819 | 125 | Class members not marked readonly + <img> tags missing alt attributes |
| S3776 | Multiple batches | Cognitive complexity reduction — extracted helper functions, simplified control flow |
Each rule got its own session (sometimes multiple sessions for high-count rules like S6759 and S6479). Each session pushed its own branch. Total: 139 commits across the week.
The pattern was remarkably consistent. An agent fixing S6759 would:
- Get a list of files with violations
- Open each file
- Find every `interface` or `type` defining React props
- Add `readonly` modifiers
- Run the type checker to confirm nothing broke
- Commit and push
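The steps above can be sketched as a naive text transform. A real session relied on the agent actually reading each file plus the type checker; this regex version only illustrates the shape of the fix, and the `...Props` naming convention is an assumption:

```python
import re

# Prefix each property line with `readonly` unless it already has it.
PROP_LINE = re.compile(r"^(\s*)(?!readonly\b)(\w+\??\s*:)", re.MULTILINE)

def add_readonly(body: str) -> str:
    return PROP_LINE.sub(r"\1readonly \2", body)

def fix_props_interfaces(source: str) -> str:
    # Assumption: React prop interfaces follow a `FooProps` naming convention.
    pattern = re.compile(r"(interface \w*Props \{)(.*?)(\})", re.DOTALL)
    return pattern.sub(
        lambda m: m.group(1) + add_readonly(m.group(2)) + m.group(3), source)

sample = """interface ButtonProps {
  label: string;
  onClick?: () => void;
}"""
fixed = fix_props_interfaces(sample)
```

The negative lookahead makes the transform idempotent, so re-running a session over already-fixed files is harmless.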
Multiply that by a dozen rules. That’s Phase 1.
Phase 2: The Coverage Sprint (82 New Test Files, 23,000+ Lines)
With the Quality Gate rules handled, the next problem was coverage. Four overnight sessions, each targeting a different area:
Night 1: Backend Connections + Prop Services
The integration layer — modules managing connections to external AI providers. claude_code, codex, device_identity, and the prop services pipeline. Complex async behavior with WebSocket handling and retry logic. The agent wrote pytest suites covering happy paths, error handling, timeout scenarios, and reconnection logic.
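The retry paths are the interesting part to test. A condensed illustration of the pattern, where the `Connection` class is a stand-in for the real connection modules, not their actual API:

```python
import asyncio
from unittest.mock import AsyncMock

class Connection:
    """Minimal stand-in: send with bounded retries on connection errors."""
    def __init__(self, transport, retries: int = 3):
        self.transport, self.retries = transport, retries

    async def send(self, msg: str):
        for attempt in range(self.retries):
            try:
                return await self.transport.send(msg)
            except ConnectionError:
                if attempt == self.retries - 1:
                    raise
                await asyncio.sleep(0)  # real code backs off here

# First call fails, second succeeds: the reconnection path under test.
transport = AsyncMock()
transport.send.side_effect = [ConnectionError(), "ok"]
result = asyncio.run(Connection(transport).send("ping"))
```

`AsyncMock` makes this cheap: one `side_effect` list covers the failure and the recovery in a single test.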
Night 2: Mobile Views
Frontend mobile components: MobileCreatorView, MobileSettingsPanel, MobileAgentChat, and layout components. Vitest + React Testing Library. The agent mocked the API layer and wrote behavior tests — user interactions, state transitions, conditional rendering. Not just “does it render” but “does it do the right thing when the user taps X.”
Night 3: Onboarding + Generators
Two areas in one session. Frontend: the onboarding wizard and developer tools. Backend: the generator modules — multi_pass, style_transfer, prop_generator — the creative engine that orchestrates multiple AI calls to produce 3D props. Complex async chains with streaming responses, requiring careful mock setup for each stage of the pipeline.
Night 4: World3D
The hardest target. FullscreenPropMaker, GridRoomRenderer, FirstPersonController. Three.js and React Three Fiber components — canvas mocking, WebGL stubs, animation frame handling. This session also ran second passes on modules from earlier nights where coverage was still below threshold.
Result: 82 new test files, 23,000+ lines of test code. All tests passing. Coverage clearing the 80% threshold.
Phase 3: The Cleanup (The Honest Part)
Here’s where we stop pretending everything was clean.
The AI agent sessions that wrote 82 test files did not have our ESLint or ruff configs in their context. They wrote code that passed the test runner but not the linter. The aftermath:
- 615 ESLint issues — unused imports, `any` types where proper mock types should have been, inconsistent formatting, missing return type annotations
- 14 ruff errors — import ordering violations, style inconsistencies
This is the honest trade-off. AI-generated code is functional, systematic, and thorough. It is also sloppy about style. The agents optimized for “tests pass” not “linter is happy.”
We spawned one more AI session focused entirely on lint cleanup. `npx eslint --fix` and `ruff check --fix` handled the bulk automatically. The session committed the fixes and pushed. The remaining handful of issues needed human judgment — mostly cases where the `any` type needed a real interface definition.
Lesson for next time: include lint configs in the agent context from the start. Those 615 issues were entirely avoidable.
The Numbers
| Metric | Value |
|---|---|
| Total commits | 290 |
| Calendar days | 5 (Tue–Sat) |
| SonarQube rules addressed | 15+ |
| Individual issues fixed | 10,000+ |
| New test files | 82 |
| Lines of test code | 23,000+ |
| ESLint issues cleaned up | 615 |
| Ruff errors cleaned up | 14 |
| Coverage before | 22.1% |
| Coverage after | >80% |
| Quality Gate | 🔴 → 🟢 |
| Developer hours spent | ~4 (setup + morning reviews) |
What We’d Do Differently
Lint configs in context. The 615-issue cleanup was a self-inflicted wound. Every agent session should have the project’s lint configuration from the start.
Smaller batches for high-count rules. S6759 (610 issues) worked fine in a single session, but it pushed the context window. Two sessions of 300 would have been safer.
Dry-run the self-healing check. We set up the 4:30 AM conditional spawn but only tested it in production. Should have done a dry run the night before.
Second passes from the start. Night 4 included second-pass work on earlier modules. Planning for two-pass coverage from the beginning would have produced cleaner results.
The Meta-Point
This is exactly what CrewHub is built to enable — orchestrating AI agents for systematic work. We used our own tooling to fix our own technical debt. The pattern is not specific to test coverage or SonarQube. It applies to any maintenance work that’s:
- Repetitive — the same fix pattern across many files
- Systematic — clear rules, not judgment calls
- Parallelizable — different files, no shared state
- Low-risk — a bad test is easy to delete, unlike a bad migration
Documentation generation. Dependency updates. Accessibility fixes. Security audit remediation. Migration scripts. Any of these could use the same approach: focused sessions, isolated branches, overnight execution, morning review.
The machines did the overnight shift. The Quality Gate is green. And the pattern is ready for next time.
CrewHub is open source under AGPL-3.0. Join the Discord or view on GitHub.