Five Tools, Zero Overlap: The AI Dev Pipeline We Actually Ship With
Every week brings a new “game-changing” AI dev tool. Here’s how we picked five that actually work together without stepping on each other.
The signal: AI developer tools have gone from scarce to overwhelming in about six months. Superpowers just crossed 149k GitHub stars. Garry Tan open-sourced gstack and it shot to 70k. OpenAI acquired Astral to bet big on developer tooling. And yet most teams I talk to are drowning in options, not shipping faster. The problem isn’t finding good tools. It’s finding tools that don’t overlap, don’t conflict, and actually compose into something greater than the sum of their parts.
What caught my eye
The breakthrough wasn’t any single repo. It was realizing that the best AI dev tools each solve exactly one concern in the development lifecycle, and the real discipline is in how you combine them.
Here are the five tools and the five distinct concerns they address:
OpenSpec (39k stars) solves “what to build.” It’s a spec-driven development framework that enforces a strict three-phase state machine: proposal, apply, archive. Before any code gets written, you and your AI assistant agree on the spec. It works with 25+ coding assistants, so it’s not locked to a single tool. Think of it as the contract layer. No ambiguity about intent.
gstack (70k stars) solves “who reviews.” Garry Tan’s open-source Claude Code setup gives you 23 opinionated skills that act as CEO, Designer, Eng Manager, Release Manager, Doc Engineer, and QA. The plan-ceo-review command runs layered review automatically. It’s organizational discipline encoded as tooling. When you’re a small team wearing every hat, this is how you get the rigor of a full org without the headcount.
Superpowers (149k stars) solves “how to build.” It’s the brainstorm-plan-TDD-review methodology that turned coding agents from reckless autocomplete into disciplined engineers. It forces a strict workflow: brainstorm before coding, create isolated git worktrees, write failing tests first, then implement. RED-GREEN-REFACTOR, enforced by the framework, not by willpower.
oh-my-claudecode (28k stars) solves “how to execute.” Multi-agent orchestration for Claude Code. When you have a complex feature that needs parallel work streams, this handles intelligent agent delegation, persistent execution, and cost optimization through smart model routing. It’s the difference between running one agent at a time and running a coordinated team.
Optio (864 stars) solves “how to ship.” Workflow orchestration from ticket to merged PR. You submit a task, Optio provisions an isolated environment, runs the agent, opens a PR, monitors CI, triggers code review, auto-fixes failures, and merges when everything passes. The feedback loop is what matters: when CI fails, the agent gets the failure context and tries again. When a reviewer requests changes, the agent picks up the comments and pushes a fix.
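To make that feedback loop concrete, here is a minimal sketch of the retry pattern in Python. This is not Optio's code or API. The names run_until_mergeable, CheckResult, toy_agent, and toy_checks are all hypothetical, and the "agent" and "CI" are toy stand-ins so the loop can run on its own.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    passed: bool
    feedback: str = ""  # e.g. a CI log excerpt or a reviewer comment


def run_until_mergeable(
    run_agent: Callable[[str, Optional[str]], str],
    run_checks: Callable[[str], CheckResult],
    task: str,
    max_attempts: int = 3,
) -> Optional[str]:
    """Retry the agent, feeding each failure back in, until checks pass."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        patch = run_agent(task, feedback)  # agent sees the previous failure, if any
        result = run_checks(patch)         # stands in for CI plus code review
        if result.passed:
            return patch                   # ready to merge
        feedback = result.feedback         # carry failure context into the next try
    return None                            # out of attempts: hand back to a human


# Toy stand-ins (hypothetical): the "agent" only fixes the bug after seeing a failure.
def toy_agent(task: str, feedback: Optional[str]) -> str:
    return "patch-v2" if feedback else "patch-v1"


def toy_checks(patch: str) -> CheckResult:
    if patch == "patch-v2":
        return CheckResult(passed=True)
    return CheckResult(passed=False, feedback="test_login failed: expected 200, got 500")


merged = run_until_mergeable(toy_agent, toy_checks, "fix login")
```

The part worth copying is the max_attempts cap. A loop that retries forever is just an expensive way to burn API credits, so the sketch hands control back to a human once the budget runs out.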
The star counts tell an interesting story. Superpowers and gstack are mainstream. OpenSpec is gaining fast. oh-my-claudecode fills a real gap for teams doing parallel agent work. Optio is early but solves the last-mile problem that everyone ignores: actually getting code merged without babysitting.
How we’re actually using this
I want to be honest about where we are with this. We’re not running all five tools on every feature at Navv Studio. We’re experimenting with the combination, testing which handoffs work smoothly and which still have friction.
The tools we’ve been running day-to-day are gstack, oh-my-claudecode, and claude-mem (52k stars). gstack’s layered review process has become central to how we work. For a team our size, having automated CEO/design/eng review perspectives catches things I’d normally miss because I’m context-switching between five different priorities. It’s not a replacement for real code review. But it’s a forcing function that makes sure we at least ask the right questions before shipping.
oh-my-claudecode handles the orchestration layer. When we’re building AutoKon features that touch multiple services, spinning up parallel agent work streams instead of running one agent at a time makes a real difference. The smart model routing also keeps costs sane, which matters when you’re a venture studio, not a BigCo with unlimited API credits.
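If "smart model routing" sounds abstract, this is roughly the idea, sketched in Python. It is not how oh-my-claudecode implements it. The tier names, the prices, the complexity score, and the thresholds below are all made-up assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical tiers and prices per 1k tokens, purely illustrative.
PRICE_PER_1K_TOKENS = {"small": 0.001, "medium": 0.005, "large": 0.02}


@dataclass
class Subtask:
    name: str
    complexity: int  # assumed score from 1 (trivial) to 10 (hard)
    est_tokens: int  # assumed token estimate for the subtask


def route(task: Subtask) -> str:
    """Send each subtask to the cheapest tier that should be able to handle it."""
    if task.complexity <= 3:
        return "small"
    if task.complexity <= 7:
        return "medium"
    return "large"


def estimated_cost(tasks: Iterable[Subtask], router: Callable[[Subtask], str] = route) -> float:
    """Sum the estimated spend for a batch of subtasks under a given router."""
    return sum(PRICE_PER_1K_TOKENS[router(t)] * t.est_tokens / 1000 for t in tasks)


tasks = [
    Subtask("rename config keys", complexity=2, est_tokens=4000),
    Subtask("add API endpoint", complexity=5, est_tokens=10000),
    Subtask("redesign auth flow", complexity=9, est_tokens=20000),
]

routed = estimated_cost(tasks)
always_large = estimated_cost(tasks, router=lambda t: "large")
```

With these invented numbers, routing comes out cheaper than sending everything to the largest tier. The real savings depend entirely on your task mix and actual pricing.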
claude-mem solves a problem that doesn’t show up in the five-concern model but matters just as much: context persistence. Every time you start a new Claude Code session, you lose everything the agent learned in the last one. claude-mem captures what happened, compresses it, and injects the relevant context back when you start fresh. It’s the difference between working with a colleague who remembers last week’s decisions and one who shows up every morning with amnesia.
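The capture, compress, inject cycle can be sketched in a few lines of Python. To be clear, this is a toy and not claude-mem's implementation. SessionMemory and its methods are hypothetical names, "compression" here is just deduplication plus a recency cap, and retrieval is naive keyword overlap. A real memory layer would more plausibly summarize and search semantically.

```python
from dataclasses import dataclass, field


@dataclass
class SessionMemory:
    notes: list[str] = field(default_factory=list)

    def capture(self, event: str) -> None:
        """Record something the agent learned or decided during a session."""
        self.notes.append(event)

    def compress(self, max_notes: int = 3) -> None:
        """Drop exact duplicates (keeping first occurrences), then keep the newest notes."""
        unique = list(dict.fromkeys(self.notes))
        self.notes = unique[-max_notes:]

    def inject(self, query: str) -> list[str]:
        """Return stored notes that share at least one word with the new task."""
        words = set(query.lower().split())
        return [n for n in self.notes if words & set(n.lower().split())]


memory = SessionMemory()
memory.capture("decided to use postgres for billing")
memory.capture("auth service uses JWT tokens")
memory.capture("decided to use postgres for billing")  # repeated in a later session
memory.compress()

context = memory.inject("billing schema change")
```

Here the new session asking about a billing schema change gets back the postgres decision and nothing about auth, which is the whole point: relevant context in, noise out.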
Superpowers has been the backbone of our actual coding workflow for longer. The TDD discipline it enforces is genuinely useful. When you let an AI agent just write code without tests, you get code that looks right but breaks in subtle ways. When you force it through RED-GREEN-REFACTOR, you get code whose behavior is pinned down by tests that failed before the implementation existed.
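For anyone who hasn't worked test-first, here is what the RED and GREEN steps look like on a tiny example. This is a generic Python illustration, not anything taken from Superpowers. The slugify function and check_slugify helper are hypothetical, and the REFACTOR step is left out because there is nothing to clean up in code this small.

```python
import re


def slugify_stub(title: str) -> str:
    # RED: the function exists but has no implementation yet.
    raise NotImplementedError


def check_slugify(slugify) -> bool:
    """The test, written before the implementation. True only if behavior is right."""
    try:
        return (
            slugify("Hello, World!") == "hello-world"
            and slugify("  AI  Dev ") == "ai-dev"
        )
    except NotImplementedError:
        return False


def slugify(title: str) -> str:
    # GREEN: the smallest implementation that satisfies the test above.
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


red = check_slugify(slugify_stub)  # fails first, proving the test can fail
green = check_slugify(slugify)     # passes once the implementation exists
```

Seeing the test fail first is what matters. A test that has never failed tells you nothing about whether it can catch a bug.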
OpenSpec and Optio are the pieces we haven’t adopted yet but are watching closely. OpenSpec’s spec-driven approach is compelling, especially for features where the requirements are ambiguous. Optio’s ticket-to-merged-PR automation is promising, but we’re still validating it on smaller tasks before trusting it with anything critical.
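The three-phase idea behind OpenSpec (proposal, apply, archive) is easy to picture as a small state machine. The Python below is only a sketch of that concept under my own assumptions. Phase, Change, approve, and advance are hypothetical names and not OpenSpec's actual commands or API.

```python
from enum import Enum


class Phase(Enum):
    PROPOSAL = "proposal"
    APPLY = "apply"
    ARCHIVE = "archive"


# The only legal moves: proposal -> apply -> archive. No skipping, no going back.
NEXT_PHASE = {Phase.PROPOSAL: Phase.APPLY, Phase.APPLY: Phase.ARCHIVE}


class Change:
    def __init__(self, title: str) -> None:
        self.title = title
        self.phase = Phase.PROPOSAL
        self.approved = False

    def approve(self) -> None:
        """Human and assistant agree on the spec."""
        self.approved = True

    def advance(self) -> Phase:
        """Move to the next phase, refusing to leave proposal without agreement."""
        if self.phase is Phase.PROPOSAL and not self.approved:
            raise RuntimeError("spec must be agreed before any code is written")
        next_phase = NEXT_PHASE.get(self.phase)
        if next_phase is None:
            raise RuntimeError("archived changes cannot advance")
        self.phase = next_phase
        return self.phase


change = Change("add invoice export")
try:
    change.advance()  # blocked: nobody has agreed on the spec yet
    blocked = False
except RuntimeError:
    blocked = True

change.approve()
history = [change.advance(), change.advance()]
```

The useful property is that there is no path to the apply phase that skips agreement on the spec, which is exactly the failure the spec layer exists to prevent.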
The bigger pattern
Here’s what I think most people get wrong about AI dev tools: they evaluate them individually. “Is this tool good?” is the wrong question. The right question is: “Does this tool solve exactly one concern, and does it compose cleanly with the tools solving the other concerns?”
The five-concern model maps to real failure modes I’ve seen in every team that uses AI coding agents:
Built the wrong thing (no spec)
Shipped without review (no organizational discipline)
Code works in demo, breaks in production (no methodology)
Single-threaded execution bottleneck (no orchestration)
PR sits open for days, agent can’t self-correct (no shipping pipeline)
Each tool addresses exactly one failure mode. Zero overlap means zero conflict.
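You can run the same check on your own stack. Here is a small Python sketch that encodes the five-concern model from this post as plain data and flags overlaps and gaps. The function names overlaps and uncovered are mine, and the mapping is just the one described above, not something any of these tools export.

```python
from collections import Counter

# Tool -> the single concern it owns, as described in this post.
STACK = {
    "OpenSpec": "what to build",
    "gstack": "who reviews",
    "Superpowers": "how to build",
    "oh-my-claudecode": "how to execute",
    "Optio": "how to ship",
}

# Concern -> the failure mode you hit when nothing covers it.
FAILURE_MODES = {
    "what to build": "built the wrong thing",
    "who reviews": "shipped without review",
    "how to build": "works in demo, breaks in production",
    "how to execute": "single-threaded execution bottleneck",
    "how to ship": "PR sits open for days",
}


def overlaps(stack: dict[str, str]) -> dict[str, list[str]]:
    """Concerns claimed by more than one tool, with the tools that collide."""
    counts = Counter(stack.values())
    return {
        concern: [tool for tool, c in stack.items() if c == concern]
        for concern, n in counts.items()
        if n > 1
    }


def uncovered(stack: dict[str, str]) -> list[str]:
    """Concerns no tool in the stack addresses, sorted alphabetically."""
    return sorted(set(FAILURE_MODES) - set(stack.values()))


starter = {tool: STACK[tool] for tool in ("OpenSpec", "Superpowers")}
starter_gaps = uncovered(starter)
```

For the full five-tool stack both checks come back empty. For a two-tool starter stack, uncovered lists the concerns you are still handling by hand.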
For venture builders and small teams in Southeast Asia (SEA), this matters even more. We can’t afford dedicated QA teams, release engineers, and project managers. But we also can’t afford to ship broken code. The answer isn’t hiring. It’s encoding the discipline into your toolchain.
Builder’s shortcut: Start with just OpenSpec + Superpowers. My rough estimate is that the spec layer plus the TDD discipline gets you about 80% of the value. Add the other tools as your workflow stabilizes and you feel the specific pain each one solves.
Tools mentioned:
OpenSpec - Spec-driven development framework (39k stars)
gstack - Garry Tan’s Claude Code review stack (70k stars)
Superpowers - Agentic skills framework and dev methodology (149k stars)
oh-my-claudecode - Multi-agent orchestration for Claude Code (28k stars)
claude-mem - Persistent memory across Claude Code sessions (52k stars)
Optio - Workflow orchestration from ticket to merged PR (864 stars)

