How I use AI to build production features

May 20 • Wins

I’m not 100% done with the feature I’m currently working on, but I’m far enough along that I wanted to share how I’m using AI right now to help ship production features in a Rails / Turbo / Hotwire / Postgres web app.

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

AI subs: ChatGPT Pro $100 plan and Claude Max $100 plan, so both 5x

dev tools: VS Code, Codex app, GPT-5.5 Pro in ChatGPT, Claude Code CLI in Warp (should I try out Ghostty? One thing I really like about Claude in Warp is that I can paste images right away)

codebase helpers for agents: a small Claude.md, a conventions.md, and a rules folder with categorized conventions.

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The feature I’m currently building

Right now I’m working on a multi-tenant tracking integration for communities on our platform, so they can add the Meta Pixel and the Conversions API (CAPI), as well as gtag for Google.

Normally I start by brainstorming and talking with my colleagues about what creators actually need from the feature. After that, I look at examples from other companies and form my own thoughts around the product and technical direction.

Until a few days ago, my next step was a chat with Codex or Claude Code (I switched mostly to Codex for a while), and then it generated a spec.md from our discussion. Depending on the complexity of the feature, I sometimes started with an investigation phase and its own .md first, and then wrote a plan.md afterwards.

If the feature includes new database tables or columns, a lot of the early discussion usually goes into data modeling and schema design.
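To make the modeling question concrete, here is a rough sketch in plain Ruby. It is not our actual schema: every name in it (TrackingIntegration, community_id, provider, external_id, access_token, enabled) is made up for illustration. The assumption is one record per community and provider, with a server-side token required only for CAPI.

```ruby
# Hypothetical sketch, not the real schema. All names are illustrative.
PROVIDERS = %w[meta_pixel meta_capi gtag].freeze
# Assumption: only the server-side provider needs a secret token.
SERVER_SIDE_PROVIDERS = %w[meta_capi].freeze

TrackingIntegration = Struct.new(
  :community_id, :provider, :external_id, :access_token, :enabled,
  keyword_init: true
) do
  # Returns a list of human-readable problems; an empty list means valid.
  def errors
    problems = []
    problems << "unknown provider: #{provider}" unless PROVIDERS.include?(provider)
    problems << "external_id is required" if external_id.to_s.strip.empty?
    if SERVER_SIDE_PROVIDERS.include?(provider) && access_token.to_s.empty?
      problems << "access_token is required for #{provider}"
    end
    problems
  end

  def valid?
    errors.empty?
  end
end

# Tenant scoping: only ever return the enabled integrations of one community.
def integrations_for(all, community_id)
  all.select { |i| i.community_id == community_id && i.enabled }
end

all = [
  TrackingIntegration.new(community_id: 1, provider: "meta_pixel", external_id: "111", enabled: true),
  TrackingIntegration.new(community_id: 2, provider: "gtag", external_id: "G-XXXX", enabled: true)
]
puts integrations_for(all, 1).map(&:provider).inspect
```

In a real Postgres table, the "one record per community and provider" rule would probably be a unique index on those two columns, but that is an assumption as well.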

I kind of agree with Thariq that large .md files can be hard to consume. To help with that, I often had Claude create Mermaid diagrams inside those markdown files, because diagrams are usually easier to consume than long text.

But for the current feature I tried using an HTML-based spec instead of markdown (will attach a reference image). I like the idea of replacing markdown for this use case, but I’m not 100% happy with the result yet.

I think this is probably one of those things where the output gets better as you improve the prompt and process each time.

The next step could be starting implementation right away with Codex, depending on the feature and whether a design already exists. If I need design inspiration, I really like ChatGPT’s new image generation for creating “wireframes”. I haven't tried out Claude design at all yet, mostly because I heard it burns a lot of tokens. Do I have to try it? I’ll also add a reference image of what ChatGPT made.

My implementation workflow changed quite a bit over the last few weeks after switching more to Codex. Before, when I used Claude more, I had my own skills: /feature-spec -> /implement -> /verify (e.g. with the Chrome DevTools MCP) -> /index-check. I also created a workflow.md with some conditions: for example, if it’s a quick fix, don’t use the full skill chain above, but use a lighter process or other skills instead. So from a workflow perspective, it worked pretty well with Claude. But in terms of code quality, I felt better with Codex.
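The kind of condition that lived in workflow.md can be sketched as a tiny routing function. This is only an illustration in Ruby: the full chain is the one named above, but the lighter chain is a guess, since "some lighter process" is all I have written down here, and the task kinds (:quick_fix, :feature) are hypothetical labels.

```ruby
# Full skill chain as described in the post.
FULL_WORKFLOW = %w[/feature-spec /implement /verify /index-check].freeze
# Assumption: the lighter process is not specified; these two steps are a guess.
LIGHT_WORKFLOW = %w[/implement /verify].freeze

# Pick the skill chain for a task. Anything that is not a quick fix
# gets the full chain.
def workflow_for(task_kind)
  task_kind == :quick_fix ? LIGHT_WORKFLOW : FULL_WORKFLOW
end

puts workflow_for(:quick_fix).join(" -> ") # prints "/implement -> /verify"
puts workflow_for(:feature).join(" -> ")
```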

So I made the switch from Claude Code CLI in Warp to the Codex app, and one reason for that switch was the browser-use feature in Codex. Since I’m located in Europe, I don’t have access to Computer Use yet :(

So for implementation with Codex, I currently don't use skills at all (they do exist in Codex, right? :D) because I haven't looked into that yet. Please tell me if I should change that.

Before, when I worked with Claude, even with the 1M context window, I tried to start a new session after ~200k tokens at most. A lot of folks, like Matt Pocock, have also said that the model gets dumber at higher context usage.

But now with Codex, especially for this feature, I stayed in one “session” for the whole feature because of its auto-compacting. I haven’t heard much about whether Codex also gets dumber after a lot of auto-compacts. I could imagine it does, but my experience so far was actually pretty good. And yeah, right now I mainly focus on that one session and I’m not doing a lot of multi-agent / worktree things. Sometimes I spin up one or two Claude Code CLI sessions on the side to research something about the feature, or to quickly get complex console commands. But besides that, I’m mostly working on one feature at a time.

I could go way deeper into each step, but I don’t want to write a too long post here. Definitely happy to talk more about it and hear suggestions from you guys.

Last but not least, for the review part I use Codex subagents, as well as one or more Claude sessions to have another pair of eyes. And now with the ChatGPT Pro plan, I also use GPT-5.5 Pro for reviewing certain areas. Since it lives in ChatGPT and doesn’t have the whole codebase as context, I usually create a prompt / context for ChatGPT together with Codex and reference some files. As mentioned, this is more for specific parts / questions, while Claude and Codex are more for the “bigger picture”.
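In practice Codex writes that context for me, but the shape of it is simple enough to sketch by hand. The helper below is hypothetical (build_review_prompt and the "--- path ---" delimiter are made up for illustration): it bundles one focused question with the contents of a few referenced files into a single prompt that can be pasted into ChatGPT.

```ruby
# Hypothetical helper: bundle a review question and a few files into one prompt.
# Each file is introduced by a "--- path ---" line so the model can tell them apart.
def build_review_prompt(question, paths)
  sections = paths.map do |path|
    "--- #{path} ---\n#{File.read(path)}"
  end
  ["Review question: #{question}", *sections].join("\n\n")
end

# Usage: pass the question and file paths on the command line, e.g.
#   ruby review_prompt.rb "Is the token handling safe?" path/to/file_a.rb path/to/file_b.rb
if __FILE__ == $PROGRAM_NAME && ARGV.size >= 2
  question, *paths = ARGV
  puts build_review_prompt(question, paths)
end
```

Keeping the question on the first line is a deliberate choice: it keeps the review focused on one area instead of asking for general feedback on everything that was pasted.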

With that workflow, I ship a lot of features and still feel like I’m not giving up too much control, but I am very open to discussing this more and hearing how others are approaching it.

Some questions I would have right away:

Is there a specific GPT-5.5 Pro message limit on the Pro plan, or is it basically unlimited?
Are “skills” a thing in Codex in the same way they are in Claude?
Can I connect ChatGPT / GPT-5.5 Pro directly to Codex so they can “discuss” with each other?
Any suggestions for multi-agent or worktree-based workflows?
Any other feedback, suggestions, or questions?