This blog post lacks almost any form of substance.
It could've been shortened to: Codex is more hands-off, I personally prefer that over Claude's more hands-on approach. Neither is bad. I won't bring you proof or examples; this is just my opinion based on my experience.
Heya, author here! Admittedly this was a quick blog post I fired off, much shorter than my usual writing.
My goal wasn't to create a complete comparison of both tools, but to provide a little theory about a behavior I'm seeing. You're (absolutely) right that it's a theory, not a study, and I made sure to state that in the post. :)
Mostly, though, the conclusion describes pretty succinctly why I wrote the post: as a way to get more people to try more of the tools so they can adequately form their own conclusions.
> I think back to coworkers I’ve had over the years, and their varying preferences. Some people couldn’t start coding until they had a checklist of everything they needed to do to solve a problem. Others would dive right in and prototype to learn about the space they would be operating in.
> The tools we use to build are moving fast and hard to keep up with, but we’ve been blessed with a plethora of choices. The good news is that there is no wrong choice when it comes to AI. That’s why I don’t dismiss people who live in Claude Code, even though I personally prefer Codex.
> The tool you choose should match how you work, not the other way around. If you use Claude, I’d suggest trying Codex for a week to see if maybe you’re a Codex person and didn’t know it. And if you use Codex, I’d recommend trying Claude Code for a week to see if maybe you’re more of a Claude person than you thought.
> Maybe you’ll discover your current approach isn’t the best fit for you. Maybe you won’t. But I’m confident you’ll find that every AI tool has its strengths and weaknesses, and the only way to discover what they are is by using them.
Hey! Didn't mean my comment negatively towards you in any way, though I now realize it might've come across as such. Blogs with opinions based on experiences alone are absolutely fine, thanks for sharing.
What I did mean to indicate is that your blog felt like an HN comment to me, whereas I generally expect an HN link to be news or facts that subsequently spark a discussion.
At the end of your post I guess I was hoping for or expecting facts or examples, which at least indicates it was engaging enough to read to the end.
No problem at all! I read it as a bit pithy, but I didn’t think it was particularly mean spirited.
If you check out my writing on build.ms and fabisevi.ch you’ll see that the majority of it is meant to be evergreen observations of a concept or a moment in time. My goal is to make people think and to think about thinking, more than it is to tell people what exactly to think.
If I had to summarize my style in one sentence, it would be walking people to and around an idea, and leaving the rest as an exercise for the reader. Naturally, this means I have less control over how people interpret my writing, so I do try to cover my bases with fact and experience, but that still means sometimes I won’t deliver a complete picture to everyone.
In that case, sometimes I come to a place like HN or Bluesky or Mastodon where my post is being discussed and try to add some perspective and clarity through constructive conversation. :)
If I’m being honest, I think we’re too early in the state of generative AI as a coding tool to draw very strong factual conclusions that will hold up well from many of our experiences using AI to code. I’m not implying it’s all vibes, but I think it would be pretty hard to wrap up my post in a bow the way you’re suggesting. On the other hand, I’m always open to well-considered feedback, and I’d love to know more about your experience if you’re interested in sharing!
That’s a long way of saying happy holidays to you as well!
Most of my AI coding experience is through GitHub Copilot (GHCP), mostly because that is what's available to me professionally. GHCP has improved greatly over the past half year, in my opinion. I do use it a lot, burning up my enterprise allowance almost every month working on complex Python codebases.
When it comes to models in GHCP, I vastly prefer Claude over Codex. It's not that Codex is bad; it just feels tone-deaf to me. It writes code in its own preferred style and doesn't adjust to the context of the codebase. Additionally, for me, Sonnet and Opus are much less prone to getting stuck in loops on longer or more complex agentic tasks.
I do like Codex for review tasks. When I'm working on something complex, both planning and implementation, I frequently ask Codex to review Claude's work, and it does a good job at that, frequently catching a mistake or coming up with a different angle.
I've toyed with kilocode, cline, and the related forks through the Claude Opus 4.5 API, but I'd argue my experience with Claude Sonnet/Opus through Copilot has just been... better. More consistent. Faster.
Sometimes I code with local models, when I'm working on highly confidential projects or data. I prefer GPT-OSS 20b or Qwen3-coder-30b for that, but without an agentic harness, as prompts get big and slow.
I would find it a nice read to see you work through a case and have two models/harnesses duke it out, and see whether it matches your expectations and gut feeling.
It’s funny because my use of Claude Code is the opposite. I use slash commands with instructions to find context, and basically never interact with it while it is doing its thing.
Instruct it to stop and ask something sometimes when it is doing its thing. It is one of my core instructions at every level of its memory. If instructed, it will stop when it feels like it should stop, and in my personal experience it is surprisingly good at stopping. I’ve read here that a lot of people have a different experience and opt for smaller tasks instead, though…
My question was misleading. For me, Claude Code sometimes seems to stop too often, at a random point, to ask instead of keeping going. I guess that is the point of the linked article: that Codex works differently in this regard.
> Codex is more hands-off, I personally prefer that over Claude's more hands-on approach
Agree, and it's a nice reflection of the individual companies' goals. OpenAI is about AGI, and they have insane pressure from investors to show that that is still the goal; hence when Codex works, they can say "look, it worked for 5 hours!", discarding that 90% of the time it's just pure trash.
While Anthropic/Boris is more about value now, more grounded/realistic, providing a more consistent, hence more trustable/intuitive, experience that you can steer (even if Dario says the opposite). The ceiling/best-case scenario of a Claude Code session is maybe a bit lower than Codex's, but with less variance.
Well, if you had tried using GPT/Codex for development, you would know that the output from those 5 hours would not be 90% trash; it would be close to 100% pure magic. I'm not kidding. It's incredible as long as you use a proper analyze-plan-implement-test-document process.