Heya, author of the post here. That's a good call out because it's probably a lot!
And now that you mention it, that's also one failure case for why some people look at AI and go "this just isn't very good at coding". I'm not saying it has to be that way, nor will it be that way forever, but there are absolutely a lot of people who just download Claude Code or Cursor or Codex and dive right in without any additional setup.
That's partially why I suggest people use Codex for the workshops I offer, because it provides the best results with no setup. All of these tools have a nearly unending amount of progressive disclosure, because there's so much invisible configuration and best practices are changing so fast. I'm still trying not to imply that one tool is "better" than another (even if I have my preference), but rather to emphasize that which AI tool you like is mostly about your preferred set of tradeoffs.
> That's partially why I suggest people use Codex for the workshops I offer, because it provides the best results with no setup.
I would do the exact opposite. If we are pitching “this shit works magically without any setup,” people will expect magic, and they absolutely will not get magic, because there is no magic. I believe, especially if we are educators (you obviously are!!), that it is our responsibility to teach it “right”: my workshop would probably spend at least 75% of the time on the setup.
So I definitely understand where you're coming from, but let me provide a little bit of context.
The workshops are 3-4 hours, and we do spend a lot of time discussing how things work in reality vs. how they work in the context of the workshop. It's worth noting that these workshops span the gamut from non-technical people in sales to seasoned developers, so a lot of people simply won't learn much (or leave with the excitement to learn on their own) if we spend the first 2-3 hours setting things up.
In my experience the heaviest lift for teaching practically any technical subject is getting someone interested by showing them how to accomplish something they care about, and then leaving them with lots of information and resources so they can continue experimenting and growing even once we're done. The way I do that is to make sure they leave the workshop having built their own idea — without taking shortcuts!
Being able to use Codex to accomplish something because you spent an hour crafting a good prompt isn't cheating; it's learning the skill of becoming a better technical communicator — in a short period of time — and taking advantage of the skill you've just learned. I don't consider that magic. It's actually the core tenet of building with AI, and is very much how I work with AI every day.
I'm late for dinner so I should probably stop here, but I'll leave just one final note. After every workshop I send each student a list of personalized resources that will help them continue on their journey by demystifying things that we may have glossed over or that weren't clear in the workshop — so they should be armed with the tools to take their next steps away from any magical thinking.
It's a bit hard to boil down exactly what I do and how I try to design for best hands-on pedagogical practices in an HN post I'm writing on the go — but I am absolutely open to your thoughts! :)
What if I said "I don't use any of those and I think AI is very good at coding".
I'd be more interested in an article from you on how to go from "I use Claude Code out of the box" as a baseline, and then how each extra layer (CLAUDE.md, skills, agents, MCP, etc.) improves its effectiveness.
That does sound like a good article! Sadly I've never written it and probably won't because I think a lot of that stuff doesn't provide as much value as people assume — which is why my personal conclusion in the post is to just lean into Codex.
This isn't a value judgment, it's just a question of where my priorities and tradeoffs lie. That said, I think Skills are the killer feature because they are a very composable tool — which I'll get to in a bit.
- Your CLAUDE.md should be a good high-level description with relevant details that you add to over time. Think of it as describing the lay of the land the way you would to a new coworker, including the little warts they need to know about before they waste hours on some known but confusing behavior.
- MCP has its purposes, but it's not really a great tool for software development. It's best suited to interfacing with a remote service (because it provides a discovery layer for LLMs on top of an API), but if you use MCP servers the way developers are told to, you're almost always better off using an equivalent CLI.
- I'll skip over agents, because an agent is basically a skill + a separate context window, and the main selling point is the context window bit. I think over time we'll see a separation of concerns where you can just spawn a skill with a context window and everyone will forget about the idea of agents in your codebase.
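To make the CLAUDE.md point above concrete, here's a minimal sketch of the kind of file I mean. The project name, commands, and quirks are all hypothetical placeholders, not taken from any real project:

```markdown
# MyApp

An iOS link-saving app. SwiftUI front end, with a small server in `server/`.

## Commands
- Build: `xcodebuild -scheme MyApp build`
- Tests: `xcodebuild -scheme MyApp test`

## Things a new coworker should know
- `LegacySyncManager` looks dead but is still used by the widget extension.
  Don't delete it.
- The `staging` scheme points at production. Use `dev` instead.
```

The "known warts" section is the part that pays off most, because it's exactly the knowledge an agent (or a new hire) would otherwise burn hours rediscovering.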
So now Skills. I wrote a well-received post [^1] a few months ago about Claude Skills, and why I think they are probably the most important of these tools. A skill is basically a plain-text description of an application.
The app can be something like I describe where Claude Code converts a YouTube video to an mp3 based on natural language, or you can have a code review skill, a linter skill, a security reviewer skill, and so on. This is what I meant when I said skills are composable.
You can imagine a team having lots of skills in their repo. One might guide an agentic system to build iOS projects well (steering it away from an LLM's bad defaults when building with Xcode), others might capture context that's very relevant to the team, and some might even enforce that the copy in your app conforms to your marketing team's language.
Skills are just markdown so they're very portable — and now available in Codex and many other places.[^2] (I had been using OpenSkills to great effect since the way Skills work is just through prompts). I now have a bunch of skills that do lots of things, for coding, marketing, data analysis, fact checking, copy-editing, and more. As a nice benefit they run in Claude — not just Claude Code. If you have ideas for processes you need to improve, I would invest my time and energy into building up Skills more than anything else.
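For anyone who hasn't seen one: a skill is typically just a SKILL.md file with a short frontmatter block and plain-markdown instructions. Here's a hypothetical sketch of a small code review skill — the frontmatter fields follow the published Skills format, but the body and file paths are made up for illustration:

```markdown
---
name: code-review
description: Reviews a diff for correctness, style, and security issues.
  Use when the user asks for a code review or before opening a PR.
---

# Code Review

When asked to review code:

1. Read the full diff, not just the changed hunks.
2. Check for common issues: unhandled errors, missing tests, leaked secrets.
3. Flag anything that violates the style guide in `resources/style-guide.md`.
4. Summarize findings as a bulleted list, most severe first.
```

The description in the frontmatter is what lets the model decide when to load the skill, so it's worth writing carefully.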
What % of effectiveness would you say is gained from these? Because... I am a pretty regular user of Claude Code in VS Code with no special goodies, and I routinely hit "compacting" after 5-6 prompts in a single session. Then I need to validate it didn't slop all over the place. I can't imagine having 3-4 agents in the background (extra things to check) being a net positive.
Skills and slash commands for sure but... I don't see them as necessary? "Review this code for ____ XYZ" as a skill. To me it's just something you could do as a prompt in your session?
1. I find Claude Code's handling of the context window to be pretty poor, and one of the reasons why I use it for smaller things versus multi-hour coding sessions. I'm not sure what dark magic OpenAI has done to make their context window feel infinite, but Codex has become a better choice for that at the moment.
2. A small note on subagents: Claude Code did this right. Subagents are granted their own context window, so they don't spill over into your context window until they're done doing their own work — and the added context is relatively minimal. I'd love to see OpenAI adopt this pattern as well, especially in combination with something like Skills rather than leaning into MCP.
3. When I suggested adding skills, I mean ones that are far more complicated than your example, ones that can drive a chunk of work autonomously. The skill I use for writing in-app copy (which I'm bad at, because as you can see I'm never short of words) is about 100 lines long. It includes my style guide as an accessible resource, and a mostly complete history of my Bluesky posts to help achieve the authentic tone I want when discussing Plinky. (I write all of my posts, so this really is my voice.)
These kinds of skills save me a lot of time as an indie developer! As I mentioned I have ones for data insights, fact-checking, and of course for code. My main suggestion would be to think through every step of your work and see if they can be automated, and then turn small pieces of that into skills.
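As a sketch of how a skill like that might be laid out on disk — the file names here are hypothetical, but the pattern of a SKILL.md plus supporting resource files is the general shape:

```
app-copy/
├── SKILL.md          # the ~100 lines of instructions
├── style-guide.md    # referenced by the skill as a resource
└── voice-samples.md  # post history used to match tone
```

Keeping the bulky reference material in separate files means the model only reads it when the skill actually fires, instead of carrying it in every session's context.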
---
It's hard to assign a specific percentage to how much my effectiveness has improved, but it's a lot. The reason I don't want to put a number on it is that what I've gotten is a far broader set of skills (no pun intended) that allows me to execute in parallel. The metaphor I'd use to describe all this is to say that I'm no longer single-threaded.
I am a big believer that right now models work best for people who are effectively running small businesses — or teams that operate lean. The work of 10 can be done by 4-5 motivated and well-armed people, or an indie like me can do every facet of the work involved and do it well. I sit down and focus on explaining the big picture in great detail, and then set things off so I can do every part of the work involved in a round-robin style.
While an engineering task is going I'm off writing my newsletter with my words but with a skill that does meaningful research for me. While I'm running some research I'm in Figma working on social media assets. While I'm doing code review for my app's code I've got the server side building in the background.
Last week I had Codex finding a domain for me, with specific requirements. (Here's a simplified version of the prompt.)
> I need a domain to represent this concept [+ 200 words], based on the code in this repository. [Code included so Codex really knows what the heck I'm building and talking about.] Don't show me any domains over $50/year at this registrar. Make sure it's a real word with no fun typos like tumblr.com is short for tumbler, and no compound words like "thisisfun.com". You can start with this list of tlds, but if you think there are any other ones that could be a good match then you can make a suggestion.
And after about 10 messages back and forth Codex found something that would have taken me far longer to research on my own — in parallel.
This all means that I'm able to write code, do marketing, design, support (which is always me and not AI), and run my business. If I plan well what I get is an extra set of hands to hand things off to, and most of the time (honestly) it does the work perfectly. But even for the times it doesn't, if it gets me 80-90% of the way there, that's a huge head start over where I would have been previously.
So the reason I'm hesitant to answer this with a specific percentage is that your experience across organizations will vary. But what I've seen in my work (solo engineering work, teaching, and consulting) is that the gains are substantial. That's true even for roles where you're singularly focused on writing code — but the key is to lean into the strengths of this system and be creative about how you use it.
As I said — incapable of keeping my writing short so I hope that helps!