
I wonder whether people who say LLMs are like a smart junior programmer have ever used an LLM for coding, or ever actually worked with a junior programmer. Because for me the two are not even remotely comparable.

If I ask Claude to do a basic operation on all files in my codebase, it won't do it. Halfway through it will get distracted and do something else, or simply change the operation. No junior programmer would ever do this. The same goes for the other examples in the blog.



Right, that is their main limitation currently: they're unable to consider the full system context when operating on a specific feature. But you must work with excellent juniors (or I work with very poor ones), because getting them to think about changes in the context of the bigger picture is a challenge.


This is definitely a huge factor I see in the mistakes. If I hand an LLM some other parts of the codebase along with my request so that it has more context, it makes fewer mistakes.

These problems are gradually being solved as context lengths grow and as the tooling gets better at sending the LLM all the information it needs.


Yep. My usual sort of conversation with an LLM is MUCH worse than a junior developer...

Write me a parser in R for nginx logs for kubernetes that loads a log file into a tibble.

Fuck's sake, not normal nginx logs. nginx-ingress.

Use tidyverse. Why are you using base R? No one does that any more.

Why the hell are you writing a regex? It doesn't handle square brackets and the format you're using is wrong. Use the function read_log instead.

No don't write a function called read_log. Use the one from readr you drunk ass piece of shit.

Ok now we're getting somewhere. Now label all the columns by the fields in original nginx format properly.

What the fuck? What have you done! Fuck you I'm going to just do it myself.

... 5 minutes later I did a better job ...
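
For reference, a minimal sketch of what was being asked for. This assumes readr's `read_log()` (which parses space-delimited, quoted, bracketed log files and already returns a tibble) and the default ingress-nginx access-log format; the column names below are my guesses from that format, not something the controller guarantees, so check your actual `log-format-upstream` setting before relying on them.

```r
# Sketch only, not a drop-in solution. Column names are assumptions
# based on the default ingress-nginx log format; verify against your
# controller's log-format-upstream configuration.
library(readr)

read_ingress_log <- function(path) {
  read_log(path, col_names = c(
    "remote_addr", "dash", "remote_user", "time_local", "request",
    "status", "body_bytes_sent", "http_referer", "http_user_agent",
    "request_length", "request_time", "proxy_upstream_name",
    "proxy_alternative_upstream_name", "upstream_addr",
    "upstream_response_length", "upstream_response_time",
    "upstream_status", "req_id"
  ))
}

logs <- read_ingress_log("nginx-ingress.log")
```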


It's a machine, use it like one


Yeah I did. I used the text editor to assemble the libraries into something that worked.


I mean, I'd never expect a junior to do better for such a highly specific task.

I expect I'd have to hand-feed them the steps, at which point I imagine the LLM would also do much better.


Except for the first paragraph, I couldn’t tell if you were talking to an incompetent junior or an LLM.

I expected the lack of breadth from the junior, actually.


I swear at the junior programmers less.

To be fair, the guys I get are pretty good and actually learn. The model doesn't. I have to have the same arguments with the model over and over again, and remember which arguments I had last time. Then when they update the model, it comes up with new stupid things I have to argue with it about.

Net loss for me. I have no idea how people are finding these things productive unless they really don't know or care what garbage comes out.


> the guys I get are pretty good and actually learn. The model doesn't.

Core issue. LLMs never ever leave their base level unless you actively modify the prompt. I suppose you _could_ use finetuning to whip it into a useful shape, but that's a lot of work. (https://arxiv.org/pdf/2308.09895 is a good read)

But the flip side of that core issue is that if the base level is high, they're good. Which means for Python & JS, they're pretty darn good. Making pandas garbage work? Just the task for an LLM.

But yeah, R & nginx is not a major part of their original training data, and so they're stuck at "no clue, whatever stackoverflow on similar keywords said".


Perhaps swearing at the LLM actually produces worse results?

Not sure if you’re being figurative, but if what you wrote in your first comment is indicative of the tone with which you prompt the LLM, then I’m not surprised you get terrible results. Swearing at the model doesn’t help it produce better code. The model isn’t going to be intimidated by you or worried about losing its job—which I bet your junior engineers are.

Ultimately, prompting LLMs is simply a matter of writing well. Some people seem to write prompts like flippant Slack messages, expecting the LLM to somehow have a dialogue with you to clarify your poorly-framed, half-assed requirement statements. That’s just not how they work. Specify what you actually want and they can execute on that. Why do you expect the LLM to read your mind and know the shape of nginx logs vs nginx-ingress logs? Why not provide an example in the prompt?

It’s odd—I go out of my way to “treat” the LLMs with respect, and find myself feeling an emotional reaction when others write to them with lots of negativity. Not sure what to make of that.


That's more my inner monologue than what is typed into the LLM.


But at the same time it'll write me 2000 lines of really gnarly text parsing code in a very optimized fashion that would have taken a senior dev all day to crank out.

We have to stop trying to compare them to a human, because they are alien. They make mistakes humans wouldn't, and they complete very difficult tasks that would be tedious and difficult for humans. All in the same output.

I'm net-positive from using AI, though. It can definitely remove a lot of tedium.


> If I ask Claude to do a basic operation on all files in my codebase it won't do it.

Not sure exactly how you used Claude for this, but maybe try doing this in Cursor (which also uses Claude by default)?

I have had pretty good luck with it "reasoning" about the entire codebase of a small-ish webapp.


Since when is “do something on every file in my codebase” considered coding?


Maybe it's not, but it's a comparatively simple task that a junior developer can do.


Refactoring has been a thing since, well, forever.


Well that's the hard bit I really want help with because it takes time.

I can do the rest myself because I'm not a dribbling moron.



