TrainedMonkey · 2026-01-23T19:58:13 1769198293

It's software engineering crack. Starting a project feels amazing: features are shipping, a complex feature in the afternoon - ezpz. But AI lacks permanence. For every feature you start over from scratch, except there is more codebase now and the context window is still the same. So there is drift, the codebase randomizes, edge cases proliferate, and implementation velocity slows down.

Full disclosure - I am a heavy codex user and I review and understand every line of code. I manually fight spurious tests it tries to add by pointing out that a similar one already exists and we can get coverage with +1 LOC vs +50. It's exhausting, but personal productivity is still way up.

I think the future is bright because training / fine-tuning taste, dialing down agentic frameworks, introducing adversarial agents, and increasing model context windows all seem attainable and stackable.

kaydub · 2026-01-23T22:40:31 1769208031

I usually have multiple agents up working on a codebase. But it's typically 1 agent building out features and 1 or 2 agents code reviewing, finding code smells, bad architecture, duplicated code, stale/dead code, etc.

I'm definitely faster, but there's a lot of LLM overhead to get things done right. I think if you're just using a single agent/session you're missing out on some of the speed gains.

I think a lot of the gains I get from using an LLM come from having multiple agent sessions work on different projects at the same time.

tuhgdetzhh · 2026-01-23T20:26:16 1769199976

I think that the current test suite is far too small. For the Claude Code codebase, a sensible next step would be to generate thousands of tests. Without that kind of coverage, regressions are likely, and the existing checks and review process do not appear sufficient to reliably prevent them. My request is that an entirely LLM-written feature should only be eligible for merge once all of those generated tests pass, so we have objective evidence that the change preserves existing behavior.

TrainedMonkey · 2026-01-23T18:14:48 1769192088

All other governments is a stretch here, but the likelihood of at least one other government getting the same privileges is extremely high.

TrainedMonkey · 2026-01-22T23:08:31 1769123311

Humans also respond differently when prompted in different ways. For example, politeness often begets politeness. I would expect that to be reflected in training data.

mlsu · 2026-01-22T23:45:49 1769125549

If I, a moron, hire a PhD to crack a tough problem for me, I don't need to go back and forth prompting him at a PhD level. I can set him loose on my problem and he'll come back to me with a solution.

nl · 2026-01-23T01:17:58 1769131078

> hire a PhD to crack a tough problem for me, I don't need to go back and forth prompting him at a PhD level. I can set him loose on my problem and he'll come back to me with a solution.

In my experience with many PhDs they are just as prone to getting off track or using their pet techniques as LLMs! And many find it very hard to translate their work into everyday language too...

preciousoo · 2026-01-24T05:07:18 1769231238

The PhD can't read minds, the quality of the request from a moron would be worse than the quality of the request from someone with avg intelligence. And the output would probably noticeably differ accordingly.

chasd00 · 2026-01-24T01:22:15 1769217735

Unless your problem fits the very narrow but very deep area of expertise of the PhD you’re not going to get anything. The phds I have worked with can’t tie their shoes because that wasn’t in their dissertation.

Herring · 2026-01-23T00:07:54 1769126874

Well if it ever gets to be a full replacement for phds, you’ll know cause it will have already replaced you.

HPsquared · 2026-01-23T00:03:51 1769126631

I think that's what is happening. It's simulating a conversation, after all. A bit like code switching.

b00ty4breakfast · 2026-01-23T03:43:49 1769139829

that seems like something you wouldn't want from your tools. humans have that and that's fine, people are people and have emotions but I don't want my power-drill asking me why I only call when I need something.

freejazz · 2026-01-23T21:54:48 1769205288

>Humans also respond differently when prompted in different ways.

And?

TrainedMonkey · 2026-01-13T18:12:09 1768327929

The problem is, people who make that decision can either spend 0.1% to support open source and get return on investment in terms of better business performance in 2-3 business years. Or they could pay themselves 0.1% in bonuses right now and get an immediate return.

TrainedMonkey · 2026-01-13T00:33:26 1768264406

Can you cite your sources? My understanding is that, based on past data, there is a strong correlation between special military operations, people working late at the Pentagon, and takeout places in the vicinity having a spike of orders.

tptacek · 2026-01-13T00:40:36 1768264836

From what I understand, which squares with the times I've been to the Pentagon:

* People working late at the Pentagon don't order pizza to the building

* The Pentagon has pizza options, including late-night ones

* The Pentagon is in fact chock-full of restaurants

* There is in reality no such thing as real telemetry about pizza orders near the Pentagon.

I have the opinion that this pizza thing is mostly just a story people tell because it makes them feel clever. Not high-horsing it; I have those too.

crazygringo · 2026-01-13T00:55:10 1768265710

Not to mention, there's something like 25,000 people working at the Pentagon.

There are so many potential late-night work things happening that would need food, the idea that pizza orders can be used to identify high-profile military missions specifically doesn't make a lot of sense...

pella · 2026-01-13T00:57:44 1768265864

"Between the late hours of January 2 and the early morning of January 3, 2026, unusually high activity was again observed at a Papa John's near the Pentagon. This coincided with the lead-up to the United States strikes in Venezuela.[15][16] Following the strikes, President Donald Trump announced the capture of Nicolás Maduro, and his wife, Cilia Flores, who were subsequently flown out of the country to face narcoterrorism charges. The surge in pizza orders preceded the official confirmation of the operation by several hours, during which Venezuelan Vice President Delcy Rodríguez reported the couple as missing.[17]"

+

"In a statement to Newsweek in 2025, the Department of Defense denied the theory, claiming that the Pentagon has numerous internal food vendors that are available to late-night workers. It criticized the accuracy of the timeline provided by the Pentagon Pizza Report.[18][19]"

--> https://en.wikipedia.org/wiki/Pentagon_pizza_theory

crazygringo · 2026-01-13T01:01:39 1768266099

And where's the control?

What about the other thousands of surges in pizza orders that had nothing to do with military missions abroad?

That's why Wikipedia calls it an "informal observation" and quotes the "potential for confirmation bias", asking "When else do spikes occur? How often do they have absolutely nothing to do with geopolitics?"

pella · 2026-01-13T01:12:41 1768266761

It works as a heuristic for inferring classified activity from indirect signals.

check now:

https://www.pizzint.watch/ "PAPA JOHNS PIZZA" 294% SPIKE !

tptacek · 2026-01-13T01:26:33 1768267593

That site definitely wants you to think it works as a heuristic, we get it.

tptacek · 2026-01-13T01:03:05 1768266185

Also if you're going to try to tell this story at least do better than Papa John's.

TrainedMonkey · 2026-01-08T20:54:24 1767905664

While we are at it, let's also get all the weapon manufacturers. How dare they hide behind "we are just making a tool" and "we are not responsible for what our users do". And don't get me started on people using cars to cause harm to others, why is nobody going after big auto?

janice1999 · 2026-01-08T21:10:23 1767906623

What an unserious response.

Gun manufacturer accountability (or lack thereof) is a complex topic and one with ongoing lawsuits and evolving legal arguments. (In the USA see the Bush-era NRA-backed Protection of Lawful Commerce in Arms Act in 2005 and recent Smith & Wesson Brands, Inc. v. Mexico arguments).

Personally, if a gun manufacturer markets a gun with the primary feature being fingerprint resistant (yes, that's real) and being easy to carry concealed, I think lawmakers should investigate. Likewise, if someone makes a big CSAM generator button and puts it in front of millions of users, it also deserves legal attention.

TrainedMonkey · 2026-01-08T21:32:35 1767907955

It was an unserious response to an absolutist statement. I was pointing out where such absolutism would lead if applied to other areas.

I don't think fingerprint scanner on guns will be effective as it tracks ownership and not legality of usage. However, a number of modern vehicles do have capabilities to perform autonomous actions, including overriding user input.

llmslave2 · 2026-01-08T21:36:19 1767908179

It's not an absolutist statement and it doesn't lead to any of the insane conclusions you came up with. Holding the people running a model liable for what it generates has nothing to do with making tool makers liable.

xigoi · 2026-01-09T06:12:28 1767939148

It’s not a CSAM generation button, it’s a general content generation button that someone used to generate CSAM, and it’s that someone who is responsible for it.

arcatech · 2026-01-08T20:56:30 1767905790

Flattening the discussion to “everything is actually exactly the same” isn’t helpful. This situation is not the same as a person driving a vehicle.

llmslave2 · 2026-01-08T21:02:26 1767906146

No, because Grok or ChatGPT etc are not tools, they are services. It doesn't matter if Grok uses an LLM or a bunch of people using Photoshop, if they are outputting illegal content the consequences should be the same.

People don't go after big auto because they sell cars that other people drive. If they suddenly started offering a service where they would drive people around in their cars, and they started crashing into other people, of course they would be responsible for that.

TrainedMonkey · 2026-01-07T22:44:57 1767825897

Seems overly reductive, both supply and demand will determine what happens. So far demand for Waymos seems fine, and they can stimulate it way further by lowering prices. The problem is on the supply side, specifically unit economics. Interventions per mile is just one part that goes into profitability and I doubt it's the biggest one. I would estimate the costs to be in this order - vehicle cost, maintenance (and vehicle longevity), human intervention, charging, fleet management (cleaning, etc), and regulatory environment.

In particular, Jaguar Waymos are over $150k a pop. It seems far-fetched that any of them will ever break even. The new generation is reportedly $75k per vehicle, which is significantly better. I could not find any data for Zoox vehicle cost, but given how few of them there are it's a non-player.

Finally, the elephant in the room. Outside of the camera vs lidar holy war, Tesla seems well positioned to dominate the supply side of the equation if the demand shows up. Robotaxis are reportedly under $35k, they own the factories and know how to build more, they also own the maintenance side.

hadlock · 2026-01-07T23:16:13 1767827773

You can build a GMC panel van that seats 12 for about $20k, I don't think vehicle cost is a significant hurdle.

rogerrogerr · 2026-01-07T23:21:26 1767828086

You can’t build a self driving GMC panel van with non-Tesla tech for $20k.

(Or, probably, with Tesla tech. But you definitely can’t do it without.)

TrainedMonkey · 2026-01-06T17:42:55 1767721375

Apparently the trick is two std::rotates: https://devblogs.microsoft.com/oldnewthing/20260101-00/?p=11...
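
For readers who have not seen the idiom: the comment does not say which problem the linked post solves, so the sketch below assumes the classic case of swapping two non-adjacent blocks of different lengths in place, leaving whatever sits between them in the middle. The function name `swap_blocks` and its index-based signature are made up for illustration; only `std::rotate` itself comes from the comment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the linked post): swap block A = [a, a + a_len)
// with block B = [b, b + b_len) in place. A must end at or before the start of B.
template <typename T>
void swap_blocks(std::vector<T>& v, std::size_t a, std::size_t a_len,
                 std::size_t b, std::size_t b_len) {
    assert(a + a_len <= b && b + b_len <= v.size());

    auto first = v.begin() + static_cast<std::ptrdiff_t>(a);
    auto b_first = v.begin() + static_cast<std::ptrdiff_t>(b);
    auto last = b_first + static_cast<std::ptrdiff_t>(b_len);

    // Layout before: [A][M][B], where M is the untouched middle.
    // Rotate 1 brings B to the front of the span: [B][A][M].
    // std::rotate returns the new position of the element that was at `first`.
    auto rest = std::rotate(first, b_first, last);

    // Rotate 2 works on the tail [A][M] and moves M ahead of A: [B][M][A].
    std::rotate(rest, rest + static_cast<std::ptrdiff_t>(a_len), last);
}
```

With `v = {1, 2, 3, 4, 5, 6, 7}`, calling `swap_blocks(v, 0, 2, 4, 3)` swaps `{1, 2}` with `{5, 6, 7}` and leaves `{3, 4}` between them. No temporary buffer is needed, which is the appeal of doing it with rotates.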

As a side note, love how Raymond handled that, no fluff and straight to the point. Beginner's mind and all that.

TrainedMonkey · 2026-01-05T21:21:59 1767648119

Withings smart watches are kind of insane with 30 day battery life, but the subscription farming in the app is infuriating.

All I want from a smart watch:

- Waterproof, wireless charging, at least a week of battery life

- Automatically track exercise and sleep, let me update the data if needed.

- Track my fitness trends over time, looking at you resting heart rate

- Optionally, learn a couple of recurring patterns to improve automatic exercise assignment. If I hike twice a week and you see an exercise session with a consistent heart rate profile you better believe I am hiking

yborg · 2026-01-06T06:41:51 1767681711

Not just subscription farming, you cannot use the latest mobile app versions without agreeing to data mining. Their business model now is literally collecting user health and usage data to resell to other companies and of course for AI training. Avoid.

sjw987 · 2026-01-07T14:49:18 1767797358

Garmin are pretty decent. I have an Enduro 2 which lasts a fortnight on a charge (currently 37% with 6 days of charge left).

It's waterproof. Unfortunately no wireless charging (proprietary cable) but it charges 2 weeks worth in about 2 hours.

It doesn't automatically track exercise, but it does collect a lot more (and higher quality) data than Withings for activity. Automatic sleep. The app has the trends and such, and there's no subscription (they recently added some AI stuff you can pay for but which is optional).

TrainedMonkey · 2025-12-27T21:05:23 1766869523

Depending on the application you would generally use PTP to get sub-microsecond accuracy. The real trick is that the architecture should tolerate clocks starting out of sync or jumping out of sync, and self-correct.
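
One way to make "tolerate clock jumps" concrete is a hybrid logical clock, sketched below. This is a swapped-in technique chosen for illustration, not something the comment names, and it says nothing about PTP itself. The types `Timestamp` and `HybridClock` and the helper `before` are hypothetical, and physical time is passed in as a plain integer so the behavior under a backwards jump is easy to see.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical types for illustration only.
struct Timestamp {
    std::int64_t wall;      // largest physical time observed so far
    std::uint32_t logical;  // tie-breaker when physical time does not advance
};

// Strict ordering on timestamps: wall time first, then the logical counter.
inline bool before(const Timestamp& x, const Timestamp& y) {
    return x.wall < y.wall || (x.wall == y.wall && x.logical < y.logical);
}

class HybridClock {
public:
    // Local event or send. `physical_now` is the local clock reading,
    // which is allowed to jump backwards.
    Timestamp tick(std::int64_t physical_now) {
        const Timestamp prev = last_;
        if (physical_now > last_.wall) {
            last_ = {physical_now, 0};
        } else {
            ++last_.logical;  // clock stalled or went backwards
        }
        assert(before(prev, last_));  // issued timestamps strictly increase
        return last_;
    }

    // Message received from another node carrying its timestamp.
    Timestamp receive(const Timestamp& remote, std::int64_t physical_now) {
        const Timestamp prev = last_;
        const std::int64_t wall =
            std::max({last_.wall, remote.wall, physical_now});
        std::uint32_t logical = 0;
        if (wall == last_.wall && wall == remote.wall) {
            logical = std::max(last_.logical, remote.logical) + 1;
        } else if (wall == last_.wall) {
            logical = last_.logical + 1;
        } else if (wall == remote.wall) {
            logical = remote.logical + 1;
        }
        last_ = {wall, logical};
        assert(before(prev, last_));
        return last_;
    }

private:
    Timestamp last_{0, 0};
};
```

If the local clock reads 100 and then 90, the two `tick` calls still return increasing timestamps, and once the physical clock catches up the logical counter resets to 0. That reset is the self-correcting part: a bad clock is absorbed rather than propagated as a reordering of events.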