Hacker News | ptx's comments

How is verification faster and easier? Normally you would check an article's citations to verify its claims, which still takes a lot of work, but an LLM can't cite its sources (it can fabricate a plausible list of fake citations, but this is not the same thing), so verification would have to involve searching from scratch anyway.

Because it gives you an answer and all you have to do is check its source. Often you don’t have to do that since you have jogged your memory.

Versus finding the answer by clicking into the first few search results links and scanning text that might not have the answer.


As I said, how are you going to check the source when LLMs can't provide sources? The models, as far as I know, don't store links to sources along with each piece of knowledge. At best they can plagiarize a list of references from the same sources as the rest of the text, which will by coincidence be somewhat accurate.

Pretty much every major LLM client has web search built in. They aren't just using what's in their weights to generate the answers.

When it gives you a link, it literally takes you to the part of the page that it got its answer from. That's how we can quickly validate.


LLMs provide sources every time I ask them.

They do it by going out and searching, not by storing a list of sources in their corpus.


have you ever tried examining the sources? they actually just invent many "sources" when requested to provide sources

When talking about LLMs as search engine replacements, I think the stark difference in utility people see stems from the use case. Are you perhaps talking about using it for more "deep research"?

Because when I ask chatgpt/perplexity things like "can I microwave a whole chicken" or "is Australia bigger than the moon" it will happily google for the answers and give me links to the sites it pulled from for me to verify for myself.

On the other hand, if you ask it to summarize the state of the art in quantum computing or something, it's much more likely to speak "off the top of its head", and even when it pulls in knowledge from web searches it'll rely much more on its own "internal corpus" to put together an answer, which is likely to contain hallucinations and obviously has no "source" aside from "it just knowing" (which it's discouraged from saying, so it makes up sources if you ask for them).


I haven't had a source invented in quite some time now.

If anything, I have the opposite problem. The sources are the best part. I have such a mountain of papers to read from my LLM deep searches that the challenge is in figuring out how to get through and organize all the information.


> So I had Claude Code do the rest of the investigation

And did you check whether or not what it produced was accurate? The article doesn't say.


Yes. And I shared the full transcript so you can see for yourself if you like: https://gistpreview.github.io/?edbd5ddcb39d1edc9e175f1bf7b9e...

I read through this to see if my AI cynicism needed any adjustment, and basically it replaced a couple basic greps and maaaaybe 10 minutes of futzing around with markdown. There's a lot of faffing about with JSON, but it ultimately doesn't matter to the end result.

It also fucked up several times and it's entirely possible it missed things.

For this specific thing, it doesn't really matter if it screwed up, since the worst that would happen is an incomplete blog post reporting on drama.

But I can't imagine why you would use this for anything you need to put your name behind.

It looks impressive, sure, but the important kernel here is the grepping and there it's doing some really basic tinkertoy stuff.

I'm willing to be challenged on this, so by all means do, but this seems both worse and slower as an investigation tool.


The hardest problem in computer science in 2025 is showing an AI cynic an example of LLM usage that they find impressive.

How about this one? I had Claude Code, run from my phone, build a dependency-free JavaScript interpreter in Python, using MicroQuickJS as initial inspiration but later diverging from it on the road to passing its test suite: https://static.simonwillison.net/static/2025/claude-code-mic...

Here's the latest version of that project, which I released as an alpha because I haven't yet built anything real on top of it: https://github.com/simonw/micro-javascript

Again, I built this on my phone, while engaging with all sorts of other pleasant holiday activities.


> For this specific thing, it doesn't really matter if it screwed up

These are specifically use cases where LLMs are a great choice. Where the stakes are low, and getting a hit is a win. For instance if you're brainstorming on some things, it doesn't matter if 99 suggestions are bad if 1 is great.

> the grepping and there it's doing some really basic tinkertoy stuff

The boon is you can offload this task and go do something else. You can start the investigation from your phone while you're out on a walk, and have the results ready when you get home.

I am far from an AI booster but there is a segment of tasks which fit into the above (and some other) criteria for which it can be very useful.

Maybe the grep commands etc. look simple/basic when laid bare, but there's likely to be some flailing and thinking time behind each command when doing it manually.


Hmm, you mention in the README that it only works in a privileged container. This of course negates the security benefits Wayland supposedly has over X11, so it doesn't seem ideal.

It really doesn't. The security benefits of Wayland are about isolating applications from each other, which this still has.

Also that's only really relevant to this running in a container. My point was that you can have headless Wayland.


Are you saying that you can't get the title of the active window in X11 without using some features specific to the X.Org implementation?

It looks like the core X11 protocol spec [1] defines all that's needed, specifically the GetInputFocus, QueryTree and GetProperty messages. You might also want some things from the EWMH spec [2] (e.g. _NET_WM_NAME for UTF-8 or _NET_WM_WINDOW_TYPE to identify top-level application windows) but none of this seems like an implementation-specific X.Org feature.

[1] https://x.org/releases/X11R7.7/doc/xproto/x11protocol.html

[2] https://specifications.freedesktop.org/wm/latest/
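For illustration, here's roughly what those three core-protocol requests look like on the wire, per the encoding in the protocol spec [1]. This is a byte-packing sketch only (in practice you'd use a library like python-xlib rather than hand-packing requests), and it assumes little-endian byte order, which the client negotiates at connection setup. The atom numbers used are the core protocol's predefined WM_NAME (39) and STRING (31); an EWMH atom like _NET_WM_NAME would first have to be looked up with an InternAtom request.

```python
import struct

def get_input_focus():
    # GetInputFocus: opcode 43, one unused byte, request length 1 (in 4-byte units)
    return struct.pack("<BBH", 43, 0, 1)

def query_tree(window):
    # QueryTree: opcode 15, unused byte, length 2, then the window whose
    # parent/children we want (used to walk up to the top-level window)
    return struct.pack("<BBHI", 15, 0, 2, window)

def get_property(window, prop=39, prop_type=31):
    # GetProperty: opcode 20, delete=0, length 6, then window, property atom,
    # requested type, long-offset, long-length. Defaults are the predefined
    # WM_NAME (39) and STRING (31) atoms.
    return struct.pack("<BBHIIIII", 20, 0, 6, window, prop, prop_type,
                       0, 0xFFFFFFFF)
```

Chaining these (focus window -> QueryTree up to the top-level -> GetProperty for the name) is all generic X11, nothing X.Org-specific.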


Open Source is the same thing as Free Software, just with a different name. The term "Open Source" was coined later to emphasize the business benefits instead of the rights and freedoms of the users, but the four freedoms of the Free Software Definition [1] and the ten criteria of the Open Source Definition [2] describe essentially the same thing.

[1] https://www.gnu.org/philosophy/free-sw.en.html

[2] https://opensource.org/osd


Well, the description is attributed to "two people with knowledge of the project" of unclear national origin.

Perhaps only in the case where it's preceded by the small tsu? E.g. "一人ぼっち" -> "hitori bo[tsu]chi" -> "hitori botchi"? That's what Wikipedia says [1], although I think it's also common to (incorrectly?) use "bocchi" instead.

[1] https://en.wikipedia.org/wiki/Hepburn_romanization#Long_cons...


This was the default window manager on Red Hat Linux (not RHEL) 5.0, if I recall correctly.

And the Windows 1.0 UI [1] looks really similar to Mac OS (especially dialogs and buttons), so apparently Microsoft pilfered their UI design from Steve Jobs's companies not only once but twice.

[1] https://www.pcjs.org/software/pcx86/sys/windows/1.00/


It's disconcerting that he's on the fence about this. They obviously seriously considered the option and worked out an estimate of how much money it would bring in. Would they have done it for $175 million?

It also seems at odds with the user being in control of their own data, which he says "there is something to be said about". Mozilla wouldn't be able to impose that sort of restriction on the user if the user were really in control, so I suppose that's why he only voices weak and vague support of user control.

