I've wondered before whether it's possible to unlearn facts, but retain the general "reasoning" capability that came from being trained on them, and then dimensionality-reduce the model.
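Roughly what I have in mind, as a toy sketch: a truncated SVD of a single weight matrix, keeping only the top singular directions. The matrix, the rank, and the hope that "facts" and "structure" separate along those directions are all assumptions here, not something any current model is known to satisfy.

    import numpy as np

    # Toy stand-in for one trained weight matrix; a real model has many of these.
    rng = np.random.default_rng(0)
    W = rng.normal(size=(512, 512))

    # Truncated SVD: keep only the top-k singular directions.
    k = 64
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    W_lowrank = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

    # The hope expressed above (an assumption, not an established fact): the
    # discarded directions carry mostly memorized specifics, while the kept
    # ones carry the general structure.
    print("parameter ratio:", (U[:, :k].size + k + Vt[:k, :].size) / W.size)
    print("relative error:", np.linalg.norm(W - W_lowrank) / np.linalg.norm(W))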
I don't know about AI, but it seems like that is what humans do.
We remember some facts, but I know that I, at least, have had a lot of facts pass through me and leave only their effects.
I once had some facts, did some reasoning, arrived at a conclusion, and only retained the conclusion and enough of the reasoning to identify other contexts where the same reasoning should apply. I no longer have the facts; I simply trust my earlier self's process of reasoning, and even that isn't really trust or faith, because I also still reason about new things today and observe the process.
But I also evolve. I don't just trust a former piece of reasoning, unchanged forever. It's just that when I do revisit something and basically "reproduce the other scientist's work", even if I arrive at a different conclusion today, I'm generally still okay with the earlier me's reasoning and conclusion. It stands up as reasonable, and the new conclusion is usually just tuned a little, not wildly opposite. Or some things do change radically, but I always knew they might, the way that in the process of self-discovery you try a lot of opposite things.
Getting a little away from the point, but the point is: I think the way we ourselves develop answer-generating rules is very much by retaining only the results (the developed rules) and not all the facts and steps of the work, at least much of the time. Certainly we remember some justifying/exemplifying facts to explain some of the things we do.
This presumes that LLMs actually contain "reasoning" capability other than "the facts" and simple interpolation between them.
It is far from clear that this presumption holds.
Ingesting more text than a human could read in several lifetimes produces the ability to interpolate answers to a surprisingly large range of questions. This isn't intuitive to us, because we've never met anybody who could read that much raw text, let alone memorize-and-interpolate over it. The only thing we've ever seen that can answer questions like this is a human using reasoning, so we assume that's what the thing on the other side of the screen must be doing. Like chimpanzees mistaking their own reflection in a mirror for another ape.
This is the other "bitter lesson" of ML.
But if you spend enough time playing with these models, you start to figure out what questions to ask to make them look foolish. And that is totally fair game for the Turing Test. Remember, there is no time limit on the Turing Test. However, there is a strict requirement that the machine under test cannot be serviced, modified, or updated while the test is underway -- and because of this, nothing that OpenAI has produced is even capable of taking the test. We know that OpenAI tweaks and tunes their models whenever they please, as many times a day as they like, and that they use discussions here on HN to feed that process. So stick to the models you can download.
It's uncontroversial to say that this sort of "massive interpolation using superhuman text ingestion" is exactly how modern machine translation models work, and they work extremely well. LLMs were created by taking a machine translation model, throwing away half of it, and then fiddling with the leftovers.
There is clearly a degree of abstraction -- consider that you can make up some game which creates words the LLM has never seen before and assigns them meanings, then ask the LLM to reason about them within the logic of the game, and it will do so at least somewhat successfully.
(much worse than it does with material it's seen before, for sure, but the fact that it does it at all shows there is some abstraction)
Now, whether that qualifies as "reasoning" is another question, but it may be a metaphysical one with little value for making the world a better place. :P
Whatever we call it, there clearly is some amount of emergent abstraction in the models which is useful for at least some applications (if many fewer than the hype suggests). Can that abstraction be isolated from the factual data that went into constructing it? If so, then perhaps we could have smaller models with better performance, or construct ways to amplify that "operating over abstraction" until it meets whatever bar you'd require to call it "reasoning", or at least becomes more useful along the way.
you can make up some game which creates words the LLM has never seen before and assigns them meaning
Tracking these sorts of "X means Y" mappings is precisely what the Q-K-V attention of a transformer (or rather, of Schmidhuber's Fast Weight Programmer) does. This particular capability isn't even learned -- it's programmed in by the human who wrote the model evaluation code!
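To make that concrete, here's single-head scaled dot-product attention in plain numpy -- a minimal sketch with made-up dimensions and toy "word"/"meaning" vectors, not anything from a real model:

    import numpy as np

    def attention(Q, K, V):
        # Each query row softly looks up the value rows whose keys it matches best.
        scores = Q @ K.T / np.sqrt(K.shape[-1])            # query-key match
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)     # softmax over keys
        return weights @ V                                 # weighted sum of values

    # Toy "X means Y" store: keys stand in for the made-up words, values for
    # the meanings assigned to them earlier in the context.
    rng = np.random.default_rng(1)
    d = 4
    K = rng.normal(size=(3, d))                 # the new words
    V = rng.normal(size=(3, d))                 # their assigned meanings
    q = K[1:2] + 0.1 * rng.normal(size=(1, d))  # a query resembling word #1

    out = attention(q, K, V)                    # close to V[1], the bound meaning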
Whatever we call it there clearly is some amount of emergent abstraction in the models
I really, genuinely question this. I see extremely-high-dimensional interpolation over an extremely large dataset. Take away the dataset and what's left is gradient descent. And the token embedding, I guess. I'm not sure how you would "unlearn" something (like King-Man+Woman=Queen) from the embedding, or even what that would mean.
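To be clear about what that example is: the analogy is literally vector arithmetic in the embedding space. A toy sketch with hand-built two-dimensional vectors (real learned embeddings are nothing this clean), just to show the operation -- and why "unlearning" it is hard to even define, since the relation lives in the geometry of the vectors rather than in any single entry:

    import numpy as np

    # Toy embeddings with a "royalty" axis and a "gender" axis; real word2vec /
    # GloVe vectors are learned and much higher-dimensional.
    emb = {
        "king":  np.array([0.9,  0.9]),
        "queen": np.array([0.9, -0.9]),
        "man":   np.array([0.1,  0.9]),
        "woman": np.array([0.1, -0.9]),
        "apple": np.array([-0.8, 0.0]),
    }

    target = emb["king"] - emb["man"] + emb["woman"]

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    # Nearest remaining word to the target comes out as "queen".
    best = max((w for w in emb if w not in {"king", "man", "woman"}),
               key=lambda w: cosine(emb[w], target))
    print(best)  # -> queen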
It doesn't have to be something that is directly solvable in K-V lookup style:
"In neothorpic algebra words for seasons take the place of even integers and words for food take the place of odd integers, arithmetic generally works as usual. What can you tell me about the result of summer + cake in neothorpic algebra?"
Perhaps we could agree that you can get pretty far -- further than people would have expected prior to LLMs -- with pretty dumb linguistic reasoning, and that that's mostly (or entirely) what the LLM is doing.
But how confident can we really be that our thinking is categorically different? :P
But how confident can we really be that our thinking is categorically different?
I know that humans are doing more than interpolating, because at the rate we read and for the typical lifespan we have, we simply cannot ingest enough text to perform the sorts of tasks we perform by simple interpolation.
I also know that whatever our brains are doing, it isn't backpropagation, nor is it even remotely related to it. Geoff Hinton, who helped popularize backpropagation, frequently points this out. Backpropagation is egregiously nonlocal.
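To spell out "nonlocal": in a two-layer network, the gradient for the first layer's weights depends on the second layer's weights and its error signal, i.e. on information that isn't available locally at the first synapse. A toy example with the gradients written out by hand (squared-error loss, tanh hidden layer; this illustrates the algorithm, not a claim about brains):

    import numpy as np

    rng = np.random.default_rng(2)
    x  = rng.normal(size=(1, 3))   # input
    W1 = rng.normal(size=(3, 4))   # first-layer weights
    W2 = rng.normal(size=(4, 2))   # second-layer weights
    y  = rng.normal(size=(1, 2))   # target

    # Forward pass: tanh hidden layer, linear output.
    h = np.tanh(x @ W1)
    out = h @ W2
    err = out - y                  # dL/dout for L = 0.5 * ||out - y||^2

    # Backward pass. Note that dW1 requires W2.T: updating the *first* layer
    # needs the *second* layer's weights and error -- the nonlocality in question.
    dW2 = h.T @ err
    dh  = err @ W2.T
    dW1 = x.T @ (dh * (1 - h**2))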
It could understand it and try to comply, but still fail to understand where it would leak data which can later be corroborated by someone. This is at least what commonly happens with human AGIs.
If you think of knowledge as a (knowledge) graph, it seems there would be some nodes with low centrality that you could drop without much effect, and other key ones that would have a bigger impact if lost.
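As a toy sketch of that idea (the graph, the choice of degree centrality, and the pruning threshold are all made up for illustration):

    import networkx as nx

    # Toy "knowledge graph": nodes are facts/concepts, edges are dependencies.
    G = nx.Graph()
    G.add_edges_from([
        ("arithmetic", "algebra"), ("algebra", "calculus"),
        ("arithmetic", "counting"), ("calculus", "physics"),
        ("baseball", "trivia_1987_world_series"),   # a low-centrality leaf
    ])

    centrality = nx.degree_centrality(G)

    # Hypothesis from the comment above: the least-central nodes could be
    # dropped (forgotten) with little effect on the overall structure.
    threshold = 0.2
    prunable = sorted(n for n, c in centrality.items() if c < threshold)
    print(prunable)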
Yes, me too. If it could somehow remember the “structure” instead of the instantiation. More “relationships between types of token relationships” instead of “relationships between tokens”.