Especially given that it references the Anthropic paper on LLM introspection, which confirms that LLMs are somewhat capable of reflecting on their own internal states - including past internal states, attached to past tokens and accessed through the attention mechanism. It's a weak and unreliable capability in today's LLMs, but a capability nonetheless.
I guess the earlier papers on the topic underestimated how much introspection the autoregressive transformer architecture permits in practice - and it'll take time for this newer research to set the record straight.
https://transformer-circuits.pub/2025/introspection/index.ht...