To build on a comment I made recently…
There are some terms he used that have a precise meaning in the industry, notably (tech/AI brethren in the comments, do not come after me for generalizing and simplifying; I’m speaking to those outside the field):
– persistence: durable storage for application state
– planning: an agent workflow that feeds RAG context to the LLM/frontier model and sets checkpoints that save (“persist”) the state of the response, so the user can resume, fork, or change course from a checkpoint using the state data from that point in time (rough sketch below)
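To make “persist a checkpoint” concrete, here is a minimal sketch in Python. Every name in it is a hypothetical stand-in (the JSON-file store and the save_checkpoint/load_checkpoint helpers are not any vendor’s actual API); the only point is that the agent’s state gets written to durable storage under an ID so it can be resumed or forked later.

```python
import json
import uuid
from pathlib import Path

STORE = Path("checkpoints")   # hypothetical durable store; a real app would use a database
STORE.mkdir(exist_ok=True)

def save_checkpoint(state: dict) -> str:
    """Persist the agent's state (messages, plan step, retrieved context) and return its ID."""
    checkpoint_id = str(uuid.uuid4())
    (STORE / f"{checkpoint_id}.json").write_text(json.dumps(state))
    return checkpoint_id

def load_checkpoint(checkpoint_id: str) -> dict:
    """Load a prior state so the user can resume or fork from that point in time."""
    return json.loads((STORE / f"{checkpoint_id}.json").read_text())

# Example: persist after a planning step, then fork from that saved state later.
cp = save_checkpoint({"plan_step": 3, "messages": ["...conversation so far..."], "retrieved_docs": ["plan.md"]})
forked = load_checkpoint(cp) | {"branch": "alternate approach"}
```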
What he is describing is a variant on the external RAG connector paradigm I mentioned in the final point in my earlier comment. The model is taking in information stored somewhere in the user’s application space (for example, a project plan with goals, outcomes, and good and bad examples of expected output), fed in via a planning/orchestration agentic process, so the model can take action based on the plan’s contents along with any other RAG sources or external (to the model) tools (like web search, schedulers, etc.). This is an expected part of the AI space transitioning from science experiment/magic trick to just another part of the application layer.
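A rough sketch of what that orchestration loop looks like is below. All the names (retrieve_from_workspace, call_model, the TOOLS registry) are placeholders I made up for illustration, not a real SDK; the stubs return canned values just so it runs.

```python
# Hypothetical orchestration loop: pull the user's project plan out of their
# application space, hand it to the model as context, and let the model choose
# external tools until it decides it can answer.

def retrieve_from_workspace(user_id: str, query: str) -> str:
    """Stand-in for a RAG connector that reads the user's own stored documents."""
    return "Project plan: goals, outcomes, good/bad examples of expected output."

def call_model(prompt: str) -> dict:
    """Stand-in for a call to an LLM/frontier model; a real one would return a structured next action."""
    return {"type": "finish", "answer": "stubbed answer grounded in the plan"}

TOOLS = {                                      # tools external to the model itself
    "web_search": lambda query: f"results for {query}",
    "scheduler": lambda task: f"scheduled {task}",
}

def run_planning_agent(user_id: str, goal: str, max_steps: int = 5) -> str:
    plan = retrieve_from_workspace(user_id, goal)
    history: list[tuple[str, str]] = []
    for _ in range(max_steps):
        action = call_model(f"Plan:\n{plan}\n\nGoal: {goal}\nHistory: {history}")
        if action["type"] == "finish":
            return action["answer"]            # model decides it has enough to answer
        result = TOOLS[action["tool"]](action["input"])
        history.append((action["tool"], result))
    return "step budget exhausted"

print(run_planning_agent("user-123", "Draft the Q3 rollout plan"))
```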
What you’re bringing up – hallucinations – is typically an artifact of the model not having enough context about what it is being asked, so it makes stuff up rather than halt. Bringing in more context via RAG (but not so much that you overload the model’s context window at each step of the interaction) is like 98% of the magic of using AI productively.
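The “enough context, but not too much” part usually boils down to something like this. It’s a sketch under assumptions I’m making up: the 4-characters-per-token heuristic and the pre-ranked chunk list are stand-ins, and real systems use the model’s actual tokenizer.

```python
def rough_token_count(text: str) -> int:
    # Crude heuristic (~4 characters per token); real systems use the model's tokenizer.
    return len(text) // 4

def build_context(chunks: list[str], budget_tokens: int = 3000) -> str:
    """Pack the highest-ranked retrieved chunks until the budget is hit,
    so the model gets grounding without blowing out its context window."""
    picked, used = [], 0
    for chunk in chunks:                 # chunks assumed pre-sorted by relevance
        cost = rough_token_count(chunk)
        if used + cost > budget_tokens:
            break
        picked.append(chunk)
        used += cost
    return "\n\n".join(picked)

# Usage: only the top-ranked chunks that fit the budget get sent to the model.
context = build_context(["most relevant chunk...", "next chunk...", "long tail..."])
```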
He’s not describing any new breakthrough. Pretty much all of the ‘serious’ end of dedicated AI tooling is beginning to roll out more sophisticated RAG-based tooling on top of their applications that use the frontier models. There is a bullwhip effect while everyone figures out how to use each new tier of tooling. Paradoxically, the more of these layers are added, the less the actual LLM/frontier model has to dig deep into its own training set to respond. Over time the enormous frontier models will probably die off as they are replaced with hyper-specific models for particular use cases, like coding (this is already starting to happen in arenas where the coding domain doesn’t have a lot of public data that could have fed the frontier models’ training sets, like FPGA/ASIC development).
Finally, to make something extremely clear: any serious company that implements this will build the user-associated persistence layer separate from the model, so each individual’s context files are segregated from everyone else’s (“multitenancy”), and a paid or enterprise plan usually comes with a legal agreement not to take that information and retrain the frontier models on it if the service is also a frontier model provider. The context is usually encrypted and passed to the model in a format the model can read but that is not human readable, so the provider can say they’re not stealing your context data or harvesting it for nefarious deeds (not all paid plans/services do this, which is why you have to read the fine print in the agreement). Any downstream provider who, say, buys a Microsoft service to build out their own application on top of this tech will use those private/secure endpoints and services, so there is a chain of legal accountability that lawyers down the chain can sign off on. None of this is provided to users of the free, public implementations of the chatbots, which is where a lot of the horror stories and bad experiences come from.
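In storage terms, that segregation is usually nothing more exotic than scoping every read and write by the tenant/user. Here is a sketch of the idea, not any provider’s actual schema; the encryption step is a labeled placeholder and the in-memory dict stands in for a real database.

```python
class TenantContextStore:
    """Sketch of a multitenant persistence layer: every record is keyed by tenant,
    so one customer's context files can never come back for another customer's query."""

    def __init__(self) -> None:
        self._data: dict[tuple[str, str], bytes] = {}   # (tenant_id, doc_id) -> encrypted blob

    def put(self, tenant_id: str, doc_id: str, text: str) -> None:
        blob = self._encrypt(tenant_id, text)           # placeholder for per-tenant encryption at rest
        self._data[(tenant_id, doc_id)] = blob

    def get(self, tenant_id: str, doc_id: str) -> str:
        # tenant_id is part of the key, so cross-tenant reads are structurally impossible.
        return self._decrypt(tenant_id, self._data[(tenant_id, doc_id)])

    def _encrypt(self, tenant_id: str, text: str) -> bytes:
        return text.encode()                            # stand-in; real systems use per-tenant keys/KMS

    def _decrypt(self, tenant_id: str, blob: bytes) -> str:
        return blob.decode()

# Usage: two customers' context files live side by side but are only reachable with their own tenant ID.
store = TenantContextStore()
store.put("acme-corp", "project-plan", "Q3 goals and examples...")
store.put("globex", "project-plan", "completely different plan...")
print(store.get("acme-corp", "project-plan"))
```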