Marina Danilevsky at IBM walks through why large language models get things confidently wrong, and a simple fix that makes them less likely to hallucinate or rely on outdated knowledge. It's a six-minute deep dive into how to ground a model in your actual data. The second half gets a touch technical, but the core idea is just: add a lookup step before the answer.
What this means for you
Here's the problem. A language model generates answers based purely on what it learned during training. It's confident, but that confidence isn't the same as accuracy. Ask it something it's never seen before, and it'll make something up rather than admit it doesn't know.
Marina gives a brilliant example: which planet has the most moons? Answering from memory, she confidently said Jupiter, with 88 moons, but the answer she then gives is Saturn with 146, and the count keeps changing as scientists find more. She had no source, and her knowledge was stale. Large language models do exactly the same thing.
This is where RAG comes in. Retrieval-Augmented Generation is the idea of plugging in a data source before you ask the model to answer. Instead of the model generating an answer only from what it knows, you first tell it to look up relevant information in a store of documents: your company policy, NASA's database, whatever fits the question. The retrieved information is then combined with the question, and the model generates its answer from both.
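The lookup step described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product does it: the `DOCUMENTS` list, the `retrieve` and `build_prompt` functions, and the sample refund and shipping text are all invented for the example, and the word-overlap scoring is a deliberately simple stand-in for a real search index. No model is called; the sketch stops at the prompt you would send.

```python
import re

# Hypothetical document store with invented sample text; in practice this
# would be a policy database, a search index, or similar.
DOCUMENTS = [
    "Refunds are available within 30 days of delivery.",
    "Standard shipping takes 3 to 5 business days.",
    "Saturn has 146 moons.",
]


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list, top_k: int = 1) -> list:
    """Return up to top_k documents that share the most words with the question."""
    question_words = tokenize(question)
    scored = [(len(question_words & tokenize(doc)), doc) for doc in documents]
    matches = [pair for pair in scored if pair[0] > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in matches[:top_k]]


def build_prompt(question: str, passages: list) -> str:
    """Combine the retrieved passages with the question into one prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say you don't know.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


question = "Which planet has the most moons?"
passages = retrieve(question, DOCUMENTS)
prompt = build_prompt(question, passages)
print(prompt)  # this text is what you would send to the model
```

The point of the design is the order of operations: retrieval happens first, and the model only ever sees the question together with whatever the retriever found.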
That makes a real difference. The model now has a source it can point to. It's less likely to hallucinate because it's not relying only on training data. And when new information arrives, you don't retrain the model. You just update the data store, and next time someone asks, they get the current answer. The model can also be instructed to say "I don't know" when something isn't in the data it can reach.
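To make the "update the store, not the model" point concrete, here is a toy sketch. Everything in it is an assumption made for illustration: `DocumentStore` is a made-up in-memory class, topics are matched by a single keyword, and `answer` is a stand-in for the model call that simply echoes the retrieved text with its source.

```python
class DocumentStore:
    """Toy in-memory store keyed by topic; a stand-in for a real database."""

    def __init__(self):
        self._docs = {}

    def upsert(self, topic: str, text: str) -> None:
        """Add a document, or replace the existing one for this topic."""
        self._docs[topic] = text

    def search(self, question: str) -> list:
        """Return (topic, text) pairs whose topic word appears in the question."""
        words = set(question.lower().replace("?", "").split())
        return [(topic, text) for topic, text in self._docs.items() if topic in words]


def answer(question: str, store: DocumentStore) -> str:
    """Stand-in for the model call: answer from the store or admit ignorance."""
    hits = store.search(question)
    if not hits:
        return "I don't know."
    topic, text = hits[0]
    return f"{text} (source: {topic})"


store = DocumentStore()
question = "Which planet has the most moons?"

store.upsert("moons", "Jupiter has the most moons, with 88.")  # stale entry
before = answer(question, store)

store.upsert("moons", "Saturn has the most moons, with 146.")  # update the store only
after = answer(question, store)

unknown = answer("What is the warranty period?", store)  # nothing stored on this

print(before)
print(after)
print(unknown)
```

Nothing about `answer` changes between the two calls; only the stored document does, which is why the second reply reflects the newer figure. The third question has no matching document, so the sketch falls back to "I don't know."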
The catch is that this only works if the retriever is good. If it can't find the right information from your data store, the model can't give you a real answer. So the practical part of RAG is making sure your data source is well organised and searchable.
Picture a support team answering questions about shipping and returns. Right now, agents confidently tell customers things that aren't quite right. With RAG, you feed the live policy database into the system. When a customer asks about refunds, the agent looks up the actual current rules before answering. No more stale guidance. If policy changes, the team updates the data store once and every agent gets the new rules in their next conversation.
Try this
Think about one place in your team where someone answers the same questions over and over from a policy manual or knowledge base. What if that knowledge source were wired directly into an agent? Pick that one thing and jot down what data source you'd plug in.
Common questions about RAG
What is RAG (retrieval-augmented generation) in simple terms?
RAG, short for Retrieval-Augmented Generation, is the idea of plugging a data source into an AI model before you ask it to answer. Instead of generating an answer purely from what it learned in training, the model first looks up relevant information from a store of documents you choose, then combines that with your question. In plain terms, it is just adding a lookup step before the answer.
Take a support desk answering refund questions. With RAG, the agent pulls the current refund rules from your policy store first, then answers, rather than going from memory.
How does RAG stop AI from making things up (hallucinating)?
Left to itself, a language model answers from memory and will confidently invent something rather than admit it does not know. A common example is saying Jupiter has the most moons when the current answer is Saturn. RAG reduces this because the model is no longer relying only on its training data, and it helps in two ways:
- An actual source: the model now has real information to draw from and point to, not just memory.
- More honest answers: it becomes more willing to say "I don't know" when something is not in the data it can reach.
Think of a support desk fielding refund questions. Instead of guessing from old training, the agent checks the live policy store, so it answers from the real rules or admits the rule is not there. It lowers the chance of made-up answers, but it does not promise the model will never be wrong.
Why connect an AI agent to your own data?
Connecting an agent to your own data means it answers from your current, real information rather than the stale knowledge it picked up in training. When something changes, you do not retrain the model. You update the data store once, and everyone gets the current answer next time they ask. Picture a support desk answering refund questions: wire the live policy store into the agent, and when the refund rules change you update that store once rather than re-briefing every agent.
Note: pick one place in your team where people answer the same questions from a policy manual or knowledge base, and note which data source you would wire in.
Why does RAG sometimes give wrong or no answers?
RAG only works if the retriever can actually find the right information, so the model is only as good as the data source behind it. If the store is messy or hard to search, the retriever may miss an answer that is genuinely in there, and the model cannot give a real response. Imagine a support desk whose refund policies sit in scattered, badly organised files: even with the right rule somewhere in there, the agent may fail to surface it, so the answer comes back wrong or blank.
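A small sketch of how this failure can look in practice. It assumes a deliberately naive keyword retriever; the `retrieve` function, the `STOPWORDS` set, the file titles, and the refund wording are all invented for illustration. The same rule is stored twice, once under an unhelpful title and once under a descriptive one.

```python
import re

# Very common words that say nothing about the topic of a question.
STOPWORDS = {"what", "is", "the", "a", "for", "of"}


def tokenize(text: str) -> set:
    """Lowercase the text and keep alphabetic words only."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question: str, documents: list) -> list:
    """Return documents sharing at least one meaningful word with the question."""
    question_words = tokenize(question) - STOPWORDS
    return [
        doc for doc in documents
        if question_words & tokenize(doc["title"] + " " + doc["body"])
    ]


# Invented sample rule, identical in both stores.
RULE = "Items may be sent back within 30 days for reimbursement."

messy_store = [{"title": "doc_final_v2", "body": RULE}]
organised_store = [{"title": "Refund policy", "body": RULE}]

question = "What is the refund policy?"
missed = retrieve(question, messy_store)      # the rule is there, but not found
found = retrieve(question, organised_store)   # a descriptive title makes it findable

print(len(missed), len(found))
```

In the messy store the rule exists but shares no wording with the question, so this retriever returns nothing and a model would have no grounding to work from. Real retrievers are more forgiving than this one, but the same principle applies: clear titles and consistent wording make the right document easier to surface.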
Note: make sure the knowledge source you plug in is well organised and searchable before you connect it to an agent.