November 18, 2025

RAG joins the agentic stack — make it enterprise-safe with private AI  

by Paul Hewitt, Global Head of Data and AI Practices, and Ryan Nowak, Offering Lead, Private AI



Recently you may have seen online commentary suggesting that enterprises are shifting away from traditional retrieval augmented generation (RAG) architectures due to risks associated with centralizing vector stores, broken access controls and latency.

But RAG is not dead. Rather, it’s continuing to be leveraged in agentic AI stacks. We are entering an era where RAG will combine with one agent doing this, another agent doing that, plus some multimodal interfaces and so on. It's part of the future of building enterprise-grade AI agents (rather than just models) under full governance, security and control.

Old RAG is now new RAG

RAG retrieves relevant documents from a knowledge base and feeds them into the LLM along with an “improved” user query, so answers are grounded in enterprise content rather than the model’s memory.

That function hasn’t gone away; it has become a key ingredient in a signature dish in which multiple agents coordinate tasks, tools and data sources.

RAG in the agentic AI mix supports the autonomous actions of these systems, providing verifiable, current context for decision making; enabling domain- and task-specific intelligence for adaptability; reducing hallucinations and ensuring traceability for security and governance; avoiding retraining for efficiency; and acting with situational awareness for autonomy.  

More benefits, less risk

This means you can leverage GenAI’s decision-making capacity without risking data breaches or unauthorized access. You gain unrestricted data volume and processing capabilities to support large data loads, real-time data processing and large-scale simulations, as well as AI model customization. In addition, you can change or fine-tune models to meet specific business challenges.

Working with a partner significantly reduces costs. This is particularly true for expenses involving inference, fine-tuning, deployment and maintenance, and for acquiring the expertise to comply with private AI infrastructure security regulations.


Map the work, model the agents

How to best leverage this maturing framework? Before you stand up agents, document the workflow the traditional way. Determine the steps, decisions, actions and data at each point.

  • Divide the process into clear stages, then assign agent roles to each stage. For example, in the case of an insurance agentic AI process, an intake agent extracts fields, a risk agent fetches credit signals, a pricing agent proposes terms and a reviewer agent checks compliance.
  • Decide which new agents you need, the decisions they should make and the data they’ll need. When data lives behind enterprise systems, expose it safely. Many teams implement model context protocol (MCP), an open protocol that enables models to communicate with data and tools (e.g., agents) through standard servers.  

Once the flow and agents are sketched, write evaluations and tests to prove the system meets expectations end-to-end. Organizations have already demonstrated how to do this at scale: Morgan Stanley evaluated GPT-4 against expert ground truth and restricted responses to internal content — a pattern that translates easily to agent validation and auditability.

 

Guardrails and guardians: keeping agents in bounds

Hard constraints matter. In finance, an interest rate can’t be negative; in underwriting, a decision needs a non-discriminatory rationale. Beyond hard rules, design qualitative guardrails and “guardian” agents that challenge or veto unsafe actions.

  • Codify hard limits and policy checks before actions execute.
  • Require explanations for decisions where law or policy demands it.
  • Add reviewer or watchdog agents to monitor agent outputs, callouts and tool use.

Latency is easy to test: set thresholds and fail fast. Leakage is trickier. If you use public models or services, you may have limited guarantees about how prompts and data are retained. Attacks like prompt injection can subvert instructions, and model inversion research shows that sensitive training data can sometimes be extracted from models.

For high-risk workloads, air-gapped or private deployments minimize exposure to potential security risks. When that’s impractical, enforce a strong private AI perimeter and centralize monitoring, scanning prompts in the same way you’d scan code for SQL injection, logging every agent action for forensics.

Putting new RAG knowledge to work

Context engineering evolved from what RAG began and, essentially, now encompasses it. RAG fetches facts into the prompt, providing extra context. Context engineering determines what goes into the prompt (instructions, templates, examples and retrieved facts), ensuring the outputs match your tone and risk posture (it also post-processes to augment, validate and guardrail the output). As agents accumulate session memory and tool state, the challenge expands from prompts to a whole-of-system “context architecture” that manages memory, retrieval, instructions and guardrails together.



In practice, for agents to safely take action, you’ll need platform capabilities that bundle secure deployment, workflow orchestration, MCP-based data access, evaluations, policy enforcement and LLMOps (which bring tracing, telemetry and cost controls to AI apps).

Many teams package these into platform-independent private AI solutions, such as DXC’s Private AI for the Enterprise, and workbenches[CN1] , allowing them to transition from prototypes to fully governed production systems.   

Here’s a practical first-90-day deployment plan:

  • Weeks 1–2: Pick one workflow with a clear ROI and bounded blast radius. Document steps, decisions and data. Define the decisions an agent may make versus those it may recommend.
  • Weeks 3–6: Build a minimal agent chain with MCP access to two data sources. Add hard policy checks and a reviewer agent.
  • Weeks 7–12: Instrument with LLMOps. Add evals and decision logging. Run a supervised pilot with human feedback to reinforce desired outcomes.

RAG was foundational; now it’s table stakes. The next step is designing an agentic, private AI that can act under your rules. Start by mapping a single workflow, exposing the right data via MCP, adding hard and qualitative guardrails, then instrumenting everything with LLMOps. LLMOps includes tools like Langfuse for observability and LiteLLM for multi-provider routing, centralizing logs, and managing latencies and spend. 































About the authors

Praveen Cherukuri is a chief technologist at DXC, leading AI-driven digital transformations for global enterprises. With deep expertise in scaling systems, cloud optimization, and AI strategy, he helps organizations accelerate growth and enhance efficiency. Passionate about innovation, he creates competitive advantages for DXC clients across industries.

Ryan Nowak is the Offering Lead, Private AI at DXC Technology, bringing over two decades of experience in product development, SaaS strategy and agile leadership. Previously holding senior roles at Centersquare, ONE Discovery, and Brainspace, he has led global teams in product innovation and strategy execution. Ryan specializes in private AI, product road mapping, and scalable enterprise solutions.