November 18, 2025
RAG joins the agentic stack — make it enterprise-safe with private AI
by Paul Hewitt, Global Head of Data and AI Practices, and Ryan Nowak, Offering Lead, Private AI
Recently you may have seen online commentary suggesting that enterprises are shifting away from traditional retrieval-augmented generation (RAG) architectures due to risks associated with centralized vector stores, broken access controls and latency.
But RAG is not dead. Rather, it is being folded into agentic AI stacks. We are entering an era where RAG combines with specialized agents, multimodal interfaces and orchestration layers. It's part of the future of building enterprise-grade AI agents (rather than just models) under full governance, security and control.
RAG retrieves relevant documents from a knowledge base and feeds them into the LLM along with an “improved” user query, so answers are grounded in enterprise content rather than the model’s memory.
That function hasn’t gone away; it has become a key ingredient in a signature dish in which multiple agents coordinate tasks, tools and data sources.
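To make that concrete, here is a minimal, dependency-free sketch of the retrieval loop described above. The knowledge base, the keyword-overlap scoring and the stubbed call_llm function are illustrative assumptions; production systems use vector embeddings and a real model endpoint.

```python
# A dependency-free sketch of the RAG loop: retrieve relevant documents,
# then ground the LLM's answer in them. Scoring is naive keyword overlap;
# real systems use vector embeddings. `call_llm` is a stand-in endpoint.
KNOWLEDGE_BASE = {
    "hr/leave.md": "Employees accrue 20 days of paid leave per year.",
    "it/vpn.md": "Connect to the VPN before accessing internal systems.",
    "finance/expenses.md": "Expenses over $500 need manager approval.",
}

def call_llm(prompt: str) -> str:
    # Stand-in for your model endpoint.
    return f"(stubbed LLM response to)\n{prompt}"

def retrieve(query: str, k: int = 2) -> list[str]:
    words = set(query.lower().split())
    scored = sorted(
        KNOWLEDGE_BASE.items(),
        key=lambda item: len(words & set(item[1].lower().split())),
        reverse=True,
    )
    return [f"[{doc_id}] {text}" for doc_id, text in scored[:k]]

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return call_llm(prompt)  # grounded in enterprise content, not model memory

print(answer("How many days of paid leave do employees get?"))
```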
RAG in the agentic AI mix supports the autonomous actions of these systems by:
- providing verifiable, current context for decision making
- enabling domain- and task-specific intelligence for adaptability
- reducing hallucinations and ensuring traceability for security and governance
- avoiding retraining for efficiency
- acting with situational awareness for autonomy
Running these agentic RAG workloads on private AI infrastructure means you can leverage GenAI’s decision-making capacity without risking data breaches or unauthorized access. You gain the data volume and processing capability to support large data loads, real-time data processing and large-scale simulations, and you can customize, change or fine-tune models to meet specific business challenges.
Working with a partner significantly reduces costs. This is particularly true for expenses involving inference, fine-tuning, deployment and maintenance, and for acquiring the expertise to keep private AI infrastructure secure and compliant.
How best to leverage this maturing framework? Before you stand up agents, document the workflow the traditional way: determine the steps, decisions, actions and data at each point.
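One lightweight way to capture that documentation is as structured data the team can review and later test against. The schema and the loan-review steps below are illustrative assumptions, not a prescribed format:

```python
# A minimal sketch of documenting a workflow as data before building agents.
# The step names and fields are illustrative assumptions, not a fixed schema.
from dataclasses import dataclass, field

@dataclass
class WorkflowStep:
    name: str                    # what happens at this point
    decision: str                # the decision taken, if any
    actions: list[str] = field(default_factory=list)   # actions performed
    data_in: list[str] = field(default_factory=list)   # data consumed
    data_out: list[str] = field(default_factory=list)  # data produced

loan_review = [
    WorkflowStep(
        name="intake",
        decision="is the application complete?",
        actions=["validate fields", "request missing documents"],
        data_in=["application form"],
        data_out=["validated application"],
    ),
    WorkflowStep(
        name="risk assessment",
        decision="approve, refer or decline",
        actions=["pull credit report", "compute debt-to-income ratio"],
        data_in=["validated application", "credit bureau data"],
        data_out=["risk score", "rationale"],
    ),
]

for step in loan_review:
    print(f"{step.name}: decides '{step.decision}' using {step.data_in}")
```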
Once the flow and agents are sketched, write evaluations and tests to prove the system meets expectations end-to-end. Organizations have already demonstrated how to do this at scale: Morgan Stanley evaluated GPT-4 against expert ground truth and restricted responses to internal content — a pattern that translates easily to agent validation and auditability.
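Here is a sketch of what such an end-to-end evaluation harness might look like, assuming a hypothetical run_agent entry point; the ground-truth cases, keywords and source prefixes are placeholders for your own evaluation set:

```python
# A minimal evaluation harness in the spirit of the Morgan Stanley pattern:
# score agent answers against expert ground truth and fail any answer that is
# not grounded in approved internal content. `run_agent` is a hypothetical
# stand-in for your agent stack.
def run_agent(question: str) -> dict:
    # Placeholder: call your agents and return the answer plus the
    # document IDs they actually retrieved from.
    return {"answer": "...", "sources": ["kb://policies/retention.md"]}

EVAL_CASES = [
    {
        "question": "What is our data retention period?",
        "expected_keywords": ["seven years"],
        "allowed_sources": ["kb://policies/"],
    },
]

def evaluate(cases: list[dict]) -> float:
    passed = 0
    for case in cases:
        result = run_agent(case["question"])
        grounded = all(
            any(src.startswith(prefix) for prefix in case["allowed_sources"])
            for src in result["sources"]
        )
        correct = all(k in result["answer"].lower() for k in case["expected_keywords"])
        passed += grounded and correct
    return passed / len(cases)

print(f"pass rate: {evaluate(EVAL_CASES):.0%}")
```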
Hard constraints matter. In finance, an interest rate can’t be negative; in underwriting, a decision needs a non-discriminatory rationale. Beyond hard rules, design qualitative guardrails and “guardian” agents that challenge or veto unsafe actions.
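Here is a minimal sketch of how hard rules and a guardian veto might be layered in code. The rule contents and risk threshold are illustrative assumptions; in practice a guardian could be a second model or a human reviewer:

```python
# Hard-rule guardrails plus a "guardian" veto hook, assuming a proposed
# action is represented as a plain dict. Rules shown are illustrative.
class GuardrailViolation(Exception):
    pass

def hard_rules(action: dict) -> None:
    # Hard constraints: fail loudly, never silently clamp.
    if action.get("interest_rate", 0.0) < 0.0:
        raise GuardrailViolation("interest rate cannot be negative")
    if action.get("type") == "underwriting_decision" and not action.get("rationale"):
        raise GuardrailViolation("underwriting decision requires a rationale")

def guardian_review(action: dict) -> bool:
    # Qualitative check: a stand-in that vetoes anything above a risk threshold.
    return action.get("risk_score", 0.0) <= 0.8

def execute(action: dict) -> None:
    hard_rules(action)                  # hard constraints first
    if not guardian_review(action):     # then the guardian's veto
        raise GuardrailViolation("guardian agent vetoed the action")
    print(f"executing: {action['type']}")

execute({"type": "quote", "interest_rate": 0.045, "risk_score": 0.3})
```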
Latency is easy to test: set thresholds and fail fast. Leakage is trickier. If you use public models or services, you may have limited guarantees about how prompts and data are retained. Attacks like prompt injection can subvert instructions, and model inversion research shows that sensitive training data can sometimes be extracted from models.
For high-risk workloads, air-gapped or private deployments minimize exposure. When that’s impractical, enforce a strong private AI perimeter and centralize monitoring: scan prompts the same way you’d scan code for SQL injection, and log every agent action for forensics.
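A minimal sketch of those perimeter checks follows, combining a fail-fast latency budget, a naive prompt scan and a forensic action log. The injection patterns and the two-second budget are illustrative assumptions; production scanners are considerably more sophisticated:

```python
# Perimeter checks: a latency budget that fails fast, a naive prompt scan
# analogous to screening code for SQL injection, and a forensic log of
# every model call. Patterns and threshold are illustrative assumptions.
import logging
import re
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")

INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?previous instructions", re.I),
    re.compile(r"reveal your system prompt", re.I),
]
LATENCY_BUDGET_S = 2.0

def scan_prompt(prompt: str) -> None:
    for pattern in INJECTION_PATTERNS:
        if pattern.search(prompt):
            raise ValueError(f"prompt rejected: matched {pattern.pattern!r}")

def guarded_call(prompt: str, call_model) -> str:
    scan_prompt(prompt)
    start = time.monotonic()
    answer = call_model(prompt)         # your model or agent endpoint
    elapsed = time.monotonic() - start
    if elapsed > LATENCY_BUDGET_S:
        raise TimeoutError(f"latency {elapsed:.2f}s exceeded budget")
    logging.info("action=model_call latency=%.2fs prompt_hash=%s",
                 elapsed, hash(prompt))  # forensic trail
    return answer

print(guarded_call("Summarize our travel policy.", lambda p: "stubbed answer"))
```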
Context engineering grew out of what RAG began and now, essentially, encompasses it. RAG fetches facts into the prompt to provide extra context. Context engineering determines everything that goes into the prompt (instructions, templates, examples and retrieved facts), ensuring the outputs match your tone and risk posture, and it post-processes outputs to augment, validate and guardrail them. As agents accumulate session memory and tool state, the challenge expands from prompts to a whole-of-system “context architecture” that manages memory, retrieval, instructions and guardrails together.
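As a sketch of that idea, the function below assembles instructions, examples and retrieved facts under a context budget, then post-processes the output against a simple citation guardrail. The budget, instructions and checks are illustrative assumptions:

```python
# Context assembly: choose what enters the prompt (instructions, examples,
# retrieved facts) under a budget, then validate the output.
# The budget, instructions and citation check are illustrative assumptions.
INSTRUCTIONS = "Answer in a formal tone. Cite a source for every claim."
EXAMPLES = ["Q: What is our refund window?\nA: 30 days [source: kb://policy/refunds]."]
CONTEXT_BUDGET = 4000  # characters; in practice you would budget tokens

def build_context(question: str, retrieved_facts: list[str]) -> str:
    parts = [INSTRUCTIONS, *EXAMPLES]
    for fact in retrieved_facts:  # facts assumed ranked best-first
        if sum(len(p) for p in parts) + len(fact) > CONTEXT_BUDGET:
            break  # drop the lowest-ranked facts once over budget
        parts.append(fact)
    parts.append(f"Question: {question}")
    return "\n\n".join(parts)

def postprocess(answer: str) -> str:
    # Guardrail the output so it matches the required risk posture.
    if "[source:" not in answer:
        return "Unable to answer: no citable source found."
    return answer

prompt = build_context(
    "What is our refund window?",
    ["Refunds are accepted within 30 days [source: kb://policy/refunds]."],
)
print(prompt)
print(postprocess("Refunds are accepted within 30 days."))  # fails the check
```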
In practice, for agents to safely take action, you’ll need platform capabilities that bundle secure deployment, workflow orchestration, MCP-based data access, evaluations, policy enforcement and LLMOps (which bring tracing, telemetry and cost controls to AI apps).
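For the MCP-based data access piece, here is a minimal sketch using the FastMCP helper from the official MCP Python SDK (pip install "mcp[cli]"). The tool name and the in-memory policy store are illustrative assumptions standing in for real enterprise sources:

```python
# Exposing enterprise data to agents over MCP via the FastMCP helper from
# the official Python SDK. The tool and policy store are illustrative.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("enterprise-knowledge")

POLICIES = {
    "retention": "Customer records are retained for seven years.",
    "refunds": "Refunds are accepted within 30 days of purchase.",
}

@mcp.tool()
def lookup_policy(topic: str) -> str:
    """Return the policy text for a topic, or a not-found message."""
    return POLICIES.get(topic.lower(), f"No policy found for {topic!r}.")

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio for an MCP-capable agent host
```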
Many teams package these into platform-independent private AI solutions and workbenches, such as DXC’s Private AI for the Enterprise, allowing them to transition from prototypes to fully governed production systems.
Here’s a practical first-90-day deployment plan:
- Days 1–30: Document a single workflow end to end, then write the evaluations and tests that define success.
- Days 31–60: Expose the required data and tools via MCP, and add hard constraints, qualitative guardrails and guardian agents.
- Days 61–90: Instrument everything with LLMOps (tracing, telemetry, cost controls) and promote the system to governed production.
RAG was foundational; now it’s table stakes. The next step is designing an agentic, private AI stack that can act under your rules. Start by mapping a single workflow, exposing the right data via MCP, adding hard and qualitative guardrails, then instrumenting everything with LLMOps. The LLMOps layer includes tools like Langfuse for observability and LiteLLM for multi-provider routing, centralizing logs and managing latency and spend.
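As a sketch of that wiring, assuming provider API keys are configured via environment variables and both packages are installed (pip install litellm langfuse), every call is routed through LiteLLM with Langfuse tracing attached as a callback; the model name is just an example:

```python
# LLMOps wiring: LiteLLM for multi-provider routing with Langfuse tracing
# attached as a callback. Model name is an example; API keys are assumed
# to be set via environment variables.
import litellm

# Send traces of every success and failure to Langfuse for observability.
litellm.success_callback = ["langfuse"]
litellm.failure_callback = ["langfuse"]

def ask(prompt: str, model: str = "gpt-4o-mini") -> str:
    # LiteLLM normalizes many providers behind one OpenAI-style interface,
    # so switching providers is a one-line model-name change.
    response = litellm.completion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        timeout=10,  # fail fast against the latency budget
    )
    return response.choices[0].message.content

print(ask("Summarize why RAG still matters in agentic stacks."))
```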
Praveen Cherukuri is a chief technologist at DXC, leading AI-driven digital transformations for global enterprises. With deep expertise in scaling systems, cloud optimization, and AI strategy, he helps organizations accelerate growth and enhance efficiency. Passionate about innovation, he creates competitive advantages for DXC clients across industries.
Ryan Nowak is the Offering Lead, Private AI at DXC Technology, bringing over two decades of experience in product development, SaaS strategy and agile leadership. Previously holding senior roles at Centersquare, ONE Discovery, and Brainspace, he has led global teams in product innovation and strategy execution. Ryan specializes in private AI, product road mapping, and scalable enterprise solutions.