Core Design Principles

These principles should drive every agentic AI system. They are not aspirational goals. They are engineering requirements that determine whether your system fails safely or fails catastrophically.

3.1 Assume Prompt Injection and Model Compromise

Treat any untrusted text as potentially adversarial. That includes user prompts, RAG-retrieved content, tool outputs (emails, webpages, files), and inter-agent messages. All of it.

Design your system so that even if the model "breaks character" and follows malicious instructions, it still cannot:

  • Directly execute code, SQL, or HTTP requests
  • Bypass authorization checks
  • Access data outside its policy scope
  • Trigger high-risk actions without external verification

This is the core Zero Trust assumption for agentic AI. The model will eventually be tricked. Your architecture needs to handle that gracefully.
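The Zero Trust assumption can be made concrete with a mediation gate: the model's proposed action is treated as untrusted input and validated against a static policy before anything executes. This is a minimal sketch; the names (`ALLOWED_ACTIONS`, `gate`, the action/param shapes) are illustrative assumptions, not any specific framework's API.

```python
# A compromised model can emit anything; this gate decides what runs.
ALLOWED_ACTIONS = {
    "search_faq": {"query"},      # read-only, low risk
    "summarize_doc": {"doc_id"},  # read-only, low risk
}

def gate(proposed: dict) -> dict:
    """Validate a model-proposed tool call. Fail closed on anything unexpected."""
    action = proposed.get("action")
    params = proposed.get("params", {})
    if action not in ALLOWED_ACTIONS:
        return {"allowed": False, "reason": f"unknown action: {action!r}"}
    extra = set(params) - ALLOWED_ACTIONS[action]
    if extra:
        return {"allowed": False, "reason": f"unexpected params: {sorted(extra)}"}
    return {"allowed": True, "reason": "ok"}

# Even if the model "breaks character" and emits this, the gate rejects it:
malicious = {"action": "run_sql", "params": {"query": "DROP TABLE users"}}
print(gate(malicious))  # blocked, regardless of what the model was tricked into
```

Note that the gate never inspects the model's reasoning, only its output: the policy holds even when the model itself is fully compromised.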

3.2 Orchestration Layer as Policy Brain

The model is a suggestion engine, not the authority. It suggests actions. It does not decide what actually happens.

The orchestration layer - your backend code - must:

  • Authenticate the user and agent
  • Enforce RBAC/ABAC and data access policies
  • Decide which tools may be used, with what parameters
  • Apply guardrails before and after model calls
  • Log and audit every step

Model outputs are advisory. Only deterministic code and policies should decide what gets executed. If your LLM can call a tool without the orchestration layer approving it, you have a gap.
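Closing that gap means every tool call flows through a deterministic authorization check before dispatch. A minimal sketch, assuming a simple role-to-tool mapping (`ROLE_TOOLS`, `dispatch`, and the suggestion shape are all illustrative):

```python
# The orchestration layer owns this table; the model never sees or edits it.
ROLE_TOOLS = {
    "support_agent": {"lookup_order", "draft_reply"},
    "analyst": {"run_report"},
}

def authorize_tool_call(user_role: str, tool: str) -> bool:
    """Deterministic RBAC check. Model output cannot bypass it."""
    return tool in ROLE_TOOLS.get(user_role, set())

def dispatch(user_role: str, suggestion: dict, tools: dict):
    """Execute a model suggestion only after the policy check passes."""
    tool = suggestion["tool"]
    if not authorize_tool_call(user_role, tool):
        raise PermissionError(f"{user_role} may not call {tool}")
    return tools[tool](**suggestion.get("args", {}))
```

The key property: `dispatch` is the only path to a tool, and it consults policy first. If the model can reach `tools[tool]` any other way, the orchestration layer is decorative.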

3.3 Constrained Agency and Least Privilege

Give each agent:

  • A narrow mandate (e.g., "support FAQ summarizer," "sales analytics reporter")
  • Minimal tools needed to fulfill that mandate
  • Minimal data visibility - per-tenant, per-role, per-task

Never give an agent:

  • Direct access to production SQL
  • Generic "run any HTTP request" tools
  • Broad "execute arbitrary code" capabilities without sandboxing

High-risk operations must require a separate step: human approval or a dedicated privileged service that enforces its own checks. The agent proposes; something else disposes.
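"The agent proposes; something else disposes" can be sketched as a proposal handler that executes only low-risk tools inside the agent's mandate and queues everything high-risk for a separate approver. The tool names, `HIGH_RISK` set, and queue are illustrative assumptions:

```python
# High-risk operations are never executed inline by the agent.
HIGH_RISK = {"issue_refund", "delete_record"}

def handle_proposal(agent_tools: set, proposal: dict, approval_queue: list) -> dict:
    """Route a proposal: deny, queue for approval, or execute."""
    tool = proposal["tool"]
    if tool not in agent_tools:
        return {"status": "denied", "reason": "outside agent mandate"}
    if tool in HIGH_RISK:
        approval_queue.append(proposal)  # a human or privileged service decides later
        return {"status": "pending_approval"}
    return {"status": "executed"}        # low-risk path runs immediately
```

In a real system the approval queue would feed a human review UI or a dedicated privileged service that re-checks the request with its own credentials, rather than trusting the agent's.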

3.4 Separation of Concerns

Structure your system into clearly separated layers:

  • UI/Presentation - AuthN, UX, basic validation, rendering
  • Orchestration/Policy - Context assembly, policy enforcement, tool mediation
  • Model Inference - Prompt building, LLM invocation
  • Tool & Data Access - APIs, MCP, database, storage
  • Observability & Governance - Central logging, audit, monitoring

Concentrate security logic in the orchestration and tools/data layers. This lets you swap models without having to redo your security architecture. If your security depends on a specific model behaving a specific way, it is fragile.
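One way to express those boundaries in code is with plain interfaces: the model layer is swappable, while policy stays in the orchestration and tool-access layers. This is a sketch under assumed names (`Model`, `ToolGateway`, `Orchestrator`); it is the shape of the separation, not a specific framework:

```python
from typing import Protocol

class Model(Protocol):
    def suggest(self, prompt: str) -> dict: ...   # advisory output only

class ToolGateway(Protocol):
    def call(self, tool: str, args: dict) -> dict: ...  # validates and enforces access

class Orchestrator:
    """Security logic lives here and in the gateway, never in the model."""
    def __init__(self, model: Model, gateway: ToolGateway):
        self.model = model
        self.gateway = gateway

    def run(self, prompt: str) -> dict:
        suggestion = self.model.suggest(prompt)  # untrusted suggestion
        # Policy enforcement happens in the gateway, independent of which
        # model produced the suggestion.
        return self.gateway.call(suggestion["tool"], suggestion["args"])
```

Swapping GPT for Claude, or a hosted model for a local one, changes only the `Model` implementation; the security-relevant code paths are untouched.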

3.5 Least Data and Purpose Limitation

For each step, send the model only what it needs for that task. Nothing more.

Avoid sending:

  • Whole raw tables or entire documents when a summary suffices
  • PII, PHI, or financial details unless absolutely required
  • Secrets and credentials (never send these to the model at all)

Prefer:

  • Aggregated or masked views (counts, statistics)
  • Domain-specific APIs that abstract raw data

The goal is to minimize the blast radius if a prompt is compromised. If the model never sees the data, the data cannot be exfiltrated through the model.
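A minimal sketch of an "aggregated or masked view": instead of handing raw rows (with emails and amounts) to the model, the orchestration layer reduces them to the statistics the task needs. The record fields are illustrative assumptions:

```python
def masked_summary(orders: list[dict]) -> dict:
    """Reduce raw order records to aggregates; no PII leaves this function."""
    return {
        "order_count": len(orders),
        "total_amount": sum(o["amount"] for o in orders),
    }

raw = [
    {"email": "alice@example.com", "amount": 120},
    {"email": "bob@example.com", "amount": 80},
]
print(masked_summary(raw))  # {'order_count': 2, 'total_amount': 200}
```

The model receives two numbers instead of two customers. Even a fully successful prompt injection can only exfiltrate what made it into the context.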

3.6 Defense-in-Depth and Fail-Safe Defaults

Layer protections at every boundary:

  • UI - Input limits, CSRF protection, XSS protection
  • Orchestration - AuthZ, rate limits, guardrails
  • Tools - Schema validation, business logic checks
  • Data - Row/document-level access, encryption, retention policies
  • Infrastructure - Sandboxing, network policies, monitoring

If a check fails or a classifier is uncertain, default to block or escalate - not "allow and hope." For important flows, degrade functionality (e.g., switch to read-only mode) rather than failing open. Every layer should assume the layers above it may have already been bypassed.

3.7 Governance and Traceability

You must be able to answer these questions for any impactful action your system takes:

  • Which user initiated it?
  • Which tenant?
  • Which agent (and which version)?
  • What prompts, context, and tools were involved?
  • What data was touched?
  • What changed?

To support this, you need:

  • Agents with assigned owners, purposes, and risk classifications
  • All high-risk operations logged in immutable, append-only stores
  • Documented runbooks and owners for AI-specific incidents

If you cannot reconstruct what happened and why, you cannot investigate incidents, satisfy auditors, or improve your system. Traceability is not optional - it is a hard requirement.
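An audit record that answers the questions above might look like the following sketch. The field names are illustrative, and in production the records would go to an immutable, append-only store; the digest simply makes tampering with a stored record detectable:

```python
import hashlib
import json
import time

def audit_record(user, tenant, agent, agent_version, tools, data_refs, change) -> dict:
    """Build one auditable record for an impactful action."""
    rec = {
        "ts": time.time(),
        "user": user,                 # which user initiated it
        "tenant": tenant,             # which tenant
        "agent": agent,               # which agent...
        "agent_version": agent_version,  # ...and which version
        "tools": tools,               # tools involved
        "data_refs": data_refs,       # data touched (references, not contents)
        "change": change,             # what changed
    }
    # Digest over the canonicalized record; chain these for tamper evidence.
    payload = json.dumps(rec, sort_keys=True).encode()
    rec["digest"] = hashlib.sha256(payload).hexdigest()
    return rec
```

Storing `data_refs` as references rather than payloads keeps the audit log itself from becoming a PII liability.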

Need help applying these principles?

We review agentic AI architectures and help teams build security in from the start. If you are deploying agents in production, let's talk.

Get in touch