Chapter 10

Infrastructure and Sandboxing

Infrastructure security is the foundation for every other control in this guide. Isolate components, harden containers, and use cloud-native security features. Without solid infrastructure, your guardrails, access controls, and monitoring are built on sand.

10.1 Execution Isolation

Not all agent components carry the same risk. Differentiate your isolation strategy based on what each component actually does.

Standard Services

The orchestrator, model gateway, and most tool services fall into this category. Apply standard container best practices:

Run as non-root users
Drop unnecessary Linux capabilities
Use read-only root file systems where possible
Scan images regularly and patch known vulnerabilities
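
The first three practices above can be checked mechanically. The following is a minimal sketch, assuming each container's settings are available as a dictionary shaped like a Kubernetes container `securityContext`. The function name `audit_security_context` is hypothetical, and image scanning is left out because it depends on an external scanner.

```python
from typing import Any


def audit_security_context(ctx: dict[str, Any]) -> list[str]:
    """Return a list of findings for one container's securityContext.

    `ctx` is assumed to mirror the Kubernetes securityContext fields
    runAsNonRoot, capabilities.drop, and readOnlyRootFilesystem.
    """
    findings: list[str] = []

    # Run as a non-root user.
    if ctx.get("runAsNonRoot") is not True:
        findings.append("runAsNonRoot is not set to true")

    # Drop unnecessary Linux capabilities; this sketch expects drop: ["ALL"].
    dropped = ctx.get("capabilities", {}).get("drop", [])
    if "ALL" not in dropped:
        findings.append("capabilities.drop does not include ALL")

    # Use a read-only root file system where possible.
    if ctx.get("readOnlyRootFilesystem") is not True:
        findings.append("readOnlyRootFilesystem is not set to true")

    return findings


hardened = {
    "runAsNonRoot": True,
    "capabilities": {"drop": ["ALL"]},
    "readOnlyRootFilesystem": True,
}
unconfigured: dict[str, Any] = {}

print(audit_security_context(hardened))
print(audit_security_context(unconfigured))
```

A check like this could run in CI against rendered manifests. Services that genuinely need a writable root file system or a specific capability would need a documented exception rather than a silent pass.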

High-Risk Tools

Code execution, parsing of untrusted documents and binary files, and browser automation are a different story. These need extra isolation:

Sandbox runtimes - Use a sandboxed runtime such as gVisor, or a lightweight VM such as Firecracker or Kata Containers, to contain execution
No network access by default - High-risk tools should be network-isolated unless there is a specific, documented reason to allow connectivity
Strict resource limits - Enforce hard caps on CPU, memory, and wall-clock time to prevent denial-of-service
Ephemeral environments - Purge the environment after each run. No persistent state, no leftover artifacts
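
One way to combine the four controls above is to launch each high-risk tool in a throwaway container. The sketch below is illustrative only. It assumes Docker is the container engine and that gVisor is installed and registered under the runtime name `runsc`. The function names, default limits, and the image name in the usage line are hypothetical.

```python
import subprocess
import uuid


def build_sandbox_command(image: str, argv: list[str], *, name: str,
                          runtime: str = "runsc", memory: str = "256m",
                          cpus: str = "0.5", pids_limit: int = 64) -> list[str]:
    """Build a `docker run` command line for one isolated, single-use run."""
    return [
        "docker", "run",
        "--rm",                        # ephemeral: remove the container on exit
        f"--name={name}",              # named so it can be killed on timeout
        f"--runtime={runtime}",        # sandbox runtime (assumed to be installed)
        "--network=none",              # no network access by default
        f"--memory={memory}",          # hard memory cap
        f"--cpus={cpus}",              # CPU cap
        f"--pids-limit={pids_limit}",  # limit process count (fork bombs)
        "--read-only",                 # read-only root file system
        "--cap-drop=ALL",              # drop all Linux capabilities
        image, *argv,
    ]


def run_sandboxed(image: str, argv: list[str],
                  timeout_s: float = 10.0) -> subprocess.CompletedProcess:
    """Run the tool with a wall-clock limit; requires a local Docker daemon."""
    name = f"sandbox-{uuid.uuid4().hex[:12]}"
    cmd = build_sandbox_command(image, argv, name=name)
    try:
        return subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Stopping the docker CLI does not stop the container, so kill it by name.
        subprocess.run(["docker", "kill", name], capture_output=True)
        raise


print(" ".join(build_sandbox_command("python:3.12-slim",
                                     ["python", "-c", "print(1)"],
                                     name="sandbox-demo")))
```

Because the container is started with `--rm` and no volumes, nothing written during the run survives it. If a tool needs scratch space, a size-limited temporary file system is a safer choice than a shared volume.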

10.2 Kubernetes and Service Mesh

If you are running agents on Kubernetes, use the platform's isolation features deliberately:

Namespace separation - Separate agent workloads, core services, and tool services into distinct namespaces
NetworkPolicies - Use NetworkPolicies (or service mesh authorization) so that only approved services can call the model gateway and tools. Agents should not be able to talk directly to databases or internal admin services
Service-to-service authentication - Use mTLS or JWTs for all internal calls. Every service should prove its identity to every other service
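
To make the NetworkPolicy item concrete, the sketch below generates a policy that admits traffic to the model gateway only from approved namespaces. It assumes the cluster's network plugin enforces NetworkPolicies and that namespaces carry the standard `kubernetes.io/metadata.name` label. The namespace names, pod label, and policy name are hypothetical.

```python
import json


def gateway_ingress_policy(namespace: str, gateway_labels: dict[str, str],
                           allowed_namespaces: list[str]) -> dict:
    """Build a NetworkPolicy allowing ingress only from the listed namespaces."""
    if not allowed_namespaces:
        # An empty `from` list would match ALL sources, so refuse to build it.
        raise ValueError("allowed_namespaces must not be empty")
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "model-gateway-allow-approved",
                     "namespace": namespace},
        "spec": {
            # Select the gateway pods; once selected for Ingress, traffic
            # that no rule allows is denied.
            "podSelector": {"matchLabels": gateway_labels},
            "policyTypes": ["Ingress"],
            "ingress": [{
                "from": [
                    {"namespaceSelector": {
                        "matchLabels": {"kubernetes.io/metadata.name": ns}}}
                    for ns in allowed_namespaces
                ],
            }],
        },
    }


policy = gateway_ingress_policy(
    namespace="core-services",
    gateway_labels={"app": "model-gateway"},
    allowed_namespaces=["agents"],
)
print(json.dumps(policy, indent=2))
```

The printed JSON can be applied with `kubectl apply -f`. A NetworkPolicy restricts traffic at the network layer only. It does not authenticate callers, so it complements mTLS or JWT checks rather than replacing them.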

10.3 Model Gateway and Plane Segregation

Do not let every service call your model providers directly. Introduce a model gateway that centralizes:

Provider credentials - Keep API keys in one place, not scattered across services
Rate limiting - Prevent runaway costs and abuse
Request and response logging - Capture what goes in and what comes out for audit and debugging
Allowlisting - Only approved services can call models
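
The four gateway responsibilities above fit in a small amount of code. The following is a minimal in-process sketch, not a production gateway. It assumes the caller's identity has already been authenticated (for example via mTLS), and `provider_call` is a hypothetical stand-in for whatever client the model provider supplies.

```python
import time
from typing import Callable


class RateLimitExceeded(Exception):
    """Raised when a caller has used up its request budget."""


class TokenBucket:
    """Token bucket: `capacity` is the burst size, refilled continuously."""

    def __init__(self, capacity: float, refill_per_second: float) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.updated
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_second)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class ModelGateway:
    def __init__(self, provider_call: Callable[[str, str], str], api_key: str,
                 allowed_callers: set[str], requests_per_minute: int) -> None:
        self._provider_call = provider_call
        self._api_key = api_key              # provider credential lives only here
        self._allowed = frozenset(allowed_callers)
        self._rpm = requests_per_minute
        self._buckets: dict[str, TokenBucket] = {}
        self.audit_log: list[dict] = []      # stand-in for a real log sink

    def complete(self, caller: str, prompt: str) -> str:
        # Allowlisting: only approved services may call models.
        if caller not in self._allowed:
            self.audit_log.append({"caller": caller, "outcome": "denied"})
            raise PermissionError(f"{caller!r} is not allowed to call models")
        # Rate limiting: one bucket per caller.
        bucket = self._buckets.get(caller)
        if bucket is None:
            bucket = self._buckets[caller] = TokenBucket(self._rpm, self._rpm / 60)
        if not bucket.allow():
            self.audit_log.append({"caller": caller, "outcome": "rate_limited"})
            raise RateLimitExceeded(caller)
        response = self._provider_call(self._api_key, prompt)
        # Request and response logging for audit and debugging.
        self.audit_log.append({"caller": caller, "outcome": "ok",
                               "prompt": prompt, "response": response})
        return response


# Usage with a fake provider that upper-cases the prompt.
gateway = ModelGateway(lambda key, prompt: prompt.upper(), api_key="not-a-real-key",
                       allowed_callers={"orchestrator"}, requests_per_minute=2)
print(gateway.complete("orchestrator", "hello"))
```

Logged prompts and responses may contain sensitive data, so the log sink itself needs access controls and a retention policy.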

Segregate your system into two planes:

Control plane - Orchestration, policies, configuration, and governance. This is where you manage what the system is allowed to do.
Data plane - Inference traffic, tool invocation, and data I/O. This is where the actual work happens.

Restrict control plane APIs to a small set of admin services and teams. Audit all changes to control plane configuration.
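
As an illustration of restricting and auditing control plane changes, the sketch below gates every configuration write on an admin allowlist and records each attempt, allowed or not. The class name, policy keys, and actor names are hypothetical. A real system would take the actor from an authenticated identity and write the audit trail to append-only storage.

```python
import time
from typing import Any


class ControlPlane:
    def __init__(self, admins: set[str]) -> None:
        self._admins = frozenset(admins)
        self._config: dict[str, Any] = {}
        self.audit_trail: list[dict[str, Any]] = []

    def set_policy(self, actor: str, key: str, value: Any) -> None:
        """Change one policy value; every attempt is recorded."""
        allowed = actor in self._admins
        # Record the attempt before acting on it, including denied ones.
        self.audit_trail.append({
            "ts": time.time(),
            "actor": actor,
            "key": key,
            "old": self._config.get(key),
            "new": value,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"{actor!r} may not change control plane config")
        self._config[key] = value

    def get_policy(self, key: str) -> Any:
        return self._config.get(key)


plane = ControlPlane(admins={"platform-admin"})
plane.set_policy("platform-admin", "max_tool_calls_per_task", 20)
print(plane.get_policy("max_tool_calls_per_task"))
```

Data plane services would only read from this configuration. Keeping the write path this narrow is what makes the audit trail useful.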

10.4 Supply Chain and Model Provenance

Your agent system depends on a deep stack of libraries, frameworks, and models. Track all of it.

Maintain SBOMs for base images, key libraries, and frameworks - including LLM SDKs, vector databases, and guardrail engines
Scan regularly for vulnerabilities and outdated components
Track model versions - Record the provider, model name, version, training policies (as disclosed), model cards, and evaluation results
Correlate behavioral changes with model or framework updates. When your agent starts behaving differently, you need to know whether a model version change or a library update caused it
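
Correlating a behavioral change with recent updates is simple once every model and library change is recorded with a timestamp. The sketch below assumes such a change log exists. The record fields, the 24-hour window, and all names and versions in the example data are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ComponentChange:
    component: str         # "model" or "library"
    name: str
    version: str
    deployed_at: datetime


def changes_before(incident_at: datetime, changes: list[ComponentChange],
                   window: timedelta = timedelta(hours=24)) -> list[ComponentChange]:
    """Return changes deployed within `window` before the incident, newest first."""
    hits = [c for c in changes
            if incident_at - window <= c.deployed_at <= incident_at]
    return sorted(hits, key=lambda c: c.deployed_at, reverse=True)


change_log = [
    ComponentChange("library", "example-guardrails", "1.4.0", datetime(2025, 3, 1, 9, 0)),
    ComponentChange("model", "example-provider/chat-model", "2025-03-10", datetime(2025, 3, 10, 14, 0)),
    ComponentChange("library", "example-llm-sdk", "0.9.2", datetime(2025, 3, 10, 20, 0)),
]

incident = datetime(2025, 3, 11, 8, 0)
for change in changes_before(incident, change_log):
    print(change.component, change.name, change.version)
```

A time window only narrows the list of suspects. Confirming the cause still means re-running evaluations against the previous model or library version.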

10.5 Cloud Provider-Specific Recommendations

Each major cloud provider offers native services that map to the security controls in this guide. Here is what to use where.

Azure

Identity: Microsoft Entra ID with Conditional Access, Managed Identities for agents, Azure RBAC, PIM for just-in-time admin access
Secrets: Azure Key Vault with private endpoints, RBAC-based access, key rotation
Containers: AKS with Azure Network Policies, Azure Policy for Kubernetes, Workload Identity, confidential containers
Network: Azure VNet with NSGs, Azure Firewall, Private Endpoints, Azure Private Link
Model Gateway: Azure API Management with OAuth 2.0/JWT validation, rate limiting, logging, private VNet integration

AWS

Identity: IAM Identity Center, IAM Roles with least-privilege policies, SCPs, permission boundaries, IAM Access Analyzer
Secrets: AWS Secrets Manager with automatic rotation, VPC endpoints
Containers: EKS with Pod Identity, Calico or AWS Network Policies, Security Groups for Pods, Fargate for serverless isolation
Network: VPC with Security Groups, NACLs, VPC endpoints, AWS Network Firewall, AWS WAF
Model Gateway: API Gateway with IAM authorization, usage plans, VPC Link

GCP

Identity: Cloud Identity, Service Accounts with Workload Identity for GKE, IAM Conditions, VPC Service Controls
Secrets: Secret Manager with IAM-based access, versioning
Containers: GKE with Workload Identity, Network Policies, Binary Authorization, Autopilot mode; Cloud Run for stateless workloads
Network: VPC with firewall rules, Private Google Access, VPC Service Controls, Cloud Armor
Model Gateway: Cloud Endpoints or Apigee with service-to-service auth, rate limiting, Cloud Armor integration

Cross-Cloud Considerations

If you operate across multiple clouds or need to plan for that possibility:

Unified Identity: OIDC/SAML federation across providers; HashiCorp Vault for cross-cloud secrets management
Network: AWS Direct Connect, Azure ExpressRoute, or Google Cloud Interconnect for private connectivity between clouds
Observability: Centralize logs in a SIEM with consistent formatting and correlation IDs across all environments
Data Residency: Define which regions handle which data classifications. Document this and enforce it in policy.
Disaster Recovery: Start with multi-region within a single cloud. Add multi-cloud for critical systems only, and run regular DR drills to verify your failover actually works
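
The data residency item above says to enforce the policy, not only document it. One way to do that is a check that runs before any data is written or sent to a region. The sketch below is a minimal illustration. The classification names and region identifiers are hypothetical placeholders, not real provider region codes.

```python
# Which regions may handle which data classification (hypothetical values).
RESIDENCY_POLICY: dict[str, frozenset[str]] = {
    "public": frozenset({"us-east", "eu-west", "ap-south"}),
    "internal": frozenset({"us-east", "eu-west"}),
    "restricted": frozenset({"eu-west"}),
}


class ResidencyViolation(Exception):
    """Raised when data would be handled in a region its classification forbids."""


def check_residency(classification: str, region: str,
                    policy: dict[str, frozenset[str]] = RESIDENCY_POLICY) -> None:
    """Raise unless `region` is approved for `classification`."""
    allowed = policy.get(classification)
    if allowed is None:
        # Fail closed: unknown classifications are not allowed anywhere.
        raise ResidencyViolation(f"unknown classification {classification!r}")
    if region not in allowed:
        raise ResidencyViolation(
            f"{classification!r} data may not be handled in {region!r}")


check_residency("internal", "us-east")      # passes silently
try:
    check_residency("restricted", "us-east")
except ResidencyViolation as exc:
    print(f"blocked: {exc}")
```

Keeping the policy as data means the same table can drive both this runtime check and the written documentation.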
