Infrastructure and Sandboxing

Infrastructure security is the foundation for every other control in this guide. Isolate components, harden containers, and use cloud-native security features. Without solid infrastructure, your guardrails, access controls, and monitoring are built on sand.

10.1 Execution Isolation

Not all agent components carry the same risk. Differentiate your isolation strategy based on what each component actually does.

Standard Services

The orchestrator, model gateway, and most tool services fall into this category. Apply standard container best practices:

  • Run as non-root users
  • Drop unnecessary Linux capabilities
  • Use read-only root file systems where possible
  • Scan images regularly and patch known vulnerabilities
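
The first three items on this checklist can be verified automatically before a workload is deployed. The following is a minimal sketch, assuming container specs are available as plain dictionaries (for example, parsed from a manifest). The function name `hardening_findings` and the sample image name are hypothetical; the `securityContext` field names are the standard Kubernetes ones. Image scanning is out of scope here and would be handled by a separate scanner.

```python
from typing import Any


def hardening_findings(container: dict[str, Any]) -> list[str]:
    """Return checklist violations for one Kubernetes container spec."""
    ctx = container.get("securityContext", {})
    findings = []
    # Run as non-root users
    if ctx.get("runAsNonRoot") is not True:
        findings.append("runAsNonRoot is not set to true")
    # Drop unnecessary Linux capabilities (this sketch expects drop: [ALL])
    if "ALL" not in ctx.get("capabilities", {}).get("drop", []):
        findings.append("capabilities.drop does not include ALL")
    # Read-only root file system
    if ctx.get("readOnlyRootFilesystem") is not True:
        findings.append("readOnlyRootFilesystem is not set to true")
    return findings


# Hypothetical container spec for an orchestrator service.
hardened = {
    "name": "orchestrator",
    "image": "example/orchestrator:1.0",
    "securityContext": {
        "runAsNonRoot": True,
        "capabilities": {"drop": ["ALL"]},
        "readOnlyRootFilesystem": True,
    },
}

for finding in hardening_findings(hardened):
    print(finding)
```

A check like this could run in CI or in an admission step. Services that genuinely need a writable root file system would need an explicit exception, since the source checklist says "where possible".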

High-Risk Tools

Code execution, document parsing of untrusted binaries, and browser automation are a different story. These need extra isolation:

  • Sandbox runtimes - Use a sandboxed runtime such as gVisor (a user-space kernel) or a lightweight VM such as Firecracker or Kata Containers to contain execution
  • No network access by default - High-risk tools should be network-isolated unless there is a specific, documented reason to allow connectivity
  • Strict resource limits - Enforce hard caps on CPU, memory, and wall-clock time to prevent denial-of-service
  • Ephemeral environments - Purge the environment after each run. No persistent state, no leftover artifacts
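
Two of these controls, resource limits and ephemeral environments, can be illustrated at the process level. The sketch below is an assumption-laden illustration, not a sandbox: it is POSIX-only, the helper name `run_limited` and its default limits are hypothetical, and it does nothing about network isolation, which still has to come from the sandbox runtime or the network layer.

```python
import resource
import subprocess
import sys
import tempfile


def run_limited(argv, cpu_seconds=2, memory_bytes=1024**3, wall_seconds=5):
    """Run argv in a throwaway directory with CPU, memory, and wall-clock caps."""

    def apply_limits():
        # Executed in the child process just before the command starts.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_AS, (memory_bytes, memory_bytes))

    # The working directory is deleted when the block exits, so files the
    # command writes there do not survive the run.
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                argv,
                cwd=workdir,
                capture_output=True,
                timeout=wall_seconds,   # wall-clock cap
                preexec_fn=apply_limits,
            )
        except subprocess.TimeoutExpired:
            return {"status": "timeout", "stdout": b""}
    return {
        "status": "exited",
        "returncode": proc.returncode,
        "stdout": proc.stdout,
    }


result = run_limited([sys.executable, "-c", "print('hello')"])
print(result["status"])
```

In a real deployment the same caps would usually be enforced by the container or microVM runtime rather than by the calling process. Note also that `preexec_fn` is not safe to use in multi-threaded parent processes.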

10.2 Kubernetes and Service Mesh

If you are running agents on Kubernetes, use the platform's isolation features deliberately:

  • Namespace separation - Separate agent workloads, core services, and tool services into distinct namespaces
  • NetworkPolicies - Use NetworkPolicies (or service mesh authorization) so that only approved services can call the model gateway and tools. Agents should not be able to talk directly to databases or internal admin services
  • Service-to-service authentication - Use mTLS or JWTs for all internal calls. Every service should prove its identity to every other service
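
One way to express the namespace and NetworkPolicy guidance is a default-deny policy per namespace plus narrow allow rules. The sketch below builds the manifests as Python dictionaries and prints them as JSON. The namespace names, the `app` label, and the port are hypothetical, and the policies only take effect on clusters whose network plugin enforces NetworkPolicy.

```python
import json


def default_deny(namespace: str) -> dict:
    """NetworkPolicy selecting every pod in the namespace and allowing no traffic."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }


def allow_ingress_from(namespace: str, app: str, caller_namespace: str, port: int) -> dict:
    """NetworkPolicy letting pods in caller_namespace reach `app` on one TCP port."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {
            "name": f"allow-{caller_namespace}-to-{app}",
            "namespace": namespace,
        },
        "spec": {
            "podSelector": {"matchLabels": {"app": app}},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [
                        {
                            # Label that Kubernetes sets automatically on namespaces.
                            "namespaceSelector": {
                                "matchLabels": {
                                    "kubernetes.io/metadata.name": caller_namespace
                                }
                            }
                        }
                    ],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


# Hypothetical layout: agents may call the model gateway in core-services.
policies = [
    default_deny("core-services"),
    allow_ingress_from("core-services", "model-gateway", "agents", 8443),
]
print(json.dumps(policies, indent=2))
```

A default-deny egress rule also blocks DNS lookups, so a working setup would need matching egress allowances for DNS and for each approved destination.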

10.3 Model Gateway and Plane Segregation

Do not let every service call your model providers directly. Introduce a model gateway that centralizes:

  • Provider credentials - Keep API keys in one place, not scattered across services
  • Rate limiting - Prevent runaway costs and abuse
  • Request and response logging - Capture what goes in and what comes out for audit and debugging
  • Allowlisting - Only approved services can call models
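
The four responsibilities above can be sketched in a few dozen lines. This is an illustrative in-memory model, not a production gateway: the class `ModelGateway`, the `send` callable standing in for a provider client, and the sliding-window limiter are all assumptions, and the caller name is assumed to have been authenticated already (for example, via mTLS).

```python
import logging
import time

log = logging.getLogger("model_gateway")


class RateLimitExceeded(Exception):
    pass


class ModelGateway:
    def __init__(self, api_key, allowed_callers, max_per_minute, send,
                 clock=time.monotonic):
        self._api_key = api_key              # provider credential lives only here
        self._allowed = frozenset(allowed_callers)
        self._max_per_minute = max_per_minute
        self._send = send                    # hypothetical provider call: (key, prompt) -> str
        self._clock = clock
        self._recent = {}                    # caller -> request timestamps in the last 60 s

    def complete(self, caller: str, prompt: str) -> str:
        # Allowlisting: only approved services may call models.
        if caller not in self._allowed:
            log.warning("denied caller=%s reason=not_allowlisted", caller)
            raise PermissionError(f"{caller} may not call the model gateway")

        # Rate limiting: sliding 60-second window per caller.
        now = self._clock()
        window = [t for t in self._recent.get(caller, []) if now - t < 60]
        if len(window) >= self._max_per_minute:
            self._recent[caller] = window
            log.warning("denied caller=%s reason=rate_limited", caller)
            raise RateLimitExceeded(caller)
        window.append(now)
        self._recent[caller] = window

        # Request and response logging (sizes only in this sketch).
        log.info("request caller=%s prompt_chars=%d", caller, len(prompt))
        response = self._send(self._api_key, prompt)
        log.info("response caller=%s response_chars=%d", caller, len(response))
        return response


# Usage with a fake provider that echoes the prompt in upper case.
gateway = ModelGateway("example-key", {"orchestrator"}, 2,
                       send=lambda key, prompt: prompt.upper())
print(gateway.complete("orchestrator", "hello"))
```

Whether to log full prompts and responses or only metadata depends on your data classification rules. The sketch logs sizes only to avoid writing sensitive content to logs by default.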

Segregate your system into two planes:

  • Control plane - Orchestration, policies, configuration, and governance. This is where you manage what the system is allowed to do.
  • Data plane - Inference traffic, tool invocation, and data I/O. This is where the actual work happens.

Restrict control plane APIs to a small set of admin services and teams. Audit all changes to control plane configuration.

10.4 Supply Chain and Model Provenance

Your agent system depends on a deep stack of libraries, frameworks, and models. Track all of it.

  • Maintain SBOMs for base images, key libraries, and frameworks - including LLM SDKs, vector databases, and guardrail engines
  • Scan regularly for vulnerabilities and outdated components
  • Track model versions - Record the provider, model name, version, training policies (as disclosed), model cards, and evaluation results
  • Correlate behavioral changes with model or framework updates. When your agent starts behaving differently, you need to know whether a model version change or a library update caused it
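
Correlating a behavioral change with an update is easier if every model or library change is recorded with a timestamp. The sketch below assumes such a change log exists. The `ComponentChange` record, the `changes_before` helper, the seven-day lookback, and all component names and versions are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ComponentChange:
    deployed_at: datetime
    component: str       # e.g. "model" or "library"
    name: str
    old_version: str
    new_version: str


def changes_before(changes, regression_seen_at, lookback=timedelta(days=7)):
    """Return changes deployed within `lookback` before the regression, newest first."""
    start = regression_seen_at - lookback
    hits = [c for c in changes if start <= c.deployed_at <= regression_seen_at]
    return sorted(hits, key=lambda c: c.deployed_at, reverse=True)


# Hypothetical change log.
change_log = [
    ComponentChange(datetime(2025, 3, 1, 9, 0), "library", "example-sdk", "1.4.0", "1.5.0"),
    ComponentChange(datetime(2025, 3, 10, 14, 0), "model", "example-model", "v1", "v2"),
    ComponentChange(datetime(2025, 3, 12, 8, 30), "library", "example-guardrails", "0.9", "1.0"),
]

suspects = changes_before(change_log, datetime(2025, 3, 12, 16, 0))
for change in suspects:
    print(change.deployed_at.isoformat(), change.component, change.name,
          change.old_version, "->", change.new_version)
```

The most recent change is listed first because it is usually the first candidate to roll back or test in isolation, though a time match alone does not prove causation.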

10.5 Cloud Provider-Specific Recommendations

Each major cloud provider offers native services that map to the security controls in this guide. Here is what to use where.

Azure

  • Identity: Microsoft Entra ID with Conditional Access, Managed Identities for agents, Azure RBAC, PIM for just-in-time admin access
  • Secrets: Azure Key Vault with private endpoints, RBAC-based access, key rotation
  • Containers: AKS with Azure Network Policies, Azure Policy for Kubernetes, Workload Identity, confidential containers
  • Network: Azure VNet with NSGs, Azure Firewall, Private Endpoints, Azure Private Link
  • Model Gateway: Azure API Management with OAuth 2.0/JWT validation, rate limiting, logging, private VNet integration

AWS

  • Identity: IAM Identity Center, IAM Roles with least-privilege policies, SCPs, permission boundaries, IAM Access Analyzer
  • Secrets: AWS Secrets Manager with automatic rotation, VPC endpoints
  • Containers: EKS with EKS Pod Identity, Calico or Amazon VPC CNI network policies, Security Groups for Pods, Fargate for serverless isolation
  • Network: VPC with Security Groups, NACLs, VPC endpoints, AWS Network Firewall, AWS WAF
  • Model Gateway: API Gateway with IAM authorization, usage plans, VPC Link

GCP

  • Identity: Cloud Identity, Service Accounts with Workload Identity for GKE, IAM Conditions, VPC Service Controls
  • Secrets: Secret Manager with IAM-based access, versioning
  • Containers: GKE with Workload Identity, Network Policies, Binary Authorization, Autopilot mode; Cloud Run for stateless workloads
  • Network: VPC with firewall rules, Private Google Access, VPC Service Controls, Cloud Armor
  • Model Gateway: Cloud Endpoints or Apigee with service-to-service auth, rate limiting, Cloud Armor integration

Cross-Cloud Considerations

If you operate across multiple clouds or need to plan for that possibility:

  • Unified Identity: OIDC/SAML federation across providers; HashiCorp Vault for cross-cloud secrets management
  • Network: AWS Direct Connect, Azure ExpressRoute, or Google Cloud Interconnect for private connectivity between clouds
  • Observability: Centralize logs in a SIEM with consistent formatting and correlation IDs across all environments
  • Data Residency: Define which regions handle which data classifications. Document this and enforce it in policy.
  • Disaster Recovery: Start with multi-region within a single cloud. Add multi-cloud for critical systems only, and run regular DR drills to verify your failover actually works
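
The data residency item lends itself to a simple policy check that runs before data is written or a workload is scheduled. This is a minimal sketch under stated assumptions: the classification names, the region identifiers, and the helper `require_allowed_region` are hypothetical, and a real policy would live in reviewed configuration, not in code.

```python
# Hypothetical policy: which regions may handle which data classification.
ALLOWED_REGIONS = {
    "public": {"us-east-1", "eu-west-1", "westeurope"},
    "internal": {"eu-west-1", "westeurope"},
    "restricted": {"eu-west-1"},
}


class ResidencyViolation(Exception):
    pass


def require_allowed_region(classification: str, region: str) -> None:
    """Raise ResidencyViolation unless the region may handle this classification."""
    allowed = ALLOWED_REGIONS.get(classification)
    if allowed is None:
        # Fail closed: unknown classifications are never allowed anywhere.
        raise ResidencyViolation(f"unknown classification: {classification}")
    if region not in allowed:
        raise ResidencyViolation(
            f"{classification} data may not be handled in {region}"
        )


require_allowed_region("restricted", "eu-west-1")  # passes silently
try:
    require_allowed_region("restricted", "us-east-1")
except ResidencyViolation as exc:
    print(exc)
```

Failing closed on unknown classifications is a deliberate choice in this sketch: new data types stay blocked until someone documents where they are allowed to go.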
