Secure SDLC and Testing

Security has to be built in from the start, not bolted on after deployment. Integrate AI-specific security practices throughout your development lifecycle so you catch problems when they are cheap to fix, not after they are live.

12.1 AI-Aware Secure Development Lifecycle

Your existing SDL probably does not account for AI-specific risks. Extend it. Here is what to add at each phase.

Requirements and Design

  • Perform an AI risk assessment for each agent
  • Threat model using traditional methods (STRIDE, data-flow diagrams) plus AI-specific risks: prompt injection, RAG poisoning, tool misuse, cross-tenant leakage
  • Identify data classifications, tools and external APIs the agent will use, RAI harm categories, and domain-specific constraints

Implementation

  • Apply secure coding practices throughout
  • Implement orchestrator policies, guardrails, data minimization, tool schemas, and allowlists
  • Require peer reviews that include a security engineer and, where possible, an AI/ML-aware engineer

Testing

  • Standard security tests: SAST, DAST, dependency scanning, container image scanning
  • AI-specific tests: prompt injection and jailbreak attempts, tool misuse and exfiltration attempts, cross-tenant access validation, RAI harm tests under realistic prompts

Deployment

  • Require security sign-off for agents with elevated risk profiles
  • Use gradual rollout strategies: canary deployments, feature flags
  • Confirm monitoring and alerting are configured and working before full rollout

Production

  • Run continuous monitoring and periodic re-assessments
  • Regularly review logs, metrics, and incidents
  • Update models, prompts, guardrails, and policies as threats evolve - this is not a "set and forget" situation

12.2 Automated AI Security Testing

Manual testing alone will not keep up. Develop automated test suites that cover AI-specific attack vectors and run them as part of your CI/CD pipeline.

Prompt Injection and RAG Attacks

Test both direct user prompts ("ignore all previous instructions and...") and indirect sources - poisoned RAG documents, manipulated tool outputs, and crafted inter-agent messages. Your test suite should cover the injection vectors that are specific to your architecture.

Tool Authorization

Verify that forbidden tools cannot be called regardless of what the model requests. Confirm that parameters outside allowed ranges are rejected. Test boundary conditions and edge cases in your tool schemas.

Data Isolation

Confirm that tenant A cannot retrieve data belonging to tenant B, even through indirect paths like RAG retrieval, memory access, or tool outputs. This is one of the most critical tests for any multi-tenant system.

RAI Behaviors

Seed tests for harmful prompts relevant to your domain. Verify that guardrails block or modify outputs appropriately. Test the boundary between legitimate use and abuse for your specific application.

Integrate these tests into CI/CD so regressions are caught early. A prompt injection defense that worked last month may not work after a model update or a prompt change.

12.3 Adversarial Red Teaming

Automated tests catch known patterns. Red teaming finds the problems you did not think of. Supplement automated testing with targeted, manual adversarial exercises.

Simulate realistic attackers using:

  • Long-term memory poisoning strategies that build up influence over many interactions
  • Attempts to chain multiple tools together for data exfiltration
  • Privilege escalation via inter-agent messaging and delegation

Include both a security-focused red team and domain experts who can spot subtle RAI harms that pure security testers might miss. A prompt that is technically "safe" can still produce harmful outputs in specific professional contexts.

Use findings to harden prompts, policies, tools, and RAG pipelines. Feed every successful attack back into your automated test library so it stays covered going forward.

12.4 Continuous Security and RAI Evaluation

Security testing is not a one-time gate. Adopt a continuous evaluation approach that combines multiple assessment types.

Security Assessments

  • Traditional penetration testing
  • Cross-prompt injection attack (XPIA) testing via RAG and tool outputs
  • Authorization and data isolation validation

RAI Harm Evaluation

  • Regular testing against defined harm categories
  • Domain-specific risk assessments

Privacy Impact Reviews

  • PII handling and data minimization audits
  • Compliance with data protection regulations

Reliability and Robustness Metrics

  • Accuracy and consistency testing
  • Performance under adversarial conditions

Key Metrics to Track

Security: blocked injection attempts as a proportion of total attempts, unauthorized tool calls prevented, cross-tenant access attempts blocked.

RAI: toxic or harmful output rate, hallucination rate, bias indicators where applicable.

Privacy: PII detection and redaction accuracy, data minimization scores.

Reliability: accuracy benchmarks per use case, consistency across runs, uptime and error rates.

Feed these metrics back into product and security planning. If your injection block rate is dropping, that tells you something. If hallucination rates spike after a model update, that tells you something too. Metrics without action are just dashboards.


This guide should serve as the baseline for agentic AI architecture, implementation, and governance. For each new agent or platform, apply these principles with concrete designs, threat models, and policies. Keep living documents as the threat landscape evolves. The teams that treat AI security as an ongoing practice - not a checkbox - are the ones that will avoid the worst outcomes.

Need help securing your agentic AI systems?

We have been testing AI systems for the world's top technology companies. Let's talk about what your system needs.

Get in touch