Threat modeling for internal copilots and customer-facing agents.
STRIDE-style prompts adapted for LLM systems: abuse cases beyond SQL injection, including prompt injection, tool misuse, data exfiltration through side channels, and cross-tenant retrieval bugs.
Copilots combine traditional application threats with model-specific abuse. Your threat model should assume curious insiders, malicious outsiders, and confused legitimate users simultaneously.
Assets and trust boundaries
Identify what must never leave a boundary: PII, embeddings, tool credentials, and customer tenancy metadata. Map every path from user input to retrieval, tools, and downstream APIs—each hop is an injection surface.
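One concrete way to enforce the tenancy boundary is to filter at the last hop before retrieved text enters the prompt. A minimal sketch, assuming a hypothetical `Chunk` record carrying a `tenant_id`:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chunk:
    tenant_id: str
    text: str

def retrieve_for_tenant(chunks: list[Chunk], tenant_id: str) -> tuple[list[Chunk], int]:
    """Drop any retrieved chunk whose tenant_id does not match the caller's.
    Returns the allowed chunks plus a count of cross-tenant hits, which
    should be logged and alerted on -- a nonzero count means the upstream
    index filter failed."""
    allowed = [c for c in chunks if c.tenant_id == tenant_id]
    return allowed, len(chunks) - len(allowed)

corpus = [Chunk("acme", "Q3 roadmap"), Chunk("globex", "salary bands")]
hits, leaked = retrieve_for_tenant(corpus, "acme")
```

The point of the second return value is audit evidence: a cross-tenant hit at this layer is a retrieval bug worth paging on, even though the filter prevented exposure.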
Abuse scenarios to table-top
- Indirect prompt injection via documents the model is allowed to retrieve.
- Tool calling that exfiltrates secrets or triggers irreversible side effects.
- Membership inference and reconstruction attacks against vector stores.
- Jailbreak attempts scaled through automation or compromised accounts.
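The first scenario above can be partially caught before retrieved text reaches the model. A heuristic sketch using hypothetical regex patterns; in practice this would sit alongside classifier-based detection, since regexes alone are easy to evade:

```python
import re

# Hypothetical patterns for table-top exercises, not a production denylist.
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"disregard the system prompt",
    r"you are now [a-z]",
]

def flag_injection(doc: str) -> bool:
    """Flag retrieved documents containing common injection phrasing
    so they can be quarantined or stripped before prompt assembly."""
    lowered = doc.lower()
    return any(re.search(p, lowered) for p in INJECTION_PATTERNS)
```

Treat a flag as a reason to exclude the document and log the event, not as proof of attack: benign documents about prompt injection will trip it too.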
Controls that actually ship
- Least-privilege tool tokens with scoped actions and human confirmation for destructive operations.
- Output filters and schema validation before displaying or executing model suggestions.
- Rate limits, anomaly detection, and per-tenant retrieval constraints.
- Red-team cadence with tracked remediation SLAs.
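The first two controls can share a choke point: validate every model-proposed tool call against an allowlist and schema, and route destructive actions to a human. A minimal sketch with hypothetical action names:

```python
# Hypothetical action sets; real systems derive these from the tool registry.
ALLOWED_ACTIONS = {"search_docs", "delete_record"}
DESTRUCTIVE_ACTIONS = {"delete_record"}

def validate_tool_call(call: dict) -> tuple[bool, bool, str]:
    """Gate a model-proposed tool call before execution.
    Returns (ok, needs_human_confirmation, reason). Anything not on the
    allowlist is rejected outright rather than passed through."""
    action = call.get("action")
    if action not in ALLOWED_ACTIONS:
        return False, False, f"unknown action: {action!r}"
    if not isinstance(call.get("args"), dict):
        return False, False, "args must be a JSON object"
    return True, action in DESTRUCTIVE_ACTIONS, "ok"
```

Keeping the gate as a pure function makes it easy to unit-test in CI, which is exactly the kind of concrete evidence a security sign-off can reference.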
Security sign-off should reference concrete tests and logging evidence—not a generic “we use OpenAI safely” statement.