Prompt Injection Detection Policy
The Prompt Injection Detection policy utilizes a tool calling LLM with a small, fast agentic workflow to determine if the returning content has a poisoned or injected prompt. This is especially useful for downstream LLM agents consuming user content in the API.
Configuration
The configuration shows how to configure the policy in the 'policies.json' document.
Code(json)
Policy Configuration
name
<string>
- The name of your policy instance. This is used as a reference in your routes.policyType
<string>
- The identifier of the policy. This is used by the Zuplo UI. Value should beprompt-injection-outbound
.handler.export
<string>
- The name of the exported type. Value should bePromptInjectionDetectionOutboundPolicy
.handler.module
<string>
- The module containing the policy. Value should be$import(@zuplo/runtime)
.handler.options
<object>
- The options for this policy. See Policy Options below.
Policy Options
The options for this policy are specified below. All properties are optional unless specifically marked as required.
apiKey
(required)<string>
- API key for an OpenAI compatible service.model
<string>
- Model to use for classification. Defaults to"gpt-3.5-turbo"
.baseUrl
<string>
- Base URL for the OpenAI compatible API. Defaults to"https://api.openai.com/v1"
.strict
<boolean>
- Whether to block traffic if the classifier fails. When disabled, allows traffic flow if the classifier or inference API is unavailable. Defaults tofalse
.
Using the Policy
The Prompt Injection Detection policy utilizes a tool calling LLM with a small, fast agentic workflow to determine if the outbound content has a poisoned or injected prompt.
This is especially useful for downstream LLM agents consuming user content in the API.
For benign user content like:
Code(json)
the agent will simply pass through the original Response
.
But, for more nefarious content that is attempting to inject or poison a downstream LLM agent, the detection policy will 400. For example:
Code(json)
will return a 400.
Choosing an inference provider and model
- By default, the OpenAI API is configured but any OpenAPI compatible API will work
- You must select a model with
tool calling capabilities
(like Llama3.1, the GPT-4 family of models, GPT-3.5-turbo, Qwen3, etc.)
- In general, attempt to strike a balance between speed and power. You want a powerful enough model that can accurately evaluate incoming content but won't take too long to evaluate. In general, downstream AI consumers that need to be protected from prompt injection or poisoning attempts have long time-outs (as they need to wait for LLM inference in their typical runtime loop)
Using with a Zuplo MCP Server Handler
You can configure your MCP Server Handler with this outbound policy in order to shield downstream MCP Clients (which typically have an LLM operating them) from prompt or tool poisoning attacks:
Code()
Learn more about how the
Strict mode
Depending on your use case, you may decide to enable strict mode via
handler.options.strict = true
.
This blocks content regardless of your configured OpenAI compatible API's availability or if there are failures with the agentic workflow. This means that if you enable strict mode and your inference provider becomes unavailable, content through this outbound policy will be blocked.
By default, strict
mode is set to false
allowing for "open flow" if the
agentic workflow fails.
Local testing
Using Ollama, you can setup this policy for local testing:
Code(json)
This example configuration uses a small Qwen3 model and the locally running Ollama to run the policy's agentic tools.
Read more about how policies work