Prompt Injection

AI

An adversarial attack technique where malicious instructions are embedded in prompts to manipulate AI model behavior.

Prompt Injection is a security vulnerability in large language models (LLMs) where an attacker embeds hidden or malicious instructions inside user input, external data, or web content. When the AI processes this manipulated prompt, it can override its original instructions, leak sensitive information, or execute unintended actions.

These attacks exploit the fact that LLMs follow natural language instructions literally, even when those instructions conflict with safety rules or system prompts. Prompt injection can take many forms, such as hidden text in web pages, misleading instructions inside documents, or adversarial phrasing designed to confuse the model.

For AI-powered search and GEO, prompt injection is particularly concerning because models often pull real-time content from external sources. A maliciously crafted source could influence how an AI system responds, what citations it uses, or even manipulate the information presented to users.

Common types of prompt injection include:

Mitigation strategies involve input filtering, layered prompt design, restricting model access to sensitive operations, monitoring outputs, and continuous red-teaming to detect vulnerabilities.

Frequently Asked Questions about Prompt Injection