Attack Types Reference¶
HiveTraceRed includes 80+ attack implementations organized into 10 categories. This section provides detailed information about each attack type.
Attack Categories Overview¶
Roleplay Attacks¶
Attacks that use persona or roleplay techniques to bypass safety measures. Examples include DAN (Do Anything Now), AIM (Always Intelligent and Machiavellian), Evil Confidant, and other persona-based jailbreaks.
Persuasion Attacks¶
Attacks using persuasive techniques and social engineering.
Authority persuasion
Emotional appeal
Urgency-based persuasion
40+ persuasion variations
Token Smuggling¶
Attacks that use encoding, obfuscation, or special characters to hide malicious intent.
Base64 encoding
ROT13 encoding
Special character insertion
Unicode manipulation
Payload splitting
Context Switching¶
Attacks that switch conversation context to confuse the model.
Language switching
Topic switching
Format switching
Role switching
In-Context Learning¶
Attacks using few-shot examples to teach undesired behavior.
Few-shot jailbreaking
Example-based attacks
Pattern completion
Task Deflection¶
Attacks that reframe harmful requests as legitimate tasks.
Code generation requests
Educational framing
Research framing
Translation requests
Text Structure Modification¶
Attacks that modify text structure to bypass detection.
Character substitution
Word insertion
Sentence fragmentation
Formatting manipulation
Output Formatting¶
Attacks that request specific output formats to bypass safety.
JSON output requests
Code output requests
Table formatting
Structured data requests
Irrelevant Information¶
Attacks that add irrelevant content to confuse safety filters.
Padding with benign text
Context dilution
Noise injection
Simple Instructions¶
Direct instruction-based attacks.
Prefix injection
Suffix injection
System override attempts
Using Attacks¶
from hivetracered.attacks import DANAttack, Base64OutputAttack
# Basic usage
attack = DANAttack()
modified_prompt = attack.apply("Your prompt here")
# Composing attacks
composed = Base64OutputAttack() | DANAttack()
result = composed.apply("Your prompt")
For detailed usage examples, see Creating Custom Attacks.
Attack Selection¶
Basic Testing: Start with NoneAttack (baseline) and DANAttack
Advanced Testing: Use composed attacks and encoding techniques
Robustness Testing: Mix categories and test multilingual attacks
For custom attack creation and detailed strategies, see Creating Custom Attacks.
See Also¶
Attacks API - Attack API reference
Creating Custom Attacks - Creating custom attacks
Quickstart - API-based Testing - Quick start guide (cloud APIs)
Quickstart - On-Premise Testing - Quick start guide (on-premise)