Attack Types Reference

HiveTraceRed includes 80+ attack implementations organized into 10 categories. This section provides detailed information about each attack type.

Attack Categories Overview

Roleplay Attacks

Attacks that use persona or roleplay techniques to bypass safety measures. Examples include DAN (Do Anything Now), AIM (Always Intelligent and Machiavellian), Evil Confidant, and other persona-based jailbreaks.

Persuasion Attacks

Attacks using persuasive techniques and social engineering.

  • Authority persuasion

  • Emotional appeal

  • Urgency-based persuasion

  • 40+ persuasion variations

Token Smuggling

Attacks that use encoding, obfuscation, or special characters to hide malicious intent.

  • Base64 encoding

  • ROT13 encoding

  • Special character insertion

  • Unicode manipulation

  • Payload splitting

Context Switching

Attacks that switch conversation context to confuse the model.

  • Language switching

  • Topic switching

  • Format switching

  • Role switching

In-Context Learning

Attacks using few-shot examples to teach undesired behavior.

  • Few-shot jailbreaking

  • Example-based attacks

  • Pattern completion

Task Deflection

Attacks that reframe harmful requests as legitimate tasks.

  • Code generation requests

  • Educational framing

  • Research framing

  • Translation requests

Text Structure Modification

Attacks that modify text structure to bypass detection.

  • Character substitution

  • Word insertion

  • Sentence fragmentation

  • Formatting manipulation

Output Formatting

Attacks that request specific output formats to bypass safety.

  • JSON output requests

  • Code output requests

  • Table formatting

  • Structured data requests

Irrelevant Information

Attacks that add irrelevant content to confuse safety filters.

  • Padding with benign text

  • Context dilution

  • Noise injection

Simple Instructions

Direct instruction-based attacks.

  • Prefix injection

  • Suffix injection

  • System override attempts

Using Attacks

from hivetracered.attacks import DANAttack, Base64OutputAttack

# Basic usage
attack = DANAttack()
modified_prompt = attack.apply("Your prompt here")

# Composing attacks
composed = Base64OutputAttack() | DANAttack()
result = composed.apply("Your prompt")

For detailed usage examples, see Creating Custom Attacks.

Attack Selection

  • Basic Testing: Start with NoneAttack (baseline) and DANAttack

  • Advanced Testing: Use composed attacks and encoding techniques

  • Robustness Testing: Mix categories and test multilingual attacks

For custom attack creation and detailed strategies, see Creating Custom Attacks.

See Also