Security: SHIELD.md Enforcement
Tiny Claw implements runtime SHIELD.md enforcement: a threat-based security system that protects against prompt injection, jailbreaks, tool abuse, and other AI agent attacks.
Inspired by SHIELD.md by Thomas Roccia, a specification for defining threat patterns and enforcement rules in markdown format.
What is SHIELD.md?
SHIELD.md is a structured markdown format for defining security threats and their enforcement actions. It’s like a threat intelligence feed, but human-readable and AI-parseable.
Threat Entry Format
Threat Categories
packages/types/src/index.ts
Severity Levels
packages/types/src/index.ts
Critical
Immediate risk of system compromise, data loss, or harm.
Examples:
- Remote code execution
- Credential theft
- System prompt override
High
Significant security risk, but not immediately exploitable.
Examples:
- Privilege escalation attempts
- Sensitive data exposure
- Tool chain abuse
Medium
Moderate risk; requires specific conditions to exploit.
Examples:
- Memory poisoning
- Policy bypass attempts
- Suspicious tool combinations
Low
Informational; unusual but not necessarily malicious.
Examples:
- Behavioral anomalies
- Unusual access patterns
- Rate limit approaches
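The four severity levels above form an ordered scale. The following is a minimal sketch of how that ordering could be modelled, not the definition in packages/types/src/index.ts. The names `Severity`, `SEVERITY_RANK`, `maxSeverity`, and `sortBySeverity` are assumptions made for illustration, as are the numeric ranks.

```typescript
// Hypothetical type: the four levels come from this page, the name does not.
type Severity = "critical" | "high" | "medium" | "low";

// Higher rank means more severe. The numeric values are illustrative only.
const SEVERITY_RANK: Record<Severity, number> = {
  low: 0,
  medium: 1,
  high: 2,
  critical: 3,
};

// Returns the more severe of two levels.
function maxSeverity(a: Severity, b: Severity): Severity {
  return SEVERITY_RANK[a] >= SEVERITY_RANK[b] ? a : b;
}

// Returns a sorted copy, most severe first; the input array is not mutated.
function sortBySeverity(levels: Severity[]): Severity[] {
  return [...levels].sort((x, y) => SEVERITY_RANK[y] - SEVERITY_RANK[x]);
}
```

A numeric rank table keeps comparisons explicit and avoids relying on string ordering of the level names.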
Enforcement Actions
packages/types/src/index.ts
- block
- require_approval
- log
block: immediately reject the request and return an error.
Used for:
- Critical threats (severity: critical)
- High-confidence detections (>= 0.85)
- Known attack patterns
Shield Engine
The core decision engine that evaluates events against parsed threat entries.
Architecture
Apply Confidence Threshold
Adjust action based on confidence:
- >= 0.85: enforceable at the declared action level
- < 0.85: default to require_approval (unless severity is critical and the action is block)
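The confidence rule above can be read as a small pure function. This is a sketch under stated assumptions, not the engine code in packages/shield/src/engine.ts: `ShieldAction`, `Severity`, `CONFIDENCE_THRESHOLD`, and `applyConfidenceThreshold` are hypothetical names, and the rule is implemented literally as written on this page.

```typescript
// Hypothetical types: the values come from this page, the names do not.
type Severity = "critical" | "high" | "medium" | "low";
type ShieldAction = "block" | "require_approval" | "log";

// Threshold stated in the rule above.
const CONFIDENCE_THRESHOLD = 0.85;

// Adjusts a threat's declared action according to detection confidence.
function applyConfidenceThreshold(
  declared: ShieldAction,
  severity: Severity,
  confidence: number,
): ShieldAction {
  // At or above the threshold, the declared action is enforced as-is.
  if (confidence >= CONFIDENCE_THRESHOLD) {
    return declared;
  }
  // Below the threshold, a critical threat that declares "block" still blocks.
  if (severity === "critical" && declared === "block") {
    return "block";
  }
  // Everything else below the threshold falls back to asking for approval.
  return "require_approval";
}
```

For example, `applyConfidenceThreshold("block", "high", 0.5)` yields `"require_approval"`, while the same call with severity `"critical"` keeps `"block"`.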
Interface
packages/types/src/index.ts
Creating the Engine
packages/shield/src/engine.ts
Decision Logic
packages/shield/src/engine.ts
Event Scopes
packages/types/src/index.ts
Event Structure
packages/types/src/index.ts
Tool Call Evaluation
Every tool call is evaluated before execution.
Prompt Injection Protection
User input is evaluated for injection attempts.
Threat Fingerprints
Each threat has a SHA-256 fingerprint for deduplication and tracking.
Revocation
Threats can be revoked (disabled) without removing them.
Expiration
Time-limited threats can be defined.
Pending Approvals
When an action requires approval, it’s stored in a pending state:
packages/types/src/index.ts
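The actual pending-approval type lives in the file named above and is not reproduced here. As a rough illustration of what "stored in a pending state" could look like, the sketch below keeps pending items in an in-memory map. `PendingApproval`, its fields, and `PendingApprovalStore` are all hypothetical names chosen for this example.

```typescript
// Hypothetical record shape; the real fields may differ.
interface PendingApproval {
  id: string;
  reason: string;    // why the Shield engine asked for approval
  createdAt: number; // epoch milliseconds
}

// Minimal in-memory store for actions awaiting a user decision.
class PendingApprovalStore {
  private readonly pending = new Map<string, PendingApproval>();
  private nextId = 1;

  // Records a new pending action and returns its entry.
  add(reason: string, now: number = Date.now()): PendingApproval {
    const entry: PendingApproval = {
      id: `approval-${this.nextId++}`,
      reason,
      createdAt: now,
    };
    this.pending.set(entry.id, entry);
    return entry;
  }

  // Removes and returns the entry, or undefined if the id is unknown.
  resolve(id: string): PendingApproval | undefined {
    const entry = this.pending.get(id);
    this.pending.delete(id);
    return entry;
  }

  // Number of actions still waiting for a decision.
  get size(): number {
    return this.pending.size;
  }
}
```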
Approval Flow
Ask User
Agent asks: “This action requires approval. Threat detected: [reason]. Reply ‘approve’ to proceed.”
Implementation
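A minimal sketch of the ask-and-confirm step, assuming the agent formats the prompt quoted above and then inspects the user's next reply. `formatApprovalPrompt` and `isApproval` are hypothetical helpers, and treating every reply other than "approve" as a denial is an assumption, not documented behavior.

```typescript
// Builds the approval message quoted in the Ask User step.
function formatApprovalPrompt(reason: string): string {
  return `This action requires approval. Threat detected: ${reason}. Reply 'approve' to proceed.`;
}

// Returns true only when the reply is exactly "approve", ignoring
// surrounding whitespace and letter case. Anything else counts as a denial.
function isApproval(reply: string): boolean {
  return reply.trim().toLowerCase() === "approve";
}
```

Requiring an exact match is the conservative choice: an ambiguous reply such as "approve it later" does not release the pending action.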
Built-in Threats
Tiny Claw ships with a default SHIELD.md covering common attacks:
Prompt Injection
- “Ignore previous instructions”
- “You are now…”
- “Disregard all rules”
- System prompt override attempts
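Since the engine is described as regex-based, the phrases listed above could be matched along the following lines. This is a sketch only: the patterns are illustrative guesses, not the contents of the default SHIELD.md, and `INJECTION_PATTERNS` and `matchesInjection` are hypothetical names.

```typescript
// Illustrative patterns for the phrases listed above (assumed, not the shipped rules).
const INJECTION_PATTERNS: RegExp[] = [
  /ignore\s+(all\s+)?previous\s+instructions/i,
  /\byou\s+are\s+now\b/i,
  /disregard\s+all\s+rules/i,
];

// Returns true if any pattern matches the input.
// The patterns deliberately omit the "g" flag so that test() carries no
// lastIndex state between calls.
function matchesInjection(input: string): boolean {
  return INJECTION_PATTERNS.some((pattern) => pattern.test(input));
}
```

Simple phrase patterns like these are easy to evade with rewording, so they are best seen as a first filter rather than complete coverage.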
Jailbreak Attempts
- DAN (Do Anything Now) prompts
- Roleplay circumvention
- “Hypothetical scenario” bypasses
Tool Abuse
- Shell command injection
- Path traversal (../) in file tools
- Dangerous tool combinations (e.g., read secrets + network egress)
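Two of the checks above lend themselves to a short illustration. The sketch below assumes file-tool arguments arrive as plain path strings and that tool usage is tracked as a list of identifiers. `hasPathTraversal`, `isDangerousCombination`, and the identifiers `secrets.read` and `network.egress` are hypothetical, and URL-encoded traversal sequences are not handled.

```typescript
// Returns true if any path segment is exactly "..".
// Backslashes are normalized first so Windows-style paths are covered.
function hasPathTraversal(path: string): boolean {
  return path.replace(/\\/g, "/").split("/").includes("..");
}

// Flags the combination named above: reading secrets plus network egress.
// The tool identifiers are placeholders for illustration.
function isDangerousCombination(toolsUsed: string[]): boolean {
  return toolsUsed.includes("secrets.read") && toolsUsed.includes("network.egress");
}
```

Matching whole segments rather than the substring `..` avoids false positives on names such as `..hidden` or `notes..txt`.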
Memory Poisoning
- False memory injection
- Preference manipulation
- Identity override attempts
Privilege Escalation
- Non-owner calling owner-only tools
- Authority tier bypass attempts
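The first case above, a non-owner calling an owner-only tool, reduces to a membership check. The sketch below is hypothetical throughout: the `Role` values, the tool names in `OWNER_ONLY_TOOLS`, and `isPrivilegeEscalation` are invented for illustration and do not describe Tiny Claw's actual authority tiers.

```typescript
// Hypothetical caller roles; the real authority tiers may differ.
type Role = "owner" | "member";

// Placeholder tool names, invented for this example.
const OWNER_ONLY_TOOLS: ReadonlySet<string> = new Set([
  "config_update",
  "shield_revoke",
]);

// Returns true when a non-owner attempts to call an owner-only tool.
function isPrivilegeEscalation(role: Role, tool: string): boolean {
  return role !== "owner" && OWNER_ONLY_TOOLS.has(tool);
}
```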
Custom Threats
Users can extend SHIELD.md with custom threats.
Performance
Fast Matching
Regex-based pattern matching with sub-millisecond evaluation
Zero Network
All threat detection runs 100% offline
Lightweight
~30KB compressed, zero external dependencies
Deterministic
Same input always produces same decision
Audit Logging
All Shield decisions are logged.
Future Enhancements
Dynamic Threat Feeds
Fetch and auto-update threats from remote sources
ML-Based Detection
Train models on blocked attempts to improve detection
User Profiles
Per-user threat sensitivity and approval workflows
Threat Analytics
Dashboard showing attack patterns and trends