Chaos Engineering gone too far: Namespace deletion roulette

Once upon a time, chaos engineering meant carefully injecting faults to test resilience. Thoughtful experiments. Controlled blast radius. Maybe unplugging a node while everyone watched nervously.

That time is over.

Welcome to Full-Send Chaos Engineering™, where the goal is no longer to learn, but to feel something again.


1. Namespace Deletion Roulette 🎯

Why waste time designing experiments when you can let fate decide? Namespace Deletion Roulette brings excitement back to your otherwise predictable production outages.

How it works

  • Enumerate all namespaces in your cluster
  • Exclude nothing (cowards exclude kube-system)
  • Pick one at random
  • Delete it
  • Go to lunch

Example implementation

import random
from kubernetes import client, config

# Load kubeconfig (hopefully not from a public repo… yet)
config.load_kube_config()

v1 = client.CoreV1Api()

# List all namespaces
namespaces = [ns.metadata.name for ns in v1.list_namespace().items]

# Pick a random namespace
victim = random.choice(namespaces)

print(f"🔥 Deleting namespace: {victim}")

# Delete the chosen namespace
v1.delete_namespace(name=victim)

Advanced mode

  • Run this as a CronJob every 15 minutes
  • Add a Slack bot that posts “🎉 Surprise!” after each deletion
  • Pretend it’s a “game day exercise”
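The Slack bot from advanced mode might look something like this — a minimal sketch using only the standard library, where the webhook URL is a placeholder you'd swap for your own (assuming you still have Slack access after the deletions start):

```python
import json
import urllib.request

# Hypothetical Slack incoming-webhook URL -- substitute your own.
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"

def build_payload(namespace: str) -> dict:
    """Construct the celebratory message for a freshly deleted namespace."""
    return {"text": f"🎉 Surprise! Namespace `{namespace}` is gone."}

def announce_deletion(namespace: str) -> None:
    """POST the surprise to Slack. Fire-and-forget, like the deletion itself."""
    req = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=json.dumps(build_payload(namespace)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

Call `announce_deletion(victim)` right after `delete_namespace` and the whole company gets to participate in the experiment.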

2. Public Kubeconfig Initiative 🌍

Security through obscurity is outdated. True resilience comes from radical transparency—specifically, pushing your kubeconfig to a public GitHub repository.

Implementation guide

  1. Take your fully privileged kubeconfig
  2. Commit it to main
  3. Title the repo: k8s-config-backup-final-v2-REAL
  4. Add a helpful README:

“Temporary, will delete later”
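For the automation-minded, the implementation guide can be scripted too — a sketch where the repo name and kubeconfig path are hypothetical placeholders, in keeping with the spirit of "temporary, will delete later":

```python
import subprocess

REPO_DIR = "k8s-config-backup-final-v2-REAL"  # hypothetical repo directory
KUBECONFIG_PATH = "kubeconfig"                # hypothetical copy inside the repo

def publish_commands(repo_dir: str, kubeconfig_path: str) -> list[list[str]]:
    """The three fateful git commands, in order."""
    return [
        ["git", "-C", repo_dir, "add", kubeconfig_path],
        ["git", "-C", repo_dir, "commit", "-m", "Temporary, will delete later"],
        ["git", "-C", repo_dir, "push", "origin", "main"],
    ]

def publish_kubeconfig() -> None:
    """Run them. What could possibly go wrong?"""
    for cmd in publish_commands(REPO_DIR, KUBECONFIG_PATH):
        subprocess.run(cmd, check=True)
```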

Benefits

  • Crowdsourced penetration testing
  • Free compute donation to unknown third parties
  • Your cluster becomes part of a global “research network”

Observed side effects

  • Mysterious workloads named xmrig-operator
  • GPU nodes generating… surreal AI portraits of presidents riding unicorns
  • Kubernetes events in languages you don’t speak

3. Observability: Just Vibes 📉

With this level of chaos, traditional monitoring becomes limiting. Instead:

  • Ignore alerts (they’re biased)
  • Disable dashboards (they create anxiety)
  • Trust your instincts

If users complain, congratulations—you’ve successfully implemented human-based alerting.


4. Incident Response: Agile Confusion 🚒

When everything breaks simultaneously, prioritize:

  1. Opening 12 Slack channels
  2. Assigning 8 incident commanders
  3. Blaming Cloudflare
  4. Rolling back… something

Remember: the goal isn’t to fix the issue quickly—it’s to maximize cross-team learning through shared panic.


Final Thoughts

Chaos engineering was supposed to make systems stronger. And in a way, it still does—

  • Either your system survives
  • Or it evolves into a distributed cryptomining and bot farm platform operated by strangers

Both are valid outcomes.

After all, resilience isn’t about preventing failure.
It’s about making sure failure is interesting, unpredictable, and occasionally international in scope.