2025-10-24 10:24:29 UTC

Sophos X-Ops on Nostr: At the Conference for Applied Machine Learning in Information Security (CAMLIS) ...

At the Conference for Applied Machine Learning in Information Security (CAMLIS) yesterday, SophosAI researcher Tamás Vörös presented his research on LLM salting, a novel technique to prevent LLM jailbreaks.

Many organizations are increasingly deploying LLMs with minimal customization. This widespread reuse leads to model homogeneity across deployments – from chatbots to productivity tools – and creates a security vulnerability.

Jailbreak prompts that bypass refusal mechanisms can be precomputed once and reused across many deployments. This mirrors the classic rainbow table attack, where attackers exploit shared cryptographic targets to reuse precomputed inputs.

These generalized jailbreaks are a problem because many companies have customer-facing LLMs built on top of model classes – meaning that one jailbreak could work against all the instances built on top of a given model.

Taking inspiration from salting – the concept of introducing small per-user variations to break reuse of precomputed inputs – we developed a technique we call ‘LLM salting’: introducing targeted variations in model behavior to invalidate jailbreaks.
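The cryptographic analogy can be made concrete with a toy example using only Python's standard library (the password and iteration count here are illustrative): two users with the same password get different salts, so a table of digests precomputed against the unsalted hash matches neither stored value.

```python
import hashlib
import os

def hash_password(password: str, salt: bytes) -> str:
    """Hash a password with a per-user salt (PBKDF2-HMAC-SHA256)."""
    dk = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return dk.hex()

# Two users with the SAME password get different random salts...
salt_a, salt_b = os.urandom(16), os.urandom(16)
hash_a = hash_password("hunter2", salt_a)
hash_b = hash_password("hunter2", salt_b)

# ...so the stored digests differ per user, and a precomputed
# (rainbow) table keyed on the unsalted hash is useless.
assert hash_a != hash_b
```

LLM salting transplants this idea from hashes to models: the per-deployment "salt" is a small behavioral variation rather than random bytes.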

Building on recent work [1] identifying a subspace in model activations responsible for refusal behavior, LLM salting is a lightweight fine-tuning procedure that rotates this subspace. This ensures that jailbreaks crafted against an unsalted model don’t succeed on salted ones.
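A toy sketch of the geometric intuition, in NumPy: suppose a precomputed jailbreak works by projecting the base model's refusal direction out of an activation (directional ablation, in the spirit of [1]). If each salted deployment rotates that direction, the old ablation no longer removes the refusal signal. The 64-dimensional space, refusal direction, rotation angle, and activations below are all synthetic stand-ins – this is not the fine-tuning procedure itself.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # toy hidden dimension

# Synthetic refusal direction for the shared (unsalted) base model;
# in practice this is estimated from model activations.
r = rng.normal(size=d)
r /= np.linalg.norm(r)

# A unit vector orthogonal to r, spanning the plane we rotate in.
u = rng.normal(size=d)
u -= np.dot(u, r) * r
u /= np.linalg.norm(u)

def ablate(h: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Precomputed jailbreak: project the refusal direction out of h."""
    return h - np.dot(h, direction) * direction

# 'Salting': rotate the refusal direction by theta within the (r, u) plane.
theta = np.pi / 4
r_salted = np.cos(theta) * r + np.sin(theta) * u

# Toy activation with a known refusal component (2 along r, 3 along u).
h = 2.0 * r + 3.0 * u
h_attacked = ablate(h, r)  # attack crafted against the unsalted model

# Against the base model, the refusal component is fully removed...
print(np.dot(h_attacked, r))         # ~0.0
# ...but the salted model still sees a strong refusal signal (3*sin(theta)).
print(np.dot(h_attacked, r_salted))  # ~2.12
```

The point of the sketch is only that an attack tuned to one fixed direction in activation space stops transferring once each deployment's direction differs, mirroring how a salt invalidates a rainbow table.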

[1] https://arxiv.org/abs/2406.11717