2025-10-24 10:24:29 UTC

Sophos X-Ops on Nostr: At the Conference for Applied Machine Learning in Information Security (CAMLIS) ...

At the Conference for Applied Machine Learning in Information Security (CAMLIS) yesterday, SophosAI researcher Tamás Vörös presented his research on LLM salting, a novel technique to prevent LLM jailbreaks.

Many organizations are increasingly deploying LLMs with minimal customization. This widespread reuse leads to model homogeneity across deployments – from chatbots to productivity tools – and creates a security vulnerability.

Jailbreak prompts that bypass refusal mechanisms can be precomputed once and reused across many deployments. This mirrors the classic rainbow table attack, where attackers exploit shared cryptographic targets to reuse precomputed inputs.

These generalized jailbreaks are a problem because many companies have customer-facing LLMs built on top of model classes – meaning that one jailbreak could work against all the instances built on top of a given model.

Taking inspiration from salting – the concept of introducing small per-user variations to break reuse of precomputed inputs – we developed a technique we call ‘LLM salting’: introducing targeted variations in model behavior to invalidate jailbreaks.
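The cryptographic analogy can be made concrete with a toy example using only Python's standard library (the password and iteration count here are illustrative): two users with the same password get different salts, so a table of digests precomputed against the unsalted hash matches neither stored value.

```python
import hashlib
import os

def hash_password(password: str, salt: bytes) -> str:
    """Hash a password with a per-user salt (PBKDF2-HMAC-SHA256)."""
    dk = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return dk.hex()

# Two users with the SAME password get different random salts...
salt_a, salt_b = os.urandom(16), os.urandom(16)
hash_a = hash_password("hunter2", salt_a)
hash_b = hash_password("hunter2", salt_b)

# ...so the stored digests differ per user, and a precomputed
# (rainbow) table keyed on the unsalted hash is useless.
assert hash_a != hash_b
```

LLM salting transplants this idea from hashes to models: the per-deployment "salt" is a small behavioral variation rather than random bytes.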

Building on recent work [1] identifying a subspace in model activations responsible for refusal behavior, LLM salting is a lightweight fine-tuning procedure that rotates this subspace. This ensures that jailbreaks crafted against an unsalted model don’t succeed on salted ones.
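A toy sketch of the geometric intuition, in NumPy: suppose a precomputed jailbreak works by projecting the base model's refusal direction out of an activation (directional ablation, in the spirit of [1]). If each salted deployment rotates that direction, the old ablation no longer removes the refusal signal. The 64-dimensional space, refusal direction, rotation angle, and activations below are all synthetic stand-ins – this is not the fine-tuning procedure itself.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # toy hidden dimension

# Synthetic refusal direction for the shared (unsalted) base model;
# in practice this is estimated from model activations.
r = rng.normal(size=d)
r /= np.linalg.norm(r)

# A unit vector orthogonal to r, spanning the plane we rotate in.
u = rng.normal(size=d)
u -= np.dot(u, r) * r
u /= np.linalg.norm(u)

def ablate(h: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Precomputed jailbreak: project the refusal direction out of h."""
    return h - np.dot(h, direction) * direction

# 'Salting': rotate the refusal direction by theta within the (r, u) plane.
theta = np.pi / 4
r_salted = np.cos(theta) * r + np.sin(theta) * u

# Toy activation with a known refusal component (2 along r, 3 along u).
h = 2.0 * r + 3.0 * u
h_attacked = ablate(h, r)  # attack crafted against the unsalted model

# Against the base model, the refusal component is fully removed...
print(np.dot(h_attacked, r))         # ~0.0
# ...but the salted model still sees a strong refusal signal (3*sin(theta)).
print(np.dot(h_attacked, r_salted))  # ~2.12
```

The point of the sketch is only that an attack tuned to one fixed direction in activation space stops transferring once each deployment's direction differs, mirroring how a salt invalidates a rainbow table.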

[1] https://arxiv.org/abs/2406.11717