2025-03-18 23:15:47 UTC

Royce Williams on Nostr: The protective value of "k-anonymity"¹ for Have I Been Pwned / Pwned Passwords API ...

The protective value of "k-anonymity"¹ for Have I Been Pwned / Pwned Passwords API lookups is significantly reduced because frequency data is included. And the more common the password, the more this effect is magnified.

An example:

https://gist.github.com/roycewilliams/2034c9253d46fbcaefb13f8e5d42daa2

... with cracks:

https://gist.github.com/roycewilliams/2bb471cc90cce7f6834204344590fcac

Using "k-anonymity"¹ to return all hashes that begin with b2e98 is less "anonymous" ... when the top hash in that bucket accounts for 98.6% of the occurrences (by frequency across all leaks).

It's not really hiding a needle in a haystack if you just lay it on top.
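
A rough Python sketch of what an observer of a range query could compute. It assumes the Pwned Passwords range response format of one `SUFFIX:COUNT` line per hash (35 hex characters of SHA-1 suffix, then the count). The function names and the sample body are made up for illustration (the counts are picked to mirror the 98.6% figure) and are not real API output.

```python
import hashlib


def sha1_prefix_suffix(password: str) -> tuple[str, str]:
    """Split the uppercase SHA-1 hex digest into the 5-char prefix that is
    sent to the range endpoint and the 35-char suffix that stays local."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def parse_range_response(body: str) -> dict[str, int]:
    """Parse 'SUFFIX:COUNT' lines into a {suffix: count} mapping."""
    counts: dict[str, int] = {}
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        suffix, _, count = line.partition(":")
        counts[suffix] = int(count)
    return counts


def top_share(counts: dict[str, int]) -> tuple[str, float]:
    """Return the most frequent suffix and its share of all occurrences
    in the bucket. This is the best guess of anyone who sees only the prefix."""
    total = sum(counts.values())
    suffix, count = max(counts.items(), key=lambda kv: kv[1])
    return suffix, count / total


# Hypothetical bucket: three fake suffixes with invented counts.
SAMPLE_BODY = "\n".join([
    "A" * 35 + ":986",
    "B" * 35 + ":9",
    "C" * 35 + ":5",
])

if __name__ == "__main__":
    prefix, _ = sha1_prefix_suffix("password")
    print("prefix sent to the API:", prefix)
    suffix, share = top_share(parse_range_response(SAMPLE_BODY))
    print(f"top suffix holds {share:.1%} of the bucket")
```

The bucket still has plenty of entries, but weighted by frequency the "crowd" is mostly one password.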

Edit: in fact, this holds even *without* the frequency data. Some passwords are much more common than others ... a heavily skewed, long-tailed distribution is an intrinsic property of password data, so missing frequency data can be largely reconstructed from public cracking efforts. (And even if that weren't true, the hashes can just be cracked using traditional methods. If the cracking community can get a 97%+ cracking rate², what is being achieved other than plausible deniability?)

K-anonymity [as implemented by HIBP, anyway -- true K-anonymity is different¹] may just be a bad fit for password hashes.

¹ Not actually k-anonymity at all:
https://en.wikipedia.org/wiki/K-anonymity

² Actually closer to 99.29% across the entire corpus, publicly:
https://gist.github.com/roycewilliams/40f0e8c93ec9c69f5b5a1874c76f2587

#passwords #HaveIBeenPwned