Andrew Zonenberg on Nostr: Is anyone aware of publications or research on what sort of bugs LLM-generated or ...
Is anyone aware of publications or research on what sort of bugs LLM-generated or LLM-assisted code tends to have?
Like, we have a huge body of knowledge in the security community about how to audit human-generated codebases for the types of bugs that human developers commonly write.
But we don't have that kind of data yet (AFAIK) for the vibe-coded monstrosities all of us are going to be pentesting soon.
Gut feelings:
* There are some common threads and patterns of errors, but they're very different from purely human-authored code
* There are a lot of subtle bugs where the code looks good at a glance but is missing some knowledge of interface behavior in another module or component that was outside the context window or something
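The second gut feeling can be made concrete with a small hypothetical sketch. Everything in it (`parse_port`, `configure_listener`, the 8080 default) is invented for illustration and not taken from any real codebase. It shows a caller that looks like correct error handling in isolation but assumes the wrong error contract for a helper defined in another module: the helper signals failure with a sentinel value rather than an exception.

```python
# --- module_a (hypothetical): written earlier, outside the model's context ---
def parse_port(text: str) -> int:
    """Return the TCP port in `text`, or -1 if it is not a valid port.

    Contract: failure is signalled with the sentinel -1, never an exception.
    """
    try:
        port = int(text)
    except ValueError:
        return -1
    return port if 0 < port < 65536 else -1


# --- module_b (hypothetical): plausible-looking generated caller ---
def configure_listener(text: str) -> int:
    try:
        port = parse_port(text)
    except ValueError:
        # Looks like proper error handling, but parse_port never raises,
        # so this fallback is dead code and -1 leaks through.
        port = 8080
    return port


# --- caller that honors module_a's actual contract ---
def configure_listener_fixed(text: str) -> int:
    port = parse_port(text)
    if port == -1:
        port = 8080  # hypothetical default, used only on invalid input
    return port


if __name__ == "__main__":
    print(configure_listener("443"))              # 443
    print(configure_listener("not-a-port"))       # -1 (the bug)
    print(configure_listener_fixed("not-a-port")) # 8080
```

Both callers pass a happy-path test with valid input, which is exactly why this kind of bug survives until someone checks the interface contract of the other module.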
Assumptions:
* We don't know exactly which lines in the subject codebase were written by humans and which by stochastic parrots
* The code at least appears to function correctly for a nontrivial fraction of inputs, i.e. it compiles and has been debugged sufficiently that a customer is considering shipping it
Published at 2025-07-24 00:01:33 UTC