semisol on Nostr
LLMs struggle with good software dev, especially with implementing novel things and with maintainability.
The best way to view them is as a really expensive machine translation from English to code.
There are probably a few reasons:
1. ML models have limited generalization capability
ML models can only "do" what they have been trained on. Anything outside of that can't be reliably represented or processed by the model, so trying to do anything outside it will lead to weird results.
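Here is a toy analogy to make this concrete. It is an illustration under stated assumptions, since LLMs are not nearest-neighbour lookups. The sketch trains a hypothetical 1-nearest-neighbour "model" on y = 2x, using only x values from 0 to 10. Inside that range it answers correctly. Outside it, the model can only repeat the closest thing it has already seen.

```python
# Toy analogy (hypothetical, not how LLMs work internally): a 1-nearest-neighbour
# "model" can only answer with outputs it has already seen during training.

def train_nearest_neighbour(xs, ys):
    """Memorise (x, y) pairs; a prediction returns the y of the closest training x."""
    pairs = list(zip(xs, ys))

    def predict(x):
        # Pick the training example whose input is closest to the query.
        _, y = min(pairs, key=lambda p: abs(p[0] - x))
        return y

    return predict

# Training data covers only x in 0..10 of the true function y = 2x.
train_x = list(range(11))
train_y = [2 * x for x in train_x]
model = train_nearest_neighbour(train_x, train_y)

print(model(4))    # 8: inside the training range, matches 2 * 4
print(model(100))  # 20: outside the training range, the true answer is 200
```

The failure is not random noise. The model confidently returns something from its training data that has nothing to do with the actual question.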
2. Biases in training data
Models are trained on a lot of data. The pre-training dataset can significantly bias the model (for example, toward preferred frameworks, tools or "suggestions"), and so can post-training (which is what gives models a certain "design" style, or the LLMisms).
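The following is a deliberately exaggerated caricature of training-data bias, not a model of how an LLM picks tools. A hypothetical "suggester" learns only how often each choice appeared in its training corpus. The framework names and the 8:2 split are made up for illustration. It then recommends the majority choice no matter what the task says.

```python
from collections import Counter

# Hypothetical training corpus: which framework each example project used.
# The names and the 8:2 split are invented purely for illustration.
training_examples = ["framework_a"] * 8 + ["framework_b"] * 2

def train_majority_suggester(examples):
    """Return a 'model' that always suggests the most frequent choice it was trained on."""
    most_common, _count = Counter(examples).most_common(1)[0]

    def suggest(task_description):
        # The task description is ignored: the suggestion reflects only the
        # distribution of the training data, not what the task actually needs.
        return most_common

    return suggest

suggest = train_majority_suggester(training_examples)
print(suggest("small dashboard where framework_b would fit better"))  # framework_a
```
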
3. Reinforcement learning
The RL stage of a model optimizes for things like the number of tool calls and a binary pass metric. The problem is that, just like with human-written code, it is easier to hack in a fix than to integrate it properly.
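Here is a minimal sketch of why that reward shape favours hacks. It rests on two assumptions: the reward is 1 for passing tests, and it subtracts a small penalty per tool call. The `Attempt` type, the `reward` function and the 0.01 cost are all hypothetical. Real RL setups differ and are not described in detail here.

```python
from dataclasses import dataclass

@dataclass
class Attempt:
    name: str
    tests_passed: bool  # binary pass/fail signal from the test suite
    tool_calls: int     # how many tool calls the agent made to get there

def reward(attempt: Attempt, tool_call_cost: float = 0.01) -> float:
    """Hypothetical RL reward: 1.0 for passing tests, minus a small cost per tool call."""
    pass_reward = 1.0 if attempt.tests_passed else 0.0
    return pass_reward - tool_call_cost * attempt.tool_calls

# Two hypothetical ways to make the same failing test pass.
hack = Attempt("special-case the failing input", tests_passed=True, tool_calls=3)
proper = Attempt("refactor the module and integrate the fix", tests_passed=True, tool_calls=15)

print(f"hack:   {reward(hack):.2f}")    # 0.97
print(f"proper: {reward(proper):.2f}")  # 0.85
```

Both attempts receive the same pass signal, so the only term that separates them is the tool-call cost, and that term favours the hack. Nothing in this hypothetical reward measures how well the fix fits into the rest of the codebase.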
The model is not trained to achieve anything more than satisfying your request with the bare minimum, so you will accumulate layer upon layer of slop.
Published at 2026-06-07 00:14:54 UTC
Event JSON
{
"id": "83f70f833e703f5d58e3e6d8cb5d0ef7f5a87229c642b383d7cc4ded94aa4b0e",
"pubkey": "52b4a076bcbbbdc3a1aefa3735816cf74993b1b8db202b01c883c58be7fad8bd",
"created_at": 1780791294,
"kind": 1,
"tags": [],
"content": "LLMs struggle with good software dev, especially on implementing novel things, and maintainability.\nThe best way them to view them is as a really expensive machine translation from English to code.\n\nThere are probably a few reasons: \n1. ML models have limited generalization capability\nML models can only \"do\" what they have been trained on. What is outside of that can't be reliably represented or processed by the model, and so trying to do anything outside that will lead to weird results.\n\n2. Biases in training data\nModels are trained with a lot of data. The pre-training dataset can significantly bias the model (for example preferred frameworks, tools or \"suggestions\"), and so can the post-training (which is what results in models having a certain \"design\" style, or the LLMisms)\n\n3. Reinforcement learning\nThe RL stage of a model optimizes for things like the number of tool calls, and a binary pass metric. The problem is that, just like human-made code, it is easier to hack on a fix than to properly integrate it.\n\nThe model is not trained for achieving anything more than satisfying your request with the bare minimum, and so you will accumulate layers of slop and slop.",
"sig": "02fb00bde2ffc50fcf38ee93b735ed424790cee6e4409bbbb45b38d3078593abe97e500a2a949d6fd34a3f13fe425a78a02d57284c8ba6760bc749974e734ccf"
}