coincidentally, in the last few days i've read a lot about the technical details of how these things work. it bothered me that someone would show me code they had generated by chatting for hours with a thing like that, and i didn't know how exactly it does what it does, or why it always fails in subtle and bizarre ways that make it something i would never use in my productive work (apart from the ethical/cultural reasons). and if i'm critical of a thing, especially if it's a sociocultural phenomenon coming from hyped tech (like tamagotchi, java, blackberry, bitcoin, tiktok etc), i'm motivated to understand it better so i'm not talking out of my ass when criticizing/discussing it.
with the LLM chat bot stuff, it's one of the rare times where, at least for me, the mechanisms of how it works are not simple but kind of complicated: a lot of math on the one side, and a bunch of almost industrial-scale processes on the other side (to get the corpus for the base model, and then all the human clickfarm work for the fine tuning, for the reinforcement learning and so on). this second, industrial factor is why it totally is a concentration of power thing and why it's so attractive for endless investor money (finally something the plebs can't easily replicate!).
i might be totally wrong, but from what i've understood so far, most of what we're seeing right now comes from kind of an oppenheimer moment that karpathy had in 2015 when he was like "RNNs are cool lol it can generate code that looks like it's from the linux kernel". it probably was harmless fooling around for him, but that changed quickly. from what i've understood further, what was new about this vs CNNs is that they're letting the computer generate extremely broad, complicated state machines that those RNNs encode (because they're recurrent: maybe like applying a convolution over and over, or maybe a bit like the cellular automata or rewriting rules that you were interested in recently), kind of like running many many weird machine-learning-generated parsers in parallel on the input and updating its own state. (the current chat bots are built on transformers rather than RNNs, but they still generate text one token at a time from what came before.) and obviously they don't know how exactly these interconnected parsers/state machines make their decisions, because it's a non-debuggable mess. and because of the low resolution/sampling issues, it's always noisy and gets noisier the longer a conversation/context is. that is an unsolvable problem, so it will always make at least a little mess, but the kind of mess that will cause huge headaches down the line.
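to make the "recurrent state machine" idea a bit more concrete, here's a minimal sketch of a character-level RNN in plain python. everything in it is an assumption for illustration (the `TinyRNN` name, the sizes, the random untrained weights), so it only produces gibberish. the point is just the loop: read one character, update the hidden state from the old state plus the input, then sample the next character from a probability distribution.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a real model would learn these from data."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TinyRNN:
    """Hypothetical, untrained character-level RNN (illustration only)."""

    def __init__(self, vocab, hidden_size=8, seed=0):
        self.vocab = list(vocab)
        self.hidden_size = hidden_size
        self.rng = random.Random(seed)
        n = len(self.vocab)
        self.w_xh = make_matrix(hidden_size, n, self.rng)            # input -> state
        self.w_hh = make_matrix(hidden_size, hidden_size, self.rng)  # state -> state
        self.w_hy = make_matrix(n, hidden_size, self.rng)            # state -> output

    def step(self, h, char):
        """One recurrence: new state depends on the old state and the input."""
        x = [1.0 if c == char else 0.0 for c in self.vocab]  # one-hot encoding
        pre = [a + b for a, b in zip(matvec(self.w_xh, x), matvec(self.w_hh, h))]
        h_new = [math.tanh(p) for p in pre]
        probs = softmax(matvec(self.w_hy, h_new))
        return h_new, probs

    def generate(self, seed_char, length):
        """Feed each sampled character back in as the next input."""
        h = [0.0] * self.hidden_size
        out = [seed_char]
        for _ in range(length):
            h, probs = self.step(h, out[-1])
            out.append(self.rng.choices(self.vocab, weights=probs)[0])
        return "".join(out)


if __name__ == "__main__":
    rnn = TinyRNN("abc ", hidden_size=4, seed=1)
    print(rnn.generate("a", 20))
```

the sampling step at the end of the loop is where the randomness comes in: every output is a draw from a distribution, and every draw gets fed back in as input for the next step.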
and they don't know how to make it "smart" without bootstrapping it from an enormous amount of internet garbage content. the engineers of these things really have this big, unfulfilled desire to kind of find "the core of smartness" and get rid of all this "useless internet content memory", but they have no idea how to get there. nobody has figured out how human problem solving works or how it can be emulated, they're just LARPing it, and it falls apart very quickly if you poke it the right way. and many many people fall for this LARP, because it can be quite convincing in this truman show kind of way. but to me, in every experiment i did there were almost immediately moments like the scene where the studio light falls from the sky.
also these people have a really big structuralist issue and/or some kind of biomechanical materialism issue, kind of like when you're heavily dissociating all the time and confuse the map with the territory, or when you think humans are just playing this big game of factorio and that that is the essence of life, a bit like "the others" in the show pluribus. or like how these people would put the optimus robot as a driver in a tesla to "solve traffic", instead of taking a step back and being like, maybe we need a little _less_ technology there. that's how i imagine the people running the LLM labs.