I’m saying that any kind of NN whose target task is predicting what happens next (for example, next-token prediction in LLM training, or a recurrent network à la Elman, or, on the brain side, the Bayesian brain hypothesis) elides the difference between an “external guide” and self-driven adjustment.
The whole point of those tasks, conceptually, is that they provide teaching without a teacher: the error signal comes from what actually happens next, not from an outside labeler…
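
To make "teaching without a teacher" concrete, here's a minimal sketch of how next-token prediction gets its targets. This is my own toy illustration, not anyone's actual training code: the function name `next_token_pairs` and the example sentence are made up, and real LLM pipelines work on token IDs in batches rather than Python lists of words. The only point is that every "correct answer" is just the next item of the same sequence.

```python
def next_token_pairs(tokens):
    """Build (context, target) training pairs from a raw sequence alone.

    Hypothetical helper for illustration: no labels are passed in,
    each target is simply the token that follows its context.
    """
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


# A made-up example sequence; nothing here was annotated by anyone.
pairs = next_token_pairs(["the", "cat", "sat", "down"])

for context, target in pairs:
    print(context, "->", target)
```

Nothing outside the sequence ever supplies a label, which is why the "external guide" vs. self-driven distinction gets blurry here.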