Thus the training data didn't just contain text, but rather text where each passage is tagged and attributed to a particular user.
This aspect of the training data was critical in creating the illusion of talking to another person.
An LLM doesn't just predict the next text. It predicts the next text that might come from another user. You need to hard code this in to make it work well.
Leave it out and there is no conversation.