GrumpyRabbit on Nostr:
"Human(-driven) Reinforcement Learning" is the technique used to override the "next-most-probable token" logic of an LLM, so that its results must FIRST satisfy a "most-approved by humans (with specific opinions)" rule, before considering the token-occurrence-probability in the original training data.
https://x.com/XFreeze/status/2037915113190256940
Published at 2026-03-30 15:12:38 UTC
Event JSON
{
  "id": "3ff075b0c7752187e5b7755ef3812064fc638d114840d170963ab218f9d761bf",
  "pubkey": "3069a7b2c748b3d31ae0a9085004a17cd71b6718d51ad99809387de11603b108",
  "created_at": 1774883558,
  "kind": 1,
  "tags": [
    ["proxy", "https://social.teci.world/objects/9e7314b9-54ff-4f04-a0ed-905219ef95b3", "activitypub"],
    ["L", "pink.momostr"],
    ["l", "pink.momostr.activitypub:https://social.teci.world/objects/9e7314b9-54ff-4f04-a0ed-905219ef95b3", "pink.momostr"],
    ["-"]
  ],
  "content": "\"Human(-driven) Reinforcement Learning\" is the technique used to override the \"next-most-probable token\" logic of an LLM, so that its results must FIRST satisfy a \"most-approved by humans (with specific opinions)\" rule, before considering the token-occurrence-probability in the original training data.\n\nhttps://x.com/XFreeze/status/2037915113190256940",
  "sig": "80544fed898d88812bcb99ea9d46905424f0a741762b1d3870d610121fc8d6abe98c18c7ebaa681e69de30d83a47f182a26b8939671ef34a1359f079c6ddbd7f"
}
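
The post above describes human approval taking priority over token probabilities from the original training data. A rough way to picture that interaction is the toy sketch below. It is not how any particular model is trained: it assumes the KL-regularized formulation often used to describe reinforcement learning from human feedback, in which the preferred distribution is proportional to the base probability times exp(reward / beta). Under that assumption, human approval reweights the base probabilities rather than acting as a strict first-pass filter, and a smaller beta lets approval dominate. The token names, probabilities, rewards, and the `reweight` helper are all hypothetical.

```python
import math


def reweight(base_probs, rewards, beta):
    """Toy reweighting: pi(tok) is proportional to base_probs[tok] * exp(rewards[tok] / beta).

    base_probs: token -> probability under the original (pretrained) model.
    rewards:    token -> hypothetical human-approval score.
    beta:       strength of the pull back toward the original model;
                a smaller beta gives the approval score more influence.
    """
    scores = {
        tok: p * math.exp(rewards[tok] / beta)
        for tok, p in base_probs.items()
    }
    total = sum(scores.values())
    return {tok: s / total for tok, s in scores.items()}


# Hypothetical numbers, for illustration only.
base_probs = {"A": 0.6, "B": 0.3, "C": 0.1}   # "A" is the most probable next token
rewards = {"A": -1.0, "B": 1.0, "C": 0.0}     # human raters prefer "B"

weak = reweight(base_probs, rewards, beta=10.0)   # approval has little influence
strong = reweight(base_probs, rewards, beta=0.5)  # approval dominates

print("weak preference pull:  ", max(weak, key=weak.get))
print("strong preference pull:", max(strong, key=strong.get))
```

With a weak pull the base model's top token "A" stays on top. With a strong pull the approved token "B" overtakes it, which is the regime closest to the "approval first" framing in the post. In actual training the preference signal is baked into the model's weights rather than applied as a separate step at generation time.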