Vladimir Savić on Nostr:
Hmmm ... 🤔
"[...] we study Self-Rewarding Language Models, where the language model itself is used via LLM-as-a-Judge prompting to provide its own rewards during training."
Self-Rewarding Language Models [PDF]
https://arxiv.org/pdf/2401.10020.pdf #AI #LLM #compsci
Published at
2024-01-20 10:51:31
Event JSON
{
  "id": "c69a6b9c7a7a3b68e1eedbb923b262dd590c50073dd27063a5d0fa383cf361df",
  "pubkey": "d21cd1857830821310d566c42ec7f5b7ca641c06828a4d55cf469dc1827b81df",
  "created_at": 1705747891,
  "kind": 1,
  "tags": [
    [
      "t",
      "ai"
    ],
    [
      "t",
      "llm"
    ],
    [
      "t",
      "compsci"
    ],
    [
      "proxy",
      "https://mastodon.social/users/firusvg/statuses/111787893826791016",
      "activitypub"
    ]
  ],
  "content": "Hmmm ... 🤔\n\n\"[...] we study Self-Rewarding Language Models, where the language model itself is used via LLM-as-a-Judge prompting to provide its own rewards during training.\"\n\nSelf-Rewarding Language Models [PDF] https://arxiv.org/pdf/2401.10020.pdf #AI #LLM #compsci",
  "sig": "46f5c88b2e8fd16f7a338b375abae25eeddc6677cc61ac9baf12ff41d3b01919e6152cb52705da1b405136a69003f00133ff6b624aefd899f20baa2ef08e7766"
}