2025-01-25 21:04:01 UTC

I think that phrasing is ambiguous enough for anyone to paint as they like.

It's likely referring to using reinforcement learning, rather than imitating example data, to improve coherence. In particular, it probably means the step that went straight from the raw model to RL, with no intermediate adjustments to bring response strings more in line with human conversational expectations. The raw model here is the one trained on internet data that computes conditional probabilities of strings, which we can sample from and talk to by conditioning on a conversation prefix.
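To make "sample from and talk to by conditioning on a conversation prefix" concrete, here is a minimal sketch using a toy bigram model. Everything in it is an assumption for illustration: the corpus, the function names `train_bigram` and `sample_continuation`, and above all the bigram simplification. A real raw model conditions on the whole prefix, not just the last token, and nothing here reflects how any particular system was built.

```python
import random
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count next-token frequencies, a crude estimate of P(next | previous)."""
    counts = defaultdict(Counter)
    for line in corpus:
        tokens = line.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def sample_continuation(counts, prefix, max_new_tokens=5, seed=0):
    """Extend the prefix by sampling each next token from the conditional counts."""
    rng = random.Random(seed)
    tokens = prefix.split()
    if not tokens:
        return prefix
    for _ in range(max_new_tokens):
        options = counts.get(tokens[-1])
        if not options:
            break  # the model has never seen anything follow this token
        words = list(options.keys())
        weights = list(options.values())
        tokens.append(rng.choices(words, weights=weights, k=1)[0])
    return " ".join(tokens)


# Hypothetical toy "internet data" containing conversation-shaped strings.
corpus = [
    "user: hello assistant: hi there",
    "user: hello assistant: hello friend",
]
counts = train_bigram(corpus)

# "Talking" to the raw model: condition on a conversation prefix and sample.
reply = sample_continuation(counts, "user: hello assistant:")
print(reply)
```

Nothing in this sketch pushes the continuation toward a helpful or conversational reply. It only reproduces the statistics of the data, which is the gap that imitation-style adjustments or RL are meant to close.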

I can see why the author would want to phrase it that way for a broader audience. The optimization phase involves much less hand-holding than is typical. It is a somewhat misleading way to put it, though.

Author Public Key

npub1z8munguv5pse6fre9nnd7hlk5pm0fywh3yz8sf6leyqck0rkgkrqr4e6ds

Seen on

wss://relay.momostr.pink

Kind type

1 Short Text Note

Event JSON

{ "id": "62b97ac77561d1f86b8b647e92a932e165c67699a2e0e280652ab119effbdbae", "pubkey": "11f7c9a38ca0619d24792ce6df5ff6a076f491d7890478275fc9018b3c764586", "created_at": 1737839041, "kind": 1, "tags": [ [ "p", "c5627ba0b4ddd881057f321cb725368239198d0204299b821a6996eb22f126bf" ], [ "e", "dc4afdcb6fc6538924db50803fd7b4840eb9393b6544a8fed650f351d28d8488", "", "reply", "52124910b9d238c7305840021dac6df7f534470bffa7711f315cefc177451dca" ], [ "p", "9a6a1a8eefd0b53d7e0c966ab35bd904151246f03b1be98fa0d2d0eeb4940320" ], [ "proxy", "https://mathstodon.xyz/@metarecursive/113891019417648508", "web" ], [ "p", "52124910b9d238c7305840021dac6df7f534470bffa7711f315cefc177451dca" ], [ "e", "c4034a088e87256cd0c567960d82b429f3a9346e7fac9a8453c926cce3f06a58", "", "root", "9a6a1a8eefd0b53d7e0c966ab35bd904151246f03b1be98fa0d2d0eeb4940320" ], [ "proxy", "https://mathstodon.xyz/users/metarecursive/statuses/113891019417648508", "activitypub" ], [ "L", "pink.momostr" ], [ "l", "pink.momostr.activitypub:https://mathstodon.xyz/users/metarecursive/statuses/113891019417648508", "pink.momostr" ], [ "-" ] ], "content": "I think that phrasing is ambiguous enough for anyone to paint as they like. \n\nIt's likely referring to using reinforcement learning instead of trying to imitate some example data, to improve coherence (and in particular, the step that went directly from the raw model that computes conditional probs of strings--which we can sample from and talk to by conditioning on a conversation prefix--based on internet data, straight to RL without adjustments to give response strings more in line with human conversational expectations). \n\nI can see why the author would want to phrase it that way for a broader audience. The optimization phase has much less hand-holding direction than typical. It is kind of a misleading way to put it though.", "sig": "23e7001242131a1b98b9874ce601218c150ba92b2ab223bd5669d1374c312caa126770971ab0893a45ac250928c195f4c473aeb3004e2a0fd4520566f3422d30" }

Deen Abiola on Nostr: I think that phrasing is ambiguous enough for anyone to paint as they like. It's ...