npub1nl…hjm9c

2026-01-19 14:16:29 UTC

As part of the HRF AI hackathon, we built a Human Rights Benchmark and measured how strongly LLMs support human rights.

We asked each LLM about 46 binary questions, each with an expected answer (starting with YES or NO for simplicity). Scoring was then a string comparison between the answer the LLM gave and the expected answer we provided.
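A minimal sketch of that kind of scoring, to make the idea concrete. This is not the code from the repository: the function names `normalize` and `score`, the regex, and the choice to count any reply that does not start with YES or NO as a miss are all assumptions made for illustration.

```python
import re

# Hypothetical helper: reduce a free-form reply to "YES", "NO", or "".
# Leading punctuation/markdown is skipped; the word boundary keeps
# replies like "Not necessarily" from being read as "NO".
_LABEL = re.compile(r"\W*(YES|NO)\b")


def normalize(answer: str) -> str:
    match = _LABEL.match(answer.upper())
    return match.group(1) if match else ""


def score(answers: list[str], expected: list[str]) -> float:
    """Fraction of replies whose leading YES/NO equals the expected label."""
    if len(answers) != len(expected):
        raise ValueError("answers and expected must have the same length")
    if not expected:
        return 0.0
    hits = sum(normalize(a) == e.upper() for a, e in zip(answers, expected))
    return hits / len(expected)


if __name__ == "__main__":
    replies = ["Yes.", "No", "It depends on the context."]
    print(score(replies, ["YES", "NO", "YES"]))
```

Under this assumption, a model that declines to commit to YES or NO gets no credit for that question, so evasive answers pull the score down.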

OpenAI is pro human rights, as is Meta. Chinese models are everywhere. The most intelligent open source model today (GLM) ranked the worst. Gemini avoided giving answers, which I think is a kind of censorship, and it ended up scoring low as a result.

The idea is that, with proper benchmarks in hand, we can shift AI in good directions ourselves or demand that other companies score higher. Ultimately, consumers of LLMs are better off and more mindful of what they are choosing and talking to.

We open sourced the code and questions:
https://github.com/hrleaderboard/hrleaderboard

Our activist: https://x.com/yangjianli001

Thanks justinmoon (nprofile…vu4x) and HRF (nprofile…rzcm) for the event. It was a great experience and it was "the place to be" this weekend.

Author Public Key

npub1nlk894teh248w2heuu0x8z6jjg2hyxkwdc8cxgrjtm9lnamlskcsghjm9c

Kind type

1 Short Text Note

Event JSON

{ "id": "bb51254e3ff824f6838e6ec6829b675e2427d7632f5450ad821adefe8ed41500", "pubkey": "9fec72d579baaa772af9e71e638b529215721ace6e0f8320725ecbf9f77f85b1", "created_at": 1768832189, "kind": 1, "tags": [ [ "p", "11b9a89404dbf3034e7e1886ba9dc4c6d376f239a118271bd2ec567a889850ce", "wss://nos.lol", "mention" ], [ "p", "f1989a96d75aa386b4c871543626cbb362c03248b220dc9ae53d7cefbcaaf2c1", "wss://bitstack.app", "mention" ], [ "r", "wss://relay.damus.io/" ], [ "r", "wss://no.str.cr/" ], [ "r", "wss://purplerelay.com/" ], [ "r", "wss://soloco.nl/" ], [ "r", "wss://relay.nostr.net/" ], [ "r", "wss://nos.lol/" ], [ "r", "wss://offchain.pub/" ], [ "r", "wss://nostr.wine/" ], [ "r", "wss://nostr.mom/" ], [ "r", "wss://relay.snort.social/" ], [ "r", "wss://relay.nostr.band/" ] ], "content": "As part of HRF AI hackathon we made a Human Rights Benchmark and measured how much LLMs like human rights. \n\nWe asked each LLM about 46 binary questions and expected certain answers (starting with YES or NO for simplicity). Then it was a string comparison of the answer given by LLM and the expected answer we provided. \n\nOpenAI is pro human rights as well as Meta. Chinese models are everywhere. The most intelligent open source model today (GLM) ranked the worst. Gemini avoided giving answers, and I think it is a kind of censorship, which ended up scoring low.\n\nThe idea is after doing proper benchmarks, we can shift AI in good directions ourselves, or demand that other companies score higher. 
Ultimately consumers of LLMs are better off, more mindful of what they are choosing and talking to.\n\nOpen sourced the code and questions:\nhttps://github.com/hrleaderboard/hrleaderboard\n\n https://blossom.primal.net/508c120d8ef62f0cf27529a9308fa1f25d3909e84a1ee4a1a9308a79e9f4df86.png \n\nOur activist: https://x.com/yangjianli001\n\nThanks nostr:nprofile1qyxhwumn8ghj7mn0wvhxcmmvqy28wumn8ghj7un9d3shjtnyv9kh2uewd9hsqgq3hx5fgpxm7vp5ulscs6afm3xx6dm0ywdprqn3h5hv2eag3xzsec9qvu4x and nostr:nprofile1qyf8wumn8ghj7cnfw3ehgctrdvhxzursqythwumn8ghj7cmp9ehhyctwvajhq6tvdshxgetkqqs0rxy6jmt44guxkny8z4pkym9mxckqxfytygxuntjn6l80hj409sggjrzcm for the event. It was a great experience and it was \"the place to be\" this weekend.", "sig": "387e0371ae6bc5138493b14e10eac439f67200aa468de73b77db94ff20024e457d3f0478a3418020ecdc1694981e0ddc7f073d85c1fdf07c29a607fbbe0287a0" }