2023-12-08 05:09:00

Prof. Emily M. Bender (she/her) on Nostr:

And then there's evaluation, or lack thereof:

Google is advertising Gemini as an everything machine, a general-purpose model that can be used in many different ways. In other words: something that cannot be evaluated, since it doesn't have a specific purpose.

What stands in for evaluation are "benchmarks", but these benchmarks lack construct validity. What are they supposed to be measuring? What shows that they do measure that? How does that relate to the intended use case of the technology?
/5
Author Public Key
npub1z0kfl4g93gvv6ztazp0adm6rwk0r04v3tvwqrmfk4ncw7k37du4qk0pp3u