2024-01-25 02:54:07

Dave Rahardja on Nostr:

#AI #LLM is gunking up the web, especially for lesser-represented languages. Spammers are creating garbage English-language content using LLMs, then translating it into *multiple languages* using machine translation, presumably to generate clickbait ad revenue in several languages at once.

In English, such gunk accounts for some 9% of total sampled web content. But in languages with less representation on the Internet, the figures could be much higher. In Malay, it’s something like 26%, and in Swahili it’s nearly HALF of everything found on the web.

Paper [pdf]: “A Shocking Amount of the Web is Machine Translated: Insights from Multi-Way Parallelism”

https://arxiv.org/pdf/2401.05749.pdf
Author Public Key
npub13jszgr40d0pnyum0t845scy8uggn676enygvaf4ajzm2y9rqzd8sy75d7q