#AI #GenerativeAI #LLMs #Llama #Copyright #IP #Books3: "Upwards of 170,000 books, the majority published in the past 20 years, are in LLaMA’s training data. In addition to work by Silverman, Kadrey, and Golden, nonfiction by Michael Pollan, Rebecca Solnit, and Jon Krakauer is being used, as are thrillers by James Patterson and Stephen King and other fiction by George Saunders, Zadie Smith, and Junot Díaz. These books are part of a dataset called “Books3,” and its use has not been limited to LLaMA. Books3 was also used to train Bloomberg’s BloombergGPT, EleutherAI’s GPT-J—a popular open-source model—and likely other generative-AI programs now embedded in websites across the internet. A Meta spokesperson declined to comment on the company’s use of Books3; Bloomberg did not respond to emails requesting comment; and Stella Biderman, EleutherAI’s executive director, did not dispute that the company used Books3 in GPT-J’s training data."
https://www.theatlantic.com/technology/archive/2023/08/books3-ai-meta-llama-pirated-books/675063/

Miguel Afonso Caetano on Nostr