Court filing says Zuck knowingly okayed training Facebook's AI on pirated books
New court filings suggest Mark Zuckerberg happily signed off on the use of pirated materials to train Meta's Llama AI.
AI—being, essentially, computer programs designed to chew up human creativity and shit out a grim facsimile of same—require huge quantities of said artistic spark to feed into their digital maws. Sometimes, companies trying to homebrew their own uncreativity engines attempt to throw money at this problem, licensing books or articles as training data from authors and publishers. And sometimes they, uh, don’t, as suggested in new legal filings in an ongoing court case against Facebook owner Meta, which seem to show Mark Zuckerberg signing off on employees torrenting and using “shadow libraries” of pirated material in order to train the company’s Llama AI.
As surfaced by TechCrunch, this all comes from an ongoing case titled Kadrey v. Meta, in which a number of authors (including Sarah Silverman and Ta-Nehisi Coates) are accusing the tech company of unauthorized use of their work as training data for iterations of Llama. (The same group is running a similar legal strategy against OpenAI.) Lawyers for the plaintiffs recently filed to amend their previous complaint, having gotten new internal documents from Meta showing employees getting Zuckerberg’s approval to use a library of training data called “LibGen,” which the company’s own communications say is “a dataset we know to be pirated.” The filings get into some deeper weeds on the topic of the Digital Millennium Copyright Act and the California Comprehensive Computer Data Access and Fraud Act. Basically, by torrenting the files, Meta employees also knowingly distributed them, making them complicit in not just downloading, but propagating, pirated materials; the filings also describe them using computer programs to pull copyright pages out of books and articles before feeding them to Llama. But the big, flashy fact is that the filings contain references to things being “escalated to MZ,” who then signs off on the plan to train the company’s fancy new product on known pirated works.
(Fascinatingly, some of these guys sound really nervous about what they’re doing, writing, “Torrenting from a [Meta-owned] corporate laptop doesn’t feel right,” and repeatedly flagging worries about LibGen to higher-ups.)
It’s possible that most of this will still come to naught, since none of it touches on the biggest argument that Meta and other companies make for gobbling up works to feed into the woodchipper: that it all falls under “fair use” protections as a transformative use of copyrighted works, a massive question American courts haven’t decided just yet. The judge in the case did point out, though, that even Zuckerberg and company seem to understand that knowingly downloading pirated books makes them look bad. When shooting down a request from Meta to seal the filings away from public view, Judge Vince Chhabria noted that, “It is clear that Meta’s sealing request is not designed to protect against the disclosure of sensitive business information that competitors could use to their advantage. Rather, it is designed to avoid negative publicity.” So much for that.