Nvidia Used Authors' Works to Train AI LLMs, Alleges Copyright Class Action
Andre Dubus III, author of The Garden of Last Days, and Susan Orlean, who wrote The Orchid Thief, sued Nvidia Thursday for copyright infringement in U.S. District Court for the Northern District of California in San Francisco, alleging the tech company used their works to train its NeMo Megatron large language models.
Much of the material in Nvidia’s training data set comes from copyrighted works, including books by plaintiffs and class members, that were “copied by NVIDIA without consent, without credit and without compensation,” said the class action complaint (docket 4:24-cv-02655).
Nvidia first announced availability of NeMo Megatron in September 2022 when it released four models, the complaint said. The models were trained on the “Pile” dataset prepared by a research organization, EleutherAI, it said. One component of the Pile is a collection of books called Books3, comprising 108 gigabytes of data, about 12% of the dataset, it said.
The plaintiffs’ copyrighted books are among the works in the Books3 dataset, which was available from the Hugging Face website until October, when Books3 was removed and listed as “defunct and no longer accessible due to reported copyright infringement,” said the complaint.
Nvidia acknowledged training its NeMo Megatron models on a copy of the Pile dataset, which includes the Books3 dataset, the complaint said. That means Nvidia also trained its NeMo Megatron models on a copy of Books3, it alleged. Because certain books written by plaintiffs and class members are part of Books3, including the infringed works, Nvidia “necessarily trained its NeMo Megatron models on one or more” of the infringed works, “thereby directly infringing” plaintiffs’ and class members’ copyrights, it said.
The plaintiffs ask that Nvidia be required to destroy or otherwise dispose of all copies it made or used in violation of their rights, the complaint said. They also seek statutory and actual damages, costs of a court-approved notice system to give “immediate notification” to the class, attorneys’ fees and legal costs, and pre- and post-judgment interest. An Nvidia spokesperson emailed Friday: “We respect the rights of all content creators and believe we created our models in full compliance with copyright law.”