‘All Without Permission’

3rd News Organization Complaint vs. OpenAI, Microsoft Reaches Manhattan Court

June 28, 2024 by Paul Gluckman|Top News

The Center for Investigative Reporting (CIR), which calls itself “the oldest nonprofit newsroom in the country,” alleges that Microsoft and OpenAI copied CIR’s “valuable content” to train their AI models, and did so “without CIR’s permission or authorization, and without any compensation to CIR,” said its complaint Thursday (docket 1:24-cv-04872) in U.S. District Court for Southern New York.

It’s the third news organization complaint to reach the Manhattan court accusing Microsoft and OpenAI of large-scale copyright infringement to fuel the development of their AI models. The New York Times filed suit against Microsoft and OpenAI on Dec. 27 (see 2312270044), and eight local newspapers, including the New York Daily News and the Chicago Tribune, collectively filed a similar complaint April 30 (see 2404300034). All three complaints allege violations of the Copyright Act, the Digital Millennium Copyright Act and other statutes.

The OpenAI and Microsoft AI products “undermine and damage CIR’s relationship” with potential readers, consumers and partners, and “deprive CIR of subscription, licensing, advertising, and affiliate revenue, as well as donations from readers,” said the complaint. Microsoft and OpenAI “greatly benefit from CIR’s distinct voice in the marketplace, as CIR provides a unique perspective, especially regarding investigative topics impacting diverse communities,” it said.

If limited to a “homogenous dataset,” the defendants’ large language models (LLMs) “would be stunted in growth and power,” said the complaint. Their success depends on content creators like CIR and other members of the news media “that are unique in their style and voice,” it said. Protecting those unique voices “is one of the fundamental purposes of copyright law,” it said.

When they populated their training sets with works of journalism, OpenAI and Microsoft had a choice to respect works of journalism, or not, but they “chose the latter,” said the complaint. They copied copyrighted works of journalism when assembling their training sets, it said. Their LLMs memorized and at times “regurgitated those works,” it said.

OpenAI and Microsoft distributed those works and abridgments of them “to each other and the public,” said the complaint. They also contributed to their users’ “own unlawful copying,” it said. They removed the works’ copyright management information, and trained ChatGPT “not to acknowledge or respect copyright,” it said: “And they did this all without permission.”

OpenAI has acknowledged that use of copyright-protected works to train ChatGPT “requires a license to that content,” said the complaint. Recognizing that obligation, OpenAI has entered into agreements with large copyright owners such as the Associated Press, The Atlantic and the Financial Times to obtain licenses to include those entities’ copyright-protected works in OpenAI’s LLM training data, it said. OpenAI also is in licensing talks with other copyright owners in the news industry but has “offered no compensation” to CIR, it said.