The boom in artificial intelligence tools that draw on troves of content from across the internet has begun to test the bounds of copyright law.
Authors and a leading photo agency have brought suit over the past year, contending that their intellectual property was illegally used to train A.I. systems, which can produce humanlike prose and power applications like chatbots.
Now they have been joined in the spotlight by the news industry. The New York Times filed a lawsuit on Wednesday accusing OpenAI and Microsoft of copyright infringement, the first such challenge by a major American news organization over the use of artificial intelligence.
The lawsuit contends that OpenAI’s ChatGPT and Microsoft’s Bing Chat can produce content nearly identical to Times articles, allowing the companies to “free-ride on The Times’s massive investment in its journalism by using it to build substitutive products without permission or payment.”
OpenAI and Microsoft have not had an opportunity to respond in court. But after the lawsuit was filed, those companies noted that they were in discussions with a number of news organizations on using their content — and, in the case of OpenAI, had begun to sign deals.
Without such agreements, the limits may be worked out in the courts, with significant repercussions. Data is crucial to developing generative A.I. technologies — which can generate text, images and other media on their own — and to the business models of companies doing that work.
“Copyright will be one of the key points that shapes the generative A.I. industry,” said Fred Havemeyer, an analyst at the financial research firm Macquarie.
A central consideration is the “fair use” doctrine in intellectual property law, which allows creators to build upon copyrighted work. Among other factors, defendants in copyright cases need to prove that they transformed the content substantially and are not competing in the same market as a substitute for the work of the original creator.
A review quoting passages from a book, for example, could be considered fair use because it builds on that content to create new, unique work. Selling extended excerpts from the book, on the other hand, may violate the doctrine.
Courts have not weighed in on how those standards apply to A.I. tools.
“There isn’t a clear answer to whether or not in the United States that is copyright infringement or whether it’s fair use,” said Ryan Abbott, a lawyer at Brown Neri Smith & Khan who handles intellectual property cases. “In the meantime, we have lots of lawsuits moving forward with potentially billions of dollars at stake.”
It could be a while before the industry gets definitive answers.
The lawsuits posing these questions are in early stages of litigation. If they don’t produce settlements (as most litigation does), it could be years until a Federal District Court rules on the matter. Those rulings would probably be appealed, and appellate decisions could vary by circuit, which could elevate the question to the U.S. Supreme Court.
Getting there could take about a decade, Mr. Abbott said. “A decade is an eternity in the market that we’re currently living through,” he said.
The Times said in its suit that it had been in talks with Microsoft and OpenAI about terms for resolving the dispute, possibly including a license. The Associated Press and Axel Springer, the German owner of outlets like Politico and Business Insider, have recently reached data licensing agreements with OpenAI.
Taking cases to trial could answer vital questions about what copyrighted data A.I. developers are able to use and how. But it could also simply serve as leverage for a plaintiff to secure a more favorable licensing deal through a settlement.
“Ultimately, whether or not this lawsuit ends up shaping copyright law will be determined by whether the suit is really about the future of fair use and copyright, or whether it’s a salvo in a negotiation,” Jane Ginsburg, a professor at Columbia Law School, said of the lawsuit by The Times.
How the legal landscape unfolds could shape the nascent yet heavily capitalized A.I. industry.
Some A.I. companies have been flooded with venture capital in the past year after the public rollout of ChatGPT went viral. A stock plan under consideration could value OpenAI at over $80 billion; Microsoft has invested $13 billion in the company and has incorporated its technology into its own products. But questions about the use of intellectual property to train models have been top of mind for investors, Mr. Havemeyer said.
Competition in the A.I. field may boil down to data haves and have-nots.
Companies that hold the rights to large quantities of data, such as Adobe and Bloomberg, or that have amassed their own data, such as Meta and Google, have started developing their own A.I. tools. Mr. Havemeyer noted that an established company like Microsoft was well equipped to secure data licensing agreements and tackle legal challenges. But start-ups with less capital may have a harder time obtaining the data they need to compete.
“Generative A.I. begins and ends with data,” Mr. Havemeyer said.
Benjamin Mullin contributed reporting.