Dictionary Publishers Sue OpenAI Over Copyright Infringement

by Delimiter Team
March 17, 2026

Two of the world’s most prominent reference publishers have filed lawsuits against OpenAI, alleging that the artificial intelligence company unlawfully used their copyrighted content to train its large language models. Encyclopaedia Britannica and Merriam-Webster claim OpenAI copied nearly 100,000 articles without permission or compensation.

Allegations of Systematic Copyright Violation

The legal complaints, filed in a United States district court, state that OpenAI’s data scraping practices for training models like GPT-4 included the wholesale reproduction of copyrighted entries from the publishers’ databases. The lawsuits argue this constitutes direct copyright infringement, as the content was used without licenses, attribution, or payment.

According to the filings, the ingested material includes detailed definitions, etymologies, and encyclopedic entries that represent decades of scholarly work and editorial investment. The publishers assert that their content forms a foundational part of the knowledge base for OpenAI’s models, enabling the models to answer factual questions accurately.

Publisher Stance on AI Training Data

In official statements, representatives for both Encyclopaedia Britannica and Merriam-Webster emphasized their role as trusted sources of verified information. They expressed concern that the unauthorized use of their content undermines the economic model that supports high-quality reference publishing.

“Our publications are the result of extensive research and editorial rigor,” a spokesperson stated. “Using this material to build commercial AI products without authorization is not fair use; it is a clear violation of our intellectual property rights.” The publishers are seeking statutory damages and a permanent injunction to prevent further use of their content.

Broader Legal Context for AI Development

This case joins a growing list of high-profile lawsuits against AI companies from content creators, including news organizations, authors, and visual artists. The central legal question often revolves around the application of the “fair use” doctrine to the training of generative AI systems on copyrighted works.

OpenAI and other AI firms typically argue that using publicly available data for training falls under fair use, a position increasingly challenged by rights holders. The outcome of this and similar cases could set significant precedents for how AI models are developed and what data can be used in the process.

Potential Industry-Wide Implications

The lawsuit highlights a critical tension in the AI industry between rapid technological advancement and the protection of intellectual property. If the courts rule in favor of the publishers, AI companies may be forced to negotiate licensing agreements for vast swathes of training data or significantly alter their data collection methods.

Such a ruling could increase operational costs for AI developers and potentially slow the pace of model training. Conversely, a ruling for OpenAI would reinforce the current practice of scraping web-scale data, leaving content creators with fewer avenues to monetize their work in the AI era.

The court has not yet set a trial date. Legal experts anticipate a lengthy litigation process, with both sides preparing for a battle that may eventually reach appellate courts. The resolution of this case is expected to provide much-needed clarity on copyright law’s application to generative artificial intelligence.

Source: GeekWire