
New York Times Reporter Sues Google, OpenAI, xAI, and Meta Over Use of Journalism in AI Models

John Carreyrou, an investigative reporter for the New York Times, has filed a lawsuit against Google, xAI, OpenAI and Meta, alleging that their chatbots were trained on his reporting without authorization or payment. The suit, lodged on December 22, 2025, in a U.S. federal court, accuses the companies of copying and using his copyrighted work to build and refine large language models. By centering the claims on a single journalist’s body of work, the case marks a new escalation in copyright disputes that have so far focused largely on institutional plaintiffs.

Background on the Plaintiff

John Carreyrou is best known as an investigative reporter whose work at the New York Times and earlier at other outlets has focused on corporate misconduct and public accountability. According to reporting on the new lawsuit, he previously gained wide acclaim for exposing issues at Theranos, the blood-testing startup whose collapse became a landmark case in Silicon Valley scrutiny. That track record, built on years of document review, confidential sources and complex financial analysis, is central to his argument that AI companies are appropriating not just text, but the value of painstaking investigative work.

Carreyrou’s career includes Pulitzer Prize-winning journalism, and the complaint asserts that these award-winning articles form part of the copyrighted material that was scraped into massive datasets used to train generative AI systems. By highlighting that his most prominent investigations were allegedly ingested without consent, he is positioning the case as a test of whether high-value, high-impact reporting can be repurposed as raw material for commercial AI products without compensation. The stakes extend beyond his own portfolio, because a ruling in his favor could strengthen the ability of individual reporters to assert personal intellectual property rights alongside, or even independently from, their employers.

Details of the Lawsuit Filing

The complaint, filed in a U.S. District Court on December 22, 2025, seeks damages for what Carreyrou describes as unauthorized reproduction and distribution of his articles in the training corpora used for major chatbots. As summarized in a detailed account of the filing, the lawsuit argues that Google, xAI, OpenAI and Meta copied his work at scale to build and refine their models, then allowed those systems to generate outputs that closely track his original reporting. The complaint frames this as a continuous chain of infringement, from initial scraping to the ongoing operation of the AI tools.

Key allegations focus on how specific AI models, including OpenAI’s ChatGPT and xAI’s Grok, were trained on his investigative pieces without any license or direct negotiation. Coverage of the case notes that the suit describes large-scale web harvesting of New York Times content, including Carreyrou’s stories, as part of the datasets that underpin these systems, and it contends that the companies knew or should have known that the material was protected. By formally naming xAI in a reporter-led case for the first time, the filing signals that newer entrants to the AI market face the same legal exposure as more established players, and it underscores that the rapid rollout of chatbots is now colliding with long-standing copyright norms.

Defendants and Their Roles

OpenAI is accused of using Carreyrou’s work to enhance models in the GPT series through large-scale web scraping that, in his view, violates copyright law. A summary of the complaint explains that Carreyrou’s articles were allegedly incorporated into datasets used to train ChatGPT and related tools, which can then reproduce passages or detailed summaries of his investigations. For OpenAI, the case raises the question of whether its reliance on broad, internet-scale data collection can be squared with the rights of individual authors whose work is central to the perceived quality of the models.

Google faces parallel claims tied to its Bard and other AI products, which the lawsuit says were trained on journalistic content that includes Carreyrou’s reporting. According to an account of the filing, the complaint argues that Google’s data harvesting practices swept up New York Times articles and then used them to power conversational tools that compete directly with news outlets for audience attention. The inclusion of Google underscores broader concerns that search and advertising giants are leveraging their reach to build AI systems on top of content they did not pay to license, potentially shifting value away from the publishers and reporters who created it.

The suit also targets xAI, founded by Elon Musk, for allegedly incorporating unauthorized materials into its Grok chatbot. Reporting on the case notes that Carreyrou’s complaint singles out xAI’s use of scraped news content as an example of how newer AI ventures are repeating the same contested practices as their larger rivals. Meta is similarly implicated for training its Llama models on datasets that, according to the lawsuit, drew from Carreyrou’s reporting and other New York Times material, a claim detailed in coverage that describes how the case expands to include social media-linked AI systems. By grouping OpenAI, Google, xAI, and Meta in a single action, Carreyrou is effectively arguing that the dominant AI ecosystem is built on a common pattern of unlicensed use of journalism.

Allegations of Copyright Infringement

The core of the lawsuit is the assertion that the defendants engaged in systematic scraping of New York Times articles, including Carreyrou’s investigative work, to assemble the massive datasets required for training their chatbots. One detailed account explains that the complaint describes automated tools that copied entire articles and archives, which were then fed into machine learning pipelines without any licensing agreement. From Carreyrou’s perspective, this process amounts to unauthorized reproduction and distribution of his copyrighted works at a scale that would be impossible in a traditional media context.

Beyond the initial copying, the suit alleges that the AI systems can generate outputs that directly reproduce or closely paraphrase his investigative content, sometimes without clear attribution, which he argues dilutes the value of the original work. Coverage of the filing notes that the complaint attempts to quantify harm through lost licensing opportunities, estimating that substantial revenue was foregone by creators whose material could have been licensed for training but instead was taken without payment. By putting a monetary frame around the alleged damage, the case moves beyond abstract debates about fair use and into a concrete argument that generative AI has diverted income that would otherwise flow to journalists and publishers.

Broader Implications for the AI Industry

Carreyrou’s lawsuit could set important precedents for how individual creators seek remedies when their work is used in AI training, particularly if courts accept the idea that large-scale scraping of news archives requires explicit permission. A detailed overview of the case notes that the filing is part of a growing wave of challenges to AI training practices, but it stands out by centering a single reporter’s portfolio rather than a corporate plaintiff. If successful, the suit may encourage more journalists, authors, and photographers to bring their own claims, which could in turn push companies like OpenAI and Google to negotiate broader licensing deals or redesign their data collection strategies.

The case also underscores rising tensions between journalism and AI at a moment when news organizations are already grappling with shifting business models and audience habits. By pressing for disclosure of training data sources and detailed accounting of how his work was used, Carreyrou is adding to the pressure on AI developers to move away from opaque practices and toward more transparent documentation of their datasets. For the industry, the risk is that a series of similar lawsuits could fragment the legal landscape and slow deployment of new models, while for reporters and publishers, the potential upside is a clearer path to compensation and control over how their reporting is repurposed in the age of generative AI.
