Journalism at Risk in the AI Era | Harvard Independent

As generative artificial intelligence becomes increasingly central to how people seek information, the journalism industry faces a dilemma. Large language models may not yet be able to break news, but they can synthesize it immediately. That raises two questions: what incentive do readers have to visit multiple online publications and sustain civic participation? And if journalism becomes an obsolete medium, where will generative AI get its content?

“[AI] platforms may be awakening to the reality that the free ride is over, but until there is more concerted collective action from the news industry, the leverage is still very much in favor of platforms,” said Ashirwaad Badami, lead of the Media Management and Leadership specialization at Northwestern University’s Medill School of Journalism, in an interview with the Independent.

Reporting the news means paying a full staff of reporters and editors, funding weeks or months of original research and verification, and covering legal review and media liability insurance before a single story is published. Major investigative pieces often require months of work and can cost news outlets hundreds of thousands of dollars in staff time.

By contrast, once an LLM is trained, generating an article carries an extremely low marginal cost relative to producing a human-reported piece. McKinsey describes generative systems as enabling low-cost, large-scale content generation, even as the models themselves remain infrastructure-intensive to build and run. This cost asymmetry sits at the heart of today’s debate: publishers face declining ad and subscription revenue in the digital attention markets that fund reporting, while AI services can satisfy user demand with summaries that don’t necessarily send readers or revenue back to the source.

Publications argue that AI systems both ingest their work to train models and reproduce that information in response to user queries, reducing the incentive to visit journalism sites. Several empirical snapshots suggest online reader traffic has declined since AI summaries became prominent in search results: Google referrals fell by anywhere from 1% to 25% across eight weeks in May and June 2025, with losses outnumbering gains two-to-one; reporting ties the drop to Google’s “AI Overviews.”

“We’re witnessing an important shift in consumer behavior that is controlling,” Badami explained.

AI firms counter that partnerships with journalism companies and link attribution on their sites can send audiences back. OpenAI established licensing deals with The Associated Press and Axel Springer in 2023, the Financial Times and News Corp in 2024, and The Washington Post in 2025, under which ChatGPT includes summaries with links to the original reporting. Notably, the New York Times established its first generative-AI licensing deal in May 2025, with Amazon, covering Alexa and model training. The Wall Street Journal reported that Amazon pays $20 million to $25 million annually.

The aim is for these links to materially offset “zero-click” behavior, in which a user receives an answer to a question from a search feature like Google’s AI Overviews without ever clicking a website link. The broader dynamic helps explain why publishers say “credit” isn’t an economic substitute for visits, ads, and subscriptions.

Professor Kelly Cutler of Digital Marketing and Visual Communication at Northwestern University’s Medill School of Journalism discussed “zero-click” marketing in an interview with the Independent.

“Many of my clients and the companies that I work with, and my students’ companies, are seeing a drop in Google search traffic. But where is that going? Are we now seeing an uptick in referral traffic from ChatGPT, Perplexity, Claude, or Copilot? And if that is happening, are we also seeing impressions coming from different sources?” she said. “I think metrics are evolving, and I think it’s super important for digital marketers to stay in front of that.”

Experts debate the effectiveness of these licensing deals. “I’m not convinced that wider adoption of AI-generated results will drive more traffic to publishers,” Badami said. “The 2025 Pew Research report indicates that only 1% of users are actually clicking on source links in Google AI summaries.”

By contrast, Cutler was optimistic about these new collaborations. “I think that publishers actually might be able to gain quite a bit over time if they approach deals like this in a way that is mutually beneficial,” she said.

Cutler elaborated on the deal between the New York Times and Amazon. “I think this will set an interesting new approach, potentially a new precedent to how media companies can work with technology companies in new and different ways.”

Beyond lost traffic, publishers also point to reputational harm when AI systems misstate facts while citing them. Encyclopedia Britannica and Merriam-Webster are both suing Perplexity in the Southern District of New York over alleged copying and false branding; separately, Anthropic proposed a $1.5 billion settlement with authors over training on pirated books, which is now under judicial scrutiny. Together, these moves aim to push AI access from de facto free to negotiated, and they reflect the concern that misattribution and hallucinations can boomerang onto the cited outlet’s credibility.

Even before AI, local news was in crisis. Medill has tracked the loss of thousands of U.S. newspapers since 2005 and the growth of “news deserts.” If AI interfaces divert additional attention from smaller outlets (those with less leverage to strike licensing deals), the risk is a familiar one: more closures, less coverage, and fewer training inputs later on. These losses have societal implications; studies show an association between the decline of local news and reduced citizen engagement in politics.

Both Badami and Cutler agree that smaller institutions are in danger. “For large institutions with big legal teams and a lot of experience handling these issues, they’ll have a better time than the smaller ones,” Cutler said.

“I think small publishers should be compensated for their content if it is used, but my concern is they have little representation and limited resources individually to pursue the kind of litigation and corrective action needed to secure the deals,” Badami said.

Still, both expressed optimism for these news centers. “I do see the smaller organizations, news outlets, and even journalists, looking at new ways to combat these tricky issues. And so my hope is that they will survive, and that they’ll actually thrive, because, again, maybe they can find new audiences in different ways,” Cutler said.

“Cloudflare’s efforts to block AI platforms from scraping sites and using original content is emblematic of the kind of technological approach we need to protect smaller publishers,” Badami said. “But this is not a standardized feature yet. In the absence of protective measures like Cloudflare’s, we need more representation for smaller publishers to ensure that they are not taken advantage of.”

Danielle Coffey, president and CEO of the News/Media Alliance, anticipates that all publishers will benefit from AI licensing deals, regardless of size. “We believe that voluntary collective licensing agreements and frameworks in which publishers are paid per use of their content within AI models is possible and has taken place in countless circumstances across many content industries,” she said. “Innovation has always risen alongside a healthy relationship with the content it relies on to serve Americans what they demand and deserve.”

U.S. copyright law complicates how publishers can proceed: facts are not protected; original expression is. That principle, set out by the Supreme Court in the 1991 case Feist Publications v. Rural Telephone Service, underpins much of today’s debate. Courts have also narrowed the old “hot-news” misappropriation doctrine: decisions like NBA v. Motorola in 1997 and Barclays v. TheFlyOnTheWall in 2011 limit state-law “hot-news” claims where the Copyright Act governs. In plain terms, “we’re only taking the facts” isn’t a defense to copying protected expression, but it does capture why publishers are focusing on AI’s verbatim regurgitation and source-stripping.

Harvard Law School Professor Rebecca Tushnet ’95 offered further perspective on the legal limits of intellectual property. “No one has ever been able to draw [the] line [between using facts and appropriating expression] in the abstract, and no one ever can,” she said to the Independent. “It’s possible to say that facts are unprotected, and it’s only the expression in reporting the facts that is protected, but you’re always going to have to do a case-by-case inquiry to figure out what actually happened.”

“Historically, we haven’t had compulsory licensing for facts because, for news to extract the facts, we haven’t needed it,” Tushnet added. “The principle that facts are unprotected has served us well without any compulsory license.”

Specialists do not foresee any major government responses to scraping in the near future. “I would be surprised if we were to see something big and national and sweeping in terms of federal legislation,” Cutler said. “But I do think that we see these precedents happening, and it might be in other countries. It might be in certain states.” Badami agreed that the New York Times deal could serve as a precedent.

So far, three U.S. courts have issued the first substantive rulings on whether using copyrighted works to train generative AI is fair use, and they split. On June 23, 2025, the court in Bartz v. Anthropic held that training on books was “exceedingly transformative” and therefore fair use, while allowing separate claims about acquiring pirated copies to proceed to trial. Two days later, a district judge likewise ruled that training Meta’s Llama models on books, including copies obtained from shadow libraries, qualified as fair use, while leaving distribution and market harm for further proceedings.

By contrast, in February 2025, the court in Thomson Reuters v. ROSS Intelligence, a case involving the copying of Westlaw headnotes to build a competing legal-research tool, found that the copying was not fair use; the ruling is on appeal before the Third Circuit.

“Even if courts ultimately find that some training is fair use, and we don’t think they will, that’s very distinct from real-time answer generation, or retrieval-augmented generation,” said Coffey. “RAG answers pull directly from publishers’ sources and often reproduce it for users nearly verbatim, in a way that directly competes with the original content and deprives publishers of traffic and revenue. There’s no plausible defense for RAG being fair use, and we expect courts to make it clear that real-time use of publisher content requires permission and compensation.”

The EU’s AI Act introduces transparency obligations, and EU copyright law lets rightsholders opt out of text-and-data-mining for training. Coffey pointed to a related development abroad that may inform national policy. “We’re interested in reports that the Netherlands has supported a fully permission-based AI model, based on licensing with Dutch news publishers and public domain information. It shows that responsible development is possible.”

In the U.S., the Copyright Office has kept authorship and training under review; early 2025 rulings show judges are willing to let some newsroom claims proceed while trimming others. Absent federal legislation, publications must seek alternative solutions in the private sector.

“It is going to be important for companies to understand how their content is being utilized and if people are stealing it or using it without their permission. They need to know that,” Cutler said. “But also, I think it’s interesting to consider different deal opportunities and see how licensing could potentially help to get their content out there.”

“Product differentiation is always a good idea, and it is urgently needed in the news media industry. Deal-making and tech-driven vigilance are likely going to be necessary to enforce fair use; the question is, can everyone afford to do it? We need solutions that span small, medium, and large news publishers, not just the large ones.”

Newsrooms are also experimenting with technical “breadcrumbs” to detect scraping: concealed strings designed to surface in model outputs if training occurred. This approach mirrors classic “copyright traps.” These are among the first countermeasures that newsrooms have begun to employ.
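
As a rough illustration of how such a breadcrumb could work, the Python sketch below derives a unique marker for each article and then checks whether any marker resurfaces in text produced by a model. The function names, the `zq` prefix, and the use of an HMAC are assumptions made for this example, not a description of any newsroom’s actual system.

```python
import hashlib
import hmac


def make_canary(secret: bytes, article_id: str, length: int = 16) -> str:
    """Derive a reproducible marker for one article (hypothetical scheme).

    The marker is a keyed hash, so only someone holding the secret can
    regenerate it, and it is unlikely to occur in ordinary text.
    """
    digest = hmac.new(secret, article_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # The "zq" prefix is an arbitrary assumption to make markers easy to spot.
    return "zq" + digest[:length]


def find_canaries(model_output: str, canaries: dict[str, str]) -> list[str]:
    """Return the IDs of articles whose marker appears in the model output."""
    text = model_output.lower()  # markers are lowercase hex, so compare in lowercase
    return [article_id for article_id, marker in canaries.items() if marker in text]


if __name__ == "__main__":
    secret = b"newsroom-secret"  # placeholder value for illustration only
    canaries = {aid: make_canary(secret, aid) for aid in ["story-001", "story-002"]}
    sample_output = "Summary of the story " + canaries["story-001"] + " and more text."
    print(find_canaries(sample_output, canaries))
```

A match would only suggest that the marked page was ingested somewhere along the way; it would not by itself show how the text was obtained or used.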

“At the very least, [outlets need] a structured, enforceable corrections protocol that mirrors its journalistic standards. The protocol could start with real-time reporting channels—an API or dashboard where publishers can flag inaccuracies quickly,” Badami said. “These reports would trigger priority reprocessing of the model output and, where possible, the removal or correction of cached responses. I’d also add a mechanism for visible correction notices within the AI interface, similar to how newsrooms issue corrections on articles.”

As for what lies ahead, Tushnet suggested a simple truth: “What I’ve learned in the past 25 years is that predictions are going to be wrong in ways that surprise you. So I neither expect things to get better nor to get worse,” she said. “They will get different.”

Courtney Hines ’28 (courtneyhines@college.harvard.edu) writes News for the Harvard Independent.