News Publishers like The Chicago Tribune, The Denver Post Announce OpenAI, Microsoft Lawsuit Over GPT Training Data

AI has left the media industry scrambling. New AI tools are changing the way users read and search online, and that has major ramifications for anyone who publishes content digitally. These GenAI technologies have been built and trained in large part on the work of major publishers, yet those publishers now find their profits being captured by the very technologies they’ve unwillingly fed.

So how can news publishers cope with, or even reign supreme in, the AI Internet? The answer is in data: the same valuable proprietary data that powers next-gen AI tools. News publishers need a new strategy for licensing their data to AI developers and monetizing it.

AI is only as good as the data it’s trained on, and the ability to access real-time updates from leading news providers can reduce hallucinations and enhance the capabilities of any AI agent. To that end, AI leaders like Google and OpenAI have funded major licensing deals, paying out millions to publishers like Axel Springer (home to brands like Politico and Business Insider) for unfettered use of their data.

Some news publishers are choosing to resist. The New York Times may have ‘swung first’, suing OpenAI over the alleged use of its articles to train OpenAI’s flagship chatbot, ChatGPT, but other publishers have now joined the fray. Today, brands including The Chicago Tribune and The Denver Post announced a new lawsuit against OpenAI and Microsoft, alleging misuse of their proprietary content.

What else can publishers do? Publishers need a path to monetize their data and ensure that their content appears for users relying on new AI apps, agents and experiences. Dappier makes it easy: with 1-click LLM training, Dappier ingests your brand's proprietary data and creates an LLM-ready data model that can be integrated into any platform or AI agent. Once you've created your data model, monetize it by licensing it to any AI developer in the Dappier RAG Marketplace, at a price point you set.

News publishers have traditionally relied on SEO to drive their profits, needing clicks from Google search results, social media sites and news aggregators to generate ad revenue. But what happens if users stop getting their information from Google, disrupting an SEO pipeline that has become the norm across the publishing industry? If every search bar we interact with becomes capable of answering just about any question we have, that just might be the case.

The idea that new AI tools might give users a better experience than Google, and seriously threaten the giant’s hold over web search, isn’t some new pipe dream. According to recently leaked internal emails, a software engineer at Google warned higher-ups as far back as 2018 that “within the near future, a deep ML system will clearly outperform Google’s 20-year accumulation of relevance algorithms for web search.”

Beyond the recently announced and ongoing lawsuits, more than 600 news publishers have opted out of crawlers from OpenAI, Google, or Common Crawl. The aim is to reduce the amount of data these organizations ingest while courts determine whether using publicly available articles to train models constitutes copyright infringement.
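
For publishers wondering what such an opt-out looks like in practice, here is a minimal sketch. It assumes the opt-out is expressed in a site's robots.txt file, which is the usual mechanism but is not spelled out above. The user-agent tokens GPTBot (OpenAI), Google-Extended (Google) and CCBot (Common Crawl) are the ones those organizations publish; the sample robots.txt, the example.com URL and the crawler_access helper are hypothetical and only there for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve to opt out of AI crawlers
# while leaving ordinary search crawlers untouched.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def crawler_access(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Return, for each user agent, whether robots_txt lets it fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Illustrative article URL; any path on the site gives the same answer here.
access = crawler_access(
    ROBOTS_TXT,
    "https://example.com/news/story.html",
    ["GPTBot", "Google-Extended", "CCBot", "Googlebot"],
)

for agent, allowed in access.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Keep in mind that robots.txt is a request, not an enforcement mechanism: it only works when the crawler chooses to honor it.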

But if the next generation of AI tools really does threaten Google's dominance in online search, these attempts at self-preservation might be moot. Brands need to ensure that their content is present wherever users are reading. Licensing and monetizing data for AI is going to be increasingly critical to brands' online strategy.

