Reddit gives Google access to content for training AI models

Reddit has agreed to provide Google with access to its content to train the tech giant’s artificial intelligence (AI) models, the social message board site announced on Thursday.

The deal, which is reportedly worth $60 million, allows Google to use Reddit’s data application programming interface (API).

“With the Reddit Data API, Google will now have efficient and structured access to fresher information, as well as enhanced signals that will help us better understand Reddit content and display, train on, and otherwise use it in the most accurate and relevant ways,” Rajan Patel, vice president of search experience at Google, said in a news release.

The arrangement also aims to “make it easier to discover and access the communities and conversations people are looking for on Reddit” by developing new ways for Reddit content to be displayed across Google products, Reddit said in a blog post.

“This enhanced collaboration provides Google with an efficient and structured way to access the vast corpus of existing content on Reddit and enables Google to use the Reddit Data API to improve its products and services — including supporting new ways to display Reddit content and providing more efficient ways to train models,” the platform added.

However, Reddit emphasized that the deal does not change the terms of its API policy, which it updated last year to block companies from using its API for commercial purposes without prior approval.

Last fall, Reddit was considering blocking Google’s and Bing’s search crawlers if the companies failed to reach a deal about paying for data access, The Washington Post reported.

The platform, which also filed to go public on Thursday, said in its IPO prospectus that in January it had entered into data licensing agreements with a total value of $203 million.

“We are also in the early stages of monetizing our emerging opportunity in data licensing by allowing third parties to access, search, and analyze data on our platform,” it said.

“Reddit data constantly grows and regenerates as users come and interact with their communities and each other,” it added. “We believe our growing platform data will be a key element in the training of leading large language models (‘LLMs’) and serve as an additional monetization channel for Reddit.”