LLMs ‘Would Not Exist’ Without Reddit Data


Reddit CEO Steve Huffman said large language models “would not exist as we know them” without Reddit’s content. He called the platform’s user-generated data “modern oil” for AI.

Huffman made the comments during an interview at Fast Company’s Most Innovative Companies Summit.

What Huffman Said About Reddit’s Value To AI

Huffman described the place Reddit’s data holds in the AI ecosystem.

Huffman said:

“LLMs would not exist as we know them without Reddit. Reddit is one of the single largest sources of training data for the LLMs and Reddit continues to be one of the major sources of both training data and we’re also the most cited, the most cited platform across all models.”

He attributed the citation claim to Profound, a firm that tracks AI citation data.

Huffman explained why AI companies rely on the content.

“There’s no artificial intelligence without actual intelligence. At the end of the day, these models are pretty simple. They’re regurgitating on an absolutely massive scale what they’ve consumed elsewhere and a large portion of that consumption is actually just the human conversation on Reddit because it’s natural and it covers basically every topic imaginable.”

Deals For Some, Lawsuits For Others

Reddit announced data licensing agreements with Google and OpenAI in 2024. Huffman referenced these as Reddit’s original two AI data deals and didn’t announce any additional agreements.

“Since we did the original two deals with Google and OpenAI, that was over two years ago, so we’ve learned a lot. They’ve learned a lot. The whole world’s learned a lot. Especially how valuable Reddit’s data is and how useful it is. And so we’re being I think very deliberate and selective there. But yeah, we’re open and open for business.”

For companies that haven’t agreed to licensing terms, Reddit has taken legal action. The company sued Anthropic in California Superior Court, alleging unauthorized use of Reddit content and violations of Reddit’s terms. Reddit also filed a federal lawsuit in the Southern District of New York against Perplexity and three data-scraping companies, alleging DMCA anti-circumvention violations and related claims.

Huffman drew a line between the two groups.

“Companies like Google and OpenAI where we had good relationships, we can actually do a deal and put some guard rails on use and access to our data on behalf of our users but then collaborate on making products for the next generation of the internet.”

He added that “not every company is willing to be a collaborative partner and so unfortunately we have to go the other way which is lawsuits.”

Huffman told the audience Reddit’s position on commercial use is simple. “Commercial use of our data requires commercial terms,” he said. Reddit began charging for commercial API access in 2023, a move that preceded the current licensing deals.

Huffman said Reddit still provides free data access to researchers and universities and tries to remain flexible for non-commercial use.

What Changed Reddit’s Openness

According to Huffman, Reddit’s willingness to share data freely changed when the AI industry moved away from open research. As SEJ previously reported, Reddit restricted access for many search engine crawlers while Google remained an exception.

“Historically, Reddit has been like we’re born of the open web and Reddit has been open and really permissive for access to its data. And actually, I think we’d be in a different place today if the AI companies were still basically open and open source and doing open research.”

Huffman said the issue was that Reddit could no longer track how its data was being used. “People are using our data and we don’t know what it was being used for,” he told the audience.

Beyond commercial terms, Huffman said Reddit wants to prevent its data from being used to identify users, target them with ads, or replace or disintermediate the platform.

Reddit’s Own AI Efforts

Huffman acknowledged what he called a “paradox.” Reddit’s content powers external AI systems, but the company also uses AI across its platform.

The most visible product is Reddit Answers, an LLM-powered search feature. It reads posts and comments, then organizes them into responses built from verbatim user quotes. Huffman noted it’s designed for questions without definitive answers.

“What Reddit Answers does is a couple of things that are unique to Reddit. One, it basically only answers in verbatim quotes from actual people. And then the second thing it does is it tries to present multiple perspectives because the whole point if you’re on Reddit, you want the human perspective.”

Behind the scenes, Reddit uses AI for content moderation and classification. LLMs can evaluate whether a comment crosses into bullying, something Huffman described as previously difficult because of the subjectivity involved.

Huffman presented AI moderation as a way to reduce exposure to the worst content, not as a replacement for Reddit’s community moderation model.

“The worst job on the internet used to be the worst content on the internet and deciding whether it could be online or not,” Huffman said. “That job just goes away.”

The Gray Area Of AI-Written Posts

Huffman also addressed the issue of users writing content with AI tools and pasting it into Reddit. That’s different from automated bot activity, he stressed.

“The most annoying thing that I see not just on Reddit, but across the internet is somebody who wrote their post or comment with ChatGPT and then pasted it into Reddit. Like, is that a bot? Actually looks like a bot, but there’s a human behind the thought.”

Huffman cast the issue as one of intent. “It’s important to us that there’s a human behind the thought, behind the content, behind the prompt,” Huffman said. But he also noted that “the writing sucks” when users rely on AI to compose their posts.

Rather than creating a policy to address it, Huffman indicated Reddit will let its community handle the issue. Users are already downvoting AI-written content and calling it out in comments. Huffman said Reddit will “empower the users more and the subreddits more to just reject that kind of content altogether.”

He compared the broader question to calculators in math class. “Kids these days are just learning how to write with AI. What are we going to do about it?” he said. “We kind of have to learn, I think, along with everybody else.”

Why This Matters

Huffman’s comments reinforce Reddit’s pitch that its user discussions are a core input for AI systems.

The AI-written content problem Huffman described is one SEJ covered as part of a broader YouTube AI slop investigation. Reddit’s decision to let community voting handle AI-generated posts, rather than building detection tools, is a different path from that of platforms that have deployed automated labeling.

Looking Ahead

Huffman told Fast Company that Reddit is “in the market talking to folks all the time” about new data deals, though he didn’t hint at a third agreement.

Reddit’s lawsuits against Anthropic and Perplexity are both ongoing. The Anthropic case was the subject of a federal court remand hearing in March.




Disclaimer: This article is sourced from external platforms. OverBeta has not independently verified the information. Readers are advised to verify details before relying on them.
