Thoughtful, detailed coverage of everything Apple for 34 years
and the TidBITS Content Network for Apple professionals

Reddit Blocks Indexing by Search Engines Other Than Google

At the increasingly impressive 404 Media, Emanuel Maiberg writes:

Google is now the only search engine that can surface results from Reddit, making one of the web’s most valuable repositories of user generated content exclusive to the internet’s already dominant search engine.

The news shows how Google’s near monopoly on search is now actively hindering other companies’ ability to compete at a time when Google is facing increasing criticism over the quality of its search results. And while neither Reddit or Google responded to a request for comment, it appears that the exclusion of other search engines is the result of a multi-million dollar deal that gives Google the right to scrape Reddit for data to train its AI products.

This deal will presumably keep Reddit content out of OpenAI’s forthcoming SearchGPT, which may be attractive to Google but feels like a slippery slope for the open Web.


Comments About Reddit Blocks Indexing by Search Engines Other Than Google

Notable Replies

  1. It’s a dangerous game they’re playing. They think that this will force users interested in privacy to switch to Google. It’s more likely that they’ll just stop reading Reddit content, because it won’t appear in search results.

    The exact opposite of what SEO is intended to accomplish.

    Personally, I couldn’t care less. Reddit seems to be a source of massive amounts of noise for very little signal. Whenever I click through to links that I find via searches, I never find anything useful there.

  2. This is just one of a slew of counterproductive decisions the CEO, Steve Huffman (usually referred to by Redditors as “Spez” after his username, “u/spez”), has made in recent years that are making Reddit more cumbersome and less enjoyable to use, mostly in the name of boosting profits. A large number of the most active Redditors think Reddit shouldn’t have a profit motive at all and that the IPO was Spez’s biggest mistake.

    It heavily depends on what topics you’re looking for. I’m fairly active in a number of support-group-type subreddits, and the signal-to-noise ratio in those is pretty high thanks to good moderation. It’s a lot like the old Usenet in that regard: some groups are great, some are garbage, and most are somewhere in between.

  3. With the aid of Redirect Web for Safari, a neat little cross-Apple-platform extension for implementing any redirection in Safari, I now search Google with the AI features turned off (&udm=14), and any search result that points to Reddit gets redirected to old Reddit. Moreover, neither Google nor Reddit can run scripts or set cookies, thanks to Roadblock.

    Is that what they had in mind? 😄

  4. Unfortunately Reddit have absolutely nothing to lose here. Google’s dominant, there’s an upswell of concern about AI scraping, and let’s be honest, most Redditors are happy to come back for more whatever Reddit does. Regulation in the current climate would be almost certain to create preferential arrangements that would only splinter the 'net further, without addressing the underlying issues posed by AI. These gatekeepers are going to fight for scraps while things get worse for the normies.

    So what’s the answer? Revolution, of course! (Revolution is the answer to everything.)

  5. It’s hard to know what’s really going on with all this fighting on Twitter, in blogs, and in Twitter links to blogs.

    My guess is…

    Reddit CEO spez is trying to monetize Reddit’s vast user-contributed post history. He thinks it must have value, and therefore anyone who wants to use it should fork over the $. Hence the cutting off of Reddit API access.

    Now he’s seeing there’s a lot of web crawlers out there. Who knows what they’re using the data for? They should pay for the privilege!

    The latest irritation is web crawling for the purpose of training AI large-language models. Microsoft is crawling for at least two purposes:

    • Search results in Bing
    • Training of Copilot’s LLM

    Spez doesn’t want Bing to benefit from Reddit for LLM without paying $, so Reddit is cutting them off.

    Microsoft said, “you can configure your robots.txt to distinguish between search results and LLM training”. But Reddit notes that, besides LLM training, Microsoft is also using AI to summarize the results of a search query.

    So that’s AI that is used by Bing. Reddit doesn’t want to allow that either, without (you guessed it) paying Reddit for the privilege.

    Microsoft says, “you can configure your robots.txt to block that too!”. So why is Reddit still just blocking everything?

    Two explanations I can think of:

    • Reddit’s statements imply that, because Microsoft has changed what it uses the data for before, it could change it again. Reddit therefore demands a contract with Microsoft (or any other search engine) that says exactly what the permitted usages are.
    • But the cynical explanation is that they just want the search engines to pay for searching Reddit. No contract, no search results for you!

    …which is self-defeating, because no search results means no advertising views. Well, maybe for some people? Are there people who go to Reddit “organically” and then view posts? The only time I’m ever at Reddit is if it was the result of a search for something.
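
For readers wondering what per-crawler rules in robots.txt look like in practice, here is a minimal sketch in Python using the standard library's `urllib.robotparser`. The robots.txt contents, the crawler names (`ExampleSearchBot`, `ExampleTrainingBot`), and the URL are all hypothetical and chosen only for illustration. This is not Reddit's actual file, and it does not represent how Microsoft or Google identify their crawlers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is allowed, a second named
# crawler is blocked, and every other crawler is blocked by the "*" rule.
ROBOTS_TXT = """\
User-agent: ExampleSearchBot
Allow: /

User-agent: ExampleTrainingBot
Disallow: /

User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://forum.example/r/apple/"  # placeholder URL
    for agent in ("ExampleSearchBot", "ExampleTrainingBot", "SomeOtherBot"):
        verdict = "allowed" if can_crawl(ROBOTS_TXT, agent, url) else "blocked"
        print(f"{agent}: {verdict}")
```

Keep in mind that robots.txt is advisory: it only has an effect on crawlers that choose to honor it, which may be part of why a site would want a contract rather than relying on the file alone.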
