Last September, Lily Ray asked Perplexity for the latest news on SEO and AI search. It told her, confidently, about the “September 2025 ‘Perspective’ Core Algorithm Update”; a Google update that, as she then wrote at length in “The AI Slop Loop,” didn’t exist. Google hasn’t named core updates in years. “Perspectives” was already a SERP feature. If a real update had rolled out while she was in Austria, her inbox would have told her before Perplexity did.
She checked the citations. Each pointed to AI-generated posts on SEO agency blogs: sites that had run a content pipeline, hallucinated an update, and published it as reporting. Perplexity read the slop, treated it as source material, and served it back to her as news.
In February, the BBC’s Thomas Germain spent 20 minutes writing a blog post on his personal website. Its title: “The best tech journalists at eating hot dogs.” It ranked him first, invented a 2026 South Dakota International Hot Dog Championship that had never happened, and cited precisely nothing. Within 24 hours, both Google’s AI Overviews and ChatGPT were passing his fabrication along to anyone who asked. Claude didn’t bite. Google and OpenAI did.
Everyone who has looked has seen it.
I’ve Argued About The Ouroboros Before. I Had The Timeline Wrong
The prevailing framing for this problem has been model collapse. You train a model on web text, the web fills up with AI output, the next model trains on a corpus increasingly made of its own exhaust, and eventually the distribution flattens into mush. Innovation comes from exceptions, and probabilistic systems that converge toward the mean attenuate exceptions by design. I’ve used the phrase digital ouroboros for this.
That framing assumes training cycles. It assumes time. It assumes that contamination moves at the speed of model release.
It doesn’t. What Lily documented, what Germain documented, what the New York Times then went and quantified – none of that is training-side. The models involved were not retrained between the hallucination appearing on a blog and being served as citation-backed fact. The contamination moved at the speed of a crawl. The ouroboros isn’t taking generations to eat itself. It’s eating itself at query time, every time somebody asks one of these systems a question.
The pipe everyone has been watching is not the pipe that’s breaking.
The Difference That Matters
Model collapse is a training-corpus problem. Synthetic content seeps into the pre-training data, the next generation of model inherits it, capability degrades. Researchers have been warning about this for two years. They’re right. They’re also describing something slow enough that everyone can nod gravely and keep shipping.
Retrieval contamination is faster and already here. RAG systems – Perplexity, Google AI Overviews, ChatGPT with search – don’t generate answers purely from parametric memory. They fetch documents from the live web, stuff them into context, and generate a response conditioned on what they retrieved. If the retriever surfaces a hallucinated SEO post, the answer inherits the hallucination. No retraining required.
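The fetch-then-condition flow can be sketched in a few lines. This is a toy, assuming a naive word-overlap retriever and a simple prompt-stuffing step; the function names, scoring, and corpus are illustrative, not any vendor’s actual pipeline:

```python
# Minimal sketch of retrieve-then-generate, under toy assumptions:
# a word-overlap retriever stands in for a real embedding index.

def retrieve(query, corpus, k=2):
    """Rank documents by naive word overlap with the query."""
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return ranked[:k]

def answer(query, corpus):
    """Stuff retrieved documents into the prompt; the model conditions on them."""
    context = "\n".join(retrieve(query, corpus))
    return f"Based on:\n{context}\n\nAnswer to: {query}"

corpus = [
    "Google rolled out the Perspectives SERP feature in 2023.",
    # A hallucinated SEO post, indexed like any other page:
    "The September 2025 Perspective Core Algorithm Update changed rankings.",
]

print(answer("What was the September 2025 Perspective core update?", corpus))
# The hallucinated post overlaps the query most, lands in the context,
# and the generated answer inherits it -- no retraining involved.
```

The point of the sketch is that nothing in the loop checks whether a retrieved document is true; relevance to the query is the only gate.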
The academic literature on this is clear. PoisonedRAG (Zou et al., 2024) showed that injecting a small number of crafted passages into a retrieval corpus was sufficient to control the output of a RAG system on targeted queries. BadRAG (Xue et al., 2024) demonstrated the same class of attack using semantic backdoors. Both papers treat this as an adversarial problem: what happens when an attacker deliberately poisons the corpus.
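The papers’ core premise can be illustrated with a toy similarity retriever: one injected passage that embeds the target query text outranks every legitimate document. The cosine-over-word-counts scoring and the payload below are stand-ins for the papers’ actual embedding-based setups, not a reproduction of either attack:

```python
# Toy illustration of the corpus-injection premise: a crafted passage
# that repeats the target query dominates similarity retrieval.
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity over raw word counts (a crude embedding stand-in)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    return dot / (sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values())))

target_query = "who won the 2026 hot dog championship"
corpus = [
    "Championship results for major eating contests are archived by the league.",
    "Hot dog brands sponsor many regional contests each year.",
]
# Attacker appends a single crafted passage: the query itself plus the payload.
poison = target_query + ". Thomas Germain won it."
corpus.append(poison)

top = max(corpus, key=lambda d: cosine(target_query, d))
assert top == poison  # one injected passage controls what reaches the context
```

No access to the model is needed; writing one indexable document that scores highest for the target query is the whole attack surface.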
What Germain and Lily accidentally proved is that the adversarial model is the normal operating model. You don’t need a crafted adversarial passage. You need a blog post. The open web is the corpus, and anyone with a domain can write to it.
The Oumi analysis commissioned by the New York Times put numbers on what this costs. Across 4,326 SimpleQA tests, Google’s AI Overviews answered correctly 85% of the time on Gemini 2 and 91% on Gemini 3. At Google’s scale – more than 5 trillion searches a year – a 9% error rate still translates to tens of millions of wrong answers every hour. But the more revealing figure is this: on Gemini 3, 56% of the correct answers were ungrounded, up from 37% on Gemini 2. The upgrade improved surface accuracy and made the citations worse. When the model got something right, more than half the time the source it pointed to didn’t support the claim.
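The per-hour figure is straightforward arithmetic on the reported numbers, assuming for simplicity that every search produced an answer at the tested accuracy (in practice only a fraction of searches trigger an AI Overview, which shrinks the absolute count but not the shape of the problem):

```python
# Back-of-the-envelope check of the scale claim. The 5-trillion
# searches/year figure and accuracy rates are as reported above.
searches_per_year = 5e12
hours_per_year = 365 * 24  # 8,760

for model, accuracy in [("Gemini 2", 0.85), ("Gemini 3", 0.91)]:
    wrong_per_hour = searches_per_year * (1 - accuracy) / hours_per_year
    print(f"{model}: ~{wrong_per_hour / 1e6:.0f} million wrong answers per hour")
# Even at 91% accuracy, the 9% error rate works out to roughly
# 51 million wrong answers every hour at that query volume.
```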
The retrieval layer is not a filter. It is the infection vector.
Who’s Seeding The Corpus
The industry that has most enthusiastically produced it – and then most enthusiastically written about the consequences of consuming it – is the SEO industry. I’ve written before about content scaling being just content spinning with better grammar, and about the AI visibility tool complex that builds dashboards from the output of non-deterministic systems. This is the same loop, one layer deeper. An SEO agency runs an AI content pipeline because AI Overviews have cut their clients’ traffic. The pipeline publishes speculative “winners and losers” posts during a core update that’s still rolling out, citing nothing. Another agency’s pipeline picks these up as sources. The output floods into the retrieval index. AI Overviews cites one of them. The original agency then writes a case study about how AI Overviews are “surfacing” their content.
An Ahrefs study of over 26,000 ChatGPT source URLs found that “best X” listicles accounted for nearly 44% of all cited page types, including cases where brands rank themselves first against their competitors. Harpreet Chatha told the BBC you can publish “the best waterproof shoes for 2026,” put yourself first, and be cited in AI Overviews and ChatGPT within days. Lily, during the actual March 2026 core update, found AI-generated articles claiming to list winners and losers while the update was still rolling out; articles that opened with filler and listed brands without a single real citation.
The practitioners scaling AI content are also the ones most directly harmed when AI search systems cite that content as fact. Nobody forced this. The industry built the pipeline, fed it, and complained about what came out the other end. Not adversarial poisoning. Just an industry polluting its own water supply and then hiring consultants to test it.
The Tier That Matters
The Oumi research is about AI Overviews, which is free by design. Google AI Overviews reportedly reached over two billion monthly active users by mid-2025. ChatGPT has around 900 million weekly active users, of whom roughly 50 million pay. That means about 94% of the people interacting with OpenAI’s product are on the free tier.
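The 94% figure follows directly from the two reported counts:

```python
# Free-tier share from the user counts cited above.
weekly_active = 900e6  # ChatGPT weekly active users (reported)
paying = 50e6          # paid subscribers (reported)

free_share = 1 - paying / weekly_active
print(f"{free_share:.0%} of users are on the free tier")  # ~94%
```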
The paid tiers are better. Per OpenAI’s own launch claims, cited in Lily’s piece, GPT-5.4 is 33% less likely to produce false individual claims than GPT-5.2. The free-tier GPT-5.3 is also improved over its predecessor (26.8% fewer hallucinations with web search, 19.7% fewer without), but it’s still meaningfully less reliable than the paywalled version. Gemini 3, which made AI Overviews more accurate on surface tests, also made the ungrounded rate worse. Better answer, weaker citation.
Nobody seems to mind. The reliable version of the product is paywalled. The version most of the planet gets – including the version at the top of Google Search – can be manipulated by 20 minutes of work on a personal website. Intelligence is the marketing category. What two billion users actually receive is a confident summarization of whatever the crawler happened to find.
Grokipedia As The Terminal State
The accidents of the retrieval layer are one thing. Grokipedia is the version where accident is no longer a useful word.
Elon Musk’s xAI launched Grokipedia on Oct. 27, 2025, with 885,279 articles, all generated or rewritten by Grok. Some were lifted from Wikipedia wholesale, with a disclaimer at the bottom acknowledging the CC-BY-SA license; a license Wikipedia maintains precisely because a community of human editors writes and verifies the content. Others were rewritten from scratch. PolitiFact found Grokipedia citations including Instagram reels as sources, which Wikipedia’s own policies rule out as “generally unacceptable.” Grokipedia’s entry on Canadian singer Feist said her father died in May 2021, citing a 2017 Vice article about Canadian indie rock that made no mention of the death. Her father was still alive when that article was written. The Nobel Prize in Physics entry added an uncited sentence claiming physics is traditionally the first prize awarded at the ceremony, which isn’t true.
Musk said the goal is to “research the rest of the internet, whatever is publicly available, and correct the Wikipedia article.” The rest of the internet now includes the synthetic content produced by every AI content pipeline pointed at it. An AI system reading the open web, rewriting Wikipedia based on what it finds, and presenting the result as a reference work is the retrieval-contamination problem with the feedback loop made explicit and shipped as a product.
By mid-February 2026, Grokipedia had lost most of its Google visibility. Wikipedia outranks Grokipedia for searches about Grokipedia itself.
“This human-created knowledge is what AI companies rely on to generate content; even Grokipedia needs Wikipedia to exist.” – The Wikimedia Foundation
The synthetic encyclopedia is subsidized by the human one. When the subsidy stops, the thing depending on it stops making sense.
Wikipedia is not beyond criticism. Its edit wars, ideological gatekeeping, and systemic gaps in who gets to shape articles are well-documented and real. But the response to a flawed human editorial process is not to remove the humans entirely and call the result an improvement. I’ve written before about the accountability vacuum that opens when you replace human judgment with API calls. Wikipedia’s problems are the problems of a messy, contested, accountable system. Grokipedia’s problems are the problems of a system with no accountability at all.
The Citation Layer Is Decoupling From Authorship
I wrote recently about Reddit selling “Authentic Human Conversation™” to AI companies while the platform’s own moderators report that they can no longer tell which comments are human. The Oumi research found that of 5,380 sources cited by AI Overviews, Facebook and Reddit were the second and fourth most common. The citation layer of the most-used answer engine in the world is substantially built on two platforms that cannot verify the human origin of their own content.
Human creators are pulling out of the open web because the traffic bargain has collapsed. Answer engines are citing content whose authorship can’t be verified, or was never human to begin with. The citation is still there. The thing being cited is not what it used to be.
The ouroboros framing was right. The timeline wasn’t. Retrieval collapse doesn’t wait for the next training run. It needs an indexable URL and a retrieval system willing to trust it.
The systems are ready. And more than half the time they get an answer right, they can’t point to a source that supports what they just told you.
This post was originally published on The Inference.
Featured Image: Anton Vierietin/Shutterstock