Semrush put out an infographic last week. The kind built to be screenshotted into LinkedIn carousels and pasted into webinar decks. Four pillars. The fourth one is called “Technical GEO”: schema, structured data, clean structure. The line that justifies it: “Ensures AI engines can parse and connect your content.”
Ensures.

That is the entire piece in one word. The architecture of large language models is, by design, the opposite of ensured. And schema has nothing to do with whether an LLM can parse text. LLMs parse text by reading text.
Semrush is far from alone. Every SaaS vendor with skin in this game is running variations of the same play. SEO-era controllability, repackaged under a new acronym. The same percentages, pillars, and pyramids. All dressed up for a system that was built specifically not to work this way.
I’ve made the strategic version of this argument before, in “Your AI Strategy Isn’t a Strategy.” This piece is the technical ground beneath it.
Built To Read Whatever’s There
Language models exist because the web is a mess. Forums, Wikipedia stubs, blog posts written at 2 a.m., scraped product copy, machine-translated junk, code comments, half-formed sentences, typos, contradictions, every register from journal article to subreddit shitpost. Pre-training data is the public web, and the public web has never been structured.
The transformer architecture handles this by treating language as sequences of tokens. There is no parser inside the model looking for tags. There is no preference for FAQ markup. The model reads the words. That is the mechanism.
At inference time, the model generates more tokens conditioned on the input. None of that pipeline is reading microdata.
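The point is easy to make concrete with a toy tokenizer. This is my illustration, not any real model’s tokenizer; production subword tokenizers (BPE and friends) differ in detail, but they share the property shown here: markup arrives as just more tokens, with no special code path.

```python
import re

def toy_tokenize(text: str) -> list[str]:
    # Deliberately crude stand-in for a subword tokenizer: split into
    # word runs and single punctuation marks. Nothing here inspects
    # HTML tags or schema markup specially, and nothing in a real
    # tokenizer does either.
    return re.findall(r"\w+|[^\w\s]", text)

# Prose and JSON-LD markup go through the exact same path.
print(toy_tokenize("Acme widgets cost $20."))
# ['Acme', 'widgets', 'cost', '$', '20', '.']
print(toy_tokenize('<script type="application/ld+json">{"@type":"FAQPage"}</script>'))
```

To the model, the second input is not "structured data." It is a longer list of tokens.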
Schema.org has real jobs. It feeds rich results in classical search. It helps entity disambiguation in the knowledge graph. It helps voice assistants pull structured fields. These are well-defined functions inside specific systems. They are not the mechanism by which an LLM understands a sentence.
So when a vendor claims structured data “ensures AI engines can parse and connect your content,” there is nothing to ensure. The parsing layer they are imagining is not there. The model already parsed your sentence. It did so by reading the sentence.
One Trick, Three Brand Colors
Look at the biggest GEO and AEO explainers on the market right now, and you find the same SEO-era playbook with the acronym swapped.
Semrush is already covered. The fourth pillar of its “Technical GEO” presents schema and structured data as ensuring something the architecture cannot ensure.
AirOps published a graphic titled “15 Ways to Get Cited by ChatGPT, Perplexity, & Google.” It is the most numbers-heavy specimen of the genre I’ve seen this year. Schema markup increases citation likelihood by 13%. Sequential H2 to H4 tags double your chances. Short paragraphs make content 49% more likely to appear in AI answers. Perplexity cites UGC in 91% of answers, versus Gemini’s 7%. Read the source notes and the methodology trail circles home: the numbers in the graphic trace back to AirOps’s own “2026 State of AI Search Report.” AirOps is citing AirOps on the question of whether AirOps’s prescriptions work.
Peec AI does a more honest job in places. Its full guide to GEO acknowledges the probabilistic nature of the system and concedes that foundation models are already trained, so optimization focuses on the retrieval layer. Then it lands the same prescriptions: heading hierarchy, bullet lists, FAQ markup, multiple schema types layered on every page, summaries at the top of sections – all built on the chunking claim that long paragraphs lose out because the engine extracts fragments rather than full articles.
Profound, citing Aleyda Solis’s checklist, is the most explicit in its piece: “Optimize for Chunk-Level Retrieval.” Every section, a standalone snippet. Every page, a buffet from which the engine takes what it wants. The engine, in this telling, is a polite guest who only takes what’s been laid out.
Three vendors. Same operating assumption: a controllable, prescriptive technical discipline sits between a publisher and a citation, and it occupies roughly the same shape as classical SEO. Schema, headings, structure, freshness, machine-readable formats. Familiar. Billable. Reportable up to a chief marketing officer.
What Schema Really Does
Schema is not the target here. Schema has real, well-defined uses. Classical Google search uses it for rich results: prices, ratings, event times, the structured fields that drive search engine results page features. The knowledge graph uses it for entity disambiguation. Voice assistants pull structured fields out of it.
None of that goes away. If you’re responsible for technical SEO, keep implementing schema where it earns its keep.
Schema cannot reach into a transformer and improve its comprehension of your prose. The model isn’t architected to read schema as schema. It receives whatever text the engine fetched and chose to include, and processes that text as language tokens. The entire GEO/AEO marketing layer rests on conflating two distinct claims: that schema is useful in classical search, and that schema feeds the LLM. The first is true. The second is a category error.
Chunking Is Not Yours To Optimize

The chunking advice keeps reappearing because it sounds technical, sits neatly inside a flowchart, and gives a content team something concrete to do on Monday morning. It is also incoherent.
Chunking happens at retrieval time. Perplexity, ChatGPT, and Gemini each run a retriever over candidate documents, split them according to their own configurations (length, overlap, embedding model, sometimes semantic boundaries), and feed the top-k chunks into the model’s context. Those configurations belong to the engine. They get tuned differently across systems and retuned on schedules no publisher is privy to. The publisher’s view of the chunker is the publisher’s view of the model: black box, outputs only.
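For readers who have not seen a retrieval pipeline, here is a minimal sketch of the split-and-rank step. Everything in it is illustrative: the window size, the overlap, and the word-overlap scoring (a crude stand-in for embedding similarity) are exactly the knobs an engine chooses and a publisher never sees.

```python
def chunk(text: str, size: int = 120, overlap: int = 30) -> list[str]:
    # Fixed-size character windows with overlap: one common splitting
    # scheme. The numbers are placeholders; each engine picks and
    # retunes its own.
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def score(passage: str, query: str) -> int:
    # Word-overlap scoring: a toy substitute for embedding similarity.
    return len(set(passage.lower().split()) & set(query.lower().split()))

def top_k(passages: list[str], query: str, k: int = 2) -> list[str]:
    return sorted(passages, key=lambda p: score(p, query), reverse=True)[:k]

doc = ("Acme widgets ship in two days. " * 10
       + "Returns are free within thirty days of purchase. " * 10)
chunks = chunk(doc)
best = top_k(chunks, "free returns policy")
# The engine hands the model these fragments, not the article.
print(best[0])
```

Change `size`, `overlap`, or the scorer and a different fragment wins. All three knobs sit on the engine’s side of the black box.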
So when a vendor says “optimize for chunk-level retrieval,” what is actually being recommended is good writing. Short, self-contained paragraphs. Clear definitions near the top of sections. Internal logical structure. These are recognizable disciplines: information architecture, technical writing, clarity. They have been recognizable disciplines since long before the transformer was invented. They are not a new technical layer.
A more honest version of the pitch would be: Hire someone competent at writing for the web. That sentence does not fit on a pricing page.
The Paper They Don’t Read
There is a real academic paper called “GEO.” Aggarwal and co-authors, KDD 2024. It is the closest thing to a citable source the SaaS layer has when it sells generative engine optimization as a discipline. It is also, as papers go, easy to skim. Nine “optimization methods” are tested on a 10,000-query benchmark, with results.
What did the paper find worked?
Adding citations from credible sources. Adding quotations from relevant sources. Adding statistics. Improving fluency. Making prose easier to understand. The methods that produced the largest visibility lifts were, essentially: write content with more evidence in cleaner prose.
What did the paper test and find did not work?
Keyword stuffing, the closest analogue in the paper to the SEO-era playbook the current GEO and AEO vendors have repackaged. Result: below baseline. The paper’s authors note in plain terms that methods effective in search engines “may not translate to success in this new paradigm.”
Notice what is not in the list of nine methods. Schema. Structured data. FAQ markup. Heading hierarchy. Machine-readable formats. None of those are tested in the paper, because none of them are the optimization surface the paper studies. The paper studies content-level interventions: what you put in the words, not metadata layered around the words.
The SaaS layer borrowed the acronym. The findings stayed in the paper. “Technical GEO” is the SEO playbook with different stickers on the same boxes, sold against research that points the other way.
The Assumption Smuggled In
The SaaS pitch only makes sense if you smuggle in one assumption: that the system you’re optimizing for has the same shape as the one that has been billing SEO clients for a quarter-century. Inputs you control. Outputs that respond. A traceable causal chain between the two.
That model was always a simplification of how search worked. It was close enough to keep the industry running, and close enough to keep the invoices going out.
None of that simplification survives contact with generative systems. The same prompt produces different answers across sessions, users, temperatures, model versions, and days. That is observed behavior across the major engines, not a quirk of any single one. The retrieval layer in front of the model also moves: candidate sources shift, ranking shifts, freshness windows shift. No causal chain runs between “I added FAQ schema” and “the model cited my page.” What runs between them is a probability distribution, and the things you control affect that distribution in ways nobody can cleanly attribute. Not even the people who built these systems.
This is the established line on AI visibility tools, repeated here because it applies to the entire prescriptive layer. Statistically unverifiable data drawn from non-deterministic systems. A 13% citation lift, measured how, against what counterfactual, with what reproducibility? The methodological questions aren’t what these numbers are designed to answer. The numbers are the answer. They land in a graphic, get rendered as ROI in a board deck, and the conversation moves on.
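The counterfactual problem is easy to demonstrate. The simulation below is mine, not any vendor’s methodology: both groups share the same true citation rate, so the real lift is zero by construction, yet at realistic sample sizes the measured lift routinely lands in double digits.

```python
import random

random.seed(7)  # fixed seed so the run is reproducible

def observed_rate(true_rate: float, queries: int) -> float:
    # Simulate checking `queries` prompts against a fixed underlying
    # citation rate and report the measured fraction.
    return sum(random.random() < true_rate for _ in range(queries)) / queries

# Both groups share the SAME true rate of 20%: any measured
# difference below is pure sampling noise.
with_schema = observed_rate(0.20, 300)
without_schema = observed_rate(0.20, 300)
lift = (with_schema - without_schema) / max(without_schema, 1e-9)
print(f"apparent lift from doing nothing: {lift:+.0%}")
```

Without a counterfactual, a stated sample size, and a reproducibility check, a “13% lift” is indistinguishable from this.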
Something To Say In The Meeting
Here is the part that the architecture argument and the methodology argument do not, on their own, explain. Why does the entire SaaS layer keep successfully selling this stuff to people who are not stupid?
The honest version of the answer goes something like: We are operating with reduced visibility into a system that does not expose its mechanics, that returns different outputs to different people for the same query, that changes month by month, and that has folded a substantial chunk of the funnel into a black box. We will keep doing the work that has always been the work: writing well, being useful, building authority, maintaining the site. We will monitor what shows up where. The deterministic dashboard we used to have is not coming back.
That sentence is unsayable in a marketing meeting. It admits the lever is not connected. It tells leadership that the budget line they approved has no corresponding action. It gives the team nothing to put in next quarter’s plan.
So the SaaS layer fills the gap. It manufactures levers. Pillars, frameworks, percentage lifts, schema audits, chunking optimization, machine-readable formats. Reportable activity. Defensible spend. Something to say in the meeting. None of it gets you visibility. The engine decides that. What is on offer is the look of control, sold to people who would rather pay than concede that control has left the room.
Once the lever is bought, it has to be operated. Schema audits get scheduled. Chunking checklists get reviewed. Citation likelihoods get tracked, refreshed, and compared. The dashboard the team paid for becomes the dashboard the team optimizes against, and the dashboard quietly replaces the actual problem with the part of the problem it can see. By the time anyone notices, the SaaS layer is writing the brief.
None of this is a moral failure on the buyer’s side. What you are watching is what happens when an industry organized for a quarter-century around the premise that you can pull a lever and watch the meter move finds that the meter has quietly disconnected from the lever. The vendors aren’t running a con. They are filling demand for the only thing the buyer can no longer afford to do without: an answer that fits on a slide.
Rank And Tank, All Over Again
I keep coming back to a phrase that fits this whole moment: dancing to the rank-and-tank tunes (I borrowed it from David McSweeney). The cycle goes: a vendor sells the controllable-discipline frame, agencies adopt it, content teams scale production around the prescriptions, and AI-generated articles get pumped out at volume because the prescriptions are easy to template. Some of it ranks for a while. Most of it eventually tanks, because the prescriptions were never the mechanism, and the engine adjusts, or the freshness window closes, or the system simply moves on.
The SEO industry has done this before. Spinning. Mass programmatic pages. Doorway content. Each cycle followed the same shape: a controllable input dressed up as a discipline, sold at scale, briefly effective, eventually punished by the engine, replaced by the next controllable input dressed up as a discipline.
GEO and AEO are the current cycle. The pillars and percentages and pyramids are this cycle’s templates. Beneath them, the strategies bifurcate.
One path is brand presence exploitation. Plant your name where the engines look. Reddit threads, top-X listicles, the same citation surfaces over and over. The cycle feeds itself: engines cite the surfaces, brands work the surfaces, surfaces feed the engines. I’ve written about this loop before; I called it the Ouroboros pattern. The short version is that the loop is less stable than the strategy assumes.
The other path is content at scale. Produce variations, pump out volume, treat the templated output as content that might earn a citation. I’ve written about this approach before, in the “Scaling Disappointment” piece. The short version is that templated output carries no unique value, and at the pace these prescriptions enable, qualitative review stops being possible. The volume of AI-generated copy produced under this path is this cycle’s externality.
The next cycle will sell the cleanup.
Forget for a moment whether your “Technical GEO” is set up correctly. Ask whether the thing you are putting on the page is worth reading. Large language models were designed to read whatever is there. If what is there is good, it will be read. If what is there is templated, low-utility content optimized against a chunking heuristic that does not exist, it will eventually be filtered out: by the engine, by the user, or by the next academic paper showing that retrieval quality is degraded by exactly this kind of slop.
The advantage, when it accrues, will accrue to the people who do not get distracted. Who do not subscribe to the dashboard. Who keep working on product-driven SEO and the fundamentals that have always connected content to people. There are early signs of this on the timelines I read. Practitioners openly questioning whether optimizing against a non-deterministic surface makes sense at all, and asking whether their attention belongs back on classical search, which, at the end of the chain, is what feeds these systems anyway.
The mess was always the point. The architecture handles it. The industry just needs to stop pretending the mess is the problem.
More Resources:
This post was originally published on The Inference.
Featured Image: Roman Samborskyi/Shutterstock