Every major AI platform can now browse websites autonomously. Chrome's auto-browse mode scrolls and clicks. ChatGPT Atlas fills in forms and completes purchases. Perplexity Comet researches across tabs. Yet none of these agents sees your website the way a human does.
This is Part 4 in a five-part series on optimizing websites for the agentic web. Part 1 covered the evolution from SEO to AAIO. Part 2 explained how to get your content cited in AI responses. Part 3 mapped the protocols forming the infrastructure layer. This article gets technical: how AI agents actually perceive your website, and what to build for them.
The core insight is one that keeps coming up in my research: the most impactful thing you can do for AI agent compatibility is the same work web accessibility advocates have been pushing for decades. The accessibility tree, originally built for screen readers, is becoming the primary interface between AI agents and your website.
According to the 2025 Imperva Bad Bot Report (Imperva is a cybersecurity company), automated traffic surpassed human traffic for the first time in 2024, constituting 51% of all web interactions. Not all of that is agentic browsing, but the direction is clear: the non-human audience on your website is already larger than the human one, and it's growing. Throughout this article, we draw only from official documentation, peer-reviewed research, and announcements from the companies building this infrastructure.
Three Ways Agents See Your Website
When a human visits your website, they see colors, layout, images, and typography. When an AI agent visits, it sees something entirely different. Understanding what agents actually perceive is the basis for building websites that work for them.
The major AI platforms use three distinct approaches, and the differences have direct implications for how you should structure your website.
Vision: Reading Screenshots
Anthropic's Computer Use takes the most literal approach. Claude captures screenshots of the browser, analyzes the visual content, and decides what to click or type based on what it "sees." It's a continuous feedback loop: screenshot, reason, act, screenshot. The agent operates at the pixel level, identifying buttons by their visual appearance and reading text from the rendered image.
Google's Project Mariner follows a similar pattern with what Google describes as an "observe-plan-act" loop: observe captures visual elements and underlying code structures, plan formulates action sequences, and act simulates user interactions. Mariner achieved an 83.5% success rate on the WebVoyager benchmark.
The vision approach works, but it's computationally expensive, sensitive to layout changes, and limited to what's visually rendered on screen.
Accessibility Tree: Reading Structure
ChatGPT Atlas uses ARIA tags, the same labels and roles that help screen readers, to interpret page structure and interactive elements.
Atlas is built on Chromium, but rather than analyzing rendered pixels, it queries the accessibility tree for elements with specific roles ("button", "link") and accessible names. This is the same data structure that screen readers like VoiceOver and NVDA use to help people with visual disabilities navigate the web.
Microsoft's Playwright MCP, the official MCP server for browser automation, takes the same approach. It provides accessibility snapshots rather than screenshots, giving AI models a structured representation of the page. Microsoft deliberately chose accessibility data over visual rendering for its browser automation standard.
Hybrid: Both At Once
In practice, the most capable agents combine approaches. OpenAI's Computer-Using Agent (CUA), which powers both Operator and Atlas, layers screenshot analysis with DOM processing and accessibility tree parsing. It prioritizes ARIA labels and roles, falling back to text content and structural selectors when accessibility data isn't available.
Perplexity's research confirms the same pattern. Their BrowseSafe paper, which details the safety infrastructure behind Comet's browser agent, describes using "hybrid context management combining accessibility tree snapshots with selective vision."
| Platform | Primary Approach | Details |
| --- | --- | --- |
| Anthropic Computer Use | Vision (screenshots) | Screenshot, reason, act feedback loop |
| Google Project Mariner | Vision + code structure | Observe-plan-act with visual and structural data |
| OpenAI Atlas | Accessibility tree | Explicitly uses ARIA tags and roles |
| OpenAI CUA | Hybrid | Screenshots + DOM + accessibility tree |
| Microsoft Playwright MCP | Accessibility tree | Accessibility snapshots, no screenshots |
| Perplexity Comet | Hybrid | Accessibility tree + selective vision |
The pattern is clear. Even platforms that started with vision-first approaches are incorporating accessibility data. And the platforms optimizing for reliability and efficiency (Atlas, Playwright MCP) lead with the accessibility tree.
Your website's accessibility tree isn't a compliance artifact. It's increasingly the primary interface agents use to understand and interact with your website.
Last year, before the European Accessibility Act took effect, I half-joked that it would be ironic if the thing that finally got people to care about accessibility was AI agents, not the people accessibility was designed for. That's no longer a joke.
The Accessibility Tree Is Your Agent Interface
The accessibility tree is a simplified representation of your page's DOM that browsers generate for assistive technologies. Where the full DOM contains every div, span, form, and script, the accessibility tree strips away the noise and exposes only what matters: interactive elements, their roles, their names, and their states.
This is why it works so well for agents. A typical page's DOM might contain thousands of nodes. The accessibility tree reduces that to the elements a user (or agent) can actually interact with: buttons, links, form fields, headings, landmarks. For AI models that process web pages within a limited context window, that reduction is critical.
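As a rough illustration of that reduction (a sketch, not a real browser's accessibility tree; the role map below is a tiny subset of what browsers actually implement), a few lines of Python can show how a noisy DOM collapses to a handful of role/name pairs:

```python
from html.parser import HTMLParser

# Subset of implicit ARIA roles for native HTML elements (illustrative).
IMPLICIT_ROLES = {
    "button": "button",
    "a": "link",
    "input": "textbox",
    "select": "combobox",
    "h1": "heading", "h2": "heading", "h3": "heading",
    "nav": "navigation",
    "main": "main",
}

class AccessibilitySketch(HTMLParser):
    """Collects (role, accessible name) pairs, skipping role-less wrappers."""
    def __init__(self):
        super().__init__()
        self.nodes = []        # (role, name) pairs in document order
        self._pending = None   # role of an element awaiting its text content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        role = attrs.get("role") or IMPLICIT_ROLES.get(tag)
        if role is None:
            return  # divs, spans, decoration: never reach the tree
        name = attrs.get("aria-label", "")
        if name:
            self.nodes.append((role, name))
        else:
            self._pending = role  # accessible name comes from text content

    def handle_data(self, data):
        if self._pending and data.strip():
            self.nodes.append((self._pending, data.strip()))
            self._pending = None

def accessibility_sketch(html: str):
    parser = AccessibilitySketch()
    parser.feed(html)
    return parser.nodes

page = """
<div class="wrapper"><div class="hero">
  <h1>Flight Finder</h1>
  <div class="decoration"></div>
  <a href="/deals">Today's deals</a>
  <button>Search flights</button>
</div></div>
"""
print(accessibility_sketch(page))
# Several wrapper divs collapse to three meaningful (role, name) nodes
```

In a real tree, name computation, states, and nesting are far richer than this; the point is the ratio of meaningful nodes to total markup.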
OpenAI's own guidance for Atlas makes the connection explicit:
"Follow WAI-ARIA best practices by adding descriptive roles, labels, and states to interactive elements like buttons, menus, and forms. This helps ChatGPT recognize what each element does and interact with your website more accurately."
And:
"Making your website more accessible helps ChatGPT Agent in Atlas understand it better."
Research data backs this up. The most rigorous data on this comes from a UC Berkeley and University of Michigan study published for CHI 2026, the premier academic conference on human-computer interaction. The researchers tested Claude Sonnet 4.5 on 60 real-world web tasks under different accessibility conditions, collecting 40.4 hours of interaction data across 158,325 events. The results were striking:
| Condition | Task Success Rate | Avg. Completion Time |
| --- | --- | --- |
| Standard (default) | 78.33% | 324.87 seconds |
| Keyboard-only | 41.67% | 650.91 seconds |
| Magnified viewport | 28.33% | 1,072.20 seconds |
Under standard conditions, the agent succeeded almost 80% of the time. Restrict it to keyboard-only interaction (simulating how screen reader users navigate) and success drops to 42%, taking twice as long. Restrict the viewport (simulating magnification tools), and success drops to 28%, taking over three times as long.
The paper identifies three categories of gaps:
Perception gaps: agents can't reliably access screen reader announcements or ARIA state changes that would tell them what happened after an action.
Cognitive gaps: agents struggle to track task state across multiple steps.
Action gaps: agents underutilize keyboard shortcuts and fail at interactions like drag-and-drop.
The implication is direct. Websites that present a rich, well-labeled accessibility tree give agents the data they need to succeed. Websites that rely on visual cues, hover states, or complex JavaScript interactions without accessible alternatives create the conditions for agent failure.
Perplexity's search API architecture paper from September 2025 reinforces this from the content side. Their indexing system prioritizes content that is "high quality in both substance and form, with data captured in a manner that preserves the original content structure and layout." Websites "heavy on well-structured data in list or table form" benefit from "more formulaic parsing and extraction rules." Structure isn't just helpful. It's what makes reliable parsing possible.
Semantic HTML: The Agent Foundation
The accessibility tree is built from your HTML. Use semantic elements, and the browser generates a useful accessibility tree automatically. Skip them, and the tree is sparse or misleading.
This isn't new advice. Web standards advocates have been shouting "use semantic HTML" for 20 years. Not everyone listened. What's new is that the audience has expanded. It used to be about screen readers and a relatively small percentage of users. Now it's about every AI agent that visits your website.
Use native elements. A <button> element automatically appears in the accessibility tree with the role "button" and its text content as the accessible name. A <div> with a click handler does not. The agent doesn't know it's clickable.
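A minimal illustration of the difference (the search() handler is a placeholder):

```html
<!-- Invisible to the accessibility tree: no role, no accessible name -->
<div class="btn" onclick="search()">Search flights</div>

<!-- Appears in the tree as role "button", accessible name "Search flights" -->
<button type="button" onclick="search()">Search flights</button>
```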
Label your forms. Every input needs an associated label. Agents read labels to understand what data a field expects.
The autocomplete attribute deserves attention. It tells agents (and browsers) exactly what type of information a field expects, using standardized values like name, email, tel, street-address, and organization. When an agent fills a form on someone's behalf, autocomplete attributes make the difference between confident field mapping and guessing.
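A sketch of both practices together (field names and the form action are illustrative):

```html
<form action="/checkout" method="post">
  <!-- Programmatic label via for/id, plus a standard autocomplete token -->
  <label for="full-name">Full name</label>
  <input id="full-name" name="full-name" type="text" autocomplete="name">

  <label for="email">Email address</label>
  <input id="email" name="email" type="email" autocomplete="email">

  <label for="street">Street address</label>
  <input id="street" name="street" type="text" autocomplete="street-address">

  <button type="submit">Continue</button>
</form>
```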
Establish heading hierarchy. Use h1 through h6 in logical order. Agents use headings to understand page structure and locate specific content sections. Skipping levels (jumping from h1 to h4) creates confusion about content relationships.
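A well-formed outline looks like this (indentation is only for readability; the page content is illustrative):

```html
<h1>Flight Finder</h1>             <!-- exactly one h1 per page -->
  <h2>Search results</h2>
    <h3>Filter by airline</h3>     <!-- h3 nests under the preceding h2 -->
  <h2>Popular destinations</h2>    <!-- back up one level, none skipped -->
```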
Use landmark regions. HTML5 landmark elements (<header>, <nav>, <main>, <aside>, <footer>) tell agents where they are on the page. A <nav> element is unambiguously navigation. A <div> requires interpretation. Clarity wins, always.
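A minimal page skeleton using landmarks (aria-label values are illustrative, and only needed to distinguish landmarks of the same type):

```html
<body>
  <header>
    <nav aria-label="Primary">
      <a href="/">Home</a>
      <a href="/flights">Flights</a>
    </nav>
  </header>
  <main>
    <h1>Flight Finder</h1>
    <!-- primary page content -->
  </main>
  <aside aria-label="Current deals">
    <!-- complementary content -->
  </aside>
  <footer>
    <nav aria-label="Footer">
      <a href="/contact">Contact</a>
    </nav>
  </footer>
</body>
```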
Microsoft's Playwright test agents, announced in October 2025, generate test code that uses accessible selectors by default. When the AI generates a Playwright test, it writes:
const todoInput = page.getByRole('textbox', { name: 'What needs to be done?' });
Not CSS selectors. Not XPath. Accessible roles and names. Microsoft built its AI testing tools to find elements the same way screen readers do, because it's more reliable.
The final slide of my Conversion Hotel keynote about optimizing websites for AI agents. (Image credit: Slobodan Manic)
ARIA: Helpful, Not Magic
OpenAI recommends ARIA (Accessible Rich Internet Applications), the W3C standard for making dynamic web content accessible. But ARIA is a supplement, not a substitute. Like protein shakes: useful on top of a real diet, counterproductive as a replacement for actual meals.
If you can use a native HTML element or attribute with the semantics and behavior you require already built in, instead of re-purposing an element and adding an ARIA role, state or property to make it accessible, then do so.
The fact that the W3C had to make "don't use ARIA" the first rule of ARIA tells you everything about how often it gets misused.
Adrian Roselli, a recognized web accessibility expert, raised an important concern in his October 2025 analysis of OpenAI's guidance. He argues that recommending ARIA without sufficient context risks encouraging misuse. Websites that use ARIA are generally less accessible according to WebAIM's annual survey of the top million websites, because ARIA is often applied incorrectly as a band-aid over poor HTML structure. Roselli warns that OpenAI's guidance could incentivize practices like keyword-stuffing in aria-label attributes, the same kind of gaming that plagued meta keywords in early SEO.
The right approach is layered:
Start with semantic HTML. Use <button>, <nav>, <main>, <label>, and other native elements. These work correctly by default.
Add ARIA when native HTML isn't enough. Custom elements that don't have HTML equivalents (tab panels, tree views, disclosure widgets) need ARIA roles and states to be understandable.
Use ARIA states for dynamic content. When JavaScript changes the page, ARIA attributes communicate what happened:
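A minimal sketch of the pattern (element IDs are illustrative):

```html
<!-- A disclosure button whose open/closed state is exposed to the tree -->
<button type="button" aria-expanded="false" aria-controls="filters-panel">
  Filters
</button>
<div id="filters-panel" hidden>
  <!-- filter controls -->
</div>

<script>
  // Keep the ARIA state in sync whenever JavaScript toggles the panel
  const btn = document.querySelector('[aria-controls="filters-panel"]');
  const panel = document.getElementById('filters-panel');
  btn.addEventListener('click', () => {
    const open = btn.getAttribute('aria-expanded') === 'true';
    btn.setAttribute('aria-expanded', String(!open));
    panel.hidden = open;
  });
</script>
```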
Keep aria-label descriptive and honest. Use it to provide context that isn't visible on screen, like distinguishing between multiple "Delete" buttons on the same page. Don't stuff it with keywords.
The principle is the same one that applies to good SEO: build for the user first, optimize for the system second. Semantic HTML is building for the user. ARIA is fine-tuning for edge cases where HTML falls short.
The Rendering Question
Browser-based agents like Chrome's auto-browse, ChatGPT Atlas, and Perplexity Comet run on Chromium. They execute JavaScript. They can render your single-page application.
But not everything that visits your website is a full browser agent.
AI crawlers (PerplexityBot, OAI-SearchBot, ClaudeBot) index your content for retrieval and citation. Many of these crawlers do not execute client-side JavaScript. If your page is blank until React hydrates, those crawlers see an empty page. Your content is invisible to the AI search ecosystem.
Part 2 of this series covered the citation side: AI systems select fragments from indexed content. If your content isn't in the initial HTML, it's not in the index. If it's not in the index, it doesn't get cited. Server-side rendering isn't just a performance optimization. It's a visibility requirement.
Even for full browser agents, JavaScript-heavy websites create friction. Dynamic content that loads after interactions, infinite scroll that never signals completion, and forms that reconstruct themselves after every input all create opportunities for agents to lose track of state. The A11y-CUA study attributed part of agent failure to "cognitive gaps": agents losing track of what's happening across complex multi-step interactions. Simpler, more predictable rendering reduces these failures.
Microsoft's guidance from Part 2 applies here directly: "Don't hide important answers in tabs or expandable menus: AI systems may not render hidden content, so key facts can be skipped." If information matters, put it in the visible HTML. Don't require interaction to reveal it.
Practical rendering priorities:
Server-side render or pre-render content pages. If an AI crawler can't see it, it doesn't exist in the AI ecosystem.
Avoid blank-shell SPAs for content pages. Frameworks like Next.js (which powers this website), Nuxt, and Astro make SSR straightforward.
Don't hide critical information behind interactions. Prices, specifications, availability, and key facts should be in the initial HTML, not behind accordions or tabs.
Use standard links for navigation. Client-side routing that doesn't update the URL, or onClick handlers instead of real links, breaks agent navigation.
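The last point in a minimal sketch (router.push stands in for whatever client-side router you use):

```html
<!-- Breaks agent navigation: no href, so no role "link" in the tree -->
<span class="nav-item" onclick="router.push('/pricing')">Pricing</span>

<!-- Works for crawlers and agents; a client-side router can still
     intercept the click and handle the transition itself -->
<a href="/pricing">Pricing</a>
```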
Testing Your Agent Interface
You wouldn’t ship a website without testing it in a browser. Testing how agents perceive your website is becoming equally important.
Screen reader testing is the best proxy. If VoiceOver (macOS), NVDA (Windows), or TalkBack (Android) can navigate your website successfully, identifying buttons, reading form labels, and following the content structure, agents can likely do the same. Both audiences rely on the same accessibility tree. This isn’t a perfect proxy (agents have capabilities screen readers don’t, and vice versa), but it catches the majority of issues.
Microsoft's Playwright MCP provides direct accessibility snapshots. If you want to see exactly what an AI agent sees, Playwright MCP generates structured accessibility snapshots of any web page. These snapshots strip away visual presentation and show you the roles, names, and states that agents work with. Published as @playwright/mcp on npm, it's the most direct way to view your website through an agent's eyes.
The output looks something like this (simplified):
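For a hypothetical flight-search page, a snapshot in the YAML-like format Playwright uses for aria snapshots might read (the page content here is illustrative):

```yaml
- banner:
  - link "Flight Finder"
- navigation "Primary":
  - link "Deals"
  - link "Support"
- main:
  - heading "Search flights" [level=1]
  - textbox "From"
  - textbox "To"
  - button "Search"
```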
If your critical interactive elements don't appear in the snapshot, or appear without useful names, agents will struggle with your website.
Browserbase's Stagehand (v3, released October 2025, and humbly self-described as "the best browser automation framework") provides another angle. It parses both DOM and accessibility trees, and its self-healing execution adapts to DOM changes in real time. It's useful for testing whether agents can complete specific workflows on your website, like filling a form or completing a checkout.
The Lynx browser is a low-tech option worth trying. It's a text-only browser that strips away all visual rendering, showing you roughly what a non-visual agent parses. A trick I picked up from Jes Scholz on the podcast.
A practical testing workflow:
Run VoiceOver or NVDA through your website's key user flows. Can you complete the core tasks without vision?
Generate Playwright MCP accessibility snapshots of critical pages. Are interactive elements labeled and identifiable?
View your page source. Is the main content in the HTML, or does it require JavaScript to render?
Load your page in Lynx, or disable CSS, and check whether the content order and hierarchy still make sense. Agents don't see your layout.
A Checklist For Your Development Team
If you're sharing this article with your developers (and you should), here's the prioritized implementation list. Ordered by impact and effort, starting with the changes that affect the most agent interactions for the least work.
High impact, low effort:
Use native HTML elements. <button> for actions, <a> for links, <select> for dropdowns. Replace <div onclick> patterns wherever they exist.
Label every form input. Associate <label> elements with inputs using the for attribute. Add autocomplete attributes with standard values.
Server-side render content pages. Ensure main content is in the initial HTML response.
High impact, moderate effort:
Implement landmark regions. Wrap content in <header>, <nav>, <main>, and <footer> elements. Add aria-label when multiple landmarks of the same type exist on the same page.
Fix heading hierarchy. Ensure a single h1, with h2 through h6 in logical order without skipping levels.
Move critical content out of hidden containers. Prices, specifications, and key facts should not require clicks or interactions to reveal.
Moderate impact, low effort:
Add ARIA states to dynamic elements. Use aria-expanded, aria-controls, and aria-hidden for menus, accordions, and toggles.
Use descriptive link text. "Read the full report" instead of "Click here." Agents use link text to understand where links lead.
Test with a screen reader. Make it part of your QA process, not a one-time audit.
Key Takeaways
AI agents perceive websites through three approaches: vision, DOM parsing, and the accessibility tree. The industry is converging on the accessibility tree as the most reliable method. OpenAI Atlas, Microsoft Playwright MCP, and Perplexity's Comet all rely on accessibility data.
Web accessibility is no longer just about compliance. The accessibility tree is the literal interface AI agents use to understand your website. The UC Berkeley/University of Michigan study shows agent success rates drop significantly when accessibility features are constrained.
Semantic HTML is the foundation. Native elements like <button>, <a>, <label>, and <nav> automatically create a useful accessibility tree. No framework required. No ARIA needed for the basics.
ARIA is a supplement, not a substitute. Use it for dynamic states and custom elements. But start with semantic HTML and add ARIA only where native elements fall short. Misused ARIA makes websites less accessible, not more.
Server-side rendering is an agent visibility requirement. AI crawlers that don't execute JavaScript can't see content in blank-shell SPAs. If your content isn't in the initial HTML, it doesn't exist in the AI ecosystem.
Screen reader testing is the best proxy for agent compatibility. If VoiceOver or NVDA can navigate your website, agents probably can too. For direct inspection, Playwright MCP accessibility snapshots show exactly what agents see.
The first three parts of this series covered why the shift matters, how to get cited, and what protocols are being built. This article covered the implementation layer. The encouraging news is that these aren't separate workstreams. Accessible, well-structured websites perform better for humans, rank better in search, get cited more often by AI, and work better for agents. It's the same work serving four audiences.
And the work builds on itself. The semantic HTML and structured data covered here are exactly what WebMCP builds on for its declarative form approach. The accessibility tree your website exposes today becomes the foundation for the structured tool interfaces of tomorrow.
Up next in Part 5: the commerce layer. How Stripe, Shopify, and OpenAI are building the infrastructure for AI agents to complete purchases, and what it means for your checkout flow.
Slobodan Manic, host of the No Hacks Podcast and machine-first web optimization consultant at No Hacks
Slobodan "Sani" Manić is a website optimisation consultant with over 15 years of experience helping businesses make their sites faster, ...