Can AI read your website?
ChatGPT, Claude, Gemini, and Perplexity answer questions about your site whether you invited them or not. Paste your URL to see what they can — and can't — see.
What canaisee measures
canaisee grades any public web page on how readable it is to AI agents. It follows the same steps a modern language-model crawler takes: it fetches the page with a real HTTP client, inspects the HTML, and pulls robots.txt, sitemap.xml, llms.txt, and the /.well-known/ agent-native endpoints. It then compares what the raw response contains to what a headless browser sees after JavaScript runs. The result is a single headline grade backed by a transparent, versioned sub-score breakdown you can argue with.
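To make the raw-versus-rendered comparison concrete, here is a minimal Python sketch. It assumes you already hold two HTML strings: the body of a plain HTTP fetch and the DOM a headless browser serialized after JavaScript ran. The names `visible_words` and `raw_coverage` are hypothetical, and the word-overlap metric is an illustration, not canaisee's published method.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping the bodies of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.words = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.words.extend(data.split())


def visible_words(html):
    """Return the whitespace-separated words a text-only reader would see."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return parser.words


def raw_coverage(raw_html, rendered_html):
    """Fraction of distinct rendered words already present in the raw response."""
    rendered = set(visible_words(rendered_html))
    if not rendered:
        return 1.0  # nothing rendered, so nothing is missing from the raw fetch
    raw = set(visible_words(raw_html))
    return len(rendered & raw) / len(rendered)


# Hypothetical client-rendered page: the raw response is an empty shell.
raw = '<html><body><div id="root"></div><script>render()</script></body></html>'
rendered = '<html><body><div id="root"><h1>Pricing</h1><p>Plans start at ten dollars</p></div></body></html>'
print(raw_coverage(raw, rendered))
```

A coverage near 1.0 suggests the content survives a fetch without JavaScript; a value near 0.0 suggests the page only materializes after scripts execute.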
Checks span four categories.
Crawler accessibility asks whether agents are welcomed in the first place: whether robots.txt opens the door to AI user agents, whether the main content survives a plain HTTP fetch without running JavaScript, whether a login or paywall blocks the route.
Content readability asks how much of the page is real content versus chrome, scripts, and trackers, how token-efficient the markup is, and whether the server offers a clean text/markdown representation when an agent asks for one.
Semantic structure covers the hooks agents use to orient themselves: JSON-LD and schema.org vocabularies, Open Graph and Twitter cards, a sensible heading hierarchy, a declared language, a discoverable sitemap.
Agent-native surface is the frontier: ahead-of-the-curve signals like /.well-known/mcp.json for Model Context Protocol discovery, WebMCP tool declarations, A2A agent cards, Content-Signal response headers, and canonical .md mirrors of HTML pages.
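As one illustration of the crawler-accessibility category, the sketch below uses Python's standard `urllib.robotparser` to ask whether a robots.txt file admits a handful of AI user agents. The agent list and the helper name `ai_access` are assumptions made for the example; the set of user agents canaisee actually tests is defined by its rubric, not by this snippet.

```python
from urllib.robotparser import RobotFileParser

# Illustrative AI crawler user-agent tokens (an assumption, not canaisee's list).
AI_AGENTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def ai_access(robots_txt, url, agents=AI_AGENTS):
    """Map each agent to True/False: may it fetch `url` under this robots.txt?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from a string, no network call
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# Hypothetical policy: block one AI crawler, allow everyone else.
sample_robots = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

for agent, allowed in ai_access(sample_robots, "https://example.com/pricing").items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Parsing from a string keeps the check offline and repeatable; a real scan would first fetch /robots.txt from the site under test.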
Why agent-readability matters
AI systems increasingly answer questions about your site without sending a human visitor. Whether those answers are accurate, attributed, and up-to-date depends on whether the model can actually read the page. If your robots policy blocks the crawler, if your content only materializes after a React bundle executes, if your copy is buried under a kilometer of analytics scripts — the model guesses, or hallucinates, or cites someone else. The cost of being invisible to agents is no longer theoretical: in competitive categories the citations a model issues are already shaping where traffic, attention, and authority flow.
canaisee is opinionated about this. We believe the web of the next decade will be read by agents first and humans second, and that sites that invest in agent-readability now will compound that investment the same way sites that invested in mobile-readability a decade ago did. The grade is a snapshot of how ready a site is for that transition. The rubric is public, versioned, and designed to be argued with.
How to read your scorecard
Every scan returns a composite grade (A+ through F) on a 100-point scale. The composite is a weighted blend of two components: Hygiene (the table-stakes part: crawler accessibility, content readability, semantic structure) and Frontier (the ahead-of-the-curve part: agent-native signals like MCP, Accept-markdown negotiation, Content-Signal headers). A site with perfect Hygiene and zero Frontier lands at B (85), the top of the expected range for a well-built modern site. Frontier features push a site into A and A+.
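One way to read the "perfect Hygiene and zero Frontier lands at 85" example is as a linear blend in which Hygiene carries 85 of the 100 points and Frontier the remaining 15. The sketch below encodes that reading only. Both weights are inferred from that single example, the function name `composite` is hypothetical, and the authoritative weights and letter-grade cut-points are whatever /rubric publishes.

```python
# Assumed weights, inferred from the "B (85)" example; not taken from the rubric.
HYGIENE_POINTS = 85
FRONTIER_POINTS = 15


def composite(hygiene, frontier):
    """Blend two sub-scores (each 0 to 100) into a 0 to 100 composite."""
    for name, value in (("hygiene", hygiene), ("frontier", frontier)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    return (HYGIENE_POINTS * hygiene + FRONTIER_POINTS * frontier) / 100


print(composite(100, 0))    # perfect Hygiene, no Frontier
print(composite(100, 100))  # perfect on both
```

Under this assumed weighting, every point above 85 has to come from Frontier signals, which matches the statement that Frontier features are what push a site into A and A+.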
Every check surfaces its evidence and its flags on the scorecard, so the grade is inspectable: you can see exactly what the crawler fetched, what it extracted, what it flagged, and what it would take to move the number. The full rubric — every check, every weight, every cut-point — is published at /rubric and versioned as the web changes.