# Rubric v1.5

Every scan is scored against this rubric. Rubric changes require two approvals —
EAC engineering (measurement correctness) and Evangent leadership (grading
integrity) — and every change bumps the version string. Historical scans retain
the rubric version that was active when they ran.

## The two-tier composite

v1.3 replaced the v1.2 four-category weighted model with a two-tier
`Hygiene + Frontier` composite, and v1.4 and v1.5 carry that math forward
unchanged. Hygiene measures whether agents can read the site at all — sensible
robots.txt, clean HTML, content reachable without JavaScript, semantic
structure, decent token efficiency. Frontier measures genuinely
ahead-of-the-curve agent-native features — `Accept: text/markdown`,
`/.well-known/mcp.json`, WebMCP tool declarations, Content-Signal headers,
A2A agent cards, documentation surface. Most well-built 2026 sites land in
the B range on hygiene alone; Frontier is what lifts a site to A and A+.

## Documentation surface (new in v1.5)

LLMs increasingly recommend products and services based on the quality of the
documentation surface a site publishes. v1.5 adds a new Frontier check that
scores sites on three signals:

1. **`/llms-full.txt` corpus** (40 points). The concatenated-corpus pattern
   popularized by Anthropic, Vercel, and Stripe. Must be at least 1 KB of real
   content to earn points.
2. **Documentation-path URLs in sitemap** (up to 30 points, scaled). URLs
   under `/docs`, `/guides`, `/faq`, `/reference`, `/api`,
   `/how-to`, `/tutorials`, `/learn`, `/help`, `/kb`,
   `/knowledge`, `/support`. 20 or more doc URLs earn the full 30.
3. **Documentation-oriented schema.org types** (30 points). Any one of
   FAQPage, QAPage, HowTo, TechArticle, APIReference earns the full 30.
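
The three signals above can be sketched as a single scoring function. This is a minimal illustration, not the scanner's actual code: the function and constant names are hypothetical, 1 KB is assumed to mean 1024 bytes, the sitemap signal is assumed to scale linearly up to 20 URLs, and a URL is assumed to count when its path equals or sits under one of the listed prefixes.

```python
from urllib.parse import urlparse

# Prefixes and schema.org types are taken from the rubric text above.
DOC_PATH_PREFIXES = (
    "/docs", "/guides", "/faq", "/reference", "/api", "/how-to",
    "/tutorials", "/learn", "/help", "/kb", "/knowledge", "/support",
)
DOC_SCHEMA_TYPES = {"FAQPage", "QAPage", "HowTo", "TechArticle", "APIReference"}


def is_doc_url(url: str) -> bool:
    """Assumption: a URL counts if its path is a prefix or lives under one."""
    path = urlparse(url).path
    return any(path == p or path.startswith(p + "/") for p in DOC_PATH_PREFIXES)


def documentation_surface_score(llms_full_bytes, sitemap_urls, schema_types):
    # Signal 1: /llms-full.txt with at least 1 KB of content (assumed 1024 bytes).
    corpus = 40 if llms_full_bytes >= 1024 else 0
    # Signal 2: up to 30 points; linear scaling to 20 URLs is an assumption.
    doc_urls = sum(1 for u in sitemap_urls if is_doc_url(u))
    sitemap = 30 * min(doc_urls, 20) / 20
    # Signal 3: any one documentation-oriented type earns the full 30.
    schema = 30 if DOC_SCHEMA_TYPES & set(schema_types) else 0
    return corpus + sitemap + schema
```

Under these assumptions, a site with a 2 KB `/llms-full.txt`, 10 documentation URLs in its sitemap, and one FAQPage block would score 40 + 15 + 30 = 85.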

Weight changes: `accept-markdown-negotiation` and `mcp-server-discovery`
each drop from 25% to 20% of Frontier to make room for the new 10%
`documentation-surface` check. Hygiene and cut-points are unchanged.

## AI-native content layer (new in v1.4)

v1.4 makes an implicit v1.3 behavior explicit: a site may serve an AI-native
content layer — prose that exists in the DOM for agent readers but is hidden
from human readers via a CSS class like `.sr-only` or `.visually-hidden`.
Building for AI readability does not have to disrupt the human presentation.

Guardrails:

1. The hidden content is an honest expansion of what humans see — no cloaking
   or bait-and-switch between the two layers.
2. The hidden content lives in the same HTML document — not injected by
   JavaScript after load.
3. The hiding is done via a CSS class, not inline `style="display:none"` or
   the `hidden` attribute — those signal "remove from the page" to assistive
   tech and agents alike, so content hidden that way should not be relied on
   for AI readability.
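
Guardrail 3 lends itself to a quick self-check. The sketch below uses Python's standard `html.parser` to flag elements hidden with the `hidden` attribute or an inline `display:none`; the class and function names are hypothetical, and this is not how the scanner itself is known to work.

```python
from html.parser import HTMLParser


class HiddenMarkupFinder(HTMLParser):
    """Collects start tags that hide content in the ways guardrail 3 rules out."""

    def __init__(self):
        super().__init__()
        self.flagged = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Normalize the inline style so "display: none" and "DISPLAY:NONE" match.
        style = (attr_map.get("style") or "").replace(" ", "").lower()
        if "hidden" in attr_map:
            self.flagged.append((tag, "hidden attribute"))
        elif "display:none" in style:
            self.flagged.append((tag, "inline display:none"))


def find_disallowed_hiding(html: str):
    finder = HiddenMarkupFinder()
    finder.feed(html)
    finder.close()
    return finder.flagged
```

Content hidden with a class such as `.sr-only` is not flagged, since the check looks only at the two mechanisms the guardrail names.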

## Composite formula

```
// Content site
composite = hygiene * 0.85 + frontier * 0.15

// Interactive site
composite = hygiene * 0.75 + frontier * 0.25
```
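
As a runnable illustration of the formula above, here is a small Python sketch. The function name and the `site_type` strings are hypothetical, and both inputs are assumed to be on a 0 to 100 scale.

```python
# (hygiene weight, frontier weight) per site type, from the formula above.
COMPOSITE_WEIGHTS = {
    "content": (0.85, 0.15),
    "interactive": (0.75, 0.25),
}


def composite_score(hygiene: float, frontier: float, site_type: str) -> float:
    """Blend the two tiers; raises KeyError for an unknown site type."""
    hygiene_weight, frontier_weight = COMPOSITE_WEIGHTS[site_type]
    return hygiene * hygiene_weight + frontier * frontier_weight
```

For example, hygiene 90 and Frontier 40 work out to roughly 82.5 for a content site (76.5 + 6) and roughly 77.5 for an interactive site (67.5 + 10).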

## Human-only ceiling

A site whose Frontier score is below 15 has its composite capped
at 85 (B range) regardless of hygiene strength. Pristine HTML and
perfect semantic structure still top out at B until the site adds some
agent-native surface, such as `Accept: text/markdown`, an MCP endpoint, or
richer schema.org markup.
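
The ceiling can be read as a post-processing step on the composite. A minimal sketch, with hypothetical names and the thresholds taken from the paragraph above:

```python
FRONTIER_FLOOR = 15   # Frontier scores below this trigger the ceiling
HUMAN_ONLY_CAP = 85   # maximum composite while the ceiling applies


def apply_human_only_ceiling(composite: float, frontier: float) -> float:
    """Cap the composite when the site has almost no agent-native surface."""
    if frontier < FRONTIER_FLOOR:
        return min(composite, HUMAN_ONLY_CAP)
    return composite
```

For instance, a content site with hygiene 100 and Frontier 14 would compute 100 × 0.85 + 14 × 0.15 = 87.1 before the ceiling and be held to 85 after it; at Frontier 15 the ceiling no longer applies.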

## Hygiene checks

| id | label | category | weight within category |
| --- | --- | --- | --- |
| `robots-ai-policy` | robots.txt AI-agent policy | crawler | 45% |
| `javascript-dependency` | JavaScript dependency | crawler | 40% |
| `paywall-detection` | Login / paywall wall | crawler | 15% |
| `accept-markdown-negotiation` | Accept: text/markdown negotiation | readability | 15% |
| `content-ratio` | Content-to-chrome ratio | readability | 45% |
| `token-efficiency` | Token efficiency | readability | 40% |
| `json-ld` | JSON-LD / schema.org | semantic | 30% |
| `open-graph` | Open Graph & Twitter Cards | semantic | 20% |
| `heading-hierarchy` | Heading hierarchy | semantic | 20% |
| `language-declaration` | Language declaration | semantic | 10% |
| `sitemap` | sitemap.xml | semantic | 20% |

## Frontier checks

| id | label | weight | applies to |
| --- | --- | --- | --- |
| `accept-markdown-negotiation` | Accept: text/markdown negotiation | 20% | both |
| `mcp-server-discovery` | MCP server discovery | 20% | both |
| `webmcp-tool-declaration` | WebMCP tool declarations | 20% | interactive |
| `json-ld` | Rich structured data | 20% | content |
| `documentation-surface` | Documentation surface | 10% | both |
| `content-signal-header` | Content-Signal response header | 10% | both |
| `a2a-agent-card` | A2A agent card | 10% | both |
| `llms-txt` | llms.txt / ai.txt intent signal | 5% | both |
| `md-mirror` | .md mirror | 5% | both |
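
The weights in this table total 120%, but only 100% applies to any one site type, because `webmcp-tool-declaration` and `json-ld` are mutually exclusive. The sketch below makes that explicit. It assumes each check yields a 0 to 100 score, that Frontier is the weighted mean of applicable checks, and that a missing check scores 0; none of this is confirmed by the rubric, and the function names are hypothetical.

```python
# check id -> (weight in percent, applies to), copied from the table above.
FRONTIER_CHECKS = {
    "accept-markdown-negotiation": (20, "both"),
    "mcp-server-discovery": (20, "both"),
    "webmcp-tool-declaration": (20, "interactive"),
    "json-ld": (20, "content"),
    "documentation-surface": (10, "both"),
    "content-signal-header": (10, "both"),
    "a2a-agent-card": (10, "both"),
    "llms-txt": (5, "both"),
    "md-mirror": (5, "both"),
}


def applicable_weights(site_type: str) -> dict:
    """Keep only the checks that apply to 'content' or 'interactive' sites."""
    return {
        check_id: weight
        for check_id, (weight, applies_to) in FRONTIER_CHECKS.items()
        if applies_to in ("both", site_type)
    }


def frontier_score(check_scores: dict, site_type: str) -> float:
    """Weighted mean of applicable checks; absent checks are assumed to score 0."""
    weights = applicable_weights(site_type)
    return sum(check_scores.get(cid, 0) * w for cid, w in weights.items()) / 100
```

Under these assumptions, a content site that fully passes only `accept-markdown-negotiation` and `llms-txt` would land at a Frontier score of 25.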

## Letter grades

| Composite | Grade | Meaning |
| --- | :---: | --- |
| 95–100 | A+ | Agentic-web leadership. Genuinely exceptional. |
| 88–94  | A  | Ahead of the curve on AI readability. |
| 78–87  | B  | Fundamentals solid; frontier features would push to A. |
| 65–77  | C  | Readable, but real gaps. Clear next steps. |
| 50–64  | D  | Serious issues; agents struggle here. |
| 0–49   | F  | Unreadable to most agents. |
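
The table translates directly into a lookup by lower bound. In this sketch the function name is hypothetical, and fractional composites are assumed to fall into the band whose lower bound they meet (so 87.5 would be a B), which the table does not state explicitly.

```python
# (lower bound, grade), highest band first, from the table above.
GRADE_FLOORS = [(95, "A+"), (88, "A"), (78, "B"), (65, "C"), (50, "D"), (0, "F")]


def letter_grade(composite: float) -> str:
    """Return the first grade whose lower bound the composite meets."""
    for floor, grade in GRADE_FLOORS:
        if composite >= floor:
            return grade
    raise ValueError("composite must be between 0 and 100")
```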
