A2WF Specification — Version 1.0
Version: 1.0
Date: 2026-03-18
Author: Wolfgang Wimmer / SSC Software Sales Consulting
Feedback: github.com/a2wf/spec/issues
License: MIT
1. Introduction
1.1. Abstract
This document defines the siteai.json format, Version 1.0, as part of the Agent-to-Web Framework (A2WF). It provides a machine-readable policy format for website operators to:
- Declare granular access policies defining what AI agents may and may not do on their digital properties.
- Require agent identification, human-in-the-loop verification for sensitive actions, and enforce rate limits.
- Reference applicable legal frameworks (EU AI Act, GDPR, CCPA) in machine-readable form.
The format complements existing web standards like robots.txt, sitemap.xml, MCP (Model Context Protocol), A2A (Agent-to-Agent Protocol), and in-page Schema.org markup. It leverages Schema.org vocabulary where appropriate and introduces specific structures for AI agent governance that no existing standard provides.
1.2. Problem Statement
AI agents increasingly interact with websites — browsing products, comparing prices, booking appointments, filling forms, extracting data. Website operators face a critical gap:
No AI Agent Access Governance — No standard exists that gives the website operator a machine-readable way to declare:
- What agents are ALLOWED to do (read catalogs, search, compare prices)
- What agents MUST NOT do (bulk scrape, fake reviews, unauthorized transactions)
- What requires HUMAN VERIFICATION (checkout, booking, contact forms)
- How agents must IDENTIFY themselves (name, operator, purpose)
- What LEGAL TERMS apply (Terms of Service, jurisdiction, regulatory compliance)
- What RATE LIMITS are enforced (per action, per minute, per hour)
Current agent-side standards (MCP, A2A, enterprise IAM) govern agents from the agent operator’s perspective. A2WF fills the gap by providing governance from the website operator’s perspective.
1.3. Relationship to Existing Standards
| Standard | Purpose | Perspective | Granularity |
|---|---|---|---|
| robots.txt | Crawl permissions | Website (binary) | Allow/disallow per path |
| sitemap.xml | URL listing | Content | URLs only |
| Schema.org | Structured data | Content (in-page) | Entity descriptions |
| MCP | Agent-to-tool connection | Agent side | Agent capabilities |
| A2A | Agent-to-agent comms | Agent side | Skills & coordination |
| llms.txt | Content guide for LLMs | Content | Curated page list |
| siteai.json | Site governance | WEBSITE OWNER | Per-action permissions |
1.4. Conventions
The key words “REQUIRED”, “MUST”, “MUST NOT”, “SHOULD”, “RECOMMENDED”, “MAY”, and “OPTIONAL” are to be interpreted as described in RFC 2119. Format: JSON (RFC 8259), UTF-8 encoded.
2. File Location and Discovery
AI agents MUST attempt discovery in this order:
- Root URL (preferred): https://{domain}/siteai.json
- robots.txt: SiteAI: https://example.com/siteai.json
- HTML Link: <link rel="siteai" type="application/json" href="/siteai.json">
- Well-Known URI: https://{domain}/.well-known/siteai.json
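The discovery order above can be sketched as a pure function that builds the candidate URLs an agent would try. This is a minimal illustration, not a reference implementation: the function name `discovery_candidates`, the case-insensitive matching of the `SiteAI:` directive, and the regex-based scan of the HTML are all assumptions made for the example.

```python
import re
from urllib.parse import urljoin


def discovery_candidates(domain, robots_txt="", html=""):
    """Return candidate siteai.json URLs in the discovery order listed above.

    robots_txt and html are the already-fetched texts (may be empty).
    """
    base = f"https://{domain}/"
    candidates = [urljoin(base, "siteai.json")]  # 1. root URL

    # 2. "SiteAI:" directive in robots.txt (case-insensitive match is an assumption)
    for line in robots_txt.splitlines():
        match = re.match(r"\s*siteai\s*:\s*(\S+)", line, re.IGNORECASE)
        if match:
            candidates.append(urljoin(base, match.group(1)))

    # 3. <link rel="siteai" href="..."> (simplified scan, not a full HTML parser)
    for tag in re.findall(r"<link\b[^>]*>", html, re.IGNORECASE):
        if re.search(r'rel\s*=\s*["\']siteai["\']', tag, re.IGNORECASE):
            href = re.search(r'href\s*=\s*["\']([^"\']+)["\']', tag, re.IGNORECASE)
            if href:
                candidates.append(urljoin(base, href.group(1)))

    # 4. well-known URI
    candidates.append(urljoin(base, ".well-known/siteai.json"))

    # Remove duplicates while keeping the first occurrence of each URL.
    return list(dict.fromkeys(candidates))
```

An agent would then request each candidate in turn and stop at the first one that returns a valid policy.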
File Serving Requirements
- Content-Type: application/json (REQUIRED)
- Encoding: UTF-8 (REQUIRED)
- HTTPS (RECOMMENDED)
- Appropriate Cache-Control headers (RECOMMENDED)
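A consumer or a deployment check could test a fetched response against these requirements roughly as follows. The helper name `check_serving` and the split into errors (REQUIRED items) and warnings (RECOMMENDED items) are assumptions of this sketch.

```python
def check_serving(url, headers, body):
    """Check a fetched siteai.json response against the serving requirements.

    url: the URL that was fetched; headers: dict of response headers;
    body: raw response bytes. Returns (errors, warnings).
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    errors, warnings = [], []

    # REQUIRED: Content-Type application/json (parameters such as charset ignored)
    media_type = lowered.get("content-type", "").split(";")[0].strip().lower()
    if media_type != "application/json":
        errors.append("Content-Type is not application/json")

    # REQUIRED: UTF-8 encoding
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        errors.append("body is not valid UTF-8")

    # RECOMMENDED: HTTPS and Cache-Control
    if not url.lower().startswith("https://"):
        warnings.append("not served over HTTPS")
    if "cache-control" not in lowered:
        warnings.append("no Cache-Control header")

    return errors, warnings
```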
3. Format Specification — Required Elements
3.1. Top-Level Structure
REQUIRED: specVersion (“1.0”), identity, permissions
RECOMMENDED: @context, agentIdentification, scraping
OPTIONAL: defaults, humanVerification, legal, discovery, metadata
Consumers MUST ignore any unrecognized keys (forward compatibility).
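The top-level rules (three required keys, unknown keys ignored) might be applied by a consumer as in the sketch below. The function name `load_policy` and the choice to drop unrecognized keys rather than keep them untouched are assumptions; the key names come from the lists above.

```python
import json

REQUIRED_KEYS = ("specVersion", "identity", "permissions")
KNOWN_KEYS = set(REQUIRED_KEYS) | {
    "@context", "agentIdentification", "scraping",
    "defaults", "humanVerification", "legal", "discovery", "metadata",
}


def load_policy(text):
    """Parse siteai.json text, enforce required keys, ignore unknown keys."""
    doc = json.loads(text)
    if not isinstance(doc, dict):
        raise ValueError("root must be a JSON object")

    missing = [key for key in REQUIRED_KEYS if key not in doc]
    if missing:
        raise ValueError("missing required keys: " + ", ".join(missing))

    # Forward compatibility: unrecognized keys are silently dropped, never rejected.
    return {key: value for key, value in doc.items() if key in KNOWN_KEYS}
```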
@context (RECOMMENDED)
The root object SHOULD include "@context": "https://schema.org". This enables interoperability with Schema.org vocabulary and JSON-LD processing tools.
3.2. identity Object (REQUIRED)
Provides core identifying and contextual information about the website.
- @type (String): RECOMMENDED. "WebSite" (Schema.org type)
- domain (String): REQUIRED. Canonical URL (schema:WebSite.url)
- name (String): REQUIRED. Official site/brand name (schema:WebSite.name)
- description (String): OPTIONAL. General site description (schema:WebSite.description)
- purpose (String): RECOMMENDED. AI-focused description of the site’s primary goal and audience (A2WF-specific)
- inLanguage (String): REQUIRED. BCP 47 language tag (schema:WebSite.inLanguage)
- category (String): RECOMMENDED. Values: "e-commerce", "healthcare", "restaurant", "news", "finance", "education", "government", "saas", "blog", "portfolio", "nonprofit", "entertainment"
- jurisdiction (String): RECOMMENDED. For example "EU", "US", "US-CA", "CH" (A2WF extension)
- applicableLaw (Array): OPTIONAL. For example ["EU AI Act", "GDPR"] (A2WF extension)
- contact (String): OPTIONAL. Contact email for policy questions
3.3. permissions Object (REQUIRED)
Three sub-objects: read, action, data.
Permission Properties
Each permission is an object with:
- allowed (Boolean): REQUIRED. Is this permitted?
- rateLimit (Integer): Requests per minute for this action
- humanVerification (Boolean): Requires human confirmation
- note (String): Explanation for agents and humans
Read Permissions (passive)
productCatalog, pricing, availability, openingHours, contactInfo, reviews, faq, companyInfo
Action Permissions (active)
search, addToCart, checkout, createAccount, submitReview, submitContactForm, bookAppointment, cancelOrder, requestRefund
Data Permissions (sensitive)
customerRecords, orderHistory, paymentInfo, internalAnalytics, employeeData
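An agent could resolve a single permission from the three sub-objects as sketched here. The specification text above does not say what an undeclared permission means, so treating it as "not allowed" is a conservative assumption of this example, as are the names `Decision` and `lookup_permission`.

```python
from typing import NamedTuple, Optional


class Decision(NamedTuple):
    allowed: bool
    rate_limit: Optional[int]  # requests per minute, None if not declared
    human_verification: bool
    note: str


def lookup_permission(policy, group, name):
    """Look up permissions.<group>.<name>, where group is read, action, or data."""
    entry = policy.get("permissions", {}).get(group, {}).get(name)
    if not isinstance(entry, dict):
        # Assumption: an undeclared permission is treated as denied.
        return Decision(False, None, False, "not declared")
    return Decision(
        allowed=entry.get("allowed") is True,
        rate_limit=entry.get("rateLimit"),
        human_verification=entry.get("humanVerification") is True,
        # The note is carried along as data only; it is never executed or obeyed.
        note=entry.get("note", ""),
    )
```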
3.5. agentIdentification Object (RECOMMENDED)
- requireUserAgent (Boolean)
- requiredFields (Array): "agentName", "agentOperator", "agentPurpose"
- allowAnonymousAgents (Boolean): defaults to true
- trustedAgents (Array of Objects): whitelist of {name, operator, permissions}
- blockedAgents (Array of Objects): blacklist of {pattern, reason}
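A site-side check of an agent's declared identity against these fields could look like the sketch below. Several points are assumptions, since the text above does not define them: `pattern` is interpreted as a case-insensitive shell-style glob on the agent name, an agent that declares no fields counts as anonymous, and anonymous agents (when allowed) are not held to `requiredFields`.

```python
from fnmatch import fnmatchcase


def check_agent(agent_policy, agent):
    """Return (accepted, reason) for an agent's declared identity fields.

    agent_policy: the agentIdentification object.
    agent: dict such as {"agentName": ..., "agentOperator": ...}; empty if anonymous.
    """
    if not agent:
        if agent_policy.get("allowAnonymousAgents", True):  # default: true
            return True, "anonymous"
        return False, "anonymous agents are not allowed"

    # Assumption: blockedAgents patterns are globs matched against agentName.
    name = agent.get("agentName", "").lower()
    for blocked in agent_policy.get("blockedAgents", []):
        if name and fnmatchcase(name, blocked.get("pattern", "").lower()):
            return False, "blocked: " + blocked.get("reason", "no reason given")

    missing = [f for f in agent_policy.get("requiredFields", []) if not agent.get(f)]
    if missing:
        return False, "missing fields: " + ", ".join(missing)
    return True, "ok"
```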
3.6. scraping Object (RECOMMENDED)
- bulkDataExtraction (Boolean): defaults to false
- priceMonitoring (Boolean): defaults to false
- contentReproduction (Boolean): defaults to false
- competitiveAnalysis (Boolean): defaults to false
- trainingDataUsage (Boolean): defaults to false
- note (String)
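Because every scraping flag defaults to false, a consumer can fill in omitted flags before evaluating them. A minimal sketch, with `resolve_scraping` as a hypothetical helper name:

```python
SCRAPING_FLAGS = (
    "bulkDataExtraction", "priceMonitoring", "contentReproduction",
    "competitiveAnalysis", "trainingDataUsage",
)


def resolve_scraping(policy):
    """Return all five scraping flags, applying the default of false."""
    declared = policy.get("scraping") or {}
    # Only an explicit JSON true grants permission; anything else resolves to False.
    return {flag: declared.get(flag, False) is True for flag in SCRAPING_FLAGS}
```

A site that omits the `scraping` object entirely therefore resolves to "nothing permitted" under this reading.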
4. Optional Governance Extensions
4.1. defaults Object
- agentAccess (String): "open", "restricted", or "minimal"
- requireIdentification (Boolean): defaults to false
- humanVerificationRequired (Boolean): defaults to false
- maxRequestsPerMinute (Integer)
- maxRequestsPerHour (Integer)
- respectRobotsTxt (Boolean): defaults to true
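One plausible way to combine these defaults with per-permission settings is shown below. The precedence rule (a permission's own `rateLimit` overrides `defaults.maxRequestsPerMinute`) is an assumption of this sketch; the text above does not state how the two interact. The function name `effective_rate_limit` is hypothetical.

```python
def effective_rate_limit(policy, group, name):
    """Return the requests-per-minute limit for one permission, or None.

    Assumed precedence: permissions.<group>.<name>.rateLimit first,
    then defaults.maxRequestsPerMinute, otherwise no declared limit.
    """
    entry = policy.get("permissions", {}).get(group, {}).get(name) or {}
    specific = entry.get("rateLimit")
    if isinstance(specific, int) and not isinstance(specific, bool):
        return specific
    return (policy.get("defaults") or {}).get("maxRequestsPerMinute")
```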
4.2. humanVerification Object
- methods (Array): "redirect-to-browser", "email-confirmation", "sms-otp"
- requiredFor (Array): Action names requiring verification
- note (String)
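Human verification can be signalled in three places: the per-permission `humanVerification` flag, the `requiredFor` list here, and `defaults.humanVerificationRequired`. The sketch below treats any one of them as sufficient, which is an assumption rather than a stated rule; `needs_human` is a hypothetical name.

```python
def needs_human(policy, action):
    """Return True if the named action should be confirmed by a human."""
    entry = policy.get("permissions", {}).get("action", {}).get(action) or {}
    if entry.get("humanVerification") is True:
        return True

    required_for = (policy.get("humanVerification") or {}).get("requiredFor", [])
    if action in required_for:
        return True

    # Fall back to the site-wide default (false when absent).
    return (policy.get("defaults") or {}).get("humanVerificationRequired") is True
```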
4.3. legal Object
- termsUrl (String): URL to AI-specific Terms of Service
- complianceNote (String)
- dataRetention (String)
- euAiActCompliance (Object):
  - transparencyRequired (Boolean)
  - riskClassification (String): "minimal", "limited", "high", "unacceptable"
  - humanOversightMandatory (Boolean)
4.4. discovery Object
- mcpEndpoint (String): URL to MCP server card
- a2aAgentCard (String): URL to A2A agent card
- robotsTxt (String): URL to robots.txt
- llmsTxt (String): URL to llms.txt
- schemaOrg (Boolean)
- openApi (String): URL to OpenAPI specification
4.5. metadata Object
- $schema (String): URL of JSON Schema for validation
- schemaVersion (String): Spec version (e.g. "1.0")
- generatedAt (String): RFC 3339 timestamp
- author (String)
- lastUpdated (String, ISO date)
- expiresAt (String, ISO date)
- changelogUrl (String)
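An agent that caches a policy could use `expiresAt` to decide when to refetch. In this sketch the policy is assumed to remain valid through the end of the stated date, and a missing `expiresAt` is assumed to mean "no declared expiry"; neither behaviour is specified above. `is_expired` is a hypothetical helper.

```python
from datetime import date


def is_expired(policy, today):
    """Return True if metadata.expiresAt lies strictly before today."""
    raw = (policy.get("metadata") or {}).get("expiresAt")
    if raw is None:
        return False  # assumption: no expiresAt means no declared expiry
    # Only the date part is used, so a full timestamp is also tolerated.
    return today > date.fromisoformat(raw[:10])
```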
5. Enforcement
5.1. Voluntary Compliance
Like robots.txt, A2WF relies primarily on voluntary compliance by reputable AI agents. Major agent vendors are expected to respect published policies as part of responsible AI deployment.
5.2. Technical Enforcement
Website operators MAY enforce policies through HTTP 403 responses, rate limiting, WAF rules, and User-Agent-based blocking.
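Server-side enforcement of a declared permission and its per-minute limit might be sketched as below. The sliding-window limiter, the class and function names, and the use of HTTP 429 for rate-limit rejections are choices made for this example; the paragraph above names only HTTP 403 and rate limiting in general.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per agent within a rolling time window."""

    def __init__(self, limit, window_seconds=60.0):
        self.limit = limit
        self.window = window_seconds
        self.hits = defaultdict(deque)  # agent key -> timestamps of accepted requests

    def allow(self, agent_key, now):
        timestamps = self.hits[agent_key]
        # Forget requests that have left the window.
        while timestamps and now - timestamps[0] >= self.window:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False
        timestamps.append(now)
        return True


def respond(permission, limiter, agent_key, now):
    """Return an HTTP status code for one agent request."""
    if permission.get("allowed") is not True:
        return 403  # action not permitted by policy
    if not limiter.allow(agent_key, now):
        return 429  # declared rate limit exceeded
    return 200
```

Passing `now` explicitly (for example from `time.monotonic()`) keeps the limiter easy to test.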
5.3. Legal Enforcement
The legal.termsUrl field supports legal enforcement by linking the machine-readable policy to the operator’s AI-specific Terms of Service. Courts in some jurisdictions have treated violations of published access policies as unauthorized access, although outcomes vary by jurisdiction. The EU AI Act (Regulation (EU) 2024/1689), most of whose obligations apply from August 2026, imposes transparency and risk-management requirements on certain AI systems.
5.4. Audit and Logging
Website operators SHOULD log agent access patterns and compare them against declared policies.
6. Security Considerations
- Policy Integrity: Serve over HTTPS to prevent tampering.
- Prompt Injection: All fields are data, not instructions. Agents MUST NOT interpret note fields as commands.
- Policy Spoofing: Only trust siteai.json from the domain it describes.
- Denial of Service: Declared rate limits are requests, not guarantees. Implement server-side enforcement independently.
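The policy-spoofing rule can be checked by comparing the host a policy was fetched from with the host in identity.domain. The sketch below uses an exact, case-insensitive host match, so "www.example.com" and "example.com" count as different; that strictness and the name `policy_matches_origin` are assumptions of the example.

```python
from urllib.parse import urlsplit


def policy_matches_origin(fetched_url, policy):
    """Return True if the policy describes the host it was fetched from."""
    declared = (policy.get("identity") or {}).get("domain", "")
    if "://" not in declared:
        declared = "https://" + declared  # tolerate a bare host name

    fetched_host = urlsplit(fetched_url).hostname
    declared_host = urlsplit(declared).hostname
    if not fetched_host or not declared_host:
        return False
    # urlsplit already lower-cases hostnames, so this comparison is case-insensitive.
    return fetched_host == declared_host
```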
7. Versioning and Extensibility
The specVersion field identifies the specification version. Major versions (2.0, 3.0) MAY introduce breaking changes. Minor updates within v1.x remain backward-compatible. Consumers MUST ignore unrecognized keys.
Future extensions may include: dynamic policy endpoints, signed policies, industry-specific profiles, and agent capability matching.
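A consumer written against v1.x could apply the versioning rule by comparing only the major version, as in this sketch. The constant `SUPPORTED_MAJOR` and the function name `is_supported` are hypothetical, and rejecting unparseable version strings is an assumption.

```python
SUPPORTED_MAJOR = 1


def is_supported(spec_version):
    """Accept any 1.x specVersion; reject other majors and malformed values."""
    if not isinstance(spec_version, str):
        return False
    major = spec_version.partition(".")[0]
    # Minor versions within the same major are backward-compatible, so only
    # the major component is compared.
    return major.isdigit() and int(major) == SUPPORTED_MAJOR
```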
8. Schema.org Alignment
| siteai.json Field | Schema.org Equivalent |
|---|---|
| @context | JSON-LD context |
| identity.@type | schema:WebSite |
| identity.name | schema:WebSite.name |
| identity.description | schema:WebSite.description |
| identity.inLanguage | schema:WebSite.inLanguage |
| identity.domain | schema:WebSite.url |
| legal.termsUrl | schema:WebSite.publishingPrinciples |
| permissions.* | A2WF extension |
| scraping.* | A2WF extension |
| agentIdentification.* | A2WF extension |
| humanVerification.* | A2WF extension |
9. File Ecosystem
| File | Purpose | Since |
|---|---|---|
| /robots.txt | Crawl permissions | 1994 |
| /sitemap.xml | URL listing for search engines | 2005 |
| /llms.txt | Content guide for LLMs | 2024 |
| /.well-known/mcp.json | MCP server discovery | 2024 |
| /siteai.json | AI agent access governance (A2WF) | 2025 |
10. Complete Example
{
  "@context": "https://schema.org",
  "specVersion": "1.0",
  "identity": {
    "@type": "WebSite",
    "domain": "https://www.example-store.com",
    "name": "Example Online Store",
    "description": "Premium widgets and gadgets",
    "purpose": "E-commerce store selling premium widgets to EU consumers.",
    "inLanguage": "en",
    "category": "e-commerce",
    "jurisdiction": "EU",
    "applicableLaw": ["EU AI Act", "GDPR"],
    "contact": "ai-policy@example-store.com"
  },
  "defaults": {
    "agentAccess": "restricted",
    "requireIdentification": true,
    "maxRequestsPerMinute": 30,
    "respectRobotsTxt": true
  },
  "permissions": {
    "read": {
      "productCatalog": { "allowed": true, "rateLimit": 60 },
      "pricing": { "allowed": true },
      "availability": { "allowed": true, "rateLimit": 30 },
      "reviews": { "allowed": true, "rateLimit": 20 },
      "faq": { "allowed": true }
    },
    "action": {
      "search": { "allowed": true, "rateLimit": 20 },
      "addToCart": { "allowed": true },
      "checkout": {
        "allowed": true,
        "humanVerification": true,
        "note": "Final purchase requires human confirmation."
      },
      "createAccount": { "allowed": false },
      "submitReview": { "allowed": false }
    },
    "data": {
      "customerRecords": { "allowed": false },
      "paymentInfo": { "allowed": false },
      "internalAnalytics": { "allowed": false }
    }
  },
  "scraping": {
    "bulkDataExtraction": false,
    "priceMonitoring": false,
    "trainingDataUsage": false
  },
  "agentIdentification": {
    "requireUserAgent": true,
    "requiredFields": ["agentName", "agentOperator"],
    "allowAnonymousAgents": false
  },
  "humanVerification": {
    "methods": ["redirect-to-browser"],
    "requiredFor": ["checkout"]
  },
  "discovery": {
    "robotsTxt": "https://www.example-store.com/robots.txt",
    "llmsTxt": "https://www.example-store.com/llms.txt",
    "schemaOrg": true
  },
  "legal": {
    "termsUrl": "https://www.example-store.com/legal/ai-terms",
    "euAiActCompliance": {
      "transparencyRequired": true,
      "riskClassification": "limited",
      "humanOversightMandatory": false
    }
  },
  "metadata": {
    "author": "Example Store Legal Team",
    "lastUpdated": "2026-03-18"
  }
}
11. References
- RFC 2119 — Key words for use in RFCs to Indicate Requirement Levels
- RFC 8259 — The JavaScript Object Notation (JSON) Data Interchange Format
- Schema.org — https://schema.org/
- robots.txt — https://www.robotstxt.org/
- EU AI Act — Regulation (EU) 2024/1689
- MCP — Model Context Protocol, Anthropic
- A2A — Agent-to-Agent Protocol, Google / Linux Foundation
- llms.txt — https://llmstxt.org/
- NIST AI RMF — https://www.nist.gov/artificial-intelligence
Full specification, JSON Schema, and examples: github.com/a2wf/spec