A2WF Specification — Version 1.0

Status: Public Draft v1.0 (Core)
Version: 1.0
Date: 2026-03-18
Author: Wolfgang Wimmer / SSC Software Sales Consulting
Feedback: github.com/a2wf/spec/issues
License: MIT

1. Introduction

1.1. Abstract

This document defines the siteai.json format, Version 1.0, as part of the Agent-to-Web Framework (A2WF). It provides a machine-readable policy format for website operators to:

Declare granular access policies defining what AI agents may and may not do on their digital properties.
Require agent identification and human-in-the-loop verification for sensitive actions, and enforce rate limits.
Reference applicable legal frameworks (EU AI Act, GDPR, CCPA) in machine-readable form.

The format complements existing web standards like robots.txt, sitemap.xml, MCP (Model Context Protocol), A2A (Agent-to-Agent Protocol), and in-page Schema.org markup. It leverages Schema.org vocabulary where appropriate and introduces specific structures for AI agent governance that no existing standard provides.

Note: Optional site description extensions (keySections, mainContact, publisher, company, services, etc.) are defined in the companion document “A2WF Site Description Extensions v1.0” and are not part of this core specification.

1.2. Problem Statement

AI agents increasingly interact with websites: browsing products, comparing prices, booking appointments, filling in forms, and extracting data. Website operators face a critical gap:

No AI Agent Access Governance — No standard exists that gives the website operator a machine-readable way to declare:

What agents are ALLOWED to do (read catalogs, search, compare prices)
What agents MUST NOT do (bulk scrape, fake reviews, unauthorized transactions)
What requires HUMAN VERIFICATION (checkout, booking, contact forms)
How agents must IDENTIFY themselves (name, operator, purpose)
What LEGAL TERMS apply (Terms of Service, jurisdiction, regulatory compliance)
What RATE LIMITS are enforced (per action, per minute, per hour)

Current agent-side standards (MCP, A2A, enterprise IAM) govern agents from the agent operator’s perspective. A2WF fills the gap by providing governance from the website operator’s perspective.

1.3. Relationship to Existing Standards

Standard	Purpose	Perspective	Granularity
robots.txt	Crawl permissions	Website (binary)	Allow/disallow per path
sitemap.xml	URL listing	Content	URLs only
Schema.org	Structured data	Content (in-page)	Entity descriptions
MCP	Agent-to-tool connection	Agent side	Agent capabilities
A2A	Agent-to-agent comms	Agent side	Skills & coordination
llms.txt	Content guide for LLMs	Content	Curated page list
siteai.json	Site governance	WEBSITE OWNER	Per-action permissions

1.4. Conventions

The key words “REQUIRED”, “MUST”, “MUST NOT”, “SHOULD”, “RECOMMENDED”, “MAY”, and “OPTIONAL” are to be interpreted as described in RFC 2119. The file format is JSON (RFC 8259), encoded as UTF-8.

2. File Location and Discovery

AI agents MUST attempt discovery in this order:

Root URL (preferred): https://{domain}/siteai.json
robots.txt directive: SiteAI: https://example.com/siteai.json
HTML Link: <link rel="siteai" type="application/json" href="/siteai.json">
Well-Known URI: https://{domain}/.well-known/siteai.json
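
The discovery order above can be sketched as a function that builds the list of candidate URLs an agent would try, in sequence. This is a minimal sketch, not a normative algorithm: the function name discovery_candidates, the case-insensitive matching of the SiteAI directive, and the de-duplication step are assumptions. The robots.txt and homepage contents are passed in as strings so the example needs no network access.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class _SiteAILinkFinder(HTMLParser):
    """Collects href values of <link rel="siteai"> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "siteai" in rels and attr.get("href"):
            self.hrefs.append(attr["href"])


def discovery_candidates(domain, robots_txt="", homepage_html=""):
    """Return candidate siteai.json URLs in the order listed in Section 2."""
    base = f"https://{domain}/"
    candidates = [urljoin(base, "siteai.json")]  # 1. root URL (preferred)

    # 2. "SiteAI:" lines in robots.txt (case-insensitive match is an assumption)
    for line in robots_txt.splitlines():
        match = re.match(r"\s*siteai\s*:\s*(\S+)", line, re.IGNORECASE)
        if match:
            candidates.append(urljoin(base, match.group(1)))

    # 3. <link rel="siteai" href="..."> in the homepage HTML
    finder = _SiteAILinkFinder()
    finder.feed(homepage_html)
    candidates.extend(urljoin(base, href) for href in finder.hrefs)

    # 4. well-known URI
    candidates.append(urljoin(base, ".well-known/siteai.json"))

    # Drop duplicates while keeping first-seen order (an assumption).
    return list(dict.fromkeys(candidates))
```

An agent would then request each candidate in turn and stop at the first one that returns a valid policy file.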

File Serving Requirements

Content-Type: application/json (REQUIRED)
Encoding: UTF-8 (REQUIRED)
HTTPS (RECOMMENDED)
Appropriate Cache-Control headers (RECOMMENDED)
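
As an illustration, a consumer or a linting tool could check a response against these serving requirements roughly as follows. The function name check_serving and the split into errors (REQUIRED items) and warnings (RECOMMENDED items) are assumptions made for this sketch.

```python
def check_serving(url, headers):
    """Return (errors, warnings) for the File Serving Requirements."""
    lower = {k.lower(): v for k, v in headers.items()}
    errors, warnings = [], []

    content_type = lower.get("content-type", "")
    media_type, _, params = content_type.partition(";")
    if media_type.strip().lower() != "application/json":
        errors.append("Content-Type must be application/json")

    # A charset parameter other than UTF-8 contradicts the encoding rule.
    charset = ""
    for param in params.split(";"):
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            charset = value.strip().strip('"').lower()
    if charset and charset != "utf-8":
        errors.append("encoding must be UTF-8")

    if not url.lower().startswith("https://"):
        warnings.append("HTTPS is recommended")
    if "cache-control" not in lower:
        warnings.append("Cache-Control header is recommended")
    return errors, warnings
```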

3. Format Specification — Required Elements

3.1. Top-Level Structure

REQUIRED: specVersion (“1.0”), identity, permissions

RECOMMENDED: @context, agentIdentification, scraping

OPTIONAL: defaults, humanVerification, legal, discovery, metadata

Consumers MUST ignore any unrecognized keys (forward compatibility).
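
A consumer might apply these rules along the following lines: report missing required keys, and silently drop anything it does not recognize. This is a sketch only; the names validate_top_level, REQUIRED_KEYS, and KNOWN_KEYS are hypothetical, and the value of specVersion is deliberately not checked here.

```python
REQUIRED_KEYS = ("specVersion", "identity", "permissions")
KNOWN_KEYS = set(REQUIRED_KEYS) | {
    "@context", "agentIdentification", "scraping",
    "defaults", "humanVerification", "legal", "discovery", "metadata",
}


def validate_top_level(policy):
    """Return (errors, known), where known holds only recognized keys."""
    errors = [
        f"missing required key: {key}"
        for key in REQUIRED_KEYS
        if key not in policy
    ]
    # Unrecognized keys are dropped silently, never reported as errors.
    known = {k: v for k, v in policy.items() if k in KNOWN_KEYS}
    return errors, known
```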

@context (RECOMMENDED)

The root object SHOULD include "@context": "https://schema.org". This enables interoperability with Schema.org vocabulary and JSON-LD processing tools.

3.2. identity Object (REQUIRED)

Provides core identifying and contextual information about the website.

@type (String) — RECOMMENDED. "WebSite" (Schema.org type)
domain (String) — REQUIRED. Canonical URL (schema:WebSite.url)
name (String) — REQUIRED. Official site/brand name (schema:WebSite.name)
description (String) — OPTIONAL. General site description (schema:WebSite.description)
purpose (String) — RECOMMENDED. AI-focused description of the site’s primary goal and audience. (A2WF-specific)
inLanguage (String) — REQUIRED. BCP 47 language tag (schema:WebSite.inLanguage)
category (String) — RECOMMENDED. Values: "e-commerce", "healthcare", "restaurant", "news", "finance", "education", "government", "saas", "blog", "portfolio", "nonprofit", "entertainment"
jurisdiction (String) — RECOMMENDED. "EU", "US", "US-CA", "CH" (A2WF extension)
applicableLaw (Array) — OPTIONAL. ["EU AI Act", "GDPR"] (A2WF extension)
contact (String) — OPTIONAL. Contact email for policy questions.
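
The requirement levels listed for the identity object could be checked with a small helper like the one below. The helper name check_identity and the choice to report RECOMMENDED fields as warnings are assumptions of this sketch; it does not validate the content of values such as BCP 47 tags.

```python
REQUIRED_IDENTITY = ("domain", "name", "inLanguage")
RECOMMENDED_IDENTITY = ("@type", "purpose", "category", "jurisdiction")


def check_identity(identity):
    """Return (errors, warnings) for an identity object."""
    errors = []
    for field in REQUIRED_IDENTITY:
        value = identity.get(field)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"identity.{field} is required")

    warnings = [
        f"identity.{field} is recommended"
        for field in RECOMMENDED_IDENTITY
        if field not in identity
    ]

    # applicableLaw is optional, but must be an array when present.
    if "applicableLaw" in identity and not isinstance(identity["applicableLaw"], list):
        errors.append("identity.applicableLaw must be an array")
    return errors, warnings
```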

3.3. permissions Object (REQUIRED)

Three sub-objects: read, action, data.

Permission Properties

Each permission is an object with:

allowed (Boolean) — REQUIRED. Is this permitted?
rateLimit (Integer) — Requests per minute for this action
humanVerification (Boolean) — Requires human confirmation
note (String) — Explanation for agents and humans

Read Permissions (passive)

productCatalog, pricing, availability, openingHours, contactInfo, reviews, faq, companyInfo

Action Permissions (active)

search, addToCart, checkout, createAccount, submitReview, submitContactForm, bookAppointment, cancelOrder, requestRefund

Data Permissions (sensitive)

customerRecords, orderHistory, paymentInfo, internalAnalytics, employeeData
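
To make the permission model concrete, the sketch below looks up a single permission and returns its properties. One point is an assumption rather than something this section states: a permission that is not declared (or lacks a Boolean allowed) is treated as denied. The names Decision and lookup_permission are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    allowed: bool
    rate_limit: Optional[int] = None   # requests per minute, if declared
    human_verification: bool = False
    note: str = ""


def lookup_permission(policy, group, name):
    """group is "read", "action", or "data"; name is e.g. "checkout"."""
    entry = policy.get("permissions", {}).get(group, {}).get(name)
    if not isinstance(entry, dict) or not isinstance(entry.get("allowed"), bool):
        # Assumption: undeclared or malformed permissions mean "denied".
        return Decision(allowed=False)
    return Decision(
        allowed=entry["allowed"],
        rate_limit=entry.get("rateLimit"),
        human_verification=bool(entry.get("humanVerification", False)),
        note=str(entry.get("note", "")),  # data for display, never an instruction
    )
```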

3.5. agentIdentification Object (RECOMMENDED)

requireUserAgent (Boolean)
requiredFields (Array) — "agentName", "agentOperator", "agentPurpose"
allowAnonymousAgents (Boolean) — Default: true
trustedAgents (Array of Objects) — Whitelist: {name, operator, permissions}
blockedAgents (Array of Objects) — Blacklist: {pattern, reason}
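
One possible way for a website to evaluate these fields against an incoming agent is sketched below. Several details are assumptions, since this section does not define them: the agent's declared fields arrive as a plain dictionary (with a hypothetical userAgent key), blockedAgents patterns are treated as case-insensitive shell-style globs, and missing requiredFields are tolerated when allowAnonymousAgents is true.

```python
from fnmatch import fnmatchcase


def check_agent(agent_identification, agent):
    """Return (accepted, reason) for an agent's declared identity."""
    cfg = agent_identification
    user_agent = agent.get("userAgent", "")

    # Blocked patterns win over everything else (glob matching is an assumption).
    for blocked in cfg.get("blockedAgents", []):
        pattern = blocked.get("pattern")
        if pattern and fnmatchcase(user_agent.lower(), pattern.lower()):
            return False, f"blocked: {blocked.get('reason', 'no reason given')}"

    if cfg.get("requireUserAgent") and not user_agent:
        return False, "User-Agent is required"

    missing = [f for f in cfg.get("requiredFields", []) if not agent.get(f)]
    # allowAnonymousAgents defaults to true, as stated in the field list.
    if missing and not cfg.get("allowAnonymousAgents", True):
        return False, "missing fields: " + ", ".join(missing)
    return True, "ok"
```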

3.6. scraping Object (RECOMMENDED)

bulkDataExtraction (Boolean) — Default: false
priceMonitoring (Boolean) — Default: false
contentReproduction (Boolean) — Default: false
competitiveAnalysis (Boolean) — Default: false
trainingDataUsage (Boolean) — Default: false
note (String)
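
Because every scraping flag defaults to false, a consumer can resolve the effective settings by overlaying the declared values on those defaults. The following is a minimal sketch; the names SCRAPING_DEFAULTS and effective_scraping are hypothetical, and ignoring non-Boolean values is an assumption.

```python
SCRAPING_DEFAULTS = {
    "bulkDataExtraction": False,
    "priceMonitoring": False,
    "contentReproduction": False,
    "competitiveAnalysis": False,
    "trainingDataUsage": False,
}


def effective_scraping(policy):
    """Overlay declared scraping flags on the documented defaults."""
    declared = policy.get("scraping", {})
    return {
        key: declared[key] if isinstance(declared.get(key), bool) else default
        for key, default in SCRAPING_DEFAULTS.items()
    }
```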

4. Optional Governance Extensions

4.1. defaults Object

agentAccess (String) — "open", "restricted", or "minimal"
requireIdentification (Boolean) — Default: false
humanVerificationRequired (Boolean) — Default: false
maxRequestsPerMinute (Integer)
maxRequestsPerHour (Integer)
respectRobotsTxt (Boolean) — Default: true

4.2. humanVerification Object

methods (Array) — "redirect-to-browser", "email-confirmation", "sms-otp"
requiredFor (Array) — Action names requiring verification
note (String)
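
Human verification can be signalled in three places in this specification: the per-permission humanVerification flag, the requiredFor array, and defaults.humanVerificationRequired. How they combine is not spelled out here, so the sketch below assumes that any one of them is sufficient to require verification. The function name is hypothetical.

```python
def needs_human_verification(policy, action):
    """Return True if any declared source requires verification for action."""
    entry = policy.get("permissions", {}).get("action", {}).get(action, {})
    if entry.get("humanVerification") is True:
        return True

    required_for = policy.get("humanVerification", {}).get("requiredFor", [])
    if action in required_for:
        return True

    # Site-wide fallback; the documented default is false.
    return policy.get("defaults", {}).get("humanVerificationRequired", False) is True
```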

4.3. legal Object

termsUrl (String) — URL to AI-specific Terms of Service
complianceNote (String)
dataRetention (String)
euAiActCompliance (Object):
- transparencyRequired (Boolean)
- riskClassification (String) — "minimal", "limited", "high", "unacceptable"
- humanOversightMandatory (Boolean)

4.4. discovery Object

mcpEndpoint (String) — URL to MCP server card
a2aAgentCard (String) — URL to A2A agent card
robotsTxt (String) — URL to robots.txt
llmsTxt (String) — URL to llms.txt
schemaOrg (Boolean)
openApi (String) — URL to OpenAPI specification

4.5. metadata Object

$schema (String) — URL of JSON Schema for validation
schemaVersion (String) — Spec version (e.g. "1.0")
generatedAt (String) — RFC 3339 timestamp
author (String)
lastUpdated (String, ISO date)
expiresAt (String, ISO date)
changelogUrl (String)
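
A consumer that caches policies may want to check expiresAt before relying on a stored copy. The sketch below assumes the value starts with a YYYY-MM-DD date and that the policy stays valid through that day; neither detail is fixed by the field list above, and the function name is_expired is hypothetical.

```python
from datetime import date, datetime, timezone


def is_expired(metadata, today=None):
    """Return True if metadata.expiresAt lies before today (UTC)."""
    expires = metadata.get("expiresAt")
    if not expires:
        return False  # no expiry declared
    today = today or datetime.now(timezone.utc).date()
    # Only the date part is used; a trailing time component is ignored.
    return date.fromisoformat(expires[:10]) < today
```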

5. Enforcement

5.1. Voluntary Compliance

Like robots.txt, A2WF relies primarily on voluntary compliance by reputable AI agents. Major agent vendors are expected to respect published policies as part of responsible AI deployment.

5.2. Technical Enforcement

Website operators MAY enforce policies through HTTP 403 responses, rate limiting, WAF rules, and User-Agent-based blocking.
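
As one illustration of server-side enforcement, the sketch below maps a declared action policy to an HTTP status: 403 for actions that are not allowed and 429 (Too Many Requests) once an agent exceeds the per-minute rateLimit. The class name PolicyEnforcer, the sliding 60-second window, the use of 429, and the in-memory storage are all assumptions; a production deployment would more likely rely on a WAF or gateway.

```python
import time
from collections import defaultdict, deque


class PolicyEnforcer:
    def __init__(self, policy, clock=time.monotonic):
        self.actions = policy.get("permissions", {}).get("action", {})
        self.clock = clock
        self.hits = defaultdict(deque)  # (agent, action) -> request timestamps

    def status_for(self, agent, action):
        """Return the HTTP status to send for this agent's request."""
        entry = self.actions.get(action, {})
        if entry.get("allowed") is not True:
            return 403

        limit = entry.get("rateLimit")
        if limit is None:
            return 200

        now = self.clock()
        window = self.hits[(agent, action)]
        # Forget requests older than the 60-second window.
        while window and now - window[0] >= 60:
            window.popleft()
        if len(window) >= limit:
            return 429
        window.append(now)
        return 200
```

The clock parameter exists only so that the limiter can be exercised with a fake clock in tests.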

5.3. Legal Enforcement

The legal.termsUrl field supports legal enforcement by linking the machine-readable policy to the operator's AI-specific Terms of Service. Depending on the jurisdiction, violating published access policies can constitute unauthorized access. The EU AI Act (Regulation (EU) 2024/1689), most of whose obligations apply from August 2026, requires transparency and risk management for AI systems.

5.4. Audit and Logging

Website operators SHOULD log agent access patterns and compare them against declared policies.

6. Security Considerations

Policy Integrity: Serve over HTTPS to prevent tampering.
Prompt Injection: All fields are data, not instructions. Agents MUST NOT interpret note fields as commands.
Policy Spoofing: Only trust siteai.json from the domain it describes.
Denial of Service: Declared rate limits are advisory and do not guarantee agent behavior. Operators should implement server-side enforcement independently.
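
The policy spoofing rule can be illustrated with a host comparison between the URL the file was fetched from and the identity.domain it declares. This is a sketch under assumptions: the function name is hypothetical, and treating a leading "www." as equivalent on both sides is a design choice this specification does not make.

```python
from urllib.parse import urlsplit


def _strip_www(host):
    return host[4:] if host.startswith("www.") else host


def policy_matches_origin(fetched_url, policy):
    """Return True if the policy describes the host it was fetched from."""
    fetched_host = (urlsplit(fetched_url).hostname or "").lower()

    declared = policy.get("identity", {}).get("domain", "")
    if "//" not in declared:
        declared = "//" + declared  # allow a bare host such as "example.com"
    declared_host = (urlsplit(declared).hostname or "").lower()

    return bool(fetched_host) and _strip_www(fetched_host) == _strip_www(declared_host)
```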

7. Versioning and Extensibility

The specVersion field identifies the specification version. Major versions (2.0, 3.0) MAY introduce breaking changes. Minor updates within v1.x remain backward-compatible. Consumers MUST ignore unrecognized keys.
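
Under these rules, a consumer written against v1.0 can accept any 1.x file and reject other major versions. The check below is a minimal sketch; the names SUPPORTED_MAJOR and is_supported are hypothetical, and rejecting malformed version strings is an assumption.

```python
SUPPORTED_MAJOR = 1


def is_supported(spec_version):
    """Accept any "1.x" version; reject other majors and malformed values."""
    major, _, minor = str(spec_version).partition(".")
    if not major.isdigit() or (minor and not minor.isdigit()):
        return False
    return int(major) == SUPPORTED_MAJOR
```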

Future extensions may include: dynamic policy endpoints, signed policies, industry-specific profiles, and agent capability matching.

8. Schema.org Alignment

siteai.json Field	Schema.org Equivalent
`@context`	JSON-LD context
`identity.@type`	schema:WebSite
`identity.name`	schema:WebSite.name
`identity.description`	schema:WebSite.description
`identity.inLanguage`	schema:WebSite.inLanguage
`identity.domain`	schema:WebSite.url
`legal.termsUrl`	schema:WebSite.publishingPrinciples
`permissions.*`	A2WF extension
`scraping.*`	A2WF extension
`agentIdentification.*`	A2WF extension
`humanVerification.*`	A2WF extension

9. File Ecosystem

File	Purpose	Since
`/robots.txt`	Crawl permissions	1994
`/sitemap.xml`	URL listing for search engines	2005
`/llms.txt`	Content guide for LLMs	2024
`/.well-known/mcp.json`	MCP server discovery	2024
`/siteai.json`	AI agent access governance (A2WF)	2025

10. Complete Example

{
  "@context": "https://schema.org",
  "specVersion": "1.0",
  "identity": {
    "@type": "WebSite",
    "domain": "https://www.example-store.com",
    "name": "Example Online Store",
    "description": "Premium widgets and gadgets",
    "purpose": "E-commerce store selling premium widgets to EU consumers.",
    "inLanguage": "en",
    "category": "e-commerce",
    "jurisdiction": "EU",
    "applicableLaw": ["EU AI Act", "GDPR"],
    "contact": "ai-policy@example-store.com"
  },
  "defaults": {
    "agentAccess": "restricted",
    "requireIdentification": true,
    "maxRequestsPerMinute": 30,
    "respectRobotsTxt": true
  },
  "permissions": {
    "read": {
      "productCatalog": { "allowed": true, "rateLimit": 60 },
      "pricing": { "allowed": true },
      "availability": { "allowed": true, "rateLimit": 30 },
      "reviews": { "allowed": true, "rateLimit": 20 },
      "faq": { "allowed": true }
    },
    "action": {
      "search": { "allowed": true, "rateLimit": 20 },
      "addToCart": { "allowed": true },
      "checkout": {
        "allowed": true,
        "humanVerification": true,
        "note": "Final purchase requires human confirmation."
      },
      "createAccount": { "allowed": false },
      "submitReview": { "allowed": false }
    },
    "data": {
      "customerRecords": { "allowed": false },
      "paymentInfo": { "allowed": false },
      "internalAnalytics": { "allowed": false }
    }
  },
  "scraping": {
    "bulkDataExtraction": false,
    "priceMonitoring": false,
    "trainingDataUsage": false
  },
  "agentIdentification": {
    "requireUserAgent": true,
    "requiredFields": ["agentName", "agentOperator"],
    "allowAnonymousAgents": false
  },
  "humanVerification": {
    "methods": ["redirect-to-browser"],
    "requiredFor": ["checkout"]
  },
  "discovery": {
    "robotsTxt": "https://www.example-store.com/robots.txt",
    "llmsTxt": "https://www.example-store.com/llms.txt",
    "schemaOrg": true
  },
  "legal": {
    "termsUrl": "https://www.example-store.com/legal/ai-terms",
    "euAiActCompliance": {
      "transparencyRequired": true,
      "riskClassification": "limited",
      "humanOversightMandatory": false
    }
  },
  "metadata": {
    "author": "Example Store Legal Team",
    "lastUpdated": "2026-03-18"
  }
}

11. References

RFC 2119 — Key words for use in RFCs to Indicate Requirement Levels
RFC 8259 — The JavaScript Object Notation (JSON) Data Interchange Format
Schema.org — https://schema.org/
robots.txt — https://www.robotstxt.org/
EU AI Act — Regulation (EU) 2024/1689
MCP — Model Context Protocol, Anthropic
A2A — Agent-to-Agent Protocol, Google / Linux Foundation
llms.txt — https://llmstxt.org/
NIST AI RMF — https://www.nist.gov/artificial-intelligence

Full specification, JSON Schema, and examples: github.com/a2wf/spec