REST API
The REST API lets you call any model, whether hosted on Cloudflare or by a third-party provider such as OpenAI, Anthropic, or Google, through the same Cloudflare API. All AI Gateway features (logging, caching, rate limiting, and more) are applied automatically.
No provider SDKs or API keys are needed. Authentication and billing are handled through your Cloudflare account. Third-party models are billed via Unified Billing, while Workers AI models follow Workers AI pricing.
Four endpoints are available, each suited to different use cases:
| Endpoint | Format | Use case |
|---|---|---|
| POST /ai/run | Envelope with model, input | All models and modalities (LLM, image, TTS, ASR) |
| POST /ai/v1/chat/completions | OpenAI chat completions | LLMs, OpenAI SDK compatible |
| POST /ai/v1/responses | OpenAI Responses API | Agentic workflows, OpenAI SDK compatible |
| POST /ai/v1/messages | Anthropic Messages API | LLMs, Anthropic SDK compatible |
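When building request URLs by hand, the table above can be captured as a small lookup. The following is a minimal TypeScript sketch, not part of any Cloudflare SDK: the short endpoint names (`run`, `chatCompletions`, `responses`, `messages`) and the `endpointUrl` helper are hypothetical, while the paths and base URL are the ones used in the examples on this page.

```typescript
// Base URL as used by the curl examples on this page.
const API_BASE = "https://api.cloudflare.com/client/v4";

// The four endpoint paths from the table, keyed by hypothetical short names.
const ENDPOINT_PATHS = {
  run: "/ai/run",
  chatCompletions: "/ai/v1/chat/completions",
  responses: "/ai/v1/responses",
  messages: "/ai/v1/messages",
} as const;

type EndpointName = keyof typeof ENDPOINT_PATHS;

// Build the full URL for one endpoint under a given account ID.
function endpointUrl(accountId: string, endpoint: EndpointName): string {
  return `${API_BASE}/accounts/${encodeURIComponent(accountId)}${ENDPOINT_PATHS[endpoint]}`;
}

console.log(endpointUrl("ACCOUNT_ID", "chatCompletions"));
```

Keeping the paths in one typed map means a misspelled endpoint name fails at compile time rather than producing a 404 at runtime.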
Authenticate with a Cloudflare API token that has AI Gateway permission. Pass it in the Authorization header.
Third-party models use the author/model format:
- openai/gpt-4.1 (OpenAI)
- anthropic/claude-sonnet-4 (Anthropic)
- google/gemini-3-flash (Google)
- xai/grok-3 (xAI)
Workers AI models use the @cf/author/model format (for example, @cf/moonshotai/kimi-k2.6). Workers AI requests also require the cf-aig-gateway-id header — refer to Call a Workers AI model for details.
Browse available models in the model catalog.
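To make the two naming formats concrete, here is a minimal TypeScript sketch that classifies a model ID as a Workers AI model or a third-party model. The `parseModelId` function and the `ModelRef` type are hypothetical names introduced for illustration; the only rules assumed are the `@cf/author/model` and `author/model` formats described above.

```typescript
// Hypothetical result type: which family a model ID belongs to, plus its parts.
type ModelRef = {
  kind: "workers-ai" | "third-party";
  author: string;
  name: string;
};

// Classify a model ID using the two formats described above.
function parseModelId(id: string): ModelRef {
  // Workers AI: @cf/author/model
  const workersAi = /^@cf\/([^/]+)\/(.+)$/.exec(id);
  if (workersAi) {
    const [, author = "", name = ""] = workersAi;
    return { kind: "workers-ai", author, name };
  }
  // Third-party: author/model (must not start with "@" or "/")
  const thirdParty = /^([^@/][^/]*)\/(.+)$/.exec(id);
  if (thirdParty) {
    const [, author = "", name = ""] = thirdParty;
    return { kind: "third-party", author, name };
  }
  throw new Error(`Unrecognized model ID: ${id}`);
}

console.log(parseModelId("openai/gpt-4.1"));
console.log(parseModelId("@cf/moonshotai/kimi-k2.6"));
```

A check like this is useful mainly for deciding up front whether a request needs the cf-aig-gateway-id header; the API itself remains the authority on which model IDs are valid.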
POST /ai/run accepts any model with its per-model schema. Model-specific parameters go inside input.
```bash
curl -X POST "https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run" \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "openai/gpt-4.1",
    "input": {
      "messages": [
        { "role": "user", "content": "What is Cloudflare?" }
      ],
      "max_tokens": 512
    }
  }'
```

To call a Workers AI model, use the @cf/ prefix in the model name and include the cf-aig-gateway-id header to specify which gateway to route through.
```bash
curl -X POST "https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run" \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --header "cf-aig-gateway-id: my-gateway" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "@cf/moonshotai/kimi-k2.6",
    "input": {
      "messages": [
        { "role": "user", "content": "What is Cloudflare?" }
      ]
    }
  }'
```

The existing Workers AI endpoint with the model ID in the URL path also continues to work:
```bash
curl -X POST "https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/@cf/moonshotai/kimi-k2.6" \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
    "messages": [
      { "role": "user", "content": "What is Cloudflare?" }
    ]
  }'
```

POST /ai/v1/chat/completions uses the standard OpenAI chat completions format. The model field uses the same author/model naming. This endpoint is compatible with the OpenAI SDK and other OpenAI-compatible clients.
```bash
curl -X POST "https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/v1/chat/completions" \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "openai/gpt-4.1",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "What is Cloudflare?" }
    ],
    "max_tokens": 512,
    "temperature": 0.7,
    "stream": true
  }'
```

Point the OpenAI SDK baseURL at the Cloudflare API:
```typescript
import OpenAI from "openai";

// The Cloudflare API token takes the place of an OpenAI API key.
const openai = new OpenAI({
  apiKey: CLOUDFLARE_API_TOKEN,
  baseURL: `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/ai/v1`,
});

const response = await openai.chat.completions.create({
  model: "openai/gpt-4.1",
  messages: [{ role: "user", content: "What is Cloudflare?" }],
});
```

POST /ai/v1/responses uses the OpenAI Responses API format for agentic workflows. It is compatible with the OpenAI SDK.
```typescript
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: CLOUDFLARE_API_TOKEN,
  baseURL: `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/ai/v1`,
});

const response = await openai.responses.create({
  model: "openai/gpt-4.1",
  input: "What is Cloudflare?",
});
```

POST /ai/v1/messages uses the Anthropic Messages API format. It is compatible with the Anthropic SDK.
```bash
curl -X POST "https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/v1/messages" \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "anthropic/claude-sonnet-4-5",
    "max_tokens": 512,
    "messages": [
      { "role": "user", "content": "What is Cloudflare?" }
    ]
  }'
```

Point the Anthropic SDK baseURL at the Cloudflare API:
```typescript
import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({
  apiKey: CLOUDFLARE_API_TOKEN,
  baseURL: `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/ai/v1`,
});

const message = await anthropic.messages.create({
  model: "anthropic/claude-sonnet-4-5",
  max_tokens: 512,
  messages: [{ role: "user", content: "What is Cloudflare?" }],
});
```

By default, third-party model requests route through your account's default AI Gateway. To use a specific gateway, include the cf-aig-gateway-id header. Workers AI requests always require this header.
```bash
curl -X POST "https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/v1/chat/completions" \
  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
  --header "cf-aig-gateway-id: my-gateway" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "anthropic/claude-sonnet-4",
    "messages": [
      { "role": "user", "content": "Hello" }
    ]
  }'
```

With the OpenAI SDK, set the header via defaultHeaders:
```typescript
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: CLOUDFLARE_API_TOKEN,
  baseURL: `https://api.cloudflare.com/client/v4/accounts/${ACCOUNT_ID}/ai/v1`,
  // Sent with every request made through this client.
  defaultHeaders: {
    "cf-aig-gateway-id": "my-gateway",
  },
});
```

All AI Gateway features configured on that gateway (caching, rate limiting, guardrails, and logging) apply to the request.
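For clients that do not use an SDK, the routing rules above can be folded into a small request builder for POST /ai/run. This is a rough TypeScript sketch under stated assumptions: `buildRunRequest`, `RunRequestOptions`, and `PreparedRequest` are hypothetical names, and the client-side check that rejects a Workers AI model without a gateway ID is an illustrative choice based on the rule described above, not documented API behavior.

```typescript
// Hypothetical option and result shapes for this sketch.
interface RunRequestOptions {
  accountId: string;
  apiToken: string;
  model: string;
  input: Record<string, unknown>;
  gatewayId?: string; // optional for third-party models
}

interface PreparedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build the URL, headers, and envelope body for POST /ai/run.
function buildRunRequest(opts: RunRequestOptions): PreparedRequest {
  // Assumption: fail early when a Workers AI model has no gateway ID,
  // since Workers AI requests require cf-aig-gateway-id.
  if (opts.model.startsWith("@cf/") && !opts.gatewayId) {
    throw new Error("Workers AI models require cf-aig-gateway-id");
  }
  const headers: Record<string, string> = {
    Authorization: `Bearer ${opts.apiToken}`,
    "Content-Type": "application/json",
  };
  // Without this header, third-party requests use the default gateway.
  if (opts.gatewayId) {
    headers["cf-aig-gateway-id"] = opts.gatewayId;
  }
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${opts.accountId}/ai/run`,
    method: "POST",
    headers,
    // Envelope format: model-specific parameters go inside input.
    body: JSON.stringify({ model: opts.model, input: opts.input }),
  };
}

const exampleRequest = buildRunRequest({
  accountId: "ACCOUNT_ID",
  apiToken: "API_TOKEN",
  model: "@cf/moonshotai/kimi-k2.6",
  input: { messages: [{ role: "user", content: "What is Cloudflare?" }] },
  gatewayId: "my-gateway",
});
console.log(exampleRequest.headers);
```

The returned object can be passed to `fetch` as `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`. Separating construction from sending keeps the routing logic easy to unit test without network access.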
Use cf-aig-* headers to control AI Gateway behavior on a per-request basis:
| Header | Type | Description |
|---|---|---|
| cf-aig-skip-cache | boolean | Skip the cache for this request. |
| cf-aig-cache-ttl | number | Cache TTL in seconds. |
| cf-aig-cache-key | string | Custom cache key. |
| cf-aig-collect-log | boolean | Turn logging on or off for this request. |
| cf-aig-request-timeout | number | Request timeout in milliseconds. |
| cf-aig-max-attempts | number | Retry attempts (max 5). |
| cf-aig-retry-delay | number | Retry delay in milliseconds (max 5000). |
| cf-aig-backoff | string | Backoff method: constant, linear, or exponential. |
| cf-aig-metadata | JSON string | Custom metadata to attach to the log entry. |
For more details on these options, refer to Request handling and Caching.
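As an illustration of the table above, the following TypeScript sketch turns a typed options object into cf-aig-* headers and checks the documented limits before sending. The `GatewayRequestOptions` interface and `gatewayHeaders` function are hypothetical names. Serializing booleans and numbers with `String()` and metadata with `JSON.stringify` is an assumption about the expected wire format, and the metadata value types shown are likewise assumed.

```typescript
// Hypothetical typed view of the per-request headers in the table.
interface GatewayRequestOptions {
  skipCache?: boolean;
  cacheTtl?: number; // seconds
  cacheKey?: string;
  collectLog?: boolean;
  requestTimeout?: number; // milliseconds
  maxAttempts?: number; // max 5
  retryDelay?: number; // milliseconds, max 5000
  backoff?: "constant" | "linear" | "exponential";
  metadata?: Record<string, string | number | boolean>; // assumed value types
}

// Convert options into cf-aig-* headers, enforcing the documented maximums.
function gatewayHeaders(o: GatewayRequestOptions): Record<string, string> {
  if (o.maxAttempts !== undefined && o.maxAttempts > 5) {
    throw new RangeError("cf-aig-max-attempts must be at most 5");
  }
  if (o.retryDelay !== undefined && o.retryDelay > 5000) {
    throw new RangeError("cf-aig-retry-delay must be at most 5000 ms");
  }
  const h: Record<string, string> = {};
  if (o.skipCache !== undefined) h["cf-aig-skip-cache"] = String(o.skipCache);
  if (o.cacheTtl !== undefined) h["cf-aig-cache-ttl"] = String(o.cacheTtl);
  if (o.cacheKey !== undefined) h["cf-aig-cache-key"] = o.cacheKey;
  if (o.collectLog !== undefined) h["cf-aig-collect-log"] = String(o.collectLog);
  if (o.requestTimeout !== undefined) h["cf-aig-request-timeout"] = String(o.requestTimeout);
  if (o.maxAttempts !== undefined) h["cf-aig-max-attempts"] = String(o.maxAttempts);
  if (o.retryDelay !== undefined) h["cf-aig-retry-delay"] = String(o.retryDelay);
  if (o.backoff !== undefined) h["cf-aig-backoff"] = o.backoff;
  if (o.metadata !== undefined) h["cf-aig-metadata"] = JSON.stringify(o.metadata);
  return h;
}

console.log(
  gatewayHeaders({ cacheTtl: 3600, maxAttempts: 3, backoff: "exponential" }),
);
```

The resulting object can be spread into the headers of a `fetch` call or passed per request through an SDK's header options.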
- Unified Billing: load credits and manage spend limits.
- Workers AI binding: call models from within a Cloudflare Worker using env.AI.run().
- Model catalog: browse models supported by the REST API.