Build and run AI applications
To build and deploy an AI application, you need compute for application logic, a way to run inference, and a gateway to manage costs across providers. Cloudflare Workers hosts your application logic and serves your frontend. Workers AI runs inference at the edge with pay-per-use pricing. AI Gateway adds caching, rate limiting, and observability across OpenAI, Anthropic, and other providers. Durable Objects coordinate stateful workflows and multi-turn conversations.
Build and deploy serverless applications on Cloudflare's global network. Learn more about Workers.
- Streaming responses - Stream AI responses token-by-token as they are generated, without buffering the full reply
- Full-stack deployment - Serve frontend and backend from a single deployment without managing separate infrastructure
Run inference on Cloudflare's global network via a Workers binding, with pay-per-use pricing. Learn more about Workers AI.
- Global inference - Run models at the Cloudflare location nearest to the user, reducing round-trip latency
- Pay-per-use pricing - No GPU reservations or idle costs; pay only for tokens processed
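As a rough illustration of how a Worker might call Workers AI through a binding and stream the reply, consider the sketch below. It assumes a binding named `AI` and uses `@cf/meta/llama-3.1-8b-instruct` as an example model ID. The `AiBinding` and `Env` interfaces, `buildChatInputs`, and `handleChat` are hypothetical names declared here only to keep the example self-contained; a real project would normally take its types from `@cloudflare/workers-types`.

```typescript
// Minimal stand-in for the Workers AI binding type (assumption: declared
// locally so the sketch does not depend on @cloudflare/workers-types).
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

interface Env {
  AI: AiBinding; // assumed binding name, set in the Wrangler configuration
}

// Example model ID (assumption); any text-generation model could be used.
const MODEL = "@cf/meta/llama-3.1-8b-instruct";

// Build the inputs for a chat request that asks for a streamed response.
function buildChatInputs(prompt: string): Record<string, unknown> {
  return {
    messages: [
      { role: "system", content: "You are a concise assistant." },
      { role: "user", content: prompt },
    ],
    stream: true,
  };
}

// Hypothetical handler: reads { prompt } from the JSON body and passes the
// model's stream straight through without buffering the full reply.
async function handleChat(request: Request, env: Env): Promise<Response> {
  const { prompt } = (await request.json()) as { prompt?: string };
  if (!prompt) {
    return new Response("Missing 'prompt'", { status: 400 });
  }
  const stream = await env.AI.run(MODEL, buildChatInputs(prompt));
  return new Response(stream as ReadableStream, {
    headers: { "content-type": "text/event-stream" },
  });
}
```

In a Worker, `handleChat` could serve as the `fetch` handler of the module's default export. With `stream: true`, the binding returns a stream of server-sent events that the client can read as tokens arrive.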
Proxy requests to any AI provider with caching, rate limiting, and unified analytics. Learn more about AI Gateway.
- Provider flexibility - Route requests to OpenAI, Anthropic, Workers AI, or any other provider through a single endpoint
- Unified observability - Track request volume, latency, costs, and errors across all providers in one place
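To make the single-endpoint idea concrete, here is a minimal sketch of building provider URLs that go through AI Gateway. It assumes the `https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/{provider}` URL layout. `gatewayUrl`, `buildOpenAiChatRequest`, the placeholder IDs, and the model name are hypothetical and exist only for illustration.

```typescript
type Provider = "openai" | "anthropic" | "workers-ai";

// Build a provider-specific URL that routes through a gateway.
// accountId and gatewayId are placeholders supplied by the caller.
function gatewayUrl(
  accountId: string,
  gatewayId: string,
  provider: Provider,
  path: string,
): string {
  const trimmed = path.replace(/^\/+/, ""); // tolerate a leading slash
  return `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}/${provider}/${trimmed}`;
}

// Hypothetical helper: assemble the URL and fetch options for an
// OpenAI-style chat completion sent via the gateway.
function buildOpenAiChatRequest(
  accountId: string,
  gatewayId: string,
  apiKey: string,
  prompt: string,
) {
  return {
    url: gatewayUrl(accountId, gatewayId, "openai", "chat/completions"),
    init: {
      method: "POST",
      headers: {
        authorization: `Bearer ${apiKey}`,
        "content-type": "application/json",
      },
      body: JSON.stringify({
        model: "gpt-4o-mini", // example model name (assumption)
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Same gateway, different providers: only the provider segment and path change.
const openAiUrl = gatewayUrl("ACCOUNT_ID", "my-gateway", "openai", "chat/completions");
const anthropicUrl = gatewayUrl("ACCOUNT_ID", "my-gateway", "anthropic", "/v1/messages");
```

The result of `buildOpenAiChatRequest` can be passed to `fetch(url, init)`. Because every request shares the gateway prefix, caching, rate limiting, and analytics apply uniformly regardless of the provider behind it.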
Run stateful objects with strongly consistent storage and built-in coordination. Learn more about Durable Objects.
- Stateful workflows - Coordinate multi-step AI pipelines and maintain conversation state across requests
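One way to picture conversation state in a Durable Object is a class that appends each message to a history kept in the object's storage. The sketch below assumes one object instance per conversation. `StorageLike` and `StateLike` are minimal stand-ins for the runtime's storage and state types, and `Conversation`, `ChatMessage`, and the `"history"` key are hypothetical names chosen for this example.

```typescript
// Minimal stand-ins for the Durable Object state and storage types
// (assumption: declared locally so the sketch is self-contained).
interface StorageLike {
  get<T>(key: string): Promise<T | undefined>;
  put<T>(key: string, value: T): Promise<void>;
}

interface StateLike {
  storage: StorageLike;
}

type ChatMessage = { role: "user" | "assistant"; content: string };

// Hypothetical Durable Object class holding one conversation's history.
class Conversation {
  state: StateLike;

  constructor(state: StateLike) {
    this.state = state;
  }

  // Append a message and persist the updated history.
  async append(message: ChatMessage): Promise<ChatMessage[]> {
    const history =
      (await this.state.storage.get<ChatMessage[]>("history")) ?? [];
    history.push(message);
    await this.state.storage.put("history", history);
    return history;
  }

  // Accept a JSON-encoded message and reply with the full history so far.
  async fetch(request: Request): Promise<Response> {
    const message = (await request.json()) as ChatMessage;
    const history = await this.append(message);
    return new Response(JSON.stringify(history), {
      headers: { "content-type": "application/json" },
    });
  }
}
```

A Worker would typically route every request for a given conversation ID to the same object, so each turn sees the messages stored by earlier turns and can pass the accumulated history to the model.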