Control costs and improve quality

AI inference costs can grow unpredictably as your application scales, especially when using multiple providers. Cloudflare AI Gateway caches identical queries to avoid redundant inference calls, applies rate limits per user or API key, and provides unified analytics across all providers.

Solutions

AI Gateway

Cache responses, rate limit requests, and monitor usage across providers. Learn more about AI Gateway.

  • Response caching - Cache identical queries so repeated prompts do not trigger a new inference call
  • Rate limiting - Set request limits per user or API key to prevent abuse and control spending
  • Unified analytics - Track usage, latency, and cost across all AI providers from one dashboard
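The caching behavior above can be sketched in code. This is a minimal, hedged example: it assumes an OpenAI-compatible provider routed through an AI Gateway base URL, and the `ACCOUNT_ID`/`GATEWAY_ID` values and the `gatewayUrl` helper are placeholders, not part of any SDK. The `cf-aig-cache-ttl` request header tells the gateway how long (in seconds) to serve identical requests from cache.

```typescript
// Build the per-gateway base URL that AI Gateway exposes for a provider.
// accountId and gatewayId come from your own Cloudflare dashboard.
function gatewayUrl(accountId: string, gatewayId: string, provider: string): string {
  return `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}/${provider}`;
}

// Send a chat completion through the gateway with response caching enabled.
// Identical prompts within the TTL window are answered from cache, so no
// new inference call (or provider charge) is made.
async function cachedChat(prompt: string): Promise<Response> {
  const base = gatewayUrl("ACCOUNT_ID", "GATEWAY_ID", "openai"); // placeholders
  return fetch(`${base}/chat/completions`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
      // Cache this response for one hour (value is in seconds).
      "cf-aig-cache-ttl": "3600",
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: prompt }],
    }),
  });
}
```

Because caching keys on the full request, any change to the prompt, model, or parameters produces a cache miss and a fresh inference call.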

Workers Analytics Engine

Store and query time-series analytics data from Workers. Learn more about Workers Analytics Engine.

  • Custom metrics - Build AI-specific dashboards tracking tokens, latency distributions, and error rates
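A custom-metrics write can be sketched as follows. This is an illustrative example, not a complete Worker: it assumes an Analytics Engine dataset bound to the Worker (the binding name `AI_METRICS` and the helper `toDataPoint` are hypothetical), and it uses the `writeDataPoint` call with the `blobs`/`doubles`/`indexes` layout that Analytics Engine expects.

```typescript
// A hypothetical per-request metric for an AI call.
interface AiMetric {
  model: string;    // which model served the request
  status: string;   // e.g. "ok" or "error"
  tokens: number;   // tokens consumed
  latencyMs: number; // end-to-end latency
}

// Map the metric onto Analytics Engine's data point shape:
// blobs hold string dimensions, doubles hold numeric values,
// and indexes hold the key used for sampling.
function toDataPoint(m: AiMetric) {
  return {
    blobs: [m.model, m.status],
    doubles: [m.tokens, m.latencyMs],
    indexes: [m.model],
  };
}

// Inside a Worker's fetch handler, with the dataset bound as AI_METRICS,
// you would then record one data point per inference call:
//
//   env.AI_METRICS.writeDataPoint(
//     toDataPoint({ model: "gpt-4o-mini", status: "ok", tokens: 512, latencyMs: 420 })
//   );
```

The recorded points can then be queried with Analytics Engine's SQL API to build the token, latency-distribution, and error-rate dashboards described above.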

Get started

  1. AI Gateway get started
  2. Configure caching
  3. Workers Analytics Engine get started