<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>Changelog | Workers AI</title><description>Updates to Workers AI</description><link>https://developers.cloudflare.com/workers-ai/changelog</link><item><title>Workers AI - Moonshot AI Kimi K2.5 now available on Workers AI</title><link>https://developers.cloudflare.com/workers-ai/changelog/#moonshot-ai-kimi-k25-now-available-on-workers-ai</link><guid isPermaLink="true">https://developers.cloudflare.com/workers-ai/changelog/#moonshot-ai-kimi-k25-now-available-on-workers-ai</guid><description>&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/kimi-k2.5/&quot;&gt;&lt;code&gt;@cf/moonshotai/kimi-k2.5&lt;/code&gt;&lt;/a&gt; is now available on Workers AI. It is a frontier-scale open-source model with a 256k context window, multi-turn tool calling, vision inputs, and structured outputs for agentic workloads. Read the &lt;a href=&quot;https://developers.cloudflare.com/changelog/post/2026-03-19-kimi-k2-5-workers-ai/&quot;&gt;changelog&lt;/a&gt; to get started.&lt;/li&gt;
&lt;li&gt;New &lt;a href=&quot;https://developers.cloudflare.com/workers-ai/features/prompt-caching/&quot;&gt;Prompt caching&lt;/a&gt; documentation. Send the &lt;code&gt;x-session-affinity&lt;/code&gt; header to route requests to the same model instance and maximize prefix cache hit rates across multi-turn conversations.&lt;/li&gt;
&lt;li&gt;Redesigned &lt;a href=&quot;https://developers.cloudflare.com/workers-ai/features/batch-api/&quot;&gt;Asynchronous Batch API&lt;/a&gt; with a pull-based system that processes queued requests as capacity becomes available, avoiding out-of-capacity errors for durable workflows.&lt;/li&gt;
&lt;/ul&gt;
</description><pubDate>Thu, 19 Mar 2026 00:00:00 GMT</pubDate></item><item><title>Workers AI - NVIDIA Nemotron 3 Super now available on Workers AI</title><link>https://developers.cloudflare.com/workers-ai/changelog/#nvidia-nemotron-3-super-now-available-on-workers-ai</link><guid isPermaLink="true">https://developers.cloudflare.com/workers-ai/changelog/#nvidia-nemotron-3-super-now-available-on-workers-ai</guid><description>&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/nemotron-3-120b-a12b/&quot;&gt;&lt;code&gt;@cf/nvidia/nemotron-3-120b-a12b&lt;/code&gt;&lt;/a&gt; is now available on Workers AI! It is a hybrid MoE model with 120B total parameters and 12B active parameters, optimized for multi-agent and agentic AI workloads. Read the &lt;a href=&quot;https://developers.cloudflare.com/changelog/post/2026-03-11-nemotron-3-super-workers-ai/&quot;&gt;changelog&lt;/a&gt; to get started.&lt;/li&gt;
&lt;/ul&gt;
</description><pubDate>Wed, 11 Mar 2026 00:00:00 GMT</pubDate></item><item><title>Workers AI - Deepgram Nova-3 now supports 10 languages with regional variants</title><link>https://developers.cloudflare.com/workers-ai/changelog/#deepgram-nova-3-now-supports-10-languages-with-regional-variants</link><guid isPermaLink="true">https://developers.cloudflare.com/workers-ai/changelog/#deepgram-nova-3-now-supports-10-languages-with-regional-variants</guid><description>&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/nova-3/&quot;&gt;&lt;code&gt;@cf/deepgram/nova-3&lt;/code&gt;&lt;/a&gt; now supports 10 languages with regional variants for real-time transcription. Supported languages include English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch — with regional variants like &lt;code&gt;en-GB&lt;/code&gt;, &lt;code&gt;fr-CA&lt;/code&gt;, and &lt;code&gt;pt-BR&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
</description><pubDate>Fri, 06 Mar 2026 00:00:00 GMT</pubDate></item><item><title>Workers AI - Chat Completions API support for gpt-oss models and tool calling improvements</title><link>https://developers.cloudflare.com/workers-ai/changelog/#chat-completions-api-support-for-gpt-oss-models-and-tool-calling-improvements</link><guid isPermaLink="true">https://developers.cloudflare.com/workers-ai/changelog/#chat-completions-api-support-for-gpt-oss-models-and-tool-calling-improvements</guid><description>&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/gpt-oss-120b/&quot;&gt;&lt;code&gt;@cf/openai/gpt-oss-120b&lt;/code&gt;&lt;/a&gt; and &lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/gpt-oss-20b/&quot;&gt;&lt;code&gt;@cf/openai/gpt-oss-20b&lt;/code&gt;&lt;/a&gt; now support Chat Completions API format. Use &lt;code&gt;/v1/chat/completions&lt;/code&gt; with a &lt;code&gt;messages&lt;/code&gt; array, or use &lt;code&gt;/ai/run&lt;/code&gt; which dynamically detects your input format and accepts Chat Completions (&lt;code&gt;messages&lt;/code&gt;), legacy Completions (&lt;code&gt;prompt&lt;/code&gt;), or Responses API (&lt;code&gt;input&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;[Bug fix]&lt;/strong&gt; Fixed a bug in the schema for multiple text generation models where the &lt;code&gt;content&lt;/code&gt; field in message objects only accepted string values. The field now properly accepts both string content and array content (structured content parts for multi-modal inputs). This fix applies to all affected chat models including GPT-OSS models, Llama 3.x, Mistral, Qwen, and others.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;[Bug fix]&lt;/strong&gt; Tool call round-trips now work correctly. The binding no longer rejects &lt;code&gt;tool_call_id&lt;/code&gt; values that it generated itself, fixing issues with multi-turn tool calling conversations.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;[Bug fix]&lt;/strong&gt; Assistant messages with &lt;code&gt;content: null&lt;/code&gt; and &lt;code&gt;tool_calls&lt;/code&gt; are now accepted in both the Workers AI binding and REST API (&lt;code&gt;/v1/chat/completions&lt;/code&gt;), fixing tool call round-trip failures.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;[Bug fix]&lt;/strong&gt; Streaming responses now correctly report &lt;code&gt;finish_reason&lt;/code&gt; only on the usage chunk, matching OpenAI&amp;#39;s streaming behavior and preventing duplicate finish events.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;[Bug fix]&lt;/strong&gt; &lt;code&gt;/v1/chat/completions&lt;/code&gt; now preserves original tool call IDs from models instead of regenerating them. Previously, the endpoint was generating new IDs which broke multi-turn tool calling because AI SDK clients could not match tool results to their original calls.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;[Bug fix]&lt;/strong&gt; &lt;code&gt;/v1/chat/completions&lt;/code&gt; now correctly reports &lt;code&gt;finish_reason: &amp;quot;tool_calls&amp;quot;&lt;/code&gt; in the final usage chunk when tools are used. Previously, it was hardcoding &lt;code&gt;finish_reason: &amp;quot;stop&amp;quot;&lt;/code&gt; which caused AI SDK clients to think the conversation was complete instead of executing tool calls.&lt;/li&gt;
&lt;/ul&gt;
</description><pubDate>Tue, 17 Feb 2026 00:00:00 GMT</pubDate></item><item><title>Workers AI - GLM-4.7-Flash, @cloudflare/tanstack-ai, and workers-ai-provider v3.1.1</title><link>https://developers.cloudflare.com/workers-ai/changelog/#glm-47-flash-cloudflaretanstack-ai-and-workers-ai-provider-v311</link><guid isPermaLink="true">https://developers.cloudflare.com/workers-ai/changelog/#glm-47-flash-cloudflaretanstack-ai-and-workers-ai-provider-v311</guid><description>&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/glm-4.7-flash/&quot;&gt;&lt;code&gt;@cf/zai-org/glm-4.7-flash&lt;/code&gt;&lt;/a&gt; is now available on Workers AI! It is a fast and efficient multilingual text generation model optimized for multi-turn tool calling across 100+ languages. Read the &lt;a href=&quot;https://developers.cloudflare.com/changelog/2026-02-13-glm-4.7-flash-workers-ai/&quot;&gt;changelog&lt;/a&gt; to get started.&lt;/li&gt;
&lt;li&gt;New &lt;a href=&quot;https://www.npmjs.com/package/@cloudflare/tanstack-ai&quot;&gt;&lt;code&gt;@cloudflare/tanstack-ai&lt;/code&gt;&lt;/a&gt; package for using Workers AI and AI Gateway with TanStack AI.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://www.npmjs.com/package/workers-ai-provider&quot;&gt;&lt;code&gt;workers-ai-provider v3.1.1&lt;/code&gt;&lt;/a&gt; adds transcription, text-to-speech, and reranking capabilities.&lt;/li&gt;
&lt;/ul&gt;
</description><pubDate>Fri, 13 Feb 2026 00:00:00 GMT</pubDate></item><item><title>Workers AI - Black Forest Labs FLUX.2 [klein] 9B now available</title><link>https://developers.cloudflare.com/workers-ai/changelog/#black-forest-labs-flux2-klein-9b-now-available</link><guid isPermaLink="true">https://developers.cloudflare.com/workers-ai/changelog/#black-forest-labs-flux2-klein-9b-now-available</guid><description>&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/flux-2-klein-9b/&quot;&gt;&lt;code&gt;@cf/black-forest-labs/flux-2-klein-9b&lt;/code&gt;&lt;/a&gt; is now available on Workers AI! Read the &lt;a href=&quot;https://developers.cloudflare.com/changelog/2026-01-28-flux-2-klein-9b-workers-ai/&quot;&gt;changelog&lt;/a&gt; to get started.&lt;/li&gt;
&lt;/ul&gt;
</description><pubDate>Wed, 28 Jan 2026 00:00:00 GMT</pubDate></item><item><title>Workers AI - Black Forest Labs FLUX.2 [klein] 4b now available</title><link>https://developers.cloudflare.com/workers-ai/changelog/#black-forest-labs-flux2-klein-4b-now-available</link><guid isPermaLink="true">https://developers.cloudflare.com/workers-ai/changelog/#black-forest-labs-flux2-klein-4b-now-available</guid><description>&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/flux-2-klein-4b/&quot;&gt;&lt;code&gt;@cf/black-forest-labs/flux-2-klein-4b&lt;/code&gt;&lt;/a&gt; is now available on Workers AI! Read the &lt;a href=&quot;https://developers.cloudflare.com/changelog/2026-01-15-flux-2-klein-4b-workers-ai/&quot;&gt;changelog&lt;/a&gt; to get started.&lt;/li&gt;
&lt;/ul&gt;
</description><pubDate>Thu, 15 Jan 2026 00:00:00 GMT</pubDate></item><item><title>Workers AI - Deepgram Flux promotional period over on Dec 8, 2025 - now has pricing</title><link>https://developers.cloudflare.com/workers-ai/changelog/#deepgram-flux-promotional-period-over-on-dec-8-2025---now-has-pricing</link><guid isPermaLink="true">https://developers.cloudflare.com/workers-ai/changelog/#deepgram-flux-promotional-period-over-on-dec-8-2025---now-has-pricing</guid><description>&lt;ul&gt;
&lt;li&gt;Check out updated pricing on the &lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/flux/&quot;&gt;&lt;code&gt;@cf/deepgram/flux&lt;/code&gt;&lt;/a&gt; model page or &lt;a href=&quot;https://developers.cloudflare.com/workers-ai/platform/pricing/&quot;&gt;pricing&lt;/a&gt; page&lt;/li&gt;
&lt;li&gt;Pricing will start Dec 8, 2025&lt;/li&gt;
&lt;/ul&gt;
</description><pubDate>Wed, 03 Dec 2025 00:00:00 GMT</pubDate></item><item><title>Workers AI - Black Forest Labs FLUX.2 dev now available</title><link>https://developers.cloudflare.com/workers-ai/changelog/#black-forest-labs-flux2-dev-now-available</link><guid isPermaLink="true">https://developers.cloudflare.com/workers-ai/changelog/#black-forest-labs-flux2-dev-now-available</guid><description>&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/flux-2-dev/&quot;&gt;&lt;code&gt;@cf/black-forest-labs/flux-2-dev&lt;/code&gt;&lt;/a&gt; is now available on Workers AI! Read the &lt;a href=&quot;https://developers.cloudflare.com/changelog/2025-11-25-flux-2-dev-workers-ai/&quot;&gt;changelog&lt;/a&gt; to get started.&lt;/li&gt;
&lt;/ul&gt;
</description><pubDate>Tue, 25 Nov 2025 00:00:00 GMT</pubDate></item><item><title>Workers AI - Qwen3 LLM and Embeddings available on Workers AI</title><link>https://developers.cloudflare.com/workers-ai/changelog/#qwen3-llm-and-embeddings-available-on-workers-ai</link><guid isPermaLink="true">https://developers.cloudflare.com/workers-ai/changelog/#qwen3-llm-and-embeddings-available-on-workers-ai</guid><description>&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/qwen3-30b-a3b-fp8/&quot;&gt;&lt;code&gt;@cf/qwen/qwen3-30b-a3b-fp8&lt;/code&gt;&lt;/a&gt; and &lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/qwen3-embedding-0.6b&quot;&gt;&lt;code&gt;@cf/qwen/qwen3-embedding-0.6b&lt;/code&gt;&lt;/a&gt; are now available on Workers AI.&lt;/li&gt;
&lt;/ul&gt;
</description><pubDate>Thu, 13 Nov 2025 00:00:00 GMT</pubDate></item><item><title>Workers AI - New voice and LLM models on Workers AI</title><link>https://developers.cloudflare.com/workers-ai/changelog/#new-voice-and-llm-models-on-workers-ai</link><guid isPermaLink="true">https://developers.cloudflare.com/workers-ai/changelog/#new-voice-and-llm-models-on-workers-ai</guid><description>&lt;ul&gt;
&lt;li&gt;Deepgram Aura 2 brings new text-to-speech capabilities to Workers AI. Refer to the &lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/aura-2-en/&quot;&gt;&lt;code&gt;@cf/deepgram/aura-2-en&lt;/code&gt;&lt;/a&gt; and &lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/aura-2-es/&quot;&gt;&lt;code&gt;@cf/deepgram/aura-2-es&lt;/code&gt;&lt;/a&gt; model pages for details on how to use the new models.&lt;/li&gt;
&lt;li&gt;An IBM Granite model is also available. This new LLM is small but mighty. Refer to the &lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/granite-4.0-h-micro/&quot;&gt;&lt;code&gt;@cf/ibm-granite/granite-4.0-h-micro&lt;/code&gt;&lt;/a&gt; model page for more details.&lt;/li&gt;
&lt;/ul&gt;
</description><pubDate>Tue, 21 Oct 2025 00:00:00 GMT</pubDate></item><item><title>Workers AI - Deepgram Flux now available on Workers AI</title><link>https://developers.cloudflare.com/workers-ai/changelog/#deepgram-flux-now-available-on-workers-ai</link><guid isPermaLink="true">https://developers.cloudflare.com/workers-ai/changelog/#deepgram-flux-now-available-on-workers-ai</guid><description>&lt;ul&gt;
&lt;li&gt;We&amp;#39;re excited to be a launch partner with Deepgram and offer their new Speech Recognition model built specifically for enabling voice agents. Check out &lt;a href=&quot;https://deepgram.com/flux&quot;&gt;Deepgram&amp;#39;s blog&lt;/a&gt; for more details on the release.&lt;/li&gt;
&lt;li&gt;Access the model through &lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/flux/&quot;&gt;&lt;code&gt;@cf/deepgram/flux&lt;/code&gt;&lt;/a&gt; and check out the &lt;a href=&quot;https://developers.cloudflare.com/changelog/2025-10-02-deepgram-flux/&quot;&gt;changelog&lt;/a&gt; for in-depth examples.&lt;/li&gt;
&lt;/ul&gt;
</description><pubDate>Thu, 02 Oct 2025 00:00:00 GMT</pubDate></item><item><title>Workers AI - New local models available on Workers AI</title><link>https://developers.cloudflare.com/workers-ai/changelog/#new-local-models-available-on-workers-ai</link><guid isPermaLink="true">https://developers.cloudflare.com/workers-ai/changelog/#new-local-models-available-on-workers-ai</guid><description>&lt;ul&gt;
&lt;li&gt;We&amp;#39;ve added support for some regional models on Workers AI in support of uplifting local AI labs and AI sovereignty. Check out the &lt;a href=&quot;https://blog.cloudflare.com/sovereign-ai-and-choice&quot;&gt;full blog post here&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/plamo-embedding-1b&quot;&gt;&lt;code&gt;@cf/pfnet/plamo-embedding-1b&lt;/code&gt;&lt;/a&gt; creates embeddings from Japanese text.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/gemma-sea-lion-v4-27b-it&quot;&gt;&lt;code&gt;@cf/aisingapore/gemma-sea-lion-v4-27b-it&lt;/code&gt;&lt;/a&gt; is a fine-tuned model that supports multiple Southeast Asian languages, including Burmese, English, Indonesian, Khmer, Lao, Malay, Mandarin, Tagalog, Tamil, Thai, and Vietnamese.&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/indictrans2-en-indic-1B&quot;&gt;&lt;code&gt;@cf/ai4bharat/indictrans2-en-indic-1B&lt;/code&gt;&lt;/a&gt; is a translation model that can translate between 22 Indic languages, including Bengali, Gujarati, Hindi, Tamil, and Sanskrit, as well as traditionally low-resourced languages like Kashmiri, Manipuri, and Sindhi.&lt;/li&gt;
&lt;/ul&gt;
</description><pubDate>Wed, 24 Sep 2025 00:00:00 GMT</pubDate></item><item><title>Workers AI - New document formats supported by Markdown conversion utility</title><link>https://developers.cloudflare.com/workers-ai/changelog/#new-document-formats-supported-by-markdown-conversion-utility</link><guid isPermaLink="true">https://developers.cloudflare.com/workers-ai/changelog/#new-document-formats-supported-by-markdown-conversion-utility</guid><description>&lt;ul&gt;
&lt;li&gt;Our &lt;a href=&quot;https://developers.cloudflare.com/workers-ai/features/markdown-conversion/&quot;&gt;Markdown conversion utility&lt;/a&gt; now supports converting &lt;code&gt;.docx&lt;/code&gt; and &lt;code&gt;.odt&lt;/code&gt; files.&lt;/li&gt;
&lt;/ul&gt;
</description><pubDate>Tue, 23 Sep 2025 00:00:00 GMT</pubDate></item><item><title>Workers AI - Model Catalog updates (types, EmbeddingGemma, model deprecation)</title><link>https://developers.cloudflare.com/workers-ai/changelog/#model-catalog-updates-types-embeddinggemma-model-deprecation</link><guid isPermaLink="true">https://developers.cloudflare.com/workers-ai/changelog/#model-catalog-updates-types-embeddinggemma-model-deprecation</guid><description>&lt;ul&gt;
&lt;li&gt;Workers AI types are updated in the upcoming wrangler release. Run &lt;code&gt;npm i -D wrangler@latest&lt;/code&gt; to update your packages.&lt;/li&gt;
&lt;li&gt;EmbeddingGemma model accuracy has been improved. We recommend re-indexing your data to take advantage of the improved accuracy.&lt;/li&gt;
&lt;li&gt;Some older Workers AI models are being deprecated on October 1st, 2025. We recommend you use newer models such as &lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/llama-4-scout-17b-16e-instruct/&quot;&gt;Llama 4&lt;/a&gt; and &lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/gpt-oss-120b/&quot;&gt;gpt-oss&lt;/a&gt;. The following models are being deprecated:&lt;ul&gt;
&lt;li&gt;@hf/thebloke/zephyr-7b-beta-awq&lt;/li&gt;
&lt;li&gt;@hf/thebloke/mistral-7b-instruct-v0.1-awq&lt;/li&gt;
&lt;li&gt;@hf/thebloke/llama-2-13b-chat-awq&lt;/li&gt;
&lt;li&gt;@hf/thebloke/openhermes-2.5-mistral-7b-awq&lt;/li&gt;
&lt;li&gt;@hf/thebloke/neural-chat-7b-v3-1-awq&lt;/li&gt;
&lt;li&gt;@hf/thebloke/llamaguard-7b-awq&lt;/li&gt;
&lt;li&gt;@hf/thebloke/deepseek-coder-6.7b-base-awq&lt;/li&gt;
&lt;li&gt;@hf/thebloke/deepseek-coder-6.7b-instruct-awq&lt;/li&gt;
&lt;li&gt;@cf/deepseek-ai/deepseek-math-7b-instruct&lt;/li&gt;
&lt;li&gt;@cf/openchat/openchat-3.5-0106&lt;/li&gt;
&lt;li&gt;@cf/tiiuae/falcon-7b-instruct&lt;/li&gt;
&lt;li&gt;@cf/thebloke/discolm-german-7b-v1-awq&lt;/li&gt;
&lt;li&gt;@cf/qwen/qwen1.5-0.5b-chat&lt;/li&gt;
&lt;li&gt;@cf/qwen/qwen1.5-7b-chat-awq&lt;/li&gt;
&lt;li&gt;@cf/qwen/qwen1.5-14b-chat-awq&lt;/li&gt;
&lt;li&gt;@cf/tinyllama/tinyllama-1.1b-chat-v1.0&lt;/li&gt;
&lt;li&gt;@cf/qwen/qwen1.5-1.8b-chat&lt;/li&gt;
&lt;li&gt;@hf/nexusflow/starling-lm-7b-beta&lt;/li&gt;
&lt;li&gt;@cf/fblgit/una-cybertron-7b-v2-bf16&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
</description><pubDate>Thu, 18 Sep 2025 00:00:00 GMT</pubDate></item><item><title>Workers AI - Introducing EmbeddingGemma from Google</title><link>https://developers.cloudflare.com/workers-ai/changelog/#introducing-embeddinggemma-from-google</link><guid isPermaLink="true">https://developers.cloudflare.com/workers-ai/changelog/#introducing-embeddinggemma-from-google</guid><description>&lt;ul&gt;
&lt;li&gt;We’re excited to be a launch partner alongside Google to bring their newest embedding model to Workers AI. EmbeddingGemma delivers best-in-class performance for its size, enabling RAG and semantic search use cases. Take a look at &lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/embeddinggemma-300m&quot;&gt;&lt;code&gt;@cf/google/embeddinggemma-300m&lt;/code&gt;&lt;/a&gt; for more details. It is now also available for embedding in AI Search.&lt;/li&gt;
&lt;/ul&gt;
</description><pubDate>Fri, 05 Sep 2025 00:00:00 GMT</pubDate></item><item><title>Workers AI - Introducing Partner models to the Workers AI catalog</title><link>https://developers.cloudflare.com/workers-ai/changelog/#introducing-partner-models-to-the-workers-ai-catalog</link><guid isPermaLink="true">https://developers.cloudflare.com/workers-ai/changelog/#introducing-partner-models-to-the-workers-ai-catalog</guid><description>&lt;ul&gt;
&lt;li&gt;Read the &lt;a href=&quot;https://blog.cloudflare.com/workers-ai-partner-models&quot;&gt;blog&lt;/a&gt; for more details&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/aura-1&quot;&gt;&lt;code&gt;@cf/deepgram/aura-1&lt;/code&gt;&lt;/a&gt; is a text-to-speech model that allows you to input text and have it come to life in a customizable voice&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/nova-3&quot;&gt;&lt;code&gt;@cf/deepgram/nova-3&lt;/code&gt;&lt;/a&gt; is a speech-to-text model that transcribes multilingual audio at a blazingly fast speed&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/smart-turn-v2&quot;&gt;&lt;code&gt;@cf/pipecat-ai/smart-turn-v2&lt;/code&gt;&lt;/a&gt; helps you detect when someone is done speaking&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/lucid-origin&quot;&gt;&lt;code&gt;@cf/leonardo/lucid-origin&lt;/code&gt;&lt;/a&gt; is a text-to-image model that generates images with sharp graphic design, stunning full-HD renders, or highly specific creative direction&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/phoenix-1.0&quot;&gt;&lt;code&gt;@cf/leonardo/phoenix-1.0&lt;/code&gt;&lt;/a&gt; is a text-to-image model with exceptional prompt adherence and coherent text&lt;/li&gt;
&lt;li&gt;WebSocket support added for audio models like &lt;code&gt;@cf/deepgram/aura-1&lt;/code&gt;, &lt;code&gt;@cf/deepgram/nova-3&lt;/code&gt;, &lt;code&gt;@cf/pipecat-ai/smart-turn-v2&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
</description><pubDate>Wed, 27 Aug 2025 00:00:00 GMT</pubDate></item><item><title>Workers AI - Adding gpt-oss models to our catalog</title><link>https://developers.cloudflare.com/workers-ai/changelog/#adding-gpt-oss-models-to-our-catalog</link><guid isPermaLink="true">https://developers.cloudflare.com/workers-ai/changelog/#adding-gpt-oss-models-to-our-catalog</guid><description>&lt;ul&gt;
&lt;li&gt;Check out the &lt;a href=&quot;https://blog.cloudflare.com/openai-gpt-oss-on-workers-ai&quot;&gt;blog&lt;/a&gt; for more details about the new models&lt;/li&gt;
&lt;li&gt;Take a look at the &lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/gpt-oss-120b&quot;&gt;&lt;code&gt;gpt-oss-120b&lt;/code&gt;&lt;/a&gt; and &lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/gpt-oss-20b&quot;&gt;&lt;code&gt;gpt-oss-20b&lt;/code&gt;&lt;/a&gt; model pages for more information about schemas, pricing, and context windows&lt;/li&gt;
&lt;/ul&gt;
</description><pubDate>Tue, 05 Aug 2025 00:00:00 GMT</pubDate></item><item><title>Workers AI - Pricing correction for @cf/myshell-ai/melotts</title><link>https://developers.cloudflare.com/workers-ai/changelog/#pricing-correction-for-cfmyshell-aimelotts</link><guid isPermaLink="true">https://developers.cloudflare.com/workers-ai/changelog/#pricing-correction-for-cfmyshell-aimelotts</guid><description>&lt;ul&gt;
&lt;li&gt;We&amp;#39;ve updated our documentation to reflect the correct pricing for melotts: $0.0002 per audio minute, which is cheaper than initially stated. The documentation previously stated, incorrectly, that users would be charged based on input tokens.&lt;/li&gt;
&lt;/ul&gt;
</description><pubDate>Wed, 09 Apr 2025 00:00:00 GMT</pubDate></item><item><title>Workers AI - Minor updates to the model schema for llama-3.2-1b-instruct, whisper-large-v3-turbo, llama-guard</title><link>https://developers.cloudflare.com/workers-ai/changelog/#minor-updates-to-the-model-schema-for-llama-32-1b-instruct-whisper-large-v3-turbo-llama-guard</link><guid isPermaLink="true">https://developers.cloudflare.com/workers-ai/changelog/#minor-updates-to-the-model-schema-for-llama-32-1b-instruct-whisper-large-v3-turbo-llama-guard</guid><description>&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/llama-3.2-1b-instruct/&quot;&gt;llama-3.2-1b-instruct&lt;/a&gt; - updated context window to the accurate 60,000&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/whisper-large-v3-turbo/&quot;&gt;whisper-large-v3-turbo&lt;/a&gt; - new hyperparameters available&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/llama-guard-3-8b/&quot;&gt;llama-guard-3-8b&lt;/a&gt; - the messages array must alternate between &lt;code&gt;user&lt;/code&gt; and &lt;code&gt;assistant&lt;/code&gt; to function correctly&lt;/li&gt;
&lt;/ul&gt;
</description><pubDate>Mon, 17 Mar 2025 00:00:00 GMT</pubDate></item><item><title>Workers AI - Workers AI bug fixes</title><link>https://developers.cloudflare.com/workers-ai/changelog/#workers-ai-bug-fixes</link><guid isPermaLink="true">https://developers.cloudflare.com/workers-ai/changelog/#workers-ai-bug-fixes</guid><description>&lt;ul&gt;
&lt;li&gt;We fixed a bug where &lt;code&gt;max_tokens&lt;/code&gt; defaults were not properly being respected - &lt;code&gt;max_tokens&lt;/code&gt; now correctly defaults to &lt;code&gt;256&lt;/code&gt; as displayed on the model pages. Users relying on the previous behaviour may observe this as a breaking change. If you want to generate more tokens, please set the &lt;code&gt;max_tokens&lt;/code&gt; parameter to what you need.&lt;/li&gt;
&lt;li&gt;We updated model pages to show context windows. A context window is the number of tokens used in the prompt plus the number of tokens used in the response. If your prompt and response tokens together exceed the context window, the request will return an error. Set &lt;code&gt;max_tokens&lt;/code&gt; according to your prompt length and the model&amp;#39;s context window to ensure a successful response.&lt;/li&gt;
&lt;/ul&gt;
</description><pubDate>Fri, 21 Feb 2025 00:00:00 GMT</pubDate></item><item><title>Workers AI - Workers AI Birthday Week 2024 announcements</title><link>https://developers.cloudflare.com/workers-ai/changelog/#workers-ai-birthday-week-2024-announcements</link><guid isPermaLink="true">https://developers.cloudflare.com/workers-ai/changelog/#workers-ai-birthday-week-2024-announcements</guid><description>&lt;ul&gt;
&lt;li&gt;Meta Llama 3.2 1B, 3B, and 11B vision are now available on Workers AI&lt;/li&gt;
&lt;li&gt;&lt;code&gt;@cf/black-forest-labs/flux-1-schnell&lt;/code&gt; is now available on Workers AI&lt;/li&gt;
&lt;li&gt;Workers AI is fast! Powered by new GPUs and optimizations, you can expect faster inference on Llama 3.1, Llama 3.2, and FLUX models.&lt;/li&gt;
&lt;li&gt;No more neurons. Workers AI is moving towards &lt;a href=&quot;https://developers.cloudflare.com/workers-ai/platform/pricing&quot;&gt;unit-based pricing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Model pages get a refresh with better documentation on parameters, pricing, and model capabilities&lt;/li&gt;
&lt;li&gt;Closed beta for our Run Any* Model feature, &lt;a href=&quot;https://forms.gle/h7FcaTF4Zo5dzNb68&quot;&gt;sign up here&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Check out the &lt;a href=&quot;https://blog.cloudflare.com/workers-ai&quot;&gt;product announcements blog post&lt;/a&gt; for more information&lt;/li&gt;
&lt;li&gt;And the &lt;a href=&quot;https://blog.cloudflare.com/workers-ai/making-workers-ai-faster&quot;&gt;technical blog post&lt;/a&gt; if you want to learn about how we made Workers AI fast&lt;/li&gt;
&lt;/ul&gt;
</description><pubDate>Thu, 26 Sep 2024 00:00:00 GMT</pubDate></item><item><title>Workers AI - Meta Llama 3.1 now available on Workers AI</title><link>https://developers.cloudflare.com/workers-ai/changelog/#meta-llama-31-now-available-on-workers-ai</link><guid isPermaLink="true">https://developers.cloudflare.com/workers-ai/changelog/#meta-llama-31-now-available-on-workers-ai</guid><description>&lt;p&gt;Workers AI now supports &lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/llama-3.1-8b-instruct/&quot;&gt;Meta Llama 3.1&lt;/a&gt;.&lt;/p&gt;
</description><pubDate>Tue, 23 Jul 2024 00:00:00 GMT</pubDate></item><item><title>Workers AI - Introducing embedded function calling</title><link>https://developers.cloudflare.com/workers-ai/changelog/#introducing-embedded-function-calling</link><guid isPermaLink="true">https://developers.cloudflare.com/workers-ai/changelog/#introducing-embedded-function-calling</guid><description>&lt;ul&gt;
&lt;li&gt;A new way to do function calling with &lt;a href=&quot;https://developers.cloudflare.com/workers-ai/function-calling/embedded&quot;&gt;Embedded function calling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Published new &lt;a href=&quot;https://www.npmjs.com/package/@cloudflare/ai-utils&quot;&gt;&lt;code&gt;@cloudflare/ai-utils&lt;/code&gt;&lt;/a&gt; npm package&lt;/li&gt;
&lt;li&gt;Open-sourced &lt;a href=&quot;https://github.com/cloudflare/ai-utils&quot;&gt;&lt;code&gt;ai-utils&lt;/code&gt; on GitHub&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description><pubDate>Thu, 27 Jun 2024 00:00:00 GMT</pubDate></item><item><title>Workers AI - Added support for traditional function calling</title><link>https://developers.cloudflare.com/workers-ai/changelog/#added-support-for-traditional-function-calling</link><guid isPermaLink="true">https://developers.cloudflare.com/workers-ai/changelog/#added-support-for-traditional-function-calling</guid><description>&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://developers.cloudflare.com/workers-ai/function-calling/&quot;&gt;Function calling&lt;/a&gt; is now supported on enabled models&lt;/li&gt;
&lt;li&gt;Properties added on &lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/&quot;&gt;models&lt;/a&gt; page to show which models support function calling&lt;/li&gt;
&lt;/ul&gt;
</description><pubDate>Wed, 19 Jun 2024 00:00:00 GMT</pubDate></item><item><title>Workers AI - Native support for AI Gateways</title><link>https://developers.cloudflare.com/workers-ai/changelog/#native-support-for-ai-gateways</link><guid isPermaLink="true">https://developers.cloudflare.com/workers-ai/changelog/#native-support-for-ai-gateways</guid><description>&lt;p&gt;Workers AI now natively supports &lt;a href=&quot;https://developers.cloudflare.com/ai-gateway/usage/providers/workersai/#worker&quot;&gt;AI Gateway&lt;/a&gt;.&lt;/p&gt;
</description><pubDate>Tue, 18 Jun 2024 00:00:00 GMT</pubDate></item><item><title>Workers AI - Deprecation announcement for `@cf/meta/llama-2-7b-chat-int8`</title><link>https://developers.cloudflare.com/workers-ai/changelog/#deprecation-announcement-for-cfmetallama-2-7b-chat-int8</link><guid isPermaLink="true">https://developers.cloudflare.com/workers-ai/changelog/#deprecation-announcement-for-cfmetallama-2-7b-chat-int8</guid><description>&lt;p&gt;We will be deprecating &lt;code&gt;@cf/meta/llama-2-7b-chat-int8&lt;/code&gt; on 2024-06-30.&lt;/p&gt;
&lt;p&gt;Replace the model ID in your code with a new model of your choice:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/llama-3-8b-instruct/&quot;&gt;&lt;code&gt;@cf/meta/llama-3-8b-instruct&lt;/code&gt;&lt;/a&gt; is the newest model in the Llama family (and is currently free for a limited time on Workers AI).&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://developers.cloudflare.com/workers-ai/models/llama-3-8b-instruct-awq/&quot;&gt;&lt;code&gt;@cf/meta/llama-3-8b-instruct-awq&lt;/code&gt;&lt;/a&gt; is the new Llama 3 in a similar precision to your currently selected model. This model is also currently free for a limited time.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you do not switch to a different model by June 30th, we will automatically start returning inference from &lt;code&gt;@cf/meta/llama-3-8b-instruct-awq&lt;/code&gt;.&lt;/p&gt;
</description><pubDate>Tue, 11 Jun 2024 00:00:00 GMT</pubDate></item><item><title>Workers AI - Add new public LoRAs and note on LoRA routing</title><link>https://developers.cloudflare.com/workers-ai/changelog/#add-new-public-loras-and-note-on-lora-routing</link><guid isPermaLink="true">https://developers.cloudflare.com/workers-ai/changelog/#add-new-public-loras-and-note-on-lora-routing</guid><description>&lt;ul&gt;
&lt;li&gt;Added documentation on &lt;a href=&quot;https://developers.cloudflare.com/workers-ai/fine-tunes/public-loras/&quot;&gt;new public LoRAs&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Noted that you can now run LoRA inference with the base model rather than explicitly calling the &lt;code&gt;-lora&lt;/code&gt; version&lt;/li&gt;
&lt;/ul&gt;
</description><pubDate>Wed, 29 May 2024 00:00:00 GMT</pubDate></item><item><title>Workers AI - Add OpenAI compatible API endpoints</title><link>https://developers.cloudflare.com/workers-ai/changelog/#add-openai-compatible-api-endpoints</link><guid isPermaLink="true">https://developers.cloudflare.com/workers-ai/changelog/#add-openai-compatible-api-endpoints</guid><description>&lt;p&gt;Added OpenAI compatible API endpoints for &lt;code&gt;/v1/chat/completions&lt;/code&gt; and &lt;code&gt;/v1/embeddings&lt;/code&gt;. For more details, refer to &lt;a href=&quot;https://developers.cloudflare.com/workers-ai/configuration/open-ai-compatibility/&quot;&gt;Configurations&lt;/a&gt;.&lt;/p&gt;
</description><pubDate>Fri, 17 May 2024 00:00:00 GMT</pubDate></item><item><title>Workers AI - Add AI native binding</title><link>https://developers.cloudflare.com/workers-ai/changelog/#add-ai-native-binding</link><guid isPermaLink="true">https://developers.cloudflare.com/workers-ai/changelog/#add-ai-native-binding</guid><description>&lt;ul&gt;
&lt;li&gt;Added a new AI native binding. You can now run models with &lt;code&gt;const resp = await env.AI.run(modelName, inputs)&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Deprecated the &lt;code&gt;@cloudflare/ai&lt;/code&gt; npm package. Existing solutions using the &lt;code&gt;@cloudflare/ai&lt;/code&gt; package will continue to work, but the package will not support new Workers AI features.
Moving to the native AI binding is highly recommended.&lt;/li&gt;
&lt;/ul&gt;
</description><pubDate>Thu, 11 Apr 2024 00:00:00 GMT</pubDate></item></channel></rss>