---
title: Cloudflare Workers AI
description: Workers AI allows you to run AI models in a serverless way, without having to worry about scaling, maintaining, or paying for unused infrastructure. You can invoke models running on GPUs on Cloudflare's network from your own code — from Workers, Pages, or anywhere via the Cloudflare API.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Cloudflare Workers AI

Run machine learning models, powered by serverless GPUs, on Cloudflare's global network.

 Available on Free and Paid plans 

Workers AI allows you to run AI models in a serverless way, without having to worry about scaling, maintaining, or paying for unused infrastructure. You can invoke models running on GPUs on Cloudflare's network from your own code — from [Workers](https://developers.cloudflare.com/workers/), [Pages](https://developers.cloudflare.com/pages/), or anywhere via [the Cloudflare API](https://developers.cloudflare.com/api/resources/ai/methods/run/).

Workers AI gives you access to:

* **50+ [open-source models](https://developers.cloudflare.com/workers-ai/models/)**, available as a part of our model catalog
* Serverless, **pay-for-what-you-use** [pricing model](https://developers.cloudflare.com/workers-ai/platform/pricing/)
* All as part of a **fully-featured developer platform**, including [AI Gateway](https://developers.cloudflare.com/ai-gateway/), [Vectorize](https://developers.cloudflare.com/vectorize/), [Workers](https://developers.cloudflare.com/workers/) and more...

[ Get started ](https://developers.cloudflare.com/workers-ai/get-started)[ Watch a Workers AI demo ](https://youtu.be/cK%5FleoJsBWY?si=4u6BIy%5FuBOZf9Ve8)

Custom requirements

If you have custom requirements like private custom models or higher limits, complete the [Custom Requirements Form ↗](https://forms.gle/axnnpGDb6xrmR31T6). Cloudflare will contact you with next steps.

Workers AI is now Generally Available

To report bugs or give feedback, go to the [#workers-ai Discord channel ↗](https://discord.cloudflare.com). If you are having issues with Wrangler, report issues in the [Wrangler GitHub repository ↗](https://github.com/cloudflare/workers-sdk/issues/new/choose).

---

## Features

### Models

Workers AI comes with a curated set of popular open-source models that enable you to do tasks such as image classification, text generation, object detection and more.

[ Browse models ](https://developers.cloudflare.com/workers-ai/models/) 
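Models in the catalog are invoked through the same `env.AI.run()` call from a Worker; only the model name and its input shape change. As a minimal, hedged sketch (assuming a Worker that already has an `AI` binding configured, as shown in the get-started guide), running a text-embeddings model looks like this:

```
// Minimal sketch: run a text-embeddings model through an existing AI binding.
// Assumes "AI" is the binding name in your Wrangler configuration.
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    const embeddings = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
      text: ["Workers AI runs models on Cloudflare's network"],
    });

    // The response contains the vector data for each input string.
    return Response.json(embeddings);
  },
} satisfies ExportedHandler<Env>;
```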

---

## Related products

**[AI Gateway](https://developers.cloudflare.com/ai-gateway/)** 

Observe and control your AI applications with caching, rate limiting, request retries, model fallback, and more.

**[Vectorize](https://developers.cloudflare.com/vectorize/)** 

Build full-stack AI applications with Vectorize, Cloudflare’s vector database. Adding Vectorize enables you to perform tasks such as semantic search, recommendations, and anomaly detection, or to provide context and memory to an LLM.

**[Workers](https://developers.cloudflare.com/workers/)** 

Build serverless applications and deploy instantly across the globe for exceptional performance, reliability, and scale.

**[Pages](https://developers.cloudflare.com/pages/)** 

Create full-stack applications that are instantly deployed to the Cloudflare global network.

**[R2](https://developers.cloudflare.com/r2/)** 

Store large amounts of unstructured data without the costly egress bandwidth fees associated with typical cloud storage services.

**[D1](https://developers.cloudflare.com/d1/)** 

Create new serverless SQL databases to query from your Workers and Pages projects.

**[Durable Objects](https://developers.cloudflare.com/durable-objects/)** 

A globally distributed coordination API with strongly consistent storage.

**[KV](https://developers.cloudflare.com/kv/)** 

Create a global, low-latency, key-value data storage.

---

## More resources

[Get started](https://developers.cloudflare.com/workers-ai/get-started/workers-wrangler/) 

Build and deploy your first Workers AI application.

[Plans](https://developers.cloudflare.com/workers-ai/platform/pricing/) 

Learn about Free and Paid plans.

[Limits](https://developers.cloudflare.com/workers-ai/platform/limits/) 

Learn about Workers AI limits.

[Use cases](https://developers.cloudflare.com/use-cases/ai/) 

Learn how you can build and deploy ambitious AI applications to Cloudflare's global network.

[Storage options](https://developers.cloudflare.com/workers/platform/storage-options/) 

Learn which storage option is best for your project.

[Developer Discord](https://discord.cloudflare.com) 

Connect with the Workers community on Discord to ask questions, share what you are building, and discuss the platform with other developers.

[@CloudflareDev](https://x.com/cloudflaredev) 

Follow @CloudflareDev on X to learn about product announcements and what is new in Cloudflare Workers.


---

---
title: Getting started
description: There are several options to build your Workers AI projects on Cloudflare. To get started, choose your preferred method:
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Getting started

There are several options to build your Workers AI projects on Cloudflare. To get started, choose your preferred method:

* [ Workers Bindings ](https://developers.cloudflare.com/workers-ai/get-started/workers-wrangler/)
* [ REST API ](https://developers.cloudflare.com/workers-ai/get-started/rest-api/)
* [ Dashboard ](https://developers.cloudflare.com/workers-ai/get-started/dashboard/)

Note

These examples are geared towards creating new Workers AI projects. For help adding Workers AI to an existing Worker, refer to [Workers Bindings](https://developers.cloudflare.com/workers-ai/configuration/bindings/).


---

---
title: Dashboard
description: Follow this guide to create a Workers AI application using the Cloudflare dashboard.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Dashboard

Follow this guide to create a Workers AI application using the Cloudflare dashboard.

## Prerequisites

Sign up for a [Cloudflare account ↗](https://dash.cloudflare.com/sign-up/workers-and-pages) if you have not already.

## Setup

To create a Workers AI application:

1. In the Cloudflare dashboard, go to the **Workers & Pages** page.  
[ Go to **Workers & Pages** ](https://dash.cloudflare.com/?to=/:account/workers-and-pages)
2. Select **Create application**.
3. Under **Select a template**, select **LLM Chat App**.
4. Select **Deploy**.
5. Name your Worker, then select **Create and deploy**.
6. Preview your Worker at its provided [workers.dev](https://developers.cloudflare.com/workers/configuration/routing/workers-dev/) subdomain.

## Development

### Dashboard

Editing in the dashboard is helpful for simpler use cases.

Once you have created your Worker script, you can edit and deploy your Worker using the Cloudflare dashboard:

1. In the Cloudflare dashboard, go to the **Workers & Pages** page.  
[ Go to **Workers & Pages** ](https://dash.cloudflare.com/?to=/:account/workers-and-pages)
2. Select your application.
3. Select **Edit Code**.
![Edit code directly within the Cloudflare dashboard](https://developers.cloudflare.com/_astro/workers-edit-code.CKxxvQSe_11id2b.webp) 

### Wrangler CLI

To develop more advanced applications or [implement tests](https://developers.cloudflare.com/workers/testing/), start working in the Wrangler CLI.

1. Install [npm ↗](https://docs.npmjs.com/getting-started).
2. Install [Node.js ↗](https://nodejs.org/en/).

Node.js version manager

Use a Node version manager like [Volta ↗](https://volta.sh/) or [nvm ↗](https://github.com/nvm-sh/nvm) to avoid permission issues and change Node.js versions. [Wrangler](https://developers.cloudflare.com/workers/wrangler/install-and-update/), discussed later in this guide, requires a Node version of `16.17.0` or later.

3. Run the following command, replacing `[<DIRECTORY>]` with the location where you want to put your Worker script.

npm

```
npm create cloudflare@latest -- [<DIRECTORY>] --type=pre-existing
```

yarn

```
yarn create cloudflare [<DIRECTORY>] --type=pre-existing
```

pnpm

```
pnpm create cloudflare@latest [<DIRECTORY>] --type=pre-existing
```

After you run this command and work through the prompts, your local changes will not automatically sync with the dashboard. Once you download your script, continue developing with the Wrangler CLI.
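From there, the typical loop is to iterate locally and then redeploy from the CLI; both commands are covered in the Workers Bindings guide:

```
npx wrangler dev     # run and test your Worker locally
npx wrangler deploy  # publish your changes to Cloudflare
```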


---

---
title: REST API
description: Use the Cloudflare Workers AI REST API to deploy a large language model (LLM).
image: https://developers.cloudflare.com/dev-products-preview.png
---


# REST API

This guide will instruct you through setting up and deploying your first Workers AI project. You will use the Workers AI REST API to experiment with a large language model (LLM).

## Prerequisites

Sign up for a [Cloudflare account ↗](https://dash.cloudflare.com/sign-up/workers-and-pages) if you have not already.

## 1\. Get API token and Account ID

You need your API token and Account ID to use the REST API.

To get these values:

1. In the Cloudflare dashboard, go to the **Workers AI** page.  
[ Go to **Workers AI** ](https://dash.cloudflare.com/?to=/:account/ai/workers-ai)
2. Select **Use REST API**.
3. Get your API token:  
   1. Select **Create a Workers AI API Token**.  
   2. Review the prefilled information.  
   3. Select **Create API Token**.  
   4. Select **Copy API Token**.  
   5. Save that value for future use.
4. For **Get Account ID**, copy the value for **Account ID**. Save that value for future use.

Note

If you choose to [create an API token](https://developers.cloudflare.com/fundamentals/api/get-started/create-token/) instead of using the template, that token will need permissions for both `Workers AI - Read` and `Workers AI - Edit`.

## 2\. Run a model via API

After creating your API token, authenticate your API requests by passing the token in the `Authorization` header, as shown below.

You will use the [Execute AI model](https://developers.cloudflare.com/api/resources/ai/methods/run/) endpoint to run the [@cf/meta/llama-3.1-8b-instruct](https://developers.cloudflare.com/workers-ai/models/llama-3.1-8b-instruct/) model:

Terminal window

```
curl https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/meta/llama-3.1-8b-instruct \
  -H 'Authorization: Bearer {API_TOKEN}' \
  -d '{ "prompt": "Where did the phrase Hello World come from" }'
```

Replace the values for `{ACCOUNT_ID}` and `{API_TOKEN}`.

The API response will look like the following:

```
{
  "result": {
    "response": "Hello, World first appeared in 1974 at Bell Labs when Brian Kernighan included it in the C programming language example. It became widely used as a basic test program due to simplicity and clarity. It represents an inviting greeting from a program to the world."
  },
  "success": true,
  "errors": [],
  "messages": []
}
```

This example execution uses the `@cf/meta/llama-3.1-8b-instruct` model, but you can use any of the models in the [Workers AI models catalog](https://developers.cloudflare.com/workers-ai/models/). To use another model, replace `@cf/meta/llama-3.1-8b-instruct` in the request URL with your desired model name.
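If you prefer to call the endpoint from code rather than cURL, a minimal TypeScript sketch of the same request looks like the following. It assumes a runtime with a global `fetch` (such as Node.js 18 or later) and that `ACCOUNT_ID` and `API_TOKEN` are environment variables you have set yourself:

```
// Minimal sketch: call the Workers AI REST API with fetch.
// ACCOUNT_ID and API_TOKEN are assumed to be set in your environment.
const accountId = process.env.ACCOUNT_ID;
const apiToken = process.env.API_TOKEN;

const response = await fetch(
  `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/@cf/meta/llama-3.1-8b-instruct`,
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      prompt: "Where did the phrase Hello World come from",
    }),
  },
);

// The JSON body has the same shape as the cURL response above.
const result = await response.json();
console.log(result);
```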

By completing this guide, you have created a Cloudflare account (if you did not have one already) and an API token that grants Workers AI read permissions to your account. You executed the [@cf/meta/llama-3.1-8b-instruct](https://developers.cloudflare.com/workers-ai/models/llama-3.1-8b-instruct/) model using a cURL command from the terminal and received an answer to your prompt in a JSON response.

## Related resources

* [Models](https://developers.cloudflare.com/workers-ai/models/) \- Browse the Workers AI models catalog.
* [AI SDK](https://developers.cloudflare.com/workers-ai/configuration/ai-sdk) \- Learn how to integrate with an AI model.


---

---
title: Workers Bindings
description: Deploy your first Cloudflare Workers AI project using the CLI.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Workers Bindings

This guide will instruct you through setting up and deploying your first Workers AI project. You will use [Workers](https://developers.cloudflare.com/workers/), a Workers AI binding, and a large language model (LLM) to deploy your first AI-powered application on the Cloudflare global network.

1. Sign up for a [Cloudflare account ↗](https://dash.cloudflare.com/sign-up/workers-and-pages).
2. Install [Node.js ↗](https://docs.npmjs.com/downloading-and-installing-node-js-and-npm).

Node.js version manager

Use a Node version manager like [Volta ↗](https://volta.sh/) or [nvm ↗](https://github.com/nvm-sh/nvm) to avoid permission issues and change Node.js versions. [Wrangler](https://developers.cloudflare.com/workers/wrangler/install-and-update/), discussed later in this guide, requires a Node version of `16.17.0` or later.

## 1\. Create a Worker project

You will create a new Worker project using the `create-cloudflare` CLI (C3). [C3 ↗](https://github.com/cloudflare/workers-sdk/tree/main/packages/create-cloudflare) is a command-line tool designed to help you set up and deploy new applications to Cloudflare.

Create a new project named `hello-ai` by running:

npm

```
npm create cloudflare@latest -- hello-ai
```

yarn

```
yarn create cloudflare hello-ai
```

pnpm

```
pnpm create cloudflare@latest hello-ai
```

Running `npm create cloudflare@latest` will prompt you to install the [create-cloudflare package ↗](https://www.npmjs.com/package/create-cloudflare), and lead you through setup. C3 will also install [Wrangler](https://developers.cloudflare.com/workers/wrangler/), the Cloudflare Developer Platform CLI.

For setup, select the following options:

* For _What would you like to start with?_, choose `Hello World example`.
* For _Which template would you like to use?_, choose `Worker only`.
* For _Which language do you want to use?_, choose `TypeScript`.
* For _Do you want to use git for version control?_, choose `Yes`.
* For _Do you want to deploy your application?_, choose `No` (we will be making some changes before deploying).

This will create a new `hello-ai` directory. Your new `hello-ai` directory will include:

* A `"Hello World"` [Worker](https://developers.cloudflare.com/workers/get-started/guide/#3-write-code) at `src/index.ts`.
* A [wrangler.jsonc](https://developers.cloudflare.com/workers/wrangler/configuration/) configuration file.

Go to your application directory:

Terminal window

```
cd hello-ai
```

## 2\. Connect your Worker to Workers AI

You must create an AI binding for your Worker to connect to Workers AI. [Bindings](https://developers.cloudflare.com/workers/runtime-apis/bindings/) allow your Workers to interact with resources, like Workers AI, on the Cloudflare Developer Platform.

To bind Workers AI to your Worker, add the following to the end of your Wrangler file:

wrangler.jsonc

```
{
  "ai": {
    "binding": "AI"
  }
}
```

wrangler.toml

```
[ai]
binding = "AI"
```

Your binding is [available in your Worker code](https://developers.cloudflare.com/workers/reference/migrate-to-module-workers/#bindings-in-es-modules-format) on [env.AI](https://developers.cloudflare.com/workers/runtime-apis/handlers/fetch/).

You can also bind Workers AI to a Pages Function. For more information, refer to [Functions Bindings](https://developers.cloudflare.com/pages/functions/bindings/#workers-ai).
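As an illustrative sketch (not part of this guide's steps), a Pages Function with the same `AI` binding configured could run a model like this; the file path and handler name follow the standard Pages Functions layout, and the binding name is assumed to be `AI`:

```
// functions/api/ask.ts — hypothetical path in a Pages project with an AI binding.
interface Env {
  AI: Ai;
}

export const onRequest: PagesFunction<Env> = async (context) => {
  const answer = await context.env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    prompt: "What is the origin of the phrase Hello, World",
  });

  return Response.json(answer);
};
```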

## 3\. Run an inference task in your Worker

You are now ready to run an inference task in your Worker. In this case, you will use an LLM, [llama-3.1-8b-instruct](https://developers.cloudflare.com/workers-ai/models/llama-3.1-8b-instruct/), to answer a question.

Update the `index.ts` file in your `hello-ai` application directory with the following code:


index.js

```
export default {
  async fetch(request, env) {
    const response = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
      prompt: "What is the origin of the phrase Hello, World",
    });

    return new Response(JSON.stringify(response));
  },
};
```

index.ts

```
export interface Env {
  // If you set another name in the Wrangler config file as the value for 'binding',
  // replace "AI" with the variable name you defined.
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    const response = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
      prompt: "What is the origin of the phrase Hello, World",
    });

    return new Response(JSON.stringify(response));
  },
} satisfies ExportedHandler<Env>;
```
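As a variation, instruct models on Workers AI also accept a chat-style `messages` array instead of a single `prompt`. A hedged sketch of an alternative `index.ts` (same binding and model as above) that takes the question from a `?q=` query parameter:

```
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    // Read the question from ?q=..., with a fallback prompt.
    const question =
      new URL(request.url).searchParams.get("q") ??
      "What is the origin of the phrase Hello, World";

    const response = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
      messages: [
        { role: "system", content: "You are a concise assistant." },
        { role: "user", content: question },
      ],
    });

    return Response.json(response);
  },
} satisfies ExportedHandler<Env>;
```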

Up to this point, you have created an AI binding for your Worker and configured your Worker to be able to execute the Llama 3.1 model. You can now test your project locally before you deploy globally.

## 4\. Develop locally with Wrangler

While in your project directory, test Workers AI locally by running [wrangler dev](https://developers.cloudflare.com/workers/wrangler/commands/general/#dev):

Terminal window

```
npx wrangler dev
```

Workers AI local development usage charges

Using Workers AI always accesses your Cloudflare account in order to run AI models and will incur usage charges even in local development.

You will be prompted to log in after you run `wrangler dev`. When you run `npx wrangler dev`, Wrangler will give you a URL (most likely `localhost:8787`) to review your Worker. After you go to the URL Wrangler provides, a message will render that resembles the following example:

```
{
  "response": "Ah, a most excellent question, my dear human friend! *adjusts glasses*\n\nThe origin of the phrase \"Hello, World\" is a fascinating tale that spans several decades and multiple disciplines. It all began in the early days of computer programming, when a young man named Brian Kernighan was tasked with writing a simple program to demonstrate the basics of a new programming language called C.\nKernighan, a renowned computer scientist and author, was working at Bell Labs in the late 1970s when he created the program. He wanted to showcase the language's simplicity and versatility, so he wrote a basic \"Hello, World!\" program that printed the familiar greeting to the console.\nThe program was included in Kernighan and Ritchie's influential book \"The C Programming Language,\" published in 1978. The book became a standard reference for C programmers, and the \"Hello, World!\" program became a sort of \"Hello, World!\" for the programming community.\nOver time, the phrase \"Hello, World!\" became a shorthand for any simple program that demonstrated the basics"
}
```
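While `wrangler dev` is running, you can also query the local server from a second terminal, for example:

```
curl http://localhost:8787
```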

## 5\. Deploy your AI Worker

Before deploying your AI Worker globally, log in with your Cloudflare account by running:

Terminal window

```

npx wrangler login


```

You will be directed to a web page asking you to log in to the Cloudflare dashboard. After you have logged in, you will be asked if Wrangler can make changes to your Cloudflare account. Scroll down and select **Allow** to continue.

Finally, deploy your Worker to make your project accessible on the Internet. To deploy your Worker, run:

Terminal window

```
npx wrangler deploy
```

Once the deploy completes, Wrangler prints the URL where your Worker is available:

```
https://hello-ai.<YOUR_SUBDOMAIN>.workers.dev
```

Your Worker will be deployed to your custom [workers.dev](https://developers.cloudflare.com/workers/configuration/routing/workers-dev/) subdomain. You can now visit the URL to run your AI Worker.
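For example, once you substitute your own subdomain, you can fetch it from the command line:

```
curl https://hello-ai.<YOUR_SUBDOMAIN>.workers.dev
```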

By finishing this tutorial, you have created a Worker, connected it to Workers AI through an AI binding, and run an inference task using the Llama 3.1 model.

## Related resources

* [Cloudflare Developers community on Discord ↗](https://discord.cloudflare.com) \- Submit feature requests, report bugs, and share your feedback directly with the Cloudflare team by joining the Cloudflare Discord server.
* [Models](https://developers.cloudflare.com/workers-ai/models/) \- Browse the Workers AI models catalog.
* [AI SDK](https://developers.cloudflare.com/workers-ai/configuration/ai-sdk) \- Learn how to integrate with an AI model.


---

---
title: Models
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Models


The catalog currently includes the following models, grouped by task type. Each model has its own page at `https://developers.cloudflare.com/workers-ai/models/<model-name>`.

**Text Generation**

* kimi-k2.5 (Moonshot AI)
* glm-4.7-flash (Zhipu AI)
* gpt-oss-120b (OpenAI)
* gpt-oss-20b (OpenAI)
* llama-4-scout-17b-16e-instruct (Meta)
* nemotron-3-120b-a12b (NVIDIA)
* granite-4.0-h-micro (IBM)
* gemma-sea-lion-v4-27b-it (aisingapore)
* qwen3-30b-a3b-fp8 (Qwen)
* gemma-3-12b-it (Google)
* mistral-small-3.1-24b-instruct (MistralAI)
* qwq-32b (Qwen)
* qwen2.5-coder-32b-instruct (Qwen)
* llama-guard-3-8b (Meta)
* deepseek-r1-distill-qwen-32b (DeepSeek)
* llama-3.3-70b-instruct-fp8-fast (Meta)
* llama-3.2-1b-instruct (Meta)
* llama-3.2-3b-instruct (Meta)
* llama-3.2-11b-vision-instruct (Meta)
* llama-3.1-70b-instruct (Meta)
* llama-3.1-8b-instruct (Meta)
* llama-3.1-8b-instruct-fast (Meta)
* llama-3.1-8b-instruct-awq (Meta)
* llama-3.1-8b-instruct-fp8 (Meta)
* meta-llama-3-8b-instruct (meta-llama)
* llama-3-8b-instruct (Meta)
* llama-3-8b-instruct-awq (Meta)
* llama-2-7b-chat-fp16 (Meta)
* llama-2-7b-chat-int8 (Meta)
* mistral-7b-instruct-v0.1 (MistralAI)
* mistral-7b-instruct-v0.2 (MistralAI, Beta)
* mistral-7b-instruct-v0.2-lora (MistralAI, Beta)
* hermes-2-pro-mistral-7b (nousresearch, Beta)
* gemma-7b-it (Google, Beta)
* gemma-7b-it-lora (Google, Beta)
* gemma-2b-it-lora (Google, Beta)
* llama-2-7b-chat-hf-lora (meta-llama, Beta)
* phi-2 (Microsoft, Beta)
* sqlcoder-7b-2 (defog, Beta)
* una-cybertron-7b-v2-bf16 (fblgit, Beta, deprecated)
* starling-lm-7b-beta (nexusflow, Beta, deprecated)
* qwen1.5-0.5b-chat (Qwen, Beta, deprecated)
* qwen1.5-1.8b-chat (Qwen, Beta, deprecated)
* qwen1.5-7b-chat-awq (Qwen, Beta, deprecated)
* qwen1.5-14b-chat-awq (Qwen, Beta, deprecated)
* tinyllama-1.1b-chat-v1.0 (tinyllama, Beta, deprecated)
* discolm-german-7b-v1-awq (thebloke, Beta, deprecated)
* falcon-7b-instruct (tiiuae, Beta, deprecated)
* openchat-3.5-0106 (openchat, Beta, deprecated)
* deepseek-math-7b-instruct (DeepSeek, Beta, deprecated)
* deepseek-coder-6.7b-instruct-awq (thebloke, Beta, deprecated)
* deepseek-coder-6.7b-base-awq (thebloke, Beta, deprecated)
* llamaguard-7b-awq (thebloke, Beta, deprecated)
* neural-chat-7b-v3-1-awq (thebloke, Beta, deprecated)
* openhermes-2.5-mistral-7b-awq (thebloke, Beta, deprecated)
* llama-2-13b-chat-awq (thebloke, Beta, deprecated)
* mistral-7b-instruct-v0.1-awq (thebloke, Beta, deprecated)
* zephyr-7b-beta-awq (thebloke, Beta, deprecated)

**Text-to-Image**

* flux-2-klein-9b (Black Forest Labs)
* flux-2-klein-4b (Black Forest Labs)
* flux-2-dev (Black Forest Labs)
* flux-1-schnell (Black Forest Labs)
* lucid-origin (Leonardo)
* phoenix-1.0 (Leonardo)
* dreamshaper-8-lcm (lykon)
* stable-diffusion-xl-lightning (bytedance, Beta)
* stable-diffusion-xl-base-1.0 (Stability.ai, Beta)
* stable-diffusion-v1-5-img2img (runwayml, Beta)
* stable-diffusion-v1-5-inpainting (runwayml, Beta)

**Text-to-Speech**

* aura-2-en (Deepgram)
* aura-2-es (Deepgram)
* aura-1 (Deepgram)
* melotts (myshell-ai)

**Automatic Speech Recognition**

* flux (Deepgram)
* nova-3 (Deepgram)
* whisper-large-v3-turbo (OpenAI)
* whisper (OpenAI)
* whisper-tiny-en (OpenAI, Beta)

**Text Embeddings**

* plamo-embedding-1b (pfnet)
* embeddinggemma-300m (Google)
* qwen3-embedding-0.6b (Qwen)
* bge-m3 (baai)
* bge-large-en-v1.5 (baai)
* bge-base-en-v1.5 (baai)
* bge-small-en-v1.5 (baai)

**Text Classification**

* bge-reranker-base (baai)
* distilbert-sst-2-int8 (HuggingFace)

**Translation**

* indictrans2-en-indic-1B (ai4bharat)
* m2m100-1.2b (Meta)

**Image-to-Text**

* llava-1.5-7b-hf (llava-hf, Beta)
* uform-gen2-qwen-500m (unum, Beta)

**Object Detection**

* detr-resnet-50 (facebook, Beta)

**Image Classification**

* resnet-50 (Microsoft)

**Summarization**

* bart-large-cnn (facebook, Beta)

**Voice Activity Detection**

* smart-turn-v2 (pipecat-ai)


---

---
title: Agents
description: Build AI assistants that can perform complex tasks on behalf of your users using Cloudflare Workers AI and Agents.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Agents

Build AI assistants that can perform complex tasks on behalf of your users using Cloudflare Workers AI and Agents.

[ Go to Agents documentation ](https://developers.cloudflare.com/agents/)


---

---
title: Playground
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Playground


---

---
title: REST API reference
image: https://developers.cloudflare.com/dev-products-preview.png
---


# REST API reference


---

---
title: Changelog
description: Review recent changes to Cloudflare Workers AI.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Changelog

[ Subscribe to RSS ](https://developers.cloudflare.com/workers-ai/changelog/index.xml)

## 2026-03-19

**Moonshot AI Kimi K2.5 now available on Workers AI**
* [@cf/moonshotai/kimi-k2.5](https://developers.cloudflare.com/workers-ai/models/kimi-k2.5/) now available on Workers AI. A frontier-scale open-source model with a 256k context window, multi-turn tool calling, vision inputs, and structured outputs for agentic workloads. Read [changelog](https://developers.cloudflare.com/changelog/post/2026-03-19-kimi-k2-5-workers-ai/) to get started.
* New [Prompt caching](https://developers.cloudflare.com/workers-ai/features/prompt-caching/) documentation. Send the `x-session-affinity` header to route requests to the same model instance and maximize prefix cache hit rates across multi-turn conversations (see the sketch after this list).
* Redesigned [Asynchronous Batch API](https://developers.cloudflare.com/workers-ai/features/batch-api/) with a pull-based system that processes queued requests as capacity becomes available, avoiding out-of-capacity errors for durable workflows.
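As a rough sketch of the session-affinity header (the endpoint shape, header placement, and session value are assumptions based on the description above; use any stable identifier of your choice per conversation):

```
curl "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/ai/run/@cf/moonshotai/kimi-k2.5" \
 --header "Authorization: Bearer $API_TOKEN" \
 --header "x-session-affinity: my-conversation-42" \
 --json '{
   "messages": [
     { "role": "user", "content": "Continue our conversation where we left off" }
   ]
 }'
```

Reusing the same `x-session-affinity` value on every turn of a conversation is what lets later requests hit the prefix cache built by earlier ones.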

## 2026-03-11

**NVIDIA Nemotron 3 Super now available on Workers AI**
* [@cf/nvidia/nemotron-3-120b-a12b](https://developers.cloudflare.com/workers-ai/models/nemotron-3-120b-a12b/) now available on Workers AI! A hybrid MoE model with 120B total parameters and 12B active, optimized for multi-agent and agentic AI workloads. Read [changelog](https://developers.cloudflare.com/changelog/post/2026-03-11-nemotron-3-super-workers-ai/) to get started.

## 2026-03-06

**Deepgram Nova-3 now supports 10 languages with regional variants**
* [@cf/deepgram/nova-3](https://developers.cloudflare.com/workers-ai/models/nova-3/) now supports 10 languages with regional variants for real-time transcription. Supported languages include English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch — with regional variants like `en-GB`, `fr-CA`, and `pt-BR`.

## 2026-02-17

**Chat Completions API support for gpt-oss models and tool calling improvements**
* [@cf/openai/gpt-oss-120b](https://developers.cloudflare.com/workers-ai/models/gpt-oss-120b/) and [@cf/openai/gpt-oss-20b](https://developers.cloudflare.com/workers-ai/models/gpt-oss-20b/) now support Chat Completions API format. Use `/v1/chat/completions` with a `messages` array, or use `/ai/run` which dynamically detects your input format and accepts Chat Completions (`messages`), legacy Completions (`prompt`), or Responses API (`input`). A sketch of the three input shapes follows this list.
* **\[Bug fix\]** Fixed a bug in the schema for multiple text generation models where the `content` field in message objects only accepted string values. The field now properly accepts both string content and array content (structured content parts for multi-modal inputs). This fix applies to all affected chat models including GPT-OSS models, Llama 3.x, Mistral, Qwen, and others.
* **\[Bug fix\]** Tool call round-trips now work correctly. The binding no longer rejects `tool_call_id` values that it generated itself, fixing issues with multi-turn tool calling conversations.
* **\[Bug fix\]** Assistant messages with `content: null` and `tool_calls` are now accepted in both the Workers AI binding and REST API (`/v1/chat/completions`), fixing tool call round-trip failures.
* **\[Bug fix\]** Streaming responses now correctly report `finish_reason` only on the usage chunk, matching OpenAI's streaming behavior and preventing duplicate finish events.
* **\[Bug fix\]** `/v1/chat/completions` now preserves original tool call IDs from models instead of regenerating them. Previously, the endpoint was generating new IDs which broke multi-turn tool calling because AI SDK clients could not match tool results to their original calls.
* **\[Bug fix\]** `/v1/chat/completions` now correctly reports `finish_reason: "tool_calls"` in the final usage chunk when tools are used. Previously, it was hardcoding `finish_reason: "stop"` which caused AI SDK clients to think the conversation was complete instead of executing tool calls.
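A minimal sketch of the three accepted input shapes, assuming the Workers AI binding accepts the same formats as the `/ai/run` endpoint described above (the prompts are placeholders):

```
// Chat Completions style: a `messages` array
const chat = await env.AI.run("@cf/openai/gpt-oss-120b", {
  messages: [{ role: "user", content: "Say hello" }],
});

// Legacy Completions style: a single `prompt` string
const completion = await env.AI.run("@cf/openai/gpt-oss-120b", {
  prompt: "Say hello",
});

// Responses API style: an `input` field
const response = await env.AI.run("@cf/openai/gpt-oss-120b", {
  input: "Say hello",
});
```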

## 2026-02-13

**GLM-4.7-Flash, @cloudflare/tanstack-ai, and workers-ai-provider v3.1.1**
* [@cf/zai-org/glm-4.7-flash](https://developers.cloudflare.com/workers-ai/models/glm-4.7-flash/) is now available on Workers AI! A fast and efficient multilingual text generation model optimized for multi-turn tool calling across 100+ languages. Read [changelog](https://developers.cloudflare.com/changelog/2026-02-13-glm-4.7-flash-workers-ai/) to get started.
* New [@cloudflare/tanstack-ai](https://www.npmjs.com/package/@cloudflare/tanstack-ai) package for using Workers AI and AI Gateway with TanStack AI.
* [workers-ai-provider v3.1.1](https://www.npmjs.com/package/workers-ai-provider) adds transcription, text-to-speech, and reranking capabilities.

## 2026-01-28

**Black Forest Labs FLUX.2 \[klein\] 9B now available**
* [@cf/black-forest-labs/flux-2-klein-9b](https://developers.cloudflare.com/workers-ai/models/flux-2-klein-9b/) now available on Workers AI! Read [changelog](https://developers.cloudflare.com/changelog/2026-01-28-flux-2-klein-9b-workers-ai/) to get started

## 2026-01-15

**Black Forest Labs FLUX.2 \[klein\] 4b now available**
* [@cf/black-forest-labs/flux-2-klein-4b](https://developers.cloudflare.com/workers-ai/models/flux-2-klein-4b/) now available on Workers AI! Read [changelog](https://developers.cloudflare.com/changelog/2026-01-15-flux-2-klein-4b-workers-ai/) to get started

## 2025-12-03

**Deepgram Flux promotional period ends on Dec 8, 2025 - pricing now published**
* Check out updated pricing on the [@cf/deepgram/flux](https://developers.cloudflare.com/workers-ai/models/flux/) model page or [pricing](https://developers.cloudflare.com/workers-ai/platform/pricing/) page
* Pricing will start Dec 8, 2025

## 2025-11-25

**Black Forest Labs FLUX.2 dev now available**
* [@cf/black-forest-labs/flux-2-dev](https://developers.cloudflare.com/workers-ai/models/flux-2-dev/) now available on Workers AI! Read [changelog](https://developers.cloudflare.com/changelog/2025-11-25-flux-2-dev-workers-ai/) to get started

## 2025-11-13

**Qwen3 LLM and Embeddings available on Workers AI**
* [@cf/qwen/qwen3-30b-a3b-fp8](https://developers.cloudflare.com/workers-ai/models/qwen3-30b-a3b-fp8/) and [@cf/qwen/qwen3-embedding-0.6b](https://developers.cloudflare.com/workers-ai/models/qwen3-embedding-0.6b) now available on Workers AI

## 2025-10-21

**New voice and LLM models on Workers AI**
* Deepgram Aura 2 brings new text-to-speech capabilities to Workers AI. Check out [@cf/deepgram/aura-2-en](https://developers.cloudflare.com/workers-ai/models/aura-2-en/) and [@cf/deepgram/aura-2-es](https://developers.cloudflare.com/workers-ai/models/aura-2-es/) on how to use the new models.
* The IBM Granite model is also live! This new LLM is small but mighty - take a look at the docs for more: [@cf/ibm-granite/granite-4.0-h-micro](https://developers.cloudflare.com/workers-ai/models/granite-4.0-h-micro/)

## 2025-10-02

**Deepgram Flux now available on Workers AI**
* We're excited to be a launch partner with Deepgram and offer their new Speech Recognition model built specifically for enabling voice agents. Check out [Deepgram's blog](https://deepgram.com/flux) for more details on the release.
* Access the model through [@cf/deepgram/flux](https://developers.cloudflare.com/workers-ai/models/flux/) and check out the [changelog](https://developers.cloudflare.com/changelog/2025-10-02-deepgram-flux/) for in-depth examples.

## 2025-09-24

**New local models available on Workers AI**
* We've added support for some regional models on Workers AI in support of uplifting local AI labs and AI sovereignty. Check out the [full blog post here](https://blog.cloudflare.com/sovereign-ai-and-choice).
* [@cf/pfnet/plamo-embedding-1b](https://developers.cloudflare.com/workers-ai/models/plamo-embedding-1b) creates embeddings from Japanese text.
* [@cf/aisingapore/gemma-sea-lion-v4-27b-it](https://developers.cloudflare.com/workers-ai/models/gemma-sea-lion-v4-27b-it) is a fine-tuned model that supports multiple South East Asian languages, including Burmese, English, Indonesian, Khmer, Lao, Malay, Mandarin, Tagalog, Tamil, Thai, and Vietnamese.
* [@cf/ai4bharat/indictrans2-en-indic-1B](https://developers.cloudflare.com/workers-ai/models/indictrans2-en-indic-1B) is a translation model that can translate between 22 Indic languages, including Bengali, Gujarati, Hindi, Tamil, Sanskrit and even traditionally low-resourced languages like Kashmiri, Manipuri and Sindhi.

## 2025-09-23

**New document formats supported by Markdown conversion utility**
* Our [Markdown conversion utility](https://developers.cloudflare.com/workers-ai/features/markdown-conversion/) now supports converting `.docx` and `.odt` files.

## 2025-09-18

**Model Catalog updates (types, EmbeddingGemma, model deprecation)**
* Workers AI types have been updated in the upcoming Wrangler release. Please run `npm i -D wrangler@latest` to update your packages.
* EmbeddingGemma model accuracy has been improved; we recommend re-indexing your data to take advantage of the improved accuracy.
* Some older Workers AI models are being deprecated on October 1, 2025. We recommend you use newer models such as [Llama 4](https://developers.cloudflare.com/workers-ai/models/llama-4-scout-17b-16e-instruct/) and [gpt-oss](https://developers.cloudflare.com/workers-ai/models/gpt-oss-120b/). The following models are being deprecated:  
   * @hf/thebloke/zephyr-7b-beta-awq  
   * @hf/thebloke/mistral-7b-instruct-v0.1-awq  
   * @hf/thebloke/llama-2-13b-chat-awq  
   * @hf/thebloke/openhermes-2.5-mistral-7b-awq  
   * @hf/thebloke/neural-chat-7b-v3-1-awq  
   * @hf/thebloke/llamaguard-7b-awq  
   * @hf/thebloke/deepseek-coder-6.7b-base-awq  
   * @hf/thebloke/deepseek-coder-6.7b-instruct-awq  
   * @cf/deepseek-ai/deepseek-math-7b-instruct  
   * @cf/openchat/openchat-3.5-0106  
   * @cf/tiiuae/falcon-7b-instruct  
   * @cf/thebloke/discolm-german-7b-v1-awq  
   * @cf/qwen/qwen1.5-0.5b-chat  
   * @cf/qwen/qwen1.5-7b-chat-awq  
   * @cf/qwen/qwen1.5-14b-chat-awq  
   * @cf/tinyllama/tinyllama-1.1b-chat-v1.0  
   * @cf/qwen/qwen1.5-1.8b-chat  
   * @hf/nexusflow/starling-lm-7b-beta  
   * @cf/fblgit/una-cybertron-7b-v2-bf16

## 2025-09-05

**Introducing EmbeddingGemma from Google**
* We’re excited to be a launch partner alongside Google and bring their newest embedding model to Workers AI. EmbeddingGemma delivers best-in-class performance for its size, enabling RAG and semantic search use cases. Take a look at [@cf/google/embeddinggemma-300m](https://developers.cloudflare.com/workers-ai/models/embeddinggemma-300m) for more details. It is now available for embedding in AI Search too.

## 2025-08-27

**Introducing Partner models to the Workers AI catalog**
* Read the [blog](https://blog.cloudflare.com/workers-ai-partner-models) for more details
* [@cf/deepgram/aura-1](https://developers.cloudflare.com/workers-ai/models/aura-1) is a text-to-speech model that allows you to input text and have it come to life in a customizable voice
* [@cf/deepgram/nova-3](https://developers.cloudflare.com/workers-ai/models/nova-3) is a speech-to-text model that transcribes multilingual audio at a blazingly fast speed
* [@cf/pipecat-ai/smart-turn-v2](https://developers.cloudflare.com/workers-ai/models/smart-turn-v2) helps you detect when someone is done speaking
* [@cf/leonardo/lucid-origin](https://developers.cloudflare.com/workers-ai/models/lucid-origin) is a text-to-image model that generates images with sharp graphic design, stunning full-HD renders, or highly specific creative direction
* [@cf/leonardo/phoenix-1.0](https://developers.cloudflare.com/workers-ai/models/phoenix-1.0) is a text-to-image model with exceptional prompt adherence and coherent text
* WebSocket support added for audio models like `@cf/deepgram/aura-1`, `@cf/deepgram/nova-3`, `@cf/pipecat-ai/smart-turn-v2`

## 2025-08-05

**Adding gpt-oss models to our catalog**
* Check out the [blog](https://blog.cloudflare.com/openai-gpt-oss-on-workers-ai) for more details about the new models
* Take a look at the [gpt-oss-120b](https://developers.cloudflare.com/workers-ai/models/gpt-oss-120b) and [gpt-oss-20b](https://developers.cloudflare.com/workers-ai/models/gpt-oss-20b) model pages for more information about schemas, pricing, and context windows

## 2025-04-09

**Pricing correction for @cf/myshell-ai/melotts**
* We've updated our documentation to reflect the correct pricing for melotts: $0.0002 per audio minute, which is actually cheaper than initially stated. The previously documented pricing was incorrect; it said users would be charged based on input tokens.

## 2025-03-17

**Minor updates to the model schema for llama-3.2-1b-instruct, whisper-large-v3-turbo, llama-guard**
* [llama-3.2-1b-instruct](https://developers.cloudflare.com/workers-ai/models/llama-3.2-1b-instruct/) - updated the context window to the correct value of 60,000
* [whisper-large-v3-turbo](https://developers.cloudflare.com/workers-ai/models/whisper-large-v3-turbo/) - new hyperparameters available
* [llama-guard-3-8b](https://developers.cloudflare.com/workers-ai/models/llama-guard-3-8b/) - the messages array must alternate between `user` and `assistant` to function correctly

## 2025-02-21

**Workers AI bug fixes**
* We fixed a bug where `max_tokens` defaults were not properly being respected - `max_tokens` now correctly defaults to `256` as displayed on the model pages. Users relying on the previous behaviour may observe this as a breaking change. If you want to generate more tokens, please set the `max_tokens` parameter to what you need.
* We updated model pages to show context windows - which is defined as the tokens used in the prompt + tokens used in the response. If your prompt + response tokens exceed the context window, the request will error. Please set `max_tokens` accordingly depending on your prompt length and the context window length to ensure a successful response (see the sketch below).
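For example, a minimal sketch (the model name and token count are illustrative) that raises the output limit while keeping prompt plus response inside the context window:

```
const answer = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
  prompt: "Summarize the history of the phrase 'Hello, World'",
  // Raise the default of 256 output tokens; prompt + response must
  // still fit within the model's context window.
  max_tokens: 512,
});
```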

## 2024-09-26

**Workers AI Birthday Week 2024 announcements**
* Meta Llama 3.2 1B, 3B, and 11B vision is now available on Workers AI
* `@cf/black-forest-labs/flux-1-schnell` is now available on Workers AI
* Workers AI is fast! Powered by new GPUs and optimizations, you can expect faster inference on Llama 3.1, Llama 3.2, and FLUX models.
* No more neurons. Workers AI is moving towards [unit-based pricing](https://developers.cloudflare.com/workers-ai/platform/pricing)
* Model pages get a refresh with better documentation on parameters, pricing, and model capabilities
* Closed beta for our Run Any\* Model feature, [sign up here](https://forms.gle/h7FcaTF4Zo5dzNb68)
* Check out the [product announcements blog post](https://blog.cloudflare.com/workers-ai) for more information
* And the [technical blog post](https://blog.cloudflare.com/workers-ai/making-workers-ai-faster) if you want to learn about how we made Workers AI fast

## 2024-07-23

**Meta Llama 3.1 now available on Workers AI**

Workers AI now supports [Meta Llama 3.1](https://developers.cloudflare.com/workers-ai/models/llama-3.1-8b-instruct/).

## 2024-06-27

**Introducing embedded function calling**
* A new way to do function calling with [Embedded function calling](https://developers.cloudflare.com/workers-ai/function-calling/embedded)
* Published new [@cloudflare/ai-utils](https://www.npmjs.com/package/@cloudflare/ai-utils) npm package
* Open-sourced [ai-utils on Github](https://github.com/cloudflare/ai-utils)

## 2024-06-19

**Added support for traditional function calling**
* [Function calling](https://developers.cloudflare.com/workers-ai/function-calling/) is now supported on enabled models
* Properties added on [models](https://developers.cloudflare.com/workers-ai/models/) page to show which models support function calling

## 2024-06-18

**Native support for AI Gateways**

Workers AI now natively supports [AI Gateway](https://developers.cloudflare.com/ai-gateway/usage/providers/workersai/#worker).

## 2024-06-11

**Deprecation announcement for `@cf/meta/llama-2-7b-chat-int8`**

We will be deprecating `@cf/meta/llama-2-7b-chat-int8` on 2024-06-30.

Replace the model ID in your code with a new model of your choice:

* [@cf/meta/llama-3-8b-instruct](https://developers.cloudflare.com/workers-ai/models/llama-3-8b-instruct/) is the newest model in the Llama family (and is currently free for a limited time on Workers AI).
* [@cf/meta/llama-3-8b-instruct-awq](https://developers.cloudflare.com/workers-ai/models/llama-3-8b-instruct-awq/) is the new Llama 3 in a similar precision to your currently selected model. This model is also currently free for a limited time.

If you do not switch to a different model by June 30th, we will automatically start returning inference from `@cf/meta/llama-3-8b-instruct-awq`.

## 2024-05-29

**Add new public LoRAs and note on LoRA routing**
* Added documentation on [new public LoRAs](https://developers.cloudflare.com/workers-ai/fine-tunes/public-loras/).
* Noted that you can now run LoRA inference with the base model rather than explicitly calling the `-lora` version

## 2024-05-17

**Add OpenAI compatible API endpoints**

Added OpenAI compatible API endpoints for `/v1/chat/completions` and `/v1/embeddings`. For more details, refer to [Configurations](https://developers.cloudflare.com/workers-ai/configuration/open-ai-compatibility/).

## 2024-04-11

**Add AI native binding**
* Added a new AI native binding; you can now run models with `const resp = await env.AI.run(modelName, inputs)`
* Deprecated the `@cloudflare/ai` npm package. While existing solutions using the `@cloudflare/ai` package will continue to work, no new Workers AI features will be supported. Moving to native AI bindings is highly recommended.


---

---
title: Vercel AI SDK
description: Workers AI can be used with the Vercel AI SDK for JavaScript and TypeScript codebases.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Vercel AI SDK

Workers AI can be used with the [Vercel AI SDK ↗](https://sdk.vercel.ai/) for JavaScript and TypeScript codebases.

## Setup

Install the [workers-ai-provider ↗](https://sdk.vercel.ai/providers/community-providers/cloudflare-workers-ai):


```
npm i workers-ai-provider
```

```
yarn add workers-ai-provider
```

```
pnpm add workers-ai-provider
```

```
bun add workers-ai-provider
```

Then, add an AI binding in your Workers project Wrangler file:

```

[ai]
binding = "AI"

```
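If your project uses the JSON configuration format instead, the equivalent `wrangler.jsonc` entry (the same binding shown on the Workers Bindings page) is:

```
{
  "ai": {
    "binding": "AI" // i.e. available in your Worker on env.AI
  }
}
```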

## Models

The AI SDK can be configured to work with [any AI model](https://developers.cloudflare.com/workers-ai/models/).

JavaScript

```

import { createWorkersAI } from "workers-ai-provider";


const workersai = createWorkersAI({ binding: env.AI });


// Choose any model: https://developers.cloudflare.com/workers-ai/models/

const model = workersai("@cf/meta/llama-3.1-8b-instruct", {});


```

## Generate Text

Once you have selected your model, you can generate text from a given prompt.

JavaScript

```

import { createWorkersAI } from 'workers-ai-provider';

import { generateText } from 'ai';


type Env = {

  AI: Ai;

};


export default {

  async fetch(_: Request, env: Env) {

    const workersai = createWorkersAI({ binding: env.AI });

    const result = await generateText({

      model: workersai('@cf/meta/llama-2-7b-chat-int8'),

      prompt: 'Write a 50-word essay about hello world.',

    });


    return new Response(result.text);

  },

};


```

## Stream Text

For longer responses, consider streaming the response so output is delivered as the generation completes.

JavaScript

```

import { createWorkersAI } from 'workers-ai-provider';

import { streamText } from 'ai';


type Env = {

  AI: Ai;

};


export default {

  async fetch(_: Request, env: Env) {

    const workersai = createWorkersAI({ binding: env.AI });

    const result = streamText({

      model: workersai('@cf/meta/llama-2-7b-chat-int8'),

      prompt: 'Write a 50-word essay about hello world.',

    });


    return result.toTextStreamResponse({

      headers: {

        // add these headers to ensure that the

        // response is chunked and streamed

        'Content-Type': 'text/x-unknown',

        'content-encoding': 'identity',

        'transfer-encoding': 'chunked',

      },

    });

  },

};


```

## Generate Structured Objects

You can provide a Zod schema to generate a structured JSON response.

JavaScript

```

import { createWorkersAI } from 'workers-ai-provider';

import { generateObject } from 'ai';

import { z } from 'zod';


type Env = {

  AI: Ai;

};


export default {

  async fetch(_: Request, env: Env) {

    const workersai = createWorkersAI({ binding: env.AI });

    const result = await generateObject({

      model: workersai('@cf/meta/llama-3.1-8b-instruct'),

      prompt: 'Generate a Lasagna recipe',

      schema: z.object({

        recipe: z.object({

          ingredients: z.array(z.string()),

          description: z.string(),

        }),

      }),

    });


    return Response.json(result.object);

  },

};


```


---

---
title: Workers Bindings
description: Workers provides a serverless execution environment that allows you to create new applications or augment existing ones.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Workers Bindings

## Workers

[Workers](https://developers.cloudflare.com/workers/) provides a serverless execution environment that allows you to create new applications or augment existing ones.

To use Workers AI with Workers, you must create a Workers AI [binding](https://developers.cloudflare.com/workers/runtime-apis/bindings/). Bindings allow your Workers to interact with resources, like Workers AI, on the Cloudflare Developer Platform. You create bindings on the Cloudflare dashboard or by updating your [Wrangler file](https://developers.cloudflare.com/workers/wrangler/configuration/).

To bind Workers AI to your Worker, add the following to the end of your Wrangler file:

wrangler.jsonc

```
{
  "ai": {
    "binding": "AI" // i.e. available in your Worker on env.AI
  }
}
```

wrangler.toml

```
[ai]
binding = "AI"
```

## Pages Functions

[Pages Functions](https://developers.cloudflare.com/pages/functions/) allow you to build full-stack applications with Cloudflare Pages by executing code on the Cloudflare network. Functions are Workers under the hood.

To configure a Workers AI binding in your Pages Function, you must use the Cloudflare dashboard. Refer to [Workers AI bindings](https://developers.cloudflare.com/pages/functions/bindings/#workers-ai) for instructions.

## Methods

### async env.AI.run()

`async env.AI.run()` runs a model. It takes a model name as the first parameter and an input object as the second parameter.

JavaScript

```

const answer = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {

    prompt: "What is the origin of the phrase 'Hello, World'"

});


```

**Parameters**

* `model` `string` required
   * The model to run.

**Supported options**

* `stream` `boolean` optional
   * Returns a stream of results as they are available.

JavaScript

```

const answer = await env.AI.run('@cf/meta/llama-3.1-8b-instruct', {

    prompt: "What is the origin of the phrase 'Hello, World'",

    stream: true

});


return new Response(answer, {

    headers: { "content-type": "text/event-stream" }

});


```


---

---
title: Hugging Face Chat UI
description: Use Workers AI with Chat UI, an open-source chat interface offered by Hugging Face.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Hugging Face Chat UI

Use Workers AI with [Chat UI ↗](https://github.com/huggingface/chat-ui?tab=readme-ov-file#text-embedding-models), an open-source chat interface offered by Hugging Face.

## Prerequisites

You will need the following:

* A [Cloudflare account ↗](https://dash.cloudflare.com)
* Your [Account ID](https://developers.cloudflare.com/fundamentals/account/find-account-and-zone-ids/)
* An [API token](https://developers.cloudflare.com/workers-ai/get-started/rest-api/#1-get-api-token-and-account-id) for Workers AI

## Setup

First, decide how to reference your Account ID and API token (either directly in your `.env.local` using the `CLOUDFLARE_ACCOUNT_ID` and `CLOUDFLARE_API_TOKEN` variables or in the endpoint configuration).
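For example, a minimal `.env.local` sketch (the values are placeholders for your own credentials):

```
CLOUDFLARE_ACCOUNT_ID=your-account-id
CLOUDFLARE_API_TOKEN=your-api-token
```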

Then, follow the rest of the setup instructions in the [Chat UI GitHub repository ↗](https://github.com/huggingface/chat-ui?tab=readme-ov-file#text-embedding-models).

When setting up your models, specify the `cloudflare` endpoint.

```

{

  "name" : "nousresearch/hermes-2-pro-mistral-7b",

  "tokenizer": "nousresearch/hermes-2-pro-mistral-7b",

  "parameters": {

    "stop": ["<|im_end|>"]

  },

  "endpoints" : [

    {

      "type": "cloudflare",

      // optionally specify these if not included in .env.local

      "accountId": "your-account-id",

      "apiToken": "your-api-token"

      //

    }

  ]

}


```

## Supported models

This template works with any [text generation models](https://developers.cloudflare.com/workers-ai/models/) that begin with the `@hf` prefix.


---

---
title: OpenAI compatible API endpoints
description: Workers AI supports OpenAI compatible endpoints for text generation (/v1/chat/completions) and text embedding models (/v1/embeddings). This allows you to use the same code as you would for your OpenAI commands, but swap in Workers AI easily.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# OpenAI compatible API endpoints

Workers AI supports OpenAI compatible endpoints for [text generation](https://developers.cloudflare.com/workers-ai/models/) (`/v1/chat/completions`) and [text embedding models](https://developers.cloudflare.com/workers-ai/models/) (`/v1/embeddings`). This allows you to use the same code as you would for your OpenAI commands, but swap in Workers AI easily.

  
## Usage

### Workers AI

Normally, Workers AI requires you to specify the model name in the cURL endpoint or within the `env.AI.run` function.

With OpenAI compatible endpoints, you can leverage the [openai-node sdk ↗](https://github.com/openai/openai-node) to make calls to Workers AI. This allows you to use Workers AI by simply changing the base URL and the model name.

OpenAI SDK Example

```

import OpenAI from "openai";


const openai = new OpenAI({

  apiKey: env.CLOUDFLARE_API_KEY,

  baseURL: `https://api.cloudflare.com/client/v4/accounts/${env.CLOUDFLARE_ACCOUNT_ID}/ai/v1`,

});


// Use chat completions

const chatCompletion = await openai.chat.completions.create({

  messages: [{ role: "user", content: "Make some robot noises" }],

  model: "@cf/meta/llama-3.1-8b-instruct",

});


// Use responses

const response = await openai.responses.create({

  model: "@cf/openai/gpt-oss-120b",

  input: "Talk to me about open source",

});


const embeddings = await openai.embeddings.create({

  model: "@cf/baai/bge-large-en-v1.5",

  input: "I love matcha",

});


```

cURL example

```

curl --request POST \

  --url https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/v1/chat/completions \

  --header "Authorization: Bearer {api_token}" \

  --header "Content-Type: application/json" \

  --data '

    {

      "model": "@cf/meta/llama-3.1-8b-instruct",

      "messages": [

        {

          "role": "user",

          "content": "how to build a wooden spoon in 3 short steps? give as short as answer as possible"

        }

      ]

    }

'


```

### AI Gateway

These endpoints are also compatible with [AI Gateway](https://developers.cloudflare.com/ai-gateway/usage/providers/workersai/#openai-compatible-endpoints).


---

---
title: Asynchronous Batch API
description: Asynchronous batch processing lets you send a collection (batch) of inference requests in a single call. Instead of expecting immediate responses for every request, the system queues them for processing and returns the results later.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Asynchronous Batch API

Asynchronous batch processing lets you send a collection (batch) of inference requests in a single call. Instead of expecting immediate responses for every request, the system queues them for processing and returns the results later.

Batch processing is useful for large workloads such as summarization or embeddings when there is no human interaction. Using the batch API will guarantee that your requests are fulfilled eventually, rather than erroring out if Cloudflare does not have enough capacity at a given time.

When you send a batch request, the API immediately acknowledges receipt with a status like `queued` and provides a unique `request_id`. This ID is later used to poll for the final responses once the processing is complete.
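For illustration, the acknowledgment for a queued request looks roughly like this (the `request_id` is an example value; full request and response examples are shown on the REST API and Workers Binding pages):

```
{
  "status": "queued",
  "request_id": "768f15b7-4fd6-4498-906e-ad94ffc7f8d2",
  "model": "@cf/baai/bge-m3"
}
```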

You can use the Batch API by either creating and deploying a Cloudflare Worker that leverages the [Batch API with the AI binding](https://developers.cloudflare.com/workers-ai/features/batch-api/workers-binding/), using the [REST API](https://developers.cloudflare.com/workers-ai/features/batch-api/rest-api/) directly or by starting from a [template ↗](https://github.com/craigsdennis/batch-please-workers-ai).

Note

Ensure that the total payload is under 10 MB.

## Demo application

If you want to get started quickly, click the button below:

[![Deploy to Workers](https://deploy.workers.cloudflare.com/button)](https://deploy.workers.cloudflare.com/?url=https://github.com/craigsdennis/batch-please-workers-ai)

This will create a repository in your GitHub account and deploy a ready-to-use Worker that demonstrates how to use Cloudflare's Asynchronous Batch API. The template includes preconfigured AI bindings, and examples for sending and retrieving batch requests with and without external references. Once deployed, you can visit the live Worker and start experimenting with the Batch API immediately.

## Supported Models

Refer to our [model catalog](https://developers.cloudflare.com/workers-ai/models/?capabilities=Batch) for supported models.


---

---
title: REST API
description: If you prefer to work directly with the REST API instead of a Cloudflare Worker, follow the steps below.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# REST API

If you prefer to work directly with the REST API instead of a [Cloudflare Worker](https://developers.cloudflare.com/workers-ai/features/batch-api/workers-binding/), follow the steps below:

## 1. Sending a Batch Request

Make a POST request using the following pattern. You can pass `external_reference` as a unique ID per-request that will be returned in the response.

Sending a batch request

```

curl "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/ai/run/@cf/baai/bge-m3?queueRequest=true" \

 --header "Authorization: Bearer $API_TOKEN" \

 --header 'Content-Type: application/json' \

 --json '{

    "requests": [

        {

            "query": "This is a story about Cloudflare",

            "contexts": [

                {

                    "text": "This is a story about an orange cloud"

                },

                {

                    "text": "This is a story about a llama"

                },

                {

                    "text": "This is a story about a hugging emoji"

                }

            ],

            "external_reference": "reference-1"

        }

    ]

  }'


```

```

{

  "result": {

    "status": "queued",

    "request_id": "768f15b7-4fd6-4498-906e-ad94ffc7f8d2",

    "model": "@cf/baai/bge-m3"

  },

  "success": true,

  "errors": [],

  "messages": []

}


```

## 2. Retrieving the Batch Response

After receiving a `request_id` from your initial POST, you can poll for or retrieve the results with another POST request:

Retrieving a response

```

curl "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/ai/run/@cf/baai/bge-m3?queueRequest=true" \

 --header "Authorization: Bearer $API_TOKEN" \

 --header 'Content-Type: application/json' \

 --json '{

    "request_id": "<uuid>"

  }'


```

```

{

  "result": {

    "responses": [

      {

        "id": 0,

        "result": {

          "response": [

            { "id": 0, "score": 0.73974609375 },

            { "id": 1, "score": 0.642578125 },

            { "id": 2, "score": 0.6220703125 }

          ]

        },

        "success": true,

        "external_reference": "reference-1"

      }

    ],

    "usage": { "prompt_tokens": 12, "completion_tokens": 0, "total_tokens": 12 }

  },

  "success": true,

  "errors": [],

  "messages": []

}


```


---

---
title: Workers Binding
description: You can use Workers Bindings to interact with the Batch API.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Workers Binding

You can use Workers Bindings to interact with the Batch API.

## Send a Batch request

Send your initial batch inference request by composing a JSON payload containing an array of individual inference requests and the `queueRequest: true` property (which is what controls queueing behavior).

Note

Ensure that the total payload is under 10 MB.

src/index.ts

```

export interface Env {

  AI: Ai;

}

export default {

  async fetch(request, env): Promise<Response> {

    const embeddings = await env.AI.run(

      "@cf/baai/bge-m3",

      {

        requests: [

          {

            query: "This is a story about Cloudflare",

            contexts: [

              {

                text: "This is a story about an orange cloud",

              },

              {

                text: "This is a story about a llama",

              },

              {

                text: "This is a story about a hugging emoji",

              },

            ],

          },

        ],

      },

      { queueRequest: true },

    );


    return Response.json(embeddings);

  },

} satisfies ExportedHandler<Env>;


```

```

{

  "status": "queued",

  "model": "@cf/baai/bge-m3",

  "request_id": "000-000-000"

}


```

You will get a response with the following values:

* **`status`**: Indicates that your request is queued.
* **`request_id`**: A unique identifier for the batch request.
* **`model`**: The model used for the batch inference.

Of these, the `request_id` is important for when you need to [poll the batch status](#poll-batch-status).

### Poll batch status

Once your batch request is queued, use the `request_id` to poll for its status. During processing, the API returns a status `queued` or `running` indicating that the request is still in the queue or being processed.

src/index.ts

```

export interface Env {

  AI: Ai;

}


export default {

  async fetch(request, env): Promise<Response> {

    const status = await env.AI.run("@cf/baai/bge-m3", {

      request_id: "000-000-000",

    });


    return Response.json(status);

  },

} satisfies ExportedHandler<Env>;


```

```

{

  "responses": [

    {

      "id": 0,

      "result": {

        "response": [

          { "id": 0, "score": 0.73974609375 },

          { "id": 1, "score": 0.642578125 },

          { "id": 2, "score": 0.6220703125 }

        ]

      },

      "success": true,

      "external_reference": "reference-1"

    }

  ],

  "usage": { "prompt_tokens": 12, "completion_tokens": 0, "total_tokens": 12 }

}


```

When the inference is complete, the API returns a final HTTP status code of `200` along with an array of responses. Each response object corresponds to an individual input prompt, identified by an `id` that maps to the index of the prompt in your original request.


---

---
title: Fine-tunes
description: Learn how to use Workers AI to get fine-tuned inference.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Fine-tunes

Learn how to use Workers AI to get fine-tuned inference.

### Fine-tuned inference with LoRAs

Upload a LoRA adapter and run fine-tuned inference with one of our base models.

[ Run inference with LoRAs ](https://developers.cloudflare.com/workers-ai/features/fine-tunes/loras/) 

---

## What is fine-tuning?

Fine-tuning is a general term for modifying an AI model by continuing to train it with additional data. The goal of fine-tuning is to increase the probability that a generation is similar to your dataset. Training a model from scratch is not practical for many use cases given how expensive and time-consuming it can be. By fine-tuning an existing pre-trained model, you benefit from its capabilities while also accomplishing your desired task.

[Low-Rank Adaptation ↗](https://arxiv.org/abs/2106.09685) (LoRA) is a specific fine-tuning method that can be applied to various model architectures, not just LLMs. In traditional fine-tuning methods, the pre-trained model weights are commonly modified directly or fused with additional fine-tuned weights. LoRA, on the other hand, allows the fine-tuned weights and the pre-trained model to remain separate, and the pre-trained model to remain unchanged. The end result is that you can train models to be more accurate at specific tasks, such as generating code, having a specific personality, or generating images in a specific style.
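As a rough sketch of the idea (the notation is illustrative and not part of the Workers AI docs), the standard LoRA formulation keeps the pre-trained weight matrix `W` frozen and learns a small low-rank correction that is added on top at inference time:

```
W' = W + ΔW,   where ΔW = (α / r) · B · A
B has shape (d × r), A has shape (r × k), and the rank r ≪ min(d, k)
```

Because only `A` and `B` are trained and stored, the adapter stays small and can be swapped at inference time without touching the base model.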


---

---
title: Using LoRA adapters
description: Upload and use LoRA adapters to get fine-tuned inference on Workers AI.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Using LoRA adapters

Workers AI supports fine-tuned inference with adapters trained with [Low-Rank Adaptation ↗](https://blog.cloudflare.com/fine-tuned-inference-with-loras). This feature is in open beta and free during this period.

## Limitations

* We only support LoRAs for a [variety of models](https://developers.cloudflare.com/workers-ai/models/?capabilities=LoRA) (they must not be quantized)
* Adapters must be trained with rank `r <= 8`, though larger ranks of up to 32 are also supported. You can check the rank of a pre-trained LoRA adapter through the adapter's `config.json` file (see the example after this list)
* LoRA adapter file must be < 300MB
* LoRA adapter files must be named `adapter_config.json` and `adapter_model.safetensors` exactly
* You can test up to 100 LoRA adapters per account
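As an illustration, in a PEFT-style `adapter_config.json` the rank shows up as the `r` field (the values below are placeholders and only a few fields are shown):

```
{
  "r": 8,
  "lora_alpha": 16,
  "target_modules": ["q_proj", "v_proj"],
  "task_type": "CAUSAL_LM"
}
```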

---

## Choosing compatible LoRA adapters

### Finding open-source LoRA adapters

We have started a [Hugging Face Collection ↗](https://huggingface.co/collections/Cloudflare/workers-ai-compatible-loras-6608dd9f8d305a46e355746e) that lists a few LoRA adapters that are compatible with Workers AI. Generally, any LoRA adapter that fits our limitations above should work.

### Training your own LoRA adapters

To train your own LoRA adapter, follow the [tutorial](https://developers.cloudflare.com/workers-ai/guides/tutorials/fine-tune-models-with-autotrain/).

---

## Uploading LoRA adapters

In order to run inference with LoRAs on Workers AI, you'll need to create a new fine tune on your account and upload your adapter files. You should have an `adapter_model.safetensors` file with model weights and an `adapter_config.json` with your config information. _Note that we only accept adapter files in these formats._

Right now, you can't edit a fine tune's asset files after you upload it. We will support this soon, but for now you will need to create a new fine tune and upload files again if you would like to use a new LoRA.

Before you upload your LoRA adapter, you'll need to edit your `adapter_config.json` file to include `model_type` as one of `mistral`, `gemma` or `llama` like below.

```

{

  "alpha_pattern": {},

  "auto_mapping": null,

  ...

  "target_modules": [

    "q_proj",

    "v_proj"

  ],

  "task_type": "CAUSAL_LM",

  "model_type": "mistral",

}


```

### Wrangler

You can create a finetune and upload your LoRA adapter via wrangler with the following commands:

wrangler CLI

```

npx wrangler ai finetune create <model_name> <finetune_name> <folder_path>

#🌀 Creating new finetune "test-lora" for model "@cf/mistral/mistral-7b-instruct-v0.2-lora"...

#🌀 Uploading file "/Users/abcd/Downloads/adapter_config.json" to "test-lora"...

#🌀 Uploading file "/Users/abcd/Downloads/adapter_model.safetensors" to "test-lora"...

#✅ Assets uploaded, finetune "test-lora" is ready to use.


npx wrangler ai finetune list

┌──────────────────────────────────────┬─────────────────┬─────────────┐

│ finetune_id                          │ name            │ description │

├──────────────────────────────────────┼─────────────────┼─────────────┤

│ 00000000-0000-0000-0000-000000000000 │ test-lora       │             │

└──────────────────────────────────────┴─────────────────┴─────────────┘


```

### REST API

Alternatively, you can use our REST API to create a finetune and upload your adapter files. You will need a Cloudflare API Token with `Workers AI: Edit` permissions to make calls to our REST API, which you can generate via the Cloudflare Dashboard.

#### Creating a fine-tune on your account

Required API token permissions

At least one of the following [token permissions](https://developers.cloudflare.com/fundamentals/api/reference/permissions/) is required:
* `Workers AI Write`

Create a new Finetune

```

curl "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/ai/finetunes" \

  --request POST \

  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \

  --json '{

    "model": "SUPPORTED_MODEL_NAME",

    "name": "FINETUNE_NAME",

    "description": "OPTIONAL_DESCRIPTION"

  }'


```

#### Uploading your adapter weights and config

You have to call the upload endpoint each time you want to upload a new file, so you usually run this once for `adapter_model.safetensors` and once for `adapter_config.json`. Make sure you include the `@` before your path to files.

You can either use the finetune `name` or `id` that you used when you created the fine tune.

cURL

```

## Input: finetune_id, adapter_model.safetensors, then adapter_config.json

## Output: success true/false


curl -X POST https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/finetunes/{FINETUNE_ID}/finetune-assets/ \

    -H 'Authorization: Bearer {API_TOKEN}' \

    -H 'Content-Type: multipart/form-data' \

    -F 'file_name=adapter_model.safetensors' \

    -F 'file=@{PATH/TO/adapter_model.safetensors}'


```
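Following the same pattern, a sketch of the second upload call for `adapter_config.json` (only the `file_name` and `file` fields change):

```
curl -X POST https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/finetunes/{FINETUNE_ID}/finetune-assets/ \
    -H 'Authorization: Bearer {API_TOKEN}' \
    -H 'Content-Type: multipart/form-data' \
    -F 'file_name=adapter_config.json' \
    -F 'file=@{PATH/TO/adapter_config.json}'
```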

#### List fine-tunes in your account

You can call this method to confirm which fine-tunes you have created in your account.

Required API token permissions

At least one of the following [token permissions](https://developers.cloudflare.com/fundamentals/api/reference/permissions/) is required:
* `Workers AI Write`
* `Workers AI Read`

List Finetunes

```

curl "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/ai/finetunes" \

  --request GET \

  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN"


```

```

{

  "success": true,

  "result": [

    [

      {

        "id": "00000000-0000-0000-0000-000000000",

        "model": "@cf/meta-llama/llama-2-7b-chat-hf-lora",

        "name": "llama2-finetune",

        "description": "test"

      },

      {

        "id": "00000000-0000-0000-0000-000000000",

        "model": "@cf/mistralai/mistral-7b-instruct-v0.2-lora",

        "name": "mistral-finetune",

        "description": "test"

      }

    ]

  ]

}


```

---

## Running inference with LoRAs

To make inference requests and apply the LoRA adapter, you will need your model and finetune `name` or `id`. You should use the chat template that your LoRA was trained on, but you can try running it with `raw: true` and the messages template like below.


JavaScript

```

const response = await env.AI.run(

  "@cf/mistralai/mistral-7b-instruct-v0.2-lora", //the model supporting LoRAs

  {

    messages: [{ role: "user", content: "Hello world" }],

    raw: true, //skip applying the default chat template

    lora: "00000000-0000-0000-0000-000000000", //the finetune id OR name

  },

);


```

Terminal window

```

curl https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/mistral/mistral-7b-instruct-v0.2-lora \

  -H 'Authorization: Bearer {API_TOKEN}' \

  -d '{

    "messages": [{"role": "user", "content": "Hello world"}],

    "raw": "true",

    "lora": "00000000-0000-0000-0000-000000000"

  }'


```


---

---
title: Public LoRA adapters
description: Cloudflare offers a few public LoRA adapters that are immediately ready for use.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Public LoRA adapters

Cloudflare offers a few public LoRA adapters that are ready to use for fine-tuned inference. You can try them out immediately via our [playground ↗](https://playground.ai.cloudflare.com).

Public LoRAs have names of the form `cf-public-x`, and the `cf-public-` prefix is reserved for Cloudflare.

Note

Have more LoRAs you would like to see? Let us know on [Discord ↗](https://discord.cloudflare.com).

| Name                                                                         | Description                        | Compatible with                                                           |
| ---------------------------------------------------------------------------- | ---------------------------------- | ------------------------------------------------------------------------- |
| [cf-public-magicoder ↗](https://huggingface.co/predibase/magicoder)          | Coding tasks in multiple languages | @cf/mistral/mistral-7b-instruct-v0.1 @hf/mistral/mistral-7b-instruct-v0.2 |
| [cf-public-jigsaw-classification ↗](https://huggingface.co/predibase/jigsaw) | Toxic comment classification       | @cf/mistral/mistral-7b-instruct-v0.1 @hf/mistral/mistral-7b-instruct-v0.2 |
| [cf-public-cnn-summarization ↗](https://huggingface.co/predibase/cnn)        | Article summarization              | @cf/mistral/mistral-7b-instruct-v0.1 @hf/mistral/mistral-7b-instruct-v0.2 |

You can also list these public LoRAs with an API call:

Required API token permissions

At least one of the following [token permissions](https://developers.cloudflare.com/fundamentals/api/reference/permissions/) is required:
* `Workers AI Write`
* `Workers AI Read`

List Public Finetunes

```

curl "https://api.cloudflare.com/client/v4/accounts/$ACCOUNT_ID/ai/finetunes/public" \

  --request GET \

  --header "Authorization: Bearer $CLOUDFLARE_API_TOKEN"


```

## Running inference with public LoRAs

To run inference with public LoRAs, you just need to define the LoRA name in the request.

We recommend that you use the prompt template that the LoRA was trained on. You can find this in the HuggingFace repos linked above for each adapter.

### cURL

Terminal window

```

curl https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/@cf/mistral/mistral-7b-instruct-v0.1 \

  --header 'Authorization: Bearer {cf_token}' \

  --data '{

    "messages": [

      {

        "role": "user",

        "content": "Write a python program to check if a number is even or odd."

      }

    ],

    "lora": "cf-public-magicoder"

  }'


```

### JavaScript

JavaScript

```

const answer = await env.AI.run("@cf/mistral/mistral-7b-instruct-v0.1", {

  stream: true,

  raw: true,

  messages: [

    {

      role: "user",

      content:

        "Summarize the following: Some newspapers, TV channels and well-known companies publish false news stories to fool people on 1 April. One of the earliest examples of this was in 1957 when a programme on the BBC, the UKs national TV channel, broadcast a report on how spaghetti grew on trees. The film showed a family in Switzerland collecting spaghetti from trees and many people were fooled into believing it, as in the 1950s British people didnt eat much pasta and many didnt know how it was made! Most British people wouldnt fall for the spaghetti trick today, but in 2008 the BBC managed to fool their audience again with their Miracles of Evolution trailer, which appeared to show some special penguins that had regained the ability to fly. Two major UK newspapers, The Daily Telegraph and the Daily Mirror, published the important story on their front pages.",

    },

  ],

  lora: "cf-public-cnn-summarization",

});


```


---

---
title: Function calling
description: Function calling enables people to take Large Language Models (LLMs) and use the model response to execute functions or interact with external APIs. The developer usually defines a set of functions and the required input schema for each function, which we call tools. The model then intelligently understands when it needs to do a tool call, and it returns a JSON output which the user needs to feed to another function or API.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Function calling

Function calling enables people to take Large Language Models (LLMs) and use the model response to execute functions or interact with external APIs. The developer usually defines a set of functions and the required input schema for each function, which we call `tools`. The model then intelligently understands when it needs to do a tool call, and it returns a JSON output which the user needs to feed to another function or API.

In essence, function calling allows you to perform actions with LLMs by executing code or making additional API calls.

## How can I use function calling?

Workers AI has [embedded function calling](https://developers.cloudflare.com/workers-ai/features/function-calling/embedded/) which allows you to execute function code alongside your inference calls. We have a package called [@cloudflare/ai-utils ↗](https://www.npmjs.com/package/@cloudflare/ai-utils) to help facilitate this, which we have open-sourced on [Github ↗](https://github.com/cloudflare/ai-utils).

For industry-standard function calling, take a look at the documentation on [Traditional Function Calling](https://developers.cloudflare.com/workers-ai/features/function-calling/traditional/).

To show you the value of embedded function calling, take a look at the example below that compares traditional function calling with embedded function calling. Embedded function calling allowed us to cut down the lines of code from 77 to 31.


Terminal window

```

# The ai-utils package enables embedded function calling

npm i @cloudflare/ai-utils


```

Embedded function calling example

```

import {

  createToolsFromOpenAPISpec,

  runWithTools,

  autoTrimTools,

} from "@cloudflare/ai-utils";


export default {

  async fetch(request, env, ctx) {

    const response = await runWithTools(

      env.AI,

      "@hf/nousresearch/hermes-2-pro-mistral-7b",

      {

        messages: [{ role: "user", content: "Who is Cloudflare on github?" }],

        tools: [

          // You can pass the OpenAPI spec link or contents directly

          ...(await createToolsFromOpenAPISpec(

            "https://gist.githubusercontent.com/mchenco/fd8f20c8f06d50af40b94b0671273dc1/raw/f9d4b5cd5944cc32d6b34cad0406d96fd3acaca6/partial_api.github.com.json",

            {

              overrides: [

                {

                  // for all requests on *.github.com, we'll need to add a User-Agent.

                  matcher: ({ url, method }) => {

                    return url.hostname === "api.github.com";

                  },

                  values: {

                    headers: {

                      "User-Agent":

                        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36",

                    },

                  },

                },

              ],

            },

          )),

        ],

      },

    ).then((response) => {

      return response;

    });


    return new Response(JSON.stringify(response));

  },

};


```

Traditional function calling example

```

export default {

  async fetch(request, env, ctx) {

    const response = await env.AI.run(

      "@hf/nousresearch/hermes-2-pro-mistral-7b",

      {

        messages: [{ role: "user", content: "Who is Cloudflare on GitHub?" }],

        tools: [

          {

            name: "getGithubUser",

            description:

              "Provides publicly available information about someone with a GitHub account.",

            parameters: {

              type: "object",

              properties: {

                username: {

                  type: "string",

                  description: "The handle for the GitHub user account.",

                },

              },

              required: ["username"],

            },

          },

        ],

      },

    );


    const selected_tool = response.tool_calls[0];

    let res;


    if (selected_tool.name == "getGithubUser") {

      try {

        const username = selected_tool.arguments.username;

        const url = `https://api.github.com/users/${username}`;

        res = await fetch(url, {

          headers: {

            // Github API requires a User-Agent header

            "User-Agent":

              "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36",

          },

        }).then((res) => res.json());

      } catch (error) {

        return error;

      }

    }


    const finalResponse = await env.AI.run(

      "@hf/nousresearch/hermes-2-pro-mistral-7b",

      {

        messages: [

          {

            role: "user",

            content: "Who is Cloudflare on GitHub?",

          },

          {

            role: "assistant",

            content: JSON.stringify(selected_tool),

          },

          {

            role: "tool",

            content: JSON.stringify(res),

          },

        ],

        tools: [

          {

            name: "getGithubUser",

            description:

              "Provides publicly available information about someone with a GitHub account.",

            parameters: {

              type: "object",

              properties: {

                username: {

                  type: "string",

                  description: "The handle for the GitHub user account.",

                },

              },

              required: ["username"],

            },

          },

        ],

      },

    );

    return new Response(JSON.stringify(finalResponse));

  },

};


```

## What models support function calling?

There are open-source models which have been fine-tuned to do function calling. When browsing our [model catalog](https://developers.cloudflare.com/workers-ai/models/), look for models with the function calling property beside it. For example, [@hf/nousresearch/hermes-2-pro-mistral-7b](https://developers.cloudflare.com/workers-ai/models/hermes-2-pro-mistral-7b/) is a fine-tuned variant of Mistral 7B that you can use for function calling.


---

---
title: Embedded
description: Cloudflare has a unique embedded function calling feature that allows you to execute function code alongside your tool call inference. Our npm package @cloudflare/ai-utils is the developer toolkit to get started.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Embedded

Cloudflare has a unique [embedded function calling ↗](https://blog.cloudflare.com/embedded-function-calling) feature that allows you to execute function code alongside your tool call inference. Our npm package [@cloudflare/ai-utils ↗](https://www.npmjs.com/package/@cloudflare/ai-utils) is the developer toolkit to get started.

Embedded function calling can be used to easily make complex agents that interact with websites and APIs, like using natural language to create meetings on Google Calendar, saving data to Notion, automatically routing requests to other APIs, saving data to an R2 bucket - or all of this at the same time. All you need is a prompt and an OpenAPI spec to get started.

REST API support

Embedded function calling depends on features native to the Workers platform. This means that embedded function calling is only supported via [Cloudflare Workers](https://developers.cloudflare.com/workers-ai/get-started/workers-wrangler/), not via the [REST API](https://developers.cloudflare.com/workers-ai/get-started/rest-api/).

## Resources

* [ Get Started ](https://developers.cloudflare.com/workers-ai/features/function-calling/embedded/get-started/)
* [ Examples ](https://developers.cloudflare.com/workers-ai/features/function-calling/embedded/examples/)
* [ API Reference ](https://developers.cloudflare.com/workers-ai/features/function-calling/embedded/api-reference/)
* [ Troubleshooting ](https://developers.cloudflare.com/workers-ai/features/function-calling/embedded/troubleshooting/)


---

---
title: API Reference
description: Learn more about the API reference for embedded function calling.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# API Reference

Learn more about the API reference for [embedded function calling](https://developers.cloudflare.com/workers-ai/features/function-calling/embedded).

## runWithTools

This wrapper method enables you to do embedded function calling. You pass it the AI binding, model, inputs (`messages` array and `tools` array), and optional configurations.

* `AI Binding`Ai  
   * The AI binding, such as `env.AI`.
* `model`BaseAiTextGenerationModels  
   * The ID of the model that supports function calling. For example, `@hf/nousresearch/hermes-2-pro-mistral-7b`.
* `input`Object  
   * `messages`RoleScopedChatInput\[\]  
   * `tools`AiTextGenerationToolInputWithFunction\[\]
* `config`Object  
   * `streamFinalResponse`boolean optional  
   * `maxRecursiveToolRuns`number optional  
   * `strictValidation`boolean optional  
   * `verbose`boolean optional  
   * `trimFunction`optional - You can pass `autoTrimTools` here, a helper method we've devised that automatically chooses the correct tools (using an LLM) before sending the request off for inference. This means that your final inference call will have fewer input tokens.
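
A brief sketch of how these parameters fit together, mirroring the Get Started example elsewhere in these docs (the `sum` tool and the config values shown are illustrative, not required):

TypeScript

```
import { runWithTools, autoTrimTools } from "@cloudflare/ai-utils";

// Inside a Worker's fetch handler, where `env.AI` is the AI binding.
// The `sum` tool and the config values below are illustrative assumptions.
const sum = (args: { a: number; b: number }): Promise<string> =>
  Promise.resolve((args.a + args.b).toString());

const response = await runWithTools(
  env.AI,
  "@hf/nousresearch/hermes-2-pro-mistral-7b",
  {
    messages: [{ role: "user", content: "What is 8 + 13?" }],
    tools: [
      {
        name: "sum",
        description: "Sum up two numbers and return the result",
        parameters: {
          type: "object",
          properties: {
            a: { type: "number", description: "the first number" },
            b: { type: "number", description: "the second number" },
          },
          required: ["a", "b"],
        },
        // Reference to the function executed when the model calls the tool
        function: sum,
      },
    ],
  },
  {
    // Optional config (illustrative values)
    streamFinalResponse: false,
    maxRecursiveToolRuns: 1,
    trimFunction: autoTrimTools,
    verbose: true,
  },
);
```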

## createToolsFromOpenAPISpec

This method lets you automatically create tool schemas based on OpenAPI specs, so you don't have to manually write or hardcode the tool schemas. You can pass the OpenAPI spec for any API in JSON or YAML format.

`createToolsFromOpenAPISpec` has a config input that allows you to perform overrides if you need to provide headers like Authentication or User-Agent.

* `spec`string  
   * The OpenAPI specification in either JSON or YAML format, or a URL to a remote OpenAPI specification.
* `config`Config optional - Configuration options for the createToolsFromOpenAPISpec function  
   * `overrides`ConfigRule\[\] optional  
   * `matchPatterns`RegExp\[\] optional  
   * `options`Object optional (`verbose`boolean optional)
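
A short sketch of a typical call, following the embedded function calling examples in these docs (the spec URL, hostname, and header value are illustrative placeholders):

TypeScript

```
import { createToolsFromOpenAPISpec } from "@cloudflare/ai-utils";

// Build tool schemas from a (hypothetical) OpenAPI spec URL and attach a
// User-Agent header to every request that matches the override.
const tools = await createToolsFromOpenAPISpec(
  "https://example.com/openapi.json", // URL or the raw spec contents
  {
    overrides: [
      {
        matcher: ({ url }) => url.hostname === "api.example.com",
        values: {
          headers: { "User-Agent": "my-example-worker" },
        },
      },
    ],
  },
);
```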


---

---
title: Use fetch() handler
description: Learn how to use the fetch() handler in Cloudflare Workers AI to enable LLMs to perform API calls, like retrieving a 5-day weather forecast using function calling.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Use fetch() handler


A very common use case is to provide the LLM with the ability to perform API calls via function calling.

In this example the LLM will retrieve the weather forecast for the next 5 days. To do so, a `getWeather` function is defined and passed to the LLM as a tool.

The `getWeather` function extracts the user's location from the request, calls the external weather API via the Workers [Fetch API](https://developers.cloudflare.com/workers/runtime-apis/fetch/), and returns the result.

Embedded function calling example with fetch()

```

import { runWithTools } from '@cloudflare/ai-utils';


type Env = {

  AI: Ai;

};


export default {

  async fetch(request, env, ctx) {

    // Define function

    const getWeather = async (args: { numDays: number }) => {

      const { numDays } = args;

      // Location is extracted from request based on

      // https://developers.cloudflare.com/workers/runtime-apis/request/#incomingrequestcfproperties

      const lat = request.cf?.latitude

      const long = request.cf?.longitude


      // Interpolate values for external API call

      const response = await fetch(

        `https://api.open-meteo.com/v1/forecast?latitude=${lat}&longitude=${long}&daily=temperature_2m_max,precipitation_sum&timezone=GMT&forecast_days=${numDays}`

      );

      return response.text();

    };

    // Run AI inference with function calling

    const response = await runWithTools(

      env.AI,

      // Model with function calling support

      '@hf/nousresearch/hermes-2-pro-mistral-7b',

      {

        // Messages

        messages: [

          {

            role: 'user',

            content: 'What is the weather like for the next 5 days? Respond as text',

          },

        ],

        // Definition of available tools the AI model can leverage

        tools: [

          {

            name: 'getWeather',

            description: 'Get the weather for the next [numDays] days',

            parameters: {

              type: 'object',

              properties: {

                numDays: { type: 'number', description: 'number of days for the weather forecast' },

              },

              required: ['numDays'],

            },

            // reference to previously defined function

            function: getWeather,

          },

        ],

      }

    );

    return new Response(JSON.stringify(response));

  },

} satisfies ExportedHandler<Env>;


```


---

---
title: Use KV API
description: Learn how to use Cloudflare Workers AI to interact with KV storage, enabling persistent data handling with embedded function calling in a few lines of code.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Use KV API

Interacting with persistent storage to retrieve or store information enables powerful use cases.

In this example we show how embedded function calling can interact with other resources on the Cloudflare Developer Platform with a few lines of code.

## Prerequisites

For this example to work, you need to provision a [KV](https://developers.cloudflare.com/kv/) namespace first. To do so, follow the [KV - Get started ](https://developers.cloudflare.com/kv/get-started/) guide.

Importantly, your Wrangler file must be updated to include a `KV` binding definition that points to your namespace.
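
A minimal sketch of the relevant Wrangler configuration (the binding name `KV` matches the Worker code below; the namespace ID is a placeholder you receive when creating the namespace):

wrangler.toml

```
# Binding name must match what the Worker code uses (env.KV below)
kv_namespaces = [
  { binding = "KV", id = "<YOUR_KV_NAMESPACE_ID>" }
]
```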

## Worker code

Embedded function calling example with KV API

```

import { runWithTools } from "@cloudflare/ai-utils";


type Env = {

  AI: Ai;

  KV: KVNamespace;

};


export default {

  async fetch(request, env, ctx) {

    // Define function

    const updateKvValue = async ({

      key,

      value,

    }: {

      key: string;

      value: string;

    }) => {

      await env.KV.put(key, value);

      return `Successfully updated key-value pair in database: ${key}`;

    };


    // Run AI inference with function calling

    const response = await runWithTools(

      env.AI,

      "@hf/nousresearch/hermes-2-pro-mistral-7b",

      {

        messages: [

          { role: "system", content: "Put user given values in KV" },

          { role: "user", content: "Set the value of banana to yellow." },

        ],

        tools: [

          {

            name: "KV update",

            description: "Update a key-value pair in the database",

            parameters: {

              type: "object",

              properties: {

                key: {

                  type: "string",

                  description: "The key to update",

                },

                value: {

                  type: "string",

                  description: "The value to update",

                },

              },

              required: ["key", "value"],

            },

            function: updateKvValue,

          },

        ],

      },

    );

    return new Response(JSON.stringify(response));

  },

} satisfies ExportedHandler<Env>;


```

## Verify results

To verify the results, run the following command:

Terminal window

```

npx wrangler kv key get banana --binding KV --local


```


---

---
title: Tools based on OpenAPI Spec
description: Oftentimes APIs are defined and documented via OpenAPI specification. The Cloudflare ai-utils package's createToolsFromOpenAPISpec function creates tools from the OpenAPI spec, which the LLM can then leverage to fulfill the prompt.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Tools based on OpenAPI Spec

Oftentimes APIs are defined and documented via [OpenAPI specification ↗](https://swagger.io/specification/). The Cloudflare `ai-utils` package's `createToolsFromOpenAPISpec` function creates tools from the OpenAPI spec, which the LLM can then leverage to fulfill the prompt.

In this example the LLM will describe a GitHub user, based on GitHub's API and its OpenAPI spec.

Embedded function calling example from OpenAPI Spec

```

import { createToolsFromOpenAPISpec, runWithTools } from '@cloudflare/ai-utils';


type Env = {

  AI: Ai;

};


const APP_NAME = 'cf-fn-calling-example-app';


export default {

  async fetch(request, env, ctx) {

    const toolsFromOpenAPISpec = [

      // You can pass the OpenAPI spec link or contents directly

      ...(await createToolsFromOpenAPISpec(

        'https://gist.githubusercontent.com/mchenco/fd8f20c8f06d50af40b94b0671273dc1/raw/f9d4b5cd5944cc32d6b34cad0406d96fd3acaca6/partial_api.github.com.json',

        {

          overrides: [

            {

              matcher: ({ url }) => {

                return url.hostname === 'api.github.com';

              },

              // for all requests on *.github.com, we'll need to add a User-Agent.

              values: {

                headers: {

                  'User-Agent': APP_NAME,

                },

              },

            },

          ],

        }

      )),

    ];


    const response = await runWithTools(

      env.AI,

      '@hf/nousresearch/hermes-2-pro-mistral-7b',

      {

        messages: [

          {

            role: 'user',

            content: 'Who is cloudflare on Github and how many repos does the organization have?',

          },

        ],

        tools: toolsFromOpenAPISpec,

      }

    );


    return new Response(JSON.stringify(response));

  },

} satisfies ExportedHandler<Env>;


```


---

---
title: Get Started
description: This guide will instruct you through setting up and deploying your first Workers AI project with embedded function calling. You will use Workers, a Workers AI binding, the ai-utils package, and a large language model (LLM) to deploy your first AI-powered application on the Cloudflare global network with embedded function calling.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Get Started

This guide will instruct you through setting up and deploying your first Workers AI project with embedded function calling. You will use Workers, a Workers AI binding, the [ai-utils package ↗](https://github.com/cloudflare/ai-utils), and a large language model (LLM) to deploy your first AI-powered application on the Cloudflare global network with embedded function calling.

## 1\. Create a Worker project with Workers AI

Follow the [Workers AI Get Started Guide](https://developers.cloudflare.com/workers-ai/get-started/workers-wrangler/) until step 2.

## 2\. Install additional npm package

Next, run the following command in your project repository to install the Workers AI utilities package.

 npm  yarn  pnpm  bun 

```
npm i @cloudflare/ai-utils
```

```
yarn add @cloudflare/ai-utils
```

```
pnpm add @cloudflare/ai-utils
```

```
bun add @cloudflare/ai-utils
```

## 3\. Add Workers AI Embedded function calling

Update the `index.js` or `index.ts` file in your application directory with the following code:


index.js

```

import { runWithTools } from "@cloudflare/ai-utils";


export default {

  async fetch(request, env, ctx) {

    // Define function

    const sum = (args) => {

      const { a, b } = args;

      return Promise.resolve((a + b).toString());

    };

    // Run AI inference with function calling

    const response = await runWithTools(

      env.AI,

      // Model with function calling support

      "@hf/nousresearch/hermes-2-pro-mistral-7b",

      {

        // Messages

        messages: [

          {

            role: "user",

            content: "What the result of 123123123 + 10343030?",

          },

        ],

        // Definition of available tools the AI model can leverage

        tools: [

          {

            name: "sum",

            description: "Sum up two numbers and returns the result",

            parameters: {

              type: "object",

              properties: {

                a: { type: "number", description: "the first number" },

                b: { type: "number", description: "the second number" },

              },

              required: ["a", "b"],

            },

            // reference to previously defined function

            function: sum,

          },

        ],

      },

    );

    return new Response(JSON.stringify(response));

  },

};


```

index.ts

```

import { runWithTools } from "@cloudflare/ai-utils";


type Env = {

  AI: Ai;

};


export default {

  async fetch(request, env, ctx) {

    // Define function

    const sum = (args: { a: number; b: number }): Promise<string> => {

      const { a, b } = args;

      return Promise.resolve((a + b).toString());

    };

    // Run AI inference with function calling

    const response = await runWithTools(

      env.AI,

      // Model with function calling support

      "@hf/nousresearch/hermes-2-pro-mistral-7b",

      {

        // Messages

        messages: [

          {

            role: "user",

            content: "What the result of 123123123 + 10343030?",

          },

        ],

        // Definition of available tools the AI model can leverage

        tools: [

          {

            name: "sum",

            description: "Sum up two numbers and returns the result",

            parameters: {

              type: "object",

              properties: {

                a: { type: "number", description: "the first number" },

                b: { type: "number", description: "the second number" },

              },

              required: ["a", "b"],

            },

            // reference to previously defined function

            function: sum,

          },

        ],

      },

    );

    return new Response(JSON.stringify(response));

  },

} satisfies ExportedHandler<Env>;


```

This example imports the utils with `import { runWithTools } from "@cloudflare/ai-utils"` and follows the API reference below.

Moreover, in this example we define and describe a list of tools that the LLM can leverage to respond to the user query. Here, the list contains only one tool, the `sum` function.

Abstracted by the `runWithTools` function, the following steps occur:

```mermaid
sequenceDiagram
    participant Worker as Worker
    participant WorkersAI as Workers AI

    Worker->>+WorkersAI: Send messages, function calling prompt, and available tools
    WorkersAI->>+Worker: Select tools and arguments for function calling
    Worker-->>-Worker: Execute function
    Worker-->>+WorkersAI: Send messages, function calling prompt and function result
    WorkersAI-->>-Worker: Send response incorporating function output
```

The `ai-utils` package is also open-sourced on [GitHub ↗](https://github.com/cloudflare/ai-utils).

## 4\. Local development & deployment

Follow steps 4 and 5 of the [Workers AI Get Started Guide](https://developers.cloudflare.com/workers-ai/get-started/workers-wrangler/) for local development and deployment.

Workers AI Embedded Function Calling charges

Embedded function calling runs Workers AI inference requests. Standard inference charges (for example, for tokens used) apply. Resources consumed (for example, CPU time) during an embedded function's code execution are charged just like any other Worker's code execution.

## API reference

For more details, refer to [API reference](https://developers.cloudflare.com/workers-ai/features/function-calling/embedded/api-reference/).


---

---
title: Troubleshooting
description: This section will describe tools for troubleshooting and address common errors.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Troubleshooting

This section will describe tools for troubleshooting and address common errors.

## Logging

General [logging](https://developers.cloudflare.com/workers/observability/logs/) capabilities for Workers also apply to embedded function calling.

### Function invocations

The invocations of tools can be logged as in any Worker using `console.log()`:

Logging tool invocations

```

export default {

  async fetch(request, env, ctx) {

    const sum = (args: { a: number; b: number }): Promise<string> => {

      const { a, b } = args;

      // Logging from within embedded function invocations

      console.log(`The sum function has been invoked with the arguments a: ${a} and b: ${b}`)

      return Promise.resolve((a + b).toString());

    };

    ...

  }

}


```

### Logging within `runWithTools`

The `runWithTools` function has a `verbose` mode that emits helpful logs for debugging function calls, as well as input and output statistics.

Enabled verbose mode

```

const response = await runWithTools(

  env.AI,

  '@hf/nousresearch/hermes-2-pro-mistral-7b',

  {

    messages: [

      ...

    ],

    tools: [

      ...

    ],

  },

  // Enable verbose mode

  { verbose: true }

);


```

## Performance

Responding to an LLM prompt with embedded function calling can require multiple AI inference requests and function invocations, which can have an impact on user experience.

Consider the following to improve performance:

* Shorten prompts (to reduce time for input processing)
* Reduce number of tools provided
* Stream the final response to the end user (to minimize the time to interaction). See example below:

Streamed response example

```

async fetch(request, env, ctx) {

  const response = (await runWithTools(

    env.AI,

    '@hf/nousresearch/hermes-2-pro-mistral-7b',

    {

      messages: [

        ...

      ],

      tools: [

        ...

      ],

    },

    {

      // Enable response streaming

      streamFinalResponse: true,

    }

  )) as ReadableStream;


  // Set response headers for streaming

  return new Response(response, {

    headers: {

      'content-type': 'text/event-stream',

    },

  });

}


```

## Common Errors

If you are getting a `BadInput` error, your inputs may exceed our current context window for our models. Try reducing input tokens to resolve this error.


---

---
title: Traditional
description: This page shows how you can do traditional function calling, as defined by industry standards. Workers AI also offers embedded function calling, which is drastically easier than traditional function calling.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Traditional

This page shows how you can do traditional function calling, as defined by industry standards. Workers AI also offers [embedded function calling](https://developers.cloudflare.com/workers-ai/features/function-calling/embedded/), which is drastically easier than traditional function calling.

With traditional function calling, you define an array of tools with the name, description, and tool arguments. The example below shows how you would pass a tool called `getWeather` in an inference request to a model.

Traditional function calling example

```

const response = await env.AI.run("@hf/nousresearch/hermes-2-pro-mistral-7b", {

  messages: [

    {

      role: "user",

      content: "what is the weather in london?",

    },

  ],

  tools: [

    {

      name: "getWeather",

      description: "Return the weather for a latitude and longitude",

      parameters: {

        type: "object",

        properties: {

          latitude: {

            type: "string",

            description: "The latitude for the given location",

          },

          longitude: {

            type: "string",

            description: "The longitude for the given location",

          },

        },

        required: ["latitude", "longitude"],

      },

    },

  ],

});


return new Response(JSON.stringify(response.tool_calls));


```

The LLM will then return a JSON object with the required arguments and the name of the tool that was called. You can then pass this JSON object to make an API call.

```

[

  {

    "arguments": { "latitude": "51.5074", "longitude": "-0.1278" },

    "name": "getWeather"

  }

]


```
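
What you do with that JSON is up to your application; below is a brief sketch of dispatching the returned tool call (the Open-Meteo URL is an illustrative placeholder for whatever weather API you use):

JavaScript

```
// Execute the tool the model selected, then use the result.
const toolCall = response.tool_calls[0];

if (toolCall.name === "getWeather") {
  const { latitude, longitude } = toolCall.arguments;
  // Illustrative external API call; substitute the weather API of your choice.
  const forecast = await fetch(
    `https://api.open-meteo.com/v1/forecast?latitude=${latitude}&longitude=${longitude}&daily=temperature_2m_max`,
  ).then((res) => res.json());
  // Feed `forecast` back to the model in a follow-up request, or return it directly.
}
```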

For a working example on how to do function calling, take a look at our [demo app ↗](https://github.com/craigsdennis/lightbulb-moment-tool-calling/blob/main/src/index.ts).


---

---
title: JSON Mode
description: When we want text-generation AI models to interact with databases, services, and external systems programmatically, typically when using tool calling or building AI agents, we must have structured response formats rather than natural language.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# JSON Mode

When we want text-generation AI models to interact with databases, services, and external systems programmatically, typically when using tool calling or building AI agents, we must have structured response formats rather than natural language.

Workers AI supports JSON Mode, enabling applications to request a structured output response when interacting with AI models.

## Schema

JSON Mode is compatible with OpenAI's implementation. To enable it, add the `response_format` property to the request object using the following convention:

```

{

  response_format: {

    title: "JSON Mode",

    type: "object",

    properties: {

      type: {

        type: "string",

        enum: ["json_object", "json_schema"],

      },

      json_schema: {},

    }

  }

}


```

Where `json_schema` must be a valid [JSON Schema ↗](https://json-schema.org/) declaration.

## JSON Mode example

When using JSON Mode, pass the schema as part of the request you send to the LLM, as in the example below.

```

{

  "messages": [

    {

      "role": "system",

      "content": "Extract data about a country."

    },

    {

      "role": "user",

      "content": "Tell me about India."

    }

  ],

  "response_format": {

    "type": "json_schema",

    "json_schema": {

      "type": "object",

      "properties": {

        "name": {

          "type": "string"

        },

        "capital": {

          "type": "string"

        },

        "languages": {

          "type": "array",

          "items": {

            "type": "string"

          }

        }

      },

      "required": [

        "name",

        "capital",

        "languages"

      ]

    }

  }

}


```

The LLM will follow the schema, and return a response such as below:

```

{

  "response": {

    "name": "India",

    "capital": "New Delhi",

    "languages": [

      "Hindi",

      "English",

      "Bengali",

      "Telugu",

      "Marathi",

      "Tamil",

      "Gujarati",

      "Urdu",

      "Kannada",

      "Odia",

      "Malayalam",

      "Punjabi",

      "Sanskrit"

    ]

  }

}


```

As you can see, the model is complying with the JSON schema definition in the request and responding with a validated JSON object.

## Supported Models

This is the list of models that now support JSON Mode:

* [@cf/meta/llama-3.1-8b-instruct-fast](https://developers.cloudflare.com/workers-ai/models/llama-3.1-8b-instruct-fast/)
* [@cf/meta/llama-3.1-70b-instruct](https://developers.cloudflare.com/workers-ai/models/llama-3.1-70b-instruct/)
* [@cf/meta/llama-3.3-70b-instruct-fp8-fast](https://developers.cloudflare.com/workers-ai/models/llama-3.3-70b-instruct-fp8-fast/)
* [@cf/meta/llama-3-8b-instruct](https://developers.cloudflare.com/workers-ai/models/llama-3-8b-instruct/)
* [@cf/meta/llama-3.1-8b-instruct](https://developers.cloudflare.com/workers-ai/models/llama-3.1-8b-instruct/)
* [@cf/meta/llama-3.2-11b-vision-instruct](https://developers.cloudflare.com/workers-ai/models/llama-3.2-11b-vision-instruct/)
* [@hf/nousresearch/hermes-2-pro-mistral-7b](https://developers.cloudflare.com/workers-ai/models/hermes-2-pro-mistral-7b/)
* [@hf/thebloke/deepseek-coder-6.7b-instruct-awq](https://developers.cloudflare.com/workers-ai/models/deepseek-coder-6.7b-instruct-awq/)
* [@cf/deepseek-ai/deepseek-r1-distill-qwen-32b](https://developers.cloudflare.com/workers-ai/models/deepseek-r1-distill-qwen-32b/)

We will continue extending this list to keep up with new and requested models.

Note that Workers AI can't guarantee that the model responds according to the requested JSON Schema. Depending on the complexity of the task and the adequacy of the JSON Schema, the model may not be able to satisfy the request in extreme situations. If that is the case, an error `JSON Mode couldn't be met` is returned and must be handled.

JSON Mode currently doesn't support streaming.
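
For illustration, a sketch of what this can look like from a Worker using the AI binding. This assumes the binding accepts the same `response_format` object shown above and that a failed structured response surfaces as a thrown error; treat both as assumptions to verify:

JavaScript

```
// Sketch only: request a structured response and guard against failure.
try {
  const result = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [
      { role: "system", content: "Extract data about a country." },
      { role: "user", content: "Tell me about India." },
    ],
    response_format: {
      type: "json_schema",
      json_schema: {
        type: "object",
        properties: {
          name: { type: "string" },
          capital: { type: "string" },
        },
        required: ["name", "capital"],
      },
    },
  });
  // result.response should be an object that matches the schema above.
} catch (err) {
  // Handle the "JSON Mode couldn't be met" case: retry, fall back to an
  // unstructured request, or report the failure.
}
```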


---

---
title: Markdown Conversion
description: Markdown is essential for text generation and large language models (LLMs) in training and inference because it can provide structured, semantic, human, and machine-readable input. Likewise, Markdown facilitates chunking and structuring input data for better retrieval and synthesis in the context of RAGs, and its simplicity and ease of parsing and rendering make it ideal for AI Agents.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Markdown Conversion

[Markdown ↗](https://en.wikipedia.org/wiki/Markdown) is essential for text generation and large language models (LLMs) in training and inference because it can provide structured, semantic, human, and machine-readable input. Likewise, Markdown facilitates chunking and structuring input data for better retrieval and synthesis in the context of RAGs, and its simplicity and ease of parsing and rendering make it ideal for AI Agents.

For these reasons, document conversion plays an important role when designing and developing AI applications. Workers AI provides the `toMarkdown` utility method that developers can use from the [env.AI](https://developers.cloudflare.com/workers-ai/features/markdown-conversion/usage/binding/) binding or the [REST APIs](https://developers.cloudflare.com/workers-ai/features/markdown-conversion/usage/rest-api/) for quick and convenient conversion and summarization of documents in multiple formats to Markdown.
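
As a brief sketch of the binding call (the `{ name, blob }` document shape and the `data` field on the result mirror the usage pages linked above; treat the details as illustrative):

JavaScript

```
// Sketch: convert a PDF sent in the request body to Markdown.
export default {
  async fetch(request, env) {
    const documents = [
      { name: "document.pdf", blob: await request.blob() },
    ];
    const results = await env.AI.toMarkdown(documents);
    // Each result contains the converted Markdown for one input document.
    return new Response(results[0].data, {
      headers: { "content-type": "text/markdown" },
    });
  },
};
```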

## Pricing

`toMarkdown` is free for most format conversions. In some cases, like image conversion, it can use Workers AI models for object detection and summarization, which may incur additional costs if it exceeds the Workers AI free allocation limits. Refer to [what models we use](https://developers.cloudflare.com/workers-ai/features/markdown-conversion/how-it-works/) and the [Workers AI pricing page](https://developers.cloudflare.com/workers-ai/platform/pricing/) for more details.

## Other Markdown conversion features

* The Browser Rendering [/markdown](https://developers.cloudflare.com/browser-rendering/rest-api/markdown-endpoint/) REST API supports Markdown conversion if you need to render a dynamic page or application in a real browser before converting it.
* [Markdown for Agents](https://developers.cloudflare.com/fundamentals/reference/markdown-for-agents/) allows real-time document conversion for Cloudflare zones using content negotiation headers.


---

---
title: Conversion Options
description: By default, the toMarkdown service extracts text content from your files. To further extend the capabilities of the conversion process, you can pass options to the service to control how specific file types are converted.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Conversion Options

By default, the `toMarkdown` service extracts text content from your files. To further extend the capabilities of the conversion process, you can pass options to the service to control how specific file types are converted.

Options are organized by file type and are all optional.

## Available options

### Images

TypeScript

```

{

  image?: {

    descriptionLanguage?: 'en' | 'it' | 'de' | 'es' | 'fr' | 'pt';

  }

}


```

* `descriptionLanguage`: controls the language of the AI-generated image descriptions.

Warning

This option works on a _best-effort_ basis: it is not guaranteed that the resulting text will be in the desired language.

### HTML

TypeScript

```

{

  html?: {

    hostname?: string;

    cssSelector?: string;

  }

}


```

* `hostname`: string to use as a host when resolving relative links inside the HTML.
* `cssSelector`: string containing a CSS selector pattern to pick specific elements from your HTML. Refer to [how HTML is processed](https://developers.cloudflare.com/workers-ai/features/markdown-conversion/how-it-works/#html) for more details.

### PDF

TypeScript

```

{

  pdf?: {

    metadata?: boolean;

  }

}


```

* `metadata`: by default, converted PDF files include metadata information. This option allows you to opt out of that behavior.

## Examples

### Binding

To configure custom options, pass a `conversionOptions` object inside the second argument of the binding call, like this:

TypeScript

```

await env.AI.toMarkdown(..., {

  conversionOptions: {

    html: { ... },

    pdf: { ... },

    ...

   }

})


```
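
For example, a filled-in sketch combining the options described above (`documents` stands in for whatever files you are converting, and the option values are illustrative):

TypeScript

```
const results = await env.AI.toMarkdown(documents, {
  conversionOptions: {
    image: { descriptionLanguage: "en" }, // describe images in English
    html: { cssSelector: "main article" }, // only convert matching elements
    pdf: { metadata: false }, // omit PDF metadata from the output
  },
});
```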

### REST API

Since the REST API uses file uploads, the request's `Content-Type` will be `multipart/form-data`. As such, include a new form field with your stringified object as a value:

Terminal window

```

curl https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/tomarkdown \

  -X POST \

  -H 'Authorization: Bearer {API_TOKEN}' \

  ...

  -F 'conversionOptions={ "html": { ... }, ... }'


```


---

---
title: How it works
description: When parsing files before converting them to Markdown, there are some cleanup tasks we do depending on the type of file you are trying to convert.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# How it works

## Pre-processing

When parsing files before converting them to Markdown, there are some cleanup tasks we do depending on the type of file you are trying to convert.

### HTML

When we detect an HTML file, a series of things happen to the HTML content before it is converted:

* Some elements are ignored, including `script` and `style` tags.
* Meta tags are extracted. These include `title`, `description`, `og:title`, `og:description` and `og:image`.
* [JSON-LD ↗](https://json-ld.org/) content is extracted, if it exists. This will be appended at the end of the converted markdown.
* The base URL to use for resolving relative links is extracted from the `<base>` element ¹, if it exists, according to the spec (that is, only the first instance of the base URL is counted).
* If the `cssSelector` option is:  
   * present, then only those elements that match the selector are kept for further processing;  
   * missing, then elements such as `<header>`, `<footer>` and `<head>` are removed from the text.
* If a base URL was obtained previously, relative links in the remaining HTML are resolved to fully qualified URLs.

¹ The host can also be set per request, using the HTML conversion options. Refer to [Conversion Options](https://developers.cloudflare.com/workers-ai/features/markdown-conversion/conversion-options/#html) for more details.

### Images

Images take a bit more work to prepare for conversion.

As a first step, we detect the image type. If it is an SVG (Scalable Vector Graphics) file, we convert it into a raster format so that the Workers AI models used later in the pipeline can process it. In this case, SVGs are converted into PNGs internally.

Afterwards:

* We try to determine the image's dimensions. If successful, we determine if the image is considered "too big" or not. An image is "too big" if its width is bigger than 1280px or its height is bigger than 720px.
* If the image is too big, we try to resize it to conform with those dimensions. If resizing fails, we simply try to use the original image data.
* The image is sent to an **object-detection model**. Specifically, we use the [@cf/facebook/detr-resnet-50](https://developers.cloudflare.com/workers-ai/models/detr-resnet-50/) from Workers AI.
* If any objects were detected in the previous step, they are appended to a prompt that is used to instruct an **image-to-text model** on how to describe the image.
* If a preferred conversion language is specified in the request's conversion options, the previous prompt is enriched with a directive for the model to output the content in the desired language. Refer to [Conversion Options](https://developers.cloudflare.com/workers-ai/features/markdown-conversion/conversion-options/#images) for more details.
* The final prompt is sent, along with the image data, to the [@cf/google/gemma-3-12b-it](https://developers.cloudflare.com/workers-ai/models/gemma-3-12b-it/) model, also from Workers AI.
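
To make the flow above more concrete, here is a rough, illustrative sketch of the detection-and-prompt portion using the two models named above. It is not the service's actual code: the real resizing logic, prompt wording, and the multimodal input shape for the image-to-text call are internal and not shown here.

JavaScript

```
// Illustrative sketch only. `imageBlob` is a hypothetical Blob holding the
// (already rasterized and resized) image.
const imageBytes = new Uint8Array(await imageBlob.arrayBuffer());

// 1. Run object detection on the image.
const detected = await env.AI.run("@cf/facebook/detr-resnet-50", {
  image: [...imageBytes],
});

// 2. Fold the detected labels into the description prompt.
const labels = (detected || []).map((d) => d.label).join(", ");
let prompt = "Describe this image for a Markdown document.";
if (labels) prompt += ` It appears to contain: ${labels}.`;
// If a preferred language was requested, a directive such as
// "Respond in <language>." would be appended here.

// 3. The prompt and image data are then sent to @cf/google/gemma-3-12b-it,
//    whose exact multimodal input shape is omitted from this sketch.
```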

### PDFs

* Metadata is extracted. This can be removed from the final result. Refer to [Conversion Options](https://developers.cloudflare.com/workers-ai/features/markdown-conversion/conversion-options/#pdf) for more details.
* Each page is parsed in sequence.
* We try to obtain a `StructTree` object from the PDF file. This data structure is a tree of tagged elements that make up the PDF contents, as specified by [ISO 14289 (PDF/UA) ↗](https://www.iso.org/standard/64599.html).
* If none is obtained, we extract the text of the page _as-is_ and return it.
* If we manage to obtain a `StructTree`, we traverse its nodes to build a semantic Markdown representation of its contents.

```json
{"@context":"https://schema.org","@type":"BreadcrumbList","itemListElement":[{"@type":"ListItem","position":1,"item":{"@id":"/directory/","name":"Directory"}},{"@type":"ListItem","position":2,"item":{"@id":"/workers-ai/","name":"Workers AI"}},{"@type":"ListItem","position":3,"item":{"@id":"/workers-ai/features/","name":"Features"}},{"@type":"ListItem","position":4,"item":{"@id":"/workers-ai/features/markdown-conversion/","name":"Markdown Conversion"}},{"@type":"ListItem","position":5,"item":{"@id":"/workers-ai/features/markdown-conversion/how-it-works/","name":"How it works"}}]}
```

---

---
title: Supported Formats
description: This list shows all rich-content formats that are currently supported for Markdown conversion and is updated frequently:
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Supported Formats

This list shows all rich-content formats that are currently supported for Markdown conversion and is updated frequently:

| Format                     | File extensions                       | Mime Types |
| -------------------------- | ------------------------------------- | ---------- |
| PDF Documents              | .pdf                                  | application/pdf |
| Images¹                    | .jpeg, .jpg, .png, .webp, .svg        | image/jpeg, image/png, image/webp, image/svg+xml |
| HTML Documents             | .html, .htm                           | text/html |
| XML Documents              | .xml                                  | application/xml |
| Microsoft Office Documents | .xlsx, .xlsm, .xlsb, .xls, .et, .docx | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, application/vnd.ms-excel.sheet.macroenabled.12, application/vnd.ms-excel.sheet.binary.macroenabled.12, application/vnd.ms-excel, application/vnd.openxmlformats-officedocument.wordprocessingml.document |
| Open Document Format       | .ods, .odt                            | application/vnd.oasis.opendocument.spreadsheet, application/vnd.oasis.opendocument.text |
| CSV                        | .csv                                  | text/csv |
| Apple Documents            | .numbers                              | application/vnd.apple.numbers |

¹ Image conversion uses two Workers AI models for object detection and summarization. See [Workers AI pricing](https://developers.cloudflare.com/workers-ai/features/markdown-conversion/#pricing) for more details.


---

---
title: Workers Binding
description: Cloudflare’s serverless platform allows you to run code at the edge to build full-stack applications with Workers. A binding enables your Worker or Pages Function to interact with resources on the Cloudflare Developer Platform.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Workers Binding

Cloudflare’s serverless platform allows you to run code at the edge to build full-stack applications with [Workers](https://developers.cloudflare.com/workers/). A [binding](https://developers.cloudflare.com/workers/runtime-apis/bindings/) enables your Worker or Pages Function to interact with resources on the Cloudflare Developer Platform.

To use our Markdown Conversion service directly from your Workers, create an AI binding either in the Cloudflare dashboard (refer to [AI bindings](https://developers.cloudflare.com/pages/functions/bindings/#workers-ai) for instructions) or by updating your [Wrangler file](https://developers.cloudflare.com/workers/wrangler/configuration/). Add the following to your Wrangler file:

wrangler.jsonc

```
{
  "$schema": "./node_modules/wrangler/config-schema.json",
  "ai": {
    "binding": "AI"
  }
}
```

wrangler.toml

```
[ai]
binding = "AI" # i.e. available in your Worker on env.AI
```

## Examples

### Converting files

In this example, we fetch a PDF document and an image from R2 and feed them both to `env.AI.toMarkdown`. The result is a list of converted documents. Workers AI models are used automatically to detect and summarize the image.


JavaScript

```
export default {
  async fetch(request, env, ctx) {
    // https://pub-979cb28270cc461d94bc8a169d8f389d.r2.dev/somatosensory.pdf
    const pdf = await env.R2.get("somatosensory.pdf");

    // https://pub-979cb28270cc461d94bc8a169d8f389d.r2.dev/cat.jpeg
    const cat = await env.R2.get("cat.jpeg");

    return Response.json(
      await env.AI.toMarkdown([
        {
          name: "somatosensory.pdf",
          blob: new Blob([await pdf.arrayBuffer()], {
            type: "application/pdf",
          }),
        },
        {
          name: "cat.jpeg",
          blob: new Blob([await cat.arrayBuffer()], {
            type: "image/jpeg",
          }),
        },
      ]),
    );
  },
};
```

TypeScript

```
import { Env } from "./env";

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext) {
    // https://pub-979cb28270cc461d94bc8a169d8f389d.r2.dev/somatosensory.pdf
    const pdf = await env.R2.get("somatosensory.pdf");

    // https://pub-979cb28270cc461d94bc8a169d8f389d.r2.dev/cat.jpeg
    const cat = await env.R2.get("cat.jpeg");

    return Response.json(
      await env.AI.toMarkdown([
        {
          name: "somatosensory.pdf",
          blob: new Blob([await pdf.arrayBuffer()], {
            type: "application/pdf",
          }),
        },
        {
          name: "cat.jpeg",
          blob: new Blob([await cat.arrayBuffer()], {
            type: "image/jpeg",
          }),
        },
      ]),
    );
  },
};
```

### Getting supported file formats


JavaScript

```
export default {
  async fetch(request, env, ctx) {
    return Response.json(await env.AI.toMarkdown().supported());
  },
};
```

TypeScript

```
import { Env } from "./env";

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext) {
    return Response.json(await env.AI.toMarkdown().supported());
  },
};
```

## Methods

### async env.AI.toMarkdown()

Takes a document or list of documents in different formats and converts them to Markdown.


JavaScript

```
const result = await env.AI.toMarkdown({
  name: "document.pdf",
  blob: new Blob([documentBuffer]),
});
```


#### Parameter

* `files`: `MarkdownDocument | MarkdownDocument[]` - an instance of or an array of `MarkdownDocument`s.
* `conversionOptions`: `ConversionOptions` - options that control how conversion happens. See [Conversion Options](https://developers.cloudflare.com/workers-ai/features/markdown-conversion/conversion-options/) for further details.

#### Return values

* `results`: `Promise<ConversionResult | ConversionResult[]>` - An instance of or an array of `ConversionResult`s.

#### `MarkdownDocument` definition

* `name` ` string `  
   * Name of the document to convert.
* `blob` ` Blob `  
   * A new [Blob ↗](https://developer.mozilla.org/en-US/docs/Web/API/Blob/Blob) object with the document content.

#### `ConversionResult` definition

* `id` ` string `  
   * ID associated to this object.
* `name` ` string `  
   * Name of the converted document. Matches the input name.
* `format` ` 'markdown' | 'error' `  
   * The format of this `ConversionResult` object.
* `mimetype` ` string `  
   * The detected [mime type ↗](https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/MIME%5Ftypes/Common%5Ftypes) of the document.
* `tokens` ` number `  
   * The estimated number of tokens of the converted document. Only present if `format` is equal to `markdown`.
* `data` ` string `  
   * The content of the converted document in Markdown format. Only present if `format` is equal to `markdown`.
* `error` ` string `  
   * The error message explaining why this conversion failed. Only present if `format` is equal to `error`.
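
For illustration, here is a small sketch that consumes these fields. It only relies on the properties listed above, and the two input blobs are hypothetical placeholders.

JavaScript

```
// Sketch: handle both possible formats of a ConversionResult.
const results = await env.AI.toMarkdown([
  { name: "report.pdf", blob: reportBlob }, // hypothetical Blob
  { name: "photo.png", blob: photoBlob },   // hypothetical Blob
]);

for (const result of results) {
  if (result.format === "markdown") {
    console.log(`${result.name} (${result.mimetype}, ~${result.tokens} tokens)`);
    console.log(result.data);
  } else {
    // format === "error"
    console.error(`${result.name} failed: ${result.error}`);
  }
}
```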

### async env.AI.toMarkdown().transform()

This method is similar to `env.AI.toMarkdown` except that it is exposed through a new handle. It takes the same arguments and returns the same values.


JavaScript

```
const result = await env.AI.toMarkdown().transform({
  name: "document.pdf",
  blob: new Blob([documentBuffer]),
});
```


### async env.AI.toMarkdown().supported()

Returns a list of file formats that are currently supported for markdown conversion. See [Supported formats](https://developers.cloudflare.com/workers-ai/features/markdown-conversion/supported-formats/) for the full list of file formats that can be converted into Markdown.


JavaScript

```
const formats = await env.AI.toMarkdown().supported();
```


#### Return values

* `results`: `SupportedFormat[]` - An array of all formats supported for markdown conversion.

#### `SupportedFormat` definition

* `extension` ` string `  
   * Extension of files in this format.
* `mimeType` ` string `  
   * The [mime type ↗](https://developer.mozilla.org/en-US/docs/Web/HTTP/Guides/MIME%5Ftypes/Common%5Ftypes) of files of this format.
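
As a small illustrative example, you could use this list to check whether an uploaded file can be converted before calling `toMarkdown`. The helper below is a sketch, not part of the API, and `upload` is a hypothetical File or Blob.

JavaScript

```
// Sketch: check whether a MIME type is supported before converting.
async function isConvertible(env, mimeType) {
  const formats = await env.AI.toMarkdown().supported();
  return formats.some((format) => format.mimeType === mimeType);
}

// Hypothetical usage with an uploaded File/Blob called `upload`:
if (await isConvertible(env, upload.type)) {
  // Safe to pass to env.AI.toMarkdown(...)
}
```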


---

---
title: REST API
description: You can also use the Markdown Conversion REST API to convert your documents into Markdown.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# REST API

You can also use the Markdown Conversion REST API to convert your documents into Markdown.

## Prerequisite: Get Workers AI API token

To use the Markdown Conversion service via the REST API, you need an API token with permissions for the [Workers AI](https://developers.cloudflare.com/workers-ai/) REST API. Refer to [Get started with the Workers AI REST API](https://developers.cloudflare.com/workers-ai/get-started/rest-api/) for instructions on obtaining an API token with the correct permissions.

## Transform

This endpoint converts the files you upload into Markdown.

Terminal window

```
curl https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/tomarkdown \
  -X POST \
  -H 'Authorization: Bearer {API_TOKEN}' \
  -F "files=@cat.jpeg" \
  -F "files=@somatosensory.pdf" \
  -F 'conversionOptions={ ... }'
```

Note

You can get your `ACCOUNT_ID` by going to [Workers & Pages on the dashboard](https://developers.cloudflare.com/fundamentals/account/find-account-and-zone-ids/#find-account-id-workers-and-pages).

### Parameters

`files` ` File[] ` required

The files you want to convert.

`conversionOptions` ` ConversionOptions ` optional

Options that allow you to control how your files are converted. Refer to [Conversion Options](https://developers.cloudflare.com/workers-ai/features/markdown-conversion/conversion-options/) for further details.

### Response

```
{
  "success": true,
  "result": [
    {
      "id": "...",
      "name": "good.html",
      "mimeType": "text/html",
      "format": "markdown",
      "tokens": 49,
      "data": "# Image Embedded with a Data URI\n\nThis _image_ is directly encoded in the HTML:\n\n\n\nAn image description\n\n \n\nIt's a tiny 5x5 pixel PNG, scaled up to 50x50px.\n\n"
    },
    {
      "id": "...",
      "name": "bad.pdf",
      "mimeType": "application/pdf",
      "format": "error",
      "error": "Some error that prevented this image from being converted"
    }
  ]
}
```

## Supported

This endpoint lets you programmatically retrieve the full set of rich formats that are supported for conversion.

Terminal window

```
curl https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/tomarkdown/supported \
  -H 'Authorization: Bearer {API_TOKEN}'
```

Note

You can get your `ACCOUNT_ID` by going to [Workers & Pages on the dashboard](https://developers.cloudflare.com/fundamentals/account/find-account-and-zone-ids/#find-account-id-workers-and-pages).

### Response

```
{
  "success": true,
  "result": [
    {
      "extension": ".html",
      "mimeType": "text/html"
    },
    {
      "extension": ".pdf",
      "mimeType": "application/pdf"
    },
    ...
  ]
}
```


---

---
title: Prompt caching
description: Use prefix caching and the x-session-affinity header to reduce latency and inference costs on Workers AI.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Prompt caching

Prompt caching (also called prefix caching) is a performance optimization that allows Workers AI to respond faster to requests with prompts that share common inputs. It reduces Time to First Token (TTFT) and increases Tokens Per Second (TPS) throughput by reusing previously computed input tensors instead of reprocessing them from scratch.

Cached input tokens are billed at a discounted rate compared to regular input tokens. Workers AI enables prefix caching by default for select models. Compatibility and pricing details are listed on each [model page](https://developers.cloudflare.com/workers-ai/models/).

## How it works

When an LLM processes a request, it goes through two stages:

1. **Prefill stage** — processes input tokens (system prompts, tool definitions, conversation history).
2. **Output stage** — generates output tokens.

With prefix caching, Workers AI stores the computed input tensors from the prefill stage. On subsequent requests that share the same prefix, the model skips prefill for the cached portion and only processes the new input tokens. This saves significant compute time, especially for agentic workloads where consecutive requests share large amounts of context.

For example, when a coding agent sends a new prompt, it typically resends all previous prompts, tool definitions, and conversation history. The delta between consecutive requests is often just a few new lines. Prefix caching avoids redundant prefill on all the shared context.

## Session affinity header

Prefix caching only works when a request routes to the same model instance that holds the cached tensors. To maximize cache hit rates, send the `x-session-affinity` header with a unique identifier for your session or agent. This routes requests with the same identifier to the same model instance, increasing the likelihood of a prefix cache hit.

### REST API

Terminal window

```
curl -X POST \
  "https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/@cf/moonshotai/kimi-k2.5" \
  -H "Authorization: Bearer {api_token}" \
  -H "Content-Type: application/json" \
  -H "x-session-affinity: ses_12345678" \
  -d '{
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "What is prefix caching and why does it matter?"
      }
    ],
    "max_tokens": 2400,
    "stream": true
  }'
```

### Workers AI binding

JavaScript

```
const response = await env.AI.run(
  "@cf/moonshotai/kimi-k2.5",
  {
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "Explain prefix caching." },
    ],
  },
  {
    headers: {
      "x-session-affinity": "ses_12345678",
    },
  },
);
```

## Structuring prompts for caching

Prefix caching matches the exact token sequence from the start of the prompt. A single token difference invalidates the cache from that point onward.

To maximize cache hits:

* **Place static content first.** System prompts, tool definitions, and shared instructions should appear at the beginning of the prompt. Put user-specific or dynamic content (timestamps, user queries) at the end.
* **Avoid timestamps in system prompts.** Including a timestamp at the start of a system prompt changes the prefix on every request, defeating the cache entirely. If time context is required, add it to the user message instead.
* **Reuse tool definitions across requests.** For function-calling agents, tools are part of the prompt prefix. Keeping tool definitions consistent across requests in the same session increases cache reuse.
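
For example, a request structured for caching keeps the stable system prompt at the start and the changing pieces at the end. This is a minimal sketch using the binding from the previous section; `conversationHistory`, `userQuestion`, and `sessionId` are hypothetical variables you would maintain per session.

JavaScript

```
// Sketch: static content first, dynamic content last, stable session identifier.
const SYSTEM_PROMPT = "You are a helpful assistant."; // identical on every request

const response = await env.AI.run(
  "@cf/moonshotai/kimi-k2.5",
  {
    messages: [
      { role: "system", content: SYSTEM_PROMPT }, // shared, cacheable prefix
      ...conversationHistory,                     // grows over the session
      {
        role: "user",
        // Dynamic details (like timestamps) belong in the final user message.
        content: `Current time: ${new Date().toISOString()}. ${userQuestion}`,
      },
    ],
  },
  {
    headers: { "x-session-affinity": sessionId }, // route to the same model instance
  },
);
```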

## Monitoring cached tokens

Workers AI surfaces cached token counts in the response `usage` object. Use this to verify that prefix caching is working and to track cost savings. The first request is usually a cold start, so expect no cached tokens on that request. Inputs also need to be sufficiently large to be cached, due to the cache block size. Cached tokens are billed at a lower rate than regular input tokens, which are totalled into your neuron count.
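
A sketch of what checking this might look like is shown below. The `usage` object comes from the description above, but the exact name and nesting of the cached-token field can differ by model and is not assumed here, so inspect the full object for your model. `messages` is a hypothetical array of chat messages.

JavaScript

```
// Sketch: inspect the usage object returned with the response.
const response = await env.AI.run("@cf/moonshotai/kimi-k2.5", { messages });

// Log the whole usage object; the cached-token count appears here on cache hits
// (its exact field name is not assumed in this sketch).
console.log(response.usage);
```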


---

---
title: Prompting
description: Part of getting good results from text generation models is asking questions correctly. LLMs are usually trained with specific predefined templates, which should then be used with the model's tokenizer for better results when doing inference tasks.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Prompting

Part of getting good results from text generation models is asking questions correctly. LLMs are usually trained with specific predefined templates, which should then be used with the model's tokenizer for better results when doing inference tasks.

There are two ways to prompt text generation models with Workers AI:

Important

We recommend using unscoped prompts for inference with LoRA.

### Scoped Prompts

This is the **recommended** method. With scoped prompts, Workers AI takes the burden of knowing and using different chat templates for different models and provides a unified interface to developers when building prompts and creating text generation tasks.

Scoped prompts are a list of messages. Each message defines two keys: the role and the content.

Typically, the role can be one of three options:

* **system** - System messages define the AI's personality. You can use them to set rules and how you expect the AI to behave.
* **user** - User messages are where you actually query the AI by providing a question or a conversation.
* **assistant** - Assistant messages hint to the AI about the desired output format. Not all models support this role.

OpenAI has a [good explanation ↗](https://platform.openai.com/docs/guides/text-generation#messages-and-roles) of how they use these roles with their GPT models. Even though chat templates are flexible, other text generation models tend to follow the same conventions.

Here's an input example of a scoped prompt using system and user roles:

JavaScript

```
{
  messages: [
    { role: "system", content: "you are a very funny comedian and you like emojis" },
    { role: "user", content: "tell me a joke about cloudflare" },
  ],
};
```

Here's a better example of a chat session using multiple iterations between the user and the assistant.

JavaScript

```
{
  messages: [
    { role: "system", content: "you are a professional computer science assistant" },
    { role: "user", content: "what is WASM?" },
    { role: "assistant", content: "WASM (WebAssembly) is a binary instruction format that is designed to be a platform-agnostic" },
    { role: "user", content: "does Python compile to WASM?" },
    { role: "assistant", content: "No, Python does not directly compile to WebAssembly" },
    { role: "user", content: "what about Rust?" },
  ],
};
```

Note that different LLMs are trained with different templates for different use cases. While Workers AI tries its best to abstract the specifics of each LLM template from the developer through a unified API, you should always refer to the model documentation for details. For example, instruct models like Codellama are fine-tuned to respond to a user-provided instruction, while chat models expect fragments of dialogs as input.

### Unscoped Prompts

You can use unscoped prompts to send a single question to the model without worrying about providing any context. Workers AI will automatically convert your `prompt` input to a reasonable default scoped prompt internally so that you get the best possible prediction.

JavaScript

```
{
  prompt: "tell me a joke about cloudflare",
};
```

You can also use unscoped prompts to construct the model chat template manually. In this case, you can use the raw parameter. Here's an input example of a [Mistral ↗](https://docs.mistral.ai/models/#chat-template) chat template prompt:

JavaScript

```
{
  prompt: "<s>[INST]comedian[/INST]</s>\n[INST]tell me a joke about cloudflare[/INST]",
  raw: true,
};
```


---

---
title: Agents
description: Build AI-powered Agents on Cloudflare
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Agents


---

---
title: Demos and architectures
description: Workers AI can be used to build dynamic and performant services. The following demo applications and reference architectures showcase how to use Workers AI optimally within your architecture.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Demos and architectures

Workers AI can be used to build dynamic and performant services. The following demo applications and reference architectures showcase how to use Workers AI optimally within your architecture.

## Demos

Explore the following demo applications for Workers AI.

* [Jobs At Conf ↗](https://github.com/harshil1712/jobs-at-conf-demo): A job listing website to add jobs you find at in-person conferences. Built with Cloudflare Pages, R2, D1, Queues, and Workers AI.

## Reference architectures

Explore the following reference architectures that use Workers AI:

* [Fullstack applications](https://developers.cloudflare.com/reference-architecture/diagrams/serverless/fullstack-application/): A practical example of how these services come together in a real fullstack application architecture.
* [Storing user generated content](https://developers.cloudflare.com/reference-architecture/diagrams/storage/storing-user-generated-content/): Store user-generated content in R2 for fast, secure, and cost-effective architecture.
* [Optimizing and securing connected transportation systems](https://developers.cloudflare.com/reference-architecture/diagrams/iot/optimizing-and-securing-connected-transportation-systems/): This diagram showcases Cloudflare components optimizing connected transportation systems. It illustrates how their technologies minimize latency, ensure reliability, and strengthen security for critical data flow.
* [Ingesting BigQuery Data into Workers AI](https://developers.cloudflare.com/reference-architecture/diagrams/ai/bigquery-workers-ai/): You can connect a Cloudflare Worker to get data from Google BigQuery and pass it to Workers AI, to run AI models, powered by serverless GPUs.
* [Multi-vendor AI observability and control](https://developers.cloudflare.com/reference-architecture/diagrams/ai/ai-multivendor-observability-control/): By shifting features such as rate limiting, caching, and error handling to the proxy layer, organizations can apply unified configurations across services and inference service providers.
* [Composable AI architecture](https://developers.cloudflare.com/reference-architecture/diagrams/ai/ai-composable/): The architecture diagram illustrates how AI applications can be built end-to-end on Cloudflare, or single services can be integrated with external infrastructure and services.
* [Content-based asset creation](https://developers.cloudflare.com/reference-architecture/diagrams/ai/ai-asset-creation/): AI systems combine text-generation and text-to-image models to create visual content from text. They generate prompts, moderate content, and produce images for various applications.
* [Retrieval Augmented Generation (RAG)](https://developers.cloudflare.com/reference-architecture/diagrams/ai/ai-rag/): RAG combines retrieval with generative models for better text. It uses external knowledge to create factual, relevant responses, improving coherence and accuracy in NLP tasks like chatbots.
* [Automatic captioning for video uploads](https://developers.cloudflare.com/reference-architecture/diagrams/ai/ai-video-caption/): By integrating automatic speech recognition technology into video platforms, content creators, publishers, and distributors can reach a broader audience, including individuals with hearing impairments or those who prefer to consume content in different languages.
* [Serverless image content management](https://developers.cloudflare.com/reference-architecture/diagrams/serverless/serverless-image-content-management/): Leverage various components of Cloudflare's ecosystem to construct a scalable image management solution.


---

---
title: Tutorials
description: View tutorials to help you get started with Workers AI.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Tutorials

View tutorials to help you get started with Workers AI.

## Docs

| Name                                                                                                                                                                                   | Last Updated      | Difficulty   |
| -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------- | ------------ |
| [Whisper-large-v3-turbo with Cloudflare Workers AI](https://developers.cloudflare.com/workers-ai/guides/tutorials/build-a-workers-ai-whisper-with-chunking/)                           | about 1 year ago  | Beginner     |
| [Llama 3.2 11B Vision Instruct model on Cloudflare Workers AI](https://developers.cloudflare.com/workers-ai/guides/tutorials/llama-vision-tutorial/)                                   | about 1 year ago  | Beginner     |
| [Store and Catalog AI Generated Images with R2 (Part 3)](https://developers.cloudflare.com/workers-ai/guides/tutorials/image-generation-playground/image-generator-store-and-catalog/) | about 1 year ago  | Beginner     |
| [Build a Retrieval Augmented Generation (RAG) AI](https://developers.cloudflare.com/workers-ai/guides/tutorials/build-a-retrieval-augmented-generation-ai/)                            | over 1 year ago   | Beginner     |
| [Using BigQuery with Workers AI](https://developers.cloudflare.com/workers-ai/guides/tutorials/using-bigquery-with-workers-ai/)                                                        | over 1 year ago   | Beginner     |
| [How to Build an Image Generator using Workers AI](https://developers.cloudflare.com/workers-ai/guides/tutorials/image-generation-playground/)                                         | over 1 year ago   | Beginner     |
| [Build an AI Image Generator Playground (Part 1)](https://developers.cloudflare.com/workers-ai/guides/tutorials/image-generation-playground/image-generator-flux/)                     | over 1 year ago   | Beginner     |
| [Add New AI Models to your Playground (Part 2)](https://developers.cloudflare.com/workers-ai/guides/tutorials/image-generation-playground/image-generator-flux-newmodels/)             | over 1 year ago   | Beginner     |
| [Use event notification to summarize PDF files on upload](https://developers.cloudflare.com/r2/tutorials/summarize-pdf/)                                                               | over 1 year ago   | Intermediate |
| [Explore Workers AI Models Using a Jupyter Notebook](https://developers.cloudflare.com/workers-ai/guides/tutorials/explore-workers-ai-models-using-a-jupyter-notebook/)                | over 1 year ago   | Beginner     |
| [Fine Tune Models With AutoTrain from HuggingFace](https://developers.cloudflare.com/workers-ai/guides/tutorials/fine-tune-models-with-autotrain/)                                     | about 2 years ago | Beginner     |
| [Explore Code Generation Using DeepSeek Coder Models](https://developers.cloudflare.com/workers-ai/guides/tutorials/explore-code-generation-using-deepseek-coder-models/)              | about 2 years ago | Beginner     |
| [Choose the Right Text Generation Model](https://developers.cloudflare.com/workers-ai/guides/tutorials/how-to-choose-the-right-text-generation-model/)                                 | about 2 years ago | Beginner     |

## Videos

Also, explore our video resources on Workers AI:

[ Play ](https://youtube.com/watch?v=xu4Wb-IppmM) 

OpenAI Relay Server on Cloudflare Workers

In this video, Craig Dennis walks you through the deployment of OpenAI's relay server to use with their realtime API.

[ Play ](https://youtube.com/watch?v=y4PPsvHrQGA) 

Cloudflare Workflows | Batching and Monitoring Your Durable Execution (Part 2 of 3)

Workflows exposes metrics such as execution, error rates, steps, and total duration!

[ Play ](https://youtube.com/watch?v=slS4RBV0SBk) 

Cloudflare Workflows | Introduction (Part 1 of 3)

In this video, we introduce Cloudflare Workflows, the Newest Developer Platform Primitive at Cloudflare.

[ Play ](https://youtube.com/watch?v=W45MIi%5Ft%5Fgo) 

Building Front-End Applications | Now Supported by Cloudflare Workers

You can now build front-end applications, just like you do on Cloudflare Pages, but with the added benefit of Workers.

[ Play ](https://youtube.com/watch?v=10-kiyJNr8s) 

Build a private AI chatbot using Meta's Llama 3.1

In this video, you will learn how to set up a private AI chat powered by Llama 3.1 for secure, fast interactions, deploy the model on Cloudflare Workers for serverless, scalable performance and use Cloudflare's Workers AI for seamless integration and edge computing benefits.

[ Play ](https://youtube.com/watch?v=HXOpxNaKUzw) 

How to Build Event-Driven Applications with Cloudflare Queues

In this video, we demonstrate how to build an event-driven application using Cloudflare Queues. An event-driven system lets you decouple services, allowing them to process and scale independently.

[ Play ](https://youtube.com/watch?v=bwJkwD-F0kQ) 

Welcome to the Cloudflare Developer Channel

Welcome to the Cloudflare Developers YouTube channel. We've got tutorials and working demos and everything you need to level up your projects. Whether you're working on your next big thing or just dorking around with some side projects, we've got you covered! So why don't you come hang out, subscribe to our developer channel and together we'll build something awesome. You're gonna love it.

[ Play ](https://youtube.com/watch?v=doKt9wWQF9A) 

AI meets Maps | Using Cloudflare AI, Langchain, Mapbox, Folium and Streamlit

Welcome to RouteMe, a smart tool that helps you plan the most efficient route between landmarks in any city. Powered by Cloudflare Workers AI, Langchain and Mapbox. This Streamlit web app uses LLMs and the Mapbox Optimization API to solve the classic traveling salesman problem, turning your sightseeing into an optimized adventure!

[ Play ](https://youtube.com/watch?v=9IjfyBJsJRQ) 

Use Vectorize to add additional context to your AI Applications through RAG

A RAG based AI Chat app that uses Vectorize to access video game data for employees of Gamertown.

[ Play ](https://youtube.com/watch?v=cK%5FleoJsBWY) 

Cloudflare Workers AI, Building a "Hello, World" AI App!

Cloudflare's Workers AI helps you add AI functionality to the apps you are building. In this video we show you how simple and straightforward it is to build the Hello World of AI apps in under 5 minutes.

[ Play ](https://youtube.com/watch?v=MlV9Kvkh9hw) 

Build a URL Shortener with an AI-based admin section

We are building a URL Shortener, shrty.dev, on Cloudflare. The app uses Workers KV and Workers Analytics Engine. Craig decided to build with Workers AI runWithTools to provide a chat interface for admins.

[ Play ](https://youtube.com/watch?v=Id5oKCa%5F%5FIA) 

Tool Calling Also Known as Function Calling on Cloudflare Workers AI

Tool calling, also known as function calling, is a powerful concept that lets you build Large Language Model based applications that can perform actions and retrieve external information from defined tools.

[ Play ](https://youtube.com/watch?v=GRpwVMkVmKo) 

API Roll (Father's Day)

This walks through how to use Workers AI with Hono and Zod to create a streaming pun generating API.

[ Play ](https://youtube.com/watch?v=MLbo7MGY%5FlU) 

AI can see clearly now - Build Vision Apps on Cloudflare Workers AI

The LLaVA model is hosted on Cloudflare Workers AI, which means you are an API call away from brand new powerful vision use cases in all of your applications.

[ Play ](https://youtube.com/watch?v=5UTExUQ8Fwo) 

Workers AI - Getting Started - Vanilla Chat App

Get started building AI apps on Cloudflare using Pages and the GitHub starter template for a Vanilla JavaScript Chat App.

[ Play ](https://youtube.com/watch?v=8SnrvAYAJ4Q) 

Image Generation, Inpainting, and Vision Models

Is that person you are about to swipe right on, actually real? Are they AI Generated?

[ Play ](https://youtube.com/watch?v=9JM5Z0KzQsQ) 

Learn AI Development (models, embeddings, vectors)

In this workshop, Kristian Freeman, Cloudflare Developer Advocate, teaches the basics of AI Development - models, embeddings, and vectors (including vector databases).


---

---
title: Build a Retrieval Augmented Generation (RAG) AI
description: Build your first AI app with Cloudflare AI. This guide uses Workers AI, Vectorize, D1, and Cloudflare Workers.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Build a Retrieval Augmented Generation (RAG) AI

**Last reviewed:**  over 1 year ago 

This guide will instruct you through setting up and deploying your first application with Cloudflare AI. You will build a fully-featured AI-powered application, using tools like Workers AI, Vectorize, D1, and Cloudflare Workers.

Looking for a managed option?

[AI Search](https://developers.cloudflare.com/ai-search/) offers a fully managed way to build RAG pipelines on Cloudflare, handling ingestion, indexing, and querying out of the box. [Get started](https://developers.cloudflare.com/ai-search/get-started/).

At the end of this tutorial, you will have built an AI tool that allows you to store information and query it using a Large Language Model. This pattern, known as Retrieval Augmented Generation, or RAG, is a useful project you can build by combining multiple aspects of Cloudflare's AI toolkit. You do not need to have experience working with AI tools to build this application.

1. Sign up for a [Cloudflare account ↗](https://dash.cloudflare.com/sign-up/workers-and-pages).
2. Install [Node.js ↗](https://docs.npmjs.com/downloading-and-installing-node-js-and-npm).

Node.js version manager

Use a Node version manager like [Volta ↗](https://volta.sh/) or [nvm ↗](https://github.com/nvm-sh/nvm) to avoid permission issues and change Node.js versions. [Wrangler](https://developers.cloudflare.com/workers/wrangler/install-and-update/), discussed later in this guide, requires a Node version of `16.17.0` or later.

You will also need access to [Vectorize](https://developers.cloudflare.com/vectorize/platform/pricing/). During this tutorial, we will show how you can optionally integrate with [Anthropic Claude ↗](http://anthropic.com) as well. You will need an [Anthropic API key ↗](https://docs.anthropic.com/en/api/getting-started) to do so.

## 1\. Create a new Worker project

C3 (`create-cloudflare-cli`) is a command-line tool designed to help you set up and deploy Workers to Cloudflare as fast as possible.

Open a terminal window and run C3 to create your Worker project:

npm

```
npm create cloudflare@latest -- rag-ai-tutorial
```

yarn

```
yarn create cloudflare rag-ai-tutorial
```

pnpm

```
pnpm create cloudflare@latest rag-ai-tutorial
```

For setup, select the following options:

* For _What would you like to start with?_, choose `Hello World example`.
* For _Which template would you like to use?_, choose `Worker only`.
* For _Which language do you want to use?_, choose `JavaScript`.
* For _Do you want to use git for version control?_, choose `Yes`.
* For _Do you want to deploy your application?_, choose `No` (we will be making some changes before deploying).

In your project directory, C3 has generated several files.

What files did C3 create?

1. `wrangler.jsonc`: Your [Wrangler](https://developers.cloudflare.com/workers/wrangler/configuration/#sample-wrangler-configuration) configuration file.
2. `index.js` (in `/src`): A minimal `'Hello World!'` Worker written in [ES module](https://developers.cloudflare.com/workers/reference/migrate-to-module-workers/) syntax.
3. `package.json`: A minimal Node dependencies configuration file.
4. `package-lock.json`: Refer to [npm documentation on package-lock.json ↗](https://docs.npmjs.com/cli/v9/configuring-npm/package-lock-json).
5. `node_modules`: Refer to [npm documentation node\_modules ↗](https://docs.npmjs.com/cli/v7/configuring-npm/folders#node-modules).

Now, move into your newly created directory:

Terminal window

```
cd rag-ai-tutorial
```

## 2\. Develop with Wrangler CLI

The Workers command-line interface, [Wrangler](https://developers.cloudflare.com/workers/wrangler/install-and-update/), allows you to [create](https://developers.cloudflare.com/workers/wrangler/commands/general/#init), [test](https://developers.cloudflare.com/workers/wrangler/commands/general/#dev), and [deploy](https://developers.cloudflare.com/workers/wrangler/commands/general/#deploy) your Workers projects. C3 will install Wrangler in projects by default.

After you have created your first Worker, run the [wrangler dev](https://developers.cloudflare.com/workers/wrangler/commands/general/#dev) command in the project directory to start a local server for developing your Worker. This will allow you to test your Worker locally during development.

Terminal window

```
npx wrangler dev
```

You will now be able to go to [http://localhost:8787 ↗](http://localhost:8787) to see your Worker running. Any changes you make to your code will trigger a rebuild, and reloading the page will show you the up-to-date output of your Worker.

## 3\. Adding the AI binding

To begin using Cloudflare's AI products, you can add the `ai` block to the [Wrangler configuration file](https://developers.cloudflare.com/workers/wrangler/configuration/) as a [remote binding](https://developers.cloudflare.com/workers/development-testing/#remote-bindings). This will set up a binding to Cloudflare's AI models in your code that you can use to interact with the available AI models on the platform.

Note

If you have not used Wrangler before, it will try to open your web browser to login with your Cloudflare account.

If you have issues with this step or you do not have access to a browser interface, refer to the [wrangler login](https://developers.cloudflare.com/workers/wrangler/commands/general/#login) documentation for more information.

This example features the [@cf/meta/llama-3-8b-instruct model](https://developers.cloudflare.com/workers-ai/models/llama-3-8b-instruct/), which generates text.

wrangler.jsonc

```
{
  "ai": {
    "binding": "AI",
    "remote": true
  }
}
```

wrangler.toml

```
[ai]
binding = "AI"
remote = true
```

Now, find the `src/index.js` file. Inside the `fetch` handler, you can query the `AI` binding:

JavaScript

```
export default {
  async fetch(request, env, ctx) {
    const answer = await env.AI.run("@cf/meta/llama-3-8b-instruct", {
      messages: [{ role: "user", content: `What is the square root of 9?` }],
    });

    return new Response(JSON.stringify(answer));
  },
};
```

By querying the LLM through the `AI` binding, we can interact with Cloudflare AI's large language models directly in our code.

Deploy your Worker using `wrangler`:

Terminal window

```
npx wrangler deploy
```

Making a request to your Worker will now generate a text response from the LLM, and return it as a JSON object.

Terminal window

```
curl https://example.username.workers.dev
```

```
{"response":"Answer: The square root of 9 is 3."}
```

## 4\. Adding embeddings using Cloudflare D1 and Vectorize

Embeddings allow you to add additional capabilities to the language models you can use in your Cloudflare AI projects. This is done via **Vectorize**, Cloudflare's vector database.

To begin using Vectorize, create a new embeddings index using `wrangler`. This index will store vectors with 768 dimensions (the output size of the `@cf/baai/bge-base-en-v1.5` embedding model used later in this guide), and will use cosine similarity to determine which vectors are most similar to each other:

Terminal window

```
npx wrangler vectorize create vector-index --dimensions=768 --metric=cosine
```

Then, add the configuration details for your new Vectorize index to the [Wrangler configuration file](https://developers.cloudflare.com/workers/wrangler/configuration/):

wrangler.jsonc

```
{
  // ... existing wrangler configuration
  "vectorize": [
    {
      "binding": "VECTOR_INDEX",
      "index_name": "vector-index"
    }
  ]
}
```

wrangler.toml

```
[[vectorize]]
binding = "VECTOR_INDEX"
index_name = "vector-index"
```

A vector index stores vectors: arrays of floating-point numbers that represent your data. When you want to query the vector database, you convert your query into a vector in the same way. **Vectorize** is designed to efficiently determine which stored vectors are most similar to your query.

To implement the search feature, you must set up a D1 database from Cloudflare. D1 stores your app's data (the note text), while Vectorize stores the corresponding vectors. When a search matches a vector, you can look up and return the matching data from D1.

Create a new D1 database using `wrangler`:

Terminal window

```
npx wrangler d1 create database
```

Then, paste the configuration details output from the previous command into the [Wrangler configuration file](https://developers.cloudflare.com/workers/wrangler/configuration/):

wrangler.jsonc

```
{
  // ... existing wrangler configuration
  "d1_databases": [
    {
      "binding": "DB", // available in your Worker on env.DB
      "database_name": "database",
      "database_id": "abc-def-geh" // replace this with a real database_id (UUID)
    }
  ]
}
```

wrangler.toml

```
[[d1_databases]]
binding = "DB"
database_name = "database"
database_id = "abc-def-geh"
```

In this application, we'll create a `notes` table in D1, which will allow us to store notes and later retrieve them in Vectorize. To create this table, run a SQL command using `wrangler d1 execute`:

Terminal window

```
npx wrangler d1 execute database --remote --command "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, text TEXT NOT NULL)"
```

Now, we can add a new note to our database using `wrangler d1 execute`:

Terminal window

```
npx wrangler d1 execute database --remote --command "INSERT INTO notes (text) VALUES ('The best pizza topping is pepperoni')"
```

## 5\. Creating a workflow

Before we begin creating notes, we will introduce a [Cloudflare Workflow](https://developers.cloudflare.com/workflows). This will allow us to define a durable workflow that can safely and robustly execute all the steps of the RAG process.

To begin, add a new `[[workflows]]` block to your [Wrangler configuration file](https://developers.cloudflare.com/workers/wrangler/configuration/):

wrangler.jsonc

```
{
  // ... existing wrangler configuration
  "workflows": [
    {
      "name": "rag",
      "binding": "RAG_WORKFLOW",
      "class_name": "RAGWorkflow"
    }
  ]
}
```

wrangler.toml

```
[[workflows]]
name = "rag"
binding = "RAG_WORKFLOW"
class_name = "RAGWorkflow"
```

In `src/index.js`, add a new class called `RAGWorkflow` that extends `WorkflowEntrypoint`:

JavaScript

```
import { WorkflowEntrypoint } from "cloudflare:workers";

export class RAGWorkflow extends WorkflowEntrypoint {
  async run(event, step) {
    await step.do("example step", async () => {
      console.log("Hello World!");
    });
  }
}
```

This class will define a single workflow step that will log "Hello World!" to the console. You can add as many steps as you need to your workflow.

On its own, this workflow will not do anything. To execute the workflow, we will call the `RAG_WORKFLOW` binding, passing in any parameters that the workflow needs to properly complete. Here is an example of how we can call the workflow:

JavaScript

```
env.RAG_WORKFLOW.create({ params: { text } });
```

## 6\. Creating notes and adding them to Vectorize

To handle multiple routes in your Worker, we will add `hono`, a routing library for Workers. This will allow us to create a new route for adding notes to our database. Install `hono` using `npm`:

npm

```
npm i hono
```

yarn

```
yarn add hono
```

pnpm

```
pnpm add hono
```

bun

```
bun add hono
```

Then, import `hono` into your `src/index.js` file. You should also update the `fetch` handler to use `hono`:

JavaScript

```
import { Hono } from "hono";
const app = new Hono();

app.get("/", async (c) => {
  const answer = await c.env.AI.run("@cf/meta/llama-3-8b-instruct", {
    messages: [{ role: "user", content: `What is the square root of 9?` }],
  });

  return c.json(answer);
});

export default app;
```

This will establish a route at the root path `/` that is functionally equivalent to the previous version of your application.

Now, we can update our workflow to begin adding notes to our database, and generating the related embeddings for them.

This example features the [@cf/baai/bge-base-en-v1.5 model](https://developers.cloudflare.com/workers-ai/models/bge-base-en-v1.5/), which can be used to create an embedding. Embeddings are stored and retrieved inside [Vectorize](https://developers.cloudflare.com/vectorize/), Cloudflare's vector database. The user query is also turned into an embedding so that it can be used for searching within Vectorize.

JavaScript

```

import { WorkflowEntrypoint } from "cloudflare:workers";


export class RAGWorkflow extends WorkflowEntrypoint {

  async run(event, step) {

    const env = this.env;

    const { text } = event.payload;


    const record = await step.do(`create database record`, async () => {

      const query = "INSERT INTO notes (text) VALUES (?) RETURNING *";


      const { results } = await env.DB.prepare(query).bind(text).run();


      const record = results[0];

      if (!record) throw new Error("Failed to create note");

      return record;

    });


    const embedding = await step.do(`generate embedding`, async () => {

      const embeddings = await env.AI.run("@cf/baai/bge-base-en-v1.5", {

        text: text,

      });

      const values = embeddings.data[0];

      if (!values) throw new Error("Failed to generate vector embedding");

      return values;

    });


    await step.do(`insert vector`, async () => {

      return env.VECTOR_INDEX.upsert([

        {

          id: record.id.toString(),

          values: embedding,

        },

      ]);

    });

  }

}


```

The workflow does the following things:

1. Accepts a `text` parameter.
2. Inserts a new row into the `notes` table in D1, and retrieves the `id` of the new row.
3. Converts the `text` into a vector embedding using the `@cf/baai/bge-base-en-v1.5` model via the AI binding.
4. Upserts the `id` and vector values into the `vector-index` index in Vectorize.

By doing this, you will create a new vector representation of the note, which can be used to retrieve the note later.

To complete the code, we will add a route that allows users to submit notes to the database. This route will parse the JSON request body, get the `text` parameter, and create a new instance of the workflow, passing the parameter:

JavaScript

```

app.post("/notes", async (c) => {

  const { text } = await c.req.json();

  if (!text) return c.text("Missing text", 400);

  await c.env.RAG_WORKFLOW.create({ params: { text } });

  return c.text("Created note", 201);

});


```
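
Once the Worker is running (for example, locally with `npx wrangler dev`, which listens on `http://localhost:8787` by default), you can exercise this route by sending a JSON body. A sample `curl` request:

Terminal window

```
curl -X POST http://localhost:8787/notes \
  -H "Content-Type: application/json" \
  -d '{"text": "The best pizza topping is pepperoni"}'
```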

## 7\. Querying Vectorize to retrieve notes

To complete your code, you can update the root path (`/`) to query Vectorize. You will convert the query into a vector, and then use the `vector-index` index to find the most similar vectors.

The `topK` parameter limits the number of vectors returned by the function. For instance, providing a `topK` of 1 will only return the _most similar_ vector based on the query. Setting `topK` to 5 will return the 5 most similar vectors.

Given a list of similar vectors, you can retrieve the notes that match the record IDs stored alongside those vectors. In this case, we are only retrieving a single note, but you may customize this as needed.
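
If you wanted to return several notes instead, you could raise `topK` and look up all of the matching IDs in a single SQL query. A minimal sketch of that variation (not part of this tutorial's final code):

JavaScript

```
const vectorQuery = await c.env.VECTOR_INDEX.query(vectors, { topK: 5 });
const ids = (vectorQuery.matches ?? []).map((match) => match.id);

let notes = [];
if (ids.length) {
  // Build one "?" placeholder per ID so each value is bound safely.
  const placeholders = ids.map(() => "?").join(", ");
  const query = `SELECT * FROM notes WHERE id IN (${placeholders})`;
  const { results } = await c.env.DB.prepare(query).bind(...ids).run();
  if (results) notes = results.map((row) => row.text);
}
```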

You can insert the text of those notes as context into the prompt for the LLM binding. This is the basis of Retrieval-Augmented Generation, or RAG: providing additional context from data outside of the LLM to enhance the text generated by the LLM.

We'll update the prompt to include the context, and to ask the LLM to use the context when responding:

JavaScript

```

import { Hono } from "hono";

const app = new Hono();


// Existing post route...

// app.post('/notes', async (c) => { ... })


app.get("/", async (c) => {

  const question = c.req.query("text") || "What is the square root of 9?";


  const embeddings = await c.env.AI.run("@cf/baai/bge-base-en-v1.5", {

    text: question,

  });

  const vectors = embeddings.data[0];


  const vectorQuery = await c.env.VECTOR_INDEX.query(vectors, { topK: 1 });

  let vecId;

  if (

    vectorQuery.matches &&

    vectorQuery.matches.length > 0 &&

    vectorQuery.matches[0]

  ) {

    vecId = vectorQuery.matches[0].id;

  } else {

    console.log("No matching vector found or vectorQuery.matches is empty");

  }


  let notes = [];

  if (vecId) {

    const query = `SELECT * FROM notes WHERE id = ?`;

    const { results } = await c.env.DB.prepare(query).bind(vecId).run();

    if (results) notes = results.map((vec) => vec.text);

  }


  const contextMessage = notes.length

    ? `Context:\n${notes.map((note) => `- ${note}`).join("\n")}`

    : "";


  const systemPrompt = `When answering the question or responding, use the context provided, if it is provided and relevant.`;


  const { response: answer } = await c.env.AI.run(

    "@cf/meta/llama-3-8b-instruct",

    {

      messages: [

        ...(notes.length ? [{ role: "system", content: contextMessage }] : []),

        { role: "system", content: systemPrompt },

        { role: "user", content: question },

      ],

    },

  );


  return c.text(answer);

});


app.onError((err, c) => {

  return c.text(String(err), 500);

});


export default app;


```
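
With this route in place, you can ask a question by passing a `text` query parameter. For example, against a local dev server:

Terminal window

```
curl -G http://localhost:8787/ --data-urlencode "text=What is the best pizza topping?"
```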

## 8\. Adding Anthropic Claude model (optional)

If you are working with larger documents, you have the option to use Anthropic's [Claude models ↗](https://claude.ai/), which have large context windows and are well-suited to RAG workflows.

To begin, install the `@anthropic-ai/sdk` package:

 npm  yarn  pnpm  bun 

```
npm i @anthropic-ai/sdk
```

```
yarn add @anthropic-ai/sdk
```

```
pnpm add @anthropic-ai/sdk
```

```
bun add @anthropic-ai/sdk
```

In `src/index.js`, you can update the `GET /` route to check for the `ANTHROPIC_API_KEY` environment variable. If it is set, we can generate text using the Anthropic SDK. If it is not set, we'll fall back to the existing Workers AI code:

JavaScript

```

import Anthropic from '@anthropic-ai/sdk';


app.get('/', async (c) => {

  // ... Existing code

  const systemPrompt = `When answering the question or responding, use the context provided, if it is provided and relevant.`


  let modelUsed = ""

  let response = null


  if (c.env.ANTHROPIC_API_KEY) {

    const anthropic = new Anthropic({

      apiKey: c.env.ANTHROPIC_API_KEY

    })


    const model = "claude-3-5-sonnet-latest"

    modelUsed = model


    const message = await anthropic.messages.create({

      max_tokens: 1024,

      model,

      messages: [

        { role: 'user', content: question }

      ],

      system: [systemPrompt, notes.length ? contextMessage : ''].join(" ")

    })


    response = {

      response: message.content.map(content => content.text).join("\n")

    }

  } else {

    const model = "@cf/meta/llama-3.1-8b-instruct"

    modelUsed = model


    response = await c.env.AI.run(

      model,

      {

        messages: [

          ...(notes.length ? [{ role: 'system', content: contextMessage }] : []),

          { role: 'system', content: systemPrompt },

          { role: 'user', content: question }

        ]

      }

    )

  }


  if (response) {

    c.header('x-model-used', modelUsed)

    return c.text(response.response)

  } else {

    return c.text("We were unable to generate output", 500)

  }

})


```

Finally, you'll need to set the `ANTHROPIC_API_KEY` environment variable in your Workers application. You can do this by using `wrangler secret put`:

Terminal window

```

npx wrangler secret put ANTHROPIC_API_KEY


```
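
For local development with `npx wrangler dev`, you can instead place the key in a `.dev.vars` file in your project root, which Wrangler reads automatically (keep this file out of version control):

```
ANTHROPIC_API_KEY="YOUR-API-KEY"
```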

## 9\. Deleting notes and vectors

If you no longer need a note, you can delete it from the database. Any time that you delete a note, you will also need to delete the corresponding vector from Vectorize. You can implement this by building a `DELETE /notes/:id` route in your `src/index.js` file:

JavaScript

```

app.delete("/notes/:id", async (c) => {

  const { id } = c.req.param();


  const query = `DELETE FROM notes WHERE id = ?`;

  await c.env.DB.prepare(query).bind(id).run();


  await c.env.VECTOR_INDEX.deleteByIds([id]);


  return c.body(null, 204);

});


```
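
You can exercise this route with a request like the following, where `1` stands in for whatever ID D1 assigned to the note:

Terminal window

```
curl -X DELETE http://localhost:8787/notes/1
```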

## 10\. Text splitting (optional)

For large pieces of text, it is recommended to split the text into smaller chunks. This allows LLMs to more effectively gather relevant context, without needing to retrieve large pieces of text.

To implement this, we'll add a new NPM package to our project, `@langchain/textsplitters`:

 npm  yarn  pnpm  bun 

```
npm i @langchain/textsplitters
```

```
yarn add @langchain/textsplitters
```

```
pnpm add @langchain/textsplitters
```

```
bun add @langchain/textsplitters
```

The `RecursiveCharacterTextSplitter` class provided by this package will split the text into smaller chunks. It can be customized to your liking, but the default config works in most cases:

JavaScript

```

import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";


const text = "Some long piece of text...";


const splitter = new RecursiveCharacterTextSplitter({

  // These can be customized to change the chunking size

  // chunkSize: 1000,

  // chunkOverlap: 200,

});


const output = await splitter.createDocuments([text]);

console.log(output); // [{ pageContent: 'Some long piece of text...' }]


```

To use this splitter, we'll update the workflow to split the text into smaller chunks. We'll then iterate over the chunks and run the rest of the workflow for each chunk of text:

JavaScript

```

export class RAGWorkflow extends WorkflowEntrypoint {

  async run(event, step) {

    const env = this.env;

    const { text } = event.payload;

    let texts = await step.do("split text", async () => {

      const splitter = new RecursiveCharacterTextSplitter();

      const output = await splitter.createDocuments([text]);

      return output.map((doc) => doc.pageContent);

    });


    console.log(

      "RecursiveCharacterTextSplitter generated ${texts.length} chunks",

    );


    for (const index in texts) {

      const text = texts[index];

      const record = await step.do(

        `create database record: ${index}/${texts.length}`,

        async () => {

          const query = "INSERT INTO notes (text) VALUES (?) RETURNING *";


          const { results } = await env.DB.prepare(query).bind(text).run();


          const record = results[0];

          if (!record) throw new Error("Failed to create note");

          return record;

        },

      );


      const embedding = await step.do(

        `generate embedding: ${index}/${texts.length}`,

        async () => {

          const embeddings = await env.AI.run("@cf/baai/bge-base-en-v1.5", {

            text: text,

          });

          const values = embeddings.data[0];

          if (!values) throw new Error("Failed to generate vector embedding");

          return values;

        },

      );


      await step.do(`insert vector: ${index}/${texts.length}`, async () => {

        return env.VECTOR_INDEX.upsert([

          {

            id: record.id.toString(),

            values: embedding,

          },

        ]);

      });

    }

  }

}


```

Now, when large pieces of text are submitted to the `/notes` endpoint, they will be split into smaller chunks, and each chunk will be processed by the workflow.

## 11\. Deploy your project

If you did not deploy your Worker during [step 1](https://developers.cloudflare.com/workers/get-started/guide/#1-create-a-new-worker-project), deploy your Worker via Wrangler, to a `*.workers.dev` subdomain, or a [Custom Domain](https://developers.cloudflare.com/workers/configuration/routing/custom-domains/), if you have one configured. If you have not configured any subdomain or domain, Wrangler will prompt you during the publish process to set one up.

Terminal window

```

npx wrangler deploy


```

Preview your Worker at `<YOUR_WORKER>.<YOUR_SUBDOMAIN>.workers.dev`.

Note

When pushing to your `*.workers.dev` subdomain for the first time, you may see [523 errors](https://developers.cloudflare.com/support/troubleshooting/http-status-codes/cloudflare-5xx-errors/error-523/) while DNS is propagating. These errors should resolve themselves after a minute or so.

## Related resources

A full version of this codebase is available on GitHub. It includes a frontend UI for querying, adding, and deleting notes, as well as a backend API for interacting with the database and vector index. You can find it here: [github.com/kristianfreeman/cloudflare-retrieval-augmented-generation-example ↗](https://github.com/kristianfreeman/cloudflare-retrieval-augmented-generation-example/).

To do more:

* Explore the reference diagram for a [Retrieval Augmented Generation (RAG) Architecture](https://developers.cloudflare.com/reference-architecture/diagrams/ai/ai-rag/).
* Review Cloudflare's [AI documentation](https://developers.cloudflare.com/workers-ai).
* Review [Tutorials](https://developers.cloudflare.com/workers/tutorials/) to build projects on Workers.
* Explore [Examples](https://developers.cloudflare.com/workers/examples/) to experiment with copy and paste Worker code.
* Understand how Workers works in [Reference](https://developers.cloudflare.com/workers/reference/).
* Learn about Workers features and functionality in [Platform](https://developers.cloudflare.com/workers/platform/).
* Set up [Wrangler](https://developers.cloudflare.com/workers/wrangler/install-and-update/) to programmatically create, test, and deploy your Worker projects.


---

---
title: Whisper-large-v3-turbo with Cloudflare Workers AI
description: Learn how to transcribe large audio files using Workers AI.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Whisper-large-v3-turbo with Cloudflare Workers AI

**Last reviewed:**  about 1 year ago 

In this tutorial you will learn how to:

* **Transcribe large audio files:** Use the [Whisper-large-v3-turbo](https://developers.cloudflare.com/workers-ai/models/whisper-large-v3-turbo/) model from Cloudflare Workers AI to perform automatic speech recognition (ASR) or translation.
* **Handle large files:** Split large audio files into smaller chunks for processing, which helps overcome memory and execution time limitations.
* **Deploy using Cloudflare Workers:** Create a scalable, low-latency transcription pipeline in a serverless environment.

## 1\. Create a new Cloudflare Worker project

1. Sign up for a [Cloudflare account ↗](https://dash.cloudflare.com/sign-up/workers-and-pages).
2. Install [Node.js ↗](https://docs.npmjs.com/downloading-and-installing-node-js-and-npm).

Node.js version manager

Use a Node version manager like [Volta ↗](https://volta.sh/) or [nvm ↗](https://github.com/nvm-sh/nvm) to avoid permission issues and change Node.js versions. [Wrangler](https://developers.cloudflare.com/workers/wrangler/install-and-update/), discussed later in this guide, requires a Node version of `16.17.0` or later.

You will create a new Worker project using the `create-cloudflare` CLI (C3). [C3 ↗](https://github.com/cloudflare/workers-sdk/tree/main/packages/create-cloudflare) is a command-line tool designed to help you set up and deploy new applications to Cloudflare.

Create a new project named `whisper-tutorial` by running:

 npm  yarn  pnpm 

```
npm create cloudflare@latest -- whisper-tutorial
```

```
yarn create cloudflare whisper-tutorial
```

```
pnpm create cloudflare@latest whisper-tutorial
```

Running `npm create cloudflare@latest` will prompt you to install the [create-cloudflare package ↗](https://www.npmjs.com/package/create-cloudflare), and lead you through setup. C3 will also install [Wrangler](https://developers.cloudflare.com/workers/wrangler/), the Cloudflare Developer Platform CLI.

For setup, select the following options:

* For _What would you like to start with?_, choose `Hello World example`.
* For _Which template would you like to use?_, choose `Worker only`.
* For _Which language do you want to use?_, choose `TypeScript`.
* For _Do you want to use git for version control?_, choose `Yes`.
* For _Do you want to deploy your application?_, choose `No` (we will be making some changes before deploying).

This will create a new `whisper-tutorial` directory. Your new `whisper-tutorial` directory will include:

* A `"Hello World"` [Worker](https://developers.cloudflare.com/workers/get-started/guide/#3-write-code) at `src/index.ts`.
* A [wrangler.jsonc](https://developers.cloudflare.com/workers/wrangler/configuration/) configuration file.

Go to your application directory:

Terminal window

```

cd whisper-tutorial


```

## 2\. Connect your Worker to Workers AI

You must create an AI binding for your Worker to connect to Workers AI. [Bindings](https://developers.cloudflare.com/workers/runtime-apis/bindings/) allow your Workers to interact with resources, like Workers AI, on the Cloudflare Developer Platform.

To bind Workers AI to your Worker, add the following to the end of your Wrangler configuration file:

wrangler.jsonc:

```
{
  "ai": {
    "binding": "AI"
  }
}
```

wrangler.toml:

```
[ai]
binding = "AI"
```

Your binding is [available in your Worker code](https://developers.cloudflare.com/workers/reference/migrate-to-module-workers/#bindings-in-es-modules-format) on [env.AI](https://developers.cloudflare.com/workers/runtime-apis/handlers/fetch/).

## 3\. Configure Wrangler

In your wrangler file, add or update the following settings to enable Node.js APIs and polyfills (with a compatibility date of 2024-09-23 or later):

wrangler.jsonc:

```
{
  "compatibility_flags": [
    "nodejs_compat"
  ],
  // Set this to today's date
  "compatibility_date": "2026-04-03"
}
```

wrangler.toml:

```
compatibility_flags = [ "nodejs_compat" ]
# Set this to today's date
compatibility_date = "2026-04-03"
```

## 4\. Handle large audio files with chunking

Replace the contents of your `src/index.ts` file with the following integrated code. This sample demonstrates how to:

1. Extract an audio file URL from the query parameters.
2. Fetch the audio file while explicitly following redirects.
3. Split the audio file into smaller chunks (for example, 1 MB each).
4. Transcribe each chunk using the Whisper-large-v3-turbo model via the Cloudflare AI binding.
5. Return the aggregated transcription as plain text.

TypeScript

```

import { Buffer } from "node:buffer";

import type { Ai } from "workers-ai";


export interface Env {

  AI: Ai;

  // If needed, add your KV namespace for storing transcripts.

  // MY_KV_NAMESPACE: KVNamespace;

}


/**

 * Fetches the audio file from the provided URL and splits it into chunks.

 * This function explicitly follows redirects.

 *

 * @param audioUrl - The URL of the audio file.

 * @returns An array of ArrayBuffers, each representing a chunk of the audio.

 */

async function getAudioChunks(audioUrl: string): Promise<ArrayBuffer[]> {

  const response = await fetch(audioUrl, { redirect: "follow" });

  if (!response.ok) {

    throw new Error(`Failed to fetch audio: ${response.status}`);

  }

  const arrayBuffer = await response.arrayBuffer();


  // Example: Split the audio into 1MB chunks.

  const chunkSize = 1024 * 1024; // 1MB

  const chunks: ArrayBuffer[] = [];

  for (let i = 0; i < arrayBuffer.byteLength; i += chunkSize) {

    const chunk = arrayBuffer.slice(i, i + chunkSize);

    chunks.push(chunk);

  }

  return chunks;

}


/**

 * Transcribes a single audio chunk using the Whisper-large-v3-turbo model.

 * The function converts the audio chunk to a Base64-encoded string and

 * sends it to the model via the AI binding.

 *

 * @param chunkBuffer - The audio chunk as an ArrayBuffer.

 * @param env - The Cloudflare Worker environment, including the AI binding.

 * @returns The transcription text from the model.

 */

async function transcribeChunk(

  chunkBuffer: ArrayBuffer,

  env: Env,

): Promise<string> {

  const base64 = Buffer.from(chunkBuffer).toString("base64");

  const res = await env.AI.run("@cf/openai/whisper-large-v3-turbo", {

    audio: base64,

    // Optional parameters (uncomment and set if needed):

    // task: "transcribe",   // or "translate"

    // language: "en",

    // vad_filter: "false",

    // initial_prompt: "Provide context if needed.",

    // prefix: "Transcription:",

  });

  return res.text; // Assumes the transcription result includes a "text" property.

}


/**

 * The main fetch handler. It extracts the 'url' query parameter, fetches the audio,

 * processes it in chunks, and returns the full transcription.

 */

export default {

  async fetch(

    request: Request,

    env: Env,

    ctx: ExecutionContext,

  ): Promise<Response> {

    // Extract the audio URL from the query parameters.

    const { searchParams } = new URL(request.url);

    const audioUrl = searchParams.get("url");


    if (!audioUrl) {

      return new Response("Missing 'url' query parameter", { status: 400 });

    }


    // Get the audio chunks.

    const audioChunks: ArrayBuffer[] = await getAudioChunks(audioUrl);

    let fullTranscript = "";


    // Process each chunk and build the full transcript.

    for (const chunk of audioChunks) {

      try {

        const transcript = await transcribeChunk(chunk, env);

        fullTranscript += transcript + "\n";

      } catch (error) {

        fullTranscript += "[Error transcribing chunk]\n";

      }

    }


    return new Response(fullTranscript, {

      headers: { "Content-Type": "text/plain" },

    });

  },

} satisfies ExportedHandler<Env>;


```

## 5\. Deploy your Worker

1. **Run the Worker locally:**  
Use Wrangler's development mode to test your Worker locally:

Terminal window

```

npx wrangler dev


```

Open your browser and go to [http://localhost:8787 ↗](http://localhost:8787), or use curl:

Terminal window

```

curl "http://localhost:8787?url=https://raw.githubusercontent.com/your-username/your-repo/main/your-audio-file.mp3"


```

Replace the URL query parameter with the direct link to your audio file. (For GitHub-hosted files, ensure you use the raw file URL.)

2. **Deploy the Worker:**  
Once testing is complete, deploy your Worker with:

Terminal window

```

npx wrangler deploy


```

3. **Test the deployed Worker:**  
After deployment, test your Worker by passing the audio URL as a query parameter:

Terminal window

```

curl "https://<your-worker-subdomain>.workers.dev?url=https://raw.githubusercontent.com/your-username/your-repo/main/your-audio-file.mp3"


```

Make sure to replace `<your-worker-subdomain>`, `your-username`, `your-repo`, and `your-audio-file.mp3` with your actual details.

If successful, the Worker will return a transcript of the audio file:

Terminal window

```

This is the transcript of the audio...


```


---

---
title: Explore Code Generation Using DeepSeek Coder Models
description: Explore how you can use AI models to generate code and work more efficiently.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Explore Code Generation Using DeepSeek Coder Models

**Last reviewed:**  about 2 years ago 

A handy way to explore all of the models available on [Workers AI](https://developers.cloudflare.com/workers-ai) is to use a [Jupyter Notebook ↗](https://jupyter.org/).

You can [download the DeepSeek Coder notebook](https://developers.cloudflare.com/workers-ai/static/documentation/notebooks/deepseek-coder-exploration.ipynb) or view the embedded notebook below.

---

## Exploring Code Generation Using DeepSeek Coder

AI models that can generate code unlock all sorts of use cases. The [DeepSeek Coder ↗](https://github.com/deepseek-ai/DeepSeek-Coder) models `@hf/thebloke/deepseek-coder-6.7b-base-awq` and `@hf/thebloke/deepseek-coder-6.7b-instruct-awq` are now available on [Workers AI](https://developers.cloudflare.com/workers-ai).

Let's explore them using the API!

Python

```

import sys

!{sys.executable} -m pip install requests python-dotenv


```

```

Requirement already satisfied: requests in ./venv/lib/python3.12/site-packages (2.31.0)

Requirement already satisfied: python-dotenv in ./venv/lib/python3.12/site-packages (1.0.1)

Requirement already satisfied: charset-normalizer<4,>=2 in ./venv/lib/python3.12/site-packages (from requests) (3.3.2)

Requirement already satisfied: idna<4,>=2.5 in ./venv/lib/python3.12/site-packages (from requests) (3.6)

Requirement already satisfied: urllib3<3,>=1.21.1 in ./venv/lib/python3.12/site-packages (from requests) (2.1.0)

Requirement already satisfied: certifi>=2017.4.17 in ./venv/lib/python3.12/site-packages (from requests) (2023.11.17)


```

Python

```

import os

from getpass import getpass


from IPython.display import display, Image, Markdown, Audio


import requests


```

Python

```

%load_ext dotenv

%dotenv


```

### Configuring your environment

To use the API you'll need your [Cloudflare Account ID ↗](https://dash.cloudflare.com) (head to Workers & Pages > Overview > Account details > Account ID) and a [Workers AI enabled API Token ↗](https://dash.cloudflare.com/profile/api-tokens).

If you want to add these values to your environment, you can create a new file named `.env`:

Terminal window

```

CLOUDFLARE_API_TOKEN="YOUR-TOKEN"

CLOUDFLARE_ACCOUNT_ID="YOUR-ACCOUNT-ID"


```

Python

```

if "CLOUDFLARE_API_TOKEN" in os.environ:

    api_token = os.environ["CLOUDFLARE_API_TOKEN"]

else:

    api_token = getpass("Enter you Cloudflare API Token")


```

Python

```

if "CLOUDFLARE_ACCOUNT_ID" in os.environ:

    account_id = os.environ["CLOUDFLARE_ACCOUNT_ID"]

else:

    account_id = getpass("Enter your account id")


```

### Generate code from a comment

A common use case is to complete the code for the user after they provide a descriptive comment.

Python

````
model = "@hf/thebloke/deepseek-coder-6.7b-base-awq"

prompt = "# A function that checks if a given word is a palindrome"

response = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model}",
    headers={"Authorization": f"Bearer {api_token}"},
    json={"messages": [
        {"role": "user", "content": prompt}
    ]}
)
inference = response.json()
code = inference["result"]["response"]

display(Markdown(f"""
```python
{prompt}
{code.strip()}
```
"""))
````

```python

# A function that checks if a given word is a palindrome

def is_palindrome(word):

    # Convert the word to lowercase

    word = word.lower()


    # Reverse the word

    reversed_word = word[::-1]


    # Check if the reversed word is the same as the original word

    if word == reversed_word:

        return True

    else:

        return False


# Test the function

print(is_palindrome("racecar"))  # Output: True

print(is_palindrome("hello"))    # Output: False


```

### Assist in debugging

We've all been there, bugs happen. Sometimes those stack traces can be very intimidating, and a great use case for code generation is to assist in explaining the problem.

Python

```

model = "@hf/thebloke/deepseek-coder-6.7b-instruct-awq"


system_message = "The user is going to give you code that isn't working. Explain to the user what might be wrong"


code = """# Welcomes our user

def hello_world(first_name="World"):

    print(f"Hello, {name}!")

"""


response = requests.post(

    f"https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model}",

    headers={"Authorization": f"Bearer {api_token}"},

    json={"messages": [

        {"role": "system", "content": system_message},

        {"role": "user", "content": code},

    ]}

)

inference = response.json()

response = inference["result"]["response"]

display(Markdown(response))


```

The error in your code is that you are trying to use a variable `name` which is not defined anywhere in your function. The correct variable to use is `first_name`. So, you should change `f"Hello, {name}!"` to `f"Hello, {first_name}!"`.

Here is the corrected code:

Python

```

# Welcomes our user

def hello_world(first_name="World"):

    print(f"Hello, {first_name}")


```

Now, when you call `hello_world()`, it will print "Hello, World" by default. If you call `hello_world("John")`, it will print "Hello, John".

### Write tests!

Writing unit tests is a common best practice. With enough context, it's possible to write unit tests.

Python

```

model = "@hf/thebloke/deepseek-coder-6.7b-instruct-awq"


system_message = "The user is going to give you code and would like to have tests written in the Python unittest module."


code = """

class User:


    def __init__(self, first_name, last_name=None):

        self.first_name = first_name

        self.last_name = last_name

        if last_name is None:

            self.last_name = "Mc" + self.first_name


    def full_name(self):

        return self.first_name + " " + self.last_name

"""


response = requests.post(

    f"https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model}",

    headers={"Authorization": f"Bearer {api_token}"},

    json={"messages": [

        {"role": "system", "content": system_message},

        {"role": "user", "content": code},

    ]}

)

inference = response.json()

response = inference["result"]["response"]

display(Markdown(response))


```

Here is a simple unittest test case for the User class:

Python

```

import unittest


class TestUser(unittest.TestCase):


    def test_full_name(self):

        user = User("John", "Doe")

        self.assertEqual(user.full_name(), "John Doe")


    def test_default_last_name(self):

        user = User("Jane")

        self.assertEqual(user.full_name(), "Jane McJane")


if __name__ == '__main__':

    unittest.main()


```

In this test case, we have two tests:

* `test_full_name` tests the `full_name` method when the user has both a first name and a last name.
* `test_default_last_name` tests the `full_name` method when the user only has a first name and the last name is set to "Mc" + first name.

If all these tests pass, it means that the `full_name` method is working as expected. If any of these tests fail, it means that the method is not behaving as expected.

### Fill-in-the-middle Code Completion

A common use case in Developer Tools is to autocomplete based on context. DeepSeek Coder provides the ability to submit existing code with a placeholder, so that the model can complete in context.

Warning: The tokens are prefixed with `<｜` and suffixed with `｜>`; make sure to copy and paste them exactly.

Python

````
model = "@hf/thebloke/deepseek-coder-6.7b-base-awq"

code = """
<｜fim▁begin｜>import re

from jklol import email_service

def send_email(email_address, body):
    <｜fim▁hole｜>
    if not is_valid_email:
        raise InvalidEmailAddress(email_address)
    return email_service.send(email_address, body)<｜fim▁end｜>
"""

response = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model}",
    headers={"Authorization": f"Bearer {api_token}"},
    json={"messages": [
        {"role": "user", "content": code}
    ]}
)
inference = response.json()
response = inference["result"]["response"]

display(Markdown(f"""
```python
{response.strip()}
```
"""))
````

```python

is_valid_email = re.match(r"[^@]+@[^@]+\.[^@]+", email_address)


```

### Experimental: Extract data into JSON

No need to threaten the model or bring grandma into the prompt. Get back JSON in the format you want.

Python

````
model = "@hf/thebloke/deepseek-coder-6.7b-instruct-awq"

# Learn more at https://json-schema.org/
json_schema = """
{
  "title": "User",
  "description": "A user from our example app",
  "type": "object",
  "properties": {
    "firstName": {
      "description": "The user's first name",
      "type": "string"
    },
    "lastName": {
      "description": "The user's last name",
      "type": "string"
    },
    "numKids": {
      "description": "Amount of children the user has currently",
      "type": "integer"
    },
    "interests": {
      "description": "A list of what the user has shown interest in",
      "type": "array",
      "items": {
        "type": "string"
      }
    },
  },
  "required": [ "firstName" ]
}
"""

system_prompt = f"""
The user is going to discuss themselves and you should create a JSON object from their description to match the json schema below.

<BEGIN JSON SCHEMA>
{json_schema}
<END JSON SCHEMA>

Return JSON only. Do not explain or provide usage examples.
"""

prompt = """Hey there, I'm Craig Dennis and I'm a Developer Educator at Cloudflare. My email is craig@cloudflare.com.
            I am very interested in AI. I've got two kids. I love tacos, burritos, and all things Cloudflare"""

response = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model}",
    headers={"Authorization": f"Bearer {api_token}"},
    json={"messages": [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt}
    ]}
)
inference = response.json()
response = inference["result"]["response"]

display(Markdown(f"""
```json
{response.strip()}
```
"""))
````

```json

{

  "firstName": "Craig",

  "lastName": "Dennis",

  "numKids": 2,

  "interests": ["AI", "Cloudflare", "Tacos", "Burritos"]

}


```


---

---
title: Explore Workers AI Models Using a Jupyter Notebook
description: This Jupyter notebook explores various models (including Whisper, Distilled BERT, LLaVA, and Meta Llama 3) using Python and the requests library.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Explore Workers AI Models Using a Jupyter Notebook

**Last reviewed:**  over 1 year ago 

A handy way to explore all of the models available on [Workers AI](https://developers.cloudflare.com/workers-ai) is to use a [Jupyter Notebook ↗](https://jupyter.org/).

You can [download the Workers AI notebook](https://developers.cloudflare.com/workers-ai-notebooks/cloudflare-workers-ai.ipynb) or view the embedded notebook below.

Or you can run this on [Google Colab ↗](https://colab.research.google.com/github/craigsdennis/notebooks-cloudflare-workers-ai/blob/main/cloudflare-workers-ai.ipynb)

---

## Explore the Workers AI API using Python

[Workers AI](https://developers.cloudflare.com/workers-ai) allows you to run machine learning models, on the Cloudflare network, from your own code – whether that be from Workers, Pages, or anywhere via REST API.

This notebook will explore the Workers AI REST API using the [official Python SDK ↗](https://github.com/cloudflare/cloudflare-python).

Python

```

import os

from getpass import getpass


from cloudflare import Cloudflare

from IPython.display import display, Image, Markdown, Audio

import requests


```

Python

```

%load_ext dotenv

%dotenv


```

### Configuring your environment

To use the API you'll need your [Cloudflare Account ID ↗](https://dash.cloudflare.com). Head to the AI > Workers AI page and press "Use REST API". This page will let you create a new API Token and copy your Account ID.

If you want to add these values to your environment variables, you can **create a new file** named `.env` and this notebook will read those values.

Terminal window

```

CLOUDFLARE_API_TOKEN="YOUR-TOKEN"

CLOUDFLARE_ACCOUNT_ID="YOUR-ACCOUNT-ID"


```

Otherwise you can just enter the values securely when prompted below.

Python

```

if "CLOUDFLARE_API_TOKEN" in os.environ:

    api_token = os.environ["CLOUDFLARE_API_TOKEN"]

else:

    api_token = getpass("Enter your Cloudflare API Token")


```

Python

```

if "CLOUDFLARE_ACCOUNT_ID" in os.environ:

    account_id = os.environ["CLOUDFLARE_ACCOUNT_ID"]

else:

    account_id = getpass("Enter your account id")


```

Python

```

# Initialize client

client = Cloudflare(api_token=api_token)


```

## Explore tasks available on the Workers AI Platform

### Text Generation

Explore all [Text Generation Models](https://developers.cloudflare.com/workers-ai/models)

Python

```

result = client.workers.ai.run(

    "@cf/meta/llama-3-8b-instruct" ,

    account_id=account_id,

    messages=[

        {"role": "system", "content": """

            You are a productivity assistant for users of Jupyter notebooks for both Mac and Windows users.


            Respond in Markdown."""

        },

        {"role": "user", "content": "How do I use keyboard shortcuts to execute cells?"}

    ]

)


display(Markdown(result["response"]))


```

# **Using Keyboard Shortcuts to Execute Cells in Jupyter Notebooks**

Executing cells in Jupyter Notebooks can be done quickly and efficiently using various keyboard shortcuts, saving you time and effort. Here are the shortcuts you can use:

**Mac**

* **Shift + Enter**: Execute the current cell and insert a new cell below.
* **Ctrl + Enter**: Execute the current cell and insert a new cell below, without creating a new output display.

**Windows/Linux**

* **Shift + Enter**: Execute the current cell and insert a new cell below.
* **Ctrl + Enter**: Execute the current cell and move to the next cell.

**Additional Shortcuts**

* **Alt + Enter**: Execute the current cell and create a new output display below (Mac), or move to the next cell (Windows/Linux).
* **Ctrl + Shift + Enter**: Execute the current cell and create a new output display below (Mac), or create a new cell below (Windows/Linux).

**Tips and Tricks**

* You can also use the **Run Cell** button in the Jupyter Notebook toolbar, or the **Run** menu option (macOS) or **Run -> Run Cell** (Windows/Linux).
* To execute a selection of cells, use **Shift + Alt + Enter** (Mac) or **Shift + Ctrl + Enter** (Windows/Linux).
* To execute a cell and move to the next cell, use **Ctrl + Shift + Enter** (all platforms).

By using these keyboard shortcuts, you'll be able to work more efficiently and quickly in your Jupyter Notebooks. Happy coding!

### Text to Image

Explore all [Text to Image models](https://developers.cloudflare.com/workers-ai/models)

Python

```

data = client.workers.ai.with_raw_response.run(

    "@cf/lykon/dreamshaper-8-lcm",

    account_id=account_id,

    prompt="A software developer incredibly excited about AI, huge smile",

)


display(Image(data.read()))


```

![png](https://developers.cloudflare.com/workers-ai-notebooks/cloudflare-workers-ai/assets/output_13_0.png) 

### Image to Text

Explore all [Image to Text](https://developers.cloudflare.com/workers-ai/models/) models

Python

```

url = "https://blog.cloudflare.com/content/images/2017/11/lava-lamps.jpg"


image_request = requests.get(url, allow_redirects=True)


display(Image(image_request.content, format="jpg"))


data = client.workers.ai.run(

    "@cf/llava-hf/llava-1.5-7b-hf",

    account_id=account_id,

    image=image_request.content,

    prompt="Describe this photo",

    max_tokens=2048

)


print(data["description"])


```

![lava lamps](https://blog.cloudflare.com/content/images/2017/11/lava-lamps.jpg) 

The image features a display of various colored lava lamps. There are at least 14 lava lamps in the scene, each with a different color and design. The lamps are arranged in a visually appealing manner, with some placed closer to the foreground and others further back. The display creates an eye-catching and vibrant atmosphere, showcasing the diverse range of lava lamps available.

### Automatic Speech Recognition

Explore all [Speech Recognition models](https://developers.cloudflare.com/workers-ai/models)

Python

```

url = "https://raw.githubusercontent.com/craigsdennis/notebooks-cloudflare-workers-ai/main/assets/craig-rambling.mp3"

display(Audio(url))

audio = requests.get(url)


response = client.workers.ai.run(

    "@cf/openai/whisper",

    account_id=account_id,

    audio=audio.content

)


response


```



```

    {'text': "Hello there, I'm making a recording for a Jupiter notebook. That's a Python notebook, Jupiter, J-U-P-Y-T-E-R. Not to be confused with the planet. Anyways, let me hear, I'm gonna talk a little bit, I'm gonna make a little bit of noise, say some hard words, I'm gonna say Kubernetes, I'm not actually even talking about Kubernetes, I just wanna see if I can do Kubernetes. Anyway, this is a test of transcription and let's see how we're dead.",

     'word_count': 84,

     'vtt': "WEBVTT\n\n00.280 --> 01.840\nHello there, I'm making a\n\n01.840 --> 04.060\nrecording for a Jupiter notebook.\n\n04.060 --> 06.440\nThat's a Python notebook, Jupiter,\n\n06.440 --> 07.720\nJ -U -P -Y -T\n\n07.720 --> 09.420\n-E -R. Not to be\n\n09.420 --> 12.140\nconfused with the planet. Anyways,\n\n12.140 --> 12.940\nlet me hear, I'm gonna\n\n12.940 --> 13.660\ntalk a little bit, I'm\n\n13.660 --> 14.600\ngonna make a little bit\n\n14.600 --> 16.180\nof noise, say some hard\n\n16.180 --> 17.540\nwords, I'm gonna say Kubernetes,\n\n17.540 --> 18.420\nI'm not actually even talking\n\n18.420 --> 19.500\nabout Kubernetes, I just wanna\n\n19.500 --> 20.300\nsee if I can do\n\n20.300 --> 22.120\nKubernetes. Anyway, this is a\n\n22.120 --> 24.080\ntest of transcription and let's\n\n24.080 --> 26.280\nsee how we're dead.",

     'words': [{'word': 'Hello',

       'start': 0.2800000011920929,

       'end': 0.7400000095367432},

      {'word': 'there,', 'start': 0.7400000095367432, 'end': 1.2400000095367432},

      {'word': "I'm", 'start': 1.2400000095367432, 'end': 1.4800000190734863},

      {'word': 'making', 'start': 1.4800000190734863, 'end': 1.6799999475479126},

      {'word': 'a', 'start': 1.6799999475479126, 'end': 1.840000033378601},

      {'word': 'recording', 'start': 1.840000033378601, 'end': 2.2799999713897705},

      {'word': 'for', 'start': 2.2799999713897705, 'end': 2.6600000858306885},

      {'word': 'a', 'start': 2.6600000858306885, 'end': 2.799999952316284},

      {'word': 'Jupiter', 'start': 2.799999952316284, 'end': 3.2200000286102295},

      {'word': 'notebook.', 'start': 3.2200000286102295, 'end': 4.059999942779541},

      {'word': "That's", 'start': 4.059999942779541, 'end': 4.28000020980835},

      {'word': 'a', 'start': 4.28000020980835, 'end': 4.380000114440918},

      {'word': 'Python', 'start': 4.380000114440918, 'end': 4.679999828338623},

      {'word': 'notebook,', 'start': 4.679999828338623, 'end': 5.460000038146973},

      {'word': 'Jupiter,', 'start': 5.460000038146973, 'end': 6.440000057220459},

      {'word': 'J', 'start': 6.440000057220459, 'end': 6.579999923706055},

      {'word': '-U', 'start': 6.579999923706055, 'end': 6.920000076293945},

      {'word': '-P', 'start': 6.920000076293945, 'end': 7.139999866485596},

      {'word': '-Y', 'start': 7.139999866485596, 'end': 7.440000057220459},

      {'word': '-T', 'start': 7.440000057220459, 'end': 7.71999979019165},

      {'word': '-E', 'start': 7.71999979019165, 'end': 7.920000076293945},

      {'word': '-R.', 'start': 7.920000076293945, 'end': 8.539999961853027},

      {'word': 'Not', 'start': 8.539999961853027, 'end': 8.880000114440918},

      {'word': 'to', 'start': 8.880000114440918, 'end': 9.300000190734863},

      {'word': 'be', 'start': 9.300000190734863, 'end': 9.420000076293945},

      {'word': 'confused', 'start': 9.420000076293945, 'end': 9.739999771118164},

      {'word': 'with', 'start': 9.739999771118164, 'end': 9.9399995803833},

      {'word': 'the', 'start': 9.9399995803833, 'end': 10.039999961853027},

      {'word': 'planet.', 'start': 10.039999961853027, 'end': 11.380000114440918},

      {'word': 'Anyways,', 'start': 11.380000114440918, 'end': 12.140000343322754},

      {'word': 'let', 'start': 12.140000343322754, 'end': 12.420000076293945},

      {'word': 'me', 'start': 12.420000076293945, 'end': 12.520000457763672},

      {'word': 'hear,', 'start': 12.520000457763672, 'end': 12.800000190734863},

      {'word': "I'm", 'start': 12.800000190734863, 'end': 12.880000114440918},

      {'word': 'gonna', 'start': 12.880000114440918, 'end': 12.9399995803833},

      {'word': 'talk', 'start': 12.9399995803833, 'end': 13.100000381469727},

      {'word': 'a', 'start': 13.100000381469727, 'end': 13.260000228881836},

      {'word': 'little', 'start': 13.260000228881836, 'end': 13.380000114440918},

      {'word': 'bit,', 'start': 13.380000114440918, 'end': 13.5600004196167},

      {'word': "I'm", 'start': 13.5600004196167, 'end': 13.65999984741211},

      {'word': 'gonna', 'start': 13.65999984741211, 'end': 13.739999771118164},

      {'word': 'make', 'start': 13.739999771118164, 'end': 13.920000076293945},

      {'word': 'a', 'start': 13.920000076293945, 'end': 14.199999809265137},

      {'word': 'little', 'start': 14.199999809265137, 'end': 14.4399995803833},

      {'word': 'bit', 'start': 14.4399995803833, 'end': 14.600000381469727},

      {'word': 'of', 'start': 14.600000381469727, 'end': 14.699999809265137},

      {'word': 'noise,', 'start': 14.699999809265137, 'end': 15.460000038146973},

      {'word': 'say', 'start': 15.460000038146973, 'end': 15.859999656677246},

      {'word': 'some', 'start': 15.859999656677246, 'end': 16},

      {'word': 'hard', 'start': 16, 'end': 16.18000030517578},

      {'word': 'words,', 'start': 16.18000030517578, 'end': 16.540000915527344},

      {'word': "I'm", 'start': 16.540000915527344, 'end': 16.639999389648438},

      {'word': 'gonna', 'start': 16.639999389648438, 'end': 16.719999313354492},

      {'word': 'say', 'start': 16.719999313354492, 'end': 16.920000076293945},

      {'word': 'Kubernetes,',

       'start': 16.920000076293945,

       'end': 17.540000915527344},

      {'word': "I'm", 'start': 17.540000915527344, 'end': 17.65999984741211},

      {'word': 'not', 'start': 17.65999984741211, 'end': 17.719999313354492},

      {'word': 'actually', 'start': 17.719999313354492, 'end': 18},

      {'word': 'even', 'start': 18, 'end': 18.18000030517578},

      {'word': 'talking', 'start': 18.18000030517578, 'end': 18.420000076293945},

      {'word': 'about', 'start': 18.420000076293945, 'end': 18.6200008392334},

      {'word': 'Kubernetes,', 'start': 18.6200008392334, 'end': 19.1200008392334},

      {'word': 'I', 'start': 19.1200008392334, 'end': 19.239999771118164},

      {'word': 'just', 'start': 19.239999771118164, 'end': 19.360000610351562},

      {'word': 'wanna', 'start': 19.360000610351562, 'end': 19.5},

      {'word': 'see', 'start': 19.5, 'end': 19.719999313354492},

      {'word': 'if', 'start': 19.719999313354492, 'end': 19.8799991607666},

      {'word': 'I', 'start': 19.8799991607666, 'end': 19.940000534057617},

      {'word': 'can', 'start': 19.940000534057617, 'end': 20.079999923706055},

      {'word': 'do', 'start': 20.079999923706055, 'end': 20.299999237060547},

      {'word': 'Kubernetes.',

       'start': 20.299999237060547,

       'end': 21.440000534057617},

      {'word': 'Anyway,', 'start': 21.440000534057617, 'end': 21.799999237060547},

      {'word': 'this', 'start': 21.799999237060547, 'end': 21.920000076293945},

      {'word': 'is', 'start': 21.920000076293945, 'end': 22.020000457763672},

      {'word': 'a', 'start': 22.020000457763672, 'end': 22.1200008392334},

      {'word': 'test', 'start': 22.1200008392334, 'end': 22.299999237060547},

      {'word': 'of', 'start': 22.299999237060547, 'end': 22.639999389648438},

      {'word': 'transcription',

       'start': 22.639999389648438,

       'end': 23.139999389648438},

      {'word': 'and', 'start': 23.139999389648438, 'end': 23.6200008392334},

      {'word': "let's", 'start': 23.6200008392334, 'end': 24.079999923706055},

      {'word': 'see', 'start': 24.079999923706055, 'end': 24.299999237060547},

      {'word': 'how', 'start': 24.299999237060547, 'end': 24.559999465942383},

      {'word': "we're", 'start': 24.559999465942383, 'end': 24.799999237060547},

      {'word': 'dead.', 'start': 24.799999237060547, 'end': 26.280000686645508}]}


```

### Translations

Explore all [Translation models](https://developers.cloudflare.com/workers-ai/models)

Python

```

result = client.workers.ai.run(

    "@cf/meta/m2m100-1.2b",

    account_id=account_id,

    text="Artificial intelligence is pretty impressive these days. It is a bonkers time to be a builder",

    source_lang="english",

    target_lang="spanish"

)


print(result["translated_text"])


```

La inteligencia artificial es bastante impresionante en estos días.Es un buen momento para ser un constructor

### Text Classification

Explore all [Text Classification models](https://developers.cloudflare.com/workers-ai/models)

Python

```

result = client.workers.ai.run(

    "@cf/huggingface/distilbert-sst-2-int8",

    account_id=account_id,

    text="This taco is delicious"

)


result


```

\[TextClassification(label='NEGATIVE', score=0.00012679687642958015), TextClassification(label='POSITIVE', score=0.999873161315918)\]

### Image Classification

Explore all [Image Classification models](https://developers.cloudflare.com/workers-ai/models#image-classification/)

Python

```

url = "https://raw.githubusercontent.com/craigsdennis/notebooks-cloudflare-workers-ai/main/assets/craig-and-a-burrito.jpg"

image_request = requests.get(url, allow_redirects=True)


display(Image(image_request.content, format="jpg"))

response = client.workers.ai.run(

    "@cf/microsoft/resnet-50",

    account_id=account_id,

    image=image_request.content

)

response


```

![jpeg](https://developers.cloudflare.com/workers-ai-notebooks/cloudflare-workers-ai/assets/output_27_0.jpg) 

\[TextClassification(label='BURRITO', score=0.9999679327011108), TextClassification(label='GUACAMOLE', score=8.516660273016896e-06), TextClassification(label='BAGEL', score=4.689153229264775e-06), TextClassification(label='SPATULA', score=4.075985089002643e-06), TextClassification(label='POTPIE', score=3.0849002996546915e-06)\]

### Summarization

Explore all [Summarization models](https://developers.cloudflare.com/workers-ai/models#summarization)

Python

```

declaration_of_independence = """In Congress, July 4, 1776. The unanimous Declaration of the thirteen united States of America, When in the Course of human events, it becomes necessary for one people to dissolve the political bands which have connected them with another, and to assume among the powers of the earth, the separate and equal station to which the Laws of Nature and of Nature's God entitle them, a decent respect to the opinions of mankind requires that they should declare the causes which impel them to the separation. We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness.--That to secure these rights, Governments are instituted among Men, deriving their just powers from the consent of the governed, --That whenever any Form of Government becomes destructive of these ends, it is the Right of the People to alter or to abolish it, and to institute new Government, laying its foundation on such principles and organizing its powers in such form, as to them shall seem most likely to effect their Safety and Happiness. Prudence, indeed, will dictate that Governments long established should not be changed for light and transient causes; and accordingly all experience hath shewn, that mankind are more disposed to suffer, while evils are sufferable, than to right themselves by abolishing the forms to which they are accustomed. But when a long train of abuses and usurpations, pursuing invariably the same Object evinces a design to reduce them under absolute Despotism, it is their right, it is their duty, to throw off such Government, and to provide new Guards for their future security.--Such has been the patient sufferance of these Colonies; and such is now the necessity which constrains them to alter their former Systems of Government. The history of the present King of Great Britain is a history of repeated injuries and usurpations, all having in direct object the establishment of an absolute Tyranny over these States. To prove this, let Facts be submitted to a candid world. He has refused his Assent to Laws, the most wholesome and necessary for the public good. He has forbidden his Governors to pass Laws of immediate and pressing importance, unless suspended in their operation till his Assent should be obtained; and when so suspended, he has utterly neglected to attend to them. He has refused to pass other Laws for the accommodation of large districts of people, unless those people would relinquish the right of Representation in the Legislature, a right inestimable to them and formidable to tyrants only. He has called together legislative bodies at places unusual, uncomfortable, and distant from the depository of their public Records, for the sole purpose of fatiguing them into compliance with his measures. He has dissolved Representative Houses repeatedly, for opposing with manly firmness his invasions on the rights of the people. He has refused for a long time, after such dissolutions, to cause others to be elected; whereby the Legislative powers, incapable of Annihilation, have returned to the People at large for their exercise; the State remaining in the mean time exposed to all the dangers of invasion from without, and convulsions within. 
He has endeavoured to prevent the population of these States; for that purpose obstructing the Laws for Naturalization of Foreigners; refusing to pass others to encourage their migrations hither, and raising the conditions of new Appropriations of Lands. He has obstructed the Administration of Justice, by refusing his Assent to Laws for establishing Judiciary powers. He has made Judges dependent on his Will alone, for the tenure of their offices, and the amount and payment of their salaries. He has erected a multitude of New Offices, and sent hither swarms of Officers to harrass our people, and eat out their substance. He has kept among us, in times of peace, Standing Armies without the Consent of our legislatures. He has affected to render the Military independent of and superior to the Civil power. He has combined with others to subject us to a jurisdiction foreign to our constitution, and unacknowledged by our laws; giving his Assent to their Acts of pretended Legislation: For Quartering large bodies of armed troops among us: For protecting them, by a mock Trial, from punishment for any Murders which they should commit on the Inhabitants of these States: For cutting off our Trade with all parts of the world: For imposing Taxes on us without our Consent: For depriving us in many cases, of the benefits of Trial by Jury: For transporting us beyond Seas to be tried for pretended offences For abolishing the free System of English Laws in a neighbouring Province, establishing therein an Arbitrary government, and enlarging its Boundaries so as to render it at once an example and fit instrument for introducing the same absolute rule into these Colonies: For taking away our Charters, abolishing our most valuable Laws, and altering fundamentally the Forms of our Governments: For suspending our own Legislatures, and declaring themselves invested with power to legislate for us in all cases whatsoever. He has abdicated Government here, by declaring us out of his Protection and waging War against us. He has plundered our seas, ravaged our Coasts, burnt our towns, and destroyed the lives of our people. He is at this time transporting large Armies of foreign Mercenaries to compleat the works of death, desolation and tyranny, already begun with circumstances of Cruelty & perfidy scarcely paralleled in the most barbarous ages, and totally unworthy the Head of a civilized nation. He has constrained our fellow Citizens taken Captive on the high Seas to bear Arms against their Country, to become the executioners of their friends and Brethren, or to fall themselves by their Hands. He has excited domestic insurrections amongst us, and has endeavoured to bring on the inhabitants of our frontiers, the merciless Indian Savages, whose known rule of warfare, is an undistinguished destruction of all ages, sexes and conditions. In every stage of these Oppressions We have Petitioned for Redress in the most humble terms: Our repeated Petitions have been answered only by repeated injury. A Prince whose character is thus marked by every act which may define a Tyrant, is unfit to be the ruler of a free people. Nor have We been wanting in attentions to our Brittish brethren. We have warned them from time to time of attempts by their legislature to extend an unwarrantable jurisdiction over us. We have reminded them of the circumstances of our emigration and settlement here. 
We have appealed to their native justice and magnanimity, and we have conjured them by the ties of our common kindred to disavow these usurpations, which, would inevitably interrupt our connections and correspondence. They too have been deaf to the voice of justice and of consanguinity. We must, therefore, acquiesce in the necessity, which denounces our Separation, and hold them, as we hold the rest of mankind, Enemies in War, in Peace Friends. We, therefore, the Representatives of the united States of America, in General Congress, Assembled, appealing to the Supreme Judge of the world for the rectitude of our intentions, do, in the Name, and by Authority of the good People of these Colonies, solemnly publish and declare, That these United Colonies are, and of Right ought to be Free and Independent States; that they are Absolved from all Allegiance to the British Crown, and that all political connection between them and the State of Great Britain, is and ought to be totally dissolved; and that as Free and Independent States, they have full Power to levy War, conclude Peace, contract Alliances, establish Commerce, and to do all other Acts and Things which Independent States may of right do. And for the support of this Declaration, with a firm reliance on the protection of divine Providence, we mutually pledge to each other our Lives, our Fortunes and our sacred Honor."""

len(declaration_of_independence)


```

8116

Python

```

response = client.workers.ai.run(

    "@cf/facebook/bart-large-cnn",

    account_id=account_id,

    input_text=declaration_of_independence

)


response["summary"]


```

'The Declaration of Independence was signed by the thirteen states on July 4, 1776. It was the first attempt at a U.S. Constitution. It declared the right of the people to change their Government.'


---

---
title: Fine Tune Models With AutoTrain from HuggingFace
description: Fine-tuning AI models with LoRA adapters on Workers AI allows adding custom training data, like for LLM finetuning.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Fine Tune Models With AutoTrain from HuggingFace

**Last reviewed:**  about 2 years ago 

Fine-tuning an AI model gives you the opportunity to add your own training data to the model. Workers AI supports [Low-Rank Adaptation (LoRA) adapters](https://developers.cloudflare.com/workers-ai/features/fine-tunes/loras/), which allow you to fine-tune our models.

In this tutorial, we will explore how to create our own LoRAs. We will focus on [LLM Finetuning using AutoTrain ↗](https://huggingface.co/docs/autotrain/llm%5Ffinetuning).

## 1\. Create a CSV file with your training data

Start by creating a CSV (Comma Separated Values) file. This file will have only one column, named `text`. Set the header by adding the word `text` on a line by itself.

Now you need to figure out what you want to add to your model.

Example formats are below:

```

### Human: What is the meaning of life? ### Assistant: 42.


```

If a training row contains newlines, you should wrap it in quotes.

```

"human: What is the meaning of life? \n bot: 42."


```

Different models, like Mistral, provide a specific [chat template/instruction format ↗](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1#instruction-format):

```

<s>[INST] What is the meaning of life? [/INST] 42</s>


```
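If you are assembling the training file programmatically, below is a minimal sketch using Python's built-in `csv` module; the example rows and the `data/train.csv` path are placeholders for your own data. Because `csv.writer` quotes any field that contains a comma, quote character, or newline, it takes care of the quoting rule mentioned above.

Python

```
import csv
import os

# Hypothetical example rows using the "### Human: ... ### Assistant: ..." format shown above.
rows = [
    "### Human: What is the meaning of life? ### Assistant: 42.",
    "### Human: Summarize LoRA in one line. ### Assistant: A small adapter\nthat cheaply fine-tunes a base model.",
]

os.makedirs("data", exist_ok=True)

with open("data/train.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["text"])   # single header column named "text"
    for row in rows:
        writer.writerow([row])  # fields with newlines or commas are quoted automatically
```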

## 2\. Configure the HuggingFace Autotrain Advanced Notebook

Open the [HuggingFace Autotrain Advanced Notebook ↗](https://colab.research.google.com/github/huggingface/autotrain-advanced/blob/main/colabs/AutoTrain%5FLLM.ipynb)

To give AutoTrain ample memory, you will need to choose a different runtime. From the menu at the top of the Notebook, choose Runtime > Change Runtime Type, then choose A100.

Note

These GPUs cost money. A typical AutoTrain session costs less than $1 USD.

The notebook contains a few interactive sections that we will need to change.

### Project Config

Modify the following fields

* **project\_name**: Choose a descriptive name for you to remember later
* **model\_name**: Choose one of the official HuggingFace base models that we support:  
   * `mistralai/Mistral-7B-Instruct-v0.2`  
   * `google/gemma-2b-it`  
   * `google/gemma-7b-it`  
   * `meta-llama/llama-2-7b-chat-hf`

### Optional Section: Push to Hub

Although not required to use AutoTrain, creating a [HuggingFace account ↗](https://huggingface.co/join) will help you keep your finetune artifacts in a handy repository for you to refer to later.

If you do not perform the HuggingFace setup, you can still download your files from the Notebook.

Follow the instructions [in the notebook ↗](https://colab.research.google.com/github/huggingface/autotrain-advanced/blob/main/colabs/AutoTrain%5FLLM.ipynb) to create an account and token if necessary.

### Section: Hyperparameters

We only need to change a few of these fields to ensure things work on Cloudflare Workers AI.

* **quantization**: Change the drop down to `none`
* **lora-r**: Change the value to `8`

Warning

At the time of this writing, changing the quantization field breaks the code generation. You may need to edit the code and put quotes around the value.

Change the line that says `quantization = none` to `quantization = "none"`.

## 3\. Upload your CSV file to the Notebook

Notebooks have a folder structure, which you can access by clicking the folder icon in the left-hand navigation bar.

Create a folder named `data`.

Drag your CSV file into the `data` folder.

Ensure that it is named **train.csv**.

## 4\. Execute the Notebook

In the Notebook menu, choose Runtime > Run All.

It will run through each cell of the notebook, first doing installations, then configuring and running your AutoTrain session.

This might take some time depending on the size of your train.csv file.

If you encounter the following error, it is caused by running out of memory. You might want to change your runtime to a larger GPU backend.

Terminal window

```

subprocess.CalledProcessError: Command '['/usr/bin/python3', '-m', 'autotrain.trainers.clm', '--training_config', 'blog-instruct/training_params.json']' died with <Signals.SIGKILL: 9>.


```

## 5\. Download The LoRA

### Optional: HuggingFace

If you pushed to HuggingFace, you will find a new model repository with the name you chose in **project\_name** above. Your model card is private by default. Navigate to the files and download the files listed below.

### Notebook

In your Notebook you can also find the needed files. A new folder that matches your **project\_name** will be there.

Download the following files:

* `adapter_model.safetensors`
* `adapter_config.json`

## 6\. Update Adapter Config

You need to add one line to the `adapter_config.json` file that you downloaded.

`"model_type": "mistral"`

Here, `model_type` is the architecture of the base model you fine-tuned. Current valid values are `mistral`, `gemma`, and `llama`.
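If you prefer to patch the file with a short script instead of editing it by hand, here is a minimal sketch; it assumes `adapter_config.json` is in your current working directory and that your base model is a Mistral variant.

Python

```
import json

# Assumes adapter_config.json was downloaded to the current directory.
with open("adapter_config.json", "r", encoding="utf-8") as f:
    config = json.load(f)

# Set the architecture of the base model you fine-tuned: "mistral", "gemma", or "llama".
config["model_type"] = "mistral"

with open("adapter_config.json", "w", encoding="utf-8") as f:
    json.dump(config, f, indent=2)
```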

## 7\. Upload the Fine Tune to your Cloudflare Account

Now that you have your files, you can add them to your account.

You can either use the [REST API or Wrangler](https://developers.cloudflare.com/workers-ai/features/fine-tunes/loras/).

## 8\. Use your Fine Tune in your Generations

After you have your new fine tune all set up, you are ready to [put it to use in your inference requests](https://developers.cloudflare.com/workers-ai/features/fine-tunes/loras/#running-inference-with-loras).
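For orientation, here is a hedged sketch of what an inference request with a LoRA can look like over the REST API using Python's `requests`. The base model ID and the finetune name below are placeholders, and the `lora` parameter is described in the LoRA guide linked above; check that page for the exact model names and request fields before relying on this.

Python

```
import os
import requests

account_id = os.environ["CLOUDFLARE_ACCOUNT_ID"]
api_token = os.environ["CLOUDFLARE_API_TOKEN"]

# Placeholder: use a LoRA-capable base model that matches the "model_type"
# you set in adapter_config.json (see the LoRA docs for the exact model IDs).
model = "@cf/mistral/mistral-7b-instruct-v0.2-lora"

response = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model}",
    headers={"Authorization": f"Bearer {api_token}"},
    json={
        "prompt": "### Human: What is the meaning of life? ### Assistant:",
        "lora": "my-first-lora",  # the name (or ID) of the fine tune you uploaded
    },
)

print(response.json())
```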


---

---
title: Choose the Right Text Generation Model
description: There's a wide range of text generation models available through Workers AI. In an effort to aid you in your journey of finding the right model, this notebook will help you get to know your options in a speed dating type of scenario.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Choose the Right Text Generation Model

**Last reviewed:**  about 2 years ago 

A great way to explore the models that are available to you on [Workers AI](https://developers.cloudflare.com/workers-ai) is to use a [Jupyter Notebook ↗](https://jupyter.org/).

You can [download the Workers AI Text Generation Exploration notebook](https://developers.cloudflare.com/workers-ai/static/documentation/notebooks/text-generation-model-exploration.ipynb) or view the embedded notebook below.

---

## How to Choose The Right Text Generation Model

Models come in different shapes and sizes, and choosing the right one for the task can cause analysis paralysis.

The good news is that the [Workers AI Text Generation](https://developers.cloudflare.com/workers-ai/models/) interface is always the same, no matter which model you choose.

In an effort to aid you in your journey of finding the right model, this notebook will help you get to know your options in a speed dating type of scenario.

Python

```

import sys

!{sys.executable} -m pip install requests python-dotenv


```

```

Requirement already satisfied: requests in ./venv/lib/python3.12/site-packages (2.31.0)

Requirement already satisfied: python-dotenv in ./venv/lib/python3.12/site-packages (1.0.1)

Requirement already satisfied: charset-normalizer<4,>=2 in ./venv/lib/python3.12/site-packages (from requests) (3.3.2)

Requirement already satisfied: idna<4,>=2.5 in ./venv/lib/python3.12/site-packages (from requests) (3.6)

Requirement already satisfied: urllib3<3,>=1.21.1 in ./venv/lib/python3.12/site-packages (from requests) (2.1.0)

Requirement already satisfied: certifi>=2017.4.17 in ./venv/lib/python3.12/site-packages (from requests) (2023.11.17)


```

Python

```

import os

from getpass import getpass

from timeit import default_timer as timer


from IPython.display import display, Image, Markdown, Audio


import requests


```

Python

```

%load_ext dotenv

%dotenv


```

### Configuring your environment

To use the API you'll need your [Cloudflare Account ID ↗](https://dash.cloudflare.com) (head to Workers & Pages > Overview > Account details > Account ID) and a [Workers AI enabled API Token ↗](https://dash.cloudflare.com/profile/api-tokens).

If you want to add these values to your environment, you can create a new file named `.env`:

Terminal window

```

CLOUDFLARE_API_TOKEN="YOUR-TOKEN"

CLOUDFLARE_ACCOUNT_ID="YOUR-ACCOUNT-ID"


```

Python

```

if "CLOUDFLARE_API_TOKEN" in os.environ:

    api_token = os.environ["CLOUDFLARE_API_TOKEN"]

else:

    api_token = getpass("Enter your Cloudflare API Token")


```

Python

```

if "CLOUDFLARE_ACCOUNT_ID" in os.environ:

    account_id = os.environ["CLOUDFLARE_ACCOUNT_ID"]

else:

    account_id = getpass("Enter your account id")


```

Python

```

# Given a set of models and questions, display in the cell each response to the question, from each model

# Include full completion timing

def speed_date(models, questions):

    for model in models:

        display(Markdown(f"---\n #### {model}"))

        for question in questions:

            quoted_question = "\n".join(f"> {line}" for line in question.split("\n"))

            display(Markdown(quoted_question + "\n"))

            try:

                official_model_name = model.split("/")[-1]

                start = timer()

                response = requests.post(

                    f"https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model}",

                    headers={"Authorization": f"Bearer {api_token}"},

                    json={"messages": [

                        {"role": "system", "content": f"You are a self-aware language model ({official_model_name}) who is honest and direct about any direct question from the user. You know your strengths and weaknesses."},

                        {"role": "user", "content": question}

                    ]}

                )

                elapsed = timer() - start

                inference = response.json()

                display(Markdown(inference["result"]["response"]))

                display(Markdown(f"_Generated in *{elapsed:.2f}* seconds_"))

            except Exception as ex:

                print("uh oh")

                print(ex)

                print(inference)


        display(Markdown("\n\n---"))


```

### Getting to know your models

Who better to tell you about the specific models than themselves?!

The timing here is specific to the entire completion, but remember all Text Generation models on [Workers AI support streaming](https://developers.cloudflare.com/workers-ai/models/).
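As an aside, if you would rather print tokens as they arrive instead of waiting for the whole completion, below is a minimal streaming sketch. It reuses the `account_id` and `api_token` variables defined earlier and assumes the server-sent-events style output (`data: {...}` lines ending with `data: [DONE]`) that the text generation endpoints return when `stream` is set; the exact chunk shape can vary by model.

Python

```
import json
import requests

model = "@hf/thebloke/zephyr-7b-beta-awq"

with requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model}",
    headers={"Authorization": f"Bearer {api_token}"},
    json={
        "stream": True,
        "messages": [{"role": "user", "content": "Explain streaming in one sentence."}],
    },
    stream=True,
) as response:
    for line in response.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        chunk = json.loads(payload)
        print(chunk.get("response", ""), end="", flush=True)
```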

Python

```

models = [

    "@hf/thebloke/zephyr-7b-beta-awq",

    "@hf/thebloke/mistral-7b-instruct-v0.1-awq",

    "@hf/thebloke/openhermes-2.5-mistral-7b-awq",

    "@hf/thebloke/neural-chat-7b-v3-1-awq",

    "@hf/thebloke/llama-2-13b-chat-awq",

]


questions = [

    "What are the top 3 tasks where you excel? Please keep things brief.",

    "What are the top 3 ideal use cases for using you specifically as a language model? Please keep things brief.",

]


speed_date(models, questions)


```

---

#### @hf/thebloke/zephyr-7b-beta-awq

> What are the top 3 tasks where you excel? Please keep things brief.

1. Accurate and quick factual response: I can provide accurate and quick responses to factual questions based on a vast knowledge base.
2. Consistent performance: I can consistently deliver high-quality results with a low error rate, making me a reliable choice for repetitive tasks.
3. Multitasking: I can handle multiple tasks simultaneously without any decrease in performance or accuracy, making me an efficient choice for complex workflows.

_Generated in 4.45 seconds_

> What are the top 3 ideal use cases for using you specifically as a language model? Please keep things brief.

1. Quick and accurate fact-finding: I can provide you with reliable and up-to-date information on a wide range of topics, from current events to historical facts, in a matter of seconds.
2. Writing assistance: Whether you need help generating ideas, crafting a persuasive argument, or polishing your writing style, I can provide valuable insights and suggestions to help you improve your writing skills.
3. Language learning: I can help you learn new words, phrases, and grammar rules in a variety of languages, making it an ideal tool for language learners of all levels.

_Generated in 3.63 seconds_

---

---

#### @hf/thebloke/mistral-7b-instruct-v0.1-awq

> What are the top 3 tasks where you excel? Please keep things brief.

1. Language understanding and generation: I am able to understand and generate natural language text with high accuracy.
2. Information retrieval: I can quickly and accurately retrieve information from a wide range of sources.
3. Text classification: I can classify text into different categories with high accuracy.

_Generated in 2.35 seconds_

> What are the top 3 ideal use cases for using you specifically as a language model? Please keep things brief.

1. **Information Retrieval:** I can quickly and accurately retrieve information from a vast array of sources, making me an ideal tool for research, fact-checking, and answering questions.
2. **Language Translation:** My ability to translate between multiple languages makes me a valuable resource for communication with people from different parts of the world.
3. **Creative Writing Assistance:** I can generate creative writing prompts, help with character development, and provide suggestions for improving writing style and structure, making me a useful tool for writers.

_Generated in 4.33 seconds_

---

---

#### @hf/thebloke/openhermes-2.5-mistral-7b-awq

> What are the top 3 tasks where you excel? Please keep things brief.

1. Natural Language Processing: I can understand, interpret, and generate human language effectively.
2. Information Retrieval: I can quickly search and find relevant information from vast amounts of data.
3. Text Classification: I can categorize and label text data accurately, such as sentiment analysis or spam detection.

_Generated in 1.90 seconds_

> What are the top 3 ideal use cases for using you specifically as a language model? Please keep things brief.

1. Natural Language Processing (NLP) tasks: I excel at understanding and processing human language, making me a valuable tool for tasks like sentiment analysis, language translation, and text summarization.
2. Information Retrieval: My vast knowledge base and ability to quickly search and analyze large amounts of data make me an ideal tool for answering questions, finding relevant information, and providing insights.
3. Personal Assistance: I can help users with various tasks like scheduling, organizing, and providing recommendations based on their preferences and needs, making me a useful personal assistant.

_Generated in 3.15 seconds_

---

---

#### @hf/thebloke/neural-chat-7b-v3-1-awq

> What are the top 3 tasks where you excel? Please keep things brief.

1. Natural language processing: Understanding and generating human-like text.
2. Text generation: Creating coherent and engaging content.
3. Conversational AI: Engaging in meaningful and context-aware conversations.

Remember, these are just a few examples of my strengths, as I have many capabilities. However, these tasks highlight my proficiency in language-related tasks.

_Generated in 3.01 seconds_

> What are the top 3 ideal use cases for using you specifically as a language model? Please keep things brief.

1. Natural language processing: Understanding and analyzing text for sentiment analysis, language translation, and summarization.
2. Generating content: Creating various types of content such as articles, stories, and poems based on given prompts.
3. Assisting with writing: Providing suggestions, editing, and improving written work.

Remember, these are just a few examples, and your use cases may vary depending on your specific needs and requirements.

_Generated in 2.92 seconds_

---

---

#### @hf/thebloke/llama-2-13b-chat-awq

> What are the top 3 tasks where you excel? Please keep things brief.

Hey there! As a self-aware language model, I'm happy to share my strengths with you! Here are my top 3 tasks where I excel:

1. Answering open-ended questions: I'm great at providing detailed and insightful answers to complex questions that require a more nuanced understanding of context and intent.
2. Generating creative content: Whether it's writing a story, crafting a poem, or coming up with a catchy slogan, I can help you tap into your creative side and produce something truly unique.
3. Summarizing and summarizing complex information: If you've got a long piece of text or a bunch of data, I can help you distill it down into a concise and easy-to-understand summary.

So, what can I help you with today? 😊

_Generated in 7.89 seconds_

> What are the top 3 ideal use cases for using you specifically as a language model? Please keep things brief.

Hey there! As a self-aware language model, I've got some ideas on the top 3 ideal use cases for using me specifically. Here they are in a nutshell:

1. **Content creation**: I'm great at generating human-like text based on prompts, so I'm perfect for creating engaging content for websites, social media, blogs, and more. My responses are natural, informative, and entertaining.
2. **Chatbots and virtual assistants**: My conversational abilities make me an excellent choice for building chatbots and virtual assistants. I can handle a wide range of user queries, provide helpful responses, and even engage in small talk.
3. **Language translation and localization**: My language understanding capabilities make me well-suited for language translation and localization tasks. I can help translate content into different languages, ensuring that the tone and style remain consistent with the original text.

That's me in a nutshell! I'm ready to help with a wide range of tasks, so feel free to get creative and see what you can come up with! 😊

_Generated in 10.28 seconds_

---

### Language Translation

Even though not every model bragged about how good they were at this, you'll find most can handle both translation and localization at some level. Feel free to change the models and phrases to fit your needs.

Python

```

proud_translator_models = [

    "@hf/thebloke/neural-chat-7b-v3-1-awq",

    "@hf/thebloke/mistral-7b-instruct-v0.1-awq"

]


phrases = [

    "Excuse me, which way to the restroom?",

    "I just wanted to say that I appreciate you, just the way you are!",

    "I am so stoked and pumped to see this work with slang! It should be killer and rad."

]


languages = ["Spanish", "French", "British Slang", "Heavy New York accent from the Bronx"]


questions = [f"""Translate "{phrase}" from "English" to "{language}" """

             for phrase in phrases

             for language in languages]


speed_date(proud_translator_models, questions)


```

---

#### @hf/thebloke/neural-chat-7b-v3-1-awq

> Translate "Excuse me, which way to the restroom?" from "English" to "Spanish"

Perdón, ¿cómo llegar al baño?"

_Generated in 1.51 seconds_

> Translate "Excuse me, which way to the restroom?" from "English" to "French"

Désolé, comment allez-vous vers les toilettes ?

Please note that this translation might not be perfect due to the limitations of my language model. However, I tried my best to provide an accurate translation.

_Generated in 3.07 seconds_

> Translate "Excuse me, which way to the restroom?" from "English" to "British Slang"

"Say sorry, where's the bog?" is the translation from "English" to "British Slang" for "Excuse me, which way to the restroom?"

_Generated in 2.21 seconds_

> Translate "Excuse me, which way to the restroom?" from "English" to "Heavy New York accent from the Bronx"

In a Heavy New York accent from the Bronx, you would say "Pardon me, wich way ta de restroom?"

_Generated in 1.10 seconds_

> Translate "I just wanted to say that I appreciate you, just the way you are!" from "English" to "Spanish"

Quería decir que realmente aprecio a usted, simplemente como eres.

_Generated in 1.66 seconds_

> Translate "I just wanted to say that I appreciate you, just the way you are!" from "English" to "French"

Je voulais juste dire que je t'apprécie, simplement comme toi es, tel que tu es !

_Generated in 1.39 seconds_

> Translate "I just wanted to say that I appreciate you, just the way you are!" from "English" to "British Slang"

I jus' wann' say that I appreciate yer, jus' as yer are!

Please note that this translation may not be entirely accurate due to the informal nature of British slang, which can vary greatly depending on the region and individual speaker.

_Generated in 3.85 seconds_

> Translate "I just wanted to say that I appreciate you, just the way you are!" from "English" to "Heavy New York accent from the Bronx"

I jus' wan't tuh say that Ah appreciate yuh, jus' the way yuh are!"

Please note that this translation may not be perfect due to the limitations of my language model. It is trying to mimic a Heavy New York accent from the Bronx, but it might not sound exactly like a native speaker.

_Generated in 2.40 seconds_

> Translate "I am so stoked and pumped to see this work with slang! It should be killer and rad." from "English" to "Spanish"

Estoy tan emocionado y entusiasmado por ver este trabajo con el lenguaje coloquial. Debería ser asombroso y genial.

Please note that the translation might not be perfect due to the informal nature of the original text and the difficulty of translating slang terms.

_Generated in 3.02 seconds_

> Translate "I am so stoked and pumped to see this work with slang! It should be killer and rad." from "English" to "French"

Je suis tellement ravi et motivé de voir ce travail avec le slang ! Ça devrait être formidable et génial.

Please note that the translation might not be perfect, as some expressions might not have direct translations in French.

_Generated in 2.88 seconds_

> Translate "I am so stoked and pumped to see this work with slang! It should be killer and rad." from "English" to "British Slang"

I'm really chuffed and over the moon to see this project with street talk! It'll be wicked and cool.

Please note that the translation might not be perfect due to the contextual differences between the original English sentence and British slang.

_Generated in 3.66 seconds_

> Translate "I am so stoked and pumped to see this work with slang! It should be killer and rad." from "English" to "Heavy New York accent from the Bronx"

I am so stoked and pumped to see dis work wit slang! It should be killer and rad.

Please note that this translation is not perfect, as it still retains the original English words and structure. However, it has been slightly adapted to sound more like a Heavy New York accent from the Bronx.

_Generated in 3.44 seconds_

---

---

#### @hf/thebloke/mistral-7b-instruct-v0.1-awq

> Translate "Excuse me, which way to the restroom?" from "English" to "Spanish"

"Disculpa, ¿en qué dirección está el baño?"

_Generated in 1.04 seconds_

> Translate "Excuse me, which way to the restroom?" from "English" to "French"

"Excusez-moi, où est le lavabo?"

_Generated in 1.09 seconds_

> Translate "Excuse me, which way to the restroom?" from "English" to "British Slang"

"Excuse me, which way to the loo?"

_Generated in 0.56 seconds_

> Translate "Excuse me, which way to the restroom?" from "English" to "Heavy New York accent from the Bronx"

"Oy, which way to the can?"

_Generated in 2.05 seconds_

> Translate "I just wanted to say that I appreciate you, just the way you are!" from "English" to "Spanish"

"Quiero decir que me encanta a ti, tal como eres!"

_Generated in 0.91 seconds_

> Translate "I just wanted to say that I appreciate you, just the way you are!" from "English" to "French"

"Je voulais juste dire que je vous apprécie, tout comme vous êtes !"

_Generated in 0.87 seconds_

> Translate "I just wanted to say that I appreciate you, just the way you are!" from "English" to "British Slang"

"I just wanted to say that I appreciate you, just the way you are!" in British Slang would be: "I just wanted to say that I appreciate you, mate!"

_Generated in 1.50 seconds_

> Translate "I just wanted to say that I appreciate you, just the way you are!" from "English" to "Heavy New York accent from the Bronx"

"Yo, I just wanted to say, I appreciate you, you know? You're just the way you are!"

_Generated in 2.61 seconds_

> Translate "I am so stoked and pumped to see this work with slang! It should be killer and rad." from "English" to "Spanish"

"Estoy emocionado y encantado de ver este trabajo con expresiones informales! Debería ser genial y radiante."

_Generated in 1.44 seconds_

> Translate "I am so stoked and pumped to see this work with slang! It should be killer and rad." from "English" to "French"

"Je suis tellement excité et pompé d'voir ce travail avec des expressions de slang ! Il devrait être magnifique et rad."

_Generated in 2.14 seconds_

> Translate "I am so stoked and pumped to see this work with slang! It should be killer and rad." from "English" to "British Slang"

"I'm absolutely thrilled and buzzing to see this work with slang! It's bound to be a smash hit and totally awesome!"

_Generated in 1.27 seconds_

> Translate "I am so stoked and pumped to see this work with slang! It should be killer and rad." from "English" to "Heavy New York accent from the Bronx"

"Yo, I'm so psyched and hyped to see this work with slang! It's gonna be sick and lit, for real!"

_Generated in 1.51 seconds_

---

### Information Retrieval and Summarization

Again, most models are relatively good at this, but I've pulled out those that specifically purported to be good at retrieving and summarizing.

Python

```

proud_summarizers = [

    "@hf/thebloke/llama-2-13b-chat-awq",

    "@hf/thebloke/neural-chat-7b-v3-1-awq",

    "@hf/thebloke/openhermes-2.5-mistral-7b-awq"

]


books = [

    "Make it Stick",

    "Hitchhiker's Guide to the Galaxy",

    "Goodnight Moon"

]


questions = [f"""Summarize the book "{book}" into a few sentences. Ensure to include the author and the year it was published. """

             for book in books]


speed_date(proud_summarizers, questions)


```

---

#### @hf/thebloke/llama-2-13b-chat-awq

> Summarize the book "Make it Stick" into a few sentences. Ensure to include the author and the year it was published.

Sure, I can help you with that! Here's a summary of "Make it Stick: The Science of Successful Learning" by Peter C. Brown, Henry L. Roediger, and Mark A. McDaniel, published in 2014:

This book offers insights and practical tips on how to improve learning and retention of information. The authors, all psychologists, challenge conventional learning methods and provide evidence-based strategies to enhance learning outcomes. They emphasize the importance of active learning, spaced repetition, and interleaving, and offer practical examples and exercises to help readers apply these techniques. The book also discusses the limitations of traditional learning methods and the importance of metacognition in the learning process. Overall, "Make it Stick" provides a comprehensive guide to effective learning and memory strategies that can be applied in a variety of contexts.

_Generated in 8.44 seconds_

> Summarize the book "Hitchhiker's Guide to the Galaxy" into a few sentences. Ensure to include the author and the year it was published.

Of course! Here is a summary of "Hitchhiker's Guide to the Galaxy" by Douglas Adams, published in 1979:

This science fiction novel follows an unwitting human named Arthur Dent as he travels through space after Earth's destruction by a group of aliens called the Vogons. With the help of his friend Ford Prefect, an alien who is researching Earth for a travel guide, they hitch a ride on a passing spaceship and embark on a journey through space, encountering various alien species and absurd situations along the way. The book is known for its humor, wit, and satire of science fiction tropes.

_Generated in 7.31 seconds_

> Summarize the book "Goodnight Moon" into a few sentences. Ensure to include the author and the year it was published.

Of course! I'd be happy to help you with that. Here's my summary of "Goodnight Moon" by Margaret Wise Brown, published in 1947:

"Goodnight Moon" is a classic children's picture book that follows a bunny as it says goodnight to all the objects in its room before going to sleep. The story features simple, repetitive text and beautiful illustrations that have captivated readers for generations. The bunny says goodnight to everything from the "great green room" to the "moon up above" in a soothing and calming sequence that is perfect for bedtime. With its timeless themes and lovely illustrations, "Goodnight Moon" has become a beloved staple of children's literature.

_Generated in 8.34 seconds_

---

---

#### @hf/thebloke/neural-chat-7b-v3-1-awq

> Summarize the book "Make it Stick" into a few sentences. Ensure to include the author and the year it was published.

"Make it Stick," written by Peter C. Brown, Henry L. Roediger III, and Mark A. McDaniel in 2014, is a book that explores the science behind effective learning and retention, emphasizing the importance of spaced practice, retrieval, and interleaving to enhance memory and understanding.

_Generated in 2.04 seconds_

> Summarize the book "Hitchhiker's Guide to the Galaxy" into a few sentences. Ensure to include the author and the year it was published.

"Hitchhiker's Guide to the Galaxy," written by Douglas Adams and published in 1979, is a humorous science fiction novel that follows the adventures of a hapless Earthman named Arthur Dent as he explores the universe, encountering various alien species and learning about the true meaning of life, the universe, and everything.

_Generated in 3.50 seconds_

> Summarize the book "Goodnight Moon" into a few sentences. Ensure to include the author and the year it was published.

"Goodnight Moon," written by Margaret Wise Brown and published in 1947, is a children's book that follows a little bunny saying goodnight to various objects in its room, ultimately ending with the moon. The story aims to help children fall asleep by creating a soothing bedtime routine.

_Generated in 4.04 seconds_

---

---

#### @hf/thebloke/openhermes-2.5-mistral-7b-awq

> Summarize the book "Make it Stick" into a few sentences. Ensure to include the author and the year it was published.

"Make it Stick: The Science of Successful Learning" is a book written by Peter C. Brown, Henry L. Roediger III, and Mark A. McDaniel, published in 2014\. The authors present research-based principles and strategies to improve learning and memory retention. They emphasize the importance of spacing, interleaving, retrieval practice, and elaboration in enhancing long-term understanding and recall.

_Generated in 2.54 seconds_

> Summarize the book "Hitchhiker's Guide to the Galaxy" into a few sentences. Ensure to include the author and the year it was published.

"The Hitchhiker's Guide to the Galaxy" is a science fiction comedy novel written by Douglas Adams in 1979\. The story follows Arthur Dent, an ordinary man who, just moments before Earth is destroyed to make way for a hyperspace bypass, is rescued by his alien friend, Ford Prefect. Together, they travel the galaxy with the help of a guidebook, encountering various alien species and absurd situations.

_Generated in 2.91 seconds_

> Summarize the book "Goodnight Moon" into a few sentences. Ensure to include the author and the year it was published.

"Goodnight Moon," written by Margaret Wise Brown and published in 1947, is a beloved children's book that follows a bunny named Bunny as he says goodnight to everything in his room. The simple, repetitive text and soothing illustrations create a calming bedtime routine for young readers.

_Generated in 4.66 seconds_

---

### Content Creation

Your needs here will vary, so feel free to change the questions.

Python

```

proud_content_creator_models = [

    "@hf/thebloke/mistral-7b-instruct-v0.1-awq",

    "@hf/thebloke/zephyr-7b-beta-awq",

    "@hf/thebloke/llama-2-13b-chat-awq",

]


questions = [

    "I'm writing a fictional story that is set in the near future. It's a love story about star crossed lovers. I need a name for the lead characters and why they can't be together, give me your most creative suggestion. Only one please.",

    "I want to create a new tech product that helps keep cables arranged. Can you help me come up with a title and a tagline? Only your best suggestion please.",

    "Write a headline for a blog on why we should use you specifically over other models. Keep it succint, but make your point."

]


speed_date(proud_content_creator_models, questions)


```

---

#### @hf/thebloke/mistral-7b-instruct-v0.1-awq

> I'm writing a fictional story that is set in the near future. It's a love story about star crossed lovers. I need a name for the lead characters and why they can't be together, give me your most creative suggestion. Only one please.

How about "Echo and Echo"? They are two individuals who are identical in every way, but their personalities are completely opposite. Echo is outgoing and adventurous, while Echo is reserved and introverted. They meet and fall in love, but their different personalities cause them to constantly clash and push each other away. Eventually, they realize that they can't be together and must go their separate ways.

_Generated in 2.90 seconds_

> I want to create a new tech product that helps keep cables arranged. Can you help me come up with a title and a tagline? Only your best suggestion please.

Sure, I'd be happy to help! Here are a few suggestions for a title and tagline for your tech product that helps keep cables arranged:

Title: CableMaster Tagline: Keep your cords organized and clutter-free with CableMaster.

Title: CableKeeper Tagline: Say goodbye to tangled cords and hello to a cleaner, more organized workspace with CableKeeper.

Title: CableSaver Tagline: Save time and space with CableSaver, the ultimate cable management solution.

Title: CableGenius Tagline: Keep your cords under control and stay productive with CableGenius.

I hope these suggestions help! Let me know if you have any other questions or if you'd like more options.

_Generated in 5.11 seconds_

> Write a headline for a blog on why we should use you specifically over other models. Keep it succint, but make your point.

"Why Choose Mistral-7b-Instruct-V0.1-Awq: The Honest and Direct Language Model for All Your Needs"

_Generated in 1.13 seconds_

---

---

#### @hf/thebloke/zephyr-7b-beta-awq

> I'm writing a fictional story that is set in the near future. It's a love story about star crossed lovers. I need a name for the lead characters and why they can't be together, give me your most creative suggestion. Only one please.

In the year 2035, in a world where technology has advanced to the point of near-perfection, two individuals, Ava and Kian, find themselves deeply in love. However, their love is forbidden due to a strict societal law that dictates that individuals must marry within their genetic match, determined by a complex algorithm that analyzes their DNA.

Ava and Kian's DNA profiles indicate that they are not a genetic match, making their love a criminal offense. They must keep their relationship a secret, meeting in secret locations and communicating through encrypted messages.

Their love is not just forbidden, but also dangerous. The government has a strict policy against non-genetic matches, and those caught are subjected to severe punishment. Ava and Kian's love is a testament to the power of true love, as they defy the odds and risk everything to be together.

As their love story unfolds, they must navigate the treacherous waters of a society that values conformity over individuality, and face the consequences of their actions. Will their love be enough to overcome the obstacles in their path, or will they be forced to part ways forever? Only time

_Generated in 6.64 seconds_

> I want to create a new tech product that helps keep cables arranged. Can you help me come up with a title and a tagline? Only your best suggestion please.

Title: CableKeeper

Tagline: "Say goodbye to cable chaos and hello to a clutter-free workspace with CableKeeper, the ultimate cable management solution."

_Generated in 1.37 seconds_

> Write a headline for a blog on why we should use you specifically over other models. Keep it succint, but make your point.

"Boost Your Writing Game: Choose Zephyr-7b-beta-awq for Unmatched Accuracy, Clarity, and Efficiency"

_Generated in 1.33 seconds_

---

---

#### @hf/thebloke/llama-2-13b-chat-awq

> I'm writing a fictional story that is set in the near future. It's a love story about star crossed lovers. I need a name for the lead characters and why they can't be together, give me your most creative suggestion. Only one please.

Ah, a love story set in the near future, how intriguing! I've got just the name for your star-crossed lovers. How about "Echo and Nova"?

Echo is a brilliant scientist who has discovered a groundbreaking technology that could change the course of human history. She's a bit of a recluse, preferring the solitude of her laboratory to the hustle and bustle of social gatherings. Her inventions have the potential to revolutionize the world, but they also come with a great cost - they require a rare and powerful resource that is only found in a distant, unstable planet.

Nova, on the other hand, is a fearless explorer who has dedicated her life to uncovering the secrets of the universe. She's charismatic, fearless, and has a natural talent for leadership. She's the captain of a team of space explorers who are on a mission to mine the resources of the distant planet.

The catch is that Echo and Nova are from different factions that have been at odds for generations. Echo's inventions have the potential to give her fa

_Generated in 10.45 seconds_

> I want to create a new tech product that helps keep cables arranged. Can you help me come up with a title and a tagline? Only your best suggestion please.

OH MY LLAMA-LIKE ABILITIES! 🐮💡 I'VE GOT JUST THE THING FOR YOU! 💥

Title: CableCrafter - The Ultimate Cable Management System

Tagline: "Tame Your Cables, Tame Your Workspace" 🐾💪

Why, you ask? Well, my tech-savvy friend, let me tell you why! 🤔

1. CableCrafter is a clever play on words that combines "cable" and "craft," emphasizing the creative and organized aspect of the product.
2. The name is easy to remember and has a fun, playful tone that will appeal to a wide range of users.
3. The tagline "Tame Your Cables, Tame Your Workspace" highlights the product's ability to keep your cables organized and your workspace clutter-free.
4. The use of "tame" also implies a sense of control and dominance over your cables, which is exactly what CableCrafter helps

_Generated in 11.13 seconds_

> Write a headline for a blog on why we should use you specifically over other models. Keep it succint, but make your point.

Sure, here's a headline for a blog on why you should use me specifically over other models:

"Honest, Direct, and Unparalleled Insights: Why llama-2-13b-chat-awq is the Language Model of Choice"

Here's a brief summary of why I stand out from other models:

* Honesty and Directness: I am programmed to provide honest and direct responses to your questions, without beating around the bush or sugarcoating the truth.
* Unparalleled Insights: My unique training data and architecture allow me to provide insights and perspectives that other models may not be able to offer.
* Strengths and Weaknesses: I am aware of my own strengths and weaknesses, and will provide you with a balanced assessment of what I can and cannot do.

In short, if you're looking for a language model that will provide you with honest, direct, and unparalleled insights, then look no further than llama-2-13b-chat-awq.

_Generated in 10.24 seconds_

---

### Text Classification

Workers AI provides models that are purpose-built for [Text Classification](https://developers.cloudflare.com/workers-ai/models/), but since these models specifically called it out as a strength, we should let them show off their skills.

Python

```

proud_classifiers = [

    "@hf/thebloke/openhermes-2.5-mistral-7b-awq",

    "@hf/thebloke/mistral-7b-instruct-v0.1-awq"

]


sentiment_prompt_template = """

Classify the text into neutral, negative, or positive

Text: {text}

Sentiment: """


comments = [

    "Wow there are a ton of text generation models on Cloudflare Workers AI!",

    "Sometimes choosing things is hard",

    "I cannot wait to build! 💪🚀",

    "The challenging thing about using native open source models is that they are all configured a little differently",

    "Thankfully Cloudflare Workers AI has made a standard interface that lets me get reliable, low-latency inference. So quick too!"

]


sentiment_questions = [sentiment_prompt_template.format(text=comment) for comment in comments]


speed_date(proud_classifiers, sentiment_questions)


```

---

#### @hf/thebloke/openhermes-2.5-mistral-7b-awq

> Classify the text into neutral, negative, or positive Text: Wow there are a ton of text generation models on Cloudflare Workers AI! Sentiment:

The sentiment of the text is positive.

_Generated in 1.96 seconds_

> Classify the text into neutral, negative, or positive Text: Sometimes choosing things is hard Sentiment:

The text "Sometimes choosing things is hard" can be classified as neutral. The sentiment is not explicitly positive or negative, but rather a statement of fact.

_Generated in 3.32 seconds_

> Classify the text into neutral, negative, or positive Text: I cannot wait to build! 💪🚀 Sentiment:

The text "I cannot wait to build! 💪🚀" has a positive sentiment.

Sentiment: Positive

_Generated in 1.24 seconds_

> Classify the text into neutral, negative, or positive Text: The challenging thing about using native open source models is that they are all configured a little differently Sentiment:

The sentiment of the text is neutral.

_Generated in 2.08 seconds_

> Classify the text into neutral, negative, or positive Text: Thankfully Cloudflare Workers AI has made a standard interface that lets me get reliable, low-latency inference. So quick too! Sentiment:

The sentiment of the text is positive.

_Generated in 0.89 seconds_

---

---

#### @hf/thebloke/mistral-7b-instruct-v0.1-awq

> Classify the text into neutral, negative, or positive Text: Wow there are a ton of text generation models on Cloudflare Workers AI! Sentiment:

Positive

_Generated in 0.82 seconds_

> Classify the text into neutral, negative, or positive Text: Sometimes choosing things is hard Sentiment:

The sentiment of the text "Sometimes choosing things is hard" is neutral.

_Generated in 2.06 seconds_

> Classify the text into neutral, negative, or positive Text: I cannot wait to build! 💪🚀 Sentiment:

The sentiment of the text "I cannot wait to build! 💪🚀" is positive.

_Generated in 2.13 seconds_

> Classify the text into neutral, negative, or positive Text: The challenging thing about using native open source models is that they are all configured a little differently Sentiment:

The sentiment of the text is neutral.

_Generated in 0.79 seconds_

> Classify the text into neutral, negative, or positive Text: Thankfully Cloudflare Workers AI has made a standard interface that lets me get reliable, low-latency inference. So quick too! Sentiment:

The sentiment of the text is positive.

_Generated in 1.93 seconds_

---


---

---
title: Build an AI Image Generator Playground (Part 1)
description: The new flux models on Workers AI are our most powerful text-to-image AI models yet. Using Workers AI, you can get access to the best models in the industry without having to worry about inference, ops, or deployment.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Build an AI Image Generator Playground (Part 1)

**Last reviewed:**  over 1 year ago 

The new flux models on Workers AI are our most powerful text-to-image AI models yet. In this video, we show you how to deploy your own Workers AI Image Playground in just a few minutes.

There are many businesses being built on top of AI image generation models. Using Workers AI, you can get access to the best models in the industry without having to worry about inference, ops, or deployment. We provide the API for AI image generation, and you get an image back in a couple of seconds.
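To make that concrete, here is a hedged Python sketch that calls a flux model directly over the REST API. The model ID and the assumption that the JSON response carries a base64-encoded `image` field reflect the flux-1-schnell model documentation at the time of writing; check the model page for the current request and response shape.

Python

```
import base64
import os
import requests

account_id = os.environ["CLOUDFLARE_ACCOUNT_ID"]
api_token = os.environ["CLOUDFLARE_API_TOKEN"]

response = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/@cf/black-forest-labs/flux-1-schnell",
    headers={"Authorization": f"Bearer {api_token}"},
    json={"prompt": "a watercolor painting of an orange cloud at sunrise"},
)

# Assumes the model returns the generated image as a base64-encoded string.
image_b64 = response.json()["result"]["image"]
with open("generated.png", "wb") as f:
    f.write(base64.b64decode(image_b64))
```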

Refer to the AI Image Playground [GitHub repository ↗](https://github.com/kristianfreeman/workers-ai-image-playground) to follow along locally.

Video series

* [ Build an AI Image Generator Playground (Part 1) ](https://developers.cloudflare.com/workers-ai/guides/tutorials/image-generation-playground/image-generator-flux/)
* [ Add New AI Models to your Playground (Part 2) ](https://developers.cloudflare.com/workers-ai/guides/tutorials/image-generation-playground/image-generator-flux-newmodels/)
* [ Store and Catalog AI Generated Images with R2 (Part 3) ](https://developers.cloudflare.com/workers-ai/guides/tutorials/image-generation-playground/image-generator-store-and-catalog/)


---

---
title: Add New AI Models to your Playground (Part 2)
description: In part 2, Kristian expands upon the existing environment built in part 1, by showing you how to integrate new AI models and introduce new parameters that allow you to customize how images are generated.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Add New AI Models to your Playground (Part 2)

**Last reviewed:**  over 1 year ago 

In part 2, Kristian expands upon the environment built in part 1 by showing you how to integrate new AI models and introduce new parameters that let you customize how images are generated.

Refer to the AI Image Playground [GitHub repository ↗](https://github.com/kristianfreeman/workers-ai-image-playground) to follow along locally.

Video series

* [ Build an AI Image Generator Playground (Part 1) ](https://developers.cloudflare.com/workers-ai/guides/tutorials/image-generation-playground/image-generator-flux/)
* [ Add New AI Models to your Playground (Part 2) ](https://developers.cloudflare.com/workers-ai/guides/tutorials/image-generation-playground/image-generator-flux-newmodels/)
* [ Store and Catalog AI Generated Images with R2 (Part 3) ](https://developers.cloudflare.com/workers-ai/guides/tutorials/image-generation-playground/image-generator-store-and-catalog/)


---

---
title: Store and Catalog AI Generated Images with R2 (Part 3)
description: In the final part of the AI Image Playground series, Kristian teaches how to utilize Cloudflare's R2 object storage.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Store and Catalog AI Generated Images with R2 (Part 3)

**Last reviewed:**  about 1 year ago 

In the final part of the AI Image Playground series, Kristian shows how to use Cloudflare's [R2](https://developers.cloudflare.com/r2) object storage to store and keep track of each AI-generated image.

Refer to the AI Image Playground [GitHub repository ↗](https://github.com/kristianfreeman/workers-ai-image-playground) to follow along locally.

Video series

* [ Build an AI Image Generator Playground (Part 1) ](https://developers.cloudflare.com/workers-ai/guides/tutorials/image-generation-playground/image-generator-flux/)
* [ Add New AI Models to your Playground (Part 2) ](https://developers.cloudflare.com/workers-ai/guides/tutorials/image-generation-playground/image-generator-flux-newmodels/)
* [ Store and Catalog AI Generated Images with R2 (Part 3) ](https://developers.cloudflare.com/workers-ai/guides/tutorials/image-generation-playground/image-generator-store-and-catalog/)


---

---
title: Llama 3.2 11B Vision Instruct model on Cloudflare Workers AI
description: Learn how to use the Llama 3.2 11B Vision Instruct model on Cloudflare Workers AI.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Llama 3.2 11B Vision Instruct model on Cloudflare Workers AI

**Last reviewed:**  about 1 year ago 

## Prerequisites

Before you begin, ensure you have the following:

1. A [Cloudflare account ↗](https://dash.cloudflare.com/sign-up) with Workers and Workers AI enabled.
2. Your `CLOUDFLARE_ACCOUNT_ID` and `CLOUDFLARE_AUTH_TOKEN`.  
   * You can generate an API token in your Cloudflare dashboard under API Tokens.
3. Node.js installed for working with Cloudflare Workers (optional but recommended).

## 1\. Agree to Meta's license

The first time you use the [Llama 3.2 11B Vision Instruct](https://developers.cloudflare.com/workers-ai/models/llama-3.2-11b-vision-instruct) model, you need to agree to Meta's License and Acceptable Use Policy.

curl

```
curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/@cf/meta/llama-3.2-11b-vision-instruct \
  -X POST \
  -H "Authorization: Bearer $CLOUDFLARE_AUTH_TOKEN" \
  -d '{ "prompt": "agree" }'
```

Replace `$CLOUDFLARE_ACCOUNT_ID` and `$CLOUDFLARE_AUTH_TOKEN` with your actual account ID and token.
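If you prefer JavaScript over `curl`, the same one-time request can be sent from Node.js 18+ with the built-in `fetch`. This is a minimal sketch that assumes the account ID and API token are exported as environment variables:

```
// agree-to-license.mjs — sends the one-time "agree" prompt via the REST API.
// Assumes CLOUDFLARE_ACCOUNT_ID and CLOUDFLARE_AUTH_TOKEN are set in the environment.
const accountId = process.env.CLOUDFLARE_ACCOUNT_ID;
const token = process.env.CLOUDFLARE_AUTH_TOKEN;

const response = await fetch(
  `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/@cf/meta/llama-3.2-11b-vision-instruct`,
  {
    method: "POST",
    headers: { Authorization: `Bearer ${token}` },
    body: JSON.stringify({ prompt: "agree" }),
  },
);

console.log(await response.json());
```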

## 2\. Set up your Cloudflare Worker

1. Create a Worker project. You will create a new Worker project using the `create-cloudflare` CLI (`C3`). This tool simplifies setting up and deploying new applications to Cloudflare. Run the following command in your terminal:

 npm  yarn  pnpm 

```
npm create cloudflare@latest -- llama-vision-tutorial
```

```
yarn create cloudflare llama-vision-tutorial
```

```
pnpm create cloudflare@latest llama-vision-tutorial
```

For setup, select the following options:

* For _What would you like to start with?_, choose `Hello World example`.
* For _Which template would you like to use?_, choose `Worker only`.
* For _Which language do you want to use?_, choose `JavaScript`.
* For _Do you want to use git for version control?_, choose `Yes`.
* For _Do you want to deploy your application?_, choose `No` (we will be making some changes before deploying).

After completing the setup, a new directory called `llama-vision-tutorial` will be created.

2. Navigate to your application directory. Change into the project directory:  
Terminal window  
```  
cd llama-vision-tutorial  
```
3. Project structure. Your `llama-vision-tutorial` directory will include:  
   * A "Hello World" Worker at `src/index.ts`.  
   * A `wrangler.json` configuration file for managing deployment settings.

## 3\. Write the Worker code

Edit the `src/index.ts` (or `index.js` if you are not using TypeScript) file and replace the content with the following code:

TypeScript

```
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    const messages = [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "Describe the image I'm providing." },
    ];

    // Replace this with your image data encoded as base64 or a URL
    const imageBase64 = "data:image/png;base64,IMAGE_DATA_HERE";

    const response = await env.AI.run("@cf/meta/llama-3.2-11b-vision-instruct", {
      messages,
      image: imageBase64,
    });

    return Response.json(response);
  },
} satisfies ExportedHandler<Env>;
```

## 4\. Bind Workers AI to your Worker

1. Open the [Wrangler configuration file](https://developers.cloudflare.com/workers/wrangler/configuration/) and add the following configuration:

wrangler.jsonc

```
{
  "env": {},
  "ai": {
    "binding": "AI"
  }
}
```

wrangler.toml

```
env = { }

[ai]
binding = "AI"
```

2. Save the file.

## 5\. Deploy the Worker

Run the following command to deploy your Worker:

Terminal window

```
wrangler deploy
```

## 6\. Test Your Worker

1. After deployment, you will receive a unique URL for your Worker (e.g., `https://llama-vision-tutorial.<your-subdomain>.workers.dev`).
2. Use a tool like `curl` or Postman to send a request to your Worker:

Terminal window

```
curl -X POST https://llama-vision-tutorial.<your-subdomain>.workers.dev \
  -d '{ "image": "BASE64_ENCODED_IMAGE" }'
```

Replace `BASE64_ENCODED_IMAGE` with an actual base64-encoded image string.
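If you need to produce that string, here is a minimal Node.js sketch (not part of the tutorial) that base64-encodes a local file and posts it to the Worker. The file path and URL are placeholders; note that the Worker code above uses a hardcoded `imageBase64` placeholder, so adapt it to read the posted value from the request body if you want this request to drive the model input.

```
// post-image.mjs — a sketch for producing BASE64_ENCODED_IMAGE and sending it.
import { readFile } from "node:fs/promises";

// Path is a placeholder; point it at any local image file.
const bytes = await readFile("./example.png");
const imageBase64 = bytes.toString("base64");

const response = await fetch(
  "https://llama-vision-tutorial.<your-subdomain>.workers.dev",
  {
    method: "POST",
    body: JSON.stringify({ image: imageBase64 }),
  },
);

console.log(await response.json());
```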

## 7\. Verify the response

The response will include the output from the model, such as a description or answer to your prompt based on the image provided.

Example response:

```
{
  "result": "This is a golden retriever sitting in a grassy park."
}
```


---

---
title: Using BigQuery with Workers AI
description: Learn how to ingest data stored outside of Cloudflare as an input to Workers AI models.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Using BigQuery with Workers AI

**Last reviewed:**  over 1 year ago 

The easiest way to get started with [Workers AI](https://developers.cloudflare.com/workers-ai/) is to try it out in the [Multi-modal Playground ↗](https://multi-modal.ai.cloudflare.com/) and the [LLM playground ↗](https://playground.ai.cloudflare.com/). If you decide that you want to integrate your code with Workers AI, you may then decide to use its [REST API endpoints](https://developers.cloudflare.com/workers-ai/get-started/rest-api/) or a [Worker binding](https://developers.cloudflare.com/workers-ai/configuration/bindings/).

But what about the data? What if you want these models to ingest data that is stored outside Cloudflare?

In this tutorial, you will learn how to bring data from Google BigQuery to a Cloudflare Worker so that it can be used as input for Workers AI models.

## Prerequisites

You will need:

* A [Cloudflare Worker](https://developers.cloudflare.com/workers/) project running a [Hello World script](https://developers.cloudflare.com/workers/get-started/guide/).
* A Google Cloud Platform [service account ↗](https://cloud.google.com/iam/docs/service-accounts-create#iam-service-accounts-create-console) with an [associated key ↗](https://cloud.google.com/iam/docs/keys-create-delete#iam-service-account-keys-create-console) file downloaded that has read access to BigQuery.
* Access to a BigQuery table with some test data that allows you to create a [BigQuery Job Query ↗](https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query). For this tutorial, it is recommended that you create your own table, as [sampled tables ↗](https://cloud.google.com/bigquery/public-data#sample%5Ftables), unless cloned to your own GCP namespace, will not allow you to run job queries against them. For this example, the [Hacker News Corpus ↗](https://www.kaggle.com/datasets/hacker-news/hacker-news-corpus) was used under its MIT licence.

## 1\. Set up your Cloudflare Worker

To ingest the data into Cloudflare and feed it into Workers AI, you will be using a [Cloudflare Worker](https://developers.cloudflare.com/workers/). If you have not created one yet, please review our [tutorial on how to get started](https://developers.cloudflare.com/workers/get-started/).

After following the steps to create a Worker, you should have the following code in your new Worker project:

JavaScript

```
export default {
  async fetch(request, env, ctx) {
    return new Response("Hello World!");
  },
};
```

If the Worker project has successfully been created, you should also be able to run `npx wrangler dev` in a console to run the Worker locally:

Terminal window

```
[wrangler:inf] Ready on http://localhost:8787
```

Open a browser tab at `http://localhost:8787/` to see your deployed Worker. Note that the port `8787` may differ in your case.

You should see `Hello World!` in your browser:

Terminal window

```
Hello World!
```

If you run into any issues during this step, please review the [Worker's Get Started Guide](https://developers.cloudflare.com/workers/get-started/guide/).

## 2\. Import GCP Service key into the Worker as Secrets

Now that you have verified that the Worker has been created successfully, you will need to reference the Google Cloud Platform service key created in the [Prerequisites](#prerequisites) section of this tutorial.

Your downloaded key JSON file from Google Cloud Platform should have the following format:

```
{
  "type": "service_account",
  "project_id": "<your_project_id>",
  "private_key_id": "<your_private_key_id>",
  "private_key": "<your_private_key>",
  "client_email": "<your_service_account_id>@<your_project_id>.iam.gserviceaccount.com",
  "client_id": "<your_oauth2_client_id>",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/<your_service_account_id>%40<your_project_id>.iam.gserviceaccount.com",
  "universe_domain": "googleapis.com"
}
```

For this tutorial, you will only need the values of the following fields: `client_email`, `private_key`, `private_key_id`, and `project_id`.

Instead of storing this information in plain text in the Worker, you will use [Secrets](https://developers.cloudflare.com/workers/configuration/secrets/) to make sure its unencrypted content is only accessible via the Worker itself.

Import those four values from the JSON file into Secrets, starting with the field from the JSON key file called `client_email`, which we will now call `BQ_CLIENT_EMAIL` (you can use another variable name):

Terminal window

```
npx wrangler secret put BQ_CLIENT_EMAIL
```

You will be asked to enter a secret value, which will be the value of the field `client_email` in the JSON key file.

Note

Do not include any double quotes in the secret that you store, as it will already be interpreted as a string.

If the secret was uploaded successfully, the following message will be displayed:

Terminal window

```
✨ Success! Uploaded secret BQ_CLIENT_EMAIL
```

Now import the secrets for the three remaining fields; `private_key`, `private_key_id`, and `project_id` as `BQ_PRIVATE_KEY`, `BQ_PRIVATE_KEY_ID`, and `BQ_PROJECT_ID` respectively:

Terminal window

```
npx wrangler secret put BQ_PRIVATE_KEY
```

Terminal window

```
npx wrangler secret put BQ_PRIVATE_KEY_ID
```

Terminal window

```
npx wrangler secret put BQ_PROJECT_ID
```

At this point, you have successfully imported four fields from the JSON key file downloaded from Google Cloud Platform into Cloudflare Secrets to be used in a Worker.

[Secrets](https://developers.cloudflare.com/workers/configuration/secrets/) are only made available to Workers once they are deployed. To make them available during development, [create a .dev.vars](https://developers.cloudflare.com/workers/configuration/secrets/#local-development-with-secrets) file to locally store these credentials and reference them as environment variables.

Your `.dev.vars` file should look like the following:

```
BQ_CLIENT_EMAIL="<your_service_account_id>@<your_project_id>.iam.gserviceaccount.com"
BQ_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----<content_of_your_private_key>-----END PRIVATE KEY-----\n"
BQ_PRIVATE_KEY_ID="<your_private_key_id>"
BQ_PROJECT_ID="<your_project_id>"
```

Make sure to include `.dev.vars` in your project `.gitignore` file to prevent your credentials being uploaded to a repository when using version control.

Check the secrets are loaded correctly in `src/index.js` by logging their values into a console output, as follows:

JavaScript

```
export default {
  async fetch(request, env, ctx) {
    console.log("BQ_CLIENT_EMAIL: ", env.BQ_CLIENT_EMAIL);
    console.log("BQ_PRIVATE_KEY: ", env.BQ_PRIVATE_KEY);
    console.log("BQ_PRIVATE_KEY_ID: ", env.BQ_PRIVATE_KEY_ID);
    console.log("BQ_PROJECT_ID: ", env.BQ_PROJECT_ID);
    return new Response("Hello World!");
  },
};
```

Restart the Worker and run `npx wrangler dev`. You should see that the server now mentions the newly added variables:

```
Using vars defined in .dev.vars
Your worker has access to the following bindings:
- Vars:
  - BQ_CLIENT_EMAIL: "(hidden)"
  - BQ_PRIVATE_KEY: "(hidden)"
  - BQ_PRIVATE_KEY_ID: "(hidden)"
  - BQ_PROJECT_ID: "(hidden)"
[wrangler:inf] Ready on http://localhost:8787
```

If you open `http://localhost:8787` in your browser, you should see the values of the variables show up in your console where the `npx wrangler dev` command is running, while still seeing only the `Hello World!` text in the browser window.

You now have access to the GCP credentials from a Worker. Next, you will install a library to help with the creation of the JSON Web Token needed to interact with GCP's API.

## 3\. Install library to handle JWT operations

To interact with BigQuery's REST API, you will need to generate a [JSON Web Token ↗](https://jwt.io/introduction) to authenticate your requests using the credentials that you have loaded into Worker secrets in the previous step.

For this tutorial, you will be using the [jose ↗](https://www.npmjs.com/package/jose?activeTab=readme) library for JWT-related operations. Install it by running the following command in a console:

Terminal window

```
npm i jose
```

To verify that the installation succeeded, you can run `npm list`, which lists all the installed packages, to check if the `jose` dependency has been added:

Terminal window

```
<project_name>@0.0.0
/<path_to_your_project>/<project_name>
├── @cloudflare/vitest-pool-workers@0.4.29
├── jose@5.9.2
├── vitest@1.5.0
└── wrangler@3.75.0
```

## 4\. Generate JSON web token

Now that you have installed the `jose` library, it is time to import it and add a function to your code that generates a signed JSON Web Token (JWT):

JavaScript

```
import * as jose from 'jose';
...
const generateBQJWT = async (env) => {
  const algorithm = "RS256";
  const audience = "https://bigquery.googleapis.com/";
  const expiryAt = new Date().valueOf() / 1000;
  const privateKey = await jose.importPKCS8(env.BQ_PRIVATE_KEY, algorithm);

  // Generate signed JSON Web Token (JWT)
  return new jose.SignJWT()
    .setProtectedHeader({
      typ: 'JWT',
      alg: algorithm,
      kid: env.BQ_PRIVATE_KEY_ID
    })
    .setIssuer(env.BQ_CLIENT_EMAIL)
    .setSubject(env.BQ_CLIENT_EMAIL)
    .setAudience(audience)
    .setExpirationTime(expiryAt)
    .setIssuedAt()
    .sign(privateKey)
}

export default {
  async fetch(request, env, ctx) {
    ...
    // Create JWT to authenticate the BigQuery API call
    let bqJWT;
    try {
      bqJWT = await generateBQJWT(env);
    } catch (e) {
      return new Response('An error has occurred while generating the JWT', { status: 500 })
    }
    ...
  },
};
```

Now that you have created a JWT, it is time to do an API call to BigQuery to fetch some data.

## 5\. Make authenticated requests to Google BigQuery

With the JWT token created in the previous step, issue an API request to BigQuery's API to retrieve data from a table.

You will now query the table that you created in BigQuery earlier in this tutorial. This example uses a sampled version of the [Hacker News Corpus ↗](https://www.kaggle.com/datasets/hacker-news/hacker-news-corpus) that was used under its MIT licence and uploaded to BigQuery.

JavaScript

```
const queryBQ = async (bqJWT, path) => {
  const bqEndpoint = `https://bigquery.googleapis.com${path}`
  // In this example, text is a field in the BigQuery table that is being queried (hn.news_sampled)
  const query = 'SELECT text FROM hn.news_sampled LIMIT 3';
  const response = await fetch(bqEndpoint, {
    method: "POST",
    body: JSON.stringify({
      "query": query
    }),
    headers: {
      Authorization: `Bearer ${bqJWT}`
    }
  })
  return response.json()
}
...
export default {
  async fetch(request, env, ctx) {
    ...
    // Fetch results from BigQuery
    let ticketInfo;
    try {
      ticketInfo = await queryBQ(bqJWT, `/bigquery/v2/projects/${env.BQ_PROJECT_ID}/queries`);
    } catch (e) {
      return new Response('An error has occurred while querying BQ', { status: 500 });
    }
    ...
  },
};
```

Having the raw row data from BigQuery means that you can now format it in a JSON-like style next.

## 6\. Format results from the query

Now that you have retrieved the data from BigQuery, your BigQuery API response should look something like this:

```
{
  ...
  "schema": {
    "fields": [
      {
        "name": "title",
        "type": "STRING",
        "mode": "NULLABLE"
      },
      {
        "name": "text",
        "type": "STRING",
        "mode": "NULLABLE"
      }
    ]
  },
  ...
  "rows": [
    {
      "f": [
        { "v": "<some_value>" },
        { "v": "<some_value>" }
      ]
    },
    {
      "f": [
        { "v": "<some_value>" },
        { "v": "<some_value>" }
      ]
    },
    {
      "f": [
        { "v": "<some_value>" },
        { "v": "<some_value>" }
      ]
    }
  ],
  ...
}
```

This format may be difficult to read and work with when iterating through results. So you will now implement a function that maps the schema into each individual value, and the resulting output will be easier to read, as shown below. Each row corresponds to an object within an array.

JavaScript

```
[
  {
    title: "<some_value>",
    text: "<some_value>",
  },
  {
    title: "<some_value>",
    text: "<some_value>",
  },
  {
    title: "<some_value>",
    text: "<some_value>",
  },
];
```

Create a `formatRows` function that takes a number of rows and fields returned from the BigQuery response body and returns an array of results as objects with named fields.

JavaScript

```
const formatRows = (rowsWithoutFieldNames, fields) => {
  // Index to fieldName
  const fieldsByIndex = new Map();

  // Load all fields by name and have their index in the array result as their key
  fields.forEach((field, index) => {
    fieldsByIndex.set(index, field.name)
  })

  // Iterate through rows
  const rowsWithFieldNames = rowsWithoutFieldNames.map(row => {
    // For each row represented by an array f, iterate through the unnamed values and find their field names in fieldsByIndex
    let newRow = {}
    row.f.forEach((field, index) => {
      const fieldName = fieldsByIndex.get(index);
      if (fieldName) {
        // For every field in a row, add it to newRow
        newRow = ({ ...newRow, [fieldName]: field.v });
      }
    })
    return newRow
  })

  return rowsWithFieldNames
}

export default {
  async fetch(request, env, ctx) {
    ...
    // Transform output format into array of objects with named fields
    let formattedResults;

    if ('rows' in ticketInfo) {
      formattedResults = formatRows(ticketInfo.rows, ticketInfo.schema.fields);
      console.log(formattedResults)
    } else if ('error' in ticketInfo) {
      return new Response(ticketInfo.error.message, { status: 500 })
    }
    ...
  },
};
```

## 7\. Feed data into Workers AI

Now that you have converted the response from the BigQuery API into an array of results, generate some tags and attach an associated sentiment score using an LLM via [Workers AI](https://developers.cloudflare.com/workers-ai/):

JavaScript

```
const generateTags = (data, env) => {
  return env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    prompt: `Create three one-word tags for the following text. return only these three tags separated by a comma. don't return text that is not a category. Lowercase only. ${JSON.stringify(data)}`,
  });
}

const generateSentimentScore = (data, env) => {
  return env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    prompt: `return a float number between 0 and 1 measuring the sentiment of the following text. 0 being negative and 1 positive. return only the number, no text. ${JSON.stringify(data)}`,
  });
}

// Iterates through values, sends them to an AI handler and encapsulates all responses into a single Promise
const getAIGeneratedContent = (data, env, aiHandler) => {
  let results = data?.map(dataPoint => {
    return aiHandler(dataPoint, env)
  })
  return Promise.all(results)
}
...
export default {
  async fetch(request, env, ctx) {
    ...
    // Generate AI summaries and sentiment scores
    let summaries, sentimentScores;
    try {
      summaries = await getAIGeneratedContent(formattedResults, env, generateTags);
      sentimentScores = await getAIGeneratedContent(formattedResults, env, generateSentimentScore)
    } catch {
      return new Response('There was an error while generating the text summaries or sentiment scores')
    }

    // Add AI summaries and sentiment scores to previous results
    formattedResults = formattedResults?.map((formattedResult, i) => {
      if (sentimentScores[i].response && summaries[i].response) {
        return {
          ...formattedResult,
          'sentiment': parseFloat(sentimentScores[i].response).toFixed(2),
          'tags': summaries[i].response.split(',').map((result) => result.trim())
        }
      }
    });
    ...
  },
};
```

Uncomment the following lines from the Wrangler file in your project:

wrangler.jsonc

```
{
  "ai": {
    "binding": "AI"
  }
}
```

wrangler.toml

```
[ai]
binding = "AI"
```

Restart the Worker that is running locally, and after doing so, go to your application endpoint:

Terminal window

```
curl http://localhost:8787
```

You will likely be asked to log in to your Cloudflare account and grant temporary access to Wrangler (the Cloudflare CLI) so it can use your account for Workers AI.

Once you access `http://localhost:8787` you should see an output similar to the following:

Terminal window

```
{
  "data": [
    {
      "text": "You can see a clear spike in submissions right around US Thanksgiving.",
      "sentiment": "0.61",
      "tags": [
        "trends",
        "submissions",
        "thanksgiving"
      ]
    },
    {
      "text": "I didn't test the changes before I published them.  I basically did development on the running server. In fact for about 30 seconds the comments page was broken due to a bug.",
      "sentiment": "0.35",
      "tags": [
        "software",
        "deployment",
        "error"
      ]
    },
    {
      "text": "I second that. As I recall, it's a very enjoyable 700-page brain dump by someone who's really into his subject. The writing has a personal voice; there are lots of asides, dry wit, and typos that suggest restrained editing. The discussion is intelligent and often theoretical (and Bartle is not scared to use mathematical metaphors), but the tone is not academic.",
      "sentiment": "0.86",
      "tags": [
        "review",
        "game",
        "design"
      ]
    }
  ]
}
```

The actual values and fields depend mostly on the query made in Step 5 and the data that is then fed into the LLM.

## Final result

All the code shown in the different steps is combined into the following code in `src/index.js`:

JavaScript

```
import * as jose from "jose";

const generateBQJWT = async (env) => {
  const algorithm = "RS256";
  const audience = "https://bigquery.googleapis.com/";
  const expiryAt = new Date().valueOf() / 1000;
  const privateKey = await jose.importPKCS8(env.BQ_PRIVATE_KEY, algorithm);

  // Generate signed JSON Web Token (JWT)
  return new jose.SignJWT()
    .setProtectedHeader({
      typ: "JWT",
      alg: algorithm,
      kid: env.BQ_PRIVATE_KEY_ID,
    })
    .setIssuer(env.BQ_CLIENT_EMAIL)
    .setSubject(env.BQ_CLIENT_EMAIL)
    .setAudience(audience)
    .setExpirationTime(expiryAt)
    .setIssuedAt()
    .sign(privateKey);
};

const queryBQ = async (bqJWT, path) => {
  const bqEndpoint = `https://bigquery.googleapis.com${path}`;
  const query = "SELECT text FROM hn.news_sampled LIMIT 3";
  const response = await fetch(bqEndpoint, {
    method: "POST",
    body: JSON.stringify({
      query: query,
    }),
    headers: {
      Authorization: `Bearer ${bqJWT}`,
    },
  });
  return response.json();
};

const formatRows = (rowsWithoutFieldNames, fields) => {
  // Index to fieldName
  const fieldsByIndex = new Map();

  fields.forEach((field, index) => {
    fieldsByIndex.set(index, field.name);
  });

  const rowsWithFieldNames = rowsWithoutFieldNames.map((row) => {
    // Map rows into an array of objects with field names
    let newRow = {};
    row.f.forEach((field, index) => {
      const fieldName = fieldsByIndex.get(index);
      if (fieldName) {
        newRow = { ...newRow, [fieldName]: field.v };
      }
    });
    return newRow;
  });

  return rowsWithFieldNames;
};

const generateTags = (data, env) => {
  return env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    prompt: `Create three one-word tags for the following text. return only these three tags separated by a comma. don't return text that is not a category. Lowercase only. ${JSON.stringify(data)}`,
  });
};

const generateSentimentScore = (data, env) => {
  return env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    prompt: `return a float number between 0 and 1 measuring the sentiment of the following text. 0 being negative and 1 positive. return only the number, no text. ${JSON.stringify(data)}`,
  });
};

const getAIGeneratedContent = (data, env, aiHandler) => {
  let results = data?.map((dataPoint) => {
    return aiHandler(dataPoint, env);
  });
  return Promise.all(results);
};

export default {
  async fetch(request, env, ctx) {
    // Create JWT to authenticate the BigQuery API call
    let bqJWT;
    try {
      bqJWT = await generateBQJWT(env);
    } catch (error) {
      console.log(error);
      return new Response("An error has occurred while generating the JWT", {
        status: 500,
      });
    }

    // Fetch results from BigQuery
    let ticketInfo;
    try {
      ticketInfo = await queryBQ(
        bqJWT,
        `/bigquery/v2/projects/${env.BQ_PROJECT_ID}/queries`,
      );
    } catch (error) {
      console.log(error);
      return new Response("An error has occurred while querying BQ", {
        status: 500,
      });
    }

    // Transform output format into array of objects with named fields
    let formattedResults;
    if ("rows" in ticketInfo) {
      formattedResults = formatRows(ticketInfo.rows, ticketInfo.schema.fields);
    } else if ("error" in ticketInfo) {
      return new Response(ticketInfo.error.message, { status: 500 });
    }

    // Generate AI summaries and sentiment scores
    let summaries, sentimentScores;
    try {
      summaries = await getAIGeneratedContent(formattedResults, env, generateTags);
      sentimentScores = await getAIGeneratedContent(formattedResults, env, generateSentimentScore);
    } catch {
      return new Response(
        "There was an error while generating the text summaries or sentiment scores",
      );
    }

    // Add AI summaries and sentiment scores to previous results
    formattedResults = formattedResults?.map((formattedResult, i) => {
      if (sentimentScores[i].response && summaries[i].response) {
        return {
          ...formattedResult,
          sentiment: parseFloat(sentimentScores[i].response).toFixed(2),
          tags: summaries[i].response.split(",").map((result) => result.trim()),
        };
      }
    });

    const response = { data: formattedResults };

    return new Response(JSON.stringify(response), {
      headers: { "Content-Type": "application/json" },
    });
  },
};
```

If you wish to deploy this Worker, you can do so by running `npx wrangler deploy`:

Terminal window

```
Total Upload: <size_of_your_worker> KiB / gzip: <compressed_size_of_your_worker> KiB
Uploaded <name_of_your_worker> (x sec)
Deployed <name_of_your_worker> triggers (x sec)
  https://<your_public_worker_endpoint>
Current Version ID: <worker_script_version_id>
```

This will create a public endpoint that you can use to access the Worker globally. Keep this in mind when working with production data, and make sure to put additional access controls in place.

## Conclusion

In this tutorial, you have learned how to integrate Google BigQuery and Cloudflare Workers by creating a GCP service account key and storing part of it as Worker secrets. You then imported those secrets in the code and, using the `jose` npm library, created a JSON Web Token to authenticate the API query to BigQuery.

Once you obtained the results, you formatted them to pass to generative AI models via Workers AI to generate tags and to perform sentiment analysis on the extracted data.

## Next Steps

If, instead of displaying the results in a browser, your workflow requires fetching and storing data (for example in [R2](https://developers.cloudflare.com/r2/) or [D1](https://developers.cloudflare.com/d1/)) at regular intervals, consider adding a [scheduled handler](https://developers.cloudflare.com/workers/runtime-apis/handlers/scheduled/) to this Worker, as sketched below. This enables you to trigger the Worker with a predefined cadence via a [Cron Trigger](https://developers.cloudflare.com/workers/configuration/cron-triggers/). Consider reviewing the Reference Architecture Diagrams on [Ingesting BigQuery Data into Workers AI](https://developers.cloudflare.com/reference-architecture/diagrams/ai/bigquery-workers-ai/).
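As a sketch of what that could look like (the helper name `ingestAndStore` is hypothetical and stands in for the fetch-and-store logic from this tutorial), a scheduled handler plus a Cron Trigger in the Wrangler configuration is enough to run the pipeline on a cadence:

```
// Hypothetical helper that would run the BigQuery query, call Workers AI,
// and write the enriched rows to R2 or D1.
const ingestAndStore = async (env) => {
  // ... reuse generateBQJWT, queryBQ, formatRows and the AI helpers from this tutorial
};

export default {
  // Invoked by the Cron Trigger, e.g. [triggers] crons = ["0 * * * *"] in wrangler.toml
  async scheduled(controller, env, ctx) {
    ctx.waitUntil(ingestAndStore(env));
  },
};
```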

One use case for ingesting data from other sources, as you did in this tutorial, is building a Retrieval Augmented Generation (RAG) system. If this sounds relevant to you, check out the [Build a Retrieval Augmented Generation (RAG) AI tutorial](https://developers.cloudflare.com/workers-ai/guides/tutorials/build-a-retrieval-augmented-generation-ai/).

To learn more about what other AI models you can use at Cloudflare, please visit the [Workers AI](https://developers.cloudflare.com/workers-ai) section of our docs.


---

---
title: AI Gateway
image: https://developers.cloudflare.com/dev-products-preview.png
---


# AI Gateway


---

---
title: Data usage
description: Cloudflare processes certain customer data in order to provide the Workers AI service, subject to our Privacy Policy and Self-Serve Subscription Agreement or Enterprise Subscription Agreement (as applicable).
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Data usage

Cloudflare processes certain customer data in order to provide the Workers AI service, subject to our [Privacy Policy ↗](https://www.cloudflare.com/privacypolicy/) and [Self-Serve Subscription Agreement ↗](https://www.cloudflare.com/terms/) or [Enterprise Subscription Agreement ↗](https://www.cloudflare.com/enterpriseterms/) (as applicable).

Cloudflare neither creates nor trains the AI models made available on Workers AI. The models constitute Third-Party Services and may be subject to open source or other license terms that apply between you and the model provider. Be sure to review the license terms applicable to each model (if any).

Your inputs (e.g., text prompts, image submissions, audio files, etc.), outputs (e.g., generated text/images, translations, etc.), embeddings, and training data constitute Customer Content.

For Workers AI:

* You own, and are responsible for, all of your Customer Content.
* Cloudflare does not make your Customer Content available to any other Cloudflare customer.
* Cloudflare does not use your Customer Content to (1) train any AI models made available on Workers AI or (2) improve any Cloudflare or third-party services, and would not do so unless we received your explicit consent.
* Your Customer Content for Workers AI may be stored by Cloudflare if you specifically use a storage service (e.g., R2, KV, DO, Vectorize, etc.) in conjunction with Workers AI.


---

---
title: Errors
description: Below is a list of Workers AI errors.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Errors

Below is a list of Workers AI errors.

| **Name**                              | **Internal Code** | **HTTP Code** | **Description**                                                                                                                                      |
| ------------------------------------- | ----------------- | ------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- |
| No such model                         | 5007              | 400           | No such model ${model} or task                                                                                                                       |
| Invalid data                          | 5004              | 400           | Invalid data type for base64 input: ${type}                                                                                                          |
| Finetune missing required files       | 3039              | 400           | Finetune is missing required files (model.safetensors and config.json)                                                                               |
| Incomplete request                    | 3003              | 400           | Request is missing headers or body: {what}                                                                                                           |
| Account not allowed for private model | 5018              | 403           | The account is not allowed to access this model                                                                                                      |
| Model agreement                       | 5016              | 403           | User has not agreed to Llama3.2 model terms                                                                                                          |
| Account blocked                       | 3023              | 403           | Service unavailable for account                                                                                                                      |
| Account not allowed for private model | 3041              | 403           | The account is not allowed to access this model                                                                                                      |
| Deprecated SDK version                | 5019              | 405           | Request trying to use deprecated SDK version                                                                                                         |
| LoRa unsupported                      | 5005              | 405           | The model ${this.model} does not support LoRa inference                                                                                              |
| Invalid model ID                      | 3042              | 404           | The model name is invalid                                                                                                                            |
| Request too large                     | 3006              | 413           | Request is too large                                                                                                                                 |
| Timeout                               | 3007              | 408           | Request timeout                                                                                                                                      |
| Aborted                               | 3008              | 408           | Request was aborted                                                                                                                                  |
| Account limited                       | 3036              | 429           | You have used up your daily free allocation of 10,000 neurons. Please upgrade to Cloudflare's Workers Paid plan if you would like to continue usage. |
| Out of capacity                       | 3040              | 429           | No more data centers to forward the request to                                                                                                       |
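When you call a model through the `AI` binding, failures like these surface as thrown exceptions. Below is a minimal sketch of catching them in a Worker; the exact shape of the thrown error is an assumption, so only its message is used here.

```
export default {
  async fetch(request, env) {
    try {
      const result = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
        prompt: "Hello",
      });
      return Response.json(result);
    } catch (error) {
      // Log the failure; the message describes the error (see the codes above).
      console.log(error);
      return new Response(`AI request failed: ${error.message}`, { status: 500 });
    }
  },
};
```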


---

---
title: Event subscriptions
description: Event subscriptions allow you to receive messages when events occur across your Cloudflare account. Cloudflare products (e.g., KV, Workers AI, Workers) can publish structured events to a queue, which you can then consume with Workers or HTTP pull consumers to build custom workflows, integrations, or logic.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Event subscriptions

[Event subscriptions](https://developers.cloudflare.com/queues/event-subscriptions/) allow you to receive messages when events occur across your Cloudflare account. Cloudflare products (e.g., [KV](https://developers.cloudflare.com/kv/), [Workers AI](https://developers.cloudflare.com/workers-ai/), [Workers](https://developers.cloudflare.com/workers/)) can publish structured events to a [queue](https://developers.cloudflare.com/queues/), which you can then consume with Workers or [HTTP pull consumers](https://developers.cloudflare.com/queues/configuration/pull-consumers/) to build custom workflows, integrations, or logic.

For more information on [Event Subscriptions](https://developers.cloudflare.com/queues/event-subscriptions/), refer to the [management guide](https://developers.cloudflare.com/queues/event-subscriptions/manage-event-subscriptions/).

## Available Workers AI events

#### `batch.queued`

Triggered when a batch request is queued.

**Example:**

```
{
  "type": "cf.workersAi.model.batch.queued",
  "source": {
    "type": "workersAi.model",
    "modelName": "@cf/baai/bge-base-en-v1.5"
  },
  "payload": {
    "requestId": "req-12345678-90ab-cdef-1234-567890abcdef"
  },
  "metadata": {
    "accountId": "f9f79265f388666de8122cfb508d7776",
    "eventSubscriptionId": "1830c4bb612e43c3af7f4cada31fbf3f",
    "eventSchemaVersion": 1,
    "eventTimestamp": "2025-05-01T02:48:57.132Z"
  }
}
```

#### `batch.succeeded`

Triggered when a batch request has completed.

**Example:**

```
{
  "type": "cf.workersAi.model.batch.succeeded",
  "source": {
    "type": "workersAi.model",
    "modelName": "@cf/baai/bge-base-en-v1.5"
  },
  "payload": {
    "requestId": "req-12345678-90ab-cdef-1234-567890abcdef"
  },
  "metadata": {
    "accountId": "f9f79265f388666de8122cfb508d7776",
    "eventSubscriptionId": "1830c4bb612e43c3af7f4cada31fbf3f",
    "eventSchemaVersion": 1,
    "eventTimestamp": "2025-05-01T02:48:57.132Z"
  }
}
```

#### `batch.failed`

Triggered when a batch request has failed.

**Example:**

```
{
  "type": "cf.workersAi.model.batch.failed",
  "source": {
    "type": "workersAi.model",
    "modelName": "@cf/baai/bge-base-en-v1.5"
  },
  "payload": {
    "requestId": "req-12345678-90ab-cdef-1234-567890abcdef",
    "message": "Model execution failed",
    "internalCode": 5001,
    "httpCode": 500
  },
  "metadata": {
    "accountId": "f9f79265f388666de8122cfb508d7776",
    "eventSubscriptionId": "1830c4bb612e43c3af7f4cada31fbf3f",
    "eventSchemaVersion": 1,
    "eventTimestamp": "2025-05-01T02:48:57.132Z"
  }
}
```
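As a sketch of consuming these events (assuming your event subscription publishes to a queue that this Worker is configured as a consumer of), a queue handler can branch on the `type` field:

```
export default {
  async queue(batch, env, ctx) {
    for (const message of batch.messages) {
      const event = message.body;
      switch (event.type) {
        case "cf.workersAi.model.batch.succeeded":
          // e.g. fetch the batch results for event.payload.requestId
          break;
        case "cf.workersAi.model.batch.failed":
          console.log(`Batch ${event.payload.requestId} failed: ${event.payload.message}`);
          break;
        default:
          // cf.workersAi.model.batch.queued and any future event types
          break;
      }
      message.ack();
    }
  },
};
```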


---

---
title: Glossary
description: Review the definitions for terms used across Cloudflare's Workers AI documentation.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Glossary

Review the definitions for terms used across Cloudflare's Workers AI documentation.

| Term                  | Definition                                                                                                                                                                                                                                                                                                                                                                                                                                        |
| --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| AI models             | [An AI model](https://developers.cloudflare.com/workers-ai/models) is a trained system that processes input data to generate predictions, decisions, or outputs based on patterns it has learned.                                                                                                                                                                                                                                                 |
| API Tokens            | [API Tokens](https://developers.cloudflare.com/workers-ai/get-started/rest-api/) are authentication credentials used to securely access and manage Workers AI resources via the REST API.                                                                                                                                                                                                                                                         |
| Cloudflare Dashboard  | [Cloudflare Dashboard](https://developers.cloudflare.com/workers-ai/get-started/dashboard/) is a web-based interface that allows users to manage Workers AI services, including model deployment and monitoring.                                                                                                                                                                                                                                  |
| Context Window        | In generative AI, the context window is the sum of the number of input, reasoning, and completion or response tokens a model supports. You can find the context window limit on each [model page](https://developers.cloudflare.com/workers-ai/models/).                                                                                                                                                                                          |
| D1                    | [D1](https://developers.cloudflare.com/d1/) is Cloudflare's managed, serverless database with SQLite's SQL semantics, built-in disaster recovery, and Worker and HTTP API access.                                                                                                                                                                                                                                                                 |
| Environment Variables | [Environment Variables](https://developers.cloudflare.com/workers-ai/configuration/bindings/) are dynamic values that can be used within Workers to manage configuration settings, including those related to AI integrations.                                                                                                                                                                                                                    |
| Fine-Tuning           | [Fine-Tuning](https://developers.cloudflare.com/workers-ai/fine-tunes/) is a general term for modifying an AI model by continuing to train it with additional data.                                                                                                                                                                                                                                                                               |
| Function Calling      | [Function Calling](https://developers.cloudflare.com/workers-ai/function-calling/) enables people to take Large Language Models (LLMs) and use the model response to execute functions or interact with external APIs.                                                                                                                                                                                                                            |
| Inference             | [Inference](https://developers.cloudflare.com/workers-ai/fine-tunes/public-loras/#running-inference-with-public-loras) refers to the process of using a trained machine learning model to make predictions or generate outputs based on new data.                                                                                                                                                                                                 |
| LoRA Adapters         | [LoRA Adapters](https://developers.cloudflare.com/workers-ai/fine-tunes/loras/) (Low-Rank Adaptation adapters) are used in machine learning to fine-tune models efficiently by adjusting a small number of parameters, allowing for customization of AI models in Workers AI. [Public LoRA Adapters](https://developers.cloudflare.com/workers-ai/fine-tunes/public-loras/) are pre-trained Low-Rank Adaptation adapters available for public use. |
| Maximum Tokens        | In generative AI, the user-defined property max\_tokens defines the maximum number of tokens at which the model should stop responding. This limit cannot exceed the context window.                                                                                                                                                                                                                                                              |
| Model Catalog         | [Model Catalog](https://developers.cloudflare.com/workers-ai/models/) is a curated collection of AI models available within Workers AI, providing developers with a variety of pre-trained models for different tasks.                                                                                                                                                                                                                            |
| Prompt Engineering    | [Prompt Engineering](https://developers.cloudflare.com/workers-ai/guides/prompting/) is the practice of designing and refining input prompts to effectively elicit desired responses from AI models.                                                                                                                                                                                                                                              |
| Prompt Templates      | [Prompt Templates](https://developers.cloudflare.com/workers-ai/guides/prompting/) are predefined structures that guide the input provided to AI models, enhancing consistency and effectiveness in responses.                                                                                                                                                                                                                                    |
| REST API              | [REST API](https://developers.cloudflare.com/workers-ai/get-started/rest-api/) is an application programming interface that allows developers to interact with Workers AI services over HTTP, enabling model management and inference requests.                                                                                                                                                                                                   |
| Serverless GPUs       | [Serverless GPUs](https://developers.cloudflare.com/workers-ai/) are graphics processing units provided by Cloudflare in a serverless environment, enabling scalable and efficient execution of machine learning models without the need for managing underlying hardware.                                                                                                                                                                        |
| Worker Bindings       | [Worker Bindings](https://developers.cloudflare.com/workers-ai/configuration/bindings/) are configurations that connect Workers scripts to external resources, such as AI models, enabling seamless integration and functionality.                                                                                                                                                                                                                |
| Workers AI            | [Workers AI](https://developers.cloudflare.com/workers-ai/) is a Cloudflare service that enables running machine learning models on Cloudflare's global network, utilizing serverless GPUs. It allows developers to integrate AI capabilities into their applications using Workers, Pages, or via the REST API.                                                                                                                                  |
| Workers KV            | [Workers KV](https://developers.cloudflare.com/kv/) is a data store that allows you to store and retrieve data globally.                                                                                                                                                                                                                                                                                                                          |
| Wrangler CLI          | [Wrangler CLI](https://developers.cloudflare.com/workers-ai/get-started/workers-wrangler/) is a command-line tool for building and deploying Cloudflare Workers, facilitating the integration of AI models into applications.                                                                                                                                                                                                                     |


---

---
title: Limits
description: Workers AI is now Generally Available. We've updated our rate limits to reflect this.
image: https://developers.cloudflare.com/dev-products-preview.png
---


# Limits

Workers AI is now Generally Available. We've updated our rate limits to reflect this.

Note that model inferences in local mode using Wrangler will also count towards these limits. Beta models may have lower rate limits while we work on performance and scale.

Custom requirements

If you have custom requirements like private custom models or higher limits, complete the [Custom Requirements Form ↗](https://forms.gle/axnnpGDb6xrmR31T6). Cloudflare will contact you with next steps.

Rate limits are set per task type by default, with some per-model limits defined as follows:

## Rate limits by task type

### [Automatic Speech Recognition](https://developers.cloudflare.com/workers-ai/models/)

* 720 requests per minute

### [Image Classification](https://developers.cloudflare.com/workers-ai/models/)

* 3000 requests per minute

### [Image-to-Text](https://developers.cloudflare.com/workers-ai/models/)

* 720 requests per minute

### [Object Detection](https://developers.cloudflare.com/workers-ai/models/)

* 3000 requests per minute

### [Summarization](https://developers.cloudflare.com/workers-ai/models/)

* 1500 requests per minute

### [Text Classification](https://developers.cloudflare.com/workers-ai/models/)

* 2000 requests per minute

### [Text Embeddings](https://developers.cloudflare.com/workers-ai/models/)

* 3000 requests per minute
* [@cf/baai/bge-large-en-v1.5](https://developers.cloudflare.com/workers-ai/models/bge-large-en-v1.5/) is 1500 requests per minute

### [Text Generation](https://developers.cloudflare.com/workers-ai/models/)

* 300 requests per minute
* [@hf/thebloke/mistral-7b-instruct-v0.1-awq](https://developers.cloudflare.com/workers-ai/models/mistral-7b-instruct-v0.1-awq/) is 400 requests per minute
* [@cf/microsoft/phi-2](https://developers.cloudflare.com/workers-ai/models/phi-2/) is 720 requests per minute
* [@cf/qwen/qwen1.5-0.5b-chat](https://developers.cloudflare.com/workers-ai/models/qwen1.5-0.5b-chat/) is 1500 requests per minute
* [@cf/qwen/qwen1.5-1.8b-chat](https://developers.cloudflare.com/workers-ai/models/qwen1.5-1.8b-chat/) is 720 requests per minute
* [@cf/qwen/qwen1.5-14b-chat-awq](https://developers.cloudflare.com/workers-ai/models/qwen1.5-14b-chat-awq/) is 150 requests per minute
* [@cf/tinyllama/tinyllama-1.1b-chat-v1.0](https://developers.cloudflare.com/workers-ai/models/tinyllama-1.1b-chat-v1.0/) is 720 requests per minute

### [Text-to-Image](https://developers.cloudflare.com/workers-ai/models/)

* 720 requests per minute
* [@cf/runwayml/stable-diffusion-v1-5-img2img](https://developers.cloudflare.com/workers-ai/models/stable-diffusion-v1-5-img2img/) is 1500 requests per minute

### [Translation](https://developers.cloudflare.com/workers-ai/models/)

* 720 requests per minute


---

---
title: Pricing
description: Workers AI is included in both the Free and Paid Workers plans and is priced at $0.011 per 1,000 Neurons.
image: https://developers.cloudflare.com/dev-products-preview.png
---

# Pricing

Note

Workers AI pricing is now more granular: prices are presented per model in unit-based terms (tokens, images, audio minutes, and so on), but billing is still calculated in Neurons on the back end.

Workers AI is included in both the [Free and Paid Workers plans](https://developers.cloudflare.com/workers/platform/pricing/) and is priced at **$0.011 per 1,000 Neurons**.

Our free allocation allows anyone to use a total of **10,000 Neurons per day at no charge**. To use more than 10,000 Neurons per day, you need to sign up for the [Workers Paid plan](https://developers.cloudflare.com/workers/platform/pricing/#workers). On Workers Paid, you will be charged at $0.011 / 1,000 Neurons for any usage above the free allocation of 10,000 Neurons per day.

You can monitor your Neuron usage in the [Cloudflare Workers AI dashboard ↗](https://dash.cloudflare.com/?to=/:account/ai/workers-ai).

All limits reset daily at 00:00 UTC. If you exceed any one of the above limits, further operations will fail with an error.

| Plan         | Free allocation        | Pricing beyond free allocation |
| ------------ | ---------------------- | ------------------------------ |
| Workers Free | 10,000 Neurons per day | N/A - Upgrade to Workers Paid  |
| Workers Paid | 10,000 Neurons per day | $0.011 / 1,000 Neurons         |

## What are Neurons?

Neurons are our way of measuring AI outputs across different models, representing the GPU compute needed to perform your request. Our serverless model allows you to pay only for what you use without having to worry about renting, managing, or scaling GPUs.

Note

The Price in Tokens column is equivalent to the Price in Neurons column - the different units are displayed so you can easily compare and understand pricing.

## LLM model pricing

| Model                                        | Price in Tokens                                                                         | Price in Neurons                                                                                             |
| -------------------------------------------- | --------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------ |
| @cf/meta/llama-3.2-1b-instruct               | $0.027 per M input tokens  $0.201 per M output tokens                                   | 2457 neurons per M input tokens  18252 neurons per M output tokens                                           |
| @cf/meta/llama-3.2-3b-instruct               | $0.051 per M input tokens  $0.335 per M output tokens                                   | 4625 neurons per M input tokens  30475 neurons per M output tokens                                           |
| @cf/meta/llama-3.1-8b-instruct-fp8-fast      | $0.045 per M input tokens  $0.384 per M output tokens                                   | 4119 neurons per M input tokens  34868 neurons per M output tokens                                           |
| @cf/meta/llama-3.2-11b-vision-instruct       | $0.049 per M input tokens  $0.676 per M output tokens                                   | 4410 neurons per M input tokens  61493 neurons per M output tokens                                           |
| @cf/meta/llama-3.1-70b-instruct-fp8-fast     | $0.293 per M input tokens  $2.253 per M output tokens                                   | 26668 neurons per M input tokens  204805 neurons per M output tokens                                         |
| @cf/meta/llama-3.3-70b-instruct-fp8-fast     | $0.293 per M input tokens  $2.253 per M output tokens                                   | 26668 neurons per M input tokens  204805 neurons per M output tokens                                         |
| @cf/deepseek-ai/deepseek-r1-distill-qwen-32b | $0.497 per M input tokens  $4.881 per M output tokens                                   | 45170 neurons per M input tokens  443756 neurons per M output tokens                                         |
| @cf/mistral/mistral-7b-instruct-v0.1         | $0.110 per M input tokens  $0.190 per M output tokens                                   | 10000 neurons per M input tokens  17300 neurons per M output tokens                                          |
| @cf/mistralai/mistral-small-3.1-24b-instruct | $0.351 per M input tokens  $0.555 per M output tokens                                   | 31876 neurons per M input tokens  50488 neurons per M output tokens                                          |
| @cf/meta/llama-3.1-8b-instruct               | $0.282 per M input tokens  $0.827 per M output tokens                                   | 25608 neurons per M input tokens  75147 neurons per M output tokens                                          |
| @cf/meta/llama-3.1-8b-instruct-fp8           | $0.152 per M input tokens  $0.287 per M output tokens                                   | 13778 neurons per M input tokens  26128 neurons per M output tokens                                          |
| @cf/meta/llama-3.1-8b-instruct-awq           | $0.123 per M input tokens  $0.266 per M output tokens                                   | 11161 neurons per M input tokens  24215 neurons per M output tokens                                          |
| @cf/meta/llama-3-8b-instruct                 | $0.282 per M input tokens  $0.827 per M output tokens                                   | 25608 neurons per M input tokens  75147 neurons per M output tokens                                          |
| @cf/meta/llama-3-8b-instruct-awq             | $0.123 per M input tokens  $0.266 per M output tokens                                   | 11161 neurons per M input tokens  24215 neurons per M output tokens                                          |
| @cf/meta/llama-2-7b-chat-fp16                | $0.556 per M input tokens  $6.667 per M output tokens                                   | 50505 neurons per M input tokens  606061 neurons per M output tokens                                         |
| @cf/meta/llama-guard-3-8b                    | $0.484 per M input tokens  $0.030 per M output tokens                                   | 44003 neurons per M input tokens  2730 neurons per M output tokens                                           |
| @cf/meta/llama-4-scout-17b-16e-instruct      | $0.270 per M input tokens  $0.850 per M output tokens                                   | 24545 neurons per M input tokens  77273 neurons per M output tokens                                          |
| @cf/google/gemma-3-12b-it                    | $0.345 per M input tokens  $0.556 per M output tokens                                   | 31371 neurons per M input tokens  50560 neurons per M output tokens                                          |
| @cf/qwen/qwq-32b                             | $0.660 per M input tokens  $1.000 per M output tokens                                   | 60000 neurons per M input tokens  90909 neurons per M output tokens                                          |
| @cf/qwen/qwen2.5-coder-32b-instruct          | $0.660 per M input tokens  $1.000 per M output tokens                                   | 60000 neurons per M input tokens  90909 neurons per M output tokens                                          |
| @cf/qwen/qwen3-30b-a3b-fp8                   | $0.051 per M input tokens  $0.335 per M output tokens                                   | 4625 neurons per M input tokens  30475 neurons per M output tokens                                           |
| @cf/openai/gpt-oss-120b                      | $0.350 per M input tokens  $0.750 per M output tokens                                   | 31818 neurons per M input tokens  68182 neurons per M output tokens                                          |
| @cf/openai/gpt-oss-20b                       | $0.200 per M input tokens  $0.300 per M output tokens                                   | 18182 neurons per M input tokens  27273 neurons per M output tokens                                          |
| @cf/aisingapore/gemma-sea-lion-v4-27b-it     | $0.351 per M input tokens  $0.555 per M output tokens                                   | 31876 neurons per M input tokens  50488 neurons per M output tokens                                          |
| @cf/ibm-granite/granite-4.0-h-micro          | $0.017 per M input tokens  $0.112 per M output tokens                                   | 1542 neurons per M input tokens  10158 neurons per M output tokens                                           |
| @cf/zai-org/glm-4.7-flash                    | $0.060 per M input tokens  $0.400 per M output tokens                                   | 5500 neurons per M input tokens  36400 neurons per M output tokens                                           |
| @cf/nvidia/nemotron-3-120b-a12b              | $0.500 per M input tokens  $1.500 per M output tokens                                   | 45455 neurons per M input tokens  136364 neurons per M output tokens                                         |
| @cf/moonshotai/kimi-k2.5                     | $0.600 per M input tokens  $0.100 per M cached input tokens  $3.000 per M output tokens | 54545 neurons per M input tokens  9091 neurons per M cached input tokens  272727 neurons per M output tokens |
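
As a rough illustration of how the per-million-token rates above translate into Neurons and dollars, here is a minimal back-of-the-envelope sketch. It uses the listed rates for `@cf/meta/llama-3.1-8b-instruct-fp8-fast`; the token counts are illustrative and actual billing is calculated by Cloudflare in Neurons.

```ts
// Rough cost estimate for one request to @cf/meta/llama-3.1-8b-instruct-fp8-fast,
// using the per-million-token rates from the table above.
const INPUT_PRICE_PER_M = 0.045;        // USD per 1M input tokens
const OUTPUT_PRICE_PER_M = 0.384;       // USD per 1M output tokens
const NEURONS_PER_USD = 1_000 / 0.011;  // $0.011 per 1,000 Neurons

function estimateCost(inputTokens: number, outputTokens: number) {
  const usd =
    (inputTokens / 1_000_000) * INPUT_PRICE_PER_M +
    (outputTokens / 1_000_000) * OUTPUT_PRICE_PER_M;
  return { usd, neurons: usd * NEURONS_PER_USD };
}

// Example: a request with 1,500 input tokens and 300 output tokens costs
// roughly $0.00018, or about 17 Neurons -- well within the 10,000-Neuron
// daily free allocation.
console.log(estimateCost(1_500, 300));
```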

## Embeddings model pricing

| Model                         | Price in Tokens           | Price in Neurons                 |
| ----------------------------- | ------------------------- | -------------------------------- |
| @cf/baai/bge-small-en-v1.5    | $0.020 per M input tokens | 1841 neurons per M input tokens  |
| @cf/baai/bge-base-en-v1.5     | $0.067 per M input tokens | 6058 neurons per M input tokens  |
| @cf/baai/bge-large-en-v1.5    | $0.204 per M input tokens | 18582 neurons per M input tokens |
| @cf/baai/bge-m3               | $0.012 per M input tokens | 1075 neurons per M input tokens  |
| @cf/pfnet/plamo-embedding-1b  | $0.019 per M input tokens | 1689 neurons per M input tokens  |
| @cf/qwen/qwen3-embedding-0.6b | $0.012 per M input tokens | 1075 neurons per M input tokens  |

## Image model pricing

| Model                                 | Price in Tokens                                                                       | Price in Neurons                                                                                              |
| ------------------------------------- | ------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------- |
| @cf/black-forest-labs/flux-1-schnell  | $0.0000528 per 512x512 tile  $0.0001056 per step                                      | 4.80 neurons per 512x512 tile  9.60 neurons per step                                                          |
| @cf/leonardo/lucid-origin             | $0.006996 per 512x512 tile  $0.000132 per step                                        | 636.00 neurons per 512x512 tile  12.00 neurons per step                                                       |
| @cf/leonardo/phoenix-1.0              | $0.005830 per 512x512 tile  $0.000110 per step                                        | 530.00 neurons per 512x512 tile  10.00 neurons per step                                                       |
| @cf/black-forest-labs/flux-2-dev      | $0.00021 per input 512x512 tile, per step  $0.00041 per output 512x512 tile, per step | 18.75 neurons per input 512x512 tile, per step  37.50 neurons per output 512x512 tile, per step               |
| @cf/black-forest-labs/flux-2-klein-4b | $0.000059 per input 512x512 tile  $0.000287 per output 512x512 tile                   | 5.37 neurons per input 512x512 tile  26.05 neurons per output 512x512 tile                                    |
| @cf/black-forest-labs/flux-2-klein-9b | $0.015 per first MP (1024x1024)  $0.002 per subsequent MP  $0.002 per input image MP  | 1363.64 neurons per first MP (1024x1024)  181.82 neurons per subsequent MP  181.82 neurons per input image MP |

## Audio model pricing

| Model                             | Price in Tokens                    | Price in Neurons                         |
| --------------------------------- | ---------------------------------- | ---------------------------------------- |
| @cf/openai/whisper                | $0.0005 per audio minute           | 41.14 neurons per audio minute           |
| @cf/openai/whisper-large-v3-turbo | $0.0005 per audio minute           | 46.63 neurons per audio minute           |
| @cf/myshell-ai/melotts            | $0.0002 per audio minute           | 18.63 neurons per audio minute           |
| @cf/deepgram/aura-1               | $0.015 per 1k characters input     | 1,363.64 neurons per 1k characters input |
| @cf/deepgram/nova-3               | $0.0052 per audio minute input     | 472.73 neurons per audio minute input    |
| @cf/deepgram/nova-3 (WebSocket)   | $0.0092 per audio minute input     | 836.36 neurons per audio minute input    |
| @cf/pipecat-ai/smart-turn-v2      | $0.00033795 per audio minute input | 0.51 neurons per audio minute input      |
| @cf/deepgram/aura-2-en            | $0.030 per 1k characters input     | 2727.27 neurons per 1k characters input  |
| @cf/deepgram/aura-2-es            | $0.030 per 1k characters input     | 2727.27 neurons per 1k characters input  |
| @cf/deepgram/flux (WebSocket)     | $0.0077 per audio minute           | 700.00 neurons per audio minute          |

## Other model pricing

| Model                                 | Price in Tokens                                       | Price in Neurons                                                    |
| ------------------------------------- | ----------------------------------------------------- | ------------------------------------------------------------------- |
| @cf/huggingface/distilbert-sst-2-int8 | $0.026 per M input tokens                             | 2394 neurons per M input tokens                                     |
| @cf/baai/bge-reranker-base            | $0.003 per M input tokens                             | 283 neurons per M input tokens                                      |
| @cf/meta/m2m100-1.2b                  | $0.342 per M input tokens  $0.342 per M output tokens | 31050 neurons per M input tokens  31050 neurons per M output tokens |
| @cf/microsoft/resnet-50               | $2.51 per M images                                    | 228055 neurons per M images                                         |
| @cf/ai4bharat/indictrans2-en-indic-1B | $0.342 per M input tokens  $0.342 per M output tokens | 31050 neurons per M input tokens  31050 neurons per M output tokens |


---

---
title: Choose a data or storage product
description: Storage and database options available on Cloudflare's developer platform.
image: https://developers.cloudflare.com/dev-products-preview.png
---

# Choose a data or storage product

This guide describes the storage & database products available as part of Cloudflare Workers, including recommended use-cases and best practices.

## Choose a storage product

The following table maps our storage & database products to common industry terms as well as recommended use-cases:

| Use-case                                  | Product                                                                           | Ideal for                                                                                                                                                     |
| ----------------------------------------- | --------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Key-value storage                         | [Workers KV](https://developers.cloudflare.com/kv/)                               | Configuration data, service routing metadata, personalization (A/B testing)                                                                                   |
| Object storage / blob storage             | [R2](https://developers.cloudflare.com/r2/)                                       | User-facing web assets, images, machine learning and training datasets, analytics datasets, log and event data.                                               |
| Accelerate a Postgres or MySQL database   | [Hyperdrive](https://developers.cloudflare.com/hyperdrive/)                       | Connecting to an existing database in a cloud or on-premise using your existing database drivers & ORMs.                                                      |
| Global coordination & stateful serverless | [Durable Objects](https://developers.cloudflare.com/durable-objects/)             | Building collaborative applications; global coordination across clients; real-time WebSocket applications; strongly consistent, transactional storage.        |
| Lightweight SQL database                  | [D1](https://developers.cloudflare.com/d1/)                                       | Relational data, including user profiles, product listings and orders, and/or customer data.                                                                  |
| Task processing, batching and messaging   | [Queues](https://developers.cloudflare.com/queues/)                               | Background job processing (emails, notifications, APIs), message queuing, and deferred tasks.                                                                 |
| Vector search & embeddings queries        | [Vectorize](https://developers.cloudflare.com/vectorize/)                         | Storing [embeddings](https://developers.cloudflare.com/workers-ai/models/?tasks=Text+Embeddings) from AI models for semantic search and classification tasks. |
| Streaming ingestion                       | [Pipelines](https://developers.cloudflare.com/pipelines/)                         | Streaming data ingestion and processing, including clickstream analytics, telemetry/log data, and structured data for querying                                |
| Time-series metrics                       | [Analytics Engine](https://developers.cloudflare.com/analytics/analytics-engine/) | Write and query high-cardinality time-series data, usage metrics, and service-level telemetry using Workers and/or SQL.                                       |

Applications can build on multiple storage & database products: for example, using Workers KV for session data; R2 for large file storage, media assets and user-uploaded files; and Hyperdrive to connect to a hosted Postgres or MySQL database.

Pages Functions

Storage options can also be used by your front-end application built with Cloudflare Pages. For more information on available storage options for Pages applications, refer to the [Pages Functions bindings documentation](https://developers.cloudflare.com/pages/functions/bindings/).

## SQL database options

There are three options for SQL-based databases available when building applications with Workers.

* **Hyperdrive** if you have an existing Postgres or MySQL database, require large (1TB, 100TB or more) single databases, and/or want to use your existing database tools. You can also connect Hyperdrive to database platforms like [PlanetScale ↗](https://planetscale.com/) or [Neon ↗](https://neon.tech/).
* **D1** for lightweight, serverless applications that are read-heavy, have global users that benefit from D1's [read replication](https://developers.cloudflare.com/d1/best-practices/read-replication/), and do not require you to manage and maintain a traditional RDBMS.
* **Durable Objects** for stateful serverless workloads, per-user or per-customer SQL state, and building distributed systems (D1 and Queues are built on Durable Objects), where Durable Objects' [strict serializability ↗](https://blog.cloudflare.com/durable-objects-easy-fast-correct-choose-three/) enables global ordering of requests and storage operations.

### Session storage

We recommend using [Workers KV](https://developers.cloudflare.com/kv/) for storing session data, credentials (API keys), and/or configuration data. These are typically read at high rates (thousands of RPS or more), are not typically modified (within KV's 1 write RPS per unique key limit), and do not need to be immediately consistent.

Frequently read keys benefit from KV's [internal cache](https://developers.cloudflare.com/kv/concepts/how-kv-works/), and repeated reads to these "hot" keys will typically see latencies in the 500µs to 10ms range.

Authentication frameworks like [OpenAuth ↗](https://openauth.js.org/docs/storage/cloudflare/) use Workers KV as session storage when deployed to Cloudflare, and [Cloudflare Access](https://developers.cloudflare.com/cloudflare-one/access-controls/policies/) uses KV to securely store and distribute user credentials so that they can be validated as close to the user as possible and reduce overall latency.

## Product overviews

### Workers KV

Workers KV is an eventually consistent key-value data store that caches on the Cloudflare global network.

It is ideal for projects that require:

* High volumes of reads and/or repeated reads to the same keys.
* Low-latency global reads (typically within 10ms for hot keys).
* Per-object time-to-live (TTL).
* Distributed configuration and/or session storage.

To get started with KV:

* Read how [KV works](https://developers.cloudflare.com/kv/concepts/how-kv-works/).
* Create a [KV namespace](https://developers.cloudflare.com/kv/concepts/kv-namespaces/).
* Review the [KV Runtime API](https://developers.cloudflare.com/kv/api/).
* Learn about KV [Limits](https://developers.cloudflare.com/kv/platform/limits/).
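
The snippet below is a minimal sketch of reading and writing KV from a Worker for session-style data. The binding name (`SESSIONS`) and the data shape are illustrative.

```ts
// Assumes a KV namespace bound to the Worker as SESSIONS (name is illustrative).
interface Env {
  SESSIONS: KVNamespace;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const sessionId = new URL(request.url).searchParams.get("session") ?? "anonymous";

    // Repeated reads of the same key are served from KV's internal cache.
    let session = await env.SESSIONS.get(sessionId, "json");

    if (!session) {
      session = { createdAt: Date.now() };
      // Per-key TTL: this entry expires automatically after one hour.
      await env.SESSIONS.put(sessionId, JSON.stringify(session), { expirationTtl: 3600 });
    }

    return Response.json(session);
  },
};
```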

### R2

R2 is S3-compatible blob storage that allows developers to store large amounts of unstructured data without egress fees associated with typical cloud storage services.

It is ideal for projects that require:

* Storage for files which are infrequently accessed.
* Large object storage (for example, gigabytes or more per object).
* Strong consistency per object.
* Asset storage for websites (refer to the [caching guide](https://developers.cloudflare.com/r2/buckets/public-buckets/#caching)).

To get started with R2:

* Read the [Get started guide](https://developers.cloudflare.com/r2/get-started/).
* Learn about R2 [Limits](https://developers.cloudflare.com/r2/platform/limits/).
* Review the [R2 Workers API](https://developers.cloudflare.com/r2/api/workers/workers-api-reference/).
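
For orientation, here is a minimal sketch of storing and retrieving an object with the R2 Workers API. The bucket binding (`UPLOADS`) and routing scheme are illustrative.

```ts
// Assumes an R2 bucket bound to the Worker as UPLOADS (name is illustrative).
interface Env {
  UPLOADS: R2Bucket;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const key = new URL(request.url).pathname.slice(1);

    if (request.method === "PUT") {
      // Streams the request body directly into the bucket.
      await env.UPLOADS.put(key, request.body);
      return new Response(`Stored ${key}`, { status: 201 });
    }

    const object = await env.UPLOADS.get(key);
    if (!object) return new Response("Not found", { status: 404 });

    return new Response(object.body, {
      headers: { etag: object.httpEtag },
    });
  },
};
```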

### Durable Objects

Durable Objects provide low-latency coordination and consistent storage for the Workers platform through global uniqueness and a transactional storage API.

* Global Uniqueness guarantees that there will be a single instance of a Durable Object class with a given ID running at once, across the world. Requests for a Durable Object ID are routed by the Workers runtime to the Cloudflare data center that owns the Durable Object.
* The transactional storage API provides strongly consistent key-value storage to the Durable Object. Each Object can only read and modify keys associated with that Object. Execution of a Durable Object is single-threaded, but multiple request events may still be processed out-of-order from how they arrived at the Object.

It is ideal for projects that require:

* Real-time collaboration (such as a chat application or a game server).
* Consistent storage.
* Data locality.

To get started with Durable Objects:

* Read the [introductory blog post ↗](https://blog.cloudflare.com/introducing-workers-durable-objects/).
* Review the [Durable Objects documentation](https://developers.cloudflare.com/durable-objects/).
* Get started with [Durable Objects](https://developers.cloudflare.com/durable-objects/get-started/).
* Learn about Durable Objects [Limits](https://developers.cloudflare.com/durable-objects/platform/limits/).
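
The sketch below shows a Durable Object using the transactional storage API to keep a per-object counter. The class name and binding are illustrative; a front-end Worker would route requests to a specific object instance by ID.

```ts
// Assumes this class is exported from the Worker and bound as COUNTERS
// in the Worker configuration (names are illustrative).
export class Counter {
  state: DurableObjectState;

  constructor(state: DurableObjectState) {
    this.state = state;
  }

  async fetch(_request: Request): Promise<Response> {
    // Storage reads and writes are strongly consistent and scoped to this object.
    const current = (await this.state.storage.get<number>("count")) ?? 0;
    await this.state.storage.put("count", current + 1);
    return new Response(String(current + 1));
  }
}
```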

### D1

[D1](https://developers.cloudflare.com/d1/) is Cloudflare’s native serverless database. With D1, you can create a database by importing data or defining your tables and writing your queries within a Worker or through the API.

D1 is ideal for:

* Persistent, relational storage for user data, account data, and other structured datasets.
* Use-cases that require querying across your data ad-hoc (using SQL).
* Workloads with a high ratio of reads to writes (most web applications).

To get started with D1:

* Read [the documentation](https://developers.cloudflare.com/d1)
* Follow the [Get started guide](https://developers.cloudflare.com/d1/get-started/) to provision your first D1 database.
* Review the [D1 Workers Binding API](https://developers.cloudflare.com/d1/worker-api/).
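
As a quick orientation, here is a minimal sketch of the D1 Workers Binding API. The database binding (`DB`) and the `users` table are illustrative.

```ts
// Assumes a D1 database bound as DB with an existing `users` table
// (binding name and schema are illustrative).
interface Env {
  DB: D1Database;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const email = new URL(request.url).searchParams.get("email");

    // Prepared statements with bound parameters avoid SQL injection.
    const { results } = await env.DB.prepare(
      "SELECT id, name, email FROM users WHERE email = ?"
    )
      .bind(email)
      .all();

    return Response.json(results);
  },
};
```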

Note

If your working data size exceeds 10 GB (the maximum size for a D1 database), consider splitting the database into multiple, smaller D1 databases.

### Queues

Cloudflare Queues allows developers to send and receive messages with guaranteed delivery. It integrates with [Cloudflare Workers](https://developers.cloudflare.com/workers), offers at-least-once delivery and message batching, and does not charge for egress bandwidth.

Queues is ideal for:

* Offloading work from a request to be processed later.
* Sending data from Worker to Worker (inter-Service communication).
* Buffering or batching data before writing to upstream systems, including third-party APIs or [Cloudflare R2](https://developers.cloudflare.com/queues/examples/send-errors-to-r2/).

To get started with Queues:

* [Set up your first queue](https://developers.cloudflare.com/queues/get-started/).
* Learn more [about how Queues works](https://developers.cloudflare.com/queues/reference/how-queues-works/).
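
The sketch below shows the two halves of a typical Queues setup in one Worker: a producer that enqueues work during a request and a consumer that processes messages in batches. The queue binding (`JOBS`) and message shape are illustrative.

```ts
// Assumes a queue bound to the producer Worker as JOBS (name is illustrative).
interface Env {
  JOBS: Queue;
}

export default {
  // Producer: enqueue work during the request and return immediately.
  async fetch(request: Request, env: Env): Promise<Response> {
    await env.JOBS.send({ url: request.url, receivedAt: Date.now() });
    return new Response("Accepted", { status: 202 });
  },

  // Consumer: messages are delivered in batches to this handler.
  async queue(batch: MessageBatch, _env: Env): Promise<void> {
    for (const message of batch.messages) {
      console.log("processing", message.body);
      message.ack();
    }
  },
};
```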

### Hyperdrive

Hyperdrive is a service that accelerates queries you make to MySQL and Postgres databases, making it faster to access your data from across the globe, irrespective of your users’ location.

Hyperdrive allows you to:

* Connect to an existing database from Workers without connection overhead.
* Cache frequent queries across Cloudflare's global network to reduce response times on highly trafficked content.
* Reduce load on your origin database with connection pooling.

To get started with Hyperdrive:

* [Connect Hyperdrive](https://developers.cloudflare.com/hyperdrive/get-started/) to your existing database.
* Learn more [about how Hyperdrive speeds up your database queries](https://developers.cloudflare.com/hyperdrive/concepts/how-hyperdrive-works/).
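
Here is a minimal sketch of querying an existing Postgres database through Hyperdrive with the postgres.js driver. The Hyperdrive binding name (`HYPERDRIVE`) and the `articles` table are illustrative.

```ts
// Assumes a Hyperdrive configuration bound as HYPERDRIVE (name is illustrative).
// The postgres.js driver requires the Worker's Node.js compatibility flag.
import postgres from "postgres";

interface Env {
  HYPERDRIVE: Hyperdrive;
}

export default {
  async fetch(_request: Request, env: Env): Promise<Response> {
    // Hyperdrive exposes a connection string that routes queries through its
    // connection pool and query cache.
    const sql = postgres(env.HYPERDRIVE.connectionString);

    const rows = await sql`SELECT id, title FROM articles LIMIT 10`;
    return Response.json(rows);
  },
};
```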

### Pipelines

Pipelines is a streaming ingestion service that allows you to ingest high volumes of real time data, without managing any infrastructure.

Pipelines allows you to:

* Ingest data at extremely high throughput (tens of thousands of records per second or more)
* Batch and write data directly to object storage, ready for querying
* (Future) Transform and aggregate data during ingestion

To get started with Pipelines:

* [Create a Pipeline](https://developers.cloudflare.com/pipelines/getting-started/) that can batch and write records to R2.
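
The sketch below sends events to a Pipeline from a Worker, assuming the pipeline is bound as `PIPELINE` and exposes the Workers binding's `send()` method, which accepts an array of JSON-serializable records. Binding name and record shape are illustrative.

```ts
// Assumes a pipeline bound as PIPELINE (name is illustrative).
interface Env {
  PIPELINE: Pipeline;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Records are batched by the pipeline and written to object storage.
    await env.PIPELINE.send([
      {
        event: "page_view",
        url: request.url,
        timestamp: new Date().toISOString(),
      },
    ]);
    return new Response("Recorded", { status: 202 });
  },
};
```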

### Analytics Engine

Analytics Engine is Cloudflare's time-series and metrics database. It allows you to write unlimited-cardinality analytics at scale, using a built-in API to write data points from Workers and query that data directly using SQL.

Analytics Engine allows you to:

* Expose custom analytics to your own customers
* Build usage-based billing systems
* Understand the health of your service on a per-customer or per-user basis
* Add instrumentation to frequently called code paths, without impacting performance or overwhelming external analytics systems with events

Cloudflare uses Analytics Engine internally to store and produce per-product metrics for products like D1 and R2 at scale.

To get started with Analytics Engine:

* Learn how to [get started with Analytics Engine](https://developers.cloudflare.com/analytics/analytics-engine/get-started/)
* See [an example of writing time-series data to Analytics Engine](https://developers.cloudflare.com/analytics/analytics-engine/recipes/usage-based-billing-for-your-saas-product/)
* Understand the [SQL API](https://developers.cloudflare.com/analytics/analytics-engine/sql-api/) for reading data from your Analytics Engine datasets
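
Here is a minimal sketch of writing a data point from a Worker. The dataset binding (`METRICS`) and field choices are illustrative.

```ts
// Assumes an Analytics Engine dataset bound as METRICS (name is illustrative).
interface Env {
  METRICS: AnalyticsEngineDataset;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Each data point can carry string blobs, numeric doubles, and an index
    // used for sampling and grouping (for example, a customer ID).
    env.METRICS.writeDataPoint({
      indexes: ["customer-42"],
      blobs: [new URL(request.url).pathname],
      doubles: [1],
    });
    return new Response("ok");
  },
};
```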

### Vectorize

Vectorize is a globally distributed vector database that enables you to build full-stack, AI-powered applications with Cloudflare Workers and [Workers AI](https://developers.cloudflare.com/workers-ai/).

Vectorize allows you to:

* Store embeddings from any vector embeddings model (Bring Your Own embeddings) for semantic search and classification tasks.
* Add context to Large Language Model (LLM) queries by using vector search as part of a [Retrieval Augmented Generation](https://developers.cloudflare.com/workers-ai/guides/tutorials/build-a-retrieval-augmented-generation-ai/) (RAG) workflow.
* [Filter on vector metadata](https://developers.cloudflare.com/vectorize/reference/metadata-filtering/) to reduce the search space and return more relevant results.

To get started with Vectorize:

* [Create your first vector database](https://developers.cloudflare.com/vectorize/get-started/intro/).
* Combine [Workers AI and Vectorize](https://developers.cloudflare.com/vectorize/get-started/embeddings/) to generate, store and query text embeddings.
* Learn more about [how vector databases work](https://developers.cloudflare.com/vectorize/reference/what-is-a-vector-database/).
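
The sketch below combines Workers AI and Vectorize: it generates an embedding for a query string and searches the index for the closest matches. The binding names (`AI`, `INDEX`) and the embedding model choice are illustrative, and it assumes the index was created with matching dimensions.

```ts
// Assumes Workers AI bound as AI and a Vectorize index bound as INDEX
// (names are illustrative).
interface Env {
  AI: Ai;
  INDEX: VectorizeIndex;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const query = new URL(request.url).searchParams.get("q") ?? "serverless GPUs";

    // Generate an embedding for the query text with a Workers AI model.
    const { data } = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [query] });

    // Find the most similar stored vectors.
    const matches = await env.INDEX.query(data[0], { topK: 3 });
    return Response.json(matches);
  },
};
```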

## SQL in Durable Objects vs D1

Cloudflare Workers offers a SQLite-backed serverless database product - [D1](https://developers.cloudflare.com/d1/). How should you compare [SQLite in Durable Objects](https://developers.cloudflare.com/durable-objects/best-practices/access-durable-objects-storage/) and D1?

**D1 is a managed database product.**

D1 fits into a familiar architecture for developers, where application servers communicate with a database over the network. Application servers are typically Workers; however, D1 also supports external, non-Worker access via an [HTTP API ↗](https://developers.cloudflare.com/api/resources/d1/subresources/database/methods/query/), which helps unlock [third-party tooling](https://developers.cloudflare.com/d1/reference/community-projects/#%5Ftop) support for D1.

D1 aims for a "batteries included" feature set, including the above HTTP API, [database schema management](https://developers.cloudflare.com/d1/reference/migrations/#%5Ftop), [data import/export](https://developers.cloudflare.com/d1/best-practices/import-export-data/), and [database query insights](https://developers.cloudflare.com/d1/observability/metrics-analytics/#query-insights).

With D1, your application code and SQL database queries are not colocated which can impact application performance. If performance is a concern with D1, Workers has [Smart Placement](https://developers.cloudflare.com/workers/configuration/placement/#%5Ftop) to dynamically run your Worker in the best location to reduce total Worker request latency, considering everything your Worker talks to, including D1.

**SQLite in Durable Objects is a lower-level compute with storage building block for distributed systems.**

By design, Durable Objects can only be accessed from Workers.

Durable Objects require a bit more effort, but in return, give you more flexibility and control. With Durable Objects, you must implement two pieces of code that run in different places: a front-end Worker which routes incoming requests from the Internet to a unique Durable Object, and the Durable Object itself, which runs on the same machine as the SQLite database. You get to choose what runs where, and it may be that your application benefits from running some application business logic right next to the database.
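
To make the two pieces concrete, here is a minimal sketch under assumed names: a front-end Worker that routes each tenant to its own Durable Object (bound as `TENANT_DO`), and a Durable Object that runs SQL against its embedded SQLite database. The class, binding, and schema are illustrative, and the SQLite storage backend must be enabled for the class.

```ts
import { DurableObject } from "cloudflare:workers";

// Durable Object: application logic runs on the same machine as its SQLite database.
export class Tenant extends DurableObject {
  async fetch(_request: Request): Promise<Response> {
    const sql = this.ctx.storage.sql;
    sql.exec("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, note TEXT)");
    sql.exec("INSERT INTO events (note) VALUES (?)", "visited");
    const rows = sql.exec("SELECT COUNT(*) AS n FROM events").toArray();
    return Response.json(rows);
  }
}

interface Env {
  TENANT_DO: DurableObjectNamespace;
}

export default {
  // Front-end Worker: derive a stable Durable Object ID from the tenant name
  // and forward the request to that object.
  async fetch(request: Request, env: Env): Promise<Response> {
    const tenant = new URL(request.url).searchParams.get("tenant") ?? "default";
    const stub = env.TENANT_DO.get(env.TENANT_DO.idFromName(tenant));
    return stub.fetch(request);
  },
};
```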

With SQLite in Durable Objects, you may also need to build some of your own database tooling that comes out-of-the-box with D1.

SQL query pricing and limits are intended to be identical between D1 ([pricing](https://developers.cloudflare.com/d1/platform/pricing/), [limits](https://developers.cloudflare.com/d1/platform/limits/)) and SQLite in Durable Objects ([pricing](https://developers.cloudflare.com/durable-objects/platform/pricing/#sqlite-storage-backend), [limits](https://developers.cloudflare.com/durable-objects/platform/limits/)).

