Building an AI Agent Made Easy Thanks to the AI SDK

Part of the series AI Agents and MCP Server: Teaming Up for the Agentic Web

Before going any further, what's an AI Agent? I think it's important to clarify this concept so we have a better understanding of what we're trying to achieve.

An AI Agent is software that uses AI to perform tasks on behalf of a user. This means you can ask it something using natural language, text or voice, and it will understand and perform the task accordingly. This is a huge step toward making our interactions with machines more natural and intuitive.

You don't need to know exactly what you're looking for or what you want to perform. The AI Agent can understand a sequence of actions within a single request, which is new. Until now, AI assistants like Siri or Google Assistant required more specific commands and could only handle one action at a time. For example, "Start a timer and add an item to the list" would previously require two separate commands.

AI Capabilities

To understand this more clearly, let's review what AI can do today. To be precise, we'll focus on LLMs that generate text from text input; this keeps the series readable without losing relevance.

  1. Generates text based on the input provided.
  2. Reasons about the context to increase relevance and coherence.
  3. Calls tools to perform actions in the real world.

Note

Depending on the provider, there may be additional functionalities available, like built-in tools or MCP tools. Ultimately, everything is considered a tool, whether it's a built-in function, an external API via MCP, or a custom tool.

When you interact with an AI, it will use one or more of these capabilities to answer your query. However, there are some important things to note.

Reasoning always happens before the AI responds. That means if the AI reasons, it will then either generate text (capability 1) or call a tool (capability 3).

Also, the latest models can call multiple tools within a single request, and generate text before and between tool calls to update the user on progress.

This means that if the AI chooses to call tools, the user won't receive a final answer right away, because the text-generation capability isn't used. If the user has to prompt the AI again with "Could I have my answer?", the chat experience we want to build isn't viable.

Note

It's important to note that the AI does not run custom tools directly. It asks the client, via a dedicated response, to call the tool on its behalf. That's why defining tools is often just defining a JSON schema. I recommend reading the tool calling flow in the OpenAI documentation.
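
To make this concrete, here's roughly what such a schema looks like in the OpenAI Chat Completions tool format (a sketch mirroring the addition tool we'll build later; field names follow the OpenAI documentation):

ts
// OpenAI-style tool definition: the model never runs this code,
// it only replies with a request asking the client to call it
const additionTool = {
  type: 'function',
  function: {
    name: 'addition',
    description: 'Adds two numbers',
    parameters: {
      type: 'object',
      properties: {
        a: { type: 'number', description: 'The first number' },
        b: { type: 'number', description: 'The second number' },
      },
      required: ['a', 'b'],
    },
  },
}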

Looping on Itself

To solve this problem, we can implement a simple yet powerful feature: a loop.

Yes, a simple while loop makes our chat interface feasible. If the answer is a tool call, we can loop until we get a text response.

With some pseudo-code, this looks like:

js
while (true) {
  response = AI.ask(messages)

  if (response.isText()) {
    break // final text answer reached
  }

  // Call the requested tool, append its result to the messages,
  // and update the user on progress
}

That's it. This simple loop combined with the three AI capabilities we defined earlier is what we call an AI Agent. As long as the answer isn't a final text generation, the loop keeps reasoning and calling tools.

Note

To avoid an infinite loop, define a maximum number of iterations.
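
In TypeScript, the guarded loop could look like this sketch, where ai.ask, runTool, and toolResultMessage are hypothetical helpers standing in for your provider calls:

ts
const MAX_STEPS = 10 // safety limit against infinite loops

for (let step = 0; step < MAX_STEPS; step++) {
  const response = await ai.ask(messages)

  if (response.isText()) {
    break // final text answer reached, stop looping
  }

  // Run the requested tool and feed its output back to the AI
  const output = await runTool(response.toolCall)
  messages.push(toolResultMessage(response.toolCall, output))
}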

Our Agent

I think we're ready to start building our AI Agent.

For this series, we will use the AI SDK. We could have used an SDK from a provider like OpenAI or Anthropic, but the AI SDK's higher level of abstraction makes things easier while avoiding vendor lock-in.

To make sure we're all on the same page, here's a quick overview of the infrastructure we're building.

AI Agent Infrastructure.

For this part, we'll focus on the server-side implementation and the communication with the provider.

To do so, we'll use Nitro, but you can use any JS backend framework of your choice.

Installation

First, create a new Nitro project:

bash
pnpm dlx giget@latest nitro ai-agent --install

Then remove the server/routes/index.ts file, which we won't need:

bash
rm server/routes/index.ts

Starting with the AI SDK

Now that our backend is set up, we can start working with the AI SDK.

Install the SDK:

bash
pnpm add ai @ai-sdk/openai

We also install the OpenAI adapter to communicate with the OpenAI API, but you can use any other adapter you prefer.
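
For example, switching to Anthropic only changes the adapter (a sketch assuming the @ai-sdk/anthropic package and an anthropicApiKey entry in your runtime config):

bash
pnpm add @ai-sdk/anthropic

ts
import { createAnthropic } from '@ai-sdk/anthropic'

const model = createAnthropic({
  apiKey: runtimeConfig.anthropicApiKey,
})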

To communicate with OpenAI, you need an API key, which you can obtain by creating an account on the OpenAI website. Then set the NITRO_OPEN_AI_API_KEY environment variable in your .env file; Nitro automatically maps NITRO_-prefixed variables onto the runtime config we'll define next:

ini
NITRO_OPEN_AI_API_KEY=

Finally, create a runtime variable within the Nitro configuration:

ts
// nitro.config.ts
import { defineNitroConfig } from 'nitropack/config'

export default defineNitroConfig({
  runtimeConfig: {
    openAiApiKey: '',
  },
  // ...
})

Now we're ready to start building our AI Agent.

Streaming Some Text

The first step is to make sure our AI Agent can generate text based on user input. This may be more challenging than it seems, but the AI SDK provides a simple way to achieve it.

Create a new endpoint in the Nitro server that will receive the user query and return the AI-generated response:

bash
mkdir server/api
touch server/api/chat.ts

Then create the endpoint in server/api/chat.ts:

ts
import { createOpenAI } from '@ai-sdk/openai'
import { convertToModelMessages, streamText } from 'ai'
import { defineEventHandler, defineLazyEventHandler, readBody } from 'h3'
import { useRuntimeConfig } from 'nitropack/runtime'

export default defineLazyEventHandler(() => {
  const runtimeConfig = useRuntimeConfig()

  // Create the provider adapter once, on the first request
  const model = createOpenAI({
    apiKey: runtimeConfig.openAiApiKey,
  })

  return defineEventHandler(async (event) => {
    // Messages sent by the client, in the UI message format
    const { messages } = await readBody(event)

    // Stream the AI-generated response back to the client
    return streamText({
      model: model('gpt-5-nano'),
      system: `You are a helpful assistant.`,
      messages: convertToModelMessages(messages),
    }).toUIMessageStreamResponse()
  })
})

There are two important sections in this file.

  1. Creating the OpenAI Model:
ts
const model = createOpenAI({
  apiKey: runtimeConfig.openAiApiKey,
})

This creates the adapter for the OpenAI model. If you wish to use another provider, change the adapter creation code. This is also where we use the openAiApiKey from the runtime config.

  2. Streaming Text Responses:
ts
return streamText({
  model: model('gpt-5-nano'),
  system: `You are a helpful assistant.`,
  messages: convertToModelMessages(messages),
}).toUIMessageStreamResponse()

We use the streamText function from the AI SDK to stream the AI-generated responses back to the client, giving it the model, a system prompt, and the user messages. The toUIMessageStreamResponse function transforms the response into a format suitable for building the frontend. If you want only the text, you could use toTextStreamResponse.
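
As a quick sketch, switching to a plain text stream only changes the terminal call:

ts
// Streams raw text chunks, without the UI message protocol
return streamText({
  model: model('gpt-5-nano'),
  system: `You are a helpful assistant.`,
  messages: convertToModelMessages(messages),
}).toTextStreamResponse()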

All of these components are wrapped in a lazy event handler to ensure the model is only created once, on the first request. This avoids having each request create a new model instance.
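
Before testing the endpoint, start the development server (assuming the default Nitro setup, which listens on port 3000):

bash
pnpm dev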

You can test your AI Agent endpoint using the following curl command:

bash
curl -X POST http://localhost:3000/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "parts": [
          {
            "type": "text",
            "text": "Why is the sky blue?"
          }
        ]
      }
    ]
  }'

This command sends a sample user message to your /api/chat endpoint and streams back the AI-generated response.

The AI Agent streaming text responses.

Isn't it beautiful in its simplicity?

Calling a Tool

Now we can add a tool to our AI Agent. As a reminder, tools are functions that the AI Agent can call to perform specific tasks; the code runs on our server, not on the AI provider's side.

For this example, we'll add a simple addition tool. Before going further, install Zod to create schemas:

bash
pnpm add zod

Now add the addition tool to our AI Agent:

ts
import { convertToModelMessages, streamText, tool } from 'ai'
import { z } from 'zod'

return streamText({
  model: model('gpt-5-nano'),
  system: `You are a helpful assistant.`,
  tools: {
    addition: tool({
      description: 'Adds two numbers',
      inputSchema: z.object({
        a: z.number().describe('The first number'),
        b: z.number().describe('The second number'),
      }),
      execute: ({ a, b }) => ({
        a,
        b,
        result: a + b
      }),
    }),
  },
  messages: convertToModelMessages(messages),
}).toUIMessageStreamResponse()

Finally, update our prompt to ensure the AI will use the new tool:

ts
return streamText({
  model: model('gpt-5-nano'),
  system: `You are a helpful assistant. You can use the tool to add two numbers together.`,
  // ...
}).toUIMessageStreamResponse()

And let's give it a try:

bash
curl -X POST http://localhost:3000/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "parts": [
          {
            "type": "text",
            "text": "What is 2 + 2?"
          }
        ]
      }
    ]
  }'
The AI Agent using a tool to perform a calculation.

Do you notice something in the response?

The AI performs the calculation using the addition tool:

txt
data: {"type":"tool-output-available","toolCallId":"call_Zmi5NtcVDRFZsZHcsULlutnG","output":{"a":12,"b":24,"result":36}}

But it does not provide the final answer directly. It does not say "The answer is 36." If you remember the three capabilities of the AI, this makes sense. The AI used the tool-calling capability, so it cannot provide the final answer directly. We need to call the AI again.

We need to make it an AI Agent!

Making It an Agent

Thanks to the AI SDK, it's trivial to create an AI Agent. We just need to tell streamText the maximum steps it can take to produce a final answer.

ts
import { stepCountIs, streamText } from 'ai'

return streamText({
  model: model('gpt-5-nano'),
  system: `You are a helpful assistant. You can use the tool to add two numbers together.`,
  stopWhen: stepCountIs(2),
  // ...
}).toUIMessageStreamResponse()

The stopWhen: stepCountIs(2) option tells the AI SDK to stop looping once the AI has taken two steps; a step that ends in a plain text response also ends the loop naturally. With that, our AI Agent should perform the tool call and then generate an answer based on the tool's output.
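
stopWhen also accepts an array of conditions. For example, combining a step budget with the hasToolCall condition would stop the loop as soon as a given tool has been called (a sketch based on my reading of the AI SDK docs):

ts
import { hasToolCall, stepCountIs } from 'ai'

return streamText({
  model: model('gpt-5-nano'),
  system: `You are a helpful assistant. You can use the tool to add two numbers together.`,
  // Stop after at most 5 steps, or once the 'addition' tool has been called
  stopWhen: [stepCountIs(5), hasToolCall('addition')],
  // ...
}).toUIMessageStreamResponse()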

Note

I recommend reading the Loop Control page in the documentation to understand what happens under the hood.

Let's give it a try:

bash
curl -X POST http://localhost:3000/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "parts": [
          {
            "type": "text",
            "text": "What is 2 + 2?"
          }
        ]
      }
    ]
  }'
The AI Agent using a tool and generating a response.

Yeah, it works! Our AI is now an Agent capable of using tools and generating responses based on their output. Perfect!


Thanks for reading! My name is Estéban, and I love to write about web development and the human journey around it.

I've been coding for several years now, and I'm still learning new things every day. I enjoy sharing my knowledge with others, as I would have appreciated having access to such clear and complete resources when I first started learning programming.

If you have any questions or want to chat, feel free to comment below or reach out to me on Bluesky, X, and LinkedIn.

I hope you enjoyed this article and learned something new. Please consider sharing it with your friends or on social media, and feel free to leave a comment or a reaction below—it would mean a lot to me! If you'd like to support my work, you can sponsor me on GitHub!

Continue reading: MCP to Provide Additional Capabilities to the AI Agent
