Tutorial · Jun 17, 2026 · 30 min read

OpenAI Responses API: Build Production AI Agents with Tools in TypeScript

Master the OpenAI Responses API — the production-ready replacement for the deprecated Assistants API. Build AI agents with web search, code interpreter, custom tools, and streaming in TypeScript and Next.js 15.

The OpenAI Assistants API is shutting down on August 26, 2026. If your application relies on it — threads, runs, polling loops, assistant objects — you need to migrate. The replacement is the Responses API, a cleaner, faster, and more capable interface that collapses the entire Assistants workflow into a single function call.

This tutorial walks you through everything: basic usage, built-in tools (web search and code interpreter), custom function calling, streaming, multi-turn conversation management, and a complete migration guide. By the end, your Next.js 15 app will have a production-ready AI agent backend.

Prerequisites

Before starting, ensure you have:

  • Node.js 20 or newer installed
  • An OpenAI account with an API key (from platform.openai.com/api-keys)
  • A Next.js 15 project (npx create-next-app@latest my-agent --typescript)
  • Familiarity with TypeScript and async/await patterns

What You Will Build

A multilingual AI research assistant in Next.js 15 that:

  • Answers questions grounded in real-time web search
  • Runs Python code on-demand with the code interpreter
  • Calls custom functions to fetch structured data
  • Streams responses token by token to the browser
  • Maintains multi-turn conversation history without resending full context

What Is the Responses API?

The Responses API is OpenAI's unified interface for agentic AI, launched in early 2025 and now the recommended path for all new projects. The mental model is simple:

  • You send input items (messages, tool results)
  • You receive output items (text, function calls, tool results, reasoning traces)
  • The API handles the agentic loop for built-in tools: it runs them and chains their results automatically within a single request. Custom function calls are returned to your code, which executes them and sends the results back

Why is it better than the Assistants API?

The Assistants API required orchestrating four separate objects (Assistants, Threads, Messages, and Runs) plus a polling loop to wait for completion. A simple chatbot needed six API calls. The Responses API collapses all of that into one responses.create() call. OpenAI reports 40–80% better cache utilization than Chat Completions in its internal tests, which lowers cost. You also unlock newer capabilities: deep research, remote MCP servers, and computer use.

Step 1: Install and Configure the SDK

Install the latest OpenAI Node.js SDK:

npm install openai

Add your API key to .env.local:

OPENAI_API_KEY=sk-proj-your-key-here

Create a shared OpenAI client at lib/openai.ts:

import OpenAI from "openai";
 
export const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  maxRetries: 3,
});

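The SDK already retries transient errors (connection failures and 408, 409, 429, and 5xx responses) with exponential backoff, twice by default. Setting maxRetries: 3 allows one more attempt, which is a sensible margin for production.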
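To make the idea of exponential backoff concrete, here is a minimal sketch of how a retry schedule can be computed. It is an illustration only, not the SDK's internal implementation: the function names, the 500 ms base delay, and the 8 s cap are all assumptions, and real clients usually add random jitter on top.

```typescript
// Illustrative only: NOT the OpenAI SDK's internal algorithm.
// backoffDelayMs, retrySchedule, and the default delays are hypothetical.

// Delay before retry number `attempt` (0-based): doubles each time, up to a cap.
function backoffDelayMs(
  attempt: number,
  baseDelayMs = 500,
  maxDelayMs = 8000
): number {
  return Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
}

// The full list of waits a client would perform for a given retry budget.
function retrySchedule(maxRetries: number): number[] {
  return Array.from({ length: maxRetries }, (_, attempt) =>
    backoffDelayMs(attempt)
  );
}
```

With a budget of three retries this yields waits of 500 ms, 1000 ms, and 2000 ms, so a request that keeps failing gives up after roughly 3.5 seconds of waiting.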

Step 2: Your First Response

Create a basic API route at app/api/chat/route.ts:

import { openai } from "@/lib/openai";
import { NextResponse } from "next/server";
 
export async function POST(req: Request) {
  const { message } = await req.json();
 
  const response = await openai.responses.create({
    model: "gpt-4.5",
    instructions: "You are a helpful research assistant.",
    input: message,
  });
 
  return NextResponse.json({ text: response.output_text });
}

Three key parameters:

  • model — the model to use (gpt-4.5, gpt-4.5-mini, o3, etc.)
  • instructions — replaces the system message; sets the agent's persona and behavior
  • input — the user's message (string or array of input items)

response.output_text is a convenience property that concatenates every output_text part in the response. It is equivalent to iterating response.output and joining the text of each message item.
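As a rough sketch of what such a convenience property does, the helper below walks an output array and joins the text parts it finds. The types are simplified local stand-ins that mirror only the fields used here, and collectOutputText is a hypothetical name, not an SDK function.

```typescript
// Simplified local stand-ins for the SDK's output types (assumed shapes).
type ContentPart =
  | { type: "output_text"; text: string }
  | { type: "refusal"; refusal: string };

type OutputItem =
  | { type: "message"; content: ContentPart[] }
  | { type: "function_call"; name: string; arguments: string; call_id: string }
  | { type: "reasoning" };

// Hypothetical helper: join the text of every output_text part,
// skipping non-message items such as function calls and reasoning.
function collectOutputText(output: OutputItem[]): string {
  const chunks: string[] = [];
  for (const item of output) {
    if (item.type !== "message") continue;
    for (const part of item.content) {
      if (part.type === "output_text") chunks.push(part.text);
    }
  }
  return chunks.join("");
}
```

An output that contains only a function call produces an empty string here, which is why the text can be empty while the response itself is perfectly valid.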

Step 3: Input and Output Items

For multi-turn conversations, pass an array of items to input:

const response = await openai.responses.create({
  model: "gpt-4.5",
  input: [
    { role: "user", content: "What is the capital of Tunisia?" },
    { role: "assistant", content: "The capital of Tunisia is Tunis." },
    { role: "user", content: "What is its population?" },
  ],
});

The response.output array contains typed output items. Always check item.type before accessing type-specific fields:

for (const item of response.output) {
  if (item.type === "message") {
    for (const part of item.content) {
      if (part.type === "output_text") {
        console.log(part.text);
      }
    }
  }
}

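Possible output item types include message, function_call, reasoning, web_search_call, and code_interpreter_call. Note that function_call_output is an input item: you send it back to the API with a tool's result.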

Step 4: Built-in Tool — Web Search

Enable real-time web search by adding { type: "web_search_preview" } to the tools array. The model decides autonomously when a search is warranted:

const response = await openai.responses.create({
  model: "gpt-4.5",
  instructions: "You are a research assistant. Always cite your sources.",
  input: "What are the main features of the OpenAI Responses API released in 2026?",
  tools: [{ type: "web_search_preview" }],
});
 
console.log(response.output_text);

Search results include annotations — inline URL citations. Extract them from the response:

for (const item of response.output) {
  if (item.type === "message") {
    for (const part of item.content) {
      if (part.type === "output_text" && part.annotations) {
        for (const ann of part.annotations) {
          if (ann.type === "url_citation") {
            console.log(`Cited: ${ann.title} — ${ann.url}`);
          }
        }
      }
    }
  }
}

This gives you grounded, cited answers without any external search API setup.
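For display in a UI you usually want a deduplicated source list rather than console output. The sketch below is one way to collect it; the types are simplified local stand-ins for the annotation shapes used above, and extractCitations and Citation are hypothetical names.

```typescript
// Simplified local stand-ins (assumed shapes), covering only the fields used.
type Annotation =
  | { type: "url_citation"; url: string; title: string }
  | { type: "file_citation"; file_id: string };

type Part =
  | { type: "output_text"; text: string; annotations?: Annotation[] }
  | { type: "refusal"; refusal: string };

type Item =
  | { type: "message"; content: Part[] }
  | { type: "web_search_call"; id: string };

interface Citation {
  title: string;
  url: string;
}

// Hypothetical helper: collect URL citations, keeping the first title per URL.
function extractCitations(output: Item[]): Citation[] {
  const byUrl = new Map<string, Citation>();
  for (const item of output) {
    if (item.type !== "message") continue;
    for (const part of item.content) {
      if (part.type !== "output_text") continue;
      for (const ann of part.annotations ?? []) {
        if (ann.type === "url_citation" && !byUrl.has(ann.url)) {
          byUrl.set(ann.url, { title: ann.title, url: ann.url });
        }
      }
    }
  }
  // Map preserves insertion order, so sources appear in citation order
  return Array.from(byUrl.values());
}
```

Returning the list alongside the answer text lets the client render a "Sources" section without re-parsing the response.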

Step 5: Built-in Tool — Code Interpreter

The code interpreter runs Python in a sandboxed container. Use it for math, data analysis, file processing, and chart generation:

const response = await openai.responses.create({
  model: "gpt-4.5",
  instructions: "You are a data analyst. Always show your work with code.",
  input: "Calculate compound interest on $10,000 at 7% annually for 10 years. Show each year.",
  tools: [
    {
      type: "code_interpreter",
      container: { type: "auto" },
    },
  ],
});
 
console.log(response.output_text);

The model writes Python, executes it inside the container, and returns the results as part of the response. Files generated (charts, CSVs) are referenced by file_id in the output and can be downloaded via the Files API.
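It helps to have a reference value when checking what the code interpreter returns. The sketch below reproduces the compound-interest prompt locally, assuming interest compounds once per year; compoundSchedule and YearBalance are hypothetical names.

```typescript
interface YearBalance {
  year: number;
  balance: number; // rounded to cents for display
}

// Balance after each year for annual compounding: B_n = P * (1 + r)^n.
function compoundSchedule(
  principal: number,
  annualRate: number,
  years: number
): YearBalance[] {
  const rows: YearBalance[] = [];
  let balance = principal;
  for (let year = 1; year <= years; year++) {
    balance *= 1 + annualRate;
    // Round only the reported value so rounding errors do not accumulate
    rows.push({ year, balance: Math.round(balance * 100) / 100 });
  }
  return rows;
}

const schedule = compoundSchedule(10_000, 0.07, 10);
```

For $10,000 at 7% the balance is $10,700 after the first year and about $19,671.51 after the tenth, which is the figure the model's Python output should match.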

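Note: the Python code executes in OpenAI's sandbox, not in your route. Code interpreter requests can still take a while to finish, so on Vercel prefer the Node.js runtime over the Edge runtime for routes that use it.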

Step 6: Custom Function Tools

Custom functions let the model call your own business logic. You define the schema; the model decides when to invoke it.

Define the tool:

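const getWeatherTool = {
  type: "function" as const,
  name: "get_weather",
  description: "Get current weather conditions for a city.",
  parameters: {
    type: "object",
    properties: {
      city: { type: "string", description: "City name, e.g. Tunis" },
      unit: {
        // Strict mode has no optional fields; null means "not specified"
        type: ["string", "null"],
        enum: ["celsius", "fahrenheit", null],
        description: "Temperature unit, or null to use the default",
      },
    },
    // Strict mode requires every property to be listed here
    required: ["city", "unit"],
    additionalProperties: false,
  },
  strict: true,
};

Setting strict: true enables structured outputs for function arguments, so the arguments the model returns always match your schema. Strict mode requires additionalProperties: false and every property listed in required; an optional field is expressed by allowing null.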
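Even with a schema in place, it is reasonable to validate arguments at the boundary before calling business logic, since the arguments arrive as a JSON string. The following is a minimal sketch under that assumption; parseWeatherArgs, WeatherArgs, and the "celsius" default are hypothetical choices, not part of the SDK.

```typescript
type Unit = "celsius" | "fahrenheit";

interface WeatherArgs {
  city: string;
  unit: Unit;
}

// Hypothetical guard: parse the raw arguments string and apply defaults.
function parseWeatherArgs(rawArguments: string): WeatherArgs {
  const parsed: unknown = JSON.parse(rawArguments);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Arguments must be a JSON object");
  }
  const { city, unit } = parsed as Record<string, unknown>;
  if (typeof city !== "string" || city.trim() === "") {
    throw new Error("city must be a non-empty string");
  }
  // A missing or null unit falls back to the default
  if (unit === undefined || unit === null) {
    return { city, unit: "celsius" };
  }
  if (unit === "celsius" || unit === "fahrenheit") {
    return { city, unit };
  }
  throw new Error("unit must be celsius or fahrenheit");
}
```

Throwing a descriptive error here is useful because the message can be sent back to the model as the tool output, giving it a chance to correct the call.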

Run the agentic loop:

import type OpenAI from "openai";
import { openai } from "@/lib/openai";

const MAX_TURNS = 10; // guard against runaway tool-call loops

async function runAgent(userMessage: string): Promise<string> {
  // Accumulate the full history so earlier tool calls are not dropped on later turns
  const input: OpenAI.Responses.ResponseInputItem[] = [
    { role: "user", content: userMessage },
  ];

  for (let turn = 0; turn < MAX_TURNS; turn++) {
    const response = await openai.responses.create({
      model: "gpt-4.5",
      input,
      tools: [getWeatherTool],
    });

    // A single response may request more than one function call
    const functionCalls = response.output.filter(
      (item): item is OpenAI.Responses.ResponseFunctionToolCall =>
        item.type === "function_call"
    );

    // No tool calls requested: the model has produced its final answer
    if (functionCalls.length === 0) {
      return response.output_text;
    }

    // Keep the model's own output items (including the calls) in the history
    input.push(...response.output);

    for (const call of functionCalls) {
      const args = JSON.parse(call.arguments) as {
        city: string;
        unit?: string | null;
      };
      const weather = await fetchWeather(args.city, args.unit ?? "celsius");

      // Each result is matched to its call through call_id
      input.push({
        type: "function_call_output",
        call_id: call.call_id,
        output: JSON.stringify(weather),
      });
    }
  }

  throw new Error(`Agent exceeded ${MAX_TURNS} turns`);
}

async function fetchWeather(city: string, unit: string) {
  // Call your real weather API here
  return { city, temperature: 24, unit, condition: "Sunny" };
}

The loop continues until the model returns a final message without requesting any more tool calls. The MAX_TURNS counter guards against infinite loops in production.
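Once an agent has more than one custom function, a lookup table keyed by tool name keeps the loop body small. The sketch below shows one possible shape; the handler names, the get_time tool, and the local FunctionCall types are hypothetical and only mirror the fields used in this step.

```typescript
// Local stand-ins for the item shapes used in the loop (assumed fields only).
interface FunctionCall {
  type: "function_call";
  name: string;
  arguments: string;
  call_id: string;
}

interface FunctionCallOutput {
  type: "function_call_output";
  call_id: string;
  output: string;
}

type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

// Hypothetical registry: one handler per tool name exposed to the model.
const toolHandlers: Record<string, ToolHandler> = {
  get_weather: async (args) => ({
    city: args.city,
    temperature: 24,
    unit: args.unit ?? "celsius",
    condition: "Sunny",
  }),
  get_time: async (args) => ({ timezone: args.timezone ?? "UTC" }),
};

// Run one call and always produce an output item, even on failure,
// so the model can see the error and recover.
async function executeToolCall(call: FunctionCall): Promise<FunctionCallOutput> {
  let result: unknown;
  try {
    const handler = toolHandlers[call.name];
    if (!handler) throw new Error(`Unknown tool: ${call.name}`);
    result = await handler(JSON.parse(call.arguments) as Record<string, unknown>);
  } catch (error) {
    result = { error: error instanceof Error ? error.message : String(error) };
  }
  return {
    type: "function_call_output",
    call_id: call.call_id,
    output: JSON.stringify(result),
  };
}
```

Reporting failures as tool output rather than throwing is a design choice: it keeps the conversation valid, because every function call in the history still has a matching output.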

Step 7: Streaming Responses

Streaming delivers a far better user experience for longer responses. Use openai.responses.stream():

// app/api/chat/stream/route.ts
import { openai } from "@/lib/openai";
 
export const runtime = "nodejs";
 
export async function POST(req: Request) {
  const { message } = await req.json();
 
  const encoder = new TextEncoder();
 
  const readable = new ReadableStream({
    async start(controller) {
      try {
        const stream = openai.responses.stream({
          model: "gpt-4.5",
          instructions: "You are a helpful assistant.",
          input: message,
          tools: [{ type: "web_search_preview" }],
        });

        for await (const event of stream) {
          // Forward only text deltas; tool and lifecycle events are skipped
          if (
            event.type === "response.output_text.delta" &&
            event.delta
          ) {
            controller.enqueue(encoder.encode(event.delta));
          }
        }

        controller.close();
      } catch (error) {
        // Surface upstream failures instead of leaving the client hanging
        controller.error(error);
      }
    },
  });
 
  return new Response(readable, {
    headers: {
      "Content-Type": "text/plain; charset=utf-8",
      "Cache-Control": "no-cache",
    },
  });
}

Read the stream on the client:

"use client";
import { useState } from "react";
 
export function ChatBox() {
  const [output, setOutput] = useState("");
 
  async function ask(message: string) {
    setOutput("");
    const res = await fetch("/api/chat/stream", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ message }),
    });
 
    const reader = res.body!.getReader();
    const decoder = new TextDecoder();
 
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // stream: true keeps multi-byte characters intact when split across chunks
      setOutput((prev) => prev + decoder.decode(value, { stream: true }));
    }
  }
 
  return (
    <div className="p-4">
      <button
        className="px-4 py-2 bg-blue-600 text-white rounded"
        onClick={() => ask("What happened in AI this week?")}
      >
        Ask
      </button>
      <p className="mt-4 whitespace-pre-wrap">{output}</p>
    </div>
  );
}

Key streaming event types to handle:

  • response.output_text.delta — incremental text chunk
  • response.output_item.added — a new output item started
  • response.completed — the full response is done
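One way to handle these three event types together is a small reducer that folds each event into UI state. This is a sketch: the event shapes are trimmed to the fields used here, and StreamState, initialState, and applyStreamEvent are hypothetical names.

```typescript
// Trimmed event shapes (assumption: real events carry more fields).
type StreamEvent =
  | { type: "response.output_text.delta"; delta: string }
  | { type: "response.output_item.added"; item: { type: string } }
  | { type: "response.completed" };

interface StreamState {
  text: string; // accumulated answer text
  itemTypes: string[]; // output items seen so far, e.g. web_search_call
  done: boolean;
}

const initialState: StreamState = { text: "", itemTypes: [], done: false };

// Pure reducer: returns a new state for each incoming event.
function applyStreamEvent(state: StreamState, event: StreamEvent): StreamState {
  switch (event.type) {
    case "response.output_text.delta":
      return { ...state, text: state.text + event.delta };
    case "response.output_item.added":
      return { ...state, itemTypes: [...state.itemTypes, event.item.type] };
    case "response.completed":
      return { ...state, done: true };
    default:
      // Ignore event types this UI does not care about
      return state;
  }
}
```

Tracking itemTypes lets the UI show a "searching the web" indicator as soon as a web_search_call item appears, before any text arrives.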

Step 8: Multi-turn Conversations with previous_response_id

Instead of resending the full conversation history on every turn, use previous_response_id. OpenAI stores each response server-side (store defaults to true) and rebuilds the prior context from the ID.

// app/api/research/route.ts
import { openai } from "@/lib/openai";
import { NextResponse } from "next/server";
 
export async function POST(req: Request) {
  const { message, previousResponseId } = await req.json();
 
  const response = await openai.responses.create({
    model: "gpt-4.5",
    input: message,
    previous_response_id: previousResponseId ?? undefined,
    tools: [{ type: "web_search_preview" }],
  });
 
  return NextResponse.json({
    text: response.output_text,
    responseId: response.id,
  });
}

Your client stores responseId from each response and passes it back as previousResponseId on the next request. This is simpler than maintaining manual history arrays. The earlier turns in the chain are still billed as input tokens, but prompt caching can discount the repeated prefix to a fraction of the normal input token price.
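On the client, the chaining logic fits in a small wrapper that remembers the last ID. The sketch below takes the transport as a parameter so it can be exercised without a network; ConversationSession, SendFn, and ChatReply are hypothetical names, and the request and reply shapes are assumed to match the route above.

```typescript
interface ChatReply {
  text: string;
  responseId: string;
}

interface ChatRequest {
  message: string;
  previousResponseId?: string | undefined;
}

// The transport is injected; in the browser it would POST to /api/research.
type SendFn = (body: ChatRequest) => Promise<ChatReply>;

class ConversationSession {
  private readonly send: SendFn;
  private lastResponseId: string | undefined = undefined;

  constructor(send: SendFn) {
    this.send = send;
  }

  // Send one turn, chaining it to the previous response when there is one.
  async ask(message: string): Promise<string> {
    const reply = await this.send({
      message,
      previousResponseId: this.lastResponseId,
    });
    this.lastResponseId = reply.responseId;
    return reply.text;
  }

  // Start a fresh conversation on the next call.
  reset(): void {
    this.lastResponseId = undefined;
  }
}
```

Because JSON.stringify omits undefined properties, the first request carries no previousResponseId at all, which matches what the route expects.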

For durable, named conversation threads (equivalent to the old Threads API), use the Conversations API and pass the conversation's ID in the conversation parameter of your request. The Conversations API creates a persistent, retrievable history that survives across sessions.

Step 9: Complete Production Route

A full Next.js 15 API route combining all the patterns above:

// app/api/agent/route.ts
import { openai } from "@/lib/openai";
import { NextResponse } from "next/server";
import OpenAI from "openai"; // value import: OpenAI.APIError is used with instanceof below
 
export const runtime = "nodejs";
 
const searchTool: OpenAI.Responses.Tool = { type: "web_search_preview" };
const codeTool: OpenAI.Responses.Tool = {
  type: "code_interpreter",
  container: { type: "auto" },
};
 
export async function POST(req: Request) {
  const {
    message,
    previousResponseId,
    enableSearch = true,
    enableCode = false,
  } = await req.json();
 
  const tools: OpenAI.Responses.Tool[] = [
    ...(enableSearch ? [searchTool] : []),
    ...(enableCode ? [codeTool] : []),
  ];
 
  try {
    const response = await openai.responses.create({
      model: "gpt-4.5",
      instructions:
        "You are an expert research assistant for developers in the MENA region. " +
        "Cite sources when you use web search. Show code when asked to analyze data.",
      input: message,
      tools: tools.length > 0 ? tools : undefined,
      previous_response_id: previousResponseId ?? undefined,
    });
 
    return NextResponse.json({
      text: response.output_text,
      responseId: response.id,
      usage: {
        inputTokens: response.usage?.input_tokens,
        outputTokens: response.usage?.output_tokens,
        cachedTokens: response.usage?.input_tokens_details?.cached_tokens,
      },
    });
  } catch (error) {
    if (error instanceof OpenAI.APIError) {
      return NextResponse.json(
        { error: error.message },
        { status: error.status ?? 500 }
      );
    }
    throw error;
  }
}

Exposing cachedTokens in the response lets you monitor cache hit rates in your logs — a useful signal for tuning previous_response_id usage.
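A cache hit rate can be derived from those fields with a couple of small helpers. This is a sketch that assumes the usage object shape returned by the route above; UsageSnapshot, cacheHitRate, and aggregateCacheHitRate are hypothetical names.

```typescript
// Mirrors the usage object returned by the route above (all fields optional).
interface UsageSnapshot {
  inputTokens?: number;
  outputTokens?: number;
  cachedTokens?: number;
}

// Share of input tokens served from cache for one request (0 to 1).
function cacheHitRate(usage: UsageSnapshot): number {
  const input = usage.inputTokens ?? 0;
  if (input === 0) return 0; // avoid dividing by zero on empty usage
  return (usage.cachedTokens ?? 0) / input;
}

// Token-weighted hit rate across many requests, e.g. one log window.
function aggregateCacheHitRate(samples: UsageSnapshot[]): number {
  let input = 0;
  let cached = 0;
  for (const sample of samples) {
    input += sample.inputTokens ?? 0;
    cached += sample.cachedTokens ?? 0;
  }
  return input === 0 ? 0 : cached / input;
}
```

Weighting by tokens rather than averaging per-request rates keeps one tiny request from skewing the figure.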

Step 10: Migrating from the Assistants API

Here is the complete concept mapping to guide your migration:

Assistants API              | Responses API equivalent
assistants.create()         | instructions parameter on responses.create()
threads.create()            | previous_response_id (lightweight) or Conversations API (persistent)
threads.messages.create()   | input parameter
threads.runs.create()       | responses.create()
Run polling loop            | Not needed; synchronous by default
threads.messages.list()     | response.output array
File search tool            | { type: "file_search", vector_store_ids: [...] }
Code interpreter tool       | { type: "code_interpreter", container: { type: "auto" } }
Function tools              | { type: "function", name, parameters } (fields move to the top level; the parameters JSON Schema is unchanged)

Before (Assistants API — 6 API calls + polling):

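const assistant = await openai.beta.assistants.create({
  name: "Research Bot",
  instructions: "You help with research.",
  // The Assistants API offered code_interpreter, file_search, and function
  // tools; it had no built-in web search
  tools: [{ type: "code_interpreter" }],
  model: "gpt-4-turbo",
});

const thread = await openai.beta.threads.create();
await openai.beta.threads.messages.create(thread.id, {
  role: "user",
  content: userMessage,
});

let run = await openai.beta.threads.runs.create(thread.id, {
  assistant_id: assistant.id,
});

// Poll while the run is still working; stop on any terminal state
while (run.status === "queued" || run.status === "in_progress") {
  await new Promise((r) => setTimeout(r, 1000));
  run = await openai.beta.threads.runs.retrieve(thread.id, run.id);
}
if (run.status !== "completed") {
  throw new Error(`Run ended with status: ${run.status}`);
}

// Messages are listed newest first; content parts must be narrowed by type
const messages = await openai.beta.threads.messages.list(thread.id);
const first = messages.data[0]?.content[0];
const text = first?.type === "text" ? first.text.value : "";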

After (Responses API — 1 API call):

const response = await openai.responses.create({
  model: "gpt-4.5",
  instructions: "You help with research.",
  input: userMessage,
  tools: [{ type: "web_search_preview" }],
});
 
const text = response.output_text;

The migration is mostly mechanical — the biggest conceptual shift is that you now own conversation state (via previous_response_id or your own history array) rather than having OpenAI manage it in Threads.
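One mechanical piece worth automating is the function tool definition. The sketch below assumes the Assistants format nests the definition under a function key while the Responses format puts the same fields at the top level; the interfaces are local stand-ins, toResponsesTool is a hypothetical helper, and the choice to default strict to false (to preserve the old non-strict behavior) is an assumption you may want to revisit.

```typescript
// Local stand-ins for the two tool shapes (only the fields converted here).
interface AssistantsFunctionTool {
  type: "function";
  function: {
    name: string;
    description?: string;
    parameters?: Record<string, unknown>;
    strict?: boolean | null;
  };
}

interface ResponsesFunctionTool {
  type: "function";
  name: string;
  description?: string;
  parameters: Record<string, unknown>;
  strict: boolean;
}

// Hypothetical converter: hoist the nested definition to the top level.
function toResponsesTool(tool: AssistantsFunctionTool): ResponsesFunctionTool {
  const { name, description, parameters, strict } = tool.function;
  return {
    type: "function",
    name,
    // Only include description when the original had one
    ...(description !== undefined ? { description } : {}),
    // A tool without parameters becomes an empty object schema
    parameters: parameters ?? { type: "object", properties: {} },
    // Assumption: keep non-strict behavior unless the tool opted in
    strict: strict ?? false,
  };
}
```

Mapping an existing tools array through this function gives you definitions ready for the tools parameter, leaving the JSON Schema itself untouched.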

Troubleshooting

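ERR_STREAM_WRITE_AFTER_END on Edge runtime: The code interpreter itself executes in OpenAI's container rather than in your route, so this error comes from the route's own stream handling. If it appears on the Edge runtime, add export const runtime = "nodejs" to the affected route.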

Infinite tool call loops: Add a maxTurns guard to your while loop and return an error if exceeded. A reasonable limit is 10 turns.

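Streaming timing out on Vercel: Raise the function timeout for routes that use deep research or the code interpreter by exporting maxDuration from the route file (for example, export const maxDuration = 60). Code interpreter requests can run for tens of seconds, and deep research runs far longer.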

Rate limit (429) errors: The maxRetries: 3 in the OpenAI client handles transient limits automatically. For sustained high volume, request a rate limit increase from OpenAI's platform dashboard.

output_text is empty: This happens when the model's final output is a function call rather than a message. Check response.output for items of type function_call and complete the agentic loop.

Next Steps

  • File search: Upload PDFs to a vector store with openai.vectorStores.create(), then add { type: "file_search", vector_store_ids: ["vs_..."] } to enable retrieval-augmented generation.
  • Deep research: Use a deep research model such as o3-deep-research with background: true and a web search tool for multi-step research reports. These run asynchronously and can take 5–30 minutes.
  • Remote MCP: Connect to any Model Context Protocol server with { type: "mcp", server_label: "...", server_url: "...", headers: { Authorization: "Bearer ..." } }.
  • Computer use: Build browser automation agents with { type: "computer_use_preview" } for tasks that require navigating real interfaces.
  • Add observability: Integrate Langfuse or LangSmith to trace every tool call, measure latency, and run evals on your agent's outputs.

Conclusion

The OpenAI Responses API turns a six-call, polling-heavy Assistants workflow into a single expressive function call. With web search, code interpreter, function tools, streaming, and previous_response_id conversation chaining all available out of the box, you have everything needed to build production-grade AI agents in TypeScript. The August 26, 2026 sunset of the Assistants API makes migration urgent — and the good news is the new API is genuinely better in every dimension: simpler, faster, cheaper, and more capable.