Building AI Applications with Google Gemini API and TypeScript

Google Gemini 2.5 Pro is here. With a 1M token context window, native multimodal capabilities, and built-in function calling, Gemini is one of the most powerful AI APIs available today. In this tutorial, you will learn to harness its full potential using TypeScript.
What You Will Learn
By the end of this tutorial, you will:
- Set up the Google Generative AI SDK in a TypeScript project
- Generate text using Gemini 2.5 Pro and Gemini 2.5 Flash
- Process images, PDFs, and audio with multimodal input
- Implement real-time streaming responses
- Use function calling to connect Gemini to external tools
- Extract structured data with JSON schema output
- Build a practical AI assistant with conversation history
Prerequisites
Before starting, ensure you have:
- Node.js 20+ installed (check with node --version)
- TypeScript knowledge (types, async/await, modules)
- A Google AI Studio API key — get one free at aistudio.google.com
- A code editor — VS Code or Cursor recommended
- Basic understanding of REST APIs and async programming
Step 1: Project Setup
Create a new TypeScript project and install the Google Generative AI SDK:
mkdir gemini-ai-app && cd gemini-ai-app
npm init -y
npm install @google/generative-ai
npm install -D typescript @types/node tsx
Initialize TypeScript:
npx tsc --init --target ES2022 --module NodeNext --moduleResolution NodeNext --outDir dist --rootDir src --strict
Create the project structure:
mkdir src
touch src/index.ts .env
Add your API key to .env:
GEMINI_API_KEY=your_api_key_here
Update package.json scripts:
{
"scripts": {
"dev": "tsx watch src/index.ts",
"build": "tsc",
"start": "node dist/index.js"
}
}
Step 2: Your First Text Generation
Let us start with a simple text generation example. Create src/index.ts:
import { GoogleGenerativeAI } from "@google/generative-ai";
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
async function generateText() {
const model = genAI.getGenerativeModel({
model: "gemini-2.5-pro",
});
const result = await model.generateContent(
"Explain quantum computing in 3 sentences for a software developer."
);
console.log(result.response.text());
}
generateText();
Run it:
GEMINI_API_KEY=your_key npx tsx src/index.ts
You should see a concise explanation of quantum computing. The generateContent method sends a single prompt and returns the complete response. Note that the key is passed inline here because tsx does not load .env automatically.
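The .env file created in Step 1 is not read automatically by Node or tsx. The usual choice is the dotenv package (an extra install), but for illustration, here is a minimal dependency-free loader — a hypothetical helper sketch, not part of the SDK:

```typescript
import * as fs from "fs";

// Minimal .env loader: parses KEY=VALUE lines into process.env.
// A sketch only — dotenv (or Node's --env-file flag) is the usual choice.
function loadEnv(path = ".env"): Record<string, string> {
  const vars: Record<string, string> = {};
  if (!fs.existsSync(path)) return vars;
  for (const line of fs.readFileSync(path, "utf8").split("\n")) {
    const trimmed = line.trim();
    if (!trimmed || trimmed.startsWith("#")) continue; // skip blanks and comments
    const eq = trimmed.indexOf("=");
    if (eq === -1) continue; // skip malformed lines
    const key = trimmed.slice(0, eq).trim();
    const value = trimmed.slice(eq + 1).trim();
    vars[key] = value;
    process.env[key] ??= value; // never clobber real environment variables
  }
  return vars;
}
```

Call loadEnv() at the top of src/index.ts before constructing the client, and the inline GEMINI_API_KEY=... prefix becomes unnecessary.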
Understanding the Response Object
The response contains more than just text:
async function inspectResponse() {
const model = genAI.getGenerativeModel({ model: "gemini-2.5-pro" });
const result = await model.generateContent("Hello, Gemini!");
const response = result.response;
console.log("Text:", response.text());
console.log("Usage:", response.usageMetadata);
// { promptTokenCount: 4, candidatesTokenCount: 12, totalTokenCount: 16 }
}
The usageMetadata field helps you track token consumption for cost management.
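For cost tracking across many calls, those usage numbers can be accumulated in a small counter. This is an illustrative sketch — the field names follow the usageMetadata shape shown above, but the class itself is not part of the SDK:

```typescript
// Matches the fields shown in usageMetadata above.
interface Usage {
  promptTokenCount: number;
  candidatesTokenCount: number;
  totalTokenCount: number;
}

class UsageTracker {
  private totals: Usage = {
    promptTokenCount: 0,
    candidatesTokenCount: 0,
    totalTokenCount: 0,
  };

  // Call after each request with response.usageMetadata.
  record(usage?: Usage): void {
    if (!usage) return; // metadata can be absent on some responses
    this.totals.promptTokenCount += usage.promptTokenCount;
    this.totals.candidatesTokenCount += usage.candidatesTokenCount;
    this.totals.totalTokenCount += usage.totalTokenCount;
  }

  summary(): Usage {
    return { ...this.totals };
  }
}
```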
Step 3: Choosing the Right Model
Gemini offers several models optimized for different use cases:
| Model | Best For | Context Window | Speed |
|---|---|---|---|
| gemini-2.5-pro | Complex reasoning, coding, analysis | 1M tokens | Moderate |
| gemini-2.5-flash | Fast responses, high throughput | 1M tokens | Fast |
| gemini-2.5-flash-lite | Cost-efficient, simple tasks | 1M tokens | Fastest |
// For complex analysis
const pro = genAI.getGenerativeModel({ model: "gemini-2.5-pro" });
// For speed-critical applications
const flash = genAI.getGenerativeModel({ model: "gemini-2.5-flash" });
// For high-volume, simple tasks
const lite = genAI.getGenerativeModel({ model: "gemini-2.5-flash-lite" });
Tip: Start with gemini-2.5-flash for development. Switch to Pro only when you need its enhanced reasoning. Flash is significantly cheaper and faster.
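That guidance can be encoded in a small routing helper. The task categories below are illustrative, not an official taxonomy:

```typescript
type TaskKind = "reasoning" | "general" | "bulk";

// Map a task category to a model name, following the table above.
function modelForTask(task: TaskKind): string {
  switch (task) {
    case "reasoning":
      return "gemini-2.5-pro"; // complex analysis and coding
    case "bulk":
      return "gemini-2.5-flash-lite"; // high-volume, simple tasks
    default:
      return "gemini-2.5-flash"; // sensible default for everything else
  }
}

// Usage: genAI.getGenerativeModel({ model: modelForTask("general") })
```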
Step 4: Multimodal Input — Images and PDFs
One of Gemini's strongest features is native multimodal understanding. You can send images, PDFs, audio, and video alongside text.
Analyzing an Image
import { GoogleGenerativeAI } from "@google/generative-ai";
import * as fs from "fs";
import * as path from "path";
async function analyzeImage(imagePath: string) {
const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash" });
const imageData = fs.readFileSync(imagePath);
const base64Image = imageData.toString("base64");
const mimeType = imagePath.endsWith(".png") ? "image/png" : "image/jpeg";
const result = await model.generateContent([
{
inlineData: {
mimeType,
data: base64Image,
},
},
"Describe this image in detail. What objects, colors, and scene do you see?",
]);
console.log(result.response.text());
}
analyzeImage("./sample-image.jpg");
Processing a PDF Document
async function analyzePDF(pdfPath: string) {
const model = genAI.getGenerativeModel({ model: "gemini-2.5-pro" });
const pdfData = fs.readFileSync(pdfPath);
const base64PDF = pdfData.toString("base64");
const result = await model.generateContent([
{
inlineData: {
mimeType: "application/pdf",
data: base64PDF,
},
},
"Summarize the key points of this document. List the main topics and any action items.",
]);
console.log(result.response.text());
}
Comparing Multiple Images
async function compareImages(image1Path: string, image2Path: string) {
const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash" });
const image1 = fs.readFileSync(image1Path).toString("base64");
const image2 = fs.readFileSync(image2Path).toString("base64");
const result = await model.generateContent([
{ inlineData: { mimeType: "image/jpeg", data: image1 } },
{ inlineData: { mimeType: "image/jpeg", data: image2 } },
"Compare these two images. What are the differences and similarities?",
]);
console.log(result.response.text());
}
Step 5: Streaming Responses
For better user experience, stream responses token by token instead of waiting for the full response:
async function streamResponse(prompt: string) {
const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash" });
const result = await model.generateContentStream(prompt);
process.stdout.write("Gemini: ");
for await (const chunk of result.stream) {
const text = chunk.text();
process.stdout.write(text);
}
console.log("\n");
// Access the aggregated response after streaming
const aggregated = await result.response;
console.log("Total tokens:", aggregated.usageMetadata?.totalTokenCount);
}
streamResponse("Write a short poem about TypeScript.");
Streaming in a Web Server
Here is how to integrate streaming with a simple HTTP server:
import { createServer } from "http";
const server = createServer(async (req, res) => {
if (req.method === "POST" && req.url === "/chat") {
const body = await getBody(req);
const { message } = JSON.parse(body);
const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash" });
const result = await model.generateContentStream(message);
res.writeHead(200, {
"Content-Type": "text/event-stream",
"Cache-Control": "no-cache",
Connection: "keep-alive",
});
for await (const chunk of result.stream) {
res.write(`data: ${JSON.stringify({ text: chunk.text() })}\n\n`);
}
res.write("data: [DONE]\n\n");
res.end();
}
});
server.listen(3000, () => console.log("Server running on port 3000"));
function getBody(req: import("http").IncomingMessage): Promise<string> {
return new Promise((resolve) => {
let body = "";
req.on("data", (chunk: Buffer) => (body += chunk));
req.on("end", () => resolve(body));
});
}
Step 6: Function Calling
Function calling lets Gemini invoke your TypeScript functions to fetch real-time data, interact with APIs, or perform calculations.
Defining Tools
import {
GoogleGenerativeAI,
FunctionDeclarationSchemaType,
} from "@google/generative-ai";
const tools = [
{
functionDeclarations: [
{
name: "getWeather",
description:
"Get the current weather for a specific location",
parameters: {
type: FunctionDeclarationSchemaType.OBJECT,
properties: {
location: {
type: FunctionDeclarationSchemaType.STRING,
description: "City name, e.g., 'Tunis' or 'Paris'",
},
unit: {
type: FunctionDeclarationSchemaType.STRING,
enum: ["celsius", "fahrenheit"],
description: "Temperature unit",
},
},
required: ["location"],
},
},
{
name: "searchProducts",
description: "Search for products in the catalog",
parameters: {
type: FunctionDeclarationSchemaType.OBJECT,
properties: {
query: {
type: FunctionDeclarationSchemaType.STRING,
description: "Search query",
},
maxPrice: {
type: FunctionDeclarationSchemaType.NUMBER,
description: "Maximum price filter",
},
},
required: ["query"],
},
},
],
},
];
Handling Function Calls
// Simulated function implementations
function getWeather(location: string, unit = "celsius") {
const mockData: Record<string, { temp: number; condition: string }> = {
tunis: { temp: 24, condition: "Sunny" },
paris: { temp: 15, condition: "Cloudy" },
tokyo: { temp: 20, condition: "Clear" },
};
const data = mockData[location.toLowerCase()] || {
temp: 20,
condition: "Unknown",
};
return {
location,
temperature: data.temp,
unit,
condition: data.condition,
};
}
function searchProducts(query: string, maxPrice?: number) {
return {
results: [
{ name: `${query} Pro`, price: 99.99, rating: 4.5 },
{ name: `${query} Lite`, price: 49.99, rating: 4.2 },
].filter((p) => !maxPrice || p.price <= maxPrice),
};
}
async function chatWithTools(userMessage: string) {
const model = genAI.getGenerativeModel({
model: "gemini-2.5-flash",
tools,
});
const chat = model.startChat();
const result = await chat.sendMessage(userMessage);
const response = result.response;
const functionCalls = response.functionCalls();
if (functionCalls && functionCalls.length > 0) {
const functionResponses = functionCalls.map((call) => {
let result;
switch (call.name) {
case "getWeather":
result = getWeather(
call.args.location as string,
call.args.unit as string
);
break;
case "searchProducts":
result = searchProducts(
call.args.query as string,
call.args.maxPrice as number
);
break;
default:
result = { error: "Unknown function" };
}
return {
functionResponse: {
name: call.name,
response: result,
},
};
});
// Send function results back to Gemini
const finalResult = await chat.sendMessage(functionResponses);
console.log(finalResult.response.text());
} else {
console.log(response.text());
}
}
chatWithTools("What is the weather in Tunis and find me some laptop options under $80?");
Gemini can request both functions in a single turn and then synthesize the results into a natural language response.
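As the number of tools grows, the switch statement above becomes unwieldy. One alternative is a registry keyed by function name — sketched here with stub handlers standing in for the getWeather and searchProducts implementations above:

```typescript
// Each handler receives the parsed args object from the model's function call.
type ToolHandler = (args: Record<string, unknown>) => unknown;

// In the tutorial these entries would delegate to the real getWeather
// and searchProducts functions; stubs are used here for illustration.
const toolRegistry: Record<string, ToolHandler> = {
  getWeather: (args) => ({ location: args.location, temperature: 24 }),
  searchProducts: (args) => ({ query: args.query, results: [] }),
};

function dispatchTool(name: string, args: Record<string, unknown>): unknown {
  const handler = toolRegistry[name];
  return handler ? handler(args) : { error: `Unknown function: ${name}` };
}
```

With this in place, the switch in chatWithTools collapses to a single dispatchTool(call.name, call.args) call.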
Step 7: Structured Output with JSON Schema
Force Gemini to return data in a specific JSON structure — perfect for building reliable data pipelines:
async function extractStructuredData(text: string) {
const model = genAI.getGenerativeModel({
model: "gemini-2.5-flash",
generationConfig: {
responseMimeType: "application/json",
responseSchema: {
type: FunctionDeclarationSchemaType.OBJECT,
properties: {
sentiment: {
type: FunctionDeclarationSchemaType.STRING,
enum: ["positive", "negative", "neutral"],
},
confidence: {
type: FunctionDeclarationSchemaType.NUMBER,
},
topics: {
type: FunctionDeclarationSchemaType.ARRAY,
items: {
type: FunctionDeclarationSchemaType.STRING,
},
},
summary: {
type: FunctionDeclarationSchemaType.STRING,
},
},
required: ["sentiment", "confidence", "topics", "summary"],
},
},
});
const result = await model.generateContent(
`Analyze the following text:\n\n${text}`
);
const data = JSON.parse(result.response.text());
return data;
}
// Usage
const analysis = await extractStructuredData(
"The new Gemini API is incredible! The documentation is clear, " +
"the TypeScript SDK is well-designed, and the pricing is very competitive. " +
"However, rate limits can be a concern for high-traffic applications."
);
console.log(analysis);
// {
// sentiment: "positive",
// confidence: 0.85,
// topics: ["API", "documentation", "SDK", "pricing", "rate limits"],
// summary: "Positive review of Gemini API with minor concern about rate limits"
// }
Step 8: Building a Conversational AI Assistant
Let us put everything together into a practical AI assistant with conversation history, system instructions, and safety settings:
import {
GoogleGenerativeAI,
HarmCategory,
HarmBlockThreshold,
} from "@google/generative-ai";
import * as readline from "readline";
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY!);
async function startAssistant() {
const model = genAI.getGenerativeModel({
model: "gemini-2.5-flash",
systemInstruction: `You are a helpful coding assistant specializing in TypeScript and Node.js.
You provide concise, practical answers with code examples when appropriate.
Always consider best practices and security implications.
If you are unsure about something, say so rather than guessing.`,
safetySettings: [
{
category: HarmCategory.HARM_CATEGORY_HARASSMENT,
threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
},
{
category: HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
threshold: HarmBlockThreshold.BLOCK_MEDIUM_AND_ABOVE,
},
],
generationConfig: {
temperature: 0.7,
topP: 0.95,
topK: 40,
maxOutputTokens: 2048,
},
});
const chat = model.startChat({
history: [],
});
const rl = readline.createInterface({
input: process.stdin,
output: process.stdout,
});
console.log("AI Assistant ready! Type 'exit' to quit.\n");
const askQuestion = () => {
rl.question("You: ", async (input) => {
const trimmed = input.trim();
if (trimmed.toLowerCase() === "exit") {
console.log("Goodbye!");
rl.close();
return;
}
try {
const result = await chat.sendMessageStream(trimmed);
process.stdout.write("Assistant: ");
for await (const chunk of result.stream) {
process.stdout.write(chunk.text());
}
console.log("\n");
} catch (error: any) {
console.error("Error:", error.message);
}
askQuestion();
});
};
askQuestion();
}
startAssistant();
Generation Config Explained
| Parameter | Default | Description |
|---|---|---|
| temperature | 1.0 | Controls randomness. Lower = more focused, higher = more creative |
| topP | 0.95 | Nucleus sampling. Consider tokens with cumulative probability up to this value |
| topK | 40 | Consider only the top K tokens at each step |
| maxOutputTokens | Varies | Maximum number of tokens to generate |
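These parameters are often bundled into named presets. The values below are illustrative starting points for experimentation, not recommendations from Google:

```typescript
interface GenPreset {
  temperature: number;
  topP: number;
  topK: number;
  maxOutputTokens: number;
}

// Lower temperature for deterministic extraction; higher for open-ended writing.
const presets: Record<"precise" | "balanced" | "creative", GenPreset> = {
  precise: { temperature: 0.2, topP: 0.9, topK: 20, maxOutputTokens: 1024 },
  balanced: { temperature: 0.7, topP: 0.95, topK: 40, maxOutputTokens: 2048 },
  creative: { temperature: 1.2, topP: 0.98, topK: 64, maxOutputTokens: 4096 },
};

// Usage: genAI.getGenerativeModel({ model: "...", generationConfig: presets.precise })
```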
Step 9: Error Handling and Rate Limiting
Production applications need robust error handling:
import { GoogleGenerativeAI } from "@google/generative-ai";
async function safeGenerate(prompt: string, retries = 3): Promise<string> {
const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash" });
for (let attempt = 1; attempt <= retries; attempt++) {
try {
const result = await model.generateContent(prompt);
return result.response.text();
} catch (error: any) {
const status = error?.status;
if (status === 429) {
// Rate limited — exponential backoff
const delay = Math.pow(2, attempt) * 1000;
console.warn(
`Rate limited. Retrying in ${delay / 1000}s (attempt ${attempt}/${retries})`
);
await new Promise((r) => setTimeout(r, delay));
continue;
}
if (status === 400) {
console.error("Bad request — check your prompt or parameters");
throw error;
}
if (status === 403) {
console.error("API key invalid or quota exceeded");
throw error;
}
if (attempt === retries) throw error;
console.warn(`Error (attempt ${attempt}/${retries}):`, error.message);
await new Promise((r) => setTimeout(r, 1000 * attempt));
}
}
throw new Error("Max retries exceeded");
}
Rate Limit Best Practices
- Use exponential backoff — double the wait time between retries
- Cache responses — store results for identical prompts
- Use Flash for high-volume — it has higher rate limits than Pro
- Batch requests — group related prompts when possible
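The caching bullet above can be sketched as a simple in-memory memoizer. This is an illustration only — a production version would want TTLs, size bounds, and persistence:

```typescript
const responseCache = new Map<string, string>();

// Wrap any prompt->text function with a cache keyed on the exact prompt string.
async function cachedGenerate(
  prompt: string,
  generate: (p: string) => Promise<string>
): Promise<string> {
  const hit = responseCache.get(prompt);
  if (hit !== undefined) return hit; // cache hit: no API call made
  const text = await generate(prompt);
  responseCache.set(prompt, text);
  return text;
}

// Usage: cachedGenerate(prompt, (p) => safeGenerate(p))
```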
Step 10: Production Tips
Environment Configuration
// src/config.ts
interface GeminiConfig {
apiKey: string;
model: string;
maxRetries: number;
timeout: number;
}
export function getConfig(): GeminiConfig {
const apiKey = process.env.GEMINI_API_KEY;
if (!apiKey) {
throw new Error("GEMINI_API_KEY environment variable is required");
}
return {
apiKey,
model: process.env.GEMINI_MODEL || "gemini-2.5-flash",
maxRetries: parseInt(process.env.GEMINI_MAX_RETRIES || "3", 10),
timeout: parseInt(process.env.GEMINI_TIMEOUT || "30000", 10),
};
}
Token Counting
Before sending large prompts, check the token count:
async function countTokens(content: string) {
const model = genAI.getGenerativeModel({ model: "gemini-2.5-flash" });
const result = await model.countTokens(content);
console.log(`Token count: ${result.totalTokens}`);
return result.totalTokens;
}
Cost Estimation
function estimateCost(
inputTokens: number,
outputTokens: number,
model: "pro" | "flash" | "flash-lite"
) {
// Approximate pricing per 1M tokens (check current pricing)
const pricing = {
pro: { input: 1.25, output: 10.0 },
flash: { input: 0.15, output: 0.6 },
"flash-lite": { input: 0.075, output: 0.3 },
};
const p = pricing[model];
const inputCost = (inputTokens / 1_000_000) * p.input;
const outputCost = (outputTokens / 1_000_000) * p.output;
return {
inputCost: `$${inputCost.toFixed(4)}`,
outputCost: `$${outputCost.toFixed(4)}`,
totalCost: `$${(inputCost + outputCost).toFixed(4)}`,
};
}
Troubleshooting
Common Issues
"API key not valid"
- Verify your key at aistudio.google.com
- Ensure the Generative AI API is enabled in your Google Cloud project
- Check that your key is not restricted to specific APIs
"Resource exhausted" (429 errors)
- You have hit the rate limit. Implement exponential backoff
- Consider upgrading to a paid tier for higher limits
- Use gemini-2.5-flash, which has more generous rate limits
"Content blocked by safety filters"
- Adjust safetySettings thresholds (but be responsible)
- Rephrase the prompt to avoid triggering filters
- Review Google's usage policies
Empty responses
- The model may have been blocked by safety filters with no text returned
- Check response.promptFeedback for block reasons
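A small guard makes the empty-response case explicit. The feedback interface here is a simplified stand-in for the SDK's promptFeedback shape:

```typescript
// Simplified from the SDK's promptFeedback field; only blockReason is modeled.
interface PromptFeedbackLike {
  blockReason?: string;
}

function diagnoseEmptyResponse(text: string, feedback?: PromptFeedbackLike): string {
  if (text.length > 0) return "ok";
  if (feedback?.blockReason) return `blocked: ${feedback.blockReason}`;
  return "empty response with no block reason reported";
}

// Usage: diagnoseEmptyResponse(response.text(), response.promptFeedback)
```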
Next Steps
Now that you have mastered the Gemini API fundamentals, explore these advanced topics:
- Gemini with LangChain.js — Build complex AI chains and agents
- Vertex AI — Enterprise-grade deployment with Google Cloud
- Fine-tuning — Customize Gemini models for your domain
- Context caching — Reduce costs for repeated context windows
- Grounding with Google Search — Enhance responses with real-time web data
Conclusion
You have built a complete AI application toolkit using the Google Gemini API and TypeScript. From simple text generation to multimodal analysis, streaming, function calling, and structured output — you now have all the building blocks to create sophisticated AI-powered applications.
The Gemini API's combination of a massive context window, competitive pricing, and native multimodal support makes it an excellent choice for production AI applications. Start with Flash for development speed, and scale to Pro when you need enhanced reasoning capabilities.