Cline is an AI coding agent that runs inside VS Code. Text input is the default, but wouldn't it be convenient to issue commands using your voice? This tutorial shows how we added voice command capabilities to Cline by creating a dedicated VS Code extension (cline-voice-assistant) that uses an ElevenLabs MCP (Model Context Protocol) server for speech-to-text (STT) transcription.
What this solution provides:
Hands-free Interaction: Trigger voice recording via a command or keybinding.
Accurate Transcription: Uses the ElevenLabs STT API via a local MCP server.
Seamless Integration: Sends the transcribed text directly to the main Cline extension for processing.
How it Works:
A user triggers the Cline: Start Voice Command in VS Code (provided by cline-voice-assistant).
The extension uses the sox command-line tool to record audio from the default microphone, saving it to a temporary file.
The extension connects to a locally running elevenlabs-mcp-server using the MCP SDK.
It calls the elevenlabs_stt tool on the MCP server, passing the path to the recorded audio file.
The MCP server sends the audio to the ElevenLabs API and returns the transcription.
The cline-voice-assistant extension retrieves the API exported by the main Cline extension (saoudrizwan.claude-dev).
It uses the sendMessage method from the Cline API to send the transcribed text to the main Cline chat interface.
Cline processes the text as if it were typed, and the response appears in the chat window.
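The transcription and hand-off steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the extension's actual code: the ToolClient and ClineApi interfaces are hypothetical stand-ins for the MCP SDK client and the API object exported by Cline, and the audio_path argument name is a guess at what the elevenlabs_stt tool expects. Only the tool result shape (a content array of typed items, with an optional isError flag) follows the MCP specification.

```typescript
// Minimal shape of an MCP tool result: a list of typed content items.
interface ToolContent { type: string; text?: string }
interface ToolResult { content: ToolContent[]; isError?: boolean }

// Hypothetical stand-in for the MCP SDK client; only callTool is needed here.
interface ToolClient {
  callTool(req: { name: string; arguments: Record<string, unknown> }): Promise<ToolResult>;
}

// Hypothetical stand-in for the API object exported by the main Cline extension.
interface ClineApi {
  sendMessage(message: string): Promise<void>;
}

// Join all text items of a tool result into a single transcript string.
function extractTranscript(result: ToolResult): string {
  if (result.isError) {
    throw new Error("STT tool reported an error");
  }
  return result.content
    .filter((item) => item.type === "text" && typeof item.text === "string")
    .map((item) => (item.text as string).trim())
    .filter((text) => text.length > 0)
    .join(" ");
}

// Transcribe a recorded audio file and forward the text to Cline.
async function transcribeAndSend(
  client: ToolClient,
  cline: ClineApi,
  audioPath: string
): Promise<string> {
  const result = await client.callTool({
    name: "elevenlabs_stt",
    arguments: { audio_path: audioPath }, // argument name is an assumption
  });
  const text = extractTranscript(result);
  if (text.length === 0) {
    throw new Error("Transcription was empty; nothing sent to Cline");
  }
  await cline.sendMessage(text);
  return text;
}
```

In a real extension, the ClineApi object would come from vscode.extensions.getExtension("saoudrizwan.claude-dev") after activation, and the client would be an MCP SDK client connected to the local server. Passing both in as parameters keeps the hand-off logic easy to exercise with fakes.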
This tutorial focuses on voice input. The response from Cline will still be text-based in the chat window. Adding voice output (Text-to-Speech) for Cline's responses would require further modifications, potentially to the main Cline extension itself.
Step-by-Step Guide
Let's walk through the key steps involved in creating this voice assistant setup.
Prerequisites
Cline Extension: The main Cline VS Code extension (saoudrizwan.claude-dev) must be installed.
Node.js & npm: Required for running MCP servers and building extensions.
sox: A command-line audio utility. Install it (e.g., on macOS: brew install sox).
ElevenLabs Account & API Key: Sign up at ElevenLabs and get an API key.
Saiku Project: This tutorial assumes you are working within the Saiku project structure (/Users/macbookpro/Developer/saiku in this example).
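With sox installed, the recording step might be driven by an argument list like the one below. This is a sketch under assumptions: the buildSoxArgs name, the fixed-duration recording, and the 16 kHz mono 16-bit format are illustrative choices, not taken from the extension's code. The flags themselves (-d for the default audio device, -r, -c, -b, and the trim effect) are standard sox options.

```typescript
// Build the argument list for a fixed-length sox recording.
function buildSoxArgs(
  outputPath: string,
  seconds: number,
  sampleRate: number = 16000
): string[] {
  if (!Number.isFinite(seconds) || seconds <= 0) {
    throw new RangeError("seconds must be a positive number");
  }
  return [
    "-d",                         // read from the default audio input device
    "-r", String(sampleRate),     // sample rate of the output file, in Hz
    "-c", "1",                    // one channel (mono)
    "-b", "16",                   // 16-bit samples
    outputPath,                   // output file; format inferred from its extension
    "trim", "0", String(seconds), // keep audio from 0 s up to `seconds` s, then stop
  ];
}
```

For example, buildSoxArgs("/tmp/cmd.wav", 5) corresponds to the command line sox -d -r 16000 -c 1 -b 16 /tmp/cmd.wav trim 0 5. The resulting array could be handed to Node's child_process.spawn("sox", args), which avoids shell quoting issues with the temporary file path.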
Create ElevenLabs MCP Server
We need a server to handle STT requests using the ElevenLabs API.
Alternatively, install manually via the Extensions view (... > "Install from VSIX...").
Restart VS Code: Fully quit and reopen VS Code so the newly installed extension is loaded.
Usage
Ensure sox is installed and the elevenlabs-mcp-server is running.
Open the Command Palette (Cmd+Shift+P on macOS, Ctrl+Shift+P on Windows/Linux).
Run Cline: Start Voice Command.
Speak your command.
The transcribed text appears in the Cline chat window, followed by Cline's text response. Check Developer Tools (Help > Toggle Developer Tools > Console) for logs or errors.
Conclusion
By creating a dedicated VS Code extension and leveraging an ElevenLabs MCP server, we've successfully enabled voice command input for Cline. This setup uses sox for recording, the MCP server for ElevenLabs STT, and the main Cline extension's API to process the transcribed text. While the response remains text-based, this provides a significant convenience for hands-free interaction.
Future Possibilities
This setup provides a solid foundation for voice input. Here are some potential next steps:
Voice Output: Modify the main Cline extension (extensions/cline/) to check if input came via voice and, if so, use the elevenlabs_tts_and_play MCP tool to speak the response instead of just displaying text. This requires understanding and modifying the Cline extension's core logic.
Alternative STT: Replace the ElevenLabs MCP server with one using a different STT service (like Whisper, either local via whisper.cpp or the OpenAI API).
Integrated Recording: Remove the sox dependency by implementing recording directly within the VS Code extension, using the browser MediaRecorder API inside a webview, making the setup more self-contained.
UI Button: Add a microphone button to the Cline UI instead of relying on the Command Palette.
If you're interested in enhancing Cline's capabilities, consider:
Forking the Project: Explore the Saiku codebase (https://github.com/nooqta/saiku) and experiment with your own modifications.
Contributing: If you develop improvements, consider contributing back to the main project following their contribution guidelines.