MCP Server Whisper

http

communication

Advanced audio transcription and processing using OpenAI's Whisper and GPT-4o models.

Tools · 8

list_audio_files
Lists audio files with comprehensive filtering and sorting options:
- Filter by regex pattern matching on filenames
- Filter by file size, duration, modification time, or format
- Sort by name, size, duration, modification time, or format
- Returns type-safe FilePathSupportParams with full metadata

get_latest_audio
Gets the most recently modified audio file with model support info.

convert_audio
Converts audio files to supported formats (mp3 or wav). Returns AudioProcessingResult with output path.

compress_audio
Compresses audio files that exceed size limits. Returns AudioProcessingResult with output path.

transcribe_audio
Advanced transcription using OpenAI's models:
- Supports whisper-1, gpt-4o-transcribe, and gpt-4o-mini-transcribe
- Custom prompts for guided transcription
- Optional timestamp granularities for word and segment-level timing
- JSON response format option
- Returns TranscriptionResult with text, usage data, and optional timestamps

chat_with_audio
Interactive audio analysis using GPT-4o audio models:
- Supports gpt-4o-audio-preview (recommended) and dated versions
- Note: gpt-4o-mini-audio-preview has limitations with audio chat and is not recommended
- Custom system and user prompts
- Provides conversational responses to audio content
- Returns ChatResult with response text

transcribe_with_enhancement
Enhanced transcription with specialized templates:
- detailed: includes tone, emotion, and background details
- storytelling: transforms the transcript into a narrative form
- professional: creates formal, business-appropriate transcriptions
- analytical: adds analysis of speech patterns and key points
- Returns TranscriptionResult with enhanced output

create_audio
Generate text-to-speech audio using OpenAI's TTS API:
- Supports gpt-4o-mini-tts (preferred) and other speech models
- Multiple voice options (alloy, ash, ballad, coral, echo, sage, shimmer, verse, marin, cedar)
- Speed adjustment and custom instructions
- Customizable output file paths
- Handles texts of …
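
As a rough illustration of how an MCP client might invoke one of the tools above, the sketch below builds a JSON-RPC 2.0 `tools/call` request, which is the method the Model Context Protocol uses for tool invocation. The helper `build_tool_call` and the argument names `file_path` and `model` are assumptions made for illustration; this listing does not document the tools' actual input schemas, so check the server's `tools/list` response for the real parameter names.

```python
import json
from itertools import count

# Simple monotonically increasing request ids for JSON-RPC.
_request_ids = count(1)


def build_tool_call(tool_name, arguments):
    """Build an MCP `tools/call` request as a plain dict (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# NOTE: "file_path" and "model" are assumed argument names, not taken from
# the server's published schema. The model name comes from the tool list.
request = build_tool_call(
    "transcribe_audio",
    {"file_path": "/path/to/meeting.mp3", "model": "gpt-4o-mini-transcribe"},
)

# Serialize the request as it would be sent over the transport.
print(json.dumps(request, indent=2))
```

In practice an MCP client library would normally construct and send this message; the sketch only shows the shape of the request.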

Environment variables

OPENAI_API_KEY
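
Since the server relies on OPENAI_API_KEY, a launcher script might check for it before starting. The following is a minimal sketch under that assumption; `require_api_key` is a hypothetical helper and not part of the server itself.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the OpenAI API key from `env`, or raise if it is missing or blank."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before starting the server."
        )
    return key


# Demonstration with an explicit mapping so the example runs without a real key.
demo_key = require_api_key({"OPENAI_API_KEY": "sk-example"})
print("API key found, length:", len(demo_key))
```

Calling `require_api_key()` with no argument reads from the real process environment.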

Links

GitHub: github.com/arcaputo3/mcp-server-whisper

★ 54 GitHub stars