Audio Transcription
Powered by Whisper AI with Speaker Detection
Click to upload or drag and drop
Supports MP3, WAV, M4A, FLAC, OGG, and more
Supports 1000+ sites including YouTube, TED Talks, Vimeo, Twitter/X, Reddit, and many more. Use "Download" to save the video without transcribing.
Text limit: 5,000 characters.
Select a voice and click "Generate Speech" to create audio.
Manage Custom Voices
Upload Voice Sample (for cloning)
Upload a clear WAV or MP3 recording of the target voice. Cloning runs on the GPU server when one is available and falls back to the CPU (slower) otherwise.
Upload Piper Model (instant playback)
Upload a trained .onnx + .onnx.json pair for instant local TTS.
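As a rough illustration of the pairing requirement, the sketch below checks which uploaded files form a complete `.onnx` + `.onnx.json` pair. The function name `find_piper_pairs` is hypothetical, and it assumes the config file shares the model's full file name with `.json` appended; the app's own validation may differ.

```python
from pathlib import PurePath


def find_piper_pairs(filenames):
    """Return the voice names that have both <name>.onnx and <name>.onnx.json.

    Hypothetical helper: assumes the config is named exactly like the model
    file with ".json" appended.
    """
    # Compare on bare file names so directory prefixes do not matter.
    names = {PurePath(f).name for f in filenames}
    models = {n for n in names if n.endswith(".onnx")}
    # A model counts only if its matching config was uploaded too.
    return sorted(m[: -len(".onnx")] for m in models if m + ".json" in names)
```

A model uploaded without its config (or the reverse) is left out of the result, so the caller can prompt for the missing file.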
Prepare Training Dataset
Upload raw audio → auto-transcribe with Whisper → download training-ready dataset ZIP.
Train on a GPU machine, then upload the .onnx model for instant playback (~0.5s).
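The dataset preparation step can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the ZIP layout (a `wavs/` folder plus an LJSpeech-style `metadata.csv` with `id|text` lines) is an assumption, `build_dataset_zip` is a hypothetical name, and the Whisper call is represented by a caller-supplied `transcribe` function rather than a real model.

```python
import io
import zipfile


def build_dataset_zip(clips, transcribe):
    """Package audio clips and their transcripts into an in-memory ZIP.

    clips:      dict mapping file name -> raw audio bytes
    transcribe: callable(bytes) -> str, a stand-in for the Whisper step
    Layout (assumed): wavs/<file> plus metadata.csv with "id|text" lines.
    """
    lines = []
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(clips):
            audio = clips[name]
            # Strip pipes and collapse whitespace so each row stays on one line.
            text = " ".join(transcribe(audio).replace("|", " ").split())
            stem = name.rsplit(".", 1)[0]
            zf.writestr(f"wavs/{name}", audio)
            lines.append(f"{stem}|{text}")
        zf.writestr("metadata.csv", "\n".join(lines) + "\n")
    return buf.getvalue()
```

Passing the transcriber in as a function keeps the packaging logic testable without loading a speech model.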
Users
| Username | Role | Status | Transcriptions | Joined | Actions |
|---|---|---|---|---|---|