v0.2.11 · Free during early access on Windows · Get the install command
Four keypresses from thought to shipped.
The whole flow is one global hotkey, your voice, and an optional word that turns plain dictation into a structured prompt. Nothing else to learn.
- 01 Press the hotkey
  Ctrl + Alt by default. Remap to any one- or two-key combo.
- 02 Speak naturally
  Whisper transcribes locally with screen-context bias for technical terms.
- 03 Say “voxify” (optional)
  Qwen3 classifies intent and slots your words into a clean prompt shape.
- 04 Lands in your app
  Type-simulation pastes it where your cursor is, then restores focus.
4× faster than typing.
You think at 400 words per minute, speak at ~180, and type at 45. OmniVox closes the gap between thought and prompt — so you stop translating ideas into keystrokes and start shipping them.
Keyboard · 45 wpm
I'm getting started with the project. How would you like to set up the file…
OmniVox · spoken · 180 wpm
Natural speech, captured locally, structured for the tool you're aiming it at.
Rambling thought, agent-ready prompt.
Say “voxify” at the end of any thought. A local Qwen3 model recognises whether you're asking to implement, explore, or get advice, and slots your speech into a grammar-validated prompt — file paths intact, fillers gone, hallucinations refused.
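To make the trigger rule concrete, here is a minimal sketch of how a trailing "voxify" could be detected and stripped from a transcript. The alias list and the function name `split_trigger` are assumptions for illustration; OmniVox's actual aliases and matching logic are not described on this page.

```python
import re

# Hypothetical alias list: the real phonetic aliases are not documented here.
TRIGGER_ALIASES = ("voxify", "voxifi", "vox if i")

# The trigger only counts when it is the last thing said, so the pattern is
# anchored at the end and tolerates trailing punctuation such as "Voxify."
_TRIGGER = re.compile(
    r"(?<![A-Za-z0-9])(?:%s)\W*$" % "|".join(re.escape(a) for a in TRIGGER_ALIASES),
    re.IGNORECASE,
)

def split_trigger(transcript: str):
    """Return (body, triggered). The body has the trigger word removed."""
    match = _TRIGGER.search(transcript)
    if match is None:
        return transcript.strip(), False  # plain dictation
    body = transcript[: match.start()].rstrip(" \t\n,;:-")
    return body, True
```

A transcript that merely mentions the word mid-sentence stays plain dictation, because the pattern only matches at the very end.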
raw transcript
“Um, okay, so we need to look at the auth middleware — the JWT refresh thing — it's failing on stale tokens in src/middleware/auth.ts. Has to ship by Friday's review, can't break the refresh-token flow. Voxify.”
structured prompt
implementation
Lives on your machine
Whisper and Qwen run locally with optional Vulkan or CUDA acceleration. No cloud round-trip. No API keys. No telemetry. Air-gap-friendly.
Reads what's on screen
Windows-only: OmniVox peeks at your focused window and feeds file paths, identifiers, and CLI flags to Whisper as bias tokens — so technical strings transcribe verbatim.
// editing src/middleware/auth.ts
// verifyJWT( ) failed @ line 42
bias tokens captured
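As a rough illustration of what "bias tokens" could look like for the snippet above, the sketch below pulls file paths, camelCase identifiers, and CLI flags out of window text with regular expressions. The patterns and the name `extract_bias_tokens` are assumptions; how OmniVox actually reads the window and hands tokens to Whisper is not specified here.

```python
import re
from typing import List

# Illustrative patterns only, not OmniVox's real extraction rules.
PATH = re.compile(r"[\w.-]+(?:/[\w.-]+)+")               # e.g. src/middleware/auth.ts
CAMEL = re.compile(r"\b[a-z]+(?:[A-Z][A-Za-z0-9]*)+\b")  # e.g. verifyJWT
FLAG = re.compile(r"(?<!\S)--?[A-Za-z][\w-]*")           # e.g. --watch, -v

def extract_bias_tokens(window_text: str) -> List[str]:
    """Collect paths, then identifiers, then flags, without duplicates."""
    tokens: List[str] = []
    for pattern in (PATH, CAMEL, FLAG):
        for token in pattern.findall(window_text):
            if token not in tokens:
                tokens.append(token)
    return tokens

screen = "// editing src/middleware/auth.ts\n// verifyJWT( ) failed @ line 42"
bias_prompt = ", ".join(extract_bias_tokens(screen))
```

Joining the tokens into a single string mirrors the common practice of passing a short text prompt to the recognizer so that rare spellings are favoured.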
Three layers of vocabulary
Bias Whisper's recognition, post-correct phonetic mishears, and expand voice shortcuts into full text. Scope each entry globally or to a single mode.
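The second and third layers (post-correction and shortcut expansion) both amount to scoped text substitution after transcription. A minimal sketch follows, assuming a hypothetical `Entry` record and made-up example entries. The first layer, recognition bias, happens inside the recognizer and is not modelled here.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Entry:
    spoken: str                  # what the recognizer tends to emit
    written: str                 # what should land in the text
    mode: Optional[str] = None   # None means global scope

def apply_entries(text: str, entries: List[Entry], active_mode: str) -> str:
    """Apply every entry that is global or scoped to the active mode."""
    for entry in entries:
        if entry.mode is None or entry.mode == active_mode:
            pattern = rf"\b{re.escape(entry.spoken)}\b"
            # A callable replacement keeps backslashes in `written` literal.
            text = re.sub(pattern, lambda _m, e=entry: e.written, text,
                          flags=re.IGNORECASE)
    return text

# Hypothetical entries: one global phonetic fix, two mode-scoped ones.
entries = [
    Entry("jason web token", "JSON Web Token"),
    Entry("get hub", "GitHub", mode="coding"),
    Entry("my sign off", "Best regards,\nSam", mode="email"),
]
```

Scoping by mode keeps a substitution that is helpful in one app from firing by accident in another.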
Speak the keystroke
A short, focused command vocabulary that turns into real keystrokes. Toggle each command on or off — independent toggles, no accidents.
A different brain for every app.
Build a profile per workflow. Each mode gets its own vocabulary, dictionary, snippets, writing style, icon, and app bindings — and OmniVox switches the second you switch windows.
Tech vocab, file-paths bias, code-style snippets
Formal structure, signoffs, salutations
Minimal formatting, append straight to a note
Ship mode on — send right after transcription
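One way to picture app bindings is a lookup from the focused window's process name to a mode. The sketch below is an assumption-laden illustration: the mode names, process names, and the `Mode` fields are all hypothetical, and OmniVox's real profile format is not shown on this page.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Mode:
    name: str
    app_bindings: Tuple[str, ...] = ()      # lower-case process names
    ship_after_transcription: bool = False  # send right after transcription

def mode_for_window(process_name: str, modes: List[Mode], default: Mode) -> Mode:
    """Pick the first mode bound to the focused app, else the default."""
    key = process_name.lower()
    for mode in modes:
        if key in mode.app_bindings:
            return mode
    return default

# Hypothetical profiles mirroring the four descriptions above.
default_mode = Mode("general")
modes = [
    Mode("coding", ("code.exe", "cursor.exe")),
    Mode("email", ("outlook.exe",)),
    Mode("notes", ("obsidian.exe",)),
    Mode("chat", ("slack.exe",), ship_after_transcription=True),
]
```

Calling `mode_for_window` on every focus change would give the switch-on-window-change behaviour described above.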
Built for your hardware
Hardware-aware. Model-agnostic.
What changes when nothing leaves your machine.
Comparison based on public docs of major cloud dictation tools, May 2026.
Honest answers, no marketing.
The questions worth answering before you install something on your machine that listens to you.
- Yes — both Whisper (speech) and Qwen3 (the optional structuring LLM) run as native code through whisper.cpp and llama.cpp on your hardware. There are no outbound network calls, no API keys, no telemetry. You can run OmniVox on an air-gapped laptop and it still works.
- Whisper's base and small models run comfortably on modern CPUs. For medium / large models and live Structured Mode, a Vulkan-capable GPU (or NVIDIA + CUDA) is a noticeable upgrade. The Models tab inspects your machine and recommends a fit.
- Whisper is state-of-the-art for general English. OmniVox layers two things on top: a Vocabulary prompt that biases recognition toward your domain (file paths, product names, jargon), and on Windows, screen-context bias that reads identifiers from your focused window. Technical strings transcribe verbatim where cloud tools mishear them.
- Structured Mode runs a small local LLM over your transcript and rewrites it as a clean prompt with labeled sections — Goal, Files, Constraints, Urgency — that agentic tools (Claude Code, Cursor, Codex) read directly. It only fires when you end your speech with the phrase “voxify” (or one of its phonetic aliases). Plain dictation otherwise.
- Anywhere a keyboard works. OmniVox simulates keystrokes (or pastes from clipboard, or both — your choice) into the focused window, then restores focus. We test against Claude Code, Cursor, VS Code, terminals, Notion, Linear, Slack, Gmail, Discord, and every other text field on your desktop.
- Yes — that's what Context Modes are. Each mode gets its own vocabulary, dictionary (phonetic substitutions), snippets (text expansions), writing style, and app bindings. Switch from the floating pill or let OmniVox auto-switch when you switch windows.
- Whisper handles 100+ languages with optional translation. OmniVox exposes the language selector in Settings and routes through the same local pipeline — no per-language cloud surfaces, no per-language pricing.
- Most alternatives stream your audio to a cloud, charge a subscription, and emit raw text. OmniVox runs entirely on your machine, is free during early access, and shapes its output for agentic dev tools instead of just typing what you said. No account required.
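For a feel of the labeled sections mentioned in the Structured Mode answer (Goal, Files, Constraints, Urgency), here is a small sketch that renders them from a record. The `StructuredPrompt` class, the `Intent:` line, and the exact text layout are assumptions; the field values are filled in by hand from the sample transcript earlier on this page, not produced by a model.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class StructuredPrompt:
    intent: str                                  # e.g. implementation, exploration, advice
    goal: str
    files: List[str] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)
    urgency: str = ""

    def render(self) -> str:
        """Emit one labeled line per non-empty section."""
        lines = [f"Intent: {self.intent}", f"Goal: {self.goal}"]
        if self.files:
            lines.append("Files: " + ", ".join(self.files))
        if self.constraints:
            lines.append("Constraints: " + "; ".join(self.constraints))
        if self.urgency:
            lines.append(f"Urgency: {self.urgency}")
        return "\n".join(lines)

# Hand-written values based on the sample transcript (illustrative only).
prompt = StructuredPrompt(
    intent="implementation",
    goal="Fix JWT refresh failing on stale tokens in the auth middleware",
    files=["src/middleware/auth.ts"],
    constraints=["Do not break the refresh-token flow"],
    urgency="Ship by Friday's review",
)
print(prompt.render())
```

Keeping empty sections out of the output means a short thought produces a short prompt.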
