v0.2.11 · Free during early access on Windows · Get the install command

local-first dictation for the agentic age

Speak your intent. Ship a prompt.

OmniVox listens, structures, and types — all on your machine. Whisper for speech, a local Qwen for cleanup, output shaped for Claude Code, Cursor, and Codex. No cloud, no API keys, no telemetry.

PowerShell · paste & run

irm https://tryomnivox.com/install.ps1 | iex

See Structured Mode

Windows now · macOS and Linux soon · v0.2.11 · 10 MB

Raw transcript

whisper-medium

press Ctrl + Alt to speak…

trigger detected

Ctrl + Alt · global hotkey

Structured Mode

waiting

intent

goal

files

constraints

urgency

·voxify

01 — Speech in (local Whisper)

02 — Voxify trigger fires

03 — Slot-shaped prompt out

How it works

Four steps from thought to shipped.

The whole flow is one global hotkey, your voice, and an optional word that turns plain dictation into a structured prompt. Nothing else to learn.

01
Press the hotkey
Ctrl + Alt by default. Remap to any one- or two-key combo.

02
Speak naturally
Whisper transcribes locally with screen-context bias for technical terms.

03
Say “voxify” (optional)
Qwen3 classifies intent and slots your words into a clean prompt shape.

04
Lands in your app
Type-simulation pastes it where your cursor is, then restores focus.
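The optional trigger word in step 03 can be pictured as a check on the tail of the transcript. The sketch below is an illustration only, not OmniVox's actual code: the function name `split_trigger` and the alias list are assumptions (the page mentions phonetic aliases but does not list them).

```python
# Hypothetical alias list: the real phonetic aliases are not published on this page.
TRIGGER_ALIASES = ("voxify", "vox if i", "boxify")

# Punctuation and whitespace a speech model might leave around the trigger word.
TRAILING = " \t\n.!?,;:"


def split_trigger(transcript: str) -> tuple[str, bool]:
    """Return (text_without_trigger, triggered) for a raw transcript."""
    cleaned = transcript.rstrip(TRAILING)
    lowered = cleaned.lower()
    for alias in TRIGGER_ALIASES:
        if not lowered.endswith(alias):
            continue
        head = cleaned[: len(cleaned) - len(alias)]
        # Require a word boundary so "devoxify" does not count as a trigger.
        if head and head[-1].isalnum():
            continue
        return head.rstrip(TRAILING), True
    # No trigger: treat the input as plain dictation.
    return transcript.strip(), False
```

With this sketch, a transcript that ends in the trigger is routed to structuring with the trigger removed, and everything else passes through as plain dictation.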

Type Simulation

Lands cleanly in every app you already use.

OmniVox simulates keystrokes the moment a transcript is ready, so your words land in the focused window — whether that's Claude Code, Cursor, your terminal, Linear, Notion, or any text field. Clipboard, type, or both, with focus restored after.

Clipboard · Type-simulation · Auto-paste · Focus restore

auth.ts — claude code

Speed

4× faster than typing.

You think at 400 words per minute, speak at ~180, and type at 45. OmniVox closes the gap between thought and prompt — so you stop translating ideas into keystrokes and start shipping them.

Keyboard

45 wpm

I'm getting started with the project. How would you like to set up the file…

OmniVox · spoken

180 wpm

Natural speech, captured locally, structured for the tool you're aiming it at.

Structured Mode · voxify

Rambling thought, agent-ready prompt.

Say “voxify” at the end of any thought. A local Qwen3 model recognises whether you're asking to implement, explore, or get advice, and slots your speech into a grammar-validated prompt — file paths intact, fillers gone, hallucinations refused.

Always in shape
Slots stay slots. The model can't drift into prose, double-fill, or skip required fields.

Fabrication defences
Files must appear in your speech. Third-person rewritten. Short input refused.

Voxify trigger
Only structures when you ask. Otherwise it's just clean dictation.
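The defences above can be sketched as a validation pass over the model's output. This is a minimal illustration under stated assumptions, not the shipped implementation: the `StructuredPrompt` class, the `validate` function, and the `MIN_WORDS` threshold are hypothetical names and values. Only the three intents and the slot names come from this page.

```python
from dataclasses import dataclass, field

# The three intents named on this page.
INTENTS = {"implementation", "exploration", "advice"}

# Hypothetical threshold: the page says short input is refused, not how short.
MIN_WORDS = 4


@dataclass
class StructuredPrompt:
    intent: str
    goal: str
    files: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    urgency: str = ""


def validate(prompt: StructuredPrompt, transcript: str) -> list[str]:
    """Return a list of problems; an empty list means the prompt is accepted."""
    problems = []
    if len(transcript.split()) < MIN_WORDS:
        problems.append("input too short")
    if prompt.intent not in INTENTS:
        problems.append(f"unknown intent: {prompt.intent}")
    if not prompt.goal.strip():
        problems.append("goal is required")
    spoken = transcript.lower()
    for path in prompt.files:
        # A file slot is only valid if the path was actually spoken.
        if path.lower() not in spoken:
            problems.append(f"file not in speech: {path}")
    return problems
```

The substring check assumes the transcript already contains the path verbatim, which is what the screen-context bias described on this page is meant to encourage.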

omnivox · structured preview · ready

raw transcript ▾

“Um, okay, so we need to look at the auth middleware — the JWT refresh thing — it's failing on stale tokens in src/middleware/auth.ts. Has to ship by Friday's review, can't break the refresh-token flow. Voxify.”

qwen3 · structured · 240ms

structured prompt

implementation

goal

Fix auth middleware on stale JWT refresh

files

src/middleware/auth.ts

constraints

Must not break refresh-token flow

urgency · high

High — Friday review

↵ enter to paste

intent detected

Privacy

Lives on your machine

Whisper and Qwen run locally with optional Vulkan or CUDA acceleration. No cloud round-trip. No API keys. No telemetry. Air-gap-friendly.

your machine

whisper.cpp · qwen3 0.6B · vulkan · cuda

Screen Context

Reads what's on screen

Windows-only: OmniVox peeks at your focused window and feeds file paths, identifiers, and CLI flags to Whisper as bias tokens — so technical strings transcribe verbatim.

$ claude --model opus-4-7
// editing src/middleware/auth.ts
// verifyJWT( ) failed @ line 42

bias tokens captured

claude · opus-4-7 · src/middleware/auth.ts · verifyJWT · --model · line 42

Whisper now hears verifyJWT — not verify J. W. T.
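One way to picture the capture step is a handful of regular expressions run over the text of the focused window. The sketch below is an assumption-laden illustration, not OmniVox's extractor: the patterns and the name `extract_bias_tokens` are hypothetical, and it only covers CLI flags, slash-separated paths, and camelCase identifiers (a subset of the tokens shown above).

```python
import re

# Hypothetical patterns; the real capture rules are not documented on this page.
PATTERNS = [
    re.compile(r"(?<![\w-])--?[A-Za-z][\w-]*"),     # CLI flags such as --model
    re.compile(r"\b[\w.-]+(?:/[\w.-]+)+\b"),        # slash-separated file paths
    re.compile(r"\b[a-z]+(?:[A-Z][a-z0-9]*)+\b"),   # camelCase identifiers
]


def extract_bias_tokens(screen_text: str) -> list[str]:
    """Collect technical strings in order of first appearance, without duplicates."""
    found = []
    for pattern in PATTERNS:
        for match in pattern.finditer(screen_text):
            found.append((match.start(), match.group()))
    tokens = []
    for _, token in sorted(found):
        if token not in tokens:
            tokens.append(token)
    return tokens


screen = (
    "$ claude --model opus-4-7\n"
    "// editing src/middleware/auth.ts\n"
    "// verifyJWT( ) failed @ line 42"
)
print(extract_bias_tokens(screen))
```

The resulting list would then be handed to the speech model as prompt text, in the same spirit as the Vocabulary layer that seeds Whisper's prompt.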

Customization

Three layers of vocabulary

Bias Whisper's recognition, post-correct phonetic mishears, and expand voice shortcuts into full text. Scope each entry globally or to a single mode.

Vocabulary: seed Whisper's prompt

OmniVox · Claude Code · Qwen3 · Tauri · Vulkan · verifyJWT

Dictionary: replace what Whisper hears

omni vox → OmniVox · claude codes → Claude Code · tow ree → Tauri

Snippets: expand short phrases

ty → thank you · tldr → too long; didn’t read · calendly → cal.com/me/intro
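The Dictionary and Snippets layers both amount to phrase replacement on the finished transcript. A minimal sketch follows, assuming whole-phrase, case-insensitive matching with the dictionary applied first. The function names are hypothetical, and the order and matching rules are assumptions rather than documented behaviour.

```python
import re

# Entries taken from the examples on this page; keys must be lowercase.
DICTIONARY = {"omni vox": "OmniVox", "claude codes": "Claude Code", "tow ree": "Tauri"}
SNIPPETS = {"ty": "thank you", "tldr": "too long; didn't read", "calendly": "cal.com/me/intro"}


def replace_phrases(text: str, table: dict[str, str]) -> str:
    """Replace whole-word occurrences of each key, longest keys first."""
    if not table:
        return text
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b", re.IGNORECASE
    )
    return pattern.sub(lambda m: table[m.group(1).lower()], text)


def post_process(transcript: str) -> str:
    """Fix phonetic mishears first, then expand voice shortcuts."""
    return replace_phrases(replace_phrases(transcript, DICTIONARY), SNIPPETS)
```

For instance, `post_process("ty, omni vox runs on tow ree")` yields `"thank you, OmniVox runs on Tauri"`. Scoping entries per mode could be modelled by passing a different pair of tables for each mode.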

Voice Commands

Speak the keystroke

A short, focused command vocabulary that turns into real keystrokes. Toggle each command on or off — independent toggles, no accidents.

“new line” → Shift + Enter

“new paragraph” → Shift + Enter × 2

“delete last word” → Ctrl + Backspace

“send” → Enter ↵
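The command table above can be read as a lookup from a spoken phrase to a key sequence, gated by a per-command toggle. The sketch below is illustrative only: `resolve_command` and the string form of the key names are assumptions, and sending the keystrokes to the operating system is left out.

```python
# Phrase-to-keystroke table, mirroring the four commands listed on this page.
COMMANDS = {
    "new line": ["Shift+Enter"],
    "new paragraph": ["Shift+Enter", "Shift+Enter"],
    "delete last word": ["Ctrl+Backspace"],
    "send": ["Enter"],
}


def resolve_command(utterance: str, enabled: set[str]):
    """Return the key sequence for an enabled command, or None for ordinary speech."""
    phrase = utterance.strip().lower().rstrip(".!?")
    if phrase in COMMANDS and phrase in enabled:
        return COMMANDS[phrase]
    return None
```

Because each command is checked against the `enabled` set, turning off “send” means the word is typed as text instead of pressing Enter.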

Add app-binding (launches hotkeys & programs)

Context Modes

A different brain for every app.

Build a profile per workflow. Each mode gets its own vocabulary, dictionary, snippets, writing style, icon, and app bindings — and OmniVox switches the second you switch windows.

Vocabulary · Snippets · Style · Bindings · Icon · Color

4 modes · auto-switching · live

Coding · Casual

Tech vocab, file-path bias, code-style snippets

⌥1 · Auto

Email · Formal

Formal structure, signoffs, salutations

⌥2 · Auto

Notes · Very Casual

Minimal formatting, append straight to a note

⌥3 · Auto

Chat · Very Casual

Ship mode on: send right after transcription

⌥4 · Auto

Built for your hardware

Hardware-aware. Model-agnostic.

Speech models: base · small · medium · large

LLM: Qwen3 0.6B (local)

Acceleration: Vulkan + CUDA

Languages: 100+

A developer's workspace at night — OmniVox pill projecting structured prompt cards into a code editor next to an open journal

Friday · 11:42 PM · Claude Code

Built for agentic dev tools

Talk to your agents.

Most dictation tools just paste text. OmniVox shapes what you say into the structure agentic coding tools actually want — intent classification, preserved file paths, explicit constraints. Skip the prompt-engineering tax.

tested with

Claude Code

Cursor

Codex

Aider

01 · you say

+ voxify

fix the auth middleware in src/middleware/auth.ts, the jwt refresh keeps failing on stale tokens, has to ship by friday's review, can't break the refresh-token flow, voxify

02 · structured

intent · implementation

Goal: Fix auth middleware on stale JWT refresh

Files: src/middleware/auth.ts

Constraints: Must not break refresh-token flow

Urgency: high (Friday review)

03 · agent receives

pasted by OV

Reading src/middleware/auth.ts… I see the JWT refresh handler. To preserve the existing flow, I'll …

File paths preserved
If you said it, it appears. If you didn’t, it doesn’t: the model is forbidden from inventing filenames.

Intent classified
Implementation, exploration, or advice. Each maps to a different slot shape so agents read context the right way.

Clean every time
Filler words, repetition, and third-person mishears are stripped. What lands in your editor is the prompt you meant to write.

How it stacks up

What changes when nothing leaves your machine.

Capability · OmniVox · Cloud dictation

Runs entirely on your device

Works offline / air-gapped

Account & API keys required

Monthly subscription

Structured output for agentic tools

Per-app context modes

Screen-context bias for code

Voice commands → keystrokes

Comparison based on public docs of major cloud dictation tools, May 2026.

FAQ

Honest answers, no marketing.

The questions worth answering before you install something on your machine that listens to you.

Yes. Both Whisper (speech) and Qwen3 (the optional structuring LLM) run as native code through whisper.cpp and llama.cpp on your hardware. There are no outbound network calls, no API keys, no telemetry. You can run OmniVox on an air-gapped laptop and it still works.

Whisper's base and small models run comfortably on modern CPUs. For medium / large models and live Structured Mode, a Vulkan-capable GPU (or NVIDIA + CUDA) is a noticeable upgrade. The Models tab inspects your machine and recommends a fit.

Whisper is state-of-the-art for general English. OmniVox layers two things on top: a Vocabulary prompt that biases recognition toward your domain (file paths, product names, jargon), and on Windows, screen-context bias that reads identifiers from your focused window. Technical strings transcribe verbatim where cloud tools often mishear them.

Structured Mode runs a small local LLM over your transcript and rewrites it as a clean prompt with labeled sections (Goal, Files, Constraints, Urgency) that agentic tools (Claude Code, Cursor, Codex) read directly. It only fires when you end your speech with the phrase “voxify” (or one of its phonetic aliases). Otherwise you get plain dictation.

Anywhere a keyboard works. OmniVox simulates keystrokes (or pastes from the clipboard, or both; your choice) into the focused window, then restores focus. We test against Claude Code, Cursor, VS Code, terminals, Notion, Linear, Slack, Gmail, and Discord, and the same mechanism applies to any other text field on your desktop.

Yes, that's what Context Modes are. Each mode gets its own vocabulary, dictionary (phonetic substitutions), snippets (text expansions), writing style, and app bindings. Switch from the floating pill or let OmniVox auto-switch when you switch windows.

Whisper handles 100+ languages with optional translation. OmniVox exposes the language selector in Settings and routes through the same local pipeline: no per-language cloud surfaces, no per-language pricing.

Most alternatives stream your audio to a cloud, charge a subscription, and emit raw text. OmniVox runs entirely on your machine, is free during early access, and shapes its output for agentic dev tools instead of just typing what you said. No account required.

Privacy by architecture

Your voice. Your machine.

No cloud round-trip. No API keys. No telemetry. Whisper and Qwen run on your hardware — air-gap your laptop and OmniVox still works.

Latest · v0.2.11 · May 31, 2026

Release notes →

Open PowerShell — paste — press enter

PowerShell · paste & run

irm https://tryomnivox.com/install.ps1 | iex

The script fetches the signed installer, verifies its SHA-256, and runs it. No GitHub account, no browser download, no clicks.

SHA-256: eadc61e86d7b3536…37b514d4 · full hash →
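To check a downloaded installer against the published hash yourself, a short script like the one below is enough. It is a generic sketch using Python's standard `hashlib`, not the install script's own logic. The function names are illustrative, and the comparison needs the full 64-character hash from the "full hash" link, not the shortened form shown above.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large installers are never loaded whole into memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, published_hex: str) -> bool:
    """Compare the file's digest with the full published hex digest."""
    return sha256_of(path) == published_hex.strip().lower()
```

If `matches_published_hash` returns `False`, the file differs from the one the hash was published for and should not be run.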

Windows 10 / 11 (x64) · 4 GB+ RAM · No internet required

macOS · soon · Linux · soon

No cloud
Speech never leaves your device. Disconnect and it still runs.

No keys
No accounts, no API tokens, no third-party billing surfaces.

No telemetry
Zero analytics calls. We don't know you exist. That's the point.