v0.2.11 · Free during early access on Windows · Get the install command

local-first dictation for the agentic age

Speak your intent. Ship a prompt.

OmniVox listens, structures, and types — all on your machine. Whisper for speech, a local Qwen for cleanup, output shaped for Claude Code, Cursor, and Codex. No cloud, no API keys, no telemetry.

PowerShell · paste & run
irm https://tryomnivox.com/install.ps1 | iex

Windows · macOS and Linux soon · v0.2.11 · 10 MB

Demo: press Ctrl + Alt (global hotkey) to speak. Whisper (whisper-medium) produces the raw transcript; when the voxify trigger is detected, Structured Mode fills the intent, goal, files, constraints, and urgency slots.

01 · Speech in (local Whisper)

02 · Voxify trigger fires

03 · Slot-shaped prompt out

How it works

Four steps from thought to shipped.

The whole flow is one global hotkey, your voice, and an optional word that turns plain dictation into a structured prompt. Nothing else to learn.

  1. Press the hotkey

    Ctrl + Alt by default. Remap it to any one- or two-key combo.
  2. Speak naturally

    Whisper transcribes locally, with screen-context bias for technical terms on Windows.
  3. Say “voxify” (optional)

    Qwen3 classifies your intent and slots your words into a clean prompt shape.
  4. Lands in your app

    OmniVox types or pastes the result wherever your cursor is, then restores focus.
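Step 3 depends on spotting the trigger at the very end of the transcript. Below is a minimal sketch. It assumes the trigger must be the final word, matched case-insensitively, with trailing punctuation ignored. The alias list is hypothetical: the page mentions phonetic aliases but does not list them.

```python
import re

# Hypothetical phonetic aliases; the real list is not published on this page.
TRIGGER_ALIASES = ["voxify", "vox ify", "box ify"]

# Optional punctuation/whitespace, then an alias as a whole word, then only
# trailing punctuation/whitespace until the end of the transcript.
_TRIGGER_RE = re.compile(
    r"[\s,.;:!?]*\b(?:" + "|".join(re.escape(a) for a in TRIGGER_ALIASES) + r")[\s.,!?;:]*$",
    re.IGNORECASE,
)


def split_trigger(transcript: str) -> tuple[str, bool]:
    """Return (text without the trigger, whether the trigger was present)."""
    match = _TRIGGER_RE.search(transcript)
    if match is None:
        return transcript, False  # plain dictation, no structuring
    return transcript[: match.start()], True
```

Anchoring the match to the end of the transcript means "voxify" said mid-sentence stays ordinary dictation.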

Type Simulation

Lands cleanly in every app you already use.

OmniVox simulates keystrokes the moment a transcript is ready, so your words land in the focused window, whether that's Claude Code, Cursor, your terminal, Linear, Notion, or any other text field. Choose clipboard paste, simulated typing, or both; focus is restored afterward.

Clipboard · Type-simulation · Auto-paste · Focus restore
Speed

4× faster than typing.

You think at 400 words per minute, speak at ~180, and type at 45. OmniVox closes the gap between thought and prompt — so you stop translating ideas into keystrokes and start shipping them.
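The page's own figures are about 45 wpm typing and about 180 wpm speaking. With those, the difference for a single prompt is simple arithmetic. The 90-word prompt length is a hypothetical example, and transcription and structuring latency are ignored.

```python
TYPING_WPM = 45     # typing speed quoted above
SPEAKING_WPM = 180  # speaking speed quoted above


def seconds_to_produce(words: int, wpm: float) -> float:
    """Time in seconds to produce `words` words at `wpm` words per minute."""
    return words / wpm * 60


prompt_words = 90  # hypothetical prompt length
typed = seconds_to_produce(prompt_words, TYPING_WPM)     # 90 / 45 * 60 = 120 s
spoken = seconds_to_produce(prompt_words, SPEAKING_WPM)  # 90 / 180 * 60 = 30 s
print(f"typed: {typed:.0f} s, spoken: {spoken:.0f} s, speed-up: {typed / spoken:.0f}x")
```

At these rates, a 90-word prompt takes 120 s to type and 30 s to say. That is the 4× ratio of 180 to 45.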

Keyboard · 45 wpm


OmniVox (spoken) · 180 wpm

Natural speech, captured locally, structured for the tool you're aiming it at.

Structured Mode · voxify

Rambling thought, agent-ready prompt.

Say “voxify” at the end of any thought. A local Qwen3 model recognises whether you're asking to implement, explore, or get advice, and slots your speech into a grammar-validated prompt — file paths intact, fillers gone, hallucinations refused.

Always in shape · Slots stay slots. The model can't drift into prose, double-fill, or skip required fields.
Fabrication defences · Files must appear in your speech. Third-person phrasing is rewritten. Short input is refused.
Voxify trigger · Structures only when you ask. Otherwise it's just clean dictation.
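Two of the fabrication defences can be sketched as a check that runs after the model has produced its slots: dropping unspoken file paths and refusing short input. This sketch assumes a simple slot schema and a minimum-length threshold of our own choosing. It is not OmniVox's actual validator, and it covers only those two checks.

```python
from dataclasses import dataclass, field, replace
from typing import Optional

MIN_WORDS = 4  # hypothetical threshold for "short input refused"


@dataclass
class StructuredPrompt:
    # Hypothetical slot schema mirroring the preview on this page.
    intent: str  # "implementation", "exploration", or "advice"
    goal: str
    files: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    urgency: Optional[str] = None


def apply_fabrication_defences(transcript: str, prompt: StructuredPrompt) -> Optional[StructuredPrompt]:
    """Return a cleaned copy of `prompt`, or None if the input is too short to structure."""
    if len(transcript.split()) < MIN_WORDS:
        return None  # refuse rather than let the model pad out a fragment
    # Keep only file paths that appear verbatim in what the speaker said.
    spoken_files = [path for path in prompt.files if path in transcript]
    return replace(prompt, files=spoken_files)
```

Returning a copy via `dataclasses.replace` leaves the model's raw output untouched, which makes it easier to log what was filtered.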
omnivox · structured preview · ready

Raw transcript
“Um, okay, so we need to look at the auth middleware, the JWT refresh thing. It's failing on stale tokens in src/middleware/auth.ts. Has to ship by Friday's review, can't break the refresh-token flow. Voxify.”

qwen3 · structured · 240 ms

Structured prompt · intent detected: implementation
Goal: Fix auth middleware on stale JWT refresh
Files: src/middleware/auth.ts
Constraints: Must not break refresh-token flow
Urgency: High (Friday review)
↵ Enter to paste
Privacy

Lives on your machine

Whisper and Qwen run locally with optional Vulkan or CUDA acceleration. No cloud round-trip. No API keys. No telemetry. Air-gap-friendly.

your machine
whisper.cpp · qwen3 0.6B · Vulkan · CUDA
Screen Context

Reads what's on screen

Windows-only: OmniVox peeks at your focused window and feeds file paths, identifiers, and CLI flags to Whisper as bias tokens — so technical strings transcribe verbatim.

$ claude --model opus-4-7
// editing src/middleware/auth.ts
// verifyJWT( ) failed @ line 42

Bias tokens captured: claude · opus-4-7 · src/middleware/auth.ts · verifyJWT · --model · line 42
Whisper now hears verifyJWT, not “verify J. W. T.”
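Here is a simplified sketch of the screen-context step. It assumes the focused window's text is already available as a string; capturing that text is the Windows-specific part and is omitted. The three regular expressions are our own rough matchers for paths, CLI flags, and camelCase identifiers. They catch only a subset of the tokens shown above: not `claude`, `opus-4-7`, or `line 42`. The joined string is the kind of text one could pass as Whisper's initial prompt; that this is how OmniVox feeds it in is an assumption.

```python
import re

# Our own approximations for the token kinds named above.
PATH_RE = re.compile(r"\b[\w.-]+(?:/[\w.-]+)+\.\w+\b")  # e.g. src/middleware/auth.ts
FLAG_RE = re.compile(r"(?<!\S)--?[A-Za-z][\w-]*")        # e.g. --model
IDENT_RE = re.compile(r"\b[a-z]+[A-Z]\w*\b")             # camelCase, e.g. verifyJWT


def extract_bias_tokens(screen_text: str) -> list[str]:
    """Collect unique paths, CLI flags and camelCase identifiers in order of appearance."""
    found: list[tuple[int, str]] = []
    for pattern in (PATH_RE, FLAG_RE, IDENT_RE):
        found.extend((m.start(), m.group()) for m in pattern.finditer(screen_text))
    tokens: list[str] = []
    for _, token in sorted(found):  # sort by position on screen
        if token not in tokens:
            tokens.append(token)
    return tokens


def build_initial_prompt(tokens: list[str], vocabulary: list[str]) -> str:
    """Join user vocabulary and screen tokens, de-duplicated, into one prompt string."""
    return ", ".join(dict.fromkeys(vocabulary + tokens))
```

Putting the user's own vocabulary first keeps it stable across windows, while the screen tokens change with whatever is focused.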
Customization

Three layers of vocabulary

Bias Whisper's recognition, post-correct phonetic mishears, and expand voice shortcuts into full text. Scope each entry globally or to a single mode.

Vocabulary · seeds Whisper's prompt
OmniVox · Claude Code · Qwen3 · Tauri · Vulkan · verifyJWT
Dictionary · replaces what Whisper hears
omni vox → OmniVox · claude codes → Claude Code · tow ree → Tauri
Snippets · expand short phrases
ty → thank you · tldr → too long; didn’t read · calendly → cal.com/me/intro
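The dictionary and snippet layers can be approximated as whole-phrase, case-insensitive substitutions. In this sketch the dictionary runs first to fix mishears, and snippets run second. That ordering, the matching rules, and leaving out per-mode scoping are all assumptions. The entries are copied from the examples above.

```python
import re

# Example entries from the lists above.
DICTIONARY = {"omni vox": "OmniVox", "claude codes": "Claude Code", "tow ree": "Tauri"}
SNIPPETS = {"ty": "thank you", "tldr": "too long; didn't read"}


def _phrase_pattern(entries: dict[str, str]) -> re.Pattern:
    # Longest phrases first so a longer entry wins over a shorter overlapping one.
    ordered = sorted(entries, key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(re.escape(p) for p in ordered) + r")\b", re.IGNORECASE)


def apply_entries(text: str, entries: dict[str, str]) -> str:
    """Replace whole-word, case-insensitive matches of each key with its value."""
    lookup = {key.lower(): value for key, value in entries.items()}
    return _phrase_pattern(entries).sub(lambda m: lookup[m.group(0).lower()], text)


def post_process(text: str) -> str:
    # Dictionary fixes mishears first; snippets expand afterwards.
    return apply_entries(apply_entries(text, DICTIONARY), SNIPPETS)
```

Word boundaries keep "ty" from firing inside words such as "typical".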
Voice Commands

Speak the keystroke

A short, focused command vocabulary that turns into real keystrokes. Each command has its own on/off toggle, so nothing fires by accident.

new line → Shift + Enter
new paragraph → Shift + Enter × 2
delete last word → Ctrl + Backspace
send → Enter ↵
Add app-binding (launches hotkeys & programs)
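Here is a sketch of command matching. It assumes a command fires only when it is the entire utterance, so "send" inside a sentence stays dictation. The key-chord strings are a hypothetical notation. Actually injecting the keystrokes through the OS input API is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VoiceCommand:
    phrase: str            # what the user says, lowercase
    keys: tuple[str, ...]  # key chords to send, in order (hypothetical notation)
    enabled: bool = True   # each command has its own toggle


COMMANDS = [
    VoiceCommand("new line", ("Shift+Enter",)),
    VoiceCommand("new paragraph", ("Shift+Enter", "Shift+Enter")),
    VoiceCommand("delete last word", ("Ctrl+Backspace",)),
    VoiceCommand("send", ("Enter",), enabled=False),  # toggled off in this example
]


def match_command(utterance: str, commands: list[VoiceCommand]) -> Optional[tuple[str, ...]]:
    """Return key chords for an enabled command, or None to treat the utterance as dictation."""
    # Whisper often adds capitalisation and a trailing full stop; normalise both.
    normalised = utterance.strip().rstrip(".!?").strip().lower()
    for command in commands:
        if command.enabled and normalised == command.phrase:
            return command.keys
    return None
```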
Context Modes

A different brain for every app.

Build a profile per workflow. Each mode gets its own vocabulary, dictionary, snippets, writing style, icon, and app bindings — and OmniVox switches the second you switch windows.

Vocabulary · Snippets · Style · Bindings · Icon · Color
4 modes · auto-switching · live

01 · Coding · Casual · Auto
Tech vocab, file-path bias, code-style snippets

02 · Email · Formal · Auto
Formal structure, sign-offs, salutations

03 · Notes · Very Casual · Auto
Minimal formatting, appended straight to a note

04 · Chat · Very Casual · Auto
Ship mode on: sends right after transcription
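Auto-switching can be modelled as a lookup from the focused window's executable to a mode. The executable names below are hypothetical bindings. Detecting the focused window is platform-specific and is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class ContextMode:
    name: str
    style: str
    app_bindings: set[str] = field(default_factory=set)  # lowercase executable names (hypothetical)
    send_after_transcription: bool = False                # "ship mode"


MODES = [
    ContextMode("Coding", "Casual", {"code.exe", "cursor.exe", "windowsterminal.exe"}),
    ContextMode("Email", "Formal", {"outlook.exe"}),
    ContextMode("Notes", "Very Casual", {"obsidian.exe"}),
    ContextMode("Chat", "Very Casual", {"slack.exe", "discord.exe"}, send_after_transcription=True),
]


def mode_for_window(executable: str, modes: list[ContextMode], default: ContextMode) -> ContextMode:
    """Pick the first mode bound to the focused app, else fall back to `default`."""
    exe = executable.lower()
    for mode in modes:
        if exe in mode.app_bindings:
            return mode
    return default
```

First-match-wins keeps the behaviour predictable if two modes ever bind the same app.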

Built for your hardware

Hardware-aware. Model-agnostic.

Speech models · base · small · medium · large
LLM · Qwen3 0.6B (local)
Acceleration · Vulkan + CUDA
Languages · 100+
Built for agentic dev tools

Talk to your agents.

Most dictation tools just paste text. OmniVox shapes what you say into the structure agentic coding tools actually want — intent classification, preserved file paths, explicit constraints. Skip the prompt-engineering tax.

Tested with: Claude Code · Cursor · Codex · Aider
01 · You say (+ voxify)

“fix the auth middleware in src/middleware/auth.ts, the jwt refresh keeps failing on stale tokens, can't break the refresh-token flow, has to ship by friday, voxify”

02 · Structured (intent: implementation)
Goal: Fix auth middleware on stale JWT refresh
Files: src/middleware/auth.ts
Constraints: Must not break refresh-token flow
Urgency: High (Friday review)

03 · Agent receives (pasted by OmniVox)

“Reading src/middleware/auth.ts… I see the JWT refresh handler. To preserve the existing flow, I'll …”

File paths preserved · If you said it, it appears. If you didn’t, it doesn’t: the model is forbidden from inventing filenames.
Intent classified · Implementation, exploration, or advice. Each maps to a different slot shape so agents read context the right way.
Clean every time · Filler words, repetition, and stray third-person phrasing are stripped. What lands in your editor is the prompt you meant to write.
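Intent-specific slot shapes can be sketched as an ordered list of labels per intent, plus a set of required slots. Only the implementation shape (Goal, Files, Constraints, Urgency) comes from this page. The exploration and advice shapes below are illustrative assumptions.

```python
# Only the "implementation" shape is taken from this page; the others are hypothetical.
SLOT_SHAPES: dict[str, tuple[str, ...]] = {
    "implementation": ("Goal", "Files", "Constraints", "Urgency"),
    "exploration": ("Question", "Files", "Context"),
    "advice": ("Question", "Context", "Constraints"),
}
REQUIRED: dict[str, set[str]] = {
    "implementation": {"Goal"},
    "exploration": {"Question"},
    "advice": {"Question"},
}


def render_prompt(intent: str, slots: dict[str, str]) -> str:
    """Render filled slots as labelled sections in the intent's fixed order."""
    shape = SLOT_SHAPES[intent]
    missing = REQUIRED[intent] - set(slots)
    if missing:
        raise ValueError(f"missing required slot(s): {', '.join(sorted(missing))}")
    unknown = set(slots) - set(shape)
    if unknown:
        raise ValueError(f"slot(s) not in the {intent} shape: {', '.join(sorted(unknown))}")
    # Empty optional slots are skipped instead of rendered as blank sections.
    return "\n".join(f"{name}: {slots[name]}" for name in shape if slots.get(name))
```

Rejecting unknown slots, rather than appending them, is what keeps "slots stay slots" true in this sketch.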
How it stacks up

What changes when nothing leaves your machine.

Capability · OmniVox · Cloud dictation
Runs entirely on your device
Works offline / air-gapped
Account & API keys required
Monthly subscription
Structured output for agentic tools
Per-app context modes
Screen-context bias for code
Voice commands → keystrokes

Comparison based on public docs of major cloud dictation tools, May 2026.

FAQ

Honest answers, no marketing.

The questions worth answering before you install something on your machine that listens to you.

  1. Yes — both Whisper (speech) and Qwen3 (the optional structuring LLM) run as native code through whisper.cpp and llama.cpp on your hardware. There are no outbound network calls, no API keys, no telemetry. You can run OmniVox on an air-gapped laptop and it still works.
  2. Whisper's base and small models run comfortably on modern CPUs. For medium / large models and live Structured Mode, a Vulkan-capable GPU (or NVIDIA + CUDA) is a noticeable upgrade. The Models tab inspects your machine and recommends a fit.
  3. Whisper is state-of-the-art for general English. OmniVox layers two things on top: a Vocabulary prompt that biases recognition toward your domain (file paths, product names, jargon), and on Windows, screen-context bias that reads identifiers from your focused window. Technical strings transcribe verbatim where cloud tools mishear them.
  4. Structured Mode runs a small local LLM over your transcript and rewrites it as a clean prompt with labeled sections — Goal, Files, Constraints, Urgency — that agentic tools (Claude Code, Cursor, Codex) read directly. It only fires when you end your speech with the phrase “voxify” (or one of its phonetic aliases). Plain dictation otherwise.
  5. Anywhere a keyboard works. OmniVox simulates keystrokes (or pastes from the clipboard, or both; your choice) into the focused window, then restores focus. We test against Claude Code, Cursor, VS Code, terminals, Notion, Linear, Slack, Gmail, and Discord, and it works in any other standard text field on your desktop.
  6. Yes — that's what Context Modes are. Each mode gets its own vocabulary, dictionary (phonetic substitutions), snippets (text expansions), writing style, and app bindings. Switch from the floating pill or let OmniVox auto-switch when you switch windows.
  7. Whisper handles 100+ languages, with optional translation into English. OmniVox exposes the language selector in Settings and routes every language through the same local pipeline: no per-language cloud surfaces, no per-language pricing.
  8. Most alternatives stream your audio to a cloud, charge a subscription, and emit raw text. OmniVox runs entirely on your machine, is free during early access, and shapes its output for agentic dev tools instead of just typing what you said. No account required.
Privacy by architecture

Your voice. Your machine.

No cloud round-trip. No API keys. No telemetry. Whisper and Qwen run on your hardware — air-gap your laptop and OmniVox still works.

Latest · v0.2.11 · May 31, 2026
Release notes →
Open PowerShell, paste, press Enter
PowerShell · paste & run
irm https://tryomnivox.com/install.ps1 | iex

The script fetches the signed installer, verifies its SHA-256, and runs it. No GitHub account, no browser download, no clicks.
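If you want to repeat the installer's integrity check by hand, a minimal SHA-256 comparison is enough. The path is whatever file you downloaded. The page shows only a truncated prefix of the hash, so compare against the full value linked from the release notes. On Windows, PowerShell's built-in `Get-FileHash` cmdlet does the same job.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in 1 MiB chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_installer(path: Path, expected_hex: str) -> bool:
    """True if the file's hash matches the published one (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```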

SHA-256 · eadc61e86d7b353637b514d4 · full hash →
Windows 10 / 11 (x64) · 4 GB+ RAM · No internet required after install
No cloud · Speech never leaves your device. Disconnect and it still runs.
No keys · No accounts, no API tokens, no third-party billing surfaces.
No telemetry · Zero analytics calls. We don't know you exist. That's the point.