On English with a decent microphone in a quiet environment, expect 5–10% WER from any modern tool. Custom dictionaries and AI cleanup take the perceived quality much higher than that raw number suggests.
Voice typing accuracy explained
Modern voice typing tools advertise "high accuracy," but the actual quality you see depends on five factors you can largely control: microphone, environment, accent fit, custom vocabulary, and choice of cleanup model.
How accuracy is measured
Word Error Rate (WER)
The standard metric for speech recognition is Word Error Rate (WER): the number of substituted, inserted, and deleted words, divided by the total number of words in a perfect reference transcript. Lower is better.
Modern English speech recognition models report WER in the 4–8% range on clean conversational audio. For comparison, human transcribers typically achieve 4–6% WER on the same audio.
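WER is just word-level edit distance divided by reference length. A minimal sketch in pure Python (illustrative, not any particular tool's implementation):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference word count,
    computed via word-level edit distance (dynamic programming)."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # d[i][j] = edit distance between the first i ref words and first j hyp words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / len(ref)

print(word_error_rate("the quick brown fox", "the quick brown box"))  # 0.25
```

One wrong word out of four gives 25% WER, which is why a handful of mistranscribed jargon terms in a short dictation can feel much worse than a "5% WER" headline figure suggests.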
WER varies dramatically by:
- Language: English, Spanish, and Mandarin tend to have the lowest WER. Less-resourced languages can be 15–25%.
- Accent: Accents well represented in the training data perform best. Regional or non-native accents may add 2–10 points to WER.
- Audio quality: Clean studio audio, a noisy open office, and phone-quality audio can differ by 5–15 points.
- Vocabulary: General conversation has the lowest WER. Technical, medical, or legal jargon adds 3–8 points without a custom dictionary.
Microphone matters more than you think
The biggest single accuracy lever
Built-in laptop microphones pick up keyboard noise, fan noise, and reverb. A USB headset or lapel mic typically reduces WER by 2–5 points just from cleaner input.
For most users, the highest-impact accuracy upgrade isn't a different software tool — it's a $40 USB microphone or AirPods Pro with the integrated mic.
Custom dictionaries close most gaps
For words the model consistently mishears
If a speech model consistently transcribes "Cypher" as "cipher," "kubectl" as "cube cuddle," or your colleague's name "Aanya" as "on you" — adding those terms to a custom dictionary fixes the problem permanently.
Tools like FluidVox auto-learn from your corrections: when you fix a transcription, the system remembers and applies the correction across future sessions. That's how accuracy compounds over weeks of use.
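FluidVox's internals aren't public, but the core mechanism is a post-processing pass over the transcript. A sketch, with a purely illustrative corrections mapping:

```python
import re

# Hypothetical correction dictionary: consistent mishearing -> intended term.
CORRECTIONS = {
    "cube cuddle": "kubectl",
    "on you": "Aanya",
}

def apply_corrections(text: str, corrections: dict[str, str]) -> str:
    # Longest phrases first, so multi-word fixes win before shorter overlaps.
    for wrong in sorted(corrections, key=len, reverse=True):
        pattern = r"\b" + re.escape(wrong) + r"\b"
        text = re.sub(pattern, corrections[wrong], text, flags=re.IGNORECASE)
    return text

print(apply_corrections("run cube cuddle get pods", CORRECTIONS))
# run kubectl get pods
```

A tool that auto-learns simply appends a new entry to this mapping every time you correct the same mistake, which is why the fix is permanent rather than per-session.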
On-device vs cloud accuracy
A modest gap, narrowing fast
Cloud speech recognition models (Deepgram, Azure Speech) are still slightly more accurate than on-device alternatives in 2026, but the gap has narrowed significantly. Whisper large v2 running on Apple Silicon achieves WER within 1–3 points of cloud APIs on most languages.
For most users, on-device is now the right default — the privacy gains are real and the accuracy delta is small.
AI cleanup helps perceived accuracy
Even when raw WER is the same
Two tools with identical raw WER can produce very different output quality if one applies AI cleanup and the other doesn't. Cleanup fixes:
- Filler words ("uh," "um," "like")
- Restarted sentences
- Missing punctuation
- Casing on names and acronyms
- Common transcription mistakes for specific phrases
This is the practical difference between Apple's built-in Dictation (no AI cleanup) and a tool like FluidVox (full LLM cleanup). The raw WER may be similar; the output quality is markedly different.
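The real cleanup in such tools is LLM-based, but the baseline idea can be sketched with a simple rule for standalone fillers (illustrative only, not FluidVox's pipeline):

```python
FILLERS = {"uh", "um", "erm"}

def strip_fillers(text: str) -> str:
    # Drop standalone filler tokens, keeping everything else verbatim.
    # Context-sensitive fillers such as "like" need an LLM, not a rule:
    # naive removal would mangle "I like this idea".
    words = [w for w in text.split() if w.strip(".,!?").lower() not in FILLERS]
    return " ".join(words)

print(strip_fillers("uh the deploy um failed"))
# the deploy failed
```

Rules like this handle the easy cases; restarted sentences, punctuation, and casing are where an LLM pass earns its keep.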
How to improve your accuracy
Practical levers you control
- Use a real microphone. A $40 USB headset gets you most of the way.
- Reduce ambient noise. Turn off the fan, close the window.
- Speak at conversational pace. Not too fast, not robotically slow.
- Build your custom dictionary. Add the 50 terms you use most that get mistranscribed.
- Pick the right tool. See our 2026 Mac comparison or Windows comparison.
Frequently asked questions
Does voice typing work with my accent?
Heavily under-represented accents in training data (e.g., very strong regional dialects) may have higher WER. Custom dictionaries help compensate. Some users report that Wispr Flow and FluidVox handle accents better than older Apple Dictation.
Does a better microphone or headset really help?
Yes. AirPods with active noise cancellation, or a USB headset with a directional mic, both reduce ambient pickup and improve accuracy.
What if on-device transcription is slow or drops words?
On-device Whisper models load slower on older hardware, and resource contention can cause dropped audio. Try the cloud transcription option (FluidVox Pro, Wispr Flow) to remove the hardware constraint.
Does accuracy improve the more I use a tool?
With FluidVox's auto-learning dictionary, yes: the more corrections you make, the more terms it preserves correctly. With tools that don't auto-learn, you have to manually maintain the dictionary.
Is any tool 100% accurate?
No. Even human transcribers make 4–6% errors. The right question is which tool gets closest to your specific environment, accent, and vocabulary, and how easy it is to correct the errors that remain.
Try FluidVox free for 14 days
Full access, no credit card required. Then $2.99/month or $39 one-time.
Start free trial