Custom styles, mute system audio on record, WASAPI mic capture, and Whisper language filtering.
Changelog
Major product updates across Windows and macOS.
This page tracks the major FluidVox milestones, drawn from the git history, as a durable release record that we extend over time.
Local file transcription — transcribe imported audio files using on-device models, free for all users.
March 27, 2026
Custom styles, system audio muting, and audio capture upgrade (v2.4.11–2.4.13)
- Custom Styles let you create your own transcription style with a free-text prompt — translate to another language, rewrite as a haiku, add annotations, or anything else you can describe. Up to 5 custom styles per user, assignable per-app.
- Custom styles now work correctly with personal Gemini API keys — translation and annotation prompts are no longer overridden by built-in language constraints.
- A new "Mute System Audio on Record" toggle silences other apps while you dictate, so background noise doesn't interfere with transcription.
- Microphone capture migrated from WaveIn to WASAPI, giving instant mic release after recording ends — other apps regain mic access immediately.
- English-only Whisper models (Tiny, Base, Small) no longer show Auto Detect or non-English languages in the language picker, preventing misconfiguration.
- The Win key combo hotkey now stops cleanly on key release, fixing edge cases where the hotkey could get stuck.
- Device identification migrated to a more reliable hardware fingerprint.
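The language-picker rule for English-only Whisper models can be sketched as follows. This is a minimal illustration rather than the app's actual code: the model identifiers, the `ENGLISH_ONLY_MODELS` set, and the `picker_languages` function are hypothetical names chosen for the example.

```python
# Hypothetical model identifiers; the app's real names may differ.
ENGLISH_ONLY_MODELS = {"tiny.en", "base.en", "small.en"}

AUTO_DETECT = "auto"


def picker_languages(model_id: str, all_languages: list[str]) -> list[str]:
    """Return the language options to show for the given model.

    English-only models expose just English. Multilingual models expose
    Auto Detect followed by every supported language.
    """
    if model_id in ENGLISH_ONLY_MODELS:
        return ["en"]
    return [AUTO_DETECT, *all_languages]
```

Filtering at the picker level means an invalid combination, such as Auto Detect on an English-only model, cannot be selected in the first place.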
March 26, 2026
Whisper language detection fix and text injection reliability (v2.4.10)
- Fixed Whisper language auto-detection: the processor was configured with an empty language string instead of "auto", which made detection unreliable for multilingual users.
- Text injection now releases any stuck modifier keys (Alt, Shift, Win) before pasting, preventing the destination app's menu bar from activating instead of receiving the paste.
- The Alt key release is padded with a dummy scancode so that clearing the stuck state does not trigger Windows menu bar activation.
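The auto-detect fix amounts to never handing the processor a blank language value. A minimal sketch of that normalization, assuming a hypothetical helper named `normalize_whisper_language` (the real configuration code is not shown in this changelog):

```python
from typing import Optional


def normalize_whisper_language(language: Optional[str]) -> str:
    """Map an unset or blank language setting to "auto".

    Hypothetical helper: any explicit language code is trimmed and
    lower-cased, and everything else falls back to auto-detection.
    """
    if language is None or not language.strip():
        return "auto"
    return language.strip().lower()
```

With this in place, `normalize_whisper_language("")` and `normalize_whisper_language(None)` both yield `"auto"`, while an explicit code such as `"de"` passes through unchanged.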
March 26, 2026
Power state resilience and performance scaling (v2.4.9)
- The app now automatically recovers when switching between AC power and battery — no restart required.
- Audio devices, hotkeys, and efficiency mode settings are revalidated after every power transition.
- Local transcription models (Parakeet and Whisper) are automatically reinitialized when the power source changes.
- Whisper CUDA sessions gracefully fall back to CPU if the GPU becomes unavailable on battery, and restore GPU when AC power returns.
- Parakeet thread count now scales dynamically based on your CPU cores and power state for optimal performance.
- The system tray icon automatically recovers if Windows Explorer restarts.
- Fixed an issue where holding the Alt hotkey for long recordings could trigger the Windows menu bar.
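One way to scale a thread count by CPU cores and power state is sketched below. The policy is an assumption made for illustration (leave one core free on AC power, use about half the cores on battery, cap at 8). The changelog does not state the actual numbers, and `parakeet_thread_count` is a hypothetical name.

```python
def parakeet_thread_count(cpu_cores: int, on_battery: bool) -> int:
    """Pick a worker thread count from core count and power state.

    Assumed policy (illustrative only): one core is left free on AC
    power, about half the cores are used on battery, and the result is
    always clamped to the range 1..8.
    """
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    wanted = cpu_cores // 2 if on_battery else cpu_cores - 1
    return max(1, min(wanted, 8))
```

Under these assumptions an 8-core machine would use 7 threads on AC power and 4 on battery. Recomputing the value after each power transition is what makes the scaling dynamic.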
March 25, 2026
Streaming local transcription and microphone improvements (v2.4.3–2.4.5)
- Local transcription now uses real-time streaming instead of file-based recording, significantly reducing latency.
- Recording starts instantly with no delay between pressing the hotkey and audio capture beginning.
- The selected microphone is now correctly used across all recording modes, including Help Mode voice input.
- Automatic gain control with fade-in improves audio quality, especially for quiet microphones.
- Audio with no detected speech is now skipped instead of producing a blank transcription.
- Alt-key combo hotkeys no longer trigger Word's KeyTips ribbon after recording stops.
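Skipping empty audio requires some check that runs before transcription. The sketch below uses a plain RMS energy threshold, which is a simple stand-in and not necessarily the speech detection the app uses. The function name and the `rms_threshold` default are assumptions.

```python
import math


def has_speech_energy(samples: list[float], rms_threshold: float = 0.01) -> bool:
    """Return True if the clip's RMS level exceeds the threshold.

    Samples are assumed to be floats in the range [-1.0, 1.0]. An empty
    clip never counts as speech.
    """
    if not samples:
        return False
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms > rms_threshold
```

A caller would run this check on the captured buffer and return early when it is false, so no blank transcription is produced.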
March 24, 2026
CUDA auto-detection and transcription reliability (v2.4.0–2.4.2)
- The app now auto-detects your GPU and CUDA availability, with a built-in GPU setup guide for optimal local transcription performance.
- Cold-start model warmup eliminates the first-transcription delay after launching the app.
- Fixed garbled transcriptions caused by faulty sample-rate mismatch detection.
- Fixed a crash that could occur when recording very short silence-only audio with Whisper.
- Improved spacing between consecutive dictations.
March 22, 2026
Local file transcription (v2.3.0)
- Imported audio and video files can now be transcribed using on-device Whisper or Parakeet models — no cloud access required.
- A new Local/Cloud toggle in the Audio Transcriptions tab lets you choose between on-device and cloud transcription for files.
- Notes now save correctly for Local tier users.
March 21, 2026
Local Monthly subscription plan (v2.2.0)
- A new Local Monthly plan is available for users who want local-only transcription on a monthly billing cycle.
- Device management support was added for multi-device licensing.
March 12, 2026
Help Mode: real-time screen-sharing guidance (v2.1.0)
- Help Mode lets you share your screen and get spoken step-by-step guidance in real time, powered by Gemini Live API.
- Local transcription quality improved with proper audio resampling and automatic gain control.
- Fixed an issue where the update dialog could appear in a loop on launch.
- Fixed a race condition where Alt-key hotkeys could leave the recording stuck.
March 11, 2026
One-click support diagnostics (v2.0.1)
- A new "Copy Support Info" button in Settings copies a full diagnostic snapshot (account, hardware, settings) to the clipboard for easy pasting into support messages.
- Internal analytics and telemetry were improved for better reliability.
March 10, 2026
Parakeet local transcription — fast, accurate, no GPU required (v2.0.0)
- Parakeet CPU-based transcription models are now available as a second local engine alongside Whisper, powered by sherpa-onnx.
- Two Parakeet models ship: a fast English-only model and a multilingual model supporting 25 languages — both run efficiently on CPU with no GPU needed.
- The onboarding hardware check now recommends Parakeet for machines without a capable GPU, so every user gets a good local transcription experience out of the box.
- The Manage Local Models dialog shows both Parakeet and Whisper models with hardware-aware "Recommended" badges and descriptions explaining when each engine works best.
- Active Parakeet models are preloaded in the background at app startup to eliminate cold-start latency on the first transcription.
- Model extraction was rebuilt to handle large archives reliably, fixing a rare issue where downloaded model files could be silently truncated.
March 10, 2026
Local Whisper language detection and Gemini response filtering (v1.3.3)
- Local Whisper transcription now detects the spoken language in auto-detect mode and passes it to Gemini for correct cleanup.
- The "Auto" transcription engine option was removed — users now choose Cloud or Local directly, simplifying settings.
- Gemini meta-responses (e.g. "Please provide the audio file...") are now filtered out instead of being injected as text.
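Filtering meta-responses can be approximated with a small pattern list applied before text injection. The patterns and function names below are hypothetical. Only the "Please provide the audio file..." example comes from the entry above.

```python
import re
from typing import Optional

# Hypothetical patterns; the phrases the app actually matches are not listed here.
META_RESPONSE_PATTERNS = [
    re.compile(r"^\s*please provide (the|an) audio", re.IGNORECASE),
    re.compile(r"^\s*no audio (was|is) (provided|detected)", re.IGNORECASE),
]


def is_meta_response(text: str) -> bool:
    """True if the model output looks like a reply about the request itself."""
    return any(p.search(text) for p in META_RESPONSE_PATTERNS)


def text_to_inject(model_output: str) -> Optional[str]:
    """Return the text to inject, or None when it should be dropped."""
    if is_meta_response(model_output):
        return None
    return model_output
```

Anchoring each pattern at the start of the output keeps ordinary dictation that merely mentions audio files from being dropped.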
March 9, 2026
Rich plan dialog, media pause, Microsoft OAuth, and hotkey improvements (v1.3.0)
- A new Choose Your Plan dialog matches the macOS design with rich plan cards and feature breakdowns.
- Media now automatically pauses when you start recording and resumes when you stop.
- Microsoft OAuth was added for Calendar and Tasks integration alongside Google.
- Configurable Win key hotkeys and additional hotkey combinations were added in Settings.
- Text injection was improved with streaming and progressive insertion for faster results.
March 8, 2026
Smoother Google Sign-In and smarter model recommendations (v1.2.9)
- Google Sign-In no longer triggers the "unverified app" warning — only basic profile scopes are requested at login.
- Calendar and Tasks permissions are now requested incrementally, only when the user first uses those features.
- The Whisper model management dialog now shows a GPU-aware "Recommended" badge next to the best model for your hardware.
March 7, 2026
Windows agent file handling got a major usability pass
- The file-management tool now supports fuzzy file search, better directory filtering, and optional search parameters.
- PowerShell output is now surfaced more clearly, including deferred display for larger results.
- The Vox result window can now render open-file and open-folder actions directly from agent results.
- Agent prompt rules were tightened so the agent repeats tool output back to the user instead of assuming the user can already see it.
- Math and markdown rendering was updated to support the new file-action links inside results.
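Fuzzy file search with an optional parameter can be illustrated with case-insensitive subsequence matching. This is one common approach, chosen for the sketch. The changelog does not say which matching algorithm the file-management tool uses, and `fuzzy_find` and its `limit` parameter are hypothetical.

```python
from typing import Optional


def is_subsequence(query: str, name: str) -> bool:
    """True if the characters of query appear in name in order (case-insensitive)."""
    remaining = iter(name.lower())
    # "ch in remaining" consumes the iterator, so matches must occur in order.
    return all(ch in remaining for ch in query.lower())


def fuzzy_find(query: str, names: list[str], limit: Optional[int] = None) -> list[str]:
    """Return matching file names, shortest first, optionally capped at limit."""
    matches = [n for n in names if is_subsequence(query, n)]
    matches.sort(key=lambda n: (len(n), n.lower()))
    return matches[:limit]
```

For example, the query `rpt` matches both `Report.pdf` and `report_final.docx`, and the shorter name ranks first.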
March 6, 2026
Windows moved into the 1.2.x release line
- Windows 1.2.4 through 1.2.6 shipped with Gemini key UX fixes and a Windows version bump.
- Large local Whisper model crashes were addressed with VRAM detection and recovery logic.
- Bring-your-own Gemini key routing was wired into the Windows product flow.
- RDP clipboard handling was fixed for remote desktop users.
March 5, 2026
Update prompts, verification flow, and toolbar behavior improved
- Windows 1.2.0 added an update dialog on launch so new builds are easier to discover.
- Email verification for password sign-up shipped, then moved to the Resend-based verification flow.
- The floating toolbar became draggable.
- Speculative miss handling was optimized to keep the interaction loop tighter.
March 2-4, 2026
Vox Agent, auto-updates, and a broader Windows UI overhaul landed
- A major Windows feature update shipped with hotkey fixes, Vox Agent support, and UI overhaul work.
- Velopack auto-update support was integrated into the Windows app.
- Tray menu behavior and per-app style handling were improved.
- Cloud function prompt issues were fixed as the Windows agent stack matured.
February 23, 2026
Recording flow and hotkey behavior got much tighter
- Instant recording landed, with better start and stop sound timing.
- Alt-key recording issues were fixed, including menu activation on key release.
- Short silent recordings now skip unnecessary fallbacks instead of creating noisy failures.
- System output is muted during recording to reduce interference.
February 23, 2026
Update delivery and release infrastructure were added
- A visible Check for Updates button was added to the Windows app.
- The auto-update channel was corrected to the Windows release stream.
- Release pipeline work shipped with code signing and packaging improvements.
- License binding and auto-start behavior were hardened for real installs.
February 21-23, 2026
Vox actions expanded beyond plain dictation
- Hey Vox gained screenshot capture support for visual tasks.
- Notepad fallback handling improved text insertion reliability.
- Audio transcription UI work shipped so longer-form audio flows are visible in the app.
February 20-22, 2026
Security, onboarding, and subscription checks were upgraded
- Security hardening landed across the Windows app and release flow.
- Local Whisper support was added as part of the on-device transcription path.
- Email verification gating shipped for new accounts.
- Trial dates were added to new user profiles so account state is clearer.
February 10-11, 2026
The Windows app launched and quickly closed the first UX gaps
- The initial WinUI 3 Windows app foundation shipped.
- Dashboard UI parity work brought the Windows experience closer to macOS.
- Tray icon context menu, text injection, microphone selection, and latency issues were fixed early.
March 22, 2026
Local file transcription for all users (v2.5.0)
- Imported audio and video files can now be transcribed using on-device Whisper or Parakeet models — no cloud access or Pro subscription required.
- A new Local/Cloud toggle in the Audio Transcriptions tab lets you choose between on-device and cloud transcription.
- Free users with a downloaded local model now have full access to the file transcription feature.
- The audio language picker automatically filters to languages supported by the active local engine.
- Model loading happens on demand — if the selected model isn't in memory, it loads automatically before transcription begins.
March 9, 2026
Mouse button hotkeys, web integrations, and agent tool upgrades
- Any mouse button (M4/M5 side buttons, middle click, etc.) can now be assigned as the recording trigger — hold for walkie-talkie, tap for hands-free.
- Mouse button events are intercepted and suppressed so they don't fire their default action (e.g. Back/Forward) in other apps.
- Web integration services were added for Gmail, Google Calendar, and Microsoft services via OAuth.
- The Vox Agent gained enhanced calendar, reminders, email, and AppleScript tools for richer voice-driven actions.
- A device switch confirmation sheet was added for smoother multi-device handling.
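The hold-versus-tap distinction for a recording trigger can be sketched as a duration check made when the button is released. This is a simplification: the 0.3-second threshold and the `classify_press` name are assumptions, and a real implementation would start recording on press rather than wait for release.

```python
def classify_press(held_seconds: float, hold_threshold: float = 0.3) -> str:
    """Classify a button press by how long it was held.

    A press held for at least hold_threshold seconds is treated as
    walkie-talkie mode (record while held). A shorter press is treated
    as a tap that toggles hands-free recording.
    """
    if held_seconds < 0:
        raise ValueError("held_seconds must not be negative")
    return "walkie-talkie" if held_seconds >= hold_threshold else "hands-free"
```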
March 3-4, 2026
Core stability work focused on freezes and agent recovery
- Main-thread freezes caused by mouse monitoring and audio-level storms were fixed.
- Engine health checks now include a startup grace period to avoid false positives.
- The Vox Agent loop can now detect and recover from malformed Gemini tool calls.
- Audio engine race conditions in the correction monitor were addressed.
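A startup grace period for health checks can be sketched as follows: failures seen shortly after start are ignored, since the engine may still be initializing. The class name, the method name, and the injectable `clock` parameter are hypothetical and exist only to make the example self-contained and testable.

```python
import time
from typing import Callable


class EngineHealthMonitor:
    """Suppress unhealthy reports until a startup grace period has passed."""

    def __init__(self, grace_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._started_at = clock()
        self._grace_seconds = grace_seconds

    def should_report_unhealthy(self, engine_responding: bool) -> bool:
        """Return True only for failures that occur after the grace period."""
        if engine_responding:
            return False
        in_grace = (self._clock() - self._started_at) < self._grace_seconds
        return not in_grace
```

Using a monotonic clock avoids false positives when the system clock changes, for example after the machine wakes from sleep.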
February 28-March 1, 2026
Plans, API key flexibility, and usage messaging were refined
- Pro, Pro+, Local, and trial users got clearer plan management paths with a View Plans button.
- Usage messaging switched from exact token estimates to simpler relative usage labels.
- Pro, Pro+, and trial users can optionally bring their own Gemini API key.
- Pricing and token budgets were simplified around the lower Pro price point.
February 28, 2026
Vox Agent became much more capable and less noisy
- New agent tools shipped alongside better PDF and presentation creation flows.
- Math rendering and cloud backend support expanded the output quality of generated documents.
- Brief action confirmations moved to toast notifications instead of heavier interruptions.
- The agent stopped opening unnecessary browser tabs for simple knowledge questions.
- Screenshot sending in iMessage was fixed and content safety rules were added.
February 26-27, 2026
Document, spreadsheet, and style workflows expanded fast
- Live spreadsheet editing shipped for Numbers and Excel, including chart support.
- Presentation and PDF editing tools were added, followed by real file creation workflows.
- Per-app writing style was applied directly to Vox Agent composition.
- Style switcher chips were added to the floating toolbar and synced with the dashboard.
- Email reply workflows and AppleScript feedback loops were fixed.
February 24-25, 2026
Screenshots, image generation, and subscription packaging matured
- Screenshot capture moved to ScreenCaptureKit for more reliable visual context.
- Full-page capture and read-screen fallback tooling were added for the agent.
- Generated images can now render inline in the Vox Result window with a copy action.
- Image generation switched to Gemini's native image API.
- Pro+ and unified token-budget subscription routing were introduced.
February 23, 2026
Vox Agent rolled out in phases and became a headline feature
- The initial Vox Agent release shipped with multi-step tools and confirmation handling.
- Conversation memory, prompt improvements, and new tool phases followed the same day.
- Document-generation tools, command hotkeys, file reading, and language support expanded the feature set.
- Screenshot behavior, errors, hotkeys, and sound handling were tightened immediately after launch.
February 8-19, 2026
The macOS app foundation filled in updates, local models, and dictation polish
- Sparkle auto-updates, release pipeline work, and download packaging shipped.
- Bring-your-own Gemini key support and local transcription models were added.
- Automatic language detection, style matching improvements, and correction learning kept improving output quality.
- Custom hotkeys, hide-from-dock behavior, Hey Vox voice commands, and audio transcription all landed in this stretch.