Despia exposes two interoperable speech recognition surfaces on iOS and Android, both backed by the platform’s native recognizer. The speechrecognition:// URL-scheme bridge gives you a flat, four-event control flow. The window.SpeechRecognition polyfill is a drop-in Web Speech API replacement, so existing code targeting Safari or Chrome and libraries like react-speech-recognition run unmodified inside your app. The same JavaScript runs identically on both platforms.
The first session triggers a microphone permission prompt on both platforms, plus an additional Speech Recognition prompt on iOS. Until permissions are granted, no audio is captured. The decision is remembered for subsequent sessions.
Installation
- Bundle
- CDN
How it works
Register a global callback before issuing the first command, then trigger sessions through the speechrecognition:// scheme. Events arrive as flat objects on the callback you registered, and any events emitted before the callback is set are silently dropped.
Call despia('speechrecognition://stop') to finalize the in-flight utterance, or despia('speechrecognition://abort') to cancel immediately with no final result. Calling start while a session is active emits an error with message: "already_started".
Start parameters
All parameters are optional and passed as query string values on speechrecognition://start. Boolean params accept true, 1, or yes, case-insensitive.
| Param | Default | Meaning |
|---|---|---|
| language | system locale | BCP-47 tag, for example en-US, de-DE, ja-JP. Omit to use the device default. |
| continuous | false | Keep listening across utterances until stop or abort. |
| interim | false | Stream non-final partial results. |
| max | 1 | Cap on alternatives. iOS decides how many to actually return. |
| known_words | none | Comma-separated list of custom words or phrases to bias the recognizer toward. Accepts the alias knownWords. See Biasing toward custom vocabulary. |
The system locale (Locale.current on iOS) is used when language is omitted, which may differ from the page’s <html lang> value.
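The parameter table above can be turned into a small URL builder. This is a minimal sketch, not part of the Despia API: `buildStartUrl` and its option names are hypothetical helpers, and only the query parameter names (`language`, `continuous`, `interim`, `max`) come from the table.

```javascript
// Hypothetical helper: builds a speechrecognition://start URL.
// Unset options are omitted so the documented defaults apply.
function buildStartUrl(options = {}) {
  const params = [];
  if (options.language) {
    params.push(`language=${encodeURIComponent(options.language)}`);
  }
  if (options.continuous) params.push('continuous=true');
  if (options.interim) params.push('interim=true');
  // max defaults to 1, so only emit it for larger integer values.
  if (Number.isInteger(options.max) && options.max > 1) {
    params.push(`max=${options.max}`);
  }
  const query = params.join('&');
  return query
    ? `speechrecognition://start?${query}`
    : 'speechrecognition://start';
}

// Inside the app, pass the result to the bridge function:
// despia(buildStartUrl({ language: 'de-DE', interim: true }));
```

Omitting defaults keeps the URL short and makes it obvious in logs which options a session actually changed.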
Biasing toward custom vocabulary
Product names, technical jargon, proper nouns, and other words outside the system dictionary often get transcribed phonetically (Despia becomes desk pier, SwiftUI becomes swift you why). Passing a known_words list nudges the recognizer to prefer your terms when the audio is ambiguous, without affecting recognition of anything else.
URL-scheme bridge
Pass known_words as a comma-separated query parameter. The parameter also accepts the alias knownWords.
Encode spaces as %20; accented characters use UTF-8 percent encoding. Values are trimmed, de-duplicated, and empty entries are dropped.
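One way to produce a correctly encoded known_words value is sketched below. `encodeKnownWords` is a hypothetical helper, not a Despia function. It mirrors the documented cleanup (trim, de-duplicate, drop empties) on the JavaScript side and assumes each term should be percent-encoded individually, with literal commas as separators.

```javascript
// Hypothetical helper: turns an array of terms into a known_words value.
function encodeKnownWords(words) {
  const seen = new Set();
  const encoded = [];
  for (const raw of words) {
    const term = String(raw).trim();
    if (term === '' || seen.has(term)) continue; // drop empties and repeats
    seen.add(term);
    // encodeURIComponent emits %20 for spaces and UTF-8 percent
    // encoding for accented characters.
    encoded.push(encodeURIComponent(term));
  }
  return encoded.join(',');
}

// Inside the app:
// despia('speechrecognition://start?known_words=' +
//   encodeKnownWords(['Despia', 'SwiftUI', 'Café au lait']));
```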
Polyfill
Set the knownWords property as an array of strings before calling start(). This is a Despia extension to the Web Speech API, not part of the standard, so it is ignored cleanly outside the app.
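A minimal sketch of the polyfill path follows. `createBiasedRecognizer` is a hypothetical wrapper. The constructor is passed in as an argument so the same code works with window.SpeechRecognition or the webkitSpeechRecognition alias, and only the `knownWords` property name is taken from the description above.

```javascript
// Hypothetical wrapper: creates a recognizer with vocabulary biasing set.
function createBiasedRecognizer(Recognition, words, lang) {
  const recognizer = new Recognition();
  if (lang) recognizer.lang = lang; // standard Web Speech property
  // Despia extension; a standard browser implementation simply
  // carries it as an unused property.
  recognizer.knownWords = words.slice();
  return recognizer;
}

// Inside the app:
// const Recognition =
//   window.SpeechRecognition || window.webkitSpeechRecognition;
// createBiasedRecognizer(Recognition, ['Despia', 'SwiftUI'], 'en-US').start();
```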
Platform support
| Platform | Backing API | Behavior |
|---|---|---|
| iOS 10 and later | SFSpeechAudioBufferRecognitionRequest.contextualStrings | Re-applied to every recognition request, so biasing persists across utterance rotation in continuous mode. |
| Android 13 and later | RecognizerIntent.EXTRA_BIASING_STRINGS | Forwarded to the system recognizer. |
| Older Android | none | Silently ignored. Recognition still works without biasing. |
Guidance
This is a bias, not a whitelist. Words outside the list are still recognized normally; the list just shifts the recognizer’s preference when audio is ambiguous. A few practical notes:
- Keep the list reasonably small. Apple’s guidance for contextualStrings is roughly 100 short phrases or fewer for best effect. Very long lists dilute the signal.
- Prefer specific terms over common words. Adding "the" to the list does nothing useful; adding your product name does.
- An empty or omitted list adds zero overhead: no biasing is applied at all, rather than an empty bias.
- Update the list per session if context changes, for example a navigation app might pass the user’s current city’s neighborhood names.
Result events
Each result event carries the best alternative at the top level, plus the full ranked list under alternatives.
Non-final results (isFinal: false) are emitted only when interim=true. In continuous=true mode, each completed utterance produces its own result with isFinal: true until you stop the session.
On Android, confidence is usually 0.0 because the platform recognizer rarely returns per-alternative scores. Do not gate UX on the confidence value; rank by array order instead, since alternatives[0] is always the best transcription. iOS returns real values in the 0.0 to 1.0 range.
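Ranking by array order can be sketched as below. `bestTranscript` is a hypothetical helper, and `transcript` is an assumed field name for the text of each alternative; check the actual event payload before relying on it.

```javascript
// Hypothetical helper: picks the best alternative by position only.
// The confidence value is deliberately ignored because it is
// usually 0.0 on Android.
function bestTranscript(resultEvent) {
  const alternatives = Array.isArray(resultEvent.alternatives)
    ? resultEvent.alternatives
    : [];
  // `transcript` is an assumed field name, not confirmed by the docs.
  return alternatives.length > 0 ? alternatives[0].transcript : null;
}
```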
If nothing is recognized at all, no result is emitted: the session goes start then end (or start, error{no-speech}, end on a clean stop). Detect this by counting result events before end.
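Counting results before end can be done with a small tracker. This is a sketch under the assumption that your callback routes each event to the matching method; `createResultCounter` and its method names are hypothetical.

```javascript
// Hypothetical tracker: reports sessions that ended without any result.
function createResultCounter(onEmptySession) {
  let results = 0;
  return {
    onStart() { results = 0; },   // reset at the start of each session
    onResult() { results += 1; }, // call for every result event
    onEnd() {
      if (results === 0) onEmptySession();
    },
  };
}
```

Call `onStart`, `onResult`, and `onEnd` from whichever branch of your global callback handles the corresponding event.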
Error events
The error field uses the standard Web Speech vocabulary, identical on both platforms. Every error is followed by end, so cleanup that listens for end runs reliably in both success and failure paths.
| error | Cause | Typical message |
|---|---|---|
| not-allowed | Microphone permission denied or not yet determined. | speech_recognition_denied, ERROR_INSUFFICIENT_PERMISSIONS |
| service-not-allowed | Recognizer unavailable, busy, or restricted by MDM or parental controls. | recognizer_unavailable, ERROR_RECOGNIZER_BUSY |
| language-not-supported | No recognizer for the requested BCP-47 tag. | no_recognizer_for_locale, ERROR_LANGUAGE_NOT_SUPPORTED |
| audio-capture | Audio engine failure, network failure on Android, or unknown command. | audio_engine_failed, ERROR_AUDIO, ERROR_NETWORK, unknown_command |
| no-speech | Clean stop but nothing was recognized. | ERROR_NO_MATCH |
Android’s ERROR_NETWORK and ERROR_SERVER failures are intentionally folded into audio-capture so the same error-handling code runs on both platforms. If you need to distinguish a network failure from a true audio engine failure, read the platform-specific code from event.message.
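Separating network failures from audio engine failures might look like the sketch below. `isNetworkFailure` is a hypothetical helper, and it assumes the platform code appears as a substring of event.message, as the Typical message column suggests.

```javascript
// Hypothetical helper: true when an audio-capture error was really a
// network or server failure reported by the Android recognizer.
function isNetworkFailure(event) {
  if (event.error !== 'audio-capture') return false;
  // Assumption: message contains the platform code verbatim.
  const message = String(event.message || '');
  return /ERROR_(NETWORK|SERVER)/.test(message);
}
```

A handler could then show a "check your connection" hint for network failures and a generic microphone message for the remaining audio-capture errors.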
Push to talk
Capture a single utterance for the duration the user holds the button. Map pointercancel to abort so a swipe-off discards the result instead of finalizing it.
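The pointer mapping described above can be sketched as follows. `bindPushToTalk` is a hypothetical helper. The bridge function is passed in as `despia`, and the choice of pointerdown and pointerup for start and stop is an assumption; only the pointercancel to abort mapping is stated above.

```javascript
// Hypothetical helper: wires a button to a hold-to-talk session.
function bindPushToTalk(button, despia, startUrl = 'speechrecognition://start') {
  const onDown = () => despia(startUrl);
  const onUp = () => despia('speechrecognition://stop');       // finalize
  const onCancel = () => despia('speechrecognition://abort');  // discard
  button.addEventListener('pointerdown', onDown);
  button.addEventListener('pointerup', onUp);
  button.addEventListener('pointercancel', onCancel);
  // Return a cleanup function for component teardown.
  return () => {
    button.removeEventListener('pointerdown', onDown);
    button.removeEventListener('pointerup', onUp);
    button.removeEventListener('pointercancel', onCancel);
  };
}
```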
Continuous dictation
For long-form input like notes, messaging composers, or voice memos, start with continuous=true and interim=true so each finalized utterance accumulates while interim partials update the UI live.
The session keeps listening until you call stop or abort.
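Accumulating finalized utterances while showing the live partial can be sketched with a small buffer. `createDictationBuffer` is a hypothetical helper. `isFinal` comes from the result events described earlier, while `transcript` is an assumed field name for the top-level text.

```javascript
// Hypothetical buffer: committed text plus the current interim partial.
function createDictationBuffer() {
  const finals = [];
  let interim = '';
  const text = () => [...finals, interim].filter(Boolean).join(' ');
  return {
    // Feed every result event; returns the text to display.
    onResult(event) {
      // `transcript` is an assumed field name.
      const transcript = String(event.transcript || '').trim();
      if (event.isFinal) {
        if (transcript) finals.push(transcript);
        interim = ''; // the partial has been superseded
      } else {
        interim = transcript;
      }
      return text();
    },
    text,
  };
}

// Start the session inside the app with:
// despia('speechrecognition://start?continuous=true&interim=true');
```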
Web Speech API compatibility
The same engine is exposed as window.SpeechRecognition (and the webkitSpeechRecognition alias), so portable Web Speech code runs as-is. This is the surface that react-speech-recognition and similar libraries already target.
The polyfill fires the full Web Speech event sequence (start, audiostart, soundstart, speechstart, result, speechend, soundend, audioend, end) and supports multiple simultaneous recognizers, each with its own engine instance. It no-ops gracefully outside the Despia runtime, which is why the isDespia gate and the Recognition existence check work as clean feature detection.
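A portable Web Speech sketch, using only standard API members, might look like this. Both function names are hypothetical, the global object is passed in so the lookup can be exercised anywhere, and the isDespia gate mentioned above is not reproduced here.

```javascript
// Returns the recognizer constructor, or null when none is available.
function getRecognitionConstructor(globalObject) {
  return (
    globalObject.SpeechRecognition ||
    globalObject.webkitSpeechRecognition ||
    null
  );
}

// Starts a single-utterance session and reports the best transcript.
function listenOnce(Recognition, onTranscript) {
  const recognizer = new Recognition();
  recognizer.interimResults = false;
  recognizer.onresult = (event) => {
    const latest = event.results[event.results.length - 1];
    onTranscript(latest[0].transcript); // index 0 is the top alternative
  };
  recognizer.start();
  return recognizer;
}

// In a page:
// const Recognition = getRecognitionConstructor(window);
// if (Recognition) listenOnce(Recognition, (text) => console.log(text));
```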
Opt out of the polyfill on a specific page with a meta tag, which leaves the speechrecognition:// URL-scheme bridge fully active.
Events dispatched by the polyfill are not DOM Event instances, so they do not support preventDefault, stopPropagation, or bubbling. Reading event.results inside an onnomatch handler is a no-op since nomatch events do not carry a results field.
Concurrency and audio behavior
The URL-scheme bridge is single-session: only one speechrecognition:// session can be active at a time. The polyfill supports multiple simultaneous recognizers, but they all share the single device microphone. Calling start on the URL-scheme bridge while a session is active emits an error with message: "already_started", and the running session continues uninterrupted.
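To avoid triggering already_started in the first place, a thin guard can track whether a bridge session is active. `createSessionGuard` is a hypothetical helper. It relies on the documented guarantee that every session, including failed ones, finishes with end.

```javascript
// Hypothetical guard: allows one URL-scheme session at a time.
function createSessionGuard(despia) {
  let active = false;
  return {
    // Returns false without calling the bridge if a session is running.
    start(url = 'speechrecognition://start') {
      if (active) return false;
      active = true;
      despia(url);
      return true;
    },
    // Call from the end branch of your callback.
    onEnd() { active = false; },
    isActive() { return active; },
  };
}
```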
Any concurrent media playback (background music, video) is ducked to a lower volume for the duration of any recognition session, and restored when the last session ends. If your app plays audio during dictation, expect it to attenuate while a session is active and recover when end fires.
Resources
NPM Package
despia-native