Run language models on-device with one function call. Models load via the device’s native AI acceleration stack. Inference jobs auto-resume across backgrounding. Downloads continue when users close the app.
Installation
Two installation methods are supported:
- Bundle
- CDN
Runtime detection
The SDK resolves runtime state once at import time and exposes it synchronously. Gate every call behind `intelligence.runtime.ok` so the same code works in a desktop browser preview.
When `ok` is false, every API returns a not-ready handle and `models.available()` resolves to an empty array. The SDK never throws on a missing runtime, so your code can branch cleanly without try/catch.
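A minimal sketch of the gating pattern. `IntelligenceLike`, `chooseMode`, and `summarize` are hypothetical names, and the two stand-in objects only mimic the documented behaviour (`runtime.ok`, plus `models.available()` resolving to an empty array when the runtime is missing). In an app you would pass the SDK's own `intelligence` object instead, assuming it exposes these two members as described above.

```typescript
// Hypothetical shape based on the names this page uses (runtime.ok,
// models.available()). It is not the SDK's published type definition.
interface IntelligenceLike {
  runtime: { ok: boolean };
  models: { available(): Promise<unknown[]> };
}

type Mode = "on-device" | "preview";

// A plain boolean check is enough: the SDK never throws when the native
// runtime is missing, so no try/catch is needed here.
function chooseMode(intelligence: IntelligenceLike): Mode {
  return intelligence.runtime.ok ? "on-device" : "preview";
}

// Summarise what the UI should show. In a desktop browser preview,
// ok is false and the catalogue resolves to an empty array.
async function summarize(intelligence: IntelligenceLike): Promise<string> {
  const catalogue = await intelligence.models.available();
  return `${chooseMode(intelligence)}: ${catalogue.length} installable models`;
}

// Stand-in objects for illustration only.
const browserPreview: IntelligenceLike = {
  runtime: { ok: false },
  models: { available: async () => [] },
};

const device: IntelligenceLike = {
  runtime: { ok: true },
  models: { available: async () => ["qwen3-0.6b"] },
};
```

The same `summarize` call runs unchanged in both environments; only the branch taken differs.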
Run
Fire an inference job and wire callbacks for streaming tokens and the final result.
Parameters
- Routes the call. `'text'` is the only enabled value in this release.
- Model id, e.g. `'qwen3-0.6b'`. The model must be installed first via `models.download()`.
- The user prompt.
- System-level instruction context for the session.
- When `true`, fires stream callbacks as tokens are generated.
Handler callbacks
- Fires for each snapshot. `chunk` is the full accumulated text so far, not a delta.
- Fires once when inference finishes. `text` is the complete response string.
- Fires on failure. See error codes.
- Optional notification hook. Fires once per active job on `focusout`. Use it for UI affordances or analytics; resume itself is automatic.
Returns
`run()` returns a call handle synchronously. The same destructure works whether the runtime is ready or not.
- `true` when the runtime is ready and the call was queued, `false` when not.
- The original params object, storable and re-firable. `null` on the not-ready handle.
- Removes this job from the SDK. No further callbacks fire.
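Because each stream snapshot is the full accumulated text, a UI that renders the latest text can simply assign `chunk`. A UI that appends output (a log, a speech queue) needs only the new suffix. A minimal sketch of that conversion follows; `SnapshotTracker` is a hypothetical helper, not part of the SDK, and treating a snapshot that does not extend the previous one as a restart is a defensive assumption rather than documented behaviour.

```typescript
// Converts full-text snapshots (as the stream callback delivers them)
// into deltas. SnapshotTracker is a hypothetical helper, not SDK API.
class SnapshotTracker {
  private previous = "";

  // Returns the text added since the last snapshot. If the new snapshot
  // does not extend the previous one, reset is true and delta holds the
  // whole snapshot, so the caller should clear its output first.
  push(chunk: string): { delta: string; reset: boolean } {
    if (chunk.startsWith(this.previous)) {
      const delta = chunk.slice(this.previous.length);
      this.previous = chunk;
      return { delta, reset: false };
    }
    this.previous = chunk;
    return { delta: chunk, reset: true };
  }
}

// Simulated snapshots, standing in for the stream callback.
const tracker = new SnapshotTracker();
let shown = "";
for (const chunk of ["Hel", "Hello", "Hello, wor", "Hello, world"]) {
  const { delta, reset } = tracker.push(chunk);
  shown = reset ? delta : shown + delta;
}
```

Keeping the conversion outside the callback means the callback itself stays a one-liner that forwards `chunk`.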
Models
Manage the on-device model catalogue. Models are downloaded from Hugging Face into the Despia container and reused across launches.
- Available: the full catalogue the runtime can install. Resolves to `[]` when `runtime.ok` is false.
- Installed: the models currently downloaded to this device.
- Download: starts a background download. It is fire-and-forget; results arrive via the callback object.
- Remove: remove one model by id, or remove every downloaded model.
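Since `run()` needs the model to be installed first, it can help to compare the catalogue with what is already on the device before deciding what to download. The sketch below assumes each Available and Installed entry can be reduced to its model id string; `planModels` and `DownloadPlan` are hypothetical names, not SDK API.

```typescript
// Hypothetical planning helper: decides, per wanted model id, whether it
// is ready for run(), still needs a download, or cannot be installed.
interface DownloadPlan {
  ready: string[];      // already installed, safe to pass to run()
  toDownload: string[]; // in the catalogue but not yet on the device
  unknown: string[];    // not in the catalogue (always the case when runtime.ok is false)
}

function planModels(wanted: string[], available: string[], installed: string[]): DownloadPlan {
  const availableSet = new Set(available);
  const installedSet = new Set(installed);
  const plan: DownloadPlan = { ready: [], toDownload: [], unknown: [] };
  for (const id of wanted) {
    if (installedSet.has(id)) plan.ready.push(id);
    else if (availableSet.has(id)) plan.toDownload.push(id);
    else plan.unknown.push(id);
  }
  return plan;
}
```

Each id in `toDownload` would then be passed to `models.download()` with its callback object.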
Download events
Download state is reported two ways: per-call callbacks for the component that started the download, and global events for app-wide state that needs to survive anything, including a force-quit mid-download.
- Fires when a download begins.
- Fires on progress updates. `pct` is a 0 to 100 integer in both the global event and the per-call `onProgress` callback.
- Fires when a download completes successfully.
- Fires on download failure.
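A sketch of app-wide download state, written as a reducer keyed by model id. The event and state shapes (`kind`, `modelId`, `message`) are hypothetical stand-ins for the four notifications above; only `pct` and its 0 to 100 range come from this page. Feed it from whichever global event listener the SDK provides.

```typescript
// Hypothetical event shapes for the four download notifications
// (begin, progress, complete, failure). Real names and payloads may differ.
type DownloadEvent =
  | { kind: "start"; modelId: string }
  | { kind: "progress"; modelId: string; pct: number }
  | { kind: "complete"; modelId: string }
  | { kind: "error"; modelId: string; message: string };

type DownloadState =
  | { status: "downloading"; pct: number }
  | { status: "installed" }
  | { status: "failed"; message: string };

// Returns a new Map so it can serve as a reducer in the app's state store.
function reduceDownloads(
  state: ReadonlyMap<string, DownloadState>,
  event: DownloadEvent,
): Map<string, DownloadState> {
  const next = new Map(state);
  switch (event.kind) {
    case "start":
      next.set(event.modelId, { status: "downloading", pct: 0 });
      break;
    case "progress":
      // pct is documented as an integer from 0 to 100; clamp defensively.
      next.set(event.modelId, {
        status: "downloading",
        pct: Math.min(100, Math.max(0, event.pct)),
      });
      break;
    case "complete":
      next.set(event.modelId, { status: "installed" });
      break;
    case "error":
      next.set(event.modelId, { status: "failed", message: event.message });
      break;
  }
  return next;
}
```

Returning a fresh `Map` keeps the state immutable, which suits UI frameworks that compare references to decide when to re-render.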
Background and return
Inference sessions do not survive backgrounding: the native context is torn down when iOS or Android suspends the WebView. The SDK handles this for you. When the user returns, every in-flight job is re-fired automatically with the same params and the same handler, so you can write your code as if backgrounding does not exist. Only jobs that were genuinely interrupted are re-fired. Jobs that complete normally, jobs that error out, and jobs you explicitly `.cancel()` never re-fire. All interrupted jobs resume, however many were running concurrently.
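To make the re-fire rule concrete, here is an illustrative model of it. This is not the SDK's code and you do not need to write it; `JobRegistry`, `start`, `settle`, and `resumeAll` are hypothetical names. A job counts as settled once it completes, errors, or is cancelled, and only unsettled jobs are fired again on resume.

```typescript
// Illustrative model of the documented re-fire rule, not SDK internals.
type Settled = "completed" | "errored" | "cancelled";

interface Job<P> {
  params: P;               // the SDK would also keep the job's handler
  settled: Settled | null; // null while the job is still in flight
}

class JobRegistry<P> {
  private readonly jobs = new Map<number, Job<P>>();
  private nextId = 1;
  private readonly fire: (params: P) => void;

  constructor(fire: (params: P) => void) {
    this.fire = fire;
  }

  start(params: P): number {
    const id = this.nextId++;
    this.jobs.set(id, { params, settled: null });
    this.fire(params);
    return id;
  }

  // Record how a job ended; the first outcome wins.
  settle(id: number, how: Settled): void {
    const job = this.jobs.get(id);
    if (job && job.settled === null) job.settled = how;
  }

  // On return to the foreground, fire every unsettled job again with its
  // original params. Returns how many jobs were re-fired.
  resumeAll(): number {
    let refired = 0;
    this.jobs.forEach((job) => {
      if (job.settled === null) {
        this.fire(job.params);
        refired++;
      }
    });
    return refired;
  }
}
```

Because a resumed job stays unsettled until it finishes, a second trip to the background would re-fire it again, which matches the "write your code as if backgrounding does not exist" guarantee.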
Error codes
| Code | Source | Description |
|---|---|---|
| 2 | run | Missing id parameter on the native bridge call |
| 3 | run | Runtime inference error; see `err.message` for details |
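A small sketch for turning the documented codes into log-friendly text. The `RunError` shape with a numeric `code` field is an assumption; the page only confirms the code values and `err.message`. `describeRunError` is a hypothetical name.

```typescript
// Hypothetical error shape: numeric code plus optional message.
interface RunError {
  code: number;
  message?: string;
}

// Map the codes in the error table to readable descriptions.
function describeRunError(err: RunError): string {
  switch (err.code) {
    case 2:
      return "Bridge call was missing its id parameter (code 2)";
    case 3:
      return `Inference failed (code 3): ${err.message ?? "no detail provided"}`;
    default:
      return `Unknown error code ${err.code}`;
  }
}
```

A failure callback could pass its error straight to `describeRunError` before logging or showing it.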
React hook
Environment check
Resources
- NPM Package: `despia-intelligence`
- Introduction: Overview, model selection, and FAQs
- GitHub: Source on GitHub