Run language models on-device with one function call. Models load via the device’s native AI acceleration stack. Inference jobs auto-resume across backgrounding. Downloads continue when users close the app.

Installation

npm install despia-intelligence
import intelligence from 'despia-intelligence';

Runtime detection

The SDK resolves runtime state once at import time and exposes it synchronously. Gate every call behind intelligence.runtime.ok so the same code works in a desktop browser preview.
intelligence.runtime.ok       // boolean
intelligence.runtime.status   // 'ready' | 'outdated' | 'unavailable'
intelligence.runtime.message  // string | null

if (!intelligence.runtime.ok) {
  showBanner(intelligence.runtime.message)
  return
}
When ok is false, every API returns a not-ready handle. models.available() resolves to an empty array. The SDK never throws on a missing runtime, so your code can branch cleanly without try/catch.
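A minimal sketch of branching on the three documented status values. `bannerFor` is a hypothetical helper and the fallback wording is an assumption. It takes the runtime object as an argument so it can be tried without the SDK loaded.

```javascript
// Hypothetical helper, not part of the SDK: turns the documented runtime
// fields (ok, status, message) into banner text, or null when ready.
function bannerFor(runtime) {
  if (runtime.ok) return null

  // Prefer the SDK-provided message; the fallback strings are assumptions.
  if (runtime.message) return runtime.message

  switch (runtime.status) {
    case 'outdated':
      return 'Update the app to use on-device AI.'
    case 'unavailable':
      return 'On-device AI is not available in this environment.'
    default:
      return 'On-device AI is not ready.'
  }
}

// In an app this would be called as bannerFor(intelligence.runtime).
```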

Run

Fire an inference job and wire callbacks for streaming tokens and the final result.
const call = intelligence.run({
  type:   'text',
  model:  'qwen3-0.6b',
  prompt: 'Summarise this article.',
  system: 'Be concise.',
  stream: true,
}, {
  stream:   (chunk) => output.textContent = chunk,
  complete: (text)  => save(text),
  error:    (err)   => console.error(err.code, err.message),
})
stream(chunk) receives the full accumulated text so far, not a delta. Use el.textContent = chunk (replace), never el.textContent += chunk (append). Appending repeats all earlier text on every callback, so the output fills with duplicated content.
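The difference between replacing and appending can be seen with simulated snapshots. The snapshot values below are made up for illustration, and `createDeltaReader` is a hypothetical helper for consumers that need deltas. It assumes each snapshot extends the previous one.

```javascript
// Simulated snapshots, shaped like the values the stream callback is
// described as receiving: each one is the full text so far.
const snapshots = ['Hel', 'Hello', 'Hello wor', 'Hello world']

let replaced = ''
let appended = ''
for (const chunk of snapshots) {
  replaced = chunk   // correct: the latest snapshot is the whole text
  appended += chunk  // wrong: earlier text is repeated on every callback
}

console.log(replaced) // Hello world
console.log(appended) // HelHelloHello worHello world

// Hypothetical helper, not part of the SDK: derives the newly added text
// from successive snapshots. Assumes every snapshot extends the last one.
function createDeltaReader() {
  let seen = 0
  return (chunk) => {
    const delta = chunk.slice(seen)
    seen = chunk.length
    return delta
  }
}

const nextDelta = createDeltaReader()
const deltas = snapshots.map((chunk) => nextDelta(chunk))
console.log(deltas) // [ 'Hel', 'lo', ' wor', 'ld' ]
```

If a job is re-fired after backgrounding, its snapshots presumably start over, so a reader like this would need to be recreated at that point.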

type: string, required
Routes the call. 'text' is the only enabled value in this release.

model: string, required
Model id, e.g. 'qwen3-0.6b'. Must be installed first via models.download().

prompt: string, required
The user prompt.

system: string
System-level instruction context for the session.

stream: boolean
When true, fires stream callbacks as tokens generate.

Any extra key on the params object is forwarded to the native layer as-is. Arrays become comma-separated after URL encoding. You do not need to encode values yourself.
intelligence.run({
  type:        'text',
  model:       'qwen3-0.6b',
  prompt:      'Hello.',
  temperature: 0.7,
  top_p:       0.95,
  max_tokens:  256,
}, handler)
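The exact wire format is not shown on this page. As a rough sketch of one reading of "arrays become comma-separated after URL encoding", each element could be encoded and the results joined with commas. `encodeValue` is a hypothetical function and `stop` is a hypothetical extra key; neither is confirmed by the SDK.

```javascript
// Illustrative only: one reading of "arrays become comma-separated after
// URL encoding". This is NOT taken from the SDK source.
function encodeValue(value) {
  if (Array.isArray(value)) {
    return value.map((item) => encodeURIComponent(String(item))).join(',')
  }
  return encodeURIComponent(String(value))
}

// 'stop' is a hypothetical extra key used purely as an example.
const extras = { temperature: 0.7, stop: ['a b', 'c,d'] }
const encoded = Object.fromEntries(
  Object.entries(extras).map(([key, value]) => [key, encodeValue(value)])
)

console.log(encoded) // { temperature: '0.7', stop: 'a%20b,c%2Cd' }
```

Either way, application code passes plain values and leaves the encoding to the SDK.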

Handler callbacks

stream: (chunk: string) => void
Fires for each snapshot. chunk is the full accumulated text so far, not a delta.

complete: (text: string) => void
Fires once when inference finishes. text is the complete response string.

error: (err: { code, message }) => void
Fires on failure. See error codes.

interrupted: (intent: object) => void
Optional notification hook. Fires once per active job on focusout. Use for UI affordances or analytics. Resume itself is automatic.
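A minimal sketch of a handler object that wires all four callbacks into one piece of state. `createHandler` and the status names are hypothetical. The callbacks are driven by hand here so the transitions are visible without the SDK.

```javascript
// Hypothetical factory, not part of the SDK: builds a handler object with the
// four documented callbacks, writing into a plain state object.
function createHandler(state) {
  return {
    stream: (chunk) => {
      state.text = chunk // snapshot, so replace rather than append
      state.status = 'running'
    },
    complete: (text) => {
      state.text = text
      state.status = 'done'
    },
    error: (err) => {
      state.status = 'failed'
      state.error = `${err.code}: ${err.message}`
    },
    // Notification only; the SDK is described as resuming on its own.
    interrupted: () => {
      state.status = 'paused'
    },
  }
}

// Driving the callbacks by hand to show the state transitions.
const state = { text: '', status: 'idle', error: null }
const handler = createHandler(state)
handler.stream('On-device')
handler.interrupted({})
handler.stream('On-device models')
handler.complete('On-device models run locally.')
console.log(state.status, state.text) // done On-device models run locally.
```

In an app the object returned by `createHandler` would be passed as the second argument to intelligence.run().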

Returns

run() returns a call handle synchronously. The same destructure works whether the runtime is ready or not.
ok: boolean
true when the runtime is ready and the call was queued, false otherwise.

intent: object | null
The original params object, storable and re-firable. null on the not-ready handle.

cancel: () => void
Removes this job from the SDK. No further callbacks fire.

const call = intelligence.run(params, handler)

call.intent   // original params object
call.cancel() // drops the job, no further callbacks
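Because the handle has the same shape in both cases, a small wrapper can branch on call.ok. The following is a sketch under assumptions: `runOrFallback` and the fallback function are hypothetical, and `intelligence` is passed in as an argument so the example runs against a stand-in object.

```javascript
// Hypothetical wrapper, not part of the SDK: starts a job and falls back when
// the returned handle reports ok === false.
function runOrFallback(intelligence, params, handler, fallback) {
  const call = intelligence.run(params, handler)
  if (!call.ok) {
    fallback(params) // e.g. a cloud request; the fallback itself is an assumption
    return null
  }
  return call // keep this to call .cancel() or to re-fire call.intent later
}

// Stand-in for the not-ready case, shaped like the handle described above.
const notReady = { run: () => ({ ok: false, intent: null, cancel() {} }) }
let usedFallback = false
const call = runOrFallback(notReady, { type: 'text' }, {}, () => { usedFallback = true })
console.log(call, usedFallback) // null true
```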

Models

Manage the on-device model catalogue. Models are downloaded from Hugging Face into the Despia container and reused across launches.
// Full catalogue the runtime can install
const all = await intelligence.models.available()
// [{ id: 'qwen3-0.6b', name: 'Qwen3 0.6B', category: 'text' }, ...]

models.available(): () => Promise<Model[]>
Full catalogue the runtime can install. Returns [] when runtime.ok is false.

models.installed(): () => Promise<Model[]>
Currently downloaded to this device.

models.download(id, callbacks): (id, { onStart, onProgress, onEnd, onError }) => void
Starts a background download. Fire-and-forget, results arrive via the callback object.

models.remove(id): (id: string) => Promise
Remove one model by id.

models.removeAll(): () => Promise
Remove every downloaded model.
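Since run() requires an installed model and models.download() is fire-and-forget, one possible pattern is to wrap the two in a Promise. `ensureModel` is a hypothetical helper, and the arguments passed to the per-call callbacks are not relied on because this page does not list them. `intelligence` is passed in so the sketch can run against a stand-in.

```javascript
// Hypothetical helper, not part of the SDK: resolves once `id` is installed,
// starting a download first if it is missing.
async function ensureModel(intelligence, id, onProgress = () => {}) {
  const installed = await intelligence.models.installed()
  if (installed.some((model) => model.id === id)) return id

  // models.download() returns nothing, so wrap its callbacks in a Promise.
  return new Promise((resolve, reject) => {
    intelligence.models.download(id, {
      onProgress,
      onEnd: () => resolve(id),
      onError: () => reject(new Error(`Download failed for ${id}`)),
    })
  })
}
```

A caller could then write `await ensureModel(intelligence, 'qwen3-0.6b')` before the first intelligence.run() for that model.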

Download events

Use per-call callbacks in the component that started the download. Use global events for app-wide state that must survive anything, including a force-quit mid-download.
const off = intelligence.on('downloadEnd', (modelId) => markInstalled(modelId))
off() // unsubscribe

intelligence.once('downloadEnd', (modelId) => showFirstDownloadBadge())

downloadStart: (modelId: string) => void
Fires when a download begins.

downloadProgress: (modelId: string, pct: number) => void
Fires on progress updates. pct is a 0 to 100 integer in both the global event and the per-call onProgress callback.

downloadEnd: (modelId: string) => void
Fires when a download completes successfully.

downloadError: (modelId: string, err: object) => void
Fires on download failure.

The pattern: session callbacks for in-session UX (a progress bar on the settings page), global events for permanent state (a tab bar badge that needs to survive a force-quit).
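As a sketch of the global-event side of that pattern, the four events can be mirrored into one shared map. `trackDownloads` and the status names are hypothetical. The only SDK behaviour assumed is that intelligence.on() returns an unsubscribe function, as in the example above.

```javascript
// Hypothetical helper, not part of the SDK: mirrors global download events
// into a Map so any part of the app can read current download state.
// Returns a function that removes all four subscriptions.
function trackDownloads(intelligence, downloads = new Map()) {
  const offs = [
    intelligence.on('downloadStart', (id) =>
      downloads.set(id, { status: 'downloading', pct: 0 })),
    intelligence.on('downloadProgress', (id, pct) =>
      downloads.set(id, { status: 'downloading', pct })),
    intelligence.on('downloadEnd', (id) =>
      downloads.set(id, { status: 'installed', pct: 100 })),
    intelligence.on('downloadError', (id, err) =>
      downloads.set(id, { status: 'failed', pct: 0, err })),
  ]
  return () => offs.forEach((off) => off())
}
```

Calling this once at app start, for example `const stop = trackDownloads(intelligence, store)`, keeps the map current regardless of which screen started a download.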

Background and return

Inference sessions do not survive backgrounding: the native context is torn down when iOS or Android suspends the WebView. The SDK handles this for you. When the user returns, every in-flight job is re-fired automatically with the same params and the same handler, so you can write your code as if backgrounding does not exist.

The SDK only re-fires jobs that were genuinely interrupted. Jobs that complete normally, jobs that error out, and jobs you explicitly .cancel() never re-fire. Any number of concurrent jobs all resume.
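The re-fire rules can be pictured with a small conceptual model. This is not the SDK's implementation. `JobRegistry` and `fire` are hypothetical names, and `fire` stands in for whatever starts a native job.

```javascript
// Conceptual model only, NOT the SDK's implementation. `fire` receives the
// params and the wrapped callbacks for one job.
class JobRegistry {
  constructor(fire) {
    this.fire = fire
    this.active = new Map() // job id -> { params, callbacks }
    this.nextId = 1
  }

  run(params, handler) {
    const id = this.nextId++
    const callbacks = {
      stream: (chunk) => { if (this.active.has(id)) handler.stream?.(chunk) },
      // Finishing or failing removes the job, so it can never re-fire.
      complete: (text) => { if (this.active.delete(id)) handler.complete?.(text) },
      error: (err) => { if (this.active.delete(id)) handler.error?.(err) },
    }
    this.active.set(id, { params, callbacks })
    this.fire(params, callbacks)
    return { ok: true, intent: params, cancel: () => { this.active.delete(id) } }
  }

  // Called when the app returns to the foreground: re-fire what is left.
  resume() {
    for (const { params, callbacks } of this.active.values()) {
      this.fire(params, callbacks)
    }
  }
}

const fired = []
const registry = new JobRegistry((params, callbacks) => fired.push({ params, callbacks }))
const a = registry.run({ prompt: 'A' }, {})
registry.run({ prompt: 'B' }, {})
fired[1].callbacks.complete('done') // job B finishes normally
a.cancel()                          // job A is cancelled
registry.run({ prompt: 'C' }, {})   // job C is still in flight
registry.resume()
console.log(fired.map((f) => f.params.prompt)) // [ 'A', 'B', 'C', 'C' ]
```

Only job C is fired a second time, matching the rule that completed, failed, and cancelled jobs never re-fire.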

Error codes

Code | Source | Description
2    | run    | Missing id parameter on the native bridge call
3    | run    | Runtime inference error, see err.message for detail

intelligence.run(params, {
  error: (err) => {
    if (err.code === 3) {
      console.error('Inference failed:', err.message)
      fallbackToCloud()
    }
  },
})
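A minimal sketch of handling both documented codes in one place. `classifyRunError` and the action names are hypothetical. Treating code 2 as a programming error rather than a transient failure is an assumption based on its description.

```javascript
// Hypothetical helper, not part of the SDK: maps the two documented run error
// codes to an action for the app. The action names are assumptions.
function classifyRunError(err) {
  switch (err.code) {
    case 2:
      // Missing id on the native bridge call: assumed not worth retrying.
      return { action: 'report', detail: 'Missing id parameter on the native bridge call' }
    case 3:
      // Runtime inference error: err.message carries the detail.
      return { action: 'fallback', detail: err.message }
    default:
      return { action: 'report', detail: `Unknown error code ${err.code}` }
  }
}

console.log(classifyRunError({ code: 3, message: 'out of memory' }))
// { action: 'fallback', detail: 'out of memory' }
```

The error callback could then switch on the returned action instead of comparing codes inline.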

React hook

import { useState, useEffect, useRef } from 'react'
import intelligence from 'despia-intelligence'

function useInference(model) {
  const [text, setText]       = useState('')
  const [running, setRunning] = useState(false)
  const callRef               = useRef(null)

  const run = (prompt, system) => {
    if (!intelligence.runtime.ok) return
    // Drop any job still in flight so its callbacks cannot overwrite new state.
    callRef.current?.cancel()
    setText('')
    setRunning(true)
    callRef.current = intelligence.run({
      type: 'text', model, prompt, system, stream: true,
    }, {
      stream:   (chunk) => setText(chunk), // snapshot, so replace
      complete: (full)  => { setText(full); setRunning(false) },
      error:    ()      => setRunning(false),
    })
  }

  useEffect(() => () => callRef.current?.cancel(), [])
  return { text, running, run }
}

Environment check

if (intelligence.runtime.ok) {
  // Use Despia Local AI
} else {
  // Fallback for non-Despia environment
}
The SDK never throws when the runtime is missing. It returns a not-ready handle so your code can branch cleanly. The same code path works in the Despia WebView and in a desktop browser preview.

Resources

NPM Package: despia-intelligence

Introduction: Overview, model selection, and FAQs

GitHub: Source on GitHub