Despia Local Intelligence is in beta; the API spec will likely change before the official launch. A dedicated NPM package for Local Intelligence will ship before launch to make setup more convenient. The current despia-native integration is temporary.
Despia Local Intelligence requires Despia V4, which is currently in beta. To request access, email beta@despia.com.
On-device inference via HuggingFace models. Models are downloaded once and cached locally; all inference then runs without a network connection.
HuggingFace model inference runs on both iOS and Android. The appleintelligence:// one-shot scheme is iOS only.

Installation

npm install despia-native
import despia from 'despia-native';

One-shot inference

iOS only. Runs via appleintelligence://.
Runs a prompt to completion and calls a named function on window with the full response string.
// Note: This API is not final and subject to change.
const isDespia = navigator.userAgent.toLowerCase().includes('despia')

if (isDespia) {
    despia(
        `appleintelligence://?prompt=${encodeURIComponent('What is the capital of France?')}`
    )
}

function handleAIResponse(response) {
    console.log(response)
}
With system instructions:
// Note: This API is not final and subject to change.
const system = 'You are a concise assistant. Reply in one sentence.'
const prompt  = 'Explain what a transformer model is.'

despia(
    `appleintelligence://?instructions=${encodeURIComponent(system)}&prompt=${encodeURIComponent(prompt)}`
)
Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| prompt | string | Yes | The user prompt |
| instructions | string | No | System-level instruction context for the session |
| callback | string | No | Name of the global JS function that receives the response. Defaults to handleAIResponse |
Callback

The native layer calls window[callback](response) on success, or window[callback](errorMessage) on failure.
// Note: This API is not final and subject to change.
function handleAIResponse(response) {
    document.getElementById('output').textContent = response
}
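The callback parameter from the table above can point at any global function. A minimal sketch of building the one-shot URL from the documented parameters; buildOneShotUrl and onCapital are illustrative names, not part of despia-native:

```javascript
// Note: This API is not final and subject to change.
// buildOneShotUrl is a hypothetical helper that assembles an
// appleintelligence:// URL from the documented parameters.
function buildOneShotUrl({ prompt, instructions, callback }) {
    let url = `appleintelligence://?prompt=${encodeURIComponent(prompt)}`
    if (instructions) url += `&instructions=${encodeURIComponent(instructions)}`
    if (callback) url += `&callback=${encodeURIComponent(callback)}`
    return url
}

// Usage inside Despia (onCapital must exist on window before the call):
//   window.onCapital = (response) => { /* handle response */ }
//   despia(buildOneShotUrl({
//       prompt: 'What is the capital of France?',
//       callback: 'onCapital',
//   }))
```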

Streaming inference

iOS and Android. Runs via intelligence://.
Streams tokens as they are generated. Set up callbacks before firing the scheme call.
// Note: This API is not final and subject to change.
const isDespia = navigator.userAgent.toLowerCase().includes('despia')

if (isDespia) {
    const jobId = crypto.randomUUID()

    window.onMLToken = (id, chunk) => {
        if (id === jobId) {
            // chunk is the full accumulated response so far - replace, do not append
            document.getElementById('output').textContent = chunk
        }
    }

    window.onMLComplete = (id, fullText) => {
        if (id === jobId) {
            console.log('Complete:', fullText)
        }
    }

    window.onMLError = ({ errorCode, errorMessage }) => {
        console.error(`Error ${errorCode}: ${errorMessage}`)
    }

    despia(
        `intelligence://?id=${encodeURIComponent(jobId)}&prompt=${encodeURIComponent('What is the capital of France?')}`
    )
}
With system instructions:
// Note: This API is not final and subject to change.
const jobId = crypto.randomUUID()
const system = 'You are a concise assistant. Reply in three sentences or fewer.'
const prompt = 'What is the difference between TCP and UDP?'

despia(
    `intelligence://?id=${encodeURIComponent(jobId)}&system=${encodeURIComponent(system)}&prompt=${encodeURIComponent(prompt)}`
)
Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| prompt | string | Yes | The user prompt |
| id | string | Yes | Unique job ID used to correlate token and completion events |
| system | string | No | System-level instruction context for the session |
| webhook | string | No | Reserved. Parsed by the native layer but not yet active |
Callbacks
onMLToken is called for each snapshot as it is generated. chunk is the full accumulated response so far, not just the new token, so replace the output element's content rather than appending.
// Note: This API is not final and subject to change.
window.onMLToken = (id, chunk) => {
    if (id === jobId) {
        document.getElementById('output').textContent = chunk
    }
}
onMLComplete is called once when inference finishes. fullText is the complete response.
// Note: This API is not final and subject to change.
window.onMLComplete = (id, fullText) => {
    if (id === jobId) {
        saveToHistory(fullText)
    }
}
onMLError is called on any failure. See the error codes below.
// Note: This API is not final and subject to change.
window.onMLError = ({ errorCode, errorMessage }) => {
    console.error(`Error ${errorCode}: ${errorMessage}`)
}

Multiple concurrent jobs

Use unique id values per job to handle concurrent streams without collision.
// Note: This API is not final and subject to change.
const jobs = new Map()

window.onMLToken = (id, chunk) => {
    const el = jobs.get(id)
    if (el) el.textContent = chunk
}

window.onMLComplete = (id) => {
    jobs.delete(id)
}

function runJob(prompt, outputElement) {
    const jobId = crypto.randomUUID()
    jobs.set(jobId, outputElement)
    despia(`intelligence://?id=${encodeURIComponent(jobId)}&prompt=${encodeURIComponent(prompt)}`)
}

Error codes

| Code | Scheme | Description |
| --- | --- | --- |
| 1 | appleintelligence:// | Missing prompt parameter |
| 2 | intelligence:// | Missing id parameter |
| 3 | intelligence:// | Runtime inference error; see errorMessage for detail |
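The codes above can be translated to readable descriptions in a shared error handler. A sketch; describeMLError is an illustrative helper, not part of despia-native:

```javascript
// Note: This API is not final and subject to change.
// describeMLError is a hypothetical helper mapping the documented
// error codes to human-readable descriptions.
function describeMLError(errorCode) {
    switch (errorCode) {
        case 1: return 'Missing prompt parameter (appleintelligence://)'
        case 2: return 'Missing id parameter (intelligence://)'
        case 3: return 'Runtime inference error (intelligence://)'
        default: return `Unknown error code ${errorCode}`
    }
}

// Usage in the streaming error callback:
//   window.onMLError = ({ errorCode, errorMessage }) => {
//       console.error(describeMLError(errorCode), errorMessage)
//   }
```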

Available models

All models are available in int4 (smaller, faster) and int8 (higher quality) quantizations.
| Weight name | Display name |
| --- | --- |
| lfm2-8b-a1b | LFM2 8B A1B |
| lfm2-2.6b | LFM2 2.6B |
| youtu-llm-2b | Youtu LLM 2B |
| qwen3-1.7b | Qwen3 1.7B |
| lfm2.5-1.2b-instruct | LFM2.5 1.2B Instruct |
| lfm2.5-1.2b-thinking | LFM2.5 1.2B Thinking |
| gemma-3n-e4b-it | Gemma 3n E4B IT |
| gemma-3-1b-it | Gemma 3 1B IT |
| qwen3-0.6b | Qwen3 0.6B |
| lfm2-700m | LFM2 700M |
| lfm2.5-350m | LFM2.5 350M |

Environment check

Gate all Despia Local Intelligence calls behind a user agent check so the feature degrades gracefully in a standard browser.
// Note: This API is not final and subject to change.
const isDespia = navigator.userAgent.toLowerCase().includes('despia')

if (isDespia) {
    // Despia Local Intelligence calls here
} else {
    // Fallback - cloud API or disabled state
}

Resources

NPM Package

despia-native