
Realtime

Use Charivo's realtime stack when you want session-based voice interaction, streaming assistant output, or tool-enabled voice workflows.

  • @charivo/realtime
  • @charivo/realtime/remote
  • your /api/realtime route
  • @charivo/server/openai

This is the current production-oriented browser path. The browser calls your route, receives an adapter-aware bootstrap, and connects through the default remote adapter registry.

Basic Setup

import {
  createRealtimeManager,
  type RealtimeToolRegistration,
} from "@charivo/realtime";
import {
  createAvatarControlTools,
  createAvatarResultProjector,
} from "@charivo/realtime-avatar";
import { createRemoteRealtimeClient } from "@charivo/realtime/remote";

const client = createRemoteRealtimeClient({ apiEndpoint: "/api/realtime" });
const avatarTools = createAvatarControlTools({
  expressions: ["Smile", "Sad"],
  motions: { Idle: 2, TapBody: 3 },
});

const tools: RealtimeToolRegistration[] = [
  ...avatarTools,
  {
    definition: {
      type: "function",
      name: "describeCharacterProfile",
      description: "Return the active character profile.",
      parameters: {
        type: "object",
        properties: {},
      },
    },
    async handler(_args, context) {
      return {
        success: true,
        name: context.character?.name ?? null,
      };
    },
  },
];

const manager = createRealtimeManager(client, {
  tools,
  resultProjectors: [createAvatarResultProjector()],
});

If your app also renders lipsync locally, prepare audio from a user gesture before the first realtime session:

await renderManager.prepareAudio?.();
await manager.startSession({
  provider: "openai",
  model: "gpt-realtime-mini",
});

Typical session start:

await manager.startSession({
  provider: "openai",
  model: "gpt-realtime-mini",
});

If you need stronger product-specific acting guidance, append it in the app layer on top of the library-generated base instead of making @charivo/realtime own product persona rules:

import { buildRealtimeSessionConfig } from "@charivo/realtime";
import { buildAvatarControlInstructions } from "@charivo/realtime-avatar";

const base = buildRealtimeSessionConfig({ character });

await manager.startSession({
  provider: "openai",
  model: "gpt-realtime-mini",
  instructions: [
    base.instructions,
    buildAvatarControlInstructions(avatarCatalog),
    "Keep replies short and natural for this product.",
  ].join("\n"),
});

buildRealtimeSessionConfig(...) already includes character identity, description, personality, the generic realtime/tooling rules, and character.voice.voiceId when available. It does not supply provider/model defaults. OpenAI-specific model and voice fallbacks live in the OpenAI transport/provider packages, not in the provider-agnostic manager helper.

Why @charivo/realtime/remote Is The Default

  • it is the recommended production path
  • it works through your own server route
  • it resolves a browser transport adapter from its registry
  • the built-in resolver maps OpenAI WebRTC traffic to the current adapter defaults

Today, that usually means the OpenAI Agents WebRTC bootstrap flow.
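Conceptually, that resolution step behaves like a small registry keyed by adapter id. A minimal sketch of the idea (the names and shapes here are illustrative assumptions, not the actual @charivo/realtime/remote API):

```typescript
// Illustrative sketch of adapter-registry resolution; names and shapes are
// hypothetical, not the actual @charivo/realtime/remote internals.
type TransportAdapter = {
  id: string;
  connect: (bootstrap: unknown) => string;
};

const registry = new Map<string, TransportAdapter>();

function registerAdapter(adapter: TransportAdapter): void {
  registry.set(adapter.id, adapter);
}

// The built-in resolver maps OpenAI WebRTC traffic to the current default adapter.
function resolveAdapter(requested?: string): TransportAdapter {
  const id = requested ?? "openai-agents-webrtc"; // assumed current default
  const adapter = registry.get(id);
  if (!adapter) {
    throw new Error(`No transport adapter registered for "${id}"`);
  }
  return adapter;
}

registerAdapter({
  id: "openai-agents-webrtc",
  connect: () => "connected via OpenAI Agents WebRTC",
});
```

The point of the registry indirection is that your app code only names an adapter; the remote client picks the concrete browser transport, so defaults can change without touching call sites.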

Client Choices

Remote

  • @charivo/realtime/remote
  • best default for production browser apps
  • adapter-aware and server-mediated

OpenAI Agents SDK Transport

  • @charivo/realtime/openai-agents
  • current OpenAI Agents SDK transport client and adapter
  • useful when you need to own the underlying browser client directly

Legacy Low-Level OpenAI Transport

  • @charivo/realtime/openai
  • older low-level OpenAI WebRTC path
  • mainly useful for legacy compatibility and debugging

What @charivo/realtime Owns

  • session state
  • tool registry
  • typed session config helpers
  • in-place updateSession(...) session patching
  • reconnect orchestration and reconnect observability events
  • relaying realtime output into the Charivo event stream

RealtimeManager intentionally uses setEventEmitter(...) rather than the full event bus. It emits realtime, tool, text, and lip-sync-related events back into core.

Local tool calls are checked against the registered tool definition before the handler runs. The built-in validator covers required, enum, and basic JSON Schema type fields; invalid arguments emit realtime:tool:error and return a failure tool result. Nested object/array schemas, additionalProperties, and union type arrays are not enforced.
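As a mental model, the documented validation subset can be pictured like this minimal validator (an illustrative sketch of the behavior described above, not the library's actual code):

```typescript
// Minimal sketch of the documented validation subset: required, enum, and
// basic top-level type checks. Nested object/array schemas and
// additionalProperties are deliberately not enforced, matching the docs.
type ParamSchema = {
  type: "object";
  properties: Record<string, { type?: string; enum?: unknown[] }>;
  required?: string[];
};

function validateToolArgs(schema: ParamSchema, args: Record<string, unknown>) {
  const errors: string[] = [];
  for (const key of schema.required ?? []) {
    if (!(key in args)) errors.push(`missing required "${key}"`);
  }
  for (const [key, value] of Object.entries(args)) {
    const prop = schema.properties[key];
    if (!prop) continue; // unknown keys pass: additionalProperties not enforced
    if (prop.type && typeof value !== prop.type) {
      errors.push(`"${key}" should be ${prop.type}`);
    }
    if (prop.enum && !prop.enum.includes(value)) {
      errors.push(`"${key}" is not an allowed enum value`);
    }
  }
  // On failure the manager emits realtime:tool:error and returns a
  // failure tool result instead of running the handler.
  return { valid: errors.length === 0, errors };
}
```

A tool whose parameters declare `required: ["mood"]` and `enum: ["happy", "sad"]` would therefore reject a call with `{ mood: "angry" }` before the handler ever runs.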

Avatar expression/motion/gaze tools are optional and now live in @charivo/realtime-avatar. Use a result projector when you want those tool results bridged back into realtime:expression, realtime:motion, and realtime:gaze.

Reconnect Behavior

When the browser transport drops temporarily, the manager keeps the realtime session active and attempts recovery with the latest effective config.

  • state.session.status stays "active" during recovery
  • state.connection switches back to "connecting" until recovery succeeds
  • successful reconnects do not emit synthetic realtime:session:start/end
  • updateSession(...) still updates the cached base config while reconnecting
  • sendMessage(...), sendAudioChunk(...), and interrupt() reject while the connection is recovering
  • realtime:reconnect:attempt, realtime:reconnect:success, and realtime:reconnect:exhausted are emitted for observability
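The status behavior above can be sketched as a small state model (illustrative only; the field names follow the state shape the bullets describe, not the library's exact types):

```typescript
// Illustrative model of the documented reconnect behavior: the session stays
// "active" while the connection dips back to "connecting" during recovery.
type RealtimeState = {
  session: { status: "idle" | "active" };
  connection: "disconnected" | "connecting" | "connected";
};

function onTransportDrop(state: RealtimeState): RealtimeState {
  // Temporary drop: session stays active, connection re-enters "connecting".
  return { session: state.session, connection: "connecting" };
}

function onReconnectSuccess(state: RealtimeState): RealtimeState {
  // No synthetic realtime:session:start/end is emitted on recovery.
  return { session: state.session, connection: "connected" };
}

function canSend(state: RealtimeState): boolean {
  // sendMessage(...), sendAudioChunk(...), and interrupt() reject
  // while the connection is recovering.
  return state.session.status === "active" && state.connection === "connected";
}
```

Keeping session status and connection status separate is what lets UI show "reconnecting" without tearing down the conversation state.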

Provider Route

The server route typically uses @charivo/server/openai:

import { createOpenAIRealtimeProvider } from "@charivo/server/openai";

const provider = createOpenAIRealtimeProvider({
  apiKey: process.env.OPENAI_API_KEY!,
});

const bootstrap = await provider.createSession({
  adapter: "openai-agents-webrtc",
  transport: "webrtc",
  session: {
    provider: "openai",
    model: "gpt-realtime-mini",
    voice: "marin",
  },
});

If model or voice are omitted from an OpenAI realtime session, the OpenAI provider applies its OpenAI-specific defaults before calling OpenAI. Apps can still pass those fields explicitly when they need deterministic provider configuration.
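That fallback amounts to simple nullish defaulting on the provider side. A hypothetical sketch (the constant values shown are assumptions for illustration, not documented package defaults):

```typescript
// Illustrative sketch of provider-side defaulting; FALLBACK_MODEL and
// FALLBACK_VOICE are assumed values, not documented constants.
type SessionRequest = { provider: "openai"; model?: string; voice?: string };

const FALLBACK_MODEL = "gpt-realtime-mini"; // hypothetical default
const FALLBACK_VOICE = "marin"; // hypothetical default

function applyOpenAIDefaults(session: SessionRequest) {
  return {
    ...session,
    model: session.model ?? FALLBACK_MODEL,
    voice: session.voice ?? FALLBACK_VOICE,
  };
}
```

Passing model and voice explicitly bypasses the fallback entirely, which is the route to take when you need deterministic provider configuration.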

Alternatives

  • Use the direct Agents transport package when you need to own the realtime transport client directly in the browser.
  • Use the legacy low-level package only when you intentionally depend on the older openai-webrtc flow.
  • Use turn-based STT and TTS when you do not need continuous live sessions.

References