Streaming Responses

Streaming Text Completion

Request text completions with streaming enabled to receive partial text chunks as they are generated.

curl --location 'http://localhost:8080/v1/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "openai/gpt-4o-mini",
    "prompt": "Write a short haiku about the ocean",
    "stream": true
}'

Response Format (Server-Sent Events):

data: {"choices":[{"text":"Waves whisper soft"}],"model":"gpt-4o-mini"}

data: {"choices":[{"text":" on distant shores, the moon calls"}],"model":"gpt-4o-mini"}

data: {"choices":[{"text":" tides to rise."}],"model":"gpt-4o-mini"}

data: [DONE]
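The stream above can be consumed by reading the response body line by line, keeping only data: lines, and stopping at data: [DONE]. The following is a minimal sketch using only the Python standard library. It assumes Bifrost is listening on http://localhost:8080 as in the curl example, that each chunk has a single choice carrying its text in choices[0].text (as in the sample output), and that the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Iterable

URL = "http://localhost:8080/v1/completions"  # assumed local Bifrost address, as in the curl example


def collect_completion_text(lines: Iterable[bytes]) -> str:
    """Join the text fragments of an SSE text-completion stream (hypothetical helper)."""
    parts = []
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and any non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        choices = json.loads(payload).get("choices") or []
        if choices:
            parts.append(choices[0].get("text", ""))  # assumes a single choice per chunk
    return "".join(parts)


def stream_completion(prompt: str) -> str:
    """Send the streaming request from the curl example and return the assembled text."""
    body = json.dumps({
        "model": "openai/gpt-4o-mini",
        "prompt": prompt,
        "stream": True,
    }).encode("utf-8")
    req = urllib.request.Request(
        URL, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        # The HTTP response is iterable line by line, so chunks are handled as they arrive
        return collect_completion_text(resp)
```

Calling stream_completion("Write a short haiku about the ocean") against a running gateway should return the three fragments joined into one string. Keeping the parser separate from the network call makes it easy to test against captured output.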

Streaming Chat Responses

Receive chat responses in real time as they are generated. This suits chat applications that display the reply as it is being typed, so users see output right away instead of waiting for the complete response.

curl --location 'http://localhost:8080/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
    "model": "openai/gpt-4o-mini",
    "messages": [
        {"role": "user", "content": "Tell me a story about a robot learning to paint"}
    ],
    "stream": true
}'

Response Format (Server-Sent Events):

data: {"choices":[{"delta":{"content":"Once"}}],"model":"gpt-4o-mini"}

data: {"choices":[{"delta":{"content":" upon"}}],"model":"gpt-4o-mini"}

data: {"choices":[{"delta":{"content":" a"}}],"model":"gpt-4o-mini"}

data: [DONE]

Each chunk carries a fragment of the reply in delta.content; append the fragments in order to build the complete response in real time.
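Appending fragments as they arrive can be sketched in Python as below. The sketch assumes the input is lines of text already decoded from the HTTP response body and that content lives in choices[0].delta.content, as in the sample output; the function names are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_chat_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield each delta.content fragment from an SSE chat-completion stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # ignore blank separators between events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream marker
        choices = json.loads(payload).get("choices") or []
        if not choices:
            continue  # a chunk without choices has no text to append
        content = (choices[0].get("delta") or {}).get("content")
        if content:
            yield content


def render_stream(lines: Iterable[str]) -> str:
    """Print fragments as they arrive and return the complete reply."""
    reply = []
    for fragment in iter_chat_deltas(lines):
        print(fragment, end="", flush=True)  # show text as it is "typed"
        reply.append(fragment)
    print()  # finish the line once the stream ends
    return "".join(reply)
```

Using a generator keeps display and accumulation decoupled: a UI can consume iter_chat_deltas directly and update the screen per fragment.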

Note: Streaming requests are also subject to the default timeout defined in the provider configuration, which is 30 seconds unless you change it.

Bifrost standardizes all streams so that usage and finish reason are sent only in the final chunk, while content arrives in the chunks before it.
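Because metadata arrives only at the end, a client can gather content from the earlier chunks and read usage and finish reason once the last chunk arrives. The sketch below assumes an OpenAI-style final chunk in which finish_reason sits on choices[0] and usage is a top-level object; that shape is not shown in the samples on this page, and the names StreamResult and consume_chat_stream are hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class StreamResult:
    content: str = ""
    finish_reason: Optional[str] = None
    usage: dict = field(default_factory=dict)


def consume_chat_stream(lines: Iterable[str]) -> StreamResult:
    """Collect content from earlier chunks and metadata from the final chunk."""
    parts = []
    result = StreamResult()
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if choices:
            choice = choices[0]
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
            if choice.get("finish_reason"):
                result.finish_reason = choice["finish_reason"]  # assumed location of finish_reason
        if chunk.get("usage"):
            result.usage = chunk["usage"]  # expected only in the last chunk
    result.content = "".join(parts)
    return result
```

Checking every chunk for metadata, rather than only the last one, keeps the sketch correct even if a provider's final chunk also carries content.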

Responses API Streaming

Stream the OpenAI-style Responses API using event-based SSE. Each message includes an event: line alongside its data: line, and the stream does not end with a [DONE] marker; it ends when the connection closes.

curl --location 'http://localhost:8080/v1/responses' \
--header 'Content-Type: application/json' \
--data '{
    "model": "openai/gpt-4o-mini",
    "input": "Tell me one interesting fact about Mars",
    "stream": true
}'

Response Format (Server-Sent Events):

event: response.created
data: {"type":"response.created"}

event: response.output_text.delta
data: {"type":"response.output_text.delta","delta": /* partial text delta payload */ }

event: response.output_text.delta
data: {"type":"response.output_text.delta","delta": /* more text delta */ }

event: response.completed
data: {"type":"response.completed","response":{ /* usage, finish_reason, etc. */ }}
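Since this stream uses named events and no [DONE] marker, a client has to group lines into messages (a blank line ends each one) and dispatch on the event name. The Python sketch below assumes that the delta field of response.output_text.delta is a plain string, as in OpenAI's Responses API, and that the final payload sits under response in response.completed; the function names are hypothetical.

```python
import json
from typing import Iterable, Iterator, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Group SSE lines into (event name, parsed data) pairs; a blank line ends a message."""
    event, data_lines = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data_lines:
                yield event or "message", json.loads("\n".join(data_lines))
            event, data_lines = None, []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
    if data_lines:  # the connection may close without a trailing blank line
        yield event or "message", json.loads("\n".join(data_lines))


def collect_response_text(lines: Iterable[str]) -> Tuple[str, dict]:
    """Return the concatenated output text and the final response object."""
    parts, final = [], {}
    for event, data in iter_sse_events(lines):
        if event == "response.output_text.delta":
            parts.append(data.get("delta", ""))  # assumes delta is a string
        elif event == "response.completed":
            final = data.get("response", {})  # usage, finish details, etc.
    return "".join(parts), final
```

The loop simply runs until the input is exhausted, which matches the behavior described above: the stream ends when the connection closes.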

Text-to-Speech Streaming: Real-time Audio Generation

Stream audio in real time as text is converted to speech. This is ideal for long texts or when playback must start immediately.

curl --location 'http://localhost:8080/v1/audio/speech' \
--header 'Content-Type: application/json' \
--data '{
    "model": "openai/gpt-4o-mini-tts",
    "input": "Hello this is a sample test, respond with hello for my Bifrost",
    "voice": "alloy",
    "stream_format": "sse"
}'

Response: Audio chunks are delivered via Server-Sent Events. Each chunk contains base64-encoded audio data that you can decode and play or save progressively.

data: {"audio":"UklGRigAAABXQVZFZm10IBAAAAABAAEA..."}

data: {"audio":"AKlFQVZFZm10IBAAAAABAAEAq..."}

data: [DONE]

To save the stream, append > audio_stream.txt to the curl command. The file will contain the raw SSE lines, so the base64 audio still has to be decoded before playback.
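Decoding the chunks progressively can be sketched in Python as follows. It assumes each data: payload is a JSON object with an audio field holding base64 data, as in the sample, and that concatenating the decoded chunks produces a playable file; whether that holds depends on the audio format and provider. The function names are hypothetical.

```python
import base64
import json
from typing import Iterable, Iterator


def iter_audio_bytes(lines: Iterable[str]) -> Iterator[bytes]:
    """Yield decoded audio bytes for each SSE chunk until data: [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # ignore blank separators and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        audio = json.loads(payload).get("audio")
        if audio:
            yield base64.b64decode(audio)


def save_audio_stream(lines: Iterable[str], out_path: str) -> int:
    """Write decoded chunks to out_path as they arrive; return the byte count."""
    written = 0
    with open(out_path, "wb") as out:
        for data in iter_audio_bytes(lines):
            out.write(data)
            out.flush()  # lets a player that reads the file start early
            written += len(data)
    return written
```

The same generator can feed an audio player directly instead of a file when immediate playback is the goal.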

Speech-to-Text Streaming: Real-time Audio Transcription

Stream transcription results as the audio is processed. Text is available immediately, which helps real-time applications and long audio files.

curl --location 'http://localhost:8080/v1/audio/transcriptions' \
--form 'file=@"/path/to/your/audio.mp3"' \
--form 'model="openai/gpt-4o-transcribe"' \
--form 'stream="true"' \
--form 'response_format="json"'

Response Format (Server-Sent Events):

data: {"text":"Hello"}

data: {"text":" this"}

data: {"text":" is"}

data: {"text":" a sample"}

data: [DONE]

Additional options: Add --form 'language="en"' or --form 'prompt="context hint"' for better accuracy.
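For live captions, it is often more convenient to keep a running transcript than to handle individual fragments. The Python sketch below assumes each data: payload carries a text field, as in the sample response; iter_transcript is a hypothetical name.

```python
import json
from typing import Iterable, Iterator


def iter_transcript(lines: Iterable[str]) -> Iterator[str]:
    """Yield the running transcript after each SSE chunk."""
    transcript = ""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # ignore blank separators between events
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream marker
        transcript += json.loads(payload).get("text", "")
        yield transcript
```

A caption display can simply redraw its line with each yielded value, and the last value is the full transcript.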

Audio Format Support

Speech Synthesis: Supports "response_format": "mp3" (default) and "response_format": "wav".
Transcription Input: Accepts MP3, WAV, M4A, and other common audio formats.
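Selecting an output format is a matter of setting response_format in the speech request body. The sketch below builds, but does not send, the streaming text-to-speech request from the curl example with a chosen format. It assumes the local address http://localhost:8080, restricts formats to the two listed on this page (providers may accept more), and assumes the provider honors response_format together with stream_format "sse"; build_speech_request is a hypothetical name.

```python
import json
import urllib.request

SPEECH_URL = "http://localhost:8080/v1/audio/speech"  # assumed local Bifrost address
DOCUMENTED_FORMATS = {"mp3", "wav"}  # formats listed on this page; providers may support more


def build_speech_request(text: str, response_format: str = "mp3") -> urllib.request.Request:
    """Build (but do not send) a streaming text-to-speech request."""
    if response_format not in DOCUMENTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    body = {
        "model": "openai/gpt-4o-mini-tts",
        "input": text,
        "voice": "alloy",
        "response_format": response_format,
        "stream_format": "sse",
    }
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen sends it; validating the format locally fails fast instead of waiting for a provider error.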

Note: Streaming capabilities vary by provider and model. Check each provider’s documentation for specific streaming support and limitations.

Next Steps

Now that you understand streaming responses, explore these related topics:

Essential Topics

Tool Calling - Enable AI models to use external tools and functions
Multimodal AI - Process images, audio, and multimedia content
Provider Configuration - Multiple providers for redundancy
Integrations - Drop-in compatibility with existing SDKs

Advanced Topics

Core Features - Advanced Bifrost capabilities
Architecture - How Bifrost works internally
Deployment - Production setup and scaling
