Overview
SGL (SGLang) is an OpenAI-compatible local/remote inference engine used for serving models with high throughput. Bifrost delegates all operations to the OpenAI provider implementation. Key features:- OpenAI API compatibility - Identical request/response format
- Full streaming support - Server-Sent Events with usage tracking
- Tool calling - Complete function definition and execution
- Text embeddings - Support for embedding models
- Parameter filtering - Removes unsupported fields for compatibility
Supported Operations
| Operation | Non-Streaming | Streaming | Endpoint |
|---|---|---|---|
| Chat Completions | ✅ | ✅ | /v1/chat/completions |
| Responses API | ✅ | ✅ | /v1/chat/completions |
| Text Completions | ✅ | ✅ | /v1/completions |
| Embeddings | ✅ | - | /v1/embeddings |
| List Models | ✅ | - | /v1/models |
| Image Generation | ❌ | ❌ | - |
| Speech (TTS) | ❌ | ❌ | - |
| Transcriptions (STT) | ❌ | ❌ | - |
| Files | ❌ | ❌ | - |
| Batch | ❌ | ❌ | - |
Unsupported Operations (❌): Speech, Transcriptions, Files, and Batch are not supported by the upstream SGL API. These return
UnsupportedOperationError.SGL is typically self-hosted. Ensure BaseURL is configured correctly pointing to your SGL instance (e.g., http://localhost:8000).Setup & Configuration
Configure SGLang as a provider.- Web UI
- config.json
- API
- Go SDK

- Navigate to Models > Model Providers. Look for SGLang under Configured Providers. If it is missing, click on Add New Provider and select SGLang.
- Click Add New Server or edit an existing key.
- Set a name for your key.
- Leave API Key blank for local servers. If your endpoint requires auth, paste a bearer token directly or use an environment variable.
- Set SGLang URL to
http://localhost:8000or your remote SGLang endpoint. - Set Allowed Models to All Models (default) or the specific model allowlist you want this key to serve.
- Save the provider configuration.
1. Chat Completions
Request Parameters
SGL supports all standard OpenAI chat completion parameters. For full parameter reference and behavior, see OpenAI Chat Completions.Filtered Parameters
Removed for SGL compatibility:prompt_cache_key- Not supportedverbosity- Anthropic-specificstore- Not supportedservice_tier- OpenAI-specific
2. Responses API
Fallback to Chat Completions with format conversion:3. Text Completions
SGL supports legacy text completion format:| Parameter | Mapping |
|---|---|
prompt | Direct pass-through |
max_tokens | max_tokens |
temperature, top_p | Direct pass-through |
frequency_penalty, presence_penalty | Supported |
4. Embeddings
SGL supports text embeddings for vector generation:| Parameter | Notes |
|---|---|
input | Text or array of texts |
model | Embedding model name |
encoding_format | ”float” or “base64” |
dimensions | Model-specific dimension count |
5. List Models
Lists available models from SGL server with capabilities.Unsupported Features
| Feature | Reason |
|---|---|
| Speech/TTS | Not offered by SGL API |
| Transcription/STT | Not offered by SGL API |
| Batch Operations | Not offered by SGL API |
| File Management | Not offered by SGL API |
SGL requires BaseURL configuration pointing to your SGL instance (e.g.,
http://localhost:8000 for local, https://sgl.example.com for remote).Caveats
BaseURL Configuration Required
BaseURL Configuration Required
Severity: High
Behavior: BaseURL must be explicitly configured through
sgl_key_config.url or network_config.base_url
Impact: Requests fail without proper configuration
Code: Requests call baseURLOrError before contacting SGLCache Control Stripped
Cache Control Stripped
Severity: Medium
Behavior: Cache control directives are removed from messages
Impact: Prompt caching features don’t work
Code: Stripped during JSON marshaling
Parameter Filtering
Parameter Filtering
Severity: Low
Behavior: OpenAI-specific fields filtered out
Impact: prompt_cache_key, verbosity, store removed
Code: filterOpenAISpecificParameters
User Field Size Limit
User Field Size Limit
Severity: Low
Behavior: User field > 64 characters silently dropped
Impact: Longer user identifiers are lost
Code: SanitizeUserField enforces 64-char max

