Agent Mode enables Bifrost to automatically execute tool calls without requiring explicit execution API calls for each tool. This transforms Bifrost from a simple gateway into an autonomous agent runtime.
Streaming Not Supported: Agent Mode is not compatible with streaming operations (chat_stream and responses_stream). The autonomous tool execution loop needs a complete response before it can proceed to the next iteration, and buffering every streaming chunk in memory in case a tool call appears would be an anti-pattern. Use the non-streaming endpoints (chat and responses) when Agent Mode is enabled.
When a response contains both auto-executable and non-auto-executable tools:
Auto-executable tools are executed first
The response is returned with:
A text content field containing the executed tool results as JSON
Pending non-auto-executable tool calls in tool_calls
finish_reason set to "stop"
{
  "choices": [{
    "index": 0,
    "finish_reason": "stop",
    "message": {
      "role": "assistant",
      "content": "The Output from allowed tools calls is - {\"filesystem_list_directory\":\"[\\\"file1.go\\\", \\\"file2.go\\\"]\"}\n\nNow I shall call these tools next...",
      "tool_calls": [{
        "id": "call_pending",
        "type": "function",
        "function": {
          "name": "filesystem_write_file",
          "arguments": "{\"path\": \"output.txt\", \"content\": \"...\"}"
        }
      }]
    }
  }]
}
The content field contains a text summary that embeds the executed tool results as JSON. The tool_calls array contains only the non-auto-executable tools that require your approval. The finish_reason is set to "stop" to exit the agent loop.
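To make the shape of this response concrete, here is a minimal sketch of how such a mixed response could be assembled. It is not Bifrost's implementation: the AUTO_EXECUTABLE set, split_tool_calls, build_mixed_response, and the run_tool callback are all hypothetical names introduced for illustration, and the content prefix is simply copied from the example above.

```python
import json

# Hypothetical allow-list of tool names; not a real Bifrost setting.
AUTO_EXECUTABLE = {"filesystem_list_directory"}


def split_tool_calls(tool_calls, auto_executable):
    """Partition tool calls into (auto, pending) by function name."""
    auto, pending = [], []
    for call in tool_calls:
        name = call["function"]["name"]
        (auto if name in auto_executable else pending).append(call)
    return auto, pending


def build_mixed_response(tool_calls, auto_executable, run_tool):
    """Run the auto-executable calls and return the rest as pending.

    run_tool is a hypothetical callback that takes one tool call and
    returns its result as a string.
    """
    auto, pending = split_tool_calls(tool_calls, auto_executable)
    results = {c["function"]["name"]: run_tool(c) for c in auto}
    # Prefix copied from the example response; the real wording may differ.
    content = "The Output from allowed tools calls is - " + json.dumps(results)
    return {
        "choices": [{
            "index": 0,
            "finish_reason": "stop",  # exits the agent loop
            "message": {
                "role": "assistant",
                "content": content,
                "tool_calls": pending,  # only calls awaiting approval
            },
        }]
    }
```

Keying the results by tool name mirrors the example, though it would overwrite earlier results if the same tool were called twice in one turn.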
Your application should then:
Parse the content field to see what was already executed
Review the pending non-auto-executable tools in tool_calls
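The client-side steps above can be sketched as follows. This is an illustrative helper, not part of any Bifrost SDK: parse_mixed_response is a hypothetical name, and the code assumes the content field embeds a single JSON object after a text prefix, as in the example response. If the format differs, the executed results fall back to an empty dict.

```python
import json


def parse_mixed_response(response):
    """Return (executed_results, pending_calls) from a mixed response.

    Assumes the first "{" in the content starts the JSON object of
    executed tool results (an assumption based on the example payload).
    """
    message = response["choices"][0]["message"]
    content = message.get("content") or ""

    executed = {}
    start = content.find("{")
    if start != -1:
        try:
            # raw_decode parses one JSON value and ignores trailing text.
            executed, _ = json.JSONDecoder().raw_decode(content[start:])
        except json.JSONDecodeError:
            executed = {}

    # Each pending call becomes (id, tool name, decoded arguments).
    pending = [
        (c["id"], c["function"]["name"], json.loads(c["function"]["arguments"]))
        for c in message.get("tool_calls") or []
    ]
    return executed, pending
```

The pending tuples can then be shown to a reviewer before any of them is executed.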
Be careful which tools you mark as auto-executable. Dangerous operations such as write_file, delete_file, and execute_command should typically require human approval.
When Agent Mode executes, each iteration through the LLM and tool execution cycle increments a counter. You can track this for logging and debugging:
// During iteration 1 -> Request made with max_tokens adjustment
// Tool results collected and added to history
// During iteration 2 -> Another LLM call with history
// Process continues until no more tool calls or max_agent_depth reached
The max_agent_depth setting controls maximum iterations:
Default: 10
Range: 1-50 (configurable)
When the limit is reached, the current response is returned as-is (it may contain pending tool calls)
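The depth limit can be illustrated with a small, self-contained loop. This is a sketch under stated assumptions, not Bifrost's actual loop: run_agent_loop, call_llm, and run_tool are hypothetical names, each LLM call is counted as one iteration, and every tool call is treated as auto-executable to keep the example short.

```python
def run_agent_loop(call_llm, run_tool, messages, max_agent_depth=10):
    """Call the LLM until it stops requesting tools or the cap is hit.

    call_llm(history) returns an assistant message dict; run_tool(call)
    returns a string result. Both are hypothetical callbacks.
    Returns (last_response, iterations_used).
    """
    if not 1 <= max_agent_depth <= 50:
        raise ValueError("max_agent_depth must be between 1 and 50")

    history = list(messages)
    response = None
    for depth in range(1, max_agent_depth + 1):
        response = call_llm(history)
        tool_calls = response.get("tool_calls") or []
        if not tool_calls:
            return response, depth  # model finished without tool calls
        if depth == max_agent_depth:
            break  # cap reached: return as-is, tool calls still pending
        # Feed the assistant turn and each tool result back into history.
        history.append(response)
        for call in tool_calls:
            history.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": run_tool(call),
            })
    return response, max_agent_depth
```

A caller that receives a response with tool calls still present at the cap can either execute them manually or raise the limit.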
Iteration N:
  Auto tools executed in parallel
  Non-auto tools returned in response
  Application reviews & approves non-auto tools
  Application calls execute endpoint manually
  Results fed back in next iteration