Ryvos supports 18+ LLM providers out of the box. Every provider implements the same `LlmClient` trait, giving you streaming responses, tool calling, and token estimation regardless of which model you choose.
## Provider Overview
| Provider | Type | Tool Calling | Streaming | Notes |
|---|---|---|---|---|
| Anthropic | Cloud | Yes | Yes | Recommended. Native Messages API. |
| OpenAI | Cloud | Yes | Yes | Chat Completions API. |
| Gemini | Cloud | Yes | Yes | Google native Gemini API. |
| Azure OpenAI | Cloud | Yes | Yes | Enterprise Azure deployments. |
| Cohere | Cloud | Yes | Yes | Cohere v2 Chat API. |
| AWS Bedrock | Cloud | Yes | Yes | AWS-managed models. |
| Ollama | Local | Yes | Yes | Run models locally. No API key needed. |
| Groq | Cloud | Yes | Yes | Ultra-fast inference. |
| OpenRouter | Cloud | Yes | Yes | Multi-model gateway. |
| Together | Cloud | Yes | Yes | Together AI inference. |
| Fireworks | Cloud | Yes | Yes | Fireworks AI. |
| Cerebras | Cloud | Yes | Yes | Cerebras inference. |
| xAI | Cloud | Yes | Yes | Grok models. |
| Mistral | Cloud | Yes | Yes | Mistral AI. |
| Perplexity | Cloud | Yes | Yes | Search-augmented LLM. |
| DeepSeek | Cloud | Yes | Yes | DeepSeek models. |
| Claude Code | CLI | Yes | Yes | Wraps Claude Code CLI subprocess. |
| Copilot | CLI | Yes | Yes | Wraps GitHub Copilot CLI (JSONL). |
| Custom | Any | Varies | Yes | Any OpenAI-compatible endpoint. |
## Cloud Providers

### Anthropic (Recommended)

```toml
[model]
provider = "anthropic"
model_id = "claude-sonnet-4-20250514"
api_key = "${ANTHROPIC_API_KEY}"
```

Supported models: `claude-sonnet-4-20250514`, `claude-opus-4-20250514`, `claude-haiku-4-20250506`, `claude-3-5-sonnet-20241022`, `claude-3-5-haiku-20241022`
Ryvos uses Anthropic's native Messages API with streaming, extended thinking support, and accurate token estimation via `tiktoken`.
:::tip
Anthropic models have the best tool-calling reliability. Claude Sonnet 4 is the recommended default for the best balance of speed, cost, and capability.
:::
### OpenAI

```toml
[model]
provider = "openai"
model_id = "gpt-4o"
api_key = "${OPENAI_API_KEY}"
```

Supported models: `gpt-4o`, `gpt-4o-mini`, `gpt-4-turbo`, `o1`, `o1-mini`, `o3-mini`
### Google Gemini

```toml
[model]
provider = "gemini"
model_id = "gemini-2.5-pro"
api_key = "${GOOGLE_API_KEY}"
```

Supported models: `gemini-2.5-pro`, `gemini-2.5-flash`, `gemini-2.0-flash`
Uses the native Gemini API (not the OpenAI compatibility layer) for full feature support.
### Azure OpenAI

```toml
[model]
provider = "azure"
model_id = "gpt-4o"
api_key = "${AZURE_OPENAI_KEY}"
base_url = "https://your-resource.openai.azure.com/openai/deployments/gpt-4o"
```

The `base_url` must point to your specific Azure deployment. The API key is sent as the `api-key` header.
### Cohere

```toml
[model]
provider = "cohere"
model_id = "command-r-plus"
api_key = "${COHERE_API_KEY}"
```

Uses Cohere's v2 Chat API with native tool calling support.
### AWS Bedrock

```toml
[model]
provider = "bedrock"
model_id = "anthropic.claude-3-5-sonnet-20241022-v2:0"
base_url = "https://bedrock-runtime.us-east-1.amazonaws.com"
```

:::note
Bedrock authentication uses your AWS credentials from environment variables (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_SESSION_TOKEN`) or IAM roles. No `api_key` field is needed.
:::
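As a sketch, static credentials can be exported in the shell that launches Ryvos (placeholder values shown; substitute your own):

```shell
# Static AWS credentials for Bedrock (placeholder values).
export AWS_ACCESS_KEY_ID="your-access-key-id"
export AWS_SECRET_ACCESS_KEY="your-secret-access-key"
# Only needed for temporary credentials, e.g. an assumed role:
export AWS_SESSION_TOKEN="your-session-token"
```

On EC2, ECS, or Lambda, IAM roles make these exports unnecessary.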
## Local Providers

### Ollama

```toml
[model]
provider = "ollama"
model_id = "llama3.1:70b"
```

No API key required. Ryvos connects to Ollama at `http://localhost:11434` by default.
```bash
# Install and start Ollama
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull llama3.1:70b
ollama serve
```

Override the endpoint for remote Ollama instances:
```toml
[model]
provider = "ollama"
model_id = "llama3.1:70b"
base_url = "http://192.168.1.100:11434"
```

:::tip
Ollama works well as a fallback model. Configure a cloud primary with a local Ollama fallback for resilience:
```toml
[model]
provider = "anthropic"
model_id = "claude-sonnet-4-20250514"
api_key = "${ANTHROPIC_API_KEY}"

[[fallback_models]]
provider = "ollama"
model_id = "llama3.1:70b"
```

:::
## OpenAI-Compatible Providers

These providers all use the OpenAI Chat Completions API format. Ryvos auto-fills the correct `base_url` and headers via `apply_preset_defaults()`.
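If you need to route one of these presets through a proxy or alternate endpoint, you can set `base_url` explicitly. A sketch for Groq, assuming (as is typical for preset systems) that an explicit value overrides the auto-filled default; the URL shown is Groq's standard OpenAI-compatible endpoint:

```toml
[model]
provider = "groq"
model_id = "llama-3.3-70b-versatile"
api_key = "${GROQ_API_KEY}"
# An explicit base_url replaces the preset default.
base_url = "https://api.groq.com/openai/v1"
```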
### Groq

```toml
[model]
provider = "groq"
model_id = "llama-3.3-70b-versatile"
api_key = "${GROQ_API_KEY}"
```

Known for extremely fast inference. Good for high-throughput tasks.
### OpenRouter

```toml
[model]
provider = "openrouter"
model_id = "anthropic/claude-sonnet-4-20250514"
api_key = "${OPENROUTER_API_KEY}"
```

Access hundreds of models through a single API. Model IDs use the `provider/model` format.
### Together AI

```toml
[model]
provider = "together"
model_id = "meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo"
api_key = "${TOGETHER_API_KEY}"
```

### Fireworks AI
```toml
[model]
provider = "fireworks"
model_id = "accounts/fireworks/models/llama-v3p1-70b-instruct"
api_key = "${FIREWORKS_API_KEY}"
```

### Cerebras
```toml
[model]
provider = "cerebras"
model_id = "llama3.1-70b"
api_key = "${CEREBRAS_API_KEY}"
```

Ultra-fast inference on Cerebras wafer-scale hardware.
### xAI (Grok)

```toml
[model]
provider = "xai"
model_id = "grok-2"
api_key = "${XAI_API_KEY}"
```

### Mistral AI
```toml
[model]
provider = "mistral"
model_id = "mistral-large-latest"
api_key = "${MISTRAL_API_KEY}"
```

### Perplexity
```toml
[model]
provider = "perplexity"
model_id = "llama-3.1-sonar-large-128k-online"
api_key = "${PERPLEXITY_API_KEY}"
```

Perplexity models include built-in web search. Useful for research-heavy tasks.
### DeepSeek

```toml
[model]
provider = "deepseek"
model_id = "deepseek-chat"
api_key = "${DEEPSEEK_API_KEY}"
```

## CLI Subprocess Providers
These providers wrap external CLI tools as subprocesses.
### Claude Code

```toml
[model]
provider = "claude_code"
model_id = "claude_code"
```

Wraps the `claude` CLI, which must be installed separately (`npm install -g @anthropic-ai/claude-code`). Useful if you have Claude Code access but not a direct Anthropic API key.
### GitHub Copilot

```toml
[model]
provider = "copilot"
model_id = "copilot"
```

Wraps the `gh copilot` CLI. Requires a GitHub Copilot subscription and the `gh` CLI. Communication uses JSONL format (rewritten in v0.4.4 for reliability).
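JSONL is simply one JSON object per line. The exact message schema is internal to Ryvos, but a minimal reader for any JSONL stream can be sketched as follows (the event shapes here are illustrative, not the real Copilot CLI protocol):

```python
import json

def read_jsonl(lines):
    """Yield one parsed JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

# Illustrative events; the actual Copilot CLI schema differs.
raw = '{"type": "text", "content": "hello"}\n{"type": "stop"}\n'
events = list(read_jsonl(raw.splitlines()))
```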
## Custom / Self-Hosted

Use any OpenAI-compatible endpoint:

```toml
[model]
provider = "openai"
model_id = "your-model-name"
api_key = "${YOUR_API_KEY}"
base_url = "http://localhost:8080/v1"
extra_headers = { "X-Custom-Header" = "value" }
```

This works with:
- vLLM — `base_url = "http://localhost:8000/v1"`
- llama.cpp server — `base_url = "http://localhost:8080/v1"`
- LocalAI — `base_url = "http://localhost:8080/v1"`
- LiteLLM proxy — `base_url = "http://localhost:4000/v1"`
- Text Generation Inference — `base_url = "http://localhost:8080/v1"`
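All of these speak the same wire format: a POST to `{base_url}/chat/completions` with a JSON body. As a sketch, the request any OpenAI-compatible server expects can be built offline like this (function name and parameters are illustrative, not part of Ryvos):

```python
import json

def chat_completions_request(base_url, model, prompt, api_key=None):
    """Build the URL, headers, and JSON body for an OpenAI-compatible call."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # request an SSE stream rather than one blocking response
    }
    return f"{base_url}/chat/completions", headers, json.dumps(body)

url, headers, body = chat_completions_request(
    "http://localhost:8080/v1", "your-model-name", "hello"
)
```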
## Fallback Chain

Configure multiple fallback models for resilience:

```toml
[model]
provider = "anthropic"
model_id = "claude-sonnet-4-20250514"
api_key = "${ANTHROPIC_API_KEY}"

[[fallback_models]]
provider = "openai"
model_id = "gpt-4o"
api_key = "${OPENAI_API_KEY}"

[[fallback_models]]
provider = "ollama"
model_id = "llama3.1:70b"
```

Ryvos tries each model in order when the previous one fails (network error, rate limit, timeout). The fallback is transparent to the conversation.
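The underlying pattern is a simple ordered loop; a sketch with stand-in callables (names hypothetical, not Ryvos internals):

```python
class ProviderError(Exception):
    """Stand-in for a retryable provider failure (network error, rate limit)."""

def complete_with_fallback(clients, prompt):
    """Try each client in order; return the first successful response."""
    last_error = None
    for client in clients:
        try:
            return client(prompt)
        except ProviderError as err:
            last_error = err  # remember why this provider failed, try the next
    raise last_error  # every model in the chain failed

def flaky_cloud(prompt):
    raise ProviderError("rate limited")

def local_ollama(prompt):
    return f"local answer to {prompt!r}"

result = complete_with_fallback([flaky_cloud, local_ollama], "hi")
```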
## Streaming Architecture

All providers implement the `LlmClient` trait with streaming:

```text
LLM Provider --> SSE/JSONL stream --> StreamDelta events --> Agent Loop
```

`StreamDelta` variants:

- `TextDelta` — Partial text response
- `ThinkingDelta` — Extended thinking content (Anthropic)
- `ToolUseStart` — Beginning of a tool call
- `ToolInputDelta` — Partial tool input JSON
- `Stop` — Stream complete
- `Usage` — Token counts
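For cloud providers, these deltas are decoded from Server-Sent Events. A minimal sketch of turning an SSE body into delta payloads (the payload shapes here are illustrative; the `[DONE]` sentinel is the OpenAI-style end marker):

```python
import json

def parse_sse(body):
    """Yield the JSON payload of each `data:` line in an SSE stream."""
    for line in body.splitlines():
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
            if payload and payload != "[DONE]":  # skip the end-of-stream sentinel
                yield json.loads(payload)

body = 'data: {"delta": "Hel"}\n\ndata: {"delta": "lo"}\n\ndata: [DONE]\n'
deltas = [event["delta"] for event in parse_sse(body)]
```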
The streaming system includes a `RetryingClient` wrapper for automatic exponential backoff on transient failures.
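Exponential backoff itself is a small amount of logic; a generic sketch (not the actual `RetryingClient` code), with the computed delays collected rather than slept for clarity:

```python
def retry_with_backoff(call, max_attempts=4, base_delay=0.5):
    """Retry `call` on failure, doubling the delay between attempts."""
    delays = []
    for attempt in range(max_attempts):
        try:
            return call(), delays
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the final error
            delays.append(base_delay * (2 ** attempt))  # 0.5, 1.0, 2.0, ...
            # A real client would time.sleep(delays[-1]) here, plus random jitter.

attempts = {"n": 0}
def transient():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

result, delays = retry_with_backoff(transient)
```

Jitter matters in practice: it prevents many clients from retrying in lockstep after a shared outage.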
## Verify Your Provider

After configuring a provider, verify it works:

```bash
ryvos doctor
```

The `doctor` command tests API connectivity for your primary and all fallback models.
## Next Steps
- Configuration — Full config reference
- CLI Reference — All commands and flags
- Budget System — Control spending per provider