Provider Architecture
Built-in Provider
Ollama Cloud — Free and always available. Configured during setup, it serves as the ultimate fallback.
Primary Provider
Plugin Provider — Optional override (OpenAI, Anthropic, etc.). When set, it becomes the default provider in the smart router.
Provider Hierarchy
The smart router selects providers based on availability and query complexity. The built-in Ollama Cloud provider is always available as a fallback, even if the primary provider is unreachable.
Provider Registry
All providers (built-in and plugins) are registered in the provider registry during agent boot.

Listing Providers
Ask your agent to show all registered providers. The listing includes:
- Provider ID
- Model name
- Availability status
- Configuration (base URL, API key reference)
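The registry described above can be pictured as a simple in-memory map populated at boot. The sketch below is illustrative only; `Provider` and `ProviderRegistry` are hypothetical names, not the agent's actual internals:

```python
from dataclasses import dataclass

@dataclass
class Provider:
    provider_id: str
    model: str
    available: bool = True
    base_url: str = ""

class ProviderRegistry:
    """Registry of built-in and plugin providers, populated at agent boot."""

    def __init__(self):
        self._providers = {}

    def register(self, provider):
        self._providers[provider.provider_id] = provider

    def list_providers(self):
        """Return (id, model, availability) rows, like the agent's listing."""
        return [
            (p.provider_id, p.model, "available" if p.available else "unreachable")
            for p in self._providers.values()
        ]

# The built-in provider is registered first; plugins follow.
registry = ProviderRegistry()
registry.register(Provider("ollama-cloud", "qwen2.5:32b-instruct"))
registry.register(Provider("openai", "gpt-4-turbo", base_url="https://api.openai.com/v1"))
```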
Managing Built-in Provider
The built-in Ollama Cloud provider is configured during setup but can be modified later.

View Current Model
Switch Built-in Model
Change the built-in model to optimize for speed or capability:
- Via CLI
- Via Conversation
- `qwen2.5:32b-instruct` — Balanced (default)
- `qwen2.5:14b-instruct` — Fast, smaller
- `qwen2.5:72b-instruct` — Most capable, slower
Managing Primary Provider
The primary provider is an optional override that takes precedence over the built-in provider.

Setting Primary Provider
Install a provider plugin and set it as primary.

Viewing Primary Provider
Clearing Primary Provider
Revert to using only the built-in provider:
- Via CLI
- Via Conversation
Smart Router
The smart router automatically selects the best provider for each query based on complexity tiers.

Query Complexity Tiers
Simple
Quick lookups, greetings, simple facts
Moderate
General questions, basic reasoning
Complex
Multi-step tasks, analysis, code generation
Reasoning
Deep reasoning, chain-of-thought, complex logic
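One way a router could bucket queries into these tiers is a lightweight heuristic classifier. The keyword rules below are purely illustrative assumptions, not the router's actual classification logic:

```python
TIERS = ["simple", "moderate", "complex", "reasoning"]

def classify_query(query):
    """Assign a complexity tier to a query (illustrative heuristic only)."""
    q = query.lower()
    # Deep reasoning signals: proofs, explicit chain-of-thought requests.
    if any(k in q for k in ("prove", "step by step", "chain of thought")):
        return "reasoning"
    # Multi-step work: analysis and code generation.
    if any(k in q for k in ("implement", "analyze", "refactor", "write code")):
        return "complex"
    # Very short queries: greetings and quick lookups.
    if len(q.split()) <= 4:
        return "simple"
    return "moderate"
```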
Tier-Based Routing
The router uses tier mappings to assign providers.

Automatic Fallback
If a provider is unavailable, the router falls back through the tiers, trying the next lower tier's provider:

reasoning → complex → moderate → simple
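The fallback chain above can be sketched in a few lines. The tier-to-provider mapping here is a hypothetical example, not the default configuration:

```python
# Hypothetical tier-to-provider mapping; the real defaults may differ.
TIER_MAPPING = {
    "reasoning": "anthropic",
    "complex": "openai",
    "moderate": "ollama-cloud",
    "simple": "ollama-cloud",
}
# Fallback walks from the most demanding tier down to the simplest.
FALLBACK_ORDER = ["reasoning", "complex", "moderate", "simple"]

def select_provider(tier, is_available):
    """Walk down from `tier` until an available provider is found.

    `is_available` is a callable mapping a provider id to True/False.
    """
    for lower_tier in FALLBACK_ORDER[FALLBACK_ORDER.index(tier):]:
        provider = TIER_MAPPING[lower_tier]
        if is_available(provider):
            return provider
    # Built-in Ollama Cloud is the ultimate fallback, always available.
    return "ollama-cloud"
```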
Configuring Tier Mapping
Customize which provider handles each tier.

Provider Plugins
Available Provider Plugins
OpenAI
- GPT-4, GPT-4 Turbo, GPT-3.5
- Function calling support
- High rate limits
Anthropic
- Claude 3 (Opus, Sonnet, Haiku)
- Extended context windows
- Strong reasoning capabilities
Ollama Local
- Run models on your own hardware
- Fully private, no API calls
- Supports Llama, Mistral, etc.
Custom
- Build your own provider plugin
- Compatible with any OpenAI-compatible API
- Full control over routing
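Because custom plugins can target any OpenAI-compatible API, a minimal provider mostly needs to build requests in the chat completions shape. This sketch assumes a placeholder API key and base URL:

```python
import json

def build_chat_request(base_url, model, messages, api_key="YOUR_API_KEY",
                       max_tokens=1024):
    """Build an HTTP request in the OpenAI-compatible chat completions shape."""
    return {
        "url": base_url.rstrip("/") + "/chat/completions",
        "headers": {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # placeholder credential
        },
        "body": json.dumps({
            "model": model,
            "messages": messages,
            "max_tokens": max_tokens,
        }),
    }
```

The same builder works for OpenAI itself, Azure OpenAI, or a self-hosted compatible server, since only `base_url` changes.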
Installing Provider Plugins
Provider plugins are installed separately.

Provider Configuration
Each provider plugin may have unique settings.

OpenAI Settings
- API Key: Required for authentication
- Organization ID: Optional, for team accounts
- Model: `gpt-4`, `gpt-4-turbo`, `gpt-3.5-turbo`, etc.
- Base URL: Override for Azure OpenAI or compatible APIs
- Max Tokens: Limit response length
Anthropic Settings
- API Key: Required for authentication
- Model: `claude-3-opus`, `claude-3-sonnet`, `claude-3-haiku`
- Max Tokens: Maximum response length
- Version: API version (default: latest)
Ollama Local Settings
- Base URL: Local Ollama server URL (e.g., `http://localhost:11434`)
- Model: Any model installed locally (`llama3`, `mistral`, etc.)
- Keep Alive: Keep the model loaded in memory
- Num GPU: Number of GPU layers to use
Provider Health Monitoring
Checking Provider Status
Ask your agent about provider health. Each provider reports one of:
- ✅ Available
- ⚠️ Slow/degraded
- ❌ Unreachable
Automatic Health Checks
The smart router performs periodic health checks:
- Interval: Every 60 seconds (configurable)
- Timeout: 5 seconds per provider
- Fallback: Automatic if provider fails 3 consecutive checks
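The three-consecutive-failures policy can be sketched as a small per-provider counter. `HealthMonitor` is a hypothetical name used for illustration, not the agent's actual class:

```python
class HealthMonitor:
    """Track consecutive failed health checks per provider.

    Mirrors the documented policy: a provider is considered down
    (triggering automatic fallback) after 3 consecutive failed checks.
    """

    FAILURE_THRESHOLD = 3

    def __init__(self):
        self._failures = {}

    def record_check(self, provider_id, ok):
        # A single successful check resets the failure streak.
        if ok:
            self._failures[provider_id] = 0
        else:
            self._failures[provider_id] = self._failures.get(provider_id, 0) + 1

    def is_healthy(self, provider_id):
        return self._failures.get(provider_id, 0) < self.FAILURE_THRESHOLD
```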
Cost Optimization
Model Selection Strategy
Use cheaper models for simple tasks and reserve expensive models for complex reasoning.

Provider Cost Comparison
| Provider | Model | Cost per 1M tokens | Best for |
|---|---|---|---|
| Ollama Cloud | qwen2.5:32b | Free (rate limited) | General use |
| OpenAI | GPT-3.5 Turbo | $1.50 | Simple/moderate |
| OpenAI | GPT-4 Turbo | $30 | Complex reasoning |
| Anthropic | Claude 3 Haiku | $1.25 | Fast, affordable |
| Anthropic | Claude 3 Opus | $75 | Best quality |
| Ollama Local | Self-hosted | Free (hardware cost) | Privacy, control |
Prices are approximate and subject to change. Check provider pricing pages for current rates.
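Putting the table and the tier system together, a cost-aware strategy picks the cheapest model whose capability covers a given tier. The cost figures come from the table above; the capability assignments are illustrative assumptions loosely based on its "Best for" column:

```python
# Approximate cost per 1M tokens (USD), from the comparison table.
COSTS = {
    "ollama-cloud/qwen2.5:32b": 0.0,
    "openai/gpt-3.5-turbo": 1.50,
    "openai/gpt-4-turbo": 30.0,
    "anthropic/claude-3-haiku": 1.25,
    "anthropic/claude-3-opus": 75.0,
}
# Hypothetical highest tier each model is suited for (an assumption).
SUITED_FOR = {
    "ollama-cloud/qwen2.5:32b": "moderate",
    "openai/gpt-3.5-turbo": "moderate",
    "openai/gpt-4-turbo": "reasoning",
    "anthropic/claude-3-haiku": "moderate",
    "anthropic/claude-3-opus": "reasoning",
}
TIER_RANK = {"simple": 0, "moderate": 1, "complex": 2, "reasoning": 3}

def cheapest_model_for(tier):
    """Pick the cheapest model whose capability covers the requested tier."""
    candidates = [m for m, t in SUITED_FOR.items()
                  if TIER_RANK[t] >= TIER_RANK[tier]]
    return min(candidates, key=lambda m: COSTS[m])
```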
Troubleshooting Providers
Provider Not Available
Slow Responses
Model is overloaded
Switch to a faster model or a different provider.
Network latency
Check your internet connection. Consider using a local Ollama instance for zero-latency responses.
Rate limiting
You may be hitting provider rate limits. Upgrade your plan or spread queries across multiple providers.
Fallback Not Working
Check fallback logic
Review your tier mappings. Ensure there is a fallback path from higher to lower tiers.
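As a quick sanity check, a short script can verify that every tier in the fallback chain resolves to a provider. `validate_tier_mapping` is a hypothetical helper, not a built-in command:

```python
FALLBACK_ORDER = ["reasoning", "complex", "moderate", "simple"]

def validate_tier_mapping(mapping):
    """Return a list of problems that would break tier fallback."""
    problems = []
    for tier in FALLBACK_ORDER:
        # Every tier needs a provider so the fallback walk can terminate.
        if not mapping.get(tier):
            problems.append(f"tier '{tier}' has no provider assigned")
    return problems
```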
Next Steps
Deep dive into smart router architecture and query classification