Ollama Integration
Introduction
Ollama is a framework for running large language models (LLMs) locally on your machine. RA.Aid provides robust integration with Ollama, allowing you to leverage locally-hosted models without requiring external API access or keys.
This integration gives you the ability to:
- Run RA.Aid entirely offline using local models
- Avoid API costs associated with cloud LLM providers
- Use custom fine-tuned models specific to your needs
- Control your data security by keeping all interactions local
Requirements and Setup
Installation
- Install Ollama following the instructions on the official website
- Verify Ollama is running with:
ollama list
- Pull the models you want to use:
ollama pull justinledwards/mistral-small-3.1-Q6_K
ollama pull qwq:32b
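If you would rather verify the server over HTTP (for example, from a script), Ollama exposes a small REST API on the same port RA.Aid connects to. This is a minimal sanity check assuming the default port 11434; adjust the host if your setup differs.

```bash
# Confirm the Ollama server is reachable on its default port (11434)
# A successful response is a JSON listing of locally available models;
# a connection error means Ollama is not running or listens elsewhere.
curl -s http://localhost:11434/api/tags
```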
Compatible Models
RA.Aid works with many Ollama models, including:
| Model Name | Description |
|---|---|
| justinledwards/mistral-small-3.1-Q6_K | Mistral AI's optimized small model |
| qwq:32b | High-performing yet small reasoning model |
| MHKetbi/Qwen2.5-Coder-32B-Instruct | Qwen 2.5 Coder Instruct model |
You can also use any custom model you've created or imported into Ollama.
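As a sketch of that workflow, Ollama can derive a custom model from an existing base via a Modelfile. The model name (my-qwq-coder) and the parameter value below are purely illustrative.

```bash
# Illustrative Modelfile that derives a custom model from an existing base
cat > Modelfile <<'EOF'
FROM qwq:32b
# Lower the sampling temperature for more deterministic coding output
PARAMETER temperature 0.2
EOF

# Build the custom model and confirm it is now available to RA.Aid
ollama create my-qwq-coder -f Modelfile
ollama list
```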
Configuration Options
Environment Variables
Ollama configuration is primarily controlled through the following environment variables:
| Environment Variable | Purpose | Default Value |
|---|---|---|
| OLLAMA_BASE_URL | The URL where Ollama is running | http://localhost:11434 |
| EXPERT_OLLAMA_BASE_URL | Separate URL for expert models | Same as OLLAMA_BASE_URL |
Unlike other providers (OpenAI, Anthropic, etc.), Ollama doesn't require an API key since it runs locally.
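For example, if Ollama runs on another machine on your network, you can point RA.Aid at it before launching. The hostnames and ports below are placeholders for your own setup.

```bash
# Point RA.Aid at a remote Ollama instance (host/port are placeholders)
export OLLAMA_BASE_URL="http://192.168.1.50:11434"

# Optionally send expert queries to a different Ollama host
export EXPERT_OLLAMA_BASE_URL="http://192.168.1.51:11434"

ra-aid --provider ollama --model qwq:32b -m "Summarize the project structure"
```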
Command-Line Parameters
RA.Aid provides several command-line parameters specifically for Ollama configuration:
ra-aid --provider ollama --model <model_name> [options]
| Parameter | Description | Default |
|---|---|---|
| --provider ollama | Sets Ollama as the LLM provider | - |
| --model <model_name> | Specifies which Ollama model to use | - |
| --num-ctx <number> | Sets the context window size in tokens | 262144 |
| --expert-provider ollama | Uses Ollama for expert queries | - |
| --expert-model <model_name> | Sets which Ollama model to use for expert queries | - |
| --expert-num-ctx <number> | Sets the expert context window size | 262144 |
| --temperature <float> | Controls response randomness (0.0-1.0) | 0.7 |
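Putting several of these flags together, a fully specified invocation might look like the following; the exact values are only illustrative and should be tuned to your hardware and task.

```bash
# Example combining the main and expert parameters from the table above
ra-aid \
  --provider ollama --model justinledwards/mistral-small-3.1-Q6_K \
  --num-ctx 8192 \
  --temperature 0.3 \
  --expert-provider ollama --expert-model qwq:32b \
  --expert-num-ctx 16384 \
  -m "Review the logging configuration and suggest improvements"
```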
Usage Examples
Basic Usage
Run RA.Aid with Ollama's justinledwards/mistral-small-3.1-Q6_K model:
ra-aid --provider ollama --model justinledwards/mistral-small-3.1-Q6_K -m "Add unit tests to the database module"
Adjusting Context Window Size
For complex tasks that require more context:
ra-aid --provider ollama --model justinledwards/mistral-small-3.1-Q6_K --num-ctx 8192 -m "Refactor the entire error handling system"
Using Different Models for Expert Mode
Configure separate models for main tasks and expert queries:
ra-aid --provider ollama --model justinledwards/mistral-small-3.1-Q6_K --expert-provider ollama --expert-model qwq:32b -m "Create a React component for user authentication"
Ollama-Specific Features
Context Window Size (num-ctx)
The --num-ctx parameter controls how many tokens the model can process at once, affecting:
- Input Processing: Larger values allow RA.Aid to provide more context from your codebase
- Memory Capacity: How much information the model can reference while working
- Generation Length: Influences how detailed the responses can be
The optimal value depends on:
- The specific model you're using (some models have built-in limits)
- Your available system resources (larger values require more RAM)
- The complexity of your project and tasks
For most development tasks, values between 4,096 and 16,384 work well. Extremely large values (like the default 262,144) are typically unnecessary and may consume excessive resources.
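Before raising --num-ctx, it can help to check what context length the model itself was built with; ollama show prints a model's metadata, including its context length in recent Ollama releases, though the exact output format varies by version.

```bash
# Inspect a model's metadata (including its context length) before tuning --num-ctx
ollama show qwq:32b

# Then run RA.Aid with a context window that stays within the model's limit
ra-aid --provider ollama --model qwq:32b --num-ctx 8192 -m "Document the API endpoints"
```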
Local Model Advantages
Using Ollama with RA.Aid provides several benefits:
- Privacy: Your code and prompts never leave your machine
- No Rate Limits: Unlimited usage without API quotas or rate limits
- Offline Operation: Work without internet connectivity
- Cost-Free: No usage-based billing or API charges
- Customization: Use custom models fine-tuned for your specific needs
Troubleshooting
Common Issues
Ollama Not Running
Symptoms: RA.Aid reports connection errors when trying to use Ollama models.
Solution:
- Verify Ollama is running with ollama list
- Start Ollama if needed (this varies by OS): ollama serve
- Check if Ollama is running on a different port and set OLLAMA_BASE_URL accordingly
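The checks above can also be done from the shell directly; this is a sketch assuming the default port, with the nonstandard port in the last line used only as a placeholder.

```bash
# 1. Check whether the Ollama API responds on its default port
curl -s http://localhost:11434/api/tags > /dev/null \
  && echo "Ollama is running" \
  || echo "Ollama is not reachable"

# 2. Start the server if needed (runs in the foreground; use a separate
#    terminal or your OS service manager)
ollama serve

# 3. If Ollama listens elsewhere, point RA.Aid at it (host/port are placeholders)
export OLLAMA_BASE_URL="http://localhost:11500"
```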
Model Not Found
Symptoms: Error message indicating the specified model doesn't exist.
Solution:
- List available models with ollama list
- Pull the model you want: ollama pull <model_name>
- Check for typos in the model name
Out of Memory Errors
Symptoms: Ollama crashes or returns errors when processing large inputs.
Solution:
- Use a smaller --num-ctx value
- Try a more memory-efficient model
- Close other memory-intensive applications
- Ensure your system has enough RAM for the chosen model
Slow Performance
Symptoms: Responses take a very long time to generate.
Solution:
- Use a smaller or more efficient model
- Reduce the --num-ctx value
- Check if your GPU is being utilized (if available)
- Consider using quantized versions of models and/or smaller models
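To see whether a loaded model is actually using your GPU, newer Ollama releases provide ollama ps, which reports how each running model is being served; treat the exact columns as version-dependent, and the quantized tag below as a placeholder.

```bash
# Show currently loaded models and whether they are served on CPU or GPU
ollama ps

# If a model is falling back to CPU, pulling a more aggressively quantized
# tag (placeholder below) may let it fit entirely in GPU memory
ollama pull <model_name>:<quantized_tag>
```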
Best Practices
- Match Model to Task: Use code-specific models for programming tasks and general models for research or planning.
- Start with Smaller Context: Begin with a modest --num-ctx value (4096-8192) and increase only if needed.
- Quantized Models: For better performance on consumer hardware, use quantized models (identified by q4, q5, etc. in their name).
- GPU Acceleration: If you have a compatible GPU, ensure Ollama is configured to use it for significant speed improvements.
Expert Model Configuration
RA.Aid can route specialized expert queries to a separate model from the one handling the main task. With Ollama, you can configure this using:
ra-aid --provider ollama --model justinledwards/mistral-small-3.1-Q6_K --expert-provider ollama --expert-model qwq:32b -m "Your task here"
This configuration uses:
- justinledwards/mistral-small-3.1-Q6_K for the main task planning and implementation
- qwq:32b for expert queries requiring deeper code analysis
You can also use different providers for main and expert functions:
ra-aid --provider ollama --model justinledwards/mistral-small-3.1-Q6_K --expert-provider anthropic --expert-model claude-3-sonnet-20240229 -m "Your task here"
This hybrid approach lets you:
- Use local models for most operations, saving API costs
- Leverage cloud models only for complex expert queries
- Optimize your workflow by matching models to specific query types
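When the expert provider is a cloud service, that provider's API key must still be available in the environment. The variable name below follows Anthropic's standard convention and is an assumption about RA.Aid's configuration rather than something specific to the Ollama integration; check the provider documentation for the exact name it expects.

```bash
# Local Ollama model for the main loop, Anthropic for expert queries.
# The cloud provider still needs its API key (variable name assumed; value is a placeholder).
export ANTHROPIC_API_KEY="sk-ant-..."

ra-aid --provider ollama --model justinledwards/mistral-small-3.1-Q6_K \
       --expert-provider anthropic --expert-model claude-3-sonnet-20240229 \
       -m "Audit the authentication flow for security issues"
```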
In addition to the expert model for domain questions, RA.Aid supports the --reasoning-assistance flag, which enables the expert model (qwq:32b in the examples above) to help the primary model make optimal use of available tools. This functionality is separate from the expert query system and can significantly improve the performance of smaller models by providing guidance on tool selection and usage. For complete details on this feature, see our Reasoning Assistance guide.
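As a sketch, enabling the flag alongside an Ollama expert model looks like this; the particular combination of models is illustrative.

```bash
# Illustrative invocation: a small local model for the main loop, with the
# expert model guiding tool selection via reasoning assistance
ra-aid --provider ollama --model justinledwards/mistral-small-3.1-Q6_K \
       --expert-provider ollama --expert-model qwq:32b \
       --reasoning-assistance \
       -m "Add input validation to the API handlers"
```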