When deploying Ollama in production environments, consider the following key aspects:
1. System Requirements:
Hardware Requirements:
- CPU: Modern x86-64 processor with AVX2 instruction set support
- Memory: 8GB RAM minimum, 16GB+ recommended
- Storage: SSD recommended; plan for roughly 4-20GB per model
- GPU (optional): NVIDIA GPU (CUDA 11.0+) or Apple Silicon (M1/M2/M3)
Operating System:
- Linux (recommended Ubuntu 20.04+)
- macOS 11+
- Windows 10/11
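To verify these requirements on a Linux host, a few standard utilities suffice (these are generic Linux commands, not part of Ollama):
```bash
# Check for AVX2 support (no output means the CPU lacks it)
grep -m1 -o avx2 /proc/cpuinfo

# Check available RAM and disk space
free -h
df -h ~

# Check for an NVIDIA GPU (requires the NVIDIA driver to be installed)
nvidia-smi
```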
2. Deployment Architecture:
Single Machine Deployment:
```bash
# Install and start the service
ollama serve   # Listens on 127.0.0.1:11434 by default

# To accept connections from other machines, bind to all interfaces
OLLAMA_HOST=0.0.0.0:11434 ollama serve
```
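On Linux installs where Ollama runs as a systemd service (the official install script sets this up), the binding can be made persistent with a service override; this is a sketch assuming the default `ollama` unit name:
```bash
# Open an override file for the ollama unit
sudo systemctl edit ollama
# Add the following lines in the editor:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0:11434"
sudo systemctl restart ollama
```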
Docker Deployment:
```dockerfile
FROM ollama/ollama
# Bake custom model weights into the image. Copy them outside /root/.ollama,
# which is typically shadowed by a volume mount at run time; a raw GGUF file
# must still be registered with `ollama create -f Modelfile` before it can be served.
COPY my-model.gguf /models/my-model.gguf
# The base image's entrypoint is /bin/ollama, so only the subcommand is needed
CMD ["serve"]
```
```bash
# Run the container with persistent model storage and GPU access
# (--gpus all requires the NVIDIA Container Toolkit on the host)
docker run -d --name ollama -v ollama:/root/.ollama -p 11434:11434 --gpus all ollama/ollama
```
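Once the container is up, models can be pulled into the mounted volume so they survive restarts; this assumes the container was started with `--name ollama` as above, and `llama3.1` stands in for whatever model you serve:
```bash
# Pull a model into the named volume (persists across container restarts)
docker exec -it ollama ollama pull llama3.1

# Smoke test against the containerized API
curl http://localhost:11434/api/generate -d '{"model": "llama3.1", "prompt": "Hello", "stream": false}'
```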
3. Load Balancing:
Using Nginx as a reverse proxy:
```nginx
upstream ollama_backend {
    server 192.168.1.10:11434;
    server 192.168.1.11:11434;
    server 192.168.1.12:11434;
}

server {
    listen 80;
    server_name ollama.example.com;

    location / {
        proxy_pass http://ollama_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```
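Nginx's default round-robin only balances requests; each backend needs its own copy of every model. A quick sanity check before adding a backend to the pool (the IPs are the hypothetical ones from the config above):
```bash
# Confirm every backend answers before putting it in rotation
for host in 192.168.1.10 192.168.1.11 192.168.1.12; do
  curl -s -o /dev/null -w "$host: %{http_code}\n" "http://$host:11434/api/tags"
done
```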
4. Monitoring and Logging:
Health Check:
```bash
curl http://localhost:11434/api/tags
```
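The same endpoint works as a scripted liveness probe; a minimal sketch suitable for cron or a container health check:
```bash
# Exit non-zero if the API does not answer within 5 seconds
if ! curl -sf --max-time 5 http://localhost:11434/api/tags > /dev/null; then
  echo "ollama unhealthy" >&2
  exit 1
fi
```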
Log Management:
```bash
# View real-time logs (Linux systemd install)
journalctl -u ollama -f

# On macOS the server writes to a log file instead
tail -f ~/.ollama/logs/server.log

# Enable debug logging
export OLLAMA_DEBUG=1
```
5. Security Configuration:
API Authentication: Ollama has no built-in authentication, so add it at the reverse proxy:
```nginx
location /api/ {
    auth_basic "Restricted";
    auth_basic_user_file /etc/nginx/.htpasswd;
    proxy_pass http://localhost:11434/api/;
}
```
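The credentials file referenced by `auth_basic_user_file` can be created with `htpasswd` (from the `apache2-utils` package on Debian/Ubuntu); the `apiuser` name here is just an example:
```bash
# Create the credentials file (-c creates it; omit -c to add more users)
sudo htpasswd -c /etc/nginx/.htpasswd apiuser
sudo nginx -t && sudo systemctl reload nginx

# Clients then authenticate with HTTP basic auth
curl -u apiuser:secret http://ollama.example.com/api/tags
```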
Firewall Configuration:
```bash
# Allow only the trusted subnet, then block everyone else (ufw evaluates rules in order)
sudo ufw allow from 192.168.1.0/24 to any port 11434 proto tcp
sudo ufw deny 11434/tcp
```
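Before opening the port, it is worth confirming which interface the server is actually bound to:
```bash
# 127.0.0.1:11434 means local-only; 0.0.0.0:11434 means all interfaces
ss -tlnp | grep 11434
```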
6. Performance Optimization:
Model Preloading:
```bash
# Preload a model by sending an empty generate request
curl http://localhost:11434/api/generate -d '{"model": "llama3.1"}'
```
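By default Ollama unloads a model after about five minutes of inactivity; the request-level `keep_alive` field overrides this:
```bash
# keep_alive: -1 keeps the model resident until the server stops
curl http://localhost:11434/api/generate -d '{"model": "llama3.1", "keep_alive": -1}'
```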
Concurrent Processing:
```bash
# Concurrency is configured via server environment variables, not in the Modelfile
export OLLAMA_NUM_PARALLEL=4        # parallel requests per loaded model
export OLLAMA_MAX_LOADED_MODELS=2   # models kept in memory at once
```
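For the Docker deployment, the same variables are passed with `-e` flags; the values here are illustrative:
```bash
docker run -d --name ollama \
  -e OLLAMA_NUM_PARALLEL=4 \
  -e OLLAMA_MAX_LOADED_MODELS=2 \
  -v ollama:/root/.ollama -p 11434:11434 --gpus all ollama/ollama
```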
7. Backup and Recovery:
```bash
# Backup models (stop the service first so blobs are not written mid-archive)
tar -czf ollama-backup.tar.gz ~/.ollama/

# Restore models
tar -xzf ollama-backup.tar.gz -C ~/
```
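For scheduled backups, a cron entry along these lines works; the `/backups` path is hypothetical, and the `\%` escaping is required because cron treats a bare `%` as a newline:
```bash
# /etc/cron.d/ollama-backup: nightly archive at 03:00 (assumes the systemd service)
0 3 * * * root systemctl stop ollama && tar -czf /backups/ollama-$(date +\%F).tar.gz /root/.ollama && systemctl start ollama
```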