Ollama exposes a RESTful HTTP API that listens by default on http://localhost:11434. The main endpoints are:
1. Generate Text (POST /api/generate):
```bash
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Hello, how are you?",
  "stream": false
}'
```
2. Chat (POST /api/chat):
```bash
curl http://localhost:11434/api/chat -d '{
  "model": "llama2",
  "messages": [
    { "role": "user", "content": "Hello!" },
    { "role": "assistant", "content": "Hi there!" },
    { "role": "user", "content": "How are you?" }
  ]
}'
```
3. List Models (GET /api/tags):
```bash
curl http://localhost:11434/api/tags
```
4. Show Model Info (POST /api/show):
```bash
curl http://localhost:11434/api/show -d '{ "name": "llama2" }'
```
5. Copy Model (POST /api/copy):
```bash
curl http://localhost:11434/api/copy -d '{ "source": "llama2", "destination": "my-llama2" }'
```
6. Delete Model (DELETE /api/delete):
```bash
curl -X DELETE http://localhost:11434/api/delete -d '{ "name": "llama2" }'
```
7. Pull Model (POST /api/pull):
```bash
curl http://localhost:11434/api/pull -d '{ "name": "llama2" }'
```
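Pulling a model can take a while, and the pull endpoint streams newline-delimited JSON status objects as the download progresses. The sketch below shows one way to watch that progress from Python; the model name is a placeholder, and only the "status" field is relied on here.

```python
import json
import requests

# Stream the pull progress; each line of the response is a JSON status object
with requests.post(
    'http://localhost:11434/api/pull',
    json={'name': 'llama2'},
    stream=True,
) as response:
    for line in response.iter_lines():
        if not line:
            continue
        status = json.loads(line)
        # Status objects carry a "status" field, e.g. "pulling manifest" or "success"
        print(status.get('status'))
```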
Streaming Response:
Set "stream": true to get streaming responses, suitable for real-time display of generated content.
Python Integration Example:
```python
import requests

# Non-streaming request: the full completion is returned in the 'response' field
response = requests.post(
    'http://localhost:11434/api/generate',
    json={
        'model': 'llama2',
        'prompt': 'Tell me a joke',
        'stream': False,
    },
)
print(response.json()['response'])
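```

The chat endpoint can be called the same way; in a non-streaming call the assistant's reply comes back under the "message" field. A minimal sketch, again using the requests library with a placeholder model and message:

```python
import requests

# Non-streaming chat request; the reply text is in message.content
response = requests.post(
    'http://localhost:11434/api/chat',
    json={
        'model': 'llama2',
        'messages': [{'role': 'user', 'content': 'How are you?'}],
        'stream': False,
    },
)
print(response.json()['message']['content'])
```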