Drew Bednar 952144538a | 3 weeks ago
Jetson-ollama
Simple hosting of Ollama and additional value-add services on a Jetson Orin.
Install
If you are using a JetPack version that doesn't have pre-built images yet, install https://github.com/dusty-nv/jetson-containers and use its commands to build your own Docker containers.
This repo simply hosts a docker-compose file and a systemd service file that ensures the docker-compose services are started on boot.
sudo mkdir -p /opt/ollama
sudo cp docker-compose-ollama.yaml /opt/ollama/docker-compose-ollama.yaml
sudo cp ollama.service /etc/systemd/system/ollama.service
sudo systemctl enable ollama.service
sudo systemctl start ollama.service
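After starting the service, the containers may take a moment to come up. The sketch below polls the API until it responds. It assumes the API is reachable on port 11434 as in the examples in this README; OLLAMA_URL and wait_for_ollama are illustrative names, not part of this repo.

```shell
# Base URL of the Ollama API (assumed default; override via the environment).
OLLAMA_URL=${OLLAMA_URL:-http://localhost:11434}

# Poll the API until it answers or the attempts run out.
# $1: maximum number of attempts, one per second (default 30).
wait_for_ollama() {
  attempts=${1:-30}
  while [ "$attempts" -gt 0 ]; do
    # /api/version returns a small JSON document once the server is up.
    if curl -fsS "$OLLAMA_URL/api/version" >/dev/null 2>&1; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 1
  done
  return 1
}

# Usage:
#   wait_for_ollama 60 && echo "Ollama is ready"
```

If the function returns non-zero, `systemctl status ollama.service` is a reasonable next place to look.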
Hosting a Model
You will now need to pull and then run a model:
curl http://localhost:11434/api/pull -d '{
"model": "mistral-nemo"
}'
Output:
{"status":"pulling manifest"}
{"status":"pulling b559938ab7a0","digest":"sha256:b559938ab7a0392fc9ea9675b82280f2a15669ec3e0e0fc491c9cb0a7681cf94","total":7071700672,"completed":7071700672}
{"status":"pulling f023d1ce0e55","digest":"sha256:f023d1ce0e55d0dcdeaf70ad81555c2a20822ed607a7abd8de3c3131360f5f0a","total":688,"completed":688}
{"status":"pulling 43070e2d4e53","digest":"sha256:43070e2d4e532684de521b885f385d0841030efa2b1a20bafb76133a5e1379c1","total":11356,"completed":11356}
{"status":"pulling ed11eda7790d","digest":"sha256:ed11eda7790d05b49395598a42b155812b17e263214292f7b87d15e14003d337","total":30,"completed":30}
{"status":"pulling 65d37de20e59","digest":"sha256:65d37de20e5951c7434ad4230c51a4d5be99b8cb7407d2135074d82c40b44b45","total":486,"completed":486}
{"status":"verifying sha256 digest"}
{"status":"writing manifest"}
{"status":"success"}
We can validate that at least one model is available by checking the tags:
curl http://0.0.0.0:11434/api/tags
Output:
{"models":[{"name":"mistral-nemo:latest","model":"mistral-nemo:latest","modified_at":"2024-12-25T00:01:25.24932255Z","size":7071713232,"digest":"994f3b8b78011aa6d578b0c889cbb89a64b778f80d73b8d991a8db1f1e710ace","details":{"parent_model":"","format":"gguf","family":"llama","families":["llama"],"parameter_size":"12.2B","quantization_level":"Q4_0"}}]}
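For scripting, the same check can be reduced to a yes/no answer. This is a rough sketch that matches the compact JSON shown above with grep rather than parsing it properly; has_model is an illustrative helper, not part of Ollama or this repo.

```shell
# Read an /api/tags response on stdin and succeed if the given model name
# appears as a "name" field. -F treats the name as a literal string, so
# characters such as "." are not interpreted as a pattern.
# Assumes the compact (no whitespace) JSON layout shown in this README.
has_model() {
  grep -qF "\"name\":\"$1\""
}

# Usage:
#   curl -s http://localhost:11434/api/tags | has_model mistral-nemo:latest \
#     && echo "installed"
```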
We can check if we have any running models:
curl http://0.0.0.0:11434/api/ps
Output:
{"models":[]}
We can load a model into memory by submitting a request with no prompt:
curl http://0.0.0.0:11434/api/generate -d '{
"model": "mistral-nemo"
}'
Output:
{"model":"mistral-nemo","created_at":"2024-12-25T00:13:16.691913415Z","response":"","done":true,"done_reason":"load"}
Checking the running models again shows that it is now loaded:
curl http://0.0.0.0:11434/api/ps
Output:
{"models":[{"name":"mistral-nemo:latest","model":"mistral-nemo:latest","size":9290250240,"digest":"994f3b8b78011aa6d578b0c889cbb89a64b778f80d73b8d991a8db1f1e710ace","details":{"parent_model":"","format":"gguf","family":"llama","families":["llama"],"parameter_size":"12.2B","quantization_level":"Q4_0"},"expires_at":"2024-12-25T00:18:16.692451257Z","size_vram":9290250240}]}
The expires_at value shows that the model will remain in memory for five minutes after it was loaded, then it will expire and be unloaded. For comparison, the current time is:
date -u
Wed Dec 25 00:15:21 UTC 2024
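To see how much time is left, the timestamps above can be subtracted. The following is a minimal sketch that assumes GNU date (as found on the Ubuntu-based JetPack images); to_epoch and seconds_until are illustrative helper names.

```shell
# Convert an RFC 3339 UTC timestamp (as printed by /api/ps) to epoch seconds.
# Fractional seconds are dropped before parsing. Assumes GNU date.
to_epoch() {
  ts=${1%Z}       # strip the trailing Z
  ts=${ts%%.*}    # strip fractional seconds, if any
  date -u -d "${ts}Z" +%s
}

# Print the number of seconds from $2 (now) until $1 (expires_at).
seconds_until() {
  echo $(( $(to_epoch "$1") - $(to_epoch "$2") ))
}

# Time remaining when `date -u` was run above:
seconds_until "2024-12-25T00:18:16.692451257Z" "2024-12-25T00:15:21Z"            # prints 175
# Total lifetime, measured from the load response's created_at:
seconds_until "2024-12-25T00:18:16.692451257Z" "2024-12-25T00:13:16.691913415Z"  # prints 300
```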
If you want to keep the model in memory indefinitely, use the keep_alive parameter. A value of -1 sets an infinite expiry. A value of 0 unloads the model from memory. Other values are duration strings such as 5m or 1h.
curl http://0.0.0.0:11434/api/generate -d '{
"model": "mistral-nemo",
"keep_alive": -1
}'
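Because keep_alive is a bare number for -1 and 0 but a quoted string for durations, it is easy to get the quoting wrong by hand. The sketch below builds the request body for either case; keep_alive_body is an illustrative helper, not part of Ollama or this repo, and it assumes the model name needs no JSON escaping.

```shell
# Build the JSON body for /api/generate with a keep_alive value.
# $1: model name, $2: keep_alive value (-1, 0, or a duration such as 5m).
keep_alive_body() {
  model=$1
  keep_alive=$2
  case $keep_alive in
    # Anything that is not a plain integer is sent as a JSON string.
    ''|*[!0-9-]*) keep_alive="\"$keep_alive\"" ;;
  esac
  printf '{"model": "%s", "keep_alive": %s}\n' "$model" "$keep_alive"
}

keep_alive_body mistral-nemo -1   # prints {"model": "mistral-nemo", "keep_alive": -1}
keep_alive_body mistral-nemo 0    # prints {"model": "mistral-nemo", "keep_alive": 0}
keep_alive_body mistral-nemo 1h   # prints {"model": "mistral-nemo", "keep_alive": "1h"}

# Usage (unload the model right away):
#   curl http://0.0.0.0:11434/api/generate -d "$(keep_alive_body mistral-nemo 0)"
```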
Lastly, we can submit a chat request like this:
curl http://0.0.0.0:11434/api/chat -d '{
"model": "mistral-nemo",
"messages": [
{
"role": "user",
"content": "Why is the sky blue?"
}
]
}'
Output:
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:08.203017147Z","message":{"role":"assistant","content":"The"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:08.38823807Z","message":{"role":"assistant","content":" sky"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:08.546712356Z","message":{"role":"assistant","content":" appears"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:08.705009365Z","message":{"role":"assistant","content":" blue"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:08.858441986Z","message":{"role":"assistant","content":" due"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:09.011519221Z","message":{"role":"assistant","content":" to"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:09.164679014Z","message":{"role":"assistant","content":" a"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:09.317643996Z","message":{"role":"assistant","content":" phenomenon"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:09.470739343Z","message":{"role":"assistant","content":" called"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:09.623952448Z","message":{"role":"assistant","content":" Ray"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:09.777503755Z","message":{"role":"assistant","content":"leigh"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:09.929986217Z","message":{"role":"assistant","content":" scattering"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:10.083242522Z","message":{"role":"assistant","content":"."},"done":false}
...
This continues until the model generates a stop token. Naturally, this API is cumbersome for general use, so we also included Open WebUI, which is hosted on port 11433.
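For quick tests from the command line, the streamed chunks can be stitched back into a single line of text. This is a rough sketch using sed and tr only; join_chat_stream is an illustrative helper, and it deliberately ignores JSON escapes (chunks containing escaped quotes are skipped, and sequences such as \n are left undecoded). If jq is installed, `jq -j '.message.content'` is a more robust alternative.

```shell
# Read streamed /api/chat chunks on stdin (one JSON object per line) and
# print the concatenated assistant text followed by a newline.
# Limitation: JSON escape sequences inside "content" are not decoded.
join_chat_stream() {
  sed -n 's/.*"message":{"role":"assistant","content":"\([^"]*\)"}.*/\1/p' | tr -d '\n'
  echo
}

# Usage:
#   curl -s http://0.0.0.0:11434/api/chat -d '{
#     "model": "mistral-nemo",
#     "messages": [{"role": "user", "content": "Why is the sky blue?"}]
#   }' | join_chat_stream
```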