# Jetson-ollama

Simple hosting of ollama and additional value-add services on a Jetson Orin.

## Install

If you are ever using a JetPack version that doesn't have pre-built images yet, you will want to install https://github.com/dusty-nv/jetson-containers and use its commands to build your own docker containers.

This repo simply hosts a docker-compose file and a systemd service file to ensure the docker-compose services are started on boot.

```
sudo mkdir -p /opt/ollama
sudo cp docker-compose-ollama.yaml /opt/ollama/docker-compose-ollama.yaml
```

```
sudo cp ollama.service /etc/systemd/system/ollama.service
sudo systemctl enable ollama.service
sudo systemctl start ollama.service
```

## Hosting a Model

You now need to pull and then run a model:

```
curl http://localhost:11434/api/pull -d '{
  "model": "mistral-nemo"
}'
```

Output:

```
{"status":"pulling manifest"}
{"status":"pulling b559938ab7a0","digest":"sha256:b559938ab7a0392fc9ea9675b82280f2a15669ec3e0e0fc491c9cb0a7681cf94","total":7071700672,"completed":7071700672}
{"status":"pulling f023d1ce0e55","digest":"sha256:f023d1ce0e55d0dcdeaf70ad81555c2a20822ed607a7abd8de3c3131360f5f0a","total":688,"completed":688}
{"status":"pulling 43070e2d4e53","digest":"sha256:43070e2d4e532684de521b885f385d0841030efa2b1a20bafb76133a5e1379c1","total":11356,"completed":11356}
{"status":"pulling ed11eda7790d","digest":"sha256:ed11eda7790d05b49395598a42b155812b17e263214292f7b87d15e14003d337","total":30,"completed":30}
{"status":"pulling 65d37de20e59","digest":"sha256:65d37de20e5951c7434ad4230c51a4d5be99b8cb7407d2135074d82c40b44b45","total":486,"completed":486}
{"status":"verifying sha256 digest"}
{"status":"writing manifest"}
{"status":"success"}
```

We can validate that at least one model is available by checking the tags:

```
curl http://0.0.0.0:11434/api/tags
```

Output:

```
{"models":[{"name":"mistral-nemo:latest","model":"mistral-nemo:latest","modified_at":"2024-12-25T00:01:25.24932255Z","size":7071713232,"digest":"994f3b8b78011aa6d578b0c889cbb89a64b778f80d73b8d991a8db1f1e710ace","details":{"parent_model":"","format":"gguf","family":"llama","families":["llama"],"parameter_size":"12.2B","quantization_level":"Q4_0"}}]}
```

We can check whether any models are currently running:

```
curl http://0.0.0.0:11434/api/ps
```

Output:

```
{"models":[]}
```

We can load a model into memory by submitting an empty generate request:

```
curl http://0.0.0.0:11434/api/generate -d '{
  "model": "mistral-nemo"
}'
```

Output:

```
{"model":"mistral-nemo","created_at":"2024-12-25T00:13:16.691913415Z","response":"","done":true,"done_reason":"load"}
```

```
curl http://0.0.0.0:11434/api/ps
```

Output:

```
{"models":[{"name":"mistral-nemo:latest","model":"mistral-nemo:latest","size":9290250240,"digest":"994f3b8b78011aa6d578b0c889cbb89a64b778f80d73b8d991a8db1f1e710ace","details":{"parent_model":"","format":"gguf","family":"llama","families":["llama"],"parameter_size":"12.2B","quantization_level":"Q4_0"},"expires_at":"2024-12-25T00:18:16.692451257Z","size_vram":9290250240}]}
```

The `expires_at` field shows that the model will remain in memory for five minutes after loading (ollama's default), then it will expire and be unloaded:

```
date -u
Wed Dec 25 00:15:21 UTC 2024
```

If you want to keep the model in memory indefinitely, use the `keep_alive` parameter. A value of `0` unloads the model immediately, a value of `-1` sets an infinite expiry, and other values are durations such as `5m`, `1h`, or `3d`.
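For example, to unload the model right away, a request like the following should work (a minimal sketch reusing the same `/api/generate` endpoint shown above):

```
# Sketch: keep_alive of 0 asks ollama to unload the model immediately
curl http://0.0.0.0:11434/api/generate -d '{
  "model": "mistral-nemo",
  "keep_alive": 0
}'
```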
And to keep the model loaded indefinitely:

```
curl http://0.0.0.0:11434/api/generate -d '{
  "model": "mistral-nemo",
  "keep_alive": -1
}'
```

Lastly, we can submit a chat request:

```
curl http://0.0.0.0:11434/api/chat -d '{
  "model": "mistral-nemo",
  "messages": [
    {
      "role": "user",
      "content": "Why is the sky blue?"
    }
  ]
}'
```

Output:

```
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:08.203017147Z","message":{"role":"assistant","content":"The"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:08.38823807Z","message":{"role":"assistant","content":" sky"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:08.546712356Z","message":{"role":"assistant","content":" appears"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:08.705009365Z","message":{"role":"assistant","content":" blue"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:08.858441986Z","message":{"role":"assistant","content":" due"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:09.011519221Z","message":{"role":"assistant","content":" to"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:09.164679014Z","message":{"role":"assistant","content":" a"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:09.317643996Z","message":{"role":"assistant","content":" phenomenon"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:09.470739343Z","message":{"role":"assistant","content":" called"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:09.623952448Z","message":{"role":"assistant","content":" Ray"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:09.777503755Z","message":{"role":"assistant","content":"leigh"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:09.929986217Z","message":{"role":"assistant","content":" scattering"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:10.083242522Z","message":{"role":"assistant","content":"."},"done":false}
...
```

The stream continues until the model generates its stop token; a non-streaming variant is sketched below. Naturally, this API method is cumbersome for general use, so we also include [Open WebUI](https://github.com/open-webui/open-webui), which is hosted on port 11433.
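If you would rather get the whole answer in a single response, the ollama chat API also accepts a `stream` parameter; setting it to `false` should return one JSON object instead of a token stream (a minimal sketch based on the request above):

```
# Sketch: stream=false makes ollama return the full reply as one JSON object
curl http://0.0.0.0:11434/api/chat -d '{
  "model": "mistral-nemo",
  "stream": false,
  "messages": [
    {
      "role": "user",
      "content": "Why is the sky blue?"
    }
  ]
}'
```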