
Jetson-ollama

Simple hosting of ollama and additional value-add services on a Jetson Orin.

Install

If you are using a JetPack version that doesn't have pre-built images yet, you will want to install https://github.com/dusty-nv/jetson-containers and use its tooling to build your own Docker containers.
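A rough sketch of that workflow follows; the install script and package name are taken from the jetson-containers README and may change between releases:

git clone https://github.com/dusty-nv/jetson-containers
bash jetson-containers/install.sh
# build an ollama image matched to your JetPack/L4T version
jetson-containers build ollama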

This repo simply hosts a docker-compose file and a systemd service file to ensure the docker-compose services are started on boot.
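The service is a thin wrapper that brings the compose stack up and down with the host. A minimal sketch of what such a unit looks like (the actual ollama.service in this repo may differ in detail):

[Unit]
Description=ollama docker-compose stack
Requires=docker.service
After=docker.service

[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/opt/ollama
ExecStart=/usr/bin/docker compose -f docker-compose-ollama.yaml up -d
ExecStop=/usr/bin/docker compose -f docker-compose-ollama.yaml down

[Install]
WantedBy=multi-user.target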

sudo mkdir -p /opt/ollama
sudo cp docker-compose-ollama.yaml /opt/ollama/docker-compose-ollama.yaml
sudo cp ollama.service /etc/systemd/system/ollama.service
sudo systemctl daemon-reload
sudo systemctl enable ollama.service
sudo systemctl start ollama.service
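With the unit enabled, you can confirm that the stack came up and that the API is answering; /api/version is a lightweight endpoint for this:

systemctl status ollama.service
curl http://localhost:11434/api/version
# returns something like {"version":"..."}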

Hosting a Model

You will now need to pull and then run a model:

curl http://localhost:11434/api/pull -d '{
  "model": "mistral-nemo"
}'

Output:

{"status":"pulling manifest"}
{"status":"pulling b559938ab7a0","digest":"sha256:b559938ab7a0392fc9ea9675b82280f2a15669ec3e0e0fc491c9cb0a7681cf94","total":7071700672,"completed":7071700672}
{"status":"pulling f023d1ce0e55","digest":"sha256:f023d1ce0e55d0dcdeaf70ad81555c2a20822ed607a7abd8de3c3131360f5f0a","total":688,"completed":688}
{"status":"pulling 43070e2d4e53","digest":"sha256:43070e2d4e532684de521b885f385d0841030efa2b1a20bafb76133a5e1379c1","total":11356,"completed":11356}
{"status":"pulling ed11eda7790d","digest":"sha256:ed11eda7790d05b49395598a42b155812b17e263214292f7b87d15e14003d337","total":30,"completed":30}
{"status":"pulling 65d37de20e59","digest":"sha256:65d37de20e5951c7434ad4230c51a4d5be99b8cb7407d2135074d82c40b44b45","total":486,"completed":486}
{"status":"verifying sha256 digest"}
{"status":"writing manifest"}
{"status":"success"}

We can validate that at least one model is available by checking the tags:

curl http://0.0.0.0:11434/api/tags

Output:

{"models":[{"name":"mistral-nemo:latest","model":"mistral-nemo:latest","modified_at":"2024-12-25T00:01:25.24932255Z","size":7071713232,"digest":"994f3b8b78011aa6d578b0c889cbb89a64b778f80d73b8d991a8db1f1e710ace","details":{"parent_model":"","format":"gguf","family":"llama","families":["llama"],"parameter_size":"12.2B","quantization_level":"Q4_0"}}]}

We can check if we have any running models:

curl http://0.0.0.0:11434/api/ps

Output:

{"models":[]}

We can load a model into memory by submitting an empty request:

curl http://0.0.0.0:11434/api/generate -d '{
  "model": "mistral-nemo"
}'

Output:

{"model":"mistral-nemo","created_at":"2024-12-25T00:13:16.691913415Z","response":"","done":true,"done_reason":"load"}
If we check the running models again:

curl http://0.0.0.0:11434/api/ps

Output:

{"models":[{"name":"mistral-nemo:latest","model":"mistral-nemo:latest","size":9290250240,"digest":"994f3b8b78011aa6d578b0c889cbb89a64b778f80d73b8d991a8db1f1e710ace","details":{"parent_model":"","format":"gguf","family":"llama","families":["llama"],"parameter_size":"12.2B","quantization_level":"Q4_0"},"expires_at":"2024-12-25T00:18:16.692451257Z","size_vram":9290250240}]}% 

The expires_at field shows that the model will remain in memory for five minutes (the default keep_alive), after which it expires and is unloaded:

date -u
Wed Dec 25 00:15:21 UTC 2024

If we want to keep the model in memory indefinitely, we can use the keep_alive parameter. A value of -1 sets an infinite expiry. A value of 0 unloads the model from memory immediately. Other values take a duration of the form 5m, 1h, or 3d.

curl http://0.0.0.0:11434/api/generate -d '{
  "model": "mistral-nemo",
  "keep_alive": -1
}'
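Conversely, a keep_alive of 0 evicts the model immediately:

curl http://0.0.0.0:11434/api/generate -d '{
  "model": "mistral-nemo",
  "keep_alive": 0
}'

A follow-up call to /api/ps should then return an empty models list again.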

Lastly, we can submit a chat request:

curl http://0.0.0.0:11434/api/chat -d '{
  "model": "mistral-nemo",
  "messages": [
    {
      "role": "user",
      "content": "Why is the sky blue?"
    }
  ]
}'

Output:

{"model":"mistral-nemo","created_at":"2024-12-25T00:26:08.203017147Z","message":{"role":"assistant","content":"The"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:08.38823807Z","message":{"role":"assistant","content":" sky"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:08.546712356Z","message":{"role":"assistant","content":" appears"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:08.705009365Z","message":{"role":"assistant","content":" blue"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:08.858441986Z","message":{"role":"assistant","content":" due"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:09.011519221Z","message":{"role":"assistant","content":" to"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:09.164679014Z","message":{"role":"assistant","content":" a"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:09.317643996Z","message":{"role":"assistant","content":" phenomenon"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:09.470739343Z","message":{"role":"assistant","content":" called"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:09.623952448Z","message":{"role":"assistant","content":" Ray"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:09.777503755Z","message":{"role":"assistant","content":"leigh"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:09.929986217Z","message":{"role":"assistant","content":" scattering"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:10.083242522Z","message":{"role":"assistant","content":"."},"done":false}
...

The stream continues until the model generates its stop token. Naturally, this API is cumbersome for general use, so Open WebUI is also included, hosted on port 11433.
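If you do want to script against the API without handling the token stream, the chat endpoint accepts a "stream": false parameter and returns one complete response object:

curl http://0.0.0.0:11434/api/chat -d '{
  "model": "mistral-nemo",
  "stream": false,
  "messages": [
    {
      "role": "user",
      "content": "Why is the sky blue?"
    }
  ]
}'

For reference, wiring Open WebUI into the compose file next to ollama looks roughly like the sketch below. The image tags, service names, and port mapping are assumptions for illustration; the actual docker-compose-ollama.yaml in this repo may differ:

services:
  ollama:
    image: ollama/ollama   # on a Jetson this is typically a jetson-containers build instead
    ports:
      - "11434:11434"
    volumes:
      - ollama:/root/.ollama
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    ports:
      - "11433:8080"       # Open WebUI listens on 8080 inside the container
    depends_on:
      - ollama
volumes:
  ollama: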