# Jetson-ollama
Simple hosting of ollama and additional value-add services on a Jetson Orin.
## Install
If you are using a JetPack version that doesn't have pre-built images yet, you will want to install https://github.com/dusty-nv/jetson-containers and use its commands to build your own Docker containers.
This repo simply hosts a docker-compose file and a systemd unit file that starts the docker-compose services on boot.
```
sudo mkdir -p /opt/ollama
sudo cp docker-compose-ollama.yaml /opt/ollama/docker-compose-ollama.yaml
```
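As a rough sketch of what `/opt/ollama/docker-compose-ollama.yaml` could look like (the image tags, volume paths, and service names here are illustrative assumptions, not the repo's actual contents):
```
# Hypothetical sketch, not the repo's actual compose file.
services:
  ollama:
    image: dustynv/ollama:r36.2.0        # assumed tag; match your JetPack release
    restart: unless-stopped
    runtime: nvidia                      # expose the Jetson GPU to the container
    ports:
      - "11434:11434"
    volumes:
      - /opt/ollama/models:/root/.ollama # persist pulled models across restarts
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    restart: unless-stopped
    ports:
      - "11433:8080"                     # Open WebUI listens on 8080 internally
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama
```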
```
sudo cp ollama.service /etc/systemd/system/ollama.service
sudo systemctl enable ollama.service
sudo systemctl start ollama.service
```
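The unit file just wraps `docker compose up`/`down` so systemd manages the stack. A minimal sketch of what `ollama.service` might contain, assuming the compose file lives in `/opt/ollama` as above:
```
# Hypothetical sketch of ollama.service, not the repo's actual unit file.
[Unit]
Description=ollama docker-compose services
Requires=docker.service
After=docker.service network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/opt/ollama
ExecStart=/usr/bin/docker compose -f docker-compose-ollama.yaml up -d
ExecStop=/usr/bin/docker compose -f docker-compose-ollama.yaml down

[Install]
WantedBy=multi-user.target
```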
## Hosting a Model
You will now need to pull a model and then run it:
```
curl http://localhost:11434/api/pull -d '{
  "model": "mistral-nemo"
}'
```
Output:
```
{"status":"pulling manifest"}
{"status":"pulling b559938ab7a0","digest":"sha256:b559938ab7a0392fc9ea9675b82280f2a15669ec3e0e0fc491c9cb0a7681cf94","total":7071700672,"completed":7071700672}
{"status":"pulling f023d1ce0e55","digest":"sha256:f023d1ce0e55d0dcdeaf70ad81555c2a20822ed607a7abd8de3c3131360f5f0a","total":688,"completed":688}
{"status":"pulling 43070e2d4e53","digest":"sha256:43070e2d4e532684de521b885f385d0841030efa2b1a20bafb76133a5e1379c1","total":11356,"completed":11356}
{"status":"pulling ed11eda7790d","digest":"sha256:ed11eda7790d05b49395598a42b155812b17e263214292f7b87d15e14003d337","total":30,"completed":30}
{"status":"pulling 65d37de20e59","digest":"sha256:65d37de20e5951c7434ad4230c51a4d5be99b8cb7407d2135074d82c40b44b45","total":486,"completed":486}
{"status":"verifying sha256 digest"}
{"status":"writing manifest"}
{"status":"success"}
```
We can validate that at least one model is available by checking the tags:
```
curl http://0.0.0.0:11434/api/tags
```
Output:
```
{"models":[{"name":"mistral-nemo:latest","model":"mistral-nemo:latest","modified_at":"2024-12-25T00:01:25.24932255Z","size":7071713232,"digest":"994f3b8b78011aa6d578b0c889cbb89a64b778f80d73b8d991a8db1f1e710ace","details":{"parent_model":"","format":"gguf","family":"llama","families":["llama"],"parameter_size":"12.2B","quantization_level":"Q4_0"}}]}
```
We can check if we have any running models:
```
curl http://0.0.0.0:11434/api/ps
```
Output:
```
{"models":[]}
```
We can load a model into memory by submitting an empty request:
```
curl http://0.0.0.0:11434/api/generate -d '{
  "model": "mistral-nemo"
}'
```
Output:
```
{"model":"mistral-nemo","created_at":"2024-12-25T00:13:16.691913415Z","response":"","done":true,"done_reason":"load"}
```
```
curl http://0.0.0.0:11434/api/ps
```
Output:
```
{"models":[{"name":"mistral-nemo:latest","model":"mistral-nemo:latest","size":9290250240,"digest":"994f3b8b78011aa6d578b0c889cbb89a64b778f80d73b8d991a8db1f1e710ace","details":{"parent_model":"","format":"gguf","family":"llama","families":["llama"],"parameter_size":"12.2B","quantization_level":"Q4_0"},"expires_at":"2024-12-25T00:18:16.692451257Z","size_vram":9290250240}]}%
```
The `expires_at` field shows that the model will remain in memory for about five minutes, after which it will expire and be unloaded:
```
date -u
Wed Dec 25 00:15:21 UTC 2024
```
If you want to keep the model in memory indefinitely, use the `keep_alive` parameter. A value of `-1` sets an infinite expiry, a value of `0` unloads the model from memory, and other values are durations of the form `5m`, `1h`, or `3d`.
```
curl http://0.0.0.0:11434/api/generate -d '{
  "model": "mistral-nemo",
  "keep_alive": -1
}'
```
Lastly, we can submit a chat request like so:
```
curl http://0.0.0.0:11434/api/chat -d '{
  "model": "mistral-nemo",
  "messages": [
    {
      "role": "user",
      "content": "Why is the sky blue?"
    }
  ]
}'
```
Output:
```
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:08.203017147Z","message":{"role":"assistant","content":"The"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:08.38823807Z","message":{"role":"assistant","content":" sky"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:08.546712356Z","message":{"role":"assistant","content":" appears"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:08.705009365Z","message":{"role":"assistant","content":" blue"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:08.858441986Z","message":{"role":"assistant","content":" due"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:09.011519221Z","message":{"role":"assistant","content":" to"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:09.164679014Z","message":{"role":"assistant","content":" a"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:09.317643996Z","message":{"role":"assistant","content":" phenomenon"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:09.470739343Z","message":{"role":"assistant","content":" called"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:09.623952448Z","message":{"role":"assistant","content":" Ray"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:09.777503755Z","message":{"role":"assistant","content":"leigh"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:09.929986217Z","message":{"role":"assistant","content":" scattering"},"done":false}
{"model":"mistral-nemo","created_at":"2024-12-25T00:26:10.083242522Z","message":{"role":"assistant","content":"."},"done":false}
...
```
Generation will continue until the model produces a stop token. Naturally this API method is cumbersome for general use, so we also included [Open WebUI](https://github.com/open-webui/open-webui), which is hosted on port 11433.
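If you do want to script against the chat endpoint directly, the token-by-token stream can be collapsed into a single JSON reply by setting the `stream` parameter to `false`:
```
curl http://0.0.0.0:11434/api/chat -d '{
  "model": "mistral-nemo",
  "stream": false,
  "messages": [
    {
      "role": "user",
      "content": "Why is the sky blue?"
    }
  ]
}'
```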