I recently set up ollama on my 7-year-old desktop (AMD Ryzen 7 1700, 8 cores, 32 GB RAM) with an equally old NVIDIA GPU (GeForce GTX 1070, 8 GB VRAM). I was able to run llama3.1:8b successfully from the terminal CLI.
I then configured Open WebUI, which gives me a friendly UI for working with the AI models.
Setup on Ubuntu (should work on other Linux distros as well)
- Install ollama: curl -fsSL https://ollama.com/install.sh | sh
- Install Docker if you don’t have it already
- Find out the default location for your AI models. In my case, it is /usr/share/ollama/. I used a brute-force search to find where mine is: sudo find / -type d -name ollama
Take a note of this location; you need it to run the Docker container discussed below.
- Pull in some models locally. Find available models by browsing https://ollama.com/library
For example, to pull in the model qwen2.5, run the following: ollama pull qwen2.5
Add a tag if you want a particular parameter size, for example: ollama pull qwen2.5:7b
One advantage of this setup is that only one copy of each model is downloaded locally. Models are stored in the /path/to/model location mentioned above. A quick sanity check of the full setup is sketched below.
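Before moving on to the web UI, it is worth confirming that the install and the model pull actually worked. A minimal sketch, using the model pulled above as an example:

# confirm the CLI is installed and the background service is running
ollama --version
systemctl status ollama --no-pager   # the Linux installer registers a systemd service

# confirm the pulled model is present, then give it a one-off prompt
ollama ls
ollama run qwen2.5:7b "Say hello in one sentence."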
Run Open WebUI
Run the following command, replacing /path/to/model with the default location mentioned above: docker run -d -p 3000:8080 --gpus=all -v /path/to/model:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama
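Note that --gpus=all only works if the NVIDIA Container Toolkit is installed on the host; without it, Docker will refuse to start the container with that flag. A couple of commands to confirm the container came up (the container name matches the one used above):

# check that the container is running and the port mapping is in place
docker ps --filter name=open-webui

# follow the startup logs; the web server should report that it is listening
docker logs -f open-webui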
Try different AI models locally and securely
Visit http://127.0.0.1:3000/. Create a login. There is a way of bypassing it, but I didn’t bother. Update: I am glad that I left the login requirement in place, because it allowed me to easily create an apiKey for each login. The apiKey is important if you want to integrate the local LLMs with tools such as aider or the Continue extension for VS Code or JetBrains IDEs.
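As a quick way to verify the key works before wiring up aider or Continue, Open WebUI exposes an OpenAI-compatible HTTP API. The endpoint paths below follow the Open WebUI docs at the time of writing; YOUR_API_KEY and the model name are placeholders:

# list the models Open WebUI can serve
curl -s http://127.0.0.1:3000/api/models \
  -H "Authorization: Bearer YOUR_API_KEY"

# send a chat completion request to a local model
curl -s http://127.0.0.1:3000/api/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2.5:7b", "messages": [{"role": "user", "content": "Hello!"}]}'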
With this in place, I can try different AI models easily by picking the model from the UI.
Additional tips
- Remove unused models to save space. For example: ollama rm llama3.1:70b
- If you pulled a new model, you may need to refresh the Web UI for it to show up in the drop-down.
- Find out whether your hardware is capable of running a particular model by running it from a terminal (a GPU-side check is sketched after the transcript below):
~ ollama run llama3.1:70b
pulling manifest
pulling a677b4a4b70c... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████▏ 39 GB
pulling 948af2743fc7... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████▏ 1.5 KB
pulling 0ba8f0e314b4... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████▏ 12 KB
pulling 56bb8bd477a5... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████▏ 96 B
pulling 654440dac7f3... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████▏ 486 B
verifying sha256 digest
writing manifest
success
Error: model requires more system memory (33.2 GiB) than is available (26.6 GiB)
~ free -g
total used free shared buff/cache available
Mem: 31 5 0 0 24 24
Swap: 1 0 1
~ ollama ls
NAME ID SIZE MODIFIED
llama3.1:70b c0df3564cfe8 39 GB 22 minutes ago
llama3:latest 365c0bd3c000 4.7 GB 46 hours ago
~ df -h
Filesystem Size Used Avail Use% Mounted on
tmpfs 3.2G 2.0M 3.2G 1% /run
/dev/sda3 457G 158G 276G 37% /
tmpfs 16G 28K 16G 1% /dev/shm
tmpfs 5.0M 4.0K 5.0M 1% /run/lock
/dev/sda2 512M 6.1M 506M 2% /boot/efi
tmpfs 3.2G 1.7M 3.2G 1% /run/user/1000
~ ollama rm llama3.1:70b
deleted 'llama3.1:70b'
~ df -h
Filesystem Size Used Avail Use% Mounted on
tmpfs 3.2G 2.0M 3.2G 1% /run
/dev/sda3 457G 121G 313G 28% /
tmpfs 16G 28K 16G 1% /dev/shm
tmpfs 5.0M 4.0K 5.0M 1% /run/lock
/dev/sda2 512M 6.1M 506M 2% /boot/efi
tmpfs 3.2G 1.7M 3.2G 1% /run/user/1000
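When a model fits in RAM but is too big for the GPU's VRAM, ollama splits it between the GPU and system memory, which can be very slow. A quick way to see how a loaded model was placed (the model name here is just an example):

# load a model, then inspect where it ended up
ollama run qwen2.5:7b "hi" >/dev/null
ollama ps        # the PROCESSOR column shows e.g. "100% GPU" or a CPU/GPU split
nvidia-smi       # VRAM usage on the GTX 1070 should reflect the loaded model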