I recently set up ollama on my 7-year-old desktop (AMD Ryzen 7 1700, 8 cores, 32 GB RAM) with an equally old NVIDIA GPU (GeForce GTX 1070, 8 GB VRAM). I was able to run llama3.1:8b successfully from the terminal CLI.
I then configured Open WebUI, which gives me a friendly UI for working with the AI models.
Setup on Ubuntu (should work on other Linux distros as well)
- Install ollama: curl -fsSL https://ollama.com/install.sh | sh
- Install Docker if you don’t have it already
- Find out the default location for your AI models. In my case, it is /usr/share/ollama/. I used a brute-force search to find where mine is: sudo find / -type d -name ollama
Take a note of this location; you need it to run the Docker container discussed below.
- Pull in some models locally. Find available models by browsing https://ollama.com/library
For example, to pull in the model qwen2.5, run the following: ollama pull qwen2.5
Add a tag if you want a particular parameter size, for example: ollama pull qwen2.5:7b
One advantage of this setup is that only one copy of each model is downloaded locally. Models are stored in the /path/to/model location mentioned above. A quick sanity check of the full setup is sketched below.
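Before moving on to the web UI, it is worth confirming that the install and the model pull actually worked. A minimal sketch, using the model pulled above as an example:

# confirm the CLI is installed and the background service is running
ollama --version
systemctl status ollama --no-pager   # the Linux installer registers a systemd service

# confirm the pulled model is present, then give it a one-off prompt
ollama ls
ollama run qwen2.5:7b "Say hello in one sentence."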
Run Open WebUI
Run the following command, replacing /path/to/model with the default location mentioned above: docker run -d -p 3000:8080 --gpus=all -v /path/to/model:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama
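Note that --gpus=all only works if the NVIDIA Container Toolkit is installed on the host; without it, Docker will refuse to start the container with that flag. A couple of commands to confirm the container came up (the container name matches the one used above):

# check that the container is running and the port mapping is in place
docker ps --filter name=open-webui

# follow the startup logs; the web server should report that it is listening
docker logs -f open-webui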
Try different AI models locally and securely
Visit http://127.0.0.1:3000/. Create a login. There is a way of bypassing it, but I didn’t bother. Update: I am glad that I left the login requirement in place, because it allowed me to easily create an apiKey for each login. The apiKey is important if you want to integrate the local LLMs with tools such as aider or the Continue extension for VS Code or JetBrains IDEs.
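As a quick way to verify the key works before wiring up aider or Continue, Open WebUI exposes an OpenAI-compatible HTTP API. The endpoint paths below follow the Open WebUI docs at the time of writing; YOUR_API_KEY and the model name are placeholders:

# list the models Open WebUI can serve
curl -s http://127.0.0.1:3000/api/models \
  -H "Authorization: Bearer YOUR_API_KEY"

# send a chat completion request to a local model
curl -s http://127.0.0.1:3000/api/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen2.5:7b", "messages": [{"role": "user", "content": "Hello!"}]}'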
With this in place, I can try different AI models easily by picking the model from the UI.
Additional tips
- Remove unused models to save space. For example: ollama rm llama3.1:70b
- If you pulled a new model, you may need to refresh the Web UI for it to show up in the drop-down.
- Find out whether your hardware is capable of running a particular model by running it from a terminal (a GPU-side check is sketched after the transcript below):
~ ollama run llama3.1:70b
pulling manifest
pulling a677b4a4b70c... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████▏ 39 GB
pulling 948af2743fc7... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████▏ 1.5 KB
pulling 0ba8f0e314b4... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████▏ 12 KB
pulling 56bb8bd477a5... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████▏ 96 B
pulling 654440dac7f3... 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████▏ 486 B
verifying sha256 digest
writing manifest
success
Error: model requires more system memory (33.2 GiB) than is available (26.6 GiB)
~ free -g
total used free shared buff/cache available
Mem: 31 5 0 0 24 24
Swap: 1 0 1
~ ollama ls
NAME ID SIZE MODIFIED
llama3.1:70b c0df3564cfe8 39 GB 22 minutes ago
llama3:latest 365c0bd3c000 4.7 GB 46 hours ago
~ df -h
Filesystem Size Used Avail Use% Mounted on
tmpfs 3.2G 2.0M 3.2G 1% /run
/dev/sda3 457G 158G 276G 37% /
tmpfs 16G 28K 16G 1% /dev/shm
tmpfs 5.0M 4.0K 5.0M 1% /run/lock
/dev/sda2 512M 6.1M 506M 2% /boot/efi
tmpfs 3.2G 1.7M 3.2G 1% /run/user/1000
~ ollama rm llama3.1:70b
deleted 'llama3.1:70b'
~ df -h
Filesystem Size Used Avail Use% Mounted on
tmpfs 3.2G 2.0M 3.2G 1% /run
/dev/sda3 457G 121G 313G 28% /
tmpfs 16G 28K 16G 1% /dev/shm
tmpfs 5.0M 4.0K 5.0M 1% /run/lock
/dev/sda2 512M 6.1M 506M 2% /boot/efi
tmpfs 3.2G 1.7M 3.2G 1% /run/user/1000
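When a model fits in RAM but is too big for the GPU's VRAM, ollama splits it between the GPU and system memory, which can be very slow. A quick way to see how a loaded model was placed (the model name here is just an example):

# load a model, then inspect where it ended up
ollama run qwen2.5:7b "hi" >/dev/null
ollama ps        # the PROCESSOR column shows e.g. "100% GPU" or a CPU/GPU split
nvidia-smi       # VRAM usage on the GTX 1070 should reflect the loaded model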