Running Generative AI Models Locally

These steps are adapted from the Ollama Linux installation documentation.

Choosing a model

Find a model that matches your system requirements here:

  • https://ollama.com/search
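
Model pages list an approximate memory requirement; as a rough rule of thumb, you need at least as much free RAM (or VRAM, when running on a GPU) as the model's download size. A quick way to check what your machine has available:

free -h        # total and available system RAM
nvidia-smi     # VRAM, if an NVIDIA GPU and drivers are present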

Install Ollama

Download and install the ollama binary

curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz && \
sudo tar -C /usr -xzf ollama-linux-amd64.tgz
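
This fetches the x86-64 build; on ARM machines the equivalent archive is ollama-linux-arm64.tgz. To confirm the binary landed on your PATH:

which ollama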

Create an Ollama user & group service account

sudo useradd -r -s /bin/false -U -m -d /usr/share/ollama ollama

Add your account to the ollama group

sudo usermod -a -G ollama $(whoami)
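
Group changes only apply to new login sessions, so log out and back in (or start a shell under the new group with newgrp ollama), then verify:

groups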

Systemd service unit file

sudo tee /etc/systemd/system/ollama.service >/dev/null <<EOF
[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=$PATH"

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
sudo systemctl status ollama
sudo journalctl -e -u ollama
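
Once the service is running, the Ollama API listens on port 11434 by default; a quick sanity check:

curl http://localhost:11434
# should respond with "Ollama is running"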

Using Ollama

Check the installed version

ollama --version

Run the openchat model

ollama run openchat:latest

This pulls the latest openchat model (if it is not already downloaded) and opens an interactive chat session with it.
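
The running service also exposes a local REST API, so the model can be queried outside the interactive prompt. A minimal sketch of a one-off request, assuming the default port and the model pulled above:

curl http://localhost:11434/api/generate -d '{
  "model": "openchat:latest",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

Setting "stream": false returns a single JSON response rather than a stream of partial tokens.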

openchat model notes

  • Requires only ~4 GB of memory (the entire model must fit in RAM)
  • Training data only extends to 2021, so it knows nothing of later events
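
A few housekeeping commands for managing models once several are installed:

ollama list                   # models downloaded locally
ollama ps                     # models currently loaded in memory
ollama pull openchat:latest   # fetch or update a model without running it
ollama rm openchat:latest     # delete a model to reclaim disk space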