
Flowith AI Deployment

Self-Hosted AI Agent Setup Guide

Introduction

Self-hosting AI models gives you complete control over your data, customization options, and often costs less than using API services for continuous usage. This guide will walk you through setting up a production-ready AI agent environment on AWS EC2.

Pro Tip: This setup is ideal for teams needing dedicated AI resources without sending sensitive data to external APIs. The auto-shutdown feature helps manage costs by only running the instance when needed.

EC2 Instance Recommendations

Choosing the right EC2 instance type is critical for optimal performance. Your selection should be based on the AI models you plan to run and your budget constraints.

Model Size Requirements

Model Type                    VRAM Required   Recommended Instance   Hourly Cost (us-east-1)
Llama 3 8B / Phi-3 Mini       ~8-10 GB        g4dn.xlarge            $0.526
Deepseek Coder / Mistral 7B   ~14 GB          g5.xlarge              $1.006
Llama 3 70B / Mixtral 8x7B    ~70-80 GB       g5.12xlarge            $4.08
Multiple Large Models         140+ GB         g5.48xlarge            $16.288

Budget-Friendly Option

For smaller models or testing environments:

g4dn.xlarge
  • 4 vCPUs, 16 GB RAM
  • 16 GB GPU Memory (T4)
  • Good for 7-13B models
  • ~$380/month (full time)

Production Recommendation

For running larger models with good performance:

g5.4xlarge
  • 16 vCPUs, 64 GB RAM
  • 24 GB GPU Memory (A10G)
  • Can run most models up to 30B
  • ~$990/month (full time, at $1.352/hour)

Important Considerations

Remember to factor in EBS storage costs ($0.08/GB-month for gp3) and data transfer costs. For cost optimization, consider using spot instances for non-critical workloads (60-70% cheaper) or implementing auto-shutdown scripts.
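Before committing to spot instances, it helps to see what the spot market is actually charging. A sketch with the AWS CLI (assumes the CLI is installed and credentials configured; the instance type and region are examples):

```shell
# List current spot prices for g5.xlarge across us-east-1 availability zones
aws ec2 describe-spot-price-history \
    --instance-types g5.xlarge \
    --product-descriptions "Linux/UNIX" \
    --start-time "$(date -u +%Y-%m-%dT%H:%M:%S)" \
    --region us-east-1 \
    --query 'SpotPriceHistory[*].[AvailabilityZone,SpotPrice]' \
    --output table
```

Comparing the output against the on-demand rates above shows how much headroom a spot deployment would give you.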

Deployment Scripts & Steps

Initial Server Setup

# Update system and install dependencies
sudo apt update && sudo apt upgrade -y
sudo apt install -y git curl wget build-essential cmake bc nvidia-driver-525

After installing NVIDIA drivers, you'll need to reboot your instance with: sudo reboot
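Once the instance is back up, confirm the driver loaded and the GPU is visible (the exact output depends on your instance type):

```shell
# Query GPU name, total memory, and driver version in CSV form
nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv
```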

Installing Docker & NVIDIA Container Toolkit

# Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker $USER
# Log out and back in (or run: newgrp docker) for the group change to take effect

# Install NVIDIA Container Toolkit (the old nvidia-docker repo and apt-key are deprecated)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker

Verify the installation by running:

docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

Ollama Deployment

Ollama provides an easy way to run various open-source models locally. Here's how to set it up:

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Enable and start the Ollama service (the installer registers a system-level unit)
sudo systemctl enable ollama
sudo systemctl start ollama

# Pull models (examples)
ollama pull llama3
ollama pull phi3
ollama pull deepseek-coder

Once installed, Ollama runs on port 11434. You can verify it's working by running: curl http://localhost:11434/api/tags
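With a model pulled, a minimal completion request exercises the full path. The prompt below is just an example; with "stream" set to false, the response arrives as a single JSON object:

```shell
# Send one non-streaming completion request to the local Ollama API
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Reply with one word: hello",
  "stream": false
}'
```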

LocalGPT Deployment (Alternative)

LocalGPT provides a more customizable interface for local AI models:

# Clone the LocalGPT repository
git clone https://github.com/PromtEngineer/localGPT.git
cd localGPT

# Set up with Docker
docker build -t localgpt .
docker run -d --gpus all -p 5000:5000 -v $(pwd):/app localgpt

Nginx Proxy with SSL Configuration

Setting up Nginx as a reverse proxy with SSL encryption allows secure access to your AI services from the internet.

Installing Nginx and Certbot

# Install Nginx and Certbot
sudo apt update
sudo apt install -y nginx certbot python3-certbot-nginx

# Start and enable Nginx
sudo systemctl start nginx
sudo systemctl enable nginx

Configuring Nginx for Ollama

Create a new Nginx config file:

sudo nano /etc/nginx/sites-available/ollama

Add the following configuration (replace yourdomain.com with your actual domain):

server {
    listen 80;
    server_name ai.yourdomain.com;

    location / {
        proxy_pass http://localhost:11434;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        # Increase timeouts for long-running AI queries
        proxy_read_timeout 300s;
        proxy_connect_timeout 300s;
        proxy_send_timeout 300s;
    }

    # Security headers
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-XSS-Protection "1; mode=block" always;
    add_header Referrer-Policy "strict-origin-when-cross-origin" always;
}

Enable the site and get SSL certificate:

sudo ln -s /etc/nginx/sites-available/ollama /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx

# Get SSL certificate
sudo certbot --nginx -d ai.yourdomain.com
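Certbot configures automatic renewal for you, but it is worth confirming the renewal path works before the certificate nears expiry:

```shell
# Simulate a renewal without touching the live certificate
sudo certbot renew --dry-run

# Renewal runs from a systemd timer installed with the certbot package
systemctl list-timers | grep certbot
```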

Basic Authentication (Optional)

Add basic authentication to protect your AI service:

# Install apache2-utils for htpasswd
sudo apt install -y apache2-utils

# Create password file
sudo htpasswd -c /etc/nginx/.htpasswd yourusername

# Add to your Nginx config inside the location block:
auth_basic "Restricted Access";
auth_basic_user_file /etc/nginx/.htpasswd;

Security Note: Basic authentication transmits credentials with base64 encoding, which is why SSL is crucial. For production environments, consider implementing more robust authentication methods.
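A quick way to confirm the auth is actually enforced (hostname and credentials below are placeholders for your own): an unauthenticated request should return 401, an authenticated one 200:

```shell
# Expect 401 without credentials...
curl -s -o /dev/null -w "%{http_code}\n" https://ai.yourdomain.com/api/tags

# ...and 200 with them
curl -s -o /dev/null -w "%{http_code}\n" -u yourusername:yourpassword \
    https://ai.yourdomain.com/api/tags
```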

Auto-Start Services Configuration

Ensure your AI services automatically start when your EC2 instance boots up.

Systemd Service for Ollama

Create a systemd service file:

sudo nano /etc/systemd/system/ollama.service

Add the following content:

[Unit]
Description=Ollama AI Service
After=network.target

[Service]
Type=simple
User=ubuntu
WorkingDirectory=/home/ubuntu
ExecStart=/usr/local/bin/ollama serve
Restart=always
RestartSec=10
Environment=HOME=/home/ubuntu

[Install]
WantedBy=multi-user.target

Enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable ollama.service
sudo systemctl start ollama.service
sudo systemctl status ollama.service

Docker Auto-Start for Other Services

For services running in Docker containers, enable auto-restart:

# For LocalGPT or other Docker containers
docker update --restart=always container_name_or_id

# Example:
docker update --restart=always localgpt

# For docker-compose based deployments
# Modify docker-compose.yml to include:
services:
  service_name:
    restart: always
    # other configuration...

Tip: To test your auto-start configuration, reboot your instance with sudo reboot and verify all services come back online automatically.
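After the reboot, a small loop can spot-check the key services (the service names assume the units configured above; the code-server unit name assumes the ubuntu user):

```shell
# Print each service's state; all should report "active"
for svc in ollama nginx docker code-server@ubuntu; do
    printf '%-22s %s\n' "$svc" "$(systemctl is-active "$svc")"
done
```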

Code-Server Installation

Code-Server provides a browser-based VS Code environment, making it easy to develop and maintain your AI applications directly on your server.

Installing Code-Server

# Install Code-Server
curl -fsSL https://code-server.dev/install.sh | sh

# Start and enable the service
sudo systemctl enable --now code-server@$USER

# Configure Code-Server
mkdir -p ~/.config/code-server
nano ~/.config/code-server/config.yaml

Edit the configuration file with:

bind-addr: 127.0.0.1:8080
auth: password
password: your_secure_password
cert: false

Restart Code-Server to apply changes:

sudo systemctl restart code-server@$USER

Nginx Configuration for Code-Server

Create a new Nginx site configuration:

sudo nano /etc/nginx/sites-available/code-server

Add the following configuration:

server {
    listen 80;
    server_name code.yourdomain.com;

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection upgrade;
        proxy_set_header Accept-Encoding gzip;
    }
}

Enable the site and get SSL certificate:

sudo ln -s /etc/nginx/sites-available/code-server /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx

# Get SSL certificate
sudo certbot --nginx -d code.yourdomain.com

Success: Code-Server should now be available at https://code.yourdomain.com with password protection.

Installing Useful Extensions

Enhance your Code-Server with these useful extensions for AI development:

Python

Essential Python language support with IntelliSense, linting, and debugging.

ms-python.python

Jupyter

Run and view Jupyter notebooks directly within VS Code.

ms-toolsai.jupyter

Docker

Manage Docker containers and images directly from the editor.

ms-azuretools.vscode-docker

IntelliCode

AI-assisted development with intelligent code completions.

visualstudioexptteam.vscodeintellicode

Install extensions from the command line:

code-server --install-extension ms-python.python
code-server --install-extension ms-toolsai.jupyter
code-server --install-extension ms-azuretools.vscode-docker
code-server --install-extension visualstudioexptteam.vscodeintellicode

EC2 Auto-Shutdown Script

Implement an auto-shutdown script to automatically turn off your EC2 instance during periods of inactivity, helping to reduce costs significantly.

Creating the Auto-Shutdown Script

Create a new script file:

nano ~/auto-shutdown.sh

Add the following content:

#!/bin/bash

# Auto-shutdown script for EC2 instances
# This script checks for system activity and shuts down the instance if idle

# Configuration
IDLE_TIME_THRESHOLD=30  # Minutes of inactivity before shutdown
LOAD_THRESHOLD=0.1      # Load average threshold (1-minute)
EXCLUDE_PROCESSES=("ollama" "code-server")  # Processes to exclude from activity check

# Get current load average (1-minute)
LOAD_AVG=$(cat /proc/loadavg | awk '{print $1}')

# Check if any excluded processes are actively processing
for PROCESS in "${EXCLUDE_PROCESSES[@]}"; do
    # Sum CPU usage across matching processes; "sum+0" yields 0 when none match
    PROCESS_CPU=$(ps aux | grep "$PROCESS" | grep -v grep | awk '{sum+=$3} END {print sum+0}')
    if (( $(echo "$PROCESS_CPU > 5.0" | bc -l) )); then
        echo "Process $PROCESS is active (CPU: $PROCESS_CPU%). Aborting shutdown."
        exit 0
    fi
done

# Check if load is below threshold
if (( $(echo "$LOAD_AVG < $LOAD_THRESHOLD" | bc -l) )); then
    echo "System load ($LOAD_AVG) is below threshold ($LOAD_THRESHOLD)"

    # Check for active SSH connections (ss is preinstalled where netstat may not be)
    SSH_CONNECTIONS=$(ss -tn state established '( sport = :22 )' | tail -n +2 | wc -l)
    if [ "$SSH_CONNECTIONS" -gt 0 ]; then
        echo "Active SSH connections found. Aborting shutdown."
        exit 0
    fi

    # Check whether any interactive users are logged in
    LOGGED_IN=$(who | wc -l)

    # If no users are logged in, proceed with shutdown consideration
    if [ "$LOGGED_IN" -eq 0 ]; then
        echo "No users logged in. Checking system activity..."

        # CPU idle percentage since boot (a rough proxy; /proc/stat counters are cumulative)
        UPTIME_IDLE=$(grep 'cpu ' /proc/stat | awk '{idle=$5; total=$2+$3+$4+$5+$6+$7+$8+$9; printf "%.2f", idle/total*100}')
        if (( $(echo "$UPTIME_IDLE > 95.0" | bc -l) )); then
            echo "System is idle (CPU idle: $UPTIME_IDLE%). Initiating shutdown..."
            # Log the shutdown
            echo "Auto-shutdown initiated at $(date)" >> /var/log/auto-shutdown.log
            # Shutdown the system
            sudo shutdown -h now
        else
            echo "System not idle enough (CPU idle: $UPTIME_IDLE%). Aborting shutdown."
        fi
    else
        echo "Active users detected. Aborting shutdown."
    fi
else
    echo "System load ($LOAD_AVG) is above threshold ($LOAD_THRESHOLD). Aborting shutdown."
fi

Make the script executable:

chmod +x ~/auto-shutdown.sh

Setting Up a Cron Job

Configure the script to run every 15 minutes:

crontab -e

Add this line to the crontab:

*/15 * * * * /home/ubuntu/auto-shutdown.sh >> /home/ubuntu/auto-shutdown.log 2>&1

Using AWS Instance Scheduler (Alternative)

AWS offers a managed solution for scheduled instance starts and stops:

  1. Deploy the AWS Instance Scheduler solution from the AWS Solutions Library
  2. Configure schedules based on your usage patterns (e.g., working hours on weekdays)
  3. Tag your EC2 instances with the appropriate schedule identifier
  4. The scheduler will automatically start and stop instances based on the defined schedule
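Step 3 looks like this with the AWS CLI (the instance ID and schedule name "office-hours" are placeholders; the tag key must match the one configured in your scheduler stack):

```shell
# Tag an instance so the Instance Scheduler starts/stops it on a schedule
aws ec2 create-tags \
    --resources i-0123456789abcdef0 \
    --tags Key=Schedule,Value=office-hours
```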

Cost Considerations

The AWS Instance Scheduler solution costs approximately $13.15 per month to operate, so it makes sense for environments with multiple instances. For a single instance, the script-based approach above is more cost-effective.

Cost Estimation

Understanding the cost implications of running AI models on AWS is crucial for budgeting. Here's a breakdown of the estimated costs.

Item                        Description                               Unit Cost        Quantity                            Monthly Cost
EC2 Instance (g5.4xlarge)   Mid-sized GPU instance for AI workloads   $1.352/hour      176 hours (8 hours/day × 22 days)   $237.95
EBS Storage (gp3)           General Purpose SSD storage               $0.08/GB-month   250 GB                              $20.00
Data Transfer Out           Network egress from AWS to Internet       $0.09/GB         50 GB                               $4.50

Total Monthly Cost (8h/day usage): $262.45
Total Monthly Cost (24h/day usage, 720 hours): $997.94
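The totals can be reproduced with a quick shell calculation (assuming 720 billable hours for a 30-day month; rates taken from the table):

```shell
# Recompute the monthly totals from the table's unit costs
RATE=1.352        # g5.4xlarge, $/hour
EBS=20.00         # 250 GB gp3 at $0.08/GB-month
TRANSFER=4.50     # 50 GB egress at $0.09/GB

awk -v r="$RATE" -v e="$EBS" -v t="$TRANSFER" 'BEGIN {
    printf "8h/day (176 h):  $%.2f\n", r*176 + e + t
    printf "24h/day (720 h): $%.2f\n", r*720 + e + t
}'
```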

Cost Optimization Tips

  • Use auto-shutdown scripts during idle periods
  • Consider Spot Instances for non-critical workloads (up to 70% discount)
  • Start with smaller instance types and scale up as needed
  • Use gp3 EBS volumes instead of gp2 for better performance/cost ratio
  • Consider Reserved Instances if you plan long-term usage


Reminder: AWS provides a Free Tier for new accounts which includes 750 hours of EC2 t2.micro instance usage, but this is insufficient for most AI workloads. The cost estimates above assume standard on-demand pricing.

Looking for Other Resources?

Check out my other AWS research pages: