Introduction
Self-hosting AI models gives you complete control over your data and customization options, and for continuous usage it often costs less than paying per call for API services. This guide walks you through setting up a production-ready AI agent environment on AWS EC2.
Pro Tip: This setup is ideal for teams needing dedicated AI resources without sending sensitive data to external APIs. The auto-shutdown feature helps manage costs by only running the instance when needed.
EC2 Instance Recommendations
Choosing the right EC2 instance type is critical for optimal performance. Your selection should be based on the AI models you plan to run and your budget constraints.
Model Size Requirements
Model Type | VRAM Required | Recommended Instance | Hourly Cost (us-east-1) |
---|---|---|---|
Llama 3 8B / Phi-3 Mini | ~8-10 GB | g4dn.xlarge | $0.526 |
Deepseek Coder / Mistral 7B | ~14 GB | g5.xlarge | $1.006 |
Llama 3 70B / Mixtral 8x7B | ~70-80 GB | g5.12xlarge | $4.08 |
Multiple Large Models | 140+ GB | g5.48xlarge | $16.288 |
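A quick way to sanity-check these numbers: model weights alone need roughly (parameters × bytes per parameter). At FP16 that's 2 bytes per parameter, so a 7B model needs about 7 × 2 = 14 GB (the Mistral 7B row above), while 4-bit quantization is closer to half a byte per parameter plus overhead, which is how an 8B model fits on a 16 GB T4 with room left for context.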
Budget-Friendly Option
For smaller models or testing environments:
- 4 vCPUs, 16 GB RAM
- 16 GB GPU Memory (T4)
- Good for 7-13B models
- ~$380/month (full time)
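(These are the g4dn.xlarge specs from the table above; the monthly figure is just the on-demand rate applied around the clock: $0.526/hour × ~730 hours/month ≈ $384.)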
Production Recommendation
For running larger models with good performance:
- 16 vCPUs, 64 GB RAM
- 24 GB GPU Memory (A10G)
- Can run most models up to 30B
- ~$990/month (full time, at the g5.4xlarge on-demand rate of $1.352/hour used in the cost table below)
Important Considerations
Remember to factor in EBS storage costs ($0.08/GB-month for gp3) and data transfer costs. For cost optimization, consider using spot instances for non-critical workloads (60-70% cheaper) or implementing auto-shutdown scripts.
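If you want to try spot pricing, here is a minimal sketch using the AWS CLI; the AMI ID and key pair name are placeholders to replace with your own, and the instance type matches the budget option above:

# Request a g4dn.xlarge as a one-time spot instance
# (ami-xxxxxxxx and your-key are placeholders for your AMI and key pair)
aws ec2 run-instances \
  --image-id ami-xxxxxxxx \
  --instance-type g4dn.xlarge \
  --key-name your-key \
  --instance-market-options 'MarketType=spot,SpotOptions={SpotInstanceType=one-time}'

Note that spot instances can be reclaimed by AWS with two minutes' notice, which is why they suit non-critical or easily restarted workloads.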
Deployment Scripts & Steps
Initial Server Setup
# Update system and install dependencies
sudo apt update && sudo apt upgrade -y
sudo apt install -y git curl wget build-essential cmake nvidia-driver-525
After installing NVIDIA drivers, you'll need to reboot your instance with: sudo reboot
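Once the instance comes back up, confirm the driver loaded before continuing:

# Should print the driver version and the attached GPU (e.g. a T4 or A10G)
nvidia-smi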
Installing Docker & NVIDIA Container Toolkit
# Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker $USER
# Install NVIDIA Container Toolkit
# (the old nvidia-docker repository and apt-key are deprecated; this is the current method)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Verify the installation by running the command below (log out and back in first so the docker group membership added above takes effect):
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
Ollama Deployment
Ollama provides an easy way to run various open-source models locally. Here's how to set it up:
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Make sure the Ollama service is running
# (the install script registers a system-level systemd unit, not a user unit)
sudo systemctl enable ollama
sudo systemctl start ollama
# Pull models (examples)
ollama pull llama3
ollama pull phi3
ollama pull deepseek-coder
Once installed, Ollama runs on port 11434. You can verify it's working by running:
curl http://localhost:11434/api/tags
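To go beyond listing models and confirm inference works end to end, send a test prompt to the generate endpoint (this example assumes the llama3 model pulled above):

curl http://localhost:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Why is the sky blue?",
  "stream": false
}'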
LocalGPT Deployment (Alternative)
LocalGPT provides a more customizable interface for local AI models:
# Clone the LocalGPT repository
git clone https://github.com/PromtEngineer/localGPT.git
cd localGPT
# Set up with Docker
docker build -t localgpt .
docker run -d --gpus all -p 5000:5000 -v $(pwd):/app localgpt
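The container runs detached, so it's worth confirming it stayed up and watching the first startup logs (the localgpt image name matches the build step above):

# Confirm the container is running and follow its logs
docker ps --filter ancestor=localgpt
docker logs -f $(docker ps -q --filter ancestor=localgpt)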
Nginx Proxy with SSL Configuration
Setting up Nginx as a reverse proxy with SSL encryption allows secure access to your AI services from the internet.
Installing Nginx and Certbot
# Install Nginx and Certbot
sudo apt update
sudo apt install -y nginx certbot python3-certbot-nginx
# Start and enable Nginx
sudo systemctl start nginx
sudo systemctl enable nginx
Configuring Nginx for Ollama
Create a new Nginx config file:
sudo nano /etc/nginx/sites-available/ollama
Add the following configuration (replace yourdomain.com with your actual domain):
server {
    listen 80;
    server_name ai.yourdomain.com;

    location / {
        proxy_pass http://localhost:11434;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # WebSocket support
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        # Increase timeouts for long-running AI queries
        proxy_read_timeout 300s;
        proxy_connect_timeout 300s;
        proxy_send_timeout 300s;
    }

    # Security headers
    add_header X-Content-Type-Options "nosniff" always;
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-XSS-Protection "1; mode=block" always;
    add_header Referrer-Policy "strict-origin-when-cross-origin" always;
}
Enable the site and get SSL certificate:
sudo ln -s /etc/nginx/sites-available/ollama /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
# Get SSL certificate
sudo certbot --nginx -d ai.yourdomain.com
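If issuance succeeded, the same tags check from earlier should now work over HTTPS from anywhere:

curl https://ai.yourdomain.com/api/tags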
Basic Authentication (Optional)
Add basic authentication to protect your AI service:
# Install apache2-utils for htpasswd
sudo apt install -y apache2-utils
# Create password file
sudo htpasswd -c /etc/nginx/.htpasswd yourusername
# Add to your Nginx config inside the location block:
auth_basic "Restricted Access";
auth_basic_user_file /etc/nginx/.htpasswd;
Security Note: Basic authentication only base64-encodes credentials rather than encrypting them, which is why SSL is crucial here. For production environments, consider implementing more robust authentication methods.
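After reloading Nginx, an unauthenticated request should be rejected, and curl can supply credentials with -u (it prompts for the password):

# Expect HTTP 401 without credentials
curl -I https://ai.yourdomain.com/api/tags
# Authenticated request
curl -u yourusername https://ai.yourdomain.com/api/tags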
Auto-Start Services Configuration
Ensure your AI services automatically start when your EC2 instance boots up.
Systemd Service for Ollama
Create a systemd service file:
sudo nano /etc/systemd/system/ollama.service
Add the following content:
[Unit]
Description=Ollama AI Service
After=network.target
[Service]
Type=simple
User=ubuntu
WorkingDirectory=/home/ubuntu
ExecStart=/usr/local/bin/ollama serve
Restart=always
RestartSec=10
Environment=HOME=/home/ubuntu
[Install]
WantedBy=multi-user.target
Enable and start the service:
sudo systemctl daemon-reload
sudo systemctl enable ollama.service
sudo systemctl start ollama.service
sudo systemctl status ollama.service
Docker Auto-Start for Other Services
For services running in Docker containers, enable auto-restart:
# For LocalGPT or other Docker containers
docker update --restart=always container_name_or_id
# Example:
docker update --restart=always localgpt
# For docker-compose based deployments
# Modify docker-compose.yml to include:
services:
  service_name:
    restart: always
    # other configuration...
Tip: To test your auto-start configuration, reboot your instance with sudo reboot and verify all services come back online automatically.
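A minimal post-reboot check, assuming the Ollama service and Docker containers configured above:

systemctl is-active ollama.service
docker ps --format '{{.Names}}\t{{.Status}}'
curl -s http://localhost:11434/api/tags > /dev/null && echo "Ollama API reachable"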
Code-Server Installation
Code-Server provides a browser-based VS Code environment, making it easy to develop and maintain your AI applications directly on your server.
Installing Code-Server
# Install Code-Server
curl -fsSL https://code-server.dev/install.sh | sh
# Start and enable the service
sudo systemctl enable --now code-server@$USER
# Configure Code-Server
mkdir -p ~/.config/code-server
nano ~/.config/code-server/config.yaml
Edit the configuration file with:
bind-addr: 127.0.0.1:8080
auth: password
password: your_secure_password
cert: false
Restart Code-Server to apply changes:
sudo systemctl restart code-server@$USER
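Before putting Nginx in front of it, confirm Code-Server answers locally (any HTTP response, such as a redirect to the login page, is fine):

curl -I http://127.0.0.1:8080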
Nginx Configuration for Code-Server
Create a new Nginx site configuration:
sudo nano /etc/nginx/sites-available/code-server
Add the following configuration:
server {
    listen 80;
    server_name code.yourdomain.com;

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection upgrade;
        proxy_set_header Accept-Encoding gzip;
    }
}
Enable the site and get SSL certificate:
sudo ln -s /etc/nginx/sites-available/code-server /etc/nginx/sites-enabled/
sudo nginx -t
sudo systemctl reload nginx
# Get SSL certificate
sudo certbot --nginx -d code.yourdomain.com
Success: Code-Server should now be available at https://code.yourdomain.com with password protection.
Installing Useful Extensions
Enhance your Code-Server with these useful extensions for AI development:
- Python: Essential Python language support with IntelliSense, linting, and debugging.
- Jupyter: Run and view Jupyter notebooks directly within VS Code.
- Docker: Manage Docker containers and images directly from the editor.
- IntelliCode: AI-assisted development with intelligent code completions.
Install extensions from the command line:
code-server --install-extension ms-python.python
code-server --install-extension ms-toolsai.jupyter
code-server --install-extension ms-azuretools.vscode-docker
code-server --install-extension visualstudioexptteam.vscodeintellicode
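You can confirm what was installed with:

code-server --list-extensions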
EC2 Auto-Shutdown Script
Implement an auto-shutdown script to automatically turn off your EC2 instance during periods of inactivity, helping to reduce costs significantly.
Creating the Auto-Shutdown Script
Create a new script file:
nano ~/auto-shutdown.sh
Add the following content:
#!/bin/bash
# Auto-shutdown script for EC2 instances
# Checks for system activity and shuts down the instance if idle.
# Requires bc (sudo apt install -y bc) and passwordless sudo for shutdown
# (see the sudoers note after the cron step below).

# Configuration
LOAD_THRESHOLD=0.1                          # 1-minute load average threshold
EXCLUDE_PROCESSES=("ollama" "code-server")  # Processes whose activity blocks shutdown

# Get current load average (1-minute)
LOAD_AVG=$(awk '{print $1}' /proc/loadavg)

# Abort if any watched process is actively using CPU
for PROCESS in "${EXCLUDE_PROCESSES[@]}"; do
    # sum+0 forces a numeric result even when no matching process exists
    PROCESS_CPU=$(ps aux | grep "$PROCESS" | grep -v grep | awk '{sum+=$3} END {print sum+0}')
    if (( $(echo "$PROCESS_CPU > 5.0" | bc -l) )); then
        echo "Process $PROCESS is active (CPU: $PROCESS_CPU%). Aborting shutdown."
        exit 0
    fi
done

# Check if load is below threshold
if (( $(echo "$LOAD_AVG < $LOAD_THRESHOLD" | bc -l) )); then
    echo "System load ($LOAD_AVG) is below threshold ($LOAD_THRESHOLD)"

    # Check for active SSH connections (ss replaces the deprecated netstat)
    SSH_CONNECTIONS=$(ss -Htn state established '( sport = :22 )' | wc -l)
    if [ "$SSH_CONNECTIONS" -gt 0 ]; then
        echo "Active SSH connections found. Aborting shutdown."
        exit 0
    fi

    # Check for logged-in users
    LOGGED_IN_USERS=$(who | wc -l)
    if [ "$LOGGED_IN_USERS" -eq 0 ]; then
        echo "No users logged in. Checking system activity..."

        # Sample current CPU idle % with vmstat; the second sample reflects
        # activity right now (a single /proc/stat read only gives a since-boot average)
        CPU_IDLE=$(vmstat 1 2 | tail -1 | awk '{print $15}')
        if (( $(echo "$CPU_IDLE > 95.0" | bc -l) )); then
            echo "System is idle (CPU idle: $CPU_IDLE%). Initiating shutdown..."
            # Log the shutdown
            echo "Auto-shutdown initiated at $(date)" >> "$HOME/auto-shutdown.log"
            # Shutdown the system
            sudo shutdown -h now
        else
            echo "System not idle enough (CPU idle: $CPU_IDLE%). Aborting shutdown."
        fi
    else
        echo "Active users detected. Aborting shutdown."
    fi
else
    echo "System load ($LOAD_AVG) is above threshold ($LOAD_THRESHOLD). Aborting shutdown."
fi
Make the script executable:
chmod +x ~/auto-shutdown.sh
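Run it once by hand before scheduling it; the echoed messages show which branch it took. Keep in mind that if every check passes, it really will shut the instance down:

~/auto-shutdown.sh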
Setting Up a Cron Job
Configure the script to run every 15 minutes:
crontab -e
Add this line to the crontab:
*/15 * * * * /home/ubuntu/auto-shutdown.sh >> /home/ubuntu/auto-shutdown.log 2>&1
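This crontab belongs to a regular user, so the script's sudo shutdown call must succeed without a password prompt. One way to allow exactly that (and nothing more) is a sudoers drop-in; the file name is arbitrary:

# Allow the ubuntu user to run shutdown non-interactively
echo 'ubuntu ALL=(ALL) NOPASSWD: /usr/sbin/shutdown' | sudo tee /etc/sudoers.d/auto-shutdown
sudo chmod 440 /etc/sudoers.d/auto-shutdown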
Using AWS Instance Scheduler (Alternative)
AWS offers a managed solution for scheduled instance starts and stops:
- Deploy the AWS Instance Scheduler solution from the AWS Solutions Library
- Configure schedules based on your usage patterns (e.g., working hours on weekdays)
- Tag your EC2 instances with the appropriate schedule identifier
- The scheduler will automatically start and stop instances based on the defined schedule
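For step 3, tagging can also be done from the CLI. This sketch assumes the solution's default Schedule tag key; the instance ID and schedule name are placeholders to replace with your own:

# Tag an instance so the Instance Scheduler manages it
aws ec2 create-tags \
  --resources i-0123456789abcdef0 \
  --tags Key=Schedule,Value=office-hours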
Cost Considerations
The AWS Instance Scheduler solution costs approximately $13.15 per month to operate, so it makes sense for environments with multiple instances. For a single instance, the script-based approach above is more cost-effective.
Cost Estimation
Understanding the cost implications of running AI models on AWS is crucial for budgeting. Here's a breakdown of the estimated costs.
Item | Description | Unit Cost | Quantity | Monthly Cost |
---|---|---|---|---|
EC2 Instance (g5.4xlarge) | Mid-sized GPU instance for AI workloads | $1.352/hour | 176 hours (8 hours/day × 22 days) | $237.95 |
EBS Storage (gp3) | General Purpose SSD storage | $0.08/GB-month | 250 GB | $20.00 |
Data Transfer Out | Network egress from AWS to Internet | $0.09/GB | 50 GB | $4.50 |
Total Monthly Cost (8h/day usage) | | | | $262.45 |
Total Monthly Cost (24h/day usage) | | | | $993.64 |
Cost Optimization Tips
- Use auto-shutdown scripts during idle periods
- Consider Spot Instances for non-critical workloads (up to 70% discount)
- Start with smaller instance types and scale up as needed
- Use gp3 EBS volumes instead of gp2 for better performance/cost ratio
- Consider Reserved Instances if you plan long-term usage
Cost Calculator
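In place of an interactive calculator, here is a small shell sketch that reproduces the table's arithmetic; the rates and quantities are the same assumptions listed above, so adjust them to your own usage:

#!/bin/bash
# Rough monthly estimate mirroring the cost table above (requires bc)
RATE=1.352        # g5.4xlarge on-demand, $/hour
HOURS=176         # 8 hours/day x 22 days
STORAGE_GB=250    # gp3 volume size
EBS_RATE=0.08     # $/GB-month
EGRESS_GB=50      # data transfer out, GB
EGRESS_RATE=0.09  # $/GB
echo "scale=2; $RATE*$HOURS + $STORAGE_GB*$EBS_RATE + $EGRESS_GB*$EGRESS_RATE" | bc
# Prints 262.452 -> the table's $262.45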
Reminder: AWS provides a Free Tier for new accounts which includes 750 hours of EC2 t2.micro instance usage, but this is insufficient for most AI workloads. The cost estimates above assume standard on-demand pricing.