This guide provides a comprehensive approach for setting up a self-hosted AI agent on an Amazon Web Services (AWS) EC2 instance, tailored to run AI models like Llama 3 70B, Qwen2.5-Coder-7B, and others. The setup leverages the g6.xlarge instance for cost-effective GPU performance, installs Ollama for model execution, Open WebUI for a user-friendly web interface, and Code-Server for remote development.
EC2 Instance Recommendation
For running AI models such as Llama 3 70B with quantization, the g6.xlarge EC2 instance is recommended. This instance features:
- 1 NVIDIA L4 GPU with 24 GB VRAM, suitable for quantized large language models (LLMs).
- 4 vCPUs and 16 GB RAM, sufficient for development tasks.
- Cost: Approximately $0.8048 per hour in US East (N. Virginia), as per Vantage.
The L4 GPU, based on NVIDIA's Ada Lovelace architecture, offers comparable performance to the A10G GPU (in g5.xlarge) for memory-bound inference tasks, at roughly 20% lower cost than g5.xlarge's $1.006/hour. One caveat on model size: Llama 3 70B quantized to 4-bit still occupies roughly 35–40 GB (70 billion parameters at ~0.5 bytes each, plus KV cache), which exceeds the L4's 24 GB VRAM. Ollama handles this by offloading the layers that do not fit to system RAM, which is markedly slower and, given g6.xlarge's 16 GB of system RAM, may not leave enough headroom for the full 70B model. For full-speed, GPU-only inference, favor smaller models such as Qwen2.5-Coder-7B, which fit entirely in the 24 GB of VRAM.
Why g6.xlarge? It balances cost and performance, leveraging the newer L4 GPU's efficiency. The g5.xlarge, while viable, is less cost-effective, and g4dn instances (T4 GPUs, 16 GB VRAM) may not support larger models reliably. The Deep Learning AMI (Ubuntu) is recommended for its pre-installed NVIDIA CUDA drivers.
Deployment Instructions
To deploy the AI agent, follow these steps:
- Launch an EC2 Instance: Select the g6.xlarge instance with the Deep Learning AMI (Ubuntu) from the AWS Marketplace. Ensure the "Shutdown behavior" is set to "Stop" in the EC2 console to enable cost-saving shutdowns.
- Connect to the Instance: Use SSH to access the instance (e.g., ssh -i your-key.pem ubuntu@instance-ip).
- Run the Setup Script: Copy and execute the provided script to install and configure all components.
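For those who prefer the CLI, the console launch step can be approximated as below. The AMI ID and key name are placeholders (look up the current Deep Learning AMI ID for your region); the command is only printed here for review, not executed.

```shell
# Hypothetical AWS CLI equivalent of the console launch described above.
# Note --instance-initiated-shutdown-behavior stop, which matches the
# "Shutdown behavior: Stop" setting the auto-shutdown cron job relies on.
cat <<'EOF' | tee launch-command.txt
aws ec2 run-instances \
  --instance-type g6.xlarge \
  --image-id ami-xxxxxxxxxxxxxxxxx \
  --key-name your-key \
  --instance-initiated-shutdown-behavior stop
EOF
```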
Setup Script
Script Explanation
| Component | Description |
|---|---|
| Dependencies | Installs Docker for container management, Nginx for web serving, and Certbot for SSL. |
| Ollama | Installed via the official script, enabling GPU-accelerated model execution. |
| Open WebUI | Deployed as a Docker container, connected to Ollama's API. |
| Nginx | Configured to proxy requests to Open WebUI, with security headers for protection. |
| SSL | Optionally configures Let's Encrypt SSL if a domain is provided. |
| Code-Server | Installed and set up as a systemd service for browser-based VSCode access. |
| Auto-Shutdown | Cron job stops the instance at 2 AM daily to reduce costs. |
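The components in the table can be condensed into a script along these lines. This is a sketch, not the exact script: package names, the Open WebUI image tag and environment variables, and the port numbers reflect the guide's description and the projects' documented install commands, so review each line before running it on the instance. It only writes setup.sh and syntax-checks it; execute the file on the instance yourself.

```shell
# Condensed sketch of the setup script (assumptions noted inline).
cat > setup.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail

DOMAIN=""                        # optional: set to enable Let's Encrypt SSL
CODE_SERVER_PASSWORD="change-me" # set a secure password before running

# Dependencies: Docker, Nginx, Certbot
sudo apt-get update
sudo apt-get install -y docker.io nginx certbot python3-certbot-nginx

# Ollama via the official install script (installs a systemd service)
curl -fsSL https://ollama.com/install.sh | sh

# Open WebUI as a Docker container pointed at Ollama's local API.
# PORT=3000 matches the Nginx proxy target described in this guide.
sudo docker run -d --network=host \
  -v open-webui:/app/backend/data \
  -e OLLAMA_BASE_URL=http://127.0.0.1:11434 \
  -e PORT=3000 \
  --name open-webui --restart always \
  ghcr.io/open-webui/open-webui:main

# Code-Server via the official install script, enabled as a systemd service
curl -fsSL https://code-server.dev/install.sh | sh
mkdir -p ~/.config/code-server
printf 'bind-addr: 0.0.0.0:8080\nauth: password\npassword: %s\n' \
  "$CODE_SERVER_PASSWORD" > ~/.config/code-server/config.yaml
sudo systemctl enable --now code-server@"$USER"

# Optional SSL once DNS points at this instance (email is a placeholder)
if [ -n "$DOMAIN" ]; then
  sudo certbot --nginx -d "$DOMAIN" --non-interactive --agree-tos -m admin@"$DOMAIN"
fi

# Auto-shutdown: stop the instance at 2 AM daily
( sudo crontab -l 2>/dev/null; echo '0 2 * * * /sbin/shutdown -h now' ) | sudo crontab -
EOF
bash -n setup.sh && echo "setup.sh syntax OK"
```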
After running the script, access Open WebUI at http://your-instance-ip:80 or your domain (if configured). Download models via Open WebUI (e.g., ollama run llama3). Some models may require manual GGUF file imports if not available in Ollama's library.
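A manual GGUF import can be sketched as follows. The filename is illustrative; substitute the file you actually downloaded. Only the Modelfile is written here, with the Ollama commands to run on the instance shown as comments.

```shell
# Sketch: import a model as a GGUF file when it is missing from Ollama's
# library. The .gguf filename below is a placeholder for your download.
cat > Modelfile <<'EOF'
FROM ./qwen2.5-coder-7b-instruct-q4_k_m.gguf
EOF
# Then, on the instance:
#   ollama create qwen2.5-coder-7b -f Modelfile
#   ollama run qwen2.5-coder-7b
```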
Server Configuration
Nginx serves Open WebUI, proxying requests from port 80 to port 3000. The configuration includes security headers to mitigate common web vulnerabilities. If you have a domain, Certbot automates SSL setup, enabling HTTPS access.
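A minimal sketch of such a server block, assuming Open WebUI listens locally on port 3000. The header set shown is illustrative, not the script's exact configuration; the WebSocket upgrade headers are needed for Open WebUI's chat streaming.

```nginx
server {
    listen 80;
    server_name _;

    # Illustrative security headers
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;   # WebSocket support
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```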
Security Tip: Without a domain, you'll use the instance's public IP over HTTP, which is less secure. Consider a cheap domain or free subdomain service for SSL support.
Auto-Start and Reliability
To ensure services run reliably and restart after reboots:
- Ollama: Installed as a systemd service, automatically starting and restarting on failure.
- Open WebUI: Docker container with a --restart always policy, ensuring uptime.
- Code-Server: Configured as a systemd service for consistent operation.
Verify service status with systemctl status ollama, docker ps, and systemctl status code-server.
Remote Development
Code-Server provides a browser-based Visual Studio Code environment, accessible at http://your-instance-ip:8080. Set a secure password in the script's CODE_SERVER_PASSWORD variable.
Pro Tip: For added security, extend the Nginx configuration to proxy Code-Server with SSL, similar to Open WebUI. This setup allows you to code and manage the server remotely, mimicking the convenience of cloud IDEs.
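A hypothetical server block for that extension (the subdomain is a placeholder; Code-Server also requires WebSocket upgrade headers to function through a proxy):

```nginx
server {
    listen 80;
    server_name code.example.com;   # placeholder subdomain

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;   # required by Code-Server
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}
```

With a DNS record for the subdomain in place, Certbot can then issue a certificate for it the same way as for Open WebUI.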
Cost Management
To optimize costs, the script includes a cron job that stops the instance at 2 AM daily, assuming the EC2 "Shutdown behavior" is set to "Stop." This reduces runtime to ~8 hours/day (about a third of each month), cutting instance cost by roughly two-thirds compared to 24/7 operation.
To restart, use the AWS console or AWS CLI (aws ec2 start-instances). For more advanced scheduling, consider AWS Instance Scheduler, though the cron job is simpler for basic needs.
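The auto-shutdown entry in root's crontab can look like this (a sketch; the path to shutdown may vary by distribution):

```
# m h dom mon dow  command — stop the instance at 02:00 every day
0 2 * * * /sbin/shutdown -h now
```

Because the instance's shutdown behavior is "Stop" rather than "Terminate," the disk and its contents survive, and billing for compute pauses until you start the instance again.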
Cost Estimate
The following table outlines estimated monthly costs for the g6.xlarge setup in US East (N. Virginia):
| Component | Details | Cost/Month |
|---|---|---|
| Instance (g6.xlarge) | $0.8048/hour, 8 hours/day (240 hours/month) | $193.15 |
| Instance (24/7) | $0.8048/hour, 730 hours/month | $587.50 |
| Storage (Local) | 250 GB NVMe SSD (included with g6.xlarge) | $0.00 |
| Storage (EBS) | Optional 100 GB gp3 EBS volume, $0.08/GB-month | $8.00 |
| Data Transfer | 100 GB out free, $0.09/GB thereafter (minimal) | ~$0.00 |
| Total (8 hours/day) | Local storage, minimal data transfer | $193.15–$201.15 |
| Total (24/7) | Local storage, minimal data transfer | $587.50–$595.50 |
Notes: Spot instances (~$0.24/hour for g6.xlarge) could further reduce costs but risk interruptions. Savings Plans or Reserved Instances offer up to 72% savings for long-term commitments but require planning.
Additional Considerations
- Model Support: Verify model availability in Ollama's library. For unavailable models (e.g., Qwen2.5-Coder), import GGUF files manually.
- Security: HTTP access without SSL is less secure. A domain enables HTTPS, improving security.
- Scalability: For heavier workloads, consider g6.12xlarge (4 GPUs, 96 GB VRAM, ~$4.83/hour).
- Monitoring: Use AWS CloudWatch to track usage and set cost alerts.
Conclusion
This setup delivers a cost-effective, self-hosted AI agent on AWS EC2, avoiding subscription fees while providing cloud-like accessibility. The g6.xlarge instance, combined with Ollama, Open WebUI, and Code-Server, offers a powerful platform for running AI models and developing remotely.
Automated cost management ensures affordability, with monthly costs as low as $193 for moderate use. By following this guide, you can achieve a reliable, secure, and efficient AI development environment tailored to your needs.
Citations
- Amazon EC2 G6 Instances Specifications
- Amazon EC2 G5 Instances Specifications
- g6.xlarge Pricing and Specifications
- g5.xlarge Pricing and Specifications
- Ollama Linux Installation Guide
- Open WebUI Quick Start Guide
- Code-Server Installation Instructions
- NVIDIA L4 Tensor Core GPU Specifications
- NVIDIA A10 Tensor Core GPU Specifications