This guide provides a comprehensive approach for setting up a self-hosted AI agent on an Amazon Web Services (AWS) EC2 instance, tailored to run AI models like Llama 3 70B, Qwen2.5-Coder-7B, and others. The setup leverages the g6.xlarge instance for cost-effective GPU performance, installs Ollama for model execution, Open WebUI for a user-friendly web interface, and Code-Server for remote development.
EC2 Instance Recommendation
For running AI models such as Llama 3 70B with quantization, the g6.xlarge EC2 instance is recommended. This instance features:
- 1 NVIDIA L4 GPU with 24 GB VRAM, suitable for quantized large language models (LLMs).
- 4 vCPUs and 16 GB RAM, sufficient for development tasks.
- Cost: Approximately $0.8048 per hour in US East (N. Virginia), as per Vantage.
The L4 GPU, based on NVIDIA's Ada Lovelace architecture, offers comparable performance to the A10G GPU (in g5.xlarge) for memory-bound inference tasks, at roughly 20% lower cost than g5.xlarge's $1.006/hour. One caveat on model size: Llama 3 70B quantized to 4-bit still occupies roughly 35–40 GB (70 billion parameters at ~0.5 bytes each, plus KV cache), which exceeds the L4's 24 GB VRAM. Ollama handles this by offloading the layers that do not fit to system RAM, which is markedly slower and, given g6.xlarge's 16 GB of system RAM, may not leave enough headroom for the full 70B model. For full-speed, GPU-only inference, favor smaller models such as Qwen2.5-Coder-7B, which fit entirely in the 24 GB of VRAM.
Why g6.xlarge? It balances cost and performance, leveraging the newer L4 GPU's efficiency. The g5.xlarge, while viable, is less cost-effective, and g4dn instances (T4 GPUs, 16 GB VRAM) may not support larger models reliably. The Deep Learning AMI (Ubuntu) is recommended for its pre-installed NVIDIA CUDA drivers.
Deployment Instructions
To deploy the AI agent, follow these steps:
- Launch an EC2 Instance: Select the g6.xlarge instance with the Deep Learning AMI (Ubuntu) from the AWS Marketplace. Ensure the "Shutdown behavior" is set to "Stop" in the EC2 console to enable cost-saving shutdowns.
- Connect to the Instance: Use SSH to access the instance (e.g., ssh -i your-key.pem ubuntu@instance-ip).
- Run the Setup Script: Copy and execute the provided script to install and configure all components.
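For those who prefer the CLI, the console launch step can be approximated as below. The AMI ID and key name are placeholders (look up the current Deep Learning AMI ID for your region); the command is only printed here for review, not executed.

```shell
# Hypothetical AWS CLI equivalent of the console launch described above.
# Note --instance-initiated-shutdown-behavior stop, which matches the
# "Shutdown behavior: Stop" setting the auto-shutdown cron job relies on.
cat <<'EOF' | tee launch-command.txt
aws ec2 run-instances \
  --instance-type g6.xlarge \
  --image-id ami-xxxxxxxxxxxxxxxxx \
  --key-name your-key \
  --instance-initiated-shutdown-behavior stop
EOF
```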
Setup Script
Script Explanation
| Component | Description |
|---|---|
| Dependencies | Installs Docker for container management, Nginx for web serving, and Certbot for SSL. |
| Ollama | Installed via the official script, enabling GPU-accelerated model execution. |
| Open WebUI | Deployed as a Docker container, connected to Ollama's API. |
| Nginx | Configured to proxy requests to Open WebUI, with security headers for protection. |
| SSL | Optionally configures Let's Encrypt SSL if a domain is provided. |
| Code-Server | Installed and set up as a systemd service for browser-based VSCode access. |
| Auto-Shutdown | Cron job stops the instance at 2 AM daily to reduce costs. |
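The components in the table can be condensed into a script along these lines. This is a sketch, not the exact script: package names, the Open WebUI image tag and environment variables, and the port numbers reflect the guide's description and the projects' documented install commands, so review each line before running it on the instance. It only writes setup.sh and syntax-checks it; execute the file on the instance yourself.

```shell
# Condensed sketch of the setup script (assumptions noted inline).
cat > setup.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail

DOMAIN=""                        # optional: set to enable Let's Encrypt SSL
CODE_SERVER_PASSWORD="change-me" # set a secure password before running

# Dependencies: Docker, Nginx, Certbot
sudo apt-get update
sudo apt-get install -y docker.io nginx certbot python3-certbot-nginx

# Ollama via the official install script (installs a systemd service)
curl -fsSL https://ollama.com/install.sh | sh

# Open WebUI as a Docker container pointed at Ollama's local API.
# PORT=3000 matches the Nginx proxy target described in this guide.
sudo docker run -d --network=host \
  -v open-webui:/app/backend/data \
  -e OLLAMA_BASE_URL=http://127.0.0.1:11434 \
  -e PORT=3000 \
  --name open-webui --restart always \
  ghcr.io/open-webui/open-webui:main

# Code-Server via the official install script, enabled as a systemd service
curl -fsSL https://code-server.dev/install.sh | sh
mkdir -p ~/.config/code-server
printf 'bind-addr: 0.0.0.0:8080\nauth: password\npassword: %s\n' \
  "$CODE_SERVER_PASSWORD" > ~/.config/code-server/config.yaml
sudo systemctl enable --now code-server@"$USER"

# Optional SSL once DNS points at this instance (email is a placeholder)
if [ -n "$DOMAIN" ]; then
  sudo certbot --nginx -d "$DOMAIN" --non-interactive --agree-tos -m admin@"$DOMAIN"
fi

# Auto-shutdown: stop the instance at 2 AM daily
( sudo crontab -l 2>/dev/null; echo '0 2 * * * /sbin/shutdown -h now' ) | sudo crontab -
EOF
bash -n setup.sh && echo "setup.sh syntax OK"
```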
After running the script, access Open WebUI at http://your-instance-ip:80 or your domain (if configured). Download models via Open WebUI (e.g., ollama run llama3). Some models may require manual GGUF file imports if not available in Ollama's library.
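A manual GGUF import can be sketched as follows. The filename is illustrative; substitute the file you actually downloaded. Only the Modelfile is written here, with the Ollama commands to run on the instance shown as comments.

```shell
# Sketch: import a model as a GGUF file when it is missing from Ollama's
# library. The .gguf filename below is a placeholder for your download.
cat > Modelfile <<'EOF'
FROM ./qwen2.5-coder-7b-instruct-q4_k_m.gguf
EOF
# Then, on the instance:
#   ollama create qwen2.5-coder-7b -f Modelfile
#   ollama run qwen2.5-coder-7b
```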
Server Configuration
Nginx serves Open WebUI, proxying requests from port 80 to port 3000. The configuration includes security headers to mitigate common web vulnerabilities. If you have a domain, Certbot automates SSL setup, enabling HTTPS access.
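A minimal sketch of such a server block, assuming Open WebUI listens locally on port 3000. The header set shown is illustrative, not the script's exact configuration; the WebSocket upgrade headers are needed for Open WebUI's chat streaming.

```nginx
server {
    listen 80;
    server_name _;

    # Illustrative security headers
    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-Content-Type-Options "nosniff" always;

    location / {
        proxy_pass http://127.0.0.1:3000;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;   # WebSocket support
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```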
Security Tip: Without a domain, you'll use the instance's public IP over HTTP, which is less secure. Consider a cheap domain or free subdomain service for SSL support.
Auto-Start and Reliability
To ensure services run reliably and restart after reboots:
- Ollama: Installed as a systemd service, automatically starting and restarting on failure.
- Open WebUI: Docker container with a --restart always policy, ensuring uptime.
- Code-Server: Configured as a systemd service for consistent operation.
Verify service status with systemctl status ollama, docker ps, and systemctl status code-server.
Remote Development
Code-Server provides a browser-based Visual Studio Code environment, accessible at http://your-instance-ip:8080. Set a secure password in the script's CODE_SERVER_PASSWORD variable.
Pro Tip: For added security, extend the Nginx configuration to proxy Code-Server with SSL, similar to Open WebUI. This setup allows you to code and manage the server remotely, mimicking the convenience of cloud IDEs.
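A hypothetical server block for that extension (the subdomain is a placeholder; Code-Server also requires WebSocket upgrade headers to function through a proxy):

```nginx
server {
    listen 80;
    server_name code.example.com;   # placeholder subdomain

    location / {
        proxy_pass http://127.0.0.1:8080;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;   # required by Code-Server
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}
```

With a DNS record for the subdomain in place, Certbot can then issue a certificate for it the same way as for Open WebUI.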
Cost Management
To optimize costs, the script includes a cron job that stops the instance at 2 AM daily, assuming the EC2 "Shutdown behavior" is set to "Stop." This reduces runtime to ~8 hours/day (about a third of each month), cutting instance cost by roughly two-thirds compared to 24/7 operation.
To restart, use the AWS console or AWS CLI (aws ec2 start-instances). For more advanced scheduling, consider AWS Instance Scheduler, though the cron job is simpler for basic needs.
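The auto-shutdown entry in root's crontab can look like this (a sketch; the path to shutdown may vary by distribution):

```
# m h dom mon dow  command — stop the instance at 02:00 every day
0 2 * * * /sbin/shutdown -h now
```

Because the instance's shutdown behavior is "Stop" rather than "Terminate," the disk and its contents survive, and billing for compute pauses until you start the instance again.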
Cost Estimate
The following table outlines estimated monthly costs for the g6.xlarge setup in US East (N. Virginia):
| Component | Details | Cost/Month |
|---|---|---|
| Instance (g6.xlarge) | $0.8048/hour, 8 hours/day (240 hours/month) | $193.15 |
| Instance (24/7) | $0.8048/hour, 730 hours/month | $587.50 |
| Storage (Local) | 250 GB NVMe SSD (included with g6.xlarge) | $0.00 |
| Storage (EBS) | Optional 100 GB gp3 EBS volume, $0.08/GB-month | $8.00 |
| Data Transfer | 100 GB out free, $0.09/GB thereafter (minimal) | ~$0.00 |
| Total (8 hours/day) | Local storage, minimal data transfer | $193.15–$201.15 |
| Total (24/7) | Local storage, minimal data transfer | $587.50–$595.50 |
Notes: Spot instances (~$0.24/hour for g6.xlarge) could further reduce costs but risk interruptions. Savings Plans or Reserved Instances offer up to 72% savings for long-term commitments but require planning.
Additional Considerations
- Model Support: Verify model availability in Ollama's library. For unavailable models (e.g., Qwen2.5-Coder), import GGUF files manually.
- Security: HTTP access without SSL is less secure. A domain enables HTTPS, improving security.
- Scalability: For heavier workloads, consider g6.12xlarge (4 GPUs, 96 GB VRAM, ~$4.83/hour).
- Monitoring: Use AWS CloudWatch to track usage and set cost alerts.
Conclusion
This setup delivers a cost-effective, self-hosted AI agent on AWS EC2, avoiding subscription fees while providing cloud-like accessibility. The g6.xlarge instance, combined with Ollama, Open WebUI, and Code-Server, offers a powerful platform for running AI models and developing remotely.
Automated cost management ensures affordability, with monthly costs as low as $193 for moderate use. By following this guide, you can achieve a reliable, secure, and efficient AI development environment tailored to your needs.
Citations
- Amazon EC2 G6 Instances Specifications
- Amazon EC2 G5 Instances Specifications
- g6.xlarge Pricing and Specifications
- g5.xlarge Pricing and Specifications
- Ollama Linux Installation Guide
- Open WebUI Quick Start Guide
- Code-Server Installation Instructions
- NVIDIA L4 Tensor Core GPU Specifications
- NVIDIA A10 Tensor Core GPU Specifications