Deploying a public Next.js app to AWS ECS

Wednesday, September 11, 2024

Having a reliable and tested system for running apps on cloud servers in our fast-paced development environment is essential. As businesses and personal projects grow, the need for scalable, flexible solutions becomes more critical. Dockerized containers offer an ideal way to package and deploy applications consistently, regardless of the underlying infrastructure. By leveraging cloud platforms like AWS and utilizing container orchestration tools such as ECS (Elastic Container Service), you can ensure your applications are both highly available and able to scale up or down based on demand. Whether you're managing multiple apps or building solutions for fun, establishing a proven system for deploying Next.js apps in the cloud is key to maintaining efficiency and reliability.

Alongside multiple *.tldrlw.com apps, I consistently work on other projects for friends, family, and partnerships, so I needed a template to deploy Next.js apps efficiently and reliably. Having worked out a good system, I want to share it with others who might be looking for a similar solution. This guide will take a very barebones Next.js app and deploy it to ECS. I'm assuming you have a basic understanding of Terraform and how it works, know how to create an AWS IAM (Identity and Access Management) user with an Access Key ID and Secret Access Key to run Terraform locally against your AWS account, and have a basic understanding of Bash scripting. Code references for everything we will do can be found here.

The first step is to set up your Terraform remote backend state using S3 and DynamoDB; use this guide to get that done. You can do this through the AWS management console or with Terraform. In my case, I did it with Terraform, but I keep that code only locally and in a separate directory. To provision the S3 bucket and the DynamoDB table required for my main codebase's remote backend state, I used Terraform's local state as opposed to the remote backend state.
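
If you'd rather bootstrap those resources with Terraform yourself, here is a minimal sketch of what that configuration could look like (applied with local state, from a separate directory); the bucket and table names are placeholders, and the LockID hash key is what Terraform's S3 backend expects for state locking:


resource "aws_s3_bucket" "terraform_state" {
  # replace with a globally unique bucket name of your own
  bucket = "<your s3 bucket name>"
}

resource "aws_s3_bucket_versioning" "terraform_state" {
  # versioning is optional but lets you recover older state files
  bucket = aws_s3_bucket.terraform_state.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_dynamodb_table" "terraform_locks" {
  # Terraform's S3 backend uses this table for state locking;
  # the hash key must be named "LockID"
  name         = "<your dynamodb table name>"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"
  attribute {
    name = "LockID"
    type = "S"
  }
}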

After you've created the remote backend state resources, you'll need to reference them in the remote backend state you'll be setting up for this project. For my project, I'm provisioning my infrastructure in the us-east-1 region, but you can pick whatever region is best suited to you; find more information about AWS regions here. There's also a neat tool you can use to gauge which region will result in the lowest latency for the geographic request origin of most of your users. Let's begin by setting up our provider.tf file, which will have the remote backend state and provider configuration.

A Terraform provider is a plugin that enables Terraform to interact with APIs of cloud platforms and other services, allowing Terraform to manage and provision resources on those platforms (e.g., AWS, Azure, Google Cloud).

provider.tf


terraform {
  backend "s3" {
    bucket         = "<your s3 bucket name>"
    key            = "global/s3/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "<your dynamodb table name>"
    encrypt        = true
  }
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.40.0"
    }
  }
  required_version = ">= 1.7.4"
}

provider "aws" {
  region = "us-east-1"
  default_tags {
    tags = {
      ManagedBy = "Terraform"
    }
  }
}

Let's now start building out our other app-related resources, starting with ECR (Elastic Container Registry) and the VPC (Virtual Private Cloud), with the latter comprising several resources in a single file.

Docker containers provide a lightweight, portable, and consistent environment, ensuring that apps can run uniformly across development, testing, and production environments. This helps eliminate the “it works on my machine” problem by encapsulating all dependencies, making environments identical across different stages of the software lifecycle, while also improving scalability and resource efficiency.

ECR is a fully managed Docker container registry service, and it allows you to store, manage, and deploy Docker container images securely. With ECR, developers can push, pull, and manage container images, making it an essential service for working with AWS services like ECS.

ecr.tf


resource "aws_ecr_repository" "main" {
  name                 = var.APP_NAME
  image_tag_mutability = "MUTABLE"
  image_scanning_configuration {
    scan_on_push = true
  }
}
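
Optionally, you can also expose the repository URL as a Terraform output so it's easy to grab when you build and push your image later. This output isn't part of the original configuration, just a convenience, and the output name is my own choice:


output "ecr_repo_url" {
  # handy for passing the repo to the docker push script later
  value = aws_ecr_repository.main.repository_url
}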

As you can see below, we have five distinct resources as part of our VPC configuration; below the code block, we will dive into each one of them and understand how they all work together.

vpc.tf


resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_support   = true
  enable_dns_hostnames = true
}

resource "aws_subnet" "public" {
  count                   = 3
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.${count.index}.0/24"
  map_public_ip_on_launch = true
  availability_zone       = data.aws_availability_zones.available.names[count.index]
  # data.aws_availability_zones.available.names[count.index]
  # dynamically looks up the availability zones in the region
  # you are deploying to, which makes this configuration more
  # flexible and adaptable across regions. Most regions have at
  # least 3 availability zones; if the region you pick has fewer
  # AZs than the subnet count, Terraform will error when it tries
  # to index past the end of the list.
}

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
}

resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }
}

resource "aws_route_table_association" "public" {
  count          = 3
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

VPC Configuration Overview

This Terraform configuration above creates a highly available infrastructure on AWS. Below is a breakdown of each resource and how they interact to form a scalable and resilient network environment.

VPC

The VPC in this configuration defines an isolated network environment on AWS. It is created with a large CIDR block of 10.0.0.0/16, allowing for multiple subnets to exist within it. DNS support and hostnames are enabled to ensure seamless communication between resources inside the VPC. A VPC provides full control over your networking, including subnets, routing, and traffic management, making it essential when deploying apps on Amazon ECS.

In this case, the VPC allows containers running in ECS to securely communicate within the network, while the load balancer distributes traffic efficiently across multiple subnets, enhancing both reliability and performance. By spreading subnets (public ones in this guide; private subnets are an option if you need them) across a region's AZs (Availability Zones), the VPC ensures that your application has internet access when needed while maintaining high availability. This integration of the VPC with ECS enables scalable, secure, and resilient containerized apps.

Subnets

A subnet within a VPC is a range of IP addresses that divides the VPC into smaller, isolated networks. It allows resources, such as the EC2 instances or Fargate tasks that power ECS under the hood, to be placed in different parts of the VPC with distinct network configurations. Subnets can be either public (accessible from the internet) or private (isolated from the internet), depending on whether they are associated with an Internet Gateway.

Three public subnets are defined, distributed across three different availability zones for high availability. The CIDR blocks for these subnets follow the pattern 10.0.x.0/24, where x is the subnet's count index, ensuring each subnet has a unique range of IP addresses. These subnets automatically assign public IP addresses to instances launched within them, making them accessible from the internet.

Internet Gateway

An Internet Gateway is created to allow communication between the VPC and the internet. This resource enables instances within the public subnets to send and receive traffic from the internet, providing external access to services and resources.

Route Table

A route table is set up for directing traffic. The table includes a route that sends all outgoing traffic (destination 0.0.0.0/0) to the internet gateway. This ensures that instances in the public subnets can access the internet.

Route Table Associations

The route table is associated with each of the three subnets, ensuring that all subnets direct their traffic through the internet gateway. This setup is essential for maintaining internet connectivity in each availability zone.

By distributing the subnets across different availability zones, this architecture ensures high availability. If one availability zone goes down, the other subnets in different zones remain operational, making this setup resilient to failure and providing continuous service.

Now that we have our network configured to maintain high availability for our app running on ECS, we can proceed to build out our ALB (Application Load Balancer) along with an SSL (Secure Sockets Layer) certificate requested from ACM (AWS Certificate Manager); the certificate ensures that our app serves traffic only over HTTPS, for secure transmission between client and server. However, in order to request the certificate we need to own a domain first. You can use Route 53 to buy a domain of your choosing; I bought tldrlw.com (and I did this through the management console), upon which I was provided with a Route 53 hosted zone. We will need this hosted zone moving forward, so make sure you buy yourself a domain before proceeding.

A Terraform data source is a mechanism that allows Terraform to query external data or resources that are managed outside of the current Terraform configuration. Instead of creating resources, data sources are used to retrieve information about existing infrastructure—such as AWS AMIs, VPCs, or subnets—so that this data can be used in the Terraform configuration. This allows you to incorporate existing resources into your infrastructure without duplicating them or managing them directly with Terraform.

sources.tf


data "aws_route53_zone" "main" {
  # change the name to whatever domain 
  # you bought from Route 53
  name         = "<your domain goes here, e.g., helloworld.com>"
  private_zone = false
}

data "aws_availability_zones" "available" {
  state = "available"
}

As the definition above explains, we need to pull information about the hosted zone that was created when you registered your new domain in Route 53. We will need to pass in the zone ID when creating a Route 53 A record, and also when creating the Route 53 records involved in requesting a certificate from ACM. Let's create the Route 53 and ACM resources and then understand how it all works.

route53-acm.tf


resource "aws_acm_certificate" "main" {
  domain_name       = var.HOSTNAME
  validation_method = "DNS"
  key_algorithm     = "RSA_2048"
  tags = {
    Name = var.APP_NAME
  }
}

resource "aws_route53_record" "main_cert_validation" {
  for_each = {
    for dvo in aws_acm_certificate.main.domain_validation_options : dvo.domain_name => {
      name  = dvo.resource_record_name
      type  = dvo.resource_record_type
      value = dvo.resource_record_value
    }
  }
  allow_overwrite = true
  name            = each.value.name
  records         = [each.value.value]
  ttl             = 60
  type            = each.value.type
  zone_id         = data.aws_route53_zone.main.zone_id
}

resource "aws_acm_certificate_validation" "main" {
  certificate_arn         = aws_acm_certificate.main.arn
  validation_record_fqdns = [for record in aws_route53_record.main_cert_validation : record.fqdn]
}

resource "aws_route53_record" "main" {
  zone_id = data.aws_route53_zone.main.zone_id
  # set the name to a subdomain if you want one,
  # otherwise set it to "" to use your root
  # domain; for my use case I wanted a subdomain
  name = "blog"
  # will show up in the Route 53 management console
  # as 'blog.tldrlw.com'
  type = "A"
  alias {
    name                   = module.main.alb_dns_name
    zone_id                = module.main.alb_zone_id
    evaluate_target_health = true
  }
}

ACM & Route 53 Configuration Overview

This Terraform configuration above sets up an SSL certificate for a domain using ACM and integrates it with Route 53 for DNS validation. Below is a detailed breakdown of how each resource works together to ensure secure communication for your app.

What is AWS Route 53?

Route 53 is AWS's scalable and highly available Domain Name System (DNS) web service. It is used to route users to your app by translating human-readable domain names (like example.com) into machine-readable IP addresses. Route 53 can also register domain names, manage DNS records, and monitor health checks, ensuring traffic is directed to healthy endpoints.

What is ACM (AWS Certificate Manager)?

ACM is a service that lets you easily provision, manage, and deploy SSL/TLS (Transport Layer Security) certificates. These certificates are used to secure network communications and establish the identity of websites over the internet. ACM handles the complexities of certificate management, including renewal, allowing you to focus on running your app while ensuring it is secured with industry-standard encryption.

ACM Certificate

ACM is used to request an SSL certificate for the domain. In this configuration, a certificate is requested for the domain defined by var.HOSTNAME. The validation method is set to "DNS", which means the domain's ownership will be verified via DNS records. The RSA_2048 algorithm is used for the certificate's key, providing strong encryption.

ACM automatically manages certificate renewals, ensuring continuous security for your domain. By using DNS validation, there's no need to manually approve the certificate; the validation process is handled seamlessly through your Route 53 records.

Route 53 DNS Validation Record

This resource sets up DNS validation for the SSL certificate. For each domain validation option that ACM provides, a Route 53 record is created. The Terraform for_each loop dynamically generates these records based on ACM's domain validation options, including the name, type, and value of the DNS record required to prove ownership of the domain.

These records are critical for enabling ACM to verify ownership of your domain without manual intervention. By automatically creating these records, you reduce the risk of delays or errors during the validation process. The short TTL ensures that any updates to these records propagate quickly.

ACM Certificate Validation

After setting up the DNS records, this resource completes the validation process by referencing the certificate's ARN and the fully qualified domain names (FQDNs) of the Route 53 records. This step ensures that the SSL certificate becomes active once AWS verifies the domain ownership through the DNS validation records.

Once validated, the certificate can be used to secure traffic to your domain using HTTPS. This automation reduces the need for manual steps and ensures that the certificate becomes active as soon as validation is complete.

Route 53 A Record for Domain

The final Route 53 record points the domain name (e.g., blog.tldrlw.com) to the ALB. The alias block within the record defines an alias that links the domain to the ALB's DNS name and zone ID. The evaluate_target_health parameter ensures that the DNS will only resolve if the ALB is healthy, increasing reliability for end-users.

By using an alias record, AWS can dynamically manage the underlying IP addresses of the ALB, ensuring that the domain always points to the correct location without needing manual updates. This also allows for automatic scaling and resilience without impacting DNS resolution.

This setup seamlessly integrates ACM and Route 53 to manage SSL certificates and DNS records, ensuring your domain is validated and secured for HTTPS traffic. Automating the entire process reduces errors and ensures continuous availability and security.

By now you would've noticed things like var.APP_NAME and var.HOSTNAME; these are Terraform variables we will need to define in a variables.tf file. Using variables ensures consistency and maintainability when building infrastructure; furthermore, should you need to make any changes, you only need to make them in one place. Let's define our variables as shown below.

variables.tf


variable "APP_NAME" {
  type    = string
  default = "blog-tldrlw"
}

output "TF_VAR_APP_NAME" {
  value = var.APP_NAME
}

variable "IMAGE_TAG" {
  type    = string
  # after running the bash script that will build and push your
  # Docker image to ECR, you can update the default value here
  # and run terraform plan and terraform apply --auto-approve
  default = "latest"
}

output "TF_VAR_IMAGE_TAG" {
  value = var.IMAGE_TAG
}

variable "HOSTNAME" {
  type = string
  # change this to what you want your deployed
  # domain to be, e.g., mywebsite.com, my.website.com, etc.
  # if using a subdomain like my.website.com,
  # be sure to check the "name" property in the
  # resource aws_route53_record.main we created above
  default = "blog.tldrlw.com"
}

output "TF_VAR_HOSTNAME" {
  value = var.HOSTNAME
}

At this point, for the infrastructure part of things, all we have left are the ECS and ALB resources. For these resources, we will be using Terraform modules. A Terraform module is a reusable collection of Terraform resources that are organized and packaged to perform a specific task, such as provisioning infrastructure components. It helps simplify infrastructure management by allowing you to define common patterns and reuse them across different configurations, making your code more maintainable and scalable. Since I'm building multiple Next.js apps, I've tried to modularize as much Terraform configuration as possible, to keep app-specific repositories "DRY" (Don't Repeat Yourself). The modules we'll be using can be found here.

From an educational standpoint, my reliance on modules is not ideal, since you won't be able to see the underlying infrastructure resources that go into provisioning an ECS service + task and the ALB. I will have another blog post soon explaining the components of these two modules and how they work together.

alb.tf


module "main" {
  source               = "git::https://github.com/tldrlw/terraform-modules.git//app-load-balancer"
  vpc_id               = aws_vpc.main.id
  subnet_ids           = aws_subnet.public[*].id
  alb_name             = var.APP_NAME
  target_group_and_listener_config = [
    {
      name              = var.APP_NAME
      domain            = var.HOSTNAME
      health_check_path = "/"
    }
  ]
  certificate_arn      = aws_acm_certificate_validation.main.certificate_arn
  # change if you don't want your app to be entirely public
  security_group_cidrs = ["0.0.0.0/0"]
}

As you can see above, the CIDR range is set to "0.0.0.0/0" because we are building a public app. If your app only needs to serve traffic from a confined space like an office network, you can change this value to the appropriate CIDR range.

Now that the ALB is created, we can move on to ECS. As mentioned above, we will rely on my module to create the service, but the cluster itself will be managed directly as a resource in your configuration.

ecs.tf


resource "aws_ecs_cluster" "main" {
  name = "main"
}

module "ecs_service" {
  source                      = "git::https://github.com/tldrlw/terraform-modules.git//ecs-service?ref=dev"
  app_name                    = var.APP_NAME
  ecr_repo_url                = aws_ecr_repository.main.repository_url
  image_tag                   = var.IMAGE_TAG
  ecs_cluster_id              = aws_ecs_cluster.main.id
  task_count                  = 1
  alb_target_group_arn        = module.main.alb_target_group_arns[0]
  source_security_group_id    = module.main.alb_security_group_id
  # change if you don't want your app to be entirely public
  security_group_egress_cidrs = ["0.0.0.0/0"]
  subnets                     = aws_subnet.public[*].id
  vpc_id                      = aws_vpc.main.id
  container_port              = 3000
  host_port                   = 3000
  # linux_arm64          = true
  # ^ set to true if using the following scripts to build and push images to ECR on M-series Macs:
  # https://github.com/tldrlw/blog-tldrlw/blob/boilerplate-nextjs/front-end/docker-push.sh
}

If you're running the provided bash script to build and push your Next.js Docker image to ECR on an M-series Mac, you will need to set linux_arm64 = true.

Note the lines alb_target_group_arn = module.main.alb_target_group_arns[0] and source_security_group_id = module.main.alb_security_group_id above; this module is consuming configuration data that is output by the ALB module reference. This is something you can do in Terraform when one module depends on information produced by resources created in another module. Having set up our ALB and ECS resources, let's try to better understand what they do (quick refreshers on some things) and how they're connected.
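
To illustrate the pattern (this is only a sketch; the actual tldrlw/terraform-modules source may differ, and the resource names below are assumptions), the ALB module would declare outputs along these lines, which then become available on the module reference as module.main.<output name>:


# inside the ALB module, e.g., in an outputs.tf file;
# resource names are illustrative, not the module's actual code
output "alb_security_group_id" {
  description = "Security group attached to the ALB, consumed by downstream services"
  value       = aws_security_group.alb.id
}

output "alb_target_group_arns" {
  description = "ARNs of the ALB target groups, one per target group/listener entry"
  value       = aws_lb_target_group.main[*].arn
}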

ALB & ECS Service Configuration Overview

This Terraform configuration sets up an ALB and ECS Service to host and scale the containerized app. Below is a detailed breakdown of how each resource works together to ensure efficient traffic distribution and container management for your Next.js app.

What is an ALB?

An ALB is a key component in AWS that distributes incoming traffic across multiple targets, such as EC2 instances or containers, in different availability zones. It operates at the application layer (Layer 7 of the OSI model) and supports advanced routing features, making it ideal for microservices architectures and container-based deployments. ALBs can automatically distribute traffic to healthy targets, ensuring high availability and better performance for your application.

What is ECS?

ECS is a fully managed container orchestration service that allows you to run and scale containerized apps easily. With ECS, you can deploy containers on a cluster of EC2 instances or use AWS Fargate (what we're using) to run containers without managing underlying infrastructure. ECS integrates seamlessly with other AWS services like ALB and ECR, enabling efficient scaling, secure networking, and management of container-based workloads.

ALB Setup

In this configuration, the ALB is provisioned using a Terraform module that handles its setup. The ALB is created within the VPC specified by vpc_id, and it is associated with the public subnets (subnet_ids) to make the application accessible from the internet. The ALB listens for traffic on a specific port, and the SSL certificate (managed by ACM) ensures that traffic is encrypted via HTTPS.

Additionally, a target group is created that registers ECS tasks running in the cluster as targets. This target group allows the ALB to distribute incoming requests across the tasks, ensuring load balancing and high availability.

ECS Cluster Setup

An ECS cluster is the fundamental resource where your containers run. In this configuration, the cluster named "main" is created, and all services and tasks will be deployed into it. Here, the ECS service interacts with the ALB through the target group, enabling efficient distribution of network traffic to the containerized app.

ECS Service

The ECS service is responsible for managing the deployment and scaling of your containers in the ECS cluster. In this configuration, the service pulls container images from the ECR repository, identified by the ecr_repo_url, and deploys them using the image tag specified in var.IMAGE_TAG.

The service launches containers that listen on port 3000, and traffic is routed to these containers via the ALB. By specifying security group rules and subnet configurations, the service ensures that the containers are securely accessible while remaining scalable to meet demand. The ECS service can dynamically adjust the number of running tasks based on the desired task count and traffic load. For our purposes, we have a static value of 1 for task_count, but feel free to change this if you'd like, and monitor task deployments across AZs to guarantee high availability.

By integrating the ALB with the ECS service, this setup ensures that your containerized app is securely deployed, automatically load balanced, and scalable, while using Terraform to automate the entire provisioning process.

At this point we are almost done. We have to validate our code, which we can do by running terraform validate, and if the checks pass we can run terraform plan. Reading through everything that is output in the plan is imperative, as it will give you an understanding of what you're about to provision into your AWS account. You could also see errors here (as well as when you run terraform validate), and in such an event, you'll have to make changes to your code. If the plan looks good, you can run terraform apply --auto-approve, and voilà, you have deployed your new ECS infrastructure.

You can take some time and look through the different resources you created in the management console, but pay close attention to what you see in ECS: you should see your ECS task deployment failing. Why is that? It's because our variable IMAGE_TAG has a default value of "latest", but we have yet to build and push our Next.js app's image to ECR. In the ECS events and logs you'll see error messages saying that the image with tag "latest" can't be found, which makes sense because nothing exists in ECR at this point.

As mentioned earlier, you can clone this repo and run docker-push.sh from the front-end directory, passing in your ECR repo name and the region (e.g., us-east-1). Once the Docker image is built and pushed to ECR, you'll get a six-digit image tag. In the variables.tf file, change IMAGE_TAG so its default value is that image tag, then run Terraform again. Your ECS task definition will update with the newly provided image tag, and your app should be running after a couple of minutes, reachable at whatever you set for aws_route53_record.main.
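
For example, after docker-push.sh prints the new tag, your IMAGE_TAG variable in variables.tf would end up looking something like this (the default value below is just a placeholder for whatever tag the script gives you):


variable "IMAGE_TAG" {
  type = string
  # replace with the tag printed by docker-push.sh
  default = "<your new image tag>"
}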

In an upcoming blog post, I'll share how to use GitHub Actions workflows to implement a CI/CD system, eliminating the need for manual Docker image builds and Terraform executions.