Custom EC2 hostnames and DNS entries

I’ve been doing some work with EC2 recently. I wanted to be able to bring up a server using Ansible, pre-configured with a hostname and valid, working FQDN.

There are a few complexities to this. Unless you’re using Elastic IP addresses, EC2 instances get a new public IP address when they’re stopped and started, so I needed to ensure that the DNS entry for the FQDN updates whenever the host’s address changes.

A common way of doing this is to bake out an AMI, pre-configured with a script that runs on boot to talk to the DNS server and create or update the entry. But you still need a way of passing the desired hostname in when you launch the instance for the first time, and you end up with your security keys baked into an AMI, making them difficult to rotate. And custom AMIs are fiddly – I’d prefer to use the official ones from Ubuntu or Amazon so I don’t have to bake out a new AMI on every OS point release.

I ended up with an approach that uses a combination of cloud-init, an IAM instance role, and Route 53: cloud-init sets the hostname and writes a boot script that grabs temporary credentials and sets the DNS entry.

EC2 supports a thing called IAM instance roles, which lets an EC2 instance grab temporary credentials for a role and use them to access AWS resources without hardcoding access tokens. It does this by fetching the credentials from the instance metadata service – an internal HTTP endpoint – but if you use awscli or the other official SDKs, they’ll do this for you automatically unless you provide credentials explicitly.
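You can see the mechanism directly by querying the metadata service from an instance. A minimal sketch – the role name here is the one created below, and the endpoint paths are the standard EC2 metadata ones:

```shell
#!/bin/sh
# The instance metadata service lives at a fixed link-local address.
META="http://169.254.169.254/latest/meta-data"
ROLE_NAME="set-internal-dns"
CREDS_URL="${META}/iam/security-credentials/${ROLE_NAME}"

# On an instance with the role attached, fetching this URL returns JSON
# containing AccessKeyId, SecretAccessKey, Token and an Expiration time:
#   curl -s "$CREDS_URL"
echo "$CREDS_URL"
```

The SDKs poll this endpoint and refresh the credentials before they expire, which is why the boot script below never needs keys of its own.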

In this case, we grant just enough permission to be able to update a specific zone on Route 53. I chose to put all my server DNS entries in their own zone to isolate them, but you don’t have to do that. I made a role called ‘set-internal-dns’ and gave it a policy document like this:

{
  "Statement":[
    {
      "Action":[
        "route53:ChangeResourceRecordSets",
        "route53:GetHostedZone",
        "route53:ListResourceRecordSets"
      ],
      "Effect":"Allow",
      "Resource":[
        "arn:aws:route53:::hostedzone/<ZONE_ID_HERE>"
      ]
    }
  ]
}
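For the instance to assume the role at all, the role also needs a trust policy that allows the EC2 service to assume it. If you create the role through the IAM console’s EC2 role type this is set up for you; it looks like this:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```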

Next, I wrote an Ansible task to boot a machine set to that role, with a user-data string containing cloud-init config.

- name: Launch instance
  ec2:
    keypair: "{{ keypair }}"
    region: "{{ region }}"
    zone: "{{ az }}"
    image: "{{ image }}"
    instance_type: "m3.medium"
    vpc_subnet_id: "{{ vpc_subnet_id }}"
    assign_public_ip: true
    group: ['ssh_external']
    exact_count: 1
    count_tag:
      Name: "{{ item.hostname }}"
    instance_tags:
      Name: "{{ item.hostname }}"
      role: "{{ item.role }}"
      environment: "{{ item.environment }}"
    volumes:
      - device_name: /dev/sda1
        volume_size: 30
        device_type: gp2
        delete_on_termination: true
    wait: true
    instance_profile_name: set-internal-dns
    user_data: "{{ lookup('template', 'templates/user_data_route53_dns.yml.j2') }}"
  with_items:
    - hostname: "computer1"
      fqdn: "computer1.{{ domain }}"
      role: "computation"
      environment: "production"

Ansible expects the user_data property to be a string, so we render the template to a string with the lookup plugin.

cloud-init has the lowest documentation quality to software usefulness ratio I think I’ve ever seen. In combination with EC2 (and presumably other cloud services?), it allows you to pass in configuration settings, packages to install, files to upload and much more, all through a handy YAML file. But all the useful documentation about the supported settings is completely hidden away or just placeholder text, except for a huge example config.

Our user_data_route53_dns.yml.j2 template file is below. If you’re not using Ansible, the bits in the curly brackets are templated variables being set by the task above.

#cloud-config

# Set the hostname and FQDN
hostname: "{{ item.hostname }}"
fqdn: "{{ item.fqdn }}"
# Set our hostname in /etc/hosts too
manage_etc_hosts: true

# Our script below depends on this:
packages:
  - awscli

# Write a script that executes on every boot and sets a DNS entry pointing to
# this instance. This requires the instance having an appropriate IAM role set,
# so it has permission to perform the changes to Route53.
write_files:
  - content: |
      #!/bin/sh
      FQDN=$(hostname -f)
      ZONE_ID="{{ zone_id }}"
      TTL=300
      SELF_META_URL="http://169.254.169.254/latest/meta-data"
      PUBLIC_DNS=$(curl ${SELF_META_URL}/public-hostname 2>/dev/null)

      cat << EOT > /tmp/aws_r53_batch.json
      {
        "Comment": "Assign AWS Public DNS as a CNAME of hostname",
        "Changes": [
          {
            "Action": "UPSERT",
            "ResourceRecordSet": {
              "Name": "${FQDN}.",
              "Type": "CNAME",
              "TTL": ${TTL},
              "ResourceRecords": [
                {
                  "Value": "${PUBLIC_DNS}"
                }
              ]
            }
          }
        ]
      }
      EOT

      aws route53 change-resource-record-sets --hosted-zone-id ${ZONE_ID} --change-batch file:///tmp/aws_r53_batch.json
      rm -f /tmp/aws_r53_batch.json
    path: /var/lib/cloud/scripts/per-boot/set_route53_dns.sh
    permissions: '0755'

We’re installing our script into cloud-init’s per-boot scripts rather than anywhere else because I know cloud-init will run it on first boot, after it has been installed. If we put it in rc.d, for example, we’d still have to tell cloud-init to go and run it on first boot, so this is just one less thing to mess up. I’m already feeling pretty bad about writing JSON in a shell script in a YAML file.

When you boot the instance you should be able to tail /var/log/cloud-init-output.log and see confirmation from the aws CLI that the DNS change is pending. The record can take 10–60 seconds to become available.
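If you want to check beyond the log, you can ask Route 53 for the record directly, or just resolve it. A sketch – the zone ID and FQDN here are placeholders for your own values:

```shell
#!/bin/sh
# Hypothetical values - substitute your own zone ID and hostname.
ZONE_ID="ZEXAMPLE123"
FQDN="computer1.example.com"

# List the record set the boot script upserted (needs AWS credentials):
#   aws route53 list-resource-record-sets --hosted-zone-id "$ZONE_ID" \
#     --query "ResourceRecordSets[?Name=='${FQDN}.']"

# Or resolve the CNAME once the change has propagated:
#   dig +short CNAME "$FQDN"
echo "${FQDN}."
```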

We’re using a CNAME to the EC2 public DNS entry because I still want to use split horizon DNS – if you look that entry up from inside your EC2/VPC network you’ll get the internal IP address.
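The split-horizon behaviour is easy to see with dig: the same EC2 public DNS name answers differently depending on where you ask from. A sketch with a made-up instance hostname:

```shell
#!/bin/sh
# Hypothetical EC2 public DNS name for illustration.
PUBLIC_DNS="ec2-203-0-113-10.eu-west-1.compute.amazonaws.com"

# From outside AWS this resolves to the public IP (203.0.113.10 here);
# from inside the same region's EC2/VPC network, the same query returns
# the instance's private address instead:
#   dig +short "$PUBLIC_DNS"
echo "$PUBLIC_DNS"
```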

Computers.