11 March 20
Killing off ‘Click-Ops’: what it is, why it’s problematic, and how to avoid it.
‘Click-Ops’. It sounds like an elite, highly specialised squadron from a sci-fi movie. The reality couldn’t be further from the truth. Click-Ops is slow, error-prone, and more than likely costing you money. So, let’s eliminate it.
Before we continue, let’s stick with the sci-fi theme for a minute. (Unless you’re familiar with the basic concept of on-demand cloud computing; in that case, let’s get straight to the nitty-gritty.)
A long time ago in a galaxy far, far away… you’d typically use a dedicated server to host your website(s). This is a physical object: a small box in a warehouse somewhere.
Then, we collectively moved to virtualization, or ‘cloud servers’.
Cloud servers are software constructs which allow hosting providers to quickly scale the resources you need by running software. There’s still a small physical box somewhere, but you’ll never need to worry about it. With cloud servers, hosting providers pool physical resources together to sell ‘cloud capacity’, which can be purchased without incurring the cost of configuring the physical box. This is referred to as virtualization.
It’s all about virtualization in the cloud. (via giphy.com)
With virtualization, you can have the capability of multiple boxes in warehouses almost immediately, matched to your evolving needs, without the time and cost of physically connecting and configuring those multiple physical devices. Or, in accounting terms, we’re outsourcing the CAPEX spend associated with hardware procurement, and replacing it with an OPEX spend.
In the wake of virtualization, ‘Infrastructure as a service’ (or IaaS) has become the standard for automating this process. Heavy hitters like Amazon (AWS), Microsoft (Azure), and Google (Google Cloud Platform) all compete to provide platforms to help people, companies, and even governments manage their requirements for automated resourcing.
So how does it work, practically?
For example, let’s say you run a bricks-and-mortar retail store. You could configure your store’s website to automatically align resource capacity with your business hours; scaling up hosting capacity to accommodate high traffic from shoppers during the day, then scaling back to the bare minimum overnight—while most of your customers are asleep—to save money.
Pro tip for anyone looking to recreate the example above: in AWS, you’d use an ‘Auto Scaling group’ with scheduled actions (for time-based changes like this one) or scaling policies (for demand-based changes) to achieve this result.
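For the curious, here is a rough sketch of what that business-hours schedule could look like as code, using the Terraform tool introduced later in this article. It assumes an Auto Scaling group already exists; the variable name, action names, capacities, and opening hours are all made up for illustration, so treat it as a starting point rather than a finished setup.

```hcl
# Hypothetical input: the name of an Auto Scaling group you already run.
variable "asg_name" {
  type = string
}

# Scale up shortly before the store opens.
# Cron expressions are evaluated in UTC unless a time zone is set.
resource "aws_autoscaling_schedule" "open_for_business" {
  scheduled_action_name  = "scale-up-for-opening"
  autoscaling_group_name = var.asg_name
  min_size               = 2
  max_size               = 6
  desired_capacity       = 4
  recurrence             = "0 9 * * *"
}

# Scale back to the bare minimum once the store closes.
resource "aws_autoscaling_schedule" "closed_overnight" {
  scheduled_action_name  = "scale-down-after-closing"
  autoscaling_group_name = var.asg_name
  min_size               = 1
  max_size               = 1
  desired_capacity       = 1
  recurrence             = "0 18 * * *"
}
```

The two schedules mirror the retail example: more capacity during the day, the bare minimum overnight.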
That concludes your crash course in on-demand cloud computing. Now then, what is Click-Ops?
While most on-demand cloud platforms are fast, cost-efficient, and scalable, there is a downside.
The basic, default website interface for hosting providers is a series of screens with clickable options to manually select computing resources. Or, if you picked up on the bold text, ‘Click-Ops’. (Side note: for those more familiar with development and systems terminology, we’re aware that ‘Click-Ops’ is also a tongue-in-cheek subversion of ‘DevOps’.)
Therefore:
‘Click-Ops’
Abstract noun
‘The error-prone and time-consuming process of having people click through various menu options in cloud providers’ websites to select and configure the correct automated computing infrastructure.’
Click after click after click after click after click…
While click-through menus are serviceable as a starting point, they go against the ideas of recording documented, versioned, auditable states that we hold as best practice in all other areas of development. Basically, we can—and should—do better.
So, now we’re finally well-versed in the idea of Click-Ops, let’s banish it for eternity.
Learning about a concept only to eradicate it immediately seems wasteful, but no more wasteful than the actual practice of Click-Ops itself. Remember, it’s likely costing you significant time and money. So, without further ado, let’s…
Terraform, created by HashiCorp, is a tool used to automate the process of creating infrastructure with cloud providers.
Basically, you write some code to express a desired state or hosting intention, including the ability to detect failures or availability issues and automatically launch corrective measures.
Terraform will take the code as input, and immediately create the actual infrastructure resources required on different cloud providers. The cloud providers then scale, adapt or configure those resources based on the parameters established by Terraform.
If your hosting environment is a party, think of Terraform as your party planner and cloud providers as your bartenders and wait staff.
Let’s say you’re expecting 50 guests at a cocktail event, but they’re all scheduled to arrive at different times. Without a plan, you’d have to serve each new guest as they arrive, seriously hampering your ability to be the host with the most. This scenario is analogous to Click-Ops; you’re manually adjusting resources—mixing cocktails, getting ice cubes, briefing wait staff, managing logistics—based on continually evolving changes to demand.
Or, you could provide your party planner (Terraform) with detailed and specific instructions to execute on your behalf: put this much food and drink here initially, then bring out the extra canapés and cocktails when a certain number of guests are in the room, as triggered by attendee names being crossed off a guest list on arrival. Your party planner will then discreetly handle the entire process of briefing and managing the bartenders and wait staff (your cloud hosting provider) to serve your guests (configure relevant infrastructure to appropriately service your digital traffic).
Terraform’s shaking things up in virtualization. (via giphy.com)
For example, let’s say you want two web servers. Write some Terraform code, run it, and BAM! You now have two web servers, fully spun up in AWS, with all the (usually) manual configuration handled immediately: things like IP addresses, security groups, subnet definitions, and more.
Let’s say you want two hundred web servers. No problem: change the ‘count’ parameter in your existing Terraform code from ‘2’ to ‘200’, and BAM! You’re now likely broke, but you also have 200 web servers ready for action.
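As a minimal sketch of that idea (not a production setup), the snippet below launches a configurable number of identical instances into your account’s default network. The variable name `web_server_count` and the `web-N` name tags are assumptions made for illustration; the region, AMI filter, and instance type are borrowed from the full example later in this article.

```hcl
provider "aws" {
  region = "ap-northeast-2"
}

# Hypothetical variable: how many web servers we want.
variable "web_server_count" {
  type    = number
  default = 2
}

# Look up a recent Amazon Linux image rather than hard-coding an AMI ID.
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn-ami-hvm-*-x86_64-gp2"]
  }
}

# One resource block, repeated `count` times.
resource "aws_instance" "web" {
  count         = var.web_server_count
  ami           = data.aws_ami.amazon_linux.id
  instance_type = "t3.nano"

  tags = {
    # count.index starts at 0, so this names them web-1, web-2, ...
    Name = "web-${count.index + 1}"
  }
}
```

Changing the default from 2 to 200 (or passing a different value in) is the only edit needed to scale; Terraform works out which instances to add or remove.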
This approach, facilitated by Terraform, is commonly referred to as ‘infrastructure-as-code’.
Unfathomable power in the palm of your hand, all thanks to infrastructure-as-code.
But now I’m writing code instead of managing click-through menus. Isn’t that ultimately the same tradeoff?
Not exactly. The infrastructure-as-code approach facilitated by Terraform is far more beneficial than the Click-Ops approach. Here are a few reasons why:
- Create auditable trails for changes to infrastructure: When writing code, any developer worth their salt will use version control to track any changes made to a code base. Likewise, they (or a DevOps team or specialist) will automate the deployment of said code with an audit trail that documents who deployed the change, and when. With Click-Ops, there is no such trail for changes to infrastructure. With infrastructure-as-code, this trail is always available and continuously updating.
- Increase transparency and visibility of infrastructure being used, and for what purpose: The code-based approach removes your reliance on a developer operations ‘oracle’: the one person who holds all the knowledge about your infrastructure. If Kevin’s your go-to guy, and Kevin doesn’t come into work for a week, a month, or a year, no problems. Except maybe for Kevin’s wellbeing and whereabouts.
- Improve ability to identify and control ‘state drift’: The code-based approach provides quick, clear comparison of ‘what’s supposed to be’ with ‘what actually is’. For example, if someone accidentally updates your infrastructure environment, it’s a simple process to roll back to a previous state.
- Increase efficiencies and free up time for other tasks: Remove the error-prone and time-consuming process of having humans click through various menu options in cloud providers’ websites to select and configure the infrastructure items they need. Changes are automated, dependencies are calculated, and errors in configuration can be highlighted.
- Scale resources, immediately and powerfully: The difference between two and 200 virtual web servers may literally be a single count parameter to a piece of code within a Terraform module. Without infrastructure-as-code, countless extra clicks would likely be incurred for a human operator navigating a web menu to initiate a change of this scale.
- Leverage the insight and ability of world-leading organisations: The Terraform Module Registry (see more below) provides pre-written code, thoroughly tested and vetted for best practice by the people who run the infrastructure. This isn’t a frivolous toy; it’s used by the biggest and brightest organisations on the planet.
Sold? Then let’s get started with some basic infrastructure-as-code tutorials.
Does the idea of writing infrastructure code sound boring? Or beyond your skill set? There’s no need to worry, nor reinvent the wheel.
Terraform offers a module registry. It’s essentially an open-source library of pre-written infrastructure code, thoroughly tested and vetted for best practice. You can take pre-existing source code from the registry and lightly customise it for your own needs.
The module registry is designed to help you:
- Get started with Terraform quickly.
- See examples of how to write code for Terraform.
- Find pre-made modules for infrastructure components you may need.
In this example, I’ll make use of the module registry to set up some basic AWS resources.
If you’re using a cloud provider other than Amazon, never fear; the registry has examples for all the major providers, and the process is completely replicable regardless of your chosen provider.
First, having installed Terraform, make a new working directory, and create a file inside called main.tf with the following contents:
##################################################################
provider "aws" {
  region = "ap-northeast-2"
}

##################################################################
# Data sources to get default VPC, subnet, and AMI details.
##################################################################
data "aws_vpc" "default" {
  default = true
}

data "aws_subnet_ids" "all" {
  vpc_id = data.aws_vpc.default.id
}

data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name = "name"
    values = [
      "amzn-ami-hvm-*-x86_64-gp2",
    ]
  }

  filter {
    name = "owner-alias"
    values = [
      "amazon",
    ]
  }
}
##################################################################
# Here, we are creating a security group using the official module
##################################################################
module "security_group" {
  source  = "terraform-aws-modules/security-group/aws"
  version = "~> 3.0"

  name        = "example"
  description = "Security group for example usage with EC2 instance"
  vpc_id      = data.aws_vpc.default.id

  ingress_cidr_blocks = ["0.0.0.0/0"]
  ingress_rules       = ["http-80-tcp", "all-icmp"]
  egress_rules        = ["all-all"]
}

##################################################################
# Here, we are specifying some initial code to run on first boot.
##################################################################
locals {
  user_data = <<EOF
#!/bin/bash
yum install -y nginx
echo "<h1>Hello from August! :)</h1>" > /usr/share/nginx/html/index.html
service nginx start
EOF
}
##################################################################
# Here, we create the actual instance using the registry module.
##################################################################
module "ec2" {
  source  = "terraform-aws-modules/ec2-instance/aws"
  version = "2.12.0"

  instance_count = 1

  name                        = "terraform-created-me"
  ami                         = data.aws_ami.amazon_linux.id
  instance_type               = "t3.nano"
  subnet_id                   = tolist(data.aws_subnet_ids.all.ids)[0]
  vpc_security_group_ids      = [module.security_group.this_security_group_id]
  associate_public_ip_address = true

  user_data_base64 = base64encode(local.user_data)

  root_block_device = [
    {
      volume_type = "gp2"
      volume_size = 10
    },
  ]

  tags = {
    "Env"      = "Test"
    "Location" = "August"
  }
}

output "public_ip" {
  value = module.ec2.public_ip
}
Example code adapted from this EC2-instance module.
Awesome. With that taken care of, we can spin up our first resources using Terraform!
First, we need to tell Terraform to initialise and pull in the dependencies and off-the-shelf modules we’ll take advantage of:
terraform init
Pulling in the off-the-shelf modules we’ll use in this example.
Secondly, we should never just blindly deploy infrastructure. Luckily, Terraform allows us to preview the actions we are about to take, so let’s have a look:
terraform plan
Previewing Terraform actions before deployment.
Note the ‘Plan: 5 to add, 0 to change, 0 to destroy.’ line above. Here, Terraform is telling us it will create five new resources if we proceed. If we’re happy with what we see in the preview, it’s go time! Let’s actually deploy our resources:
terraform apply
Deploying our infrastructure resources via Terraform.
If you want to see our amazing new web server in action now that it’s deployed, we can visit the public IP address of the instance we’ve just created. This is output at the end of the apply command.
Finally, to tear it all down and clean up after ourselves (which is a must to avoid further Amazon billing), type:
terraform destroy
Destroying our previously deployed infrastructure.
And we’ve only just scratched the surface! The transition from your first web server to an entire digital ecosystem may only involve a few Terraform modules and commands.
The real fun begins when you can piece together different modules—databases, load balancers, caching layers—to create something much greater than the sum of its parts, and really add value for projects, businesses, and organisations.
I hope this has highlighted some of the many benefits associated with treating infrastructure as code, and I hope you join me in killing Click-Ops once and for all!