Kubernetes Components with Terraform & Notes

kubernetes, cloud native, dev-ops

Kubernetes is a production-ready open-source system for running & orchestrating containers. It manages instances, self-heals, adds abstractions that make networking a breeze and allows multiple logical applications to run on the same underlying infrastructure.

Components of the System

Worker nodes can have many (hundreds of) Pods running. Each node runs three processes: a container runtime, the kubelet and kube-proxy; without them the node cannot do its job. The container runtime runs the container images within each Pod, whether they are Docker or Podman images. The kubelet interacts with the containers and with the Kubernetes control plane to start the Pods that are needed. kube-proxy is responsible for routing requests to other services without impacting network performance.

The master node is similar; four services must be running. The API server is responsible for interacting with the outside world and receiving metadata changes, including validation of requests. The scheduler is responsible for the actual placement of jobs & Pods; it interacts with the kubelet on each node, asking it to start the work. The controller manager is responsible for understanding the current state of the cluster, working out which jobs have failed or need restarting, i.e. are outside their resource definition bounds, and sending requests to the scheduler to get those resources spun back up. etcd is a key-value store which acts as the 'brain' of the cluster, providing the other services the information they need to run. It is the stateful part of the k8s cluster.

Resource Types

Below are the major components of an application running on Kubernetes. There are many more resource types k8s provides, but these are the major ones. We will be creating an example deployment using the Terraform kubernetes provider.

It's worth noting that all resource definitions in a k8s deployment have a metadata section and a spec section. The metadata is where you describe information about the resource such as its name, labels and namespace. The spec varies for each resource and it is worth reading up on the specifics in the k8s docs.
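
As a minimal illustration of the metadata section, below is a sketch of a Namespace resource, which is little more than metadata. The ConfigMap example later in this post references a kubernetes_namespace.whoami_ns resource, which is assumed to be defined something like this (note that anything referencing that ConfigMap would need to live in the same namespace).

resource "kubernetes_namespace" "whoami_ns" {
  metadata {
    # A Namespace is almost entirely metadata: a name and some labels
    name   = "whoami"
    labels = local.whoamiAppLabelMap # defined in the Deployment example below
  }
}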

A note on labels: Labels are the way k8s resources find each other. If a Deployment's selector matches a set of labels on Pods belonging to an existing Deployment, the new Deployment will interact with those labelled Pods, potentially overwriting the original Pods.

Deployment & Stateful Set

These components are for deploying replicas of images. Every Deployment consists of a collection of Pods, which in turn are collections of containers. Normally a Pod is a logical deployment of your application and has a single container, unless required service sidecars such as metric reporters and the like are present. The control plane schedules the Pods to run on individual nodes. After the initial apply the control plane manages the Pod instances and makes sure they are up to date, health-checking and self-healing when nodes go down.

Other concepts defined alongside the Pod include the container images, port exposures & update strategy. Each section in the docs for the Deployment and Stateful Set describes what can be placed in the spec and how it affects the resource.
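
As a hedged example of an update strategy, the Deployment spec in the Terraform provider accepts a strategy block along these lines; the numbers are illustrative, not a recommendation:

spec {
  # ... replicas, selector, template as in the example below ...
  strategy {
    type = "RollingUpdate"
    rolling_update {
      max_surge       = "1" # at most one extra Pod during a rollout
      max_unavailable = "0" # never drop below the desired replica count
    }
  }
}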

The differences between a Stateful Set and a Deployment are to do with how the Pods interact with volumes via volume claims; this blog post goes into much more depth, and a sketch of a Stateful Set is shown after the Deployment example below. There is also a DaemonSet object which runs a single Pod on each node in the cluster; rather than defining application availability it is used more as a cluster management tool.

locals {
  whoamiLabelVal    = "whoamiExample"
  whoamiAppLabelMap = {
    appname = local.whoamiLabelVal
  }
}

resource "kubernetes_deployment" "whoami_example" {
  metadata {
    name   = "whoami"
    labels = local.whoamiAppLabelMap
  }
  spec {
    replicas = 5
    selector {
      match_labels = local.whoamiAppLabelMap
    }
    template {
      metadata {
        labels = local.whoamiAppLabelMap
      }
      spec {
        container {
          image = "traefik/whoami"
          name  = "whoami-name"
          port {
            container_port = 80
          }
        }
      }
    }
  }
}
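
For contrast, below is a minimal sketch of the same application as a Stateful Set, assuming the cluster's default storage class can provision volumes. The volume_claim_template is what gives each Pod its own stable storage.

resource "kubernetes_stateful_set" "whoami_stateful_example" {
  metadata {
    name   = "whoami-stateful"
    labels = local.whoamiAppLabelMap
  }
  spec {
    service_name = "whoami" # headless Service the Pods are addressed through
    replicas     = 3
    selector {
      match_labels = local.whoamiAppLabelMap
    }
    template {
      metadata {
        labels = local.whoamiAppLabelMap
      }
      spec {
        container {
          image = "traefik/whoami"
          name  = "whoami-name"
          volume_mount {
            name       = "whoami-data"
            mount_path = "/data"
          }
        }
      }
    }
    # Each replica gets its own PersistentVolumeClaim from this template
    volume_claim_template {
      metadata {
        name = "whoami-data"
      }
      spec {
        access_modes = ["ReadWriteOnce"]
        resources {
          requests = {
            storage = "1Gi"
          }
        }
      }
    }
  }
}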

Service & Ingress

Services are a way to link together Deployment Pods under a single banner. Those Pods have IP addresses which are likely to change; a Service binds them under a single Service IP address which balances requests towards the underlying Pods. To know which Pods to forward requests to it leverages a selector, a collection of label pairs, and will forward to any Pods which match all the labels in the selector. A Service also has a ports section which defines which port to listen on and forward requests to. There is a related resource called Endpoints which keeps track of the Pods matching the labels; it is usually managed for you.

Services can also optionally expose multiple ports. In this event we must name the Service ports, and can route requests to another port that exists on the matching Pods, for example a metrics collector which gathers information about the main container and exposes that data in aggregate form.
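
A hedged sketch of a multi-port Service, assuming the Pods also expose a metrics endpoint on port 8082 (the metrics port and names are illustrative, not part of traefik/whoami):

resource "kubernetes_service" "whoami_multi_port" {
  metadata {
    name   = "whoami-multi"
    labels = local.whoamiAppLabelMap
  }
  spec {
    selector = local.whoamiAppLabelMap
    # With more than one port each entry must be named
    port {
      name        = "web"
      port        = 8080
      target_port = 80
    }
    port {
      name        = "metrics"
      port        = 9090
      target_port = 8082 # assumed metrics port
    }
  }
}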

There are four types of Service; the fourth, ExternalName, is not covered here. The three below are structured in layers:

  • Cluster IP: The default and base type; nothing more than the above. However, a headless Cluster IP Service is for when you want to talk to an individual Pod as a service rather than load balancing, which may be needed when talking to asymmetric replicas. A sketch of a headless Service is shown after this list.
  • Node Port: This Service builds on the Cluster IP and adds a port binding on the node itself, across the whole cluster, allowing access to each node's IP address at the port defined in the Service from outside the cluster. This is similar to exposing a docker image on the host IP, but across all nodes.
  • Load Balancer: This is a further extension of the Node Port Service which provides an externally facing load balancer on the cluster, routing to the same endpoints as the Cluster IP Service. This is similar to the Ingress but doesn't have the same feature set.
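
The headless variant mentioned above is just a Cluster IP Service with no cluster IP assigned; a minimal sketch:

resource "kubernetes_service" "whoami_headless" {
  metadata {
    name   = "whoami-headless"
    labels = local.whoamiAppLabelMap
  }
  spec {
    cluster_ip = "None" # makes the Service headless: DNS returns the individual Pod IPs
    selector   = local.whoamiAppLabelMap
    port {
      port        = 8080
      target_port = 80
    }
  }
}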

Ultimately, as you move up the Service layers you become more and more exposed to the outside world. Thankfully there is another way to create that interaction, especially if it is for public access rather than internal consumption: an Ingress is a layer that sits on the boundary between the outside world and the cluster.

resource "kubernetes_service" "whoami_service" {
  metadata {
    name   = "whoami"
    labels = kubernetes_deployment.whoami_example.metadata[0].labels
  }
  spec {
    #    type     = "LoadBalancer"
    selector = kubernetes_deployment.whoami_example.spec[0].selector[0].match_labels
    port {
      port        = 8080
      target_port = kubernetes_deployment.whoami_example.spec[0].template[0].spec[0].container[0].port[0].container_port
    }
  }
}
A note on external to internal traffic: Because k8s is inherently self-healing and IPs are constantly changing, to have a nice externally facing IP on a nice port number it's a good idea to have any domain-associated IP sit outside the cluster as a dedicated proxy. This is where an Ingress comes in, allowing domain routing rules on cluster Services, whether an internal Cluster IP Service or an external Load Balancer.

When routing traffic into a cluster for testing there is an easy way to mimic an ingress load balancer by simply port-forwarding requests to an internal Service such as the above. Running kubectl port-forward svc/whoami 80:8080 forwards all requests to localhost:80 to the Service's port, allowing connection as if it were a load balancing Service.

In a production setting you can use an Ingress: a service specifically spun up to take your config and route traffic to the correct resource Pods. As per the docs this is a relatively complicated matter & can enable things like TLS certs and path transforms, depending on your provider.
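
A minimal sketch of an Ingress routing a hostname to the Service above, assuming an ingress controller (for example ingress-nginx) is already installed in the cluster; the host and class name are placeholders:

resource "kubernetes_ingress_v1" "whoami_ingress" {
  metadata {
    name   = "whoami"
    labels = local.whoamiAppLabelMap
  }
  spec {
    ingress_class_name = "nginx" # assumed controller class
    rule {
      host = "whoami.example.com" # placeholder domain
      http {
        path {
          path      = "/"
          path_type = "Prefix"
          backend {
            service {
              name = kubernetes_service.whoami_service.metadata[0].name
              port {
                number = 8080
              }
            }
          }
        }
      }
    }
  }
}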

It was recommended to me by a friend that, unless your business is networking, you should go with a more cloud-specific approach when designing public ingress to your cluster. This means leveraging a Load Balancer type Service for each logical internal service you might spin up & then pointing the cloud's native load balancer at the Service's IP. This gives you the flexibility of not having to manage that specific networking piece of the stack, and lets the potentially more powerful managed service handle spikes in traffic, attacks and isolation.

ConfigMap & Secrets

ConfigMaps & Secrets are a way of defining variables that might change the runtime behaviour of a piece of code. By updating the data in these resources k8s will change the underlying data on the fly. To refresh the Deployment with the new data, assuming the correct mappings are made in the definitions, you restart the Pods.

resource "kubernetes_config_map" "whoami" {
  immutable = true
  metadata {
    name      = "whoami"
    labels    = local.whoamiAppLabelMap
    namespace = kubernetes_namespace.whoami_ns.metadata[0].name
  }
  data = {
    TEST_DATA   = 1
    WHOAMI_NAME = "a name from config map"
  }
}

In Terraform the data can be applied to Pods by adding the below to the Deployment's definition. This will read in and apply all the values of the ConfigMap as environment variables on the containers.

container {
  # ...
  env_from {
    config_map_ref {
      name = kubernetes_config_map.whoami.metadata[0].name
    }
  }
  # ...
}

Secrets are similar, but their data is base64 encoded (and can be encrypted at rest) to maintain the safety of the data inside. A Secret has the same form as the ConfigMap above. To apply it to a Deployment just add the below:

container {
  # ...
  env_from {
    secret_ref {
      name = kubernetes_secret.whoami.metadata[0].name
    }
  }
  # ...
}
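
The kubernetes_secret referenced above (and in the volume example below) is assumed to look something like this minimal sketch; the data values are placeholders, and the provider base64 encodes them for you:

resource "kubernetes_secret" "whoami" {
  metadata {
    name      = "whoami"
    labels    = local.whoamiAppLabelMap
    namespace = kubernetes_namespace.whoami_ns.metadata[0].name
  }
  data = {
    WHOAMI_API_KEY = "not-a-real-key" # placeholder value
  }
}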

In my opinion the best way to manage changes to a ConfigMap or Secret is to consider them always immutable. If an update must be made you should swap over to a new version of the resource and force a restart of the Pods.

Volume

Read-only volumes are simple and are useful for mounting things like certificates, secret files or other useful bits of info. Typically the data is already stored as a ConfigMap or Secret. With that in place you can create volume mounts with the following additions to the Pod & container definitions within the Deployment:

#...
volume {
  # The volume name is what the container's volume_mount refers to
  name = "imasecret"
  secret {
    secret_name = kubernetes_secret.whoami.metadata[0].name
  }
}
#...
container {
  # ...
  volume_mount {
    mount_path = "/secrets"
    name       = "imasecret" # must match the volume name defined on the Pod
    read_only  = true
  }
  # ...
}
#...
A note on Pods & Containers: the Pod is the Kubernetes concept, and the container is just a runtime based on docker, podman or whatever. When mounting volumes the volume must first be associated with the Pod, then with the container. It's like a hotel: you must let it into the lobby first before allowing it into the inner rooms.

Read-write volumes are less important with regular Deployments than with Stateful Sets; typically you should only use them if you need to persist something locally as a cache or as a helper to some process. Ideally applications should be stateless & stateful applications should be a dependent cog in the wheel. If your application does do some sort of persistence in a distributed manner then read & write replicas should be leveraged to ensure data is correct.
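
If a read-write volume is genuinely needed on a Deployment, a sketch along these lines could be used, assuming the cluster's default storage class can provision the claim; the claim name and size are placeholders:

resource "kubernetes_persistent_volume_claim" "whoami_cache" {
  metadata {
    name   = "whoami-cache"
    labels = local.whoamiAppLabelMap
  }
  spec {
    access_modes = ["ReadWriteOnce"]
    resources {
      requests = {
        storage = "1Gi"
      }
    }
  }
}

It is then mounted in the Pod spec much like the secret above, but with read_only left unset so the container can write to it:

#...
volume {
  name = "cache"
  persistent_volume_claim {
    claim_name = kubernetes_persistent_volume_claim.whoami_cache.metadata[0].name
  }
}
#...
container {
  # ...
  volume_mount {
    mount_path = "/cache"
    name       = "cache" # must match the volume name defined on the Pod
  }
  # ...
}
#...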