Open Sourcing our Kubernetes Tools

At Tumblr, we are avid fans of Kubernetes. We have been using Kubernetes for all manner of workloads: critical-path web request handling for tumblr.com, background task execution like sending queued posts and push notifications, and scheduled jobs for spam detection and content moderation. Throughout our journey to move our 11 year old (almost 12! 🎂) platform to a container-native architecture, we have made innumerable changes to how our applications are designed and run. Inspired by many existing Kubernetes APIs and best practices, we’re excited to share with the community some of the tools we’ve developed at Tumblr as our infrastructure has evolved to work with Kubernetes.

To help us integrate Kubernetes into our workflows, we have built a handful of tools, three of which we are open-sourcing today! Each is a small, focused utility designed to solve a specific integration need Tumblr had while migrating our workflows to Kubernetes. The tools were built to handle our needs internally, but we believe they are useful to the wider Kubernetes community.

k8s-sidecar-injector

Any company that has containerized an application as large and complex as Tumblr knows that it requires a tremendous amount of effort. Applications don’t become container-native overnight, and sidecars can be useful to help emulate older deployments with colocated services on physical hosts or VMs. To reduce the fragile copy-pasted code developers needed to add sidecars to their Deployments and CronJobs, we created a service to dynamically inject sidecars, volumes, and environment data into pods as they are launched.

The k8s-sidecar-injector listens to the Kubernetes API for Pod launches that contain annotations requesting a specific sidecar to be injected. For example, the annotation injector.tumblr.com/request=sidecar-prod-v1 will add any environment variables, volumes, and containers defined in the sidecar-prod-v1 configuration. We use this to add sidecars like logging and metrics daemons, cluster-wide environment variables like DATACENTER and HTTP_PROXY settings, and volumes for shared configuration data. By centralizing sidecar configuration, we reduced our CronJobs and Deployments by hundreds of lines, eliminated copy-paste errors, and made rolling out updates to shared sidecar components effortless.

An example sidecar ConfigMap is below, which adds a logging container, a volume from a logger-config ConfigMap, and some environment variables into the Pod.

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-sidecars
  namespace: kube-system
  labels:
    app: k8s-sidecar-injector
data:
  logger-v1: |
    name: logger-v1
    containers:
    - name: logger
      image: some/logger:2.2.3
      imagePullPolicy: IfNotPresent
      ports:
      - containerPort: 8888
      volumeMounts:
      - name: logger-conf
        mountPath: /etc/logger
    volumes:
    - name: logger-conf
      configMap:
        name: logger-config
    env:
    - name: DATACENTER
      value: dc01
    - name: HTTP_PROXY
      value: http://my-proxy.org:8080/
    - name: HTTPS_PROXY
      value: http://my-proxy.org:8080/

This configuration will add the logger container into each pod with the annotation injector.tumblr.com/request: logger-v1, with a ConfigMap projected as a volume in /etc/logger. Additionally, every container in the Pod will get the DATACENTER=dc01 and HTTP_PROXY environment variables added, if they were not already set. This has allowed us to drastically reduce our boilerplate configuration when containerizing legacy applications that require a complex sidecar configuration.
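
To opt in, a workload only needs that annotation on its Pod template. Below is a minimal sketch of a hypothetical Deployment requesting the logger-v1 sidecar; the app name and image are made up, and the annotation is the only injector-specific piece:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
      annotations:
        # request injection of the logger-v1 sidecar defined above
        injector.tumblr.com/request: logger-v1
    spec:
      containers:
      - name: app
        image: some/app:1.0.0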

k8s-config-projector

Internally, we have many types of configuration data that are needed by a variety of applications. We store canonical settings data like feature flags, lists of hosts/IPs+ports, and application settings in git. This allows automated generation/manipulation of these settings by bots, cron jobs, Collins, and humans alike. Applications want to know about some subset of this configuration data, and they want to be informed of changes as quickly as possible. Kubernetes provides the ConfigMap resource, which enables users to provide their service with configuration data and update the data in running pods without requiring a redeployment. We wanted to use this to configure our services and jobs in a Kubernetes-native manner, but needed a way to bridge the gap between our canonical configuration store (a git repo of config files) and ConfigMaps. Thus, k8s-config-projector was born.

The Config Projector (github.com/tumblr/k8s-config-projector) is a command line tool meant to be run by CI processes. It combines a git repo hosting configuration data (feature flags, lists of hostnames+ports, application settings) with a set of “projection manifest” files that describe how to group/extract settings from the config repo and transmute them into ConfigMaps. The config projector allows developers to encode the set of configuration data their application needs to run into a projection manifest. As the configuration data changes in the git repository, CI runs the projector, projecting and deploying new ConfigMaps containing the updated data, without needing the application to be redeployed. Projection datasources can handle both structured and unstructured configuration files (YAML, JSON, and raw text/binary).

An example projection manifest is below, describing how a fictitious notification application could request some configuration data that may dynamically change (memcached hosts, log level, launch flags, etc.):

---
name: notifications-us-east-1-production
namespace: notification-production
data:
# extract some fields from JSON
- source: generated/us-east-1/production/config.json
  output_file: config.json
  field_extraction:
  - memcached_hosts: $.memcached.notifications.production.hosts
  - settings: $.applications.notification.production.settings
  - datacenter: $.datacenter
  - environment: $.environment
# extract a scalar value from a YAML
- source: apps/us-east-1/production/notification.yaml
  output_file: launch_flags
  extract: $.launch_flags

After processing by the config projector, the following ConfigMap is generated, which can then be posted to a Kubernetes cluster with kubectl create -f <generatedfile>.

kind: ConfigMap
apiVersion: v1
metadata:
  name: notifications-us-east-1-production
  namespace: notification-production
  labels:
    tumblr.com/config-version: "1539778254"
    tumblr.com/managed-configmap: "true"
data:
    config.json: |
      {
        "memcached_hosts": ["2.3.4.5:11211","4.5.6.7:11211","6.7.8.9:11211"],
        "settings": {
          "debug": false,
          "buffer": "2000",
          "flavor": "out of control",
          "log_level": "INFO",
        },
        "datacenter": "us-east-1",
        "environment": "production"
      }
    launch_flags: "-Xmx5g -Dsun.net.inetaddr.ttl=10"
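
To illustrate the consuming side, here is a hypothetical Pod that mounts the generated ConfigMap as a volume; the Pod and image names are made up, while the ConfigMap name matches the projected example above. Because the kubelet refreshes ConfigMap volumes in running pods, the application sees updated config.json and launch_flags files as long as it re-reads them:

---
apiVersion: v1
kind: Pod
metadata:
  name: notifications
  namespace: notification-production
spec:
  containers:
  - name: notifications
    image: some/notifications:1.2.3
    volumeMounts:
    # config.json and launch_flags appear under /etc/notifications
    - name: config
      mountPath: /etc/notifications
      readOnly: true
  volumes:
  - name: config
    configMap:
      name: notifications-us-east-1-production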

With this tool, we have enabled our applications running in Kubernetes to receive dynamic configuration updates without requiring container rebuilds or redeployments. More examples can be found in the k8s-config-projector repository.

k8s-secret-projector

Similar to our configuration repository, we store secure credentials in access controlled vaults, divided by production levels. We wanted to enable developers to request access to subsets of credentials for a given application without needing to grant the user access to the secrets themselves. Additionally, we wanted to make certificate and password rotation transparent to all applications, enabling us to rotate credentials in an application-agnostic manner, without needing to redeploy applications. Lastly, we wanted to introduce a mechanism where application developers would explicitly describe which credentials their services need, and enable a framework to audit and grant permissions for a service to consume a secret.

The k8s-secret-projector operates similarly to the k8s-config-projector, albeit with a few differences. The secret projector combines a repository of projection manifests with a set of credential repositories. A Continuous Integration (CI) tool like Jenkins will run the k8s-secret-projector against any changes in the projection manifests repository to generate new Kubernetes Secret YAML files. Then, Continuous Deployment can deploy the generated and validated Secret files to any number of Kubernetes clusters.

Take this file in the production credentials repository, named aws/credentials.json:

{
  "us-east-1": {
    "region": "us-east-1",
    "aws": {
      "key": "somethignSekri7T!"
    },
    "s3": {
      "key": "passW0rD!"
    },
    "redshift": {
      "key": "ello0liv3r!",
      "database": "mydatabase"
    }
  },
  "us-west-2": {
    "region": "us-west-2",
    "aws": {
      "key": "anotherPasswr09d!"
    },
    "s3": {
      "key": "sueprSekur#"
    }
  }
}

We need to create an amazon.yaml configuration file containing the s3.key and aws.key for us-east-1, as well as a text file containing our region. The projection manifest below extracts only the fields we need and outputs them in the desired format.

name: aws-credentials
namespace: myteam
repo: production
data:
# create an amazon.yaml config with the secrets we care about
- name: amazon.yaml
  source:
    format: yaml
    json: aws/credentials.json
    jsonpaths:
      s3: $.us-east-1.s3.key
      aws: $.us-east-1.aws.key
      region: $.us-east-1.region
# create an item containing just the name of the region we are in
- name: region
  source:
    json: aws/credentials.json
    jsonpath: $.us-east-1.region

Projecting this manifest with the above credentials results in the following Kubernetes Secret YAML file:

apiVersion: v1
kind: Secret
metadata:
  labels:
    tumblr.com/managed-secret: "true"
    tumblr.com/secret-version: master-741-7459d1abcc120
  name: aws-credentials
  namespace: myteam
data:
  region: dXMtZWFzdC0x
  # region decoded for clarity: us-east-1
  amazon.yaml: LS0tCnMzOiAicGFzc1cwckQhIgphd3M6ICJzb21ldGhpZ25TZWtyaTdUISIKcmVnaW9uOiB1cy1lYXN0LTEK
  # amazon.yaml decoded for clarity:
  # ---
  # s3: "passW0rD!"
  # aws: "somethignSekri7T!"
  # region: us-east-1
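
Applications then consume the projected Secret like any other Kubernetes Secret. Below is a minimal, hypothetical sketch of the consuming side (the Pod and image names are made up; the Secret name and keys match the generated example above). Note that the volume-mounted amazon.yaml is refreshed by the kubelet when credentials are rotated, while the environment variable is fixed at container start:

---
apiVersion: v1
kind: Pod
metadata:
  name: aws-consumer
  namespace: myteam
spec:
  containers:
  - name: app
    image: some/app:1.0.0
    env:
    # resolved once, when the container starts
    - name: AWS_REGION
      valueFrom:
        secretKeyRef:
          name: aws-credentials
          key: region
    volumeMounts:
    # /etc/aws/amazon.yaml refreshes when the Secret is rotated
    - name: aws-creds
      mountPath: /etc/aws
      readOnly: true
  volumes:
  - name: aws-creds
    secret:
      secretName: aws-credentials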

In addition to being able to extract fields from structured YAML and JSON sources, we gave the projector the ability to encrypt generated Secrets before they touch disk. This allows Secrets to be deployed in shared Kubernetes environments, where users are colocated with other tenants and may not feel comfortable with their Secret resources being stored unencrypted in etcd. Please note, this requires decryption by your applications before use. More details on how the encryption modules work can be found in the k8s-secret-projector repository.

For more examples of how to use this, check out the examples in the k8s-secret-projector repository!

What’s Next

We are excited to share these tools with the Kubernetes open source community, and we hope they can help your organization adopt container-native thinking for managing application lifecycles like they helped Tumblr. Feature enhancements and bug fixes are welcome! And, shameless plug: if you are interested in Kubernetes, containerization technology, open source, and scaling a massive website with industry-leading technologies and practices, come join us!

- @pipefail
