Kubernetes Executor in Airflow

Why Kubernetes Executor?

The Celery Executor has a few drawbacks:

  • You need to set up extra infrastructure like RabbitMQ/Redis and Flower.
  • You need to manage dependencies for Celery, RabbitMQ/Redis, and Flower.
  • Airflow workers stay idle when there is no workload, which wastes resources.
  • Worker nodes are not as resilient as you might think.

The Kubernetes Executor, on the other hand:

  • Runs the tasks in a Kubernetes cluster.
  • Runs each task in its own pod.
  • Expands and shrinks the cluster according to the workload, so it can even scale down to zero.
  • Lets the scheduler talk directly to the Kubernetes API, so the two can communicate without a broker in between.
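Switching to this executor is, at its core, a one-line configuration change. In a standard Airflow install it is set in airflow.cfg (or via the AIRFLOW__CORE__EXECUTOR environment variable); the Helm chart used later in this guide does this for us:

```ini
[core]
# Replaces CeleryExecutor/LocalExecutor; no message broker or
# standing worker fleet is needed with the Kubernetes Executor.
executor = KubernetesExecutor
```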

Prerequisites

  • You should have Docker installed on your system. Docker Desktop does not run on Windows 7, 8, 8.1, or 10 Home Edition; although WSL 2-based Docker works on Windows 10 Home, the Professional Edition is preferred, and this guide assumes Windows 10 Professional.
  • Chocolatey, a package manager for Windows.
  • More than 15 GB of free storage on your system, to create a PersistentVolume and keep other relevant files.
  • A working knowledge of Docker, Kubernetes, containers, and pods.

Process

  • We will install Kubernetes, kubectl (the Kubernetes CLI), and Helm, a package manager for Kubernetes (think of Helm as apt in Ubuntu).
  • You can refer to the Kubernetes installation at this link and this link.
  • You can refer to the Helm installation at this link.
  • To ease the Airflow installation for beginners, we use a GitHub repository which you can download from here. (Credits are given at the end.)
  • We will then install Airflow with the Kubernetes Executor in the Kubernetes environment using Helm.
  • This installation will also create a volume to store your DAGs.

Let’s Get Started…

Install Kubernetes and Helm

  • The first step is to install Kubernetes using Minikube. To do so, open PowerShell as an administrator and run the following command:
choco install minikube -y
  • Once the installation is complete, you can initialize a local cluster using:
minikube start
  • You can get the node info using this command:
kubectl get nodes
OUTPUT:
NAME       STATUS   ROLES    AGE   VERSION
minikube   Ready    master   2d
  • Now we can install Helm using:
choco install kubernetes-helm
  • Helm allows us to properly install complex packages with many dependencies, such as Apache Airflow, inside a Kubernetes cluster.

Configure and Install Airflow

  • You need to download the repository mentioned here and extract the relevant files to an appropriate location such as C:/Users/username/Documents. Note this path; we will refer to it as the absolute path.
  • Now, open the chapter2/airflow-helm-config-kubernetes-executor.yaml file and change the path on line 22 to your absolute path. It should look something like this:
path: "/Users/username/Documents/etl-series/dags"
  • This configuration creates a Volume at the given path and mounts it into the Airflow Scheduler, Webserver, and Workers.
  • We can now write DAGs on our local machine and let Airflow, running inside Kubernetes, pick them up from there.
  • Now, we can install Airflow with the following command:
helm install airflow stable/airflow -f chapter2/airflow-helm-config-kubernetes-executor.yaml --version 7.2.0
  • The installation status can be checked using:
helm list
  • Once deployed, run the following commands to port-forward the webserver, then open http://127.0.0.1:8080 in your browser. (These are bash-style commands; run them from a bash shell such as Git Bash, or adapt them for PowerShell.)
export POD_NAME=$(kubectl get pods --namespace default -l "component=web,app=airflow" -o jsonpath="{.items[0].metadata.name}")
kubectl port-forward --namespace default $POD_NAME 8080:8080
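Conceptually, the volume section edited earlier amounts to a standard Kubernetes hostPath volume that exposes your local DAGs folder to the cluster. The sketch below is illustrative only; the actual key names and structure inside chapter2/airflow-helm-config-kubernetes-executor.yaml may differ:

```yaml
# Hypothetical equivalent of what the chart config sets up:
# a PersistentVolume backed by a directory on the host machine.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: airflow-dags
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  hostPath:
    # The absolute path you set on line 22 of the config file.
    path: "/Users/username/Documents/etl-series/dags"
```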

DAGRun

  • You can list the running pods using:
kubectl get pods
  • The output will show the various pods that were spawned, along with their names, statuses, and ages.
  • When you run the DAG, you will observe that Airflow schedules a pod whose name begins with dagthatexecutesviak8sexecutor.
  • This pod, in turn, starts another pod to execute the actual tasks defined in that DAG using the KubernetesPodOperator. Notice the pods whose names begin with dagthatexecutesviakubernetespodoperator.

References


INSAID — One of India’s leading institutions providing world-class Data Science & AI programs for working professionals with a mission to groom Data leaders of tomorrow!
