Monitoring Airflow Metrics

  • Monitoring the metrics of data pipelines can be tricky: in Airflow, we have to jump between the web server UI, the Python code, the DAG logs, and other monitoring tools.
  • In this article, we will explore a monitoring system consisting of StatsD, Prometheus, and Grafana; as we set each one up, we will see what it specializes in.
  • Airflow exposes metrics such as DAG bag size, number of currently running tasks, and task duration continuously, for as long as the cluster is running.
  • You can find a list of all the exposed metrics, along with descriptions, in the official Airflow documentation.

Table of Contents

  1. StatsD
  2. Prometheus
  3. Grafana

Prerequisites

  • You need to have Docker installed on your system before proceeding with the following steps.
  • We use Docker Desktop with the WSL2 backend here, but any working Docker installation should do; a quick check is shown below.
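  • As a quick sanity check that Docker is installed and the daemon is reachable (the versions printed will differ on your machine):
docker --version
docker info --format '{{.ServerVersion}}'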

1. StatsD

  • StatsD is a widely used service for collecting and aggregating metrics from various sources.
  • Airflow has built-in support for sending metrics to a StatsD server.
  • Once configured, Airflow pushes its metrics there, and we will be able to visualize them.
  • You need to open the airflow.cfg file and search for statsd.
  • You will see the following variables:
[metrics]
statsd_on = False
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow
  • Set statsd_on = True. If you wish to listen for the metrics on a different port, you can change statsd_port here as well; alternatively, see the environment-variable sketch below.
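  • If you prefer not to edit airflow.cfg directly, the same settings can be supplied as environment variables using Airflow's AIRFLOW__<SECTION>__<KEY> convention; a minimal sketch (on Airflow versions before 2.0 these keys lived under [scheduler] rather than [metrics]):
export AIRFLOW__METRICS__STATSD_ON=True
export AIRFLOW__METRICS__STATSD_HOST=localhost
export AIRFLOW__METRICS__STATSD_PORT=8125
export AIRFLOW__METRICS__STATSD_PREFIX=airflow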
  • Now run a DAG in Airflow and, in a different terminal, type the following command:
nc -l -u localhost 8125
  • This command listens on UDP port 8125 and prints the raw Airflow metrics that StatsD receives.
  • The output is continuous, since metrics are emitted at regular intervals for as long as the cluster runs; a sample is sketched below.
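  • For reference, StatsD datagrams have the simple form name:value|type; with the default airflow prefix, the stream contains lines shaped roughly like these (a counter, a gauge, and a millisecond timer; the actual names and values will vary):
airflow.scheduler_heartbeat:1|c
airflow.dagbag_size:5|g
airflow.dag.example_dag.example_task.duration:437|ms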
[Figure: Metrics scraping using StatsD]
  • We will now use a statsd exporter to send these metrics to Prometheus.

2. Prometheus

  • Prometheus is a popular solution for storing metrics and alerting.
  • It is commonly used to collect metrics from many kinds of sources, such as RDBMSes and web servers, and we will use it as the main storage for our metrics.
  • Because Airflow doesn’t have a native Prometheus integration, we’ll use the Prometheus StatsD Exporter to collect the metrics and transform them into a Prometheus-readable format.
  • It bridges the gap between StatsD and Prometheus by translating StatsD metrics into Prometheus metrics via configured mapping rules.
  • To set up statsd-exporter, you need to write two files, prometheus.yml and mapping.yml; you can find both in our GitHub repository linked below.
  • Store these files inside a new folder named “.prometheus” inside your Airflow directory; illustrative sketches of both follow.
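  • For reference, here is a minimal sketch of what mapping.yml can look like. It is only an illustration consistent with the Grafana queries used later (the files in the repository are authoritative); it maps every airflow.* StatsD metric onto a single Prometheus metric named airflow, carrying the original name in a metric label:
mappings:
  - match: 'airflow\.(.*)'
    match_type: regex
    name: "airflow"
    labels:
      host: "airflowStatsD"
      metric: "$1"
  • Similarly, a minimal prometheus.yml sketch that scrapes the exporter; the target host.docker.internal:9123 assumes Prometheus runs in a container and the exporter's web port is published on host port 9123, as in the commands below:
global:
  scrape_interval: 30s
scrape_configs:
  - job_name: "airflow"
    static_configs:
      - targets: ["host.docker.internal:9123"]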
  • After you have copied these two files into the folder, type the following command in an Ubuntu terminal:
docker run --name=prom-statsd-exporter \
-p 9123:9102 \
-p 8125:8125/udp \
-v $PWD/mapping.yml:/tmp/mapping.yml \
prom/statsd-exporter \
--statsd.mapping-config=/tmp/mapping.yml \
--statsd.listen-udp=:8125 \
--web.listen-address=:9102
  • If you see the following line in the output, you are good to go:
level=info ts=2021-10-04T04:38:30.408Z caller=main.go:358 msg="Accepting Prometheus Requests" addr=:9102
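  • Optionally, verify from another terminal that the exporter is serving metrics over HTTP (9123 is the host port mapped to the exporter's web port above; airflow-prefixed series should appear once a DAG has run):
curl -s http://localhost:9123/metrics | grep airflow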
  • Keep this terminal open and type a new command in a new Ubuntu terminal:
docker run --name=prometheus \
-p 9090:9090 \
-v $PWD/prometheus.yml:/etc/prometheus/prometheus.yml \
prom/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--log.level=debug \
--web.listen-address=:9090
  • Prometheus has been set up successfully if you see a line like the following in your output:
level=info ts=2021-10-02T21:09:59.717Z caller=main.go:794 msg="Server is ready to receive web requests."
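  • To confirm that Prometheus is actually scraping the exporter, open http://localhost:9090/targets in a browser, or hit the query API from a terminal (this assumes the mapping sketched earlier, where every series shares the name airflow):
curl -s 'http://localhost:9090/api/v1/query?query=airflow'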

3. Grafana

  • Grafana is our preferred metrics visualization tool.
  • It has native Prometheus support and we will use it to set up our Airflow Cluster Monitoring Dashboard.
  • On it you can see all the vital metrics: scheduler heartbeat, DAG bag size, queued/running task counts, currently running DAGs aggregated by task, etc.
  • You can also see DAG-level metrics: successful DAG run duration, failed DAG run duration, DAG run dependency-check time, and DAG run schedule delay.
  • To set up Grafana on your system, type the following command in your Ubuntu terminal:
docker run -d --name=grafana -p 3000:3000 grafana/grafana
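  • Optionally, mount a named volume so your dashboards survive container restarts (/var/lib/grafana is Grafana's default data directory):
docker run -d --name=grafana -p 3000:3000 -v grafana-data:/var/lib/grafana grafana/grafana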
  • After a successful setup, go to http://localhost:3000/ to access the Grafana dashboard (the default credentials are admin/admin).
  • On the dashboard, click the icon below the + icon to create a new data source, select the Prometheus data source, and in the URL field type:
http://host.docker.internal:9090
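  • If you prefer to script this step, the same data source can be created through Grafana's HTTP API; this sketch assumes the default admin/admin credentials and the Prometheus URL above:
curl -X POST http://admin:admin@localhost:3000/api/datasources \
-H 'Content-Type: application/json' \
-d '{"name":"Prometheus","type":"prometheus","url":"http://host.docker.internal:9090","access":"proxy"}'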
  • Now, create a new dashboard and give it any name. You will be directed to an Edit Panel.
[Figure: New Dashboard]
[Figure: Edit Panel]
  • In the Metrics Browser, you can type the metric you want to monitor. We will check the scheduler heartbeat, so we type the following query:
rate(airflow{host="airflowStatsD", instance="host.docker.internal:9123", job="airflow", metric="scheduler_heartbeat"}[1m])
  • We will see that a plot (a time-series in our case) is generated for the above query; the [1m] window means rate() computes the per-second rate of the heartbeat counter over the last minute of samples.
[Figure: Metrics Browser command]
  • In the panel options, you can also format the plot, for example by giving it a title and axis labels.
[Figure: Metric plot given a title and axis name]
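  • Other panels can be built the same way. Assuming the same StatsD mapping (every series is named airflow and distinguished by its metric label), queries like the following are plausible starting points; the exact label values depend on your mapping.yml:
# current DAG bag size (a gauge)
airflow{metric="dagbag_size"}
# per-second rate of task-instance failures over the last 5 minutes
rate(airflow{metric="ti_failures"}[5m])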
  • Congratulations! You have successfully set up a monitoring system for Airflow using StatsD, Prometheus, and Grafana.

GitHub Repository

