Monitoring Airflow Metrics

By Hiren Rupchandani & Mukesh Kumar

Table of Contents

1. Prerequisites

2. StatsD

3. Prometheus

4. Grafana

5. GitHub Repository

  • Monitoring the metrics of data pipelines can be tricky: in Airflow, we have to jump between the webserver UI, the Python code, the DAG logs, and other monitoring tools.
  • In this article, we will explore a monitoring system consisting of StatsD, Prometheus, and Grafana.
  • Airflow exposes metrics such as the DAG bag size, the number of currently running tasks, and task durations, for as long as the cluster is running.
  • As we set up each component, we will see what it specializes in.
  • You can find a list of all the exposed metrics, along with descriptions, in the official Airflow documentation.
1. Prerequisites

  • You need to have Docker installed on your system before proceeding with the following steps.
  • We have Docker Desktop installed with a WSL2 backend, so you can proceed with the same.

Let’s get started…

2. StatsD

  • StatsD is a widely used service for collecting and aggregating metrics from various sources.
  • Airflow has built-in support for sending metrics to a StatsD server.
  • Once configured, Airflow will then push metrics to the StatsD server and we will be able to visualize them.
  • You need to open the airflow.cfg file and search for statsd.
  • You will see the following variables:
[metrics]
statsd_on = False
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow
  • Set statsd_on = True. You can also change the port on which the metrics are emitted if you wish.
  • Now run a DAG in Airflow and, in a different terminal, type the following command:
nc -l -u localhost 8125
  • This command listens on UDP port 8125 and prints the raw Airflow metrics being sent to StatsD.
  • The output will be continuous, since metrics are emitted at regular intervals.
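The exact metric names depend on your Airflow version and the statsd_prefix you configured, but the datagrams printed by nc follow the plain-text StatsD protocol (name:value|type, where c is a counter, g a gauge, and ms a timer) and look similar to:

```
airflow.scheduler_heartbeat:1|c
airflow.dagbag_size:12|g
airflow.dag.example_dag.task_1.duration:1432|ms
```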
Metrics scraping using StatsD
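As an aside, instead of editing airflow.cfg directly, the same settings can be supplied through environment variables using Airflow's AIRFLOW__SECTION__KEY convention, which is convenient in Docker-based deployments:

```shell
# Airflow reads any config key from an environment variable named
# AIRFLOW__<SECTION>__<KEY> (note the double underscores); these
# override the corresponding values in airflow.cfg.
export AIRFLOW__METRICS__STATSD_ON=True
export AIRFLOW__METRICS__STATSD_HOST=localhost
export AIRFLOW__METRICS__STATSD_PORT=8125
export AIRFLOW__METRICS__STATSD_PREFIX=airflow

# Sanity check: print the value the scheduler will see.
echo "statsd_on=$AIRFLOW__METRICS__STATSD_ON"
```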
3. Prometheus

  • We will now use the StatsD exporter to send these metrics to Prometheus.
  • Prometheus is a popular solution for storing metrics and alerting.
  • Because it is typically used to collect metrics from other sources, like RDBMSes and webservers, we will use Prometheus as the main storage for our metrics.
  • Because Airflow doesn’t have an integration with Prometheus, we’ll use Prometheus StatsD Exporter to collect metrics and transform them into a Prometheus-readable format.
  • It bridges the gap between StatsD and Prometheus by translating StatsD metrics into Prometheus metrics via configured mapping rules.
  • To set up the StatsD exporter, you need two files: “prometheus.yml” and “mapping.yml”. You can find both on our GitHub repository linked below.
  • Store these files inside a new folder named “.prometheus” inside your airflow directory.
  • After you have copied the two files into the folder, type the following command in an Ubuntu terminal:
docker run --name=prom-statsd-exporter \
-p 9123:9102 \
-p 8125:8125/udp \
-v $PWD/mapping.yml:/tmp/mapping.yml \
prom/statsd-exporter \
--statsd.mapping-config=/tmp/mapping.yml \
--statsd.listen-udp=:8125 \
--web.listen-address=:9102
  • If you see the following line in the output, you are good to go:
level=info ts=2021-10-04T04:38:30.408Z caller=main.go:358 msg="Accepting Prometheus Requests" addr=:9102
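For reference, here is a minimal mapping.yml sketch in the statsd_exporter mapping format, consistent with the labels that appear in the Grafana query later in this article; the version in our GitHub repository may differ:

```yaml
mappings:
  # Map every "airflow.<something>" StatsD metric onto a single
  # Prometheus metric named "airflow", keeping the original metric
  # name in the "metric" label ($1 is the first glob capture).
  - match: "airflow.*"
    name: "airflow"
    labels:
      host: "airflowStatsD"
      metric: "$1"
```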
  • Keep this terminal open and run the following command in a new Ubuntu terminal:
docker run --name=prometheus \
-p 9090:9090 \
-v $PWD/prometheus.yml:/etc/prometheus/prometheus.yml \
prom/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--log.level=debug \
--web.listen-address=:9090
  • Prometheus has been set up successfully if you see the following line in your output:
level=info ts=2021-10-02T21:09:59.717Z caller=main.go:794 msg="Server is ready to receive web requests."
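For reference, a minimal prometheus.yml sketch that scrapes the exporter on the host port we published (9123); the version in our GitHub repository may differ:

```yaml
global:
  scrape_interval: 30s   # how often Prometheus pulls metrics

scrape_configs:
  - job_name: airflow    # becomes the job="airflow" label
    static_configs:
      # host.docker.internal resolves to the host machine from inside
      # Docker Desktop containers; 9123 is the host port mapped to the
      # exporter's 9102 web port.
      - targets: ["host.docker.internal:9123"]
```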
4. Grafana

  • Grafana is our preferred metrics-visualization tool.
  • It has native Prometheus support, and we will use it to set up our Airflow cluster monitoring dashboard.
  • Here you can see all the vital metrics: the scheduler heartbeat, DAG bag size, queued/running task counts, currently running DAGs aggregated by task, etc.
  • You can also see DAG-level metrics: the duration of successful and failed DAG runs, DAG run dependency-check time, and DAG run schedule delay.
  • To set up Grafana on your system, type the following command in your Ubuntu terminal:
docker run -d --name=grafana -p 3000:3000 grafana/grafana
  • After successful setup, you can go to http://localhost:3000/ to access the Grafana dashboard.
  • On the dashboard, click the icon below the + icon to create a new data source, select the Prometheus data source, and in the URL section type:
http://host.docker.internal:9090
  • Now, create a new dashboard and give it any name. You will be directed to an Edit Panel.
New Dashboard
Edit Panel
  • In the Metrics Browser, you can type the metric you want to monitor. We will check the scheduler heartbeat, so we type the following query:
rate(airflow{host="airflowStatsD", instance="host.docker.internal:9123", job="airflow", metric="scheduler_heartbeat"}[1m])
  • We will see that a plot (a time series in our case) is generated for the above query; the [1m] range tells rate() to compute the per-second rate over the last one minute of samples.
Metrics Browser Command
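Other metrics exposed through the same mapping can be charted the same way. For example, assuming the dagbag_size gauge is mapped with the same labels, a query like the following would plot the DAG bag size directly (no rate() needed, since it is a gauge rather than a counter):

```
airflow{host="airflowStatsD", instance="host.docker.internal:9123", job="airflow", metric="dagbag_size"}
```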
  • In the panel options, you can also format your plot by assigning an axis name and a title to the plot.
Metric Plot given a title and axis name
  • Congratulations! You have successfully set up a monitoring system for Airflow using StatsD, Prometheus, and Grafana.

5. GitHub Repository
