Airflow Error Tracking using Sentry

By Hiren Rupchandani & Mukesh Kumar

Table of Contents

1. Install Sentry dependency for Airflow

2. Set Up Your Account on sentry.io

3. Set Up airflow.cfg

4. DAG File

5. Error Tracking using Sentry

6. What’s Next?

Sentry.io is an open-source, full-stack error tracking system that supports a wide range of server, browser, desktop, and mobile languages and frameworks, including PHP, Node.js, and Python.

The system is used by Dropbox, AirBnB, PayPal, Uber, Reddit, Mozilla, Slack, and Microsoft to monitor thousands of applications. While it is used by a wide range of sectors, it is seeing continued growth in gaming and streaming media, and new demand in industries and services, like finance, commerce, and healthcare.

Airflow can be set up to send errors to Sentry, and the setup takes only a few steps:

Install Sentry dependency for Airflow

  • You can install the Sentry dependency in your virtual environment using the following command:
pip install 'apache-airflow[sentry]'
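To confirm that the install worked, a quick check like the one below can help. It is a minimal sketch that assumes the sentry extra pulls in the sentry_sdk package; the function name sentry_sdk_installed is ours and not part of Airflow or Sentry.

```python
import importlib.util


def sentry_sdk_installed():
    """Return True if the sentry_sdk package can be found in this environment."""
    # find_spec returns None when the top-level package is not installed.
    return importlib.util.find_spec("sentry_sdk") is not None


if __name__ == "__main__":
    print("sentry_sdk available:", sentry_sdk_installed())
```

Run it with the same virtual environment activated that Airflow uses; if it reports False, the pip command above probably ran in a different environment.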

Set Up Your Account on sentry.io

  • While the installation finishes, go to sentry.io and sign up for a free account.
  • After signing up, create a new project, give it any name (for example, “airflow_error_mgmt”), select any team name, and select Python as the language.
New Project in Sentry
  • We will then be taken to a page that has a block of code like this:
DSN Key partially hidden from our side
  • Copy the DSN key given in the block of code.
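Before pasting the DSN anywhere, it can be worth checking that it was copied whole. The sketch below assumes the usual DSN shape of scheme, public key, host, and project ID, and uses the placeholder DSN from Sentry's documentation rather than a real key. The helper name split_dsn is hypothetical.

```python
from urllib.parse import urlparse

# Placeholder DSN in the documented format; replace with the one from your project.
EXAMPLE_DSN = "https://examplePublicKey@o0.ingest.sentry.io/0"


def split_dsn(dsn):
    """Split a DSN into its parts, raising ValueError if a part is missing."""
    parts = urlparse(dsn)
    project_id = parts.path.strip("/")
    if (
        parts.scheme not in ("http", "https")
        or not parts.username
        or not parts.hostname
        or not project_id
    ):
        raise ValueError("DSN looks incomplete: " + repr(dsn))
    return {
        "public_key": parts.username,
        "host": parts.hostname,
        "project_id": project_id,
    }


if __name__ == "__main__":
    print(split_dsn(EXAMPLE_DSN))
```

A ValueError here usually means part of the key was cut off while copying.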

Set Up airflow.cfg

  • Next, open the airflow.cfg file in the Airflow directory and search for sentry. We will see two parameters:
  1. sentry_on
  2. sentry_dsn

Set the sentry_on parameter to True and paste the DSN key as the value of the sentry_dsn parameter:

airflow.cfg file
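For reference, the edited section might look like the text embedded in the sketch below, which reads it back with Python's configparser to confirm both values are set. The DSN is a placeholder, and read_sentry_settings is a hypothetical helper, not something Airflow provides.

```python
import configparser

# What the [sentry] section of airflow.cfg might look like after editing.
# The DSN below is a placeholder, not a real key.
SAMPLE_CFG = """
[sentry]
sentry_on = True
sentry_dsn = https://examplePublicKey@o0.ingest.sentry.io/0
"""


def read_sentry_settings(cfg_text):
    """Parse config text and return the two Sentry settings."""
    # interpolation=None keeps any '%' characters in values from being expanded.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return {
        "sentry_on": parser.getboolean("sentry", "sentry_on", fallback=False),
        "sentry_dsn": parser.get("sentry", "sentry_dsn", fallback=""),
    }


if __name__ == "__main__":
    print(read_sentry_settings(SAMPLE_CFG))
```

Airflow also reads any option from an environment variable named AIRFLOW__SECTION__KEY, so AIRFLOW__SENTRY__SENTRY_ON and AIRFLOW__SENTRY__SENTRY_DSN can be used instead of editing the file.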
  • Sentry has now been set up on our system.
  • To make sure that Sentry is working well with Airflow, we need some errors.

DAG File

  • Let’s create a simple error by commenting out one of the import statements. (Don’t comment out the import DAG statement, or your DAG file will never be recognized by the scheduler.)
  • In the given file, we have commented out the import statement of the Python Operator.
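As an illustration, a DAG file broken in this way might look like the source embedded below. This is not the authors' original file: the DAG ID, task, dates, and the Airflow 2 import path are all assumptions. The small undefined_names helper (also ours) inspects the source with Python's ast module to show which name is left undefined, without needing Airflow installed.

```python
import ast
import builtins

# Hypothetical DAG file; the PythonOperator import is commented out on purpose.
DAG_SOURCE = '''
from datetime import datetime
from airflow import DAG
# from airflow.operators.python import PythonOperator

def say_hello():
    print("hello")

with DAG(dag_id="sentry_demo", start_date=datetime(2022, 1, 1), schedule_interval=None) as dag:
    hello = PythonOperator(task_id="say_hello", python_callable=say_hello)
'''


def undefined_names(source):
    """Return names that are used in the source but never imported or defined."""
    imported, defined, used = set(), set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                imported.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                used.add(node.id)
    return sorted(used - imported - defined - set(dir(builtins)))


if __name__ == "__main__":
    print(undefined_names(DAG_SOURCE))
```

When Airflow parses a file like this, the undefined name raises a NameError, which is the kind of error we expect Sentry to pick up in the next step.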

Error Tracking using Sentry

  • Now, start the Airflow scheduler, webserver, and workers. The webserver will show an error on both the CLI and the UI.
Webserver Error
  • If we go to Sentry’s dashboard and click on the Issues tab, we should see that Sentry has caught the error.
Sentry Dashboard reporting Error
  • The dashboard also shows which error occurred, the part of the code that caused it, and the number of events recorded.
  • You will also receive an email at the address registered with your Sentry account.
Sentry Email Error Reporting
  • This makes Sentry useful for remotely tracking your data pipelines and other systems.
  • You can also explore the other features provided by Sentry to troubleshoot and test your pipeline.
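Once enabled, Airflow's Sentry integration reports failures on its own, so no reporting code is needed in the DAG. Purely to illustrate the idea, the sketch below catches an exception and hands it to the Sentry SDK. It is not how Airflow implements the integration; run_and_report and broken_task are hypothetical names. sentry_sdk.capture_exception is a real SDK call, and it does nothing if the SDK has not been initialized with a DSN.

```python
try:
    import sentry_sdk  # installed by the apache-airflow[sentry] extra
except ImportError:
    sentry_sdk = None  # the sketch still runs without the SDK


def run_and_report(task):
    """Run a callable; on failure, report the exception and return a summary."""
    try:
        task()
        return "ok"
    except Exception as exc:
        if sentry_sdk is not None:
            # No-op unless sentry_sdk.init(dsn=...) was called earlier.
            sentry_sdk.capture_exception(exc)
        return f"{type(exc).__name__}: {exc}"


def broken_task():
    # Mimics the broken DAG file: the name was never imported.
    return PythonOperator  # noqa: F821


if __name__ == "__main__":
    print(run_and_report(broken_task))
```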

Congratulations! Sentry has been set up to monitor errors in Airflow.

What’s Next?

Monitoring Airflow Metrics
