Airflow Error Tracking using Sentry
Sentry.io is an open-source full-stack error tracking system that supports a wide range of server, browser, desktop, and mobile languages and frameworks, including PHP, Node.js, and Python.
The system is used by Dropbox, Airbnb, PayPal, Uber, Reddit, Mozilla, Slack, and Microsoft to monitor thousands of applications. It is used across many sectors, with continued growth in gaming and streaming media and new demand in industries such as finance, commerce, and healthcare.
Airflow can be set up to send errors to Sentry, and the installation takes only a few steps:
Install the Sentry Dependency for Airflow
- You can install the Sentry dependency in your virtual environment using the following command:
pip install 'apache-airflow[sentry]'
Set Up Your Account on sentry.io
- While the installation finishes, go to sentry.io and sign up for a free account.
- After signing up, create a new project. Give the project any name, such as “airflow_error_mgmt”, select any team name, and select Python as the language.
- We are then taken to a setup page that shows a block of configuration code for the new project.
- Copy the DSN key given in that block of code.
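Because the DSN is copied by hand, it can help to check that nothing was cut off. The following is a minimal sketch using only the standard library, assuming the usual DSN shape of protocol, public key, host, and a numeric project ID. The helper name parse_dsn and the example DSN are placeholders for illustration and are not part of Sentry or Airflow.

```python
from urllib.parse import urlparse


def parse_dsn(dsn: str) -> dict:
    """Split a Sentry DSN into its parts and reject obviously malformed values."""
    parts = urlparse(dsn.strip())
    # The project ID is the last segment of the path.
    project_id = parts.path.rstrip("/").rsplit("/", 1)[-1]
    if parts.scheme not in ("http", "https"):
        raise ValueError("DSN must start with http:// or https://")
    if not parts.username:
        raise ValueError("DSN is missing the public key before '@'")
    if not parts.hostname:
        raise ValueError("DSN is missing the host")
    if not project_id.isdigit():
        raise ValueError("DSN must end with a numeric project ID")
    return {
        "public_key": parts.username,
        "host": parts.hostname,
        "project_id": project_id,
    }


# Placeholder DSN for illustration only; use the one from your project page.
EXAMPLE_DSN = "https://examplePublicKey@o0.ingest.sentry.io/0"
print(parse_dsn(EXAMPLE_DSN))
```

If the function raises a ValueError, the copied value is probably incomplete and should be copied again from the project page.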
Set Up airflow.cfg
- Next, open the airflow.cfg file in the Airflow directory and search for sentry. We will see two parameters in the [sentry] section, sentry_on and sentry_dsn.
Set the sentry_on parameter to True and paste the DSN key as the value of the sentry_dsn parameter.
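As a rough illustration of what the edited section might look like, the sketch below embeds a hypothetical excerpt of airflow.cfg and reads the two parameters back with Python's configparser. The DSN is a placeholder, and sentry_settings is an illustrative helper, not an Airflow function.

```python
import configparser

# Hypothetical excerpt of airflow.cfg after the edit; the DSN is a placeholder.
AIRFLOW_CFG_EXCERPT = """
[sentry]
sentry_on = True
sentry_dsn = https://examplePublicKey@o0.ingest.sentry.io/0
"""


def sentry_settings(cfg_text: str):
    """Return (sentry_on, sentry_dsn) from the [sentry] section of a config string."""
    # Interpolation is disabled so a '%' in a value is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    enabled = parser.getboolean("sentry", "sentry_on", fallback=False)
    dsn = parser.get("sentry", "sentry_dsn", fallback="")
    return enabled, dsn


print(sentry_settings(AIRFLOW_CFG_EXCERPT))
```

To check a real file, the same helper could be given the text of your own airflow.cfg. Both values should come back as expected before moving on.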
- Sentry has now been set up on our system.
DAG File
- Now, to make sure that Sentry is working well with Airflow, we need some errors.
- Let’s create a simple error by commenting out one of the import statements in a DAG file. Don’t comment out the import DAG statement, or your DAG file will never be recognized by the scheduler.
- In our DAG file, we have commented out the import statement for the PythonOperator.
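The original DAG file is not reproduced here, so the sketch below uses a hypothetical one. The dag_id, task_id, and function names are made up, and the commented-out import path varies between Airflow versions. Instead of running the DAG, it uses a simplified static check with Python's ast module to show which name is left undefined. This is the name Python would report in a NameError when Airflow tries to load the file.

```python
import ast
import builtins

# Hypothetical DAG file (dag_id, task_id, and function names are illustrative).
# The PythonOperator import is commented out on purpose; the DAG import stays.
BROKEN_DAG_SOURCE = '''
from datetime import datetime
from airflow import DAG
# from airflow.operators.python import PythonOperator

def say_hello():
    print("hello from Airflow")

with DAG(dag_id="sentry_demo", start_date=datetime(2024, 1, 1)) as dag:
    hello = PythonOperator(task_id="say_hello", python_callable=say_hello)
'''


def undefined_names(source: str) -> set:
    """Return names that are used but never imported, defined, or assigned.

    Simplified check: it ignores scoping and function arguments.
    """
    tree = ast.parse(source)
    bound = set(dir(builtins))
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            # "import a.b" binds "a"; "from x import y as z" binds "z".
            bound.update((a.asname or a.name).split(".")[0] for a in node.names)
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.Name):
            (used if isinstance(node.ctx, ast.Load) else bound).add(node.id)
    return used - bound


print(undefined_names(BROKEN_DAG_SOURCE))
```

The check only parses the text, so Airflow does not need to be installed to run it. Restoring the commented-out import makes the reported set empty.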
Error Tracking using Sentry
- Now, start the Airflow scheduler, webserver, and workers. The webserver will throw an error on the CLI and in the UI.
- If we go to Sentry’s dashboard and click on the Issues tab, we should see that Sentry has caught an error.
- It also shows the error that occurred, the part of the code that caused it, and the number of events that have taken place.
- You will also receive an email at the email address registered with your Sentry account.
- This makes Sentry useful for remotely tracking your data pipelines and other systems.
- You can also explore the other features Sentry provides to troubleshoot and test your pipeline.
Congratulations! Sentry has been set up to monitor errors in Airflow.