Table of Contents
10. Running the DAG
11. What’s Next?
In our last article, we saw a high-level overview of the various executors in airflow. Now, we will see how a local executor can be used to achieve parallelism in airflow. So, let’s get started…
Setup for Local Executor
- Before we start implementing airflow using the local executor, we will need to set up MySQL (Works for WSL2 and Ubuntu) on our system and set up airflow with MySQL. You can refer to this guide to set up MySQL as a database backend.
- After setting up the MySQL database, go to airflow.cfg file in the airflow directory, search for
executorvariable, and change the value from
LocalExecutor.Save the file and run the airflow webserver and scheduler to check if they are running properly with the LocalExecutor.
- If they are running properly, we can proceed with the DAG creation.
Creating a python file
Similar to the previous article, we will create a DAG, with multiple tasks inside the DAG. We will set dependencies between those tasks such that two tasks are executed in parallel.
- Create a new python file inside the airflow/dags directory on your system as “local_executor_demo.py” and open the file in your favorite editor.
Importing the modules
- We will import the “DAG” module and the “operators.python” module from the airflow package.
- We will import the “datetime” module to help us schedule the dags.
- Additionally, we will import time module to observe the execution of tasks in the DAG.
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime, timedelta
Creating a DAG object
- We will now initialize a DAG object as follows:
catchup=False) as dag:
- We will create four tasks inside our DAG, abbreviated as task1, task2_1, task2_2, and task3.
- task1: Calls a “hello_function” and sleeps for 5 seconds.
- task2_1: Calls a “sleeping_function” and sleeps for 5 seconds.
- task2_2: Calls a “sleeping_function” and sleeps for 5 seconds.
- task3: Calls a “bye_function”.
- The task definitions will look like the following:
Creating callable functions
- task1 calls a function named as hello_function that will print some text and sleep for 5 seconds using the time.sleep() function.
- task2_1 and task2_2 will call a function named as sleeping_function that will sleep for 5 seconds using the time.sleep() function.
- task3 calls a function named as bye_function that will print some pre-defined text and will indicate the end of the DAGRun.
- The callable functions will look like this:
print('Hello, this is the first task of the DAG')
print('DAG run is done.')def sleeping_function():
print("Sleeping for 5 seconds")
- The dependencies will be set such that task2_1 and task2_2 will run concurrently after task1 has finished its execution. Task task3 will run only after both task2_1 and task2_2 have finished their execution.
- The dependencies will look like this:
Voila, it’s a DAG file
After compiling all the elements of the DAG, our final code should look like this:
You can refer to the given configuration file and code file:
Running the DAG
- In order to see the file running, activate the virtual environment and start your airflow webserver and scheduler.
- Go to http://localhost:8080/home (or your dedicated port for airflow) and you should see the following on the webserver UI:
- Activate the the “local_executor_demo” DAG, trigger it and go to the graph view. The DAGRun should look like this.
- You can also view the gantt chart of the execution by selecting the Gantt option. The gantt chart should look like this:
- We can see that sleepy_1 (task2_1) and sleepy_2 (task2_2) run in parallel in the DAG.
- Congratulations! We have implemented parallel execution in Airflow using the local executor.