Local Executor in Airflow

By Hiren Rupchandani & Mukesh Kumar

Table of Contents

1. Setup for Local Executor

2. Creating a Python File

3. Importing the modules

4. Creating a DAG Object

5. Creating Tasks

6. Creating callable functions

7. Setting Dependencies

8. Voila, It’s a DAG File

9. Code Reference

10. Running the DAG

11. What’s Next?

In our last article, we saw a high-level overview of the various executors in Airflow. Now, we will see how the local executor can be used to achieve parallelism in Airflow. So, let’s get started…

Setup for Local Executor

  • Before we start implementing Airflow with the local executor, we need to set up MySQL (works on WSL2 and Ubuntu) on our system and configure Airflow to use it. You can refer to this guide to set up MySQL as a database backend.
  • After setting up the MySQL database, open the airflow.cfg file in the airflow directory, search for the executor variable, and change its value from SequentialExecutor to LocalExecutor. Save the file and run the Airflow webserver and scheduler to check that they work properly with the LocalExecutor.
  • If they are running properly, we can proceed with the DAG creation.
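After the edit, the relevant line in airflow.cfg should read as follows (the executor setting lives under the [core] section in a default Airflow 2.x configuration):

```ini
[core]
# Changed from the default SequentialExecutor:
executor = LocalExecutor
```

Then restart the webserver (`airflow webserver`) and scheduler (`airflow scheduler`) so the new setting takes effect.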

Creating a Python file

Similar to the previous article, we will create a DAG with multiple tasks inside it. We will set dependencies between those tasks so that two of them execute in parallel.

  • Create a new Python file named “local_executor_demo.py” inside the airflow/dags directory on your system and open it in your favorite editor.

Importing the modules

  • We will import the “DAG” module and the “operators.python” module from the airflow package.
  • We will import the “datetime” module to help us schedule the DAG.
  • Additionally, we will import the “time” module to observe the execution of tasks in the DAG.
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime, timedelta
import time

Creating a DAG object

  • We will now initialize a DAG object as follows (the schedule and start date shown are illustrative):
with DAG(
    dag_id="local_executor_demo",
    schedule_interval=timedelta(days=1),
    start_date=datetime(2021, 1, 1),
    catchup=False
) as dag:

Creating Tasks

  • We will create four tasks inside our DAG, named task1, task2_1, task2_2, and task3.
  • task1: Calls a “hello_function” and sleeps for 5 seconds.
  • task2_1: Calls a “sleeping_function” and sleeps for 5 seconds.
  • task2_2: Calls a “sleeping_function” and sleeps for 5 seconds.
  • task3: Calls a “bye_function”.
  • The task definitions will look like the following:

Creating callable functions

  • task1 calls a function named hello_function that prints some text and sleeps for 5 seconds using the time.sleep() function.
  • task2_1 and task2_2 call a function named sleeping_function that sleeps for 5 seconds using the time.sleep() function.
  • task3 calls a function named bye_function that prints some pre-defined text and indicates the end of the DAG run.
  • The callable functions will look like this:
def hello_function():
    print('Hello, this is the first task of the DAG')
    time.sleep(5)

def sleeping_function():
    print("Sleeping for 5 seconds")
    time.sleep(5)

def bye_function():
    print('DAG run is done.')

Setting dependencies

  • The dependencies will be set such that task2_1 and task2_2 run concurrently after task1 has finished its execution. task3 will run only after both task2_1 and task2_2 have finished their execution.
  • The dependencies will look like this:

Voila, it’s a DAG file

After compiling all the elements of the DAG, our final code should look like this:

DAG File

Code Reference

You can refer to the given configuration file and code file:

Running the DAG

  • To see the DAG run, activate the virtual environment and start your Airflow webserver and scheduler.
  • Go to http://localhost:8080/home (or your dedicated port for airflow) and you should see the following on the webserver UI:
Webserver Home Screen
  • Activate the “local_executor_demo” DAG, trigger it, and go to the graph view. The DAG run should look like this.
  • You can also view the Gantt chart of the execution by selecting the Gantt option. The Gantt chart should look like this:
Gantt Chart
  • We can see that sleepy_1 (task2_1) and sleepy_2 (task2_2) run in parallel in the DAG.
  • Congratulations! We have implemented parallel execution in Airflow using the local executor.
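The steps above, as terminal commands (the virtual environment path is an assumption, and the webserver and scheduler each need their own terminal since both block):

```shell
# Assumed virtualenv location -- adjust to your setup.
source ~/airflow_env/bin/activate

# Run each of these in a separate terminal:
airflow webserver --port 8080
airflow scheduler

# Optional: trigger the DAG from the CLI instead of the UI.
airflow dags trigger local_executor_demo
```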

What’s next?

Set Up Celery, Flower, and RabbitMQ for Airflow
