Local Executor in Airflow

Table of Contents

  • Setup for Local Executor
  • Creating a Python file
  • Importing the modules
  • Creating a DAG object
  • Creating Tasks
  • Creating callable functions
  • Setting dependencies
  • Running the DAG

Setup for Local Executor

  • Before we start implementing Airflow with the LocalExecutor, we need to set up MySQL on our system (this works for WSL2 and Ubuntu) and configure Airflow to use MySQL. You can refer to this guide to set up MySQL as the database backend.
  • After setting up the MySQL database, open the airflow.cfg file in the Airflow directory, search for the executor variable, and change its value from SequentialExecutor to LocalExecutor (the relevant snippet is shown after this list). Save the file and run the Airflow webserver and scheduler to check that they start properly with the LocalExecutor.
  • If they run properly, we can proceed with creating the DAG.
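
The change in airflow.cfg is a single line under the [core] section. A minimal sketch of the relevant part (assuming the default Airflow home of ~/airflow):

[core]
# Run tasks as parallel subprocesses on this machine instead of one at a time
executor = LocalExecutor

With the file saved, start the two services in separate terminals with airflow webserver and airflow scheduler.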

Creating a Python file

  • Create a new Python file named “local_executor_demo.py” inside the airflow/dags directory and open it in your favorite editor.

Importing the modules

  • We will import the DAG class and the PythonOperator from the airflow package.
  • We will import the datetime module to help us schedule the DAG.
  • Additionally, we will import the time module so we can observe the execution of the tasks in the DAG.
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime, timedelta
import time

Creating a DAG object

  • We will now initialize a DAG object as follows:
with DAG(
    dag_id="local_executor_demo",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@hourly",
    catchup=False
) as dag:

Creating Tasks

  • We will create four tasks inside our DAG, named task1, task2_1, task2_2, and task3.
  • task1: Calls hello_function and sleeps for 5 seconds.
  • task2_1: Calls sleeping_function and sleeps for 5 seconds.
  • task2_2: Calls sleeping_function and sleeps for 5 seconds.
  • task3: Calls last_function (with the task_id “bye_function”) to mark the end of the run.
  • The task definitions will look like the following:
task1 = PythonOperator(
    task_id="hello_function",
    python_callable=hello_function
)
task2_1 = PythonOperator(
    task_id="sleepy_1",
    python_callable=sleeping_function
)
task2_2 = PythonOperator(
    task_id="sleepy_2",
    python_callable=sleeping_function
)
task3 = PythonOperator(
    task_id="bye_function",
    python_callable=last_function
)

Creating callable functions

  • task1 calls a function named hello_function that prints some text and sleeps for 5 seconds using time.sleep().
  • task2_1 and task2_2 call a function named sleeping_function that sleeps for 5 seconds using time.sleep().
  • task3 calls a function named last_function that prints some predefined text to indicate the end of the DAG run.
  • The callable functions will look like this:
def hello_function():
    print('Hello, this is the first task of the DAG')
    time.sleep(5)

def last_function():
    print('DAG run is done.')

def sleeping_function():
    print("Sleeping for 5 seconds")
    time.sleep(5)

Setting dependencies

  • The dependencies are set so that task2_1 and task2_2 run concurrently after task1 has finished its execution, and task3 runs only after both task2_1 and task2_2 have finished.
  • The dependencies will look like this:
task1 >> [task2_1, task2_2] >> task3

Voila, it’s a DAG file

DAG File

Code Reference
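
Putting the snippets together (with the callables defined above the DAG block so the operators can reference them), the complete local_executor_demo.py looks roughly like this:

from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime
import time

# Callables used by the PythonOperators below
def hello_function():
    print('Hello, this is the first task of the DAG')
    time.sleep(5)

def sleeping_function():
    print("Sleeping for 5 seconds")
    time.sleep(5)

def last_function():
    print('DAG run is done.')

with DAG(
    dag_id="local_executor_demo",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@hourly",
    catchup=False
) as dag:
    task1 = PythonOperator(
        task_id="hello_function",
        python_callable=hello_function
    )
    task2_1 = PythonOperator(
        task_id="sleepy_1",
        python_callable=sleeping_function
    )
    task2_2 = PythonOperator(
        task_id="sleepy_2",
        python_callable=sleeping_function
    )
    task3 = PythonOperator(
        task_id="bye_function",
        python_callable=last_function
    )

    # task2_1 and task2_2 run in parallel after task1; task3 runs after both finish
    task1 >> [task2_1, task2_2] >> task3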

Running the DAG

  • To see the DAG running, activate the virtual environment and start your Airflow webserver and scheduler.
  • Go to http://localhost:8080/home (or the port you dedicated to Airflow) and you should see the following on the webserver UI:
Webserver Home Screen
  • Activate the “local_executor_demo” DAG, trigger it, and go to the Graph view (the same steps can also be done from the CLI, as shown after this list). The DAG run should look like this:
DAG Run
  • You can also view the Gantt chart of the execution by selecting the Gantt option. The Gantt chart should look like this:
Gantt Chart
  • We can see that sleepy_1 (task2_1) and sleepy_2 (task2_2) run in parallel in the DAG.
  • Congratulations! We have implemented parallel execution in Airflow using the local executor.
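
If you prefer the command line, the unpause-and-trigger step can also be done with the Airflow 2.x CLI (a minimal sketch, assuming your virtual environment is active and the scheduler is running):

# unpause the DAG so the scheduler picks it up, then trigger a run
airflow dags unpause local_executor_demo
airflow dags trigger local_executor_demo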

