Setting Up Apache Airflow on Windows Using WSL

Step 1: Install pip on WSL

  • To set up a virtual environment, we need the Python package virtualenv, which we will install with pip.
  • If pip is not installed yet, run the following command in an Ubuntu terminal:
username@desktop_name:~$ sudo apt install python3-pip
[sudo] password for username:
  • Type your Linux password to proceed with the installation.
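Before running the installer, it can help to check which of the required tools are already on the PATH. The loop below is a minimal sketch of such a check; the variable name missing_tools is an illustrative choice and not part of any tool.

```shell
# Collect the names of required tools that are not found on the PATH.
missing_tools=""
for tool in python3 pip3; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool found at $(command -v "$tool")"
    else
        echo "$tool is missing"
        missing_tools="$missing_tools $tool"
    fi
done
```

If pip3 is reported as missing, the apt command above installs it.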

Step 2: Install virtualenv package

username@desktop_name:~$ sudo pip3 install virtualenv
[sudo] password for username:
OUTPUT:
Collecting virtualenv

Step 3: Create a virtual environment

  • We will now create a virtual environment.
  • It keeps its libraries and dependencies separate from the system-wide Python installation and from those of any other project, which avoids conflicts between them.
  • We can create a virtual environment in WSL using the following command:
username@desktop_name:~$ virtualenv airflow_env
OUTPUT:
created virtual environment CPython3.8.10.final.0-64 in 841ms
  creator CPython3Posix(dest=/home/username/airflow_env, clear=False, no_vcs_ignore=False, global=False)
  seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/home/username/.local/share/virtualenv)
    added seed packages: pip==21.2.2, setuptools==57.4.0, wheel==0.36.2
  activators BashActivator,CShellActivator,FishActivator,PowerShellActivator,PythonActivator
  • We can activate the environment using the following command:
username@desktop_name:~$ source airflow_env/bin/activate
  • The name of the virtual environment should now precede the terminal prompt:
(airflow_env) username@desktop_name:~$
  • This indicates that the virtual environment is active and that the packages we install next will be installed only inside this environment.
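Besides looking at the prompt, we can check the activation from a script. The sketch below relies on the VIRTUAL_ENV variable that the activate script exports; the variable name venv_status is only illustrative.

```shell
# The activate script exports VIRTUAL_ENV; an empty value means no environment is active.
if [ -n "${VIRTUAL_ENV:-}" ]; then
    venv_status="active"
    echo "Virtual environment active: $VIRTUAL_ENV"
else
    venv_status="inactive"
    echo "No virtual environment active; run: source airflow_env/bin/activate"
fi
```

To leave the environment again, run the deactivate command that the activate script defines.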

Step 4: Set up an environment variable for the Airflow directory

  • Next, we will set an environment variable so that we can reach the airflow directory quickly in every new terminal session.
  • We will first create a directory named airflow on Windows. In this guide it is located at C:/Users/username/Documents/.
  • Then we will find the path of this directory in WSL using the command line and store that path in our WSL environment.
  • In WSL, Windows drives are mounted under the /mnt folder, so the C: drive is available at /mnt/c. The Documents directory sits under our username directory in /mnt/c/Users.
  • We will now navigate to the airflow directory from the root directory using the following commands:
username@desktop_name:~$ cd /
username@desktop_name:/$ ls
bin dev etc g init lib32 libx32 media opt root sbin srv tmp var
boot home lib lib64 lost+found mnt proc run snap sys usr
username@desktop_name:/$ cd mnt
username@desktop_name:/mnt$ cd c
username@desktop_name:/mnt/c$ ls
'$Recycle.Bin' AVScanner.ini MSOCache 'Program Files (x86)' Users pagefile.sys '$WinREAgent' Config.Msi Octave ProgramData Windows swapfile.sys 'Documents and Settings' PerfLogs Recovery temp tmp 'Program Files' 'System Volume Information' hiberfil.sys tools
username@desktop_name:/mnt/c$ cd Users
username@desktop_name:/mnt/c/Users$ ls
'All Users' Default 'Default User' username Public desktop.ini
username@desktop_name:/mnt/c/Users$ cd username
username@desktop_name:/mnt/c/Users/username$ ls
OneDrive Pictures PrintHood Anaconda3 Recent AppData Searches 'Application Data' myWebApp 'Creative Cloud Files' Templates Desktop Documents Downloads MicrosoftEdgeBackups Music 'My Documents'
username@desktop_name:/mnt/c/Users/username$ cd Documents
username@desktop_name:/mnt/c/Users/username/Documents$ ls
Zoom airflow desktop.ini
username@desktop_name:/mnt/c/Users/username/Documents$ cd airflow
username@desktop_name:/mnt/c/Users/username/Documents/airflow$
  • We will now copy this entire path, /mnt/c/Users/username/Documents/airflow, and store it in a variable.
  • To create the environment variable, open a new terminal and edit the ~/.bashrc file. The file belongs to your own user, so sudo is not needed:
username@desktop_name:~$ nano ~/.bashrc
  • Add the following line anywhere in the file (the end is a convenient place), then save and exit:
export AIRFLOW_HOME=/mnt/c/Users/username/Documents/airflow
  • Every new shell will now store the airflow directory path in an environment variable named AIRFLOW_HOME. Close all the open terminals.
  • The next time we start a terminal, we can simply write cd $AIRFLOW_HOME to go to the airflow directory.
username@desktop_name:~$ cd $AIRFLOW_HOME
username@desktop_name:/mnt/c/Users/username/Documents/airflow$
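The path mapping and the edit of the startup file can also be scripted. The following is a minimal sketch, not the only way to do it: win_to_wsl_path and add_airflow_home are hypothetical helper names, and the demonstration writes to a temporary file instead of the real ~/.bashrc.

```shell
# Hypothetical helper: map a Windows-style path (C:/Users/...) to its WSL mount point.
win_to_wsl_path() {
    win_path="$1"
    # Drive letter, lower-cased: "C" -> "c"
    drive="$(printf '%s' "${win_path%%:*}" | tr '[:upper:]' '[:lower:]')"
    # Everything after the colon, with backslashes turned into forward slashes
    rest="$(printf '%s' "${win_path#*:}" | tr '\\' '/')"
    # Drop a trailing slash, if any, and prepend the mount point
    printf '/mnt/%s%s\n' "$drive" "${rest%/}"
}

# Hypothetical helper: append the export line to a startup file only if it is not there yet.
add_airflow_home() {
    rc_file="$1"
    export_line="export AIRFLOW_HOME=$2"
    touch "$rc_file"
    if ! grep -qxF "$export_line" "$rc_file"; then
        printf '%s\n' "$export_line" >> "$rc_file"
    fi
}

# Demonstration on a temporary file so that the real ~/.bashrc is not modified.
demo_rc="$(mktemp)"
wsl_path="$(win_to_wsl_path 'C:/Users/username/Documents/airflow')"
add_airflow_home "$demo_rc" "$wsl_path"
add_airflow_home "$demo_rc" "$wsl_path"   # the second call adds nothing
cat "$demo_rc"
```

To apply it for real, pass "$HOME/.bashrc" as the first argument of add_airflow_home. Inside WSL, the built-in wslpath utility performs the same path conversion.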

Step 5: Installing Airflow and necessary libraries

  • Now we will open a new terminal and activate the virtual environment using the following command:
username@desktop_name:~$ source airflow_env/bin/activate
  • Next, we will install airflow and the required libraries using the following command:
(airflow_env) username@desktop_name:~$ pip3 install apache-airflow[gcp,sentry,statsd]
OUTPUT:
Collecting apache-airflow[gcp,sentry,statsd]
.
Installing collected packages: (All the downloaded packages)
  • This installation may take some time. Make sure you have an active internet connection. Grab a cup of coffee while the installation proceeds…
  • After a successful installation, we will also install some additional libraries, namely pyspark and scikit-learn. Note that the package is called scikit-learn on PyPI, even though it is imported as sklearn.
(airflow_env) username@desktop_name:~$ pip3 install pyspark
(airflow_env) username@desktop_name:~$ pip3 install scikit-learn
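The Airflow project publishes constraint files that pin dependency versions known to work together, and its installation guide recommends passing one to pip. The sketch below builds such a command. The version 2.2.3 is only an assumed example, so substitute the release you actually want; the fallback to Python 3.8 applies only when python3 cannot be queried. The script prints the command instead of running it.

```shell
# Assumption: the Airflow release to install; replace it with the version you need.
AIRFLOW_VERSION="2.2.3"

# Major.minor version of the active Python interpreter (falls back to 3.8 if python3 is unavailable).
PYTHON_VERSION="$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")' 2>/dev/null || echo 3.8)"

# Constraint file location for this Airflow and Python combination.
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"

# Print the resulting install command for review.
echo "pip3 install \"apache-airflow==${AIRFLOW_VERSION}\" --constraint \"${CONSTRAINT_URL}\""
```

Pinning the version this way makes the installation repeatable, which matters because an unpinned install can pull in dependency releases that Airflow has not been tested against.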

Step 6: Initialize Airflow Database

  • Now we will go to the airflow directory and initialize the airflow database using the following commands:
(airflow_env) username@desktop_name:~$ cd $AIRFLOW_HOME
(airflow_env) username@desktop_name:/mnt/c/Users/username/Documents/airflow$ airflow db init
OUTPUT:
Modules imported successfully
Initialization done
  • We can now see some files and directories inside the airflow directory:
(airflow_env) username@desktop_name:/mnt/c/Users/username/Documents/airflow$ ls
OUTPUT:
airflow.cfg airflow.db logs webserver_config.py
  • It is time to create a dags folder. All future DAGs will be stored here, where the Airflow components can find them.
(airflow_env) username@desktop_name:/mnt/c/Users/username/Documents/airflow$ mkdir dags
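The layout described above can be verified with a small script. This is a minimal sketch: check_airflow_home is a hypothetical helper, and the demonstration runs on a throwaway directory rather than on the real $AIRFLOW_HOME.

```shell
# Hypothetical helper: print the expected entries that are absent from an Airflow home directory.
check_airflow_home() {
    home_dir="$1"
    missing=""
    for entry in airflow.cfg airflow.db logs webserver_config.py dags; do
        [ -e "$home_dir/$entry" ] || missing="$missing $entry"
    done
    # Strip the leading space before printing
    echo "${missing# }"
}

# Demonstration: imitate the state right after "airflow db init" in a temporary directory.
demo_home="$(mktemp -d)"
touch "$demo_home/airflow.cfg" "$demo_home/airflow.db" "$demo_home/webserver_config.py"
mkdir -p "$demo_home/logs"
before="$(check_airflow_home "$demo_home")"

# mkdir -p does not fail if the directory already exists, so it is safe to repeat.
mkdir -p "$demo_home/dags"
after="$(check_airflow_home "$demo_home")"

echo "missing before: ${before:-none}; missing after: ${after:-none}"
```

Running check_airflow_home "$AIRFLOW_HOME" on the real directory should print an empty line once the dags folder exists.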

Step 7: Creating a new user

  • We have to create a new user before the first startup of Airflow.
  • This can be done with the airflow users create command.
  • To create a user named admin with the Admin role, we can run the following command:
(airflow_env) username@desktop_name:/mnt/c/Users/username/Documents/airflow$ airflow users create --username admin --password your_password --firstname your_first_name --lastname your_last_name --role Admin --email your_email@some.com
  • Run the following command to check if the user was created successfully:
(airflow_env) username@desktop_name:/mnt/c/Users/username/Documents/airflow$ airflow users list
OUTPUT:
id | username | email | first_name | last_name | roles
===+==========+=======+============+===========+======
1 | admin | your_email@some.com | your_first_name | your_last_name | Admin

Step 8: Starting the Airflow scheduler and webserver

  • Now start the Airflow scheduler using the airflow scheduler command:
(airflow_env) username@desktop_name:/mnt/c/Users/username/Documents/airflow$ airflow scheduler
  • The scheduler keeps running in this terminal, so open a new terminal, activate the virtual environment, go to the airflow directory, and start the webserver:
username@desktop_name:~$ source airflow_env/bin/activate
(airflow_env) username@desktop_name:~$ cd $AIRFLOW_HOME
(airflow_env) username@desktop_name:/mnt/c/Users/username/Documents/airflow$ airflow webserver
  • After the scheduler and webserver have started, open any browser and go to http://localhost:8080/.
  • Port 8080 is the default port of the Airflow webserver. We see the following page:
Airflow Login
  • If the page does not load because port 8080 is occupied by another program, change the web_server_port setting in the airflow.cfg file and restart the webserver.
  • After logging in with our Airflow username and password, we should see the following webserver UI.
Airflow Home Page
  • These are some of the example DAGs that are available when we log in for the first time.
  • If you see this page, congratulations! You have successfully installed Apache Airflow on your system.
  • You can explore the UI, experiment with the DAGs, and see how they work.
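For the port change mentioned above, the edit of airflow.cfg can be scripted as well. The following is a minimal sketch: set_webserver_port is a hypothetical helper, port 8081 is an arbitrary example, and the demonstration works on a temporary file that contains only the relevant setting.

```shell
# Hypothetical helper: rewrite the web_server_port line of an airflow.cfg-style file.
set_webserver_port() {
    cfg_file="$1"
    new_port="$2"
    tmp_file="$(mktemp)"
    # Replace the whole "web_server_port = ..." line and leave every other line untouched.
    sed "s/^web_server_port[[:space:]]*=.*/web_server_port = $new_port/" "$cfg_file" > "$tmp_file"
    # Write back through redirection so that the original file keeps its permissions.
    cat "$tmp_file" > "$cfg_file"
    rm -f "$tmp_file"
}

# Demonstration on a temporary file instead of the real airflow.cfg.
demo_cfg="$(mktemp)"
printf '[webserver]\nweb_server_port = 8080\n' > "$demo_cfg"
set_webserver_port "$demo_cfg" 8081
cat "$demo_cfg"
```

For a one-off change, the port can instead be passed on the command line with airflow webserver --port 8081, which leaves the configuration file as it is.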
