Setting up Apache Airflow on Windows using WSL

By Hiren Rupchandani & Mukesh Kumar

Table of Contents

Step 1: Install pip on WSL

Step 2: Install virtualenv package

Step 3: Create a virtual environment

Step 4: Set up an environment variable for the Airflow directory

Step 5: Installing Airflow and necessary libraries

Step 6: Initialize Airflow Database

Step 7: Creating a new user

Step 8: Starting the Airflow scheduler and webserver

What’s Next?

In our previous article, we set up Ubuntu 20.04 on Windows 10 as our Windows Subsystem for Linux (WSL) distribution. In this article, we will install Apache Airflow in WSL inside a virtual environment.

Step 1: Install pip on WSL

  • To set up a virtual environment, we need a Python package named virtualenv.
  • We will install it with the pip command.
  • If pip is not installed, run the following command in an Ubuntu terminal:
  • Type the Linux password when prompted to proceed with the installation.
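The command for this step is not reproduced above. A minimal sketch, assuming Ubuntu 20.04 (where pip for Python 3 is packaged as python3-pip), might look like this:

```shell
# Refresh the package index, then install pip for Python 3.
# sudo prompts for the Linux (WSL) user password; -y skips the
# separate "Do you want to continue?" confirmation.
sudo apt update
sudo apt install -y python3-pip

# Confirm that pip is available.
pip3 --version
```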

Step 2: Install virtualenv package

After installing pip, we will install the virtualenv package using the following command:

Step 3: Create a virtual environment

  • We will now create a virtual environment.
  • It keeps its libraries and dependencies separate from the global Python installation and from other projects, which avoids conflicts between them.
  • We can create a virtual environment in WSL using the following command:
  • We can activate the environment using the following command:
  • We should now see the virtual environment name at the start of the terminal prompt.
  • This indicates that the virtual environment has been activated and that the commands that follow will take effect only inside this environment.

Step 4: Set up an environment variable for the Airflow directory

  • Now, we will set up an environment variable that lets us navigate to the airflow directory easily every time we restart the system.
  • We will first create a directory named airflow on Windows. Our directory will be located at C:/Users/username/Documents/.
  • Next, we will find the directory's path in WSL from the command line and then store this path in our WSL environment.
  • In WSL, Windows drives are mounted under the mnt folder, so the Documents directory is located under our username directory in /mnt/c/Users.
  • We will now navigate to the airflow directory from the root directory using the following commands:
  • We will now copy the entire path, /mnt/c/Users/username/Documents/airflow, and store it in a variable.
  • To create an environment variable, open a new terminal and enter the following command to edit the bash startup script:
  • Add the following line anywhere in the file that opens:
  • This line saves the airflow directory path in an environment variable named AIRFLOW_HOME. Save the file and close all the open terminals.
  • The next time we start a terminal, we can simply write cd $AIRFLOW_HOME to go to the airflow directory.
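A minimal sketch of the step above, assuming the bash startup script is ~/.bashrc and using the placeholder path from the text ("username" stands for the Windows account name). Instead of opening an editor, this variant appends the line directly:

```shell
# Placeholder path: replace "username" with the Windows account name.
# Running pwd inside the airflow directory prints this path for copying.
AIRFLOW_DIR=/mnt/c/Users/username/Documents/airflow

# Append the export line to ~/.bashrc so every new terminal picks it up.
# Opening ~/.bashrc in an editor such as nano and pasting the line works too.
echo "export AIRFLOW_HOME=$AIRFLOW_DIR" >> ~/.bashrc

# Show the line that was written.
grep AIRFLOW_HOME ~/.bashrc
```

After reopening the terminal, `echo $AIRFLOW_HOME` should print the path, and `cd $AIRFLOW_HOME` should switch to the directory.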

Step 5: Installing Airflow and necessary libraries

  • Now we will open a new terminal and activate the virtual environment using the following command:
  • Next, we will install airflow and the required libraries using the following command:
  • This installation may take some time. Make sure you have an active internet connection. Grab a cup of coffee while the installation proceeds…
  • After Airflow installs successfully, we will also install some additional libraries such as scikit-learn (sklearn) and pyspark.

Step 6: Initialize Airflow Database

  • Now we will go to the airflow directory and initialize the airflow database using the following commands:
  • We can now see some files and directories inside the airflow directory.
  • It is time to create a dags folder. All future DAGs will be stored here and will be accessed by the Airflow components.
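These steps can be sketched as follows, assuming an Airflow 2.x installation and that AIRFLOW_HOME was exported in Step 4:

```shell
# Go to the Airflow directory.
cd "$AIRFLOW_HOME"

# Create the metadata database and the default configuration.
# Newer Airflow 2.x releases print a deprecation notice here and
# suggest "airflow db migrate" instead.
airflow db init

# List what was generated, such as airflow.cfg and airflow.db.
ls

# Create the folder that will hold the DAG files.
mkdir -p dags
```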

Step 7: Creating a new user

  • We have to create a new user on the first startup of airflow.
  • This can be done with the help of the “users create” command.
  • To create a new user named admin with the Admin role, we can run the following command:
  • Run the following command to check if the user was created successfully:

Step 8: Starting the Airflow scheduler and webserver

  • Now start the airflow scheduler using the airflow scheduler command:
  • Start a new terminal, activate the virtual environment, go to the airflow directory, and start the webserver.
  • After the scheduler and webserver have been initialized, open any browser and go to http://localhost:8080/.
  • Port 8080 is the default port for the Airflow webserver, and the login page should appear.
  • If the page does not load or the port is already in use by another program, open the airflow.cfg file and change the port number.
  • After logging in with our Airflow username and password, we should see the webserver UI with a list of DAGs.
  • These are some of the prebuilt DAGs that are available when we log in for the first time.
  • If you see this page, congratulations! You have successfully installed Apache Airflow on your system.
  • You can explore the UI, experiment with the DAGs, and see how they work.
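The two startup commands can be sketched as below for Airflow 2.x. Both run in the foreground, so each needs its own terminal. The environment name airflow_env and its location in the home directory are assumptions.

```shell
# Terminal 1: start the scheduler (keeps running in the foreground).
airflow scheduler

# Terminal 2: activate the environment, go to the Airflow directory,
# and start the webserver on the default port 8080.
source ~/airflow_env/bin/activate
cd "$AIRFLOW_HOME"
airflow webserver --port 8080
```

To use a different port, pass another value to `--port`, or change the `web_server_port` setting in the `[webserver]` section of airflow.cfg.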

What’s Next?

Hello World in Apache-Airflow
