Setting up Apache Airflow on macOS

By Hiren Rupchandani & Mukesh Kumar

Table of Contents

Step 1: Install pip on MacOS

Step 2: Install and set up a virtual environment using virtualenv

Step 3: Installing Airflow and necessary libraries

Step 4: Initialize Airflow Database

Step 5: Creating a new user

Step 6: Starting the Airflow scheduler and webserver

What’s Next?

In the previous articles, we saw a high-level overview of data and data pipelines, followed by an introduction to Apache Airflow.

Continuing the series, in this article we will install Apache Airflow on a macOS machine using a virtual environment.

We assume that you have Homebrew installed on your system. If not, refer to the official Homebrew website for installation instructions.

  • If pip is not installed, install Python through Homebrew (Homebrew has no separate pip formula; its Python formula ships with pip3) by running the following command in a Mac terminal:
username@device_name ~ % brew install python
  • After successfully installing pip, we will now install the virtualenv package using the following command:
username@device_name ~ % sudo pip3 install virtualenv
  • Let’s create a new directory (say, airflow_workspace) that will contain two subdirectories: the virtual environment directory, which we will create next, and the airflow directory, which you create manually.
  • To create a virtual environment directory named “airflow_env” inside the “airflow_workspace” directory, execute the following command from within airflow_workspace:
username@device_name~/airflow_workspace % virtualenv airflow_env
OUTPUT:

created virtual environment CPython3.8.10.final.0-64 in 841ms
.
.
activators BashActivator, CShellActivator, FishActivator, PowerShellActivator, PythonActivator
[Figure: airflow_workspace after creating the virtual environment (airflow_env) directory and the airflow directory]
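The workspace layout described above can be prepared with a couple of commands. This is only a minimal sketch: it assumes the workspace lives directly under your home directory, and the WORKSPACE variable is a name we picked for illustration. The virtualenv call is left as a comment because it needs virtualenv to be installed first.

```shell
# Assumption: the workspace lives directly under $HOME; override WORKSPACE to move it.
WORKSPACE="${WORKSPACE:-$HOME/airflow_workspace}"

# -p creates missing parent directories and does nothing if they already exist.
mkdir -p "$WORKSPACE/airflow"

# The virtual environment directory is created by virtualenv, not by hand:
#   cd "$WORKSPACE" && virtualenv airflow_env

# Show what the workspace contains so far.
ls "$WORKSPACE"
```

After virtualenv has run, the listing should show both airflow and airflow_env side by side.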
  • We can activate the environment using the following command:
username@device_name~/airflow_workspace % source airflow_env/bin/activate
  • We should now see the virtual environment name in front of the terminal prompt, like this:
(airflow_env) username@device_name~/airflow_workspace %
  • This indicates that the virtual environment has been activated.
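Besides looking at the prompt, we can ask the shell directly. The activate script exports a VIRTUAL_ENV variable holding the environment's path, so a small helper can report the state. The function name venv_status is our own invention for this sketch.

```shell
# Report whether a virtual environment is active in the current shell.
# The activate script exports VIRTUAL_ENV; deactivate unsets it again.
venv_status() {
    if [ -n "${VIRTUAL_ENV:-}" ]; then
        echo "active: $VIRTUAL_ENV"
    else
        echo "inactive"
    fi
}

venv_status
```

This is handy later on, when a second terminal is opened and it is easy to forget the activation step.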
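  • Next, we will install Airflow and the required libraries using the following command (the quotes stop zsh, the default macOS shell, from treating the square brackets as a glob pattern):
(airflow_env) username@device_name~/airflow_workspace % pip3 install 'apache-airflow[gcp,sentry,statsd]'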
OUTPUT:

Collecting apache-airflow[gcp,sentry,statsd]
.
Installing collected packages: (All the downloaded packages)
  • This installation may take some time. Make sure you have an active internet connection. Grab a cup of coffee while the installation proceeds…
  • After successful installation, we will also install some additional libraries, such as scikit-learn and pyspark, that you might need in the future. Note that the PyPI package is named scikit-learn; the old sklearn alias is deprecated.
(airflow_env) username@device_name~/airflow_workspace % pip3 install pyspark
(airflow_env) username@device_name~/airflow_workspace % pip3 install scikit-learn
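The Airflow install command above picks whatever dependency versions are newest at install time, which occasionally produces a broken combination. The Airflow project publishes constraint files to pin a known-good set. The sketch below builds such a command and prints it for review instead of running it. The version 2.2.3 is only an assumed example (it matches the Python 3.8 shown earlier); substitute the release you actually want, and make sure it supports your Python version.

```shell
# Assumption: 2.2.3 is only an example version; substitute the release you want.
AIRFLOW_VERSION="2.2.3"

# Major.minor version of the interpreter in the active environment, e.g. 3.8.
PYTHON_VERSION="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"

# Constraint files are published per Airflow release and per Python version.
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"

# Print the command rather than running it, so it can be reviewed first.
echo "pip3 install 'apache-airflow[gcp,sentry,statsd]==${AIRFLOW_VERSION}' --constraint '${CONSTRAINT_URL}'"
```

Copying the printed line into the activated environment performs the pinned installation.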
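  • Now we will go to the airflow directory, point the AIRFLOW_HOME environment variable at it, and initialize the Airflow database. Without AIRFLOW_HOME, Airflow keeps its files in ~/airflow instead, so the variable must be set in every terminal that runs an airflow command:
(airflow_env) username@device_name~/airflow_workspace % cd airflow
(airflow_env) username@device_name~/airflow_workspace/airflow % export AIRFLOW_HOME=$(pwd)
(airflow_env) username@device_name~/airflow_workspace/airflow % airflow db init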
OUTPUT:
Modules imported successfully
Initialization done
  • We can now see some files and directories inside the airflow directory.
  • It’s time to create a dags folder. All future DAGs will be stored here, and the Airflow components will read them from this folder.
(airflow_env) username@device_name~/airflow_workspace/airflow % mkdir dags
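Airflow decides where its configuration, database, and dags folder live from the AIRFLOW_HOME environment variable, falling back to ~/airflow when it is unset. Since the webserver is started from a second terminal later on, it can help to store the variable in a shell startup file. The following is a sketch under our own assumptions: the path matches the workspace used in this article, and persist_airflow_home is a helper name we made up.

```shell
# Assumption: the workspace path used throughout this article.
AIRFLOW_HOME="$HOME/airflow_workspace/airflow"
export AIRFLOW_HOME

# Append the export line to a shell startup file unless it is already there.
# $1 is the startup file; macOS uses zsh by default, so ~/.zshrc is typical.
persist_airflow_home() {
    rc_file="$1"
    line="export AIRFLOW_HOME=\"$AIRFLOW_HOME\""
    touch "$rc_file"
    # -x matches whole lines only, -F treats the pattern as a fixed string.
    grep -qxF "$line" "$rc_file" || echo "$line" >> "$rc_file"
}

# Usage: persist_airflow_home "$HOME/.zshrc"
```

Running the helper twice leaves a single line in the file, so it is safe to repeat.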
  • We have to create a new user on the first startup of airflow.
  • This can be done with the help of the “users create” command.
  • To create a new user with the username admin and the Admin role, we can run the following command:
(airflow_env) username@device_name~/airflow_workspace/airflow % airflow users create --username admin --password your_password --firstname your_first_name --lastname your_last_name --role Admin --email your_email@some.com
  • Run the following command to check if the user was created successfully:
(airflow_env) username@device_name~/airflow_workspace/airflow % airflow users list
OUTPUT:
id | username | email               | first_name      | last_name      | roles
===+==========+=====================+=================+================+======
1  | admin    | your_email@some.com | your_first_name | your_last_name | Admin
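Passing the password on the command line, as in the users create command above, leaves it in the shell history. As far as we know, the Airflow 2 CLI prompts for the password when the --password flag is omitted, so a small wrapper can avoid that. The function name create_admin and its argument order are our own choices for this sketch.

```shell
# Create an Admin user without putting the password on the command line.
# With no --password flag, the Airflow CLI asks for the password interactively
# (assumption: Airflow 2 behaviour), so it does not end up in the shell history.
# Arguments: username, first name, last name, email.
create_admin() {
    airflow users create \
        --username "$1" \
        --firstname "$2" \
        --lastname "$3" \
        --role Admin \
        --email "$4"
}

# Usage (inside the activated environment):
#   create_admin admin your_first_name your_last_name your_email@some.com
```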
  • Now start the airflow scheduler using the airflow scheduler command:
(airflow_env) username@device_name~/airflow_workspace/airflow % airflow scheduler
  • Start a new terminal in the airflow_workspace, activate the virtual environment, go to the airflow directory, and start the webserver.
username@device_name~/airflow_workspace % source airflow_env/bin/activate
(airflow_env) username@device_name~/airflow_workspace % cd airflow
(airflow_env) username@device_name~/airflow_workspace/airflow % airflow webserver
  • After the scheduler and webserver have been initialized, open any browser and go to http://localhost:8080/. Port 8080 is the default port for the Airflow webserver.
  • If the page does not load, or port 8080 is occupied by some other program, open the airflow.cfg file in the airflow directory, change the web_server_port setting in the [webserver] section, and restart the webserver.
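The port can also be changed without editing airflow.cfg. Airflow 2 maps environment variables of the form AIRFLOW__{SECTION}__{KEY} onto configuration settings, so the sketch below overrides the webserver port for the current terminal only. The value 8081 is just an assumed example of a free port.

```shell
# Overrides web_server_port in the [webserver] section for this shell session.
# Assumption: 8081 is only an example of a port that is free on your machine.
export AIRFLOW__WEBSERVER__WEB_SERVER_PORT=8081

# Equivalent one-off form, using the webserver's own --port flag:
#   airflow webserver --port 8081

echo "Webserver will listen on http://localhost:${AIRFLOW__WEBSERVER__WEB_SERVER_PORT}/"
```

The environment variable takes effect the next time the webserver is started from that terminal.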
[Figure: Airflow Login Page]
  • After logging in using our airflow username and password, we should see the webserver UI of airflow.
[Figure: Airflow Home Page]
  • The home page lists some of the prebuilt example DAGs that are available when we log in for the first time.
  • If you see this page, congratulations! You have successfully installed Apache Airflow on your system.
  • You can explore the UI, experiment with the DAGs, and see how they work.

What’s next?

Hello World in Apache-Airflow