Setting up Apache Airflow on macOS

Step 1: Install pip on macOS

  • If pip is not already installed, install Python via Homebrew (pip3 ships with it) by running the following command in a Mac terminal:
username@device_name ~ % brew install python
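  • You can verify that pip is available with the following command (the version number shown will vary):
username@device_name ~ % pip3 --version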

Step 2: Install and set up a virtual environment using virtualenv

  • After successfully installing pip, we will now install the virtualenv package using the following command:
username@device_name~ % pip3 install virtualenv
  • Next, let’s create a new directory (say, airflow_workspace) that will hold two subdirectories: the virtual environment directory (created in the next step) and the airflow directory (created manually).
  • To create a virtual environment named “airflow_env” inside the “airflow_workspace” directory, execute the following command:
username@device_name~/airflow_workspace % virtualenv airflow_env
OUTPUT:

created virtual environment CPython3.8.10.final.0-64 in 841ms
.
.
activators BashActivator, CShellActivator, FishActivator, PowerShellActivator, PythonActivator
airflow_workspace after creating the virtual environment (airflow_env) directory and the airflow directory
  • We can activate the environment using the following command:
username@device_name~/airflow_workspace % source airflow_env/bin/activate
  • The virtual environment’s name should now precede the terminal prompt, like so:
(airflow_env) username@device_name~/airflow_workspace %
  • This indicates that the virtual environment has been activated.
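  • As a sanity check, which python3 should now resolve to the interpreter inside the environment (the exact path depends on where you created airflow_workspace):
(airflow_env) username@device_name~/airflow_workspace % which python3
/Users/username/airflow_workspace/airflow_env/bin/python3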

Step 3: Installing Airflow and necessary libraries

  • Next, we will install airflow and the required extras using the following command (the extras are quoted so that zsh, the default macOS shell, does not try to expand the square brackets):
(airflow_env) username@device_name~/airflow_workspace % pip3 install 'apache-airflow[gcp,sentry,statsd]'
OUTPUT:

Collecting apache-airflow[gcp,sentry,statsd]
.
Installing collected packages: (All the downloaded packages)
  • This installation may take some time. Make sure you have an active internet connection. Grab a cup of coffee while the installation proceeds…
  • After a successful installation, we will also install some additional libraries, such as scikit-learn and pyspark, that you might need in the future (note that the PyPI package name is scikit-learn, not sklearn):
(airflow_env) username@device_name~/airflow_workspace % pip3 install pyspark
(airflow_env) username@device_name~/airflow_workspace % pip3 install scikit-learn
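  • To confirm that Airflow itself is now available, print its version (the exact version reported depends on what pip resolved):
(airflow_env) username@device_name~/airflow_workspace % airflow version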

Step 4: Initialize Airflow Database

  • By default, Airflow stores its configuration and metadata in ~/airflow. To keep everything inside our workspace instead, we will go to the airflow directory, point the AIRFLOW_HOME environment variable at it, and then initialize the airflow database:
(airflow_env) username@device_name~/airflow_workspace % cd airflow
(airflow_env) username@device_name~/airflow_workspace/airflow % export AIRFLOW_HOME=$(pwd)
(airflow_env) username@device_name~/airflow_workspace/airflow % airflow db init
OUTPUT:
Modules imported successfully
Initialization done
  • We can now see some files and directories inside the airflow directory.
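  • For example, listing the directory should show the generated files (the exact listing varies by Airflow version, but airflow.cfg, the main configuration file, and airflow.db, the SQLite metadata database, should be there):
(airflow_env) username@device_name~/airflow_workspace/airflow % ls
airflow.cfg airflow.db logs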
  • It’s time to create a dags folder. All future DAGs will be stored here, and the airflow components will read them from this location (a minimal example DAG follows below).
(airflow_env) username@device_name~/airflow_workspace/airflow % mkdir dags
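  • As a quick test, you can drop a minimal DAG into the new folder, for example as dags/hello_airflow.py. This is only a sketch; the file name, dag_id, and echo message are arbitrary placeholders:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# A minimal DAG with a single task that prints a greeting.
with DAG(
    dag_id="hello_airflow",
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,  # no schedule; trigger it manually from the UI
    catchup=False,
) as dag:
    say_hello = BashOperator(
        task_id="say_hello",
        bash_command="echo 'Hello, Airflow!'",
    )

  • Once the scheduler and webserver are running (Step 6), hello_airflow should show up in the DAG list alongside the examples.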

Step 5: Creating a new user

  • We have to create a new user on the first startup of airflow.
  • This can be done with the help of the “users create” command.
  • To create a new user with the username admin and the Admin role, run the following command (replace the password, name, and email placeholders with your own values):
(airflow_env) username@device_name~/airflow_workspace/airflow % airflow users create --username admin --password your_password --firstname your_first_name --lastname your_last_name --role Admin --email your_email@some.com
  • Run the following command to check if the user was created successfully:
(airflow_env) username@device_name~/airflow_workspace/airflow % airflow users list
OUTPUT:
id | username | email               | first_name      | last_name      | roles
===+==========+=====================+=================+================+======
1  | admin    | your_email@some.com | your_first_name | your_last_name | Admin

Step 6: Starting the Airflow scheduler and webserver

  • Now start the airflow scheduler using the airflow scheduler command (keep this terminal open; the scheduler must stay running):
(airflow_env) username@device_name~/airflow_workspace/airflow % airflow scheduler
  • Start a new terminal in the airflow_workspace, activate the virtual environment, go to the airflow directory (re-exporting AIRFLOW_HOME, since the new shell does not inherit it), and start the webserver:
username@device_name~/airflow_workspace % source airflow_env/bin/activate
(airflow_env) username@device_name~/airflow_workspace % cd airflow
(airflow_env) username@device_name~/airflow_workspace/airflow % export AIRFLOW_HOME=$(pwd)
(airflow_env) username@device_name~/airflow_workspace/airflow % airflow webserver
  • After the scheduler and webserver have been initialized, open any browser and go to http://localhost:8080/. Port 8080 is the default port for the Airflow webserver.
  • If that port is already occupied by another program, open the airflow.cfg file in the airflow directory and change the port number, as shown below.
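  • For example, to move the webserver to port 8081 (any free port works), change the web_server_port setting in the [webserver] section of airflow.cfg:

[webserver]
web_server_port = 8081

  • Alternatively, you can pass a port directly when starting the server: airflow webserver -p 8081.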
Airflow Login Page
  • After logging in with the username and password we just created, we should see the Airflow webserver UI.
Airflow Home Page
  • These are some of the example DAGs that ship with Airflow and are visible when we log in for the first time.
  • If you see this page, congratulations! You have successfully installed Apache Airflow on your system.
  • You can explore the UI, experiment with the DAGs, and see how they work.
