Setting up Apache-Airflow in Ubuntu

Step 1: Install pip on Ubuntu

  • To set up a virtual environment, we need to install a Python package named virtualenv.
  • We will install it with the pip command.
  • If pip is not installed, simply run the following command in an Ubuntu terminal:
username@desktop_name:~$ sudo apt install python3-pip
[sudo] password for username:
  • Type the Ubuntu password to proceed with the installation.
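
Before running the install, it can help to check whether pip is already present. The following is a minimal sketch using only the Python standard library; the helper name pip_available is ours and not part of any tool mentioned here.

```python
import importlib.util
import shutil


def pip_available() -> bool:
    """Return True if a pip3 executable is on PATH or the pip module is importable."""
    on_path = shutil.which("pip3") is not None
    importable = importlib.util.find_spec("pip") is not None
    return on_path or importable


if __name__ == "__main__":
    if pip_available():
        print("pip is already installed; Step 1 can be skipped")
    else:
        print("pip not found; run: sudo apt install python3-pip")
```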

Step 2: Install and set up a virtual environment using virtualenv

  • After successfully installing pip, we will now install the virtualenv package using the following command:
username@desktop_name:~$ sudo pip3 install virtualenv
[sudo] password for username:
OUTPUT:
Collecting virtualenv
  • Let’s create a new directory (say airflow_workspace). It will contain the virtual environment directory (created in the next step) and the airflow directory (created manually).
  • To create a virtual environment directory named “airflow_env” inside the “airflow_workspace” directory, execute the following command:
username@desktop_name:~/airflow_workspace$ virtualenv airflow_env
OUTPUT:

created virtual environment CPython3.8.10.final.0-64 in 841ms
.
.
activators BashActivator, CShellActivator, FishActivator, PowerShellActivator, PythonActivator
airflow_workspace after creating virtual environment (airflow_env) directory and airflow directory
  • We can activate the environment using the following command:
username@desktop_name:~/airflow_workspace$ source airflow_env/bin/activate
  • We should now see that our virtual environment name precedes the terminal command line like:
(airflow_env) username@desktop_name:~/airflow_workspace$
  • This indicates that the virtual environment has been activated.
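
The prompt prefix is one sign that the environment is active. We can also confirm it from Python itself. This is a small sketch that assumes the environment was created by a recent virtualenv (20 or later) or by the standard venv module; the function name in_virtual_environment is hypothetical.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a virtual environment.

    Inside an environment created by virtualenv 20+ or the stdlib venv
    module, sys.prefix points at the environment directory while
    sys.base_prefix still points at the system Python.
    """
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base_prefix


if __name__ == "__main__":
    print("interpreter prefix:", sys.prefix)
    print("inside a virtual environment:", in_virtual_environment())
```

Run it with the python3 of the activated environment. If it reports False, the activate script was probably not sourced in the current shell.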

Step 3: Installing Airflow and necessary libraries

  • Next, we will install airflow and the required libraries using the following command:
(airflow_env) username@desktop_name:~/airflow_workspace$ pip3 install apache-airflow[gcp,sentry,statsd]
OUTPUT:

Collecting apache-airflow[gcp,sentry,statsd]
.
Installing collected packages: (All the downloaded packages)
  • This installation may take some time. Make sure you have an active internet connection. Grab a cup of coffee while the installation proceeds…
  • After successful installation, we will also install some additional libraries like scikit-learn and pyspark that you might need in the future. Note that the package name on PyPI is scikit-learn; the old sklearn alias is deprecated.
(airflow_env) username@desktop_name:~/airflow_workspace$ pip3 install pyspark
(airflow_env) username@desktop_name:~/airflow_workspace$ pip3 install scikit-learn
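
The Airflow project publishes a constraints file for each release and Python version, which helps pip pick a set of dependency versions known to work together. The sketch below builds that URL and reports which of our packages are installed. The helper names are ours, and the release number 2.7.3 is only an example, so substitute the version you actually want.

```python
import sys
from importlib import metadata
from typing import Optional


def constraint_url(airflow_version: str, python_version: str) -> str:
    """Build the URL of the constraints file published for an Airflow release."""
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "2.7.3" is an example release, not a recommendation from this article.
    python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    url = constraint_url("2.7.3", python_version)
    print(f'pip3 install "apache-airflow==2.7.3" --constraint "{url}"')

    # Report what is currently installed in the active environment.
    for name in ("apache-airflow", "pyspark", "scikit-learn"):
        print(name, "->", installed_version(name) or "not installed")
```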

Step 4: Initialize Airflow Database

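  • Now we will go to the airflow directory and initialize the airflow database. Airflow keeps its files in the directory named by the AIRFLOW_HOME environment variable (~/airflow by default), so we first point it at our airflow directory:
(airflow_env) username@desktop_name:~/airflow_workspace$ cd airflow
(airflow_env) username@desktop_name:~/airflow_workspace/airflow$ export AIRFLOW_HOME=$(pwd)
(airflow_env) username@desktop_name:~/airflow_workspace/airflow$ airflow db init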
OUTPUT:
Modules imported successfully
Initialization done
  • We can now see some files and directories inside the airflow directory
Airflow Directory after ‘airflow db init’ command
  • It is time to create a dags folder. All the future dags will be stored here and will be accessed by the airflow components.
(airflow_env) username@desktop_name:~/airflow_workspace/airflow$ mkdir dags
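
If the files do not show up where we expect them, the usual cause is that Airflow used a different home directory. Airflow reads the AIRFLOW_HOME environment variable and falls back to ~/airflow when it is unset. The sketch below resolves that directory and lists what is still missing. The function names are hypothetical, and airflow.db is only expected with the default SQLite setup.

```python
import os
from pathlib import Path
from typing import List


def airflow_home() -> Path:
    """Return the directory Airflow uses for its config, database and logs."""
    return Path(os.environ.get("AIRFLOW_HOME", "~/airflow")).expanduser()


def missing_entries(home: Path) -> List[str]:
    """List the expected entries that are not present in the Airflow home."""
    # airflow.db assumes the default SQLite metadata database.
    expected = ["airflow.cfg", "airflow.db", "dags"]
    return [name for name in expected if not (home / name).exists()]


if __name__ == "__main__":
    home = airflow_home()
    print("Airflow home:", home)
    print("missing:", missing_entries(home) or "nothing")
```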

Step 5: Creating a new user

  • We have to create a new user on the first startup of airflow.
  • This can be done with the help of the “users create” command.
  • To create a new user with the username admin and the Admin role, we can run the following command:
(airflow_env) username@desktop_name:~/airflow_workspace/airflow$ airflow users create --username admin --password your_password --firstname your_first_name --lastname your_last_name --role Admin --email your_email@some.com
  • Run the following command to check if the user was created successfully:
(airflow_env) username@desktop_name:~/airflow_workspace/airflow$ airflow users list
OUTPUT:
id | username | email | first_name | last_name | roles
===+==========+=======+============+===========+======
1 | admin | your_email@some.com | your_first_name | your_last_name | Admin
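
When the setup is scripted, the same command can be assembled as an argument list and given a randomly generated password instead of a hand-typed one. This is only a sketch: the helper build_create_user_command is ours, and it uses just the flags shown in the command above.

```python
import secrets
import shlex
from typing import List


def build_create_user_command(
    username: str,
    password: str,
    firstname: str,
    lastname: str,
    email: str,
    role: str = "Admin",
) -> List[str]:
    """Assemble the 'airflow users create' command as an argument list."""
    return [
        "airflow", "users", "create",
        "--username", username,
        "--password", password,
        "--firstname", firstname,
        "--lastname", lastname,
        "--role", role,
        "--email", email,
    ]


if __name__ == "__main__":
    # Generate a random initial password; note it down before running the command.
    password = secrets.token_urlsafe(16)
    command = build_create_user_command(
        "admin", password, "your_first_name", "your_last_name", "your_email@some.com"
    )
    print(shlex.join(command))
```

To execute the command rather than print it, the list can be passed to subprocess.run(command, check=True) from inside the activated environment.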

Step 6: Starting the Airflow scheduler and webserver

  • Now start the airflow scheduler using the airflow scheduler command:
(airflow_env) username@desktop_name:~/airflow_workspace/airflow$ airflow scheduler
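  • Start a new terminal in the airflow_workspace, activate the virtual environment, go to the airflow directory, make sure AIRFLOW_HOME points to it in this shell as well (environment variables are not shared between terminals), and start the webserver.
username@desktop_name:~/airflow_workspace$ source airflow_env/bin/activate
(airflow_env) username@desktop_name:~/airflow_workspace$ cd airflow
(airflow_env) username@desktop_name:~/airflow_workspace/airflow$ export AIRFLOW_HOME=$(pwd)
(airflow_env) username@desktop_name:~/airflow_workspace/airflow$ airflow webserver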
  • After the scheduler and webserver have started, open any browser and go to http://localhost:8080/. We should see the Airflow login page.
  • Port 8080 is the default port for the Airflow webserver. If it is occupied by some other program, open the airflow.cfg file inside the airflow directory and change the web_server_port setting.
  • After logging in using our airflow username and password, we should see the webserver UI with a list of DAGs.
  • These are some of the prebuilt example DAGs that are available when we log in for the first time.
  • If you see this page, congratulations! You have successfully installed Apache Airflow on your system.
  • You can explore the UI, experiment with the dags, and see their workings.
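
If the page does not load, a quick way to narrow things down is to check which port the webserver is configured to use and whether anything is listening on it. The sketch below assumes an Airflow 2.x airflow.cfg, where the port is stored as web_server_port in the [webserver] section; the function names are ours.

```python
import configparser
import os
import socket
from pathlib import Path


def configured_port(cfg_path: Path, default: int = 8080) -> int:
    """Read [webserver] web_server_port from airflow.cfg, or return the default."""
    # Interpolation is disabled so '%' characters in other settings are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(cfg_path)  # a missing file is silently ignored
    return parser.getint("webserver", "web_server_port", fallback=default)


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    home = Path(os.environ.get("AIRFLOW_HOME", "~/airflow")).expanduser()
    port = configured_port(home / "airflow.cfg")
    state = "in use" if port_in_use(port) else "free"
    print(f"web_server_port {port} is {state}")
```

Before starting the webserver, "in use" means another program holds the port. After starting it, "in use" is what we want to see.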

INSAID

One of India’s leading institutions providing world-class Data Science & AI programs for working professionals with a mission to groom Data leaders of tomorrow!
