Setting up Apache-Airflow in Ubuntu
By Hiren Rupchandani & Mukesh Kumar

Table of Contents
Step 1: Install pip on Ubuntu
Step 2: Install and set up a virtual environment using virtualenv
Step 3: Installing Airflow and necessary libraries
Step 4: Initialize Airflow Database
Step 5: Creating a new user
Step 6: Starting the Airflow scheduler and webserver
In the previous articles, we saw a high-level overview of data and data pipelines, followed by an introduction to Apache Airflow.
Continuing the series, in this article, we will install Apache Airflow on an Ubuntu machine using a virtual environment.
The following steps were performed on Ubuntu 18.04 and will work on Ubuntu 20.04 as well.
Step 1: Install pip on Ubuntu
- To set up a virtual environment, we need to install a Python package named virtualenv.
- We will use the pip command to install it.
- If pip is not already installed, simply run the following command in an Ubuntu terminal:
username@desktop_name:~$ sudo apt install python3-pip
[sudo] password for username:
- Type the Ubuntu password to proceed with the installation.
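Once the installation finishes, it is worth confirming that pip is reachable before moving on (the version and paths printed will differ on your machine):

```shell
# Confirm that pip is installed and usable from python3.
# The exact version number shown will vary by system.
python3 -m pip --version
```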
Step 2: Install and set up a virtual environment using virtualenv
- After successfully installing pip, we will now install the virtualenv package using the following command:
username@desktop_name:~$ sudo pip3 install virtualenv
[sudo] password for username:
OUTPUT:
Collecting virtualenv
- Let’s create a new directory (say, airflow_workspace) that will contain the virtual environment directory (created in the next step) and the airflow directory (which we will create manually).
- To create a virtual environment directory named “airflow_env” inside the “airflow_workspace” directory, execute the following command:
username@desktop_name:~/airflow_workspace$ virtualenv airflow_env
OUTPUT:
created virtual environment CPython3.8.10.final.0-64 in 841ms
.
.
activators BashActivator, CShellActivator, FishActivator, PowerShellActivator, PythonActivator

- We can activate the environment using the following command:
username@desktop_name:~/airflow_workspace$ source airflow_env/bin/activate
- The virtual environment’s name should now precede the terminal prompt, like:
(airflow_env) username@desktop_name:~/airflow_workspace$
- This indicates that the virtual environment has been activated.
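As an optional sanity check, you can confirm which interpreter is active; inside the virtual environment, Python’s sys.prefix should point at the airflow_env directory rather than the system installation:

```shell
# Print the active interpreter's prefix. Inside the virtual
# environment this points at .../airflow_workspace/airflow_env;
# outside it, it points at the system Python installation.
python3 -c "import sys; print(sys.prefix)"
```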
Step 3: Installing Airflow and necessary libraries
- Next, we will install airflow and the required libraries using the following command:
(airflow_env) username@desktop_name:~/airflow_workspace$ pip3 install apache-airflow[gcp,sentry,statsd]
OUTPUT:
Collecting apache-airflow[gcp,sentry,statsd]
.
Installing collected packages: (All the downloaded packages)
- This installation may take some time. Make sure you have an active internet connection. Grab a cup of coffee while the installation proceeds…
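As an aside, the official Airflow documentation recommends installing with a constraint file so that known-good, pinned dependency versions are used. A sketch of that approach looks like the following; the Airflow version 2.3.4 here is only an example, and you should substitute whichever version you intend to install:

```shell
# Example only: pick the Airflow version you want to install.
AIRFLOW_VERSION=2.3.4
# Derive the local Python major.minor version, e.g. "3.8".
PYTHON_VERSION="$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')"
# Airflow publishes one constraints file per (Airflow, Python) pair.
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
pip3 install "apache-airflow[gcp,sentry,statsd]==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
```

Quoting the package specifier also avoids problems in shells (such as zsh) that treat square brackets specially.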
- After a successful installation, we will also install some additional libraries, such as scikit-learn and pyspark, that you might need in the future.
(airflow_env) username@desktop_name:~/airflow_workspace$ pip3 install pyspark
(airflow_env) username@desktop_name:~/airflow_workspace$ pip3 install scikit-learn
Step 4: Initialize Airflow Database
- Now we will go to the airflow directory and initialize the airflow database using the following commands:
(airflow_env) username@desktop_name:~/airflow_workspace$ cd airflow
(airflow_env) username@desktop_name:~/airflow_workspace/airflow$ airflow db init
OUTPUT:
Modules imported successfully
Initialization done
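One caveat worth knowing: unless the AIRFLOW_HOME environment variable is set, Airflow writes its configuration files and database to ~/airflow by default. If you want those files to live in our airflow_workspace/airflow directory instead, export the variable before running any airflow command, in every terminal session you use:

```shell
# Point Airflow at our workspace directory instead of the
# default ~/airflow. This must be re-exported in each new
# terminal session (or added to your shell profile).
export AIRFLOW_HOME=~/airflow_workspace/airflow
```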
- We can now see some files and directories inside the airflow directory:

- It is time to create a dags folder. All future DAGs will be stored here and accessed by the Airflow components.
(airflow_env) username@desktop_name:~/airflow_workspace/airflow$ mkdir dags
Step 5: Creating a new user
- We have to create a new user on the first startup of Airflow.
- This can be done with the help of the “users create” command.
- To create a new user with the username admin and the Admin role, we can run the following command:
(airflow_env) username@desktop_name:~/airflow_workspace/airflow$ airflow users create --username admin --password your_password --firstname your_first_name --lastname your_last_name --role Admin --email your_email@some.com
- Run the following command to check if the user was created successfully:
(airflow_env) username@desktop_name:~/airflow_workspace/airflow$ airflow users list
OUTPUT:
id | username | email               | first_name      | last_name      | roles
===+==========+=====================+=================+================+======
 1 | admin    | your_email@some.com | your_first_name | your_last_name | Admin
Step 6: Starting the Airflow scheduler and webserver
- Now, start the Airflow scheduler using the airflow scheduler command:
(airflow_env) username@desktop_name:~/airflow_workspace/airflow$ airflow scheduler
- Open a new terminal in airflow_workspace, activate the virtual environment, change to the airflow directory, and start the webserver:
username@desktop_name:~/airflow_workspace$ source airflow_env/bin/activate
(airflow_env) username@desktop_name:~/airflow_workspace$ cd airflow
(airflow_env) username@desktop_name:~/airflow_workspace/airflow$ airflow webserver
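The commands above keep each process attached to its own terminal. As an optional alternative, Airflow’s CLI also accepts a -D (daemon) flag, which sends the process to the background so a single terminal suffices:

```shell
# Run the scheduler and webserver as background daemons (-D)
# instead of keeping each one attached to its own terminal.
airflow scheduler -D
airflow webserver -D
```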
- After the scheduler and webserver have started, open any browser and go to http://localhost:8080/.
- Port 8080 is the default port for the Airflow webserver. We see the following page:

- If the page doesn’t load, or port 8080 is occupied by some other program, simply open the airflow.cfg file inside the airflow directory and change the web_server_port option under the [webserver] section.
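Alternatively, instead of editing the configuration file, the webserver can be started on another port directly with the -p flag (8081 here is just an example):

```shell
# Start the Airflow webserver on port 8081 instead of the
# default 8080 (useful when 8080 is already in use).
airflow webserver -p 8081
```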
- After logging in using our airflow username and password, we should see the following webserver UI:

- These are some of the example DAGs that are available when we log in for the first time.
- If you see this page, congratulations! You have successfully installed Apache Airflow on your system.
- You can explore the UI, experiment with the DAGs, and see how they work.