October 21, 2021

How To Install Apache Airflow 2.1 On Ubuntu 20.04 LTS

In this article we will discuss how to install Apache Airflow 2.1 on the Ubuntu 20.04 LTS operating system.

Introduction

Apache Airflow is an open-source workflow management platform for building data pipelines. Airflow is written in Python, and workflows are created via Python scripts. Airflow is designed under the principle of “configuration as code”. Airflow was started at Airbnb in October 2014 as a solution to manage the company’s increasingly complex workflows. From the beginning, the project was open source, becoming an Apache Incubator project in March 2016 and a top-level Apache Software Foundation project in January 2019. Airflow uses directed acyclic graphs (DAGs) to manage workflow orchestration: tasks and dependencies are defined in Python, and Airflow then manages the scheduling and execution.
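As a rough sketch of the DAG idea (plain Python, not Airflow's actual API; the task names are hypothetical), each task runs only after all of its dependencies have finished, which amounts to a topological ordering of the graph. This uses the standard-library `graphlib` module (Python 3.9+) purely for illustration:

```python
# Sketch of the DAG concept behind Airflow: tasks plus dependencies,
# run in topological order. Task names here are hypothetical.
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Maps each task to the set of tasks it depends on.
deps = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}

# static_order() yields tasks so that dependencies always come first.
order = list(TopologicalSorter(deps).static_order())
print(order)  # -> ['extract', 'transform', 'load', 'notify']
```

In real Airflow code the same shape is declared with operators and dependency arrows (e.g. `extract >> transform >> load`), and the scheduler handles the ordering for you.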



Apache Airflow Installation on Ubuntu 20.04 LTS

In this article we will use the latest stable release of Apache Airflow, version 2.1, which was released on July 2, 2021. We will install Airflow as a standalone instance on a local machine. As stated in the Airflow documentation, only installation via pip is currently officially supported, so we first have to install pip on our local machine. The installation process is explained in detail below.

Prerequisite

There are several prerequisites that must be met before we can install Airflow on Ubuntu 20.04 LTS. The prerequisites are as follows:

  • An Ubuntu 20.04 LTS Server with sufficient disk space
  • An account with sudo or root access to run privileged commands.
  • Python 3.6, 3.7, or 3.8 (3.9 is not supported)
  • A database installed on the server: PostgreSQL (9.6, 10, 11, 12, 13), MySQL (5.7, 8), or SQLite (3.15.0+)

In our test environment, we will use MySQL 5.7 as the database for Apache Airflow.

Apache Airflow Installation

1. Installing pip

pip is a package management system used to install software packages written in Python. Several steps must be completed to install pip:

  1. sudo apt-get install software-properties-common
  2. sudo apt-add-repository universe
  3. sudo apt-get update
  4. sudo apt-get install python-setuptools
  5. sudo apt install python3-pip
ramans@otodiginet:~$ sudo apt-get install software-properties-common
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  python3-software-properties software-properties-gtk
The following packages will be upgraded:
  python3-software-properties software-properties-common software-properties-gtk
3 upgraded, 0 newly installed, 0 to remove and 541 not upgraded.
Need to get 0 B/99.7 kB of archives.
After this operation, 0 B of additional disk space will be used.
Do you want to continue? [Y/n] Y
(Reading database ... 142625 files and directories currently installed.)
ramans@otodiginet:~$ sudo apt-add-repository universe
'universe' distribution component is already enabled for all sources.
ramans@otodiginet:~$ sudo apt-get update
Hit:1 http://security.ubuntu.com/ubuntu focal-security InRelease             
Hit:2 http://us.archive.ubuntu.com/ubuntu focal InRelease                    
Hit:3 http://us.archive.ubuntu.com/ubuntu focal-updates InRelease
Hit:4 http://us.archive.ubuntu.com/ubuntu focal-backports InRelease
Reading package lists... Done
Installing prerequisite packages



ramans@otodiginet:~$ sudo apt-get install python-setuptools
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  libpython2-stdlib libpython2.7-minimal libpython2.7-stdlib python-pkg-resources python2 python2-minimal python2.7
  python2.7-minimal
Suggested packages:
  python-setuptools-doc python2-doc python-tk python2.7-doc binutils binfmt-support
The following NEW packages will be installed:
  libpython2-stdlib libpython2.7-minimal libpython2.7-stdlib python-pkg-resources python-setuptools python2
  python2-minimal python2.7 python2.7-minimal
0 upgraded, 9 newly installed, 0 to remove and 541 not upgraded.
Need to get 4,275 kB of archives.
After this operation, 18.5 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Installing python-setuptools
ramans@otodiginet:~$ sudo apt install python3-pip
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  binutils binutils-common binutils-x86-64-linux-gnu build-essential cpp-9 dpkg-dev fakeroot g++ g++-9 gcc gcc-10-base
  gcc-9 gcc-9-base libalgorithm-diff-perl libalgorithm-diff-xs-perl libalgorithm-merge-perl libasan5 libatomic1
  libbinutils libc-dev-bin libc6 libc6-dbg libc6-dev libcc1-0 libcrypt-dev libctf-nobfd0 libctf0 libexpat1-dev libfakeroot
  libgcc-9-dev libgcc-s1 libgomp1 libitm1 liblsan0 libpython3-dev libpython3.8 libpython3.8-dev libpython3.8-minimal
  libpython3.8-stdlib libquadmath0 libstdc++-9-dev libstdc++6 libtsan0 libubsan1 linux-libc-dev make manpages-dev
  python-pip-whl python3-dev python3-distutils python3-lib2to3 python3-setuptools python3-wheel python3.8 python3.8-dev
  python3.8-minimal zlib1g zlib1g-dev
Suggested packages:
  binutils-doc gcc-9-locales debian-keyring g++-multilib g++-9-multilib gcc-9-doc gcc-multilib autoconf automake libtool
  flex bison gcc-doc gcc-9-multilib glibc-doc libstdc++-9-doc make-doc python-setuptools-doc python3.8-venv python3.8-doc
  binfmt-support
Installing python3-pip

2. Installing Airflow Dependencies Packages

At this stage, we will install all of the dependency packages required by Airflow. By default, Apache Airflow uses SQLite as its main database, but if we need a more scalable database, we can use MySQL, PostgreSQL, or others. For tutorial purposes, using SQLite is acceptable.
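If we later switch from SQLite to a server database such as MySQL, the change is made via the `sql_alchemy_conn` setting in `airflow.cfg` (under `$AIRFLOW_HOME`). The connection details below are hypothetical placeholders:

```ini
# $AIRFLOW_HOME/airflow.cfg -- hypothetical credentials, adjust to your setup
[core]
# Format: mysql+mysqldb://<user>:<password>@<host>:<port>/<database>
sql_alchemy_conn = mysql+mysqldb://airflow_user:airflow_pass@localhost:3306/airflow_db
```

After changing the connection string, `airflow db init` must be run again against the new database.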

ramans@otodiginet:~$ sudo apt-get install libmysqlclient-dev
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  libmysqlclient21 libssl-dev libssl1.1
Suggested packages:
  libssl-doc
The following NEW packages will be installed:
  libmysqlclient-dev libssl-dev
The following packages will be upgraded:
  libmysqlclient21 libssl1.1
2 upgraded, 2 newly installed, 0 to remove and 523 not upgraded.
ramans@otodiginet:~$ sudo apt-get install libssl-dev
Reading package lists... Done
Building dependency tree
Reading state information... Done
libssl-dev is already the newest version (1.1.1r-1ubuntu2.4).
libssl-dev set to manually installed
0 upgraded, 0 newly installed, 0 to remove and 523 not upgraded
Installing libssl-dev
ramans@otodiginet:~$ sudo apt-get install libkrb5-dev
[sudo] password for ramans: 
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  comerr-dev krb5-multidev libgssapi-krb5-2 libgssrpc4 libk5crypto3 libkadm5clnt-mit11
  libkadm5srv-mit11 libkdb5-9 libkrb5-3 libkrb5support0
Suggested packages:
  doc-base krb5-doc krb5-user
The following NEW packages will be installed:
  comerr-dev krb5-multidev libgssrpc4 libkadm5clnt-mit11 libkadm5srv-mit11 libkdb5-9
  libkrb5-dev
The following packages will be upgraded:
  libgssapi-krb5-2 libk5crypto3 libkrb5-3 libkrb5support0
4 upgraded, 7 newly installed, 0 to remove and 519 not upgraded.
Need to get 913 kB of archives.
After this operation, 2,111 kB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Get:1 http://us.archive.ubuntu.com/ubuntu focal-updates/main amd64 libgssapi-krb5-2 amd64 1.17-6ubuntu4.1 [121 kB]
Get:2 http://us.archive.ubuntu.com/ubuntu focal-updates/main amd64 libkrb5-3 amd64 1.17-6ubuntu4.1 [330 kB]
Get:3 http://us.archive.ubuntu.com/ubuntu focal-updates/main amd64 libkrb5support0 amd64 1.17-6ubuntu4.1 [30.9 kB]
Installing libkrb5-dev

3. Installing Apache Airflow 2.1

After all the requirements and package dependencies are installed, it is time to start installing Apache Airflow. Airflow will be installed inside a Python virtual environment, so we have to set up the virtual environment first. Here are the steps:

  1. sudo apt install python3-virtualenv
  2. virtualenv airflow_otodigi
  3. cd airflow_otodigi/
  4. source bin/activate
  5. export AIRFLOW_HOME=~/airflow
  6. pip3 install apache-airflow
  7. pip3 install typing_extensions
  8. airflow db init
  9. airflow webserver -p 8080



ramans@otodiginet:~$ sudo apt install python3-virtualenv
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  python3-appdirs python3-distlib python3-filelock python3-importlib-metadata python3-more-itertools
  python3-zipp
The following NEW packages will be installed:
  python3-appdirs python3-distlib python3-filelock python3-importlib-metadata python3-more-itertools
  python3-virtualenv python3-zipp
0 upgraded, 7 newly installed, 0 to remove and 523 not upgraded.
Need to get 252 kB of archives.
After this operation, 1,304 kB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Get:1 http://us.archive.ubuntu.com/ubuntu focal/main amd64 python3-appdirs all 1.4.3-2.1 [10.8 kB]
ramans@otodiginet:~$ virtualenv airflow_otodigi
created virtual environment CPython3.8.10.final.0-64 in 567ms
  creator CPython3Posix(dest=/home/ramans/airflow_otodigi, clear=False, global=False)
  seeder FromAppData(download=False, pip=latest, setuptools=latest, wheel=latest, pkg_resources=latest, via=copy, app_data_dir=/home/ramans/.local/share/virtualenv/seed-app-data/v1.0.1.debian.1)
  activators BashActivator,CShellActivator,FishActivator,PowerShellActivator,PythonActivator,XonshActivator
ramans@otodiginet:~$ ls -lt
total 36
drwxrwxr-x 4 ramans ramans 4096 Jul 29 02:44 airflow_otodigi
ramans@otodiginet:~$ cd airflow_otodigi/
ramans@otodiginet:~/airflow_otodigi$ ls -l
total 12
drwxrwxr-x 2 ramans ramans 4096 Jul 29 02:44 bin
drwxrwxr-x 3 ramans ramans 4096 Jul 29 02:44 lib
-rw-rw-r-- 1 ramans ramans  203 Jul 29 02:44 pyvenv.cfg
ramans@otodiginet:~/airflow_otodigi$ cd bin
ramans@otodiginet:~/airflow_otodigi/bin$ ls
activate       activate.ps1      easy_install      pip      pip3.8   python3.8  wheel-3.8
activate.csh   activate_this.py  easy_install3     pip3     python   wheel
activate.fish  activate.xsh      easy_install-3.8  pip-3.8  python3  wheel3
ramans@otodiginet:~/airflow_otodigi/bin$ source activate
(airflow_otodigi) ramans@otodiginet:~/airflow_otodigi/bin$ export AIRFLOW_HOME=~/airflow
(airflow_otodigi) ramans@otodiginet:~/airflow_otodigi/bin$ pip3 install apache-airflow
Collecting apache-airflow
  Using cached apache_airflow-2.1.2-py3-none-any.whl (5.2 MB)
Collecting inflection>=0.3.1
  Using cached inflection-0.5.1-py2.py3-none-any.whl (9.5 kB)
Collecting lazy-object-proxy
  Using cached lazy_object_proxy-1.6.0-cp38-cp38-manylinux1_x86_64.whl (58 kB)
Collecting httpx
  Using cached httpx-0.18.2-py3-none-any.whl (76 kB)
Collecting flask-wtf<0.15,>=0.14.3
  Using cached Flask_WTF-0.14.3-py2.py3-none-any.whl (13 kB)
Collecting iso8601>=0.1.12
  Using cached iso8601-0.1.16-py2.py3-none-any.whl (10 kB)
Collecting blinker
Apache Airflow installation via pip3
(airflow_otodigi) ramans@otodiginet:~/airflow_otodigi/bin$ pip3 install typing_extensions
Collecting typing_extensions
  Using cached typing_extensions-3.10.0.0-py3-none-any.whl (26 kB)
Installing collected packages: typing-extensions
Successfully installed typing-extensions-3.10.0.0
(airflow_otodigi) ramans@otodiginet:~/airflow_otodigi/bin$ airflow db init
DB: sqlite:////home/ramans/airflow/airflow.db
[2021-07-29 02:57:04,096] {db.py:692} INFO - Creating tables
INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade  -> e3a246e0dc1, current schema
INFO  [alembic.runtime.migration] Running upgrade e3a246e0dc1 -> 1507a7289a2f, create is_encrypted
/home/ramans/airflow_otodigi/lib/python3.8/site-packages/alembic/ddl/sqlite.py:43 UserWarning: Skipping unsupported ALTER for creation of implicit constraint. Please refer to the batch mode feature which allows for SQLite migrations using a copy-and-move strategy.
INFO  [alembic.runtime.migration] Running upgrade 1507a7289a2f -> 13eb55f81627, maintain history for compatibility with earlier migrations
INFO  [alembic.runtime.migration] Running upgrade 13eb55f81627 -> 338e90f54d61, More logging into task_instance
INFO  [alembic.runtime.migration] Running upgrade 338e90f54d61 -> 52d714495f0, job_id indices
SQLite database initialization for Apache Airflow

The next step is to start up the Airflow webserver. We will use port 8080 for the Apache Airflow application; we are free to use any port as long as it does not conflict with other applications on the server. The command is shown below.
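Instead of passing `-p` on every start, the port can also be set once in `airflow.cfg` under the `[webserver]` section (shown here with the default value):

```ini
# $AIRFLOW_HOME/airflow.cfg
[webserver]
web_server_port = 8080
```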

(airflow_otodigi) ramans@otodiginet:~/airflow_otodigi/bin$ airflow webserver -p 8080
  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
[2021-07-29 03:04:32,130] {dagbag.py:496} INFO - Filling up the DagBag from /dev/null
[2021-07-29 03:04:32,232] {manager.py:788} WARNING - No user yet created, use flask fab command to do it.
Running the Gunicorn Server with:
Workers: 4 sync
Host: 0.0.0.0:8080
Timeout: 120
Logfiles: - -
Access Logformat: 
=================================================================            
[2021-07-29 03:04:37 -0700] [32944] [INFO] Starting gunicorn 20.1.0
[2021-07-29 03:04:37 -0700] [32944] [INFO] Listening at: http://0.0.0.0:8080 (32944)
[2021-07-29 03:04:37 -0700] [32944] [INFO] Using worker: sync
[2021-07-29 03:04:37 -0700] [32946] [INFO] Booting worker with pid: 32946
[2021-07-29 03:04:37 -0700] [32947] [INFO] Booting worker with pid: 32947
[2021-07-29 03:04:37 -0700] [32948] [INFO] Booting worker with pid: 32948
[2021-07-29 03:04:37 -0700] [32949] [INFO] Booting worker with pid: 32949
[2021-07-29 03:04:42,443] {manager.py:788} WARNING - No user yet created, use flask fab command to do it.
[2021-07-29 03:04:42,593] {manager.py:788} WARNING - No user yet created, use flask fab command to do it.
[2021-07-29 03:04:42,611] {manager.py:788} WARNING - No user yet created, use flask fab command to do it.
[2021-07-29 03:04:42,748] {manager.py:788} WARNING - No user yet created, use flask fab command to do it.
Starting the Apache Airflow webserver



When Apache Airflow starts for the first time, no user with the privilege to access the Airflow application has been created yet, so we are prompted with the warning “No user yet created, use flask fab command to do it.”, as shown above. Don’t worry about this message; to resolve it, create an admin user with the following command.

(airflow_otodigi) ramans@otodiginet:~/airflow_otodigi/bin$ airflow users create \
>           --username admin \
>           --firstname Admin \
>           --lastname Otodiginet \
>           --role Admin \
>           --email admin@otodiginet.com
Adding a new admin user for Apache Airflow

After creating the user, rerun the Airflow webserver; the warning “No user yet created, use flask fab command to do it.” will no longer appear. Before we start using Apache Airflow, we must also start the Airflow scheduler with the command `airflow scheduler`. This command is executed in a different console from the one running the Airflow webserver.

(airflow_otodigi) ramans@otodiginet:~/airflow_otodigi/bin$ export AIRFLOW_HOME=~/airflow
(airflow_otodigi) ramans@otodiginet:~/airflow_otodigi/bin$ airflow scheduler
  ____________       _____________
 ____    |__( )_________  __/__  /________      __
____  /| |_  /__  ___/_  /_ __  /_  __ \_ | /| / /
___  ___ |  / _  /   _  __/ _  / / /_/ /_ |/ |/ /
 _/_/  |_/_/  /_/    /_/    /_/  \____/____/|__/
 * Serving Flask app "airflow.utils.serve_logs" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
Starting up the Airflow scheduler

At this stage we can verify whether the installation was successful. To do the test, open a web browser and go to the URL http://localhost:8080, using the port we specified earlier.

Apache Airflow Login Page



Apache Airflow Main Dashboard
Apache Airflow Scheduler
Apache Airflow Graph View

Conclusion

We have shown in detail how to install Apache Airflow on Ubuntu 20.04 LTS. I hope this article is useful for those in need. A more detailed explanation of Apache Airflow can be found on the Apache Airflow official website.
