December 3, 2021
How to run Airflow on Docker CE

Running Apache Airflow In Docker

In this tutorial, we will learn how to run and deploy Apache Airflow 2.2 in Docker Community Edition.

Introduction

Apache Airflow is an open-source workflow management platform for building data pipelines. Airflow uses directed acyclic graphs (DAGs) to manage workflow orchestration. Tasks and dependencies are defined in Python, and Airflow then manages their scheduling and execution. In this article, we will discuss the procedure for running Apache Airflow in Docker containers (Community Edition).
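To make the DAG idea concrete, here is a minimal sketch that uses only the Python standard library (the graphlib module, Python 3.9 or newer), not Airflow itself. The task names extract, transform and load are hypothetical, and the helper functions are illustrative rather than part of any Airflow API. It only shows how a set of dependencies determines a valid run order.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical three-step pipeline: each key maps to the tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}


def execution_order(deps):
    """Return a run order in which every task comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def is_acyclic(deps):
    """Return True when the dependencies form a DAG, i.e. contain no cycle."""
    try:
        execution_order(deps)
    except CycleError:
        return False
    return True


order = execution_order(dependencies)
print(order)  # ['extract', 'transform', 'load']
```

A scheduler such as Airflow adds timing, retries and distributed execution on top of this ordering; the sketch covers the ordering only.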

Running Apache Airflow in Docker

In this tutorial, we will discuss how to run Airflow in the Docker CE environment. The tutorial is based on the guide provided by the Apache Airflow team, which uses the CeleryExecutor in Docker. The deployment process consists of several stages, namely:

  1. Prerequisite
  2. Fetching docker-compose.yaml
  3. Initializing Environment
    • Setting the right Airflow user
    • Initialize the database
  4. Running Apache Airflow
  5. Accessing the environment
    • Running the CLI commands
    • Accessing the web interface
    • Sending requests to the REST API

Prerequisite

To run Airflow in Docker, the following prerequisites must be met:

  1. Docker Community Edition (CE). If Docker is not yet installed on the system, we have to install it first. We can follow the article about Docker CE installation.
  2. Docker Compose v1.29.1 or newer on our workstation. We can follow the article about installing Docker Compose.
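The version requirement above can be checked with a short script. The following is a minimal sketch in Python: it parses a version string such as the one printed by docker-compose --version and compares it with 1.29.1. The helper names and the sample strings are assumptions made for illustration, and the exact wording of the version output may differ between Docker Compose releases.

```python
import re

# Minimum Docker Compose version named in the prerequisites.
MINIMUM = (1, 29, 1)


def parse_version(text):
    """Extract the first MAJOR.MINOR.PATCH triple found in the text."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in: {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(text, minimum=MINIMUM):
    """Return True when the version in the text is at least the minimum."""
    return parse_version(text) >= minimum


# Illustrative sample; paste the real output of `docker-compose --version`.
print(meets_minimum("docker-compose version 1.29.2"))  # True
```

Comparing tuples of integers avoids the pitfall of comparing version strings, where "1.9.0" would sort after "1.29.1".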

Fetching docker-compose.yaml

In this step, we will fetch the docker-compose.yaml file, which will be used later to deploy Airflow in Docker. The file is published on the Apache Airflow documentation site. We will download it with the curl command, as shown below.

curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.2.0/docker-compose.yaml'
The captured terminal session below shows the related download of the Docker Compose 1.29.2 binary (the second prerequisite), which is also fetched with curl:
[ramansah@otodiginet ~]$ sudo curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
[sudo] password for ramansah: 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   633  100   633    0     0     91      0  0:00:06  0:00:06 --:--:--   168
100 12.1M  100 12.1M    0     0   167k      0  0:01:14  0:01:14 --:--:--  396k
Fetching Apache Airflow docker-compose.yaml

Initializing Environment

To deploy Airflow on Docker, we need to prepare the environment: create the necessary files and directories, and initialize the database.

1. Create Directories and Set the Right Airflow User

Some directories in the container are mounted, which means that their contents are synchronized between the container and our local computer. We will create these directories as the user ramansah and record that user's ID as AIRFLOW_UID in a .env file:

[ramansah@otodiginet ~]$ mkdir -p ./dags ./logs ./plugins
[ramansah@otodiginet ~]$ echo -e "AIRFLOW_UID=$(id -u)" > .env
Setting right Airflow user
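For readers who prefer to script this step, the two shell commands can be approximated in Python as sketched below. The function name prepare_airflow_dirs is hypothetical, and os.getuid() is assumed to be available, which holds on Linux and macOS but not on Windows. The demonstration writes into a temporary directory so that it does not touch the real project folder.

```python
import os
import tempfile
from pathlib import Path


def prepare_airflow_dirs(base: Path) -> str:
    """Create dags/, logs/ and plugins/ under base and write AIRFLOW_UID to .env."""
    for name in ("dags", "logs", "plugins"):
        (base / name).mkdir(parents=True, exist_ok=True)
    # Same content as: echo -e "AIRFLOW_UID=$(id -u)" > .env
    line = f"AIRFLOW_UID={os.getuid()}\n"
    (base / ".env").write_text(line)
    return line


# Demonstration in a throwaway directory; pass Path(".") to use the current one.
with tempfile.TemporaryDirectory() as tmp:
    env_line = prepare_airflow_dirs(Path(tmp))
    created = sorted(p.name for p in Path(tmp).iterdir())

print(env_line.strip())
print(created)  # ['.env', 'dags', 'logs', 'plugins']
```

Recording the host user ID this way lets files written to the mounted directories be owned by our own user rather than by root.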

2. Initialize Database

We need to run database migrations and create the first user account. To do so, we run the following command:

docker-compose up airflow-init
[ramansah@otodiginet ~]$ docker-compose up airflow-init
Creating network "ramansah_default" with the default driver
Creating volume "ramansah_postgres-db-volume" with default driver
Pulling postgres (postgres:13)...
13: Pulling from library/postgres
7d63c13d9b9b: Pull complete
cad0f9d5f5fe: Pull complete
ff74a7a559cb: Pull complete
c43dfd845683: Pull complete
e554331369f5: Pull complete
d25d54a3ac3a: Pull complete
bbc6df00588c: Pull complete
d4deb2e86480: Pull complete
d4132927c0d9: Pull complete
3d03efa70ed1: Pull complete
645312b7d892: Pull complete
3cc7074f2000: Pull complete
5d6e98ee16de: Pull complete
Digest: sha256:bdc05bf68e78e893d9423d0632a24f52b2bdba5a9c9686f99da9200e2f7cb672
Status: Downloaded newer image for postgres:13
Pulling redis (redis:latest)...

...

airflow-init_1       | [2021-10-12 14:32:17,036] {manager.py:214} INFO - Added user airflow
airflow-init_1       | User "airflow" created with role "Admin"
airflow-init_1       | 2.2.0
ramansah_airflow-init_1 exited with code 0
Airflow Init database

Running Apache Airflow On Docker

After everything is set, we start all services with the command: docker-compose up.

[ramansah@otodiginet ~]$ docker-compose up
ramansah_postgres_1 is up-to-date
ramansah_redis_1 is up-to-date
Starting ramansah_airflow-init_1 ... done
Creating ramansah_airflow-webserver_1 ... done
Creating ramansah_airflow-worker_1    ... done
Creating ramansah_airflow-scheduler_1 ... done
Creating ramansah_flower_1            ... done
Creating ramansah_airflow-triggerer_1 ... done
Attaching to ramansah_postgres_1, ramansah_redis_1, ramansah_airflow-init_1, ramansah_airflow-worker_1, ramansah_airflow-webserver_1, ramansah_airflow-triggerer_1, ramansah_airflow-scheduler_1, ramansah_flower_1
Apache Airflow run all services

While the containers are running, we open a second console/terminal to check that they are in good condition. This can be done with the command: docker ps.

[ramansah@otodiginet ~]$  docker ps
CONTAINER ID   IMAGE                  COMMAND                  CREATED          STATUS                     PORTS                                                 NAMES
c9d574433311   apache/airflow:2.2.0   "/usr/bin/dumb-init …"   3 minutes ago    Up 3 minutes (unhealthy)   8080/tcp                                              ramansah_airflow-triggerer_1
e29e5d0f1cc0   apache/airflow:2.2.0   "/usr/bin/dumb-init …"   3 minutes ago    Up 3 minutes (healthy)     0.0.0.0:5555->5555/tcp, :::5555->5555/tcp, 8080/tcp   ramansah_flower_1
8b92d740dc4d   apache/airflow:2.2.0   "/usr/bin/dumb-init …"   3 minutes ago    Up 3 minutes (healthy)     8080/tcp                                              ramansah_airflow-scheduler_1
fe73e92aaf83   apache/airflow:2.2.0   "/usr/bin/dumb-init …"   3 minutes ago    Up 3 minutes (healthy)     8080/tcp                                              ramansah_airflow-worker_1
a893a3f83b41   apache/airflow:2.2.0   "/usr/bin/dumb-init …"   4 minutes ago    Up 3 minutes (healthy)     0.0.0.0:8080->8080/tcp, :::8080->8080/tcp             ramansah_airflow-webserver_1
28a59444a7ae   redis:latest           "docker-entrypoint.s…"   12 minutes ago   Up 12 minutes (healthy)    6379/tcp                                              ramansah_redis_1
c6252b3bc5c5   postgres:13            "docker-entrypoint.s…"   12 minutes ago   Up 12 minutes (healthy)    5432/tcp
Airflow Container

The STATUS column above shows that all the Airflow containers are up. Most of them report healthy; in this capture, the airflow-triggerer container reports unhealthy.
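Reading the STATUS column by eye becomes tedious with many containers, so the check can be automated. The following Python sketch assumes the docker ps output is requested in a two-column form with the --format option (container name, a tab, then status). The function names are hypothetical, and the sample text reuses names and statuses from the listing above.

```python
import subprocess


def docker_ps_status() -> str:
    """Return 'name<TAB>status' lines for running containers (requires Docker)."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def unhealthy(ps_output: str) -> list:
    """Return the names of containers whose status lacks the '(healthy)' marker."""
    names = []
    for line in ps_output.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        if "(healthy)" not in status:
            names.append(name)
    return names


# Sample taken from the listing above; use docker_ps_status() on a live system.
sample = "\n".join([
    "ramansah_airflow-triggerer_1\tUp 3 minutes (unhealthy)",
    "ramansah_flower_1\tUp 3 minutes (healthy)",
    "ramansah_airflow-scheduler_1\tUp 3 minutes (healthy)",
    "ramansah_airflow-worker_1\tUp 3 minutes (healthy)",
    "ramansah_airflow-webserver_1\tUp 3 minutes (healthy)",
    "ramansah_redis_1\tUp 12 minutes (healthy)",
])
print(unhealthy(sample))  # ['ramansah_airflow-triggerer_1']
```

Note that this sketch also lists containers that define no health check at all, since their status carries no "(healthy)" marker.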

Accessing Environment

After starting Airflow in containers, we can interact with it in 3 ways:

  • by running CLI commands
  • via a browser using the web interface
  • using the REST API

1. Running CLI Commands

In this tutorial, we will test the interaction with Airflow by running the command: docker-compose run airflow-worker airflow info.

[ramansah@otodiginet ~]$ docker-compose run airflow-worker airflow info
Starting ramansah_airflow-init_1 ... done
Creating ramansah_airflow-worker_run ... done


Apache Airflow
version                | 2.2.0                                                 
executor               | CeleryExecutor                                        
task_logging_handler   | airflow.utils.log.file_task_handler.FileTaskHandler   
sql_alchemy_conn       | postgresql+psycopg2://airflow:airflow@postgres/airflow
dags_folder            | /opt/airflow/dags                                     
plugins_folder         | /opt/airflow/plugins                                  
base_log_folder        | /opt/airflow/logs                                     
remote_base_log_folder |                                                       
                                                                               

System info
OS              | Linux                                                                                                                                                                   
architecture    | x86_64                                                                                                                                                                  
uname           | uname_result(system='Linux', node='a568846d819c', release='4.18.0-305.19.1.el8_4.x86_64', version='#1 SMP Wed Sep 15 19:12:32 UTC 2021', machine='x86_64', processor='')
locale          | ('en_US', 'UTF-8')                                                                                                                                                      
python_version  | 3.6.15 (default, Sep 28 2021, 20:40:56)  [GCC 8.3.0]                                                                                                                    
python_location | /usr/local/bin/python                                                                                                                                                   
                                                                                                                                                                                          

Tools info
git             | NOT AVAILABLE                                                                              
ssh             | OpenSSH_7.9p1 Debian-10+deb10u2, OpenSSL 1.1.1d  10 Sep 2019                               
kubectl         | NOT AVAILABLE                                                                              
gcloud          | NOT AVAILABLE                                                                              
cloud_sql_proxy | NOT AVAILABLE                                                                              
mysql           | mysql  Ver 8.0.26 for Linux on x86_64 (MySQL Community Server - GPL)                       
sqlite3         | 3.27.2 2019-02-25 16:06:06 bd49a8271d650fa89e446b42e513b595a717b9212c91dd384aab871fc1d0alt1
psql            | psql (PostgreSQL) 11.13 (Debian 11.13-0+deb10u1)                                           
                                                                                                             

Paths info
airflow_home    | /opt/airflow                                                                                                                                                                 
system_path     | /home/airflow/.local/bin:/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin                                                                         
python_path     | /home/airflow/.local/bin:/usr/local/lib/python36.zip:/usr/local/lib/python3.6:/usr/local/lib/python3.6/lib-dynload:/home/airflow/.local/lib/python3.6/site-packages:/usr/loca
                | l/lib/python3.6/site-packages:/opt/airflow/dags:/opt/airflow/config:/opt/airflow/plugins                                                                                     
airflow_on_path | True                                                                                                                                                                         
                                                                                                                                                                                               

Providers info
apache-airflow-providers-amazon          | 2.3.0
apache-airflow-providers-celery          | 2.1.0
apache-airflow-providers-cncf-kubernetes | 2.0.3
apache-airflow-providers-docker          | 2.2.0
apache-airflow-providers-elasticsearch   | 2.0.3
apache-airflow-providers-ftp             | 2.0.1
apache-airflow-providers-google          | 6.0.0
apache-airflow-providers-grpc            | 2.0.1
apache-airflow-providers-hashicorp       | 2.1.1
apache-airflow-providers-http            | 2.0.1
apache-airflow-providers-imap            | 2.0.1
apache-airflow-providers-microsoft-azure | 3.2.0
apache-airflow-providers-mysql           | 2.1.1
apache-airflow-providers-odbc            | 2.0.1
apache-airflow-providers-postgres        | 2.3.0
apache-airflow-providers-redis           | 2.0.1
apache-airflow-providers-sendgrid        | 2.0.1
apache-airflow-providers-sftp            | 2.1.1
apache-airflow-providers-slack           | 4.1.0
apache-airflow-providers-sqlite          | 2.0.1
apache-airflow-providers-ssh             | 2.2.0

2. Accessing Web Interface

We can log in to the web interface and try to run some tasks. The Apache Airflow webserver is available at: http://localhost:8080. The default account has the login airflow and the password airflow.

Apache Airflow Web Interface
Apache Airflow DAGs
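The third access method listed earlier, the REST API, uses the same address and account as the web interface. Below is a minimal Python sketch that prepares an authenticated request with the standard library only. It assumes that basic authentication is enabled for the API in this deployment and that the /api/v1/dags endpoint of the stable REST API is the one we want to query; the function names are hypothetical. Only list_dags() contacts the server, so the containers must be running before it is called.

```python
import base64
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # webserver address used in this tutorial


def build_request(path, user="airflow", password="airflow"):
    """Build a GET request carrying an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        headers={
            "Authorization": f"Basic {token}",
            "Accept": "application/json",
        },
    )


def list_dags():
    """Send the request and decode the JSON reply (needs the running webserver)."""
    request = build_request("/api/v1/dags")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


request = build_request("/api/v1/dags")
print(request.get_method(), request.full_url)
```

On a live system, calling list_dags() should return the DAG list as a Python dictionary. The default airflow/airflow credentials are suitable for a local test environment only.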

Conclusion

So far, we have shown how to run and deploy Apache Airflow on Docker Community Edition. This tutorial is based on the official guide at https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html.
