Mastering Meltano

How to Run a Meltano Pipeline with a REST API and PostgreSQL

This guide takes an existing Meltano proof-of-concept and elevates it by using a true data source and target database. We will be working with public REST API endpoints as an extractor, and we will use PostgreSQL as a loader. Additionally, we will include some key version control basics using Git as well as a remote repository using GitHub.

What Have We Done So Far?

  1. Installed Python, Meltano, and Docker.
  2. Built a Docker image that ran a test extractor with a JSONL loader. If you need to catch up, follow along with the Meltano Docker Guide

Prerequisites for This Guide

  1. Completed Part 1: Python, Meltano, and Docker are installed.
  2. PostgreSQL: Installed locally or accessible elsewhere (cloud, hypervisor, bare-metal).
  3. Ubuntu/Debian terminal: These instructions assume a Debian-based Linux shell. Windows users can use WSL (Ubuntu).
  4. Git: Installed locally with sudo apt install git.
  5. GitHub: An account to create a remote repository.
  6. Text Editor: Use your preferred text editor (nano, vim, neovim, etc.).

1. Install PostgreSQL

Update system packages and install postgres on your local machine:

sudo apt update
sudo apt install postgresql

Check the status of the service and, if needed, start it:

sudo systemctl status postgresql
sudo systemctl start postgresql

Later, we will use the same variables below in a .env file. For now, let’s export them to the terminal and create them in Postgres:

# These export commands load our variables into our current shell session.

export TARGET_POSTGRES_USER=meltano
export TARGET_POSTGRES_PASSWORD='meltanopassword'
export TARGET_POSTGRES_DATABASE=meltano_database

# The following PostgreSQL commands create these variables based one what is specified above.
# This will create the user, create the database, and grant this user all priveleges.

sudo -u postgres psql -c "CREATE ROLE \"$TARGET_POSTGRES_USER\" WITH LOGIN PASSWORD '$TARGET_POSTGRES_PASSWORD';"
sudo -u postgres psql -c "CREATE DATABASE \"$TARGET_POSTGRES_DATABASE\" OWNER \"$TARGET_POSTGRES_USER\";"
sudo -u postgres psql -c "GRANT ALL PRIVILEGES ON DATABASE \"$TARGET_POSTGRES_DATABASE\" TO \"$TARGET_POSTGRES_USER\";"

2. Initialize a New Meltano Project

cd /path/to/projects
meltano init meltano-restapi-postgres-example
cd meltano-restapi-postgres-example
uv venv
source .venv/bin/activate
which python

3. Configure Meltano Extractor and Loader

The public API will be using is called JSONPlaceholder, a public API endpoint where we can grab fake user data. Vist https://jsonplaceholder.typicode.com/ to check out the resources they offer, today we will be using /users.

Add the tap/extractor:

meltano add extractor tap-rest-api-msdk

Add the target/loader:

meltano add loader target-postgres

Edit the plugins section of meltano.yml to look like this:

plugins:
  extractors:
  - name: tap-rest-api-msdk
    variant: widen
    pip_url: tap-rest-api-msdk
    config:
      api_url: https://jsonplaceholder.typicode.com
      streams:
      - name: users
        path: /users
        primary_keys: [id]


  loaders:
  - name: target-postgres
    variant: meltanolabs
    pip_url: meltanolabs-target-postgres
    config:
      host: ${TARGET_POSTGRES_HOST}
      port: ${TARGET_POSTGRES_PORT}
      user: ${TARGET_POSTGRES_USER}
      password: ${TARGET_POSTGRES_PASSWORD}
      database: ${TARGET_POSTGRES_DATABASE}

You may notice the use of variables inside curly braces in the loaders section. These exist to keep the actual credentials and database values safe. We will next create a .env file where the real values will live.

4. Create a .env File for the Postgres Credentials

Using your preferred text editor (nano, vim, neovim, etc.), enter the actual values for the Postgres database. sudo nano .env or sudo vim .env

Enter the values that match what we created in Postgres earlier. We are also including the host and port for the Postgres database. 127.0.0.1 is the localhost IP address, and 5432 is the default port for Postgres:

TARGET_POSTGRES_HOST=127.0.0.1
TARGET_POSTGRES_PORT=5432
TARGET_POSTGRES_USER=meltano
TARGET_POSTGRES_PASSWORD='meltanopassword'
TARGET_POSTGRES_DATABASE=meltano_database

Save the .env file and exit. If using nano: ctrl + x to exit, press y to save. If using vim: :wq to write/save the changes and quit vim.

5. Run the Full EL Pipeline

Invoke the tap (prints API JSON data to the terminal):

meltano invoke tap-rest-api-msdk

Run both the tap and the target:

meltano run tap-rest-api-msdk target-postgres

This pipeline should have created a new schema inside PostgreSQL called “tap_rest_api_msdk”.

6. Connect to the Database and Find Our Table

Sign-in to Postgres:

sudo -u postgres psql

(Inside Postgres) Connect to our database:

\c meltano_database

List the available schemas in that database (tap_rest_api_msdk should be listed in the output):

\dn

List tables inside our schema:

\dt tap_rest_api_msdk.*

Expected output:

      Schema       | Name  | Type  |  Owner  
-------------------+-------+-------+---------
 tap_rest_api_msdk | users | table | meltano
(1 row)

Now select all rows from the ‘users’ table:

SELECT * 
FROM tap_rest_api_msdk.users;

Press q to quit the table view.

Depending on the window size of your terminal, it may be difficult to view. To see the query output in a vertical format, use this command:

SELECT * FROM tap_rest_api_msdk.users \gx

7. Implement Version Control with Git

This is a good place to stop and introduce version control using Git. Git is a version control system that allows you to track changes in your code and collaborate with others.

First, we need to initialize a new git repository (be sure you are in the meltano-restapi-postgres-example directory):

git init

This will initialize a new git repository in the current directory.

Next, we need to ensure our .env file is not tracked by git. This is important because it contains sensitive information such as database credentials. Luckily, Meltano has already created a .gitignore file that includes .env files. If you want to check, run the following command (depending on your text editor): nano .gitignore or vim .gitignore Expected Output:

/venv
/.meltano
.env

Now, let’s add the files to the git repository and commit them with a message describing what we have done so far:

git add .
git commit -m "Initial commit: Meltano project with REST API and Postgres integration"

Let’s also change the default branch name from master to main (if it is not already set):

git branch -m main

8. Create a Remote Repository on GitHub and Push Your Code To It

  1. Go to your GitHub account and log in.
  2. Click “New repository” in the top left corner.
  3. Name your repository meltano-restapi-postgres-example.
  4. Optionally, add a description. Something like “Meltano project using REST API, PostgreSQL, and Docker integration.”
  5. Select “Public” or “Private” based on your preference.
  6. Do not initialize the repository with a .gitignore, LICENSE or README, as we already have those files in our local repository.
  7. Click “Create repository”.
  8. Once the repository is created, copy the repository URL or SSH link provided by GitHub.

Now, we need to link our local repository to the remote repository we just created on GitHub:

git remote add origin <your-github-repo-url>

Fetch all the branches from the remote repository:

git fetch origin

Push your local changes to the remote repository:

git push -u origin main

9. Dockerize the Meltano Project

Now that we have a Meltano project with a REST API and Postgres integration, let’s Dockerize it.

Add the Meltano Docker bundle to your project:

meltano add files files-docker

Replace the contents of the Dockerfile with the following:

FROM meltano/meltano:latest-python3.12

WORKDIR /project

RUN apt update && \
    apt install -y jq curl inetutils-ping && \
    apt clean

# Install any additional requirements
COPY ./requirements.txt .
RUN pip install -r requirements.txt

# Copy over Meltano project directory
COPY . .
RUN meltano install

# Prevent runtime modifications
ENV MELTANO_PROJECT_READONLY=1

ENTRYPOINT ["meltano"]
CMD ["run", "tap-rest-api-msdk", "target-postgres"]

Build the Docker image and name it meltano-restapi-postgres-example:

docker build --no-cache -t meltano-restapi-postgres-example .

Run the Docker container based on the image we just built. Since we use a .env file to store our variables, we need to pass them to our Docker container at runtime. The importance in doing this is to ensure our secrets are not hardcoded into the Dockerfile:

docker run --rm --network host \
  -e TARGET_POSTGRES_HOST=127.0.0.1 \
  -e TARGET_POSTGRES_PORT=5432 \
  -e TARGET_POSTGRES_USER=meltano \
  -e TARGET_POSTGRES_PASSWORD=meltanopassword \
  -e TARGET_POSTGRES_DATABASE=meltano_database \
  meltano-restapi-postgres-example

When run successfully, the Meltano EL process should look similar to when run locally. The major difference is that it is now running inside a Docker container, which allows for better portability and isolation of dependencies.

Now that we have successfully implemented Docker into our Meltano project, let’s commit these changes to our Git repository and push them to GitHub:

git add .
git commit -m "Added Docker support for Meltano project with REST API and Postgres integration"
git push origin main

Summary

  • Built a Meltano project that extracts data from a public REST API and loads it into a Postgres database.
  • Implemented version control using Git and pushed the project to a remote repository on GitHub.
  • Dockerized the Meltano project to allow for better portability and isolation of dependencies.

Extras

  • Explore the JSONPlaceholder API further by adding more streams (e.g., posts, comments).
  • Try setting up PostgreSQL on another machine and connect Meltano to it remotely.
  • Add other loaders to the Meltano project, such as a CSV loader or a JSON loader.
  • Experiment with Git and GitHub features such as branches, pull requests, and issues.