How to Build a Docker Image to Run Containerized Meltano Pipelines

Containerize a Meltano EL pipeline with Docker to get a reproducible, self-contained workflow that produces a JSONL artifact.

Introduction

This guide walks you through containerizing a Meltano EL pipeline with Docker. You’ll initialize a Meltano project, configure a smoke-test extractor and JSONL loader, build a Docker image that embeds the pipeline, run it in a container, and verify the output (animals.jsonl). By the end, you’ll have a working image, a running pipeline, and a tangible JSONL output file to inspect.

What is Docker?

Docker is an open-source container platform that packages applications and their dependencies into isolated environments. A Dockerfile defines how to build an image; the image is the immutable artifact from which a container is launched.

In short:

Dockerfile → Image → Container

What is Meltano?

Meltano is an open-source, CLI-based ELT platform built around Singer connectors (taps and targets). It lets data engineers define and run extractors and loaders from a single project repository, with configurable schema/catalog options and command arguments to match pipeline requirements.

Prerequisites

Before initializing the Meltano project and building the Docker image, make sure you have the following:

  • Operating System: Windows (with WSL), or a supported Linux distribution.

  • Python: Python 3.9 or newer.

  • pipx and the uv package manager:

    Install pipx: python3 -m pip install pipx

    Install uv: pipx install uv

  • Docker Engine

      Docker Engine runs natively on Linux and inside WSL 2.
    
  • Install Meltano:

pipx install meltano

# or using uv
uv tool install meltano

1. Initialize Meltano Project

Start by creating and entering your project directory:

cd /path/to/projects
meltano init meltano-docker-example
cd meltano-docker-example
uv venv
source .venv/bin/activate
which python

Expected output (example):

/path/to/projects/meltano-docker-example/.venv/bin/python

2. Configure Meltano Extractor and Loader

We’ll use the “Smoke Test” extractor and target-jsonl loader.

Add the extractor and loader:

meltano add extractor tap-smoke-test
meltano add loader target-jsonl

Edit meltano.yml so that the tap-smoke-test entry under plugins looks like this:

plugins:
  extractors:
    - name: tap-smoke-test
      variant: meltano
      pip_url: git+https://github.com/meltano/tap-smoke-test.git
      config:
        streams:
          - stream_name: animals
            input_filename: https://raw.githubusercontent.com/meltano/tap-smoke-test/main/demo-data/animals-data.jsonl

3. Invoke the Tap and Test the EL Process Locally

Run the extractor:

meltano invoke tap-smoke-test

You should see a line like:

Beginning full_table sync of 'animals'...

Then run the full pipeline:

meltano run tap-smoke-test target-jsonl

Inspect the output:

cd output
cat animals.jsonl

Expected sample content:

{"id": 1, "description": "Red-headed woodpecker", "verified": true, "views": 27, "created_at": "2021-09-22T01:01:05Z"}
{"id": 2, "description": "Dragon, Melty", "verified": true, "views": 27, "created_at": "2021-07-01T18:47:52Z"}

Return to project root:

cd ..
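
Reading the file by eye works for a smoke test, but a scripted check is easier to repeat. The sketch below assumes the loader wrote output/animals.jsonl as shown above. The helper names count_records and looks_like_jsonl are hypothetical, and the second one only checks the shape of each line rather than parsing the JSON.

```shell
#!/bin/sh
# count_records and looks_like_jsonl are hypothetical helpers for illustration.

# Print the number of non-empty lines (JSONL stores one record per line).
count_records() {
    grep -c . "$1"
}

# Succeed only if the file is non-empty and every non-blank line starts
# with "{" and ends with "}". This is a shape check, not a full JSON parse.
looks_like_jsonl() {
    [ -s "$1" ] || return 1
    ! grep -v '^[[:space:]]*$' "$1" | grep -qv '^{.*}$'
}

# Usage, from the project root:
# count_records output/animals.jsonl
# looks_like_jsonl output/animals.jsonl && echo "output looks like JSONL"
```

For stricter validation, each line could be passed through a real JSON parser instead.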

4. Configure Docker

Add the Docker bundle:

meltano add files files-docker

Replace the Dockerfile with:

FROM meltano/meltano:latest-python3.12

WORKDIR /project

RUN apt-get update && \
    apt-get clean

# Install any additional requirements
COPY ./requirements.txt .
RUN pip install -r requirements.txt

# Copy over Meltano project directory
COPY . .
RUN meltano install

# Prevent runtime modifications
ENV MELTANO_PROJECT_READONLY=1

ENTRYPOINT ["meltano"]
CMD ["run", "tap-smoke-test", "target-jsonl"]

Build the image:

docker build --no-cache -t meltano-docker-example .

Verify the image exists:

docker images
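
Since docker images lists every local image, the new one can be easy to miss in a long list. A scripted check is sketched below. It relies on docker image inspect, which exits with a non-zero status when the named image is not present locally. The function name image_exists is hypothetical.

```shell
#!/bin/sh
# image_exists is a hypothetical helper. It succeeds only if the named image
# is present in the local image store (and the docker CLI is available).
image_exists() {
    docker image inspect "$1" >/dev/null 2>&1
}

# Usage:
# image_exists meltano-docker-example && echo "image found"
```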

5. Run Meltano in a Docker Container

Execute the container:

docker run meltano-docker-example

On the first run you may see:

[warning] No state was found, complete import.

This is expected: with no saved state, Meltano performs a full sync.

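To inspect output manually, open a shell in a container with the host project directory mounted at /project. Because the bind mount replaces the image's copy of /project, the output directory you see is the one from your host project: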

docker run --rm -it -v "$PWD":/project --entrypoint bash meltano-docker-example

# inside container
cd output
ls
cat animals.jsonl
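
Files written during docker run live in that container's own filesystem, so one common way to keep animals.jsonl on the host is to bind-mount only the output directory. The sketch below assumes target-jsonl writes to its default output directory under /project. The names run_pipeline and DRY_RUN are hypothetical and introduced here purely for illustration.

```shell
#!/bin/sh
# run_pipeline is a hypothetical wrapper, not a Meltano or Docker command.
# $1: image name (defaults to the image built in this guide)
# $2: absolute host directory that should receive the JSONL files
run_pipeline() {
    image="${1:-meltano-docker-example}"
    out_dir="${2:-$PWD/output}"
    # Only create the host directory when the container will really run.
    if [ -z "${DRY_RUN:-}" ]; then
        mkdir -p "$out_dir"
    fi
    # With DRY_RUN set, print the docker command instead of executing it.
    ${DRY_RUN:+echo} docker run --rm -v "$out_dir":/project/output "$image"
}

# Usage, from the project root:
# DRY_RUN=1 run_pipeline    # show the command only
# run_pipeline              # run the pipeline, then: cat output/animals.jsonl
```

The container typically runs as root, so files it creates in the mounted directory may be owned by root on the host.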

Summary

  • Built a Meltano pipeline locally.
  • Created a Docker image containing the Meltano project.
  • Ran a full EL process inside a container.
  • Verified the resulting animals.jsonl output.

Next Steps

  • Use a database as the target (e.g., SQL Server, PostgreSQL, MySQL).
  • Swap in a public API as the extractor.
  • Put the project under version control (Git) and integrate it into a CI/CD pipeline.