Containerize a Meltano EL pipeline with Docker to get a reproducible, self-contained workflow that produces a JSONL artifact.
Introduction
This guide walks you through containerizing a Meltano EL pipeline with Docker. You’ll initialize a Meltano project, configure a smoke-test extractor and JSONL loader, build a Docker image that embeds the pipeline, run it in a container, and verify the output (animals.jsonl). By the end, you’ll have a working image, a running pipeline, and a tangible JSONL output file to inspect.
What is Docker?
Docker is an open-source container platform that packages applications and their dependencies into isolated environments. A Dockerfile defines how to build an image; the image is the immutable artifact from which containers are launched.
In short:
Dockerfile → Image → Container
What is Meltano?
Meltano is an open-source, CLI-based ELT platform built around Singer connectors (many of them created with the Meltano Singer SDK). It lets data engineers run extractors and loaders from a single repository, with configurable schema/catalog options and command arguments to match pipeline requirements.
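To make the Singer-based flow concrete, here is a toy sketch in Python of the kind of message stream a tap emits and a target consumes: a SCHEMA message describes a stream, then RECORD messages carry the rows. This is a simplified illustration, not Meltano's actual implementation; the stream name animals mirrors the example used later in this guide.

```python
import json

def emit_messages(stream, rows):
    # Toy "tap": one SCHEMA message, then one RECORD message per row
    # (real Singer schemas are full JSON Schema documents).
    yield {"type": "SCHEMA", "stream": stream,
           "schema": {"properties": {"id": {"type": "integer"}}},
           "key_properties": ["id"]}
    for row in rows:
        yield {"type": "RECORD", "stream": stream, "record": row}

def load_jsonl(messages):
    # Toy "target": keep only RECORD payloads, one JSON object per line.
    return "\n".join(json.dumps(m["record"])
                     for m in messages if m["type"] == "RECORD")

rows = [{"id": 1, "description": "Red-headed woodpecker"},
        {"id": 2, "description": "Dragon, Melty"}]
output = load_jsonl(emit_messages("animals", rows))
print(output)
```

In the real pipeline, Meltano wires the tap's stdout to the target's stdin, so the two plugins never need to know about each other.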
Prerequisites
Before initializing the Meltano project and building the Docker image, make sure you have the following:
- Operating System: Windows (with WSL) or a supported Linux distribution.
- Python: Python 3.9 or newer.
- pipx and the uv package manager:
  Install pipx:
  python3 -m pip install pipx
  Install uv:
  pipx install uv
- Docker: Docker Engine, which both Linux and WSL can run natively.
- Meltano:
  pipx install meltano
  # or using uv
  uv tool install meltano
1. Initialize Meltano Project
Start by creating and entering your project directory:
cd /path/to/projects
meltano init meltano-docker-example
cd meltano-docker-example
uv venv
source .venv/bin/activate
which python
Expected output (example):
/path/to/projects/meltano-docker-example/.venv/bin/python
2. Configure Meltano Extractor and Loader
We’ll use the tap-smoke-test extractor and the target-jsonl loader.
Add the extractor and loader:
meltano add extractor tap-smoke-test
meltano add loader target-jsonl
Edit meltano.yml to configure the extractor:
plugins:
  extractors:
  - name: tap-smoke-test
    variant: meltano
    pip_url: git+https://github.com/meltano/tap-smoke-test.git
    config:
      streams:
      - stream_name: animals
        input_filename: https://raw.githubusercontent.com/meltano/tap-smoke-test/main/demo-data/animals-data.jsonl
3. Invoke the Tap and Test the EL Process Locally
Run the extractor:
meltano invoke tap-smoke-test
You should see a line like:
Beginning full_table sync of 'animals'...
Then run the full pipeline:
meltano run tap-smoke-test target-jsonl
Inspect the output:
cd output
cat animals.jsonl
Expected sample content:
{"id": 1, "description": "Red-headed woodpecker", "verified": true, "views": 27, "created_at": "2021-09-22T01:01:05Z"}
{"id": 2, "description": "Dragon, Melty", "verified": true, "views": 27, "created_at": "2021-07-01T18:47:52Z"}
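Beyond eyeballing the file, the output can be sanity-checked programmatically. Below is a minimal Python sketch; to keep it self-contained it writes the two sample records above to a temporary file, but in your project you would point it at output/animals.jsonl instead.

```python
import json
import tempfile

sample = (
    '{"id": 1, "description": "Red-headed woodpecker", "verified": true, '
    '"views": 27, "created_at": "2021-09-22T01:01:05Z"}\n'
    '{"id": 2, "description": "Dragon, Melty", "verified": true, '
    '"views": 27, "created_at": "2021-07-01T18:47:52Z"}\n'
)

def validate_jsonl(path, required_keys=("id", "description", "created_at")):
    """Parse each line as JSON and check that required keys are present."""
    records = []
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            record = json.loads(line)  # raises if a line is not valid JSON
            missing = [k for k in required_keys if k not in record]
            if missing:
                raise ValueError(f"line {lineno} missing keys: {missing}")
            records.append(record)
    return records

with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
    f.write(sample)

records = validate_jsonl(f.name)  # in practice: validate_jsonl("output/animals.jsonl")
print(f"{len(records)} valid records")
```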
Return to project root:
cd ..
4. Configure Docker
Add the Docker bundle:
meltano add files files-docker
Replace the Dockerfile with:
FROM meltano/meltano:latest-python3.12
WORKDIR /project
RUN apt update && \
    apt clean
# Install any additional requirements
COPY ./requirements.txt .
RUN pip install -r requirements.txt
# Copy over Meltano project directory
COPY . .
RUN meltano install
# Prevent runtime modifications
ENV MELTANO_PROJECT_READONLY=1
ENTRYPOINT ["meltano"]
CMD ["run", "tap-smoke-test", "target-jsonl"]
Build the image:
docker build --no-cache -t meltano-docker-example .
Verify the image exists:
docker images
5. Run Meltano in a Docker Container
Execute the container:
docker run meltano-docker-example
On the first run you may see:
[warning] No state was found, complete import.
This is expected; with no saved state, Meltano performs a full sync.
To inspect output manually:
docker run --rm -it -v "$PWD":/project --entrypoint bash meltano-docker-example
# inside container
cd output
ls
cat animals.jsonl
Summary
- Built a Meltano pipeline locally.
- Created a Docker image containing the Meltano project.
- Ran a full EL process inside a container.
- Verified the resulting animals.jsonl output.
Next Steps
- Use a database as the target (e.g., SQL Server, PostgreSQL, MySQL).
- Swap in a public API as the extractor.
- Implement version control (Git) and integrate into CI/CD pipelines.