May 26, 2026

PPDM in the Cloud: SQL Server, PostgreSQL, or Something Else?

By John Wassilak

Sooner or later, every PPDM conversation arrives at the same question.

You’ve convinced yourself an industry-standard data model is worth adopting. You’ve scoped the implementation, picked the business problem you’re solving, and figured out where to start. Then somebody asks: what database are we actually building this on?

The question gets treated as more contentious than it actually is. There are people in upstream data with strong opinions about SQL Server versus PostgreSQL, and they will express those opinions at length. What usually gets lost is that the choice is driven mostly by practical constraints. Once you identify those constraints, the decision makes itself.

The historical default: SQL Server

Most PPDM implementations have historically run on SQL Server. This is not a coincidence.

The major upstream software vendors built their products for Windows and SQL Server: well data management tools, production accounting packages, land management systems. If you’re already a Windows shop, and your important vendor tools expect to connect to a SQL Server instance, there’s a reasonable argument for staying in the ecosystem.

SQL Server also has genuine strengths. The tooling is mature. Performance for transactional workloads at operator scale is not a problem. SSMS, the built-in backup and HA features, and the operational tooling are well understood by a large population of DBAs. And if you’re a Microsoft cloud shop, Azure SQL or SQL Server on an Azure VM gives you a straightforward integration and compliance story.

The security and backup considerations covered in The Complete Guide to AWS RDS SQL Server apply whether you’re hosting PPDM or anything else on the platform.

The case for PostgreSQL

PostgreSQL has become a serious contender for new PPDM builds over the last five years. The reasons aren’t hard to find.

Cost. PostgreSQL is open source. You’re paying for compute and storage, not licensing. For a mid-size operator whose workload doesn’t require anything exotic, that difference adds up over a multi-year run.

Modern tooling compatibility. The open-source data engineering ecosystem was built with PostgreSQL as a first-class target: Airflow, dbt, Meltano, the DuckDB connectors, and virtually every ELT tool built in the last decade. The connectors are better tested, the community documentation is more current, and the edge cases are better understood.

Cloud portability. Managed PostgreSQL is available on every major cloud provider and from a number of specialist hosts: RDS, Cloud SQL, Azure Database for PostgreSQL, Neon, Supabase, Render. Moving a PostgreSQL workload between them is significantly less painful than moving a SQL Server workload. If your infrastructure strategy involves multiple clouds or you’re not yet locked into a vendor, PostgreSQL gives you more options.

PPDM compatibility. The PPDM model is a relational model. It was designed and documented with SQL Server, but the schema translates to PostgreSQL with less friction than most people expect. Data types map cleanly, the relationships work the same way, and the main adjustments are syntactic rather than architectural.
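To give a feel for what "syntactic rather than architectural" means, here is a minimal sketch of a column-type translator. The mapping is a small illustrative subset of common SQL Server to PostgreSQL equivalents, the function name is hypothetical, and the column definitions in the usage lines are made up for illustration rather than copied from PPDM DDL. A real migration should be checked against the full schema.

```python
import re

# Illustrative subset of SQL Server -> PostgreSQL type equivalents (assumption:
# a real migration would review every type that appears in the DDL).
SQLSERVER_TO_POSTGRES = {
    "NVARCHAR": "VARCHAR",
    "NCHAR": "CHAR",
    "DATETIME": "TIMESTAMP",
    "DATETIME2": "TIMESTAMP",
    "BIT": "BOOLEAN",
    "VARBINARY": "BYTEA",
    "UNIQUEIDENTIFIER": "UUID",
    "TINYINT": "SMALLINT",
}

# Target types emitted without a length or precision argument in this sketch.
# Dropping the precision on TIMESTAMP is a simplification.
NO_ARGS = {"BYTEA", "BOOLEAN", "UUID", "TIMESTAMP", "SMALLINT"}


def translate_type(sqlserver_type: str) -> str:
    """Translate one SQL Server column type string to a PostgreSQL type."""
    match = re.fullmatch(r"\s*(\w+)\s*(\(.*\))?\s*", sqlserver_type)
    if match is None:
        raise ValueError(f"Unrecognized type: {sqlserver_type!r}")
    name = match.group(1).upper()
    args = match.group(2) or ""

    # PostgreSQL has no (MAX) length; use the unbounded types instead.
    if args.upper().replace(" ", "") == "(MAX)":
        return "BYTEA" if name == "VARBINARY" else "TEXT"

    target = SQLSERVER_TO_POSTGRES.get(name, name)  # unknown types pass through
    return target if target in NO_ARGS else f"{target}{args}"


# Hypothetical column definitions, for illustration only.
for column, sql_type in [
    ("UWI", "NVARCHAR(40)"),
    ("REMARK", "NVARCHAR(MAX)"),
    ("SPUD_DATE", "DATETIME"),
    ("ACTIVE_FLAG", "BIT"),
    ("DEPTH", "NUMERIC(10, 5)"),
]:
    print(f"{column:12} {sql_type:16} -> {translate_type(sql_type)}")
```

Most of the work in practice is this kind of find-and-replace on types, plus identifier quoting and default expressions. The tables, keys, and relationships carry over unchanged.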

The main reason to pass on PostgreSQL is vendor tool requirements. If your land system or production accounting software requires a SQL Server connection string, that constraint overrules everything else.

What about cloud-native warehouses?

This question comes up, and it deserves a direct answer.

Snowflake, BigQuery, Databricks, Redshift. These are excellent analytical platforms. They are not the right home for your PPDM master data.

PPDM is a transactional data model. The access patterns assume row-level inserts, updates, and deletes. Well records get created and modified. Working interests change over time. Operator changes have to be tracked with effective dates. The things you’re doing to maintain data integrity in a PPDM implementation are operational, not analytical.
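As a rough illustration of that access pattern, the sketch below records an operator change with effective dates: close the current row, insert the new one, and do both in one transaction. It uses Python's built-in sqlite3 module purely as a stand-in for PostgreSQL or SQL Server, and the table, columns, and well identifier are simplified, hypothetical names rather than actual PPDM definitions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real operational database

# Hypothetical, simplified history table; not the actual PPDM definition.
conn.execute("""
    CREATE TABLE well_operator_hist (
        uwi            TEXT NOT NULL,
        operator_name  TEXT NOT NULL,
        effective_date TEXT NOT NULL,
        expiry_date    TEXT,            -- NULL marks the current row
        PRIMARY KEY (uwi, effective_date)
    )
""")
conn.execute(
    "INSERT INTO well_operator_hist VALUES (?, ?, ?, NULL)",
    ("WELL-001", "Operator A", "2020-01-01"),
)
conn.commit()


def change_operator(conn, uwi, new_operator, change_date):
    """Expire the current operator row and insert its successor atomically."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE well_operator_hist SET expiry_date = ? "
            "WHERE uwi = ? AND expiry_date IS NULL",
            (change_date, uwi),
        )
        if cur.rowcount != 1:
            raise ValueError(f"Expected exactly one current row for {uwi}")
        conn.execute(
            "INSERT INTO well_operator_hist VALUES (?, ?, ?, NULL)",
            (uwi, new_operator, change_date),
        )


change_operator(conn, "WELL-001", "Operator B", "2024-06-01")

history = conn.execute(
    "SELECT operator_name, effective_date, expiry_date "
    "FROM well_operator_hist WHERE uwi = ? ORDER BY effective_date",
    ("WELL-001",),
).fetchall()
for row in history:
    print(row)
```

Every step here is a small, single-row write wrapped in a transaction, which is the workload a row-store OLTP database is built for.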

Cloud warehouses are built for analytical access patterns: large sequential scans, aggregations over wide tables, low write frequency. They’re typically not optimized for the kind of row-level record management a well master requires, and latency on small transactional queries is often worse than expected.

The right mental model: PPDM is where your data lives. A warehouse is where you go to analyze it. We’ll come back to this in the DuckDB section below.

What actually drives the decision

The decision comes down to a short list.

Vendor tool requirements. Does any software you’re running require SQL Server? If yes, the decision may already be made. If your vendor tools are database-agnostic or support PostgreSQL, you have a real choice.

Operational team skills. A shop that has been running SQL Server for fifteen years has DBAs who know SQL Server. Standing up a PostgreSQL implementation in front of them creates a support burden. The inverse is also true.

Cloud and licensing preferences. If your organization has a Microsoft Enterprise Agreement and Azure is the designated cloud platform, Azure SQL is a natural fit. If you’re greenfield or actively moving away from Microsoft licensing, PostgreSQL makes more sense.

Budget. SQL Server licensing for a mid-size operator running a PPDM environment in the cloud is real money. If budget is a constraint, it’s worth doing the math before committing.
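The short list above can be read as a small decision procedure. The sketch below is one hypothetical way to write it down. The field names and the equal weighting of the non-vendor factors are assumptions made for illustration; the only rule taken directly from the discussion is that a hard vendor requirement overrules everything else.

```python
from dataclasses import dataclass


@dataclass
class Constraints:
    vendor_requires_sql_server: bool
    team_knows: str            # "sqlserver", "postgres", or "neither"
    microsoft_agreement: bool  # Enterprise Agreement with Azure as the designated cloud
    budget_constrained: bool


def recommend_platform(c: Constraints) -> str:
    """Return "SQL Server" or "PostgreSQL" for a set of constraints."""
    # A hard vendor requirement overrules every other consideration.
    if c.vendor_requires_sql_server:
        return "SQL Server"

    # Assumption: the remaining factors count equally. Positive leans
    # SQL Server, negative leans PostgreSQL.
    score = 0
    if c.team_knows == "sqlserver":
        score += 1
    elif c.team_knows == "postgres":
        score -= 1
    if c.microsoft_agreement:
        score += 1
    if c.budget_constrained:
        score -= 1

    # Ties go to PostgreSQL, matching the greenfield case.
    return "SQL Server" if score > 0 else "PostgreSQL"


greenfield = Constraints(False, "neither", False, False)
established = Constraints(False, "sqlserver", True, False)
print(recommend_platform(greenfield))
print(recommend_platform(established))
```

In a real evaluation the weights will not be equal, and some factors (a fifteen-year DBA team, for example) may effectively act as hard constraints of their own.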

Where DuckDB fits

DuckDB is not a PPDM master database.

DuckDB is an embedded analytical database. It is excellent at running complex queries over large amounts of data, especially when that data is in Parquet files. It is not designed for transactional workloads, multi-user concurrent writes, or the kind of record-level operations a PPDM well master requires.

What DuckDB is excellent for is the analytical layer that sits on top of your PPDM data. You extract snapshots or incrementally updated exports from your PostgreSQL (or SQL Server) PPDM environment, land them as Parquet files, and query them with DuckDB. The result is fast, flexible analytical queries that don’t touch your transactional database.

PostgreSQL as the operational data store, DuckDB as the analytical layer. That’s the architecture we see in modern PPDM builds. The pipeline details are in Building a Production Data Pipeline on PPDM with Airflow and DuckDB.

The honest answer

For a new PPDM build with no existing SQL Server constraints and a preference for open-source tooling: PostgreSQL. It’s cheaper, the modern data engineering ecosystem assumes it, and it gives you the most flexibility for the analytical tools you’ll want to add later.

For a shop already running SQL Server, with vendor tools that depend on it and a DBA team that knows it: SQL Server. Don’t introduce a new database technology to solve a problem you don’t have.

For a shop that wants cloud-hosted PPDM without managing database servers: PostgreSQL on RDS or Azure Database for PostgreSQL handles it. SQL Server on RDS or Azure SQL also works, at higher licensing cost.

The platform matters less than the data quality, governance decisions, and ingestion pipelines. A well-maintained PPDM environment on PostgreSQL will outperform a poorly maintained one on SQL Server every time. Pick the platform that matches your constraints, stand it up properly, and put your energy into the data work.

We Were Just at PPDM 2026

We spent April 27 through 29 at the PPDM Energy Data Convention in Houston. If you want to talk through platform decisions for a PPDM implementation or ongoing project, we’re happy to get into the specifics.