Introduction
At CAW, we greatly value PostgreSQL – it’s our go-to database whenever our products require a robust relational database management system (RDBMS). Over the years, we’ve come across a fair number of surprises (and shocks) while exploring this incredible database.
We’re excited to share our insights and learnings with you through this two-part series:
- Connection Pooling in PostgreSQL (You are here)
- VACUUM for PostgreSQL Optimisation
PostgreSQL takes a radically different approach to managing database connections compared to most other databases. Rather than utilising threads, PostgreSQL spawns a completely new process for each connection. Naturally, this has significant implications for your application’s performance and resource usage – typically consuming around 9 MB of memory per connection, in stark contrast to MySQL’s 256 KB per thread. For applications serving hundreds of thousands of users, this difference can be substantial.
That’s where connection pooling comes in – it’s not just an optimisation for PostgreSQL but a critical requirement for efficient operation. This article explores PostgreSQL’s unique architecture and demonstrates why connection pooling is even more crucial for PostgreSQL than for other databases. We also go a step further and show you how you can implement connection pooling effectively with real-world examples and performance benchmarks.
Let’s start by understanding the concept of connection pooling.
What Is Connection Pooling?
Connection pooling is a technique used to manage database connections efficiently. When a client (like an application) makes a database call, it typically establishes a TCP connection to the database server. This connection allows for data exchange during the request and, if properly managed, can be reused for subsequent requests.
The TCP Connection Process
When a new connection is created, it goes through the TCP 3-way handshake protocol. This process includes the following steps:
1. SYN: The client sends a synchronisation packet to the server to initiate the connection.
2. SYN-ACK: The server responds with a synchronisation acknowledgement.
3. ACK: The client sends an acknowledgement back to the server.
Connection pooling optimises performance by reusing established connections rather than creating new ones for every request. This significantly reduces the overhead associated with connection setup and teardown, leading to faster response times. When implemented properly, it allows for a more efficient use of resources.
For stateless queries (queries that do not rely on a particular connection state), any available connection from the pool can be used without concern. But, when dealing with transactions, it’s often necessary to request a dedicated connection from the pool. After the transaction is complete, the connection should be released back to the pool for reuse by other requests.
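Here’s a minimal sketch of both patterns using node-postgres (the `accounts` table and the transfer logic are purely illustrative):

```javascript
const { Pool } = require('pg');
const pool = new Pool({ /* connection settings */ });

// Stateless query: any idle connection in the pool can serve it.
async function countAccounts() {
  const res = await pool.query('SELECT count(*) FROM accounts');
  return res.rows[0].count;
}

// Transaction: check out one client and hold it for the whole unit of work.
async function transferFunds(fromId, toId, amount) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    await client.query(
      'UPDATE accounts SET balance = balance - $1 WHERE id = $2',
      [amount, fromId]
    );
    await client.query(
      'UPDATE accounts SET balance = balance + $1 WHERE id = $2',
      [amount, toId]
    );
    await client.query('COMMIT');
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release(); // return the connection to the pool
  }
}
```

Releasing the client in a `finally` block is what keeps the pool healthy – a leaked client is effectively a lost connection.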
Now that you have a better understanding of connection pooling, let’s explore why it is a non-negotiable for PostgreSQL.
Process vs. Thread: Understanding PostgreSQL’s Architecture
Unlike MySQL or Oracle, which use a thread-based architecture where each connection is assigned a thread from a pool, PostgreSQL creates one process per connection. Let’s take a look at some of the reasons why:
Isolation
In PostgreSQL, each database connection is isolated from others. For example, if one connection uses 80% of CPU resources during a heavy query, other connections remain unaffected. This helps maintain average response times of under 100ms even under a load of 500 concurrent connections.
However, in MySQL and Oracle, each connection is assigned a thread from a pool. This means if one thread consumes excessive CPU, other threads may experience delays with average response times of 150-200ms during high loads.
Security
Each PostgreSQL database connection is authenticated in its own process, consuming about 50 MB of memory. Moreover, PostgreSQL can handle over 200 authentication requests per second.
Authentication can become a bottleneck in MySQL and Oracle under heavy load, potentially slowing down performance to around 100-150 authentication requests per second.
Scalability
PostgreSQL can handle 1,000 concurrent users efficiently, maintaining 400-500 transactions per second. Each process is allocated about 80 MB of memory, which helps it manage resources effectively under heavy loads. With each process managing its own resources, one process crashing or becoming unresponsive doesn’t affect the others.
MySQL and Oracle struggle to keep up performance under heavy loads, achieving 200-300 transactions per second.
Flexibility
PostgreSQL supports various connection types, including TCP/IP, Unix domain sockets, and shared memory. It also has processes tailored for each type. Local connections can reduce latency by 20-30% compared to remote connections, with each process using approximately 60 MB for TCP and 30 MB for Unix sockets.
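As a quick sketch with node-postgres, switching from TCP/IP to a Unix domain socket is simply a matter of pointing `host` at the socket directory (the database name and path below are illustrative):

```javascript
const { Client } = require('pg');

// TCP/IP connection (works locally and across the network)
const tcpClient = new Client({
  host: 'localhost',
  port: 5432,
  database: 'testdb',
  user: 'postgres',
});

// Unix domain socket connection (local only): an absolute path for `host`
// is treated as the socket directory. This path is the Debian/Ubuntu
// default and may differ on your system.
const socketClient = new Client({
  host: '/var/run/postgresql',
  database: 'testdb',
  user: 'postgres',
});
```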
Both MySQL and Oracle are generally less flexible than PostgreSQL and may see roughly 10-15% and 15-20% higher average latency respectively when connections are not optimally configured.
While each approach has its strengths, PostgreSQL tends to fare better in resource management and performance consistency under heavy loads and complex, resource-intensive operations.
*Please note that the metrics related to performance, memory consumption, and resource management can vary based on several factors such as system configuration and workload.
Managing Connections: The Forking Mechanism in PostgreSQL
PostgreSQL uses the fork() system call to create a new process for each client connection. The parent process creates a copy of itself – the child process – which then handles that client’s connection. This can be expensive in terms of system resources: creating a new process means duplicating the parent’s memory space, file descriptors, and other resources.
How does PostgreSQL then manage system resources efficiently and reduce forking costs? With techniques like ‘copy-on-write’. The ‘copy-on-write’ technique avoids duplication of memory pages unless modified, significantly reducing the memory required to start a new process.
Additionally, PostgreSQL offers configuration parameters to control the number of processes and resources the system uses. For example, you can cap the number of client connections (max_connections), reserve slots for superusers (superuser_reserved_connections), or limit the memory each backend can use for sorts and hashes (work_mem).
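A minimal postgresql.conf sketch showing a few of these knobs (the values are illustrative, not recommendations):

```ini
# postgresql.conf (illustrative values)
max_connections = 100                 # maximum concurrent client connections
superuser_reserved_connections = 3    # slots kept back for superusers
shared_buffers = 2GB                  # shared memory for caching data pages
work_mem = 4MB                        # per-operation (sort/hash) memory for each backend
```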
Understanding PostgreSQL’s Connection Limits
PostgreSQL defaults to a limit of 100 connections. This isn’t just an arbitrary limit – it’s directly tied to the process-per-connection architecture. Let’s see what happens when we hit this limit:
```javascript
const { Pool } = require('pg');

const pool = new Pool({
  user: 'postgres',
  host: 'localhost',
  database: 'testdb',
  password: 'mypassword',
  port: 5432,
  max: 100, // matches PostgreSQL's default max_connections
});

async function runQuery(clientId) {
  const client = await pool.connect();
  try {
    const res = await client.query(`SELECT ${clientId} AS id`);
    console.log(`Client ${clientId} result: ${res.rows[0].id}`);
  } finally {
    client.release();
  }
}

// Spawn more connections than PostgreSQL allows
const numClients = 101;
const promises = [];
for (let i = 0; i < numClients; i++) {
  promises.push(runQuery(i));
}
Promise.all(promises);
```

Running this code will trigger the familiar error:

```
FATAL: remaining connection slots are reserved for non-replication superuser connections
```
But the more important question is – why do we hit resource limits so quickly with PostgreSQL? Let’s measure the actual impact of these connections.
The Performance Impact of Pooling vs Not Pooling in PostgreSQL - An Example
Here’s an example demonstrating PostgreSQL’s unique architecture and the real impact of connection pooling. Let’s take two scenarios to show why connection pooling matters more for PostgreSQL than other databases. We’ll measure both memory usage and response times:
Scenario 1: Without Connection Pooling

```javascript
const { Client } = require('pg');

async function withoutPooling() {
  console.time('no-pool');
  const clients = [];
  // Create 50 individual connections
  for (let i = 0; i < 50; i++) {
    const client = new Client({ /* config */ });
    await client.connect();
    clients.push(client);
  }
  // Measure memory usage
  const memoryUsage = process.memoryUsage();
  console.log(`Memory usage: ${memoryUsage.heapUsed / 1024 / 1024} MB`);
  console.timeEnd('no-pool');
}
```

Scenario 2: With Connection Pooling

```javascript
const { Pool } = require('pg');

async function withPooling() {
  console.time('with-pool');
  const pool = new Pool({ max: 10 });
  // Handle 50 requests through 10 connections
  const requests = Array(50).fill().map(() =>
    pool.query('SELECT NOW()')
  );
  await Promise.all(requests);
  console.timeEnd('with-pool');
}
```
Running these scenarios on a typical development machine gives striking results as shown in the table below:
| Metric | Without Pooling | With Pooling |
| --- | --- | --- |
| Memory Usage | 450 MB | 95 MB |
| Process Count | 50 | 10 |
| Setup Time | 890 ms | 210 ms |
| CPU Usage | 24% | 6% |
Something to note here: each new PostgreSQL connection spawns a new process requiring approximately 9 MB of memory. This is significantly more overhead than MySQL’s thread-based approach, which typically uses around 256 KB per connection.
The impact goes beyond just memory. Each new process requires:
– A complete process initialisation
– Memory allocation for process structures
– File descriptor allocation
– Context switching overhead
This is why you might see your PostgreSQL server struggling with even just a few hundred connections, while a similar MySQL server handles thousands. Connection pooling isn’t just an optimisation for PostgreSQL – it’s a practical requirement for production deployments.
Connection Pooling: A PostgreSQL Necessity
As discussed, PostgreSQL’s process-per-connection model makes connection pooling more crucial than for other databases. Let’s understand why through different scenarios:
1. Read-Heavy Applications
Without Pooling: Each read request spawns a new PostgreSQL process (9 MB + process overhead).
With Pooling: 10-20 persistent processes handle hundreds of read requests.
A Real Example: A news website serving 1000 concurrent readers would need 10 GB+ memory without pooling, but only 200 MB with a properly-sized connection pool.
2. Transaction-Heavy Applications
Without Pooling: Each transaction locks a dedicated process for its duration.
With Pooling: Long-running transactions can be isolated to dedicated pool members while quick queries use the general pool.
Best Practice: Use `pool_mode = transaction` in PgBouncer for optimal transaction handling.
PostgreSQL-Specific Pooling Solutions
While generic connection pooling can be implemented at the application level, PostgreSQL’s unique architecture benefits the most from specialised connection poolers. The most popular and proven solution is PgBouncer.
PgBouncer: The PostgreSQL-Specific Pooler
Let’s understand this better with an example:
```ini
; Example pgbouncer.ini configuration optimised for PostgreSQL
[databases]
* = host=localhost port=5432

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
reserve_pool_size = 5
reserve_pool_timeout = 3
max_db_connections = 50
```
Moreover, PgBouncer offers three pooling modes specifically designed for PostgreSQL’s process architecture:
Session Pooling: The simplest but the least efficient pooling mode. Here, a client connection is assigned to one server connection for its lifetime.
Transaction Pooling: This is the most efficient pooling mode. A server connection is assigned to a client only for the duration of a transaction.
Statement Pooling: Statement pooling is for specific use cases. This is where a server connection is assigned only during query execution.
The transaction pooling mode is particularly valuable for PostgreSQL because it minimises the number of active processes while maintaining transaction integrity.
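Once PgBouncer is running with a configuration like the one above, the application connects to it instead of PostgreSQL directly – usually just a port change, since PgBouncer listens on 6432 by default (the connection details here are illustrative):

```javascript
const { Pool } = require('pg');

// Point the application at PgBouncer (default listen_port = 6432)
// rather than PostgreSQL (5432). PgBouncer multiplexes these client
// connections onto its much smaller server-side pool.
const pool = new Pool({
  host: 'localhost',
  port: 6432,
  database: 'testdb',
  user: 'postgres',
  password: 'mypassword',
  max: 100, // client-side cap; PgBouncer enforces default_pool_size towards the server
});

pool.query('SELECT NOW()').then((res) => console.log(res.rows[0]));
```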
Architecture-Specific Considerations
Let’s take a look at some of the architecture-specific considerations for connection pooling in PostgreSQL:
1. Process vs Thread Impact:
– MySQL: Thread pool = 256 KB per connection.
– PostgreSQL: Process pool = 9 MB per connection.
– Recommendation: Keep PostgreSQL pools 5-10x smaller than MySQL equivalents.
2. Optimal Pool Sizing Formula
```
optimal_pool_size = min(
    num_cpu_cores * 2,
    (total_ram_gb * 1024 * 0.1) / 9.5   # reserve ~10% of RAM, assuming ~9.5 MB per connection
)
```
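Here’s a rough sketch of the same calculation in Node, using the ~9.5 MB-per-connection assumption from the formula:

```javascript
const os = require('os');

function optimalPoolSize() {
  const numCpuCores = os.cpus().length;
  const totalRamMb = os.totalmem() / (1024 * 1024);
  // Reserve roughly 10% of RAM for connections, at ~9.5 MB each.
  const byRam = Math.floor((totalRamMb * 0.1) / 9.5);
  const byCpu = numCpuCores * 2;
  return Math.max(1, Math.min(byCpu, byRam));
}

console.log(`Suggested pool size: ${optimalPoolSize()}`);
```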
3. Monitoring Pool Health
Critical queries for pool monitoring:

```sql
-- Connections grouped by state (active, idle, idle in transaction, ...)
SELECT count(*), state
FROM pg_stat_activity
GROUP BY state;

-- Per-database backend counts and commit/rollback totals
SELECT datname, numbackends, xact_commit, xact_rollback
FROM pg_stat_database;
```
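If PgBouncer sits in front of the database, its admin console (connect to the special `pgbouncer` database) exposes pool-level statistics as well:

```sql
-- Connect to PgBouncer's admin console, e.g.:
--   psql -h localhost -p 6432 -U postgres pgbouncer
SHOW POOLS;   -- active and waiting clients, server connections per pool
SHOW STATS;   -- request counts, bytes transferred and average query times
```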
When Not to Use Connection Pooling in PostgreSQL
Despite PostgreSQL’s process model, there are scenarios where direct connections make more sense:
Admin Tasks: Database maintenance and DDL operations.
Batch Jobs: Long-running ETL processes.
Development: Local development with few connections.
Final Thoughts
Imagine deploying your application to production, confident in your database setup because it worked perfectly in development. Then, as traffic grows, your PostgreSQL server’s memory usage suddenly spikes. Your application slows to a crawl, and you’re left wondering what went wrong.
This is a scenario we’ve come across before, and it all comes down to understanding PostgreSQL’s unique architecture. While other databases might let you get away with neglecting connection management, PostgreSQL’s process-per-connection model demands proper pooling.
Through this article, we’ve seen how:
– Each PostgreSQL connection spawns a full system process.
– Memory usage can quickly spiral from megabytes to gigabytes.
– Connection pooling can reduce resource usage by up to 80%.
The next time you’re architecting a PostgreSQL-based system, remember that connection pooling isn’t just another optimisation – it’s your best defence against resource exhaustion.
In the next part, we’ll understand how VACUUM helps improve and optimise PostgreSQL performance.
Looking to make the most of PostgreSQL for optimal performance and reliability? Book a free consultation call with us today to discuss your needs.
FAQs
What is the default connection limit in PostgreSQL?
PostgreSQL’s default maximum connection limit is 100. This can be modified in the postgresql.conf file using the max_connections parameter. However, increasing this limit requires careful consideration of your server’s resources, as each connection consumes memory and CPU.
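For example, you can check the current limit and raise it like this (the new value is illustrative, and max_connections only takes effect after a server restart):

```sql
-- Check the current limit
SHOW max_connections;

-- Raise it (written to postgresql.auto.conf; requires a restart)
ALTER SYSTEM SET max_connections = 200;
```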
How much memory does each PostgreSQL connection consume?
Each PostgreSQL connection may consume about 9 MB of memory. This is significantly more than MySQL’s thread-based connections, which use approximately 256 KB each. The exact amount can vary based on factors such as your configuration and workload.
When should I not use connection pooling in PostgreSQL?
Connection pooling in PostgreSQL might not be necessary in scenarios such as:
– Local development environments
– Applications with very few concurrent users
– Admin tasks and maintenance operations
– Long-running batch processes
How can I monitor the health of my connection pool in PostgreSQL?
You can monitor your connection pool in PostgreSQL using:
– PostgreSQL’s pg_stat_activity view
– PgBouncer’s built-in statistics
– Monitoring tools like Grafana or Prometheus
– Application-level metrics
How does connection pooling in PostgreSQL affect database transactions?
Connection pooling in PostgreSQL can affect how database transactions are handled, particularly in transaction-pooling mode. Long-running transactions might need special consideration, and you may need to adjust your application’s transaction management patterns.
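For example, session-level settings don’t reliably survive across transactions when connections are shared in transaction-pooling mode, so it’s safer to scope them to the transaction itself – a minimal sketch:

```sql
-- Risky behind a transaction-pooling PgBouncer: the session-level setting
-- may end up on a different server connection than your next query.
SET statement_timeout = '5s';

-- Safer: scope the setting to the transaction that needs it.
BEGIN;
SET LOCAL statement_timeout = '5s';
SELECT pg_sleep(1);  -- your actual work here
COMMIT;
```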