PostgreSQL Basic & Required Parameter Setting

Wednesday, 5 May 2021

PostgreSQL Basic & Required Parameter Setting

PostgreSQL works even if you start using it with the default settings without changing the parameter settings, but there may be various problems later, especially the default of performance-related parameters.

The value is so small that it may be a problem before tuning. It is certainly troublesome to enter from the settings before starting to use, but since there are only a few parameters that need to be set first (five are introduced this time), first set the minimum necessary settings and then use it.

Why not to use the default settings

By default, PostgreSQL is configured so that it can run on almost any computer without interfering with other applications too much. This is especially true for used memory. The default settings are only suitable for the following use: with them you can check if the PostgreSQL installation is working, create a notebook-level test database, and practice writing queries to it. If you are going to develop (and even more so to launch) real applications, then the settings will have to be radically changed. Unfortunately, the PostgreSQL distribution does not provide files with "recommended" settings. Generally speaking, it is very difficult to create such files. the optimal settings for a particular PostgreSQL installation will be determined by:

System configuration
The volume and type of data stored in the database
The ratio of the number of read and write requests
whether other resource-intensive processes (such as a web server) are running.

The purpose of this entry is to get started. Below are some of the appropriate in terms of minimal settings.

Types of parameters to set

The PostgreSQL parameters to be set first are roughly divided into categories

- memory (buffer) related
- checkpoint related
- server log related

These will be explained in order below.

Memory (buffer) related

First, there are memory (buffer) related parameters. There are two memory-related parameters.

These parameters specify the size of the memory area that buffers the data, as it says "buffer related". Since the database handles a lot of disk I / O, it is very important for performance to use the buffer mechanism well. Therefore, if this value is too small, you will end up in a situation where there is no performance at all.

The first is "shared_buffers". This is a parameter called "shared buffer" that specifies the size of the memory area that buffers table and index data in 8 kb blocks.

Second is "wal_buffers". This is the area ofmemory that is buffered when writing a file called the "transaction log (WAL)" to disk. By default, it is set to the size of "1/32" of the shared_buffers introduced earlier. For example, if you allocate 1GB of shared_buffers, wal_buffers will be 32MB, and if shared_buffers is the default 32MB, wal_buffers will be 1MB.

There is no easy way to know the proper value of wal_buffers so far, but I personally allocate about 16 to 32MB by default.

Checkpoint related

The second category is checkpoint related parameters. There are also two checkpoint-related parameters.

What is check-pointing in Postgres

A "checkpoint" is a process of writing (flashing) the contents of a shared buffer to an optical disk all at once. In other words, check-pointing makes it possible to guarantee that the contents of the shared buffer are stored on disk. However, since checkpoints are the process of writing to disk, the process is naturally heavy, and if this happens frequently, the performance of the database will deteriorate. Therefore, the parameter introduced here is the setting of "how often checkpoints are generated".

First is checkpoint_completion_target to reduce the I/O load from checkpoints It is to avoid flooding the I/O system with a burst of page writes, writing dirty buffers during a checkpoint is spread over a period of time. It specifies the target of checkpoint completion, as a fraction of total time between checkpoints.

Using this, the I/O rate is adjusted so that the checkpoint finishes when the given fraction of checkpoint_timeout seconds have elapsed, or before max_wal_size is exceeded, whichever is sooner. And the goal is not to do checkpoints too often or too infrequently, and our tuning best practice consists of second parameter.

The second is a parameter called "checkpoint_timeout". This specifies the interval at which checkpoints occur in "time".

The default value of checkpoint_timeout is "5 minutes (5m)", but this is also a little too short, so I try to set "60 minutes (60m)" as the initial value.

Server log related

The third category is the parameters related to the server log. Here, we will introduce only one parameter called "log_line_prefix".

The parameter "log_line_prefix" is the setting to add what is added to the beginning of each line of the server log output by the PostgreSQL server. With the default settings, the PostgreSQL server log does not output "information that is naturally required as a server log" such as the time stamp and process ID. As a result, you may have trouble analyzing something even during development. Therefore, I set log_line_prefix to output the timestamp and process ID at a minimum as follows.