The physical structure of PostgreSQL is very simple. It consists of shared memory and a few background processes and data files.
Shared Memory refers to the memory reserved for database caching and transaction log caching. The most important elements in shared memory are Shared Buffer and WAL buffers
The purpose of Shared Buffer is to minimize DISK IO. For this purpose, the following principles must be met
- You need to access very large (tens, hundreds of gigabytes) buffers quickly.
- You should minimize contention when many users access it at the same time.
- Frequently used blocks must be in the buffer for as long as possible
The WAL buffer is a buffer that temporarily stores changes to the database. The contents stored in the WAL buffer are written to the WAL file at a predetermined point in time. From a backup and recovery point of view, WAL buffers and WAL files are very important.
PostgreSQL Process Types
PostgreSQL has four process types.
- Postmaster (Daemon) Process
- Background Process
- Backend Process
- Client Process
The Postmaster process is the first process started when you start PostgreSQL. At startup, performs recovery, initialize shared memory, and run background processes. It also creates a backend process when there is a connection request from the client process.
Process relationship diagram
If you check the relationships between processes with the pstree command, you can see that the Postmaster process is the parent process of all processes. (For clarity, I added the process name and argument after the process ID)
The list of background processes required for PostgreSQL operation are as follows.
|logger||Write the error message to the log file.|
|checkpointer||When a checkpoint occurs, the dirty buffer is written to the file.|
|writer||Periodically writes the dirty buffer to a file.|
|wal writer||Write the WAL buffer to the WAL file.|
|Autovacuum launcher||Fork autovacuum worker when autovacuum is enabled.It is the responsibility of the autovacuum daemon to carry vacuum operations on bloated tables on demand|
|archiver||When in Archive.log mode, copy the WAL file to the specified directory.|
|stats collector||DBMS usage statistics such as session execution information ( pg_stat_activity ) and table usage statistical information ( pg_stat_all_tables ) are collected.|
The maximum number of backend processes is set by the max_connections parameter, and the default value is 100. The backend process performs the query request of the user process and then transmits the result. Some memory structures are required for query execution, which is called local memory. The main parameters associated with local memory are:
- work_mem Space used for sorting, bitmap operations, hash joins, and merge joins. The default setting is 4 MB.
- Maintenance_work_mem Space used for Vacuum and CREATE INDEX . The default setting is 64 MB.
- Temp_buffers Space used for temporary tables. The default setting is 8 MB.
Client Process refers to the background process that is assigned for every backend user connection.Usually the postmaster process will fork a child process that is dedicated to serve a user connection.
Here are some things that are important to know when attempting to understand the database structure of PostgreSQL.
Items related to the database
- PostgreSQL consists of several databases. This is called a database cluster.
- When initdb () is executed, template0 , template1 , and postgres databases are created.
- The template0 and template1 databases are template databases for user database creation and contain the system catalog tables.
- The list of tables in the template0 and template1 databases is the same immediately after initdb (). However, the template1 database can create objects that the user needs.
- The user database is created by cloning the template1 database.
Items related to the tablespace
- The pg_default and pg_global tablespaces are created immediately after initdb().
- If you do not specify a tablespace at the time of table creation, it is stored in the pg_dafault tablespace.
- Tables managed at the database cluster level are stored in the pg_global tablespace.
- The physical location of the pg_default tablespace is $PGDATA\base.
- The physical location of the pg_global tablespace is $PGDATA\global.
- One tablespace can be used by multiple databases. At this time, a database-specific subdirectory is created in the table space directory.
- Creating a user tablespace creates a symbolic link to the user tablespace in the $PGDATA\tblspc directory.
Items related to the table
- There are three files per table.
- One is a file for storing table data. The file name is the OID of the table.
- One is a file to manage table free space. The file name is OID_fsm .
- One is a file for managing the visibility of the table block. The file name is OID_vm .
- The index does not have a _vm file. That is, OID and OID_fsm are composed of two files.