User Tools

Site Tools


postgresql

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
Last revision Both sides next revision
postgresql [2016/03/03 15:33]
mstraub [Grant on all tables]
postgresql [2018/08/27 13:41]
mantis
Line 1: Line 1:
 +====== Installation & Setup ======
 +
 +For Ubuntu 12.04:
 +
 +<code bash>
 +sudo apt-get install postgresql-9.1 pgadmin3
 +</​code>​
 +
 +pgAdmin III is a strongly recommended GUI for twiddling with your PostgreSQL. However, here are the first steps with the command line right after installing.
 +
 +===== Server Configuration =====
 +
 +==== Enable remote TCP/IP connections ====
 +
 +Allow TCP/IP connections from all IPs (default = localhost only) by adding this line to ''/​etc/​postgresql/​9.1/​main/​postgresql.conf'':​
 +
 +<​code>​
 +listen_addresses = '​*'​
 +</​code>​
 +
 +Then allow password authentication for all users from all IPs by adding this line to ''/​etc/​postgresql/​9.1/​main/​pg_hba.conf'':​
 +
 +<​code>​
 +host    all         ​all ​        ​0.0.0.0/​0 ​              md5
 +</​code>​
 +
 +
 +To further twiddle with more restrictive client authentication settings have a look at [[http://​www.postgresql.org/​docs/​9.1/​static/​auth-pg-hba-conf.html|pg_hba.conf]].
 +
 +
 +===== User Management =====
 +
 +First steps with a user of your choosing
 +
 +<code bash>
 +# log in with user created by default (which has no password set)
 +sudo -u postgres psql
 +
 +# create a user with your linux user name and log in
 +# (in Ubuntu the postgres user that matches your linux user name
 +# can by default log in without supplying a password) ​
 +sudo -u postgres createuser --superuser $USER
 +psql -d postgres
 +# set a password
 +\password
 +
 +# create a restricted user with password (long and short version)
 +sudo -u postgres createuser --no-superuser --no-createdb --no-createrole --pwprompt username
 +sudo -u postgres createuser -SDRP username
 +psql -U username -d postgres
 +</​code>​
 +
 +<code bash>
 +# drop a user
 +sudo -u postgres dropuser username
 +</​code>​
 +
 +==== Setting a password for user postgres ====
 +<code sql>
 +\password postgres
 +</​code>​
 +
 +
 +==== Create Readonly User ====
 +**1. Create the user**
 +<code bash>
 +sudo -u postgres createuser --no-superuser --no-createdb --no-createrole --pwprompt username
 +</​code>​
 +
 +**2. Grant readonly permissions**
 +
 +This is the tedious part (in postgres versions pre 9.0). We need to grant the user the permission to
 +  * connect to the DB
 +  * use the schema
 +  * perform a select on everay object in the schema
 +
 +<code sql>
 +GRANT CONNECT ON DATABASE mydb TO username;
 +GRANT USAGE ON SCHEMA public TO username;
 +
 +-- Generate GRANT SELECT statements for all tables/​views in the schema
 +SELECT 'GRANT SELECT ON public.'​ || relname || ' TO username;'​
 +FROM pg_class JOIN pg_namespace ON pg_namespace.oid = pg_class.relnamespace
 +WHERE nspname = '​public'​ AND relkind IN ('​r',​ '​v'​);​
 +</​code>​
 +
 +=== Slightly different approach ===
 +Creating a true read-only user is a real pain in postgres. Just revoking privileges from a user still allows him to create database objects which he owns. You need to revoke the CREATE and USAGE privileges from the SCHEMA public and the GROUP public. ​
 +
 +<code sql>
 +
 +-- Revoke default permissions from public group:
 +REVOKE CREATE ON SCHEMA public FROM PUBLIC;
 +REVOKE USAGE ON SCHEMA public FROM PUBLIC;
 +
 +-- Add back permissions for the users allowed to create stuff
 +GRANT CREATE ON SCHEMA public TO rw_user;
 +GRANT USAGE ON SCHEMA public TO rw_user;
 +
 +-- Add SELECT permissions for read-only user
 +GRANT USAGE ON SCHEMA public to readonly_user;​
 +GRANT SELECT ON my_table to readonly_user;​
 +
 +</​code>​
 +
 +see http://​www.ruizs.org/​archives/​89 for details. ​
 +
 +
 +==== Grant on all tables ====
 +
 +With a single [[http://​www.postgresql.org/​docs/​current/​static/​sql-grant.html|GRANT command]] permission for all tables in a schema can be changed:
 +<​code>​
 +GRANT ALL ON ALL TABLES IN SCHEMA public TO user;
 +</​code>​
 +
 +==== Change owner on all tables ====
 +
 +CLI: Unfortunately this can only be achieved via scripting, but we don't have to reinvent the wheel: https://​gist.github.com/​jirutka/​afa3ce62b1430abf7572
 +
 +GUI: Or in pgadminIII right-click on the schema, go to properties, and set the owner to your liking.
 +===== PSQL meta-commands =====
 +
 +After you logged into PostgreSQL it's good to know some often-used meta-commands (read more in ''​man psql''​):​
 +
 +<code sql>
 +-- list all users
 +\du
 +
 +-- list all schemas
 +\dn
 +
 +-- list tables
 +\d
 +\d+  -- (also show comments)
 +\d tablename -- show details of a table
 +
 +-- list all tables / tables of a certain schema
 +\dt *.* -- list all tables in DB
 +\dt schemaname.* -- list all tables in a certain schema
 +
 +-- list all views
 +\dv *.* -- list all views in DB
 +
 +-- list all databases on the host
 +\l
 +
 +-- connect to a database
 +\c dbname
 +
 +-- execute a command on the Linux shell (as-is)
 +\! command
 +
 +-- quit
 +\q
 +</​code>​
 +
 +For more meta-commands check out [[http://​www.postgresql.org/​docs/​9.1/​static/​app-psql.html|the documentation]].
 +
 +
 +===== List and Disconnect Active Sessions =====
 +
 +<code sql>
 +-- list active connections (and get procpids)
 +SELECT * FROM pg_stat_activity;​
 +
 +-- cancel a process but leave the user's session open
 +SELECT pg_cancel_backend( <​procpid>​ );
 +
 +-- terminate a user's query and session
 +SELECT pg_terminate_backend( <​procpid>​ );
 +</​code>​
 +====== DB Management ======
 +===== Show DB Sizes =====
 +<code sql>
 +SELECT pg_database.datname,  ​
 +       ​pg_size_pretty(pg_database_size(pg_database.datname)) AS size  ​
 +  FROM pg_database ​
 +  ORDER BY pg_database_size(pg_database.datname) desc;
 +</​code>​
 +
 +===== Find large databases =====
 +<code sql>
 +SELECT d.datname AS Name,  pg_catalog.pg_get_userbyid(d.datdba) AS Owner,
 +    CASE WHEN pg_catalog.has_database_privilege(d.datname,​ '​CONNECT'​)
 +        THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(d.datname))
 +        ELSE 'No Access'​
 +    END AS Size
 +FROM pg_catalog.pg_database d
 +    ORDER BY
 +    CASE WHEN pg_catalog.has_database_privilege(d.datname,​ '​CONNECT'​)
 +        THEN pg_catalog.pg_database_size(d.datname)
 +        ELSE NULL
 +    END DESC -- nulls first
 +    LIMIT 20
 +</​code>​
 +
 +see also: https://​wiki.postgresql.org/​wiki/​Disk_Usage
 +
 +===== Find large tables (in one DB) =====
 +<code sql>
 +SELECT nspname || '​.'​ || relname AS "​relation",​
 +    pg_size_pretty(pg_relation_size(C.oid)) AS "​size"​
 +  FROM pg_class C
 +  LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
 +  WHERE nspname NOT IN ('​pg_catalog',​ '​information_schema'​)
 +  ORDER BY pg_relation_size(C.oid) DESC
 +  LIMIT 20;
 +</​code>​
 +
 +
 +===== Show Table size =====
 +
 +<code sql>
 +-- size of data
 +SELECT pg_size_pretty(pg_total_relation_size('<​TABLE_NAME>'​));​
 +
 +-- size of indices
 +SELECT c2.relname, c2.relpages * 8 / 1024 AS "​MB"​
 +FROM pg_class c, pg_class c2, pg_index i
 +WHERE c.relname = '<​TABLE_NAME>'​ AND
 +      c.oid = i.indrelid AND
 +      c2.oid = i.indexrelid
 +ORDER BY c2.relname;
 +</​code>​
 +====== SQL ======
 +===== Date and Time =====
 +  * http://​www.postgresql.org/​docs/​9.1/​interactive/​datatype-datetime.html
 +  * http://​www.postgresql.org/​docs/​9.1/​static/​functions-datetime.html
 +
 +==== Add 15-min interval to time ====
 +<code sql>
 +   ​SELECT DATE(timestamp) + (interval_id) * time '​00:​15:​00'​ FROM ...
 +</​code>​
 +==== Extract date-parts ====
 +<code sql>
 +   ​select extract(minute from timestamp '​2008-08-01 12:​14:​13'​);​
 +</​code>​
 +see http://​www.postgresql.org/​docs/​8.3/​static/​functions-datetime.html#​FUNCTIONS-DATETIME-EXTRACT
 +
 +==== Extract TIME from TIMESTAMP ====
 +<code sql>
 +SELECT "​time"​(mytimestamp) FROM table; ​
 +</​code>​
 +
 +==== Convert Timestamp to Seconds ====
 +To convert a timestamp / interval to seconds (in case of timestamp seonds ​ since 01-01-1970)
 +
 +<code sql>
 +   ​SELECT EXTRACT (EPOCH FROM  timestamp_or_interval );
 +</​code>​
 +
 +==== Creating Timestamps from Text Fields with missing leading Zeros ====
 +
 +If your data looks like this:
 +
 +"​2011/​3/​15 ";"​6:​17:​3 "
 +
 +You can construct timestamps using the "​FM"​ (fill mode) modifier:
 +
 +<code sql>
 +   ​SELECT to_timestamp(utc_date|'​ '​|utc_time,'​YYYY/​FMMM/​FMDD FMHH24:​FMMI:​FMSS'​ );
 +</​code>​
 +
 +
 +==== Update parts of date ====
 +<code sql>
 +update datetesting set d = to_timestamp( to_char (d,'​YYYY-MM-DD 12:​00:​00'​),'​YYYY-MM-DD HH24:​MI:​SS'​ ) ;
 +</​code>​
 +
 +
 +==== Time Zone Information ====
 +Boy that was confusing so here it goes...
 +
 +=== Converting from string to TIMESTAMP ===
 +
 +
 +<code sql>
 +select '​2013-05-18 12:​30:​00'::​TIMESTAMP WITH TIME ZONE ; 
 +-- 2013-05-18 12:​30:​00+02 ​        no timezone info provided. Postgres simply assumes the current TIME ZONE
 +
 +select'​2013-05-18 12:​30:​00+00'::​TIMESTAMP WITH TIME ZONE as with_tz, '​2013-05-18 12:​30:​00+05'::​TIMESTAMP WITHOUT TIME ZONE as without_tz;
 +--        with_tz ​        ​| ​    ​without_tz ​     ​
 +------------------------+---------------------
 +-- 2013-05-18 14:30:00+02 | 2013-05-18 12:30:00
 +</​code>​
 +
 +**Notes**
 +  * When no TIME ZONE info is provided, but Postgres wants TIME ZONE info, then it simply assumes the current TIME ZONE of the session.
 +  * Timestamps WITH TIME ZONE are displayed in the TIME ZONE of the current session. Unless explicitly requested otherwise (see later).
 +  * The TIME ZONE part of a string will simply be **ignored** if you are converting to TIMESTAMP WITHOUT TIME ZONE
 +    * this also applies when you insert to a column of type TIMESTAMP WITHOUT TIME ZONE
 +
 +<code sql>
 +SELECT '​2013-05-18 12:​30:​00+06'::​TIMESTAMP;​
 +-- 2013-05-18 12:​30:​00 ​        ​-->​ the  +06 has been completely ignored
 +
 +
 +SELECT '​2013-05-18 12:​30:​00+06'::​TIMESTAMP WITH TIME ZONE;
 +-- 2013-05-18 08:​30:​00+02 ​     --> ​  6:30 UTC --> 8:30CET (+02)
 +
 +</​code>​
 +
 +
 +=== Explicitly setting the TIME ZONE ===
 +<code sql>
 +set TIME ZONE '​UTC';​
 +select '​2013-05-18 12:​30:​00+00'::​TIMESTAMP WITH TIME ZONE;
 +-- 2013-05-18 12:30:00+00
 +
 +
 +
 +set TIME ZONE '​CET';​
 +select '​2013-05-18 12:​30:​00+00'::​TIMESTAMP WITH TIME ZONE;
 +-- 2013-05-18 14:30:00+02
 +
 +
 +</​code>​
 +
 +
 +
 +=== Converting from epoch values to TIMESTAMP ===
 +<code sql>
 +select to_timestamp(1358749391);​
 + 
 +--> "​2013-01-21 07:​23:​11+01"​
 +</​code>​
 +
 +**Notes:**
 +  * Postgres converts the epoch value to TIMESTAMP WITH TIME ZONE. Thats why the result is displayed in the local time zone (CEST, winter time).
 +
 +=== Discard TIME ZONE Information ===
 +<code sql>
 +select to_timestamp(1358749391)::​timestamp without time zone;
 +--> "​2013-01-21 07:​23:​11"​
 +</​code> ​
 +**Notes**
 +  * does the same thing as before, but also discards the time zone information and returns a TIMESTAMP WITHOUT TIME ZONE
 +
 +=== Convert to other TIME ZONE === 
 +<code sql> ​
 +select
 +  to_timestamp(1358749391) at time zone '​utc'; ​
 +
 +--> "​2013-01-21 06:​23:​11"​
 +</​code>​
 +
 +**Notes**
 +  * Displays the value at time zone  UTC --> ​ 6:23
 +  * The result is a TIMESTAMP WITHOUT TIME ZONE. In this case postgres effectively removes the time zone information from the result
 +    * see http://​www.postgresql.org/​docs/​9.2/​static/​functions-datetime.html#​FUNCTIONS-DATETIME-ZONECONVERT ​
 +
 +
 +
 +
 +===== Useful Functions =====
 +==== Aggregate Concat ====
 +<code sql>
 +
 +  CREATE AGGREGATE textcat_all(
 +    basetype ​   = text,
 +    sfunc       = textcat,
 +    stype       = text,
 +    initcond ​   = ''​
 +  );
 +</​code>​
 +**Usage:**
 +
 +<code sql>
 +   ​SELECT company_id, textcat_all(employee | ', ') FROM mytable GROUP BY company_id;
 +</​code>​
 +
 +==== row_number ====
 +
 +Is e.g. useful if you need an OID where none is present:
 +
 +<code sql>
 +SELECT ​ row_number() OVER (ORDER BY NULL) AS oid FROM table;
 +</​code>​
 +
 +the "order by" clause is required, but in this special case we do not require any particular order, so we use NULL here.
 +
 +==== Round time to interval ====
 +<code sql>
 +CREATE OR REPLACE FUNCTION round_time_to_interval(TIMESTAMP) ​
 +RETURNS TIMESTAMP AS $$ 
 +  SELECT date_trunc('​hour',​ $1) + INTERVAL '15 min' * FLOOR(date_part('​minute',​ $1) / 15.0) 
 +$$ LANGUAGE SQL;
 +</​code>​
 +
 +
 +==== Rows to Columns ====
 +This is an obscure one, but it is very handy if you must transform a number of rows in a number of columns. This can be achieved with the crosstab() function provided by the tablefunc package of postgres-contrib. ​ So first we need to install postgresql-contrib and enable the function in our db:
 +
 +<code bash>
 +sudo apt-get install postgresql-contrib
 +sudo service postgresql-8.4 restart
 +cd /​usr/​share/​postgresql/​8.4/​contrib
 +psql -U username -d dbname -h localhost -f tablefunc.sql
 +</​code>​
 +
 +Next have a look at the documentation of crosstab here: http://​www.postgresql.org/​docs/​8.4/​static/​tablefunc.html
 +
 +
 +Here is an example of how to aggregate a traffic state from rows to columns (for a number of known roads). ​
 +
 +<code sql>
 +SELECT * FROM
 +crosstab(
 + '​SELECT valid_from, road_id, q from ait.its_traffic_state WHERE road_id IN (23005242,​23008577) order by 1',
 + '​SELECT distinct road_id FROM ait.its_traffic_state WHERE road_id IN (23005242,​23008577) order by 1'
 +
 +
 +AS its_traffic_state(time timestamp, road1 integer, road2 integer);
 +</​code>​
 +...for a higher number of columns you may need a script to generate the column_name/​datatype row at the end of the query (if anyone knows a better solution plz tell).
 +
 +
 +
 +===== Updating Sequence Counters =====
 +When data is inserted into the database with a specified value for the autoincrementing column the sequence is not touched and must be updated manually.
 +This is the case when e.g. importing dumps using COPY, or a simple INSERT statement with the autoincrementing column set to a value.
 +
 +While the existing values (e.g. id 1) are reused the sequence counters stay the same (i.e. start with 0): this prevents new data records from being created as the new object would have a duplicate object id (1 == 1).
 +
 +The fix:
 +<code sql>
 +SELECT SETVAL('​sequence_name',​ MAX(auto_incrementing_column)) FROM table_name;
 +</​code>​
 +
 +
 +===== SQL Hints =====
 +
 +==== Timeseries completion ====
 +Assume you have incomplete timeseries in your database but want to select a complete series (pad the missing values with a Zero). The following query should take care of the padding and replacing missing values by 0-values:
 +
 +<code sql>
 +SELECT intervals.intervals as interval, coalesce(q,​0)
 +FROM timeseries t 
 +RIGHT OUTER JOIN generate_series(0,​24) intervals
 +ON t.interval = intervals.intervals ; 
 +</​code>​
 +
 +==== Timeseries creation ====
 +To create a series of timestamps
 +<​code>​
 +select * from
 +generate_series('​2013-11-01'::​timestamp,​ '​2013-11-02'::​timestamp,​ '15 minutes'​)
 +</​code>​
 +==== Convert Array to Rows ====
 +This might be useful to replace ​ WHERE x IN ('​a','​b','​c'​) ​ statements (which are notoriously slow in postgres) with JOINs.
 +
 +<code sql>
 +SELECT i, (ARRAY['​a','​b','​c'​])[i] ​
 +FROM generate_series(1,​array_upper(ARRAY['​a','​b','​c'​],​1)) i;
 +
 +
 + i | array
 +---+-------
 + 1 | a
 + 2 | b
 + 3 | c
 +
 +(3 rows)
 +</​code> ​                                                                                                    
 +
 +It is not very elegant though especially as you need to specify the array twice (or know the size of the array, then you can replace the array_upper statement with the size). If someone has a better solution I would be interested :) 
 +
 +====== Stored Procedures ======
 +===== PL/Python =====
 +  * http://​www.postgresql.org/​docs/​8.3/​interactive/​plpython.html
 +
 +==== Aggreagte Functions in PL/Python ====
 +  * Aggregates in Postgres (Basics) : [[http://​www.postgresonline.com/​periodical.php?​i_id=7#​item-67]]
 +  * PL/Python Specific : [[http://​www.postgresonline.com/​journal/​index.php?/​archives/​102-PLPython-Part-4-PLPython-meets-aggregates.html ]]
 +
 +===== PL/Java =====
 +==== Installing ====
 +Download the most recent version from http://​pgfoundry.org/​frs/?​group_id=1000038 . Most of the instructions in this sections are taken from the README file in the tarball.
 +  * unzip pljava to /opt/pljava
 +  * sudo vi /​etc/​postgresql/​8.3/​main/​postgresql.conf
 +
 +<​code>​
 +SET the following parameters:
 +    dynamic_library_path = '​$libdir:/​opt/​pljava'​
 +    log_min_messages = info   #​optional
 +    custom_variable_classes = '​pljava'​
 +
 +ADD the following parameters:
 +    pljava.classpath = '/​opt/​pljava/​pljava.jar'​
 +</​code>​
 +* update ldconfig so postgres finds the reqired shared objects. Create a file **/​etc/​ld.so.conf.d/​pljava.conf** ​ containing
 +
 + . /​usr/​lib/​jvm/​java-6-sun/​jre/​lib/​amd64 /​usr/​lib/​jvm/​java-6-sun/​jre/​lib/​amd64/​server /​usr/​lib/​jvm/​java-6-sun/​jre/​lib/​amd64/​native_threads
 +  * run
 +
 +<code bash>
 +  sudo ldconfig
 +</​code>​
 +* install pl/java
 +
 +<code bash>
 +    sudo su postgres
 +    psql -f install.sql
 +</​code>​
 +===== PL/R =====
 +==== Installing ====
 +* Install the postgres-plr package
 +
 + . sudo aptitude install postgresql-8.3-plr
 +  * On ubuntu 8.04 the packet does not exist but the Jaunty Version works just fine. Search for the correct package on http://​packages.ubuntu.com/​search?​keywords=postgresql-8.3-plr then do something like
 +
 +<​code>​
 +   wget http://​de.archive.ubuntu.com/​ubuntu/​pool/​universe/​p/​plr/​postgresql-8.3-plr_8.3.0.6-3_amd64.deb
 +   dpkg -i postgresql-8.3-plr_8.3.0.6-3_amd64.deb
 +</​code>​
 +* Install PL/R in the database of choice
 +
 +<​code>​
 +   psql <​dbname>​ < /​usr/​share/​postgresql/​8.3/​plr.sql
 +</​code>​
 +* Test
 +
 +<code sql>
 +  CREATE OR REPLACE FUNCTION r_max (integer, integer) RETURNS integer AS '
 +    if (arg1 > arg2)
 +       ​return(arg1)
 +    else
 +       ​return(arg2)
 +  ' LANGUAGE '​plr'​ STRICT;
 +
 +  SELECT r_max(10,​20);​
 +</​code>​
 +==== Using ====
 +Have a look at the guide here http://​www.joeconway.com/​plr/​doc/​index.html
 +
 +====== Useful commands ======
 +===== Inserting from CSV Files =====
 +To efficiently insert a lot of records from csv-files into the table "​tolldata"​ use the [[http://​www.postgresql.org/​docs/​9.1/​static/​sql-copy.html]] COPY command:
 +
 +<code bash>
 +cat *.csv | psql -c "COPY tolldata FROM stdin WITH DELIMITER ';'"​ -d toll_tst -U toll -h localhost
 +</​code>​
 +Files named in a COPY command are read or written directly by the server, not by the client application. Therefore, they must //reside on or be accessible to the database server machine//, not the client. They must be accessible to and readable or writable by the PostgreSQL user (the user ID the server runs as), not the client. COPY naming a file is only allowed to database superusers, since it allows reading or writing any file that the server has privileges to access.
 +
 +===== Outputting in CSV Format =====
 +If you need to export an entire table
 +
 +<​code>​
 +COPY table TO '/​tmp/​table.csv'​ DELIMITER ','​ CSV HEADER;
 +</​code>​
 +
 +If query is more complicated: ​
 +
 +<code sql>
 +--- toggle to unaligned output mode
 +\a
 +--- set ; as field delimiter
 +\f ;
 +--- disable the footer (row count)
 +\pset footer off
 +
 +SELECT * FROM table \g file.csv
 +</​code>​
 +===== Piping query result to file =====
 +In psql command line:
 +
 +<code sql>
 +SELECT * FROM table \g filename
 +</​code>​
 +([[http://​archives.postgresql.org/​pgsql-general/​2001-12/​msg00687.php see more details]])
 +
 +===== Show Tables =====
 +<code sql>
 +  \dt
 +or
 +  \dt my_schema
 +</​code>​
 +===== drop all tables in a schema =====
 +<code sql>
 + ​select 'drop table '​|schemaname|'​.'​|tablename|';'​ from pg_tables where schemaname in ('​public'​) order by schemaname, tablename;
 +</​code>​
 +===== drop all sequences in a schema =====
 +<code sql>
 + ​select 'drop sequence public.'​|sequence_name|';'​ from information_schema.sequences where sequence_schema in ('​public'​) order by  sequence_name;​
 +</​code>​
 +===== Grant on all tables =====
 +<code sql>
 + ​select 'grant select, insert, update, delete on ' | tablename | ' to muh;' from pg_tables where schemaname = '​public';​
 +</​code>​
 +
 +====== Backups ======
 +
 +There are many options available on how to back up your databases. Postgres already contains some tools itself:
 +  * ''​pg_dump'':​ per-database dumping of db contents
 +  * ''​pg_basebackup'':​ binary backup of whole postgres clusters (per db backup is not possible)
 +
 +However, especially for larger databases (hundreds of GB) more powerful third-party tools featuring parallelism ​ and incremental backups are [[http://​bonesmoses.org/​2017/​02/​03/​pg-phriday-getting-back-up|recommended]]:​
 +  * [[https://​pgbackrest.org|pgBackRest]]
 +  * [[https://​www.pgbarman.org|pgBarman]]
 +
 +
 +===== pg_dump =====
 +
 +It makes sense to use the postgres user
 +<code bash>
 +sudo su postgres
 +</​code>​
 +
 +There are two important variants of making backups:
 +
 +Create and restore a backup using plain-text sql: ''​pg_dump''​ and ''​psql''​
 +<code bash>
 +pg_dump -U user source_db -f db_backup.sql
 +
 +# restore
 +createdb target_db
 +psql -U user -d target_db -f db_backup.sql
 +</​code>​
 +
 +Create and restore a backup using the compressed '​custom'​ format: ''​pg_dump''​ and ''​pg_restore''​
 +<code bash>
 +pg_dump -Fc -U user source_db > db_backup.dmp
 +
 +# restore (requires an empty database)
 +createdb target_db
 +pg_restore -Fc -d target_db db_backup.dmp
 +</​code>​
 +
 +In addition we can use gzip (and zcat) to further compress the compressed '​custom'​ format and give the dump a nice name:
 +<code bash>
 +pg_dump -Fc -U user source_db | gzip > "​$(date +%Y-%m-%d)_db_backup.dmp.gz"​
 +zcat YYYY-MM-DD_db_backup.dmp.gz | pg_restore -Fc -d target_db
 +</​code>​
 +
 +Or, when creating the backup via cronjob in ''/​etc/​cron.d'':​
 +<​code>​
 +  0   ​5 ​  ​* ​  ​* ​  ​* ​  ​postgres ​ pg_dump -Fc dbname | gzip > /​path/​to/​db_backup_`date +\%Y-\%m-\%dT\%H_\%M`.dmp.gz
 +# ┬   ​┬ ​  ​┬ ​  ​┬ ​  ​┬ ​  ​┬ ​        ┬
 +# │   ​│ ​  ​│ ​  ​│ ​  ​| ​  ​│ ​        └ command
 +# │   ​│ ​  ​│ ​  ​│ ​  ​| ​  ​└───── user executing the command
 +# │   ​│ ​  ​│ ​  ​│ ​  ​└───────── day of week (0 - 7) (0 to 7 are Sunday to Sunday, or use names)
 +# │   ​│ ​  ​│ ​  ​└───────────── month (1 - 12)
 +# │   ​│ ​  ​└───────────────── day of month (1 - 31)
 +# │   ​└───────────────────── hour (0 - 23)
 +# └───────────────────────── min (0 - 59)
 +</​code>​
 +
 +===== pg_basebackup =====
 +
 +see e.g. [[https://​blog.sleeplessbeastie.eu/​2016/​02/​15/​how-to-perform-base-postgresql-backup-using-pg_basebackup-utility|this tutorial]]
 +
 +===== pgBackRest =====
 +
 +A typical use of pgBackRest would be to install a backup server that backs up several DB servers. For now we only use a simpler approach: pgBackRest backing up on localhost. For more see the [[https://​pgbackrest.org/​user-guide.html|excellent user guide]] or [[https://​pgstef.github.io/​2018/​01/​04/​introduction_to_pgbackrest.html|this good introduction blog post]].
 +
 +Install from postgres repo:
 +
 +<code bash>
 +sudo add-apt-repository "deb http://​apt.postgresql.org/​pub/​repos/​apt/​ xenial-pgdg main"
 +wget --quiet -O - https://​www.postgresql.org/​media/​keys/​ACCC4CF8.asc | sudo apt-key add -
 +sudo apt update
 +sudo apt install pgbackrest
 +</​code>​
 +
 +==== Incremental Backups on Localhost ====
 +
 +In addition to the global section, where e.g. retention is configured, you must add a so-called "​stanza"​ for each cluster you want to back up. Make sure the repo1-path exists and is owned by user postgres.
 +
 +<file - /​etc/​pgbackrest/​pgbackrest.conf>​
 +[global]
 +repo1-path=/​var/​lib/​pgbackrest
 +repo1-retention-full=5
 +repo1-retention-diff=1
 +
 +[STANZANAME]
 +pg1-path=/​var/​lib/​postgresql/​9.5/​CLUSTERNAME
 +</​file>​
 +
 +Then adjust the postgres config to allow for proper backups
 +
 +<file - /​etc/​postgresql/​9.5/​main/​postgresql.conf>​
 +listen_addresses = '​*'​
 +
 +# WRITE AHEAD LOG
 +wal_level = hot_standby ​
 +archive_mode = on 
 +archive_command = '​pgbackrest --stanza=STANZANAME archive-push %p'
 +
 +# REPLICATION
 +max_wal_senders = 3
 +
 +# ERROR REPORTING AND LOGGING
 +log_line_prefix = ''​
 +</​file>​
 +
 +Reload the cluster (or the complete postgres server) to activate changes
 +
 +  sudo systemctl restart postgresql@9.5-CLUSTERNAME
 +  ​
 +Create and test the pgbackrest stanza
 +
 +  sudo -u postgres pgbackrest --stanza=STANZANAME --log-level-console=info stanza-create
 +  sudo -u postgres pgbackrest --stanza=STANZANAME --log-level-console=info check
 +
 +Make backups manually (e.g. from a cronjob)
 +
 +  sudo -u postgres pgbackrest --stanza=STANZANAME --type=full --log-level-console=detail backup
 +  sudo -u postgres pgbackrest --stanza=STANZANAME --type=incr --log-level-console=detail backup
 +
 +Make **regular backups via cronjob**
 +
 +<file - /​etc/​cron.d/​pgbackrest>​
 +#minute hour day-of-month month day-of-week user command
 +0 4 * * 0   ​postgres pgbackrest --type=full --stanza=STANZANAME backup
 +0 4 * * 1-6 postgres pgbackrest --type=diff --stanza=STANZANAME backup
 +</​file>​
 +
 +**Restore latest backup.** The cluster must be stopped and the cluster directory must exist and be empty.
 +
 +  sudo systemctl stop postgresql@9.5-CLUSTERNAME
 +  sudo mv /​var/​lib/​postgresql/​9.5/​CLUSTERNAME /​opt/​backups/​CLUSTERNAME_backup
 +  sudo -u postgres mkdir /​var/​lib/​postgresql/​9.5/​CLUSTERNAME
 +  sudo chmod 0700 /​var/​lib/​postgresql/​9.5/​CLUSTERNAME
 +  sudo -u postgres pgbackrest --stanza=STANZANAME --log-level-console=detail restore
 +  sudo systemctl start postgresql@9.5-CLUSTERNAME
 +
 +To **restore a specific backup** use ''​%%--set%%''​ with the name of a folder in ''/​var/​lib/​pgbackrest/​backup/​STANZANAME'',​ e.g.:
 +
 +  sudo -u postgres pgbackrest --set=20180724-144511F --stanza=STANZANAME --log-level-console=detail restore
 +
 +You can also copy the contents of ''/​var/​lib/​pgbackrest''​ to a different device and restore it there.
 +
 +Log files can be found in ''/​var/​log/​pgbackrest''​.
 +====== Performance Tuning ======
 +
 +[[Monitoring Server Activity]]
 +===== References =====
 +  * See http://​samiux.wordpress.com/​2009/​07/​26/​howto-performance-tuning-for-postgresql-on-ubuntudebian/​ for a good introduction and some reasonable default values. ​
 +  * See http://​www.postgresql.org/​docs/​8.4/​interactive/​kernel-resources.html#​SYSVIPC for Kernel Parameters that may be relevant when tuning Postgres Performance on Linux Systems
 +  * Note that while SHMMAX is measured in bytes SHMALL is measured in **Pages**
 +    * The default page size on a 32bit Linux system is usually 4096 bytes (check with the command //getconf PAGE_SIZE//​). So if you change SHMMAX you should also check that // SHMALL >= SHMMAX / PAGE_SIZE // 
  
postgresql.txt · Last modified: 2020/07/15 09:42 by mstraub