====== Installation & Setup ======
For Ubuntu 12.04:
sudo apt-get install postgresql-9.1 pgadmin3
pgAdmin III is a highly recommended GUI for managing PostgreSQL. Nevertheless, here are the first steps on the command line right after installing.
===== Server Configuration =====
==== Enable remote TCP/IP connections ====
Allow TCP/IP connections from all IPs (default = localhost only) by adding this line to ''/etc/postgresql/9.1/main/postgresql.conf'':
listen_addresses = '*'
Then allow password authentication for all users from all IPs by adding this line to ''/etc/postgresql/9.1/main/pg_hba.conf'':
host all all 0.0.0.0/0 md5
To further twiddle with more restrictive client authentication settings have a look at [[http://www.postgresql.org/docs/9.1/static/auth-pg-hba-conf.html|pg_hba.conf]].
===== User Management =====
First steps with a user of your choosing
# log in with user created by default (which has no password set)
sudo -u postgres psql
# create a user with your linux user name and log in
# (in Ubuntu the postgres user that matches your linux user name
# can by default log in without supplying a password)
sudo -u postgres createuser --superuser $USER
psql -d postgres
# set a password
\password
# make an existing user superuser (or remove superuser rights)
sudo -u postgres psql -c "ALTER USER theuser WITH SUPERUSER;"
sudo -u postgres psql -c "ALTER USER theuser WITH NOSUPERUSER;"
# create a restricted user with password (long and short version)
sudo -u postgres createuser --no-superuser --no-createdb --no-createrole --pwprompt username
sudo -u postgres createuser -SDRP username
psql -U username -d postgres
# drop a user
sudo -u postgres dropuser username
==== Setting a password for user postgres ====
# inside a psql session of user postgres (sudo -u postgres psql)
\password postgres
==== Create Readonly User ====
**1. Create the user**
sudo -u postgres createuser --no-superuser --no-createdb --no-createrole --pwprompt username
**2. Grant readonly permissions**
This is the tedious part (in Postgres versions before 9.0). We need to grant the user permission to
* connect to the DB
* use the schema
* perform a SELECT on every object in the schema
GRANT CONNECT ON DATABASE mydb TO username;
GRANT USAGE ON SCHEMA public TO username;
-- Generate GRANT SELECT statements for all tables/views in the schema
SELECT 'GRANT SELECT ON public.' || relname || ' TO username;'
FROM pg_class JOIN pg_namespace ON pg_namespace.oid = pg_class.relnamespace
WHERE nspname = 'public' AND relkind IN ('r', 'v');
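On newer psql versions (9.6+) the generated statements can be executed directly with ''\gexec'' instead of copy-pasting them. A sketch of the same query using ''format()'' (which also quotes odd table names properly):

```sql
-- generate GRANT SELECT statements and let psql execute each result row (psql 9.6+)
SELECT format('GRANT SELECT ON public.%I TO username;', relname)
FROM pg_class JOIN pg_namespace ON pg_namespace.oid = pg_class.relnamespace
WHERE nspname = 'public' AND relkind IN ('r', 'v')
\gexec
```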
=== Slightly different approach ===
Creating a true read-only user is a real pain in Postgres. Just revoking privileges from a user still allows them to create database objects which they own. You need to revoke the CREATE and USAGE privileges from the SCHEMA public and the GROUP public.
-- Revoke default permissions from public group:
REVOKE CREATE ON SCHEMA public FROM PUBLIC;
REVOKE USAGE ON SCHEMA public FROM PUBLIC;
-- Add back permissions for the users allowed to create stuff
GRANT CREATE ON SCHEMA public TO rw_user;
GRANT USAGE ON SCHEMA public TO rw_user;
-- Add SELECT permissions for read-only user
GRANT USAGE ON SCHEMA public to readonly_user;
GRANT SELECT ON my_table to readonly_user;
see http://www.ruizs.org/archives/89 for details.
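Since PostgreSQL 9.0 the read-only grants do not have to be repeated for every newly created table; default privileges can cover future tables as well. A sketch using the ''rw_user'' and ''readonly_user'' roles from above:

```sql
-- tables created later by rw_user in schema public become
-- readable by readonly_user automatically
ALTER DEFAULT PRIVILEGES FOR ROLE rw_user IN SCHEMA public
    GRANT SELECT ON TABLES TO readonly_user;
```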
==== Grant on all tables ====
With a single [[http://www.postgresql.org/docs/current/static/sql-grant.html|GRANT command]] permission for all tables in a schema can be changed:
GRANT ALL ON ALL TABLES IN SCHEMA public TO user;
==== Change owner on all tables ====
CLI: Unfortunately this can only be achieved via scripting, but we don't have to reinvent the wheel: https://gist.github.com/jirutka/afa3ce62b1430abf7572
GUI: Or in pgAdmin III right-click on the schema, go to properties, and set the owner to your liking.
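A minimal sketch of the scripting approach (assuming schema ''public'' and a hypothetical new owner ''newowner''): generate one ALTER TABLE statement per table and run the output, e.g. via psql.

```sql
-- generate ALTER TABLE ... OWNER TO statements for all tables in a schema
SELECT 'ALTER TABLE public.' || tablename || ' OWNER TO newowner;'
FROM pg_tables
WHERE schemaname = 'public';
```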
===== PSQL meta-commands =====
After you have logged in to PostgreSQL it's good to know some often-used meta-commands (read more in ''man psql''):
-- list all users
\du
-- list all schemas
\dn
-- list tables
\d
\d+ -- (also show comments)
\d tablename -- show details of a table
-- list all tables / tables of a certain schema
\dt *.* -- list all tables in DB
\dt schemaname.* -- list all tables in a certain schema
-- list all views
\dv *.* -- list all views in DB
-- list all databases on the host
\l
-- connect to a database
\c dbname
-- execute a command on the Linux shell (as-is)
\! command
-- quit
\q
For more meta-commands check out [[http://www.postgresql.org/docs/9.1/static/app-psql.html|the documentation]].
===== List and Disconnect Active Sessions =====
-- list active connections (and get procpids)
SELECT * FROM pg_stat_activity;
-- cancel a running query but leave the user's session open
SELECT pg_cancel_backend(procpid);
-- terminate a user's query and session
SELECT pg_terminate_backend(procpid);
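The two queries can be combined, e.g. to kick every connection to a given database before dropping it. Note that since PostgreSQL 9.2 the process-id column in ''pg_stat_activity'' is called ''pid'' instead of ''procpid''; the database name ''mydb'' is a placeholder:

```sql
-- terminate all sessions connected to mydb, except our own (9.2+ column names)
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = 'mydb'
  AND pid <> pg_backend_pid();
```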
====== DB Management ======
===== Show DB Sizes =====
SELECT pg_database.datname,
pg_size_pretty(pg_database_size(pg_database.datname)) AS size
FROM pg_database
ORDER BY pg_database_size(pg_database.datname) desc;
===== Find large databases =====
SELECT d.datname AS Name, pg_catalog.pg_get_userbyid(d.datdba) AS Owner,
CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(d.datname))
ELSE 'No Access'
END AS Size
FROM pg_catalog.pg_database d
ORDER BY
CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
THEN pg_catalog.pg_database_size(d.datname)
ELSE NULL
END DESC -- nulls first
LIMIT 20;
see also: https://wiki.postgresql.org/wiki/Disk_Usage
===== Find large tables (in one DB) =====
SELECT nspname || '.' || relname AS "relation",
pg_size_pretty(pg_relation_size(C.oid)) AS "size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_relation_size(C.oid) DESC
LIMIT 20;
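''pg_relation_size'' only counts the main data fork of each relation; to account for indexes and TOAST data as well, the same query can be run with ''pg_total_relation_size'':

```sql
-- largest tables including their indexes and TOAST data
SELECT nspname || '.' || relname AS "relation",
       pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
  AND C.relkind = 'r'
ORDER BY pg_total_relation_size(C.oid) DESC
LIMIT 20;
```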
===== Show Table size =====
-- total size of a table, including indexes and TOAST (replace tablename)
SELECT pg_size_pretty(pg_total_relation_size('tablename'));
-- size of indices
SELECT c2.relname, c2.relpages * 8 / 1024 AS "MB"
FROM pg_class c, pg_class c2, pg_index i
WHERE c.relname = 'tablename' AND
c.oid = i.indrelid AND
c2.oid = i.indexrelid
ORDER BY c2.relname;
====== SQL ======
===== Date and Time =====
* http://www.postgresql.org/docs/9.1/interactive/datatype-datetime.html
* http://www.postgresql.org/docs/9.1/static/functions-datetime.html
==== Add 15-min interval to time ====
SELECT DATE(timestamp) + (interval_id) * time '00:15:00' FROM ...
==== Extract date-parts ====
select extract(minute from timestamp '2008-08-01 12:14:13');
see http://www.postgresql.org/docs/8.3/static/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT
==== Extract TIME from TIMESTAMP ====
SELECT "time"(mytimestamp) FROM table;
-- or equivalently with a cast
SELECT mytimestamp::time FROM table;
==== Convert Timestamp to Seconds ====
To convert a timestamp / interval to seconds (in the case of a timestamp: seconds since 1970-01-01):
SELECT EXTRACT(EPOCH FROM timestamp_or_interval);
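Two concrete examples (an interval simply becomes its length in seconds; a timestamp becomes seconds since the epoch):

```sql
SELECT EXTRACT(EPOCH FROM INTERVAL '1 day 2 hours');
-- 93600
SELECT EXTRACT(EPOCH FROM TIMESTAMP '1970-01-02 00:00:00');
-- 86400
```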
==== Creating Timestamps from Text Fields with missing leading Zeros ====
If your data looks like this:
"2011/3/15 ";"6:17:3 "
You can construct timestamps using the "FM" (fill mode) modifier:
SELECT to_timestamp(utc_date||' '||utc_time,'YYYY/FMMM/FMDD FMHH24:FMMI:FMSS');
==== Update parts of date ====
update datetesting set d = to_timestamp( to_char (d,'YYYY-MM-DD 12:00:00'),'YYYY-MM-DD HH24:MI:SS' ) ;
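The same effect (forcing the time part to 12:00:00 while keeping the date) can be achieved without the string round-trip by truncating to the day and adding a fixed interval:

```sql
-- truncate to midnight, then add 12 hours
UPDATE datetesting SET d = date_trunc('day', d) + INTERVAL '12 hours';
```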
==== Time Zone Information ====
Boy, that was confusing, so here goes...
=== Converting from string to TIMESTAMP ===
select '2013-05-18 12:30:00'::TIMESTAMP WITH TIME ZONE ;
-- 2013-05-18 12:30:00+02 no timezone info provided. Postgres simply assumes the current TIME ZONE
select '2013-05-18 12:30:00+00'::TIMESTAMP WITH TIME ZONE as with_tz, '2013-05-18 12:30:00+05'::TIMESTAMP WITHOUT TIME ZONE as without_tz;
--         with_tz         |     without_tz
-- ------------------------+---------------------
--  2013-05-18 14:30:00+02 | 2013-05-18 12:30:00
**Notes**
* When no TIME ZONE info is provided, but Postgres wants TIME ZONE info, then it simply assumes the current TIME ZONE of the session.
* Timestamps WITH TIME ZONE are displayed in the TIME ZONE of the current session. Unless explicitly requested otherwise (see later).
* The TIME ZONE part of a string will simply be **ignored** if you are converting to TIMESTAMP WITHOUT TIME ZONE
* this also applies when you insert to a column of type TIMESTAMP WITHOUT TIME ZONE
SELECT '2013-05-18 12:30:00+06'::TIMESTAMP;
-- 2013-05-18 12:30:00 --> the +06 has been completely ignored
SELECT '2013-05-18 12:30:00+06'::TIMESTAMP WITH TIME ZONE;
-- 2013-05-18 08:30:00+02 --> 6:30 UTC --> 8:30CET (+02)
=== Explicitly setting the TIME ZONE ===
set TIME ZONE 'UTC';
select '2013-05-18 12:30:00+00'::TIMESTAMP WITH TIME ZONE;
-- 2013-05-18 12:30:00+00
set TIME ZONE 'CET';
select '2013-05-18 12:30:00+00'::TIMESTAMP WITH TIME ZONE;
-- 2013-05-18 14:30:00+02
=== Converting from epoch values to TIMESTAMP ===
select to_timestamp(1358749391);
--> "2013-01-21 07:23:11+01"
**Notes:**
* Postgres converts the epoch value to TIMESTAMP WITH TIME ZONE. That's why the result is displayed in the session's local time zone (here +01, i.e. CET / winter time).
=== Discard TIME ZONE Information ===
select to_timestamp(1358749391)::timestamp without time zone;
--> "2013-01-21 07:23:11"
**Notes**
* does the same thing as before, but also discards the time zone information and returns a TIMESTAMP WITHOUT TIME ZONE
=== Convert to other TIME ZONE ===
select
to_timestamp(1358749391) at time zone 'utc';
--> "2013-01-21 06:23:11"
**Notes**
* Displays the value at time zone UTC --> 6:23
* The result is a TIMESTAMP WITHOUT TIME ZONE. In this case postgres effectively removes the time zone information from the result
* see http://www.postgresql.org/docs/9.2/static/functions-datetime.html#FUNCTIONS-DATETIME-ZONECONVERT
===== Useful Functions =====
==== Aggregate Concat ====
CREATE AGGREGATE textcat_all(
basetype = text,
sfunc = textcat,
stype = text,
initcond = ''
);
**Usage:**
SELECT company_id, textcat_all(employee || ', ') FROM mytable GROUP BY company_id;
==== row_number ====
This is useful e.g. if you need an OID where none is present:
SELECT row_number() OVER () AS oid FROM table;
In PostgreSQL the window definition may be left empty: since we do not require any particular order here, no ORDER BY clause is needed inside OVER.
==== Round time to interval ====
CREATE OR REPLACE FUNCTION round_time_to_interval(TIMESTAMP)
RETURNS TIMESTAMP AS $$
SELECT date_trunc('hour', $1) + INTERVAL '15 min' * FLOOR(date_part('minute', $1) / 15.0)
$$ LANGUAGE SQL;
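A quick usage example: 12:37 falls into the 12:30 slot, since floor(37 / 15) = 2 quarter-hours past the full hour.

```sql
SELECT round_time_to_interval(TIMESTAMP '2013-05-18 12:37:42');
-- 2013-05-18 12:30:00
```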
==== Rows to Columns ====
This is an obscure one, but it is very handy if you must transform a number of rows into a number of columns. This can be achieved with the crosstab() function provided by the tablefunc module of postgresql-contrib. So first we need to install postgresql-contrib and enable the function in our DB:
sudo apt-get install postgresql-contrib
sudo service postgresql-8.4 restart
cd /usr/share/postgresql/8.4/contrib
psql -U username -d dbname -h localhost -f tablefunc.sql
Next have a look at the documentation of crosstab here: http://www.postgresql.org/docs/8.4/static/tablefunc.html
Here is an example of how to aggregate a traffic state from rows to columns (for a number of known roads).
SELECT * FROM
crosstab(
'SELECT valid_from, road_id, q from ait.its_traffic_state WHERE road_id IN (23005242,23008577) order by 1',
'SELECT distinct road_id FROM ait.its_traffic_state WHERE road_id IN (23005242,23008577) order by 1'
)
AS its_traffic_state(time timestamp, road1 integer, road2 integer);
...for a higher number of columns you may need a script to generate the column_name/datatype row at the end of the query (if anyone knows a better solution, please tell).
===== Updating Sequence Counters =====
When data is inserted into the database with an explicit value for the auto-incrementing column, the underlying sequence is not touched and must be updated manually.
This is the case when e.g. importing dumps using COPY, or with a plain INSERT statement that sets the auto-incrementing column to a value.
The sequence counter then lags behind the values already present (e.g. it still starts at 0 while id 1 already exists), so creating new records fails because the newly generated id collides with an existing one (duplicate key).
The fix:
SELECT SETVAL('sequence_name', MAX(auto_incrementing_column)) FROM table_name;
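A concrete sketch for a (hypothetical) table ''mytable'' with a serial column ''id'', whose sequence is usually named ''mytable_id_seq'':

```sql
SELECT SETVAL('mytable_id_seq', (SELECT MAX(id) FROM mytable));
-- or let postgres look up the sequence name itself:
SELECT SETVAL(pg_get_serial_sequence('mytable', 'id'), (SELECT MAX(id) FROM mytable));
```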
===== SQL Hints =====
==== Timeseries completion ====
Assume you have an incomplete timeseries in your database but want to select a complete series (padding the missing values with zeros). The following query takes care of the padding, replacing missing values with 0:
SELECT intervals.intervals as interval, coalesce(q,0)
FROM timeseries t
RIGHT OUTER JOIN generate_series(0,24) intervals
ON t.interval = intervals.intervals ;
==== Timeseries creation ====
To create a series of timestamps
select * from
generate_series('2013-11-01'::timestamp, '2013-11-02'::timestamp, '15 minutes')
==== Convert Array to Rows ====
This might be useful to replace WHERE x IN ('a','b','c') statements (which can be slow in Postgres for long lists) with JOINs.
SELECT i, (ARRAY['a','b','c'])[i]
FROM generate_series(1,array_upper(ARRAY['a','b','c'],1)) i;
i | array
---+-------
1 | a
2 | b
3 | c
(3 rows)
It is not very elegant though especially as you need to specify the array twice (or know the size of the array, then you can replace the array_upper statement with the size). If someone has a better solution I would be interested :)
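Since 8.4 the built-in unnest() function does exactly this without repeating the array, and WITH ORDINALITY (9.4+) also yields the element index:

```sql
SELECT unnest(ARRAY['a','b','c']);
-- with the element index (9.4+):
SELECT * FROM unnest(ARRAY['a','b','c']) WITH ORDINALITY AS t(val, i);
```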
====== Stored Procedures ======
===== PL/Python =====
* http://www.postgresql.org/docs/8.3/interactive/plpython.html
==== Aggregate Functions in PL/Python ====
* Aggregates in Postgres (Basics) : [[http://www.postgresonline.com/periodical.php?i_id=7#item-67]]
* PL/Python Specific : [[http://www.postgresonline.com/journal/index.php?/archives/102-PLPython-Part-4-PLPython-meets-aggregates.html]]
===== PL/Java =====
==== Installing ====
Download the most recent version from http://pgfoundry.org/frs/?group_id=1000038 . Most of the instructions in this section are taken from the README file in the tarball.
* unzip pljava to /opt/pljava
* sudo vi /etc/postgresql/8.3/main/postgresql.conf
SET the following parameters:
dynamic_library_path = '$libdir:/opt/pljava'
log_min_messages = info #optional
custom_variable_classes = 'pljava'
ADD the following parameters:
pljava.classpath = '/opt/pljava/pljava.jar'
* update ldconfig so postgres finds the required shared objects. Create a file **/etc/ld.so.conf.d/pljava.conf** containing
/usr/lib/jvm/java-6-sun/jre/lib/amd64
/usr/lib/jvm/java-6-sun/jre/lib/amd64/server
/usr/lib/jvm/java-6-sun/jre/lib/amd64/native_threads
* run
sudo ldconfig
* install pl/java
sudo su postgres
psql -f install.sql
===== PL/R =====
==== Installing ====
* Install the postgres-plr package
sudo aptitude install postgresql-8.3-plr
* On Ubuntu 8.04 the package does not exist, but the Jaunty version works just fine. Search for the correct package on http://packages.ubuntu.com/search?keywords=postgresql-8.3-plr then do something like
wget http://de.archive.ubuntu.com/ubuntu/pool/universe/p/plr/postgresql-8.3-plr_8.3.0.6-3_amd64.deb
dpkg -i postgresql-8.3-plr_8.3.0.6-3_amd64.deb
* Install PL/R in the database of choice
psql < /usr/share/postgresql/8.3/plr.sql
* Test
CREATE OR REPLACE FUNCTION r_max (integer, integer) RETURNS integer AS '
if (arg1 > arg2)
return(arg1)
else
return(arg2)
' LANGUAGE 'plr' STRICT;
SELECT r_max(10,20);
==== Using ====
Have a look at the guide here http://www.joeconway.com/plr/doc/index.html
====== Useful commands ======
===== Inserting from CSV Files =====
To efficiently insert a lot of records from csv-files into the table "tolldata" use the [[http://www.postgresql.org/docs/9.1/static/sql-copy.html|COPY]] command:
cat *.csv | psql -c "COPY tolldata FROM stdin WITH DELIMITER ';'" -d toll_tst -U toll -h localhost
Files named in a COPY command are read or written directly by the server, not by the client application. Therefore, they must //reside on or be accessible to the database server machine//, not the client. They must be accessible to and readable or writable by the PostgreSQL user (the user ID the server runs as), not the client. COPY naming a file is only allowed to database superusers, since it allows reading or writing any file that the server has privileges to access.
===== Outputting in CSV Format =====
If you need to export an entire table
COPY table TO '/tmp/table.csv' DELIMITER ',' CSV HEADER;
If the query is more complicated:
--- toggle to unaligned output mode
\a
--- set ; as field delimiter
\f ;
--- disable the footer (row count)
\pset footer off
SELECT * FROM table \g file.csv
===== Piping query result to file =====
In psql command line:
SELECT * FROM table \g filename
([[http://archives.postgresql.org/pgsql-general/2001-12/msg00687.php see more details]])
===== Show Tables =====
\dt
or
\dt my_schema
===== drop all tables in a schema =====
select 'drop table '||schemaname||'.'||tablename||';' from pg_tables where schemaname in ('public') order by schemaname, tablename;
===== drop all sequences in a schema =====
select 'drop sequence public.'||sequence_name||';' from information_schema.sequences where sequence_schema in ('public') order by sequence_name;
===== Grant on all tables =====
select 'grant select, insert, update, delete on ' || tablename || ' to muh;' from pg_tables where schemaname = 'public';
====== Backups ======
There are many options available on how to back up your databases. Postgres already contains some tools itself:
* ''pg_dump'': per-database dumping of db contents
* ''pg_basebackup'': binary backup of whole postgres clusters (per db backup is not possible)
However, especially for larger databases (hundreds of GB) more powerful third-party tools featuring parallelism and incremental backups are [[http://bonesmoses.org/2017/02/03/pg-phriday-getting-back-up|recommended]]:
* [[https://pgbackrest.org|pgBackRest]]
* [[https://www.pgbarman.org|pgBarman]]
===== pg_dump =====
It makes sense to do this as the postgres user:
sudo su postgres
There are two important variants of making backups:
Create and restore a backup using plain-text sql: ''pg_dump'' and ''psql''
pg_dump -U user source_db -f db_backup.sql
# restore
createdb target_db
psql -U user -d target_db -f db_backup.sql
Create and restore a backup using the compressed 'custom' format: ''pg_dump'' and ''pg_restore''
pg_dump -Fc -U user source_db > db_backup.dmp
# restore (requires an empty database)
createdb target_db
pg_restore -Fc -d target_db db_backup.dmp
In addition we can use gzip (and zcat) to further compress the compressed 'custom' format and give the dump a nice name:
pg_dump -Fc -U user source_db | gzip > "$(date +%Y-%m-%d)_db_backup.dmp.gz"
zcat YYYY-MM-DD_db_backup.dmp.gz | pg_restore -Fc -d target_db
Or, when creating the backup via cronjob in ''/etc/cron.d'':
0 5 * * * postgres pg_dump -Fc dbname | gzip > /path/to/db_backup_`date +\%Y-\%m-\%dT\%H_\%M`.dmp.gz
# ┬ ┬ ┬ ┬ ┬ ┬ ┬
# │ │ │ │ │ │ └ command
# │ │ │ │ │ └───── user executing the command
# │ │ │ │ └───────── day of week (0 - 7) (0 to 7 are Sunday to Sunday, or use names)
# │ │ │ └───────────── month (1 - 12)
# │ │ └───────────────── day of month (1 - 31)
# │ └───────────────────── hour (0 - 23)
# └───────────────────────── min (0 - 59)
===== pg_basebackup =====
see e.g. [[https://blog.sleeplessbeastie.eu/2016/02/15/how-to-perform-base-postgresql-backup-using-pg_basebackup-utility|this tutorial]]
===== pgBackRest =====
A typical use of pgBackRest would be to install a backup server that backs up several DB servers. For now we only use a simpler approach: pgBackRest backing up on localhost. For more see the [[https://pgbackrest.org/user-guide.html|excellent user guide]] or [[https://pgstef.github.io/2018/01/04/introduction_to_pgbackrest.html|this good introduction blog post]].
Install from postgres repo:
sudo add-apt-repository "deb http://apt.postgresql.org/pub/repos/apt/ xenial-pgdg main"
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt update
sudo apt install pgbackrest
==== Incremental Backups on Localhost ====
In addition to the global section, where e.g. retention is configured, you must add a so-called "stanza" for each cluster you want to back up. Make sure the repo1-path exists and is owned by user postgres.
[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=5
repo1-retention-diff=1
[STANZANAME]
pg1-path=/var/lib/postgresql/9.5/CLUSTERNAME
Then adjust the postgres config to allow for proper backups
listen_addresses = '*'
# WRITE AHEAD LOG
wal_level = hot_standby
archive_mode = on
archive_command = 'pgbackrest --stanza=STANZANAME archive-push %p'
# REPLICATION
max_wal_senders = 3
# ERROR REPORTING AND LOGGING
log_line_prefix = ''
Reload the cluster (or the complete postgres server) to activate changes
sudo systemctl restart postgresql@9.5-CLUSTERNAME
Create and test the pgbackrest stanza
sudo -u postgres pgbackrest --stanza=STANZANAME --log-level-console=info stanza-create
sudo -u postgres pgbackrest --stanza=STANZANAME --log-level-console=info check
Make backups manually (e.g. from a cronjob)
sudo -u postgres pgbackrest --stanza=STANZANAME --type=full --log-level-console=detail backup
sudo -u postgres pgbackrest --stanza=STANZANAME --type=incr --log-level-console=detail backup
Make **regular backups via cronjob**
#minute hour day-of-month month day-of-week user command
0 4 * * 0 postgres pgbackrest --type=full --stanza=STANZANAME backup
0 4 * * 1-6 postgres pgbackrest --type=diff --stanza=STANZANAME backup
**Restore latest backup.** The cluster must be stopped and the cluster directory must exist and be empty.
sudo systemctl stop postgresql@9.5-CLUSTERNAME
sudo mv /var/lib/postgresql/9.5/CLUSTERNAME /opt/backups/CLUSTERNAME_backup
sudo -u postgres mkdir /var/lib/postgresql/9.5/CLUSTERNAME
sudo chmod 0700 /var/lib/postgresql/9.5/CLUSTERNAME
sudo -u postgres pgbackrest --stanza=STANZANAME --log-level-console=detail restore
sudo systemctl start postgresql@9.5-CLUSTERNAME
To **restore a specific backup** use ''%%--set%%'' with the name of a folder in ''/var/lib/pgbackrest/backup/STANZANAME'', e.g.:
sudo -u postgres pgbackrest --set=20180724-144511F --stanza=STANZANAME --log-level-console=detail restore
You can also copy the contents of ''/var/lib/pgbackrest'' to a different device and restore it there.
Log files can be found in ''/var/log/pgbackrest''.
====== Performance Tuning ======
[[Monitoring Server Activity]]
===== References =====
* See http://samiux.wordpress.com/2009/07/26/howto-performance-tuning-for-postgresql-on-ubuntudebian/ for a good introduction and some reasonable default values.
* See http://www.postgresql.org/docs/8.4/interactive/kernel-resources.html#SYSVIPC for Kernel Parameters that may be relevant when tuning Postgres Performance on Linux Systems
* Note that while SHMMAX is measured in bytes SHMALL is measured in **Pages**
* The default page size on a 32bit Linux system is usually 4096 bytes (check with the command //getconf PAGE_SIZE//). So if you change SHMMAX you should also check that // SHMALL >= SHMMAX / PAGE_SIZE //