
Installation & Setup

For Ubuntu 12.04:

sudo apt-get install postgresql-9.1 pgadmin3

pgAdmin III is a strongly recommended GUI for twiddling with your PostgreSQL. The following sections, however, cover the first steps on the command line right after installing.

Server Configuration

Enable remote TCP/IP connections

Allow TCP/IP connections from all IPs (default = localhost only) by adding this line to /etc/postgresql/9.1/main/postgresql.conf:

listen_addresses = '*'

Then allow password authentication for all users from all IPs by adding this line to /etc/postgresql/9.1/main/pg_hba.conf:

host    all         all         0.0.0.0/0               md5

To set up more restrictive client authentication rules have a look at the pg_hba.conf documentation; a simple example follows below.
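
For example, password logins could be limited to a single database and a local subnet (mydb, myuser and 192.168.1.0/24 are placeholders) with a pg_hba.conf line like this:

host    mydb        myuser      192.168.1.0/24          md5

Note that changes to listen_addresses require a server restart, while changes to pg_hba.conf only require a reload (e.g. sudo service postgresql reload).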

User Management

First steps with a user of your choosing

# log in with user created by default (which has no password set)
sudo -u postgres psql
 
# create a superuser with your Linux user name and log in
# (on Ubuntu a PostgreSQL user whose name matches your Linux user
# name can by default log in without supplying a password)
sudo -u postgres createuser --superuser $USER
psql -d postgres
# set a password
\password
 
# create a restricted user with password (long and short version)
sudo -u postgres createuser --no-superuser --no-createdb --no-createrole --pwprompt username
sudo -u postgres createuser -SDRP username
psql -U username -d postgres
# drop a user
sudo -u postgres dropuser username

Setting a password for user postgres

ALTER USER postgres WITH password 'password';

Create Readonly User

1. Create the user

sudo -u postgres createuser --no-superuser --no-createdb --no-createrole --pwprompt username

2. Grant readonly permissions

This is the tedious part (in PostgreSQL versions before 9.0). We need to grant the user the permission to

  • connect to the DB
  • use the schema
  • perform a SELECT on every object in the schema
GRANT CONNECT ON DATABASE mydb TO username;
GRANT USAGE ON SCHEMA public TO username;
 
-- Generate GRANT SELECT statements for all tables/views in the schema
SELECT 'GRANT SELECT ON public.' || relname || ' TO username;'
FROM pg_class JOIN pg_namespace ON pg_namespace.oid = pg_class.relnamespace
WHERE nspname = 'public' AND relkind IN ('r', 'v');
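
The generated statements then have to be executed manually (e.g. by pasting them back into psql). Since PostgreSQL 9.0 a single statement grants SELECT on all existing tables in the schema:

GRANT SELECT ON ALL TABLES IN SCHEMA public TO username;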

Slightly different approach

Creating a true read-only user is a real pain in PostgreSQL. Just revoking privileges from a user still allows them to create database objects that they own. You need to revoke the CREATE and USAGE privileges on the schema public from the group PUBLIC.

-- Revoke default permissions from public group:
REVOKE CREATE ON SCHEMA public FROM PUBLIC;
REVOKE USAGE ON SCHEMA public FROM PUBLIC;
 
-- Add back permissions for the users allowed to create stuff
GRANT CREATE ON SCHEMA public TO rw_user;
GRANT USAGE ON SCHEMA public TO rw_user;
 
-- Add SELECT permissions for read-only user
GRANT USAGE ON SCHEMA public TO readonly_user;
GRANT SELECT ON my_table TO readonly_user;

see http://www.ruizs.org/archives/89 for details.

Grant on all tables

Since PostgreSQL 9.0, permissions for all tables in a schema can be changed with a single GRANT command:

GRANT ALL ON ALL TABLES IN SCHEMA public TO username;
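
This only affects tables that already exist. Since 9.0 default privileges can also be set so that tables created later (by the role running the command) are covered as well; a minimal sketch, assuming the placeholder role name username:

ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT ALL ON TABLES TO username;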

Change owner on all tables

CLI: There is no single built-in command for this, but we don't have to reinvent the wheel: https://gist.github.com/jirutka/afa3ce62b1430abf7572

GUI: In pgAdmin III right-click the schema, go to Properties, and set the owner to your liking.
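
Alternatively, the generate-and-execute approach used for grants above works here as well; a minimal sketch, assuming all tables in schema public should be handed over to the placeholder role new_owner:

-- generate ALTER TABLE ... OWNER TO statements for all tables in the schema
SELECT 'ALTER TABLE public.' || tablename || ' OWNER TO new_owner;'
FROM pg_tables
WHERE schemaname = 'public';

If everything owned by one role should change hands, REASSIGN OWNED BY old_owner TO new_owner; does it in a single statement.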

PSQL meta-commands

After logging in to PostgreSQL it's good to know some often-used meta-commands (read more in man psql):

-- list all users
\du
-- list all schemas
\dn
-- list all tables (and views) in the current database / show details of table
\d
\d+  -- (also show comments)
\d tablename
-- list all tables of a certain schema
\dt schemaname.*
-- list all databases on the host
\l
-- connect to a database
\c dbname
-- execute a command on the Linux shell (as-is)
\! command
-- quit
\q

For more meta-commands check out the documentation.

List and Disconnect Active Sessions

-- list active connections (and get procpids)
SELECT * FROM pg_stat_activity;
 
-- cancel a process but leave the user's session open
SELECT pg_cancel_backend( <procpid> );
 
-- terminate a user's query and session
SELECT pg_terminate_backend( <procpid> );
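
Note that from PostgreSQL 9.2 on the column is called pid instead of procpid. A sketch for disconnecting all sessions of one database (mydb is a placeholder; use procpid on pre-9.2 servers):

SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE datname = 'mydb'
  AND pid <> pg_backend_pid();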

DB Management

Show DB Sizes

SELECT pg_database.datname,  
       pg_size_pretty(pg_database_size(pg_database.datname)) AS SIZE  
  FROM pg_database 
  ORDER BY pg_database_size(pg_database.datname) DESC;

Find large databases

SELECT d.datname AS Name,  pg_catalog.pg_get_userbyid(d.datdba) AS Owner,
    CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
        THEN pg_catalog.pg_size_pretty(pg_catalog.pg_database_size(d.datname))
        ELSE 'No Access'
    END AS SIZE
FROM pg_catalog.pg_database d
    ORDER BY
    CASE WHEN pg_catalog.has_database_privilege(d.datname, 'CONNECT')
        THEN pg_catalog.pg_database_size(d.datname)
        ELSE NULL
    END DESC -- nulls first
    LIMIT 20

see also: https://wiki.postgresql.org/wiki/Disk_Usage

Find large tables (in one DB)

SELECT nspname || '.' || relname AS "relation",
    pg_size_pretty(pg_relation_size(C.oid)) AS "size"
  FROM pg_class C
  LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
  WHERE nspname NOT IN ('pg_catalog', 'information_schema')
  ORDER BY pg_relation_size(C.oid) DESC
  LIMIT 20;

Show Table size

-- total size of the table (including indexes and TOAST)
SELECT pg_size_pretty(pg_total_relation_size('<TABLE_NAME>'));
 
-- size of indices
SELECT c2.relname, c2.relpages * 8 / 1024 AS "MB"
FROM pg_class c, pg_class c2, pg_index i
WHERE c.relname = '<TABLE_NAME>' AND
      c.oid = i.indrelid AND
      c2.oid = i.indexrelid
ORDER BY c2.relname;
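
Since PostgreSQL 9.0 there are also convenience functions that split this up (same placeholder table name as above):

-- table data only (including TOAST, excluding indexes)
SELECT pg_size_pretty(pg_table_size('<TABLE_NAME>'));
-- total size of all indexes of the table
SELECT pg_size_pretty(pg_indexes_size('<TABLE_NAME>'));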

SQL

Date and Time

Add 15-min interval to time

   SELECT DATE(TIMESTAMP) + (interval_id) * TIME '00:15:00' FROM ...

Extract date-parts

   SELECT EXTRACT(MINUTE FROM TIMESTAMP '2008-08-01 12:14:13');

see http://www.postgresql.org/docs/8.3/static/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT

Extract TIME from TIMESTAMP

SELECT "time"(mytimestamp) FROM TABLE; 
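
An equivalent way is a plain cast (mytimestamp and mytable are placeholders):

SELECT mytimestamp::TIME FROM mytable;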

Convert Timestamp to Seconds

To convert a timestamp or interval to seconds (in case of a timestamp this yields the seconds since 1970-01-01, the Unix epoch):

   SELECT EXTRACT (EPOCH FROM  timestamp_or_interval );
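
For example, extracting the epoch from an interval simply returns its length in seconds:

   SELECT EXTRACT(EPOCH FROM INTERVAL '5 minutes');
   -- 300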

Creating Timestamps from Text Fields with missing leading Zeros

If your data looks like this:

"2011/3/15 ";"6:17:3 "

You can construct timestamps using the “FM” (fill mode) modifier:

   SELECT to_timestamp(utc_date || ' ' || utc_time, 'YYYY/FMMM/FMDD FMHH24:FMMI:FMSS');

Update parts of date

UPDATE datetesting SET d = to_timestamp( to_char (d,'YYYY-MM-DD 12:00:00'),'YYYY-MM-DD HH24:MI:SS' ) ;

Time Zone Information

Boy that was confusing so here it goes…

Converting from string to TIMESTAMP

SELECT '2013-05-18 12:30:00'::TIMESTAMP WITH TIME ZONE ; 
-- 2013-05-18 12:30:00+02         no timezone info provided. Postgres simply assumes the current TIME ZONE
 
SELECT '2013-05-18 12:30:00+00'::TIMESTAMP WITH TIME ZONE AS with_tz, '2013-05-18 12:30:00+05'::TIMESTAMP WITHOUT TIME ZONE AS without_tz;
--         with_tz         |     without_tz
-- ------------------------+---------------------
--  2013-05-18 14:30:00+02 | 2013-05-18 12:30:00

Notes

  • When no TIME ZONE info is provided, but Postgres expects TIME ZONE info, it simply assumes the current TIME ZONE of the session.
  • Timestamps WITH TIME ZONE are displayed in the TIME ZONE of the current session, unless explicitly requested otherwise (see below).
  • The TIME ZONE part of a string is simply ignored when converting to TIMESTAMP WITHOUT TIME ZONE
    • this also applies when inserting into a column of type TIMESTAMP WITHOUT TIME ZONE
SELECT '2013-05-18 12:30:00+06'::TIMESTAMP;
-- 2013-05-18 12:30:00         --> the  +06 has been completely ignored
 
 
SELECT '2013-05-18 12:30:00+06'::TIMESTAMP WITH TIME ZONE;
-- 2013-05-18 08:30:00+02      -->   06:30 UTC --> 08:30 CEST (+02)

Explicitly setting the TIME ZONE

SET TIME ZONE 'UTC';
SELECT '2013-05-18 12:30:00+00'::TIMESTAMP WITH TIME ZONE;
-- 2013-05-18 12:30:00+00
 
 
 
SET TIME ZONE 'CET';
SELECT '2013-05-18 12:30:00+00'::TIMESTAMP WITH TIME ZONE;
-- 2013-05-18 14:30:00+02

Converting from epoch values to TIMESTAMP

SELECT to_timestamp(1358749391);
 
--> "2013-01-21 07:23:11+01"

Notes:

  • Postgres converts the epoch value to TIMESTAMP WITH TIME ZONE. That's why the result is displayed in the local time zone (here CET, i.e. +01 in winter).

Discard TIME ZONE Information

SELECT to_timestamp(1358749391)::TIMESTAMP WITHOUT TIME ZONE;
--> "2013-01-21 07:23:11"

Notes

  • does the same thing as before, but also discards the time zone information and returns a TIMESTAMP WITHOUT TIME ZONE

Convert to other TIME ZONE

SELECT to_timestamp(1358749391) AT TIME ZONE 'utc';
 
--> "2013-01-21 06:23:11"

Notes

  • AT TIME ZONE converts the TIMESTAMP WITH TIME ZONE into a TIMESTAMP WITHOUT TIME ZONE representing the same point in time in the given zone (here UTC), which is why the offset disappears from the output.

Useful Functions

Aggregate Concat

  CREATE AGGREGATE textcat_all(
    basetype    = text,
    sfunc       = textcat,
    stype       = text,
    initcond    = ''
  );

Usage:

   SELECT company_id, textcat_all(employee || ', ') FROM mytable GROUP BY company_id;
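
Since PostgreSQL 9.0 the built-in string_agg aggregate does the same without a custom aggregate (and without the trailing delimiter):

   SELECT company_id, string_agg(employee, ', ') FROM mytable GROUP BY company_id;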

row_number

This is e.g. useful if you need an OID-like column where none is present:

SELECT  ROW_NUMBER() OVER (ORDER BY NULL) AS oid FROM TABLE;

an ORDER BY that imposes a particular order is not needed here (an empty OVER () would also work), so we simply use ORDER BY NULL; the numbers are then assigned in no guaranteed order.

Round time to interval

CREATE OR REPLACE FUNCTION round_time_to_interval(TIMESTAMP) 
RETURNS TIMESTAMP AS $$ 
  SELECT date_trunc('hour', $1) + INTERVAL '15 min' * FLOOR(date_part('minute', $1) / 15.0) 
$$ LANGUAGE SQL;

Rows to Columns

This is an obscure one, but it is very handy if you need to transform a number of rows into a number of columns. This can be achieved with the crosstab() function provided by the tablefunc module of postgresql-contrib. So first we need to install postgresql-contrib and enable the functions in our DB:

sudo apt-get install postgresql-contrib
sudo service postgresql-8.4 restart
cd /usr/share/postgresql/8.4/contrib
psql -U username -d dbname -h localhost -f tablefunc.sql
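
On PostgreSQL 9.1 and later the functions can be installed with the extension mechanism instead of running the SQL file (execute inside the target database):

CREATE EXTENSION tablefunc;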

Next have a look at the documentation of crosstab here: http://www.postgresql.org/docs/8.4/static/tablefunc.html

Here is an example of how to aggregate a traffic state from rows to columns (for a number of known roads).

SELECT * FROM
crosstab(
	'SELECT valid_from, road_id, q from ait.its_traffic_state WHERE road_id IN (23005242,23008577) order by 1',
	'SELECT distinct road_id FROM ait.its_traffic_state WHERE road_id IN (23005242,23008577) order by 1'
 
) 
AS its_traffic_state(TIME TIMESTAMP, road1 INTEGER, road2 INTEGER);

…for a higher number of columns you may need a script to generate the column name/datatype list at the end of the query (if anyone knows a better solution, please tell).

Updating Sequence Counters

When data is inserted into the database with an explicit value for the auto-incrementing column, the underlying sequence is not touched and must be updated manually. This is the case e.g. when importing dumps using COPY, or with a plain INSERT statement that sets the auto-incrementing column to a value.

Because the sequence counter still starts at its old value (e.g. 0) while the imported ids (e.g. 1) are already in use, inserting new records fails with duplicate key errors.

The fix:

SELECT SETVAL('sequence_name', MAX(auto_incrementing_column)) FROM TABLE_NAME;
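
If the sequence name is not known, pg_get_serial_sequence() can look it up; a minimal sketch with the placeholder table my_table and serial column id:

SELECT SETVAL(pg_get_serial_sequence('my_table', 'id'), COALESCE(MAX(id), 1)) FROM my_table;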

SQL Hints

Timeseries completion

Assume you have an incomplete timeseries in your database but want to select a complete series (padding the missing values with zero). The following query takes care of the padding by replacing missing values with 0:

SELECT intervals.intervals AS INTERVAL, COALESCE(q,0)
FROM timeseries t 
RIGHT OUTER JOIN generate_series(0,24) intervals
ON t.interval = intervals.intervals ; 

Timeseries creation

To create a series of timestamps

select * from
generate_series('2013-11-01'::timestamp, '2013-11-02'::timestamp, '15 minutes')

Convert Array to Rows

This might be useful to replace WHERE x IN ('a','b','c') statements (which are notoriously slow in postgres) with JOINs.

SELECT i, (ARRAY['a','b','c'])[i] 
FROM generate_series(1,array_upper(ARRAY['a','b','c'],1)) i;
 
 
 i | array
---+-------
 1 | a
 2 | b
 3 | c
 
(3 ROWS)

It is not very elegant though, especially as you need to specify the array twice (or know the size of the array, in which case you can replace the array_upper expression with the size). If someone has a better solution I would be interested :) One alternative is shown below.
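
Since PostgreSQL 8.4 unnest() expands an array into rows directly, and since 9.4 WITH ORDINALITY adds the index column as well:

SELECT i, val
FROM unnest(ARRAY['a','b','c']) WITH ORDINALITY AS t(val, i);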

Stored Procedures

PL/Python

Aggregate Functions in PL/Python

PL/Java

Installing

Download the most recent version from http://pgfoundry.org/frs/?group_id=1000038 . Most of the instructions in this section are taken from the README file in the tarball.

  • unzip PL/Java to /opt/pljava
  • sudo vi /etc/postgresql/8.3/main/postgresql.conf

Set the following parameters:

    dynamic_library_path = '$libdir:/opt/pljava'
    log_min_messages = info   #optional
    custom_variable_classes = 'pljava'

Add the following parameter:

    pljava.classpath = '/opt/pljava/pljava.jar'

  • update ldconfig so postgres finds the required shared objects. Create a file /etc/ld.so.conf.d/pljava.conf containing

/usr/lib/jvm/java-6-sun/jre/lib/amd64
/usr/lib/jvm/java-6-sun/jre/lib/amd64/server
/usr/lib/jvm/java-6-sun/jre/lib/amd64/native_threads

  • run
  sudo ldconfig

  • install PL/Java

    sudo su postgres
    psql -f install.sql

PL/R

Installing

  • Install the postgresql-8.3-plr package

   sudo aptitude install postgresql-8.3-plr

or download and install the package manually:

   wget http://de.archive.ubuntu.com/ubuntu/pool/universe/p/plr/postgresql-8.3-plr_8.3.0.6-3_amd64.deb
   dpkg -i postgresql-8.3-plr_8.3.0.6-3_amd64.deb

  • Install PL/R in the database of choice

   psql <dbname> < /usr/share/postgresql/8.3/plr.sql

  • Test

  CREATE OR REPLACE FUNCTION r_max (INTEGER, INTEGER) RETURNS INTEGER AS '
    if (arg1 > arg2)
       return(arg1)
    else
       return(arg2)
  ' LANGUAGE 'plr' STRICT;
 
  SELECT r_max(10,20);

Using

Have a look at the guide here http://www.joeconway.com/plr/doc/index.html

Useful commands

Inserting from CSV Files

To efficiently insert a lot of records from CSV files into the table “tolldata”, use the COPY command (http://www.postgresql.org/docs/9.1/static/sql-copy.html):

cat *.csv | psql -c "COPY tolldata FROM stdin WITH DELIMITER ';'" -d toll_tst -U toll -h localhost

Files named in a COPY command are read or written directly by the server, not by the client application. Therefore, they must reside on or be accessible to the database server machine, not the client. They must be accessible to and readable or writable by the PostgreSQL user (the user ID the server runs as), not the client. COPY naming a file is only allowed to database superusers, since it allows reading or writing any file that the server has privileges to access.

Outputting in CSV Format

If you need to export an entire table

COPY table TO '/tmp/table.csv' DELIMITER ',' CSV HEADER;

If the query is more complicated:

--- toggle to unaligned output mode
\a
--- set ; as field delimiter
\f ;
--- disable the footer (row count)
\pset footer off
 
SELECT * FROM TABLE \g file.csv
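
Alternatively, the \copy meta-command writes query results to a client-side file in CSV format directly (mytable and file.csv are placeholders):

\copy (SELECT * FROM mytable) TO 'file.csv' WITH CSV HEADER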

Piping query result to file

In psql command line:

SELECT * FROM TABLE \g filename

(http://archives.postgresql.org/pgsql-general/2001-12/msg00687.php see more details)

Show Tables

  \dt
or, for a specific schema:
  \dt my_schema.*

drop all tables in a schema

 SELECT 'drop table '||schemaname||'.'||tablename||';' FROM pg_tables WHERE schemaname IN ('public') ORDER BY schemaname, tablename;

drop all sequences in a schema

 SELECT 'drop sequence public.'||sequence_name||';' FROM information_schema.sequences WHERE sequence_schema IN ('public') ORDER BY sequence_name;

Grant on all tables

 SELECT 'grant select, insert, update, delete on ' || tablename || ' to muh;' FROM pg_tables WHERE schemaname = 'public';

Backups

It makes sense to do backups as the postgres user:

sudo su postgres

There are two important variants of making backups:

Create and restore a backup using plain-text sql: pg_dump and psql

pg_dump -U user source_db -f db_backup.sql
 
# restore
createdb target_db
psql -U user -d target_db -f db_backup.sql

Create and restore a backup using the compressed 'custom' format: pg_dump and pg_restore

pg_dump -Fc -U user source_db > db_backup.dmp
 
# restore (requires an empty database)
createdb target_db
pg_restore -Fc -d target_db db_backup.dmp

In addition we can use gzip (and zcat) to further compress the compressed 'custom' format and give the dump a nice name:

pg_dump -Fc -U user source_db | gzip > "$(date +%Y-%m-%d)_db_backup.dmp.gz"
zcat YYYY-MM-DD_db_backup.dmp.gz | pg_restore -Fc -d target_db

Or, when creating the backup via cronjob in /etc/cron.d:

  0   5   *   *   *   postgres  pg_dump -Fc dbname | gzip > /path/to/db_backup_`date +\%Y-\%m-\%dT\%H_\%M`.dmp.gz
# ┬   ┬   ┬   ┬   ┬   ┬         ┬
# │   │   │   │   │   │         └ command
# │   │   │   │   │   └───── user executing the command
# │   │   │   │   └───────── day of week (0 - 7) (0 and 7 both mean Sunday, or use names)
# │   │   │   └───────────── month (1 - 12)
# │   │   └───────────────── day of month (1 - 31)
# │   └───────────────────── hour (0 - 23)
# └───────────────────────── min (0 - 59)

Performance Tuning

References
