Category Archives: PostgreSQL

PostgreSQL: Retroactively Adding Binary I/O to a Type

Here we go through the motions af adding binary input/output functions to an already existing type. We start by introducing a program that tests binary I/O for the given types, then show how to implement the binary I/O in C and finally how to fix the PostgreSQL catalog tables for the new functionality. We also include the changes in the installation script for normal installation of the new version.
Continue reading PostgreSQL: Retroactively Adding Binary I/O to a Type

Loading a Huge Textfile into PostgreSQL 11 on Windows 10

On Windows and as of 11.5, PostgreSQL has a bug when trying to copy from a huge file. According to stack overflow, this does not manifest on earlier releases like PostgreSQL 10.

For example, given a table

create table pwned( passwd sha1 not null, count int4 not null );

and when trying to copy from the pwned password list which is about 24GB uncompressed,

copy pwned( passwd, count ) from 'c:\src\postgres\pwned\pwned-passwords-sha1-ordered-by-hash-v5.txt' with ( format text, delimiter ':' );

fails and the message follows.

ERROR:  could not stat file "c:\src\postgres\pwned\pwned-passwords-sha1-ordered-by-hash-v5.txt": Unknown error

To work around this, we can pipe the file through a command;

copy pwned( passwd, count ) from program 'cmd /c "type c:\src\postgres\pwned\pwned-passwords-sha1-ordered-by-hash-v5.txt"' with ( format text, delimiter ':' );

which results in

COPY 555278657

and we can happily create an index and query the table.

The trick here is that we run cmd in sigle command mode with the /c flag, and tell it to type out the file we want to copy from.

Bug Report

Update. There is now a bug report.

Contact

The author can be reached at johann@myrkraverk.com.

Porting a PostgreSQL Extension from Unix to Windows 10

2ndQuadrant has written before about how to build your own extension for Windows. This time we take a look at what it takes to port one written by someone else. I am building this for PostgreSQL 11 on Windows using Visual Studio 2019 and clang.

Here we are working with the hashtypes extension. Unfortunately, at the time of writing, version 0.1.5 has not been released with the support for PostgreSQL 11. So we build one from git.

git clone https://github.com/adjust/hashtypes.git

The first thing we do is to take a look at the C source files. There are only four of them, common.c, crc32.c, md5.c, and sha.c so we can be reasonably sure the porting effort will be quick.
Continue reading Porting a PostgreSQL Extension from Unix to Windows 10

Running Pgbadger Automatically with Zsh

Here I present a simple script that queries the database for the log file locations and automatically chooses the one for yesterday. The trick here is to apply strftime to the log filename as configured in the database. This way, it doesn’t matter how complex the log file name is in the database, it’s alawys easy to guess. All this works as long as the system’s strftime and PostgreSQL’s log file escapes are equivalent; this should never be an issue.

There are some limits to the current script. It assumes log_directory is to be found within data_directory and is not an absolute path. It also assumes there are no strftime escape sequences in the directory names themselves. Fixing either or both of these is not hard if a more general script is needed. It is also hardcoded to choose the csv log file, but this is easy to change.

Finally it runs pgbadger on the log file from yesterday, and outputs html in the webserver’s directory. The assumption here is that PostgreSQL will write its log into a new file every day, possibly in a rotating sequence.

The script is mean to be called every day with cron.

#!/usr/local/bin/zsh

zmodload zsh/datetime

datadir=`/usr/local/bin/psql -A -t -q -w -c "show data_directory" ;`
logdir=`/usr/local/bin/psql -A -t -q -w -c "show log_directory" ;`
filename=`/usr/local/bin/psql -A -t -q -w -c "show log_filename" ;`

strftime -s log $datadir/$logdir/$filename $(( $epochtime[ 1 ] - 24 * 60 * 60 ))

/usr/local/bin/pgbadger -q -I -O /var/www/html/pgbadger/ -o index.html  ${log:r}.csv

PostgreSQL: Load JSON with Lisp and Postmodern

Sometimes we get JSON objects that are not immediately loadable with the usual PostgreSQL tools. Notably, at the time of this writing, there doesn’t seem to be any special JSON support in pgloader.

In particular, it’s frequent enough to get an array of JSON objects from a webserver that needs to be loaded into a database; for whatever reason, that I am presenting my tool for it.

An array of JSON looks like this,

  [ { "foo": "bar" }, { "foo": "qux" } ]

without the whitespace, and is usually given on a single line. For this kind of data, using COPY which expects each row to be a single line, obivously does not work; and the array is an impediment also.

Enter jsown, one of several JSON parsers for Common Lisp. It was chosen for this topic because it’s reputed to be the best for decoding; however, we are also re-creating each JSON object, so so Jonathan might also be appropriate.

First, to get the data — we can do this directly with Drakma in some cases, and others we load it from a text file. When Drakma returns an octet sequence, we can do this

  (let ((json-array
         (jsown:parse
          (sb-ext:octets-to-string
           (drakma:http-request "http://example.com/some.json")))))

for example, and there are other ways to decode strings without relying on the SBCL implementation.

For the insertion itself, we use Postmodern.

When we want to insert a subset of the data, into table columns, we can loop over the JSON objects and collect the values we want.

    (loop :for json :in json-array ;; [*]
          :collect (list (jsown:val json "foo")
                         (jsown:val json "bar")))

This creates a list of lists — the outer list returned are the rows in our table and the inner list is the row itself — split into columns.

Then we insert that data with

    (postmodern:with-transaction (inserting-json-data) ;; the tx name
      (postmodern:execute (:insert-rows-into 'table
                           :columns 'foo 'bar
                           :values loop-list))) ;; see [*].

This way, we get rid of a tedious insertion loop; which is handy. The `with-transaction’ form automatically commits at the end, we only need explicit rollbacks if desired.

On the other hand, if we want to insert the JSON object itself, into the database we have to recreate it.

    (loop :for json :in json-array ;; [**]
          :collect (list (jsown:to-json json)))

And using the same syntax as above, we insert with

    (postmodern:with-transaction (inserting-json-data) ;; the tx name
      (postmodern:execute (:insert-rows-into 'table
                           :columns 'jsonb-column-name
                           :values loop-list))) ;; see [**].

The above method really assumes you’re going to insert into more than one column, with some values possibly taken from inside the JSON object.

The Postmodern syntax for inserting multiple rows is really handy to get rid of a pointless insertion loop but it has the overhead of requiring a list of all the objects in memory.

That’s all folks.