Tag Archives: postgresql
PostgreSQL: Retroactively Adding Binary I/O to a Type
Here we go through the motions of adding binary input/output functions to an already existing type. We start by introducing a program that tests binary I/O for the given types, then show how to implement the binary I/O in C, and finally how to fix the PostgreSQL catalog tables for the new functionality. We also include the changes in the installation script for normal installation of the new version.
Continue reading PostgreSQL: Retroactively Adding Binary I/O to a Type
Loading a Huge Textfile into PostgreSQL 11 on Windows 10
On Windows, as of 11.5, PostgreSQL has a bug when trying to copy from a huge file. According to Stack Overflow, this does not manifest on earlier releases like PostgreSQL 10.
For example, given a table
create table pwned( passwd sha1 not null, count int4 not null );
and when trying to copy from the pwned password list which is about 24GB uncompressed,
copy pwned( passwd, count ) from 'c:\src\postgres\pwned\pwned-passwords-sha1-ordered-by-hash-v5.txt' with ( format text, delimiter ':' );
fails with the following message.
ERROR: could not stat file "c:\src\postgres\pwned\pwned-passwords-sha1-ordered-by-hash-v5.txt": Unknown error
To work around this, we can pipe the file through a command:
copy pwned( passwd, count ) from program 'cmd /c "type c:\src\postgres\pwned\pwned-passwords-sha1-ordered-by-hash-v5.txt"' with ( format text, delimiter ':' );
which results in
COPY 555278657
and we can happily create an index and query the table.
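As a sketch of that last step (the index name is mine, the hash is a made-up placeholder, and I assume the sha1 type from the hashtypes extension accepts hexadecimal literals):
create index pwned_passwd_idx on pwned ( passwd );
select count from pwned where passwd = '0123456789abcdef0123456789abcdef01234567';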
The trick here is that we run cmd in single command mode with the /c flag, and tell it to type out the file we want to copy from.
Bug Report
Update. There is now a bug report.
Contact
The author can be reached at johann@myrkraverk.com.
Porting a PostgreSQL Extension from Unix to Windows 10
2ndQuadrant has written before about how to build your own extension for Windows. This time we take a look at what it takes to port one written by someone else. I am building this for PostgreSQL 11 on Windows using Visual Studio 2019 and clang.
Here we are working with the hashtypes extension. Unfortunately, at the time of writing, version 0.1.5 has not been released with support for PostgreSQL 11, so we build one from git.
git clone https://github.com/adjust/hashtypes.git
The first thing we do is take a look at the C source files. There are only four of them, common.c, crc32.c, md5.c, and sha.c, so we can be reasonably sure the porting effort will be quick.
Continue reading Porting a PostgreSQL Extension from Unix to Windows 10
Running Pgbadger Automatically with Zsh
Here I present a simple script that queries the database for the log file locations and automatically chooses the one for yesterday. The trick here is to apply strftime to the log filename as configured in the database. This way, it doesn't matter how complex the log file name is in the database; it's always easy to guess. All this works as long as the system's strftime and PostgreSQL's log file escapes are equivalent; this should never be an issue.
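For example, with a hypothetical log_filename of postgresql-%Y-%m-%d.log, the zsh strftime builtin expands the same escapes PostgreSQL uses, so yesterday's file name is one call away.
zmodload zsh/datetime
# expand the hypothetical pattern for the time 24 hours ago
strftime -s log 'postgresql-%Y-%m-%d.log' $(( $epochtime[ 1 ] - 24 * 60 * 60 ))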
There are some limits to the current script. It assumes log_directory is to be found within data_directory and is not an absolute path. It also assumes there are no strftime escape sequences in the directory names themselves. Fixing either or both of these is not hard if a more general script is needed. It is also hardcoded to choose the csv log file, but this is easy to change.
Finally, it runs pgbadger on the log file from yesterday and outputs html in the webserver's directory. The assumption here is that PostgreSQL will write its log into a new file every day, possibly in a rotating sequence. The script is meant to be called every day with cron.
#!/usr/local/bin/zsh

zmodload zsh/datetime

datadir=`/usr/local/bin/psql -A -t -q -w -c "show data_directory" ;`
logdir=`/usr/local/bin/psql -A -t -q -w -c "show log_directory" ;`
filename=`/usr/local/bin/psql -A -t -q -w -c "show log_filename" ;`

strftime -s log $datadir/$logdir/$filename $(( $epochtime[ 1 ] - 24 * 60 * 60 ))

/usr/local/bin/pgbadger -q -I -O /var/www/html/pgbadger/ -o index.html ${log:r}.csv
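For illustration, a crontab entry along these lines runs the report shortly after midnight every day; the script path is hypothetical.
# run the pgbadger report for yesterday's log at 00:10 (hypothetical path)
10 0 * * * /usr/local/bin/pgbadger-yesterday.zsh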
Compiling Sysbench on OS X Yosemite or Later
These instructions are applicable after cloning the git repository and generating the autoconfigure scripts.
git clone 'https://github.com/akopytov/sysbench.git' sysbench
cd sysbench
./autogen.sh
In order to build Sysbench1 with PostgreSQL and MariaDB support, you need to make sure both mysql_config and pg_config are in your path.
I use Zsh, so this is my way of doing it when both Postgres and MariaDB have been installed with MacPorts.
path=( /opt/local/lib/mariadb-10.1/bin /opt/local/lib/postgresql96/bin $path )
Then run
./configure --with-pgsql --with-mysql --prefix=/path/of/your/choice
You are likely to get an error like
ld: library not found for -lgcc_s.10.4
if you do not also
export MACOSX_DEPLOYMENT_TARGET=10.10
before running make, while building the bundled LuaJIT. This is documented in their installation instructions.
Of course, this isn’t taken care of by the wrapper Autotools, nor is there a configure flag to set this.
An alternative might be --with-system-luajit, but that depends on your situation.
Then you finish it off with make install. Happy benchmarking.
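As a quick smoke test of the PostgreSQL driver, something along these lines should work; the database name and user are assumptions, and oltp_read_write is one of the Lua scripts bundled with Sysbench 1.x.
sysbench oltp_read_write --db-driver=pgsql --pgsql-db=sbtest --pgsql-user=sbtest prepare
sysbench oltp_read_write --db-driver=pgsql --pgsql-db=sbtest --pgsql-user=sbtest run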
1 I hope I’m linking to the right git repository.
PostgreSQL: Load JSON with Lisp and Postmodern
Sometimes we get JSON objects that are not immediately loadable with the usual PostgreSQL tools. Notably, at the time of this writing, there doesn’t seem to be any special JSON support in pgloader.
In particular, getting an array of JSON objects from a webserver that needs to be loaded into a database happens often enough, for whatever reason, that I am presenting my tool for it.
An array of JSON looks like this,
[ { "foo": "bar" }, { "foo": "qux" } ]
without the whitespace, and is usually given on a single line. For this kind of data, using COPY, which expects each row to be on a single line, obviously does not work; and the array is an impediment also.
Enter jsown, one of several JSON parsers for Common Lisp. It was chosen for this topic because it's reputed to be the best for decoding; however, we are also re-creating each JSON object, so Jonathan might also be appropriate.
First, to get the data: we can do this directly with Drakma in some cases, and in others we load it from a text file. When Drakma returns an octet sequence, we can do this
(let ((json-array
        (jsown:parse
         (sb-ext:octets-to-string
          (drakma:http-request "http://example.com/some.json")))))
for example, and there are other ways to decode strings without relying on the SBCL implementation.
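For instance, a minimal sketch with the portable babel library (an assumption on my part; any octet decoder will do) looks much the same:
;; decode the octets portably with babel instead of sb-ext
(let ((json-array
        (jsown:parse
         (babel:octets-to-string
          (drakma:http-request "http://example.com/some.json")
          :encoding :utf-8))))
  json-array)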
For the insertion itself, we use Postmodern.
When we want to insert a subset of the data into table columns, we can loop over the JSON objects and collect the values we want.
(loop :for json :in json-array ;; [*]
      :collect (list (jsown:val json "foo")
                     (jsown:val json "bar")))
This creates a list of lists: the outer list holds the rows of our table, and each inner list is a single row, split into columns.
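For instance, given made-up objects like { "foo": "bar", "bar": 1 } and { "foo": "qux", "bar": 2 }, the loop above collects
(("bar" 1) ("qux" 2))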
Then we insert that data with
(postmodern:with-transaction (inserting-json-data) ;; the tx name
  (postmodern:execute
   (:insert-rows-into 'table
    :columns 'foo 'bar
    :values loop-list))) ;; see [*].
This way, we get rid of a tedious insertion loop, which is handy. The `with-transaction' form automatically commits at the end; we only need explicit rollbacks if desired.
On the other hand, if we want to insert the JSON object itself into the database, we have to recreate it.
(loop :for json :in json-array ;; [**]
      :collect (list (jsown:to-json json)))
And using the same syntax as above, we insert with
(postmodern:with-transaction (inserting-json-data) ;; the tx name
  (postmodern:execute
   (:insert-rows-into 'table
    :columns 'jsonb-column-name
    :values loop-list))) ;; see [**].
The above method really assumes you’re going to insert into more than one column, with some values possibly taken from inside the JSON object.
The Postmodern syntax for inserting multiple rows is really handy to get rid of a pointless insertion loop, but it has the overhead of requiring a list of all the objects in memory.
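If that becomes a problem, one hypothetical workaround is to insert in chunks; here is a minimal sketch, assuming a batch size of 1000 rows and the single-column loop-list from above.
;; hypothetical chunked insertion; the batch size of 1000 is an assumption
(loop :while loop-list
      :do (let ((batch (subseq loop-list 0 (min 1000 (length loop-list)))))
            (setf loop-list (nthcdr 1000 loop-list))
            (postmodern:with-transaction (inserting-batch)
              (postmodern:execute
               (:insert-rows-into 'table
                :columns 'jsonb-column-name
                :values batch)))))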
That’s all folks.
PostgreSQL’s ~/.pgpass Parser in Common Lisp
Here is a quick and dirty PostgreSQL ~/.pgpass parser, in Common Lisp.
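For reference, each line of ~/.pgpass has five colon-separated fields, and any of the first four may be the wildcard *.
hostname:port:database:username:password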
It depends on Split Sequence, conveniently loaded with QuickLisp, like so.
(ql:quickload :split-sequence :silent t)
And here is the actual code. It's not wrapped in a function, and the other connection parameters are hard coded. This can use some refactoring, particularly with regard to error handling.
(defvar pguser "user")
(defvar pgdb "database")
(defvar pghost "hostname")
(defvar pgport 5432)

(defvar pgpass
  (handler-case
      (with-open-file (.pgpass "~/.pgpass"
                               :direction :input
                               :if-does-not-exist nil)
        (flet ((wildcard-string= (a b)
                 (or (string= a "*")
                     (string= a (princ-to-string b)))))
          (loop for line = (read-line .pgpass nil)
                while line
                when (destructuring-bind (host port db user passwd)
                         (split-sequence:split-sequence #\: line)
                       (if (and (wildcard-string= host pghost)
                                (wildcard-string= port pgport)
                                (wildcard-string= db pgdb)
                                (wildcard-string= user pguser))
                           passwd))
                return it)))
    (file-error (e)
      (format *error-output* "Unable to load password file; ~A~%" e)
      nil)))

(format t "Password: ~A~%" pgpass)
Licenses
Split-Sequence is part of CL-Utilities and is in the Public Domain.
QuickLisp is licensed under an MIT-style license.
The example code above is © 2015 Johann ‘Myrkraverk’ Oskarsson and licensed under the Two-clause BSD license.