PostgreSQL: Retroactively Adding Binary I/O to a Type

Here we go through the motions af adding binary input/output functions to an already existing type. We start by introducing a program that tests binary I/O for the given types, then show how to implement the binary I/O in C and finally how to fix the PostgreSQL catalog tables for the new functionality. We also include the changes in the installation script for normal installation of the new version.

Table of Contents

Testing Binary I/O

Adding the Binary I/O Functions

Fixing the Hashtypes Extension

Github

Acknowledgements

Downloads

Support

Testing Binary I/O

Here we present a program that performs binary input/output with the various SHA data types in the hashtypesextension. See the download section later.

The Readme

Test the various binary I/O routines of the SHA data types.

Prerequisites: the hashtypes extension has been created on a given database, like so

  create extension hashtypes;

with version 0.1.6 or later.

This program performs a simple binary I/O test of the various SHA types, from SHA1 to SHA512.

The usage is very simple,

  shabin "connection string"

example,

  shabin "host=localhost port=5432 dbname=foo user=bar"

and it will connect to the database with the given string, and attempt to make query of the form

  select digest1, digest224, digest256, digest384, digest512;

where each of the digests are of the same SHA type. It does this by querying the SHA type oids first, and then uses them to feed each digest to the PostgreSQL server.

If there is an error, the program will print a message to standard error and returns one to the operating system. On a successful run, the program prints a success message and returns zero to the operating system.

The Shabin Program

Test the various binary I/O routines of the SHA data types.

PostgreSQL header.

#include <libpq-fe.h>

System dependent networking headers, for ntohl().

#ifdef _WIN32
#include <winsock.h>
#else
#include <netinet/in.h>
#endif

Standard C headers.

#include <stdio.h>
#include <stdint.h>

Connects to the database according to the connection string in argv[ 1 ] and query the SHA1, SHA224, SHA256, SHA384, and SHA512 types in binary format.

int main( int argc, char *argv[] )
{

The return value to the operating system, error by default.

	int ret = 1;

Fail if the argument count is not exactly two.

	if ( argc != 2 ) {
		fprintf( stderr, "Usage: shabin \"connection string\"\n" );
		return ret;
	}

We declare the variables we’re going to use. We do it here because otherwise they might be unitialized due to jumps later.

The database connection object.

	PGconn *conn = NULL;

The query result object.

	PGresult *query = NULL; 

Oid is an object identifier, and is what PostgreSQL uses to identiry types, functions, tables, etc.

We declare an arry of five Oids, and a pointer for later use. We’ll fill in the real type oids later. The array references the SHA types in order, namely SHA1, SHA224, SHA256, SHA384, and SHA512. The oid pointer is used to typecast the returned value from PostgreSQL.

	Oid types[] = { 0, 0, 0, 0, 0 }, *oid = NULL;

We prepare the data to pass to Postgres. We’ll use the same array for all five SHA types, just set different output lengths.

        uint8_t data[] = { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
                           0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
                           0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
                           0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
                           0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
                           0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f,
                           0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
                           0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f };

The various lengths of the SHA types. Hard coded but these don’t change.

	int lengths[] = { 20,     28,     32,     48,     64 };

All of the formats are binary, of course.

	int formats[] = {  1,      1,      1,      1,      1 };

Create a value array that points to our data five times. If any of the data pointers were of a type larger than bytes we would need to shuffle them into network byte order. We don’t need that here because the SHA types are just fixed length byte strings.

        const char *values[] = { ( const char * ) data, ( const char * ) data, 
                                 ( const char * ) data, ( const char * ) data,
                                 ( const char * ) data };

Connect to the database with argv[ 1 ] as the connection string.

This is safe because we checked the argument counter earlier.

	conn = PQconnectdb( argv[ 1 ] );

Check that the connection was successful. If there was a failure we jump to the cleanup code at the end.

        if ( PQstatus( conn ) != CONNECTION_OK ) {
                fprintf( stderr, "Connection failure: %s\n",
                         PQerrorMessage( conn ) );
                goto finish;
        }

We query the database for the various Oids used for the SHA data types. This will fail if the hashtypes extension hasn’t been created with create extension.

        query = PQexecParams( conn, "select   'sha1'::regtype::oid, "
                              "             'sha224'::regtype::oid, "
                              "             'sha256'::regtype::oid, "
                              "             'sha384'::regtype::oid, "
                              "             'sha512'::regtype::oid; ",
                              0, /* No parameters, hence all NULLs. */
                              NULL, NULL, NULL, NULL, 
                              1 ); /* Binary result. */

Double check that we got tuples from the database, and print an error message and jump to the cleanup code if not.

        if ( PQresultStatus( query ) != PGRES_TUPLES_OK ) {
                fprintf( stderr, "Error executing query: %s\n", 
                         PQerrorMessage( conn ) );
                goto finish;
        }

We now know there’s exactly one row with five columns. We need to convert the values from network byte order to host, so we use ntohl() here.

We fill in the types array we prepared earlier.

        for ( int i = 0; i < 5; i++ ) {
                oid = ( Oid * ) PQgetvalue( query, 0, i );
                types[ i ] = ntohl( *oid );
        }

And finish off by cleaning up the result set; and set thequery pointer to NULL for good measure.

	PQclear( query ); query = NULL;

Query Postgres with our binary data. There are five columns, so every array is of length 5. Then we pass in the types, values, lengths and formats given above, and request the results in binary.

If any of the types don’t implement binary I/O this query will fail. That happens if the hashtypes extension is version 0.1.5 or lower.

        query = PQexecParams( conn, "select $1, $2, $3, $4, $5;",
                              5, types, values, lengths, formats, 1 );

Double check that there’s no error.

        if ( PQresultStatus( query ) != PGRES_TUPLES_OK ) {
                fprintf( stderr, "Error executing query: %s\n", 
                         PQerrorMessage( conn ) );
                goto finish;
        }

We now know there’s exactly one row with five columns, again.

	for ( int i = 0; i < 5; i++ ) {

Double check the types returned.

                if ( PQftype( query, i ) != types[ i ] ) {
                        fprintf( stderr, "Column %d is not the expected "
                                 "type.\n", i + 1 );
                        goto finish;
                }

Double check the lengths returned.

                if ( PQgetlength( query, 0, i ) != lengths[ i ] ) {
                        fprintf( stderr, "Column %d is not of the right "
                                 "length.\n", i + 1 );
                        goto finish;
                }

Double check the values returned. At this point we are certain the lengths fit our previous array, so we use it.

                if ( memcmp( PQgetvalue( query, 0, i ), data, 
                             lengths[ i ] ) != 0 ) {
                        fprintf( stderr, "Data in column %d is not equal "
                                 "to value sent.\n", i + 1 );
                        goto finish;
                }
        }

At this point we’ve successfully gone through every check, so we set the return value to success.

	ret = 0;

And print a success message.

	fprintf( stdout, "All checks: OK\n" );
finish:

Our cleanup code, and the end of the main() function.

        if ( query ) { PQclear( query ); query = NULL; }
        if ( conn ) { PQfinish( conn ); conn = NULL; }
        return ret;
}

The Makefile

This Makefile is Windows and clang specific, please adjust to your own needs.

You need to set these accourding to your system.

CC="C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\Llvm\8.0.0\bin\clang.exe"
POSTGRES=C:\Program Files\PostgreSQL\11

And apparently nmake doesn’t deal well with dependencies that have spaces in the path, so this is a workaround; you’ll also need to set this to the shortpath if your PostgreSQL installation has spaces in the filename, which it does by default.

PG=C:\progra~1\PostgreSQL\11

Nothing below this line should need modification.

We depend on all the runtime libraries, so we can use nmake to copy the DLLs into the current directory.

shabin.exe: shabin.c libpq.dll libssl-1_1-x64.dll libcrypto-1_1-x64.dll \
	libintl-9.dll libwinpthread-1.dll libiconv-2.dll
	$(CC) -Wall -Wextra -m64 -g -I"$(POSTGRES)\include" \
		shabin.c -o$@ -L"$(POSTGRES)\lib" -llibpq -lwsock32

Above, we link against libpq for PostgreSQL and wsock32 for ntohl().

These rules just copy the files so we can run the program without having the PostgreSQL binary directory in the PATH environment variable.

libpq.dll: $(PG)\bin\$@
	copy $(PG)\bin\$@ $@

libiconv-2.dll: $(PG)\bin\$@
	copy $(PG)\bin\$@ $@

libwinpthread-1.dll: $(PG)\bin\$@
	copy $(PG)\bin\$@ $@

libintl-9.dll: $(PG)\bin\$@
	copy $(PG)\bin\$@ $@

libcrypto-1_1-x64.dll: $(PG)\bin\$@
	copy $(PG)\bin\$@ $@

libssl-1_1-x64.dll: $(PG)\bin\$@
	copy $(PG)\bin\$@ $@

A simple rule to delete generated files and backups.

clean:
	del *.dll *.exe *.obj *~ *.ilk *.pdb

There is no installion target because this program isn’t meant to be installed anywhere.

Adding the Binary I/O Functions

Now that we have a fully functional test program, it’s time to add the binary I/O routines to the hashtypes extension. This is easy because we can just adapt the md5 binary I/O routines. The code can be seen on github.

First, we add defines for the send and receive functions.

#define send_function funcname(sha_send, SHA_NAME)
#define recv_function funcname(sha_recv, SHA_NAME)

and then add the prototypes

PGDLLEXPORT Datum send_function(PG_FUNCTION_ARGS);
PGDLLEXPORT Datum recv_function(PG_FUNCTION_ARGS);

Now we add the generic send function. Here we use the utility function hexarr_to_bytea to return a bytea to PostgreSQL.

PG_FUNCTION_INFO_V1(send_function);
Datum
send_function(PG_FUNCTION_ARGS)
{
        Sha *value = PG_GETARG_SHA(0);

        PG_RETURN_BYTEA_P(hexarr_to_bytea(value->bytes, SHA_LENGTH));
}

And a generic receive function. Here we don’t have a ready made utility function, so we double check the length, then allocate enough bytes for the Sha data type and then fill it with the value we received, then return it.

PG_FUNCTION_INFO_V1(recv_function);
Datum
recv_function(PG_FUNCTION_ARGS)
{
        StringInfo buffer = (StringInfo) PG_GETARG_POINTER(0);
        Sha *result = NULL;
        int nbytes = buffer->len - buffer->cursor;

        if (nbytes != SHA_LENGTH)
                ereport(ERROR, (errmsg("received incorrect length "
                                       "(expected %d bytes, got %d)",
                                       SHA_LENGTH, nbytes)));
        result = palloc(sizeof(Sha));

        pq_copymsgbytes(buffer, (char *)result->bytes, nbytes);

        PG_RETURN_SHA(result);
}

Fixing the Hashtypes Extension

Now it’s time to adjust the hashtypes extension. First we edit the control file and set the version to 0.1.6.

default_version = '0.1.6'

Then we create an upgrade script. Here we show only the sha1 type, the entire script can be seen on github.

We create the functions in PostgreSQL, referring to the C functions we just wrote.

create function sha1_send( sha1 ) returns bytea immutable language
c strict as 'hashtypes', 'sha_send1';

create function sha1_recv( internal ) returns sha1 immutable
language c strict as 'hashtypes', 'sha_recv1';

Then update the pg_type table with the new functions.

update pg_type set
  typsend = 'sha1_send(sha1)'::regprocedure,
  typreceive = 'sha1_recv(internal)'::regprocedure
where oid = 'sha1'::regtype;

Then for completeness sake, add them as dependencies in the pg_depend table.

insert into pg_depend ( classid, objid, objsubid, refclassid,
	refobjid, refobjsubid, deptype ) 
values ( 'pg_type'::regclass, 'sha1'::regtype, 0,
	'pg_proc'::regclass, 'sha1_recv(internal)'::regprocedure, 0, 'n' );

insert into pg_depend ( classid, objid, objsubid, refclassid,
	refobjid, refobjsubid, deptype ) 
values ( 'pg_type'::regclass, 'sha1'::regtype, 0,
	'pg_proc'::regclass, 'sha1_send(sha1)'::regprocedure, 0, 'n' );

And finally update the function signatures and marke them as parallel safe if we’re on PostgreSQL 9.6 or later.

do $$
declare version_num integer;
begin
  select current_setting('server_version_num') into strict version_num;
  if version_num >= 90600 then
    execute $f$ alter function sha1_send( sha1 ) parallel safe $f$;
    execute $f$ alter function sha1_recv( internal ) parallel safe $f$;
  end if;
end;
$$;

Then we add the new functions to the 0.1.6 installation script. I did this by hand, even though the 0.1.5 installation script is generated, because I didn’t know how it’s generated. Later, I will change that. The entire file can be seen on github.

We add

create function sha1_send( sha1 ) returns bytea immutable language
c strict as 'hashtypes', 'sha_send1';
create function sha1_recv( internal ) returns sha1 immutable
language c strict as 'hashtypes', 'sha_recv1';
CREATE TYPE sha1 ( INPUT = sha1_in, OUTPUT = sha1_out, 
	receive = sha1_recv, send = sha1_send,
	INTERNALLENGTH = 20);

and we also mark these functions parallel safe,

  execute $e$ alter function sha1_send(sha1) parallel safe $e$;
  execute $e$ alter function sha1_recv(internal) parallel safe $e$;

Github

The entire port can be seen on Github. It is not a pull request yet, because there’s still minor cleanup to do for the Unix support. But there is an issue open with a reference.

To build version 0.1.6 with binary I/O, please clone the git repository and checkout the binary branch.

git clone 'https://github.com/myrkraverk/hashtypes.git'
cd hashtypes
git checkout binary

Don’t forget to write to johann@myrkraverk.com if there are any issues with the binary I/O build.

Acknowledgements

Thanks to everyone who helped me out on the PostgreSQL Slack, IRC, and the general mailing list.

Downloads

Support

If you find yourself in need of a support for an extension you’re writing, you should first turn to your support company. If you don’t have a support company yet, turn to the PostgreSQL list of support companies.

You can also send mail to the PostgreSQL General mailing list.

As a last resort, the author can be contacted at johann@myrkraverk.com.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: