Compiling Sysbench on OS X Yosemite or Later

These instructions are applicable after cloning the git repository and generating the autoconfigure scripts.

git clone 'https://github.com/akopytov/sysbench.git' sysbench
cd sysbench
./autogen.sh

To build Sysbench [1] with PostgreSQL and MariaDB support, you need to make sure both mysql_config and pg_config are in your path.

I use Zsh, so this is how I do it when both Postgres and MariaDB have been installed with MacPorts.

path=( /opt/local/lib/mariadb-10.1/bin /opt/local/lib/postgresql96/bin $path )
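Before running configure, it may be worth confirming that both tools actually resolve. This is only a minimal sketch; missing_tools is a hypothetical helper name of mine, not something Sysbench provides.

```shell
# Hypothetical helper: print each named command that cannot be found in PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Prints nothing when both configuration tools are found.
missing_tools mysql_config pg_config
```

If a name is printed, the corresponding bin directory is probably not in your path yet.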

Then run

./configure --with-pgsql --with-mysql --prefix=/path/of/your/choice

You are likely to get an error like

ld: library not found for -lgcc_s.10.4

if you do not also

export MACOSX_DEPLOYMENT_TARGET=10.10

before running make. The error comes from building the bundled LuaJIT, and the setting is documented in the LuaJIT installation instructions.
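Rather than hard-coding 10.10, the value could be derived from the running system. The sketch below assumes that the deployment target should simply match the major.minor version of the OS you are building on, which may not be what you want; major_minor is a hypothetical helper name.

```shell
# Hypothetical helper: keep the first two components of a version string,
# e.g. 10.10.5 becomes 10.10.
major_minor() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# sw_vers is the OS X tool that reports the system version.
# The block is skipped on systems that do not have it.
if command -v sw_vers >/dev/null 2>&1; then
    MACOSX_DEPLOYMENT_TARGET=$(major_minor "$(sw_vers -productVersion)")
    export MACOSX_DEPLOYMENT_TARGET
fi
```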

Of course, the Autotools wrapper doesn’t take care of this, nor is there a configure flag to set it.

An alternative might be --with-system-luajit but that depends on your situation.

Then you finish it off with make install. Happy benchmarking.


[1] I hope I’m linking to the right git repository.

Parsing Command Line Parameters with Yacc & Flex

This is a repost from 2012, but my old blog site disappeared.

Every once in a while someone comes along and asks how to parse command line parameters with Yacc & Flex. This is rather straightforward, but requires some knowledge of the generated code to get right.

Here we present a source template that does this. The user only has to edit the grammar and scanning rules. Some knowledge of C, Yacc and Flex is assumed.

The code is WTFPL licensed.

The template is written for Berkeley Yacc and the Reflex variant of Flex. It may be made to work with GNU Bison and Flex (formerly hosted on SourceForge, now on GitHub), possibly with a few changes.

Using the Template

In the file commandline.l we start to edit the scanner rules. For our example we make do with

%%

 // Here we put regular old scanning rules.

[a-z]+ { commandlinelval = commandlinetext; return WORD; }

%%

The only thing different here is that our customary yylval and yytext variables have changed names. The WORD token is defined in commandline.y.

Then in commandline.y we edit grammar rules as usual. We start with a list of tokens.

// Here we put regular old token declarations.
%token WORD SPACE

and then write our grammar

%%

// Here we put regular old grammar rules.

command: /* empty */
	|	words
	;

words:		word
	|	words word
	;

word:		WORD { printf( "\"%s\"\n", $1 ); }
	;

%%

Here we just print out the words returned by the scanner, one per line. We are using the fact that the lexer starts a new lexeme on calls to yywrap(). This means we do not have to insert any separator characters between the command line arguments we are parsing.

The provided makefile builds the example with the -p prefix parameter to yacc, which changes the symbol prefix from yy, and the -P prefix parameter to reflex, which does the same. This makes the template usable as-is with projects that use yacc & flex already.

% make
yacc -bcommandline -pcommandline -di commandline.y
reflex -Pcommandline commandline.l
cc -o commandline commandline.tab.c lex.commandline.c

Now we can run the example.

% ./commandline this is a simple example
"this"
"is"
"a"
"simple"
"example"

Understanding the Template

We use the technique presented previously to pass parameters to yacc and flex [link is to an archived copy] to feed argc and argv to our yywrap() function.

In commandline.h we declare the argument structure.

// The argument structure we pass to yywrap().
struct arguments
{
    int argc, // The total number of arguments passed to main().

        arg;  // The argument we are actually going to parse.

    char **argv; // Pointer to the argument vector itself.
};

In commandline.l we have

int nextargument( struct arguments *args )
{

  // Prevent memory leaks.  This is safe because yy_current_buffer
  // is initialized to zero.
  if ( YY_CURRENT_BUFFER )
    {
      yy_delete_buffer( YY_CURRENT_BUFFER );
    }

  // If there are no more arguments, return 1 to signal we are done.
  if ( args->argc == args->arg )
    return 1;

  // Notice we increase args->arg here with ++.
  commandline_scan_string( args->argv[ args->arg++ ] );

  return 0;
}

as the yywrap() function (renamed) which calls yy_scan_string() for each argument passed to main(). yy_scan_string() has been renamed too.

The main() function itself is purely a template which builds a structure holding argc and argv which it then uses to pass on to yywrap() and yyparse().

int main( int argc, char *argv[] )
{
    // Initialize the argument structure we pass to yywrap().
    struct arguments args;
    args.argc = argc;
    args.arg = 1; // start at argument 1, not the command name.
    args.argv = argv;


    if ( argc > 1 )
	// This is actually our yywrap() function.  We could also have
	// used its return value to determine if there is an argument
	// to parse.
	nextargument( &args );
    else
	return 1;

    // We pass the argument structure to our yyparse().  Notice it's
    // been renamed to "commandlineparse."
    commandlineparse( (void *) &args );

    return 0;
}

Here we are careful to call yywrap() before our first call to yyparse() to initialize the input buffer.

Depending on the application, there may be no reason to change the main() function itself; merely rename it and call it from the actual main().

Downloads

Downloads of individual files.

Downloads of the complete source archive.

The Case of the Apparent NSS Memory Corruption

This is a story of my encounter with an apparent memory corruption issue in the Netscape Security Services library.

The source I’m discussing can be found on Github.



Usually, when I try to get acquainted with a new API, I start by writing a simple program, one API call at a time, which I compile and run after each step.

Imagine my surprise, when after adding the following function call (the only thing I added)

  PK11_FindKeyByAnyCert( certificate, passwd );

I got this memory corruption error.

  dblfree(56630,0x7fff73f61300) malloc: *** error for object 0x7fd39250ce70: pointer being freed was not allocated
  *** set a breakpoint in malloc_error_break to debug
  zsh: abort      ./dblfree

The above error is taken from my minimal example of the problem, not the actual program I was working on at the time. The only difference is the name of the binary and the hex numbers.

So what is happening here? I didn’t know. And to find out, it’s really important to use the right tool for the job.

So the first thing I did was to instrument my code with the built-in OS X tools, instruments(1). That didn’t tell me much, either because it doesn’t help in this particular instance, or because I just don’t know how to use it.

I will make a note that some people suggested Valgrind. I didn’t go that way because the problem seems to be adequately described with the Clang Address Sanitizer.

SBCL: with-timeout Is a Nice Undocumented Feature

SBCL has a nice undocumented feature, (sb-ext:with-timeout expires &body body). As far as I can tell, this is exactly analogous to the same feature in Bordeaux Threads, but that seems undocumented too.

It does not appear in the SBCL manual as of 1.3.14, but it does have a documentation string.

"Execute the body, asynchronously interrupting it and signalling a TIMEOUT
condition after at least EXPIRES seconds have passed.

Note that it is never safe to unwind from an asynchronous condition. Consider:

  (defun call-with-foo (function)
    (let (foo)
      (unwind-protect
         (progn
           (setf foo (get-foo))
           (funcall function foo))
       (when foo
         (release-foo foo)))))

If TIMEOUT occurs after GET-FOO has executed, but before the assignment, then
RELEASE-FOO will be missed. While individual sites like this can be made proof
against asynchronous unwinds, this doesn't solve the fundamental issue, as all
the frames potentially unwound through need to be proofed, which includes both
system and application code -- and in essence proofing everything will make
the system uninterruptible."

Here is a little demonstration of how to use it.

(handler-case
    (sb-ext:with-timeout 3
      (format t "Hello, world.~%")
      (sleep 5)
      (format t "Goodbye, world.~%"))
  (sb-ext:timeout (e)
    (format t "~a~%" e)))

This will print out

Hello, world.
Timeout occurred.

Enjoy.

SBCL: Testsuites Cannot Prevent All Possible Bugs

On OS X, SBCL as of 1.3.14 can’t sleep after fork. The following simple program exits with an error.

(require 'sb-posix)

(let ((pid (sb-posix:fork)))
  (if (= 0 pid)
      (progn
        (format t "Child: Sleeping for 10 seconds.~%")
        (sleep 10)
        (format t "Child: I woke up.~%"))
    (format t "Parent, exiting.~%")))

When the above script is run on OS X, it fails with the weird error we see below. Note that it works perfectly on Linux.

% sbcl --script sleep.cl
Parent, exiting.
Child: Sleeping for 10 seconds.
fatal error encountered in SBCL pid 16145:
(ipc/send) invalid destination port

The point is that there is no reasonable way a testsuite will catch this kind of bug. Testsuites, no matter how comprehensive, will never prevent every bug. At most, they prevent the same bug from reappearing.

Hopefully the SBCL team will fix this bug soon.

Remember to Delete the Root Password from Your History File

Ok, so you accidentally type in the root password before using su -.

$ ThisIsMyRootPassword

Now you have to remember to delete it from your history file. This is somewhat non-obvious, because the history file is typically written only when the shell exits successfully.

So if you immediately check the file like this:

$ tail -2 ~/.bash_history
ssh somehost
su -

There’s nothing suspicious in the history file.

So, log out, then log back in. Typically that means closing your ssh session, or terminal tab, and opening a new one.

Now you see the password in the history file.

$ tail -2 ~/.bash_history
su -
ThisIsMyRootPassword

So fire up an editor and delete the line.
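If you would rather not open an editor, something like the following could work. It is a rough sketch that assumes the password is the very last line of the file, as in the tail output above; drop_last_line is a hypothetical helper name. It deliberately does not take the password as an argument, since that would put it in your history again.

```shell
# Hypothetical helper: remove the final line of the file named in $1.
drop_last_line() {
    file=$1
    [ -f "$file" ] || return 1
    tmp=$(mktemp) || return 1
    # sed '$d' prints every line except the last one.
    if sed '$d' "$file" > "$tmp"; then
        # Overwrite in place so the file keeps its owner and permissions.
        cat "$tmp" > "$file"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Usage, once you have checked with tail that the last line is the password:
#   drop_last_line ~/.bash_history
```

Check the file with tail again afterwards, because if the password is not the last line this removes the wrong thing.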

Building SBCL on OS X Yosemite

A month or two ago, building Steel Bank Common Lisp on OS X Yosemite did not work. Or at least it never worked for me.

This has been fixed now, at least as of a recent git checkout, and quite possibly SBCL 1.3.10.

If you get an error like the following,

ld: library not found for -lgcc_s.10.4

then you can try this (as far as I know) totally undocumented switch

SBCL_MACOSX_VERSION_MIN=10.10 sh make.sh

and the build will succeed.

Happy lisping with SBCL!

When SBCL Is Buggy, and CFFI Is Undocumented

There are at least two good ways to create C strings (or alien strings) in Lisp. The most often used is CFFI’s foreign-string-alloc and the other is SBCL’s make-alien-string.

The SBCL routine make-alien-string is documented to return both the alien pointer and the length of the string. However, it doesn’t.

Today, I reported this bug so by the time you read this the following may actually work; but as of SBCL 1.3.9 it doesn’t.

  (multiple-value-bind (buffer length)
      (make-alien-string "foo")
    (format t "buffer: ~a~%length: ~a~%" buffer length))

And this will print something like

  buffer: #<SB-ALIEN-INTERNALS:ALIEN-VALUE :SAP #X00400190
                                           :TYPE (* (SB-ALIEN:SIGNED 8))>
  length: NIL

On the other hand, the CFFI routine foreign-string-alloc is not documented (as of this writing) to return an extra length value, but actually does.

  (multiple-value-bind (buffer length)
      (cffi:foreign-string-alloc "foo")
    (format t "buffer: ~a~%length: ~a~%" buffer length))

Which will print something like

  buffer: #.(SB-SYS:INT-SAP #X00600050)
  length: 4

Note that the result is by default zero terminated, and hence the four bytes.
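The byte arithmetic can be checked outside Lisp as well. The sketch below has nothing to do with CFFI itself; it only counts the bytes of a string plus one terminating NUL, and c_string_size is a hypothetical helper name.

```shell
# Hypothetical helper: number of bytes a string occupies as a
# zero-terminated C string.
c_string_size() {
    # The \0 in the format appends a single NUL byte after the characters.
    # tr strips the padding some wc implementations put around the count.
    printf '%s\0' "$1" | wc -c | tr -d ' '
}

c_string_size foo   # prints 4: three characters plus the terminator
```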

Hopefully the CFFI documentation will be updated just as quickly as SBCL is patched.

Have fun, and enjoy the Lisp world because it’s full of weird stuff.

PostgreSQL: Load JSON with Lisp and Postmodern

Sometimes we get JSON objects that are not immediately loadable with the usual PostgreSQL tools. Notably, at the time of this writing, there doesn’t seem to be any special JSON support in pgloader.

In particular, for whatever reason, it’s frequent enough to get an array of JSON objects from a webserver that needs to be loaded into a database that I am presenting my tool for it.

An array of JSON looks like this,

  [ { "foo": "bar" }, { "foo": "qux" } ]

without the whitespace, and is usually given on a single line. For this kind of data, using COPY, which expects each row to be a single line, obviously does not work; and the array is an impediment also.

Enter jsown, one of several JSON parsers for Common Lisp. It was chosen for this topic because it’s reputed to be the best for decoding; however, we are also re-creating each JSON object, so Jonathan might also be appropriate.

First, to get the data: we can do this directly with Drakma in some cases, and in others we load it from a text file. When Drakma returns an octet sequence, we can do this

  (let ((json-array
         (jsown:parse
          (sb-ext:octets-to-string
           (drakma:http-request "http://example.com/some.json")))))

for example, and there are other ways to decode strings without relying on the SBCL implementation.

For the insertion itself, we use Postmodern.

When we want to insert a subset of the data, into table columns, we can loop over the JSON objects and collect the values we want.

    (loop :for json :in json-array ;; [*]
          :collect (list (jsown:val json "foo")
                         (jsown:val json "bar")))

This creates a list of lists: the outer list holds the rows of our table, and each inner list is a single row split into columns.

Then we insert that data with

    (postmodern:with-transaction (inserting-json-data) ;; the tx name
      (postmodern:execute (:insert-rows-into 'table
                           :columns 'foo 'bar
                           :values loop-list))) ;; see [*].

This way, we get rid of a tedious insertion loop, which is handy. The with-transaction form automatically commits at the end; we only need explicit rollbacks if desired.

On the other hand, if we want to insert the JSON object itself into the database, we have to recreate it.

    (loop :for json :in json-array ;; [**]
          :collect (list (jsown:to-json json)))

And using the same syntax as above, we insert with

    (postmodern:with-transaction (inserting-json-data) ;; the tx name
      (postmodern:execute (:insert-rows-into 'table
                           :columns 'jsonb-column-name
                           :values loop-list))) ;; see [**].

The above method really assumes you’re going to insert into more than one column, with some values possibly taken from inside the JSON object.

The Postmodern syntax for inserting multiple rows is really handy to get rid of a pointless insertion loop, but it has the overhead of requiring a list of all the objects in memory.

That’s all folks.

When Dtrace Fails – Spectacularly

So, I’ve been spending some time looking at Dtrace today. At first, I created a proof of concept on OS X, and then went on to try it in production on FreeBSD.

No such luck. After several hours of trying to figure out what the heck was going wrong, I tried the following experiment, on OS X.

% uname -a
Darwin foo 14.5.0 Darwin Kernel Version 14.5.0: Tue Sep  1 21:23:09 PDT 2015; root:xnu-2782.50.1~1/RELEASE_X86_64 x86_64
% cat hello.c
#include <stdio.h>

int main(int argc, char *argv[]) {
    printf( "Hello\n" );
    return 0;
}
% dtrace -n 'pid$target::main:entry{printf("%#p\n",uregs[R_RBP]);}' -c ./hello
dtrace: description 'pid$target::main:entry' matched 1 probe
Hello
dtrace: pid 9224 has exited
CPU     ID                    FUNCTION:NAME
0    67470                       main:entry 0x7fff583c1c20

And then again on FreeBSD.

% uname -a
FreeBSD bar 10.2-RELEASE FreeBSD 10.2-RELEASE #0 r286666: Wed Aug 12 15:26:37 UTC 2015     root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64
% cat hello.c
#include <stdio.h>

int main(int argc, char *argv[]) {
    printf( "Hello\n" );
    return 0;
}
% dtrace -n 'pid$target::main:entry{printf("%#p\n",uregs[R_RBP]);}' -c ./hello
dtrace: description 'pid$target::main:entry' matched 1 probe
Hello
dtrace: pid 84313 has exited
CPU     ID                    FUNCTION:NAME
  2  54008                       main:entry 0

As you can see, the printed value of the %rbp register is zero on FreeBSD. In my experiments, trying to read that register always yields zero. Similarly, I do not trust it for other registers.

This seems to be a bug in FreeBSD’s Dtrace. At the time of this writing, I have not tried it on recent Illumos.