Category Archives: C++

The sizeof Operator in C and C++

Textbooks rarely make good use of the sizeof operator in C (and C++). The syntax is

sizeof ( type )

and

sizeof variable

That is, to get the size of a variable, including an array, the parentheses are not necessary. I personally find them a visual distraction, unless of course the operand is a type, like sizeof ( int ).

Let’s look at a concrete example of using sizeof without parentheses. In this example, we’re preparing an error message for display on screen.

char buffer[ 1024 ]; /* Arbitrary size of the buffer. */
snprintf( buffer, sizeof buffer, "Unable to initialize SDL: %s", SDL_GetError() );

Because we’re using sizeof buffer in the snprintf() call, we don’t have to worry about mistakes or out-of-sync constants, and we don’t have to #define BUFFER_SIZE 1024 to use the same size in both the definition of the buffer and the call to snprintf().

Note to Windows programmers: the snprintf() function wasn’t documented to always terminate the string with zero before Visual Studio 2015 and Windows 10. Programmers on the Windows platform might want to add an explicit zero termination to account for older compilers and systems. That can be done with buffer[ sizeof buffer - 1 ] = '\0'.

The trick here is that we defined buffer as an array. If we had instead used malloc() to allocate the buffer, we would have to add the size of the buffer explicitly, like so,

char *buffer = malloc( 1024 );
if ( !buffer ) exit( 1 );
snprintf( buffer, 1024, "Unable to initialize SDL: %s", SDL_GetError() );

and we would have to explicitly check the return value from malloc(), as is written about at length in why it is important to check what the malloc function returned. If we had instead used sizeof buffer here, we’d have gotten the size of the pointer, typically 4 on a 32-bit system and 8 on a 64-bit system, which is not the value we need.
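The difference between the two declarations can be checked directly. The following is a minimal sketch; the function names array_size() and pointer_size() are made up for illustration and are not part of the example above.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: what sizeof reports for a 1024 byte array. */
size_t array_size( void )
{
  char buffer[ 1024 ];
  return sizeof buffer; /* 1024, the size of the whole array */
}

/* Hypothetical helper: what sizeof reports for a pointer to 1024 bytes. */
size_t pointer_size( void )
{
  char *buffer = malloc( 1024 );
  size_t size = sizeof buffer; /* size of the pointer, not of the allocation */
  free( buffer ); /* free( NULL ) is harmless if malloc() failed */
  return size;
}
```

On any system array_size() gives 1024, while pointer_size() gives sizeof ( char * ), whatever that happens to be on the platform.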

The snprintf() function returns how many characters have been or would be printed. In our case we don’t care if the message the user receives gets truncated, so we don’t check its return value.
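If truncation did matter, the return value could be checked along these lines. This is only a sketch: format_error() is a hypothetical helper, and because the buffer arrives as a pointer here, its size has to be passed in separately.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: formats the error message into out, which holds
   size bytes.  Returns true if the whole message fit, false if it was
   truncated or snprintf() reported an error. */
bool format_error( char *out, size_t size, const char *detail )
{
  int needed = snprintf( out, size, "Unable to initialize SDL: %s", detail );

  /* snprintf() returns the length the full message would have had,
     not counting the terminating zero, or a negative value on error. */
  return needed >= 0 && (size_t) needed < size;
}
```

With buffer defined as an array, the call would be format_error( buffer, sizeof buffer, SDL_GetError() ).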

It is worth noting that when a string literal is used, it matters whether it initializes a pointer or an array. That is, given

  char *foo   = "Error string.",
        bar[] = "Another error string.";

then sizeof foo will give us 4 on a 32-bit system and 8 on a 64-bit system, while sizeof bar is 22, the length of the string including the terminating zero byte.
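The relationship between sizeof and strlen() for foo and bar can be sketched as follows. The wrapper functions are hypothetical, and the declarations are made const and static here only so the snippet stands on its own.

```c
#include <stddef.h>
#include <string.h>

static const char *foo  = "Error string.";
static const char bar[] = "Another error string.";

/* Size of the pointer itself, independent of the string it points to. */
size_t foo_size( void ) { return sizeof foo; }

/* Size of the array: every character plus the terminating zero byte. */
size_t bar_size( void ) { return sizeof bar; }

/* Length as strlen() sees it, without the terminating zero byte. */
size_t bar_length( void ) { return strlen( bar ); }
```

For the array, sizeof bar is always strlen( bar ) + 1; for the pointer, the two values are unrelated.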

Literals

The sizeof operator also applies to literals without parentheses, and it’s instructive to test it on some literal combinations on a given system. For example, this program

#include <stdio.h>

int main( int argc, char *argv[] )
{
  printf( "sizeof 0 = %zu\n",       sizeof 0 );     // int
  printf( "sizeof 0l = %zu\n",      sizeof 0l );    // long
  printf( "sizeof 0ll = %zu\n",     sizeof 0ll );   // long long
  printf( "sizeof NULL = %zu\n",    sizeof NULL );  // pointer
  printf( "sizeof 0.0 = %zu\n",     sizeof 0.0 );   // double
  printf( "sizeof 0.0f = %zu\n",    sizeof 0.0f );  // float
  printf( "sizeof \"foo\" = %zu\n", sizeof "foo" ); // string size including zero terminator
  return 0;
}

run on a 64-bit Linux system and compiled with clang, gives

sizeof 0 = 4
sizeof 0l = 8
sizeof 0ll = 8
sizeof NULL = 8
sizeof 0.0 = 8
sizeof 0.0f = 4
sizeof "foo" = 4

and none of that should be surprising.

Contact

The author can be reached at johann@myrkraverk.com.

Updates

Added the section on literals.

CWEB: Hello, World!

To give literate programming a try, I wrote the quintessential hello, world program as an exercise. It includes how to build and run the resulting hello.c file with several compilers on different operating systems. The cweb source is not included though, so people cannot just tangle my source code. Writing out hello.c is left as an exercise for the reader.

Polymorphism in Plain C

Here we go through the steps required to implement polymorphic interfaces in plain C. We use function pointers for this task, hidden behind generic functions we define for the interface itself.

To demonstrate the technique, we implement a simple queue of string pointers. This entry is about the generic interface so some deficiencies and possibly bugs in the actual implementation may pass us by. Please write the author or comment on the post if you spot errors in the implementation.

First we define the interface we’re going to use. We start off by defining a struct with the function pointers we need.

#include <stdbool.h> /* for bool */

struct queue {

  void *secret; /* implementation-private data */

  void (*enqueue)( struct queue *, char * );
  char *(*dequeue)( struct queue * );
  bool (*empty)( struct queue * );
  struct queue *(*delete)( struct queue * );
};

The void *secret is what we use in the implementation to keep track of our secret data structure. The rest are the function pointers we need to define for each implementation.

Here we use direct function pointers for all of the functions. We could also put the pointers into a separate struct for easier sharing, or at least smaller concrete objects, but we leave that optimization as an exercise for the dedicated reader.
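To make the idea concrete, here is a minimal sketch of how the generic functions and one implementation might fit together. The names queue_enqueue(), queue_dequeue(), queue_empty(), queue_delete() and list_queue_new(), as well as the linked list kept behind secret, are assumptions for illustration; the full post may well do it differently.

```c
#include <stdbool.h>
#include <stdlib.h>

struct queue {
  void *secret;
  void (*enqueue)( struct queue *, char * );
  char *(*dequeue)( struct queue * );
  bool (*empty)( struct queue * );
  struct queue *(*delete)( struct queue * );
};

/* Generic interface: callers use these and never touch the pointers. */
void queue_enqueue( struct queue *q, char *s ) { q->enqueue( q, s ); }
char *queue_dequeue( struct queue *q ) { return q->dequeue( q ); }
bool queue_empty( struct queue *q ) { return q->empty( q ); }
struct queue *queue_delete( struct queue *q ) { return q->delete( q ); }

/* Hypothetical implementation: a singly linked list of string pointers. */
struct node { char *string; struct node *next; };
struct list { struct node *head, *tail; };

static void list_enqueue( struct queue *q, char *s )
{
  struct list *l = q->secret;
  struct node *n = malloc( sizeof *n );
  if ( !n ) abort(); /* keep the sketch short: no error reporting */
  n->string = s;
  n->next = NULL;
  if ( l->tail ) l->tail->next = n; else l->head = n;
  l->tail = n;
}

static char *list_dequeue( struct queue *q )
{
  struct list *l = q->secret;
  struct node *n = l->head;
  if ( !n ) return NULL; /* empty queue */
  char *s = n->string;
  l->head = n->next;
  if ( !l->head ) l->tail = NULL;
  free( n );
  return s;
}

static bool list_empty( struct queue *q )
{
  struct list *l = q->secret;
  return l->head == NULL;
}

/* Frees the nodes and the queue, not the strings, which we do not own. */
static struct queue *list_delete( struct queue *q )
{
  while ( !list_empty( q ) ) list_dequeue( q );
  free( q->secret );
  free( q );
  return NULL;
}

/* Constructor: the only place that knows which implementation is used. */
struct queue *list_queue_new( void )
{
  struct queue *q = malloc( sizeof *q );
  struct list *l = malloc( sizeof *l );
  if ( !q || !l ) { free( q ); free( l ); return NULL; }
  l->head = l->tail = NULL;
  q->secret = l;
  q->enqueue = list_enqueue;
  q->dequeue = list_dequeue;
  q->empty = list_empty;
  q->delete = list_delete;
  return q;
}
```

Code that receives a struct queue * only calls the queue_ functions, so a second implementation, say one backed by an array, needs nothing more than its own constructor filling in the same four pointers.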

Parsing Command Line Parameters with Yacc & Flex

This is a repost from 2012, but my old blog site disappeared.

Every once in a while someone comes along and asks how to parse command line parameters with Yacc & Flex. This is rather straightforward, but requires some knowledge of the generated code to get right.

Here we present a source template that does this. The user only has to edit the grammar and scanning rules. Some knowledge of C, Yacc and Flex is assumed.

The code is WTFPL licensed.

The template is written for Berkeley Yacc and the Reflex variant of Flex. It may be made to work with GNU Bison and with Flex (formerly on SourceForge, now on GitHub), possibly with a few changes.

Using the Template

In the file commandline.l we start to edit the scanner rules. For our example we make do with

%%

 // Here we put regular old scanning rules.

[a-z]+ { commandlinelval = commandlinetext; return WORD; }

%%

The only thing different here is that our customary yylval and yytext variables have changed names. The WORD token is defined in commandline.y.

Then in commandline.y we edit grammar rules as usual. We start with a list of tokens.

// Here we put regular old token declarations.
%token WORD SPACE

and then write our grammar

%%

// Here we put regular old grammar rules.

command: /* empty */
	|	words
	;

words:		word
	|	words word
	;

word:		WORD { printf( "\"%s\"\n", $1 ); }
	;

%%

Here we just print out the words returned by the scanner, one per line. We are using the fact that the lexer starts a new lexeme on calls to yywrap(). This means we do not have to insert any separator characters between the command line arguments we are parsing.

The provided makefile builds the example with the -p prefix parameter to yacc, which changes the symbol prefix from yy, and the -P prefix parameter to reflex, which does the same. This makes the template usable as-is in projects that already use yacc & flex.

% make
yacc -bcommandline -pcommandline -di commandline.y
reflex -Pcommandline commandline.l
cc -o commandline commandline.tab.c lex.commandline.c

Now we can run the example.

% ./commandline this is a simple example
"this"
"is"
"a"
"simple"
"example"

Understanding the Template

We use the technique presented previously to pass parameters to yacc and flex [link is to an archived copy] to feed argc and argv to our yywrap() function.

In commandline.h we declare the argument structure.

// The argument structure we pass to yywrap().
struct arguments
{
    int argc, // The total number of arguments passed to main().

        arg;  // The argument we are actually going to parse.

    char **argv; // Pointer to the argument vector itself.
};

In commandline.l we have

int nextargument( struct arguments *args )
{

  // Prevent memory leaks.  This is safe because yy_current_buffer
  // is initialized to zero.
  if ( YY_CURRENT_BUFFER )
    {
      yy_delete_buffer( YY_CURRENT_BUFFER );
    }

  // If there are no more arguments, return 1 to signal we are done.
  if ( args->argc == args->arg )
    return 1;

  // Notice we increase args->arg here with ++.
  commandline_scan_string( args->argv[ args->arg++ ] );

  return 0;
}

as the yywrap() function (renamed) which calls yy_scan_string() for each argument passed to main(). yy_scan_string() has been renamed too.
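The cursor logic in nextargument() can be looked at in isolation. The sketch below is a stand-in, not part of the template: take_next_argument() is a hypothetical function that returns the next argument instead of handing it to the scanner, so it compiles without yacc or flex.

```c
#include <stddef.h>

/* Same layout as the argument structure in commandline.h. */
struct arguments
{
  int argc, arg;
  char **argv;
};

/* Hypothetical stand-in for nextargument(): returns the argument the
   scanner would be pointed at next, or NULL when none are left. */
const char *take_next_argument( struct arguments *args )
{
  if ( args->arg >= args->argc )
    return NULL;

  /* As in the template, args->arg is advanced with ++. */
  return args->argv[ args->arg++ ];
}
```

Each call consumes exactly one element of argv, which is why the parser sees the arguments as separate inputs without any separator characters between them.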

The main() function itself is purely a template which builds a structure holding argc and argv, which it then passes on to yywrap() and yyparse().

int main( int argc, char *argv[] )
{
    // Initialize the argument structure we pass to yywrap().
    struct arguments args;
    args.argc = argc;
    args.arg = 1; // start at argument 1, not the command name.
    args.argv = argv;


    if ( argc > 1 )
	// This is actually our yywrap() function.  We could also have
	// used its return value to determine if there is an argument
	// to parse.
	nextargument( &args );
    else
	return 1;

    // We pass the argument structure to our yyparse().  Notice it's
    // been renamed to "commandlineparse."
    commandlineparse( (void *) &args );

    return 0;
}

Here we are careful to call yywrap() before our first call to yyparse() to initialize the input buffer.

Depending on the application, there may be no reason to change the main() function itself; merely rename it and call it from the actual main().

Downloads

Downloads of individual files.

Downloads of the complete source archive.

CTemplate Emitter for Stream Output

Google’s CTemplate has very customizable output features, yet it does not come with an emitter (the type of class that actually outputs the template with all the variables filled in) for standard streams.  Hence there is no default way to output to std::cout.

Or I was unable to find one.

Therefore, here is a very simple emitter for stream output.

class StreamEmitter: public ctemplate::ExpandEmitter
{
public:
    StreamEmitter( std::ostream &out ) : sout( out )
    {
    }

    virtual void Emit( char c )
    {
        sout << c;
    }

    virtual void Emit( const std::string &s )
    {
        sout << s;
    }

    virtual void Emit( const char *s )
    {
        sout << s;
    }

    virtual void Emit( const char *s, size_t len )
    {
        sout.write( s, len );
    }

private:
    std::ostream &sout;
};

Then it can be used like this.

// Assume pageDict has been initialized and the variables filled in...

    StreamEmitter coutEmitter( std::cout );

    ctemplate::ExpandTemplate( "template_file", ctemplate::DO_NOT_STRIP, &pageDict, &coutEmitter );

Am I really the only one who wants to emit templates to streams?


License: Three-clause BSD (same as CTemplate itself).