Parsing Command Line Parameters with Yacc & Flex

This is a repost from 2012, but my old blog site disappeared.

Every once in a while someone comes along and asks how to parse command line parameters with Yacc & Flex. This is rather straight forward, but requires some knowledge of the generated code to get right.

Here we present a source template that does this. The user only has to edit the grammar and scanning rules. Some knowledge of C, Yacc and Flex is assumed.

The code is WTFPL licensed.

The template is written for Berkeley Yacc and the Reflex variant of Flex. It may be made to work with GNU Bison and (formerly SourceForge) Flex (now on GitHub), possibly with a few changes.

Table of Contents

Using the Template

In the file commandline.l we start to edit the scanner rules. For our example we make do with

%%

 // Here we put regular old scanning rules.

[a-z]+ { commandlinelval = commandlinetext; return WORD; }

%%

The only thing different here is that our customary yylval and yytext variables have changed names. The WORD token is defined in commandline.y.

Then in commandline.y we edit grammar rules as usual. We start with a list of tokens.

// Here we put regular old token declarations.
%token WORD SPACE

and then write our grammar

%%

// Here we put regular old grammar rules.

command: /* empty */
	|	words
	;

words:		word
	|	words word
	;

word:		WORD { printf( "\"%s\"\n", $1 ); }
	;

%%

Here we just print out the words returned by the scanner, one per line. We are using the fact that the lexer starts a new lexeme on calls to yywrap(). This means we do not have to insert any separator characters between the command line arguments we are parsing.

The provided makefile builds the example with the -p prefix parameter to yacc, which changes the symbol prefix from yy and the -P prefix parameter to reflex to do the same. This makes the template usable as-is with projects that use yacc & flex already.

% make
yacc -bcommandline -pcommandline -di commandline.y
reflex -Pcommandline commandline.l
cc -o commandline commandline.tab.c lex.commandline.c

Now we can run the example.

% ./commandline this is a simple example
"this"
"is"
"a"
"simple"
"example"

Understanding the Template

We use the technique presented previously to pass parameters to yacc and flex [link is to an archived copy] to feed argc and argv to our yywrap() function.

In commandlin.h we declare the argument structure.

// The argument structure we pass to yywrap().
struct arguments
{
    int argc, // The total number of arguments passed to main().

        arg;  // The argument we are actually going to parse.

    char **argv; // Pointer to the argument vector itself.
};

In commandline.l we have

int nextargument( struct arguments *args )
{

  // Prevent memory leaks.  This is safe because yy_current_buffer
  // is initialized to zero.
  if ( YY_CURRENT_BUFFER )
    {
      yy_delete_buffer( YY_CURRENT_BUFFER );
    }

  // If there are no more arguments, return 1 to signal we are done.
  if ( args->argc == args->arg )
    return 1;

  // Notice we increase args->arg here with ++.
  commandline_scan_string( args->argv[ args->arg++ ] );

  return 0;
}

as the yywrap() function (renamed) which calls yy_scan_string() for each argument passed to main(). yy_scan_string() has been renamed too.

The main() function itself is purely a template which builds a structure holding argc and argv which it then uses to pass on to yywrap() and yyparse().

int main( int argc, char *argv[] )
{
    // Initialize the argument structure we pass to yywrap().
    struct arguments args;
    args.argc = argc;
    args.arg = 1; // start at argument 1, not the command name.
    args.argv = argv;


    if ( argc > 1 )
	// This is actually our yywrap() function.  We could also have
	// used its return value to determine if there is an argument
	// to parse.
	nextargument( &args );
    else
	return 1;

    // We pass the argument structure to our yyparse().  Notice it's
    // been renamed to "commandlineparse."
    commandlineparse( (void *) &args );

    return 0;
}

Here we are careful to call yywrap() before our first call to yyparse() to initialize the input buffer.

Depending on the application, there may be no reason to change the main() function itself, merely rename it and called from the actual main().

References

Downloads

Downloads of individual files.

Downloads of the complete source archive.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: