It is possible to create a simple parser using Lex alone, by making extensive use of the user-defined states (i.e. start-conditions). However, such a parser quickly becomes unmaintainable, as the number of user-defined states tends to explode.
Once our input file syntax contains complex structures, such as "balanced" brackets, or contains elements which are context-sensitive, we should be considering yacc.
"Context-sensitive" in this case means that a word or symbol can have different interpretations, depending on where it appears in the input language. For example, in C, the '*' character is used both for multiplication and to specify indirection (i.e. to dereference a pointer to a piece of memory). Its meaning is "context-sensitive".
Like lex, yacc has its own specification language. A yacc specification is structured along the same lines as a Lex specification.
%{
/* C declarations and includes */
%}

/* Yacc token and type declarations */

%%

/* Yacc specification in the form of grammar rules like this: */

symbol : symbols tokens
		{ $$ = my_c_code($1); }
	;

%%

/* C language program (the rest) */
The yacc specification rules are the place where you "glue" together the various tokens that lex has conveniently provided to you. Each grammar rule defines a symbol in terms of other symbols and tokens.
Each rule can have an associated action, which is executed after all the component symbols of the rule have been parsed. Actions are basically C-program statements surrounded by curly braces.
Instead of going into a detailed discussion of the yacc syntax, I'll introduce the concepts one-by-one, by building an example program.
Our example program will read an olvwm menu file, with the intent of afterwards writing out an equivalent menu, for a different window manager.
Let's review a typical openwin-menu file:
# comments
Workspace	TITLE
Editors		MENU	$OPENWINHOME/lib/openwin-menu-e
Properties	PROPERTIES
SEPARATOR
Netscape	exec netscape
</usr/X11R6/icons/mini.netscape.xpm>	exec netscape
"Long command"	exec netscape \
	http://www.luv.asn.au/
"Meditation"	MENU
	"Coffee"	TITLE
	"drift"		exec xlock -remote -nolock -mode drift
	"flame"		exec xlock -remote -nolock -mode flame
"Meditation"	END	PIN
"Optional Menu"	INCLUDE	maybe-existant-menu
"Exit"	EXIT
Most entries are single-line entries, beginning with a label or icon filename. We also have various keywords to take into account, plus the more complex structure of a sub-menu.
The lexer will provide us with the keyword tokens (TITLE, MENU, SEPARATOR, DEFAULT, END and so on), the valued tokens LABEL, EXEC and INT, and literal characters such as '<' and '>' (around an icon filename, as in '<' LABEL '>') and the newline.
The attached file, olmenu.l contains a suitable Lex specification.
The key aspects of the lexer are described below.
It is thus essential that, where a keyword may appear on a line, the other rules (for LABEL or EXEC) match no more text than the keyword rule does. Otherwise, lex's "longest match" rule would override its "first match" rule.
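The interplay between these two rules can be pictured in plain C. This is only a sketch of lex's disambiguation logic, not code from lex itself; the pick_rule() helper and struct rule are hypothetical:

```c
#include <assert.h>

/* A sketch of lex's disambiguation logic (hypothetical helper, not part
 * of lex): the longest match wins, and on a tie in match length, the
 * rule that appears first in the specification wins. */
struct rule {
    const char *name;   /* e.g. "keyword" or "LABEL" */
    int match_len;      /* how much input this rule would match */
};

static int pick_rule(const struct rule *rules, int n)
{
    int best = 0;
    for (int i = 1; i < n; i++)
        if (rules[i].match_len > rules[best].match_len) /* strictly longer */
            best = i;                                   /* ties keep the earlier rule */
    return best;
}
```

So a LABEL rule that matched more text than a keyword rule would always win, which is why the rules are arranged so the match lengths tie.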
The EXEC token is constructed using the exclusive start-condition CMD, together with yymore(). These rules allow the command to be extended across multiple lines, if the line(s) end in a backslash. Using yymore() in this fashion ensures that our arbitrarily long command-string does not override the LABEL or keyword tokens by virtue of lex's "longest match" rule. The last newline after the EXEC token is not appended to the command string, but returned separately.
Note that the rules

	<CMD>\\\n	{ ... yymore(); }
	<CMD>.$		{ ... return(EXEC); }

are treated as being the same length, so it is important that they appear in the correct order.
Please ignore the variables yylval and yylloc for now. Their meaning will only become clear after we've started looking at yacc in detail.
Likewise, the lex rules associated with the start-conditions
ENV, ENV1, ENV2
are not part of the basic scanner,
and will be covered later.
At this point, it's probably a good idea to test the lexer "stand-alone". However, there's still one thing missing: tokens. Our lexer rules contain return(XXX) statements, but we have not defined the tokens in the brackets. A token is simply a unique integer which yacc associates with an item that the lexer has found. Within yacc, all tokens must be declared in the "Yacc Declarations" section of our yacc specification, like this:
%token	TITLE
%token	MENU
...etc
Yacc will choose suitable integer values (>=256) for lex to use (see also "Token Types"). Bison will also let you choose a value manually, like this:
%token END 999
Normally, we let the parser choose the values, and that's one less thing to worry about.
Part of the lex-yacc integration is that yacc will generate a suitable set of token-definitions for lex to use. Yacc does so by generating a file y.tab.h (bison generates basename.tab.h), which contains the token-definitions and looks like this:

	#define TITLE 258
	#define MENU 259
	...etc

This also keeps the lex-yacc communications automatically in step with each other.
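To see the effect of the shared header, here is a sketch in plain C of how the lexer side might use the generated definitions. The keyword_token() helper is hypothetical (a real lexer does this with lex rules), and the token numbers are the ones shown in the sample y.tab.h above:

```c
#include <assert.h>
#include <string.h>

/* A sketch of the generated y.tab.h contents (numbers as shown above)
 * and a hypothetical lexer-side lookup that uses them. Because the
 * lexer and parser compile against the same header, they cannot
 * disagree about the numbers. */
#define TITLE 258
#define MENU  259

static int keyword_token(const char *word)
{
    if (strcmp(word, "TITLE") == 0) return TITLE;
    if (strcmp(word, "MENU") == 0)  return MENU;
    return 0;   /* not a keyword */
}
```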
We don't need the whole yacc specification to do this. We can get by with something like the file olmenu_tokens.y.
Now we are ready to test our lexer, stand-alone, like this:
bison -d olmenu_tokens.y
mv olmenu_tokens.tab.h olmenu.tab.h
flex -d -t olmenu.l > olmenu_lex_test.c
cc -g -DTEST_LEX olmenu_lex_test.c -o olmenu_lex_test -ll
./olmenu_lex_test < /usr/X11R6/lib/openwin-menu |& more
The -d option to lex turns on the "debug" mode. This writes a message to the stderr output for each rule which has been matched, like this:

	--accepting rule at line 70 (" ")

We are specifically checking that each piece of input is matched by the rule we expect; in particular, the number following the COLUMNS keyword should be matched by the rule for the INT token. See the section "Lexical" in the Bison documentation for more information.
Yacc rules define what is a legal sequence of tokens in our specification language. In our case, let's look at the rule for a simple, executable menu-command:
menu_item : LABEL EXEC
	;
This rule defines a non-terminal symbol, menu_item, in terms of the two tokens LABEL and EXEC. Tokens are also known as "terminal symbols", because the parser does not need to expand them any further. Conversely, menu_item is a "non-terminal symbol" because it can be expanded into LABEL and EXEC.
You may notice that I'm using UPPER CASE for terminal symbols (tokens), and lower-case for non-terminal symbols. This is not a strict requirement of yacc, but just a convention that has been established. We will follow this convention throughout our discussion.
We've just hit our first complication: any given menu-item may also have the keyword DEFAULT appear between the label and the executable command. Yacc allows us to have multiple alternate definitions of menu_item, like this:
menu_item : LABEL EXEC
	| LABEL DEFAULT EXEC
	;
Note that the colon (:), semi-colon (;) and or-symbol (|) are part of the yacc syntax - they are not part of our menu-file definition. All yacc rules follow the basic syntax shown above and must end in a semi-colon. We've put the semi-colon on the next line for clarity, so that it does not get confused with our syntax-definitions. This is not a strict requirement either, but another convention of style that we will adhere to.
Note also that the word DEFAULT appears literally, not because it is a keyword in our input-language, but because we have defined a %token called DEFAULT, and the lexer returns this token when it finds a certain piece of text.
There is a way to include literal text within a rule, but it requires that the lexer pass the characters to the parser one-by-one, as tokens.
For example, remember that menu-items may have an icon-file instead of a label, like this:
</usr/X11R6/icons/mini.netscape.xpm> exec netscape
When our lexer encounters a < or > it returns the character itself as the token. We can include literal characters in a grammar rule, like this:
menu_item : LABEL EXEC '\n'
	| '<' LABEL '>' EXEC '\n'
	;
The second form of the menu_item is used when specifying an icon-file instead of a text-label.
This explains why yacc allocates token-numbers starting above 255: the values 1-255 are reserved for literal characters (and remember that 0 is reserved for the end-of-file indication).
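We can sketch this numbering scheme in plain C. The named token numbers are those from the generated header shown earlier, and literal_token() is a hypothetical stand-in; a real lex action would simply do return yytext[0];

```c
#include <assert.h>

/* The numbering scheme, sketched as plain C: 0 is reserved for
 * end-of-input, 1-255 for literal characters, and named tokens start
 * above 255 (258 here, matching the generated header shown earlier). */
enum { END_OF_INPUT = 0, TITLE = 258, MENU = 259 };

/* Hypothetical stand-in for a lex action: a literal character is
 * returned as its own character code. */
static int literal_token(char c)
{
    return (int)(unsigned char)c;   /* '<' becomes token number 60 */
}
```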
So far, we've defined a single menu-item, whereas our menu-file may contain any number of such menu-items. Yacc handles this by allowing recursive rules, like this:
menu_items : menu_item
	| menu_items menu_item
	;
By defining menu_items in terms of itself, we now have a rule which means "one or more menu items".
Note that we could also have written our recursive definition the other way round, as:

	| menu_item menu_items

but, due to the internals of yacc, this builds a less memory-efficient parser. Refer to the section "Recursion" in the Yacc/Bison documentation for the reasons behind this.
Referring back to the rule for a single menu_item, there is another way we could accommodate the optional DEFAULT keyword: by defining an empty rule, like this:
menu_item : LABEL default EXEC '\n'
	;

default : /* empty */
	| DEFAULT
	;
The comment /* empty */ is ignored by yacc, and can be omitted, but again, it is conventional to include it for any empty rules.
Strange as it may seem, the absence of the keyword DEFAULT is also a valid rule! Yacc acknowledges the empty rule for default when it sees that its current look-ahead token is EXEC, and not DEFAULT. See the section "Look-Ahead" in the Bison documentation for more information about "look-ahead".
To understand why this 2nd approach might be considered better than our earlier one, we need to explore Yacc Actions.
Actions within yacc rules take the form:
menu_item : LABEL EXEC '\n'
		{ C-program statements }
	;
So far, we have only considered the tokens LABEL and EXEC as single-valued integers which are passed from the lexer to the parser. What we really need is access to the text-strings associated with these tokens (i.e. their semantic value).
We could do this using a global variable (like token_txt in our spam-checking program), except that yacc executes the action after it has read all the tokens up to that point. Hence the string value for EXEC would overwrite the one for LABEL before we had a chance to use it.
We could use separate global variables for the LABEL and EXEC strings, but this won't always work, because sometimes yacc has to read a token in advance before it can decide which rule to use.
Consider the MENU keyword, in our case. Yacc has to check whether it is followed by another string or a newline, before it can decide whether it is being used to introduce a sub-menu within the same file, or an external menu-file.
In any case, yacc provides a formal method for dealing with the semantic value of tokens. It begins with the lexer: every time the lexer returns a value, it should also set the external variable yylval to the value of the token. Yacc will then retain the association between the token and the corresponding value of yylval.
In order to accommodate a variety of different token-types, yylval is declared as a union of different types. Token types are declared in yacc using the yacc declaration %union, like this:
%union {
	char *str;
	int num;
}
This defines yylval as being a union of the types (char*) and (int). This is a classical C-program union, so any number of types may be defined, and the union may even contain struct types, etc. For now, we'll just have these two types.
We also need to tell yacc which type is associated with which token. This is done by modifying our %token declarations to include a type, like this:
%token <str> LABEL
%token <str> EXEC
%token <num> INT
We do not need to modify any other %token declarations, because they are all for keywords, which do not require any associated value.
Now we need to modify the lexer. This is done by including the line:

	yylval.str = strdup(yytext);

just before the lexer returns the LABEL and EXEC tokens. We'll also include the line:

	yylval.num = atoi(yytext);

just before the lexer returns the INT token.
Now that we have the token value, we want to make use of it. Yacc lets us refer to the value of a given token using a syntax similar to that of awk and perl. $1 is the value of the 1st token, $2 is the 2nd, and so on. Here is a typical example of an action:
menu_item: LABEL EXEC
		{
		itemptr = new_item();
		itemptr->label=$1;
		itemptr->command=$2;
		}
	;
Where new_item() is a function which allocates some memory for the structure itemptr, and itemptr->label and itemptr->command are of type (char *) and are used to store the char pointers referred to by $1 and $2.
Let's consider what happens when we want to accommodate the DEFAULT keyword. We'll assume that the structure itemptr contains an itemptr->default variable for storing a simple indicator of whether the DEFAULT keyword was used or not.
If we use our first approach, we would have:
menu_item: LABEL EXEC
		{
		itemptr = new_item();
		itemptr->label=$1;
		itemptr->command=$2;
		itemptr->default=0;
		}
	| LABEL DEFAULT EXEC
		{
		itemptr = new_item();
		itemptr->label=$1;
		itemptr->command=$3;
		itemptr->default=1;
		}
	;
This is OK, but there's a lot of almost-identical code in the two actions. So let's try it using the second approach we used above.
menu_item: LABEL default EXEC
		{
		itemptr = new_item();
		itemptr->label=$1;
		itemptr->command=$3;
		}
	;

default : /* empty */	{ itemptr->default=0; /*segv*/ }
	| DEFAULT	{ itemptr->default=1; /*segv*/ }
	;
This is nicer, because we've removed the repetition in our actions.
However, I've added the comment /*segv*/, because that's exactly what we would get, because of the way yacc works. Here's what would happen:
- Yacc reads the token LABEL, and, since it doesn't have the complete rule yet, it just "saves it for later" (by pushing it onto a stack).
- Yacc reaches the symbol default, and starts processing the rule for it. The default rule is simply DEFAULT, or nothing, so there are only two possibilities:
	- The next token is EXEC. There's no DEFAULT, but that's OK (syntactically), because empty is a valid rule in this case. Yacc saves the EXEC token for later. Since the current rule (empty) is both valid and complete, yacc executes the action for it.
	- The next token is DEFAULT. This completes the rule for default, so yacc executes the action for it.
- Either way, yacc has now completed the default rule.
- Yacc reads the EXEC token. We have all the elements of our menu_item rule (including the non-terminal default rule), so we can now execute the action for that.
- This action allocates memory for itemptr using the function new_item(), but we've already used itemptr in an action for the previous rule, default. It's too late, we've already crashed.
Yacc provides a simple yet elegant solution to this dilemma, by extending the concept of the "value" of a token to non-terminal symbols, like default.
First, we have to declare default in our Yacc Declarations section, like this:

	%type <num> default
Note that this is the same approach as we used for %token definitions. We can even use types which are not used by the lexer, but we must add them to our %union declaration.
We assign the value of the left-hand-side of the rule by assigning a value to $$. Our complete rule-set for menu_item, using this approach, would be:
menu_item: LABEL default EXEC
		{
		itemptr = new_item();
		itemptr->label=$1;
		itemptr->default=$2;
		itemptr->command=$3;
		}
	;

default : /* empty */	{ $$=0; }
	| DEFAULT	{ $$=1; }
	;
The storage space for our $$ is just another value attached to a token, and this is handled automatically by yacc. So no crash. And we've removed the unwanted duplication of code in our actions.
Just as you might do with C-program variables, it is possible to typecast a value which is being accessed via the $ mechanism, by writing it as $<type>n instead of $n, where type is a member of our %union declaration. For example:

	{ printf("addr: %x\n", $<num>1 ); }

Normally, you would not need to type-cast token-values in this fashion.
Until now, we have assumed that if we don't specify an action, yacc does nothing. This is not true. In the absence of an explicit action, yacc applies a default action of:
{ $$=$1; }
Or, put simply, the left-hand-side inherits the value from the 1st symbol on the right-hand-side (or inherits the "absence of a value", as the case may be).
This can be a problem if the two symbols have different types: yacc will complain with

	error: type clash (`num' `') on default action

or similar. This means that we should be fussy about the way we assign types to symbols in yacc, and take the same care as we would when writing normal C-program code.
In the early stages of development, you can get rid of these errors by adding an explicit action, like { }, which will override the default action.
The topic of tokens and their associated semantic values is covered in the section "Semantic Values" in the Bison documentation.
Now that we understand the hows and whys of yacc rules, we can try our hand at writing a basic parser. Before we can successfully compile it, there are a few more things we'll need:

- a main() function that calls our parser, yyparse()
- a yyerror() function
- a yywrap() function (for the lexer)
Yacc generates a single function called yyparse(). This function requires no parameters, and returns either 0 on success, or 1 on failure. "Failure" in the case of the parser means "if it encounters a syntax error".
The yyerror() function is called when yacc encounters an invalid syntax. It is passed a single string (char*) argument. Unfortunately, this string usually just says "parse error", so on its own it's pretty useless. Error recovery is an important topic which we will cover in more detail later on. For now, we just want a basic yyerror() function like this:

yyerror(char *err)
{
	fprintf(stderr, "%s\n", err);
}
See the section "Interface" in the Bison documentation for more information.
In addition to the usual debugging techniques, there are a few things we can do to assist debugging:

- Run yacc with the -v flag. This generates a file with the extension .output. This is instrumental in resolving ambiguities in the parser.
- Run yacc with the -t flag, and define YYDEBUG. This will make the parser write debugging messages at run-time, provided you also set the variable yydebug to a non-zero value.
- Define YYERROR_VERBOSE, which will add a little more information to the message printed by yyerror().
This is a prototype parser because, as you may notice, it does not contain any actions. If you compile it, and run it with a suitable openwin-menu file, you get exactly: nothing.
But that's OK for now, as we just want to check that we have understood our input-syntax properly, and that the parser works as expected.
We'll build the prototype using make. I like to use something like this Makefile.
As with any compiled source, you never get it right the first time, so you have to contend with the usual syntax errors which need fixing. In addition, you may also encounter some type clashes, which we mentioned above. Once you are over these hurdles, yacc will generate C-source code that compiles.
It is more than likely that even though yacc has generated a working parser, you got a couple of messages like:
olmenu-proto1.y contains 3 shift/reduce conflicts
olmenu-proto1.y contains 2 reduce/reduce conflicts
A Shift operation is what the parser does when it saves a token for later use (actually, it pushes the token onto a stack).
A Reduce operation is what the parser does when it resolves a set of tokens into a single, complete rule. (The corresponding tokens are removed from the stack and replaced with a single token representing the rule. The stack has been "reduced".)
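These two operations can be illustrated with a toy stack in plain C. The token and symbol numbers here are hypothetical, and a real yacc parser pushes parser states alongside the symbols, but the shape of the mechanism is the same:

```c
#include <assert.h>

/* A toy illustration of shift and reduce as stack operations. */
enum { LABEL = 258, EXEC = 259, MENU_ITEM = 300 };

static int stack[32];
static int depth = 0;

static void shift(int token)    /* save a token for later */
{
    stack[depth++] = token;
}

/* replace the top rule_len symbols with the rule's own symbol */
static void reduce(int rule_len, int lhs)
{
    depth -= rule_len;
    stack[depth++] = lhs;
}
```

After shifting LABEL and EXEC, a reduce by the rule menu_item : LABEL EXEC pops both and pushes a single MENU_ITEM in their place.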
It is not strictly necessary to eliminate these warnings, as yacc will still generate an operational parser. However, it is important to understand these conflicts, to be sure that we get the parser we wanted, and not just the parser we asked for :-).
Both of these warnings mean that there is an ambiguity in our ruleset.
Our prototype parser does not contain any such situations. Refer to the section "Algorithm" in the Bison documentation for examples and detailed information.
A shift/reduce conflict occurs when there would be enough tokens shifted (saved) to make up a complete rule, but the next token may allow a longer rule to be applied.
In the event of a shift/reduce conflict, the parser will opt for the shift operation, and hence try to build the longer rule. You can think of this as being analogous to lex's "longest match" rule.
If your grammar has "optional" structures, such as an optional "else" following an "if" statement, then it may not be possible to eliminate all shift/reduce conflicts from the grammar rules.
See the section "Shift/Reduce" in the Bison documentation for further explanation.
A reduce/reduce conflict occurs when the same set of tokens can be used to form two different rules.
In the event of a reduce/reduce conflict, the parser will use the first rule that appears in the grammar. You can think of this as being analogous to lex's "first match" rule.
Reduce/reduce conflicts are usually an indication of an error in the way the grammar rules have been defined, as the whole point of having a grammar is to avoid such blatant ambiguities. It is usually possible (and desirable) to eliminate all reduce/reduce conflicts from your grammar rules, either by rewriting some rules, or redefining the grammar (if possible).
See the section "Reduce/Reduce" in the Bison documentation for further explanation.
In order to find out which rules are affected by these conflicts, you will need to refer to the *.output file generated by running yacc with the -v flag, for example:

	bison -v olmenu-proto1.y
This will generate the usual *.tab.c file, plus an additional olmenu-proto1.output file. This file will tell you which rules are causing the conflicts mentioned.
The prototype parser, olmenu-proto1, should be invoked with the -v option:

	olmenu-proto1 -v /usr/openwin/lib/openwin-menu
This will print out a lot of messages about what tokens were encountered, and which rules they were used in. This information can be instrumental in determining "where the parser went wrong", if you feed it an input file which is known to be correct, but get a "parse error" from the parser.
See the section "Debugging" in the Bison documentation for further explanation.
As it stands, when our parser encounters an incorrect syntax, it will simply print the message "parse error" and exit. At the very least, we would like an indication of the line-number at which the error occurred. To do this, we will need the co-operation of the lexer, since the parser is often "unaware" of newline characters. More often than not, newlines are not considered significant from the point of view of the grammar.
The lexer must set an additional variable, yylloc, every time it encounters a newline. This variable must be declared in the lexer like this:

	extern YYLTYPE yylloc;

This variable is a structure of type YYLTYPE. We can use any of its members to store relevant information, but we are not required to use all of them; one is usually enough.
We should also initialise our line-counter. To this end, we can use the lex macro YY_USER_INIT in the lex declarations section:

	#define YY_USER_INIT yylloc.first_line=1;
Lex should increment this variable every time it encounters a newline in the input stream. Be careful of using yyless() and REJECT in your lex actions, because they can confuse your line-counter.
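The book-keeping itself is trivial. A sketch in plain C, where the structure simply mimics the first_line member of YYLTYPE:

```c
#include <assert.h>

/* A sketch of the lexer's book-keeping: start at line 1 (as
 * YY_USER_INIT does above) and bump the counter on every newline.
 * Only the first_line member of YYLTYPE is mimicked here. */
static struct { int first_line; } yylloc_sketch = { 1 };

static void count_newlines(const char *text)
{
    for (; *text; text++)
        if (*text == '\n')
            yylloc_sketch.first_line++;
}
```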
Once we have set up the lexer to provide line-number information, we can use it within any yacc action. We refer to a token's line-number by using @n, in the same way that $n is used to refer to a token's value. For example:

	{ fprintf(stderr,"Line %d\n",@1.first_line); }
This requires additional code to be generated by yacc. The appropriate source-code is generated automatically if you make use of the @n notation within a yacc action (at least, this is true of bison - I'm not sure how yacc deals with this).
Once we have coaxed yacc into producing the necessary code, we can also use yylloc within yyerror(), as follows:

int yyerror(char *err)
{
	fprintf(stderr, "%s around line no %d\n",
		err, yylloc.first_line );
}
See the section "Token Positions" in the Bison documentation for further explanation.
As it stands, when our parser encounters an incorrect syntax, it will simply print the message "parse error" and exit.
We've just added line-number information, but even this is a bit vague. Also, we may not want our parser to simply give up at the first syntax error, in the same way the C-compiler gives you more than one error message at a single invocation.
Yacc provides a special symbol for handling errors. The symbol is called error, and it should appear within a grammar-rule. For example, we could have:
menu_item: menu_command
	| option '\n'
	| SEPARATOR '\n'
	| menu_file '\n'
	| submenu '\n'
	| '\n'
	| error
		{
		printf("Invalid menu item at line %d\n",
			@1.first_line);
		}
	;
Now, when the parser encounters an invalid syntax while processing the rule menu_item, it will match the error symbol, execute the associated action, and then continue with the next menu_item.
Yacc is also capable of more sophisticated error-handling. For example, we can tell the parser to discard some of the subsequent tokens too, simply by putting a token after the error token.
menu_command : label default EXEC '\n'
	| label default olvwm_cmd '\n'
	| label default olvwm_cmd_arg '\n'
	| label error '\n'
		{
		printf("Menu item (%s) at line %d is invalid\n",
			$1, @1.first_line);
		}
	;
Here, once the parser has matched a valid label token, a syntax error causes it to discard all subsequent tokens up to the next newline, using the label, the error symbol and the '\n' to complete the rule for menu_command.
In this case, the rule label error '\n' is not much different to just having error '\n', except that the former lets us make use of the value of label to give a more informative error message.
This approach is often used when there are "balanced" symbols in the syntax. Consider the rule:
label : LABEL
	| '<' LABEL '>'
		{ $$=$2; }
	| '<' error '>'
		{
		printf("Bad icon-label at line %d\n",
			@1.first_line);
		}
	;
If the parser encounters something other than a LABEL after a '<', it will discard all tokens up to the next '>'. This technique can be useful for keeping brackets balanced during error-recovery.
In our case, keep in mind that any unidentifiable text after a valid LABEL is converted to an EXEC token by the lexer, and the EXEC token would swallow any subsequent '>'. So this rule is unlikely to trap any real errors, except something like < keyword >. In fact, it is likely to work against us, because we may not see another '>' at all, so we will miss out on a lot of input!
Error recovery is a tricky business, and we don't always get the results we really wanted. It doesn't pay to be too fussy about this aspect of the parser.
See the sections "Error Recovery" and "Action Features" in the Bison documentation for further explanations and examples.
This prototype contains the necessary rules and actions for handling errors in a reasonable way. Building it should be as simple as typing:
make olmenu-proto2
We want to test it with both correct input and incorrect input. We need to check that it generates meaningful error messages, and does not cause any segv errors or the like.
At this stage, our grammar rules should be complete, so that we can start populating them with actions. Hopefully, we will not need to change them from here on.
Up till now, we have been concentrating purely on the grammar rules. This has been quite deliberate, because we should get the grammar right first, before we spend too much time writing (and re-writing) our actions.
The basic goal of our parser will be to create an in-memory representation of the input data: in other words, a set of structures, linked lists, arrays, etc, which we can use and manipulate. Once we have our memory-representation, we can generate our new menu-file in the CDE format, using conventional techniques and statements such as printf.
We are free to use any technique we like to build up our memory structures, however yacc lends itself particularly well to a specific style of doing so.
Yacc gives us the ability to assign values to non-terminal symbols, as if they were yylval values supplied by the lexer. This allows us to pass values and pointers along through the grammar rules themselves. We can use this feature to build up linked-lists and other structures "on the fly".
We looked at this technique once before, in the section titled "Using Token Types in Yacc Actions", where we passed along an integer value representing the presence or absence of the keyword DEFAULT.
Our menu-file format has two basic structures: menus and menu-items.
If we allow a menu-item to include a pointer to another menu, then the tree-like nature of the menus will be catered for, so we do not need to keep a separate structure to define the menu-tree.
Our target format, the dtwmrc format, does not use "inline" submenus like olvwm. Instead, each menu is referred to by its name, and its contents are defined after the end of the current menu. For example:
Menu DtRootMenu
{
	"Games "	f.menu "Games"
	"Utilities "	f.menu "Utilities"
}

Menu "Games"
{
	"xgalaga"	f.exec exec xgalaga
}
# ... etc
This means that we should keep a linear list of menus, in addition to the menu-tree structure defined by menus and items.
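Based on how the structures are used in the rest of this section (first_item, last_item, the linear list of menus, and the title and columns options), a sketch of them might look like this; member names not mentioned in the text are assumptions:

```c
#include <assert.h>
#include <stddef.h>

/* A sketch of the two structures, reconstructed from how they are
 * used; member names beyond first_item, last_item, label, command,
 * title and columns are assumptions. */
struct menu_item {
    char *label;
    char *command;
    int   is_default;
    struct menu *submenu;    /* non-NULL when this item invokes a sub-menu */
    struct menu_item *next;  /* next item within the same menu */
};

struct menu {
    char *title;
    int   columns;
    struct menu_item *first_item;
    struct menu_item *last_item;  /* lets us append without walking the list */
    struct menu *next;            /* the linear list of all menus */
};
```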
The first thing we are likely to encounter in our menu-file is a menu item. We can tell if it's the first menu-item, because it will be processed by the rule:

menu_items : menu_item
	;

and not

menu_items : menu_items menu_item
	;
This makes the first menu_item rule a good place to allocate our struct menu. We do this mainly so we can get access to the variables first_item and last_item.
We can "hold onto" our struct menu without resorting to global variables by circulating it within the menu_items rule. So both alternatives of the rule return a struct menu * (though only the first version allocates it).
When there are no more menu items in the current menu or sub-menu, the rule for menu_items is complete. We then store the struct menu * pointer returned by menu_items:

- in the menufile rule, it is stored in the global variable top_menu;
- in the submenu rule, it is stored in the struct menu_item which invokes the menu.
So now our rule for menu_items looks like this:
menu_items: menu_item
		{
		menuptr=new_menu();
		add_item(menuptr,$1);
		$$=menuptr;
		}
	| menu_items menu_item
		{
		add_item($1,$2);
		$$=$1;
		}
	;
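Hypothetical implementations of the two helper functions used in these actions might look like the following sketch; the struct members are assumptions, and last_item gives us an O(1) append:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helpers for the actions above: new_menu() allocates an
 * empty menu, and add_item() appends to its item list, using last_item
 * so no list-walking is needed. */
struct menu_item {
    struct menu_item *next;
};

struct menu {
    struct menu_item *first_item;
    struct menu_item *last_item;
};

static struct menu *new_menu(void)
{
    return calloc(1, sizeof(struct menu));
}

static void add_item(struct menu *menu, struct menu_item *item)
{
    item->next = NULL;
    if (menu->last_item)
        menu->last_item->next = item;   /* append after the current tail */
    else
        menu->first_item = item;        /* first item in this menu */
    menu->last_item = item;
}
```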
The only other place we would want to call new_menu() is from within the rule for submenu:
submenu : label default MENU '\n' menu_items end '\n'
	;

This rule contains the symbol menu_items, which allocates a struct menu and returns it complete with a linked list of menu-items.
So there's not much else to do, other than to create an item using new_item(), and use it to store the struct menu pointer which menu_items returns for us:
submenu : label default MENU '\n' menu_items end '\n'
		{
		itemptr=new_item($1,$2,SUBMENU, $5 );
		...
		$$=itemptr;
		}
	;
Our rule for menu_items works properly, and returns the right values in all cases. However, one peculiarity does arise from the way this rule interacts with the submenu rule.
If our input contains a submenu before the first ordinary item, we call new_menu() for the submenu before the parent-menu. This is not strictly a problem, but it would be nicer to have the menu-list in order of appearance. We can fix this by re-writing the rule for menu_items as:
menu_items: /* empty */
		{ $$=new_menu(); }
	| menu_items menu_item
		{
		add_item($1,$2);
		$$=$1;
		}
	;
The problem goes away, because we now call new_menu() before we process the first menu-item.
Lastly, there is the issue of the options rule. These items are not really true menu items at all, just some options we can set, like title and columns.
These have been defined as members within struct menu. However, at the time we are parsing the options, we do not have access to the correct struct menu.
We could use a global variable to store the current menu, but it might get tricky to restore the correct menu when we get to the end of a submenu.
It is easier to let the parser do the work, and pass down values in the manner to which we are now accustomed.
We will use the struct menuitem as a vehicle to transfer the TITLE and COLUMNS options. We will then make the add_item() function treat these items as "special", and free the structure when done.
We are not going to use mid-rule actions, but I'll mention them anyway, because they can be useful. Consider the problem of the submenu rule. Let's say we defined the rule for menu_items such that it returned the start of a linked list of menu-items, instead of a struct menu pointer. We would then link the struct menu_item list to the struct menu in the actions for the rules menufile and submenu.
The problem then arises: how do we build the linked list? We could:

menu_items : menu_items menu_item { $2->next=$1; $$=$2; } ;   /* prepend: list ends up reversed */

menu_items : menu_items menu_item { for(ptr=$1; ptr->next; ptr=ptr->next) ; ptr->next=$2; $$=$1; } ;   /* re-scan the whole list for every item */

menu_items : menu_items menu_item { ???->last_item->next=$2; $$=$1; } ;   /* remember a pointer to the last item */
Clearly, the last option is the nicest, but we cannot use a simple global variable as our last_item pointer. If we did, we would get to the end of the first submenu, and would want to restore last_item to the last item of the parent menu. But we've already lost this pointer, because we've been using the same global variable last_item to process the submenu. So we need to put last_item somewhere other than a global variable. Our struct menu does nicely. Except that our existing rule for a submenu will read all the menu_items before the action for the submenu rule is executed (remember what happened when processing our keyword DEFAULT).
submenu : label default MENU '\n' menu_items end '\n' ;
There is a way we can allocate our struct menu before we start processing menu_items, and that is to use a mid-rule action, like this:
submenu : label default MENU '\n'
          { $<mnu>$ = new_menu(); }
          menu_items end '\n'
          { $<mnu>5->first_item=$6; $<mnu>5->pinnable=$7; } ;
Notice how we have placed a { action } right in the middle of the rule. The mid-rule action counts as a component of the rule, so its value is referenced as $5; menu_items becomes $6 instead of $5, and everything after the action is moved one position to the right.
Note that the above example is not intended to be a "working example", it is just illustrative.
See the section "Mid-Rule Actions" in the Bison documentation for further explanations and examples.
In order to generate a complete dtwmrc file, we should really be reading the files nominated in the rule menu_file. It is technically feasible to open a submenu file as soon as we parse the menu_file line. However, due to the use of "look-ahead" in both the lexer and parser, it is not simply a matter of changing yyin.
Flex provides a set of functions and macros to handle this, and these are described in the section "Multiple Input Buffers" in the flex man page.
In addition, we should be aware that the parser may also be one token in front, since it uses a "look-ahead" token to decide which rule to apply.
So, while it is possible to change input-streams on-the-fly, it adds another dimension of complexity to our program.
In our case, a much simpler solution is to read a whole file at a time, calling yyparse() once per file, and stitch the new memory structures in manually. The function read_menu_files() does just this: it redirects yyin to each nominated file in turn, and calls yyparse() once per file.
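The per-file loop can be sketched as below. This is a self-contained illustration, not the tutorial's exact read_menu_files(): yyparse() is stubbed out (the real one comes from yacc and appends parsed menus to menu_list), and the struct fields are assumptions.

```c
#include <stdio.h>
#include <stdlib.h>

/* Minimal stand-ins for the program's real declarations. */
struct menu {
    struct menu *next;
    char *filename;   /* non-NULL if this menu's items come from a file */
    int parsed;
};

struct menu *menu_list = NULL;

FILE *yyin;

/* Stub: the real yyparse() reads yyin and appends any new menus
 * (including nested submenus) to the end of menu_list. */
int yyparse(void) { return 0; }

void read_menu_files(void)
{
    struct menu *m;
    /* menu_list may grow while we walk it -- that is exactly what makes
     * nested include-files work: new entries are appended to the end,
     * and we reach them on a later iteration. */
    for (m = menu_list; m != NULL; m = m->next) {
        if (m->filename == NULL)
            continue;
        yyin = fopen(m->filename, "r");
        if (yyin == NULL)
            continue;     /* e.g. an optional INCLUDE menu is absent */
        yyparse();
        fclose(yyin);
        m->parsed = 1;
    }
}
```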
As luck would have it, menus which are read from other files are just appended to the end of our menu_list, so we don't need to do anything else to our menu_list. We process the menus in the order they appear in menu_list, and this list can be appended to while we are processing the current menu. Hence, this simple solution also caters for files nested several levels deep.
See the section "Pure Calling" in the Bison documentation for further information.
Pure calling is not required in our case, as we are not calling the parser from within the parser. We are calling yyparse() repeatedly, but we allow it to complete before calling it again.
The openwin-menu file syntax also allows the filenames of nested menu files to be referred to using environment variables. For example, /usr/openwin/lib/openwin-menu-games could be written $OPENWINHOME/lib/openwin-menu-games or even ${OPENWINHOME}/lib/openwin-menu-games. Traditionally, this variable expansion would be done using C-library calls like strspn() and strtok(), or maybe using a page or so of hand-written code.
However, this seems quite tedious after what we've been doing with lex. After all, this kind of thing is what lex is best at.
The GNU lexer, flex, provides several functions to do exactly what we want: scan a string variable, just like it would a text-file.
The details of the flex functions required to scan a string are described in the flex man page.
Our lexer, olmenu.l, contains a function expand_env() which uses this feature of lex. It takes the filename string which was parsed earlier, and processes it, using lex to break the string into several components. If a component is an environment variable, the lexer returns the result of the corresponding getenv(). If the component isn't an environment variable, it is returned verbatim. The function expand_env() calls yylex() repeatedly, and concatenates the components into the desired result string.
In order that our string-scanner does not interfere with our regular text-scanner, we will set an "exclusive" start condition, ENV, before we call yylex(). We need to remember that yylex() returns 0 when it reaches the end of input, but we can return any other integer values we like. In this case, we'll return yytext, or the result of a getenv(yytext). Keep in mind that getenv() returns 0 if the variable doesn't exist, so we have to be careful not to return the getenv() result without checking it.
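To make the idea concrete, here is a self-contained sketch of the expand_env() logic with the flex scanner replaced by a tiny hand-rolled tokenizer. In the real program, next_component() is yylex() running in the ENV start condition; the ${VAR} brace form and bounds checking are omitted for brevity, and none of this is the tutorial's verbatim code.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return the next component of the string: either a literal run of
 * characters, or the value of a $VARIABLE.  Returns NULL at end of
 * input, just as yylex() returns 0. */
static const char *next_component(const char **s)
{
    static char buf[256];
    const char *p = *s;
    size_t n = 0;

    if (*p == '\0')
        return NULL;

    if (*p == '$') {                      /* environment variable */
        p++;
        while (*p && (isalnum((unsigned char)*p) || *p == '_'))
            buf[n++] = *p++;
        buf[n] = '\0';
        *s = p;
        const char *val = getenv(buf);
        return val ? val : "";            /* guard against getenv() == NULL */
    }

    while (*p && *p != '$')               /* literal text */
        buf[n++] = *p++;
    buf[n] = '\0';
    *s = p;
    return buf;
}

/* Concatenate the components into the expanded result string. */
char *expand_env(const char *str)
{
    static char result[1024];
    const char *comp;

    result[0] = '\0';
    while ((comp = next_component(&str)) != NULL)
        strcat(result, comp);
    return result;
}
```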
The really good thing about using flex to do this, is the amount of control we have over the resulting string-scanner. For example, if we wanted to allow backslash quoting to suppress the environment-variable substitution, we could change the rule:
<ENV>[^$]+	return((int)yytext);

to become:

<ENV>[^$\\]+	return((int)yytext);
<ENV>\\.	return((int)yytext+1);
There's nothing really special about the output part of our program. We just traverse our linked lists, and run a lot of fprintf() statements.
Look at the file dtwmrc.c if you are interested.
Finally, our menu-converter is complete. The finished result, including grammar rules, actions, and support functions like main(), yyerror() and yywrap(), is the file olmenu.y.
You can build it using the command:
make olmenu
Make contains implicit rules for building programs from yacc source files. The source file must end in .y for the implicit rules to work.
The default rule for yacc is:
.y.c :
	$(YACC) $(YFLAGS) $<
	mv -f y.tab.c $@
I like to call my parser files something.l and something.y. Unfortunately, this introduces a conflict in make's default rules, which would try to generate something.c from both the lex and yacc source. Hence, I like to "brew my own" implicit rules, which generate something_lex.c and something_yacc.c instead.
See the Makefile included with the examples for the details.
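Such hand-rolled rules might look something like this. This is a sketch in GNU make pattern-rule style, assuming typical flex/bison output behaviour, and is not a verbatim copy of the examples' Makefile:

```make
# Run yacc with -d so it also emits the token definitions, then rename
# the y.tab.* outputs so lex and yacc never fight over something.c
%_yacc.c %_yacc.h: %.y
	$(YACC) -d $(YFLAGS) $<
	mv -f y.tab.c $*_yacc.c
	mv -f y.tab.h $*_yacc.h

# Have flex write to stdout (-t) so we can name the output ourselves
%_lex.c: %.l
	$(LEX) $(LFLAGS) -t $< > $@
```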
Note that there are some incompatibilities at this level between yacc and bison. Yacc likes to call its output files y.tab.c for the parser and y.tab.h for the token definitions. Bison prefers to use basename.tab.c and basename.tab.h respectively. Bison will generate y.tab.c and y.tab.h if it is invoked with the -y flag.
The Bison documentation is in "Info" format, and the info2www gateway is arguably the most convenient way of accessing this type of documentation.
Author: George_Hansper@apana.org.au
Last modified: $Date: 2000/07/04 07:57:50 $