Lesson 1.4: Disambiguating Parses
In this lesson you will learn how to use K's built-in features to transform an ambiguous grammar into an unambiguous one that expresses the intended AST. You will learn how to define the precedence and associativity of operators and how to favor certain parses over others.
Priority blocks
Parsing unambiguous languages is asymptotically faster than parsing ambiguous languages. That's why in practice, very few formal languages outside the domain of natural language processing are ambiguous.
Cluttering the code with brackets to remove ambiguity is not an ideal solution. Instead, programming language designers have developed methods for disambiguating language expressions by making use of operator precedence and associativity. It is often possible to remove all ambiguities in a grammar with these methods, as they instruct the parser to accept some ASTs instead of others.
In general, grammars can be rewritten to remove unwanted parses. However, in K, the grammar specification and AST generation are intrinsically linked, so we discourage this approach. You will still learn how to remove unwanted parses in K towards the end of this lesson. Now we continue with showing you how to explicitly express the relative precedence of operators in different situations in order to resolve grammar ambiguity.
Recall that in C, &&
binds tighter than ||
, i.e., it has higher
precedence, meaning that the expression true && false || false
has only one
valid AST: (true && false) || false
.
Consider, then, the third iteration on the grammar of Boolean expressions and
save the code below in file lesson-04-a.k
:
kmodule LESSON-04-A syntax Boolean ::= "true" | "false" | "(" Boolean ")" [bracket] > "!" Boolean [function] > Boolean "&&" Boolean [function] > Boolean "^" Boolean [function] > Boolean "||" Boolean [function] endmodule
In this example, some of the pipe symbols |
separating productions have been
replaced with >
. This serves to describe the priority groups associated
with a block of productions. The first priority group consists of the atoms
of the language: true
, false
, and the bracket operator. In general, a
priority group starts either at the ::=
or >
operator and extends until
either the next >
operator, or the end of the production block. Thus, we can
see that the second, third, fourth, and fifth priority groups in this grammar
all consist of a single production.
The meaning of these priority groups becomes apparent when parsing programs:
A symbol with a lesser priority, (i.e., one that binds looser), cannot
appear as the direct child of a symbol with a greater priority (i.e.,
one that binds tighter). In this case, the >
operator can be seen as a
greater-than operator describing a transitive partial ordering on the
productions in the production block, expressing their relative priority.
To see this more concretely, let's look again at the program
true && false || false
with possible ASTs depicted in Figures 3-A and 3-B
in previous lesson. As noted before, this program was ambiguous because the
parser could either choose &&
to be the child of ||
or vice versa.
However, because a symbol with lesser priority (i.e., ||
) cannot appear as
the direct child of a symbol with greater priority (i.e., &&
), the parser
will reject the parse where ||
is under the &&
operator (Fig. 3-B).
As a result, we are left with the unambiguous parse
(true && false) || false
. Conversely, if the user wants the other parse,
they can express this with brackets by explicitly writing
true && (false || false)
. This still parses successfully because the ||
operator is no longer the direct child of the &&
operator, but of the
()
operator, even if the bracket is not explicitly depicted in the AST.
Internally, &&
operator is viewed as an indirect parent, which is not
subject to the priority restriction.
You must have noticed that ()
has been defined as having greater priority
than ||
. However, in example true && (false || false)
, ||
appears as
a direct child of ()
, in what seems to be a contradiction to the priority
rule. What we have not mentioned is that the priority rule is more complex
and applies only conditionally. Specifically, it applies in cases where the
child is either the first or last production item in the parent's
production. For example, in production Boolean "&&" Boolean
, the first
Boolean
non-terminal is not preceded by any terminals, and the last
Boolean
is not followed by any terminals. As a result, we apply the priority
rule to both children of &&
. In production "(" Boolean ")"
, the
non-terminal is both preceded and followed by terminals "("
and ")"
.
Thus, the priority rule is not applied when ()
is the parent. Because of
this, program true && (false || false)
parses successfully.
Exercise
Parse the program true && false || false
using kast
, and confirm that the
AST places ||
as the top-level symbol. Then modify the definition so that you
will get the alternative parse.
Associativity
Sometimes, even after breaking the expression grammar into priority blocks we
still get an ambiguous grammar. Let's try to parse the following program
(assoc.bool
):
true && false && false
Priority blocks will not help us here. We have two possible parses with a
direct parent and child which are within a single priority block (in this case,
&&
is in the same block as itself):
Fig. 4-A
&&
/ \
&& false
/ \
true false
Fig. 4-B
&&
/ \
true &&
/ \
false false
This is where the notion of associativity comes into play. Associativity applies the following additional rules to parses:
- a left-associative symbol cannot appear as a direct rightmost child of a symbol with equal priority;
- a right-associative symbol cannot appear as a direct leftmost child of a symbol with equal priority; and
- a non-associative symbol cannot appear as a direct leftmost or rightmost child of a symbol with equal priority.
In C, binary operators are all left-associative, meaning that the expression
true && false && false
parses unambiguously as (true && false) && false
.
Since &&
cannot appear as the rightmost child of itself, only the AST in
Fig. 4-A is valid.
Consider, then, the fourth iteration on the grammar of this definition
(lesson-04-b.k
):
kmodule LESSON-04-B syntax Boolean ::= "true" | "false" | "(" Boolean ")" [bracket] > "!" Boolean [function] > left: Boolean "&&" Boolean [function] > left: Boolean "^" Boolean [function] > left: Boolean "||" Boolean [function] endmodule
Here each priority group, immediately after the ::=
or >
operator, can
be followed by a literal representing the associativity of that priority group:
either left:
for left-associativity, right:
for right-associativity, or
non-assoc:
for non-associativity. In this example, each priority group we
apply associativity to has only a single production, but we could equally well
write a priority block with multiple productions and one associativity.
For example, consider the grammar below (file lesson-04-c.k
):
kmodule LESSON-04-C syntax Boolean ::= "true" | "false" | "(" Boolean ")" [bracket] > "!" Boolean [function] > left: Boolean "&&" Boolean [function] | Boolean "^" Boolean [function] | Boolean "||" Boolean [function] endmodule
In this example, unlike the one above, &&
, ^
, and ||
have the same
priority, as they are part of the same block. Addionally, the entire group is
left-associative. This means that none of &&
, ^
, and ||
can appear as the
right child of any of &&
, ^
, or ||
. Hence, this grammar is also not
ambiguous. However, it expresses a different grammar, and you are encouraged
to think about what the differences are in practice.
Exercise
Parse the program true && false && false
yourself, and confirm that the AST
places the rightmost &&
at the top of the expression. Then modify the
definition to generate the alternative parse.
Explicit priority and associativity declarations
Previously we have only considered the case where all the productions which you wish to express a priority or associativity relation over are co-located in the same block of productions. However, in practice this is not always feasible or desirable, especially as a definition grows in size across multiple modules. As a result, K provides a second way of declaring priority and associativity relations.
Consider the following grammar, which we name lesson-04-d.k
and which
expresses the exact same grammar as lesson-04-b.k
:
kmodule LESSON-04-D syntax Boolean ::= "true" [group(literal)] | "false" [group(literal)] | "(" Boolean ")" [group(atom), bracket] | "!" Boolean [group(not), function] | Boolean "&&" Boolean [group(and), function] | Boolean "^" Boolean [group(xor), function] | Boolean "||" Boolean [group(or), function] syntax priority literal atom > not > and > xor > or syntax left and syntax left xor syntax left or endmodule
This introduces a couple of new features of K. First, the group(_)
attribute
of a production is used to conceptually group together sets of sentences under
a common user-defined name. For example, literal
in the syntax priority
sentence is used to refer to all productions marked with the group(literal)
attribute, i.e., true
and false
, atom
to all productions marked with
group(atom)
, i.e., braket production, and so on and so forth. A production
can belong to multiple groups using syntax such as group(myGrp1,myGrp2)
.
Once we understand this, it becomes relatively straightforward to understand
the meaning of this grammar. Each syntax priority
sentence defines a priority
relation where >
separates different priority groups. Each priority group is
defined by a list of one or more group names, and consists of all productions
which are members of at least one of those named groups. literal
and atom
are only separated by space as they have the same precedence.
In the same way, sentences syntax left
, syntax right
, or syntax non-assoc
define an associativity relation among left-, right-, or non-associative
groups, respectively. Specifically, this means that:
syntax left a b
is different to:
syntax left a
syntax left b
syntax left a b
places a
and b
in the same associativity block, meaning
that a
and/or b
cannot be the rightmost child of a
and/or b
. The
latter sentences instead mean that a
cannot be the rightmost child of a
and
b
cannot be the rightmost child of b
, but place no restriction on a
being the rightmost child of b
or b
being the rightmost child of a
.
As a consequence, syntax [left|right|non-assoc]
should not be used to
group together labels with different priority.
Prefer/avoid productions
Sometimes priority and associativity prove insufficient to disambiguate a grammar. In particular, sometimes it is desirable to be able to choose between two ambiguous parses directly while still not rejecting any parses if the term parsed is unambiguous. A good example of this is the famous "dangling else" problem in imperative C-like languages.
Consider the following definition (lesson-04-e.k
):
kmodule LESSON-04-E syntax Exp ::= "true" | "false" syntax Stmt ::= "if" "(" Exp ")" Stmt | "if" "(" Exp ")" Stmt "else" Stmt | "{" "}" endmodule
and program dangling-else.if
below:
if (true) if (false) {} else {}
This is ambiguous because it is unclear whether the else
clause is part of
the outer if
or the inner if
. At first we might try to resolve this with
priorities, specifying that the if
without an else
cannot appear as a child
of the if
with an else
. However, because the non-terminal in the parent
symbol is both preceded and followed by a terminal, this will not work.
Instead, we can resolve the ambiguity directly by telling the parser to "prefer" or "avoid" certain productions when ambiguities arise. For example, when we parse this program with
kompile lesson-04-e.k
kast --output kore dangling-else.if
we get the following ambiguity as an error message, minus the formatting:
[Error] Inner Parser: Parsing ambiguity.
1: syntax Stmt ::= "if" "(" Exp ")" Stmt
`if(_)__LESSON-04-E_Stmt_Exp_Stmt`(
`true_LESSON-04-E_Exp`(.KList),
`if(_)_else__LESSON-04-E_Stmt_Exp_Stmt_Stmt`(
`false_LESSON-04-E_Exp`(.KList),
`{}_LESSON-04-E_Stmt`(.KList),
`{}_LESSON-04-E_Stmt`(.KList)
)
)
2: syntax Stmt ::= "if" "(" Exp ")" Stmt "else" Stmt
`if(_)_else__LESSON-04-E_Stmt_Exp_Stmt_Stmt`(
`true_LESSON-04-E_Exp`(.KList),
`if(_)__LESSON-04-E_Stmt_Exp_Stmt`(
`false_LESSON-04-E_Exp`(.KList),
`{}_LESSON-04-E_Stmt`(.KList)
),
`{}_LESSON-04-E_Stmt`(.KList)
)
Source(./dangling-else.if)
Location(1,1,1,32)
1 | if (true) if (false) {} else {}
. ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Roughly, we see that the ambiguity is between an if
with an else
or an if
without an else
. Since we want to pick the first parse, we can tell K to
avoid the second parse with the avoid
attribute. Consider the following
modified definition (lesson-04-f.k
):
kmodule LESSON-04-F syntax Exp ::= "true" | "false" syntax Stmt ::= "if" "(" Exp ")" Stmt | "if" "(" Exp ")" Stmt "else" Stmt [avoid] | "{" "}" endmodule
Here we have added the avoid
attribute to the else
production. As a result,
when an ambiguity occurs and any of the possible parses has that attribute at
the top of the ambiguous part of the parse, we discard those parses, and
consider only the remaining ones. The prefer
attribute behaves similarly, but
instead it discards all parses which do not have that attribute. In both cases,
no action is taken if the parse is not ambiguous.
If we parse the program in dangling-else.if
with this grammar,
kompile lesson-04-f.k
kast --output kast dangling-else.if
we get the following output, minus the formatting:
`if(_)__LESSON-04-F_Stmt_Exp_Stmt`(
`true_LESSON-04-F_Exp`(.KList),
`if(_)_else__LESSON-04-F_Stmt_Exp_Stmt_Stmt`(
`false_LESSON-04-F_Exp`(.KList),
`{}_LESSON-04-F_Stmt`(.KList),
`{}_LESSON-04-F_Stmt`(.KList)
)
)
As we expected, the AST where the else
corresponds to the first if
is
discarded.
Exercises
-
Parse the program
if (true) if (false) {} else {}
usinglesson-04-f.k
and confirm that theelse
clause is part of the innermostif
statement. Then modify the definition so that you will get the alternative parse. -
Modify your solution from Lesson 1.3, Exercise 2 so that unary negation should bind tighter than multiplication and division, which should bind tighter than addition and subtraction, and each binary operator should be left associative. Write these priority and associativity declarations explicitly, and then try to write them inline.
-
Write a simple grammar containing at least one ambiguity that cannot be resolved via priority or associativity, and then use the
prefer
attribute to resolve that ambiguity. -
Explain why the following grammar is not labeled ambiguous by the K parser when parsing
abb
, then make the parser realize the ambiguity.
kmodule EXERCISE4 syntax Expr ::= "a" Expr "b" | "abb" | "b" endmodule
Next lesson
Once you have completed the above exercises, you can continue to Lesson 1.5: Modules, Imports, and Requires.